Clique density bounds: powerful tools from extremal graph theory

Attention conservation notice: Making people aware of Razborov’s sharp lower bound on the density of triangles in graphs of edge density {p}. Clearing up some things about the Moon-Moser theorem, a non-sharp lower bound on the density of cliques in a graph of edge density {p}, which is not well documented in the literature. Written for readers with a basic (but not expert) understanding of graph theory.

Razborov’s bound and the Moon-Moser bound

This is the story of a hydraulic jackhammer due to Razborov and the simple mallet of Moon-Moser that came before it. These two bounds provide very similar results, but one comes with an instruction manual several hundred pages thick, and might be more than the average graph thresher needs.

In our recent paper on subgraph frequencies [PDF], my co-authors and I took a deep dive into the literature on dense graphs. We were asking: what properties of social graphs follow from properties of graphs, and what properties follow from properties of people? One of the key tools we used came from extremal graph theory for dense graphs.

Our entry into this literature was driven by Jon‘s awareness of an impressive recent result by Alexander Razborov on the minimal density of triangles [PDF] in graphs of a given edge density {p}. Razborov showed that for a fixed edge density {p \in [0,1]}, the asymptotically minimal density of triangles is given by:

t(K_3,G) \ge f_3(p) = \frac{(t-1)(t-2\sqrt{t(t-p(t+1))})(t+\sqrt{t(t-p(t+1))})^2}{t^2(t+1)^2},

where t = \lfloor 1/(1-p) \rfloor. Here {t(K_3,G)} is the homomorphism density of the the triangle {K_3} in {G}. For cliques, the homomorphism density can be understood as exactly equal to the subgraph density — the proportion of subgraphs of a graph that are isomorphic to the clique. For subgraphs other than the clique the relationship is more complicated (see Section 4.1 of our subgraph frequencies paper), but the difference is not relevant here.

So: if you’ve ever wondered, `how few triangles can a graph have at a given edge density {p}?’, you now have your answer. This is a crazy result, very fundamental, and builds upon a rich new theory of flag algebras [PDF] that Razborov developed specifically to answer questions of this sort.

Within the context of social networks, triangles are everywhere. There are many sociological theories to back this up, and triangles are at the heart of balance theory, theories of transitivity and triadic closure, and the study of clustering coefficients. Razborov’s result establishes a mathematical lower bound on how few triangles there can possibly be in any graph, not just social graphs.

Parsing Razborov’s proof of the above bound is no small task. When I began my dive into this literature, however, I discovered that there’s actually a much earlier, much simpler lower bound proven by Moon and Moser in 1962. This bound is not sharp, but for triangles it is remarkably close to Razoborov’s sharp bound, and moreover it provides bounds for the density of arbitrarily large {r}-node cliques, not just the triangle (the 3-node clique). The Moon-Moser bound says that for a fixed edge density {p \in [(r-2)/(r-1),1]}, the asymptotically minimal density of triangles is given by:

t(K_r,G) \ge g_r(p) =\prod_{i=1}^{r-1} (1-i(1-p)), \hspace{.2in} \text{for } p \ge (r-2)/(r-1).

For {r=3} this bound is simply {g_3(p) = p(2p-1)} for {p\ge 1/2}. For triangles, this bound is therefore eons simpler than the Razborov bound above. How close are these two functions? See here:


We see here that the Razborov lower bound is a slim improvement (albiet important!) that requires a lot of added machinery. Parsing the full details of the Razborov result would take great effort, but I wondered, when exploring this literature, if I might not have a chance of understanding where the Moon-Moser bound came from? It turns out that this is not a well documented result, and the original 1962 paper offers just a recursive relation between the number of cliques of size {r+1} and the number of cliques of size {r}. Below I give some missing steps on where this polynomial lower bound comes from that appear to be poorly documented elsewhere.

Importantly, Lovász gives an incorrect version of the closed form formulation in (Theorem 9.3 in Very Large Graphs [PDF], 2008): “If a graph on {n} nodes has {a{n \choose 2}} edges {(0 \le a \le 1)}, then it contains at least {a(2a-1)\ldots((k-2)a-(k-1)){n \choose k}} complete k-graphs.” That formula is unclear (how would {k} proceed?) and also omits the proper restriction on edge density {a}. Lovász is no doubt aware of these restrictions, but mere mortals trying to dive in to the literature on dense graphs may stumble as I did. My hope in documenting this derivation is that someone doing research involving the Moon-Moser bound might find it useful.

From recurrence relation to closed form

Let {G} by a graph with {n} nodes, and let {N_r} denote the number of complete subgraphs of {r} nodes. Then the Moon-Moser Theorem (Moon, Moser. On a problem of Turán, 1962) says:

\frac{N_{r+1}}{N_r} \ge \frac{1}{r^2-1} \left ( r^2 \frac{N_r}{N_{r-1}} - n \right ).

Using the language of graph homomorphisms (see Section 7.2 of Borgs et al., Counting graph homomorphisms [PDF], 2006), we can use that {N_r = t(K_r,G)n^r/r!}, where {t(F,G)} is the homomorphism density of {F} in {G}. This gives us:

\frac{t(K_{r+1},G)}{t(K_{r},G)} \ge \frac{1}{r-1} \left ( r \frac{t(K_{r},G)}{t(K_{r-1},G)} - 1 \right ).

From this we seek to derive the closed form polynomial lower bound {g_r(p)} given earlier:

t(K_r,G) \ge g_r(p) =\prod_{i=1}^{r-1} (1-i(1-p)), \hspace{.2in} \text{for } p \ge (r-2)/(r-1).

Where does this polynomial lower bound come from? Where does the restriction of {p} come from? One could turn to the mathoverflow community for help with this, but I was able to work this out myself, it turned out to not be that bad.

First, some intuition building based on Turán graphs. Recall that the {K_{r}}-free Turán graph has asymptotic (in {n}) edge density {\frac{r-2}{r-1}}. As a result, we know that the lower bound on the density of {K_r} is zero for {p \le \frac{r-2}{r-1}}: there doesn’t have to be any {r}-cliques at all at those low edge densities. Now, it’s clear that the polynomial {g_r(p)} above has roots at {(0, \frac{1}{2}, \ldots, \frac{r-2}{r-1})}. This means that for {r \ge 4}, the polynomial function must cross above zero and be strictly positive for some values of {p \in (0, \frac{r-2}{r-1})}, a fact that is clearly contradicted by the {K_{r}}-free Turán graph. This tell us that we really do need to be careful about the range of {p} in our derivation of the polynomial.

Now, the derivation. Begin with the recursive formulation of Moon-Moser in (4) above. Set {T_{r+1} = \frac{t(K_{r+1},G)}{t(K_r,G)}}, and notice that

\displaystyle  T_{r+1} \ge \frac{r}{r-1} T_{r} - \frac{1}{r-1}

is a first-order linear inhomogeneous recurrence relation with variable coefficients. Importantly, note that the coefficients have unchanging sign. Thus, with some work it can be written as

\displaystyle  T_{r+1} \ge r T_2 - (r-1),

where {T_2 = \frac{t(K_2,G)}{t(K_1,G)} = t(K_2,G)}. Thus:

\displaystyle  t(K_{r+1},G) \ge t(K_r,G) (1-r(1-t(K_2,G))). \hspace{1in} (4)

This itself is a nice tidy homomorphism inequality. Note that {t(K_2,G)=p}, and the recurrence can be written as:

\displaystyle  t(K_r,G) \ge t(K_{r-1},G) ((r-1)p - (r-2)). \hspace{1in} (5)

This is where restrictions on {p} come in to play: for a fixed {p}, we again have a first-order recurrence relation with variable coefficients, but now the variable coefficients have the potential to be variably negative or positive, and we must be very careful about how we divide by negative terms w.r.t. the inequality.

Since Turán’s Theorem tells us that the lower bound {t(K_r,G) \ge 0} is sharp for {p < \frac{r-2}{r-1}}, we can fortunately restrict our analysis of the recurrence to {p \ge \frac{r-2}{r-1}}, at which point the variables of the recurrence are all non-negative, and we obtain the closed polynomial lower bound as stated in the theorem.

Derivations of the initial recurrence relation is fortunately better documented in the literature, and I won’t give that here. Hopefully putting together this note on the internet is useful to someone, somewhere, at some point.