2.1 The Inclusion-Exclusion Formula

Let's return to a problem we have mentioned but not solved:

Example 2.1.1 How many submultisets of the multiset $\{2\cdot a, 4\cdot b, 3\cdot c\}$ have size 7?

We recast the problem: this is the number of solutions to $x_1+x_2+x_3=7$ with $0\le x_1\le 2$, $0\le x_2\le 4$, $0\le x_3\le 3$. We know that the number of solutions in non-negative integers is ${7+3-1\choose 3-1}={9\choose 2}$, so this is an overcount, since we count solutions that do not meet the upper bound restrictions. For example, this includes some solutions with $x_1\ge 3$; how many of these are there? This is a problem we can solve: it is the number of solutions to $x_1+x_2+x_3=7$ with $3\le x_1$, $0\le x_2$, $0\le x_3$. This is the same as the number of non-negative solutions of $y_1+y_2+y_3=7-3=4$, or ${4+3-1\choose 3-1}={6\choose 2}$. Thus, ${9\choose 2}-{6\choose 2}$ corrects this overcount. If we likewise correct for the overcounting of solutions with $x_2\ge 5$ and $x_3\ge 4$, we get ${9\choose 2}-{6\choose 2}-{4\choose 2}-{5\choose 2}$. Is this correct? Not necessarily, because we now have a potential undercount: we have twice subtracted 1 for a solution in which both $x_1\ge 3$ and $x_2\ge 5$, when we should have subtracted just 1. However, by good fortune, there are no such solutions, since $3+5>7$. But the same applies to the other pairs of variables: How many solutions have $x_1\ge 3$ and $x_3\ge 4$? It's easy to see there is only one such solution, namely $3+0+4=7$. Finally, there are no solutions with $x_2\ge 5$ and $x_3\ge 4$, so the corrected count is now ${9\choose 2}-{6\choose 2}-{4\choose 2}-{5\choose 2}+1$. This does not take into account any solutions in which $x_1\ge 3$, $x_2\ge 5$, and $x_3\ge 4$, but there are none of these, so the actual count is $${9\choose 2}-{6\choose 2}-{4\choose 2}-{5\choose 2}+1= 36-15-6-10+1=6.$$ This is small enough that it is not hard to verify by listing all the solutions. $\square$

So we solved this problem, but it is apparent that it could have been much worse, if the number of variables were larger and there were many complicated overcounts and undercounts. Remarkably, it is possible to streamline this sort of argument; it will still, often, be quite messy, but the reasoning will be simpler.

Let's start by rephrasing the example. Let $S$ be the set of all non-negative solutions to $x_1+x_2+x_3=7$, let $A_1$ be all solutions with $x_1\ge 3$, $A_2$ all solutions with $x_2\ge 5$, and $A_3$ all solutions with $x_3\ge 4$. We want to know the size of $A_1^c\cap A_2^c\cap A_3^c$, the solutions for which it is not true that $x_1\ge 3$ and not true that $x_2\ge 5$ and not true that $x_3\ge 4$. Examining our solution, we see that the final count is $$\eqalign{ |S|-|A_1|&-|A_2|-|A_3|+|A_1\cap A_2|+|A_1\cap A_3|+|A_2\cap A_3|- |A_1\cap A_2\cap A_3|\cr &=36-15-6-10+0+1+0-0.\cr }$$ This pattern is completely general:

Theorem 2.1.2 (The inclusion-exclusion formula) If $A_i\subseteq S$ for $1\le i\le n$ then $$|A_1^c\cap\cdots\cap A_n^c|= |S|-|A_1|-\cdots-|A_n|+|A_1\cap A_2|+\cdots-|A_1\cap A_2\cap A_3|-\cdots,$$ or more compactly: $$|\bigcap_{i=1}^n A_i^c|=|S|+\sum_{k=1}^n (-1)^k \sum|\bigcap_{j=1}^k A_{i_j}|,$$ where the internal sum is over all subsets $\{i_1,i_2,\ldots,i_k\}$ of $\{1,2,\ldots,n\}$. Alternately we may write $$|\bigcap_{i=1}^n A_i^c|= \sum_{I\subseteq [n]}(-1)^{|I|}|\bigcap_{j\in I} A_{j}|.$$ (Note that $\ds\bigcap_{j\in \emptyset} A_{j}=S$.)

Proof. We need to show that each element of $\bigcap_{i=1}^n A_i^c$ is counted once by the right hand side, and every other element of $S$ is counted zero times. The first of these is easy: if $x\in \bigcap_{i=1}^n A_i^c$ then for every $i$, $x\notin A_i$, so $x$ is in none of the sets involving the $A_i$ on the right hand side, and so $x$ is counted, once, by the term $|S|$.

Now suppose $x\notin \bigcap_{i=1}^n A_i^c$. On the right hand side, $x$ is counted once by the term $|S|$. For some values $i_1,i_2,\ldots,i_k$, $x\in A_{i_m}$, $1\le m\le k$, and $x$ is not in the remaining sets $A_i$. Then $x$ is counted zero times by any term involving an $A_i$ with $i\notin \{i_1,i_2,\ldots,i_k\}$, and is counted once, positively or negatively, by each term involving only $A_{i_1},A_{i_2},\ldots,A_{i_k}$. There are $k$ terms of the form $-|A_{i_m}|$, which count $x$ a total of $-k$ times. There are $k\choose 2$ terms of the form $|A_{i_l}\cap A_{i_m}|$, counting $x$ a total of $k\choose 2$ times. Continuing in this way, we see that the final count for $x$ on the right hand side is $$1-k+{k\choose 2}-{k\choose 3}+\cdots+(-1)^k{k\choose k},$$ or more compactly $$\sum_{i=0}^k (-1)^i{k\choose i}.$$ We know that this alternating sum of binomial coefficients is zero, so $x$ is counted zero times, as desired. (See equation 1.3.1.) $\qed$

An alternate form of the inclusion exclusion formula is sometimes useful.

Corollary 2.1.3 If $A_i\subseteq S$ for $1\le i\le n$ then $$|\bigcup_{i=1}^n A_i|=\sum_{k=1}^n (-1)^{k+1} \sum|\bigcap_{j=1}^k A_{i_j}|,$$ where the internal sum is over all subsets $\{i_1,i_2,\ldots,i_k\}$ of $\{1,2,\ldots,n\}$.

Proof. Since $(\bigcup_{i=1}^n A_i)^c=\bigcap_{i=1}^n A_i^c$, $$\eqalign{ |\bigcup_{i=1}^n A_i|&=|S|-|\bigcap_{i=1}^n A_i^c|\cr &=|S|-(|S|+\sum_{k=1}^n (-1)^k \sum|\bigcap_{j=1}^k A_{i_j}|)\cr &=(-1)\sum_{k=1}^n (-1)^k \sum|\bigcap_{j=1}^k A_{i_j}|\cr &=\sum_{k=1}^n (-1)^{k+1}\sum|\bigcap_{j=1}^k A_{i_j}|. }$$ $\qed$

Since the right hand side of the inclusion-exclusion formula consists of $2^n$ terms to be added, it can still be quite tedious. In some nice cases, all intersections of the same number of sets have the same size. Since there are $n\choose k$ possible intersections consisting of $k$ sets, the formula becomes $$\eqalignno{ |\bigcap_{i=1}^n A_i^c|&=|S|+\sum_{k=1}^n (-1)^k{n\choose k}m_k,& (2.1.1)\cr }$$ where $m_k$ is the size of an intersection of $k$ of the sets.

Example 2.1.4 Find the number of solutions to $x_1+x_2+x_3+x_4=25$, $0\le x_i\le 10$. Let $A_i$ be the solutions of $x_1+x_2+x_3+x_4=25$ with $x_i\ge 11$. The number of solutions with $x_i\ge 0$ for all $i$ is ${25+4-1\choose 4-1}={25+3\choose 3}$. Also $|A_i|={14+3\choose 3}$, and $|A_i\cap A_j|={3+3\choose 3}$. There are no solutions with 3 or 4 of the variables larger than 10. Hence the number of solutions is $${25+3\choose 3}-{4\choose 1}{14+3\choose 3}+{4\choose 2}{3+3\choose 3}=676.$$ $\square$

Exercises 2.1

Ex 2.1.1 List all 6 solutions to the restricted equation in example 2.1.1, and list the corresponding 6 submultisets.

Ex 2.1.2 Find the number of integer solutions to $x_1+x_2+x_3+x_4=25$, $1\le x_1\le6$, $2\le x_2\le 8$, $0\le x_3\le8$, $5\le x_4\le9$.

Ex 2.1.3 Find the number of submultisets of $\{25\cdot a,25\cdot b,25\cdot c,25\cdot d\}$ of size 80.

Ex 2.1.4 Recall that $\stwo{n}{k}$ is a Stirling number of the second kind (definition 1.8.1). Prove that for $n\ge k\ge 0$, $$ \stwo{n}{k}={1\over k!}\sum_{i=0}^k (-1)^{k-i}i^n{k\choose i}. $$ Do $n=0$ as a special case, then use inclusion-exclusion for the rest. You may assume, by convention, that $0^0=1$.

Collapse menu

1 Fundamentals

2 Inclusion-Exclusion

3 Generating Functions

4 Systems of Distinct Representatives

5 Graph Theory

6 Pólya–Redfield Counting

Exercises 2.1