In some applications, a graph $G$ is augmented by associating a weight or cost with each edge; such a graph is called a weighted graph. For example, if a graph represents a network of roads, the weight of an edge might be the length of the road between its two endpoints, or the amount of time required to travel from one endpoint to the other, or the cost to bury cable along the road from one end to the other. In such cases, instead of being interested in just any spanning tree, we may be interested in a least cost spanning tree, that is, a spanning tree such that the sum of the costs of the edges of the tree is as small as possible. For example, this would be the least expensive way to connect a set of towns by a communication network, burying the cable in such a way as to minimize the total cost of laying the cable.
This problem is one that can be solved by a greedy algorithm. Roughly speaking, a greedy algorithm is one that makes choices that are optimal in the short run. Typically this strategy does not produce an optimal solution in the long run, but for this problem the greedy approach does.
Definition 5.6.1 A weighted graph is a graph $G$ together with a cost function $c\colon E(G)\to \R^{>0}$. If $H$ is a subgraph of $G$, the cost of $H$ is $c(H)=\sum_{e\in E(H)} c(e)$. $\square$
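For readers who like to experiment, the definition translates directly into code. The following is a minimal Python sketch, assuming each edge is stored as a frozenset of its two endpoints; the names `weight` and `cost` are illustrative and not part of the text.

```python
# A weighted graph can be stored as a dictionary assigning a positive
# cost to each edge; here an edge is a frozenset of its two endpoints.
weight = {frozenset(("a", "b")): 2.0,
          frozenset(("b", "c")): 1.5,
          frozenset(("a", "c")): 3.0}

def cost(edges, weight):
    """c(H): the sum of the costs of the edges of the subgraph H."""
    return sum(weight[e] for e in edges)
```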
The Jarník Algorithm. Given a weighted connected graph $G$, we construct a minimum cost spanning tree $T$ as follows. Choose any vertex $v_0$ in $G$ and include it in $T$. If vertices $S=\{v_0, v_1,\ldots,v_k\}$ have been chosen, choose an edge with one endpoint in $S$ and one endpoint not in $S$ and with smallest weight among all such edges. Let $v_{k+1}$ be the endpoint of this edge not in $S$, and add it and the associated edge to $T$. Continue until all vertices of $G$ are in $T$.
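The algorithm translates almost line for line into code. The sketch below is one straightforward Python rendering, using the same edge-as-frozenset representation as above; it simply scans all edges at each step rather than maintaining a priority queue, so it favors clarity over efficiency, and the function name `jarnik` is only a convenient label.

```python
def jarnik(vertices, weight, v0):
    """Grow a minimum cost spanning tree of a connected weighted graph.

    `vertices` is the vertex set, `weight` maps each edge (a frozenset of
    two vertices) to its cost, and v0 is the starting vertex.
    Returns the set of edges of the spanning tree.
    """
    S = {v0}          # vertices chosen so far
    tree = set()      # edges added so far
    while len(S) < len(vertices):
        # cheapest edge with exactly one endpoint in S
        e = min((e for e in weight if len(e & S) == 1),
                key=lambda e: weight[e])
        tree.add(e)
        S |= e        # the new endpoint v_{k+1} joins S
    return tree
```

With the three-edge example above, `jarnik({"a", "b", "c"}, weight, "a")` returns the edges $ab$ and $bc$, of total cost $3.5$.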
This algorithm was discovered by Vojtěch Jarník in 1930, and rediscovered independently by Robert C. Prim in 1957 and Edsger Dijkstra in 1959. It is often called Prim's Algorithm.
The algorithm proceeds by constructing a sequence of trees $T_1, T_2,\ldots,T_{n-1}$, with $T_{n-1}$ a spanning tree for $G$. At each step, the algorithm adds an edge that makes $c(T_{i+1})$ as small as possible among all trees that consist of $T_i$ plus one edge. This is the best choice in the short run, but it is not obvious that in the long run, that is, by the time $T_{n-1}$ is constructed, this will turn out to have been the best choice.
Theorem 5.6.2 The Jarník Algorithm produces a minimum cost spanning tree.

Proof. Suppose $G$ is connected on $n$ vertices. Let $T$ be the spanning tree produced by the algorithm, and $T_m$ a minimum cost spanning tree. We prove that $c(T)=c(T_m)$.
Let $e_1, e_2,\ldots,e_{n-1}$ be the edges of $T$ in the order in which they were added to $T$; one endpoint of $e_i$ is $v_i$, the other is in $\{v_0,\ldots,v_{i-1}\}$. We form a sequence of trees $T_m=T_0, T_1,\ldots, T_{n-1}=T$ such that for each $i$, $c(T_i)=c(T_{i+1})$, and we conclude that $c(T_m)=c(T)$.
If $e_1$ is in $T_0$, let $T_1=T_0$. Otherwise, add edge $e_1$ to $T_0$. This creates a cycle containing $e_1$ and another edge incident at $v_0$, say $f_1$. Remove $f_1$ to form $T_1$. Since the algorithm added edge $e_1$, $c(e_1)\le c(f_1)$. If $c(e_1)< c(f_1)$, then $c(T_1)< c(T_0)=c(T_m)$, a contradiction, so $c(e_1)=c(f_1)$ and $c(T_1)=c(T_0)$.
Suppose we have constructed tree $T_i$. If $e_{i+1}$ is in $T_i$, let $T_{i+1}=T_i$. Otherwise, add edge $e_{i+1}$ to $T_i$. This creates a cycle containing $e_{i+1}$. Since one endpoint of $e_{i+1}$, namely $v_{i+1}$, is not in $\{v_0,\ldots,v_i\}$ and the other is, the cycle must contain a second edge with exactly one endpoint in $\{v_0,\ldots,v_i\}$; call it $f_{i+1}$. Such an edge is not one of $e_1, e_2,\ldots,e_i$, since each of those has both endpoints in $\{v_0,\ldots,v_i\}$. Remove $f_{i+1}$ to create $T_{i+1}$. Since the algorithm added $e_{i+1}$, $c(e_{i+1})\le c(f_{i+1})$. If $c(e_{i+1})< c(f_{i+1})$, then $c(T_{i+1})< c(T_i)=c(T_m)$, a contradiction, so $c(e_{i+1})=c(f_{i+1})$ and $c(T_{i+1})=c(T_i)$. $\qed$
Exercises 5.6
Ex 5.6.1 Kruskal's Algorithm is also a greedy algorithm that produces a minimum cost spanning tree for a connected graph $G$. Begin by choosing an edge in $G$ of smallest cost. Assuming that edges $e_1, e_2,\ldots,e_i$ have been chosen, pick an edge $e_{i+1}$ that does not form a cycle together with $e_1, e_2,\ldots,e_i$ and that has smallest cost among all such edges. The edges $e_1, e_2,\ldots,e_{n-1}$ form a spanning tree for $G$. Prove that this spanning tree has minimum cost.
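The following Python sketch illustrates the procedure described in the exercise (it is not, of course, a substitute for the requested proof). It assumes the same edge-as-frozenset representation as the earlier sketches and uses a simple union-find structure to test whether a candidate edge would close a cycle; the names are illustrative only.

```python
def kruskal(vertices, weight):
    """Build a minimum cost spanning tree edge by edge, cheapest first."""
    parent = {v: v for v in vertices}   # union-find: each vertex starts alone

    def find(v):
        # follow parent pointers to the representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    tree = set()
    for e in sorted(weight, key=weight.get):   # edges by increasing cost
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:        # e joins two different components, so no cycle
            parent[ru] = rv
            tree.add(e)
    return tree
```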
Ex 5.6.2 Prove that if the edge costs of $G$ are distinct, there is exactly one minimum cost spanning tree. Give an example of a graph $G$ with more than one minimum cost spanning tree.
Ex 5.6.3 In both the Jarník and Kruskal algorithms, it may be that two or more edges can be added at any particular step, and some method is required to choose one over the other. For the graph below, use both algorithms to find a minimum cost spanning tree. Whenever the algorithm allows a choice of edges, pick the edge $e_i$ with the lowest possible $i$, using the labels on the graph. For the Jarník algorithm, use the designated $v_0$ as the starting vertex. For each algorithm, list the edges in the order in which they are added. The weights of edges $e_1,e_2,\ldots,e_{10}$ are $6,7,8,2,3,2,4,6,1,1$, respectively, shown in red.