2.5 2.4 Minimum-Cost Spanning Trees

52

The TSP is but one of many graph theory optimization problems that have grown out of real-world problems in both government and industry. Here is another.

EXAMPLE 7 Cable TV Service

Imagine that cable TV (CT) service will be set up on one large university campus. The graph in Figure 2.12 shows the possible links that might be included in the CT network, with each edge showing the cost in hundreds of thousands of dollars to create that particular link. To send video between two locations, a direct cable link is not necessary because it is possible to send a video indirectly via another site. Thus, in Figure 2.12, sending a video from A to C could be achieved by sending the video from A to B, from B to E, and from E to C, provided the links AB, BE, and EC are part of the network. We assume that the cost of relaying a video, compared with the cost of the direct communication link, is so small that we can neglect this amount. The problem that concerns us, therefore, is to provide service between any pair of locations in a way that minimizes the total cost of the links.

image
When students are given access to local cable TV networks, minimizing the cost of creating such networks might involve the techniques of finding a minimum-cost spanning tree.
image
Figure 2.12: Figure 2.12 Costs of installing cable service among five locations.

53

Our first guess at a solution is to put in the cheapest possible links between locations first, until all sites could send video to any other site. Such an approach is analogous to the sorted-edges method that was used to study the traveling salesman problem. in our example, if the cheapest links are added until all locations are joined, we obtain the connections shown in Figure 2.13a.

image
Figure 2.13: Figure 2.13 (a) Sites are linked in order of increasing cost until all locations are connected. (b) Circuit in part (a) highlighted. (c) Most expensive link in circuit in part (a) deleted. (d) Highlighted edges show, as a subgraph of the original graph, those links connecting the locations with minimum cost, obtained using Kruskal’s algorithm. The numbers show the order in which Kruskal’s algorithm selects the edges.

The links were added in the order ED,AD,AE,AB,DC. However, because this graph contains the circuit ADEA (wiggly edges in Figure 2.13b), it has redundant edges: We can still send videos between any pair of sites using relays after omitting the most expensive edge in the circuit—AE. After an edge of a circuit is deleted, a video can still be relayed among the sites of the circuit by sending signals the long way around. After AE is deleted, videos going from A to E can be sent via D (Figure 2.13c). These ideas constitute a procedure developed by Joseph Kruskal (1928-2010) in 1956.

Kruskal’s Algorithm PROCEDURE

Kruskal’s algorithm: Add links in order of cheapest cost so that no circuits form and so that every vertex belongs to some link added (Figure 2.13d).

In Kruskal’s procedure, as in the sorted-edges method for the TSP, the edges that are added need not be connected to each other until the end. A subgraph formed in this way is a tree; that is, it consists of one piece and contains no circuits. it also includes all the vertices of the original graph. A subgraph that is a tree and that contains all the vertices of the original graph is called a spanning tree of the original graph.

To understand these concepts better, consider the graph G in Figure 2.14a. The wiggly edges in Figure 2.14b constitute a subgraph of G that is a tree (because it is connected and has no circuit), but this tree is not a spanning tree of G because the vertices D and E are not included. On the other hand, the wiggly edges in Figure 2.14c and 2.14d show subgraphs of G that include all the vertices of G but are not trees because the first is not connected and the second contains a circuit. Figure 2.14e shows a spanning tree of G; the wiggly edges are connected and contain no circuit, and every vertex of the original graph is an endpoint of some wiggly edge.

54

image
Figure 2.14: Figure 2.14 (a) A graph to help illustrate the concept of a spanning tree. (b) The wiggly edges are a tree, but not a spanning tree, because vertices D and E are not part of the tree. (c) The wiggly edges are not a tree because they are not connected. All of the vertices of the graph are, however, endpoints of wiggly edges. (d) The wiggly edges are not a tree, because they contain the edges of the circuit BDCAB. All the vertices of the graph are, however, endpoints of wiggly edges. (e) The wiggly edges form a tree and include all of the vertices of the graph as endpoints of wiggly edges. Thus, the wiggly edges are a spanning tree.

Self Check 3

Apply Kruskal’s algorithm to find a minimum-cost spanning tree for the graph in Figure 2.3 (page 43). What is the cost associated with the tree you find?

  • The edges to form the minimum-cost spanning are added in the order St. Louis–Chicago, Chicago–Cleveland, and Chicago–Minneapolis. The cost for the tree is 1074.

Finding a minimum-cost spanning tree—that is, a spanning tree whose edge weights sum to a minimum value—solves the cable TV problem. Note that having a different goal in the cable TV problem led to a different mathematical question from that of finding a Chinese postman tour or TSP tour. This application required that we find a minimum-cost spanning tree. In Figure 2.15a, we have a graph model showing the costs of putting in roads to connect new houses in a suburban land-development project. Applying Kruskal’s algorithm— adding the edges in the order of increasing cost, but avoiding the creation of a circuit—yields a minimum-cost spanning tree, indicated by the wiggly edges in Figure 2.15b. This tree is the cheapest one that makes it possible to drive between any pair of homes, though the driving distance between some of the homes will be relatively large because only roads corresponding to wiggly edges will be built.

55

image
Figure 2.15: Figure 2.15 (a) A graph showing costs for construction of roads between houses. (b) Wiggly edges show a minimum-cost spanning tree for the graph in part (a).

Remember that the weights on the edges of the graph in Figure 2.15a represent the costs of building roads, not the driving distances between the houses. Note that Figure 2.15a is not a complete graph, one in which all possible edges are included. Edges that correspond to roads that would be economically prohibitive to build have not been shown in the graph model. Also, in Figure 2.15b, the two edges of weight 5 (shown in Figure 2.15a) do not become part of the minimum-cost spanning tree because they would create circuits with edges already chosen.

Although Kruskal’s algorithm worked in our example, how do we know that the spanning tree found by this algorithm will always achieve the minimum possible cost? While this sounds very plausible, our experience with the TSP should suggest caution. Remember that for the TSP, the sorted-edges algorithm did not necessarily give an optimal solution even though it is a greedy algorithm like Kruskal’s. On what basis should we have more faith in Kruskal’s algorithm?

Kruskal proposed his algorithm as a way to solve a pure mathematics problem put forward by Czechoslovakian mathematician Otakar Borůvka. In mathematics, it is surprising but not uncommon to find that ideas used to solve problems with no apparent application often turn out to have many real-world uses. Kruskal’s solution to the problem of finding a minimum-cost spanning tree in a graph with weights is a good example of this phenomenon.

Kruskal showed that the greedy algorithm described does yield the minimum answer, and his work led to applications of these and related ideas in designing minimum-cost computer and cable TV networks, phone connections, sewer systems, and road and railway systems. For additional discussion of operations research in the communications industry, see Spotlight 2.2. To explore how one can reconstruct full information from partial information using the tree concept, see Spotlight 2.3.

In our discussion of routing problems in graphs, we have not touched on one of the most obvious ones: finding the path between two specified, distinct vertices while keeping the sum of the weights of the edges in the path as small as possible. (Here there is no need to cover all vertices or to cover all edges.) We have seen that the weights on the edges have many possible interpretations, including time, distance, and cost. The following are some of the many possible applications:

56

AT&T Manager Explains How Long-Distance Calls Run Smoothly Spotlight 2.2

Although long-distance calls are now routine, it takes great expertise and careful planning for a company like AT&T to handle its vast amounts of telephone traffic. Rich Wetmore was district manager of AT&T’s Communications Network Operations Center in Bedminster, New Jersey. Here are his responses to questions about how AT&T handles its huge volumes of long-distance traffic and how it tracks its operations to keep things running smoothly.

How do you make sure that a customer doesn’t run into a delayed signal when attempting a long-distance call?

We monitor the performance of our AT&T network by displaying data collected from all over the country on a special wallboard. The wallboard is configured to tell us if a customer’s call is not going through because the network doesn’t have enough capacity to handle it.

That’s when we step in and take control to correct the problem. The typical control we use is to reroute the call. instead of sending the customer’s call directly to its destination, we’ll route it via a third city—to someplace else in the country that has the capacity to complete the call.

It would seem that routing via another city would take longer. Is the customer aware of this process?

Routing a call via a third city is entirely transparent [imperceptible] to the customer. I’m an expert about the network, and even when i make a phone call, I have no idea how that individual call was routed. it’s transparent both in terms of how far away the other person sounds and in how quickly the telephone call gets set up. With the signaling network we use, it takes milliseconds for switching systems to “talk” to each other to set up a call. So the fact that you are involved in a third switch in some distant city is something you would never know.

You want to be sure to keep costs down while supplying enough service to customers. So how do you balance company benefits with customer benefit?

In terms of making the network efficient, we want to do two things. First, we want our customers to be happy with our service and for all their calls to go through, which means we must build enough capacity in the network to allow that to happen. Second, we want to be efficient for stockholders and not spend more money than we need to for the network to be at the optimum size.

There are basically two costs in terms of building the network. There is the cost of switching systems and the cost of the circuits that connect the switching systems. Basically, you can use operations-research techniques and mathematics to determine cost trade-offs. it may make sense to build direct routing between two switching systems and use a lot of circuits, or maybe to involve three switching systems, with fewer circuits between the main two, and so on.

  1. Design routes to be used by an ambulance, police car, or fire engine to get to an emergency as quickly as possible.
  2. Design delivery routes that minimize gasoline use.
  3. Design routes to bring soldiers to the front as quickly as possible.
  4. Design a route for a truck carrying nuclear or biomedical waste.
  5. Help food deliverers fill burger and pizza orders quickly—and while they’re still hot.

Many people find it increasingly convenient to use the Web or software installed in their cars to get driving directions and driving time estimates to a place they wish to visit. The software that provides this information relies on algorithms that compute the shortest-path route in an appropriate weighted graph, which involves distances or times. The software sometimes makes use of global positioning system data involving lots of mathematics—mathematics different from what we are considering here.

57

Common Ancestors? Spotlight 2.3

In the study of ancient manuscripts, different manuscripts of the same book are available, even though the original manuscript upon which they are based has been lost. Examples of this include Euclid’s Elements and Chaucer’s Canterbury Tales. What interests scholars is reconstructing the relationships between the manuscripts and the common ancestors of the manuscripts, even when some of the ancestors are now missing.

Similarly, perceptual psychologists may be interested in which colors people perceive as being closely related and comparing these perceptions with those of people who are color-blind. Linguists are interested in the connections between languages that seem very different today, but have some words that are similar. Finally, in studying different species, biologists are interested in determining which species are more closely related to each other, including species known only in fossil form, and constructing a “tree” of life that shows which species were ancestors of others.

Reconstructions of this kind are made possible by using graph theory, specifically using the graph theory concept of a tree. The value of the graph theory in these and many other situations lies in using the distance between pairs of vertices in the tree as a way of reflecting the closeness of the relationships that pairs of manuscripts, pairs of colors, or pairs of species have. The distance between two vertices in a tree is the sum of the weights along the one path that joins the two vertices. If there are no weights on the edges, the distance is the number of edges in the path. In some reconstruction problems, a special vertex of the tree called the root is singled out. This root plays the role of the original common ancestors, and distances to the root are of critical interest.

In the case of species, trees of family relatedness were traditionally constructed based on similarities of bones and physical appearance. With the discovery of molecular biology, many new avenues have been opened up. We can now draw trees of relatedness based on an organism’s genetic material, DNA, or the proteins that the DNA codes for. The traditional trees based on physical traits often show different species as being more closely related than trees based on newer molecular biological approaches. These differences cause scholars to focus on how to resolve the discrepancies and thereby reach a deeper understanding of the unity of life.

image

The need to find shortest paths seems natural. Next we investigate a situation in which finding a longest path is the right tool.