EXAMPLE 6 Data Compression

Let’s illustrate the principles of data compression with a simple example. Biologists are able to describe genes by specifying sequences composed of the four letters A, T, G, and C, which represent the four nucleotides adenine, thymine, guanine, and cytosine, respectively. One way to encode a sequence such as in fixed-length binary form would be to encode the letters as

The corresponding binary code for the sequence is then

On the other hand, if we knew from experience that occurs most frequently, second most frequently, and so on, and that occurs much more frequently than and together, the most efficient binary encoding would be

For this encoding scheme, the sequence is encoded as
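The comparison between the two schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code tables `FIXED` and `VARIABLE`, the frequency ordering (A most frequent, then T, G, C), and the sample sequence `seq` are all hypothetical and are not taken from the example's own tables. The sketch shows that a prefix-free variable-length code can be decoded unambiguously and that it uses fewer bits when the most frequent letter dominates.

```python
# Hypothetical two-bit fixed-length code (an assumption, not the example's table).
FIXED = {"A": "00", "T": "01", "G": "10", "C": "11"}

# Hypothetical variable-length code, assuming A is most frequent, then T, G, C.
# No codeword is a prefix of another, so decoding is unambiguous.
VARIABLE = {"A": "0", "T": "10", "G": "110", "C": "111"}


def encode(seq, table):
    """Concatenate the codeword of each letter in seq."""
    return "".join(table[letter] for letter in seq)


def decode(bits, table):
    """Decode a bit string by emitting a letter as soon as the buffer matches a codeword."""
    inverse = {code: letter for letter, code in table.items()}
    letters, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            letters.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(letters)


seq = "AATAGACA"  # hypothetical sequence in which A dominates
fixed_bits = encode(seq, FIXED)
variable_bits = encode(seq, VARIABLE)

print(fixed_bits, len(fixed_bits))        # 16 bits
print(variable_bits, len(variable_bits))  # 13 bits
print(decode(variable_bits, VARIABLE) == seq)
```

The saving depends entirely on the letter frequencies. If all four letters were equally likely, this variable-length code would average (1 + 2 + 3 + 3)/4 = 2.25 bits per letter, which is worse than the 2 bits per letter of the fixed-length code.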