Sequencing an Entire Genome

The ultimate goal of structural genomics is to determine the complete, ordered nucleotide sequence of an organism's genome. In Chapter 14, we considered some of the methods used to sequence small fragments of DNA. The main obstacle to sequencing a whole genome is the immense size of most genomes. Bacterial genomes are typically a few million base pairs long; many eukaryotic genomes are billions of base pairs long and are distributed among dozens of chromosomes. Furthermore, for technical reasons, sequencing cannot begin at one end of a chromosome and continue straight through to the other end; only small fragments of DNA, usually no more than 500 to 700 nucleotides, can be sequenced at one time. Determining the sequence of an entire genome therefore requires that the DNA be broken into thousands or millions of smaller fragments, each of which can then be sequenced. The difficulty lies in putting those short sequences back together in the correct order. Two approaches have been used to assemble the short sequenced fragments into a complete genome: map-based sequencing and whole-genome shotgun sequencing. We will consider these two approaches in the context of the Human Genome Project.
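To give a rough sense of the scale described above, the following Python sketch estimates how many fragments are needed to span a genome once and shows how two short reads might be joined where their ends overlap. The genome sizes (4 million and 3 billion base pairs), the 600-nucleotide read length, the toy sequences, and the function names `min_fragments` and `merge_by_overlap` are all illustrative assumptions, not values or methods taken from the text. Real assembly relies on many overlapping reads and far more elaborate algorithms.

```python
from typing import Optional


def min_fragments(genome_bp: int, read_bp: int) -> int:
    """Fewest non-overlapping reads of length read_bp that span genome_bp once."""
    # Integer ceiling division; avoids floating-point rounding.
    return -(-genome_bp // read_bp)


def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads if a suffix of `left` matches a prefix of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` bases is found.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


if __name__ == "__main__":
    READ_BP = 600  # assumed read length, within the 500 to 700 range

    # Assumed round-number genome sizes, for illustration only.
    print(min_fragments(4_000_000, READ_BP))      # bacterial-scale genome
    print(min_fragments(3_000_000_000, READ_BP))  # large eukaryotic-scale genome

    # Two toy reads that share the four bases "GTAC" at their junction.
    print(merge_by_overlap("ATGCGTAC", "GTACCTGA"))
```

These counts are lower bounds: they assume that reads tile the genome end to end with no overlap, whereas reassembly depends on reads overlapping, so the number of fragments actually sequenced is considerably larger.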