The human genomic sequence is a gold mine for new discoveries in molecular cell biology, for identifying new proteins that may be the basis of effective therapies for human diseases, and for understanding early human history and evolution. However, finding new genes is like finding a needle in a haystack, because only about 1.5 percent of the human genome encodes proteins or functional RNAs. Identification of genes in bacterial genomic sequences is relatively simple because of the scarcity of introns; simply searching for open reading frames identifies most genes. In contrast, the search for human genes is complicated by the structure of our genes, most of which are composed of multiple, relatively short exons separated by much longer noncoding introns. Identification of complex transcription units by analysis of genomic DNA sequences alone is extremely challenging. Future improvements in bioinformatic methods for gene identification, together with characterization of cDNA copies of mRNAs isolated from the hundreds of human cell types, are likely to lead to the discovery of new proteins, to a better understanding of biological processes, and possibly to applications in medicine and agriculture.
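The open-reading-frame search mentioned above for intron-poor bacterial sequences can be illustrated with a minimal sketch. It assumes a plain DNA string containing only A, C, G, and T, treats ATG as the only start codon, and uses the three standard stop codons; the function names and the `min_codons` threshold are hypothetical choices made for illustration, not part of any established tool. Real gene finders also weigh alternative start codons, codon usage, and much longer minimum lengths.

```python
# Minimal ORF scan: an illustrative sketch, not a production gene finder.
# Assumptions: input is a DNA string of A/C/G/T; ATG is the only start codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Scan all six reading frames for ATG ... stop stretches.

    Returns a list of (strand, start, end, orf_sequence) tuples.
    start/end are 0-based, end-exclusive offsets into the scanned strand,
    so "-" strand coordinates refer to the reverse complement.
    min_codons counts the start and stop codons (hypothetical threshold).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the current ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume the search after this stop codon
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

The same scan fails on most human genes for the reason given in the text: a coding region split into short exons by long introns is not one continuous reading frame in genomic DNA, so the frame is interrupted long before the true stop codon is reached.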
We have seen that although most transposons do not function directly in cellular processes, they have helped to shape modern genomes by promoting gene duplications, exon shuffling, and the generation of new combinations of transcription-
As described in Chapter 6, a Drosophila DNA transposon called the P element has been exploited for the facile stable transfection of genes into the Drosophila germ line. This transposon has provided a powerful method for molecular cell biology experimentation in Drosophila. An active area of current research is the use of mammalian transposons and retrotransposons for the transformation of human cells for gene therapy. This promises to be an exciting area of medicine in the future treatment of genetic diseases such as sickle-