Key Concepts of Section 8.4

Genomics: Genome-Wide Analysis of Gene Structure and Function

  • The function of a protein that has not been isolated (a query sequence) can often be predicted on the basis of similarity of its amino acid sequence to the sequences of proteins of known function.

  • A computer algorithm known as BLAST rapidly searches databases of known protein sequences to find those with significant similarity to a query protein.

  • Proteins with common functional motifs, which can often be quite short, may not be identified in a typical BLAST search. Such short sequences may be located by searches of structural motif databases.

  • A protein family comprises multiple proteins all derived from the same ancestral protein. The genes encoding these proteins, which constitute the corresponding gene family, arose by an initial gene duplication event and subsequent divergence during speciation (see Figure 8-21).

  • Related genes and their encoded proteins that arose from a gene duplication event within one organism are paralogous, such as the α- and β-globins that combine in hemoglobin (α₂β₂). Those that diverged through mutations accumulated during speciation are orthologous. Orthologous proteins usually have a similar function in different organisms.

  • Open reading frames (ORFs) are regions of genomic DNA containing at least 100 codons located between a start codon and a stop codon.

  • Computer searches of entire bacterial and yeast genomic sequences for open reading frames (ORFs) correctly identify most protein-coding genes. Identifying probable (putative) genes in the genomic sequences of humans and other higher eukaryotes requires several types of additional data because their gene structure is more complex: relatively short coding exons are separated by relatively long noncoding introns.

  • Analysis of the complete genomic sequences of several different organisms indicates that biological complexity is not directly related to the number of protein-coding genes (see Figure 8-22).
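
The ORF definition above (at least 100 codons between a start codon and a stop codon) can be illustrated with a minimal sketch of an ORF scan. It is not the method used by any particular genome annotation tool. The function name find_orfs and its min_codons parameter are hypothetical, and the sketch assumes the standard start codon ATG, the three standard stop codons, and a codon count that includes the start codon but excludes the stop codon. It scans only the three reading frames of the given strand; a fuller search would also scan the reverse complement.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return a list of (start, end, frame) tuples for ORFs on one strand.

    start is the index of the A in the ATG start codon, end is the index
    one past the stop codon, and frame is 0, 1, or 2.
    Assumption: an ORF runs from the first ATG in a frame to the next
    in-frame stop codon and must span at least min_codons codons
    (start codon included, stop codon excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume looking for the next start codon
    return orfs


if __name__ == "__main__":
    # Toy sequence: a 5-codon ORF (ATG + 4 x GCT) followed by a TAA stop.
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=5))
```

The toy sequence uses a lowered threshold only so the example stays short. As the concepts above note, a scan of this kind works well for bacterial and yeast genomes but is not sufficient on its own for genomes in which short exons are separated by long introns.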