2.2 Primary Structure: Amino Acids Are Linked by Peptide Bonds to Form Polypeptide Chains

Proteins are linear polymers formed by linking the α-carboxyl group of one amino acid to the α-amino group of another amino acid. This type of linkage is called a peptide bond or an amide bond. The formation of a dipeptide from two amino acids is accompanied by the loss of a water molecule (Figure 2.13). The equilibrium of this reaction lies on the side of hydrolysis rather than synthesis under most conditions. Hence, the biosynthesis of peptide bonds requires an input of free energy. Nonetheless, peptide bonds are quite stable kinetically because the rate of hydrolysis is extremely slow; the lifetime of a peptide bond in aqueous solution in the absence of a catalyst approaches 1000 years.

Figure 2.13: Peptide-bond formation. The linking of two amino acids is accompanied by the loss of a molecule of water.

A series of amino acids joined by peptide bonds forms a polypeptide chain, and each amino acid unit in a polypeptide is called a residue. A polypeptide chain has directionality because its ends are different: an α-amino group is present at one end and an α-carboxyl group at the other. The amino end is taken to be the beginning of a polypeptide chain; by convention, the sequence of amino acids in a polypeptide chain is written starting with the amino-terminal residue. Thus, in the polypeptide Tyr-Gly-Gly-Phe-Leu (YGGFL), tyrosine is the amino-terminal (N-terminal) residue and leucine is the carboxyl-terminal (C-terminal) residue (Figure 2.14). Leu-Phe-Gly-Gly-Tyr (LFGGY) is a different polypeptide, with different chemical properties.
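The writing convention can be made concrete with a short sketch. The lookup table below covers only the four residue types in this pentapeptide, and the helper names (`to_one_letter`, `termini`) are hypothetical, chosen for illustration only.

```python
# Three-letter to one-letter codes, limited to the residues in this example.
THREE_TO_ONE = {"Tyr": "Y", "Gly": "G", "Phe": "F", "Leu": "L"}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphenated three-letter sequence, written from the
    N-terminus to the C-terminus, into one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


def termini(peptide: str) -> tuple[str, str]:
    """Return (N-terminal residue, C-terminal residue)."""
    residues = peptide.split("-")
    return residues[0], residues[-1]


enkephalin = "Tyr-Gly-Gly-Phe-Leu"
# Reversing the written order gives a different polypeptide.
reverse = "-".join(reversed(enkephalin.split("-")))

print(to_one_letter(enkephalin))  # YGGFL
print(to_one_letter(reverse))     # LFGGY
print(termini(enkephalin))        # ('Tyr', 'Leu')
```

Because a sequence is always read from the amino terminus, the two strings name two distinct molecules even though they contain the same residues.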

Figure 2.14: Amino acid sequences have direction. This illustration of the pentapeptide Tyr-Gly-Gly-Phe-Leu (YGGFL) shows the sequence from the amino terminus to the carboxyl terminus. This pentapeptide, Leu-enkephalin, is an opioid peptide that modulates the perception of pain. The reverse pentapeptide, Leu-Phe-Gly-Gly-Tyr (LFGGY), is a different molecule and has no such effects.

A polypeptide chain consists of a regularly repeating part, called the main chain or backbone, and a variable part, comprising the distinctive side chains (Figure 2.15). The polypeptide backbone is rich in hydrogen-bonding potential. Each residue contains a carbonyl group (C=O), which is a good hydrogen-bond acceptor, and, with the exception of proline, an NH group, which is a good hydrogen-bond donor. These groups interact with each other and with functional groups from side chains to stabilize particular structures, as will be discussed in Section 2.3.

Dalton

A unit of mass very nearly equal to that of a hydrogen atom. Named after John Dalton (1766–1844), who developed the atomic theory of matter.

Kilodalton (kDa)

A unit of mass equal to 1000 daltons.

Figure 2.15: Components of a polypeptide chain. A polypeptide chain consists of a constant backbone (shown in black) and variable side chains (shown in green).

Most natural polypeptide chains contain between 50 and 2000 amino acid residues and are commonly referred to as proteins. The largest single polypeptide known is the muscle protein titin, which consists of more than 27,000 amino acids. Polypeptide chains made of small numbers of amino acids are called oligopeptides or simply peptides. The mean molecular weight of an amino acid residue is about 110 g mol⁻¹, and so the molecular weights of most proteins are between 5500 and 220,000 g mol⁻¹. We can also refer to the mass of a protein, which is expressed in units of daltons; one dalton is equal to one atomic mass unit. A protein with a molecular weight of 50,000 g mol⁻¹ has a mass of 50,000 daltons, or 50 kDa (kilodaltons).
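The arithmetic behind these figures is a single multiplication, sketched below. The function names are hypothetical, and the result is only an estimate: it applies the mean residue mass of about 110 daltons to every residue, whereas an actual protein mass depends on its composition.

```python
MEAN_RESIDUE_MASS_DA = 110.0  # approximate mean residue mass, from the text


def estimate_mass_da(n_residues: int) -> float:
    """Rough protein mass in daltons from the residue count alone."""
    return n_residues * MEAN_RESIDUE_MASS_DA


def to_kda(mass_da: float) -> float:
    """Convert daltons to kilodaltons (1 kDa = 1000 Da)."""
    return mass_da / 1000.0


# 50 and 2000 residues bracket most proteins; 27,000 is a lower bound for titin.
for n in (50, 2000, 27000):
    mass = estimate_mass_da(n)
    print(f"{n:>6} residues: about {mass:,.0f} Da ({to_kda(mass):,.1f} kDa)")
```

By this estimate, titin has a mass of roughly 3000 kDa, more than ten times that of a 2000-residue protein.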

In some proteins, the linear polypeptide chain is cross-linked. The most common cross-links are disulfide bonds, formed by the oxidation of a pair of cysteine residues (Figure 2.16). The resulting unit of two linked cysteines is called cystine. Extracellular proteins often have several disulfide bonds, whereas intracellular proteins usually lack them. Rarely, nondisulfide cross-links derived from other side chains are present in proteins. For example, collagen fibers in connective tissue are strengthened in this way, as are fibrin blood clots (Section 10.4).

Figure 2.16: Cross-links. The formation of a disulfide bond from two cysteine residues is an oxidation reaction.

Proteins have unique amino acid sequences specified by genes

In 1953, Frederick Sanger determined the amino acid sequence of insulin, a protein hormone (Figure 2.17). This work is a landmark in biochemistry because it showed for the first time that a protein has a precisely defined amino acid sequence consisting only of L amino acids linked by peptide bonds. This accomplishment stimulated other scientists to carry out sequence studies of a wide variety of proteins. Currently, the complete amino acid sequences of more than 2,000,000 proteins are known. The striking fact is that each protein has a unique, precisely defined amino acid sequence. The amino acid sequence of a protein is referred to as its primary structure.

Figure 2.17: Amino acid sequence of bovine insulin.

A series of incisive studies in the late 1950s and early 1960s revealed that the amino acid sequences of proteins are determined by the nucleotide sequences of genes. The sequence of nucleotides in DNA specifies a complementary sequence of nucleotides in RNA, which in turn specifies the amino acid sequence of a protein. In particular, each of the 20 amino acids of the repertoire is encoded by one or more specific sequences of three nucleotides (Section 4.6).
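The flow of information from DNA to RNA to protein can be sketched in a few lines. The codon table below is only the subset of the standard genetic code needed for this example, and the DNA template is a hypothetical sequence constructed for illustration, not the sequence of an actual gene. The function names `transcribe` and `translate` are likewise assumptions of this sketch.

```python
# Subset of the standard genetic code (mRNA codons), enough for this example.
CODON_TABLE = {
    "UAU": "Tyr", "UAC": "Tyr",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}

# Base-pairing rules for copying a DNA template strand into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the complementary mRNA (5'->3') for a DNA template
    strand written in the 3'->5' direction."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna: str) -> str:
    """Read the mRNA three nucleotides at a time and return the peptide,
    written from the N-terminus to the C-terminus."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(CODON_TABLE[codon] for codon in codons)


template = "ATACCACCTAAAGAC"  # hypothetical template strand, 3'->5'
mrna = transcribe(template)
print(mrna)             # UAUGGUGGAUUUCUG
print(translate(mrna))  # Tyr-Gly-Gly-Phe-Leu
```

Several codons map to the same residue (four for glycine, six for leucine), which is why each amino acid is said to be encoded by one or more triplets.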

Knowing amino acid sequences is important for several reasons. First, knowledge of the sequence of a protein is usually essential to elucidating its function (e.g., the catalytic mechanism of an enzyme). In fact, proteins with novel properties can be generated by varying the sequence of known proteins. Second, amino acid sequences determine the three-dimensional structures of proteins. The amino acid sequence is the link between the genetic message in DNA and the three-dimensional structure that performs a protein’s biological function. Analyses of relations between amino acid sequences and three-dimensional structures of proteins are uncovering the rules that govern the folding of polypeptide chains. Third, alterations in amino acid sequence can lead to abnormal protein function and disease. Severe and sometimes fatal diseases, such as sickle-cell anemia (Chapter 7) and cystic fibrosis, can result from a change in a single amino acid within a protein. Fourth, the sequence of a protein reveals much about its evolutionary history (Chapter 6). Proteins resemble one another in amino acid sequence only if they have a common ancestor. Consequently, molecular events in evolution can be traced from amino acid sequences; molecular paleontology is a flourishing area of research.

Polypeptide chains are flexible yet conformationally restricted

Examination of the geometry of the protein backbone reveals several important features. First, the peptide bond is essentially planar (Figure 2.18). Thus, for a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid. The nature of the chemical bonding within a peptide accounts for the bond’s planarity. The bond resonates between a single bond and a double bond. Because of this partial double-bond character, rotation about this bond is prevented and thus the conformation of the peptide backbone is constrained.

Figure 2.18: Peptide bonds are planar. In a pair of linked amino acids, six atoms (Cα, C, O, N, H, and Cα) lie in a plane. Side chains are shown as green balls.

The partial double-bond character is also expressed in the length of the bond between the CO and the NH groups. As shown in Figure 2.19, the C–N distance in a peptide bond is typically 1.32 Å, which is between the values expected for a C–N single bond (1.49 Å) and a C=N double bond (1.27 Å). Finally, the peptide bond is uncharged, allowing polymers of amino acids linked by peptide bonds to form tightly packed globular structures.

Figure 2.19: Typical bond lengths within a peptide unit. The peptide unit is shown in the trans configuration.

Two configurations are possible for a planar peptide bond. In the trans configuration, the two α-carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds in proteins are trans. This preference for trans over cis can be explained by the fact that steric clashes between groups attached to the α-carbon atoms hinder the formation of the cis configuration but do not arise in the trans configuration (Figure 2.20). By far the most common cis peptide bonds are X–Pro linkages. Such bonds show less preference for the trans configuration because the nitrogen of proline is bonded to two tetrahedral carbon atoms, limiting the steric differences between the trans and cis forms (Figure 2.21).

Figure 2.20: Trans and cis peptide bonds. The trans form is strongly favored because of steric clashes, indicated by the orange semicircles, that arise in the cis form.

Figure 2.21: Trans and cis X–Pro bonds. The energies of these forms are similar to one another because steric clashes, indicated by the orange semicircles, arise in both forms.

In contrast with the peptide bond, the bonds between the amino group and the α-carbon atom and between the α-carbon atom and the carbonyl group are pure single bonds. The two adjacent rigid peptide units can rotate about these bonds, taking on various orientations. This freedom of rotation about two bonds of each amino acid allows proteins to fold in many different ways. The rotations about these bonds can be specified by torsion angles (Figure 2.22). The angle of rotation about the bond between the nitrogen and the α-carbon atoms is called phi (ϕ). The angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms is called psi (ψ). A clockwise rotation about either bond as viewed from the nitrogen atom toward the α-carbon atom or from the α-carbon atom toward the carbonyl group corresponds to a positive value. The ϕ and ψ angles determine the path of the polypeptide chain.

Figure 2.22: Rotation about bonds in a polypeptide. The structure of each amino acid in a polypeptide can be adjusted by rotation about two single bonds. (A) Phi (ϕ) is the angle of rotation about the bond between the nitrogen and the α-carbon atoms, whereas psi (ψ) is the angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms. (B) A view down the bond between the nitrogen and the α-carbon atoms, showing how ϕ is measured. (C) A view down the bond between the α-carbon and the carbonyl carbon atoms, showing how ψ is measured.

Torsion angle

A measure of the rotation about a bond, usually taken to lie between −180 and +180 degrees. Torsion angles are sometimes called dihedral angles.
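A torsion angle is fixed by the positions of four consecutively bonded atoms. For ϕ these are the carbonyl carbon of the preceding residue, N, Cα, and the carbonyl carbon; for ψ they are N, Cα, the carbonyl carbon, and the N of the following residue. The sketch below shows one common way to compute such an angle with the sign convention described in the text (clockwise, viewed along the central bond, is positive). The function name `torsion_deg` and the toy coordinates are hypothetical and are not taken from a real protein structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p1, p2, p3, p4):
    """Torsion angle in degrees (range -180 to +180) about the p2-p3 bond.

    Positive means that, looking from p2 toward p3, the p2-p1 bond must be
    turned clockwise to eclipse the p3-p4 bond.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the central bond runs along the z axis.
a, b, c = (1, 0, 0), (0, 0, 0), (0, 0, 1)
print(torsion_deg(a, b, c, (1, 0, 1)))   # fourth atom eclipsed with the first
print(torsion_deg(a, b, c, (0, 1, 1)))   # quarter turn clockwise
print(torsion_deg(a, b, c, (0, -1, 1)))  # quarter turn counterclockwise
```

Using `atan2` rather than the arccosine of the angle between the two plane normals keeps the sign of the rotation, which is what distinguishes, for example, ϕ = −60° from ϕ = +60°.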

Are all combinations of ϕ and ψ possible? Gopalasamudram Ramachandran recognized that many combinations are forbidden because of steric collisions between atoms. The allowed values can be visualized on a two-dimensional plot called a Ramachandran plot (Figure 2.23). Three-quarters of the possible (ϕ, ψ) combinations are excluded simply by local steric clashes. Steric exclusion, the fact that two atoms cannot be in the same place at the same time, can be a powerful organizing principle.

Figure 2.23: A Ramachandran plot showing the values of ϕ and ψ. Not all ϕ and ψ values are possible without collisions between atoms. The most favorable regions are shown in dark green; borderline regions are shown in light green. The structure on the right is disfavored because of steric clashes.

The ability of biological polymers such as proteins to fold into well-defined structures is remarkable thermodynamically. An unfolded polymer exists as a random coil: each copy of an unfolded polymer will have a different conformation, yielding a mixture of many possible conformations. The favorable entropy associated with a mixture of many conformations opposes folding and must be overcome by interactions favoring the folded form. Thus, highly flexible polymers with a large number of possible conformations do not fold into unique structures. The rigidity of the peptide unit and the restricted set of allowed ϕ and ψ angles limit the number of structures accessible to the unfolded form sufficiently to allow protein folding to take place.
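A rough counting argument gives a feel for the size of this effect. The sketch below rests on strong simplifying assumptions that are not part of the text: each residue is treated as independent, every (ϕ, ψ) combination is treated as equally likely, and the entropy is taken as R ln W for W accessible conformations. Only the allowed fraction of one-quarter comes from the Ramachandran analysis; the function names and the 100-residue chain are illustrative.

```python
import math

R = 8.314                # gas constant, J mol^-1 K^-1
ALLOWED_FRACTION = 0.25  # three-quarters of (phi, psi) space is excluded


def log10_allowed_fraction(n_residues: int) -> float:
    """log10 of the fraction of chain conformations that survive when each
    residue independently keeps only the allowed quarter of (phi, psi) space."""
    return n_residues * math.log10(ALLOWED_FRACTION)


def unfolded_entropy_change(n_residues: int) -> float:
    """Change in conformational entropy of the unfolded chain (J mol^-1 K^-1)
    caused by the steric restriction, using S = R ln W. Negative values mean
    the unfolded state loses entropy."""
    return n_residues * R * math.log(ALLOWED_FRACTION)


n = 100  # hypothetical chain length
print(f"log10(fraction of conformations remaining): {log10_allowed_fraction(n):.1f}")
print(f"entropy change of unfolded state: {unfolded_entropy_change(n):.0f} J/(mol K)")
```

Under these assumptions the steric restriction removes all but a minute fraction of the conformations of the unfolded chain, which lowers the entropic penalty that the folded form must overcome. Real chains deviate from this picture because neighboring residues are not independent and allowed conformations are not equally populated.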
