2.1 Proteins Are Built from a Repertoire of 20 Amino Acids

Amino acids are the building blocks of proteins. An α-amino acid consists of a central carbon atom, called the α carbon, linked to an amino group, a carboxylic acid group, a hydrogen atom, and a distinctive R group. The R group is often referred to as the side chain. With four different groups connected to the tetrahedral α-carbon atom, α-amino acids are chiral: they may exist in one or the other of two mirror-image forms, called the L isomer and the D isomer (Figure 2.4).

Figure 2.4: The L and D isomers of amino acids. The letter R refers to the side chain. The L and D isomers are mirror images of each other.

Notation for distinguishing stereoisomers

The four different substituents of an asymmetric carbon atom are assigned a priority according to atomic number. The lowest-priority substituent, often hydrogen, is pointed away from the viewer. The configuration about the carbon atom is called S (from the Latin sinister, “left”) if the progression from the highest to the lowest priority is counterclockwise. The configuration is called R (from the Latin rectus, “right”) if the progression is clockwise.
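The viewing rule above can be restated as the sign of a scalar triple product. The following is a minimal sketch, assuming the four substituent positions are supplied already ranked from highest to lowest priority; the function names and the example coordinates are hypothetical and are not part of the original text.

```python
def triple_product(u, v, w):
    """Return u . (v x w) for three 3-component vectors."""
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def absolute_configuration(p1, p2, p3, p4):
    """Classify a stereocenter as 'R' or 'S'.

    p1..p4 are (x, y, z) positions of the four substituents, already
    ranked from highest (p1) to lowest (p4) priority. Ranking itself is
    not done here; it is assumed to have been carried out beforehand.
    """
    # Vectors from the lowest-priority substituent to the other three.
    u = tuple(a - b for a, b in zip(p1, p4))
    v = tuple(a - b for a, b in zip(p2, p4))
    w = tuple(a - b for a, b in zip(p3, p4))
    volume = triple_product(u, v, w)
    if volume == 0:
        raise ValueError("substituents are coplanar; no chirality defined")
    # With p4 pointing away from the viewer, a positive signed volume
    # means p1 -> p2 -> p3 runs counterclockwise, which is S.
    return "S" if volume > 0 else "R"


# Hypothetical idealized tetrahedron: lowest priority points down (-z),
# so the viewer looks from +z and sees p1 -> p2 -> p3 counterclockwise.
p1 = (0.94, 0.00, 0.33)
p2 = (-0.47, 0.82, 0.33)
p3 = (-0.47, -0.82, 0.33)
p4 = (0.00, 0.00, -1.00)
print(absolute_configuration(p1, p2, p3, p4))
```

Swapping any two substituents, for example passing `p3` before `p2`, reverses the sign and therefore the label, which mirrors the fact that the two isomers are mirror images.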

Only L amino acids are constituents of proteins. For almost all amino acids, the L isomer has S (rather than R) absolute configuration (Figure 2.5). What is the basis for the preference for L amino acids? The answer has been lost to evolutionary history. It is possible that the preference for L over D amino acids was a consequence of a chance selection. However, there is evidence that L amino acids are slightly more soluble than a racemic mixture of D and L amino acids, which tend to form crystals. This small solubility difference could have been amplified over time so that the L isomer became dominant in solution.

Figure 2.5: Only L amino acids are found in proteins. Almost all L amino acids have an S absolute configuration. The counterclockwise direction of the arrow from highest- to lowest-priority substituents indicates that the chiral center is of the S configuration.

Amino acids in solution at neutral pH exist predominantly as dipolar ions (also called zwitterions). In the dipolar form, the amino group is protonated (-NH3+) and the carboxyl group is deprotonated (-COO−). The ionization state of an amino acid varies with pH (Figure 2.6). In acid solution (e.g., pH 1), the amino group is protonated (-NH3+) and the carboxyl group is not dissociated (-COOH). As the pH is raised, the carboxylic acid is the first group to give up a proton, inasmuch as its pKa is near 2. The dipolar form persists until the pH approaches 9, when the protonated amino group loses a proton.

Figure 2.6: Ionization state as a function of pH. The ionization state of amino acids is altered by a change in pH. The zwitterionic form predominates near physiological pH.
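The pH dependence described above can be made quantitative with the Henderson-Hasselbalch relation, which the text does not name but which is the standard way to estimate the protonated fraction of a group from its pKa. The sketch below assumes an amino acid with no ionizable side chain and uses rounded pKa values of 2 (carboxyl) and 9 (amino) taken from the approximate figures in the text; the function names are hypothetical.

```python
def fraction_protonated(ph, pka):
    """Fraction of a group carrying its proton at the given pH.

    Follows from the Henderson-Hasselbalch relation:
    pH = pKa + log10([deprotonated] / [protonated]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(ph, pka_carboxyl=2.0, pka_amino=9.0):
    """Average net charge of an amino acid without an ionizable side chain.

    The default pKa values are rounded assumptions, not measured constants.
    """
    # The carboxyl group is neutral when protonated and -1 when deprotonated.
    carboxyl = -(1.0 - fraction_protonated(ph, pka_carboxyl))
    # The amino group is +1 when protonated and neutral when deprotonated.
    amino = fraction_protonated(ph, pka_amino)
    return carboxyl + amino


for ph in (1.0, 7.0, 11.0):
    print(f"pH {ph:4.1f}: net charge {net_charge(ph):+.2f}")
```

Under these assumptions the net charge is close to +1 in strongly acidic solution, close to 0 near neutral pH where the zwitterion predominates, and close to -1 in strongly basic solution.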

Twenty kinds of side chains varying in size, shape, charge, hydrogen-bonding capacity, hydrophobic character, and chemical reactivity are commonly found in proteins. Indeed, all proteins in all species—bacterial, archaeal, and eukaryotic—are constructed from the same set of 20 amino acids with only a few exceptions. This fundamental alphabet for the construction of proteins is several billion years old. The remarkable range of functions mediated by proteins results from the diversity and versatility of these 20 building blocks. Understanding how this alphabet is used to create the intricate three-dimensional structures that enable proteins to carry out so many biological processes is an exciting area of biochemistry and one that we will return to in Section 2.6.

Although there are many ways to classify amino acids, we will sort these molecules into four groups, on the basis of the general chemical characteristics of their R groups:

  1. Hydrophobic amino acids with nonpolar R groups

  2. Polar amino acids with neutral R groups in which the charge is not evenly distributed

  3. Positively charged amino acids with R groups that have a positive charge at physiological pH

  4. Negatively charged amino acids with R groups that have a negative charge at physiological pH

Hydrophobic amino acids.

The simplest amino acid is glycine, which has a single hydrogen atom as its side chain. With two hydrogen atoms bonded to the α-carbon atom, glycine is unique in being achiral. Alanine, the next simplest amino acid, has a methyl group (-CH3) as its side chain (Figure 2.7).

Figure 2.7: Structures of hydrophobic amino acids. For each amino acid, a ball-and-stick model (top) shows the arrangement of atoms and bonds in space. A stereochemically realistic formula (middle) shows the geometric arrangement of bonds around atoms, and a Fischer projection (bottom) shows all bonds as being perpendicular for a simplified representation (see the Appendix to Chapter 1). The additional chiral center in isoleucine is indicated by an asterisk.

Larger hydrocarbon side chains are found in valine, leucine, and isoleucine. Methionine contains a largely aliphatic side chain that includes a thioether (-S-) group. The side chain of isoleucine includes an additional chiral center; only the isomer shown in Figure 2.7 is found in proteins. The larger aliphatic side chains are especially hydrophobic; that is, they tend to cluster together rather than contact water. The three-dimensional structures of water-soluble proteins are stabilized by this tendency of hydrophobic groups to come together, which is called the hydrophobic effect. The different sizes and shapes of these hydrocarbon side chains enable them to pack together to form compact structures with little empty space. Proline also has an aliphatic side chain, but it differs from other members of the set of 20 in that its side chain is bonded to both the nitrogen and the α-carbon atoms, yielding a pyrrolidine ring. Proline markedly influences protein architecture because its cyclic structure makes it more conformationally restricted than the other amino acids.

Two amino acids with relatively simple aromatic side chains are part of the fundamental repertoire. Phenylalanine, as its name indicates, contains a phenyl ring attached in place of one of the hydrogen atoms of alanine. Tryptophan has an indole group joined to a methylene (-CH2-) group; the indole group comprises two fused rings containing an NH group. Phenylalanine is purely hydrophobic, whereas tryptophan is less so because of its NH group.

Polar amino acids.

Six amino acids are polar but uncharged. Three amino acids, serine, threonine, and tyrosine, contain hydroxyl groups (-OH) attached to a hydrophobic side chain (Figure 2.8). Serine can be thought of as a version of alanine with a hydroxyl group attached, threonine resembles valine with a hydroxyl group in place of one of valine’s methyl groups, and tyrosine is a version of phenylalanine with the hydroxyl group replacing a hydrogen atom on the aromatic ring. The hydroxyl group makes these amino acids much more hydrophilic (water loving) and reactive than their hydrophobic analogs. Threonine, like isoleucine, contains an additional asymmetric center; again, only one isomer is present in proteins.

Figure 2.8: Structures of the polar amino acids. The additional chiral center in threonine is indicated by an asterisk.

In addition, the set includes asparagine and glutamine, two amino acids that contain a terminal carboxamide. The side chain of glutamine is one methylene group longer than that of asparagine.

Cysteine is structurally similar to serine but contains a sulfhydryl, or thiol (-SH), group in place of the hydroxyl (-OH) group. The sulfhydryl group is much more reactive than the hydroxyl group. Pairs of sulfhydryl groups may come together to form disulfide bonds, which are particularly important in stabilizing some proteins, as will be discussed shortly.

Positively charged amino acids.

We turn now to amino acids with complete positive charges that render them highly hydrophilic. Lysine and arginine have long side chains that terminate with groups that are positively charged at neutral pH. Lysine is capped by a primary amino group and arginine by a guanidinium group. Histidine contains an imidazole group, an aromatic ring that also can be positively charged (Figure 2.9).

Figure 2.9: Positively charged amino acids lysine, arginine, and histidine.

With a pKa value near 6, the imidazole group can be uncharged or positively charged near neutral pH, depending on its local environment (Figure 2.10). Histidine is often found in the active sites of enzymes, where the imidazole ring can bind and release protons in the course of enzymatic reactions.

Figure 2.10: Histidine ionization. Histidine can bind or release protons near physiological pH.

Negatively charged amino acids.

This set of amino acids contains two with acidic side chains: aspartic acid and glutamic acid (Figure 2.11). These amino acids are charged derivatives of asparagine and glutamine (Figure 2.8), with a carboxylic acid in place of a carboxamide. Aspartic acid and glutamic acid are often called aspartate and glutamate to emphasize that, at physiological pH, their side chains usually lack a proton that is present in the acid form and hence are negatively charged. Nonetheless, these side chains can accept protons in some proteins, often with functionally important consequences.

Figure 2.11: Negatively charged amino acids.
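The four-group classification developed in this section can be collected into a single lookup table. The sketch below simply restates the group assignments given in the text; the dictionary and function names are hypothetical, and other classification schemes would assign some residues differently.

```python
# Group membership as described in this section (9 + 6 + 3 + 2 = 20).
SIDE_CHAIN_GROUPS = {
    "hydrophobic": [
        "glycine", "alanine", "valine", "leucine", "isoleucine",
        "methionine", "proline", "phenylalanine", "tryptophan",
    ],
    "polar": [
        "serine", "threonine", "tyrosine",
        "asparagine", "glutamine", "cysteine",
    ],
    "positively charged": ["lysine", "arginine", "histidine"],
    "negatively charged": ["aspartic acid", "glutamic acid"],
}

# Invert the table so that each amino acid name maps to its group.
GROUP_OF = {
    name: group
    for group, names in SIDE_CHAIN_GROUPS.items()
    for name in names
}


def classify(name):
    """Return the side-chain group for an amino acid name (case-insensitive)."""
    key = name.strip().lower()
    if key not in GROUP_OF:
        raise ValueError(f"not one of the 20 standard amino acids: {name!r}")
    return GROUP_OF[key]


print(classify("Histidine"))
print(classify("tryptophan"))
```

Because histidine has a side-chain pKa near 6, its placement in the positively charged group reflects what the side chain can become rather than a charge it always carries.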

Seven of the 20 amino acids have readily ionizable side chains. These 7 amino acids are able to donate or accept protons to facilitate reactions as well as to form ionic bonds. Table 2.1 gives equilibria and typical pKa values for ionization of the side chains of tyrosine, cysteine, arginine, lysine, histidine, and aspartic and glutamic acids in proteins. Two other groups in proteins—the terminal α-amino group and the terminal α-carboxyl group—can be ionized, and typical pKa values for these groups also are included in Table 2.1.

Amino acids are often designated by either a three-letter abbreviation or a one-letter symbol (Table 2.2). The abbreviations for amino acids are the first three letters of their names, except for asparagine (Asn), glutamine (Gln), isoleucine (Ile), and tryptophan (Trp). The symbols for many amino acids are the first letters of their names (e.g., G for glycine and L for leucine); the other symbols have been agreed on by convention. These abbreviations and symbols are an integral part of the vocabulary of biochemists.

| Amino acid | Three-letter abbreviation | One-letter abbreviation |
|---|---|---|
| Alanine | Ala | A |
| Arginine | Arg | R |
| Asparagine | Asn | N |
| Aspartic acid | Asp | D |
| Cysteine | Cys | C |
| Glutamine | Gln | Q |
| Glutamic acid | Glu | E |
| Glycine | Gly | G |
| Histidine | His | H |
| Isoleucine | Ile | I |
| Leucine | Leu | L |
| Lysine | Lys | K |
| Methionine | Met | M |
| Phenylalanine | Phe | F |
| Proline | Pro | P |
| Serine | Ser | S |
| Threonine | Thr | T |
| Tryptophan | Trp | W |
| Tyrosine | Tyr | Y |
| Valine | Val | V |
| Asparagine or aspartic acid | Asx | B |
| Glutamine or glutamic acid | Glx | Z |

Table 2.2: Abbreviations for amino acids
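Because sequences are usually written with the one-letter symbols, a small lookup built from Table 2.2 is often handy. The following is a minimal sketch; the names `THREE_LETTER` and `to_three_letter` and the hyphen separator are illustrative choices, not a standard API.

```python
# One-letter symbol -> three-letter abbreviation, as listed in Table 2.2.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    # Ambiguity codes: Asn or Asp, and Gln or Glu.
    "B": "Asx", "Z": "Glx",
}


def to_three_letter(sequence, sep="-"):
    """Convert a string of one-letter symbols to three-letter abbreviations."""
    parts = []
    for symbol in sequence.upper():
        if symbol not in THREE_LETTER:
            raise ValueError(f"unknown amino acid symbol: {symbol!r}")
        parts.append(THREE_LETTER[symbol])
    return sep.join(parts)


print(to_three_letter("GASP"))  # prints Gly-Ala-Ser-Pro
```

The reverse mapping can be built by inverting the dictionary, since each three-letter abbreviation in the table is unique.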

How did this particular set of amino acids become the building blocks of proteins? First, as a set, they are diverse: their structural and chemical properties span a wide range, endowing proteins with the versatility to assume many functional roles. Second, many of these amino acids were probably available from prebiotic reactions; that is, from reactions that took place before the origin of life. Finally, other possible amino acids may have simply been too reactive. For example, amino acids such as homoserine and homocysteine tend to form five-membered cyclic forms that limit their use in proteins; the alternative amino acids that are found in proteins—serine and cysteine—do not readily cyclize, because the rings in their cyclic forms are too small (Figure 2.12).

Figure 2.12: Undesirable reactivity in amino acids. Some amino acids are unsuitable for proteins because of undesirable cyclization. Homoserine can cyclize to form a stable, five-membered ring, potentially resulting in peptide-bond cleavage. The cyclization of serine would form a strained, four-membered ring and is thus disfavored. X can be an amino group from a neighboring amino acid or another potential leaving group.