11.3 Carbohydrates Can Be Linked to Proteins to Form Glycoproteins

A carbohydrate group can be covalently attached to a protein to form a glycoprotein. Such modifications are not rare, as 50% of the proteome consists of glycoproteins. We will examine three classes of glycoproteins. The first class is simply referred to as glycoproteins. In glycoproteins of this class, the protein constituent is the largest component by weight. This versatile class plays a variety of biochemical roles. Many glycoproteins are components of cell membranes, where they take part in processes such as cell adhesion and the binding of sperm to eggs. Other glycoproteins are formed by linking carbohydrates to soluble proteins. Many of the proteins secreted from cells are glycosylated, or modified by the attachment of carbohydrates, including most proteins present in the serum component of blood.

The second class of glycoproteins comprises the proteoglycans. The protein component of proteoglycans is conjugated to a particular type of polysaccharide called a glycosaminoglycan. Carbohydrates make up a much larger percentage by weight of the proteoglycan compared with simple glycoproteins. Proteoglycans function as structural components and lubricants.

Like proteoglycans, mucins (or mucoproteins) are predominantly carbohydrate. N-Acetylgalactosamine is usually the carbohydrate moiety bound to the protein in mucins. N-Acetylgalactosamine is an example of an amino sugar, so named because an amino group replaces a hydroxyl group. Mucins, a key component of mucus, serve as lubricants.

Glycosylation greatly increases the complexity of the proteome. A given protein with several potential glycosylation sites can have many different glycosylated forms (called glycoforms), each of which can be generated only in a specific cell type or developmental stage.
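To give a feel for how quickly glycoforms multiply, the following minimal Python sketch counts the combinations under a deliberately simple assumption: every site is independent and is either unoccupied or carries one of a fixed number of glycan structures. The function name and the example numbers are hypothetical and are not measurements for any real protein.

```python
def count_glycoforms(n_sites: int, glycans_per_site: int) -> int:
    """Upper bound on distinct glycoforms.

    Assumption (not from the text): each site is independent and is either
    empty or carries one of `glycans_per_site` different glycans.
    """
    if n_sites < 0 or glycans_per_site < 0:
        raise ValueError("counts must be non-negative")
    # Each site has (glycans_per_site + 1) states: empty, or one of the glycans.
    return (glycans_per_site + 1) ** n_sites


if __name__ == "__main__":
    # Hypothetical protein: 4 sites, 3 possible glycans at each site.
    print(count_glycoforms(4, 3))  # 4**4 = 256
```

Real cells populate only a subset of these combinations, because site occupancy depends on protein structure and cell type.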

Carbohydrates can be linked to proteins through asparagine (N-linked) or through serine or threonine (O-linked) residues

Sugars in glycoproteins are attached either to the amide nitrogen atom in the side chain of asparagine (termed an N-linkage) or to the oxygen atom in the side chain of serine or threonine (termed an O-linkage), as shown in Figure 11.15. An asparagine residue can accept an oligosaccharide only if the residue is part of an Asn-X-Ser or Asn-X-Thr sequence, in which X can be any residue, except proline. However, not all potential sites are glycosylated. Which sites are glycosylated depends on other aspects of the protein structure and on the cell type in which the protein is expressed. All N-linked oligosaccharides have in common a pentasaccharide core consisting of three mannose and two N-acetylglucosamine residues. Additional sugars are attached to this core to form the great variety of oligosaccharide patterns found in glycoproteins (Figure 11.16).
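The Asn-X-Ser/Thr rule is easy to apply to a sequence. The sketch below, in Python, scans a one-letter amino acid sequence for potential N-glycosylation sites; the function name and the example sequence are made up for illustration. As the text notes, a match is only a potential site, because not every such sequence is glycosylated.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sites (e.g. N-N-S-T) both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def potential_n_glycosylation_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: N2-G-T matches, N6-P-S is excluded (X is proline),
    # N10-N-S and N11-S-T both match.
    print(potential_n_glycosylation_sites("MNGTANPSLNNSTK"))  # [2, 10, 11]
```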

Figure 11.15: Glycosidic bonds between proteins and carbohydrates. A glycosidic bond links a carbohydrate to the side chain of asparagine (N-linked) or to the side chain of serine or threonine (O-linked). The glycosidic bonds are shown in red.
Figure 11.16: N-linked oligosaccharides. A pentasaccharide core (shaded gray) is common to all N-linked oligosaccharides and serves as the foundation for a wide variety of N-linked oligosaccharides, two of which are illustrated: (A) high-mannose type; (B) complex type.

The glycoprotein erythropoietin is a vital hormone

Let us look at a glycoprotein present in the blood serum that has dramatically improved treatment for anemia, particularly that induced by cancer chemotherapy. The glycoprotein hormone erythropoietin (EPO) is secreted by the kidneys and stimulates the production of red blood cells. EPO is composed of 165 amino acids and is N-glycosylated at three asparagine residues and O-glycosylated on a serine residue (Figure 11.17). The mature EPO is 40% carbohydrate by weight, and glycosylation enhances the stability of the protein in the blood. Unglycosylated protein has only about 10% of the bioactivity of the glycosylated form because the protein is rapidly removed from the blood by the kidneys. The availability of recombinant human EPO has greatly aided the treatment of anemias. However, some endurance athletes have used recombinant human EPO to increase the red-blood-cell count and hence their oxygen-carrying capacity. Drug-testing laboratories are able to distinguish some forms of prohibited human recombinant EPO from natural EPO in athletes by detecting differences in their glycosylation patterns through the use of isoelectric focusing.

Figure 11.17: Oligosaccharides attached to erythropoietin. Erythropoietin has oligosaccharides linked to three asparagine residues and one serine residue. The structures shown are approximately to scale. See Figure 11.16 for the carbohydrate key.
[Drawn from 1BUY.pdb.]

Glycosylation functions in nutrient sensing

An especially important glycosylation reaction is the covalent attachment of N-acetylglucosamine (GlcNAc) to serine or threonine residues of cellular proteins, a reaction catalyzed by GlcNAc transferase. The concentration of GlcNAc reflects the active metabolism of carbohydrates, amino acids and fats, indicating that nutrients are abundant (Figure 11.18). More than one thousand proteins are modified by GlcNAcylation, including transcription factors and components of signaling pathways. Interestingly, because the GlcNAcylation sites are also potential phosphorylation sites, O-GlcNAc transferase and protein kinases may be involved in cross talk to modulate one another’s signaling activity. Like phosphorylation, GlcNAcylation is reversible, with GlcNAcase catalyzing the removal of the carbohydrate. Dysregulation of GlcNAc transferase has been linked to insulin resistance, diabetes, cancer and neurological pathologies.

Figure 11.18: Glycosylation as a nutrient sensor. N-acetylglucosamine is attached to proteins when nutrients are abundant.

Proteoglycans, composed of polysaccharides and protein, have important structural roles

As stated earlier, proteoglycans are proteins attached to glycosaminoglycans. The glycosaminoglycan makes up as much as 95% of the biomolecule by weight, and so the proteoglycan resembles a polysaccharide more than a protein. Proteoglycans not only function as lubricants and structural components in connective tissue, but also mediate the adhesion of cells to the extracellular matrix, and bind factors that regulate cell proliferation.

The properties of proteoglycans are determined primarily by the glycosaminoglycan component. Many glycosaminoglycans are made of repeating units of disaccharides containing a derivative of an amino sugar, either glucosamine or galactosamine (Figure 11.19). At least one of the two sugars in the repeating unit has a negatively charged carboxylate or sulfate group. The major glycosaminoglycans in animals are chondroitin sulfate, keratan sulfate, heparin, dermatan sulfate, and hyaluronate. Recall that heparin acts as an anticoagulant to assist the termination of blood clotting. Mucopolysaccharidoses are a collection of diseases, such as Hurler disease, that result from the inability to degrade glycosaminoglycans (Figure 11.20). Although precise clinical features vary with the disease, all mucopolysaccharidoses result in skeletal deformities and reduced life expectancies.

Figure 11.19: Repeating units in glycosaminoglycans. Structural formulas for five repeating units of important glycosaminoglycans illustrate the variety of modifications and linkages that is possible. Amino groups are shown in blue and negatively charged groups in red. Hydrogen atoms have been omitted for clarity. The right-hand structure is a glucosamine derivative in each case.
Figure 11.20: Hurler disease. Formerly called gargoylism, Hurler disease is a mucopolysaccharidosis having symptoms that include wide nostrils, a depressed nasal bridge, thick lips and earlobes, and irregular teeth. In Hurler disease, glycosaminoglycans cannot be degraded. The excess of these molecules is stored in the soft tissue of the facial regions, resulting in the characteristic facial features.
[Courtesy National MPS Society, www.mpssociety.org.]

Proteoglycans are important components of cartilage

Among the best-characterized members of this diverse class is the proteoglycan in the extracellular matrix of cartilage. The proteoglycan aggrecan and the protein collagen are key components of cartilage. The triple helix of collagen provides structure and tensile strength, whereas aggrecan serves as a shock absorber. The protein component of aggrecan is a large molecule composed of 2397 amino acids. The protein has three globular domains, and the site of glycosaminoglycan attachment is the extended region between globular domains 2 and 3. This linear region contains highly repetitive amino acid sequences, which are sites for the attachment of keratan sulfate and chondroitin sulfate. Many molecules of aggrecan are in turn noncovalently bound through the first globular domain to a very long filament formed by linking together molecules of the glycosaminoglycan hyaluronate (Figure 11.21). Water is bound to the glycosaminoglycans, attracted by the many negative charges. Aggrecan can cushion compressive forces because the absorbed water enables it to spring back after having been deformed. When pressure is exerted, as when the foot hits the ground while walking, water is squeezed from the glycosaminoglycan, cushioning the impact. When the pressure is released, the water rebinds. Osteoarthritis, the most common form of arthritis, results when water is lost from proteoglycan with aging. Other forms of arthritis can result from the proteolytic degradation of aggrecan and collagen in the cartilage.

Figure 11.21: Structure of proteoglycan from cartilage. (A) Electron micrograph of a proteoglycan from cartilage (with false color added). Proteoglycan monomers emerge laterally at regular intervals from opposite sides of a central filament of hyaluronate. (B) Schematic representation. G = globular domain.
[(A) Courtesy of Dr. Lawrence Rosenberg. From J. A. Buckwalter and L. Rosenberg. Collagen Relat. Res. 3:489–504, 1983.]

Figure 11.22: Chitin, a glycosaminoglycan, is present in insect wings and the exoskeleton. Glycosaminoglycans are components of the exoskeletons of insects, crustaceans, and arachnids.
[FLPA/Alamy.]

In addition to being a key component of structural tissues, glycosaminoglycans are common throughout the biosphere. Chitin is a glycosaminoglycan found in the exoskeleton of insects, crustaceans, and arachnids and is, next to cellulose, the second most abundant polysaccharide in nature (Figure 11.22). Cephalopods such as squid use their razor-sharp beaks, which are made of extensively crosslinked chitin, to disable and consume prey.

Mucins are glycoprotein components of mucus

A third class of glycoproteins is the mucins (mucoproteins). In mucins, the protein component is extensively glycosylated at serine or threonine residues by N-acetylgalactosamine (Figure 11.10). Mucins are capable of forming large polymeric structures and are common in mucous secretions. These glycoproteins are synthesized by specialized cells in the tracheobronchial, gastrointestinal, and genitourinary tracts. Mucins are abundant in saliva, where they function as lubricants.

A model of a mucin is shown in Figure 11.23A. The defining feature of the mucins is a region of the protein backbone termed the variable number of tandem repeats (VNTR) region, which is rich in serine and threonine residues that are O-glycosylated. Indeed, the carbohydrate moiety can account for as much as 80% of the molecule by weight. A number of core carbohydrate structures are conjugated to the protein component of mucin. Figure 11.23B shows one such structure.

Figure 11.23: Mucin structure. (A) A schematic representation of a mucoprotein. The VNTR region is highly glycosylated, forcing the molecule into an extended conformation. The Cys-rich domains and the d domain facilitate the polymerization of many such molecules. (B) An example of an oligosaccharide that is bound to the VNTR region of the protein. See Figure 11.16 for the carbohydrate key.
[Information from A. Varki et al. (Eds.), Essentials of Glycobiology, 2d ed. (Cold Spring Harbor Press, 2009), pp. 117, 118.]

Mucins adhere to epithelial cells and act as a protective barrier; they also hydrate the underlying cells. In addition to protecting cells from environmental insults, such as stomach acid, inhaled chemicals in the lungs, and bacterial infections, mucins have roles in fertilization, the immune response, and cell adhesion. Mucins are overexpressed in bronchitis and cystic fibrosis, and the overexpression of mucins is characteristic of adenocarcinomas, which are cancers of the glandular cells of epithelial origin.

Protein glycosylation takes place in the lumen of the endoplasmic reticulum and in the Golgi complex

Figure 11.24: Golgi complex and endoplasmic reticulum. The electron micrograph shows the Golgi complex and adjacent endoplasmic reticulum. The black dots on the cytoplasmic surface of the ER membrane are ribosomes.
[Micrograph courtesy of Lynne Mercer.]

The major pathway for protein glycosylation takes place inside the lumen of the endoplasmic reticulum (ER) and in the Golgi complex, organelles that play central roles in protein trafficking (Figure 11.24). The protein is synthesized by ribosomes attached to the cytoplasmic face of the ER membrane, and the peptide chain is inserted into the lumen of the ER (Section 30.6). The N-linked glycosylation begins in the ER and continues in the Golgi complex, whereas the O-linked glycosylation takes place exclusively in the Golgi complex.

A large oligosaccharide destined for attachment to the asparagine residue of a protein is assembled on dolichol phosphate, a specialized lipid molecule located in the ER membrane and containing about 20 isoprene (C5) units.

The terminal phosphate group of the dolichol phosphate is the site of attachment of the oligosaccharide. This activated (energy-rich) form of the oligosaccharide is subsequently transferred to a specific asparagine residue of the growing polypeptide chain by an enzyme located on the lumenal side of the ER.

Proteins in the lumen of the ER and in the ER membrane are transported to the Golgi complex, which is a stack of flattened membranous sacs. Carbohydrate units of glycoproteins are altered and elaborated in the Golgi complex. The O-linked sugar units are fashioned there, and the N-linked sugars, arriving from the ER as a component of a glycoprotein, are modified in many different ways. The Golgi complex is the major sorting center of the cell. Proteins proceed from the Golgi complex to lysosomes, secretory granules, or the plasma membrane, according to signals encoded within their amino acid sequences and three-dimensional structures (Figure 11.25).

Figure 11.25: Golgi complex as sorting center. The Golgi complex is the sorting center in the targeting of proteins to lysosomes, secretory vesicles, and the plasma membrane. The cis face of the Golgi complex receives vesicles from the endoplasmic reticulum, and the trans face sends a different set of vesicles to target sites. Vesicles also transfer proteins from one compartment of the Golgi complex to another.
[Courtesy of Dr. Marilyn Farquhar.]

Specific enzymes are responsible for oligosaccharide assembly

How are the complex carbohydrates formed, be they unconjugated molecules such as glycogen or components of glycoproteins? Complex carbohydrates are synthesized through the action of specific enzymes, glycosyltransferases, which catalyze the formation of glycosidic bonds. Given the diversity of known glycosidic linkages, many different enzymes are required. Indeed, glycosyltransferases account for 1% to 2% of gene products in all organisms examined.

While dolichol phosphate-linked oligosaccharides are substrates for some glycosyltransferases, the most common carbohydrate donors for glycosyltransferases are activated sugar nucleotides, such as UDP-glucose (UDP is the abbreviation for uridine diphosphate) (Figure 11.26). The attachment of a nucleotide to enhance the energy content of a molecule is a common strategy in biosynthesis that we will see many times in our study of biochemistry. The acceptor substrates for glycosyltransferases are quite varied and include carbohydrates, serine, threonine, and asparagine residues of proteins, lipids, and even nucleic acids.

Figure 11.26: General form of a glycosyltransferase reaction. The sugar to be added comes from a sugar nucleotide—in this case, UDP-glucose. The acceptor, designated X in this illustration, can be one of a variety of biomolecules, including other carbohydrates or proteins.

Blood groups are based on protein glycosylation patterns

The human ABO blood groups illustrate the effects of glycosyltransferases on the formation of glycoproteins. Each blood group is designated by the presence of one of the three different carbohydrates, termed A, B, or O, attached to glycoproteins and glycolipids on the surfaces of red blood cells (Figure 11.27). These structures have in common an oligosaccharide foundation called the O (or sometimes H) antigen. The A and B antigens differ from the O antigen by the addition of one extra monosaccharide, either N-acetylgalactosamine (for A) or galactose (for B) through an α-1,3 linkage to a galactose moiety of the O antigen.

Figure 11.27: Structures of A, B, and O oligosaccharide antigens. The carbohydrate structures shown are depicted symbolically by employing a scheme (see the key in Figure 11.16) that is becoming widely used.

Specific glycosyltransferases add the extra monosaccharide to the O antigen. Each person inherits the gene for one glycosyltransferase of this type from each parent. The type A transferase specifically adds N-acetylgalactosamine, whereas the type B transferase adds galactose. These enzymes are identical in all but 4 of 354 positions. The O phenotype arises from a mutation in the O transferase that leads to the synthesis of an inactive enzyme.
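The relation between the two inherited transferase genes and the resulting blood type can be written out as a small lookup. The Python sketch below is illustrative only; the names are hypothetical, and it assumes that any active transferase a person carries is expressed and acts on the O antigen.

```python
# Sugar added to the O (H) antigen by each transferase variant.
# The O variant is inactive, so it adds nothing.
ADDED_SUGAR = {
    "A": "N-acetylgalactosamine",
    "B": "galactose",
    "O": None,
}


def blood_type(allele_1: str, allele_2: str) -> str:
    """Return the ABO blood type for one allele inherited from each parent."""
    active = sorted(
        {a for a in (allele_1, allele_2) if ADDED_SUGAR[a] is not None}
    )
    # No active transferase means only the unmodified O antigen is present.
    return "".join(active) or "O"


if __name__ == "__main__":
    for pair in [("A", "O"), ("A", "B"), ("B", "B"), ("O", "O")]:
        print(pair, "->", blood_type(*pair))
```

A single active copy is enough to modify the O antigen, which is why the A/O and A/A genotypes both give type A.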

These structures have important implications for blood transfusions and other transplantation procedures. If an antigen not normally present in a person is introduced, the person’s immune system recognizes it as foreign. Red-blood-cell lysis occurs rapidly, leading to a severe drop in blood pressure (hypotension), shock, kidney failure, and death from circulatory collapse.
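The rule that an unfamiliar antigen is recognized as foreign can be expressed as a subset test. The following Python sketch is a simplification for illustration, not clinical guidance: it considers only the A and B antigens on donor red blood cells and ignores every other blood-group system. The function and table names are hypothetical.

```python
# Antigens carried beyond the shared O (H) foundation for each ABO type.
ANTIGENS = {
    "O": set(),
    "A": {"A"},
    "B": {"B"},
    "AB": {"A", "B"},
}


def red_cells_compatible(donor: str, recipient: str) -> bool:
    """True if donor red cells carry no A/B antigen the recipient lacks."""
    return ANTIGENS[donor] <= ANTIGENS[recipient]


if __name__ == "__main__":
    for donor in ANTIGENS:
        accepted_by = [r for r in ANTIGENS if red_cells_compatible(donor, r)]
        print(f"type {donor} red cells -> recipients {accepted_by}")
```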

Why are different blood types present in the human population? Suppose that a pathogenic organism such as a parasite expresses on its cell surface a carbohydrate antigen similar to one of the blood-group antigens. This antigen may not be readily detected as foreign in a person whose blood type matches the parasite antigen, and the parasite will flourish. However, other people with different blood types will be protected. Hence, there will be selective pressure on human beings to vary blood type to prevent parasitic mimicry and a corresponding selective pressure on parasites to enhance mimicry. The constant “arms race” between pathogenic microorganisms and human beings drives the evolution of diversity of surface antigens within the human population.

Errors in glycosylation can result in pathological conditions

Although the role of carbohydrate attachment to proteins is not known in detail in most cases, data indicate that this glycosylation is important for the processing and stability of these proteins, as it is for EPO. Certain types of muscular dystrophy can be traced to improper glycosylation of dystroglycan, a membrane protein that links the extracellular matrix with the cytoskeleton. Indeed, an entire family of severe inherited human diseases called congenital disorders of glycosylation has been identified. These pathological conditions reveal the importance of proper modification of proteins by carbohydrates and their derivatives.

An especially clear example of the role of glycosylation is provided by I-cell disease (also called mucolipidosis II), a lysosomal storage disease. Normally, a carbohydrate marker directs certain digestive enzymes from the Golgi complex to lysosomes. Lysosomes are organelles that degrade and recycle damaged cellular components or material brought into the cell by endocytosis. In patients with I-cell disease, lysosomes contain large inclusions of undigested glycosaminoglycans and glycolipids (hence the “I” in the name of the disease). These inclusions are present because the enzymes normally responsible for the degradation of glycosaminoglycans are missing from affected lysosomes. Remarkably, the enzymes are present at very high levels in the blood and urine. Thus, active enzymes are synthesized, but, in the absence of appropriate glycosylation, they are exported instead of being sequestered in lysosomes. In other words, in I-cell disease, a whole series of enzymes is incorrectly addressed and delivered to the wrong location. Normally, these enzymes contain a mannose 6-phosphate residue as a component of an N-oligosaccharide that serves as the marker directing the enzymes from the Golgi complex to lysosomes. In I-cell disease, however, the attached mannose lacks a phosphate. I-cell patients are deficient in the N-acetylglucosamine phosphotransferase catalyzing the first step in the addition of the phosphoryl group; the consequence is the mistargeting of eight essential enzymes (Figure 11.28). I-cell disease causes the patient to suffer severe psychomotor retardation and skeletal deformities, similar to those in Hurler disease. Remarkably, mutations in the phosphotransferase have also been linked to stuttering. Why some mutations cause stuttering while others cause I-cell disease is a mystery.

Figure 11.28: Formation of a mannose 6-phosphate marker. A glycoprotein destined for delivery to lysosomes acquires a phosphate marker in the Golgi compartment in a two-step process. First, GlcNAc phosphotransferase adds a phospho-N-acetylglucosamine unit to the 6-OH group of a mannose, and then an N-acetylglucosaminidase removes the added sugar to generate a mannose 6-phosphate residue in the core oligosaccharide.

Oligosaccharides can be “sequenced”

How is it possible to determine the structure of a glycoprotein—the oligosaccharide structures and their points of attachment? Most approaches make use of enzymes that cleave oligosaccharides at specific types of linkages.

The first step is to detach the oligosaccharide from the protein. For example, N-linked oligosaccharides can be released from proteins by an enzyme such as peptide N-glycosidase F, which cleaves the N-glycosidic bonds linking the oligosaccharide to the protein. The oligosaccharides can then be isolated and analyzed. Matrix-assisted laser desorption/ionization/time-of-flight (MALDI-TOF) or other mass spectrometric techniques (Section 3.3) provide the mass of an oligosaccharide fragment. However, many possible oligosaccharide structures are consistent with a given mass. More-complete information can be obtained by cleaving the oligosaccharide with enzymes of varying specificities. For example, β-1,4-galactosidase cleaves β-glycosidic bonds exclusively at galactose residues. The products can again be analyzed by mass spectrometry (Figure 11.29). The repetition of this process with the use of an array of enzymes of different specificity will eventually reveal the structure of the oligosaccharide.
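A short calculation shows why a mass alone cannot fix the structure and how an enzymatic digestion adds information. The Python sketch below uses standard monoisotopic residue masses; the compositions are hypothetical examples, not the data of Figure 11.29, and the calculation ignores the adduct ion and any derivatization that a real MALDI-TOF measurement would include.

```python
# Monoisotopic masses (Da) of monosaccharide residues, i.e. sugar minus water.
# Mannose, galactose, and glucose are all hexoses and share one mass, which is
# why a mass gives a composition but not a unique structure.
RESIDUE_MASS = {
    "Hex": 162.0528,     # mannose, galactose, glucose
    "HexNAc": 203.0794,  # N-acetylglucosamine, N-acetylgalactosamine
    "dHex": 146.0579,    # fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.0106


def oligosaccharide_mass(composition: dict[str, int]) -> float:
    """Mass of a free, underivatized oligosaccharide from its composition."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


if __name__ == "__main__":
    core = {"Hex": 3, "HexNAc": 2}                 # pentasaccharide core
    before = {"Hex": 5, "HexNAc": 4}               # hypothetical: core + 2 GlcNAc + 2 Gal
    after_galactosidase = {"Hex": 3, "HexNAc": 4}  # both galactoses removed

    print(f"core: {oligosaccharide_mass(core):.4f} Da")
    shift = oligosaccharide_mass(before) - oligosaccharide_mass(after_galactosidase)
    # A loss of n x 162.0528 Da after galactosidase treatment indicates that
    # n of the hexoses were galactose residues accessible to the enzyme.
    print(f"mass lost on digestion: {shift:.4f} Da ({shift / RESIDUE_MASS['Hex']:.0f} hexoses)")
```

Knowing the enzyme's specificity is what turns the mass difference into structural information: the masses before digestion could not distinguish galactose from mannose.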

Figure 11.29: Mass spectrometric “sequencing” of oligosaccharides. Carbohydrate-cleaving enzymes were used to release and specifically cleave the oligosaccharide component of the glycoprotein fetuin from bovine serum. Parts A and B show the masses obtained with MALDI-TOF spectrometry as well as the corresponding structures of the oligosaccharide-digestion products: (A) digestion with peptide N-glycosidase F (to release the oligosaccharide from the protein) and neuraminidase; (B) digestion with peptide N-glycosidase F, neuraminidase, and β-1,4-galactosidase. Knowledge of the enzyme specificities and the masses of the products permits the characterization of the oligosaccharide. See Figure 11.16 for the carbohydrate key.
[Data from A. Varki, R. D. Cummings, J. D. Esko, H. H. Freeze, G. W. Hart, and J. Marth (Eds.), Essentials of Glycobiology (Cold Spring Harbor Laboratory Press, 1999), p. 596.]

Proteases applied to glycoproteins can reveal the points of oligosaccharide attachment. Cleavage by a specific protease yields a characteristic pattern of peptide fragments that can be analyzed chromatographically. Fragments attached to oligosaccharides can be picked out because their chromatographic properties will change on glycosidase treatment. Mass spectrometric analysis or direct peptide sequencing can reveal the identity of the peptide in question and, with additional effort, the exact site of oligosaccharide attachment.
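The peptide-mapping idea can be previewed in silico. The Python sketch below assumes trypsin as the protease (the text does not name one), applies the usual rule of cleavage after lysine or arginine unless the next residue is proline, and flags fragments that contain an Asn-X-Ser/Thr sequence as candidates for N-linked attachment. The function names and the example sequence are hypothetical, and the experiment remains the only way to learn which candidates are actually glycosylated.

```python
import re

# Asn, any residue except Pro, then Ser or Thr.
SEQUON = re.compile(r"N[^P][ST]")


def tryptic_peptides(sequence: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal fragment
    return peptides


def candidate_glycopeptides(sequence: str) -> list[str]:
    """Tryptic fragments containing at least one potential N-linked site."""
    return [p for p in tryptic_peptides(sequence) if SEQUON.search(p)]


if __name__ == "__main__":
    seq = "MNGTAKLNPSRGANNSTK"  # hypothetical sequence
    print(tryptic_peptides(seq))         # ['MNGTAK', 'LNPSR', 'GANNSTK']
    print(candidate_glycopeptides(seq))  # ['MNGTAK', 'GANNSTK']
```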

While the sequencing of the human genome is complete, the characterization of the much more complex proteome, including the biological roles of glycosylated proteins, presents a challenge to biochemistry.