Chapter 1. Virus module

Introduction

Viruses are discussed as examples of concepts across several chapters in Biology: How Life Works. Here, we compile all of the text and art on viruses from Chapters 1, 4, 13, 15, 19, 32, and 43 into a single module on this topic.

Core Concepts:

  • V.1 Viruses are diverse, but all contain genetic material and a protein coat.
  • V.2 Some viruses cause disease.
  • V.3 Viruses provide useful models and tools for biological research.

Everyone knows the symptoms: fever, chills, sore throat, cough, weakness, and fatigue, perhaps a headache and muscle pain. Many viruses cause these common symptoms, including the flu virus. The flu virus—its full name is influenza—infects mammals and birds (Chapter 43) and is notable for causing seasonal outbreaks, or epidemics. However, four times in the last century—the Spanish flu of 1918, the Asian flu of 1957–58, the Hong Kong flu of 1968–69, and the swine flu of 2009—the virus spread more widely and infected more people, and the disease reached pandemic levels. During these pandemics, the flu virus caused hundreds of thousands of deaths worldwide.

Other viruses have emerged more recently. The human immunodeficiency virus (HIV), which causes acquired immune deficiency syndrome (AIDS), first infected humans sometime in the early 20th century in Africa. The first cases of AIDS were reported by the US Centers for Disease Control in 1981, and HIV was identified in 1983. Today, HIV is a global pandemic. The Ebola virus first appeared in Sudan and Zaire in 1976. The most recent and largest epidemic occurred in 2014 in West Africa, where several countries were affected. The Zika virus, first isolated in 1947 in Uganda, was recently found in Brazil. From there, it spread to other parts of South America, Central America, and the Caribbean, where outbreaks continue to occur today. Transmitted by mosquitoes as well as sexually, Zika virus has already been found in the United States and is expected to spread rapidly to include most of North America.

Some viruses even cause cancer (Case 2: Cancer). For example, nearly all cases of cervical cancer are caused by a virus called human papillomavirus (HPV). There are hundreds of strains of HPV, some of which are sexually transmitted. Some of these strains cause only minor problems such as common warts and plantar warts. However, a handful of high-risk HPV strains are strongly tied to cancer of the cervix. More than 99% of cervical cancer cases are believed to arise from HPV infections.

Several hundred types of virus are known to infect humans, and the catalog is still incomplete. Furthermore, species from all three domains of life – Bacteria, Archaea, and Eukarya – are susceptible to viral infection.

We mostly think of viruses as causing disease. In fact, the term “virus” comes from the Latin for “poison.” But viruses play other roles as well. Some viruses transfer genetic material from one cell to another. This process, called horizontal gene transfer, has played a major role in the evolution of bacteria and archaea, as well as in the spread of antibiotic-resistance genes (Chapter 26). Molecular biologists have learned to make use of this ability of viruses to deliver genes into cells. In addition to providing useful tools in biological research, viruses provide model systems for many problems in biology, including how genes are turned on and off, how cancer develops, and how organisms protect themselves from pathogens.

What are viruses? Are they living or not? How do they infect cells? How do organisms protect themselves from viruses? How can they be used to study biological processes? We explore the answers to these questions in the following sections.

V.1 Viral structure and diversity

Thousands of viruses have been described in detail, and probably millions more have yet to be discovered. It has been estimated that life on Earth is host to 10^31 virus particles, ten hundred thousand million more virus particles than grains of sand! They are especially abundant in the ocean, with estimates of about 10^11 viruses per liter. There, they play a particularly large role, lysing up to 30% of phytoplankton daily in coastal ecosystems and releasing large amounts of organic molecules into the water column. This organic matter supports large populations of respiring bacteria, forming a microbial loop in the carbon cycle (Chapter 48). As a group, viruses have had amazing evolutionary success.

A virus contains genetic material, but requires a cell to replicate.

A virus is a small infectious agent that contains a nucleic acid genome packaged inside a protein coat called a capsid and sometimes an outer lipid envelope. A virus infects a cell by binding to the cell’s surface, inserting its genetic material into the cell, and using the cellular machinery to produce more viruses. In this way, it is often said that a virus “hijacks” a cell.

The infected cell may produce more viruses, sometimes by lysis, or breakage, of the cell, and the new viruses can then infect more cells. In some cases, the genetic material of the virus integrates into the DNA of the host cell. Infection of a host cell is essential to viral reproduction because viruses use cellular ATP to replicate, transcribe, and translate their genome.

Is a virus living? In Chapter 5, we noted that the cell is the fundamental unit of life. This is one of the central tenets of the Cell Theory. All cells have three essential features – the capacity to store and transmit information, a membrane that selectively controls movement in and out, and the ability to harness energy from the environment. Do viruses share these features too?

Viruses have a stable archive of genetic information that is stored and transmitted, just as cells do. However, viruses are not able to regulate the passage of substances across their protein coats or lipid envelopes the way that cells do. Nor can they harness energy from the environment without infecting a cell. In fact, on their own, viruses cannot read and use the information contained in their genetic material. To replicate, they require a cell. As a result, most scientists do not consider a virus alive.

The host range of a virus is determined by viral and host surface proteins.

All known cells and organisms are susceptible to viral infection, including bacteria, archaeons, and eukaryotes. A cell in which viral reproduction occurs is called a host cell. Some viruses kill the host cell; others do not. Although viruses can infect all types of living organism, a given virus can infect only certain species or types of cell. At one extreme, a virus can infect just a single species. Smallpox, which infects only humans, is a good example. In cases like this, we say that the virus has a narrow host range.

For other viruses, the host range is broad. For example, rabies infects many different types of mammal, including squirrels, dogs, and humans. Similarly, tobacco mosaic virus, a plant virus, infects more than 100 different species of plants (Fig. 13.17). No matter how broad the host range, however, plant viruses cannot infect bacteria or animals, and bacterial viruses cannot infect plants or animals.

Host specificity results from the way that viruses gain entry into cells. Proteins on the surface of the capsid or envelope (if present) bind to proteins on the surface of host cells. These proteins interact in a specific manner, so the presence of the host protein on the cell surface determines which cells a virus can infect. For example, a protein on the surface of the human immunodeficiency virus (HIV) envelope called gp120 binds to a protein called CD4 on the surface of certain immune cells, so HIV infects these cells but not other cells.

The host range of a virus can change because of mutation and other mechanisms (Chapter 43). For example, avian, or bird, flu is a type of influenza virus that was once restricted primarily to birds but now can infect humans. The first reported case in humans was in 1996, and the disease has since spread widely. Similarly, HIV once infected only nonhuman primates, but its host range expanded in the twentieth century to include humans. Canine distemper virus, which infects dogs, expanded its host range in the early 1990s, leading to the infection and death of lions in Tanzania.

Viruses have diverse sizes and shapes.

Viruses show a wide variety of sizes and shapes, which are determined in some cases by the type of genome they carry. Most viruses are tiny, some hardly larger than a ribosome, 25–30 nm in diameter. Roughly speaking, the average size of a virus, relative to that of the host cell it infects, may be compared to the size of an average person relative to that of a commercial airliner.

Most viral genomes range in size from 3 kb to 300 kb (a kb, or kilobase, is a thousand base pairs), but a few are even larger. The largest viral genome, found in a virus that infects the amoeba Acanthamoeba polyphaga, is 1.2 Mb (a Mb, or megabase, is a million base pairs). This viral genome contains almost 1000 protein-coding genes, including some for sugar, lipid, and amino acid metabolism not found in any other viruses. In fact, it is approximately twice as large as that of the bacterium Mycoplasma genitalium.

Viruses come in different shapes as well. Three examples are shown in Fig. 13.18. The T4 virus (Fig. 13.18a) infects cells of the bacterium Escherichia coli. Viruses that infect bacterial cells are called bacteriophages, which literally means “bacteria eaters.” The T4 bacteriophage has a complex structure that includes a head composed of protein surrounding a molecule of double-stranded DNA, a tail, and tail fibers. In infecting a host cell, the T4 tail fibers attach to the surface, and the DNA and some proteins are injected into the cell through the tail.

Most viruses are not structurally so complex. Consider the tobacco mosaic virus, which infects plants. It has a helical shape formed by the arrangement of protein subunits entwined with a molecule of single-stranded RNA (Fig. 13.18b). Tobacco mosaic virus was the first virus discovered, revealed in experiments showing that the infectious agent causing brown spots and discoloration of tobacco leaves was so small that it could pass through the pores of filters that could trap even the smallest bacterial cells.

Many viruses have an approximately spherical shape formed from polygons of protein subunits that come together at their edges to form a polyhedral capsid. Among the most common polyhedral shapes is an icosahedron, which has 20 identical triangular faces. The example in Fig. 13.18c is adenovirus, a common cause of upper respiratory infections in humans. Many viruses that infect eukaryotic cells, such as adenovirus, are surrounded by a glycoprotein envelope composed of a lipid bilayer with embedded proteins and glycoproteins that recognize and attach to host cell receptors.

Viral genomes are diverse and are the basis of viral classification.

As we have discussed, all viruses have a nucleic acid genome. In this way, they resemble cells. However, there are important differences between viral genomes and cellular genomes. For example, viral genomes are typically small and compact with little or no repetitive DNA. In addition, the genomes of viruses are far more diverse than the genomes of cellular organisms, which all have genomes of double-stranded DNA. Some viral genomes are composed of RNA and others of DNA. Some are single stranded, others are double stranded, and still others have both single- and double-stranded regions. Some are circular and others are made up of a single piece or multiple linear pieces of DNA (called linear and segmented genomes, respectively).

Finally, whereas biologists classify cellular organisms according to their degree of evolutionary relatedness, they classify viruses based on their type of genome and mode of replication. Unlike forms of cellular life, there is no evidence that all viruses share a single common ancestor. Different types of virus may have evolved independently more than once. Since classification of viruses based on evolutionary relatedness is not possible, other criteria are necessary. One of the most useful classifications is based on the type of nucleic acid the virus contains and how the messenger RNA, which produces viral proteins, is synthesized. The classification is called the Baltimore system after David Baltimore, who devised it.

According to the Baltimore system, there are seven major groups of viruses, designated I–VII, as shown in Fig. 13.16. These groups are largely based on whether their nucleic acid is double-stranded DNA, double-stranded RNA, partially double-stranded and partially single-stranded, or single-stranded RNA or DNA with a positive (+) or negative (–) sense. The sense of a nucleic acid molecule is positive if its sequence is the same as the sequence of the mRNA that is used for protein synthesis, and negative if it is the complementary sequence. For example, a (+)RNA strand has the same nucleotide sequence as the mRNA, whereas a (–)RNA strand has the complementary sequence. Similarly, a (+)DNA strand has the same sequence as the mRNA, and a (–)DNA strand has the complementary sequence (except that U in RNA is replaced with T in DNA). Because mRNA is synthesized from a DNA template, it is the (–)DNA strand that is used for mRNA synthesis (Chapter 3).
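The sense convention can be made concrete with a short sketch. The mRNA below is a hypothetical toy sequence chosen only for illustration, and the function names are assumptions of this example. For simplicity, each complementary strand is written base-by-base under the mRNA (that is, in the opposite 3'-to-5' orientation), so the pairing is easy to see.

```python
# Base-pairing rules for RNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_rna(seq):
    """Return the RNA complement, aligned base-by-base under seq."""
    return "".join(RNA_COMPLEMENT[base] for base in seq)


def rna_to_dna(seq):
    """Write an RNA sequence in the DNA alphabet (U becomes T)."""
    return seq.replace("U", "T")


mrna = "AUGGCUUAA"                  # hypothetical toy mRNA

plus_rna = mrna                     # (+)RNA: same sequence as the mRNA
minus_rna = complement_rna(mrna)    # (-)RNA: complementary sequence
plus_dna = rna_to_dna(plus_rna)     # (+)DNA: same as mRNA, with T for U
minus_dna = rna_to_dna(minus_rna)   # (-)DNA: template for mRNA synthesis

for label, seq in [("(+)RNA", plus_rna), ("(-)RNA", minus_rna),
                   ("(+)DNA", plus_dna), ("(-)DNA", minus_dna)]:
    print(label, seq)
```

In this toy case the (–)DNA strand is the one whose complement, written in RNA, reproduces the mRNA, which is why it serves as the template.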

Two groups replicate using the enzyme reverse transcriptase, and are therefore placed into their own groups (VI and VII). Reverse transcriptase is an RNA-dependent DNA polymerase that uses a single-stranded RNA as a template to synthesize a DNA strand that is complementary in sequence to the RNA (Fig. 13.16). The reverse transcriptase then displaces the RNA template and replicates the DNA strand to produce a double-stranded DNA molecule that can be incorporated into the host genome. In synthesizing DNA from an RNA template, the enzyme reverses the usual flow of genetic information from DNA to RNA. This capability is so unusual and was so unexpected that many molecular biologists at first doubted whether such an enzyme could exist. Finally, the enzyme was purified and its properties verified, for which its discoverers, Howard Temin and David Baltimore, were awarded the Nobel Prize in Physiology or Medicine in 1975, shared with Renato Dulbecco.
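The two steps carried out by reverse transcriptase can be sketched at the level of sequence alone. This is a simplified illustration, not a model of the enzyme: the viral RNA is a hypothetical toy sequence, the function name is an assumption of this example, and every strand is written in the conventional 5'-to-3' direction (so each complementary strand is also reversed).

```python
# Pairing rules: an RNA template yields DNA, then DNA yields DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return (first_strand, second_strand) of DNA, each written 5'->3'."""
    # Step 1: DNA strand complementary to the RNA template.
    first_strand = "".join(RNA_TO_DNA[b] for b in reversed(rna))
    # Step 2: second DNA strand complementary to the first.
    second_strand = "".join(DNA_TO_DNA[b] for b in reversed(first_strand))
    return first_strand, second_strand


viral_rna = "AUGGCUUAA"  # hypothetical toy genome fragment
first_strand, second_strand = reverse_transcribe(viral_rna)
print(first_strand)   # complementary DNA
print(second_strand)  # same sequence as the RNA, with T in place of U
```

The second strand carries the same sequence as the original RNA in the DNA alphabet, so the double-stranded product preserves the information of the RNA genome.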

As we saw earlier, genome size varies greatly among different viruses. RNA viral genomes tend to be smaller than DNA viral genomes. Most eukaryotic viruses that have RNA genomes and most plant viruses, including tobacco mosaic virus, are in group IV. Among bacterial and archaeal viruses, most genomes consist of double-stranded DNA.

Viruses typically have a high rate of mutation, making pathogenic viruses a moving target for the immune system. The highest rates of mutation per nucleotide per replication are found among RNA viruses and retroviruses, including HIV. Lower rates occur in DNA viruses, and even lower rates in unicellular organisms such as bacteria and yeast. The rates of mutation per nucleotide per DNA replication are nearly the same for all multicellular animals, including mice and humans. Fig. 14.1 compares the rates of newly arising mutations in a given base pair in a single round of replication in viruses and several types of organisms. Most of these mutations are due to errors in replication.

Viruses are capable of self-assembly.

The genome of a virus contains the genetic information needed to specify all the structural components of the virus. Progeny virus particles are formed by molecular self-assembly: When the viral components are present in the proper relative amounts and under the right conditions, the components interact spontaneously to assemble themselves into the mature virus particle.

Tobacco mosaic virus illustrates the process of self-assembly. In the earliest stages, the coat-protein monomers assemble into two circular layers forming a cylindrical disk. This disk binds with the RNA genome, and the combined structure forms the substrate for polymerization of all the other protein monomers into a helical filament that incorporates the rest of the RNA as the filament grows (Fig. 13.19). The mature virus particle consists of 2130 protein monomers and 1 single-stranded RNA molecule of 6400 ribonucleotides.
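The two numbers quoted for the mature particle imply a simple ratio between RNA and coat protein. The short calculation below uses only the figures given above; the variable names are assumptions of this example.

```python
PROTEIN_MONOMERS = 2130   # coat-protein monomers per particle (from the text)
RNA_NUCLEOTIDES = 6400    # ribonucleotides in the RNA genome (from the text)

# Average number of ribonucleotides associated with each monomer.
nt_per_monomer = RNA_NUCLEOTIDES / PROTEIN_MONOMERS
print(f"{nt_per_monomer:.2f} nucleotides per protein monomer")  # prints 3.00 ...
```

On these figures, the helical filament incorporates roughly three ribonucleotides for every protein monomer added.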

V.2 Viral diseases

As we have discussed, viruses have diverse genomes and take on many different forms to infect a specific range of hosts. Many viruses have little noticeable effect on their hosts, but many have pathogenic, or harmful, effects. Here, we examine two well-known viruses that infect humans, the human immunodeficiency virus (HIV) and the influenza virus, to see how their genomes and structure contribute to their virulence.

HIV infects cells of the immune system.

HIV causes acquired immune deficiency syndrome, or AIDS. It is transmitted through bodily fluids and infects a specific type of cell involved in the immune system, called T cells. As a result, the body eventually becomes susceptible to pathogens that it would usually be able to defend itself against easily, as well as to certain forms of cancer. HIV was discovered in the early 1980s, and today about 34 million people are infected with the virus worldwide.

Whereas the genome of all cells consists of double-stranded DNA, the genome of HIV is single-stranded RNA. The sequence of the HIV genome identifies it as a retrovirus that replicates by a DNA intermediate that can be incorporated into the host genome. More narrowly, the sequence of the HIV genome groups it among the mammalian lentiviruses, so named because of the long lag between the initial time of infection and the appearance of symptoms (lenti- means “slow”).

Fig. 13.6 shows the evolutionary relationships among a sample of lentiviruses, grouped according to the similarity of their genome sequences. The evolutionary tree shows that closely related viruses have closely related hosts. For example, simian lentiviruses are more closely related to one another than they are to cat lentiviruses. This observation implies that the genomes of the viruses evolve along with the genomes of their hosts. A second feature shown by the evolutionary tree is that human HIV originated from at least two separate simian viruses that switched hosts from simians (most likely chimpanzees) to humans.

The annotated sequence of the HIV genome tells us a lot about the biology of HIV (Fig. 13.7). The open reading frame denoted gag encodes protein components of the capsid, pol encodes proteins needed for reverse transcription of the viral RNA into DNA and incorporation into the host genome, and env encodes proteins that are embedded in the lipid envelope. The annotation in Fig. 13.7 also includes the genes tat and rev, encoding proteins essential for the HIV life cycle, as well as the genes vif, vpr, vpu, and nef, which encode proteins that enhance virulence in organisms. Identification of the genes necessary to complete the HIV cycle is the first step to finding drugs that can interfere with the cycle and prevent infection.

How does HIV gain entry into T cells? HIV has a surface glycoprotein (a product of the env region in the annotated HIV genome shown in Fig. 13.7). This molecule interacts with a receptor called CD4 on the surface of T cells to gain entry into these cells. Interaction with CD4 alone, however, does not enable the virus to infect the T cell. The HIV surface glycoprotein also interacts with another receptor on the T cell, which is denoted CCR5, in the early stages of infection (Fig. 15.3). The normal function of CCR5 is to bind certain small secreted proteins that promote tissue inflammation in response to infection. But because CCR5 is also an HIV receptor, cells lacking CCR5 are more difficult to infect.

A beneficial effect of a particular mutation in the CCR5 gene was discovered in studies focusing on HIV patients whose infection had not progressed to full-blown AIDS after 10 years or more. The protective allele is denoted the Δ32 allele because the mutation is a 32-base-pair deletion in the coding sequence of the CCR5 gene (Fig. 15.4). Because 32 is not a multiple of 3, the reading frame for translation is shifted at the site of the deletion, and instead of the normal amino acid sequence Ser–Gln–Tyr–Gln–Phe, the mutant sequence is Ile–Lys–Asp–Ser–His. Not only is the amino acid sequence different from the nonmutant form, the ribosome encounters a stop codon a mere 26 amino acids farther along and translation terminates. The resulting mutant protein is 215, not 352, amino acids long. The CCR5 protein produced by the Δ32 allele is completely inactive.
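Why a 32-base-pair deletion is so disruptive can be illustrated with a small sketch. The DNA below is a hypothetical toy sequence, not the CCR5 gene, and the helper names are assumptions of this example. The codon table is the standard genetic code, and translation stops at the first stop codon.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: upstream part, a 32-bp segment, downstream part.
upstream = "ATGGCT"
segment = "GCT" * 10 + "GC"        # 32 bases, the region to be deleted
downstream = "TTCTGAAGGTTAA"

normal_gene = upstream + segment + downstream
deletion_gene = upstream + downstream

print(len(segment) % 3)            # remainder 2, so the frame shifts
print(translate(normal_gene))      # full-length toy protein
print(translate(deletion_gene))    # altered and truncated toy protein
```

In the toy gene the deletion both changes the amino acids read after the deletion site and brings a stop codon into frame almost immediately, the same two effects described for the Δ32 allele.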

The effect of the Δ32 allele is pronounced. In individuals with the homozygous Δ32/Δ32 genotype, HIV progression to AIDS is rarely observed. There is some protection even in individuals with heterozygous Δ32 genotypes, where progression to AIDS is delayed by an average of about 2 years after infection by HIV.

Much has been written about the evolutionary history of the Δ32 allele. It is found almost exclusively in European populations, where the frequency of heterozygous genotypes ranges from 10% to 25%. The narrow geographical distribution was originally interpreted to mean that the allele was selected over time because it provided protection against some other infectious agent that also interacted with the CCR5 protein.

Influenza causes the flu.

The influenza virus, or flu virus, causes the flu and, in some cases, also has ways to evade the immune system. The flu virus can change its outer protein coat so immune cells called memory cells, produced from earlier infections, no longer recognize it.

There are three major types of flu virus—called A, B, and C—and many different strains. For example, the swine flu pandemic of 2009 was caused by a type A flu virus of the strain H1N1. These strains differ in several respects, including the structure of a cell-surface glycoprotein called hemagglutinin (HA). HA binds to epithelial cells and controls viral entry into these cells. It can bind only to cells that display a complementary cell-surface protein. As a result, hemagglutinin determines in part which organisms the flu virus infects—humans, pigs, or birds—and which part of the body it infects—the nose, throat, or lungs.

Cells infected by viruses secrete cytokines that bring macrophages, T cells, and B cells to the site of infection. These cytokines, produced in abundance, lead to many of the symptoms commonly associated with the flu. For example, the cytokine interleukin-1 produces fever.

Once in the cell, the virus replicates its genome and makes more virus particles (Chapter 19). However, viral replication is prone to error, so there is a high rate of mutation that leads to changes in the amino acid sequences of antigens present on the viral surface, including HA. This process is called antigenic drift (Fig. 43.18a). It allows a population of viruses to evolve over time and evade memory B and T cells that remember past infections.

Antigenic drift leads to a gradual change in the virus over time. The flu virus is also capable of sudden changes by a process known as antigenic shift (Fig. 43.18b). The viral genome consists of eight linear RNA strands. If a single cell is infected with two or more different flu strains at one time, the RNA strands can reassort to generate a new strain. H1N1, for example, has genetic elements from human, pig, and bird flu viruses.
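The potential of reassortment to generate new strains can be appreciated by counting. The sketch below assumes, purely for illustration, that a cell is co-infected by exactly two strains (labeled A and B here) and that each of the eight RNA segments of a progeny virus can come from either parent independently; real reassortment need not be this free.

```python
from itertools import product

SEGMENTS = 8            # RNA strands in the flu genome (from the text)
PARENTS = ("A", "B")    # hypothetical labels for two co-infecting strains

# Every way of choosing a parent for each of the eight segments.
genotypes = ["".join(choice) for choice in product(PARENTS, repeat=SEGMENTS)]

# Reassortants carry segments from both parents.
reassortants = [g for g in genotypes if len(set(g)) > 1]

print(len(genotypes))      # 256 combinations in total
print(len(reassortants))   # 254 of them mix segments from both parents
```

Under these assumptions, all but two of the 256 possible segment combinations differ from both parental strains.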

Vaccines provide protection against pathogens, including viruses.

Antigenic drift and shift make it difficult to predict from year to year which flu strains will be most prevalent and therefore what vaccine will be most effective. In a vaccine, an antigen from a pathogen is deliberately given to a patient to induce an immune response but not the disease, thereby providing future protection from infection by the same pathogen.

Vaccination was discovered by a combination of observation and experiment. Until the early 1700s, it was common practice to inoculate people with smallpox to induce a mild disease and prevent a more severe, even lethal, infection. It was also well known that milkmaids, exposed to the relatively benign cowpox virus from milking cows, were immune to the related but more deadly smallpox virus. In 1796, the English scientist Edward Jenner specifically tested the consequences of exposure to cowpox by inoculating a young boy with cowpox and demonstrating that he became immune to smallpox. In fact, the word “vaccine” is derived from the Latin vacca, which means “cow.”

Today, vaccines take many forms: They can be a protein or a part of a protein from the pathogen, a live but weakened form of the pathogen, or a killed pathogen. They are among the most effective public health measures ever developed.

V.3 Viruses as models and tools for biological research

In addition to causing diseases, such as AIDS or the flu, viruses provide a model for biological research. For example, the central dogma states that genetic information can be transferred from DNA to RNA to protein (Fig. 3.3). The hypothesis of an RNA intermediary that carries genetic information from DNA to the ribosomes was supported by an experiment using viruses carried out by Sydney Brenner, François Jacob, and Matthew Meselson. They used the virus T2, which infects cells of the bacterium E. coli and uses the cellular machinery to produce viral proteins. The researchers found that the infected cells produce a burst of RNA molecules shortly after infection and before viral proteins are made. This finding and others suggested that RNA is used to retrieve the genetic information stored in DNA for use in protein synthesis.

Interestingly, some exceptions to the central dogma have been discovered, notably in viruses, including the transfer of genetic information from RNA to DNA (as in HIV), and from RNA to RNA (as in replication of the genetic material of influenza virus). Nevertheless, the central dogma still conveys the basic idea that, in most cases, the flow of information is from DNA to RNA to protein.

Viruses have been used to study transcriptional regulation.

For the processes of the central dogma to work in a living organism to produce the traits that we see, they must be coordinated so that genes are only expressed, or turned on, in the right place and time, and in the right amount. Gene regulation encompasses the ways in which cells control gene expression, and it can occur at any step in gene expression. Viruses are vital research tools in studying certain forms of gene regulation, especially at the level of transcription (Chapter 19).

Bacterial cells are susceptible to infection by bacteriophages, among which is a type that can undergo one of two fates when infecting a cell. The best known example is bacteriophage λ (lambda), which infects cells of E. coli. The possible results of λ infection are illustrated in Fig. 19.20.

Upon infection, the linear DNA of the phage genome is injected into the bacterial cell, and almost immediately the ends of the molecule join to form a circle. In normal cells growing in nutrient medium, the usual outcome of infection is the lytic pathway, shown on the left in Fig. 19.20. In the lytic pathway, the virus hijacks the cellular machinery to replicate the viral genome and produce viral proteins. After about an hour, the infected cell undergoes lysis and bursts open to release a hundred or more progeny phage capable of infecting other bacterial cells.

The alternative to the lytic pathway is lysogeny, shown on the right in Fig. 19.20. In lysogeny, the bacteriophage DNA and the bacterial DNA undergo a process of recombination at a specific site in both molecules, which results in a bacterial DNA molecule that now includes the bacteriophage DNA. Lysogeny often takes place in cells growing in poor conditions. The relative sizes of the DNA molecules in Fig. 19.20 are not to scale. In reality, the length of the bacteriophage DNA is only about 1% of that of the bacterial DNA. When the bacteriophage DNA is integrated by lysogeny, the only bacteriophage gene transcribed and translated is one that represses the transcription of other phage genes, preventing entry into the lytic pathway. The bacteriophage DNA is replicated along with the bacterial DNA and transmitted to the bacterial progeny when the cell divides. Under stress, such as exposure to ultraviolet light, recombination is reversed, freeing the phage DNA and initiating the lytic pathway.

At the molecular level, the choice between the lytic and lysogenic pathways is determined by the positive and negative regulatory effects of a small number of bacteriophage proteins produced soon after infection. Which pathway results depends on the outcome of a competition between the production of a protein known as cro and that of another protein known as cI. If the production of cro predominates, the lytic pathway results; if cI predominates, the lysogenic pathway takes place.

Fig. 19.21 shows the small region of the bacteriophage DNA in which the key interactions take place. Almost immediately after infection and circularization of the bacteriophage DNA, transcription takes place from the promoters PL and PR. Transcription of genes controlled by the PR promoter results in a transcript encoding the proteins cro and cII. The cro protein represses transcription of a gene controlled by another promoter PM, which encodes the protein cI. In normal cells growing in nutrient medium, proteases present in the bacterial cell degrade cII and prevent its accumulation. With cro protein preventing cI expression and cII protein unable to accumulate, transcription of bacteriophage genes in the lytic pathway takes place, including those genes needed for bacteriophage DNA replication, those encoding proteins in the bacteriophage head and tail, and, finally, those needed for lysis.

Alternatively, in bacterial cells growing in poor conditions, reduced protease activity allows cII protein to accumulate. When cII protein reaches a high enough level, it stimulates transcription from the promoter PE. The transcript from PE includes the coding sequence for cI protein, and the cI protein has three functions:

  • It binds with the operator OR and prevents further expression of cro and cII.
  • It stimulates transcription of its own coding sequence from the promoter PM, establishing a positive feedback loop that keeps the level of cI protein high.
  • It binds with the operator OL and prevents further transcription from PL.

The result is that cI production shuts down transcription of all bacteriophage genes except its own gene, and this is the regulatory state that produces lysogeny. (The protein needed for recombination between the bacteriophage DNA and the bacterial DNA is produced by transcription from the PL promoter before it is shut down by cI.) When cells that have undergone lysogeny are exposed to ultraviolet light or certain other stresses, the cI protein is degraded. In this case, cro and cII are produced again, and the lytic pathway follows.
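The logic of the switch described above can be caricatured as a few Boolean rules. This is a deliberately crude sketch, not a quantitative model: the function and parameter names are assumptions of this example, protein levels are reduced to present or absent, and the competition between cro and cI is collapsed into a single decision.

```python
def lambda_switch(rich_medium, uv_stress=False):
    """Toy Boolean summary of the lytic/lysogenic decision of phage lambda."""
    # In cells growing in nutrient medium, host proteases degrade cII.
    protease_active = rich_medium
    cII_accumulates = not protease_active

    # Accumulated cII drives cI production; stress such as ultraviolet
    # light leads to degradation of cI.
    cI_present = cII_accumulates and not uv_stress

    # cI shuts down expression of cro (and cII); without cI, cro is made
    # and represses cI expression.
    cro_expressed = not cI_present

    return {
        "cI_present": cI_present,
        "cro_expressed": cro_expressed,
        "pathway": "lysogenic" if cI_present else "lytic",
    }


print(lambda_switch(rich_medium=True)["pathway"])                    # lytic
print(lambda_switch(rich_medium=False)["pathway"])                   # lysogenic
print(lambda_switch(rich_medium=False, uv_stress=True)["pathway"])   # lytic
```

Even in this simplified form, the rules capture the key feature of the switch: cro and cI are mutually exclusive, so the cell settles into one state or the other.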

Regulation of the lytic and lysogenic pathways works to the advantage of bacteriophage λ, but the process is not like something an engineer might design. That is because biological systems are not engineered; they evolve. Regulatory mechanisms are built up over time by the selection of successive mutations. Each evolutionary step refines the regulation in such a way as to be better adapted to the environment than it was before.

Viruses have helped us understand the genetic basis of cancer.

Some viruses are known to cause uncontrolled cell division, or cancer. HPV, discussed earlier, is an example. Because viruses usually carry only a handful of genes, it is relatively easy to identify which of those genes is involved in cancer. The investigation of cancer-causing viruses therefore provided major insights into our understanding of cancer. In the first decade of the twentieth century, Peyton Rous studied cancers called sarcomas in chickens (Fig. 11.19). His work and that of others led to the discovery of the first virus known to cause cancer in animals, named the Rous sarcoma virus.

As discussed in Chapter 9, growth factors normally bind to cell-surface receptors, which in turn activate several types of proteins inside the cell that promote cell division. The gene from the Rous sarcoma virus that promotes uncontrolled cell division encodes an overactive protein kinase similar to receptor kinases that transmit signals to the interior of cells. This viral gene is named v-src, for viral-src (pronounced “sarc” and short for “sarcoma,” the type of cancer it causes).

The v-src gene is one of several examples of an oncogene, or cancer-causing gene, found in viruses. A real surprise was the discovery that the v-src oncogene is found not just in the Rous sarcoma virus. It is an altered version of a gene normally found in the host animal cell, known as c-src (cellular-src). The c-src gene plays a role in the normal control of cell division during embryonic development.

The discovery that the v-src oncogene has a normal counterpart in the host cell was an important step toward determining the cellular genes that participate in cell growth and division. These normal cellular genes are called proto-oncogenes. They are involved in cell division, but do not themselves cause cancer. Only when they are mutated to become oncogenes do they have the potential to cause cancer. Today, we know of scores of proto-oncogenes, most of which were identified through the study of cancer-causing viruses in chickens, mice, and cats.

Oncogenes also play a major role in human cancers. Most human cancers are not caused by viruses. Instead, human proto-oncogenes can be mutated into cancer-causing oncogenes by environmental agents such as chemical pollutants. For example, organic chemicals called aromatic amines present in cigarette smoke can enter cells and damage DNA, resulting in mutations that can convert a proto-oncogene into an oncogene.

Some plant defense mechanisms were discovered following viral infection.

The study of viruses has also helped us to understand how plants protect themselves from pathogens. In 1961, American plant pathologist A. F. Ross reported that when individual tobacco leaves were infected with the tobacco mosaic virus (TMV), they developed necrotic patches. However, when uninfected leaves of the same plant were subsequently exposed to the virus, they suffered little or no damage (Fig. 32.7). Ross called this ability to resist future infections systemic acquired resistance (SAR). Initially reported for viral infections, SAR occurs in response to a wide range of pathogens, especially when infection results in necrosis due to either a hypersensitive response or a necrotrophic pathogen.

How do uninfected leaves acquire resistance to pathogens? One hypothesis is that a chemical signal is transported from the infected region through the phloem. This chemical signal then triggers the expression of genes encoding many of the same proteins that defend against the pathogen in infected cells. The identity of the mobile signal has proved elusive, but experiments show that salicylic acid, the chemical basis of aspirin, is required for SAR. The methylated form of salicylic acid, methyl salicylate, known familiarly as oil of wintergreen, is also a potent inducer of defense responses in plants. Because methyl salicylate vaporizes readily, the transport of this compound through the air may play a role in signaling the presence of pathogens. Airborne signals may be particularly important for leaves: because leaves export carbohydrates rather than receive them, they are not able to receive signaling molecules transported in the phloem.

Systemic acquired resistance increases the ability of uninfected tissues to resist infection. In general, these responses are not highly specific. Rather, SAR activates defenses effective against broad classes of pathogens. Only in the case of viral infections is the response targeted to an individual pathogen.

Viruses can be potent infectious agents, but plants have evolved responses to viral infection that are much like those mounted against other pathogens. In addition, plants have a form of defense that targets the virus specifically (Fig. 32.8). Most plant viruses have genomes made of single-stranded RNA (ssRNA). During the replication of the viral genome inside the plant cell, double-stranded RNA (dsRNA) molecules are formed. Because plant cells do not normally make double-stranded RNA, the replicating viral genomes are identified as foreign. Enzymes produced by the plant cell cleave the double-stranded RNA molecules into small pieces of 21 to 24 nucleotides, forming fragments called small interfering RNA, or siRNA (Chapter 3). These fragments bind with specific protein complexes in the cell. The RNA-protein complexes then play a role in targeting and destroying single-stranded RNA molecules that have a complementary sequence, namely the viral genome (Chapter 19). The virus is unable to replicate and thus cannot spread. In this way, the plant cell uses RNA chemistry to eliminate viral infections.
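The way siRNA fragments single out the viral genome can be illustrated with plain strings. The sketch below is a simplification under stated assumptions: the sequences are invented, fragments are cut at a fixed size of 21 nucleotides, and the function names `dice` and `is_targeted` are hypothetical labels rather than established terms.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that would base-pair with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dice(strand: str, size: int = 21) -> list:
    """Cut a strand into consecutive fragments of `size` nucleotides.

    A shorter leftover piece at the end is discarded.
    """
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def is_targeted(ssrna: str, guides: list) -> bool:
    """True if any guide fragment is complementary to a stretch of `ssrna`."""
    return any(reverse_complement(guide) in ssrna for guide in guides)


# Invented single-stranded viral genome (46 nucleotides).
viral_genome = "AUGGCUACGUUAGCCGAUCGGAUUACGCUAGGCUAACGUAGCUAGG"
# Replication produces the complementary strand, giving double-stranded RNA.
antisense = reverse_complement(viral_genome)
guides = dice(antisense)

# Invented host RNA with no sequence in common with the virus.
host_rna = "GGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"

viral_hit = is_targeted(viral_genome, guides)  # True
host_hit = is_targeted(host_rna, guides)       # False
```

Because the guides are derived from the double-stranded replication intermediate, they match the viral genome and nothing else in this example, which is the source of the specificity described above.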

When a cell is attacked by a virus, the siRNA molecules that are produced in response to the initial infection can also spread, allowing the plant to acquire immunity against specific viruses. The siRNA molecules move through plasmodesmata, enter the phloem, and from there spread throughout the plant. Thus, the systemic response to viral infection consists of both general defense responses triggered by salicylic acid and the transport of highly specific molecules that can target and destroy viral genomes throughout the plant.

A bacterial defense mechanism against viruses is used as a powerful DNA editing tool.

In addition to providing models for biological research, the study of viruses and viral infections has given us powerful tools for use in the laboratory. For example, scientists are often interested in altering the nucleotide sequence of genes. In this way, they can introduce specific mutations into genes to better understand their function, or correct mutant versions of genes to restore normal function. Collectively, these techniques are known as DNA editing.

One of the newest and most exciting ways to edit DNA goes by the acronym CRISPR (clustered regularly interspaced short palindromic repeats), and it was discovered in an unexpected way. Researchers noted that about half of all species of bacteria and most species of Archaea contain small segments of DNA of about 20–50 base pairs derived from viruses, but their function was at first a mystery. Later, it was discovered that they play a role in the bacterial defense against viruses.

When a bacterium is infected by a virus for the first time, it makes a copy of part of the viral genome and incorporates it into its genome. On subsequent infection by the same virus, the DNA copy of the viral genome is transcribed to RNA that combines with a protein that has a DNA-cleaving function. The RNA serves as a guide to identify target DNA in the virus by complementary base pairing, and the protein cleaves the target DNA. In this way, bacteria “remember” and defend themselves from past infections. The phrase “clustered regularly interspaced short palindromic repeats” describes the organization of the viral DNA segments in the bacterial genome.
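This form of molecular memory can be sketched in a few lines. The class below is a toy illustration, not a model of any particular CRISPR system: the class and method names, the invented sequences, and the fixed 30-base-pair segment length (chosen from the 20–50 range mentioned above) are all assumptions made for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


class Bacterium:
    """Toy bacterium that stores viral DNA segments and checks later infections."""

    def __init__(self):
        self.stored_segments = []

    def record_infection(self, viral_dna: str, start: int = 0, length: int = 30):
        """Copy part of the viral genome into the bacterial genome."""
        self.stored_segments.append(viral_dna[start:start + length])

    def recognizes(self, viral_dna: str) -> bool:
        """True if a guide made from any stored segment can pair with the virus.

        Viral DNA is treated as double stranded, so both strands are searched.
        """
        return any(
            segment in viral_dna or reverse_complement(segment) in viral_dna
            for segment in self.stored_segments
        )


phage = "ATGCGTACCGTTAGCATCGGATCCGTAAGCTTGACCATGCAAGT"  # invented sequence
other_phage = "GGGGGGGGGGCCCCCCCCCCGGGGGGGGGGCCCCCCCCCCGGGG"  # invented sequence

cell = Bacterium()
first_encounter = cell.recognizes(phage)   # False: nothing stored yet
cell.record_infection(phage, start=5)
second_encounter = cell.recognizes(phage)  # True: stored segment matches
unrelated = cell.recognizes(other_phage)   # False: no matching segment
```

In a real cell, recognition is followed by cleavage of the target DNA by the associated protein; the sketch stops at the recognition step.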

In modern genetic engineering, the CRISPR mechanism is put to practical use to alter the nucleotide sequence of almost any gene in any kind of cell. One method is outlined in Fig. 12.21. The first step is to transform a cell with a plasmid containing sequences that code for a CRISPR RNA as well as the CRISPR-associated protein Cas9. The RNA contains a region that can form a hairpin-shaped structure, as well as a region engineered to have bases complementary to any DNA molecule in the cell to be altered, known as the target DNA (Fig. 12.21a). When the RNA undergoes base pairing with the target DNA, Cas9 cleaves the target DNA (Fig. 12.21b). Exonucleases in the cell then expand the gap (Fig. 12.21c). The gap can be repaired using another DNA molecule that serves as a template for editing the target DNA (Fig. 12.21d). This editing template DNA is introduced to the cell by a plasmid and contains a sequence of interest to replace the degraded sequence of the target DNA, flanked by sequences complementary to the target. The strands of the gapped target DNA undergo base pairing with the complementary ends of the editing template, and DNA synthesis elongates the target DNA strands and closes the gap (Fig. 12.21d). The result is that the target DNA is restored, but its sequence has been altered according to the sequence present in the editing template (Fig. 12.21e).
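The sequence of steps (recognition, cleavage, gap formation, and template-directed repair) can be mimicked with string operations on a single strand. This is a deliberately simplified sketch: the function name `crispr_edit`, its parameters, and all sequences are invented for illustration, the gap is assumed to cover exactly the target sequence, and base pairing is reduced to exact text matching.

```python
def crispr_edit(dna, target, left_arm, new_sequence, right_arm):
    """Replace `target` in `dna` with `new_sequence` from an editing template.

    The editing template is modeled as left_arm + new_sequence + right_arm,
    where the arms must match the DNA on either side of the gap.
    """
    # Recognition: the guide RNA must find its target; otherwise Cas9 does not cut.
    start = dna.find(target)
    if start == -1:
        return dna

    # Cleavage and gap expansion: here the gap removes the whole target.
    end = start + len(target)
    upstream, downstream = dna[:start], dna[end:]

    # Repair needs the template's flanking sequences to pair with the gap edges.
    if not (upstream.endswith(left_arm) and downstream.startswith(right_arm)):
        raise ValueError("editing template does not match the sequences flanking the gap")

    # DNA synthesis copies the new sequence from the template and closes the gap.
    return upstream + new_sequence + downstream


original = "TTGACCGGATTACAGGCTAACCGT"
edited = crispr_edit(
    original,
    target="GGATTACAGG",
    left_arm="GACC",
    new_sequence="GGATCACAGG",  # differs from the target at one base
    right_arm="CTAA",
)
# edited == "TTGACCGGATCACAGGCTAACCGT"
```

The edited strand is the same length as the original and differs only where the template supplied a new base, which corresponds to the outcome shown in Fig. 12.21e.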

DNA editing by CRISPR is technically straightforward and highly efficient. The method has generated great interest because of its potential to correct genetic disorders of the blood, immune system, or other tissues and organs in which only a subset of cells with restored function can alleviate symptoms.

As we have seen, viruses are diverse and cause disease, but also provide models and tools for our understanding of basic biological problems. We summarize our discussion of viral structure, diversity, replication, host range, and effects on organisms in Fig. 19.22.

Core Concepts Summary

V.1 Viruses are diverse, but all contain genetic material and a protein coat.

A virus is an infectious agent that contains a nucleic acid genome inside a protein capsid and sometimes a lipid envelope.

A virus infects a cell by binding to the cell’s surface, inserting its genetic material into the cell, and using the cellular machinery to produce more viruses.

Viruses cannot replicate on their own and so are not considered living.

Viruses can infect all types of organism, but a given virus can infect only some types of cell.

The host range of a virus can be narrow or broad, and is determined by interactions of viral molecules with cell-surface molecules.

Viruses show a wide variety of sizes, but are typically very small.

Viruses show a diversity of shapes, including head-and-tail, helical, and icosahedral.

Viruses can be classified by the Baltimore system, which defines seven groups on the basis of type of nucleic acid and the way the mRNA is synthesized.

Viruses have a high rate of mutation.

Viruses are capable of molecular self-assembly under the appropriate conditions.

V.2 Some viruses cause disease.

The human immunodeficiency virus (HIV), which causes AIDS, is a retrovirus, and it contains the genes gag, pol, and env.

HIV gains entry into T cells by means of an interaction of the product of the env gene with cell-surface molecules called CD4 and CCR5.

Influenza virus, which causes the flu, evades the immune system by antigenic drift and shift.

Vaccination involves giving an antigen from a pathogen to a patient to induce an immune response but not the disease, providing future protection from infection by the same pathogen.

V.3 Viruses provide useful models and tools for biological research.

The lytic and lysogenic pathways of bacteriophage λ have been well studied as a model of gene regulation.

When bacteriophage λ infects E. coli, it can lyse the cell (the lytic pathway) or its DNA can become integrated into the bacterial genome (the lysogenic pathway).

In infection of E. coli cells by bacteriophage λ, predominance of cro protein results in the lytic pathway, whereas predominance of the cI protein results in the lysogenic pathway.

Cancer results when mechanisms that promote cell division are inappropriately activated or the normal checks on cell division are lost.

Cancers can be caused by certain viruses carrying oncogenes that promote uncontrolled cell division.

Viral oncogenes have cellular counterparts called proto-oncogenes that play normal roles in cell growth and division and that, when mutated, can cause cancer.

Plants respond to viral infections by sending signals to uninfected tissues so that they can mount a response (called systemic acquired resistance) and by using specific small RNA molecules that target and destroy viral genomes.

Almost any DNA sequence in an organism can be altered by means of a form of DNA editing called CRISPR, first discovered as a bacterial defense mechanism against viruses.