PROBLEMS

WORKING WITH THE FIGURES

Question 14.1

Based on Figure 14-2, why must the DNA fragments sequenced overlap in order to obtain a genome sequence?

Question 14.2

Filling gaps in draft genome sequences is a major challenge. Based on Figure 14-6, can paired-end reads from a library of 2-kb fragments fill a 10-kb gap?

Question 14.3

In Figure 14-9, how are the positions of codons determined?

Question 14.4

In Figure 14-10, expressed sequence tags (ESTs) are aligned with genomic sequence. How are ESTs helpful in genome annotation?

Question 14.5

In Figure 14-10, cDNA sequences are aligned with genomic sequence. How are cDNA sequences helpful in genome annotation? Are cDNAs more important for bacterial or eukaryotic genome annotations?

Question 14.6

Based on Figure 14-14 and the features of ultraconserved elements, what would you predict you’d observe if you injected a reporter-gene construct of the rat ortholog of the ISL1 ultraconserved element into fertilized mouse oocytes and examined reporter gene expression in the developing embryo?

Question 14.7

Figure 14-17 shows syntenic regions of mouse chromosome 11 and human chromosome 17. What do these syntenic regions reveal about the genome of the last common ancestor of mice and humans?

Question 14.8

In Figure 14-18, what key step enables exome sequencing and distinguishes it from whole-genome sequencing?

Question 14.9

The genomes of two E. coli strains are compared in Figure 14-19. Would you expect any third strain to contain more of the blue, tan, or red regions shown in Figure 14-19? Explain.

Question 14.10

Figure 14-21 depicts the Gal4-based two-hybrid system. Why don’t the “bait” proteins fused to the Gal4 DNA-binding domain activate reporter-gene expression on their own?

BASIC PROBLEMS

Question 14.11

Explain the approach that you would apply to sequencing the genome of a newly discovered bacterial species.

Question 14.12

Terminal-sequencing reads of clone inserts are a routine part of genome sequencing. How is the central part of the clone insert ever obtained?

Question 14.13

What is the difference between a contig and a scaffold?

Question 14.14

Two particular contigs are suspected to be adjacent, possibly separated by repetitive DNA. In an attempt to link them, end sequences are used as primers to try to bridge the gap. Is this approach reasonable? In what situation will it not work?

Question 14.15

A segment of cloned DNA containing a protein-encoding gene is radioactively labeled and used in an in situ hybridization to chromosomes. Radioactivity was observed over five regions on different chromosomes. How is this result possible?

Question 14.16

In an in situ hybridization experiment, a certain clone bound to only the X chromosome in a boy with no disease symptoms. However, in a boy with Duchenne muscular dystrophy (an X-linked recessive disease), it bound to the X chromosome and to an autosome. Explain. Could this clone be useful in isolating the gene for Duchenne muscular dystrophy?

Question 14.17

In a genomic analysis looking for a specific disease gene, one candidate gene was found to have a single-base-pair substitution resulting in a nonsynonymous amino acid change. What would you have to check before concluding that you had identified the disease-causing gene?

Question 14.18

Is a bacterial operator a binding site?

Question 14.19

A certain cDNA of size 2 kb hybridized to eight genomic fragments of total size 30 kb and contained two short ESTs. The ESTs were also found in two of the genomic fragments each of size 2 kb. Sketch a possible explanation for these results.

Question 14.20

A sequenced fragment of DNA in Drosophila was used in a BLAST search. The best (closest) match was to a kinase gene from Neurospora. Does this match mean that the Drosophila sequence contains a kinase gene?

Question 14.21

In a two-hybrid test, a certain gene A gave positive results with two clones, M and N. When M was used, it gave positives with three clones, A, S, and Q. Clone N gave only one positive (with A). Develop a tentative interpretation of these results.

Question 14.22

You have the following sequence reads from a genomic clone of the Drosophila melanogaster genome:

Read 1: TGGCCGTGATGGGCAGTTCCGGTG

Read 2: TTCCGGTGCCGGAAAGA

Read 3: CTATCCGGGCGAACTTTTGGCCG

Read 4: CGTGATGGGCAGTTCCGGTG

Read 5: TTGGCCGTGATGGGCAGTT

Read 6: CGAACTTTTGGCCGTGATGGGCAGTTCC

Use these six sequence reads to create a sequence contig of this part of the D. melanogaster genome.
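
The overlap-and-merge reasoning behind contig building can be sketched in code. The following is a minimal greedy assembler, not the method used by any real assembly program: the function names (`overlap`, `greedy_assemble`), the `min_overlap` threshold, and the toy reads in the usage note are all illustrative assumptions. It assumes error-free reads that all come from the same strand.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Assumes error-free, same-strand reads (a simplification).
    Returns a list of contigs; more than one contig means a gap remains.
    """
    # Drop duplicate reads and reads wholly contained in another read.
    unique = list(dict.fromkeys(reads))
    contigs = [r for r in unique
               if not any(r != other and r in other for other in unique)]

    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: the contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

With three made-up reads, `greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"])` joins them through their 4-base overlaps into a single contig. The same function can be applied to the six reads above to check a hand-built answer.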

Question 14.23

Sometimes, cDNAs turn out to be “chimeras”; that is, fusions of DNA copies of two different mRNAs accidentally inserted adjacent to each other in the same clone. You suspect that a cDNA clone from the nematode Caenorhabditis elegans is such a chimera because the sequence of the cDNA insert predicts a protein with two structural domains not normally observed in the same protein. How would you use the availability of the entire genomic sequence to assess whether this cDNA clone is a chimera?

Question 14.24

In browsing through the human genome sequence, you identify a gene that has an apparently long coding region, but there is a two-base-pair deletion that disrupts the reading frame.

  1. How would you determine whether the deletion was correct or an error in the sequencing?

  2. You find that the exact same deletion exists in the chimpanzee homolog of the gene but that the gorilla gene reading frame is intact. Given the phylogeny of great apes below, what can you conclude about when in ape evolution the mutation occurred?

Question 14.25

In browsing through the chimpanzee genome, you find that it has three homologs of a particular gene, whereas humans have only two.

  1. What are two alternative explanations for this observation?

  2. How could you distinguish between these two possibilities?

Question 14.26

The platypus is one of the few venomous mammals. The male platypus has a spur on the hind foot through which it can deliver a mixture of venom proteins. Looking at the phylogeny in Figure 14-15, how would you go about determining whether these venom proteins are unique to the platypus?

Question 14.27

You have sequenced the genome of the bacterium Salmonella typhimurium, and you are using BLAST analysis to identify similarities between proteins encoded by the S. typhimurium genome and known proteins. You find a protein that is 100 percent identical to a protein in the bacterium Escherichia coli. When you compare the nucleotide sequences of the S. typhimurium and E. coli genes, you find that they are only 87 percent identical.

  1. Explain this observation.

  2. What do these observations tell you about the merits of nucleotide- versus protein-similarity searches in identifying related genes?
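
Percent identity at the two levels can be computed side by side. The sketch below assumes the standard genetic code and two pre-aligned, equal-length sequences. The function names and the two 9-base toy sequences in the usage note are invented for illustration and are not taken from either bacterial genome.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon, starting at base 1."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def percent_identity(a, b):
    """Percentage of matching positions in two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

For the hypothetical pair `"CTGAAAGGT"` and `"TTAAAGGGC"`, `percent_identity` can be called once on the DNA strings and once on their `translate` outputs to compare the two values.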

Question 14.28

To inactivate a gene by RNAi, what information do you need? Do you need the map position of the target gene?

Question 14.29

What is the purpose of generating a phenocopy?

Question 14.30

What is the difference between forward and reverse genetics?

Question 14.31

Why might exome sequencing fail to identify a disease-causing mutation in an affected person?

CHALLENGING PROBLEMS

Question 14.32

You have the following sequence reads from a genomic clone of the Homo sapiens genome:

Read 1: ATGCGATCTGTGAGCCGAGTCTTTA

Read 2: AACAAAAATGTTGTTATTTTTATTTCAGATG

Read 3: TTCAGATGCGATCTGTGAGCCGAG

Read 4: TGTCTGCCATTCTTAAAAACAAAAATGT

Read 5: TGTTATTTTTATTTCAGATGCGA

Read 6: AACAAAAATGTTGTTATT

  1. Use these six sequence reads to create a sequence contig of this part of the H. sapiens genome.

  2. Translate the sequence contig in all possible reading frames.

  3. Go to the BLAST page of the National Center for Biotechnology Information (NCBI), and see if you can identify the gene of which this sequence is a part by using each of the reading frames as a query for a protein–protein comparison (BLASTp).
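
Translating a contig in all possible reading frames is mechanical enough to sketch in code. The following is a minimal example that assumes the standard genetic code and an unambiguous A/C/G/T sequence. The function names and the frame labels (`+1` to `+3` for the given strand, `-1` to `-3` for its reverse complement) are illustrative choices, not a fixed convention of any particular tool.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frame_translation(seq):
    """Translate a sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Calling `six_frame_translation` on an assembled contig returns a dictionary of six protein strings. Segments between `*` characters are candidate open reading frames that could each be pasted into a BLASTp query.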

Question 14.33

Some sizable regions of different chromosomes of the human genome are more than 99 percent nucleotide identical with one another. These regions were overlooked in the production of the draft genome sequence of the human genome because of their high level of similarity. Of the techniques discussed in this chapter, which would allow genome researchers to identify the existence of such duplicate regions?

Question 14.34

Some exons in the human genome are quite small (less than 75 bp long). Identification of such “microexons” is difficult because these distances are too short to reliably use ORF identification or codon bias to determine if small genomic sequences are truly part of an mRNA and a polypeptide. What techniques of “gene finding” can be used to try to assess if a given region of 75 bp constitutes an exon?

Question 14.35

You are studying proteins having roles in translation in the mouse. By BLAST analysis of the predicted proteins of the mouse genome, you identify a set of mouse genes that encode proteins with sequences similar to those of known eukaryotic translation-initiation factors. You are interested in determining the phenotypes associated with loss-of-function mutations of these genes.

  1. Would you use forward- or reverse-genetics approaches to identify these mutations?

  2. Briefly outline two different approaches that you might use to look for loss-of-function phenotypes in one of these genes.

Question 14.36

The entire genome of the yeast Saccharomyces cerevisiae has been sequenced. This sequencing has led to the identification of all the open reading frames (ORFs, gene-size sequences with appropriate translational initiation and termination signals) in the genome. Some of these ORFs are previously known genes with established functions; however, the remainder are unassigned reading frames (URFs). To deduce the possible functions of the URFs, they are being systematically, one at a time, converted into null alleles by in vitro knockout techniques. The results are as follows:

15 percent are lethal when knocked out.

25 percent show some mutant phenotype (altered morphology, altered nutrition, and so forth).

60 percent show no detectable mutant phenotype at all and resemble wild type.

Explain the possible molecular-genetic basis of these three mutant categories, inventing examples where possible.

Question 14.37

Different strains of E. coli are responsible for enterohemorrhagic and urinary tract infections. Based on the differences between the benign K-12 strain and the enterohemorrhagic O157:H7 strain, would you predict that there are obvious genomic differences

  1. between K-12 and uropathogenic strains?

  2. between O157:H7 and uropathogenic strains?

  3. What might explain the observed pair-by-pair differences in genome content?

  4. How might the function of strain-specific genes be tested?