Chapter 1. Working With Data 21.4

Working with Data: HOW DO WE KNOW? Fig. 21.4

Fig. 21.4 describes the process by which researchers commonly study allele frequencies in a population. Answer the questions after the figure to practice interpreting data and understanding experimental design. Some of these questions refer to concepts that are explained in the following three brief data analysis primers from a set of four available on LaunchPad:

  • Experimental Design
  • Data and Data Presentation
  • Statistics

You can find these primers by clicking on the button labeled “Resources” in the menu at the upper right on your main LaunchPad page. Within the following questions, click on “Primer Section” to read the relevant section from these primers. Click on the button labeled “Key Terms” to see pop-up definitions of boldfaced terms.

HOW DO WE KNOW?

FIG. 21.4: How is genetic variation measured?

BACKGROUND The introduction of protein gel electrophoresis in 1966 gave researchers the opportunity to identify differences in amino acid sequence in proteins both among individuals and, in the case of heterozygotes, within individuals. Proteins with different amino acid sequences run at different rates through a gel in an electric field. Often, a single amino acid difference is enough to affect the mobility of a protein in a gel.

METHOD Starting with crude tissue—the whole body of a fruit fly, or a blood sample from a human—we load the material on a gel, and turn on the current. The rate at which a protein migrates depends on its size and charge, both of which may be affected by its amino acid sequence. To visualize the protein at the end of the gel run, we use a biochemical indicator that produces a stain when the protein of interest is active. The result is a series of bands on the gel.

RESULTS The genotypes of eight individuals for a gene with two alleles are analyzed. Four are allele 1 homozygotes; two are allele 2 homozygotes; and two are heterozygotes. Note that the heterozygotes do not stain as strongly on the gel because each band has half the intensity of the single band in the homozygote. We can measure the allele frequencies simply by counting the alleles. Each homozygote has two of the same allele, and each heterozygote has one of each.

Total number of alleles in the population = 8 × 2 = 16

Number of allele 1 in the population = 2 × (number of allele 1 homozygotes) + (number of heterozygotes) = 8 + 2 = 10

Frequency of allele 1 = \(\frac{10}{16} = \frac{5}{8}\)

Number of allele 2 in the population = 2 × (number of allele 2 homozygotes) + (number of heterozygotes) = 4 + 2 = 6

Frequency of allele 2 = \(\frac{6}{16} = \frac{3}{8}\)

Note that the two allele frequencies add to 1.
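The counting rule used above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the original study; the function name `allele_frequencies` and its parameter names are hypothetical.

```python
from fractions import Fraction

def allele_frequencies(n_hom1, n_het, n_hom2):
    """Return (frequency of allele 1, frequency of allele 2) from genotype counts."""
    # Each diploid individual carries two alleles.
    total_alleles = 2 * (n_hom1 + n_het + n_hom2)
    if total_alleles == 0:
        raise ValueError("need at least one individual")
    # Homozygotes contribute two copies of their allele; heterozygotes one of each.
    count1 = 2 * n_hom1 + n_het
    count2 = 2 * n_hom2 + n_het
    return Fraction(count1, total_alleles), Fraction(count2, total_alleles)

# Genotype counts from the RESULTS above: 4, 2, and 2 individuals.
p, q = allele_frequencies(n_hom1=4, n_het=2, n_hom2=2)
print(p, q)   # 5/8 3/8
print(p + q)  # 1
```

Using `Fraction` keeps the results exact, so the check that the two frequencies add to 1 does not depend on floating-point rounding.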

CONCLUSION We now have a profile of genetic variation at this gene for these individuals. Population genetics involves comparing data such as these with data collected from other populations to determine the forces shaping patterns of genetic variation.

FOLLOW-UP WORK This technique is seldom used these days because DNA sequencing now makes it easy to recover much more detailed information about genetic variation.

SOURCE Lewontin, R. C., and J. L. Hubby. 1966. “A Molecular Approach to the Study of Genic Heterozygosity in Natural Populations. II. Amount of Variation and Degree of Heterozygosity in Natural Populations of Drosophila pseudoobscura.” Genetics 54:595–609.

Question

In a new experiment using the same eight fruit flies from a single population that were used in the study in Fig. 21.4, we study protein variation at another locus, Locus II. Like the first locus, Locus II has two common alleles, which can be separated by gel electrophoresis, with the slower allele designated “S” and the faster one “F”. Here is the gel:


Question

In another study of Locus II in a new population, you find the following genotype data:

20 SS
10 FS
10 FF



sample The set of objects or events chosen from a larger population from which observations are taken. Typically, sampling is done randomly to ensure that the sample is representative of the total population of objects or events.

Data and Data Presentation

Collecting Data

We can collect data as part of an experiment. For example, we vary the nutrients added to cells in culture to see which are critical for cell proliferation, and count the number of cells after a specified period in each of the different experimental treatments. Or data collection may be exploratory. For example, if we are interested in what mammal species are present in a remote patch of forest, we can simply record what we see as we walk through the forest. The tools needed for data collection vary correspondingly, ranging from expensive, sophisticated scientific hardware to a notebook and a pencil.

Nowadays computers often collect data automatically, meaning that it is possible to accumulate vast quantities of data. With our ability to sequence DNA cheaply and efficiently, genomic data—long strings of A’s, G’s, C’s, and T’s—is an example of the current explosion of mega-datasets. Satellite imagery, as well, supplies a vast reservoir of data about our planet.

Almost always, data represent a sample. We assume that the cells in our experiment are representative of the appropriate class of cells in general, and we assume that the animals we saw in the forest patch are representative of all the animals present in the forest. With this in mind, we have to be careful in designing our method of collecting data. Imagine, for example, that in determining what mammal species live in our patch of forest, we only visited the forest during daylight hours. Any claim to have assessed the forest for all of its inhabitants then is inaccurate because we have overlooked nocturnal species.

Data sometimes need to be weeded. A freak result in one experiment, for example, might have been caused by contamination and should be removed from the analysis because the result is produced by factors unrelated to what we are investigating. Imagine, for example, that we encounter a domestic cat belonging to a local resident in our forest mammal inventory. Given that we are interested in the native mammals, we should exclude this intruder from our data. Data weeding, however, is a tricky area. It is important only to exclude data that are clearly problematic rather than simply eliminating the data that seem to contradict our hypothesis!

Question

We add four new populations to our analysis shown in Fig. 21.4. In each case, we analyze eight individuals. All five populations are found along the east coast of the United States, along a north–south latitudinal gradient. We find the following allele frequency data:

Sample                                    Original   2      3      4      5
Frequency of allele 1                     0.625      0.75   0.75   0.875  1.0
Latitude (degrees north of the equator)   23         26     29     31     36
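One way to summarize how allele frequency changes with latitude in this table is a Pearson correlation coefficient. The sketch below is an illustration using the five data points listed above; the helper name `pearson_r` is hypothetical, and the calculation ignores the sampling uncertainty that comes from scoring only eight individuals per population.

```python
from math import sqrt

# Values copied from the table above.
latitude = [23, 26, 29, 31, 36]
freq_allele_1 = [0.625, 0.75, 0.75, 0.875, 1.0]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(latitude, freq_allele_1)
print(round(r, 3))  # 0.975
```

A value close to 1 indicates that the frequency of allele 1 rises almost linearly with latitude across these five samples.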

Question

In a study of another locus, also with two alleles designated Allele 1 and Allele 2, over several degrees of latitude in North America, the results are presented using pie charts, below.


pie chart A method of presenting data by dividing a circle into “slices,” each representing the proportion of the total contributed by a particular category.

Data and Data Presentation

Graphing Data

Now we can be confident that our numbers are reliable. The next challenge is to present the data. Typically we do this with a graph. Different kinds of data lend themselves to different kinds of graphs. Our mammal species data are discrete: we have clear categories (A, B, C, D, E, and F). For discrete data, either a pie chart or a bar graph would be appropriate. A pie chart divides a circle into “slices,” each representing the proportion of the total contributed by a particular category. In our trapping study, we have a total of 61 animals, 17 of which belong to species A, so the slice representing species A will make an angle at the center of the pie of 17/61 × 360° ≈ 100°. A bar graph represents the frequency of each species as a column whose height is proportional to frequency.
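The slice-angle arithmetic can be written as a small Python function. This is a minimal sketch; the name `slice_angle` is hypothetical, and only the count for species A (17 of 61) comes from the text.

```python
def slice_angle(count, total):
    """Angle in degrees of the pie slice for a category with `count` of `total` items."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total * 360.0

angle_a = slice_angle(17, 61)
print(round(angle_a))  # 100
```

Because every slice is the same fraction of 360° as its category is of the total, the angles for all categories always add up to a full circle.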

Fig. 1

What about continuous data? Imagine that the data we collected are the body lengths of the mammals we trapped. In this case, we might choose a histogram, which looks similar to a bar graph, except that here we have to impose our own categories on a continuum of data. Because the species are discrete categories, the columns in the bar graph may have gaps between them. In the histogram, by contrast, there are no gaps between the columns because the end of one range (1–20 cm) is continuous with the beginning of the next (20–40 cm).
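Imposing categories on continuous data amounts to assigning each measurement to a bin. The sketch below shows one way to do this in Python. The body lengths are hypothetical values invented for illustration, and the choice of half-open bins (a value of exactly 20 cm falls in the 20–40 cm bin) is an assumption, since the text does not say how boundary values are handled.

```python
from collections import Counter

def bin_lengths(lengths_cm, bin_width=20):
    """Count measurements in half-open bins [0, 20), [20, 40), and so on."""
    counts = Counter()
    for length in lengths_cm:
        lower = int(length // bin_width) * bin_width
        counts[(lower, lower + bin_width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical body lengths in cm (not data from the text).
lengths = [12.5, 18.0, 20.0, 33.1, 39.9, 41.0]
print(bin_lengths(lengths))
# {(0, 20): 2, (20, 40): 3, (40, 60): 1}
```

The bin counts are the column heights of the histogram. Note that this sketch lists only bins containing at least one measurement, so an empty range would need to be added back with a count of zero before plotting.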

Fig. 2

Often we are plotting two variables against each other. If, for example, we record the time of day that each mammal is trapped, we can plot the total number of mammals trapped over the course of the 24-hour period.

Time of day          Number trapped   Cumulative number
Midnight–2 am        8                8
2 am–4 am            3                11
4 am–6 am            2                13
6 am–8 am            0                13
8 am–10 am           0                13
10 am–noon           0                13
Noon–2 pm            0                13
2 pm–4 pm            0                13
4 pm–6 pm            1                14
6 pm–8 pm            22               36
8 pm–10 pm           17               53
10 pm–midnight       8                61
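The cumulative row of the trapping table is a running total of the number trapped in each two-hour interval. A short Python sketch of that calculation, using the counts listed above:

```python
from itertools import accumulate

# Number trapped in each two-hour interval, starting at midnight.
number_trapped = [8, 3, 2, 0, 0, 0, 0, 0, 1, 22, 17, 8]

# Running total: each entry is the sum of all counts up to that interval.
cumulative = list(accumulate(number_trapped))
print(cumulative)
# [8, 11, 13, 13, 13, 13, 13, 13, 14, 36, 53, 61]
```

The final cumulative value, 61, matches the total number of animals in the trapping study.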

Often one variable is independent—time, for example, will elapse regardless of the mammal count. We plot this on the x-axis, the horizontal axis of the graph. The dependent variable—the values that vary as a function of the independent variable (in this case, time of day)—is plotted on the y-axis, the vertical axis of the graph. If there is reason to believe that consecutive measurements are related to each other, points can be connected to each other by a line. Plotting our data on a graph using the values of the independent and dependent variables as coordinates gives us a line graph. This is a good way to identify trends and patterns in data. Here we can see that the mammals in our forest plot tend to be inactive (and therefore unlikely to be trapped) during daylight hours.

Fig. 3

In science, data are typically presented as a scatterplot, in which points are specified by their (x,y) coordinates. Points are not joined to each other by lines unless there are specified connections among them. Here, plotted in a way similar to the line graph (with the independent variable on the x-axis) is a scatterplot showing the time taken to drive from home to campus for a large number of students. The independent variable is the distance traveled; the dependent variable is travel time because the distances are fixed but travel times vary. Overall, there is a positive correlation between travel time and distance (the further you live from campus, the longer, on average, it will take you to get there), but there is plenty of variation as well. Look at the eight points representing the eight students who live five miles from campus. The variation we see in travel time (from 6 minutes to 30 minutes) is a reflection of differences in driving speed, traffic conditions, and route.

Fig. 4

What if there are more than two variables? Three-dimensional plots can be informative (but can also cause the reader headaches). A popular modern solution to this problem is a so-called temperature plot, in which the third dimension is represented in two dimensions through color: red (hot) for a strong effect in the third dimension and blue (cool) for a weak effect.

Graphs are the mainstay of scientific presentation, but you will see many other ways of presenting data in your textbook. For example, studies showing how different genes interact with each other in the course of development are often illustrated using network diagrams that give the reader a direct sense of the “connectedness” of a particular gene (or node). Evolutionary trees reveal the branching pattern of evolution with species that are closely related having a more recent common ancestor than those that are more distantly related.

Methods of presenting data in science are not limited, even in textbooks, by standard approaches. The popular press has developed many graphics-intense ways of presenting data. Think of an electoral map after an election. You can view information on a number of levels: whether the state is red or blue, the name of the election winner, the size of his or her majority, and so on. Scientists are learning that they too can package information in ways that are simultaneously informative and attractive.


hypothesis A tentative explanation for one or more observations that makes predictions that can be tested by experiments or additional observations.

Experimental Design

Types of Hypotheses

A hypothesis, as we saw in Chapter 1, is a tentative answer to the question, an expectation of what the results might be. This might at first seem counterintuitive. Science, after all, is supposed to be unbiased, so why should you expect any particular result at all? The answer is that it helps to organize the experimental setup and interpretation of the data.

Let’s consider a simple example. We design a new medicine and hypothesize that it can be used to treat headaches. This hypothesis is not just a hunch—it is based on previous observations or experiments. For example, we might observe that the chemical structure of the medicine is similar to other drugs that we already know are used to treat headaches. If we went into the experiment with no expectation at all, it would be unclear what to measure.

A hypothesis is considered tentative because we don’t know what the answer is. The answer has to wait until we conduct the experiment and look at the data. When a hypothesis predicts a specific effect, as in the case of the new medicine, it is typical to also state a null hypothesis, which predicts no effect. Hypotheses are never proven, but it is possible based on statistical analysis to reject a hypothesis. When a null hypothesis is rejected, the original hypothesis (the one that predicts an effect) gains support.

Sometimes, we formulate several alternative hypotheses to answer a single question. This may be the case when researchers consider different explanations of their data. Let’s say for example that we discover a protein that represses the expression of a gene. Our question might be: How does the protein repress the expression of the gene? In this case, we might come up with several models—the protein might block transcription, it might block translation, or it might interfere with the function of the protein product of the gene. Each of these models is an alternative hypothesis, one or more of which might be correct.


statistical test A test used to distinguish accidental or weak relations from real and strong ones.
P-value The likelihood that an observed result (or a result more extreme than that observed) could have been observed merely by chance. If P ≤ 0.05, the observed results are conventionally regarded as unlikely to be attributed to chance alone.

Statistics

Statistical Significance

Biologists observe many relations between variables that are either due to chance in the sample that happened to be chosen or that are too weak to be biologically important. To distinguish the accidental or weak relations from the real and strong ones, a statistical test of the relation is carried out. A statistical test must be based on some specific hypothesis. For example, to determine whether an observed correlation coefficient could be due to chance, we might carry out a statistical test of the hypothesis that the true correlation coefficient is 0. A statistical test usually yields a single number, usually called the P-value (or sometimes p-value), that expresses the likelihood that an observed result (such as a correlation coefficient) could have been observed merely by chance. A P-value is a probability, and if P ≤ 0.05 the observed results are conventionally regarded as unlikely to be attributed to chance alone. In that case, the observed relation is likely to be genuine. In other words, if an observed relation would be obtained by chance alone in only 1 in 20 or fewer experiments (P ≤ 0.05), then the observed relation is regarded as likely to be true. A finding of P ≤ 0.01 is taken as even stronger evidence that the observed result is unlikely to be due to chance.
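To make the idea of "could have been observed merely by chance" concrete, the sketch below runs an exact permutation test on the latitude and allele-frequency values from the five-population table earlier in this exercise. This is one possible test, chosen here for illustration; the primer does not say which test it has in mind. The function names are hypothetical, the test is one-sided (it asks only about a positive relation), and it treats each sample frequency as exact even though each is based on only eight individuals. The sum of products is used as the test statistic because, when only the order of the values is shuffled, it ranks the orderings in the same way as the correlation coefficient.

```python
from itertools import permutations

# Values from the five-population table earlier in this exercise.
latitude = [23, 26, 29, 31, 36]
freq_allele_1 = [0.625, 0.75, 0.75, 0.875, 1.0]

def sum_of_products(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def exact_permutation_p(xs, ys):
    """One-sided P-value: the fraction of all orderings of ys whose positive
    association with xs is at least as strong as the observed one."""
    observed = sum_of_products(xs, ys)
    tolerance = 1e-9  # guards against floating-point round-off in comparisons
    n_total = 0
    n_extreme = 0
    for shuffled in permutations(ys):
        n_total += 1
        if sum_of_products(xs, shuffled) >= observed - tolerance:
            n_extreme += 1
    return n_extreme / n_total

p_value = exact_permutation_p(latitude, freq_allele_1)
print(round(p_value, 4))  # 0.0167
```

Only 2 of the 120 possible orderings match the observed pattern, so under this test P is about 0.017, which is below the conventional 0.05 threshold.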

Figure 6

Statistical testing is necessary because different researchers may disagree on whether a finding supports a particular hypothesis, and because the interpretation of a result can be affected by wishful thinking. Take Figure 6, for example. If you wish to believe that there is a functional relation between x and y, you might easily convince yourself that the 20 data points fit the straight line. But in fact the P-value of the regression coefficient is about 0.25, which means that about 25% of the time you would get a line that fits the data as well as or better than the line you observed, purely by chance. The proper conclusion is that these data give no support for the hypothesis of a functional relation between x and y. If there is such a relation, then it is too weak to show up in a sample of only 20 pairs of points.

There is good reason to be cautious even when a result is statistically significant. Bear in mind that 5% of statistical tests are misleading in that they indicate that some result is significant merely as a matter of chance. For example, over any short period of time, about 5% of companies listed in stock exchanges will have changes in the dollar value of their shares that are significantly correlated with changes in the number of sunspots, even though the correlation is certainly spurious and due to chance alone. Critical thinking therefore requires that one maintain some skepticism even when faced with statistically significant results published in peer-reviewed scientific journals. Scientific proof rarely hinges on the result of a single experiment, measurement, or observation. It is the accumulation of evidence from many independent sources, all pointing in the same direction, that lends increasing credence to a scientific hypothesis until eventually it becomes a theory.


null hypothesis A hypothesis that is to be tested, often one that predicts no effect.