Data and Data Visualization: Interpreting and Presenting Results | Study Session 5: Line Graphs

An important goal of data presentation is to show trends that may not be apparent when looking at raw data or even processed data in a data table. Many data are confusing when stated in sentences but become clear when graphed. Consider a patient who comes to the emergency room with a fever. The physician orders a white blood cell (WBC) count, which is a measure of the concentration of a type of immune cell in the blood (see Chapter 41 Animal Immune Systems). An elevated white blood cell count can be a sign of an infection.

The physician reviews the patient’s health record. In 2012, the patient had a white blood cell count of 5,400 cells/µL. In 2013, it was 5,000 cells/µL. In 2014, it was 4,700 cells/µL. In 2015, it was 5,800 cells/µL. In 2016, it was 9,400 cells/µL. In 2017, it was 5,100 cells/µL. And in 2018, it was 6,100 cells/µL. When the data are presented as a long list of sentences like this, it is very difficult to see any trends. But if you look at a graph like the one in Figure 6, the data are easier to understand. In this case, you can plot the time (year) on the horizontal x-axis and the white blood cell count (cells/µL) along the vertical y-axis:

Figure 6 White blood cell count from 2009 to 2019.

When visualized in this way, it is much easier to see that the patient’s white blood cell count was elevated in 2016 compared to the other years. This is an example of a line graph, a data presentation chart that displays information as a series of data points connected by straight line segments.
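To make this concrete, the following is a minimal Python sketch of how the white blood cell data could be organized for a line graph. The values come from the health record above. The variable names, the plot_wbc helper, and the output file name are hypothetical, and the plotting step assumes the matplotlib library is installed. It is one possible way to draw such a graph, not a description of how Figure 6 was produced.

```python
# White blood cell counts (cells/µL) by year, from the health record above.
wbc_by_year = {
    2012: 5400,
    2013: 5000,
    2014: 4700,
    2015: 5800,
    2016: 9400,
    2017: 5100,
    2018: 6100,
}

years = sorted(wbc_by_year)                 # values for the horizontal x-axis
counts = [wbc_by_year[y] for y in years]    # values for the vertical y-axis

# The year with the highest count is the peak that stands out on the graph.
peak_year = max(wbc_by_year, key=wbc_by_year.get)
print(peak_year, wbc_by_year[peak_year])    # prints: 2016 9400


def plot_wbc(path="wbc_line_graph.png"):
    """Draw the line graph and save it to a file (hypothetical file name).

    Assumes matplotlib is installed; it is imported here so the rest of
    the example runs without it.
    """
    import matplotlib
    matplotlib.use("Agg")                   # render to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(years, counts, marker="o")      # points joined by straight segments
    ax.set_xlabel("Year")
    ax.set_ylabel("White blood cell count (cells/µL)")
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_wbc() writes the graph to an image file. The peak in 2016 that the code reports is the same feature that is easy to spot by eye on the graph.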

Line graphs are useful when plotting two variables relative to each other. In this example, you are looking at white blood cell count (one variable) over time (the second variable). You will see many line graphs in your textbook (and elsewhere). For example, in Chapter 46 Ecosystem Ecology, you will see the Keeling curve, which shows the level of carbon dioxide in the atmosphere (y-axis) plotted over time (x-axis). Researchers could of course simply collect carbon dioxide measurements and write them down, but it is easier to see trends in the data in a line graph showing these measurements over time.

Figure 46.1 The Keeling curve.

In both Figure 6 and Figure 46.1, there are two variables. One variable is the independent variable. The independent variable isn’t affected by the other variable. In both of these line graphs, time is the independent variable because it elapses on its own, independent of the white blood cell count or the level of carbon dioxide in the atmosphere. The independent variable is usually plotted on the x-axis (the horizontal axis along the bottom of the graph).

The second variable is the dependent variable, which varies and depends on the independent variable. In both of these cases, the white blood cell count and the level of carbon dioxide change with (depend on) time, so the white blood cell count and carbon dioxide level are our dependent variables. The dependent variable is usually plotted on the y-axis (the vertical axis on the left side of the graph).

Consider your mammal trapping study from the earlier Study Sessions in this primer. If you record the time of day that each mammal is trapped, you can plot the total number of mammals trapped over a 24-hour period.

Time interval    Number trapped    Cumulative number
12am-2am         8                 8
2am-4am          3                 11
4am-6am          2                 13
6am-8am          0                 13
8am-10am         0                 13
10am-12pm        0                 13
12pm-2pm         0                 13
2pm-4pm          0                 13
4pm-6pm          1                 14
6pm-8pm          22                36
8pm-10pm         17                53
10pm-12am        8                 61
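The cumulative row is a running total of the number trapped in each interval. The short Python sketch below shows one way to compute it from the per-interval counts in the table. The variable names are illustrative choices, and each two-hour interval is identified by its starting hour on a 24-hour clock (0 for midnight, 18 for 6pm).

```python
from itertools import accumulate

# Starting hour of each two-hour interval: 0, 2, 4, ..., 22.
start_hours = list(range(0, 24, 2))

# Number of mammals trapped in each interval, from the table above.
trapped = [8, 3, 2, 0, 0, 0, 0, 0, 1, 22, 17, 8]

# Running total: each entry is the sum of all counts up to that interval.
cumulative = list(accumulate(trapped))
print(cumulative)
# prints: [8, 11, 13, 13, 13, 13, 13, 13, 14, 36, 53, 61]

# The interval with the most captures, identified by its starting hour.
busiest_start = start_hours[trapped.index(max(trapped))]
print(busiest_start)  # prints: 18
```

The final cumulative value equals the total number of mammals trapped over the 24-hour period. A flat stretch in the cumulative values corresponds to intervals in which no animals were trapped.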

You can take this dataset and plot it on a line graph, with time (the independent variable) on the x-axis and the cumulative number of animals trapped (the dependent variable) on the y-axis. Figure 7 is the line graph of the data.

Figure 7 Line graph of mammal trapping data.

Usually, the x-axis and y-axis each show just one type of data. For example, time is shown on the x-axis and cumulative number of animals trapped is shown on the y-axis in Figure 7. However, sometimes one or both axes of a line graph can show two different types of data that are correlated with each other. A correlation is an association between two variables. For example, Chapter 1 Life: Chemical, Cellular, and Evolutionary Foundations includes the following line graph, which addresses the question of what caused the extinction of the dinosaurs:

Figure 1.3 What caused the extinction of the dinosaurs?

In the line graph in Figure 1.3, the amount of iridium is plotted on the x-axis (on the bottom), and both depth (on the left) and time periods (on the right) are plotted on the y-axis. In this case, there can be two variables plotted on the y-axis because time is recorded in layers of rock, and therefore the two variables (depth and time periods) are correlated with one another. That is, as time passes, sediment is deposited in layers, so deeper layers are older. The line graph shows a dramatic spike in the level of iridium at a depth of 280 m, which is right at the boundary between the Cretaceous Period and Paleogene Period. This spike provided one piece of evidence that helped scientists determine that a massive asteroid struck the Earth at this time, causing environmental havoc and eventually leading to the extinction of the dinosaurs.

A line graph from Chapter 20 Evolution: How Genotypes and Phenotypes Change over Time is another example of two variables along one axis.

Figure 20.5 Hardy–Weinberg relation.

The line graph in Figure 20.5 is a visual representation of the Hardy–Weinberg equilibrium, in which you can see how allele frequencies relate to genotype frequencies. In this example, the alleles are A and a, and their frequencies are denoted by p and q, respectively. Both of these variables are plotted on the horizontal x-axis (along the bottom of the graph). This works because p and q are related to each other in such a way that p + q = 1. For example, if p = 0.8, then q = 0.2, so the x-axis can show both at the same time. With two alleles, there are three possible genotypes: AA, Aa, and aa. Genotype frequency is plotted on the vertical y-axis (along the left side of the graph). The three curves then denote the frequencies of the three genotypes at every allele frequency. The frequency of AA is shown in blue; the frequency of Aa in purple; and the frequency of aa in red.
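The three curves can be reproduced numerically. The sketch below assumes the standard Hardy–Weinberg expressions for the genotype frequencies, which are not written out in this Study Session: AA = p², Aa = 2pq, and aa = q², with q = 1 − p. The function name genotype_frequencies is a hypothetical choice for illustration.

```python
def genotype_frequencies(p):
    """Return the (AA, Aa, aa) frequencies for a given frequency p of allele A.

    Assumes the standard Hardy-Weinberg expressions: p^2, 2pq, and q^2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p                      # because p + q = 1
    return p * p, 2.0 * p * q, q * q


# Step along the x-axis: each value of p fixes q, so one axis shows both.
for tenths in range(0, 11, 2):
    p = tenths / 10
    freq_AA, freq_Aa, freq_aa = genotype_frequencies(p)
    print(f"p={p:.1f} q={1 - p:.1f} "
          f"AA={freq_AA:.2f} Aa={freq_Aa:.2f} aa={freq_aa:.2f}")
```

For the example in the text, p = 0.8 and q = 0.2 give genotype frequencies of about 0.64 for AA, 0.32 for Aa, and 0.04 for aa. At every value of p, the three frequencies add up to 1.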