Before scientists interpret and present data, they often need to process the raw data in some way. For quantitative data, this may involve statistical calculations, such as determining the mean, median, mode, and standard deviation (see Primer Probability and Statistics: Predicting and Analyzing Data). For DNA sequence data, processing may involve annotating sequences, that is, indicating which sequences encode proteins and which sequences are regulatory (see Chapter 13 Genomes).
Sometimes, processing data entails transforming numbers into a data table. To do this, you start by categorizing the data. For example, if you are interested in what mammal species are present in a patch of forest, you can record the mammals you see as you walk through the forest. Imagine that over a 24-hour period in a forest patch, you spot 108 mammals in total, including 43 sightings of species A, 47 of B, 3 of C, 5 of D, 7 of E, and 3 of F. These numbers are your raw data.
The first step in processing these data is to put the data in table form. In this case, you can generate a table in which you specify the species in the first row and the number of sightings of each species in the second row. First, you enter the species A-F in the first row, as shown in this data table.
Species | A | B | C | D | E | F |
---|---|---|---|---|---|---|
Number of sightings | | | | | | |
Then, you enter the number of each species you counted in the second row, with the number of sightings for each species directly below that species, as shown in this data table.
Species | A | B | C | D | E | F |
---|---|---|---|---|---|---|
Number of sightings | 43 | 47 | 3 | 5 | 7 | 3 |
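The tabulation step can also be sketched in a few lines of Python. This is only an illustration: the counts are the ones given in the example, while the dictionary name `sightings` and the helper `format_table` are hypothetical names chosen here, not part of any standard method.

```python
# Sighting counts taken from the forest example in the text.
sightings = {"A": 43, "B": 47, "C": 3, "D": 5, "E": 7, "F": 3}


def format_table(row_label, counts):
    """Return a two-row text table: species across the top, counts beneath.

    `format_table` is a hypothetical helper written for this sketch.
    """
    header = " | ".join(["Species"] + list(counts))
    values = " | ".join([row_label] + [str(n) for n in counts.values()])
    return header + "\n" + values


# Summing the row is a quick check that no sighting was lost in transcription.
total_sightings = sum(sightings.values())

print(format_table("Number of sightings", sightings))
print("Total sightings:", total_sightings)
```

Checking that the row total matches the 108 sightings recorded in the field is a simple way to catch transcription errors before any further analysis.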
This experiment illustrates one potential pitfall of data collection. How valid are these data? You saw Species A 43 times, but perhaps every sighting was of the same individual, whereas the three sightings of Species F may have been of three different individuals. A count of sightings is therefore not necessarily a count of individuals, so you have to be very careful when you design a data collection protocol. To avoid this issue, you can redo the experiment, this time trapping each individual and marking it in some way after it is counted, so that no individual is counted twice.
The revised method for counting mammals, based on trapping and marking rather than sighting, results in this data table.
Species | A | B | C | D | E | F |
---|---|---|---|---|---|---|
Number trapped | 17 | 29 | 5 | 2 | 5 | 3 |
Note that for species A, only 17 individuals were trapped compared with 43 sightings, which suggests that the initial sightings likely included instances of the same individual being seen more than once.
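One way to make this comparison for every species at once is to divide the number of sightings by the number of individuals trapped. The Python sketch below does this with the two data tables above. The function name `sightings_per_individual` is a hypothetical choice, and the calculation assumes that the trapping count is a reasonable estimate of the number of individuals present, which the example does not guarantee.

```python
# Counts taken from the two data tables in the text.
sightings = {"A": 43, "B": 47, "C": 3, "D": 5, "E": 7, "F": 3}
trapped = {"A": 17, "B": 29, "C": 5, "D": 2, "E": 5, "F": 3}


def sightings_per_individual(sightings, trapped):
    """Return, for each species, sightings divided by individuals trapped.

    Hypothetical helper for this sketch; assumes every species was
    trapped at least once, so there is no division by zero.
    """
    return {species: sightings[species] / trapped[species] for species in sightings}


ratios = sightings_per_individual(sightings, trapped)

# A ratio above 1 is consistent with some individuals being seen repeatedly.
likely_recounted = [species for species, ratio in ratios.items() if ratio > 1]

for species, ratio in ratios.items():
    print(f"Species {species}: {ratio:.2f} sightings per trapped individual")
print("More sightings than trapped individuals:", likely_recounted)
```

A ratio above 1 does not prove that repeat sightings occurred, and a ratio below 1, as for species C, may simply mean that some trapped individuals were never seen during the walk.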