1 Frequency Distributions and Relative Frequency Distributions

Recall from Chapter 1 that categorical (qualitative) data take values that are non-numeric and are usually classified into categories. In this section, we learn graphical and tabular methods for handling categorical data. Let us begin with an example.

Table 1 shows the 20 most downloaded free apps for the iOS platform in June 2014, as reported by Apple.com, along with each app's type. We will analyze the variable app type, which is qualitative rather than quantitative.

Table 1 Top 20 free iOS apps, June 2014, as reported by Apple.com

Rank   App                  App type
1      Two Dots             Games
2      The Line             Games
3      Traffic Racer        Games
4      Rival Knights        Games
5      Piano Tiles          Games
6      Snap Chat            Photo and video
7      Instagram            Photo and video
8      The Test             Games
9      Republique           Games
10     YouTube              Photo and video
11     Facebook             Social networking
12     NBC Sports Live      Sports
13     Twitter              Social networking
14     FIFA Official App    Sports
15     Pandora              Music
16     Spotify              Music
17     Pinterest            Social networking
18     Emoji Keyboard 2     Social networking
19     WhatsApp             Social networking
20     SoundCloud           Music

From this data set, it is not immediately clear which app type is the most popular among the 20 apps in the sample. That is why we need ways to summarize the values in a data set. One popular summary method is the frequency distribution (or frequency table).

2.1 Graphs and Tables for Categorical Data

The frequency, or count, of a category is the number of observations in that category. A frequency distribution for a qualitative variable is a listing of all the values (that is, categories) that the variable can take, together with the frequency of each value.

EXAMPLE 1 Frequency distributions


Note: Check that the sum of the frequencies equals the sample size, n.

Create a frequency distribution for the variable app type from Table 1.

Solution

For each app type, we compute the frequency; that is, we count (or tally) how many apps were of that particular app type. Table 2 shows the frequency distribution for the variable app type. For example, five of the apps were social networking apps. The frequency distribution summarizes the data set so that quick observations can be made, such as “The most popular app type in the Apple.com top 20 list of the most downloaded free apps is the Games app type.”

Table 2 Frequency distribution of app type

App type             Tally        Frequency
Games                ||||| ||     7
Social networking    |||||        5
Music                |||          3
Photo and video      |||          3
Sports               ||           2
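The tallying step in the solution can be sketched in Python with the standard library's collections.Counter. This is a minimal sketch, assuming the app types are entered by hand from Table 1 in rank order; the variable names are illustrative only.

```python
from collections import Counter

# App type of each of the 20 apps in Table 1, listed in rank order 1-20.
app_types = [
    "Games", "Games", "Games", "Games", "Games",              # ranks 1-5
    "Photo and video", "Photo and video", "Games", "Games",   # ranks 6-9
    "Photo and video",                                        # rank 10
    "Social networking", "Sports", "Social networking",       # ranks 11-13
    "Sports", "Music", "Music", "Social networking",          # ranks 14-17
    "Social networking", "Social networking", "Music",        # ranks 18-20
]

# Tally how many apps fall into each category.
frequencies = Counter(app_types)

# Print the frequency distribution, most frequent category first.
for app_type, count in frequencies.most_common():
    print(f"{app_type:<18} {count}")
```

The counts agree with the Frequency column of Table 2. Ties (Music and Photo and video, 3 each) are printed in the order the categories first appear in the data, so the row order may differ from Table 2.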

#1

The New York City Police Department tracks the number and type of traffic violations. Table 3 contains a random sample of 12 traffic violations and the borough in which they occurred (Manhattan or Brooklyn).

1. Build a frequency distribution of Borough.

2. Construct a frequency distribution of Violation type.

Table 3 Violation type and borough of 12 traffic violations

Violation type    Borough
Cell phone        Brooklyn
Safety belt       Manhattan
Cell phone        Brooklyn
Cell phone        Manhattan
Speeding          Brooklyn
Safety belt       Manhattan
Disobey sign      Manhattan
Speeding          Brooklyn
Safety belt       Manhattan
Disobey sign      Manhattan
Disobey sign      Brooklyn
Cell phone        Manhattan

(The solutions are shown in Appendix A.)

As the data set gets larger, the need for summarization gets more and more acute. (Imagine if the Apple.com listing consisted of 1000 apps instead of 20.) Take a moment to add up the frequencies in Table 2. What do they add up to? This number is the sample size: n = 20. Now, is this just a coincidence, or does this happen every time?


Actually, this happens every time: the sum of the frequencies equals the sample size, n. One way to check if you made a mistake in forming your frequency distribution table is to add up the frequencies and see if the sum equals the sample size.
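This check is easy to automate. The sketch below uses a hypothetical helper, frequencies_match_sample_size, written for illustration; it compares the sum of the Table 2 frequencies with n = 20, and then shows that a miscounted table fails the check.

```python
def frequencies_match_sample_size(frequencies, n):
    """Return True if the category frequencies add up to the sample size n.

    Hypothetical helper for illustration, not part of any library.
    """
    return sum(frequencies.values()) == n

# Frequencies from Table 2.
table_2 = {"Games": 7, "Social networking": 5, "Music": 3,
           "Photo and video": 3, "Sports": 2}
print(frequencies_match_sample_size(table_2, 20))     # True

# A table in which Games was miscounted as 6 fails the check (sum is 19).
miscounted = dict(table_2, Games=6)
print(frequencies_match_sample_size(miscounted, 20))  # False
```

Passing the check does not prove every count is right (two offsetting errors would still sum to n), but failing it always signals a mistake.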

Relative Frequency Distributions

Next, suppose you didn’t know the size of the sample. Suppose you were told only that seven apps were games. The logical question is “Is that a lot?” If our sample size were only 10 apps, then 7 games would certainly be a lot. However, if our sample size were 1000 apps, then 7 games would not be a lot. So the number’s significance depends on what you compare the seven apps to; that is, “relative to what?” or “compared to what?” In statistics, we compare the frequency of a category with the total sample size to get the relative frequency.
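As a worked illustration, dividing the same count of seven games by each hypothetical sample size shows how differently the count reads:

$$\frac{7}{10} = 0.70 \qquad \text{versus} \qquad \frac{7}{1000} = 0.007$$

For the actual sample in Table 1, n = 20, so the games make up $\frac{7}{20} = 0.35$ of the apps.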

The relative frequency of a particular category of a qualitative variable is its frequency divided by the sample size. A relative frequency distribution for a qualitative variable is a listing of all values that the variable can take, together with the relative frequencies for each value.
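Applying this definition to Table 2 is a single division per category. A minimal Python sketch, assuming the frequencies are copied by hand from Table 2; for example, Games gets 7/20 = 0.35.

```python
# Frequencies from Table 2.
frequencies = {"Games": 7, "Social networking": 5, "Music": 3,
               "Photo and video": 3, "Sports": 2}

# The sample size n equals the sum of the frequencies (20 apps).
n = sum(frequencies.values())

# Relative frequency = frequency / n, computed for each category.
relative_frequencies = {category: count / n
                        for category, count in frequencies.items()}

# Print the relative frequency distribution to two decimal places.
for category, rel in relative_frequencies.items():
    print(f"{category:<18} {rel:.2f}")
```

Because every frequency is divided by the same n, the relative frequencies add up to 1 (apart from rounding).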