1 Frequency Distributions and Relative Frequency Distributions

Recall from Chapter 1 that categorical (qualitative) data take values that are non-numeric and are usually classified into categories. In this section, we learn graphical and tabular methods for handling categorical data. Let us begin with an example.

Table 1 shows the 20 most downloaded free apps for the iOS platform in June 2014, as reported by Apple.com, along with each app's type. We will analyze the variable app type, which is a qualitative, not quantitative, variable.

Table 1 Top 20 free iOS apps, June 2014, as reported by Apple.com

Rank	App	App type	Rank	App	App type
1	Two Dots	Games	11	Facebook	Social networking
2	The Line	Games	12	NBC Sports Live	Sports
3	Traffic Racer	Games	13	Twitter	Social networking
4	Rival Knights	Games	14	FIFA Official App	Sports
5	Piano Tiles	Games	15	Pandora	Music
6	Snapchat	Photo and video	16	Spotify	Music
7	Instagram	Photo and video	17	Pinterest	Social networking
8	The Test	Games	18	Emoji Keyboard 2	Social networking
9	Republique	Games	19	WhatsApp	Social networking
10	YouTube	Photo and video	20	SoundCloud	Music

From this data set, it is not immediately clear which app type is the most popular choice among the 20 apps in the sample. That is why we need ways to summarize the values in a data set. One popular method used to summarize the values in a data set is the frequency distribution (or frequency table).

2.1 Graphs and Tables for Categorical Data

The frequency, or count, of a category refers to the number of observations in each category. A frequency distribution for a qualitative variable is a listing of all the values (for example, categories) that the variable can take, together with the frequencies for each value.

EXAMPLE 1 Frequency distributions

Note: Check that the sum of the frequencies equals the sample size, n.

Create a frequency distribution for the variable app type from Table 1.

Solution

For each app type, we compute the frequency; that is, we count (or tally) how many apps were of that particular app type. Table 2 shows the frequency distribution for the variable app type. For example, five of the apps were social networking apps. The frequency distribution summarizes the data set so that quick observations can be made, such as “The most popular app type in the Apple.com top 20 list of the most downloaded free apps is the Games app type.”

Table 2 Frequency distribution of app type

App type	Tally	Frequency
Games	||||| ||	7
Social networking	|||||	5
Music	|||	3
Photo and video	|||	3
Sports	||	2
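
The tally in Table 2 can also be done by software. The following is a minimal sketch in Python, assuming the app types from Table 1 have been typed in by hand in rank order; the variable names `app_types` and `frequencies` are illustrative choices, not part of the original example.

```python
from collections import Counter

# App types of the 20 apps in Table 1, entered by hand in rank order (1 to 20).
app_types = (
    ["Games"] * 5                 # ranks 1-5
    + ["Photo and video"] * 2     # ranks 6-7
    + ["Games"] * 2               # ranks 8-9
    + ["Photo and video"]         # rank 10
    + ["Social networking", "Sports", "Social networking", "Sports"]  # ranks 11-14
    + ["Music"] * 2               # ranks 15-16
    + ["Social networking"] * 3   # ranks 17-19
    + ["Music"]                   # rank 20
)

# Counter tallies how many times each category occurs in the list.
frequencies = Counter(app_types)

# List the categories from most frequent to least frequent.
for app_type, frequency in frequencies.most_common():
    print(f"{app_type}\t{frequency}")
```

The counts produced this way should agree with the Frequency column of Table 2, for example 7 for Games and 2 for Sports.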

#1

The New York City Police Department tracks the number and type of traffic violations. Table 3 contains a random sample of 12 traffic violations and the borough in which they occurred (Manhattan or Brooklyn).

1. Build a frequency distribution of Borough.

2. Construct a frequency distribution of Violation type.

Table 3 Violation type and borough of 12 traffic violations

Violation type	Borough	Violation type	Borough
Cell phone	Brooklyn	Disobey sign	Manhattan
Safety belt	Manhattan	Speeding	Brooklyn
Cell phone	Brooklyn	Safety belt	Manhattan
Cell phone	Manhattan	Disobey sign	Manhattan
Speeding	Brooklyn	Disobey sign	Brooklyn
Safety belt	Manhattan	Cell phone	Manhattan

(The solutions are shown in Appendix A.)

As the data set gets larger, the need for summarization gets more and more acute. (Imagine if the Apple.com listing consisted of 1000 apps instead of 20.) Take a moment to add up the frequencies in Table 2. What do they add up to? This number is the sample size: n = 20. Now, is this just a coincidence, or does this happen every time?

Actually, this happens every time: the sum of the frequencies equals the sample size, n. One way to check if you made a mistake in forming your frequency distribution table is to add up the frequencies and see if the sum equals the sample size.
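
This check is easy to automate. The sketch below, written in Python, assumes the frequencies from Table 2 are stored in a dictionary; the helper name `frequencies_match_sample_size` and the miscounted table are hypothetical, introduced only to illustrate the check.

```python
# Frequencies copied from Table 2.
frequencies = {
    "Games": 7,
    "Social networking": 5,
    "Music": 3,
    "Photo and video": 3,
    "Sports": 2,
}

n = 20  # sample size: the number of apps listed in Table 1


def frequencies_match_sample_size(freqs, sample_size):
    """Return True when the frequencies add up to the sample size."""
    return sum(freqs.values()) == sample_size


# The correctly tallied table passes the check.
print(sum(frequencies.values()))
print(frequencies_match_sample_size(frequencies, n))

# A deliberately miscounted table (hypothetical: Games tallied as 6, not 7)
# fails the check, which signals that a tally should be redone.
miscounted = dict(frequencies, Games=6)
print(frequencies_match_sample_size(miscounted, n))
```

Passing the check does not prove that every tally is right, since two errors could cancel each other, but a failed check always means that a mistake was made.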

Relative Frequency Distributions

Next, suppose you didn’t know the size of the sample in the survey. Suppose you were told only that seven apps were games. The logical question is “Is that a lot?” If our sample size were only 10 apps, then 7 of those apps being games is certainly a lot. However, if our sample size were 1000 apps, then only 7 of those apps being games is not a lot. So the significance of the number depends on what you compare the seven apps to; that is, “relative to what?” or “compared to what?” In statistics, we compare the frequency of a category with the total sample size to get the relative frequency.

The relative frequency of a particular category of a qualitative variable is its frequency divided by the sample size. A relative frequency distribution for a qualitative variable is a listing of all values that the variable can take, together with the relative frequencies for each value.
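
As an illustration of this definition, the following Python sketch divides each frequency from Table 2 by the sample size. The dictionary layout and the name `relative_frequencies` are assumptions made for this example.

```python
# Frequencies copied from Table 2.
frequencies = {
    "Games": 7,
    "Social networking": 5,
    "Music": 3,
    "Photo and video": 3,
    "Sports": 2,
}

# The sample size is the sum of the frequencies (20 for Table 2).
n = sum(frequencies.values())

# Relative frequency = frequency / sample size, computed for each category.
relative_frequencies = {
    app_type: frequency / n for app_type, frequency in frequencies.items()
}

for app_type, rel_freq in relative_frequencies.items():
    print(f"{app_type}\t{frequencies[app_type]}/{n} = {rel_freq:.2f}")
```

For the Games category this gives 7/20 = 0.35. The same frequency of 7 would give 7/10 = 0.70 in a sample of 10 apps but only 7/1000 = 0.007 in a sample of 1000 apps, which is why the relative frequency answers the question “relative to what?”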