Part II

Statistics: the Science of Data

Data are collected every day. Whether you know it or not, you, too, contribute to the vast amount of data collected daily. Every time you make a phone call (or send a text message), the number, location, date, and length of the call (or number of characters in the text message) are saved by the phone company. If you download a movie, the title, genre, movie rating, and date and time of the download are recorded. When you use a supermarket card to get store savings, the date, the products you buy, the amount you spend, and the amount you save are stored in a data bank. At the end of each term, your university or school records your courses and grades for future reference. And that’s just a small subset of the personal data that you generate. Data collection goes on constantly all around you, and analysis of such data is used for marketing, security, political advocacy, and much, much more.

And you are not the only source of data: data are also collected on traffic patterns, mercury levels in fish, consumer products, emergency room admissions, climate change, crops, standardized testing, and just about everything else you can imagine. With so much data out there, how do we make sense of it? That’s where statistics comes in. Statistics is the science of collecting, organizing, analyzing, and interpreting data.

Chapters 5 and 6 concern data analysis, the art of studying what data reveal. We learn from data by making graphs and doing calculations, guided by principles that help us decide what graphs to make, what to look for in our graphs, and what calculations are helpful based on what we see.

Sometimes we want to know more: An opinion poll or a medical study looks at only some people, but we want conclusions that apply to all voters or all patients. Drawing such conclusions is called statistical inference because we infer conclusions about a large group from data on a small part of the group. Chapter 7 discusses inference from beginning to end, from how to produce data when we have inference in mind to how to say just how much confidence we can have in our conclusions. Confidence, uncertainty, risk, chance: the mathematics that describes all these ideas is probability theory, the topic of Chapter 8. Probability is the mathematics behind statistical inference, but that’s just a small part of its usefulness.