SECTION 2.1 Summary
- To study relationships between variables, we must measure the variables on the same cases.
- If we think that a variable x may explain or even cause changes in another variable y, we call x an explanatory variable and y a response variable.
- A scatterplot displays the relationship between two quantitative variables measured on the same cases. Plot the data for each case as a point on the graph.
- Always plot the explanatory variable, if there is one, on the x axis of a scatterplot. Plot the response variable on the y axis.
- To see the effect of a categorical variable in a scatterplot, plot the points for each category with a different color or symbol.
- In examining a scatterplot, look for an overall pattern showing the form, direction, and strength of the relationship and then for outliers or other deviations from this pattern.
- Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships and clusters are other forms to watch for.
- Direction: If the relationship has a clear direction, we speak of either positive association (high values of the two variables tend to occur together) or negative association (high values of one variable tend to occur with low values of the other variable).
- Strength: The strength of a relationship is determined by how closely the points in the scatterplot follow a clear form such as a line.
- A transformation uses a formula or some other method to replace the original values of a variable with other values for an analysis. The transformation is successful if it helps us to learn something about the data.
- The log transformation is frequently used in business applications of statistics. It tends to make skewed distributions more symmetric, and it can help us to better see relationships between variables in a scatterplot.
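
The effect of the log transformation on a skewed distribution can be illustrated with a short Python sketch. The data values, the variable names, and the `skewness` helper below are hypothetical and chosen only for illustration; they do not come from this section. The sketch measures skewness as the average cubed standardized value, which is one common definition among several.

```python
import math
import statistics


def skewness(values):
    """Average cubed standardized value (population form).

    Roughly 0 for a symmetric distribution, positive for right skew.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed data, e.g. company revenues in $ millions.
revenues = [1, 2, 2, 3, 4, 5, 8, 12, 20, 45, 110, 300]

# Replace each original value with its base-10 logarithm.
log_revenues = [math.log10(v) for v in revenues]

skew_raw = skewness(revenues)
skew_log = skewness(log_revenues)

print(f"skewness of original values: {skew_raw:.2f}")
print(f"skewness of log values:      {skew_log:.2f}")
```

For these made-up values, the skewness of the logged data is much closer to zero than that of the original data, which is the sense in which the transformation makes the distribution more symmetric. The log transformation applies only to positive values, and whether it helps in a particular analysis still has to be judged from plots of the transformed data.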