Correlation

Prerequisites

Values of the Pearson Correlation, Sampling Distribution of Pearson's r, Confidence Intervals

Learning Objectives

  1. State the standard error of z'
  2. Compute a confidence interval on ρ

Computing a confidence interval for a correlation coefficient can also be useful in analysis. Imagine that you are a local government manager responsible for repairing sidewalks, and you wonder whether you could improve customer satisfaction by reducing the number of days it takes to fulfill a repair request. A first step is to assess whether there is a negative correlation between the time it takes to repair a sidewalk and customer satisfaction with city services (i.e., the longer the repair time, the lower the satisfaction). Note that the correlation coefficient between repair times and customer satisfaction does not provide a complete answer, because the relationship may not be linear and may be confounded by other factors, such as the characteristics of individual stakeholders and the political environment. These factors can be addressed in more complicated multiple regression models that are taught in another class, but computing the correlation coefficient is nevertheless a useful first step. Say you have satisfaction surveys from 34 individuals who have requested that a sidewalk be repaired, and you find that the correlation between repair times and satisfaction is -0.654. You are not really interested in the correlation for these 34 individuals, however, but in the population correlation for all city residents.

To estimate this city-level correlation, you need to compute a confidence interval on the population value of Pearson's correlation (ρ). This calculation is complicated by the fact that the sampling distribution of r is not normal. Nevertheless, Fisher's z' transformation, described in the section on the sampling distribution of Pearson's r, provides a simple solution. The steps in computing a confidence interval for ρ are:

    1. Convert r to z'
    2. Compute a confidence interval in terms of z'
    3. Convert the confidence interval back to r.

Going back to our example, we wish to compute a 95% confidence interval on the population correlation, ρ, based on our sample correlation, r, which is -.654 in our sample of 34 individuals.

The conversion of r to z' can be done using a table or calculator. The table contains only positive values of r, but that is not a problem because the transformation is symmetric about zero. The value of z' associated with an r of 0.654 is 0.78. Therefore, the z' associated with an r of -0.654 is -0.78.
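For readers without a table at hand, the conversion can be sketched in Python. This is a minimal illustration, assuming the usual definition of Fisher's transformation, z' = 0.5 ln[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent of r. The function name r_to_z is introduced here only for illustration.

```python
import math

def r_to_z(r: float) -> float:
    """Fisher's z' transformation: z' = 0.5 * ln((1 + r) / (1 - r))."""
    # math.atanh computes the same quantity as the logarithmic formula.
    return math.atanh(r)

z_pos = r_to_z(0.654)
z_neg = r_to_z(-0.654)

# The transformation is odd, so a negative r gives the negated z'.
print(round(z_pos, 2), round(z_neg, 2))
```

Because the function is symmetric about zero, a table that lists only positive values of r loses no information.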

The sampling distribution of z' is approximately normal and has a standard error of

$$\sigma_{z'} = \frac{1}{\sqrt{N-3}}$$

For this example, N = 34 and therefore the standard error is 0.180. The Z for a 95% confidence interval (Z.95) is 1.96 as can be found using the normal distribution calculator (setting the shaded area to .95 and clicking on the "Between" button). The confidence interval is therefore computed as:

Lower limit = -0.78 - (1.96)(0.18) = -1.13
Upper limit = -0.78 + (1.96)(0.18) = -0.43
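The arithmetic for this step can be checked with a short Python sketch. It assumes the standard error of z' is 1/sqrt(N - 3), which reproduces the 0.180 quoted for N = 34, and it obtains the critical value from the standard library's NormalDist rather than from the calculator. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 34
z_prime = -0.78  # rounded z' for r = -0.654, as used in the text

# Assumed standard error of z': 1 / sqrt(N - 3)
se = 1 / math.sqrt(n - 3)

# Two-sided 95% interval: leave 2.5% in each tail of the standard normal.
z_crit = NormalDist().inv_cdf(0.975)

lower = z_prime - z_crit * se
upper = z_prime + z_crit * se

print(round(se, 3), round(z_crit, 2))
print(round(lower, 2), round(upper, 2))
```

These limits are still in z' units; they are not yet correlations.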

The final step is to convert the endpoints of the interval back to r using a table or calculator. The r associated with a z' of -1.13 is -0.81 and the r associated with a z' of -0.43 is -0.40. Therefore, the population correlation (ρ) is likely to be between -0.81 and -0.40. The 95% confidence interval is:

-0.81 ≤ ρ ≤ -0.40
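The back-conversion from z' to r is the inverse of Fisher's transformation, which is the hyperbolic tangent. The sketch below applies it to the two z' limits from the example. The function name z_to_r is hypothetical, and the unrounded results (about -0.811 and -0.405) can differ from table look-ups in the second decimal place.

```python
import math

def z_to_r(z: float) -> float:
    """Inverse of Fisher's z' transformation: r = tanh(z')."""
    return math.tanh(z)

lower_r = z_to_r(-1.13)  # lower limit in z' units from the example
upper_r = z_to_r(-0.43)  # upper limit in z' units from the example

print(f"{lower_r:.3f} <= rho <= {upper_r:.3f}")
```

Note that the resulting interval is not symmetric around r = -0.654, because the transformation stretches values near -1 and +1 more than values near 0.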

To calculate the 99% confidence interval, you use the Z for a 99% confidence interval of 2.58 as follows:

Lower limit = -0.78 - (2.58)(0.18) = -1.24
Upper limit = -0.78 + (2.58)(0.18) = -0.32

Converting back to r, the confidence interval is:

-0.84 ≤ ρ ≤ -0.31

Naturally, the 99% confidence interval is wider than the 95% confidence interval.
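The whole procedure can be collected into one function and run at both confidence levels. This is a sketch under the same assumptions as the worked example (Fisher's z' transformation and a standard error of 1/sqrt(N - 3)). The name correlation_ci and its parameters are hypothetical. Because the code carries full precision instead of rounding at each step, its endpoints may differ from the hand-calculated ones in the second decimal place.

```python
import math
from statistics import NormalDist

def correlation_ci(r: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval on rho via Fisher's z' transformation."""
    z_prime = math.atanh(r)                 # step 1: convert r to z'
    se = 1 / math.sqrt(n - 3)               # assumed standard error of z'
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower_z = z_prime - z_crit * se         # step 2: interval in z' units
    upper_z = z_prime + z_crit * se
    return math.tanh(lower_z), math.tanh(upper_z)  # step 3: back to r

ci95 = correlation_ci(-0.654, 34, 0.95)
ci99 = correlation_ci(-0.654, 34, 0.99)

for level, (lo, hi) in ((0.95, ci95), (0.99, ci99)):
    print(f"{level:.0%}: {lo:.3f} <= rho <= {hi:.3f} (width {hi - lo:.3f})")
```

Raising the confidence level only changes the critical value, so the 99% interval contains the 95% interval.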

In sum, even though you have information on only 34 individuals, you can be highly confident that the correlation between repair times and customer satisfaction is negative (you are 99% confident that it is between -0.84 and -0.31). Consequently, you have some evidence that shortening repair times may increase overall satisfaction. You still need to be cautious, because correlation does not prove causation. It is possible, for example, that people with low levels of satisfaction with your city cause repair times to be longer by being particularly rude and unpleasant to city workers, leading them to drag their feet. Nevertheless, the more important point to take away from this example is that you are able to make precise statements about the population even with a small amount of sample information.