A confidence interval is an estimation technique used in statistical inference. It delimits a pair (or several pairs) of values within which the unknown population parameter is expected to lie, with a certain probability.
A confidence interval lets us calculate two values around a sample mean, one lower and one upper. These values delimit a range within which, with a certain probability, the population parameter is located.
Confidence interval = mean ± margin of error
Knowing the true population parameter is, in general, very difficult. Consider a population of 4 million people. Could we know the average consumption expenditure per household in this population? In principle, yes: we would simply have to survey every household and calculate the mean. However, that process would be extremely laborious and would make the study very costly.
In situations like this, it is more feasible to select a statistical sample, for example 500 people, and calculate the mean on that sample. Although we still would not know the true population value, we can assume that it will be close to the sample value. Adding the margin of error to the sample mean gives the upper bound of the confidence interval; subtracting it gives the lower bound. The population mean will lie between these two values with the chosen level of confidence.
In conclusion, a confidence interval does not give a single point estimate of the population parameter, but it does give us an approximate idea of where the true value could be: it delimits two values between which the population mean is expected to lie.
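The sample-based procedure just described can be sketched in Python. This is a minimal illustration, not part of the original example: it assumes a hypothetical, synthetically generated population of 100,000 household expenditures (normal, with an invented mean of 2,000 and standard deviation of 400), draws a random sample of 500, and builds a 95% interval as sample mean ± margin of error. Because the population standard deviation is treated as unknown here, it uses the sample standard deviation with the normal critical value, a large-sample approximation.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# Hypothetical population standing in for the 4-million-person case:
# 100,000 synthetic household expenditures (invented parameters).
random.seed(42)
population = [random.gauss(2000, 400) for _ in range(100_000)]

# Draw a simple random sample of 500 units without replacement.
sample = random.sample(population, 500)
x_bar = fmean(sample)
s = stdev(sample)

# 95% margin of error using the normal critical value (large-sample approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * s / math.sqrt(len(sample))
lower, upper = x_bar - margin, x_bar + margin

print(f"sample mean = {x_bar:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
print(f"true population mean = {fmean(population):.1f}")
```

In practice only the sample is available; the full synthetic population is generated here solely so the interval can be compared with the "true" mean.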
Factors on which a confidence interval depends
The calculation of a confidence interval depends mainly on the following factors:
- Sample size: the more data used to calculate the sample value, the closer it tends to be to the true population parameter, and the narrower the interval.
- Confidence level: the percentage of intervals, built this way over repeated samples, that would contain the true population value. The usual levels are 95% and 99%.
- Significance level: this is called alpha (α = 1 − confidence level) and gives the probability that the population value falls outside our interval.
- Statistic estimated from the sample (mean, variance, difference of means …): the pivot statistic used to calculate the interval depends on it.
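To see how the first three factors interact, the following sketch computes the margin of error for the specific case treated next (the mean, assuming normality and a known standard deviation). The function name `margin_of_error` and the value σ = 1 are illustrative assumptions; other statistics use other pivots and formulas.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Half-width of a z-interval for the mean: z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence                   # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sigma / math.sqrt(n)

sigma = 1.0  # hypothetical known population standard deviation
for confidence in (0.95, 0.99):
    for n in (10, 100, 1000):
        m = margin_of_error(sigma, n, confidence)
        print(f"confidence={confidence:.0%}  n={n:>4}  margin={m:.4f}")
```

Larger samples shrink the margin (by a factor of √n), while a higher confidence level widens it.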
Example of a confidence interval for the mean, assuming normality and a known standard deviation
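The pivot statistic used for the calculation is:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$$

where X̄ is the sample mean, μ the population mean, σ the known population standard deviation and n the sample size.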
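A short simulation can make the role of the pivot concrete: whatever the values of μ and σ, Z follows the same N(0,1) distribution, which is what allows a single table of critical values to be used. This sketch assumes normally distributed data; the parameter pairs are arbitrary illustrations, and `pivot_values` is a hypothetical helper name.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(0)

def pivot_values(mu: float, sigma: float, n: int, reps: int) -> list[float]:
    """Simulate Z = (x_bar - mu) / (sigma / sqrt(n)) over many normal samples."""
    values = []
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        values.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return values

# Two very different hypothetical populations yield the same pivot distribution.
for mu, sigma in ((4.0, 0.55), (100.0, 15.0)):
    z = pivot_values(mu, sigma, n=10, reps=20_000)
    print(f"mu={mu}, sigma={sigma}: mean(Z)={fmean(z):.3f}, sd(Z)={pstdev(z):.3f}")
```

In both cases the simulated Z values should have a mean near 0 and a standard deviation near 1.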
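The resulting interval is:

$$P\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

where z_{α/2} is the critical value of the N(0,1) distribution that leaves a probability of α/2 in each tail.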
On each side of the inequality we have the lower and upper bounds of the interval, respectively. The expression therefore states that the probability that this interval contains the population mean is 1 − α (the confidence level).
Let's look at this more closely with a worked example.
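The 1 − α statement describes the procedure: if the sampling were repeated many times, about 100(1 − α)% of the intervals built this way would contain μ. The simulation sketch below illustrates this, assuming a hypothetical normal population whose true mean is fixed in advance (something never known in practice). The helper `z_interval` is a name introduced here for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(x_bar: float, sigma: float, n: int,
               confidence: float = 0.95) -> tuple[float, float]:
    """Interval for the mean with known sigma: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

random.seed(1)
mu, sigma, n, reps = 4.0, 0.55, 10, 10_000  # hypothetical "true" population
hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lower, upper = z_interval(fmean(sample), sigma, n)
    if lower <= mu <= upper:  # does this particular interval cover mu?
        hits += 1

coverage = hits / reps
print(f"Share of intervals containing mu: {coverage:.3f}")  # should be close to 0.95
```

Any single interval either contains μ or does not; the 95% refers to the long-run share of intervals that do.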
Suppose we want to estimate the average time a runner takes to complete a marathon. To do so, 10 marathons have been timed, giving an average of 4 hours and a standard deviation of 33 minutes (0.55 hours), which is treated here as the known population standard deviation σ. We want a 95% confidence interval.
To obtain the interval, we only have to substitute these data into the interval formula.
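Carrying out the substitution with x̄ = 4, σ = 0.55, n = 10 and the 95% critical value z = 1.96 gives (figures rounded):

$$
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 4 \pm 1.96 \cdot \frac{0.55}{\sqrt{10}} \approx 4 \pm 1.96 \cdot 0.1739 \approx 4 \pm 0.341
$$

So the 95% interval runs from roughly 3.659 to 4.341 hours, that is, about 3 h 39.5 min to 4 h 20.5 min.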
In the figure, the confidence interval is the part of the distribution shaded in blue, and its two bounds correspond to the two red lines. The central line, which divides the distribution in two, corresponds to the true population value.
Note that the N(0,1) table gives cumulative probabilities (from the left up to the critical value), not density values. We therefore need the value that leaves 97.5% of the probability to its left, that is, 1 − α/2 = 0.975 with α = 0.05, so that 2.5% remains in each tail. That value is 1.96.
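Instead of reading 1.96 from a printed table, the same critical value can be obtained from the inverse of the N(0,1) cumulative distribution function. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+), reproducing the marathon interval:

```python
import math
from statistics import NormalDist

# Marathon example: sample mean 4 h, sigma 0.55 h (treated as known), n = 10 marathons.
x_bar, sigma, n, confidence = 4.0, 0.55, 10, 0.95

alpha = 1 - confidence
# inv_cdf returns the value with the given cumulative probability to its left,
# so 1 - alpha/2 = 0.975 yields the two-sided critical value (about 1.96).
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.4f}")                               # prints "z = 1.9600"
print(f"95% CI: [{lower:.3f}, {upper:.3f}] hours")  # prints "95% CI: [3.659, 4.341] hours"
```

Changing `confidence` to 0.99 gives the 99% interval with no other edits, since the critical value is computed rather than looked up.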