Impact of Sample Size on Confidence Intervals

The Importance and Effect of Sample Size

A confidence interval is a range of values, computed from a sample, that is likely to contain the true population parameter. How likely is set by the level of confidence, which is typically expressed as a percentage, such as 95%. All else being equal, the larger the sample size, the narrower the confidence interval tends to be, and the more precise the estimate of the population parameter.

When the sample size is small, the confidence interval tends to be wider because there is more uncertainty in the estimate. This means that the range of values that could plausibly contain the true population parameter is larger.

As the sample size increases, the standard error of the estimate decreases, and the confidence interval becomes narrower. This means that the range of plausible values for the population parameter becomes smaller, and the estimate becomes more precise.
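To make the relationship concrete, here is a minimal sketch that computes the half-width of a z-based confidence interval for a mean at several sample sizes. The function name mean_ci_half_width and the population standard deviation of 3.0 are assumptions chosen purely for illustration, and the sketch treats the standard deviation as known.

```python
import math
from statistics import NormalDist


def mean_ci_half_width(sigma, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean.

    Assumes the population standard deviation `sigma` is known.
    """
    # Two-sided multiplier, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # The standard error shrinks with the square root of n.
    return z * sigma / math.sqrt(n)


SIGMA = 3.0  # hypothetical population standard deviation

for n in (10, 100, 1000):
    print(n, round(mean_ci_half_width(SIGMA, n), 3))
```

Each tenfold increase in the sample size shrinks the half-width by a factor of about 3.16 (the square root of 10), not by a factor of 10.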

Example: Height of adult males

For example, suppose you are estimating the mean height of adult males in a population, and you take a random sample of 10 men. The resulting confidence interval might be quite wide, say, 65 inches to 75 inches. However, if you take a sample of 100 men, the resulting confidence interval might be much narrower, say, 69 inches to 71 inches.

In summary, as the sample size increases, the confidence interval becomes narrower and more precise, and as the sample size decreases, the confidence interval becomes wider and less precise.

Example: Proportion of people

Suppose you want to estimate the proportion of people in a population who support a particular political candidate. You take a random sample of 50 people and find that 30 of them support the candidate. Using this information, you calculate a 95% confidence interval for the true proportion of supporters.

The formula for calculating a confidence interval for a proportion involves using the sample proportion, the sample size, and the level of confidence. With a sample proportion of 0.6 (30/50), a sample size of 50, and a 95% level of confidence, the confidence interval can be calculated as:

0.6 ± 1.96 * sqrt[(0.6 * 0.4) / 50] = 0.6 ± 0.136

Therefore, the 95% confidence interval for the true proportion of supporters is from 0.464 to 0.736. This means that we can be 95% confident that the true proportion of supporters in the population is somewhere between 46.4% and 73.6%.
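The same calculation can be sketched in a few lines of Python. This is a minimal illustration of the normal-approximation interval used above, not a general-purpose routine; the function name proportion_ci is made up for this example.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided multiplier, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 30 supporters out of 50 people sampled, as in the example.
low, high = proportion_ci(30, 50)
print(f"{low:.3f} to {high:.3f}")
```

Calling proportion_ci with the same sample proportion but a larger n shows the interval tightening around 0.6.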

Now, let’s consider how the distribution of sample proportions is different for different sample sizes. The distribution of sample proportions can be approximated by a normal distribution, with the mean equal to the true population proportion and the standard deviation equal to the square root of [(population proportion * (1 – population proportion)) / sample size].

When the sample size is small, the distribution of sample proportions is more spread out, with a larger standard deviation, meaning that the estimate is less precise. As the sample size increases, the distribution of sample proportions becomes narrower, with a smaller standard deviation, indicating a more precise estimate.

In our example, using the sample proportion of 0.6 in place of the unknown population proportion, the standard deviation with a sample size of 50 is sqrt[(0.6 * 0.4) / 50] = 0.0693, while with a sample size of 100 it is sqrt[(0.6 * 0.4) / 100] = 0.0490. This means that the estimate of the population proportion is more precise with a larger sample size, as the distribution of sample proportions is narrower.
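One way to see the sampling distribution narrow is to simulate it. The sketch below assumes, purely for illustration, that the true population proportion is 0.6, draws many samples of each size, and compares the spread of the simulated sample proportions with the formula. The helper name simulate_sample_proportions and the seed are arbitrary choices.

```python
import math
import random
from statistics import pstdev


def simulate_sample_proportions(p, n, reps, rng):
    """Draw `reps` samples of size `n`; return each sample's proportion."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


P_TRUE = 0.6  # assumed population proportion, for illustration only
rng = random.Random(0)  # fixed seed so the run is repeatable

for n in (50, 100):
    props = simulate_sample_proportions(P_TRUE, n, 5000, rng)
    simulated_sd = pstdev(props)
    formula_sd = math.sqrt(P_TRUE * (1 - P_TRUE) / n)
    print(n, round(simulated_sd, 4), round(formula_sd, 4))
```

The simulated and formula-based standard deviations should agree closely, and both shrink as the sample size goes from 50 to 100.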

 

Answers to conceptual questions on confidence intervals

Decide whether the following statements are true or false.  Explain your reasoning.

Problems:

a)  For a given standard error, lower confidence levels produce wider confidence intervals.

False. To get higher confidence, we need to make the interval wider. This is evident in the multiplier, which increases with the confidence level.
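As a quick check on how the multiplier grows, here is a small sketch that computes the two-sided multiplier from the standard normal curve for a few confidence levels. The function name z_multiplier is just a label for this example, and the sketch uses the normal curve rather than the t-curve.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided multiplier from the standard normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 3))
```

For a fixed standard error, the interval width is proportional to this multiplier, so lower confidence levels give narrower intervals.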

b)  If you increase sample size, the width of confidence intervals will increase.

False.   Increasing the sample size decreases the width of confidence intervals, because it decreases the standard error.

c)  The statement, “the 95% confidence interval for the population mean is (350, 400)”, is equivalent to the statement, “there is a 95% probability that the population mean is between 350 and 400”.

False. 95% confidence means that we used a procedure that works 95% of the time to get this interval. That is, 95% of all intervals produced by the procedure will contain their corresponding parameters. For any one particular interval, the true population mean is either inside the interval or outside the interval. In this case, it is either between 350 and 400, or it is not. Hence, the probability that the population mean is between those two exact numbers is either zero or one.

d)  To reduce the width of a confidence interval by a factor of two (i.e., in half), you have to quadruple the sample size.

True, as long as we’re talking about a CI for a population percentage.   The standard error for a population percentage has the square root of  the sample size in the denominator.  Hence, increasing the sample size by a factor of 4 (i.e., multiplying it by 4) is equivalent to multiplying the standard error by 1/2.  Hence, the interval will be half as wide.  This also works approximately for population averages as long as the multiplier from the t-curve doesn’t change much when increasing the sample size (which it won’t if the original sample size is large).
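The arithmetic can be checked with a short sketch. It holds the sample percentage fixed (0.6 here, an arbitrary choice) and compares the interval width at a sample size of 50 with the width at four times that size. The function name wald_width is hypothetical.

```python
import math


def wald_width(p_hat, n, z=1.96):
    """Full width of a normal-approximation interval for a proportion."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)


width_n = wald_width(0.6, 50)
width_4n = wald_width(0.6, 200)  # four times the sample size
print(round(width_n, 4), round(width_4n, 4), round(width_4n / width_n, 4))
```

In practice the sample percentage will change a little from one sample to the next, so the halving is approximate rather than exact.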

e)  Assuming the central limit theorem applies, confidence intervals are always valid.
By “valid,” we mean that the confidence interval procedure has a 95% chance of producing an interval that contains the population parameter.

False.  The central limit theorem is needed for confidence intervals to be valid.   However, it is also necessary that the data be collected from random samples.  Confidence intervals will not remedy poorly collected data.

f)  The statement, “the 95% confidence interval for the population mean is (350, 400)” means that 95% of the population values are between 350 and 400.

False.  The confidence interval is a range of plausible values for the population average.   It does not provide a range for 95% of the data values from the population.  To find the percentage of values in the population between 350 and 400, we need to look at a histogram of the data values and determine what percentage of observations are between 350 and 400.

g)  If you take large random samples over and over again from the same population, and make 95% confidence intervals for the population average, about 95% of the intervals should contain the population average.

True.   This is the definition of confidence intervals.
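A small simulation can illustrate this. The sketch below assumes a normally distributed population with a mean of 375 and a standard deviation of 80; both numbers are invented for illustration. It builds a 95% interval from each of 2000 random samples and counts how often the interval contains the population mean.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

POP_MEAN = 375.0  # hypothetical population mean
POP_SD = 80.0     # hypothetical population standard deviation
N = 100           # size of each random sample
REPS = 2000       # number of repeated samples

rng = random.Random(42)  # fixed seed so the run is repeatable
z = NormalDist().inv_cdf(0.975)  # multiplier for 95% confidence

covered = 0
for _ in range(REPS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    centre = fmean(sample)
    se = stdev(sample, centre) / math.sqrt(N)
    # Count the interval if it contains the true population mean.
    if centre - z * se <= POP_MEAN <= centre + z * se:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals containing the population mean: {coverage:.3f}")
```

The fraction should land close to 0.95, with some run-to-run variation because only a finite number of samples is drawn.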

h)  If you take large random samples over and over again from the same population, and make 95% confidence intervals for the population average, about 95% of the intervals should contain the sample average.

False. The confidence interval is a range for the population average, not for the sample average. In fact, every confidence interval contains its corresponding sample average, because CIs are of the form: sample avg. ± multiplier × SE. So, the sample average is right in the middle of the CI.

i)   It is necessary that the distribution of the variable of interest follows a normal curve.

False.   It is necessary that the distribution of the sample average follows a normal curve.  The data values of the variable, however, need not follow a normal curve, because if the sample size is large enough the central limit theorem for the sample average will apply.

j)  A 95% confidence interval obtained from a random sample of 1000 people has a better chance of containing the population percentage than a 95% confidence interval obtained from a random sample of 500 people.

False.  All 95% confidence intervals have the property that they come from a procedure that has a 95% chance of yielding an interval that contains the true value.   The confidence interval method automatically accounts for sample size in the standard error.   A 95% CI with n=1000 will be narrower than a 95% CI with n=500, but both CIs will have 95% confidence of containing the population percentage.
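The point can be checked by simulation. The sketch below assumes a true population percentage of 40% (an arbitrary value) and uses the normal-approximation interval; the function names wald_interval and coverage_and_width are hypothetical. For each sample size it estimates how often the interval contains the true value and how wide the interval is on average.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage_and_width(p_true, n, reps, rng):
    """Estimate the coverage rate and the average width by simulation."""
    hits = 0
    total_width = 0.0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= p_true <= high
        total_width += high - low
    return hits / reps, total_width / reps


P_TRUE = 0.4  # hypothetical population proportion
rng = random.Random(7)  # fixed seed so the run is repeatable

results = {n: coverage_and_width(P_TRUE, n, 2000, rng) for n in (500, 1000)}
for n, (cov, width) in results.items():
    print(n, round(cov, 3), round(width, 4))
```

Both sample sizes should show coverage near 95%, while the average width is noticeably smaller for n=1000.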

k)  If you go through life making 99% confidence intervals for all sorts of population means, about 1% of the time the intervals won’t cover their respective population means.

True.  Since 99% of the intervals should contain the corresponding population mean, 1% of them will not.

Note: The above questions are taken from this link.
