Introduction to Bootstrapping

Bootstrapping

Bootstrapping is a statistical technique that allows us to estimate the sampling distribution of a statistic by resampling from the original data set. It is useful when we have a single sample of limited size and want to gauge how much a statistic computed from it, such as a mean, would vary from sample to sample.

In bootstrapping, we create multiple resamples of the original data set by randomly selecting data points with replacement. These resamples are typically the same size as the original data set. We then calculate the statistic of interest for each resample and use these statistics to estimate the sampling distribution of the statistic.
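The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function name `bootstrap_distribution`, its parameters, and the small data set are assumptions made up for this example.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=1000, seed=None):
    """Return the statistic computed on each of n_resamples bootstrap resamples.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # random.choices samples WITH replacement; k=n keeps the resample
        # the same size as the original data set.
        resample = rng.choices(data, k=n)
        results.append(statistic(resample))
    return results


# Made-up values standing in for an observed sample.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]

# One bootstrapped mean per resample; together they approximate the
# sampling distribution of the mean.
boot_means = bootstrap_distribution(sample, statistics.mean, n_resamples=2000, seed=0)
```

Any function that maps a list of values to a single number (for example `statistics.median`) could be passed in place of `statistics.mean`.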

The benefit of bootstrapping is that it allows us to estimate the sampling distribution of a statistic without assuming a particular shape, such as a normal distribution, for the underlying population. It does still rely on the original sample being representative of the population. This is particularly useful when the population distribution is unknown or non-normal, as it allows us to make inferences about the population based on the data we have.

In short, bootstrapping is a resampling procedure that uses the data from one sample to generate a sampling distribution by repeatedly drawing random samples, with replacement, from that sample.

Bootstrapping
A resampling procedure for constructing a sampling distribution using data from a sample

Example: Bootstrap Distribution

Let’s say you are a researcher who wants to estimate the mean height of a population of adults. However, you only have a small sample of 30 individuals. You could use bootstrapping to estimate the sampling distribution of the mean height and calculate a confidence interval around your estimate.

To do this, you would create multiple resamples of the original data set by randomly selecting 30 individuals from the sample with replacement. You would calculate the mean height for each resample and use these means to estimate the sampling distribution of the mean height.

Once you have the bootstrap distribution, you can calculate a confidence interval around your estimate of the mean height. For example, a 95% confidence interval can be formed from the middle 95% of the bootstrapped means, that is, the values between the 2.5th and 97.5th percentiles. This gives you a range of plausible values for the true population mean height.
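As a rough sketch of the confidence interval step, the code below takes the middle 95% of a set of bootstrapped means. The 30 heights are simulated purely for illustration (they are not real data), and the helper `percentile_interval` is a hypothetical name that uses a simple rank-based cut without interpolation.

```python
import random
import statistics


def percentile_interval(boot_stats, confidence=0.95):
    """Return the bounds of the middle `confidence` share of boot_stats.

    Hypothetical helper: a simple rank-based approximation, no interpolation.
    """
    ordered = sorted(boot_stats)
    alpha = 1.0 - confidence
    lower_index = int((alpha / 2) * (len(ordered) - 1))
    upper_index = int((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[lower_index], ordered[upper_index]


rng = random.Random(42)

# Simulated heights (cm) for a sample of 30 adults; assumed values for illustration.
heights = [rng.gauss(170, 10) for _ in range(30)]

# Build the bootstrap distribution of the mean: resample 30 values with
# replacement, compute the mean, and repeat many times.
boot_means = [
    statistics.mean(rng.choices(heights, k=len(heights)))
    for _ in range(5000)
]

lower, upper = percentile_interval(boot_means, confidence=0.95)
print(f"sample mean: {statistics.mean(heights):.1f} cm")
print(f"95% bootstrap interval: ({lower:.1f}, {upper:.1f}) cm")
```

This is the percentile approach described in the text; other bootstrap interval methods exist and are not shown here.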

By using bootstrapping, you can estimate the sampling distribution of the mean height without assuming that heights in the population follow a particular distribution, provided your 30 individuals are a representative sample. This is especially helpful when the population distribution is unknown or non-normal.

Example: Bootstrap Distribution for Mean Height

We have data concerning the heights of individuals in a random sample of n = 15. To construct a bootstrap distribution for the mean height, we would first randomly select one individual from that sample and record their height. Then, with that individual placed back into the sample, we would randomly select a second individual and record their height. This is known as “sampling with replacement” because we are putting each case back into the sample after recording their height. We would repeat this process until we have selected 15 values. Because we are sampling with replacement, some individuals may appear in the bootstrap sample more than once, and others may not appear at all. We would use those 15 selected values to compute a bootstrapped sample mean. This process is repeated many times. The distribution of many bootstrapped sample means is known as the bootstrap distribution or bootstrap sampling distribution.
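To make the one-at-a-time draw concrete, here is a small sketch that builds a single bootstrap sample from 15 heights and counts how often each individual was selected. The height values and variable names are made up for illustration.

```python
import random
import statistics
from collections import Counter

rng = random.Random(1)

# Made-up heights (inches) for the 15 individuals in the original sample.
heights = [61.5, 62.0, 63.2, 64.1, 64.8, 65.5, 66.0, 66.7,
           67.3, 68.0, 68.9, 69.4, 70.2, 71.6, 73.0]
n = len(heights)

# Draw one individual at a time. Each individual stays in the pool after
# being drawn, so the same person can be selected again.
drawn_indices = [rng.randrange(n) for _ in range(n)]
bootstrap_sample = [heights[i] for i in drawn_indices]

# How many times each individual (by position in the sample) was selected.
times_selected = Counter(drawn_indices)

# The bootstrapped sample mean for this one resample.
boot_mean = statistics.mean(bootstrap_sample)

print("bootstrap sample:", bootstrap_sample)
print("selected more than once:", {i: c for i, c in times_selected.items() if c > 1})
print("bootstrapped sample mean:", round(boot_mean, 2))
```

Repeating the draw many times and collecting each `boot_mean` would give the bootstrap distribution described above.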
