Sample and Population
What is Inferential Statistics?
Statistical inference is the branch of statistics concerned with drawing conclusions and/or making decisions concerning a population based only on sample data.
Let’s consider a nice example given by a professor. Suppose you are cooking a dish and want to check it before serving it to your guests. You would never eat the whole dish to form an opinion. Instead, you taste a small portion of it with a spoon.
- Tasting that spoonful is exploratory analysis: you get an idea of what you cooked from the sample in your hand.
- If you then conclude that the whole dish needs some extra sugar or salt, you are making an inference.
- For the inference to be valid, the portion you taste must be representative of the whole dish. Otherwise the conclusion will be wrong.
The term “population” is used in statistics to represent all possible measurements or outcomes that are of interest to us in a particular study.
A census attempts to gather information from each and every unit of the population of interest.
The term “sample” refers to a portion of the population that is representative of the population from which it was selected.
Depending on the sampling method, a sample can have fewer observations than the population, the same number of observations, or more observations (the last case is possible only when sampling with replacement). More than one sample can be derived from the same population.
Now the question is: why do we use a sample in statistics instead of taking a census?
Why use a sample? Why not a census?
- less time consuming than a census;
- less costly to administer than a census;
- measuring the variable of interest may involve the destruction of the population unit;
- a population may be infinite.
Parameters and Statistics:
One goal of statistical inference is to estimate a population parameter from a sample statistic.
- Parameters are
– Numerical characteristic of a population
– Constant (fixed) at any one moment
– Usually unknown
- Statistics are
– Numerical summary of a sample
– Calculated from sample data (not constant: the value varies from sample to sample)
– Used to estimate a parameter
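The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below is made up (10,000 values drawn from a normal distribution), and the names `population_mean` and `sample_means` are chosen here for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameter: computed from the whole population, fixed once the population is fixed.
population_mean = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.mean(random.sample(population, 100))
    for _ in range(5)
]

print(f"parameter (population mean): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"statistic (mean of sample {i}): {m:.2f}")
```

In practice the population mean is usually unknown, and a single sample mean is used to estimate it.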
A sampling method is a procedure for selecting sample elements from a population. Sampling is necessary to make inferences about a population. If a sample is not representative, it is biased, and you cannot generalize from it to the population. Common sources of sampling bias include the following.
1. Convenience Sample:
Suppose you are conducting a survey on the employment of women and men. Your neighbors are easily accessible to you, so they are more likely to be included in your sample. If you sample this way, your inference will suffer from convenience sampling bias.
“Statistical inference with convenience samples is a risky business.”- David A. Freedman, Statistical Models and Causal Inference, p. 23
If a convenience sample is used, inferences are not as trustworthy as if a random sample is used.
2. Non-response:
If only a fraction of the randomly sampled people respond to your survey, such that the sample is no longer representative of the population, then it suffers from non-response bias.
Suppose you are conducting a survey on drug use among young students. In this case, some students might not reveal the information for personal reasons. This is called non-response bias.
3. Voluntary Response:
Voluntary response occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue. Voluntary response samples often oversample people with strong opinions and undersample people who don’t care much about the topic of the survey. Thus inferences from a voluntary response sample are not as trustworthy as conclusions based on a random sample of the entire population under consideration. Note that in voluntary response there is no initial random sample.
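A toy calculation can show how voluntary response distorts an estimate. The counts below are entirely made up, and the rule that only people with strong opinions respond is an assumption for illustration, not a model of any real survey.

```python
# Hypothetical, made-up population of 1,000 people and their view on some issue.
population = (
    ["strongly support"] * 100
    + ["mildly support"] * 250
    + ["mildly oppose"] * 600
    + ["strongly oppose"] * 50
)

def support_share(responses):
    """Fraction of responses that support the issue (strongly or mildly)."""
    supporters = sum(1 for r in responses if r.endswith("support"))
    return supporters / len(responses)

# Assumption for illustration: only people with strong opinions volunteer.
volunteers = [r for r in population if r.startswith("strongly")]

true_share = support_share(population)       # 350 supporters out of 1000
volunteer_share = support_share(volunteers)  # 100 supporters out of 150

print(f"support in the whole population: {true_share:.0%}")
print(f"support among volunteers:        {volunteer_share:.0%}")
```

Under these assumptions the volunteers suggest majority support even though most of the population is opposed.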
Simple Random Sample (SRS):
Simple random sampling refers to a sampling method that has the following properties.
- The population consists of N objects.
- The sample consists of n objects.
- All possible samples of n objects are equally likely to occur.
Here we randomly select cases from the population such that each case is equally likely to be selected.
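As a minimal sketch, a simple random sample can be drawn with Python's standard `random.sample`, which selects n distinct elements so that every subset of size n is equally likely. The helper name `simple_random_sample` and the population of 20 numbered units are hypothetical.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # local generator so the seed does not leak globally
    return rng.sample(population, n)

population = list(range(1, 21))  # N = 20 hypothetical units, labeled 1..20
sample = simple_random_sample(population, n=5, seed=42)

# Number of distinct samples of size 5 that could have been drawn.
n_possible = math.comb(len(population), 5)

print("sample:", sample)
print("possible samples of size 5:", n_possible)
```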
In stratified sampling, we divide the population into homogeneous groups called strata, then randomly sample from within each stratum. For example, if you conduct a survey by first dividing the population into male and female and then randomly selecting 100 females and 100 males, you are using stratified sampling.
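The male/female example can be sketched as follows. The population of 1,000 people (600 female, 400 male) is made up, and `stratified_sample` is a hypothetical helper that draws the same number of units from every stratum.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):  # sorted so the result is repeatable for a given seed
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: 600 female and 400 male respondents.
people = [{"id": i, "sex": "female" if i < 600 else "male"} for i in range(1000)]

sample = stratified_sample(people, lambda p: p["sex"], n_per_stratum=100, seed=1)
counts = Counter(p["sex"] for p in sample)
print(counts)
```

Every stratum contributes to the sample, so both groups are guaranteed to be represented.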
In cluster sampling, we divide the population into clusters or groups, randomly select a few clusters, and then randomly sample from within those clusters. Sampling error is typically greater than with simple random sampling of the same size. The main difference from stratified sampling is that clusters need not be homogeneous and only the selected clusters contribute to the sample, whereas every stratum is sampled.
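A two-stage sketch of cluster sampling is shown below. The ten schools with thirty students each are invented for illustration, and `cluster_sample` is a hypothetical helper, not a standard library function.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, sample

# Hypothetical population: 10 schools (clusters) with 30 students each.
clusters = {
    f"school_{k}": [f"school_{k}_student_{i}" for i in range(30)]
    for k in range(10)
}

chosen, sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
print("chosen clusters:", chosen)
print("sample size:", len(sample))
```

Students from the seven schools that were not chosen can never appear in the sample, which is one reason the sampling error tends to be larger.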
What are Simple Random Sampling and Random Assignment?
Random sampling and random assignment are commonly confused or used interchangeably, though the terms refer to entirely different processes.
If subjects are selected from the population at random, so that each member of the population has an equal chance of being selected, the procedure is called random sampling. Such a sample tends to be representative of the entire population, so the study's results can be generalized to the population at large. Random assignment is an aspect of experimental design in which study participants are assigned to the treatment or control group using a random procedure. It concerns how participants already in the study are allocated to groups, not how they were selected.
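Random assignment can be sketched by shuffling the participants and splitting them in half. The participant labels and the helper `randomly_assign` are hypothetical, and an even split into two groups is just one possible design.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants, then split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participants who are already enrolled in the study.
participants = [f"participant_{i}" for i in range(20)]
groups = randomly_assign(participants, seed=3)

print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Note that this step says nothing about how the 20 participants were recruited. That is the job of random sampling.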
What is Sampling with Replacement and Without Replacement?
Suppose you pick a card from a deck. You can either put the card aside or put it back into the deck. If you put it back, it may be selected more than once; if you put it aside, it can be selected only once.
When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement.
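The card example can be sketched with Python's standard library: `random.sample` draws without replacement and `random.choices` draws with replacement. The deck encoding (rank followed by a suit letter) is an arbitrary choice made for this illustration.

```python
import random

rng = random.Random(11)  # seeded generator so the sketch is repeatable

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [rank + suit for suit in "SHDC" for rank in ranks]  # 52 cards

# Without replacement: each drawn card is set aside, so no card can repeat.
without_replacement = rng.sample(deck, 5)

# With replacement: each card goes back into the deck, so repeats are possible
# and the sample can even be larger than the population.
with_replacement = rng.choices(deck, k=60)

# Drawing 60 cards without replacement from 52 is impossible.
try:
    rng.sample(deck, 60)
    oversample_allowed = True
except ValueError:
    oversample_allowed = False

print("without replacement:", without_replacement)
print("distinct cards in 60 draws with replacement:", len(set(with_replacement)))
print("can draw 60 without replacement:", oversample_allowed)
```

This is also why a sample can contain more observations than the population only when sampling with replacement.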