Inference for Two Independent Proportions
Statistical inference for two independent proportions compares the proportions of two unrelated groups to determine whether they differ significantly. This can be done using a hypothesis test or a confidence interval.
Confidence interval:
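General form of a confidence interval: sample statistic ± margin of error, where the margin of error is the critical value multiplied by the standard error.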
1. Calculate the confidence interval using the formula:
(p1 – p2) ± zα/2 * sqrt( (p1(1 – p1)) / n1 + (p2(1 – p2)) / n2 )
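where p1 and p2 are the sample proportions of the two groups (the number of successes in each group divided by its sample size), n1 and n2 are the sample sizes, and zα/2 is the critical value of the standard normal distribution for the desired confidence level.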
2. Interpret the confidence interval. If the interval does not contain zero, then the difference between the two proportions is statistically significant at the corresponding significance level (for example, α = 0.05 for a 95% confidence interval).
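The critical value zα/2 in the formula above depends only on the chosen confidence level. As a minimal sketch, it can be looked up with Python's standard library instead of a table; the helper name `critical_value` is an illustrative assumption, not part of the original text.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Return the two-sided critical value z_(alpha/2) for a confidence level."""
    alpha = 1.0 - confidence_level
    # inv_cdf gives the quantile of the standard normal distribution
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Print the critical values for a few commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

A higher confidence level gives a larger critical value and therefore a wider interval.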
Example of confidence interval for Two Independent Proportions
Here’s an example of a confidence interval for two independent proportions:
Suppose we want to compare the proportion of males and females who own a smartphone. We randomly sample 500 males and 500 females and find that 350 males and 420 females own a smartphone.
To construct a confidence interval for the difference in proportions, we can use the following formula:
CI = (p1 – p2) ± z * sqrt( p1(1 – p1)/n1 + p2(1 – p2)/n2 )
Assuming a 95% confidence level, the critical value of z is 1.96.
Plugging in the values, we get:
CI = (0.70 – 0.84) ± 1.96 * sqrt( 0.70(1 – 0.70)/500 + 0.84(1 – 0.84)/500 )
CI = -0.14 ± 1.96 * 0.0262
CI = -0.14 ± 0.051
CI = (-0.191, -0.089)
Therefore, we can say with 95% confidence that the difference between the proportion of males and the proportion of females who own a smartphone (males minus females) is between -0.191 and -0.089. Because the entire interval lies below zero, the proportion of females who own a smartphone is significantly higher than the proportion of males who own a smartphone.
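The calculation above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the standard library and the unpooled standard error from the confidence interval formula; the function name `two_proportion_ci` and its parameters are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z_(alpha/2) of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Unpooled standard error of the difference in sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin


# Smartphone example: 350 of 500 males and 420 of 500 females
lower, upper = two_proportion_ci(350, 500, 420, 500)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

The function takes raw counts rather than proportions, so rounding the proportions by hand does not affect the result.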
Hypothesis test
1. Check any necessary assumptions and write null and alternative hypotheses.
The two groups that are being compared must be unpaired and unrelated (i.e., independent). To use the normal approximation method a minimum of 10 successes and 10 failures in each group are necessary. The condition for using the normal approximation method for the proportion of successes in a binary outcome variable can be expressed mathematically as follows:
np >= 10 and n(1-p) >= 10
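where n is the sample size, and p is the true proportion of successes in the population. Because p is unknown in practice, the condition is checked with the observed numbers of successes and failures in each group.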
The normal approximation method is justified by the central limit theorem: when the sample is large enough, the sampling distribution of a sample mean or a sample proportion is approximately normal, so a normal distribution can be used to approximate it.
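The success and failure condition can be checked directly from the observed counts. The sketch below is one way to do this; the helper `normal_approximation_ok` is a hypothetical name, and the counts are taken from the smartphone example earlier in this article.

```python
def normal_approximation_ok(successes, n, minimum=10):
    """Check for at least `minimum` successes and `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Observed counts from the smartphone example: (successes, sample size)
groups = {"males": (350, 500), "females": (420, 500)}

for name, (x, n) in groups.items():
    status = "satisfied" if normal_approximation_ok(x, n) else "not satisfied"
    print(f"{name}: condition {status}")
```

The condition has to hold in every group being compared; if it fails for either one, the normal approximation should not be relied on.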
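Below are the possible null and alternative hypothesis pairs:
Two-tailed: H0: p1 = p2 versus Ha: p1 ≠ p2
Right-tailed: H0: p1 = p2 versus Ha: p1 > p2
Left-tailed: H0: p1 = p2 versus Ha: p1 < p2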
2. Calculate the test statistic using the formula:
z = (p1 – p2) / sqrt( p̂(1 – p̂)/n1 + p̂(1 – p̂)/n2 )
where p1 and p2 are the sample proportions of the two groups, n1 and n2 are the sample sizes, x1 and x2 are the numbers of successes in each group, and p̂ is the pooled proportion (p̂ = (x1 + x2) / (n1 + n2)).
3. Calculate the p-value using the standard normal distribution table or statistical software.
4. Compare the p-value to the significance level (α) to determine if the null hypothesis should be rejected or not.
If the p-value is less than α, reject the null hypothesis and conclude that there is a significant difference between the two proportions. Otherwise, fail to reject the null hypothesis.
5. State a “real world” conclusion.
Based on your decision in Step 4, write a conclusion in terms of the original research question.
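Steps 3 and 4 can be sketched in Python as follows. This is an illustrative example rather than a definitive implementation: the function names, the `alternative` labels, and the z value of -1.5 are hypothetical and chosen only to show how the p-value depends on the alternative hypothesis.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """Convert a z test statistic into a p-value under the standard normal."""
    std_normal = NormalDist()
    if alternative == "two-sided":   # Ha: p1 != p2
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":     # Ha: p1 > p2
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # Ha: p1 < p2
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def decide(p_value, alpha=0.05):
    """Compare the p-value to the significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


z = -1.5  # hypothetical test statistic, for illustration only
for alternative in ("two-sided", "greater", "less"):
    p = p_value_from_z(z, alternative)
    print(f"{alternative}: p-value = {p:.4f} -> {decide(p)}")
```

The same test statistic can lead to different p-values depending on the alternative hypothesis, which is why the hypotheses should be written before looking at the data.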
Hypothesis Testing Example – Customers
Suppose we are interested in comparing the proportion of customers who buy a particular product from two different online retailers, A and B. We randomly select 200 customers who visited retailer A and 250 customers who visited retailer B. Among the 200 customers who visited retailer A, 40 bought the product, while among the 250 customers who visited retailer B, 60 bought the product.
Our null hypothesis is that the proportion of customers who buy the product from retailer A is the same as the proportion of customers who buy the product from retailer B. Our alternative hypothesis is that the proportions are different.
We can set up the hypothesis test as follows:
Null hypothesis: p1 = p2 (where p1 is the proportion of customers who buy the product from retailer A, and p2 is the proportion of customers who buy the product from retailer B)
Alternative hypothesis: p1 ≠ p2
We can use the two-proportion z-test to test this hypothesis. The test statistic is:
z = (p1 – p2) / sqrt(p_hat * (1 – p_hat) * (1/n1 + 1/n2))
where p_hat = (x1 + x2) / (n1 + n2) is the pooled sample proportion, x1 and x2 are the number of customers who bought the product from retailer A and B, respectively, and n1 and n2 are the sample sizes.
In our example, p1 = 40/200 = 0.2, p2 = 60/250 = 0.24, p_hat = (40 + 60) / (200 + 250) = 0.222, n1 = 200, and n2 = 250. Plugging these values into the formula, we get:
z = (0.2 – 0.24) / sqrt(0.222 * (1 – 0.222) * (1/200 + 1/250)) = -0.04 / 0.0394 ≈ -1.01
Using a significance level of 0.05 and a two-tailed test, we can find the critical values from the standard normal distribution table. The critical values are ±1.96. Since the calculated test statistic (-1.01) is not in the rejection region (equivalently, the two-sided p-value of about 0.31 is greater than 0.05), we fail to reject the null hypothesis. Therefore, we do not have enough evidence to conclude that there is a difference in the proportion of customers who buy the product from retailer A and retailer B.
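The retailer example can be checked with a short script. The sketch below implements the pooled two-proportion z-test with the standard library only; the function name `two_proportion_z_test` is an assumption made for illustration, and it covers only the two-sided alternative used in this example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion, used because H0 assumes a common proportion
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Retailer A: 40 buyers out of 200; retailer B: 60 buyers out of 250
z, p_value = two_proportion_z_test(40, 200, 60, 250)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the p-value to α and comparing z to the critical values ±1.96 are equivalent decision rules for a two-tailed test at the 0.05 level.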