Inference for Two Independent Means

Inference for two independent means is a statistical method used to determine whether the means of two independent groups differ significantly. If both populations are known to be approximately normally distributed, or if both sample sizes are at least 30, then the sampling distribution of the difference between the sample means can be approximated using the t distribution. If neither condition is met, simulation methods (i.e., bootstrapping or randomization) may be used instead.

Confidence interval:

General form of a confidence interval:

1. Calculate the confidence interval using the formula:

The confidence interval is the difference between the sample means plus or minus the margin of error (ME):

CI = (x̄1 – x̄2) ± ME

where x̄1 and x̄2 are the sample means.

Calculate the margin of error by multiplying the standard error by the critical value t*: ME = t* × SE

Calculate the standard error of the difference between the means using the formula: SE = sqrt[(s1^2 / n1) + (s2^2 / n2)]

where s1 and s2 are the sample standard deviations and n1 and n2 are the sample sizes.

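Determine the degrees of freedom (df) using the formula: df = n1 + n2 – 2. Strictly, this formula belongs to the pooled (equal-variance) procedure; with the unpooled standard error above, the matching choice is the longer df formula given in the hypothesis test section. When the sample sizes are equal and the standard deviations are similar, the two give nearly the same value.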

Determine the critical value from the t-distribution table or a statistical software package based on the desired level of confidence and degrees of freedom.

2. Interpret the confidence interval. The interval gives a range of plausible values for the true difference between the population means at the specified level of confidence. For example, with a 95% confidence interval we are 95% confident that the true population difference lies within the calculated interval. If the interval does not contain zero, then the difference between the two means is statistically significant at the corresponding significance level.
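The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name two_mean_ci and the input values are hypothetical, and the critical value t_star is assumed to be supplied by the caller (from a t-table or software) rather than computed here.

```python
import math

def two_mean_ci(mean1, mean2, s1, s2, n1, n2, t_star):
    """Return (lower, upper) bounds of the CI for mean1 - mean2."""
    # Standard error of the difference between the sample means
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Margin of error: critical value times standard error
    me = t_star * se
    diff = mean1 - mean2
    return diff - me, diff + me

# Hypothetical summary statistics; t_star = 2.0 is an illustrative
# placeholder, not a value looked up for these sample sizes.
lower, upper = two_mean_ci(10.0, 8.0, 2.0, 2.0, 40, 40, t_star=2.0)

# The difference is significant if the interval excludes zero
significant = not (lower <= 0 <= upper)
print(f"CI = ({lower:.2f}, {upper:.2f}), significant: {significant}")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics package could supply it instead.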

Example of a Confidence Interval for Two Independent Means

Let’s say we want to compare the average height of men and women. We take a random sample of 50 men and 50 women and measure their heights. We find that the sample mean height of men is 175 cm and the sample mean height of women is 162 cm. The standard deviation of the sample heights for men is 5 cm and for women is 6 cm.

To calculate the confidence interval for the difference in means, we first need to calculate the standard error:

SE = sqrt((s1^2/n1) + (s2^2/n2))

where s1 and s2 are the standard deviations of the two samples, n1 and n2 are the sample sizes.

SE = sqrt((5^2/50) + (6^2/50)) = sqrt(1.22) ≈ 1.105

Next, we need to find the critical value from the t-distribution. Let’s use a 95% confidence level with 98 degrees of freedom (df = n1 + n2 – 2). We can look up the critical value from a t-table or use a calculator to find it.

The critical value is approximately 1.984.

Now we can plug all the values into the formula:

CI = (175 – 162) ± 1.984 × 1.105 = 13 ± 2.19

So the confidence interval for the difference in means is (10.81, 15.19) cm. This means we are 95% confident that the true difference in mean heights between men and women is between 10.81 cm and 15.19 cm. Since the interval does not include zero, we can conclude that there is a statistically significant difference in mean heights between men and women in the population.
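The arithmetic in this example can be checked with a short script. This is only a sketch: the variable names are illustrative, and the critical value 1.984 is taken from the text above rather than computed.

```python
import math

# Summary statistics from the height example
n1, n2 = 50, 50
mean_men, mean_women = 175.0, 162.0
s_men, s_women = 5.0, 6.0
t_star = 1.984  # critical value quoted in the text (95%, df = 98)

# Standard error of the difference between the sample means
se = math.sqrt(s_men**2 / n1 + s_women**2 / n2)
# Margin of error and interval bounds
me = t_star * se
diff = mean_men - mean_women
lower, upper = diff - me, diff + me

print(f"diff = {diff:.0f}, SE = {se:.3f}, ME = {me:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f}) cm")
```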

Hypothesis test

1. Check any necessary assumptions and write null and alternative hypotheses.

There are two assumptions: (1) the two samples are independent and (2) both populations are approximately normally distributed or both sample sizes are at least 30 (n1 ≥ 30 and n2 ≥ 30). If the second assumption is not met then you can conduct a randomization test.

2. Calculate the test statistic using the formula: t = (x̄1 – x̄2) / sqrt[(s1^2/n1) + (s2^2/n2)]

where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes.

Find the degrees of freedom, df, using the following formula:

df = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1)]

3. Determine the p-value for the test statistic using a t-table or calculator with the degrees of freedom found above. Equivalently, find the critical value at the chosen level of significance.

4. Compare the p-value to the significance level (α) to determine if the null hypothesis should be rejected or not.

If the p-value is less than α, reject the null hypothesis and conclude that there is a significant difference between the two means. Otherwise, fail to reject the null hypothesis.

5. State a “real world” conclusion.

Based on your decision in Step 4, write a conclusion in terms of the original research question.
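Steps 2 through 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name welch_t and the summary statistics are hypothetical, and the decision is made by comparing |t| to a critical value read from a t-table, because the standard library does not provide t-distribution p-values.

```python
import math

def welch_t(mean1, mean2, s1, s2, n1, n2):
    """Return (t, df) for H0: mu1 = mu2 from summary statistics."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Degrees of freedom from the formula in step 2
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for illustration
t, df = welch_t(mean1=52.0, mean2=50.0, s1=4.0, s2=4.0, n1=20, n2=20)

# Two-tailed critical value at alpha = 0.05 for df = 38 (from a t-table)
t_crit = 2.024
reject = abs(t) > t_crit
print(f"t = {t:.3f}, df = {df:.1f}, reject H0: {reject}")
```

With equal sample sizes and equal standard deviations, as in this illustration, the df formula reduces to n1 + n2 – 2.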

Hypothesis Testing Example – School

Suppose we want to test whether there is a difference in the average scores on a math test between two different schools, School A and School B. We randomly select 50 students from School A and 50 students from School B and give them the same math test. The null hypothesis is that there is no difference in the average scores between the two schools, and the alternative hypothesis is that there is a difference.

  • H0: μA = μB (there is no difference in the mean scores between School A and School B)
  • Ha: μA ≠ μB (there is a difference in the mean scores between School A and School B)

We can use a significance level of 0.05, which means we are willing to accept a 5% chance of making a type I error (rejecting the null hypothesis when it is actually true).

We can use a two-sample t-test to test the hypothesis. The formula for the test statistic is:

t = (x̄1 – x̄2 – 0) / sqrt(s1^2/n1 + s2^2/n2)

where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes.

We calculate the sample means and standard deviations:

x̄1 = 80, s1 = 10; x̄2 = 75, s2 = 8

We plug in the values to get:

t = (80 – 75 – 0) / sqrt(10^2/50 + 8^2/50) = 5 / sqrt(3.28) ≈ 2.76

We compare the test statistic to the critical value from the t-distribution with 98 degrees of freedom (df = n1 + n2 – 2) at a significance level of 0.05. Because this is a two-tailed test, we split the significance level evenly between the two tails (0.025 in each), which gives critical values of ±1.984.

Since the absolute value of our calculated t-value (2.76) is greater than the critical value (1.984), we reject the null hypothesis. This means we can conclude that there is a statistically significant difference in the mean math scores between School A and School B.
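The test statistic for this example can be recomputed with a short script. This is a sketch only: the variable names are illustrative and the critical value 1.984 is the one quoted in the text. The script also evaluates the longer df formula from step 2 for comparison with df = 98.

```python
import math

# Summary statistics from the school example
n1, n2 = 50, 50
mean1, s1 = 80.0, 10.0
mean2, s2 = 75.0, 8.0
t_crit = 1.984  # two-tailed critical value quoted in the text (df = 98)

v1, v2 = s1**2 / n1, s2**2 / n2
# Test statistic under H0: mu1 - mu2 = 0
t = (mean1 - mean2 - 0) / math.sqrt(v1 + v2)
# Degrees of freedom from the step 2 formula, for comparison with 98
df_step2 = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

reject = abs(t) > t_crit
print(f"t = {t:.2f}, step 2 df = {df_step2:.1f}, reject H0: {reject}")
```

Here the step 2 formula gives a df slightly below 98, which raises the critical value only marginally and leaves the conclusion unchanged.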
