One-way ANOVA hypothesis test

A one-way between-groups ANOVA is used to compare the means of two or more independent groups. A one-way between-groups ANOVA comparing just two groups gives the same results as the independent t test that you learned in the previous article. We will use the step-by-step hypothesis testing procedure again in this lesson.

How to conduct a one-way ANOVA analysis

One-way ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups that are independent of each other. It helps to determine whether there is a significant difference among the means of the groups.

Here’s how to conduct a one-way ANOVA analysis:

Step 1: Check assumptions and write hypotheses

The assumptions for a one-way between groups ANOVA are:

  1. Samples are independent
  2. The response variable is approximately normally distributed for each group or all group sample sizes are at least 30
  3. The population variances are equal across groups (if the largest sample standard deviation is no more than twice the smallest sample standard deviation, assume that the population variances are equal)

The null hypothesis (H0) for one-way ANOVA states that the population means of all groups are equal. The alternative hypothesis (Ha) states that at least one group mean differs from the others.

Mathematically, this can be expressed as:

H0: μ1 = μ2 = μ3 = … = μk

where μ1, μ2, μ3, …, μk represent the population means of the k groups.

The alternative hypothesis is that at least one of the group means is different from the others. This can be expressed as:

Ha: At least one of the population means is different from the others, or alternatively:

Ha: μi ≠ μj for at least one pair of i and j, where i and j are integers between 1 and k and i ≠ j.

Step 2: Choose a Significance Level

Choose a significance level (alpha) to determine whether the null hypothesis should be rejected. The most common value for alpha is 0.05.

Step 3: Collect Data

Collect data from three or more independent groups. The response variable should be continuous and approximately normally distributed within each group.

Step 4: Calculate the Sum of Squares (SS)

Calculate the sum of squares (SS) for between groups (SSB) and within groups (SSW). The SSB represents the variation between the groups, while the SSW represents the variation within the groups.

Step 5: Calculate the Degrees of Freedom (DF)

Calculate the degrees of freedom (DF) for between groups (DFB) and within groups (DFW).

DFB = k-1, where k is the number of groups. DFW = N – k, where N is the total number of observations.

Step 6: Calculate the Mean Squares (MS)

Calculate the mean squares (MS) for between groups (MSB) and within groups (MSW).

MSB = SSB/DFB

MSW = SSW/DFW

Step 7: Calculate the F-Value

Calculate the F-value by dividing the MSB by the MSW.

F = MSB/MSW

The formulas above cover all of the calculations, but you will not be responsible for performing them by hand.
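Steps 4 through 7 can be sketched in a few lines of Python with NumPy. The three groups below are hypothetical data, used only to illustrate the arithmetic:

```python
import numpy as np

# Hypothetical data: three independent groups (illustration only)
groups = [
    np.array([4.0, 5.0, 6.0, 5.5]),
    np.array([6.5, 7.0, 8.0, 7.5]),
    np.array([5.0, 5.5, 4.5, 6.0]),
]

k = len(groups)                   # number of groups
N = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Step 4: sums of squares between (SSB) and within (SSW) groups
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Step 5: degrees of freedom
dfb, dfw = k - 1, N - k

# Step 6: mean squares
msb, msw = ssb / dfb, ssw / dfw

# Step 7: F-value
f_value = msb / msw
print(f"SSB={ssb:.2f}, SSW={ssw:.2f}, F={f_value:.2f}")
```

A useful arithmetic check: SSB + SSW always equals the total sum of squares of all observations around the grand mean.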

Step 8: Determine the p-value

Use the F-distribution table or statistical software to determine the p-value associated with the F-value.

Step 9: Make a Decision

Compare the p-value with the chosen significance level (alpha). If the p-value is less than alpha, reject the null hypothesis and conclude that there is a significant difference between the means of the groups. Otherwise, fail to reject the null hypothesis.
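To make Steps 8 and 9 concrete, here is a minimal sketch using SciPy's F distribution. The F-value and degrees of freedom below are placeholder numbers, not results from this lesson:

```python
from scipy import stats

f_value = 10.92   # placeholder F-value from some ANOVA
dfb, dfw = 2, 9   # placeholder between- and within-group degrees of freedom
alpha = 0.05

# Step 8: the p-value is the right-tail area of the F distribution
p_value = stats.f.sf(f_value, dfb, dfw)

# Step 9: compare the p-value with alpha
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```

`stats.f.sf` is the survival function (1 minus the CDF), which is exactly the upper-tail probability an F table gives you.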

Example: Weight gain of rats

Here is an example of conducting a one-way ANOVA analysis:

Suppose a researcher wants to compare the average weight gain of rats that were fed one of three different diets (diet A, diet B, and diet C). The researcher collects weight gain data from 10 rats that were fed each diet and wants to determine if there is a significant difference in weight gain among the three diets.

Suppose the weight gain data for the three diets are as follows:

  • Diet A: 8, 12, 9, 10, 11, 7, 13, 10, 9, 12
  • Diet B: 6, 4, 9, 5, 8, 7, 3, 6, 5, 4
  • Diet C: 10, 11, 13, 9, 12, 8, 14, 11, 12, 13

Step 1: Define your hypothesis.

The null hypothesis is that the means of weight gain for rats on all three diets are equal, while the alternative hypothesis is that at least one diet’s mean weight gain is different from the others.

Null hypothesis: H0: μA = μB = μC

Alternative hypothesis: H1: At least one μi is different

Step 2: Collect data.

Collect weight gain data from 30 rats (10 rats on each diet).

Step 3: Calculate the mean and variance.

Calculate the mean and variance of weight gain for each diet.

The mean and variance for each diet are:

  • Diet A: Mean = 10.1, Variance = 3.66
  • Diet B: Mean = 5.7, Variance = 3.57
  • Diet C: Mean = 11.3, Variance = 3.57

Step 4: Calculate the sum of squares.

Calculate the within-group sum of squares for each diet, the between-groups sum of squares, and the total sum of squares.

The within-group sums of squares are:

  • Diet A: SS(A) = (8-10.1)² + (12-10.1)² + (9-10.1)² + (10-10.1)² + (11-10.1)² + (7-10.1)² + (13-10.1)² + (10-10.1)² + (9-10.1)² + (12-10.1)² = 32.9
  • Diet B: SS(B) = (6-5.7)² + (4-5.7)² + (9-5.7)² + (5-5.7)² + (8-5.7)² + (7-5.7)² + (3-5.7)² + (6-5.7)² + (5-5.7)² + (4-5.7)² = 32.1
  • Diet C: SS(C) = (10-11.3)² + (11-11.3)² + (13-11.3)² + (9-11.3)² + (12-11.3)² + (8-11.3)² + (14-11.3)² + (11-11.3)² + (12-11.3)² + (13-11.3)² = 32.1
  • Within groups (error): SSW = SS(A) + SS(B) + SS(C) = 32.9 + 32.1 + 32.1 = 97.1

The grand mean of all 30 observations is 271/30 ≈ 9.03. The between-groups (treatment) sum of squares weights each squared deviation of a group mean from the grand mean by the group size:

  • Between groups: SSB = 10(10.1 – 9.03)² + 10(5.7 – 9.03)² + 10(11.3 – 9.03)² ≈ 173.87
  • Total: SST = SSB + SSW ≈ 173.87 + 97.1 ≈ 270.97

Step 5: Calculate the degrees of freedom.

Calculate the degrees of freedom for between groups, within groups, and in total.

The degrees of freedom are:

  • Between groups: dfbetween = k – 1 = 3 – 1 = 2, where k is the number of groups
  • Within groups: dfwithin = N – k = 30 – 3 = 27 (each diet contributes n – 1 = 9, and 9 + 9 + 9 = 27)
  • Total: dftotal = N – 1 = 30 – 1 = 29

Step 6: Calculate the F-statistic.

Calculate the F-statistic using the sums of squares and degrees of freedom. The F-statistic is calculated as:

F = (MSbetween / MSwithin)

    = (SStreatment / dfbetween) / (SSerror / dfwithin)

where MSbetween is the mean square between groups, MSwithin is the mean square within groups, SStreatment is the sum of squares between groups (also called the treatment sum of squares), dfbetween is the degrees of freedom between groups, SSerror is the sum of squares within groups (also called the error sum of squares), and dfwithin is the degrees of freedom within groups.

To calculate the F-statistic, we first need to calculate the mean square between groups and the mean square within groups:

  • Mean square between groups:

MSbetween = SSB / dfbetween = 173.87 / 2 = 86.93

  • Mean square within groups:

MSwithin = SSW / dfwithin = 97.1 / 27 ≈ 3.60

Now we can calculate the F-statistic:

F = MSbetween / MSwithin = 86.93 / 3.60 ≈ 24.17

Step 7: Determine the p-value.

Determine the p-value associated with the F-statistic using an F table or statistical software. Here the p-value is well below 0.001.

Step 8: Draw a conclusion.

Compare the p-value with the level of significance (alpha). With alpha set at 0.05, the p-value is less than alpha, so we reject the null hypothesis and conclude that there is a significant difference in weight gain among the three diets.

In conclusion, based on the one-way ANOVA, we can reject the null hypothesis and conclude that there is a significant difference in weight gain among the rats fed the three different diets.
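As a check on the hand calculation, the whole test can be reproduced with a single call to SciPy's `f_oneway`:

```python
from scipy import stats

diet_a = [8, 12, 9, 10, 11, 7, 13, 10, 9, 12]
diet_b = [6, 4, 9, 5, 8, 7, 3, 6, 5, 4]
diet_c = [10, 11, 13, 9, 12, 8, 14, 11, 12, 13]

# One-way ANOVA across the three diets
f_stat, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

The p-value comes out far below 0.05, matching the decision to reject the null hypothesis.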

Post-hoc testing

Post-hoc testing refers to a set of statistical tests that are used to determine which groups in a study differ significantly from each other after a significant result has been obtained from an omnibus test like ANOVA (Analysis of Variance).

The purpose of post-hoc testing is to identify which pairs of groups have statistically significant differences in means, which can help to further interpret the results of the omnibus test. There are several different methods for post-hoc testing, including Tukey’s HSD (Honestly Significant Difference) test, Bonferroni correction, and Scheffe’s method, among others.

Tukey’s HSD test is one of the most commonly used post-hoc tests, and it involves calculating a confidence interval for the difference between the means of each pair of groups. If the confidence interval does not include zero, then the difference between the means is considered statistically significant. Bonferroni correction involves adjusting the significance level (usually 0.05) by dividing it by the number of comparisons being made, which helps to reduce the likelihood of false positives. Scheffé’s method is a more conservative post-hoc test that controls the error rate across all possible contrasts among the group means, not just the pairwise comparisons.
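SciPy (version 1.8 or later) implements Tukey's HSD directly; here it is applied to the rat-diet data from the example above:

```python
from scipy import stats

diet_a = [8, 12, 9, 10, 11, 7, 13, 10, 9, 12]
diet_b = [6, 4, 9, 5, 8, 7, 3, 6, 5, 4]
diet_c = [10, 11, 13, 9, 12, 8, 14, 11, 12, 13]

# Pairwise comparisons with Tukey's HSD (requires SciPy >= 1.8)
result = stats.tukey_hsd(diet_a, diet_b, diet_c)
print(result)  # table of pairwise differences, adjusted p-values, and CIs

# result.pvalue[i][j] holds the adjusted p-value for groups i and j
```

For these data, diet B differs significantly from both A and C, while A and C do not differ significantly from each other at the 0.05 level.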

It’s important to note that post-hoc testing is only appropriate if the omnibus test (such as ANOVA) has yielded a significant result. Conducting multiple post-hoc tests can increase the likelihood of false positives, so it’s important to choose an appropriate method and adjust the significance level as needed to reduce the risk of making Type I errors.

Frequently asked questions about one-way ANOVA

What is the difference between a one-way and a two-way ANOVA?

A one-way ANOVA is a statistical test used to compare the means of two or more independent groups, based on a single independent variable (or factor) with two or more levels. The main purpose of a one-way ANOVA is to determine whether there is a significant difference among the means of the groups.

On the other hand, a two-way ANOVA is a statistical test used to analyze the effects of two independent variables (or factors) on a dependent variable. It is used to determine whether there is a significant interaction between the two factors, as well as the main effects of each factor on the dependent variable.

In a two-way ANOVA, the dependent variable is still a continuous variable, and the independent variables are categorical variables, each with two or more levels. The two independent variables are called “factors”, and they can either be crossed or nested.

A crossed design means that all combinations of the two factors are present in the study, while a nested design means that one factor is nested within the other. The main difference between one-way and two-way ANOVA is that the latter takes into account the effect of two independent variables on the dependent variable, while the former only considers the effect of a single independent variable.

In summary, a one-way ANOVA is used when you want to compare the means of two or more independent groups based on a single independent variable, while a two-way ANOVA is used when you want to analyze the effects of two independent variables on a dependent variable.

What is a factorial ANOVA?

A factorial ANOVA is a statistical test used to analyze the effects of two or more independent variables (or factors) on a dependent variable. It generalizes the two-way ANOVA: instead of exactly two independent variables, there can be two or more, and each independent variable can have two or more levels.

A factorial ANOVA allows researchers to examine the effect of each independent variable on the dependent variable while controlling for the effects of the other independent variables. The interaction effect between the independent variables can also be examined.

For example, suppose a researcher is interested in how both age and gender influence test scores. Age could have three levels (young, middle-aged, and old), while gender could have two levels (male and female). The researcher could use a 3 x 2 factorial ANOVA to examine the main effects of age and gender and the interaction effect of age and gender on test scores.
