Introduction to the F Distribution
This article covers the F distribution (also called the Fisher–Snedecor distribution), which is widely used in analysis of variance (ANOVA) tests and is well worth knowing if you are a data scientist.
Earlier you learned about the z and t distributions. You computed z and t test statistics and used those values to look up p-values using statistical software. Similarly, in this lesson you are going to compute F test statistics. The F test statistic can be used to determine the p-value for a one-way ANOVA.
F-Distribution Simply Explained
The F distribution is a probability distribution used in statistics, particularly in analysis of variance (ANOVA) and regression analysis. It is named in honor of Sir Ronald A. Fisher, whose work on the variance ratio underlies it, and it is also known as the Fisher–Snedecor distribution after George W. Snedecor.
The F distribution has two degrees-of-freedom parameters, which together determine its shape: the numerator degrees of freedom (df1) and the denominator degrees of freedom (df2).
In ANOVA, the numerator degrees of freedom correspond to the number of groups minus one (k – 1), while the denominator degrees of freedom correspond to the total sample size minus the number of groups (n – k).
The F statistic is calculated as the ratio of the between-group variance (the mean square between groups) to the within-group variance (the mean square within groups). Under the null hypothesis, it follows the F distribution with df1 and df2 degrees of freedom.
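To make the ratio concrete, here is a minimal sketch that computes a one-way ANOVA F statistic by hand using only the Python standard library. The function name `one_way_anova_f` and the three small groups of measurements are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df1, df2) for a list of groups of numeric observations."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df1 = k - 1                          # numerator degrees of freedom
    df2 = n - k                          # denominator degrees of freedom
    ms_between = ss_between / df1        # between-group variance (mean square)
    ms_within = ss_within / df2          # within-group variance (mean square)
    return ms_between / ms_within, df1, df2


# Hypothetical data: three groups with three observations each.
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with df1 = {df1}, df2 = {df2}")
```

For this toy data the group means are 5, 7 and 9, so the between-group mean square is 24 / 2 = 12 and the within-group mean square is 6 / 6 = 1, giving F = 12 on (2, 6) degrees of freedom.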
The F distribution is right-skewed: most of its probability mass sits at small values, with a long tail extending to the right. It is also non-negative, because the F statistic is a ratio of two variances, and variances cannot be negative.
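The shape described here can be checked with a small simulation. The sketch below draws F values as the ratio of two independent chi-square variables, each divided by its degrees of freedom, which is the standard construction of the F distribution. The choice of df1 = 3, df2 = 20, the seed, and the helper name `sample_f` are arbitrary assumptions for illustration.

```python
import random
from statistics import mean, median


def sample_f(df1, df2, rng):
    """Draw one F(df1, df2) value as a ratio of two scaled chi-square draws."""
    # A chi-square variable with d degrees of freedom is a sum of d squared
    # standard normal draws.
    chi2_num = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df1))
    chi2_den = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df2))
    return (chi2_num / df1) / (chi2_den / df2)


rng = random.Random(0)                   # fixed seed so the run is repeatable
draws = [sample_f(3, 20, rng) for _ in range(20000)]

print("smallest draw is non-negative:", min(draws) >= 0)
print("sample mean:  ", round(mean(draws), 3))
print("sample median:", round(median(draws), 3))
```

A sample mean above the sample median is the usual sign of right skew: a few large values in the right tail pull the mean up. For df2 > 2 the theoretical mean is df2 / (df2 - 2), which is about 1.11 here.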
The F distribution is used in hypothesis testing, for example in one-way ANOVA, where it helps determine whether the means of several groups differ. If the F statistic is larger than the critical value from the F distribution at the chosen significance level, the null hypothesis that all group means are equal is rejected, and we conclude that at least one group mean differs from the others.
Example of how the F-distribution can be used
Here’s an example of how the F-distribution can be used in practice:
Suppose we want to test whether the variance of the weight of apples produced by two different orchards is the same. We collect a random sample of n1=20 apples from orchard 1 and n2=15 apples from orchard 2, and we compute the sample variances s1^2=4.5 and s2^2=6.2, respectively.
We then calculate the F-statistic:
F = s1^2 / s2^2
Assuming that the weights of the apples are normally distributed, we can use the F-distribution to test the null hypothesis that the two variances are equal.
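We choose a significance level α, say 0.05, and compute the corresponding critical value of the F-distribution with degrees of freedom (df1=n1-1, df2=n2-1). If our calculated F-statistic is larger than this upper-tail critical value, we reject the null hypothesis. Note that comparing only against the upper-tail critical value is a one-sided test of whether orchard 1 has the larger variance. To test for a difference in either direction, place the larger sample variance in the numerator and use α/2, or compare F against both the lower and upper critical values at α/2.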
Plugging these values into the formula for the F-statistic, we get:
F = 4.5 / 6.2
F ≈ 0.7258
We then compare this calculated F-statistic to the critical value at the chosen significance level (α). With n1=20 and n2=15, the degrees of freedom for this example are df1=19 and df2=14, and the critical value can be found from a table of F-distribution critical values or with statistical software.
Assuming a significance level of α=0.05, the upper-tail critical value for an F-distribution with df1=19 and df2=14 is approximately 2.40.
Since the calculated F-statistic (0.7258) is smaller than the critical value (2.40), we fail to reject the null hypothesis: we do not have enough evidence to say that the weights of apples produced by the two orchards have different variances.
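The apple example can be reproduced in software rather than with a printed table. The sketch below assumes SciPy is installed and uses `scipy.stats.f` for the critical values and tail probabilities. The two-sided critical values and p-value are an addition not worked out in the text above, included because the alternative "the variances are not equal" covers both directions.

```python
from scipy import stats

# Values from the apple example.
n1, n2 = 20, 15
s1_sq, s2_sq = 4.5, 6.2
alpha = 0.05

f_stat = s1_sq / s2_sq
df1, df2 = n1 - 1, n2 - 1

# Upper-tail critical value, as used in the one-sided comparison.
upper_crit = stats.f.ppf(1 - alpha, df1, df2)

# Two-sided version: put alpha/2 in each tail.
lower_two = stats.f.ppf(alpha / 2, df1, df2)
upper_two = stats.f.ppf(1 - alpha / 2, df1, df2)
reject_two_sided = f_stat < lower_two or f_stat > upper_two

# Two-sided p-value: double the smaller tail probability, capped at 1.
p_two_sided = min(1.0, 2 * min(stats.f.cdf(f_stat, df1, df2),
                               stats.f.sf(f_stat, df1, df2)))

print(f"F = {f_stat:.4f}, df1 = {df1}, df2 = {df2}")
print(f"upper-tail critical value (alpha = {alpha}): {upper_crit:.3f}")
print(f"two-sided critical values: ({lower_two:.3f}, {upper_two:.3f})")
print(f"two-sided p-value: {p_two_sided:.3f}")
print("reject H0 (two-sided):", reject_two_sided)
```

Because the F-statistic falls between the two-sided critical values, this version reaches the same conclusion as the text: there is not enough evidence that the variances differ.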