Statistics with R
Normal Probability Plot in R using ggplot2
A normal probability plot is a quantile-quantile (Q-Q) plot that compares the quantiles of a dataset with the quantiles of a normal distribution. If the data follow a normal distribution, the points in the plot fall approximately along a straight line; systematic curvature suggests skewness or heavy tails.
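To make the idea concrete, the coordinates behind such a plot can be computed in base R without drawing anything. The following is a minimal sketch on simulated data; the seed and the variable names (`qq`, `theoretical`, `qq_cor`) are illustrative choices, not part of any required workflow.

```r
# Simulated sample; the seed is an arbitrary choice for reproducibility
set.seed(1)
x <- rnorm(100)

# With plot.it = FALSE, qqnorm() returns the plot coordinates instead of drawing:
# $x holds the theoretical normal quantiles, $y the sample values
qq <- qqnorm(x, plot.it = FALSE)

# The theoretical quantiles are qnorm() applied to the plotting positions
n <- length(x)
theoretical <- qnorm(ppoints(n))

# Correlation between the sorted sample and the theoretical quantiles;
# values close to 1 mean the points lie close to a straight line
qq_cor <- cor(theoretical, sort(x))
print(qq_cor)
```

A correlation close to 1 corresponds to points hugging a straight line, which is what the plot lets you judge by eye.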
To create a normal probability plot in R using ggplot2, call ggplot() to set up the plot and add the points with stat_qq(), which computes the theoretical quantiles for you.
Here’s a simple example:
library(ggplot2)

# Create some sample data
x <- rnorm(100)

# Create the normal probability plot
ggplot(data.frame(x), aes(sample = x)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal Probability Plot")
In this example, we first generate a random sample of 100 observations from a standard normal distribution using the rnorm() function. We then create the plot using ggplot() and map x to the sample aesthetic using aes(). We add the points of the normal probability plot using stat_qq() and the reference line using stat_qq_line(). Finally, we add a title using labs(). You can customize the plot further by adding axis labels and adjusting its appearance with other ggplot2 functions.
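As one possible way to apply that customization, the sketch below adds axis labels, changes the point and line appearance, and switches the theme. The colours, labels, seed, and the object name `p` are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the sample is reproducible
df <- data.frame(x = rnorm(100))

p <- ggplot(df, aes(sample = x)) +
  # Semi-transparent points make overlapping observations easier to see
  stat_qq(colour = "steelblue", alpha = 0.7) +
  # Dashed red reference line
  stat_qq_line(colour = "red", linetype = "dashed") +
  labs(
    title = "Normal Probability Plot",
    x = "Theoretical quantiles",
    y = "Sample quantiles"
  ) +
  theme_minimal()

print(p)
```

Storing the plot in a variable before printing it makes it easy to add further layers later or to save it with ggsave().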
Example 2:
Here is another example of a normal probability plot built with the ggplot() and stat_qq() functions from the ggplot2 package. This time we use the built-in mtcars dataset.
library(ggplot2)

# Create a normal probability plot of the
# mpg variable in the mtcars dataset
ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  ggtitle("Normal Probability Plot of MPG in the mtcars Dataset")
This code creates a normal probability plot of the mpg variable in the mtcars dataset and adds a title to the plot. You can customize the plot by adding additional layers, changing the title and axis labels, and adjusting other plot aesthetics.
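If you want to add a reference line to the plot, you can use the stat_qq_line() function, which by default draws a line through the first and third quartiles of the sample and of the theoretical normal distribution. Points from normally distributed data fall close to this line: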
ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Normal Probability Plot of MPG in the mtcars Dataset")
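To see where such a reference line comes from, its slope and intercept can be computed by hand. This sketch assumes the line passes through the points defined by the 25th and 75th percentiles, which is the documented default of stat_qq_line() (line.p = c(0.25, 0.75)); the variable names are illustrative.

```r
library(ggplot2)

# Probabilities that define the two anchor points of the line
probs <- c(0.25, 0.75)
sample_q <- quantile(mtcars$mpg, probs)  # sample quartiles of mpg
theory_q <- qnorm(probs)                 # matching standard normal quartiles

# Straight line through (theory_q[1], sample_q[1]) and (theory_q[2], sample_q[2])
slope <- unname(diff(sample_q) / diff(theory_q))
intercept <- unname(sample_q[1] - slope * theory_q[1])

# Draw the hand-computed line with geom_abline() in place of stat_qq_line()
p <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  geom_abline(intercept = intercept, slope = slope, colour = "red")

print(p)
```

Because the line is anchored at the quartiles rather than fitted to all points, a few extreme values in the tails do not pull it away from the bulk of the data.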
Example 3:
As described above, a normal probability plot, also called a normal Q-Q plot (quantile-quantile plot), is used to assess whether a dataset follows a normal distribution. To build one manually in ggplot2, without stat_qq(), follow these steps:
- Install and load required libraries.
- Create a dataset or load existing data.
- Calculate the theoretical quantiles and sort the data.
- Create a ggplot with the sorted data and theoretical quantiles.
- Add the reference line (the 45-degree line y = x, which is appropriate here because the sample is drawn from a standard normal distribution).
Here’s a complete example:
# Step 1: Install and load required libraries
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(ggplot2)
library(dplyr)

# Step 2: Create a dataset or load existing data
# Here, we generate a random dataset following a normal distribution
set.seed(42)
data <- rnorm(100, mean = 0, sd = 1)

# Step 3: Calculate the theoretical quantiles and sort the data
data <- data.frame(sample = data) %>%
  mutate(rank = rank(sample)) %>%
  arrange(rank) %>%
  mutate(qq = qnorm((rank - 0.5) / length(sample)))

# Step 4: Create a ggplot with the sorted data and theoretical quantiles
normal_probability_plot <- ggplot(data, aes(x = qq, y = sample)) +
  geom_point() +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles") +
  ggtitle("Normal Probability Plot")

# Step 5: Add the reference line (45-degree line) to the plot
normal_probability_plot <- normal_probability_plot +
  geom_abline(
    intercept = 0, slope = 1,
    color = "red", linetype = "dashed"
  ) +
  theme_bw()

# Display the plot
print(normal_probability_plot)
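This code creates a normal probability plot for a sample drawn from a standard normal distribution. If the points lie close to the red dashed 45-degree reference line, the data are consistent with a standard normal distribution. For data with a different mean or standard deviation, the points of a normal sample still fall on a straight line, but not on y = x, so standardize the data first or draw the line with a matching intercept and slope.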
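The 45-degree line only fits when the sample has mean 0 and standard deviation 1. The sketch below shows two ways to handle other data, using a hypothetical sample with mean 50 and standard deviation 10: either keep the raw values and draw the line with intercept mean(raw) and slope sd(raw), or standardize the values and keep the line y = x. The names `raw`, `qq_df`, `p_raw`, and `p_std` are illustrative. For samples of more than 10 observations, qnorm(ppoints(n)) uses the same (rank - 0.5) / n plotting positions as the manual example.

```r
library(ggplot2)

set.seed(42)
raw <- rnorm(100, mean = 50, sd = 10)  # hypothetical non-standard sample

n <- length(raw)
qq_df <- data.frame(
  theoretical = qnorm(ppoints(n)),                  # standard normal quantiles
  sample = sort(raw),                               # sorted raw values
  standardized = sort((raw - mean(raw)) / sd(raw))  # sorted z-scores
)

# Option 1: raw values, reference line with intercept = mean and slope = sd
p_raw <- ggplot(qq_df, aes(x = theoretical, y = sample)) +
  geom_point() +
  geom_abline(intercept = mean(raw), slope = sd(raw),
              colour = "red", linetype = "dashed") +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles")

# Option 2: standardized values, reference line y = x
p_std <- ggplot(qq_df, aes(x = theoretical, y = standardized)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1,
              colour = "red", linetype = "dashed") +
  labs(x = "Theoretical Quantiles", y = "Standardized Sample Quantiles")

print(p_raw)
print(p_std)
```

Both plots show the same point pattern; only the scale of the vertical axis and the reference line differ.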