Normal Probability Plot in R using ggplot2

A normal probability plot is a quantile-quantile (Q-Q) plot that compares a set of data to a normal distribution. It plots the ordered sample values against the quantiles expected under a normal distribution. If the data follow a normal distribution, the points fall approximately along a straight line.
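To make the definition concrete, here is a minimal base R sketch of the two sets of values such a plot compares. The seed, the sample, and the variable names are illustrative assumptions, and ppoints() is just one common choice of plotting positions.

```r
# Base R only: compute the two coordinates a normal probability plot displays.
set.seed(1)                       # illustrative seed, not from the article
x <- rnorm(100)                   # illustrative sample

n        <- length(x)
sample_q <- sort(x)               # ordered sample values (vertical axis)
theo_q   <- qnorm(ppoints(n))     # standard normal quantiles (horizontal axis)

# For normally distributed data the two sets of quantiles are
# very nearly linearly related, so the correlation is close to 1.
r <- cor(theo_q, sample_q)
print(r)
```

A correlation near 1 is the numeric counterpart of the points lying along a straight line.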

To create a normal probability plot in R using ggplot2, start the plot with ggplot(), map your variable to the sample aesthetic, and add the points with stat_qq().

Here’s a simple example:

library(ggplot2)

# Create some sample data
x <- rnorm(100)

# Create the normal probability plot
ggplot(data.frame(x), aes(sample = x)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal Probability Plot")

In this example, we first generate a random sample of 100 observations from a normal distribution using the rnorm() function. We then create the plot using ggplot() and map the data to the sample aesthetic using aes(). We add the points using stat_qq() and the reference line using stat_qq_line(). Finally, we add a title using labs(). You can customize the plot further by adding axis labels and adjusting its appearance with other ggplot2 functions such as labs() and theme().
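As a rough sketch of that kind of customization, the variation below adds axis labels, changes the point and line styling, and applies a built-in theme. The seed, colours, label text, and the names df and p are illustrative choices, not requirements.

```r
library(ggplot2)

set.seed(123)                          # illustrative seed
df <- data.frame(x = rnorm(100))       # illustrative sample data

p <- ggplot(df, aes(sample = x)) +
  stat_qq(colour = "steelblue", size = 2) +   # styled points
  stat_qq_line(linetype = "dashed") +         # dashed reference line
  labs(
    title = "Normal Probability Plot",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  ) +
  theme_minimal()

print(p)
```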

Example 2:

Here is another example of a normal probability plot built with ggplot() and stat_qq() from the ggplot2 package. This time we use the built-in mtcars dataset.

library(ggplot2)

# Create a normal probability plot of the
# mpg variable in the mtcars dataset
ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  ggtitle("Normal Probability Plot of MPG in the mtcars Dataset")

This code will create a normal probability plot of the mpg variable in the mtcars dataset and add a title to the plot. You can customize the plot by adding additional layers, changing the title and axis labels, and adjusting other plot aesthetics.

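If you want to add a reference line to the plot, you can use the stat_qq_line() function. By default it draws a line through the first and third quartiles of the sample and theoretical quantiles, and points that follow it closely are consistent with a normal distribution: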

ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line() +
  ggtitle("Normal Probability Plot of MPG in the mtcars Dataset")
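To see where such a reference line comes from, the sketch below computes a slope and intercept by hand in base R, assuming the line passes through the first and third quartiles (the documented default of stat_qq_line(), line.p = c(0.25, 0.75)). The variable names are illustrative.

```r
# Probabilities assumed for the reference line (first and third quartiles)
probs <- c(0.25, 0.75)

y_q <- unname(quantile(mtcars$mpg, probs))  # sample quartiles of mpg
x_q <- qnorm(probs)                         # standard normal quartiles

# Line through the two (theoretical, sample) quartile points
slope     <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

cat("slope:", slope, "intercept:", intercept, "\n")
```

Passing these values to geom_abline(intercept = intercept, slope = slope) should draw essentially the same line, which can be handy when you want to report the line's parameters.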

Example 3:

As shown above, a normal probability plot is used to assess whether a dataset follows a normal distribution. You can also build one by hand, computing the theoretical quantiles yourself instead of relying on stat_qq(). To do so, follow these steps:

1. Install and load required libraries.
2. Create a dataset or load existing data.
3. Calculate the theoretical quantiles and sort the data.
4. Create a ggplot with the sorted data and theoretical quantiles.
5. Add the reference line (45-degree line) to the plot.

Here’s a complete example:

# Step 1: Install and load required libraries
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(ggplot2)
library(dplyr)

# Step 2: Create a dataset or load existing data
# Here, we generate a random dataset following a normal distribution
set.seed(42)
data <- rnorm(100, mean = 0, sd = 1)

# Step 3: Calculate the theoretical quantiles and sort the data
data <- data.frame(sample = data) %>%
  mutate(rank = rank(sample)) %>%
  arrange(rank) %>%
  mutate(qq = qnorm((rank - 0.5) / length(sample)))

# Step 4: Create a ggplot with the sorted data and theoretical quantiles
normal_probability_plot <- ggplot(data, aes(x = qq, y = sample)) +
  geom_point() +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles") +
  ggtitle("Normal Probability Plot")

# Step 5: Add the reference line (45-degree line) to the plot
normal_probability_plot <- normal_probability_plot +
  geom_abline(
    intercept = 0,
    slope = 1,
    color = "red",
    linetype = "dashed"
  ) +
  theme_bw()

# Display the plot
print(normal_probability_plot)

This code creates a normal probability plot for a sample drawn from a standard normal distribution (mean 0, standard deviation 1). Because the sample is on the same scale as the theoretical quantiles, points lying close to the dashed red 45-degree line suggest that the data are approximately normally distributed.
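For data on a different scale, a line with intercept 0 and slope 1 would miss the points even when the data are perfectly normal. One possible adjustment, sketched below, is to use the sample mean as the intercept and the sample standard deviation as the slope. The mean of 50, standard deviation of 10, and the names raw, qq_df, and p are illustrative assumptions, and ppoints() is used here for the plotting positions.

```r
library(ggplot2)

set.seed(42)
raw <- rnorm(100, mean = 50, sd = 10)   # illustrative data, not standard normal

n <- length(raw)
qq_df <- data.frame(
  qq     = qnorm(ppoints(n)),           # theoretical standard normal quantiles
  sample = sort(raw)                    # ordered sample values
)

p <- ggplot(qq_df, aes(x = qq, y = sample)) +
  geom_point() +
  # Reference line scaled to the data: y = mean + sd * theoretical quantile
  geom_abline(
    intercept = mean(raw),
    slope = sd(raw),
    color = "red",
    linetype = "dashed"
  ) +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_bw()

print(p)
```

An alternative is to standardize the sample first with scale() and keep the 45-degree line.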
