Pearson Correlation in R Programming
Pearson correlation is a measure of the linear relationship between two variables. It is also known as Pearson’s correlation coefficient or Pearson’s r. The Pearson correlation coefficient is a number between -1 and 1 that indicates the strength and direction of the linear relationship between two variables.
A Pearson correlation coefficient of +1 indicates a perfect positive correlation, meaning that when one variable increases, the other variable increases proportionally. A Pearson correlation coefficient of -1 indicates a perfect negative correlation, meaning that when one variable increases, the other variable decreases proportionally. A Pearson correlation coefficient of 0 indicates no linear correlation between the two variables.
The Pearson correlation coefficient is commonly used in statistical analysis to determine if there is a relationship between two variables. It can also be used to test hypotheses about the strength and direction of the relationship. Pearson correlation assumes that both variables are normally distributed and have a linear relationship. If these assumptions are not met, other correlation coefficients, such as Spearman’s rank correlation coefficient or Kendall’s tau, may be more appropriate.
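To make the definition concrete, Pearson’s r is the covariance of the two variables divided by the product of their standard deviations. The short sketch below uses two made-up vectors to compute r from that definition and checks the result against R’s built-in cor() function:

# Two small made-up vectors
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Pearson's r from its definition: covariance divided by the
# product of the two standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in cor() function should return the same value
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)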
Pearson Correlation Testing in R Programming
In R programming, Pearson correlation testing is used to measure the linear relationship between two continuous variables. The Pearson correlation coefficient (r) ranges from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation, and 0 indicating no correlation.
To conduct Pearson correlation testing in R, you can use the cor() function, which computes the correlation coefficient, and the cor.test() function, which provides additional statistical details, including the p-value and a confidence interval.
Here’s a step-by-step guide to calculating the Pearson correlation coefficient in R:
1. Install and load necessary packages (optional):
# Check if the ggplot2 package is already installed
if (!require("ggplot2", quietly = TRUE)) {
  # If the package is not installed, install it
  install.packages("ggplot2")
}
# Load the package
library(ggplot2)
2. Create two continuous variables:
# Create two continuous variables
variable1 <- c(1, 2, 3, 4, 5)
variable2 <- c(2, 4, 6, 8, 10)
3. Calculate the Pearson correlation coefficient:
# Calculate the correlation coefficient
correlation_coefficient <- cor(variable1, variable2)

# Print the correlation coefficient
print(correlation_coefficient)
4. Perform a Pearson correlation test:
# Perform the Pearson correlation test
correlation_test <- cor.test(variable1, variable2)

# Print the correlation test results
print(correlation_test)
Output
	Pearson's product-moment correlation

data:  variable1 and variable2
t = 82191237, df = 3, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 
5. Visualize the correlation between variables (optional):
# Create a data frame with the variables
data_frame <- data.frame(variable1, variable2)

# Visualize the correlation using a scatter plot
ggplot(data_frame, aes(x = variable1, y = variable2)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter plot of Variable 1 vs Variable 2",
       x = "Variable 1",
       y = "Variable 2")
Replace variable1 and variable2 with your own data, and follow the same steps to perform Pearson correlation testing in R.
Example 2:
Here’s another example:
# Create two vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Compute the correlation coefficient
correlation <- cor(x, y)

# Print the result
print(correlation)
Output
[1] 1
In this example, we created two vectors, x and y, and then computed the correlation coefficient using the cor() function. The output of the function is the correlation coefficient between x and y.
Note that the output is 1, which means that there is a perfect positive correlation between x and y. If the output were -1, it would indicate a perfect negative correlation, and if it were 0, it would indicate no correlation.
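The other two cases described above can be reproduced with small made-up vectors: reversing the direction of y gives a coefficient of -1, while a pattern with no linear trend gives 0. A quick illustrative sketch:

# The same x as in the example above
x <- c(1, 2, 3, 4, 5)

# y decreases as x increases, so the correlation is -1
y_neg <- c(10, 9, 8, 7, 6)
print(cor(x, y_neg))

# A symmetric pattern with no linear trend, so the correlation is 0
y_zero <- c(1, 2, 3, 2, 1)
print(cor(x, y_zero))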
You can also calculate the p-value of the correlation coefficient using the cor.test() function:
# Compute the correlation coefficient and p-value
cor.test(x, y)
Output:
	Pearson's product-moment correlation

data:  x and y
t = Inf, df = 3, p-value = 2.449e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 
In this example, the cor.test() function returns the correlation coefficient, the t-value, the degrees of freedom, and the p-value. The p-value is less than the standard alpha level of 0.05, indicating that the correlation between x and y is statistically significant.
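Because cor.test() returns a list-like htest object, the pieces of this output can also be extracted programmatically rather than read off the printed summary. A minimal sketch, reusing the x and y vectors from above:

# Store the test result instead of only printing it
result <- cor.test(x, y)

# Individual components of the htest object are accessed by name
print(result$estimate)   # correlation coefficient
print(result$p.value)    # p-value
print(result$conf.int)   # 95 percent confidence interval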
Example 3:
Using cor() method
Here’s another example of calculating Pearson correlation using the built-in “mtcars” dataset in R:
# Load the mtcars dataset
data(mtcars)

# Calculate Pearson correlation between the mpg and wt columns
cor(mtcars$mpg, mtcars$wt, method = "pearson")
Output
[1] -0.8676594
The cor() function is used to calculate correlation in R. The first two arguments are the variables to be correlated, and the method parameter is set to “pearson” to compute the Pearson correlation coefficient. In this example, we’re calculating the correlation between the “mpg” (miles per gallon) and “wt” (weight) columns of the mtcars dataset; the output is a single value between -1 and 1 representing the strength and direction of the correlation.
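The cor() function also accepts a whole data frame or numeric matrix, in which case it returns a correlation matrix for every pair of columns. A small sketch on a few mtcars columns, just to illustrate:

# Pairwise Pearson correlations for a few mtcars columns
cor(mtcars[, c("mpg", "wt", "hp")], method = "pearson")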
Using cor.test() method
Here’s an example of using the cor.test() function in R to calculate the Pearson correlation coefficient and its significance level between the “Sepal.Length” and “Petal.Length” variables in the built-in “iris” dataset:
# Load the iris dataset
data(iris)

# Calculate the Pearson correlation coefficient and its significance level
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")
The cor.test() function is used to perform a hypothesis test on the correlation coefficient between two variables. The first two arguments specify the variables to be correlated, and the method parameter is set to “pearson” to calculate the Pearson correlation coefficient.
Output
	Pearson's product-moment correlation

data:  iris$Sepal.Length and iris$Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8270363 0.9055080
sample estimates:
      cor 
0.8717538 
In this example, we’re testing the hypothesis that there is no correlation between “Sepal.Length” and “Petal.Length” in the “iris” dataset. The output will include the Pearson correlation coefficient, the degrees of freedom, the p-value, and a confidence interval for the correlation. The p-value indicates the significance of the correlation coefficient, with lower p-values indicating stronger evidence against the null hypothesis of no correlation.
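As noted earlier, the Pearson test assumes an approximately linear relationship between normally distributed variables. If that assumption is questionable, the rank-based alternatives mentioned above can be run with the same function; a minimal sketch on the same iris columns:

# Spearman's rank correlation as a rank-based alternative
# (R may warn that an exact p-value cannot be computed when there are ties)
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")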