Pearson Correlation in R Programming
Pearson correlation is a measure of the linear relationship between two variables. It is also known as Pearson’s correlation coefficient or Pearson’s r. The Pearson correlation coefficient is a number between -1 and 1 that indicates the strength and direction of the linear relationship between two variables.
A Pearson correlation coefficient of +1 indicates a perfect positive correlation, meaning that when one variable increases, the other variable increases proportionally. A Pearson correlation coefficient of -1 indicates a perfect negative correlation, meaning that when one variable increases, the other variable decreases proportionally. A Pearson correlation coefficient of 0 indicates no linear correlation between the two variables.
The Pearson correlation coefficient is commonly used in statistical analysis to determine if there is a relationship between two variables. It can also be used to test hypotheses about the strength and direction of the relationship. Pearson correlation assumes that both variables are normally distributed and have a linear relationship. If these assumptions are not met, other correlation coefficients, such as Spearman’s rank correlation coefficient or Kendall’s tau, may be more appropriate.
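To make the definition concrete, Pearson’s r is the covariance of the two variables divided by the product of their standard deviations. The short sketch below uses two made-up vectors to compute r from that definition and checks the result against R’s built-in cor() function:

# Two small made-up vectors
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Pearson's r from its definition: covariance divided by the
# product of the two standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in cor() function should return the same value
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)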
Pearson Correlation Testing in R Programming
In R programming, Pearson correlation testing is used to measure the linear relationship between two continuous variables. The Pearson correlation coefficient (r) ranges from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation, and 0 indicating no correlation.
To conduct Pearson correlation testing in R, you can use the cor() function, which computes the correlation coefficient, and the cor.test() function, which provides additional statistical details, including the p-value and a confidence interval.
Here’s a step-by-step guide to calculating the Pearson correlation coefficient in R:
1. Install and load necessary packages (optional):
# Check if the ggplot2 package is already installed
if (!require("ggplot2", quietly = TRUE)) {
  # If the package is not installed, install it
  install.packages("ggplot2")
}
# Load the package
library(ggplot2)
2. Create two continuous variables:
# Create two continuous variables
variable1 <- c(1, 2, 3, 4, 5)
variable2 <- c(2, 4, 6, 8, 10)
3. Calculate the Pearson correlation coefficient:
# Calculate the correlation coefficient
correlation_coefficient <- cor(variable1, variable2)

# Print the correlation coefficient
print(correlation_coefficient)
4. Perform a Pearson correlation test:
# Perform the Pearson correlation test
correlation_test <- cor.test(variable1, variable2)

# Print the correlation test results
print(correlation_test)
Output
	Pearson's product-moment correlation

data:  variable1 and variable2
t = 82191237, df = 3, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 
5. Visualize the correlation between variables (optional):
# Create a data frame with the variables
data_frame <- data.frame(variable1, variable2)

# Visualize the correlation using a scatter plot
ggplot(data_frame, aes(x = variable1, y = variable2)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter plot of Variable 1 vs Variable 2",
       x = "Variable 1",
       y = "Variable 2")
Replace variable1 and variable2 with your own data, and follow the same steps to perform Pearson correlation testing in R.
Example 2:
Here’s another example:
# Create two vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Compute the correlation coefficient
correlation <- cor(x, y)

# Print the result
print(correlation)
Output
[1] 1
In this example, we created two vectors, x and y, and then computed the correlation coefficient using the cor() function. The output of the function is the correlation coefficient between x and y.
Note that the output is 1, which means that there is a perfect positive correlation between x and y. If the output were -1, it would indicate a perfect negative correlation, and if it were 0, it would indicate no correlation.
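The other two cases described above can be reproduced with small made-up vectors: reversing the direction of y gives a coefficient of -1, while a pattern with no linear trend gives 0. A quick illustrative sketch:

# The same x as in the example above
x <- c(1, 2, 3, 4, 5)

# y decreases as x increases, so the correlation is -1
y_neg <- c(10, 9, 8, 7, 6)
print(cor(x, y_neg))

# A symmetric pattern with no linear trend, so the correlation is 0
y_zero <- c(1, 2, 3, 2, 1)
print(cor(x, y_zero))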
You can also calculate the p-value of the correlation coefficient using the cor.test() function:
# Compute the correlation coefficient and p-value
cor.test(x, y)
Output:
	Pearson's product-moment correlation

data:  x and y
t = Inf, df = 3, p-value = 2.449e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 
In this example, the cor.test() function returns the correlation coefficient, the t-value, the degrees of freedom, and the p-value. The p-value is less than the standard alpha level of 0.05, indicating that the correlation between x and y is statistically significant.
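Because cor.test() returns a list-like htest object, the pieces of this output can also be extracted programmatically rather than read off the printed summary. A minimal sketch, reusing the x and y vectors from above:

# Store the test result instead of only printing it
result <- cor.test(x, y)

# Individual components of the htest object are accessed by name
print(result$estimate)   # correlation coefficient
print(result$p.value)    # p-value
print(result$conf.int)   # 95 percent confidence interval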
Example 3:
Using cor() method
Here’s another example of calculating Pearson correlation using the built-in “mtcars” dataset in R:
# Load the mtcars dataset
data(mtcars)

# Calculate Pearson correlation between the mpg and wt columns
cor(mtcars$mpg, mtcars$wt, method = "pearson")
Output
[1] -0.8676594
The cor() function is used to calculate correlation in R. The first two arguments are the variables to be correlated, and the method parameter is set to “pearson” to compute the Pearson correlation coefficient. In this example, we’re calculating the correlation between the “mpg” (miles per gallon) and “wt” (weight) columns of the mtcars dataset; the output is a single value between -1 and 1 representing the strength and direction of the correlation.
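The cor() function also accepts a whole data frame or numeric matrix, in which case it returns a correlation matrix for every pair of columns. A small sketch on a few mtcars columns, just to illustrate:

# Pairwise Pearson correlations for a few mtcars columns
cor(mtcars[, c("mpg", "wt", "hp")], method = "pearson")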
Using cor.test() method
Here’s an example of using the cor.test() function in R to calculate the Pearson correlation coefficient and its significance level between the “Sepal.Length” and “Petal.Length” variables in the built-in “iris” dataset:
# Load the iris dataset
data(iris)

# Calculate the Pearson correlation coefficient and its significance level
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")
The cor.test() function is used to perform a hypothesis test on the correlation coefficient between two variables. The first two arguments specify the variables to be correlated, and the method parameter is set to “pearson” to calculate the Pearson correlation coefficient.
Output
	Pearson's product-moment correlation

data:  iris$Sepal.Length and iris$Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8270363 0.9055080
sample estimates:
      cor 
0.8717538 
In this example, we’re testing the hypothesis that there is no correlation between “Sepal.Length” and “Petal.Length” in the “iris” dataset. The output will include the Pearson correlation coefficient, the degrees of freedom, the p-value, and a confidence interval for the correlation. The p-value indicates the significance of the correlation coefficient, with lower p-values indicating stronger evidence against the null hypothesis of no correlation.
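As noted earlier, the Pearson test assumes an approximately linear relationship between normally distributed variables. If that assumption is questionable, the rank-based alternatives mentioned above can be run with the same function; a minimal sketch on the same iris columns:

# Spearman's rank correlation as a rank-based alternative
# (R may warn that an exact p-value cannot be computed when there are ties)
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")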