Statistics with R
Matrices, Lists, Factors
Matrices, lists, and factors are important data structures in R that are commonly used in data analysis and statistical computing. Here is a brief overview of each of them:
Matrices
A matrix in R is a two-dimensional array of elements, all of the same data type. Matrices are created using the matrix() function and can be manipulated using various functions such as cbind(), rbind(), and diag(). Matrices are useful for performing matrix algebra and other mathematical operations on data.
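R also supports matrix calculations directly. Every element of a matrix shares a single data type, and the order of rows and columns matters. Calling matrix() with only nrow and ncol creates a matrix filled with NA, and its dimensions are stored in the dim attribute.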
m <- matrix(nrow = 2, ncol = 3)
dim(m)
attributes(m)
m <- matrix(1:20, nrow = 4, ncol = 5)
m
Output:
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3

> m <- matrix(1:20, nrow = 4, ncol = 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
Matrices can also be created directly from vectors by adding a dimension attribute.
m <- 1:20
m
dim(m) <- c(4, 5)
m
Output:
> m
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> dim(m) <- c(4, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
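One way to check that a matrix really is just a vector with a dim attribute is to build the same matrix both ways and compare them. This is a minimal base-R sketch; the names m1 and m2 are illustrative.

```r
# Build the same 4 x 5 matrix in two ways
m1 <- matrix(1:20, nrow = 4, ncol = 5)  # via matrix()
m2 <- 1:20
dim(m2) <- c(4, 5)                      # via the dim attribute

identical(m1, m2)  # TRUE: same integer values, same dim c(4, 5)
as.vector(m1)      # drops dim and returns 1:20 in column-major order
nrow(m1)           # 4
ncol(m1)           # 5
```

Because the values are stored column by column, removing the dimension with as.vector() gives back the original sequence.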
Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.
x <- 1:3
y <- 10:12
z <- 30:32
cbind(x, y, z)
rbind(x, y, z)
Output:
> cbind(x, y, z)
     x  y  z
[1,] 1 10 30
[2,] 2 11 31
[3,] 3 12 32
> rbind(x, y, z)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
z   30   31   32
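As the output shows, cbind() and rbind() take column or row names from the argument names. They also accept an existing matrix, which makes them convenient for appending columns or rows. A minimal sketch; the name total and the appended zero row are illustrative.

```r
x <- 1:3
y <- 10:12
z <- 30:32

m <- cbind(x, y, z)  # column names come from the argument names
colnames(m)          # "x" "y" "z"

# cbind() and rbind() can also extend an existing matrix
m2 <- cbind(m, total = x + y + z)  # append a named column
m3 <- rbind(m, c(0, 0, 0))         # append an unnamed row
dim(m2)          # 3 4
dim(m3)          # 4 3
m2[, "total"]    # 41 44 47
```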
By default, matrix() fills the matrix column by column from the input vector; setting byrow = TRUE tells R to fill it row by row instead.
x <- 1:9
matrix(x, nrow = 3, ncol = 3)
matrix(x, nrow = 3, ncol = 3, byrow = TRUE)
Output:
> matrix(x, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix(x, nrow = 3, ncol = 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
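For a square matrix, filling by row produces the transpose of filling by column. The sketch below checks this with base R only; by_col and by_row are illustrative names.

```r
x <- 1:9
by_col <- matrix(x, nrow = 3, ncol = 3)                # filled column by column
by_row <- matrix(x, nrow = 3, ncol = 3, byrow = TRUE)  # filled row by row

# For a square matrix, byrow = TRUE gives the transpose of the default fill
identical(by_row, t(by_col))  # TRUE
by_row[1, ]  # 1 2 3
by_col[1, ]  # 1 4 7
```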
We can also create a matrix of a specified dimension where every element is the same.
z <- matrix(5, 3, 4)
z
Output:
> z
     [,1] [,2] [,3] [,4]
[1,]    5    5    5    5
[2,]    5    5    5    5
[3,]    5    5    5    5
The diag() function creates a matrix with the specified elements on the diagonal and 0 on the off-diagonals. Given a single number n, diag(n) returns the n x n identity matrix.
diag(3)
diag(1:4)
Output:
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag(1:4)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    2    0    0
[3,]    0    0    3    0
[4,]    0    0    0    4
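diag() does more than build diagonal matrices: given a matrix, it extracts the main diagonal, and its replacement form diag<- overwrites that diagonal. A minimal base-R sketch:

```r
z <- matrix(1:9, nrow = 3)
diag(z)        # extracts the main diagonal: 1 5 9
diag(c(2, 7))  # builds a 2 x 2 diagonal matrix
sum(diag(z))   # the trace of the matrix: 15

diag(z) <- 0   # replacement form: set every diagonal element to 0
z
```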
Like vectors, matrices can be subsetted using square brackets, []. However, since matrices are two dimensional, we need to specify both a row and a column when subsetting.
For example, with the matrix z created below, z[1, 2] returns 4, the element in the first row and second column.
z <- matrix(1:12, 3, 4)
z
Output:
> z
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
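To make single-element access explicit, the sketch below indexes z by row and column. It also shows the drop argument: by default, selecting a single row returns a plain vector, while drop = FALSE keeps the result as a one-row matrix.

```r
z <- matrix(1:12, nrow = 3, ncol = 4)
z[1, 2]                    # row 1, column 2 -> 4
z[3, 4]                    # last element -> 12
z[1, ]                     # a single row comes back as a plain vector
z[1, , drop = FALSE]       # drop = FALSE keeps it as a 1 x 4 matrix
dim(z[1, , drop = FALSE])  # 1 4
```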
We could also subset an entire row.
z[1, ]
Output:
[1] 1 4 7 10
We could also subset an entire column.
z[ ,2]
Output:
[1] 4 5 6
We can also use vectors to subset more than one row or column at a time. Here we select the first and third columns of the second row.
z[2, c(1, 3)]
Output:
[1] 2 8
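Index vectors can also be negative, which drops rows or columns, or logical, which keeps the positions marked TRUE. A minimal base-R sketch using the same z:

```r
z <- matrix(1:12, nrow = 3, ncol = 4)
z[-1, ]                             # drop row 1, leaving a 2 x 4 matrix
z[, -c(1, 4)]                       # drop the first and last columns
z[z > 6]                            # logical matrix index -> 7 8 9 10 11 12
z[2, c(TRUE, FALSE, TRUE, FALSE)]   # columns 1 and 3 of row 2 -> 2 8
```

A logical matrix index such as z > 6 returns the matching elements as a vector in column-major order, not as a matrix.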
Example:
> # create a matrix
> m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
> # print the matrix
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> # transpose the matrix
> t(m)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
Matrix Operations
x <- 1:9
y <- 9:1
x <- matrix(x, 3, 3)
y <- matrix(y, 3, 3)
x
x + y
x - y
x * y
x / y
Output:
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x + y
     [,1] [,2] [,3]
[1,]   10   10   10
[2,]   10   10   10
[3,]   10   10   10
> x - y
     [,1] [,2] [,3]
[1,]   -8   -2    4
[2,]   -6    0    6
[3,]   -4    2    8
> x * y
     [,1] [,2] [,3]
[1,]    9   24   21
[2,]   16   25   16
[3,]   21   24    9
> x / y
          [,1]      [,2]     [,3]
[1,] 0.1111111 0.6666667 2.333333
[2,] 0.2500000 1.0000000 4.000000
[3,] 0.4285714 1.5000000 9.000000
Note that x * y is not matrix multiplication; it multiplies element by element (the same holds for x / y). Matrix multiplication uses the %*% operator. Other useful matrix functions include t(), which returns the transpose of a matrix, and solve(), which returns the inverse of a square matrix if it is invertible. Helpers such as dim(), rowSums(), colSums(), rowMeans(), colMeans(), and diag() summarize a matrix.
x %*% y
t(x)
z <- matrix(c(9, 2, -3, 2, 4, -2, -3, -2, 16), 3, byrow = TRUE)
solve(z)
dim(z)
rowSums(z)
colSums(z)
rowMeans(z)
colMeans(z)
diag(z)
Output:
> x %*% y
     [,1] [,2] [,3]
[1,]   90   54   18
[2,]  114   69   24
[3,]  138   84   30
> t(x)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> z <- matrix(c(9, 2, -3, 2, 4, -2, -3, -2, 16), 3, byrow = TRUE)
> solve(z)
            [,1]        [,2]       [,3]
[1,]  0.12931034 -0.05603448 0.01724138
[2,] -0.05603448  0.29094828 0.02586207
[3,]  0.01724138  0.02586207 0.06896552
> dim(z)
[1] 3 3
> rowSums(z)
[1]  8  4 11
> colSums(z)
[1]  8  4 11
> rowMeans(z)
[1] 2.666667 1.333333 3.666667
> colMeans(z)
[1] 2.666667 1.333333 3.666667
> diag(z)
[1]  9  4 16
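A quick sanity check on solve() is that a matrix times its inverse gives the identity, up to floating-point error. solve() also accepts a right-hand side, solve(a, b), which solves the linear system a %*% x = b directly. The sketch below uses the same z; b, sol, and a are illustrative names.

```r
z <- matrix(c(9, 2, -3, 2, 4, -2, -3, -2, 16), 3, byrow = TRUE)
z_inv <- solve(z)

# A matrix times its inverse is the identity, up to rounding error
all.equal(z %*% z_inv, diag(3))   # TRUE

# solve(a, b) solves a %*% x = b without forming the inverse explicitly
b <- c(1, 2, 3)
sol <- solve(z, b)
all.equal(as.vector(z %*% sol), b)  # TRUE

# Element-wise product versus matrix product on a small example
a <- matrix(1:4, 2)
a * a    # squares each element: 1 4 9 16 (column-major)
a %*% a  # matrix product: 7 10 15 22 (column-major)
```

Solving the system directly is generally preferred to computing solve(z) %*% b, because it does less work and is usually more accurate numerically.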
Lists in R
Lists are a special type of vector that can contain elements of different classes: a list in R is a collection of objects that may have different data types. Lists are created with the list() function, and their elements are accessed with double brackets [[ ]] or single brackets [ ]. Lists are useful for organizing data and building complex data structures.
x <- list("stat", 5.1, TRUE, 1 + 4i)
x
class(x)
Output:
> x
[[1]]
[1] "stat"

[[2]]
[1] 5.1

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

> class(x)
[1] "list"
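Since each list element keeps its own class, sapply() can report them all at once. This minimal base-R sketch uses the same list x.

```r
x <- list("stat", 5.1, TRUE, 1 + 4i)
length(x)         # 4 elements
sapply(x, class)  # "character" "numeric" "logical" "complex"
str(x)            # compact summary, one line per element
```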
You can create an empty list of a prespecified length with the vector() function.
x <- vector("list", length = 10)
x
Output:
> x
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

[[7]]
NULL

[[8]]
NULL

[[9]]
NULL

[[10]]
NULL
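Preallocating a list with vector() is useful when a loop fills it one slot at a time, because it avoids growing the list on every iteration. A minimal sketch; the names n and squares are illustrative.

```r
n <- 5
squares <- vector("list", length = n)  # n empty (NULL) slots
for (i in seq_len(n)) {
  squares[[i]] <- i^2                  # [[ ]] assigns into slot i
}
unlist(squares)  # 1 4 9 16 25
```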
Lists can also hold more complex objects, such as functions and matrices. Use = (not <-) inside list() to name the elements: writing a <- c(1, 2, 3, 4) there would create an unnamed element and also assign a variable a in the workspace.
l <- list(
  a = c(1, 2, 3, 4),
  b = FALSE,
  c = "Hello Statistics!",
  d = function(arg = 42) {
    print("Hello World!")
  },
  e = diag(10)
)
l
Output:
> l
$a
[1] 1 2 3 4

$b
[1] FALSE

$c
[1] "Hello Statistics!"

$d
function (arg = 42) 
{
    print("Hello World!")
}

$e
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0    0    0    0    0     0
 [2,]    0    1    0    0    0    0    0    0    0     0
 [3,]    0    0    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    0    0    0    0    0     0
 [5,]    0    0    0    0    1    0    0    0    0     0
 [6,]    0    0    0    0    0    1    0    0    0     0
 [7,]    0    0    0    0    0    0    1    0    0     0
 [8,]    0    0    0    0    0    0    0    1    0     0
 [9,]    0    0    0    0    0    0    0    0    1     0
[10,]    0    0    0    0    0    0    0    0    0     1

Lists can be subset with the $ operator or with square brackets. The $ operator extracts a single element by name. Single brackets [ ] always return a list (a sub-list), while double brackets [[ ]] return the element itself.
# subsetting
l$e
l["e"]
l[1:2]
Output:
> l$e
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0    0    0    0    0     0
 [2,]    0    1    0    0    0    0    0    0    0     0
 [3,]    0    0    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    0    0    0    0    0     0
 [5,]    0    0    0    0    1    0    0    0    0     0
 [6,]    0    0    0    0    0    1    0    0    0     0
 [7,]    0    0    0    0    0    0    1    0    0     0
 [8,]    0    0    0    0    0    0    0    1    0     0
 [9,]    0    0    0    0    0    0    0    0    1     0
[10,]    0    0    0    0    0    0    0    0    0     1
> l["e"]
$e
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0    0    0    0    0     0
 [2,]    0    1    0    0    0    0    0    0    0     0
 [3,]    0    0    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    0    0    0    0    0     0
 [5,]    0    0    0    0    1    0    0    0    0     0
 [6,]    0    0    0    0    0    1    0    0    0     0
 [7,]    0    0    0    0    0    0    1    0    0     0
 [8,]    0    0    0    0    0    0    0    1    0     0
 [9,]    0    0    0    0    0    0    0    0    1     0
[10,]    0    0    0    0    0    0    0    0    0     1

> l[1:2]
$a
[1] 1 2 3 4

$b
[1] FALSE
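The difference between [ ] and [[ ]] is easiest to see by checking the class of what comes back. This sketch builds its own small named list (the names a, b, c, and d are illustrative) and also shows how to add and remove elements.

```r
l <- list(a = c(1, 2, 3, 4), b = FALSE, c = "Hello Statistics!")

class(l["a"])             # "list": single brackets return a sub-list
class(l[["a"]])           # "numeric": double brackets return the element
identical(l$a, l[["a"]])  # TRUE: $ extracts by name, like [[ ]]
l[[1]][2]                 # index into the element itself -> 2

l$d <- 42                 # add a new named element at the end
l$b <- NULL               # assigning NULL removes an element
names(l)                  # "a" "c" "d"
```

One difference to keep in mind: $ allows partial matching of names, while [[ ]] requires an exact match by default.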
Factors
Factors are used to represent categorical data and can be unordered or ordered. An example might be “Male” and “Female” if we consider gender. Factor objects can be created with the factor() function.
x <- factor(c("male", "female", "male", "male", "female"))
x
table(x)
Output:
> x
[1] male   female male   male   female
Levels: female male
> table(x)
x
female   male 
     2      3 
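Internally, a factor stores an integer code for each observation plus a levels attribute that maps codes to labels. A minimal base-R sketch of inspecting both parts:

```r
x <- factor(c("male", "female", "male", "male", "female"))
levels(x)      # "female" "male"
nlevels(x)     # 2
as.integer(x)  # underlying codes: 2 1 2 2 1
unclass(x)     # the codes together with the levels attribute
```

The codes follow the level order, so with alphabetical levels "female" is 1 and "male" is 2.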
By default, levels are sorted alphabetically, which is why the output above lists female before male. To use a particular order instead, set the levels argument:
x <- factor(c("male", "female", "male", "male", "female"),
            levels = c("male", "female"))
x
table(x)
Output:
> x
[1] male   female male   male   female
Levels: male female
> table(x)
x
  male female 
     3      2 
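The text above notes that factors can be ordered. Passing ordered = TRUE to factor() treats the levels order as a ranking, which enables comparisons and functions such as max(). A minimal sketch; the sizes data is illustrative.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes            # levels print as small < medium < large
sizes < "large"  # comparisons work on ordered factors: TRUE FALSE TRUE TRUE
max(sizes)       # large
```

With an unordered factor, the same comparison returns NA with a warning, because R has no ranking for the levels.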