Data and Programming

R is a programming language used primarily for statistical computing and data analysis. In R, everything is an object. Here are some key concepts related to R objects, numbers, attributes, vectors, and coercion.

R Objects

In R, everything is an object, which means that every piece of data has a specific type and a set of properties and functions associated with it. Here are some of the common types of objects in R:

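1. Numeric: Numeric objects represent numerical data, such as integers or decimal values. By default R stores a number as a double-precision value; append L (for example, 5L) to create an integer explicitly. Numeric objects can be created using the numeric() function or by simply entering a numeric value into R.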

2. Character: Character objects represent text data, such as names, words, or sentences. Character objects can be created using the character() function or by enclosing text in quotes.

3. Logical: Logical objects represent boolean values, either TRUE or FALSE. Logical objects can be created using the logical() function, by using comparison operators such as ==, >, and <, or by combining conditions with logical operators such as & and |.

4. Factor: Factor objects represent categorical data, such as gender, race, or color. Factors are used to store data as levels, which are mapped to integer values. Factors can be created using the factor() function.

5. Date: Date objects represent dates, such as “2022-03-25”. Date objects can be created using the as.Date() function or by using the lubridate package.

6. Time: Time objects represent times, such as “14:30:00”. Time objects can be created using the chron package or by using the lubridate package. Base R also represents combined dates and times with the POSIXct and POSIXlt classes.

7. List: List objects are collections of objects, which can be of different types. Lists can be created using the list() function.

8. Matrix: Matrix objects are two-dimensional arrays of objects, which must be of the same type. Matrices can be created using the matrix() function.

9. Data frame: Data frame objects are similar to matrices, but can contain columns of different types. Data frames can be created using the data.frame() function.

These are just a few examples of the types of objects in R. To work with these objects, you can use various functions and operators in R.
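
As a quick illustration of the types listed above, the sketch below creates one object of several of these types and inspects each with class(). The variable names and values are made up for this example, and only base R functions are used.

```r
# Hypothetical example values; only base R is used.
height <- 1.75                     # numeric (stored as a double)
count  <- 5L                       # integer (note the L suffix)
name   <- "Ada"                    # character
is_big <- height > 1.5             # logical, produced by a comparison
size   <- factor(c("small", "large", "small"))  # factor with two levels
day    <- as.Date("2022-03-25")    # Date

class(height)   # "numeric"
class(count)    # "integer"
class(name)     # "character"
class(is_big)   # "logical"
class(size)     # "factor"
class(day)      # "Date"
levels(size)    # "large" "small" (levels are sorted by default)
```

Note that the factor levels come out in sorted order rather than in order of appearance; pass the levels argument to factor() if a different order is needed.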

Attributes

R objects can have attributes, which act as metadata that describe the object. For example, a matrix is a vector with a dim attribute that gives its number of rows and columns, and it may also carry a dimnames attribute holding row and column names. You can access and modify attributes using functions such as attributes(), attr(), names(), dim(), and class(). Common attributes include:

  • names, dimnames
  • dimensions (e.g. matrices, arrays)
  • class (e.g. integer, numeric)
  • length
  • other user-defined attributes/metadata
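
The attribute functions mentioned above can be tried on a small matrix and a named vector. This is a minimal sketch; the object names and the "units" attribute are invented for illustration.

```r
# A small matrix: the dim attribute is what makes it two-dimensional.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                      # 2 3

# Row and column names live in the optional dimnames attribute.
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))
attributes(m)               # shows both dim and dimnames

# A named vector carries a names attribute.
v <- c(first = 10, second = 20)
names(v)                    # "first" "second"

# attr() reads or sets one attribute, including user-defined ones.
attr(v, "units") <- "cm"    # "units" is a made-up attribute name
attr(v, "units")            # "cm"
length(v)                   # 2
```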

Data Structures in R

R provides a number of built-in data structures for working with data, including vectors, matrices, arrays, lists, and data frames.

A data structure is either homogeneous (all elements are of the same data type) or heterogeneous (elements can be of more than one data type). Vectors, matrices, and arrays are homogeneous; lists and data frames are heterogeneous.

1. Vectors: A vector is a one-dimensional array that can hold elements of the same data type, such as numbers or characters. Vectors can be created using the c() function, which concatenates the specified values into a vector.

2. Matrices: A matrix is a two-dimensional array that can hold elements of the same data type. Matrices can be created using the matrix() function, which takes the data elements and dimensions as arguments.

3. Arrays: An array is a multi-dimensional version of a vector or matrix, which can hold elements of the same data type. Arrays can be created using the array() function, which takes the data elements, dimensions, and optionally the names of the dimensions as arguments.

4. Lists: A list is a collection of objects that can be of different types, including vectors, matrices, arrays, or even other lists. Lists can be created using the list() function, which takes the objects to be included in the list as arguments.

5. Data Frames: A data frame is a two-dimensional table-like structure, where each column can have a different data type. Data frames can be created using the data.frame() function, which takes the data elements as arguments and optionally the names of the columns.
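
The five structures described above can be built side by side as follows. This is only a sketch with invented names and values; stringsAsFactors = FALSE is set explicitly so the character column behaves the same way on older and newer versions of R.

```r
# Homogeneous structures: every element has the same type.
v   <- c(1.5, 2.5, 3.5)                  # vector
m   <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix, filled column by column
arr <- array(1:24, dim = c(2, 3, 4))     # 2 x 3 x 4 array

# Heterogeneous structures: elements may differ in type.
lst <- list(id = 1L, label = "a", flags = c(TRUE, FALSE))
df  <- data.frame(id = 1:3, name = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

m[2, 1]             # 2
dim(arr)            # 2 3 4
lst$label           # "a"
sapply(df, class)   # id is "integer", name is "character"
str(df)             # compact summary of the data frame
```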

Creating Vectors in R

In R, you can create vectors using the c() function. Here are some examples:

1. Numeric Vector:

To create a numeric vector, use the c() function with a list of numbers separated by commas.

# creating a numeric vector
my_vector <- c(1, 2, 3, 4, 5)

2. Character Vector:

To create a character vector, use the c() function with a list of strings surrounded by quotes and separated by commas.

# creating a character vector
my_vector <- c("apple", "banana", "orange")

3. Logical Vector:

To create a logical vector, use the c() function with a list of TRUE/FALSE values separated by commas.

# creating a logical vector
my_vector <- c(TRUE, FALSE, TRUE)

4. Factor Vector:

To create a factor, pass a character vector to the factor() function.

# creating a factor vector
my_vector <- factor(c("small", "medium", "large"))

5. Numeric Sequence Vector:

To create a numeric sequence vector, use the : operator or the seq() function.

# creating a numeric sequence vector using `:` operator
my_vector <- 1:10

# creating a numeric sequence vector using `seq()` function
my_vector <- seq(from = 1, to = 10, by = 1)

These are some ways to create vectors in R. You can also combine two or more vectors using the c() function to create a new vector.
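
Combining vectors with c() can be sketched as follows; the vector names are placeholders chosen for this example.

```r
a <- c(1, 2, 3)
b <- c(4, 5)

# c() flattens its arguments into one vector, in order.
combined <- c(a, b)
combined              # 1 2 3 4 5
length(combined)      # 5

# Single values and vectors can be mixed in the same call.
extended <- c(0, combined, 6)
extended              # 0 1 2 3 4 5 6
```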

You can also use the vector() function to initialize vectors. In R, vector() is a function that creates a vector of a specified length and type. The basic syntax for using the vector() function is:

vector(mode, length)

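where mode is the data type of the elements in the vector, and length is the length of the vector. For example, the following creates a numeric vector of length 5 whose elements are all initialized to 0:

x <- vector("numeric", 5)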

You can also create a character vector by setting mode to "character", a logical vector by setting mode to "logical", and so on.
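
A short sketch of the other modes, together with the equivalent shorthand constructors from base R. Each mode fills the new vector with its own default value.

```r
chr <- vector("character", 3)   # "" "" ""
lgl <- vector("logical", 2)     # FALSE FALSE
lst <- vector("list", 2)        # a list holding two NULL elements

# Shorthand constructors that do the same job for atomic vectors.
numeric(3)      # 0 0 0
character(2)    # "" ""
logical(1)      # FALSE
```

Pre-allocating a vector this way is common before filling it inside a loop.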

Mixing Objects

Because vectors must contain elements that are all of the same type, R automatically coerces the elements to a single type when you try to create a vector that combines multiple types.

x <- c(100, "Statistics with R", TRUE) # character
y <- c(TRUE, 200) # numeric
z <- c("a", TRUE) # character
class(x)
class(y)
class(z)

Output:

> class(x)
[1] "character"
> class(y)
[1] "numeric"
> class(z)
[1] "character"

Remember that the one rule about vectors is that every element must be of the same type, so a mixed vector is not allowed. When objects of different types are combined in a vector, R coerces them implicitly so that every element ends up with the same class.
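
Implicit coercion always moves toward the more general type, in the order logical, integer, numeric, character. The sketch below shows one step of that order at a time; the values are arbitrary.

```r
class(c(TRUE, 2L))     # "integer":   logical is promoted to integer
class(c(2L, 3.5))      # "numeric":   integer is promoted to double
class(c(3.5, "a"))     # "character": numbers are turned into strings

c(TRUE, 200)           # 1 200
c(100, "a", TRUE)      # "100" "a" "TRUE"

# The same rule is why logical values can be summed directly.
sum(c(TRUE, FALSE, TRUE))   # 2
```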

Explicit Coercion

Explicit coercion in R refers to the process of converting a data type into another data type using a specific function or operator. Explicit coercion is also known as type casting or type conversion.

R provides various functions and operators to perform explicit coercion, including as.numeric(), as.character(), as.logical(), as.integer(), and as.factor(). These functions can be used to convert variables from one data type to another. For example, to convert a character vector to a numeric vector, the as.numeric() function can be used:

# Create a character vector
char_vec <- c("1", "2", "3")

# Convert the character vector to a numeric vector
num_vec <- as.numeric(char_vec)

# Check the class of the new vector
class(num_vec)
# Output: "numeric"

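Arithmetic operators such as +, -, *, /, and ^ also change the type of their operands, but this is implicit coercion rather than explicit coercion: R converts logical values to numbers (TRUE becomes 1, FALSE becomes 0) before doing the arithmetic. For example, adding 0 to a logical vector yields a numeric vector, although as.numeric() states the intent more clearly:

# Create a logical vector
logical_vec <- c(TRUE, FALSE, TRUE)

# Adding 0 implicitly coerces the logical vector to numeric
num_vec <- logical_vec + 0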

# Check the class of the new vector
class(num_vec)
# Output: "numeric"

It is important to note that explicit coercion may result in data loss or unexpected results if the conversion is not performed correctly. Therefore, it is recommended to use explicit coercion only when necessary and with caution.
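
A few cases where explicit coercion silently loses or changes information are sketched below. The values are invented; the factor case is a well-known pitfall in base R.

```r
# Converting to integer drops the fractional part; it does not round.
as.integer(3.9)     # 3
as.integer(-3.9)    # -3

# Only a few spellings are recognised as logical values.
as.logical(c("TRUE", "yes"))   # TRUE NA

# as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels.
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
```

Converting a factor through as.character() first is the usual way to recover the original numbers.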

Sometimes R does not know how to coerce an object, and this results in NAs being produced.

x <- c("Statistics", "R Programming", "Python")
as.numeric(x)
as.logical(x)

Output:

> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion 
> as.logical(x)
[1] NA NA NA
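
When only some values fail to convert, the resulting NAs can be located with is.na(). The sketch below uses a made-up character vector; suppressWarnings() is used here only because the NA is expected, and would hide real problems if applied blindly.

```r
raw <- c("12", "7.5", "n/a", "3")

# "n/a" cannot be parsed as a number, so it becomes NA.
nums <- suppressWarnings(as.numeric(raw))
nums                      # 12.0 7.5 NA 3.0

is.na(nums)               # FALSE FALSE TRUE FALSE
raw[is.na(nums)]          # "n/a": the entries that failed to convert

# Many summary functions can skip NAs on request.
sum(nums, na.rm = TRUE)   # 22.5
```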

Frequently you may wish to create a vector based on a sequence of numbers. The quickest and easiest way to do this is with the : operator, which creates a sequence of integers between two specified integers.

y<-1:10
print(y)

Output:

> print(y)
[1] 1 2 3 4 5 6 7 8 9 10

If we want a sequence that is not limited to integers, or that increases by a step other than 1, we can use the seq() function.

seq(from = 1, to = 10, by = 2)
seq(1.5, 10.2, 2)

Output:

> seq(from = 1, to = 10, by = 2)
[1] 1 3 5 7 9
> seq(1.5, 10.2, 2)
[1] 1.5 3.5 5.5 7.5 9.5
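
Besides a step size, seq() can be given the desired number of elements, and base R also offers seq_len() and seq_along() for index sequences. The following is a small sketch with arbitrary values.

```r
# Ask for a number of points instead of a step size.
quarters <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# A negative step counts downwards.
seq(10, 1, by = -3)                     # 10 7 4 1

# Index sequences.
seq_len(4)                              # 1 2 3 4
seq_along(c("a", "b", "c"))             # 1 2 3

# seq_len() is safer than 1:n when n might be zero.
n <- 0
1:n          # 1 0  (counts down, usually not what is wanted)
seq_len(n)   # integer(0)
```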

Another common way to create a vector is rep(), which repeats a single value or a whole vector a given number of times.

rep("Statistics", times = 10)
x<-c("Statistics","R Programming","Python")
rep(x, times = 3)
length(x)

Output:

[1] "Statistics" "Statistics" "Statistics" "Statistics" "Statistics" "Statistics" "Statistics" "Statistics" "Statistics"
[10] "Statistics"
> x<-c("Statistics","R Programming","Python")
> rep(x, times = 3)
[1] "Statistics" "R Programming" "Python" "Statistics" "R Programming" "Python" "Statistics" "R Programming"
[9] "Python"
> length(x)
[1] 3

Subsetting

Subsetting in R involves extracting a subset of data from a larger dataset based on certain conditions or criteria. There are several ways to subset data in R, including using indexing, logical vectors, and the subset function.

1. Indexing: You can use indexing to extract a subset of data from a larger dataset by specifying the row and column numbers. For example, if you have a dataset called “mydata” and you want to extract the first three rows and the first two columns, you can use the following code:

mydata[1:3, 1:2]

2. Logical vectors: You can also use logical vectors to extract a subset of data based on certain conditions. For example, if you have a dataset called “mydata” and you want to extract all the rows where the value in the first column is greater than 5, you can use the following code:

mydata[mydata[,1] > 5, ]

3. The subset function: The subset() function can be used to extract a subset of data based on certain conditions. For example, if you have a dataset called “mydata” and you want to extract all the rows where the value in the first column is greater than 5, you can use the following code:

subset(mydata, mydata[,1] > 5)
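
To make the three approaches concrete, the sketch below builds a small stand-in for “mydata” (its columns score and group are invented for this example) and applies each one. The same bracket syntax also works on plain vectors.

```r
# Hypothetical data frame standing in for "mydata".
mydata <- data.frame(
  score = c(3, 8, 6, 2),
  group = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

first_rows <- mydata[1:3, 1:2]            # rows 1 to 3, columns 1 and 2
high_a     <- mydata[mydata[, 1] > 5, ]   # rows where column 1 exceeds 5
high_b     <- subset(mydata, score > 5)   # same rows, column given by name

# Vectors use a single index.
v <- c(10, 20, 30, 40)
v[2]          # 20
v[c(1, 4)]    # 10 40
v[-1]         # 20 30 40 (a negative index drops an element)
v[v > 25]     # 30 40
```

Inside subset() the column can be referred to by name, which tends to read more clearly than a positional index.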

Vectorization

Vectorization is a powerful feature of R that allows for efficient manipulation and calculation of large datasets. It involves performing operations on entire vectors or matrices at once, rather than looping through each element of the data structure.

x <- 10:20
y <- x + 2
print(y)
2*x
2^x
sqrt(x)
log(x)

Output:

> print(y)
 [1] 12 13 14 15 16 17 18 19 20 21 22
> 2*x
 [1] 20 22 24 26 28 30 32 34 36 38 40
> 2^x
 [1] 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576
> sqrt(x)
 [1] 3.162278 3.316625 3.464102 3.605551 3.741657 3.872983 4.000000 4.123106 4.242641 4.358899 4.472136
> log(x)
 [1] 2.302585 2.397895 2.484907 2.564949 2.639057 2.708050 2.772589 2.833213 2.890372 2.944439 2.995732
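
For comparison, here is the explicit loop that a vectorized expression such as x + 2 replaces, followed by an example of recycling, where a shorter vector is reused to match a longer one. This is a sketch; the names y_loop and y_vec are chosen for illustration.

```r
x <- 10:20

# Element-by-element version using a loop.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 2
}

# Vectorized version: one expression, no loop.
y_vec <- x + 2

identical(y_loop, y_vec)   # TRUE

# Recycling: the shorter vector is repeated to match the longer one.
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```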

Logical Operators in R

In R, logical operators are used to create logical expressions that evaluate to either TRUE or FALSE. These operators are commonly used in conditional statements, loops, and data filtering.

# Logical AND
x <- 5
y <- 10
x > 3 & y < 20 # returns TRUE

# Logical OR
x <- 5
y <- 10
x < 3 | y > 20 # returns FALSE

# Logical NOT
x <- 5
!(x > 3) # returns FALSE

# Short-circuiting Logical AND
x <- 5
y <- 10
x > 3 && y < 20 # returns TRUE
y <- 25
x > 3 && y < 20 # returns FALSE (y is not less than 20)

# Short-circuiting Logical OR
x <- 5
y <- 10
x < 3 || y > 20 # returns FALSE
x <- 1
x < 3 || y > 20 # returns TRUE (x is less than 3, y is not evaluated)

# Logical XOR
x <- TRUE
y <- FALSE
xor(x, y) # returns TRUE
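
The single-character operators & and | work element by element, which makes them the natural choice for filtering vectors, while && and || are meant for single conditions such as those in an if statement (recent versions of R raise an error when they are given longer vectors). A small sketch with arbitrary values:

```r
x <- c(2, 7, 12)

in_range <- x > 5 & x < 10    # FALSE TRUE FALSE
outside  <- x < 5 | x > 10    # TRUE FALSE TRUE

# A logical vector can be used directly to filter.
x[in_range]                   # 7

# any() and all() collapse a logical vector to a single value.
any(x > 10)                   # TRUE
all(x > 5)                    # FALSE

# && suits single conditions; the second test runs only if the first is TRUE.
if (length(x) > 0 && x[1] < 5) {
  message("first element is small")
}
```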

rep() function in R

The rep() function replicates the values in x: it repeats a specified object or vector multiple times. The basic syntax of the rep() function is:

rep(x, times, each, length.out)

where:

  • x is the object or vector to be repeated.
  • times is the number of times to repeat x.
  • each is the number of times to repeat each element of x before moving to the next element.
  • length.out is the desired length of the output vector, which can be used as an alternative to the times argument.

Here are a few examples of using the rep() function:

# Repeat a single element 3 times
rep(5, times = 3)
# Output: 5 5 5

# Repeat a vector 2 times
rep(c("apple", "orange"), times = 2)
# Output: "apple" "orange" "apple" "orange"

# Repeat each element of a vector 3 times
rep(c("apple", "orange"), each = 3)
# Output: "apple" "apple" "apple" "orange" "orange" "orange"

# Create a vector of length 5 by repeating a vector of length 2
rep(c("red", "green"), length.out = 5)
# Output: "red" "green" "red" "green" "red"

rep(1:4, 2)
rep(1:4, each = 2) # not the same.
rep(1:4, c(2,2,2,2)) # same as second.
rep(1:4, c(2,1,2,1))
rep(1:4, each = 2, len = 4) # first 4 only.
rep(1:4, each = 2, len = 10) # 8 integers plus two recycled 1's.
rep(1:4, each = 2, times = 3) # length 24, 3 complete replications

Output:

> rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4
> rep(1:4, each = 2) # not the same.
[1] 1 1 2 2 3 3 4 4
> rep(1:4, c(2,2,2,2)) # same as second.
[1] 1 1 2 2 3 3 4 4
> rep(1:4, c(2,1,2,1))
[1] 1 1 2 3 3 4
> rep(1:4, each = 2, len = 4) # first 4 only.
[1] 1 1 2 2
> rep(1:4, each = 2, len = 10) # 8 integers plus two recycled 1's.
 [1] 1 1 2 2 3 3 4 4 1 1
> rep(1:4, each = 2, times = 3) # length 24, 3 complete replications
 [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

rev() function in R

In R, the rev() function is used to reverse the order of elements in a vector. The basic syntax of the rev() function is:

rev(x)

where x is the vector to be reversed.

Here are a few examples of using the rev() function:

# Reverse a numeric vector
x <- c(1, 2, 3, 4, 5)
rev(x)
# Output: 5 4 3 2 1

# Reverse a character vector
y <- c("apple", "banana", "cherry")
rev(y)
# Output: "cherry" "banana" "apple"

# Reverse a logical vector
z <- c(TRUE, FALSE, FALSE)
rev(z)
# Output: FALSE FALSE TRUE
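
A common use of rev() is to turn an ascending sort into a descending one. The sketch below also shows that rev() gives the same result as indexing from the last element to the first; the vector scores is invented for this example.

```r
scores <- c(3, 9, 1, 7)

descending <- rev(sort(scores))      # 9 7 3 1
sort(scores, decreasing = TRUE)      # same result without rev()

# rev() is equivalent to indexing backwards on a non-empty vector.
identical(rev(scores), scores[length(scores):1])   # TRUE
```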
