Data Frames in R

In R, a data frame is a two-dimensional table-like data structure that allows you to store and manipulate data in rows and columns. Each column of a data frame can contain data of a different type (numeric, character, factor, etc.) but all columns must have the same length. Data frames are one of the most commonly used data structures in R because they allow you to work with datasets that have different types of data.
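
As a minimal sketch of these two properties, the following builds a small data frame with mixed column types and then shows what happens when the column lengths do not match. The names df_types, id, label, flag, and result are made up for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
df_types <- data.frame(
  id    = c(1, 2, 3),            # numeric
  label = c("a", "b", "c"),      # character (a factor by default before R 4.0.0)
  flag  = c(TRUE, FALSE, TRUE)   # logical
)

# Each column keeps its own type
sapply(df_types, class)

# Columns of length 3 and 2 cannot be recycled to a common length,
# so data.frame() raises an error; tryCatch() captures its message.
result <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
result
```

Note that data.frame() does recycle a shorter vector when its length divides the longest column evenly, so the "same length" rule is enforced after recycling.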

Data frames can be created using the data.frame() function in R. You can also read data from external files, such as CSV or Excel files, directly into data frames using functions like read.csv() from base R or read_excel() from the readxl package (part of the tidyverse). Once you have a data frame, you can manipulate and analyze the data using functions like subset(), aggregate(), and merge().
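
To make the three manipulation functions concrete, here is a short sketch on invented data. The data frames staff and offices and all of their values are assumptions made up for this example.

```r
# Hypothetical data, invented for illustration
staff <- data.frame(
  name = c("Ann", "Ben", "Cal", "Dee"),
  dept = c("IT", "HR", "IT", "HR"),
  pay  = c(50, 40, 60, 45)
)
offices <- data.frame(
  dept  = c("IT", "HR"),
  floor = c(2, 1)
)

# subset(): keep only the rows that satisfy a condition
high_pay <- subset(staff, pay > 45)

# aggregate(): compute the mean pay within each department
mean_pay <- aggregate(pay ~ dept, data = staff, FUN = mean)

# merge(): join the two tables on their shared column "dept"
staff_offices <- merge(staff, offices, by = "dept")

high_pay
mean_pay
staff_offices
```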

One of the benefits of working with data frames in R is that most statistical and visualization tools accept them directly: modeling functions such as lm() take a data argument, and plotting packages such as ggplot2 are built around data frames. This makes it easy to move from a dataset to analysis and visualization.

The following example builds a data frame explicitly with the data.frame() function, combining three vectors of equal length.

employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)

Output:

> employ_data
  employee salary  startdate
1      Ram  21000 2016-11-01
2     Sham  23400 2015-03-25
3     Jadu  26800 2017-03-14
> View(employ_data)

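If you look at the structure of the data frame, you see that the variable employee has been stored as a factor. This is what versions of R before 4.0.0 do with character vectors by default; from R 4.0.0 onward, employee stays a character vector and str() reports it as chr.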

str(employ_data)

Output:

> str(employ_data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "Jadu","Ram","Sham": 2 3 1
 $ salary : num 21000 23400 26800
 $ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"
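
The conversion of text columns is controlled by the stringsAsFactors argument of data.frame(). The sketch below sets it explicitly so that the result does not depend on the R version; the names as_chr and as_fct are made up for this example.

```r
employee <- c('Ram', 'Sham', 'Jadu')
salary   <- c(21000, 23400, 26800)

# Keep text columns as character vectors
as_chr <- data.frame(employee, salary, stringsAsFactors = FALSE)
class(as_chr$employee)

# Convert text columns to factors
as_fct <- data.frame(employee, salary, stringsAsFactors = TRUE)
class(as_fct$employee)
levels(as_fct$employee)
```

Character columns are usually easier to work with for names and free text, while factors suit variables with a fixed set of categories.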

Other functions help you inspect a data frame. dim() returns its dimensions as a pair (rows, columns), while nrow() and ncol() return the number of rows and the number of columns individually.

dim(employ_data)
nrow(employ_data)
ncol(employ_data)

Output:

> dim(employ_data)
[1] 3 3
> nrow(employ_data)
[1] 3
> ncol(employ_data)
[1] 3
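
Beyond the dimensions, a few other base R functions are commonly used to inspect a data frame. The sketch below rebuilds employ_data so that it runs on its own; the choice of names(), head(), and summary() is a suggestion and not part of the example above.

```r
# Rebuild the example data so this block is self-contained
employ_data <- data.frame(
  employee  = c('Ram', 'Sham', 'Jadu'),
  salary    = c(21000, 23400, 26800),
  startdate = as.Date(c('2016-11-1', '2015-3-25', '2017-3-14'))
)

names(employ_data)            # column names
head(employ_data, 2)          # first two rows
summary(employ_data$salary)   # minimum, quartiles, mean, maximum of one column
employ_data$salary            # extract a single column as a vector
```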

Examples of Data Frames in R

Creating a data frame manually:

# Create a data frame with three columns
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age  = c(25, 30, 35),
  city = c("New York", "Los Angeles", "Chicago")
)

# Print the data frame
df

Output:

     name age        city
1   Alice  25    New York
2     Bob  30 Los Angeles
3 Charlie  35     Chicago

Reading a CSV file into a data frame:

# Read in a CSV file; read.csv() already returns a data frame
mydata <- read.csv("mydata.csv")

# No conversion is needed, so simply reuse the result under the name df
df <- mydata

# Print the data frame
df

Output:

  id    name age gender
1  1    John  23      M
2  2   Sarah  28      F
3  3 Michael  35      M
4  4  Lauren  19      F
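
The example above depends on a file named mydata.csv being present in the working directory. As a self-contained sketch, the following writes the same four rows to a temporary file and reads them back; the variable names csv_path and people are made up for illustration.

```r
# Write a small CSV to a temporary file so the example runs anywhere.
# The rows mirror the output shown above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,age,gender",
  "1,John,23,M",
  "2,Sarah,28,F",
  "3,Michael,35,M",
  "4,Lauren,19,F"
), csv_path)

# read.csv() returns a data frame directly
people <- read.csv(csv_path)
is.data.frame(people)
str(people)

# Remove the temporary file
unlink(csv_path)
```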

Subsetting a data frame:

# Subset the data frame to include only rows where age is greater than 25
subset_df <- subset(df, age > 25)

# Print the subsetted data frame
subset_df

Output:

  id    name age gender
2  2   Sarah  28      F
3  3 Michael  35      M
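
The same selection can also be written with bracket indexing, which is the more general mechanism in base R. The sketch below recreates the data inline so that it runs without the CSV file; the names older, older_names, and older_names2 are made up for illustration.

```r
# Recreate the example data inline
df <- data.frame(
  id     = 1:4,
  name   = c("John", "Sarah", "Michael", "Lauren"),
  age    = c(23, 28, 35, 19),
  gender = c("M", "F", "M", "F")
)

# Logical indexing: rows where the condition is TRUE, all columns
older <- df[df$age > 25, ]

# Select rows and specific columns in one step
older_names <- df[df$age > 25, c("name", "age")]

# subset() offers the same column selection through its select argument
older_names2 <- subset(df, age > 25, select = c(name, age))

older
older_names
```

subset() is convenient for interactive work, while bracket indexing is generally the safer choice inside functions because it does not rely on non-standard evaluation of column names.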
