Data Frames

Data frames are used to store tabular data in R. They are an important type of object and are used in a wide range of statistical modeling applications. A data frame is represented as a special type of list in which every element must have the same length. Each element of the list can be thought of as a column, and the length of each element is the number of rows. Unlike matrices, data frames can store a different class of object in each column; every element of a matrix must be of the same class (e.g. all integers or all numeric).
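
The list-of-columns description above can be checked directly in the console. The following is a minimal sketch; the data frame df and its column names are made up for illustration and do not appear elsewhere in this tutorial.

```r
# Hypothetical example data; the names are illustrative only.
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(9.5, 7.25, 8),
  stringsAsFactors = FALSE
)

is.list(df)          # TRUE: a data frame is built on top of a list
length(df)           # 3: one list element per column
sapply(df, class)    # each column keeps its own class
sapply(df, length)   # every column has the same length (the row count)

# A matrix, by contrast, coerces all of its elements to a single type.
m <- cbind(1:3, c("a", "b", "c"))
typeof(m)            # "character"
```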

Creating a Data Frame:

Data frames can be created explicitly with the data.frame() function.

employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)

Output:

> employ_data
employee salary startdate
1 Ram 21000 2016-11-01
2 Sham 23400 2015-03-25
3 Jadu 26800 2017-03-14
> View(employ_data)

Get the Structure of the Data Frame:

If you look at the structure of the data frame now, you see that the variable employee has been stored as a factor rather than as a character vector, as shown in the following output:

str(employ_data)

Output:

> str(employ_data)
'data.frame': 3 obs. of 3 variables:
$ employee : Factor w/ 3 levels "Jadu","Ram","Sham": 2 3 1
$ salary : num 21000 23400 26800
$ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"

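Note that the first column, employee, is of type factor instead of character. In R versions before 4.0.0, data.frame() converts character vectors to factors by default; from R 4.0.0 onward the default is stringsAsFactors = FALSE, so the output above reflects the older behavior. To suppress the conversion explicitly, pass the argument stringsAsFactors = FALSE.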

employ_data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
str(employ_data)

Output:

'data.frame': 3 obs. of 3 variables:
$ employee : chr "Ram" "Sham" "Jadu"
$ salary : num 21000 23400 26800
$ startdate: Date, format: "2016-11-01" "2015-03-25" "2017-03-14"
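
To avoid depending on the default, the argument can be passed explicitly in either direction. The following is a minimal sketch; the variable names as_factor and as_chr are chosen here for illustration.

```r
employee <- c("Ram", "Sham", "Jadu")
salary   <- c(21000, 23400, 26800)

# Passing stringsAsFactors explicitly gives the same result on any R version.
as_factor <- data.frame(employee, salary, stringsAsFactors = TRUE)
as_chr    <- data.frame(employee, salary, stringsAsFactors = FALSE)

class(as_factor$employee)   # "factor"
class(as_chr$employee)      # "character"

# An existing factor column can be converted back to character.
as_factor$employee <- as.character(as_factor$employee)
class(as_factor$employee)   # "character"
```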

Some Useful Functions for Data Frames:

Use dim() to see the dimensions of a data frame. The nrow() and ncol() functions return the number of rows and the number of columns, respectively.

dim(employ_data)
nrow(employ_data)
ncol(employ_data)

Output:

> dim(employ_data)
[1] 3 3
> nrow(employ_data)
[1] 3
> ncol(employ_data)
[1] 3
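
As a small illustration of how these functions relate, dim() returns the row count followed by the column count. The sketch below rebuilds a two-column version of the data frame so that it runs on its own; names() and head() are additional base R functions not covered above.

```r
employ_data <- data.frame(
  employee = c("Ram", "Sham", "Jadu"),
  salary   = c(21000, 23400, 26800),
  stringsAsFactors = FALSE
)

d <- dim(employ_data)        # rows first, then columns
d[1] == nrow(employ_data)    # TRUE
d[2] == ncol(employ_data)    # TRUE

names(employ_data)           # column names: "employee" "salary"
head(employ_data, 2)         # first two rows only
```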

Accessing Elements of a Data Frame:

You can use the [, [[ or $ operator to access the columns of a data frame.

employ_data["salary"] #Extract the salary column
employ_data[["salary"]]
employ_data[[2]]

employ_data[[2,3]] #Extract the element in row 2 column 3
emp<-data.frame(employ_data$salary,employ_data$employee) #Get only the Salary and Employee Name
emp
employ_data[2:3,]    # Extract 2nd and 3rd row
employ_data[employ_data$salary > 23400,]    # selects rows with salary greater than 23400

Output:

> employ_data["salary"] #Extract the salary column
salary
1 21000
2 23400
3 26800
> employ_data[["salary"]]
[1] 21000 23400 26800
> employ_data[[2]]
[1] 21000 23400 26800
>
> employ_data[[2,3]] #Extract the element in row 2 column 3
[1] "2015-03-25"
> emp<-data.frame(employ_data$salary,employ_data$employee) #Get only the Salary and Employee Name
> emp
employ_data.salary employ_data.employee
1 21000 Ram
2 23400 Sham
3 26800 Jadu
> employ_data[2:3,] # Extract 2nd and 3rd row
employee salary startdate
2 Sham 23400 2015-03-25
3 Jadu 26800 2017-03-14
> employ_data[employ_data$salary > 23400,] # selects rows with salary greater than 23400
employee salary startdate
3 Jadu 26800 2017-03-14
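
The three operators differ in what they return, which the output above hints at. The sketch below makes the difference explicit and shows one way to select several columns while keeping their original names; it rebuilds a two-column version of the data frame so that it runs on its own.

```r
employ_data <- data.frame(
  employee = c("Ram", "Sham", "Jadu"),
  salary   = c(21000, 23400, 26800),
  stringsAsFactors = FALSE
)

class(employ_data["salary"])     # "data.frame": single brackets keep the data frame
class(employ_data[["salary"]])   # "numeric": double brackets return the column itself
class(employ_data$salary)        # "numeric": $ behaves like [[ with a name

# Selecting several columns by name keeps the original column names.
emp <- employ_data[, c("salary", "employee")]
names(emp)                       # "salary" "employee"

# A row filter and a column selection can be combined in one call.
employ_data[employ_data$salary > 23400, "employee"]   # "Jadu"
```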

Adding a Row or Column in a Data Frame:

You can add rows to a data frame using the rbind() function.

employ_data<-  rbind(employ_data,list("Madhu",35000,"2017-05-03"))
employ_data

Output:

employee salary startdate
1 Ram 21000 2016-11-01
2 Sham 23400 2015-03-25
3 Jadu 26800 2017-03-14
4 Madhu 35000 2017-05-03
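
An alternative to passing a list is to build the new row as a one-row data frame with matching column names, which makes the intended type of each value explicit. This is a sketch, not the only way to do it; the name new_row is introduced here for illustration.

```r
employ_data <- data.frame(
  employee  = c("Ram", "Sham", "Jadu"),
  salary    = c(21000, 23400, 26800),
  startdate = as.Date(c("2016-11-01", "2015-03-25", "2017-03-14")),
  stringsAsFactors = FALSE
)

# Hypothetical new row, built as a data frame with the same column names.
new_row <- data.frame(
  employee  = "Madhu",
  salary    = 35000,
  startdate = as.Date("2017-05-03"),
  stringsAsFactors = FALSE
)

employ_data <- rbind(employ_data, new_row)
nrow(employ_data)              # 4
class(employ_data$startdate)   # "Date"
```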

Similarly, you can add columns using cbind().

employ_data<- cbind(employ_data,Gender=c("Male","Male","Male","Female"))
employ_data

Output:

employee salary startdate Gender
1 Ram 21000 2016-11-01 Male
2 Sham 23400 2015-03-25 Male
3 Jadu 26800 2017-03-14 Male
4 Madhu 35000 2017-05-03 Female
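
Columns can also be added by assigning to a new name with $, which avoids rebuilding the data frame. In the sketch below, the derived column salary_k (salary in thousands) is a made-up example.

```r
employ_data <- data.frame(
  employee = c("Ram", "Sham", "Jadu", "Madhu"),
  salary   = c(21000, 23400, 26800, 35000),
  stringsAsFactors = FALSE
)

# Assigning to a name that does not exist yet creates the column.
employ_data$Gender <- c("Male", "Male", "Male", "Female")

# A new column can be computed from an existing one (hypothetical example).
employ_data$salary_k <- employ_data$salary / 1000

names(employ_data)   # "employee" "salary" "Gender" "salary_k"
```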

Modify a Data Frame:

Data frames can be modified through reassignment, in the same way as matrices.

employ_data[1,2]<- 40000
employ_data[3,"Gender"]<- "Female"
employ_data

Output:

employee salary startdate Gender
1 Ram 40000 2016-11-01 Male
2 Sham 23400 2015-03-25 Male
3 Jadu 26800 2017-03-14 Female
4 Madhu 35000 2017-05-03 Female
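
Reassignment also works with a logical condition, so several cells can be updated at once. The following sketch assumes an arbitrary threshold and increment, both chosen purely for illustration.

```r
employ_data <- data.frame(
  employee = c("Ram", "Sham", "Jadu"),
  salary   = c(21000, 23400, 26800),
  stringsAsFactors = FALSE
)

# Add 1000 to every salary below 25000 (threshold and amount are made up).
low <- employ_data$salary < 25000
employ_data$salary[low] <- employ_data$salary[low] + 1000

employ_data$salary   # 22000 24400 26800
```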

Deleting Elements from a Data Frame:

You can delete a column from a data frame by assigning NULL to it. A row can be deleted by reassigning the data frame with that row excluded through a negative index.

employ_data$Gender<-NULL #To remove a column
employ_data<-employ_data[-4,] #To remove a row
employ_data

Output:

employee salary startdate
1 Ram 40000 2016-11-01
2 Sham 23400 2015-03-25
3 Jadu 26800 2017-03-14
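
Columns and rows can also be dropped by name or by condition rather than by position, which is less fragile when the layout of the data frame changes. A minimal sketch, with the result names no_gender and kept introduced here for illustration:

```r
employ_data <- data.frame(
  employee = c("Ram", "Sham", "Jadu", "Madhu"),
  salary   = c(40000, 23400, 26800, 35000),
  Gender   = c("Male", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

# Keep every column except "Gender"; the original data frame is left unchanged.
no_gender <- employ_data[, setdiff(names(employ_data), "Gender")]

# Keep every row whose employee is not "Madhu".
kept <- employ_data[employ_data$employee != "Madhu", ]

names(no_gender)   # "employee" "salary"
nrow(kept)         # 3
```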

Summary of Data in a Data Frame:

You can get a statistical summary of the data by using the summary() function.

summary(employ_data)

Output:

   employee             salary        startdate
 Length:3           Min.   :23400   Min.   :2015-03-25
 Class :character   1st Qu.:25100   1st Qu.:2016-01-12
 Mode  :character   Median :26800   Median :2016-11-01
                    Mean   :30067   Mean   :2016-06-02
                    3rd Qu.:33400   3rd Qu.:2017-01-06
                    Max.   :40000   Max.   :2017-03-14
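
summary() can also be applied to a single column, and individual statistics are available through their own functions. The sketch below rebuilds the final state of the data frame so that it runs on its own.

```r
employ_data <- data.frame(
  employee  = c("Ram", "Sham", "Jadu"),
  salary    = c(40000, 23400, 26800),
  startdate = as.Date(c("2016-11-01", "2015-03-25", "2017-03-14")),
  stringsAsFactors = FALSE
)

# Summary of one numeric column; the result is a named vector.
s <- summary(employ_data$salary)
s[["Median"]]                   # 26800

mean(employ_data$salary)        # 30066.67
range(employ_data$startdate)    # earliest and latest start date
```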
