dplyr Package – mutate()

Add new columns with mutate():

The mutate() function computes transformations of variables in a data frame. When you need new variables derived from existing ones, mutate() provides a clean interface for that: it adds the new columns and keeps the existing ones.

For the examples in this section we will use sleep, a data set built into R. Load it with the data("sleep") command, and type ?sleep to open its help file. Don’t forget to load the dplyr package.

 

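library(dplyr)     # provides mutate() and transmute()
library(datasets)  # usually attached by default in an R session
# OR load the data set explicitly
data("sleep")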

?sleep

You can see some basic characteristics of the data set with the dim(), str(), and summary() functions.

dim(sleep)
str(sleep)
summary(sleep)

Output:

> dim(sleep)
[1] 20  3
> str(sleep)
'data.frame':	20 obs. of  3 variables:
 $ extra: num  0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
 $ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ID   : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> summary(sleep)
     extra        group        ID
 Min.   :-1.600   1:10   1      :2
 1st Qu.:-0.025   2:10   2      :2
 Median : 0.950          3      :2
 Mean   : 1.540          4      :2
 3rd Qu.: 3.400          5      :2
 Max.   : 5.500          6      :2
                         (Other):8

Example:

Here we create an ‘extra_derived’ variable by subtracting the mean of ‘extra’ from each value of ‘extra’, which centers the variable around zero.

sleep_data <- mutate(sleep, extra_derived = extra - mean(extra, na.rm = TRUE))

str(sleep_data)
head(sleep_data)

Output:

> str(sleep_data)
'data.frame':	20 obs. of  4 variables:
 $ extra        : num  0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
 $ group        : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ID           : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ extra_derived: num  -0.84 -3.14 -1.74 -2.74 -1.64 1.86 2.16 -0.74 -1.54 0.46 ...
> head(sleep_data)
  extra group ID extra_derived
1   0.7     1  1         -0.84
2  -1.6     1  2         -3.14
3  -0.2     1  3         -1.74
4  -1.2     1  4         -2.74
5  -0.1     1  5         -1.64
6   3.4     1  6          1.86
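
mutate() can also create several columns in one call, and a later expression may refer to a column created earlier in the same call. The sketch below illustrates this with the same sleep data. The names sleep_multi, extra_centered and extra_scaled are made up for this illustration, and the final check only confirms that a centered column averages to (numerically) zero.

```r
library(dplyr)

# Hypothetical column names, chosen only for this illustration
sleep_multi <- mutate(
  sleep,
  extra_centered = extra - mean(extra, na.rm = TRUE),
  # extra_scaled reuses extra_centered, which was created just above
  extra_scaled = extra_centered / sd(extra, na.rm = TRUE)
)

# A centered variable should have a mean of zero, up to rounding error
centered_mean_is_zero <- abs(mean(sleep_multi$extra_centered)) < 1e-8
print(centered_mean_is_zero)

head(sleep_multi)
```

Putting both columns in a single call avoids an intermediate data frame, at the cost of making the order of the expressions matter.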

Example 2:

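There is also the related transmute() function, which computes new variables the same way as mutate() but keeps only the variables named in the call and drops all the others. Here we multiply ‘extra’ by 100 and keep only that column.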

s <- transmute(sleep, extra = extra * 100)

head(s)

Output:

  extra
1    70
2  -160
3   -20
4  -120
5   -10
6   340
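
To make the difference concrete, the sketch below runs the same transformation through both functions and compares the resulting column names. The last call assumes dplyr 1.0.0 or later, where mutate() accepts a .keep argument; for a simple ungrouped call like this one, mutate(.keep = "none") should return the same single column as transmute(). The object names are illustrative.

```r
library(dplyr)

# mutate() keeps every existing column and overwrites extra in place
s_mutate <- mutate(sleep, extra = extra * 100)
names(s_mutate)      # "extra" "group" "ID"

# transmute() keeps only the columns named in the call
s_transmute <- transmute(sleep, extra = extra * 100)
names(s_transmute)   # "extra"

# Assumption: dplyr >= 1.0.0, where mutate() has a .keep argument
s_keep_none <- mutate(sleep, extra = extra * 100, .keep = "none")
names(s_keep_none)   # "extra"
```

If you need the original columns later in the analysis, mutate() is the safer choice; transmute() suits cases where only the derived values are wanted.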
