dplyr Package – mutate()
Add new columns with mutate():
The mutate() function adds new columns to a data frame by computing transformations of existing variables. You will often want to create new variables that are derived from existing ones, and mutate() provides a clean interface for doing that.
The examples in this section use sleep, a data set built into R. Load it with the data("sleep") command. To view the help file for the data set, type ?sleep. Don't forget to load the dplyr package first.
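library(dplyr)     # provides mutate() and transmute()
library(datasets)  # built-in data sets, including sleep
# OR load the data set explicitly
data("sleep")
?sleep             # open the help page for the data set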
You can see some basic characteristics of the data set with the dim(), str(), and summary() functions.
dim(sleep)
str(sleep)
summary(sleep)
Output:
> dim(sleep)
[1] 20 3
> str(sleep)
'data.frame': 20 obs. of 3 variables:
 $ extra: num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
 $ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ID   : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> summary(sleep)
     extra        group        ID
 Min.   :-1.600   1:10   1      :2
 1st Qu.:-0.025   2:10   2      :2
 Median : 0.950          3      :2
 Mean   : 1.540          4      :2
 3rd Qu.: 3.400          5      :2
 Max.   : 5.500          6      :2
                         (Other):8
Example 1:
Here we create an 'extra_derived' variable by subtracting the mean of 'extra' from each value of 'extra'.
sleep_data <- mutate(sleep, extra_derived = extra - mean(extra, na.rm = TRUE))
str(sleep_data)
head(sleep_data)
Output:
> str(sleep_data)
'data.frame': 20 obs. of 4 variables:
$ extra : num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
$ group : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ extra_derived: num -0.84 -3.14 -1.74 -2.74 -1.64 1.86 2.16 -0.74 -1.54 0.46 ...
> head(sleep_data)
  extra group ID extra_derived
1   0.7     1  1         -0.84
2  -1.6     1  2         -3.14
3  -0.2     1  3         -1.74
4  -1.2     1  4         -2.74
5  -0.1     1  5         -1.64
6   3.4     1  6          1.86
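A single mutate() call can also create several columns at once, and a later expression may refer to a column created earlier in the same call. The following is a minimal sketch of that idea using the sleep data; the names sleep_scaled, extra_centered, and extra_z are chosen here purely for illustration and do not come from the original example.

```r
library(dplyr)

data("sleep")

# Create two columns in one call.
# extra_z is computed from extra_centered, which is defined just above it
# in the same mutate() call.
sleep_scaled <- mutate(
  sleep,
  extra_centered = extra - mean(extra, na.rm = TRUE),
  extra_z        = extra_centered / sd(extra, na.rm = TRUE)
)

str(sleep_scaled)
head(sleep_scaled, 3)
```

The original sleep data frame is left unchanged; mutate() returns a new data frame with the extra columns appended on the right.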
Example 2:
There is also the related transmute() function, which does the same thing as mutate() but then drops all non-transformed variables.
s<-transmute(sleep, extra = extra*100)
head(s)
Output:
  extra
1    70
2  -160
3   -20
4  -120
5   -10
6   340
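Because transmute() keeps only the variables named in the call, an existing column survives if you list it by name next to the new one. A short sketch, assuming you want to keep ID and group alongside the rescaled values (the names s2 and extra_x100 are illustrative only):

```r
library(dplyr)

data("sleep")

# Name ID and group explicitly so they are kept;
# extra_x100 is a new column, and the original extra column is dropped.
s2 <- transmute(sleep, ID, group, extra_x100 = extra * 100)

names(s2)
head(s2)
```

If you want every original column plus the new one, use mutate() instead.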