dplyr Package – select()

Select columns with select():

For the examples in this section we will use iris, a data set built into R. First load the data set with the data("iris") command. To see the help file for iris, just type ?iris. Don't forget to load the dplyr package.

 

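library(dplyr)      # provides select()

# iris ships with the datasets package, which R attaches by default,
# so either of the next two lines is enough to make it available
library(datasets)
data("iris")

?iris               # open the help page for iris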

You can see some basic characteristics of the data set with the dim(), str(), and names() functions.

dim(iris)
str(iris)
names(iris)

Output:

> dim(iris)
[1] 150 5
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

Example 1:

Pass select() the data frame, followed by the columns you want to keep.

x<-select(iris,c(Species,Sepal.Length))
head(x)

Output:

  Species Sepal.Length
1  setosa          5.1
2  setosa          4.9
3  setosa          4.7
4  setosa          4.6
5  setosa          5.0
6  setosa          5.4

Example 2:

Inside the select() function you can use : to specify a range of variable names. The range includes both endpoints and every column between them.

y <- select(iris, Sepal.Length:Petal.Length)
head(y)

Output:

  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
4          4.6         3.1          1.5
5          5.0         3.6          1.4
6          5.4         3.9          1.7
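
The : operator also accepts column positions. The sketch below assumes the default column order of iris, in which Sepal.Length, Sepal.Width and Petal.Length are columns 1 to 3. The names by_name and by_position are only illustrative.

```r
library(dplyr)

# Select the same three columns twice: once by name range, once by position range
by_name <- select(iris, Sepal.Length:Petal.Length)
by_position <- select(iris, 1:3)

# Compare the column names returned by the two calls
names(by_name)
names(by_position)
```

Selecting by name is usually the safer choice, because a position range silently picks different columns if the column order changes.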

Example 3:

You can also omit variables with select() by putting a minus sign in front of them.

z<-select(iris,-c(Species,Sepal.Length))
head(z)

Output:

  Sepal.Width Petal.Length Petal.Width
1         3.5          1.4         0.2
2         3.0          1.4         0.2
3         3.2          1.3         0.2
4         3.1          1.5         0.2
5         3.6          1.4         0.2
6         3.9          1.7         0.4

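If you don't want to use the select() function, you can do the same thing with the following equivalent code in base R.

# Look up the positions of the two columns to drop
i <- match("Species", names(iris))
j <- match("Sepal.Length", names(iris))
# Drop exactly those two positions; a range such as i:j would also drop the columns between them
head(iris[, -c(i, j)])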
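
One way to check that a base R column drop matches Example 3 is to compare the column names of the two results. This is a minimal sketch: the variable names (with_dplyr, with_base, drop_cols) are illustrative, and the base R version uses %in% on the column names rather than numeric positions.

```r
library(dplyr)

# dplyr: drop two columns with a minus sign
with_dplyr <- select(iris, -c(Species, Sepal.Length))

# base R: keep every column whose name is not in drop_cols
drop_cols <- c("Species", "Sepal.Length")
with_base <- iris[, !(names(iris) %in% drop_cols)]

# TRUE when both approaches keep the same columns in the same order
identical(names(with_dplyr), names(with_base))
```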

Example 4:

The select() function also supports a special syntax for specifying variable names by pattern. Examples 4 and 5 show two of these helpers, ends_with() and starts_with().

iris_subset1 <- select(iris, ends_with("Length"))
head(iris_subset1)

Output:

  Sepal.Length Petal.Length
1          5.1          1.4
2          4.9          1.4
3          4.7          1.3
4          4.6          1.5
5          5.0          1.4
6          5.4          1.7

Example 5:

iris_subset2 <- select(iris, starts_with("Sepal"))
head(iris_subset2)

Output:

  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
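
Two related helpers, contains() and matches(), follow the same pattern as the examples above. The sketch below is only an illustration: the variable names petal_cols and width_cols and the regular expression are assumptions chosen for the iris columns.

```r
library(dplyr)

# contains() keeps columns whose name includes a literal substring
petal_cols <- select(iris, contains("Petal"))

# matches() keeps columns whose name matches a regular expression;
# "\\.Width$" means the name ends in ".Width"
width_cols <- select(iris, matches("\\.Width$"))

names(petal_cols)
names(width_cols)
```

Use contains() for a fixed piece of text and matches() when the pattern needs regular-expression features such as anchors.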
