dplyr Package – group_by()
The group_by() function is used together with summarise() to compute summary statistics for a data frame within strata defined by one or more variables. On its own, group_by() only specifies how the data should be grouped; it does not change any values. The general operation is split-apply-combine: the data frame is split into separate pieces defined by a variable or group of variables (group_by()), a summary function is applied to each piece, and the results are combined into one row per group (summarise(), also spelled summarize()).
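The split-apply-combine steps can be sketched as follows. This is a minimal example assuming dplyr 1.0 or later is installed; the object names by_cyl and cyl_summary and the mean_mpg column are chosen here for illustration. It shows that group_by() alone leaves the 32 rows untouched and only records the grouping, while summarise() produces one row per group.

```r
library(dplyr)

# Split: record the grouping; the rows themselves are not changed
by_cyl <- group_by(mtcars, cyl)
nrow(by_cyl)        # still 32 rows
group_vars(by_cyl)  # the grouping variable, "cyl"
n_groups(by_cyl)    # 3 groups: 4, 6 and 8 cylinders

# Apply + combine: one summary row per group
cyl_summary <- summarise(by_cyl, mean_mpg = mean(mpg))
cyl_summary
```

Printing by_cyl also shows a "# Groups:" line in the tibble header, which is the only visible trace of the grouping before a summary is applied.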
The examples in this section use mtcars, a data set built into R. Load it with the data("mtcars") command, and type ?mtcars to open its help file. Don't forget to load the dplyr package as well.
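library(dplyr)      # provides group_by() and summarise()
library(datasets)   # mtcars ships with R's datasets package
# OR load the data set explicitly:
data("mtcars")
?mtcars             # open the help file for mtcars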
You can see some basic characteristics of the data set with the dim(), str() and names() functions.
dim(mtcars)
str(mtcars)
names(mtcars)
Output:
> dim(mtcars)
[1] 32 11
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
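str() reports cyl, vs and am as numeric, but each takes only a few distinct values, which is what makes them usable as grouping variables. A quick check before grouping might look like this sketch (assuming dplyr is loaded; cyl_counts is an illustrative name). count() returns the number of rows per value, i.e. the size each group will have.

```r
library(dplyr)

# Distinct values of the candidate grouping variables
sort(unique(mtcars$cyl))   # 4 6 8
sort(unique(mtcars$vs))    # 0 1
sort(unique(mtcars$am))    # 0 1

# Rows per cylinder count: the size of each future group
cyl_counts <- count(mtcars, cyl)
cyl_counts
```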
Example 1:
Now we can group the data frame by the cyl variable.
by_cyl <- group_by(mtcars, cyl)   # a distinct name avoids confusion with the cyl column
summarise(by_cyl, mean(disp), mean(hp))
Output:
> summarise(by_cyl, mean(disp), mean(hp))
# A tibble: 3 x 3
cyl `mean(disp)` `mean(hp)`
<dbl> <dbl> <dbl>
1 4 105.1364 82.63636
2 6 183.3143 122.28571
3 8 353.1000 209.21429
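The same summary is often written as a pipeline with named result columns, which avoids the backticked headers such as `mean(disp)` seen above. This is a sketch assuming dplyr's %>% pipe; the column names mean_disp, mean_hp and n_cars are chosen here for illustration, and n() is added to show how many cars contribute to each mean.

```r
library(dplyr)

# Same grouping and means as above, written as a pipeline
cyl_means <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_disp = mean(disp),  # average displacement per cylinder group
    mean_hp   = mean(hp),    # average horsepower per cylinder group
    n_cars    = n()          # number of cars in each group
  )
cyl_means
```

Because only one variable was used for grouping, summarise() removes it and the result is an ordinary, ungrouped tibble.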
Example 2:
Grouping by two variables, vs and am, and counting the cars in each combination with n():
by_vs_am <- group_by(mtcars, vs, am)
summarise(by_vs_am, n = n())
Output:
> summarise(by_vs_am, n = n())
# A tibble: 4 x 3
# Groups: vs [2]
vs am n
<dbl> <dbl> <int>
1 0 0 12
2 0 1 6
3 1 0 7
4 1 1 7
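The "Groups: vs" line in the output reflects that summarise() removes only the last grouping variable (am) by default, so the result is still grouped by vs. The sketch below, assuming dplyr 1.0.0 or later for the .groups argument, shows that behaviour, how to request an ungrouped result instead, and count() as a shortcut for the same counts. The names still_grouped, flat and vs_am_counts are illustrative.

```r
library(dplyr)

by_vs_am <- group_by(mtcars, vs, am)

# Default: the last grouping variable (am) is dropped, vs remains
still_grouped <- summarise(by_vs_am, n = n())
group_vars(still_grouped)   # "vs"

# .groups = "drop" returns an ungrouped tibble
flat <- summarise(by_vs_am, n = n(), .groups = "drop")
group_vars(flat)            # character(0)

# count() combines group_by() and summarise(n = n()) in one call
vs_am_counts <- count(mtcars, vs, am)
vs_am_counts
```

Dropping the remaining grouping matters when the result is used further, because later verbs such as mutate() would otherwise operate within each vs group.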