dplyr Package – group_by()
The group_by() function is used together with summarise() to generate summary statistics from a data frame within strata defined by one or more variables. group_by() itself only sets up how you want to group your data; it does not compute anything on its own. The general operation is split-apply-combine: the data frame is split into separate pieces defined by a variable or group of variables (group_by()), and then a summary function is applied to each of those subsets (summarise(), also spelled summarize()).
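To make the split-apply-combine idea concrete, the sketch below computes the mean mpg for each cylinder count in two ways: with group_by() plus summarise(), and with base R's split() plus sapply(). The column name mean_mpg and the objects by_cyl and base_means are illustrative choices, not part of the original text; both approaches should give the same three means.

```r
library(dplyr)

# dplyr: split by cyl (group_by), apply mean to each piece, combine (summarise)
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Base R equivalent: split mpg by cyl, apply mean, combine into a named vector
base_means <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

print(by_cyl)
print(base_means)
```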
For the examples in this section we will use mtcars, a data set that is built into R (it ships with the datasets package). You can load it explicitly with the data("mtcars") command, and you can open its help file by typing ?mtcars. Don't forget to load the dplyr package.
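library(dplyr)
library(datasets)  # mtcars lives in the datasets package, which R attaches by default
#OR load the data set explicitly
data("mtcars")
?mtcars            # open the help page for the mtcars data set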
You can see some basic characteristics of the data set with the dim(), str(), and names() functions.
dim(mtcars)
str(mtcars)
names(mtcars)
Output:
> dim(mtcars)
[1] 32 11
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 …
$ cyl : num 6 6 4 6 8 6 8 4 4 6 …
$ disp: num 160 160 108 258 360 …
$ hp : num 110 110 93 110 175 105 245 62 95 123 …
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 …
$ wt : num 2.62 2.88 2.32 3.21 3.44 …
$ qsec: num 16.5 17 18.6 19.4 17 …
$ vs : num 0 0 1 1 0 1 0 1 1 1 …
$ am : num 1 1 1 0 0 0 0 0 0 0 …
$ gear: num 4 4 4 3 3 3 3 4 4 4 …
$ carb: num 4 4 1 1 2 1 4 2 2 4 …
> names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
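Before grouping, it can help to check which values the grouping variables take. The sketch below lists the distinct values of cyl and cross-tabulates vs and am with base R's table(); the object names cyl_values and vs_am_table are illustrative. Each cell of the table is the number of cars in that vs/am combination, which is the same count that group_by() plus summarise(n = n()) produces.

```r
# Distinct values of cyl, the variable grouped on in the first example
cyl_values <- sort(unique(mtcars$cyl))
print(cyl_values)  # 4 6 8

# Cross-tabulate vs (engine shape) and am (transmission):
# each cell counts the cars in that combination
vs_am_table <- table(vs = mtcars$vs, am = mtcars$am)
print(vs_am_table)
```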
Example:
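Now we can group the data frame by the cyl variable (number of cylinders) and compute the mean displacement (disp) and mean horsepower (hp) within each group.
by_cyl <- group_by(mtcars, cyl)
summarise(by_cyl, mean(disp), mean(hp))
Output:
> summarise(by_cyl, mean(disp), mean(hp))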
# A tibble: 3 x 3
cyl `mean(disp)` `mean(hp)`
<dbl> <dbl> <dbl>
1 4 105.1364 82.63636
2 6 183.3143 122.28571
3 8 353.1000 209.21429
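A grouped data frame still contains every row; the grouping is stored as metadata until a verb such as summarise() uses it. The sketch below, assuming a reasonably recent dplyr (n_groups() and group_vars() are dplyr functions), inspects that metadata and then names the summary columns so the output avoids backtick-quoted names like `mean(disp)`. The names by_cyl, cyl_summary, mean_disp, mean_hp and n_cars are illustrative choices.

```r
library(dplyr)

# group_by() alone returns all 32 rows, with the grouping stored as metadata
by_cyl <- group_by(mtcars, cyl)
print(nrow(by_cyl))        # rows are unchanged
print(n_groups(by_cyl))    # number of distinct cyl values
print(group_vars(by_cyl))  # names of the grouping variables

# Named summary columns, plus a per-group row count from n()
cyl_summary <- summarise(by_cyl,
                         mean_disp = mean(disp),
                         mean_hp   = mean(hp),
                         n_cars    = n())
print(cyl_summary)
```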
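Example 2:
Group by two variables, vs (engine shape: 0 = V-shaped, 1 = straight) and am (transmission: 0 = automatic, 1 = manual), and count the number of cars in each combination with n().
by_vs_am <- group_by(mtcars, vs, am)
summarise(by_vs_am, n = n())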
Output:
> summarise(by_vs_am, n = n())
# A tibble: 4 x 3
# Groups: vs [?]
vs am n
<dbl> <dbl> <int>
1 0 0 12
2 0 1 6
3 1 0 7
4 1 1 7
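As the "Groups: vs" line in the output shows, summarising over two grouping variables removes only the last one (am), so the result is still grouped by vs. The sketch below assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument; .groups = "drop" returns an ungrouped result. It also shows count(), a dplyr shorthand for group_by() followed by summarise(n = n()). The object names vs_am_counts and vs_am_counts2 are illustrative.

```r
library(dplyr)

# Same counts as Example 2, but drop all grouping from the result
# (.groups requires dplyr >= 1.0.0)
vs_am_counts <- mtcars %>%
  group_by(vs, am) %>%
  summarise(n = n(), .groups = "drop")

# count() combines the grouping and the n() summary in one call
vs_am_counts2 <- count(mtcars, vs, am)

print(vs_am_counts)
print(vs_am_counts2)
```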