Central Tendency and Spread

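This section uses the mtcars dataset. It ships with base R (in the datasets package), so you do not need ggplot2 to get it; install and load ggplot2 only if you have not done so already and want it for plotting later. Then load the data.
Use the code below to do that.

# Optional: needed for plotting only, not for mtcars itself
install.packages("ggplot2")
library(ggplot2)

# mtcars comes with base R's datasets package
data(mtcars)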

You can now access the data through the name ‘mtcars’. Explore it a little with functions such as names(), str(), summary() and dim().

str(mtcars)

Output:

'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 …
$ cyl : num 6 6 4 6 8 6 8 4 4 6 …
$ disp: num 160 160 108 258 360 …
$ hp : num 110 110 93 110 175 105 245 62 95 123 …
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 …
$ wt : num 2.62 2.88 2.32 3.21 3.44 …
$ qsec: num 16.5 17 18.6 19.4 17 …
$ vs : num 0 0 1 1 0 1 0 1 1 1 …
$ am : num 1 1 1 0 0 0 0 0 0 0 …
$ gear: num 4 4 4 3 3 3 3 4 4 4 …
$ carb: num 4 4 1 1 2 1 4 2 2 4 …

dim(mtcars)

Output:

[1] 32 11
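The other exploration functions mentioned above can be tried the same way. The following is a minimal sketch, assuming mtcars is already available in your session; head() is an extra function not used elsewhere in this section.

```r
# Column names of the data frame
names(mtcars)

# Minimum, quartiles, median, mean and maximum for one column
summary(mtcars$mpg)

# First three rows, to see what individual records look like
first_rows <- head(mtcars, 3)
first_rows
```

Calling summary(mtcars) without the $mpg part gives the same kind of summary for every column at once.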

Type ?mtcars in your R console to open the help page, which gives a detailed description of the mtcars dataset.

?mtcars

Central Tendency:

A measure of central tendency is a single value that describes a set of data by identifying its central position. The mean and median are the most commonly used measures of central tendency for numerical data.

The dataset has a variable called ‘mpg’, which records fuel economy in miles per (US) gallon. To get the mean and median of mpg, run the code below.

mean(mtcars$mpg)
median(mtcars$mpg)

Output:

> mean(mtcars$mpg)
[1] 20.09062
> median(mtcars$mpg)
[1] 19.2
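To make clear what these two functions compute, here is a rough sketch that rebuilds both values by hand. The variable names (x, manual_mean, manual_median) are only illustrative, and the median shortcut assumes an even number of observations, which holds for the 32 rows here.

```r
x <- mtcars$mpg

# Mean: sum of the values divided by how many there are
manual_mean <- sum(x) / length(x)

# Median: middle of the sorted values; with an even count
# it is the average of the two middle values
sorted_x <- sort(x)
n <- length(sorted_x)
manual_median <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2

# Both comparisons should report TRUE
all.equal(manual_mean, mean(x))
all.equal(manual_median, median(x))
```

The mean (20.09062) sits slightly above the median (19.2), which hints that a few high-mpg cars pull the average upward.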

Measures of Spread:

Spread describes how far the values lie from each other and from the center. Common measures are the variance, standard deviation, interquartile range (IQR), minimum, maximum and range.

Variance:

var(mtcars$mpg)

Output:

[1] 36.3241

Standard deviation:

sd(mtcars$mpg)

Output:

[1] 6.026948
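The variance and the standard deviation are closely related: the standard deviation is the square root of the variance. As a small sketch (variable names are illustrative), both can be reproduced by hand. Note that var() and sd() in R use the sample formula, dividing by n - 1 rather than n.

```r
x <- mtcars$mpg
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

# Both comparisons should report TRUE
all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
```

Because the standard deviation is in the same unit as the data (miles per gallon), it is usually easier to interpret than the variance.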

Interquartile range (IQR):

IQR(mtcars$mpg)

Output:

[1] 7.375
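The IQR is the distance between the first and third quartiles, so it covers the middle half of the data. A quick sketch using quantile() with its default settings shows where the number comes from; the names q and manual_iqr are only for illustration.

```r
# First (25%) and third (75%) quartiles of mpg
q <- quantile(mtcars$mpg, probs = c(0.25, 0.75))
q

# Difference between the quartiles; unname() drops the "75%" label
manual_iqr <- unname(q[2] - q[1])
manual_iqr

# Should report TRUE
all.equal(manual_iqr, IQR(mtcars$mpg))
```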

Min, Max and Range Function:

min(mtcars$mpg)
max(mtcars$mpg)
range(mtcars$mpg)

Output:

> min(mtcars$mpg)
[1] 10.4
> max(mtcars$mpg)
[1] 33.9
> range(mtcars$mpg)
[1] 10.4 33.9
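Note that range() returns the minimum and maximum as a pair, not a single number. If you want the width of the range, one option is to wrap it in diff(), as in this short sketch (the name mpg_width is illustrative).

```r
# Width of the range: maximum minus minimum
mpg_width <- diff(range(mtcars$mpg))
mpg_width  # 23.5, that is 33.9 - 10.4
```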

Categorical Variable:

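For categorical variables, counts and proportions serve as the summary. Although cyl (number of cylinders) is stored as a number, it takes only a few distinct values, so it can be treated as categorical.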

table(mtcars$cyl)

table(mtcars$cyl)/nrow(mtcars)

Output:

> table(mtcars$cyl)

 4  6  8
11  7 14
> table(mtcars$cyl)/nrow(mtcars)

      4       6       8
0.34375 0.21875 0.43750
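Base R also provides prop.table(), which turns a table of counts into proportions without dividing by nrow() yourself. The sketch below assumes you want percentages rounded to one decimal place; the names counts, props and pct are illustrative.

```r
counts <- table(mtcars$cyl)

# Proportions: same values as counts / nrow(mtcars)
props <- prop.table(counts)
props

# Percentages, rounded to one decimal place
pct <- round(100 * props, 1)
pct
```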

To see which distinct values a column contains, use the unique() function.

unique(mtcars$cyl)

Output:

[1] 6 4 8
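unique() lists the values in the order they first appear in the data. If you need the number of distinct values, or want them in sorted order, you can combine it with length() or sort(), as in this small sketch (variable names are illustrative).

```r
# Number of distinct cylinder counts
n_distinct_cyl <- length(unique(mtcars$cyl))
n_distinct_cyl  # 3

# Distinct values in increasing order
sorted_cyl <- sort(unique(mtcars$cyl))
sorted_cyl  # 4 6 8
```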

To get a two-way frequency table of number of cylinders against number of carburetors, use the code below. Rows are cylinders and columns are carburetors.

table(mtcars$cyl, mtcars$carb)

Output:

> table(mtcars$cyl, mtcars$carb)

    1 2 3 4 6 8
  4 5 6 0 0 0 0
  6 2 0 0 4 1 0
  8 0 4 3 6 0 1
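A two-way table is easier to read when rows and columns are labelled and totals are shown. One possible way, sketched here with the illustrative names tab and tab_totals, is to name the arguments of table() and then call addmargins().

```r
# Naming the arguments labels the two dimensions of the table
tab <- table(cyl = mtcars$cyl, carb = mtcars$carb)
tab

# Add a "Sum" row and a "Sum" column with the totals
tab_totals <- addmargins(tab)
tab_totals
```

The row totals match the counts from table(mtcars$cyl): 11, 7 and 14.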
