Explore your Data: Graphs and shapes of distributions

For Categorical Variables:

If the variable of interest is categorical, a pie chart or a bar graph is generally the best representation.

When there are many categories, a pie chart can become cluttered, while a bar graph still gives a clear view. So a bar graph has the advantage over a pie chart when the number of categories is large.

For the table above, a pie chart of the percentage column looks like this.

For the same table, a bar graph of the percentage column looks like this.
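The percentage column that both charts display is the relative frequency of each category: its count divided by the total, times 100. The sketch below computes that column from raw observations and renders it as a plain-text bar graph. The fruit data and the helper names `frequency_table` and `text_bar_chart` are hypothetical; they are not the table from this article.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


def text_bar_chart(rows, width=40):
    """Draw the percent column as horizontal bars; the largest bar gets `width` marks."""
    longest = max(len(str(cat)) for cat, _, _ in rows)
    max_pct = max(pct for _, _, pct in rows)
    lines = []
    for cat, _, pct in rows:
        bar = "#" * round(width * pct / max_pct)
        lines.append(f"{str(cat):<{longest}} | {bar} {pct:.1f}%")
    return "\n".join(lines)


# Hypothetical sample: favourite fruit of 10 respondents
fruits = ["apple", "banana", "apple", "cherry", "apple",
          "banana", "apple", "cherry", "banana", "apple"]
rows = frequency_table(fruits)
print(text_bar_chart(rows))
```

With a plotting library such as matplotlib, the same percentages could be passed to `plt.bar` or `plt.pie` instead of being drawn as text.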

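In general, a categorical variable can be summarized or displayed in any of the following ways:

  • Bar plot
  • Pie chart
  • Frequency table
  • Contingency table (two categorical variables)
  • Segmented bar plot (two categorical variables)
  • Relative frequency table
  • Mosaic plot (two categorical variables)
  • Side-by-side box plot (a quantitative variable compared across categories)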
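A contingency table counts how often each combination of two categorical variables occurs, and a segmented bar plot shows each row of that table as proportions stacked to 100%. The sketch below builds both from raw data. The sex/smoker data and the helper names `contingency_table` and `row_proportions` are hypothetical illustrations.

```python
from collections import Counter


def contingency_table(row_var, col_var):
    """Cross-tabulate two categorical variables recorded for the same observations."""
    if len(row_var) != len(col_var):
        raise ValueError("both variables need one value per observation")
    pair_counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    table = {r: {c: pair_counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels


def row_proportions(table):
    """Proportions within each row: the segment heights of a segmented bar plot."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: n / total for c, n in cells.items()}
    return result


# Hypothetical data: smoking status by sex for 8 people
sex = ["F", "F", "M", "M", "F", "M", "F", "M"]
smoker = ["no", "yes", "no", "no", "no", "yes", "no", "yes"]
table, sex_levels, smoker_levels = contingency_table(sex, smoker)
print("   ", smoker_levels)
for r in sex_levels:
    print(r, " ", [table[r][c] for c in smoker_levels])
```

If pandas is available, `pandas.crosstab` produces the same kind of table, and its `normalize="index"` option gives the row proportions.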

For Quantitative Variables:

Dot Plot:

If you are working with a quantitative (numerical) variable, a dot plot is one representation you can use. A dot plot looks like this. To draw one, draw a horizontal line, label the possible values along it at regular intervals, and then plot each observation as a dot above its value, stacking the dots when a value occurs more than once.
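The drawing procedure above can be sketched as a plain-text dot plot: an axis of evenly spaced values, with one stacked `o` per observation. This is a minimal illustration that assumes integer data lying exactly on the axis ticks. The function name `dot_plot` and the sample scores are hypothetical.

```python
from collections import Counter


def dot_plot(values, lo, hi, step=1):
    """Text dot plot: stack one 'o' above each axis value per observation."""
    counts = Counter(values)
    ticks = list(range(lo, hi + 1, step))
    # Assumption: every observation falls exactly on one of the ticks.
    off_axis = [v for v in counts if v not in ticks]
    if off_axis:
        raise ValueError(f"values not on the axis ticks: {off_axis}")
    height = max(counts.values(), default=0)
    cell = max(len(str(t)) for t in ticks) + 1   # character width per tick
    lines = []
    for level in range(height, 0, -1):           # draw from the top row down
        row = "".join(("o" if counts[t] >= level else " ").center(cell) for t in ticks)
        lines.append(row.rstrip())
    lines.append("-" * (cell * len(ticks)))      # the horizontal line
    lines.append("".join(str(t).center(cell) for t in ticks).rstrip())
    return "\n".join(lines)


# Hypothetical quiz scores for 8 students
scores = [2, 3, 3, 4, 4, 4, 5, 7]
print(dot_plot(scores, 1, 8))
```

Every observation becomes its own dot, which is also why a dot plot gets crowded as the sample grows.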

With a very large sample, however, a dot plot can look cluttered.

In that case, another representation, the histogram, is often more useful.

 
Histogram:

A histogram is similar to a bar graph in that it uses bars to show the frequencies or relative frequencies of the possible values of a variable. There is one important difference, however: the bars in a histogram touch each other. The touching bars reflect that the values of an interval/ratio variable lie on an underlying continuous scale, so each bar covers an interval of values rather than a separate category. Below is an example of a histogram.
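Because the bars touch, a histogram is built by splitting the number line into adjacent, equal-width intervals (bins) and counting the observations in each. The sketch below does that with half-open bins `[a, a + width)`, so neighbouring bins share an edge and every value lands in exactly one bin. The bin width, the starting edge, the function name `histogram_counts` and the age data are all hypothetical choices for illustration.

```python
def histogram_counts(values, bin_width, start=None):
    """Count observations in adjacent, equal-width bins [start + i*w, start + (i+1)*w)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if start is None:
        start = min(values)
    if start > min(values):
        raise ValueError("start must not exceed the smallest value")
    # Floor division gives the index of the bin each value falls into.
    indices = [int((v - start) // bin_width) for v in values]
    counts = [0] * (max(indices) + 1)
    for i in indices:
        counts[i] += 1
    # Consecutive bins share an edge, which is why the bars touch.
    edges = [start + i * bin_width for i in range(len(counts) + 1)]
    return edges, counts


# Hypothetical ages of 11 people, binned by decade
ages = [21, 23, 25, 25, 28, 31, 34, 35, 36, 42, 47]
edges, counts = histogram_counts(ages, 10, start=20)
for lo, hi, n in zip(edges, edges[1:], counts):
    print(f"[{lo}, {hi}): {'#' * n}")
```

The choice of bin width changes how the shape looks: bins that are too wide hide detail, and bins that are too narrow make the histogram noisy. matplotlib's `plt.hist` accepts a `bins` argument for the same purpose.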

Look at the example histogram and notice the shape of each distribution. It shows three kinds of shape:

  1. The middle one is bell-shaped, has one peak, and is approximately symmetric.
  2. The left one is left-skewed (its long tail extends to the left) and unimodal.
  3. The right one is right-skewed (its long tail extends to the right) and unimodal.
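Skew can also be checked numerically. One common measure is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean: it is positive for a long right tail, negative for a long left tail, and near zero for a symmetric shape. The sketch below uses it to label a sample; the cut-off of 0.5 in the hypothetical helper `describe_shape` is an arbitrary assumption, not a standard rule, and the example data are made up.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=0.5):
    """Label a sample by the sign of g1; `threshold` is an arbitrary assumed cut-off."""
    g1 = skewness(values)
    if g1 > threshold:
        return "right-skewed"
    if g1 < -threshold:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical samples
right = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]     # long tail toward large values
left = [11 - x for x in right]              # mirror image: long tail toward small values
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
for name, data in [("right", right), ("left", left), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 2), describe_shape(data))
```

A number like this is a complement to looking at the histogram, not a replacement for it.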

 

The modality of a distribution describes how many peaks it has. Four kinds are commonly distinguished:

  • Unimodal: one peak
  • Bimodal: two peaks
  • Multimodal: more than two peaks
  • Uniform: no clear peak; values are spread roughly evenly across the range
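As a rough illustration of these four categories, the sketch below classifies a list of histogram bin counts by counting its local peaks, treating a flat run of equal counts as a single peak. It is a toy heuristic under stated assumptions: the "nearly flat means uniform" tolerance of 20% is arbitrary, the helper names `count_peaks` and `modality` are hypothetical, and real data usually has small bumps that a person would ignore but this code would count as peaks.

```python
def count_peaks(counts):
    """Count local maxima in histogram bin counts; a plateau counts as one peak."""
    peaks = 0
    n = len(counts)
    i = 0
    while i < n:
        # Extend j over a run of equal counts starting at i.
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        left = counts[i - 1] if i > 0 else -1       # outside the range counts as lower
        right = counts[j + 1] if j + 1 < n else -1
        if counts[i] > left and counts[i] > right:
            peaks += 1
        i = j + 1
    return peaks


def modality(counts, tolerance=0.2):
    """Classify non-empty bin counts; `tolerance` (assumed) decides what counts as flat."""
    if max(counts) - min(counts) <= tolerance * max(counts):
        return "uniform"
    peaks = count_peaks(counts)
    return {1: "unimodal", 2: "bimodal"}.get(peaks, "multimodal")


# Hypothetical bin counts
for counts in ([1, 3, 7, 3, 1], [1, 5, 2, 6, 1], [1, 5, 2, 6, 1, 4, 0], [5, 5, 6, 5]):
    print(counts, modality(counts))
```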

Whenever you work with data, don't forget to examine the shape of its distribution. The shape matters because it can affect which statistical methods are appropriate later on.
