Basic Statistics
- Data Science Essentials: 10 Statistical Concepts
- Cases, Variables, Types of variables
- Matrix and Frequency Table
- Graphs and shapes of Distributions
- Mode, Median and Mean
- Range, Interquartile Range and Box Plot
- Variance and Standard Deviation
- Z-score or Standardized Score
- Contingency Table, Scatterplot, Pearson’s r
- Basics of Regression
- Elementary Probability
- Random Variables and Probability Distributions
- Normal Distribution, Binomial Distribution & Poisson Distribution
Data Matrix and Frequency Table
Data Matrix
A data matrix is a rectangular table or matrix in which rows represent observations or cases, and columns represent variables or attributes. Each cell in the matrix contains a value corresponding to the variable for the given observation. A data matrix can be used to organize and store data for easy analysis and interpretation.
Frequency Table
A frequency table, on the other hand, is a tabular representation of the frequency distribution of a categorical variable. It shows the number or frequency of observations that fall into each category of the variable. Each row in the frequency table represents a category of the variable, and the corresponding column shows the number of observations that fall into that category. Frequency tables can be used to summarize and visualize categorical data, and they can be used to calculate various summary statistics, such as the mode and the percentage of observations in each category.
If you’re conducting a study, you should think about your data in terms of cases and variables.
Cases are the persons, animals or things in your study, and variables are the characteristics of interest. Here, I will discuss how you can order and present your cases and variables. Let's take an example: imagine you are interested in the “Primera División”, the top football competition in Spain. Here, the cases you're interested in are the individual football players in the league, and the variables you focus on are age, body weight, goals scored, team membership and hair color. The best way to order all this information is by means of a data matrix.
So, a data matrix is the tabular representation of the cases and variables of your statistical study. Each row of a data matrix represents a case and each column represents a variable.
A complete data matrix may contain thousands, hundreds of thousands, or even more cases.
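As a rough illustration of the row-per-case, column-per-variable layout, the football example could be sketched in plain Python as below. The player values, team names and column names are all invented for illustration; they are not taken from any real dataset.

```python
# Hypothetical data matrix for the football example.
# Each inner list is one case (a player); each position is one variable.
# All values and names here are invented for illustration only.
columns = ["age", "body_weight_kg", "goals_scored", "team", "hair_color"]
data_matrix = [
    [24, 72.5, 11, "Team A", "black"],
    [31, 80.0, 4, "Team B", "brown"],
    [27, 76.3, 7, "Team A", "blond"],
]

n_cases = len(data_matrix)    # number of rows
n_variables = len(columns)    # number of columns

# Read a single variable (one column) across all cases.
goals_index = columns.index("goals_scored")
goals = [row[goals_index] for row in data_matrix]

print(n_cases, n_variables)
print(goals)
```

In practice a table library would usually hold this structure, but the idea is the same: one row per case, one column per variable.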
A sample from the Iris dataset is shown below. You can get the full dataset from the UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/iris
To get more insight, it is very useful to summarize the information. A good way to do that is to make a frequency table, which shows how the values of a variable are distributed over the cases. From the frequency of each value we can then compute the percentage, and even the cumulative percentage. Consider the following example.
Here we have 8 cases in total. Of these, 2 cases (25%) belong to Iris-setosa, 3 cases (37.5%, roughly 38%) belong to Iris-virginica, and another 3 cases (again roughly 38%) belong to Iris-versicolor.
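The counts, percentages and cumulative percentages described above can be reproduced with a short sketch using only the Python standard library. The class counts (2, 3, 3) come from the example; the order of the cases and the helper name `frequency_table` are assumptions made for illustration.

```python
from collections import Counter

# Class labels for the 8 cases, matching the counts in the example (2, 3, 3).
# The order of the cases is an assumption; it does not change the counts.
classes = (
    ["Iris-setosa"] * 2
    + ["Iris-versicolor"] * 3
    + ["Iris-virginica"] * 3
)

def frequency_table(values):
    """Return rows of (category, count, percent, cumulative percent)."""
    counts = Counter(values)  # keeps first-seen order of categories
    total = len(values)
    rows = []
    cumulative = 0.0
    for category, count in counts.items():
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((category, count, percent, cumulative))
    return rows

table = frequency_table(classes)
for category, count, percent, cumulative in table:
    print(f"{category:<16} {count:>3} {percent:>6.1f}% {cumulative:>6.1f}%")
```

The unrounded percentages are 25%, 37.5% and 37.5%, which add up to exactly 100%; the rounded figures (25%, 38%, 38%) add up to 101% only because of rounding.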
The example above is for a categorical variable called class. If your variable is quantitative, however, computing a percentage for every distinct value usually does not make sense, because most values occur only once or twice. In that case, first group the values into ordered categories using intervals, and then build the frequency table in the same way.
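A minimal sketch of this interval approach is shown below. The sepal lengths are invented for illustration, and the starting point (4.0 cm), the interval width (1.0 cm) and the helper name `bin_label` are assumptions; in a real analysis you would choose intervals that suit your data.

```python
from collections import Counter

# Hypothetical sepal lengths in cm, invented for illustration only.
sepal_lengths = [4.9, 5.1, 5.8, 6.3, 6.4, 7.0, 5.0, 6.7]

def bin_label(value, start=4.0, width=1.0):
    """Return the half-open interval [low, high) that contains value."""
    index = int((value - start) // width)
    low = start + index * width
    return f"[{low:.1f}, {low + width:.1f})"

# Count how many cases fall into each interval.
counts = Counter(bin_label(v) for v in sepal_lengths)
total = len(sepal_lengths)

# Sorting the labels as strings works here because every bound has one
# digit before the decimal point; wider ranges would need a numeric sort.
for label in sorted(counts):
    count = counts[label]
    print(f"{label}: {count} ({100 * count / total:.1f}%)")
```

Each interval now plays the role of a category, so the percentages and cumulative percentages can be computed exactly as for the class variable.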