Basic Statistics
- Data Science Essentials: 10 Statistical Concepts
- Cases, Variables, Types of variables
- Matrix and Frequency Table
- Graphs and shapes of Distributions
- Mode, Median and Mean
- Range, Interquartile Range and Box Plot
- Variance and Standard Deviation
- Z-score or Standardized Score
- Contingency Table, Scatterplot, Pearson’s r
- Basics of Regression
- Elementary Probability
- Random Variables and Probability Distributions
- Normal Distribution, Binomial Distribution & Poisson Distribution
Z-score or Standardized Score
What is a Z-score?
A Z-score, also known as a standard score, is a statistical measure that expresses how far a data point lies from the mean of a dataset, in units of the standard deviation.
How do you calculate a Z-score?
The formula to calculate the Z-score of a data point is:
Z = (X – μ) / σ
where:
- X is the data point
- μ is the mean of the dataset
- σ is the standard deviation of the dataset
A standardized score (Z-score) tells you how many standard deviations an element falls from the mean. You may also see the formula written as z = (X − μ) / σ, where z is the z-score, X is the value of the element, μ is the population mean, and σ is the population standard deviation.
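As a quick illustration, here is a minimal Python sketch of this formula. The dataset and the helper name `z_score` are made up for the example, not taken from the text above.

```python
import statistics

def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

data = [4, 8, 6, 5, 3, 7, 9, 5]        # hypothetical dataset
mu = statistics.mean(data)             # mean of the dataset (μ)
sigma = statistics.pstdev(data)        # population standard deviation (σ)

print(z_score(9, mu, sigma))           # z-score of the data point X = 9
```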
- A Z-score helps you judge whether a specific observation is common or exceptional in your study.
- Since the mean is the center of the distribution, negative Z-scores represent values below the mean, while positive Z-scores represent values above it.
- If you add up all the Z-scores in a dataset, you get 0, because the positive and negative Z-scores cancel each other out (see the sketch after this list).
- If your data are strongly right skewed, you will likely see a few large positive Z-scores; if the distribution is left skewed, you will see a few large negative Z-scores.
- The Z-score of the mean itself is 0.
- As a rule of thumb, an observation with |Z| greater than 2 can be considered unusual or exceptional.
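Here is a small numpy sketch, with hypothetical data, that checks two of the properties listed above: the Z-scores of a dataset sum to roughly zero, and the mean itself has a Z-score of exactly 0.

```python
import numpy as np

data = np.array([52, 55, 60, 61, 63, 64, 66, 95])   # hypothetical scores
z = (data - data.mean()) / data.std()                # standardize every value

print(round(z.sum(), 10))                          # ~0: positives and negatives cancel
print((data.mean() - data.mean()) / data.std())    # 0.0: the Z-score of the mean
```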
Why is a Z-score needed?
Sometimes in a statistical analysis you want to figure out whether a specific observation is a common or an exceptional case. The Z-score tells you how many standard deviations it falls below or above the mean.
Z-scores are also useful for identifying outliers in a dataset. If a data point has a Z-score that is much larger or smaller than those of the other data points, it may be an outlier that deserves further investigation.
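As a rough sketch of that idea, the |Z| > 2 rule of thumb can flag a suspicious value; the data and the threshold here are illustrative, not from the text.

```python
import numpy as np

data = np.array([52, 55, 60, 61, 63, 64, 66, 95])   # hypothetical scores
z = (data - data.mean()) / data.std()

outliers = data[np.abs(z) > 2]   # values more than 2 standard deviations from the mean
print(outliers)                  # -> [95], the candidate outlier worth a closer look
```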
Bell Shaped Distribution and Empirical Rule
If a distribution is bell shaped, then about 68% of the elements have a Z-score between -1 and 1, about 95% have a Z-score between -2 and 2, and about 99.7% have a Z-score between -3 and 3. This is known as the empirical rule.
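You can verify these percentages with the standard normal CDF; the use of scipy here is an assumption for illustration, the text itself does not rely on any library.

```python
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)      # P(-k < Z < k) for a standard normal
    print(f"within ±{k} SD: {prob:.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```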
Let’s take an example of why Z-scores are useful.
A father has two sons and wants to know which of them scored better on his standardized test relative to the other test takers: Ram, who earned an 1800 on the SAT, or Sham, who scored a 24 on the ACT?
We cannot simply compare the raw numbers, because the two exams are measured on different scales.
Instead, the father should look at how many standard deviations each score lies above the mean of its own distribution. Suppose the SAT has a mean of 1500 with a standard deviation of 300, and the ACT has a mean of 21 with a standard deviation of 5 (the values used below).
Ram: (1800 − 1500) / 300 = 1 standard deviation above the mean
Sham: (24 − 21) / 5 = 0.6 standard deviations above the mean
Now the father can conclude that Ram indeed scored better than Sham relative to the other test takers.
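The same comparison written out as a tiny script; the means and standard deviations (1500/300 for the SAT, 21/5 for the ACT) are the ones used in the worked example above.

```python
def z_score(x, mean, std_dev):
    return (x - mean) / std_dev

ram = z_score(1800, 1500, 300)    # Ram's SAT score vs. the SAT mean and SD
sham = z_score(24, 21, 5)         # Sham's ACT score vs. the ACT mean and SD

print(ram, sham)                                                 # 1.0 vs 0.6
print("Ram" if ram > sham else "Sham", "did relatively better")
```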
In summary, Z-scores are important because they provide a standardized way to compare data points from different datasets and to identify outliers. If a Z-score is positive, the data point is above the mean; if it is negative, the data point is below the mean; and a Z-score of 0 means the data point equals the mean.
Z-Score FAQs
- What is z-score and why is it used?
- What is z-scores in statistics?
- How do you calculate the z-score?
- What is a good z-score?