Elementary Probability

An Introduction to Elementary Probability

Elementary probability theory is a branch of mathematics that deals with the study of random phenomena. It involves the study of random events or experiments, and the mathematical methods used to analyze and understand the likelihood or probability of different outcomes.

At its core, elementary probability theory is concerned with calculating the probability of an event occurring, given some knowledge of the circumstances surrounding the event. It includes concepts such as sample spaces, probability distributions, random variables, and expected values.

Elementary Probability Theory for Data Analysis and Inferential Statistics

Probability is a large topic, and it is not possible to cover everything here. This tutorial touches on the fundamentals that give you the conceptual framework required for data analysis and inferential statistics.

Randomness

In a random process, we know what outcomes could happen but we don’t know which particular outcome will happen.

Randomness Example

Tossing a coin, rolling a die, shuffle mode on your music player, the stock market, etc.

If you toss a coin, you know that only two outcomes are possible, but you don’t know which one will occur. Similarly, with shuffle mode on your music player, you know which songs are stored in your library, so you know the possible outcomes: the next song will be one from your library, but you don’t know which. Sometimes it is helpful to model a process as random even though it is not truly random. The stock market is an example.

To describe the probability of an event, we use the notation P(A) = probability of event A. There are several possible interpretations of probability, but they (almost) completely agree on the mathematical rules probability must follow, starting with 0 <= P(A) <= 1. That is, the probability of an event is always between 0 and 1.

The traditional interpretation of probability is as a relative frequency. This is called the frequentist interpretation.

Frequentist Interpretation of Probability

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times. An alternative is the Bayesian interpretation.

Bayesian Interpretation of Probability

A Bayesian interprets probability as a subjective degree of belief. For the same event, two people may have different viewpoints and so assign different probabilities to it. This interpretation allows prior information to be integrated into the inferential framework. It has been largely popularized by revolutionary advances in computational technology and methods during the last twenty years.

Law of Large Numbers

The law of large numbers states that as more observations are collected, the proportion of occurrences of a particular outcome converges to the probability of that outcome. For example, if you roll a die 6 times, there is no guarantee that you will get at least one five. But if you roll the die 600 or 6000 times, you can expect the proportion of fives to be close to 1/6.
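The die example can be explored with a small simulation. This is only a sketch: the function name proportion_of_fives and the fixed seed are assumptions made for illustration, and a pseudo-random generator stands in for a real die.

```python
import random

def proportion_of_fives(n_rolls, seed=0):
    """Simulate n_rolls of a fair six-sided die and return the share of fives."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    fives = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 5)
    return fives / n_rolls

# The proportion should drift toward 1/6 (about 0.1667) as n grows.
for n in (6, 600, 6000, 60000):
    print(n, round(proportion_of_fives(n), 4))
```

With few rolls the proportion can be far from 1/6; with many rolls it tends to settle near it, which is the behavior the law describes.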

Disjoint Events

Disjoint events, also known as mutually exclusive events, are two or more events that cannot occur at the same time. In other words, if one of the events happens, then the other event(s) cannot happen simultaneously.

For example, if we consider the outcomes of rolling a six-sided die, the events “rolling a 1” and “rolling a 2” are disjoint events, because it is impossible to roll both a 1 and a 2 on a single roll. Similarly, if we consider the event of drawing a card from a standard deck of 52 cards, the events “drawing a heart” and “drawing a spade” are disjoint events, because a card cannot be both a heart and a spade at the same time.

A Few Examples of Disjoint Events

Disjoint events cannot happen at the same time, which is why they are also called mutually exclusive.

The outcome of a single coin toss cannot be a head and a tail at the same time.
A student can’t both fail and pass a class.
A single card drawn from a deck cannot be an ace and a queen.

The events don’t join, hence the term disjoint.
For disjoint events, P(A and B) = 0.

Mathematically, if A and B are two disjoint events, then the probability of either event occurring is given by:

P(A or B) = P(A) + P(B)

However, if A and B are not disjoint events (i.e., they can occur at the same time), then the probability of either event occurring needs to be adjusted to avoid double counting the intersection of the events. In this case, we would use the general addition rule of probability.

Non-Disjoint Event

Non-disjoint events are events that can occur at the same time. In other words, they are not mutually exclusive. If two or more events are non-disjoint, they can overlap or have outcomes in common.

For example, consider the event of drawing a card from a standard deck of 52 cards. The events “drawing a red card” and “drawing a face card” are non-disjoint events, as some cards can be both red and face cards (e.g., the Jack of Hearts).

So non-disjoint events can happen at the same time. As another example, a student can get an A in both statistics and economics in the same semester. In that case, P(A and B) is not equal to 0.

Mathematically, if A and B are non-disjoint events, then the probability of either event occurring is given by:

P(A or B) = P(A) + P(B) – P(A and B)

The term P(A and B) represents the probability that both events A and B occur together, which needs to be subtracted from the sum of the individual probabilities to avoid double counting.

In the example above, the probability of drawing a red card is 26/52 or 1/2, and the probability of drawing a face card is 12/52 or 3/13. The probability of drawing a card that is both red and a face card (the Jack, Queen, or King of Hearts or Diamonds) is 6/52 or 3/26.

Therefore, the probability of drawing either a red card or a face card is:

P(Red or Face) = P(Red) + P(Face) – P(Red and Face) = 1/2 + 3/13 – 3/26 = 32/52 = 8/13

Union of Disjoint Events – Example

What is the probability of drawing a Jack or a three from a well shuffled full deck of cards?

P(J or 3) = P(J) + P(3) = 4/52 + 4/52 = 8/52 ≈ 0.154

For disjoint events A and B, P(A or B) = P(A) + P(B)

Union of Non-disjoint Events – Example

What is the probability of drawing a Jack or a red card from a well shuffled full deck of cards?

How is this different from the previous question?

The deck has 4 Jacks and 26 red cards, and the two groups overlap: the two red Jacks meet both criteria. We need to account for this overlap so that we do not count the red Jacks twice when calculating the probability.

P(J or red) = P(J) + P(red) – P(J and red) = 4/52 + 26/52 – 2/52 = 28/52 ≈ 0.538

For non-disjoint events A and B, P(A or B) = P(A) + P(B) -P(A and B)
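Both card questions can be checked by enumerating the deck. The following is a minimal sketch; the helper names (prob, is_jack, is_three, is_red) and the deck encoding are illustrative choices, not part of the tutorial.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event):
    """Probability that one card drawn uniformly at random satisfies `event`."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_jack(card):
    return card[0] == "J"

def is_three(card):
    return card[0] == "3"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# Disjoint events: a card cannot be both a Jack and a three.
p_jack_or_three = prob(lambda c: is_jack(c) or is_three(c))
print(p_jack_or_three)  # 2/13
print(p_jack_or_three == prob(is_jack) + prob(is_three))  # True

# Non-disjoint events: the two red Jacks are counted in both groups.
p_jack_or_red = prob(lambda c: is_jack(c) or is_red(c))
p_jack_and_red = prob(lambda c: is_jack(c) and is_red(c))
print(p_jack_or_red)  # 7/13
print(p_jack_or_red == prob(is_jack) + prob(is_red) - p_jack_and_red)  # True
```

Counting the cards directly and applying the formula give the same fractions: 2/13 is about 0.154 and 7/13 is about 0.538.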

The General Addition Rule

The general addition rule of probability states that, for any two events A and B, the probability of either event occurring is:

P(A or B) = P(A) + P(B) – P(A and B)

When A and B are disjoint, P(A and B) = 0, so the formula simplifies to:

P(A or B) = P(A) + P(B)

This rule can be extended to more than two events as well. For example, if A, B, and C are three mutually exclusive events, then the probability of at least one of them occurring is given by:

P(A or B or C) = P(A) + P(B) + P(C)

Sample Space

A sample space is the collection of all possible outcomes of a trial. For example, a couple has two kids; what is the sample space for the sex of these kids? Assume that sex can only be male or female.

{MM, FF, FM, MF}: the sample space for the sex of a couple’s two kids.

As a second example, if you toss a coin two times, what is the sample space? It is

{HH, TT, HT, TH}. Since each outcome is equally likely, each has a 25% chance of occurring. A probability distribution lists all possible outcomes in the sample space and the probabilities with which they occur.
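The two-toss sample space and its distribution can be built by enumeration. This is a small sketch that assumes a fair coin, so every outcome gets the same probability; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered results of two tosses, written as strings such as "HT".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Assumption: a fair coin, so each of the four outcomes is equally likely.
distribution = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}

print(sample_space)        # ['HH', 'HT', 'TH', 'TT']
print(distribution["HT"])  # 1/4
```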

Note that this is a probability distribution for discrete events. The next section gives an idea of probability distributions for continuous variables. Probability distributions follow three broad rules.

Probability Distribution Rules

The events listed must be disjoint.
Each probability must be between 0 and 1.
The probabilities must total 1.

The first rule says that the events listed must be disjoint: no two outcomes in the distribution can occur at the same time.

The second rule says that the probability of each individual event must be between 0 and 1 (inclusive). This means that the probability of an event cannot be negative, nor can it be greater than 1.

The third rule says that the probabilities of all possible outcomes in the sample space must add up to 1. The complement rule follows from this: if A is an event, then the probability of A not occurring is 1 – P(A).

The addition rule states that the probability of the union of two events A and B is given by P(A or B) = P(A) + P(B) – P(A and B), where P(A and B) represents the probability of both events occurring together. This rule applies to any two events; when A and B are mutually exclusive, P(A and B) = 0 and the rule reduces to P(A or B) = P(A) + P(B).

The multiplication rule states that the probability of the intersection of two independent events A and B is given by P(A and B) = P(A) x P(B), where P(A) and P(B) are the probabilities of events A and B occurring, respectively. This rule applies only when events A and B are independent (i.e., the occurrence of one event does not affect the probability of the other event).

These rules help ensure that probability distributions are valid and can be used to make predictions or draw conclusions about a population or sample.
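The distribution rules can be turned into a quick check. The sketch below uses a hypothetical helper, is_valid_distribution, and a floating-point tolerance chosen for illustration. It can verify only the range and total rules; it assumes that the listed outcomes are disjoint.

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Check a dict that maps each outcome to its probability.

    Verifies that every probability lies in [0, 1] and that they total 1.
    Disjointness of the outcomes is assumed, not checked.
    """
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tolerance)
    return in_range and sums_to_one

print(is_valid_distribution({"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}))  # True
print(is_valid_distribution({"H": 0.7, "T": 0.4}))    # False: total is 1.1
print(is_valid_distribution({"a": 1.2, "b": -0.2}))   # False: values out of range
```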

Complementary Events

Complementary events are two events that are mutually exclusive (i.e., they cannot occur at the same time) and together make up the entire sample space. In other words, if event A is the event of interest, then its complementary event, denoted as A’, is the event that A does not occur.

So, complementary events are two mutually exclusive events whose probabilities add up to 1.

Note that complementary and disjoint events are not the same. The probabilities of two disjoint outcomes do not necessarily add up to 1, but the probabilities of two complementary outcomes always do.

Independent Event

An event is independent if its outcome does not depend on previous outcomes. Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other. Let’s say you toss a coin 10 times, and it lands on heads each time. What do you think the chance is that another head will come up on the next toss? The probability is still 50%.

P(H on the 11th toss) = P(H on the 10th toss) = 0.5

Put another way, an independent event is memoryless: it doesn’t remember what happened in the past.

On the other hand, suppose you draw a card from a deck without replacing it, and the 1st draw is a J. On the 2nd draw, the probability of a J is P(J) = 3/51: one card has already been drawn, so 52 - 1 = 51 cards remain, and the number of Js is now 4 - 1 = 3. Before the 1st draw the probability of a J was 4/52, but on the 2nd draw it is 3/51. This is an example of dependent events.

Checking for Independence

If the probability of A given B equals the probability of A, then A and B are independent events. In other words, knowing B tells us nothing about A.

If P(A|B) = P(A), then A and B are independent.

Multiplication Rule for Independent Events

The product rule for independent events says that if A and B are independent, then the probability of both A and B happening is simply the product of their probabilities.

If A and B are independent, P(A and B) = P(A) * P(B)

If you toss a coin twice, what is the probability of getting two tails in a row?

P(Two tails in a row) = P(T on the 1st toss) * P(T on the 2nd toss) = 0.5 * 0.5 = 0.25

If A1, A2, A3, ..., Ak are independent, P(A1 and A2 and A3 ... and Ak) = P(A1) * P(A2) * P(A3) * ... * P(Ak)
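The two-tails result can be confirmed by listing the outcomes, and the k-event form is a simple product. This is a sketch only: prob_all is a hypothetical helper name, and a fair coin is assumed.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Enumerate both tosses of a fair coin and count the outcome (T, T).
outcomes = list(product("HT", repeat=2))
p_two_tails = Fraction(sum(1 for o in outcomes if o == ("T", "T")), len(outcomes))

p_tail = Fraction(1, 2)
assert p_two_tails == p_tail * p_tail  # 1/4, matching the multiplication rule

def prob_all(probabilities):
    """P(A1 and A2 and ... and Ak), valid only if the events are independent."""
    return prod(probabilities, start=Fraction(1))

# Ten tails in a row with a fair coin.
print(prob_all([Fraction(1, 2)] * 10))  # 1/1024
```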

Marginal Probability

Marginal probability refers to the probability of an event occurring without considering the occurrence of other events. It is the probability distribution of a single random variable, without any reference to other variables.

For example, the study titled ADOLESCENTS’ UNDERSTANDING OF SOCIAL CLASS examined teens’ beliefs about their social class. The sample consists of 48 working class and 50 upper middle class 16-year-olds.

The study was designed in the following way:

“Objective” assignment to social class based on self-reported measures of both parents’ occupation and education and household income.
“Subjective” association based on survey questions.

Here is the summary of the study as a contingency table.

What is the probability that a student’s objective social class position is upper middle?

The objective upper middle class column in the table shows a total of 50 students in this category, so P(objective upper middle class) = 50/98 ≈ 0.51. The term marginal probability comes from the fact that the counts used to compute it come from the margins of the contingency table. Here both 50 and 98 are totals found in the margins of the table.

Joint Probability

Joint probability is the probability of two or more events occurring simultaneously. It is the probability of the intersection of two or more events in a sample space. Joint probability is denoted as P(A and B) and is read as “the probability of A and B.”

Now the question is, “What is the probability that a student’s objective position and subjective identity are both upper middle class?”

P(Objective upper middle class and Subjective upper middle class) = 37/98 ≈ 0.38. This is the circled cell in the picture above. The key word in joint probability is AND: here we consider the students who are in the intersection of the two events of interest.

Conditional Probability

Conditional probability is the probability of an event A occurring given that another event B has already occurred. It is denoted as P(A|B) and is read as “the probability of A given B.” The conditional probability of A given B is calculated using the formula:

P(A|B) = P(A and B) / P(B)

where P(A and B) is the joint probability of A and B occurring, and P(B) is the probability of B occurring.

Now calculate: what is the probability that a student who is objectively in the working class associates with the upper middle class?

P(subjective upper middle class | objective working class) = 8/48 ≈ 0.17

The important thing to note here is the vertical line, read as “given”, which separates what we are looking for from what we know to be true about the students. The probability is called conditional because we first condition on the working class only, and then calculate the probability based on the counts in that column alone.

Bayes’ Theorem

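Formally, we calculate conditional probability with the formula below, which this tutorial refers to as Bayes’ theorem. Strictly speaking, it is the definition of conditional probability; Bayes’ theorem in its usual form, P(A|B) = P(B|A) * P(A) / P(B), follows directly from it.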

P(A|B) = P(A and B) / P(B)

Here the joint probability is in the numerator and the probability of what we condition on is in the denominator. Consider the same question from the conditional probability section and calculate the probability using Bayes’ theorem.

P(Subjective upper middle class | Objective working class) = P(Subjective upper middle class and Objective working class) / P(Objective working class) = (8/98) / (48/98) = 8/48 ≈ 0.17

We get the same answer that we arrived at previously by simply reasoning through the contingency table. But if the counts are not neatly organized in a table, Bayes’ theorem gives us a way to calculate the conditional probability.
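The same arithmetic can be written out with exact fractions. The sketch below uses only the counts quoted in the text (98 students, 48 objectively working class, 8 of whom identify as upper middle class); the variable names are illustrative.

```python
from fractions import Fraction

TOTAL = 98                      # students in the sample
OBJ_WORKING = 48                # objectively working class
SUBJ_UPPER_AND_OBJ_WORKING = 8  # subjective upper middle AND objective working

p_joint = Fraction(SUBJ_UPPER_AND_OBJ_WORKING, TOTAL)  # P(A and B)
p_given = Fraction(OBJ_WORKING, TOTAL)                 # P(B)
p_conditional = p_joint / p_given                      # P(A|B)

print(p_conditional)                   # 1/6
print(round(float(p_conditional), 2))  # 0.17
```

The sample size of 98 cancels, which is why dividing the two probabilities gives the same result as reading 8/48 straight from the table.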

In a card game, suppose a player needs to draw two cards of the same suit in order to win. Of the 52 cards, there are 13 cards in each suit. Suppose first the player draws a heart. Now the player wishes to draw a second heart. Since one heart has already been chosen, there are now 12 hearts remaining in a deck of 51 cards. So the conditional probability P(Draw second heart|First card a heart) = 12/51.

Suppose an individual applying to a college determines that he has an 80% chance of being accepted, and he knows that dormitory housing will only be provided for 60% of all of the accepted students. The chance of the student being accepted and receiving dormitory housing is given by

P(Accepted and Dormitory Housing) = P(Dormitory Housing|Accepted)P(Accepted) = (0.60)*(0.80) = 0.48.

General Product rule

Previously, it was shown that the product rule for independent events is P(A and B) = P(A) * P(B). If A and B are not independent, the joint probability needs to be calculated slightly differently.

Since Bayes’ theorem has no independence condition, we can simply rearrange it to get the joint probability P(A and B) as the product of the conditional probability P(A|B) and the probability P(B).

General product rule: P(A and B) = P(A|B) * P(B)

Here we are rearranging Bayes’ theorem to get a new rule for joint probability. Consider the college example again.

P(Accepted and Dormitory Housing) = P(Dormitory Housing|Accepted)P(Accepted) = (0.60)*(0.80) = 0.48.
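The general product rule can be sketched in a few lines and cross-checked against the earlier card example, where the chance of a second heart given a first heart is 12/51. The function name joint and the deck encoding are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def joint(p_a_given_b, p_b):
    """General product rule: P(A and B) = P(A|B) * P(B)."""
    return p_a_given_b * p_b

# College example: 60% housing among the accepted, 80% chance of acceptance.
print(joint(Fraction(60, 100), Fraction(80, 100)))  # 12/25, i.e. 0.48

# Card example: P(second heart | first heart) = 12/51, P(first heart) = 13/52.
p_two_hearts = joint(Fraction(12, 51), Fraction(13, 52))
print(p_two_hearts)  # 1/17

# Brute-force check over every ordered pair of distinct cards.
deck = [(rank, suit) for suit in "HDCS" for rank in range(1, 14)]
pairs = list(permutations(deck, 2))
both_hearts = sum(1 for first, second in pairs if first[1] == "H" and second[1] == "H")
assert Fraction(both_hearts, len(pairs)) == p_two_hearts
```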

Independence and Conditional Probability

In general, if P(A|B) = P(A), then events A and B are said to be independent.

Conceptually, knowing B doesn’t tell us anything about A.

Mathematically, if events A and B are independent, P(A and B) = P(A) * P(B). Then,

P(A|B) = P(A and B) / P(B) = P(A) * P(B) / P(B) = P(A)

Previously we stated the rule P(A|B) = P(A) for independent events. Using Bayes’ theorem, we can now show why this is the case mathematically.
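The independence check P(A|B) = P(A) can be tried on a deck of cards. This is a minimal sketch; the helpers prob and independent and the chosen events are illustrative assumptions, not examples taken from the tutorial.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))

def prob(event, given=None):
    """P(event), or P(event | given) when a conditioning predicate is supplied."""
    pool = [card for card in DECK if given is None or given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

def independent(a, b):
    """True when P(A|B) equals P(A) for a single uniform draw."""
    return prob(a, given=b) == prob(a)

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

print(independent(is_ace, is_heart))  # True: 1/13 either way
print(independent(is_heart, is_red))  # False: 1/2 given red, 1/4 overall
```

Knowing that a card is a heart says nothing about whether it is an ace, but knowing that it is red doubles the chance that it is a heart.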

Probability Trees

Probability trees are especially useful when P(A|B) is given in a question and we are asked for P(B|A).

You have 100 emails in your mailbox. 60 are spam, 40 are not. Of the 60 spam emails, 35 contain the word “free”. Of the rest, 3 contain the word “free”. If an email contains the word “free”, what is the probability that it is spam?

We are trying to find P(Spam | “free”). The counts can be organized into a probability tree: the first split is spam versus not spam, and the second split is whether the email contains the word “free”.

Since we are interested only in emails containing the word “free”, 35 come from the spam branch and 3 come from the non-spam branch. The total number of emails that contain the word “free” is 35 + 3 = 38.

P(Spam | “free”) = 35 / (35 + 3) = 35/38 ≈ 0.92

Here, we have implicitly made use of Bayes’ theorem. The numerator is the joint probability and the denominator is the marginal probability of what we are conditioning on, the word “free”.
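The tree for the email example can be written out branch by branch. This sketch uses only the counts given in the question; the variable names are illustrative.

```python
from fractions import Fraction

# First split of the tree: spam versus not spam.
p_spam = Fraction(60, 100)
p_not_spam = Fraction(40, 100)

# Second split: does the email contain the word "free"?
p_free_given_spam = Fraction(35, 60)
p_free_given_not_spam = Fraction(3, 40)

# Multiply along each branch to get the joint probabilities.
p_spam_and_free = p_free_given_spam * p_spam              # equals 35/100
p_not_spam_and_free = p_free_given_not_spam * p_not_spam  # equals 3/100

# Marginal probability of "free" is the sum over both branches.
p_free = p_spam_and_free + p_not_spam_and_free

p_spam_given_free = p_spam_and_free / p_free
print(p_spam_given_free, round(float(p_spam_given_free), 2))  # 35/38 0.92
```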

Consider another example:

As of 2009, Swaziland had the highest HIV prevalence in the world: 25.9% of the country’s population was infected with HIV. The ELISA test is one of the first and most accurate tests for HIV. For those who carry HIV, the ELISA test is 99.7% accurate. For those who do not carry HIV, the test is 92.6% accurate. If an individual from Swaziland has tested positive, what is the probability that he carries HIV?

P(HIV) = 0.259

P(+ | HIV) = 0.997 and P(– | no HIV) = 0.926

Now find P(HIV | +) = ?

Here the conditional probability asked for, P(HIV | +), is the reverse of the one given, P(+ | HIV) = 0.997, so we follow the tree diagram. The HIV branch gives P(HIV and +) = 0.259 * 0.997 ≈ 0.2582, and the no-HIV branch gives P(no HIV and +) = (1 – 0.259) * (1 – 0.926) = 0.741 * 0.074 ≈ 0.0548. Therefore P(HIV | +) = 0.2582 / (0.2582 + 0.0548) ≈ 0.82.
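The tree calculation for this example can be sketched as a small function. The names posterior, sensitivity, and specificity are assumptions introduced here: sensitivity stands for the 99.7% accuracy among carriers, and specificity for the 92.6% accuracy among non-carriers.

```python
def posterior(prior, sensitivity, specificity):
    """P(condition | positive test), computed from the two positive branches.

    prior       -- P(condition)
    sensitivity -- P(+ | condition)
    specificity -- P(- | no condition)
    """
    true_positive = prior * sensitivity               # P(condition and +)
    false_positive = (1 - prior) * (1 - specificity)  # P(no condition and +)
    return true_positive / (true_positive + false_positive)

p_hiv_given_positive = posterior(prior=0.259, sensitivity=0.997, specificity=0.926)
print(round(p_hiv_given_positive, 2))  # 0.82
```

Even with an accurate test, the answer is well below 99.7%, because a share of the positive results comes from the larger group of people who do not carry HIV.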

Common Applications of Elementary Probability Theory

Some common applications of elementary probability theory include analyzing the outcomes of games of chance, predicting the likelihood of future events, and modeling complex systems such as financial markets or weather patterns. It is also used in various fields such as statistics, physics, engineering, and computer science.

Gambling:

Probability theory is essential in analyzing games of chance, such as roulette, blackjack, and poker. The odds of winning in a particular game can be calculated using probability theory, which can help players make more informed decisions.

Finance:

Probability theory is used in finance to assess the risk of an investment and predict the future performance of financial assets. It is also used to model the behavior of financial markets and to develop trading strategies.

Insurance:

Insurance companies use probability theory to calculate the probability of an event occurring, such as a car accident or natural disaster, and to set insurance premiums based on the level of risk.

Physics:

Probability theory is used in quantum mechanics to describe the behavior of subatomic particles, where the outcome of a measurement is determined by chance.

Genetics:

Probability theory is used to study inheritance patterns and to calculate the likelihood of genetic disorders being passed down from one generation to the next.

Weather forecasting:

Probability theory is used to create models of weather patterns and to predict the likelihood of specific weather events occurring.

Machine learning:

Probability theory is used in machine learning algorithms, such as Bayesian networks and hidden Markov models, to make predictions and decisions based on incomplete or uncertain information.
