How to Calculate Conditional Probability in R?

Conditional Probability

Conditional probability is a measure of the likelihood of an event occurring, given that another event has already occurred. It allows us to update our beliefs about the probability of an event based on new information. Conditional probability is written as P(B | A), which is read as “the probability of event B given event A.” It can be calculated using the following formula:

P(B | A) = P(A and B) / P(A)

where:

P(B | A) is the conditional probability of event B occurring given event A has occurred.
P(A and B) is the joint probability of both events A and B occurring together.
P(A) is the probability of event A occurring, which must be greater than zero for the ratio to be defined.

To better illustrate the concept, let’s consider an example:

Suppose we have a standard deck of 52 playing cards, which contains 13 hearts and 12 face cards. Let’s calculate the conditional probability of drawing a face card (event B), given that the card is a heart (event A).

First, we need to determine the probability of drawing a heart (event A). There are 13 hearts in the deck, so: P(A) = 13 / 52 = 1/4
Next, we need to find the joint probability of drawing a card that is both a heart and a face card (events A and B together). There are 3 face cards that are also hearts (the King, Queen, and Jack of hearts), so: P(A and B) = 3 / 52
Finally, we can calculate the conditional probability of drawing a face card given that the card is a heart (P(B | A)): P(B | A) = P(A and B) / P(A) = (3 / 52) / (13 / 52) = 3/13

So, the probability of drawing a face card given that the card is a heart is 3/13 or approximately 0.2308.
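The card calculation can also be checked in R by enumerating the deck. This is a minimal sketch; the names ranks, suits, deck, is_heart and is_face are illustrative and not part of the original example.

```r
# Build a 52-card deck: 13 ranks x 4 suits (illustrative names)
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("hearts", "diamonds", "clubs", "spades")
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

is_heart <- deck$suit == "hearts"            # event A
is_face  <- deck$rank %in% c("J", "Q", "K")  # event B

p_a       <- mean(is_heart)            # P(A): share of cards that are hearts
p_a_and_b <- mean(is_heart & is_face)  # P(A and B): share that are heart face cards

p_b_given_a <- p_a_and_b / p_a         # P(B | A)
p_b_given_a
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is why mean() works here as a probability when every card is equally likely.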

Calculate Conditional Probability in R

To calculate conditional probability in R, you can use the prop.table() function. Let’s assume you have a data frame with two variables (or columns) named A and B, and you want to find the conditional probability P(B | A). Here’s how to do it:

Create a contingency table (also known as a cross-tabulation or crosstab) using the table() function.
Convert the contingency table into a conditional probability table using the prop.table() function.

Here’s a step-by-step example:

# Sample data
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# Create a contingency table
contingency_table <- table(data$A, data$B)

# Calculate the conditional probability table P(B | A)
conditional_probability_table <- prop.table(contingency_table, margin = 1)

# Print the conditional probability table
print(conditional_probability_table)

The conditional_probability_table variable will now contain the conditional probabilities P(B | A) for all combinations of A and B. The margin = 1 argument in the prop.table() function indicates that the probabilities should be calculated by dividing each cell by the row sums (i.e., the probabilities are conditioned on the first variable, A).
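As a cross-check, the same table can be derived directly from the definition P(B | A) = P(A and B) / P(A). The sketch below rebuilds the sample data so that it runs on its own; counts, joint, p_a, by_formula and by_prop_table are illustrative names.

```r
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)
counts <- table(data$A, data$B)

# Joint probabilities P(A and B): each cell divided by the grand total
joint <- prop.table(counts)

# Marginal probabilities P(A): sum of each row of the joint table
p_a <- rowSums(joint)

# Apply the definition: divide every row of the joint table by its P(A)
by_formula <- sweep(joint, 1, p_a, "/")

# The shortcut used in the article
by_prop_table <- prop.table(counts, margin = 1)

# Compare the two results cell by cell
all.equal(as.vector(by_formula), as.vector(by_prop_table))
```

Passing margin = 2 instead divides each cell by its column sum, which conditions on the second variable and gives P(A | B).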

If you want to find a specific conditional probability, like P(B=b1 | A=a1), you can access the corresponding cell in the conditional probability table:

probability_b1_given_a1 <- conditional_probability_table["a1", "b1"]
print(probability_b1_given_a1)

Remember to replace the sample data with your own dataset and variable names.
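To show the same two steps on real data, here is a sketch that uses the mtcars dataset shipped with R. The choice of columns (cyl for number of cylinders, am for transmission type, where 1 means manual) is just for illustration.

```r
# Contingency table: rows are cylinder counts, columns are transmission (0/1)
cyl_by_am <- table(cylinders = mtcars$cyl, manual = mtcars$am)

# Condition on the row variable: each row is divided by its own total
am_given_cyl <- prop.table(cyl_by_am, margin = 1)
print(round(am_given_cyl, 3))

# P(manual transmission | 4 cylinders)
# Table dimnames are character strings, so index with "4" and "1"
p_manual_given_4cyl <- am_given_cyl["4", "1"]
p_manual_given_4cyl
```

Note that numeric columns become character labels in the table, so cells are looked up with quoted names rather than numbers.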

Example 2 – Cloudy Days

Let’s consider another example of calculating conditional probabilities using R. We’ll work with data related to the likelihood of rain given the presence of clouds.

First, let’s create a simple data frame with the information:

# Data frame with weather information
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

This table gives the number of days observed for each combination of cloud cover and rain in a particular region:

Cloudy	Rain	Frequency
Yes	Yes	30
Yes	No	20
No	Yes	10
No	No	40

Now, let’s calculate the conditional probability of rain given the presence of clouds (P(Rain | Cloudy)):

# Total frequency of cloudy days
total_cloudy <- sum(weather_data$Frequency[weather_data$Cloudy == "Yes"])

# Frequency of rainy days when it's cloudy
# (sum() keeps this a single number even if several rows match)
rainy_and_cloudy <- sum(
  weather_data$Frequency[weather_data$Cloudy == "Yes" &
                           weather_data$Rain == "Yes"]
)

# Conditional probability of rain given clouds
P_rain_given_cloudy <- rainy_and_cloudy / total_cloudy
P_rain_given_cloudy

In this example, the total frequency of cloudy days is 50 (30 + 20), and the frequency of rainy days when it’s cloudy is 30. The conditional probability of rain given clouds is 30 / 50 = 0.6 or 60%.
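The same result can be reached with the table-based approach from the previous section. This is a sketch, assuming the Frequency column holds counts: xtabs() builds the contingency table from the pre-aggregated frequencies, and prop.table() then conditions on Cloudy. The names weather_table and rain_given_cloudy are illustrative.

```r
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

# Rows are Cloudy, columns are Rain; each cell is the summed Frequency
weather_table <- xtabs(Frequency ~ Cloudy + Rain, data = weather_data)

# Divide each row by its total to get P(Rain | Cloudy)
rain_given_cloudy <- prop.table(weather_table, margin = 1)

rain_given_cloudy["Yes", "Yes"]  # P(Rain = Yes | Cloudy = Yes) = 30 / 50
rain_given_cloudy["No", "Yes"]   # P(Rain = Yes | Cloudy = No)  = 10 / 50
```

Unlike table(), which counts rows, xtabs() with a left-hand side sums that column, so it suits data that has already been aggregated into frequencies.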

Example 3 – Student Information

Let’s consider another example using conditional probabilities in R. This time, we’ll work with data related to the likelihood of passing an exam given the attendance in a course.

First, let’s create a simple data frame with the information:

# Data frame with student information
student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

This table represents the frequency of different student outcomes in a particular course:

Attendance	Pass	Frequency
High	Yes	80
High	No	20
Low	Yes	30
Low	No	70

Now, let’s calculate the conditional probability of passing the exam given high attendance (P(Pass | High Attendance)):

# Total frequency of students with high attendance
total_high_attendance <-
  sum(student_data$Frequency[student_data$Attendance == "High"])

# Frequency of students who pass the exam with high attendance
# (sum() keeps this a single number even if several rows match)
pass_and_high_attendance <- sum(
  student_data$Frequency[student_data$Attendance == "High" &
                           student_data$Pass == "Yes"]
)

# Conditional probability of passing the exam given high attendance
P_pass_given_high_attendance <-
  pass_and_high_attendance / total_high_attendance
P_pass_given_high_attendance

In this example, the total frequency of students with high attendance is 100 (80 + 20), and the frequency of students who pass the exam with high attendance is 80. The conditional probability of passing the exam given high attendance is 80 / 100 = 0.8 or 80%.
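Examples 2 and 3 follow the same pattern, so the calculation can be wrapped in a small helper. The function cond_prob() below is hypothetical (it is not part of base R or any package) and assumes a data frame with a numeric frequency column, named Frequency by default.

```r
# Hypothetical helper: P(event | given) from a frequency data frame.
# `event` and `given` are logical vectors with one entry per row of `df`.
cond_prob <- function(df, event, given, freq = "Frequency") {
  stopifnot(length(event) == nrow(df), length(given) == nrow(df))
  total_given <- sum(df[[freq]][given])
  if (total_given == 0) return(NA_real_)  # conditioning event never observed
  sum(df[[freq]][event & given]) / total_given
}

student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

passed <- student_data$Pass == "Yes"

# P(Pass | High Attendance) = 80 / 100
p_pass_high <- cond_prob(student_data, passed, student_data$Attendance == "High")
# P(Pass | Low Attendance) = 30 / 100
p_pass_low <- cond_prob(student_data, passed, student_data$Attendance == "Low")

p_pass_high
p_pass_low
```

Returning NA when the conditioning event has zero frequency is a design choice of this sketch; it avoids a division by zero when P(A) = 0 and the conditional probability is undefined.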
