How to Calculate Conditional Probability in R

Conditional Probability

Conditional probability is a measure of the likelihood of an event occurring, given that another event has already occurred. It allows us to update our beliefs about the probability of an event based on new information. Conditional probability is written as P(B | A), which is read as “the probability of event B given event A.” It can be calculated using the following formula:

P(B | A) = P(A and B) / P(A)

where:

  • P(B | A) is the conditional probability of event B occurring given event A has occurred.
  • P(A and B) is the joint probability of both events A and B occurring together.
  • P(A) is the probability of event A occurring. It must be greater than zero; otherwise the conditional probability is undefined.
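The formula translates directly into R. The function below is a minimal sketch, not a built-in: `conditional_prob()` is a hypothetical name, and it stops with an error when P(A) is zero, because the conditional probability is undefined in that case.

```r
# Hypothetical helper (not part of base R): P(B | A) = P(A and B) / P(A)
conditional_prob <- function(p_a_and_b, p_a) {
  # Conditional probability is only defined when P(A) > 0
  if (p_a <= 0) {
    stop("P(A) must be greater than zero")
  }
  # The joint probability can never exceed the marginal probability
  if (p_a_and_b > p_a) {
    stop("P(A and B) cannot exceed P(A)")
  }
  p_a_and_b / p_a
}

# Card example: P(face card | heart), approximately 0.2308
conditional_prob(3 / 52, 13 / 52)
```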

To better illustrate the concept, let’s consider an example:

Suppose we have a standard deck of 52 playing cards. It contains 13 hearts and 12 face cards (the Jack, Queen, and King of each of the four suits). Let's calculate the conditional probability of drawing a face card, given that the card is a heart.

  1. First, we need to determine the probability of drawing a heart (event A). There are 13 hearts in the deck, so: P(A) = 13 / 52 = 1/4
  2. Next, we need to find the joint probability of drawing a card that is both a heart and a face card (events A and B). There are 3 face cards that are also hearts (the King, Queen, and Jack of hearts), so: P(A and B) = 3 / 52
  3. Finally, we can calculate the conditional probability of drawing a face card given that the card is a heart (P(B | A)): P(B | A) = P(A and B) / P(A) = (3 / 52) / (1/4) = 3/13

So, the probability of drawing a face card given that the card is a heart is 3/13 or approximately 0.2308.
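We can check this result in R by building the deck explicitly and counting. This is a sketch that assumes every card is equally likely to be drawn; the rank labels ("A", "2" to "10", "J", "Q", "K") are our own choice for illustration.

```r
# Build a standard 52-card deck: 13 ranks in each of 4 suits
ranks <- c("A", as.character(2:10), "J", "Q", "K")
suits <- c("Hearts", "Diamonds", "Clubs", "Spades")
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

# Event A: the card is a heart; event B: the card is a face card
is_heart <- deck$suit == "Hearts"
is_face  <- deck$rank %in% c("J", "Q", "K")

# Each card is equally likely, so a probability is a proportion of cards
p_a       <- mean(is_heart)            # 13 / 52
p_a_and_b <- mean(is_heart & is_face)  # 3 / 52

# Apply the formula P(B | A) = P(A and B) / P(A)
p_b_given_a <- p_a_and_b / p_a
p_b_given_a

# Equivalent view: restrict to hearts, then take the share of face cards
mean(is_face[is_heart])
```

Because `mean()` of a logical vector is the proportion of TRUE values, both calculations give 3/13. The second line shows what conditioning means in practice: we shrink the sample space to the hearts and count within it.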

Calculate Conditional Probability in R

To calculate conditional probabilities in R, a convenient approach combines the table() and prop.table() functions. Let's assume you have a data frame with two categorical variables (columns) named A and B, and you want to find the conditional probability P(B | A). Here's how to do it:

  1. Create a contingency table (also known as a cross-tabulation or crosstab) using the table() function.
  2. Convert the contingency table into a conditional probability table using the prop.table() function.


Here’s a step-by-step example:

# Sample data
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# Create a contingency table
contingency_table <- table(data$A, data$B)

# Calculate the conditional probability table P(B | A)
conditional_probability_table <-
  prop.table(contingency_table, margin = 1)

# Print the conditional probability table
print(conditional_probability_table)


The conditional_probability_table variable will now contain the conditional probabilities P(B | A) for all combinations of A and B. The margin = 1 argument in the prop.table() function indicates that the probabilities should be calculated by dividing each cell by the row sums (i.e., the probabilities are conditioned on the first variable, A).
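To see what the `margin` argument changes, the sketch below reuses the sample data and compares the three options. Naming the dimensions with `table(A = ..., B = ...)` is an optional tweak that labels the printed output; the last line checks the result against the formula P(B | A) = P(A and B) / P(A).

```r
# Same sample data as above
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# Naming the dimensions labels the rows (A) and columns (B) when printed
counts <- table(A = data$A, B = data$B)

# margin = 1: divide by row sums, giving P(B | A); each row sums to 1
p_b_given_a <- prop.table(counts, margin = 1)
rowSums(p_b_given_a)

# margin = 2: divide by column sums, giving P(A | B); each column sums to 1
p_a_given_b <- prop.table(counts, margin = 2)
colSums(p_a_given_b)

# No margin: divide by the grand total, giving the joint P(A and B)
p_joint <- prop.table(counts)

# Dividing each row of the joint table by its row total, P(A), recovers P(B | A)
p_joint / rowSums(p_joint)
```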

If you want to find a specific conditional probability, like P(B=b1 | A=a1), you can access the corresponding cell in the conditional probability table:

probability_b1_given_a1 <- conditional_probability_table["a1", "b1"]
print(probability_b1_given_a1)

Remember to replace the sample data with your own dataset and variable names.
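If you need the same calculation for several datasets, you could wrap it in a small function. `conditional_prob_df()` is a hypothetical helper, not part of base R; it works on raw rows (one observation per row) and assumes the two columns contain no missing values.

```r
# Hypothetical helper: P(target == target_value | given == given_value),
# estimated from two categorical columns of a data frame
conditional_prob_df <- function(df, target, given, target_value, given_value) {
  # Keep only the rows where the conditioning event occurred
  subset_rows <- df[[given]] %in% given_value
  if (!any(subset_rows)) {
    stop("The conditioning event never occurs in the data")
  }
  # Share of those rows where the target event also occurred
  mean(df[[target]][subset_rows] == target_value)
}

data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# P(B = b1 | A = a1)
conditional_prob_df(data, target = "B", given = "A",
                    target_value = "b1", given_value = "a1")
```

For the sample data this returns 2/3, the same value as conditional_probability_table["a1", "b1"].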

Example 2 – Cloudy Days

Let’s consider another example of calculating conditional probabilities using R. We’ll work with data related to the likelihood of rain given the presence of clouds.

First, let’s create a simple data frame with the information:

# Data frame with weather information
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

This table represents the frequency of different weather conditions in a particular region:

Cloudy  Rain  Frequency
Yes     Yes   30
Yes     No    20
No      Yes   10
No      No    40

Now, let’s calculate the conditional probability of rain given the presence of clouds (P(Rain | Cloudy)):

# Total frequency of cloudy days
total_cloudy <-
  sum(weather_data$Frequency[weather_data$Cloudy == "Yes"])

# Frequency of rainy days when it's cloudy
rainy_and_cloudy <-
  weather_data$Frequency[weather_data$Cloudy == "Yes" &
                         weather_data$Rain == "Yes"]

# Conditional probability of rain given clouds
P_rain_given_cloudy <- rainy_and_cloudy / total_cloudy
P_rain_given_cloudy

In this example, the total frequency of cloudy days is 50 (30 + 20), and the frequency of rainy days when it’s cloudy is 30. The conditional probability of rain given clouds is 30 / 50 = 0.6 or 60%.
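When the data are already summarised as frequencies, as here, `xtabs()` can turn the `Frequency` column into a weighted contingency table, and `prop.table()` then gives every conditional probability at once. This is an alternative sketch to the step-by-step calculation above, not a replacement for it.

```r
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

# Build a 2 x 2 contingency table, weighting each row by its Frequency
weather_counts <- xtabs(Frequency ~ Cloudy + Rain, data = weather_data)

# Condition on the row variable (Cloudy): each row now sums to 1
rain_given_cloudy <- prop.table(weather_counts, margin = 1)
rain_given_cloudy

# P(Rain = Yes | Cloudy = Yes)
rain_given_cloudy["Yes", "Yes"]
```

The ["Yes", "Yes"] cell reproduces the 0.6 computed above, and the ["No", "Yes"] cell gives P(Rain | not cloudy) = 10 / 50 = 0.2, which shows how much the presence of clouds shifts the chance of rain in this data.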

Example 3 – Student Information

Let's consider one more example of conditional probabilities in R. This time, we'll work with data on the likelihood of passing an exam given a student's attendance level in a course.

First, let’s create a simple data frame with the information:

# Data frame with student information
student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

This table represents the frequency of different student outcomes in a particular course:

Attendance  Pass  Frequency
High        Yes   80
High        No    20
Low         Yes   30
Low         No    70

Now, let’s calculate the conditional probability of passing the exam given high attendance (P(Pass | High Attendance)):

# Total frequency of students with high attendance
total_high_attendance <-
  sum(student_data$Frequency[student_data$Attendance == "High"])

# Frequency of students who pass the exam with high attendance
pass_and_high_attendance <-
  student_data$Frequency[student_data$Attendance == "High" &
                         student_data$Pass == "Yes"]

# Conditional probability of passing the exam given high attendance
P_pass_given_high_attendance <-
  pass_and_high_attendance / total_high_attendance
P_pass_given_high_attendance

In this example, the total frequency of students with high attendance is 100 (80 + 20), and the frequency of students who pass the exam with high attendance is 80. The conditional probability of passing the exam given high attendance is 80 / 100 = 0.8 or 80%.
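The same idea extends to every attendance level at once. The sketch below uses `tapply()` to sum frequencies within each group, giving P(Pass | High) and P(Pass | Low) side by side, plus the unconditional P(Pass) for comparison.

```r
student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

# Number of passing students in each attendance group
# (multiplying by a logical keeps only the "Yes" rows' frequencies)
passes_by_group <- tapply(
  student_data$Frequency * (student_data$Pass == "Yes"),
  student_data$Attendance,
  sum
)

# Total number of students in each attendance group
totals_by_group <- tapply(student_data$Frequency, student_data$Attendance, sum)

# P(Pass | Attendance) for every attendance level
p_pass_given_attendance <- passes_by_group / totals_by_group
p_pass_given_attendance

# Unconditional P(Pass), for comparison
p_pass <- sum(passes_by_group) / sum(totals_by_group)
p_pass
```

With this data, P(Pass | Low) = 30 / 100 = 0.3 and the overall pass rate is 110 / 200 = 0.55, so knowing a student's attendance level changes the probability of passing considerably.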
