What Is the Multinomial Distribution?

The multinomial distribution is a probability distribution that describes the probability of observing a certain number of each of several possible outcomes in a sequence of independent trials, where each trial has multiple possible outcomes with fixed probabilities.

In other words, the multinomial distribution models the probability of observing a particular set of counts for each possible outcome, given the total number of trials and the probabilities of each outcome.

The multinomial distribution is an extension of the binomial distribution, which models the probability of observing a certain number of successes in a fixed number of trials with two possible outcomes (e.g., heads or tails).

The parameters of the multinomial distribution are the total number of trials (n) and a vector of probabilities (p) for each possible outcome. The probabilities in the vector p must sum to 1, and each individual probability must be between 0 and 1.

The probability mass function of the multinomial distribution is:

P(x1, x2, …, xk) = (n! / (x1! x2! … xk!)) * p1^x1 * p2^x2 * … * pk^xk,

where x1, x2, …, xk are the non-negative integer counts of each possible outcome, which must sum to n, and p1, p2, …, pk are the probabilities of each outcome.

In R, the rmultinom() function can be used to simulate random samples from a multinomial distribution, and the dmultinom() function can be used to calculate the probability mass function of a multinomial distribution.
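As a quick check of the formula, the probability mass function can be computed by hand and compared with dmultinom(). This is a minimal sketch: the counts and probabilities below are illustrative values, not taken from the article, and the variable names are arbitrary.

```r
# Illustrative values: 3 possible outcomes, 5 trials in total
x_counts <- c(2, 2, 1)        # counts per outcome, must sum to the number of trials
p_vec    <- c(0.5, 0.3, 0.2)  # outcome probabilities, must sum to 1
n_trials <- sum(x_counts)

# PMF from the formula: n! / (x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk
pmf_manual <- factorial(n_trials) / prod(factorial(x_counts)) * prod(p_vec^x_counts)

# The same value from the built-in function
pmf_builtin <- dmultinom(x = x_counts, size = n_trials, prob = p_vec)

# Both should be 30 * 0.0045 = 0.135
c(manual = pmf_manual, builtin = pmf_builtin)
```

Here the multinomial coefficient is 5! / (2! 2! 1!) = 30 and the product of powers is 0.25 * 0.09 * 0.2 = 0.0045, so both approaches should agree on 0.135.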

How to Use the Multinomial Distribution in R?

The multinomial distribution applies when each trial has more than two possible outcomes. In R, the rmultinom() function simulates vectors of counts from a multinomial distribution, while the dmultinom() function returns the probability of a specific vector of counts.

Here’s an example of how to use the rmultinom function to simulate 1000 samples of rolling a fair six-sided die three times:

# Set the number of trials and the probabilities for each outcome
n <- 3
probs <- rep(1/6, 6)

# Simulate 1000 samples
samples <- rmultinom(n = 1000, size = n, prob = probs)

# View the first 5 samples (each sample is one column)
samples[, 1:5]

This code will produce a matrix where each column represents a sample of rolling the die three times and each row represents the number of times the corresponding outcome occurred in that sample.
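The layout of that matrix can be confirmed with a few checks. The sketch below re-runs the simulation under its own variable name (sim) and with an arbitrary seed, both of which are assumptions made only so that the snippet is self-contained.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# 1000 samples of three rolls of a fair six-sided die
sim <- rmultinom(n = 1000, size = 3, prob = rep(1/6, 6))

dim(sim)                # 6 rows (one per face) by 1000 columns (one per sample)
all(colSums(sim) == 3)  # the counts in every sample add up to the 3 rolls
rowMeans(sim)           # average count per face, expected to be near 3 * 1/6 = 0.5
```

Note that the n argument of rmultinom() is the number of samples to draw, while size is the number of trials within each sample.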

To calculate the probability of a specific outcome, you can use the dmultinom function. Here’s an example of how to calculate the probability of getting two ones and a two when rolling a fair six-sided die three times:

# Set the outcome you're interested in
outcome <- c(2, 1, 0, 0, 0, 0)

# Calculate the probability of that outcome
prob <- dmultinom(x = outcome, size = n, prob = probs)

# View the result
prob

This code returns 3/216 ≈ 0.0139: there are 3! / (2! 1!) = 3 orderings of two ones and a two, and each ordering has probability (1/6)^3 = 1/216.
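One way to sanity-check a dmultinom() result is to compare it with the frequency of the same outcome in a large simulation. The following is a rough sketch under assumed settings (an arbitrary seed and 100,000 simulated samples); the variable names are chosen for this example only.

```r
set.seed(42)  # arbitrary seed for reproducibility

probs_die <- rep(1/6, 6)
target    <- c(2, 1, 0, 0, 0, 0)  # two ones and a two

# Exact probability from the probability mass function
exact <- dmultinom(x = target, size = 3, prob = probs_die)

# Simulate many samples of three rolls
sims <- rmultinom(n = 100000, size = 3, prob = probs_die)

# A column matches the target when all six of its counts agree with it;
# the comparison recycles target down each column of the matrix
hits <- colSums(sims == target) == 6
empirical <- mean(hits)

c(exact = exact, empirical = empirical)
```

The two numbers will not match exactly, but with this many samples the simulated frequency should land close to the exact value.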

Example – 2

Here’s another example of how to use the Multinomial distribution in R:

# generate a random sample of 100 outcomes from a 
# Multinomial distribution with 4 possible outcomes 
# and probabilities (0.3, 0.2, 0.2, 0.3)
set.seed(123)
sample <- rmultinom(n = 1,
                    size = 100,
                    prob = c(0.3, 0.2, 0.2, 0.3))

# print the sample
sample

# evaluate the probability mass function of a Multinomial 
# distribution with 4 possible outcomes and 
# probabilities (0.3, 0.2, 0.2, 0.3) at the counts (34, 22, 21, 23)
probs <- dmultinom(
  x = c(34, 22, 21, 23),
  size = 100,
  prob = c(0.3, 0.2, 0.2, 0.3)
)

# print the probability
probs

Output

> sample
     [,1]
[1,]   34
[2,]   22
[3,]   21
[4,]   23

> probs
[1] 0.0002940597
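The probability of any single vector of counts is small here because 100 trials over 4 outcomes can produce a very large number of distinct count vectors. The sketch below puts the value in context; the variable names are illustrative, and the count of possible vectors uses the standard "stars and bars" formula choose(n + k - 1, k - 1).

```r
size_100 <- 100
p_four   <- c(0.3, 0.2, 0.2, 0.3)

# Expected counts for each outcome: n * p
expected <- size_100 * p_four

# Number of distinct count vectors (non-negative integers summing to 100)
n_vectors <- choose(size_100 + length(p_four) - 1, length(p_four) - 1)

# Probability of observing exactly the expected counts (30, 20, 20, 30)
p_expected <- dmultinom(x = round(expected), size = size_100, prob = p_four)

# Probability of the counts used in the example
p_example <- dmultinom(x = c(34, 22, 21, 23), size = size_100, prob = p_four)

expected
n_vectors
c(at_expected = p_expected, at_example = p_example)
```

Even the expected counts themselves have a low probability, since the total probability of 1 is spread across all 176,851 possible count vectors.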

Example – 3

# Set the seed for reproducibility
set.seed(123)

# Generate 1000 samples from a multinomial distribution with 3 trials 
# and unequal probabilities for each outcome
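samples <- rmultinom(n = 1000,
                     size = 3,
                     prob = c(0.3, 0.4, 0.3))

# Total how many times each outcome occurred across all samples
# (table(samples) would instead tabulate the cell values 0 to 3)
totals <- rowSums(samples)
names(totals) <- paste("Outcome", 1:3)

# Create a pie chart of the outcome proportions
props <- prop.table(totals)
pie(props,
    labels = paste0(names(props), ": ", round(props * 100, 1), "%"),
    main = "Multinomial Distribution")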
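As a numeric companion to the chart, the share of trials that landed on each outcome can be compared with the probabilities passed to rmultinom(). This is a small sketch that repeats the simulation under its own variable name (draws) so that it runs on its own.

```r
set.seed(123)

# Same settings as the example: 1000 samples of 3 trials each
draws <- rmultinom(n = 1000, size = 3, prob = c(0.3, 0.4, 0.3))

# Share of all 3000 trials that fell on each outcome
observed <- rowSums(draws) / sum(draws)

# Compare with the true probabilities
rbind(observed = round(observed, 3), true = c(0.3, 0.4, 0.3))
```

With 3000 trials in total, the observed shares should sit within a few percentage points of 0.3, 0.4, and 0.3.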

What is the difference between binomial distribution and multinomial distribution?

The binomial distribution and the multinomial distribution are both probability distributions used in statistics to model the probability of certain events occurring. However, they differ in some key ways:

Number of outcomes: The binomial distribution applies when each trial has exactly two possible outcomes (such as success or failure), while the multinomial distribution applies when each trial has k possible outcomes, typically three or more. With k = 2, the multinomial distribution reduces to the binomial distribution.
Number of trials: Both distributions assume a fixed number of trials n. The binomial distribution models the number of successes in those trials, while the multinomial distribution models the counts of all outcomes at once, and those counts must sum to n.
Probability mass function: Each distribution has a single probability mass function. The binomial version takes one count (the number of successes), while the multinomial version takes the whole vector of counts (x1, x2, …, xk).
Independence: Both distributions assume that the trials are independent and that the outcome probabilities stay the same from trial to trial. In the multinomial case, the counts themselves are not independent of each other, because they must sum to n.
Parameters: The binomial distribution has two parameters, the number of trials and the probability of success, while the multinomial distribution has the number of trials and a vector of k probabilities, one per outcome, that sum to 1.

In summary, the binomial distribution models the number of successes in a fixed number of independent trials with two possible outcomes, while the multinomial distribution models the joint counts of every outcome in a fixed number of independent trials with more than two possible outcomes.
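The relationship between the two distributions can be seen directly in R: a multinomial with only two outcomes gives the same probability as the binomial. The numbers in this sketch (10 trials, success probability 0.3, 4 successes) are illustrative assumptions, not values from the examples above.

```r
n_trials  <- 10   # illustrative number of trials
p_success <- 0.3  # illustrative probability of success
successes <- 4    # illustrative number of successes

# Binomial probability of exactly 4 successes in 10 trials
p_binom <- dbinom(successes, size = n_trials, prob = p_success)

# The same event written as a two-outcome multinomial:
# counts are (successes, failures), probabilities are (p, 1 - p)
p_multi <- dmultinom(x = c(successes, n_trials - successes),
                     size = n_trials,
                     prob = c(p_success, 1 - p_success))

# Should return TRUE, up to floating-point tolerance
all.equal(p_binom, p_multi)
```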
