Normal Distribution in R

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is widely used in statistics and probability theory to model continuous random variables.

The shape of a normal distribution is a bell curve, which is symmetrical and has a single peak at the center. The mean, median, and mode of a normal distribution are all equal, and the distribution is characterized by two parameters: the mean (μ) and the standard deviation (σ).

In a normal distribution, about 68% of the data falls within one standard deviation of the mean, about 95% within two, and about 99.7% within three. This makes the normal distribution a useful tool for analyzing and modeling many natural phenomena, such as the heights of people in a population, the IQ scores of a group of individuals, or the error rates of a manufacturing process.
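The share of values within one, two, and three standard deviations of the mean can be checked numerically. The following is a minimal sketch that uses pnorm() (described later in this article) on the standard normal distribution; the variable names k and within_k are illustrative only.

```r
# Probability mass within k standard deviations of the mean,
# computed for the standard normal distribution (mean 0, sd 1).
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)

round(within_k, 4)   # 0.6827 0.9545 0.9973
```

Because any normal distribution can be rescaled to the standard one, the same proportions hold for every choice of mean and standard deviation.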

In R, there are various functions to work with the normal distribution. Here are some of the most common ones:

1. dnorm(x, mean = 0, sd = 1, log = FALSE): This function returns the value of the probability density function (PDF) of the normal distribution at x, for a distribution with mean mean and standard deviation sd. If log is set to TRUE, the logarithm of the density is returned.

2. pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): This function returns the value of the cumulative distribution function (CDF) of the normal distribution at q, that is, the probability P(X ≤ q), for a distribution with mean mean and standard deviation sd. If lower.tail is set to FALSE, the upper-tail probability P(X > q) is returned instead. If log.p is set to TRUE, the logarithm of the probability is returned.

3. qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): This function returns the quantile (inverse CDF) of the normal distribution for a given probability p, that is, the value q such that P(X ≤ q) = p, for a distribution with mean mean and standard deviation sd. If lower.tail is set to FALSE, p is treated as an upper-tail probability, so that P(X > q) = p. If log.p is set to TRUE, p is interpreted as the logarithm of a probability.

4. rnorm(n, mean = 0, sd = 1): This function generates n random numbers from the normal distribution with mean mean and standard deviation sd.

These functions are very useful for statistical analysis and modeling in R.
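To see how the four functions fit together, here is a small sketch that calls each of them on the same distribution. The parameter values (mu, sigma, x) and the seed are illustrative assumptions, not values taken from this article.

```r
# Illustrative parameters (assumptions for this sketch)
mu <- 10
sigma <- 2
x <- 12

d <- dnorm(x, mean = mu, sd = sigma)   # height of the density curve at x
p <- pnorm(x, mean = mu, sd = sigma)   # P(X <= x)
q <- qnorm(p, mean = mu, sd = sigma)   # inverse of pnorm: recovers x

set.seed(1)                                # fix the seed so the draws repeat
draws <- rnorm(5, mean = mu, sd = sigma)   # five random values

all.equal(q, x)   # TRUE
```

The round trip through pnorm() and qnorm() returns the starting value, which is what "inverse CDF" means in practice.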

The rnorm() function in R

In R, the normal distribution can be simulated using the built-in function rnorm(). This function generates random numbers from a normal distribution with a specified mean and standard deviation.

Here’s an example of how to generate 1000 random numbers from a normal distribution with a mean of 0 and a standard deviation of 1:

# Generate 1000 random numbers from a normal distribution
data <- rnorm(1000, mean = 0, sd = 1)

# Plot a histogram of the data to visualize the distribution
hist(data,
  main = "Normal Distribution",
  xlab = "Data",
  ylab = "Frequency"
)

This code generates 1000 random numbers from a standard normal distribution (i.e., with a mean of 0 and a standard deviation of 1), stores the numbers in the variable data, and then plots a histogram of the data using the hist() function.

If you want to generate random numbers from a normal distribution with a different mean or standard deviation, simply adjust the values for the mean and sd parameters accordingly. For example, to generate random numbers from a normal distribution with a mean of 5 and a standard deviation of 2, you would use the following code:

# Generate 1000 random numbers from a normal distribution
# with a mean of 5 and a standard deviation of 2
data <- rnorm(1000, mean = 5, sd = 2)

# Plot a histogram of the data to visualize the distribution
hist(data,
  main = "Normal Distribution",
  xlab = "Data",
  ylab = "Frequency"
)
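Because rnorm() is random, each run produces different numbers. A common way to make results repeatable is to set the random seed first. The sketch below assumes an arbitrary seed of 42 and illustrative variable names; it also checks that the sample mean and standard deviation land near the requested values.

```r
set.seed(42)                                # arbitrary seed; any integer works
sample_a <- rnorm(1000, mean = 5, sd = 2)

set.seed(42)                                # same seed, so the same draws again
sample_b <- rnorm(1000, mean = 5, sd = 2)

identical(sample_a, sample_b)               # TRUE

mean(sample_a)                              # close to 5
sd(sample_a)                                # close to 2
```

The sample statistics will not match the parameters exactly; they get closer as the sample size grows.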

The dnorm() function in R

In R, the dnorm() function is used to compute the probability density function (PDF) of the normal distribution at a given value or set of values. The syntax of the dnorm() function is as follows:

dnorm(x, mean = 0, sd = 1)

Here, x is the value or vector of values at which to evaluate the PDF. The mean parameter specifies the mean of the normal distribution (default is 0), and the sd parameter specifies the standard deviation of the normal distribution (default is 1).

For example, to compute the PDF of a normal distribution with a mean of 2 and a standard deviation of 3 at the values 0, 1, 2, 3, and 4, we can use the following code:

dnorm(c(0, 1, 2, 3, 4), mean = 2, sd = 3)

This returns a vector of density values with the same length as the input: approximately 0.106, 0.126, 0.133, 0.126, and 0.106. The values are symmetric around the mean of 2, where the density is highest.

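Note that the dnorm() function returns the height of the PDF at the specified value(s), rather than a probability. For a continuous distribution, the probability of any single exact value is zero. To compute the probability of a range of values, you need to integrate the PDF over that range or, equivalently, take the difference of two pnorm() values.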
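As a rough illustration of the point about ranges, the sketch below computes P(1 ≤ X ≤ 3) for the distribution used above (mean 2, standard deviation 3) in two ways: by numerically integrating dnorm() with base R's integrate(), and by subtracting two pnorm() values. The variable names are illustrative.

```r
# P(1 <= X <= 3) for X ~ N(mean = 2, sd = 3)

# 1) Numerically integrate the density over the range;
#    extra arguments (mean, sd) are passed on to dnorm()
area <- integrate(dnorm, lower = 1, upper = 3, mean = 2, sd = 3)$value

# 2) Difference of the CDF at the two endpoints
cdf_diff <- pnorm(3, mean = 2, sd = 3) - pnorm(1, mean = 2, sd = 3)

area       # about 0.26
cdf_diff   # same value, up to numerical error
```

In practice the pnorm() version is simpler and faster; the integral is shown only to make the link between the PDF and probabilities explicit.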

 

If you want to plot the probability density function of the normal distribution, you can evaluate dnorm() over a grid of x values and draw the result as a line. The main arguments of dnorm() are the value(s) at which to calculate the density (x), the mean of the distribution (mean), and its standard deviation (sd).

Here is an example of plotting the probability density function of a normal distribution with a mean of 0 and a standard deviation of 1:

 

x <- seq(-4, 4, length.out = 100)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l")

The pnorm() function in R

In R, the pnorm() function is used to calculate the cumulative distribution function (CDF) of a normal distribution. The CDF gives the probability that a random variable is less than or equal to a given value.

The main arguments of pnorm() are the value(s) for which to calculate the CDF (q), the mean of the normal distribution (mean), and its standard deviation (sd). By default, pnorm() calculates the area to the left of the given value(s).

Here is an example of using the pnorm() function to calculate the CDF of a normal distribution with a mean of 0 and a standard deviation of 1:

# Calculate the CDF for x = 1
pnorm(1, mean = 0, sd = 1)

# Calculate the CDF for x = -1
pnorm(-1, mean = 0, sd = 1)

# Calculate the CDF for x = 0
pnorm(0, mean = 0, sd = 1)

The output is the probability that a random variable from the given normal distribution is less than or equal to each value: approximately 0.8413 for 1, 0.1587 for -1, and exactly 0.5 for 0, since half of the distribution lies below the mean.

If you want to calculate the area to the right of a given value(s), you can set the lower.tail argument to FALSE. For example:

# Calculate the area to the right of x = 1
pnorm(1, mean = 0, sd = 1, lower.tail = FALSE)

# Calculate the area to the right of x = -1
pnorm(-1, mean = 0, sd = 1, lower.tail = FALSE)
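Combining these calls gives the probability of landing inside an interval. The following sketch uses an illustrative distribution with mean 100 and standard deviation 15 (assumed values, chosen only for the example) and shows that the upper tail can be obtained either with lower.tail = FALSE or as one minus the lower tail.

```r
# Illustrative parameters (an assumption, not taken from the text above)
mu <- 100
sigma <- 15

# P(85 <= X <= 130): difference of two CDF values
p_between <- pnorm(130, mean = mu, sd = sigma) - pnorm(85, mean = mu, sd = sigma)
round(p_between, 4)   # 0.8186

# P(X > 130): upper tail, computed two equivalent ways
p_upper <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)
all.equal(p_upper, 1 - pnorm(130, mean = mu, sd = sigma))   # TRUE
```

For probabilities very far out in the tail, lower.tail = FALSE is generally the more accurate of the two forms, because it avoids subtracting a number very close to 1.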

Here is an example that plots the CDF of a normal distribution with mean 0 and standard deviation 1 using the pnorm() function in R:

# Generate a sequence of 100 x-values from -3 to 3
x <- seq(-3, 3, length.out = 100)

# Plot the CDF of the standard normal distribution
plot(x, pnorm(x),
  type = "l", lty = 1,
  xlab = "x", ylab = "Cumulative Probability",
  main = "CDF of Standard Normal Distribution"
)

# Add a legend with the mean and standard deviation,
# placed in the bottom-right corner so it does not cover the curve
legend("bottomright",
  legend = c("Mean = 0", "SD = 1"),
  bg = "white"
)

The qnorm() function in R

The qnorm() function in R is used to calculate the quantiles of the normal distribution.

The function takes the following main arguments:

  1. p – the probability of getting a value less than or equal to the quantile
  2. mean – the mean of the normal distribution (default is 0)
  3. sd – the standard deviation of the normal distribution (default is 1)

The output of the function is the quantile for the given probability p.

For example, to find the quantile for a probability of 0.95 in a normal distribution with a mean of 10 and standard deviation of 2, you can use the following code:

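qnorm(0.95, mean = 10, sd = 2)

This returns approximately 13.29, meaning that 95% of the distribution lies at or below that value.

qnorm() can also be used to build a normal probability plot, which compares the sorted sample values with the theoretical quantiles of the normal distribution. If the sample is approximately normal, the points fall close to a straight line:

# Generate some random data
set.seed(123)
x <- rnorm(100)

# Create a normal probability plot;
# ppoints() gives evenly spaced probabilities suited to the sample size
plot(qnorm(ppoints(length(x))), sort(x),
  xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
  main = "Normal Probability Plot"
)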
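Another common use of qnorm() is to find the bounds of a central interval. The sketch below, using the same mean of 10 and standard deviation of 2 as the earlier example, finds the values that enclose the middle 95% of the distribution and then checks them with pnorm(). The variable name bounds is illustrative.

```r
# Bounds of the central 95% of N(mean = 10, sd = 2):
# 2.5% of the distribution lies below the first value, 2.5% above the second
bounds <- qnorm(c(0.025, 0.975), mean = 10, sd = 2)
round(bounds, 2)   # 6.08 13.92

# Round trip: pnorm() maps the quantiles back to the probabilities
pnorm(bounds, mean = 10, sd = 2)   # 0.025 0.975

# Upper-tail form: the value exceeded with probability 0.05
upper_q <- qnorm(0.05, mean = 10, sd = 2, lower.tail = FALSE)
all.equal(upper_q, qnorm(0.95, mean = 10, sd = 2))   # TRUE
```

The vectorized call works because qnorm(), like the other three functions, accepts a vector as its first argument.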

In summary, dnorm() calculates the PDF, pnorm() calculates the CDF, qnorm() calculates the inverse of the CDF (quantiles), and rnorm() generates random numbers from the normal distribution. These functions are useful for different purposes in statistical analysis and data science.

The main difference between dnorm() and rnorm() functions in R

The main difference between dnorm() and rnorm() functions in R is that dnorm() is used to calculate the probability density function (PDF) of the normal distribution, while rnorm() is used to generate random numbers from the normal distribution.

The dnorm() function takes as input a value or a vector of values, and returns the height of the normal distribution at those values, given a specified mean and standard deviation. In other words, dnorm() calculates the probability density of the normal distribution at a specific point or set of points.

On the other hand, rnorm() is used to generate random numbers from a normal distribution with a specified mean and standard deviation. This function takes as input the number of random values to generate, and returns a vector of random numbers drawn from the specified normal distribution.

To summarize, dnorm() is deterministic: the same input always gives the same density value. rnorm() is random: each call returns different numbers unless the random seed is fixed with set.seed().
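The two functions meet when a large random sample is compared with the theoretical curve. The following sketch (seed, sample size, and variable names are illustrative assumptions) overlays the dnorm() curve on a density-scaled histogram of rnorm() draws, and also estimates the density near 0 directly from the sample.

```r
set.seed(123)
draws <- rnorm(10000)   # random sample from the standard normal (rnorm)

# Histogram on the density scale, with the exact curve drawn on top (dnorm)
hist(draws,
  freq = FALSE, breaks = 40,
  main = "Sample vs. theoretical density",
  xlab = "Value"
)
curve(dnorm(x), from = -4, to = 4, add = TRUE, lwd = 2)

# Empirical density near 0: share of draws in (-0.1, 0.1) divided by the width
empirical_at_0 <- mean(abs(draws) < 0.1) / 0.2
theoretical_at_0 <- dnorm(0)   # about 0.399

empirical_at_0   # close to the theoretical value
```

With more draws the histogram follows the curve more closely, which is why rnorm() output is often checked against dnorm() in this way.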

 

The main difference between the rnorm() and qnorm() functions in R

The main difference between the rnorm() and qnorm() functions in R is that rnorm() generates random numbers from a normal distribution, while qnorm() calculates quantiles of a normal distribution.

The rnorm() function is used to generate a specified number of random samples from a normal distribution with a specified mean and standard deviation. This function is useful for simulating data, creating random samples, and performing Monte Carlo simulations.

The qnorm() function, on the other hand, calculates the value at a specified quantile of a normal distribution, given the mean and standard deviation. This function is useful for calculating percentiles, constructing confidence intervals, and setting decision thresholds.

In summary, rnorm() generates random samples from a normal distribution, while qnorm() calculates the value at a specified quantile of a normal distribution. Both functions are useful for different purposes in statistical analysis and data science.
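Although the two functions do different jobs, they are connected: passing uniform random numbers through qnorm() yields normally distributed values, a technique known as inverse transform sampling. The sketch below is only an illustration of that link, with an arbitrary seed and illustrative variable names; for everyday use, calling rnorm() directly is simpler.

```r
set.seed(2024)                      # arbitrary seed for repeatability
u <- runif(10000)                   # uniform random numbers on (0, 1)
z <- qnorm(u, mean = 5, sd = 2)     # map them through the quantile function

mean(z)   # close to 5
sd(z)     # close to 2
```

The resulting values behave like draws from a normal distribution with mean 5 and standard deviation 2, even though rnorm() was never called.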
