Calculate and visualize the variance and standard deviation for discrete and continuous random variables
The variance of a Bernoulli random variable X with parameter p is Var(X) = p(1-p).
The Variance of a random variable X, denoted by Var(X) or σ², is a measure of the dispersion or spread of the probability distribution. It quantifies how far the values of the random variable typically deviate from the mean.
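Formally, the variance is the expected squared deviation from the mean:
\[Var(X) = E[(X - \mu)^2]\]
where μ = E[X] is the mean (expected value) of X.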
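For a discrete random variable, the variance is:
\[Var(X) = \sum_i (x_i - \mu)^2 \, P(X = x_i)\]
where \(x_i\) are the possible values of X and \(P(X = x_i)\) is the probability of each value.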
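For a continuous random variable, the variance is:
\[Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx\]
where f(x) is the probability density function (PDF) of X.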
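The integral definition for a continuous variable can be approximated numerically. The sketch below is only an illustration: the helper name `variance_from_pdf`, the midpoint rule, and the uniform density on [2, 8] are assumptions chosen for this example, and the approach as written only suits densities supported on a finite interval.

```python
def variance_from_pdf(pdf, lower, upper, steps=100_000):
    """Approximate the mean and variance of a density on [lower, upper].

    Uses a simple midpoint rule; `pdf` must integrate to 1 on the interval.
    """
    width = (upper - lower) / steps
    midpoints = [lower + (i + 0.5) * width for i in range(steps)]
    # Mean: integral of x * f(x) dx
    mean = sum(x * pdf(x) for x in midpoints) * width
    # Variance: integral of (x - mean)^2 * f(x) dx
    variance = sum((x - mean) ** 2 * pdf(x) for x in midpoints) * width
    return mean, variance


# Illustrative density: uniform on [2, 8], so f(x) = 1 / (b - a) there.
a, b = 2.0, 8.0
mean, variance = variance_from_pdf(lambda x: 1.0 / (b - a), a, b)
print(mean, variance)  # close to 5 and to (b - a)**2 / 12 = 3
```

For densities with unbounded support, such as the normal or exponential, the interval would have to be truncated where the tails are negligible, which adds a further approximation.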
The variance can also be written as:
\[Var(X) = E[X^2] - (E[X])^2\]
This alternative formula is often easier to compute.
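For reference, the shortcut \(Var(X) = E[X^2] - \mu^2\) follows from expanding the square in the definition and applying linearity of expectation, with μ = E[X] treated as a constant:

\[
\begin{aligned}
Var(X) &= E[(X - \mu)^2] \\
       &= E[X^2 - 2\mu X + \mu^2] \\
       &= E[X^2] - 2\mu E[X] + \mu^2 \\
       &= E[X^2] - 2\mu^2 + \mu^2 \\
       &= E[X^2] - \mu^2
\end{aligned}
\]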
The Standard Deviation (σ) is the square root of the variance:
\[\sigma = \sqrt{Var(X)}\]
The standard deviation has several important interpretations:
The standard deviation measures the typical deviation of values from the mean (formally, the root-mean-square deviation, which is not the same as the average absolute deviation). Unlike variance, it's expressed in the same units as the random variable, making it more directly interpretable.
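For normally distributed variables, the standard deviation determines the probability of observations falling within specific ranges: approximately 68% of values lie within one standard deviation of the mean (μ ± σ), about 95% within two (μ ± 2σ), and about 99.7% within three (μ ± 3σ).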
This is known as the "empirical rule" or "68-95-99.7 rule."
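The percentages in the 68-95-99.7 rule can be reproduced from the normal distribution itself. The following sketch uses the identity P(|X − μ| < kσ) = erf(k/√2) for a normal variable; the function name `prob_within_k_sigma` is introduced here only for illustration.

```python
import math


def prob_within_k_sigma(k):
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within_k_sigma(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

The result does not depend on μ or σ, which is why the rule applies to every normal distribution.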
The coefficient of variation (CV) is the ratio of the standard deviation to the mean:
\[CV = \frac{\sigma}{\mu}\]
It provides a dimensionless measure of relative variability, allowing for comparison of dispersion across different distributions with different units or scales.
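A small sketch of why the CV is useful for comparing spread across units. The data values and the helper `coefficient_of_variation` are hypothetical; it uses the population standard deviation (`statistics.pstdev`) and treats each list as the complete set of equally likely values.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


# Hypothetical measurements, made up for illustration.
heights_cm = [160.0, 170.0, 180.0]
heights_m = [h / 100 for h in heights_cm]  # same data, different unit
weights_kg = [60.0, 70.0, 80.0]

print(coefficient_of_variation(heights_cm))  # same as for heights_m
print(coefficient_of_variation(heights_m))
print(coefficient_of_variation(weights_kg))  # larger relative spread
```

Changing units rescales the mean and the standard deviation by the same factor, so the ratio is unchanged. The CV is only meaningful when the mean is non-zero and the quantity is measured on a ratio scale.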
Consider a discrete random variable X with the following probability distribution:
P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3
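The mean is:
\[\mu = E[X] = 1(0.2) + 2(0.5) + 3(0.3) = 2.1\]
The variance is:
\[Var(X) = (1 - 2.1)^2(0.2) + (2 - 2.1)^2(0.5) + (3 - 2.1)^2(0.3) = 0.242 + 0.005 + 0.243 = 0.49\]
And the standard deviation is:
\[\sigma = \sqrt{0.49} = 0.7\]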
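The same example can be checked in a few lines of Python. This is a minimal sketch; the dictionary name `pmf` and the other variable names are chosen here for illustration. It computes the variance both from the definition and from E[X²] − (E[X])².

```python
import math

# Probability distribution from the example: value -> probability
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())
# Definition: expected squared deviation from the mean
variance_def = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Shortcut: E[X^2] - (E[X])^2
variance_alt = sum(x**2 * p for x, p in pmf.items()) - mean**2
std_dev = math.sqrt(variance_def)

# Rounded to hide floating-point noise in the last digits
print(round(mean, 10), round(variance_def, 10), round(std_dev, 10))
# 2.1 0.49 0.7
```

Both variance formulas agree up to floating-point rounding, which is a useful sanity check when computing by hand.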
Distribution | Parameters | Variance | Notes |
---|---|---|---|
Bernoulli | p = probability of success | \(Var(X) = p(1-p)\) | Maximized at p = 0.5, where Var(X) = 0.25 |
Binomial | n = number of trials, p = probability of success | \(Var(X) = np(1-p)\) | Sum of n independent Bernoulli variances |
Geometric | p = probability of success | \(Var(X) = \frac{1-p}{p^2}\) | Increases rapidly as p approaches 0 |
Poisson | λ = rate parameter | \(Var(X) = \lambda\) | Equal to the mean, a unique property |
Uniform | a = lower bound, b = upper bound | \(Var(X) = \frac{(b-a)^2}{12}\) | Depends only on the range width |
Normal | μ = mean, σ = standard deviation | \(Var(X) = \sigma^2\) | Variance is a parameter of the distribution |
Exponential | λ = rate parameter | \(Var(X) = \frac{1}{\lambda^2}\) | Standard deviation equals the mean |
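Closed-form variances like these can be checked against the definition. As a sketch, the code below builds the binomial probability mass function explicitly and compares the variance computed from it with np(1 − p); the helper names and the parameter values n = 20, p = 0.4 are arbitrary choices for illustration.

```python
import math


def binomial_pmf(n, p):
    """Map each k in 0..n to P(X = k) for a Binomial(n, p) variable."""
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def variance_of_pmf(pmf):
    """Variance from the definition: sum of (x - mean)^2 * P(X = x)."""
    mean = sum(x * q for x, q in pmf.items())
    return sum((x - mean) ** 2 * q for x, q in pmf.items())


n, p = 20, 0.4
from_definition = variance_of_pmf(binomial_pmf(n, p))
from_formula = n * p * (1 - p)
print(from_definition, from_formula)  # both close to 4.8
```

Setting n = 1 reduces the binomial to a Bernoulli variable, so the same helpers also reproduce p(1 − p).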
Variance is fundamental in portfolio theory as a measure of investment risk. The variance of returns quantifies investment volatility, helping investors balance risk and return. Modern Portfolio Theory uses variance to optimize asset allocation, while parametric Value at Risk (VaR) calculations rely on variance to estimate potential losses.
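To make the portfolio use case concrete, here is a sketch of the standard two-asset result Var(w₁R₁ + w₂R₂) = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂, where ρ is the correlation between the two returns. The function name, weights, volatilities, and correlation are hypothetical values made up for this illustration.

```python
import math


def two_asset_portfolio_variance(w1, w2, sigma1, sigma2, rho):
    """Variance of w1*R1 + w2*R2 from the assets' std devs and correlation."""
    covariance = rho * sigma1 * sigma2
    return w1**2 * sigma1**2 + w2**2 * sigma2**2 + 2 * w1 * w2 * covariance


# Hypothetical inputs: 60/40 weights, 20% and 10% volatility, correlation 0.25
portfolio_var = two_asset_portfolio_variance(0.6, 0.4, 0.20, 0.10, 0.25)
portfolio_sd = math.sqrt(portfolio_var)
weighted_avg_sd = 0.6 * 0.20 + 0.4 * 0.10

print(f"portfolio variance: {portfolio_var:.4f}")
print(f"portfolio std dev:  {portfolio_sd:.4f}")
print(f"weighted avg std:   {weighted_avg_sd:.4f}")
```

With a correlation below 1, the portfolio's standard deviation comes out lower than the weighted average of the individual standard deviations, which is the diversification effect that variance-based allocation exploits.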
In experimental sciences, variance measures the reliability and consistency of observations. Low variance suggests high precision in measurements. In statistical hypothesis testing, variance is used to assess whether differences between groups are statistically significant or merely due to random variation in the data.
Variance is central to Six Sigma and other quality improvement methodologies. Process capability indices like Cp and Cpk compare the variance of a process to specification limits. Statistical Process Control uses variance to set control limits for detecting when a process is operating outside normal parameters.
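As a sketch of how the capability indices relate process spread to specification limits, the code below uses the usual definitions Cp = (USL − LSL) / 6σ and Cpk = min(USL − μ, μ − LSL) / 3σ. The function names and the specification limits, mean, and standard deviation are hypothetical numbers chosen for illustration.

```python
def cp(usl, lsl, sigma):
    """Process capability: specification width relative to 6 sigma."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mean, sigma):
    """Like Cp, but penalizes a process mean that is off-center."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Hypothetical process: spec limits 9.0 to 11.0, mean 10.2, std dev 0.2
lsl, usl, mean, sigma = 9.0, 11.0, 10.2, 0.2
print(f"Cp  = {cp(usl, lsl, sigma):.3f}")
print(f"Cpk = {cpk(usl, lsl, mean, sigma):.3f}")
```

Cpk is never larger than Cp, and the two coincide only when the process mean sits exactly midway between the limits. Halving σ doubles both indices, which is why reducing variance is the main lever in these methodologies.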
In machine learning, variance has two meanings: a statistical measure of spread and, in learning theory, a model's sensitivity to the particular training data it sees. High-variance models (such as deep neural networks) can overfit training data. As a statistical measure, variance is used in feature selection to identify informative variables, and Principal Component Analysis uses it to identify the most important dimensions in high-dimensional data.
1. What is the variance of a Bernoulli random variable with p = 0.5?
2. The variance of a binomial distribution with n = 10 and p = 0.3 is: