1. Why the Mean Isn't Enough
Imagine two carnival games. Both cost $5 to play:
Game A: "The Sure Thing"
You always win exactly $5 back.
Average payout = $5.00
Game B: "The Coin Flip"
Flip a coin: Heads = $10, Tails = $0.
Average payout = $5.00
Both games have the same average payout (μ = $5). But they feel completely different. Game A is safe. Game B is risky — you might walk away with nothing or double your money.
The mean tells you the center. But it says nothing about how spread out the outcomes are. We need a new number that captures this "spread" or "risk." That number is the variance.
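The two games are easy to simulate. Here's a quick sketch (the sample size and seed are arbitrary) showing that both games have the same average payout but wildly different spreads:

```python
import random

random.seed(0)
N = 100_000

# Game A: always pays $5. Game B: $10 on heads, $0 on tails.
game_a = [5.0] * N
game_b = [10.0 if random.random() < 0.5 else 0.0 for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(game_a), variance(game_a))   # 5.0 and 0.0: no spread at all
print(mean(game_b), variance(game_b))   # about 5.0 and about 25: same center, real risk
```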
2. What Does "Spread" Look Like?
Picture dots on a number line. Each dot is a possible outcome, and its size represents its probability. The mean μ is the balance point.
Low spread: values cluster near μ
Small variance
High spread: values are far from μ
Large variance
Variance answers: "On average, how far are the outcomes from the mean?" (Specifically, the average squared distance — we'll see why squared in a moment.)
3. Building the Variance Formula Step by Step
Let's measure spread systematically. We want to know "how far is each outcome from the mean?"
Step 1: Distance from the mean
For each outcome x, the distance from μ is: (x - μ)
Problem: Some distances are positive (x > μ) and some are negative (x < μ). If we just average them, they cancel out and we always get zero. Not helpful!
Step 2: Square the distances
Squaring makes all distances positive: (x - μ)²
Now big deviations (far from the mean) get amplified, which is exactly what we want — an outcome 4 units away contributes 16 to variance, not just 4.
Step 3: Weight by probability and add up
Multiply each squared distance by its probability, then sum everything:
σ² = Σ (x - μ)² · f(x)
This is the variance: the average squared distance from the mean, weighted by probability.
Step 4: Standard deviation undoes the squaring
Since we squared the distances, variance is in "squared units." To get back to the original units, take the square root. This is the standard deviation:
σ = √σ²
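The three steps translate directly into code. Here is a small sketch (the function names are mine, not a standard library) that computes μ, σ², and σ from a probability table stored as a dict:

```python
from math import sqrt

def pmf_mean(pmf):
    """Mean: sum of each outcome times its probability."""
    return sum(x * p for x, p in pmf.items())

def pmf_variance(pmf):
    """Variance: probability-weighted average squared distance from the mean."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def pmf_std(pmf):
    """Standard deviation: the square root undoes the squaring."""
    return sqrt(pmf_variance(pmf))

# The coin-flip carnival game: $0 or $10, each with probability 1/2.
coin_game = {0: 0.5, 10: 0.5}
print(pmf_mean(coin_game), pmf_variance(coin_game), pmf_std(coin_game))  # 5.0 25.0 5.0
```

Note that σ = $5 for the coin-flip game: the payout is always exactly $5 away from the mean, and the standard deviation reports exactly that.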
4. Worked Example: A Loaded Die
A loaded die, simplified to three faces, has outcomes 1, 2, 3 with probabilities 3/6, 2/6, 1/6. Let's compute the variance.
First, find the mean:
μ = 1(3/6) + 2(2/6) + 3(1/6) = 10/6 = 5/3 ≈ 1.667
Now compute each squared distance times probability:
| x | x - μ | (x - μ)² | f(x) | (x-μ)² · f(x) |
|---|---|---|---|---|
| 1 | -2/3 | 4/9 | 3/6 | 12/54 |
| 2 | +1/3 | 1/9 | 2/6 | 2/54 |
| 3 | +4/3 | 16/9 | 1/6 | 16/54 |
σ² = 12/54 + 2/54 + 16/54 = 30/54 = 5/9 ≈ 0.556
σ = √(5/9) ≈ 0.745
A typical outcome lands about 0.745 units away from the mean of 1.667. (Strictly, σ is the root-mean-square distance from the mean rather than the plain average distance, but it's the standard measure of typical deviation.)
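Python's `fractions` module lets us redo this table exactly, with no rounding (a sketch using the same probabilities as above):

```python
from fractions import Fraction as F

# The loaded die: outcomes 1, 2, 3 with probabilities 3/6, 2/6, 1/6.
pmf = {1: F(3, 6), 2: F(2, 6), 3: F(1, 6)}

mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(mu)                 # 5/3
print(var)                # 5/9
print(float(var) ** 0.5)  # about 0.745
```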
5. A Shortcut That Saves Real Time
Computing (x - μ) for every value gets tedious, especially when μ is a fraction. There's a much faster way. Here's the algebra trick:
Start with the definition and expand:
σ² = E[(X - μ)²]
= E[X² - 2μX + μ²]
= E(X²) - 2μE(X) + μ²
= E(X²) - 2μ² + μ²
= E(X²) - μ²
The Shortcut (memorize this!)
σ² = E(X²) - [E(X)]²
"The mean of the square minus the square of the mean."
Verify with our loaded die example:
E(X²) = 1²(3/6) + 2²(2/6) + 3²(1/6) = (3 + 8 + 9)/6 = 20/6
σ² = 20/6 - (5/3)² = 20/6 - 25/9 = (60 - 50)/18 = 10/18 = 5/9 ✓
Same answer, but we never had to compute (x - μ) for each row!
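The shortcut is just as easy to check in code. This sketch reuses the loaded-die table and never forms a single (x - μ) term:

```python
from fractions import Fraction as F

pmf = {1: F(3, 6), 2: F(2, 6), 3: F(1, 6)}

ex  = sum(x * p for x, p in pmf.items())       # E(X)  = 5/3
ex2 = sum(x * x * p for x, p in pmf.items())   # E(X²) = 20/6 = 10/3
var = ex2 - ex ** 2                            # shortcut: E(X²) - [E(X)]²

print(var)  # 5/9, matching the definition-based computation
```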
6. Seeing Standard Deviation in Action
Two distributions, both centered at 0, but with different spreads:
X: values {-1, 0, 1}
Each equally likely (probability 1/3)
μ = 0
σ² = 2/3
σ = √(2/3) ≈ 0.816
Y: values {-2, 0, 2}
Each equally likely (probability 1/3)
μ = 0
σ² = 8/3
σ = 2√(2/3) ≈ 1.633
Y = 2X: every value is doubled. The standard deviation also doubles (σ_Y = 2σ_X), and the variance quadruples (σ_Y² = 4σ_X²).
General rule: Y = aX + b
μ_Y = aμ_X + b (both shifting and scaling move the center)
σ_Y² = a²σ_X² (only scaling affects spread; adding b does nothing!)
σ_Y = |a|σ_X
Why doesn't adding a constant change variance?
Think about it: if you add 100 to every outcome, the mean also shifts by 100. Every outcome is still the same distance from the new mean. Nothing about the spread changed — you just slid everything along the number line.
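A quick numerical check of the Y = aX + b rules (a sketch; the helper just recomputes variance from a dict pmf, and a = 2, b = 100 are arbitrary):

```python
def pmf_variance(pmf):
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

X = {-1: 1/3, 0: 1/3, 1: 1/3}
a, b = 2, 100

# Transform every outcome; the probabilities ride along unchanged.
Y = {a * x + b: p for x, p in X.items()}

print(pmf_variance(X))   # 2/3, about 0.667
print(pmf_variance(Y))   # a² times that, about 2.667; b had no effect
```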
7. Classic Example: Fair Six-Sided Die
f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6. For the uniform distribution on {1, ..., m}:
Uniform {1, ..., m} Formulas
μ = (m + 1) / 2
σ² = (m² - 1) / 12
For m = 6 (a fair die):
μ = 7/2 = 3.5
σ² = (36 - 1)/12 = 35/12 ≈ 2.917
σ = √(35/12) ≈ 1.708
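The (m² - 1)/12 formula is worth verifying against the definition for several values of m. A brute-force sketch:

```python
def uniform_variance(m):
    """Variance of the uniform distribution on {1, ..., m}, from the definition."""
    pmf = {x: 1 / m for x in range(1, m + 1)}
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

for m in (2, 6, 10, 20):
    direct = uniform_variance(m)
    formula = (m * m - 1) / 12
    print(m, direct, formula)   # the two columns agree for every m
```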
8. Moments: A Family of Summaries
The mean is the "first moment" and variance uses the "second moment." In general, the rth moment about the origin is:
E(X^r) = Σ x^r · f(x)
r = 1 gives E(X) = μ. r = 2 gives E(X²), which we need for the variance shortcut.
There's also a factorial moment trick that can simplify certain variance calculations:
Second factorial moment
E[X(X - 1)] = E(X²) - E(X)
So: σ² = E[X(X - 1)] + E(X) - μ²
Hypergeometric variance (via factorial moments)
σ² = n(N₁/N)(N₂/N)[(N-n)/(N-1)] = np(1-p)[(N-n)/(N-1)], where p = N₁/N is the proportion of "successes" in a population of N = N₁ + N₂ items
The factor (N-n)/(N-1) is called the "finite population correction." It makes variance smaller than what you'd get with replacement, because sampling without replacement reduces variability.
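Here is a sketch that builds the hypergeometric pmf from binomial coefficients and compares the definition-based variance against the formula (N₁ = 5 successes, N₂ = 7 failures, n = 4 draws are arbitrary test values):

```python
from math import comb

def hypergeom_pmf(N1, N2, n):
    """P(X = x) = C(N1, x) · C(N2, n - x) / C(N1 + N2, n)."""
    N = N1 + N2
    lo, hi = max(0, n - N2), min(n, N1)
    return {x: comb(N1, x) * comb(N2, n - x) / comb(N, n)
            for x in range(lo, hi + 1)}

N1, N2, n = 5, 7, 4            # arbitrary example values
N, p = N1 + N2, N1 / (N1 + N2)

pmf = hypergeom_pmf(N1, N2, n)
mu = sum(x * pr for x, pr in pmf.items())
var = sum((x - mu) ** 2 * pr for x, pr in pmf.items())

formula = n * p * (1 - p) * (N - n) / (N - 1)
print(var, formula)   # both about 0.707: the formula checks out
```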
9. The Moment-Generating Function (MGF)
What if there was a single function that packaged the entire distribution into one formula — and you could extract any moment just by taking derivatives?
That's exactly what the moment-generating function does.
Definition
M(t) = E(e^(tX)) = Σ e^(tx) f(x)
Here's the magic — the derivatives at t = 0 give you the moments:
Plug in t = 0
M(0) = 1
Always equals 1
First derivative
M′(0) = E(X) = μ
Gives the mean
Second derivative
M″(0) = E(X²)
Then: σ² = M″(0) - [M′(0)]²
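We can check these derivative facts numerically with finite differences (a sketch using the loaded die; h is a small step size, and the exact targets are μ = 5/3 and E(X²) = 10/3):

```python
from math import exp

pmf = {1: 3/6, 2: 2/6, 3: 1/6}   # the loaded die

def M(t):
    """MGF: probability-weighted sum of e^(tx)."""
    return sum(p * exp(t * x) for x, p in pmf.items())

h = 1e-4
m0 = M(0.0)                                 # should be exactly 1
m1 = (M(h) - M(-h)) / (2 * h)               # central difference for M'(0) = E(X)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2   # central difference for M''(0) = E(X²)
var = m2 - m1 ** 2

print(m0, m1, m2, var)   # 1.0, about 1.667, about 3.333, about 0.556
```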
The Uniqueness Property (Exam Important!)
If two random variables have the same MGF, they must have the same distribution. The MGF is like a fingerprint — it uniquely identifies the distribution. This will be very useful when we prove things about sums of random variables later.
Example: Reading a PMF from an MGF
M(t) = (3/6)e^t + (2/6)e^(2t) + (1/6)e^(3t)
The coefficient of e^(xt) is f(x). So: f(1) = 3/6, f(2) = 2/6, f(3) = 1/6. It's our loaded die distribution, encoded in one formula!
Geometric distribution MGF and variance
f(x) = q^(x-1) p, x = 1, 2, 3, ..., where q = 1 - p
M(t) = pe^t / (1 - qe^t)
Differentiating gives: μ = 1/p, σ² = q/p²
For example, if p = 1/6 (rolling a 6): μ = 6 trials on average, σ² = (5/6)/(1/36) = 30, σ ≈ 5.48.
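For the geometric distribution the sum over outcomes is infinite, but the tail dies off so fast that truncating it gives essentially exact answers. A sketch with p = 1/6 (the cutoff of 2000 terms is arbitrary but far more than enough, since q^2000 underflows to zero):

```python
p = 1 / 6
q = 1 - p

# Truncate the infinite sum; terms beyond x = 2000 are negligible.
xs = range(1, 2001)
mu  = sum(x * q ** (x - 1) * p for x in xs)        # should be 1/p = 6
ex2 = sum(x * x * q ** (x - 1) * p for x in xs)    # E(X²)
var = ex2 - mu ** 2                                # should be q/p² = 30

print(mu, var)   # about 6.0 and 30.0
```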
Quick Reference: Key Formulas
Variance (definition)
σ² = Σ(x - μ)² f(x)
Variance (shortcut)
σ² = E(X²) - μ²
Linear transform
Var(aX + b) = a²Var(X)
Uniform {1,...,m}
σ² = (m² - 1)/12
Geometric variance
σ² = q/p²
Hypergeometric variance
σ² = np(1-p)(N-n)/(N-1)
MGF
M(t) = E(e^(tX))
Moments from MGF
M^(r)(0) = E(X^r)