2.3 Special Mathematical Expectations

Variance, standard deviation, moments, and MGFs

1. Why the Mean Isn't Enough

Imagine two carnival games. Both cost $5 to play:

Game A: "The Sure Thing"

You always win exactly $5 back.

Average payout = $5.00

Game B: "The Coin Flip"

Flip a coin: Heads = $10, Tails = $0.

Average payout = $5.00

Both games have the same average payout (μ = $5). But they feel completely different. Game A is safe. Game B is risky — you might walk away with nothing or double your money.

The mean tells you the center. But it says nothing about how spread out the outcomes are. We need a new number that captures this "spread" or "risk." That number is the variance.

2. What Does "Spread" Look Like?

Picture dots on a number line. Each dot is a possible outcome, and its size represents its probability. The mean μ is the balance point.

[Figure: dot plots on a number line, with the balance point marked at μ]

Low spread: values cluster near μ (small variance).

High spread: values are far from μ (large variance).

Variance answers: "On average, how far are the outcomes from the mean?" (Specifically, the average squared distance — we'll see why squared in a moment.)

3. Building the Variance Formula Step by Step

Let's measure spread systematically. We want to know "how far is each outcome from the mean?"

Step 1: Distance from the mean

For each outcome x, the distance from μ is: (x - μ)

Problem: Some distances are positive (x > μ) and some are negative (x < μ). If we just average them, they cancel out and we always get zero. Not helpful!

Step 2: Square the distances

Squaring makes all distances positive: (x - μ)²

Now big deviations (far from the mean) get amplified, which is exactly what we want — an outcome 4 units away contributes 16 to variance, not just 4.

Step 3: Weight by probability and add up

Multiply each squared distance by its probability, then sum everything:

σ² = Σ (x - μ)² · f(x)

This is the variance: the average squared distance from the mean, weighted by probability.

Step 4: Standard deviation undoes the squaring

Since we squared the distances, variance is in "squared units." To get back to the original units, take the square root. This is the standard deviation:

σ = √σ²

4. Worked Example: A Loaded Die

A weighted game piece has outcomes 1, 2, 3 with probabilities 3/6, 2/6, 1/6. Let's compute the variance.

First, find the mean:

μ = 1(3/6) + 2(2/6) + 3(1/6) = 10/6 = 5/3 ≈ 1.667

Now compute each squared distance times probability:

 x | x - μ | (x - μ)² | f(x) | (x - μ)² · f(x)
 1 |  -2/3 |   4/9    |  3/6 |     12/54
 2 |  +1/3 |   1/9    |  2/6 |      2/54
 3 |  +4/3 |  16/9    |  1/6 |     16/54

σ² = 12/54 + 2/54 + 16/54 = 30/54 = 5/9 ≈ 0.556

σ = √(5/9) ≈ 0.745

Loosely speaking, a typical outcome lies about 0.745 units from the mean of 1.667.
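The whole calculation fits in a few lines of Python; a minimal sketch, using the pmf from the example:

```python
from math import sqrt

# The loaded die from the example: outcomes 1, 2, 3 with probabilities 3/6, 2/6, 1/6
pmf = {1: 3/6, 2: 2/6, 3: 1/6}

mu = sum(x * p for x, p in pmf.items())                # mean: 5/3
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # definition of variance
sigma = sqrt(var)                                      # standard deviation

print(mu, var, sigma)   # ≈ 1.667, 0.556, 0.745
```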

5. A Shortcut That Saves Real Time

Computing (x - μ) for every value gets tedious, especially when μ is a fraction. There's a much faster way. Here's the algebra trick:

Start with the definition and expand:

σ² = E[(X - μ)²]

   = E[X² - 2μX + μ²]

   = E(X²) - 2μE(X) + μ²

   = E(X²) - 2μ² + μ²

   = E(X²) - μ²

The Shortcut (memorize this!)

σ² = E(X²) - [E(X)]²

"The mean of the square minus the square of the mean."

Verify with our loaded die example:

E(X²) = 1²(3/6) + 2²(2/6) + 3²(1/6) = (3 + 8 + 9)/6 = 20/6

σ² = 20/6 - (5/3)² = 20/6 - 25/9 = (60 - 50)/18 = 10/18 = 5/9

Same answer, but we never had to compute (x - μ) for each row!
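The agreement between the two formulas is easy to check by computing both; a quick sketch on the loaded die:

```python
# Shortcut vs. definition on the loaded die: both give σ² = 5/9
pmf = {1: 3/6, 2: 2/6, 3: 1/6}

mu = sum(x * p for x, p in pmf.items())
EX2 = sum(x**2 * p for x, p in pmf.items())                      # E(X²) = 20/6
var_shortcut = EX2 - mu**2                                       # E(X²) - μ²
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(var_shortcut, var_definition)   # ≈ 0.556 both ways
```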

6. Seeing Standard Deviation in Action

Two distributions, both centered at 0, but with different spreads:

X: values {-1, 0, 1}

Each equally likely (probability 1/3)

μ = 0

σ² = 2/3

σ = √(2/3) ≈ 0.816

Y: values {-2, 0, 2}

Each equally likely (probability 1/3)

μ = 0

σ² = 8/3

σ = 2√(2/3) ≈ 1.633

Y = 2X — every value is doubled. The standard deviation also doubles: σ_Y = 2σ_X. The variance quadruples: σ_Y² = 4σ_X².

General rule: Y = aX + b

μ_Y = aμ_X + b   (shifting and scaling moves the center)

σ_Y² = a²σ_X²   (only scaling affects spread — adding b does nothing!)

σ_Y = |a|σ_X

Why doesn't adding a constant change variance?

Think about it: if you add 100 to every outcome, the mean also shifts by 100. Every outcome is still the same distance from the new mean. Nothing about the spread changed — you just slid everything along the number line.
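A short check of the rule on the X = {-1, 0, 1} distribution; the choices a = 2 and b = 100 are arbitrary illustrative values:

```python
# Y = aX + b: check that Var(Y) = a²·Var(X) and that b has no effect on spread
pmf_x = {-1: 1/3, 0: 1/3, 1: 1/3}
a, b = 2, 100   # arbitrary scale and shift

def mean_var(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return mu, sum((x - mu) ** 2 * p for x, p in pmf.items())

mu_x, var_x = mean_var(pmf_x)
pmf_y = {a * x + b: p for x, p in pmf_x.items()}   # distribution of Y = aX + b
mu_y, var_y = mean_var(pmf_y)

print(mu_y, a * mu_x + b)    # both 100: the center shifted
print(var_y, a**2 * var_x)   # both 8/3: adding b did nothing to the spread
```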

7. Classic Example: Fair Six-Sided Die

f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6. For the uniform distribution on {1, ..., m}:

Uniform {1, ..., m} Formulas

μ = (m + 1) / 2

σ² = (m² - 1) / 12

For m = 6 (a fair die):

μ = 7/2 = 3.5

σ² = (36 - 1)/12 = 35/12 ≈ 2.917

σ = √(35/12) ≈ 1.708
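The boxed formulas are easy to sanity-check against a direct computation; a sketch for the fair die (m = 6):

```python
# Direct check of the uniform {1, ..., m} formulas for a fair die (m = 6)
m = 6
mu = sum(x / m for x in range(1, m + 1))                 # each value has probability 1/m
var = sum((x - mu) ** 2 / m for x in range(1, m + 1))

print(mu, (m + 1) / 2)        # 3.5 both ways
print(var, (m**2 - 1) / 12)   # 35/12 ≈ 2.917 both ways
```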

8. Moments: A Family of Summaries

The mean is the "first moment" and variance uses the "second moment." In general, the rth moment about the origin is:

E(Xʳ) = Σ xʳ · f(x)

r = 1 gives E(X) = μ.   r = 2 gives E(X²), which we need for the variance shortcut.

There's also a factorial moment trick that can simplify certain variance calculations:

Second factorial moment

E[X(X - 1)] = E(X²) - E(X)

So: σ² = E[X(X - 1)] + E(X) - μ²

Hypergeometric variance (via factorial moments)

σ² = n(N₁/N)(N₂/N)[(N-n)/(N-1)] = np(1-p)[(N-n)/(N-1)],   where p = N₁/N is the success proportion

The factor (N-n)/(N-1) is called the "finite population correction." It makes variance smaller than what you'd get with replacement, because sampling without replacement reduces variability.
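A sketch comparing the formula against a variance computed straight from the hypergeometric pmf; the values N₁ = 5, N₂ = 15, n = 4 are illustrative, not from the text:

```python
from math import comb

# Hypergeometric: draw n without replacement from N = N1 + N2 items, N1 "successes"
# (N1 = 5, N2 = 15, n = 4 are illustrative values chosen for this sketch)
N1, N2, n = 5, 15, 4
N, p = N1 + N2, N1 / (N1 + N2)

# f(x) = C(N1, x)·C(N2, n-x) / C(N, n) over the feasible range of x
pmf = {x: comb(N1, x) * comb(N2, n - x) / comb(N, n)
       for x in range(max(0, n - N2), min(n, N1) + 1)}

mu = sum(x * pr for x, pr in pmf.items())
var = sum((x - mu) ** 2 * pr for x, pr in pmf.items())
formula = n * p * (1 - p) * (N - n) / (N - 1)

print(mu)             # np = 1.0
print(var, formula)   # both 12/19 ≈ 0.632
```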

9. The Moment-Generating Function (MGF)

What if there were a single function that packaged the entire distribution into one formula, and from which you could extract any moment just by taking derivatives?

That's exactly what the moment-generating function does.

Definition

M(t) = E(e^(tX)) = Σ e^(tx) f(x)

Here's the magic — the derivatives at t = 0 give you the moments:

Plug in t = 0

M(0) = 1

Always equals 1

First derivative

M′(0) = E(X) = μ

Gives the mean

Second derivative

M″(0) = E(X²)

Then: σ² = M″(0) - [M′(0)]²
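Numerical derivatives make this concrete; a sketch that recovers μ and σ² of the loaded die from its MGF via finite differences (the step size h is an arbitrary choice):

```python
from math import exp

# MGF of the loaded die: M(t) = Σ e^(tx) f(x)
pmf = {1: 3/6, 2: 2/6, 3: 1/6}

def M(t):
    return sum(exp(t * x) * p for x, p in pmf.items())

h = 1e-4                                   # small step for numerical differentiation
M1 = (M(h) - M(-h)) / (2 * h)              # central difference ≈ M'(0) = μ
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2      # second difference ≈ M''(0) = E(X²)

print(M(0))           # 1.0 (an MGF always equals 1 at t = 0)
print(M1)             # ≈ 5/3, the mean
print(M2 - M1**2)     # ≈ 5/9, the variance
```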

The Uniqueness Property (Exam Important!)

If two random variables have the same MGF, they must have the same distribution. The MGF is like a fingerprint — it uniquely identifies the distribution. This will be very useful when we prove things about sums of random variables later.

Example: Reading a PMF from an MGF

M(t) = (3/6)e^t + (2/6)e^(2t) + (1/6)e^(3t)

The coefficient of e^(xt) is f(x). So: f(1) = 3/6, f(2) = 2/6, f(3) = 1/6. It's our loaded die distribution, encoded in one formula!

Geometric distribution MGF and variance

f(x) = q^(x-1) p,   x = 1, 2, 3, ...

M(t) = pe^t / (1 - qe^t),   valid for qe^t < 1

Differentiating gives: μ = 1/p,   σ² = q/p²

For example, if p = 1/6 (rolling a 6): μ = 6 trials on average, σ² = (5/6)/(1/36) = 30, σ ≈ 5.48.
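A sketch verifying μ = 1/p and σ² = q/p² for p = 1/6 by summing the series out to where the tail is negligible:

```python
# Geometric pmf f(x) = q^(x-1) p; sum the series up to x = 2000
# (the tail beyond that is on the order of q^2000, utterly negligible for p = 1/6)
p = 1/6
q = 1 - p

xs = range(1, 2001)
mu = sum(x * q ** (x - 1) * p for x in xs)
EX2 = sum(x**2 * q ** (x - 1) * p for x in xs)
var = EX2 - mu**2                               # shortcut: E(X²) - μ²

print(mu)    # ≈ 6.0: on average six rolls to see a 6
print(var)   # ≈ 30.0, so σ ≈ 5.48
```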

Quick Reference: Key Formulas

Variance (definition)

σ² = Σ(x - μ)² f(x)

Variance (shortcut)

σ² = E(X²) - μ²

Linear transform

Var(aX + b) = a²Var(X)

Uniform {1,...,m}

σ² = (m² - 1)/12

Geometric variance

σ² = q/p²

Hypergeometric variance

σ² = np(1-p)(N-n)/(N-1)

MGF

M(t) = E(e^(tX))

Moments from MGF

M⁽ʳ⁾(0) = E(Xʳ)