12 Estimating the population mean
In this chapter, we’ll explore three fundamental results about statistical estimators that form the backbone of statistical inference. We’ll prove that the sample mean is an unbiased estimator of the population mean, demonstrate that it has the smallest variance among all unbiased estimators formed as weighted averages of the observations, and examine why the sample variance requires a correction factor. These proofs are not merely mathematical exercises—they reveal deep truths about how we can reliably learn about populations from samples.
By the end of this chapter, you will understand:
- What makes an estimator “unbiased” and why this matters
- How to compare estimators using the criterion of efficiency
- Why the sample variance formula uses \(n-1\) instead of \(n\)
- The relationship between sample statistics and population parameters
12.1 The Unbiasedness of the Sample Mean
Let’s begin with a fundamental question that underlies all of statistical inference.
Understanding Unbiasedness
An estimator is unbiased if its expected value equals the parameter it’s trying to estimate. For the sample mean \(\bar{Y}\) estimating the population mean \(\mu\), we want to show:
\[ E[\bar{Y}] = \mu \]
This is a powerful property because it guarantees that our estimator has no systematic tendency to overestimate or underestimate the true parameter. Some samples will give us values above \(\mu\), others below, but on average—across infinitely many samples—we hit the target exactly.
The Proof
The proof is remarkably elegant. Let’s work through it step by step.
We start with the definition of the sample mean:
\[ \bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \]
Taking the expected value of both sides:
\[ E[\bar{Y}] = E\left[\frac{Y_1 + Y_2 + \cdots + Y_n}{n}\right] \]
Since \(n\) is a constant (our fixed sample size), we can factor it out using the linearity of expectation:
\[ E[\bar{Y}] = \frac{1}{n} E[Y_1 + Y_2 + \cdots + Y_n] \]
Using the linearity property again, the expectation of a sum equals the sum of expectations:
\[ E[\bar{Y}] = \frac{1}{n} \left(E[Y_1] + E[Y_2] + \cdots + E[Y_n]\right) \]
Now comes the key insight. Each observation \(Y_i\) is drawn from the same population, so each has the same expected value—the population mean \(\mu\). Think about it this way: if you could observe infinitely many “first observations” from different samples, their distribution would look exactly like the population distribution, and their mean would be \(\mu\). The same holds for the second observation, the third, and all others.
Therefore:
\[ E[\bar{Y}] = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{1}{n}(n\mu) = \mu \]
The proof is complete. ∎
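Before moving on, it can help to see the result empirically. The following is a minimal simulation sketch (assuming, purely for illustration, a normal population with hypothetical parameters \(\mu = 10\), \(\sigma = 3\), and sample size \(n = 25\)): it draws many samples, computes each sample mean, and averages them. The grand average should land very close to \(\mu\).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 3.0, 25      # hypothetical population parameters and sample size
reps = 100_000                    # number of simulated samples

# Each row is one sample of size n; averaging across columns gives each sample mean
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(sample_means.mean())        # close to mu = 10, illustrating E[Y-bar] = mu
```

Individual sample means wander above and below \(\mu\), but their average across many samples shows no systematic drift.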
12.2 The Efficiency of the Sample Mean
Proving unbiasedness is just the first step. After all, there are infinitely many unbiased estimators of the population mean. We need another criterion to choose among them.
The Problem of Too Many Estimators
Consider this: the first observation \(Y_1\) by itself is an unbiased estimator of \(\mu\) since \(E[Y_1] = \mu\). So is the second observation. So is any weighted average of your observations, as long as the weights sum to one. For example:
\[ T = \frac{1}{4}Y_1 + \frac{3}{4}Y_2 \]
This is unbiased (you can verify that \(E[T] = \mu\), as the sketch below also checks numerically). But whenever \(n > 2\), it completely ignores observations 3 through \(n\). Intuitively, this seems wasteful. We need a criterion for choosing among these infinitely many unbiased estimators.
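Here is that quick numerical check, reusing the hypothetical normal population from the earlier sketch: \(T\) averages out to \(\mu\) across many simulated samples even though it uses only two observations.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 10.0, 3.0, 25
reps = 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
T = 0.25 * samples[:, 0] + 0.75 * samples[:, 1]   # uses only the first two observations

print(T.mean())   # close to mu = 10: unbiased, despite ignoring Y_3 through Y_n
```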
The Efficiency Criterion
We use efficiency as our second filter. An efficient estimator is one that has the minimum variance among unbiased estimators; in this section we compare the sample mean against every unbiased estimator formed as a weighted average of the observations. Why variance? Because variance measures precision—how much our estimates vary from sample to sample. Lower variance means more consistent, reliable estimates.
Setting Up the Proof
Let’s write a candidate estimator as a weighted combination of our observations:
\[ T = \sum_{i=1}^{n} A_i Y_i \]
where the \(A_i\) are arbitrary weights (constants). Since we’re restricting attention to unbiased estimators, we require:
\[ E[T] = \mu \]
Let’s see what this constraint implies. Taking expectations:
\[ E[T] = E\left[\sum_{i=1}^{n} A_i Y_i\right] = \sum_{i=1}^{n} A_i E[Y_i] = \sum_{i=1}^{n} A_i \mu = \mu \sum_{i=1}^{n} A_i \]
For this to equal \(\mu\), we need:
\[ \sum_{i=1}^{n} A_i = 1 \]
This is our first constraint: the weights must sum to one. The sample mean satisfies this with \(A_i = 1/n\) for all \(i\).
Finding the Variance
Now let’s calculate the variance of our general estimator \(T\):
\[ \text{Var}(T) = \text{Var}\left(\sum_{i=1}^{n} A_i Y_i\right) \]
Since the \(A_i\) are constants and the observations are independent:
\[ \text{Var}(T) = \sum_{i=1}^{n} A_i^2 \text{Var}(Y_i) = \sum_{i=1}^{n} A_i^2 \sigma^2 = \sigma^2 \sum_{i=1}^{n} A_i^2 \]
where \(\sigma^2\) is the population variance (assumed to be the same for all observations, since they’re all drawn from the same population).
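This variance formula is easy to sanity-check by simulation. The sketch below (an illustration only, again assuming a hypothetical normal population and an arbitrary weight vector that sums to one) compares the simulated variance of \(T\) with \(\sigma^2 \sum A_i^2\).

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 10.0, 3.0, 5
reps = 200_000

A = np.array([0.40, 0.30, 0.15, 0.10, 0.05])   # hypothetical weights summing to 1
samples = rng.normal(mu, sigma, size=(reps, n))
T = samples @ A                                # T = sum_i A_i * Y_i for each sample

print(T.var())                                 # simulated Var(T)
print(sigma**2 * np.sum(A**2))                 # theoretical sigma^2 * sum A_i^2
```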
The Key Inequality
Now we employ a clever algebraic trick. Notice that:
\[ \sum_{i=1}^{n} A_i^2 = \sum_{i=1}^{n} \left(A_i - \frac{1}{n}\right)^2 + \frac{1}{n} \]
This might seem to come out of nowhere, but let’s verify it by expanding the squared term:
\[\begin{align} \sum_{i=1}^{n} \left(A_i - \frac{1}{n}\right)^2 &= \sum_{i=1}^{n} \left(A_i^2 - 2A_i \cdot \frac{1}{n} + \frac{1}{n^2}\right)\\ &= \sum_{i=1}^{n} A_i^2 - \frac{2}{n}\sum_{i=1}^{n} A_i + \sum_{i=1}^{n}\frac{1}{n^2}\\ &= \sum_{i=1}^{n} A_i^2 - \frac{2}{n}(1) + \frac{n}{n^2}\\ &= \sum_{i=1}^{n} A_i^2 - \frac{2}{n} + \frac{1}{n}\\ &= \sum_{i=1}^{n} A_i^2 - \frac{1}{n} \end{align}\]
where we used the constraint that \(\sum A_i = 1\). Rearranging gives us the identity we claimed.
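Because the identity holds for any weights that sum to one, you can also confirm it numerically with an arbitrary weight vector; the short sketch below normalizes a random (hypothetical) vector so its entries sum to one and compares the two sides.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
A = rng.random(n)
A = A / A.sum()                          # arbitrary weights constrained to sum to 1

lhs = np.sum(A**2)
rhs = np.sum((A - 1/n)**2) + 1/n
print(lhs, rhs)                          # the two sides agree
```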
Completing the Proof
Since squares are always non-negative, we have:
\[ \sum_{i=1}^{n} \left(A_i - \frac{1}{n}\right)^2 \geq 0 \]
with equality if and only if \(A_i = 1/n\) for all \(i\). Therefore:
\[ \sum_{i=1}^{n} A_i^2 \geq \frac{1}{n} \]
Multiplying both sides by \(\sigma^2\):
\[ \text{Var}(T) = \sigma^2 \sum_{i=1}^{n} A_i^2 \geq \frac{\sigma^2}{n} = \text{Var}(\bar{Y}) \]
The final equality holds because the sample mean is the special case \(A_i = 1/n\), so \(\text{Var}(\bar{Y}) = \sigma^2 \sum_{i=1}^{n} \frac{1}{n^2} = \sigma^2 \cdot \frac{n}{n^2} = \frac{\sigma^2}{n}\).
The minimum variance is achieved when \(A_i = 1/n\) for all \(i\)—which is precisely the sample mean! Among unbiased weighted averages of the observations, none is more precise than \(\bar{Y}\). ∎
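To see the inequality at work, the sketch below (an illustration only, using random non-negative weight vectors normalized to sum to one and hypothetical values \(\sigma = 3\), \(n = 5\)) evaluates \(\text{Var}(T) = \sigma^2 \sum A_i^2\) for many weight choices; none falls below \(\sigma^2/n\).

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n = 3.0, 5
reps = 10_000

# Many random non-negative weight vectors, each normalized to sum to 1
A = rng.random((reps, n))
A = A / A.sum(axis=1, keepdims=True)

var_T = sigma**2 * (A**2).sum(axis=1)    # Var(T) = sigma^2 * sum_i A_i^2
print(var_T.min())                       # never drops below sigma^2 / n
print(sigma**2 / n)                      # = 1.8 with these hypothetical values
```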
12.3 The Bias of the Sample Variance
Having established the virtues of the sample mean, we now turn to a more subtle problem: estimating the population variance \(\sigma^2\).
Defining the Sample Variance
Let’s define what we’ll call the “uncorrected” sample variance:
\[ S^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \]
This is the natural definition—it’s the average squared deviation from the sample mean. But is it unbiased? To answer this, we need to calculate \(E[S^2]\) and see if it equals \(\sigma^2\).
A Clever Algebraic Manipulation
The key to this proof is recognizing that we can rewrite each deviation \((Y_i - \bar{Y})\) in terms of deviations from the true population mean \(\mu\):
\[ Y_i - \bar{Y} = (Y_i - \mu) - (\bar{Y} - \mu) \]
This is just adding and subtracting \(\mu\). Now let’s square both sides:
\[ (Y_i - \bar{Y})^2 = [(Y_i - \mu) - (\bar{Y} - \mu)]^2 \]
Expanding the square:
\[ (Y_i - \bar{Y})^2 = (Y_i - \mu)^2 - 2(Y_i - \mu)(\bar{Y} - \mu) + (\bar{Y} - \mu)^2 \]
Summing Over All Observations
Now sum both sides from \(i=1\) to \(n\):
\[ \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \mu)^2 - 2(\bar{Y} - \mu)\sum_{i=1}^{n}(Y_i - \mu) + \sum_{i=1}^{n}(\bar{Y} - \mu)^2 \]
Let’s examine each term carefully. The factor \((\bar{Y} - \mu)\) does not depend on \(i\), which is why it could be pulled outside the sum in the middle term. For that middle term, notice that:
\[ \sum_{i=1}^{n}(Y_i - \mu) = \sum_{i=1}^{n}Y_i - n\mu = n\bar{Y} - n\mu = n(\bar{Y} - \mu) \]
So the middle term becomes:
\[ -2(\bar{Y} - \mu) \cdot n(\bar{Y} - \mu) = -2n(\bar{Y} - \mu)^2 \]
For the last term, \((\bar{Y} - \mu)^2\) doesn’t depend on \(i\), so:
\[ \sum_{i=1}^{n}(\bar{Y} - \mu)^2 = n(\bar{Y} - \mu)^2 \]
Putting it all together:
\[ \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \mu)^2 - 2n(\bar{Y} - \mu)^2 + n(\bar{Y} - \mu)^2 \]
\[ = \sum_{i=1}^{n}(Y_i - \mu)^2 - n(\bar{Y} - \mu)^2 \]
Therefore, dividing both sides by \(n\):
\[ S^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu)^2 - (\bar{Y} - \mu)^2 \]
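This decomposition is an exact algebraic identity for any data set, so it can be verified on a single simulated sample. The sketch below does exactly that (assuming a hypothetical normal population; \(\mu\) is treated as known here only so the right-hand side can be computed).

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 10.0, 3.0, 25          # mu is known here only for the check
y = rng.normal(mu, sigma, size=n)
ybar = y.mean()

lhs = np.mean((y - ybar)**2)                      # uncorrected S^2
rhs = np.mean((y - mu)**2) - (ybar - mu)**2       # the decomposition
print(lhs, rhs)                                   # identical up to rounding
```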
Taking Expectations
Now we take the expected value of both sides:
\[ E[S^2] = E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu)^2\right] - E[(\bar{Y} - \mu)^2] \]
Let’s evaluate each term. For the first term:
\[ E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n}E[(Y_i - \mu)^2] \]
But \(E[(Y_i - \mu)^2]\) is precisely the definition of the population variance \(\sigma^2\) (the expected squared deviation from the population mean). So:
\[ \frac{1}{n}\sum_{i=1}^{n}E[(Y_i - \mu)^2] = \frac{1}{n} \cdot n\sigma^2 = \sigma^2 \]
For the second term, \(E[(\bar{Y} - \mu)^2]\) is the expected squared deviation of the sample mean from the population mean—which is exactly the variance of the sample mean:
\[ E[(\bar{Y} - \mu)^2] = \text{Var}(\bar{Y}) = \frac{\sigma^2}{n} \]
We proved this earlier when showing that the sample mean is efficient.
The Final Result
Combining these results:
\[ E[S^2] = \sigma^2 - \frac{\sigma^2}{n} = \sigma^2\left(1 - \frac{1}{n}\right) = \sigma^2 \cdot \frac{n-1}{n} \]
This is the crucial finding: the expected value of \(S^2\) is not \(\sigma^2\), but rather \(\sigma^2 \cdot \frac{n-1}{n}\). Since \((n-1)/n < 1\) for all \(n > 1\), the uncorrected sample variance systematically underestimates the true population variance. It is biased downward.
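A simulation makes the downward bias concrete. The sketch below (assuming the same hypothetical normal population, so \(\sigma^2 = 9\), with a small sample size \(n = 5\)) averages the uncorrected \(S^2\) over many samples; the average settles near \(\sigma^2 \cdot \frac{n-1}{n}\), not \(\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 10.0, 3.0, 5
reps = 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
S2 = samples.var(axis=1, ddof=0)      # uncorrected sample variance: divides by n

print(S2.mean())                      # close to sigma^2 * (n-1)/n = 7.2
print(sigma**2 * (n - 1) / n)         # the biased target, not sigma^2 = 9
```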
12.4 A Challenge for You
Now that we’ve proven that the sample variance with \(n\) in the denominator is biased, here’s a thought question: given that \(E[S^2] = \sigma^2 \cdot \frac{n-1}{n}\), how would you rescale \(S^2\) to produce an unbiased estimator of \(\sigma^2\)? Write down your corrected estimator, take its expectation, and check that it equals \(\sigma^2\) exactly. The answer explains the \(n-1\) that appears in the usual sample variance formula.
12.5 Synthesis and Reflection
Let’s step back and consider what these three proofs reveal about the nature of statistical estimation.
The sample mean emerged as the gold standard estimator not by accident, but because it possesses two fundamental virtues: it’s unbiased (correct on average) and efficient (the most precise unbiased weighted average of the observations). These aren’t just mathematical curiosities—they have practical implications. When you calculate a sample mean, you can trust that no other unbiased weighted average of your data would have served you better.
The sample variance case is more subtle. The natural estimator \(S^2\) turns out to be biased, but the bias is systematic and predictable, allowing us to correct it. This illustrates an important principle: not all biases are equal. A systematic, known bias that we can correct is far less problematic than an unknown or random error.
Moreover, these proofs showcase the power of mathematical statistics. We’re not guessing or using intuition—we’re proving with logical certainty that our estimators have desirable properties. This rigor is what allows statistics to be a reliable tool for scientific inference.
These results form the foundation for much of what follows in statistical inference. Understanding why they’re true—not just memorizing the formulas—will serve you well as we build toward more sophisticated techniques.