4 Expectation and Variance Operators
Statistical operators are powerful tools that act on random variables in systematic ways, distilling them into meaningful summary quantities. In this chapter, we’ll explore two fundamental operators: the expectation operator and the variance operator. These operators will appear throughout the rest of this book, so understanding their properties deeply will pay dividends as we tackle more complex statistical concepts.
By the end of this chapter, you will be able to:
- Define what an operator is in the statistical context
- Calculate and interpret expected values
- Calculate and interpret variances
- Apply the properties of expectation and variance to simplify complex expressions
- Understand how these operators behave under linear transformations
4.1 What is an Operator?
Think of an operator as a special kind of function that acts on random variables rather than on simple numbers. Just as the square root function takes a number and returns another number, statistical operators take random variables and return quantities that summarize key features of those variables.
4.2 The Expectation Operator
Intuition and Definition
Intuitively, a random variable’s expected value represents the average we would see if we observed many independent realizations of that variable. For example, if we roll a fair six-sided die thousands of times and compute the average of all the outcomes, that average will converge to \((1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5\). This value—3.5—is the expected value of the die roll.
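We can watch this convergence happen with a quick simulation (a minimal sketch using NumPy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate many rolls of a fair six-sided die (integers 1 through 6).
rolls = rng.integers(low=1, high=7, size=100_000)

# The sample mean converges to the expected value, 3.5.
print(rolls.mean())  # close to 3.5
```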
More generally, for a discrete random variable \(X\) that takes the values \(x_1, x_2, \ldots, x_n\) with probabilities \(p_1, p_2, \ldots, p_n\), we can write this as:
\[ \mathbb{E}[X] = \sum_{i=1}^n p_i x_i = \mu \]
where we often use the Greek letter \(\mu\) (mu) to denote the expected value.
For continuous random variables, the sum becomes an integral:
\[ \mathbb{E}[X] = \int_{\mathbb{R}} x f(x) \, dx \]
where \(f(x)\) is the probability density function of \(X\).
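As a concrete check, we can evaluate this integral numerically (a sketch using SciPy’s `quad`; the normal density is just one convenient choice of \(f\)):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# E[X] = integral of x * f(x) dx over the real line,
# here with f the density of a normal distribution centered at 1.5.
mean, abs_err = quad(lambda x: x * norm.pdf(x, loc=1.5), -np.inf, np.inf)
print(mean)  # ~1.5
```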
Properties of the Expectation Operator
The expectation operator has several important properties that make it remarkably useful for statistical analysis. These properties allow us to simplify complex calculations and derive important results.
Property 1: Non-negativity
If \(X\) is a random variable such that \(\mathrm{P}(X \geq 0) = 1\) (that is, \(X\) is always non-negative), then \(\mathbb{E}[X] \geq 0\).
This property formalizes an intuitive idea: if a random variable can only take non-negative values, its average must also be non-negative.
Property 2: Expectation of a Constant
If \(X\) is a random variable such that \(\mathrm{P}(X = r) = 1\) for some fixed number \(r\), then \(\mathbb{E}[X] = r\). In other words, the expectation of a constant equals that constant.
This property tells us that constants behave exactly as we’d expect under the expectation operator—their “average” value is simply themselves.
Property 3: Linearity
The expectation operator is linear. Given two random variables \(X\) and \(Y\) and two real constants \(a\) and \(b\):
\[ \mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y] \]
Additional properties that follow from linearity include:
\[ \begin{aligned} \mathbb{E}[kY] &= k\mathbb{E}[Y] \quad \text{(scaling)} \\ \mathbb{E}[X + Y] &= \mathbb{E}[X] + \mathbb{E}[Y] \quad \text{(additivity)} \end{aligned} \]
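Crucially, linearity holds even when \(X\) and \(Y\) are dependent. A quick empirical sanity check (a sketch; the distributions and constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(loc=2.0, scale=1.0, size=1_000_000)
y = x ** 2 + rng.normal(size=1_000_000)  # Y depends strongly on X
a, b = 3.0, -1.5

# E[X] = 2 and E[Y] = E[X^2] = Var(X) + E[X]^2 = 1 + 4 = 5,
# so linearity predicts E[aX + bY] = 3(2) - 1.5(5) = -1.5,
# despite the dependence between X and Y.
print((a * x + b * y).mean())  # ~ -1.5
```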
4.3 The Variance Operator
Intuition and Definition
While the expected value tells us about the center of a distribution, it says nothing about the spread. Consider two random variables: one that always equals 10, and one that equals 0 half the time and 20 half the time. Both have an expected value of 10, but they behave very differently. The variance operator captures this difference.
Variance measures how far the values of a random variable typically lie from their expected value. A small variance indicates that values cluster tightly around the mean; a large variance indicates that values are more dispersed.
Formally, the variance is the expected squared deviation from the mean: \(\mathrm{Var}(Y) = \mathbb{E}[(Y - \mu)^2]\), where \(\mu = \mathbb{E}[Y]\). For a discrete random variable \(Y\) that takes the values \(y_1, y_2, \ldots, y_n\) with probabilities \(p_1, p_2, \ldots, p_n\), we can write this explicitly as:
\[ \mathrm{Var}(Y) = \sum_{i=1}^n p_i (y_i - \mu)^2 = \sigma^2 \]
where we often use \(\sigma^2\) (sigma squared) to denote the variance.
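Applying this to the two variables above: the constant variable has variance \(0\), while the second has \(\mathrm{Var} = 0.5(0-10)^2 + 0.5(20-10)^2 = 100\). The fair die works the same way (a minimal sketch computing both operators from the probability table):

```python
import numpy as np

# Outcomes and probabilities for a fair six-sided die.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mu = np.sum(probs * values)               # E[X] = 3.5
var = np.sum(probs * (values - mu) ** 2)  # Var(X) = 35/12, about 2.9167
print(mu, var)
```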
An Alternative Formula
The definition of variance can be algebraically rearranged into a form that’s often more convenient for computation:
\[ \begin{aligned} \mathrm{Var}(X) &= \mathbb{E}[(X - \mathbb{E}[X])^2] \\ &= \mathbb{E}[X^2 - 2X\mathbb{E}[X] + \mathbb{E}[X]^2] \\ &= \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + \mathbb{E}[X]^2 \\ &= \mathbb{E}[X^2] - \mathbb{E}[X]^2 \end{aligned} \]
This gives us the memorable formula:
\[ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \]
In words: the variance equals the expected value of the square minus the square of the expected value. This computational formula is often easier to work with than the definitional formula.
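Because the two formulas are algebraically equivalent, they agree on any distribution, including the empirical distribution of a sample (a sketch; the exponential distribution is an arbitrary choice, with true variance \(4\) here):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)

# Definitional formula: E[(X - E[X])^2]
var_def = np.mean((x - x.mean()) ** 2)

# Computational formula: E[X^2] - (E[X])^2
var_comp = np.mean(x ** 2) - x.mean() ** 2

print(var_def, var_comp)  # both ~4.0 for Exp(scale=2)
```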
Properties of the Variance Operator
Property 1: Non-negativity
Variance is always non-negative: \(\mathrm{Var}(X) \geq 0\). This follows from Property 1 of the expectation operator, since variance is the expectation of the squared deviation \((X - \mathbb{E}[X])^2\), which can never be negative.
Property 2: Variance of a Constant
The variance of a constant is zero: \(\mathrm{Var}(a) = 0\).
This makes intuitive sense: if a variable doesn’t vary (it’s constant), its variance should be zero.
Property 3: Zero Variance Implies Constant
If the variance of a random variable is zero, then the variable must be constant with probability 1: \(\mathrm{Var}(X) = 0 \Rightarrow \mathrm{P}(X = a) = 1\) for some constant \(a\).
Together, Properties 2 and 3 tell us that constants are precisely the random variables with zero variance—and vice versa.
Property 4: Variance of a Sum
The variance of a sum of two random variables is:
\[ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X,Y) \]
where \(\mathrm{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\) is the covariance between \(X\) and \(Y\).
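When \(X\) and \(Y\) are independent, the covariance term vanishes and the variances simply add. A quick check with correlated variables shows why the covariance term matters (a sketch; the dependence structure is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)  # Y is correlated with X

# Sample covariance via Cov(X, Y) = E[XY] - E[X]E[Y].
cov_xy = np.mean(x * y) - x.mean() * y.mean()

# Var(X + Y) matches Var(X) + Var(Y) + 2 Cov(X, Y),
# not Var(X) + Var(Y) alone.
print(np.var(x + y))
print(np.var(x) + np.var(y) + 2 * cov_xy)
```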
Property 5: Variance is Invariant to Location Shifts
If a constant is added to all values of a variable, the variance is unchanged:
\[ \mathrm{Var}(X + a) = \mathrm{Var}(X) \]
This property reflects the fact that variance measures spread, not location. Shifting all values by the same amount doesn’t change how spread out they are.
Property 6: Variance Under Scaling
If all values are scaled by a constant, the variance is scaled by the square of that constant:
\[ \mathrm{Var}(aX) = a^2 \mathrm{Var}(X) \]
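Properties 5 and 6 combine to give \(\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)\): the shift \(b\) drops out entirely, and the scale \(a\) enters squared, so its sign disappears. A brief check (a sketch with arbitrary constants):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # Var(X) ~ 4
a, b = -3.0, 100.0

# The shift b has no effect; the scale a enters squared.
print(np.var(a * x + b))
print(a ** 2 * np.var(x))  # both ~36
```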
Property 7: Variance of a Sum of Independent Identically Distributed Variables
If \(Y_1, Y_2, \ldots, Y_n\) are independent and identically distributed random variables, then:
\[ \mathrm{Var}\left(\sum_{i=1}^n Y_i\right) = \sum_{i=1}^n \mathrm{Var}(Y_i) = n\mathrm{Var}(Y) \]
where the last equality uses the fact that all the \(Y_i\) have the same variance.
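To see this empirically, we can sum \(n\) independent copies and compare against \(n\,\mathrm{Var}(Y)\) (a sketch; \(n\) and the uniform distribution, whose variance is \(1/12\), are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 10, 500_000

# Each row is one realization of n iid uniform(0, 1) variables.
samples = rng.uniform(size=(trials, n))
row_sums = samples.sum(axis=1)

# Var of the sum ~ n * Var(Y) = 10 * (1/12)
print(np.var(row_sums))
print(n * (1 / 12))
```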
This property is fundamental to understanding sampling distributions and the behavior of sample means.
4.4 Putting It All Together
Let’s work through a comprehensive example that uses both operators and their properties.
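One example that draws on nearly every property above is the sample mean. Suppose \(Y_1, Y_2, \ldots, Y_n\) are independent and identically distributed with \(\mathbb{E}[Y_i] = \mu\) and \(\mathrm{Var}(Y_i) = \sigma^2\), and let \(\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\). Linearity of expectation, the scaling property of variance (Property 6), and Property 7 give:
\[ \begin{aligned} \mathbb{E}[\bar{Y}] &= \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Y_i] = \frac{n\mu}{n} = \mu \\ \mathrm{Var}(\bar{Y}) &= \frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^n Y_i\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \end{aligned} \]
The sample mean is centered at \(\mu\), and its spread shrinks as \(n\) grows: this is precisely why averaging more observations yields a more precise estimate.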
4.5 Summary
The expectation and variance operators are fundamental tools in probability and statistics. The expectation operator \(\mathbb{E}[\cdot]\) captures the center or average of a distribution, while the variance operator \(\mathrm{Var}(\cdot)\) captures its spread.
Key takeaways:
- Expectation is linear: \(\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\), regardless of dependence
- Variance is not linear: \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\) only when \(\mathrm{Cov}(X, Y) = 0\), which holds in particular when \(X\) and \(Y\) are independent
- Adding constants doesn’t change variance: \(\mathrm{Var}(X + a) = \mathrm{Var}(X)\)
- Scaling affects variance quadratically: \(\mathrm{Var}(aX) = a^2\mathrm{Var}(X)\)
These operators and their properties will appear repeatedly throughout your study of statistics. Mastering them now will make everything that follows much more intuitive.