Sampling and estimation
In this chapter: Sample mean and variance · Central Limit Theorem · Confidence intervals (known and unknown σ)
You rarely have the full population, so you sample. Sample statistics estimate population parameters. The Central Limit Theorem tells us that, for large enough n, the sample mean is approximately normally distributed whatever the shape of the population distribution (provided the population variance is finite). This is the foundation of hypothesis testing, confidence intervals, and statistical inference. Master this reading and the next two on hypothesis testing become natural extensions.
Population parameters (true values, often unknown):
- μ: population mean
- σ²: population variance
- σ: population SD

Sample statistics (computed from data):
- X̄: sample mean = Σx_i / n
- s²: sample variance = Σ(x_i − X̄)² / (n − 1)
- s: sample SD = √s²

Note: sample variance uses (n − 1) in the denominator (a degrees-of-freedom adjustment) so that s² is an unbiased estimator of σ². Population variance divides by the full population size.

Central Limit Theorem (CLT): the distribution of the sample mean X̄ approaches normal as n grows, regardless of the underlying population distribution (provided the population variance is finite). Its mean is μ and its SD, called the Standard Error, is σ/√n. This is why we can do statistical inference even when the underlying data is non-normal.
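The formulas and the CLT claim above can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, SD 1), the sample size of 50, the 5,000 trials, and the helper names `sample_stats` and `simulate_sample_means` are all assumptions chosen for the example, not part of the reading.

```python
import random

def sample_stats(xs):
    """Return (sample mean, sample variance with n - 1, sample SD)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var, var ** 0.5

def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from a skewed (exponential) population
    with mean 1 and SD 1, and return the list of sample means."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

rng = random.Random(42)          # fixed seed so the run is repeatable
means = simulate_sample_means(n=50, trials=5000, rng=rng)

mean_of_means, _, sd_of_means = sample_stats(means)
theoretical_se = 1.0 / 50 ** 0.5  # sigma / sqrt(n), with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```

Even though the population here is strongly skewed, the sample means should cluster around μ with a spread close to σ/√n; a histogram of `means` would look roughly bell-shaped.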
Standard Error (SE), the SD of the sample mean:
- SE = σ / √n (when σ is known)
- SE = s / √n (when σ is unknown, using sample SD)

Critical insight: SE shrinks in proportion to 1/√n, not 1/n. Doubling the sample size divides SE by √2 ≈ 1.41, so returns diminish. To halve uncertainty, you need 4× the sample.

Confidence intervals:
- For known σ, large n (use Z): CI = X̄ ± Z_α/2 × (σ / √n)
- For unknown σ (use sample s and the t-distribution): CI = X̄ ± t_α/2,n−1 × (s / √n)

Common confidence levels:
- 90% CI: Z = 1.645 (often rounded to 1.65)
- 95% CI: Z = 1.96
- 99% CI: Z = 2.58

As n grows, the t-distribution converges to the Z-distribution. For n > 30 the difference is small (at 95%, t is about 2.04 with 30 degrees of freedom versus Z = 1.96), so the two are often treated as interchangeable.
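As a worked illustration of these formulas, here is a minimal Python sketch using only the standard library. The function names and the input numbers (mean 10, SD 20, n = 25) are hypothetical. Because the standard library has no t-distribution, the t critical value is passed in by hand; 2.064 is the tabulated 95% two-sided value for 24 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sd, n):
    """SE of the sample mean: sd / sqrt(n)."""
    return sd / sqrt(n)

def z_critical(confidence):
    """Two-sided Z critical value, e.g. about 1.96 for confidence = 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(mean, sd, n, critical):
    """Return (lower, upper) = mean -/+ critical * SE."""
    half_width = critical * standard_error(sd, n)
    return mean - half_width, mean + half_width

# Hypothetical sample: mean 10, SD 20, n = 25, so SE = 20 / 5 = 4.
z_ci = confidence_interval(10.0, 20.0, 25, z_critical(0.95))
t_ci = confidence_interval(10.0, 20.0, 25, 2.064)  # t table value, df = 24

print(f"Z-based 95% CI: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t-based 95% CI: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")

# Quadrupling n halves the SE.
print(standard_error(20.0, 100) / standard_error(20.0, 400))
```

The t-based interval comes out slightly wider than the Z-based one, reflecting the extra uncertainty from estimating σ with s on a small sample.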
Critical practitioner insight: confidence interval interpretation. A 95% CI of [10%, 14%] for a fund's return DOES NOT mean "95% probability the true mean is between 10% and 14%". It means "95% of CIs constructed this way would contain the true mean if we repeated the experiment many times".

Why this matters: if you sample once and get a CI, the true mean is either in it or not; in the frequentist framework there is no probability attached to it after the fact. The 95% refers to the procedure, not the specific interval.

Finance applications:
- Estimating fund manager alpha: the CI tells us whether observed alpha could be statistical noise.
- Beta estimation: the regression coefficient comes with a CI; a CI containing zero means we can't reject "no market exposure".
- Sharpe ratio confidence: Sharpe estimates have CIs; selecting "top quartile" managers using point estimates ignores this.

A fund with 1.5% reported alpha but a CI of [−1%, +4%] could easily have a true alpha of zero. Marketing people don't show CIs. Sophisticated allocators demand them.
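The "procedure, not interval" reading can be made concrete by repeating the experiment in simulation. This is a sketch under stated assumptions: a normal population with a known σ, and made-up parameters (true mean 12, σ = 18, n = 30, 4,000 repetitions). The function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_mean, sigma, n, confidence, trials, rng):
    """Fraction of known-sigma Z intervals that contain the true mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # One "experiment": draw a sample and build its CI.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

rng = random.Random(7)  # fixed seed so the run is repeatable
rate = coverage(true_mean=12.0, sigma=18.0, n=30,
                confidence=0.95, trials=4000, rng=rng)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

Each individual interval either contains 12 or it does not; it is the long-run share of intervals that lands near 95%.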
- Using σ when you should use s (sample SD) — affects CI width.
- Forgetting that CI interpretation is about the procedure, not the specific interval.
- Comparing sample means without computing CIs — small differences may be noise.
- Confusing standard deviation (spread of observations) with standard error (spread of mean estimate).
- Reporting fund alpha as "skill" without a statistical significance test.
Frequently asked
When should I use t-distribution vs Z-distribution?
Why does sample variance use (n-1) instead of n?
How big does my sample need to be for CLT to apply?
Practice questions
Q 1. A sample of 100 fund returns has mean 12%, SD 18%. The standard error of the mean is closest to:
- (a) 0.18%
- (b) 1.8%
- (c) 12%
- (d) 18%
Q 2. Doubling sample size from 100 to 200 changes the standard error by:
- (a) Halves it
- (b) Reduces it by a factor of √2 ≈ 1.41
- (c) Doubles it
- (d) No change
Q 3. A 95% confidence interval for the population mean (using Z = 1.96):
- (a) Means 95% probability the true mean is in this interval
- (b) Means 95% of similarly-constructed intervals would contain the true mean
- (c) Cannot be calculated without knowing σ
- (d) Always equals the sample mean
Q 4. Sample mean 8%, sample SD 5%, n = 25. The 95% CI (assume normal, t ≈ Z) is closest to:
- (a) (6%, 10%)
- (b) (5.6%, 10.4%)
- (c) (7%, 9%)
- (d) (0%, 16%)
Q 5. The Central Limit Theorem states:
- (a) Sample mean approaches population mean for any sample size
- (b) Sample mean distribution approaches normal for large n, regardless of underlying distribution
- (c) All distributions are normal for large samples
- (d) Variance is always equal to mean