Reading 5 · CFA L1 Quant · Full chapter

Sampling and estimation

In this chapter: Sample mean and variance · Central Limit Theorem · Confidence intervals (known and unknown σ)

~6 min read · Layer 4 · Professional Certifications · Free

You rarely have the full population, so you sample. Sample statistics estimate population parameters. The Central Limit Theorem tells us that, for large enough n, sample means are approximately normally distributed regardless of the population distribution (provided the population has finite variance). This is the foundation of hypothesis testing, confidence intervals, and statistical inference. Master this reading and the next two on hypothesis testing become natural extensions.

Foundation

Population parameters (true values, often unknown):
  • μ: population mean
  • σ²: population variance
  • σ: population SD
Sample statistics (computed from data):
  • X̄: sample mean = Σx_i / n
  • s²: sample variance = Σ(x_i − X̄)² / (n − 1)
  • s: sample SD = √(s²)
Note: sample variance uses (n − 1) in the denominator (a degrees-of-freedom adjustment) so that it is an unbiased estimate of σ². Population variance uses n.
Central Limit Theorem (CLT): the distribution of the sample mean X̄ approaches normal as n grows, regardless of the underlying population distribution (as long as its variance is finite). That distribution has mean μ and SD σ/√n, which is called the standard error. This is why we can do statistical inference even when the underlying data is non-normal.
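The definitions above can be checked with a minimal sketch using Python's standard library. The return figures are hypothetical and chosen only for illustration; `statistics.variance` divides by (n − 1) and `statistics.pvariance` divides by n.

```python
import math
import statistics

# Hypothetical annual returns in percent, for illustration only.
returns = [12.0, 8.0, -3.0, 15.0, 10.0]

n = len(returns)
sample_mean = statistics.mean(returns)          # X̄ = Σx_i / n
sample_var = statistics.variance(returns)       # s², divides by (n − 1)
population_var = statistics.pvariance(returns)  # divides by n
sample_sd = statistics.stdev(returns)           # s = √(s²)
standard_error = sample_sd / math.sqrt(n)       # s / √n

print(f"mean = {sample_mean:.2f}")
print(f"s^2 (n-1) = {sample_var:.2f}, variance (n) = {population_var:.2f}")
print(f"s = {sample_sd:.2f}, SE = {standard_error:.2f}")
```

With only five observations the gap between the two denominators is large; it narrows as n grows.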

Deep Dive

Standard error (SE) is the SD of the sample mean:
  • SE = σ / √n (when σ is known)
  • SE = s / √n (when σ is unknown, using the sample SD)
Critical insight: SE shrinks in proportion to 1/√n, not 1/n. Doubling the sample size cuts SE by a factor of only √2 ≈ 1.41, so returns diminish. To halve uncertainty, you need 4× the sample.
Confidence intervals:
  • Known σ, large n (use Z): CI = X̄ ± Z_α/2 × (σ / √n)
  • Unknown σ (use sample s and the t-distribution with n − 1 degrees of freedom): CI = X̄ ± t_α/2,n−1 × (s / √n)
Common confidence levels:
  • 90% CI: Z = 1.645 (often rounded to 1.65)
  • 95% CI: Z = 1.96
  • 99% CI: Z = 2.576 (often rounded to 2.58)
As n grows, the t-distribution converges to the Z-distribution. For n > 30 the critical values are close (about 2.04 versus 1.96 at the 95% level with 30 degrees of freedom), so Z is commonly used as an approximation.
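As a rough illustration of the Z-based interval, the sketch below derives the critical value from the standard normal distribution with `statistics.NormalDist` rather than a lookup table. The helper name `z_confidence_interval` and the inputs (mean 12%, SD 18%, n = 100) are assumptions made for the example; a t-based interval would need a t-table or a third-party library and is not shown here.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Return (low, high) for X̄ ± Z_α/2 × (sd / √n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sd / math.sqrt(n)                   # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se


# Critical values for the common confidence levels.
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%} CI: Z = {z:.3f}")

# Hypothetical inputs: mean 12%, SD 18%, n = 100.
low, high = z_confidence_interval(12.0, 18.0, 100)
print(f"95% CI: ({low:.2f}%, {high:.2f}%)")

# Quadrupling n halves the standard error.
se_100 = 18.0 / math.sqrt(100)
se_400 = 18.0 / math.sqrt(400)
print(f"SE ratio (n=100 vs n=400): {se_100 / se_400:.2f}")
```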

Advanced

Critical practitioner insight: confidence interval interpretation. A 95% CI of [10%, 14%] for a fund's return DOES NOT mean "95% probability the true mean is between 10% and 14%". It means "95% of CIs constructed this way would contain the true mean if we repeated the experiment many times".
Why this matters: if you sample once and get a CI, the true mean is either in it or not; in the frequentist framework there is no probability about it after the fact. The 95% refers to the procedure, not the specific interval.
Finance applications:
  • Estimating fund manager alpha: the CI tells us whether observed alpha could be statistical noise.
  • Beta estimation: a regression coefficient comes with a CI; a CI containing zero means we can't reject "no market exposure".
  • Sharpe ratio confidence: Sharpe estimates have CIs; selecting "top quartile" managers using point estimates ignores this.
A fund with 1.5% reported alpha but a CI of [−1%, +4%] could easily have a true alpha of zero. Marketing people don't show CIs. Sophisticated allocators demand them.
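The "procedure, not the interval" reading can be illustrated with a small simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the interval contains the true mean. This is a sketch under stated assumptions (normally distributed data, known σ, hypothetical parameter values, and a hypothetical helper named `coverage_rate`).

```python
import math
import random
from statistics import NormalDist


def coverage_rate(true_mean=12.0, sigma=18.0, n=50, trials=5000,
                  confidence=0.95, seed=42):
    """Fraction of simulated Z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)  # known-σ interval half-width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.1%}")
```

Each individual interval either contains 12.0 or it does not; only the long-run share across repetitions is close to 95%.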

Regulatory references
  • CFA Institute Curriculum — Level 1, Quantitative Methods, Reading 5
  • GIPS (Global Investment Performance Standards) — confidence-based reporting
  • SEBI MF performance disclosure standards
Common mistakes & pitfalls
  • Using σ when you should use s (sample SD) — affects CI width.
  • Forgetting that CI interpretation is about the procedure, not the specific interval.
  • Comparing sample means without computing CIs — small differences may be noise.
  • Confusing standard deviation (spread of observations) with standard error (spread of mean estimate).
  • Reporting fund alpha as "skill" without statistical significance test.

Frequently asked

When should I use t-distribution vs Z-distribution?
Use t when σ is unknown and you're using sample SD. Use Z when σ is known. For n > 30, they're nearly identical. CFA L1 mostly uses Z; L2 introduces t in regression contexts.
Why does sample variance use (n-1) instead of n?
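Mathematical correction (Bessel's correction). Deviations are measured from the sample mean rather than the unknown μ, which makes the sum of squared deviations slightly too small on average; dividing by (n-1) instead of n corrects that bias. With n=100, the two estimates differ by about 1%: small, but dividing by (n-1) is what makes the estimate of σ² unbiased.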
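A quick simulation makes the bias visible. The sketch below assumes normally distributed data with a true variance of 4 and a deliberately small sample size (n = 5) so the effect is easy to see; the function name and parameters are hypothetical.

```python
import random


def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        sum_sq = sum((x - x_bar) ** 2 for x in sample)
        total_div_n += sum_sq / n
        total_div_n_minus_1 += sum_sq / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


biased, unbiased = average_variance_estimates()
print(f"true variance = 4.00")
print(f"average estimate dividing by n     = {biased:.2f}")
print(f"average estimate dividing by (n-1) = {unbiased:.2f}")
```

Dividing by n should average close to σ² × (n − 1)/n, which is 3.2 here, while dividing by (n − 1) should average close to 4.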
How big does my sample need to be for CLT to apply?
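Generally n ≥ 30 is the rule of thumb. For nearly-normal distributions, n=10 is fine. For very skewed distributions, you may need n > 100. Asset returns are often treated as roughly normal, though they tend to have fatter tails than a normal distribution, so n=30 is a common working minimum rather than a guarantee.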
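One way to see how sample size interacts with skewness is to simulate sample means from a strongly right-skewed population and measure how skewed the means themselves are. The sketch below assumes an exponential population (chosen only because it is clearly non-normal) and uses a hypothetical helper, `skewness_of_sample_means`.

```python
import random


def skewness(values):
    """Moment-based skewness: third central moment / variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(n, trials=5000, seed=1):
    """Skewness of the sample mean for samples of size n from Exp(1)."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n
             for _ in range(trials)]
    return skewness(means)


results = {n: skewness_of_sample_means(n) for n in (2, 30, 300)}
for n, skew in results.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

A normal distribution has skewness 0, so values moving toward 0 as n grows indicate the sample mean's distribution is becoming more symmetric, which is consistent with the CLT.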

Practice questions

Q 1
A sample of 100 fund returns has mean 12%, SD 18%. The standard error of the mean is closest to:
  (a) 0.18%
  (b) 1.8%
  (c) 12%
  (d) 18%
Correct: (b) 1.8%
SE = s / √n = 18% / √100 = 18% / 10 = 1.8%.
Q 2
Doubling sample size from 100 to 200 changes the standard error by:
  (a) Halves it
  (b) Reduces by factor of √2 ≈ 1.41
  (c) Doubles it
  (d) No change
Correct: (b) Reduces by factor of √2 ≈ 1.41
SE = σ / √n. If n doubles, √n increases by √2 ≈ 1.41. So SE shrinks by factor of √2 — not 2.
Q 3
A 95% confidence interval for the population mean (using Z = 1.96):
  (a) Means 95% probability the true mean is in this interval
  (b) Means 95% of similarly-constructed intervals would contain the true mean
  (c) Cannot be calculated without knowing σ
  (d) Always equals the sample mean
Correct: (b) Means 95% of similarly-constructed intervals would contain the true mean
CI is about the procedure: 95% of intervals so constructed would contain the true mean. After sampling, the specific interval either contains μ or it doesn't.
Q 4
Sample mean 8%, sample SD 5%, n = 25. 95% CI (assume normal, t ≈ Z) is closest to:
  (a) (6%, 10%)
  (b) (5.6%, 10.4%)
  (c) (7%, 9%)
  (d) (0%, 16%)
Correct: (a) (6%, 10%)
SE = 5/√25 = 1%. CI = 8% ± 1.96 × 1% = (6.04%, 9.96%) ≈ (6%, 10%). Using the exact t value for 24 degrees of freedom (2.064) instead of Z gives (5.94%, 10.06%), which still rounds to option (a).
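The interval arithmetic for this question can be checked with a short sketch. The Z critical value is computed from the standard normal distribution; the t critical value for 24 degrees of freedom (2.064) is hard-coded here as an assumption taken from a standard t-table, since the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

sample_mean, sample_sd, n = 8.0, 5.0, 25  # values from the question, in percent

se = sample_sd / math.sqrt(n)             # 5 / 5 = 1.0
z = NormalDist().inv_cdf(0.975)           # two-sided 95% Z critical value
t_24 = 2.064                              # assumed t-table value, df = n - 1 = 24

z_interval = (sample_mean - z * se, sample_mean + z * se)
t_interval = (sample_mean - t_24 * se, sample_mean + t_24 * se)

print(f"Z-based 95% CI: ({z_interval[0]:.2f}%, {z_interval[1]:.2f}%)")
print(f"t-based 95% CI: ({t_interval[0]:.2f}%, {t_interval[1]:.2f}%)")
```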
Q 5
The Central Limit Theorem states:
  (a) Sample mean approaches population mean for any sample size
  (b) Sample mean distribution approaches normal for large n, regardless of underlying distribution
  (c) All distributions are normal for large samples
  (d) Variance is always equal to mean
Correct: (b) Sample mean distribution approaches normal for large n, regardless of underlying distribution
CLT: as n grows, the distribution of the sample mean approaches normal regardless of the underlying population distribution. This is the foundation of statistical inference.
Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser.