Sampling and estimation — CFA L1 Quant

You rarely have the full population, so you work with a sample. Sample statistics estimate population parameters. The Central Limit Theorem tells us that, for large enough n, the sampling distribution of the sample mean is approximately normal whatever the shape of the population distribution (provided the population has finite variance). This is the foundation of confidence intervals, hypothesis testing, and statistical inference generally. Master this reading and the next two, on hypothesis testing, become natural extensions.

Foundation

Population parameters (true values, often unknown):
• μ: population mean
• σ²: population variance
• σ: population SD

Sample statistics (computed from data):
• X̄: sample mean = Σx_i / n
• s²: sample variance = Σ(x_i − X̄)² / (n − 1)
• s: sample SD = √s²

Note: sample variance divides by (n − 1), a degrees-of-freedom adjustment that makes s² an unbiased estimator of σ². Population variance divides by the full count of observations instead.

Central Limit Theorem (CLT): for random samples from a population with finite variance, the distribution of the sample mean X̄ approaches a normal distribution as n grows, regardless of the shape of the population distribution. Its mean is μ and its SD, called the standard error, is σ/√n. This is why we can do statistical inference even when the underlying data are non-normal.
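A minimal Python sketch of the formulas above, using a made-up list of monthly returns (the `returns` values are hypothetical). It also checks the CLT numerically by averaging draws from an exponential population, which is strongly right-skewed, and compares the spread of those sample means with σ/√n. The seed, sample size and number of trials are arbitrary choices for illustration.

```python
import math
import random
import statistics

# Hypothetical monthly returns (%), invented purely for illustration.
returns = [1.2, -0.8, 2.5, 0.3, -1.1, 1.9, 0.7, -0.4]

n = len(returns)
x_bar = sum(returns) / n                                # sample mean = Σx_i / n
s2 = sum((x - x_bar) ** 2 for x in returns) / (n - 1)   # sample variance, (n − 1) denominator
s = math.sqrt(s2)                                       # sample SD

# The standard library draws the same line:
# statistics.variance divides by n − 1, statistics.pvariance divides by n.
assert math.isclose(s2, statistics.variance(returns))
assert math.isclose(s2 * (n - 1) / n, statistics.pvariance(returns))
print(f"X̄ = {x_bar:.4f}, s² = {s2:.4f}, s = {s:.4f}")


def simulate_sample_means(sample_size, trials, seed=42):
    """Means of repeated samples from an exponential(rate=1) population (μ = 1, σ = 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]


# CLT check: the sample means should centre on μ = 1 with SD close to σ/√n = 1/√50.
means = simulate_sample_means(sample_size=50, trials=10_000)
print(f"mean of X̄ = {statistics.fmean(means):.3f} (theory 1.000)")
print(f"SD of X̄   = {statistics.stdev(means):.3f} (theory {1 / math.sqrt(50):.3f})")
```

A histogram of `means` would look close to a bell curve even though the individual exponential draws are far from normal.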

Deep Dive

Standard Error (SE), the SD of the sample mean:
SE = σ / √n (when σ is known)
SE = s / √n (when σ is unknown; uses the sample SD)

Critical insight: SE shrinks in proportion to 1/√n, not 1/n. Doubling the sample size divides SE by √2 ≈ 1.41 (about a 29% reduction), so returns diminish. To halve the uncertainty you need 4× the sample.

Confidence intervals:
For known σ (normal population, or large n), use Z: CI = X̄ ± Z_α/2 × (σ / √n)
For unknown σ, use the sample s and the t-distribution with n − 1 degrees of freedom: CI = X̄ ± t_α/2,n−1 × (s / √n)

Common confidence levels:
90% CI: Z = 1.65 (1.645 before rounding)
95% CI: Z = 1.96
99% CI: Z = 2.58

As n grows, the t-distribution converges to the Z-distribution. For n > 30 the difference is small (at 95%, t with 29 degrees of freedom is 2.045 versus Z = 1.96), so Z is often used as an approximation.
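A small sketch of the SE and Z-interval formulas, using only Python's standard library (`statistics.NormalDist`). It shows where the rounded table values 1.65, 1.96 and 2.58 come from and confirms the √n scaling of the standard error. The function names are illustrative, not from any particular library.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """SE of the sample mean: sd / √n (sd is σ if known, otherwise s)."""
    return sd / math.sqrt(n)


def z_critical(confidence):
    """Two-sided Z critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def z_interval(x_bar, sd, n, confidence=0.95):
    """CI = X̄ ± Z_α/2 × SE, using the normal distribution."""
    half_width = z_critical(confidence) * standard_error(sd, n)
    return x_bar - half_width, x_bar + half_width


# Unrounded critical values behind the table.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")   # 1.645, 1.960, 2.576

# √n scaling: doubling n divides SE by √2; quadrupling n halves it.
se_100 = standard_error(18, 100)
print(se_100 / standard_error(18, 200))   # ≈ 1.414
print(se_100 / standard_error(18, 400))   # 2.0
```

The t version has the same shape; only the critical value changes, and it needs a t-table or a t quantile function because the standard library does not provide one.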

Worked example

Worked example: testing fund manager skill

A mutual fund reports a 5-year alpha of 1.5% per year (annualised), based on 60 monthly observations. The sample SD of monthly excess returns is 1%.

Monthly alpha = 1.5% / 12 = 0.125%
Standard error of monthly alpha = 1% / √60 = 0.129%
T-statistic = 0.125% / 0.129% = 0.97

Is this statistically significant? At 5% two-tailed (Z = 1.96), a t-statistic of 0.97 falls well short. Equivalently:
CI for monthly alpha = 0.125% ± 1.96 × 0.129% = (−0.128%, +0.378%)

The CI includes zero, so we cannot reject the hypothesis that the manager's true alpha is zero. The reported 1.5% per year is consistent with luck.

How long a track record would separate skill from luck? Assume a true alpha of 1% per year and a 5% one-tailed test (Z = 1.65), and find the n at which an observed alpha equal to that true alpha would just be significant, that is, SE × 1.65 < 1%/12:
s/√n × 1.65 < 0.0833%
√n > 1 × 1.65 / 0.0833
√n > 19.8
n > 392 months ≈ 33 years

This is why CFA L3 emphasises long track records before manager selection. Five years is rarely sufficient.
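The same arithmetic as a short Python sketch, so the inputs can be varied. All figures are in % units and come from the worked example; `min_months` is a hypothetical helper name, and it keeps the example's simplification of using the SD of monthly excess returns as the SD of the alpha estimate.

```python
import math

# Inputs from the worked example (% units).
annual_alpha = 1.5
months = 60
monthly_sd = 1.0                                  # sample SD of monthly excess returns

monthly_alpha = annual_alpha / 12                 # 0.125
se = monthly_sd / math.sqrt(months)               # about 0.129
t_stat = monthly_alpha / se                       # about 0.97
ci = (monthly_alpha - 1.96 * se, monthly_alpha + 1.96 * se)

print(f"SE = {se:.3f}%, t = {t_stat:.2f}, 95% CI = ({ci[0]:.3f}%, {ci[1]:.3f}%)")
print("Significant at 5% two-tailed?", abs(t_stat) > 1.96)   # False


def min_months(true_annual_alpha, sd, z=1.65):
    """Threshold from z × sd / √n < monthly alpha, i.e. n > (z × sd / monthly alpha)²."""
    return (z * sd / (true_annual_alpha / 12)) ** 2


threshold = min_months(1.0, monthly_sd)           # about 392
print(f"need n > {threshold:.0f} months ≈ {threshold / 12:.0f} years")
```

Plugging in a larger assumed alpha or a lower SD shows how quickly the required track record shrinks, which is the √n relationship at work.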

Real-world scenario

Real-world scenario: Indian SIP investor return survey

A mutual fund company surveys 1,000 SIP investors with 10-year track records. Sample mean SIP IRR: 11.8%. Sample SD: 4%.

Question 1: What is the 95% CI for the true population mean SIP IRR?
SE = 4% / √1000 = 0.126%
95% CI = 11.8% ± 1.96 × 0.126% = (11.55%, 12.05%)
With 1,000 investors the interval is tight: the mean is estimated to within about ±0.25 percentage points.

Question 2: A specific investor claims a 13% IRR. Is this exceptional?
Z = (13 − 11.8) / 4 = 0.30 (using the individual SD, not the SE, because we are comparing one investor with other investors rather than estimating the mean)
P(Z > 0.30) ≈ 38%
If individual IRRs are roughly normal, about 38% of SIP investors earn more than 13%, so this investor is in the top half but not exceptional. Marketing copy that says "above-average returns" without this context misleads.

This kind of analysis lets practitioners distinguish genuine outperformance from ordinary dispersion around the average, exactly what sophisticated allocators must do. Note that a sample restricted to investors with 10-year records can itself suffer from survivorship bias, which no CI corrects.
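Both questions in a few lines of Python, using `statistics.NormalDist` from the standard library. The normal shape for individual IRRs is an assumption carried over from the scenario, not something the survey establishes.

```python
import math
from statistics import NormalDist

mean_irr, sd_irr, n = 11.8, 4.0, 1000

# Question 1: the CI for the population mean uses the standard error.
se = sd_irr / math.sqrt(n)
ci = (mean_irr - 1.96 * se, mean_irr + 1.96 * se)
print(f"SE = {se:.3f} pp, 95% CI = ({ci[0]:.2f}%, {ci[1]:.2f}%)")

# Question 2: one investor is compared with the spread of individual IRRs,
# so the SD (not the SE) is the yardstick. Assumes IRRs are roughly normal.
irr_dist = NormalDist(mu=mean_irr, sigma=sd_irr)
z = (13 - mean_irr) / sd_irr
share_above = 1 - irr_dist.cdf(13)
print(f"Z = {z:.2f}, share above 13% ≈ {share_above:.0%}")
```

Swapping `sd_irr` for `se` in the second part would wrongly make a 13% IRR look like an extreme outlier, which is exactly the SD-versus-SE confusion.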

Advanced

Critical practitioner insight: confidence interval interpretation. A 95% CI of [10%, 14%] for a fund's mean return does NOT mean "there is a 95% probability the true mean is between 10% and 14%". It means "95% of intervals constructed this way would contain the true mean if we repeated the sampling many times".

Why this matters: once you have sampled and computed one interval, the true mean is either in it or not; there is no probability about it after the fact. The 95% describes the procedure, not the specific interval.

Finance applications:
• Estimating fund manager alpha: the CI shows whether the observed alpha could be statistical noise.
• Beta estimation: a regression coefficient comes with a CI; a CI containing zero means we cannot reject "no market exposure".
• Sharpe ratio confidence: Sharpe ratio estimates have CIs too; selecting "top quartile" managers on point estimates ignores this.

A fund reporting 1.5% alpha with a CI of [−1%, +4%] could easily have zero true alpha. Marketing materials rarely show CIs; sophisticated allocators demand them.
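The "procedure, not the interval" point can be checked by simulation. This sketch assumes a hypothetical normal population (μ = 10, σ = 8) with σ treated as known, draws many samples, builds a Z-interval from each, and counts how often the interval captures μ. Any single interval either contains 10 or it does not; only the long-run hit rate is about 95%.

```python
import math
import random
import statistics


def coverage_rate(mu=10.0, sigma=8.0, n=40, trials=10_000, z=1.96, seed=7):
    """Fraction of intervals X̄ ± z·σ/√n that contain the true mean μ.

    The population (normal, μ = 10, σ = 8) is hypothetical, and σ is treated
    as known, so the nominal coverage of each interval is 95%.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals containing μ: {rate:.3f}")   # close to 0.95
```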

Regulatory references

CFA Institute Curriculum — Level 1, Quantitative Methods, Reading 5
GIPS (Global Investment Performance Standards): composite performance presentation, including dispersion and risk statistics
SEBI MF performance disclosure standards

Common mistakes & pitfalls

Using Z with a σ you do not actually know: when only the sample SD s is available, use s and the t-distribution, which gives a wider interval in small samples.
Forgetting that CI interpretation is about the procedure, not the specific interval.
Comparing sample means without computing CIs — small differences may be noise.
Confusing standard deviation (spread of observations) with standard error (spread of mean estimate).
Reporting fund alpha as "skill" without a statistical significance test.

Frequently asked

When should I use t-distribution vs Z-distribution?

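Use t when σ is unknown and you are estimating it with the sample SD s (strictly, for small n the population should also be approximately normal). Use Z when σ is known, or as a large-sample approximation. For n > 30 the two give similar critical values, and the gap keeps shrinking as n grows. CFA L1 covers both distributions; t reappears in L2 regression, where it is used to test coefficients.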
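Python's standard library has no t-distribution, so this sketch computes t critical values from the Student's t density by Simpson's-rule integration plus bisection. It is a teaching sketch, not production numerics; if SciPy is available, `scipy.stats.t.ppf(0.975, df)` returns the same values. The printout shows t approaching Z = 1.96 as the degrees of freedom grow.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma avoids overflow)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the pdf on [0, x]."""
    h = x / steps                                  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value: solve t_cdf(t) = 0.5 + confidence / 2 by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 10, 24, 29, 59, 120):
    print(f"df = {df:>3}: t = {t_critical(0.95, df):.3f}   (Z = 1.960)")
```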

Why does sample variance use (n-1) instead of n?

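This is Bessel's correction. Deviations are measured from the sample mean X̄ rather than the true μ, and X̄ is the value that minimises the sum of squared deviations, so those squared deviations are systematically too small: dividing by n understates σ² on average by a factor of (n − 1)/n. Dividing by (n − 1) removes that bias exactly, making s² an unbiased estimator of σ². With n = 100 the two divisors differ by only about 1% (100/99 ≈ 1.0101); in small samples the correction matters much more.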
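The bias can be shown exactly, without simulation, by enumerating every possible sample. This sketch uses a hypothetical population of four equally likely values {1, 2, 3, 4} (σ² = 5/4), lists all 64 ordered samples of size 3 drawn with replacement, and averages both variance formulas using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of four equally likely values.
population = [Fraction(v) for v in (1, 2, 3, 4)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # population variance


def variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** 2 for x in sample) / (n - ddof)


n = 3
samples = list(product(population, repeat=n))   # every ordered sample, with replacement
avg_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)

print("σ²                  =", sigma2)         # 5/4
print("average of ÷n       =", avg_biased)     # 5/6, i.e. (n - 1)/n × σ²
print("average of ÷(n - 1) =", avg_unbiased)   # 5/4, exactly σ²
```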

How big does my sample need to be for CLT to apply?

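n ≥ 30 is the usual rule of thumb. For nearly normal populations, n = 10 can be enough; for heavily skewed populations you may need n > 100. Asset returns are often treated as roughly normal, but they tend to show fat tails and some skew, so treat n = 30 as a minimum rather than a guarantee.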
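One way to see why skewed populations need larger samples: for independent, identically distributed draws, the skewness of the sample mean equals the population skewness divided by √n. This sketch applies that result to an exponential population (skewness 2) and cross-checks the n = 30 case by simulation; the seed and trial count are arbitrary.

```python
import math
import random
import statistics


def skew_of_mean(pop_skew, n):
    """Skewness of the mean of n i.i.d. draws: population skewness / √n."""
    return pop_skew / math.sqrt(n)


def sample_skewness(data):
    """Moment-based skewness: average cubed deviation divided by SD³ (÷n version)."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean((x - m) ** 3 for x in data) / sd ** 3


# Exponential population: heavily right-skewed, skewness = 2.
for n in (10, 30, 100):
    print(f"n = {n:>3}: skewness of X̄ ≈ {skew_of_mean(2.0, n):.2f}")

# Simulation check for n = 30 (fixed seed so the run is repeatable).
rng = random.Random(1)
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(30)) for _ in range(20_000)]
print(f"simulated skewness of X̄ (n = 30): {sample_skewness(means):.2f}")
```

A normal distribution has skewness 0, so even at n = 30 the sampling distribution of the mean of such a skewed population is still visibly lopsided.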

Practice questions


Q 1

A sample of 100 fund returns has mean 12%, SD 18%. The standard error of the mean is closest to:

(a) 0.18%
(b) 1.8%
(c) 12%
(d) 18%

Correct: (b) 1.8%

SE = s / √n = 18% / √100 = 18% / 10 = 1.8%.

Q 2

Doubling sample size from 100 to 200 changes the standard error by:

(a) Halves it
(b) Reduces by factor of √2 ≈ 1.41
(c) Doubles it
(d) No change

Correct: (b) Reduces by factor of √2 ≈ 1.41

SE = σ / √n. If n doubles, √n increases by a factor of √2 ≈ 1.41, so SE shrinks by a factor of √2, not 2.

Q 3

A 95% confidence interval for the population mean (using Z = 1.96):

(a) Means 95% probability the true mean is in this interval
(b) Means 95% of similarly-constructed intervals would contain the true mean
(c) Cannot be calculated without knowing σ
(d) Always equals the sample mean

Correct: (b) Means 95% of similarly-constructed intervals would contain the true mean

CI is about the procedure: 95% of intervals so constructed would contain the true mean. After sampling, the specific interval either contains μ or it doesn't.

Q 4

Sample mean 8%, sample SD 5%, n = 25. 95% CI (assume normal, t ≈ Z) is closest to:

(a) (6%, 10%)
(b) (5.6%, 10.4%)
(c) (7%, 9%)
(d) (0%, 16%)

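Correct: (a) (6%, 10%)

SE = 5/√25 = 1%. CI = 8% ± 1.96 × 1% = (6.04%, 9.96%) ≈ (6%, 10%), which is option (a). Using the exact t critical value for 24 degrees of freedom (2.064) gives (5.94%, 10.06%), which still rounds to (6%, 10%). Option (b) would require a multiplier of 2.4, which matches neither Z nor t.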
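A quick arithmetic check of this question in Python, comparing the Z approximation the question asks for with the t-table value for 24 degrees of freedom (2.064). Both intervals land at about (6%, 10%).

```python
import math

x_bar, s, n = 8.0, 5.0, 25
se = s / math.sqrt(n)                      # 1.0

# Z approximation (as the question instructs) versus the t-table value for 24 df.
for label, crit in (("Z", 1.96), ("t(24)", 2.064)):
    lo, hi = x_bar - crit * se, x_bar + crit * se
    print(f"{label:>5}: ({lo:.2f}%, {hi:.2f}%)")
# Prints (6.04%, 9.96%) for Z and (5.94%, 10.06%) for t(24).
```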

Q 5

The Central Limit Theorem states:

(a) Sample mean approaches population mean for any sample size
(b) Sample mean distribution approaches normal for large n, regardless of underlying distribution
(c) All distributions are normal for large samples
(d) Variance is always equal to mean

Correct: (b) Sample mean distribution approaches normal for large n, regardless of underlying distribution

CLT: as n grows, the distribution of the sample mean approaches normal regardless of the underlying population distribution. This is the foundation of statistical inference.

Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser. Read our full disclaimer.