Hypothesis testing — CFA L1 Quant

Hypothesis testing answers one question: is what I'm seeing real, or could it be random chance? Every claim about fund performance, every "this strategy beat the market", every "this asset class is undervalued" should be treated as a hypothesis to test, not a fact to assert. Master this reading and you become much harder to fool, whether by yourself or by others.

Foundation

Hypothesis test structure:
• Null hypothesis (H₀): the default, no effect, no difference. E.g., "fund alpha = 0".
• Alternative hypothesis (H₁): what we suspect. E.g., "fund alpha > 0".

We compute a test statistic (Z or t) from the data. If it exceeds the critical value, reject H₀ in favour of H₁.

Two error types:
• Type I error: rejecting H₀ when it's true (false positive). Probability = α (significance level).
• Type II error: failing to reject H₀ when it's false (false negative). Probability = β.

Power of test = 1 − β. Higher power = more likely to detect real effects.

P-value: probability of observing data this extreme (or more extreme) if H₀ is true. If p-value < α, reject H₀.
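The p-value rule can be sketched in a few lines. This is a minimal illustration for a Z statistic only, using Python's standard-library `statistics.NormalDist`; the function names `z_test_p_value` and `reject_h0` are hypothetical helpers, not part of any curriculum or library.

```python
from statistics import NormalDist


def z_test_p_value(z: float, tail: str = "two") -> float:
    """P-value of a Z statistic, assuming it is standard normal under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":   # H1: mu > mu0
        return 1.0 - std_normal.cdf(z)
    if tail == "lower":   # H1: mu < mu0
        return std_normal.cdf(z)
    if tail == "two":     # H1: mu != mu0, both tails count as "extreme"
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


def reject_h0(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha


p = z_test_p_value(2.0, tail="two")
print(round(p, 4), reject_h0(p))  # 0.0455 True
```

The same Z statistic can lead to different decisions depending on the tail: Z = 1.5 has an upper-tail p-value of about 0.067, so H₀ is not rejected at 5%.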

Deep Dive

Common tests:
1. Z-test (large sample, known σ): Z = (X̄ − μ₀) / (σ/√n)
2. T-test (small sample or unknown σ): t = (X̄ − μ₀) / (s/√n), df = n−1
3. Difference of means: t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂)
4. Chi-square (variance test): χ² = (n−1)s²/σ₀²
5. F-test (compare variances): F = s₁²/s₂²

One-tailed vs two-tailed:
• One-tailed: H₁ specifies direction (e.g., μ > μ₀). Use one-tailed critical value.
• Two-tailed: H₁ is non-directional (e.g., μ ≠ μ₀). Use two-tailed critical value (split α between tails).

Decision rule:
• Two-tailed: if |test statistic| > critical value → reject H₀
• One-tailed: reject H₀ only if the test statistic lies beyond the critical value in the direction H₁ specifies
• Equivalently: if p-value < α → reject H₀
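The five test statistics above are plain arithmetic, which a short sketch makes concrete. The function names are hypothetical, and the inputs in the usage lines for the chi-square and F statistics are made-up illustration values, not data from this lesson.

```python
from math import sqrt


def z_stat(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Z-test: population standard deviation sigma is known."""
    return (x_bar - mu0) / (sigma / sqrt(n))


def t_stat(x_bar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t-test: sample standard deviation s, df = n - 1."""
    return (x_bar - mu0) / (s / sqrt(n))


def diff_means_t(x1: float, x2: float, s1: float, s2: float,
                 n1: int, n2: int) -> float:
    """Difference of means with separate sample variances."""
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def chi_square_stat(s: float, sigma0: float, n: int) -> float:
    """Variance test against a hypothesised standard deviation sigma0."""
    return (n - 1) * s ** 2 / sigma0 ** 2


def f_stat(s1: float, s2: float) -> float:
    """Ratio of two sample variances."""
    return s1 ** 2 / s2 ** 2


# Hypothetical inputs, for illustration only
print(chi_square_stat(s=0.05, sigma0=0.04, n=25))  # 24 * 0.0025 / 0.0016
print(round(f_stat(s1=8.0, s2=7.0), 3))            # 64 / 49
```

Each function returns only the statistic; the critical value still has to come from the matching distribution table (Z, t, χ² or F).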

Worked example

Testing fund manager outperformance

Claim: ABC equity fund beats NIFTY 50 by 2% per year. We have 3 years of monthly data.

Monthly excess returns (fund − NIFTY): mean 0.18%, SD 0.7%, n = 36.

H₀: true mean excess return = 0
H₁: true mean excess return > 0 (one-tailed test)

Test statistic:
t = (0.18% − 0) / (0.7% / √36)
t = 0.18 / 0.117
t = 1.54

Critical t-value at 5% one-tailed, df = 35: t* = 1.69

Decision: t = 1.54 < t* = 1.69, so DO NOT reject H₀.

Conclusion: even though the fund returned 2.16% extra per year (0.18% × 12), the sample is too noisy to conclude the manager has true skill.

How much data would be enough? A true edge of 2% per year is δ = 2%/12 per month, so σ/δ = 0.7 × 12/2 = 4.2. Using the normal approximation n ≈ ((z_α + z_β) × σ/δ)²:
• 50% power (z_β = 0), i.e. the test statistic only reaches the 5% critical value on average: n ≈ (1.645 × 4.2)² ≈ 48 months.
• 80% power (z_β ≈ 0.84): n ≈ (2.487 × 4.2)² ≈ 109 months.

So about 4 years of data gives only a coin-flip chance of detecting a genuine 2% edge, and roughly 9 years are needed for 80% power. This is why the CFA program emphasises sustained track records.
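The arithmetic in this example, plus a rough sample-size estimate, can be checked with a short sketch. The sample-size formula n ≈ ((z_α + z_β) × σ/δ)² is the usual normal approximation for a one-sided test and is an assumption here; `months_needed` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Inputs from the worked example (monthly figures, in percent)
mean_excess, sd_excess, n_months = 0.18, 0.70, 36

# One-sample t statistic against H0: mean excess return = 0
t = mean_excess / (sd_excess / sqrt(n_months))
print(round(t, 2))  # 1.54, below the 1.69 critical value


def months_needed(annual_edge_pct: float, monthly_sd_pct: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Months of data for a one-sided test to reach the target power.

    Normal approximation: n = ((z_alpha + z_power) * sd / edge) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_power = z.inv_cdf(power)        # 0 when power is 50%
    monthly_edge = annual_edge_pct / 12.0
    return ceil(((z_alpha + z_power) * monthly_sd_pct / monthly_edge) ** 2)


print(months_needed(2.0, 0.70))              # 110 months for 80% power
print(months_needed(2.0, 0.70, power=0.50))  # 48 months for 50% power
```

Under these assumptions, a figure near 50 months corresponds to only a 50% chance of detecting a true 2% annual edge; 80% power needs a bit over 9 years of monthly data.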

Real-world scenario

Testing the "January Effect"

Claim: Indian small-cap stocks outperform in January due to tax-loss harvesting reversal. We have 25 years of NIFTY Smallcap monthly data.

H₀: January mean return = mean return of other months
H₁: January mean return > mean return of other months

Dataset:
• January (25 observations): mean 4.2%, SD 8%
• Other months (275 observations): mean 1.4%, SD 7%

Difference of means test:
t = (4.2 − 1.4) / √(8²/25 + 7²/275)
  = 2.8 / √(2.56 + 0.178)
  = 2.8 / √2.738
  = 2.8 / 1.655
  = 1.69

Critical t at 5% one-tailed: ~1.65 (large df)
P-value ≈ 0.045 (normal approximation)

Conclusion: REJECT H₀ at 5% significance using the large-sample critical value. January returns appear statistically higher than other months.

BUT the result is fragile:
• It is marginal. With only 25 January observations, the unequal-variance (Welch) degrees of freedom are about 27, which raises the critical value to about 1.70, and t = 1.69 no longer clears it.
• This is one of dozens of "calendar anomalies" tested. A multiple-testing correction would significantly tighten the threshold.
• In practice, transaction costs eat the apparent edge for retail investors.

CFA tests recognition of these caveats; the marketing claims rarely include them.
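A short sketch reproduces the January test statistic and shows how sensitive the decision is to the degrees of freedom. The Welch-Satterthwaite df formula used here is a standard choice for unequal variances, but it is an addition for illustration and not something the scenario itself states; `welch_t_and_df` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def welch_t_and_df(m1: float, s1: float, n1: int,
                   m2: float, s2: float, n2: int) -> tuple[float, float]:
    """t statistic and Welch-Satterthwaite df for two independent samples."""
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# January vs other months, figures from the scenario (percent)
t, df = welch_t_and_df(4.2, 8.0, 25, 1.4, 7.0, 275)

# One-tailed p-value if t is treated as standard normal (large-df shortcut)
p_normal = 1.0 - NormalDist().cdf(t)

print(round(t, 2), round(df, 1), round(p_normal, 3))
```

The df comes out near 27 rather than "large" because the small January sample dominates the standard error. A t-table at df = 27 gives a one-tailed 5% critical value of about 1.70, so the rejection depends on which critical value is used.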

Advanced

Practitioner trap: multiple testing.

Test 20 strategies at α = 5%. Even if all 20 are useless, about 1 is expected to appear "significant" by chance (20 × 5% = 1), and if the tests are independent the chance of at least one false positive is 1 − 0.95²⁰ ≈ 64%. This is why backtesting requires a Bonferroni correction (divide α by the number of tests) or other adjustments.

Real example: a quant fund tests 100 trading strategies. The top 5 show "statistically significant" outperformance at 5%. Without correction, this is exactly what you'd expect from luck alone (5% × 100 = 5). Bonferroni-corrected α = 0.05/100 = 0.0005. Now the top performers must clear a much higher bar.

CFA L1 doesn't test multiple-testing corrections explicitly, but it does test recognition that significance found across many tests means little.

Another trap: data-snooping bias. Looking at data, finding a pattern, then testing the hypothesis on the same data is circular. Always reserve out-of-sample data for testing what you found in sample.
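The multiple-testing arithmetic is easy to verify. The sketch below assumes the tests are independent, which real backtests on overlapping data usually are not, so treat the family-wise figure as indicative. The function names are hypothetical.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results when every H0 is true."""
    return num_tests * alpha


def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / num_tests


print(expected_false_positives(20))          # 1.0
print(round(family_wise_error_rate(20), 2))  # about 0.64
print(bonferroni_alpha(100))                 # 0.0005

# With the corrected per-test alpha, the family-wise rate drops below 5%
print(family_wise_error_rate(100, bonferroni_alpha(100)) < 0.05)  # True
```

Bonferroni is deliberately conservative: it caps the family-wise error rate at α at the cost of lower power for each individual test.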

Regulatory references

CFA Institute Curriculum — Level 1, Quantitative Methods, Reading 6
SEBI fund advertisement standards — significance claims must be backed
GIPS — performance reporting standards

Common mistakes & pitfalls

Confusing one-tailed vs two-tailed tests — using wrong critical value.
Reporting "p-value < 0.05" without context — always report effect size too.
Multiple testing without correction — finding "significant" result by chance.
Using normal Z when sample is small and σ unknown — should use t.
Treating "fail to reject H₀" as "H₀ is true" — it just means insufficient evidence.

Frequently asked

What's the difference between p-value and significance level?

Significance level α is the threshold we set in advance (typically 5% or 1%). P-value is the actual probability of seeing data this extreme under H₀. We reject H₀ if p-value < α.

How does sample size affect hypothesis tests?

Larger n → smaller SE → easier to reject H₀ for given true effect. With huge samples, even tiny effects become "statistically significant" — but may be economically trivial. CFA distinguishes statistical significance from economic significance.
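A quick sketch shows the effect of sample size on the test statistic. The 0.01% monthly edge and 0.7% monthly SD are made-up illustration values, and `t_stat_for_n` is a hypothetical helper.

```python
from math import sqrt


def t_stat_for_n(effect: float, sd: float, n: int) -> float:
    """t statistic if the sample mean equals the true effect exactly."""
    standard_error = sd / sqrt(n)
    return effect / standard_error


# Hypothetical: a trivial 0.01% monthly edge with 0.7% monthly SD
for n in (36, 360, 3_600, 36_000):
    print(n, round(t_stat_for_n(0.01, 0.7, n), 2))
```

The edge never changes, yet the statistic grows with √n and eventually passes the one-tailed 5% threshold of about 1.65. The result becomes statistically significant while remaining economically trivial.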

When should I reject H₀ in finance?

Standard threshold is 5%, but for important decisions (capital allocation, manager selection), 1% is more conservative. Always pre-specify before testing — don't adjust α after seeing data.

Practice questions


Q 1

A two-tailed t-test at 5% significance with n = 30 has critical t-value closest to:

(a) 1.65
(b) 1.96
(c) 2.04
(d) 2.58

Correct: (c) 2.04

For two-tailed at 5% with df = 29 (n−1), t-critical ≈ 2.045. The closest answer is 2.04. (Z would be 1.96; t is slightly larger for finite df.)

Q 2

A Type I error is:

(a) Rejecting H₀ when it's true (false positive)
(b) Failing to reject H₀ when it's false (false negative)
(c) Computing variance incorrectly
(d) Using wrong distribution

Correct: (a) Rejecting H₀ when it's true (false positive)

Type I = false positive. Type II = false negative. The probability of Type I is α; Type II is β. Power = 1 − β.

Q 3

A test statistic t = 2.5 with df = 30 has approximate p-value (two-tailed) closest to:

(a) 0.005
(b) 0.018
(c) 0.05
(d) 0.10

Correct: (b) 0.018

Two-tailed p-value ≈ 0.018 for t = 2.5 at df = 30. Reject H₀ at 5%, fail to reject at 1%.
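In the exam this p-value comes from a table or calculator. For readers who want to verify it, here is a minimal standard-library sketch that integrates the Student t density with Simpson's rule. The function names and the step count are arbitrary choices for illustration; a statistics package would normally be used instead.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: float) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd, 2 on even nodes
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3    # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area


print(round(two_tailed_p(2.5, 30), 3))  # 0.018
```

The same function returns roughly 0.05 for t = 2.045 with df = 29, which matches the critical value used in Q 1.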

Q 4

A fund's 3-year alpha is 1.2%, t-statistic 1.4. At 5% one-tailed (t-critical = 1.69), the conclusion is:

(a) Reject H₀ — manager has skill
(b) Fail to reject H₀ — alpha could be luck
(c) Inconclusive
(d) Re-test with smaller sample

Correct: (b) Fail to reject H₀ — alpha could be luck

t = 1.4 < t-critical = 1.69. Fail to reject H₀. The 1.2% alpha is not statistically distinguishable from zero given the noise.

Q 5

Power of a test is:

(a) Probability of rejecting H₀ when H₁ is true
(b) Probability of Type I error
(c) Significance level
(d) 1 − Type I error rate

Correct: (a) Probability of rejecting H₀ when H₁ is true

Power = 1 − β = probability of correctly detecting true effect. Increased by larger sample size, larger true effect, lower variance.
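The three drivers of power can be seen in a small sketch. It uses a normal approximation for a one-sided test of a mean, which is an assumption made for simplicity; `one_sided_power` is a hypothetical helper, and the inputs are illustrative monthly figures in percent.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_power(effect: float, sd: float, n: int,
                    alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test of H0: mean = 0 vs H1: mean > 0.

    Treats the test statistic as normal with mean effect / (sd / sqrt(n)).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha)   # rejection threshold under H0
    shift = effect / (sd / sqrt(n))   # expected statistic if H1 is true
    return 1.0 - z.cdf(z_crit - shift)


base = one_sided_power(effect=2 / 12, sd=0.7, n=36)
print(round(base, 2))                                   # about 0.41
print(one_sided_power(2 / 12, 0.7, n=110) > base)       # larger sample
print(one_sided_power(4 / 12, 0.7, n=36) > base)        # larger true effect
print(one_sided_power(2 / 12, 0.5, n=36) > base)        # lower variance
```

All three comparisons print True: more data, a bigger true effect, or less noise each raise the chance of correctly rejecting H₀.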

Q 6

Multiple testing without correction (e.g., testing 20 strategies at 5%) leads to:

(a) No problem
(b) Inflated false-positive rate
(c) Reduced power
(d) Higher confidence

Correct: (b) Inflated false-positive rate

Test 20 useless strategies at 5% and you expect 1 to look "significant" by chance. A Bonferroni correction (α/20 per test) keeps the family-wise error rate at or below 5%.

Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser.