Hypothesis testing
In this chapter: Null and alternative hypotheses · t-tests, z-tests, p-values · Type I and Type II errors
Hypothesis testing answers the question: is what I'm seeing real, or could it be random chance? Every claim about fund performance, every "this strategy beat the market", every "this asset class is undervalued" should be treated as a hypothesis to test, not a fact to assert. Master this reading and you become much harder to fool, whether by yourself or by others.
Hypothesis test structure:
- Null hypothesis (H₀): the default position of no effect or no difference, e.g., "fund alpha = 0".
- Alternative hypothesis (H₁): what we suspect, e.g., "fund alpha > 0".
We compute a test statistic (Z or t) from the data. If it falls beyond the critical value, we reject H₀ in favour of H₁.
Two error types:
- Type I error: rejecting H₀ when it is true (false positive). Probability = α (the significance level).
- Type II error: failing to reject H₀ when it is false (false negative). Probability = β.
Power of a test = 1 − β. Higher power means the test is more likely to detect a real effect.
P-value: the probability of observing data at least this extreme if H₀ is true. If p-value < α, reject H₀.
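The p-value rule and the idea of power can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes a one-tailed Z-test with known σ, and the function names and the fund numbers (0.5% mean monthly alpha, σ = 2%, 36 months) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu > mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1.0 - STD_NORMAL.cdf(z)  # P(Z >= z) if H0 is true
    return z, p_value, p_value < alpha


def power_one_tailed_z(true_mean, mu0, sigma, n, alpha=0.05):
    """Power = 1 - beta: chance of rejecting H0 when the mean is true_mean."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (true_mean - mu0) / (sigma / sqrt(n))
    return 1.0 - STD_NORMAL.cdf(z_crit - shift)


# Hypothetical inputs: mean monthly alpha 0.5%, sigma 2%, 36 months of data.
z, p, reject = one_tailed_z_test(0.5, 0.0, 2.0, 36)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")

# Power rises with sample size for the same true effect.
for months in (36, 120, 360):
    print(months, round(power_one_tailed_z(0.5, 0.0, 2.0, months), 3))
```

When the true mean equals μ₀, the power function returns α itself, which is the Type I error rate: the test rejects a true H₀ exactly as often as the significance level allows.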
Common tests:
1. Z-test (large sample, known σ): Z = (X̄ − μ₀) / (σ/√n)
2. t-test (small sample or unknown σ): t = (X̄ − μ₀) / (s/√n), df = n − 1
3. Difference of means: t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂)
4. Chi-square (variance test): χ² = (n − 1)s²/σ₀², df = n − 1
5. F-test (compare variances): F = s₁²/s₂²
One-tailed vs two-tailed:
- One-tailed: H₁ specifies a direction (e.g., μ > μ₀). Use the one-tailed critical value.
- Two-tailed: H₁ is non-directional (e.g., μ ≠ μ₀). Use the two-tailed critical value (α is split between the two tails).
Decision rule:
- Two-tailed test: if |test statistic| > critical value, reject H₀. For a one-tailed test, compare the statistic with the critical value in the direction stated by H₁.
- Equivalently: if p-value < α, reject H₀.
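The five statistics above translate directly into code. The sketch below uses only the Python standard library and computes the statistics, not the critical values or p-values, which would come from a distribution table or a statistics package. The function names and the two return series are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance  # stdev/variance use n - 1


def z_statistic(sample, mu0, sigma):
    """Z = (X-bar - mu0) / (sigma / sqrt(n)), with sigma known."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))


def t_statistic(sample, mu0):
    """t = (X-bar - mu0) / (s / sqrt(n)); returns (t, df) with df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n)), n - 1


def diff_of_means_t(sample1, sample2):
    """t = (X-bar1 - X-bar2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(variance(sample1) / len(sample1)
              + variance(sample2) / len(sample2))
    return (mean(sample1) - mean(sample2)) / se


def chi_square_statistic(sample, sigma0_sq):
    """chi^2 = (n - 1) s^2 / sigma0^2, testing a hypothesised variance."""
    return (len(sample) - 1) * variance(sample) / sigma0_sq


def f_statistic(sample1, sample2):
    """F = s1^2 / s2^2, comparing two sample variances."""
    return variance(sample1) / variance(sample2)


# Hypothetical annual returns (%) for two funds.
fund_a = [1.0, 2.0, 3.0, 4.0, 5.0]
fund_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_value, df = t_statistic(fund_a, 2.0)
print(f"t = {t_value:.3f} with df = {df}")
print(f"Z = {z_statistic(fund_a, 2.0, 2.0):.3f}")
print(f"difference of means t = {diff_of_means_t(fund_a, fund_b):.3f}")
print(f"chi-square = {chi_square_statistic(fund_a, 2.0):.3f}")
print(f"F = {f_statistic(fund_a, fund_b):.3f}")
```

Each statistic is then compared with the critical value for its own distribution (normal, t, chi-square, or F) at the chosen α.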
Practitioner trap: multiple testing. Test 20 strategies at α = 5%. Even if all 20 are useless, you should expect about one to appear "significant" by chance (20 × 0.05 = 1). This is why backtesting requires a Bonferroni correction (divide α by the number of tests) or another adjustment.
Real example: a quant fund tests 100 trading strategies. The top 5 show "statistically significant" outperformance at the 5% level. Without correction, this is exactly what you'd expect from luck alone (5% × 100 = 5). The Bonferroni-corrected α is 0.05/100 = 0.0005, so the top performers must now clear a much higher bar. CFA L1 doesn't test multiple-testing corrections explicitly, but it does test the recognition that significance found across many tests means little.
Another trap: data-snooping bias. Looking at data, finding a pattern, and then testing the hypothesis on the same data is circular. Always reserve out-of-sample data for testing what you found in sample.
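The multiple-testing arithmetic can be checked with a short simulation. This is a sketch under stated assumptions: the tests are independent, and every strategy is useless, so each p-value is uniform on [0, 1] (which holds for a continuous test statistic under a true H₀). The function names are illustrative, not from any library.

```python
import random


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive), assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_false_positives(n_strategies, alpha, n_trials, seed=0):
    """Average number of useless strategies that pass at `alpha` per trial.

    Each p-value is drawn as uniform(0, 1), i.e. every H0 is true.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        total += sum(rng.random() < alpha for _ in range(n_strategies))
    return total / n_trials


corrected = bonferroni_alpha(0.05, 100)
print(f"Bonferroni alpha for 100 tests: {corrected:.4f}")
print(f"P(at least one false positive in 20 tests): "
      f"{familywise_error_rate(0.05, 20):.3f}")
print("avg false positives, uncorrected:",
      simulate_false_positives(100, 0.05, 2000))
print("avg false positives, corrected:  ",
      simulate_false_positives(100, corrected, 2000))
```

With 20 independent tests at 5%, the chance of at least one false positive is about 64%, which is why a single "significant" backtest among many says little on its own.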
Sources
- CFA Institute Curriculum: Level 1, Quantitative Methods, Reading 6
- SEBI fund advertisement standards: significance claims must be backed
- GIPS: performance reporting standards
Common mistakes
- Confusing one-tailed and two-tailed tests, and so using the wrong critical value.
- Reporting "p-value < 0.05" without context. Always report the effect size too.
- Multiple testing without correction, which produces "significant" results by chance.
- Using the normal Z when the sample is small and σ is unknown. Use t instead.
- Treating "fail to reject H₀" as "H₀ is true". It only means the evidence was insufficient.
Frequently asked
What's the difference between p-value and significance level?
How does sample size affect hypothesis tests?
When should I reject H₀ in finance?
Practice questions
Q 1. A two-tailed t-test at 5% significance with n = 30 has a critical t-value closest to:
- (a) 1.65
- (b) 1.96
- (c) 2.04
- (d) 2.58
Q 2. A Type I error is:
- (a) Rejecting H₀ when it's true (false positive)
- (b) Failing to reject H₀ when it's false (false negative)
- (c) Computing variance incorrectly
- (d) Using the wrong distribution
Q 3. A test statistic t = 2.5 with df = 30 has an approximate two-tailed p-value closest to:
- (a) 0.005
- (b) 0.018
- (c) 0.05
- (d) 0.10
Q 4. A fund's 3-year alpha is 1.2%, with a t-statistic of 1.4. At 5% one-tailed (t-critical = 1.69), the conclusion is:
- (a) Reject H₀: the manager has skill
- (b) Fail to reject H₀: the alpha could be luck
- (c) Inconclusive
- (d) Re-test with a smaller sample
Q 5. The power of a test is:
- (a) Probability of rejecting H₀ when H₁ is true
- (b) Probability of a Type I error
- (c) Significance level
- (d) 1 − Type I error rate
Q 6. Multiple testing without correction (e.g., testing 20 strategies at 5%) leads to:
- (a) No problem
- (b) Inflated false-positive rate
- (c) Reduced power
- (d) Higher confidence