Hypothesis testing

In this chapter: Null and alternative hypotheses · t-tests, z-tests, p-values · Type I and Type II errors

Hypothesis testing answers one question: is what I'm seeing real, or could it be random chance? Every claim about fund performance, every "this strategy beat the market", every "this asset class is undervalued" should be treated as a hypothesis to test, not a fact to assert. Master this reading and you become much harder to fool, whether by yourself or by others.

Foundation

Hypothesis test structure:
  • Null hypothesis (H₀): the default position of no effect or no difference. E.g., "fund alpha = 0".
  • Alternative hypothesis (H₁): what we suspect. E.g., "fund alpha > 0".

We compute a test statistic (Z or t) from the data. If it exceeds the critical value, we reject H₀ in favour of H₁.

Two error types:
  • Type I error: rejecting H₀ when it is true (false positive). Probability = α (the significance level).
  • Type II error: failing to reject H₀ when it is false (false negative). Probability = β.

Power of a test = 1 − β. Higher power means the test is more likely to detect real effects.

P-value: the probability of observing data at least this extreme if H₀ is true. If p-value < α, reject H₀.
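As a minimal sketch of the p-value decision rule above, the following runs an upper-tail z-test of H₀: alpha = 0 against H₁: alpha > 0 using Python's standard-library `statistics.NormalDist`. The function name `one_tailed_z_test` and every input figure (mean alpha, σ, n) are hypothetical and chosen only for illustration; σ is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tail z-test: H0 mean = mu0 vs H1 mean > mu0 (sigma assumed known)."""
    std_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / std_error
    # p-value: probability of a z at least this large if H0 is true
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical inputs: mean monthly alpha 0.5%, sigma 2%, 36 monthly observations
z, p, reject = one_tailed_z_test(0.5, 0.0, 2.0, 36)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

With these illustrative numbers the p-value is above 5%, so H₀ is not rejected: the observed alpha could plausibly be chance.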

Deep Dive

Common tests:
  1. Z-test (large sample, known σ): Z = (X̄ − μ₀) / (σ/√n)
  2. T-test (small sample or unknown σ): t = (X̄ − μ₀) / (s/√n), df = n − 1
  3. Difference of means: t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂)
  4. Chi-square (variance test): χ² = (n − 1)s²/σ₀²
  5. F-test (compare variances): F = s₁²/s₂²

One-tailed vs two-tailed:
  • One-tailed: H₁ specifies a direction (e.g., μ > μ₀). Use the one-tailed critical value.
  • Two-tailed: H₁ is non-directional (e.g., μ ≠ μ₀). Use the two-tailed critical value (split α between the tails).

Decision rule:
  • Two-tailed: if |test statistic| > critical value → reject H₀. For a one-tailed test, compare the statistic with the critical value in the direction stated by H₁.
  • Equivalently: if p-value < α → reject H₀
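The test statistics listed above can be sketched with Python's standard-library `statistics` module. This is an illustrative sketch, not a reference implementation: the function names and the sample data are hypothetical, and the critical value 2.776 (two-tailed, 5%, df = 4) is taken from a standard t-table because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), returned with df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


def difference_of_means_t(sample1, sample2):
    """t = (x_bar1 - x_bar2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))
    return (mean(sample1) - mean(sample2)) / se


def chi_square_variance(sample, sigma0_sq):
    """chi^2 = (n - 1) * s^2 / sigma0^2, for a test of a single variance."""
    return (len(sample) - 1) * variance(sample) / sigma0_sq


def f_ratio(sample1, sample2):
    """F = s1^2 / s2^2, for comparing two sample variances."""
    return variance(sample1) / variance(sample2)


# Hypothetical monthly excess returns (%), for illustration only
returns_a = [1.0, 2.0, 3.0, 4.0, 5.0]
returns_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = one_sample_t(returns_a, mu0=0.0)
t_critical = 2.776  # two-tailed, 5%, df = 4, from a standard t-table
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > t_critical}")
print(f"difference of means t = {difference_of_means_t(returns_a, returns_b):.3f}")
print(f"chi-square = {chi_square_variance(returns_a, sigma0_sq=2.0):.2f}")
print(f"F = {f_ratio(returns_a, returns_b):.2f}")
```

Each function returns only the statistic; the reject decision still depends on looking up the critical value for the relevant distribution and degrees of freedom.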

Advanced

Practitioner trap: multiple testing. Test 20 strategies at α = 5%. Even if all 20 are useless, you should expect about one to appear "significant" by chance (20 × 5% = 1). This is why backtesting requires a Bonferroni correction (divide α by the number of tests) or another adjustment.

Real example: a quant fund tests 100 trading strategies. The top 5 show "statistically significant" outperformance at 5%. Without correction, this is exactly what you'd expect from luck alone (5% × 100 = 5). The Bonferroni-corrected α is 0.05/100 = 0.0005, so the top performers must now clear a much higher bar. CFA L1 doesn't test multiple-testing corrections explicitly, but it does test the recognition that significance found across many tests means little.

Another trap: data-snooping bias. Looking at data, finding a pattern, and then testing the hypothesis on the same data is circular. Always reserve out-of-sample data for testing what you found in sample.
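The multiple-testing arithmetic above can be reproduced in a few lines. This is a sketch only: the function name `multiple_testing_summary` is hypothetical, and the family-wise error rate shown assumes the tests are independent, which real backtests of related strategies often are not.

```python
def multiple_testing_summary(num_tests, alpha=0.05):
    """Expected false positives, family-wise error rate, and Bonferroni alpha."""
    expected_false_positives = num_tests * alpha
    # Chance of at least one false positive, ASSUMING independent tests
    family_wise_error = 1 - (1 - alpha) ** num_tests
    bonferroni_alpha = alpha / num_tests
    return expected_false_positives, family_wise_error, bonferroni_alpha


for k in (20, 100):
    expected, fwer, adj_alpha = multiple_testing_summary(k)
    print(f"{k} tests: expect {expected:.0f} false positives, "
          f"P(at least one) = {fwer:.1%}, Bonferroni alpha = {adj_alpha:.4f}")
```

Under the independence assumption, 20 useless strategies tested at 5% give roughly a 64% chance of at least one spurious "significant" result, which is why the per-test threshold has to be tightened.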

Regulatory references

  • CFA Institute Curriculum: Level 1, Quantitative Methods, Reading 6
  • SEBI fund advertisement standards: significance claims must be backed
  • GIPS: performance reporting standards

Common mistakes & pitfalls

  • Confusing one-tailed and two-tailed tests, and so using the wrong critical value.
  • Reporting "p-value < 0.05" without context. Always report the effect size too.
  • Multiple testing without correction, which produces "significant" results by chance.
  • Using a normal Z when the sample is small and σ is unknown. Use t instead.
  • Treating "fail to reject H₀" as "H₀ is true". It only means the evidence is insufficient.

Frequently asked

What's the difference between p-value and significance level?
Significance level α is the threshold we set in advance (typically 5% or 1%). P-value is the actual probability of seeing data this extreme under H₀. We reject H₀ if p-value < α.
How does sample size affect hypothesis tests?
Larger n → smaller SE → easier to reject H₀ for given true effect. With huge samples, even tiny effects become "statistically significant" — but may be economically trivial. CFA distinguishes statistical significance from economic significance.
When should I reject H₀ in finance?
Standard threshold is 5%, but for important decisions (capital allocation, manager selection), 1% is more conservative. Always pre-specify before testing — don't adjust α after seeing data.
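To illustrate the sample-size answer above, the sketch below holds a small true effect fixed and lets n grow. It assumes an upper-tail z-test with known σ; the function name `z_stat_and_power` and the figures (a 0.1% monthly effect, σ = 2%) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_stat_and_power(true_effect, sigma, n, alpha=0.05):
    """Upper-tail z-test of H0: mean = 0, with sigma assumed known.

    Returns the z-statistic obtained if the sample mean equals the true
    effect, and the power of the test against that effect.
    """
    std_error = sigma / sqrt(n)
    z_if_observed = true_effect / std_error
    z_critical = NormalDist().inv_cdf(1 - alpha)
    # Power: probability the z-statistic exceeds z_critical when H1 is true
    power = 1 - NormalDist().cdf(z_critical - true_effect / std_error)
    return z_if_observed, power


# Same tiny effect, increasing sample size
for n in (36, 400, 10_000):
    z, power = z_stat_and_power(true_effect=0.1, sigma=2.0, n=n)
    print(f"n = {n:>6}: z = {z:.2f}, power = {power:.3f}")
```

The effect never changes, only n does, yet the z-statistic and the power both rise with sample size. That is the gap between statistical and economic significance: a 0.1% effect can become detectable without becoming meaningful.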

Practice questions

Q 1
A two-tailed t-test at 5% significance with n = 30 has critical t-value closest to:
  (a) 1.65
  (b) 1.96
  (c) 2.04
  (d) 2.58
Correct: (c) 2.04
For two-tailed at 5% with df = 29 (n−1), t-critical ≈ 2.045. The closest answer is 2.04. (Z would be 1.96; t is slightly larger for finite df.)
Q 2
A Type I error is:
  (a) Rejecting H₀ when it's true (false positive)
  (b) Failing to reject H₀ when it's false (false negative)
  (c) Computing variance incorrectly
  (d) Using wrong distribution
Correct: (a) Rejecting H₀ when it's true (false positive)
Type I = false positive. Type II = false negative. The probability of Type I is α; Type II is β. Power = 1 − β.
Q 3
A test statistic t = 2.5 with df = 30 has approximate p-value (two-tailed) closest to:
  (a) 0.005
  (b) 0.018
  (c) 0.05
  (d) 0.10
Correct: (b) 0.018
Two-tailed p-value ≈ 0.018 for t = 2.5 at df = 30. Reject H₀ at 5%, fail to reject at 1%.
Q 4
A fund's 3-year alpha is 1.2%, t-statistic 1.4. At 5% one-tailed (t-critical = 1.69), the conclusion is:
  (a) Reject H₀: manager has skill
  (b) Fail to reject H₀: alpha could be luck
  (c) Inconclusive
  (d) Re-test with smaller sample
Correct: (b) Fail to reject H₀ — alpha could be luck
t = 1.4 < t-critical = 1.69. Fail to reject H₀. The 1.2% alpha is not statistically distinguishable from zero given the noise.
Q 5
Power of a test is:
  (a) Probability of rejecting H₀ when H₁ is true
  (b) Probability of Type I error
  (c) Significance level
  (d) 1 − Type I error rate
Correct: (a) Probability of rejecting H₀ when H₁ is true
Power = 1 − β = probability of correctly detecting a true effect. It increases with a larger sample size, a larger true effect, lower variance, or a higher significance level α.
Q 6
Multiple testing without correction (e.g., testing 20 strategies at 5%) leads to:
  (a) No problem
  (b) Inflated false-positive rate
  (c) Reduced power
  (d) Higher confidence
Correct: (b) Inflated false-positive rate
Test 20 useless strategies at 5% and you should expect about 1 to look "significant" by chance. A Bonferroni correction (testing each strategy at α/20 = 0.0025) keeps the family-wise error rate at or below 5%.
Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser.