Linear regression
In this chapter: Single-variable regression · Coefficient interpretation, R², SSE · Beta estimation in CAPM
Regression fits a line through data: how does Y change as X changes? Equity beta — how much a stock moves with the market — comes from a regression. Factor models, performance attribution, return forecasting — all regression. CFA L1 tests single-variable regression; L2 deepens to multiple regression. Master the single-variable case here and the rest follows.
Linear regression: Y = α + β·X + ε
- Y: dependent variable (what we predict)
- X: independent variable (the predictor)
- α: intercept, the expected value of Y when X = 0
- β: slope, the change in Y per unit change in X
- ε: error term, the variation in Y that X does not explain
For finance:
- Y = stock return
- X = market return
- β = sensitivity to market moves
- α = return above what CAPM predicts (Jensen's alpha, often read as manager skill) when both returns are measured in excess of the risk-free rate
Least-squares estimation: choose α and β to minimise the sum of squared residuals (SSE).
Key formula: β = Cov(X, Y) / Var(X) = ρ_XY × σ_Y / σ_X
R²: the fraction of Y's variance explained by X, bounded between 0 and 1. A higher R² means X explains more of Y. In single-variable regression, R² equals the squared correlation ρ_XY².
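The least-squares formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production routine: the function name `fit_line` and the return figures are made up for the example, and the returns are not real market data. It computes the slope as covariance over variance, backs out the intercept from the means, and then derives SSE and R².

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x + residual (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: sample covariance over sample variance (same ddof on both).
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Intercept: the fitted line passes through the point of means.
    alpha = y.mean() - beta * x.mean()
    residuals = y - (alpha + beta * x)
    sse = float(np.sum(residuals ** 2))        # sum of squared errors
    sst = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    r_squared = 1.0 - sse / sst
    return alpha, beta, r_squared, sse

# Made-up monthly returns in percent, for illustration only.
market = np.array([1.2, -0.8, 2.5, 0.3, -1.9, 3.1, 0.7, -0.4])
stock = np.array([1.6, -1.1, 2.4, 0.9, -2.3, 3.0, 1.2, -0.2])

alpha, beta, r_squared, sse = fit_line(market, stock)

# Cross-check the second form of the slope formula: rho * sigma_Y / sigma_X.
rho = np.corrcoef(market, stock)[0, 1]
beta_from_rho = rho * stock.std(ddof=1) / market.std(ddof=1)

print(f"alpha={alpha:.3f}  beta={beta:.3f}  R^2={r_squared:.3f}  SSE={sse:.3f}")
print(f"beta via correlation form={beta_from_rho:.3f}")
```

Both forms of the slope formula give the same β, and in this single-variable setting R² matches the squared correlation.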
CAPM regression, the workhorse of equity finance. Monthly returns of HDFC Bank regressed on NIFTY 50 monthly returns over 36 months:
Result: HDFC Bank β = 1.05, α = 0.3% per month, R² = 0.65
Interpretation:
- β = 1.05: on average, HDFC Bank moves 1.05% for every 1% move in the NIFTY (slightly aggressive).
- α = 0.3% per month: return beyond what beta predicts, about 3.6% a year on a simple (non-compounded) basis. Could be skill, could be luck.
- R² = 0.65: 65% of the variation in HDFC Bank's returns is explained by the NIFTY. The other 35% is HDFC-specific (idiosyncratic risk).
The standard error of the β estimate determines whether β is statistically distinguishable from a hypothesised value (e.g., β = 1).
Is α significant? Compute the t-statistic t = α / SE(α). If |t| > 2 (roughly the 5% significance level), reject H₀ that α = 0. For most active managers, α is statistically indistinguishable from zero over typical sample sizes. Detecting real skill needs many years of data and high active risk.
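The significance tests described above can be sketched with the textbook standard-error formulas for single-variable regression. The data here are simulated, with true values α = 0.3 and β = 1.05 chosen only to echo the example. They are not actual HDFC Bank or NIFTY returns, and the noise level and the helper name `ols_with_t_stats` are assumptions made for illustration.

```python
import numpy as np

def ols_with_t_stats(x, y):
    """Single-variable OLS with standard errors and t-statistics (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    alpha = y.mean() - beta * x.mean()
    residuals = y - alpha - beta * x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    s2 = np.sum(residuals ** 2) / (n - 2)
    se_beta = np.sqrt(s2 / sxx)
    se_alpha = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx))
    return {
        "alpha": alpha,
        "beta": beta,
        "se_alpha": se_alpha,
        "se_beta": se_beta,
        "t_alpha": alpha / se_alpha,             # tests H0: alpha = 0
        "t_beta_vs_1": (beta - 1.0) / se_beta,   # tests H0: beta = 1
    }

# Simulated 36 months of returns in percent (assumed parameters, not real data).
rng = np.random.default_rng(0)
market = rng.normal(1.0, 5.0, size=36)
stock = 0.3 + 1.05 * market + rng.normal(0.0, 3.5, size=36)

result = ols_with_t_stats(market, stock)
for name, value in result.items():
    print(f"{name}: {value:.3f}")

# Rule of thumb from the text: |t| > 2 rejects the null at roughly the 5% level.
print("alpha significant:", abs(result["t_alpha"]) > 2)
print("beta different from 1:", abs(result["t_beta_vs_1"]) > 2)
```

Testing β against 1 rather than 0 is a deliberate choice here: for a stock, the interesting question is usually whether it is more or less sensitive than the market, not whether it has any market exposure at all.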
Beyond simple OLS, practitioner concerns:
1. Heteroscedasticity: the variance of the residuals changes with X. Common in cross-sectional regressions across firms of different sizes. Fix: White (heteroscedasticity-robust) or Newey-West standard errors.
2. Autocorrelation: residuals are correlated across observations (typical in time series). Fix: Newey-West standard errors, or include lagged variables.
3. Outlier sensitivity: a few extreme observations can dominate β estimates. Robust regression methods downweight outliers.
4. Non-stationarity: regressing one trending variable on another produces "spurious" results. Test for stationarity first.
5. Multicollinearity (in multiple regression, an L2 topic): independent variables are correlated with each other. This inflates standard errors and makes β estimates unstable.
6. Time-varying β: the true β changes over time. Rolling-window regression captures this.
CFA L1 tests recognition of these issues; L2 tests the fixes; the practitioner deals with all of them daily.
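Of the concerns listed, rolling-window regression for time-varying β is the easiest to sketch. The example below re-estimates the slope over each trailing 36-observation window. The data are simulated with a β that drifts from 0.8 to 1.4; the drift, the noise levels, the window length and the helper name `rolling_beta` are all assumptions chosen for illustration.

```python
import numpy as np

def rolling_beta(x, y, window):
    """OLS slope of y on x over each trailing window (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if window < 2 or window > x.size:
        raise ValueError("window must be between 2 and the sample length")
    betas = []
    for end in range(window, x.size + 1):
        xs = x[end - window:end]
        ys = y[end - window:end]
        # Same slope formula as the full-sample case, applied to the window only.
        betas.append(np.cov(xs, ys, ddof=1)[0, 1] / np.var(xs, ddof=1))
    return np.array(betas)

# Simulated 120 months of returns in percent, with a slowly rising true beta.
rng = np.random.default_rng(1)
n = 120
market = rng.normal(1.0, 5.0, size=n)
true_beta = np.linspace(0.8, 1.4, n)
stock = true_beta * market + rng.normal(0.0, 2.0, size=n)

betas = rolling_beta(market, stock, window=36)
full_sample_beta = np.cov(market, stock, ddof=1)[0, 1] / np.var(market, ddof=1)

print(f"windows estimated: {betas.size}")
print(f"first window beta={betas[0]:.2f}, last window beta={betas[-1]:.2f}")
print(f"full-sample beta={full_sample_beta:.2f}")
```

A single full-sample β averages over the drift, while the rolling estimates show it. The trade-off is that shorter windows react faster but give noisier estimates.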
Common mistakes
- Reporting beta without a standard error or confidence interval.
- Assuming β is constant when it actually varies over time.
- Confusing R² with prediction accuracy: R² measures the share of variance explained, not absolute predictive accuracy.
- Forgetting that a high R² shows correlation, not causation.
- Using small samples (n < 30), which makes beta estimates very unstable.
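The small-sample point can be checked with a quick simulation. The sketch below repeatedly draws samples from a process with a known β of 1.0 and measures how widely the estimated β scatters at different sample sizes. The volatility figures, trial count and helper name `beta_spread` are assumptions made for illustration, not calibrated to any real stock.

```python
import numpy as np

def beta_spread(n_obs, n_trials=2000, true_beta=1.0, seed=0):
    """Standard deviation of estimated betas across simulated samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_trials)
    for i in range(n_trials):
        market = rng.normal(0.0, 5.0, size=n_obs)
        stock = true_beta * market + rng.normal(0.0, 5.0, size=n_obs)
        estimates[i] = np.cov(market, stock, ddof=1)[0, 1] / np.var(market, ddof=1)
    return estimates.std(ddof=1)

spread_12 = beta_spread(12)     # one year of monthly data
spread_60 = beta_spread(60)     # five years
spread_240 = beta_spread(240)   # twenty years

print(f"spread of beta estimates, n=12:  {spread_12:.3f}")
print(f"spread of beta estimates, n=60:  {spread_60:.3f}")
print(f"spread of beta estimates, n=240: {spread_240:.3f}")
```

The spread shrinks roughly with the square root of the sample size, which is why a beta quoted without its standard error says little about how far the true value might lie from the estimate.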
Frequently asked
What's the difference between alpha and excess return?
How do I interpret a high R² with low alpha?
How long a track record do I need to estimate beta reliably?
Practice questions
Q 1. For an asset with σ = 25%, market σ = 15%, and correlation 0.6, beta is:
- (a) 0.36
- (b) 0.6
- (c) 1.0
- (d) 1.5
Q 2. A regression's R² is 0.81. The correlation coefficient (assuming positive slope) is:
- (a) 0.40
- (b) 0.66
- (c) 0.81
- (d) 0.90
Q 3. A stock has β = 1.2 in a market expected to return 15%. The risk-free rate is 7%. The CAPM-expected return is:
- (a) 7%
- (b) 12.6%
- (c) 16.6%
- (d) 22%
Q 4. A fund's alpha t-statistic is 0.5. We conclude:
- (a) The alpha is significantly positive
- (b) The alpha is statistically indistinguishable from zero
- (c) The fund will outperform
- (d) The fund has no market exposure
Q 5. In the regression Y = α + βX + ε, if X increases by 1 unit, Y changes by:
- (a) α units
- (b) β units
- (c) ε units
- (d) R² units
Q 6. A high R² (0.95) in a fund-vs-benchmark regression typically suggests:
- (a) The fund is highly active
- (b) The fund tracks the benchmark closely (closet indexer)
- (c) The fund has high alpha
- (d) The fund is risk-free