Multiple regression — interpretation and inference
In this chapter: Coefficients and t-tests · F-test for joint significance · R² and adjusted R² · Multicollinearity, heteroskedasticity, serial correlation
Multiple regression is the workhorse of empirical finance. CFA L2 tests not whether you can run a regression, but whether you can read its output critically and identify violations.
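Multiple regression model: Yi = b0 + b1X1i + b2X2i + ... + bkXki + εi, for i = 1, ..., n observations and k independent variables.

Key output to read:
- Coefficient estimates (slopes): the estimated change in Y for a 1-unit change in Xj, holding the other Xs constant.
- Standard errors → t-statistic = coefficient / SE, testing H0: bj = 0 with n − k − 1 degrees of freedom.
- |t| > about 2 (or p < 0.05): coefficient statistically significant at the 5% level (the two-tailed critical value is about 1.96 in large samples and slightly higher in small ones).
- F-statistic = (RSS / k) / (SSE / (n − k − 1)), where RSS is the regression (explained) sum of squares and SSE the sum of squared errors: tests joint significance of all slopes (H0: b1 = b2 = ... = bk = 0).
- R² = RSS / SST (SST = total sum of squares): share of the variance of Y explained by the model. It never decreases when a variable is added.
- Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1): penalises adding variables that do not improve the fit enough to justify the lost degree of freedom.

Assumptions of OLS:
1. Linearity in parameters.
2. Independence of errors (no serial correlation).
3. Homoskedasticity (constant variance of errors).
4. Errors normally distributed.
5. No perfect multicollinearity among the Xs.
6. Errors uncorrelated with the Xs (exogeneity).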
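Each item in the output list can be reproduced from its textbook formula. The sketch below assumes NumPy and simulated data; the helper name `ols_summary`, the true coefficients (1.0 and 0.5), and the seed are all illustrative, not taken from any curriculum or library. It fits one model, then adds a pure-noise regressor to show that R² cannot fall while adjusted R² charges a penalty for the extra variable.

```python
import numpy as np


def ols_summary(X, y):
    """OLS fit of y on X (n x k, no intercept column) plus the usual output."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])         # prepend intercept column
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)     # b0, b1, ..., bk
    resid = y - Z @ b
    sse = resid @ resid                           # sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)             # total sum of squares
    rss = sst - sse                               # regression sum of squares
    df = n - k - 1                                # residual degrees of freedom
    s2 = sse / df                                 # estimated error variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))
    r2 = rss / sst
    return {
        "b": b,
        "se": se,
        "t": b / se,                              # each tests H0: bj = 0
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df,
        "f": (rss / k) / (sse / df),              # H0: all slopes = 0
    }


# Simulated data (hypothetical): x2 has no true effect on y.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

base = ols_summary(x1.reshape(-1, 1), y)
noisy = ols_summary(np.column_stack([x1, x2]), y)
print("t-stats:", base["t"].round(2))
print("R2:", round(base["r2"], 4), "->", round(noisy["r2"], 4))
print("adj R2:", round(base["adj_r2"], 4), "->", round(noisy["adj_r2"], 4))
```

With a single regressor the F-statistic equals the square of the slope's t-stat, which is a quick consistency check when reading output.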
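Three classical violations and remedies:

**Multicollinearity**: two or more Xs highly correlated with each other.
- Symptom: high R² and a significant F-statistic, but individual t-stats insignificant; coefficients change wildly when variables are added or dropped.
- Test: VIF (variance inflation factor) for each Xj = 1 / (1 − R²j), where R²j comes from regressing Xj on the other Xs. VIF > 10 is the usual rule-of-thumb threshold for a problem.
- Effect: coefficients remain unbiased, but their SEs are inflated → t-stats too small (true effects can look insignificant).
- Fix: drop redundant variables, principal components, ridge regression.

**Heteroskedasticity**: variance of errors not constant. The damaging form is conditional heteroskedasticity, where the error variance is related to the Xs.
- Symptom: residual plot shows a fan shape; Breusch-Pagan test rejects (statistic = n × R² from regressing the squared residuals on the Xs, chi-square with k degrees of freedom).
- Effect: standard errors biased (typically too small) → t-stats and F-test unreliable. Coefficients still unbiased.
- Fix: White (heteroskedasticity-consistent) standard errors; weighted least squares.

**Serial correlation**: errors correlated across observations.
- Common in time series.
- Test: Durbin-Watson, DW ≈ 2(1 − r), where r is the correlation between consecutive residuals. Rule of thumb: 1.5-2.5 OK; < 1.5 positive correlation; > 2.5 negative correlation. A formal test compares DW with tabulated lower and upper critical values.
- Effect: with positive serial correlation, SEs biased downward → false significance. Coefficients unbiased, unless a lagged value of Y is one of the Xs, in which case they are also inconsistent.
- Fix: Newey-West SEs; first-difference; lagged variables.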
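The three diagnostics reduce to short formulas. This is a minimal NumPy sketch on simulated data with each problem planted deliberately; the helper names (`fit_resid`, `vif`, `breusch_pagan_lm`, `durbin_watson`), parameters, and seed are hypothetical. It uses the n × R² form of the Breusch-Pagan statistic described above; statistical packages may report a variant.

```python
import numpy as np


def fit_resid(X, y):
    """OLS residuals of y on X (intercept added)."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ b


def r_squared(X, y):
    e = fit_resid(X, y)
    return 1 - (e @ e) / np.sum((y - y.mean()) ** 2)


def vif(X):
    """VIF_j = 1 / (1 - R2_j), R2_j from regressing X_j on the other Xs."""
    return np.array([
        1 / (1 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ])


def breusch_pagan_lm(X, resid):
    """n * R2 from regressing squared residuals on the Xs; chi2(k) under H0."""
    return len(resid) * r_squared(X, resid ** 2)


def durbin_watson(resid):
    """Sum of squared first differences / sum of squares; about 2 * (1 - r)."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(1)
n = 2000

# 1. Multicollinearity: x2 is nearly a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
print("VIF:", vif(np.column_stack([x1, x2])).round(1))

# 2. Heteroskedasticity: error standard deviation grows with z.
z = rng.uniform(1, 5, size=n)
y_het = 2 + 0.5 * z + z * rng.normal(size=n)
bp = breusch_pagan_lm(z, fit_resid(z, y_het))
print("BP LM:", round(bp, 1), "vs chi2(1) 5% critical value 3.84")

# 3. Serial correlation: AR(1) errors with rho = 0.8.
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()
x = rng.normal(size=n)
y_ar = 1 + x + u
dw = durbin_watson(fit_resid(x, y_ar))
print("DW:", round(dw, 2))
```

For AR(1) errors with rho = 0.8, the approximation DW ≈ 2(1 − r) puts the statistic near 0.4, far inside the "positive serial correlation" zone.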
L2 vignette pattern: a regression output is shown. Questions ask:
1. Which coefficient is significant?
2. Is the model jointly significant?
3. What violation is suspected (often given as a residual chart)?
4. What remedial action would you take?

The trap is "fixing" with a remedy that doesn't address the actual violation. Memorise: heteroskedasticity → White SEs, serial correlation → Newey-West SEs (which also correct for heteroskedasticity), multicollinearity → variable selection.
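The White and Newey-West remedies change only the standard errors, never the coefficients. The sketch below assumes NumPy and shows the "sandwich" formulas behind them: the basic White estimator without a small-sample correction (often labelled HC0) and Newey-West with Bartlett weights and a user-chosen `max_lag`. The function name, data, and lag choice are hypothetical, and statistical packages add refinements omitted here.

```python
import numpy as np


def ols_with_robust_se(X, y, max_lag=None):
    """OLS coefficients with conventional, White, and (optionally) Newey-West SEs."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    bread = np.linalg.inv(Z.T @ Z)
    # Conventional OLS: assumes constant variance and no autocorrelation.
    s2 = (e @ e) / (n - Z.shape[1])
    out = {"b": b, "ols": np.sqrt(np.diag(s2 * bread))}
    # White (HC0) sandwich: the "meat" is the sum of e_t^2 * z_t z_t'.
    Ze = Z * e[:, None]
    meat = Ze.T @ Ze
    out["white"] = np.sqrt(np.diag(bread @ meat @ bread))
    if max_lag is not None:
        # Newey-West: add Bartlett-weighted cross-products of lagged terms.
        s = meat.copy()
        for lag in range(1, max_lag + 1):
            w = 1 - lag / (max_lag + 1)
            gamma = Ze[lag:].T @ Ze[:-lag]
            s += w * (gamma + gamma.T)
        out["newey_west"] = np.sqrt(np.diag(bread @ s @ bread))
    return out


rng = np.random.default_rng(2)
n = 1000

# Heteroskedastic errors: variance proportional to x^2.
x = rng.normal(size=n)
y = 1 + 0.5 * x + np.abs(x) * rng.normal(size=n)
het = ols_with_robust_se(x, y)
print("slope SE  OLS:", het["ols"][1].round(4), " White:", het["white"][1].round(4))

# Serially correlated regressor and errors (both AR(1), rho = 0.8).
xs = np.zeros(n)
u = np.zeros(n)
for t in range(1, n):
    xs[t] = 0.8 * xs[t - 1] + rng.normal()
    u[t] = 0.8 * u[t - 1] + rng.normal()
ys = 1 + 0.5 * xs + u
ser = ols_with_robust_se(xs, ys, max_lag=10)
print("slope SE  OLS:", ser["ols"][1].round(4), " Newey-West:", ser["newey_west"][1].round(4))
```

The coefficient vector `b` is the same under all three error calculations, so a mismatched remedy leaves the estimates unchanged and simply reports the wrong uncertainty. With `max_lag=0` the Newey-West formula collapses to White.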
Sources
- CFA Institute Quant Methods curriculum
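Common mistakes
- Reading a high R² as proof of a good model: R² never falls when a variable is added, and a high R² can sit alongside multicollinearity that leaves individual coefficients unreliable.
- Using conventional OLS SEs when heteroskedasticity is present (overconfident inference).
- Forgetting that a significant slope is not evidence of causation: confounders and omitted-variable bias can produce one.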
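Omitted-variable bias is easy to see in a simulation. In this hypothetical NumPy example (all coefficients, the sample size, and the seed are made up), Y depends on both X and a confounder Z that is correlated with X. Regressing Y on X alone attributes part of Z's effect to X, so the slope is precisely estimated yet wrong as a causal effect.

```python
import numpy as np


def slopes(y, *regressors):
    """OLS slope estimates (intercept included in the fit but not returned)."""
    Z = np.column_stack([np.ones(len(y)), *regressors])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b[1:]


rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(size=n)                   # confounder
x = 0.8 * z + rng.normal(size=n)         # X is partly driven by Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

short = slopes(y, x)        # Z omitted
full = slopes(y, x, z)      # Z included
# Theory: the short slope converges to 1 + 2 * Cov(x, z) / Var(x) = 1 + 1.6 / 1.64,
# about 1.98, while the full-model slope on X converges to the true 1.0.
print("slope on X, Z omitted: ", short[0].round(3))
print("slope on X, Z included:", full[0].round(3))
```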
Frequently asked
How do I know if a coefficient is significant?
What if R² is high but no t-stats are significant?
Practice questions
Q 1. A model has a high R² and a significant F-statistic, but only one t-stat is significant. The likely issue is:
- (a) Heteroskedasticity
- (b) Multicollinearity
- (c) Serial correlation
- (d) Non-linearity
Q 2. Heteroskedasticity affects:
- (a) Coefficient estimates
- (b) Standard errors
- (c) R²
- (d) F-statistic critical values
Q 3. A Durbin-Watson statistic of 0.9 suggests:
- (a) No issue
- (b) Positive serial correlation
- (c) Negative serial correlation
- (d) Heteroskedasticity
Q 4. Adjusted R² differs from R² because it:
- (a) Penalises adding variables
- (b) Is always higher
- (c) Ignores the intercept
- (d) Applies only to time series
Q 5. A regression has a significant F-statistic but no significant individual t-stat. Best remedy:
- (a) Add more variables
- (b) Investigate multicollinearity, drop redundant variables
- (c) Use a larger sample
- (d) Increase α