Multiple regression — interpretation and inference — CFA L2 Quant

Multiple regression is the workhorse of empirical finance. CFA L2 tests not whether you can run a regression, but whether you can read its output critically and identify violations.

Foundation

Multiple regression model: Yi = b0 + b1X1i + b2X2i + ... + bkXki + εi.

Key output to read:

- Coefficient estimates (slopes): expected change in Y for a 1-unit change in Xj, holding the other Xs constant.
- Standard errors: t-statistic = coefficient / SE (for H0: coefficient = 0).
- |t-stat| > 2 (or p < 0.05): coefficient statistically significant at roughly the 5% level.
- F-statistic: tests joint significance of all slopes (H0: all slopes = 0).
- R²: share of the variance of Y explained by the model.
- Adjusted R²: penalises adding useless variables.

Assumptions of OLS:

1. Linearity in parameters.
2. Independence of errors (no serial correlation).
3. Homoskedasticity (constant variance of errors).
4. Errors normally distributed.
5. No perfect multicollinearity among Xs.
6. Errors uncorrelated with Xs (exogeneity).
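The summary statistics above are linked by simple formulas, which a short sketch can make concrete. The numbers below (n = 60 observations, k = 3 slopes, R² = 0.40, a coefficient of 0.80 with SE 0.25) are hypothetical, and the function names are illustrative rather than taken from any library.

```python
def t_statistic(coef, se):
    """t-statistic for H0: coefficient = 0."""
    return coef / se


def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """F-statistic for H0: all k slopes = 0, written in terms of R²."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical regression output (assumed values, not real data)
n, k, r2 = 60, 3, 0.40

t = t_statistic(0.80, 0.25)      # about 3.2, above the rule-of-thumb 2
adj = adjusted_r2(r2, n, k)      # about 0.368
f = f_statistic(r2, n, k)        # about 12.44, with (k, n - k - 1) df

# Adding a fourth, nearly useless variable: R² creeps up to 0.402,
# but adjusted R² falls because of the lost degree of freedom.
adj_more = adjusted_r2(0.402, n, k + 1)

print(f"t = {t:.2f}, adj R2 = {adj:.3f}, F = {f:.2f}, adj R2 (k=4) = {adj_more:.3f}")
```

The last comparison is the point of adjusted R²: plain R² rose from 0.40 to 0.402, yet the adjusted figure dropped.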

Deep Dive

Three classical violations and remedies:

**Multicollinearity**: Xs highly correlated with each other.

- Symptom: high R² and significant F-statistic but individual t-stats insignificant; coefficients change wildly when adding/dropping variables.
- Test: VIF (variance inflation factor) > 10 is a common rule-of-thumb sign of a problem.
- Effect: standard errors inflated, so t-stats are low. Coefficients remain unbiased but imprecise.
- Fix: drop redundant variables, principal components, ridge regression.

**Heteroskedasticity**: variance of errors not constant.

- Symptom: residual plot shows fan-shape; Breusch-Pagan test rejects.
- Effect: standard errors biased, so t-stats are unreliable. Coefficients still unbiased.
- Fix: White (heteroskedasticity-consistent) standard errors; weighted least squares.

**Serial correlation**: errors correlated across observations.

- Common in time-series.
- Test: Durbin-Watson (rule of thumb: 1.5-2.5 OK; <1.5 suggests positive correlation; >2.5 suggests negative correlation).
- Effect: with positive serial correlation, SEs are biased downward, giving false significance. Coefficients remain unbiased (unless a lagged dependent variable is a regressor).
- Fix: Newey-West SEs; first-difference; lagged variables.
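The three diagnostics named above reduce to short formulas once the supporting regressions have been run. The sketch below assumes the auxiliary R² values are already available and uses made-up residual series; the function names are illustrative, not a library API.

```python
def vif(r2_j):
    """Variance inflation factor, given the R² from regressing X_j on the other Xs."""
    return 1.0 / (1.0 - r2_j)


def breusch_pagan_stat(n, aux_r2):
    """Breusch-Pagan statistic n * R², where R² comes from regressing the
    squared residuals on the Xs. Compared with a chi-square, k degrees of freedom."""
    return n * aux_r2


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared changes in the residuals
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical inputs (assumed for illustration only)
vif_x1 = vif(0.90)                       # about 10: at the rule-of-thumb threshold
bp = breusch_pagan_stat(60, 0.10)        # about 6.0; below 7.815, the 5% chi-square value for 3 df

persistent = [1.0, 0.8, 0.9, 0.5, 0.2, -0.3, -0.6, -0.8, -0.5, -0.2]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_persistent = durbin_watson(persistent)     # well below 1.5: positive serial correlation
dw_alternating = durbin_watson(alternating)   # above 2.5: negative serial correlation

print(f"VIF = {vif_x1:.1f}, BP = {bp:.1f}, "
      f"DW persistent = {dw_persistent:.2f}, DW alternating = {dw_alternating:.2f}")
```

Residuals that drift slowly from positive to negative give a small numerator and hence a low Durbin-Watson value; residuals that flip sign every period push it towards 4.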

Advanced

L2 vignette pattern: a regression output is shown. Questions ask:

1. Which coefficient is significant?
2. Is the model jointly significant?
3. What violation is suspected (often given as a residual chart)?
4. What remedial action would you take?

The trap is "fixing" with a remedy that doesn't address the actual violation. Memorise: heteroskedasticity → White SEs, serial corr → Newey-West, multicollinearity → variable selection.

Regulatory references

CFA Institute Quant Methods curriculum

Common mistakes & pitfalls

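Reading a high R² as proof of a good model: R² never falls when variables are added, and it can sit alongside multicollinearity that leaves individual coefficients unreliable.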
Using OLS SEs when heteroskedasticity is present (overconfident inference).
Forgetting that a significant slope is not causation: confounders and omitted-variable bias can produce one.

Frequently asked

How do I know if a coefficient is significant?

The absolute value of the t-statistic exceeds the critical value (about 2 for large n in a two-tailed test at the 5% level). Equivalently, the p-value is below 0.05.
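As a rough illustration of that rule, the sketch below uses the standard normal distribution as a large-sample stand-in for the t-distribution. That substitution is an assumption that only holds when n is large; for small samples the t critical value is higher. The function names and the coefficient values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(alpha=0.05):
    """Two-tailed critical value under the large-sample normal approximation."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def two_sided_p_value(t_stat):
    """Approximate two-tailed p-value for a t-statistic when n is large."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_stat)))


def is_significant(coef, se, alpha=0.05):
    """True if |coef / se| exceeds the two-tailed critical value."""
    return abs(coef / se) > critical_value(alpha)


# Hypothetical coefficients: the sign does not matter, only |t| does.
print(round(critical_value(), 2))          # about 1.96, the source of the "t > 2" shortcut
print(is_significant(-0.80, 0.25))         # |t| = 3.2
print(is_significant(0.30, 0.25))          # |t| = 1.2
print(two_sided_p_value(3.2) < 0.05)
```

A negative slope with t = -3.2 is just as significant as a positive one with t = 3.2, which is why the comparison uses the absolute value.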

What if R² is high but no t-stats are significant?

Most likely multicollinearity. The variables explain Y collectively, but the regression cannot separate their individual effects.

Practice questions

Q 1

A model has high R² and significant F-statistic, but only one t-stat is significant. The likely issue is:

(a) Heteroskedasticity
(b) Multicollinearity
(c) Serial correlation
(d) Non-linearity

Correct: (b) Multicollinearity

Highly correlated regressors can fit Y well jointly (high R², significant F) while inflated standard errors mask individual significance, because the variables share information.

Q 2

Heteroskedasticity affects:

(a) Coefficient estimates
(b) Standard errors
(c) R²
(d) F-statistic critical values

Correct: (b) Standard errors

Coefficients remain unbiased, but standard errors are biased → unreliable t-stats. Use White SEs.

Q 3

Durbin-Watson statistic of 0.9 suggests:

(a) No issue
(b) Positive serial correlation
(c) Negative serial correlation
(d) Heteroskedasticity

Correct: (b) Positive serial correlation

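As a rule of thumb, DW < 1.5 indicates positive serial correlation; ~2 = none; >2.5 = negative. Since DW ≈ 2(1 − r), a value of 0.9 implies a residual autocorrelation r of roughly 0.55.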

Q 4

Adjusted R² differs from R² because it:

(a) Penalises adding variables
(b) Is always higher
(c) Ignores the intercept
(d) Applies only to time series

Correct: (a) Penalises adding variables

Adjusted R² penalises useless variables; can decrease when adding poor predictors.

Q 5

A regression has F-stat significant but no individual t-stat significant. Best remedy:

(a) Add more variables
(b) Investigate multicollinearity, drop redundant variables
(c) Use larger sample
(d) Increase α

Correct: (b) Investigate multicollinearity, drop redundant variables

Classic multicollinearity symptom. Remedy: variable selection or PCA.

Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser.