Organising and visualising data
In this chapter: Frequency distributions, histograms, box plots · Heat maps and scatter plots · Measures of central tendency and dispersion
Before you can analyse data, you have to organise it. A frequency distribution sorts observations into bins. A histogram displays that distribution visually. A box plot summarises central tendency and spread. A scatter plot reveals the relationship between two variables. The CFA exam tests whether you can interpret these charts, not draw them. Become fluent at reading them and you can absorb research reports, fund factsheets, and economic data far more quickly.
Every dataset answers two big questions: where is the centre, and how spread out is the data?
Measures of central tendency:
- Arithmetic mean: sum / n.
- Median: the middle value.
- Mode: the most frequent value.
- Geometric mean: n-th root of the product; used for compound returns.
- Harmonic mean: n / sum of reciprocals; used for averaging multiples such as P/E across firms.
Measures of dispersion:
- Range: max − min.
- Variance: average squared deviation from the mean.
- Standard deviation: square root of the variance.
- Mean absolute deviation: average absolute deviation from the mean.
- Coefficient of variation: CV = SD / mean; used to compare risk across different return scales.
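The definitions above can be checked with Python's standard `statistics` module. This is a minimal sketch: the return series and the P/E ratios are hypothetical numbers chosen for illustration, and the population versions of variance and standard deviation are used to match the "average squared deviation" definition.

```python
import statistics

# Hypothetical annual returns in decimal form (illustrative, not real fund data).
returns = [0.12, 0.08, -0.05, 0.20, 0.08]

mean = statistics.mean(returns)            # arithmetic mean: sum / n
median = statistics.median(returns)        # middle value after sorting
mode = statistics.mode(returns)            # most frequent value
value_range = max(returns) - min(returns)  # max - min
variance = statistics.pvariance(returns)   # population variance: average squared deviation
std_dev = statistics.pstdev(returns)       # square root of the population variance
mad = sum(abs(r - mean) for r in returns) / len(returns)  # mean absolute deviation
cv = std_dev / mean                        # coefficient of variation: SD / mean

# Harmonic mean for averaging multiples: hypothetical P/E ratios of three firms.
pe_ratios = [10.0, 20.0, 40.0]
harmonic_pe = statistics.harmonic_mean(pe_ratios)  # n / sum of reciprocals
arithmetic_pe = statistics.mean(pe_ratios)

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f}")
print(f"range={value_range:.3f} variance={variance:.5f} sd={std_dev:.4f}")
print(f"MAD={mad:.4f} CV={cv:.2f}")
print(f"harmonic P/E={harmonic_pe:.2f} vs arithmetic P/E={arithmetic_pe:.2f}")
```

For these P/E ratios the harmonic mean comes out below the arithmetic mean, which is why it is preferred when averaging ratios whose denominator (earnings) is the quantity of interest.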
Visualisation vocabulary tested in CFA item-sets:
- Frequency distribution: organise data into mutually exclusive bins. Relative frequency = bin count / total. Cumulative relative frequency builds up to 100%.
- Histogram: bar chart of frequencies. It gives the visual signature of a distribution (symmetric, skewed, peaked, flat).
- Box plot anatomy: the box spans Q1 (25th percentile) to Q3 (75th percentile); this width is the IQR (interquartile range). The median is the line inside the box. Whiskers extend to the most extreme observations that lie within 1.5× IQR of the box. Dots beyond the whiskers are outliers.
- Heat maps: colour-encode magnitudes. Useful for correlation matrices in portfolio analysis.
- Scatter plots: read for direction (positive/negative), strength (tight/loose), shape (linear/curved), and outliers.
Geometric vs arithmetic mean (an exam favourite):
Fund returns: 50%, −30%, 50%
Arithmetic mean: (50 − 30 + 50)/3 = 23.3%
Geometric mean: [(1.50)(0.70)(1.50)]^(1/3) − 1 = (1.575)^(1/3) − 1 = 16.3%
If you invest ₹100, you end with ₹100 × 1.5 × 0.7 × 1.5 = ₹157.50, which is 16.3% per year compounded over 3 years, not 23.3%. The arithmetic mean overstates the true compound return whenever returns vary from period to period (the AM-GM inequality, a consequence of Jensen's inequality). Always use the geometric mean to describe the return an investor actually experienced.
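The fund example and the box plot anatomy above can both be reproduced in a few lines. This is a sketch under stated assumptions: the `sample` list is hypothetical, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions, so other tools may place Q1 and Q3 slightly differently.

```python
import statistics

# --- Geometric vs arithmetic mean, using the fund returns from the example ---
returns = [0.50, -0.30, 0.50]

arithmetic_mean = sum(returns) / len(returns)

# Compound the growth factors (1 + r), then take the n-th root and subtract 1.
growth = 1.0
for r in returns:
    growth *= 1.0 + r
geometric_mean = growth ** (1.0 / len(returns)) - 1.0

ending_value = 100.0 * growth  # value of 100 invested at the start

print(f"arithmetic mean: {arithmetic_mean:.1%}")
print(f"geometric mean:  {geometric_mean:.1%}")
print(f"ending value:    {ending_value:.1f}")

# --- Box plot anatomy on a hypothetical sample (assumed data) ---
sample = [2, 4, 5, 7, 8, 9, 11, 12, 14, 40]

# Quartile convention is an assumption; "inclusive" treats the data as the population.
q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 x IQR beyond the box; whiskers stop at the last point inside them.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in sample if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in sample if x < lower_fence or x > upper_fence]

print(f"Q1={q1} median={median} Q3={q3} IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Note that the whiskers end at actual data points, not at the fences themselves; the fences only decide which points count as outliers.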
Skewness and kurtosis, both visible in histograms:
- Right-skewed (positive skew): long right tail. Mean > Median > Mode. Typical of asset returns over short windows: most outcomes cluster at the low end, with occasional large gains.
- Left-skewed (negative skew): long left tail. Mean < Median < Mode. Typical of portfolio strategies that look like collecting nickels in front of a steamroller (steady small gains, occasional large losses).
- Kurtosis: peakedness and tail-fatness. Excess kurtosis > 0 means fatter tails than the normal distribution, so extreme moves are more frequent than the bell curve predicts. Asset returns are typically fat-tailed; the normal distribution understates tail risk.
The CFA exam tests recognition: given a histogram, identify the shape and infer the mean-median-mode ordering. Equity index returns can show mild positive skew over short windows but a fat left tail (crashes). Match the shape to the appropriate central-tendency measure; for skewed data the median is more representative than the mean.
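The shape measures above can be computed directly from central moments. The sketch below is illustrative only: it uses population (biased) moment formulas, which differ slightly from the sample-adjusted formulas some textbooks and libraries use, and the `right_skewed` data is a hypothetical sample. The helper name `skewness_and_excess_kurtosis` is made up for this example.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a normal distribution has kurtosis 3
    return skew, excess_kurtosis


# Hypothetical right-skewed sample: one large value stretches the right tail.
right_skewed = [1, 2, 2, 2, 3, 4, 4, 5, 6, 20]

skew, excess_kurtosis = skewness_and_excess_kurtosis(right_skewed)
mean = statistics.mean(right_skewed)
median = statistics.median(right_skewed)
mode = statistics.mode(right_skewed)

print(f"skewness={skew:.2f}, excess kurtosis={excess_kurtosis:.2f}")
print(f"mean={mean}, median={median}, mode={mode}")
```

For this sample the skewness is positive and the ordering is mean > median > mode, matching the right-skewed pattern described above.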
- CFA Institute Curriculum — Level 1, Quantitative Methods, Reading 2
- SEBI fund disclosure regulations — standard deviation must be reported in fund factsheets
- NSE / BSE historical data archives for verifying empirical distributions
- Using arithmetic mean for multi-period compound returns — overstates the actual return.
- Confusing population standard deviation (divide by n) with sample standard deviation (divide by n − 1, which makes the sample variance an unbiased estimate).
- Reporting absolute return without context — a 14% return with 20% volatility is very different from 14% with 5% volatility.
- Ignoring skewness and kurtosis — assuming normal distribution understates tail risk.
- Using mean for skewed data — median is more representative.
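Two of the pitfalls above are easy to see numerically. The sketch below uses a hypothetical return series to contrast the population and sample standard deviations, then compares the coefficient of variation for the 14% return at 20% and at 5% volatility mentioned in the list.

```python
import statistics

# Hypothetical annual returns (illustrative only).
returns = [0.14, 0.02, -0.06, 0.22, 0.10]

population_sd = statistics.pstdev(returns)  # divides squared deviations by n
sample_sd = statistics.stdev(returns)       # divides squared deviations by n - 1

print(f"population SD: {population_sd:.4f}")
print(f"sample SD:     {sample_sd:.4f}")

# Same 14% return, very different risk: CV = SD / mean.
cv_volatile = 0.20 / 0.14
cv_stable = 0.05 / 0.14

print(f"CV at 20% volatility: {cv_volatile:.2f}")
print(f"CV at 5% volatility:  {cv_stable:.2f}")
```

The sample figure is always the larger of the two for the same data, and the gap narrows as the number of observations grows.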
Frequently asked
Why is geometric mean always less than or equal to arithmetic mean?
When should I use geometric vs arithmetic mean?
What does coefficient of variation tell me?
Practice questions
Q1. A fund's annual returns are: 20%, −10%, 30%. The geometric mean return is closest to:
- (a) 10.0%
- (b) 12.4%
- (c) 13.3%
- (d) 15.0%
Q2. A right-skewed distribution typically has:
- (a) Mean = Median = Mode
- (b) Mean < Median < Mode
- (c) Mean > Median > Mode
- (d) No relationship between these measures
Q3. In a box plot, the box itself represents:
- (a) Mean ± 1 SD
- (b) Q1 to Q3 (interquartile range)
- (c) Min to Max
- (d) Median ± 95% confidence
Q4. Coefficient of variation is most useful for:
- (a) Calculating absolute risk
- (b) Comparing risk per unit of return across assets with different return scales
- (c) Measuring inflation
- (d) Computing geometric mean
Q5. Excess kurtosis greater than zero indicates:
- (a) Skewed distribution
- (b) Fatter tails than the normal distribution
- (c) Higher mean than median
- (d) Negative correlation