Reading 2 · CFA L1 Quant · Full chapter

Organising and visualising data

In this chapter: Frequency distributions, histograms, box plots · Heat maps and scatter plots · Measures of central tendency and dispersion

~6 min read · Layer 4 · Professional Certifications · Free

Before you can analyse data, you have to organise it. A frequency distribution sorts observations into bins. A histogram displays that distribution visually. A box plot summarises central tendency and spread. A scatter plot reveals the relationship between two variables. The CFA exam tests whether you can interpret these charts, not draw them. Get fluent at reading them and you can absorb research reports, fund factsheets, and economic data much faster.

Foundation

Every dataset raises two big questions: where is the centre, and how spread out is the data?

Measures of central tendency:
  • Arithmetic mean: sum of the observations / n.
  • Median: the middle value of the sorted data.
  • Mode: the most frequent value.
  • Geometric mean: n-th root of the product of n values; applied to growth factors (1 + return) to get compound returns.
  • Harmonic mean: n / sum of reciprocals; used for averaging multiples like P/E across firms.

Measures of dispersion:
  • Range: max − min.
  • Variance: average squared deviation from the mean.
  • Standard deviation (SD): square root of the variance.
  • Mean absolute deviation: average absolute deviation from the mean.
  • Coefficient of variation: CV = SD / mean; used to compare risk across different return scales.
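As a rough illustration of these measures, the following Python sketch computes each one with the standard library. The return series and the P/E multiples are made-up numbers, and the variable names are illustrative assumptions rather than anything prescribed by the curriculum.

```python
import statistics

# Hypothetical annual returns in decimal form (illustrative only).
returns = [0.12, 0.08, -0.05, 0.20, 0.08]

# Central tendency
mean = statistics.fmean(returns)      # arithmetic mean: sum / n
median = statistics.median(returns)   # middle value of the sorted data
mode = statistics.mode(returns)       # most frequent value

# Geometric mean is taken over growth factors (1 + r), then 1 is subtracted.
geo = statistics.geometric_mean([1 + r for r in returns]) - 1

# Harmonic mean of hypothetical P/E multiples: n / sum of reciprocals.
pe_multiples = [10.0, 20.0, 40.0]
harmonic_pe = statistics.harmonic_mean(pe_multiples)

# Dispersion (population versions, dividing by n)
value_range = max(returns) - min(returns)
variance = statistics.pvariance(returns)
sd = statistics.pstdev(returns)
mad = sum(abs(r - mean) for r in returns) / len(returns)
cv = sd / mean

print(f"mean={mean:.4f} median={median:.4f} mode={mode:.4f} geometric={geo:.4f}")
print(f"harmonic P/E={harmonic_pe:.2f}")
print(f"range={value_range:.4f} variance={variance:.6f} sd={sd:.4f} mad={mad:.4f} cv={cv:.2f}")
```

The sketch uses the population formulas (dividing by n) to match the definitions in this section; `statistics.variance` and `statistics.stdev` give the sample versions.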

Deep Dive

Visualisation vocabulary tested in CFA item-sets:
  • Frequency distribution: organises data into mutually exclusive bins. Relative frequency = bin count / total. Cumulative relative frequency builds up to 100%.
  • Histogram: bar chart of frequencies. It shows the visual signature of the distribution (symmetric, skewed, peaked, flat).
  • Box plot anatomy: the box spans Q1 (25th percentile) to Q3 (75th percentile); its length is the IQR (interquartile range). The median is the line inside the box. Whiskers extend to the most extreme observations within 1.5 × IQR of the box. Dots beyond the whiskers are outliers.
  • Heat maps: colour-encode magnitudes. Useful for correlation matrices in portfolio analysis.
  • Scatter plots: read for direction (positive/negative), strength (tight/loose), shape (linear/curved), and outliers.

Geometric vs arithmetic mean (an exam favourite):

Fund returns: 50%, −30%, 50%
Arithmetic mean: (50 − 30 + 50)/3 = 23.3%
Geometric mean: [(1.50)(0.70)(1.50)]^(1/3) − 1 = (1.575)^(1/3) − 1 = 16.3%

If you invest ₹100, you end with ₹100 × 1.5 × 0.7 × 1.5 = ₹157.5, which is 16.3% per year compounded over 3 years, NOT 23.3%. The arithmetic mean overstates the true compound return whenever returns vary (a consequence of Jensen's inequality). Always use the geometric mean to describe actual investor experience.
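The frequency-distribution and box-plot definitions in this section can be sketched in a few lines of Python. The monthly returns are hypothetical, the 5-point bin width is an arbitrary choice, and the quartiles use the `inclusive` interpolation method of `statistics.quantiles`; other quartile conventions exist and give slightly different values.

```python
import math
import statistics
from collections import Counter

# Hypothetical monthly returns in percent (illustrative only).
data = sorted([-12.0, -3.0, -1.0, 0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 18.0])

# Frequency distribution: mutually exclusive bins [lower, lower + width).
width = 5.0
counts = Counter(math.floor(x / width) * width for x in data)
total = len(data)
running = 0
table = []  # rows of (bin lower edge, count, relative freq, cumulative relative freq)
for lower in sorted(counts):
    running += counts[lower]
    table.append((lower, counts[lower], counts[lower] / total, running / total))

# Box plot statistics: quartiles, IQR, fences, whiskers, outliers.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))
outliers = [x for x in data if x < lower_fence or x > upper_fence]

for row in table:
    print(row)
print("Q1, median, Q3:", q1, q2, q3)
print("whiskers:", whiskers, "outliers:", outliers)
```

With this data the two extreme months fall outside the fences and would be drawn as dots, while the whiskers stop at the most extreme observations still inside the fences.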

Advanced

Skewness and kurtosis are visible in histograms:
  • Right-skewed (positive skew): long right tail. For a unimodal distribution, typically Mean > Median > Mode. Typical of return patterns with many small gains and occasional large gains.
  • Left-skewed (negative skew): long left tail. Typically Mean < Median < Mode. Typical of portfolio strategies that look like collecting nickels in front of a steamroller (occasional large losses).
  • Kurtosis: peakedness and tail-fatness. Excess kurtosis > 0 means fatter tails than the normal distribution, so extreme moves are more frequent than the bell curve predicts. Asset returns are typically fat-tailed; the normal distribution understates tail risk.

The CFA exam tests recognition: given a histogram, identify the shape and infer the mean-median-mode ordering. Equity index returns can show mild positive skew over short windows but still have a fat left tail (crashes). Match the shape to the appropriate central-tendency measure.
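To make the shape measures concrete, here is a minimal sketch that computes skewness and excess kurtosis from central moments. It uses the simple population (moment) formulas, which is an assumption: the sample formulas used in textbooks and statistical packages apply small-sample adjustments and give slightly different numbers. The data are invented.

```python
import statistics

def central_moment(values, k):
    """k-th central moment: the average of (x - mean)^k."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (the normal distribution's kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 15]
left_skewed = [-x for x in right_skewed]  # mirror image has a long left tail

for name, values in [("right", right_skewed), ("left", left_skewed)]:
    print(
        name,
        f"mean={statistics.fmean(values):.2f}",
        f"median={statistics.median(values)}",
        f"skew={skewness(values):.2f}",
        f"excess kurtosis={excess_kurtosis(values):.2f}",
    )
```

In the right-skewed series the single large value pulls the mean above the median, and mirroring the data flips the sign of the skewness while leaving the kurtosis unchanged.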

Regulatory references
  • CFA Institute Curriculum — Level 1, Quantitative Methods, Reading 2
  • SEBI fund disclosure regulations — standard deviation must be reported in fund factsheets
  • NSE / BSE historical data archives for verifying empirical distributions

Common mistakes & pitfalls
  • Using arithmetic mean for multi-period compound returns — overstates the actual return.
  • Confusing population standard deviation (divide by n) with sample standard deviation (divide by n − 1, which makes the sample variance an unbiased estimate of the population variance).
  • Reporting absolute return without context — a 14% return with 20% volatility is very different from 14% with 5% volatility.
  • Ignoring skewness and kurtosis — assuming normal distribution understates tail risk.
  • Using mean for skewed data — median is more representative.
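The population-versus-sample pitfall in the list above is easy to see numerically. This is a small sketch with three made-up annual returns; `statistics.pstdev` divides by n and `statistics.stdev` divides by n − 1.

```python
import math
import statistics

# Hypothetical annual returns in decimal form (illustrative only).
sample = [0.10, 0.14, 0.18]
n = len(sample)

pop_sd = statistics.pstdev(sample)    # population SD: divides by n
sample_sd = statistics.stdev(sample)  # sample SD: divides by n - 1

print(f"population SD = {pop_sd:.4f}")
print(f"sample SD     = {sample_sd:.4f}")

# The two differ by a factor of sqrt(n / (n - 1)), which shrinks as n grows.
print(f"ratio = {sample_sd / pop_sd:.4f}, sqrt(n/(n-1)) = {math.sqrt(n / (n - 1)):.4f}")
```

With only three observations the gap is large; with hundreds of observations the two figures are nearly identical.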

Frequently asked

Why is geometric mean always less than or equal to arithmetic mean?
It follows from the AM-GM inequality, applied to the growth factors (1 + return). With volatility, recovering from a loss requires a larger gain (after a 50% fall, you need a 100% rise to break even). The arithmetic mean ignores this asymmetry; the geometric mean captures it. The two are equal only when all observations are identical.
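The asymmetry described in this answer can be checked with a short sketch. The helper name `breakeven_gain` is a hypothetical one introduced for illustration; the return series is the 50%, −30%, 50% example from the lesson.

```python
import statistics

def breakeven_gain(loss: float) -> float:
    """Gain needed to recover a fractional loss, e.g. 0.5 for a 50% fall."""
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1)")
    return 1 / (1 - loss) - 1

for loss in (0.10, 0.30, 0.50):
    print(f"down {loss:.0%} -> need up {breakeven_gain(loss):.1%} to break even")

# Arithmetic vs geometric mean for the lesson's return series.
returns = [0.50, -0.30, 0.50]
arithmetic = statistics.fmean(returns)
geometric = statistics.geometric_mean([1 + r for r in returns]) - 1
print(f"arithmetic = {arithmetic:.1%}, geometric = {geometric:.1%}")

# With identical returns the two means coincide.
flat = [0.10, 0.10, 0.10]
flat_geometric = statistics.geometric_mean([1 + r for r in flat]) - 1
print(f"flat series: arithmetic = {statistics.fmean(flat):.1%}, geometric = {flat_geometric:.1%}")
```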
When should I use geometric vs arithmetic mean?
Geometric for compound returns (what you actually earn over time). Arithmetic for next-period expected return (what you might earn in any single period). For long-term wealth projection, always use geometric. For one-period statistical analysis, arithmetic is fine.
What does coefficient of variation tell me?
CV = SD / mean. Risk per unit of return — comparable across assets with different return levels. A low-CV asset gives you more return per unit of risk. Useful when comparing, say, an equity fund (CV ~1.5) with a debt fund (CV ~0.3).
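A minimal sketch of that comparison follows. The fund figures are hypothetical, chosen only to reproduce the illustrative CVs of about 1.5 and 0.3, and `coefficient_of_variation` is an illustrative helper name. The guard reflects the fact that CV stops being informative when the mean is zero or negative.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Risk per unit of return: SD / mean."""
    if mean <= 0:
        raise ValueError("CV is not meaningful for a zero or negative mean")
    return sd / mean

# Hypothetical funds (illustrative numbers only).
equity_cv = coefficient_of_variation(sd=0.18, mean=0.12)
debt_cv = coefficient_of_variation(sd=0.021, mean=0.07)

print(f"equity fund CV = {equity_cv:.2f}")
print(f"debt fund CV   = {debt_cv:.2f}")
```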

Practice questions

Q 1
A fund's annual returns are: 20%, −10%, 30%. The geometric mean return is closest to:
  (a) 10.0%
  (b) 12.0%
  (c) 13.3%
  (d) 15.0%
Correct: (b) 12.0%
Geometric mean = [(1.20)(0.90)(1.30)]^(1/3) − 1 = (1.404)^(1/3) − 1 ≈ 12.0%, versus an arithmetic mean of about 13.3%. The geometric mean is the compound annual return you actually earned.
Q 2
A right-skewed distribution typically has:
  (a) Mean = Median = Mode
  (b) Mean < Median < Mode
  (c) Mean > Median > Mode
  (d) No relationship between these measures
Correct: (c) Mean > Median > Mode
Right-skewed (positive skew): the long right tail pulls the mean up, so Mean > Median > Mode. The reverse holds for left-skewed.
Q 3
In a box plot, the box itself represents:
  (a) Mean ± 1 SD
  (b) Q1 to Q3 (interquartile range)
  (c) Min to Max
  (d) Median ± 95% confidence
Correct: (b) Q1 to Q3 (interquartile range)
The box spans Q1 (25th percentile) to Q3 (75th percentile) — the IQR. This contains the middle 50% of observations.
Q 4
Coefficient of variation is most useful for:
  (a) Calculating absolute risk
  (b) Comparing risk per unit of return across assets with different return scales
  (c) Measuring inflation
  (d) Computing geometric mean
Correct: (b) Comparing risk per unit of return across assets with different return scales
CV = SD/mean normalises risk by return, allowing comparison across assets with different absolute return levels. A low CV means more return per unit of risk.
Q 5
Excess kurtosis greater than zero indicates:
  (a) Skewed distribution
  (b) Fatter tails than the normal distribution
  (c) Higher mean than median
  (d) Negative correlation
Correct: (b) Fatter tails than the normal distribution
Excess kurtosis > 0 means the distribution is leptokurtic: its tails are fatter than those of the normal distribution. Asset returns commonly show this, which is why VaR based on normal-distribution assumptions understates extreme risks.
Educational purposes only. The numbers, returns, and examples used in this lesson are illustrative. Past performance does not guarantee future results. Mutual fund and securities investments are subject to market risks. This lesson is not investment advice; for advice tailored to your circumstances, consult a SEBI-registered Investment Adviser.