Introduction

Descriptive statistics summarize and describe the main features of a dataset. They condense large amounts of data into a few interpretable numbers, forming the foundation for further statistical analysis.

The three main categories of descriptive statistics are measures of central tendency (where is the center?), measures of dispersion (how spread out is the data?), and measures of shape (what does the distribution look like?).


Measures of Central Tendency

These describe the "typical" or "central" value in a dataset.

Mean (Average)

Mean = Σx / n

Sum of all values divided by the number of values

  • Pros: Uses all data points; mathematically tractable
  • Cons: Sensitive to outliers
  • Use when: Data is symmetric without extreme outliers
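
As a minimal sketch of the formula above, the following Python snippet computes the mean by hand and compares it with `statistics.fmean` from the standard library. The `scores` list and the `arithmetic_mean` helper are illustrative names, not part of the original text.

```python
from statistics import fmean


def arithmetic_mean(values):
    """Return the sum of all values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


scores = [4, 8, 6, 2]  # hypothetical sample data

print(arithmetic_mean(scores))  # 20 / 4 = 5.0
print(fmean(scores))            # standard-library equivalent, also 5.0
```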

Median

The middle value when the data is sorted. For an even number of observations, it is the average of the two middle values.

  • Pros: Robust to outliers
  • Cons: Depends only on the middle value(s), so it ignores the magnitude of the other data points
  • Use when: Data is skewed or has outliers
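
The odd and even cases described above can be sketched as follows. The helper name `middle_value` is hypothetical; in practice `statistics.median` from the standard library does the same job.

```python
from statistics import median


def middle_value(values):
    """Median: the middle of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two

print(middle_value([7, 1, 5]))     # sorted: 1, 5, 7 -> 5
print(middle_value([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
print(median([7, 1, 5, 3]))        # standard-library equivalent, 4.0
```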

Mode

The most frequently occurring value.

  • Pros: Works for categorical data; shows most common value
  • Cons: May not exist or may have multiple modes
  • Use when: Finding most common category or response
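
A short sketch of both situations named in the cons above, using `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The survey data is made up for illustration.

```python
from statistics import multimode

# Hypothetical categorical responses with two equally common answers.
responses = ["red", "blue", "red", "green", "blue"]
print(multimode(responses))   # ['red', 'blue']: the data is bimodal

# When no value repeats, every value ties, so no single mode stands out.
no_repeats = [1, 2, 3]
print(multimode(no_repeats))  # [1, 2, 3]
```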

Example

Salaries: ₹30K, ₹35K, ₹40K, ₹45K, ₹500K

  • Mean: ₹130K (distorted by outlier)
  • Median: ₹40K (better represents typical salary)
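
The figures above can be reproduced with Python's standard `statistics` module. The variable names are illustrative, and salaries are expressed in thousands of rupees.

```python
from statistics import mean, median

salaries_k = [30, 35, 40, 45, 500]  # salaries from the example, in thousands

print(mean(salaries_k))    # 650 / 5 = 130
print(median(salaries_k))  # middle of the sorted values = 40

# Dropping the single 500K salary shows how strongly it pulls the mean.
without_outlier = salaries_k[:-1]
print(mean(without_outlier))    # 150 / 4 = 37.5
print(median(without_outlier))  # (35 + 40) / 2 = 37.5
```

Removing one value moves the mean from 130 to 37.5, while the median shifts only from 40 to 37.5.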

Measures of Dispersion

These describe how spread out the data is.

Range

Range = Maximum - Minimum

  • Simple but affected by outliers
  • Doesn't describe distribution within the range

Variance

Variance (σ²) = Σ(x - μ)² / n

Average of squared deviations from the mean. This is the population form; the sample variance divides by n - 1 instead of n.

  • Uses all data points
  • Units are squared (harder to interpret)
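
A minimal sketch of the variance formula, checked against the standard library. The dataset and the helper name `population_variance` are illustrative. Note that `statistics.variance` is the sample version (dividing by n - 1), which goes slightly beyond the formula given here, while `statistics.pvariance` matches it.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Average of squared deviations from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty dataset is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

print(population_variance(data))  # squared deviations sum to 32; 32 / 8 = 4.0
print(pvariance(data))            # standard-library population variance, 4
print(variance(data))             # sample variance, 32 / 7
```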

Standard Deviation

Standard Deviation (σ) = √Variance

  • Most commonly used measure of spread
  • Same units as original data
  • For normal distributions: ~68% within 1 SD, ~95% within 2 SD
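
The sketch below takes the square root of a population variance and then checks the 68% and 95% figures against the standard normal distribution using `statistics.NormalDist` (Python 3.8+). The dataset and the helper `share_within` are illustrative assumptions.

```python
from statistics import NormalDist, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with population variance 4
sigma = pstdev(data)             # square root of the population variance
print(sigma)                     # 2.0

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(share_within(1), 4))  # about 0.6827
print(round(share_within(2), 4))  # about 0.9545
```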

Interquartile Range (IQR)

IQR = Q3 - Q1

Range of the middle 50% of data

  • Robust to outliers
  • Used with median for skewed data

Measure          Formula          Sensitive to Outliers?
Range            Max - Min        Yes (very)
Variance         Σ(x - μ)² / n    Yes
Std Deviation    √Variance        Yes
IQR              Q3 - Q1          No
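
To see the sensitivity differences in practice, the sketch below computes the range, standard deviation, and IQR for the salary data with and without the 500K outlier. Quartile conventions differ between tools; the `"inclusive"` method of `statistics.quantiles` is one assumed choice here, and the helper names are illustrative.

```python
from statistics import pstdev, quantiles


def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range: Q3 - Q1, using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


salaries_k = [30, 35, 40, 45, 500]  # in thousands, outlier included
trimmed = salaries_k[:-1]           # same data without the 500K salary

for name, measure in [("range", value_range), ("std dev", pstdev), ("IQR", iqr)]:
    # Print each measure without and with the outlier.
    print(name, measure(trimmed), measure(salaries_k))
```

With these inputs the range jumps from 15 to 470 and the standard deviation grows by more than an order of magnitude, while the IQR only moves from 7.5 to 10.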

Measures of Shape

Skewness

Measures asymmetry of the distribution.

  • Positive skew: Tail extends to the right (typically Mean > Median)
  • Negative skew: Tail extends to the left (typically Mean < Median)
  • Zero skew: Symmetric distribution
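
One common way to quantify skewness, assumed here since the text does not fix a formula, is the moment-based coefficient: the third central moment divided by the cubed standard deviation. The helper `skewness` and the sample data are illustrative.

```python
def skewness(values):
    """Moment-based skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [30, 35, 40, 45, 500]       # long tail to the right
left_tailed = [-x for x in right_tailed]   # mirror image, tail to the left

print(skewness(symmetric))     # 0.0 for symmetric data
print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```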

Kurtosis

Measures "tailedness" of the distribution.

  • High kurtosis: Heavier tails than a normal distribution, more outliers
  • Low kurtosis: Lighter tails than a normal distribution, fewer outliers
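
A minimal sketch of a moment-based kurtosis, assuming the common definition of the fourth central moment divided by the squared variance. Under that definition a normal distribution has kurtosis 3, so values are often reported as "excess kurtosis" after subtracting 3. The helper name and data are illustrative.

```python
def kurtosis(values):
    """Moment-based kurtosis: fourth central moment over the squared variance."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


evenly_spread = [1, 2, 3, 4, 5]       # no extreme values
with_outlier = [30, 35, 40, 45, 500]  # one extreme value in the tail

print(kurtosis(evenly_spread))     # 6.8 / 4 = 1.7, below the normal value of 3
print(kurtosis(with_outlier) - 3)  # excess kurtosis, positive for this data
```

Very small samples like these only illustrate the direction of the effect; kurtosis estimates are unreliable without considerably more data.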

Choosing the Right Measure

Data Characteristic       Central Tendency    Dispersion
Symmetric, no outliers    Mean                Standard Deviation
Skewed or outliers        Median              IQR
Categorical data          Mode                N/A
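
The numeric rows of this guidance can be sketched as a small helper that returns a (center, spread) pair. The function name `summarize` and its `robust` flag are hypothetical, and deciding whether data counts as skewed is left to the caller.

```python
from statistics import mean, median, pstdev, quantiles


def summarize(values, robust=False):
    """Return (center, spread): mean and SD by default, median and IQR if robust."""
    if robust:
        # Skewed data or outliers: median with the interquartile range.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        return median(values), q3 - q1
    # Symmetric data without outliers: mean with the standard deviation.
    return mean(values), pstdev(values)


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))             # (5, 2.0)
print(summarize([30, 35, 40, 45, 500], robust=True))   # (40, 10.0)
```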

Conclusion

Key Takeaways

  • Central tendency measures: Mean, Median, Mode
  • Dispersion measures: Range, Variance, Standard Deviation, IQR
  • Mean and SD are best suited to symmetric data without extreme outliers
  • Median and IQR are robust to outliers
  • Skewness indicates asymmetry; Kurtosis indicates tail heaviness
  • Always visualize data alongside summary statistics
  • Choose measures appropriate to data characteristics