Introduction
Descriptive statistics summarize and describe the main features of a dataset. They condense large amounts of data into a form that is meaningful and easy to understand, forming the foundation for further statistical analysis.
The three main categories of descriptive statistics are measures of central tendency (where is the center?), measures of dispersion (how spread out is the data?), and measures of shape (what does the distribution look like?).
Measures of Central Tendency
These describe the "typical" or "central" value in a dataset.
Mean (Average)
Mean = Σx / n
Sum of all values divided by the number of values
- Pros: Uses all data points; mathematically tractable
- Cons: Sensitive to outliers
- Use when: Data is symmetric without extreme outliers
Median
The middle value when the data is sorted. For an even number of observations, it is the average of the two middle values.
- Pros: Robust to outliers
- Cons: Depends only on the middle of the sorted data, so it ignores the magnitudes of the other values
- Use when: Data is skewed or has outliers
Mode
The most frequently occurring value.
- Pros: Works for categorical data; shows most common value
- Cons: May not exist (every value occurs only once) or may not be unique (several values tie)
- Use when: Finding most common category or response
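As a minimal sketch of finding the mode, the snippet below uses Python's standard `statistics` module. The survey responses and the tied list are made-up illustrative data, not part of the article; `multimode` is used to show the "multiple modes" and "no repeated value" cases from the list above.

```python
import statistics

# Hypothetical categorical responses (illustrative data only).
responses = ["yes", "no", "yes", "maybe", "no", "yes"]
most_common = statistics.mode(responses)  # "yes" occurs three times

# When several values tie, multimode returns all of them.
ties = [1, 1, 2, 2, 3]
all_modes = statistics.multimode(ties)  # [1, 2]

# When no value repeats, every value ties with a count of one.
no_repeat = statistics.multimode([4, 5, 6])  # [4, 5, 6]

print(most_common, all_modes, no_repeat)
```

Note that `statistics.multimode` requires Python 3.8 or later.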
Example
Salaries: ₹30K, ₹35K, ₹40K, ₹45K, ₹500K
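- Mean: ₹130K, i.e. (30 + 35 + 40 + 45 + 500) / 5, pulled far above four of the five salaries by the ₹500K outlier
- Median: ₹40K (the middle of the five sorted values; better represents a typical salary)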
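The salary example can be reproduced with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative, and the salaries are written in thousands of rupees.

```python
import statistics

# Salaries from the example above, in thousands of rupees.
salaries = [30, 35, 40, 45, 500]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # middle value of the sorted data

print(f"Mean:   ₹{mean_salary}K")
print(f"Median: ₹{median_salary}K")
```

Removing the ₹500K entry from the list and rerunning shows how strongly a single extreme value moves the mean while barely shifting the median.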
Measures of Dispersion
These describe how spread out the data is.
Range
Range = Maximum - Minimum
- Simple but affected by outliers
- Doesn't describe distribution within the range
Variance
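Variance (σ²) = Σ(x - μ)² / n
Average of squared deviations from the mean μ. This is the population form; when estimating from a sample, the sum is usually divided by n - 1 instead.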
- Uses all data points
- Units are squared (harder to interpret)
Standard Deviation
Standard Deviation (σ) = √Variance
- Most commonly used measure of spread
- Same units as original data
- For normal distributions: ~68% of values fall within 1 SD of the mean, ~95% within 2 SD
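To make the range, variance, and standard deviation formulas concrete, here is a small sketch that computes them by hand and cross-checks the variance against Python's `statistics` module. The dataset is a hypothetical one chosen so the numbers come out round; it does not come from the article.

```python
import math
import statistics

# Hypothetical data chosen so the results are round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)                    # 9 - 2 = 7
mu = sum(data) / len(data)                             # mean = 5.0
pop_var = sum((x - mu) ** 2 for x in data) / len(data) # Σ(x - μ)² / n = 4.0
pop_sd = math.sqrt(pop_var)                            # √variance = 2.0

# The standard library implements the same population formula.
assert pop_var == statistics.pvariance(data)

# The sample variance divides by n - 1 instead of n, so it is slightly larger.
sample_var = statistics.variance(data)                 # 32 / 7

print(value_range, pop_var, pop_sd, sample_var)
```

The standard deviation (2.0) is in the same units as the data, while the variance (4.0) is in squared units, which is why the standard deviation is usually the one reported.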
Interquartile Range (IQR)
IQR = Q3 - Q1
Range of the middle 50% of data
- Robust to outliers
- Used with median for skewed data
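The sketch below computes the IQR for the salary figures from the earlier example and compares it with the range. It assumes the "inclusive" quartile method of `statistics.quantiles`; several quartile conventions exist, so other tools may report slightly different values for Q1 and Q3.

```python
import statistics

# Salaries from the earlier example, in thousands of rupees.
salaries = [30, 35, 40, 45, 500]

# Quartiles under the "inclusive" convention (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(salaries, n=4, method="inclusive")

iqr = q3 - q1                                # 45 - 35 = 10
value_range = max(salaries) - min(salaries)  # 500 - 30 = 470

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, Range={value_range}")
```

The ₹500K outlier inflates the range to 470 but leaves the IQR at 10, which illustrates why the IQR is described as robust.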
| Measure | Formula | Sensitive to Outliers? |
|---|---|---|
| Range | Max - Min | Yes (very) |
| Variance | Σ(x-μ)²/n | Yes |
| Std Deviation | √Variance | Yes |
| IQR | Q3 - Q1 | No |
Measures of Shape
Skewness
Measures asymmetry of the distribution.
- Positive skew: Tail extends to the right (typically Mean > Median)
- Negative skew: Tail extends to the left (typically Mean < Median)
- Zero skew: Symmetric distribution (Mean ≈ Median)
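One common way to quantify skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The function below is a sketch of that population (uncorrected) definition only; it is one of several definitions in use, and statistical libraries may apply small-sample corrections that give slightly different numbers. The function name and sample lists are illustrative.

```python
def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [30, 35, 40, 45, 500]         # long tail to the right
left_tailed = [-x for x in right_tailed]     # mirror image: long tail to the left

print(skewness(symmetric))     # 0.0 for perfectly symmetric data
print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```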
Kurtosis
Measures "tailedness" of the distribution.
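- High kurtosis: Heavier tails than a normal distribution, so extreme values (outliers) are more likely
- Low kurtosis: Lighter tails than a normal distribution, so extreme values are less likely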
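Kurtosis can be sketched the same way, as the fourth central moment divided by the squared second central moment. The version below subtracts 3 to give "excess" kurtosis, so that a normal distribution scores 0; this is the uncorrected population definition, and the function name and both sample lists are hypothetical illustrations.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values: light tails.
flat = [1, 2, 3, 4, 5]
# Mostly identical values with two extremes: heavy tails.
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(excess_kurtosis(flat))   # about -1.3 (lighter tails than normal)
print(excess_kurtosis(heavy))  # 2.0 (heavier tails than normal)
```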
Choosing the Right Measure
| Data Characteristic | Central Tendency | Dispersion |
|---|---|---|
| Symmetric, no outliers | Mean | Standard Deviation |
| Skewed or outliers | Median | IQR |
| Categorical data | Mode | N/A |
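The table can be turned into a rough helper. The sketch below is only one possible reading of it: the function name `summarize` is hypothetical, outliers are flagged with the common 1.5 × IQR rule (an assumption not stated in this article), quartiles use the "inclusive" convention, and skewness is not checked at all.

```python
import statistics
from numbers import Real

def summarize(values):
    """Pick a center/spread pair following the table above (heuristic sketch)."""
    # Categorical data: report the mode(s); dispersion is not applicable.
    if not all(isinstance(v, Real) for v in values):
        return {"center": ("mode", statistics.multimode(values)), "spread": None}

    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences

    # Any point outside the fences: prefer the robust pair.
    if any(v < low or v > high for v in values):
        return {"center": ("median", statistics.median(values)),
                "spread": ("iqr", iqr)}

    # Otherwise use mean and (population) standard deviation.
    return {"center": ("mean", statistics.mean(values)),
            "spread": ("sd", statistics.pstdev(values))}

with_outlier = summarize([30, 35, 40, 45, 500])
no_outlier = summarize([10, 12, 14, 16, 18])
categorical = summarize(["red", "blue", "red"])
print(with_outlier, no_outlier, categorical, sep="\n")
```

A rule like this is a starting point rather than a substitute for looking at the data; a histogram or box plot will often show skew or multiple peaks that a single outlier check misses.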
Conclusion
Key Takeaways
- Central tendency measures: Mean, Median, Mode
- Dispersion measures: Range, Variance, Standard Deviation, IQR
- Mean and SD are best for symmetric data without extreme outliers
- Median and IQR are robust to outliers
- Skewness indicates asymmetry; Kurtosis indicates tail heaviness
- Always visualize data alongside summary statistics
- Choose measures appropriate to data characteristics