Introduction

Regression analysis is a statistical technique for modeling the relationship between a dependent variable (Y) and one or more independent variables (X). It's one of the most widely used tools in business analytics for prediction and understanding relationships.


Simple Linear Regression

Y = β₀ + β₁X + ε

Where:

  • Y = dependent variable
  • X = independent variable
  • β₀ = intercept (Y when X = 0)
  • β₁ = slope (change in Y per unit X)
  • ε = error term

Ordinary Least Squares (OLS)

OLS finds the line that minimizes the sum of squared residuals (differences between actual and predicted values).
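As a minimal sketch of how the OLS estimates can be computed in the simple one-variable case, the slope and intercept follow directly from sample covariances. This assumes the numpy library and uses made-up advertising/sales numbers for illustration:

```python
import numpy as np

# Hypothetical data: advertising spend (X) and sales (Y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([104.0, 111.0, 114.0, 121.0, 124.0])

# Closed-form OLS estimates for simple linear regression:
# β₁ = cov(X, Y) / var(X), β₀ = mean(Y) - β₁·mean(X)
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()

residuals = y - (beta0 + beta1 * x)
print(f"intercept = {beta0:.2f}, slope = {beta1:.2f}")
print(f"sum of squared residuals = {np.sum(residuals**2):.2f}")
```

With these numbers the fitted line comes out close to the Sales = 100 + 5×Advertising example below, which is what the minimized residual sum of squares delivers.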

Example

If Sales = 100 + 5×Advertising, then:

• Base sales (no advertising) = ₹100

• Each ₹1 in advertising adds ₹5 in sales
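A tiny sketch of how this fitted line would be used for prediction and interpretation (the coefficients are the ones from the example above):

```python
def predicted_sales(advertising: float) -> float:
    """Predicted sales (₹) from the fitted line Sales = 100 + 5 × Advertising."""
    return 100 + 5 * advertising

# Intercept = base sales with no advertising
print(predicted_sales(0))                          # 100
# Slope = marginal effect of one extra ₹ of advertising
print(predicted_sales(21) - predicted_sales(20))   # 5
```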


Multiple Linear Regression

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε

Multiple regression includes several independent variables, allowing you to:

  • Control for other factors
  • Understand relative importance of variables
  • Make more accurate predictions
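A short sketch of fitting a multiple regression. This assumes the pandas and statsmodels libraries (not named in the text) and invented monthly data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly data: sales explained by advertising spend and price
df = pd.DataFrame({
    "sales":       [120, 135, 150, 160, 148, 170, 180, 165],
    "advertising": [4,   6,   8,   10,  7,   11,  13,  9],
    "price":       [50,  48,  47,  45,  49,  44,  42,  46],
})

# Multiple linear regression: Y = β₀ + β₁·advertising + β₂·price + ε
model = smf.ols("sales ~ advertising + price", data=df).fit()
print(model.params)     # β₀, β₁, β₂ estimates
print(model.rsquared)   # R²
```

Because both predictors enter the model together, the advertising coefficient is interpreted as the effect of advertising holding price constant.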

Variable Types

  • Continuous: Price, age, income
  • Dummy (0/1): Gender, region, yes/no
  • Interaction: Combined effect of two variables
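A small sketch of how these variable types enter a model, again assuming statsmodels' formula interface and pandas with invented data. C(region) creates the 0/1 dummy, and advertising:C(region) adds their interaction so the advertising slope can differ by region:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data mixing variable types
df = pd.DataFrame({
    "sales":       [120, 150, 130, 170, 140, 180, 125, 160],
    "advertising": [4,   8,   5,   10,  6,   12,  4,   9],      # continuous
    "region":      ["N", "N", "S", "S", "N", "S", "S", "N"],    # categorical -> dummy
})

# Continuous + dummy + interaction terms in one model
model = smf.ols("sales ~ advertising + C(region) + advertising:C(region)", data=df).fit()
print(model.params)
```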

Key Assumptions

Assumption           | Description                  | Violation Consequence
Linearity            | Relationship is linear       | Biased estimates
Independence         | Errors are independent       | Inefficient estimates
Homoscedasticity     | Constant error variance      | Unreliable standard errors
Normality            | Errors normally distributed  | Invalid hypothesis tests
No multicollinearity | IVs not highly correlated    | Unstable coefficients
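A brief sketch of two common assumption checks, the Breusch-Pagan test for homoscedasticity and variance inflation factors (VIF) for multicollinearity. It assumes numpy, pandas and statsmodels, and uses simulated data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors and outcome
rng = np.random.default_rng(0)
X = pd.DataFrame({"advertising": rng.uniform(1, 10, 50),
                  "price": rng.uniform(40, 60, 50)})
y = 100 + 5 * X["advertising"] - 2 * X["price"] + rng.normal(0, 3, 50)

X_const = sm.add_constant(X)
fit = sm.OLS(y, X_const).fit()

# Homoscedasticity: Breusch-Pagan test (a small p-value suggests heteroscedasticity)
bp_stat, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X_const)
print("Breusch-Pagan p-value:", round(bp_pvalue, 3))

# Multicollinearity: VIF per predictor (rule of thumb: VIF > 10 is a concern)
for i, col in enumerate(X.columns, start=1):   # index 0 is the constant, skip it
    print(col, "VIF:", round(variance_inflation_factor(X_const.values, i), 2))
```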

Interpreting Results

Key Metrics

  • R² (R-squared): % of variance in Y explained by X (0-1, higher is better)
  • Adjusted R²: R² adjusted for number of predictors
  • p-value: Statistical significance (typically < 0.05)
  • Coefficients: Effect size—change in Y per unit change in X
  • Standard Error: Precision of coefficient estimate
  • F-statistic: Overall model significance
Key Insight: A high R² doesn't mean causation. Correlation ≠ causation. Also check for omitted variable bias and reverse causality.
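To make these metrics concrete, here is a small sketch of pulling them from a fitted model, again assuming pandas and statsmodels with invented data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data, same spirit as the sales example above
df = pd.DataFrame({
    "sales":       [120, 135, 150, 160, 148, 170, 180, 165, 155, 172],
    "advertising": [4,   6,   8,   10,  7,   11,  13,  9,   8,   12],
})

fit = smf.ols("sales ~ advertising", data=df).fit()

print("R²:          ", f"{fit.rsquared:.3f}")        # variance explained
print("Adjusted R²: ", f"{fit.rsquared_adj:.3f}")    # penalised for extra predictors
print("Coefficients:", fit.params.round(2).to_dict())   # effect sizes
print("Std. errors: ", fit.bse.round(2).to_dict())      # precision of the estimates
print("p-values:    ", fit.pvalues.round(4).to_dict())  # significance of each coefficient
print("F-statistic: ", f"{fit.fvalue:.1f}", "p =", f"{fit.f_pvalue:.4f}")
```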

Business Applications

  • Sales forecasting: Predict sales from marketing spend, seasonality
  • Pricing: Estimate price elasticity of demand (see the log-log sketch after this list)
  • HR: Predict employee turnover from satisfaction scores
  • Finance: Model stock returns, credit risk
  • Marketing: Attribution modeling, customer lifetime value
  • Operations: Demand planning, capacity optimization
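As an example of the pricing application above, elasticity is commonly estimated with a log-log specification, where the price coefficient is read directly as the price elasticity of demand. This is a sketch with made-up data, assuming numpy, pandas and statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical price/quantity observations
df = pd.DataFrame({
    "price":    [10, 12, 14, 16, 18, 20, 22, 24],
    "quantity": [500, 430, 380, 330, 300, 270, 245, 225],
})

# log-log model: ln(Q) = β₀ + β₁·ln(P) + ε, so β₁ is the price elasticity of demand
fit = smf.ols("np.log(quantity) ~ np.log(price)", data=df).fit()
print("Estimated price elasticity:", round(fit.params["np.log(price)"], 2))
```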

Conclusion

Key Takeaways

  • Regression models relationship between Y and X variables
  • Simple regression: One predictor; Multiple: Several predictors
  • OLS minimizes sum of squared errors
  • Check assumptions: Linearity, independence, homoscedasticity, normality
  • R² shows variance explained; p-value shows significance
  • Coefficients show effect size and direction
  • Correlation ≠ causation—be careful with interpretation