## **ANOVA – Introduction**

Anova – Analysis of Variance

It's a collection of statistical methods to detect whether there are significant differences between the means of two or more categorical groups or two treatments.

Application: Does the ice-cream product have different sales in the four seasons (sales in the spring, summer, winter, and fall)

Anova is based on the comparison of variance and not mean.

H0 = m1 = m2 = …… = mt

Ha = at least two group (two treatment) means are different

Normality: the samples of each of the categorical groups are taken from a population that's distributed normal. In the normal, we call this the bell-shaped curve.

Independence: every sample is taken independently of the other samples, so the existence of one element does not affect the existence of another, they're just independent of each other.

Equality of variances (Homoscedasticity): the variance of data in the different groups should be the same

Numeric Continuous Dependent Variable: The target variable should be subdivided using increments (i.e. interval of time, seasons, etc.)

Total Sum of Squares à

The first term of the decomposition is called the between-treatment sum of squares (SST) and it measures the variability due to differences between the treatment means.

The second term of the decomposition is called the sum of squares for error (SSE) which measures the variability that is not explained by the differences between the treatment means.

(k-1) = degree of freedom of the between group variation

(N-k) = degree of freedom of the within-group variation

Under the null hypothesis H0 = m1 = m2 = …… = mk and the normality assumption, the test statistics follows F distribution which involves two degrees of freedoms (mentioned above).

Reject H0 if p-value < a, else accept it.

In two-way ANOVA, groups are split on two independent variables. This is to study whether there is an interaction between the two independent variables on the response variable. For example, we can study whether the spray type and spray dosage will affect the count of insects through two-way ANOVA.

Assumptions of a Two-way ANOVA:

It's a collection of statistical methods to detect whether there are significant differences between the means of two or more categorical groups or two treatments.

Application: Does the ice-cream product have different sales in the four seasons (sales in the spring, summer, winter, and fall)

Anova is based on the comparison of variance and not mean.

H0 = m1 = m2 = …… = mt

Ha = at least two group (two treatment) means are different

__Assumptions:__Normality: the samples of each of the categorical groups are taken from a population that's distributed normal. In the normal, we call this the bell-shaped curve.

Independence: every sample is taken independently of the other samples, so the existence of one element does not affect the existence of another, they're just independent of each other.

Equality of variances (Homoscedasticity): the variance of data in the different groups should be the same

Numeric Continuous Dependent Variable: The target variable should be subdivided using increments (i.e. interval of time, seasons, etc.)

Total Sum of Squares à

The first term of the decomposition is called the between-treatment sum of squares (SST) and it measures the variability due to differences between the treatment means.

The second term of the decomposition is called the sum of squares for error (SSE) which measures the variability that is not explained by the differences between the treatment means.

(k-1) = degree of freedom of the between group variation

(N-k) = degree of freedom of the within-group variation

Under the null hypothesis H0 = m1 = m2 = …… = mk and the normality assumption, the test statistics follows F distribution which involves two degrees of freedoms (mentioned above).

__Inference:__Reject H0 if p-value < a, else accept it.

**Example - Inspect Spray (One-way ANOVA) & Tooth Growth (Two-way ANOVA)**There are multiple types of ANOVA. The two commonly used ones are:- One-way ANOVA
- Two-way ANOVA

In two-way ANOVA, groups are split on two independent variables. This is to study whether there is an interaction between the two independent variables on the response variable. For example, we can study whether the spray type and spray dosage will affect the count of insects through two-way ANOVA.

Assumptions of a Two-way ANOVA:

- Observations are independent
- Dependent variable is continuous
- Dependent variable follows normal distribution for every combination of the groups of the two independent variables.
- Each of the two independent variables consists of two or more categorical groups
- Variance of every combination of the groups of the two independent variables are homogenous
- There are no significant outliers.