ANOVA

One-way ANOVA

Tests whether the means of three or more sampled populations are all equal.
H₀: μ₁=μ₂=...=μ_k. H_A: at least one population mean differs from the others.
Assumptions: each population is normally distributed (the test is robust unless the data are highly skewed); the samples are independent; the population variances are equal (the test is robust to deviations if the sample sizes are roughly equal); the populations are categorized by a single factor (treatment).
"Based on two different estimates of a common population variance."
Can also be read as a test of whether a numeric variable depends (through its mean) on a categorical variable.
ANOVA on two samples is equivalent to the pooled two-sample t-test (F = t²).
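
The two-sample equivalence can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (pooled_t, anova_f) are illustrative, and the two samples are simply the first two rows of Data Set A further down this page.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    """Sum of squared deviations of xs from its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """Pooled two-sample t statistic (equal variances assumed)."""
    df = len(a) + len(b) - 2
    sp2 = (sum_sq_dev(a) + sum_sq_dev(b)) / df  # pooled variance
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))

def anova_f(samples):
    """One-way ANOVA F statistic: MS_B / MS_W."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    ss_b = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_w = sum(sum_sq_dev(s) for s in samples)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

a = [7, 3, 6, 6]
b = [6, 5, 5, 8]
t = pooled_t(a, b)
f = anova_f([a, b])
print(t ** 2, f)  # the two values agree up to floating-point rounding
```

For these two rows both t² and F come out at about 0.2.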

Type or paste the samples' data, row by row:

Samples' statistics:
sample# n mean s

#samples k: Total # of data N=∑n_i: Grand mean X̿:

Between: SS_B = ∑n_i(x̄_i-X̿)²: df_B = k-1: MS_B [BGMS] = SS_B/df_B: separation among the samples; between-group variance
Within (error): SS_W = ∑∑(x_ij-x̄_i)² = ∑(n_i-1)s_i²: df_W = N-k: MS_W [WGMS] = SS_W/df_W: scatter within each sample; within-group variance
SS(Total)= df(Total)=

F =MS_B/MS_W: ratio of variance between samples and variance within samples (i.e. the two estimates of σ²)
If there is no difference in the means, the between-group variance estimate will be roughly the same as the within-group variance estimate, so F will be roughly equal to 1 and H₀ is not rejected.
Critical value: in the 0.05 F table, the numerator DF (df_B = k-1) is the column and the denominator DF (df_W = N-k, which equals k(n-1) when every sample has size n) is the row.
Critical value (C.V.) at α=0.05: If C.V. < F, reject H₀.
When the means differ significantly, the between-group variance will be much larger than the within-group variance, so F will be much greater than 1 and H₀ is rejected.
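
The quantities above (SS_B, SS_W, the two mean squares and F) can be computed directly from the raw samples. Below is a minimal sketch in plain Python; the function name one_way_anova and the dictionary keys are illustrative choices, and the inputs are Data Set A and Data Set B from the sample data further down this page.

```python
def one_way_anova(samples):
    """Return the one-way ANOVA table entries for a list of samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-group sum of squares: separation among the sample means.
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-group sum of squares: scatter inside each sample.
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {
        "grand_mean": grand_mean,
        "ss_b": ss_b, "df_b": df_b, "ms_b": ms_b,
        "ss_w": ss_w, "df_w": df_w, "ms_w": ms_w,
        "f": ms_b / ms_w,
    }

data_set_a = [[7, 3, 6, 6], [6, 5, 5, 8], [4, 7, 6, 7]]
data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]

table_a = one_way_anova(data_set_a)
table_b = one_way_anova(data_set_b)
print(round(table_a["f"], 3), round(table_b["f"], 3))
```

Both data sets have the same within-group scatter (SS_W = 21), but Data Set A gives F of about 0.14 while Data Set B, where the first sample is shifted up by 10, gives F of about 51.6. The critical value still has to be looked up in an F table (or computed with a statistics package) for df_B = 2 and df_W = 9.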

p_value:

If the separation between the samples ("signal") is large relative to the scatter ("noise") within the samples, F is large and it is likely that at least one of the samples came from a population different from the others. But how many, and which ones? (Use the Scheffé and Tukey tests.)

If H₀ is rejected, use the Scheffé test to find which pairs of means differ significantly. Examine each pair of samples.
F_s = (x̄_i-x̄_j)² / [MS_W(1/n_i+1/n_j)]
Critical value is F' = (k-1)(C.V.), where C.V. is the ANOVA critical value above: F' =
i j x̄_i x̄_j Fs
* significant difference between this pair, i.e. F_s > F'.
(Occasionally no pair is flagged even though H₀ was rejected; this happens when the difference lies between the average of two or more means and another mean.)
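
The pairwise Scheffé comparison can be sketched as follows in plain Python. The function name scheffe_pairs is illustrative. The ANOVA critical value is passed in by the caller; the 4.26 used here is an assumption, namely the value read from a typical 0.05 F table for 2 and 9 degrees of freedom, and should be checked against whatever table is in use. The data are Data Set B from further down this page.

```python
from itertools import combinations

def scheffe_pairs(samples, f_crit):
    """Scheffe test for every pair of samples.

    f_crit is the one-way ANOVA critical value (from an F table).
    Returns F' and a list of (i, j, F_s, significant) with 1-based i, j.
    """
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_w = ss_w / (n_total - k)

    f_prime = (k - 1) * f_crit  # Scheffe critical value
    results = []
    for i, j in combinations(range(k), 2):
        n_i, n_j = len(samples[i]), len(samples[j])
        f_s = (means[i] - means[j]) ** 2 / (ms_w * (1 / n_i + 1 / n_j))
        results.append((i + 1, j + 1, f_s, f_s > f_prime))
    return f_prime, results

data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]
F_TABLE_VALUE = 4.26  # assumed table value for alpha=0.05, df 2 and 9

f_prime, pairs = scheffe_pairs(data_set_b, F_TABLE_VALUE)
for i, j, f_s, significant in pairs:
    print(i, j, round(f_s, 3), "*" if significant else "")
```

With these inputs, samples 1 and 2 and samples 1 and 3 are flagged, while samples 2 and 3 (which have equal means) are not.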

If H₀ is rejected and all samples are the same size n, use the Tukey test to find which pairs of means differ significantly. Examine each pair of samples.
q = (x̄_i-x̄_j) / √(MS_W/n)
Critical value: in the 0.05 Tukey table, the number of samples k is the column and the denominator DF (df_W = N-k = k(n-1)) is the row.
Critical value @α=0.05:
i j x̄_i x̄_j q
* significant difference between this pair, i.e. |q| > critical value.
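
The Tukey comparison can be sketched the same way. The function name tukey_pairs is illustrative. The critical value is supplied by the caller; the 3.95 used here is an assumption, namely the value read from a typical 0.05 Tukey (studentized range) table for k = 3 and df_W = 9, and should be verified against the table in use. The data are Data Set B from further down this page.

```python
from itertools import combinations
from math import sqrt

def tukey_pairs(samples, q_crit):
    """Tukey test for every pair of equal-sized samples.

    q_crit is the critical value from a Tukey table.
    Returns a list of (i, j, q, significant) with 1-based i, j.
    """
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this form of the Tukey test needs equal sample sizes")
    n = sizes.pop()
    k = len(samples)

    means = [sum(s) / n for s in samples]
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_w = ss_w / (k * n - k)  # df_W = N - k = k(n - 1)

    results = []
    for i, j in combinations(range(k), 2):
        q = (means[i] - means[j]) / sqrt(ms_w / n)
        results.append((i + 1, j + 1, q, abs(q) > q_crit))
    return results

data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]
Q_TABLE_VALUE = 3.95  # assumed table value for alpha=0.05, k=3, df_W=9

tukey_results = tukey_pairs(data_set_b, Q_TABLE_VALUE)
for i, j, q, significant in tukey_results:
    print(i, j, round(q, 3), "*" if significant else "")
```

For Data Set B this flags the same pairs as the Scheffé test: sample 1 differs from samples 2 and 3 (q is about 12.4 in both cases), and samples 2 and 3 do not differ.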

row-by-row  small:   midsize:  large:  SUV: 
253 143 124 301 422 324 258 271 467 298 315 304
 117 121 204 195 186 178 157 203 132 212 229 235
 249 90 178 114 183 87 180 103 154 129 266 338 
121 112 261 145 198 193 193 111 276 156 213 143


skulls
131 138 125 129 132 135 132 134 138
129 134 136 137 137 129 136 138 134
128 138 136 139 141 142 137 145 137

Data Set A
7 3 6 6
6 5 5 8
4 7 6 7

Data Set B
17 13 16 16
6 5 5 8
4 7 6 7

Exercise 1
29 31 35 33 26 32 21 26 25 34 26 34
32 28 26 23 25 26 19 29 26 20 22 22
27 32 39 27 31 26 34 30 34 26 24 31
24 31 31 25 30 39 22 33 34 35 29 26

Clancy-Rowling-Tolstoy 
58.2 73.4 73.1 64.4 72.7 89.2 43.9 76.3 76.4 78.9 69.4 72.9
85.3 84.3 79.5 82.5 80.2 84.6 79.2 70.9 78.6 86.2 74.0 83.7
69.4 64.2 71.4 71.6 68.5 51.9 72.2 74.4 52.8 58.4 65.4 73.6

chars/word
4.8 4.5 4.6 4.5 4.0 4.0 4.6 4.5 4.4 4.4 4.3 4.3
4.1 4.2 4.2 4.4 4.3 4.2 4.5 4.5 4.3 4.0 4.4 4.3
4.3 4.5 4.5 4.5 4.5 4.8 4.3 4.2 4.7 4.3 4.4 4.5

BP reduction: meds, exercise, diet
10 12 9 15 13
6 8 3 0 2
5 9 12 8 4


7 14 32 19 10 11
10 1 1 0 11 1
1 12 1 9 1 11