ANOVA

One-way ANOVA

Test if the means of three or more sampled populations are the same or not.
H₀: μ₁ = μ₂ = ... = μ_k    H_A: at least one population mean differs from the others.
Assumptions: each population is approximately normal (the test is robust unless the data are highly skewed); samples are independent; population variances are equal (the test is robust to unequal variances if sample sizes are roughly equal); populations are categorized in one way only (one factor/treatment).
"Based on two different estimates of a common population variance."
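Equivalently, a test of independence between a categorical variable and a numeric variable: H₀: the mean of the numeric variable does not depend on the category.
ANOVA on two samples is equivalent to the pooled two-sample t-test: F = t², and the two-sided p-value is the same.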
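A quick numerical check of this equivalence, as a minimal sketch in plain Python. It uses the first two rows of the BP reduction data listed further down (meds and exercise); the helper names `pooled_t` and `anova_f` are hypothetical and not part of the original page.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Pooled two-sample t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def anova_f(samples):
    """One-way ANOVA F statistic, F = MS_B / MS_W."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    ss_b = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_w = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

meds = [10, 12, 9, 15, 13]
exercise = [6, 8, 3, 0, 2]
t = pooled_t(meds, exercise)
f = anova_f([meds, exercise])
print(t ** 2, f)  # the two values agree up to rounding error
```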

Type or paste the samples' data, row by row:

Samples' statistics:
sample# n mean s

#samples k: Total # of data N=∑n_i: Grand mean X̿:
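The summary fields above can be reproduced with a short script. This is only a sketch, assuming one sample per line with whitespace-separated values; `parse_samples` and `summarize` are hypothetical names, and Data Set A from further down is used as input.

```python
from statistics import mean, stdev

def parse_samples(text):
    """One sample per non-blank line, values separated by whitespace."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]

def summarize(samples):
    """Return per-sample (n, mean, s) rows plus k, N and the grand mean."""
    rows = [(len(s), mean(s), stdev(s)) for s in samples]
    k = len(samples)
    n_total = sum(n for n, _, _ in rows)
    grand_mean = sum(sum(s) for s in samples) / n_total
    return rows, k, n_total, grand_mean

data_set_a = """7 3 6 6
6 5 5 8
4 7 6 7"""
rows, k, n_total, grand_mean = summarize(parse_samples(data_set_a))
for i, (n, m, s) in enumerate(rows, start=1):
    print(i, n, round(m, 3), round(s, 3))   # sample#, n, mean, s
print(k, n_total, round(grand_mean, 3))     # k, N, grand mean
```

Note that the grand mean is the mean of all N values, which equals the mean of the sample means only when the samples are the same size.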

Between (treatment): SS_B = ∑n_i(x̄_i-X̿)² :   df_B = k-1:   MS_B [BGMS] = SS_B/df_B:
separation among the samples; between-group variance
Within (error): SS_W = ∑∑(x_ij-x̄_i)² = ∑(n_i-1)s_i² :   df_W = N-k:   MS_W [WGMS] = SS_W/df_W:
scatter within each sample; within-group variance
Total: SS(Total) = SS_B + SS_W :   df(Total) = N-1 = df_B + df_W :

F =MS_B/MS_W: ratio of variance between samples and variance within samples (i.e. the two estimates of σ²)
If there is no difference in the means, the between-group variance estimate will be roughly the same as the within-group variance estimate, and the F test value will be roughly equal to 1 and so H₀ not rejected.
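To see both situations side by side, the following sketch computes the ANOVA table for Data Set A and Data Set B from further down. The two sets share their second and third rows, and the first row of B is the first row of A shifted up by 10. The function name `anova_table` and the dictionary keys are illustrative choices, not part of the original page.

```python
def anova_table(samples):
    """Sums of squares, degrees of freedom, mean squares and F."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Between: each sample mean's squared distance from the grand mean,
    # weighted by the sample size.
    ss_b = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    # Within: squared distance of every value from its own sample mean.
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_B": ss_b, "df_B": df_b, "MS_B": ms_b,
            "SS_W": ss_w, "df_W": df_w, "MS_W": ms_w, "F": ms_b / ms_w}

data_set_a = [[7, 3, 6, 6], [6, 5, 5, 8], [4, 7, 6, 7]]
data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]
table_a = anova_table(data_set_a)
table_b = anova_table(data_set_b)
print(round(table_a["F"], 3))  # about 0.143
print(round(table_b["F"], 3))  # about 51.571
```

Both data sets have the same within-group sum of squares (SS_W = 21), so the change in F comes entirely from the between-group term.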

Critical value: numerator DF (df_B = k-1) is the column and denominator DF (df_W = N-k, which equals k(n-1) when every sample has size n) is the row in the 0.05 F table.
Critical value (C.V.): α=0.05: ** If C.V. < F, reject H₀.
When the means differ significantly, the between-group variance will be much larger than the within-group variance, and F will be much greater than 1 and so reject H₀.

p_value: ** If p < α, reject H₀.
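Where no F table is at hand, the p-value and critical value can be computed directly. The sketch below uses only the Python standard library and evaluates the F distribution's upper tail through the regularized incomplete beta function (continued-fraction method); the critical value is then found by bisection. All function names here are hypothetical, and this is one possible approach rather than how the original page necessarily computes its values.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered and odd-numbered terms of the fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_p_value(f, df_b, df_w):
    """Upper-tail probability P(F > f) for an F(df_b, df_w) distribution."""
    return reg_inc_beta(df_w / 2.0, df_b / 2.0, df_w / (df_w + df_b * f))

def f_critical(alpha, df_b, df_w):
    """Critical value by bisection: the f with P(F > f) = alpha."""
    lo, hi = 0.0, 1.0
    while f_p_value(hi, df_b, df_w) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_p_value(mid, df_b, df_w) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p_a = f_p_value(1 / 7, 2, 9)   # Data Set A: F = 1/7 with df = (2, 9)
cv = f_critical(0.05, 2, 9)    # compare with the 0.05 F table entry
print(round(p_a, 3), round(cv, 2))  # about 0.869 and 4.26
```

For Data Set A the p-value is far above 0.05 and F is far below the critical value, so both decision rules agree: H₀ is not rejected.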

If separation between the samples ("signal") is large relative to the scatter ("noise") within the samples, F is large and it's likely that at least one of the samples came from a population different than the others. But how many and which ones? (use Scheffe and Tukey tests)

If H₀ rejected, use Scheffe test to find significant differences in means. Look at each pair of samples.
F_s = (x̄_i-x̄_j)² / [MS_W(1/n_i+1/n_j)]
Critical value (C.V. is the ANOVA F critical value found above): F' = (k-1)(C.V.) =
i j x̄_i x̄_j Fs
* significant difference between this pair.
(Occasionally no pair is flagged even though H₀ was rejected; the significant difference may then lie between the average of two or more means and another mean rather than between any single pair.)
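The pairwise Scheffe comparison can be sketched as follows for Data Set B (listed further down). The function name `scheffe_pairs` is hypothetical, and the critical value 4.26 is an assumption: it is the usual 0.05 F table entry for df_B = 2 and df_W = 9 and should be replaced by the value for your own data.

```python
from itertools import combinations

def scheffe_pairs(samples, f_crit):
    """Return (i, j, F_s, significant) for every pair of samples.

    f_crit is the one-way ANOVA critical value (C.V.), for example read
    from the 0.05 F table with df_B = k-1 and df_W = N-k.
    """
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_w = ss_w / (n_total - k)
    f_prime = (k - 1) * f_crit  # Scheffe critical value F'
    results = []
    for i, j in combinations(range(k), 2):
        f_s = (means[i] - means[j]) ** 2 / (
            ms_w * (1 / len(samples[i]) + 1 / len(samples[j])))
        results.append((i + 1, j + 1, f_s, f_s > f_prime))
    return results

data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]
scheffe_results = scheffe_pairs(data_set_b, f_crit=4.26)
for i, j, f_s, sig in scheffe_results:
    print(i, j, round(f_s, 2), "*" if sig else "")
```

Sample 1 differs significantly from samples 2 and 3, while samples 2 and 3 have the same mean and give F_s = 0.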

If H₀ rejected and all samples are same size, use Tukey range test to find significant differences in means. Looks at each pair of samples.
q = (x̄_i-x̄_j) / √(MS_W/n)
Critical value: number of samples k is the column, df_W = N-k (= k(n-1) for equal sample sizes n) is the row in the 0.05 Tukey (studentized range) table.
Critical value @α=0.05:
i j x̄_i x̄_j q
* significant difference between this pair, i.e. |q| > critical value.
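A matching sketch for the Tukey comparison, again on Data Set B. The function name `tukey_pairs` is hypothetical, and the critical value 3.95 is an assumption taken from memory of standard 0.05 studentized range tables for k = 3 and df_W = 9; check it against your own table before relying on it.

```python
from itertools import combinations

def tukey_pairs(samples, q_crit):
    """Return (i, j, q, significant) for every pair of equal-sized samples."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this form of the Tukey test needs equal sample sizes")
    k = len(samples)
    means = [sum(s) / n for s in samples]
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_w = ss_w / (k * n - k)       # df_W = N - k = k(n - 1)
    se = (ms_w / n) ** 0.5          # denominator of q
    return [(i + 1, j + 1, (means[i] - means[j]) / se,
             abs(means[i] - means[j]) / se > q_crit)
            for i, j in combinations(range(k), 2)]

data_set_b = [[17, 13, 16, 16], [6, 5, 5, 8], [4, 7, 6, 7]]
tukey_results = tukey_pairs(data_set_b, q_crit=3.95)
for i, j, q, sig in tukey_results:
    print(i, j, round(q, 2), "*" if sig else "")
```

The flagged pairs match the Scheffe result for this data set: sample 1 against each of the other two.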

row-by-row  4 categories: small, midsize, large, SUV. head injuries HIC
253 143 124 301 422 324 258 271 467 298 315 304
 117 121 204 195 186 178 157 203 132 212 229 235
 249 90 178 114 183 87 180 103 154 129 266 338 
121 112 261 145 198 193 193 111 276 156 213 143


skulls   3 archaeological sites, skull measurement
131 138 125 129 132 135 132 134 138
129 134 136 137 137 129 136 138 134
128 138 136 139 141 142 137 145 137

Data Set A
7 3 6 6
6 5 5 8
4 7 6 7

Data Set B
17 13 16 16
6 5 5 8
4 7 6 7

Exercise 1   4 categories: small, midsize, large, SUV.  Chest compression
29 31 35 33 26 32 21 26 25 34 26 34
32 28 26 23 25 26 19 29 26 20 22 22
27 32 39 27 31 26 34 30 34 26 24 31
24 31 31 25 30 39 22 33 34 35 29 26

Exercise 2
8 7 7 7 8 8 6 8 8 7 7 8 8
7 8 7 7 5 8 5 8 7 6 6 6 6
4 9 6 7 9 8 5 8 7 5 4 5 4


Clancy-Rowling-Tolstoy pages.  Flesch Reading Ease Scores. (higher=easier)
58.2 73.4 73.1 64.4 72.7 89.2 43.9 76.3 76.4 78.9 69.4 72.9
85.3 84.3 79.5 82.5 80.2 84.6 79.2 70.9 78.6 86.2 74.0 83.7
69.4 64.2 71.4 71.6 68.5 51.9 72.2 74.4 52.8 58.4 65.4 73.6

chars/word
4.8 4.5 4.6 4.5 4.0 4.0 4.6 4.5 4.4 4.4 4.3 4.3
4.1 4.2 4.2 4.4 4.3 4.2 4.5 4.5 4.3 4.0 4.4 4.3
4.3 4.5 4.5 4.5 4.5 4.8 4.3 4.2 4.7 4.3 4.4 4.5

BP reduction: meds, exercise, diet
10 12 9 15 13
6 8 3 0 2
5 9 12 8 4


7 14 32 19 10 11
10 1 1 0 11 1
1 12 1 9 1 11

4 rice varieties' yields
934 1041 1028  935
880  963  924  946
987  953  976  840
992 1143 1140 1191

unequal-sized samples OK
10.99 9.72 11.29 13.04 9.53 9.53 13.15 11.53 9.06 11.08 9.07 9.06 10.48 6.17 6.55
10.59 9.46 12.78 9.72 8.46 15.66 11.43 12.16 8.43 10.63 12.27 9.12 12.93 10.49 11.27 10.49 16.63 11.96 9.35 14.05 8.94 12.52 7.1 8.67 12.49
13.2 11.5 10.6 10 6.5 8.8 9.6 14.1 12 5.7 11.9 9.8 8.9 12.8 14 13.7 8.4 10 11.9 13.9 9.5 10.4 7.6 7.4 13.4 15 10.7 14 12 9 12 15.6 10.8 15.6 3.1 13.4 11.2 10.1 11.2 5

11.85 8.54 9.71 9.38 8.94 8.13 7.92 8.95 8.35 4.85 5.34 8.73 6.66 9.86 6.57 9.24 3.69 6.67 11.36 14.19
