One-Factor Analysis of Variance
We want a test of whether there are differences among various groups on some characteristic of the response variable. If there is evidence of such differences, a second analysis involves finding just which groups are different and the degree to which they differ.
The analysis of variance procedure involves partitioning a measure of the total variation of the observations about the overall mean into two independent parts. One of these, BSS (the between sum of squares), is the portion of the total that is explained by the differences among the sample means. The other part, WSS (the within sum of squares), is the portion of the total that cannot be explained by these differences among the means.
When we measure the same phenomenon time after time, apparently under the same circumstances, we may notice variation in the results around their mean. We may see similar variation if we measure the same phenomenon in other areas, and perhaps also differences from area to area. Are we sure we are measuring precisely the same phenomenon in the different areas when we look at the varying results? Can the differences among the sample means be explained by chance? To answer this we have to analyze the variance and compare the variation among the means with what chance fluctuation alone would produce. How can we measure this chance fluctuation?
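The partition of the total variation into a between part and a within part can be sketched in a few lines of Python. The three small groups below are made up purely for illustration (they are not the antibiotic data that follows):

```python
# Sketch of the one-factor ANOVA partition: TSS = BSS + WSS.
# The three small groups below are invented purely for illustration.
groups = [[4, 6, 5], [8, 9, 10], [5, 7, 6]]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Total variation of every observation about the grand mean.
tss = sum((y - grand_mean) ** 2 for g in groups for y in g)

# Between: variation of the group means about the grand mean.
bss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within: variation of each observation about its own group mean.
wss = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

print(tss, bss, wss)
assert abs(tss - (bss + wss)) < 1e-9  # the partition holds exactly
```

The final assertion is the point of the sketch: however the groups are chosen, the between and within sums of squares always add back up to the total.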
120 tests (N) of the antibiotic amount in the water of 8 areas (k), i.e. n = 15 tests per area. The columns Y1 to Y8 are the areas:

Test No (n)   Y1   Y2   Y3   Y4   Y5   Y6   Y7   Y8
     1        81   81   86   87   83   59   78   94
     2        56   92   94   88   76   67   96   88
     3        58   88   82   86   55   74   86   77
     4        81   86   87   89   90   90   76   57
     5        78   90   79   90   91   61   57   77
     6        83   77   88   87   86   91   58   75
     7        71   74   78   86   84   57   76   78
     8        55   65   62   77   93   56   89   83
     9        76   78   90   78   76   89   94   84
    10        85   81   89   77   52   77   57   85
    11        83   78   77   75   75   76   87   76
    12        58   76   75   93   87   56   53   78
    13        77   79   57   91   75   91   81   90
    14        56   57   87   90   86   65   91   87
    15        54   79   83   84   76   69   74   67
Yg (mean)     70   79   81   85   79   72   77   80
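The area means in the Yg row can be reproduced directly from the table. A minimal Python sketch, where each list is one column (Y1 to Y8) of the table above:

```python
# Each list holds the 15 tests for one area (columns Y1..Y8 of the table above).
areas = [
    [81, 56, 58, 81, 78, 83, 71, 55, 76, 85, 83, 58, 77, 56, 54],  # Y1
    [81, 92, 88, 86, 90, 77, 74, 65, 78, 81, 78, 76, 79, 57, 79],  # Y2
    [86, 94, 82, 87, 79, 88, 78, 62, 90, 89, 77, 75, 57, 87, 83],  # Y3
    [87, 88, 86, 89, 90, 87, 86, 77, 78, 77, 75, 93, 91, 90, 84],  # Y4
    [83, 76, 55, 90, 91, 86, 84, 93, 76, 52, 75, 87, 75, 86, 76],  # Y5
    [59, 67, 74, 90, 61, 91, 57, 56, 89, 77, 76, 56, 91, 65, 69],  # Y6
    [78, 96, 86, 76, 57, 58, 76, 89, 94, 57, 87, 53, 81, 91, 74],  # Y7
    [94, 88, 77, 57, 77, 75, 78, 83, 84, 85, 76, 78, 90, 87, 67],  # Y8
]

# Area means (the Yg row) and the grand mean; the table shows them rounded.
means = [sum(col) / len(col) for col in areas]
grand_mean = sum(sum(col) for col in areas) / sum(len(col) for col in areas)

print([round(m) for m in means])  # rounded area means, as in the Yg row
print(round(grand_mean))          # rounded overall mean
```

The exact means are not whole numbers (for example 70.13 for area 1); the table, and the squared deviations below, work with rounded figures, which causes small discrepancies in later sums.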
Squared deviations of each test from its area mean, (Yi1-Y1g)^2, ..., (Yi8-Y8g)^2:

Test No (n)   Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8
     1         81     4    25     4     9   196     1   196
     2        256   169   169     9    16    36   361    64
     3        196    81     1     1   625     1    81     9
     4         81    49    36    16   100   289     1   529
     5         36   121     4    25   121   144   400     9
     6        121     4    49     4    36   324   361    25
     7          1    25     9     1    16   256     1     4
     8        289   196   361    64   169   289   144     9
     9         16     1    81    49    16   256   289    16
    10        169     4    64    64   784    16   400    25
    11        121     1    16   100    25     9   100    16
    12        196     9    36    64    49   289   576     4
    13         25     0   576    36    25   324    16   100
    14        256   484    36    25    36    64   196    49
    15        260     0     4     1     9     8     9   162

S sq.        2104  1148  1467   463  2036  2501  2936  1217
Variance    150.3  82.0 104.8  33.1 145.4 178.6 209.7  86.9
St. dev.     12.3   9.1  10.2   5.8  12.1  13.4  14.5   9.3

(S sq. is the column sum of the squared deviations; Variance = S sq./(n-1) with n-1 = 14; St. dev. is the square root of the variance.)
WSS:
Within sum of squares:
Σi Σj (Yij - Yjg)^2 = 13873.42

Degrees of freedom: total sample size - number of groups:
N - k = 120 - 8 = 112

Within estimate of variance:
WSS/(N-k) = 13873.42/112 = 123.87

The overall mean of Y is the mean of the combined sample of N = 120, which is:

(Σi Σj Yij)/N = 78

BSS:
Between sum of squares:
Σ nj (Ygj - Ygg)^2 = 2482.19

This sum of squares is based on k-1 degrees of freedom (number of groups - 1):
8 - 1 = 7

and
BSS/(k-1) = 354.60
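BSS needs only the eight column totals and the grand total. The totals below were re-added from the data table (they are not printed in the text), and the exact, unrounded means are used, which is why the result is 2482.19 rather than the roughly 2475 you get from the rounded Yg row:

```python
# Column (area) totals of the 15 tests per area, re-added from the data table.
col_sums = [1052, 1181, 1214, 1278, 1185, 1078, 1153, 1196]
n = 15                        # tests per area
N = n * len(col_sums)         # 120 observations in all

grand_mean = sum(col_sums) / N

# BSS = sum over areas of n * (area mean - grand mean)^2
bss = sum(n * (s / n - grand_mean) ** 2 for s in col_sums)
print(round(bss, 2))  # 2482.19, matching the text
```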

so that the ratio of the between estimate of variance to the within estimate is:
F = [BSS/(k-1)] / [WSS/(N-k)] = 354.60/123.87 = 2.86
Source    Sum of Squares   DF             Mean Square   F      Prob > F
Between   2482.19          k-1 = 7        354.60        2.86   p < 0.01
Within    13873.42         k(n-1) = 112   123.87
Total     16355.61         kn-1 = 119
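The whole table can be reproduced from the raw data in plain Python. Recomputing from the integers in the data table gives a within sum of squares of about 13868.4 rather than the hand-computed 13873.42 (the worked table rounds its squared deviations), but the F ratio agrees to two decimals:

```python
# One-factor ANOVA for the antibiotic data, computed from scratch.
# Each list holds the 15 tests for one area (columns Y1..Y8 of the data table).
areas = [
    [81, 56, 58, 81, 78, 83, 71, 55, 76, 85, 83, 58, 77, 56, 54],  # Y1
    [81, 92, 88, 86, 90, 77, 74, 65, 78, 81, 78, 76, 79, 57, 79],  # Y2
    [86, 94, 82, 87, 79, 88, 78, 62, 90, 89, 77, 75, 57, 87, 83],  # Y3
    [87, 88, 86, 89, 90, 87, 86, 77, 78, 77, 75, 93, 91, 90, 84],  # Y4
    [83, 76, 55, 90, 91, 86, 84, 93, 76, 52, 75, 87, 75, 86, 76],  # Y5
    [59, 67, 74, 90, 61, 91, 57, 56, 89, 77, 76, 56, 91, 65, 69],  # Y6
    [78, 96, 86, 76, 57, 58, 76, 89, 94, 57, 87, 53, 81, 91, 74],  # Y7
    [94, 88, 77, 57, 77, 75, 78, 83, 84, 85, 76, 78, 90, 87, 67],  # Y8
]

k = len(areas)                          # 8 areas
N = sum(len(col) for col in areas)      # 120 tests in all
grand_mean = sum(sum(col) for col in areas) / N

# Between and within sums of squares.
bss = sum(len(col) * (sum(col) / len(col) - grand_mean) ** 2 for col in areas)
wss = sum((y - sum(col) / len(col)) ** 2 for col in areas for y in col)

ms_between = bss / (k - 1)              # between estimate of variance
ms_within = wss / (N - k)               # within estimate of variance
f_ratio = ms_between / ms_within

print(f"Between  SS={bss:9.2f}  df={k - 1:3d}  MS={ms_between:7.2f}  F={f_ratio:.2f}")
print(f"Within   SS={wss:9.2f}  df={N - k:3d}  MS={ms_within:7.2f}")
print(f"Total    SS={bss + wss:9.2f}  df={N - 1:3d}")
```

If SciPy is available, `scipy.stats.f_oneway(*areas)` returns the same F ratio together with a p-value just under 0.01.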
Whenever H0 is true, this ratio F will have a value near 1, where H0 is the hypothesis of
"no difference" among the population means.
Few tables of the F distribution are so comprehensive that they include all possible combinations of degrees of freedom. I can mention:
F.95(7, 120) = 2.09 and F.99(7, 120) = 2.79
F.95(7, ∞) = 2.01 and F.99(7, ∞) = 2.64
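These tabulated critical values can be cross-checked with SciPy's F-distribution quantile function, assuming SciPy is installed:

```python
# Cross-check of the quoted F critical values (assumes SciPy is installed).
from scipy.stats import f

print(round(f.ppf(0.95, 7, 120), 2))  # 95% point of F(7, 120)
print(round(f.ppf(0.99, 7, 120), 2))  # 99% point of F(7, 120)
```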
The conclusion is that the mean square sb^2 = 354.60 is so much larger than the mean square sw^2 = 123.87 that the difference is significant; F = 2.86 exceeds even the 1% critical value F.99(7, 120) = 2.79.
The sum of squares between the area means plus the sum of squares within the areas should add up to the total sum of squares. Each sum of squares is also called a variation. When we divide a variation by the appropriate degrees of freedom, we get an estimated variance.

The differences between the columns are "explained" by the fact that the values come from different parent populations (areas, pollution, or other unrecognized phenomena). The variance within the columns may be explained by the measuring apparatus, by the person responsible for the measurements, or by quite other factors not yet known. Without knowing the relative importance of these sources of variation (and how large a share is explained by chance), you cannot go on to improve the process, because you do not know where to begin.

The form of variance analysis concentrated on here is also called the simple analysis of variance. It assumes that the observations are classified into k groups. In each group the number of observations (n) may vary or be held constant. If it varies, you have to use the weighted average of the group means, with the group sizes as weights. A constant number per group is the simplest case and is the one used here. But the method does not change very much when these assumptions are altered a little.
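The unequal-group-size case mentioned above can be sketched briefly: the grand mean becomes the weighted average of the group means (weights nj), and each group contributes nj times its squared deviation to BSS. The groups below are made up for illustration:

```python
# One-factor ANOVA with unequal group sizes (made-up data for illustration).
groups = [[4, 6, 5, 7], [8, 9, 10], [5, 7, 6, 6, 7]]

sizes = [len(g) for g in groups]
means = [sum(g) / len(g) for g in groups]
N, k = sum(sizes), len(groups)

# Grand mean = weighted average of the group means, weights n_j.
grand_mean = sum(n * m for n, m in zip(sizes, means)) / N

# Each group contributes n_j * (mean_j - grand mean)^2 to BSS.
bss = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
wss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

f_ratio = (bss / (k - 1)) / (wss / (N - k))
print(round(f_ratio, 2))
```

Note that with equal group sizes the weighted average reduces to the ordinary mean of the group means, so this sketch contains the equal-n case used in this section as a special case.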