One-Factor Analysis of Variance

We want a test of whether there are differences among various groups on some characteristic of the response variable. If there is evidence of such differences, a second analysis involves finding just which groups are different and the degree to which they differ.

The analysis of variance procedure involves partitioning this measure of the total variation of the observations about the overall mean into two independent parts. One of these, BSS, is the portion of the total that is explained by the differences among the sample means. The other part, WSS, is the portion of the total that cannot be explained by these differences among the means.
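In symbols, a standard way to write this partition (with Yij the i-th observation in group j, Ȳj the mean of group j, and Ȳ the overall mean) is:

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\bigl(Y_{ij}-\bar{Y}\bigr)^{2}}_{\text{total sum of squares}}
\;=\;
\underbrace{\sum_{j=1}^{k} n_j\bigl(\bar{Y}_{j}-\bar{Y}\bigr)^{2}}_{\text{BSS (between)}}
\;+\;
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\bigl(Y_{ij}-\bar{Y}_{j}\bigr)^{2}}_{\text{WSS (within)}}
```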

When we measure the same phenomenon time after time, apparently under the same circumstances, we may notice variation in the results around their mean. We may recognize a similar variation if we measure the same phenomenon in other areas, but perhaps also differences from area to area. Are we sure we are measuring precisely the same phenomenon in the different areas when we look at the varying results? Can the differences among the sample means be explained by chance? To answer this we have to analyze the variance and compare it with chance fluctuations. How can we measure this chance fluctuation?

 

120 tests (N = 120) of the antibiotic amount in the water of 8 areas (k = 8), with n = 15 tests per area:

Test No (n)                 Areas (k)
             Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8
X1           81    81    86    87    83    59    78    94
X2           56    92    94    88    76    67    96    88
X3           58    88    82    86    55    74    86    77
X4           81    86    87    89    90    90    76    57
X5           78    90    79    90    91    61    57    77
X6           83    77    88    87    86    91    58    75
X7           71    74    78    86    84    57    76    78
X8           55    65    62    77    93    56    89    83
X9           76    78    90    78    76    89    94    84
X10          85    81    89    77    52    77    57    85
X11          83    78    77    75    75    76    87    76
X12          58    76    75    93    87    56    53    78
X13          77    79    57    91    75    91    81    90
X14          56    57    87    90    86    65    91    87
X15          54    79    83    84    76    69    74    67
Yg (mean)    70    79    81    85    79    72    77    80
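To make the arithmetic below reproducible, here is a minimal sketch (plain Python, no libraries assumed) that holds the table as a list of rows and recomputes the area means; the Yg row above shows these rounded to whole numbers:

```python
# The data table as rows (tests 1..15) over columns (areas Y1..Y8).
data = [
    [81, 81, 86, 87, 83, 59, 78, 94],
    [56, 92, 94, 88, 76, 67, 96, 88],
    [58, 88, 82, 86, 55, 74, 86, 77],
    [81, 86, 87, 89, 90, 90, 76, 57],
    [78, 90, 79, 90, 91, 61, 57, 77],
    [83, 77, 88, 87, 86, 91, 58, 75],
    [71, 74, 78, 86, 84, 57, 76, 78],
    [55, 65, 62, 77, 93, 56, 89, 83],
    [76, 78, 90, 78, 76, 89, 94, 84],
    [85, 81, 89, 77, 52, 77, 57, 85],
    [83, 78, 77, 75, 75, 76, 87, 76],
    [58, 76, 75, 93, 87, 56, 53, 78],
    [77, 79, 57, 91, 75, 91, 81, 90],
    [56, 57, 87, 90, 86, 65, 91, 87],
    [54, 79, 83, 84, 76, 69, 74, 67],
]

# Area (column) means; the Yg row of the table rounds these to integers.
group_means = [sum(col) / len(col) for col in zip(*data)]
print([round(m, 2) for m in group_means])
```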

 

 

Squared deviations (Yij - Yjg)^2 of each observation from its area mean:

               Y1      Y2      Y3      Y4      Y5      Y6      Y7      Y8
               81       4      25       4       9     196       1     196
              256     169     169       9      16      36     361      64
              196      81       1       1     625       1      81       9
               81      49      36      16     100     289       1     529
               36     121       4      25     121     144     400       9
              121       4      49       4      36     324     361      25
                1      25       9       1      16     256       1       4
              289     196     361      64     169     289     144       9
               16       1      81      49      16     256     289      16
              169       4      64      64     784      16     400      25
              121       1      16     100      25       9     100      16
              196       9      36      64      49     289     576       4
               25       0     576      36      25     324      16     100
              256     484      36      25      36      64     196      49
              260       0       4       1       9       8       9     162

Sum of sq.   2104    1148    1467     463    2036    2501    2936    1217
Variance   280.57  153.08  195.64   61.79  271.47  333.50  391.47  162.29
St. dev.     16.8    12.4    14.0     7.9    16.5    18.3    19.8    12.7

 

 

WSS (Within Sum of Squares):

WSS = Σi Σj (Yij - Yjg)^2 = 13873.42

Degrees of freedom: total sample size - number of groups:

N - k = 120 - 8 = 112

Within estimate of variance:

WSS/(N - k) = 13873.42/112 = 123.87

The overall mean of Y is the mean of the combined sample of N = 120, which is:

Ygg = (Σi Σj Yij)/N = 78

BSS (Between Sum of Squares):

BSS = Σj nj (Yjg - Ygg)^2 = 2482.19

This sum of squares is based on k - 1 degrees of freedom (number of groups - 1):

8 - 1 = 7

and the between estimate of variance is:

BSS/(k - 1) = 2482.19/7 = 354.60

so that the test statistic is the ratio of the two variance estimates:

F = [BSS/(k - 1)] / [WSS/(N - k)] = 354.60/123.87 = 2.86
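A minimal sketch of the whole computation, reusing the `data` list defined after the data table; because it works with exact rather than rounded group means, its WSS and F differ slightly from the hand-computed values above:

```python
# One-factor ANOVA from scratch, reusing `data` from the earlier sketch.
groups = list(zip(*data))            # one sample of 15 observations per area
k = len(groups)                      # number of groups: 8
N = sum(len(g) for g in groups)      # total sample size: 120

grand_mean = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

# Within sum of squares: deviations of each observation from its group mean.
wss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
# Between sum of squares: group means around the grand mean, weighted by n_j.
bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))

ms_between = bss / (k - 1)           # between estimate of variance
ms_within = wss / (N - k)            # within estimate of variance
F = ms_between / ms_within
print(f"WSS={wss:.2f}  BSS={bss:.2f}  MS_b={ms_between:.2f}  "
      f"MS_w={ms_within:.2f}  F={F:.2f}")
```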

 

 

 

Source     Sum of Squares    DF               Mean Square    F       Prob > F
Between    2482.19           k - 1 = 7        354.60         2.86    p < 0.01
Within     13873.42          k(n - 1) = 112   123.87
Total      16355.61          kn - 1 = 119
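If SciPy happens to be available, the same test is a single call; this is only a cross-check on the hand computation, again reusing the `data` list from the earlier sketch:

```python
# Cross-check with SciPy's built-in one-way ANOVA (assumes scipy is installed).
from scipy.stats import f_oneway

result = f_oneway(*zip(*data))       # one positional sample per area
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```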

Whenever H0 is true, this ratio F will have a value near 1, where H0 is the hypothesis of "no difference" among the population means.

Few tables of the F- distribution are so comprehensive that they include all possible combinations of degrees of freedom. I can mention:

F.95(7, 120) = 2.09 and F.99(7, 120) = 2.79

F.95(7, ∞) = 2.01 and F.99(7, ∞) = 2.64
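Instead of interpolating in printed tables, critical values for any combination of degrees of freedom can be looked up numerically, for example with SciPy (an assumption that SciPy is at hand; any F-quantile routine would do):

```python
# Percent points (quantiles) of the F-distribution for df1 = 7, df2 = 120.
from scipy.stats import f

print(f.ppf(0.95, 7, 120))   # about 2.09
print(f.ppf(0.99, 7, 120))   # about 2.79
```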

The conclusion is that the mean square sb^2 = 354.60 is so much larger than the mean square sw^2 = 123.87 that the difference is significant even at the 1% level: F = 2.86 exceeds F.99(7, 120) = 2.79.

 

 

The sum of squares between the area means plus the sum of squares within the areas should add up to the total sum of squares. Each sum of squares is also called a variation. When we divide a variation by the appropriate degrees of freedom, we get the estimated variance.
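With the numbers from the table above, the check reads:

```latex
\text{BSS} + \text{WSS} = 2482.19 + 13873.42 = 16355.61 = \text{total sum of squares},
\qquad (k-1) + (N-k) = 7 + 112 = 119 = N - 1 .
```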

The differences between the columns are "explained" by the fact that the values come from different parent populations (areas, pollution, other unrecognized phenomena). The variance within the columns may be explained by the measuring apparatus, by the person responsible for the measurements, or by quite other factors not yet known.

Without knowing the relative importance of the different sources of variation (and how large a share is explained by chance), you cannot go on to improve matters, because you do not know where to begin.

The form of variance analysis concentrated on here is also called the simple analysis of variance. It assumes that the observations are classified into k groups. In each group the number of observations (n) may vary or be held constant. If it varies, you have to use a weighted average and introduce the group sizes as weights, as in the sketch below. A constant number per group is the simplest case and is the one used here, but the results do not change very much when these assumptions are altered slightly.
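A small sketch of the unequal-group-size case (the group sizes here are hypothetical, chosen only to illustrate the weighting): each group mean enters BSS with its own group size nj as the weight, which is exactly the weighted-average idea mentioned above.

```python
# Unequal group sizes: weight each group mean by its own n_j (hypothetical data).
groups = [
    [81, 56, 58, 81, 78],              # n_1 = 5
    [81, 92, 88],                      # n_2 = 3
    [86, 94, 82, 87, 79, 88, 78],      # n_3 = 7
]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

# BSS weights each group mean by its group size; WSS needs no special weighting.
bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
wss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
F = (bss / (k - 1)) / (wss / (N - k))
print(f"BSS={bss:.2f}  WSS={wss:.2f}  F={F:.2f}")
```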