Chi-Square Tests

Chi-square (χ^2) is a very popular form of hypothesis testing, but also one that is subject to substantial abuse. You have to consider that you might reject a hypothesis although it is correct, and also that you might accept a hypothesis although it is false. We shall now see that you can make yet another mistake: simply asking the wrong question before the analysis has begun.

We shall also look at alternatives to chi-square. Results obtained in samples do not always agree exactly with the theoretical results expected according to the rules of probability. It is often useful to know whether the observed frequencies differ significantly from the expected frequencies. But there are several difficulties, and several opportunities to arrive at unreliable results, as we shall see.

 

Tests for Goodness of Fit, or more logically, a Badness-of-Fit Statistic
Since in the following we concentrate on measured (observed) values and on the values expected according to the rules of probability, we use o for an observed value and e for an expected value.

Event                 E1    E2    ...   Ek
Observed frequency    o1    o2    ...   ok
Expected frequency    e1    e2    ...   ek

The chi-square statistic χ^2 is defined as:

χ^2 = (o1 - e1)^2/e1 + (o2 - e2)^2/e2 + ... + (ok - ek)^2/ek = Σ (oj - ej)^2/ej >= 0,

with the sum taken over j from 1 to k.

"By how much does the observed frequency deviate from the expected frequency?"

Example:

You throw a die 180 times. For a fair die you expect each of the faces 1, 2, 3, 4, 5 and 6 to show in 1/6 of the 180 throws, that is, 30 times each.

But you actually observe the following result, compared here with the expectations:

Event       1    2    3    4    5    6
Observed   23   34   26   39   20   38
Expected   30   30   30   30   30   30

χ^2 = [(23 - 30)^2 + (34 - 30)^2 + (26 - 30)^2 + (39 - 30)^2 + (20 - 30)^2 + (38 - 30)^2]/30

= (49 + 16 + 16 + 81 + 100 + 64)/30

= 326/30 = 10.87
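The same sum can be checked in a few lines of Python. This is only a minimal sketch: the function name chi_square is an illustrative choice, and the sum runs over all six events in the table.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all events."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 180 throws, 30 expected for each of the six faces.
observed = [23, 34, 26, 39, 20, 38]
expected = [30] * 6

chi2 = chi_square(observed, expected)
print(round(chi2, 2))  # 10.87
```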

If χ^2 = 0, the expected and the observed frequencies agree exactly. When χ^2 > 0 they do not agree. The larger χ^2 is, the greater the discrepancy between the observed and the expected frequencies.

When Σ oj = Σ ej = N (here N = 180), the statistic can also be written as

χ^2 = Σ (oj^2/ej) - N, with the sum taken over j from 1 to k.
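The shorter form follows from expanding the square. A brief derivation, using only the definition of the statistic and the condition that observed and expected frequencies both sum to N:

```latex
\begin{aligned}
\chi^2 &= \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}
        = \sum_{j=1}^{k} \frac{o_j^2 - 2\,o_j e_j + e_j^2}{e_j} \\
       &= \sum_{j=1}^{k} \frac{o_j^2}{e_j} - 2\sum_{j=1}^{k} o_j + \sum_{j=1}^{k} e_j
        = \sum_{j=1}^{k} \frac{o_j^2}{e_j} - 2N + N
        = \sum_{j=1}^{k} \frac{o_j^2}{e_j} - N
\end{aligned}
```

For the six observed counts of the die example, the sum of oj^2/ej is 5726/30 = 190.87, and 190.87 - 180 = 10.87.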

Imagine that you repeat the experiment of 180 throws of the die many times. You would then expect the observed frequencies, and with them χ^2, to vary from one repetition to the next.
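This repetition can be imitated on a computer. The sketch below assumes a perfectly fair die simulated with Python's standard random module; the function name, the number of repetitions (2000) and the seed are arbitrary illustrative choices.

```python
import random


def simulate_chi_square(n_throws, n_repeats, seed=0):
    """Simulate repeated fair-die experiments; return the chi-square of each."""
    rng = random.Random(seed)
    expected = n_throws / 6
    results = []
    for _ in range(n_repeats):
        counts = [0] * 6
        for _ in range(n_throws):
            counts[rng.randint(1, 6) - 1] += 1
        results.append(sum((o - expected) ** 2 / expected for o in counts))
    return results


values = simulate_chi_square(n_throws=180, n_repeats=2000)
mean_chi2 = sum(values) / len(values)

# Chi-square of the six observed counts in the die table, for comparison.
observed = [23, 34, 26, 39, 20, 38]
observed_chi2 = sum((o - 30) ** 2 / 30 for o in observed)

# Share of simulated fair-die experiments at least as extreme as the observed one.
share_as_extreme = sum(v >= observed_chi2 for v in values) / len(values)
print(round(mean_chi2, 2), round(share_as_extreme, 3))
```

The spread of the simulated values shows how much χ^2 varies by chance alone, and the share of repetitions at least as extreme as the observed table is a rough, simulation-based counterpart to reading the chi-square table.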

 

Confidence Intervals for χ^2
As with other distributions, we can define 95%, 99% or other confidence limits and intervals for chi-square, and estimate the population standard deviation σ in terms of a sample standard deviation s by using the table of the chi-square distribution.
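As a sketch of the kind of interval meant here, assume (neither assumption is stated in the text) that the n observations come from a normal population and that s^2 is the sample variance computed with divisor n - 1. A 95% confidence interval for the population standard deviation is then:

```latex
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{0.975}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{0.025}}}
```

Here the two χ^2 values are the 97.5% and 2.5% points of the chi-square table for n - 1 degrees of freedom.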

Degrees of Freedom
The n degrees of freedom in a sample of n observations are reduced by one for every average or other parameter that must first be estimated from the sample and is then used in the further calculations. In effect you have adjusted the expected values to the raw data, so that the ej fit the oj a little better. Therefore the reduction in degrees of freedom is justified.

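In the example above the number of degrees of freedom is 6 - 1 = 5.

Summing over all six events, including the term (39 - 30)^2 = 81 for event 4, gives χ^2 = 326/30 = 10.87, and for 5 degrees of freedom χ^2_0.90 = 9.24 < 10.87 < χ^2_0.95 = 11.07. You therefore cannot claim comme il faut that your die is fair: fairness is rejected at the 10% level, although not at the customary 5% level.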
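Instead of bracketing the statistic between two table values, the tail probability can be computed directly. The sketch below uses only the Python standard library and a standard recurrence for the upper tail of the chi-square distribution with a whole number of degrees of freedom. The helper name chi2_upper_tail is hypothetical, and the statistic is taken as the sum over all six events of the die table.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 2 gives exp(-z), df = 1 gives erfc(sqrt(z)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


chi2 = 326 / 30  # die example, all six events
p_value = chi2_upper_tail(chi2, df=5)
print(round(p_value, 3))
```

A tail probability between 0.05 and 0.10 says the same thing as the bracketing by the 90% and 95% table values.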

 

Contingency Tables
Example:

Suppose you classify 2000 salmon in Norway by the number of lice found on each fish and by the registered breeding region (South, North) of the fish.

 

 

 

 

 

 

                        Lice class j
                        1        2        3        4        Total        Relative
Region i                0-10     10-30    30-50    50-      frequency    frequency
1  South                281      251      260      288      1080         0.540
2  North                221      240      251      208       920         0.460
Total frequency         502      491      511      496      2000         1.000
Relative frequency      0.251    0.246    0.256    0.248    1.000

 

 

Null hypothesis (H0): independence is assumed, that is, the number of lice does not depend on the region.

 

Estimated bivariate probabilities and χ^2 calculations

                                                  Lice class j
Step                                   Region     1        2        3        4
1. Bivariate probabilities             South      0.136    0.133    0.138    0.134
                                       North      0.115    0.113    0.118    0.114
2. Expected frequencies Eij            South      272      266      276      268
                                       North      230      226      236      228
3. Deviations (Oij - Eij)              South      9        -15      -16      20
                                       North      -9       14       15       -20
4. Relative squared deviations         South      0.298    0.846    0.926    1.493
   (Oij - Eij)^2/Eij                   North      0.352    0.867    0.953    1.754


χ^2 = Σ (Oij - Eij)^2/Eij = 7.489

Number of degrees of freedom = (r - 1)(c - 1), here (2 - 1)(4 - 1) = 3.

The observed χ^2 = 7.489 is bracketed by χ^2_0.90 = 6.25 and χ^2_0.95 = 7.81.

Thus: 0.05 < p-value < 0.10.


The null hypothesis that the number of lice does not differ significantly between North and South cannot be rejected at the customary 5% level, although it would be rejected at the 10% level.
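The four calculation steps can be reproduced from the observed table alone. The sketch below is one way to do it in plain Python. It keeps the expected frequencies unrounded (row total times column total divided by 2000), so it gives about 7.73 rather than 7.489. The gap comes from the rounding of probabilities and expected frequencies in the hand calculation, and both values lie between the table values 6.25 and 7.81.

```python
observed = [
    [281, 251, 260, 288],  # South, lice classes 0-10, 10-30, 30-50, 50-
    [221, 240, 251, 208],  # North
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of relative squared deviations over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 2), df)
```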

For more on tests of hypotheses, levels of significance and p-values (Prop-Value), see the link about this.

Criticism:
The test completely misses the point because it asks the wrong question. The right question perhaps concerns salmon bred artificially in fish farms versus salmon breeding freely, or salmon living close together versus salmon living more dispersed in nature. You may observe that the first group seems to have more lice than the other, but this has perhaps nothing to do with the regions South and North. It would also be of interest to know how the sample of 2000 individuals was actually drawn. Are we sure that North/South indicates, or adds, anything beyond the assumption of independence between the two groups?

Even if you have found a significant difference in the number of lice at a given level of significance, perhaps in another, corrected analysis, you still do not learn how large this difference actually is or what it otherwise means. If χ^2 is close to zero you should also look at the analysis with suspicion, since it is rare that observed frequencies agree too well with expected frequencies (how was the sample collected?). If χ^2 is less than χ^2_0.05 or χ^2_0.01 we would decide ("comme il faut", but perhaps not in analyses of recovery after medical treatment) that the agreement is too good.

Confidence Interval as an alternative
Comparing the means of the two regions, with a confidence interval for their difference, is often much easier and better.
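The salmon table gives only lice classes, not the lice count of each fish, so the two regional means cannot be computed from it. As a stand-in, and purely as an assumption for illustration, the sketch below compares the share of fish in the highest lice class (50-) in the two regions and builds an approximate 95% confidence interval for the difference, using the usual normal approximation for two proportions.

```python
import math

# Counts from the salmon table: fish in the highest lice class (50-) per region.
south_heavy, south_total = 288, 1080
north_heavy, north_total = 208, 920

p_south = south_heavy / south_total
p_north = north_heavy / north_total
diff = p_south - p_north

# Normal-approximation standard error of the difference of two proportions.
se = math.sqrt(
    p_south * (1 - p_south) / south_total
    + p_north * (1 - p_north) / north_total
)
z = 1.96  # two-sided 95% point of the standard normal distribution
lower, upper = diff - z * se, diff + z * se

print(round(diff, 3), round(lower, 3), round(upper, 3))
```

Unlike the χ^2 value, the interval is expressed in the units of the question: it states how large the difference between the regions might be, not only whether one exists.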

Another Example of Contingency Tables
Two groups, A and B, each consist of 200 people who have a disease. A serum is given to group A but not to group B (which we call the control group); otherwise, the two groups are treated identically. It is found that 152 people in group A and 131 people in group B recover from the disease. Let us test the hypothesis that the serum helps to cure the disease at significance levels of 0.01, 0.05 and 0.10, which means that we are willing to run the risk of wrongly concluding that the serum helps in 1, 5 and 10 out of 100 cases, respectively.

If we assume that the serum has no effect (H0), the recovery rate is the same in both groups and is estimated from the pooled data as 283/400. We then expect 141.5 people in each group to recover and 58.5 in each group not to recover.

Frequencies Observed

           Recover   No recover   Total
Group A    152       48           200
Group B    131       69           200
Total      283       117          400

Frequencies Expected under H0

           Recover   No recover   Total
Group A    141.5     58.5         200
Group B    141.5     58.5         200
Total      283       117          400

χ^2 = (152 - 141.5)^2/141.5 + (131 - 141.5)^2/141.5 + (48 - 58.5)^2/58.5 + (69 - 58.5)^2/58.5

= 5.33

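Number of degrees of freedom: (2 - 1)(2 - 1) = 1

For 1 degree of freedom the table gives χ^2_0.90 = 2.71, χ^2_0.95 = 3.84 and χ^2_0.99 = 6.63. The observed χ^2 lies between 3.84 and 6.63, so the results are significant at the 0.10 and 0.05 levels but not at the 0.01 level.

At the 0.05 level we therefore reject H0 and conclude that the serum has an effect; at the stricter 0.01 level we cannot reject H0 and would withhold our decision pending further tests.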
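For a 2 x 2 table the statistic can also be obtained without writing out the expected frequencies, through the well-known shortcut χ^2 = N(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)]. This is equivalent to estimating the expected frequencies from the row and column totals of the observed table (141.5 and 58.5 per group here). The sketch below applies it to the serum data; the variable names and the dictionary of table values are illustrative, and no continuity correction is applied.

```python
import math

# Observed 2 x 2 table: a, b = group A (recover, no recover); c, d = group B.
a, b = 152, 48
c, d = 131, 69
n = a + b + c + d

# Shortcut formula for the chi-square of a 2 x 2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Tabulated chi-square points for 1 degree of freedom, keyed by significance level.
critical = {0.10: 2.71, 0.05: 3.84, 0.01: 6.63}
significant = {level: chi2 > value for level, value in critical.items()}

print(round(chi2, 2), significant)
```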

 

 

When numerical variables are to be analysed, too much manipulation of the raw material just dims the picture, and many χ^2 tests could instead be replaced by methods that compare averages, by regression (simple or multiple), or by other multiple comparisons. One of the problems with chi-square is that expected and observed values are mixed together before you account for the result. You might ask whether you are testing your observations or your expectations; it is often quite an open question. The conclusions of the other tests are seldom as ambiguous as those of the chi-square analysis, and the results are often clearer and better.

A general interpretation of the misuse of Chi:

Causal interpretation almost left..

"Just because two phenomena appear at the same time,
a connection between them does not necessarily follow for that simple reason."