Tests of Hypotheses and Confidence Intervals

Statistical evaluation and decisions are often based on samples rather than on whole populations. For example, we may wish to decide on the basis of a sample whether an immunization is effective in preventing people from getting a disease.

In order to decide, we have to make assumptions, or simply guesses, about the population. The population can be "the population in full", "the total production of beef", "all children under 16 years of age" or "every occurrence of antibiotic material in the seas around Denmark".


If 30 tosses of a die yield 10 "ones", you would perhaps say that the die is unfair. But perhaps this single series of 30 tosses, even though random, just turned out to be a little special. A priori we expect "the one" to come up 1/6 of the time. Forget for a moment the problem of deciding how large a sample to choose. We can then make the hypothesis that there is no difference between the result from the sample and the result from the whole population. We call such a hypothesis a Null Hypothesis (H0). In this example the expected result is p = 1/6 (S = 5, where S is the number of "ones" in 30 tosses), because we know from the probability calculation that the chance of a "one", in the whole population as well as in a sample, is 1/6. The alternative to H0 is p ≠ 1/6, or S ≠ 5. Yet another type of hypothesis is one-sided, for example S < 7.
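How special is a series with 10 "ones"? As a minimal sketch, assuming a fair die and independent tosses, the exact binomial probability of getting at least 10 "ones" in 30 tosses can be computed directly. The function names binomial_pmf and tail_at_least are illustrative and not part of the original text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n, p = 30, 1 / 6  # 30 tosses, chance of a "one" under H0
print("expected number of ones:", n * p)
print("P(at least 10 ones):", round(tail_at_least(10, n, p), 4))
```

The result is a small probability, which suggests that 10 "ones" is unusual for a fair die but by no means impossible.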

Test Design Problems

A test of hypothesis, or test of significance, is the procedure that enables us to decide whether to accept or reject a hypothesis H0. It determines whether the sample differs significantly from the expected result or whether the difference might be explained as purely random, that is, whether the result from the sample falls within the area of confidence.

To get trustworthy results from tests of hypotheses or other bases for decision, the test must be designed to minimize errors of decision. This is not simple, especially if the sample size is given: some errors may be more serious than others, and if you reduce one type you may increase the influence of another. Even if the sample size is not given, it is often difficult to alter it. Think of the number of patients with a specific diagnosis who may suffer pain before the sample size of a pain-treatment study has been changed, or of new knowledge about relations between variables that were collected and registered some years ago.

Level of confidence

In designing a test of a hypothesis we could choose a confidence level of 0.95, which means that we allow random variation to make us reject the hypothesis in 5 of 100 cases when we should have accepted it. If we want this to happen in only 1 of 100 cases, we have to test H0 at a confidence level of 0.99. This means that we accept getting the wrong answer, rejecting a true hypothesis, in 1 of 100 cases. This type of error is called a Type 1 error. If we on the other hand accept a hypothesis when it should have been rejected, it is called a Type 2 error.

Normal Distribution

If we assume that the sample is normally distributed, which holds in most cases, the sample has a mean μ_s and a standard deviation σ_s. The distribution in the example above is typically normal. This means that the number of "ones" may vary each time we select a sample of 30 tosses, but if the sample is large (>= 30) the number of "ones" concentrates around 5, and the following curve illustrates the distribution of the number of "ones" in the sample. The standardized normal distribution, with mean 0 and variance 1, has been chosen, with z = (X - μ_s)/σ_s on the first axis, where X (called S above) is the number of "ones" counted in the sample.

The die example above: z = (X - μ)/σ, where μ = Np is the mean and σ = √(Npq) is the standard deviation of the number of "ones" in N tosses (following from the Binomial Distribution).
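The critical values of z used in this text (1.96, 2.58 and 2.33) are normally read from a table of the Normal Distribution. As a hedged sketch, they can also be computed with Python's standard library; the helper name critical_z is an assumption made for illustration only.

```python
from statistics import NormalDist

def critical_z(confidence, two_tailed=True):
    """Critical value of the standard normal for a confidence level."""
    alpha = 1 - confidence                     # accepted Type 1 error rate
    tail = alpha / 2 if two_tailed else alpha  # area in each rejection tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.95, 0.99):
    print("two-tailed", level, round(critical_z(level), 2))
print("one-tailed 0.99", round(critical_z(0.99, two_tailed=False), 2))
```

A two-tailed test splits the accepted error between both tails, which is why the 0.99 level gives 2.58 for a two-tailed test but 2.33 for a one-tailed test.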

The probabilities of getting a "one" and of getting "not a one" are 1/6 and 5/6, respectively.

Would we accept H0: p = 1/6 1) with 8 "ones" in a sample of 30 at a confidence level of 0.95, and 2) with 28 "ones" in a sample of 90 at a confidence level of 0.99?

For the sample of 30: N = 30, p = 1/6 and q = 1 - p = 5/6.

μ = Np = 30*(1/6) = 5 and σ = √(Npq) = √(30*(1/6)*(5/6)) = 2.04

The confidence limits:

z = (X - 5)/2.04 = 1.96, so X = 9.0

z = (X - 5)/2.04 = -1.96, so X = 1.0

Another method:

5 + 1.96*2.04 = 9.0

5 - 1.96*2.04 = 1.0

A third method:

-1.96< (X-5)/2.04 <1.96

or 1.0<X<9.0
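The three methods above give the same limits. As a minimal sketch, assuming the normal approximation to the binomial used in this text, the calculation can be written as a small function; the names confidence_limits and accept_h0 are hypothetical.

```python
from math import sqrt

def confidence_limits(n, p, z):
    """Return (lower, upper) limits for the number of successes under H0."""
    mean = n * p                 # mu = Np
    sd = sqrt(n * p * (1 - p))   # sigma = sqrt(Npq)
    return mean - z * sd, mean + z * sd

def accept_h0(observed, n, p, z):
    """True if the observed count lies inside the confidence limits."""
    lower, upper = confidence_limits(n, p, z)
    return lower < observed < upper

lower, upper = confidence_limits(30, 1 / 6, 1.96)
print(round(lower, 1), round(upper, 1))    # limits for 30 tosses at 0.95
print(accept_h0(8, 30, 1 / 6, 1.96))       # 8 ones in 30 tosses
print(accept_h0(28, 90, 1 / 6, 2.58))      # 28 ones in 90 tosses at 0.99
```

The same function can be reused for the sample of 90 by changing N and the critical value of z.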



For the sample of 90: μ = Np = 90*(1/6) = 15 and σ = √(Npq) = √(90*(1/6)*(5/6)) = 3.54

15 + 2.58*3.54 = 24.1

15 - 2.58*3.54 = 5.9

1) Accept the null hypothesis, since 8 lies between 1.0 and 9.0, if you accept a Type 1 error in 5 of 100 cases.

2) Reject the null hypothesis, since 28 lies above the upper limit, if you accept a Type 1 error in 1 of 100 cases.

You often formulate a hypothesis H0 in order to reject it, because it is often easier to test that X equals a number than that X is less than or greater than a number.

A new fertilizer product was claimed by the supplier to pollute the subsoil water in a measurable amount in only 0.1% of all wells. In a sample of 2000 water tests, one from each randomly selected well, 6 were measured as polluted. Determine whether the claim was legitimate.

Let p denote the probability of measurable pollution:

H0: p = 0.1%, and the claim is correct

H1: p > 0.1%, and the claim is false

We choose a one-tailed test, since we are interested in determining whether the proportion of polluted wells is too high.

If the level of significance is taken as 0.01, i.e. if the shaded area in the figure is 0.01, then z1 = 2.33. This can be seen from the table of the Normal Distribution, since the area between 0 and z1 is then 0.49 (0.50 minus the one tail of 0.01).

The decision rule:

1. The claim is not legitimate if z is larger than 2.33 (in which case we reject H0)

2. Otherwise, the claim is legitimate and the observed results are due to chance (in which case we accept H0)

If H0 is true, μ = Np = 2000*0.001 = 2 and σ = √(Npq) = √(2000*0.001*0.999) = 1.4

Now 6 in standard units is (6 - 2)/1.4 = 2.857, which is more than 2.33. Thus we have to conclude that the claim is not legitimate, and H0 is rejected.

If not 6 but 5 polluted wells had been found, the number of standard units would have been 2.14, and this might have made us accept the claim at the same level of significance.
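The one-tailed decision rule for the wells can be sketched in a few lines. This is an illustration under the same normal approximation, with the hypothetical names z_score and Z_CRITICAL. Because the standard deviation is not rounded to 1.4 here, z comes out slightly smaller than in the hand calculation (about 2.83 and 2.12), but the decisions are the same.

```python
from math import sqrt

def z_score(observed, n, p0):
    """Observed count in standard units, assuming H0: p = p0."""
    mean = n * p0                  # mu = Np
    sd = sqrt(n * p0 * (1 - p0))   # sigma = sqrt(Npq)
    return (observed - mean) / sd

Z_CRITICAL = 2.33  # one-tailed test at significance level 0.01

for polluted in (6, 5):
    z = z_score(polluted, 2000, 0.001)
    decision = "reject H0" if z > Z_CRITICAL else "accept H0"
    print(polluted, "polluted wells:", round(z, 2), decision)
```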


A hot example from Denmark (not finished)

Provisional English comments on the investigation of fertility among immigrants in Denmark

In Denmark we have no official, correct account of the number of immigrants living in the country. The authorities are now trying to improve the basis of their population forecasts by estimating the demographic parameter of fertility among foreign women who have immigrated to Denmark. Fertility is the average number of children each woman is expected to bear during her fertile years.
China and Mexico are among the few countries in the world where the UN has observed an adjustment of the fertility rate, even in the native population. The population policies in these countries had to be very restrictive to achieve these decreases. In Denmark the assertion by E. Vesselbo about an adjustment of the fertility rate is aimed at the Islamic immigrants (about 70% of the immigrants to Denmark over the last 20 years), and the Danish population policy is certainly the opposite of restrictive.

In early November 2000 SFI (Institute of Social Investigations) published a report that adopted the idea that the fertility of foreigners adjusts to the fertility of Danish women from the second generation of immigrants onwards. We therefore assume that they have made an analysis based on a sample of 700 (mentioned both in the media and in the abstract below) covering "length of stay in Denmark" and "total number of children born".
We have not read the report yet, but we certainly have a few relevant questions inspired by the reports in the press:
Was the sample of 700 drawn at random, and how was it ensured that the parameters were estimated without bias?

How was the collection of information managed: by letter, including response rates (according to the abstract, by interviews), or with help from local authorities checking by use of the Central Personal Number (no longer confidential information)? The latter follows from a letter (ref. no. LR/mbj, journal no. 1.3.9-018) of July 7th, 2000, "On Notification to the Supervision of Data of Private Files", mailed from the organization Pension and Assurance in the House of Pension, Copenhagen, and from Law no. 426 on The Central File of Persons of May 31st, 2000.

We have links to both in Danish:
Starting with: "Personnummeret er ikke fortrolig oplysning" (The personal number is not confidential information)

"Ny lovgivning fjerner dit privatliv" (New legislation removes your privacy)

Earlier (in the late 1980s) the authorities refused to use the Central Personal Number, referring to the confidentiality of exactly this number.
Taking this possibly new fact, which comes from an NGO, into consideration, it is not easy to understand why the authorities do not measure the fertility of immigrants directly by use of the Central Personal Number. This would certainly ensure the quality of such an investigation.
Is this to be understood in the sense that the Central Personal Number is still confidential (even to the authorities) when it has been assigned to an immigrant?
As this was a new investigation, how did it register the information?
Were the 700 asked by questionnaire (no, by interviews according to the Danish abstract), and was the information verified in any way?
Were the immigrants asked how many children they wished to have, or were they asked after having borne all their children, now counted in the third generation?
Was the way most Pakistanis and Turks marry, by finding their bride in the homeland, included in or excluded from the analysis?
Which test was used (no test was used, according to the Danish abstract), and how was it used to reach the conclusion that the immigrants do not have more children than Danish women, provided they belong to the second generation of immigrants in Denmark?