Overview of the Chi-Square Test
Chi-square tests, and contingency tables,
are used to test whether counts, or proportions,
are consistent with some specified population
distribution. They can be used to answer
questions such as:
- are people who have seen an advertisement
more likely to purchase a product?
- are people of a particular type under- or
over-represented in a group?
In both these examples the tests would
discover whether the differences could be
explained by chance, or whether they indicate
that the factor being investigated did affect
the result.
The Chi-Square 'Goodness of Fit' test is
used to test whether a sample is drawn from
a population that conforms to a specified
distribution.
The hypotheses are:
H0: the sample conforms to the specified distribution
H1: the sample does not conform to the distribution
The test is illustrated by example. An
organization has three categories of employees,
'A', 'B' and 'C'. It collects the following
data:
| Category | # Employees | Days Sick |
|---|---|---|
| A | 100 | 10 |
| B | 60 | 12 |
| C | 40 | 14 |
| Total | 200 | 36 |
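From this we form the table below, in which
'Days Well' is the number of employees less
the days sick. The expected value for
'Days Well' in each category is calculated from:

$$E_{\text{well}} = \text{employees in category} \times \frac{\text{total days well}}{\text{total employees}}$$

For category A this gives 100 x 164 / 200 = 82.0.
The expected 'Days Sick' values are found in the
same way, using the total days sick.
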
| Category | # Employees | Days Well | Expected Days Well | Chi-Square Contribution (Well) | Days Sick | Expected Days Sick | Chi-Square Contribution (Sick) |
|---|---|---|---|---|---|---|---|
| A | 100 | 90 | 82.0 | 0.78 | 10 | 18.0 | 3.56 |
| B | 60 | 48 | 49.2 | 0.03 | 12 | 10.8 | 0.13 |
| C | 40 | 26 | 32.8 | 1.41 | 14 | 7.2 | 6.42 |
| Total | 200 | 164 | 164 | 2.22 | 36 | 36 | 10.11 |
If the sample conformed exactly to the
distribution, the days well and days sick
would be shared out as shown in the expected
columns. The chi-square statistic is calculated
by summing the chi-square contributions
from every cell (days well and days sick
for each category):

$$\chi^2 = \sum_i \frac{(A_i - E_i)^2}{E_i}$$

Where:
A_i is the actual value for cell 'i'
E_i is the expected value for cell 'i'

For this example the sum is 2.22 + 10.11 = 12.33.
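The calculation above can be sketched in a few lines of Python. This is a minimal illustration using only the figures from the example. The variable names are assumptions made for the sketch, and 'days well' is taken to be employees minus days sick, as in the table.

```python
# Observed data from the example: category -> (employees, days sick).
observed = {"A": (100, 10), "B": (60, 12), "C": (40, 14)}

total_employees = sum(n for n, _ in observed.values())
total_sick = sum(s for _, s in observed.values())
total_well = total_employees - total_sick

chi_square = 0.0
for category, (employees, sick) in observed.items():
    well = employees - sick
    # Expected counts share the totals out in proportion to category size.
    expected_well = employees * total_well / total_employees
    expected_sick = employees * total_sick / total_employees
    # Contribution of each cell: (actual - expected)^2 / expected.
    contrib_well = (well - expected_well) ** 2 / expected_well
    contrib_sick = (sick - expected_sick) ** 2 / expected_sick
    chi_square += contrib_well + contrib_sick
    print(f"{category}: well {contrib_well:.2f}, sick {contrib_sick:.2f}")

print(f"chi-square = {chi_square:.2f}")
```

The printed total should agree with the sum of the two 'Chi-Square Contribution' columns in the table, to rounding.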
There are two degrees of freedom (if two
of the 'days sick' data values are known
the third can be calculated from the totals).
The critical value can be obtained from
tables, or the p-value can be calculated
using e.g. Excel:
=CHIDIST(12.33,2) gives 0.0021
This is well below 0.05, so H0 would be
rejected at the 0.05 level of significance.
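The same p-value can be checked without a spreadsheet. The sketch below relies on the fact that, for exactly two degrees of freedom, the upper-tail probability of the chi-square distribution reduces to exp(-x/2). The function name is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi_square_p_value_2df(statistic):
    # Upper-tail probability for a chi-square distribution with
    # exactly two degrees of freedom: P(X > x) = exp(-x / 2).
    return math.exp(-statistic / 2)

p_value = chi_square_p_value_2df(12.33)
print(f"p-value = {p_value:.4f}")
```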
Contingency tables are an
application of the chi-square test used
when the relationship is between two variables.
For example, the organization decides to
investigate whether there is a relationship
between employees who take sick leave, and
those who take their full entitlement of annual
leave. The hypotheses are:
H0: there is no relationship between taking leave and propensity for sickness
H1: there is a relationship between taking leave and sickness
The data are as follows:
| | Sick | Not Sick | Total |
|---|---|---|---|
| Take Leave | 65 | 55 | 120 |
| Don't take leave | 50 | 30 | 80 |
| Total | 115 | 85 | 200 |
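The expected values for the individual
cells are found from:

$$E_{ij} = \frac{\text{row } i \text{ total} \times \text{column } j \text{ total}}{\text{grand total}}$$

For example, the expected value for
'Take Leave' and 'Sick' is 120 x 115 / 200 = 69.
The chi-square contributions for each cell
are calculated from:

$$\frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$

The expected values, with the chi-square
contributions in parentheses, are: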
| | Sick | Not Sick | Total |
|---|---|---|---|
| Take Leave | 69 (0.23) | 51 (0.31) | 120 |
| Don't take leave | 46 (0.35) | 34 (0.47) | 80 |
| Total | 115 | 85 | 200 |
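The expected values and contributions in this table can be reproduced with a short Python sketch. It assumes the observed counts are held in a plain nested list, with rows for 'Take Leave' and 'Don't take leave' and columns for 'Sick' and 'Not Sick'. The variable names are illustrative only.

```python
# Observed counts: rows = take leave / don't take leave,
# columns = sick / not sick.
observed = [[65, 55], [50, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value for each cell: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell: (actual - expected)^2 / expected.
contributions = [
    [(a - e) ** 2 / e for a, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_square = sum(sum(row) for row in contributions)

for exp_row, con_row in zip(expected, contributions):
    print(", ".join(f"{e:.0f} ({c:.2f})" for e, c in zip(exp_row, con_row)))
print(f"chi-square = {chi_square:.2f}")
```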
The total chi-square value is 1.36. The
number of degrees of freedom can be calculated
from:
(rows - 1) x (columns - 1)
This gives one degree of freedom. The number
of degrees of freedom may also be obtained
by considering that given any cell and the
totals, the values in the remaining cells
can be calculated.
From Excel =CHIDIST(1.36,1) the p-value
is 0.24. As this is greater than 0.05, H0 is
not rejected: the data do not show a significant
relationship at the 0.05 level of significance.
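As a cross-check on the spreadsheet result, the degrees of freedom and p-value can be sketched in Python. For one degree of freedom the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)), which is what the hypothetical helper below uses. It is not a general-purpose routine for other degrees of freedom.

```python
import math

rows, columns = 2, 2
degrees_of_freedom = (rows - 1) * (columns - 1)

def chi_square_p_value_1df(statistic):
    # Upper-tail probability for a chi-square distribution with
    # exactly one degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))

p_value = chi_square_p_value_1df(1.36)
reject_h0 = p_value < 0.05  # compare with the 0.05 significance level

print(f"degrees of freedom = {degrees_of_freedom}")
print(f"p-value = {p_value:.2f}, reject H0: {reject_h0}")
```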