For quantitative data (numerical measurements) we use t-tests, but these cannot be used for qualitative (non-numerical) data. For qualitative data we use Chi-Squared ($\chi^2$) tests. Chi-Squared tests provide an objective method of investigating the probabilities of individuals falling into specific categories or groups, which is why they are used for qualitative data.
The Chi-Squared test is used to:
There are rules to follow when using a Chi-Squared test:
Both R and Minitab perform Chi-Squared tests. However, it is useful to learn how to carry out the test by hand. The steps are as follows:
\begin{equation} \chi^2 = \sum {\frac{(O-E)^2}{E}} \end{equation}
where:
\begin{align} O &= \text{Observed value} \\ E &= \text{Expected value} \\ \sum{} &= \text{Sum of} \end{align}
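The formula translates directly into a few lines of code. The following is a minimal Python sketch; the function name `chi_squared_statistic` and the sample counts are illustrative assumptions and do not come from R, Minitab or any other package.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, chosen only to illustrate the arithmetic.
observed = [18, 22, 20]
expected = [20, 20, 20]

# (18-20)^2/20 + (22-20)^2/20 + (20-20)^2/20 = 0.2 + 0.2 + 0.0
print(chi_squared_statistic(observed, expected))
```

When every observed count equals its expected count the statistic is zero, and it grows as the observed counts move away from the expected ones.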
We shall now look at an example in which we examine the numbers of dogs expressing the different genotypes found in a generation from a Mendelian breeding trial. The Merle (M) gene in dogs gives an all-white coat when present as a homozygote (MM). The heterozygote (Mm) has a dappled coat of normal and dilute pigmentation, with white markings on the shoulders and head. The mm genotype gives normal coat colour. Theory predicts that, when two Mm animals are mated, the offspring should be found in a ratio of $1$ (MM) : $2$ (Mm) : $1$ (mm).
After carrying out a series of matings the results are pooled (see table below) and the Mendelian pattern is examined.
| Coat Type | Observed $(O)$ |
|---|---|
| White | 23 |
| Merle | 58 |
| Normal | 39 |
| Total | 120 |
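The expected counts under the $1 : 2 : 1$ ratio, and the resulting statistic, can be sketched in Python as follows. This is a minimal illustration that assumes each expected count is the pooled total multiplied by that genotype's share of the ratio; the variable names are chosen for this example only.

```python
# Observed counts from the pooled matings.
observed = {"White (MM)": 23, "Merle (Mm)": 58, "Normal (mm)": 39}
# Mendelian ratio expected when two Mm animals are mated.
ratio = {"White (MM)": 1, "Merle (Mm)": 2, "Normal (mm)": 1}

total = sum(observed.values())    # 120 animals in the pooled results
ratio_sum = sum(ratio.values())   # 1 + 2 + 1 = 4

# Expected count for each coat type = total * (its share of the ratio).
expected = {coat: total * share / ratio_sum for coat, share in ratio.items()}

# Chi-Squared statistic: sum of (O - E)^2 / E over the three coat types.
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

# Degrees of freedom for a single classification: number of categories - 1.
df = len(observed) - 1

print(expected)              # expected counts of 30, 60 and 30
print(round(chi_sq, 1), df)  # 4.4 2
```

Under these assumptions the statistic is about $4.4$ on $2$ degrees of freedom. This is below the tabulated value of $5.991$ at the $P = 0.05$ level, so these data would not give grounds to reject the $1 : 2 : 1$ ratio.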
The hair length and eye colour of all the cats visiting a veterinary practice over five days were recorded.
The corresponding contingency table is given below:
| | Blue Eyes | Yellow Eyes | Green Eyes | Total |
|---|---|---|---|---|
| Short Hair | 25 | 48 | 13 | 86 |
| Long Hair | 8 | 32 | 24 | 64 |
| Total | 33 | 80 | 37 | 150 |
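The hypothesis we are testing is: there is no association between eye colour and hair length in this particular group of cats. If this is rejected, we conclude that there is evidence of an association.
We know we need to use a Chi-Squared test because we are dealing with qualitative data: eye colour and hair length (short or long) are not numerical.
As in the example above, we need to calculate the expected values (1 d.p.), this time for each cell of the contingency table. Under the hypothesis of no association, $E = \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$. For example, for short-haired cats with blue eyes, $E = \dfrac{86 \times 33}{150} = 18.9$.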
Here is the contingency table for the expected $(E)$ numbers:
| | Blue Eyes | Yellow Eyes | Green Eyes | Total |
|---|---|---|---|---|
| Short Hair | 18.9 | 45.9 | 21.2 | 86 |
| Long Hair | 14.1 | 34.1 | 15.8 | 64 |
| Total | 33 | 80 | 37 | 150 |
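The expected numbers can be reproduced from the observed table alone. The sketch below is a minimal Python illustration, assuming the usual rule that each expected count is the row total multiplied by the column total, divided by the grand total; the variable names are illustrative.

```python
# Observed counts: rows are hair length, columns are eye colour.
observed = [
    [25, 48, 13],  # short hair: blue, yellow, green eyes
    [8, 32, 24],   # long hair:  blue, yellow, green eyes
]

row_totals = [sum(row) for row in observed]        # [86, 64]
col_totals = [sum(col) for col in zip(*observed)]  # [33, 80, 37]
grand_total = sum(row_totals)                      # 150

# Expected count for each cell = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 1) for e in row])
# [18.9, 45.9, 21.2]
# [14.1, 34.1, 15.8]
```

Each row of expected values still adds up to the observed row total, which is a quick check on the arithmetic.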
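We now need to calculate each cell's contribution to the Chi-Squared statistic (1 d.p.) using $\frac{(O-E)^2}{E}$. The statistic itself is the sum of these contributions: $\chi^2 = \sum \frac{(O-E)^2}{E}$.
Here is the table of contributions to $\chi^2$.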
| | Blue Eyes | Yellow Eyes | Green Eyes |
|---|---|---|---|
| Short Hair | 2.0 | 0.1 | 3.2 |
| Long Hair | 2.6 | 0.1 | 4.3 |
Our Chi-Squared statistic is $2.0 + 0.1 + 3.2 + 2.6 + 0.1 + 4.3 = 12.3$.
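The cell-by-cell contributions and their total can be checked with a short Python sketch. It is a minimal illustration that recomputes the expected counts from the observed table and works with unrounded values throughout; the variable names are assumptions made for this example.

```python
observed = [
    [25, 48, 13],  # short hair: blue, yellow, green eyes
    [8, 32, 24],   # long hair:  blue, yellow, green eyes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (O - E)^2 / E, with E = row * column / grand total.
contributions = []
for i, row in enumerate(observed):
    out_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        out_row.append((o - e) ** 2 / e)
    contributions.append(out_row)

# The Chi-Squared statistic is the sum over every cell of the table.
chi_sq = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(c, 1) for c in row])
print(round(chi_sq, 1))
# [2.0, 0.1, 3.2]
# [2.6, 0.1, 4.3]
# 12.3
```

Summing the unrounded contributions gives the same value to 1 d.p. as adding the rounded entries in the table.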
We compare this with a Chi-Squared table on $(\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1) = 1 \times 2 = 2$ degrees of freedom. The corresponding critical $\chi^2$ value is $5.991$ at the $P = 0.05$ level.
$12.3$ exceeds $5.991$, so we reject the hypothesis that there is no association between hair length and eye colour in this group of cats and conclude that there is evidence of an association. Note: Minitab and R both yield $P = 0.002$, which is highly significant (i.e. strong evidence that there is an association).
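The tabulated value and the $P$-value can both be checked by hand in this case. For exactly $2$ degrees of freedom the upper-tail probability of the Chi-Squared distribution has the closed form $e^{-x/2}$. The Python sketch below relies on that special case only, so it is not a general replacement for R or Minitab; the function names are illustrative assumptions.

```python
import math


def p_value_df2(statistic):
    """Upper-tail probability for a Chi-Squared statistic on exactly 2 d.f."""
    return math.exp(-statistic / 2)


def critical_value_df2(alpha):
    """Value exceeded with probability alpha on exactly 2 d.f."""
    return -2 * math.log(alpha)


rows, cols = 2, 3               # hair lengths by eye colours
df = (rows - 1) * (cols - 1)    # (2 - 1) * (3 - 1) = 2

statistic = 12.3                # the Chi-Squared statistic found above

print(df)                                   # 2
print(round(critical_value_df2(0.05), 3))   # 5.991
print(round(p_value_df2(statistic), 3))     # 0.002
```

For other degrees of freedom there is no such simple formula, which is why tables or statistical software are normally used.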
Try our Numbas test on hypothesis testing: Practising confidence intervals and hypothesis tests.
To develop these ideas further see the other sections of Hypothesis Tests (Animal Science).
For additional information on topics covered in this section see the main site's page on hypothesis testing.