ch16chi-square-tests.Rmd

# Chi-square-tests {#ch-chi-square-tests}

## Introduction {#sec:ch16introduction}

Earlier, we already saw that we cannot always make use of a 
parametric test such as the *t* test or analysis of variance, because
the collected data do not satisfy the assumptions. If the collected data 
have not been measured on an interval level of measurement (see Chapter 
\@ref(ch-levelsofmeasurement)), or if the probability distribution
of the data is far from normal (see
§\@ref(sec:whatifnotnormal)), then a non-parametric test is to be 
preferred over such a parametric test. If the collected data
do satisfy the assumptions for a parametric test, then a non-parametric
test is less sensitive (more conservative) than a parametric test, i.e. the 
non-parametric test requires a larger effect and/or a larger sample, and generally
has less power than a parametric test when seeking out an effect
(see Chapter \@ref(ch-power)).

In this Chapter, we discuss the most used non-parametric
test: the so-called $\chi^2$ test, pronounced as "chi-square-test" (with the greek letter "chi"). 

## $\chi^2$ test for "goodness of fit" in single sample {#sec:chi2gof}

Data of nominal level of measurement are often analysed with the 
$\chi^2$ test. The number of dots on a dice
is an example of a dependent variable of nominal level of measurement:
there is no physical ordering between the six sides, and each side of a die has
an equally high probability of appearing on the top. Imagine we throw
a die $60\times$, and find the following frequencies
of the six possible outcomes: $14, 9, 11, 10, 15, 1$. This can be 
considered to be a sample of $n=60$ throws from an infinite
population of possible throws, and the outcome frequencies reported here should
be seen as a contingency table of 1 row and 6
columns (i.e. 6 cells). How high is the probability of this distribution
of outcomes? Is the die indeed honest?

The $\chi^2$ test is based on the differences between the expected
and observed frequencies. According to the null hypothesis (H0: the die is honest),
we expect 10 outcomes in each cell ($60/6=10$), i.e. the 
expected frequency is identical for each cell (this is called
a *uniform* distribution).
The observed outcomes deviate from the expected frequencies of outcomes, 
in particular because the outcome "six" barely occurs in this sample. This
might of course also have happened by chance. The $\chi^2$ test indicates how high the probability is of this uneven distribution of outcomes (or an even more uneven distribution),
if H0 is true.
The expected outcomes are thus deduced from a distribution of the outcomes
according to H0, and we investigate how well the observed outcomes 
fit the expected outcomes. This form of the $\chi^2$ test is thus also 
referred to as a test of the 'goodness of fit'.

For this example, we find the outcome of the testing $\chi^2=12.44$
with 5 degrees of freedom (see
§\@ref(sec:ttest-freedomdegrees) for explanation about
degrees of freedom), with $p=.03$. We usually use the computer to 
calculate this probability value, but we can also estimate this probability
via a table with critical
$\chi^2$-values, see Appendix \@ref(app-criticalchi2values), and footnote
[^fn16-1]). 
If H0 is true, then we have only 3% probability
of finding this outcome (or an even more uneven distribution of outcomes).
The significance $p$ found is smaller than $\alpha=.05$, and we thus
reject H0. We conclude that this die is not honest: the distribution
of outcomes found deviates significantly from the expected
distribution according to H0. 

## $\chi^2$ test for homogeneity of a variable in multiple samples

The $\chi^2$ test can also be used for a research design with *one* nominal
variable which we have observed in two or more samples. The
question is then whether the distribution of the observations over the
categories is equal for the different samples. This test is comparable with
*t* tests for two independent samples
(§\@ref(sec:ttest-indep)). We usually then summarise the numbers of observations
with a contingency table with multiple rows for the different samples,
and multiple columns for the categories of the nominal dependent variable (see also
Table \@ref(tab:cito-contingency-table)).

The $\chi^2$-test is again based on the differences between the expected and
observed frequencies. According to the null hypothesis (there is no difference
in distribution between the two samples), the distribution of observations
across the columns should be approximately equal for all rows 
(and vice versa).

## $\chi^2$ test for association between two variables in single sample

Finally, the $\chi^2$ test can equally well be used for a research design
with *two* nominal variables, which we have observed in a single 
sample. The question then is whether the distribution of observations
over the second variable's categories is equal for the different
categories of the first variable (and vice versa). We again summarise
the numbers of observations in a contingency table with multiple rows for 
the categories of the first nominal variable, and multiple columns
for the categories of the second nominal variable.

Here too, the $\chi^2$-test is based on the differences between the expected and
observed frequencies. According to the null hypothesis (that there is no association
between the two nominal variables), the distribution of observations across
rows should be approximately equal for all columns, and vice versa. However, this
does *not* mean that we expect the same frequency for all cells.
This is illustrated in the following example. 

---

> *Example 16.1*: In the early morning of 15th April 1912, the *Titanic*
sunk in the Atlantic Ocean. Many of those on board lost their lives.
Those on board could be divided into four classes (1st/2nd/3rd class passengers, and crew). Was the outcome of the disaster (whether the individual survived the 
disaster or not) approximately equal for persons of these four classes?
The contingency table \@ref(tab:titanic) provides the distributions of
outcomes. 

Table: (#tab:titanic) Distribution of those on board the *Titanic* ($N=2201$), 
according to passage
and status (survived or not). Data taken from the dataset 
`Titanic` in R. 
                        
  Class     Died   Survived   Total
  -------- ------ ---------- --------
  1st         122   203         325
  2nd         167   118         285
  3rd         528   178         706
  Crew        673   212         885
  Total      1490   711         2201

> For the expected frequencies, we have to take into account
the different numbers of those on board in the different classes,
and the unequal distribution of outcomes (1490 non-survivors and 711 
survivors). If there were no association between the class and the survival
status, we would expect there to be 220 non-survivors amongst the first class
passengers $[(1490/2201) \times 325 = (325 \times 1490) / 2201 = 220]$ 
and 105 non-survivors $[(711/2201) \times 325 = (325 \times 711) / 2201 =
105]$. In this way, we can determine the expected frequencies for each cell,
taking into account the marginal totals. With the help of these 
expected frequencies, we then calculate $\chi^2=190.4$, here with 3 d.f.,
$p<.001$. The significance $p$ found is smaller than $\alpha=.001$, and
we thus reject H0. We conclude that the outcome of the disaster (died or survived)
was *unevenly* distributed for the four classes of those on board the *Titanic*.

---

For the analysis of contingency tables which consist of precisely
$2\times2$ cells, the Phi coefficient is an effective alternative
(see §\@ref(sec:Phi)).

Reread and remember the warnings about correlation and causality
(§\@ref(sec:correlationcausation))
--- these are also applicable here. 

## assumptions {#sec:chi2test-assumptions}

The $\chi^2$-test requires three assumptions which must be satisfied
in order to use the test.

* The data have to be measured on a nominal level of measurement, or have
to be simplified to nominal level (see Chapter \@ref(ch-levelsofmeasurement)).

* All observations have to be independent of each other, and based
on (a) random sample(s) of the population(s) (see
§\@ref(sec:random-samples)), or on random assignment of the elements
from the sample to experimental conditions (randomisation, see
§\@ref(sec:internalvalidity), point 5). Each element for the sample can thus only
contribute one observation to one cell[^fn16-2].

* The sample has to be large enough so that the expected frequency
($E$) for each cell is at least 5. If the expected frequency or frequencies
in one or more cells is/are less than 5, then reduce the number
of cells by merging bordering cells, and determine the expected
frequencies again. 

## formulas

The test statistic $\chi^2$ is defined as 
\begin{equation}
  (\#eq:chisquared)
    \chi^2 = \sum \frac{(O-E)^2}{E}
\end{equation}
in which $O$ and $E$ indicate the observed and expected numbers of observations for each cell of the frequency table [@Ferg89]. The expected
numbers might also be rational numbers (e.g. $45/6$ for the 6 possible
outcomes of an honest die, if we throw 
$45\times$). The larger the difference $(O-E)$ in one or several cells,
the larger also $\chi^2$ will be (see below). Due to squaring, the test
statistic $\chi^2$ is always null or positive, and never negative
[@Ferg89].

The probability distribution of the test statistic $\chi^2$ is determined by the number of degrees of freedom (see §\@ref(sec:ttest-freedomdegrees) for explanation of this concept).
For a $\chi^2$-test with one nominal variable ("goodness of fit"), the number of degrees of freedom must be equal to the number of cells minus 1. For a  $\chi^2$-test with multiple samples (homogeneity) and/or with two variables (correlations), with respectively $k$ and $m$ categories, the number of degrees of freedom is equal to $(k-1)\times(m-1)$.

For each cell of the frequency table, in row $i$ and column $j$, we can also compute the raw residual:
\begin{equation}
  (\#eq:chi2-rawresidu)
  e_{ij} = \frac{(O_{ij}-E_{ij})}{\sqrt{E_{ij}}}
\end{equation}
If we square these raw residuals and then sum the squares, the result is the $\chi^2$ test statistic given in Eq.\@ref(eq:chisquared) above.  

It is more insightful to compute the *standardized* residual for each cell of the frequency table [@Agre07, p.38]. The standardization means that the standard error of the residuals is taken into account (by using row totals $R_i$, column totals $C_j$, and the grand total $N$):
\begin{equation}
  (\#eq:chi2-stdresidu)
  e_{ij} = \frac{(O_{ij}-E_{ij})}{\sqrt{E_{ij}\times(1-\frac{R_i}{N})\times(1-\frac{C_j}{N})}}
\end{equation}
These standardized residuals may be interpreted as standard normal $Z$ scores, using the critical $Z$ values given in Appendix \@ref(app-criticalZvalues). Hence the adjusted standardized residuals provide insight in the source of a significant outcome of the $\chi^2$ test, and they also allow us to assess the contribution of each cell to that outcome[^fn16-3]. 

[^fn16-3]: If multiple comparisons are performed, then the critical value of $\alpha$ should be adjusted accordingly, in order to prevent Type I errors somewhere among the comparisons. With $k$ cells and $k$ comparisons, a safe precaution is to use $\alpha/k$ instead of $\alpha$ for each comparison -- this is called Bonferroni's adjustement of the $\alpha$ value, or Dunn's procedure [@MD04, p.202]. See also §\@ref(sec:anova-oneway-posthoc). 

For the example given in §\@ref(sec:chi2gof) we find the following six  standardized residuals for the six possible outcomes of the die: $(1.39, -0.35, +0.35, 0.00, 1.73, -3.12)$. The first five of these outcomes are observed approximately as frequently as expected, but the sixth of these outcomes is observed significantly less often than expected ($p=.003$).  

## SPSS

### goodness of fit: preparation

If we want to investigate a nominal variable, then it must of course
be marked as a column in the SPSS data file. Every observation
forms a separate row in the data file, and the nominal independent
variable is a column in the data file. 

Sometimes, we do not have the separate observations (rows) but
do have the table of numbers of observations per category of the nominal 
variable. We can work further with these. Let us say that we have two columns,
named `outcome` and `number`, as follows
(see §\@ref(sec:chi2gof)):

```
Outcome Number
 1        14
 2         9
 3        11
 4        10
 5        15
 6         1
```

Next, each cell (row) has to get a weight that is as large as the
`number` of observations, which is named here in the second column: the
first cell (row) weighs $14\times$, the second cell (row) weighs
$9\times$ etc. Thanks to this trick, we do not have to fill in $N=60$ rows
(a row for each observation), but only 6 rows (a row for each cell). 
```
Data > Weigh Cases... 
```
Choose `Weigh cases by...` and select the variable `number` in
entry field. Confirm with `OK`.

Choose and select the variable `number` in
input field. Confirm with `OK`.

### goodness of fit: testing

```
Analyze > Nonparametric tests > Legacy Dialogs > Chi-square...
```
Select the variables `outcome` (in "Test variable list" panel) and
indicate that we expect *equal* numbers of observations in each cell.
(It is also possible to enter other expected frequencies here,
if other, unequal frequencies are expected according to H0.)
Confirm with `OK`.

### contingency tables: preparation

If we want to investigate two nominal variables, then they must
both be marked as columns in the SPSS data file. Each observation
forms a separate row in the data file, and the nominal variables
are columns in the data file. For Example 16.1 above, we then use a "long"
data file, consisting of $N=2201$ rows, with a separate row for each person
on board, with at least two columns, for `class` and
`survivor`.

Sometimes, we do not have the separate observations (rows) but 
do have the contingency table of numbers of observations for each
combination of categories of the nominal variables. We can also
work further with these. Let us say that we have three columns, named
`class`, `survivor` and `number`, as follows:

```
Class  Survivor   Number
1st     no         122
1st     yes        203
2nd     no         167
2nd     yes        118
3rd     no         528
3rd     yes        178
crew    no         673
crew    yes        212
```
Next, each cell (row) has to get a weight which is as large as
the `number` of observations, which is named in the third column: the
first cell (row) weighs $122\times$, the second cell (row) weighs
$203\times$, etc. With this trick, we do not have to enter $N=2201$ rows
(a row for each observation), but only 8 rows (a row
for each cell). 
```
Data > Weigh Cases... 
```
Choose `Weigh cases by...` and select the variable `number` in
entry field. Confirm with `OK`.

### contingency tables: testing

The testing proceeds in the same way as described in 
§\@ref(sec:Phi) for 
the association between two nominal variables. 

```
Analyze > Descriptives > Crosstabs...
```

Select the variables `class` (in "Rows" panel) and `survivor` (in 
"Columns" panel) for
contingency table \@ref(tab:titanic).\
Choose `Statistics…` and tick the option `Chi-square`. Confirm firstly with
`Continue` and afterwards again with `OK`.

## JASP

### goodness of fit: preparation

The nominal data to investigate are typically coded as a "long" column in the data file. Each observation typically forms a separate row in the data file, and the nominal independent variable is a column in the data file. However, for the "goodness of fit" $\chi^2$ test in JASP, the data have to be entered not in this "long" fashion (with $N$ rows), but in the form of a summary of numbers of observations (counts, frequency) per category of the nominal variable (with $k$ rows, one row for each of $k$ categories). 

For the example in §\@ref(sec:chi2gof) these summary data would look like this:
```
outcome count
 1        14
 2         9
 3        11
 4        10
 5        15
 6         1
```
In order to enter these data in JASP, create a data file (using e.g. Excel or any text editor) with the contents as listed above, including the column headers. Save the file in CSV format (`.csv`, not `.xlsx`) and open it in JASP. 

### goodness of fit: testing

In the top menu bar, choose
```
Frequencies > Classical: Multinomial Test
```
Select the variable containing the categories of the nominal variable, here `outcome`, and place it in the entry field "Factor". 
Select the variable containing the counts (frequencies) of each category, and place it in the entry field "Count". 

Under "Test Values" there are two options. \
If you choose `Equal proportions (multinomial test)`, a special version of the $\chi^2$ test will be performed, testing for a uniform distribution (as explained above, this means that the expected frequency is equal for each outcome category). In this example, this H0 implies that the die is honest, which is exactly what we want to test here.\
If you choose `Expected proportions (chi-square test)`, you may adjust the expected frequencies in each cell. Use this option if your H0 postulates a non-uniform (e.g. gaussian) distribution. A table will appear, in which you must enter the expected frequencies according to *H0* for each category or cell. By default, the values in this table are all equal, so that the default is equivalent to the "equal proportions" or uniform H0 in the first option. 

You may also check `Descriptives` and `Confidence interval` under the heading "Additional Statistics", and check `Descriptives plot` under "Plots", so as to gain better insight in the patterns in your data. 

In JASP it is not possible to obtain the (adjusted) standardized residuals for this test; however you can compute these manually from the observed and expected counts. 

### contingency tables: preparation

The nominal data to investigate are typically coded as two or more "long" columns in the data file. Each observation (e.g. each person on the Titanic, in Example 16.1) corresponds with a separate row in the data file, and the nominal variables are in columns in the data file (e.g. `class` and `outcome`). We can use such a "long" data file for creating a contingency table in JASP, and for performing a $\chi^2$ test on that contingency table --- see the end of the next subsection for further instructions.

However, for performing a $\chi^2$ test on a contingency table in JASP, the data do not necessarily have to be entered in this "long" fashion (with $N$ rows); the data may also be in the form of a summary of numbers of observations (counts, frequency) per category of the nominal variable (with $k$ rows, one row for each of $k$ cells or combinations of categories). 

For example 16.1, the data would then look as follows:
```
class   outcome    count
1st     died       122
1st     survived   203
2nd     died       167
2nd     survived   118
3rd     died       528
3rd     survived   178
crew    died       673
crew    survived   212
```
In order to enter these data in JASP, create a data file (using e.g. Excel or any text editor) with the contents as listed above, including the column headers. Save the file in CSV format (`.csv`, not `.xlsx`) and open it in JASP. 

### contingency tables: testing

The $\chi^2$ test on a contingency table proceeds in the same way as described in 
§\@ref(sec:Phi) for association between two nominal variables. 

In the top menu bar, choose:
```
Frequencies > Classical: Contingency Tables
```
Select one nominal variable (`class`) in the "Rows" field, and the other nominal variable (`outcome`) in the "Columns" field, to set up the contingency table (Table \@ref(tab:titanic)). 
Select the variable `count` into the "Counts" field; this specifies the numbers of observations for each cell.\
Open the `Statistics` section bar, and check the option `Chi-square` ($\chi^2$).
Open the `Cells` section bar, and check the option `Expected counts`. 

The resulting value of the $\chi^2$ test statistic is reported in the output under **Chi-Squared Tests**. 

If you have a "long" data sheet, with one observation per row, then you only need to select one nominal variable (`class`) in the "Rows" field, and the other nominal variable (`outcome`) in the "Columns" field, to set up the contingency table (Table \@ref(tab:titanic)).\
Open the `Statistics` section bar, and check the option `Chi-square` ($\chi^2$). 
Open the `Cells` section bar, and check the option `Expected counts`. Also check the `Pearson residuals` and `Standardized (adjusted Pearson)`. 

The number of survivors is significantly larger than expected under H0 for passengers in first and second class (positive standardized residuals for these cells, both $p<.001$) and significantly lower than expected under H0 for passenger in third class and for crew (negative standardized residuals, both $p<.001$).  


## R

### goodness of fit: testing

```{r}
chisq.test( c( 14, 9, 11, 10, 15, 1 ) ) -> dobbel.chi2.htest # die §16.2
print(dobbel.chi2.htest)
dobbel.chi2.htest$residuals # raw residuals
sum( (dobbel.chi2.htest$residuals)^2 ) # chi2 = sum of sq of raw resid
dobbel.chi2.htest$stdres # standardized residuals
```

### contingency table: preparation and testing

In R, the dataset `Titanic` is provided as a multidimensional matrix. We sum 
the observations and make a contingency table of the first dimension (class) and
the fourth dimension (outcome).
```{r}
apply( Titanic, c(1,4), sum ) -> Titanic.classoutcome
```

Next, we use the contingency (frequency) table as the input for a `chisq.test`. 
The resulting `chisq.htest` object is saved within R in order to inspect its residuals. 
```{r}
chisq.test( Titanic.classoutcome ) -> Titanic.chisq.htest
print(Titanic.chisq.htest)
Titanic.chisq.htest$stdres # standardized residuals
```

The adjusted standardized residuals show the remarkably high number of survivors among the first class passengers, and the remarkably low number of survivors among the ship's crew. 
Note that R here reports the standardized (but not otherwise adjusted) Pearson residuals. 

## Effect size: odds ratio

When using the $\chi^2$-test, the effect size can be reported in the form
of the so-called "odds ratio". The 'odds ratio' is derived from the contingency
table with frequencies per cell; the odds ratio is most commonly used with $2 \times 2$ contingency tables. 
We will explain all these matters using the following example of a $2 \times 2$ contingency table. 

---

> *Example 16.2*:
@DollHill1956 investigated the relation between smoking
and lung cancer. They first surveyed all British doctors about
their age and smoking behaviour. Next, the researchers kept up over the years with
the death notices and cause of death of all those surveyed. The first
outcomes, after more than four years, are summarised in 
Table \@ref(tab:dollhill).

Table: (#tab:dollhill) Contingency table of $N=24354$ British doctors of 35 years 
and older for the first survey, divided according to smoking behaviour (rows: (non-)
smoker currently or previously) and according to death by lung cancer in the last
4 years (columns), with letter indication for the numbers
of observations. 

  Smoking     No lung cancer             Lung cancer              Total
  ---------- ---------------- ------- --------------- -------   --------  -----------
  No (0)               3092   (A)                 1   (B)         3093     (A+B)
  Yes (1)              21178   (C)                83   (D)        21261     (C+D)
  Total                24270  (A+C)               84  (B+D)       24354   (A+B+C+D)

> In the usual manner, we find $\chi^2=10.35$, df=1, $p<.01$. We conclude that
there is an association between smoking behaviour and death from lung cancer. 

---

For the effect size, we firstly calculate the 'odds' of death from
lung cancer for the smokers: D/C= $83/21178 =0.00392$. Amongst the smokers,
there are 83 deaths from lung cancer, compared with 21178
persons not dying from lung cancer (the 'odds' of dying from
lung cancer are 1 in 0.00392). For the non-smokers: 
B/A=$1/3092 =0.00032$ (the 'odds' are 1 in 0.00032).

We call the *ratio* of these two 'odds' for the two groups the
'odds ratio' (abbreviated OR). In this example, we find (D/C) / (B/A) =
AD/BC =
$(3092 \times 83) / (1 \times 2178) = (0.00392)/(0.00032) = 12.1$. The
'odds' of dying from lung cancer are thus more than $12\times$ 
as great for the smokers as for the non-smokers. We report this as follows:

> @DollHill1956 found a significant relation between
> smoking behaviour and death from lung cancer,
> $\chi^2(1)=10.35, p<.01, \textrm{OR}=12.1$. The 'odds' of dying from
> lung cancer seemed to be more than $12\times$ as great for smokers as for 
> non-smokers.


[^fn16-1]: The value found $\chi^2=12.44$ is slightly under the critical value for 5 d.f. and $p=.03$, (there $(\chi^2)^*=12.83$), thus the corresponding probability of this value or a larger value is slightly greater than $0.03$.

[^fn16-2]: If one variable's observations are paired rather than independent (e.g. before/after treatment, passed/failed, etc.), then the McNemar test is a useful alternative.