# Chi-square tests {#ch-chi-square-tests}
## Introduction {#sec:ch16introduction}
Earlier, we already saw that we cannot always make use of a
parametric test such as the *t* test or analysis of variance, because
the collected data do not satisfy the assumptions. If the collected data
have not been measured on an interval level of measurement (see Chapter
\@ref(ch-levelsofmeasurement)), or if the probability distribution
of the data is far from normal (see
§\@ref(sec:whatifnotnormal)), then a non-parametric test is to be
preferred over such a parametric test. If the collected data
do satisfy the assumptions for a parametric test, then a non-parametric
test is less sensitive (more conservative) than a parametric test, i.e. the
non-parametric test requires a larger effect and/or a larger sample, and generally
has less power than a parametric test when seeking out an effect
(see Chapter \@ref(ch-power)).
In this chapter, we discuss the most widely used non-parametric
test: the so-called $\chi^2$ test, pronounced as "chi-square test" (with the Greek letter "chi").
## $\chi^2$ test for "goodness of fit" in single sample {#sec:chi2gof}
Data of nominal level of measurement are often analysed with the
$\chi^2$ test. The number of dots on a die
is an example of a dependent variable of nominal level of measurement:
there is no physical ordering between the six sides, and each side of a die has
an equally high probability of appearing on the top. Imagine we throw
a die $60\times$, and find the following frequencies
of the six possible outcomes: $14, 9, 11, 10, 15, 1$. This can be
considered to be a sample of $n=60$ throws from an infinite
population of possible throws, and the outcome frequencies reported here should
be seen as a contingency table of 1 row and 6
columns (i.e. 6 cells). How high is the probability of this distribution
of outcomes? Is the die indeed honest?
The $\chi^2$ test is based on the differences between the expected
and observed frequencies. According to the null hypothesis (H0: the die is honest),
we expect 10 outcomes in each cell ($60/6=10$), i.e. the
expected frequency is identical for each cell (this is called
a *uniform* distribution).
The observed outcomes deviate from the expected frequencies of outcomes,
in particular because the outcome "six" barely occurs in this sample. This
might of course also have happened by chance. The $\chi^2$ test indicates how high the probability is of this uneven distribution of outcomes (or an even more uneven distribution),
if H0 is true.
The expected outcomes are thus deduced from a distribution of the outcomes
according to H0, and we investigate how well the observed outcomes
fit the expected outcomes. This form of the $\chi^2$ test is thus also
referred to as a test of the 'goodness of fit'.
For this example, the test yields $\chi^2=12.4$
with 5 degrees of freedom (see
§\@ref(sec:ttest-freedomdegrees) for explanation about
degrees of freedom), with $p=.03$. We usually use the computer to
calculate this probability value, but we can also estimate this probability
via a table with critical
$\chi^2$-values (see Appendix \@ref(app-criticalchi2values) and footnote
[^fn16-1]).
If H0 is true, then we have only 3% probability
of finding this outcome (or an even more uneven distribution of outcomes).
The significance $p$ found is smaller than $\alpha=.05$, and we thus
reject H0. We conclude that this die is not honest: the distribution
of outcomes found deviates significantly from the expected
distribution according to H0.
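The calculation for the die example can be retraced step by step in R. The following is only a minimal sketch; the object names (`observed`, `expected`, `chi2`) are chosen here for illustration and are not used elsewhere in this chapter.

```{r}
# Observed frequencies of the six outcomes in n = 60 throws
observed <- c(14, 9, 11, 10, 15, 1)
n <- sum(observed)
# Expected frequencies under H0 (honest die): a uniform distribution
expected <- rep(n / length(observed), length(observed))
# Test statistic: squared deviations, scaled by the expected frequency
chi2 <- sum((observed - expected)^2 / expected)
# Degrees of freedom: number of cells minus 1
df <- length(observed) - 1
# Probability of this chi2 or a larger value, if H0 is true
p <- pchisq(chi2, df = df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df, p = p), 3)
```

The same test is carried out in a single step by `chisq.test()`, as shown in the R section below.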
## $\chi^2$ test for homogeneity of a variable in multiple samples
The $\chi^2$ test can also be used for a research design with *one* nominal
variable which we have observed in two or more samples. The
question is then whether the distribution of the observations over the
categories is equal for the different samples. This test is comparable with
*t* tests for two independent samples
(§\@ref(sec:ttest-indep)). We usually then summarise the numbers of observations
with a contingency table with multiple rows for the different samples,
and multiple columns for the categories of the nominal dependent variable (see also
Table \@ref(tab:cito-contingency-table)).
The $\chi^2$-test is again based on the differences between the expected and
observed frequencies. According to the null hypothesis (there is no difference
in distribution between the samples), the distribution of observations
across the columns should be approximately equal for all rows
(and vice versa).
## $\chi^2$ test for association between two variables in single sample
Finally, the $\chi^2$ test can equally well be used for a research design
with *two* nominal variables, which we have observed in a single
sample. The question then is whether the distribution of observations
over the second variable's categories is equal for the different
categories of the first variable (and vice versa). We again summarise
the numbers of observations in a contingency table with multiple rows for
the categories of the first nominal variable, and multiple columns
for the categories of the second nominal variable.
Here too, the $\chi^2$-test is based on the differences between the expected and
observed frequencies. According to the null hypothesis (that there is no association
between the two nominal variables), the distribution of observations across
rows should be approximately equal for all columns, and vice versa. However, this
does *not* mean that we expect the same frequency for all cells.
This is illustrated in the following example.
---
> *Example 16.1*: In the early morning of 15th April 1912, the *Titanic*
sank in the Atlantic Ocean. Many of those on board lost their lives.
Those on board could be divided into four classes (1st/2nd/3rd class passengers, and crew). Was the outcome of the disaster (whether the individual survived the
disaster or not) approximately equal for persons of these four classes?
The contingency table \@ref(tab:titanic) provides the distributions of
outcomes.
Table: (#tab:titanic) Distribution of those on board the *Titanic* ($N=2201$),
according to class
and outcome (died or survived). Data taken from the dataset
`Titanic` in R.

Class     Died    Survived    Total
--------  ------  ----------  --------
1st       122     203         325
2nd       167     118         285
3rd       528     178         706
Crew      673     212         885
Total     1490    711         2201
> For the expected frequencies, we have to take into account
the different numbers of those on board in the different classes,
and the unequal distribution of outcomes (1490 non-survivors and 711
survivors). If there were no association between the class and the survival
status, we would expect there to be 220 non-survivors amongst the first class
passengers $[(1490/2201) \times 325 = (325 \times 1490) / 2201 = 220]$
and 105 survivors $[(711/2201) \times 325 = (325 \times 711) / 2201 =
105]$. In this way, we can determine the expected frequencies for each cell,
taking into account the marginal totals. With the help of these
expected frequencies, we then calculate $\chi^2=190.4$, here with 3 d.f.,
$p<.001$. The significance $p$ found is smaller than $\alpha=.001$, and
we thus reject H0. We conclude that the outcome of the disaster (died or survived)
was *unevenly* distributed for the four classes of those on board the *Titanic*.
---
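The expected frequencies in Example 16.1 can be derived from the marginal totals in a few lines of R. This is a sketch only; the object names (`observed`, `expected`) are assumptions made for this illustration, and the counts are copied from Table \@ref(tab:titanic).

```{r}
# Observed counts (rows: class, columns: outcome)
observed <- matrix(c(122, 203,
                     167, 118,
                     528, 178,
                     673, 212),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(class = c("1st", "2nd", "3rd", "Crew"),
                                   outcome = c("Died", "Survived")))
N <- sum(observed)
# Expected count per cell = (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / N
round(expected, 1)
# Test statistic, degrees of freedom, and probability under H0
chi2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p <- pchisq(chi2, df = df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df), 1)
```

Note that the expected counts differ between cells, even though H0 states that there is no association between class and outcome.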
For the analysis of contingency tables which consist of precisely
$2\times2$ cells, the Phi coefficient is an effective alternative
(see §\@ref(sec:Phi)).
Reread and remember the warnings about correlation and causality
(§\@ref(sec:correlationcausation))
--- these are also applicable here.
## Assumptions {#sec:chi2test-assumptions}
The $\chi^2$-test may only be used if the following three assumptions
are satisfied.
* The data have to be measured on a nominal level of measurement, or have
to be simplified to nominal level (see Chapter \@ref(ch-levelsofmeasurement)).
* All observations have to be independent of each other, and based
on (a) random sample(s) of the population(s) (see
§\@ref(sec:random-samples)), or on random assignment of the elements
from the sample to experimental conditions (randomisation, see
§\@ref(sec:internalvalidity), point 5). Each element of the sample can thus only
contribute one observation to one cell[^fn16-2].
* The sample has to be large enough so that the expected frequency
($E$) for each cell is at least 5. If the expected frequency or frequencies
in one or more cells is/are less than 5, then reduce the number
of cells by merging bordering cells, and determine the expected
frequencies again.
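The third assumption can be checked before testing. The sketch below uses a small hypothetical table (the counts and the names `small` and `merged` are invented for this illustration and are not data from this chapter). It computes the expected frequencies, finds that some are below 5, and merges two neighbouring categories.

```{r}
# Hypothetical 2 x 3 table with small counts (illustrative numbers only)
small <- matrix(c(12, 7, 2,
                   9, 8, 3),
                nrow = 2, byrow = TRUE,
                dimnames = list(sample = c("A", "B"),
                                category = c("x", "y", "z")))
# Expected frequencies under H0, derived from the marginal totals
E <- outer(rowSums(small), colSums(small)) / sum(small)
any(E < 5)           # is any expected frequency below 5?
# Merge the sparse category "z" with its neighbour "y" and check again
merged <- cbind(x = small[, "x"], yz = small[, "y"] + small[, "z"])
E.merged <- outer(rowSums(merged), colSums(merged)) / sum(merged)
all(E.merged >= 5)   # are all expected frequencies at least 5 now?
```

Whether merging categories makes sense depends on the research question; the merged categories should still be meaningful.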
## Formulas
The test statistic $\chi^2$ is defined as
\begin{equation}
(\#eq:chisquared)
\chi^2 = \sum \frac{(O-E)^2}{E}
\end{equation}
in which $O$ and $E$ indicate the observed and expected numbers of observations for each cell of the frequency table [@Ferg89]. The expected
numbers need not be whole numbers (e.g. $45/6=7.5$ for each of the 6 possible
outcomes of an honest die, if we throw
$45\times$). The larger the difference $(O-E)$ in one or several cells,
the larger also $\chi^2$ will be (see below). Due to squaring, the test
statistic $\chi^2$ is always zero or positive, and never negative
[@Ferg89].
The probability distribution of the test statistic $\chi^2$ is determined by the number of degrees of freedom (see §\@ref(sec:ttest-freedomdegrees) for explanation of this concept).
For a $\chi^2$-test with one nominal variable ("goodness of fit"), the number of degrees of freedom is equal to the number of cells minus 1. For a $\chi^2$-test with multiple samples (homogeneity) or with two variables (association), with respectively $k$ and $m$ categories (the rows and columns of the contingency table), the number of degrees of freedom is equal to $(k-1)\times(m-1)$.
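As an illustration, the degrees of freedom and the matching critical values for the two examples in this chapter can be obtained in R as sketched below. The object names are chosen for this illustration only, and $\alpha=.05$ is assumed.

```{r}
# Goodness of fit with 6 cells (the die example): df = cells - 1
df.gof <- 6 - 1
# Contingency table with k = 4 rows and m = 2 columns (the Titanic example)
k <- 4
m <- 2
df.table <- (k - 1) * (m - 1)
# Critical chi-square values at alpha = .05 for these degrees of freedom
crit.gof   <- qchisq(1 - 0.05, df = df.gof)
crit.table <- qchisq(1 - 0.05, df = df.table)
round(c(df.gof = df.gof, crit.gof = crit.gof,
        df.table = df.table, crit.table = crit.table), 2)
```

A test statistic larger than the critical value corresponds with $p<\alpha$.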
For each cell of the frequency table, in row $i$ and column $j$, we can also compute the raw residual (also known as the Pearson residual):
\begin{equation}
(\#eq:chi2-rawresidu)
e_{ij} = \frac{(O_{ij}-E_{ij})}{\sqrt{E_{ij}}}
\end{equation}
If we square these raw residuals and then sum the squares, the result is the $\chi^2$ test statistic given in Eq.\@ref(eq:chisquared) above.
It is more insightful to compute the *standardized* residual for each cell of the frequency table [@Agre07, p.38]. The standardization means that the standard error of the residuals is taken into account (by using row totals $R_i$, column totals $C_j$, and the grand total $N$):
\begin{equation}
(\#eq:chi2-stdresidu)
e_{ij} = \frac{(O_{ij}-E_{ij})}{\sqrt{E_{ij}\times(1-\frac{R_i}{N})\times(1-\frac{C_j}{N})}}
\end{equation}
These standardized residuals may be interpreted as standard normal $Z$ scores, using the critical $Z$ values given in Appendix \@ref(app-criticalZvalues). Hence the adjusted standardized residuals provide insight into the source of a significant outcome of the $\chi^2$ test, and they also allow us to assess the contribution of each cell to that outcome[^fn16-3].
[^fn16-3]: If multiple comparisons are performed, then the critical value of $\alpha$ should be adjusted accordingly, in order to limit the risk of a Type I error somewhere among the comparisons. With $k$ cells and $k$ comparisons, a safe precaution is to use $\alpha/k$ instead of $\alpha$ for each comparison -- this is called Bonferroni's adjustment of the $\alpha$ value, or Dunn's procedure [@MD04, p.202]. See also §\@ref(sec:anova-oneway-posthoc).
For the example given in §\@ref(sec:chi2gof) we find the following six standardized residuals for the six possible outcomes of the die: $(+1.39, -0.35, +0.35, 0.00, +1.73, -3.12)$. The first five of these outcomes are observed approximately as frequently as expected, but the sixth of these outcomes is observed significantly less often than expected ($p=.002$).
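These standardized residuals and their probabilities can be computed by hand, as sketched below. The sketch assumes that for a single row of cells only the factor $(1-E/N)$ is needed in the denominator (there are no separate row totals), and that the probabilities are two-sided; the object names are chosen for this illustration.

```{r}
observed <- c(14, 9, 11, 10, 15, 1)
N <- sum(observed)
expected <- rep(N / 6, 6)
# Standardized residuals for a single row of cells:
# only the factor (1 - E/N) is used in the denominator (assumption, see text)
stdres <- (observed - expected) / sqrt(expected * (1 - expected / N))
# Two-sided probabilities, treating the residuals as Z scores
p <- 2 * pnorm(-abs(stdres))
# Bonferroni adjustment for the six comparisons (one per cell)
p.bonf <- p.adjust(p, method = "bonferroni")
round(cbind(stdres, p, p.bonf), 3)
```

`p.adjust()` multiplies each probability by the number of comparisons (with a maximum of 1), which is equivalent to comparing the unadjusted probability against $\alpha/k$.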
## SPSS
### goodness of fit: preparation
If we want to investigate a nominal variable, then it must of course
be present as a column in the SPSS data file. Every observation
forms a separate row in the data file.
Sometimes, we do not have the separate observations (rows) but
do have the table of numbers of observations per category of the nominal
variable. We can work further with these. Let us say that we have two columns,
named `outcome` and `number`, as follows
(see §\@ref(sec:chi2gof)):
```
Outcome Number
1 14
2 9
3 11
4 10
5 15
6 1
```
Next, each cell (row) has to get a weight that is as large as the
`number` of observations, which is named here in the second column: the
first cell (row) weighs $14\times$, the second cell (row) weighs
$9\times$ etc. Thanks to this trick, we do not have to fill in $N=60$ rows
(a row for each observation), but only 6 rows (a row for each cell).
```
Data > Weight Cases...
```
Choose `Weight cases by` and select the variable `number` in the
entry field. Confirm with `OK`.
### goodness of fit: testing
```
Analyze > Nonparametric tests > Legacy Dialogs > Chi-square...
```
Select the variable `outcome` (in "Test variable list" panel) and
indicate that we expect *equal* numbers of observations in each cell.
(It is also possible to enter other expected frequencies here,
if other, unequal frequencies are expected according to H0.)
Confirm with `OK`.
### contingency tables: preparation
If we want to investigate two nominal variables, then they must
both be present as columns in the SPSS data file, with each observation
forming a separate row. For Example 16.1 above, we then use a "long"
data file, consisting of $N=2201$ rows, with a separate row for each person
on board, with at least two columns, for `class` and
`survivor`.
Sometimes, we do not have the separate observations (rows) but
do have the contingency table of numbers of observations for each
combination of categories of the nominal variables. We can also
work further with these. Let us say that we have three columns, named
`class`, `survivor` and `number`, as follows:
```
Class Survivor Number
1st no 122
1st yes 203
2nd no 167
2nd yes 118
3rd no 528
3rd yes 178
crew no 673
crew yes 212
```
Next, each cell (row) has to get a weight which is as large as
the `number` of observations, which is named in the third column: the
first cell (row) weighs $122\times$, the second cell (row) weighs
$203\times$, etc. With this trick, we do not have to enter $N=2201$ rows
(a row for each observation), but only 8 rows (a row
for each cell).
```
Data > Weight Cases...
```
Choose `Weight cases by` and select the variable `number` in the
entry field. Confirm with `OK`.
### contingency tables: testing
The testing proceeds in the same way as described in
§\@ref(sec:Phi) for
the association between two nominal variables.
```
Analyze > Descriptive Statistics > Crosstabs...
```
Select the variables `class` (in "Rows" panel) and `survivor` (in
"Columns" panel) for
contingency table \@ref(tab:titanic).\
Choose `Statistics…` and tick the option `Chi-square`. Confirm firstly with
`Continue` and afterwards again with `OK`.
## JASP
### goodness of fit: preparation
The nominal data to investigate are typically coded as a "long" column in the data file. Each observation typically forms a separate row in the data file, and the nominal independent variable is a column in the data file. However, for the "goodness of fit" $\chi^2$ test in JASP, the data have to be entered not in this "long" fashion (with $N$ rows), but in the form of a summary of numbers of observations (counts, frequency) per category of the nominal variable (with $k$ rows, one row for each of $k$ categories).
For the example in §\@ref(sec:chi2gof) these summary data would look like this:
```
outcome count
1 14
2 9
3 11
4 10
5 15
6 1
```
In order to enter these data in JASP, create a data file (using e.g. Excel or any text editor) with the contents as listed above, including the column headers. Save the file in CSV format (`.csv`, not `.xlsx`) and open it in JASP.
### goodness of fit: testing
In the top menu bar, choose
```
Frequencies > Classical: Multinomial Test
```
Select the variable containing the categories of the nominal variable, here `outcome`, and place it in the entry field "Factor".
Select the variable containing the counts (frequencies) of each category, and place it in the entry field "Count".
Under "Test Values" there are two options. \
If you choose `Equal proportions (multinomial test)`, a special version of the $\chi^2$ test will be performed, testing for a uniform distribution (as explained above, this means that the expected frequency is equal for each outcome category). In this example, this H0 implies that the die is honest, which is exactly what we want to test here.\
If you choose `Expected proportions (chi-square test)`, you may adjust the expected frequencies in each cell. Use this option if your H0 postulates a non-uniform (e.g. Gaussian) distribution. A table will appear, in which you must enter the expected frequencies according to *H0* for each category or cell. By default, the values in this table are all equal, so that the default is equivalent to the "equal proportions" or uniform H0 in the first option.
You may also check `Descriptives` and `Confidence interval` under the heading "Additional Statistics", and check `Descriptives plot` under "Plots", so as to gain better insight into the patterns in your data.
In JASP it is not possible to obtain the (adjusted) standardized residuals for this test; however you can compute these manually from the observed and expected counts.
### contingency tables: preparation
The nominal data to investigate are typically coded as two or more "long" columns in the data file. Each observation (e.g. each person on the Titanic, in Example 16.1) corresponds with a separate row in the data file, and the nominal variables are in columns in the data file (e.g. `class` and `outcome`). We can use such a "long" data file for creating a contingency table in JASP, and for performing a $\chi^2$ test on that contingency table --- see the end of the next subsection for further instructions.
However, for performing a $\chi^2$ test on a contingency table in JASP, the data do not necessarily have to be entered in this "long" fashion (with $N$ rows); the data may also be in the form of a summary of numbers of observations (counts, frequency) per category of the nominal variable (with $k$ rows, one row for each of $k$ cells or combinations of categories).
For Example 16.1, the data would then look as follows:
```
class outcome count
1st died 122
1st survived 203
2nd died 167
2nd survived 118
3rd died 528
3rd survived 178
crew died 673
crew survived 212
```
In order to enter these data in JASP, create a data file (using e.g. Excel or any text editor) with the contents as listed above, including the column headers. Save the file in CSV format (`.csv`, not `.xlsx`) and open it in JASP.
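If R is available, such a summary file can also be generated from R's built-in `Titanic` dataset rather than typed by hand. The following is a sketch under some assumptions: the file name is hypothetical, the file is written to a temporary folder, and the outcome is coded as `No`/`Yes` (as in the R dataset) rather than `died`/`survived`.

```{r}
# Collapse the built-in Titanic data to class x outcome counts
counts <- apply(Titanic, c(1, 4), sum)
# One row per cell, with the number of observations in column "count"
titanic.summary <- as.data.frame(as.table(counts), responseName = "count")
names(titanic.summary) <- c("class", "outcome", "count")
titanic.summary
# Write a CSV file that can be opened in JASP (hypothetical file name)
csvfile <- file.path(tempdir(), "titanic-summary.csv")
write.csv(titanic.summary, file = csvfile, row.names = FALSE)
```

The rows appear in a different order than in the listing above, but they contain the same eight cells.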
### contingency tables: testing
The $\chi^2$ test on a contingency table proceeds in the same way as described in
§\@ref(sec:Phi) for association between two nominal variables.
In the top menu bar, choose:
```
Frequencies > Classical: Contingency Tables
```
Select one nominal variable (`class`) in the "Rows" field, and the other nominal variable (`outcome`) in the "Columns" field, to set up the contingency table (Table \@ref(tab:titanic)).
Select the variable `count` into the "Counts" field; this specifies the numbers of observations for each cell.\
Open the `Statistics` section bar, and check the option `Chi-square` ($\chi^2$).
Open the `Cells` section bar, and check the option `Expected counts`.
The resulting value of the $\chi^2$ test statistic is reported in the output under **Chi-Squared Tests**.
If you have a "long" data sheet, with one observation per row, then you only need to select one nominal variable (`class`) in the "Rows" field, and the other nominal variable (`outcome`) in the "Columns" field, to set up the contingency table (Table \@ref(tab:titanic)).\
Open the `Statistics` section bar, and check the option `Chi-square` ($\chi^2$).
Open the `Cells` section bar, and check the option `Expected counts`. Also check the `Pearson residuals` and `Standardized (adjusted Pearson)`.
The number of survivors is significantly larger than expected under H0 for passengers in first and second class (positive standardized residuals for these cells, both $p<.001$) and significantly lower than expected under H0 for passengers in third class and for crew (negative standardized residuals, both $p<.001$).
## R
### goodness of fit: testing
```{r}
chisq.test( c( 14, 9, 11, 10, 15, 1 ) ) -> dobbel.chi2.htest # die §16.2
print(dobbel.chi2.htest)
dobbel.chi2.htest$residuals # raw residuals
sum( (dobbel.chi2.htest$residuals)^2 ) # chi2 = sum of sq of raw resid
dobbel.chi2.htest$stdres # standardized residuals
```
### contingency table: preparation and testing
In R, the dataset `Titanic` is provided as a four-dimensional array of counts. We sum
the observations and make a contingency table of the first dimension (class) and
the fourth dimension (outcome).
```{r}
apply( Titanic, c(1,4), sum ) -> Titanic.classoutcome
```
Next, we use the contingency (frequency) table as the input for a `chisq.test`.
The resulting object (of class `htest`) is saved within R in order to inspect its residuals.
```{r}
chisq.test( Titanic.classoutcome ) -> Titanic.chisq.htest
print(Titanic.chisq.htest)
Titanic.chisq.htest$stdres # standardized residuals
```
The adjusted standardized residuals show the remarkably high number of survivors among the first class passengers, and the remarkably low number of survivors among the ship's crew.
Note that R here reports the standardized (but not otherwise adjusted) Pearson residuals.
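If the data are available as a summary of counts per cell (as in the listings shown for SPSS and JASP above) rather than as an array, then the contingency table can be built with `xtabs()`. The sketch below types in the counts of Example 16.1; the object names (`titanic.counts`, `tab`, `long`) are chosen for this illustration.

```{r}
# Summary data: one row per cell
titanic.counts <- data.frame(
  class   = rep(c("1st", "2nd", "3rd", "crew"), each = 2),
  outcome = rep(c("died", "survived"), times = 4),
  count   = c(122, 203, 167, 118, 528, 178, 673, 212)
)
# Build the contingency table from the counts, then test it
tab <- xtabs(count ~ class + outcome, data = titanic.counts)
result <- chisq.test(tab)
result$statistic   # chi-square test statistic
result$parameter   # degrees of freedom
# Expand to "long" format (one row per person) and tabulate again
long <- titanic.counts[rep(seq_len(nrow(titanic.counts)), titanic.counts$count),
                       c("class", "outcome")]
all(table(long$class, long$outcome) == tab)   # same table either way
```

The formula `count ~ class + outcome` plays the same role as the case weights in SPSS: each row of the summary data counts as many times as indicated in `count`.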
## Effect size: odds ratio
When using the $\chi^2$-test, the effect size can be reported in the form
of the so-called "odds ratio". The 'odds ratio' is derived from the contingency
table with frequencies per cell; the odds ratio is most commonly used with $2 \times 2$ contingency tables.
We will explain all these matters using the following example of a $2 \times 2$ contingency table.
---
> *Example 16.2*:
@DollHill1956 investigated the relation between smoking
and lung cancer. They first surveyed all British doctors about
their age and smoking behaviour. Next, the researchers kept up over the years with
the death notices and cause of death of all those surveyed. The first
outcomes, after more than four years, are summarised in
Table \@ref(tab:dollhill).
Table: (#tab:dollhill) Contingency table of $N=24354$ British doctors of 35 years
and older for the first survey, divided according to smoking behaviour (rows: (non-)
smoker currently or previously) and according to death by lung cancer in the last
4 years (columns), with letter indication for the numbers
of observations.

Smoking     No lung cancer    Lung cancer    Total
----------  ----------------  -------------  -----------------
No (0)      3092 (A)          1 (B)          3093 (A+B)
Yes (1)     21178 (C)         83 (D)         21261 (C+D)
Total       24270 (A+C)       84 (B+D)       24354 (A+B+C+D)

> In the usual manner, we find $\chi^2=10.35$, df=1, $p<.01$. We conclude that
there is an association between smoking behaviour and death from lung cancer.
---
For the effect size, we firstly calculate the 'odds' of death from
lung cancer for the smokers: D/C = $83/21178 = 0.00392$. Amongst the smokers,
there are 83 deaths from lung cancer, compared with 21178
persons not dying from lung cancer (the 'odds' of dying from
lung cancer are about 1 to 255). For the non-smokers:
B/A = $1/3092 = 0.00032$ (the 'odds' are 1 to 3092).
We call the *ratio* of these two 'odds' for the two groups the
'odds ratio' (abbreviated OR). In this example, we find (D/C) / (B/A) =
AD/BC =
$(3092 \times 83) / (1 \times 21178) = (0.00392)/(0.00032) = 12.1$. The
'odds' of dying from lung cancer are thus more than $12\times$
as great for the smokers as for the non-smokers. We report this as follows:
> @DollHill1956 found a significant relation between
> smoking behaviour and death from lung cancer,
> $\chi^2(1)=10.35, p<.01, \textrm{OR}=12.1$. The 'odds' of dying from
> lung cancer seemed to be more than $12\times$ as great for smokers as for
> non-smokers.
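The odds and the odds ratio for Example 16.2 can be computed in R as sketched below. The counts are copied from Table \@ref(tab:dollhill); the object names (`tab`, `odds`, `OR`) are chosen for this illustration.

```{r}
# Counts from the contingency table (rows: smoking, columns: lung cancer)
tab <- matrix(c( 3092,  1,
                21178, 83),
              nrow = 2, byrow = TRUE,
              dimnames = list(smoking = c("no", "yes"),
                              lungcancer = c("no", "yes")))
# Odds of death from lung cancer within each row: B/A and D/C
odds <- tab[, "yes"] / tab[, "no"]
# Odds ratio: odds for smokers divided by odds for non-smokers
OR <- odds["yes"] / odds["no"]
round(odds, 5)
round(unname(OR), 1)
```

The same value results from the cross-product $AD/BC$ of the four cells.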
[^fn16-1]: The value found, $\chi^2=12.4$, is slightly below the critical value for 5 d.f. and $p=.025$ (there $(\chi^2)^*=12.83$), thus the corresponding probability of this value or a larger value is slightly greater than $.025$.
[^fn16-2]: If one variable's observations are paired rather than independent (e.g. before/after treatment, passed/failed, etc.), then the McNemar test is a useful alternative.