\documentclass[12pt]{article}
\usepackage{parskip}
\usepackage{natbib}
\usepackage{amsmath}
\usepackage{url}
\usepackage[pdftex,colorlinks=true,citecolor=black,raiselinks=false]{hyperref}
%%\usepackage[T1]{fontenc}
%% Soc Meth Formatting
\usepackage{tgtermes}
\usepackage[margin=1.25in]{geometry}
\usepackage{setspace}
\usepackage{footmisc} % for footnote spacing
\title{Method Games:\\ A Proposal for Assessing and Learning About Methods}
\author{Jake Bowers\thanks{Associate Professor, Departments of Political
Science and Statistics \& NCSA, University of Illinois at Urbana-Champaign
(\texttt{[email protected]}). \textit{Acknowledgements:} David
Collier, Joanne Rodrigues, and Tim Liao helped improve this essay.}
}
\date{\today}
\begin{document}
\begin{titlepage}
\maketitle
\end{titlepage}
% I think this is true double spacing
\setstretch{1.67}
Imagine assessing a promising method for pattern discovery using a game. One
scholar would invent a true pattern of features, generate an outcome and
perhaps hide this pattern amid irrelevant information. For example, the game
designer might provide 15 binary features of 40 cases to the players. Players
would compete to discover the hidden truth. One version of the method game
would require that participants use a particular algorithm. A second version
would allow participants to choose their own algorithm. For example, some
might choose a QCA variant \citep{rihoux2008configurational}, others would
implement an adaptive lasso \citep{zou2006adaptive} and still others might
prefer one of the many competitors to the lasso, such as the smoothly clipped
absolute deviation (SCAD) penalty \citep{fan2001variable}, random forests
\citep{breiman2001random}, or kernel-regularized least squares
\citep{hainmueller2012kernel}.\footnote{\citet{hasttibfried09,james2013introduction} provide an
excellent overview of many of the techniques known as ``machine learning.''
The proposal for method assessment in this essay uses two simple techniques
(adaptive lasso and SCAD) for illustration because they resemble tools already
common among social scientists, but it is not limited to supervised linear
model-based algorithms. The particular procedures evaluated here work by fitting
penalized linear models to outcomes, aiming to return coefficients of zero
for irrelevant features, thereby revealing relevant features. These
algorithms choose coefficients $\beta_1, \ldots, \beta_P$ for $P$ features
(and arbitrary combinations thereof), $X_{i1}, \ldots, X_{iP}$, related to
some outcome, $y_i$, to minimize a function of the sum of squared prediction
error (i.e. least squares) plus a penalty function that rewards solutions
with some collection of $\beta_p$ set to 0. The objective function tends to
look like $\sum_{i=1}^N (y_i - ( \beta_0 + \beta_1 X_{i1} + \ldots + \beta_P
X_{iP}) )^2 + \sum_{p=1}^P p(\lambda,\beta_p)$. The tuning parameter,
$\lambda$, determines the relative importance of the penalty function in the
objective function compared to the least squares function. The adaptive
lasso penalty is $p(\lambda,\beta_p,w_p)=\lambda w_p|\beta_p|$ where
$w_p=1/|\hat{\beta}_p|$ and $\hat{\beta}_p$ arises from a preliminary linear
model (here a ridge regression but often an OLS regression). The SCAD
replaces the lasso penalty with a function designed to have no penalty when
$\beta_p=0$ (like the adaptive lasso) but then to rise smoothly to penalize
very large $\beta_p$ at a decreasing rate:
$p(\lambda,\beta_p,a)=\begin{cases} \lambda |\beta_p|, & \text{if }
|\beta_p|\le \lambda; \\ - \left( \frac{|\beta_p|^2 - 2 a \lambda
|\beta_p| + \lambda^2}{2 (a-1)} \right), & \text{if } \lambda <
|\beta_p| \le a \lambda; \\ \frac{(a+1)\lambda^2}{2}, & \text{if }
|\beta_p| > a \lambda \end{cases}$, where $a > 2$ and $\lambda > 0$. Some
see the adaptive lasso as an approximation to, or a competitor of, the SCAD
penalty \citep[page 92]{hasttibfried09}.}
%% \citet{fan2001variable} define the penalty by its derivative
%% such that if the penalty function for a given $\beta$ is $p_\lambda(\beta)$ then
%% $p'_{\lambda}(\beta)=\lambda \left\{ I(\beta \le \lambda) + \frac{ (a
%% \lambda - \beta)_{+} }{ (a-1) \lambda} I(\beta > \lambda) \right\}$.
%% }
%%
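
To fix ideas, the two penalty functions described in the footnote above can be
written as a few lines of code. The Python sketch below is illustrative only
and is not drawn from the script discussed later; the use of NumPy and the
conventional default of $a = 3.7$ for the SCAD tuning parameter are assumptions
of the sketch rather than features of the proposal.
\begin{verbatim}
import numpy as np

def adaptive_lasso_penalty(beta, lam, w):
    """Adaptive lasso penalty: lambda * w_p * |beta_p|."""
    beta, w = np.asarray(beta, float), np.asarray(w, float)
    return lam * w * np.abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty, matching the cases in the footnote."""
    b = np.abs(np.asarray(beta, float))
    small = lam * b                      # |beta| <= lambda
    middle = -(b**2 - 2 * a * lam * b + lam**2) / (2 * (a - 1))
    large = (a + 1) * lam**2 / 2         # |beta| > a * lambda
    return np.where(b <= lam, small,
                    np.where(b <= a * lam, middle, large))
\end{verbatim}
Both functions return zero at $\beta_p = 0$; the adaptive lasso penalty keeps
growing linearly in $|\beta_p|$, while the SCAD penalty levels off at
$(a+1)\lambda^2/2$ for large coefficients.
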
In the first version of the competition, we would learn about craft: in
different hands the same method may perform differently. The results of this
competition would teach us about the many kinds of substantive and
methodological judgments required to use the method
successfully. In the version of the game where players choose different
approaches, we could learn how different methods compare in their ability to
address a given problem.\footnote{We might, however, confuse learning
about the method with discovering that some researchers simply have excellent
methodological judgment and luck.}

If, however, time were short or players difficult to recruit, one could
approximate such a game using a computer rather than a group of scholars:
\citet{lucasfk2014} provide one example of how to evaluate a method in
this way. That is, a single scholar could generate a true relationship as if
kicking off a real method game but then write a computer program to compare
the effectiveness of different algorithms in a simple situation. Imagine that
a scholarly literature focusing on 40 cases suggests that a complex dependent
pattern of binary features $X_{i1}, \ldots, X_{iP}$, of a given case $i$,
drives outcomes (say, for $P=5$, $Y_i = (X_{i1} \cdot X_{i2} \cdot
X_{i3}) \text{ OR } (X_{i4} \cdot X_{i5})$, where all $X_{ip} \in
\{0,1\}$ and thus $Y_i \in \{0,1\}$). Further, imagine that three methods
suggest themselves as useful a priori: (1) QCA, (2) the adaptive lasso and (3)
iterative sure independence screening with a SCAD plugin (ISIS/SCAD)
\citep{fan2008sure}. \citet{fan2001variable} proved that the SCAD penalty
would correctly set false parameters to 0 as $n \rightarrow \infty$ given a
reasonable choice of tuning parameters in contrast to the simple lasso
proposed by \cite{tibshirani1996regression} --- that is, SCAD has an oracle
property but the simple lasso does not. Later, \citet{zou2006adaptive} showed
a modification of the lasso penalty (the adaptive lasso) does have an oracle
property given well-chosen tuning parameters and weights. And,
\citet{fan2008sure} demonstrate that, when the number of irrelevant features
is much larger than the number of cases (for example, when each case has 4000
measured features but we only observe 40 cases), a preliminary dimension
reduction step (ISIS) improves the performance of the SCAD penalty. Although
QCA does not promise to find the truth as information increases, it appears,
prima facie, well suited to discovering complex comparisons and it does not
require tuning parameters. This essay presents the results from a script that
implements a machine version of the method game to compare QCA, the adaptive
lasso, and ISIS/SCAD.\footnote{Interested readers can download the code from
\url{https://github.com/jwbowers/MethodGames}.}
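
To make the setup concrete, the data-generating process just described can be
sketched in a few lines. The Python below is an illustration only and is not
taken from the script in the repository linked above; the function name,
the optional irrelevant-feature argument, and the use of NumPy are assumptions
of this sketch.
\begin{verbatim}
import numpy as np

def make_game_data(n_cases=40, n_noise=0, seed=None):
    """Binary features for one 'player': 5 causal columns,
    plus n_noise irrelevant columns, and the deterministic
    outcome Y = (X1 and X2 and X3) or (X4 and X5)."""
    rng = np.random.default_rng(seed)
    X_true = rng.integers(0, 2, size=(n_cases, 5))
    y = ((X_true[:, 0] & X_true[:, 1] & X_true[:, 2])
         | (X_true[:, 3] & X_true[:, 4]))
    X_noise = rng.integers(0, 2, size=(n_cases, n_noise))
    return np.hstack([X_true, X_noise]), y

X_easy, y_easy = make_game_data(n_noise=0, seed=1)   # 5 columns
X_hard, y_hard = make_game_data(n_noise=10, seed=1)  # 5 + 10 columns
\end{verbatim}
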

Notice that this game is relatively easy. The only case knowledge available is
that the causal features are among the $X$'s and that the outcomes are recorded
in $Y$. Simplifying the case-knowledge requirements here is useful and allows
an assessment of method rather than game player: all of the machine players
have the same case knowledge and will use it in the same manner. Further,
machine players are naive. Assessing the performance of a machine will not
tell us about the craft by which human scholars exploit a method. Further,
any single collection of case attributes can idiosyncratically advantage one
method over another. For fairness, and to approximate the kind of natural
variation one would see if different human players were involved in the game,
the script generated a different set of features for each machine player,
although the outcomes arose from the same deterministic true function as
described above (i.e., the case knowledge is held constant across players and
scenarios). This competition involved 800 players, each using all three
approaches to seek the truth. The script runs two contests. The easier of the
two games presents players with a five-column dataset: each column represents
a part of the truth, and players focus on finding the true combinations of the
existing features. The hard game differs from the easy game only in that the
dataset contains 10 irrelevant case features in addition to the original 5.
The script counts a player as successful if it finds the truth and only the
truth. In the easy game, QCA, the adaptive lasso, and ISIS/SCAD found the
truth and only the truth for 18\%, 82\%, and 96\% of the players respectively.
In the hard game, QCA, the adaptive lasso, and ISIS/SCAD found the truth and
only the truth for 0\%, 33\%, and 61\% of the players respectively.
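
A rough, simplified version of one machine player is sketched below for the
adaptive lasso alone. It is not the author's procedure: the expansion of the
binary features into candidate product terms up to order three, the
scikit-learn estimators, the ridge-based weights, the rescaling trick used to
obtain a weighted lasso, and the 50 simulated players are all assumptions of
the sketch, so the resulting share need not match the percentages reported
above.
\begin{verbatim}
from itertools import combinations
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

def expand(X):
    """Main effects plus all products of 2 or 3 distinct columns."""
    terms = [(np.prod(X[:, list(idx)], axis=1), idx)
             for k in (1, 2, 3)
             for idx in combinations(range(X.shape[1]), k)]
    cols, names = zip(*terms)
    return np.column_stack(cols), list(names)

def adaptive_lasso_selection(X, y, eps=1e-8):
    """Ridge fit for weights, then a weighted lasso via rescaling."""
    Z, names = expand(X)
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(Z, y)
    w = 1.0 / (np.abs(ridge.coef_) + eps)
    lasso = LassoCV(cv=5, max_iter=50000).fit(Z / w, y)
    beta = lasso.coef_ / w
    return {names[j] for j in np.flatnonzero(np.abs(beta) > 1e-6)}

truth = {(0, 1, 2), (3, 4)}     # X1*X2*X3 and X4*X5
wins, n_players = 0, 50         # 50 players keeps the sketch fast
for seed in range(n_players):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(40, 15))  # hard game: 5 causal + 10 noise
    y = (X[:, 0] & X[:, 1] & X[:, 2]) | (X[:, 3] & X[:, 4])
    wins += adaptive_lasso_selection(X, y) == truth
print(f"Share finding the truth and only the truth: {wins / n_players:.2f}")
\end{verbatim}
A player here "wins" only when the nonzero coefficients correspond exactly to
the two true product terms; an analogous loop over 800 players and all three
methods produced the percentages reported above.
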

One should not interpret these results as severe criticism of the adaptive
lasso or QCA. Remember that this essay is a proposal for evaluating methods,
is agnostic to the methods themselves, and proposes a machine-based approach
only as a low-budget way to approximate the real competition among humans.
However, this essay \emph{does} claim that all methods must be open to
evaluation: as a minimal standard, given a true set of relationships, a good
method should find the truth. In the machine learning literature, such
critical evaluation drives innovation. For example, a large and growing
literature both criticizes and builds on the adaptive lasso: if
the features are highly interdependent, we might prefer adaptive versions of
the fused lasso \citep{rinaldo2009properties}, the grouped lasso
\citep{wang2008note}, or the elastic net \citep{ghosh2011grouped,
zou2004regression}. And future methodology building on the simple QCA might
adapt insights from machine learning to overcome current shortcomings or
advise against the use of QCA for particular designs and data. Most scholars
would prefer a technique that recovers the truth more than 60\% of the time,
so one might use these results to motivate work to improve the performance of
ISIS/SCAD or to find a substitute. Obviously, a real competition among
skilled human players exercising excellent judgment might have produced different
results. After all, those who investigate and modify the script will notice
little expertise and craft in the use of the techniques: for example, the tuning
parameters for the adaptive lasso were chosen fairly naively, only one
open-source implementation of QCA was used, and many other small but
potentially consequential decisions appear throughout. This essay also
highlights the importance of the real, human-based game. In this machine
version, the case knowledge required was simple, and perhaps a scholar might
have been able to identify the noise variables in some preliminary step had
these data arisen from real-world observations. The fact that
case knowledge plays a role in the use of pattern recognition/machine learning
is yet another reason why a human-oriented game would be useful --- as long as
the humans who were successful in finding the pre-arranged truth were able to
detail their decisions and the reasons for them so that other humans
could repeat their analyses, we would learn about the world and about the
method.

Fruitful communication about methods involves comparing the successes of
different methods in the hands of different human scholars confronting
specific research designs, theoretical goals, and existing observations. The
method game proposed here would help us learn not only about a method in
an abstract sense, but also about the craft of using that method in comparison
with other methods. A machine version of the method game offers a fast, cheap,
and controlled way to begin to build a comparative understanding of the
methods and to motivate people to engage in a real method game.
\bibliographystyle{asr}
%\bibliography{/Users/jwbowers/Documents/PROJECTS/BIB/big}
\bibliography{methodgames}
\end{document}