% Bayesian Analysis: A Conjugate Prior and Markov Chain Monte Carlo
% Dominic Steinitz
% 9th March 2014
---
bibliography: Bayes.bib
---
Introduction
============
This is meant to be a shorter blog post than normal, with the expectation
that the material will be developed further in future blog posts.
A Bayesian has a prior view of the distribution of some data and
then, on seeing the data, updates that view. In most cases the updated
distribution, the posterior, is not expressible as an analytic function,
and sampling via Markov Chain Monte Carlo (MCMC) is the only way to
determine it.
In some special cases, when the posterior belongs to the same family of
distributions as the prior, the posterior is available
analytically and we call the posterior and prior **conjugate**. It
turns out that the normal or Gaussian distribution is conjugate with
respect to a normal likelihood.
This gives us the opportunity to compare MCMC against the analytic
solution and give ourselves more confidence that MCMC really does
deliver the goods.
Some points of note:
* Since we want to display the posterior (and the prior for that
matter) as histograms, we use the
[histogram-fill](http://hackage.haskell.org/package/histogram-fill)
package.
* Since we are using Monte Carlo we can use all the cores on our
computer via one of Haskell's parallelization mechanisms.
Preamble
--------
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
> {-# LANGUAGE NoMonomorphismRestriction #-}
> module ConjMCMCSimple where
>
> import qualified Data.Vector.Unboxed as V
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram ( asList )
> import qualified Data.Histogram as H
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )
> import Data.List
> import Control.Parallel.Strategies
>
> import Diagrams.Backend.Cairo.CmdLine
>
> import Diagrams.Backend.CmdLine
> import Diagrams.Prelude hiding ( sample, render )
>
> import LinRegAux
A Simple Example
================
Analytically
------------
Suppose the prior is $\mu \sim \cal{N}(\mu_0, \sigma_0^2)$, that is
$$
\pi(\mu) \propto \exp{\bigg( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\bigg)}
$$
Our data is IID normal, $x_i \sim \cal{N}(\mu, \sigma^2)$, where
$\sigma$ is known, so the likelihood is
$$
p(x\,|\,\mu, \sigma) \propto \prod_{i=1}^n \exp{\bigg( -\frac{(x_i - \mu)^2}{2\sigma^2}\bigg)}
$$
The assumption that $\sigma$ is known is unrealistic in practice, but the
point of this post is to demonstrate MCMC matching an analytic formula.
This gives a posterior of
$$
\begin{aligned}
p(\mu\,|\, \boldsymbol{x}) &\propto \exp{\bigg(
-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}
- \frac{\sum_{i=1}^n(x_i - \mu)^2}{2\sigma^2}\bigg)} \\
&\propto \exp{\bigg[-\frac{1}{2}\bigg(\frac{\mu^2 \sigma^2 -2\sigma^2\mu\mu_0 - 2\sigma_0^2n\bar{x}\mu + \sigma_0^2 n\mu^2}{\sigma^2\sigma_0^2}\bigg)\bigg]} \\
&= \exp{\bigg[-\frac{1}{2}\bigg(\frac{ (n\sigma_0^2 + \sigma^2)\mu^2 - 2(\sigma^2\mu_0 + \sigma_0^2n\bar{x})\mu}{\sigma^2\sigma_0^2}\bigg)\bigg]} \\
&= \exp{\Bigg[-\frac{1}{2}\Bigg(\frac{ \mu^2 - 2\mu\frac{(\sigma^2\mu_0 + \sigma_0^2n\bar{x})}{(n\sigma_0^2 + \sigma^2)}}{\frac{\sigma^2\sigma_0^2}{(n\sigma_0^2 + \sigma^2)}}\Bigg)\Bigg]} \\
&\propto \exp{\Bigg[-\frac{1}{2}\Bigg(\frac{\big(\mu - \frac{(\sigma^2\mu_0 + \sigma_0^2n\bar{x})}{(n\sigma_0^2 + \sigma^2)}\big)^2}{\frac{\sigma^2\sigma_0^2}{(n\sigma_0^2 + \sigma^2)}}\Bigg)\Bigg]}
\end{aligned}
$$
In other words
$$
\mu\,|\, \boldsymbol{x} \sim \cal{N}\bigg(\frac{\sigma^2\mu_0 + n\sigma_0^2\bar{x}}{n\sigma_0^2 + \sigma^2}, \frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2} \bigg)
$$
Writing
$$
\sigma_n^2 = \frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2}
$$
we get
$$
\frac{1}{\sigma_n^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}
$$
Thus the precision (the inverse of the variance) of the posterior is
the precision of the prior plus the precision of the data scaled by
the number of observations. This gives a nice illustration of how
Bayesian statistics sharpens our beliefs as evidence accumulates.
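For example, with the values used below ($\sigma_0 = 2.0$, $\sigma = 1.0$
and $n = 10$ observations), the posterior precision is $10/1^2 + 1/2^2 =
10.25$, so the posterior variance is $\sigma_n^2 \approx 0.098$, much
smaller than the prior variance of $4.0$.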
Writing
$$
\mu_n = \frac{\sigma^2\mu_0 + n\sigma_0^2\bar{x}}{n\sigma_0^2 + \sigma^2}
$$
and
$$
\lambda = 1 / \sigma^2, \, \lambda_0 = 1 / \sigma_0^2, \, \lambda_n = 1 / \sigma_n^2
$$
we see that
$$
\mu_n = \frac{n\bar{x}\lambda + \mu_0\lambda_0}{\lambda_n}
$$
Thus the mean of the posterior is a weighted sum of the mean of the
prior and the sample mean, with weights given by the precision of the
prior and the precision of the data scaled by the number of observations.
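To make the update concrete, here is a minimal sketch of it as a
standalone Haskell function (illustrative only and not part of the
compiled module; note that it takes variances rather than standard
deviations as arguments):
~~~~ { .haskell }
-- Conjugate update for a normal prior and a normal likelihood with
-- known variance: returns the posterior mean and variance.
conjugateUpdate :: Double -> Double -> Double -> [Double] -> (Double, Double)
conjugateUpdate priorMu priorVar obsVar xs = (muN, varN)
  where
    n    = fromIntegral (length xs)
    -- 1 / varN = n / obsVar + 1 / priorVar
    varN = obsVar * priorVar / (n * priorVar + obsVar)
    muN  = (obsVar * priorMu + priorVar * sum xs) / (n * priorVar + obsVar)
~~~~
With the values defined below, `conjugateUpdate mu0 (sigma_0**2)
(sigma**2) simpleXs` should agree with the pair `(mu_1, sigma1)`
calculated later.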
Rather arbitrarily let us pick a prior mean of
> mu0 :: Double
> mu0 = 11.0
and express our uncertainty about it with a largish prior standard deviation
> sigma_0 :: Double
> sigma_0 = 2.0
And also arbitrarily let us pick the known standard deviation for the samples as
> sigma :: Double
> sigma = 1.0
```{.dia height='600'}
import ConjMCMCSimple
import LinRegAux
dia = diagNormals [(mu0, sigma_0, blue, "Prior")]
````
We can sample from this in a way that looks very similar to
[STAN](http://mc-stan.org) and
[JAGS](http://mcmc-jags.sourceforge.net):
> hierarchicalSample :: MonadRandom m => m Double
> hierarchicalSample = do
> mu <- sample (Normal mu0 sigma_0)
> x <- sample (Normal mu sigma)
> return x
and we didn't need to write a new language for this.
Again arbitrarily let us take
> nSamples :: Int
> nSamples = 10
and use
> arbSeed :: Int
> arbSeed = 2
And then actually generate the samples.
> simpleXs :: [Double]
> simpleXs =
> evalState (replicateM nSamples hierarchicalSample)
> (pureMT $ fromIntegral arbSeed)
Using the formulae we derived above we can calculate the posterior
> mu_1, sigma1, simpleNumerator :: Double
> simpleNumerator = fromIntegral nSamples * sigma_0**2 + sigma**2
> mu_1 = (sigma**2 * mu0 + sigma_0**2 * sum simpleXs) / simpleNumerator
> sigma1 = sigma**2 * sigma_0**2 / simpleNumerator
and then compare it against the prior
```{.dia height='600'}
import ConjMCMCSimple
import LinRegAux
dia = diagNormals [(mu0, sigma_0, blue, "Prior"), (mu_1, sigma1, red, "Posterior")]
````
The red posterior shows we are a lot more certain now that we have some evidence.
Via Markov Chain Monte Carlo
----------------------------
The theory behind MCMC is described in a [previous
post](http://idontgetoutmuch.wordpress.com/2013/12/07/haskell-ising-markov-metropolis/). We
need to generate some proposed steps for the chain. We sample from the
normal distribution, but we could have used e.g. the gamma (an
asymmetric proposal like that would require the full Metropolis-Hastings
correction to the acceptance probability).
> normalisedProposals :: Int -> Double -> Int -> [Double]
> normalisedProposals seed sigma nIters =
> evalState (replicateM nIters (sample (Normal 0.0 sigma)))
> (pureMT $ fromIntegral seed)
We also need samples from the uniform distribution for the accept/reject step
> acceptOrRejects :: Int -> Int -> [Double]
> acceptOrRejects seed nIters =
> evalState (replicateM nIters (sample stdUniform))
> (pureMT $ fromIntegral seed)
And now we can calculate the (un-normalised) prior, likelihood and posterior
> prior :: Double -> Double
> prior mu = exp (-(mu - mu0)**2 / (2 * sigma_0**2))
>
> likelihood :: Double -> [Double] -> Double
> likelihood mu xs = exp (-sum (map (\x -> (x - mu)**2 / (2 * sigma**2)) xs))
>
> posterior :: Double -> [Double] -> Double
> posterior mu xs = likelihood mu xs * prior mu
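With only ten observations the direct product of likelihood terms is
fine, but for larger data sets it underflows to zero. A standard remedy,
sketched below purely for illustration (these definitions are not part
of the compiled module), is to work with log densities and accept a jump
when $\log u < \log p(\mu'\,|\,\boldsymbol{x}) - \log p(\mu\,|\,\boldsymbol{x})$:
~~~~ { .haskell }
-- Log-space versions of the densities above; the acceptance test
-- becomes a comparison of log posteriors, avoiding underflow.
logPrior :: Double -> Double
logPrior mu = -(mu - mu0)**2 / (2 * sigma_0**2)

logLikelihood :: Double -> [Double] -> Double
logLikelihood mu xs = -sum (map (\x -> (x - mu)**2 / (2 * sigma**2)) xs)

logPosterior :: Double -> [Double] -> Double
logPosterior mu xs = logLikelihood mu xs + logPrior mu
~~~~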
The [Metropolis
algorithm](http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm)
tells us that we always jump to a better place but only sometimes jump
to a worse place. We count the number of acceptances as we go.
> acceptanceProb :: Double -> Double -> [Double] -> Double
> acceptanceProb mu mu' xs = min 1.0 ((posterior mu' xs) / (posterior mu xs))
> oneStep :: (Double, Int) -> (Double, Double) -> (Double, Int)
> oneStep (mu, nAccs) (proposedJump, acceptOrReject) =
> if acceptOrReject < acceptanceProb mu (mu + proposedJump) simpleXs
> then (mu + proposedJump, nAccs + 1)
> else (mu, nAccs)
Now we can actually run our simulation. We set the number of jumps and
a burn-in but do not do any thinning (a sketch of which follows the
constants below).
> nIters, burnIn :: Int
> nIters = 300000
> burnIn = nIters `div` 10
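Were we to want thinning, keeping only every $k$-th state of the chain
to reduce the autocorrelation between stored samples, a minimal sketch
(illustrative only, not part of the compiled module) might look like this:
~~~~ { .haskell }
-- Keep every k-th element of the chain,
-- e.g. thin 10 (map fst (test 2)) keeps every tenth sample.
thin :: Int -> [a] -> [a]
thin k = map head . takeWhile (not . null) . iterate (drop k)
~~~~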
Let us start our chain at
> startMu :: Double
> startMu = 10.0
and set the standard deviation of the proposed jumps to
> jumpVar :: Double
> jumpVar = 0.4
> test :: Int -> [(Double, Int)]
> test seed =
> drop burnIn $
> scanl oneStep (startMu, 0) $
> zip (normalisedProposals seed jumpVar nIters)
> (acceptOrRejects seed nIters)
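The second component of the state lets us see how often proposals were
accepted. A sketch (again illustrative, not part of the compiled module)
of the overall acceptance rate, for which figures of very roughly 25-50%
are often quoted as healthy for a random walk sampler:
~~~~ { .haskell }
-- Fraction of proposals accepted over the whole chain, burn-in included.
acceptanceRate :: Int -> Double
acceptanceRate seed = fromIntegral nAccepted / fromIntegral nIters
  where
    (_, nAccepted) = last (test seed)
~~~~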
We put the data into a histogram
> numBins :: Int
> numBins = 400
> hb :: HBuilder Double (Data.Histogram.Generic.Histogram V.Vector BinD Double)
> hb = forceDouble -<< mkSimple (binD lower numBins upper)
> where
> lower = startMu - 1.5*sigma_0
> upper = startMu + 1.5*sigma_0
>
> hist :: Int -> Histogram V.Vector BinD Double
> hist seed = fillBuilder hb (map fst $ test seed)
```{.dia width='800'}
dia = image "diagrams/HistMCMC.png" 1.0 1.0
````
Not bad but a bit lumpy. Let's try a few runs and see if we can smooth
things out.
> hists :: [Histogram V.Vector BinD Double]
> hists = parMap rpar hist [3,4..102]
> emptyHist :: Histogram V.Vector BinD Double
> emptyHist = fillBuilder hb (replicate numBins 0)
>
> smoothHist :: Histogram V.Vector BinD Double
> smoothHist = foldl' (H.zip (+)) emptyHist hists
```{.dia width='800'}
dia = image "diagrams/SmoothHistMCMC.png" 1.0 1.0
````
Quite nice, and it had my machine running at 750% CPU with `+RTS -N8`.
Comparison
----------
Let's create the same histogram but from the analytically calculated posterior.
> analPosterior :: [Double]
> analPosterior =
> evalState (replicateM (nIters - burnIn) (sample (Normal mu_1 (sqrt sigma1))))
> (pureMT $ fromIntegral 5)
>
> histAnal :: Histogram V.Vector BinD Double
> histAnal = fillBuilder hb analPosterior
And then compare them. Because they overlap so well, we show the MCMC result, the overlay of both, and the analytic result on separate charts.
```{.dia width='800'}
dia = image "diagrams/HistMCMC.png" 1.0 1.0
===
image "diagrams/HistMCMCAnal.png" 1.0 1.0
===
image "diagrams/HistAnal.png" 1.0 1.0
````
Postamble
=========
Normally with BlogLiteratelyD, we can generate diagrams on the
fly. However, here we want to run the simulations in parallel, so we
need to actually compile something.
~~~~ { .shell }
ghc -O2 ConjMCMCSimple.lhs -main-is ConjMCMCSimple -threaded -fforce-recomp
~~~~
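Assuming the build succeeds, we can then run the simulation on all
cores with the following (if your GHC complains that `-N8` is not
allowed, recompile with `-rtsopts`):
~~~~ { .shell }
./ConjMCMCSimple +RTS -N8
~~~~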
> displayHeader :: FilePath -> Diagram B R2 -> IO ()
> displayHeader fn =
> mainRender ( DiagramOpts (Just 900) (Just 700) fn
> , DiagramLoopOpts False Nothing 0
> )
> main :: IO ()
> main = do
> displayHeader "diagrams/HistMCMC.png"
> (barDiag MCMC
> (zip (map fst $ asList (hist 2)) (map snd $ asList (hist 2)))
> (zip (map fst $ asList histAnal) (map snd $ asList histAnal)))
>
> displayHeader "diagrams/HistMCMCAnal.png"
> (barDiag MCMCAnal
> (zip (map fst $ asList (hist 2)) (map snd $ asList (hist 2)))
> (zip (map fst $ asList histAnal) (map snd $ asList histAnal)))
>
> displayHeader "diagrams/HistAnal.png"
> (barDiag Anal
> (zip (map fst $ asList (hist 2)) (map snd $ asList (hist 2)))
> (zip (map fst $ asList histAnal) (map snd $ asList histAnal)))
>
> displayHeader "diagrams/SmoothHistMCMC.png"
> (barDiag MCMC
> (zip (map fst $ asList smoothHist) (map snd $ asList smoothHist))
> (zip (map fst $ asList histAnal) (map snd $ asList histAnal)))