# Allan Variance Calibration Techniques
In this chapter, we will study the Allan variance (AV) and the corresponding calibration techniques. The chapter is organized as follows:
- Review of MLE-based methods;
- An introduction to the Allan variance;
- Estimation of the Allan variance;
- Allan variance-based estimation.
## Review of MLE-based Methods
In general, inertial sensor stochastic calibration involves the estimation of the parameters of composite stochastic processes. These models are typically difficult to estimate because of their latent features. In the definition below we characterize the class of composite stochastic processes we shall consider here.
```{definition, defCompStocProcesses, name = "Composite Stochastic Processes for IMU Calibration"}
Let $(X_t)$ be a sum of latent independent stochastic processes such that:
- $(X_t)$ is a sum which includes a subset (or all) of the processes in the set $\{$**QN**, **WN**, **AR1**, **RW**, **DR**$\}$, where each process in the subset $\{$**QN**, **WN**, **RW**, **DR**$\}$ can be included at most once and the process **AR1** can be included $k$ times ($0 \leq k < \infty$).
- Let $\mathcal{Q}$ denote an arbitrary compact subset of $\mathbb{R}^+$. Then, the innovation processes of **WN**, **RW** and **AR1** have respective variances $\sigma^2$, $\gamma^2$ and $\nu^2$ such that $\sigma^2, \gamma^2 \text{ and } \nu^2 \in \mathcal{Q}$, and the processes **QN** and **DR** have $Q^2 \in \mathcal{Q}$ and $|\omega|\in \mathcal{Q}$, respectively.
```
There are three main classes of estimation techniques that can be used for the class of composite stochastic processes for IMU calibration:
- **Maximum likelihood-based methods**: Although these methods are optimal in theory, their applicability is extremely limited for numerical reasons and they tend to perform badly in practice.
- **Allan variance-based methods**: This class of methods is the most popular approach for IMU calibration. However, these methods typically lead to inconsistent estimators and their finite sample performance is often much lower than that of GMWM-based techniques.
- **GMWM and related methods**: In our (very biased) opinion, these techniques are currently the optimal choice for the estimation of the parameters of the class of processes considered in Definition \@ref(def:defCompStocProcesses). As we will see, this method is in fact a formalized version of Allan variance-based methods.
Maximum likelihood-based approaches are generally inappropriate for the estimation of the class of processes considered in Definition \@ref(def:defCompStocProcesses). In this chapter, we shall avoid a technical discussion of likelihood-based methods and refer the reader to @stebler2011constrained and @guerrier2016discussion for more details. Instead, we will consider an example to illustrate the numerical issues of this technique.
```{example, exMLEmethod, name="MLE-based IMU calibration"}
Suppose we have a composite stochastic process composed of a WN and an AR1 process, i.e.
\begin{equation*}
\begin{aligned}
Y_t &= \phi_0 Y_{t-1} + Z_t, \;\;\; Z_t \overset{iid}{\sim} \mathcal{N}\left(0, \nu^2_0 \right), \\
U_t &\overset{iid}{\sim} \mathcal{N}\left(0, \sigma^2_0 \right), \;\;\;\; X_t = Y_t + U_t.
\end{aligned}
\end{equation*}
Then we want to estimate the parameter $\boldsymbol{\theta}_0 = \left[\phi_0 \;\; \nu^2_0 \;\; \sigma^2_0\right]^T$.
Since the process is Gaussian, we have
\begin{equation*}
\begin{aligned}
\mathbf{X} \sim \mathcal{N} \left(\mathbf{0}, \boldsymbol{\Sigma}(\boldsymbol{\theta}_0)\right),
\end{aligned}
\end{equation*}
where $\mathbf{X} \equiv [X_1, ..., X_T]^T$ and $\boldsymbol{\Sigma}(\boldsymbol{\theta}_0) \equiv \text{Cov}(\mathbf{X})$. Since $Y_t$ and $U_t$ are independent, we have
\begin{equation*}
\boldsymbol{\Sigma}(\boldsymbol{\theta}_0) = \text{Cov}(\mathbf{X}) = \text{Cov}(\mathbf{Y}) + \text{Cov}(\mathbf{U}) = \frac{\nu^2_0}{1 - \phi_0^2} \left[ \phi_0^{|i-j|}\right]_{i,j = 1, ..., T} + \sigma^2_0 \mathbf{I}_T,
\end{equation*}
where $\mathbf{Y} \equiv [Y_1, ..., Y_T]^T$, $\mathbf{U} \equiv [U_1, ..., U_T]^T$ and where $\mathbf{I}_T$ denotes the identity matrix of dimension $T$. Note that the form of $\text{Cov}(\mathbf{Y})$ is due to the autocovariance of an AR1 which has been discussed in Chapter 2.
So we can now write the log-likelihood function of the model considered here which, up to a constant, can be expressed as
\begin{equation*}
\mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right) = - \log \left( \det \left( \boldsymbol{\Sigma}(\boldsymbol{\theta}) \right)\right) - \mathbf{X}^T \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \mathbf{X}.
\end{equation*}
Therefore, we can find the maximum likelihood estimator for $\boldsymbol{\theta}_0$:
\begin{equation*}
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmax}} \; \mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right) = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmin}} \; \text{log} \left( \det \left( \boldsymbol{\Sigma}(\boldsymbol{\theta}) \right)\right) + \mathbf{X}^T \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \mathbf{X}.
\end{equation*}
Unfortunately, this estimator is essentially impossible to apply when $T > 10^5$ since every evaluation of the function $\mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right)$ requires inverting a $T \times T$ matrix, which entails a considerable (and often unrealistic) computational burden.
An alternative approach to compute the maximum likelihood estimator of $\boldsymbol{\theta}_0$ is based on the EM-algorithm of @dempster1977maximum. If the process $(Y_t)$ (or $U_t$) were observed, then we could easily estimate the parameters of our model by considering the likelihood of each process separately. Since the composite process is a state-space model, we could use the following approach:
\begin{equation*}
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmax}} \; \mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X}, \hat{\mathbf{Y}}(\boldsymbol{\theta}) \right),
\end{equation*}
where $\hat{\mathbf{Y}}(\boldsymbol{\theta})$ denotes the estimation of $\mathbf{Y}$ (i.e. the states) based on a Kalman filter assuming $\boldsymbol{\theta}$ to be the correct parameter vector. Unfortunately, this approach suffers from the same computational limitations as the maximum likelihood estimator approach.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
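To make this computational burden concrete, the following minimal sketch (ours, not part of the original example; all names such as `neg_loglik` and the parameter values are hypothetical, and the chunk is not evaluated when building this chapter) evaluates and maximizes the log-likelihood above by brute force in base R. Each evaluation builds and factorizes a $T \times T$ covariance matrix, which is why the approach does not scale.
```{r mleWnAr1Sketch, eval = FALSE}
# Sketch (ours): brute-force Gaussian likelihood for the WN + AR1 composite process
neg_loglik = function(theta, X) {
  phi = theta[1]; nu2 = theta[2]; sigma2 = theta[3]
  T_len = length(X)
  H = abs(outer(seq_len(T_len), seq_len(T_len), "-"))
  Sigma = nu2 / (1 - phi^2) * phi^H + sigma2 * diag(T_len)   # Cov(Y) + Cov(U)
  log_det = as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  log_det + drop(t(X) %*% solve(Sigma, X))                   # objective to minimize
}

set.seed(123)
T_len = 500                                     # already slow for much larger T
Yt = as.numeric(arima.sim(list(ar = 0.9), n = T_len, sd = sqrt(0.5)))  # AR1 (hypothetical values)
Xt = Yt + rnorm(T_len, sd = 1)                  # add WN
fit = optim(c(0.5, 1, 1), neg_loglik, X = Xt, method = "L-BFGS-B",
            lower = c(-0.99, 1e-6, 1e-6), upper = c(0.99, Inf, Inf))
fit$par                                         # estimates of (phi, nu^2, sigma^2)
```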
## An Introduction to the Allan Variance
### Definition of the Allan Variance
The Allan variance (AV) is a statistical technique originally developed in the mid-1960s to study the stability of precision oscillators [see e.g. @allan1966statistics]. It can provide information on the types and magnitudes of various superimposed noise terms (i.e. composite stochastic processes). This method has been adapted to characterize the properties of a variety of devices including inertial sensors [see @elsheimy08av]. The AV is a measure of variability developed for long-memory processes and can in fact be interpreted as a Haar wavelet coefficient variance [see @percival1994long]. We will discuss this connection further on.
```{definition, defAV, name = "Allan Variance"}
We consider the AV at dyadic scales ($\tau_j$) starting from local averages of the process which can be denoted as
\begin{equation*}
\bar{X}_{t}^{(j)} \equiv \frac{1}{\tau_j} \sum_{i = 1}^{\tau_j} X_{t - \tau_j + i}\, ,
\label{mean.noav}
\end{equation*}
where $\tau_j \equiv 2^j, \; j \in \left\{x \in \mathbb{N} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$ and therefore determines the number of consecutive observations considered for the average. Then, the AV is defined as
\begin{equation*}
\text{AV}_j \left(X_t \right) \equiv \frac{1}{2} \, \mathbb{E}\left[ \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2 \right].
\end{equation*}
```
```{exercise, alterDefScale, name = "Alternative scale definition"}
The definition of the AV is actually valid for $\tau_j = \lfloor2^j\rfloor$ with $j \in \left\{x \in \mathbb{R} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$. In some cases it can be useful to consider this alternative definition [see e.g. @elsheimy08av], but we shall restrict ourselves here to the case where $j \in \left\{x \in \mathbb{N} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$.
```
```{exercise, notationAV, name = "Notation of the Allan Variance"}
For notational simplicity, we may sometimes replace $\text{AV}_j \left(X_t \right)$ by simply $\phi_j^2$ when the dependence of the AV on the process $(X_t)$ is evident.
```
As highlighted earlier, the AV is a widely used approach in engineering for sensor calibration, as it is directly linked to the properties of the process $(X_t)$, as shown in the following lemma [see e.g. @percival2006wavelet for the proof].
```{lemma, lemmavpsd, name = "AV Connection to PSD"}
For a stationary process $(X_t)$ with PSD $S_{X}(f)$ we have
\begin{equation*}
\phi_j^2 \equiv \text{AV}_j \left(X_t \right) = 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{X}(f) df.
\label{eq:allanvariancePSD_LInk}
\end{equation*}
```
This result establishes a direct connection between the AV and the PSD. A natural question is therefore whether the mapping PSD $\mapsto$ AV is one-to-one. @greenhall1998spectral (see Theorem 1) showed that this is actually not the case, as illustrated in the following section.
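Before turning to this ambiguity, the following sketch (ours; not evaluated when building this chapter) provides a quick numerical sanity check of Lemma \@ref(lem:lemmavpsd) in the simplest case of a white noise, heuristically taking its PSD to be flat over all frequencies, $S_X(f) = \sigma^2$, in which case the integral should reduce to $\sigma^2/\tau_j$. The integral is approximated by a truncated Riemann sum.
```{r avPsdSanityCheck, eval = FALSE}
# Sketch (ours): numerical check of the AV-PSD connection for a flat PSD
sigma2 = 2
av_from_psd = function(tau, f_max = 100, n_grid = 10^6) {
  f = seq(f_max / n_grid, f_max, length.out = n_grid)   # frequency grid
  df = f[2] - f[1]
  4 * sum(sin(pi * f * tau)^4 / (pi * f * tau)^2 * sigma2) * df
}
taus = 2^(1:6)
cbind(tau = taus,
      from_psd = sapply(taus, av_from_psd),   # numerical integral
      closed_form = sigma2 / taus)            # sigma^2 / tau_j
```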
### Spectral Ambiguity of the Allan Variance
Consider two (independent) stochastic processes $(X_t)$ and $(Y_t)$ with respective PSD $S_X(f)$ and $S_Y(f)$. Suppose that $S_X(f) \neq S_Y(f)$, then the two processes will have the same AV if
\begin{equation*}
\Delta \equiv \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} \Phi(f) df = 0,
\end{equation*}
where $\Phi(f) \equiv S_{X}(f) - S_{Y}(f)$. To show that it is possible that $\Delta = 0$ when $\Phi(f) \neq 0$, we will use the following critical identity:
\begin{equation*}
\sin^4(x) = \sin^2(x) - \frac{1}{4} \sin^2(2x).
\end{equation*}
First, we note that $\Delta$ may be expressed using the above critical identity as follows:
\begin{equation*}
\begin{aligned}
\Delta &= \int_{0}^{\infty} \frac{\sin^4\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df \\
&= \lim_{n \rightarrow -\infty} \int_{2^{n}}^{\infty} \frac{\sin^2\left(\tau \pi f \right) - \frac{1}{4} \sin^2\left(2 \tau \pi f \right) }{\left(\tau \pi f \right)^2} \Phi(f) df .
\end{aligned}
\end{equation*}
Second, by the change of variable $u = 2f$ in the second term, we obtain
\begin{equation*}
\begin{aligned}
\Delta = \lim_{n \rightarrow -\infty} & \Bigg[ \int_{2^{n}}^{\infty} \frac{\sin^2\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df - \frac{1}{2}\int_{2^{n+1}}^{\infty} \frac{\sin^2\left(\tau \pi u \right)}{\left(\tau \pi u \right)^2} \Phi\left(\tfrac{u}{2}\right) du \Bigg].
\end{aligned}
\end{equation*}
Now suppose that $\Phi(f) = 2 \Phi(2f)$. In this case, we have $\Phi\left(\tfrac{u}{2}\right) = 2 \Phi(u)$ and therefore we obtain
\begin{equation*}
\begin{aligned}
\Delta &= \lim_{n \rightarrow -\infty} \int_{2^{n}}^{2^{n+1}} \frac{\sin^2\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df = 0.
\end{aligned}
\end{equation*}
This result demonstrates that the mapping from PSD to AV is not necessarily one-to-one. @greenhall1998spectral showed that in the continuous case (i.e. $\tau_j \in \mathbb{R}$) $\Delta = 0$ if and only if $\Phi(f) = 2 \Phi(2f)$. However, the "only if" part of this result, while conjectured, remains unproven in the discrete case.
### Properties of the Allan Variance
One reason for the widespread use of the AV for sensor calibration is the following additivity property, which is particularly convenient for identifying composite stochastic processes (see Definition 2.17).
```{corollary, coroaddav, name = "Additivity of the AV"}
Consider two (independent) stochastic processes $(X_t)$ and $(Y_t)$ with respective PSD $S_X(f)$ and $S_Y(f)$. Suppose that we observe the process $Z_t = X_t + Y_t$. Then, we have
\begin{equation*}
\text{AV}_j \left(Z_t \right) = \text{AV}_j \left(X_t \right) + \text{AV}_j \left(Y_t \right).
\end{equation*}
```
```{proof, pf_coroaddav, name = "Additivity of the AV"}
The proof follows directly from Lemma \@ref(lem:lemmavpsd). Indeed, since $S_Z(f) = S_X(f) + S_Y(f)$, we have
\begin{equation*}
\begin{aligned}
\text{AV}_j \left(Z_t \right) &= 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{Z}(f) df\\
&= 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{X}(f) df + 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{Y}(f) df\\
&= \text{AV}_j \left(X_t \right) + \text{AV}_j \left(Y_t \right).
\end{aligned}
\end{equation*}
```
<br>
Lemma \@ref(lem:lemmavpsd) is an important result which is very convenient for determining the theoretical AV of a given stochastic process. However, its applicability is often limited since the integral in Lemma \@ref(lem:lemmavpsd) can be intractable. An alternative to Lemma \@ref(lem:lemmavpsd) was proposed by @zhang2008allan and is far more advantageous from a computational standpoint.
```{lemma, lemmaavtoacf, name = "AV Connection to ACF"}
For a stationary process $(X_t)$ with variance $\sigma^2_X$ and ACF $\rho(h)$ we have
\begin{equation*}
\text{AV}_j \left(X_t \right) = \frac{\sigma_X^2}{\tau_j^2} \bigg(\tau_j\left[1-\rho(\tau_j)\right]
+ \sum_{i=1}^{\tau_j-1} i \left[2 \rho(\tau_j-i) - \rho(i) - \rho(2\tau_j-i)\right]\bigg).
\end{equation*}
```
The proof of this result is instructive and is presented in @xu2017study.
Using Lemma \@ref(lem:lemmaavtoacf), the exact form of the AV for different stationary processes, such as the general class of ARMA models, can be derived. Moreover, @zhang2008allan provided the theoretical AV for non-stationary processes such as the random walk and ARFIMA models, for which, as mentioned earlier, the AV represents a more suitable measure of uncertainty.
Lemma \@ref(lem:lemmaavtoacf) was extended to non-stationary processes in @xu2017study.
```{example, label = avma1, name = "Theoretical AV of an MA(1) Process"}
Consider the MA(1) process $X_t = \theta Z_{t-1} + Z_t$ with $Z_t \overset{iid}{\sim} \mathcal{N}\left(0, \sigma^2\right)$, whose variance is $\left(1 + \theta^2\right)\sigma^2$. From the autocovariance we obtain
\begin{equation*}
\rho(h) = \text{corr}\left(X_t, X_{t-h} \right) =\left\{
\begin{array}{cl}
1 &\text{if } h = 0\\
\frac{\theta}{1 + \theta^2} &\text{if } |h| = 1\\
0 &\text{if } |h| > 1.\\
\end{array}
\right.
\end{equation*}
We can now apply the formula given in Lemma \@ref(lem:lemmaavtoacf), which leads to
\begin{equation*}
\begin{aligned}
\text{AV}_j \left(X_t \right) &= \frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ \sum_{i=1}^{\tau_j-1} i \left[2 \rho(\tau_j-i) - \rho(i) - \rho(2\tau_j-i)\right]\bigg)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ 2 \sum_{i=1}^{\tau_j-1} i \rho(\tau_j-i) -\sum_{i=1}^{\tau_j-1} i \rho(i) - \sum_{i=1}^{\tau_j-1} i \rho(2\tau_j-i)\bigg)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \left(\tau_j
+ 2 (\tau_j - 1) \rho(1) - \rho(1) \right)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ (2\tau_j - 3) \frac{\theta}{1 + \theta^2} \bigg).
\end{aligned}
\end{equation*}
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
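As a quick check of this derivation, the following sketch (ours; not evaluated here) compares the closed form obtained above with the general ACF-based formula of Lemma \@ref(lem:lemmaavtoacf), using arbitrary (hypothetical) values of $\theta$ and $\sigma^2$.
```{r avMa1NumericalCheck, eval = FALSE}
# Sketch (ours): the ACF-based and closed-form AV of an MA(1) should coincide
theta = 0.4; sigma2 = 2                      # hypothetical MA(1) parameters
rho = function(h) ifelse(h == 0, 1, ifelse(abs(h) == 1, theta / (1 + theta^2), 0))
av_acf_based = function(tau) {
  i = seq_len(tau - 1)
  (1 + theta^2) * sigma2 / tau^2 *
    (tau * (1 - rho(tau)) + sum(i * (2 * rho(tau - i) - rho(i) - rho(2 * tau - i))))
}
av_closed_form = function(tau) {
  (1 + theta^2) * sigma2 / tau^2 * (tau + (2 * tau - 3) * theta / (1 + theta^2))
}
taus = 2^(1:8)
cbind(tau = taus,
      acf_based = sapply(taus, av_acf_based),
      closed_form = sapply(taus, av_closed_form))
```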
## Estimation of the Allan Variance
Several estimators of the AV have been introduced in the literature. The most commonly used one is (probably) the Maximum-Overlapping AV (MOAV) estimator proposed by @percival1994long, which is defined as follows:
```{definition, defmoav, name = "Maximum-Overlapping AV Estimator"}
The MOAV is defined as:
\begin{eqnarray*}
\hat{\phi}_j^2 \equiv \widehat{\text{AV}}_j \left(X_t \right) = \frac{1}{2 \left(T - 2\tau_j + 1\right)} \sum_{k = 2 \tau_j}^{T} \left(\bar{X}_{k}^{(j)} - \bar{X}_{k-\tau_j}^{(j)} \right)^2.
\end{eqnarray*}
```
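To make this definition concrete, here is a minimal implementation sketch (ours; in practice one would rely on a dedicated package such as the one used later in this chapter, and the chunk is not evaluated here) of the MOAV at a single scale $\tau_j = 2^j$.
```{r moavSketch, eval = FALSE}
# Sketch (ours): MOAV estimator at scale tau_j = 2^j
moav = function(X, j) {
  tau = 2^j
  T_len = length(X)
  # local averages bar(X)_t^{(j)} = mean of X_{t - tau + 1}, ..., X_t (NA for t < tau)
  xbar = stats::filter(X, rep(1 / tau, tau), sides = 1)
  # differences bar(X)_k^{(j)} - bar(X)_{k - tau}^{(j)} for k = 2 tau, ..., T
  d = xbar[(2 * tau):T_len] - xbar[tau:(T_len - tau)]
  sum(d^2) / (2 * (T_len - 2 * tau + 1))
}

set.seed(1)
Xt = rnorm(10^4)                       # white noise with sigma^2 = 1
sapply(1:8, function(j) moav(Xt, j))   # should be close to 1 / 2^j at scale 2^j
```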
We will now study the properties of this estimator through the following lemmas.
### Consistency of the MOAV Estimator
```{lemma, lemmconsistencyAV, name = "Consistency of the MOAV Estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process,
- $(X_t - X_{t-1})^2$ has an absolutely summable autocovariance function,
- $\mathbb{E}\left[(X_t - X_{t-1})^4\right] < \infty$.
Then, we have
$$\widehat{\text{AV}}_j \left(X_t \right) \overset{ \mathcal{P} }{\longrightarrow} \text{AV}_j \left(X_t \right).$$
```
```{proof, pf_lemmconsistencyAV, name = "Consistency of the MOAV Estimator"}
The proof follows directly from the Weak Law of Large Numbers for dependent processes given in Chapter 3.
Let $Z_t \equiv X_t - X_{t-1}$. Since $(Z_t)$ is stationary, so is $(X_t - X_{t-h}) = \sum_{i=0}^{h-1} Z_{t-i}$ for all $h \in \mathbb{N}$. This directly implies that $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is also stationary (since it is a linear combination of stationary processes) and so is $Y_t \equiv (\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^2$ (since it is a time-invariant transformation of a stationary process). Moreover, there exist constants $c_h$ such that
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; \gamma_Y (h) = \sum_{h = -\infty}^{\infty} \; c_{|h|} \gamma_{Z^2} (h).
\end{equation*}
Therefore, we obtain
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; |\gamma_Y (h)| = \sum_{h = -\infty}^{\infty} \; |c_{|h|} \gamma_{Z^2} (h)| \leq \sup_{k = 1, ..., \infty} c_k \; \sum_{h = -\infty}^{\infty} \; |\gamma_{Z^2} (h)| < \infty,
\end{equation*}
since both terms are bounded. Using the same approach, we have that $\mathbb{E}\left[Y_t^2\right]$ is bounded since $\mathbb{E}\left[Z_t^4\right]$ is bounded. Thus, we can apply the WLLN for dependent processes to the process $(Y_t)$, i.e.
\begin{equation*}
\widehat{\text{AV}}_j \left(X_t \right) = \frac{1}{2 \left(T - 2\tau_j + 1\right)} \sum_{k = 2 \tau_j}^{T} Y_k \overset{\mathcal{P}}{\longrightarrow} \frac{1}{2} \mathbb{E}\left[{Y}_t\right] = \text{AV}_j \left(X_t \right),
\end{equation*}
which concludes the proof.
```
<br>
This result is closely related to the results of @percival1995estimation on the wavelet variance. We shall explore the connection between the AV and the wavelet variance later.
### Asymptotic Normality of the MOAV Estimator
Compared to consistency, the asymptotic normality of the MOAV estimator requires stronger conditions given in the following lemma.
```{lemma, label = lemmaasyav, name = "Asymptotic Normality of the MOAV Estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process.
- $(X_t - X_{t-1})$ is a strong mixing process with mixing coefficient $\alpha(n)$ such that $\sum_{n=1}^{\infty} \alpha(n)^{\frac{\delta}{2+\delta}} < \infty$ for some $\delta > 0$.
- $\mathbb{E}\left[\left(X_t - X_{t-1}\right)^{4+\delta}\right] < \infty$ for some $\delta > 0$.
Then, under these conditions, we have $$\sqrt{T}\left(\widehat{\text{AVar}}_j \left(X_t \right) - \text{AVar}_j \left(X_t \right) \right) \overset{ \mathcal{D} }{\longrightarrow}
\mathcal{N}\left(0, \sigma^2_j\right),$$ where $\sigma^2_j \equiv \frac{1}{4} \sum_{h = -\infty}^{\infty}\text{Cov}\left( \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2, \left(\bar{X}_{t+h}^{(j)} - \bar{X}_{t+h-\tau_j}^{(j)} \right)^2 \right)$.
```
```{proof, pf_lemmaasyav, name = "Asymptotic Normality of the MOAV Estimator"}
The proof follows directly from the Central Limit Theorem for $\alpha$-mixing processes given in Chapter 3. Let $Z_t \equiv X_t - X_{t-1}$. Since $(Z_t)$ is stationary, so is $(X_t - X_{t-h})$ for all $h \in \mathbb{N}$. This directly implies that $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is also stationary (since it is a linear combination of stationary processes) and so is $Y_t \equiv (\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^2$ (since it is a time-invariant transformation of a stationary process). Moreover, since $(Y_t)$ is a Borel-measurable function of $(Z_t)$, the process $(Y_t)$ is also strong mixing with mixing coefficient $\alpha^{\ast}(n) \leq \alpha(n)$, and hence $\sum_{n=1}^{\infty} \alpha^{\ast}(n)^{\frac{\delta}{2+\delta}} < \infty$ for some $\delta > 0$. Finally, since $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is a linear function of $(Z_t)$ and $\mathbb{E}\left[Z_t^{4+\delta}\right] < \infty$, the Minkowski inequality implies that $\mathbb{E}\left[(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^{4+\delta}\right] < \infty$.
Thus, we can apply the CLT for $\alpha$-mixing processes to $(Y_t)$, i.e.
\begin{equation*}
\sqrt{T}\left(\widehat{\text{AVar}}_j \left(X_t \right) - \text{AVar}_j \left(X_t \right) \right) \overset{\mathcal{D}}{\longrightarrow} \mathcal{N}\left(0, \sigma^2_j\right),
\end{equation*}
where $\sigma^2_j \equiv \frac{1}{4}\sum_{h = -\infty}^{\infty}\text{Cov}\left( \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2, \left(\bar{X}_{t+h}^{(j)} - \bar{X}_{t+h-\tau_j}^{(j)} \right)^2 \right)$, which concludes the proof.
```
<br>
### Confidence Interval of the MOAV Estimator
Based on the asymptotic normality result of Lemma \@ref(lem:lemmaasyav), we can construct a $1-\alpha$ confidence interval for $\text{AVar}_j \left(X_t \right)$ as
\begin{equation*}
\text{CI}\left(\text{AVar}_j \left(X_t \right)\right) = \left[ \widehat{\text{AVar}}_j \left(X_t \right) \pm z_{1 - \frac{\alpha}{2}} \frac{\sigma_{j}}{\sqrt{T}} \right],
\end{equation*}
where $z_{1 - \frac{\alpha}{2}} \equiv \boldsymbol{\Phi}^{-1}\left( 1- \frac{\alpha}{2} \right)$ is the $(1- \frac{\alpha}{2})$ quantile of a standard normal distribution.
However, the so-called "long-run variance" $\sigma^2_{j}$ is usually unknown. Many methods have been proposed to estimate it consistently under mild conditions [see e.g. @newey1986simple].
```{exercise}
Gaussian-based confidence intervals are often problematic for the AV since the lower limit of the CI can easily be negative. We will discuss an alternative method to construct CIs for such statistics later.
```
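As an illustration, the following sketch (ours; the function names are hypothetical and the chunk is not evaluated here) constructs such a Gaussian confidence interval at a single scale, estimating the long-run variance with a simple Bartlett (Newey-West type) kernel.
```{r moavCiSketch, eval = FALSE}
# Sketch (ours): Gaussian CI for the MOAV at scale tau_j = 2^j
moav_ci = function(X, j, alpha = 0.05, q = 20) {
  tau = 2^j
  T_len = length(X)
  xbar = stats::filter(X, rep(1 / tau, tau), sides = 1)    # local averages
  Y = 0.5 * (xbar[(2 * tau):T_len] - xbar[tau:(T_len - tau)])^2
  est = mean(Y)                                             # MOAV estimate at scale tau
  n = length(Y)
  gam = sapply(0:q, function(h) cov(Y[1:(n - h)], Y[(1 + h):n]))
  lrv = gam[1] + 2 * sum((1 - (1:q) / (q + 1)) * gam[-1])   # Bartlett-weighted long-run variance
  half = qnorm(1 - alpha / 2) * sqrt(lrv / n)
  # as noted in the remark above, the lower limit is not guaranteed to be positive
  c(lower = est - half, estimate = est, upper = est + half)
}

set.seed(4)
Xt = rnorm(10^5)
moav_ci(Xt, j = 5)   # AV of a unit-variance WN at tau = 32 is 1/32
```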
## Allan Variance-based Estimation
### Allan Variance log-log Representation
As illustrated in Lemmas \@ref(lem:lemmavpsd) and \@ref(lem:lemmaavtoacf), the AV depends on the properties of the stochastic process $(X_t)$. We will see that the "log-log" representation of the AV is often useful to identify various processes that may compose $(X_t)$.
For example, suppose that $(X_t)$ is a white noise process. Setting $\theta = 0$ in Example \@ref(exm:avma1) shows that the theoretical AV of such a process is given by
\begin{equation*}
\phi_j^2 \equiv \text{AVar}_j(X_t) = \frac{\sigma^2}{\tau_j}.
\end{equation*}
Therefore, we have that the Allan Deviation or AD (i.e. $\sqrt{\text{AVar}_j(X_t)}$ or $\phi_j$) is such that
\begin{equation*}
\text{log}\left( \phi_j \right) = \text{log} \left(\sqrt{\frac{\sigma^2}{\tau_j}}\right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j).
\end{equation*}
Thus, the log of the AD is linear in $\text{log} (\tau_j)$ with a slope of $-1/2$ and intercept $\text{log} (\sigma)$, as shown in the following simple simulated example.
```{r exampleAVwhitenoise, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Simulated white noise process with variance 1 and 10^5 observations.'}
# Load packages
library(av) # Package for Allan variance functions
library(simts) # Package for time series simulations
# Simulate white noise
Xt = gen_gts(WN(sigma2 = 1), n = 10^5)
# Compute Allan variance
av = avar(Xt)
theo_av = av_wn(sigma2 = 1, n = av$clusters)
# Allan variance log-log representation
plot(av, legend_position = NA,
main = "Simulated WN", ylab = "Allan Deviation")
lines(av$clusters, sqrt(theo_av), col = "red")
# Add legend
legend("bottomleft",
legend = c(as.expression(bquote(paste(.("Empirical AV"), hat(phi)))),
as.expression(bquote(paste("CI(",hat(phi),", ",.(0.95),")"))),
"Theoretical AV"),
pch = c(16, 15, NA), lty = c(1, NA, 1),
col = c("darkblue", hcl(h = 210, l = 65, c = 100, alpha = 0.2), "red"),
cex = 1, pt.cex = c(1.25, 3, 1.25), bty = "n"
)
```
Suppose now that $(X_t)$ is a composite stochastic process composed of a WN and a RW. For simplicity, we assume that $X_t = Y_t + W_t$, where $Y_t$ is a WN process with variance $\sigma^2$ and $W_t$ is a RW with innovation variance $\gamma^2$. We already know that
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(Y_t)} \right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j),
\end{equation*}
and it can be shown (using for example Lemma \@ref(lem:lemmavpsd)) that
\begin{equation*}
\text{AVar}_j(W_t) = \frac{1}{3} \gamma^2 \tau_j,
\end{equation*}
and therefore we can obtain
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(W_t) } \right) = \log \left(\sqrt{\frac{1}{3} \gamma^2 \tau_j}\right) = \text{log} \left(\frac{1}{\sqrt{3}} \gamma\right) + \frac{1}{2} \text{log} (\tau_j).
\end{equation*}
Thus, the log of the AD of $(W_t)$ is also linear in $\text{log} (\tau_j)$, with a slope of $+1/2$. By Corollary \@ref(cor:coroaddav) we also have that
\begin{equation*}
\text{AVar}_j(X_t) = \text{AVar}_j(Y_t) + \text{AVar}_j(W_t) = \frac{\sigma^2}{\tau_j} + \frac{1}{3} \gamma^2 \tau_j.
\end{equation*}
This result is illustrated in the following simulated example:
```{r exampleAVwnrw, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Simulated white noise process (variance 10^2) plus random walk process (innovation variance 0.03^2) with 10^5 observations.'}
# Load packages
library(av)
library(simts)
# Simulate time series
Yt = gen_gts(WN(sigma2 = 10^2), n = 10^5)
Wt = gen_gts(RW(gamma2 = 0.03^2), n = 10^5)
Xt = Yt + Wt
# Compute Allan variance
av = avar(Xt)
av_of_wn = av_wn(sigma2 = 10^2, n = av$clusters)
av_of_rw = av_rw(omega2 = 0.03^2, n = av$clusters)
# Allan variance log-log representation
plot(av, legend_position = NA,
main = "Simulated WN + RW", ylab = "Allan Deviation")
lines(av$clusters, sqrt(av_of_wn), col = "red")
lines(av$clusters, sqrt(av_of_rw), col = "blue")
# Add legend
legend("bottomleft",
legend = c(as.expression(bquote(paste(.("Empirical AV"), hat(phi)))),
as.expression(bquote(paste("CI(",hat(phi),", ",.(0.95),")"))),
"Theoretical AV (WN)",
"Theoretical AV (RW)"),
pch = c(16, 15, NA, NA), lty = c(1, NA, 1, 1),
col = c("darkblue", hcl(h = 210, l = 65, c = 100, alpha = 0.2), "red", "blue"),
cex = 1, pt.cex = c(1.25, 3, 1.25, 1.25), bty = "n"
)
```
The AV is a powerful technique to identify the processes considered in Definition \@ref(def:defCompStocProcesses). Indeed, the five processes we are considering can be characterized in the AV log-log representation as follows:
- QN - linear with slope of -1;
- WN - linear with slope of -1/2;
- AR1 - curved shape with a slope $\in [-1/2, \, 1/2]$;
- RW - linear with slope of 1/2;
- DR - linear with slope of 1.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Five processes characterized in the AV log-log representation', echo=FALSE}
knitr::include_graphics("images/process_slope.pdf")
```
Let us now consider a real dataset from a KVH-1750 IMU gyroscope, collected over a 30-minute period at a frequency of 500 Hz.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro), 30 mins at 500Hz', echo=FALSE}
knitr::include_graphics("images/av_example_kvh.pdf")
```
The shape of the AD suggests that a WN + RW model may be a reasonable approximation. But how can we estimate the parameters of this model? In the following section we discuss how the AV can be used to estimate such parameters.
### Allan Variance-based Estimation
We have seen that there exists a mapping from $\boldsymbol{\theta}_0$ (the model's parameters) to the AV (or AD), i.e.
\begin{equation*}
\boldsymbol{\theta}_0 \mapsto \text{PSD or ACF} \mapsto \text{AV}.
\end{equation*}
In general, the mapping from the PSD to the AV is not one-to-one. Nevertheless, one may hope that in some cases this mapping can be "inverted" (in some sense) so that the AV can be used to estimate $\boldsymbol{\theta}_0$. This is the central idea of the AV linear regression approach.
Let us assume that we want to estimate the parameter of a QN, WN, RW or DR process. Then the process $(X_t)$ has a linear representation in a $\text{log} ( \phi_{\tau_j} )$ vs $\text{log}\left( \tau_j \right)$ plot for any set of scales $\boldsymbol{\eta} \in \boldsymbol{\mathcal{G}}$, where
\begin{equation*}
\boldsymbol{\mathcal{G}} = \left\{ \left\{\tau_k, \, ...,\, \tau_{k + h} \right\} | k,h \in \mathbb{N}^{+}, k + h \leq J\right\}
\end{equation*}
denotes the collection of all possible sets of adjacent scales with cardinality $|\boldsymbol{\eta}| > 0$.
Since there exists a linear relationship between $\text{log} \left( \phi_{\tau_j} ({\theta}) \right)$ and $\text{log}\left( \tau_j \right)$, we can write
\begin{equation*}
\text{log} \left( {\phi}_{\tau_j} \left( {\theta} \right) \right) = g\left( \theta \right) + \lambda \text{log} \left( \tau_j \right), \;\; \forall\tau_j \in \boldsymbol{\eta},
\end{equation*}
where the function $g (\cdot)$ as well as the constant $\lambda$ are known and depend on the model. For a white noise, we have for example $g(\sigma) = \text{log}(\sigma)$ and $\lambda = -1/2$. This linear relationship leads to the following "least-squares" estimator of $\theta$, denoted by $\hat{\theta}_{AV}$ and defined as
\begin{equation*}
\hat{\theta}_{AV} \equiv \underset{\theta \in \Theta}{\text{argmin}} \; \sum_{\tau_j \in \boldsymbol{\eta}}\left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right) \right]^2,
\end{equation*}
which is an extremum estimator and admits the closed-form solution
\begin{equation*}
\hat{\theta}_{AV} = g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right)\right]\right\} ,
\end{equation*}
where $| \boldsymbol{\eta}|$ denotes the cardinality of the set $\boldsymbol{\eta}$.
```{example, AVparaest, name = "Derivation of the AV-based Parameter Estimator"}
The derivation of the AV-based estimator is straightforward. Since $\hat{\theta}_{AV}$ is an extremum estimator, we set the first derivative of the objective function with respect to $\theta$ to zero, which yields:
\begin{equation*}
\begin{aligned}
&\frac{\partial}{\partial\theta} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right) \right]^2 = -2 \, g^{\prime}\left( \theta \right) \sum_{\tau_j \in \boldsymbol{\eta}}\left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right)\right] = 0 \\
&\Leftrightarrow g\left( \theta \right) = \frac{1}{|\boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right) \right] \quad \left(\text{assuming } g^{\prime}\left( \theta \right) \neq 0\right) \\
&\Leftrightarrow \hat{\theta}_{AV} = g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right)\right]\right\}.
\end{aligned}
\end{equation*}
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
Let us consider a simple example of the use of $\hat{\theta}_{AV}$ with a WN process. We already derived that
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(Y_t)} \right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j),
\end{equation*}
implying that $g(\sigma) = \text{log}(\sigma)$ (and thus $g^{-1}(x) = \exp(x)$) and $\lambda = -1/2$. This leads to the following estimator of $\sigma$:
\begin{equation*}
\hat{\sigma}_{AV} = \exp\left\{ \frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) + \frac{1}{2} \text{log} \left( \tau_j \right) \right] \right\}.
\end{equation*}
Similarly for a RW, we obtain
\begin{equation*}
\hat{\gamma}_{AV} = \sqrt{3} \exp\left\{ \frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - \frac{1}{2} \text{log} \left( \tau_j \right) \right] \right\}.
\end{equation*}
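The following sketch (ours; not evaluated here) wraps these two estimators into functions and applies them to a simulated WN + RW signal, reusing the illustrative `moav()` implementation given earlier in this chapter.
```{r avSlopeEstimatorsSketch, eval = FALSE}
# Sketch (ours): AV log-log regression estimators for a WN and a RW
sigma_hat_av = function(hat_phi, tau) {
  exp(mean(log(hat_phi) + 0.5 * log(tau)))            # WN: g^{-1}(x) = exp(x), lambda = -1/2
}
gamma_hat_av = function(hat_phi, tau) {
  sqrt(3) * exp(mean(log(hat_phi) - 0.5 * log(tau)))  # RW: g^{-1}(x) = sqrt(3) exp(x), lambda = 1/2
}

set.seed(2)
Xt = rnorm(10^5, sd = 2) + cumsum(rnorm(10^5, sd = 0.01))   # WN (sigma = 2) + RW (gamma = 0.01)
hat_phi = sqrt(sapply(1:14, function(j) moav(Xt, j)))       # empirical Allan deviations
tau = 2^(1:14)
sigma_hat_av(hat_phi[1:6], tau[1:6])      # small scales (WN dominated): roughly 2
gamma_hat_av(hat_phi[11:14], tau[11:14])  # large scales (RW dominated): roughly 0.01
```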
Using these estimators, let us now estimate the parameters for the KVH-1750 gyroscope data introduced earlier. We have argued that, for these data, the shape of the AD suggests that a WN + RW model may be a reasonable approximation. Taking into account how each process appears in the AV log-log representation, we can use different sets of scales to estimate the parameters of the WN and of the RW, as shown in the figure below:
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Scales for parameters estimation', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part2.pdf")
```
Using the AV-based parameter estimators for the WN and RW processes, we obtain the following estimates:
\begin{equation*}
\begin{aligned}
\hat{\sigma}_{AV} &= \frac{1}{2} \exp \left\{\frac{1}{12} \sum_{j = 1}^{12} \left[\text{log} \left(\hat{\phi}_j\right) + \frac{1}{2} \text{log} \left(2^j\right)\right]\right\} \approx 1.710 \cdot 10^{-3}\\
\hat{\gamma}_{AV} &= \sqrt{3} \exp \left\{\frac{1}{4} \sum_{j = 15}^{18} \left[\text{log} \left(\hat{\phi}_j\right) - \frac{1}{2} \text{log} \left(2^j\right)\right]\right\} \approx 2.043 \cdot 10^{-7}.
\end{aligned}
\end{equation*}
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Implied AV by WN and RW respectively', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part3.pdf")
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Implied AV by WN + RW', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part4.pdf")
```
### Properties of Allan Variance-based Estimation
Before studying the properties of the AV-based estimation technique, we want to check whether the MOAV estimator is consistent and asymptotically normal under the setting of composite stochastic processes for IMU calibration as defined in Definition \@ref(def:defCompStocProcesses). Lemma 1 of @guerrier2016theoretical answered this question by showing that all the conditions of Lemmas
\@ref(lem:lemmconsistencyAV) and \@ref(lem:lemmaasyav) are satisfied under this setting.
Since $\hat{\theta}_{AV}$ admits a closed-form solution, the consistency of the AV-based estimator can be studied directly using the consistency of the AV estimator and the continuous mapping theorem. Indeed, using the latter we have
\begin{equation*}
\begin{aligned}
\hat{\theta}_{AV} &= g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - \lambda \text{log} \left( \tau_j \right)\right]\right\} \\
&\overset{\mathcal{P}}{\longrightarrow} g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( {\phi}_{\tau_j} \right) - \lambda \text{log} \left( \tau_j \right)\right]\right\} = \theta^* ,
\end{aligned}
\end{equation*}
and therefore if we can show that $\theta^* = \theta_0$, then the AV-based estimator is consistent.
```{corollary, corConsistAVest, name = "Consistency of AV-based Estimator"}
The AV-based estimator is consistent if the process $(X_t)$ is composed of either a QN, WN or DR process.
```
```{proof, pf_corConsistAVest, name = "Proof of Consistency of AV-based Estimator"}
The proof is direct: it suffices to verify that $\theta^* = \theta_0$ for each of these processes.
```
<br>
```{corollary, corInconsistAVest, name = "Inconsistency of AV-based Estimator"}
The AV-based estimator is not consistent if the process $(X_t)$ includes at least one RW process or contains more than one latent process.
```
```{proof, pf_corInconsistAVest, name = "Proof of Inconsistency of AV-based Estimator"}
It can be shown that, for any such model, $\theta^* > \theta_0$, implying the inconsistency of the AV-based estimation method.
```
<br>
A formal proof of these results is given in @guerrier2016theoretical.
The main reason for the inconsistency of this approach is that the effects of multiple superimposed stochastic processes cannot be (perfectly) separated in the log-log representation. This suggests the following alternative definition of $\hat{\theta}_{AV}$:
\begin{equation*}
\hat{\theta}_{AV}^* = \text{argmin}_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \; \sum_{j = 1}^J \; \left( \hat{\phi}_j^2 - \phi_j^2 (\boldsymbol{\theta})\right)^2,
\end{equation*}
where $\phi_j^2 (\boldsymbol{\theta})$ denotes the theoretical AV under the assumption that $\boldsymbol{\theta}$ is the correct parameter vector. We will see that this approach is a special case of the GMWM estimator and that it leads to consistent and asymptotically normally distributed estimators.
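As a sketch (ours; not evaluated here) of this alternative estimator for a WN + RW model, note that the model-implied AV is linear in $(\sigma^2, \gamma^2)$, so the minimization reduces to a linear least-squares fit of the empirical AV on the regressors $1/\tau_j$ and $\tau_j/3$; `moav()` again refers to the illustrative implementation given earlier.
```{r altAvEstimatorSketch, eval = FALSE}
# Sketch (ours): alternative AV-based estimator for WN + RW via linear least squares
av_wn_rw_fit = function(hat_av, tau) {
  X = cbind(wn = 1 / tau, rw = tau / 3)
  fit = lm(hat_av ~ X - 1)          # no nonnegativity constraint in this sketch
  c(sigma2 = unname(coef(fit)[1]), gamma2 = unname(coef(fit)[2]))
}

set.seed(3)
Xt = rnorm(10^5, sd = 2) + cumsum(rnorm(10^5, sd = 0.01))   # WN (sigma^2 = 4) + RW (gamma^2 = 1e-4)
hat_av = sapply(1:14, function(j) moav(Xt, j))              # empirical AV
tau = 2^(1:14)
av_wn_rw_fit(hat_av, tau)   # roughly c(sigma2 = 4, gamma2 = 1e-4)
```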
```{example, exmKVH, name = "Estimation for KVH-1750 Gyro using Alternative AV-based Estimator"}
Let us revisit the KVH-1750 gyro dataset using the above alternative AV-based estimation approach. We have shown that
\begin{equation*}
\text{AVar}_j(X_t) = \text{AVar}_j(Y_t) + \text{AVar}_j(W_t) = \frac{\sigma^2}{\tau_j} + \frac{1}{3} \gamma^2 \tau_j,
\end{equation*}
and therefore we can define the following estimator:
\begin{equation*}
\hat{\boldsymbol{\theta}}_{AV}^* = [\hat{\sigma}_{AV}^* \;\;\; \hat{\gamma}_{AV}^*]^T = \text{argmin}_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \; \sum_{j = 1}^J \; \left( \hat{\phi}_j^2 - \frac{\sigma^2}{\tau_j} - \frac{1}{3} \gamma^2 \tau_j \right)^2,
\end{equation*}
corresponding to the point estimates:
\begin{equation*}
\begin{aligned}
\hat{\sigma}_{AV}^* & \approx 1.847 \cdot 10^{-3}, \\
\hat{\gamma}_{AV}^* & \approx 1.493 \cdot 10^{-7}.
\end{aligned}
\end{equation*}
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Estimation for KVH-1750 gyro using both the traditional and the alternative AV-based estimators', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part5.pdf")
```