# Allan Variance Calibration Techniques
In this chapter, we will study the Allan variance (AV) and the corresponding calibration techniques. The chapter is organized as follows:
- Review of MLE-based methods;
- An introduction to the Allan variance;
- Estimation of the Allan variance;
- Allan variance-based estimation.
## Review of MLE-based Methods
In general, inertial sensor stochastic calibration involves the estimation of the parameters of composite stochastic processes. These models are typically difficult to estimate because of their latent features. In the definition below we characterize the class of composite stochastic processes we shall consider here.
```{definition, defCompStocProcesses, name = "Composite Stochastic Processes for IMU Calibration"}
Let $(X_t)$ be a sum of latent independent stochastic processes such that:
- $(X_t)$ is a sum which includes a subset (or all) of the processes in the set $\{$**QN**, **WN**, **AR1**, **RW**, **DR**$\}$, where each process in the subset $\{$**QN**, **WN**, **RW**, **DR**$\}$ can be included at most once and the process **AR1** can be included $k$ times ($0 \leq k < \infty$).
- Let $\mathcal{Q}$ denote an arbitrary compact subset of $\mathbb{R}^+$. Then, the innovation processes of **WN**, **RW** and **AR1** have respective variances $\sigma^2$, $\gamma^2$ and $\nu^2$ such that $\sigma^2, \gamma^2 \text{ and } \nu^2 \in \mathcal{Q}$, and the processes **QN** and **DR** have $Q^2 \in \mathcal{Q}$ and $|\omega|\in \mathcal{Q}$, respectively.
```
There are three main classes of estimation techniques that can be used for the class of composite stochastic processes for IMU calibration:
- **Maximum likelihood-based methods**: Although these methods are optimal in theory, their applicability is extremely limited for numerical reasons and they tend to perform badly in practice.
- **Allan variance-based methods**: This class of methods is the most popular approach for IMU calibration. However, these methods typically lead to inconsistent estimators and their finite sample performance is often much lower than that of GMWM-based techniques.
- **GMWM and related methods**: In our (very biased) opinion, these techniques are currently the optimal choice for the estimation of the parameters of the class of processes considered in Definition \@ref(def:defCompStocProcesses). As we will see, this method is in fact a formalized version of Allan variance-based methods.
Maximum likelihood-based approaches are generally inappropriate for the estimation of the class of processes considered in Definition \@ref(def:defCompStocProcesses). In this chapter, we shall avoid a technical discussion of likelihood-based methods and refer the reader to @stebler2011constrained and @guerrier2016discussion for more details. Instead, we will consider an example to illustrate the numerical issues of this technique.
```{example, exMLEmethod, name="MLE-based IMU calibration"}
Suppose we have a composite stochastic process composed of a WN and an AR1 process, i.e.
\begin{equation*}
\begin{aligned}
Y_t &= \phi_0 Y_{t-1} + Z_t, \;\;\; Z_t \overset{iid}{\sim} \mathcal{N}\left(0, \nu^2_0 \right), \\
U_t &\overset{iid}{\sim} \mathcal{N}\left(0, \sigma^2_0 \right), \;\;\;\; X_t = Y_t + U_t.
\end{aligned}
\end{equation*}
Then we want to estimate the parameter $\boldsymbol{\theta}_0 = \left[\phi_0 \;\; \nu^2_0 \;\; \sigma^2_0\right]^T$.
Since the process is Gaussian, we have
\begin{equation*}
\begin{aligned}
\mathbf{X} \sim \mathcal{N} \left(\mathbf{0}, \boldsymbol{\Sigma}(\boldsymbol{\theta}_0)\right),
\end{aligned}
\end{equation*}
where $\mathbf{X} \equiv [X_1, ..., X_T]^T$ and $\boldsymbol{\Sigma}(\boldsymbol{\theta}_0) \equiv \text{Cov}(\mathbf{X})$. Since $Y_t$ and $U_t$ are independent, we have
\begin{equation*}
\boldsymbol{\Sigma}(\boldsymbol{\theta}_0) = \text{Cov}(\mathbf{X}) = \text{Cov}(\mathbf{Y}) + \text{Cov}(\mathbf{U}) = \frac{\nu^2_0}{1 - \phi_0^2} \left[ \phi_0^{|i-j|}\right]_{i,j = 1, ..., T} + \sigma^2_0 \mathbf{I}_T,
\end{equation*}
where $\mathbf{Y} \equiv [Y_1, ..., Y_T]^T$, $\mathbf{U} \equiv [U_1, ..., U_T]^T$ and where $\mathbf{I}_T$ denotes the identity matrix of dimension $T$. Note that the form of $\text{Cov}(\mathbf{Y})$ is due to the autocovariance of an AR1 which has been discussed in Chapter 2.
So we can now write the log-likelihood function of the model considered here which, up to a constant, can be expressed as
\begin{equation*}
\mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right) = - \log \left( \det \left( \boldsymbol{\Sigma}(\boldsymbol{\theta}) \right)\right) - \mathbf{X}^T \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \mathbf{X}.
\end{equation*}
Therefore, we can find the maximum likelihood estimator for $\boldsymbol{\theta}_0$:
\begin{equation*}
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmax}} \; \mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right) = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmin}} \; \text{log} \left( \det \left( \boldsymbol{\Sigma}(\boldsymbol{\theta}) \right)\right) + \mathbf{X}^T \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \mathbf{X}.
\end{equation*}
Unfortunately, this estimator is essentially impossible to apply when $T > 10^5$ since every evaluation of the function $\mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X} \right)$ requires inverting a $T \times T$ matrix, which entails a considerable (and often unrealistic) computational burden.
An alternative approach to compute the maximum likelihood estimator of $\boldsymbol{\theta}_0$ is based on the EM-algorithm of @dempster1977maximum. If the process $(Y_t)$ (or $U_t$) were observed, then we could easily estimate the parameters of our model by considering the likelihood of each process separately. Since the composite process is a state-space model, we could use the following approach:
\begin{equation*}
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmax}} \; \mathcal{L}\left(\boldsymbol{\theta} | \mathbf{X}, \hat{\mathbf{Y}}(\boldsymbol{\theta}) \right),
\end{equation*}
where $\hat{\mathbf{Y}}(\boldsymbol{\theta})$ denotes the estimation of $\mathbf{Y}$ (i.e. the states) based on a Kalman filter assuming $\boldsymbol{\theta}$ to be the correct parameter vector. Unfortunately, this approach suffers from the same computational limitations as the maximum likelihood estimator approach.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
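To make this computational burden concrete, the following minimal sketch (ours, not part of the original example; all names such as `neg_loglik` and the parameter values are hypothetical, and the chunk is not evaluated when building this chapter) evaluates and maximizes the log-likelihood above by brute force in base R. Each evaluation builds and factorizes a $T \times T$ covariance matrix, which is why the approach does not scale.
```{r mleWnAr1Sketch, eval = FALSE}
# Sketch (ours): brute-force Gaussian likelihood for the WN + AR1 composite process
neg_loglik = function(theta, X) {
  phi = theta[1]; nu2 = theta[2]; sigma2 = theta[3]
  T_len = length(X)
  H = abs(outer(seq_len(T_len), seq_len(T_len), "-"))
  Sigma = nu2 / (1 - phi^2) * phi^H + sigma2 * diag(T_len)   # Cov(Y) + Cov(U)
  log_det = as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  log_det + drop(t(X) %*% solve(Sigma, X))                   # objective to minimize
}

set.seed(123)
T_len = 500                                     # already slow for much larger T
Yt = as.numeric(arima.sim(list(ar = 0.9), n = T_len, sd = sqrt(0.5)))  # AR1 (hypothetical values)
Xt = Yt + rnorm(T_len, sd = 1)                  # add WN
fit = optim(c(0.5, 1, 1), neg_loglik, X = Xt, method = "L-BFGS-B",
            lower = c(-0.99, 1e-6, 1e-6), upper = c(0.99, Inf, Inf))
fit$par                                         # estimates of (phi, nu^2, sigma^2)
```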
## An Introduction to the Allan Variance
### Definition of the Allan Variance
The Allan variance (AV) is a statistical technique originally developed in the mid-1960s to study the stability of precision oscillators [see e.g. @allan1966statistics]. It can provide information on the types and magnitudes of various superimposed noise terms (i.e. composite stochastic processes). This method has been adapted to characterize the properties of a variety of devices including inertial sensors [see @elsheimy08av]. The AV is a measure of variability developed for long-memory processes and can in fact be interpreted as a Haar wavelet coefficient variance [see @percival1994long]. We will discuss this connection further on.
```{definition, defAV, name = "Allan Variance"}
We consider the AV at dyadic scales ($\tau_j$) starting from local averages of the process which can be denoted as
\begin{equation*}
\bar{X}_{t}^{(j)} \equiv \frac{1}{\tau_j} \sum_{i = 1}^{\tau_j} X_{t - \tau_j + i}\, ,
\label{mean.noav}
\end{equation*}
where $\tau_j \equiv 2^j, \; j \in \left\{x \in \mathbb{N} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$ and therefore determines the number of consecutive observations considered for the average. Then, the AV is defined as
\begin{equation*}
\text{AV}_j \left(X_t \right) \equiv \frac{1}{2} \, \mathbb{E}\left[ \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2 \right].
\end{equation*}
```
```{exercise, alterDefScale, name = "Alternative scale definition"}
The definition of the AV is actually valid for $\tau_j = \lfloor2^j\rfloor$ with $j \in \left\{x \in \mathbb{R} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$. In some cases it can be useful to consider this alternative definition [see e.g. @elsheimy08av], but we shall restrict ourselves here to the case where $j \in \left\{x \in \mathbb{N} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$.
```
```{exercise, notationAV, name = "Notation of the Allan Variance"}
For notational simplicity, we may sometimes replace $\text{AV}_j \left(X_t \right)$ by simply $\phi_j^2$ when the dependence of the AV on the process $(X_t)$ is evident.
```
As highlighted earlier, the AV is a widely used approach in engineering for sensor calibration, as it is directly linked to the properties of the process $(X_t)$, as shown in the following lemma [see e.g. @percival2006wavelet for the proof].
```{lemma, lemmavpsd, name = "AV Connection to PSD"}
For a stationary process $(X_t)$ with PSD $S_{X}(f)$ we have
\begin{equation*}
\phi_j^2 \equiv \text{AV}_j \left(X_t \right) = 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{X}(f) df.
\label{eq:allanvariancePSD_LInk}
\end{equation*}
```
This result establishes a direct connection between the AV and the PSD. A natural question is therefore whether the mapping PSD $\mapsto$ AV is one-to-one. @greenhall1998spectral (see Theorem 1) showed that this is actually not the case, as illustrated in the following section.
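Before turning to this ambiguity, the following sketch (ours; not evaluated when building this chapter) provides a quick numerical sanity check of Lemma \@ref(lem:lemmavpsd) in the simplest case of a white noise, heuristically taking its PSD to be flat over all frequencies, $S_X(f) = \sigma^2$, in which case the integral should reduce to $\sigma^2/\tau_j$. The integral is approximated by a truncated Riemann sum.
```{r avPsdSanityCheck, eval = FALSE}
# Sketch (ours): numerical check of the AV-PSD connection for a flat PSD
sigma2 = 2
av_from_psd = function(tau, f_max = 100, n_grid = 10^6) {
  f = seq(f_max / n_grid, f_max, length.out = n_grid)   # frequency grid
  df = f[2] - f[1]
  4 * sum(sin(pi * f * tau)^4 / (pi * f * tau)^2 * sigma2) * df
}
taus = 2^(1:6)
cbind(tau = taus,
      from_psd = sapply(taus, av_from_psd),   # numerical integral
      closed_form = sigma2 / taus)            # sigma^2 / tau_j
```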
### Spectral Ambiguity of the Allan Variance
Consider two (independent) stochastic processes $(X_t)$ and $(Y_t)$ with respective PSD $S_X(f)$ and $S_Y(f)$. Suppose that $S_X(f) \neq S_Y(f)$, then the two processes will have the same AV if
\begin{equation*}
\Delta \equiv \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} \Phi(f) df = 0,
\end{equation*}
where $\Phi(f) \equiv S_{X}(f) - S_{Y}(f)$. To show that it is possible that $\Delta = 0$ when $\Phi(f) \neq 0$, we will use the following critical identity:
\begin{equation*}
\sin^4(x) = \sin^2(x) - \frac{1}{4} \sin^2(2x).
\end{equation*}
First, we note that $\Delta$ may be expressed using the above critical identity as follows:
\begin{equation*}
\begin{aligned}
\Delta &= \int_{0}^{\infty} \frac{\sin^4\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df \\
&= \lim_{n \rightarrow -\infty} \int_{2^{n}}^{\infty} \frac{\sin^2\left(\tau \pi f \right) - \frac{1}{4} \sin^2\left(2 \tau \pi f \right) }{\left(\tau \pi f \right)^2} \Phi(f) df .
\end{aligned}
\end{equation*}
Second, by the change of variable $u = 2f$ in the second term, we obtain
\begin{equation*}
\begin{aligned}
\Delta = \lim_{n \rightarrow -\infty} & \Bigg[ \int_{2^{n}}^{\infty} \frac{\sin^2\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df - \frac{1}{2}\int_{2^{n+1}}^{\infty} \frac{\sin^2\left(\tau \pi u \right)}{\left(\tau \pi u \right)^2} \Phi\left(\tfrac{u}{2}\right) du \Bigg].
\end{aligned}
\end{equation*}
Now suppose that $\Phi(f) = 2 \Phi(2f)$. In this case, we have $\Phi\left(\tfrac{u}{2}\right) = 2 \Phi(u)$ and therefore we obtain
\begin{equation*}
\begin{aligned}
\Delta &= \lim_{n \rightarrow -\infty} \int_{2^{n}}^{2^{n+1}} \frac{\sin^2\left(\tau \pi f \right)}{\left(\tau \pi f \right)^2} \Phi(f) df = 0.
\end{aligned}
\end{equation*}
This result demonstrates that the mapping from PSD to AV is not necessarily one-to-one. @greenhall1998spectral showed that in the continuous case (i.e. $\tau_j \in \mathbb{R}$) $\Delta = 0$ if and only if $\Phi(f) = 2 \Phi(2f)$. However, the "only if" part of this result, while conjectured, remains unproven in the discrete case.
### Properties of the Allan Variance
One reason for the widespread use of the AV for sensor calibration is the following additivity property, which is particularly convenient for identifying composite stochastic processes (see Definition 2.17).
```{corollary, coroaddav, name = "Additivity of the AV"}
Consider two (independent) stochastic processes $(X_t)$ and $(Y_t)$ with respective PSD $S_X(f)$ and $S_Y(f)$. Suppose that we observe the process $Z_t = X_t + Y_t$. Then, we have
\begin{equation*}
\text{AV}_j \left(Z_t \right) = \text{AV}_j \left(X_t \right) + \text{AV}_j \left(Y_t \right).
\end{equation*}
```
```{proof, pf_coroaddav, name = "Additivity of the AV"}
The proof follows directly from Lemma \@ref(lem:lemmavpsd). Indeed, since $S_Z(f) = S_X(f) + S_Y(f)$, we have
\begin{equation*}
\begin{aligned}
\text{AV}_j \left(Z_t \right) &= 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{Z}(f) df\\
&= 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{X}(f) df + 4 \int_0^{\infty} \frac{\sin^4(\pi f \tau_j)}{(\pi f \tau_j)^2} S_{Y}(f) df\\
&= \text{AV}_j \left(X_t \right) + \text{AV}_j \left(Y_t \right).
\end{aligned}
\end{equation*}
```
<br>
Lemma \@ref(lem:lemmavpsd) is an important result which is very convenient for determining the theoretical AV of a given stochastic process. However, its applicability is often limited since the integral in Lemma \@ref(lem:lemmavpsd) can be intractable. An alternative to Lemma \@ref(lem:lemmavpsd) was proposed by @zhang2008allan and is far more advantageous from a computational standpoint.
```{lemma, lemmaavtoacf, name = "AV Connection to ACF"}
For a stationary process $(X_t)$ with variance $\sigma^2_X$ and ACF $\rho(h)$ we have
\begin{equation*}
\text{AV}_j \left(X_t \right) = \frac{\sigma_X^2}{\tau_j^2} \bigg(\tau_j\left[1-\rho(\tau_j)\right]
+ \sum_{i=1}^{\tau_j-1} i \left[2 \rho(\tau_j-i) - \rho(i) - \rho(2\tau_j-i)\right]\bigg).
\end{equation*}
```
The proof of this result is instructive and is presented in @xu2017study.
Using Lemma \@ref(lem:lemmaavtoacf), the exact form of the AV for different stationary processes, such as the general class of ARMA models, can be derived. Moreover, @zhang2008allan provided the theoretical AV for non-stationary processes such as the random walk and ARFIMA models, for which, as mentioned earlier, the AV represents a more suitable measure of uncertainty.
Lemma \@ref(lem:lemmaavtoacf) was extended to non-stationary processes in @xu2017study.
```{example, label = avma1, name = "Theoretical AV of an MA(1) Process"}
Consider the MA(1) process $X_t = \theta Z_{t-1} + Z_t$ with $Z_t \overset{iid}{\sim} \mathcal{N}\left(0, \sigma^2\right)$, whose variance is $\left(1 + \theta^2\right)\sigma^2$. From the autocovariance we obtain
\begin{equation*}
\rho(h) = \text{corr}\left(X_t, X_{t-h} \right) =\left\{
\begin{array}{cl}
1 &\text{if } h = 0\\
\frac{\theta}{1 + \theta^2} &\text{if } |h| = 1\\
0 &\text{if } |h| > 1.\\
\end{array}
\right.
\end{equation*}
We can now apply the formula given in Lemma \@ref(lem:lemmaavtoacf), which leads to
\begin{equation*}
\begin{aligned}
\text{AV}_j \left(X_t \right) &= \frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ \sum_{i=1}^{\tau_j-1} i \left[2 \rho(\tau_j-i) - \rho(i) - \rho(2\tau_j-i)\right]\bigg)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ 2 \sum_{i=1}^{\tau_j-1} i \rho(\tau_j-i) -\sum_{i=1}^{\tau_j-1} i \rho(i) - \sum_{i=1}^{\tau_j-1} i \rho(2\tau_j-i)\bigg)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \left(\tau_j
+ 2 (\tau_j - 1) \rho(1) - \rho(1) \right)\\
&=\frac{\left(1 + \theta^2 \right) \sigma^2}{\tau_j^2} \bigg(\tau_j
+ (2\tau_j - 3) \frac{\theta}{1 + \theta^2} \bigg).
\end{aligned}
\end{equation*}
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
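As a quick check of this derivation, the following sketch (ours; not evaluated here) compares the closed form obtained above with the general ACF-based formula of Lemma \@ref(lem:lemmaavtoacf), using arbitrary (hypothetical) values of $\theta$ and $\sigma^2$.
```{r avMa1NumericalCheck, eval = FALSE}
# Sketch (ours): the ACF-based and closed-form AV of an MA(1) should coincide
theta = 0.4; sigma2 = 2                      # hypothetical MA(1) parameters
rho = function(h) ifelse(h == 0, 1, ifelse(abs(h) == 1, theta / (1 + theta^2), 0))
av_acf_based = function(tau) {
  i = seq_len(tau - 1)
  (1 + theta^2) * sigma2 / tau^2 *
    (tau * (1 - rho(tau)) + sum(i * (2 * rho(tau - i) - rho(i) - rho(2 * tau - i))))
}
av_closed_form = function(tau) {
  (1 + theta^2) * sigma2 / tau^2 * (tau + (2 * tau - 3) * theta / (1 + theta^2))
}
taus = 2^(1:8)
cbind(tau = taus,
      acf_based = sapply(taus, av_acf_based),
      closed_form = sapply(taus, av_closed_form))
```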
## Estimation of the Allan Variance
Several estimators of the AV have been introduced in the literature. The most commonly used one is (probably) the Maximum-Overlapping AV (MOAV) estimator proposed by @percival1994long, which is defined as follows:
```{definition, defmoav, name = "Maximum-Overlapping AV Estimator"}
The MOAV is defined as:
\begin{eqnarray*}
\hat{\phi}_j^2 \equiv \widehat{\text{AV}}_j \left(X_t \right) = \frac{1}{2 \left(T - 2\tau_j + 1\right)} \sum_{k = 2 \tau_j}^{T} \left(\bar{X}_{k}^{(j)} - \bar{X}_{k-\tau_j}^{(j)} \right)^2.
\end{eqnarray*}
```
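To make this definition concrete, here is a minimal implementation sketch (ours; in practice one would rely on a dedicated package such as the one used later in this chapter, and the chunk is not evaluated here) of the MOAV at a single scale $\tau_j = 2^j$.
```{r moavSketch, eval = FALSE}
# Sketch (ours): MOAV estimator at scale tau_j = 2^j
moav = function(X, j) {
  tau = 2^j
  T_len = length(X)
  # local averages bar(X)_t^{(j)} = mean of X_{t - tau + 1}, ..., X_t (NA for t < tau)
  xbar = stats::filter(X, rep(1 / tau, tau), sides = 1)
  # differences bar(X)_k^{(j)} - bar(X)_{k - tau}^{(j)} for k = 2 tau, ..., T
  d = xbar[(2 * tau):T_len] - xbar[tau:(T_len - tau)]
  sum(d^2) / (2 * (T_len - 2 * tau + 1))
}

set.seed(1)
Xt = rnorm(10^4)                       # white noise with sigma^2 = 1
sapply(1:8, function(j) moav(Xt, j))   # should be close to 1 / 2^j at scale 2^j
```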
We will now study the properties of this estimator through the following lemmas.
### Consistency of the MOAV Estimator
```{lemma, lemmconsistencyAV, name = "Consistency of the MOAV Estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process,
- $(X_t - X_{t-1})^2$ has an absolutely summable autocovariance function,
- $\mathbb{E}\left[(X_t - X_{t-1})^4\right] < \infty$.
Then, we have
$$\widehat{\text{AV}}_j \left(X_t \right) \overset{ \mathcal{P} }{\longrightarrow} \text{AV}_j \left(X_t \right).$$
```
```{proof, pf_lemmconsistencyAV, name = "Consistency of the MOAV Estimator"}
The proof follows directly from the Weak Law of Large Numbers for dependent processes given in Chapter 3.
Let $Z_t \equiv X_t - X_{t-1}$. Since $(Z_t)$ is stationary, so is $(X_t - X_{t-h}) = \sum_{i=0}^{h-1} Z_{t-i}$ for all $h \in \mathbb{N}$. This directly implies that $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is also stationary (since it is a linear combination of stationary processes) and so is $Y_t \equiv (\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^2$ (since it is a time-invariant transformation of a stationary process). Moreover, there exist constants $c_h$ such that
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; \gamma_Y (h) = \sum_{h = -\infty}^{\infty} \; c_{|h|} \gamma_{Z^2} (h).
\end{equation*}
Therefore, we obtain
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; |\gamma_Y (h)| = \sum_{h = -\infty}^{\infty} \; |c_{|h|} \gamma_{Z^2} (h)| \leq \sup_{k = 1, ..., \infty} c_k \; \sum_{h = -\infty}^{\infty} \; |\gamma_{Z^2} (h)| < \infty,
\end{equation*}
since both terms are bounded. Using the same approach, we have that $\mathbb{E}\left[Y_t^2\right]$ is bounded since $\mathbb{E}\left[Z_t^4\right]$ is bounded. Thus, we can apply the WLLN for dependent processes to the process $(Y_t)$, i.e.
\begin{equation*}
\widehat{\text{AV}}_j \left(X_t \right) = \frac{1}{2 \left(T - 2\tau_j + 1\right)} \sum_{k = 2 \tau_j}^{T} Y_k \overset{\mathcal{P}}{\longrightarrow} \frac{1}{2} \mathbb{E}\left[{Y}_t\right] = \text{AV}_j \left(X_t \right),
\end{equation*}
which concludes the proof.
```
<br>
This result is closely related to the results of @percival1995estimation on the wavelet variance. We shall explore the connection between the AV and the wavelet variance later.
### Asymptotic Normality of the MOAV Estimator
Compared to consistency, the asymptotic normality of the MOAV estimator requires stronger conditions given in the following lemma.
```{lemma, label = lemmaasyav, name = "Asymptotic Normality of the MOAV Estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process.
- $(X_t - X_{t-1})$ is a strong mixing process with mixing coefficient $\alpha(n)$ such that $\sum_{n=1}^{\infty} \alpha(n)^{\frac{\delta}{2+\delta}} < \infty$ for some $\delta > 0$.
- $\mathbb{E}\left[\left(X_t - X_{t-1}\right)^{4+\delta}\right] < \infty$ for some $\delta > 0$.
Then, under these conditions, we have $$\sqrt{T}\left(\widehat{\text{AVar}}_j \left(X_t \right) - \text{AVar}_j \left(X_t \right) \right) \overset{ \mathcal{D} }{\longrightarrow}
\mathcal{N}\left(0, \sigma^2_j\right),$$ where $\sigma^2_j \equiv \frac{1}{4} \sum_{h = -\infty}^{\infty}\text{Cov}\left( \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2, \left(\bar{X}_{t+h}^{(j)} - \bar{X}_{t+h-\tau_j}^{(j)} \right)^2 \right)$.
```
```{proof, pf_lemmaasyav, name = "Asymptotic Normality of the MOAV Estimator"}
The proof follows directly from the Central Limit Theorem for $\alpha$-mixing processes given in Chapter 3. Let $Z_t \equiv X_t - X_{t-1}$. Since $(Z_t)$ is stationary, so is $(X_t - X_{t-h})$ for all $h \in \mathbb{N}$. This directly implies that $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is also stationary (since it is a linear combination of stationary processes) and so is $Y_t \equiv (\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^2$ (since it is a time-invariant transformation of a stationary process). Moreover, since $(Y_t)$ is a Borel-measurable function of $(Z_t)$, the process $(Y_t)$ is also strong mixing with mixing coefficient $\alpha^{\ast}(n) \leq \alpha(n)$, and hence $\sum_{n=1}^{\infty} \alpha^{\ast}(n)^{\frac{\delta}{2+\delta}} < \infty$ for some $\delta > 0$. Finally, since $\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)}$ is a linear function of $(Z_t)$ and $\mathbb{E}\left[Z_t^{4+\delta}\right] < \infty$, the Minkowski inequality implies that $\mathbb{E}\left[(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)})^{4+\delta}\right] < \infty$.
Thus, we can apply the CLT for $\alpha$-mixing processes to $(Y_t)$, i.e.
\begin{equation*}
\sqrt{T}\left(\widehat{\text{AVar}}_j \left(X_t \right) - \text{AVar}_j \left(X_t \right) \right) \overset{\mathcal{D}}{\longrightarrow} \mathcal{N}\left(0, \sigma^2_j\right),
\end{equation*}
where $\sigma^2_j \equiv \frac{1}{4}\sum_{h = -\infty}^{\infty}\text{Cov}\left( \left(\bar{X}_{t}^{(j)} - \bar{X}_{t-\tau_j}^{(j)} \right)^2, \left(\bar{X}_{t+h}^{(j)} - \bar{X}_{t+h-\tau_j}^{(j)} \right)^2 \right)$, which concludes the proof.
```
<br>
### Confidence Interval of the MOAV Estimator
Based on the asymptotic normality result of Lemma \@ref(lem:lemmaasyav), we can construct a $1-\alpha$ confidence interval for $\text{AVar}_j \left(X_t \right)$ as
\begin{equation*}
\text{CI}\left(\text{AVar}_j \left(X_t \right)\right) = \left[ \widehat{\text{AVar}}_j \left(X_t \right) \pm z_{1 - \frac{\alpha}{2}} \frac{\sigma_{j}}{\sqrt{T}} \right],
\end{equation*}
where $z_{1 - \frac{\alpha}{2}} \equiv \boldsymbol{\Phi}^{-1}\left( 1- \frac{\alpha}{2} \right)$ is the $(1- \frac{\alpha}{2})$ quantile of a standard normal distribution.
However, the so-called "long-run variance" $\sigma^2_{j}$ is usually unknown. Many methods have been proposed to estimate it consistently under mild conditions [see e.g. @newey1986simple].
```{exercise}
Gaussian-based confidence intervals are often problematic for the AV since the lower limit of the CI can easily be negative. We will discuss an alternative method to construct CIs for such statistics later.
```
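As an illustration, the following sketch (ours; the function names are hypothetical and the chunk is not evaluated here) constructs such a Gaussian confidence interval at a single scale, estimating the long-run variance with a simple Bartlett (Newey-West type) kernel.
```{r moavCiSketch, eval = FALSE}
# Sketch (ours): Gaussian CI for the MOAV at scale tau_j = 2^j
moav_ci = function(X, j, alpha = 0.05, q = 20) {
  tau = 2^j
  T_len = length(X)
  xbar = stats::filter(X, rep(1 / tau, tau), sides = 1)    # local averages
  Y = 0.5 * (xbar[(2 * tau):T_len] - xbar[tau:(T_len - tau)])^2
  est = mean(Y)                                             # MOAV estimate at scale tau
  n = length(Y)
  gam = sapply(0:q, function(h) cov(Y[1:(n - h)], Y[(1 + h):n]))
  lrv = gam[1] + 2 * sum((1 - (1:q) / (q + 1)) * gam[-1])   # Bartlett-weighted long-run variance
  half = qnorm(1 - alpha / 2) * sqrt(lrv / n)
  # as noted in the remark above, the lower limit is not guaranteed to be positive
  c(lower = est - half, estimate = est, upper = est + half)
}

set.seed(4)
Xt = rnorm(10^5)
moav_ci(Xt, j = 5)   # AV of a unit-variance WN at tau = 32 is 1/32
```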
## Allan Variance-based Estimation
### Allan Variance log-log Representation
As illustrated in Lemmas \@ref(lem:lemmavpsd) and \@ref(lem:lemmaavtoacf), the AV depends on the properties of the stochastic process $(X_t)$. We will see that the "log-log" representation of the AV is often useful to identify various processes that may compose $(X_t)$.
For example, suppose that $(X_t)$ is a white noise process. Setting $\theta = 0$ in Example \@ref(exm:avma1) shows that the theoretical AV of such a process is given by
\begin{equation*}
\phi_j^2 \equiv \text{AVar}_j(X_t) = \frac{\sigma^2}{\tau_j}.
\end{equation*}
Therefore, we have that the Allan Deviation or AD (i.e. $\sqrt{\text{AVar}_j(X_t)}$ or $\phi_j$) is such that
\begin{equation*}
\text{log}\left( \phi_j \right) = \text{log} \left(\sqrt{\frac{\sigma^2}{\tau_j}}\right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j).
\end{equation*}
Thus, the log of the AD is linear in $\text{log} (\tau_j)$ with a slope of $-1/2$ and intercept $\text{log} (\sigma)$, as shown in the following simple simulated example.
```{r exampleAVwhitenoise, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Simulated white noise process with variance 1 and 10^5 observations.'}
# Load packages
library(av) # Package for Allan variance functions
library(simts) # Package for time series simulations
# Simulate white noise
Xt = gen_gts(WN(sigma2 = 1), n = 10^5)
# Compute Allan variance
av = avar(Xt)
theo_av = av_wn(sigma2 = 1, n = av$clusters)
# Allan variance log-log representation
plot(av, legend_position = NA,
main = "Simulated WN", ylab = "Allan Deviation")
lines(av$clusters, sqrt(theo_av), col = "red")
# Add legend
legend("bottomleft",
legend = c(as.expression(bquote(paste(.("Empirical AV"), hat(phi)))),
as.expression(bquote(paste("CI(",hat(phi),", ",.(0.95),")"))),
"Theoretical AV"),
pch = c(16, 15, NA), lty = c(1, NA, 1),
col = c("darkblue", hcl(h = 210, l = 65, c = 100, alpha = 0.2), "red"),
cex = 1, pt.cex = c(1.25, 3, 1.25), bty = "n"
)
```
Suppose now that $(X_t)$ is a composite stochastic process composed of a WN and a RW. For simplicity, we assume that $X_t = Y_t + W_t$, where $Y_t$ is a WN process with variance $\sigma^2$ and $W_t$ is a RW with innovation variance $\gamma^2$. We already know that
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(Y_t)} \right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j),
\end{equation*}
and it can be shown (using for example Lemma \@ref(lem:lemmavpsd)) that
\begin{equation*}
\text{AVar}_j(W_t) = \frac{1}{3} \gamma^2 \tau_j,
\end{equation*}
and therefore we can obtain
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(W_t) } \right) = \log \left(\sqrt{\frac{1}{3} \gamma^2 \tau_j}\right) = \text{log} \left(\frac{1}{\sqrt{3}} \gamma\right) + \frac{1}{2} \text{log} (\tau_j).
\end{equation*}
Thus, the log of the AD of $(W_t)$ is also linear in $\text{log} (\tau_j)$, with a slope of $+1/2$. By Corollary \@ref(cor:coroaddav) we also have that
\begin{equation*}
\text{AVar}_j(X_t) = \text{AVar}_j(Y_t) + \text{AVar}_j(W_t) = \frac{\sigma^2}{\tau_j} + \frac{1}{3} \gamma^2 \tau_j.
\end{equation*}
This result is illustrated in the following simulated example:
```{r exampleAVwnrw, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Simulated white noise process (variance 10^2) plus random walk process (innovation variance 0.03^2) with 10^5 observations.'}
# Load packages
library(av)
library(simts)
# Simulate time series
Yt = gen_gts(WN(sigma2 = 10^2), n = 10^5)
Wt = gen_gts(RW(gamma2 = 0.03^2), n = 10^5)
Xt = Yt + Wt
# Compute Allan variance
av = avar(Xt)
av_of_wn = av_wn(sigma2 = 10^2, n = av$clusters)
av_of_rw = av_rw(omega2 = 0.03^2, n = av$clusters)
# Allan variance log-log representation
plot(av, legend_position = NA,
main = "Simulated WN + RW", ylab = "Allan Deviation")
lines(av$clusters, sqrt(av_of_wn), col = "red")
lines(av$clusters, sqrt(av_of_rw), col = "blue")
# Add legend
legend("bottomleft",
legend = c(as.expression(bquote(paste(.("Empirical AV"), hat(phi)))),
as.expression(bquote(paste("CI(",hat(phi),", ",.(0.95),")"))),
"Theoretical AV (WN)",
"Theoretical AV (RW)"),
pch = c(16, 15, NA, NA), lty = c(1, NA, 1, 1),
col = c("darkblue", hcl(h = 210, l = 65, c = 100, alpha = 0.2), "red", "blue"),
cex = 1, pt.cex = c(1.25, 3, 1.25, 1.25), bty = "n"
)
```
The AV is a powerful technique to identify the processes considered in Definition \@ref(def:defCompStocProcesses). Indeed, the five processes we are considering can be characterized in the AV log-log representation as follows:
- QN - linear with slope of -1;
- WN - linear with slope of -1/2;
- AR1 - curved shape with a slope $\in [-1/2, \, 1/2]$;
- RW - linear with slope of 1/2;
- DR - linear with slope of 1.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Five processes characterized in the AV log-log representation', echo=FALSE}
knitr::include_graphics("images/process_slope.pdf")
```
Let us now consider a real dataset from a KVH-1750 IMU gyroscope, collected over a 30-minute period at a frequency of 500 Hz.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro), 30 mins at 500Hz', echo=FALSE}
knitr::include_graphics("images/av_example_kvh.pdf")
```
The shape of the AD suggests that a WN + RW model may be a reasonable approximation. But how can we estimate the parameters of this model? In the following section we discuss how the AV can be used to estimate such parameters.
### Allan Variance-based Estimation
We have seen that there exists a mapping from $\boldsymbol{\theta}_0$ (the model's parameters) to the AV (or AD), i.e.
\begin{equation*}
\boldsymbol{\theta}_0 \mapsto \text{PSD or ACF} \mapsto \text{AV}.
\end{equation*}
In general, the mapping from the PSD to the AV is not one-to-one. Nevertheless, one may hope that in some cases this mapping can be "inverted" (in some sense) so that the AV can be used to estimate $\boldsymbol{\theta}_0$. This is the central idea of the AV linear regression approach.
Let us assume that we want to estimate the parameter of a QN, WN, RW or DR process. Then the process $(X_t)$ has a linear representation in a $\text{log} ( \phi_{\tau_j} )$ vs $\text{log}\left( \tau_j \right)$ plot for any set of scales $\boldsymbol{\eta} \in \boldsymbol{\mathcal{G}}$, where
\begin{equation*}
\boldsymbol{\mathcal{G}} = \left\{ \left\{\tau_k, \, ...,\, \tau_{k + h} \right\} | k,h \in \mathbb{N}^{+}, k + h \leq J\right\}
\end{equation*}
denotes the collection of all possible sets of adjacent scales with cardinality $|\boldsymbol{\eta}| > 0$.
Since there exists a linear relationship between $\text{log} \left( \phi_{\tau_j} ({\theta}) \right)$ and $\text{log}\left( \tau_j \right)$, we can write
\begin{equation*}
\text{log} \left( {\phi}_{\tau_j} \left( {\theta} \right) \right) = g\left( \theta \right) + \lambda \text{log} \left( \tau_j \right), \;\; \forall\tau_j \in \boldsymbol{\eta},
\end{equation*}
where the function $g (\cdot)$ as well as the constant $\lambda$ are known and depend on the model. For a white noise, we have for example $g(\sigma) = \text{log}(\sigma)$ and $\lambda = -1/2$. This linear relationship leads to the following "least-squares" estimator of $\theta$, denoted by $\hat{\theta}_{AV}$ and defined as
\begin{equation*}
\hat{\theta}_{AV} \equiv \underset{\theta \in \Theta}{\text{argmin}} \; \sum_{\tau_j \in \boldsymbol{\eta}}\left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right) \right]^2,
\end{equation*}
which is an extremum estimator and admits the closed-form solution
\begin{equation*}
\hat{\theta}_{AV} = g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right)\right]\right\} ,
\end{equation*}
where $| \boldsymbol{\eta}|$ denotes the cardinality of the set $\boldsymbol{\eta}$.
```{example, AVparaest, name = "Derivation of the AV-based Parameter Estimator"}
The derivation of the AV-based estimator is straightforward. Since $\hat{\theta}_{AV}$ is an extremum estimator, we set the first derivative of the objective function with respect to $\theta$ to zero, which yields:
\begin{equation*}
\begin{aligned}
&\frac{\partial}{\partial\theta} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right) \right]^2 = -2 \, g^{\prime}\left( \theta \right) \sum_{\tau_j \in \boldsymbol{\eta}}\left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - g\left( \theta \right) - \lambda \text{log} \left( \tau_j \right)\right] = 0 \\
&\Leftrightarrow g\left( \theta \right) = \frac{1}{|\boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right) \right] \quad \left(\text{assuming } g^{\prime}\left( \theta \right) \neq 0\right) \\
&\Leftrightarrow \hat{\theta}_{AV} = g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \log \left( \hat{\phi}_{\tau_j} \right) - \lambda \log \left( \tau_j \right)\right]\right\}.
\end{aligned}
\end{equation*}
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
Let us consider a simple example of the use of $\hat{\theta}_{AV}$ with a WN process. We already derived that
\begin{equation*}
\text{log}\left( \sqrt{\text{AVar}_j(Y_t)} \right) = \text{log} \left(\sigma\right) - \frac{1}{2} \text{log} (\tau_j),
\end{equation*}
implying that $g(\sigma) = \text{log}(\sigma)$ (and thus $g^{-1}(x) = \exp(x)$) and $\lambda = -1/2$. This leads to the following estimator of $\sigma$:
\begin{equation*}
\hat{\sigma}_{AV} = \exp\left\{ \frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) + \frac{1}{2} \text{log} \left( \tau_j \right) \right] \right\}.
\end{equation*}
Similarly for a RW, we obtain
\begin{equation*}
\hat{\gamma}_{AV} = \sqrt{3} \exp\left\{ \frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - \frac{1}{2} \text{log} \left( \tau_j \right) \right] \right\}.
\end{equation*}
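The following sketch (ours; not evaluated here) wraps these two estimators into functions and applies them to a simulated WN + RW signal, reusing the illustrative `moav()` implementation given earlier in this chapter.
```{r avSlopeEstimatorsSketch, eval = FALSE}
# Sketch (ours): AV log-log regression estimators for a WN and a RW
sigma_hat_av = function(hat_phi, tau) {
  exp(mean(log(hat_phi) + 0.5 * log(tau)))            # WN: g^{-1}(x) = exp(x), lambda = -1/2
}
gamma_hat_av = function(hat_phi, tau) {
  sqrt(3) * exp(mean(log(hat_phi) - 0.5 * log(tau)))  # RW: g^{-1}(x) = sqrt(3) exp(x), lambda = 1/2
}

set.seed(2)
Xt = rnorm(10^5, sd = 2) + cumsum(rnorm(10^5, sd = 0.01))   # WN (sigma = 2) + RW (gamma = 0.01)
hat_phi = sqrt(sapply(1:14, function(j) moav(Xt, j)))       # empirical Allan deviations
tau = 2^(1:14)
sigma_hat_av(hat_phi[1:6], tau[1:6])      # small scales (WN dominated): roughly 2
gamma_hat_av(hat_phi[11:14], tau[11:14])  # large scales (RW dominated): roughly 0.01
```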
Using these estimators, let us now estimate the parameters for the KVH-1750 gyroscope data introduced earlier. We have argued that, for these data, the shape of the AD suggests that a WN + RW model may be a reasonable approximation. Taking into account how each process appears in the AV log-log representation, we can use different sets of scales to estimate the parameters of the WN and of the RW, as shown in the figure below:
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Scales for parameters estimation', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part2.pdf")
```
Using the AV-based parameter estimators for the WN and RW processes, we obtain the following estimates:
\begin{equation*}
\begin{aligned}
\hat{\sigma}_{AV} &= \frac{1}{2} \exp \left\{\frac{1}{12} \sum_{j = 1}^{12} \left[\text{log} \left(\hat{\phi}_j\right) + \frac{1}{2} \text{log} \left(2^j\right)\right]\right\} \approx 1.710 \cdot 10^{-3}\\
\hat{\gamma}_{AV} &= \sqrt{3} \exp \left\{\frac{1}{4} \sum_{j = 15}^{18} \left[\text{log} \left(\hat{\phi}_j\right) - \frac{1}{2} \text{log} \left(2^j\right)\right]\right\} \approx 2.043 \cdot 10^{-7}.
\end{aligned}
\end{equation*}
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Implied AV by WN and RW respectively', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part3.pdf")
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='KVH-1750 (Gyro): Implied AV by WN + RW', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part4.pdf")
```
### Properties of Allan Variance-based Estimation
Before studying the properties of the AV-based estimation technique, we want to check whether the MOAV estimator is consistent and asymptotically normal under the setting of composite stochastic processes for IMU calibration as defined in Definition \@ref(def:defCompStocProcesses). Lemma 1 of @guerrier2016theoretical answered this question by showing that all the conditions of Lemmas
\@ref(lem:lemmconsistencyAV) and \@ref(lem:lemmaasyav) are satisfied under this setting.
Since $\hat{\theta}_{AV}$ admits a closed-form solution, the consistency of the AV-based estimator can be studied directly using the consistency of the AV estimator and the continuous mapping theorem. Indeed, using the latter we have
\begin{equation*}
\begin{aligned}
\hat{\theta}_{AV} &= g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( \hat{\phi}_{\tau_j} \right) - \lambda \text{log} \left( \tau_j \right)\right]\right\} \\
&\overset{\mathcal{P}}{\longrightarrow} g^{-1} \left\{\frac{1}{| \boldsymbol{\eta}|} \sum_{\tau_j \in \boldsymbol{\eta}} \left[ \text{log} \left( {\phi}_{\tau_j} \right) - \lambda \text{log} \left( \tau_j \right)\right]\right\} = \theta^* ,
\end{aligned}
\end{equation*}
and therefore if we can show that $\theta^* = \theta_0$, then the AV-based estimator is consistent.
```{corollary, corConsistAVest, name = "Consistency of AV-based Estimator"}
The AV-based estimator is consistent if the process $(X_t)$ is composed of either a QN, WN or DR process.
```
```{proof, pf_corConsistAVest, name = "Proof of Consistency of AV-based Estimator"}
The proof is direct: it suffices to verify that $\theta^* = \theta_0$ for each of these processes.
```
<br>
```{corollary, corInconsistAVest, name = "Inconsistency of AV-based Estimator"}
The AV-based estimator is not consistent if the process $(X_t)$ includes at least one RW process or contains more than one latent process.
```
```{proof, pf_corInconsistAVest, name = "Proof of Inconsistency of AV-based Estimator"}
It can be shown that, for any such model, $\theta^* > \theta_0$, implying the inconsistency of the AV-based estimation method.
```
<br>
A formal proof of these results is given in @guerrier2016theoretical.
The main reason for the inconsistency of this approach is that the effects of multiple superimposed stochastic processes cannot be (perfectly) separated in the log-log representation. This suggests the following alternative definition of $\hat{\theta}_{AV}$:
\begin{equation*}
\hat{\theta}_{AV}^* = \text{argmin}_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \; \sum_{j = 1}^J \; \left( \hat{\phi}_j^2 - \phi_j^2 (\boldsymbol{\theta})\right)^2,
\end{equation*}
where $\phi_j^2 (\boldsymbol{\theta})$ denotes the theoretical AV under the assumption that $\boldsymbol{\theta}$ is the correct parameter vector. We will see that this approach is a special case of the GMWM estimator and that it leads to consistent and asymptotically normally distributed estimators.
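As a sketch (ours; not evaluated here) of this alternative estimator for a WN + RW model, note that the model-implied AV is linear in $(\sigma^2, \gamma^2)$, so the minimization reduces to a linear least-squares fit of the empirical AV on the regressors $1/\tau_j$ and $\tau_j/3$; `moav()` again refers to the illustrative implementation given earlier.
```{r altAvEstimatorSketch, eval = FALSE}
# Sketch (ours): alternative AV-based estimator for WN + RW via linear least squares
av_wn_rw_fit = function(hat_av, tau) {
  X = cbind(wn = 1 / tau, rw = tau / 3)
  fit = lm(hat_av ~ X - 1)          # no nonnegativity constraint in this sketch
  c(sigma2 = unname(coef(fit)[1]), gamma2 = unname(coef(fit)[2]))
}

set.seed(3)
Xt = rnorm(10^5, sd = 2) + cumsum(rnorm(10^5, sd = 0.01))   # WN (sigma^2 = 4) + RW (gamma^2 = 1e-4)
hat_av = sapply(1:14, function(j) moav(Xt, j))              # empirical AV
tau = 2^(1:14)
av_wn_rw_fit(hat_av, tau)   # roughly c(sigma2 = 4, gamma2 = 1e-4)
```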
```{example, exmKVH, name = "Estimation for KVH-1750 Gyro using Alternative AV-based Estimator"}
Let us revisit the KVH-1750 gyro dataset using the above alternative AV-based estimation approach. We have shown that
\begin{equation*}
\text{AVar}_j(X_t) = \text{AVar}_j(Y_t) + \text{AVar}_j(W_t) = \frac{\sigma^2}{\tau_j} + \frac{1}{3} \gamma^2 \tau_j,
\end{equation*}
and therefore we can define the following estimator:
\begin{equation*}
\hat{\boldsymbol{\theta}}_{AV}^* = [\hat{\sigma}_{AV}^* \;\;\; \hat{\gamma}_{AV}^*]^T = \text{argmin}_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \; \sum_{j = 1}^J \; \left( \hat{\phi}_j^2 - \frac{\sigma^2}{\tau_j} - \frac{1}{3} \gamma^2 \tau_j \right)^2,
\end{equation*}
corresponding to the point estimates:
\begin{equation*}
\begin{aligned}
\hat{\sigma}_{AV}^* & \approx 1.847 \cdot 10^{-3}, \\
\hat{\gamma}_{AV}^* & \approx 1.493 \cdot 10^{-7}.
\end{aligned}
\end{equation*}
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Estimation for KVH-1750 gyro using both the traditional and the alternative AV-based estimators', echo=FALSE}
knitr::include_graphics("images/av_example_kvh_part5.pdf")
```