# Introduction to Time Series Analysis {#timeseries}
In this chapter we give an introduction to time series analysis. It is organized as follows:
- Definition and descriptive analysis of time series;
- Dependence within time series (and fundamental representations);
- Stationarity of time series;
- Basic time series models;
- Linear processes;
- Latent (or composite) stochastic processes;
- Estimation problems with time series.
## Time Series
```{definition, ts, name = "Time Series"}
A time series is a stochastic process (i.e. a sequence of random variables) defined on a common probability space. Let us denote this time-indexed stochastic process (time series) as $(X_t)_{t = 1,\ldots,T}$, i.e. $(X_1, X_2, \ldots, X_T)$, where the time index $t$ belongs to a discrete set. Therefore, we implicitly assume that
- $t$ is non-random, i.e. the time at which each observation is measured is known, and
- the time between two consecutive observations is constant (i.e. sampling occurs at regular intervals).
```
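As a minimal sketch of this setup (the values below are purely simulated, so the numbers themselves carry no meaning), a regularly spaced series can be constructed with the `gts()` function of the `simts` package, which is used throughout this chapter; the start date and frequency chosen here are arbitrary.
```{r, fig.height = 5, fig.width = 6.5, fig.align='center', cache = TRUE}
# Sketch: a regularly spaced series of 100 simulated monthly observations
library(simts)
set.seed(1)
x = gts(rnorm(100), start = 2000, freq = 12)  # one observation per month from 2000
plot(x, main = "A Simulated Regularly Spaced Time Series",
     xlab = "Time (year)", ylab = "Simulated Value")
```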
As for any data analysis procedure, the first step consists of representing the data in such a way as to highlight any important features or information that should be taken into account for the subsequent statistical analysis. For time series, a typical first step is plotting the observations over time (with time on the x-axis and the values of $X_t$ on the y-axis). This can be considered as the first step of the **descriptive analysis** of a time series, especially when its length is moderate.
With this in mind, when performing a descriptive analysis of a time series it is customary to check the following aspects:
- Trends
    - Seasonal (e.g. business cycles)
    - Non-seasonal (e.g. impact of economic indicators on stock returns)
- Local fluctuations (e.g. vibrations observed before, during and after an earthquake)
- Changes in the statistical properties
    - Mean (e.g. economic crisis)
    - Variance (e.g. earnings)
    - States (e.g. bear/bull in finance)
- Model deviations (e.g. outliers)
In order to give an idea of what the above characteristics imply for a time series, the following examples provide practical insight for some of them.
```{example, exampleJJ, name = "Johnson and Johnson Quarterly Earnings"}
A first example of a time series is the quarterly earnings of the company Johnson and Johnson. In the graph below, we present these earnings between 1960 and 1980.
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Johnson and Johnson Quarterly Earnings'}
# Load simts package
library(simts)
# Load data
data(jj, package = "astsa")
# Construct gts object
jj = gts(jj, start = 1960, freq = 4)
# Plot time series
plot(jj, main = "Johnson and Johnson Quarterly Earnings",
     xlab = "Time (year)",
     ylab = "Quarterly Earnings per Share ($)")
```
As we can see from the plot, the data contains a non-linear increasing trend as well as a seasonal component highlighted by the almost regularly-spaced peaks and valleys along time. In addition, we can notice that the variability of the data seems to increase with time (the seasonal variations appear to be larger towards the end of the plot). Hence, this simple visual representation can deliver important insight as to the behaviour of the time series and consequently determine the steps to take for further analysis (e.g. consider a non-linear model to explain the trend and consider approaches to model changing variance). <div style="text-align: right"> $\LARGE{\bullet}$ </div>
<br>
```{example, examplePrecipitation, name = "Monthly Precipitation Data"}
Let us consider another data set coming from the domain of hydrology. The data records monthly precipitation (in mm) over a certain period of time (1907 to 1972) and is interesting for hydrologists for the purpose of studying water cycles. The data are presented in the plot below:
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Monthly Precipitation Data'}
# Load data
data(hydro, package = "simts")
# Construct gts object
hydro = gts(hydro, start = 1907, freq = 12)
# Plot time series
plot(hydro, main = "Monthly Precipitation Data",
     xlab = "Time (year)",
     ylab = "Mean Monthly Precipitation (mm)")
```
The time series plot above differs considerably from the previous one since the values of the time series, being always non-negative, remain almost always between 0 and 1 (with no apparent trend) and occasionally show some larger observations that go beyond the value of 2 (or even 3). The latter appear to be extreme observations which could qualify as outliers, i.e. observations that are not representative of the true underlying model generating the time series and can considerably affect the statistical analysis if not dealt with. <div style="text-align: right"> $\LARGE{\bullet}$ </div>
<br>
```{example, exampleInertialSensor, name = "Inertial Sensor Data"}
Another example is provided by the data coming from the calibration procedure of an Inertial Measurement Unit (IMU). The signals (or time series) coming from an IMU are usually measured at high frequencies over a long time and are often characterized by linear trends and numerous underlying stochastic processes. The plot below represents the time series of an error signal coming from a gyroscope belonging to an IMU.
```
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Inertial Sensor Data'}
# Load data
data(imu6, package = "imudata")
# Construct gts object
imu = gts(imu6[,1], freq = 100*60*60)
# Plot time series
plot(imu, main = "Inertial Sensor Data",
     ylab = expression(paste("Angular Velocity ", (rad/s))),
     xlab = "Time (h)")
```
As we can see from the plot, there does not appear to be any linear trend, seasonality or increased variation in the time series. Indeed, from this plot it is difficult to detect any particular characteristic of this time series, although a linear trend and other processes are present in this data. Therefore, especially when the length of the time series is considerable, this representation can only give insight into a few aspects (if any) and is consequently not suitable, on its own, to adequately inform the subsequent statistical analysis. <div style="text-align: right"> $\LARGE{\bullet}$ </div>
<br>
In order to deliver a more appropriate (or more complete) representation of a time series, it is important to study the concept of dependence since (in a linear view of time) past observations have an influence on present and (possibly) future ones. Hence, the next section gives an overview of this concept.
## Dependence within Time Series
As mentioned above, it is straightforward to assume that observations measured through time are dependent on each other (in that observations at time $t$ have some form of impact on observations at time $t+1$ or beyond). Due to this characteristic, one of the main interests in time series is prediction where, if $(X_t)_{t=1,\ldots,T}$ is an identically distributed but not independent sequence, we often want to know the value of ${X}_{T+h}$ for $h > 0$ (i.e. an estimator of $\mathbb{E}[X_{T+h}| X_T,...]$). In order to tackle this challenge, we first need to understand the dependence between $X_{1},\ldots,X_{T}$ and, even before this, we have to formally define what **independence** is.
```{definition, IndepEvents, name = "Independence of Events"}
Two events $A$ and $B$ are independent if
\begin{align*}
\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B),
\end{align*}
with $\mathbb{P}(A)$ denoting the probability of event $A$ occurring and $\mathbb{P}(A \cap B)$ denoting the joint probability (i.e. the probability that events $A$ and $B$ occur jointly). More generally, the events $A_{1},\ldots,A_{n}$ are (mutually) independent if
\begin{align*}
\mathbb{P}(A_{i_1} \cap \ldots \cap A_{i_k}) = \mathbb{P}(A_{i_1}) \ldots \mathbb{P}(A_{i_k})
\end{align*}
for every sub-collection $A_{i_1}, \ldots, A_{i_k}$ of these events, where each $A_i$ is an event in the sample space $S$.
```
<br>
```{definition, IndepRV, name = "Independence of Random Variables"}
Two random variables $X$ and $Y$ with Cumulative Distribution Functions (CDF) $F_X(x)$ and $F_Y(y)$, respectively, are independent if and only if their joint CDF $F_{X,Y}(x,y)$ is such that
\begin{align*}
F_{X,Y}(x,y) = F_{X}(x) F_{Y}(y).
\end{align*}
In general, the random variables $X_1, \ldots, X_n$ with CDFs $F_{X_1}(x_1), \ldots, F_{X_n}(x_n)$, respectively, are independent if and only if their joint CDF $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ is such that
\begin{align*}
F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = F_{X_1}(x_1) \ldots F_{X_n}(x_n).
\end{align*}
```
<br>
```{definition, iid, name = "iid sequence"}
The sequence $X_{1},X_{2},\ldots,X_{T}$ is said to be independent and identically distributed (i.e. iid) if and only if
\begin{align*}
\mathbb{P}(X_{i}<x) = \mathbb{P}(X_{j}<x) \;\; \forall x \in \mathbb{R}, \forall i,j \in \{1,\ldots,T\},
\end{align*}
and
\begin{align*}
\mathbb{P}(X_{1}<x_{1},X_{2}<x_{2},\ldots,X_{T}<x_{T})=\mathbb{P}(X_{1}<x_1) \ldots \mathbb{P}(X_{T}<x_T) \;\; \forall T\geq2, x_1, \ldots, x_T \in \mathbb{R}.
\end{align*}
```
<br>
The basic idea behind the above definitions of independence is the fact that the probability of an event regarding variable $X_i$ remains unaltered no matter what occurs for variable $X_j$ (for $i \neq j$). From this definition, we can now start exploring the concept of dependence, starting from **linear dependence** within a time series. For this purpose, below we define a quantity called AutoCovariance (ACV).
```{definition, ACV, name="AutoCovariance"}
Autocovariance denoted as $\gamma_X(t, t+h)$ is defined as
\begin{align*}
\gamma_X(t, t+h) = \text{Cov}(X_{t},X_{t+h})= \mathbb{E}(X_{t}X_{t+h})-\mathbb{E}(X_{t})\mathbb{E}(X_{t+h}),
\end{align*}
where $\text{Cov}(\cdot)$ denotes the covariance and
\begin{align*}
\mathbb{E}(X_{t}) = \int_{-\infty}^{\infty}x \, f(x) \, dx \;\; \text{and} \;\; \mathbb{E}(X_{t}X_{t+h}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x_{t}\,x_{t+h}\, f(x_{t},x_{t+h}) \, dx_{t}\,dx_{t+h},
\end{align*}
where $f(x)$ denotes the density of $X_t$ and $f(x_{t},x_{t+h})$ denotes the joint density of $X_{t}$ and $X_{t+h}$.
```
<br>
Notice that, when two variables are independent, $\mathbb{E}(X_{t}X_{t+h}) = \mathbb{E}(X_{t})\mathbb{E}(X_{t+h})$ and hence $\gamma_X(t, t+h) = 0$. In a nutshell, the ACV measures the degree to which the mean behaviour of one variable (e.g. $X_{t+h}$) changes with another (e.g. $X_t$) and hence the extent to which they are linearly dependent.
```{exercise, propertiesACV, name="Properties of ACV"}
\phantom{a}
1. ACV is symmetric, i.e. $\gamma_X(t, t+h) = \gamma_X(t+h, t)$ since $\text{Cov}(X_{t},X_{t+h}) = \text{Cov}(X_{t+h},X_{t})$. Under stationarity (discussed shortly, see Section \@ref(stationarity) for more details), $\gamma_X(h) = \gamma_X(-h)$, i.e. the ACV is an even function.
2. Variance of the process $\text{Var}(X_t) = \gamma_X(t, t) \geq 0$. Under stationarity, $\text{Var}(X_t) = \gamma_X(0)$ and $\mid \gamma_X(h) \mid \leq \gamma_X(0)$ by Cauchy-Schwarz inequality.
3. Scale dependent: ACV $\gamma_X(t, t+h)$ is scale dependent like any covariance. So $\gamma_X(t, t+h) \in \mathbb{R}$.
- If $\mid \gamma_X(t, t+h) \mid$ is "close" to 0, then $X_{t}$ and $X_{t+h}$ are "less (linearly) dependent".
- If $\mid \gamma_X(t, t+h) \mid$ is "far" from 0, then $X_{t}$ and $X_{t+h}$ are "more (linearly) dependent".
However in general, it is difficult to assess what "close" and "far" from zero mean.
4. In general, $\gamma_X(t, t+h)=0$ does not imply $X_{t}$ and $X_{t+h}$ are independent. However, if $X_{t}$ and $X_{t+h}$ are joint normally distributed, then $\gamma_X(t, t+h)=0$ implies that $X_{t}$ and $X_{t+h}$ are independent.
```
<br>
However, the ACV is a quantity whose bounds are not known *a priori*, which makes it difficult to determine whether the degree of linear dependence is large or not. For this reason, another measure of linear dependence related to the ACV can be used: the AutoCorrelation Function (ACF), a commonly used metric in time series analysis, which is defined below.
```{definition, ACF, name="AutoCorrelation"}
AutoCorrelation (ACF) denoted as $\rho_X(t, t+h)$ is defined as
\begin{align*}
\rho_X(t,t+h) = \text{Corr}(X_{t},X_{t+h}) = \frac{\text{Cov}(X_{t},X_{t+h})}{\sqrt{\text{Var}(X_{t})} \sqrt{\text{Var}(X_{t+h})}},
\end{align*}
where $\text{Var}(\cdot)$ denotes the variance.
```
<br>
```{exercise, propertiesACF, name = "Properties of ACF"}
1. $\mid \rho_X(t,t+h) \mid \leq 1$ and $\mid \rho_X(t,t) \mid = 1$.
2. ACF is symmetric, i.e. $\rho_X(t, t+h) = \rho_X(t+h, t)$ since $\text{Corr}(X_{t},X_{t+h}) = \text{Corr}(X_{t+h},X_{t})$. Under stationarity, $\rho_X(h) = \rho_X(-h)$, i.e. the ACF is an even function.
3. Scale invariant: ACF $\rho_X(t,t+h)$ is scale free like any correlation. Moreover, if $\rho_X(t,t+h)$ is "close" to $\pm 1$, then this implies that there is "strong" (linear) dependence between $X_{t}$ and $X_{t+h}$.
```
<br>
Since the ACV is bounded by the product of the standard deviations of the two variables, the ACF is bounded between $-1$ and $1$. This makes the linear dependence easier to interpret: an ACF of zero indicates an absence of linear dependence, while an ACF close to $1$ or $-1$ indicates, respectively, a strong positive or negative linear dependence.
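As a hedged sketch of how these quantities are typically estimated in practice (their estimators are formally defined at the end of this chapter), base R's `acf()` function can return either the sample autocovariance or the sample autocorrelation of a series; here it is applied to an arbitrarily chosen simulated autoregressive series.
```{r, cache = TRUE}
# Sketch: sample ACV and ACF of a simulated autoregressive series
set.seed(123)
x = arima.sim(model = list(ar = 0.7), n = 500)
# Sample autocovariance (ACV) for lags 0 to 10
acv = acf(x, lag.max = 10, type = "covariance", plot = FALSE)
# Sample autocorrelation (ACF) for lags 0 to 10
acf_x = acf(x, lag.max = 10, type = "correlation", plot = FALSE)
# The ACF is the ACV rescaled by the lag-0 value (i.e. the variance)
acf_x$acf[1:3]
acv$acf[1:3] / acv$acf[1]
```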
As underlined up to now, both the ACV and the ACF measure linear dependence only. Therefore, if the ACV and ACF are both zero, this does not imply that there is no dependence between the variables. Indeed, aside from linear dependence, other forms of dependence, such as monotonic or nonlinear dependence, also exist which the ACV and ACF do not measure directly. The figure below shows scatterplots (one variable on the x-axis and another on the y-axis) along with the corresponding values of the ACF:
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Different forms of dependence and their ACF values', echo=FALSE}
knitr::include_graphics("images/Cor.png")
```
As can be seen, the first row in the figure shows a consistent behaviour of the ACF value: its absolute value gets closer to 1 as the points get closer to forming a line. However, the following rows show plots where there is clearly a strong relation (dependence) between the variables, but the ACF does not detect this characteristic since the relation is not linear, as illustrated in the short sketch below.
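The sketch below gives a small numerical illustration of this point under an assumed setting: if $Y = X^2$ with $X$ symmetric around zero, the two variables are perfectly (but nonlinearly) dependent while their correlation is essentially zero.
```{r, cache = TRUE}
# Sketch: strong nonlinear dependence with (near) zero correlation
set.seed(42)
x = rnorm(10^4)
y = x^2       # y is a deterministic (nonlinear) function of x
cor(x, y)     # close to 0: no *linear* dependence is detected
```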
Finally, it is worth noting that correlation does NOT imply causation. For example, if $\rho(t, t+h) \neq 0$, this does not imply that $X_t \to X_{t+h}$ is causal. More specifically, the presence (or absence) of causality cannot be detected through statistical tools alone, although some approximate metrics have been proposed to measure this concept [e.g. Granger causality, see @granger1969investigating].
For the following sections we will simplify the notations $\gamma_X(t, t+h)$ and $\rho_X(t, t+h)$ to be $\gamma(h)$ and $\rho(h)$ when there is no ambiguity (i.e. only one time series is considered and the ACF only depends on the time-lag $h$).
## Stationarity {#stationarity}
In this section we are going to introduce the concept of stationarity, one of the most important characteristics of time series data. First let us consider an example of **non-stationary processes**.
```{example, exampleNonStationary, name = "Non-Stationary Process"}
\begin{equation*}
X_t \sim \mathcal{N} \left(0, Y_t^2\right) \;\; \text{where $Y_t$ is unobserved and such that} \;\; Y_t \overset{iid}{\sim} \mathcal{N} \left(0, 1\right).
\end{equation*}
In this case, it is clear that estimating $\text{Var}(X_t)$ is difficult since the variance changes at every time point and only the single observation $X_t$ is informative about it; averaging over time does not help. So, in fact, $X_t^2$ is our best guess for $\text{Var}(X_t)$.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
On the other hand, let us consider an example of a **stationary process** for which averaging becomes meaningful.
```{example, exampleStationary, name = "Stationary Process"}
\begin{equation*}
X_t = \theta W_{t-1} + W_t \;\;\; \text{where} \;\;\; W_t \stackrel{iid}{\sim} \mathcal{N} \left(0, 1\right).
\end{equation*}
In this case, a natural estimator of $\text{Var}(X_t)$ is $\hat{\sigma}^2 = \frac{1}{T} \sum_{t = 1}^T X_t^2$. That is, averages are now meaningful for such a process.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
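A quick simulation sketch of why averaging is meaningful here (the value of $\theta$ is arbitrary): the estimator $\hat{\sigma}^2 = \frac{1}{T} \sum_{t = 1}^T X_t^2$ should be close to the true variance, which for this process is $\text{Var}(X_t) = 1 + \theta^2$.
```{r, cache = TRUE}
# Sketch: averaging over time recovers Var(X_t) for the stationary process above
set.seed(7)
theta = 0.5
n = 10^4                           # length of the simulated series (T)
w = rnorm(n + 1)                   # W_0, W_1, ..., W_T
x = theta * w[1:n] + w[2:(n + 1)]  # X_t = theta * W_{t-1} + W_t
mean(x^2)                          # estimate of Var(X_t)
1 + theta^2                        # true variance
```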
We formalize the above idea by introducing the concept of stationarity. There exist two forms of stationarity, which are defined below:
```{definition, StrongStationarity, name = "Strong Stationarity"}
The time series $X_{t}$ is strongly stationary if the joint probability distribution is invariant under a shift in time, i.e.
\begin{equation*}
\mathbb{P}(X_{t}\leq x_{0},\ldots,X_{t+k}\leq x_{k}) = \mathbb{P}(X_{t+h}\leq x_{0} ,\ldots,X_{t+h+k}\leq x_{k})
\end{equation*}
for any time shift $h$ and any $x_{0}, x_{1},\ldots,x_{k}$ belonging to the domain of $X_t,\ldots,X_{t+k}$ and $X_{t+h},\ldots,X_{t+h+k}$.
```
```{definition, WeakStationarity, name = "Weak Stationarity"}
The time series $(X_{t})_{t \in \mathbb{N}}$ is weakly stationary if the mean and autocovariance are finite and invariant under a shift in time, i.e.
\begin{equation*}
\begin{aligned}
\mathbb{E}\left[X_t\right] &= \mu < \infty,\\
\mathbb{E}\left[X_t^2\right] &= \mu_2 < \infty,\\
\text{Cov}(X_{t},X_{t+h})&= \text{Cov}(X_{t + k},X_{t+h + k}) = \gamma( h ).
\end{aligned}
\end{equation*}
for any time shift $h$ and any $t, k$. For convenience, we use the abbreviation "stationary" to indicate "weakly stationary" by default.
```
The stationarity of $X_{t}$ is important because it provides a framework in which averaging makes sense. The concept of averaging is essentially meaningless unless properties like mean and covariance are either fixed or evolve in a known manner.
```{exercise, StatImpAcvAcf, name = "Implication on the ACV and ACF"}
If a process is weakly or strongly stationary and $\text{Cov}(X_{t},X_{t+h})$ exists for all $h \in \mathbb{Z}$, then both the ACV and the ACF depend only on the lag between observations, i.e.
\begin{equation*}
\begin{aligned}
\gamma(t, t+h) &= \text{Cov}(X_{t},X_{t+h})= \text{Cov}(X_{t + k},X_{t+h + k}) = \gamma(t+k, t+h+k) = \gamma(h),\\
\rho(t, t+h) &= \text{Corr}(X_{t},X_{t+h})= \text{Corr}(X_{t + k},X_{t+h + k}) = \rho(t+k, t+h+k) = \rho(h).
\end{aligned}
\end{equation*}
```
```{exercise, RelationStationary, name = "Relation between Strong and Weak Stationarity"}
In general, neither type of stationarity implies the other one. However,
- If $X_{t}$ is Normal (Gaussian) with $\sigma^2 = \text{Var} (X_{t}) < \infty$, then weak stationarity implies strong stationarity.
- If $X_{t}$ is strongly stationary, $\mathbb{E}(X_t) < \infty$ and $\mathbb{E}(X_t^2) < \infty$, then $X_{t}$ is weakly stationary.
```
```{example, exampleRelationStat1, name = "Strong Stationarity does NOT imply Weak Stationarity"}
An iid Cauchy process is strongly but not weakly stationary as the mean of the process does not exist.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
```{example, exampleRelationStat2, name = "Weak Stationarity does NOT imply Strong Stationarity"}
Let $X_t \overset{iid}{\sim} \exp(1)$ (i.e. exponential distribution with $\lambda = 1$) and $Y_t \overset{iid}{\sim} \mathcal{N}(1,1)$. Then, let
\begin{equation*}
Z_t = \left\{
\begin{array}{cl}
X_t &\text{if } t \in \left\{2k | k \in \mathbb{N}\right\}\\
Y_t &\text{if } t \in \left\{2k + 1 | k \in \mathbb{N}\right\},
\end{array}
\right.
\end{equation*}
then $Z_t$ is weakly stationary (both distributions have mean 1 and variance 1, and the sequence is uncorrelated) but not strongly stationary since the marginal distribution changes with $t$.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
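A simulation sketch of this example: at every $t$ the process has mean 1 and variance 1 (so the first two moments are time invariant), yet the marginal distribution alternates between exponential and Gaussian, which is visible, for instance, through the proportion of negative values.
```{r, cache = TRUE}
# Sketch: Z_t has constant mean and variance but a time-varying distribution
set.seed(2023)
n = 10^4
z = numeric(n)
even = seq(2, n, by = 2)
odd  = seq(1, n, by = 2)
z[even] = rexp(length(even), rate = 1)          # Exp(1): mean 1, variance 1
z[odd]  = rnorm(length(odd), mean = 1, sd = 1)  # N(1, 1): mean 1, variance 1
c(mean(z[even]), mean(z[odd]))          # both means are close to 1
c(var(z[even]), var(z[odd]))            # both variances are close to 1
c(mean(z[even] < 0), mean(z[odd] < 0))  # 0 vs. about 0.16: the distributions differ
```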
```{exercise, StatProcesses, name = "Stationarity of Latent Time Series Processes"}
- (Weakly) Stationary: WN, QN, AR1
- Non-Stationary: DR, RW
(These latent processes are defined in the Basic Time Series Models section below.)
```
```{proof, proofAR1stationary, name="AR1 is weakly stationary"}
Consider an AR1 process defined as:
\begin{equation*}
X_t = \phi X_{t-1} + Z_t, \;\;\; Z_t \overset{iid}{\sim} \mathcal{N}(0,\nu^2),
\end{equation*}
where $\mid \phi \mid < 1$ and $\nu^2 < \infty$. Then we have
\[\begin{aligned}
{X_t} &= {\phi }{X_{t - 1}} + {Z_t} = \phi \left[ {\phi {X_{t - 2}} + {Z_{t - 1}}} \right] + {Z_t} = {\phi ^2}{X_{t - 2}} + \phi {Z_{t - 1}} + {Z_t} \\
& \; \vdots \\
&= {\phi ^k}{X_{t-k}} + \sum\limits_{j = 0}^{k - 1} {{\phi ^j}{Z_{t - j}}} .
\end{aligned} \]
By letting $k \to \infty$ (which is valid since $|\phi| < 1$ implies that the term $\phi^k X_{t-k}$ vanishes, assuming the process extends back in time, i.e. $t \in \mathbb{Z}$), we obtain
\begin{equation*}
\begin{aligned}
X_t = \lim_{k \to \infty} \left( \phi^k X_{t-k} + \sum\limits_{j = 0}^{k - 1} \phi^j Z_{t-j} \right) = \sum\limits_{j = 0}^{\infty} {{\phi ^j}{Z_{t - j}}}.
\end{aligned}
\end{equation*}
So we have
\begin{equation*}
\begin{aligned}
\mathbb{E}\left[X_t\right] &= \sum\limits_{j = 0}^{\infty} {{\phi ^j}{\mathbb{E} [Z_{t - j}]}} = 0, \\
\text{Var}\left(X_t\right) &= \text{Var}\left(\sum\limits_{j = 0}^{\infty} {{\phi ^j}{Z_{t - j}}}\right) = \sum\limits_{j = 0}^{\infty} {\phi^{2j}} \text{Var}\left(Z_{t-j}\right) = \nu^2 \sum\limits_{j = 0}^{\infty} {\phi^{2j}} = \frac{\nu^2}{1-\phi^2} < \infty.
\end{aligned}
\end{equation*}
Moreover, assuming for notational simplicity that $h \geq 1$, we obtain
\begin{equation*}
\begin{aligned}
\text{Cov}\left(X_t, X_{t+h}\right) &= \phi \text{Cov}\left(X_t, X_{t+h-1}\right) = \phi^2 \text{Cov}\left(X_t, X_{t+h-2}\right) = \ldots = \phi^h \text{Cov}(X_t, X_t).
\end{aligned}
\end{equation*}
In general, when $h \in \mathbb{Z}$ we obtain
\begin{equation*}
\begin{aligned}
\text{Cov}\left(X_t, X_{t+h}\right) & = \phi^{|h|} \text{Cov}(X_t, X_t) = \phi^{|h|} \frac{\nu^2}{1-\phi^2},
\end{aligned}
\end{equation*}
which is a function of the lag $h$ only. Therefore, this AR1 process is weakly stationary.
```
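The result above can be checked numerically. The following sketch (with arbitrary values of $\phi$ and $\nu$) compares the empirical variance and lag-one autocovariance of a long simulated AR1 path with their theoretical counterparts $\nu^2/(1-\phi^2)$ and $\phi \, \nu^2/(1-\phi^2)$.
```{r, cache = TRUE}
# Sketch: empirical vs theoretical variance and ACV of an AR1 process
set.seed(10)
phi = 0.8
nu  = 2
x = arima.sim(model = list(ar = phi), n = 10^5, sd = nu)
gamma0 = nu^2 / (1 - phi^2)  # theoretical Var(X_t) = gamma(0)
c(var(x), gamma0)            # empirical vs theoretical variance
c(acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2],
  phi * gamma0)              # empirical vs theoretical gamma(1)
```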
<br>
## Linear Processes
In this section we introduce the concept of linear processes. In fact, all the stationary models considered so far can be represented as linear processes.
```{definition, LinearProc, name = "Linear Processes"}
A stochastic process $(X_t)$ is said to be a linear process if it can be expressed as a linear combination of an iid Gaussian sequence (i.e. white noise process), i.e.:
\[{X_t} = \mu + \sum\limits_{j = - \infty }^\infty {{\psi _j}{W_{t - j}}} \]
where $W_t \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$ and $\sum\limits_{j = - \infty }^\infty {\left| {{\psi _j}} \right|} < \infty$.
```
Notice that the condition $\sum\limits_{j = - \infty }^\infty {\left| {{\psi _j}} \right|} < \infty$ is required in the definition of linear processes in order to ensure that the series has a limit and is related to the absolutely summable covariance structure, which is defined below.
```{definition, AbsSumCov, name = "Absolutely Summable Covariance Structure"}
A process $(X_t)$ is said to have an absolutely summable covariance structure if $\sum\limits_{h = - \infty }^\infty {\left| \gamma_X(h) \right|} < \infty$.
```
```{exercise, propertiesLinearProc, name = "Properties of Linear Processes"}
1. All linear processes are stationary since
\[\begin{aligned}
\mathbb{E}[X_t] &= \mu, \\
\gamma(h) &= \sigma^2\sum\limits_{j = - \infty }^\infty {{\psi _j}{\psi _{j+h}}}.
\end{aligned}\]
2. All linear processes have absolutely summable covariance structures.
```
```{proof, ACVofLP, name = "ACV of Linear Processes"}
\begin{align*}
\gamma(h) &= \text{Cov}(X_t, X_{t+h}) = \text{Cov}(\mu+\sum_{j=-\infty}^{\infty} \psi_j W_{t-j}, \mu+\sum_{j=-\infty}^{\infty} \psi_j W_{t+h-j})\\
&= \text{Cov}(\sum_{j=-\infty}^{\infty} \psi_j W_{t-j}, \sum_{j=-\infty}^{\infty} \psi_j W_{t-(j-h)}) \\
&= \text{Cov}(\sum_{j=-\infty}^{\infty} \psi_j W_{t-j}, \sum_{j=-\infty}^{\infty} \psi_{j+h} W_{t-j}) \\
&= \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h} \text{Cov}(W_{t-j}, W_{t-j}) \\
&= \sigma^2\sum\limits_{j = - \infty }^\infty {{\psi _j}{\psi _{j+h}}}.
\end{align*}
```
<br>
```{proof, AbsSumCovLP, name = "All linear processes have absolutely summable covariance structures."}
\begin{align*}
\sum_{h=-\infty}^{\infty} \mid \gamma(h) \mid &= \sum_{h=-\infty}^{\infty} \sigma^2 \mid \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h} \mid \\
&\leq \sigma^2 \sum_{h=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \mid \psi_j \psi_{j+h} \mid \\
&= \sigma^2 \sum_{j=-\infty}^{\infty} \sum_{h=-\infty}^{\infty} \mid \psi_j \mid \cdot \mid \psi_{j+h} \mid \\
&= \sigma^2 \sum_{j=-\infty}^{\infty} \mid \psi_j \mid \sum_{h=-\infty}^{\infty} \mid \psi_{j+h} \mid \\
&= \sigma^2 \big( \sum_{j=-\infty}^{\infty} \mid \psi_j \mid \big)^2 < \infty
\end{align*}
So with the assumption that $\sum_{j=-\infty}^{\infty} \mid \psi_j \mid < \infty$, we obtain that all linear processes have absolutely summable covariance structures. Notice that here we have shown that the condition $\sum\limits_{j = - \infty }^\infty {\left| {{\psi _j}} \right|} < \infty$ is actually stronger than $\sum\limits_{h = - \infty }^\infty {\left| \gamma(h) \right|} < \infty$.
```
<br>
```{example, AR1isLP, name = "AR1 is a linear process"}
In the proof above that the AR1 is weakly stationary, we showed that an AR1 process $X_t = \phi X_{t-1} + Z_t$, with $Z_t \overset{iid}{\sim} \mathcal{N}(0,\nu^2)$ and $|\phi| < 1$, can be represented as
\begin{align*}
X_t = \sum\limits_{j = 0}^{\infty} {{\phi ^j}{Z_{t - j}}}.
\end{align*}
Therefore, since $\psi_j = \phi^j$ for $j \geq 0$ (and $\psi_j = 0$ for $j < 0$) satisfies $\sum_{j} |\psi_j| < \infty$ when $|\phi| < 1$, the AR1 is a linear process.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
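Since the AR1 weights are $\psi_j = \phi^j$ (for $j \geq 0$), we can sketch a numerical check that the linear-process formula $\gamma(h) = \sigma^2 \sum_{j} \psi_j \psi_{j+h}$ reproduces the AR1 autocovariance $\phi^{|h|} \nu^2 / (1 - \phi^2)$ derived earlier; the parameter values, the lag and the truncation point of the infinite sum are arbitrary.
```{r, cache = TRUE}
# Sketch: linear-process ACV formula applied to the AR1 weights psi_j = phi^j
phi = 0.6
nu2 = 1.5              # innovation variance (nu^2)
J   = 200              # truncation point for the infinite sum
psi = phi^(0:J)        # psi_0, ..., psi_J (psi_j = 0 for j < 0)
h   = 3
# gamma(h) = sigma^2 * sum_j psi_j * psi_{j+h}, truncated at J
gamma_lp  = nu2 * sum(psi[1:(J + 1 - h)] * psi[(1 + h):(J + 1)])
gamma_ar1 = phi^h * nu2 / (1 - phi^2)
c(gamma_lp, gamma_ar1)  # the two values essentially coincide
```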
## Basic Time Series Models
We first introduce some latent time series models that are commonly used, especially in the calibration procedure of inertial sensors.
```{definition, WN, name = "Gaussian White Noise"}
The Gaussian White Noise (WN) process with parameter $\sigma^2 \in \mathbb{R}^+$ is defined as
\begin{equation*}
X_t \overset{iid}{\sim} \mathcal{N}\left(0, \sigma^2 \right)
\end{equation*}
where "iid" stands for "independent and identically distributed".
```
```{definition, QN, name = "Quantization Noise"}
The Quantization Noise (QN) process with parameter $Q^2 \in \mathbb{R}^+$ is a process with Power Spectral Density (PSD) of the form
\begin{equation*}
S_{X}(f) = \frac{4 Q^2}{\Delta t} \sin^2 \left( \pi f \Delta t \right), \;\; |f| \leq \frac{1}{2 \Delta t},
\end{equation*}
where $\Delta t$ denotes the sampling interval.
```
```{definition, DR, name = "Drift"}
The Drift (DR) process with parameter $\omega \in \Omega$, where $\Omega$ is either $\mathbb{R}^+$ or $\mathbb{R}^-$, is defined as
\begin{equation*}
X_t = \omega t.
\end{equation*}
```
```{definition, RW, name = "Random Walk"}
The Random Walk (RW) process with parameter $\gamma^2 \in \mathbb{R}^+$ is defined as
\begin{equation*}
X_t = X_{t-1} + \epsilon_t \;\; \text{where}\;\; \epsilon_t \overset{iid}{\sim} \mathcal{N}\left(0, \gamma^2 \right)\;\; \text{and}\;\; X_0 = 0.
\end{equation*}
```
```{definition, AR, name = "Auto-Regressive"}
The Auto-Regressive process of Order 1 (AR1) with parameters $\phi \in (-1, +1)$ and $\upsilon^2 \in \mathbb{R}^+$ is defined as
\begin{equation*}
X_t = \phi X_{t-1} + Z_t, \;\;\; Z_t \overset{iid}{\sim} \mathcal{N}(0,\upsilon^2).
\end{equation*}
```
```{definition, GM, name = "Gauss Markov"}
The Gauss Markov process of Order 1 (GM) with parameters $\beta \in \mathbb{R}$ and $\sigma_G^2 \in \mathbb{R}^+$ is defined as
\begin{equation*}
X_t = \exp(-\beta \Delta t) X_{t-1} + Z_t, \;\;\;
Z_t \overset{iid}{\sim} \mathcal{N}(0,\sigma^2_{G}(1-\exp(-2\beta\Delta t)))
\end{equation*}
where $\Delta t$ denotes the time between $X_t$ and $X_{t-1}$.
```
```{exercise, GMandAR1, name = "GM and AR1"}
A GM process is a one-to-one reparametrization of an AR1 process, with $\phi = \exp(-\beta \Delta t)$ and $\upsilon^2 = \sigma_G^2\left(1-\exp(-2\beta\Delta t)\right)$. In the following, we will only discuss AR1 processes but all results remain valid for GM processes.
```
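To get a feel for these latent models, the following sketch simulates short realizations of a WN, RW, DR and AR1 process using base R (packages such as `simts` also provide dedicated generators); all parameter values are arbitrary.
```{r, fig.height = 5, fig.width = 6.5, fig.align='center', cache = TRUE}
# Sketch: base R simulation of some latent processes (arbitrary parameters)
set.seed(5)
n = 500
wn  = rnorm(n, mean = 0, sd = 1)                       # WN with sigma^2 = 1
rw  = cumsum(rnorm(n, mean = 0, sd = 0.5))             # RW with gamma^2 = 0.25
dr  = 0.01 * (1:n)                                     # DR with omega = 0.01
ar1 = arima.sim(model = list(ar = 0.9), n = n, sd = 1) # AR1 with phi = 0.9, upsilon^2 = 1
par(mfrow = c(2, 2))
plot.ts(wn,  main = "WN");  plot.ts(rw,  main = "RW")
plot.ts(dr,  main = "DR");  plot.ts(ar1, main = "AR1")
```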
With the above latent time series processes defined, we now introduce the composite stochastic process, which is widely used in the stochastic calibration of inertial sensors.
```{definition, CompStocProc, name = "Composite Stochastic Process"}
A composite stochastic process is a sum of latent processes. We implicitly assume that these latent processes are independent.
```
```{example, exampleCompStocProc, name = "2*AR1 + WN"}
The composite stochastic process of "2*AR1 + WN" is given as
\begin{align}
Y_t &= \phi_1 Y_{t-1} + Z_t, \;\;\; Z_t \overset{iid}{\sim} \mathcal{N}(0,\upsilon_1^2),\\
W_t &= \phi_2 W_{t-1} + U_t, \;\;\; U_t \overset{iid}{\sim} \mathcal{N}(0,\upsilon_2^2),\\
Q_t &\overset{iid}{\sim} \mathcal{N}(0,\sigma^2),\\
X_t &= Y_t + W_t + Q_t,
\end{align}
where $Y_t$, $W_t$ and $Q_t$ are independent and only $X_t$ is observed.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
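A sketch of how such a composite process could be simulated in base R (all parameter values are arbitrary; in practice only the sum $X_t$ would be observed):
```{r, fig.height = 5, fig.width = 6.5, fig.align='center', cache = TRUE}
# Sketch: simulating the composite process X_t = Y_t + W_t + Q_t
set.seed(99)
n = 1000
y = arima.sim(model = list(ar = 0.95), n = n, sd = 0.5)  # first AR1 (phi_1 = 0.95)
w = arima.sim(model = list(ar = 0.30), n = n, sd = 1.0)  # second AR1 (phi_2 = 0.30)
q = rnorm(n, mean = 0, sd = 2)                           # WN component
x = y + w + q                                            # only x is "observed"
plot.ts(x, main = "Simulated 2*AR1 + WN", ylab = "X_t")
```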
## Fundamental Representations of Time Series
We now summarize the fundamental representations of time series. If two processes have the same fundamental representation, then these two processes are the same. The two most commonly used fundamental representations of time series are:
- ACV and ACF;
- Power Spectral Density (PSD).
```{exercise, FundaRepreACVACF, name = "ACV and ACF as fundamental representation"}
If we consider a zero mean normally distributed process, it is clear that its joint distribution is fully characterized by the autocovariances $\mathbb{E}[X_t X_{t+h}]$ since the joint probability density only depends on these covariances. Once we know the autocovariances we know everything there is to know about the process and therefore: if two processes have the same autocovariance function, then they are the same process.
```
```{exercise, FundaReprePSD, name = "PSD as fundamental representation"}
The PSD is defined as
\begin{equation*}
S_X(f) = \int_{- \infty}^{\infty} \gamma_{X}(h)\,e^{-ifh}dh,
\end{equation*}
where $f$ is a frequency. Hence, the PSD is the Fourier Transform (FT) of the autocovariance function and describes how the variance of a time series is distributed over frequencies.
Given the definition of the PSD, as for the autocovariance function, once we know the PSD we know everything there is to know about the process and therefore: if two processes have the same PSD, then they are the same process.
```
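As a sketch of the link between the two representations, the PSD of an AR1 process can be approximated by summing its autocovariance sequence over a truncated range of integer lags (the discrete-time analogue of the integral above). Under this convention, and with arbitrary parameter values, the result should match the closed form $S_X(f) = \nu^2 / (1 - 2\phi\cos(f) + \phi^2)$.
```{r, cache = TRUE}
# Sketch: numerically evaluating the (discrete-time) PSD of an AR1 process
phi = 0.5; nu2 = 1
gamma0  = nu2 / (1 - phi^2)
h       = -200:200                        # truncated lag range
gamma_h = phi^abs(h) * gamma0             # AR1 autocovariance sequence
f = 1                                     # an arbitrary (angular) frequency
S_num = sum(gamma_h * exp(-1i * f * h))   # S_X(f) = sum_h gamma(h) exp(-i f h)
S_closed = nu2 / (1 - 2 * phi * cos(f) + phi^2)
c(Re(S_num), S_closed)                    # imaginary part is ~0 by symmetry of gamma(h)
```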
## Estimation Problems with Time Series
Estimation in the context of time series is not as straightforward as in the iid case. For example, let us consider the easiest case of estimation: the sample mean of a stationary time series.
```{example, EstSampleMean, name = "Estimation with Sample Mean"}
Let $(X_t)$ be a stationary time series, so we have that $\mathbb{E}[X_t] = \mu$ and the value of $\mu$ can be estimated by the sample mean, i.e.
\begin{equation*}
\bar{X} = \frac{1}{T} \sum_{t = 1}^T X_t.
\end{equation*}
Using the properties of a stationary process, we obtain
\begin{equation*}
\text{Var} \left(\bar{X}\right) = \frac{\gamma(0)}{T} \sum_{h = -T}^{T} \left(1 - \frac{|h|}{T}\right) \rho(h)
\end{equation*}
since
\begin{align*}
\text{Var}(\bar{X}) &= \frac{1}{T^2} \text{Var}(\sum_{t=1}^T X_t) = \frac{1}{T^2} (\sum_{t=1}^T \text{Var}(X_t) + 2\underset{1 \leq t<s\leq T}{\sum\sum} \text{Cov}(X_t, X_s)) \\
&= \frac{1}{T^2} (T\gamma(0) + 2(T-1)\gamma(1) + 2(T-2)\gamma(2) + \ldots + 2\gamma(T-1)) \\
&= \frac{\gamma(0)}{T^2} (T+ 2(T-1)\rho(1) + \ldots + 2\rho(T-1)) \\
&= \frac{\gamma(0)}{T} + \frac{2\gamma(0)}{T} \sum_{h=1}^{T-1} (1-\frac{h}{T})\rho(h) \\
&= \frac{\gamma(0)}{T} \sum_{h = -(T-1)}^{T-1} \left(1 - \frac{|h|}{T}\right) \rho(h) \\
&= \frac{\gamma(0)}{T} \sum_{h = -T}^{T} \left(1 - \frac{|h|}{T}\right) \rho(h).
\end{align*}
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
```{example, EstSampleMeaninAR1, name = "Estimation with Sample Mean in AR1"}
As in the previous example, let us consider a stationary AR1 process, i.e.
\begin{equation*}
X_t = \phi X_{t-1} + Z_t, \;\;\; \text{where} \;\;\; |\phi| < 1 \;\;\; \text{and} \;\;\; Z_t \overset{iid}{\sim} \mathcal{N} \left(0, \nu^2\right).
\end{equation*}
We have obtained before that for an AR1, $\gamma(h) = \phi^{|h|} \nu^2 \left(1 - \phi^2\right)^{-1}$. Therefore, we obtain (after some computations):
\begin{equation*}
\text{Var} \left( {\bar X} \right) = \frac{\nu^2 \left( T - 2\phi - T \phi^2 + 2 \phi^{T + 1}\right)}{T^2\left(1-\phi^2\right)\left(1-\phi\right)^2}.
\end{equation*}
Unfortunately, deriving such an exact formula is often difficult when considering more complex models. Therefore, asymptotic approximations are often employed to simplify the calculation. For example, in this AR1 case we have
\begin{equation*}
\lim_{T \to \infty } \; T \text{Var} \left( {\bar X} \right) = \frac{\nu^2}{\left(1-\phi\right)^2},
\end{equation*}
providing the following approximate formula
\begin{equation*}
\text{Var} \left( {\bar X} \right) \approx \frac{\nu^2}{T \left(1-\phi\right)^2}.
\end{equation*}
Alternatively, simulation methods can also be employed. For example, one could compute $\text{Var} \left( {\bar X} \right)$ as follows:
Step 1: Simulate under the assumed model, i.e. $X_t^* \sim F_{\theta_0}$, where $F_{\theta_0}$ denotes the true model (in this case an AR1 process).
Step 2: Compute ${\bar X^*}$ (i.e. average based on $(X_t^*)$).
Step 3: Repeat Steps 1 and 2 $B$ times.
Step 4: Compute the empirical variance of ${\bar X^*}$ (based on the $B$ independent replications).
The above procedure is known as the Monte-Carlo method (in this case it is actually a Monte-Carlo integration) and is closely related to the concept of the parametric bootstrap (see @efron1994introduction), which is a very popular tool in statistics.
<div style="text-align: right"> $\LARGE{\bullet}$ </div>
```
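A sketch of this Monte-Carlo procedure in the AR1 case (the values of $\phi$, $\nu$, $T$ and $B$ are arbitrary), comparing the simulated variance of $\bar{X}$ with the exact and asymptotic formulas above:
```{r, cache = TRUE}
# Sketch: Monte-Carlo approximation of Var(X-bar) for an AR1 process
set.seed(1234)
phi = 0.5; nu = 1; n = 200; B = 5000
# Steps 1-3: simulate B paths under the assumed AR1 model and store the sample means
xbar = replicate(B, mean(arima.sim(model = list(ar = phi), n = n, sd = nu)))
# Step 4: empirical variance of the simulated sample means
var_mc     = var(xbar)
var_exact  = nu^2 * (n - 2 * phi - n * phi^2 + 2 * phi^(n + 1)) /
  (n^2 * (1 - phi^2) * (1 - phi)^2)       # exact formula above (T = n)
var_approx = nu^2 / (n * (1 - phi)^2)     # asymptotic approximation
c(var_mc, var_exact, var_approx)
```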
Now we define the classical estimators of $\gamma(h)$ and $\rho(h)$ for AutoCovariance and AutoCorrelation functions.
```{definition, sampleACV, name = "Sample AutoCovariance Function"}
The sample autocovariance function is defined as
\begin{equation*}
\hat{\gamma}(h) = \frac{1}{T} \sum_{t = 1}^{T-h} \left(X_{t} - \bar{X}\right) \left(X_{t+h} - \bar{X}\right)
\end{equation*}
with $\hat{\gamma}(h) = \hat{\gamma}(-h)$ for $h = 0, 1, ..., k$, where $k$ is a fixed integer.
```
```{definition, sampleACF, name = "Sample AutoCorrelation Function"}
The sample autocorrelation function is defined as
\begin{equation*}
\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}
\end{equation*}
with $\hat{\rho}(h) = \hat{\rho}(-h)$ for $h = 0, 1, ..., k$, where $k$ is a fixed integer.
```
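As a sketch, these estimators can be coded directly and compared with the output of base R's `acf()` function, which by default removes the sample mean and uses the same $1/T$ scaling; the simulated series and the maximum lag are arbitrary.
```{r, cache = TRUE}
# Sketch: computing the sample ACV and ACF by hand and comparing with acf()
set.seed(8)
x = arima.sim(model = list(ar = 0.6), n = 300)
n = length(x)
xbar = mean(x)
sample_acv = function(h) sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
gamma_hat = sapply(0:5, sample_acv)   # estimated gamma(0), ..., gamma(5)
rho_hat   = gamma_hat / gamma_hat[1]  # estimated rho(h) = gamma(h) / gamma(0)
rbind(rho_hat, as.vector(acf(x, lag.max = 5, plot = FALSE)$acf))
```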
We will discuss more about the properties of these estimators in the next chapter.