AR models

This commit is contained in:
jannisp 2021-08-21 17:14:22 +02:00
parent 443ea83418
commit c726ca13ab
2 changed files with 183 additions and 0 deletions

BIN
img/checkresiduals.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 54 KiB

183
main.tex
View file

@ -544,6 +544,189 @@ pacf(wave, ylim=c(1,1))
\label{fig:pacf}
\end{figure}
\section{Basics of modelling}
\subsection{White noise}
\begin{quote}
A time series $(W_1, W_2,..., W_n)$ is a \textbf{White Noise} series if the random variables $W_1 , W_2,...$ are i.i.d with mean zero.
\end{quote}
This implies that all $W_t$ have the same variance $\sigma_W^2$ and
$$Cov(W_i,W_j) = 0 \, \forall \, i \neq j$$
Thus, there is no autocorrelation either: $\rho_k = 0 \, \forall \, k \neq 0$. \\
\vspace{.2cm}
If in addition, the variables also follow a Gaussian distribution, i.e. $W_t \sim N(0, \sigma_W^2)$, the series is called \textbf{Gaussian White Noise}. The term White Noise is due to the analogy to white light (all wavelengths are equally distributed).
\subsection{Autoregressive models (AR)}
In an $AR(p)$ process, the random variable $X_t$ depends on an autoregressive linear combination of the preceding $X_{t-1},..., X_{t-p}$, plus a „completely independent“ term called innovation $E_t$.
$$X_t = \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} + E_t$$
Here, $p$ is called the order of the AR model. Hence, we abbreviate by $AR(p)$. An alternative notation is with the backshift operator $B$:
$$(1-\alpha_1 B - \alpha_2 B^2 - ... \alpha_p B^p) X_t = E_t \Leftrightarrow \Phi(B)X_t = E_t$$
Here, $\Phi(B)$ is called the characteristic polynomial of the $AR(p)$. It determines most of the relevant properties of the process.
\subsubsection{AR(1)-Model}\label{ar-1}
$$X_t = \alpha_1 X_{t-1} + E_t$$
where $E_t$ is i.i.d. with $E[E_t] = 0$ and $Var(E_t) = \sigma_E^2$. We also require that $E_t$ is independent of $X_s, s<t$ \\
\vspace{.2cm}
Under these conditions, $E_t$ is a causal White Noise process, or an innovation. Be aware that this is stronger than the i.i.d. requirement: not every i.i.d. process is an innovation and that property is absolutely central to $AR(p)$-modelling.
\subsubsection{AR(p)-Models and Stationarity}
$AR(p)$-models must only be fitted to stationary time series. Any potential trends and/or seasonal effects need to be removed first. We will also make sure that the processes are stationary. \\
\vspace{.2cm}
\textbf{Conditions}
Any stationary $AR(p)$-process meets
\begin{itemize}
\item $E[X_t] = \mu = 0$
\item $1-\alpha_1 z + \alpha_2 z^2 + ... + \alpha_p z^p = 0$ (verify with \verb|polyroot()| in R)
\end{itemize}
\subsection{Yule-Walker equations}
We observe that there exists a linear equation system built up from the $AR(p)$-coefficients and the CF-coefficients of up to lag $p$. \\
\vspace{.2cm}
We can use these equations for fitting an $AR(p)$-model:
\begin{enumerate}
\item Estimate the ACF from a time series
\item Plug-in the estimates into the Yule-Walker-Equations
\item The solution are the $AR(p)$-coefficients
\end{enumerate}
\subsection{Fitting AR(p)-models}
This involves 3 crucial steps:
\begin{enumerate}
\item Model Identification
\begin{itemize}
\item is an AR process suitable, and what is $p$?
\item will be based on ACF/PACF-Analysis
\end{itemize}
\item Parameter Estimation
\begin{itemize}
\item Regression approach
\item Yule-Walker-Equations
\item and more (MLE, Burg-Algorithm)
\end{itemize}
\item Residual Analysis
\end{enumerate}
\subsubsection{Model identification}
\begin{itemize}
\item $AR(p)$ processes are stationary
\item For all AR(p) processes, the ACF decays exponentially quickly, or is an exponentially damped sinusoid.
\item For all $AR(p)$ processes, the PACF is equal to zero for all lags $k > p$. The behavior before lag $p$ can be arbitrary.
\end{itemize}
If what we observe is fundamentally different from the above, it is unlikely that the series was generated from an $AR(p)$-process. We thus need other models, maybe more sophisticated ones.
\subsubsection{Parameter estimation}
Observed time series are rarely centered. Then, it is inappropriate to fit a pure $AR(p)$ process. All R routines by default assume the shifted process $Y_t = m + X_t$. Thus, we face the problem:
$$(Y_t - m) = \alpha_1(Y_{t-1} - m) + ... + \alpha_p(Y_{t-p} - m) + E_t$$
The goal is to estimate the global mean m , the AR-coefficients $\alpha_1 ,..., \alpha_p$, and some parameters defining the distribution of the innovation $E_t$. We usually assume a Gaussian, hence this is $\sigma_E^2$.\\
\vspace{.2cm}
We will discuss 4 methods for estimating the parameters:\\
\vspace{.2cm}
\textbf{OLS Estimation} \\
If we rethink the previously stated problem, we recognize a multiple linear regression problem without
intercept on the centered observations. What we do is:
\begin{enumerate}
\item Estimate $\hat{m} = \bar{y}$ and $x_t = y_t - m$
\item Run a regression without intercept on $x_t$ to obtain $\hat{\alpha_1},\dots,\hat{\alpha_p}$
\item For $\hat{\sigma_E^2}$, take the residual standard error from the output
\end{enumerate}
\vspace{.2cm}
\textbf{Burg's algorithm} \\
While OLS works, the first $p$ instances are never evaluated as responses. This is cured by Burgs algorithm, which uses the property of time-reversal in stochastic processes. We thus evaluate the RSS of forward and backward prediction errors:
$$\sum_{t=p+1}^n \bigg[\bigg(X_t - \sum_{k=1}^p \alpha_k X_{t-k}\bigg)^2 + \bigg(X_{t-p} - \sum_{k=1}^p \alpha_k X_{t-p+k}\bigg)^2 \bigg]$$
In contrast to OLS, there is no explicit solution and numerical optimization is required. This is done with a recursive method called the Durbin-Levison algorithm (implemented in R).
\begin{lstlisting}[language=R]
f.burg <- ar.burg(llynx, aic=F, order.max=2)
\end{lstlisting}
\vspace{.2cm}
\textbf{Yule-Walker Equations} \\
The Yule-Walker-Equations yield a LES that connects the true ACF with the true AR-model parameters. We plug-in the estimated ACF coefficients:
$$\hat{\rho}(k) = \hat{\alpha_k}\hat{\rho}(k-1) + \dots + \hat{\alpha_p}\hat{\rho}(k-p), \, \mathrm{for} \, k=1,\dots,p$$
and solve the LES to obtain the AR-parameter estimates.\\
\vspace{.2cm}
In R we can use \verb|ar.yw()| \\
\vspace{.2cm}
\textbf{Maximum-likelihood-estimation} \\
Idea: Determine the parameters such that, given the observed time series $(y_1 ,\dots, y_n)$, the resulting model is the most plausible (i.e. the most likely) one. \\
This requires the choice of a probability model for the time series. By assuming Gaussian innovations, $E_t \sim N (0,\sigma_E^2)$ , any $AR(p)$ process has a multivariate normal distribution:
$$Y = (Y_1,\dots,Y_n) \sim N(m \cdot \vec{1},V)$$
with $V$ depending on $\vec{\alpha},\sigma_E^2$ \\
MLE then provides simultaneous estimates by optimizing:
$$L(\alpha,m,\sigma_E^2) \propto \exp \bigg( \sum_{t=1}^n(x_t - \hat{x_t}) \bigg)$$
\begin{lstlisting}[language=R]
> f.ar.mle
Call: arima(x = log(lynx), order = c(2, 0, 0))
\end{lstlisting}
\vspace{.2cm}
\textbf{Some remarks} \\
\begin{itemize}
\item All 4 estimation methods are asymptotically equivalent and even on finite samples, the differences are usually small.
\item All 4 estimation methods are non-robust against outliers and perform best on data that are approximately Gaussian.
\item Function \verb|arima()| provides standard errors for $\hat{m}; \hat{\alpha}_1 ,\dots, \hat{\alpha}_p$ so that statements about significance become feasible and confidence intervals for the parameters can be built.
\item \verb|ar.ols()|, \verb|ar.yw()| and \verb|ar.burg()| allow for convenient choice of the optimal model order $p$ using the AIC criterion. Among these methods, \verb|ar.burg()| is usually preferred.
\end{itemize}
\subsection{Model diagnostics}
\subsubsection{Residual analysis}
"residuals" = "estimated innovations"
$$\hat{E_t} = (y_t - \hat{m}) - (\hat{\alpha_1}(y_{t-1} - \hat{m}) - \dots - \hat{\alpha}_p(y_{t-1} - \hat{m}))$$
With assumptions as in Chapter \ref{ar-1} \\
\vspace{.2cm}
We can check these, using (in R: \verb|tsdisplay(resid(fit))|)
\begin{itemize}
\item Time-series plot of $\hat{E}_t$
\item ACF/PACF-plot of $\hat{E}_t$
\item QQ-plot of $\hat{E}_t$
\end{itemize}
The time-series should look like white-noise \\
\vspace{.2cm}
\textbf{Alternative} \\
Using \verb|checkresiduals()|: \\
A convenient alternative for residual analysis is this function from \verb|library(forecast)|. It only works correctly when fitting with \verb|arima()|, though.
\begin{lstlisting}[language=R]
> f.arima <- arima(log(lynx), c(11,0,0))
> checkresiduals(f.arima)
Ljung-Box test
data: Residuals from ARIMA(11,0,0) with non-zero mean
Q* = 4.7344, df = 3, p-value = 0.1923
Model df: 12. Total lags used: 15
\end{lstlisting}
The function carries out a Ljung-Box test to check whether residuals are still correlated. It also provides a graphical output:
\begin{figure}[H]
\centering
\includegraphics[width=.25\textwidth]{checkresiduals.png}
\caption{Example output from above code}
\label{fig:checkresiduals}
\end{figure}
\subsubsection{Diagsnostic by simulation}
As a last check before a model is called appropriate, simulating from the estimated coefficients and visually inspecting the resulting series (without any prejudices) to the original one can be beneficial.
\begin{itemize}
\item The simulated series should "look like" the original. If this is not the case, the model failed to capture (some of) the properties in the original data.
\item A larger or more sophisticated model may be necessary in cases where simulation does not recapture the features in the original data.
\end{itemize}
\subsection{Moving average models (MA)}
Whereas for $AR(p)$-models, the current observation of a series is written as a linear combination of its own past, $MA(q)$-models can be seen as an extension of the "pure" process
$$X_t = E_t$$
in the sense that the last q innovation terms $E_{t-1} , E_{t-2} ,...$ are included, too. We call this a moving average model:
$$X_t = E_t + \beta_1 E_{t-1} + \beta_2 E_{t-2} + \dots + \beta_q E_{t-q}$$
This is a time series process that is stationary, but not i.i.d. In many aspects, $MA(q)$ models are complementary to $AR(p)$.
\scriptsize