Compare commits

...

5 commits

Author   SHA1        Message                      Date
jannisp  1a87ddc913  Start ARMA                   2021-08-23 09:35:12 +02:00
jannisp  f2900a1fb7  Changes for new deployment   2021-08-23 09:34:46 +02:00
jannisp  e0c5fad908  MA-models                    2021-08-22 16:16:28 +02:00
jannisp  c726ca13ab  AR models                    2021-08-21 17:14:22 +02:00
jannisp  443ea83418  Finish autocorrelation       2021-08-12 14:25:56 +02:00
4 changed files with 344 additions and 4 deletions

Jenkinsfile (vendored): 2 changes

@@ -12,6 +12,6 @@ node {
 }
 stage('Publish PDF') {
-sh 'scp -i /root/.ssh/id_rsa ats-zf.pdf jannis@192.168.178.45:/var/www/html/download/latex-previews/ats-zf.pdf'
+sh 'scp -i /root/.ssh/id_rsa ats-zf.pdf thisfro@192.168.178.45:/opt/containers/apache2/html/download/latex-previews/ats-zf.pdf'
 }
 }

BIN img/checkresiduals.png (new binary file, 54 KiB; not shown)

BIN img/pacf.png (new binary file, 18 KiB; not shown)

main.tex: 346 changes

@@ -180,7 +180,7 @@ mathematically formulated by strict stationarity.
\end{tabular}
\subsubsection{Weak} \label{weak-stationarity}
-It is impossible to "prove" the theoretical concept of stationarity from data. We can only search for evidence in favor or against it. \\
+It is impossible to «prove» the theoretical concept of stationarity from data. We can only search for evidence in favor or against it. \\
\vspace{0.1cm}
However, with strict stationarity, even finding evidence only is too difficult. We thus resort to the concept of weak stationarity.
@@ -522,6 +522,346 @@ var.ts <- 1/n^2*acf(b,lag=0,type="cov")$acf[1]*(n+2*sum(((n-1):(n-10))*acf(b,10)
mean(b) + c(-1.96,1.96)*sqrt(var.ts)
\end{lstlisting}
\subsection{Partial autocorrelation (PACF)}
The $k$-th partial autocorrelation $\pi_k$ is defined as the correlation between $X_{t+k}$ and $X_t$, given all the values in between.
$$\pi_k = Cor(X_{t+k},X_t \mid X_{t+1} = x_{t+1},\dots,X_{t+k-1} = x_{t+k-1})$$
\begin{itemize}
\item Given a time series $X_t$, the partial autocorrelation of lag $k$ is the autocorrelation between $X_t$ and $X_{t+k}$ with the linear dependence of $X_{t+1}$ through $X_{t+k-1}$ removed.
\item One can draw an analogy to regression. The ACF measures the «simple» dependence between $X_t$ and $X_{t+k}$, whereas the PACF measures that dependence in a «multiple» fashion.\footnote{See e.g. \href{https://n.ethz.ch/~jannisp/download/Mathematik-IV-Statistik/zf-statistik.pdf}{\textit{Mathematik IV}}}
\end{itemize}
$$\pi_1 = \rho_1$$
$$\pi_2 = \frac{\rho_2 - \rho_1^2}{1-\rho_1^2}$$
For $AR(1)$ models, we have $\pi_2 = 0$, because $\rho_2 = \rho_1^2$, i.e. there is no conditional relation between $X_t$ and $X_{t+2}$ given $X_{t+1}$.
\begin{lstlisting}[language=R]
pacf(wave, ylim=c(-1,1))
\end{lstlisting}
\begin{figure}[H]
\centering
\includegraphics[width=.25\textwidth]{pacf.png}
\caption{PACF for wave tank}
\label{fig:pacf}
\end{figure}
\section{Basics of modelling}
\subsection{White noise}
\begin{quote}
A time series $(W_1, W_2,..., W_n)$ is a \textbf{White Noise} series if the random variables $W_1 , W_2,...$ are i.i.d. with mean zero.
\end{quote}
This implies that all $W_t$ have the same variance $\sigma_W^2$ and
$$Cov(W_i,W_j) = 0 \, \forall \, i \neq j$$
Thus, there is no autocorrelation either: $\rho_k = 0 \, \forall \, k \neq 0$. \\
\vspace{.2cm}
If in addition, the variables also follow a Gaussian distribution, i.e. $W_t \sim N(0, \sigma_W^2)$, the series is called \textbf{Gaussian White Noise}. The term White Noise is due to the analogy to white light (all wavelengths are equally distributed).
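A minimal sketch (not from the script; the length and seed are arbitrary choices) of simulating a Gaussian White Noise series and checking its ACF:
\begin{lstlisting}[language=R]
# Gaussian white noise: 200 i.i.d. N(0,1) variables
set.seed(1)
W <- rnorm(200)
plot(ts(W))   # no visible structure
acf(W)        # sample ACF approx. 0 for all lags k > 0
\end{lstlisting}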
\subsection{Autoregressive models (AR)}
In an $AR(p)$ process, the random variable $X_t$ depends on an autoregressive linear combination of the preceding $X_{t-1},..., X_{t-p}$, plus a «completely independent» term called innovation $E_t$.
$$X_t = \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} + E_t$$
Here, $p$ is called the order of the AR model. Hence, we abbreviate by $AR(p)$. An alternative notation is with the backshift operator $B$:
$$(1-\alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p) X_t = E_t \Leftrightarrow \Phi(B)X_t = E_t$$
Here, $\Phi(B)$ is called the characteristic polynomial of the $AR(p)$. It determines most of the relevant properties of the process.
\subsubsection{AR(1)-Model}\label{ar-1}
$$X_t = \alpha_1 X_{t-1} + E_t$$
where $E_t$ is i.i.d. with $E[E_t] = 0$ and $Var(E_t) = \sigma_E^2$. We also require that $E_t$ is independent of $X_s, s<t$ \\
\vspace{.2cm}
Under these conditions, $E_t$ is a causal White Noise process, or an innovation. Be aware that this is stronger than the i.i.d. requirement: not every i.i.d. process is an innovation and that property is absolutely central to $AR(p)$-modelling.
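For intuition, an $AR(1)$ process can be simulated with \verb|arima.sim()|; the value $\alpha_1 = 0.8$ is an arbitrary illustrative choice:
\begin{lstlisting}[language=R]
# Simulate an AR(1) with alpha_1 = 0.8 and Gaussian innovations
set.seed(2)
x <- arima.sim(model = list(ar = 0.8), n = 200)
plot(x)
acf(x)    # exponential decay
pacf(x)   # cut-off after lag 1
\end{lstlisting}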
\subsubsection{AR(p)-Models and Stationarity}
$AR(p)$-models must only be fitted to stationary time series. Any potential trends and/or seasonal effects need to be removed first. We will also make sure that the processes are stationary. \\
\vspace{.2cm}
\textbf{Conditions}
Any stationary $AR(p)$-process meets
\begin{itemize}
\item $E[X_t] = \mu = 0$
\item All roots of the characteristic polynomial $1-\alpha_1 z - \alpha_2 z^2 - \dots - \alpha_p z^p = 0$ exceed one in absolute value (verify with \verb|polyroot()| in R, as sketched below)
\end{itemize}
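A minimal sketch of this root check (the $AR(2)$ coefficients $\alpha_1 = 0.5$, $\alpha_2 = -0.9$ are arbitrary illustrative values):
\begin{lstlisting}[language=R]
# Characteristic polynomial 1 - 0.5z + 0.9z^2 of an assumed AR(2)
# with alpha_1 = 0.5, alpha_2 = -0.9; coefficients in increasing powers
abs(polyroot(c(1, -0.5, 0.9)))
# both moduli are approx. 1.05 > 1, so this AR(2) is stationary
\end{lstlisting}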
\subsection{Yule-Walker equations}
We observe that there exists a linear equation system built up from the $AR(p)$-coefficients and the ACF coefficients of up to lag $p$. \\
\vspace{.2cm}
We can use these equations for fitting an $AR(p)$-model:
\begin{enumerate}
\item Estimate the ACF from a time series
\item Plug-in the estimates into the Yule-Walker-Equations
\item The solutions are the estimated $AR(p)$-coefficients (a sketch follows after this list)
\end{enumerate}
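A sketch of this procedure for an $AR(2)$, using the logged lynx data as an example (any stationary series would do):
\begin{lstlisting}[language=R]
# Yule-Walker fit of an AR(2) "by hand" (sketch; log-lynx as example series)
x   <- log(lynx)
rho <- acf(x, plot = FALSE)$acf[2:3]             # hat(rho)(1), hat(rho)(2)
R   <- matrix(c(1, rho[1], rho[1], 1), nrow = 2) # LES matrix built from the ACF
solve(R, rho)                                    # hat(alpha)_1, hat(alpha)_2
ar.yw(x, aic = FALSE, order.max = 2)$ar          # built-in routine, nearly identical
\end{lstlisting}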
\subsection{Fitting AR(p)-models}
This involves 3 crucial steps:
\begin{enumerate}
\item Model Identification
\begin{itemize}
\item Is an AR process suitable, and what is $p$?
\item Will be based on ACF/PACF analysis
\end{itemize}
\item Parameter Estimation
\begin{itemize}
\item Regression approach
\item Yule-Walker-Equations
\item and more (MLE, Burg-Algorithm)
\end{itemize}
\item Residual Analysis
\end{enumerate}
\subsubsection{Model identification}
\begin{itemize}
\item $AR(p)$ processes are stationary
\item For all AR(p) processes, the ACF decays exponentially quickly, or is an exponentially damped sinusoid.
\item For all $AR(p)$ processes, the PACF is equal to zero for all lags $k > p$. The behavior before lag $p$ can be arbitrary.
\end{itemize}
If what we observe is fundamentally different from the above, it is unlikely that the series was generated from an $AR(p)$-process. We thus need other models, maybe more sophisticated ones.
\subsubsection{Parameter estimation}
Observed time series are rarely centered. Then, it is inappropriate to fit a pure $AR(p)$ process. All R routines by default assume the shifted process $Y_t = m + X_t$. Thus, we face the problem:
$$(Y_t - m) = \alpha_1(Y_{t-1} - m) + ... + \alpha_p(Y_{t-p} - m) + E_t$$
The goal is to estimate the global mean $m$, the AR-coefficients $\alpha_1,\dots,\alpha_p$, and some parameters defining the distribution of the innovation $E_t$. We usually assume a Gaussian, hence this is $\sigma_E^2$.\\
\vspace{.2cm}
We will discuss 4 methods for estimating the parameters:\\
\vspace{.2cm}
\textbf{OLS Estimation} \\
If we rethink the previously stated problem, we recognize a multiple linear regression problem without
intercept on the centered observations. What we do is:
\begin{enumerate}
\item Estimate $\hat{m} = \bar{y}$ and set $x_t = y_t - \hat{m}$ (see the sketch after this list)
\item Run a regression without intercept on $x_t$ to obtain $\hat{\alpha_1},\dots,\hat{\alpha_p}$
\item For $\hat{\sigma_E^2}$, take the residual standard error from the output
\end{enumerate}
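A sketch of the OLS approach for an $AR(2)$ on the logged lynx data (series and order are illustrative choices):
\begin{lstlisting}[language=R]
# OLS fit of an AR(2) via lm(), sketched on the logged lynx data
y <- log(lynx)
x <- y - mean(y)                             # step 1: center the series
n <- length(x)
d <- data.frame(resp = x[3:n],               # X_t
                lag1 = x[2:(n-1)],           # X_{t-1}
                lag2 = x[1:(n-2)])           # X_{t-2}
fit <- lm(resp ~ lag1 + lag2 - 1, data = d)  # step 2: regression without intercept
coef(fit)                                    # hat(alpha)_1, hat(alpha)_2
summary(fit)$sigma^2                         # step 3: estimate of sigma_E^2
\end{lstlisting}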
\vspace{.2cm}
\textbf{Burg's algorithm} \\
While OLS works, the first $p$ instances are never evaluated as responses. This is cured by Burg's algorithm, which uses the property of time-reversal in stochastic processes. We thus evaluate the RSS of forward and backward prediction errors:
$$\sum_{t=p+1}^n \bigg[\bigg(X_t - \sum_{k=1}^p \alpha_k X_{t-k}\bigg)^2 + \bigg(X_{t-p} - \sum_{k=1}^p \alpha_k X_{t-p+k}\bigg)^2 \bigg]$$
In contrast to OLS, there is no explicit solution and numerical optimization is required. This is done with a recursive method called the Durbin-Levinson algorithm (implemented in R).
\begin{lstlisting}[language=R]
f.burg <- ar.burg(llynx, aic=F, order.max=2)
\end{lstlisting}
\vspace{.2cm}
\textbf{Yule-Walker Equations} \\
The Yule-Walker-Equations yield a LES that connects the true ACF with the true AR-model parameters. We plug-in the estimated ACF coefficients:
$$\hat{\rho}(k) = \hat{\alpha}_1\hat{\rho}(k-1) + \dots + \hat{\alpha}_p\hat{\rho}(k-p), \, \mathrm{for} \, k=1,\dots,p$$
and solve the LES to obtain the AR-parameter estimates.\\
\vspace{.2cm}
In R we can use \verb|ar.yw()| \\
\vspace{.2cm}
\textbf{Maximum-likelihood-estimation} \\
Idea: Determine the parameters such that, given the observed time series $(y_1 ,\dots, y_n)$, the resulting model is the most plausible (i.e. the most likely) one. \\
This requires the choice of a probability model for the time series. By assuming Gaussian innovations, $E_t \sim N (0,\sigma_E^2)$ , any $AR(p)$ process has a multivariate normal distribution:
$$Y = (Y_1,\dots,Y_n) \sim N(m \cdot \vec{1},V)$$
with $V$ depending on $\vec{\alpha},\sigma_E^2$ \\
MLE then provides simultaneous estimates by optimizing:
$$L(\alpha,m,\sigma_E^2) \propto \exp \bigg( -\sum_{t=1}^n(x_t - \hat{x}_t)^2 \bigg)$$
\begin{lstlisting}[language=R]
> f.ar.mle <- arima(log(lynx), order = c(2, 0, 0))
> f.ar.mle
Call: arima(x = log(lynx), order = c(2, 0, 0))
\end{lstlisting}
\vspace{.2cm}
\textbf{Some remarks} \\
\begin{itemize}
\item All 4 estimation methods are asymptotically equivalent and even on finite samples, the differences are usually small.
\item All 4 estimation methods are non-robust against outliers and perform best on data that are approximately Gaussian.
\item Function \verb|arima()| provides standard errors for $\hat{m}; \hat{\alpha}_1 ,\dots, \hat{\alpha}_p$ so that statements about significance become feasible and confidence intervals for the parameters can be built.
\item \verb|ar.ols()|, \verb|ar.yw()| and \verb|ar.burg()| allow for convenient choice of the optimal model order $p$ using the AIC criterion. Among these methods, \verb|ar.burg()| is usually preferred.
\end{itemize}
\subsection{Model diagnostics}
\subsubsection{Residual analysis}\label{residual-analysis}
"residuals" = "estimated innovations"
$$\hat{E}_t = (y_t - \hat{m}) - \big(\hat{\alpha}_1(y_{t-1} - \hat{m}) + \dots + \hat{\alpha}_p(y_{t-p} - \hat{m})\big)$$
With assumptions as in Chapter \ref{ar-1} \\
\vspace{.2cm}
We can check these using (in R: \verb|tsdisplay(resid(fit))|):
\begin{itemize}
\item Time-series plot of $\hat{E}_t$
\item ACF/PACF-plot of $\hat{E}_t$
\item QQ-plot of $\hat{E}_t$
\end{itemize}
The time series should look like white noise. \\
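A short sketch of these checks, assuming \verb|library(forecast)| for \verb|tsdisplay()| and an $AR(2)$ fit to the logged lynx data:
\begin{lstlisting}[language=R]
# Residual analysis for an AR(2) fit (sketch)
library(forecast)
fit <- arima(log(lynx), order = c(2, 0, 0))
tsdisplay(resid(fit))                   # time series plot plus ACF/PACF
qqnorm(resid(fit)); qqline(resid(fit))  # QQ-plot against the Gaussian assumption
\end{lstlisting}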
\vspace{.2cm}
\textbf{Alternative} \\
Using \verb|checkresiduals()|: \\
A convenient alternative for residual analysis is this function from \verb|library(forecast)|. It only works correctly when fitting with \verb|arima()|, though.
\begin{lstlisting}[language=R]
> f.arima <- arima(log(lynx), c(11,0,0))
> checkresiduals(f.arima)
Ljung-Box test
data: Residuals from ARIMA(11,0,0) with non-zero mean
Q* = 4.7344, df = 3, p-value = 0.1923
Model df: 12. Total lags used: 15
\end{lstlisting}
The function carries out a Ljung-Box test to check whether residuals are still correlated. It also provides a graphical output:
\begin{figure}[H]
\centering
\includegraphics[width=.25\textwidth]{checkresiduals.png}
\caption{Example output from above code}
\label{fig:checkresiduals}
\end{figure}
\subsubsection{Diagnostic by simulation}
As a last check before a model is called appropriate, it can be beneficial to simulate from the estimated coefficients and visually compare the resulting series (without any prejudices) to the original one (see the sketch after the list below).
\begin{itemize}
\item The simulated series should «look like» the original. If this is not the case, the model failed to capture (some of) the properties in the original data.
\item A larger or more sophisticated model may be necessary in cases where simulation does not recapture the features in the original data.
\end{itemize}
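A sketch of such a comparison for an $AR(2)$ fitted to the logged lynx data (the chosen order is only illustrative):
\begin{lstlisting}[language=R]
# Simulate from the fitted AR(2) and compare with the original (sketch)
fit <- arima(log(lynx), order = c(2, 0, 0))
sim <- arima.sim(n = length(lynx),
                 model = list(ar = coef(fit)[c("ar1", "ar2")]),
                 sd = sqrt(fit$sigma2)) + coef(fit)["intercept"]
par(mfrow = c(2, 1))
plot(log(lynx), main = "Original")
plot(sim, main = "Simulated from the fit")
\end{lstlisting}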
\subsection{Moving average models (MA)}
Whereas for $AR(p)$-models, the current observation of a series is written as a linear combination of its own past, $MA(q)$-models can be seen as an extension of the «pure» process
$$X_t = E_t$$
in the sense that the last $q$ innovation terms $E_{t-1}, E_{t-2},\dots$ are included, too. We call this a moving average model:
$$X_t = E_t + \beta_1 E_{t-1} + \beta_2 E_{t-2} + \dots + \beta_q E_{t-q}$$
This is a time series process that is stationary, but not i.i.d. In many aspects, $MA(q)$ models are complementary to $AR(p)$.
\subsubsection{Stationarity of MA models}
We first restrict ourselves to the simple $MA(1)$-model:
$$X_t = E_t + \beta_1 E_{t-1}$$
The series $X_t$ is always weakly stationary, no matter what the choice of the parameter $\beta_1$ is.
\subsubsection{ACF/PACF of MA processes}
For the ACF
$$\rho(1) = \frac{\gamma(1)}{\gamma(0)} = \frac{\beta_1}{1+\beta_1^2} < 0.5$$
and
$$\rho(k) = 0 \, \forall \, k > 1$$
Thus, we have a «cut-off» situation, i.e. a similar behavior to the one of the PACF in an $AR(1)$ process. This is why and how $AR(1)$ and $MA(1)$ are complementary.
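This complementarity can be illustrated by simulation; the value $\beta_1 = 0.7$ is an arbitrary choice:
\begin{lstlisting}[language=R]
# ACF cut-off of a simulated MA(1) with beta_1 = 0.7
set.seed(3)
x <- arima.sim(model = list(ma = 0.7), n = 500)
acf(x)    # rho(1) close to 0.7/(1+0.7^2) = 0.47, approx. 0 beyond lag 1
pacf(x)   # in contrast: exponentially decaying PACF
\end{lstlisting}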
\subsubsection{Invertibility}
Without additional assumptions, the ACF of an $MA(1)$ does not allow identification of the generating model.
$$X_t = E_t + 0.5 E_{t-1}$$
$$U_t = E_t + 2 E_{t-1}$$
have identical ACF!
$$\rho(1) = \frac{\beta_{1}}{1+\beta_1^2} = \frac{1/\beta_1}{1+(1/\beta_1)^2}$$
\begin{itemize}
\item An $MA(1)$-, or in general an $MA(q)$-process is said to be invertible if the roots of the characteristic polynomial $\Theta(B)$ exceed one in absolute value.
\item Under this condition, there exists only one $MA(q)$-process for any given ACF. But please note that any $MA(q)$ is stationary, no matter if it is invertible or not.
\item The condition on the characteristic polynomial translates to restrictions on the coefficients. For any MA(1)-model, $|\beta_1| < 1$ is required.
\item R function \verb|polyroot()| can be used for finding the roots.
\end{itemize}
\textbf{Practical importance:} \\
The condition of invertibility is not only a technical issue, but has important practical meaning. All invertible $MA(q)$ processes can be expressed in terms of an $AR(\infty)$, e.g. for an $MA(1)$:
\begin{align*}
X_t &= E_t + \beta_1 E_{t-1} \\
&= E_t + \beta_1(X_{t-1} - \beta_1 E_{t-2}) \\
&= \dots \\
&= E_t + \beta_1 X_{t-1} - \beta_1^2 X_{t-2} + \beta_1^3X_{t-3} + \dots \\
&= E_t + \sum_{i=1}^\infty \psi_i X_{t-i}
\end{align*}
\subsection{Fitting MA(q)-models to data}
As with AR(p) models, there are three main steps:
\begin{enumerate}
\item Model identification
\begin{itemize}
\item Is the series stationary?
\item Do the properties of ACF/PACF match?
\item Derive order $q$ from the cut-off in the ACF
\end{itemize}
\item Parameter estimation
\begin{itemize}
\item How to determine estimates for $m, \beta_1 ,\dots, \beta_q, \sigma_E^2$?
\item Conditional Sum of Squares or MLE
\end{itemize}
\item Model diagnostics
\begin{itemize}
\item With the same tools/techniques as for AR(p) models
\end{itemize}
\end{enumerate}
\subsubsection{Parameter estimation}
The simplest idea is to exploit the relation between model parameters and autocorrelation coefficients («Yule-Walker») after the global mean $m$ has been estimated and subtracted. \\
In contrast to the Yule-Walker method for AR(p) models, this yields an inefficient estimator that generally generates poor results and hence should not be used in practice.
\vspace{.2cm}
It is better to use \textbf{Conditional sum of squares}:\\
This is based on the fundamental idea of expressing $\sum E_t^2$ in terms of $X_1 ,..., X_n$ and $\beta_1 ,\dots, \beta_q$, as the innovations themselves are unobservable. This is possible for any invertible $MA(q)$, e.g. the $MA(1)$:
$$E_t = X_t - \beta_1 X_{t-1} + \beta_1^2 X_{t-2} - \dots + (-\beta_1)^{t-1} X_1 + (-\beta_1)^t E_0$$
Conditional on the assumption of $E_0 = 0$, it is possible to rewrite $\sum E_t^2$ for any $MA(1)$ using $X_1,\dots,X_n$ and $\beta_1$. \\
\vspace{.2cm}
Numerical optimization is required for finding the optimal parameter $\beta_1$, but is available in R function \verb|arima()| with:
\begin{lstlisting}[language=R]
> arima(..., order=c(...), method="CSS")
\end{lstlisting}
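A concrete call might look as follows (a sketch; the simulated series and the order are for illustration only):
\begin{lstlisting}[language=R]
# CSS fit of an MA(1) to a simulated series (illustrative)
set.seed(5)
x <- arima.sim(model = list(ma = 0.7), n = 200)
arima(x, order = c(0, 0, 1), method = "CSS")
\end{lstlisting}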
\textbf{Maximum-likelihood estimation}
\begin{lstlisting}[language=R]
> arima(..., order=c(...), method="CSS-ML")
\end{lstlisting}
This is the default method in R, which is based on finding starting values for MLE using the CSS approach. If assuming Gaussian innovations, then:
$$X_t = E_t + \beta_1 E_{t-1} + \dots + \beta_q E_{t-q}$$
will follow a Gaussian distribution as well, and we have:
$$X = (X_1, \dots, X_n) \sim N(0,V)$$
Hence it is possible to derive the likelihood function and simultaneously estimate the parameters $m;\beta_1,\dots,\beta_q;\sigma_E^2$.
\subsubsection{Residual analysis}
See \ref{residual-analysis}
\subsection{ARMA(p,q)-models}
An $ARMA(p,q)$ model combines $AR(p)$ and $MA(q)$:
$$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + E_t + \beta_1 E_{t-1} + \dots + \beta_q E_{t-q}$$
where $E_t$ are i.i.d. innovations (=a white noise process).\\
\vspace{.2cm}
It is easier to write $ARMA(p,q)$s with the characteristic polynomials: \\
\vspace{.2cm}
$\Phi(B)X_t = \Theta(B)E_t$, where \\
$\Phi(z) = 1 - \alpha_1 z - \dots - \alpha_p z^p$, is the cP of the $AR$-part, and \\
$\Theta(z) = 1 + \beta_1 z + \dots + \beta_q z^q$ is the cP of the $MA$-part
\subsubsection{Properties of ARMA(p,q)-Models}
The stationarity is determined by the $AR(p)$-part of the model:\\
If the roots of the characteristic polynomial $\Phi(B)$ exceed one in absolute value, the process is stationary.\\
\vspace{.2cm}
The invertibility is determined by the $MA(q)$-part of the model:\\
If the roots of the characteristic polynomial $\Theta(B)$ exceed one in absolute value, the process is invertible.\\
\vspace{.2cm}
Any stationary and invertible $ARMA(p,q)$ can either be rewritten in the form of a non-parsimonious $AR(\infty)$ or an $MA(\infty)$.\\
In practice, we mostly consider shifted $ARMA(p,q)$: $Y_t = m + X_t$
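The $MA(\infty)$ rewriting mentioned above can be sketched with the base R function \verb|ARMAtoMA()|; the coefficients are arbitrary illustrative values:
\begin{lstlisting}[language=R]
# psi-weights of the MA(infinity) representation of an ARMA(1,1)
# with alpha_1 = 0.8 and beta_1 = 0.4
ARMAtoMA(ar = 0.8, ma = 0.4, lag.max = 10)   # weights decay geometrically
\end{lstlisting}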
\begin{table}[H]
\centering
\begin{tabular}{l|l|l}
& ACF & PACF \\
\hline
$AR(p)$ & exponential decay & cut-off at lag $p$ \\
$MA(q)$ & cut-off at lag $q$ & exponential decay \\
$ARMA(p,q)$ & mix decay/cut-off & mix decay/cut-off \\
\end{tabular}
\caption{Comparison of $AR$-,$MA$-, $ARMA$-models}
\end{table}
\begin{itemize}
\item In an $ARMA(p,q)$, depending on the coefficients of the model, either the $AR(p)$ or the $MA(q)$ part can dominate the ACF/PACF characteristics (see the sketch below).
\end{itemize}
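As a sketch of this mixed behavior, a simulated $ARMA(1,1)$ with arbitrary coefficients shows decay rather than a clean cut-off in both plots:
\begin{lstlisting}[language=R]
# ACF/PACF of a simulated ARMA(1,1) with alpha_1 = 0.8, beta_1 = 0.5 (sketch)
set.seed(4)
x <- arima.sim(model = list(ar = 0.8, ma = 0.5), n = 500)
par(mfrow = c(1, 2)); acf(x); pacf(x)   # neither shows a clean cut-off
\end{lstlisting}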
\subsubsection{Fitting ARMA-models to data}
See $AR$- and $MA$-modelling
\subsubsection{Identification of order (p,q)}
May be more difficult in reality than in theory:
\begin{itemize}
\item We only have one single realization of the time series with finite length. The ACF/PACF plots are not «facts», but are estimates with uncertainty. The superimposed cut-offs may be difficult to identify from the ACF/PACF plots.
\item $ARMA(p,q)$ models are parsimonious, but can usually be replaced by high-order pure $AR(p)$ or $MA(q)$ models. This is not a good idea in practice, however!
\item In many cases, an AIC grid search over all $ARMA(p,q)$ with $p+q < 5$ may help to identify promising models (see the sketch after this list).
\end{itemize}
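A sketch of such a grid search, here on the logged lynx series and using \verb|AIC()| on \verb|arima()| fits:
\begin{lstlisting}[language=R]
# AIC grid search over ARMA(p,q) with p + q <= 4 (sketch; log-lynx as example)
x <- log(lynx)
best <- NULL
for (p in 0:4) {
  for (q in 0:(4 - p)) {
    fit <- try(arima(x, order = c(p, 0, q)), silent = TRUE)
    if (!inherits(fit, "try-error") && (is.null(best) || AIC(fit) < AIC(best)))
      best <- fit
  }
}
best   # the candidate model with the smallest AIC
\end{lstlisting}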
\scriptsize
\section*{Copyright}
@@ -532,8 +872,8 @@ Jannis Portmann, FS21
\section*{References}
\begin{enumerate}
-\item ATSA\_Script\_v219219.docx, M. Dettling
+\item ATSA\_Script\_v210219.docx, M. Dettling
-\item ATSA\_Slides\_v219219.pptx, M. Dettling
+\item ATSA\_Slides\_v210219.pptx, M. Dettling
\end{enumerate}
\section*{Image sources}