Complete (S)ARIMA and ARCH chapters

jannisp 2021-08-26 13:16:45 +02:00
parent 92099f5094
commit 7e86deb1be

main.tex

@@ -942,16 +942,132 @@ $$\hat{\beta} =(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
With $Var(\hat{\beta}) = (X^T \Sigma^{-1} X)^{-1} \sigma^2$
\subsubsection{R example}
Package \verb|nlme| provides the function \verb|gls()|. It only works if the correlation structure of the errors is provided; this has to be determined from the residuals of an OLS regression first.
\begin{lstlisting}[language=R]
> library(nlme)
> corStruct <- corARMA(form=~time, p=2)
> fit.gls <- gls(temp~time+season, data=dat,correlation=corStruct)
\end{lstlisting}
The output contains the regression coefficients and their standard errors, as well as the AR-coefficients plus some further information about the model (Log-Likelihood, AIC, ...).
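For inspecting the fit, the \verb|nlme| functions \verb|summary()| and \verb|intervals()| are convenient (a minimal sketch, assuming the \verb|fit.gls| object from above):
\begin{lstlisting}[language=R]
> summary(fit.gls)   # coefficients, SEs, AR-parameters, AIC
> intervals(fit.gls) # confidence intervals for all parameters
\end{lstlisting}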
\subsection{Missing input variables}
\begin{itemize}
\item Correlated errors in (time series) regression problems are often caused by the absence of crucial input variables.
\item In such cases, it is much better to identify the not-yet-present variables and include them in the regression model.
\item However, in practice this isn't always possible, because these crucial variables may be unavailable.
\item \textbf{Note:} Time series regression methods for correlated errors such as GLS can be seen as a sort of emergency kit for the case where the missing variables cannot be added. If you can do without them, even better!
\end{itemize}
\section{ARIMA and SARIMA}
\textbf{Why?} \\
Many time series in practice show trends and/or seasonality. While we can decompose them and describe the stationary part, it can be attractive to model them directly. \\
\vspace{.2cm}
\textbf{Advantages} \\
Forecasting is convenient and AIC-based decisions for the presence of trend/seasonality become feasible. \\
\vspace{.2cm}
\textbf{Disadvantages} \\
The decomposition is less transparent, and forecasting has a bit of a black-box flavor. \\
\subsection{ARIMA(p,d,q)-models}
ARIMA models are aimed at describing series that have a trend which can be removed by differencing, and where the differences can be described with an ARMA($p,q$)-model. \\
\vspace{.2cm}
\textbf{Definition}\\
If
$$Y_t = (1-B)^d X_t \sim ARMA(p,q)$$
(for $d = 1$, this is $Y_t = X_t - X_{t-1}$), then
$$X_t \sim ARIMA(p,d,q)$$
In most practical cases, using $d = 1$ will be enough! \\
\vspace{.2cm}
\textbf{Notation}\\
$$\Phi(B)(1-B)^d X_t = \Theta(B) E_t$$
\vspace{.2cm}
\textbf{Stationarity}\\
ARIMA-processes are non-stationary if $d > 0$, with the option to rewrite them as non-stationary ARMA($p+d,q$) processes (see below).
\subsubsection{Fitting ARIMA in R}
\begin{enumerate}
\item Choose the appropriate order of differencing, usually $d = 1$ or (in rare cases) $d = 2$, such that the result is a stationary series.
\item Analyze ACF and PACF of the differenced series. If the stylized facts of an ARMA process are present, decide for the orders $p$ and $q$.
\item Fit the model using the \verb|arima()| procedure. This can be done on the original series by setting $d$ accordingly, or on the differences by setting $d = 0$ and argument \verb|include.mean=FALSE|.
\item Analyze the residuals; these must look like White Noise. If several competing models are appropriate, use AIC to decide for the winner.
\end{enumerate}
\textbf{Example}\footnote{Full example in script pages 117ff} \\
Plausible models for the logged oil prices after inspection of ACF/PACF of the differenced series (that seems stationary): ARIMA(1,1,1) or ARIMA(2,1,1)
\begin{lstlisting}[language=R]
> arima(lop, order=c(1,1,1))
Coefficients:
ar1 ma1
-0.2987 0.5700
s.e. 0.2009 0.1723
sigma^2 = 0.006642: ll = 261.11, aic = -518.22
\end{lstlisting}
\subsubsection{Rewriting ARIMA as Non-Stationary ARMA}
Any ARIMA(p,d,q) model can be rewritten in the form of a non-stationary ARMA((p+d),q) process. This provides some deeper insight, especially for the task of forecasting.
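For illustration, consider an ARIMA(1,1,1) model, writing $\alpha_1$ for the AR- and $\beta_1$ for the MA-coefficient (R's sign convention assumed):
$$(1-\alpha_1 B)(1-B)X_t = (1+\beta_1 B)E_t$$
Expanding the left-hand side yields
$$X_t = (1+\alpha_1)X_{t-1} - \alpha_1 X_{t-2} + E_t + \beta_1 E_{t-1},$$
i.e.\ an ARMA(2,1) whose AR-polynomial has a unit root, which is why the process is non-stationary.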
\subsection{SARIMA(p,d,q)(P,D,Q)$^S$}
We have learned that it is also possible to use differencing to obtain a stationary series from one that features both a trend and a seasonal effect.
\begin{enumerate}
\item Removing the seasonal effect by differencing at lag 12 \\ \begin{center}$Y_t = X_t - X_{t-12} = (1-B^{12})X_t$ \end{center}
\item Usually, further differencing at lag 1 is required to obtain a series that has constant global mean and is stationary (see the R sketch after this list) \\ \begin{center} $Z_t = Y_t - Y_{t-1} = (1-B)Y_t = (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}$ \end{center}
\end{enumerate}
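As a minimal sketch of this double differencing in R (using the built-in \verb|AirPassengers| series as a stand-in example, not from the script):
\begin{lstlisting}[language=R]
> y <- diff(log(AirPassengers), lag=12) # seasonal difference
> z <- diff(y)  # trend difference at lag 1
> plot(z)       # should now look stationary
\end{lstlisting}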
The stationary series $Z_t$ is then modelled with some special kind of ARMA($p,q$) model. \\
\vspace{.2cm}
\textbf{Definition} \\
A series $X_t$ follows a SARIMA($p,d,q$)($P,D,Q$)$^S$-process if the following equation holds:
$$\Phi(B)\Phi_S (B^S) Z_t = \Theta(B) \Theta_S (B^S) E_t$$
Here, the series $Z_t$ originates from $X_t$ after appropriate seasonal and trend differencing: $Z_t = (1-B)^d (1-B^S)^D X_t$ \\
\vspace{.2cm}
In most practical cases, using differencing orders $d = D = 1$ will be sufficient. The choice of $p,q,P,Q$ happens via ACF/PACF or via AIC-based decisions.
\subsubsection{Fitting SARIMA}
\begin{enumerate}
\item Perform seasonal differencing of the data. The lag $S$ is determined by the period. Order $D = 1$ is usually enough.
\item Decide if additional differencing at lag 1 is required for stationarity. If not, then $d = 0$. If yes, then try $d = 1$.
\item Analyze ACF/PACF of $Z_t$ to determine $p,q$ for the short-term dependency and $P,Q$ for the dependency at multiples of the period.
\item Fit the model using \verb|arima()| by setting \verb|order=c(p,d,q)| and \verb|seasonal=c(P,D,Q)| according to your choices (see the sketch after this list).
\item Check the accuracy of the model by residual analysis. The residuals must look like White Noise and approximately Gaussian.
\end{enumerate}
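A minimal sketch of such a fit on the built-in \verb|AirPassengers| series (the classic airline model; orders are for illustration only, not derived here):
\begin{lstlisting}[language=R]
> fit <- arima(log(AirPassengers), order=c(0,1,1),
+             seasonal=c(0,1,1))
> tsdiag(fit) # residual ACF and Ljung-Box tests
\end{lstlisting}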
\section{ARCH/GARCH-models}
The basic assumption for ARCH/GARCH models is as follows:
$$X_t = \mu_t + E_t$$
where $E_t = \sigma_t W_t$ and $W_t$ is white noise. \\
Here, both the conditional mean and variance are non-trivial
$$\mu_t = E[X_t | X_{t-1},X_{t-2},\dots], \, \sigma_t^2 = Var[X_t | X_{t-1},X_{t-2},\dots]$$
and can be modelled using a mixture of ARMA and GARCH. \\
\vspace{.2cm}
For simplicity, here we assume that both the conditional and the global mean are zero, $\mu = \mu_t = 0$, and consider only pure ARCH processes, where:
$$X_t = \sigma_t W_t \; \mathrm{with} \; \sigma_t = f(X_{t-1}^2,X_{t-2}^2,\dots,X_{t-p}^2)$$
\subsection{ARCH(p)-model}
A time series $X_t$ is \textit{autoregressive conditional heteroskedastic} of order $p$, abbreviated ARCH($p$), if:
$$X_t = \sigma_t W_t$$
with $\sigma_t = \sqrt{\alpha_0 + \sum_{i=1}^p \alpha_i X_{t-i}^2}$.
It is obvious that an ARCH($p$) process shows volatility, as:
$$Var(X_t \mid X_{t-1},X_{t-2},\dots) = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2$$
We can determine the order of an ARCH($p$) process by analyzing ACF and PACF of the squared time series data. We then again search for an exponential decay in the ACF and a cut-off in the PACF.
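A minimal sketch of this order determination on a simulated ARCH(1) series (coefficients $\alpha_0 = 0.1$, $\alpha_1 = 0.5$ invented for illustration):
\begin{lstlisting}[language=R]
> set.seed(21)
> n <- 1000; x <- numeric(n)
> x[1] <- rnorm(1) # crude initialization
> for (t in 2:n)
+   x[t] <- rnorm(1, sd=sqrt(0.1+0.5*x[t-1]^2))
> acf(x^2); pacf(x^2) # decay in ACF, cut-off in PACF
\end{lstlisting}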
\subsubsection{Fitting an ARCH(2)-model}
The simplest option for fitting an ARCH($p$) in R is to use function \verb|garch()| from \verb|library(tseries)|. Be careful, because the \verb|order=c(q,p)| argument differs from most of the literature.
\begin{lstlisting}[language=R]
> fit <- garch(lret.smi, order = c(0,2))
> fit
Call: garch(x = lret.smi, order = c(0, 2))
Coefficient(s):
a0 a1 a2
6.568e-05 1.309e-01 1.074e-01
\end{lstlisting}
We recommend running a residual analysis afterwards.
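A minimal sketch of such a residual analysis, assuming the \verb|fit| object from above (the first residuals are \verb|NA| by construction):
\begin{lstlisting}[language=R]
> res <- residuals(fit)
> acf(res^2, na.action=na.pass) # should look like White Noise
> qqnorm(res); qqline(res)      # check approximate normality
\end{lstlisting}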
\section{General concepts}
\subsection{AIC}
The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows (\textbf{lower is better}):
$$AIC = -2 \log (L) + 2(p+q+k+1)$$
where
\begin{itemize}
@@ -960,6 +1076,7 @@ where
\end{itemize}
For small samples $n$, often a corrected version is used:
$$AICc = AIC + \frac{2(p + q + k + 1)(p + q + k + 2)}{n - p - q - k - 2}$$
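A minimal sketch of an AIC-based order comparison, reusing the \verb|lop| series from the ARIMA example above:
\begin{lstlisting}[language=R]
> fit1 <- arima(lop, order=c(1,1,1))
> fit2 <- arima(lop, order=c(2,1,1))
> c(fit1$aic, fit2$aic) # the lower AIC wins
\end{lstlisting}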
\scriptsize
\newpage