Complete (S)ARIMA and ARCH chapters
Parent 92099f5094, commit 7e86deb1be. 1 changed file, main.tex, with 118 additions and 1 deletion.
@@ -942,16 +942,132 @@ $$\hat{\beta} =(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$

With $Var(\hat{\beta}) = (X^T \Sigma^{-1} X)^{-1} \sigma^2$

\subsubsection{R example}

Package \verb|nlme| provides the function \verb|gls()|. It only works if the correlation structure of the errors is supplied; this has to be determined from the residuals of an OLS regression first.

\begin{lstlisting}[language=R]
> library(nlme)
> corStruct <- corARMA(form=~time, p=2)
> fit.gls <- gls(temp~time+season, data=dat, correlation=corStruct)
\end{lstlisting}

The output contains the regression coefficients and their standard errors, as well as the AR coefficients plus some further information about the model (log-likelihood, AIC, ...).

\subsection{Missing input variables}

\begin{itemize}

\item Correlated errors in (time series) regression problems are often caused by the absence of crucial input variables.

\item In such cases, it is much better to identify these missing variables and include them in the regression model.

\item However, in practice this is not always possible, because the crucial variables may simply not be available.

\item \textbf{Note:} Time series regression methods for correlated errors such as GLS can be seen as a sort of emergency kit for the case where the missing variables cannot be added. If you can do without them, even better!

\end{itemize}

\section{ARIMA and SARIMA}

\textbf{Why?} \\
Many time series in practice show trends and/or seasonality. While we can decompose them and describe the stationary part, it might be attractive to model them directly. \\
\vspace{.2cm}

\textbf{Advantages} \\
Forecasting is convenient, and AIC-based decisions about the presence of trend/seasonality become feasible. \\
\vspace{.2cm}

\textbf{Disadvantages} \\
The decomposition is less transparent, and forecasting has a bit of a black-box flavor. \\

\subsection{ARIMA(p,d,q)-models}

ARIMA models are aimed at describing series that have a trend which can be removed by differencing, and where the differences can be described with an ARMA($p,q$)-model. \\
\vspace{.2cm}

\textbf{Definition}\\
If
$$Y_t = (1-B)^d X_t \sim ARMA(p,q)$$
then
$$X_t \sim ARIMA(p,d,q)$$
For $d = 1$ this simply means $Y_t = X_t - X_{t-1}$. In most practical cases, using $d = 1$ will be enough! \\
\vspace{.2cm}

\textbf{Notation}\\
$$\Phi(B)(1-B)^d X_t = \Theta(B) E_t$$
\vspace{.2cm}

\textbf{Stationarity}\\
ARIMA processes are non-stationary whenever $d > 0$; they can be rewritten as non-stationary ARMA($p+d$,$q$) processes (see below).

\subsubsection{Fitting ARIMA in R}

\begin{enumerate}

\item Choose the appropriate order of differencing, usually $d = 1$ or (in rare cases) $d = 2$, such that the result is a stationary series.

\item Analyze ACF and PACF of the differenced series. If the stylized facts of an ARMA process are present, decide on the orders $p$ and $q$.

\item Fit the model using the \verb|arima()| procedure. This can be done on the original series by setting $d$ accordingly, or on the differences, by setting $d = 0$ and argument \verb|include.mean=FALSE|.

\item Analyze the residuals; these must look like White Noise. If several competing models are appropriate, use AIC to decide on the winner.

\end{enumerate}

\textbf{Example}\footnote{Full example in script pages 117ff} \\
Plausible models for the logged oil prices after inspection of ACF/PACF of the differenced series (which seems stationary): ARIMA(1,1,1) or ARIMA(2,1,1).

\begin{lstlisting}[language=R]
> arima(lop, order=c(1,1,1))
Coefficients:
          ar1     ma1
      -0.2987  0.5700
s.e.   0.2009  0.1723
sigma^2 = 0.006642: ll = 261.11, aic = -518.22
\end{lstlisting}
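
As a minimal sketch of the AIC-based choice between the two candidate orders (assuming \verb|lop| is the logged oil price series from this example), the fits can be compared directly; the lower AIC value wins:

\begin{lstlisting}[language=R]
> fit.111 <- arima(lop, order=c(1,1,1))
> fit.211 <- arima(lop, order=c(2,1,1))
> AIC(fit.111); AIC(fit.211)  # smaller value = preferred model
\end{lstlisting}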
\subsubsection{Rewriting ARIMA as Non-Stationary ARMA}

Any ARIMA($p,d,q$) model can be rewritten in the form of a non-stationary ARMA($p+d$,$q$) process. This provides some deeper insight, especially for the task of forecasting.
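
As an illustration (a sketch for $p = d = q = 1$, writing the AR polynomial as $\Phi(B) = 1 - \phi_1 B$), expanding the differencing operator gives
$$\Phi(B)(1-B)X_t = \Theta(B)E_t \quad\Leftrightarrow\quad \big(1-(1+\phi_1)B+\phi_1 B^2\big)X_t = \Theta(B)E_t,$$
i.e. an ARMA(2,1) whose AR polynomial has a unit root at $B = 1$ and which is therefore non-stationary.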
\subsection{SARIMA(p,d,q)(P,D,Q)$^S$}

We have learned that it is also possible to use differencing to obtain a stationary series from one that features both a trend and a seasonal effect.

\begin{enumerate}

\item Removing the seasonal effect by differencing at lag 12 \\ \begin{center}$Y_t = X_t - X_{t-12} = (1-B^{12})X_t$ \end{center}

\item Usually, further differencing at lag 1 is required to obtain a series that has constant global mean and is stationary (see the R sketch after this list) \\ \begin{center} $Z_t = Y_t - Y_{t-1} = (1-B)Y_t = (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}$ \end{center}

\end{enumerate}
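
A minimal R sketch of these two differencing steps, using the built-in \verb|AirPassengers| series (monthly data, period 12) purely as an illustrative stand-in:

\begin{lstlisting}[language=R]
> x <- log(AirPassengers)  # monthly series with period 12
> y <- diff(x, lag=12)     # step 1: seasonal differencing
> z <- diff(y)             # step 2: additional differencing at lag 1
> plot(z)                  # visual check for stationarity
\end{lstlisting}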

The stationary series $Z_t$ is then modelled with some special kind of ARMA($p,q$) model. \\
\vspace{.2cm}

\textbf{Definition} \\
A series $X_t$ follows a SARIMA($p,d,q$)($P,D,Q$)$^S$-process if the following equation holds:
$$\Phi(B)\Phi_S(B^S) Z_t = \Theta(B) \Theta_S(B^S) E_t$$
Here, the series $Z_t$ originates from $X_t$ after appropriate seasonal and trend differencing: $Z_t = (1-B)^d (1-B^S)^D X_t$ \\
\vspace{.2cm}

In most practical cases, using differencing order $d = D = 1$ will be sufficient. The choice of $p,q,P,Q$ happens via ACF/PACF or via AIC-based decisions.

\subsubsection{Fitting SARIMA}

\begin{enumerate}

\item Perform seasonal differencing of the data. The lag $S$ is determined by the period. Order $D = 1$ is mostly enough.

\item Decide if additional differencing at lag 1 is required for stationarity. If not, then $d = 0$. If yes, then try $d = 1$.

\item Analyze ACF/PACF of $Z_t$ to determine $p,q$ for the short-term dependency and $P,Q$ for the dependency at multiples of the period.

\item Fit the model using \verb|arima()| by setting \verb|order=c(p,d,q)| and \verb|seasonal=c(P,D,Q)| according to your choices (see the sketch after this list).

\item Check the accuracy of the model by residual analysis. The residuals must look like White Noise and approximately Gaussian.

\end{enumerate}
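
A minimal sketch of step 4, again using \verb|AirPassengers| as an illustrative stand-in and assuming the classical airline orders SARIMA(0,1,1)(0,1,1)$^{12}$:

\begin{lstlisting}[language=R]
> fit <- arima(log(AirPassengers), order=c(0,1,1),
+              seasonal=list(order=c(0,1,1), period=12))
> tsdiag(fit)  # residual plot, residual ACF, Ljung-Box p-values
\end{lstlisting}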
\section{ARCH/GARCH-models}

The basic assumption for ARCH/GARCH models is as follows:
$$X_t = \mu_t + E_t$$
where $E_t = \sigma_t W_t$ and $W_t$ is white noise. \\
Here, both the conditional mean and the conditional variance are non-trivial,
$$\mu_t = E[X_t | X_{t-1},X_{t-2},\dots], \quad \sigma_t^2 = Var[X_t | X_{t-1},X_{t-2},\dots],$$
and can be modelled using a mixture of ARMA and GARCH. \\
\vspace{.2cm}

For simplicity, we here assume that both the conditional and the global mean are zero, $\mu = \mu_t = 0$, and consider pure ARCH processes only, where:
$$X_t = \sigma_t W_t \; \mathrm{with} \; \sigma_t = f(X_{t-1}^2,X_{t-2}^2,\dots,X_{t-p}^2)$$

\subsection{ARCH(p)-model}

A time series $X_t$ is \textit{autoregressive conditional heteroskedastic} of order $p$, abbreviated ARCH($p$), if:
$$X_t = \sigma_t W_t$$
with $\sigma_t = \sqrt{\alpha_0 + \sum_{i=1}^p \alpha_i X_{t-i}^2}$

It is obvious that an ARCH($p$) process shows volatility, as the conditional variance depends on the past squared observations:
$$Var(X_t \,|\, X_{t-1},X_{t-2},\dots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2$$

We can determine the order of an ARCH($p$) process by analyzing ACF and PACF of the squared time series data, as illustrated below. We then again search for an exponential decay in the ACF and a cut-off in the PACF.
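
An illustrative sketch, assuming \verb|lret.smi| holds the log returns used in the fitting example below:

\begin{lstlisting}[language=R]
> par(mfrow=c(1,2))
> acf(lret.smi^2)   # expect an exponential decay
> pacf(lret.smi^2)  # expect a cut-off at lag p
\end{lstlisting}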
\subsubsection{Fitting an ARCH(2)-model}

The simplest option for fitting an ARCH($p$) in R is to use function \verb|garch()| from \verb|library(tseries)|. Be careful, because the \verb|order=c(q,p)| argument differs from most of the literature.

\begin{lstlisting}[language=R]
> fit <- garch(lret.smi, order = c(0,2))
> fit
Call: garch(x = lret.smi, order = c(0, 2))

Coefficient(s):
       a0         a1         a2
6.568e-05  1.309e-01  1.074e-01
\end{lstlisting}

We recommend running a residual analysis afterwards, for instance along the following lines.
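
A possible minimal check (the leading residuals returned by \verb|garch()| are \verb|NA| and are dropped here):

\begin{lstlisting}[language=R]
> res <- na.omit(residuals(fit))
> par(mfrow=c(1,3))
> plot(res, type="l")       # should look like White Noise
> acf(res^2)                # no remaining volatility structure
> qqnorm(res); qqline(res)  # check approximate normality
\end{lstlisting}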
\section{General concepts}

\subsection{AIC}

The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows (\textbf{lower is better}):
$$AIC = -2 \log (L) + 2(p+q+k+1)$$
where
\begin{itemize}
@@ -960,6 +1076,7 @@ where

\end{itemize}

For small samples $n$, often a corrected version is used:
$$AICc = AIC + \frac{2(p + q + k + 1)(p + q + k + 2)}{n - p - q - k - 2}$$
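
A small sketch of how this correction could be computed from an \verb|arima()| fit (reusing the \verb|lop| series from the ARIMA example above, and assuming that $p+q+k+1$ equals the number of estimated coefficients plus one for the innovation variance):

\begin{lstlisting}[language=R]
> fit <- arima(lop, order=c(1,1,1))
> n <- length(lop)
> m <- length(coef(fit)) + 1        # p+q+k+1
> AIC(fit) + 2*m*(m+1)/(n - m - 1)  # AICc as defined above
\end{lstlisting}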

\scriptsize

\newpage