diff --git a/main.tex b/main.tex
index a3f2d21..0db7ce7 100644
--- a/main.tex
+++ b/main.tex
@@ -942,16 +942,132 @@
 $$\hat{\beta} =(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
 With $Var(\hat{\beta}) = (X^T \Sigma^{-1} X)^{-1} \sigma^2$
 \subsubsection{R example}
+Package \verb|nlme| has function \verb|gls()|. It only works if the correlation structure of the errors is provided; this has to be determined from the residuals of an OLS regression first.
 \begin{lstlisting}[language=R]
 > library(nlme)
 > corStruct <- corARMA(form=~time, p=2)
 > fit.gls <- gls(temp~time+season, data=dat,correlation=corStruct)
 \end{lstlisting}
+The output contains the regression coefficients and their standard errors, as well as the AR coefficients plus some further information about the model (log-likelihood, AIC, ...).
+
+\subsection{Missing input variables}
+\begin{itemize}
+    \item Correlated errors in (time series) regression problems are often caused by the absence of crucial input variables.
+    \item In such cases, it is much better to identify the missing variables and include them in the regression model.
+    \item However, in practice this is not always possible, because these crucial variables may not be available.
+    \item \textbf{Note:} Time series regression methods for correlated errors such as GLS can be seen as a sort of emergency kit for the case where the omitted variables cannot be added. If you can do without them, even better!
+\end{itemize}
+
+\section{ARIMA and SARIMA}
+\textbf{Why?} \\
+Many time series in practice show trends and/or seasonality. While we can decompose them and describe the stationary part, it may be attractive to model trend and seasonality directly. \\
+\vspace{.2cm}
+\textbf{Advantages} \\
+Forecasting is convenient, and AIC-based decisions about the presence of trend/seasonality become feasible. \\
+\vspace{.2cm}
+\textbf{Disadvantages} \\
+The decomposition is less transparent, and forecasting has a bit of a black-box flavor. \\
+
+\subsection{ARIMA(p,d,q)-models}
+ARIMA models are aimed at describing series that have a trend which can be removed by differencing, and where the differences can be described with an ARMA($p,q$)-model. \\
+\vspace{.2cm}
+\textbf{Definition}\\
+If
+$$Y_t = (1-B)^d X_t \sim ARMA(p,q),$$
+which for $d = 1$ means $Y_t = X_t - X_{t-1}$, then
+$$X_t \sim ARIMA(p,d,q)$$
+In most practical cases, using $d = 1$ will be enough! \\
+\vspace{.2cm}
+\textbf{Notation}\\
+$$\Phi(B)(1-B)^d X_t = \Theta(B) E_t$$
+\vspace{.2cm}
+\textbf{Stationarity}\\
+ARIMA processes are non-stationary if $d > 0$; they can be rewritten as non-stationary ARMA processes (see below).
+
+\subsubsection{Fitting ARIMA in R}
+\begin{enumerate}
+    \item Choose the appropriate order of differencing, usually $d = 1$ or (in rare cases) $d = 2$, such that the result is a stationary series (a sketch follows after this list).
+    \item Analyze ACF and PACF of the differenced series. If the stylized facts of an ARMA process are present, decide on the orders $p$ and $q$.
+    \item Fit the model using the \verb|arima()| procedure. This can be done on the original series by setting $d$ accordingly, or on the differences, by setting $d = 0$ and argument \verb|include.mean=FALSE|.
+    \item Analyze the residuals; these must look like White Noise. If several competing models are appropriate, use AIC to decide on the winner.
+\end{enumerate}
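+A minimal sketch of steps 1 and 2, assuming the logged oil prices from the example below are stored in \verb|lop|:
+\begin{lstlisting}[language=R]
+> d.lop <- diff(lop)       # differencing at lag 1, i.e. d = 1
+> plot(d.lop)              # visual check: does the series look stationary?
+> acf(d.lop); pacf(d.lop)  # read off candidate orders p and q
+\end{lstlisting}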
+
+\textbf{Example}\footnote{Full example in script, pages 117ff} \\
+Plausible models for the logged oil prices, after inspecting ACF/PACF of the differenced series (which seems stationary): ARIMA(1,1,1) or ARIMA(2,1,1)
+\begin{lstlisting}[language=R]
+> arima(lop, order=c(1,1,1))
+Coefficients:
+          ar1     ma1
+      -0.2987  0.5700
+s.e.   0.2009  0.1723
+sigma^2 = 0.006642: ll = 261.11, aic = -518.22
+\end{lstlisting}
+
+\subsubsection{Rewriting ARIMA as Non-Stationary ARMA}
+Any ARIMA($p,d,q$) model can be rewritten in the form of a non-stationary ARMA($p+d,q$) process. This provides some deeper insight, especially for the task of forecasting.
+
+\subsection{SARIMA(p,d,q)(P,D,Q)$^S$}
+We have learned that differencing can also be used to obtain a stationary series from one that features both a trend and a seasonal effect.
+\begin{enumerate}
+    \item Remove the seasonal effect by differencing at lag 12 \\ \begin{center}$Y_t = X_t - X_{t-12} = (1-B^{12})X_t$ \end{center}
+    \item Usually, further differencing at lag 1 is required to obtain a series that has a constant global mean and is stationary \\ \begin{center} $Z_t = Y_t - Y_{t-1} = (1-B)Y_t = (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}$ \end{center}
+\end{enumerate}
+The stationary series $Z_t$ is then modelled with a special kind of ARMA($p,q$) model. \\
+\vspace{.2cm}
+
+\textbf{Definition} \\
+A series $X_t$ follows a SARIMA($p,d,q$)($P,D,Q$)$^S$-process if the following equation holds:
+$$\Phi(B)\Phi_S(B^S) Z_t = \Theta(B)\Theta_S(B^S) E_t$$
+Here, the series $Z_t$ is obtained from $X_t$ by appropriate seasonal and trend differencing: $Z_t = (1-B)^d (1-B^S)^D X_t$ \\
+\vspace{.2cm}
+In most practical cases, using differencing order $d = D = 1$ will be sufficient. The orders $p,q,P,Q$ are chosen via ACF/PACF or via AIC-based decisions.
+
+\subsubsection{Fitting SARIMA}
+\begin{enumerate}
+    \item Perform seasonal differencing of the data. The lag $S$ is determined by the period. Order $D = 1$ is usually enough.
+    \item Decide if additional differencing at lag 1 is required for stationarity. If not, then $d = 0$. If yes, then try $d = 1$.
+    \item Analyze ACF/PACF of $Z_t$ to determine $p,q$ for the short-term dependency and $P,Q$ for the dependency at multiples of the period.
+    \item Fit the model using \verb|arima()| by setting \verb|order=c(p,d,q)| and \verb|seasonal=c(P,D,Q)| according to your choices (see the sketch after this list).
+    \item Check the accuracy of the model by residual analysis. The residuals must look like White Noise and approximately Gaussian.
+\end{enumerate}
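+A minimal sketch of steps 4 and 5, assuming a hypothetical monthly series \verb|x| (so $S = 12$, taken from \verb|frequency(x)|) and choosing all orders equal to 1:
+\begin{lstlisting}[language=R]
+> fit <- arima(x, order=c(1,1,1), seasonal=c(1,1,1))
+> tsdiag(fit)  # standardized residuals, ACF of residuals, Ljung-Box p-values
+\end{lstlisting}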
+
+\section{ARCH/GARCH-models}
+The basic assumption for ARCH/GARCH models is as follows:
+$$X_t = \mu_t + E_t$$
+where $E_t = \sigma_t W_t$ and $W_t$ is White Noise. \\
+Here, both the conditional mean and the conditional variance are non-trivial
+$$\mu_t = E[X_t | X_{t-1},X_{t-2},\dots], \quad \sigma_t^2 = Var[X_t | X_{t-1},X_{t-2},\dots]$$
+and can be modelled using a mixture of ARMA and GARCH. \\
+\vspace{.2cm}
+For simplicity, we here assume that both the conditional and the global mean are zero, $\mu = \mu_t = 0$, and consider pure ARCH processes only, where:
+$$X_t = \sigma_t W_t \; \mathrm{with} \; \sigma_t = f(X_{t-1}^2,X_{t-2}^2,\dots,X_{t-p}^2)$$
+
+\subsection{ARCH(p)-model}
+A time series $X_t$ is \textit{autoregressive conditional heteroskedastic} of order $p$, abbreviated ARCH($p$), if:
+$$X_t = \sigma_t W_t$$
+with $\sigma_t = \sqrt{\alpha_0 + \sum_{i=1}^p \alpha_i X_{t-i}^2}$. \\
+It is obvious that an ARCH($p$) process shows volatility, as the conditional variance depends on the past observations:
+$$Var(X_t \,|\, X_{t-1},X_{t-2},\dots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2$$
+
+We can determine the order of an ARCH($p$) process by analyzing ACF and PACF of the squared time series. We then again search for an exponential decay in the ACF and a cut-off in the PACF.
+
+\subsubsection{Fitting an ARCH(2)-model}
+The simplest option for fitting an ARCH($p$) in R is to use function \verb|garch()| from \verb|library(tseries)|. Be careful, because the \verb|order=c(q,p)| argument differs from most of the literature: the ARCH order $p$ is the second element.
+\begin{lstlisting}[language=R]
+> fit <- garch(lret.smi, order = c(0,2))
+> fit
+Call: garch(x = lret.smi, order = c(0, 2))
+
+Coefficient(s):
+       a0         a1         a2
+6.568e-05  1.309e-01  1.074e-01
+\end{lstlisting}
+We recommend running a residual analysis afterwards.
 \section{General concepts}
 \subsection{AIC}
-The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows:
+The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows (\textbf{lower is better}):
 $$AIC = -2 \log (L) + 2(p+q+k+1)$$
 where
 \begin{itemize}
@@ -960,6 +1076,7 @@ where
 \end{itemize}
 For small samples $n$, often a corrected version is used:
 $$AICc = AIC + \frac{2(p + q + k + 1)(p + q + k + 2)}{n - p - q - k - 2}$$
+
 \scriptsize
 \newpage