From 92099f50942277304095bfb6ed226d6ffed716a4 Mon Sep 17 00:00:00 2001
From: jannisp
Date: Mon, 23 Aug 2021 19:21:53 +0200
Subject: [PATCH] Begin ts regression

---
 main.tex | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/main.tex b/main.tex
index 704482f..a3f2d21 100644
--- a/main.tex
+++ b/main.tex
@@ -882,6 +882,73 @@ In R, finding the AIC-minimizing $ARMA(p,q)$-model is convenient with the use of
 \vspace{.2cm}
 Using \verb|auto.arima()| should always be complemented by visual inspection of the time series for assessing stationarity, verifying the ACF/PACF plots for a second thought on suitable models. Finally, model diagnostics with the usual residual plots will decide whether the model is useful in practice.
+\section{Time series regression}
+We speak of time series regression if response and predictors are time series, i.e. if they were observed in a sequence.
+\subsection{Model}
+In principle, it is perfectly fine to apply the usual OLS setup:
+$$Y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_p x_{tp} + E_t$$
+Be careful: this assumes that the errors $E_t$ are uncorrelated (often not the case)! \\
+\vspace{.2cm}
+With correlated errors, the estimates $\hat{\beta}_j$ are still unbiased, but more efficient estimators than OLS exist. Moreover, the standard errors are wrong, often underestimated, causing spurious significance. $\rightarrow$ GLS!
+\begin{itemize}
+    \item The series $Y_t, x_{t1}, \dots, x_{tp}$ can be stationary or non-stationary.
+    \item It is crucial that there is no feedback from the response $Y_t$ to the predictor variables $x_{t1}, \dots, x_{tp}$, i.e. we require an input/output system.
+    \item $E_t$ must be stationary and independent of $x_{t1}, \dots, x_{tp}$, but may be non-white noise with some serial correlation.
+\end{itemize}
+
+\subsubsection{Finding correlated errors}
+\begin{enumerate}
+    \item Start by fitting an OLS regression and analyzing its residuals
+    \item Continue with a time series plot of the OLS residuals
+    \item Also analyze the ACF and PACF of the OLS residuals
+\end{enumerate}
+
+\subsubsection{Durbin-Watson test}
+The Durbin-Watson approach is a test for autocorrelated errors in regression modeling, based on the test statistic:
+$$D = \frac{\sum_{t=2}^N (r_t - r_{t-1})^2}{\sum_{t=1}^N r_t^2} \approx 2(1-\hat{\rho}_1) \in [0,4]$$
+
+\begin{itemize}
+    \item This is implemented in R: \verb|dwtest()| in \verb|library(lmtest)|. A p-value for the null hypothesis of no autocorrelation is computed.
+    \item This test does not detect all autocorrelation structures. If the null is not rejected, the residuals may still be autocorrelated.
+    \item Never forget to check the ACF/PACF of the residuals! (The test has only limited power.)
+\end{itemize}
+Example:
+\begin{lstlisting}[language=R]
+> library(lmtest)
+> dwtest(fit.lm)
+data: fit.lm
+DW = 0.5785, p-value < 2.2e-16
+alt. hypothesis: true autocorrelation is greater than 0
+\end{lstlisting}
+
+\subsubsection{Cochrane-Orcutt method}
+This is a simple, iterative approach for correctly dealing with time series regression. We consider the pollutant example:
+$$Y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + E_t$$
+with
+$$E_t = \alpha E_{t-1} + U_t$$
+and $U_t \sim N(0, \sigma_U^2)$ i.i.d. \\
+\vspace{.2cm}
+The fundamental trick is using the transformation\footnote{See script for more details}:
+$$Y_t' = Y_t - \alpha Y_{t-1}$$
+This leads to a regression problem with i.i.d. errors:
+$$Y_t' = \beta_0' + \beta_1 x'_{t1} + \beta_2 x'_{t2} + U_t$$
+The idea is to run an OLS regression first, determine the transformation from the residuals and finally obtain corrected estimates.
+
+\subsection{Generalized least squares (GLS)}
+OLS regression assumes a diagonal error covariance matrix, but there is a generalization to $Var(E) = \sigma^2 \Sigma$. \\
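+A short derivation (standard GLS algebra, added here as a sketch rather than taken from the script) shows why this generalization works: writing $\Sigma = SS^T$ and premultiplying the regression equation by $S^{-1}$ gives
+$$S^{-1} y = S^{-1} X \beta + S^{-1} E,$$
+where the transformed errors satisfy $Var(S^{-1} E) = \sigma^2 S^{-1} \Sigma (S^{-1})^T = \sigma^2 I$, so ordinary OLS on the transformed data is valid again. The Cochrane-Orcutt transformation above is exactly this idea for the special case of AR(1) errors.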
+For using the GLS approach, i.e. for correcting for the dependent errors, we need an estimate of the error covariance matrix $\Sigma = SS^T$. \\
+We can then obtain the (simultaneous) estimates:
+$$\hat{\beta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
+with $Var(\hat{\beta}) = (X^T \Sigma^{-1} X)^{-1} \sigma^2$.
+
+\subsubsection{R example}
+\begin{lstlisting}[language=R]
+> library(nlme)
+> corStruct <- corARMA(form=~time, p=2)
+> fit.gls <- gls(temp~time+season, data=dat, correlation=corStruct)
+\end{lstlisting}
+
 \section{General concepts}
 \subsection{AIC}
 The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows: