Begin ts regression

2021-08-23 19:21:53 +02:00 · 2021-08-23 19:21:53 +02:00 · 92099f5094
commit 92099f5094
parent fe179958fb
1 changed files with 67 additions and 0 deletions
--- a/main.tex
+++ b/main.tex
@ -882,6 +882,73 @@ In R, finding the AIC-minimizing $ARMA(p,q)$-model is convenient with the use of
 \vspace{.2cm}
 Using \verb|auto.arima()| should always be complemented by visual inspection of the time series for assessing stationarity, verifying the ACF/PACF plots for a second thought on suitable models. Finally, model diagnostics with the usual residual plots will decide whether the model is useful in practice.

+\section{Time series regression}
+We speak of time series regression if the response and the predictors are time series, i.e. if they were observed sequentially in time.
+\subsection{Model}
+In principle, it is perfectly fine to apply the usual OLS setup:
+$$Y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_p x_{tp} + E_t$$
+Be careful: this assumes that the errors $E_t$ are uncorrelated (often not the case)! \\
+\vspace{.2cm}
+With correlated errors, the OLS estimates $\hat{\beta}_j$ are still unbiased, but more efficient estimators exist. The reported standard errors are wrong, often underestimated, which causes spurious significance. $\rightarrow$ GLS!
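+The following Monte Carlo sketch illustrates both claims on simulated data. All settings (sample size 100, AR(1) parameter 0.8 for both the predictor and the errors, true slope 2) are assumptions chosen for illustration:
+\begin{lstlisting}[language=R]
+## Monte Carlo sketch (all settings are assumptions)
+set.seed(1)
+n <- 100; nrep <- 500; beta1 <- 2
+est <- se <- numeric(nrep)
+for (i in seq_len(nrep)) {
+  x <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))  # autocorrelated predictor
+  e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))  # AR(1) errors, independent of x
+  y <- 1 + beta1 * x + e
+  cf <- summary(lm(y ~ x))$coefficients
+  est[i] <- cf["x", "Estimate"]      # OLS slope estimate
+  se[i]  <- cf["x", "Std. Error"]    # standard error reported by OLS
+}
+mean(est)  # close to beta1: OLS remains unbiased
+sd(est)    # actual variability of the slope estimates
+mean(se)   # average reported standard error: much smaller than sd(est)
+\end{lstlisting}
+The average of the estimates should be close to the true slope, while the average reported OLS standard error comes out clearly smaller than the actual spread of the estimates.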
+\begin{itemize}
+    \item The series $Y_t, x_{t1} ,\dots, x_{tp}$ can be stationary or non-stationary.
+    \item It is crucial that there is no feedback from the response $Y_t$ to the predictor variables $x_{t1},\dots, x_{tp}$, i.e. we require an input/output system.
+    \item $E_t$ must be stationary and independent of $x_{t1},\dots, x_{tp}$, but it need not be white noise, i.e. it may show serial correlation.
+\end{itemize}
+
+\subsubsection{Finding correlated errors}
+\begin{enumerate}
+    \item Start by fitting an OLS regression and analyze the residuals.
+    \item Continue with a time series plot of the OLS residuals.
+    \item Also analyze the ACF and PACF of the OLS residuals.
+\end{enumerate}
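+A sketch of these three steps in R on simulated data; the variables \verb|x1|, \verb|x2|, the coefficients and the AR(1) error parameter 0.6 are hypothetical:
+\begin{lstlisting}[language=R]
+## Simulated data (hypothetical): AR(1) errors with parameter 0.6
+set.seed(42)
+n  <- 200
+x1 <- rnorm(n); x2 <- rnorm(n)
+e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
+y  <- 3 + 1.5 * x1 - 0.5 * x2 + e
+
+fit.lm <- lm(y ~ x1 + x2)             # 1. OLS fit
+res <- ts(resid(fit.lm))
+par(mfrow = c(2, 2))
+plot(fit.lm, which = 1)               #    usual residuals vs. fitted plot
+plot(res, ylab = "OLS residuals")     # 2. time series plot of the residuals
+acf(res, main = "ACF of residuals")   # 3. slowly decaying ACF hints at AR errors
+pacf(res, main = "PACF of residuals") #    cut-off after lag p suggests AR(p)
+\end{lstlisting}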
+
+\subsubsection{Durbin-Watson test}
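+The Durbin-Watson approach is a test for autocorrelated errors in regression modeling, based on the test statistic
+$$D = \frac{\sum_{t=2}^N (r_t - r_{t-1})^2}{\sum_{t=1}^N r_t^2} \approx 2(1-\hat{\rho}_1) \in [0,4],$$
+where $r_t$ are the OLS residuals and $\hat{\rho}_1$ is their lag-1 autocorrelation. Values near 2 indicate no autocorrelation, values well below 2 indicate positive autocorrelation.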
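+To see what $D$ measures, it can be computed by hand from OLS residuals. The sketch below uses simulated data (AR(1) errors with parameter 0.7, an assumption) and compares $D$ with $2(1-\hat{\rho}_1)$:
+\begin{lstlisting}[language=R]
+## Simulated data (hypothetical): AR(1) errors with parameter 0.7
+set.seed(7)
+n <- 150
+x <- rnorm(n)
+y <- 2 + x + as.numeric(arima.sim(model = list(ar = 0.7), n = n))
+r <- resid(lm(y ~ x))                  # OLS residuals r_t
+
+D    <- sum(diff(r)^2) / sum(r^2)      # Durbin-Watson statistic
+rho1 <- sum(r[-1] * r[-n]) / sum(r^2)  # lag-1 autocorrelation of the residuals
+c(D = D, approx = 2 * (1 - rho1))      # D is far below 2 here
+\end{lstlisting}
+The two values differ only by the boundary term $(r_1^2 + r_N^2)/\sum_t r_t^2$, which is small for long series.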
+
+\begin{itemize}
+    \item This is implemented in R: \verb|dwtest()| in \verb|library(lmtest)|. A p-value for the null of no autocorrelation is computed.
+    \item This test does not detect all autocorrelation structures. If the null is not rejected, the residuals may still be autocorrelated.
+    \item Never forget to check the ACF/PACF of the residuals, since the test has only limited power!
+\end{itemize}
+Example:
+\begin{lstlisting}[language=R]
+> library(lmtest)
+> dwtest(fit.lm)
+data: fit.lm
+DW = 0.5785, p-value < 2.2e-16
+alt. hypothesis: true autocorrelation is greater than 0
+\end{lstlisting}
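+Reading the output with the approximation $D \approx 2(1-\hat{\rho}_1)$ gives a rough estimate of the lag-1 autocorrelation of the residuals:
+$$\hat{\rho}_1 \approx 1 - \frac{D}{2} = 1 - \frac{0.5785}{2} \approx 0.71,$$
+i.e. strong positive autocorrelation, consistent with the tiny p-value.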
+
+\subsubsection{Cochrane-Orcutt method}
+This is a simple iterative approach for time series regression with AR(1) errors. We consider the pollutant example:
+$$Y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + E_t$$
+with
+$$E_t = \alpha E_{t-1} + U_t$$
+and $U_t \sim N(0, \sigma_U^2)$ i.i.d. \\
+\vspace{.2cm}
+The fundamental trick is to transform the response and the predictors\footnote{See script for more details}:
+$$Y_t' = Y_t - \alpha Y_{t-1}, \qquad x'_{tj} = x_{tj} - \alpha x_{t-1,j}$$
+This leads to a regression problem with i.i.d. errors:
+$$Y_t' = \beta_0' + \beta_1 x'_{t1} + \beta_2 x'_{t2} + U_t$$
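+Where the transformation comes from (a sketch of the standard argument): subtracting $\alpha$ times the model equation at time $t-1$ from the one at time $t$ gives
+$$Y_t - \alpha Y_{t-1} = \beta_0(1-\alpha) + \beta_1 (x_{t1} - \alpha x_{t-1,1}) + \beta_2 (x_{t2} - \alpha x_{t-1,2}) + (E_t - \alpha E_{t-1}).$$
+Since $E_t - \alpha E_{t-1} = U_t$, the errors of the transformed model are i.i.d., the slopes $\beta_1, \beta_2$ are unchanged and the new intercept is $\beta_0' = \beta_0(1-\alpha)$.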
+The idea is to run an OLS regression first, estimate $\alpha$ from its residuals, transform the data and refit by OLS to obtain corrected estimates; these steps are iterated until the estimate of $\alpha$ converges.
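+A minimal Cochrane-Orcutt loop written out by hand, as a sketch only: it uses simulated data instead of the pollutant data, and the true coefficients, $\alpha = 0.7$, the tolerance $10^{-6}$ and the cap of 20 iterations are assumptions:
+\begin{lstlisting}[language=R]
+## Simulated data (hypothetical), not the pollutant data
+set.seed(3)
+n  <- 200
+x1 <- rnorm(n); x2 <- rnorm(n)
+E  <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # AR(1) errors
+y  <- 10 + 2 * x1 - 1 * x2 + E
+
+beta  <- coef(lm(y ~ x1 + x2))    # step 0: plain OLS
+alpha <- 0
+for (it in 1:20) {
+  r <- drop(y - cbind(1, x1, x2) %*% beta)        # residuals of the original model
+  alpha.new <- sum(r[-1] * r[-n]) / sum(r[-n]^2)   # AR(1) coefficient of the residuals
+  ys  <- y[-1]  - alpha.new * y[-n]                # transformed response Y'_t
+  x1s <- x1[-1] - alpha.new * x1[-n]               # transformed predictors x'_tj
+  x2s <- x2[-1] - alpha.new * x2[-n]
+  fit.co <- lm(ys ~ x1s + x2s)                     # OLS with (nearly) i.i.d. errors
+  beta <- coef(fit.co)
+  beta[1] <- beta[1] / (1 - alpha.new)             # beta_0 = beta_0' / (1 - alpha)
+  if (abs(alpha.new - alpha) < 1e-6) break         # stop once alpha has converged
+  alpha <- alpha.new
+}
+alpha.new   # estimate of alpha
+beta        # corrected estimates of beta_0, beta_1, beta_2
+\end{lstlisting}
+The transformation loses the first observation, and the intercept reported by \verb|summary(fit.co)| is $\beta_0'$, not $\beta_0$.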
+
+\subsection{Generalized least squares (GLS)}
+OLS regression assumes the error covariance matrix $Var(E) = \sigma^2 I$, but there is a generalization to $Var(E) = \sigma^2 \Sigma$. \\
+To apply the GLS approach, i.e. to account for the correlated errors, we need an estimate of the (scaled) error covariance matrix $\Sigma = SS^T$. Premultiplying the model by $S^{-1}$ yields uncorrelated errors with constant variance, so OLS on the transformed data is valid. \\
+We can then obtain the (simultaneous) estimates:
+$$\hat{\beta} =(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
+with $Var(\hat{\beta}) = (X^T \Sigma^{-1} X)^{-1} \sigma^2$.
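+A sketch of the GLS formula with a known AR(1) matrix $\Sigma_{ij} = \alpha^{|i-j|}$ ($\alpha = 0.6$ is assumed known and the data are simulated; in practice $\Sigma$ must be estimated). It also checks the role of $S$: OLS on $S^{-1}X$ and $S^{-1}y$ gives the same $\hat{\beta}$:
+\begin{lstlisting}[language=R]
+## Simulated data (hypothetical); Sigma assumed known: AR(1) with alpha = 0.6
+set.seed(11)
+n <- 100; alpha <- 0.6
+X <- cbind(1, rnorm(n))                                   # design matrix with intercept
+y <- drop(X %*% c(1, 2)) + as.numeric(arima.sim(model = list(ar = alpha), n = n))
+
+Sigma <- alpha^abs(outer(1:n, 1:n, "-"))                  # Sigma_ij = alpha^|i-j|
+Si    <- solve(Sigma)
+beta.gls <- solve(t(X) %*% Si %*% X, t(X) %*% Si %*% y)  # GLS formula
+
+S  <- t(chol(Sigma))              # chol() gives upper R with Sigma = R^T R, so S = R^T
+Xw <- forwardsolve(S, X)          # S^{-1} X
+yw <- forwardsolve(S, y)          # S^{-1} y
+beta.w <- coef(lm(yw ~ Xw - 1))   # OLS on the transformed data
+cbind(beta.gls, beta.w)           # identical up to rounding
+\end{lstlisting}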
+
+\subsubsection{R example}
+\begin{lstlisting}[language=R]
+> library(nlme)
+> corStruct <- corARMA(form = ~time, p = 2)   # AR(2) errors, ordered by time
+> fit.gls <- gls(temp ~ time + season, data = dat, correlation = corStruct)
+> summary(fit.gls)
+\end{lstlisting}
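+Since \verb|dat| is not reproduced here, a self-contained variant on simulated data may help. The AR(1) structure via \verb|corAR1()| and all simulation settings are assumptions, not the model from the script:
+\begin{lstlisting}[language=R]
+library(nlme)
+## Simulated data (hypothetical): AR(1) predictor and AR(1) errors
+set.seed(5)
+n   <- 200
+dat <- data.frame(time = 1:n,
+                  x = as.numeric(arima.sim(model = list(ar = 0.8), n = n)))
+dat$y <- 1 + 2 * dat$x + as.numeric(arima.sim(model = list(ar = 0.8), n = n))
+
+fit.ols <- lm(y ~ x, data = dat)
+fit.gls <- gls(y ~ x, data = dat, correlation = corAR1(form = ~ time))
+coef(summary(fit.ols))       # OLS: standard errors ignore the autocorrelation
+summary(fit.gls)$tTable      # GLS: estimates with corrected standard errors
+coef(fit.gls$modelStruct$corStruct, unconstrained = FALSE)  # estimated AR(1) coefficient
+\end{lstlisting}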
+
+
 \section{General concepts}
 \subsection{AIC}
 The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows: