In R, finding the AIC-minimizing $ARMA(p,q)$-model is convenient with the use of \verb|auto.arima()|.
\vspace{.2cm}
Using \verb|auto.arima()| should always be complemented by visual inspection of the time series for assessing stationarity, verifying the ACF/PACF plots for a second thought on suitable models. Finally, model diagnostics with the usual residual plots will decide whether the model is useful in practice.
\section{Time series regression}
We speak of time series regression if the response and the predictors are time series, i.e. if they were observed sequentially over time.
\subsection{Model}
In principle, it is perfectly fine to apply the usual OLS setup:
$$Y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_p x_{tp} + E_t$$
Be careful: this assumes that the errors $E_t$ are uncorrelated (often not the case)! \\
\vspace{.2cm}
With correlated errors, the estimates $\hat{\beta}_j$ are still unbiased, but more efficient estimators than OLS exist. Moreover, the OLS standard errors are wrong and often underestimated, which leads to spurious significance. $\rightarrow$ GLS!
\begin{itemize}
\item The series $Y_t, x_{t1} ,\dots, x_{tp}$ can be stationary or non-stationary.
\item It is crucial that there is no feedback from the response $Y_t$ to the predictor variables $x_{t1},\dots, x_{tp}$ , i.e. we require an input/output system.
\item $E_t$ must be stationary and independent of $x_{t1},\dots, x_{tp}$, but may be non-white-noise, i.e. show some serial correlation.
\end{itemize}
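A minimal sketch of the naive OLS fit with \verb|lm()|, assuming an illustrative data frame \verb|dat| with a response \verb|y| and two predictor series \verb|x1|, \verb|x2| (names not from the script):
\begin{lstlisting}[language=R]
> ## naive OLS fit that ignores possible error correlation
> ## (illustrative data frame 'dat' with columns y, x1, x2)
> fit.lm <- lm(y ~ x1 + x2, data=dat)
> summary(fit.lm)  # standard errors only valid for uncorrelated errors
\end{lstlisting}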
\subsubsection{Finding correlated errors}
\begin{enumerate}
\item Start by fitting an OLS regression and analyze residuals
\item Continue with a time series plot of OLS residuals
\item Also analyze ACF and PACF of OLS residuals (a sketch of these steps follows below)
\end{enumerate}
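A minimal sketch of these steps, based on the illustrative OLS fit \verb|fit.lm| from above:
\begin{lstlisting}[language=R]
> ## residual diagnostics for the OLS fit
> r.lm <- resid(fit.lm)
> plot(r.lm, type="l")  # time series plot of the residuals
> acf(r.lm)             # ACF: is there serial correlation?
> pacf(r.lm)            # PACF: hints at a suitable AR/ARMA structure
\end{lstlisting}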
\subsubsection{Durbin-Watson test}
The Durbin-Watson approach is a test for autocorrelated errors in regression modeling based on the test statistic:
$$D = \frac{\sum_{t=2}^N (r_t - r_{t-1})^2}{\sum_{t=1}^N r_t^2} \approx 2(1-\hat{\rho}_1) \in [0,4]$$
\begin{itemize}
\item This is implemented in R: \verb|dwtest()| in \verb|library(lmtest)|. A p-value for the null of no autocorrelation is computed.
\item This test does not detect all autocorrelation structures. If the null is not rejected, the residuals may still be autocorrelated.
\item Never forget to check ACF/PACF of the residuals! (Test has only limited power)
\end{itemize}
Example:
\begin{lstlisting}[language=R]
> library(lmtest)
> dwtest(fit.lm)
data: fit.lm
DW = 0.5785, p-value < 2.2e-16
alt. hypothesis: true autocorrelation is greater than 0
\end{lstlisting}
\subsubsection{Cochrane-Orcutt method}
This is a simple, iterative approach for correctly dealing with time series regression. We consider the pollutant example:
$$Y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + E_t$$
with
$$E_t = \alpha E_{t-1} + U_t$$
and $U_t \sim N(0, \sigma_U^2)$ i.i.d. \\
\vspace{.2cm}
The fundamental trick is to use the transformation\footnote{See script for more details}:
$$Y_t' = Y_t - \alpha Y_{t-1}, \quad x'_{tj} = x_{tj} - \alpha x_{t-1,j}$$
This leads to a regression problem with i.i.d. errors:
$$Y_t' = \beta_0' + \beta_1 x'_{t1} + \beta_2 x'_{t2} + U_t$$
with $\beta_0' = \beta_0(1-\alpha)$.
The idea is to first run an OLS regression, estimate $\alpha$ (and thus the transformation) from its residuals, and finally obtain corrected estimates from the transformed regression.
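A minimal sketch of this procedure under an AR(1) error assumption, reusing the illustrative OLS fit \verb|fit.lm| and data frame \verb|dat| from above:
\begin{lstlisting}[language=R]
> ## estimate alpha from an AR(1) fit to the OLS residuals
> alpha <- ar(resid(fit.lm), order.max=1, aic=FALSE)$ar
> ## transform response and predictors, then refit with OLS
> n     <- nrow(dat)
> dat.t <- data.frame(y  = dat$y[-1]  - alpha*dat$y[-n],
+                     x1 = dat$x1[-1] - alpha*dat$x1[-n],
+                     x2 = dat$x2[-1] - alpha*dat$x2[-n])
> fit.co <- lm(y ~ x1 + x2, data=dat.t)  # errors now approx. i.i.d.
\end{lstlisting}
In practice, these steps can be iterated until the estimate of $\alpha$ stabilizes.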
\subsection{Generalized least squares (GLS)}
OLS regression assumes uncorrelated errors with constant variance, i.e. $Var(E) = \sigma^2 I$; GLS generalizes this to $Var(E) = \sigma^2 \Sigma$ with a general correlation structure. \\
To use the GLS approach, i.e. to correct for the dependent errors, we need an estimate of the error covariance matrix $\Sigma = SS^T$. \\
We can then obtain the (simultaneous) estimates:
$$\hat{\beta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
with $Var(\hat{\beta}) = \sigma^2 (X^T \Sigma^{-1} X)^{-1}$.
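For illustration only, this formula could be evaluated directly, assuming a design matrix \verb|X|, a response \verb|y| and an estimate of \verb|Sigma| were available (in practice, \verb|gls()| is used as shown below):
\begin{lstlisting}[language=R]
> ## direct evaluation of the GLS formula (illustration only)
> Sig.inv  <- solve(Sigma)
> beta.gls <- solve(t(X) %*% Sig.inv %*% X) %*% t(X) %*% Sig.inv %*% y
\end{lstlisting}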
\subsubsection{R example}
\begin{lstlisting}[language=R]
> library(nlme)
> ## AR(2) correlation structure for the errors, ordered by 'time'
> corStruct <- corARMA(form=~time, p=2)
> fit.gls <- gls(temp~time+season, data=dat, correlation=corStruct)
\end{lstlisting}
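Here, \verb|corARMA(form=~time, p=2)| specifies AR(2)-correlated errors; the choice of the order would typically be guided by the ACF/PACF of the OLS residuals. Assuming the fit succeeds, the corrected inference can then be inspected, e.g. with:
\begin{lstlisting}[language=R]
> summary(fit.gls)    # coefficients with corrected standard errors
> intervals(fit.gls)  # confidence intervals, incl. the AR parameters
\end{lstlisting}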
\section{General concepts}
\subsection{AIC}
The \textit{Akaike-information-criterion} is useful for determining the order of an $ARMA(p,q)$ model. The formula is as follows: