\end{lstlisting}
We recommend running a residual analysis afterwards.
\section{Forecasting}
\begin{tabular}{lp{.26\textwidth}}
Goal: & Point predictions for future observations with a measure of uncertainty, i.e. a 95\% prediction interval. \\
Note: & - A point prediction is essentially the mean of the predictive distribution \\
& - builds on the dependency structure and past data \\
& - is an extrapolation, thus to be taken with a grain of salt \\
& - similar to driving a car by looking only in the side mirror \\
\end{tabular}
\textbf{Notation}
\begin{figure}[H]
\centering
\includegraphics[width=.25\textwidth]{forecast-notation.png}
\caption{Forecasting notation}
\label{fig:forecast-notation}
\end{figure}
\subsection{Sources of uncertainty in forecasting}
\begin{enumerate}
\item Does the data generating process from the past also apply in the future? Or are there major disruptions and discontinuities?
\item Is the model we chose correct? This applies both to the class of models (i.e. ARMA($p,q$)) as well as to the order of the model.
\item Are the model coefficients (e.g. $\alpha_1, \dots, \alpha_p; \beta_1, \dots, \beta_q; \sigma_E^2; m$) well estimated and accurate? How much do they differ from the «truth»?
\item The stochastic variability coming from the innovation $E_t$.
\end{enumerate}
Due to these major uncertainties, forecasting will usually only work reasonably well on a short-term basis.
\subsection{Basics}
Probabilistic principle for deriving point forecasts:
$$\hat{X}_{n+k;1:n} = E[X_{n+k} | X_1, \dots, X_n]$$
\begin{itemize}
\item The point forecast will be based on the conditional mean.
\end{itemize}
Probabilistic principle for deriving prediction intervals:
$$\hat{\sigma}^2_{\hat{X}_{n+k;1:n}} = Var[X_{n+k} | X_1, \dots, X_n]$$
An (approximate) 95\% prediction interval is obtained via:
$$\hat{X}_{n+k;1:n} \pm 1.96 \cdot \hat{\sigma}_{\hat{X}_{n+k;1:n}}$$
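In R, this corresponds to (a minimal sketch; \verb|fit| is a hypothetical fitted model object):
\begin{lstlisting}[language=R]
pred <- predict(fit, n.ahead = 1)      # point forecast and standard error
pred$pred + c(-1.96, 1.96) * pred$se   # approximate 95% prediction interval
\end{lstlisting}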
\subsubsection{How to apply the principles?}
\begin{itemize}
\item The principles provide a generic setup, but are only useful and practicable under additional assumptions and have to be operationalized for every time series model/process.
\item For stationary AR(1) processes with normally distributed innovations, we can apply the generic principles with relative ease and derive formulae for the point forecast and the prediction interval.
\end{itemize}
\subsection{AR(p) forecasting}
The principles are the same; forecast and prediction interval are:
$$E[X_{n+k} | X_1, \dots, X_n]$$
and
$$Var[X_{n+k} | X_1, \dots, X_n]$$
The computations are a bit more complicated, but do not yield major further insight. We thus omit them and present: \\
\vspace{.2cm}
\begin{tabular}{ll}
1-step-forecast: & $\hat{X}_{n+1;1:n} = \alpha_1 x_n + \dots + \alpha_p x_{n+1-p}$ \\
k-step-forecast: & $\hat{X}_{n+k;1:n} = \alpha_1 \hat{X}_{n+k-1;1:n} + \dots + \alpha_p \hat{X}_{n+k-p;1:n}$
\end{tabular} \\
\vspace{.2cm}
If an observed value $x_{n+k-t}$ is available, we plug it in; otherwise, the forecasted value is used. Hence, the forecasts for horizons $k > 1$ are determined recursively.
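As an illustration, here is a minimal sketch of this recursion in R (assuming an AR(2) fit to the log-transformed \verb|lynx| series; \verb|predict()| performs the same computation internally):
\begin{lstlisting}[language=R]
fit <- arima(log(lynx), order = c(2, 0, 0))
a <- coef(fit)[1:2]                    # AR coefficients
m <- coef(fit)["intercept"]            # global mean
x <- log(lynx) - m                     # de-meaned series
for (k in 1:5)                         # recursive 5-step forecast
  x <- c(x, sum(a * rev(tail(x, 2))))
tail(x, 5) + m                         # matches predict(fit, n.ahead = 5)$pred
\end{lstlisting}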
\subsubsection{Measuring forecast error}
\textbf{When on absolute scale (no log-transformation)}:
$$MAE = \frac{1}{h} \sum_{t=n+1}^{n+h}|x_t - \hat{x}_t| = mean(|e_t|)$$
$$RMSE = \sqrt{\frac{1}{h} \sum_{t=n+1}^{n+h} (x_t - \hat{x}_t)^2} = \sqrt{mean(e_t^2)}$$
in R:
\begin{lstlisting}[language=R]
> mae <- mean(abs(btest-pred$pred)); mae
[1] 0.07202408
\end{lstlisting}
\begin{lstlisting}[language=R]
> rmse <- sqrt(mean((btest-pred$pred)^2)); rmse
[1] 0.1044069
\end{lstlisting}
or using the \verb|accuracy()| function from the \verb|forecast| package (look for the «Test set» values):
\begin{lstlisting}[language=R]
> round(accuracy(forecast(fit, h=14), btest),3)
ME RMSE MAE MPE MAPE MASE ACF1
Training 0.004 0.096 0.062 0.012 0.168 0.939 -0.068
Test set 0.049 0.104 0.072 0.132 0.195 1.092 0.337
\end{lstlisting}
\textbf{When on log-scale}:
$$MAPE = \frac{100}{h}\sum_{t=n+1}^{n+h} \bigg|\frac{x_t - \hat{x_t}}{x_t} \bigg|$$
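In R (a sketch, using the same \verb|btest| and \verb|pred| objects as above):
\begin{lstlisting}[language=R]
mape <- 100 * mean(abs((btest - pred$pred) / btest)); mape
\end{lstlisting}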
\subsubsection{Going back to the original scale}
\begin{itemize}
\item If a time series gets log-transformed, we will study its character and its dependencies on the transformed scale. This is also where we will fit time series models.
\item If forecasts are produced, one is most often interested in the value on the original scale. Caution is needed here: \\ $\exp(\hat{x}_t)$ yields a biased forecast, namely the median of the forecast distribution. This is the value that 50\% of the realizations will lie above and 50\% below. For an unbiased forecast, i.e. the mean, we need:
\end{itemize}
$$\exp(\hat{x}_t)\bigg(1 + \frac{\hat{\sigma}_k^2}{2} \bigg)$$
where $\hat{\sigma}_k^2$ is the $k$-step forecast variance.
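A minimal sketch in R (assuming \verb|pred| comes from \verb|predict()| on a model fitted to the log-transformed series; \verb|pred$se| is the $k$-step forecast standard error):
\begin{lstlisting}[language=R]
exp(pred$pred)                         # biased: median of forecast distribution
exp(pred$pred) * (1 + pred$se^2 / 2)   # approximately unbiased: mean
\end{lstlisting}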
\subsubsection{Remarks}
\begin{itemize}
\item AR($p$) processes have a Markov property. Given the model parameters, we only need to know the last $p$ observations in the series to compute the forecast and prognosis interval.
\item The prediction intervals are only valid on a pointwise basis, and they generally only cover the uncertainty coming from the innovations, but not from the other sources. Hence, they are generally too narrow.
\item Retaining the final part of the series and predicting it with several competing models may give hints as to which one yields the best forecasts. This can be an alternative approach for choosing the model order $p$.
\end{itemize}
\subsection{Forecasting MA(q) and ARMA(p,q)}
\begin{itemize}
\item Point and interval forecasts will again, as for AR($p$), be derived from the theory of conditional mean and variance.
\item The derivation is more complicated, as it involves the latent innovation terms $e_n, e_{n-1}, e_{n-2}, \dots$ or, alternatively, the unobserved time series values $x_{-\infty}, \dots, x_{-1}, x_0$.
\item Under invertibility of the MA($q$) part, the forecasting problem can be solved approximately but reasonably well by choosing starting values $x_{-\infty} = \dots = x_{-1} = x_0 = 0$.
\end{itemize}
\subsubsection{MA(1) example}
\begin{itemize}
\item We have seen that for all non-shifted MA($1$) processes, the $k$-step forecast is trivial and equal to $0$ for all $k>1$.
\item In case of $k=1$, we obtain for the MA($1$)-forecast: \\
\begin{center}
$\hat{X}_{n+1;1:n} = \beta_1 E[E_n | X_1,\dots,X_n]$
\end{center}
This conditional expectation is (too) difficult to compute, but we can work around this by conditioning on the infinite past:
\begin{center}$e_n := E[E_n | X_{-\infty},\dots,X_n]$\end{center}
\item We then express the MA($1$) as an AR($\infty$) and obtain (see the sketch after this list):
\begin{center}
$\hat{X}_{n+1;1:n} = \sum_{j=0}^{n-1} \hat{\beta}_1(-\hat{\beta}_1)^j x_{n-j} = \sum_{j=0}^{n-1} \hat{\Psi}_j^{(1)} x_{n-j}$
\end{center}
\end{itemize}
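A minimal sketch in R, verifying the truncated AR($\infty$) sum on simulated data (all object names are illustrative):
\begin{lstlisting}[language=R]
set.seed(1)
x <- arima.sim(n = 200, list(ma = 0.6))
fit <- arima(x, order = c(0, 0, 1), include.mean = FALSE)
b <- coef(fit)["ma1"]
j <- 0:(length(x) - 1)
sum(b * (-b)^j * rev(x))   # approx. equal to predict(fit, n.ahead = 1)$pred
\end{lstlisting}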
\subsubsection{General MA(q) forecasting}
\begin{itemize}
\item With MA($q$) models, all forecasts for horizons $k>q$ will be trivial and equal to zero. This is not the case for $k \leq q$.
\item We encounter the same difficulties as with MA($1$) processes. By conditioning on the infinite past, rewriting the MA($q$) as an AR($\infty$) and choosing initial zero values for times $t \leq 0$, the forecasts can be computed.
\item We omit the precise formulae here and refer to the general results for ARMA($p,q$), from which the solution for a pure MA($q$) can be obtained.
\item In R, the functions \verb|predict()| and \verb|forecast()| implement all of this (see the sketch after this list)!
\end{itemize}
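For example, a minimal sketch for a simulated ARMA($1,1$) (model order and series are illustrative; the \verb|forecast| package is assumed to be installed):
\begin{lstlisting}[language=R]
set.seed(2)
x <- arima.sim(n = 200, list(ar = 0.7, ma = 0.3))
fit <- arima(x, order = c(1, 0, 1))
predict(fit, n.ahead = 10)$pred          # point forecasts
plot(forecast::forecast(fit, h = 10))    # forecasts with prediction intervals
\end{lstlisting}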
\subsection{Forecasting with trend and seasonality}
Time series with a trend and/or seasonal effect can be predicted either after decomposition or with exponential smoothing. It is also quick and easy to predict from a SARIMA model.
\begin{itemize}
\item The ARIMA/SARIMA model is fitted in R as usual. Then, we can simply employ the \verb|predict()| command to obtain the forecast plus a prediction interval (see the sketch after this list).
\item Technically, the forecast comes from the stationary ARMA model that is obtained after differencing the series.
\item Finally, these forecasts need to be integrated again. This procedure has somewhat of a black-box character.
\end{itemize}
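A minimal sketch, using the classical airline model on the log-transformed \verb|AirPassengers| series (the model order is illustrative):
\begin{lstlisting}[language=R]
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 24)   # differencing/integration handled internally
ts.plot(AirPassengers, exp(pred$pred), lty = 1:2)
\end{lstlisting}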
\subsubsection{ARIMA-models}
We assume that $X_t$ is an ARIMA($p,1,q$) series, so after lag $1$ differencing, we have $Y_t = X_t - X_{t-1}$ which is an ARMA($p,q$).
\begin{itemize}
\item Anchor: $\hat{X}_{n+1;1:n} = \hat{Y}_{n+1;1:n} + x_n$ (generalized below)
\end{itemize}
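For horizons $k>1$, telescoping the differences yields, as a direct consequence of the anchor:
$$\hat{X}_{n+k;1:n} = x_n + \sum_{j=1}^{k} \hat{Y}_{n+j;1:n}$$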
\section{General concepts}
\subsection{AIC}