Integrate offline changes

Jannis Portmann 2021-08-09 21:17:55 +02:00
parent b79d5f51ed
commit a9336daad5

main.tex

@@ -23,7 +23,7 @@
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
-\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
+\definecolor{backcolour}{rgb}{0.95,0.95,0.95}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
@@ -180,7 +180,7 @@ mathematically formulated by strict stationarity.
\end{tabular}
\subsubsection{Weak}
-It is impossible to „prove“ the theoretical concept of stationarity from data. We can only search for evidence in favor or against it. \\
+It is impossible to ``prove'' the theoretical concept of stationarity from data. We can only search for evidence in favor of or against it. \\
\vspace{0.1cm}
However, with strict stationarity, even finding evidence only is too difficult. We thus resort to the concept of weak stationarity.
@@ -233,6 +233,183 @@ A (sometimes) useful trick, especially when working with the correlogram, is to
\label{fig:non-stationary}
\end{figure}
\section{Descriptive Analysis}
\subsection{Linear Transformation}
$$Y_t = a + bX_t$$
e.g. conversion of °F to °C \\
\vspace{.1cm}
Such linear transformations will not change the appearance of the series. All derived results (i.e. autocorrelations, models, forecasts) will be equivalent. Hence, we are free to perform linear transformations whenever it seems convenient.
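A minimal R sketch (on a simulated series as a stand-in for real data) illustrating that the autocorrelation, and hence everything derived from it, is unchanged under a linear transformation:
\begin{lstlisting}[language=R]
set.seed(1)                   # simulated stand-in for a real series
x.f <- ts(50 + cumsum(rnorm(120)), frequency = 12)  # e.g. temperatures in Fahrenheit
x.c <- (x.f - 32) * 5/9                              # linear transformation to Celsius
## sample autocorrelations agree up to numerical precision
all.equal(acf(x.f, plot = FALSE)$acf, acf(x.c, plot = FALSE)$acf)
\end{lstlisting}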
\subsection{Log-Transformation}
Transforming $x_1,...,x_n$ to $g(x_1),...,g(x_n)$
$$g(\cdot) = \log(\cdot)$$
\textbf{Note:}
\begin{itemize}
\item If a time series gets log-transformed, we will study its character and its dependencies on the transformed scale. This is also where we will fit time series models.
\item If forecasts are produced, one is most often interested in the value on the original scale.
\end{itemize}
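If forecasting is done on the log scale, the naive back-transform $\exp(\hat{Y}_t)$ is biased; under approximately Gaussian errors on the log scale, the usual correction is $\exp(\hat{Y}_t + \hat{\sigma}^2/2)$. A minimal sketch, assuming a positive series \verb|x| and an arbitrary model fitted to its logarithm (the AR(1) below is purely illustrative):
\begin{lstlisting}[language=R]
y    <- log(x)                         # model and forecast on the log scale
fit  <- arima(y, order = c(1, 0, 0))   # any model for the log series
pred <- predict(fit, n.ahead = 12)
fc.naive     <- exp(pred$pred)                  # back-transformed (median) forecast
fc.corrected <- exp(pred$pred + pred$se^2 / 2)  # approximately unbiased mean forecast
\end{lstlisting}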
\subsubsection{When to apply log-transformation}
As we argued above, a log-transformation of the data often facilitates estimation, fitting and interpretation. When is it indicated to log-transform the data?
\begin{itemize}
\item If the time series is on a relative scale, i.e. where an absolute increment changes its meaning with the level of the series (e.g. 10 $\rightarrow$ 20 is not the same as 100 $\rightarrow$ 110).
\item If the time series is on a scale which is left closed with value zero, and right open, i.e. cannot take negative values.
\item If the marginal distribution of the time series (i.e. when analyzed with a histogram) is right-skewed.
\end{itemize}
\subsection{Box-Cox and power transformations}
$$g(x_t) = \frac{x_t^\lambda - 1}{\lambda} \, \mathrm{for} \, \lambda \neq 0, \, g(x_t) = \log(x_t) \, \mathrm{for} \, \lambda = 0$$
Box-Cox transformations, in contrast to the $\log$, have no easy interpretation. Hence, they are mostly applied only if really necessary, or if the principal goal is (black-box) forecasting.
\begin{itemize}
\item In practice, one often prefers the $\log$ if $|\lambda| < 0.3$, or works without any transformation if $|\lambda - 1| < 0.3$.
\item For an unbiased forecast, correction is needed!
\end{itemize}
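A minimal sketch of the Box-Cox transformation as a plain R function; the value of $\lambda$ is assumed to be chosen elsewhere (e.g. by profile likelihood, as in \verb|MASS::boxcox()| for regression models):
\begin{lstlisting}[language=R]
## Box-Cox transform of a positive series x for a given lambda
bc.trans <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}
x.bc <- bc.trans(AirPassengers, lambda = 0)   # lambda = 0 reproduces the log
\end{lstlisting}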
\subsection{Decomposition of time series}
\subsubsection{Additive decomposition}
trend + seasonal effect + remainder:
$$X_t = m_t + s_t + R_t$$
Does not occur very often in reality!
\subsubsection{Multiplicative decomposition}
In most real-world series, the additive decomposition does not apply, as seasonal and random variation increase with the level. It is often better to use a multiplicative decomposition $X_t = m_t \cdot s_t \cdot R_t$, which the log-transformation turns into an additive one:
$$\log(X_t) = \log(m_t \cdot s_t \cdot R_t) = \log(m_t) + \log(s_t) + \log(R_t) = m_t' + s_t' + R_t'$$
\subsubsection{Differencing}
We assume a series with an additive trend, but no seasonal variation. We can write: $X_t = m_t + R_t$. If we perform differencing and assume a slowly-varying trend with $m_t \approx m_{t-1}$, we obtain
$$Y_t = X_t - X_{t-1} \approx R_t - R_{t-1}$$
\begin{itemize}
\item Note that $Y_t$ are the observation-to-observation changes in the series, but no longer the observations or the remainder.
\item This may (or may not) remove trend/seasonality, but does not yield estimates for $m_t$ and $s_t$ , and not even for $R_t$.
\item For a slow, curvy trend, the differenced series has mean approximately zero: $E[Y_t] \approx 0$
\end{itemize}
It is important to know that differencing creates artificial new dependencies that are different from the original ones. For illustration, consider a stochastically independent remainder:
\begin{align*}
\mathrm{Cov}(Y_t, Y_{t-1}) &= \mathrm{Cov}(R_t - R_{t-1}, R_{t-1} - R_{t-2}) \\
&= -\mathrm{Cov}(R_{t-1},R_{t-1}) \\
&\neq 0
\end{align*}
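A short R illustration of both points: \verb|diff()| removes a linear trend, and differencing an independent remainder induces a negative lag-1 autocorrelation of about $-0.5$ (simulated data, so the exact numbers vary slightly):
\begin{lstlisting}[language=R]
set.seed(21)
r <- rnorm(500)                    # stochastically independent remainder
x <- 0.5 * (1:500) + r             # linear trend plus remainder
y <- diff(x)                       # first differences Y_t = X_t - X_{t-1}
mean(y)                            # approx. the slope 0.5; the trend is gone
acf(diff(r), plot = FALSE)$acf[2]  # approx. -0.5: artificial dependence from differencing
\end{lstlisting}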
\subsubsection{Higher order differencing}
The “normal” differencing from above managed to remove any linear trend from the data. In the case of a polynomial trend, that is no longer true. But we can take higher-order differences:
$$X_t = \alpha + \beta_1 t + \beta_2 t^2 + R_t$$
where $R_t$ is stationary
\begin{align*}
Y_t &= (1-B)^2 X_t \\
&= (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) \\
&= R_t - 2R_{t-1} + R_{t-2} + 2\beta_2
\end{align*}
Where $B$ denotes the \textbf{backshift-operator}: $B(X_t) = X_{t-1}$ \\
\vspace{.2cm}
We basically get the difference of the differences.
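In R, higher-order differences are available via the \verb|differences| argument of \verb|diff()|; a quick sketch on a simulated quadratic trend:
\begin{lstlisting}[language=R]
set.seed(3)
t <- 1:200
x <- 2 + 0.1 * t + 0.02 * t^2 + rnorm(200)   # quadratic trend plus noise
y <- diff(x, differences = 2)                # (1-B)^2 X_t
mean(y)                                      # approx. 2 * beta_2 = 0.04
\end{lstlisting}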
\subsubsection{Removing seasonal trends}
Time series with seasonal effects can be made stationary through differencing by comparing with the value from the previous period.
$$Y_t = (1-B^p)X_t = X_t - X_{t-p}$$
\begin{itemize}
\item Here, $p$ is the frequency of the series.
\item A potential trend which is exactly linear will be removed by the above form of seasonal differencing.
\item In practice, trends are rarely linear but slowly varying: $m_t \approx m_{t-1}$. However, here we compare $m_t$ with $m_{t-p}$, which means that seasonal differencing often fails to remove trends completely.
\end{itemize}
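A sketch of seasonal differencing at lag $p = 12$, using the built-in monthly \verb|co2| series:
\begin{lstlisting}[language=R]
y <- diff(co2, lag = 12)   # X_t - X_{t-12}: removes the seasonal pattern
plot(y)                    # fluctuates around the average yearly increase
\end{lstlisting}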
\subsubsection{Pros and cons of differencing}
+ trend and seasonal effect can be removed \\
+ procedure is very quick and very simple to implement \\
- $\hat{m_t}, \hat{s_t}, \hat{R_t}$ are not known, and cannot be visualised \\
- resulting time series will be shorter than the original \\
- differencing leads to strong artificial dependencies \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ is not easily possible
\subsection{Smoothing and filtering}
In the absence of a seasonal effect, the trend of a non-stationary time series can be determined by applying any additive, linear filter. We obtain a new time series $\hat{m_t}$, representing the trend (running mean):
$$\hat{m_t} = \sum_{i=-p}^q a_i X_{t+i}$$
\begin{itemize}
\item the window, defined by $p$ and $q$, may or may not be symmetric.
\item the weights, given by $a_i$, may or may not be uniformly distributed.
\item most popular is to rely on $p = q$ and $a_i = 1/(2p+1)$.
\item other smoothing procedures can be applied, too.
\end{itemize}
In the presence of a seasonal effect, smoothing approaches are still valid for estimating the trend. We have to make sure that the sum is taken over an entire season, e.g. for monthly data:
$$\hat{m_t} = \frac{1}{12}(\frac{1}{2}X_{t-6}+X_{t-5}+\dots+X_{t+5}+\frac{1}{2}X_{t+6}) \; \mathrm{for} \, t=7,\dots,n-6$$
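This weighted running mean can be computed with \verb|stats::filter()|; a sketch for monthly data:
\begin{lstlisting}[language=R]
## symmetric 13-term filter with half weights at both ends, as in the formula above
w <- c(0.5, rep(1, 11), 0.5) / 12
m.hat <- filter(co2, filter = w, sides = 2)   # NA for the first and last 6 points
plot(co2); lines(m.hat, col = "red")          # estimated trend
\end{lstlisting}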
\subsubsection{Estimating seasonal effects}
An estimate of the seasonal effect $s_t$ at time $t$ can be obtained by:
$$\hat{s_t} = x_t - \hat{m_t}$$
We basically subtract the trend from the data.
\subsubsection{Estimating remainder}
$$\hat{R_t} = x_t - \hat{m_t} - \hat{s_t}$$
\begin{itemize}
\item The smoothing approach is based on estimating the trend first, and then the seasonality after removal of the trend.
\item The generalization to periods other than $p = 12$ (i.e. monthly data) is straightforward. Just choose a symmetric window and use uniformly distributed coefficients that sum up to 1.
\item The sum over all seasonal effects will often be close to zero. Usually, one centers the seasonal effects to mean zero.
\item This procedure is implemented in R with \verb|decompose()|. Note that it only works for seasonal series where at least two full periods were observed!
\end{itemize}
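A minimal usage sketch of \verb|decompose()| on a built-in monthly series:
\begin{lstlisting}[language=R]
fit <- decompose(co2)   # additive decomposition; decompose(log(x)) for a multiplicative one
plot(fit)               # panels for observed, trend, seasonal and remainder
fit$figure              # the 12 centred monthly seasonal effects
\end{lstlisting}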
\subsubsection{Pros and cons of filtering and smoothing}
+ trend and seasonal effect can be estimated \\
+ $\hat{m_t}, \hat{s_t}, \hat{R_t}$ are explicitly known and can be visualised \\
+ procedure is transparent, and simple to implement \\
- resulting time series will be shorter than the original \\
- the running mean is not the very best smoother \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ is not entirely obvious \\
- seasonal effect is constant over time \\
\subsection{STL-Decomposition}
\textit{Seasonal-Trend Decomposition Procedure by LOESS}
\begin{itemize}
\item is an iterative, non-parametric smoothing algorithm
\item yields a simultaneous estimation of trend and seasonal effect
\item similar to what was presented above, but \textbf{more robust}!
\end{itemize}
+ very simple to apply \\
+ very illustrative and quick \\
+ seasonal effect can be constant or smoothly varying \\
- model-free; extrapolation and forecasting are difficult \\
\subsubsection{Using STL in R}
\verb|stl(x, s.window = ...)|, where \verb|s.window| is the span (in lags) of the loess window for seasonal extraction, which should be odd and at least 7
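A short usage sketch; \verb|s.window = "periodic"| forces a constant seasonal effect, while a numeric span lets it vary smoothly:
\begin{lstlisting}[language=R]
fit <- stl(co2, s.window = "periodic")   # constant seasonal effect
fit <- stl(co2, s.window = 13)           # smoothly varying seasonal effect
plot(fit)                                # seasonal, trend and remainder panels
head(fit$time.series)                    # columns: seasonal, trend, remainder
\end{lstlisting}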
\subsection{Parsimonious Decomposition}
The goal is to use a simple model that features a linear trend plus a cyclic seasonal effect and a remainder term:
$$X_t = \beta_0 + \beta_1 t + \beta_2 \sin(2\pi t) + \beta_3 \cos(2\pi t) + R_t$$
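A sketch of fitting this model by least squares; here $t$ is taken in years (the time index of a monthly \verb|ts| object), so that the sine/cosine pair has a period of one year. The \verb|co2| series is used only as a stand-in example:
\begin{lstlisting}[language=R]
tnum <- as.numeric(time(co2))   # time in years
fit  <- lm(co2 ~ tnum + sin(2*pi*tnum) + cos(2*pi*tnum))
coef(fit)                       # beta_0, ..., beta_3
plot(co2); lines(tnum, fitted(fit), col = "red")
\end{lstlisting}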
\subsection{Flexible Decomposition}
We add more flexibility (i.e. degrees of freedom) to the trend and seasonal components. We will use a GAM for this decomposition, with monthly dummy variables for the seasonal effect.
$$X_t = f(t) + \alpha_{i(t)} + R_t$$
where $t \in \{1,2,\dots,128\}$ and $i(t) \in \{1,2,\dots,12\}$ \\
\vspace{.2cm}
It is not a good idea to use polynomials of order higher than quadratic. They usually fit poorly and are erratic near the boundaries.
\subsubsection{Example in R}
\begin{lstlisting}[language=R]
library(mgcv)
## numeric time index for the smooth trend term
tnum <- as.numeric(time(maine))
## month as a factor for the seasonal dummies (128 monthly observations)
mm <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
mm <- factor(rep(mm, 11), levels = mm)[1:128]
## GAM: smooth trend s(tnum) plus monthly dummy effects, on the log scale
fit <- gam(log(maine) ~ s(tnum) + mm)
\end{lstlisting}
\section{Autocorrelation}
For most of the rest of this course, we will deal with (weakly) stationary time series. See \ref{}
\vspace{.2cm}
Definition of autocorrelation at lag $k$
$$\mathrm{Cor}(X_{t+k},X_t) = \frac{\mathrm{Cov}(X_{t+k},X_t)}{\sqrt{\mathrm{Var}(X_{t+k})\cdot \mathrm{Var}(X_t)}} = \rho(k)$$
Autocorrelation is a dimensionless measure for the strength of the linear association between the random variables $X_{t+k}$ and $X_t$. \\
Autocorrelation estimation in a time series is based on lagged data pairs; the standard implementation is the plug-in estimator. \\
\vspace{.2cm}
\textbf{Example} \\
We assume $\rho(k) = 0.7$
\begin{itemize}
\item The square of the autocorrelation, i.e. $\rho(k)^2 = 0.49$, is the proportion of variability explained by the linear association between $X_t$ and its predecessor $X_{t+k}$.
\item Thus, in our example, $X_{t+k}$ accounts for roughly 49\% of the variability observed in the random variable $X_t$. Only roughly, because the world is seldom exactly linear.
\item From this we can also conclude that any $|\rho(k)| < 0.4$ is not a strong association, i.e. it explains less than 16\% of the variability and has only a small effect on the next observation.
\end{itemize}
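In R, the plug-in estimates are computed and plotted with \verb|acf()|:
\begin{lstlisting}[language=R]
acf(co2, plot = FALSE)$acf[2]   # estimated rho(1); index 1 is lag 0
acf(co2)                        # correlogram with approximate confidence bounds
\end{lstlisting}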
\scriptsize
\section*{Copyleft}