ats-zf/main.tex

\documentclass[8pt,landscape]{article}
\usepackage{multicol}
\usepackage{calc}
\usepackage{bookmark}
\usepackage{ifthen}
\usepackage[a4paper, landscape]{geometry}
\usepackage{hyperref}
\usepackage{ccicons}
\usepackage{amsmath, amsfonts, amssymb, amsthm}
\usepackage{listings}
\usepackage{graphicx}
\usepackage{fontawesome5}
\usepackage{xcolor}
\usepackage{float}
\usepackage[
    type={CC},
    modifier={by-sa},
    version={3.0}
]{doclicense}

\graphicspath{{./img/}} 

\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.95}

\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    basicstyle=\ttfamily\footnotesize,
    breakatwhitespace=false,
    breaklines=true,
    captionpos=b,
    keepspaces=true,
    numbers=left,
    numbersep=5pt,
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2
}

\lstset{style=mystyle}

% To make this come out properly in landscape mode, do one of the following
% 1.
%  pdflatex latexsheet.tex
%
% 2.
%  latex latexsheet.tex
%  dvips -P pdf  -t landscape latexsheet.dvi
%  ps2pdf latexsheet.ps


% If you're reading this, be prepared for confusion.  Making this was
% a learning experience for me, and it shows.  Much of the placement
% was hacked in; if you make it better, let me know...


% 2008-04
% Changed page margin code to use the geometry package. Also added code for
% conditional page margins, depending on paper size. Thanks to Uwe Ziegenhagen
% for the suggestions.

% 2006-08
% Made changes based on suggestions from Gene Cooperman. <gene at ccs.neu.edu>


% To Do:
% \listoffigures \listoftables
% \setcounter{secnumdepth}{0}


% This sets page margins to .5 inch if using letter paper, and to 1cm
% if using A4 paper. (This probably isn't strictly necessary.)
% If using another size paper, use default 1cm margins.
\ifthenelse{\lengthtest { \paperwidth = 11in}}
	{ \geometry{top=.5in,left=.5in,right=.5in,bottom=.5in} }
	{\ifthenelse{ \lengthtest{ \paperwidth = 297mm}}
		{\geometry{top=1cm,left=1cm,right=1cm,bottom=1cm} }
		{\geometry{top=1cm,left=1cm,right=1cm,bottom=1cm} }
	}

% Turn off header and footer
\pagestyle{empty}


% Redefine section commands to use less space
\makeatletter
\renewcommand{\section}{\@startsection{section}{1}{0mm}%
                                {-1ex plus -.5ex minus -.2ex}%
                                {0.5ex plus .2ex}%x
                                {\normalfont\large\bfseries}}
\renewcommand{\subsection}{\@startsection{subsection}{2}{0mm}%
                                {-1explus -.5ex minus -.2ex}%
                                {0.5ex plus .2ex}%
                                {\normalfont\normalsize\bfseries}}
\renewcommand{\subsubsection}{\@startsection{subsubsection}{3}{0mm}%
                                {-1ex plus -.5ex minus -.2ex}%
                                {1ex plus .2ex}%
                                {\normalfont\small\bfseries}}


\makeatother

% Define BibTeX command
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

% Don't print section numbers
% \setcounter{secnumdepth}{0}


\setlength{\parindent}{0pt}
\setlength{\parskip}{0pt plus 0.5ex}

% -----------------------------------------------------------------------

\begin{document}

\raggedright
\footnotesize
\begin{multicols*}{3}


% multicol parameters
% These lengths are set only within the two main columns
%\setlength{\columnseprule}{0.25pt}
\setlength{\premulticols}{1pt}
\setlength{\postmulticols}{1pt}
\setlength{\multicolsep}{1pt}
\setlength{\columnsep}{2pt}

\begin{center}
     \Large{Applied Time Series } \\
    \small{\href{http://vvz.ethz.ch/Vorlesungsverzeichnis/lerneinheit.view?semkez=2021S&ansicht=LEHRVERANSTALTUNGEN&lerneinheitId=149645&lang=de}{401-6624-11L}} \\
    \small{Jannis Portmann \the\year} \\
    {\ccbysa}
\rule{\linewidth}{0.25pt}
\end{center}

\section{Mathematical Concepts}

For the \textbf{time series process}, we have to assume the following

\subsection{Stochastic Model}
From the lecture
\begin{quote}
    A time series process is a set $\{X_t, t \in T\}$ of random variables, where $T$ is the set of times. Each of the random variables $X_t,t \in t$ has a univariate probability distribution $F_t$.
\end{quote}
\begin{itemize}
    \item If we exclusively consider time series processes with
    equidistant time intervals, we can enumerate $\{T = 1,2,3,...\}$
    \item An observed time series is a realization of $X = \{X_1 ,..., X_n\}$,
    and is denoted with small letters as $x = (x_1 ,... , x_n)$.
    \item We have a multivariate distribution, but only 1 observation
    (i.e. 1 realization from this distribution) is available. In order
    to perform “statistics”, we require some additional structure.
\end{itemize}

\subsection{Stationarity}
\subsubsection{Strict}
For being able to do statistics with time series, we require that the
series “doesn’t change its probabilistic character” over time. This is
mathematically formulated by strict stationarity.

\begin{quote}
    A time series $\{X_t, t \in T\}$  is strictly stationary, if the joint distribution of the random vector $(X_t ,... , X_{t+k})$ is equal to the one of $(X_s ,... , X_{s+k})$ for all combinations of $t,s$ and $k$
\end{quote}

\begin{tabular}{ll}
    $X_t \sim F$  & all $X_t$ are identically distributed \\
    $E[X_t] = \mu$ & all $X_t$ have identical expected value \\
    $Var(X_t) = \sigma^2$ & all $X_t$ have identical variance \\
    $Cov[X_t,X_{t+h}] = \gamma_h$ & autocovariance depends only on lag $h$ \\   
\end{tabular}

\subsubsection{Weak}
It is impossible to "prove" the theoretical concept of stationarity from data. We can only search for evidence in favor or against it. \\
\vspace{0.1cm}
However, with strict stationarity, even finding evidence only is too difficult. We thus resort to the concept of weak stationarity.

\begin{quote}
    A time series $\{X_t , t \in T\}$ is said to be weakly stationary, if \\
    $E[X_t] = \mu$ \\
    $Cov(X_t,X_{t+h} = \gamma_h)$, for all lags $h$ \\
    and thus $Var(X_t) = \sigma^2$
\end{quote}

\subsubsection{Testing stationarity}
\begin{itemize}
    \item  In time series analysis, we need to verify whether the series has arisen from a stationary process or not. Be careful: stationarity is a property of the process, and not of the data.
    \item Treat stationarity as a hypothesis! We may be able to reject it when the data strongly speak against it. However, we can never prove stationarity with data. At best, it is plausible.
    \item Formal tests for stationarity do exist. We discourage their use due to their low power for detecting general non-stationarity, as well as their complexity.
\end{itemize}

\textbf{Evidence for non-stationarity}
\begin{itemize}
    \item Trend, i.e. non-constant expected value
    \item Seasonality, i.e. deterministic, periodical oscillations
    \item Non-constant variance, i.e. multiplicative error
    \item Non-constant dependency structure
\end{itemize}

\textbf{Strategies for Detecting Non-Stationarity}
\begin{itemize}
    \item Time series plot
        \subitem - non-constant expected value (trend/seasonal effect)
        \subitem - changes in the dependency structure
        \subitem - non-constant variance
    \item Correlogram (presented later...)
        \subitem - non-constant expected value (trend/seasonal effect)
        \subitem - changes in the dependency structure
\end{itemize}
A (sometimes) useful trick, especially when working with the correlogram, is to split up the series in two or more parts, and producing plots for each of the pieces separately.

\subsection{Examples}
\begin{figure}[H]
    \centering
    \includegraphics[width=.25\textwidth]{stationary.png}
    \caption{Stationary Series}
    \label{fig:stationary}
\end{figure}

\begin{figure}[H]
    \centering
    \includegraphics[width=.25\textwidth]{non-stationary.png}
    \caption{Non-stationary Series}
    \label{fig:non-stationary}
\end{figure}

\section{Descriptive Analysis}
\subsection{Linear Transformation}
$$Y_t = a + bX_t$$
e.g. conversion of °F in °C \\
\vspace{.1cm}
Such linear transformations will not change the appereance of the series. All derived results (i.e. autocorrelations, models, forecasts) will be equivalent. Hence, we are free to perform linear transformations whenever it seems convenient.

\subsection{Log-Transformation}
Transforming $x_1,...,x_n$ to $g(x_1),...,g(x_n)$
$$g(\cdot) = \log(\cdot)$$
\textbf{Note:}
\begin{itemize}
    \item If a time series gets log-transformed, we will study its character and its dependencies on the transformed scale. This is also where we will fit time series models.
    \item If forecasts are produced, one is most often interested in the value on the original scale.
\end{itemize}

\subsubsection{When to apply log-transformation}
As we argued above, a log-transformation of the data often facilitates estimation, fitting and interpretation. When is it indicated to log-transform the data?
\begin{itemize}
    \item If the time series is on a relative scale, i.e. where an absolute increment changes its meaning with the level of the series (e.g. 10 $\rightarrow$ 20 is not the same as 100 $\rightarrow$ 110).
    \item If the time series is on a scale which is left closed with value zero, and right open, i.e. cannot take negative values.
    \item If the marginal distribution of the time series (i.e. when analyzed with a histogram) is right-skewed.
\end{itemize}

\subsection{Box-Cox and power transformations}
$$g(x_t) = \frac{x_t^\lambda - 1}{\lambda} \, \mathrm{for} \, \lambda \neq 0, \, g(x_t) = \log(x_t) \, \mathrm{for} \, \lambda = 0$$
Box-Cox transformations, in contrast to $\log$ have no easy interpretation. Hence, they are mostly applied if utterly necessary or if the principal goal is (black-box) forecasting.
\begin{itemize}
    \item In practice, one often prefers the $\log$ if $|\lambda| < 0.3$ or does w/o transformation if $|\lambda -1| <  0.3$.
    \item For an unbiased forecast, correction is needed!
\end{itemize}

\subsection{Decomposition of time series}
\subsubsection{Additive decomposition}
trend + seasonal effect + remainder:
$$X_t = m_t + s_t + R_t$$
Does not occur very often in reality!

\subsubsection{Multiplicative decomposition}
In most real-world series, the additive decomposition does not apply, as seasonal and random variation increase with the level. It is often better to use
$$\log(X_t) = \log(m_t+ s_t + R_t) = \log(m_t) + \log(s_t) + \log(R_t) = m_t' + s_t' + R_t'$$

\subsubsection{Differencing}
We assume a series with an additive trend, but no seasonal variation. We can write: $X_t = m_t + R_t$ . If we perfom differencing and assume a slowly-varying trend with $m_t  \approx m_{t+1}$, we obtain
$$Y_t = X_t - X_{t-1} \approx R_t - R_{t-1}$$
\begin{itemize}
    \item Note that $Y_t$ are the observation-to-observation changes in the series, but no longer the observations or the remainder.
    \item This may (or may not) remove trend/seasonality, but does not yield estimates for $m_t$ and $s_t$ , and not even for $R_t$.
    \item For a slow, curvy trend, the mean is zero: $E[Y_t] = 0$
\end{itemize}
It is important to know that differencing creates artificial new dependencies that are different from the original ones. For illustration, consider a stochastically independent remainder:
\begin{align*}
    \mathrm{Cov}(Y_t) &= \mathrm{Cov}(R_t - R_{t-1} ,R_{t-1} - R_{t-2}) \\
    &= -\mathrm{Cov}(R_{t-1},R_{t-1}) \\
    &\neq 0 \\
\end{align*}

\subsubsection{Higher order differencing}
The “normal” differencing from above managed to remove any linear trend from the data. In case of polynomial trend, that is no longer true. But we can take higher-order differences:
$$X_t = \alpha + \beta_1 t +  \beta_2 t^2 + R_t$$
where $R_t$ is stationary
\begin{align*}
    Y_t &= (1-B)^2 X_t \\
    &= (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) \\
    &= R_t - 2R_{t-1} + R_{t-2} + 2\beta_2 
\end{align*}

Where $B$ denotes the \textbf{backshift-operator}: $B(X_t) = X_{t-1}$ \\
\vspace{.2cm}
We basically get the difference of the differences

\subsubsection{Removing seasonal trends}
Time series with seasonal effects can be made stationary through differencing by comparing to the previous periods’ value.
$$Y_t = (1-B^p)X_t = X_t - X_{t-p}$$
\begin{itemize}
    \item Here, $p$ is the frequency of the series.
    \item A potential trend which is exactly linear will be removed by the above form of seasonal differencing.
    \item In practice, trends are rarely linear but slowly varying: $m_t \approx m_{t-1}$. However, here we compare $m_t$ with $m_{t-p}$, which means that seasonal differencing often fails to remove trends completely.
\end{itemize}
\subsubsection{Pros and cons of Differencing}
+ trend and seasonal effect can be removed \\
+ procedure is very quick and very simple to implement \\
- $\hat{m_t}, \hat{s_t}, \hat{R_T}$ are not known, and cannot be visualised \\
- resulting time series will be shorter than the original \\
- differencing leads to strong artificial dependencies \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ is not easily possible
 
\subsection{Smoothing and filtering}
In the absence of a seasonal effect, the trend of a non-stationary time series can be determined by applying any additive, linear filter. We obtain a new time series $\hat{m_t}$, representing the trend (running mean):
$$\hat{m_t} = \sum_{i=-p}^q a_i X_{t+i}$$
\begin{itemize}
    \item the window, defined by $p$ and $q$, can or can‘t be symmetric.
    \item the weights, given by $a_i$ , can or can‘t be uniformly distributed.
    \item most popular is to rely on $p = q$ and $a_i = 1/(2p+1)$.
    \item other smoothing procedures can be applied, too.
\end{itemize}

In the presence a seasonal effect, smoothing approaches are still valid for estimating the trend. We have to make sure that the sum is taken over an entire season, i.e. for monthly data:
$$\hat{m_t} = \frac{1}{12}(\frac{1}{2}X_{t-6}+X_{t-5}+\dots+X_{t+5}+\frac{1}{2}X_{t+6}) \; \mathrm{for} \, t=7,\dots,n-6$$

\subsubsection{Estimating seasonal effects}
An estimate of the seasonal effect $s_t$ at time $t$ can be obtained by:
$$\hat{s_t} = x_t - \hat{m_t}$$
We basically substract the trend from the data.

\subsubsection{Estimating remainder}
$$\hat{R_t} = x_t - \hat{m_t} - \hat{s_t}$$

\begin{itemize}
    \item The smoothing approach is based on estimating the trend first, and then the seasonality after removal of the trend.
    \item The generalization to other periods than $p = 12$, i.e. monthly data is straighforward. Just choose a symmetric window and use uniformly distributed coefficients that sum up to 1.
    \item The sum over all seasonal effects will often be close to zero. Usually, one centers the seasonal effects to mean zero.
    \item This procedure is implemented in R with \verb|decompose()|. Note that it only works for seasonal series where at least two full periods were observed!
\end{itemize}

\subsubsection{Pros and cons of filtering and smoothing}
+ trend and seasonal effect can be estimated \\
+ $\hat{m_t}, \hat{s_t}, \hat{R_t}$ are explicitly known and can be visualised \\
+ procedure is transparent, and simple to implement \\
- resulting time series will be shorter than the original \\
- the running mean is not the very best smoother \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ are not entirely obvious \\
- seasonal effect is constant over time \\

\subsection{STL-Decomposition}
\textit{Seasonal-Trend Decomposition Procedure by LOESS}
\begin{itemize}
    \item is an iterative, non-parametric smoothing algorithm
    \item yields a simultaneous estimation of trend and seasonal effect
    \item similar to what was presented above, but \textbf{more robust}!
\end{itemize}

+ very simple to apply \\
+ very illustrative and quick \\
+ seasonal effect can be constant or smoothly varying \\
- model free, extrapolation and forecasting is difficult \\

\subsubsection{Using STL in R}
\verb|stl(x, s.window = ...)|, where \verb|s.window| is the span (in lags) of the loess window for seasonal extraction, which should be odd and at least 7

\subsection{Parsimonius Decomposition}
The goal is to use a simple model that features a linear trend plus a cyclic seasonal effect and a remainder term:
$$X_t = \beta_0 + \beta_1 t + \beta_2 \sin(2\pi t) + \beta_3 \cos(2\pi t) + R_t$$

\subsection{Flexible Decomposition}
We add more flexibility (i.e. degrees of freedom) to the trend and seasonal components. We will use a GAM for this decomposition, with monthly dummy variables for the seasonal effect.
$$X_t = f(t) + \alpha_{i(t)} + R_t$$
where $t \in {1,2,...,128}$ and $i(t) \in {1,2,...,12}$ \\ 
\vspace{.2cm}
It is not a good idea to use more than quadratic polynomials. They usually fit poorly and are erractic near the boundaries.
\subsubsection{Example in R}
\begin{lstlisting}[language=R]
library(mgcv)
tnum  <- as.numeric(time(maine))
mm <- rep(c("Jan","Feb","Mar","Apr","May","Jun", "Jul","Aug","Sep","Oct","Nov","Dec"))
mm <- factor(rep(mm,11),levels=mm)[1:128]
fit <- gam(log(maine) ~ s(tnum) + mm)
\end{lstlisting}

\section{Autocorrelation}
For most of the rest of this course, we will deal with (weakly) stationary time series. See \ref{}
\vspace{.2cm}
Definition of autocorrelation at lag $k$
$$Cor(X_{t+k},X_t) = \frac{Cov(X_{k+t},X_t)}{\sqrt{Var(X_{k+t})\cdot Var(X_t)}} = \rho(k)$$
Autocorrelation is a dimensionless measure for the strength of thelinear association between the random variables $X_{t+k}$ and $X_t$. \\
Autocorrelation estimation in a time series is based on lagged data pairs, the definitive implementation is with a plug-in estimator. \\

\vspace{.2cm}

\textbf{Example} \\
We assume $\rho(k) = 0.7$
\begin{itemize}
    \item The square of the autocorrelation, i.e. $\rho(k)^2 = 0.49$, is the percentage of variability explained by the linear association between $X_t$ and its predecessor $X_{t+k}$.
    \item Thus, in our example, $X_{t+k}$ accounts for roughly 49\% of the variability observed in random variable $X_t$. Only roughly because the world is seldom exactly linear.
    \item From this we can also conclude that any $\rho(k) < 0.4$ is not a strong association, i.e. has a small effect on the next observation only.
\end{itemize}

\scriptsize

\section*{Copyleft}

\doclicenseImage \\
Dieses Dokument ist unter (CC BY-SA 3.0) freigegeben \\
\faGlobeEurope \kern 1em \url{https://n.ethz.ch/~jannisp/ats-zf} \\
\faGit \kern 0.88em \url{https://git.thisfro.ch/thisfro/ats-zf} \\
Jannis Portmann, FS21

\section*{Referenzen}
\begin{enumerate}
    \item Skript
\end{enumerate}

\section*{Bildquellen}
\begin{itemize}
    \item Bild
\end{itemize}

\end{multicols*}

\end{document}