Last chapters
Commit ccf60f38b3 on branch master (parent 2f08ed9f1b), jannisp, 2021-08-27 21:54:27 +02:00
7 changed files with 349 additions and 4 deletions

New image files added:
img/aliasing.png (136 KiB), img/arima-forecast.png (46 KiB), one further image (29 KiB, name not shown), img/periodigram.png (20 KiB), img/sarima-forecast.png (36 KiB), img/stl-forecast.png (32 KiB)

main.tex (353 changed lines)

@@ -988,7 +988,7 @@ ARIMA-processes are non-stationary if $d > 0$, option to rewrite as non-stationa
\begin{enumerate}
\item Choose the appropriate order of differencing, usually $d = 1$ or (in rare cases) $d = 2$ , such that the result is a stationary series.
\item Analyze ACF and PACF of the differenced series. If the stylized facts of an ARMA process are present, decide for the orders $p$ and $q$.
\item Fit the model using the arima() procedure. This can be done on the original series by setting $d$ accordingly, or on the differences, by setting $d = 0$ and argument \verb|include.mean=FALSE|.
\item Fit the model using the \verb|arima()| procedure. This can be done on the original series by setting $d$ accordingly, or on the differences, by setting $d = 0$ and argument \verb|include.mean=FALSE|.
\item Analyze the residuals; these must look like White Noise. If several competing models are appropriate, use AIC to decide for the winner.
\end{enumerate}
@@ -1191,6 +1191,17 @@ where $\hat{\sigma}_k^2$ is equal to the k-step forecast variance.
\item In R, functions \verb|predict()| and \verb|forecast()| implement all this!
\end{itemize}
\subsubsection{ARMA(p,q) forecasting}
Similar to before, where starred values denote observations (or, beyond time $n$, their forecasts) and starred innovations are set to zero beyond time $n$:
$$\hat{X}_{n+k;1:n} = \sum_{i=1}^{p} \alpha_i x_{n+k-i}^* + \sum_{j=1}^{q} \beta_j E_{n+k-j}^*$$
\begin{itemize}
\item Any ARMA($p,q$) forecast converges to the global mean.
\item The size of the prediction interval for $k \rightarrow \infty$ converges to an interval that is determined by the global process variance.
\item If using a Box-Cox transformation with $0 \leq \lambda < 1$, the prediction interval on the original scale will be asymmetric.
\item Due to this asymmetry, it is better to use MAPE for evaluating the performance.
\end{itemize}
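A minimal sketch of how such forecasts are obtained in R (assuming a stationary series \verb|x|, e.g. generated with \verb|arima.sim()|), using \verb|predict()| and the \verb|forecast| package:
\begin{lstlisting}[language=R]
> fit <- arima(x, order=c(1,0,1))
> predict(fit, n.ahead=10)        # point forecasts and standard errors
> plot(forecast(fit, h=10))       # forecasts with prediction intervals
\end{lstlisting}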
\subsection{Forecasting with trend and seasonality}
Time series with a trend and/or seasonal effect can either be predicted after decomposing or with exponential smoothing. It is also very easy and quick to predict from a SARIMA model.
\begin{itemize}
@@ -1199,11 +1210,347 @@ Time series with a trend and/or seasonal effect can either be predicted after de
\item Finally, these forecasts need to be integrated again. This procedure has a bit the touch of a black box approach.
\end{itemize}
\subsubsection{ARIMA-models}
\subsubsection{ARIMA forecasting}
We assume that $X_t$ is an ARIMA($p,1,q$) series, so after lag $1$ differencing, we have $Y_t = X_t - X_{t-1}$ which is an ARMA($p,q$).
\begin{itemize}
\item Anchor: $\hat{X}_{n+1;1:n} = \hat{Y}_{n+1;1:n} + x_n$
\end{itemize}
The longer horizon forecasts with $k > 1$ are obtained from:
\begin{align*}
\hat{X}_{n+1;1:n} &= \hat{Y}_{n+1;1:n} + x_n \\
\hat{X}_{n+2;1:n} &= \hat{Y}_{n+2;1:n} + \hat{X}_{n+1;1:n} = x_n + \hat{Y}_{n+1;1:n} + \hat{Y}_{n+2;1:n} \\
& \vdots \\
\hat{X}_{n+k;1:n} &= x_n + \hat{Y}_{n+1;1:n} + \dots + \hat{Y}_{n+k;1:n}
\end{align*}
ARIMA processes are aimed at unit-root processes which are non-stationary, but do not necessarily feature a deterministic (e.g. linear) trend. We observe the following behavior:
\begin{itemize}
\item If $d = 1$ , the forecast from an ARIMA($p,1,q$) will converge to a constant value, i.e. the global mean of the time series.
\item ARIMA($p,1,q$) prediction intervals do not converge to a constant width for $k \rightarrow \infty$, but keep increasing indefinitely.
\item In particular, an ARIMA forecast always fails to pick up a linear trend in the data. If such a thing exists, we need to add a so-called drift term.
\end{itemize}
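As a sketch of the behaviour described above (assuming a unit-root series \verb|dat| and the \verb|forecast| package), an ARIMA($1,1,1$) without drift levels off to a constant, while its prediction intervals keep widening:
\begin{lstlisting}[language=R]
> fit <- Arima(dat, order=c(1,1,1))   # no drift term
> plot(forecast(fit, h=50))           # flat long-run forecast, ever wider intervals
\end{lstlisting}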
\subsubsection{ARIMA with drift term}
To capture a trend we can use
\begin{lstlisting}[language=R]
> fit <- Arima(dat, order=c(1,0,1), include.drift=TRUE, include.mean=FALSE)
\end{lstlisting}
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{arima-forecast.png}
\label{fig:arima-forecast}
\caption{Forecast from ARIMA(1,0,1) with drift}
\end{figure}
\subsubsection{SARIMA forecasting}
\begin{itemize}
\item When SARIMA models are used for forecasting, they will pick up both the latest seasonality and trend in the data.
\item Due to the double differencing that is usually applied, there is no need/option to include a drift term for covering trends.
\item As we can see, the prediction intervals also cover the effect of trend and seasonality. They become (much) wider for longer forecasting horizons.
\item We have no control over the trend forecast, nor can we intervene in it. This leaves room for decomposition-based forecasting with more freedom.
\end{itemize}
\begin{lstlisting}[language=R]
> fit <- auto.arima(train, lambda=0)
\end{lstlisting}
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{sarima-forecast.png}
\label{fig:sarima-forecast}
\caption{Forecast from SARIMA(0,0,1)(0,1,1)[12]}
\end{figure}
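A sketch of how the forecast above can be produced from the fitted object and, assuming a held-out series \verb|test|, evaluated out of sample:
\begin{lstlisting}[language=R]
> fc <- forecast(fit, h=24)   # point forecasts plus prediction intervals
> plot(fc)
> accuracy(fc, test)          # out-of-sample error measures, e.g. MAPE
\end{lstlisting}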
\subsection{Forecasting decomposed series}
The principle for forecasting time series that are decomposed into trend, seasonal effect and remainder is:
\begin{enumerate}
\item \textbf{Stationary Remainder} \\ Is usually modelled with an ARMA($p,q$), so we can generate a time series forecast with the methodology from before.
\item \textbf{Seasonal Effect} \\ Is assumed as remaining “as is”, or “as it was last” (in the case of evolving seasonal effect) and extrapolated.
\item \textbf{Trend} \\ Is either extrapolated linearly, or sometimes even manually.
\end{enumerate}
\subsubsection{Using R}
A much simpler forecasting procedure for decomposed series is available in R. A few lines of code suffice.
\begin{lstlisting}[language=R]
> fit <- stl(log(tsd), s.window="periodic")
> plot(forecast(fit, lambda=0, biasadj=TRUE, level=95))
\end{lstlisting}
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{stl-forecast.png}
\label{fig:stl-forecast}
\caption{Forecasts from STL + ETS(A,N,N)}
\end{figure}
Approach behind:
\begin{itemize}
\item The time series is decomposed and deseasonalized
\item The last observed year of the seasonality is extrapolated
\item The \verb|seasadj()| series is automatically forecasted using a) exponential smoothing, b) ARIMA, c) a random walk with drift or any custom method.
\end{itemize}
\subsection{Exponential smoothing}
\subsubsection{Simple exponential smoothing}
This is a quick approach for estimating the current level of a time series, as well as for forecasting future values. It works for any stationary time series \textbf{without a trend and season.}
$$\hat{X}_{n+1;1:n} = \sum_{i=0}^{n-1} w_i x_{n-i}$$
where $w_0 \geq w_1 \geq \dots \geq 0$ and $ \sum_{i=0}^{n-1} w_i = 1$ \\
\vspace{.2cm}
The weights are often chosen to be exponentially decaying.
$$X_t = \mu_t + E_t$$
\begin{itemize}
\item $\mu_t$ is the conditional expectation, which we try to estimate from the data. The estimate $a_t$ is called the level of the series.
\item $E_t$ is a completely random innovation term
\end{itemize}
Estimation of the level (two notions):
\begin{itemize}
\item Weighted updating: $a_t = \alpha x_t + (1-\alpha)a_{t-1}$
\item Exponential smoothing: $a_t = \displaystyle\sum_{i=0}^{\infty} \alpha(1-\alpha)^i x_{t-i} = \displaystyle\sum_{i=0}^{t-1} \alpha(1-\alpha)^i x_{t-i} + (1-\alpha)^t x_0$
\end{itemize}
\subsubsection{Forecasting and parameter estimation}
The forecast, for any horizon $k > 0$ is:
$$\hat{X}_{n+k;1:n} = a_n$$
Hence, the forecast is given by the current level, and it is constant for all horizons $k$. However, it does depend on the choice of the smoothing parameter $\alpha$. In R, a data-adaptive solution is available by minimizing the SS1PE, the sum of squared 1-step prediction errors:
\begin{itemize}
\item 1-step prediction error: $e_t = x_t - \hat{X}_{t;1:(t-1)} = x_t - a_{t-1}$
\item $\hat{\alpha} = \arg\min_\alpha \displaystyle\sum_{t=2}^n e_t^2$
\end{itemize}
Example in the script, pages 185ff.
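A minimal sketch in R (assuming a stationary series \verb|x|): \verb|HoltWinters()| with trend and seasonality switched off performs simple exponential smoothing and chooses $\alpha$ by minimizing the SS1PE.
\begin{lstlisting}[language=R]
> fit <- HoltWinters(x, beta=FALSE, gamma=FALSE)
> fit$alpha                  # data-adaptively estimated smoothing parameter
> predict(fit, n.ahead=10)   # constant forecast, equal to the current level a_n
\end{lstlisting}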
\subsubsection{Holt-Winters method}
Purpose:
\begin{itemize}
\item is for time series with deterministic trend and/or seasonality
\item this is the additive version, a multiplicative one exists, too
\item again based on iteratively cycling through the equation(s)
\end{itemize}
The method is based on these three \textbf{smoothing equations} with $0 < \alpha, \beta, \gamma < 1$; the idea is to update the previous value with current information:
\begin{align*}
a_t &= \alpha(x_t - s_{t-p}) + (1-\alpha)(a_{t-1} + b_{t-1}) \\
b_t &= \beta(a_t - a_{t-1}) + (1-\beta) b_{t-1} \\
s_t &= \gamma(x_t - a_t) + (1-\gamma) s_{t-p}
\end{align*}
\textbf{Forecasting equation}:
$$\hat{X}_{n+k;1:n} = a_n + k b_n + s_{n+k-p}$$
\begin{lstlisting}[language=R]
> fit <- HoltWinters(log(aww)); fit
Holt-Winters exponential smoothing with trend and additive seasonal component.
Smoothing parameters:
alpha=0.4148028; beta=0; gamma=0.4741967
Coefficients:
a 5.62591329; b 0.01148402
s1 -0.01230437; s2 0.01344762; s3 0.06000025
s4 0.20894897; s5 0.45515787; s6 -0.37315236
s7 -0.09709593; s8 -0.25718994; s9 -0.17107682
s10 -0.29304652; s11 -0.26986816; s12 -0.01984965
\end{lstlisting}
Example in the script, pages 190ff.
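Forecasts from the fitted object (a sketch; \verb|fit| is the Holt-Winters fit of \verb|log(aww)| from above, predicted two years ahead):
\begin{lstlisting}[language=R]
> pred <- predict(fit, n.ahead=24)   # forecasts on the log scale
> plot(fit, pred)                    # fitted values plus forecasts
> exp(pred)                          # naive back-transformation to the original scale
\end{lstlisting}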
\subsection{Forecasting using ETS models}
This is an \textbf{ExponenTial Smoothing} approach that is designed for forecasting time series with various properties (i.e. trend, seasonality, additive/multiplicative, etc.)
\begin{itemize}
\item With the R function \verb|ets()|, an automatic search for the best fitting model among 30 candidates is carried out.
\item The coefficients of these models are (by default) estimated using the Maximum-Likelihood-Principle.
\item Model selection happens using AIC, BIC or (by default) the corrected AICc $=$ AIC $+$ $2(p + 1)(p + 2)/(n - p)$.
\item The function outputs point and interval forecasts and also allows for convenient graphical display of the results.
\end{itemize}
The \verb|ets()| function in R works fully automatic:
\begin{itemize}
\item It recognizes by itself whether a multiplicative model (i.e. a log-transformation behind the scenes) is required or not.
\item It correctly deals with and finds the appropriate model for series with trend or seasonal effect, or both or none of that.
\item From the manual: a 3-character string identifies the model used. The first letter denotes the error type \verb|("A", "M" or "Z")|; \\ the second letter denotes the trend type \verb|("N","A","M" or "Z")|; \\ and the third letter denotes the season type \verb|("N","A","M" or "Z")|. \\ In all cases, \verb|"N"|=none, \verb|"A"|=additive, \verb|"M"|=multiplicative and \verb|"Z"|=automatically selected.
\end{itemize}
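A minimal sketch (assuming a time series \verb|dat| and the \verb|forecast| package):
\begin{lstlisting}[language=R]
> fit <- ets(dat)             # automatic search among the candidate models
> summary(fit)                # reports the chosen model, e.g. ETS(M,A,M)
> plot(forecast(fit, h=24))   # point and interval forecasts
\end{lstlisting}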
\subsection{Using external factors}
Time series forecasting as discussed here is based solely on the past observed data and does not incorporate any external factors (e.g. acquisitions, competitors, market share, ...):
\begin{itemize}
\item The influence of external factors is usually hard to quantify even in the past. If a model can be built, we still need to extrapolate all the external factors into the future.
\item It is usually very difficult to organize reliable data for this.
\item Alternative: generate time series forecasts as shown here.
\item These forecasts are to be seen as a basis for discussion, manual modification is still possible if appropriate.
\end{itemize}
\section{Multivariate time series analysis}
Goal: Infer the relation between two time series
$$X_1 = (X_{1,t}); \; X_2 = (X_{2,t})$$
What is the difference to time series regression?
\begin{itemize}
\item Here, the two series arise „on an equal footing“, and we are interested in the correlation between them.
\item In time series regression, the two (or more) series are causally related and we are interested in inferring that relation. There is one dependent variable and one or several independent variables.
\item The difference is comparable to the difference between correlation and regression.
\end{itemize}
\subsection{Cross covariance}
The cross correlations describe the relation between two time series. However, note that the interpretation is quite tricky! \\
\vspace{.2cm}
usual «within series» covariance:
$$\gamma_{11}(k) = Cov(X_{1,t+k},X_{1,t})$$
$$\gamma_{22}(k) = Cov(X_{2,t+k},X_{2,t})$$
cross covariance, independent of $t$:
$$\gamma_{12}(k) = Cov(X_{1,t+k},X_{2,t})$$
$$\gamma_{21}(k) = Cov(X_{2,t+k},X_{1,t})$$
Also, we have:
$$\gamma_{12}(-k) = Cov(X_{1,t-k},X_{2,t}) = Cov(X_{2,t+k},X_{1,t}) = \gamma_{21}(k)$$
\subsection{Cross correlations}
It suffices to analyze $\gamma_{12}(k)$, and neglect $\gamma_{21}(k)$, but we have to regard both positive and negative lags $k$. We again prefer to work with correlations:
$$\rho_{12}(k) = \frac{\gamma_{12}(k)}{\sqrt{\gamma_{11}(0) \gamma_{22}(0)}}$$
which describe the linear relation between two values of $X_1$ and $X_2$, when the series $X_1$ is $k$ time units ahead.
\subsubsection{Estimation}
Cross covariances and correlations are estimated as follows:
$$\hat{\gamma}_{12}(k) = \frac{1}{n} \sum_t (x_{1,t+k} - \bar{x}_1)(x_{2,t} - \bar{x}_2)$$
and
$$\hat{\rho}_{12}(k) = \frac{\hat{\gamma}_{12}(k)}{\sqrt{\hat{\gamma}_{11}(0) \hat{\gamma}_{22}(0)}}$$
The plot of $\hat{\rho}_{12}(k)$ versus the lag $k$ is called the \textbf{cross correlogram}. It has to be inspected for both positive and negative lags $k$.
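In R, the cross correlogram is produced with \verb|ccf()| (a sketch, assuming two series \verb|x1| and \verb|x2| of equal length); the value at lag $k$ estimates the correlation between \verb|x1[t+k]| and \verb|x2[t]|, i.e. $\rho_{12}(k)$:
\begin{lstlisting}[language=R]
> ccf(x1, x2, ylab="cross-correlation")   # shows both negative and positive lags k
\end{lstlisting}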
\subsection{Cross correlogram}
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{cross-correlogram-example.png}
\label{fig:cross-correlogram-example}
\caption{Example cross correlogram}
\end{figure}
The confidence bounds in the sample cross correlation are only valid in some special cases, i.e. if there is no cross correlation and at least one of the series is uncorrelated. \textbf{Note}: the confidence bounds are often too small!
\subsubsection{Special case I}
We assume that there is no cross correlation for large lags $k$: \\
If $\rho_{12}(j) = 0$ for $|j| \geq m$ we have for $|k|\geq m:$
$$Var(\hat{\rho}_{12}(k)) \approx \frac{1}{n} \sum_{j=-\infty}^\infty (\rho_{11}(j) \rho_{22}(j) + \rho_{12}(j+k) \rho_{12}(j-k))$$
This goes to zero for large $n$ and we thus have consistency. For giving statements about the confidence bounds, we would have to know more about the cross correlations, though.
\subsubsection{Special case II}
There is no cross correlation, but $X_1$ and $X_2$ are both time series that show correlation „within“:
$$Var(\hat{\rho}_{12}(k)) \approx \frac{1}{n} \sum_{j=-\infty}^\infty \rho_{11}(j) \rho_{22}(j)$$
\subsubsection{Special case III}
There is no cross correlation, and $X_1$ is a White Noise series that is independent from $X_2$. Then, the estimation variance simplifies to:
$$Var(\hat{\rho}(k)) \approx \frac{1}{n}$$
Thus, the confidence bounds are valid in this case. \\
\vspace{.2cm}
However, we introduced the concept of cross correlation to infer the relation between correlated series. The trick of the so-called «prewhitening» helps.
\subsection{Prewhitening}
Prewhitening means that the time series is transformed such that it becomes a white noise process, i.e. is uncorrelated. \\
\vspace{.2cm}
We assume that both stationary processes $X_1$ and $X_2$ can be rewritten as follows:
$$U_t = \sum_{i=0}^\infty a_i X_{1,t-i} \; \mathrm{and} \; V_t = \sum_{i=0}^\infty b_i X_{2,t-i}$$
with uncorrelated $U_t$ and $V_t$. Note that this is possible for ARMA($p,q$) processes by writing them as an AR($\infty$). The left hand side of the equation then is the innovation.
\subsubsection{Cross correlation of prewhitened series}
The cross correlation between $U_t$ and $V_t$ can be derived from the one between $X_1$ and $X_2$:
$$\rho_{UV}(k) = \sum_{i=0}^\infty \sum_{j=0}^\infty a_i b_j \rho_{X_1 X_2}(k+i-j)$$
Thus
$$\rho_{UV}(k) = 0 \, \forall \, k \Leftrightarrow \rho_{X_1 X_2}(k) = 0 \, \forall \, k$$
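In practice, prewhitening can be approximated by fitting AR models and cross-correlating their residuals, which play the role of $U_t$ and $V_t$ (a sketch, assuming equally long series \verb|x1| and \verb|x2|):
\begin{lstlisting}[language=R]
> fit1 <- ar(x1); fit2 <- ar(x2)      # AR(p) fits, orders chosen by AIC
> m <- max(fit1$order, fit2$order)
> u <- fit1$resid[(m+1):length(x1)]   # approximately White Noise
> v <- fit2$resid[(m+1):length(x2)]
> ccf(u, v)                           # confidence bounds are now valid
\end{lstlisting}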
\subsection{Vector autoregression (VAR)}
What if we do not have an input/output system, but there are cross correlations and hence influence between both variables? \\
\vspace{.2cm}
A VAR model is a generalization of the univariate AR-model. It has one equation per variable in the system. We keep it simple and consider a 2-variable VAR at lag 1.
$$Y_{1,t} = c_1 + \phi_{11,1} Y_{1,t-1} + \phi_{12,1} Y_{2,t-1} + E_{1,t}$$
$$Y_{2,t} = c_2 + \phi_{21,1} Y_{1,t-1} + \phi_{22,1} Y_{2,t-1} + E_{2,t}$$
Here, $E_1$ and $E_2$ are both White Noise processes, but not strictly uncorrelated among each other. \\
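A sketch of fitting such a model with the \verb|vars| package (an assumption; \verb|y| is a two-column multivariate time series containing $Y_1$ and $Y_2$):
\begin{lstlisting}[language=R]
> library(vars)
> fit <- VAR(y, p=1, type="const")   # one equation per variable, lag 1
> summary(fit)
> plot(predict(fit, n.ahead=20))     # forecasts for both series
\end{lstlisting}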
\section{Spectral analysis}
\begin{tabular}{lp{0.27\textwidth}}
Basis: & Many time series show (stochastic) periodic behavior. The goal of spectral analysis is to understand the
cycles at which highs and lows in the data appear. \\
Idea: & Time series are interpreted as a combination of cyclic components. For observed series, a decomposition into a linear combination of harmonic oscillations is set up and used as a basis for estimating the spectrum. \\
Why: & As a descriptive means, showing the character and the dependency structure within the series. There are
some important applications in engineering, economics and medicine.
\end{tabular}
\subsection{Harmonic oscillations}
The simplest periodic functions are sine and cosine, which we will use as the basis of our decomposition analysis.
$$y(t) = \alpha \cos(2\pi \nu t) + \beta \sin(2\pi \nu t)$$
\begin{itemize}
\item In discrete time, we have aliasing, i.e. some frequencies cannot be distinguished (See \ref{aliasing})
\item The periodic analysis is limited to frequencies between 0 and 0.5, i.e. things we observe at least twice in the series.
\end{itemize}
\subsection{Regression model for decomposition}
We can decompose any time series with a regression model containing sine and cosine terms at the Fourier frequencies.
$$X_t = \alpha_0 + \sum_{k=1}^m (\alpha_k \cos(2\pi \nu_k t) + \beta_k \sin(2\pi \nu_k t)) + E_t$$
where $\nu_k = \frac{k}{n}$ for $k = 1,\dots,m$ with $m = \lfloor n/2 \rfloor$ \\
\vspace{.2cm}
We are limited to this set of frequencies which provides an orthogonal fit. As we are spending $n$ degrees of freedom on $n$ observations, we will have a perfect fit with zero residuals. \\
\vspace{.2cm}
Note that the Fourier frequencies are not necessarily the correct frequencies; there may be aliasing and leakage problems.
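A sketch of this decomposition for an observed series \verb|x| (name assumed): regress on cosine and sine terms at the Fourier frequencies $k/n$.
\begin{lstlisting}[language=R]
> n <- length(x); t <- 1:n; m <- floor(n/2)
> reg <- do.call(cbind, lapply(1:m, function(k)
+          cbind(cos(2*pi*k/n*t), sin(2*pi*k/n*t))))
> fit <- lm(x ~ reg)   # perfect fit: residuals are (numerically) zero
\end{lstlisting}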
\subsection{Aliasing}\label{aliasing}
The aliasing problem is based on the fact that if frequency $\nu$ fits the data, then frequencies $\nu +1, \nu + 2, ...$ will do so, too.
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{aliasing.png}
\label{fig:aliasing}
\caption{Example aliasing}
\end{figure}
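A small numerical illustration (frequencies $0.2$ and $1.2$ are assumed for the example): sampled at integer times, the two oscillations coincide.
\begin{lstlisting}[language=R]
> t <- 1:20
> max(abs(cos(2*pi*0.2*t) - cos(2*pi*1.2*t)))   # practically zero: the frequencies alias
\end{lstlisting}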
\subsection{Periodogram}
If frequency $\nu_k$ is omitted from the decomposition model, the residual sum of squares increases by the amount of:
$$\frac{n}{2} \big( \hat{\alpha}_k^2 + \hat{\beta}_k^2 \big) = 2 I_n (\nu_k), \, \mathrm{for} \, k=1,\dots,m$$
This value measures the importance of frequency $\nu_k$ in the spectral decomposition and is the basis of the raw periodogram, which shows that importance for all Fourier frequencies. \\
\vspace{.2cm}
Note: the period of frequency $\nu_k$ is $1/\nu_k = n/k$. Equivalently, the peaks at this frequency repeat themselves $k$ times over the observed time series.
\begin{lstlisting}[language=R]
> spec.pgram(log(lynx), log="no", type="h")
\end{lstlisting}
\begin{figure}[H]
\centering
\includegraphics[width=.3\textwidth]{periodigram.png}
\label{fig:periodigram}
\caption{Raw Periodogram of log(lynx)}
\end{figure}
\subsection{The spectrum}
The spectrum of a time series process is a function telling us the importance of particular frequencies to the variation of the series.
\begin{itemize}
\item Usually, time series processes have a continuous frequency spectrum and do not only consist of a few single frequencies.
\item For ARMA($p,q$) processes, the spectrum is continuous and there are explicit formulae, depending on the model parameters.
\item Subsequently, we will pursue the difficult task of estimating the spectrum, based on the raw periodogram.
\item There is a 1:1 correspondence between the autocovariance function of a time series process and its spectrum.
\end{itemize}
Our goal is estimating the spectrum of e.g. an ARMA($p,q$). There is quite a discrepancy between the discrete raw periodogram and the continuous spectrum. The following issues arise:
\begin{itemize}
\item The periodogram is noisy, and there may be leakage.
\item The periodogram value at frequency $\nu_k$ is an unbiased estimator of the spectrum value $f(\nu_k)$. However, it is inconsistent due to its variability, owing to the fact that we estimate $n$ periodogram values from $n$ observations.
\item Theory tells us that $\nu_k$ and $\nu_j$ for $k \neq j$ are asymptotically independent. This will be exploited to improve estimation.
\end{itemize}
\subsection{Smoothing the periodogram}
Due to asymptotic independence, unbiasedness and the smooth nature of the spectrum, smoothing approaches help in achieving qualitatively good, consistent spectral estimates. \\
\subsubsection{Running mean estimator}
$$\hat{f}(\nu_j) = \frac{1}{2L+1} \sum_{k=-L}^L I_n(\nu_{j+k})$$
The choice of the bandwidth $B = 2 L / n$ is crucial. If chosen appropriately, the spectral estimates at the Fourier frequencies will be consistent.
\subsubsection{Daniell smoother}
An option for improving the Running Mean is to use weights. They need to be symmetric, decaying and sum up to one. Weighted running mean:
$$\hat{f}(\nu_j) = \sum_{k=-L}^L w_k I_n(\nu_{j+k})$$
The challenge lies in the choice of the weights. The Daniell Smoother is a Weighted Running Mean with $w_k = 1/(2L)$ for $|k| < L$ and $w_k = 1/(4L)$ for $|k| = L$. This is the default in the R function \verb|spec.pgram()| if the argument \verb|spans=2L+1| is set.
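For instance, on the \verb|log(lynx)| periodogram from above (the span values are only chosen for illustration):
\begin{lstlisting}[language=R]
> spec.pgram(log(lynx), spans=c(7,7), log="no")   # repeated modified Daniell smoothers
\end{lstlisting}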
\subsubsection{Tapering}
Tapering is a technique to further improve spectral estimates. The R function \verb|spec.pgram()| applies it by default, and unless you know much better, you should keep it that way.
\begin{itemize}
\item In spectral analysis, a time series is seen as a finite sample with a rectangular window of an infinitely long process.
\item This rectangular window distorts spectral estimation in several ways, among others via leakage.
\item Tapering means that the ends of the time series are altered to mitigate these effects, i.e. they gradually taper down towards zero.
\end{itemize}
\subsection{Model-based spectral estimation}
The fundamental idea for this type of spectral estimate is to fit an AR($p$) model to an observed series and then derive the theoretical spectrum by plugging in the estimated coefficients.
\begin{itemize}
\item This approach is not related to the periodogram based smoothing approaches presented before.
\item By nature, it always provides a smooth spectral estimate.
\item There is an excellent implementation in R: \verb|spec.ar()|.
\end{itemize}
Please note that spectral estimates are usually plotted on the dB-scale which is logarithmic. Also, the R function provides a confidence interval.
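A minimal example, again on \verb|log(lynx)|:
\begin{lstlisting}[language=R]
> spec.ar(log(lynx))   # fits an AR model (order via AIC) and plots its spectrum
\end{lstlisting}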
\section{General concepts}
\subsection{AIC}
@@ -1219,8 +1566,6 @@ $$AICc = AIC + \frac{2(p + q + k + 1)(p + q + k + 2)}{n - p - q - k - 2}$$
\scriptsize
\newpage
\section*{Copyright}
Nearly everything is copy paste from the slides or the script. Copyright belongs to M. Dettling \\
\faGlobeEurope \kern 1em \url{https://n.ethz.ch/~jannisp/ats-zf} \\