For the \textbf{time series process}, we have to assume the following
\subsection{Stochastic Model}
From the lecture
\begin{quote}
A time series process is a set $\{X_t, t \in T\}$ of random variables, where $T$ is the set of times. Each of the random variables $X_t, t \in T$, has a univariate probability distribution $F_t$.
\end{quote}
\begin{itemize}
\item If we exclusively consider time series processes with
equidistant time intervals, we can enumerate $T =\{1,2,3,\dots\}$.
\item An observed time series is a realization of $X =\{X_1 ,..., X_n\}$,
and is denoted with small letters as $x =(x_1 ,... , x_n)$.
\item We have a multivariate distribution, but only 1 observation
(i.e. 1 realization from this distribution) is available. In order
to perform “statistics”, we require some additional structure.
\end{itemize}
\subsection{Stationarity}
\subsubsection{Strict}
For being able to do statistics with time series, we require that the
series “doesn’t change its probabilistic character” over time. This is
mathematically formulated by strict stationarity.
\begin{quote}
A time series $\{X_t, t \in T\}$ is strictly stationary, if the joint distribution of the random vector $(X_t ,... , X_{t+k})$ is equal to the one of $(X_s ,... , X_{s+k})$ for all combinations of $t,s$ and $k$
\end{quote}
\begin{tabular}{ll}
$X_t \sim F$& all $X_t$ are identically distributed \\
$E[X_t]=\mu$& all $X_t$ have identical expected value \\
$Var(X_t)=\sigma^2$& all $X_t$ have identical variance \\
$Cov[X_t,X_{t+h}]=\gamma_h$& autocovariance depends only on lag $h$\\
\end{tabular}

However, strict stationarity is too demanding in practice: even finding evidence for it is difficult. We thus resort to the concept of weak stationarity.
\begin{quote}
A time series $\{X_t , t \in T\}$ is said to be weakly stationary, if \\
$E[X_t]=\mu$\\
$Cov(X_t,X_{t+h})=\gamma_h$, for all lags $h$\\
and thus $Var(X_t)=\sigma^2$
\end{quote}
\subsubsection{Testing stationarity}
\begin{itemize}
\item In time series analysis, we need to verify whether the series has arisen from a stationary process or not. Be careful: stationarity is a property of the process, and not of the data.
\item Treat stationarity as a hypothesis! We may be able to reject it when the data strongly speak against it. However, we can never prove stationarity with data. At best, it is plausible.
\item Formal tests for stationarity do exist. We discourage their use due to their low power for detecting general non-stationarity, as well as their complexity.
\end{itemize}
\textbf{Evidence for non-stationarity}
\begin{itemize}
\item Trend, i.e. non-constant expected value
\item Seasonality, i.e. deterministic, periodical oscillations
\item Non-constant variance, i.e. multiplicative error
\item Non-constant dependency structure
\end{itemize}
\textbf{Strategies for Detecting Non-Stationarity}
\begin{itemize}
\item Time series plot
\subitem - non-constant expected value (trend/seasonal effect)
\subitem - changes in the dependency structure
\subitem - non-constant variance
\item Correlogram (presented later...)
\subitem - non-constant expected value (trend/seasonal effect)
\subitem - changes in the dependency structure
\end{itemize}
A (sometimes) useful trick, especially when working with the correlogram, is to split up the series into two or more parts and to produce plots for each of the pieces separately.
Such linear transformations will not change the appearance of the series. All derived results (i.e. autocorrelations, models, forecasts) will be equivalent. Hence, we are free to perform linear transformations whenever it seems convenient.
\subsection{Log-Transformation}
Transforming $x_1,...,x_n$ to $g(x_1),...,g(x_n)$
$$g(\cdot)=\log(\cdot)$$
\textbf{Note:}
\begin{itemize}
\item If a time series gets log-transformed, we will study its character and its dependencies on the transformed scale. This is also where we will fit time series models.
\item If forecasts are produced, one is most often interested in the value on the original scale.
\end{itemize}
\subsubsection{When to apply log-transformation}
As we argued above, a log-transformation of the data often facilitates estimation, fitting and interpretation. When is it indicated to log-transform the data?
\begin{itemize}
\item If the time series is on a relative scale, i.e. where an absolute increment changes its meaning with the level of the series (e.g. 10 $\rightarrow$ 20 is not the same as 100 $\rightarrow$ 110).
\item If the time series is on a scale which is left closed with value zero, and right open, i.e. cannot take negative values.
\item If the marginal distribution of the time series (i.e. when analyzed with a histogram) is right-skewed.
\end{itemize}
Box-Cox transformations, in contrast to $\log$, have no easy interpretation. Hence, they are mostly applied if utterly necessary or if the principal goal is (black-box) forecasting.
\begin{itemize}
\item In practice, one often prefers the $\log$ if $|\lambda| < 0.3$ or does w/o transformation if $|\lambda-1| < 0.3$.
\item For an unbiased forecast, correction is needed!
\end{itemize}
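For reference, a sketch of the transformation family and of the forecast correction mentioned above. The formulas are the usual textbook forms and are not taken from the lecture; $\lambda$ is the Box-Cox parameter, and $\hat{\mu}$, $\hat{\sigma}^2$ denote the mean and variance of a forecast on the log scale, which is assumed to be Gaussian there.
$$g_\lambda(x)=\begin{cases}\dfrac{x^\lambda-1}{\lambda} & \text{if } \lambda\neq0\\[1ex] \log(x) & \text{if } \lambda=0\end{cases}$$
For $\lambda=0$, back-transforming with $\exp(\hat{\mu})$ yields the median on the original scale, whereas the expected value is
$$E[X]=\exp\left(\hat{\mu}+\hat{\sigma}^2/2\right)$$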
\subsection{Decomposition of time series}
\subsubsection{Additive decomposition}
trend + seasonal effect + remainder:
$$X_t = m_t + s_t + R_t$$
Does not occur very often in reality!
\subsubsection{Multiplicative decomposition}
In most real-world series, the additive decomposition does not apply, as seasonal and random variation increase with the level. It is often better to use a multiplicative model
$$X_t = m_t \cdot s_t \cdot R_t$$
which becomes additive after a log-transformation: $\log(X_t)=\log(m_t)+\log(s_t)+\log(R_t)$.

We assume a series with an additive trend, but no seasonal variation. We can write: $X_t = m_t + R_t$. If we perform differencing and assume a slowly-varying trend with $m_t \approx m_{t+1}$, we obtain
$$Y_t = X_t - X_{t-1}\approx R_t - R_{t-1}$$
\begin{itemize}
\item Note that $Y_t$ are the observation-to-observation changes in the series, but no longer the observations or the remainder.
\item This may (or may not) remove trend/seasonality, but does not yield estimates for $m_t$ and $s_t$ , and not even for $R_t$.
\item For a slow, curvy trend, the mean is zero: $E[Y_t]=0$
\end{itemize}
It is important to know that differencing creates artificial new dependencies that are different from the original ones. For illustration, consider a stochastically independent remainder:
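A short worked derivation, under the assumption that $R_t$ is i.i.d. with variance $\sigma^2$ (a symbol introduced here only for illustration) and that the trend is differenced away, so that $Y_t = R_t - R_{t-1}$:
\begin{align*}
Var(Y_t) &= Var(R_t)+Var(R_{t-1}) = 2\sigma^2 \\
Cov(Y_t,Y_{t-1}) &= Cov(R_t - R_{t-1},\, R_{t-1}-R_{t-2}) = -Var(R_{t-1}) = -\sigma^2 \\
Cor(Y_t,Y_{t-1}) &= \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}
\end{align*}
For lags $k \geq 2$ the differences share no common term, so their correlation is zero. The independent remainder has thus been turned into a series with a lag-1 autocorrelation of $-1/2$.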
The “normal” differencing from above managed to remove any linear trend from the data. In case of polynomial trend, that is no longer true. But we can take higher-order differences:
$$X_t =\alpha+\beta_1 t +\beta_2 t^2+ R_t$$
where $R_t$ is stationary
\begin{align*}
Y_t &= (1-B)^2 X_t \\
&= (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) \\
&= R_t - 2R_{t-1} + R_{t-2} + 2\beta_2
\end{align*}
Where $B$ denotes the \textbf{backshift-operator}: $B(X_t)= X_{t-1}$\\
\vspace{.2cm}
We basically get the difference of the differences
\subsubsection{Removing seasonal trends}
Time series with seasonal effects can be made stationary through differencing by comparing to the previous periods’ value.
$$Y_t =(1-B^p)X_t = X_t - X_{t-p}$$
\begin{itemize}
\item Here, $p$ is the frequency of the series.
\item A potential trend which is exactly linear will be removed by the above form of seasonal differencing.
\item In practice, trends are rarely linear but slowly varying: $m_t \approx m_{t-1}$. However, here we compare $m_t$ with $m_{t-p}$, which means that seasonal differencing often fails to remove trends completely.
\end{itemize}
\subsubsection{Pros and cons of Differencing}
+ trend and seasonal effect can be removed \\
+ procedure is very quick and very simple to implement \\
- $\hat{m_t}, \hat{s_t}, \hat{R_t}$ are not known, and cannot be visualised \\
- resulting time series will be shorter than the original \\
- differencing leads to strong artificial dependencies \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ is not easily possible
\subsection{Smoothing and filtering}
In the absence of a seasonal effect, the trend of a non-stationary time series can be determined by applying any additive, linear filter. We obtain a new time series $\hat{m_t}$, representing the trend (running mean):
$$\hat{m_t}=\sum_{i=-p}^q a_i X_{t+i}$$
\begin{itemize}
\item the window, defined by $p$ and $q$, may or may not be symmetric.
\item the weights, given by $a_i$, may or may not be uniformly distributed.
\item most popular is to rely on $p = q$ and $a_i =1/(2p+1)$.
\item other smoothing procedures can be applied, too.
\end{itemize}
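In the presence of a seasonal effect, smoothing approaches are still valid for estimating the trend. We have to make sure that the sum is taken over an entire season, i.e. for monthly data:
$$\hat{m_t}=\frac{1}{12}\left(\frac{1}{2}X_{t-6}+X_{t-5}+\dots+X_{t+5}+\frac{1}{2}X_{t+6}\right)$$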
An estimate of the seasonal effect $s_t$ at time $t$ can be obtained by:
$$\hat{s_t}= x_t -\hat{m_t}$$
We basically subtract the trend from the data.
\subsubsection{Estimating remainder}
$$\hat{R_t}= x_t -\hat{m_t}-\hat{s_t}$$
\begin{itemize}
\item The smoothing approach is based on estimating the trend first, and then the seasonality after removal of the trend.
\item The generalization to periods other than $p =12$, i.e. monthly data, is straightforward. Just choose a symmetric window and use uniformly distributed coefficients that sum up to 1.
\item The sum over all seasonal effects will often be close to zero. Usually, one centers the seasonal effects to mean zero.
\item This procedure is implemented in R with \verb|decompose()|. Note that it only works for seasonal series where at least two full periods were observed!
\end{itemize}
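The procedure above can be tried out as follows. This is a minimal sketch that assumes the built-in monthly \verb|co2| series as example data (not a dataset from the lecture); the object names are chosen for illustration only.
\begin{lstlisting}[language=R]
# Classical (additive) decomposition of a monthly series
fit <- decompose(co2, type = "additive")

m.hat <- fit$trend      # running-mean trend estimate
s.hat <- fit$seasonal   # seasonal effect, repeated over all years
r.hat <- fit$random     # remainder x_t - m.hat - s.hat

# The 12 seasonal effects are centered to mean zero
sum(fit$figure)

# The running mean cannot be computed at the boundaries
sum(is.na(m.hat))

plot(fit)
\end{lstlisting}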
\subsubsection{Pros and cons of filtering and smoothing}
+ trend and seasonal effect can be estimated \\
+ $\hat{m_t}, \hat{s_t}, \hat{R_t}$ are explicitly known and can be visualised \\
+ procedure is transparent, and simple to implement \\
- resulting time series will be shorter than the original \\
- the running mean is not the very best smoother \\
- extrapolation of $\hat{m_t}, \hat{s_t}$ is not entirely obvious \\
- seasonal effect is constant over time \\
\subsection{STL-Decomposition}
\textit{Seasonal-Trend Decomposition Procedure by LOESS}
\begin{itemize}
\item is an iterative, non-parametric smoothing algorithm
\item yields a simultaneous estimation of trend and seasonal effect
\item similar to what was presented above, but \textbf{more robust}!
\end{itemize}
+ very simple to apply \\
+ very illustrative and quick \\
+ seasonal effect can be constant or smoothly varying \\
- model free, extrapolation and forecasting is difficult \\
\subsubsection{Using STL in R}
\verb|stl(x, s.window = ...)|, where \verb|s.window| is the span (in lags) of the loess window for seasonal extraction, which should be odd and at least 7
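
A possible call, sketched under the assumption that the built-in \verb|co2| series serves as example data. The value 13 for \verb|s.window| is an arbitrary illustrative choice.
\begin{lstlisting}[language=R]
# Seasonal effect forced to be constant over time
fit.per <- stl(log(co2), s.window = "periodic")

# Seasonal effect allowed to vary smoothly (odd span, at least 7)
fit.var <- stl(log(co2), s.window = 13)

# Columns: seasonal, trend, remainder
head(fit.var$time.series)
plot(fit.var)
\end{lstlisting}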
\subsection{Parsimonious Decomposition}
The goal is to use a simple model that features a linear trend plus a cyclic seasonal effect and a remainder term:
$$X_t =\beta_0+\beta_1 t +\beta_2\sin(2\pi t)+\beta_3\cos(2\pi t)+ R_t$$
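Such a model can be fitted by ordinary least squares. The following sketch assumes the built-in monthly \verb|co2| series as example data and measures time in years, so that the sine and cosine terms have a period of one year.
\begin{lstlisting}[language=R]
tnum <- as.numeric(time(co2))   # time in years
y    <- as.numeric(co2)

# Linear trend plus one harmonic for the seasonal effect
fit <- lm(y ~ tnum + sin(2 * pi * tnum) + cos(2 * pi * tnum))
coef(fit)

# Estimated remainder
r.hat <- resid(fit)
\end{lstlisting}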
\subsection{Flexible Decomposition}
We add more flexibility (i.e. degrees of freedom) to the trend and seasonal components. We will use a GAM for this decomposition, with monthly dummy variables for the seasonal effect.
$$X_t = f(t)+\alpha_{i(t)}+ R_t$$
where $t \in \{1,2,\dots,128\}$ and $i(t)\in \{1,2,\dots,12\}$\\
\vspace{.2cm}
It is not a good idea to use more than quadratic polynomials. They usually fit poorly and are erratic near the boundaries.
\subsubsection{Example in R}
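\begin{lstlisting}[language=R]
library(mgcv)
tnum <- as.numeric(time(maine))  # `maine`: monthly ts, assumed to be in the workspace
mm <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
mn <- factor(mm[cycle(maine)], levels = mm)   # month label of each observation
fit <- gam(as.numeric(maine) ~ s(tnum) + mn)  # assumed fit: smooth trend + monthly dummies
\end{lstlisting}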
Autocorrelation is a dimensionless measure for the strength of the linear association between the random variables $X_{t+k}$ and $X_t$. \\
Autocorrelation estimation in a time series is based on lagged data pairs; the standard implementation uses a plug-in estimator. \\
\vspace{.2cm}
\textbf{Example}\\
We assume $\rho(k)=0.7$
\begin{itemize}
\item The square of the autocorrelation, i.e. $\rho(k)^2=0.49$, is the proportion of variability explained by the linear association between $X_t$ and $X_{t+k}$.
\item Thus, in our example, $X_{t+k}$ accounts for roughly 49\% of the variability observed in random variable $X_t$. Only roughly because the world is seldom exactly linear.
\item From this we can also conclude that any $|\rho(k)| < 0.4$ is not a strong association, i.e. has a small effect on the next observation only.
\end{itemize}
Create a plot of $(x_t, x_{t+k})\,\forall\, t =1,...,n-k$, compute the usual Pearson correlation coefficient of these pairs and use it as an estimate $\tilde{\rho}(k)$ of the autocorrelation.
\begin{figure}[h]
\centering
\caption{Lagged scatterplot estimation vs. plug-in estimation}
\label{fig:lagged-scatterplot-vs-plug-in}
\end{figure}
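Both estimators can be computed by hand. The sketch below assumes the logged built-in \verb|lynx| series as example data; the variable names are illustrative.
\begin{lstlisting}[language=R]
x <- as.numeric(log(lynx))
n <- length(x)
k <- 1

# Lagged scatterplot and Pearson correlation of the pairs
plot(x[1:(n - k)], x[(1 + k):n], xlab = "x_t", ylab = "x_{t+k}")
rho.tilde <- cor(x[1:(n - k)], x[(1 + k):n])

# Plug-in estimator: global mean, n - k products in the numerator,
# all n squared deviations in the denominator
d <- x - mean(x)
rho.hat <- sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)

# acf() implements the plug-in estimator
all.equal(rho.hat, acf(x, plot = FALSE)$acf[k + 1])
\end{lstlisting}
For small lags the two estimates are usually close; they differ more for large $k$.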
\subsection{Important points on ACF estimation}
\begin{itemize}
\item Correlations measure linear association and usually fail if there are non-linear associations between the variables.
\item The bigger the lag $k$ for which $\rho(k)$ is estimated, the fewer data pairs remain. Hence the higher the lag, the bigger the variability in $\hat{\rho}(k)$ .
\item To avoid spurious autocorrelation, the plug-in approach shrinks $\hat{\rho}(k)$ for large $k$ towards zero. This creates a bias, but pays off in terms of mean squared error.
\item Autocorrelations are only computed and inspected for lags up to $10\log_{10}(n)$, where they have less bias/variance.
\end{itemize}
Even for an i.i.d. series $X_t$ without autocorrelation, i.e. $\rho(k)=0\,\forall\, k$, the estimates will be different from zero: $\hat{\rho}(k)\neq0$.\\
\textbf{Question}: Which $\hat{\rho}(k)$ are significantly different from zero?
\begin{itemize}
\item Under the null hypothesis of an i.i.d. series, a 95\% acceptance region for the null is given by the interval $\pm1.96/\sqrt{n}$
\item For any stationary series, $\hat{\rho}(k)$ within the confidence bands are considered to be different from 0 only by chance, while those outside are considered to be truly different from zero.
\end{itemize}
\textbf{Type I Errors}\\
For iid series, we need to expect 5\% of type I errors, i.e. $\hat{\rho}(k)$ that go beyond the confidence bands by chance. \\
\textbf{Non i.i.d. series}\\
The confidence bands are asymptotic for i.i.d. series. Real finite length non-i.i.d. series have different (unknown) properties.
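\subsection{Ljung-Box test}
The Ljung-Box approach tests the null hypothesis that a number of autocorrelation coefficients are simultaneously equal to zero. \\
Thus, it tests for significant autocorrelation in a series. The test statistic is:
$$Q(h)= n(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}(k)^2}{n-k}$$
Under the null hypothesis, $Q(h)$ is approximately $\chi^2_h$-distributed.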
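
As an illustration, the statistic can be computed by hand and compared with \verb|Box.test()|. The sketch assumes the logged built-in \verb|lynx| series and an arbitrary choice of $h = 10$ lags.
\begin{lstlisting}[language=R]
x <- as.numeric(log(lynx))
n <- length(x)
h <- 10

# Plug-in autocorrelations for lags 1..h
rho <- acf(x, lag.max = h, plot = FALSE)$acf[-1]

# Ljung-Box statistic: n (n + 2) * sum of rho_k^2 / (n - k)
Q <- n * (n + 2) * sum(rho^2 / (n - 1:h))
p.val <- 1 - pchisq(Q, df = h)

# Same test via the built-in function
bt <- Box.test(x, lag = h, type = "Ljung-Box")
all.equal(Q, unname(bt$statistic))
\end{lstlisting}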
The estimates $\hat{\rho}(k)$ are sensitive to outliers. They can be diagnosed using the lagged scatterplot, where every single outlier appears twice. \\
\vspace{.2cm}
\textbf{Some basic strategies for dealing with outliers}
\begin{itemize}
\item if it is a bad data point: delete the observation
\item most (if not all) R functions can deal with missing data
\item if complete data are required, replace missing values with
\begin{itemize}
\item global mean of the series
\item local mean of the series, e.g. $\pm3$ observations
\item fit a time series model and predict the missing value
\end{itemize}
\end{itemize}
\subsection{Properties of estimated ACF}
\begin{itemize}
\item Appearance of the series $\Rightarrow$ Appearance of the ACF \\ Appearance of the series $\nLeftarrow$ Appearance of the ACF
\item The compensation issue: \\$\sum_{k=1}^{n-1}\hat{\rho}(k)=-1/2$\\ All estimable autocorrelation coefficients sum up to -1/2
\item For large lags $k$ , there are only few data pairs for estimating $\rho(k)$. This leads to higher variability and hence the plug-in estimates are shrunken towards zero.
\end{itemize}
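The compensation issue can be checked numerically. This sketch assumes the logged built-in \verb|lynx| series; the identity holds for any series when the plug-in estimator is used.
\begin{lstlisting}[language=R]
x <- as.numeric(log(lynx))
n <- length(x)

# All estimable plug-in autocorrelations, lags 1..(n - 1)
rho <- acf(x, lag.max = n - 1, plot = FALSE)$acf[-1]

# Equals -1/2 up to floating-point error
sum(rho)
\end{lstlisting}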
\subsection{Application: Variance of the arithmetic mean}
We need to estimate the mean of a realized/observed time series. We would like to attach a standard error.
\begin{itemize}
\item If we estimate the mean of a time series without taking into account the dependency, the standard error will be flawed.
\item This leads to misinterpretation of tests and confidence intervals and therefore needs to be corrected.
\item The standard error of the mean can both be over-, but also underestimated. This depends on the ACF of the series.
\item Given a time series $X_t$, the partial autocorrelation of lag $k$ is the autocorrelation between $X_t$ and $X_{t+k}$ with the linear dependence of $X_{t+1}$ through to $X_{t+k-1}$ removed.
\item One can draw an analogy to regression. The ACF measures the „simple“ dependence between $X_t$ and $X_{t+k}$, whereas the PACF measures that dependence in a „multiple“ fashion.\footnote{See e.g. \href{https://n.ethz.ch/~jannisp/download/Mathematik-IV-Statistik/zf-statistik.pdf}{\textit{Mathematik IV}}}
\end{itemize}
$$\pi_1=\rho_1$$
$$\pi_2=\frac{\rho_2-\rho_1^2}{1-\rho_1^2}$$
For AR(1) models, we have $\pi_2=0$, because $\rho_2=\rho_1^2$, i.e. there is no conditional relation between $(X_t, X_{t+2} | X_{t+1})$.
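
The two formulas can be verified on estimated autocorrelations. The sketch assumes the logged built-in \verb|lynx| series as example data and compares the hand-computed values with \verb|pacf()|.
\begin{lstlisting}[language=R]
x <- as.numeric(log(lynx))

# Estimated autocorrelations at lags 1 and 2
rho <- acf(x, lag.max = 2, plot = FALSE)$acf[-1]

pi1 <- rho[1]
pi2 <- (rho[2] - rho[1]^2) / (1 - rho[1]^2)

# Estimated partial autocorrelations from R
p.hat <- as.numeric(pacf(x, lag.max = 2, plot = FALSE)$acf)
rbind(by.hand = c(pi1, pi2), pacf = p.hat)
\end{lstlisting}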
\begin{quote}
A time series $(W_1, W_2,..., W_n)$ is a \textbf{White Noise} series if the random variables $W_1 , W_2,...$ are i.i.d. with mean zero.
\end{quote}
This implies that all $W_t$ have the same variance $\sigma_W^2$ and
$$Cov(W_i,W_j)=0\,\forall\, i \neq j$$
Thus, there is no autocorrelation either: $\rho_k =0\,\forall\, k \neq0$. \\
\vspace{.2cm}
If in addition, the variables also follow a Gaussian distribution, i.e. $W_t \sim N(0, \sigma_W^2)$, the series is called \textbf{Gaussian White Noise}. The term White Noise is due to the analogy to white light (all wavelengths are equally distributed).
\subsection{Autoregressive models (AR)}
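In an $AR(p)$ process, the random variable $X_t$ depends on an autoregressive linear combination of the preceding $X_{t-1},..., X_{t-p}$, plus a „completely independent“ term called innovation $E_t$.
$$X_t =\alpha_1 X_{t-1}+\dots+\alpha_p X_{t-p}+ E_t$$
Using the backshift operator, this can be written as
$$(1-\alpha_1 B-\dots-\alpha_p B^p)X_t = E_t, \quad\text{or in short}\quad \Phi(B)X_t = E_t$$
Here, $\Phi(B)$ is called the characteristic polynomial of the $AR(p)$. It determines most of the relevant properties of the process.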
\subsubsection{AR(1)-Model}\label{ar-1}
$$X_t =\alpha_1 X_{t-1}+ E_t$$
where $E_t$ is i.i.d. with $E[E_t]=0$ and $Var(E_t)=\sigma_E^2$. We also require that $E_t$ is independent of $X_s, s<t$\\
\vspace{.2cm}
Under these conditions, $E_t$ is a causal White Noise process, or an innovation. Be aware that this is stronger than the i.i.d. requirement: not every i.i.d. process is an innovation and that property is absolutely central to $AR(p)$-modelling.
\subsubsection{AR(p)-Models and Stationarity}
$AR(p)$-models must only be fitted to stationary time series. Any potential trends and/or seasonal effects need to be removed first. We will also make sure that the processes are stationary. \\
\vspace{.2cm}
\textbf{Conditions}
Any stationary $AR(p)$-process meets
\begin{itemize}
\item$E[X_t]=\mu=0$
\item all (complex) roots of the characteristic equation $1-\alpha_1 z -\alpha_2 z^2- ... -\alpha_p z^p =0$ exceed one in absolute value (verify with \verb|polyroot()| in R)
\end{itemize}
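The root condition can be checked as follows. The helper \verb|is.stationary()| and the coefficient values are hypothetical and serve only as an illustration.
\begin{lstlisting}[language=R]
# polyroot() expects the coefficients in increasing order:
# 1 - alpha_1 z - ... - alpha_p z^p  ->  c(1, -alpha)
is.stationary <- function(alpha) {
  all(abs(polyroot(c(1, -alpha))) > 1)
}

is.stationary(c(0.5, 0.3))   # TRUE:  roots approx. 1.17 and -2.84
is.stationary(c(0.5, 0.6))   # FALSE: one root approx. 0.94
\end{lstlisting}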
\subsection{Yule-Walker equations}
We observe that there exists a linear equation system built up from the $AR(p)$-coefficients and the ACF-coefficients of up to lag $p$. \\
\vspace{.2cm}
We can use these equations for fitting an $AR(p)$-model:
\begin{enumerate}
\item Estimate the ACF from a time series
\item Plug-in the estimates into the Yule-Walker-Equations
\item The solution are the $AR(p)$-coefficients
\end{enumerate}
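The three steps can be sketched in a few lines. The example assumes the logged built-in \verb|lynx| series and order $p = 2$; the result is compared with \verb|ar.yw()|.
\begin{lstlisting}[language=R]
x <- as.numeric(log(lynx))
p <- 2

# 1) Estimate the ACF for lags 0..p
rho <- acf(x, lag.max = p, plot = FALSE)$acf[, 1, 1]

# 2) Yule-Walker system: R %*% alpha = (rho_1, ..., rho_p),
#    where R is the Toeplitz matrix of (rho_0, ..., rho_{p-1})
R <- toeplitz(rho[1:p])

# 3) Solve for the AR coefficients
alpha <- solve(R, rho[2:(p + 1)])

# Comparison with the built-in routine
alpha.yw <- as.numeric(ar.yw(x, order.max = p, aic = FALSE)$ar)
rbind(by.hand = alpha, ar.yw = alpha.yw)
\end{lstlisting}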
\subsection{Fitting AR(p)-models}
This involves 3 crucial steps:
\begin{enumerate}
\item Model Identification
\begin{itemize}
\item is an AR process suitable, and what is $p$?
\item will be based on ACF/PACF-Analysis
\end{itemize}
\item Parameter Estimation
\begin{itemize}
\item Regression approach
\item Yule-Walker-Equations
\item and more (MLE, Burg-Algorithm)
\end{itemize}
\item Residual Analysis
\end{enumerate}
\subsubsection{Model identification}
\begin{itemize}
\item$AR(p)$ processes are stationary
\item For all AR(p) processes, the ACF decays exponentially quickly, or is an exponentially damped sinusoid.
\item For all $AR(p)$ processes, the PACF is equal to zero for all lags $k > p$. The behavior before lag $p$ can be arbitrary.
\end{itemize}
If what we observe is fundamentally different from the above, it is unlikely that the series was generated from an $AR(p)$-process. We thus need other models, maybe more sophisticated ones.
\subsubsection{Parameter estimation}
Observed time series are rarely centered. Then, it is inappropriate to fit a pure $AR(p)$ process. All R routines by default assume the shifted process $Y_t = m + X_t$. Thus, we face the problem:
$$(Y_t - m)=\alpha_1(Y_{t-1}-m)+\dots+\alpha_p(Y_{t-p}-m)+ E_t$$
The goal is to estimate the global mean $m$, the AR-coefficients $\alpha_1 ,..., \alpha_p$, and some parameters defining the distribution of the innovation $E_t$. We usually assume a Gaussian, hence this is $\sigma_E^2$.\\
\vspace{.2cm}
We will discuss 4 methods for estimating the parameters:\\
\vspace{.2cm}
\textbf{OLS Estimation}\\
If we rethink the previously stated problem, we recognize a multiple linear regression problem without
intercept on the centered observations. What we do is:
\begin{enumerate}
\item Estimate $\hat{m}=\bar{y}$ and compute $x_t = y_t - \hat{m}$
\item Run a regression without intercept on $x_t$ to obtain $\hat{\alpha_1},\dots,\hat{\alpha_p}$
\item For $\hat{\sigma}_E$, take the residual standard error from the output; its square estimates $\sigma_E^2$
\end{enumerate}
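The steps above can be sketched as follows, assuming the logged built-in \verb|lynx| series and order $p = 2$ as an example. In practice, \verb|ar.ols()| does this work.
\begin{lstlisting}[language=R]
y <- as.numeric(log(lynx))
p <- 2

# 1) Center the series
m.hat <- mean(y)
x <- y - m.hat

# 2) Lagged design matrix: columns x_t, x_{t-1}, ..., x_{t-p}
X <- embed(x, p + 1)
fit <- lm(X[, 1] ~ X[, -1] - 1)   # regression without intercept
alpha.hat <- unname(coef(fit))

# 3) Residual standard error as estimate of sigma_E
sigma.hat <- summary(fit)$sigma

alpha.hat
sigma.hat^2
\end{lstlisting}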
\vspace{.2cm}
\textbf{Burg's algorithm}\\
While OLS works, the first $p$ instances are never evaluated as responses. This is cured by Burg’s algorithm, which uses the property of time-reversal in stochastic processes. We thus evaluate the RSS of forward and backward prediction errors:
$$\sum_{t=p+1}^{n}\left[\left(X_t-\sum_{k=1}^{p}\alpha_k X_{t-k}\right)^2+\left(X_{t-p}-\sum_{k=1}^{p}\alpha_k X_{t-p+k}\right)^2\right]$$
In contrast to OLS, there is no explicit solution and numerical optimization is required. This is done with a recursive method called the Durbin-Levinson algorithm (implemented in R).
\begin{lstlisting}[language=R]
llynx <- log(lynx)  # assumed definition: logged built-in lynx series
f.burg <- ar.burg(llynx, aic=FALSE, order.max=2)
\end{lstlisting}
\vspace{.2cm}
\textbf{Yule-Walker Equations}\\
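The Yule-Walker equations yield a linear equation system (LES) that connects the true ACF with the true AR-model parameters. We plug in the estimated ACF coefficients:
$$\hat{\rho}(k)=\hat{\alpha}_1\hat{\rho}(k-1)+\dots+\hat{\alpha}_p\hat{\rho}(k-p), \quad k =1,\dots,p$$
and solve the LES to obtain the AR-parameter estimates.\\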
\vspace{.2cm}
In R we can use \verb|ar.yw()| \\
\vspace{.2cm}
\textbf{Maximum-likelihood-estimation}\\
Idea: Determine the parameters such that, given the observed time series $(y_1 ,\dots, y_n)$, the resulting model is the most plausible (i.e. the most likely) one. \\
This requires the choice of a probability model for the time series. By assuming Gaussian innovations, $E_t \sim N (0,\sigma_E^2)$ , any $AR(p)$ process has a multivariate normal distribution:
$$Y =(Y_1,\dots,Y_n)\sim N(m \cdot\vec{1},V)$$
with $V$ depending on $\vec{\alpha},\sigma_E^2$\\
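MLE then provides simultaneous estimates by optimizing the Gaussian likelihood:
$$L(\vec{\alpha}, m,\sigma_E^2)=(2\pi)^{-n/2}\,|V|^{-1/2}\exp\left(-\tfrac{1}{2}(y - m\cdot\vec{1})^T V^{-1}(y - m\cdot\vec{1})\right)$$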
\begin{itemize}
\item All 4 estimation methods are asymptotically equivalent and even on finite samples, the differences are usually small.
\item All 4 estimation methods are non-robust against outliers and perform best on data that are approximately Gaussian.
\item Function \verb|arima()| provides standard errors for $\hat{m}; \hat{\alpha}_1 ,\dots, \hat{\alpha}_p$ so that statements about significance become feasible and confidence intervals for the parameters can be built.
\item\verb|ar.ols()|, \verb|ar.yw()| and \verb|ar.burg()| allow for convenient choice of the optimal model order $p$ using the AIC criterion. Among these methods, \verb|ar.burg()| is usually preferred.
\end{itemize}
We can check these with the following plots (in R: \verb|tsdisplay(resid(fit))|):
\begin{itemize}
\item Time-series plot of $\hat{E}_t$
\item ACF/PACF-plot of $\hat{E}_t$
\item QQ-plot of $\hat{E}_t$
\end{itemize}
The time-series should look like white-noise \\
\vspace{.2cm}
\textbf{Alternative}\\
Using \verb|checkresiduals()|: \\
A convenient alternative for residual analysis is this function from \verb|library(forecast)|. It only works correctly when fitting with \verb|arima()|, though.
\begin{lstlisting}[language=R]
> f.arima <- arima(log(lynx), c(11,0,0))
> checkresiduals(f.arima)
Ljung-Box test
data: Residuals from ARIMA(11,0,0) with non-zero mean
Q* = 4.7344, df = 3, p-value = 0.1923
Model df: 12. Total lags used: 15
\end{lstlisting}
The function carries out a Ljung-Box test to check whether residuals are still correlated. It also provides graphical output.

As a last check before a model is called appropriate, it can be beneficial to simulate from the estimated coefficients and to compare the resulting series visually (without any prejudices) to the original one.
\begin{itemize}
\item The simulated series should "look like" the original. If this is not the case, the model failed to capture (some of) the properties in the original data.
\item A larger or more sophisticated model may be necessary in cases where simulation does not recapture the features in the original data.
\end{itemize}
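A minimal sketch of such a simulation check. It assumes an $AR(2)$ fitted to the logged built-in \verb|lynx| series; the order is chosen for illustration only and is not a recommended model.
\begin{lstlisting}[language=R]
set.seed(1)
y <- log(lynx)

# Fit an AR(2) with global mean
fit <- arima(y, order = c(2, 0, 0))

# Simulate a series of the same length from the fitted model:
# AR part and innovation sd from the fit, then shift by the mean
sim <- coef(fit)["intercept"] +
  arima.sim(model = list(ar = coef(fit)[c("ar1", "ar2")]),
            n = length(y), sd = sqrt(fit$sigma2))

# Compare original and simulated series visually
par(mfrow = c(2, 1))
plot(y, main = "Original")
plot(sim, main = "Simulated")
\end{lstlisting}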
\subsection{Moving average models (MA)}
Whereas for $AR(p)$-models, the current observation of a series is written as a linear combination of its own past, $MA(q)$-models can be seen as an extension of the "pure" process
$$X_t = E_t$$
in the sense that the last $q$ innovation terms $E_{t-1}, E_{t-2},\dots, E_{t-q}$ are included, too. We call this a moving average model:
$$X_t = E_t +\beta_1 E_{t-1}+\beta_2 E_{t-2}+\dots+\beta_q E_{t-q}$$
Thus, we have a «cut-off» situation, i.e. a similar behavior to the one of the PACF in an $AR(1)$ process. This is why and how $AR(1)$ and $MA(1)$ are complementary.
\subsubsection{Invertibility}
Without additional assumptions, the ACF of an $MA(1)$ does not allow identification of the generating model.
\begin{itemize}
\item An $MA(1)$-, or in general an $MA(q)$-process is said to be invertible if the roots of the characteristic polynomial $\Theta(B)$ exceed one in absolute value.
\item Under this condition, there exists only one $MA(q)$-process for any given ACF. But please note that any $MA(q)$ is stationary, no matter if it is invertible or not.
\item The condition on the characteristic polynomial translates to restrictions on the coefficients. For any MA(1)-model, $|\beta_1| < 1$ is required.
\item R function \verb|polyroot()| can be used for finding the roots.
\end{itemize}
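The identification problem can be illustrated numerically. The coefficients $\beta_1 = 0.5$ and $\beta_1 = 2$ are hypothetical values chosen so that both $MA(1)$ models share the same ACF.
\begin{lstlisting}[language=R]
# Theoretical ACF of two MA(1) models: rho(1) = beta / (1 + beta^2)
acf.a <- ARMAacf(ma = 0.5, lag.max = 3)
acf.b <- ARMAacf(ma = 2,   lag.max = 3)
rbind(acf.a, acf.b)          # identical, rho(1) = 0.4 in both cases

# Roots of the characteristic polynomial 1 + beta_1 z
abs(polyroot(c(1, 0.5)))     # 2   -> invertible
abs(polyroot(c(1, 2)))       # 0.5 -> not invertible
\end{lstlisting}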
\textbf{Practical importance:}\\
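The condition of invertibility is not only a technical issue, but has important practical meaning. All invertible $MA(q)$ processes can be expressed in terms of an $AR(\infty)$, e.g. for an $MA(1)$:
$$X_t = E_t +\beta_1 E_{t-1}\quad\Rightarrow\quad X_t =\beta_1 X_{t-1}-\beta_1^2 X_{t-2}+\beta_1^3 X_{t-3}-\dots+ E_t$$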
The simplest idea is to exploit the relation between model parameters and autocorrelation coefficients («Yule-Walker») after the global mean $m$ has been estimated and subtracted. \\
In contrast to the Yule-Walker method for AR(p) models, this yields an inefficient estimator that generally generates poor results and hence should not be used in practice.
\vspace{.2cm}
It is better to use \textbf{Conditional sum of squares}:\\
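This is based on the fundamental idea of expressing $\sum E_t^2$ in terms of $X_1 ,..., X_n$ and $\beta_1 ,\dots, \beta_q$, as the innovations themselves are unobservable. This is possible for any invertible $MA(q)$, e.g. the $MA(1)$:
$$E_t = X_t -\beta_1 X_{t-1}+\beta_1^2 X_{t-2}-\dots+(-\beta_1)^{t-1}X_1+(-\beta_1)^t E_0$$
Conditioning on $E_0=0$, the sum $\sum E_t^2$ can be minimized numerically with respect to the parameters.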
\begin{table}[h]
\centering
\caption{Comparison of $AR$-, $MA$-, $ARMA$-models}
\end{table}
\begin{itemize}
\item In an $ARMA(p,q)$, depending on the coefficients of the model, either the $AR(p)$ or the $MA(q)$ part can dominate the ACF/PACF characteristics.
\end{itemize}
\subsubsection{Fitting ARMA-models to data}
See $AR$- and $MA$-modelling
\subsubsection{Identification of order (p,q)}
May be more difficult in reality than in theory:
\begin{itemize}
\item We only have one single realization of the time series with finite length. The ACF/PACF plots are not «facts», but are estimates with uncertainty. The superimposed cut-offs may be difficult to identify from the ACF/PACF plots.
\item$ARMA(p,q)$ models are parsimonious, but can usually be replaced by high-order pure $AR(p)$ or $MA(q)$ models. This is not a good idea in practice, however!
\item In many cases, an AIC grid search over all $ARMA(p,q)$ with $p+q < 5$ may help to identify promising models.
\end{itemize}
In R, finding the AIC-minimizing $ARMA(p,q)$-model is convenient with the use of \verb|auto.arima()| from \verb|library(forecast)|. \\
\vspace{.2cm}
\textbf{Beware}: Handle this function with care! It will always identify a «best fitting» $ARMA(p,q)$, but there is no guarantee that this model provides an adequate fit! \\
\vspace{.2cm}
Using \verb|auto.arima()| should always be complemented by visual inspection of the time series for assessing stationarity, verifying the ACF/PACF plots for a second thought on suitable models. Finally, model diagnostics with the usual residual plots will decide whether the model is useful in practice.
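
A hand-written AIC grid search makes the idea explicit. This is a sketch that assumes the logged built-in \verb|lynx| series and a small grid with $p, q \leq 2$; it uses only base R and does not reproduce the search strategy of \verb|auto.arima()|.
\begin{lstlisting}[language=R]
y <- log(lynx)

# All combinations of p and q in 0..2
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  # Some fits may fail to converge, so errors are caught
  fit <- tryCatch(
    arima(y, order = c(grid$p[i], 0, grid$q[i])),
    error = function(e) NULL
  )
  if (!is.null(fit)) grid$aic[i] <- fit$aic
}

# Candidate models, ordered by AIC
grid[order(grid$aic), ]
\end{lstlisting}
The AIC-minimizing model is only a candidate; residual analysis still decides whether it is adequate.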
Be careful: this assumes that the errors $E_t$ are uncorrelated (often not the case)! \\
\vspace{.2cm}
With correlated errors, the estimates $\hat{\beta}_j$ are still unbiased, but more efficient estimators than OLS exist. The standard errors are wrong, often underestimated, causing spurious significance. $\rightarrow$ GLS!
\begin{itemize}
\item The series $Y_t, x_{t1} ,\dots, x_{tp}$ can be stationary or non-stationary.
\item It is crucial that there is no feedback from the response $Y_t$ to the predictor variables $x_{t1},\dots, x_{tp}$ , i.e. we require an input/output system.
\item$E_t$ must be stationary and independent of $x_{t1},\dots, x_{tp}$, but may be Non-White-Noise with some serial correlation.
\end{itemize}
\subsubsection{Finding correlated errors}
\begin{enumerate}
\item Start by fitting an OLS regression and analyze residuals
\item Continue with a time series plot of OLS residuals
\item Also analyze ACF and PACF of OLS residuals
\end{enumerate}
\subsubsection{Durbin-Watson test}
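The Durbin-Watson approach is a test for autocorrelated errors in regression modeling based on the test statistic:
$$\hat{D}=\frac{\sum_{t=2}^{n}(r_t - r_{t-1})^2}{\sum_{t=1}^{n}r_t^2}$$
where $r_t$ are the OLS residuals. Values close to 2 indicate no first-order autocorrelation.

Package \verb|nlme| has function \verb|gls()|. It only works if the correlation structure of the errors is provided. This has to be determined from the residuals of an OLS regression first.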
The output contains the regression coefficients and their standard errors, as well as the AR-coefficients plus some further information about the model (Log-Likelihood, AIC, ...).
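
The workflow can be sketched on simulated data. Everything in the example is an assumption made for illustration: a linear regression with $AR(1)$ errors, coefficient 0.6, and $n = 100$ observations.
\begin{lstlisting}[language=R]
library(nlme)
set.seed(1)

# Simulated regression with AR(1)-correlated errors
n <- 100
x <- 1:n
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))
dat <- data.frame(x = x, y = 2 + 0.1 * x + e)

# Step 1: OLS fit and residual inspection
fit.ols <- lm(y ~ x, data = dat)
r <- resid(fit.ols)
D <- sum(diff(r)^2) / sum(r^2)   # Durbin-Watson statistic by hand
acf(r)

# Step 2: GLS with an AR(1) correlation structure for the errors
fit.gls <- gls(y ~ x, data = dat, correlation = corAR1())
summary(fit.gls)
\end{lstlisting}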
\subsection{Missing input variables}
\begin{itemize}
\item Correlated errors in (time series) regression problems are often caused by the absence of crucial input variables.
\item In such cases, it is much better to identify the not-yet-present variables and include them into the regression model.
\item However, in practice this isn't always possible, because these crucial variables may be unavailable.
\item\textbf{Note:} Time series regression methods for correlated errors such as GLS can be seen as a sort of emergency kit for the case where the non-present variables cannot be added. If you can do without them, even better!
\end{itemize}
\section{ARIMA and SARIMA}
\textbf{Why?}\\
Many time series in practice show trends and/or seasonality. While we can decompose them and describe the stationary part, it might be attractive to directly model them. \\
\vspace{.2cm}
\textbf{Advantages}\\
Forecasting is convenient and AIC-based decisions for the presence of trend/seasonality become feasible. \\
\vspace{.2cm}
\textbf{Disadvantages}\\
Lack of transparency for the decomposition; forecasting has somewhat the flavor of a black-box method. \\
\subsection{ARIMA(p,d,q)-models}
ARIMA models are aimed at describing series that have a trend which can be removed by differencing, and where the differences can be described with an ARMA($p,q$)-model. \\
\begin{enumerate}
\item Fit the model using the \verb|arima()| procedure. This can be done on the original series by setting $d$ accordingly, or on the differences, by setting $d =0$ and argument \verb|include.mean=FALSE|.
\item Analyze the residuals; these must look like White Noise. If several competing models are appropriate, use AIC to decide for the winner.
\end{enumerate}
\textbf{Example}\footnote{Full example in script pages 117ff}{}\\
Plausible models for the logged oil prices after inspection of ACF/PACF of the differenced series (that seems stationary): ARIMA(1,1,1) or ARIMA(2,1,1)
\begin{lstlisting}[language=R]
> arima(lop, order=c(1,1,1))
Coefficients:
ar1 ma1
-0.2987 0.5700
s.e. 0.2009 0.1723
sigma^2 = 0.006642: ll = 261.11, aic = -518.22
\end{lstlisting}
\subsubsection{Rewriting ARIMA as Non-Stationary ARMA}
Any ARIMA(p,d,q) model can be rewritten in the form of a non-stationary ARMA((p+d),q) process. This provides some deeper insight, especially for the task of forecasting.
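
A short worked example for the simplest case, an ARIMA(1,1,0) with AR-coefficient $\alpha_1$ (the choice of model is ours, for illustration):
\begin{align*}
(1-\alpha_1 B)(1-B)X_t &= E_t \\
\left(1-(1+\alpha_1)B+\alpha_1 B^2\right)X_t &= E_t \\
X_t &= (1+\alpha_1)X_{t-1}-\alpha_1 X_{t-2}+ E_t
\end{align*}
This has the form of an $AR(2)$, but its characteristic polynomial has the root $z =1$, which does not exceed one in absolute value. The process is therefore non-stationary.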
\subsection{SARIMA(p,d,q)(P,D,Q)$^S$}
We have learned that it is also possible to use differencing for obtaining a stationary series out of one that features both trend and seasonal effect.
\begin{enumerate}
\item Removing the seasonal effect by differencing at lag 12 \\\begin{center}$Y_t = X_t - X_{t-12}=(1-B^{12})X_t$\end{center}
\item Usually, further differencing at lag 1 is required to obtain a series that has constant global mean and is stationary \\\begin{center}$Z_t = Y_t - Y_{t-1}=(1-B)Y_t =(1-B)(1-B^{12})X_t = X_t - X_{t-1}- X_{t-12}+ X_{t-13}$\end{center}
\end{enumerate}
The stationary series $Z_t$ is then modelled with some special kind of ARMA($p,q$) model. \\
\vspace{.2cm}
\textbf{Definition}\\
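A series $X_t$ follows a SARIMA($p,d,q$)($P,D,Q$)$^S$-process if the following equation holds:
$$\Phi(B)\,\Phi_S(B^S)\,Z_t =\Theta(B)\,\Theta_S(B^S)\,E_t$$
Here, $\Phi$ and $\Theta$ are the AR and MA polynomials of orders $p$ and $q$, $\Phi_S$ and $\Theta_S$ are their seasonal counterparts of orders $P$ and $Q$, and the series $Z_t$ originates from $X_t$ after appropriate seasonal and trend differencing: $Z_t =(1-B)^d (1-B^S)^D X_t$\\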
\vspace{.2cm}
In most practical cases, differencing orders $d = D =1$ are sufficient. The orders $p,q,P,Q$ are chosen via ACF/PACF or via AIC-based decisions.
\subsubsection{Fitting SARIMA}
\begin{enumerate}
\item Perform seasonal differencing of the data. The lag $S$ is determined by the period. Order $D =1$ is mostly enough.
\item Decide if additional differencing at lag 1 is required for stationarity. If not, then $d =0$. If yes, then try $d =1$.
\item Analyze ACF/PACF of $Z_t$ to determine $p,q$ for the short-term dependency and $P,Q$ for the dependency at multiples of the period.
\item Fit the model using \verb|arima()| by setting \verb|order=c(p,d,q)| and \verb|seasonal=c(P,D,Q)| according to your choices.
\item Check the accuracy of the model by residual analysis. The residuals must look like White Noise and be approximately Gaussian.
\end{enumerate}
\end{enumerate}
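The steps above can be sketched as follows. The built-in \verb|AirPassengers| series and the orders $(0,1,1)(0,1,1)^{12}$ are assumptions chosen for illustration, not a recommendation for other data.
\begin{lstlisting}[language=R]
x <- log(AirPassengers)           # monthly data, period S = 12
# Steps 1 and 2: seasonal differencing, then differencing at lag 1
z <- diff(diff(x, lag=12))
# Step 3: inspect the dependency structure of the differenced series
acf(z, lag.max=48)
pacf(z, lag.max=48)
# Step 4: fit on the original (logged) series, arima() differences itself
fit <- arima(x, order=c(0,1,1), seasonal=c(0,1,1))
fit
# Step 5: residual diagnostics
tsdiag(fit)
qqnorm(resid(fit)); qqline(resid(fit))
\end{lstlisting}
Since \verb|x| is a monthly \verb|ts| object, \verb|arima()| takes the period from its frequency.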
\section{ARCH/GARCH-models}
The basic assumption for ARCH/GARCH models is as follows:
$$X_t =\mu_t + E_t$$
where $E_t =\sigma_t W_t$ and $W_t$ is white noise. \\
Here, both the conditional mean and variance are non-trivial.
We can determine the order of an ARCH($p$) process by analyzing ACF and PACF of the squared time series data. We then again look for an exponential decay in the ACF and a cut-off in the PACF.
\subsubsection{Fitting an ARCH(2)-model}
The simplest option for fitting an ARCH($p$) in R is to use function \verb|garch()| from \verb|library(tseries)|. Be careful: its argument \verb|order=c(q,p)| takes the GARCH order first and the ARCH order second, which differs from most of the literature.
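A minimal sketch of such a fit, assuming the \verb|tseries| package is installed. The data are simulated from an ARCH(2) with arbitrarily chosen coefficients \verb|a|, so that the true order is known.
\begin{lstlisting}[language=R]
library(tseries)
set.seed(1)
n <- 1100
a <- c(0.1, 0.5, 0.2)             # illustrative ARCH(2) coefficients
e <- rnorm(n)
x <- double(n)
x[1:2] <- rnorm(2, sd=sqrt(a[1]/(1 - a[2] - a[3])))
for (i in 3:n) {                  # generate the ARCH(2) process
  x[i] <- e[i]*sqrt(a[1] + a[2]*x[i-1]^2 + a[3]*x[i-2]^2)
}
x <- ts(x[101:1100])              # drop the burn-in period
# Order determination: ACF/PACF of the squared series
acf(x^2); pacf(x^2)
# GARCH order 0 first, ARCH order 2 second
fit <- garch(x, order=c(0,2), trace=FALSE)
summary(fit)
\end{lstlisting}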
\subsection{Sources of uncertainty in forecasting}
\begin{enumerate}
\item Does the data generating process from the past also apply in the future? Or are there major disruptions and discontinuities?
\item Is the model we chose correct? This applies both to the class of models (i.e. ARMA($p,q$)) as well as to the order of the model.
\item Are the model coefficients (e.g. $\alpha_1 ,..., \alpha_p; \beta_1 ,..., \beta_q; \sigma_E^2 ; m$) well estimated and accurate? How much do they differ from the «truth»?
\item The stochastic variability coming from the innovation $E_t$.
\end{enumerate}
Due to the major uncertainties that are present, forecasting will usually only work reasonably on a short-term basis.
\subsection{Basics}
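Probabilistic principle for deriving point forecasts: the forecast is the conditional expectation given the observed series,
$$\hat{X}_{n+k}= E[X_{n+k} | X_1, \dots, X_n]$$
and the forecast uncertainty is governed by the conditional variance $Var[X_{n+k} | X_1, \dots, X_n]$.
\begin{itemize}
\item These principles provide a generic setup, but they are only useful and practicable under additional assumptions and have to be operationalized for every time series model/process.
\item For stationary AR(1) processes with normally distributed innovations, we can apply the generic principles with relative ease and derive formulae for the point forecast and the prediction interval.
\end{itemize}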
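For the AR(1) case the result can be written down explicitly. A sketch, assuming the model $X_t -\mu=\alpha_1 (X_{t-1}-\mu)+ E_t$ with Gaussian innovations of variance $\sigma_E^2$ and known parameters:
$$E[X_{n+k} | X_1, \dots, X_n]=\mu+\alpha_1^k (x_n -\mu)$$
$$Var[X_{n+k} | X_1, \dots, X_n]=\sigma_E^2 \sum_{j=0}^{k-1}\alpha_1^{2j}$$
A 95\% prediction interval is then the point forecast $\pm 1.96$ times the square root of this variance. For $k \rightarrow\infty$, the forecast converges to $\mu$ and the variance to $\sigma_E^2/(1-\alpha_1^2)$, the variance of the process itself.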
\subsection{AR(p) forecasting}
The principles are the same: the point forecast and the prediction interval are based on
$$E[X_{n+k} | X_1, \dots, X_n]$$
and
$$Var[X_{n+k} | X_1, \dots, X_n]$$
The computations are a bit more complicated, but do not yield major further insight. We thus skip them and only state the result for a non-shifted AR($p$): \\
$$\hat{X}_{n+k}=\alpha_1 \hat{X}_{n+k-1}+\dots+\alpha_p \hat{X}_{n+k-p}$$
If an observed value for $\hat{X}_{n+k-t}$, $t =1,\dots,p$, is available, we plug it in. Otherwise, the forecasted value is used. Hence, the forecasts for horizons $k > 1$ are determined recursively.
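In R, the recursion is handled by \verb|predict()|. A minimal sketch, using the built-in \verb|lh| series and an AR(3) as assumptions for illustration:
\begin{lstlisting}[language=R]
fit  <- arima(lh, order=c(3,0,0))       # AR(3) with global mean
pred <- predict(fit, n.ahead=10)        # point forecasts and std. errors
# Pointwise 95% prediction interval under Gaussian innovations
lower <- pred$pred - 1.96*pred$se
upper <- pred$pred + 1.96*pred$se
plot(lh, xlim=c(1, 58), ylim=range(lh, lower, upper))
lines(pred$pred, col="red")
lines(lower, lty=2); lines(upper, lty=2)
\end{lstlisting}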
\subsubsection{Measuring forecast error}
\textbf{When on absolute scale (no log-transformation)}:
\begin{itemize}
\item If a time series gets log-transformed, we study its character and its dependencies on the transformed scale. This is also where we fit time series models.
\item If forecasts are produced, one is most often interested in the value on the original scale. Now, caution is needed: \\$\exp(\hat{x}_t)$ yields a biased forecast, the median of the forecast distribution. This is the value that 50\% of the realizations will lie above, and 50\% will be below. For an unbiased forecast, i.e. obtaining the mean, we need:
$$\exp\left(\hat{x}_t +\frac{\hat{\sigma}_k^2}{2}\right)$$
where $\hat{\sigma}_k^2$ is equal to the $k$-step forecast variance.
\end{itemize}
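A minimal sketch of the back-transformation, assuming a Gaussian forecast distribution on the log scale. The \verb|AirPassengers| data and the model orders are chosen for illustration only.
\begin{lstlisting}[language=R]
fit  <- arima(log(AirPassengers), order=c(0,1,1), seasonal=c(0,1,1))
pred <- predict(fit, n.ahead=12)        # forecasts on the log scale
fc.median <- exp(pred$pred)             # median on the original scale
fc.mean   <- exp(pred$pred + pred$se^2/2)   # mean on the original scale
cbind(median=fc.median, mean=fc.mean)
\end{lstlisting}
The correction grows with the forecast horizon, because the forecast variance does.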
\subsubsection{Remarks}
\begin{itemize}
\item AR($p$) processes have a Markov property. Given the model parameters, we only need to know the last $p$ observations in the series to compute the forecast and prognosis interval.
\item The prediction intervals are only valid on a pointwise basis, and they generally only cover the uncertainty coming from the innovations, but not from other sources. Hence, they are generally too narrow.
\item Retaining the final part of the series, and predicting it with several competing models may give hints which one yields the best forecasts. This can be an alternative approach for choosing the model order $p$.
\end{itemize}
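The last remark can be sketched as an out-of-sample comparison. The built-in \verb|lh| series, the split after observation 40 and the candidate orders are assumptions made for illustration.
\begin{lstlisting}[language=R]
train <- window(lh, end=40)             # used for fitting
test  <- window(lh, start=41)           # retained final part
# Mean squared forecast error for AR(1), AR(2) and AR(3)
msfe <- sapply(1:3, function(p) {
  fit <- arima(train, order=c(p,0,0))
  fc  <- predict(fit, n.ahead=length(test))$pred
  mean((test - fc)^2)
})
names(msfe) <- paste0("AR(", 1:3, ")")
msfe
\end{lstlisting}
With such a short test period, the ranking is only a hint and not a reliable decision rule.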
\subsection{Forecasting MA(q) and ARMA(p,q)}
\begin{itemize}
\item Point and interval forecasts will again, as for AR($p$), be derived from the theory of conditional mean and variance.
\item The derivation is more complicated, as it involves the latent innovation terms $e_n, e_{n-1},e_{n-2} ,...$ or, alternatively, the unobserved time series instances $x_{-\infty},...,x_{-1},x_0$.
\item Under invertibility of the MA($q$)-part, the forecasting problem can be approximately but reasonably solved by choosing starting values $x_{-\infty}=...=x_{-1}=x_0=0$.
\end{itemize}
\subsubsection{MA(1) example}
\begin{itemize}
\item We have seen that for all non-shifted MA($1$)-processes, the $k$-step forecast for all $k>1$ is trivial and equal to $0$.
\item In case of $k=1$, we obtain for the MA($1$)-forecast $\hat{X}_{n+1}=\beta_1 e_n$, where the innovation $e_n$ is latent and has to be reconstructed from the observed series.
\item With MA($q$) models, all forecasts for horizons $k>q$ will be trivial and equal to zero. This is not the case for $k \leq q$.
\item We encounter the same difficulties as with MA($1$) processes. By conditioning on the infinite past, rewriting the MA($q$) as an AR($\infty$) and choosing initial zero values for times $t \leq 0$, the forecasts can be computed.
\item We omit the precise formulae here and refer to the general results for ARMA($p,q$), from which the solution for pure MA($q$) can be obtained.
\item In R, functions \verb|predict()| and \verb|forecast()| implement all this!
\end{itemize}
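A minimal sketch of the MA(1) behavior on simulated data; the coefficient $0.7$ and the series length are arbitrary choices.
\begin{lstlisting}[language=R]
set.seed(1)
x   <- arima.sim(list(ma=0.7), n=200)   # non-shifted MA(1)
fit <- arima(x, order=c(0,0,1), include.mean=FALSE)
fc  <- predict(fit, n.ahead=3)$pred
fc    # only the 1-step forecast is non-trivial
\end{lstlisting}
The forecasts for horizons $k =2$ and $k =3$ are zero, because the model is fitted without a mean.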
\subsection{Forecasting with trend and seasonality}
Time series with a trend and/or seasonal effect can either be predicted after decomposing or with exponential smoothing. It is also very easy and quick to predict from a SARIMA model.
\begin{itemize}
\item The ARIMA/SARIMA model is fitted in R as usual. Then, we can simply employ the \verb|predict()| command and obtain the forecast plus its standard error, from which a prediction interval follows.
\item Technically, the forecast comes from the stationary ARMA model that is obtained after differencing the series.
\item Finally, these forecasts need to be integrated again. This procedure has a bit of a black-box feel.
\end{itemize}
ARIMA processes are aimed at unit-root processes which are non-stationary, but do not necessarily feature a deterministic (e.g. linear) trend. We observe the following behavior:
\begin{itemize}
\item If $d =1$, the forecast from an ARIMA($p,1,q$) converges to a constant value for $k \rightarrow\infty$. This level is determined by the last observations, since a non-stationary series has no global mean to revert to.
\item ARIMA($p,1,q$) prediction intervals do not converge to a constant size for $k \rightarrow\infty$, but keep increasing in width.
\item In particular, an ARIMA forecast always fails to pick up a linear trend in the data. If such a thing exists, we need to add a so-called drift term.
\end{itemize}
\subsubsection{ARIMA with drift term}
To capture a trend we can use
\begin{lstlisting}[language=R]
> library(forecast)  # provides Arima() with argument include.drift
> fit <- Arima(dat, order=c(1,0,1), include.drift=TRUE, include.mean=FALSE)
\end{lstlisting}
\begin{itemize}
\item When SARIMA models are used for forecasting, they will pick up both the latest seasonality and trend in the data.
\item Due to the double differencing that is usually applied, there is no need/option to include a drift term for covering trends.
\item The prediction intervals also cover the effect of trend and seasonality. They become (much) wider for longer forecasting horizons.
\item We have no control over the trend forecast and cannot intervene in it. This leaves room for decomposition-based forecasting, which offers more freedom.
\end{itemize}
The principle for forecasting time series that are decomposed into trend, seasonal effect and remainder is:
\begin{enumerate}
\item\textbf{Stationary Remainder}\\ Is usually modelled with an ARMA($p,q$), so we can generate a time series forecast with the methodology from before.
\item\textbf{Seasonal Effect}\\ Is assumed as remaining “as is”, or “as it was last” (in the case of evolving seasonal effect) and extrapolated.
\item\textbf{Trend}\\ Is either extrapolated linearly, or sometimes even manually.
\end{enumerate}
\subsubsection{Using R}
A much simpler forecasting procedure for decomposed series is available in R. Just three lines of code are enough.
\begin{itemize}
\item The time series is decomposed and deseasonalized.
\item The last observed year of the seasonality is extrapolated.
\item The \verb|seasadj()| series is automatically forecasted using a) exponential smoothing, b) ARIMA, c) a random walk with drift or any custom method.
\end{itemize}
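The original three lines are not reproduced here. One way to obtain such a forecast, assuming the \verb|forecast| package and using the built-in \verb|AirPassengers| data as a stand-in, is:
\begin{lstlisting}[language=R]
library(forecast)
# Decompose the logged series into trend, seasonal effect and remainder
fit <- stl(log(AirPassengers), s.window="periodic")
# Forecast the seasonally adjusted series with ARIMA, re-add seasonality
fc  <- forecast(fit, method="arima", h=24)
plot(fc)
\end{lstlisting}
The argument \verb|method| also accepts \verb|"ets"|, \verb|"naive"| and \verb|"rwdrift"|.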
\subsection{Exponential smoothing}
\subsubsection{Simple exponential smoothing}
This is a quick approach for estimating the current level of a time series, as well as for forecasting future values. It works for any stationary time series \textbf{without trend and seasonality.}
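A minimal sketch using \verb|HoltWinters()| from base R with the trend and seasonal components switched off. The built-in \verb|lh| series is an assumption chosen for illustration.
\begin{lstlisting}[language=R]
# Level update: a_t = alpha*x_t + (1 - alpha)*a_(t-1)
fit <- HoltWinters(lh, beta=FALSE, gamma=FALSE)
fit$alpha                      # smoothing parameter found by optimization
fc <- predict(fit, n.ahead=5)  # forecasts equal the last estimated level
fc
plot(fit)                      # observed series and smoothed level
\end{lstlisting}
All forecasts are identical, because the method only carries the current level forward.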
\begin{lstlisting}[language=R]
Holt-Winters exponential smoothing with trend and additive seasonal component.
Smoothing parameters:
alpha=0.4148028; beta=0; gamma=0.4741967
Coefficients:
a 5.62591329; b 0.01148402
s1 -0.01230437; s2 0.01344762; s3 0.06000025
s4 0.20894897; s5 0.45515787; s6 -0.37315236
s7 -0.09709593; s8 -0.25718994; s9 -0.17107682
s10 -0.29304652; s11 -0.26986816; s12 -0.01984965
\end{lstlisting}
Example in script pages 190ff
\subsection{Forecasting using ETS models}
This is an \textbf{ExponenTial Smoothing} approach that is designed for forecasting time series with various properties (i.e. trend, seasonality, additive/multiplicative, etc.)
\begin{itemize}
\item With the R function \verb|ets()|, an automatic search for the best fitting model among 30 candidates is carried out.
\item The coefficients of these models are (by default) estimated using the Maximum-Likelihood-Principle.
\item Model selection happens using AIC, BIC or (by default) the corrected $AICc = AIC +2(p +1)(p +2)/(n - p)$.
\item The function outputs point and interval forecasts and also allows for convenient graphical display of the results.
\end{itemize}
The \verb|ets()| function in R works fully automatically:
\begin{itemize}
\item It recognizes by itself whether a multiplicative model (i.e. a log-transformation behind the scenes) is required or not.
\item It correctly deals with and finds the appropriate model for series with trend or seasonal effect, or both or none of that.
\item From the manual: a 3-character string identifies the model used. The first letter denotes the error type \verb|("A", "M" or "Z")|; \\ the second letter denotes the trend type \verb|("N","A","M" or "Z")|; \\ and the third letter denotes the season type \verb|("N","A","M" or "Z")|. \\ In all cases, \verb|"N"|=none, \verb|"A"|=additive, \verb|"M"|=multiplicative and \verb|"Z"|=automatically selected.
\end{itemize}
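A minimal sketch of the automatic procedure, assuming the \verb|forecast| package. The built-in \verb|AirPassengers| series is used for illustration.
\begin{lstlisting}[language=R]
library(forecast)
fit <- ets(AirPassengers)      # default model="ZZZ": all types selected
fit$method                     # 3-character code of the chosen model
fc  <- forecast(fit, h=24)     # point and interval forecasts
plot(fc)
\end{lstlisting}
To fix a component by hand, pass the code via the argument \verb|model|, e.g. \verb|model="AAN"| for additive errors, additive trend and no season.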
\subsection{Using external factors}
Time series forecasting as we discuss it is based solely on the past observed data and does not incorporate any external factors (e.g. acquisition, competitors, market share, ...):
\begin{itemize}
\item The influence of external factors is usually hard to quantify even in the past. If a model can be built, we still need to extrapolate all the external factors into the future.
\item It is usually very difficult to organize reliable data for this.
\item Alternative: generate time series forecasts as shown here.
\item These forecasts are to be seen as a basis for discussion, manual modification is still possible if appropriate.
\end{itemize}
\section{Multivariate time series analysis}
Goal: Infer the relation between two time series
$$X_1=(X_{1,t}); \; X_2=(X_{2,t})$$
What is the difference to time series regression?
\begin{itemize}
\item Here, the two series arise ``on an equal footing'', and we are interested in the correlation between them.
\item In time series regression, the two (or more) series are causally related and we are interested in inferring that relation. There is one dependent variable and one or several independent variables.
\item The difference is comparable to the difference between correlation and regression.
\end{itemize}
\subsection{Cross covariance}
The cross correlations describe the relation between two time series. However, note that the interpretation is quite tricky! \\
It suffices to analyze $\gamma_{12}(k)$, and neglect $\gamma_{21}(k)$, but we have to regard both positive and negative lags $k$. We again prefer to work with correlations:
$$\rho_{12}(k)=\frac{\gamma_{12}(k)}{\sqrt{\gamma_{11}(0)\gamma_{22}(0)}}$$
The confidence bounds in the sample cross correlation are only valid in some special cases, i.e. if there is no cross correlation and at least one of the series is uncorrelated. \textbf{Note}: the confidence bounds are often too narrow!
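Sample cross correlations are computed with \verb|ccf()|. A minimal sketch, using the built-in series \verb|mdeaths| and \verb|fdeaths| as an assumption for illustration:
\begin{lstlisting}[language=R]
# Cross correlations for negative and positive lags
r <- ccf(mdeaths, fdeaths, lag.max=24, plot=FALSE)
plot(r, main="mdeaths vs. fdeaths")
r$acf[r$lag == 0]              # estimate at lag 0
\end{lstlisting}
Both series are strongly autocorrelated, so the plotted confidence bounds should not be trusted here.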
\subsubsection{Special case I}
We assume that there is no cross correlation for large lags $k$: \\
If $\rho_{12}(j)=0$ for $|j| \geq m$ we have for $|k|\geq m:$
This goes to zero for large $n$ and we thus have consistency. For giving statements about the confidence bounds, we would have to know more about the cross correlations, though.
\subsubsection{Special case II}
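There is no cross correlation, but $X_1$ and $X_2$ are both time series that show correlation ``within'':
$$Var(\hat{\rho}_{12}(k))\approx\frac{1}{n}\sum_{j=-\infty}^{\infty}\rho_{11}(j)\rho_{22}(j)$$
where $\rho_{11}$ and $\rho_{22}$ are the autocorrelations of $X_1$ and $X_2$.
\subsubsection{Special case III}
There is no cross correlation, and $X_1$ is a White Noise series that is independent from $X_2$. Then, the estimation variance simplifies to:
$$Var(\hat{\rho}_{12}(k))\approx\frac{1}{n}$$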
Thus, the confidence bounds are valid in this case. \\
\vspace{.2cm}
However, we introduced the concept of cross correlation to infer the relation between correlated series. The trick of the so-called «prewhitening» helps.
\subsection{Prewhitening}
Prewhitening means that the time series is transformed such that it becomes a white noise process, i.e. is uncorrelated. \\
\vspace{.2cm}
We assume that both stationary processes $X_1$ and $X_2$ can be rewritten as follows:
$$U_t =\sum_{i=0}^{\infty} a_i X_{1,t-i}\quad\mathrm{and}\quad V_t =\sum_{i=0}^{\infty} b_i X_{2,t-i}$$
with uncorrelated $U_t$ and $V_t$. Note that this is possible for ARMA($p,q$) processes by writing them as an AR($\infty$). The left hand side of the equation then is the innovation.
\subsubsection{Cross correlation of prewhitened series}
The cross correlation between $U_t$ and $V_t$ can be derived from the one between $X_1$ and $X_2$:
$$\rho_{UV}(k)=0\,\forall\, k \Leftrightarrow\rho_{X_1 X_2}(k)=0\,\forall\, k$$
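A minimal sketch of prewhitening on simulated data. Both series are independent AR(1) processes by construction, and fitting an AR(1) with \verb|arima()| is an assumption made for simplicity; in practice the model has to be chosen for each series.
\begin{lstlisting}[language=R]
set.seed(42)
x1 <- arima.sim(list(ar=0.8), n=200)    # two independent AR(1) series
x2 <- arima.sim(list(ar=0.8), n=200)
# Prewhitening: the residuals estimate the innovations U_t and V_t
u <- resid(arima(x1, order=c(1,0,0)))
v <- resid(arima(x2, order=c(1,0,0)))
par(mfrow=c(1,2))
ccf(x1, x2, main="Original series")
ccf(u, v, main="Prewhitened series")
\end{lstlisting}
For the prewhitened series, the confidence bounds of \verb|ccf()| are valid, because at least one series is uncorrelated.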
\subsection{Vector autoregression (VAR)}
What if we do not have an input/output system, but there are cross correlations and hence influence between both variables? \\
\vspace{.2cm}
A VAR model is a generalization of the univariate AR-model. It has one equation per variable in the system. We keep it simple and consider a 2-variable VAR at lag 1.
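In generic notation (the coefficient names $c_i$ and $\phi_{ij}$ are an assumption made here), such a model reads:
$$X_{1,t}= c_1 +\phi_{11}X_{1,t-1}+\phi_{12}X_{2,t-1}+ E_{1,t}$$
$$X_{2,t}= c_2 +\phi_{21}X_{1,t-1}+\phi_{22}X_{2,t-1}+ E_{2,t}$$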
Here, $E_1$ and $E_2$ are both White Noise processes, but not necessarily uncorrelated with each other. \\
\section{Spectral analysis}
\begin{tabular}{lp{0.27\textwidth}}
Basis: & Many time series show (stochastic) periodic behavior. The goal of spectral analysis is to understand the
cycles at which highs and lows in the data appear. \\
Idea: & Time series are interpreted as a combination of cyclic components. For observed series, a decomposition into a linear combination of harmonic oscillations is set up and used as a basis for estimating the spectrum. \\
Why: & As a descriptive means, showing the character and the dependency structure within the series. There are
some important applications in engineering, economics and medicine.
\end{tabular}
\subsection{Harmonic oscillations}
The simplest periodic functions are sine and cosine, which we will use as the basis of our decomposition analysis.
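A sketch of this decomposition, with coefficient names $a_0, a_k, b_k$ chosen here for illustration: the observed series is written as a linear combination of harmonic oscillations at the Fourier frequencies,
$$x_t = a_0 +\sum_{k=1}^{m}\left(a_k \cos(2\pi\nu_k t)+ b_k \sin(2\pi\nu_k t)\right)$$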
where $\nu_k =\frac{k}{n}$ for $k =1,\dots,m$ with $m =\lfloor n/2\rfloor$\\
\vspace{.2cm}
We are limited to this set of frequencies which provides an orthogonal fit. As we are spending $n$ degrees of freedom on $n$ observations, we will have a perfect fit with zero residuals. \\
\vspace{.2cm}
Note that the Fourier frequencies are not necessarily the correct frequencies; there may be aliasing and leakage problems.
\subsection{Aliasing}\label{aliasing}
The aliasing problem is based on the fact that if frequency $\nu$ fits the data, then frequencies $\nu+1, \nu+2, ...$ will do so, too.
This value measures the importance of $\nu_k$ in the spectral decomposition and is the basis of the raw periodogram, which shows that importance for all Fourier frequencies. \\
\vspace{.2cm}
Note: the period of frequency $\nu_k$ is $1/\nu_k = n/k$. Put differently, the respective peaks at this frequency repeat themselves $k$ times in the observed time series.
The spectrum of a time series process is a function telling us the importance of particular frequencies to the variation of the series.
\begin{itemize}
\item Usually, time series processes have a continuous frequency spectrum and do not only consist of a few single frequencies.
\item For an ARMA($p,q$) process, the spectrum is continuous and there are explicit formulae, depending on the model parameters.
\item Subsequently, we will pursue the difficult task of estimating the spectrum, based on the raw periodogram.
\item There is a 1:1 correspondence between the autocovariance function of a time series process and its spectrum.
\end{itemize}
Our goal is to estimate the spectrum of e.g. an ARMA($p,q$). There is quite a discrepancy between the discrete raw periodogram and the continuous spectrum. The following issues arise:
\begin{itemize}
\item The periodogram is noisy, and there may be leakage.
\item The periodogram value at frequency $\nu_k$ is an unbiased estimator of the spectrum value $f(\nu_k)$. However, it is inconsistent due to its variability, owing to the fact that we estimate $n$ periodogram values from $n$ observations.
\item Theory tells us that the periodogram values at $\nu_k$ and $\nu_j$ for $k \neq j$ are asymptotically independent. This will be exploited to improve estimation.
\end{itemize}
\subsection{Smoothing the periodogram}
Due to asymptotic independence and unbiasedness and the smooth nature of the spectrum, smoothing approaches help in achieving qualitatively good, consistent spectral estimates. \\
The challenge lies in the choice of the weights. The Daniell smoother is a weighted running mean with $w_k =1/(2L)$ for $|k| < L$ and $w_k =1/(4L)$ for $|k| = L$. This is what the R function \verb|spec.pgram()| uses if the argument \verb|spans=2L+1| is set.
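A minimal sketch comparing the raw and the smoothed periodogram for $L =2$. The built-in \verb|lh| series is an assumption chosen for illustration, and tapering is switched off here only to isolate the effect of smoothing.
\begin{lstlisting}[language=R]
# Daniell weights for L = 2: w_0, w_1, w_2 (symmetric in k)
k <- kernel("modified.daniell", 2)
k$coef
par(mfrow=c(1,2))
spec.pgram(lh, taper=0, log="no", main="Raw periodogram")
spec.pgram(lh, spans=5, taper=0, log="no", main="Smoothed, spans=5")
\end{lstlisting}
Larger values of \verb|spans| give smoother estimates at the price of blurring narrow peaks.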
\subsubsection{Tapering}
Tapering is a technique to further improve spectral estimates. The R function \verb|spec.pgram()| applies it by default, and unless you know much better, you should keep it that way.
\begin{itemize}
\item In spectral analysis, a time series is seen as a finite sample with a rectangular window of an infinitely long process.
\item This rectangular window distorts spectral estimation in several ways, among others via the effect of leakage.
\item Tapering means that the ends of the time series are altered to mitigate these effects, i.e. they gradually taper down towards zero.
\end{itemize}
\subsection{Model-based spectral estimation}
The fundamental idea for this type of spectral estimate is to fit an AR($p$) model to an observed series and then derive the theoretical spectrum by plugging in the estimated coefficients.
\begin{itemize}
\item This approach is not related to the periodogram based smoothing approaches presented before.
\item By nature, it always provides a smooth spectral estimate.
\item There is an excellent implementation in R: \verb|spec.ar()|.
\end{itemize}
Please note that spectral estimates are usually plotted on the dB-scale which is logarithmic. Also, the R function provides a confidence interval.
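A minimal sketch of the model-based estimate, again using the built-in \verb|lh| series as an assumption for illustration:
\begin{lstlisting}[language=R]
# The AR order is chosen by AIC unless the argument order is given
sp <- spec.ar(lh, plot=FALSE)
sp$method                      # reports the AR order that was used
plot(sp)                       # smooth spectrum on a logarithmic scale
\end{lstlisting}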
The \textit{Akaike information criterion} is useful for determining the order of an ARMA($p,q$) model. The formula is as follows (\textbf{lower is better}):
$$AIC =-2\log(L)+2(p + q + k +1)$$
\begin{itemize}
\item $\log(L)$: Goodness-of-fit criterion: Log-likelihood function
\item $p+q+k+1$: Penalty for model complexity: $p, q$ are the AR- resp. MA-orders; $k =1$ if a global mean is in use, else $0$. The final $+1$ is for the innovation variance
\end{itemize}
For small samples $n$, often a corrected version is used:
$$AICc = AIC +\frac{2(p + q + k +1)(p + q + k +2)}{n - p - q - k -2}$$
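Both criteria can be checked by hand against the output of \verb|arima()|. A minimal sketch, with the built-in \verb|lh| series and an ARMA(1,1) as assumptions for illustration:
\begin{lstlisting}[language=R]
fit <- arima(lh, order=c(1,0,1))   # p = 1, q = 1, k = 1 (global mean)
K   <- length(coef(fit)) + 1       # p + q + k + 1
n   <- length(lh)
aic  <- -2*fit$loglik + 2*K
aicc <- aic + 2*K*(K + 1)/(n - K - 1)
c(by.hand=aic, arima=fit$aic, AICc=aicc)
\end{lstlisting}
The first two entries agree, and the correction term matters most when $n$ is small relative to the number of parameters.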