\subsubsection{Supervised Learning}
\label{learning}

A conceptual difference between classical and ML methods is the format
of the model inputs.
In ML models, a time series $Y$ is interpreted as labeled data.
Labels are collected into a vector $\vec{y}$, while the corresponding
predictors are aligned in a $(T - n) \times n$ matrix $\mat{X}$:
$$
\vec{y}
=
\begin{pmatrix}
y_T \\
y_{T-1} \\
\vdots \\
y_{n+1}
\end{pmatrix}
\qquad
\mat{X}
=
\begin{bmatrix}
y_{T-1} & y_{T-2} & \dots  & y_{T-n}     \\
y_{T-2} & y_{T-3} & \dots  & y_{T-(n+1)} \\
\vdots  & \vdots  & \ddots & \vdots      \\
y_{n}   & y_{n-1} & \dots  & y_{1}
\end{bmatrix}
$$
The $m = T - n$ rows are referred to as samples and the $n$ columns as
features.
Each row in $\mat{X}$ is ``labeled'' by the corresponding entry in $\vec{y}$,
and ML models are trained to fit the rows to their labels.
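To make the construction concrete, the following sketch arranges a series
into $\vec{y}$ and $\mat{X}$ (Python with NumPy; the function name
\texttt{make\_supervised} and all variable names are illustrative choices,
not part of the text):
\begin{verbatim}
import numpy as np

def make_supervised(series, n):
    """Arrange a series (y_1, ..., y_T), stored oldest first, into a
    label vector y and a (T - n) x n predictor matrix X."""
    T = len(series)
    m = T - n                       # number of samples
    y = np.empty(m)
    X = np.empty((m, n))
    for i in range(m):
        t = T - i                   # 1-based index of the label y_t
        y[i] = series[t - 1]        # row i is labeled by y_{T-i}
        # features: the n values preceding y_t, most recent first
        X[i, :] = series[t - 1 - n : t - 1][::-1]
    return y, X
\end{verbatim}
For a series of length $T = 100$ and $n = 5$, this yields a label vector of
length $m = 95$ and a $95 \times 5$ matrix whose first row
$(y_{99}, \dots, y_{95})$ is labeled by $y_{100}$.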
Conceptually, we model a functional relationship $f$ between $\mat{X}$ and
$\vec{y}$ such that the difference between the predicted
$\vec{\hat{y}} = f(\mat{X})$ and the true $\vec{y}$ is minimized
according to some error measure $L(\vec{\hat{y}}, \vec{y})$, where $L$
summarizes the goodness of the fit into a scalar value (e.g., the
well-known mean squared error [MSE]; cf.\ Section \ref{mase}).
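For instance, the MSE collapses the fit into a single number by averaging the
squared deviations over the $m$ samples:
$$
L_{\mathrm{MSE}}(\vec{\hat{y}}, \vec{y})
=
\frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2 .
$$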
$\mat{X}$ and $\vec{y}$ show the ordinal character of time series data:
Not only do the entries of $\mat{X}$ and $\vec{y}$ overlap, but the rows of
$\mat{X}$ are also shifted versions of each other.
That does not hold for ML applications in general (e.g., the classical
example of predicting spam vs.\ no-spam emails, where the features model
properties of individual emails), and most of the common error measures
presented in introductory texts on ML are only applicable in cases
without such a structure in $\mat{X}$ and $\vec{y}$.
$n$, the number of past time steps required to predict a $y_t$, is an
exogenous model parameter.
For prediction, the forecaster supplies the trained ML model with an input
vector in the same format as a row $\vec{x}_i$ of $\mat{X}$.
For example, to predict $y_{T+1}$, the model takes the vector
$(y_T, y_{T-1}, \dots, y_{T-n+1})$ as input.
That is in contrast to the classical methods, where we only supply the number
of time steps to be predicted as a scalar integer.
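As a sketch (assuming a trained model object with a scikit-learn-style
\texttt{predict} method and the series stored oldest first; the names
\texttt{model} and \texttt{series} are placeholders, not part of the text),
a one-step-ahead forecast then looks as follows:
\begin{verbatim}
import numpy as np

# Most recent n observations, ordered (y_T, y_{T-1}, ..., y_{T-n+1}),
# i.e. the same layout as a row of X.
x_new = np.asarray(series[-n:])[::-1]

# Estimate of y_{T+1}; predict expects a 2-D array of samples.
y_next = model.predict(x_new.reshape(1, -1))[0]
\end{verbatim}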