Add Literature section
This commit is contained in:
parent 8d10ba9a05 · commit 3849e5fd3f
15 changed files with 878 additions and 3 deletions
53 tex/2_lit/3_ml/2_learning.tex Normal file
@@ -0,0 +1,53 @@
\subsubsection{Supervised Learning.}
\label{learning}

A conceptual difference between classical and ML methods is the format
of the model inputs.
In ML models, a time series $Y$ is interpreted as labeled data.
Labels are collected into a vector $\vec{y}$, while the corresponding
predictors are aligned in a $(T - n) \times n$ matrix $\mat{X}$:
$$
\vec{y}
=
\begin{pmatrix}
  y_T \\
  y_{T-1} \\
  \vdots \\
  y_{n+1}
\end{pmatrix}
\qquad
\mat{X}
=
\begin{bmatrix}
  y_{T-1} & y_{T-2} & \dots  & y_{T-n} \\
  y_{T-2} & y_{T-3} & \dots  & y_{T-(n+1)} \\
  \vdots  & \vdots  & \ddots & \vdots \\
  y_n     & y_{n-1} & \dots  & y_1
\end{bmatrix}
$$
The $m = T - n$ rows are referred to as samples and the $n$ columns as
features.
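For a concrete illustration (a toy instance of the definition above, not
an example from the literature), take $T = 5$ and $n = 2$, so that $m = 3$:
$$
\vec{y}
=
\begin{pmatrix}
  y_5 \\
  y_4 \\
  y_3
\end{pmatrix}
\qquad
\mat{X}
=
\begin{bmatrix}
  y_4 & y_3 \\
  y_3 & y_2 \\
  y_2 & y_1
\end{bmatrix}
$$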
Each row in $\mat{X}$ is ``labeled'' by the corresponding entry in $\vec{y}$,
and ML models are trained to fit the rows to their labels.
Conceptually, we model a functional relationship $f$ between $\mat{X}$ and
$\vec{y}$ such that the difference between the predicted
$\vec{\hat{y}} = f(\mat{X})$ and the true $\vec{y}$ is minimized
according to some error measure $L(\vec{\hat{y}}, \vec{y})$, where $L$
summarizes the goodness of the fit into a scalar value (e.g., the
well-known mean squared error [MSE]; cf.\ Section \ref{mase}).
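The following is a minimal sketch of this training setup in Python
(assuming \texttt{numpy} and \texttt{scikit-learn} are available; the
helper \verb|make_lag_matrix|, the toy series, and the choice of a linear
model for $f$ are our own illustrative assumptions, not part of the
methods discussed here):
\begin{verbatim}
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def make_lag_matrix(series, n):
    """Arrange a 1-D array (y_1, ..., y_T) into labels y and lag matrix X."""
    T = len(series)
    m = T - n  # number of samples
    # Row i holds (y_{T-1-i}, ..., y_{T-n-i}) and is labeled by y_{T-i}.
    y = np.array([series[T - 1 - i] for i in range(m)])
    X = np.array([series[T - 1 - n - i : T - 1 - i][::-1] for i in range(m)])
    return X, y

series = np.arange(1.0, 11.0)        # toy series y_1, ..., y_10 (T = 10)
X, y = make_lag_matrix(series, n=2)  # n is chosen by the forecaster
f = LinearRegression().fit(X, y)     # one possible choice for f
y_hat = f.predict(X)                 # predicted labels
print(mean_squared_error(y, y_hat))  # scalar error L(y_hat, y), here MSE
\end{verbatim}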
$\mat{X}$ and $\vec{y}$ show the ordinal character of time series data:
not only do the entries of $\mat{X}$ and $\vec{y}$ overlap, but the rows of
$\mat{X}$ are also shifted versions of each other.
That does not hold for ML applications in general (e.g., the classical
example of classifying emails as spam vs.\ no spam, where the features
model properties of individual emails), and most of the common error
measures presented in introductory texts on ML are only applicable in
cases without such a structure in $\mat{X}$ and $\vec{y}$.
$n$, the number of past time steps required to predict a $y_t$, is an
exogenous model parameter.
For prediction, the forecaster supplies the trained ML model with an input
vector in the same format as a row $\vec{x}_i$ of $\mat{X}$.
For example, to predict $y_{T+1}$, the model takes the vector
$(y_T, y_{T-1}, \dots, y_{T-n+1})$ as input.
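Continuing the hypothetical sketch from above, a one-step-ahead forecast of
$y_{T+1}$ could then look as follows:
\begin{verbatim}
# Input: the last n = 2 observations, most recent first: (y_T, y_{T-1}).
x_next = series[-1:-3:-1]
print(f.predict(x_next.reshape(1, -1)))  # forecast of y_{T+1}
\end{verbatim}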
This is in contrast to the classical methods, where we only supply the
number of time steps to be predicted as a scalar integer.