\subsubsection{Vertical and Real-time Forecasts without Retraining.} \label{ml_models} The lower-right part of Figure \ref{f:inputs} shows how ML models take real-time order data into account without retraining. Based on the seasonally adjusted time series $a_t$, we employ the feature matrix and label vector representations from Sub-section \ref{learning} and set $n$ to the number of daily time steps, $H$, to cover all potential auto-correlations. The ML models are trained once before a test day starts. For training, the matrix and vector are populated such that the last label $y_T$ corresponds to the last time step of the day preceding the forecast day, $a_T$. As the splitting during CV is done with whole days, the \gls{ml} models are trained on training sets consisting of samples from all times of the day in equal proportion. Thus, the ML models learn to predict each time of the day. For prediction on a test day, the $H$ observations preceding the time step to be forecast are used as the input vector after seasonal adjustment. As a result, real-time data are included (a code sketch of this construction follows at the end of this sub-section). The models in this family are:
\begin{enumerate}
\item \textit{\gls{vrfr}}: RF trained on the matrix as described
\item \textit{\gls{vsvr}}: SVR trained on the matrix as described
\end{enumerate}

We tried other ML models, such as gradient boosting machines, but found only RFs and SVRs to perform well in our study. In the case of gradient boosting machines, this is to be expected as they are known not to perform well in the presence of high noise, which is natural with low count data, as shown, for example, by \cite{ma2018} or \cite{mason2000}. Deep learning methods are not applicable either, as the feature matrices only consist of several hundred to a few thousand rows (cf., Sub-section \ref{params}).

In \ref{tabular_ml_models}, we provide an alternative feature matrix representation that exploits the two-dimensional structure of time tables without decomposing the time series. In \ref{enhanced_feats}, we show how feature matrices are extended to include predictors other than historical order data. However, to answer \textbf{Q5} here already: none of the external data sources improves the results in our study. Due to the high number of time series in our study, investigating why no external source improves the forecasts requires an automated approach to analyzing individual time series. \cite{barbour2014} provide a spectral density estimation approach based on the Shannon entropy that measures the signal-to-noise ratio of a time series as a number normalized between 0 and 1, where lower values indicate a higher signal-to-noise ratio (a sketch of this measure also follows below). We then look at averages of the estimates on a daily level per pixel and find that including any of the external data sources from \ref{enhanced_feats} always leads to significantly lower signal-to-noise ratios. Thus, we conclude that, at least for the demand faced by our industry partner, the historical order data already contain all of the signal.
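
To make the rolling feature-matrix construction and the real-time prediction step concrete, the following Python sketch builds the matrix and label vector from a seasonally adjusted series and fits the two models. This is a minimal sketch under our own assumptions: the helper \texttt{make\_feature\_matrix}, the synthetic Poisson series standing in for $a_t$, and all hyperparameters are illustrative choices, not the exact implementation.

\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def make_feature_matrix(a, H):
    # Each row holds the H observations preceding its label y_t = a_t,
    # so the last label y_T is the final time step before the test day.
    X = np.array([a[t - H:t] for t in range(H, len(a))])
    y = a[H:]
    return X, y

H = 12                                         # daily time steps
rng = np.random.default_rng(0)
a = rng.poisson(3, size=30 * H).astype(float)  # stand-in for the
                                               # seasonally adjusted series
X, y = make_feature_matrix(a, H)

vrfr = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
vsvr = SVR().fit(X, y)

# Real-time forecast on the test day: the H most recent seasonally
# adjusted observations form the input vector; no retraining is needed.
x_now = a[-H:].reshape(1, -1)
print(vrfr.predict(x_now), vsvr.predict(x_now))
\end{verbatim}

Training once per test day keeps the fitted models fixed, while the newest observations still enter each forecast through the input vector.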
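
Similarly, the signal-to-noise measure can be sketched as the normalized Shannon entropy of an estimated power spectral density. Note that \cite{barbour2014} use an adaptive multitaper estimator; the plain periodogram below, as well as all names and constants, are simplifying assumptions of ours.

\begin{verbatim}
import numpy as np
from scipy.signal import periodogram

def spectral_shannon_entropy(a):
    # Normalized to [0, 1]: values near 0 indicate a concentrated
    # spectrum (high signal-to-noise ratio), values near 1 a flat,
    # noise-like spectrum.
    _, psd = periodogram(a)
    p = psd[psd > 0]
    p = p / p.sum()            # treat the PSD as a probability mass
    return -(p * np.log(p)).sum() / np.log(len(p))

rng = np.random.default_rng(0)
t = np.arange(1024)
wave = np.sin(2 * np.pi * t / 64)             # periodic demand signal
noise = rng.normal(size=t.size)
print(spectral_shannon_entropy(wave + 0.1 * noise))  # close to 0
print(spectral_shannon_entropy(noise))               # close to 1
\end{verbatim}

Averaging such estimates per pixel and day, as described above, then allows comparing feature sets with and without the external data sources.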