Add Model section

Alexander Hess, 2020-10-04 23:39:20 +02:00, commit 91bd4ba083

\subsection{Forecasting Models}
\label{models}
This sub-section describes the concrete models in our study.
Figure \ref{f:inputs} shows how we classify them into four families with
regard to the type of the time series, horizontal or vertical, and the
moment at which a model is trained:
Solid lines indicate that the corresponding time steps lie before the
training, and dotted lines show the time horizon predicted by a model.
For conciseness, we only show the forecasts for one test day.
The setup is the same for each inner validation day.
\begin{center}
\captionof{figure}{Classification of the models by input type and training
moment}
\label{f:inputs}
\includegraphics[width=.95\linewidth]{static/model_inputs_gray.png}
\end{center}

\subsubsection{Horizontal and Whole-day-ahead Forecasts.}
\label{hori}
The upper-left quadrant of Figure \ref{f:inputs} illustrates the simplest way
to generate forecasts for a test day before it has started:
For each time of the day, the corresponding horizontal slice becomes the input
for a model.
With whole days as the unified time interval, each model is trained $H$
times, each time providing a one-step-ahead forecast.
While it is possible to select a different model type per time step, doing so
did not improve the accuracy in the empirical study.
As the models in this family do not include the test day's demand data in
their training sets, we regard them as benchmarks for answering \textbf{Q4},
which asks whether a UDP can take advantage of real-time information.
The models in this family are as follows; we use prefixes, such as ``h''
here, when methods are applied in other families as well:
\begin{enumerate}
\item \textit{\gls{naive}}:
Observation from the same time step one week prior
\item \textit{\gls{trivial}}:
Predict $0$ for all time steps
\item \textit{\gls{hcroston}}:
Intermittent demand method introduced by \cite{croston1972}
\item \textit{\gls{hholt}},
\textit{\gls{hhwinters}},
\textit{\gls{hses}},
\textit{\gls{hsma}}, and
\textit{\gls{htheta}}:
Exponential smoothing without calibration
\item \textit{\gls{hets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{harima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
\textit{naive} and \textit{trivial} provide absolute benchmarks for the
actual forecasting methods.
\textit{hcroston} is often mentioned in the context of intermittent demand;
however, the method did not perform well at all in our study.
Besides \textit{hhwinters}, which always fits a seasonal component, the
calibration heuristics behind \textit{hets} and \textit{harima} may fit one
as well.
With $k=7$, an STL decomposition is unnecessary here.
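To make the horizontal scheme concrete, the following Python sketch trains one
toy model per time of the day on its horizontal slice; the
simple-exponential-smoothing stand-in and all names are illustrative only, not
the implementation used in the study.

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def horizontal_forecasts(demand_by_day, H):
    """demand_by_day: past days, each a list of H per-time-step order counts.
    Trains one model per time of the day and returns H one-step-ahead
    forecasts for the test day."""
    forecasts = []
    for t in range(H):
        # the horizontal slice: the same time of the day across all past days
        horizontal_slice = [day[t] for day in demand_by_day]
        forecasts.append(ses_forecast(horizontal_slice))
    return forecasts
```

For a constant history, e.g. three identical days `[2, 0, 1]`, each slice is
constant and the forecasts reproduce the daily pattern.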

\subsubsection{Vertical and Whole-day-ahead Forecasts without Retraining.}
\label{vert}
The upper-right quadrant of Figure \ref{f:inputs} shows an alternative way to
generate forecasts for a test day before it has started:
First, a seasonally adjusted time series $a_t$ is obtained from a vertical
time series by STL decomposition.
Then, the actual forecasting model, trained on $a_t$, makes an $H$-step-ahead
prediction.
Lastly, we add the $H$ seasonal na\"{i}ve forecasts of the seasonal component
$s_t$ to obtain the actual predictions for the test day.
Thus, only one training is required per model type, and no real-time data are
used.
By decomposing the raw time series, we assume that all long-term patterns are
captured by the seasonal component $s_t$, so that $a_t$ only contains the
level with a potential trend and auto-correlations.
The models in this family are:
\begin{enumerate}
\item \textit{\gls{fnaive}},
\textit{\gls{pnaive}}:
Sum of STL's trend and seasonal components' na\"{i}ve forecasts
\item \textit{\gls{vholt}},
\textit{\gls{vses}}, and
\textit{\gls{vtheta}}:
Exponential smoothing without calibration and without a seasonal
fit
\item \textit{\gls{vets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{varima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
As mentioned in Sub-section \ref{unified_cv}, we include the sum of the
(seasonal) na\"{i}ve forecasts of the STL's trend and seasonal components
as forecasts in their own right:
For \textit{fnaive}, we tune the ``flexible'' $ns$ parameter, and for
\textit{pnaive}, we set it to a ``periodic'' value.
Thus, we implicitly assume that there is no signal in the remainder $r_t$ and
predict $0$ for it.
\textit{fnaive} and \textit{pnaive} serve as two more simple benchmarks.
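The three-step pipeline above can be sketched as follows; for brevity, the STL
decomposition is replaced by a purely ``periodic'' seasonal component (the
per-time-of-day mean), and simple exponential smoothing stands in for the
actual forecasting model:

```python
def vertical_whole_day_forecasts(series, H, alpha=0.5):
    """series: vertical time series with H time steps per day.
    Returns H predictions for the next (test) day."""
    n_days = len(series) // H
    # stand-in for STL: a "periodic" seasonal component s_t
    # (the per-time-of-day mean across all observed days)
    seasonal = [sum(series[d * H + t] for d in range(n_days)) / n_days
                for t in range(H)]
    # seasonally adjusted series a_t
    adjusted = [series[i] - seasonal[i % H] for i in range(len(series))]
    # simple exponential smoothing yields a flat H-step-ahead forecast
    level = adjusted[0]
    for y in adjusted[1:]:
        level = alpha * y + (1 - alpha) * level
    # add the H seasonal naive forecasts back
    return [level + seasonal[t] for t in range(H)]
```

With a purely seasonal series such as `[2, 0, 1]` repeated over three days,
the adjusted series is flat and the forecasts reproduce the daily pattern.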

\subsubsection{Vertical and Real-time Forecasts with Retraining.}
\label{rt}
The lower-left in Figure \ref{f:inputs} shows how models trained on vertical
time series are extended with real-time order data as it becomes available
during a test day:
Instead of obtaining an $H$-step-ahead forecast, we retrain a model after
every time step and only predict one step.
The remainder is as in the previous sub-section, and the models are:
\begin{enumerate}
\item \textit{\gls{rtholt}},
\textit{\gls{rtses}}, and
\textit{\gls{rttheta}}:
Exponential smoothing without calibration and without a seasonal fit
\item \textit{\gls{rtets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{rtarima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
Retraining \textit{fnaive} and \textit{pnaive} did not increase the accuracy,
so we left them out of this family.
A downside of this family is the significant increase in computing costs.
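A minimal sketch of the retraining loop, with simple exponential smoothing as
a stand-in model and the seasonal adjustment omitted for brevity:

```python
def realtime_retrained_forecasts(history, test_day, H, alpha=0.5):
    """history: observations up to the test day; test_day: the H observations
    that arrive one by one.  The model is refit from scratch after every time
    step and predicts only the next step."""
    forecasts = []
    observed = list(history)
    for t in range(H):
        level = observed[0]                 # refit on all data seen so far
        for y in observed[1:]:
            level = alpha * y + (1 - alpha) * level
        forecasts.append(level)             # one-step-ahead forecast
        observed.append(test_day[t])        # real-time observation arrives
    return forecasts
```

The inner refit runs once per time step over the ever-growing history, which
illustrates why the computing costs of this family increase significantly.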

\subsubsection{Vertical and Real-time Forecasts without Retraining.}
\label{ml_models}
The lower-right in Figure \ref{f:inputs} shows how ML models take
real-time order data into account without retraining.
Based on the seasonally-adjusted time series $a_t$, we employ the feature
matrix and label vector representations from Sub-section \ref{learning}
and set $n$ to the number of daily time steps, $H$, to cover all potential
auto-correlations.
The ML models are trained once before a test day starts.
For training, the matrix and vector are populated such that $y_T$ is set to
the last time step of the day before the forecasts, $a_T$.
As the splitting during CV is done with whole days, the \gls{ml} models are
trained with training sets consisting of samples from all times of a day
in an equal manner.
Thus, the ML models learn to predict each time of the day.
For prediction on a test day, the $H$ observations preceding the time
step to be forecast are used as the input vector after seasonal
adjustment.
As a result, real-time data are included.
The models in this family are:
\begin{enumerate}
\item \textit{\gls{vrfr}}: RF trained on the matrix as described
\item \textit{\gls{vsvr}}: SVR trained on the matrix as described
\end{enumerate}
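The feature-matrix construction described above can be sketched as a generic
lag-matrix builder (our own naming, not the study's code):

```python
def make_lag_matrix(a, H):
    """Builds the feature matrix X and label vector y from a seasonally
    adjusted series a: each row of X holds the H observations preceding
    its label, so the model learns one-step-ahead prediction for every
    time of the day."""
    X = [a[i:i + H] for i in range(len(a) - H)]
    y = a[H:]
    return X, y

# During a test day, the input vector for the next time step is simply
# the H most recent (seasonally adjusted) observations -- real-time data
# enters without any retraining.
a = [1, 2, 3, 4, 5, 6]
X, y = make_lag_matrix(a, H=2)
next_input = a[-2:]
```

A trained RF or SVR would then be applied to `next_input` for each time step
as it arrives.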
We tried other ML models, such as gradient boosting machines, but found
only RFs and SVRs to perform well in our study.
In the case of gradient boosting machines, this is to be expected, as they
are known not to perform well in the presence of high noise, which is
natural with low count data, as shown, for example, by \cite{ma2018} and
\cite{mason2000}.
Also, deep learning methods are not applicable, as the feature matrices only
consist of several hundred to a few thousand rows (cf., Sub-section
\ref{params}).
In \ref{tabular_ml_models}, we provide an alternative feature matrix
representation that exploits the two-dimensional structure of time tables
without decomposing the time series.
In \ref{enhanced_feats}, we show how feature matrices are extended
to include predictors other than historical order data.
However, to answer \textbf{Q5} already here: none of the external data
sources improves the results in our study.
Due to the high number of time series in our study, investigating why no
external source improves the forecasts requires an automated approach to
analyzing individual time series.
\cite{barbour2014} provide a spectral density estimation approach, based on
the Shannon entropy, that measures the signal-to-noise ratio in a time
series with a number normalized between 0 and 1, where lower values
indicate a higher signal-to-noise ratio.
We then look at daily averages of the estimates per pixel and find that
including any of the external data sources from \ref{enhanced_feats} always
leads to significantly lower signal-to-noise ratios.
Thus, we conclude that, at least for the demand faced by our industry
partner, the historical order data already contains all of the signal.
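For illustration, one common formulation of such a normalized spectral
entropy can be computed from the raw periodogram as below; this is our own
sketch, whereas \cite{barbour2014} rely on a proper spectral density
estimator rather than the plain FFT periodogram used here:

```python
import numpy as np

def spectral_entropy(x):
    """Normalized Shannon entropy of the periodogram, between 0 and 1:
    values near 0 indicate a concentrated spectrum (high signal-to-noise
    ratio), values near 1 a flat, noise-like spectrum."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    psd = psd[psd > 0]
    if psd.size < 2:
        return 0.0
    p = psd / psd.sum()
    return float(-(p * np.log(p)).sum() / np.log(p.size))

t = np.arange(256)
print(spectral_entropy(np.sin(2 * np.pi * t / 16)))                 # low
print(spectral_entropy(np.random.default_rng(0).normal(size=256)))  # high
```

A pure sine concentrates its spectrum in one frequency bin and scores near 0,
while white noise spreads its energy across all bins and scores near 1.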