Add Model section
This commit is contained in:
parent
7c203cb87c
commit
91bd4ba083
25 changed files with 1354 additions and 6 deletions

@@ -1,8 +1,6 @@
\section{Model Formulation}
\label{mod}

% temporary placeholders
\label{decomp}
\label{f:stl}
\label{mase}
\label{unified_cv}

In this section, we describe how the platform's raw data are pre-processed
into model inputs and how the forecasting models are built and benchmarked
against each other.

28 tex/3_mod/2_overall.tex Normal file
@@ -0,0 +1,28 @@
\subsection{Overall Approach}
\label{approach_approach}

On a conceptual level, there are three distinct aspects of the model
development process.
First, a pre-processing step transforms the platform's tabular order data into
either time series in Sub-section \ref{grid} or feature matrices in
Sub-section \ref{ml_models}.
Second, a benchmark methodology is developed in Sub-section \ref{unified_cv}
that compares all models on the same scale, in particular, classical
models with ML ones.
Concretely, the CV approach is adapted to the peculiar requirements of
sub-daily and ordinal time series data so as to maximize the predictive
power of all models into the future while keeping them comparable.
Third, the forecasting models are described with respect to their assumptions
and training requirements.
Four classification dimensions are introduced:
\begin{enumerate}
\item \textbf{Timeliness of the Information}:
    whole-day-ahead vs. real-time forecasts
\item \textbf{Time Series Decomposition}: raw vs. decomposed
\item \textbf{Algorithm Type}: ``classical'' statistics vs. ML
\item \textbf{Data Sources}: pure vs. enhanced (i.e., with external data)
\end{enumerate}
Not all of the $2^4 = 16$ possible combinations are implemented; instead, the
models are varied along these dimensions to show different effects and
answer the research questions.

95 tex/3_mod/3_grid.tex Normal file
@@ -0,0 +1,95 @@
\subsection{Gridification, Time Tables, and Time Series Generation}
\label{grid}

The platform's tabular order data are sliced with respect to both location and
time and then aggregated into time series in which each observation is
the number of orders in an area for a time step/interval.
Figure \ref{f:grid} shows how the orders' delivery locations are each
matched to a square-shaped cell, referred to as a pixel, on a grid
covering the entire service area within a city.
This gridification step is also applied to the pickup locations separately.
The grid's lower-left corner is chosen at random.
\cite{winkenbach2015} apply the same gridification idea and slice an urban
area to model a location-routing problem, and \cite{singleton2017} portray
it as a standard method in the field of urban analytics.
With increasing pixel sizes, the time series exhibit more order aggregation
with a possibly stronger demand pattern.
On the other hand, the larger the pixels, the less valuable the generated
forecasts become as, for example, a courier sent to a pixel
preemptively then faces a longer average distance to a restaurant in the
pixel.
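
In essence, gridification amounts to integer-dividing projected coordinates
by the pixel's side length. As a minimal sketch (assuming pandas and
hypothetical \texttt{x}/\texttt{y} columns holding projected coordinates in
meters, measured from the grid's randomly chosen lower-left corner):
\begin{verbatim}
import pandas as pd

def gridify(orders, side_length=1_000.0):
    """Assign each order to a square pixel with the given side length."""
    orders = orders.copy()
    # Integer-divide the coordinates to obtain the pixel's column and
    # row indices on the grid; 1,000 m yields 1 km^2 pixels.
    orders["pixel_x"] = (orders["x"] // side_length).astype(int)
    orders["pixel_y"] = (orders["y"] // side_length).astype(int)
    return orders
\end{verbatim}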

\begin{center}
\captionof{figure}{Gridification for delivery locations in Paris with a pixel
    size of $1~\text{km}^2$}
\label{f:grid}
\includegraphics[width=.8\linewidth]{static/gridification_for_paris_gray.png}
\end{center}

After gridification, the ad-hoc orders within a pixel are aggregated by their
placement timestamps into sub-daily time steps of pre-defined lengths
to obtain a time table as exemplified in Figure \ref{f:timetable} with
one-hour intervals.
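
The aggregation is a counting query; a sketch with pandas (assuming a
hypothetical \texttt{placed\_at} timestamp column and the pixel indices
from the gridification step):
\begin{verbatim}
def make_timetable(orders, freq="60min"):
    """Count orders per pixel and time step; pivot into a time table."""
    step = orders["placed_at"].dt.floor(freq)
    counts = (
        orders.groupby(["pixel_x", "pixel_y", step])
        .size()
        .rename("n_orders")
        .reset_index()
    )
    counts["time_of_day"] = counts["placed_at"].dt.time
    counts["day"] = counts["placed_at"].dt.date
    # Rows: times of day; columns: days (the 2D time table view).
    return counts.pivot_table(
        index=["pixel_x", "pixel_y", "time_of_day"],
        columns="day",
        values="n_orders",
        fill_value=0,  # time steps without any order count as 0
    )
\end{verbatim}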

\begin{center}
\captionof{figure}{Aggregation into a time table with hourly time steps}
\label{f:timetable}
\begin{tabular}{|c||*{9}{c|}}
\hline
\backslashbox{Time}{Day} & \makebox[2em]{\ldots}
    & \makebox[3em]{Mon} & \makebox[3em]{Tue}
    & \makebox[3em]{Wed} & \makebox[3em]{Thu}
    & \makebox[3em]{Fri} & \makebox[3em]{Sat}
    & \makebox[3em]{Sun} & \makebox[2em]{\ldots} \\
\hline
\hline
11:00 & \ldots & $y_{11,Mon}$ & $y_{11,Tue}$ & $y_{11,Wed}$ & $y_{11,Thu}$
    & $y_{11,Fri}$ & $y_{11,Sat}$ & $y_{11,Sun}$ & \ldots \\
\hline
12:00 & \ldots & $y_{12,Mon}$ & $y_{12,Tue}$ & $y_{12,Wed}$ & $y_{12,Thu}$
    & $y_{12,Fri}$ & $y_{12,Sat}$ & $y_{12,Sun}$ & \ldots \\
\hline
\ldots & \ldots & \ldots & \ldots & \ldots
    & \ldots & \ldots & \ldots & \ldots & \ldots \\
\hline
20:00 & \ldots & $y_{20,Mon}$ & $y_{20,Tue}$ & $y_{20,Wed}$ & $y_{20,Thu}$
    & $y_{20,Fri}$ & $y_{20,Sat}$ & $y_{20,Sun}$ & \ldots \\
\hline
21:00 & \ldots & $y_{21,Mon}$ & $y_{21,Tue}$ & $y_{21,Wed}$ & $y_{21,Thu}$
    & $y_{21,Fri}$ & $y_{21,Sat}$ & $y_{21,Sun}$ & \ldots \\
\hline
\ldots & \ldots & \ldots & \ldots & \ldots
    & \ldots & \ldots & \ldots & \ldots & \ldots \\
\hline
\end{tabular}
\end{center}
\

Consequently, each $y_{t,d}$ in Figure \ref{f:timetable} is the number of
all orders within the pixel for the time of day $t$ and day of week
$d$ ($y_t$ and $y_{t,d}$ denote the same observation; the latter merely
acknowledges the two-dimensional view).
The same trade-off as with gridification applies:
The shorter the interval, the weaker the demand pattern to be expected in
the time series due to less aggregation, while longer intervals lead to
less usable forecasts.
We refer to time steps by their start time, and their number per day, $H$,
is constant.
Given a time table as in Figure \ref{f:timetable}, there are two ways to
generate a time series by slicing:
\begin{enumerate}
\item \textbf{Horizontal View}:
    Take only the order counts for a given time of the day
\item \textbf{Vertical View}:
    Take all order counts and remove the double-seasonal pattern induced
    by the weekday and time of the day with decomposition
\end{enumerate}
Distinct time series are retrieved by iterating through the time tables either
horizontally or vertically in increments of a single time step.
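
In code, the two views are plain slices of a pixel's time table; a sketch
based on the pivoted table above (one pixel, rows indexed by time of day,
columns by day):
\begin{verbatim}
def horizontal_series(timetable, time_of_day):
    # One observation per day, always for the same time of day.
    return timetable.loc[time_of_day]

def vertical_series(timetable):
    # Iterate the time table column by column (day-major order),
    # yielding H observations per day.
    return timetable.T.stack()
\end{verbatim}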
Another property of a generated time series is its length, which, following
the next sub-section, can be interpreted as the length of the training set
used in production plus the test day.
In summary, a distinct time series is generated from the tabular order data
based on a configuration of parameters for the dimensions pixel size,
number of daily time steps $H$, shape (horizontal vs. vertical), length,
and the time step to be predicted.

86 tex/3_mod/4_cv.tex Normal file
@@ -0,0 +1,86 @@
\subsection{Unified Cross-Validation and Training, Validation, and Test Sets}
\label{unified_cv}

The standard $k$-fold CV, which assumes no structure in the individual
features of the samples, as shown in $\mat{X}$ above, is adapted to the
ordinal character of time series data:
A model must be evaluated on observations that occurred strictly after the
ones used for training as, otherwise, the model would know about the future.
Furthermore, some models predict only one or a few time steps before
being retrained, while others predict an entire day without retraining
(cf., Sub-section \ref{ml_models}).
Consequently, we must use a unified time interval in which all forecasts are
made first before the entire interval is evaluated.
As whole days are the longest prediction interval for models without
retraining, we choose that as the unified time interval.
In summary, our CV methodology yields a distinct best model per pixel and day
to be forecast.
Whole days are also practical for managers who commonly monitor, for example,
the routing and thus the forecasting performance on a day-to-day basis.
Our methodology assumes that the models are trained at least once per day.
As we create operational forecasts into the near future in this paper,
retraining all models with the latest available data is a logical step.

\begin{center}
\captionof{figure}{Training, validation, and test sets
    during cross validation}
\label{f:cv}
\includegraphics[width=.8\linewidth]{static/cross_validation_gray.png}
\end{center}

The training, validation, and test sets are defined as follows.
To exemplify the logic, we refer to Figure \ref{f:cv}, which shows the
calendar setup (i.e., weekdays on the x-axis) for three days $T_1$, $T_2$,
and $T_3$ (shown in dark gray) for which we generate forecasts.
Each of these days is, by definition, a test day, and the test set comprises
all time series, horizontal or vertical, whose last observation lies on
that day.
With an assumed training horizon of three weeks, the 21 days before each of
the test days constitute the corresponding training sets (shown in lighter
gray on the same rows as $T_1$, $T_2$, and $T_3$).
There are two kinds of validation sets, depending on the decision to be made.
First, if a forecasting method needs parameter tuning, the original training
set is divided into as many equally long series as validation days are
needed to find stable parameters.
The example shows three validation days per test day, named $V_n$ (shown
in darker gray below each test day).
The $21 - 3 = 18$ preceding days constitute the training set corresponding to
a validation day.
To obtain the overall validation error, the three errors are averaged.
We call these \textit{inner} validation sets because they must be repeated
each day to re-tune the parameters and because the involved time series
are true subsets of the original series.
Second, to find the best method per day and pixel, the same averaging logic
is applied on the outer level.
For example, if we used two validation days to find the best method for $T_3$,
we would average the errors of $T_1$ and $T_2$ for each method and select
the winner; then, $T_1$ and $T_2$ constitute an \textit{outer} validation
set.
Whereas the number of inner validation days is method-specific and must be
chosen before generating any test day forecasts in the first place, the
number of outer validation days may be varied after the fact and is
determined empirically as we show in Section \ref{stu}.
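
The nesting reduces to plain date arithmetic; the following standalone
sketch (with hypothetical argument names) enumerates the inner splits for
one test day under the setup from Figure \ref{f:cv}:
\begin{verbatim}
import datetime as dt

def inner_splits(test_day, train_days=21, val_days=3):
    """List (training window, validation day) pairs for one test day."""
    inner_train = train_days - val_days  # 21 - 3 = 18 days each
    splits = []
    for i in range(1, val_days + 1):
        val_day = test_day - dt.timedelta(days=i)
        train_start = val_day - dt.timedelta(days=inner_train)
        train_end = val_day - dt.timedelta(days=1)
        splits.append(((train_start, train_end), val_day))
    return splits

# The outer training set is simply the 21 days before the test day;
# outer validation reuses the errors of previous test days.
\end{verbatim}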

Our unified CV approach is also optimized for large-scale production settings,
for example, at companies like Uber.
As \cite{bell2018} note, there is a trade-off as to when each of the
inner time series in the example begins.
While the forecasting accuracy likely increases with more training days,
supporting inner series with increasing lengths, cutting the series
to the same length allows caching the forecasts and errors.
In the example, $V_3$, $V_5$, and $V_7$, as well as $V_6$ and $V_8$, are
identical despite belonging to different inner validation sets.
Caching is also possible on the outer level when searching for an optimal
number of validation days for model selection.
We achieved cache hit ratios of up to 80\% in our implementation in the
empirical study, saving the corresponding computational resources.
Lastly, we assert that our suggested CV, because it is unified around whole
test days and uses fixed-size time series, is also suitable for creating
consistent learning curves and, thus, for answering \textbf{Q3} on the
relationship between forecast accuracy and the amount of historic data:
We simply increase the length of the outer training set while holding the
test day fixed.
Thus, independent of a method's need for parameter tuning, all methods have
the same demand history available for each test day forecast.

87 tex/3_mod/5_mase.tex Normal file
@@ -0,0 +1,87 @@
\subsection{Accuracy Measures}
\label{mase}

Choosing an error measure for both model selection and evaluation is not
straightforward when working with intermittent demand, as shown, for
example, by \cite{syntetos2005}, and one should understand the trade-offs
between measures.
\cite{hyndman2006} provide a study of measures with real-life data taken from
the popular M3-competition and find that most standard measures degenerate
under many scenarios.
They also provide a classification scheme, for which we summarize the main
points as they apply to the UDP case:
\begin{enumerate}
\item \textbf{Scale-dependent Errors}:
The error is reported in the same unit as the raw data.
Two popular examples are the root mean square error (RMSE) and mean absolute
error (MAE).
They may be used for model selection and evaluation within a pixel, and are
intuitively interpretable; however, they may not be used to compare the
errors of, for example, a low-demand pixel (e.g., at the UDP's service
boundary) with those of a high-demand pixel (e.g., downtown).
\item \textbf{Percentage Errors}:
The error is derived from the percentage errors of individual forecasts per
time step, and is also intuitively interpretable.
A popular example is the mean absolute percentage error (MAPE), which is the
primary measure in most forecasting competitions.
Whereas such errors could be applied both within and across pixels, they
cannot be calculated reliably for intermittent demand:
If only one time step exhibits no demand, a division by zero occurs.
This often happens even in high-demand pixels due to the slicing.
\item \textbf{Relative Errors}:
A workaround is to calculate a scale-dependent error for the test day and
divide it by the same measure calculated with forecasts of a simple
benchmark method (e.g., the na\"{i}ve method).
An example could be
$\text{RelMAE} = \text{MAE} / \text{MAE}_\text{bm}$.
Nevertheless, even simple methods create (near-)perfect forecasts at times,
and then $\text{MAE}_\text{bm}$ becomes (close to) $0$.
These numerical instabilities occurred so often in our studies that we argue
against using such measures.
\item \textbf{Scaled Errors}:
\cite{hyndman2006} contribute this category and introduce the mean absolute
scaled error (\gls{mase}).
It is defined as the MAE from the actual forecasting method on the test day
(i.e., ``out-of-sample'') divided by the MAE from the (seasonal) na\"{i}ve
method on the entire training set (i.e., ``in-sample'').
A MASE of $1$ indicates that a forecasting method has the same accuracy
on the test day as the (seasonal) na\"{i}ve method applied on a longer
horizon, and lower values imply higher accuracy.
Within a pixel, its results are identical to the ones obtained with the MAE.
Also, we acknowledge recent publications, for example, \cite{prestwich2014} or
\cite{kim2016}, showing other ways of tackling the difficulties mentioned.
However, only the MASE provided numerically stable results for all
forecasts in our study.
\end{enumerate}
Consequently, we use the MASE with a seasonal na\"{i}ve benchmark as the
primary measure in this paper.
With the previously introduced notation, it is defined as follows:
$$
\text{MASE}
:=
\frac{\text{MAE}_{\text{out-of-sample}}}{\text{MAE}_{\text{in-sample}}}
=
\frac{\text{MAE}_{\text{forecasts}}}{\text{MAE}_{\text{training}}}
=
\frac{\frac{1}{H} \sum_{h=1}^H |y_{T+h} - \hat{y}_{T+h}|}
     {\frac{1}{T-k} \sum_{t=k+1}^T |y_{t} - y_{t-k}|}
$$
The denominator can only become $0$ if the seasonal na\"{i}ve benchmark makes
a perfect forecast on each day in the training set except the first seven
days, which never happened in our case study involving hundreds of
thousands of individual model trainings.
Further, as per the discussion in the subsequent Sub-section \ref{decomp}, we
also calculate peak-MASEs, for which we leave out the time steps of
non-peak times from the calculations.
For this analysis, we define all time steps that occur at lunch (i.e., noon to
2 pm) and dinner time (i.e., 6 pm to 8 pm) as peak.
As time steps in non-peak times typically average no or very low order counts,
a UDP may choose not to actively forecast these at all and instead be
interested in the accuracies of forecasting methods during peaks only.
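
The definition transcribes directly into code; a sketch with NumPy, where
the peak variant masks the test-day time steps according to the lunch and
dinner windows defined above (one possible reading of leaving out the
non-peak time steps):
\begin{verbatim}
import numpy as np

def mase(y_train, y_test, y_pred, k=7):
    """Test-day MAE scaled by the in-sample seasonal naive MAE."""
    y_train, y_test, y_pred = map(np.asarray, (y_train, y_test, y_pred))
    mae_out = np.mean(np.abs(y_test - y_pred))
    # Seasonal naive benchmark on the training set: y_t vs. y_{t-k}.
    mae_in = np.mean(np.abs(y_train[k:] - y_train[:-k]))
    return mae_out / mae_in

def peak_mase(y_train, y_test, y_pred, hours, k=7):
    """Peak-MASE: keep only lunch (noon-2 pm) and dinner (6-8 pm)."""
    hours = np.asarray(hours)  # hour of day per test-day time step
    peak = ((hours >= 12) & (hours < 14)) | ((hours >= 18) & (hours < 20))
    y_test, y_pred = np.asarray(y_test)[peak], np.asarray(y_pred)[peak]
    return mase(y_train, y_test, y_pred, k)
\end{verbatim}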

We conjecture that percentage error measures may be usable for UDPs facing a
higher overall demand with no intra-day down-times in between but have to
leave that to a future study.
Yet, even with high and steady demand, divide-by-zero errors are likely to
occur.

76 tex/3_mod/6_decomp.tex Normal file
@@ -0,0 +1,76 @@
\subsection{Time Series Decomposition}
\label{decomp}

Concerning the time table in Figure \ref{f:timetable}, a seasonal demand
pattern is inherent to both horizontal and vertical time series.
First, the weekday influences whether people eat out or order in, with our
partner receiving more orders on Thursday through Saturday than on the
other four days.
This pattern is part of both types of time series.
Second, on any given day, demand peaks occur around lunch and dinner times.
This pattern only concerns vertical series.
Statistical analyses show that horizontally sliced time series indeed exhibit
a periodicity of $k=7$, and vertically sliced series only yield a seasonal
component with a regular pattern if the periodicity is set to the product
of the number of weekdays and the number of daily time steps, indicating a
distinct intra-day pattern per weekday.

Figure \ref{f:stl} shows three exemplary STL decompositions for a
$1~\text{km}^2$ pixel and a vertical time series with 60-minute time steps
(on the x-axis) covering four weeks:
With the noisy raw data $y_t$ on the left, the seasonal and trend components,
$s_t$ and $t_t$, are depicted in light and dark gray for increasing $ns$
parameters.
The plots include (seasonal) na\"{i}ve forecasts for the subsequent test day
as dotted lines.
The remainder components $r_t$ are not shown for conciseness.
The periodicity is set to $k = 7 \cdot 12 = 84$ as our industry partner has
$12$ opening hours per day.

\begin{center}
\captionof{figure}{STL decompositions for a medium-demand pixel with hourly
    time steps and periodicity $k=84$}
\label{f:stl}
\includegraphics[width=.95\linewidth]{static/stl_gray.png}
\end{center}

As described in Sub-section \ref{stl}, with $k$ being implied by the
application, at the very least, the length of the seasonal smoothing
window, represented by the $ns$ parameter, must be calibrated by the
forecaster:
It controls how many past observations go into each smoothed $s_t$.
Many practitioners, however, skip this step and set $ns$ to a large number,
for example, $999$, which is then referred to as ``periodic.''
For the other parameters, it is common to use the default values as
specified in \cite{cleveland1990}.
The goal is to find a decomposition with a regular pattern in $s_t$.
In Figure \ref{f:stl}, this is not true for $ns=7$, where, for
example, the four largest bars corresponding to the same time of day a
week apart cannot be connected by an approximately straight line.
On the contrary, a regular pattern exists in the most extreme way for
$ns=999$, where the same four largest bars are of the same height.
This observation holds for each time step of the day.
For $ns=11$, $s_t$ exhibits a regular pattern whose bars adapt over time:
The pattern is regular as bars corresponding to the same time of day can be
connected by approximately straight lines, and it is adaptive as these
lines are not horizontal.
The trade-off between small and large values for $ns$ can thus be interpreted
as allowing the average demand during peak times to change over time:
If demand is intermittent at non-peak times, it is reasonable to expect the
bars to change over time as only the relative differences between peak and
non-peak times impact the bars' heights, with the seasonal component being
centered around $0$.
One way to confirm the goodness of a decomposition statistically is to verify
that $r_t$ can be modeled as a typical error process, such as white noise
$\epsilon_t$.

However, we suggest an alternative way of calibrating the STL method in an
automated fashion based on our unified CV approach.
As hinted at in Figure \ref{f:stl}, we interpret an STL decomposition as a
forecasting method on its own by just adding the (seasonal) na\"{i}ve
forecasts for $s_t$ and $t_t$ and predicting $0$ for $r_t$.
Then, the $ns$ parameter is tuned just like a parameter for an ML model.
To the best of our knowledge, this has not been proposed before.
Conceptually, forecasting with the STL method can be viewed as a na\"{i}ve
method with built-in smoothing, and it outperformed all other
benchmark methods in all cases.
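
A sketch of this idea with statsmodels' STL implementation (which follows
\cite{cleveland1990}; \texttt{seasonal} is its name for $ns$, and the
horizon of $12$ steps reflects our partner's opening hours):
\begin{verbatim}
import numpy as np
from statsmodels.tsa.seasonal import STL

def stl_forecast(y, period=84, ns=11, horizon=12):
    """Use an STL decomposition itself as a whole-day-ahead forecaster."""
    res = STL(y, period=period, seasonal=ns).fit()
    # Seasonal naive forecast of s_t: repeat the last full cycle.
    s_hat = np.asarray(res.seasonal)[-period:][:horizon]
    # Naive forecast of t_t: carry the last trend value forward.
    t_hat = np.full(horizon, np.asarray(res.trend)[-1])
    return s_hat + t_hat  # the remainder r_t is predicted as 0

# ns is then tuned like any ML hyper-parameter, e.g., by grid search
# over odd values within the unified CV from above.
\end{verbatim}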

20 tex/3_mod/7_models/1_intro.tex Normal file
@@ -0,0 +1,20 @@
\subsection{Forecasting Models}
\label{models}

This sub-section describes the concrete models in our study.
Figure \ref{f:inputs} shows how we classify them into four families with
regard to the type of the time series, horizontal or vertical, and the
moment at which a model is trained:
Solid lines indicate that the corresponding time steps lie before the
training, and dotted lines show the time horizon predicted by a model.
For conciseness, we only show the forecasts for one test day.
The setup is the same for each inner validation day.

\

\begin{center}
\captionof{figure}{Classification of the models by input type and training
    moment}
\label{f:inputs}
\includegraphics[width=.95\linewidth]{static/model_inputs_gray.png}
\end{center}

42 tex/3_mod/7_models/2_hori.tex Normal file
@@ -0,0 +1,42 @@
\subsubsection{Horizontal and Whole-day-ahead Forecasts.}
\label{hori}

The upper-left in Figure \ref{f:inputs} illustrates the simplest way to
generate forecasts for a test day before it has started:
For each time of the day, the corresponding horizontal slice becomes the input
for a model.
With whole days being the unified time interval, each model is trained $H$
times, each training providing a one-step-ahead forecast.
While it is possible to select models of different types per time step,
that did not improve the accuracy in the empirical study.
As the models in this family do not include the test day's demand data in
their training sets, we see them as benchmarks to answer \textbf{Q4},
checking whether a UDP can take advantage of real-time information.
The models in this family are as follows; we use prefixes, such as ``h''
here, when methods are applied in other families as well:
\begin{enumerate}
\item \textit{\gls{naive}}:
    Observation from the same time step one week prior
\item \textit{\gls{trivial}}:
    Predict $0$ for all time steps
\item \textit{\gls{hcroston}}:
    Intermittent demand method introduced by \cite{croston1972}
\item \textit{\gls{hholt}},
    \textit{\gls{hhwinters}},
    \textit{\gls{hses}},
    \textit{\gls{hsma}}, and
    \textit{\gls{htheta}}:
    Exponential smoothing without calibration
\item \textit{\gls{hets}}:
    ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{harima}}:
    ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
\textit{naive} and \textit{trivial} provide an absolute benchmark for the
actual forecasting methods.
\textit{hcroston} is often mentioned in the context of intermittent demand;
however, the method did not perform well at all.
Besides \textit{hhwinters}, which always fits a seasonal component, the
calibration heuristics behind \textit{hets} and \textit{harima} may do so
as well.
With $k=7$, an STL decomposition is unnecessary here.
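
To make the per-time-step training concrete, a sketch of the
whole-day-ahead loop with statsmodels' simple exponential smoothing
standing in for any member of the family:
\begin{verbatim}
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

def horizontal_day_ahead(timetable, train_days=21):
    """One-step-ahead forecast per time of day, i.e., H trainings.

    timetable: 2D array with H rows (times of day) and one column
    per day, as in the time table figure.
    """
    forecasts = []
    for horizontal_slice in np.asarray(timetable, dtype=float):
        fit = SimpleExpSmoothing(horizontal_slice[-train_days:]).fit()
        forecasts.append(fit.forecast(1)[0])
    return np.array(forecasts)  # one value per test-day time step
\end{verbatim}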

39 tex/3_mod/7_models/3_vert.tex Normal file
@@ -0,0 +1,39 @@
\subsubsection{Vertical and Whole-day-ahead Forecasts without Retraining.}
\label{vert}

The upper-right in Figure \ref{f:inputs} shows an alternative way to
generate forecasts for a test day before it has started:
First, a seasonally-adjusted time series $a_t$ is obtained from a vertical
time series by STL decomposition.
Then, the actual forecasting model, trained on $a_t$, makes an $H$-step-ahead
prediction.
Lastly, we add the $H$ seasonal na\"{i}ve forecasts for the seasonal component
$s_t$ to these predictions to obtain the actual predictions for the test day.
Thus, only one training is required per model type, and no real-time data
are used.
By decomposing the raw time series, all long-term patterns are assumed to be
in the seasonal component $s_t$, and $a_t$ only contains the level with
a potential trend and auto-correlations.
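
The three steps can be sketched as follows, reusing the STL decomposition
from above with SES as a stand-in for the family's models:
\begin{verbatim}
import numpy as np
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

def vertical_day_ahead(y, period=84, ns=11, horizon=12):
    """Decompose, forecast a_t H steps ahead, re-add seasonality."""
    res = STL(y, period=period, seasonal=ns).fit()
    # Seasonally-adjusted series: level/trend plus auto-correlations.
    a = np.asarray(res.trend) + np.asarray(res.resid)
    a_hat = np.asarray(SimpleExpSmoothing(a).fit().forecast(horizon))
    # Seasonal naive forecast of s_t for the test day's H time steps.
    s_hat = np.asarray(res.seasonal)[-period:][:horizon]
    return a_hat + s_hat
\end{verbatim}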
The models in this family are:
\begin{enumerate}
\item \textit{\gls{fnaive}},
    \textit{\gls{pnaive}}:
    Sum of STL's trend and seasonal components' na\"{i}ve forecasts
\item \textit{\gls{vholt}},
    \textit{\gls{vses}}, and
    \textit{\gls{vtheta}}:
    Exponential smoothing without calibration and seasonal fit
\item \textit{\gls{vets}}:
    ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{varima}}:
    ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
As mentioned in Sub-section \ref{decomp}, we include the sum of the
(seasonal) na\"{i}ve forecasts of the STL's trend and seasonal components
as forecasts on their own:
For \textit{fnaive}, we tune the ``flexible'' $ns$ parameter, and for
\textit{pnaive}, we set it to a ``periodic'' value.
Thus, we implicitly assume that there is no signal in the remainder $r_t$, and
predict $0$ for it.
\textit{fnaive} and \textit{pnaive} are two more simple benchmarks.

22 tex/3_mod/7_models/4_rt.tex Normal file
@@ -0,0 +1,22 @@
\subsubsection{Vertical and Real-time Forecasts with Retraining.}
\label{rt}

The lower-left in Figure \ref{f:inputs} shows how models trained on vertical
time series are extended with real-time order data as they become available
during a test day:
Instead of obtaining an $H$-step-ahead forecast, we retrain a model after
every time step and only predict one step.
The rest of the procedure is as in the previous sub-section, and the models
are:
\begin{enumerate}
\item \textit{\gls{rtholt}},
    \textit{\gls{rtses}}, and
    \textit{\gls{rttheta}}:
    Exponential smoothing without calibration and seasonal fit
\item \textit{\gls{rtets}}:
    ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{rtarima}}:
    ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
Retraining \textit{fnaive} and \textit{pnaive} did not increase accuracy, and
thus we left them out.
A downside of this family is the significant increase in computing costs.
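
The retraining loop can be sketched as follows (the observed counts are
revealed step by step and folded back into the history after seasonal
adjustment; argument names are hypothetical):
\begin{verbatim}
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

def realtime_day(a_history, observed, s_hat):
    """Retrain after every time step; predict only one step ahead.

    a_history: seasonally-adjusted training series.
    observed:  the test day's actual counts, revealed step by step.
    s_hat:     seasonal naive forecasts of the seasonal component.
    """
    history = list(np.asarray(a_history, dtype=float))
    predictions = []
    for y_actual, s in zip(observed, s_hat):
        fit = SimpleExpSmoothing(np.asarray(history)).fit()
        predictions.append(fit.forecast(1)[0] + s)
        # Approximate the seasonal adjustment of the new actual with
        # the seasonal component's naive forecast.
        history.append(y_actual - s)
    return np.array(predictions)
\end{verbatim}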

54 tex/3_mod/7_models/5_ml.tex Normal file
@@ -0,0 +1,54 @@
\subsubsection{Vertical and Real-time Forecasts without Retraining.}
\label{ml_models}

The lower-right in Figure \ref{f:inputs} shows how ML models take
real-time order data into account without retraining.
Based on the seasonally-adjusted time series $a_t$, we employ the feature
matrix and label vector representations from Sub-section \ref{learning}
and set $n$ to the number of daily time steps, $H$, to cover all potential
auto-correlations.
The ML models are trained once before a test day starts.
For training, the matrix and vector are populated such that the label $y_T$
corresponds to the last time step of the day before the test day, $a_T$.
As the splitting during CV is done with whole days, the \gls{ml} models are
trained with training sets consisting of samples from all times of a day
in an equal manner.
Thus, the ML models learn to predict each time of the day.
For prediction on a test day, the $H$ observations preceding the time
step to be forecast are used as the input vector after seasonal
adjustment.
As a result, real-time data are included.
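
A sketch of the lagged feature matrix with scikit-learn's
\texttt{RandomForestRegressor} standing in for the family's models
(synthetic data for self-containment):
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def lagged_matrix(a, n_lags):
    """One row per sample: the n_lags observations before each label."""
    a = np.asarray(a, dtype=float)
    X = np.array([a[i:i + n_lags] for i in range(len(a) - n_lags)])
    y = a[n_lags:]
    return X, y

H = 12                                   # daily time steps
rng = np.random.default_rng(42)
a_train = rng.normal(size=21 * H)        # seasonally-adjusted series
X, y = lagged_matrix(a_train, n_lags=H)  # n = H covers a day of lags

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# On the test day, the H observations preceding the next time step
# (seasonally adjusted, hence including real-time data) are the input.
x_now = a_train[-H:].reshape(1, -1)
a_hat = model.predict(x_now)[0]
\end{verbatim}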
The models in this family are:
\begin{enumerate}
\item \textit{\gls{vrfr}}: RF trained on the matrix as described
\item \textit{\gls{vsvr}}: SVR trained on the matrix as described
\end{enumerate}
We tried other ML models such as gradient boosting machines but found
only RFs and SVRs to perform well in our study.
In the case of gradient boosting machines, this is to be expected as they are
known not to perform well in the presence of high noise (as is natural
with low count data), as shown, for example, by \cite{ma2018} or
\cite{mason2000}.
Also, deep learning methods are not applicable as the feature matrices only
consist of several hundred to thousands of rows (cf., Sub-section
\ref{params}).
In \ref{tabular_ml_models}, we provide an alternative feature matrix
representation that exploits the two-dimensional structure of time tables
without decomposing the time series.
In \ref{enhanced_feats}, we show how feature matrices are extended
to include predictors other than historical order data.
However, to answer \textbf{Q5} already here, none of the external data sources
improves the results in our study.
Due to the high number of time series in our study, we must use an automated
approach to analyzing individual time series to investigate why no
external sources improve the forecasts.
\cite{barbour2014} provide a spectral density estimation approach, based on
the Shannon entropy, that measures the signal-to-noise ratio in a
database with a number normalized between $0$ and $1$, where lower values
indicate a higher signal-to-noise ratio.
We then look at averages of the estimates on a daily level per pixel and
find that including any of the external data sources from
\ref{enhanced_feats} always leads to significantly lower signal-to-noise
ratios.
Thus, we conclude that, at least for the demand faced by our industry partner,
the historical data contain all of the signal.
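
In essence, such an estimator is a normalized Shannon entropy of the power
spectral density; a sketch with SciPy's periodogram (our exact
implementation may differ in the density estimation):
\begin{verbatim}
import numpy as np
from scipy.signal import periodogram

def spectral_entropy(y):
    """Normalized Shannon entropy of the power spectral density.

    Values near 1 mean power is spread evenly across frequencies
    (noise); lower values mean a concentrated spectrum (signal).
    """
    _, psd = periodogram(np.asarray(y, dtype=float))
    psd = psd[psd > 0]
    p = psd / psd.sum()  # normalize the density into a distribution
    return float(-(p * np.log(p)).sum() / np.log(len(p)))
\end{verbatim}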