Merge branch 'study-section' into develop
commit
9f282b3923
12 changed files with 544 additions and 12 deletions
BIN
paper.pdf
Binary file not shown.
11
paper.tex
|
@ -30,8 +30,13 @@
|
||||||
\input{tex/3_mod/7_models/4_rt}
|
\input{tex/3_mod/7_models/4_rt}
|
||||||
\input{tex/3_mod/7_models/5_ml}
|
\input{tex/3_mod/7_models/5_ml}
|
||||||
\input{tex/4_stu/1_intro}
|
\input{tex/4_stu/1_intro}
|
||||||
|
\input{tex/4_stu/2_data}
|
||||||
|
\input{tex/4_stu/3_params}
|
||||||
|
\input{tex/4_stu/4_overall}
|
||||||
|
\input{tex/4_stu/5_training}
|
||||||
|
\input{tex/4_stu/6_fams}
|
||||||
|
\input{tex/4_stu/7_pixels_intervals}
|
||||||
\input{tex/5_con/1_intro}
|
\input{tex/5_con/1_intro}
|
||||||
|
|
||||||
\newpage
|
\newpage
|
||||||
|
|
||||||
\input{tex/glossary}
|
\input{tex/glossary}
|
||||||
|
@ -43,6 +48,10 @@
|
||||||
\newpage
|
\newpage
|
||||||
\input{tex/apx/enhanced_feats}
|
\input{tex/apx/enhanced_feats}
|
||||||
\newpage
|
\newpage
|
||||||
|
\input{tex/apx/case_study}
|
||||||
|
\newpage
|
||||||
|
\input{tex/apx/peak_results}
|
||||||
|
\newpage
|
||||||
|
|
||||||
\bibliographystyle{static/elsarticle-harv}
|
\bibliographystyle{static/elsarticle-harv}
|
||||||
\bibliography{tex/references}
|
\bibliography{tex/references}
|
||||||
|
|
|
@ -1,5 +1,6 @@
|
||||||
\section{Empirical Study: A Meal Delivery Platform in Europe}
|
\section{Empirical Study: A Meal Delivery Platform in Europe}
|
||||||
\label{stu}
|
\label{stu}
|
||||||
|
|
||||||
% temporary placeholder
|
In the following, we first give a brief overview of the case study dataset
|
||||||
\label{params}
|
and the parameters we applied to calibrate the time series generation.
|
||||||
|
Then, we discuss the overall results.
|
||||||
|
|
23
tex/4_stu/2_data.tex
Normal file
|
@ -0,0 +1,23 @@
\subsection{Case Study Dataset}
\label{data}

The studied dataset consists of a meal delivery platform's entire
transactional data covering the French market from launch in February of
2016 to January of 2017.
The platform operated in five cities throughout this period and received a
total of 686,385 orders.
The forecasting models were developed based on the data from Lyon and Paris in
the period from August through December; this ensures comparability across
cities and avoids irregularities in demand assumed for a new service
within the first operating weeks.
The data exhibit a steady state as the UDP's service area remained
unchanged, and the numbers of orders and of couriers grew in lock-step and
organically.
This does not mean that no new restaurants were opened: If that happened, the
new restaurant did not attract new customers, but demand was shifted from
other member restaurants.
Results are similar in both cities, and we only report them for Paris for
greater conciseness.
Lastly, the platform recorded all incoming orders, so lost demand does not
exist.
See \ref{dataset} for details on the raw data.
|
37
tex/4_stu/3_params.tex
Normal file
|
@ -0,0 +1,37 @@
\subsection{Calibration of the Time Series Generation Process}
\label{params}

Independent of the concrete forecasting models, the time series generation
must be calibrated.
We concentrate our forecasts on the pickup side for two reasons.
First, there are significantly fewer restaurants than customers, which
results in more aggregation in the order counts and thus better pattern
recognition.
Second, from an operational point of view, forecasts for the pickups are more
valuable because of the waiting times due to meal preparation.
We choose pixel sizes of $0.5~\text{km}^2$, $1~\text{km}^2$, $2~\text{km}^2$,
and $4~\text{km}^2$, and time steps covering 60, 90, and 120 minute windows
resulting in $H_{60}=12$, $H_{90}=9$, and $H_{120}=6$ time steps per day
with the platform operating between 11 a.m. and 11 p.m. and corresponding
frequencies $k_{60}=7 \cdot 12=84$, $k_{90}=7 \cdot 9=63$, and
$k_{120}=7 \cdot 6=42$ for the vertical time series.
Smaller pixels and shorter time steps yield no recognizable patterns, yet would
have been more beneficial for tactical routing.
Time steps of 90 and 120 minutes are most likely not desirable for routing;
however, we keep them for comparison and note that a UDP may employ such
forecasts to activate more couriers at short notice if a (too) high demand is
forecast for an hour from now.
This could, for example, be implemented by paying couriers a premium if they
show up for work at short notice.
Discrete lengths of 3, 4, 5, 6, 7, and 8 weeks are chosen as training
horizons.
We do so as the structure within the pixels (i.e., number and kind of
restaurants) is not stable for more than two months in a row in the
covered horizon.
That is confirmed by the empirical finding that forecasting accuracy
improves with longer training horizons, but this effect starts to
level off after about six to seven weeks.
So, the demand patterns of more than two months ago do not resemble more
recent ones.

In total, hundreds of thousands of distinct time series are forecast in the
study.
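As a rough orientation for that order of magnitude, and under the assumption
that every combination of the calibration parameters listed above is
evaluated (the text does not state this explicitly), the number of parameter
combinations is
\[
    \underbrace{4}_{\text{pixel sizes}} \cdot
    \underbrace{3}_{\text{time step lengths}} \cdot
    \underbrace{6}_{\text{training horizons}} = 72 ,
\]
and each of these combinations contributes time series for many pixels and
test days.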
|
238
tex/4_stu/4_overall.tex
Normal file
|
@ -0,0 +1,238 @@
\subsection{Overall Results}
\label{overall_results}

Table \ref{t:results} summarizes the overall best-performing models grouped by
training horizon and a pixel's average daily demand (\gls{add}) for a
pixel size of $1~\text{km}^2$ and 60-minute time steps.
Each combination of pixel and test day counts as one case, and the total
number of cases is denoted as $n$.
Clustering the individual results revealed that a pixel's ADD over the
training horizon is the primary indicator of similarity and that three to
four clusters suffice to obtain cohesive clusters:
We labeled them "no", "low", "medium", and "high" demand pixels with
increasing ADD, and present the average MASE per cluster.
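Written out with the thresholds given in the header of Table
\ref{t:results}, the cluster assignment reads as follows, where
$\text{cluster}(\cdot)$ is a notation introduced here for illustration only;
how a value exactly on a threshold is assigned is not specified, so the
strict inequalities are kept as stated:
\[
    \text{cluster}(\text{ADD}) =
    \begin{cases}
        \text{no demand}     & \text{if } 0 < \text{ADD} < 2.5, \\
        \text{low demand}    & \text{if } 2.5 < \text{ADD} < 10, \\
        \text{medium demand} & \text{if } 10 < \text{ADD} < 25, \\
        \text{high demand}   & \text{if } 25 < \text{ADD} < \infty.
    \end{cases}
\]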
The $n$ do not vary significantly across the training horizons, which confirms
that the platform did not grow area-wise and is indeed in a steady state.
We use this table to answer \textbf{Q1} regarding the overall best methods
under different ADDs.
All result tables in the main text report MASEs calculated with all time
steps of a day.
In contrast, \ref{peak_results} shows the same tables with MASEs calculated
with time steps within peak times only (i.e., lunch from 12 pm to 2 pm and
dinner from 6 pm to 8 pm).
The differences lie mainly in the decimals of the individual MASE
averages while the ranks of the forecasting methods do not change except
in rare cases.
That shows that the presented accuracies are driven by the forecasting methods'
accuracies at peak times.
Intuitively, they all correctly predict zero demand for non-peak times.
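For reference when reading the MASE values, the following is the commonly
used definition of the measure.
It is a sketch under the assumption that the study follows this standard
form, and the symbols $y_t$, $\hat{y}_t$, $T$, $h$, and $m$ are introduced
here for illustration only.
With a training series $y_1, \dots, y_T$, forecasts
$\hat{y}_{T+1}, \dots, \hat{y}_{T+h}$, and the lag $m$ of the scaling
na\"{i}ve forecast ($m = 1$ for the non-seasonal variant):
\[
    \text{MASE} =
    \frac{\frac{1}{h} \sum_{t=T+1}^{T+h} \left| y_t - \hat{y}_t \right|}
         {\frac{1}{T-m} \sum_{t=m+1}^{T} \left| y_t - y_{t-m} \right|}
\]
Under this definition, a value below $1$ means that the forecast errors are,
on average, smaller than the in-sample errors of the na\"{i}ve forecast.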

Unsurprisingly, the best model for pixels without demand (i.e.,
$0 < \text{ADD} < 2.5$) is \textit{trivial}.
Whereas \textit{hsma} also adapts well, its performance is worse.
None of the more sophisticated models reaches a similar accuracy.
The intuition behind this is that \textit{trivial} is the least distorted by
the relatively large proportion of noise given the low-count nature of the
time series.

For low demand (i.e., $2.5 < \text{ADD} < 10$), there is also a clear
best-performing model, namely \textit{hsma}.
As the non-seasonal \textit{hses} reaches an accuracy similar to that of its
potentially seasonal generalization, \textit{hets}, we conclude that
the seasonal pattern from weekdays is not yet strong enough to be
recognized in low demand pixels.
So, in the absence of seasonality, models that only model a trend part are
the least susceptible to the noise.

For medium demand (i.e., $10 < \text{ADD} < 25$) and training horizons up to
six weeks, the best-performing models are the same as for low demand.
For longer horizons, \textit{hets} provides the highest accuracy.
Thus, to fit a seasonal pattern, longer training horizons are needed.
While \textit{vsvr} enters the top three, \textit{hets} has the edge as it
requires neither parameter tuning nor real-time data.

\begin{center}
\captionof{table}{Top-3 models by training weeks and average demand
    ($1~\text{km}^2$ pixel size, 60-minute time steps)}
\label{t:results}
\begin{tabular}{|c|c|*{12}{c|}}

\hline
\multirow{3}{*}{\rotatebox{90}{\thead{Training}}}
    & \multirow{3}{*}{\rotatebox{90}{\thead{Rank}}}
    & \multicolumn{3}{c|}{\thead{No Demand}}
    & \multicolumn{3}{c|}{\thead{Low Demand}}
    & \multicolumn{3}{c|}{\thead{Medium Demand}}
    & \multicolumn{3}{c|}{\thead{High Demand}} \\
~ & ~
    & \multicolumn{3}{c|}{(0 - 2.5)}
    & \multicolumn{3}{c|}{(2.5 - 10)}
    & \multicolumn{3}{c|}{(10 - 25)}
    & \multicolumn{3}{c|}{(25 - $\infty$)} \\
\cline{3-14}
~ & ~
    & Method & MASE & $n$
    & Method & MASE & $n$
    & Method & MASE & $n$
    & Method & MASE & $n$ \\

\hline \hline
\multirow{3}{*}{3} & 1
    & \textbf{\textit{trivial}}
    & 0.785 & \multirow{3}{*}{\rotatebox{90}{4586}}
    & \textbf{\textit{hsma}}
    & 0.819 & \multirow{3}{*}{\rotatebox{90}{2975}}
    & \textbf{\textit{hsma}}
    & 0.839 & \multirow{3}{*}{\rotatebox{90}{2743}}
    & \textbf{\textit{rtarima}}
    & 0.872 & \multirow{3}{*}{\rotatebox{90}{2018}} \\
~ & 2
    & \textit{hsma} & 0.809 & ~
    & \textit{hses} & 0.844 & ~
    & \textit{hses} & 0.858 & ~
    & \textit{rtses} & 0.873 & ~ \\
~ & 3
    & \textit{pnaive} & 0.958 & ~
    & \textit{hets} & 0.846 & ~
    & \textit{hets} & 0.859 & ~
    & \textit{rtets} & 0.877 & ~ \\

\hline
\multirow{3}{*}{4} & 1
    & \textbf{\textit{trivial}}
    & 0.770 & \multirow{3}{*}{\rotatebox{90}{4532}}
    & \textbf{\textit{hsma}}
    & 0.825 & \multirow{3}{*}{\rotatebox{90}{3033}}
    & \textbf{\textit{hsma}}
    & 0.837 & \multirow{3}{*}{\rotatebox{90}{2687}}
    & \textbf{\textit{vrfr}}
    & 0.855 & \multirow{3}{*}{\rotatebox{90}{2016}} \\
~ & 2
    & \textit{hsma} & 0.788 & ~
    & \textit{hses} & 0.848 & ~
    & \textit{hses} & 0.850 & ~
    & \textbf{\textit{rtarima}} & 0.855 & ~ \\
~ & 3
    & \textit{pnaive} & 0.917 & ~
    & \textit{hets} & 0.851 & ~
    & \textit{hets} & 0.854 & ~
    & \textit{rtses} & 0.860 & ~ \\

\hline
\multirow{3}{*}{5} & 1
    & \textbf{\textit{trivial}}
    & 0.780 & \multirow{3}{*}{\rotatebox{90}{4527}}
    & \textbf{\textit{hsma}}
    & 0.841 & \multirow{3}{*}{\rotatebox{90}{3055}}
    & \textbf{\textit{hsma}}
    & 0.837 & \multirow{3}{*}{\rotatebox{90}{2662}}
    & \textbf{\textit{vrfr}}
    & 0.850 & \multirow{3}{*}{\rotatebox{90}{2019}} \\
~ & 2
    & \textit{hsma} & 0.803 & ~
    & \textit{hses} & 0.859 & ~
    & \textit{hets} & 0.845 & ~
    & \textbf{\textit{rtarima}} & 0.852 & ~ \\
~ & 3
    & \textit{pnaive} & 0.889 & ~
    & \textit{hets} & 0.861 & ~
    & \textit{hses} & 0.845 & ~
    & \textit{vsvr} & 0.854 & ~ \\

\hline
\multirow{3}{*}{6} & 1
    & \textbf{\textit{trivial}}
    & 0.741 & \multirow{3}{*}{\rotatebox{90}{4470}}
    & \textbf{\textit{hsma}}
    & 0.847 & \multirow{3}{*}{\rotatebox{90}{3086}}
    & \textbf{\textit{hsma}}
    & 0.840 & \multirow{3}{*}{\rotatebox{90}{2625}}
    & \textbf{\textit{vrfr}}
    & 0.842 & \multirow{3}{*}{\rotatebox{90}{2025}} \\
~ & 2
    & \textit{hsma} & 0.766 & ~
    & \textit{hses} & 0.863 & ~
    & \textit{hets} & 0.842 & ~
    & \textbf{\textit{hets}} & 0.847 & ~ \\
~ & 3
    & \textit{pnaive} & 0.837 & ~
    & \textit{hets} & 0.865 & ~
    & \textit{hses} & 0.848 & ~
    & \textit{vsvr} & 0.848 & ~ \\

\hline
\multirow{3}{*}{7} & 1
    & \textbf{\textit{trivial}}
    & 0.730 & \multirow{3}{*}{\rotatebox{90}{4454}}
    & \textbf{\textit{hsma}}
    & 0.858 & \multirow{3}{*}{\rotatebox{90}{3132}}
    & \textbf{\textit{hets}}
    & 0.845 & \multirow{3}{*}{\rotatebox{90}{2597}}
    & \textbf{\textit{hets}}
    & 0.840 & \multirow{3}{*}{\rotatebox{90}{2007}} \\
~ & 2
    & \textit{hsma} & 0.754 & ~
    & \textit{hses} & 0.871 & ~
    & \textit{hsma} & 0.847 & ~
    & \textbf{\textit{vrfr}} & 0.845 & ~ \\
~ & 3
    & \textit{pnaive} & 0.813 & ~
    & \textit{hets} & 0.872 & ~
    & \textbf{\textit{vsvr}} & 0.850 & ~
    & \textit{vsvr} & 0.847 & ~ \\

\hline
\multirow{3}{*}{8} & 1
    & \textbf{\textit{trivial}}
    & 0.735 & \multirow{3}{*}{\rotatebox{90}{4402}}
    & \textbf{\textit{hsma}}
    & 0.867 & \multirow{3}{*}{\rotatebox{90}{3159}}
    & \textbf{\textit{hets}}
    & 0.846 & \multirow{3}{*}{\rotatebox{90}{2575}}
    & \textbf{\textit{hets}}
    & 0.836 & \multirow{3}{*}{\rotatebox{90}{2002}} \\
~ & 2
    & \textit{hsma} & 0.758 & ~
    & \textit{hets} & 0.877 & ~
    & \textbf{\textit{vsvr}} & 0.850 & ~
    & \textbf{\textit{vrfr}} & 0.842 & ~ \\
~ & 3
    & \textit{pnaive} & 0.811 & ~
    & \textit{hses} & 0.880 & ~
    & \textit{hsma} & 0.851 & ~
    & \textit{vsvr} & 0.849 & ~ \\

\hline
\end{tabular}
\end{center}

In summary, except for high demand, simple models trained on horizontal time
series work best.
By contrast, high demand (i.e., $25 < \text{ADD} < \infty$) and fewer than
six training weeks is the only situation where classical models trained on
vertical time series work well.
Then, \textit{rtarima} outperforms its siblings from Sub-sections
\ref{vert} and \ref{rt}.
We conjecture that intra-day auto-correlations as caused, for example, by
weather, are the reason for that.
Intuitively, a certain amount of demand (i.e., a high enough signal-to-noise
ratio) is required such that models with auto-correlations can see them
through all the noise.
That idea is supported by \textit{vrfr} reaching a similar accuracy under
high demand as its tree structure allows it to fit auto-correlations.
As both \textit{rtarima} and \textit{vrfr} incorporate recent demand,
real-time information can indeed improve accuracy.
However, once models are trained on longer horizons, \textit{hets} is more
accurate than \textit{vrfr}.
Thus, to answer \textbf{Q4}, we conclude that real-time information only
improves accuracy if three or four weeks of training material are
available.

In addition to looking at the results in tables covering the entire one-year
horizon, we also created sub-analyses on the distinct seasons spring,
summer (including the long holiday season in France), and fall.
Yet, none of the results portrayed in this and the subsequent sections change
in significant ways.
We conjecture that there could be differences if the overall demand of the UDP
increased to a scale beyond the one this case study covers and leave that
up to a follow-up study with a bigger UDP.
|
31
tex/4_stu/5_training.tex
Normal file
\subsection{Impact of the Training Horizon}
\label{training}

While it is reasonable to assume that forecasts become more accurate as the
training horizon expands, our study reveals some interesting findings.
First, without demand, \textit{trivial} indeed performs better with more
training material, but improved pattern recognition cannot be the cause
here.
Instead, we argue that the reason for this is that the longer there has been
no steady demand, the higher the chance that this will not change soon.
Further, if we focus on shorter training horizons, the sample will necessarily
contain cases where a pixel is initiated after a popular-to-be restaurant
joined the platform:
Demand grows fast, making \textit{trivial} less accurate, and the pixel moves
to another cluster soon.

Second, with low demand, the best-performing \textit{hsma} becomes less
accurate with more training material.
While one could argue that this is due to \textit{hsma} not fitting a trend,
the less accurate \textit{hses} and \textit{hets} do fit a trend.
Instead, we argue that any low-demand time series naturally exhibits a high
noise-to-signal ratio, and \textit{hsma} is the least susceptible to
noise.
Then, to counter the missing trend term, the training horizon must be shorter.

With medium demand, a similar argument can be made; however, the
signal already becomes more apparent, favoring \textit{hets} with more
training data.

Lastly, with high demand, the signal becomes so clear that more sophisticated
models can exploit longer training horizons.
|
162
tex/4_stu/6_fams.tex
Normal file
|
@ -0,0 +1,162 @@
|
||||||
|
\subsection{Results by Model Families}
|
||||||
|
\label{fams}
|
||||||
|
|
||||||
|
Besides the overall results, we provide an in-depth comparison of models
|
||||||
|
within a family.
|
||||||
|
Instead of reporting the MASE per model, we rank the models holding the
|
||||||
|
training horizon fixed to make comparison easier.
|
||||||
|
Table \ref{t:hori} presents the models trained on horizontal time series.
|
||||||
|
In addition to \textit{naive}, we include \textit{fnaive} and \textit{pnaive}
|
||||||
|
already here as more competitive benchmarks.
|
||||||
|
The tables in this section report two rankings simultaneously:
|
||||||
|
The first number is the rank resulting from lumping the low and medium
|
||||||
|
clusters together, which yields almost the same rankings when analyzed
|
||||||
|
individually.
|
||||||
|
The ranks from only high demand pixels are in parentheses if they differ.
|
||||||
|
|
||||||
|
\begin{center}
|
||||||
|
\captionof{table}{Ranking of benchmark and horizontal models
|
||||||
|
($1~\text{km}^2$ pixel size, 60-minute time steps):
|
||||||
|
the table shows the ranks for cases with $2.5 < ADD < 25$
|
||||||
|
(and $25 < ADD < \infty$ in parentheses if they differ)}
|
||||||
|
\label{t:hori}
|
||||||
|
\begin{tabular}{|c|ccc|cccccccc|}
|
||||||
|
\hline
|
||||||
|
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
|
||||||
|
& \multicolumn{3}{c|}{\thead{Benchmarks}}
|
||||||
|
& \multicolumn{8}{c|}{\thead{Horizontal (whole-day-ahead)}} \\
|
||||||
|
\cline{2-12}
|
||||||
|
~ & \textit{naive} & \textit{fnaive} & \textit{paive}
|
||||||
|
& \textit{harima} & \textit{hcroston} & \textit{hets} & \textit{hholt}
|
||||||
|
& \textit{hhwinters} & \textit{hses} & \textit{hsma} & \textit{htheta} \\
|
||||||
|
\hline \hline
|
||||||
|
3 & 11 & 7 (2) & 8 (5) & 5 (7) & 4 & 3
|
||||||
|
& 9 (10) & 10 (9) & 2 (6) & 1 & 6 (8) \\
|
||||||
|
4 & 11 & 7 (2) & 8 (3) & 5 (6) & 4 (5) & 3 (1)
|
||||||
|
& 9 (10) & 10 (9) & 2 (7) & 1 (4) & 6 (8) \\
|
||||||
|
5 & 11 & 7 (2) & 8 (4) & 5 (3) & 4 (9) & 3 (1)
|
||||||
|
& 9 (10) & 10 (5) & 2 (8) & 1 (6) & 6 (7) \\
|
||||||
|
6 & 11 & 8 (5) & 9 (6) & 5 (4) & 4 (7) & 2 (1)
|
||||||
|
& 10 & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
|
||||||
|
7 & 11 & 8 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
|
||||||
|
& 9 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
|
||||||
|
8 & 11 & 9 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
|
||||||
|
& 8 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{center}
|
||||||
|
\
|
||||||
|
|
||||||
|
A first insight is that \textit{fnaive} is the best benchmark in all
|
||||||
|
scenarios:
|
||||||
|
Decomposing flexibly by tuning the $ns$ parameter is worth the computational
|
||||||
|
cost.
|
||||||
|
Further, if one is limited in the number of non-na\"{i}ve methods,
|
||||||
|
\textit{hets} is the best compromise and works well across all demand
|
||||||
|
levels.
|
||||||
|
It is also the best model independent of the training horizon for high demand.
|
||||||
|
With low or medium demand, \textit{hsma} is the clear overall winner; yet,
|
||||||
|
with high demand, models with a seasonal fit (i.e., \textit{harima},
|
||||||
|
\textit{hets}, and \textit{hhwinters}) are more accurate, in particular,
|
||||||
|
for longer training horizons.
|
||||||
|
This is due to demand patterns in the weekdays becoming stronger with higher
|
||||||
|
overall demand.
|
||||||
|
|
||||||
|
\begin{center}
|
||||||
|
\captionof{table}{Ranking of classical models on vertical time series
|
||||||
|
($1~\text{km}^2$ pixel size, 60-minute time steps):
|
||||||
|
the table shows the ranks for cases with $2.5 < ADD < 25$
|
||||||
|
(and $25 < ADD < \infty$ in parentheses if they differ)}
|
||||||
|
\label{t:vert}
|
||||||
|
\begin{tabular}{|c|cc|ccccc|ccccc|}
|
||||||
|
\hline
|
||||||
|
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
|
||||||
|
& \multicolumn{2}{c|}{\thead{Benchmarks}}
|
||||||
|
& \multicolumn{5}{c|}{\thead{Vertical (whole-day-ahead)}}
|
||||||
|
& \multicolumn{5}{c|}{\thead{Vertical (real-time)}} \\
|
||||||
|
\cline{2-13}
|
||||||
|
~ & \textit{hets} & \textit{hsma} & \textit{varima} & \textit{vets}
|
||||||
|
& \textit{vholt} & \textit{vses} & \textit{vtheta} & \textit{rtarima}
|
||||||
|
& \textit{rtets} & \textit{rtholt} & \textit{rtses} & \textit{rttheta} \\
|
||||||
|
\hline \hline
|
||||||
|
3 & 2 (10) & 1 (7) & 6 (4) & 8 (6) & 10 (9)
|
||||||
|
& 7 (5) & 11 (12) & 4 (1) & 5 (3) & 9 (8) & 3 (2) & 12 (11) \\
|
||||||
|
4 & 2 (8) & 1 (10) & 6 (4) & 8 (6) & 10 (9)
|
||||||
|
& 7 (5) & 12 (11) & 3 (1) & 5 (3) & 9 (7) & 4 (2) & 11 (12) \\
|
||||||
|
5 & 2 (3) & 1 (10) & 7 (5) & 8 (7) & 10 (9)
|
||||||
|
& 6 & 11 & 4 (1) & 5 (4) & 9 (8) & 3 (2) & 12 \\
|
||||||
|
6 & 2 (1) & 1 (10) & 6 (5) & 8 (7) & 10 (9)
|
||||||
|
& 7 (6) & 11 (12) & 3 (2) & 5 (4) & 9 (8) & 4 (3) & 12 (11) \\
|
||||||
|
7 & 2 (1) & 1 (10) & 8 (5) & 7 & 10 (9)
|
||||||
|
& 6 & 11 (12) & 5 (2) & 4 & 9 (8) & 3 & 12 (11) \\
|
||||||
|
8 & 2 (1) & 1 (9) & 8 (5) & 7 (6) & 10 (8)
|
||||||
|
& 6 & 12 (10) & 5 (2) & 4 & 9 (7) & 3 & 11 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{center}
|
||||||
|
\
|
||||||
|
|
||||||
|
Table \ref{t:vert} extends the previous analysis to classical models trained
|
||||||
|
on vertical time series.
|
||||||
|
Now, the winners from before, \textit{hets} and \textit{hsma}, serve as
|
||||||
|
benchmarks.
|
||||||
|
Whereas for low and medium demand, no improvements can be obtained,
|
||||||
|
\textit{rtarima} and \textit{rtses} are the most accurate with high demand
|
||||||
|
and short training horizons.
|
||||||
|
For six or more training weeks, \textit{hets} is still optimal.
|
||||||
|
Independent of retraining and the demand level, the models' relative
|
||||||
|
performances are consistent:
|
||||||
|
The \textit{*arima} and \textit{*ses} models are best, followed by
|
||||||
|
\textit{*ets}, \textit{*holt}, and \textit{*theta}.
|
||||||
|
Thus, models that can deal with auto-correlations and short-term forecasting
|
||||||
|
errors, as expressed by moving averages, and that cannot be distracted by
|
||||||
|
trend terms are optimal for vertical series.
|
||||||
|
|
||||||
|
Finally, Table \ref{t:ml} compares the two ML-based models against the
|
||||||
|
best-performing classical models and answers \textbf{Q2}:
|
||||||
|
With low and medium demand, no improvements can be obtained again; however,
|
||||||
|
with high demand, \textit{vrfr} has the edge over \textit{rtarima} for
|
||||||
|
training horizons up to six weeks.
|
||||||
|
We conjecture that \textit{vrfr} fits auto-correlations better than
|
||||||
|
\textit{varima} and is not distracted by short-term noise as
|
||||||
|
\textit{rtarima} may be due to the retraining.
|
||||||
|
With seven or eight training weeks, \textit{hets} remains the overall winner.
|
||||||
|
Interestingly, \textit{vsvr} is more accurate than \textit{vrfr} for low and
|
||||||
|
medium demand.
|
||||||
|
We assume that \textit{vrfr} performs well only with strong auto-correlations,
|
||||||
|
which are not present with low and medium demand.
|
||||||
|
|
||||||
|
\begin{center}
|
||||||
|
\captionof{table}{Ranking of ML models on vertical time series
|
||||||
|
($1~\text{km}^2$ pixel size, 60-minute time steps):
|
||||||
|
the table shows the ranks for cases with $2.5 < ADD < 25$
|
||||||
|
(and $25 < ADD < \infty$ in parentheses if they differ)}
|
||||||
|
\label{t:ml}
|
||||||
|
\begin{tabular}{|c|cccc|cc|}
|
||||||
|
\hline
|
||||||
|
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
|
||||||
|
& \multicolumn{4}{c|}{\thead{Benchmarks}}
|
||||||
|
& \multicolumn{2}{c|}{\thead{ML}} \\
|
||||||
|
\cline{2-7}
|
||||||
|
~ & \textit{fnaive} & \textit{hets} & \textit{hsma}
|
||||||
|
& \textit{rtarima} & \textit{vrfr} & \textit{vsvr} \\
|
||||||
|
\hline \hline
|
||||||
|
3 & 6 & 2 (5) & 1 (4) & 3 (1) & 5 (2) & 4 (3) \\
|
||||||
|
4 & 6 (5) & 2 (4) & 1 (6) & 3 (2) & 5 (1) & 4 (3) \\
|
||||||
|
5 & 6 (5) & 2 (4) & 1 (6) & 4 (2) & 5 (1) & 3 \\
|
||||||
|
6 & 6 (5) & 2 & 1 (6) & 4 & 5 (1) & 3 \\
|
||||||
|
7 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
|
||||||
|
8 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{center}
|
||||||
|
\
|
||||||
|
|
||||||
|
Analogously, we created tables like Table \ref{t:hori} to \ref{t:ml} for the
|
||||||
|
forecasts with time steps of 90 and 120 minutes and find that the relative
|
||||||
|
rankings do not change significantly.
|
||||||
|
The same holds true for the rankings with changing pixel sizes.
|
||||||
|
For conciseness reasons, we do not include these additional tables in this
|
||||||
|
article.
|
||||||
|
In summary, the relative performances exhibited by certain model families
|
||||||
|
are shown to be rather stable in this case study.
|
27
tex/4_stu/7_pixels_intervals.tex
Normal file
|
@ -0,0 +1,27 @@
|
||||||
|
\subsection{Effects of the Pixel Size and Time Step Length}
|
||||||
|
\label{pixels_intervals}
|
||||||
|
|
||||||
|
As elaborated in Sub-section \ref{grid}, more order aggregation leads to a
|
||||||
|
higher overall demand level and an improved pattern recognition in the
|
||||||
|
generated time series.
|
||||||
|
Consequently, individual cases tend to move to the right in tables equivalent
|
||||||
|
to Table \ref{t:results}.
|
||||||
|
With the same $ADD$ clusters, forecasts for pixel sizes of $2~\text{km}^2$ and
|
||||||
|
$4~\text{km}^2$ or time intervals of 90 and 120 minutes or combinations
|
||||||
|
thereof yield results similar to the best models as revealed in Tables
|
||||||
|
\ref{t:results}, \ref{t:hori}, \ref{t:vert}, and \ref{t:ml} for high
|
||||||
|
demand.
|
||||||
|
By contrast, forecasts for $0.5~\text{km}^2$ pixels have most of the cases
|
||||||
|
(i.e., $n$) in the no or low demand clusters.
|
||||||
|
In that case, the pixels are too small, and pattern recognition becomes
|
||||||
|
harder.
|
||||||
|
While it is true, that \textit{trivial} exhibits the overall lowest MASE
|
||||||
|
for no demand cases, these forecasts become effectively worthless for
|
||||||
|
operations.
|
||||||
|
In the extreme, with even smaller pixels we would be forecasting $0$ orders
|
||||||
|
in all pixels for all time steps.
|
||||||
|
In summary, the best model and its accuracy are determined primarily by the
|
||||||
|
$ADD$, and the pixel size and interval length are merely parameters to
|
||||||
|
control that.
|
||||||
|
The forecaster's goal is to create a grid with small enough pixels without
|
||||||
|
losing a recognizable pattern.
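To illustrate how the pixel size controls the ADD, consider a larger pixel
$P$ that exactly covers a set of smaller pixels $p_1, \dots, p_j$; the
nesting of the grids is an assumption made here, and the numbers are
hypothetical and not taken from the dataset.
As every order is counted in exactly one of the smaller pixels, the daily
order counts, and hence the averages over the same training days, add up:
\[
    \text{ADD}_P = \sum_{i=1}^{j} \text{ADD}_{p_i},
    \qquad \text{e.g.,} \quad 2 + 3 + 4 + 5 = 14 .
\]
In this example, one no demand pixel and three low demand pixels merge into
a single medium demand pixel, which corresponds to the move to the right in
tables equivalent to Table \ref{t:results}.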
|
|
@ -1,14 +1,11 @@
|
||||||
\section{Forecasting Accuracies during Peak Times}
|
\section{Forecasting Accuracies during Peak Times}
|
||||||
\label{peak_results}
|
\label{peak_results}
|
||||||
|
|
||||||
This appendix shows all result tables from the main text with the MASE
|
This appendix shows all tables from the main text
|
||||||
averages calculated from time steps within peak times.
|
with the MASE averages calculated from time steps within peak times
|
||||||
Peaks are the times of the day where the typical customer has a lunch or
|
that are defined to be from 12 pm to 2 pm (=lunch) or from 6 pm to 8 pm (=dinner).
|
||||||
dinner meal and defined to be either from 12 pm to 2 pm or from 6 pm to
|
While the exact decimals of the MASEs differ,
|
||||||
8 pm.
|
the relative ranks of the forecasting methods are the same except in rare cases.
|
||||||
While the exact decimals of the MASEs differ from the ones in the main
|
|
||||||
text, the relative ranks of the forecasting methods are the same except in
|
|
||||||
rare cases.
|
|
||||||
|
|
||||||
\begin{center}
|
\begin{center}
|
||||||
\captionof{table}{Top-3 models by training weeks and average demand
|
\captionof{table}{Top-3 models by training weeks and average demand
|
||||||
|
|
|
@ -1,4 +1,7 @@
|
||||||
% Abbreviations for technical terms.
|
% Abbreviations for technical terms.
|
||||||
|
\newglossaryentry{add}{
|
||||||
|
name=ADD, description={Average Daily Demand}
|
||||||
|
}
|
||||||
\newglossaryentry{cart}{
|
\newglossaryentry{cart}{
|
||||||
name=CART, description={Classification and Regression Trees}
|
name=CART, description={Classification and Regression Trees}
|
||||||
}
|
}
|
||||||
|
|
|
@ -10,6 +10,9 @@
|
||||||
% Enable diagonal lines in tables.
|
% Enable diagonal lines in tables.
|
||||||
\usepackage{static/slashbox}
|
\usepackage{static/slashbox}
|
||||||
|
|
||||||
|
% Enable multiple lines in a table row
|
||||||
|
\usepackage{multirow}
|
||||||
|
|
||||||
% Make opening quotes look different than closing quotes.
|
% Make opening quotes look different than closing quotes.
|
||||||
\usepackage[english=american]{csquotes}
|
\usepackage[english=american]{csquotes}
|
||||||
\MakeOuterQuote{"}
|
\MakeOuterQuote{"}
|
||||||
|
@ -18,3 +21,4 @@
|
||||||
\usepackage{bm}
|
\usepackage{bm}
|
||||||
\newcommand{\mat}[1]{\bm{#1}}
|
\newcommand{\mat}[1]{\bm{#1}}
|
||||||
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
|
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
|
||||||
|
\newcommand{\thead}[1]{\textbf{#1}}
|