\subsection{Results by Model Families}
\label{fams}
\begin{center}
\captionof{table}{Ranking of benchmark and horizontal models
($1~\text{km}^2$ pixel size, 60-minute time steps):
the table shows the ranks for cases with $2.5 < ADD < 25$
(and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:hori}
\begin{tabular}{|c|ccc|cccccccc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training weeks}}}}
& \multicolumn{3}{c|}{\thead{Benchmarks}}
& \multicolumn{8}{c|}{\thead{Horizontal (whole-day-ahead)}} \\
\cline{2-12}
~ & \textit{naive} & \textit{fnaive} & \textit{pnaive}
& \textit{harima} & \textit{hcroston} & \textit{hets} & \textit{hholt}
& \textit{hhwinters} & \textit{hses} & \textit{hsma} & \textit{htheta} \\
\hline \hline
3 & 11 & 7 (2) & 8 (5) & 5 (7) & 4 & 3
& 9 (10) & 10 (9) & 2 (6) & 1 & 6 (8) \\
4 & 11 & 7 (2) & 8 (3) & 5 (6) & 4 (5) & 3 (1)
& 9 (10) & 10 (9) & 2 (7) & 1 (4) & 6 (8) \\
5 & 11 & 7 (2) & 8 (4) & 5 (3) & 4 (9) & 3 (1)
& 9 (10) & 10 (5) & 2 (8) & 1 (6) & 6 (7) \\
6 & 11 & 8 (5) & 9 (6) & 5 (4) & 4 (7) & 2 (1)
& 10 & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
7 & 11 & 8 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
& 9 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
8 & 11 & 9 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
& 8 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
\hline
\end{tabular}
\end{center}
\
Besides the overall results, we provide an in-depth comparison of the models
within each family.
Instead of reporting the MASE per model, we rank the models while holding the
training horizon fixed, which makes the comparison easier.
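To make this concrete, the following sketch shows one way such per-horizon
ranks could be derived from the raw MASE values with \texttt{pandas}; the
column names (\texttt{model}, \texttt{training\_weeks}, \texttt{mase}) are
illustrative assumptions only.
\begin{verbatim}
import pandas as pd

def rank_models(mase: pd.DataFrame) -> pd.DataFrame:
    """Rank models within each training horizon by their average MASE.

    `mase` is assumed to hold one row per (model, training_weeks, pixel)
    with that pixel's MASE value in the column "mase".
    """
    return (
        mase.groupby(["training_weeks", "model"])["mase"]
        .mean()                           # average MASE per model ...
        .groupby(level="training_weeks")  # ... within a fixed horizon
        .rank(method="min")               # rank 1 = lowest average MASE
        .unstack("model")                 # horizons as rows, models as columns
    )
\end{verbatim}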
Table~\ref{t:hori} presents the models trained on horizontal time series.
In addition to \textit{naive}, we already include \textit{fnaive} and
\textit{pnaive} here as more competitive benchmarks.
The tables in this section report two rankings simultaneously:
The first number is the rank obtained when the low- and medium-demand clusters
are lumped together, which yields almost the same rankings as analyzing these
two clusters individually.
The ranks for high-demand pixels only are shown in parentheses whenever they
differ.
A first insight is that \textit{fnaive} is the best benchmark in all
scenarios:
Decomposing flexibly by tuning the $ns$ parameter is worth the computational
cost.
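For illustration, the following sketch outlines one way such a
decomposition-based naive forecast can be set up with STL from
\texttt{statsmodels}, where the seasonal smoothing window plays the role of
the tunable $ns$ parameter; this is a simplified sketch under these
assumptions, not necessarily the exact \textit{fnaive} specification.
\begin{verbatim}
import numpy as np
from statsmodels.tsa.seasonal import STL

def fnaive_forecast(series, period=7, ns=7, horizon=1):
    """Decomposition-based naive forecast of a seasonal demand series.

    `series` is a pandas Series, `period` the seasonal cycle length
    (e.g., 7 for a weekly pattern), and `ns` the (odd) seasonal
    smoothing window handed to STL.
    """
    decomposition = STL(series, period=period, seasonal=ns).fit()
    adjusted = series - decomposition.seasonal           # de-seasonalized
    last_cycle = decomposition.seasonal.iloc[-period:]   # latest cycle
    seasonal_future = np.resize(last_cycle, horizon)     # repeat forward
    return adjusted.iloc[-1] + seasonal_future           # level + seasonality
\end{verbatim}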
Further, if one can only employ a small number of non-na\"{i}ve methods,
\textit{hets} is the best compromise and works well across all demand levels.
For high demand, it is the best model independent of the training horizon.
With low or medium demand, \textit{hsma} is the clear overall winner; yet,
with high demand, models with a seasonal fit (i.e., \textit{harima},
\textit{hets}, and \textit{hhwinters}) are more accurate, in particular for
longer training horizons.
This is because the weekday demand patterns become more pronounced with
higher overall demand.
\begin{center}
\captionof{table}{Ranking of classical models on vertical time series
($1~\text{km}^2$ pixel size, 60-minute time steps):
the table shows the ranks for cases with $2.5 < ADD < 25$
(and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:vert}
\begin{tabular}{|c|cc|ccccc|ccccc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training weeks}}}}
& \multicolumn{2}{c|}{\thead{Benchmarks}}
& \multicolumn{5}{c|}{\thead{Vertical (whole-day-ahead)}}
& \multicolumn{5}{c|}{\thead{Vertical (real-time)}} \\
\cline{2-13}
~ & \textit{hets} & \textit{hsma} & \textit{varima} & \textit{vets}
& \textit{vholt} & \textit{vses} & \textit{vtheta} & \textit{rtarima}
& \textit{rtets} & \textit{rtholt} & \textit{rtses} & \textit{rttheta} \\
\hline \hline
3 & 2 (10) & 1 (7) & 6 (4) & 8 (6) & 10 (9)
& 7 (5) & 11 (12) & 4 (1) & 5 (3) & 9 (8) & 3 (2) & 12 (11) \\
4 & 2 (8) & 1 (10) & 6 (4) & 8 (6) & 10 (9)
& 7 (5) & 12 (11) & 3 (1) & 5 (3) & 9 (7) & 4 (2) & 11 (12) \\
5 & 2 (3) & 1 (10) & 7 (5) & 8 (7) & 10 (9)
& 6 & 11 & 4 (1) & 5 (4) & 9 (8) & 3 (2) & 12 \\
6 & 2 (1) & 1 (10) & 6 (5) & 8 (7) & 10 (9)
& 7 (6) & 11 (12) & 3 (2) & 5 (4) & 9 (8) & 4 (3) & 12 (11) \\
7 & 2 (1) & 1 (10) & 8 (5) & 7 & 10 (9)
& 6 & 11 (12) & 5 (2) & 4 & 9 (8) & 3 & 12 (11) \\
8 & 2 (1) & 1 (9) & 8 (5) & 7 (6) & 10 (8)
& 6 & 12 (10) & 5 (2) & 4 & 9 (7) & 3 & 11 \\
\hline
\end{tabular}
\end{center}
\
Table~\ref{t:vert} extends the previous analysis to classical models trained
on vertical time series.
Now, the winners from before, \textit{hets} and \textit{hsma}, serve as
benchmarks.
Whereas no improvements over these benchmarks are obtained for low and medium
demand, \textit{rtarima} and \textit{rtses} are the most accurate models for
high demand and short training horizons.
For six or more training weeks, \textit{hets} is still optimal.
Independent of retraining and the demand level, the models' relative
performances are consistent:
The \textit{*arima} and \textit{*ses} models are best, followed by
\textit{*ets}, \textit{*holt}, and \textit{*theta}.
Thus, models that capture auto-correlations and short-term forecasting
errors, as expressed by moving-average terms, and that are not distracted by
trend terms are optimal for vertical series.
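One way to see this is the update rule of simple exponential smoothing with
smoothing parameter $\alpha$: each new forecast corrects the previous one by
a fraction of the latest one-step-ahead error,
\[
\hat{y}_{t+1 \mid t}
  = \alpha \, y_t + (1 - \alpha) \, \hat{y}_{t \mid t-1}
  = \hat{y}_{t \mid t-1} + \alpha \, e_t,
\qquad
e_t = y_t - \hat{y}_{t \mid t-1},
\]
which is precisely the error-correction mechanism of the moving-average term
in an ARIMA$(0,1,1)$ model; neither formulation includes an explicit trend
component.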
Finally, Table~\ref{t:ml} compares the two ML-based models against the
best-performing classical models and answers \textbf{Q2}:
Again, no improvements can be obtained for low and medium demand; however,
with high demand, \textit{vrfr} has the edge over \textit{rtarima} for
training horizons up to six weeks.
We conjecture that \textit{vrfr} fits auto-correlations better than
\textit{varima} and is not distracted by short-term noise, as
\textit{rtarima} may be due to its retraining.
With seven or eight training weeks, \textit{hets} remains the overall winner.
Interestingly, \textit{vsvr} is more accurate than \textit{vrfr} for low and
medium demand.
We assume that \textit{vrfr} performs well only in the presence of strong
auto-correlations, which are absent with low and medium demand.
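To illustrate the kind of model behind this conjecture, the sketch below
fits a random forest on lagged observations of a vertical series (i.e., the
demand of one pixel at a fixed time of day on consecutive days); the lag
construction and hyperparameters are illustrative assumptions, not the exact
\textit{vrfr} configuration.
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def vertical_rf_forecast(series, n_lags=7, n_estimators=100, seed=42):
    """One-step-ahead forecast from the last `n_lags` observations."""
    y = np.asarray(series, dtype=float)
    # Each row of X holds the n_lags values preceding its target in y.
    X = np.column_stack(
        [y[i:len(y) - n_lags + i] for i in range(n_lags)]
    )
    model = RandomForestRegressor(
        n_estimators=n_estimators, random_state=seed
    )
    model.fit(X, y[n_lags:])
    return model.predict(y[-n_lags:].reshape(1, -1))[0]
\end{verbatim}
A model of this form can pick up non-linear dependencies on the lagged values
without fitting an explicit trend, which matches the behavior conjectured
above.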
\begin{center}
\captionof{table}{Ranking of ML models on vertical time series
($1~\text{km}^2$ pixel size, 60-minute time steps):
the table shows the ranks for cases with $2.5 < ADD < 25$
(and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:ml}
\begin{tabular}{|c|cccc|cc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training weeks}}}}
& \multicolumn{4}{c|}{\thead{Benchmarks}}
& \multicolumn{2}{c|}{\thead{ML}} \\
\cline{2-7}
~ & \textit{fnaive} & \textit{hets} & \textit{hsma}
& \textit{rtarima} & \textit{vrfr} & \textit{vsvr} \\
\hline \hline
3 & 6 & 2 (5) & 1 (4) & 3 (1) & 5 (2) & 4 (3) \\
4 & 6 (5) & 2 (4) & 1 (6) & 3 (2) & 5 (1) & 4 (3) \\
5 & 6 (5) & 2 (4) & 1 (6) & 4 (2) & 5 (1) & 3 \\
6 & 6 (5) & 2 & 1 (6) & 4 & 5 (1) & 3 \\
7 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
8 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
\hline
\end{tabular}
\end{center}
\
Analogously, we created tables like Tables~\ref{t:hori} to~\ref{t:ml} for
the forecasts with time steps of 90 and 120 minutes and found that the
relative rankings do not change significantly.
The same holds true for the rankings with different pixel sizes.
For conciseness, we do not include these additional tables in this article.
In summary, the relative performances of the model families are rather
stable in this case study.