Merge branch 'study-section' into develop
commit 9f282b3923
12 changed files with 544 additions and 12 deletions
BIN paper.pdf (binary file not shown)
11 paper.tex
@@ -30,8 +30,13 @@
 \input{tex/3_mod/7_models/4_rt}
 \input{tex/3_mod/7_models/5_ml}
 \input{tex/4_stu/1_intro}
+\input{tex/4_stu/2_data}
+\input{tex/4_stu/3_params}
+\input{tex/4_stu/4_overall}
+\input{tex/4_stu/5_training}
+\input{tex/4_stu/6_fams}
+\input{tex/4_stu/7_pixels_intervals}
 \input{tex/5_con/1_intro}

 \newpage

 \input{tex/glossary}
@@ -43,6 +48,10 @@
 \newpage
 \input{tex/apx/enhanced_feats}
 \newpage
+\input{tex/apx/case_study}
+\newpage
+\input{tex/apx/peak_results}
+\newpage

 \bibliographystyle{static/elsarticle-harv}
 \bibliography{tex/references}
tex/4_stu/1_intro.tex

@@ -1,5 +1,6 @@
 \section{Empirical Study: A Meal Delivery Platform in Europe}
 \label{stu}

-% temporary placeholder
-\label{params}
+In the following, we first give a brief overview of the case study dataset
+and the parameters we applied to calibrate the time series generation.
+Then, we discuss the overall results.
23 tex/4_stu/2_data.tex (new file)
@@ -0,0 +1,23 @@
\subsection{Case Study Dataset}
\label{data}

The studied dataset consists of a meal delivery platform's entire
transactional data covering the French market from its launch in February
2016 to January 2017.
The platform operated in five cities throughout this period and received a
total of 686,385 orders.
The forecasting models were developed based on the data from Lyon and Paris
in the period from August through December; this ensures comparability
across cities and avoids the irregularities in demand assumed for a new
service within its first operating weeks.
The data exhibit a steady state as the UDP's service area remained
unchanged, and the numbers of orders and couriers grew organically and in
lockstep.
This does not mean that no new restaurants were opened: if one opened, it
did not attract new customers, but demand shifted to it from other member
restaurants.
Results are similar in both cities, and we report them only for Paris for
conciseness.
Lastly, the platform recorded all incoming orders, so there is no lost
demand.
See \ref{dataset} for details on the raw data.
37 tex/4_stu/3_params.tex (new file)
@@ -0,0 +1,37 @@
\subsection{Calibration of the Time Series Generation Process}
\label{params}

Independent of the concrete forecasting models, the time series generation
must be calibrated.
We concentrate our forecasts on the pickup side for two reasons.
First, there are significantly fewer restaurants than customers, which
results in more aggregation in the order counts and thus better pattern
recognition.
Second, from an operational point of view, forecasts for the pickups are
more valuable because of the waiting times due to meal preparation.
We choose pixel sizes of $0.5~\text{km}^2$, $1~\text{km}^2$,
$2~\text{km}^2$, and $4~\text{km}^2$, and time steps covering 60-, 90-, and
120-minute windows, resulting in $H_{60}=12$, $H_{90}=9$, and $H_{120}=6$
time steps per day with the platform operating between 11 a.m. and 11 p.m.,
and corresponding frequencies $k_{60} = 7 \cdot 12 = 84$,
$k_{90} = 7 \cdot 9 = 63$, and $k_{120} = 7 \cdot 6 = 42$ for the vertical
time series.
Smaller pixels and shorter time steps yield no recognizable patterns,
although they would have been more beneficial for tactical routing.
Time steps of 90 and 120 minutes are most likely not desirable for routing;
however, we keep them for comparison and note that a UDP may employ such
forecasts to activate more couriers at short notice if a (too) high demand
is forecast an hour from now.
This could, for example, be implemented by paying couriers a premium if
they show up for work at short notice.
Discrete lengths of 3, 4, 5, 6, 7, and 8 weeks are chosen as training
horizons.
We do so because the structure within the pixels (i.e., the number and kind
of restaurants) is not stable for more than two months in a row in the
covered horizon.
That is confirmed by the empirical finding that forecasting accuracy
improves with a longer training horizon, but this effect starts to level
off after about six to seven weeks.
So, demand patterns from more than two months ago do not resemble more
recent ones.

In total, hundreds of thousands of distinct time series are forecast in the
study.
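For illustration, the step counts and frequencies above follow directly
from the operating hours; a minimal Python sketch (the helper names are
ours, not part of the study's code):

    OPERATING_MINUTES = 12 * 60  # 11 a.m. to 11 p.m.
    DAYS_PER_WEEK = 7

    def steps_per_day(step_minutes: int) -> int:
        """H: the number of time steps per operating day."""
        assert OPERATING_MINUTES % step_minutes == 0
        return OPERATING_MINUTES // step_minutes

    def vertical_frequency(step_minutes: int) -> int:
        """k: the weekly frequency of a vertical time series."""
        return DAYS_PER_WEEK * steps_per_day(step_minutes)

    for minutes in (60, 90, 120):
        print(minutes, steps_per_day(minutes), vertical_frequency(minutes))
    # -> 60 12 84, then 90 9 63, then 120 6 42, matching H and k above.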
238 tex/4_stu/4_overall.tex (new file)

@@ -0,0 +1,238 @@
\subsection{Overall Results}
\label{overall_results}

Table \ref{t:results} summarizes the overall best-performing models grouped
by training horizon and a pixel's average daily demand (\gls{add}) for a
pixel size of $1~\text{km}^2$ and 60-minute time steps.
Each combination of pixel and test day counts as one case, and the total
number of cases is denoted as $n$.
Clustering the individual results revealed that a pixel's ADD over the
training horizon is the primary indicator of similarity and that three to
four clusters suffice to obtain cohesive clusters:
We labeled them "no", "low", "medium", and "high" demand pixels with
increasing ADD, and present the average MASE per cluster.
The $n$ values do not vary significantly across the training horizons,
which confirms that the platform did not grow area-wise and is indeed in a
steady state.
We use this table to answer \textbf{Q1} regarding the overall best methods
under different ADDs.
All result tables in the main text report MASEs calculated over all time
steps of a day.
In contrast, \ref{peak_results} shows the same tables with MASEs calculated
over time steps within peak times only (i.e., lunch from 12 pm to 2 pm and
dinner from 6 pm to 8 pm).
The differences lie mainly in the decimals of the individual MASE averages,
while the ranks of the forecasting methods do not change except in rare
cases.
This shows that the presented accuracies are driven by the forecasting
methods' accuracies at peak times.
Intuitively, all methods correctly predict zero demand for non-peak times.

Unsurprisingly, the best model for pixels without demand (i.e.,
$0 < \text{ADD} < 2.5$) is \textit{trivial}.
Whereas \textit{hsma} also adapts well, its performance is worse.
None of the more sophisticated models reaches a similar accuracy.
The intuition behind this is that \textit{trivial} is the least distorted
by the relatively large proportion of noise given the low-count nature of
the time series.

For low demand (i.e., $2.5 < \text{ADD} < 10$), there is also a clear
best-performing model, namely \textit{hsma}.
As the non-seasonal \textit{hses} reaches a similar accuracy as its
potentially seasonal generalization, \textit{hets}, we conclude that the
seasonal pattern over the weekdays is not yet strong enough to be
recognized in low demand pixels.
So, in the absence of seasonality, models that fit only a trend part are
the least susceptible to the noise.

For medium demand (i.e., $10 < \text{ADD} < 25$) and training horizons up
to six weeks, the best-performing models are the same as for low demand.
For longer horizons, \textit{hets} provides the highest accuracy.
Thus, to fit a seasonal pattern, longer training horizons are needed.
While \textit{vsvr} enters the top three, \textit{hets} has the edge as it
requires neither parameter tuning nor real-time data.
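As a point of reference for the tables, a minimal sketch of the MASE
computation (following Hyndman and Koehler's standard definition: the test
MAE is scaled by the in-sample MAE of the naive method; the period m is a
parameter of this sketch, not a value taken from the study):

    import numpy as np

    def mase(y_true, y_pred, y_train, m=1):
        """Mean Absolute Scaled Error of a forecast."""
        mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
        y_train = np.asarray(y_train, dtype=float)
        scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
        return mae / scale  # < 1: better than the in-sample naive method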

\begin{center}
\captionof{table}{Top-3 models by training weeks and average demand
    ($1~\text{km}^2$ pixel size, 60-minute time steps)}
\label{t:results}
\begin{tabular}{|c|c|*{12}{c|}}

\hline
\multirow{3}{*}{\rotatebox{90}{\thead{Training}}}
    & \multirow{3}{*}{\rotatebox{90}{\thead{Rank}}}
    & \multicolumn{3}{c|}{\thead{No Demand}}
    & \multicolumn{3}{c|}{\thead{Low Demand}}
    & \multicolumn{3}{c|}{\thead{Medium Demand}}
    & \multicolumn{3}{c|}{\thead{High Demand}} \\
~ & ~
    & \multicolumn{3}{c|}{(0 - 2.5)}
    & \multicolumn{3}{c|}{(2.5 - 10)}
    & \multicolumn{3}{c|}{(10 - 25)}
    & \multicolumn{3}{c|}{(25 - $\infty$)} \\
\cline{3-14}
~ & ~
    & Method & MASE & $n$
    & Method & MASE & $n$
    & Method & MASE & $n$
    & Method & MASE & $n$ \\

\hline \hline
\multirow{3}{*}{3} & 1
    & \textbf{\textit{trivial}} & 0.785 & \multirow{3}{*}{\rotatebox{90}{4586}}
    & \textbf{\textit{hsma}} & 0.819 & \multirow{3}{*}{\rotatebox{90}{2975}}
    & \textbf{\textit{hsma}} & 0.839 & \multirow{3}{*}{\rotatebox{90}{2743}}
    & \textbf{\textit{rtarima}} & 0.872 & \multirow{3}{*}{\rotatebox{90}{2018}} \\
~ & 2
    & \textit{hsma} & 0.809 & ~
    & \textit{hses} & 0.844 & ~
    & \textit{hses} & 0.858 & ~
    & \textit{rtses} & 0.873 & ~ \\
~ & 3
    & \textit{pnaive} & 0.958 & ~
    & \textit{hets} & 0.846 & ~
    & \textit{hets} & 0.859 & ~
    & \textit{rtets} & 0.877 & ~ \\

\hline
\multirow{3}{*}{4} & 1
    & \textbf{\textit{trivial}} & 0.770 & \multirow{3}{*}{\rotatebox{90}{4532}}
    & \textbf{\textit{hsma}} & 0.825 & \multirow{3}{*}{\rotatebox{90}{3033}}
    & \textbf{\textit{hsma}} & 0.837 & \multirow{3}{*}{\rotatebox{90}{2687}}
    & \textbf{\textit{vrfr}} & 0.855 & \multirow{3}{*}{\rotatebox{90}{2016}} \\
~ & 2
    & \textit{hsma} & 0.788 & ~
    & \textit{hses} & 0.848 & ~
    & \textit{hses} & 0.850 & ~
    & \textbf{\textit{rtarima}} & 0.855 & ~ \\
~ & 3
    & \textit{pnaive} & 0.917 & ~
    & \textit{hets} & 0.851 & ~
    & \textit{hets} & 0.854 & ~
    & \textit{rtses} & 0.860 & ~ \\

\hline
\multirow{3}{*}{5} & 1
    & \textbf{\textit{trivial}} & 0.780 & \multirow{3}{*}{\rotatebox{90}{4527}}
    & \textbf{\textit{hsma}} & 0.841 & \multirow{3}{*}{\rotatebox{90}{3055}}
    & \textbf{\textit{hsma}} & 0.837 & \multirow{3}{*}{\rotatebox{90}{2662}}
    & \textbf{\textit{vrfr}} & 0.850 & \multirow{3}{*}{\rotatebox{90}{2019}} \\
~ & 2
    & \textit{hsma} & 0.803 & ~
    & \textit{hses} & 0.859 & ~
    & \textit{hets} & 0.845 & ~
    & \textbf{\textit{rtarima}} & 0.852 & ~ \\
~ & 3
    & \textit{pnaive} & 0.889 & ~
    & \textit{hets} & 0.861 & ~
    & \textit{hses} & 0.845 & ~
    & \textit{vsvr} & 0.854 & ~ \\

\hline
\multirow{3}{*}{6} & 1
    & \textbf{\textit{trivial}} & 0.741 & \multirow{3}{*}{\rotatebox{90}{4470}}
    & \textbf{\textit{hsma}} & 0.847 & \multirow{3}{*}{\rotatebox{90}{3086}}
    & \textbf{\textit{hsma}} & 0.840 & \multirow{3}{*}{\rotatebox{90}{2625}}
    & \textbf{\textit{vrfr}} & 0.842 & \multirow{3}{*}{\rotatebox{90}{2025}} \\
~ & 2
    & \textit{hsma} & 0.766 & ~
    & \textit{hses} & 0.863 & ~
    & \textit{hets} & 0.842 & ~
    & \textbf{\textit{hets}} & 0.847 & ~ \\
~ & 3
    & \textit{pnaive} & 0.837 & ~
    & \textit{hets} & 0.865 & ~
    & \textit{hses} & 0.848 & ~
    & \textit{vsvr} & 0.848 & ~ \\

\hline
\multirow{3}{*}{7} & 1
    & \textbf{\textit{trivial}} & 0.730 & \multirow{3}{*}{\rotatebox{90}{4454}}
    & \textbf{\textit{hsma}} & 0.858 & \multirow{3}{*}{\rotatebox{90}{3132}}
    & \textbf{\textit{hets}} & 0.845 & \multirow{3}{*}{\rotatebox{90}{2597}}
    & \textbf{\textit{hets}} & 0.840 & \multirow{3}{*}{\rotatebox{90}{2007}} \\
~ & 2
    & \textit{hsma} & 0.754 & ~
    & \textit{hses} & 0.871 & ~
    & \textit{hsma} & 0.847 & ~
    & \textbf{\textit{vrfr}} & 0.845 & ~ \\
~ & 3
    & \textit{pnaive} & 0.813 & ~
    & \textit{hets} & 0.872 & ~
    & \textbf{\textit{vsvr}} & 0.850 & ~
    & \textit{vsvr} & 0.847 & ~ \\

\hline
\multirow{3}{*}{8} & 1
    & \textbf{\textit{trivial}} & 0.735 & \multirow{3}{*}{\rotatebox{90}{4402}}
    & \textbf{\textit{hsma}} & 0.867 & \multirow{3}{*}{\rotatebox{90}{3159}}
    & \textbf{\textit{hets}} & 0.846 & \multirow{3}{*}{\rotatebox{90}{2575}}
    & \textbf{\textit{hets}} & 0.836 & \multirow{3}{*}{\rotatebox{90}{2002}} \\
~ & 2
    & \textit{hsma} & 0.758 & ~
    & \textit{hets} & 0.877 & ~
    & \textbf{\textit{vsvr}} & 0.850 & ~
    & \textbf{\textit{vrfr}} & 0.842 & ~ \\
~ & 3
    & \textit{pnaive} & 0.811 & ~
    & \textit{hses} & 0.880 & ~
    & \textit{hsma} & 0.851 & ~
    & \textit{vsvr} & 0.849 & ~ \\

\hline
\end{tabular}
\end{center}

In summary, except for high demand, simple models trained on horizontal
time series work best.
By contrast, high demand (i.e., $25 < \text{ADD} < \infty$) combined with
fewer than six training weeks is the only situation in which classical
models trained on vertical time series work well.
Then, \textit{rtarima} outperforms its siblings from Sub-sections
\ref{vert} and \ref{rt}.
We conjecture that intra-day auto-correlations, as caused, for example, by
weather, are the reason for this.
Intuitively, a certain amount of demand (i.e., a high enough
signal-to-noise ratio) is required such that models that fit
auto-correlations can detect them through all the noise.
That idea is supported by \textit{vrfr} reaching a similar accuracy under
high demand, as its tree structure allows it to fit auto-correlations.
As both \textit{rtarima} and \textit{vrfr} incorporate recent demand,
real-time information can indeed improve accuracy.
However, once models are trained on longer horizons, \textit{hets} is more
accurate than \textit{vrfr}.
Thus, to answer \textbf{Q4}, we conclude that real-time information only
improves accuracy if merely three or four weeks of training material are
available.

In addition to looking at the results in tables covering the entire
one-year horizon, we also created sub-analyses for the distinct seasons
spring, summer (incl. the long holiday season in France), and fall.
Yet, none of the results portrayed in this and the subsequent sections
change in significant ways.
We conjecture that there could be differences if the overall demand of the
UDP increased to a scale beyond the one this case study covers and leave
that to a follow-up study with a bigger UDP.
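To make the distinction underlying this comparison concrete, a sketch of
how a pixel's order counts could be arranged into the two kinds of series
(following the layout defined earlier in the paper; counts[d][s] is the
number of orders on day d in time step s, with H steps per day):

    def horizontal_series(counts, step):
        """One series per time step: the same step across consecutive
        days, seasonal with the weekday period of 7."""
        return [day[step] for day in counts]

    def vertical_series(counts):
        """All steps within a day, concatenated day by day, seasonal
        with frequency k = 7 * H."""
        return [x for day in counts for x in day]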
31 tex/4_stu/5_training.tex (new file)
@@ -0,0 +1,31 @@
\subsection{Impact of the Training Horizon}
\label{training}

Whereas it is reasonable to assume that forecasts become more accurate as
the training horizon expands, our study reveals some interesting findings.
First, without demand, \textit{trivial} indeed performs better with more
training material, but improved pattern recognition cannot be the cause
here.
Instead, we argue that the reason is that the longer there has been no
steady demand, the higher the chance that this will not change soon.
Further, if we focus on shorter training horizons, the sample necessarily
contains cases where a pixel is initiated after a soon-to-be-popular
restaurant joined the platform:
Demand grows fast, making \textit{trivial} less accurate, and the pixel
soon moves to another cluster.

Second, with low demand, the best-performing \textit{hsma} becomes less
accurate with more training material.
While one could argue that this is due to \textit{hsma} not fitting a
trend, the less accurate \textit{hses} and \textit{hets} do fit a trend.
Instead, we argue that any low-demand time series naturally exhibits a
high noise-to-signal ratio, and \textit{hsma} is the least susceptible to
noise.
Then, to counter the missing trend term, the training horizon must be
shorter.

With medium demand, a similar argument can be made; however, the signal
already becomes more apparent, favoring \textit{hets} with more training
data.

Lastly, with high demand, the signal becomes so clear that more
sophisticated models can exploit longer training horizons.
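To illustrate why the training horizon is effectively the only tuning knob
of \textit{hsma}, a sketch of a simple moving average forecast on one
horizontal series (whether the window spans all recent observations or
only same-weekday ones is a modeling choice; this sketch uses the former):

    def hsma_forecast(horizontal, window):
        """Average of the last `window` observations of one series."""
        recent = horizontal[-window:]
        return sum(recent) / len(recent)

    # A shorter window adapts faster to changing demand and thereby
    # substitutes for the missing trend term discussed above.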
162 tex/4_stu/6_fams.tex (new file)
@@ -0,0 +1,162 @@
\subsection{Results by Model Families}
\label{fams}

Besides the overall results, we provide an in-depth comparison of the
models within each family.
Instead of reporting the MASE per model, we rank the models holding the
training horizon fixed to make comparisons easier.
Table \ref{t:hori} presents the models trained on horizontal time series.
In addition to \textit{naive}, we already include \textit{fnaive} and
\textit{pnaive} here as more competitive benchmarks.
The tables in this section report two rankings simultaneously:
The first number is the rank resulting from lumping the low and medium
demand clusters together, which yields almost the same rankings when
analyzed individually.
The ranks from only the high demand pixels are in parentheses if they
differ.

\begin{center}
\captionof{table}{Ranking of benchmark and horizontal models
    ($1~\text{km}^2$ pixel size, 60-minute time steps):
    the table shows the ranks for cases with $2.5 < ADD < 25$
    (and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:hori}
\begin{tabular}{|c|ccc|cccccccc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
    & \multicolumn{3}{c|}{\thead{Benchmarks}}
    & \multicolumn{8}{c|}{\thead{Horizontal (whole-day-ahead)}} \\
\cline{2-12}
~ & \textit{naive} & \textit{fnaive} & \textit{pnaive}
    & \textit{harima} & \textit{hcroston} & \textit{hets} & \textit{hholt}
    & \textit{hhwinters} & \textit{hses} & \textit{hsma} & \textit{htheta} \\
\hline \hline
3 & 11 & 7 (2) & 8 (5) & 5 (7) & 4 & 3
    & 9 (10) & 10 (9) & 2 (6) & 1 & 6 (8) \\
4 & 11 & 7 (2) & 8 (3) & 5 (6) & 4 (5) & 3 (1)
    & 9 (10) & 10 (9) & 2 (7) & 1 (4) & 6 (8) \\
5 & 11 & 7 (2) & 8 (4) & 5 (3) & 4 (9) & 3 (1)
    & 9 (10) & 10 (5) & 2 (8) & 1 (6) & 6 (7) \\
6 & 11 & 8 (5) & 9 (6) & 5 (4) & 4 (7) & 2 (1)
    & 10 & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
7 & 11 & 8 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
    & 9 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
8 & 11 & 9 (5) & 10 (6) & 5 (4) & 4 (7) & 2 (1)
    & 8 (10) & 7 (2) & 3 (8) & 1 (9) & 6 (3) \\
\hline
\end{tabular}
\end{center}
\

A first insight is that \textit{fnaive} is the best benchmark in all
scenarios:
Decomposing flexibly by tuning the $ns$ parameter is worth the
computational cost.
Further, if one is limited in the number of non-na\"{i}ve methods,
\textit{hets} is the best compromise and works well across all demand
levels.
It is also the best model for high demand, independent of the training
horizon.
With low or medium demand, \textit{hsma} is the clear overall winner; yet,
with high demand, models with a seasonal fit (i.e., \textit{harima},
\textit{hets}, and \textit{hhwinters}) are more accurate, in particular
for longer training horizons.
This is due to the demand patterns over the weekdays becoming stronger
with higher overall demand.

\begin{center}
\captionof{table}{Ranking of classical models on vertical time series
    ($1~\text{km}^2$ pixel size, 60-minute time steps):
    the table shows the ranks for cases with $2.5 < ADD < 25$
    (and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:vert}
\begin{tabular}{|c|cc|ccccc|ccccc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
    & \multicolumn{2}{c|}{\thead{Benchmarks}}
    & \multicolumn{5}{c|}{\thead{Vertical (whole-day-ahead)}}
    & \multicolumn{5}{c|}{\thead{Vertical (real-time)}} \\
\cline{2-13}
~ & \textit{hets} & \textit{hsma} & \textit{varima} & \textit{vets}
    & \textit{vholt} & \textit{vses} & \textit{vtheta} & \textit{rtarima}
    & \textit{rtets} & \textit{rtholt} & \textit{rtses} & \textit{rttheta} \\
\hline \hline
3 & 2 (10) & 1 (7) & 6 (4) & 8 (6) & 10 (9)
    & 7 (5) & 11 (12) & 4 (1) & 5 (3) & 9 (8) & 3 (2) & 12 (11) \\
4 & 2 (8) & 1 (10) & 6 (4) & 8 (6) & 10 (9)
    & 7 (5) & 12 (11) & 3 (1) & 5 (3) & 9 (7) & 4 (2) & 11 (12) \\
5 & 2 (3) & 1 (10) & 7 (5) & 8 (7) & 10 (9)
    & 6 & 11 & 4 (1) & 5 (4) & 9 (8) & 3 (2) & 12 \\
6 & 2 (1) & 1 (10) & 6 (5) & 8 (7) & 10 (9)
    & 7 (6) & 11 (12) & 3 (2) & 5 (4) & 9 (8) & 4 (3) & 12 (11) \\
7 & 2 (1) & 1 (10) & 8 (5) & 7 & 10 (9)
    & 6 & 11 (12) & 5 (2) & 4 & 9 (8) & 3 & 12 (11) \\
8 & 2 (1) & 1 (9) & 8 (5) & 7 (6) & 10 (8)
    & 6 & 12 (10) & 5 (2) & 4 & 9 (7) & 3 & 11 \\
\hline
\end{tabular}
\end{center}
\

Table \ref{t:vert} extends the previous analysis to classical models
trained on vertical time series.
Now, the winners from before, \textit{hets} and \textit{hsma}, serve as
benchmarks.
Whereas no improvements can be obtained for low and medium demand,
\textit{rtarima} and \textit{rtses} are the most accurate with high demand
and short training horizons.
For six or more training weeks, \textit{hets} is still optimal.
Independent of retraining and the demand level, the models' relative
performances are consistent:
The \textit{*arima} and \textit{*ses} models are best, followed by
\textit{*ets}, \textit{*holt}, and \textit{*theta}.
Thus, models that can deal with auto-correlations and short-term
forecasting errors, as expressed by moving averages, and that cannot be
distracted by trend terms are optimal for vertical time series.

Finally, Table \ref{t:ml} compares the two ML-based models against the
best-performing classical models and answers \textbf{Q2}:
With low and medium demand, again no improvements can be obtained;
however, with high demand, \textit{vrfr} has the edge over
\textit{rtarima} for training horizons up to six weeks.
We conjecture that \textit{vrfr} fits auto-correlations better than
\textit{varima} and is not distracted by short-term noise as
\textit{rtarima} may be due to the retraining.
With seven or eight training weeks, \textit{hets} remains the overall
winner.
Interestingly, \textit{vsvr} is more accurate than \textit{vrfr} for low
and medium demand.
We assume that \textit{vrfr} performs well only with strong
auto-correlations, which are not present with low and medium demand.
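As an illustration of how the vertical series feed the ML models, a sketch
of a lag-feature matrix (that the most recent time steps serve as features
is an assumption of this sketch, not a specification of the study's
feature sets):

    def lag_matrix(vertical, lags):
        """Pairs each time step with the `lags` steps preceding it."""
        X, y = [], []
        for t in range(lags, len(vertical)):
            X.append(vertical[t - lags:t])
            y.append(vertical[t])
        return X, y  # features and targets for, e.g., vrfr or vsvr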

\begin{center}
\captionof{table}{Ranking of ML models on vertical time series
    ($1~\text{km}^2$ pixel size, 60-minute time steps):
    the table shows the ranks for cases with $2.5 < ADD < 25$
    (and $25 < ADD < \infty$ in parentheses if they differ)}
\label{t:ml}
\begin{tabular}{|c|cccc|cc|}
\hline
\multirow{2}{*}{\rotatebox{90}{\thead{\scriptsize{Training}}}}
    & \multicolumn{4}{c|}{\thead{Benchmarks}}
    & \multicolumn{2}{c|}{\thead{ML}} \\
\cline{2-7}
~ & \textit{fnaive} & \textit{hets} & \textit{hsma}
    & \textit{rtarima} & \textit{vrfr} & \textit{vsvr} \\
\hline \hline
3 & 6 & 2 (5) & 1 (4) & 3 (1) & 5 (2) & 4 (3) \\
4 & 6 (5) & 2 (4) & 1 (6) & 3 (2) & 5 (1) & 4 (3) \\
5 & 6 (5) & 2 (4) & 1 (6) & 4 (2) & 5 (1) & 3 \\
6 & 6 (5) & 2 & 1 (6) & 4 & 5 (1) & 3 \\
7 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
8 & 6 (5) & 2 (1) & 1 (6) & 4 & 5 (2) & 3 \\
\hline
\end{tabular}
\end{center}
\

Analogously, we created tables like Tables \ref{t:hori} to \ref{t:ml} for
the forecasts with time steps of 90 and 120 minutes and find that the
relative rankings do not change significantly.
The same holds true for the rankings with changing pixel sizes.
For conciseness, we do not include these additional tables in this
article.
In summary, the relative performances exhibited by the model families are
rather stable in this case study.
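For completeness, the ranks reported in Tables \ref{t:hori} to \ref{t:ml}
can be derived from per-model MASE averages as in this small sketch:

    def rank_models(mase_by_model):
        """Maps each model to its rank; rank 1 has the lowest MASE."""
        ordered = sorted(mase_by_model, key=mase_by_model.get)
        return {model: rank for rank, model in enumerate(ordered, start=1)}

    # Example: rank_models({"hsma": 0.84, "hets": 0.85, "naive": 1.02})
    # -> {"hsma": 1, "hets": 2, "naive": 3}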
27 tex/4_stu/7_pixels_intervals.tex (new file)
@@ -0,0 +1,27 @@
\subsection{Effects of the Pixel Size and Time Step Length}
\label{pixels_intervals}

As elaborated in Sub-section \ref{grid}, more order aggregation leads to a
higher overall demand level and improved pattern recognition in the
generated time series.
Consequently, individual cases tend to move to the right in tables
equivalent to Table \ref{t:results}.
With the same $ADD$ clusters, forecasts for pixel sizes of $2~\text{km}^2$
and $4~\text{km}^2$, or time intervals of 90 and 120 minutes, or
combinations thereof, yield results similar to those of the best models as
revealed in Tables \ref{t:results}, \ref{t:hori}, \ref{t:vert}, and
\ref{t:ml} for high demand.
By contrast, forecasts for $0.5~\text{km}^2$ pixels have most of their
cases (i.e., $n$) in the no or low demand clusters.
In that case, the pixels are too small, and pattern recognition becomes
harder.
While it is true that \textit{trivial} exhibits the overall lowest MASE
for no demand cases, these forecasts become effectively worthless for
operations.
In the extreme, with even smaller pixels, we would be forecasting $0$
orders in all pixels for all time steps.
In summary, the best model and its accuracy are determined primarily by
the $ADD$, and the pixel size and interval length are merely parameters to
control it.
The forecaster's goal is to create a grid with pixels that are as small as
possible without losing a recognizable pattern.
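A sketch of the gridding step whose granularity drives this trade-off
(assuming metric coordinates, e.g., from a UTM projection; the naming is
ours):

    import math

    def pixel_id(x: float, y: float, side: float) -> tuple:
        """Row/column of the square pixel containing the point (x, y);
        a 1 km^2 pixel corresponds to side = 1000 meters."""
        return (math.floor(x / side), math.floor(y / side))

    # A larger side means fewer, bigger pixels and more aggregation,
    # i.e., cases move toward higher ADD clusters as described above.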
tex/apx/peak_results.tex

@@ -1,14 +1,11 @@
 \section{Forecasting Accuracies during Peak Times}
 \label{peak_results}

-This appendix shows all result tables from the main text with the MASE
-averages calculated from time steps within peak times.
-Peaks are the times of the day where the typical customer has a lunch or
-dinner meal and defined to be either from 12 pm to 2 pm or from 6 pm to
-8 pm.
-While the exact decimals of the MASEs differ from the ones in the main
-text, the relative ranks of the forecasting methods are the same except in
-rare cases.
+This appendix shows all tables from the main text
+with the MASE averages calculated from time steps within peak times,
+which are defined to be from 12 pm to 2 pm (lunch) or from 6 pm to 8 pm (dinner).
+While the exact decimals of the MASEs differ,
+the relative ranks of the forecasting methods are the same except in rare cases.

 \begin{center}
 \captionof{table}{Top-3 models by training weeks and average demand
tex/glossary.tex

@@ -1,4 +1,7 @@
 % Abbreviations for technical terms.
+\newglossaryentry{add}{
+name=ADD, description={Average Daily Demand}
+}
 \newglossaryentry{cart}{
 name=CART, description={Classification and Regression Trees}
 }
@@ -10,6 +10,9 @@
 % Enable diagonal lines in tables.
 \usepackage{static/slashbox}

+% Enable multiple lines in a table row
+\usepackage{multirow}
+
 % Make opening quotes look different than closing quotes.
 \usepackage[english=american]{csquotes}
 \MakeOuterQuote{"}
@@ -17,4 +20,5 @@
 % Define helper commands.
 \usepackage{bm}
 \newcommand{\mat}[1]{\bm{#1}}
 \newcommand{\norm}[1]{\left\lVert#1\right\rVert}
+\newcommand{\thead}[1]{\textbf{#1}}
|
Loading…
Reference in a new issue