Add Model section

Alexander Hess, 2020-10-04 23:39:20 +02:00, commit 91bd4ba083

\subsection{Forecasting Models}
\label{models}
This sub-section describes the concrete models in our study.
Figure \ref{f:inputs} shows how we classify them into four families with
regard to the type of the time series, horizontal or vertical, and the
moment at which a model is trained:
Solid lines indicate that the corresponding time steps lie before the
training, and dotted lines show the time horizon predicted by a model.
For conciseness, we only show the forecasts for one test day.
The setup is the same for each inner validation day.
\begin{center}
\captionof{figure}{Classification of the models by input type and training
moment}
\label{f:inputs}
\includegraphics[width=.95\linewidth]{static/model_inputs_gray.png}
\end{center}

\subsubsection{Horizontal and Whole-day-ahead Forecasts.}
\label{hori}
The upper-left quadrant of Figure \ref{f:inputs} illustrates the simplest way
to generate forecasts for a test day before it has started:
For each time of the day, the corresponding horizontal slice becomes the input
for a model.
With whole days as the unified time interval, each model is trained $H$
times, each time providing a one-step-ahead forecast.
While it is possible to select a different model type per time step, doing so
did not improve the accuracy in the empirical study.
As the models in this family do not include the test day's demand data in
their training sets, we regard them as benchmarks for answering \textbf{Q4},
which asks whether a UDP can take advantage of real-time information.
The models in this family are as follows; we use prefixes, such as ``h''
here, when methods are applied in other families as well:
\begin{enumerate}
\item \textit{\gls{naive}}:
Observation from the same time step one week prior
\item \textit{\gls{trivial}}:
Predict $0$ for all time steps
\item \textit{\gls{hcroston}}:
Intermittent demand method introduced by \cite{croston1972}
\item \textit{\gls{hholt}},
\textit{\gls{hhwinters}},
\textit{\gls{hses}},
\textit{\gls{hsma}}, and
\textit{\gls{htheta}}:
Exponential smoothing without calibration
\item \textit{\gls{hets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{harima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
\textit{naive} and \textit{trivial} provide absolute benchmarks for the
actual forecasting methods.
\textit{hcroston} is often mentioned in the context of intermittent demand;
however, the method did not perform well at all in our study.
Besides \textit{hhwinters}, which always fits a seasonal component, the
calibration heuristics behind \textit{hets} and \textit{harima} may fit one
as well.
With $k=7$, an STL decomposition is unnecessary here.
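To make the horizontal scheme concrete, the following Python sketch trains one
toy model per time of the day on its horizontal slice; the
simple-exponential-smoothing stand-in and all names are illustrative only, not
the implementation used in the study.

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def horizontal_forecasts(demand_by_day, H):
    """demand_by_day: past days, each a list of H per-time-step order counts.
    Trains one model per time of the day and returns H one-step-ahead
    forecasts for the test day."""
    forecasts = []
    for t in range(H):
        # the horizontal slice: the same time of the day across all past days
        horizontal_slice = [day[t] for day in demand_by_day]
        forecasts.append(ses_forecast(horizontal_slice))
    return forecasts
```

For a constant history, e.g. three identical days `[2, 0, 1]`, each slice is
constant and the forecasts reproduce the daily pattern.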

\subsubsection{Vertical and Whole-day-ahead Forecasts without Retraining.}
\label{vert}
The upper-right quadrant of Figure \ref{f:inputs} shows an alternative way to
generate forecasts for a test day before it has started:
First, a seasonally adjusted time series $a_t$ is obtained from a vertical
time series by STL decomposition.
Then, the actual forecasting model, trained on $a_t$, makes an $H$-step-ahead
prediction.
Lastly, we add the $H$ seasonal na\"{i}ve forecasts of the seasonal component
$s_t$ to obtain the actual predictions for the test day.
Thus, only one training is required per model type, and no real-time data are
used.
By decomposing the raw time series, we assume that all long-term patterns are
captured by the seasonal component $s_t$, so that $a_t$ only contains the
level with a potential trend and auto-correlations.
The models in this family are:
\begin{enumerate}
\item \textit{\gls{fnaive}},
\textit{\gls{pnaive}}:
Sum of STL's trend and seasonal components' na\"{i}ve forecasts
\item \textit{\gls{vholt}},
\textit{\gls{vses}}, and
\textit{\gls{vtheta}}:
Exponential smoothing without calibration and without a seasonal
fit
\item \textit{\gls{vets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{varima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
As mentioned in Sub-section \ref{unified_cv}, we include the sum of the
(seasonal) na\"{i}ve forecasts of the STL's trend and seasonal components
as forecasts in their own right:
For \textit{fnaive}, we tune the ``flexible'' $ns$ parameter, and for
\textit{pnaive}, we set it to a ``periodic'' value.
Thus, we implicitly assume that there is no signal in the remainder $r_t$ and
predict $0$ for it.
\textit{fnaive} and \textit{pnaive} serve as two more simple benchmarks.
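The three-step pipeline above can be sketched as follows; for brevity, the STL
decomposition is replaced by a purely ``periodic'' seasonal component (the
per-time-of-day mean), and simple exponential smoothing stands in for the
actual forecasting model:

```python
def vertical_whole_day_forecasts(series, H, alpha=0.5):
    """series: vertical time series with H time steps per day.
    Returns H predictions for the next (test) day."""
    n_days = len(series) // H
    # stand-in for STL: a "periodic" seasonal component s_t
    # (the per-time-of-day mean across all observed days)
    seasonal = [sum(series[d * H + t] for d in range(n_days)) / n_days
                for t in range(H)]
    # seasonally adjusted series a_t
    adjusted = [series[i] - seasonal[i % H] for i in range(len(series))]
    # simple exponential smoothing yields a flat H-step-ahead forecast
    level = adjusted[0]
    for y in adjusted[1:]:
        level = alpha * y + (1 - alpha) * level
    # add the H seasonal naive forecasts back
    return [level + seasonal[t] for t in range(H)]
```

With a purely seasonal series such as `[2, 0, 1]` repeated over three days,
the adjusted series is flat and the forecasts reproduce the daily pattern.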

\subsubsection{Vertical and Real-time Forecasts with Retraining.}
\label{rt}
The lower-left in Figure \ref{f:inputs} shows how models trained on vertical
time series are extended with real-time order data as it becomes available
during a test day:
Instead of obtaining an $H$-step-ahead forecast, we retrain a model after
every time step and only predict one step.
The remainder is as in the previous sub-section, and the models are:
\begin{enumerate}
\item \textit{\gls{rtholt}},
\textit{\gls{rtses}}, and
\textit{\gls{rttheta}}:
Exponential smoothing without calibration and without a seasonal fit
\item \textit{\gls{rtets}}:
ETS calibrated as described by \cite{hyndman2008b}
\item \textit{\gls{rtarima}}:
ARIMA calibrated as described by \cite{hyndman2008a}
\end{enumerate}
Retraining \textit{fnaive} and \textit{pnaive} did not increase the accuracy,
so we left them out of this family.
A downside of this family is the significant increase in computing costs.
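A minimal sketch of the retraining loop, with simple exponential smoothing as
a stand-in model and the seasonal adjustment omitted for brevity:

```python
def realtime_retrained_forecasts(history, test_day, H, alpha=0.5):
    """history: observations up to the test day; test_day: the H observations
    that arrive one by one.  The model is refit from scratch after every time
    step and predicts only the next step."""
    forecasts = []
    observed = list(history)
    for t in range(H):
        level = observed[0]                 # refit on all data seen so far
        for y in observed[1:]:
            level = alpha * y + (1 - alpha) * level
        forecasts.append(level)             # one-step-ahead forecast
        observed.append(test_day[t])        # real-time observation arrives
    return forecasts
```

The inner refit runs once per time step over the ever-growing history, which
illustrates why the computing costs of this family increase significantly.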

\subsubsection{Vertical and Real-time Forecasts without Retraining.}
\label{ml_models}
The lower-right in Figure \ref{f:inputs} shows how ML models take
real-time order data into account without retraining.
Based on the seasonally-adjusted time series $a_t$, we employ the feature
matrix and label vector representations from Sub-section \ref{learning}
and set $n$ to the number of daily time steps, $H$, to cover all potential
auto-correlations.
The ML models are trained once before a test day starts.
For training, the matrix and vector are populated such that $y_T$ is set to
the last time step of the day before the forecasts, $a_T$.
As the splitting during CV is done with whole days, the \gls{ml} models are
trained with training sets consisting of samples from all times of a day
in an equal manner.
Thus, the ML models learn to predict each time of the day.
For prediction on a test day, the $H$ observations preceding the time
step to be forecast are used as the input vector after seasonal
adjustment.
As a result, real-time data are included.
The models in this family are:
\begin{enumerate}
\item \textit{\gls{vrfr}}: RF trained on the matrix as described
\item \textit{\gls{vsvr}}: SVR trained on the matrix as described
\end{enumerate}
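The feature-matrix construction described above can be sketched as a generic
lag-matrix builder (our own naming, not the study's code):

```python
def make_lag_matrix(a, H):
    """Builds the feature matrix X and label vector y from a seasonally
    adjusted series a: each row of X holds the H observations preceding
    its label, so the model learns one-step-ahead prediction for every
    time of the day."""
    X = [a[i:i + H] for i in range(len(a) - H)]
    y = a[H:]
    return X, y

# During a test day, the input vector for the next time step is simply
# the H most recent (seasonally adjusted) observations -- real-time data
# enters without any retraining.
a = [1, 2, 3, 4, 5, 6]
X, y = make_lag_matrix(a, H=2)
next_input = a[-2:]
```

A trained RF or SVR would then be applied to `next_input` for each time step
as it arrives.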
We tried other ML models, such as gradient boosting machines, but found
only RFs and SVRs to perform well in our study.
In the case of gradient boosting machines, this is to be expected, as they
are known not to perform well in the presence of high noise, which is
natural with low count data, as shown, for example, by \cite{ma2018} and
\cite{mason2000}.
Also, deep learning methods are not applicable, as the feature matrices only
consist of several hundred to a few thousand rows (cf., Sub-section
\ref{params}).
In \ref{tabular_ml_models}, we provide an alternative feature matrix
representation that exploits the two-dimensional structure of time tables
without decomposing the time series.
In \ref{enhanced_feats}, we show how feature matrices are extended
to include predictors other than historical order data.
However, to answer \textbf{Q5} already here: none of the external data
sources improves the results in our study.
Due to the high number of time series in our study, investigating why no
external source improves the forecasts requires an automated approach to
analyzing individual time series.
\cite{barbour2014} provide a spectral density estimation approach, based on
the Shannon entropy, that measures the signal-to-noise ratio in a time
series with a number normalized between 0 and 1, where lower values
indicate a higher signal-to-noise ratio.
We then look at daily averages of the estimates per pixel and find that
including any of the external data sources from \ref{enhanced_feats} always
leads to significantly lower signal-to-noise ratios.
Thus, we conclude that, at least for the demand faced by our industry
partner, the historical order data already contains all of the signal.
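For illustration, one common formulation of such a normalized spectral
entropy can be computed from the raw periodogram as below; this is our own
sketch, whereas \cite{barbour2014} rely on a proper spectral density
estimator rather than the plain FFT periodogram used here:

```python
import numpy as np

def spectral_entropy(x):
    """Normalized Shannon entropy of the periodogram, between 0 and 1:
    values near 0 indicate a concentrated spectrum (high signal-to-noise
    ratio), values near 1 a flat, noise-like spectrum."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    psd = psd[psd > 0]
    if psd.size < 2:
        return 0.0
    p = psd / psd.sum()
    return float(-(p * np.log(p)).sum() / np.log(p.size))

t = np.arange(256)
print(spectral_entropy(np.sin(2 * np.pi * t / 16)))                 # low
print(spectral_entropy(np.random.default_rng(0).normal(size=256)))  # high
```

A pure sine concentrates its spectrum in one frequency bin and scores near 0,
while white noise spreads its energy across all bins and scores near 1.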