Add Literature section
parent 8d10ba9a05
commit 3849e5fd3f
15 changed files with 878 additions and 3 deletions
@@ -9,6 +9,15 @@
\input{tex/1_intro}
\input{tex/2_lit/1_intro}
\input{tex/2_lit/2_class/1_intro}
\input{tex/2_lit/2_class/2_ets}
\input{tex/2_lit/2_class/3_arima}
\input{tex/2_lit/2_class/4_stl}
\input{tex/2_lit/3_ml/1_intro}
\input{tex/2_lit/3_ml/2_learning}
\input{tex/2_lit/3_ml/3_cv}
\input{tex/2_lit/3_ml/4_rf}
\input{tex/2_lit/3_ml/5_svm}
\input{tex/3_mod/1_intro}
\input{tex/4_stu/1_intro}
\input{tex/5_con/1_intro}
@@ -1,2 +1,17 @@
\section{Literature Review}
\label{lit}

In this section, we review the specific forecasting methods that make up our
forecasting system.
We group them into classical statistical and ML models.
The two groups differ mainly in how they represent the input data and how
accuracy is evaluated.

A time series is a finite and ordered sequence of equally spaced observations.
Thus, time is regarded as discrete and a time step as a short period.
Formally, a time series $Y$ is defined as $Y = \{y_t: t \in I\}$, or $y_t$ for
short, where $I$ is an index set of positive integers.
Besides its length $T = |Y|$, another property is the a priori fixed and
non-negative periodicity $k$ of a seasonal pattern in demand:
$k$ is the number of time steps after which a pattern repeats itself (e.g.,
$k=12$ for monthly sales data).
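
As a purely illustrative aside (a Python/NumPy sketch with made-up numbers,
not part of the cited literature), such a series with $T = 36$ monthly
observations and periodicity $k = 12$ can be encoded as follows:
\begin{verbatim}
import numpy as np

# Three years of made-up monthly sales: T = 36 observations, periodicity k = 12.
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
              145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166])
T, k = len(y), 12
print(T, y[:k])  # series length and the first seasonal cycle
\end{verbatim}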
tex/2_lit/2_class/1_intro.tex (new file, 13 lines)
@@ -0,0 +1,13 @@
\subsection{Demand Forecasting with Classical Forecasting Methods}
\label{class_methods}

Forecasting became a formal discipline starting in the 1950s and has its
origins in the broader field of statistics.
\cite{hyndman2018} provide a thorough overview of the concepts and methods
established, and \cite{ord2017} point to business-related applications
such as demand forecasting.
These "classical" forecasting methods share the characteristic that they are
trained over the entire $Y$ first.
Then, for prediction, the forecaster specifies the number of time steps for
which forecasts are to be generated.
That is different for ML models.
tex/2_lit/2_class/2_ets.tex (new file, 78 lines)
@@ -0,0 +1,78 @@
\subsubsection{Na\"{i}ve Methods, Moving Averages, and Exponential Smoothing.}
\label{ets}

Simple forecasting methods are often employed as a benchmark for more
sophisticated ones.
The so-called na\"{i}ve and seasonal na\"{i}ve methods forecast the next time
step in a time series, $y_{T+1}$, with the last observation, $y_T$,
and, if a seasonal pattern is present, with the observation $k$ steps
before, $y_{T+1-k}$.
As variants, both methods can be generalized to include drift terms in the
presence of a trend or changing seasonal amplitude.

If a time series exhibits no trend, a simple moving average (SMA) is a
generalization of the na\"{i}ve method that is more robust to outliers.
It is defined as follows: $\hat{y}_{T+1} = \frac{1}{h} \sum_{i=T-h+1}^{T} y_i$
where $h$ is the horizon over which the average is calculated.
If a time series exhibits a seasonal pattern, setting $h$ to a multiple of the
periodicity $k$ ensures that the forecast is unbiased.
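
To make these benchmarks concrete, the following sketch (illustrative Python
with NumPy; the array and the values of $k$ and $h$ are arbitrary examples)
computes the na\"{i}ve, seasonal na\"{i}ve, and SMA forecasts for $y_{T+1}$:
\begin{verbatim}
import numpy as np

def naive(y):
    # Forecast y_{T+1} with the last observation y_T.
    return y[-1]

def seasonal_naive(y, k):
    # Forecast y_{T+1} with the observation k steps before, y_{T+1-k}.
    return y[-k]

def sma(y, h):
    # Simple moving average over the last h observations.
    return np.mean(y[-h:])

y = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 14.0])
print(naive(y), seasonal_naive(y, k=4), sma(y, h=4))
\end{verbatim}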

Starting in the 1950s, another popular family of forecasting methods,
so-called exponential smoothing methods, was introduced by
\cite{brown1959}, \cite{holt1957}, and \cite{winters1960}.
The idea is that forecasts $\hat{y}_{T+1}$ are a weighted average of past
observations where the weights decay over time; in the case of the simple
exponential smoothing (SES) method we obtain:
$
\hat{y}_{T+1} = \alpha y_T + \alpha (1 - \alpha) y_{T-1}
    + \alpha (1 - \alpha)^2 y_{T-2}
    + \dots + \alpha (1 - \alpha)^{T-1} y_{1}
$
where $\alpha$ (with $0 \le \alpha \le 1$) is a smoothing parameter.
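
For illustration, the weighted average can be computed with the equivalent
one-step recursion $\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t$;
the sketch below (Python, with the initial forecast simply set to $y_1$)
is not tied to any particular library:
\begin{verbatim}
import numpy as np

def ses_forecast(y, alpha):
    # SES recursion: y_hat[t+1] = alpha * y[t] + (1 - alpha) * y_hat[t].
    y_hat = y[0]          # simple initialization with the first observation
    for obs in y:
        y_hat = alpha * obs + (1 - alpha) * y_hat
    return y_hat          # forecast for y_{T+1}

y = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 14.0])
print(ses_forecast(y, alpha=0.3))
\end{verbatim}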

Exponential smoothing methods are often expressed in an alternative component
form that consists of a forecast equation and one or more smoothing
equations for unobservable components.
Below, we present a generalization of SES, the so-called Holt-Winters'
seasonal method, in an additive formulation.
$\ell_t$, $b_t$, and $s_t$ represent the unobservable level, trend, and
seasonal components inherent in $y_t$, and $\beta$ and $\gamma$ complement
$\alpha$ as smoothing parameters:
\begin{align*}
\hat{y}_{t+1} & = \ell_t + b_t + s_{t+1-k} \\
\ell_t & = \alpha(y_t - s_{t-k}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t & = \beta (\ell_{t} - \ell_{t-1}) + (1 - \beta) b_{t-1} \\
s_t & = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-k}
\end{align*}
With $b_t$, $s_t$, $\beta$, and $\gamma$ removed, this formulation reduces to
SES.
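
The following Python sketch implements exactly these recursions with a
deliberately naive initialization (level set to the first observation, trend
to zero, seasonal indices to the deviations of the first cycle from its
mean); library implementations instead estimate the initial states and the
smoothing parameters by minimizing the squared one-step errors:
\begin{verbatim}
import numpy as np

def holt_winters_additive(y, k, alpha, beta, gamma):
    # Additive Holt-Winters: level l, trend b, seasonal s, per the recursions above.
    l, b = y[0], 0.0
    s = list(y[:k] - np.mean(y[:k]))   # naive seasonal initialization
    for t in range(len(y)):
        s_tk = s[t]                    # s_{t-k}, the index from one cycle ago
        l_new = alpha * (y[t] - s_tk) + (1 - alpha) * (l + b)
        b_new = beta * (l_new - l) + (1 - beta) * b
        s.append(gamma * (y[t] - l - b) + (1 - gamma) * s_tk)
        l, b = l_new, b_new
    return l + b + s[len(y)]           # y_hat_{T+1} = l_T + b_T + s_{T+1-k}

y = np.array([10., 14., 8., 12., 11., 15., 9., 13., 12., 16., 10., 14.])
print(holt_winters_additive(y, k=4, alpha=0.3, beta=0.1, gamma=0.2))
\end{verbatim}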

Distinct variations exist: Besides the three components, \cite{gardner1985}
add damping for the trend, \cite{pegels1969} provides multiplicative
formulations, and \cite{taylor2003} adds damping to the latter.
The accuracy measure commonly employed is the sum of squared errors between
the observations and their forecasts.

The Theta method, originally introduced by \cite{assimakopoulos2000}, can be
regarded as equivalent to SES with a drift term, as \cite{hyndman2003}
show.
We mention this method here only because \cite{bell2018} emphasize that it
performs well at Uber.
However, in our empirical study, we find that this is not true in general.

\cite{hyndman2002} introduce statistical processes, so-called innovations
state-space models, to generalize the methods in this sub-section.
They call this family of models ETS as they capture error, trend, and seasonal
terms.
Linear and additive ETS models have a structure like so:
\begin{align*}
y_t & = \vec{w} \cdot \vec{x}_{t-1} + \epsilon_t \\
\vec{x}_t & = \mat{F} \vec{x}_{t-1} + \vec{g} \epsilon_t
\end{align*}
$y_t$ denotes the observations as before, while $\vec{x}_t$ is a state vector
of unobserved components.
$\epsilon_t$ is a white noise series, and the matrix $\mat{F}$ and the vectors
$\vec{g}$ and $\vec{w}$ contain a model's coefficients.
Like the models in the next sub-section, ETS models are commonly fitted
with maximum likelihood and evaluated using information-theoretic
criteria against historical data.
We refer to \cite{hyndman2008b} for a thorough summary.
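
For intuition, the sketch below (illustrative Python) simulates the simplest
linear additive ETS model, a local level with $\vec{x}_t = (\ell_t)$,
$\mat{F} = (1)$, $\vec{g} = (\alpha)$, and $\vec{w} = (1)$, which corresponds
to SES; the chosen values are arbitrary:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
alpha, T = 0.3, 50
F, g, w = np.array([[1.0]]), np.array([alpha]), np.array([1.0])

x = np.array([10.0])             # initial state (the level)
y = np.empty(T)
for t in range(T):
    eps = rng.normal(scale=0.5)  # white noise epsilon_t
    y[t] = w @ x + eps           # y_t = w . x_{t-1} + eps_t
    x = F @ x + g * eps          # x_t = F x_{t-1} + g * eps_t
print(y[:5])
\end{verbatim}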
tex/2_lit/2_class/3_arima.tex (new file, 69 lines)
@@ -0,0 +1,69 @@
\subsubsection{Autoregressive Integrated Moving Averages.}
\label{arima}

\cite{box1962}, \cite{box1968}, and more papers by the same authors in the
1960s introduce a type of model where observations correlate with their
neighbors and refer to them as autoregressive integrated moving average
(ARIMA) models for stationary time series.
For a thorough overview, we refer to \cite{box2015} and \cite{brockwell2016}.

A time series $y_t$ is stationary if its moments are independent of the
point in time where it is observed.
A typical example is a white noise $\epsilon_t$ series.
Therefore, a trend or seasonality implies non-stationarity.
\cite{kwiatkowski1992} provide a test to check the null hypothesis of
stationary data.
To obtain a stationary time series, one chooses from several techniques:
First, to stabilize a changing variance (i.e., heteroscedasticity), one
applies a Box-Cox transformation (e.g., $\log$) as first suggested by
\cite{box1964}.
Second, to factor out a trend (or seasonal) pattern, one computes differences
of consecutive (or of lag $k$) observations or even differences thereof.
Third, it is also common to pre-process $y_t$ with one of the decomposition
methods mentioned in Sub-section \ref{stl} below, with an ARIMA model
then trained on the adjusted $y_t$.
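
The first two techniques can be sketched as follows (illustrative Python; the
KPSS test is taken from statsmodels, which is assumed to be installed and is
only one possible implementation):
\begin{verbatim}
import numpy as np
from statsmodels.tsa.stattools import kpss

y = np.array([112., 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140])

stat, p_value, _, _ = kpss(y, regression="c")
print(p_value)                        # a small p-value rejects stationarity

y_log = np.log(y)                     # Box-Cox transformation with lambda = 0
y_diff = np.diff(y_log, n=1)          # first differences remove a trend
y_sdiff = y_log[12:] - y_log[:-12]    # lag-12 differences remove seasonality
\end{verbatim}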

In the autoregressive part, observations are modeled as linear combinations of
their predecessors.
Formally, an $AR(p)$ model is defined with a drift term $c$, coefficients
$\phi_i$ to be estimated (where $i$ is an index with $0 < i \leq p$), and
white noise $\epsilon_t$ like so:
$
AR(p): \ \
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p}
    + \epsilon_t
$.
The moving average part models observations as linear combinations of past
forecasting errors.
Formally, an $MA(q)$ model is defined with a drift term $c$, coefficients
$\theta_j$ to be estimated, and white noise terms $\epsilon_t$ (where $j$
is an index with $0 < j \leq q$) as follows:
$
MA(q): \ \
y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}
    + \dots + \theta_q \epsilon_{t-q}
$.
Finally, an $ARIMA(p,d,q)$ model unifies both parts and adds differencing,
where $d$ is the degree of differencing and the $'$ indicates differenced
values:
$
ARIMA(p,d,q): \ \
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1}
    + \dots + \theta_q \epsilon_{t-q} + \epsilon_{t}
$.
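
To keep the illustration self-contained, the following Python sketch fits only
the autoregressive part by ordinary least squares and produces a
one-step-ahead forecast; it deliberately omits the MA terms and the
differencing that full ARIMA implementations handle internally:
\begin{verbatim}
import numpy as np

def fit_ar(y, p):
    # Regress y_t on a constant and its p predecessors (AR(p) via least squares).
    rows = [np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coef                     # [c, phi_1, ..., phi_p]

def forecast_ar(y, coef):
    p = len(coef) - 1
    # y_hat_{T+1} = c + phi_1 * y_T + ... + phi_p * y_{T-p+1}
    return coef[0] + coef[1:] @ y[-1:-p - 1:-1]

y = np.array([10., 12, 9, 11, 13, 10, 12, 14, 11, 13, 15, 12])
print(forecast_ar(y, fit_ar(y, p=2)))
\end{verbatim}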

$ARIMA(p,d,q)$ models are commonly fitted with maximum likelihood estimation.
To find an optimal combination of the parameters $p$, $d$, and $q$, the
literature suggests calculating an information-theoretic criterion
(e.g., Akaike's Information Criterion) that evaluates the fit on
historical data.
\cite{hyndman2008a} provide a step-wise heuristic to choose $p$, $d$, and $q$
that also decides if a Box-Cox transformation is to be applied, and if so,
which one.
To obtain a one-step-ahead forecast, the above equation is reordered such
that $t$ is substituted with $T+1$.
For forecasts further into the future, the actual observations are
subsequently replaced by their forecasts.
Seasonal ARIMA variants exist; however, the high frequency $k$ in the kind of
demand a UDP faces typically renders them impractical as too many
coefficients must be estimated.
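
As a sketch of this selection procedure (assuming statsmodels is installed;
its ARIMA class is one common implementation, and the small grid below is an
arbitrary example rather than the heuristic of \cite{hyndman2008a}), one can
fit several $(p,d,q)$ combinations, keep the lowest AIC, and forecast one
step ahead:
\begin{verbatim}
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.array([112., 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140])

best = (np.inf, None, None)
for order in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(y, order=order).fit()
    except Exception:
        continue                       # skip combinations that fail to fit
    if fit.aic < best[0]:
        best = (fit.aic, order, fit)

aic, order, fit = best
print(order, aic)
print(fit.forecast(steps=1))           # one-step-ahead forecast for y_{T+1}
\end{verbatim}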
tex/2_lit/2_class/4_stl.tex (new file, 62 lines)
@@ -0,0 +1,62 @@
\subsubsection{Seasonal and Trend Decomposition using Loess.}
\label{stl}

A time series $y_t$ may exhibit different types of patterns; to fully capture
each of them, the series must be decomposed.
Then, each component is forecast with a distinct model.
Most commonly, the components are the trend $t_t$, seasonality $s_t$, and
remainder $r_t$.
They are themselves time series, where only $s_t$ exhibits a periodicity $k$.
A decomposition may be additive (i.e., $y_t = s_t + t_t + r_t$) or
multiplicative (i.e., $y_t = s_t \cdot t_t \cdot r_t$); the former assumes
that the effect of the seasonal component is independent of the overall
level of $y_t$ and vice versa.
The seasonal component is centered around $0$ in the additive case (and
around $1$ in the multiplicative case) such that its removal does not
affect the level of $y_t$.
Often, it is sufficient to only seasonally adjust the time series, and model
the trend and remainder together, for example, as $a_t = y_t - s_t$ in the
additive case.
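
The following toy sketch (Python) only makes the identity $y_t = s_t + t_t +
r_t$ and the adjustment $a_t = y_t - s_t$ concrete by constructing an
additive series from artificial components:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
T, k = 48, 12
trend = 0.5 * np.arange(T)                                         # t_t
seasonal = np.tile(np.sin(2 * np.pi * np.arange(k) / k), T // k)   # s_t, centered at 0
remainder = rng.normal(scale=0.2, size=T)                          # r_t
y = trend + seasonal + remainder                                   # additive decomposition

adjusted = y - seasonal                                            # a_t = y_t - s_t
print(adjusted[:5])
\end{verbatim}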

Early approaches employed moving averages (cf., Sub-section \ref{ets}) to
calculate a trend component, and, after removing that from $y_t$, averaged
all observations of the same seasonal lag to obtain the seasonal
component.
The downsides of this are the subjectivity in choosing the window lengths for
the moving average and the seasonal averaging, the inability of the
seasonal component to vary its amplitude over time, and the lack of
outlier handling.

The X11 method developed at the U.S. Census Bureau and described in detail by
\cite{dagum2016} overcomes these disadvantages.
However, due to its background in economics, it is designed primarily for
quarterly or monthly data, and the change in amplitude over time cannot be
controlled.
Variants of this method are the SEATS decomposition by the Bank of Spain and
the newer X-13ARIMA-SEATS method by the U.S. Census Bureau.
Their main advantages stem from the fact that the models calibrate themselves
according to statistical criteria without manual work for a statistician
and that the fitting process is robust to outliers.

\cite{cleveland1990} introduce a seasonal and trend decomposition using a
repeated locally weighted regression (the so-called Loess procedure) to
smooth the trend and seasonal components, which can be viewed as a
generalization of the methods above and is denoted by the acronym
\gls{stl}.
In contrast to the X11, X13, and SEATS methods, the STL supports seasonalities
of any lag $k$ that must, however, be determined with additional
statistical tests or set with out-of-band knowledge by the forecaster
(e.g., hourly demand data implies $k = 24 \cdot 7 = 168$, assuming customer
behavior differs on each day of the week).
Moreover, the seasonal component's rate of change, represented by the $ns$
parameter and explained in detail with Figure \ref{f:stl} in Section
\ref{decomp}, must be set by the forecaster as well, while the trend's
smoothness may be controlled by setting a non-default window size.
Outliers are handled by assignment to the remainder such that they do not
affect the trend and seasonal components.
In particular, the manual input needed to calibrate the STL explains why only
the X11, X13, and SEATS methods are widely used by practitioners.
However, the widespread adoption of concepts like cross-validation (cf.,
Sub-section \ref{cv}) in recent years enables the usage of an automated
grid search to optimize the parameters.
The STL's usage within a grid search is facilitated even further by its being
computationally cheaper than the other methods discussed.
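
A minimal sketch of such an automated usage, assuming the statsmodels
implementation of the STL (whose period, seasonal, and robust arguments
correspond to $k$, the seasonal smoothing window, and the outlier handling
discussed above; the toy data and settings are arbitrary), is:
\begin{verbatim}
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(1)
T, k = 120, 12                  # for hourly demand data, k would be 24 * 7 = 168
y = (0.3 * np.arange(T)
     + 5 * np.sin(2 * np.pi * np.arange(T) / k)
     + rng.normal(scale=1.0, size=T))

# period corresponds to k; seasonal is the (odd) window steering the seasonal
# component's rate of change; robust=True downweights outliers.
res = STL(y, period=k, seasonal=7, robust=True).fit()
adjusted = y - res.seasonal     # seasonally adjusted series
print(res.trend[:3], res.seasonal[:3], res.resid[:3])
\end{verbatim}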
tex/2_lit/3_ml/1_intro.tex (new file, 15 lines)
@@ -0,0 +1,15 @@
\subsection{Demand Forecasting with Machine Learning Methods}
\label{ml_methods}

ML methods have been employed in all kinds of prediction tasks in recent
years.
In this sub-section, we restrict ourselves to the models that performed well
in our study: Random Forest (\gls{rf}) and Support Vector Regression
(\gls{svr}).
RFs are in general well-suited for datasets without a priori knowledge about
the patterns, while SVR is known to perform well on time series data, as
shown by \cite{hansen2006} in general and \cite{bao2004} specifically for
intermittent demand.
Gradient Boosting, another popular ML method, was consistently outperformed by
RFs, and artificial neural networks require far more data than our
industry partner has available.
tex/2_lit/3_ml/2_learning.tex (new file, 53 lines)
@@ -0,0 +1,53 @@
\subsubsection{Supervised Learning.}
\label{learning}

A conceptual difference between classical and ML methods is the format
for the model inputs.
In ML models, a time series $Y$ is interpreted as labeled data.
Labels are collected into a vector $\vec{y}$ while the corresponding
predictors are aligned in a $(T - n) \times n$ matrix $\mat{X}$:
$$
\vec{y}
=
\begin{pmatrix}
y_T \\
y_{T-1} \\
\vdots \\
y_{n+1}
\end{pmatrix}
~~~~~~~~~~
\mat{X}
=
\begin{bmatrix}
y_{T-1} & y_{T-2} & \dots & y_{T-n} \\
y_{T-2} & y_{T-3} & \dots & y_{T-(n+1)} \\
\vdots & \vdots & \ddots & \vdots \\
y_n & y_{n-1} & \dots & y_1
\end{bmatrix}
$$
The $m = T - n$ rows are referred to as samples and the $n$ columns as
features.
Each row in $\mat{X}$ is "labeled" by the corresponding entry in $\vec{y}$,
and ML models are trained to fit the rows to their labels.
Conceptually, we model a functional relationship $f$ between $\mat{X}$ and
$\vec{y}$ such that the difference between the predicted
$\vec{\hat{y}} = f(\mat{X})$ and the true $\vec{y}$ is minimized
according to some error measure $L(\vec{\hat{y}}, \vec{y})$, where $L$
summarizes the goodness of the fit into a scalar value (e.g., the
well-known mean squared error [MSE]; cf., Section \ref{mase}).
$\mat{X}$ and $\vec{y}$ show the ordinal character of time series data:
Not only do the entries of $\mat{X}$ and $\vec{y}$ overlap, but the rows of
$\mat{X}$ are also shifted versions of each other.
That does not hold for ML applications in general (e.g., the classical
example of predicting spam vs. no spam emails, where the features model
properties of individual emails), and most of the common error measures
presented in introductory texts on ML are only applicable in cases
without such a structure in $\mat{X}$ and $\vec{y}$.
$n$, the number of past time steps required to predict a $y_t$, is an
exogenous model parameter.
For prediction, the forecaster supplies the trained ML model with an input
vector in the same format as a row $\vec{x}_i$ of $\mat{X}$.
For example, to predict $y_{T+1}$, the model takes the vector
$(y_T, y_{T-1}, \dots, y_{T-n+1})$ as input.
That is in contrast to the classical methods, where we only supply the number
of time steps to be predicted as a scalar integer.
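
The following sketch (Python with NumPy) builds $\mat{X}$ and $\vec{y}$ in
exactly this layout for a toy series and assembles the input vector for
predicting $y_{T+1}$:
\begin{verbatim}
import numpy as np

def make_supervised(y, n):
    # Labels y_T, ..., y_{n+1}; each row holds the n preceding observations.
    rows, labels = [], []
    for t in range(len(y), n, -1):     # t = T, T-1, ..., n+1 (1-indexed)
        labels.append(y[t - 1])        # y_t
        rows.append(y[t - 2::-1][:n])  # (y_{t-1}, y_{t-2}, ..., y_{t-n})
    return np.array(rows), np.array(labels)

n = 3
y = np.arange(1.0, 11.0)               # toy series y_1, ..., y_10
X, labels = make_supervised(y, n)
x_next = y[::-1][:n]                   # input (y_T, ..., y_{T-n+1}) to predict y_{T+1}
print(X.shape, labels.shape, x_next)
\end{verbatim}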
tex/2_lit/3_ml/3_cv.tex (new file, 38 lines)
@@ -0,0 +1,38 @@
\subsubsection{Cross-Validation.}
\label{cv}

Because ML models are trained by minimizing a loss function $L$, the
resulting value of $L$ underestimates, by design, the true error we see
when predicting into the actual future.
To counter that, one popular and model-agnostic approach is cross-validation
(\gls{cv}), as summarized, for example, by \cite{hastie2013}.
CV is a resampling technique that randomly splits the samples into a
training and a test set.
Trained on the former, an ML model makes forecasts on the latter.
Then, the value of $L$ calculated only on the test set gives a realistic and
unbiased estimate of the true forecasting error, and may be used for one
of two distinct purposes:
First, it assesses the quality of a fit and provides an idea as to how the
model would perform in production when predicting into the actual future.
Second, the errors of models of either different methods or the same method
with different parameters may be compared with each other to select the
best model.
In order to first select the best model and then assess its quality, one must
apply two chained CVs:
The samples are divided into training, validation, and test sets, and all
models are trained on the training set and compared on the validation set.
Then, the winner is retrained on the union of the training and validation
sets and assessed on the test set.

Regarding the splitting, there are various approaches, and we choose the
so-called $k$-fold CV, where the samples are randomly divided into $k$
folds of the same size.
Each fold is used as a test set once, and the remaining $k-1$ folds become
the corresponding training set.
The resulting $k$ error measures are averaged.
A $k$-fold CV with $k=5$ or $k=10$ is a compromise between the two extreme
cases of having only one split and the so-called leave-one-out CV
where $k = m$: Computation is still relatively fast, and each sample is
part of several training sets, which maximizes the learning from the data.
We adapt the $k$-fold CV to the ordinal structure in $\mat{X}$ and $\vec{y}$
in Sub-section \ref{unified_cv}.
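
A plain $k$-fold CV over such samples can be sketched as follows (illustrative
Python; the least-squares model and the toy data merely stand in for any
regressor with a fit and a predict step):
\begin{verbatim}
import numpy as np

def k_fold_mse(X, labels, k_folds, fit, predict):
    # Randomly split the samples into k folds and average the test-set MSE.
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(len(labels)), k_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], labels[train_idx])
        preds = predict(model, X[test_idx])
        errors.append(np.mean((preds - labels[test_idx]) ** 2))  # MSE per fold
    return np.mean(errors)

# Toy model: ordinary least squares on the n feature columns.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda coef, X: X @ coef

X = np.random.default_rng(1).normal(size=(40, 3))
labels = X @ np.array([0.5, -0.2, 0.1]) + 0.05
print(k_fold_mse(X, labels, k_folds=5, fit=fit, predict=predict))
\end{verbatim}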
tex/2_lit/3_ml/4_rf.tex (new file, 66 lines)
@@ -0,0 +1,66 @@
\subsubsection{Random Forest Regression.}
\label{rf}

\cite{breiman1984} introduce the classification and regression tree
(\gls{cart}) model that is built around the idea that a single binary
decision tree maps learned combinations of intervals of the feature
columns to a label.
Thus, each sample in the training set is associated with one leaf node, which
is reached by following the tree from its root and branching at each
intermediate node according to a learned splitting rule that compares the
sample's realization of the feature specified by the rule to a learned
threshold.
While such models are computationally fast and offer a high degree of
interpretability, they tend to overfit strongly to the training set as
the splitting rules are not limited to any functional form (e.g., linear)
in the relationship between the features and the labels.
In the regression case, it is common to maximize the variance reduction $I_V$
from a parent node $N$ to its two children, $C1$ and $C2$, as the
splitting rule.
\cite{breiman1984} formulate this as follows:
$$
I_V(N)
=
\frac{1}{|S_N|^2} \sum_{i \in S_N} \sum_{j \in S_N}
    \frac{1}{2} (y_i - y_j)^2
- \left(
    \frac{1}{|S_{C1}|^2} \sum_{i \in S_{C1}} \sum_{j \in S_{C1}}
        \frac{1}{2} (y_i - y_j)^2
    +
    \frac{1}{|S_{C2}|^2} \sum_{i \in S_{C2}} \sum_{j \in S_{C2}}
        \frac{1}{2} (y_i - y_j)^2
\right)
$$
$S_N$, $S_{C1}$, and $S_{C2}$ are the index sets of the samples in $N$, $C1$,
and $C2$.
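
Since the double sum for a set $S$, including the $\frac{1}{2}$ and
$\frac{1}{|S|^2}$ factors, equals the population variance of the labels in
$S$, the criterion is cheap to evaluate; the sketch below (Python) computes
$I_V$ for one candidate split of a toy node:
\begin{verbatim}
import numpy as np

def pairwise_term(labels):
    # (1 / |S|^2) * sum_i sum_j 0.5 * (y_i - y_j)^2,
    # which equals the population variance of the labels in S.
    diffs = labels[:, None] - labels[None, :]
    return 0.5 * np.mean(diffs ** 2)

def variance_reduction(parent, child1, child2):
    # I_V(N): pairwise term of the parent minus the sum over both children.
    return pairwise_term(parent) - (pairwise_term(child1) + pairwise_term(child2))

labels = np.array([3.0, 3.2, 2.9, 7.8, 8.1, 8.0])
print(variance_reduction(labels, labels[:3], labels[3:]))  # large I_V: good split
\end{verbatim}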

\cite{ho1998} and then \cite{breiman2001} generalize this method by combining
many CART models into one forest of trees where every single tree is
a randomized variant of the others.
Randomization is achieved at two steps in the training process:
First, each tree receives a distinct training set resampled with replacement
from the original training set, an idea also called bootstrap
aggregation.
Second, at each node a random subset of the features is used to grow the tree.
Trees can be fitted in parallel, which speeds up the training significantly.
For prediction at the tree level, the average label of all the training
samples at the reached leaf node is used.
Then, the individual values are combined into one value by averaging again
across the trees.
Due to the randomization, the trees are decorrelated, which offsets the
overfitting.
Another measure to counter overfitting is pruning the tree, either by
specifying the maximum depth of a tree or the minimum number of samples
at leaf nodes.
|
||||
Parameters include the number of trees in the forest, the size of the random
|
||||
subset of features, and the pruning criteria.
|
||||
The parameters are optimized via grid search: We train many models with
|
||||
parameters chosen from a pre-defined list of values and select the best
|
||||
one by CV.
|
||||
RFs are a convenient ML method for any dataset as decision trees do not
|
||||
make any assumptions about the relationship between features and labels.
|
||||
\cite{herrera2010} use RFs to predict the hourly demand for water in an urban
|
||||
context, a similar application as the one in this paper, and find that RFs
|
||||
work well with time series type of data.
|
tex/2_lit/3_ml/5_svm.tex (new file, 60 lines)
@@ -0,0 +1,60 @@
\subsubsection{Support Vector Regression.}
\label{svm}

\cite{vapnik1963} and \cite{vapnik1964} introduce the so-called support vector
machine (\gls{svm}) model, and \cite{vapnik2013} summarizes the research
conducted since then.
In its basic version, SVMs are linear classifiers, modeling a binary
decision, that fit a hyperplane into the feature space of $\mat{X}$ to
maximize the margin around the hyperplane separating the two groups of
labels.
SVMs were popularized in the 1990s in the context of optical character
recognition, as shown in \cite{scholkopf1998}.

\cite{drucker1997} and \cite{stitson1999} adapt SVMs to the regression case,
and \cite{smola2004} provide a comprehensive introduction to this variant.
\cite{mueller1997} and \cite{mueller1999} focus on SVRs in the context of time
series data and find that they tend to outperform classical methods.
\cite{chen2006a} and \cite{chen2006b} apply SVRs to predict the hourly demand
for water in cities, an application similar to the UDP case.

In the SVR case, a linear function
$\hat{y}_i = f(\vec{x}_i) = \langle\vec{w},\vec{x}_i\rangle + b$
is fitted so that the actual labels $y_i$ have a deviation of at most
$\epsilon$ from their predictions $\hat{y}_i$ (cf., the constraints
below).
SVRs are commonly formulated as quadratic optimization problems as follows:
$$
\text{minimize }
\frac{1}{2} \norm{\vec{w}}^2 + C \sum_{i=1}^m (\xi_i + \xi_i^*)
\quad \text{subject to }
\begin{cases}
y_i - \langle \vec{w}, \vec{x}_i \rangle - b \leq \epsilon + \xi_i
    \text{,} \\
\langle \vec{w}, \vec{x}_i \rangle + b - y_i \leq \epsilon + \xi_i^*
\end{cases}
$$
$\vec{w}$ contains the fitted weights in the row space of $\mat{X}$, $b$ is a
scalar bias term, and $\langle\cdot,\cdot\rangle$ denotes the dot product.
By minimizing the norm of $\vec{w}$, the fitted function is kept flat and not
prone to overfitting strongly.
To allow individual samples outside the otherwise hard $\epsilon$ bounds,
non-negative slack variables $\xi_i$ and $\xi_i^*$ are included.
A non-negative parameter $C$ regulates how many samples may violate the
$\epsilon$ bounds and by how much.
To model non-linear relationships, one could use a mapping $\Phi(\cdot)$ for
the $\vec{x}_i$ from the row space of $\mat{X}$ to some higher
dimensional space; however, as the optimization problem only depends on
the dot product $\langle\cdot,\cdot\rangle$ and not the actual entries of
$\vec{x}_i$, it suffices to use a kernel function $k$ such that
$k(\vec{x}_i,\vec{x}_j) = \langle\Phi(\vec{x}_i),\Phi(\vec{x}_j)\rangle$.
Such kernels must fulfill certain mathematical properties, and, besides
polynomial kernels, radial basis functions with
$k(\vec{x}_i,\vec{x}_j) = \exp(-\gamma \norm{\vec{x}_i - \vec{x}_j}^2)$ are
a popular candidate, where $\gamma$ is a parameter controlling how the
distances between any two samples influence the final model.
SVRs work well with sparse data in high dimensional spaces, such as
intermittent demand data, as they minimize the risk of misclassification
or of predicting a value that is significantly off by maximizing the error
margin, as also noted by \cite{bao2004}.
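
As an illustration (assuming scikit-learn is available; its SVR exposes $C$,
$\epsilon$, and the RBF kernel's $\gamma$ directly, and the values below are
arbitrary examples that could be tuned with the grid search described above):
\begin{verbatim}
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))       # toy samples x features (e.g., lagged demand)
labels = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# C bounds the slack, epsilon the tube width, gamma the RBF length scale.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X, labels)
x_next = rng.normal(size=(1, 5))    # an input vector (y_T, ..., y_{T-n+1})
print(model.predict(x_next))
\end{verbatim}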
@@ -1,2 +1,8 @@
\section{Model Formulation}
\label{mod}

% temporary placeholders
\label{decomp}
\label{f:stl}
\label{mase}
\label{unified_cv}
@@ -1,7 +1,25 @@
% Abbreviations for technical terms.
\newglossaryentry{cart}{
    name=CART, description={Classification and Regression Trees}
}
\newglossaryentry{cv}{
    name=CV, description={Cross Validation}
}
\newglossaryentry{ml}{
    name=ML, description={Machine Learning}
}
\newglossaryentry{rf}{
    name=RF, description={Random Forest}
}
\newglossaryentry{stl}{
    name=STL, description={Seasonal and Trend Decomposition using Loess}
}
\newglossaryentry{svm}{
    name=SVM, description={Support Vector Machine}
}
\newglossaryentry{svr}{
    name=SVR, description={Support Vector Regression}
}
\newglossaryentry{udp}{
    name=UDP, description={Urban Delivery Platform}
}
@@ -6,4 +6,9 @@
% Make opening quotes look different than closing quotes.
\usepackage[english=american]{csquotes}
\MakeOuterQuote{"}

% Define helper commands.
\usepackage{bm}
\newcommand{\mat}[1]{\bm{#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
@@ -7,6 +7,25 @@ volume={129},
pages={263--286}
}

@article{assimakopoulos2000,
title={The theta model: a decomposition approach to forecasting},
author={Assimakopoulos, Vassilis and Nikolopoulos, Konstantinos},
year={2000},
journal={International Journal of Forecasting},
volume={16},
number={4},
pages={521--530}
}

@inproceedings{bao2004,
title={Forecasting intermittent demand by SVMs regression},
author={Bao, Yukun and Wang, Wen and Zhang, Jinlong},
year={2004},
booktitle={2004 IEEE International Conference on Systems, Man and Cybernetics},
volume={1},
pages={461--466}
}

@misc{bell2018,
title={Forecasting at Uber: An Introduction},
author={Bell, Franziska and Smyl, Slawek},
@@ -15,6 +34,119 @@ howpublished = {\url{https://eng.uber.com/forecasting-introduction/}},
note={Accessed: 2020-10-01}
}

@article{box1962,
title={Some statistical Aspects of adaptive Optimization and Control},
author={Box, George and Jenkins, Gwilym},
year={1962},
journal={Journal of the Royal Statistical Society. Series B (Methodological)},
volume={24},
number={2},
pages={297--343}
}

@article{box1964,
title={An Analysis of Transformations},
author={Box, George and Cox, David},
year={1964},
journal={Journal of the Royal Statistical Society. Series B (Methodological)},
volume={26},
number={2},
pages={211--252}
}

@article{box1968,
title={Some recent Advances in Forecasting and Control},
author={Box, George and Jenkins, Gwilym},
year={1968},
journal={Journal of the Royal Statistical Society. Series C (Applied Statistics)},
volume={17},
number={2},
pages={91--109}
}

@book{box2015,
title={Time Series Analysis: Forecasting and Control},
author={Box, George and Jenkins, Gwilym and Reinsel, Gregory and Ljung, Greta},
series={Wiley Series in Probability and Statistics},
year={2015},
publisher={Wiley}
}

@book{breiman1984,
title={Classification and Regression Trees},
author={Breiman, Leo and Friedman, Jerome and Olshen, R.A.
    and Stone, Charles},
year={1984},
publisher={Wadsworth}
}

@article{breiman2001,
title={Random Forests},
author={Breiman, Leo},
year={2001},
journal={Machine Learning},
volume={45},
number={1},
pages={5--32}
}

@book{brockwell2016,
title={Introduction to Time Series and Forecasting},
author={Brockwell, Peter and Davis, Richard},
series={Springer Texts in Statistics},
year={2016},
publisher={Springer}
}

@book{brown1959,
title={Statistical Forecasting for Inventory Control},
author={Brown, Robert},
year={1959},
publisher={McGraw/Hill}
}

@article{chen2006a,
title={Hourly Water Demand Forecast Model based on Bayesian Least Squares
    Support Vector Machine},
author={Chen, Lei and Zhang, Tu-qiao},
year={2006},
journal={Journal of Tianjin University},
volume={39},
number={9},
pages={1037--1042}
}

@article{chen2006b,
title={Hourly Water Demand Forecast Model based on Least Squares Support
    Vector Machine},
author={Chen, Lei and Zhang, Tu-qiao},
year={2006},
journal={Journal of Harbin Institute of Technology},
volume={38},
number={9},
pages={1528--1530}
}

@article{cleveland1990,
title={STL: A Seasonal-Trend Decomposition Procedure Based on Loess},
author={Cleveland, Robert and Cleveland, William and McRae, Jean
    and Terpenning, Irma},
year={1990},
journal={Journal of Official Statistics},
volume={6},
number={1},
pages={3--73}
}

@book{dagum2016,
title={Seasonal Adjustment Methods and Real Time Trend-Cycle Estimation},
author={Dagum, Estela and Bianconcini, Silvia},
series={Statistics for Social and Behavioral Sciences},
year={2016},
publisher={Springer}
}

@article{de2006,
title={25 Years of Time Series Forecasting},
author={De Gooijer, Jan and Hyndman, Rob},
@@ -25,6 +157,16 @@ number={3},
pages={443--473}
}

@inproceedings{drucker1997,
title={Support Vector Regression Machines},
author={Drucker, Harris and Burges, Christopher and Kaufman, Linda
    and Smola, Alex and Vapnik, Vladimir},
year={1997},
booktitle={Advances in Neural Information Processing Systems},
pages={155--161},
organization={Springer}
}

@article{ehmke2018,
title={Optimizing for total costs in vehicle routing in urban areas},
author={Ehmke, Jan Fabian and Campbell, Ann M and Thomas, Barrett W},
@@ -34,6 +176,45 @@ volume={116},
pages={242--265}
}

@article{gardner1985,
title={Forecasting Trends in Time Series},
author={Gardner, Everette and McKenzie, Ed},
year={1985},
journal={Management Science},
volume={31},
number={10},
pages={1237--1246}
}

@article{hansen2006,
title={Some Evidence on Forecasting Time-Series with Support Vector Machines},
author={Hansen, James and McDonald, James and Nelson, Ray},
year={2006},
journal={Journal of the Operational Research Society},
volume={57},
number={9},
pages={1053--1063}
}

@book{hastie2013,
title={The Elements of Statistical Learning: Data Mining, Inference,
    and Prediction},
author={Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome},
year={2013},
publisher={Springer}
}

@article{herrera2010,
title={Predictive Models for Forecasting Hourly Urban Water Demand},
author={Herrera, Manuel and Torgo, Lu{\'\i}s and Izquierdo, Joaqu{\'\i}n
    and P{\'e}rez-Garc{\'\i}a, Rafael},
year={2010},
journal={Journal of Hydrology},
volume={387},
number={1-2},
pages={141--150}
}

@misc{hirschberg2016,
title={McKinsey: The changing market for food delivery},
author={Hirschberg, Carsten and Rajko, Alexander and Schumacher, Thomas
@@ -44,6 +225,25 @@ howpublished = "\url{https://www.mckinsey.com/industries/high-tech/
note={Accessed: 2020-10-01}
}

@article{ho1998,
title={The Random Subspace Method for Constructing Decision Forests},
author={Ho, Tin Kam},
year={1998},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={20},
number={8},
pages={832--844}
}

@article{holt1957,
title={Forecasting Seasonals and Trends by Exponentially Weighted Moving
    Averages},
author={Holt, Charles},
year={1957},
journal={ONR Memorandum},
volume={52}
}

@article{hou2018,
title={Ride-matching and routing optimisation: Models and a large
    neighbourhood search heuristic},
@@ -54,6 +254,62 @@ volume={118},
pages={143--162}
}

@article{hyndman2002,
title={A State Space Framework for Automatic Forecasting using Exponential
    Smoothing Methods},
author={Hyndman, Rob and Koehler, Anne and Snyder, Ralph and Grose, Simone},
year={2002},
journal={International Journal of Forecasting},
volume={18},
number={3},
pages={439--454}
}

@article{hyndman2003,
title={Unmasking the Theta method},
author={Hyndman, Rob and Billah, Baki},
year={2003},
journal={International Journal of Forecasting},
volume={19},
number={2},
pages={287--290}
}

@article{hyndman2008a,
title={Automatic Time Series Forecasting: The forecast package for R},
author={Hyndman, Rob and Khandakar, Yeasmin},
year={2008},
journal={Journal of Statistical Software},
volume={26},
number={3}
}

@book{hyndman2008b,
title={Forecasting with Exponential Smoothing: the State Space Approach},
author={Hyndman, Rob and Koehler, Anne and Ord, Keith and Snyder, Ralph},
year={2008},
publisher={Springer}
}

@book{hyndman2018,
title={Forecasting: Principles and Practice},
author={Hyndman, Rob and Athanasopoulos, George},
year={2018},
publisher={OTexts}
}

@article{kwiatkowski1992,
title={Testing the null hypothesis of stationarity against the alternative of a
    unit root: How sure are we that economic time series have a unit root?},
author={Kwiatkowski, Denis and Phillips, Peter and Schmidt, Peter
    and Shin, Yongcheol},
year={1992},
journal={Journal of Econometrics},
volume={54},
number={1-3},
pages={159--178}
}

@misc{laptev2017,
title={Engineering Extreme Event Forecasting
    at Uber with Recurrent Neural Networks},
@@ -74,6 +330,108 @@ volume={118},
pages={392--420}
}

@inproceedings{mueller1997,
title={Predicting Time Series with Support Vector Machines},
author={M{\"u}ller, Klaus-Robert and Smola, Alexander and R{\"a}tsch, Gunnar
    and Sch{\"o}lkopf, Bernhard and Kohlmorgen, Jens and Vapnik, Vladimir},
year={1997},
booktitle={International Conference on Artificial Neural Networks},
pages={999--1004},
organization={Springer}
}

@article{mueller1999,
title={Using Support Vector Machines for Time Series Prediction},
author={M{\"u}ller, Klaus-Robert and Smola, Alexander and R{\"a}tsch, Gunnar
    and Sch{\"o}lkopf, Bernhard and Kohlmorgen, Jens and Vapnik, Vladimir},
year={1999},
journal={Advances in Kernel Methods — Support Vector Learning},
pages={243--254},
publisher={MIT, Cambridge, MA, USA}
}

@book{ord2017,
title={Principles of Business Forecasting},
author={Ord, Keith and Fildes, Robert and Kourentzes, Nikos},
year={2017},
publisher={WESSEX Press}
}

@article{pegels1969,
title={Exponential Forecasting: Some new variations},
author={Pegels, C.},
year={1969},
journal={Management Science},
volume={15},
number={5},
pages={311--315}
}

@incollection{scholkopf1998,
title={Fast Approximation of Support Vector Kernel Expansions, and an
    Interpretation of Clustering as Approximation in Feature Spaces},
author={Sch{\"o}lkopf, Bernhard and Knirsch, Phil and Smola, Alex
    and Burges, Chris},
year={1998},
booktitle={Mustererkennung 1998},
publisher={Springer},
pages={125--132}
}

@article{smola2004,
title={A Tutorial on Support Vector Regression},
author={Smola, Alex and Sch{\"o}lkopf, Bernhard},
year={2004},
journal={Statistics and Computing},
volume={14},
number={3},
pages={199--222}
}

@article{stitson1999,
title={Support Vector Regression with ANOVA Decomposition Kernels},
author={Stitson, Mark and Gammerman, Alex and Vapnik, Vladimir
    and Vovk, Volodya and Watkins, Chris and Weston, Jason},
year={1999},
journal={Advances in Kernel Methods — Support Vector Learning},
pages={285--292},
publisher={MIT, Cambridge, MA, USA}
}

@article{taylor2003,
title={Exponential Smoothing with a Damped Multiplicative Trend},
author={Taylor, James},
year={2003},
journal={International Journal of Forecasting},
volume={19},
number={4},
pages={715--725}
}

@article{vapnik1963,
title={Pattern Recognition using Generalized Portrait Method},
author={Vapnik, Vladimir and Lerner, A},
year={1963},
journal={Automation and Remote Control},
volume={24},
pages={774--780}
}

@article{vapnik1964,
title={A Note on one Class of Perceptrons},
author={Vapnik, Vladimir and Chervonenkis, A},
year={1964},
journal={Automation and Remote Control},
volume={25}
}

@book{vapnik2013,
title={The Nature of Statistical Learning Theory},
author={Vapnik, Vladimir},
year={2013},
publisher={Springer}
}

@article{wang2018,
title={Delivering meals for multiple suppliers: Exclusive or sharing
    logistics service},
@@ -82,4 +440,14 @@ year={2018},
journal={Transportation Research Part E: Logistics and Transportation Review},
volume={118},
pages={496--512}
}

@article{winters1960,
title={Forecasting Sales by Exponentially Weighted Moving Averages},
author={Winters, Peter},
year={1960},
journal={Management Science},
volume={6},
number={3},
pages={324--342}
}