Just Do it a Lot and Take the Average!
16th February 2016
As part of our MRes course, we are currently producing a poster on an aspect of the Simulation Master Class given by Professor Barry Nelson in January. The aspect that our group have chosen to focus on is the Sample Average Approximation (SAA), one of the many methods of Simulation Optimisation currently being researched. As mentioned in Optimisation and Error in Simulation, Simulation Optimisation is used when the objective function $f(x)$ cannot be calculated explicitly but can be estimated. The particular cases in which SAA can be applied are those for which $f(x)$ is the expectation of some function that involves a random element, $\xi$, whose distribution does not depend on $x$ and which can only be sampled rather than written down explicitly. $$f(x) = E[f(x,\xi)]$$
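To make this concrete, here is a little Python sketch of my own (not from the paper) of a toy newsvendor-style problem, where the objective is an expected cost that we can only estimate by simulating the random demand; the price, unit cost and demand distribution are all made-up illustrative choices:

```python
import numpy as np

# Toy illustration (my own, not from the paper): a newsvendor-style objective.
# x is the order quantity; xi is a random demand that we can only sample from.
# The true objective f(x) = E[f(x, xi)] has no convenient closed form here.
rng = np.random.default_rng(0)

def cost(x, xi, price=5.0, cost_per_unit=3.0):
    """Negative profit for ordering x units when demand turns out to be xi."""
    sales = np.minimum(x, xi)
    return cost_per_unit * x - price * sales

def estimate_f(x, n=10_000):
    """Monte Carlo estimate of f(x) = E[cost(x, xi)] using n fresh demand draws."""
    xi = rng.lognormal(mean=3.0, sigma=0.5, size=n)
    return cost(x, xi).mean()

print(estimate_f(20.0))  # a noisy estimate of the expected cost at x = 20
```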
We did not cover SAA in detail in the Master Class, but the poster has made us look at it more closely. I have
found it a very interesting topic mathematically, though it can be quite restricted in its application. The main paper I
have read was a review of the subject by Sujin Kim, Raghu Pasupathy and Shane G. Henderson from the National University of
Singapore, Virginia Tech and Cornell University respectively.
The basis of the SAA method is to approximate $f(x)$ by a sequence of sample functions $f_n(x)$ that
converge uniformly to the true objective function as $n\rightarrow\infty$. For this, we need a sample path of realisations
of the source of randomness, $\boldsymbol{\xi} = (\xi_1,\xi_2,...,\xi_n)$. In practice, $\xi_i$ is the set of pseudorandom
numbers generated by the simulation in the $i^{th}$ run. The sample functions are then defined by
$$f_n(x) = \frac{1}{n}\sum_{i=1}^n f(x,\xi_i).$$
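The key trick is that once the sample path is fixed, $f_n$ is just an ordinary deterministic function of $x$, so a standard optimiser can be applied to it. Here is a rough sketch of my own of that mechanism, reusing the toy newsvendor example above (all parameters are assumed for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of the SAA mechanism (my own illustration): fix one sample path of the
# randomness up front, so f_n becomes a deterministic function of x that a
# standard optimiser can handle.
rng = np.random.default_rng(1)
n = 5_000
xi = rng.lognormal(mean=3.0, sigma=0.5, size=n)   # fixed sample path xi_1, ..., xi_n

def f_sample(x, xi, price=5.0, cost_per_unit=3.0):
    """f(x, xi): negative profit for one demand realisation xi."""
    return cost_per_unit * x - price * np.minimum(x, xi)

def f_n(x):
    """Sample-average approximation f_n(x) = (1/n) * sum_i f(x, xi_i)."""
    return f_sample(x, xi).mean()

result = minimize_scalar(f_n, bounds=(0.0, 100.0), method="bounded")
print(result.x, result.fun)   # x_n^* and v_n^*, approximations to x^* and v^*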
The hope is that as $n$ gets larger and larger, $f_n(x)$ tends to $f(x)$ in some sense. The property
that we are looking for is uniform convergence, but somehow this needs to be linked to the Law of Large Numbers.
Fortunately for us, such a theorem does exist, known as the Uniform Law of Large Numbers (ULLN). This states that if:
- $D$ is a compact set
- $f(x,\xi)$ is continuous at each $x\in D$ for almost all $\xi$ and a measurable function of $\xi$ at each $x$
- there exists $d(\xi)$ with finite expectation such that $|f(x,\xi)|\leq d(\xi)$ for all $x\in D$
then $E[f(x,\xi)]$ is continuous in $x$ and
$$\sup_{x\in D} \left|\frac{1}{n}\sum_{i=1}^nf(x,\xi_i) - E[f(x,\xi)]\right|\rightarrow 0$$
almost surely as $n\rightarrow\infty$. This is exactly what we require, and under certain conditions this is exactly what
we get for the sample functions. It is crucial that the ULLN holds, as it is this that allows the optimal value of the sample
problems, $v_n^*$, to converge to the true problem's optimal value, $v^*$. Note that if $f(\cdot,\xi)$ is convex on a compact set
$D$, the ULLN holds automatically.
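As a small sanity check of my own (not from the paper), we can watch the uniform convergence happen numerically in a case where the true objective is known: with $f(x,\xi)=(x-\xi)^2$ and $\xi\sim N(0,1)$ we have $f(x)=x^2+1$, so the supremum over the compact set $D=[-2,2]$ can be measured directly:

```python
import numpy as np

# Numerical look at the ULLN (my own illustration, not from the paper).
# With f(x, xi) = (x - xi)^2 and xi ~ N(0, 1), the true objective is
# f(x) = x^2 + 1, so the sup-distance on D = [-2, 2] can be computed exactly
# on a grid for a fixed sample path of size n.
rng = np.random.default_rng(2)
grid = np.linspace(-2.0, 2.0, 401)   # discretisation of the compact set D
f_true = grid**2 + 1.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    xi = rng.standard_normal(n)
    # Expanding the square: f_n(x) = x^2 - 2*x*mean(xi) + mean(xi^2)
    f_n = grid**2 - 2.0 * grid * xi.mean() + (xi**2).mean()
    print(n, np.max(np.abs(f_n - f_true)))   # sup_x |f_n(x) - f(x)| shrinks with n
```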
The ULLN on its own is not sufficient for the method to give the optimal solution. The guide goes through a rather wonderful
bit of theory that states what conditions are required for this to be true. In essence, we are after some form of convergence
of the sets of optimal points of the sample functions, $\Pi^*_n$, to the set of optimal points of $f(x)$, $\pi^*$. The notion of convergence
used is based on the deviation between sets $A$ and $B$, $\mathbb{D}(A,B)=\sup_{a\in A}\inf_{b\in B}\|a-b\|$, that is, the largest distance from an element of $A$ to the set $B$. Under
particular conditions, $\Pi^*_n$ converges to $\pi^*$ just as desired.
The best case scenario from the SAA performance point of view is when $f(x)$ has a single optimum point $x^*$. Obviously,
this is guaranteed by convexity (which is often what is strived for in the field of optimisation), but it is not a
requirement. In this situation, given a few more conditions such as $f(\cdot,\xi)$ and $\nabla_x f(\cdot,\xi)$
being Lipschitz continuous almost surely on the compact set $D$ and $f$ satisfying quadratic growth close to $x^*$,
the distance between the optimal point of $f_n$, $X_n^*$, and $x^*$ is bounded in probability by $n^{-1/2}$. That is, for
any $\epsilon>0$, there exists an $M>0$ such that
$$\Pr\left(\sqrt{n}\,\|X_n^*-x^*\|>M\right)\leq\epsilon$$
for all $n$. This is the best possible performance that SAA can achieve.
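Again as a rough illustration of my own: for $f(x,\xi)=(x-\xi)^2$ with $\xi\sim N(0,1)$, the SAA minimiser is simply the sample mean of the $\xi_i$ and the true optimum is $x^*=0$, so we can watch $\sqrt{n}\|X_n^*-x^*\|$ stay bounded in probability as $n$ grows:

```python
import numpy as np

# Rough check of the n^(-1/2) rate (my own illustration): with
# f(x, xi) = (x - xi)^2 and xi ~ N(0, 1), the SAA minimiser is the sample mean,
# X_n^* = mean(xi_i), and x^* = 0. If sqrt(n) * ||X_n^* - x^*|| is bounded in
# probability, a high quantile of it should settle down as n grows.
rng = np.random.default_rng(3)
reps = 1_000

for n in [100, 1_000, 10_000]:
    xi = rng.standard_normal((reps, n))
    x_n_star = xi.mean(axis=1)                    # SAA optimum on each sample path
    scaled = np.sqrt(n) * np.abs(x_n_star - 0.0)  # sqrt(n) * ||X_n^* - x^*||
    print(n, np.quantile(scaled, 0.95))           # roughly constant (about 1.96) in n
```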
In cases where there are multiple optima, computational limitations mean that, despite the theory, the best we can hope for
is a local optimum of $f(x)$. This is problematic, and means that in practice the method is no better than other simulation
optimisers.
I do have a problem with SAA. Whilst it has some rather nice mathematics behind it, I am not quite sure of the point of
it. It is very limited in its applicability because of the many conditions that are required for it to work at all, never
mind well. The properties required can also be difficult to prove. On top of this, the one thing that was always
reiterated in the paper was that SAA is appropriate when Infinitesimal Perturbation Analysis (IPA) is valid. I discussed
this in my blog post Optimisation and Error in Simulation, and Barry Nelson referred
to IPA as a Gold Standard of Simulation Optimisation. If IPA is so good, why would you use something else when IPA is
applicable? The paper even concludes that simple SAA does not perform well. However, perhaps alternative approaches based
on the same idea will produce good results in the future.
References
[1] Sujin Kim, Raghu Pasupathy and Shane G. Henderson, A Guide to Sample-Average Approximation (2011)