
A:\SteepAsG.wpd  Written: July 30, 2001  Printed: July 30, 2001 (11:21 AM)

Searching for the Maximum Output in Random Simulation: New Signal/Noise Heuristics

Jack P.C. Kleijnen (a), Dick den Hertog (b), and Ebru Angün (c)

(a) Corresponding author: Department of Information Systems/Center for Economic Research (CentER), Tilburg University (KUB), Postbox 90153, 5000 LE Tilburg, Netherlands
(b) Department of Econometrics and Operations Research/CentER, KUB; Web: center.kub.nl/staff/hertog
(c) BIK/CentER, KUB

Keywords & descriptive phrases: design of experiments, statistical analysis. Maximization requires deciding which factor combinations to simulate. RSM uses scale-dependent steepest ascent with intuitive step sizes.

Abstract

This paper addresses some problems that arise when searching for the inputs that maximize the output of a random simulation model. Such a search might use classic Response Surface Methodology (RSM), which proceeds in stages. This paper focuses on the first stages, namely the fitting of a local first-order polynomial per stage, followed by estimation of the steepest ascent path. Those stages, however, suffer from two well-known problems: (i) steepest ascent is scale-dependent; (ii) the step size is selected intuitively. To tackle these problems, this paper derives two novel heuristics combining mathematical statistics and mathematical programming. Heuristic 1 uses the well-known signal/noise ratio to take a step such that the locally predicted minimum simulation output is maximized. This heuristic is proven to be scale-independent. The step-size problem is solved tentatively. Heuristic 2 does follow the classic steepest ascent direction, but with a step size inspired by heuristic 1. The mathematical properties of the two heuristics are derived, interpreted, and illustrated through simple numerical examples. The search directions of the two heuristics are explored in simple Monte Carlo experiments.
These experiments show that, in general, heuristic 1 gives a better search direction than heuristic 2 or classic RSM.

1. Introduction

In this section we address three questions: (i) What is RSM? (ii) What is our research contribution? (iii) How is this paper organized?

We assume that we wish to find the maximum simulation output (finding the minimum is equivalent to finding the maximum of minus the output). So there are no explicit constraints. This assumption is also used in classic Response Surface Methodology (RSM); see Box (1999), Fu (1994), Khuri (1996), Khuri and Cornell (1996), Myers (1999), Myers and Montgomery (1995), and Neddermeijer et al. (2000). This classic RSM was originally formulated in Box and Wilson (1951). Actually, Box (1999) states that the clue of his method is that it proceeds in steps (as we shall explain below).

Other authors, however, mean by RSM a one-shot approach that fits a (meta)model to the input/output (I/O) data of a simulation model over the whole experimental area (instead of a local area only); for example, these authors fit either a second-order polynomial or a so-called Kriging model. Next they use this single metamodel (response surface) to estimate the optimal values of the input variables. Their optimization uses methods such as generalized reduced gradients (GRG). A seminal paper for this second type of RSM is Sacks et al. (1989); also see Fu (1994), Jones, Schonlau, and Welch (1998), and Simpson et al. (2001). These publications provide many more references.

Kleijnen (1998) states that classic RSM combines regression analysis, statistical designs, and steepest-ascent hill-climbing. It assumes a single type of response, and has the following four characteristics.
(i) The method relies on first-order and second-order polynomial regression metamodels; the responses are assumed to have white noise (say) e; that is, e is normally, identically, and independently distributed (NIID) with zero mean μ_e and constant variance σ²_e.

(ii) It uses classic designs, namely resolution-3 (R-3) designs and central composite designs (CCD).

(iii) It uses the mathematical (not statistical) technique of steepest ascent; that is, the estimated gradient determines in which direction the inputs are changed in the next step.

(iv) It uses the mathematical technique of canonical analysis to analyze the shape of the optimal region in the final stage: does that region have a unique maximum, a saddle point, or a ridge (stationary points)?

More precisely, this RSM consists of the following six steps.

Step 1: RSM begins by selecting a starting point. Because RSM is a heuristic, several starting points may be tried later on, if time permits.

Step 2: Next, RSM explores the neighborhood of that starting point. The response surface is approximated locally by a first-order polynomial in the inputs. Let x_j denote the original, non-standardized input j. Hence k main effects (say) β_j with j = 1, ..., k are to be estimated. For that purpose, RSM uses an R-3 design, which specifies n ≈ k + 1 combinations.

Step 3: Then the steepest ascent path implies Δx_j / Δx_1 = β̂_j / β̂_1; in other words, steepest ascent uses the local gradient. Unfortunately, the steepest ascent technique does not quantify the step size along this path. In practice the analysts try an intuitively selected value for the step size. If that value yields a lower response, then they reduce the step size. Otherwise, they take one more step. (An example is the case study in Kleijnen (1993), which uses a step size that doubles the most important input.) A more refined step-size selection method (namely, line search) is described by Fu (1994, p. 216).
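The intuitive step-size rule of Step 3 can be sketched as follows. This is a minimal illustration with made-up numbers; the quadratic `simulate` function, the starting point, and the estimated effects are all our own assumptions, not from the paper:

```python
import numpy as np

# Illustrative sketch of RSM Step 3 (not the paper's code): follow the
# estimated gradient beta_hat, doubling the step until the noisy response drops.
rng = np.random.default_rng(1)

def simulate(x):
    # Hypothetical "black box": a hill with its top at (3, 2), plus white noise.
    return 10.0 - (x[0] - 3.0)**2 - (x[1] - 2.0)**2 + rng.normal(0.0, 0.1)

beta_hat = np.array([1.2, 0.8])                # estimated first-order effects (local gradient)
direction = beta_hat / np.abs(beta_hat).max()  # step of 1 in the most important input
x = np.zeros(2)                                # starting point
step, best = 1.0, simulate(x)
while True:
    candidate = x + step * direction
    y = simulate(candidate)
    if y <= best:                              # lower response: stop, keep best point so far
        break
    x, best, step = candidate, y, 2.0 * step   # accept the point and double the step
print(x, best)
```

The doubling rule mimics the case-study practice mentioned above; a line search would replace the doubling by a one-dimensional optimization along `direction`.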
After a number of steps along this path, the simulation response decreases, since the first-order polynomial is only a local approximation to the real I/O transformation specified by the simulation model.

Step 4: If such a decrease occurs, steps 2 and 3 are repeated around the best point found so far. So n ≈ k + 1 new input combinations are simulated. Next the factor effects in the new local first-order polynomial are estimated. And so RSM proceeds.

Step 5: A first-order polynomial or hyperplane cannot adequately represent a hill top. So in the neighborhood of the optimum, a first-order polynomial may show serious lack of fit. Then the analysts fit a second-order polynomial. Usually, they base this fitting on a CCD.

Step 6: Finally, the optimal values of the inputs are found by either straightforward differentiation of this polynomial or a more sophisticated evaluation called canonical analysis.

Applications of classic RSM to simulated systems can be found in Hood and Welch (1993). From here on, we call classic RSM briefly RSM.

Note: RSM treats the simulation model as a black box, unlike some other optimization methods such as Perturbation Analysis and Likelihood Ratio or Score Function. Besides RSM there are other optimization methods that do follow the black-box approach: Genetic Algorithms, Simulated Annealing, Tabu Search, etc.

Our research contribution is the following. Describing RSM in detail would require a book such as Myers and Montgomery (1995) with its 700 pages. Describing RSM within the limits of a single paper would lead to a summary such as Kleijnen (1998). Therefore we focus on two main problems in RSM: scaling effects and step-size selection in steepest ascent; these problems are also mentioned in Myers and Montgomery (1995). We do solve the scaling problem: we introduce our Adapted Steepest Ascent (ASE); that is, we adjust the estimated first-order factor effects through their estimated covariance matrix.
ASE does give a better search direction, in most cases. We also propose and explore a solution for the step size: we suggest a max-min strategy. (Our step size will be further investigated in future research, when we have integrated our new heuristic within the full RSM, which covers all stages, from the starting point to the declared optimum.)

More precisely, in our new heuristic 1 we take a step, in a direction and of such a size, that we maximize the minimum predicted simulation output, computed from an estimated local first-order polynomial in the inputs. Our prediction uses well-known characteristics of linear regression models. In heuristic 2 we do follow the classic steepest ascent direction, but we use a step size inspired by heuristic 1.

We derive mathematical properties of the step-size selection of heuristic 1. For example, we prove that heuristic 1 is indeed scale-independent (unlike RSM). We interpret these mathematical results in terms of practical consequences for the optimization of simulation models. For example, the simulation inputs might need partitioning into random and (nearly) deterministic inputs. Further, the smaller the step size, the bigger the confidence in that step (the confidence is maximal if we stay at the center of the local experimental area). Finally, we study the statistical properties of the search directions of the two heuristics through simple Monte Carlo experiments. These experiments suggest that heuristic 1 is not only scale-independent but also insensitive to linear transformations of the inputs. Usually, heuristic 1 gives a better direction than does steepest ascent.

The remainder of this article is organized as follows. Section 2 summarizes those parts of linear regression analysis that we need to formulate our heuristics. Section 3 derives these heuristics, and their mathematical properties and interpretation.
To get further insight into these properties, Section 4 applies the step-size selection of heuristic 1 to simple numerical problems, with only one or two inputs and various signal/noise ratios. Section 5 compares the search directions of the two heuristics in a series of Monte Carlo experiments. Section 6 gives conclusions and future research plans.

2. Linear-regression basics

As we stated in the preceding section, we consider k (with k ∈ ℕ) simulation inputs (say) d = (d_j) with j = 1, ..., k. We define the estimated signal/noise ratio (say) γ̂_j as

    γ̂_j = β̂_j / √(var̂(β̂_j))    (j = 1, ..., k)    (1)

where β̂_j denotes the estimated main or first-order effect in the following local first-order polynomial approximation:

    y = β_0 + Σ_{j=1}^{k} β_j x_j + e    (2)

where y denotes the regression predictor of the corresponding expected simulation output. The polynomial in (2) is also called the linear-regression metamodel of the underlying simulation model.

Note: Each of the k ratios in (1) equals Student's t statistic under the null hypothesis of zero input effect, H_0: E(β̂_j) = 0. However, this does not mean that we propose to test this H_0. Actually, if we use Ordinary Least Squares (OLS) to estimate β = (β_0, β_1, ..., β_k)′, then we have the Best Linear Unbiased Estimator (BLUE) β̂. Testing H_0: E(β̂_j) = 0 makes sense only if we have good reasons to postulate such a hypothesis. But in our case, effects that are non-significant in a certain stage may still be practically important in that stage, or in later stages! And unimportant factors may be significant if the signal/noise ratio is high: the simulation model may have small intrinsic simulation noise, or very many runs may be executed if computer time per run is small.

Given the white-noise assumption below (2), the OLS estimator of the regression parameters β is

    β̂ = (X′X)^{-1} X′w    (3)

with (in order of appearance)

β̂: vector with the q estimated effects in the regression model (q = 1 + k in eq. 2)

q: number of regression effects, including the intercept β_0

X: N × q matrix of explanatory (independent) variables, including the dummy variable x_0 with constant value 1; X is assumed to have linearly independent columns (so X has full column rank)

N = Σ_{i=1}^{n} m_i: number of actually executed simulation runs

m_i: number of simulation replicates at input combination (point) i, with m_i ∈ ℕ

n: number of different, actually simulated combinations of the k inputs, with n ∈ ℕ and n ≥ q (necessary condition for avoiding singularity in eq. 3)

w: vector with the N simulation outputs corresponding to the N simulation inputs.

Note: An example is a single-server simulation with w in (3) denoting the vector of average waiting times per replicate, and x_1 in (2) denoting the traffic rate.

Obviously, the first m_1 rows of X are identical and equal to (1, x_{1;1}, ..., x_{1;k})′, ..., and the last m_n rows of X are identical and equal to (1, x_{n;1}, ..., x_{n;k})′.

Because of (1) through (3), we call β̂_j the estimated signal of input j. The signal's noise (see eq. 1's denominator) is the square root of the corresponding element on the main diagonal of

    cov(β̂) = (X′X)^{-1} σ²_e.    (4)

For example, (2) implies that var(β̂_1) is the second element on the main diagonal of (4). Equation (4) leads to the estimated noise: replace the unknown parameter σ²_e in (4) by the mean squared residual (MSR) estimator

    σ̂²_e = Σ_{i=1}^{n} Σ_{r=1}^{m_i} (w_{i;r} − ŷ_i)² / (N − q)    (5)

where w_{i;r} denotes the simulation output for input combination i and replicate r, and ŷ_i denotes the OLS regression predictor for the simulation's input combination i that follows from

    ŷ = X β̂.    (6)

For example, (6) combined with (2) gives ŷ_1 = β̂_0 + Σ_{j=1}^{k} β̂_j x_{1;j}. The variance of this predictor is a function of d, the input combination for which this estimate is computed. Indeed, d determines x, because the first-order regression model (2) implies x′ = (1, d′).
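Equations (1) through (6) can be sketched numerically. The design, replicate counts, and outputs below are our own illustrative assumptions (a 2² design with three replicates, so n = 4, m_i = 3, N = 12, q = 3), not numbers from the paper:

```python
import numpy as np

# Illustrative numbers only: n = 4 design points (a 2^2 design), m_i = 3
# replicates each, k = 2 inputs, so N = 12 and q = 1 + k = 3.
rng = np.random.default_rng(7)
design = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
X = np.repeat(np.column_stack([np.ones(4), design]), 3, axis=0)    # N x q
N, q = X.shape
w = 2.0 + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.3, N)  # simulated outputs

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ w                       # eq. (3): OLS estimator
residuals = w - X @ beta_hat
sigma2_hat = residuals @ residuals / (N - q)       # eq. (5): MSR estimator of sigma_e^2
cov_beta = XtX_inv * sigma2_hat                    # eq. (4), with sigma_e^2 estimated
gamma_hat = beta_hat[1:] / np.sqrt(np.diag(cov_beta)[1:])  # eq. (1): signal/noise ratios

def var_y(x):                                      # predictor variance, sigma_e^2 estimated
    return x @ XtX_inv @ x * sigma2_hat

# The point minimizing the predictor variance (derived in Appendix 1 as
# d_o = -C^{-1} b); for this orthogonal design it is the center of the area.
b = XtX_inv[1:, 0]
C = XtX_inv[1:, 1:]
d_o = -np.linalg.solve(C, b)
print(beta_hat, gamma_hat, d_o)
```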
Elementary regression analysis gives

    var(ŷ | x) = x′ cov(β̂) x.    (7)

Observe that x in (7) may correspond with either one of the actually observed input combinations in X, as in (6), or a new point. A new point means interpolation or extrapolation.

To illustrate the implications of (7), suppose that X is orthogonal; that is, X′X = N I_q. Then combining (4) and (7) gives

    var(ŷ | x) = x′x σ²_e / N.    (8)

Obviously, the regression predictor becomes less reliable as the number of underlying simulation runs N decreases. Likewise, the predictor gets inaccurate as the noise σ²_e increases (for example, a single-server simulation implies that a higher traffic rate increases not only the mean but also the variance of the waiting times, so the intrinsic simulation noise increases and so does the white noise e). But what is the effect of x, the point that we wish to predict?

In Appendix 1 we derive d_o, the design point that minimizes the variance of the regression predictor. We find d_o = −C^{-1}b, where b is the k-dimensional vector obtained by deleting the first entry of the first column of (X′X)^{-1}, and C is the k × k matrix obtained by deleting both the first column and the first row of (X′X)^{-1}. Hence, if X is orthogonal, then (8) is minimal at the center of the experimental area: d_o = 0 (also see the funnel shape of Figure 1, discussed below). Hence, extrapolation should be trusted less as the extrapolated point moves farther away into regions not yet explored by simulation. This property will guide our heuristics. (The term "trust region" is used in nonlinear optimization; see Conn, Gould, and Toint 2000.)

3. Two new search heuristics, and their properties

We consider a lower (one-sided) 1 − α confidence interval for the OLS predictor, given x.
This interval ranges from infinity down to

    ŷ_min(x) = ŷ(x) − t_{N−q}^{α} σ̂(ŷ; x) = x′β̂ − t_{N−q}^{α} σ̂_e √(x′(X′X)^{-1}x)    (9)

where t_{N−q}^{α} denotes the 1 − α quantile of the distribution of t with N − q degrees of freedom (DF), and σ̂(ŷ; x) follows from the basic linear-regression formulas in (4) through (7). The first term in (9) concerns the signal, whereas the second term concerns the noise.

Note: When we consider a set of x values, the set of intervals following from (9) has a joint (simultaneous) probability lower than 1 − α. This complication we ignore in the step-size selection of our two heuristics.

3.1 Heuristic 1: maximize ŷ_min(x)

Our first heuristic finds (say) x⁺, which is the x that maximizes the minimum output predicted through (9). This x⁺ gives both a search direction and a step size! First we prove in Appendix 2 that the objective function in (9) is concave in d. Next, in Appendix 3 we derive the following explicit solution for the optimal input values of the next simulation run:

    d⁺ = −C^{-1}b + λ C^{-1} β̂_{−0}    (10a)

with step size

    λ = √[ (a − b′C^{-1}b) / ((t_{N−q}^{α} σ̂_e)² − β̂′_{−0} C^{-1} β̂_{−0}) ]    (10b)

where a is the element (1, 1) of (X′X)^{-1}, and β̂_{−0} is the vector of estimated first-order effects (so it excludes the intercept β̂_0 associated with the dummy effect).

The first term in (10a) means that the path on which the next run is placed starts from −C^{-1}b (the point at which the predictor variance is minimal; see the end of Section 2). The second term means that this path is in the ASE direction: the classic steepest ascent direction is β̂_{−0} (the second term's last factor), whereas C (defined in Section 2, last paragraph) is the covariance matrix of β̂_{−0}, well known in mathematical statistics, or the scaling matrix, well known in mathematical programming. Finally, the step size λ is quantified in (10b).
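A numerical sketch of (10a) and (10b); all numbers below are made up for illustration (a slightly non-orthogonal (X′X)^{-1}, and the tabulated t quantile for 9 degrees of freedom and α = 0.10), not taken from the paper:

```python
import numpy as np

# Illustrative (X'X)^{-1}: slightly non-orthogonal, so the ASE direction
# C^{-1} beta_hat_{-0} differs from the steepest-ascent direction beta_hat_{-0}.
XtX_inv = np.array([[0.10, 0.01, 0.00],
                    [0.01, 0.09, 0.02],
                    [0.00, 0.02, 0.11]])
beta_hat = np.array([2.0, 0.6, 0.4])   # [intercept, beta_1, beta_2] (made up)
sigma_e_hat = 2.0                      # estimated noise (made up)
t_q = 1.383                            # tabulated t quantile, 1 - alpha = 0.90, 9 DF

a = XtX_inv[0, 0]
b = XtX_inv[1:, 0]
C = XtX_inv[1:, 1:]
beta_m0 = beta_hat[1:]

denom = (t_q * sigma_e_hat)**2 - beta_m0 @ np.linalg.solve(C, beta_m0)
assert denom > 0, "signal/noise too large: (10b) has no finite solution"
lam = np.sqrt((a - b @ np.linalg.solve(C, b)) / denom)   # eq. (10b)
ase_dir = np.linalg.solve(C, beta_m0)                    # covariance-scaled gradient
d_plus = -np.linalg.solve(C, b) + lam * ase_dir          # eq. (10a)
print(lam, d_plus)
```

Note that `ase_dir` is not parallel to `beta_m0` here: the covariance adjustment rotates the search path away from classic steepest ascent.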
For the orthogonal case (X′X = N I_q) it is easy to verify that a = 1/N, b = 0, and C = I_k/N, so (10) reduces to

    d⁺ = β̂_{−0} √[ N / ((t_{N−q}^{α} σ̂_e)² − N β̂′_{−0} β̂_{−0}) ].    (11)

This solution implies a search direction that is identical to the steepest ascent direction in RSM (also see Section 3.3).

Next we derive some mathematical properties of (10), and we interpret these mathematical results.

In case of large signal/noise ratios (defined in eq. 1), the denominator under the square root is negative, so (10b) does not give a finite solution for d⁺; that is, (9) can be driven to infinity (unbounded solution)! Indeed, if the noise is negligible, we have a deterministic simulation model, which our heuristic is not meant to address. (Many other researchers, including Conn et al. (2000), study optimization of deterministic simulation models.)

In case of a small signal/noise ratio, no step is taken. Actually, this ratio may be small because (i) the signal is small, or (ii) the noise is big. These two cases are discussed next.

In case (i), the signal may be small because the first-order polynomial approximation is bad. Then we should switch to an alternative metamodel: one using transformations of d_j such as log(d_j) and 1/d_j (an inexpensive alternative), a second-order polynomial, which adds the terms d_j d_{j′} (j′ ≠ j) and d_j² (expensive, because many more simulation runs are required to estimate the corresponding effects), etc.; see the RSM literature.

In case (ii), however, the first-order polynomial may fit, but the intrinsic noise may be high; see the comment below (8). To decrease this noise, we should increase the number of runs, N; see the denominator in (8). Hence, we should increase either n or m_i; see the definitions below (3). When our heuristic gives a value d⁺ that is close to one of the old points, then in practice we may increase m_i. Otherwise we simulate a new combination: we increase n.
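The scale-independence of heuristic 1 can also be checked numerically. In the sketch below (a toy 2² design with two replicates and hand-picked outputs, all our own assumptions), rescaling input 1 by a factor s maps d⁺ to s·d⁺, i.e., the same point in the original units, whereas the steepest-ascent direction β̂_{−0} does not transform consistently:

```python
import numpy as np

def heuristic1_step(X, w, t_q):
    """Compute d+ from eqs. (10a)-(10b); returns (d_plus, beta_hat_{-0})."""
    N, q = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ w
    res = w - X @ beta
    sigma2 = res @ res / (N - q)                   # eq. (5)
    a, b, C = XtX_inv[0, 0], XtX_inv[1:, 0], XtX_inv[1:, 1:]
    bm0 = beta[1:]
    lam = np.sqrt((a - b @ np.linalg.solve(C, b)) /
                  (t_q**2 * sigma2 - bm0 @ np.linalg.solve(C, bm0)))  # eq. (10b)
    return -np.linalg.solve(C, b) + lam * np.linalg.solve(C, bm0), bm0

# Toy data: 2^2 design, 2 replicates (N = 8, q = 3), hand-picked outputs with
# small effects so that (10b) stays finite; t_q = 1.476 is the tabulated
# 90% quantile of t with 5 DF (alpha = 0.10).
design = np.repeat([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]], 2, axis=0)
X = np.column_stack([np.ones(8), design])
w = np.array([1.0, 1.4, 1.1, 1.5, 1.05, 1.45, 1.15, 1.55])

s = 10.0                 # rescale input 1, e.g. a change of measurement units
X_s = X.copy()
X_s[:, 1] *= s

d_plus, grad = heuristic1_step(X, w, t_q=1.476)
d_plus_s, grad_s = heuristic1_step(X_s, w, t_q=1.476)
print(d_plus, d_plus_s / np.array([s, 1.0]))   # same point in the original units
```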
So our heuristic suggests an approach to the old problem of how to choose between either using the next simulation run to increase the accuracy of the current local approximation, or trusting that approximation and moving into a new area! A different approach is discussed in Kleijnen (1975, p. 360). In the literature on maximizing the output of deterministic simulation, this is called the geometry improvement problem.

If we specify a different α value in t_{N−q}^{α}, then (10) gives a different step size (in the same direction). Obviously, t_{N−q}^{α} increases to infinity as α decreases to zero. So a sufficiently small α always gives a finite solution. However, if we increase α, then we make a bigger step. And we prefer to take a bigger step, in order to get to the top of the response surface quicker! We feel that a reasonable maximum value is α = 0.20 (so we are "80% sure"); however, more empirical research is needed to substantiate this choice.
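For the orthogonal case, this trade-off between α and the step size λ can be illustrated as follows (made-up signal and noise values; the t values are the standard tabulated 1 − α quantiles of t with 5 degrees of freedom):

```python
import numpy as np

# Orthogonal case of eq. (10b): lambda = sqrt((1/N) / ((t*sigma)^2 - N*beta'beta)).
# All numbers are illustrative; t quantiles are tabulated 1-alpha values, 5 DF.
N = 8
sigma_hat = 0.25
t_table = {0.20: 0.920, 0.10: 1.476, 0.05: 2.015,
           0.01: 3.365, 0.001: 5.893, 0.0005: 6.869}

def step_size(signal, t_q):
    denom = (t_q * sigma_hat)**2 - signal        # signal = N * beta'_{-0} beta_{-0}
    return np.sqrt((1.0 / N) / denom) if denom > 0 else np.inf  # inf: unbounded

for signal in (0.025, 2.5):                      # weak and strong signal
    lams = [step_size(signal, t_table[alpha]) for alpha in sorted(t_table)]
    print(f"signal={signal}:", [f"{lam:.3f}" for lam in lams])
```

With the weak signal, every α gives a finite step, and a larger α gives a larger step; with the strong signal, only the smallest α keeps the denominator positive, illustrating that a sufficiently small α always gives a finite solution.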
