Response Surface Methodology
CASOS Technical Report
Kathleen M. Carley, Natalia Y. Kamneva, Jeff Reminga
October 2004
CMU-ISRI

Carnegie Mellon University
School of Computer Science
ISRI - Institute for Software Research International
CASOS - Center for Computational Analysis of Social and Organizational Systems

This work was supported in part by NASA # NAG--569, Office of Naval Research Grant N , Dynamic Network Analysis: Estimating Their Size, Shape and Potential Weaknesses, Office of Naval Research, N , Constraint Based Team Transformation and Flexibility Analysis under Adaptive Architectures, the DOD, and the National Science Foundation under MKIDS. Additional support was provided by the Center for Computational Analysis of Social and Organizational Systems (CASOS) (http://www.casos.cs.cmu.edu) and the Institute for Software Research International at Carnegie Mellon University. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation or the U.S. government.

Keywords: Response Surface Methodology (RSM), regression analysis, linear regression model, regressors, variable selection, model building, full model, multicollinearity, ridge regression, unit length scaling, condition number, optimization, Simulated Annealing, global optimum

Abstract

There is a problem faced by experimenters in many technical fields, where, in general, the response variable of interest is y and there is a set of predictor variables x_1, x_2, ..., x_k. For example, in Dynamic Network Analysis (DNA), Response Surface Methodology (RSM) might be useful for sensitivity analysis of various DNA measures for different kinds of random graphs and errors. In social network problems the underlying mechanism is usually not fully understood, and the experimenter must approximate the unknown function g with an appropriate empirical model y = f(x_1, x_2, ..., x_k) + ε, where the term ε represents the error in the system. Usually the function f is a first-order or second-order polynomial. This empirical model is called a response surface model. Identifying and fitting an appropriate response surface model from experimental data requires some use of statistical experimental design fundamentals, regression modeling techniques, and optimization methods. All three of these topics are usually combined into Response Surface Methodology (RSM). The experimenter may also encounter situations where the full model is not appropriate; then variable selection or model-building techniques may be used to identify the best subset of regressors to include in a regression model. In our approach we use the simulated annealing method of optimization to search for the best subset of regressors. In some response surface experiments there can be one or more near-linear dependences among the regressor variables in the model. Regression model builders refer to this as multicollinearity among the regressors. Multicollinearity can have serious effects on the estimates of the model parameters and on the general applicability of the final model. RSM is also extremely useful as an automated tool for model calibration and validation, especially for modern computational multi-agent large-scale social-network systems that are becoming heavily used in the modeling and simulation of complex social networks.
RSM can be integrated into many large-scale simulation systems such as BioWar and ORA, and is currently being integrated into Vista, Construct, and DyNet. This report describes the theoretical approach for solving these problems and the implementation of the chosen methods.

Table of Contents

Introduction
  Response Surface Methodology
  Response Surface Methodology and Robust Design
  The Sequential Nature of the Response Surface Methodology
Building Empirical Models
  Linear Regression Model
  Estimation of the Parameters in Linear Regression Models
  Model Adequacy Checking
Variable Selection and Model Building in Regression
  Procedures for Variable Selection
  General Comments on Stepwise-Type Procedures
  Our Approach: Using Optimization Procedure for Variable Selection
  Variable Selection: Results
A Simulation Framework for Response Surface Methodology
Response Surface Methodology as an Automated Tool for Model Validation
  Steps of Response Surface Methodology in Automated Validation Process
Multicollinearity and Biased Estimation in Regression
  Definition of Multicollinearity
  Detection of Multicollinearity
  Multicollinearity Remedial Measures
Limitations and Future Extensions
System Requirements
References

List of Tables

Table 1: Data for Multiple Linear Regression
Table 2: Factors and Response for Example A
Table 3: All Possible Regressions Results for Example A

1. Introduction

1.1 Response Surface Methodology

Response Surface Methodology (RSM) is a collection of statistical and mathematical techniques useful for developing, improving, and optimizing processes []. The most extensive applications of RSM are in situations where several input variables potentially influence some performance measure or quality characteristic of the process. This performance measure or quality characteristic is called the response. The input variables are sometimes called independent variables, and they are subject to the control of the scientist or engineer.

The field of response surface methodology consists of the experimental strategy for exploring the space of the process or independent variables, empirical statistical modeling to develop an appropriate approximating relationship between the yield and the process variables, and optimization methods for finding the values of the process variables that produce desirable values of the response. In this report we concentrate on the second strategy: statistical modeling to develop an appropriate approximating model between the response y and the independent variables ξ_1, ξ_2, ..., ξ_k. In general, the relationship is

y = f(ξ_1, ξ_2, ..., ξ_k) + ε;   (1.1)

where the form of the true response function f is unknown and perhaps very complicated, and ε is a term that represents other sources of variability not accounted for in f. Usually ε includes effects such as measurement error on the response, background noise, the effect of other variables, and so on. Usually ε is treated as a statistical error, often assumed to have a normal distribution with mean zero and variance σ². Then

E(y) = η = E[f(ξ_1, ξ_2, ..., ξ_k)] + E(ε) = f(ξ_1, ξ_2, ..., ξ_k);   (1.2)

The variables ξ_1, ξ_2, ..., ξ_k in Equation (1.1) are usually called the natural variables, because they are expressed in the natural units of measurement, such as degrees Celsius, pounds per square inch, etc. In much RSM work it is convenient to transform the natural variables to coded variables x_1, x_2, ..., x_k, which are usually defined to be dimensionless with mean zero and the same standard deviation.
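As an illustration of this coding step (our own sketch, not code from the report), the fragment below maps each natural variable onto a dimensionless coded variable centered at zero by using the midpoint and half-range of its design region; the variable names and ranges are hypothetical, and other centering and scaling conventions are possible.

```python
import numpy as np

def code_variables(natural, low, high):
    """Map natural variables to dimensionless coded variables.

    natural   : (n, k) array of settings in natural units
    low, high : length-k arrays giving the design region for each variable
    Returns coded values that are -1/+1 at the region's edges and 0 at its center.
    """
    natural = np.asarray(natural, dtype=float)
    center = (np.asarray(high) + np.asarray(low)) / 2.0      # midpoint of each range
    half_range = (np.asarray(high) - np.asarray(low)) / 2.0  # half-width of each range
    return (natural - center) / half_range

# Hypothetical example: temperature in degrees Celsius and pressure in psi
natural = np.array([[160.0, 20.0],
                    [180.0, 30.0],
                    [170.0, 25.0]])
coded = code_variables(natural, low=[160.0, 20.0], high=[180.0, 30.0])
print(coded)   # rows become [-1, -1], [+1, +1], [0, 0]
```

With a convention like this the extreme design settings become ±1 in coded units, which is the form in which the first- and second-order models discussed next are usually fitted.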
In terms of the coded variables, the response function (1.2) will be written as

η = f(x_1, x_2, ..., x_k);   (1.3)

Because the form of the true response function f is unknown, we must approximate it. In fact, successful use of RSM is critically dependent upon the experimenter's ability to develop a suitable approximation for f. Usually, a low-order polynomial in some relatively small region of the independent variable space is appropriate. In many cases, either a first-order or a second-order model is used.

The first-order model is likely to be appropriate when the experimenter is interested in approximating the true response surface over a relatively small region of the independent variable space in a location where there is little curvature in f. For the case of two independent variables, the first-order model in terms of the coded variables is

η = β_0 + β_1 x_1 + β_2 x_2;   (1.4)

The form of the first-order model in Equation (1.4) is sometimes called a main effects model, because it includes only the main effects of the two variables x_1 and x_2. If there is an interaction between these variables, it can be added to the model easily as follows:

η = β_0 + β_1 x_1 + β_2 x_2 + β_12 x_1 x_2;   (1.5)

This is the first-order model with interaction. Adding the interaction term introduces curvature into the response function. Often the curvature in the true response surface is strong enough that the first-order model (even with the interaction term included) is inadequate. A second-order model will likely be required in these situations. For the case of two variables, the second-order model is

η = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1² + β_22 x_2² + β_12 x_1 x_2;   (1.6)

This model would likely be useful as an approximation to the true response surface in a relatively small region. The second-order model is widely used in response surface methodology for several reasons:

1. The second-order model is very flexible. It can take on a wide variety of functional forms, so it will often work well as an approximation to the true response surface.
2. It is easy to estimate the parameters (the β's) in the second-order model. The method of least squares can be used for this purpose.
3. There is considerable practical experience indicating that second-order models work well in solving real response surface problems.

In general, the first-order model is

η = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k   (1.7)

and the second-order model is

η = β_0 + Σ_{j=1}^{k} β_j x_j + Σ_{j=1}^{k} β_jj x_j² + Σ Σ_{i<j} β_ij x_i x_j   (1.8)

In some infrequent situations, approximating polynomials of order greater than two are used. The general motivation for a polynomial approximation for the true response function f is based on the Taylor series expansion around the point x_10, x_20, ..., x_k0.

Finally, let us note that there is a close connection between RSM and linear regression analysis. For example, consider the model

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + ε

The β's are a set of unknown parameters. To estimate the values of these parameters, we must collect data on the system we are studying. Because, in general, polynomial models are linear functions of the unknown β's, we refer to the technique as linear regression analysis.

1.2 Response Surface Methodology and Robust Design

RSM is an important branch of experimental design. RSM is a critical technology in developing new processes and optimizing their performance. The objectives of quality improvement, including reduction of variability and improved process and product performance, can often be accomplished directly using RSM.
It is well known that variation in key performance characteristics can result in poor process and product quality. During the 1980s [, 3] considerable attention was given to process quality, and methodology was developed for using experimental design, specifically for the following:

1. For designing or developing products and processes so that they are robust to component variation.
2. For minimizing variability in the output response of a product or a process around a target value.
3. For designing products and processes so that they are robust to environmental conditions.

Robust means that the product or process performs consistently on target and is relatively insensitive to factors that are difficult to control. Professor Genichi Taguchi [, 3] used the term robust parameter design (RPD) to describe his approach to this important problem. Essentially, robust parameter design methodology seeks to reduce process or product variation by choosing levels of controllable factors (or parameters) that make the system insensitive (or robust) to changes in a set of uncontrollable factors that represent most of the sources of variability. Taguchi referred to these uncontrollable factors as noise factors. RSM assumes that these noise factors are uncontrollable in the field, but can be controlled during process development for the purposes of a designed experiment. Considerable attention has been focused on the methodology advocated by Taguchi, and a number of flaws in his approach have been discovered. However, the framework of response surface methodology allows many useful concepts in his philosophy to be incorporated easily []. There are also two other full-length books on the subject of RSM [4, 5]. In our technical report we concentrate mostly on building and optimizing the empirical models and practically do not consider the problems of experimental design.

1.3 The Sequential Nature of the Response Surface Methodology

Most applications of RSM are sequential in nature.

Phase 0: At first some ideas are generated concerning which factors or variables are likely to be important in the response surface study. This is usually called a screening experiment. The objective of factor screening is to reduce the list of candidate variables to a relatively few, so that subsequent experiments will be more efficient and require fewer runs or tests. The purpose of this phase is the identification of the important independent variables.

Phase 1: The experimenter's objective is to determine if the current settings of the independent variables result in a value of the response that is near the optimum. If the current settings or levels of the independent variables are not consistent with optimum performance, then the experimenter must determine a set of adjustments to the process variables that will move the process toward the optimum. This phase of RSM makes considerable use of the first-order model and an optimization technique called the method of steepest ascent (descent); a small sketch of this step is given after the phase descriptions below.

Phase 2: Phase 2 begins when the process is near the optimum. At this point the experimenter usually wants a model that will accurately approximate the true response function within a relatively small region around the optimum. Because the true response surface usually exhibits curvature near the optimum, a second-order model (or perhaps some higher-order polynomial) should be used. Once an appropriate approximating model has been obtained, this model may be analyzed to determine the optimum conditions for the process.
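To make the Phase 1 step concrete, the following sketch (our illustration, not code from the report) fits a first-order model to a small set of coded-variable runs by least squares and then steps along the path of steepest ascent, i.e., in the direction of the fitted coefficient vector; the data and the step size are hypothetical.

```python
import numpy as np

# Hypothetical Phase 1 data: coded factor settings and observed responses.
X = np.array([[-1.0, -1.0],
              [ 1.0, -1.0],
              [-1.0,  1.0],
              [ 1.0,  1.0],
              [ 0.0,  0.0]])
y = np.array([76.0, 79.5, 77.0, 81.0, 78.3])

# Fit the first-order model y = b0 + b1*x1 + b2*x2 by least squares.
X_design = np.column_stack([np.ones(len(y)), X])
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b_main = b[0], b[1:]

# Path of steepest ascent: move from the design center in the direction
# of the fitted first-order coefficients, normalized to unit length.
direction = b_main / np.linalg.norm(b_main)
for step in range(1, 6):                      # a few exploratory steps
    point = step * 0.5 * direction            # 0.5 coded units per step (arbitrary choice)
    predicted = b0 + point @ b_main
    print(f"step {step}: x = {np.round(point, 3)}, predicted response = {predicted:.2f}")
```

In practice each step along this path would be run as a new experiment, and the ascent would stop when the observed response no longer improves; the printed predictions here only illustrate the direction of movement implied by the first-order fit.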
This sequential experimental process is usually performed within some region of the independent variable space called the operability region, experimentation region, or region of interest.

2. Building Empirical Models

2.1 Linear Regression Model

In the practical application of RSM it is necessary to develop an approximating model for the true response surface. The underlying true response surface is typically driven by some unknown physical mechanism. The approximating model is based on observed data from the process or system and is an empirical model. Multiple regression is a collection of statistical techniques useful for building the types of empirical models required in RSM. The first-order multiple linear regression model with two independent variables is

y = β_0 + β_1 x_1 + β_2 x_2 + ε   (2.1)

The independent variables are often called predictor variables or regressors. The term linear is used because Equation (2.1) is a linear function of the unknown parameters β_0, β_1, and β_2. In general, the response variable y may be related to k regressor variables. The model

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + ε   (2.2)

is called a multiple linear regression model with k regressor variables. The parameters β_j, j = 0, 1, ..., k, are called the regression coefficients. Models that are more complex in appearance than Equation (2.2) may often still be analyzed by multiple linear regression techniques. For example, consider adding an interaction term to the first-order model in two variables:

y = β_0 + β_1 x_1 + β_2 x_2 + β_12 x_1 x_2 + ε   (2.3)

As another example, consider the second-order response surface model in two variables:

y = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1² + β_22 x_2² + β_12 x_1 x_2 + ε   (2.4)

In general, any regression model that is linear in the parameters (the β-values) is a linear regression model, regardless of the shape of the response surface that it generates.

2.2 Estimation of the Parameters in Linear Regression Models

The method of least squares is typically used to estimate the regression coefficients in a multiple linear regression model. Suppose that n > k observations on the response variable are available, say y_1, y_2, ..., y_n. Along with each observed response y_i, we will have an observation on each regressor variable; let x_ij denote the ith observation or level of variable x_j (see Table 1). The model in terms of the observations may be written in matrix notation as

y = Xβ + ε   (2.5)

where y is an n × 1 vector of the observations, X is an n × p matrix of the levels of the independent variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors.

Table 1: Data for Multiple Linear Regression

y      x_1     x_2     ...  x_k
y_1    x_11    x_12    ...  x_1k
y_2    x_21    x_22    ...  x_2k
...    ...     ...     ...  ...
y_n    x_n1    x_n2    ...  x_nk

We wish to find the vector of least squares estimators, b, that minimizes

L = Σ_{i=1}^{n} ε_i² = ε'ε = (y − Xβ)'(y − Xβ)   (2.6)

After some simplifications, the least squares estimator of β is

b = (X'X)⁻¹ X'y   (2.7)

It is easy to see that X'X is a p × p symmetric matrix and X'y is a p × 1 column vector. The matrix X'X has a special structure: the diagonal elements of X'X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross-products of the elements in the columns of X. Furthermore, the elements of X'y are the sums of cross-products of the columns of X and the observations {y_i}. The fitted regression model is

ŷ = Xb   (2.8)

In scalar notation, the fitted model is

ŷ_i = b_0 + Σ_{j=1}^{k} b_j x_ij,   i = 1, 2, ..., n

The difference between the observation y_i and the fitted value ŷ_i is a residual, e_i = y_i − ŷ_i.
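As a small illustration of Equations (2.5)-(2.8), the sketch below (ours, with hypothetical data) builds the design matrix for a second-order model in two coded variables and computes b = (X'X)⁻¹X'y, the fitted values, and the residuals; in practice a numerically stabler least squares solver would be preferred to forming the inverse of X'X explicitly.

```python
import numpy as np

# Hypothetical central composite design in two coded variables and a response.
x1 = np.array([-1, 1, -1, 1, -1.414, 1.414, 0.0, 0.0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0.0, 0.0, -1.414, 1.414, 0, 0])
y  = np.array([54.3, 60.1, 57.2, 65.8, 55.0, 63.9, 56.1, 62.4, 61.0, 60.4])

# Design matrix X for the second-order model (2.4):
# columns 1, x1, x2, x1^2, x2^2, x1*x2, so p = 6 parameters.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Least squares estimator b = (X'X)^{-1} X'y, Equation (2.7).
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b            # fitted values y_hat = Xb, Equation (2.8)
e = y - y_hat            # residuals e_i = y_i - y_hat_i
SS_E = e @ e             # residual sum of squares used in Section 2.3

print("b =", np.round(b, 3))
print("SS_E =", round(SS_E, 4))
```

A library routine such as numpy.linalg.lstsq would give the same b while avoiding the explicit normal equations, which matters when X'X is ill-conditioned, a point the report returns to in its discussion of multicollinearity.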
The n × 1 vector of residuals is denoted by

e = y − ŷ   (2.9)

2.3 Model Adequacy Checking

It is always necessary to

1. Examine the fitted model to ensure that it provides an adequate approximation to the true system;
2. Verify that none of the least squares regression assumptions are violated.

Now we consider several techniques for checking model adequacy.

2.3.1 Properties of the Least Squares Estimators

The method of least squares produces an unbiased estimator of the parameter β in the multiple linear regression model. An important quantity is the sum of squares of the residuals

SS_E = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i − ŷ_i)² = e'e   (2.10)

Because X'Xb = X'y, we can derive a computational formula for SS_E:

SS_E = y'y − b'X'y   (2.11)

Equation (2.11) is called the error or residual sum of squares. It can be shown that an unbiased estimator of σ² is

σ̂² = SS_E / (n − p)   (2.12)

where n is the number of observations and p is the number of regression coefficients. The total sum of squares is

SS_T = y'y − (Σ_{i=1}^{n} y_i)² / n   (2.13)

Then the coefficient of multiple determination R² is defined as

R² = 1 − SS_E / SS_T   (2.14)

R² is a measure of the amount of reduction in the variability of y obtained by using the regressor variables x_1, x_2, ..., x_k in the model. From inspection of the analysis of variance identity (Equation (2.14)) we can see that 0 ≤ R² ≤ 1. However, a large value of R² does not necessarily imply that the regression model is a good one. Adding a variable to the model will always increase R², regardless of whether the additional variable is statistically significant or not. Thus it is possible for models that have large values of R² to yield poor predictions of new observations or estimates of the mean response. Because R² always increases as we add terms to the model, some regression model builders prefer to use an adjusted R² statistic defined as

R²_adj = 1 − [SS_E / (n − p)] / [SS_T / (n − 1)] = 1 − ((n − 1)/(n − p)) (1 − R²)   (2.15)

In general, the adjusted R² statistic will not always increase as variables are added to the model. In fact, if unnecessary terms are added, the value of R²_adj will often decrease. When R² and R²_adj differ dramatically, there is a good chance that nonsignificant terms have been included in the model. We are frequently interested in testing hypotheses on the individual regression coefficients. Such tests would be useful in determining the value of each of the regressor variables in the regression model. For example, the model might be more effective with the inclusion of