A simulation study to investigate the use of cutoff values for assessing model fit in covariance structure models

Subhash Sharma a,*, Soumen Mukherjee b, Ajith Kumar c, William R. Dillon d

a Moore School of Business, University of South Carolina, Columbia, SC 29208, USA
b MAPS Inc., Waltham, MA, USA
c Arizona State University, Tempe, AZ, USA
d Southern Methodist University, Dallas, TX, USA

Received 3 January 2002; accepted 14 October 2003

Abstract

In this paper, we used simulations to investigate the effect of sample size, number of indicators, factor loadings, and factor correlations on the frequencies of acceptance/rejection of models (true and misspecified) when selected goodness-of-fit indices were compared with prespecified cutoff values. We found that the percent of true models accepted when a goodness-of-fit index was compared with a prespecified cutoff value was affected by the interaction of sample size and the total number of indicators. In addition, for the Tucker-Lewis index (TLI) and the relative noncentrality index (RNI), model acceptance percentages were affected by the interaction of sample size and size of factor loadings. For misspecified models, model acceptance percentages were affected by the interaction of the number of indicators and the degree of model misspecification. This suggests that researchers should use caution in using cutoff values for evaluating model fit. However, the study suggests that researchers who prefer to use prespecified cutoff values should use TLI, RNI, NNCP, and root-mean-square-error-of-approximation (RMSEA) to assess model fit. The use of GFI should be discouraged.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Structural equation modeling; Confirmatory factor analysis; Goodness-of-fit indices; Simulation

1. Introduction

The evaluation of covariance structure models is typically carried out in two stages: (1) an evaluation of overall model fit and (2) evaluations of specific parts or aspects of the model, such as the measurement properties of indicators and/or the strength of structural relationships. The chi-square test statistic was among the first indices proposed to evaluate overall model fit to the data in a statistical sense. As with most statistical tests, the power of the chi-square test increases with sample size. Because nonrejection of the model specified under the null hypothesis is typically the desired outcome in covariance structure analysis, the rejection of models by the chi-square test in large samples, even for trivial differences between the sample and the estimated covariance matrices, soon came to be perceived as problematic (Bentler and Bonett, 1980; Tucker and Lewis, 1973). In response to this “sample-size” problem of the chi-square test statistic, several alternative goodness-of-fit indices were proposed for evaluating overall model fit. In turn, a number of simulation studies evaluated the sensitivity of these indices to variations in sample size (e.g., Anderson and Gerbing, 1984; Bearden et al., 1982; Bentler, 1990; Marsh et al., 1988).

In their comprehensive, integrative review of various goodness-of-fit indices, McDonald and Marsh (1990) concluded that only four indices were relatively insensitive to sample size: the noncentrality parameter (NCP) of McDonald (1989) and a normed version thereof (NNCP), the relative noncentrality index (RNI), and the Tucker-Lewis index (TLI). An index is defined to be insensitive to sample size if the expected value of its sampling distribution is not affected by sample size. However, researchers typically evaluate model fit by comparing the value of some goodness-of-fit index with a prespecified cutoff value.
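The sample-size problem of the chi-square test can be illustrated with a back-of-the-envelope calculation. The sketch below is not from the paper: it assumes the usual noncentral chi-square approximation, under which the expected test statistic for a model with a fixed population discrepancy F0 is roughly df + (N − 1)·F0, and it uses a hypothetical model with 19 degrees of freedom and a small F0 of 0.02. The critical value 30.144 is the upper 5% point of the chi-square distribution with 19 df, taken from standard tables. The noncentrality estimate (χ² − df)/N shown alongside is one common scaling, used here only for illustration.

```python
# Hypothetical illustration: a fixed, small population misfit F0 and its
# expected chi-square statistic at the paper's four sample sizes.
DF = 19               # hypothetical model degrees of freedom
CRITICAL_05 = 30.144  # upper 5% point of chi-square(19), from standard tables
F0 = 0.02             # hypothetical population discrepancy, fixed across N


def expected_chi_square(n: int, df: int = DF, f0: float = F0) -> float:
    """Approximate mean of the test statistic: df + noncentrality (n - 1) * f0."""
    return df + (n - 1) * f0


def noncentrality_estimate(chi_square: float, n: int, df: int = DF) -> float:
    """Sample-size-scaled noncentrality, (chi-square - df) / n."""
    return (chi_square - df) / n


for n in (100, 200, 400, 800):
    chi = expected_chi_square(n)
    decision = "reject" if chi > CRITICAL_05 else "retain"
    print(f"N={n}: E[chi2]={chi:.2f} -> {decision}; "
          f"d={noncentrality_estimate(chi, n):.4f}")
```

The same small misfit is retained at N = 100 but rejected at N = 800, while the scaled noncentrality stays close to 0.02 at every sample size, which is the sense in which NCP-based indices are insensitive to sample size.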
Journal of Business Research 58 (2005) 935–943
0148-2963/$ – see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.jbusres.2003.10.007
* Corresponding author. Tel.: +1-803-777-4912; fax: +1-803-777-6876. E-mail address: sharma@moore.sc.edu (S. Sharma).

Based on the results of a recent simulation study, Hu and Bentler (1998, 1999) suggest that a cutoff value close to 0.95 for TLI or RNI, a cutoff value close to 0.90 for NNCP, or a cutoff value of 0.06 for the root-mean-square error of approximation (RMSEA; Steiger and Lind, 1980; Steiger, 1990) is needed before one can claim good fit of the model to the data. However, they caution against relying on a specific cutoff value, because the indices may be affected by such factors as sample size, estimation method, and the distribution of the data. Furthermore, finding that the expected value of an index is independent of sample size does not logically imply that the percentage of index values exceeding the cutoff value is also independent of sample size. Therefore, it is quite possible that even if the expected value of an index is unaffected by sample size, the relative frequencies of model acceptance and rejection under a prespecified cutoff value could depend on sample size.
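The logical gap between an unbiased expected value and a stable acceptance rate can be seen with two hypothetical sampling distributions that share the same mean but differ in spread, as the distributions of an index in a larger and a smaller sample might. The sketch assumes, purely for illustration, that the index is normally distributed; the means and standard deviations are invented, not taken from the study.

```python
from statistics import NormalDist

CUTOFF = 0.90  # conventional cutoff for normed fit indices


def acceptance_rate(mean: float, sd: float, cutoff: float = CUTOFF) -> float:
    """Share of a normal sampling distribution that exceeds the cutoff."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)


# Same expected value (0.95), different spreads: hypothetical values only.
larger_sample = acceptance_rate(mean=0.95, sd=0.02)
smaller_sample = acceptance_rate(mean=0.95, sd=0.08)
print(f"larger sample: {larger_sample:.3f}  smaller sample: {smaller_sample:.3f}")
```

With these numbers, roughly 99% of the larger-sample values clear 0.90 versus roughly 73% of the smaller-sample values, even though the index is unbiased in both cases.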
If acceptance frequencies do depend on sample size, the use of a universal cutoff value may be inappropriate, because replication studies of a given model using different sample sizes could lead to different conclusions regarding model acceptance or rejection.

In addition, for a given sample size, the relative frequencies of model acceptance and rejection may vary with the number of indicators in the model, which is typically a function of the number of constructs or factors in the model. However, even for a given number of constructs, the number of indicators can vary because shorter or longer versions of previously developed scales are used.

The objective of this paper, therefore, is to use simulation to empirically assess the effects of factors such as sample size and number of indicators on goodness-of-fit indices and, more importantly, on the use of prespecified cutoff values for assessing model fit. The effects are assessed for both true and misspecified models. The paper is organized as follows. First, we briefly discuss the goodness-of-fit indices evaluated in this study and their suggested cutoff values. Second, we present the simulation design. Third, we present the results of our simulations. Finally, we discuss the implications of our results for using prespecified cutoff values in acceptance/rejection decisions.

2. Goodness-of-fit indices and their cutoff values

2.1. Goodness-of-fit indices

While several goodness-of-fit indices have been proposed in the literature, this study assesses the following five: the NNCP, the RNI, the TLI, the RMSEA, and the goodness-of-fit index (GFI) of Jöreskog and Sörbom (1982). We now discuss our rationale for including these five indices. First, in an integrative review of several GFIs, McDonald and Marsh (1990) concluded that among the fit indices typically used by researchers, only NCP, NNCP, RNI, and TLI were insensitive to sample size.
We excluded the NCP from our analysis because we did not find it used frequently in substantive research for evaluating model fit, presumably because advocates of this index did not specify cutoff values for its use. Second, Marsh et al. (1988) did not include RMSEA in their simulation study, and neither did McDonald and Marsh (1990) in their integrative review. More recently, however, Browne and Cudeck (1993) suggested using this index to assess model fit. Hu and Bentler (1998) included it in their simulation study and found it to be quite sensitive to model misspecification. Finally, the goodness-of-fit index, although found to be sensitive to sample size in a number of simulation studies, is still used extensively by researchers to assess model fit.

2.2. Cutoff values for assessing model fit

As mentioned earlier, researchers typically compare the computed value of some GFI with a prespecified cutoff value when evaluating model fit. For normed fit indices (i.e., goodness-of-fit index, NNCP, RNI, and TLI), whose values typically range between 0 and 1 with 1 indicating perfect fit, the cutoff value of 0.90 recommended by Bentler and Bonett (1980) is the most popular and the most widely employed. A model is considered to have an unacceptable fit if the value of the fit index is less than 0.90. We used a cutoff value of 0.90 for the NNCP even though McDonald and Marsh (1990) did not prescribe any cutoff for this index. For the RMSEA, whose values do not range between 0 and 1, Browne and Cudeck (1993) suggested that values of 0.05 or less indicate a “close fit,” values of 0.08 or less indicate a “reasonable fit,” and values greater than 0.10 indicate an “unacceptable fit.”

3. Simulation study

Simulation studies were done to assess the effects of sample size, number of indicators, size of factor loadings, and size of factor correlations on the mean values of the selected fit indices and on the percent of models accepted under prespecified cutoff values. Two specifications of correlated two-factor, four-factor, six-factor, and eight-factor confirmatory factor models, each with four indicators per factor, were used. The two-factor model has a total of 8 indicators and one correlation between the two factors. The four-factor model has 16 indicators and six correlations among the four factors. The six-factor model has 24 indicators and 15 correlations among the six factors. The eight-factor model has 32 indicators and 28 correlations among the eight factors. In the first specification, the correct or true model was estimated; that is, the model estimated in the sample was identical to the population model. Such a model should fit the data perfectly, and any lack of fit is attributable to sampling error. In the second specification, the model was misspecified, in that the model estimated in the sample was not the same as the population model. Specifically, the correlations among the factors were not estimated. Misspecified models were included in the study to assess the extent to which the use of cutoff values might result in Type II errors (i.e., the decision to accept the model specified under the null hypothesis as true when an alternative model is the correct one).

4. Simulation methodology

Four factors were systematically varied to create the simulation experimental design: (1) four sample sizes (100, 200, 400, and 800); (2) four numbers of indicators, from 8 to 32 in steps of 8 (i.e., 8, 16, 24, and 32); (3) three factor loadings (.3, .5, and .7); and (4) three correlations among the factors (.3, .5, and .7). Following prior simulation studies, a confirmatory factor analysis (CFA) model was chosen.

4.1. Data generation

The simulation design resulted in a total of 36 different population covariance matrices (3 levels of factor loadings × 3 levels of factor correlations × 4 levels of number of indicators). A total of 100,000 observations were generated from each of the 36 population covariance matrices using the GGNSM procedure (IMSL Library, 1980). From each of the 36 sets of 100,000 observations, 100 replications of each sample size were randomly drawn; that is, 400 samples were drawn from each of the 36 sets of observations. This gave a total of 14,400 samples (3 levels of factor loadings × 3 levels of factor correlations × 4 levels of number of indicators × 4 levels of sample size × 100 replications). A sample covariance matrix was computed from each of the 14,400 samples.

4.2. Model estimated: true models

For each sample, the corresponding true model was estimated, with all parameters, including the correlations among the factors, freely estimated. For a given index, the percent of true models rejected when the index is compared with a prespecified cutoff value gives a measure of the Type I error committed by using that index for model acceptance/rejection decisions.

4.3. Model estimated: misspecified models

As indicated earlier, another objective of our study was to investigate model acceptance/rejection frequencies when cutoff values are used to evaluate the fit of misspecified models. In general, misspecification could occur in countless ways.
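The design in Sections 3 and 4 can be made concrete with a short sketch. It rests on assumptions the text does not spell out: each indicator loads only on its own factor, all loadings equal λ and all factor correlations equal φ, indicators are standardized (error variance 1 − λ²), and factor variances are fixed at 1 for identification. Under these assumptions each population covariance matrix is Σ = ΛΦΛ′ + Θ, and the true and misspecified models differ only in the k(k − 1)/2 factor correlations, which fixes their degrees of freedom. The helper names are hypothetical, and the sketch does not reproduce the authors' IMSL-based data generation.

```python
from itertools import product

PER_FACTOR = 4                      # four indicators per factor
LOADINGS = (0.3, 0.5, 0.7)          # lambda levels
CORRELATIONS = (0.3, 0.5, 0.7)      # phi levels
N_INDICATORS = (8, 16, 24, 32)      # 2, 4, 6, 8 factors
SAMPLE_SIZES = (100, 200, 400, 800)
REPLICATIONS = 100


def population_cov(lam: float, phi: float, p: int) -> list[list[float]]:
    """Sigma = Lambda Phi Lambda' + Theta for standardized indicators (assumed)."""
    factor_of = [i // PER_FACTOR for i in range(p)]
    sigma = []
    for i in range(p):
        row = []
        for j in range(p):
            if i == j:
                row.append(1.0)               # lam**2 + error variance (1 - lam**2)
            elif factor_of[i] == factor_of[j]:
                row.append(lam * lam)         # indicators of the same factor
            else:
                row.append(lam * lam * phi)   # indicators of different factors
        sigma.append(row)
    return sigma


def degrees_of_freedom(p: int, estimate_correlations: bool) -> int:
    """Distinct variances/covariances minus free parameters."""
    k = p // PER_FACTOR
    moments = p * (p + 1) // 2
    free = 2 * p                        # loadings + error variances
    if estimate_correlations:
        free += k * (k - 1) // 2        # factor correlations (true model only)
    return moments - free


cells = list(product(LOADINGS, CORRELATIONS, N_INDICATORS))
n_samples = len(cells) * len(SAMPLE_SIZES) * REPLICATIONS
print(len(cells), "population matrices,", n_samples, "samples")

for p in N_INDICATORS:
    k = p // PER_FACTOR
    print(f"{p} indicators: {k * (k - 1) // 2} correlations, "
          f"df true = {degrees_of_freedom(p, True)}, "
          f"df orthogonal = {degrees_of_freedom(p, False)}")

# Cross-factor covariance that an orthogonal (misspecified) model forces to zero.
for lam, phi in ((0.3, 0.3), (0.5, 0.5), (0.7, 0.7)):
    print(f"lambda={lam}, phi={phi}: omitted covariance {lam * lam * phi:.3f}")
```

Running it reproduces the design counts stated in Section 4.1 (36 matrices, 14,400 samples) and the correlation counts of Section 3 (1, 6, 15, 28). For the three loading/correlation combinations used for the misspecified models (listed in Section 4.3), the omitted covariance λ²φ is about .027, .125, and .343, consistent with their ordering from smallest to largest misspecification.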
Because our main concern was to assess how the fit indices behave for misspecified models, and to keep the simulation study to a manageable size, we chose a subset of misspecifications spanning a wide range of overall lack of fit. The models chosen were those that resulted from systematically not estimating the correlations among the factors. Specifically, misspecified models were operationalized by positing orthogonal models for each of the following combinations: (1) λ = .3, φ = .3; (2) λ = .5, φ = .5; and (3) λ = .7, φ = .7, where λ and φ denote factor loadings and factor correlations, respectively. These combinations represent varying degrees of model misspecification, with the first combination resulting in the smallest amount of misspecification and the third in the largest.

For each estimated model (true and misspecified), the five goodness-of-fit indices discussed earlier were computed. In addition, for each goodness-of-fit index, the percent of times the fitted models were accepted was computed for each cell of the simulation design on the basis of a prespecified cutoff value (values exceeding 0.90 for NNCP, TLI, RNI, and goodness-of-fit index, and values below 0.05 for RMSEA). The percent of misspecified models accepted when the index is compared with a prespecified cutoff value gives a measure of the Type II error committed by using that index for model acceptance/rejection decisions.

5. Results

In Monte Carlo simulations of covariance structure models, some of the samples analyzed inevitably yield improper solutions, in which one or more of the parameter estimates are inadmissible (e.g., zero or negative error variances, or standardized factor loadings or interfactor correlations exceeding one).
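A check for the inadmissible estimates listed above might look like the following minimal sketch. The `Solution` container and the `is_improper` and `percent_improper` helpers are hypothetical, not part of any SEM package; in practice the estimates would come from whatever software fitted the model, and the two example fits are invented.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """Hypothetical container for one fitted model's estimates."""
    error_variances: list[float]
    std_loadings: list[float]
    factor_correlations: list[float]


def is_improper(sol: Solution) -> bool:
    """Flag zero/negative error variances or standardized values beyond +/-1."""
    if any(v <= 0.0 for v in sol.error_variances):
        return True
    if any(abs(x) > 1.0 for x in sol.std_loadings):
        return True
    return any(abs(r) > 1.0 for r in sol.factor_correlations)


def percent_improper(solutions: list[Solution]) -> float:
    """Share of replications with an inadmissible estimate, in percent."""
    return 100.0 * sum(is_improper(s) for s in solutions) / len(solutions)


fits = [
    Solution([0.51, 0.51], [0.70, 0.70], [0.30]),   # admissible
    Solution([-0.02, 0.75], [1.01, 0.50], [0.30]),  # Heywood-type case
]
print(percent_improper(fits))
```

Consistent with the procedure described in the text, flagged replications would be counted and kept in the analysis rather than discarded.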
Such improper solutions would be discarded in substantive research contexts, where typically a single sample covariance matrix is analyzed. In evaluating Monte Carlo results, however, it is important to include them, because the sampling distribution being evaluated within each treatment of the simulation design includes all the sample covariance matrices that are generated. Improper solutions occurred in 0.08% of the true-model estimations and in 5.69% of the misspecified-model estimations. Consistent with the results of previous simulations, a majority of the improper solutions occurred for small sample sizes (N = 100 and 200); there were no improper solutions for samples of size 800.

To assess the effect of the manipulated factors, the data were analyzed using ANOVA, and the effect size, η², was computed. The η² associated with each estimated effect represents the percent of variance in the dependent variable that is accounted for by that effect after accounting for the impact of all other effects. Because of the large sample sizes, many effects that are practically insignificant (as measured by η²) will be statistically significant. Consequently, we present results and discussion only for factors that are statistically significant and whose η² is greater than 3% (Anderson and Gerbing, 1984; Sharma et al., 1989); these effects will be referred to as significant effects.

5.1. True models

5.1.1. Goodness-of-fit indices

As indicated earlier, we performed a 3 × 3 × 4 × 4 (Factor Correlations × Factor Loadings × Sample Size × Number of Indicators) ANOVA, with each GFI as the dependent variable.
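The η² screening described above can be sketched for a single main effect. The sketch uses the classical definition, η² = SS_effect / SS_total, in a balanced design; the text does not say whether a partial variant was used. It shows a toy two-factor layout with invented responses, applies only the 3% threshold (not the accompanying significance test), and the `eta_squared` helper is hypothetical. A full analysis would include all four factors and their interactions.

```python
from statistics import fmean


def eta_squared(cells: dict[tuple[str, str], list[float]], factor: int) -> float:
    """Classical eta-squared (SS_effect / SS_total) for one main effect.

    `cells` maps a (level of factor 0, level of factor 1) pair to the
    observations in that cell; the design is assumed to be balanced.
    """
    values = [x for obs in cells.values() for x in obs]
    grand = fmean(values)
    ss_total = sum((x - grand) ** 2 for x in values)

    groups: dict[str, list[float]] = {}
    for levels, obs in cells.items():
        groups.setdefault(levels[factor], []).extend(obs)
    ss_effect = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    return ss_effect / ss_total


# Toy data: (sample size, number of indicators) -> hypothetical responses.
toy = {
    ("N=100", "8"): [1.0, 3.0],
    ("N=100", "32"): [1.0, 3.0],
    ("N=800", "8"): [5.0, 7.0],
    ("N=800", "32"): [5.0, 7.0],
}
for name, factor in (("sample size", 0), ("indicators", 1)):
    eta = eta_squared(toy, factor)
    print(f"{name}: eta^2 = {eta:.3f}, above 3% = {eta > 0.03}")
```

In this toy layout all the variation lies between the two sample-size levels, so sample size receives η² = 0.8 and the number of indicators receives 0.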
Table 1 presents the significant results. The following conclusions can be drawn from the table: (1) the Sample Size × Number of Indicators interaction (N × NI interaction) is the only significant interaction, and it is significant only for NNCP and goodness-of-fit index; (2) the size of factor loadings and the size of correlations among the factors do not affect any of the goodness-of-fit indices; (3) sample size affects NNCP, RMSEA, and goodness-of-fit index; and (4) the number of indicators affects only NNCP and goodness-of-fit index. To gain further insight into these effects, we examine the means and standard deviations of the goodness-of-fit indices for the combinations of sample size and number of indicators (the cells corresponding to the N × NI interaction). Table 2 presents these means and standard deviations. RMSEA is not substantially affected by sample size and, irrespective of the number of indicators, its mean values appear to be essentially the same for sample sizes of 200 and over. For NNCP and goodness-of-fit index, the effect of sample size becomes more prominent as the number of indicators increases. The mean values for the NNCP reveal the nature of the interaction, and also why McDonald and Marsh (1990) and Marsh et al. (1988) found this index to be insensitive to sample size: if the analysis were restricted to models with 8 or 16 indicators, the NNCP would be insensitive to sample size in our study as well. The inconsistency arises from including models with larger numbers of indicators (i.e., 24 and 32 indicators) in our simulation. While the mean values suggest that RNI and TLI are affected by sample size when the number of indicators is large, this effect is not significant, a conclusion consistent with previous studies.
However, the nonsignificance is probably due to the fact that the standard deviations of these two indices are relatively large compared with those of the other three. McDonald and Marsh (1990) noted that RNI and TLI are normed in the population (that is, they assume values between 0 and 1) but not in the sample, especially for small sample sizes. Bentler (1990) noted that the range of TLI is large, especially for small samples. In fact, for a sample size of 100, the range of TLI was as high as 322.78 (a low value of −17.89 and a high value of 304.89), and the range of RNI was as high as 341.60 (a low value of −18.99 and a high value of 322.61). These “outliers” would obviously affect the significance tests. An examination of the outliers suggests that most of them occur for cases with small factor loadings (i.e., .3) and small sample sizes (i.e., 100). We can only speculate as to why only these two indices (out of the five) exhibit such large fluctuations. A reasonable conjecture is that these two indices, in contrast to the other three, are essentially ratios of two statistics derived from the null and true models. Therefore, these indices are affected by the badness of fit of the null model as well as by the goodness of fit of the hypothesized model. This conjecture is further supported by the fact that these two indices are undefined in the population if the null model is true, suggesting that they would be extremely unstable in samples if the null model is approximately true (McDonald and Marsh, 1990). This problem is obviously exacerbated in small samples.

Table 1
Eta-squares for mean value of GFIs and percent of times models accepted for true models

Effect                              NNCP     RMSEA    RNI      TLI      GFI
Sample size (N)                  a  0.227    0.284    –*       –        0.633
                                 b  0.459    0.603    0.310    0.304    0.525
Number of indicators (NI)        a  0.151    –        –        –        0.256
                                 b  0.225    0.113    –        –        0.187
Factor loadings (L)              a  –        –        –        –        –
                                 b  –        –        0.386    0.378    –
Factor correlations (P)          a  –        –        –        –        –
                                 b  –        –        –        –        –
Sample size × Number of          a  0.221    –        –        –        0.095
  indicators (N × NI)            b  0.309    0.208    0.055    0.056    0.289
Sample size × Loadings (N × L)   a  –        –        –        –        –
                                 b  –        –        0.168    0.175    –

a Eta-square for goodness-of-fit indices.
b Eta-square for percent of times true models accepted for a cutoff value of 0.90 (0.05 for RMSEA).
* Not significant at P ≤ .05.

Table 2
Means and standard deviations of the GFIs for true models

Number of indicators and sample size interaction (N × NI)
                  8 indicators               16 indicators              24 indicators              32 indicators
Sample size    100   200   400   800      100   200   400   800      100   200   400   800      100   200   400   800
NNCP   Mean   1.00  1.00  1.00  1.00     0.97  0.99  1.00  1.00     0.88  0.97  0.99  1.00     0.72  0.93  0.98  1.00
       SD     0.04  0.02  0.01  0.01     0.08  0.04  0.02  0.01     0.11  0.06  0.03  0.02     0.13  0.08  0.04  0.02
RMSEA  Mean   0.02  0.01  0.01  0.01     0.02  0.01  0.01  0.01     0.03  0.01  0.01  0.00     0.04  0.01  0.01  0.00
       SD     0.03  0.02  0.01  0.01     0.02  0.01  0.01  0.01     0.02  0.01  0.01  0.01     0.01  0.01  0.01  0.00
RNI    Mean   1.02  1.00  1.00  1.00     1.26  1.00  1.00  1.00     0.87  0.96  0.99  1.00     0.77  0.94  0.98  1.00
       SD     2.19  1.52  0.15  0.06    10.79  0.40  0.09  0.04     0.34  0.14  0.09  0.04     0.22  0.15  0.07  0.04
TLI    Mean   1.02  1.00  1.01  1.00     1.26  1.00  1.00  1.00     0.87  0.96  0.99  1.00     0.77  0.94  0.99  1.00
       SD     1.98  1.37  0.13  0.05    10.19  0.38  0.08  0.04     0.33  0.14  0.08  0.04     0.21  0.15  0.07  0.03
GFI    Mean   0.93  0.96  0.98  0.99     0.86  0.93  0.96  0.98     0.81  0.89  0.94  0.97     0.76  0.86  0.93  0.96
       SD     0.02  0.01  0.00  0.00     0.01  0.01  0.00  0.00     0.01  0.01  0.00  0.00     0.01  0.01  0.00  0.00

Values of TLI and RNI for models whose factor loadings are .50 or .70
RNI    Mean   1.00  1.00  1.00  1.00     0.98  1.00  1.00  1.00     0.94  0.99  1.00  1.00     0.89  0.98  0.99  1.00
       SD     0.10  0.04  0.02  0.01     0.07  0.04  0.02  0.01     0.07  0.03  0.02  0.01     0.08  0.04  0.02  0.01
TLI    Mean   1.00  1.00  1.00  1.00     0.98  1.00  1.00  1.00     0.94  0.99  1.00  1.00     0.89  0.97  0.99  1.00
       SD     0.11  0.04  0.02  0.01     0.07  0.04  0.02  0.01     0.07  0.04  0.02  0.01     0.08  0.04  0.02  0.01

For each index, the top row gives the means and the bottom row the standard deviations.
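The conjecture that TLI and RNI inherit instability from the null model can be checked against their usual definitions. The sketch below uses standard textbook forms, which the paper does not print: RNI = 1 − (χ²_t − df_t)/(χ²_0 − df_0), TLI = (χ²_0/df_0 − χ²_t/df_t)/(χ²_0/df_0 − 1), RMSEA = √(max(χ²_t − df_t, 0)/(df_t(N − 1))), and, as an assumption about what NNCP denotes, McDonald's centrality index exp(−(χ²_t − df_t)/(2N)). All χ² values are hypothetical; df_0 = 28 is the independence-model df for 8 indicators. When the null model fits nearly as well as its degrees of freedom imply (as it might with loadings of .3), the denominators of RNI and TLI approach zero.

```python
import math


def rni(chi_t: float, df_t: int, chi_0: float, df_0: int) -> float:
    """Relative noncentrality index (not bounded to [0, 1] in samples)."""
    return 1.0 - (chi_t - df_t) / (chi_0 - df_0)


def tli(chi_t: float, df_t: int, chi_0: float, df_0: int) -> float:
    """Tucker-Lewis index."""
    return (chi_0 / df_0 - chi_t / df_t) / (chi_0 / df_0 - 1.0)


def rmsea(chi_t: float, df_t: int, n: int) -> float:
    """Root-mean-square error of approximation (one common form)."""
    return math.sqrt(max(chi_t - df_t, 0.0) / (df_t * (n - 1)))


def nncp(chi_t: float, df_t: int, n: int) -> float:
    """Assumed NNCP: McDonald's centrality index exp(-(chi2 - df) / (2N))."""
    return math.exp(-(chi_t - df_t) / (2 * n))


N = 100
# Hypothetical chi-squares: target model df = 19, independence (null) model df = 28.
informative_null = dict(chi_t=30.0, df_t=19, chi_0=300.0, df_0=28)
nearly_true_null = dict(chi_t=30.0, df_t=19, chi_0=35.0, df_0=28)
overfitting_target = dict(chi_t=15.0, df_t=19, chi_0=35.0, df_0=28)

for label, case in (("informative null", informative_null),
                    ("null nearly true", nearly_true_null),
                    ("chi2 below df", overfitting_target)):
    print(f"{label}: RNI={rni(**case):.3f} TLI={tli(**case):.3f} "
          f"RMSEA={rmsea(case['chi_t'], case['df_t'], N):.3f} "
          f"NNCP={nncp(case['chi_t'], case['df_t'], N):.3f}")
```

In the first case both RNI and TLI sit close to 0.95; in the other two, comparable target-model fit yields RNI and TLI well below 0 or above 1, whereas RMSEA and the assumed NNCP do not involve the null model at all.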
To determine whether the behavior of TLI and RNI changes when factor loadings are .5 or greater, we reanalyzed the data after deleting the models whose factor loadings are .30. The results indicated that the sample size, the number of indicators, and their interaction were significant (values of η² for the N × NI interaction are 0.085 and 0.089 for RNI and TLI, respectively; values of η² for sample size are 0.124 and 0.128 for RNI and TLI, respectively; and values of η² for the number of indicators are 0.058 and 0.062 for RNI and TLI, respectively). Table 2 also gives the means and standard deviations for models whose factor loadings are .50 or .70. The behavior of RNI and TLI is similar to that of GFI and NNCP; however, these two indices do not seem to be substantially affected by sample size and number of indicators.

The results for the mean values of the indices in Table 2 can be summarized as follows. The RMSEA is the least affected index and is insensitive to sample size for sample sizes of over 200. Goodness-of-fit index and NNCP are insensitive to sample size above some threshold sample size; however, this threshold likely varies monotonically with the number of manifest indicators in the model and, furthermore, may not be the same for all the indices. That is, for a given index, the sample size at which the index becomes insensitive to sample size could be a function of the number of indicators. The behavior of TLI and RNI is erratic for models with small factor loadings (i.e., .30). When these models are deleted, the behavior of TLI and RNI is similar to that of goodness-of-fit index and NNCP, in that TLI and RNI are affected by sample size, and the effect depends on the number of indicators.
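The accept/reject rule of Section 4.3 (values exceeding 0.90 for NNCP, TLI, RNI, and GFI; values below 0.05 for RMSEA) and the resulting percentages can be sketched as follows. For true models, 100 minus the percent accepted estimates the Type I error rate; for misspecified models, the percent accepted estimates the Type II error rate. The function names and the replicate values are hypothetical.

```python
CUTOFFS = {"NNCP": 0.90, "TLI": 0.90, "RNI": 0.90, "GFI": 0.90, "RMSEA": 0.05}


def accepted(index: str, value: float) -> bool:
    """Apply the cutoff rule: RMSEA must fall below, the others must exceed."""
    if index == "RMSEA":
        return value < CUTOFFS[index]
    return value > CUTOFFS[index]


def percent_accepted(index: str, values: list[float]) -> float:
    """Percent of replications in one design cell whose model is accepted."""
    return 100.0 * sum(accepted(index, v) for v in values) / len(values)


# Hypothetical replicate values for one cell in which the model is true.
tli_values = [0.95, 0.85, 1.02, 0.91]
rmsea_values = [0.02, 0.06, 0.01, 0.04]

type_one_tli = 100.0 - percent_accepted("TLI", tli_values)
type_one_rmsea = 100.0 - percent_accepted("RMSEA", rmsea_values)
print(f"Type I error: TLI {type_one_tli:.0f}%, RMSEA {type_one_rmsea:.0f}%")
```

Following the "exceeding 0.90" wording, a value of exactly 0.90 is treated as a rejection in this sketch.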
The question, then, is whether the effects of sample size, number of indicators, factor loadings, and factor correlations are the same when these indices are used to make model acceptance/rejection decisions by comparing an index value with a prespecified cutoff value. That is, what is the impact of the manipulated factors on the Type I error, the error of rejecting the model when it is in fact true?

5.1.2. Percent of models accepted

For each of the 144 cells or conditions defined by sample size (four levels), number of indicators (four levels), factor loadings (three levels), and factor correlations (three levels), the percent of models accepted was computed for each index. The acceptance/rejection decision was made by comparing the value of the index with a prespecified cutoff value (0.90 for NNCP, RNI, TLI, and GFI, and 0.05 for RMSEA). The percent of models accepted was the dependent variable in a 4 × 4 × 3 × 3 ANOVA. Because each cell contains a single observation, the fourth-order interaction was used as the error term for significance tests. Table 1 also gives the η² of the effects. The following conclusions can be drawn from the table: (1) the interaction of sample size with the number of indicators (N × NI) is even more pronounced for the percent of times the true model is accepted than for the mean value of the fit index; this interaction is significant for all the indices, whereas for the mean value of the fit index it was not significant for RMSEA, RNI, and TLI; (2) the Sample Size × Size of Loading (N × L) interaction is significant for RNI and TLI, although this interaction was not present for the mean values of the indices; (3) the main effects of sample size are significant for all the indices; (4) the main effects of the number of indicators are significant for NNCP, RMSEA, and goodness-of-fit index; and (5) the main effects of factor loadings are significant for RNI and TLI.

To gain further insight into these effects, we present in Table 3 the percent of times true models are accepted, by number of indicators, for the significant effects listed above. It is clear from Table 3 that the behavior of the goodness-of-fit index is the most aberrant, with substantial sample-size effects when the number of indicators is large, which points to the need to reconsider its continued use in model evaluation. TLI and RNI are affected by sample size, and these effects depend on the number of indicators. The behavior of these two indices is extremely good for models with factor loadings of .5 or above and sample sizes of 200 or above; for these models, the effects of sample size and number of indicators are practically nonexistent. For the NNCP, on the other hand, sample-size effects depend on the number of indicators. The effects of sample size and number of indicators appear to be smallest for RMSEA.

The findings so far suggest that the percent of models accepted (when an index is compared with a cutoff value) is affected by the interaction of sample size with the number of indicators. In addition, RNI and TLI are affected by the two-way interaction of sample size and size of factor loadings; however, these effects are very small for models whose factor loadings are .5 or above. When used for evaluating model fit relative to some cutoff value, RMSEA emerges as the most promising candidate, and the RNI and