Variation of Preference Inconsistency When Applying Ratio and Interval Scale Pairwise Comparisons

SUSANNA SIRONEN a*, PEKKA LESKINEN a, ANNIKA KANGAS b and TEPPO HUJALA c

a Finnish Environment Institute (SYKE), Joensuu, Finland
b Faculty of Agriculture and Forestry, Department of Forest Sciences, University of Helsinki, Helsinki, Finland
c Finnish Forest Research Institute, Vantaa Unit, Vantaa, Finland

ABSTRACT

Several studies on numerical rating in discrete choice problems address the tendency of inconsistencies in decision makers' measured preferences. This is partly due to true inconsistencies in preferences or the decision maker's uncertainty about what he or she really wants. This uncertainty may be reflected in the elicited preferences in different ways depending on the questions asked and the methods used in deriving the preferences for alternatives. Some part of the inconsistency is due to only having a discrete set of possible judgments. This study examined the variation of preference inconsistency when applying different pairwise preference elicitation techniques in a five-item discrete choice problem. The study data comprised preferences for five career alternatives elicited by applying interval scale and numerically and verbally anchored ratio scale pairwise comparisons. A statistical regression technique was used to analyse the differences in inconsistency between the tested methods. The resulting relative residual variances showed that the interval scale comparison technique produced the greatest variation of inconsistencies between respondents, thus being the most sensitive to inconsistency in preferences. The numeric ratio scale comparison gave the most uniform preferences between the respondents. The verbal ratio scale comparison fell between the other two when relative residual variances were considered. However, the verbal ratio scale comparison had a weaker ability to differentiate the alternatives.
The results indicated that the decision recommendation may not be sensitive to the selection between these preference elicitation methods in this kind of five-item discrete choice problem. The numeric ratio scale comparison technique seemed to be the most suitable method to reveal the decision makers' true preferences. However, to confirm this result, more study is needed, with attention paid to users' comprehension and learning in the course of the experiment. Copyright © 2013 John Wiley & Sons, Ltd.

KEY WORDS: decision support; MCDA; inconsistency; measurement scales; preference elicitation

1. INTRODUCTION

Multi-objective decision making often requires the comparison of qualitatively different entities (e.g. Alho et al., 2001). Decision analysis is a theory designed to help the decision maker make a choice from a set of alternatives. The purpose of decision analysis is to support decision making in problems that are too complex to be solved by common sense alone (e.g. Schmoldt et al., 2001). Multi-criteria decision analysis (MCDA) is a discipline aimed at supporting decision makers faced with multiple goals. In many MCDA cases, decision criteria are conflicting because, typically, all goals cannot be reached concurrently (e.g. a large and affordable flat with good quality and a preferred location). The central problem is how to evaluate a set of alternatives in terms of a number of criteria. Multi-objective decisions under certainty require the decision maker to define the trade-offs between decision criteria or attributes (e.g. Kangas et al., 2008). There is no correct answer as to what values the decision maker should use, because the trade-offs are subjective (e.g. Keeney and Raiffa, 1976; Kangas et al., 2008).
Multi-objective decision making under uncertainty is the most challenging situation, because there may be uncertainty in all of the parameters of the decision analysis (Kangas et al., 2008).

Many MCDA methods with different definitions and assumptions have been developed, including methods based on multi-attribute utility theory (Keeney and Raiffa, 1976), outranking (e.g. Roy, 1968; Brans et al., 1986; Rogers and Bruen, 1998) and the analytic hierarchy process (AHP) (Saaty, 1980). The MCDA methods differ in many respects, for example, with respect to the preference elicitation techniques employed (Aloysius et al., 2006). Saaty's (1977, 1980) AHP is one of the most widely used techniques developed for multi-criteria decision analysis (e.g. Schmoldt et al., 2001; Temesi, 2010). In the AHP and its generalization, the analytic network process, hierarchies or feedback networks are constructed to describe the decision structure (e.g. Saaty, 2001a, 2001b; Kangas and Kangas, 2002; Wolfslehner et al., 2005). Then pairwise comparisons of the decision alternatives are made at each level of the decision hierarchy to evaluate their relative importance with regard to each element above in the hierarchy (Saaty, 1977). Rating a set of alternatives can be a demanding task even when a single criterion is considered; thus, pairwise comparison methods simplify the problem by concentrating attention on pairs of alternatives to be compared under a given criterion (Limayem and Yannou, 2007).

It is difficult to compare the different rating techniques, as the true ratings of a given decision maker are unobtainable with any method.

*Correspondence to: Susanna Sironen, Finnish Environment Institute (SYKE), P.O. Box 111, FI-80101 Joensuu, Finland. E-mail: susanna.sironen@ymparisto.fi

Journal of Multi-Criteria Decision Analysis, J. Multi-Crit. Decis. Anal. (2013). Published online in Wiley Online Library. DOI: 10.1002/mcda.1500. Received 18 October 2012; Accepted 25 April 2013. Copyright © 2013 John Wiley & Sons, Ltd.
Thus, no benchmark exists against which the quality of the ratings could be calculated. The only available measures of the quality of the ratings are therefore normative, that is, comparisons to standards that are selected on theoretical grounds. There have been many studies of the elicitation techniques applied in MCDA concerning their effects on consistency, reliability and various other normative properties of the decision making; however, there are no consistent results showing that any specific technique would be superior to the others (Aloysius et al., 2006). One normative standard is the consistency of the given ratings. Although decision makers need to be consistent in their ratings, a good rating system should not increase the inconsistency of the ratings. The purpose of this study was to examine the variation of preference inconsistency when applying different pairwise preference elicitation techniques. The aim was to explore whether different elicitation techniques differ in terms of sensitivity to inconsistency and what the implications are for the use of such techniques in multi-criteria decision analysis tasks. The tested methods included verbal ratio scale, numeric ratio scale and interval scale pairwise comparison techniques. The methods were compared through regression analysis of the pairwise comparison data.

2. ANALYSING PAIRWISE COMPARISONS

2.1. Alternative scoring techniques

In a standard pairwise comparison experiment, every alternative is paired with every other alternative in turn. The comparison evaluates the preference intensity of a given pair of alternatives, with potential inconsistency between pairs. The AHP utilizes ratio scale assessment of the decision alternatives (e.g. Saaty, 2001b). The original scoring technique proposed by Saaty (1977) applies a discrete and verbal scoring technique with numerical counterparts 1/9, 1/8, ..., 1/2, 1/1, 2/1, ..., 8/1, 9/1. Other techniques have been developed, for example, by Ma and Zheng (1991), Lootsma (1993) and Salo and Hämäläinen (1997).
The ratio scale scoring technique applied in the AHP has been debated and criticized, for example, by Barzilai (2005), who claims that scale ratios are defined only if there exists an absolute zero for the property under measurement, such as in temperature. In addition to ratio scale assessments, other scoring methods include nominal, ordinal, difference and interval scale assessments (Saaty, 1980).

A nominal scale essentially consists of assigning labels to objects (e.g. Saaty, 1980). In an ordinal scale, only differences in order can be distinguished, not differences in preference intensity. An interval scale utilizes two fixed points of a preference scale. This means that we can interpret differences in distance along the scale. When applying interval scale methods, the value scale is defined by the set of alternatives under consideration. Usually, the most and least preferred alternatives under some criterion create the value scale, and the intermediate alternatives are then evaluated with respect to the specified local scale (Kainulainen et al., 2009). A ratio scale consists not only of equidistant points but also of a meaningful zero point. Ratio scale assessments can be formed both verbally and numerically. Verbal comparison of objects is more common in our everyday lives than numerical comparison. In addition, the use of verbal assessments is intuitively appealing and user-friendly. The intensity of importance may vary from equal importance to absolute importance of one object over another in verbal ratio scale assessments (e.g. Saaty, 1980). The verbal ratio scale comparisons must be converted into numerical ones to derive priorities. For example, applying the AHP, the verbal statements are converted into their numeric counterparts from one to nine (e.g. Saaty, 1980).
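As a minimal sketch of such a verbal-to-numeric conversion (the label strings and the helper function are illustrative, not from the study; only Saaty's five principal anchors are shown, with the even values 2, 4, 6 and 8 serving as intermediate grades):

```python
# Saaty's principal verbal anchors and their numeric counterparts.
SAATY_SCALE = {
    "equal importance": 1,
    "moderate importance": 3,
    "strong importance": 5,
    "very strong importance": 7,
    "absolute importance": 9,
}

def to_ratio(label: str, favours_second: bool = False) -> float:
    """Numeric ratio r_ij for a verbal judgement; the reciprocal is
    used when the judgement favours the second alternative j."""
    r = float(SAATY_SCALE[label])
    return 1.0 / r if favours_second else r
```

For instance, "strong importance" of i over j yields r_ij = 5, while the same judgement in j's favour yields r_ij = 1/5.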
However, theoretically, there is no justification for being restricted only to this kind of verbal gradation and these numbers.

The classical pairwise comparison process applying the ratio scale assumes that the decision maker can compare any decision elements pairwise and provide a numerical value for the ratio of their importance (Mikhailov, 2004). The choice of the specific elicitation technique usually depends on the decision problem, the number of alternatives and the time and money available.

2.2. Inconsistency and preference uncertainty

Subjective preference assessments are often inconsistent, because decision makers may have difficulties in evaluating some of the alternatives, or they are not necessarily consistent in their own pairwise evaluations. It may simply be difficult for the decision maker to express exact numerical estimates of the ratios of importance (Ramik, 2009), and a suitable estimate may not be among the discrete set of choices available.

Temesi (2010) classifies decision makers as informed or uninformed. The former know their preference intensities explicitly; the latter do not know them sufficiently, or the values may not even exist (Temesi, 2010). If decision makers know their preferences, the valuation task involves only revealing these well-defined and pre-existing preferences. Yet the valuation task must be designed so that the respondent is motivated to research his or her preferences and respond truthfully (Payne et al., 1999). If people do not have existing well-defined values for many objects, the valuation task includes constructing the preferences. According to Payne et al.
(1999), expressed preferences generally reflect both a decision maker's basic values for highlighted attributes and the particular heuristics or processing strategies used to combine information selectively to construct the required response in a particular situation. Thus, expressed preferences include both random error and two different sources of systematic variance.

Inconsistency is one element of preference uncertainty. Measured preferences may also be inconsistent because of behavioural or psychological reasons, for example, when the respondent becomes tired or loses concentration with a long list of pairwise comparisons or items to rate. It is hard to distinguish whether, and to what extent, the observed inconsistencies are based on true inconsistency, an uninformed respondent, behavioural reasons, or framing or anchoring effects of the preference enquiry method.

In real-life decision problems, pairwise comparison matrices are rarely consistent; therefore, one crucial but challenging point of the methodology is to determine the inconsistency of the matrices (Herman and Koczkodaj, 1996; Bozóki and Rapcsák, 2008). A decision maker's preferences may be inconsistent, for example, so that for three given alternatives, alternative A is two times better than B and B is three times better than C, but A is not six times better than C; it may instead be, say, four or eight times better. The preferences may even be intransitive, meaning that alternative A is preferred to B, B is preferred to C, but then C is preferred to A (e.g. Linares, 2009). Intransitivities are more prone to happen with multi-attribute comparisons between alternatives and in MCDA, especially if the comparisons are conducted pairwise (Linares, 2009). Especially with many pairs of alternatives, the decision maker may not see the consequences of many pairwise comparisons (e.g. Temesi, 2010). Inconsistency has a possible effect on decision making, since it may then not be possible to find the optimal alternative.
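The three-alternative example above amounts to a cardinal consistency condition on the triad that can be checked mechanically; a minimal sketch (the function name and tolerance are illustrative):

```python
def triad_consistent(r_ab: float, r_bc: float, r_ac: float,
                     tol: float = 1e-9) -> bool:
    """Cardinal consistency for a triad requires r_ac = r_ab * r_bc."""
    return abs(r_ac - r_ab * r_bc) < tol

# A is 2 times better than B and B is 3 times better than C;
# consistency then demands that A is exactly 6 times better than C.
assert triad_consistent(2.0, 3.0, 6.0)
assert not triad_consistent(2.0, 3.0, 4.0)  # the inconsistent case in the text
```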
If inconsistency is high enough, decision recommendations on the rank order are impossible to render.

Many kinds of methods have been proposed to account for inconsistency and the decision maker's incomplete knowledge of his or her preferences. One option to determine the internal consistency of the applied method is to perform a classical test-retest experiment and examine the consistency of the elicited weights at two points in time (Bottomley and Doyle, 2001). In the AHP (Saaty, 1980), the inconsistencies of pairwise comparisons are measured by the consistency ratio (CR) of a pairwise comparison matrix, which is the ratio of its consistency index to the corresponding random index value. However, the CR has been criticized because its definition of inconsistency is based on a 10% rule of thumb (e.g. Ramik, 2009). According to Bozóki and Rapcsák (2008), Saaty's consistency of the pairwise decision matrix is insufficient to exclude asymmetric inconsistency. It has also been criticized for allowing contradictory judgments in matrices (Kwiesielewicz and van Uden, 2004; Bana e Costa and Vansnick, 2008). Therefore, several other methods have been proposed to measure consistency. Crawford and Williams (1985), for example, prefer to sum the differences between the ratios of the calculated priorities and the given comparisons. Furthermore, ratio scale pairwise comparison data can be analysed using standard regression models as well (De Jong, 1984; Crawford and Williams, 1985). The regression approach has been further developed in several ways (e.g. Alho et al., 1996; Alho and Kangas, 1997; Leskinen and Kangas, 1998; Alho et al., 2001; Leskinen et al., 2003).
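Saaty's CR computation just mentioned can be sketched as follows (the random index values RI are Saaty's published constants for matrix orders up to 10; the example matrix is illustrative, not from the study):

```python
import numpy as np

# Saaty's random index (RI) for reciprocal matrices of order 1..10.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(A):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)
    and lambda_max is the principal eigenvalue of A (n >= 3)."""
    n = A.shape[0]
    lam_max = max(np.linalg.eigvals(A).real)
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]

# A perfectly consistent reciprocal matrix (a_ij = v_i / v_j) has
# lambda_max = n and hence CR close to 0.
v = np.array([4.0, 2.0, 1.0])
A = np.outer(v, 1.0 / v)
print(consistency_ratio(A))  # close to 0
```

By the 10% rule of thumb criticized above, a matrix with CR below 0.10 is conventionally deemed acceptably consistent.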
The uncertainty of the estimated priorities, measured by ŝ², can be incorporated into the analysis of preferences through statistical inference by applying the regression approach (Alho and Kangas, 1997). Otherwise, the priorities obtained through regression analysis behave similarly to the priorities obtained through Saaty's eigenvalue technique, and the differences in the priority estimates are usually small (e.g. Alho and Kangas, 1997).

3. MATERIAL AND METHODS

3.1. Study material

The study material contained parts of the preference enquiry experiment carried out at the University of Helsinki (Kainulainen et al., 2009). Altogether, 45 forestry students were asked to evaluate five different career alternatives according to their own preferences. The career alternatives were lecturer, forestry advisor, senior inspector, software designer and researcher. The students were first given a brief three-sentence introduction to each of the career alternatives. The students were asked to give their preferences through direct rating and pairwise comparisons on both interval and ratio scales. Each assignment was distinct. This particular study included three different pairwise comparison techniques: the verbal ratio scale technique, the numeric ratio scale technique and the numeric interval scale technique. These preference elicitation techniques were selected because they represent distinct and practically applicable alternatives that have psychometrically different characteristics and two different but simple calculation procedures, both well known in the literature, which enables comparison with other research. In the verbal ratio scale technique, the intensity of preference was given on a scale from 1 to 9 as proposed by Saaty (1980).
In the numeric ratio scale technique, the intensity of preference was obtained as a comparison to the least preferred alternative, which was assigned 100 points within each pair. In SMART methods, the alternatives are rated relative to the least important one (e.g. Kangas et al., 2008). In the numeric interval scale technique, the preference difference in each pair was compared with the difference between the overall best and worst alternatives, set at 100 (see Appendix). Kainulainen et al. (2009) analysed these data sets from the viewpoint of priorities, whereas this study took into account the inconsistencies involved.

3.2. Methods

A regression technique was used to analyse all the pairwise comparisons. In general, we may consider the relative merits of some attributes 1, ..., n. In a pairwise comparison experiment, the judge is asked to evaluate the attributes in a pairwise manner. Usually, all the pairs (i, j) are compared, which leads to a maximum of m = n(n - 1)/2 comparisons (e.g. Leskinen, 2000). First, let r_ij be the relative value of attribute i compared with attribute j, assessed by the judge with respect to a single decision criterion. It is assumed that r_ij = (v_i / v_j) exp(e_ij), where v_i and v_j refer to the true and unknown values of attributes i and j, and e_ij measures the uncertainty or error with which the true values are obtained in the elicitation by the judge. Then, by defining y_ij = ln(r_ij), the regression model for ratio scale pairwise comparison data in a single-judge case becomes (Crawford and Williams, 1985; Alho and Kangas, 1997)

\[ y_{ij} = a_i - a_j + e_{ij}, \tag{1} \]

where a_i = ln(v_i) and the residuals e_ij are uncorrelated with E(e_ij) = 0 and Var(e_ij) = s².
The r_ij is analysed on the logarithmic scale in the regression model, because r_ij > 0 and, after the logarithmic transformation, the response becomes an arithmetic scale variable allowing direct utilization of the normal distribution. The estimation method is similar for the pairwise comparisons at the interval scale, except for this logarithmic transformation (Kainulainen et al., 2009). The regression model can be written as

\[ Y = X a + \varepsilon, \tag{2} \]

where Y, a and ε are the vectors of responses, parameters and residuals, respectively, and X is a design matrix defining the pairwise comparisons in question (e.g. Alho and Kangas, 1997). The parameters can be estimated by the ordinary least-squares technique. The ordinary least-squares estimators are â = (XᵀX)⁻¹XᵀY for the vector a and ŝ² = (Y - Xâ)ᵀ(Y - Xâ)/(m - n + 1) for the residual variance, where m is the number of pairwise comparisons and n is the number of attributes to be compared.

To ensure identifiability, it is required that a_n ≡ 0, where n is the number of attributes to be compared. Thus, the parameter a_i measures the value of entity i relative to entity n. Thus, in this particular study, where n = 5, the design matrix X takes the form

\[
X = \begin{bmatrix}
1 & -1 & 0 & 0 \\
1 & 0 & -1 & 0 \\
1 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 1 & 0 & -1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & -1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}, \tag{3}
\]

and a = (a_1, a_2, a_3, a_4)ᵀ, Y = (y_12, y_13, y_14, y_15, y_23, y_24, y_25, y_34, y_35, y_45)ᵀ and ε = (e_12, e_13, e_14, e_15, e_23, e_24, e_25, e_34, e_35, e_45)ᵀ (Alho and Kangas, 1997). The estimates of the values can be transformed to the scale of priorities by exp(â_i) / Σ_k exp(â_k), so that the priorities sum to 1.
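As an illustration of the estimation just described, the following sketch (Python with NumPy) builds the design matrix for n = 5, estimates â by ordinary least squares and transforms the estimates to priorities; the attribute values v are hypothetical and perfectly consistent, not the study data:

```python
import numpy as np

n = 5
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # m = 10 pairs

# Design matrix X: the row for pair (i, j) has +1 in column i and -1 in
# column j; the last attribute is the reference (a_n = 0), so its column
# is dropped.
X = np.zeros((len(pairs), n - 1))
for row, (i, j) in enumerate(pairs):
    if i < n - 1:
        X[row, i] = 1.0
    if j < n - 1:
        X[row, j] = -1.0

v = np.array([4.0, 3.0, 2.0, 1.5, 1.0])        # hypothetical true values
y = np.log([v[i] / v[j] for i, j in pairs])    # responses y_ij = ln(r_ij)

a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates of a
a_full = np.append(a_hat, 0.0)                 # append the reference a_n = 0
priorities = np.exp(a_full) / np.exp(a_full).sum()

m = len(pairs)
resid = y - X @ a_hat
s2_hat = resid @ resid / (m - n + 1)           # residual variance estimate
print(np.round(priorities, 4), s2_hat)
```

Because the example judgements are perfectly consistent, ŝ² comes out as (numerically) zero and the priorities equal v normalized to sum to 1.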
In this case, the career alternatives were arranged in the following order: lecturer, forestry advisor, senior inspector, software designer and researcher, as in the previous study of Kainulainen et al. (2009).

Regression models were constructed separately for each of the student respondents and for all of the elicitation techniques. The preferences expressed with the verbal ratio scale technique were first transformed to a numeric scale according to Saaty (1980), that is, to integer values varying from 1 to 9. To obtain as similar a scale as possible for all of the elicitation methods when examining residual variances, the median of the maximum relative values perceived by the respondents with the numeric ratio scale technique was used as the highest value in the verbal ratio scale technique. This value was 2; therefore, the scale constructed in this way took the form 1/2, 1/1.875, ..., 1/1.125, 1, 1.125, ..., 1.875, 2.

If the pairwise comparisons r_ij are perfectly consistent, the residual variance ŝ² = 0; otherwise, ŝ² > 0. Thus, the inconsistency of the pairwise comparisons may be measured by the residual variance ŝ² in a statistical way. In addition to the residual variances ŝ² = (Y - Xâ)ᵀ(Y - Xâ)/(m - n + 1), relative residual variances were calculated, because they are not as scale dependent as the residual variances as such. Furthermore, the square of the multiple correlation coefficient, that is, the coefficient of determination (R²), was calculated to describe the overall quality of the resulting regression. In this case, it may be estimated by R² = 1 - SSR/SST, where SSR equals the sum of squared residuals from the regression and SST equals the sum of squared scores, because there is no constant in the regression models (e.g. Alho et al., 2001). Without the constant, the degrees of freedom for SST are n, the total number of comparisons included.
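The rescaling just described maps the nine Saaty grades linearly onto the interval [1, 2] in steps of 0.125, with reciprocals covering the lower half of the scale; a minimal sketch (the function name is ours):

```python
def rescale(k: int, max_value: float = 2.0) -> float:
    """Map a Saaty integer grade k in 1..9 linearly onto [1, max_value]."""
    step = (max_value - 1.0) / 8.0   # 0.125 when max_value = 2
    return 1.0 + (k - 1) * step

# Upper half of the scale: 1, 1.125, ..., 1.875, 2; reciprocals of the
# same values give the lower half 1/2, 1/1.875, ..., 1/1.125.
upper = [rescale(k) for k in range(1, 10)]
print(upper)
```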
The degrees of freedom for SSR are n - r, where r is the number of columns in the design matrix (n here denoting the total number of comparisons). Then the adjusted R² is

\[ R^2_{\mathrm{adj}} = 1 - \frac{SSR/(n - r)}{SST/n}. \tag{4} \]

Furthermore, the null hypothesis H0: a_i = a_j, that is, equal merits of attributes i and j, was tested against H1: a_i ≠ a_j by

\[ t = \frac{\hat{a}_i - \hat{a}_j}{\sqrt{\operatorname{Var}(\hat{a}_i) + \operatorname{Var}(\hat{a}_j) - 2\operatorname{Cov}(\hat{a}_i, \hat{a}_j)}}, \tag{5} \]

which has a t distribution with m - n + 1 degrees of freedom under H0 and normality of the residuals. The variances and covariances are of the form Cov(â) = ŝ²(XᵀX)⁻¹; that is, the denominator of Equation (5) reduces to √(2ŝ²/n) for all i and j (e.g. Alho et al., 2001; Leskinen, 2000). The estimated regression coefficients were first ordered, and the best, that is, the largest, regression coefficient was always tested against the others. The resulting p-values were classified into different classes according to the magnitude of the p-value, and the frequencies within these classes were calculated.

4. RESULTS

Of the original dataset, three answers were incomplete, and they were dropped out. Thus, the final study material comprised altogether 42 student evaluations. The first step was to derive the preference weights from the regression models constructed for each of the student respondents. Differences in inconsistency between the elicitation techniques were analysed through the estimated residual variances and relative residual variances of the regression models, the coefficients of determination, and by testing whether the estimated regression coefficients are equal. The average of the estimated residual variances was smallest with the verbal ratio scale technique (Table I).
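The test in Equation (5), with the simplified denominator √(2ŝ²/n) for the complete pairwise comparison design, can be sketched as follows (the estimate and variance values are illustrative, not study data):

```python
import math

def t_statistic(a_i: float, a_j: float, s2: float, n: int) -> float:
    """t = (a_i - a_j) / sqrt(2 * s2 / n); under H0 and normal residuals
    this has a t distribution with m - n + 1 degrees of freedom."""
    return (a_i - a_j) / math.sqrt(2.0 * s2 / n)

# Hypothetical estimates for the two best-ranked alternatives:
t = t_statistic(0.9, 0.4, s2=0.05, n=5)
print(round(t, 3))
```

A large |t| relative to the t distribution with m - n + 1 degrees of freedom (here 6, since m = 10 and n = 5) indicates that the two alternatives are reliably differentiated.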
Also, the range of the residual variances was much smaller than in the other techniques, varying from 0.003 to 0.055.

Table I. Residual variance, relative residual variance and coefficients of determination (R² and adjusted R²) obtained through regression analyses for the pairwise comparison data acquired with the tested scaling techniques

                               Verbal ratio   Numeric ratio   Interval
  Residual variance
    Min                             0.003           0.001        0.003
    Mean                            0.021           0.432        0.116
    Max                             0.055          16.81         2.156
  Relative residual variance
    Min                             0.054           0.001        0.030
    Mean                           56.08           10.48       993.1
    Max                          2303             381.1      41667
  R²
    Min                             0.594           0.301        0.703
    Mean                            0.070           0.117        0.075
    Max                             0.992           1.000        0.994
  Adjusted R²
    Min                             0.324          -0.165        0.505
    Mean                            0.873           0.884        0.876
    Max                             0.986           1.000        0.990