Assessing the quality of diagnostic studies using psychometric instruments: applying QUADAS

ORIGINAL PAPER

Rachel Mann, Catherine E. Hewitt, Simon M. Gilbody

Received: 17 April 2008 / Revised: 15 September 2008 / Published online: 4 October 2008

Abstract

Background: There has been an increase in the number of systematic reviews of diagnostic tests, which has resulted in the introduction of two checklists: the Standards for Reporting of Diagnostic Accuracy (STARD) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS).

Objective: To examine the validity and usefulness of QUADAS when applied to diagnostic accuracy studies using psychometric instruments and to examine the quality of reporting of these studies during practical application of the checklist.

Method: Two reviewers independently rated the quality of 54 studies using QUADAS. The proportion of agreement was used to assess overall agreement and individual agreement of QUADAS items between reviewers.

Results: The overall agreement between the two reviewers for all QUADAS items combined was 85.7%. The proportion of agreement between reviewers for each item ranged from just over 57% to 100% and was over 80% for 8 of the items. The poorest agreement was associated with the items for selection criteria, indeterminate results and withdrawals. None of the studies adequately reported all relevant information to enable all QUADAS items to be scored as ‘yes’.

Conclusion: Overall, QUADAS was relatively easy to use and appears to be an acceptable tool for appraising the quality of diagnostic accuracy studies using psychometric instruments. The application of QUADAS was hampered by the poor quality of reporting encountered.

Key words: QUADAS, postnatal depression, quality appraisal, validity, psychometric instruments

Introduction

Diagnostic and screening tests are important in the clinical decision making process as they contribute to informed diagnosis and patient care.
The diagnostic accuracy of such tests, however, must be established to ensure that a patient’s diagnosis is correct and thus the patient receives the appropriate medical care [54]. With the advent of evidence based practice, the quality of diagnostic accuracy studies is particularly important given the need to systematically evaluate all relevant research evidence. Therefore the use of systematic reviews to identify, appraise and synthesise all relevant evidence in diagnostic accuracy studies has come increasingly to the forefront with a view to informing best practice [24]. The quality of reporting and the appraisal of the quality of diagnostic accuracy studies are fundamentally connected, and recent attention has focused on both types of quality process in this area.

Within the realms of evidence based practice, a number of initiatives have emerged to generally improve the conduct and reporting of clinical research. The most prominent of these has been the CONSORT statement, in association with randomised trials [4, 45]. A similar initiative, the Standards for Reporting of Diagnostic Accuracy (STARD) 2003 guidelines, was designed to optimise the accuracy, transparency and completeness of reporting of diagnostic studies [17]. However, a recent study examining the impact of the guidelines found that the quality of reporting had only slightly improved; the mean number of STARD items reported in the 265 studies between 2000 (pre-STARD) and 2004 (post-STARD) increased from approximately 12 to 14 items, respectively [55].

R. Mann, MSc (corresponding author), C.E. Hewitt, PhD, Prof. S.M. Gilbody, DPhil, MRCPsych
Dept. of Health Sciences, University of York, York YO10 5DD, UK

Soc Psychiatry Psychiatr Epidemiol (2009) 44:300–307
DOI 10.1007/s00127-008-0440-z

With regard to the quality appraisal of diagnostic accuracy studies, a number of tools or ‘checklists’ have been produced to judge the quality and validity of diagnostic studies. A review found large variation among 91 tools identified in their definition of quality and the number of items to assess quality; furthermore, none had been systematically validated [63]. A quality assessment tool, Quality Assessment of Diagnostic Accuracy Studies (QUADAS), has been developed using consensus methods to address this gap [62]. QUADAS is a structured checklist comprising 14 items which are recorded as ‘yes’, ‘no’ or ‘unclear’. The items provide a standardised approach to quality assessment and cover patient spectrum, choice of reference standard, disease progression bias, verification bias, review bias, clinical review bias, test execution, study withdrawals and indeterminate results.

QUADAS has only recently been formally validated in 30 studies of patients with peripheral artery disease [64]. The level of agreement and the final consensus rating between three reviewers was over 80% for 10 out of the 14 items. QUADAS has also been piloted in several systematic reviews in conditions such as osteoporosis, tuberculosis, paediatric urinary tract infection, haematuria, shoulder pain, prostate cancer, dengue fever and angina [64]. On the basis of the pilot reviews and validation study the structure of QUADAS remains unchanged, although guidance for assessment of items concerned with indeterminate results and withdrawals has been clarified due to difficulties with interpretation of guidance notes.

QUADAS has been recommended by the Cochrane Diagnostic Test Accuracy Working Group as the assessment tool of choice for systematic reviews of all diagnostic accuracy studies [20]. QUADAS was developed as a generic quality assessment tool but the most widely reported applications of the checklist have generally focused on laboratory tests, physical examinations or various imaging and radiology techniques.
A recent systematic review to establish test performance of the PHQ-9, a self-completed psychometric screening tool which aims to identify a major depressive episode, is one study which has applied QUADAS to a psychometric instrument [66]; however, this study did not validate QUADAS.

To our knowledge, no validation study currently exists of QUADAS for any psychometric instrument and, in addition, no basic appraisal of the quality of reporting of psychometric instruments has been undertaken. This is surprising given that in some health care disciplines, for example mental health, diagnostic tests are generally conducted with psychometric instruments to identify the target condition in question (e.g. depression) [29]. As part of a wider Health Technology Assessment review [32] on methods of identification of postnatal depression we applied QUADAS to the diagnostic accuracy studies. The objectives of this paper were therefore two-fold: to examine the validity and usefulness of QUADAS when applied to diagnostic accuracy studies using psychometric instruments designed to identify women with postnatal depression, and to examine the quality of reporting of these studies during practical application of the checklist.

Methods

Two reviewers (CH and RM) independently rated the quality of 54 diagnostic studies using QUADAS [1–3, 5–16, 18, 19, 21–23, 25–28, 30, 31, 33–36, 38–44, 46–48, 50–53, 56–61, 65, 67–69]. Our validation studies were drawn from a wider review of the clinical utility of screening or ‘case finding’ instruments in the detection of depression in postnatal women. A systematic literature search was performed covering the period from database inception to February 2007. A comprehensive search was undertaken to identify the literature using electronic sources, forward citation searching of key literature, personal communication with authors and inspection of reference lists.
We included all studies validating depression identification strategies in the postnatal period and published in the English language. All studies had to use a standardised diagnostic criterion-based interview (such as DSM-IV or ICD-10) as a ‘gold standard’ against which to judge the performance of the screening instrument. The majority of studies related to the application of the Edinburgh Postnatal Depression Scale (EPDS) [23].

QUADAS items were rated as ‘yes’, ‘no’ or ‘unclear’ in accordance with the user’s guidance. Any disagreement between reviewers was resolved by discussion or consensus by a third party. We excluded QUADAS items for which articles had been pre-selected or which were not applicable. Item 12, which relates to whether the same clinical information was available at the time of test interpretation as would be available in clinical practice, was excluded. Scoring and interpretation of the index test for psychometric measures was defined as ‘fully automated’, as per the QUADAS guidance notes. The score obtained from patient self-completed data uses a pre-defined scale and can be calculated without knowledge of any other clinical information, such as age, sex and symptoms. Therefore, when using a pre-defined scoring scale on a psychometric instrument, any prior knowledge of clinical information when calculating and interpreting an individual’s score would not influence the diagnostic test result.

This item highlights a clear difference when applying QUADAS to a psychometric instrument rather than in a clinical arena. For example, clinician examination to detect an anterior cruciate ligament tear of the knee using the Lachman test, which requires flexion and manipulation of the leg, requires clinician-based judgement on signs and severity of symptoms at the time of examination [49].
Therefore the same clinical information would need to be available for clinical practice and test interpretation.

In relation to question 4, a two week period between the reference standard and index test was decided, a priori, to be short enough to be reasonably sure that the target condition did not change between the two tests. Question 13 was altered slightly to refer to missing items/unclear responses on the identification strategy rather than uninterpretable/intermediate test results.

We used the proportion of agreement to assess overall agreement of all QUADAS items between reviewers and agreement between reviewers for each individual QUADAS item. Kappa statistics were not calculated given their limitations [37].

Results

Assessment of agreement

The overall proportion of agreement between the two reviewers for all QUADAS items combined was 85.7%. Table 1 summarises the proportion of agreement between the two raters for each individual QUADAS item. The proportion of agreement between reviewers for each item ranged from just over 57% to 100% and was over 80% for eight of the items. The poorest agreement was associated with the items for patient spectrum, selection criteria, uninterpretable results and withdrawals. Examination of cross tabulated data revealed that disagreement was generally between ‘yes’ and ‘unclear’ responses or ‘no’ and ‘unclear’ responses. We acknowledge that this disagreement is still important as in practice ‘unclear’ ratings are often combined with the ‘no’ ratings for simplification.

Quality of reporting

Of the 54 diagnostic accuracy studies of identification strategies for postnatal depression only two studies included a flow diagram [35, 42], as recommended by STARD and referred to in the QUADAS guidance.
A flow diagram details the numbers of patients progressing through each stage of the study, which would assist with the interpretation of items 2, 13 and 14, which were associated with the poorest agreement. Interestingly, approximately one third (30%) of the 54 papers were published after the STARD guidelines were introduced in 2003. None of the studies that we included provided adequate reporting of all relevant information to enable each QUADAS item to be scored as ‘yes’. Inconsistency in the reporting of ‘adequate’ or ‘good’ quality information existed for studies both pre and post STARD guidance. Where examples were found of what we considered ‘adequate’ or ‘good’ quality reporting practice, these were extracted and are presented below. These demonstrate the type of information that assisted the reviewers in the application of QUADAS.

The practical application of QUADAS

When assessing the quality of diagnostic accuracy studies we found the quality of reporting of some studies hampered our efforts to judge the applicability of, and to interpret, some of the QUADAS items. Quality of reporting is essential in diagnostic accuracy studies if QUADAS is to be applied effectively. Recently the STARD guidelines were published with the overall aim of increasing the quality and transparency of reporting. This is important as diagnostic accuracy studies are prone to particular forms of poor design which may bias the overall results of the systematic review. Based on our use of QUADAS we have compiled some recommendations for improving the reporting of studies; however, these are with specific reference to the application of QUADAS in the context of psychometric instruments to detect psychiatric/psychological disorders (such as postnatal depression).
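The proportion of agreement described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy ratings are hypothetical, and pooling all item-by-study rating pairs to obtain the overall figure is an assumption, since the paper does not state how the overall value was derived.

```python
from typing import Dict, List

VALID_RATINGS = {"yes", "no", "unclear"}


def proportion_agreement(rater_a: List[str], rater_b: List[str]) -> float:
    """Percentage of studies on which two raters gave the same rating for one item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of studies")
    for rating in rater_a + rater_b:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating: {rating!r}")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def overall_agreement(items_a: Dict[int, List[str]],
                      items_b: Dict[int, List[str]]) -> float:
    """Agreement pooled over every item-by-study rating pair (an assumed definition)."""
    matches = 0
    total = 0
    for item, ratings_a in items_a.items():
        ratings_b = items_b[item]
        matches += sum(a == b for a, b in zip(ratings_a, ratings_b))
        total += len(ratings_a)
    return 100.0 * matches / total


# Hypothetical ratings for two items across four studies (not data from the review).
reviewer_1 = {1: ["yes", "no", "unclear", "yes"], 2: ["yes", "yes", "no", "unclear"]}
reviewer_2 = {1: ["yes", "unclear", "unclear", "yes"], 2: ["yes", "yes", "no", "unclear"]}

for item in reviewer_1:
    print(item, proportion_agreement(reviewer_1[item], reviewer_2[item]))
print("overall", overall_agreement(reviewer_1, reviewer_2))
```

In this toy example the only disagreement is between ‘no’ and ‘unclear’, the kind of disagreement noted in the Results. Collapsing ‘unclear’ into ‘no’ before comparison would remove it, so the choice of coding should be stated alongside any agreement figure.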
Based on our use of QUADAS we have provided some examples from the 54 papers we assessed which we felt represented good reporting practice as related to an individual QUADAS item. These examples have been highlighted because they provided clear information when the QUADAS items were applied to the study. We have provided a short recommendation statement of reporting practice for certain QUADAS items where it was not practicable to provide an example of good reporting practice.

Table 1  Agreement between reviewers for each QUADAS item

QUADAS item  |  Proportion of agreement (%)
1. Was the spectrum of patients representative of the patients who will receive the test in practice?  |  72.2
2. Were the selection criteria clearly described?  |  57.4
3. Is the reference standard likely to classify the target condition?  |  98.2
4. Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?  |  75.9
5. Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis?  |  81.5
6. Did patients receive the same reference standard regardless of the index test result?  |  98.2
7. Was the reference standard independent of the index test (i.e. the test did not form part of the reference standard)?  |  100
8. Was the execution of the index test described in sufficient detail to permit replication of the test?  |  94.4
9. Was the execution of the reference standard described in sufficient detail to permit its replication?  |  88.9
10. Were the index test results interpreted without knowledge of the results of the reference standard?  |  90.7
11. Were the reference standard results interpreted without knowledge of the results of the index test?  |  94.4
12. Were the same clinical data available when test results were interpreted as would be available when the test is used in clinical practice?  |  – (a)
13. Were uninterpretable/intermediate test results reported?  |  72.2
14. Were withdrawals from the study explained?  |  75.9

(a) The item relating to availability of clinical data was not assessed

1. Was the spectrum of patients representative of the patients who will receive the test in practice?

Recommendation: As recommended in the QUADAS guidance notes, reviewers should pre-specify the spectrum of patients based on their knowledge of the target condition. From the pre-specified protocol of our review, our spectrum of patients was to include all newly-delivered mothers within one year, in any clinic setting in urban and rural locations. We found that an explicit statement which described the type of patient population, and the location and method of recruitment to the study, was helpful in determining patient representativeness.

Example of good reporting practice:

  Between May 1998 and August 1999 all Norwegian speaking postnatal women older than 18 years in two communities in Norway were invited to participate in a study of mental health. These two communities are situated approximately 60 k east of Oslo … The women were recruited from two community based child health clinics. These clinics provide routine health control examinations from birth through 6 years of age. The child clinics receive information from the hospitals about each live birth in their district. The data in our study were collected by a self administered questionnaire distributed to the mothers by the health personnel at the child health clinic 6 weeks after delivery [25]

2. Were the selection criteria clearly described?

Recommendation: We found that 29 of the 54 studies were rated as unclear for quality of reporting their selection criteria.
Eligibility criteria for patients included in each study should be clearly stated; if possible, both inclusion and exclusion criteria should be stated, or exclusions may be inferred from the eligibility criteria.

Example of good reporting practice:

  Eligibility for sample inclusion involved (a) being at least 18 years old (b) able to speak and read English (c) being between 2 and 12 weeks post partum (d) delivering a live, healthy infant [12]

  Eligibility criteria for inclusion in the sample were as follows: the women had to be (a) Hispanic (b) able to read Spanish (c) 18 years of age or older (d) between 2 and 12 weeks postpartum, and (e) without a diagnosis of depression during pregnancy (self-report) [14]

3. Is the reference standard likely to classify the target condition?

Recommendation: An explicit statement regarding the reference standard used should be given. All 54 papers that were assessed using QUADAS explicitly specified a reference standard that was acceptable for use as diagnostic criteria for postnatal depression.

4. Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

Recommendation: We found that 21 of the 54 studies were unclear as to a specified time frame between the index test and reference standard. The time frame between the index test and the reference standard must be explicitly stated, otherwise disease progression bias may be inferred.

Example of good reporting practice:

  Mothers … were sent various questionnaires including the EPDS at 6–7 weeks postpartum … A few days after this (mean 3 days) they were interviewed in their home [43]

  … all subjects completed the BDI … between 6 and 8 weeks postpartum. An average of 7.6 days later (SD = 5.9) they were interviewed for depressive symptomatology [61]

5. Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis?

Two issues arose with this item.
The first concerned the clarity of reporting of the reference standard for studies which had applied the reference standard to the whole sample. Fifty of the studies had applied the reference standard to the whole sample; however, this was not always explicitly stated and in the majority of cases we had to calculate 2 × 2 tables to ensure that we could account for all reference standard diagnoses and therefore for all participants. The second issue concerned studies which selected pre-scored participants and then sampled other participants within the remaining sample. In four studies, the sample included both those participants who were pre-scored above a certain threshold (e.g. those women scoring above 12 on the EPDS) and a sample of control or low scoring participants. As the participant sample of controls or low scorers appeared to be selected and the method of selection had not been described adequately, these studies were scored ‘no’ for this item.

Recommendation: A clear statement should be made regarding the actual number of people who were interviewed with the reference standard and the number of diagnostic cases. Where a pre-scored sample of total participants is selected, the method of selecting the sample should be explicitly described to enable this item to be scored as ‘yes’.

6. Did patients receive the same reference standard regardless of the index test result?

Recommendation: An explicit statement should be given which clearly describes that all participants received the reference standard, together with the title and citation for the reference standard.

7. Was the reference standard independent of the index test (i.e. the test did not form part of the reference standard)?

Recommendation: An explicit statement should be given which clearly states that the index test and reference standard are independent, with clear citations for each test which make it transparent to the reader that this is the case.

8. Was the execution of the index test described in sufficient detail to permit replication of the test?

9. Was the execution of the reference standard described in sufficient detail to permit its replication?

Recommendation: There should be clearly defined sections within the methods section with sub-headings which state the name and the citation of the questionnaires or measures used. Any modifications to the index test (questionnaire), such as language modifications due to colloquial phrasing, should be described and a copy supplied as an appendix, so that replication would be possible. This also applies to the reference standard, for example where modifications are made to diagnostic interview schedules due to differing time frames.

10. Were the index test results interpreted without knowledge of the results of the reference standard?

11. Were the reference standard results interpreted without knowledge of the results of the index test?

Recommendation: Where the interpretation of the index test and reference standard has been conducted blind, this should be reported in the study for both items. Forty nine of the 54 studies were scored as unclear on either one or both items associated with blinding. Four studies reported blinding on both items purely due to the study design (i.e. the index test was pre-scored to determine which women above a certain score would be offered a diagnostic interview with a reference standard). Only one study explicitly stated blinding on both items.

Example of good reporting practice:

  The psychiatrist coded the PSE blind to the EPDS score. The questionnaires were scored by an independent coder blind to the PSE ratings [42]

Example of blinding both items by design:

  All the women with an EPDS sum score of 8 or higher … was included (n = 100). Patient histories were recorded and diagnose established by a psychiatrist (blind to their past EPDS scores) [16]

13. Were uninterpretable/intermediate test results reported?

14. Were withdrawals from the study explained?

Recommendation: We found that 17 of the 54 studies assessed were scored as unclear on either one or both of these items. As with item 5, in the majority of cases we could generally discern whether these items could be scored ‘yes’ after carefully reading the whole paper and constructing 2 × 2 tables. We found that only 2 of the 54 studies actually provided a flow diagram, which is recommended by the STARD guidelines and referred to in the QUADAS guidance notes to aid the assessment of these items [35, 42].

Example of good reporting practice:

[Figure: flow diagram from Jardri et al. (2006) [35]. Box labels as extracted, layout not recoverable: 992 deliveries; 815 mothers included (82.1%); 177 mothers not included; 4 schizophrenic women; 21 illiterate; 156 denials; 758 completed (93%); 57 not completed; 227 EPDS > 8 (m1); 531 EPDS < 8 (m2); 200 randomized control subjects (m3); 331 mothers not called; 427 phone-call + MINI (m1 + m3); 60 failed to call; 363 subjects analysed (85.8%); 64 subjects not analysed]
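The accounting check described for items 5, 13 and 14, in which a 2 × 2 table is reconstructed to confirm that every participant is accounted for, can be sketched in Python as follows. This is an illustrative sketch only: the class, the function and all of the counts are hypothetical and are not taken from any of the 54 studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result cross-classified against the reference standard."""
    true_pos: int   # index test positive, reference standard positive
    false_pos: int  # index test positive, reference standard negative
    false_neg: int  # index test negative, reference standard positive
    true_neg: int   # index test negative, reference standard negative

    @property
    def verified(self) -> int:
        """Participants who received both the index test and the reference standard."""
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg

    @property
    def reference_cases(self) -> int:
        """Diagnostic cases according to the reference standard."""
        return self.true_pos + self.false_neg


def unaccounted(table: TwoByTwo, enrolled: int,
                withdrawals: int = 0, uninterpretable: int = 0) -> int:
    """Participants left unexplained once reported losses are subtracted."""
    return enrolled - withdrawals - uninterpretable - table.verified


# Hypothetical study: 200 enrolled, 12 withdrawals, 3 unusable questionnaires.
table = TwoByTwo(true_pos=30, false_pos=20, false_neg=5, true_neg=130)
gap = unaccounted(table, enrolled=200, withdrawals=12, uninterpretable=3)
print(f"verified={table.verified}, cases={table.reference_cases}, unaccounted={gap}")
```

A result of zero suggests that the reported numbers are internally consistent. A non-zero result points to participants whose fate is not reported, which is the situation a flow diagram is intended to make visible.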