A multidisciplinary evaluation of inter-reviewer agreement of the nephrometry score and the prediction of long-term outcomes

Christopher J. Weight, Thomas D. Atwell, Robert T. Fazzio, Simon P. Kim, McCabe Kenny, Christine M. Lohse, Stephen A. Boorjian, Bradley C. Leibovich and R. Houston Thompson*

From the Departments of Urology (CJW, SPK, SAB, BCL, RHT), Radiology (TDA, RTF, MK) and Health Sciences Research (CML), Mayo Clinic, Rochester, Minnesota

Purpose: The nephrometry score was introduced in 2009 as a way to quantify renal tumor complexity in a systematic way. However, the reproducibility of scoring has not been rigorously validated across specialty or level of training, nor has it been evaluated with regard to meaningful clinical outcomes.

Materials and Methods: We identified 95 consecutive patients with a solid renal mass treated surgically. Each renal tumor was separately scored by 6 reviewers, including 2 staff urologists, 1 staff radiologist, 2 trainees (1 urology, 1 radiology) and 1 medical student. Inter-reviewer agreement for nephrometry score was evaluated using Lin’s concordance correlation coefficient. We evaluated the ability of the nephrometry score to predict surgery type, pathological features and clinical outcomes.

Results: Agreement in nephrometry score was substantial among the 3 staff physicians (0.72, 95% CI 0.64–0.80). Nephrometry score agreement continued to be substantial when including the trainees and medical student in the analysis (0.75, 95% CI 0.69–0.81). The median nephrometry score of patients treated with radical nephrectomy was 9.0 vs 7.2 for those treated with a nephron sparing approach (p <0.001). Increasing nephrometry score was associated with increased risk of distant metastasis (HR 3.27, p <0.001), death from renal cell carcinoma (HR 2.83, p <0.001) and death from any cause (HR 1.24, p = 0.017).

Conclusions: Nephrometry scoring with minimal initial instruction was robust across specialties and levels of training.
The additional anatomical information that nephrometry score adds to size alone may be associated with other important clinical outcomes such as tumor aggressiveness and survival, and warrants further study.

Key Words: carcinoma, renal cell; nephrectomy; radiology; survival

Abbreviations and Acronyms
CT = computerized tomography
MRI = magnetic resonance imaging
NS = nephrometry score
PN = partial nephrectomy
RCC = renal cell carcinoma
RN = radical nephrectomy

Submitted for publication January 17, 2011.
* Correspondence: Department of Urology, Mayo Clinic, 200 First St. SW, Rochester, Minnesota 55905 (telephone: 507-284-3982;

0022-5347/11/1864-1223/0 Vol. 186, 1223-1228, October 2011. THE JOURNAL OF UROLOGY® Printed in U.S.A. © 2011 by AMERICAN UROLOGICAL ASSOCIATION EDUCATION AND RESEARCH, INC. DOI:10.1016/j.juro.2011.05.052

The use of partial nephrectomy to treat renal masses has been increasing in North America [1–3]. The reason for this increase in use is multifactorial, including the decreasing size of tumors at presentation [4,5], the comparable complication profile and cancer outcomes [6–14], and the improved overall survival observed in patients treated with PN vs RN [10,15–18]. However, comparisons between PN and RN are problematic because they are not randomized, and even when stratifying by stage, tumors selected for RN in most comparative series are still on average larger, more likely to be malignant, of higher Fuhrman grade and more likely to have lymph node metastasis than those treated with PN [6,9,10,12,14,17]. Thus, size or clinical stage alone appear not to be sufficiently fine instruments to address all the factors surgeons use in determining when to offer PN or RN. To adequately compare surgical series in patients treated with RN and PN, and to compare series among institutions, it became necessary to develop a standardized scoring system for renal tumors.

The R. (radius), E. (exophytic/endophytic), N. (nearness), A. (anterior), L.
(location) nephrometry score has been proposed as such a model [19]. Although presented as a system that offers a reproducible method of scoring of renal tumors, the original description did not present any assessment of agreement across different observers. Furthermore, it is unclear what degree of training is needed to properly score the masses. In other words, are scores more comparable when urological surgeons are scoring the same tumor vs radiologists? Can trainees or medical students score tumors just as reliably as staff physicians? Once the reproducibility of the NS is validated at other institutions, NS will need to be evaluated to see if it properly scores increasingly complex tumors and whether these increasingly complex tumors are associated with meaningful clinical outcomes. We evaluated the agreement in NS and each of its components using multiple reviewers at multiple levels of training in multiple specialties. We also evaluated the ability of NS to predict the type of surgery performed (RN vs PN), and its ability to predict other meaningful outcomes such as metastasis-free, cancer specific and overall survival.

METHODS

We identified 100 consecutive patients treated surgically for a renal mass between January and October 2000 with images available for electronic review. Of these patients 95 had a solitary, enhancing, renal mass and were included in the study, including 9 with metastatic disease and 9 with benign renal masses. This cohort from a decade ago was chosen so we could evaluate the effect of NS on outcomes such as metastasis-free, cancer specific and overall survival. A total of 56 (59%) patients were treated with open RN, 33 (35%) treated with open PN, 5 (5%) treated with laparoscopic RN and 1 (1%) treated with radio frequency ablation.
Preoperative CT (77) or MRI (18) was reviewed and rendered a score by a staff radiologist, a radiology resident, 2 staff urological surgeons who routinely perform PN, a urology fellow and a medical student. Each reviewer recorded all of the components of the R.E.N.A.L. NS individually [19], and 5 of the reviewers recorded actual tumor size in maximal diameter in addition to the R component. The only instruction on scoring the tumors given to each of the reviewers was to look at the [...]. Each of the reviewers was blinded to the type of surgery used to treat the patient.

Inter-reviewer agreement for nominal variables (the A component and hilar location) was evaluated using multirater kappa coefficients (Fleiss, 1981). Agreement for ordinal variables (the R, E, N and L components) was evaluated using Kendall’s coefficient of concordance (1995). Agreement for continuous variables (NS and tumor size) was evaluated using Lin’s concordance correlation coefficient (1989). To evaluate agreement as tumor size increased we took the mean R component score across reviewers and rounded to the nearest integer. We then compared the remaining components of NS (E, N, A, L, hilar) stratified according to mean R score, ie agreement for R1, R2 and R3. For all coefficients, values of 0 correspond to agreement no better than chance alone and values of 1 correspond to perfect agreement. Values of 0.0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8 and 0.8 to 1.0 correspond to slight, fair, moderate, substantial and almost perfect agreement, respectively. Nephrometry scores were compared between patients treated with and without RN using Wilcoxon rank sum tests in metastasis-free patients. Univariate associations of mean NS across all reviewers with metastasis-free, cancer specific and overall survival were evaluated using Cox proportional hazards models. Statistical analyses were performed using the SAS® software package.
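
As an illustration of the agreement statistic used for continuous variables, the following is a minimal sketch of Lin's concordance correlation coefficient for two raters, together with the qualitative bands quoted above. It is not the authors' SAS analysis: the study pooled up to 6 reviewers, which requires a multirater extension that is not shown here. The function names, the handling of band boundaries (upper limit inclusive) and the label for negative values are assumptions made for this example, and the scores in the usage lines are invented.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for two raters.

    Uses population (divide-by-n) moments:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 cases")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("both raters gave the same constant score")
    return 2 * cov_xy / denominator


def agreement_label(value):
    """Map a coefficient to the bands quoted in the methods.

    Assumption: each band includes its upper limit, and negative values
    (outside the quoted bands) are labeled as no better than chance.
    """
    if not -1 <= value <= 1:
        raise ValueError("coefficient must lie between -1 and 1")
    if value < 0:
        return "no better than chance"
    for upper, label in ((0.2, "slight"), (0.4, "fair"), (0.6, "moderate"),
                         (0.8, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical nephrometry scores from two reviewers for four tumors.
    reviewer_a = [7, 8, 9, 10]
    reviewer_b = [8, 9, 10, 11]  # same ordering, shifted up by 1 point
    ccc = lins_ccc(reviewer_a, reviewer_b)
    print(round(ccc, 3), agreement_label(ccc))
```

Unlike a Pearson correlation, the coefficient is penalized by the squared difference in means, so a reviewer who scores every tumor 1 point higher does not reach perfect agreement even though the ordering of the tumors is identical.
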
RESULTS

Inter-reviewer Agreement in NS

Measures of agreement among the reviewers (Kappa coefficient, Kendall’s coefficient of concordance, Lin’s concordance correlation coefficient) and 95% CIs are summarized in table 1. The R component (coefficient of concordance 0.87) and tumor size itself (concordance correlation coefficient 0.92) demonstrated almost perfect agreement among the reviewers. The N and L components (coefficient of concordance 0.61 and 0.70, respectively) and the NS (concordance correlation coefficient 0.75) demonstrated substantial agreement, while the E and A components and the assessment of hilar location demonstrated moderate agreement (0.56, 0.56 and 0.57, respectively). For example, for the E component raters often disagreed, with the number of cases assigned 1 point on this component ranging from a low of 26 by 1 reviewer to a high of 59 by another reviewer.

When all 6 reviewers were evaluated, the concordance correlation coefficient (95% CI) for NS was 0.75 (0.69–0.81). The concordance correlation coefficients (95% CIs) for NS after excluding the medical student, after excluding the medical student and the radiology resident, and after excluding the medical student, the radiology resident and the urology fellow were 0.76 (0.70–0.82), 0.74 (0.67–0.81) and 0.72 (0.64–0.80), respectively.

Concordance was highest on the R component. Therefore, we stratified agreement by the average R score across reviewers to evaluate if agreement was affected by increasing tumor size. Among the 41 patients in the R1 subset (0 to 4 cm) the concordance correlation coefficient (95% CI) for NS was 0.67 (0.56–0.78). For the 27 patients in R2 (4 to 7 cm) and 27 in R3 (greater than 7 cm) the coefficients and 95% CIs were 0.51 (0.35–0.67) and 0.41 (0.24–0.58), respectively (table 2).
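
The stratification step described in the methods (mean R component across reviewers, rounded to the nearest integer) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how a mean ending in .5 was rounded, so this sketch rounds halves up, and the function names, case identifiers and scores are hypothetical.

```python
import math
from collections import defaultdict


def mean_r_stratum(r_scores):
    """Return the R stratum (1, 2 or 3) for one tumor.

    The mean R score across reviewers is rounded to the nearest integer.
    Assumption: means ending in .5 are rounded up (the source gives no rule).
    """
    if not r_scores or any(r not in (1, 2, 3) for r in r_scores):
        raise ValueError("R scores must be a non-empty list of 1, 2 or 3")
    mean_r = sum(r_scores) / len(r_scores)
    return math.floor(mean_r + 0.5)


def stratify_by_mean_r(cases):
    """Group case ids by stratum; `cases` maps case id -> list of R scores."""
    strata = defaultdict(list)
    for case_id, r_scores in cases.items():
        strata[mean_r_stratum(r_scores)].append(case_id)
    return dict(strata)


if __name__ == "__main__":
    # Hypothetical R scores from 6 reviewers for four tumors.
    example = {
        "case_a": [1, 1, 1, 1, 1, 2],
        "case_b": [2, 2, 3, 2, 2, 2],
        "case_c": [3, 3, 3, 2, 3, 3],
        "case_d": [1, 2, 1, 2, 1, 2],  # mean 1.5, rounded up to stratum 2
    }
    print(stratify_by_mean_r(example))
```

Agreement on the remaining components would then be computed separately within each stratum, as in table 2.
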
The measures of agreement for the R, E, N, A and L components, as well as hilar location, NS and tumor size for the 77 patients with preoperative CT are shown in table 1, as are the summaries for the 18 patients with preoperative MRI.

NS, Surgery Type, and Pathological Outcomes and Survival

For each reviewer NS in the 77 patients with organ confined RCC was significantly higher for the 50 treated with RN vs the 27 treated with a nephron sparing approach (table 3). Of the 95 patients studied 86 (91%) had RCC on final pathology, of whom 20 died of RCC at a mean of 3.1 years after surgery (median 2.0, range 0.2 to 9.1). The majority of patients had lower grade tumors (66 [77%] grades 1 or 2) and clear cell histology (59 [69%]). In terms of pathological stage 59 cases were pT1 (69%), 17 pT2 (20%) and 10 pT3 (12%). Mean NS tended to increase with increasing pathological stage, even when excluding the R component (p <0.001, table 4). Higher grade tumors also had a higher mean NS compared with lower grade tumors (9.8 vs 8.1, p <0.001). However, in this cohort clear cell histology and whether the tumor was malignant or benign were not associated with higher NS.

Of the 95 patients studied 49 died at a mean of 4.2 years after surgery (median 4.0, range 0.1 to 9.3). Among the 46 patients still alive at last followup the mean followup was 9.0 years (median 9.7, range 0.1 to 10.8). The hazard ratio for the association of mean NS across all reviewers with death was 1.24 (95% CI 1.04–1.48, p = 0.017). There were 77 patients with M0 RCC at surgery. The HR for the association of mean NS with death from RCC in this subset was 2.83 (1.53–5.24, p <0.001). Of these patients distant metastasis developed in 17 at a mean of 2.6 years after surgery (median 1.4, range 0.3 to 8.9).
The HR for the association of mean NS with distant metastasis was 3.27 (95% CI 1.86–5.76, p <0.001).

Table 1. Agreement in R.E.N.A.L. NS among reviewers

Feature      Agreement    95% CI
Entire cohort preop CT or MRI:
  R          0.87         0.81–0.92
  E          0.56         0.47–0.62
  N          0.61         0.49–0.72
  A          0.56         0.48–0.62
  L          0.70         0.63–0.76
  Hilar      0.57         0.47–0.65
  NS         0.75         0.69–0.81
  Size       0.92         0.89–0.94
Preop CT only:
  R          0.88         0.82–0.93
  E          0.57         0.47–0.66
  N          0.64         0.51–0.72
  A          0.55         0.47–0.64
  L          0.71         0.63–0.78
  Hilar      0.54         0.43–0.62
  NS         0.76         0.69–0.82
  Size       0.89         0.86–0.93
Preop MRI only:
  R          0.84         0.66–0.95
  E          0.49         0.32–0.64
  N          0.32         0.13–0.51
  A          0.59         0.32–0.77
  L          0.58         0.41–0.75
  Hilar      0.60         0.37–0.78
  NS         0.62         0.44–0.81
  Size       0.93         0.88–0.98

Table 2. Concordance correlation coefficients for NS excluding the R component stratified according to increasing tumor size

Feature      Agreement    95% CI
Agreement in R1:
  E          0.69         0.48–0.77
  N          0.63         0.50–0.78
  A          0.59         0.48–0.72
  L          0.65         0.53–0.75
  Hilar      0.52         0.33–0.69
  NS total   0.67         0.56–0.78
Agreement in R2:
  E          0.44         0.24–0.56
  N          0.47         0.26–0.70
  A          0.44         0.26–0.58
  L          0.67         0.52–0.81
  Hilar      0.37         0.17–0.58
  NS total   0.51         0.35–0.67
Agreement in R3:
  E          0.50         0.35–0.63
  N          0.27         0.14–0.42
  A          0.61         0.46–0.79
  L          0.57         0.36–0.77
  Hilar      0.51         0.28–0.66
  NS total   0.41         0.24–0.58

Table 3. Comparisons of R.E.N.A.L. NS between patients treated with and without RN

                      Mean NS (median; range)
Reviewer              No RN             Yes RN             p Value
Staff radiologist     7.3 (7; 4–10)     9.1 (9; 4–12)      <0.001
Radiology resident    7.3 (8; 4–10)     9.0 (9; 4–11)      <0.001
Staff urologist       7.7 (7; 4–10)     9.4 (10; 4–11)     <0.001
Staff urologist       6.5 (6; 4–11)     8.3 (8.5; 4–12)    0.001
Urology fellow        7.3 (7; 4–12)     9.5 (9; 4–12)      <0.001
Medical student       7.1 (7; 4–11)     8.8 (9; 4–12)      0.001

Since increasing size was previously demonstrated to increase the risk of metastasis and death from RCC [20,21], the R component (based on tumor size) of the NS was subtracted from the total NS to evaluate the collective contribution of the other components to patient outcomes.
In those patients with localized, nonmetastatic RCC (77), NS (after removal of the R component) remained associated with increased risk of distant metastasis (HR 3.15, 95% CI 1.58–6.28, p = 0.001) and death from RCC (HR 2.44, 95% CI 1.20–4.96, p = 0.014). The association with death from any cause (HR 1.21, 95% CI 0.94–1.55, p = 0.14) was not statistically significant.

DISCUSSION

To date, renal tumor size has been the primary anatomical characteristic used to describe renal tumors in most surgical series. While investigators have attempted to establish the importance of other anatomical features of renal tumors such as centrality, polarity etc, the definitions of these features have not been consistent and the results have been mixed [22,23]. Size, defined as the largest single diameter of the tumor in any direction, is one of the most reproducible parameters in describing a renal tumor and agreement was nearly perfect in the current study (concordance correlation coefficient 0.92). Agreement of the R component of the NS was also nearly perfect (coefficient of concordance 0.87). Although size supplies a tremendous amount of information about the risk of malignancy, tumor grade and metastatic potential [20,21], it does not capture all the information a surgeon uses when deciding which operation to perform, nor does it allow fair comparisons of tumors among series at different institutions.

There is a great need for a reproducible, standardized scoring system that accurately classifies increasingly complex renal tumors. Such a system would allow better quantitative comparisons among treatment types and institutions. It would likely have other benefits as well, such as prediction of renal function loss, intraoperative blood loss, risk of urine leak, and possibly the risk of malignancy and cancer specific outcomes. The R.E.N.A.L. NS has been proposed to be such a model [19], and the authors should be recognized for the attempt to standardize communication about renal tumors.
However, our enthusiasm for such a scoring system must not overrun a proper evaluation of the system. In this study we evaluated NS with regard to inter-reviewer agreement and its ability to predict meaningful clinical outcomes.

We were able to demonstrate that agreement in overall NS was fairly robust among a wide range of reviewers from different levels of training and specialty (concordance correlation coefficient 0.75). When excluding those in training or medical school, the staff physicians had comparable but not improved agreement (concordance correlation coefficient 0.72). These findings are encouraging, suggesting that others, apart from staff physicians, could reliably score tumors with minimal staff physician supervision.

Not all components demonstrated high levels of agreement, particularly in tumors scored from a preoperative MRI. It appears that quantification of the distance between the collecting system and the tumor, ie the N component, on MRIs from a decade ago proved to be the most difficult (coefficient of concordance 0.32), which corresponds to only fair agreement (table 1). Agreement for the N component was better on CT (coefficient of concordance 0.64). However, we note that the scans we used were a decade old and a more contemporary series, particularly with MRI, may well prove otherwise. Nevertheless, this component may be an area in which more investigation or instruction is needed to achieve better agreement and reproducibility.

The endophytic, anterior/posterior and hilar components of the NS only had moderate agreement among the reviewers for all 95 patients. With particularly large tumors, determining the relative endophytic vs exophytic component can be quite difficult, accounting for decreased agreement among observers. Furthermore, large tumors often distort the anatomy and it may be difficult to identify the collecting systems for the N component (table 2).

Table 4. Pathological features and mean NS with and without the R component

Feature          Mean (median; range)       p Value
NS with R component
Tumor type:
  RCC            8.5 (8.8; 4.0–11.3)        0.12
  Benign         7.3 (7.3; 4.0–9.7)
Tumor stage:
  pT1            7.8 (8.3; 4.0–10.7)        <0.001
  pT2            10.0 (10.2; 7.8–11.3)
  pT3            10.0 (10.5; 5.8–11.2)
Grade:
  1, 2           8.1 (8.5; 4.0–11.0)        <0.001
  3, 4           9.8 (10.2; 5.8–11.3)
Subtype:
  Clear cell     8.6 (8.8; 4.5–11.3)        0.39
  Other          8.2 (8.7; 4.0–10.7)
NS without R component
Tumor type:
  RCC            6.6 (7.0; 3.0–8.7)         0.35
  Benign         6.1 (6.3; 3.0–8.7)
Tumor stage:
  pT1            6.3 (6.7; 3.0–8.5)         0.015
  pT2            7.0 (7.2; 5.5–8.3)
  pT3            7.3 (7.8; 4.2–8.7)
Grade:
  1, 2           6.4 (6.7; 3.0–8.3)         0.017
  3, 4           7.2 (7.4; 4.2–8.7)
Subtype:
  Clear cell     6.7 (7.2; 3.5–8.7)         0.11
  Other          6.3 (6.8; 3.0–8.5)

The NS was originally designed to have higher scores represent more complex tumors. Indeed we observed that patients with increasing NS were more likely to be treated with RN. Median NS was nearly 2 points higher for patients treated with RN compared with those treated with a nephron sparing approach (9.0 vs 7.2). However, not all of the components of the NS accurately reflect increasing complexity with increasing score. For example, the E score often violates this pattern. In other words, a large, 10 cm tumor that completely replaces the lower pole of the kidney and bulges in a nonreniform pattern would not be technically scored with the highest score on the E component, although this large tumor would be much more challenging than a small, completely endophytic 1 cm lower pole tumor, which would get the highest score on the E component.

We also observed only moderate agreement with hilar designation. Interpreting the hilar vascular anatomy in the presence of a large renal mass can also be technically challenging and yield variation in the hilar scoring (table 2).
In a similar fashion, a tumor that touches the first branching segments of the main renal vessels may be just as challenging or complex, and reviewers may be subconsciously reluctant to withhold the hilar designation from these types of tumors.

Since the scoring of many of these components appears to be more difficult with increasing tumor size, we stratified the scoring system according to R component (table 2). This demonstrates that agreement worsens substantially as tumor size increases, particularly on the E, N and hilar components. However, we note that the radiographic images used for our study were 10 years old. We hypothesize that agreement is likely to be better in a more contemporary series given the improvements in imaging quality. In addition, recent estimates suggest that close to 80% of renal tumors in the United States are 7 cm or less at presentation [24]. However, in our series only 72% were 7 cm or less and 9 patients had metastatic disease at presentation.

It appears that increasing NS is significantly associated with pathological stage, nuclear grade, risk of metastasis, death from RCC and death from any cause (table 4). For example, for each unit increase in mean NS across all reviewers, there was an associated 3.3-fold increased risk of metastatic disease (HR 3.27, p <0.001) and 2.8-fold increased risk of death from RCC (HR 2.83, p <0.001). This finding was not just driven by renal tumor size. We removed the R component, which is based on size, from the NS. The abbreviated NS was still associated with the risk of distant metastasis (HR 3.15, p = 0.001) and death from RCC (HR 2.44, p = 0.014). This suggests that anatomical details of renal tumors, apart from size, may be associated with metastatic potential. A similar pattern was observed with overall survival.
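
To make the "per unit increase" reading of these hazard ratios concrete, here is a small sketch of how a per-unit HR scales with a larger score difference. It assumes the usual log-linear form of a univariate Cox model with NS entered as a continuous covariate, so that a difference of d points multiplies the hazard by HR^d. The function name is hypothetical, and the 2-point figure below is arithmetic on the reported HR, not a result reported in the study.

```python
import math


def hazard_ratio_for_increase(hr_per_unit, delta):
    """Hazard ratio implied by a `delta`-point difference in the covariate.

    Assumption: log-linear Cox model, so log(HR) scales linearly with delta.
    """
    if hr_per_unit <= 0:
        raise ValueError("a hazard ratio must be positive")
    beta = math.log(hr_per_unit)  # Cox regression coefficient per unit
    return math.exp(beta * delta)


if __name__ == "__main__":
    # Reported HR for distant metastasis per 1-point increase in mean NS.
    hr_metastasis = 3.27
    # Implied HR for a 2-point difference in mean NS: 3.27 squared, about 10.7.
    print(round(hazard_ratio_for_increase(hr_metastasis, 2), 1))
```

Because the confidence intervals around the per-unit estimates are wide, such extrapolations are best read as rough illustrations of scale.
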
The relatively small numbers of events precluded multivariable or subcomponent analysis. Nevertheless, these hypothesis-generating results are intriguing and merit further study.

We did not evaluate the other recently proposed standardized anatomical descriptions of renal tumors, namely the PADUA [25] and the C index [26]. Although the C index is meritorious for its simplicity, our findings suggest that other anatomical data (apart from size) offer important prognostic information that is not likely captured in the C index. The PADUA has some components that may have higher agreement such as rim vs hilar location in the R.E.N.A.L. system. However, it shares most components with NS. Therefore, the areas where interobserver agreement was low, such as the E component, would not be remedied by the PADUA system.

This study is limited by the quality of the images used to measure agreement. We purposefully chose older scans so we could evaluate the survival outcomes and we noted decreased image quality compared to current imaging. However, even with these drawbacks, interobserver agreement for overall score was still high and we suspect it would improve with more contemporary images. The study is also limited by the single institution, retrospective nature and, therefore, it should be evaluated by other investigators.

CONCLUSIONS

Nephrometry scoring with minimal initial instruction was fairly robust across specialties and levels of training. However, modest agreement in some of the components of the NS, particularly as the tumor size increases, suggests opportunities for improvement with additional experience. The additional anatomical information NS adds to size alone may be associated with other important clinical outcomes such as pathological stage, nuclear grade and survival, and warrants further study.

REFERENCES

1. Abouassaly R, Alibhai SM, Tomlinson G et al: Unintended consequences of laparoscopic surgery on partial nephrectomy for kidney cancer. J Urol 2010; 183: 467.
2. Hollenbeck BK, Taub DA, Miller DC et al: National utilization trends of partial nephrectomy for