
Review

Methodology of superiority vs. equivalence trials and non-inferiority trials

Erik Christensen *

Clinic of Internal Medicine I, Bispebjerg University Hospital, Bispebjerg Bakke 23, DK-2400 Copenhagen NV, Copenhagen, Denmark

The randomized clinical trial (RCT) is generally accepted as the best method of comparing effects of therapies. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. they are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to or not inferior to an established therapy, i.e. they are planned and performed as equivalence trials or non-inferiority trials. Since the types of trials have different aims, they differ significantly in various methodological aspects. The awareness of the methodological differences is generally quite limited. This paper reviews the methodology of these types of trials with special reference to differences in respect to planning, performance, analysis and reporting of the trial. In this context the relevant basal statistical concepts are reviewed. Some of the important points are illustrated by examples.

© 2007 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.

1. Introduction

The randomized clinical trial (RCT) is generally accepted as the best method of comparing effects of therapies [1,2]. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. they are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to or not inferior to an established therapy, i.e. they are planned and performed as equivalence trials or non-inferiority trials [3]. Since these types of trials have different aims, they differ significantly in various methodological aspects [4].
The awareness of the methodological differences is generally quite limited. For example, it is a rather common belief that failure to find a significant difference between therapies in a superiority trial implies that the therapies have the same effect or are equivalent [5–10]. However, such a conclusion is not correct because of a considerable risk of overlooking a clinically relevant effect due to insufficient sample size.

The purpose of this paper is to review the methodology of the different types of trials, with special reference to differences in respect to planning, performance, analysis and reporting of the trial. In this context the relevant basic statistical concepts will be reviewed. Some of the important points will be illustrated by examples.

* Tel.: +45 3531 2854; fax: +45 3531 3556.
0168-8278/$32.00. doi:10.1016/j.jhep.2007.02.015
Journal of Hepatology 46 (2007) 947–954

2. Superiority trials

2.1. Sample size estimation and power of an RCT

An important aspect in the planning of any RCT is to estimate the number of patients necessary, i.e. the sample size. The various types of trials differ in this respect [1,2,11]. A superiority trial aims to demonstrate the superiority of a new therapy compared to an established therapy or placebo. The following description applies to a superiority trial. The features by which an equivalence or a non-inferiority trial differs will be described later.

To estimate the sample size one needs to consider some important aspects described in the following.

By how much should the new therapy be better than the reference therapy? This extra effect of the new compared to the reference therapy is called the Least Relevant Difference or the Clinical Significance. It is often denoted by the Greek letter Δ (Fig. 1).

By how much would the difference in effect between the two groups be influenced by random factors?
Like any other biological measurement, a treatment effect is subject to considerable "random" variation, which needs to be determined and taken into account. The magnitude of the variation is described in statistical terms by the standard deviation S or the variance S² (see Fig. 1c). The variance of the effect variable would need to be obtained from a pilot study or from previously published similar studies.

The trial should demonstrate as precisely as possible the true difference in effect between the treatments. However, because of the random variation the final result of the trial may deviate from the true difference and give erroneous results. If, for example, the null hypothesis H0 of no difference were true, the trial could still in some cases show a difference. This type of error, the type 1 error ("false positive") (Fig. 1), would have the consequence of introducing an ineffective therapy. If, on the other hand, the alternative hypothesis HΔ of the difference being Δ were true, the trial could in some cases fail to show a difference. This type of error, the type 2 error ("false negative") (Fig. 1), would have the consequence of rejecting an effective therapy.

Thus one needs to specify how large the risks of type 1 and type 2 errors may be for the trial. Ideally the type 1 and type 2 error risks should be near zero, but this would need extremely large trials. Limited resources and patient numbers make it necessary to accept some small risk of type 1 and type 2 errors.

Most often the type 1 error risk α would be specified to 5%. In this paper, α means the type 1 error risk in one direction only, i.e. either up or down from H0 (α = 5%). However, in many situations one would be interested in detecting both beneficial and harmful effects of the new therapy compared to the control therapy, i.e. one would be interested in "two-sided" testing for a difference in both the "upward" and the "downward" direction (Fig. 1).
Hence we would instead specify the type 1 error risk to be 2α (i.e. α upwards + α downwards), i.e. 2α = 5%.

The type 2 error risk β would normally be specified to 10–20%. Since a given value of Δ is always either above or below zero (H0), the type 2 error risk β is always one-sided. The smaller β, the larger the complementary probability 1 − β of accepting HΔ when it is in fact true. 1 − β is called the power of the trial because it states the probability of finding Δ if this difference truly exists.

From given values of Δ, S², α and β the needed number (N) of patients in each group can be estimated using this relatively simple general formula:

$$N = \frac{(Z_{2\alpha} + Z_{\beta})^{2}\, S^{2}}{\Delta^{2}},$$

where Z2α and Zβ are the standardized normal deviates corresponding to the defined values of 2α (Table 1, left) and β (Table 1, right), respectively. If for some reason one wants to test for a difference in only one direction ("one-sided" testing), one should replace Z2α with Zα in the formula and apply the right side of Table 1. The formula is approximate, but in most cases it gives a good estimate of the necessary number of patients. For a trial with two parallel groups of equal size the total sample size will be 2N.

Fig. 1. Illustration of factors influencing the sample size of a trial. The effect difference found in a trial will be subject to random variation. The variation is illustrated by bell-shaped normal distribution curves for a difference of zero corresponding to the null hypothesis (H0) and for a difference of Δ corresponding to the alternative hypothesis (HΔ), respectively. Defined areas under the curves indicate the probability of a given difference being compatible with H0 or HΔ, respectively. If the difference lies near H0, one would accept H0. The farther the difference is from H0, the less probable H0 would be. If the probability of H0 becomes very small (less than the specified type 1 error risk 2α, being α in either tail of the curve), one would reject H0. The sample distribution curves show some overlap. A large overlap will result in a considerable risk of interpretation error; in particular the type 2 error risk may be substantial, as indicated in the figure. An important issue would be to reduce the type 2 error risk β (and increase the power 1 − β) to a reasonable level. Three ways of doing that are shown in (b–d), (a) being a reference situation. (b) Isolated increase of 2α will decrease β and increase power. Conversely, isolated decrease of 2α will increase β and decrease power. (c) Isolated narrowing of the sample distribution curves, by increasing sample size 2N and/or decreasing variance of the difference S², will decrease β and increase power. Conversely, isolated widening of the sample distribution curves, by decreasing sample size and/or increasing variance of the difference, will increase β and decrease power. (d) Isolated increase of Δ (larger therapeutic effect) will decrease β and increase power. Conversely, isolated decrease of Δ (smaller therapeutic effect) will increase β and decrease power.

The values used for 2α, β and Δ should be decided by the researcher, not by the statistician.
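The sample size formula above can be tried out numerically. The following is a minimal Python sketch, not the author's own calculation: the function names and the example inputs (Δ = 0.5, S² = 1.0) are hypothetical, and the Z-values are taken from the standard library's `statistics.NormalDist` (Python 3.8+) rather than from Table 1. The last function solves the same formula for Zβ to show the power implied by a given N.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def z_upper(tail_probability: float) -> float:
    """Return the Z-value whose upper-tail probability equals tail_probability."""
    return STD_NORMAL.inv_cdf(1.0 - tail_probability)


def z_type1(type1_risk: float, two_sided: bool) -> float:
    """Z_2alpha (risk split over both tails) or Z_alpha (risk in one tail)."""
    return z_upper(type1_risk / 2.0 if two_sided else type1_risk)


def sample_size_per_group(delta: float, variance: float,
                          type1_risk: float = 0.05, beta: float = 0.20,
                          two_sided: bool = True) -> int:
    """N = (Z_2alpha + Z_beta)^2 * S^2 / Delta^2, rounded up to whole patients."""
    z_sum = z_type1(type1_risk, two_sided) + z_upper(beta)
    return ceil(z_sum ** 2 * variance / delta ** 2)


def power_for_n(n_per_group: int, delta: float, variance: float,
                type1_risk: float = 0.05, two_sided: bool = True) -> float:
    """Power 1 - beta implied by the same formula solved for Z_beta."""
    z_beta = delta * sqrt(n_per_group / variance) - z_type1(type1_risk, two_sided)
    return STD_NORMAL.cdf(z_beta)


if __name__ == "__main__":
    # Hypothetical inputs: least relevant difference 0.5, variance S^2 = 1.0.
    n = sample_size_per_group(delta=0.5, variance=1.0)
    print("patients per group:", n, "total:", 2 * n)
    print("implied power:", round(power_for_n(n, 0.5, 1.0), 3))
```

Lowering β or Δ in this sketch raises N, and switching to one-sided testing lowers it, which mirrors the relationships illustrated in Fig. 1.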
The values chosen should take into account the disease, its stage, the effectiveness and side effects of the control therapy and an estimate of how much extra effect may reasonably be expected from the new therapy.

If, for example, the disease is rather benign with a relatively good prognosis and the new therapy is more expensive and may have more side effects than a rather effective control therapy, one should specify a relatively larger Δ and β and a smaller 2α, because the new therapy would only be interesting if it is markedly better than the control therapy.

If, on the other hand, the disease is aggressive and the new therapy is less expensive or may have fewer side effects than a not very effective control therapy, one should specify a relatively smaller Δ and β and a larger 2α, because the new therapy would be interesting even if it is only slightly better than the control therapy.

As mentioned above, 2α would normally be specified to 5% or 0.05, but values of 0.10 or 0.01 may be justified in situations such as those just described. β would normally be specified to 0.10–0.20, but in special situations a higher or lower value may be justified. Δ should be decided on clinical grounds as the least relevant therapeutic gain of the new therapy considering the disease and its prognosis, the efficacy of the control therapy and what may reasonably be expected of the new therapy. Preliminary data from pilot studies or historical observational data can be guidelines for the choice of Δ. Even if it may be tempting to specify a relatively large Δ, as fewer patients will then be needed, Δ should never be specified larger than what is biologically reasonable. It will always be unethical to perform trials with unrealistic aims.

Fig. 1 illustrates the effects on the type 2 error risk β, and hence also on the power (1 − β), of changing 2α, N, S² and Δ. Thus β will be decreased and the power 1 − β will be increased if 2α is increased (Fig. 1b), if the sample size is increased (Fig. 1c), and if Δ is increased (Fig. 1d).

The estimated sample size should be increased in proportion to the expected loss of patients during follow-up due to drop-outs and withdrawals.

Table 1
Abbreviated table of the standardized normal distribution (adapted for this paper)

Two-sided probability    One-sided probability
Z2α     2α               Zα or Zβ   α or β       Zα or Zβ   α or β
3.72    0.0002           3.72       0.0001        0.00      0.50
3.29    0.001            3.29       0.0005       −0.13      0.55
3.09    0.002            3.09       0.001        −0.25      0.60
2.58    0.01             2.58       0.005        −0.39      0.65
2.33    0.02             2.33       0.010        −0.52      0.70
1.96    0.05             1.96       0.025        −0.67      0.75
1.64    0.1              1.64       0.05         −0.84      0.80
1.28    0.2              1.28       0.10         −1.04      0.85
1.04    0.3              1.04       0.15         −1.28      0.90
0.84    0.4              0.84       0.20         −1.64      0.95
0.67    0.5              0.67       0.25         −1.96      0.975
0.52    0.6              0.52       0.30         −2.33      0.990
0.39    0.7              0.39       0.35         −2.58      0.995
0.25    0.8              0.25       0.40         −3.09      0.999
0.13    0.9              0.13       0.45         −3.29      0.9995
0.00    1.0              0.00       0.50         −3.72      0.9999

Note. The total area under the normal distribution curve is one. The area under a given part of the curve gives the probability of an observation being in that part. The y-axis indicates the "probability density", which is highest in the middle of the curve and decreases in either direction toward the tails of the curve. The normal distribution is symmetric, i.e. the probability from Z to +∞ (right side of the table) is the same as from −Z to −∞. The right side of the table gives the one-sided probability from a given Z-value on the x-axis to +∞. The left side of the table gives the two-sided probability as the sum of the probability from a given positive Z-value to +∞ and the probability from the corresponding negative Z-value to −∞.

2.2. The confidence interval

An important concept indicating the confidence of the result obtained in an RCT is the width of the confidence interval of the difference D in effect between the therapies investigated [1,2]. The narrower the confidence interval, the more reliable the result. In general the width of the confidence interval is determined by the sample size. A large sample size would result in a narrow confidence interval. Normally the 95% confidence interval would be estimated. The 95% confidence interval is the interval which would on average include the true difference in 95 out of 100 similar studies. This is illustrated in Fig. 2, where 100 trial samples of the same size have been randomly drawn from the same population. It is important to note that in 5 of the 100 samples the 95% confidence interval of the difference in effect D does not include the true difference found in the population. When the sorted confidence intervals are aligned to their middle (Fig. 2c), the variation in relation to the true value in the population becomes even clearer. If simulation is carried out on an even greater scale, the likelihood distribution of the true difference in the population, given the results from a certain trial sample, will follow a normal distribution like that presented in Fig. 3 [2]. It is seen that the likelihood of the true difference in the population is maximum at the difference D found in the sample and that it decreases with higher and lower values. The figure also illustrates the 95% confidence interval, which is the interval that includes the middle 95% of the total likelihood area under the normal curve. This area can be calculated from the difference D and its standard error SED. To be surer that the true difference is included in the confidence interval, one may calculate a 99% confidence interval, which would be wider, since it should include the middle 99% of the total likelihood area.

2.3. The type 2 error risk of having overlooked a difference Δ

If the 95% confidence interval of D includes zero, then there is no significant difference in effect between the two therapies.
However, this does not mean that one can conclude that the effects of the therapies are the same. There may still be a true difference in effect between the therapies, which the RCT has just not been able to detect, e.g. because of insufficient sample size and power. The risk of having overlooked a certain difference in effect of Δ between the therapies is the type 2 error risk β. In some cases this risk may be substantial. Example 1 gives an illustration of this.

Example 1. In naïve cases of chronic hepatitis C genotype 1, pegylated interferon plus ribavirin for 3 months induces a sustained virologic response in about 40%. One wishes to test if a new therapeutic regimen can increase the sustained response in this type of patients to

Fig. 2. Illustration of the variation of confidence limits in random samples (computer simulation). (a) Ninety-five percent confidence intervals in 100 random samples of the same size from the same population, aligned according to the true value in the population. In 5 of the samples the 95% confidence interval does not include the true value found in the population. (b) The same confidence intervals are here sorted according to their values. (c) When the sorted confidence intervals are aligned to their middle, their variation in relation to the true value in the population is again clearly seen. This presentation corresponds to how investigators would see the world. They investigate samples in order to extrapolate the findings to the population. However, the potential imprecision of extrapolating from a sample to the population is apparent, especially if the confidence interval is wide. Thus keeping confidence intervals rather narrow is important. This would mean relatively large trials.

Fig. 3. (a) Histogram showing the distribution of the true difference in the population in relation to the difference D found in the trial sample (computer simulation of 10,000 samples). (b) The normally distributed likelihood curve of the true difference in the population in relation to the difference D found in a trial sample. The 95% confidence interval (CI) is shown.
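The coverage property described in Section 2.2 and Fig. 2 can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions and is not the simulation behind Fig. 2: the outcome is assumed to be normally distributed, the population values (true difference 0.5, standard deviation 1.0, 50 patients per group) are hypothetical, and the interval is computed with the normal approximation as D ± Z × SED.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def simulate_coverage(true_diff: float = 0.5, sd: float = 1.0,
                      n_per_group: int = 50, n_trials: int = 100,
                      level: float = 0.95, seed: int = 1) -> float:
    """Fraction of simulated trials whose confidence interval contains true_diff."""
    rng = random.Random(seed)
    # Two-sided Z-value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered = 0
    for _ in range(n_trials):
        # Hypothetical population: control mean 0, treated mean true_diff.
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        d = fmean(treated) - fmean(control)  # observed difference D
        # Standard error of the difference (SED) from the two sample variances.
        sed = sqrt(variance(treated) / n_per_group + variance(control) / n_per_group)
        if d - z * sed <= true_diff <= d + z * sed:
            covered += 1
    return covered / n_trials


if __name__ == "__main__":
    print("coverage in 100 trials:", simulate_coverage())
    print("coverage in 2000 trials:", simulate_coverage(n_trials=2000))
```

With only 100 simulated trials the number of intervals that miss the true difference varies from run to run around 5; increasing `n_trials` brings the observed coverage closer to the nominal 95%.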