Review
Methodology of superiority vs. equivalence trials and non-inferiority trials
Erik Christensen*
Clinic of Internal Medicine I, Bispebjerg University Hospital, Bispebjerg Bakke 23, DK-2400 Copenhagen NV, Copenhagen, Denmark
The randomized clinical trial (RCT) is generally accepted as the best method of comparing effects of therapies. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. they are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to or not inferior to an established therapy, i.e. they are planned and performed as equivalence trials or non-inferiority trials. Since the types of trials have different aims, they differ significantly in various methodological aspects. The awareness of the methodological differences is generally quite limited. This paper reviews the methodology of these types of trials with special reference to differences in respect to planning, performance, analysis and reporting of the trial. In this context the relevant basal statistical concepts are reviewed. Some of the important points are illustrated by examples.
© 2007 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.
1. Introduction
The randomized clinical trial (RCT) is generally accepted as the best method of comparing effects of therapies [1,2]. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. they are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to or not inferior to an established therapy, i.e. they are planned and performed as equivalence trials or non-inferiority trials [3]. Since these types of trials have different aims, they differ significantly in various methodological aspects [4]. The awareness of the methodological differences is generally quite limited. For example it is a rather common belief that failure of finding a significant difference between therapies in a superiority trial implies that the therapies have the same effect or are equivalent [5–10]. However, such a conclusion is not correct because of a considerable risk of overlooking a clinically relevant effect due to insufficient sample size.
The purpose of this paper is to review the methodology of the different types of trials, with special reference to differences in respect to planning, performance, analysis and reporting of the trial. In this context the relevant basal statistical concepts will be reviewed. Some of the important points will be illustrated by examples.
2. Superiority trials
2.1. Sample size estimation and power of an RCT
An important aspect in the planning of any RCT is to estimate the number of patients necessary i.e. the sample size. The various types of trials differ in this respect [1,2,11]. A superiority trial aims to demonstrate the superiority of a new therapy compared to an established therapy or placebo. The following description applies to a superiority trial. The features, by which an equivalence or a non-inferiority trial differ, will be described later.
0168-8278/$32.00 © 2007 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.jhep.2007.02.015
* Tel.: +45 3531 2854; fax: +45 3531 3556. E-mail address: ec05@bbh.hosp.dk
www.elsevier.com/locate/jhep
Journal of Hepatology 46 (2007) 947–954
To estimate the sample size one needs to consider some important aspects described in the following.
By how much should the new therapy be better than the reference therapy? This extra effect of the new compared to the reference therapy is called the Least Relevant Difference or the Clinical Significance. It is often denoted by the Greek letter $\Delta$ (Fig. 1).
By how much would the difference in effect between the two groups be influenced by random factors? Like any other biological measurement a treatment effect is subject to a considerable "random" variation, which needs to be determined and taken into account. The magnitude of the variation is described in statistical terms by the standard deviation $S$ or the variance $S^2$ (see Fig. 1c). The variance of the effect variable would need to be obtained from a pilot study or from previously published similar studies. The trial should demonstrate as precisely as possible the true difference in effect between the treatments. However, because of the random variation the final result of the trial may deviate from the true difference and give erroneous results. If for example the null hypothesis $H_0$ of no difference were true, it could still be that the trial in some cases would show a difference. This type of error – the type 1 error ("false positive") (Fig. 1) – would have the consequence of introducing an ineffective therapy. If on the other hand the alternative hypothesis $H_\Delta$ of the difference being $\Delta$ were true, the trial could in some cases fail to show a difference. This type of error – the type 2 error ("false negative") (Fig. 1) – would have the consequence of rejecting an effective therapy.
Thus one needs to specify how large risks of type 1 and type 2 errors would be acceptable for the trial. Ideally the type 1 and type 2 error risks should be near zero, but this would need extremely large trials. Limited resources and patient numbers make it necessary to accept some small risk of type 1 and 2 errors.
Most often the type 1 error risk $\alpha$ would be specified to 5%. In this paper, $\alpha$ means the type 1 error risk in one direction i.e. either up or down from $H_0$ i.e. $\alpha$ = 5%. However, in many situations one would be interested in detecting both beneficial and harmful effects of the new therapy compared to the control therapy, i.e. one would be interested in "two-sided" testing for a difference in both "upward" and "downward" direction (Fig. 1). Hence we would instead specify the type 1 error risk to be $2\alpha$ (i.e. $\alpha$ upwards + $\alpha$ downwards), i.e. $2\alpha$ = 5%.
The type 2 error risk $\beta$ would normally be specified to 10–20%. Since a given value of $\Delta$ is always either above or below zero ($H_0$), the type 2 error risk $\beta$ is always one-sided. The smaller $\beta$, the larger the complementary probability $1 - \beta$ of accepting $H_\Delta$ when it is in fact true. $1 - \beta$ is called the power of the trial because it states the probability of finding $\Delta$ if this difference truly exists.
From given values of $\Delta$, $S^2$, $\alpha$ and $\beta$ the needed number ($N$) of patients in each group can be estimated using this relatively simple general formula:
Fig. 1. Illustration of factors influencing the sample size of a trial. The effect difference found in a trial will be subject to random variation. The variation is illustrated by bell-shaped normal distribution curves for a difference of zero corresponding to the null hypothesis ($H_0$) and for a difference of $\Delta$ corresponding to the alternative hypothesis ($H_\Delta$), respectively. Defined areas under the curves indicate the probability of a given difference being compatible with $H_0$ or $H_\Delta$, respectively. If the difference lies near $H_0$, one would accept $H_0$. The farther the difference would be from $H_0$, the less probable $H_0$ would be. If the probability of $H_0$ becomes very small (less than the specified type 1 error risk $2\alpha$, being $\alpha$ in either tail of the curve) one would reject $H_0$. The sample distribution curves show some overlap. A large overlap will result in considerable risk of interpretation error, in particular the type 2 error risk may be substantial as indicated in the figure. An important issue would be to reduce the type 2 error risk $\beta$ (and increase the power $1 - \beta$) to a reasonable level. Three ways of doing that are shown in (b–d), (a) being a reference situation. (b) Isolated increase of $2\alpha$ will decrease $\beta$ and increase power. Conversely, isolated decrease of $2\alpha$ will increase $\beta$ and decrease power. (c) Isolated narrowing of the sample distribution curves – by increasing sample size $2N$ and/or decreasing variance of the difference $S^2$ – will decrease $\beta$ and increase power. Conversely, isolated widening of the sample distribution curves – by decreasing sample size and/or increasing variance of the difference – will increase $\beta$ and decrease power. (d) Isolated increase of $\Delta$ – larger therapeutic effect – will decrease $\beta$ and increase power. Conversely, isolated decrease of $\Delta$ – smaller therapeutic effect – will increase $\beta$ and decrease power.
$$N = \frac{(Z_{2\alpha} + Z_{\beta})^2 \, S^2}{\Delta^2},$$
where $Z_{2\alpha}$ and $Z_\beta$ are the standardized normal deviates corresponding to the levels of the defined values of $2\alpha$ (Table 1, left), and $\beta$ (Table 1, right), respectively. If for some reason one wants to test for difference in only one direction ("one-sided" testing) one should replace $Z_{2\alpha}$ with $Z_\alpha$ in the formula and apply the right side of Table 1. The formula is approximate, but it gives in most cases a good estimation of the necessary number of patients. For a trial with two parallel groups of equal size the total sample size will be $2N$.
The values used for $2\alpha$, $\beta$ and $\Delta$ should be decided by the researcher, not by the statistician. The values chosen should take into account the disease, its stage, the effectiveness and side effects of the control therapy and an estimate of how much extra effect may be reasonably expected by the new therapy.
If for example the disease is rather benign with a relatively good prognosis and the new therapy is more expensive and may have more side effects than a rather effective control therapy, one should specify a relatively larger $\Delta$ and $\beta$ and a smaller $2\alpha$, because the new therapy would only be interesting if it is markedly better than the control therapy.
If on the other hand the disease is aggressive, the new therapy is less expensive or may have fewer side effects than a not very effective control therapy, one should specify a relatively smaller $\Delta$ and $\beta$ and a larger $2\alpha$, because the new therapy would be interesting even if it is only slightly better than the control therapy.
As mentioned above $2\alpha$ would normally be specified to 5% or 0.05, but one may justify values of 0.10 or 0.01 in situations such as those just described. $\beta$ would normally be specified to 0.10–0.20, but in special situations a higher or lower value may be justified. $\Delta$ should be decided on clinical grounds as the least relevant therapeutic gain of the new therapy considering the disease and its prognosis, the efficacy of the control therapy and what may reasonably be expected of the new therapy. Preliminary data from pilot studies or historical observational data can be guidelines for the choice of $\Delta$. Even if it may be tempting to specify a relatively large $\Delta$ as fewer patients will then be needed, $\Delta$ should never be specified larger than what is biologically reasonable. It will always be unethical to perform trials with unrealistic aims. Fig. 1 illustrates the effects on the type 2 error risk $\beta$ and hence also on the power ($1 - \beta$) of changing $2\alpha$, $N$, $S^2$ and $\Delta$. Thus $\beta$ will be decreased and the power $1 - \beta$ will be increased if $2\alpha$ is increased (Fig. 1b), if the sample size is increased (Fig. 1c), and if $\Delta$ is increased (Fig. 1d).
The estimated sample size should be increased in proportion to the expected loss of patients during follow-up due to drop-outs and withdrawals.
2.2. The confidence interval
An important concept indicating the confidence of the result obtained in an RCT is the width of the confidence interval of the difference $D$ in effect between the therapies investigated [1,2]. The narrower the confidence
Table 1
Abbreviated table of the standardized normal distribution (adapted for this paper)

| Two-sided probability | | One-sided probability | | | |
| $Z_{2\alpha}$ | $2\alpha$ | $Z_\alpha$ or $Z_\beta$ | $\alpha$ or $\beta$ | $Z_\alpha$ or $Z_\beta$ | $\alpha$ or $\beta$ |
|------|--------|------|--------|-------|--------|
| 3.72 | 0.0002 | 3.72 | 0.0001 | 0.00 | 0.50 |
| 3.29 | 0.001 | 3.29 | 0.0005 | −0.13 | 0.55 |
| 3.09 | 0.002 | 3.09 | 0.001 | −0.25 | 0.60 |
| 2.58 | 0.01 | 2.58 | 0.005 | −0.39 | 0.65 |
| 2.33 | 0.02 | 2.33 | 0.010 | −0.52 | 0.70 |
| 1.96 | 0.05 | 1.96 | 0.025 | −0.67 | 0.75 |
| 1.64 | 0.1 | 1.64 | 0.05 | −0.84 | 0.80 |
| 1.28 | 0.2 | 1.28 | 0.10 | −1.04 | 0.85 |
| 1.04 | 0.3 | 1.04 | 0.15 | −1.28 | 0.90 |
| 0.84 | 0.4 | 0.84 | 0.20 | −1.64 | 0.95 |
| 0.67 | 0.5 | 0.67 | 0.25 | −1.96 | 0.975 |
| 0.52 | 0.6 | 0.52 | 0.30 | −2.33 | 0.990 |
| 0.39 | 0.7 | 0.39 | 0.35 | −2.58 | 0.995 |
| 0.25 | 0.8 | 0.25 | 0.40 | −3.09 | 0.999 |
| 0.13 | 0.9 | 0.13 | 0.45 | −3.29 | 0.9995 |
| 0.00 | 1.0 | 0.00 | 0.50 | −3.72 | 0.9999 |

Note. The total area under the normal distribution curve is one. The area under a given part of the curve gives the probability of an observation being in that part. The y-axis indicates the "probability density", which is highest in the middle of the curve and decreases in either direction toward the tails of the curve. The normal distribution is symmetric, i.e. the probability from $Z$ to plus infinity (right side of the table) is the same as from $-Z$ to $-\infty$. The right side of the table gives the one-sided probability from a given $Z$-value on the x-axis to $+\infty$. The left side of the table gives the two-sided probability as the sum of the probability from a given positive $Z$-value to $+\infty$ and the probability from the corresponding negative $Z$-value to $-\infty$.
interval would be, the more reliable the result would be. In general the width of the confidence interval is determined by the sample size. A large sample size would result in a narrow confidence interval. Normally the 95% confidence interval would be estimated. The 95% confidence interval is the interval, which would on average include the true difference in 95 out of 100 similar studies. This is illustrated in Fig. 2 where 100 trial samples of the same size have been randomly drawn from the same population. It is important to note that in 5 of the 100 samples the 95% confidence interval of the difference in effect $D$ does not include the true difference found in the population. When the sorted confidence intervals are aligned to their middle (Fig. 2c), the variation in relation to the true value in the population becomes even clearer. If simulation is carried out on an even greater scale, the likelihood distribution of the true difference in the population, given the results from a certain trial sample, will follow a normal distribution like that presented in Fig. 3 [2]. It is seen that the likelihood of the true difference in the population is maximum at the difference $D$ found in the sample and that it decreases with higher and lower values. The figure also illustrates the 95% confidence interval, which is the interval that includes the middle 95% of the total likelihood area under the normal curve. This area can be calculated from the difference $D$ and its standard error SED. To be surer that the true difference is included in the confidence interval, one may calculate a 99% confidence interval, which would be wider, since it should include the middle 99% of the total likelihood area.
2.3. The type 2 error risk of having overlooked a difference $\Delta$
If the 95% confidence interval of $D$ includes zero, then there is no significant difference in effect between the two therapies. However, this does not mean that one can conclude that the effects of the therapies are the same. There may still be a true difference in effect between the therapies, which the RCT has just not been able to detect e.g. because of insufficient sample size and power. The risk of having overlooked a certain difference in effect of $\Delta$ between the therapies is the type 2 error risk $\beta$. In some cases this risk may be substantial. Example 1 gives an illustration of this.
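As a rough numerical companion to this point, the type 2 error risk for a trial of a given size can be approximated by rearranging the sample size formula of Section 2.1 for $Z_\beta$. The sketch below is an assumption-laden illustration, not the paper's own calculation: the function name and the input values are hypothetical, and the approximation ignores the very small chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_risk(n_per_group, delta, variance, two_alpha=0.05):
    """Approximate beta for overlooking a true difference delta."""
    std_normal = NormalDist()
    z_2alpha = std_normal.inv_cdf(1 - two_alpha / 2)
    # Rearranged from N = (Z_2alpha + Z_beta)^2 * S^2 / Delta^2.
    z_beta = delta * sqrt(n_per_group / variance) - z_2alpha
    power = std_normal.cdf(z_beta)
    return 1 - power


# Hypothetical inputs: Delta = 1, S^2 = 4, two-sided type 1 error risk of 5%.
beta_large = type2_error_risk(n_per_group=32, delta=1.0, variance=4.0)
beta_small = type2_error_risk(n_per_group=10, delta=1.0, variance=4.0)
print(round(beta_large, 2), round(beta_small, 2))
```

Under these assumed values the smaller trial carries a much larger risk of overlooking the same difference, which is the situation the text warns about.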
Example 1. In naïve cases of chronic hepatitis C genotype 1 pegylated interferon plus ribavirin for 3 months induces sustained virologic response in about 40%. One wishes to test if a new therapeutic regimen can increase the sustained response in this type of patients to
Fig. 2. Illustration of the variation of confidence limits in random samples (computer simulation). (a) Ninety-five percent confidence intervals in 100 random samples of same size from the same population aligned according to the true value in the population. In 5 of the samples the 95% confidence interval does not include the true value found in the population. (b) The same confidence intervals are here sorted according to their values. (c) When the sorted confidence intervals are aligned to their middle, their variation in relation to the true value in the population is again clearly seen. This presentation corresponds to how investigators would see the world. They investigate samples in order to extrapolate the findings to the population. However, the potential imprecision of extrapolating from a sample to the population is apparent – especially if the confidence interval is wide. Thus keeping confidence intervals rather narrow is important. This would mean relatively large trials.
Fig. 3. (a) Histogram showing the distribution of the true difference in the population in relation to the difference $D$ found in the trial sample (computer simulation of 10,000 samples). (b) The normally distributed likelihood curve of the true difference in the population in relation to the difference $D$ found in a trial sample. The 95% confidence interval (CI) is shown.