## Why Equivalence Studies Are So Fascinating

**Objectives and talking points**:

- To recap basic concepts of hypothesis testing in scientific experiments. Readers should read up on hypothesis testing in reference works.
- To contrast drug vs. placebo and drug vs. standard drug study designs.
- To contrast non-equivalence and equivalence studies.
- To understand implications of these study designs, in terms of interpreting study results.

——————————————————————————————————–

Howdy readers! Today I’m going to share with you some very interesting concepts from a fabulous book that I finished recently, “Designing Clinical Research: An Epidemiologic Approach” by Stephen Hulley et al. Fairly early on, the book discusses what are called “equivalence studies”. Equivalence studies are truly fascinating. Let’s see how.

There are multiple ways to test a new drug for efficacy.

**A Non-equivalence Study Of Drug vs. Placebo**

A drug can be compared to something that doesn’t have any treatment effect whatsoever – a ‘placebo’. Examples of placebos include sugar tablets, distilled water, inert substances, etc. Because pharmaceutical companies try hard to make drugs that have a treatment effect and that are thus different from placebos, the objective of such a comparison is to answer the following question:

Is the new drug **any different** from the placebo?

Note the emphasis on ‘any different’. As is usually the case, a study of this kind is designed to test for differences between drug and placebo effects in both directions^{1}. That is:

Is the new drug better than the placebo?

OR

Is the new drug worse than the placebo?

The boolean operator ‘OR’ is key here.

Since we cannot conduct such an experiment on all people in the *target ‘population’* (e.g. all people with diabetes in the whole country), we conduct it on a random and representative *‘sample’ of this population* (e.g. randomly selected diabetes patients from the whole country). Because of this, we cannot directly extrapolate our findings to the target population without doing some fancy roundabout thinking and a lot of voodoo first, a.k.a. ‘hypothesis testing’. Hypothesis testing is crucial for taking into account random chance (error) effects that might have crept into the experiment.

In this experiment:

- The **null hypothesis** is that the drug and the placebo DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the drug and the placebo DO differ in the real world.

So off we go, with our experiment with an understanding that our results might be influenced by random chance (error) effects. Say that, before we start, we take the following error rates to be acceptable:

- Even **if the null hypothesis is true** in the real world, we would find that the drug and the placebo DO NOT differ *only* 95% of the time. [Although this rate doesn't have a name, it is equal to (1 - Type 1 error)].
- Even **if the null hypothesis is true** in the real world, we would find that the drug and the placebo DO differ 5% of the time, purely by random chance. [This rate is also called our **Type 1 error**, or *critical level of significance*, or *critical α level*, or *critical 'p' value*].
- Even **if the alternative hypothesis is true** in the real world, we would find that the drug and the placebo DO differ *only* 80% of the time. [This rate is also called the '**Power**' of the experiment. It is equal to (1 - Type 2 error)].
- Even **if the alternative hypothesis is true** in the real world, we would find that the drug and the placebo DO NOT differ 20% of the time, purely by random chance. [This rate is also called our **Type 2 error**].
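To make these four rates a little more concrete, here is a minimal Python sketch. It is only an illustration under assumptions that are mine and not the book's: a two-sided z-test on a difference in means, a known standard deviation shared by both groups, equal group sizes, and made-up numbers. The function name `rejection_probability` is hypothetical too.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rejection_probability(true_diff, sd, n_per_group, alpha=0.05):
    """Chance that a two-sided z-test rejects the null hypothesis.

    true_diff   : real-world difference in mean effect (drug minus placebo)
    sd          : outcome standard deviation, assumed known and equal in both groups
    n_per_group : number of subjects in each arm
    alpha       : accepted Type 1 error rate
    """
    standard_error = sd * (2 / n_per_group) ** 0.5
    critical_z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_diff / standard_error
    # We reject when the observed z-score lands in either tail.
    upper_tail = 1 - STD_NORMAL.cdf(critical_z - shift)
    lower_tail = STD_NORMAL.cdf(-critical_z - shift)
    return upper_tail + lower_tail


# Null hypothesis true (no real difference): we still reject at the rate alpha.
type_1_error = rejection_probability(true_diff=0.0, sd=1.0, n_per_group=63)

# Alternative hypothesis true (a real difference of half a standard deviation).
power = rejection_probability(true_diff=0.5, sd=1.0, n_per_group=63)
type_2_error = 1 - power

print(f"Type 1 error: {type_1_error:.3f}")
print(f"Power:        {power:.3f}")
print(f"Type 2 error: {type_2_error:.3f}")
```

With no real difference, the rejection probability is simply the Type 1 error we agreed to accept. With a real difference, the same function gives the Power, and whatever is left over is the Type 2 error.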

The strategy of the experiment is this:

If we are able to accept these error rates and show in our experiment that the null hypothesis is false (that is, ‘**reject**’ it), the only other hypothesis left on the table is the alternative hypothesis. This has then GOT to be true, and we thus ‘accept’ the alternative hypothesis.

Q: With what degree of uncertainty?

A: With the uncertainty that we might arrive at such a conclusion 5% of the time, even if the null hypothesis is true in the real world.

Q: In English please!

A: With the uncertainty that we might arrive at a conclusion that the drug DOES differ from the placebo 5% of the time, even if the drug DOES NOT differ from the placebo in the real world.

Our next question would be:

Q: How do we reject the null hypothesis?

A: We proceed by initially assuming that the null hypothesis is true in the real world (i.e. Drug effect DOES NOT differ from Placebo effect in the real world). We then use a ‘*test of statistical significance*’ to calculate the probability of observing, under that assumption, a difference in treatment effect **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis. We do this with the belief that such a conclusion is within our pre-selected margin of error. Our pre-selected margin of error, as mentioned previously, is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘**fail to reject**’ the null hypothesis and conclude that a difference in effect has not been proven^{4}.
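The reject / fail-to-reject decision described above can be sketched in a few lines of Python. This is a rough illustration and not the book's method: I am assuming a two-sided z-test on a difference in means with a known, shared standard deviation and equal group sizes. The trial numbers and the name `two_sided_p_value` are invented for the example.

```python
from statistics import NormalDist


def two_sided_p_value(mean_drug, mean_placebo, sd, n_per_group):
    """P-value of a two-sided z-test for a difference in means.

    Assumes (for illustration) a known standard deviation shared by both
    groups and equal group sizes.
    """
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = (mean_drug - mean_placebo) / standard_error
    # Probability, if the null hypothesis is true, of a difference at least
    # this large in either direction (drug better OR drug worse).
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # pre-selected Type 1 error rate

# Hypothetical trial: mean fall in systolic BP (mmHg) in each arm.
p = two_sided_p_value(mean_drug=12.0, mean_placebo=8.0, sd=10.0, n_per_group=50)
if p < ALPHA:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(f"p = {p:.4f}: {decision}")
```

Notice that the function takes the absolute value of the z-score and doubles one tail. That is the ‘OR’ from earlier in action: a drug that is better and a drug that is worse by the same amount give the same p-value.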

A lot of scientific literature out there is riddled with drug vs. placebo studies. This kind of thing is good if we do not already have an effective drug for our needs. Usually though, we already have a standard drug that we know works well. It is of more interest to see how a new drug compares to our standard drug.

**A Non-equivalence Study Of Drug vs. Standard Drug**

These studies are conceptually the same as drug vs. placebo studies and the same reasoning for inference is applied. These studies ask the following question:

Is the new drug **any different** than the standard drug?

Note the emphasis on ‘any different’. As is often the case, a study of this kind is designed to test the difference between the two drugs in both directions^{1}. That is:

Is the new drug better than the standard drug?

OR

Is the new drug worse than the standard drug?

Again, the boolean operator ‘OR’ is key here.

In this kind of experiment:

- The **null hypothesis** is that the new drug and the standard drug DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the new drug and the standard drug DO differ in the real world.

Exactly like we discussed before, we initially assume that the null hypothesis is true in the real world (i.e. the new drug’s effect DOES NOT differ from the standard drug’s effect in the real world). We then use a ‘*test of statistical significance*’ to calculate the probability of observing, under that assumption, a difference in treatment effect **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis, with the belief that such a conclusion is within our pre-selected margin of error. Just to repeat ourselves here, our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘fail to reject’ the null hypothesis and conclude that a difference in effect has not been proven^{4}.

**An Equivalence Study Of Drug vs. Standard Drug**

Sometimes all you want is a drug that is as good as the standard drug. This can be for various reasons: the standard drug is just too expensive, just too difficult to manufacture, just too difficult to administer, and so on. The new drug might not have these undesirable qualities and yet retain the same treatment effect.

In an equivalence study, the incentive is to prove that the two drugs are the same. Like we did before, let’s explicitly formulate our two hypotheses:

- The **null hypothesis** is that the new drug and the standard drug DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the new drug and the standard drug DO differ in the real world.

We are mainly interested in proving the null hypothesis. Since this can’t be done^{4}, we’ll be content with ‘failing to reject’ the null hypothesis. Our strategy is to design a study powerful enough to detect a difference close to 0 and then ‘fail to reject’ the null hypothesis. In doing so, although we can’t ‘prove’ for sure that the null hypothesis is true, we can nevertheless be more comfortable saying that it in fact is true.

In order to detect a difference close to 0, we have to increase the Power of the study from the usual 80% to something like 95% or higher. We want to maximize power to detect the smallest difference possible. Usually though, it’s enough if we are able to detect the largest difference that doesn’t have clinical meaning (e.g. a difference of 4 mmHg on a BP measurement). This way we can compromise a little on Power and choose a less extreme figure, say 88% or something.
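To get a feel for what this trade-off costs in subjects, here is a small sketch using the common normal-approximation sample size formula for comparing two means, n per group = 2 × ((z for α + z for Power) × SD / difference)². The formula choice, the 10 mmHg standard deviation and the name `subjects_per_group` are my own assumptions for illustration, not figures from the book.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(smallest_diff, sd, power=0.80, alpha=0.05):
    """Approximate subjects needed per arm for a two-sided z-test on means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # cut-off for the accepted Type 1 error
    z_power = z(power)          # extra margin needed to reach the desired Power
    return ceil(2 * ((z_alpha + z_power) * sd / smallest_diff) ** 2)


# Hypothetical numbers: BP readings with a 10 mmHg standard deviation.
for power in (0.80, 0.88, 0.95):
    for diff in (4.0, 1.0):
        n = subjects_per_group(smallest_diff=diff, sd=10.0, power=power)
        print(f"power {power:.0%}, detect {diff} mmHg: {n} per group")
```

Two things stand out when you run it. Raising the Power raises the number of subjects needed, and shrinking the difference you want to detect raises it much faster, because the difference sits in the denominator and gets squared. That is why settling for the largest clinically meaningless difference is such a practical compromise.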

And then just as in our previous examples, we proceed with the assumption that the null hypothesis is true in the real world. We then use a ‘*test of statistical significance*’ to calculate the probability of observing, under that assumption, a difference in treatment effect **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis, with the belief that such a conclusion is within our pre-selected margin of error. And to repeat ourselves yet again (boy, do we like doing this :-P ), our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘**fail to reject**’ the null hypothesis and conclude that although a difference in effect has not been proven, we can be reasonably comfortable saying that there is in fact no difference in effect.

**So Where Are The Gotchas?**

If your study isn’t designed or conducted properly (e.g. not enough power, inadequate sample size, improper randomization, loss of subjects to follow-up, inaccurate measurements, etc.), you might end up ‘failing to reject’ the null hypothesis, whereas had you taken the necessary precautions, you might have come to the opposite conclusion. Such flaws let random chance (error) effects swamp the result, and they usually dampen any obvious differences in treatment effect in the experiment.

In a **non-equivalence study**, researchers, whose incentive it is to reject the null hypothesis, are thus forced to make sure that their designs are rigorous.

In an **equivalence study**, this isn’t the case. Since researchers are motivated to ‘fail to reject’ the null hypothesis from the get go, it becomes an easy trap to conduct a study with all kinds of design flaws and very conveniently come to the conclusion that one has ‘failed to reject’ the null hypothesis!
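Here is a rough sketch of how easy that trap is to fall into. Suppose the two drugs really DO differ, and compare a careful study with two sloppy ones. The assumptions are mine: a two-sided z-test on means, a true difference of 4 mmHg, and inaccurate measurement modelled crudely as a doubled standard deviation. The function name `chance_of_failing_to_reject` and all the numbers are made up for illustration.

```python
from statistics import NormalDist


def chance_of_failing_to_reject(true_diff, sd, n_per_group, alpha=0.05):
    """Type 2 error rate of a two-sided z-test when a real difference exists."""
    nd = NormalDist()
    standard_error = sd * (2 / n_per_group) ** 0.5
    critical_z = nd.inv_cdf(1 - alpha / 2)
    shift = true_diff / standard_error
    # Power: probability that the observed z-score lands in either tail.
    power = (1 - nd.cdf(critical_z - shift)) + nd.cdf(-critical_z - shift)
    return 1 - power


# Hypothetical scenario: the two drugs truly differ by 4 mmHg.
careful = chance_of_failing_to_reject(true_diff=4.0, sd=10.0, n_per_group=165)
too_small = chance_of_failing_to_reject(true_diff=4.0, sd=10.0, n_per_group=20)
noisy = chance_of_failing_to_reject(true_diff=4.0, sd=20.0, n_per_group=165)

print(f"Careful study:       {careful:.0%} chance of failing to reject")
print(f"Too few subjects:    {too_small:.0%} chance of failing to reject")
print(f"Noisy measurements:  {noisy:.0%} chance of failing to reject")
```

In both sloppy scenarios the study ‘fails to reject’ the null hypothesis more often than not, even though the difference is real. A researcher hoping to show equivalence gets exactly the answer they wanted, for all the wrong reasons.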

Hence, it is extremely important, more so in equivalence studies than in non-equivalence studies, to have a critical and alert mind during all phases of the experiment. Interpreting an equivalence study published in a journal is hard, because one needs to know the very guts of everything the research team did!

Even though we have discussed these concepts with drugs as an example, you could apply the same reasoning to many other forms of treatment interventions.

Hope you’ve found this post interesting :-) . Do send in your suggestions, corrections and comments!

Adios for now!

Copyright © Firas MR. All rights reserved.

–

1. An alternative hypothesis for such a study is called a ‘*two-tailed alternative hypothesis*’. A study that tests for differences in only one direction has an alternative hypothesis that is called a ‘*one-tailed alternative hypothesis*’.

2. This situation is a good example of a ‘null’ hypothesis also being a ‘nil’ hypothesis. A null hypothesis is usually a nil hypothesis, but it’s important to realize that this isn’t always the case.

4. Note that we never use the term, ‘accept the null hypothesis’.

Your explanation of the null hypothesis for an equivalence study is incorrect. In fact, in an equivalence study the null hypothesis is that there IS a difference, vs. the alternative that there IS NOT a difference.

celeste, April 20, 2011 at 7:38 pm

Thanks for dropping by! Well, I’ve only re-phrased concepts from the book. I’m only a student and am no expert. Can you point me to your source?

Firas MR, May 1, 2011 at 2:44 pm