## Archive for the ‘**Statistics**’ Category

## Meeting Ghosts In The Chase For Reality

Watching the morning sun beaming through the clouds during today’s morning jog, I was struck by an epiphany. What ultimately transpired was a streak of thoughts that left me with an overwhelming sense of awe and humility at their profound implications.

Perhaps the rejuvenating air, the moist earth from the previous night’s rains and the scent of the fresh Golden Flamboyant trees lining my path made the sun’s splendor much more obvious to see. Like in a photograph coming to life, when objects elsewhere in the scene enhance the main subject’s impact.

As I gazed in its direction wondering about the sunspots that neither I nor anyone else around me could see (but that I knew were really there, from reading the work of astronomers), I began thinking about my own positional coordinates. So this was the East, I found. But how did I know that? Well as you might have guessed, from the age old phrase: “the sun rises in the East and sets in the West”. Known in Urdu as “سورج مشرق میں نکلتا ہے اور مغرب میں ڈوبتا ہے ” or in Hindi, “सूरज पूरव में निकलता है और पश्चिम में डूबता है” and indeed to be found in many other languages, we observe that man has come to form an interesting model to wrap his mind around this majestic phenomenon. Indeed, many religious scriptures and books of wisdom, from ancient history to the very present, find use of this phrase in their deep moral teachings.

But we’ve come to think that we know this model is not *really* “correct”, is it? We’ve come to develop this thinking with the benefit of hindsight (a relative term, given Einstein’s famous theory, by the way. One man’s hindsight could actually be another man’s foresight!). We’ve ventured beyond our usual abode and looked at our planet from a different vantage point – that of Space. From the Moon and satellites. The sun doesn’t *actually* rise or set. That experience occurs because of our peculiar vantage point – of relatively slow or immobile creatures grounded here on Earth. One could say that it is an interesting illusion. Indeed, you could sit on a plane and with the appropriate speed, chase that sliver of sunlight as the Sol (as it’s lovingly called by scientists) appears or disappears in the horizon, never letting it vanish from view and do so essentially indefinitely.

### Notes In The Margin About Language

Coming back, for a moment, to this amusing English phrase that helped me gauge my position, I thought about how language itself can shape one’s thinking. A subject matter upon which I’ve reflected before. There really comes a point when our models of the world and the universe get locked within the phraseology of a language that can actually reach the limits of its power of expression fairly unexpectedly. Speak in English and your view is different from somebody who can speak in Math. Even within Math, the coming about of algebra expanded the language’s power of expression incredibly from its meager beginnings. New models get incorporated into the lexicon of a language and because we tend to feed off of such phrases to make sense of ourselves and our universe, there is the potential for an inertia to develop, whereby it becomes easy to stay put with our abstractions of reality and not move on to radically new ones – models that are beyond the power of expression of a language and that haven’t yet been captured in its lexicon. In a way we find that models influence languages and languages themselves influence models and ultimately there is this interesting potential for a peculiar steady state to be reached – which may or may not be such a good thing.

So when it comes to this phenomenon, we’ve moved from one model to another. We began with “primitive” maxims. Perhaps during a time when people used to think of the Earth as flat and stars as pin-point objects too. And then progressed to geocentrism and then heliocentrism, both of which were basically formulated from careful and detailed observations of the sky (with telescopes arriving only later to refine the picture), long before the luxury of satellites and space travel came into being. And now that we see the Earth from this improved vantage point – of Space – our model for understanding reality has been refined. And actually, really shifted in profound ways.

So what does this all mean? It looks like reality is one thing, that exists out there. And we as humans make sense of reality through abstractions or models. How accurate we are with our abstractions really depends on how much information we’ve been able to gather. New information (through ever more detailed experiments or observations and indeed as Godel and Poincare showed, sometimes by mere pontification), drives us to alter our existing models. Sometimes in radically different ways (a classic example is our model of matter: one minute particle, one minute wave). There is this continuous flux about how we make sense of the cosmos, and it will likely go on this way until the day mankind has been fully informed – which may never really happen if pondered upon objectively. There have been moments in the past where man has thought that this precipice had been finally reached, that he was at last fully informed, only to realize with utter embarrassment that this was not the case. Can man ever know, by himself, that he has finally reached such a point? Especially, given that this is like a student judging his performance at an exam without the benefit of an independent evaluator? The truth is that we may never know. Whether we think we will ever reach such a precipice really does depend on a leap of faith. And scientists and explorers who would like to make progress, depend on this faith – that either such a precipice will one day be reached or at least that their next observation or experiment will increase them in information on the path to such a glorious point. When at last, a gestalt vision of all of reality can be attained. It’s hard to stay motivated otherwise, you see. And you thought you heard that faith had nothing to do with science or vice versa!

It is indeed quite remarkable the extent to which we get stuck in this or that model and keep fooling ourselves about reality. No sooner do we realize that we’ve been had, and move on from our old abstraction to a new one that we think is much better, than we are struck with another blow. This actually reminds me of a favorite quote by a stalwart of modern Medicine:

And not only are the reactions themselves variable, but we, the doctors, are so fallible, ever beset with the common and fatal facility of reaching conclusions from superficial observations, and constantly misled by the ease with which our minds fall into the rut of one or two experiences.

The phenomenon is really quite pervasive. The early cartographers who divided the world into various regions thought funny stuff by today’s standards. But you’ve got to understand that that’s how our forefathers modeled reality! And whether you like it or not someday many generations after our time, we will be looked upon with similar eyes.

Watching two interesting Royal Society lectures by Paul Nurse (The Great Ideas of Biology) and Eric Lander (Beyond The Human Genome Project: Medicine In The 21st Century) the other day, this thought kept coming back to me. Speaking about the advent of Genomic Medicine, Eric Lander (who trained as a mathematician, by the way) talked about the discovery of the EGFR gene and the realization that its mutations strongly increase the risk for a type of lung cancer called Adenocarcinoma. He mentioned how clinical trials of the drug *Iressa* – a drug whose mechanism of action scientists weren’t sure of yet but was nevertheless proposed as a viable option for lung adenocarcinomas – failed to show statistically significant differences from standard therapy. Well, that was because the trial’s subjects were members of the broad population of *all* lung adenocarcinoma cases. Many doctors realizing the lack of conclusive evidence of a greater benefit, felt no reason to choose *Iressa* over standard therapy and drastically shift their practice. Which is what Evidence-Based-Medical practice would have led them to do, really. But soon after the discovery of the EGFR gene, scientists decided to do a subgroup analysis using patients with EGFR mutations, and it was rapidly learned that *Iressa* did have a statistically significant effect in decreasing tumor progression and improving survival in this particular subgroup. A significant section of patients could now have hope for cure! And doctors suddenly began to prescribe *Iressa* as the therapy of choice for them.

As I was thinking about what Lander had said, I remembered that Probability Theory as a science, which forms the bedrock of such things as clinical trials and indeed many other scientific studies, had not even begun to develop until the sixteenth and seventeenth centuries, with the work of Cardano, Pascal and Fermat. At least, so far as we know. And modern, axiomatic probability theory really began much later, in the early 1900s.

You begin to realize what a quantum leap this was in our history. We now think of patterns and randomness very differently from ancient times. Which is pretty significant, given that for some reason our minds are drawn to looking for patterns even where there might not be any. Over the years, we’ve developed the understanding that clusters (patterns) of events or cases could occur in a random system just as in a non-random one. Indeed, such clusters (patterns) would be a fundamental defining characteristic of a random process. Absence of clusters would indicate that a process wasn’t truly random. Whether such clusters (patterns) would fit with a random process as opposed to a non-random one would depend on whether or not we find an even greater pattern of how these clusters are distributed. A cluster of cases (such as an epidemic of cholera) would be considered non-random if by hypothesis testing we found that the probability of such a cluster coming about by random chance was so small as to be negligible. And even when thinking about randomness, we’ve learned to ask ourselves if a random process could be pseudo-random as opposed to truly random – which can sometimes be a difficult thing to establish. So unlike our forefathers, we don’t immediately jump to conclusions about what look to our eyes as patterns. It’s all quite marvelous to think about, really. What’s even more fascinating, is that Probability Theory is in a state of flux and continues to evolve to this day, as mathematicians gather new information. So what does this mean for the validity of our models that depend on Probability Theory? If a model could be thought of as a chain, it is obvious that such a model would be as strong as the links with which it is made! So we find that statisticians keep finding errors in how old epidemiologic studies were conducted and interpreted. And the science of Epidemiology itself improves as Probability Theory is continuously polished. 
This goes to show the fact that the validity of our abstractions keeps shifting as the foundations upon which they are based themselves continue to transform. A truly intriguing idea when one thinks about it.
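
The two ideas in this paragraph, that streaks show up in purely random data and that a cluster is only called non-random when chance is a poor explanation, can be tried out in a few lines. What follows is a rough sketch, not a real epidemiological analysis: the coin flips stand in for any random process, the Poisson model is an assumption, and the figures (a district that usually sees 2 cases a month suddenly reporting 9) are made up for illustration.

```python
import math
import random

def longest_run(flips):
    """Length of the longest streak of identical outcomes in a sequence."""
    if not flips:
        return 0
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): the chance of k or more cases."""
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

# A fair coin still produces streaks that look like "patterns" to the eye.
rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [rng.choice("HT") for _ in range(200)]
print("longest streak in 200 fair flips:", longest_run(flips))

# Hypothetical cluster: 9 cases where the usual monthly average is 2.
p_value = poisson_tail(9, 2.0)
print(f"chance of 9 or more cases by luck alone: {p_value:.5f}")
```

A tail probability that small is what would lead one to treat the cluster as more than a fluke, whereas a long streak of heads in the coin flips needs no explanation beyond chance.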

Some other examples of the shifting of abstractions with the gathering of new information come to mind.

Like early cartographers, anatomists never really understood human anatomy very well back in the days of cutting open animals and extrapolating their findings to humans. There were these weird ideas that diseases were caused by a disturbance in the four humors. And then Vesalius came along and, by stressing the importance of dissecting cadavers, revolutionized how anatomy came to be understood and taught. But even then, our models for the human body were until recently plagued by ideas such as the concept that the seat of the soul lay in the pineal gland and some of the other stuff now popularly characterized as folk-medicine. In our models for disease causation, we’ve progressed over the years from looking at pure environmental factors to pure DNA factors and now to a multifactorial model that stresses the idea that many diseases are caused by a mix of the two.

The Monty Hall paradox, about which I’ve written before, is another good example. You’re presented with new information midway in the game and you use this new information to re-adjust the old model of reality that you had in your mind. The use of decision trees in genetic counseling is yet another example. Given new information about a patient’s relatives and their genotype, your model for what is real and its accuracy improves. You become better at diagnosis with each bit of new information.
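
For anyone who would rather see the Monty Hall result than take it on trust, here is a small simulation. It is a sketch of the standard three-door version of the game (the host knows where the car is and always opens a goat door); the function names and the number of trials are my own choices for illustration.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is still closed and not the first pick.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won when always switching or always staying."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying should win about a third of the time and switching about two thirds, which is exactly the re-adjustment of the old model that the host’s new information calls for.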

The phenomenon can often be found in how people understand Scripture too. Mathematician Gary Miller has an interesting article that describes how some scholars examining the word *Iram* have gradually transformed their thinking based on new information gathered by archaeological excavations.

So we see how abstractions play a fundamental role in our perceptions of reality.

One other peculiar thing to note is that sometimes, as we try to re-shape our abstractions to better agree with any new information we get, there is the tendency to stick with the old as much as possible. A nick here or a nudge there is acceptable but at heart we are usually loath to discard our old model entirely. There is a potential danger in this. Because it could be that we inherit flaws from our old model without even realizing it, thus constraining the new one in ways yet to be understood. Especially when we are unaware of what these flaws could be. A good example of abstractions feeding off of each other is the pairing of the space-time fabric of relativity theory with the jitteriness of quantum mechanics. In our quest for a new model – a unified theory or abstraction – we are trying to mash these two abstractions together in curious ways, such that a serene space-time fabric exists when zoomed out, but when zoomed in we should expect to see it behave erratically with jitters all over the place. Our manner of dealing with such inertia when it comes to building new abstractions is basically to see if these mash-ups agree with experiments or observations much better than our old models. Which is an interesting way to go about doing things and could be something to think about.

Listening to Paul Nurse’s lecture I also learned how Mendel chose Pea plants for his studies on inheritance rather than other complicated vegetation because of the simplicity and clarity with which one could distinguish their phenotypes, making the experiment much easier to carry out. Depending on how one crossed them, one could trace the inheritance of traits – of color of fruit, height of plant, etc. very quickly and very accurately. It actually reminded me of something I learned a long time ago about the various kinds of data in statistics. That these data could be categorized into various types based on the amount of information they contain. The highest amount of information is seen in Ratio data. The lowest is seen in Nominal data. The implication of this is that the more your experiment or scientific study uses Ratio data rather than Nominal data, the more accurate will your inferences about reality be. The more information you throw out, the weaker will your model be. So we see that there is quite an important caveat when we build abstractions based on keeping it simple and stripping away intricacy. When we are stuck with having to use an ape thumb with a fine instrument. It’s primitive, but it often gets us ahead in understanding reality much faster. The cost we pay though, is that our abstraction congrues better with a simpler and more artificial version of the reality that we seek to understand. And reality usually is quite complex. So when we limit ourselves to examining a bunch of variables in say for example the clinical trial of a drug, and find that it has a treatment benefit, we can be a lot more certain that this would be the case in the real world too provided that we prescribe the drug to as similar a patient pool as in our experiment. Which rarely happens as you might have guessed! And that’s why you find so many cases of treatment failure and unpredictable disease outcomes. 
How the validity of an abstraction is influenced by the *KISS* *principle* is something to think about. Epidemiologists lose sleep when pondering over it sometimes. And a lot of time is spent in trying to eliminate selection bias (i.e. when errors of inference creep in because the pool of patients in the study doesn’t match, to an acceptable degree, the kinds of patients doctors would interact with out in the real world). The goal is to make an abstraction agree with as much of reality as possible, but in doing so not to make it so far removed from the *KISS principle* that carrying out the experiment would be impractical or impossible. It’s such a delicate and fuzzy balance!
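
One crude way to see the information that gets thrown out when Ratio data is collapsed into Nominal categories is to count bits. The sketch below is only an illustration: the heights are made up, the 170 cm cutoff is arbitrary, and Shannon entropy is just one possible yardstick for “amount of information”, not the definition statisticians use when they rank data types.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (in bits) of the observed values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Made-up ratio-scale measurements (heights in cm).
heights_cm = [152, 158, 161, 165, 168, 171, 174, 178, 183, 190]

# The same people reduced to a nominal label with an arbitrary cutoff.
labels = ["tall" if h >= 170 else "short" for h in heights_cm]

print(f"raw heights:  {entropy_bits(heights_cm):.2f} bits per person")
print(f"tall / short: {entropy_bits(labels):.2f} bits per person")
```

Once the heights are reduced to labels, there is no way to recover who was 171 cm and who was 190 cm, and any inference that depended on that difference is lost with it.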

So again and again we find that abstractions define our experiences. Some people get so immersed and attached with their models of reality that they make them their lifeblood, refusing to move on. And some people actually wonder if life as we know it, is itself an abstraction :-D! I was struck by this when I came upon the idea of the Holographic principle in physics – that in reality we and our universe are bound by an enveloping surface and that our real existence is on this plane. That what we see, touch or smell in our common experience is simply a projection of what is actually happening on that surface. That these everyday experiences are essentially holograms :-D! Talk about getting wild, eh :-D?!

The thought that I ultimately came away with at the end of my jog was that of maintaining humility in knowledge. For those of us in science, we find that it is very common for arrogance to creep in. When the fact is that there is so much about reality that we don’t know anything about and that our abstractions may never agree with it to full accuracy, ever! When pondered upon deeply this is a very profound and humbling thing to realize.

Even the arrogance in Newton melted away for a moment when he proclaimed:

If I have seen a little further it is by standing on the shoulders of Giants.

— Isaac Newton in a letter to rival Robert Hooke

Here’s to Isaac Newton for that spark of humility, even if it was rather fleeting :-). I’m guessing there must have been times when he might have had stray thoughts of cursing at himself for having said that :-)! Oh well, that’s how they all are …

—

Copyright Firas MR. All Rights Reserved.

*“A mote of dust, suspended in a sunbeam.”*


## Let’s Face It, We Are Numskulls At Math!

*Noted mathematician, Timothy Gowers, talks about the importance of math*

I’ve often written about Mathematics before. As much as math helps us better understand our world (Modern Medicine’s recent strides have a lot to do with applied math for example), it also tells us how severely limited man’s common thinking is.

Humans, and yes some animals too, are born with or soon develop an innate ability for understanding numbers. Yet, just like animals, our proficiency with numbers seems to stop short of the stuff that goes beyond our immediate activities of daily living (ADL) and survival. Because we are a higher form of being (or allegedly so, depending on your point of view), our ADLs are a lot more sophisticated than, say, those of canaries or hamsters. And consequently you can expect to see a little more refined arithmetic being used by us. But fundamentally, we share this important trait – of being able to work with numbers from an early stage. A man who has a family with kids knows almost by instinct that if he has two kids to look after, that would mean breakfast, lunch and dinner times 2 in terms of putting food on the table. He would have to buy two sets of clothes for his kids. A kid soon learns that he has two parents. And so on. It’s almost natural. And when someone can’t figure out their way doing simple counting or arithmetic, we know that something might be wrong. In Medicine, we have a term for this. It’s called *acalculia* and often indicates the presence of a neuropsychiatric disorder.

It’s easy for ‘normal’ people to do 2 + 2 in their heads. Two oranges AND two oranges make a TOTAL of four oranges. This basic stuff helps us get by day-to-day. But how many people can wrap their heads around 1 divided by 0? If you went to school, yea sure your teachers must have hammered the answer into you: infinity (or, put more carefully, the division itself is undefined, while 1 divided by an ever smaller positive number grows without bound). But how do you visualize it? Yes, I know it’s possible. But it takes unusual work. I think you can see my point, even with this simple example. We haven’t even begun to speak about probability, wave functions, symmetries, infinite kinds of infinities, multiple-space-dimensions, time’s arrow, quantum mechanics, the Higgs field or any of that stuff yet!
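
One way to get a feel for the 1 divided by 0 puzzle is to creep up on zero and watch what the quotient does. This is only a sketch in Python; `reciprocal` is a throwaway helper name of my own.

```python
def reciprocal(x):
    """Return 1 divided by x."""
    return 1 / x

# Creep toward zero from the positive side and from the negative side.
for x in (1.0, 0.1, 0.001, 0.000001):
    print(f"1/{x} = {reciprocal(x)}    1/{-x} = {reciprocal(-x)}")

# At exactly zero Python declines to answer rather than print "infinity".
try:
    reciprocal(0)
except ZeroDivisionError as err:
    print("1/0 ->", err)
```

The quotients run off in opposite directions depending on which side you approach from, which is one reason the schoolroom answer of “infinity” is harder to picture than it sounds.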

As a species, it is so obvious that we aren’t at all good at math. It’s almost as if we construct our views of the universe through this tunneled vision that helps us in our day-to-day tasks, but fails otherwise.

We tend to think of using math as an ability when really it should be thought of as a sensory organ. Something that is as vital to understanding our surroundings as our eyes, ears, noses, tongues and skins. And despite lacking this sense, we tend to go about living as though we somehow understand everything. That we are aware of all there is to be aware of. This can often lead to trouble down the road. I’ve talked about numerous PhDs having failed at the Monty Hall Paradox before. But a recent talk I watched touched upon something with far more serious consequences: people being wrongfully convicted because of a stunted interpretation of DNA, fingerprint evidence, etc. by none other than “expert” witnesses. In other words, serious life and death issues. So much for our *expertise* as a species, eh?!

*How the human mind struggles with math!*

We recently also learned that the hullabaloo over H1N1 pandemic influenza had a lot to do with our naive understanding of math, the pitfalls of corporate-driven public-interest research notwithstanding.

Anyhow, one of my main feelings is that honing one’s math not only helps us survive better, but it can also teach us about our place in the universe. Because we can then begin to fully use it as a sensory organ in its own right. Which is why a lot of pure scientists have argued that doing math for math’s own sake can not only be great fun (if done the right way, of course :-P) but should also be considered necessary. Due to the fact that such research has the potential to reveal entirely new vistas that can enchant us and surprise us at the same time (take Cantor’s work on infinity for example). For in the end, discovery, really, is far more enthralling than invention.

**UPDATE 1**: Check out **the Khan Academy** for a virtually A-Z education on math, and all of it for *free*! This is especially a great resource for those of us who can’t even recall principles of addition, subtraction, etc. let alone calculus or any of the more advanced stuff.



## The Doctor’s Apparent Ineptitude

As a fun project, I’ve decided to frame this post as an abstract.

### AIMS/OBJECTIVES:

To elucidate factors influencing perceived incompetence on the part of the doctor by the layman/patient/patient’s caregiver.

### MATERIALS & METHODS:

Arm-chair pontification and a little gedankenexperiment based on prior experience with patients as a medical trainee.

### RESULTS:

Preliminary analyses indicate widespread suspicions among patients on the ineptitude of doctors no matter what the level of training. This is amply demonstrated in the following figure:

As one can see, perceived ineptitude forms a wide spectrum – from most severe (med student) to least severe (attending). The underlying perceptions of incompetence do not seem to abate at any level however, and eyewitness testimonies include phrases such as ‘all doctors are inept; some more so than others’. At the med student level, exhausted patients find their anxious questions being greeted with a variety of responses ranging from the dumb ‘I don’t know’, to the dumber ‘well, I’m not the attending’, to the dumbest ‘uhh…mmmm..hmmm <eyes glazed over, pupils dilated>’. Escape routes will be meticulously planned in advance both by patients and more importantly by med students to avert catastrophe.

As for more senior medics such as attendings, evasion seems to be just a matter of hiding behind statistics. A gedankenexperiment was conducted to demonstrate this. The settings were two patients A and B, undergoing a certain surgical procedure and their respective caregivers, C-A and C-B.

#### Patient A

**Consent & Pre-op**

*C-A*: (anxious) Hey doc, ya think he’s gonna make it?

*Doc*: It’s difficult to say and I don’t know that at the moment. There are studies indicating that 95% live and 5% die during the procedure though.

*C-A*: ohhh kay (slightly confused) (murmuring)…’All this stuff about knowing medicine. What *does* he know? One simple question and he gives me this? What the heck has this guy spent all these years studying for?!’

**Post-op & Recovery**

*C-A*: Ah, I just heard! He made it! Thank you doctor!

*Doc*: You’re welcome (smug, god-complex)! See, I told ya 95% live. There was no reason for you to worry!

*C-A*: (sarcastic murmur) ‘Yeah, right. Let him go through the pain of not knowing and he’ll see. Look at him, so full of himself – as if he did something special; luck was on our side anyway. Heights of incompetence!’

#### Patient B

**Consent & Pre-op**

*C-B*: (anxious) Hey doc, ya think he’s gonna make it?

*Doc*: It’s difficult to say and I don’t know that at the moment. There are studies indicating that 95% live and 5% die during the procedure though.

*C-B*: ohhh kay (slightly confused) (murmuring)…’All this stuff about knowing medicine. What *does* he know? One simple question and he gives me this? What the heck has this guy spent all these years studying for?!’

**Post-op & Recovery**

*C-B*: (angry, shouting numerous expletives) What?! He died on the table?!

*Doc*: Well, I did mention that there was a 5% death rate.

*C-B*: (angry, shouting numerous expletives).. You (more expletives) incompetent quack! (murmuring) “How convenient! A lawsuit should fix him for good!”

#### The Doctor’s Coping Strategy

Although numerous psychology models can be applied to understand physician behavior, the Freudian model reveals some interesting material. Common defense strategies that help doctors include:

**Isolation of affect**: e.g. Resident tells Fellow, “you know that patient with the …well, she had a massive MI and went into VFib..died despite ACLS..poor soul…so hey, I hear they’re serving pizza today at the conference…(the conference about commercializing healthcare and increasing physician pay-grades for ‘a better and healthier tomorrow’)”

**Intellectualization**: e.g. Attending tells Fellow, “so you understand why that particular patient bled to death? Yeah it was DIC in the setting of septic shock….plus he had a prior MI with an Ejection Fraction of 33% so there was that component as well..but we couldn’t really figure out why the antibiotics didn’t work as expected…ID gave clearance….(ad infinitum)…so let’s present this at our M&M conference this week..”

**Displacement**: e.g. Caregiver yells at Fellow, “<expletives>”. Fellow yells at intern, “You *knew* that this was a case that I had a special interest in and yet you didn’t bother to page me? Unacceptable!…” Intern then yells at med student, “Go <expletives> disimpact Mr. X’s bowels…if I don’t see that done within the next 15 minutes, you’re in for a class! Go go go…clock’s ticking…tck tck tck!”

We believe there are other coping mechanisms that are important too, but in our observations these appear to be the most common. Of the uncommon ones, we think med students as a group in particular, are the most vulnerable to **Regression** & **Dissociation**, duly accounting for confounding factors.

All of these form a systematic ego-syntonic pattern of behavior which, for reasons we are still exploring, is not included in the DSM-IV manual’s section on Personality Disorders.

### CONCLUSIONS:

Patients and their caregivers seem to think that ALL doctors are *fundamentally* inept, period. Ineptitude follows a wide spectrum however – ranging from the bizarre to the mundane. Further studies (including but not limited to arm-chair pontification) need to be carried out to corroborate these startling results and the factors that we have reported. Other studies need to elucidate remedial measures that can be employed to save the doctor-patient relationship.

—

NOTE: I wrote this piece as a reminder of how the doctor-patient relationship is experienced from the patient’s side. In our business-as-usual frenzy, we as medics often don’t think about these things. And these things often DO matter a LOT to our patients!


## USMLE – Designing The Ultimate Questions

There are strategies that examiners can employ to frame questions that are designed to stump you on an exam such as the USMLE. Many of these strategies are listed out in the Kaplan Qbook and I’m sure this stuff will be familiar to many. My favorite techniques are the ‘multi-step’ and the ‘bait-and-switch’.

### The Multi-Step

Drawing on principles of probability theory, examiners will often frame questions that require you to know multiple facts and concepts to get the answer right. As a crude example:

“This inherited disease exclusive to females is associated with acquired microcephaly and the medical management includes __________________.”

Such a question would be re-framed as a clinical scenario (an outpatient visit) with other relevant clinical data such as a pedigree chart. To get the answer right, you would need:

- Knowledge of how to interpret pedigree charts and identify that the disease manifests exclusively in females.
- Knowledge of Mendelian inheritance patterns of genetic diseases.
- Knowledge of conditions that might be associated with acquired microcephaly.
- Knowledge of medical management options for such patients.

Now suppose, purely for illustration, that each of these steps – 1, 2, 3 and 4 – taken individually has a probability of 50% that you could get it right purely by random guessing. Combined together however, which is what is necessary to get the answer, the probability would be 50% * 50% * 50% * 50% = 6.25% [combined probability of independent events]. So now you know why they actually prefer multi-step questions over one or two-liners! 🙂 Notice that this doesn’t necessarily have anything to do with testing your intelligence as some might think. It’s just being able to recollect hard facts and then being able to put them together. They aren’t asking you to prove a math theorem or calculate the trajectory of a space satellite 😛 !
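
The arithmetic is easy to play with. Here is a small sketch; the 50% per-step figure is the same illustrative assumption as above, the 20% figure assumes a five-option multiple-choice guess, and the steps are treated as independent.

```python
import math

def chance_all_correct(step_probabilities):
    """Probability of getting every step right, assuming independent steps."""
    return math.prod(step_probabilities)

# Four steps, each a coin-flip guess.
print(chance_all_correct([0.5, 0.5, 0.5, 0.5]))  # 0.0625, i.e. 6.25%

# Same four steps if each were a blind guess among five options.
print(f"{chance_all_correct([0.2] * 4):.4%}")
```

Every extra step multiplies the odds of a lucky guess down further, which is the whole appeal of the multi-step format for an examiner.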

### The Bait-and-Switch

Another strategy is to pack the question chock-full of irrelevant data. You could have paragraph after paragraph describing demographic characteristics, anthropometric data, and ‘bait’ data that’s planted there to persuade you to think along certain lines, and as you grind away pondering these things you are suddenly presented with an entirely unrelated sentence at the very end, asking a completely unrelated question! Imagine being presented with the multi-step question above with one added fly in the ointment. As you finally finish the half-page length question, it ends with ‘<insert-similar-disease> is associated with the loss of this enzyme and/or body part: _______________’. Very tricky! Questions like these give flashbacks and déjà vu of days from 2nd year med school, when that patient with a neck lump begins by giving you his demographic and occupational history. As an inexperienced med student you immediately begin thinking: ‘hmmm..okay, could the lump be related to his occupation? …hmm…’. But wait! You haven’t even finished the physical exam yet, let alone the investigations. As medics progress along their careers they tend to phase out this kind of analysis in favor of more refined ‘heuristics’ as Harrison’s puts it. A senior medic will often wait to formulate opinions until the investigations are done and will focus on triaging problems and asking if management options are going to change them. The keyword here is ‘triage’. Just as a patient’s clinical information in a real office visit is filled with much irrelevant data, so too are many USMLE questions. That’s not to say that demographic data, etc. are irrelevant under all conditions. Certainly, an occupational history of being employed at an asbestos factory would be relevant in a case that looks like a respiratory disorder.
If the case looks like a respiratory disorder, but the question mentions an occupational history of being employed as an office clerk, then this is less likely to be relevant to the case. Similarly if it’s a case that overwhelmingly looks like an acute abdomen, then a stray symptom of foot pain is less likely to be relevant. Get my point? That is why many recommend reading the last sentence or two of a USMLE question before reading the entire thing. It helps you establish what exactly is the main problem that needs to be addressed.

Hope readers have found the above discussion interesting :). Adios for now!

—

Copyright © Firas MR. All rights reserved.

## Decision Tree Questions In Genetics And The USMLE

Just a quick thought. It just occurred to me that some of the questions on the USMLE involving pedigree analysis in genetics, are actually typical decision tree questions. The probability that a certain individual, **A**, has a given disease (eg: Huntington’s disease) purely by random chance is simply the disease’s prevalence in the general population. But what if you considered the following questions:

- How much genetic code do **A** and **B** share if they are third cousins?
- If you suddenly knew that **B** has Huntington’s disease, what is the new probability for **A**?
- What is the disease probability for **A**‘s children, given how much genetic code they share with **B**?

When I’d initially written about decision trees, it did not at all occur to me at the time how this stuff was so familiar to me already!

Apply a little Bayesian strategy to these questions and your mind is suddenly filled with all kinds of probability questions ripe for decision tree analysis:

- If the genetic test I utilize to detect Huntington’s disease has a false-positive rate *x* and a false-negative rate *y*, now what is the probability for **A**?
- If the pre-test likelihood is *m* and the post-test likelihood is *n*, now what is the probability for **A**?
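As a rough sketch of the Bayesian step, the snippet below updates a pre-test probability for **A** after a test result, given a false-positive rate *x* and a false-negative rate *y*. The function names and all the numbers are made up for illustration; they are not figures for any real Huntington’s test.

```python
def posterior_given_positive(prior, false_positive_rate, false_negative_rate):
    """Bayes' rule: probability of disease after a positive test result."""
    sensitivity = 1.0 - false_negative_rate
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

def posterior_given_negative(prior, false_positive_rate, false_negative_rate):
    """Bayes' rule: probability of disease after a negative test result."""
    specificity = 1.0 - false_positive_rate
    missed_cases = prior * false_negative_rate
    true_negatives = (1.0 - prior) * specificity
    return missed_cases / (missed_cases + true_negatives)

# Hypothetical inputs: pre-test probability 0.5, x = 0.01, y = 0.02.
after_positive = posterior_given_positive(0.5, 0.01, 0.02)
after_negative = posterior_given_negative(0.5, 0.01, 0.02)
print(round(after_positive, 4), round(after_negative, 4))
```

Each function is one branch of the decision tree: multiply along the path to a result, then divide by the sum of all paths that lead to that same result.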

I find it truly amazing how so many geneticists and genetic counselors accomplish such complex calculations using decision trees without even realizing it! Don’t you 🙂 ?


## Why Equivalence Studies Are So Fascinating

**Objectives and talking points**:

- To recap basic concepts of hypothesis testing in scientific experiments. Readers should read-up on hypothesis testing in reference works.
- To contrast drug vs. placebo and drug vs. standard drug study designs.
- To contrast non-equivalence and equivalence studies.
- To understand implications of these study designs, in terms of interpreting study results.

——————————————————————————————————–

Howdy readers! Today I’m going to share with you some very interesting concepts from a fabulous book that I finished recently – “Designing Clinical Research – An Epidemiologic Approach” by Stephen Hulley et al. The book discusses, fairly early on, what are called “equivalence studies”. Equivalence studies are truly fascinating. Let’s see how.

When a new drug is tested for efficacy, there are multiple ways for us to do so.

**A Non-equivalence Study Of Drug vs. Placebo**

A drug can be compared to something that doesn’t have any treatment effect whatsoever – a ‘placebo’. Examples of placebos include sugar tablets, distilled water, inert substances, etc. Because pharmaceutical companies try hard to make drugs that have a treatment effect and that are thus different from placebos, the objective of such a comparison is to answer the following question:

Is the new drug *any different* from the placebo?

Note the emphasis on ‘any different’. As is usually the case, a study of this kind is designed to test for differences between drug and placebo effects in both directions^{1}. That is:

Is the new drug better than the placebo?

OR

Is the new drug worse than the placebo?

The boolean operator ‘OR’, is key here.

Since we cannot conduct such an experiment on all people in the *target ‘population’* (eg. all people with diabetes from the whole country), we conduct it on a random and representative *‘sample’ of this population* (eg. randomly selected diabetes patients from the whole country). Because of this, we cannot directly extrapolate our findings to the target population without doing some fancy roundabout thinking and a lot of voodoo first – a.k.a. ‘hypothesis testing’. Hypothesis testing is crucial to take into account random chance (error) effects that might have crept into the experiment.

In this experiment:

- The **null hypothesis** is that the drug and the placebo DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the drug and the placebo DO differ in the real world.

So off we go, with our experiment with an understanding that our results might be influenced by random chance (error) effects. Say that, before we start, we take the following error rates to be acceptable:

- Even **if the null hypothesis is true** in the real world, we would find that the drug and the placebo DO NOT differ *only* 95% of the time, purely by random chance. [Although this rate doesn’t have a name, it is equal to (1 – Type 1 error)].
- Even **if the null hypothesis is true** in the real world, we would find that the drug and the placebo DO differ 5% of the time, purely by random chance. [This rate is also called our **Type 1 error**, or *critical level of significance*, or *critical α level*, or *critical ‘p’ value*].
- Even **if the alternative hypothesis is true** in the real world, we would find that the drug and the placebo DO differ *only* 80% of the time, purely by random chance. [This rate is also called the ‘**Power**‘ of the experiment. It is equal to (1 – Type 2 error)].
- Even **if the alternative hypothesis is true** in the real world, we would find that the drug and the placebo DO NOT differ 20% of the time, purely by random chance. [This rate is also called our **Type 2 error**].
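One way to get a feel for these four rates is to simulate many imaginary trials. The sketch below is only an illustration under loud assumptions: outcomes are normally distributed with a known standard deviation of 10, each arm has 50 subjects, the test is a simple two-sided z-test, and a true difference of 5.6 was picked purely because it gives roughly 80% power at these settings. None of these numbers come from the book.

```python
import random
from statistics import NormalDist, mean

STANDARD_NORMAL = NormalDist()

def rejects_null(group_a, group_b, sigma, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming sigma is known."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_difference, n_per_group=50, sigma=10.0,
                   trials=2000, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        drug = [rng.gauss(true_difference, sigma) for _ in range(n_per_group)]
        placebo = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        rejections += rejects_null(drug, placebo, sigma)
    return rejections / trials

# Null true in the "real world": any rejection is a Type 1 error.
type_1_error = rejection_rate(true_difference=0.0)
# Alternative true in the "real world": the rejection rate is the Power.
power = rejection_rate(true_difference=5.6)
print(type_1_error, power, 1 - power)  # last value approximates Type 2 error
```

The first number should hover near 5% and the second near 80%, give or take simulation noise.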

The strategy of the experiment is this:

If we are able to accept these error rates and show in our experiment that the null hypothesis is false (that is ‘**reject**‘ it), the only other hypothesis left on the table is the alternative hypothesis. This has then, GOT to be true and we thus ‘accept’ the alternative hypothesis.

Q: With what degree of uncertainty?

A: With the uncertainty that we might arrive at such a conclusion 5% of the time, even if the null hypothesis is true in the real world.

Q: In English please!

A: With the uncertainty that we might arrive at a conclusion that the drug DOES differ from the placebo 5% of the time, even if the drug DOES NOT differ from the placebo in the real world.

Our next question would be:

Q: How do we reject the null hypothesis?

A: We proceed by initially assuming that the null hypothesis is true in the real world (i.e. Drug effect DOES NOT differ from Placebo effect in the real world). We then use a ‘*test of statistical significance*‘ to calculate the probability of observing a difference in treatment effect in the real world, **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis. We do this with the belief that such a conclusion is within our pre-selected margin of error. Our pre-selected margin of error, as mentioned previously, is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘**fail to reject**‘ the null hypothesis and conclude that a difference in effect has not been proven^{4}.
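For readers who like to see the decision rule spelled out, here is a minimal sketch using a two-sided z-test as the ‘test of statistical significance’. The z-test is my choice for illustration only (a real study might call for a t-test or something else), and the observed difference and standard error are invented numbers.

```python
from statistics import NormalDist

def two_sided_p_value(observed_difference, standard_error):
    """Probability, assuming the null hypothesis is true, of seeing a
    difference as large or larger (in either direction) than observed."""
    z = abs(observed_difference) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

def decide(p_value, alpha=0.05):
    """Apply the pre-selected Type 1 error rate to the p value."""
    return "reject" if p_value < alpha else "fail to reject"

# Hypothetical trial: observed difference of 5 units, standard error of 2.
p = two_sided_p_value(observed_difference=5.0, standard_error=2.0)
decision = decide(p)
print(round(p, 4), decision)
```

Note that the only two possible outputs are ‘reject’ and ‘fail to reject’; there is deliberately no ‘accept the null’ branch.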

A lot of scientific literature out there is riddled with drug vs. placebo studies. This kind of thing is good if we do not already have an effective drug for our needs. Usually though, we already have a standard drug that we know works well. It is of more interest to see how a new drug compares to our standard drug.

**A Non-equivalence Study Of Drug vs. Standard Drug**

These studies are conceptually the same as drug vs. placebo studies and the same reasoning for inference is applied. These studies ask the following question:

Is the new drug *any different* than the standard drug?

Note the emphasis on ‘any different’. As is often the case, a study of this kind is designed to test the difference between the two drugs in both directions^{1}. That is:

Is the new drug better than the standard drug?

OR

Is the new drug worse than the standard drug?

Again, the boolean operator ‘OR’, is key here.

In this kind of experiment:

- The **null hypothesis** is that the new drug and the standard drug DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the new drug and the standard drug DO differ in the real world.

Exactly like we discussed before, we initially assume that the null hypothesis is true in the real world (i.e. the new drug’s effect DOES NOT differ from the standard drug’s effect in the real world). We then use a ‘*test of statistical significance*‘ to calculate the probability of observing a difference in treatment effect in the real world, **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. Just to repeat ourselves here, our pre-selected margin of error, is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘fail to reject’ the null hypothesis and conclude that a difference in effect has not been proven^{4}.

**An Equivalence Study Of Drug vs. Standard Drug**

Sometimes all you want is a drug that is as good as the standard drug. This can be for various reasons – the standard drug is just too expensive, just too difficult to manufacture, just too difficult to administer, … and so on. Whereas the new drug might not have these undesirable qualities yet retain the same treatment effect.

In an equivalence study, the incentive is to prove that the two drugs are the same. Like we did before, let’s explicitly formulate our two hypotheses:

- The **null hypothesis** is that the new drug and the standard drug DO NOT differ in the real world^{2}.
- The **alternative hypothesis** is that the new drug and the standard drug DO differ in the real world.

We are mainly interested in proving the null hypothesis. Since this can’t be done^{4}, we’ll be content with ‘failing to reject’ the null hypothesis. Our strategy is to design a study powerful enough to detect a difference close to 0 and then ‘fail to reject’ the null hypothesis. In doing so, although we can’t ‘prove’ for sure that the null hypothesis is true, we can nevertheless be more comfortable saying that it in fact is true.

In order to detect a difference close to 0, we have to increase the Power of the study from the usual 80% to something like 95% or higher. We want to maximize power to detect the smallest difference possible. Usually though, it’s enough if we are able to detect the largest difference that doesn’t have clinical meaning (eg: a difference of 4 mmHg on a BP measurement). This way we can compromise a little on Power and choose a less extreme figure, say 88% or something.
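To see what raising the Power costs, here is a rough sketch using the common normal-approximation formula for the sample size per group when comparing two means. The formula is a textbook approximation rather than anything specific to the book under discussion, and the standard deviation of 10 is an assumption I picked just to have a number to plug in.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(smallest_difference, sigma, power=0.80, alpha=0.05):
    """Approximate subjects needed per arm for a two-sided comparison of means."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided Type 1 error
    z_power = standard_normal.inv_cdf(power)          # 1 - Type 2 error
    n = 2 * ((z_alpha + z_power) * sigma / smallest_difference) ** 2
    return ceil(n)

# Detecting a 4-point BP difference, assuming a standard deviation of 10.
n_80 = sample_size_per_group(4, 10, power=0.80)
n_88 = sample_size_per_group(4, 10, power=0.88)
n_95 = sample_size_per_group(4, 10, power=0.95)
print(n_80, n_88, n_95)
```

The required sample grows with the chosen Power and grows very quickly as the difference you want to detect shrinks towards 0, which is why settling for the largest clinically meaningless difference is such a practical compromise.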

And then just as in our previous examples, we proceed with the assumption that the null hypothesis is true in the real world. We then use a ‘*test of statistical significance*‘ to calculate the probability of observing a difference in treatment effect in the real world, **as large or larger** than that actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. And to repeat ourselves yet again (boy, do we like doing this 😛 ), our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)^{3}.

If we fail to show that this calculated probability is <5%, we ‘**fail to reject**‘ the null hypothesis and conclude that although a difference in effect has not been proven, we can be reasonably comfortable saying that there is in fact no difference in effect.

**So Where Are The Gotchas?**

If your study isn’t designed or conducted properly (eg: without enough power, inadequate sample size, improper randomization, loss of subjects to followup, inaccurate measurements, etc.) you might end up ‘failing to reject’ the null hypothesis whereas if you had taken the necessary precautions, this might not have happened and you would have come to the opposite conclusion. Purely because of random chance (error) effects. Such improper study designs usually dampen any obvious differences in treatment effect in the experiment.

In a **non-equivalence study**, researchers, whose incentive it is to reject the null hypothesis, are thus forced to make sure that their designs are rigorous.

In an **equivalence study**, this isn’t the case. Since researchers are motivated to ‘fail to reject’ the null hypothesis from the get go, it becomes an easy trap to conduct a study with all kinds of design flaws and very conveniently come to the conclusion that one has ‘failed to reject’ the null hypothesis!

Hence, it is extremely important, more so in equivalence studies than in non-equivalence studies, to have a critical and alert mind during all phases of the experiment. Interpreting an equivalence study published in a journal is hard, because one needs to know the very guts of everything the research team did!

Even though we have discussed these concepts with drugs as an example, you could apply the same reasoning to many other forms of treatment interventions.

Hope you’ve found this post interesting 🙂 . Do send in your suggestions, corrections and comments!

Adios for now!


Readability grades for this post:

Automated readability index: 8.1

Flesch-Kincaid grade level: 7.4

Coleman-Liau index: 9

Gunning fog index: 11.8

SMOG index: 11

—

1. An alternative hypothesis for such a study is called a ‘*two-tailed alternative hypothesis*‘. A study that tests for differences in only one direction has an alternative hypothesis that is called a ‘*one-tailed alternative hypothesis*‘.

2. This situation is a good example of a ‘null’ hypothesis also being a ‘nil’ hypothesis. A null hypothesis is usually a nil hypothesis, but it’s important to realize that this isn’t always the case.

4. Note that we never use the term, ‘accept the null hypothesis’.

## Does Changing Your Answer In The Exam Help?

*The Monty Hall Paradox*

*One of the 3 doors hides a car. The other two hide a goat each. In search of a new car, the player picks a door, say 1. The game host then opens one of the other doors, say 3, to reveal a goat and offers to let the player pick door 2 instead of door 1. Is there an advantage if the player decides to switch? (Courtesy: Wikipedia)*

Hola amigos! Yes, I’m back! It’s been eons and I’m sure many of you may have been wondering why I was MIA. Let’s just say it was academia as usual.

This post is unique as it’s probably the first where I’ve actually learned something from contributors and feedback. A very critical audience and pure awesome discussion. The main thrust was going to be an analysis of the question, “If you had to pick an answer in an MCQ randomly, does changing your answer alter the probabilities to success?” and it was my hope to use decision trees to attack the question. I first learned about decision trees and decision analysis in Dr. Harvey Motulsky’s great book, “Intuitive Biostatistics“. I do highly recommend his book. As I pondered over the question, I drew a decision tree that I extrapolated from his book. Thanks to initial feedback from BrownSandokan (my venerable computer scientist friend from yore :P) and Dr. Motulsky himself, who was so kind as to write back to just a random reader, it turned out that my diagram was wrong and so was the original analysis. The problem with the original tree (that I’m going to maintain for other readers to see and reflect on here) was that the tree in the book is specifically for a math (or rather logic) problem called the **Monty Hall Paradox**. You can read more about it here. As you can see, the Monty Hall Paradox is a special kind of unequal conditional probability problem, in which knowing something for sure, influences the probabilities of your guesstimates. It’s a very interesting problem, and has bewildered thousands of people, me included. When it was originally circulated in a popular magazine, “nearly 1000 PhDs” (cf. Wikipedia) wrote back to say that the solution put forth was wrong, prompting numerous psychoanalytical studies to understand human behavior. A decision tree for such a problem is conceptually different from a decision tree for our question and so my original analysis was incorrect.

So what the heck are decision trees anyway? They are basically conceptual tools that help you make the right decisions given a couple of known probabilities. You draw a line to represent a decision, and explicitly label it with a corresponding probability. To find the final probability for a number of decisions (or lines) in sequence, you multiply or add their individual probabilities. It takes skill and a critical mind to build a correct tree, as I learned. But once you have a tree in front of you, it’s easier to see the whole picture.

Let’s just ignore decision trees completely for the moment and think in the usual sense. How good an idea is it to change an answer on an MCQ exam such as the USMLE? The Kaplan lecture notes will tell you that your chances of being correct are better off if you don’t. Let’s analyze this. If every question has 1 correct option and 4 incorrect options (the total number of options being 5), then any single try on a random choice gives you a probability of 20% for the correct choice and 80% for the incorrect choice. The odds are higher that on any given attempt, you’ll get the answer wrong. If your choice was correct the first time, it still doesn’t change these basic odds. You are still likely to pick the incorrect choice 80% of the time. Borrowing from the concept of “regression towards the mean” (repeated measurements of something yield values closer to said thing’s mean), we can apply the same reasoning to this problem. Since the outcomes in question are categorical (binomial to be exact), the measure of central tendency used is the Mode (defined as the most commonly or frequently occurring thing in a series). In a categorical series – cat, dog, dog, dog, cat – the mode is ‘dog’. Since the Mode in this case happens to be the category “incorrect”, if you pick a random answer and repeat this multiple times, you are more likely to pick an incorrect answer! See, it all makes sense 🙂 ! It’s not voodoo after all 😀 !

Coming back to decision analysis, just as there’s a way to prove the solution to the Monty Hall Paradox using decision trees, there’s also a way to prove our point on the MCQ problem using decision trees. While I study to polish my understanding of decision trees, building them for either of these problems will be a work in progress. And when I’ve figured it all out, I’ll put them up here. A decision tree for the Monty Hall Paradox can be accessed here.
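While the proper tree diagrams remain a work in progress, the Monty Hall tree is small enough to walk branch by branch in a few lines. The sketch below is my own enumeration, not the tree from Dr. Motulsky’s book. It assumes the standard rules: the car is equally likely to be behind any door, the host always opens a goat door other than the player’s pick, and he chooses at random when he has two goat doors to pick from.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def win_probabilities():
    """Walk every branch of the tree and total the winning probabilities."""
    stay_wins = Fraction(0)
    switch_wins = Fraction(0)
    for car, pick in product(DOORS, repeat=2):
        branch_weight = Fraction(1, 9)  # 1/3 for the car times 1/3 for the pick
        # The host may only open a door hiding a goat that the player did not pick.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            weight = branch_weight / len(openable)
            switched = next(d for d in DOORS if d not in (pick, opened))
            stay_wins += weight * (pick == car)
            switch_wins += weight * (switched == car)
    return stay_wins, switch_wins

stay, switch = win_probabilities()
print(stay, switch)  # 1/3 2/3
```

Multiplying along each path and adding across paths is exactly the ‘multiply or add’ rule for decision trees described earlier.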

To end this post, I’m going to complicate our main question a little bit and leave it out in the void. What if on your initial attempt you have no idea which of the answers is correct or incorrect but on your second attempt, your mind suddenly focuses on a structure flaw in one or more of the options? Assuming that an option with a structure flaw can’t be correct, wouldn’t this be akin to Monty showing the goat? One possible structure flaw, could be an option that doesn’t make grammatical sense when combined with the stem of the question. Does that mean you should switch? Leave your comments below!

Hope you’ve found this post interesting. Adios for now!


*Readability grades for this post:*

*Flesch reading ease score: 72.4
Automated readability index: 7.8
Flesch-Kincaid grade level: 7.3
Coleman-Liau index: 8.5
Gunning fog index: 11.4
SMOG index: 10.7*

Intuitive Biostatistics, by Harvey Motulsky


## USMLE Scores – Debunking Common Myths

Lots of people have misguided notions as to the true nature of USMLE scores and what exactly they represent. In my opinion, this occurs in part due to a lack of interest in understanding the logistic considerations of the exam. Another contributing factor could be the bordering-on-brainless, mentally zeroed-out scientific culture most exam goers happen to be cultivated in. Many if not most of these candidates, in their naive wisdom, got into Medicine hoping to rid themselves of numerical burdens forever!

The following, I hope, will help debunk some of these common myths.

### Percentile? Uh…what percentile?

This myth is without doubt, the king of all 🙂 . It isn’t uncommon that you find a candidate basking in the self-righteous glory of having scored a ’99 percent’ or worse, a ’99 percentile’. The USMLE at one point used to provide percentile scores. That stopped sometime in the mid to late ’90s. Why? Well, the USMLE organization believed that scores were being unduly given more weightage than they ought to in medics’ careers. This test is a licensure exam, period. That has always been the motto. Among other things, when residency programs started using the exam as a yard stick to differentiate and rank students, the USMLE saw this as contrary to its primary purpose and said enough is enough. To make such rankings difficult, the USMLE no longer provides percentile scores to exam takers.

The USMLE does have an extremely detailed FAQ on what the 2-digit (which people confuse as a percentage or percentile) and 3-digit scores mean. I strongly urge all test-takers to take a hard look at it and ponder about some of the stuff said therein.

Simply put, the way the exam is designed, it measures a candidate’s level of knowledge and provides a 3-digit score with an important import. This 3-digit score is an unfiltered indication of an individual’s USMLE know-how, that in theory shouldn’t be influenced by variations in the content of the exam, be it across space (another exam center and/or questions from a different content pool) or time (exam content from the future or past). This means that provided a person’s knowledge remains constant, he or she should in theory, achieve the same 3-digit score regardless of where and when he or she took the test. Or, supposedly so. The minimum 3-digit score that is required to ‘pass’ the exam is revised on an annual basis to preserve this space-time independent nature of the score. For the last couple of years, the passing score has hovered around 185. A ‘pass’ score makes you eligible to apply for a license.

What then is the 2-digit score? For god knows what reason, the Federation of State Medical Boards (these people provide medics in the US, licenses based on their USMLE scores) has a 2-digit format for a ‘pass’ score on the USMLE exam. Unlike the 3-digit score this passing score is fixed at 75 and isn’t revised every year.

How does one convert a 3-digit score to a 2-digit score? The exact conversion algorithm hasn’t been disclosed (among lots of other things). But for matters of simplicity, I’m going to use a very crude approach to illustrate:

Equate the passing 3-digit score to 75. So if the passing 3-digit score is 180, then 180 = 75. 185 = 80, 190 = 85 … and so on.

I’m sure the relationship isn’t linear as shown above. For one, by very definition, a 2-digit score ends at 99. 100 is a 3-digit number! So let’s see what happens with our example above:

190 = 85, 195 = 90, 204 = 99. We’ve reached the 2-digit limit at this point. Any score higher than 204 will also be equated to 99. It doesn’t matter if you scored a 240 or 260 on the 3 digit scale. You immediately fall under the 99 bracket along with the lesser folk!
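The crude approach can be written down in a couple of lines. To be clear, this is only the illustrative mapping used in this post (shift the scale so the passing score lands on 75, then cap at 99); the real conversion algorithm hasn’t been disclosed, and the function name is made up.

```python
def crude_two_digit_score(three_digit_score, passing_three_digit=180):
    """Illustrative only: shift so that the passing score equals 75, cap at 99."""
    offset = passing_three_digit - 75
    return min(three_digit_score - offset, 99)

print(crude_two_digit_score(180))  # 75
print(crude_two_digit_score(190))  # 85
print(crude_two_digit_score(240))  # 99
print(crude_two_digit_score(260))  # 99
```

The `min(..., 99)` cap is where the distortion comes from: very different 3-digit scores collapse into the same 2-digit value.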

These distortions and constraints make the 2-digit score an unjust system to rank test-takers and today, most residency programs use the 3-digit score to compare people. Because the 3-digit to 2-digit scale conversion changes every year, it makes sense to stick to the 3-digit scale which makes comparisons between old-timers and new-timers possible, besides the obvious advantage in helping comparisons between candidates who deal/dealt with different exam content.

**Making Assumptions And Approximate Guesses**

The USMLE does provide Means and Standard Deviations on students’ score cards. But these statistics don’t strictly apply to them because they are derived from different test populations. The score card specifically mentions that these statistics are* “for recent” *instances of the test.

Each instance of an exam is directed at a group of people which form its test population. Each population has its own characteristics such as whether or not it’s governed by Gaussian statistics, whether there is skew or kurtosis in its distribution, etc. The summary statistics such as the mean and standard deviation will also vary between different test populations. So unless you know the exact summary statistics and the nature of the distribution that describes the test population from which a candidate comes, you can’t possibly assign him/her a percentile rank. And because Joe and Jane can be from two entirely different test populations, percentiles in the end don’t carry much meaning. It’s that simple folks.

You could, however, make assumptions and draw arbitrary conclusions about percentile ranks. Say for argument’s sake, all populations have a mean equal to 220 and a standard deviation equal to 20 and conform to Gaussian statistics. Then a 3-digit score of:

220 = 50th percentile

220 + 20 = 84th percentile

220 + 20 + 20 = 97th percentile

[Going back to our ’99 percentile’ myth and with the specific example we used, don’t you see how a score equal to 260 (with its 2-digit 99 equivalent) still doesn’t reach the 99 percentile? It’s amazing how severely people can delude themselves. A 99 percentile rank is no joke and I find it particularly fascinating to observe how hundreds of thousands of people ludicrously claim to have reached this magic rank with a 2-digit 99 score. I mean, doesn’t the sheer commonality hint that something in their thinking is off?]

This calculator makes it easy to calculate a percentile based on known Mean and Standard Deviations for Gaussian distributions. Just enter the values for Mean and Standard Deviation on the left, and in the ‘Probability’ field enter a percentile value in decimal form (97th percentile corresponds to 0.97 and so forth). Hit the ‘Compute x’ button and you will be given the corresponding value of ‘x’.
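If you’d rather not depend on an online calculator, Python’s standard library can do the same job. This is just a sketch under the same make-believe assumption as above (a Gaussian population with mean 220 and standard deviation 20); the helper names are mine.

```python
from statistics import NormalDist

# Assumed population: mean 220, standard deviation 20 (for argument's sake).
scores = NormalDist(mu=220, sigma=20)

def percentile_of(score):
    """Percentile rank of a 3-digit score under the assumed distribution."""
    return 100 * scores.cdf(score)

def score_at(percentile):
    """3-digit score needed to reach a given percentile (the 'Compute x' step)."""
    return scores.inv_cdf(percentile / 100)

for s in (220, 240, 260):
    print(s, round(percentile_of(s), 1))
print(round(score_at(99), 1))
```

Under these assumptions a 260 sits a little short of the 98th percentile, and the 99th percentile only arrives at a score in the mid 260s.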

**99th Percentile Ain’t Cake**

Another point of note about a Gaussian distribution:

The distance from the 0th percentile to the 25th percentile is also equal to the distance between the 75th and 100th percentile. Let’s say this distance is x. The distance between the 25th percentile and the 50th percentile is also equal to the distance between the 50th percentile and the 75th percentile. Let’s say this distance is y.

It so happens that x>>>y. In a crude sense, this means that it is disproportionately tougher for you to score extreme values than to stay closer to the mean. Going from a 50th percentile baseline, scoring a 99th percentile is disproportionately tougher than scoring a 75th percentile. If you aim to score a 99 percentile, you’re gonna have to seriously sweat it out!
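Here is a small sketch that puts numbers on this, again assuming a Gaussian population with mean 220 and standard deviation 20. One caveat: a true Gaussian has no finite 0th or 100th percentile (the tails run on forever), so the code compares against the 99th percentile instead.

```python
from statistics import NormalDist

# Assumed population: mean 220, standard deviation 20.
scores = NormalDist(mu=220, sigma=20)

def points_between(lower_percentile, upper_percentile):
    """How many 3-digit points separate two percentile ranks."""
    return (scores.inv_cdf(upper_percentile / 100)
            - scores.inv_cdf(lower_percentile / 100))

median_to_75th = points_between(50, 75)
p75_to_99th = points_between(75, 99)
print(round(median_to_75th, 1), round(p75_to_99th, 1))
```

Climbing the 25 percentile ranks from the 50th to the 75th takes far fewer points than climbing the next 24 ranks to the 99th.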

### It’s the interval, stupid

Say there are infinite clones of you existent in this world and you’re all like the Borg. Each of you is mentally indistinguishable from the other – possessing ditto copies of USMLE knowhow. Say that each of you took the USMLE and then we plot the frequencies of these scores on a graph. We’re going to end up with a Gaussian curve depicting this sample of clones, with its own mean score and standard deviation. This process is called ‘parametric sampling’ and the distribution obtained is called a ‘sampling distribution’.

The idea behind what we just did is to determine the variation that we would expect in scores even if knowhow remained constant – either due to a flaw in the test or by random chance.

The standard deviation of a sampling distribution is also called ‘standard error’. As you’ll probably learn during your USMLE preparation, knowing the standard error helps calculate what are called ‘confidence intervals’.

A confidence interval for a given score can be calculated as follows (using the Z-statistic):-

True score = Measured score +/- 1.96 (standard error of measurement) … for 95% confidence

True score = Measured score +/- 2.58 (standard error of measurement) … for 99% confidence

For many recent tests, the standard error for the 3-digit scale has been 6 [Every score card quotes a certain **SEM** (**Standard Error of Measurement**) for the 3-digit scale]. This means that given a measured score of 240, we can be 95% certain that the true value of your performance lies between a low of 240 – 1.96 (6) and a high of 240 + 1.96 (6). Similarly we can say with 99% confidence that the true score lies between 240 – 2.58 (6) and 240 + 2.58 (6). These score intervals are probabilistically flat when graphed – each true score value within the intervals calculated has an equal chance of being the right one.
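Working the two intervals out explicitly, as a minimal sketch of the formula above (the function name is mine; the SEM of 6 and the multipliers 1.96 and 2.58 are the ones quoted in this post):

```python
def confidence_interval(measured_score, sem=6.0, z=1.96):
    """Measured score +/- z * SEM, returned as (low, high)."""
    margin = z * sem
    return measured_score - margin, measured_score + margin

low_95, high_95 = confidence_interval(240)          # 95% confidence
low_99, high_99 = confidence_interval(240, z=2.58)  # 99% confidence
print(round(low_95, 2), round(high_95, 2))  # 228.24 251.76
print(round(low_99, 2), round(high_99, 2))  # 224.52 255.48
```

Notice that asking for more confidence widens the interval: being surer costs you precision.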

What this means is that, when you compare two individuals and see their scores side by side, you ought to consider what’s going on with their respective confidence intervals. Do they overlap? Even a nanometer of overlapping between CI*s* makes the two, statistically speaking, indistinguishable, even if in reality there is a difference. As far as the test is concerned, when two CI*s* overlap, the test failed to detect any difference between these two individuals (some statisticians disagree. How to interpret statistical significance when two or more CI*s* overlap is still a matter of debate! I’ve used the view of the authors of the Kaplan lecture notes here). Capiche?

Beating competitors by intervals rather than pinpoint scores is a good idea to make sure you really did do better than them. The wider the distance separating two CI*s,* the larger is the difference between them.

There’s a special scenario that we need to think about here. What about the poor fellow who just missed the passing mark? For a passing mark of 180, what of the guy who scored, say 175? Given a standard error of 6, his 95% CI definitely does include 180 and there is no statistically significant (using a 5% margin of doubt) difference between him and another guy who scored just above 180. Yet this guy failed while the other passed! How do we account for this? I’ve been wondering about it and I think that perhaps, the pinpoint cutoffs for passing used by the USMLE exist as a matter of practicality. Using intervals to decide passing/failing results might be tedious, and maybe scientific endeavor ends at this point. Anyhow, I leave this question out in the void with the hope that it sparks discussions and clarifications.
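The near-miss scenario is easy to check with a short sketch. It uses the same 95% interval formula and SEM of 6 as before; the score of 181 for the fellow who ‘just passed’ is a number I made up, and the overlap rule follows the interpretation adopted in this post.

```python
def confidence_interval(measured_score, sem=6.0, z=1.96):
    """Measured score +/- z * SEM, returned as (low, high)."""
    margin = z * sem
    return measured_score - margin, measured_score + margin

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

PASSING_MARK = 180
failed = confidence_interval(175)  # the guy who just missed
passed = confidence_interval(181)  # hypothetical guy who just made it

includes_pass_mark = failed[0] <= PASSING_MARK <= failed[1]
print(includes_pass_mark, intervals_overlap(failed, passed))  # True True
```

Both checks come out true: the failed candidate’s interval contains the passing mark and overlaps the passing candidate’s interval, which is exactly the puzzle raised above.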

If you care to give it a thought, the graphical subject-wise profile bands on the score card are actually confidence intervals (95%, 99% ?? I don’t know). This is why the score card clearly states that if any two subject-wise profile bands overlap, performance in these subjects should be deemed equal.

I hope you’ve found this post interesting if not useful. Please feel free to leave behind your valuable suggestions, corrections, remarks or comments. Anything 🙂 !

—

*Readability grades for this post:*

*Kincaid: 8.8
ARI: 9.4
Coleman-Liau: 11.4
Flesch Index: 64.3/100 (plain English)
Fog Index: 12.0
Lix: 40.3 = school year 6
SMOG-Grading: 11.1*

—


–

*Copyright © 2006 – 2008 Firas MR. All rights reserved.*

## Quantifying Medicine – A Tricky Road

I have been really enjoying Feinstein’s “Principles of Medical Statistics” the past couple of days, and today I felt like sharing a nifty and pragmatic lesson from the book. I’d love to put up an entire chunk from the book right here, but I’m not sure that would do justice to the copyright, so I’ll stick to as little of an excerpt as possible. To honestly enjoy it, though, I recommend reading the entire section. So grab yourself a copy at a local library or wherever and dive in. The chapter of interest is Chapter 6 in Unit 1.

Towards the end of that chapter, there’s a section that goes into interesting detail on the merits and possible demerits of quantifying medicine. To demonstrate the delicate interplay of qualitative and quantitative descriptions in modern medicine, the author quotes a number of research studies that investigated how qualitative terms like “more”, “a lot more”, “a great deal”, “often”, etc. meant different things to different people. The researchers used clever designs that allowed them to correlate a given qualitative term with its corresponding quantitative estimate, and they did this for different groups of people: doctors, clerks, etc. Frustrated at the lack of consensus on the exact *amount* or *probability* or *percentile/percentage* implied by mundane terms like the above, one scientist even thought of a universal coding mechanism for day-to-day use. What frustrations, you ask? One example is an ulcer deemed “large” on one visit to a doctor at the clinic that is then deemed “small” on a subsequent visit to a different doctor, even though the ulcer might really have grown larger in the meantime.

It is quite clear, then, that qualitativeness in medicine often seems like a roadblock of some sort. Don’t be dismayed, however, as Dr. Feinstein ends this chapter with a subsection called “virtues of imprecision”. I found this part to be the most worth savoring. He describes some of the advantages of using qualitative terms and why on some occasions they might in fact communicate better:

- Qualitative terms allow you to convey a message without resorting to painstaking detail. Detail that you might not have the ability to perceive or compute.
- Patients find qualitative terms more intuitive and so do doctors.
- Defining qualitative terms, or replacing them with quantitative ones, could lead to endless debates on where the cut-offs should lie (why should 1001 come under ‘large’ and 1000 under ‘small’… hope you get the drift).
- Many statistical estimates like survival rates, etc. come out of potentially biased studies and it may be wrong to say that “good” survival is say 90% in 5 years and “better” is 99% in 5 years. Which is to say, that it may be wrong to give an impression of precision when in fact it isn’t present.
- Perhaps the most important and pragmatic lesson he gave was about the false sense of security or insecurity that numbers can give to patients and doctors alike. Naivety plays the devil here. He demonstrated this using the cancer staging system. Each cancer stage has some sort of survival statistic attached to it, right? So for example (the numbers here are purely arbitrary), for Stage I cancer, the 5-year survival is 90%. Stage III cancer, in contrast, is given a 5-year survival probability of 40%. A patient with Stage III cancer will be given this information by his or her physician and management plans will be made. What the physician might not realize is that if Stage III is split into further sub-stages, say from Stage III-substage 1 to Stage III-substage 10, the survival probabilities range from 75% down to 5%. The 40% statistic is the ‘average’ and may not be sufficiently relevant to this particular patient, who for all we know could belong to Stage III-substage 1. So, broad statistical numbers are not necessarily pertinent to individual cases.
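
The staging example in the last point can be sketched in a few lines of Python. The numbers are the same arbitrary ones used above. As extra assumptions of mine (not Feinstein's), the ten sub-stages are taken to be evenly spaced between 75% and 5% and to hold equal numbers of patients, which happens to reproduce the 40% stage-wide figure.

```python
from statistics import mean

# Arbitrary illustrative numbers: 5-year survival (%) for ten hypothetical
# sub-stages of Stage III, evenly spaced from best (75%) to worst (5%).
N_SUBSTAGES = 10
BEST, WORST = 75.0, 5.0
step = (BEST - WORST) / (N_SUBSTAGES - 1)
substage_survival = [BEST - i * step for i in range(N_SUBSTAGES)]

# With equal numbers of patients per sub-stage, the stage-wide figure is
# simply the mean of the sub-stage figures.
stage_average = mean(substage_survival)

print(f"Stage III 'average' survival: {stage_average:.1f}%")
for number, survival in enumerate(substage_survival, start=1):
    gap = survival - stage_average
    print(f"  substage {number:2d}: {survival:5.1f}%  ({gap:+.1f} vs. average)")
```

A patient in substage 1 sits 35 percentage points above the stage-wide figure, and a patient in substage 10 sits 35 points below it, so the single number describes neither of them well.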

Oh, and did I mention an excerpt? Ah, never mind. I’ve covered most of the juice by paraphrasing anyway 🙂 .

Hope you’ve found this post interesting. And if you have, do send in your comments 🙂 .

—

*Readability grades for this post:*

*Kincaid: 8.8
ARI: 9.1
Coleman-Liau: 11.8
Flesch Index: 62.3/100 (plain English)
Fog Index: 12.2
Lix: 40.4 = school year 6
SMOG-Grading: 11.3*

—

*Powered by Kubuntu Linux 7.10*


## Know Thy Numbers!

Being face to face with writer’s block, I suppose there isn’t anything particularly exciting I feel like writing about for today. I will therefore talk about a couple of things that I’ve been learning from biostatistics and that I feel many of my fellow medics would benefit from.

We all make comparisons between numbers. If ‘A’ weighs 100 kg and ‘B’ weighs 50 kg, we often say A is twice as heavy as B (weight of A / weight of B). We can also say A is 50 kg heavier than B (weight of A – weight of B). Is the same true for temperature in Fahrenheit? Is 100F twice as hot as 50F? Well, interestingly, no! A temperature of 100F is 50F hotter than a temperature of 50F but not twice as hot. Therein lies a fundamental difference between two kinds of ‘*Dimensional*’ (otherwise called ‘*Continuous*’) data:

- **Interval data:** a dimensional data set whose values are separated by equal differences. So if temperatures in Fahrenheit are listed as 1, 2, 3, 4, … we know that as we progress from 1 to 2 and then to 3, every number in the set is separated from its predecessor by an equal interval.
- **Ratio data:** a dimensional data set that has the properties of an interval data set *and, in addition,* has an absolute zero. Kelvin vs. Fahrenheit is a classic example. Kelvin has an absolute zero while Fahrenheit does not. Weight in kg, too, belongs to the class of ratio data.

The implications of the above dictate how we can manipulate and handle our data. In making comparisons between interval data such as Fahrenheit, we don’t have a universal reference against which to compare two different values, in our example 100F and 50F. The 0F standard is purely arbitrary. If, in a fit of mad-hatter rage, we suddenly said that from now on 0F is no longer 0F but 10F, our original values of 100F and 50F would become 110F and 60F. The difference (110-60) remains the same as before (100-50) but the ratio (110/60) changes from the original (100/50). All of this occurs because there isn’t anything stopping you from making a change to your arbitrary 0F standard.
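The mad-hatter shift is easy to check numerically. Here is a small Python sketch; the `compare` helper is a made-up name for illustration.

```python
def compare(a, b):
    """Return the difference and the ratio of two readings."""
    return {"difference": a - b, "ratio": a / b}


SHIFT = 10  # relabel the old 0F as 10F, so every reading goes up by 10

original = compare(100, 50)
shifted = compare(100 + SHIFT, 50 + SHIFT)

print("original:", original)  # difference 50, ratio 2.0
print("shifted: ", shifted)   # difference still 50, ratio about 1.83
```

The difference survives the relabeling untouched, while the ratio drops from 2 to about 1.83. That is why ratios of interval data carry no real meaning.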

Ratio data sets on the other hand have an absolute standard – the absolute zero. By definition, you can’t change it! This standard is not subject to arbitrary whims and fancies. Taking our Kelvin example, 100K is 50K hotter than a temperature of 50K (100-50). Not only that, it is absolutely fine for you to say 100K is twice as hot as 50K (100/50). Similarly for weight in kilograms, 0kg is absolute. And thus 100kg is 50kg heavier than a weight measurement of 50kg (100-50) and it is also twice as heavy as 50kg (100/50).
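To tie the two scales together, one can convert the earlier Fahrenheit readings to Kelvin and take the ratio there. This is a minimal sketch using the standard conversion K = (F − 32) × 5/9 + 273.15; the function name is made up for illustration, and "hotter" is meant here only in the sense of absolute (thermodynamic) temperature.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to Kelvin."""
    return (f - 32) * 5 / 9 + 273.15


hot = fahrenheit_to_kelvin(100)
cool = fahrenheit_to_kelvin(50)
ratio = hot / cool

print(f"100F = {hot:.2f} K")
print(f" 50F = {cool:.2f} K")
print(f"ratio on the Kelvin scale: {ratio:.3f}")
```

On the absolute scale, 100F turns out to be only about 1.1 times as hot as 50F, nowhere near twice.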

The crude analogy is that of a sailor out in the sea. In order to navigate, he could use objects in the ocean such as rocks that could very well change their positions due to climatic conditions (~interval data). Or he could use the Pole Star to help him navigate (~ratio data).

**Lessons Learned**

You can compare interval data by calculating their difference. No matter what you set as your arbitrary standard, the difference will not change. For ratio data, in addition to calculating differences you also have the luxury of calculating ratios.

**A Comedy of Errors**

Most people don’t realize this, but the IQ score is an example of interval data. A guy scoring 200 on the test did not do twice as well as another who scored 100. He did 100 points better. Standards for a given IQ testing method are set arbitrarily. Not only that, different testing methods could have different arbitrary standards. The WAIS has a different standard than the Stanford-Binet. Remember that.

[In real life, the IQ score isn’t truly interval in nature. How is one to assume that there’s an equal interval of ‘intelligence’ between subsequent scores of 100, 101, 102, … ? It’s analogous to cancer staging, actually. Stage IV disease is no doubt worse than Stage III disease, which in turn is worse than Stage II disease, … but you don’t necessarily progress by equal intervals of ‘disease-ness’ with each subsequent stage from I to IV. Like the numbers used for cancer staging, numbers for IQ scores are actually ‘*Ordinal*’ data in disguise.]

**Notes:**

All data can be divided into the following types (from least informative to most informative):

- *Categorical – Nominal*: Distinct categories of data that you assign names to and that you can’t rank. E.g. Smoker and Non-smoker; Asian, African, American, Australian, etc.
- *Categorical – Ordinal*: Distinct categories of data that you can not only assign names to but can also assign ranks. Intervals between ranks aren’t equal. E.g. Gold medal, Silver medal, Bronze medal. Class rank, Cancer Staging, etc. are also examples of ordinal data. The only difference is that they are disguised as numbers.
- *Dimensional – Interval*: Numerical data with ranks. Ranks have equal intervals between them. There is no absolute zero.
- *Dimensional – Ratio*: Interval data with an absolute zero.
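
As a rough summary of the four types, here is a small Python sketch that lists which comparisons make sense at each level. The `Scale` enum and `allowed_comparisons` function are made-up names for illustration, and the "equal / not equal" entry for nominal data is my own addition based on the usual textbook treatment of these scales.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Data types ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def allowed_comparisons(scale):
    """List the comparisons that are meaningful for a given data type."""
    comparisons = ["equal / not equal"]      # you can always tell categories apart
    if scale >= Scale.ORDINAL:
        comparisons.append("rank order")     # gold beats silver
    if scale >= Scale.INTERVAL:
        comparisons.append("difference")     # 100F is 50F hotter than 50F
    if scale >= Scale.RATIO:
        comparisons.append("ratio")          # 100 kg is twice as heavy as 50 kg
    return comparisons


for scale in Scale:
    print(f"{scale.name:<8} -> {', '.join(allowed_comparisons(scale))}")
```

Each level inherits everything the previous level allows and adds one more kind of comparison, which is one way to read "least informative to most informative".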

—

**References**

*Biostatistics – The Bare Essentials (by Geoffrey R. Norman (Author), David L. Streiner)*

*Principles of Medical Statistics (by Alvan R. Feinstein)*

—

*Readability grades for this post:-*

Kincaid: 6.3

ARI: 5.3

Coleman-Liau: 10.3

Flesch Index: 70.6/100

Fog Index: 9.8

Lix: 33.9 = below school year 5

SMOG-Grading: 9.7
