Posts Tagged ‘Statistics’
Watching the morning sun beam through the clouds during today’s jog, I was struck by an epiphany. What followed was a stream of thoughts that left me with an overwhelming sense of awe and humility at its profound implications.
Perhaps the rejuvenating air, the moist earth from the previous night’s rains and the scent of the fresh Golden Flamboyant trees lining my path made the sun’s splendor much more obvious to see. Like in a photograph coming to life, when objects elsewhere in the scene enhance the main subject’s impact.
As I gazed in its direction wondering about the sunspots that neither I nor anyone else around me could see (but that I knew were really there, from reading the work of astronomers), I began thinking about my own positional coordinates. So this was the East, I found. But how did I know that? Well as you might have guessed, from the age old phrase: “the sun rises in the East and sets in the West”. Known in Urdu as “سورج مشرق میں نکلتا ہے اور مغرب میں ڈوبتا ہے ” or in Hindi, “सूरज पूरव में निकलता है और पश्चिम में डूबता है” and indeed to be found in many other languages, we observe that man has come to form an interesting model to wrap his mind around this majestic phenomenon. Indeed, many religious scriptures and books of wisdom, from ancient history to the very present, find use of this phrase in their deep moral teachings.
But we’ve come to think that we know this model is not really “correct”, is it? We’ve come to develop this thinking with the benefit of hindsight (a relative term, given Einstein’s famous theory, by the way. One man’s hindsight could actually be another man’s foresight!). We’ve ventured beyond our usual abode and looked at our planet from a different vantage point – that of Space. From the Moon and satellites. The sun doesn’t actually rise or set. That experience occurs because of our peculiar vantage point – of relatively slow or immobile creatures grounded here on Earth. One could say that it is an interesting illusion. Indeed, you could sit on a plane and with the appropriate speed, chase that sliver of sunlight as the Sol (as it’s lovingly called by scientists) appears or disappears in the horizon, never letting it vanish from view and do so essentially indefinitely.
So when it comes to this phenomenon, we’ve moved from one model to another. We began with “primitive” maxims, perhaps during a time when people thought of the Earth as flat and the stars as pin-point objects too. We then progressed to geocentrism and then heliocentrism, both of which were formulated from careful and detailed observations of the sky, at first with the naked eye and later with telescopes, long before the luxury of satellites and space travel came into being. And now that we see the Earth from this improved vantage point – of Space – our model for understanding reality has been refined. And actually, really shifted in profound ways.
So what does this all mean? It looks like reality is one thing, that exists out there. And we as humans make sense of reality through abstractions or models. How accurate we are with our abstractions really depends on how much information we’ve been able to gather. New information (through ever more detailed experiments or observations and indeed as Godel and Poincare showed, sometimes by mere pontification), drives us to alter our existing models. Sometimes in radically different ways (a classic example is our model of matter: one minute particle, one minute wave). There is this continuous flux about how we make sense of the cosmos, and it will likely go on this way until the day mankind has been fully informed – which may never really happen if pondered upon objectively. There have been moments in the past where man has thought that this precipice had been finally reached, that he was at last fully informed, only to realize with utter embarrassment that this was not the case. Can man ever know, by himself, that he has finally reached such a point? Especially, given that this is like a student judging his performance at an exam without the benefit of an independent evaluator? The truth is that we may never know. Whether we think we will ever reach such a precipice really does depend on a leap of faith. And scientists and explorers who would like to make progress, depend on this faith – that either such a precipice will one day be reached or at least that their next observation or experiment will increase them in information on the path to such a glorious point. When at last, a gestalt vision of all of reality can be attained. It’s hard to stay motivated otherwise, you see. And you thought you heard that faith had nothing to do with science or vice versa!
It is indeed quite remarkable the extent to which we get stuck in this or that model and keep fooling ourselves about reality. No sooner do we realize that we’ve been had, and move on from our old abstraction to a new one that we think is much better, than we are struck with another blow. This actually reminds me of a favorite quote by a stalwart of modern Medicine:
And not only are the reactions themselves variable, but we, the doctors, are so fallible, ever beset with the common and fatal facility of reaching conclusions from superficial observations, and constantly misled by the ease with which our minds fall into the rut of one or two experiences.
The phenomenon is really quite pervasive. The early cartographers who divided the world into various regions thought funny stuff by today’s standards. But you’ve got to understand that that’s how our forefathers modeled reality! And whether you like it or not someday many generations after our time, we will be looked upon with similar eyes.
Watching two interesting Royal Society lectures by Paul Nurse (The Great Ideas of Biology) and Eric Lander (Beyond The Human Genome Project: Medicine In The 21st Century) the other day, this thought kept coming back to me. Speaking about the advent of Genomic Medicine, Eric Lander (who trained as a mathematician, by the way) talked about the discovery of the EGFR gene and the realization that its mutations strongly increase the risk for a type of lung cancer called Adenocarcinoma. He mentioned how clinical trials of the drug Iressa – a drug whose mechanism of action scientists weren’t sure of yet but was nevertheless proposed as a viable option for lung adenocarcinomas – failed to show statistically significant differences from standard therapy. Well, that was because the trial’s subjects were members of the broad population of all lung adenocarcinoma cases. Many doctors realizing the lack of conclusive evidence of a greater benefit, felt no reason to choose Iressa over standard therapy and drastically shift their practice. Which is what Evidence-Based-Medical practice would have led them to do, really. But soon after the discovery of the EGFR gene, scientists decided to do a subgroup analysis using patients with EGFR mutations, and it was rapidly learned that Iressa did have a statistically significant effect in decreasing tumor progression and improving survival in this particular subgroup. A significant section of patients could now have hope for cure! And doctors suddenly began to prescribe Iressa as the therapy of choice for them.
As I was thinking about what Lander had said, I remembered that Probability Theory as a science, which forms the bedrock of such things as clinical trials and indeed many other scientific studies, did not even begin to develop until after the Middle Ages, in the 16th and 17th centuries. At least, so far as we know. And modern probability theory really began much later, in the early 1900s.
You begin to realize what a quantum leap this was in our history. We now think of patterns and randomness very differently from ancient times. Which is pretty significant, given that for some reason our minds are drawn to looking for patterns even where there might not be any. Over the years, we’ve developed the understanding that clusters (patterns) of events or cases could occur in a random system just as in a non-random one. Indeed, such clusters (patterns) would be a fundamental defining characteristic of a random process. Absence of clusters would indicate that a process wasn’t truly random. Whether such clusters (patterns) would fit with a random process as opposed to a non-random one would depend on whether or not we find an even greater pattern of how these clusters are distributed. A cluster of cases (such as an epidemic of cholera) would be considered non-random if by hypothesis testing we found that the probability of such a cluster coming about by random chance was so small as to be negligible. And even when thinking about randomness, we’ve learned to ask ourselves if a random process could be pseudo-random as opposed to truly random – which can sometimes be a difficult thing to establish. So unlike our forefathers, we don’t immediately jump to conclusions about what look to our eyes as patterns. It’s all quite marvelous to think about, really. What’s even more fascinating, is that Probability Theory is in a state of flux and continues to evolve to this day, as mathematicians gather new information. So what does this mean for the validity of our models that depend on Probability Theory? If a model could be thought of as a chain, it is obvious that such a model would be as strong as the links with which it is made! So we find that statisticians keep finding errors in how old epidemiologic studies were conducted and interpreted. And the science of Epidemiology itself improves as Probability Theory is continuously polished. 
This goes to show the fact that the validity of our abstractions keeps shifting as the foundations upon which they are based themselves continue to transform. A truly intriguing idea when one thinks about it.
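To make the earlier point about clusters in a random process a little more concrete, here is a minimal sketch in Python. It simulates fair coin flips and measures the longest streak of identical outcomes in each sequence. The function names, the sequence length of 200, the streak threshold of 5 and the random seed are all illustrative assumptions of mine, not anything taken from an epidemiologic method.

```python
import random


def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    if not flips:
        return 0
    best = current = 1
    for prev, cur in zip(flips, flips[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def simulate(n_flips=200, n_trials=2000, seed=0):
    """Longest streak in each of n_trials sequences of fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    runs = []
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        runs.append(longest_run(flips))
    return runs


runs = simulate()
share_with_long_streak = sum(r >= 5 for r in runs) / len(runs)
print(f"average longest streak: {sum(runs) / len(runs):.1f}")
print(f"share of sequences with a streak of 5 or more: {share_with_long_streak:.2f}")
```

Under these assumptions, nearly every sequence contains a streak that would look like a "pattern" to the eye, even though each flip is independent. A sequence with no such streaks at all would be the suspicious one.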
Some other examples of the shifting of abstractions with the gathering of new information come to mind.
Like early cartographers, anatomists never really understood human anatomy very well back in the days of cutting open animals and extrapolating their findings to humans. There were these weird ideas that diseases were caused by a disturbance in the four humors. And then Vesalius came along and, by stressing the importance of dissecting cadavers, revolutionized how anatomy came to be understood and taught. But even then, our models for the human body were until recently plagued by ideas such as the concept that the seat of the soul lay in the pineal gland, and some of the other stuff now popularly characterized as folk-medicine. In our models for disease causation, we’ve progressed over the years from looking at pure environmental factors to pure DNA factors and now to a multifactorial model that stresses the idea that many diseases are caused by a mix of the two.
The Monty Hall paradox, about which I’ve written before, is another good example. You’re presented with new information midway through the game and you use this new information to re-adjust the old model of reality that you had in your mind. The use of decision trees in genetic counseling is yet another example. Given new information about a patient’s relatives and their genotype, your model for what is real improves in accuracy. You become better at diagnosis with each bit of new information.
The phenomenon can often be found in how people understand Scripture too. Mathematician Gary Miller has an interesting article that describes how some scholars examining the word Iram have gradually transformed their thinking based on new information gathered by archeological excavations.
So we see how abstractions play a fundamental role in our perceptions of reality.
One other peculiar thing to note is that sometimes, as we try to re-shape our abstractions to better agree with any new information we get, there is the tendency to stick with the old as much as possible. A nick here or a nudge there is acceptable, but at heart we are usually loath to discard our old model entirely. There is a potential danger in this. Because it could be that we inherit flaws from our old model without even realizing it, thus constraining the new one in ways yet to be understood. Especially when we are unaware of what these flaws could be. A good example of abstractions feeding off of each other is the pairing of the space-time fabric of relativity theory and the jitteriness of quantum mechanics. In our quest for a new model – a unified theory or abstraction – we are trying to mash these two abstractions together in curious ways, such that a serene space-time fabric exists when zoomed out, but when zoomed in we should expect to see it behave erratically with jitters all over the place. Our manner of dealing with such inertia when it comes to building new abstractions is basically to see if these mash-ups agree with experiments or observations much better than our old models. Which is an interesting way to go about doing things and could be something to think about.
Listening to Paul Nurse’s lecture I also learned how Mendel chose pea plants for his studies on inheritance rather than other, more complicated vegetation because of the simplicity and clarity with which one could distinguish their phenotypes, making the experiment much easier to carry out. Depending on how one crossed them, one could trace the inheritance of traits – color of fruit, height of plant, etc. – very quickly and very accurately. It actually reminded me of something I learned a long time ago about the various kinds of data in statistics. That these data could be categorized into various types based on the amount of information they contain. The highest amount of information is seen in Ratio data. The lowest is seen in Nominal data. The implication of this is that the more your experiment or scientific study uses Ratio data rather than Nominal data, the more accurate your inferences about reality will be. The more information you throw out, the weaker your model will be. So we see that there is quite an important caveat when we build abstractions based on keeping it simple and stripping away intricacy. It is like being stuck with an ape’s thumb on a fine instrument. It’s primitive, but it often gets us ahead in understanding reality much faster. The cost we pay though, is that our abstraction agrees better with a simpler and more artificial version of the reality that we seek to understand. And reality usually is quite complex. So when we limit ourselves to examining a bunch of variables in, say, the clinical trial of a drug, and find that it has a treatment benefit, we can be a lot more certain that this would be the case in the real world too provided that we prescribe the drug to as similar a patient pool as in our experiment. Which rarely happens as you might have guessed! And that’s why you find so many cases of treatment failure and unpredictable disease outcomes.
How the validity of an abstraction is influenced by the KISS principle is something to think about. Epidemiologists get sleepless nights when pondering over it sometimes. And a lot of time is spent in trying to eliminate selection bias (i.e. when errors of inference creep in because the pool of patients in the study doesn’t match to an acceptable degree, the kinds of patients doctors would interact with out in the real world). The goal is to make an abstraction agree with as much of reality as possible, but in doing so not to make it so far removed from the KISS principle that carrying out the experiment would be impractical or impossible. It’s such a delicate and fuzzy balance!
So again and again we find that abstractions define our experiences. Some people get so immersed and attached with their models of reality that they make them their lifeblood, refusing to move on. And some people actually wonder if life as we know it, is itself an abstraction :-D! I was struck by this when I came upon the idea of the Holographic principle in physics – that in reality we and our universe are bound by an enveloping surface and that our real existence is on this plane. That what we see, touch or smell in our common experience is simply a projection of what is actually happening on that surface. That these everyday experiences are essentially holograms :-D! Talk about getting wild, eh :-D?!
The thought that I ultimately came away with at the end of my jog was that of maintaining humility in knowledge. For those of us in science, it is very common for arrogance to creep in. When the fact is that there is so much about reality that we don’t know anything about, and that our abstractions may never agree with it to full accuracy, ever! When pondered upon deeply this is a very profound and humbling thing to realize.
Even the arrogance in Newton melted away for a moment when he proclaimed:
If I have seen a little further it is by standing on the shoulders of Giants.
Here’s to Isaac Newton for that spark of humility, even if it was rather fleeting :-). I’m guessing there must have been times when he might have had stray thoughts of cursing at himself for having said that :-)! Oh well, that’s how they all are …
Copyright Firas MR. All Rights Reserved.
“A mote of dust, suspended in a sunbeam.”
As a fun project, I’ve decided to frame this post as an abstract.
To elucidate factors influencing perceived incompetence on the part of the doctor by the layman/patient/patient’s caregiver.
MATERIALS & METHODS:
Arm-chair pontification and a little gedankenexperiment based on prior experience with patients as a medical trainee.
Preliminary analyses indicate widespread suspicions among patients on the ineptitude of doctors no matter what the level of training. This is amply demonstrated in the following figure:
As one can see, perceived ineptitude forms a wide spectrum – from most severe (med student) to least severe (attending). The underlying perceptions of incompetence do not seem to abate at any level however, and eyewitness testimonies include phrases such as ‘all doctors are inept; some more so than others’. At the med student level, exhausted patients find their anxious questions being greeted with a variety of responses ranging from the dumb ‘I don’t know’, to the dumber ‘well, I’m not the attending’, to the dumbest ‘uhh…mmmm..hmmm <eyes glazed over, pupils dilated>’. Escape routes will be meticulously planned in advance both by patients and more importantly by med students to avert catastrophe.
As for more senior medics such as attendings, evasion seems to be just a matter of hiding behind statistics. A gedankenexperiment was conducted to demonstrate this. The settings were two patients A and B, undergoing a certain surgical procedure and their respective caregivers, C-A and C-B.
Consent & Pre-op
C-A: (anxious), Hey doc, ya think he’s gonna make it?
Doc: It’s difficult to say and I don’t know that at the moment. There are studies indicating that 95% live and 5% die during the procedure though.
C-A: ohhh kay (slightly confused) (murmuring)…’All this stuff about knowing medicine. What does he know? One simple question and he gives me this? What the heck has this guy spent all these years studying for?!’
Post-op & Recovery
C-A: Ah, I just heard! He made it! Thank you doctor!
Doc: You’re welcome (smug, god-complex)! See, I told ya 95% live. There was no reason for you to worry!
C-A: (sarcastic murmur) ‘Yeah, right. Let him go through the pain of not knowing and he’ll see. Look at him, so full of himself – as if he did something special; luck was on our side anyway. Heights of incompetence!’
Consent & Pre-op
C-B: (anxious) Hey doc, ya think he’s gonna make it?
Doc: It’s difficult to say and I don’t know that at the moment. There are studies indicating that 95% live and 5% die during the procedure though.
C-B: ohhh kay (slightly confused) (murmuring)…’All this stuff about knowing medicine. What does he know? One simple question and he gives me this? What the heck has this guy spent all these years studying for?!’
Post-op & Recovery
C-B: (angry, shouting numerous expletives) What?! He died on the table?!
Doc: Well, I did mention that there was a 5% death rate.
C-B: (angry, shouting numerous expletives).. You (more expletives) incompetent quack! (murmuring) “How convenient! A lawsuit should fix him for good!”
The Doctor’s Coping Strategy
Isolation of affect: eg. Resident tells Fellow, “you know that patient with the …well, she had a massive MI and went into VFib..died despite ACLS..poor soul…so hey, I hear they’re serving pizza today at the conference…(the conference about commercializing healthcare and increasing physician pay-grades for ‘a better and healthier tomorrow’)”
Intellectualization: eg. Attending tells Fellow, “so you understand why that particular patient bled to death? Yeah it was DIC in the setting of septic shock….plus he had a prior MI with an Ejection Fraction of 33% so there was that component as well..but we couldn’t really figure out why the antibiotics didn’t work as expected…ID gave clearance….(ad infinitum)…so let’s present this at our M&M conference this week..”
Displacement: eg. Caregiver yells at Fellow, “<expletives>”. Fellow yells at intern, “You knew that this was a case that I had a special interest in and yet you didn’t bother to page me? Unacceptable!…” Intern then yells at med student, “Go <expletives> disimpact Mr. X’s bowels…if I don’t see that done within the next 15 minutes, you’re in for a class! Go go go…clock’s ticking…tck tck tck!”
We believe there are other coping mechanisms that are important too, but in our observations these appear to be the most common. Of the uncommon ones, we think med students as a group in particular, are the most vulnerable to Regression & Dissociation, duly accounting for confounding factors.
Patients and their caregivers seem to think that ALL doctors are fundamentally inept, period. Ineptitude follows a wide spectrum however – ranging from the bizarre to the mundane. Further studies (including but not limited to arm-chair pontification) need to be carried out to corroborate these startling results and the factors that we have reported. Other studies need to elucidate remedial measures that can be employed to save the doctor-patient relationship.
NOTE: I wrote this piece as a reminder of how the doctor-patient relationship is experienced from the patient’s side. In our business-as-usual frenzy, we as medics often don’t think about these things. And these things often DO matter a LOT to our patients!
There are strategies that examiners can employ to frame questions that are designed to stump you on an exam such as the USMLE. Many of these strategies are listed out in the Kaplan Qbook and I’m sure this stuff will be familiar to many. My favorite techniques are the ‘multi-step’ and the ‘bait-and-switch’.
Drawing on principles of probability theory, examiners will often frame questions that require you to know multiple facts and concepts to get the answer right. As a crude example:
“This inherited disease exclusive to females is associated with acquired microcephaly and the medical management includes __________________.”
Such a question would be re-framed as a clinical scenario (an outpatient visit) with other relevant clinical data such as a pedigree chart. To get the answer right, you would need:
1. Knowledge of how to interpret pedigree charts and identify that the disease manifests exclusively in females.
2. Knowledge of Mendelian inheritance patterns of genetic diseases.
3. Knowledge of conditions that might be associated with acquired microcephaly.
4. Knowledge of medical management options for such patients.
Now taken individually, each of these steps – 1, 2, 3 and 4 – has a probability of 50% that you could get it right purely by random guessing. Combined together however, which is what is necessary to get the answer, the probability would be 50% * 50% * 50% * 50% = 6.25% [combined probability of independent events]. So now you know why they actually prefer multi-step questions over one or two-liners! 🙂 Notice that this doesn’t necessarily have anything to do with testing your intelligence as some might think. It’s just being able to recollect hard facts and then being able to put them together. They aren’t asking you to prove a math theorem or calculate the trajectory of a space satellite 😛 !
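The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `chance_all_correct` is mine, and the 50% per-step figure is the crude assumption from the text, with the steps treated as independent.

```python
from fractions import Fraction


def chance_all_correct(step_probs):
    """Probability of getting every step right, assuming the steps are independent."""
    product = Fraction(1)
    for p in step_probs:
        product *= p
    return product


# Four steps, each assumed to have a 50% chance of a lucky guess.
steps = [Fraction(1, 2)] * 4
combined = chance_all_correct(steps)
print(f"{float(combined):.2%}")  # 6.25%
```

Adding a fifth 50% step would halve the figure again, which is why each extra step makes a lucky guess so much less likely.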
Another strategy is to pack the question chock-full of irrelevant data. You could have paragraph after paragraph describing demographic characteristics, anthropometric data, and ‘bait’ data that’s planted there to persuade you to think along certain lines, and as you grind away pondering these things you are suddenly presented with an entirely unrelated sentence at the very end, asking a completely unrelated question! Imagine being presented with the multi-step question above with one added fly in the ointment. As you finally finish the half-page length question, it ends with ‘<insert-similar-disease> is associated with the loss of this enzyme and/or body part: _______________’. Very tricky! Questions like these bring flashbacks and déjà vu of days from 2nd year med school, when that patient with a neck lump begins by giving you his demographic and occupational history. As an inexperienced med student you immediately begin thinking: ‘hmmm..okay, could the lump be related to his occupation? …hmm…’. But wait! You haven’t even finished the physical exam yet, let alone the investigations. As medics progress along their careers they tend to phase out this kind of analysis in favor of more refined ‘heuristics’ as Harrison’s puts it. A senior medic will often wait to formulate opinions until the investigations are done and will focus on triaging problems and asking if management options are going to change them. The keyword here is ‘triage’. Just as a patient’s clinical information in a real office visit is filled with much irrelevant data, so too are many USMLE questions. That’s not to say that demographic data, etc. are irrelevant under all conditions. Certainly, an occupational history of being employed at an asbestos factory would be relevant in a case that looks like a respiratory disorder.
If the case looks like a respiratory disorder, but the question mentions an occupational history of being employed as an office clerk, then this is less likely to be relevant to the case. Similarly if it’s a case that overwhelmingly looks like an acute abdomen, then a stray symptom of foot pain is less likely to be relevant. Get my point? That is why many recommend reading the last sentence or two of a USMLE question before reading the entire thing. It helps you establish what exactly is the main problem that needs to be addressed.
Hope readers have found the above discussion interesting :). Adios for now!
Just a quick thought. It just occurred to me that some of the questions on the USMLE involving pedigree analysis in genetics, are actually typical decision tree questions. The probability that a certain individual, A, has a given disease (eg: Huntington’s disease) purely by random chance is simply the disease’s prevalence in the general population. But what if you considered the following questions:
- How much genetic code do A and B share if they are third cousins?
- If you suddenly knew that B has Huntington’s disease, what is the new probability for A?
- What is the disease probability for A‘s children, given how much genetic code they share with B?
When I’d initially written about decision trees, it did not at all occur to me at the time how this stuff was so familiar to me already!
Apply a little Bayesian strategy to these questions and your mind is suddenly filled with all kinds of probability questions ripe for decision tree analysis:
- If the genetic test I utilize to detect Huntington’s disease has a false-positive rate x and a false-negative rate y, now what is the probability for A?
- If the pre-test likelihood is m and the post-test likelihood is n, now what is the probability for A?
I find it truly amazing how so many geneticists and genetic counselors accomplish such complex calculations using decision trees without even realizing it! Don’t you 🙂 ?
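As a rough illustration of the test-accuracy question above, here is a small Python sketch of the Bayesian update. The function names and all of the numbers are hypothetical: the prior of 0.5 assumes A is the child of an affected parent (Huntington’s disease being autosomal dominant), and the false-positive rate x = 0.01 and false-negative rate y = 0.02 are made up purely for the example.

```python
def posterior_given_positive(prior, false_positive_rate, false_negative_rate):
    """P(disease | positive test) via Bayes' rule."""
    sensitivity = 1.0 - false_negative_rate
    true_pos = prior * sensitivity                   # diseased and test positive
    false_pos = (1.0 - prior) * false_positive_rate  # healthy but test positive
    return true_pos / (true_pos + false_pos)


def posterior_given_negative(prior, false_positive_rate, false_negative_rate):
    """P(disease | negative test) via Bayes' rule."""
    specificity = 1.0 - false_positive_rate
    false_neg = prior * false_negative_rate          # diseased but test negative
    true_neg = (1.0 - prior) * specificity           # healthy and test negative
    return false_neg / (false_neg + true_neg)


# Hypothetical inputs: prior 0.5, x = 0.01, y = 0.02.
p_pos = posterior_given_positive(0.5, 0.01, 0.02)
p_neg = posterior_given_negative(0.5, 0.01, 0.02)
print(f"after a positive test: {p_pos:.3f}")
print(f"after a negative test: {p_neg:.3f}")
```

Each function is just one branch of the decision tree: multiply the probabilities along each path that ends in the observed test result, then divide the diseased path by the sum of all such paths.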
The Monty Hall Paradox
One of the 3 doors hides a car. The other two hide a goat each. In search of a new car, the player picks a door, say 1. The game host then opens one of the other doors, say 3, to reveal a goat and offers to let the player pick door 2 instead of door 1. Is there an advantage if the player decides to switch? (Courtesy: Wikipedia)
Hola amigos! Yes, I’m back! It’s been eons and I’m sure many of you may have been wondering why I was MIA. Let’s just say it was academia as usual.
This post is unique as it’s probably the first where I’ve actually learned something from contributors and feedback. A very critical audience and pure awesome discussion. The main thrust was going to be an analysis of the question, “If you had to pick an answer in an MCQ randomly, does changing your answer alter the probabilities to success?” and it was my hope to use decision trees to attack the question. I first learned about decision trees and decision analysis in Dr. Harvey Motulsky’s great book, “Intuitive Biostatistics“. I do highly recommend his book. As I pondered over the question, I drew a decision tree that I extrapolated from his book. Thanks to initial feedback from BrownSandokan (my venerable computer scientist friend from yore :P) and Dr. Motulsky himself, who was so kind as to write back to just a random reader, it turned out that my diagram was wrong and so was the original analysis. The problem with the original tree (that I’m going to maintain for other readers to see and reflect on here) was that the tree in the book is specifically for a math (or rather logic) problem called the Monty Hall Paradox. You can read more about it here. As you can see, the Monty Hall Paradox is a special kind of unequal conditional probability problem, in which knowing something for sure, influences the probabilities of your guesstimates. It’s a very interesting problem, and has bewildered thousands of people, me included. When it was originally circulated in a popular magazine, “nearly 1000 PhDs” (cf. Wikipedia) wrote back to say that the solution put forth was wrong, prompting numerous psychoanalytical studies to understand human behavior. A decision tree for such a problem is conceptually different from a decision tree for our question and so my original analysis was incorrect.
So what the heck are decision trees anyway? They are basically conceptual tools that help you make the right decisions given a couple of known probabilities. You draw a line to represent a decision, and explicitly label it with a corresponding probability. To find the final probability for a number of decisions (or lines) in sequence, you multiply their individual probabilities along the path; to combine several alternative paths that lead to the same outcome, you add the path probabilities. It takes skill and a critical mind to build a correct tree, as I learned. But once you have a tree in front of you, it’s easier to see the whole picture.
Let’s just ignore decision trees completely for the moment and think in the usual sense. How good an idea is it to change an answer on an MCQ exam such as the USMLE? The Kaplan lecture notes will tell you that your chances of being correct are better if you don’t. Let’s analyze this. If every question has 1 correct option and 4 incorrect options (the total number of options being 5), then any single try on a random choice gives you a probability of 20% for the correct choice and 80% for the incorrect choice. The odds are higher that on any given attempt, you’ll get the answer wrong. If your choice was correct the first time, it still doesn’t change these basic odds. You are still likely to pick the incorrect choice 80% of the time. Borrowing from the concept of “regression towards the mean” (repeated measurements of something yield values closer to said thing’s mean), we can apply the same reasoning to this problem. Since the outcomes in question are categorical (binary, to be exact), the measure of central tendency used is the Mode (defined as the most commonly or frequently occurring thing in a series). In a categorical series – cat, dog, dog, dog, cat – the mode is ‘dog’. Since the Mode in this case happens to be the category “incorrect”, if you pick a random answer and repeat this multiple times, you are more likely to pick an incorrect answer! See, it all makes sense 🙂 ! It’s not voodoo after all 😀 !
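The 20% / 80% reasoning can be tried out with a quick simulation. The sketch below is a toy model of my own, not something from the Kaplan notes: it assumes the first answer is a purely random guess among 5 options and that "changing your answer" means switching blindly to one of the other 4. The function name, trial count and seed are arbitrary.

```python
import random


def simulate_guessing(n_options=5, n_trials=100_000, seed=1):
    """Compare keeping a random first guess with blindly switching to another option."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    stay_hits = switch_hits = 0
    for _ in range(n_trials):
        correct = rng.randrange(n_options)
        first = rng.randrange(n_options)
        # A blind switch: any option other than the first guess, chosen at random.
        second = rng.choice([o for o in range(n_options) if o != first])
        stay_hits += first == correct
        switch_hits += second == correct
    return stay_hits / n_trials, switch_hits / n_trials


stay_rate, switch_rate = simulate_guessing()
print(f"keep the first random guess: {stay_rate:.3f}")
print(f"switch blindly:              {switch_rate:.3f}")
```

In this toy model both rates come out close to 20%, so a blind switch from one random guess to another neither helps nor hurts. It says nothing about first answers chosen with some actual knowledge, which is presumably the situation the advice against changing answers has in mind.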
Coming back to decision analysis, just as there’s a way to prove the solution to the Monty Hall Paradox using decision trees, there’s also a way to prove our point on the MCQ problem using decision trees. While I study to polish my understanding of decision trees, building them for either of these problems will be a work in progress. And when I’ve figured it all out, I’ll put them up here. A decision tree for the Monty Hall Paradox can be accessed here.
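Until those trees are drawn, here is a rough sketch of what a Monty Hall decision tree boils down to in code. It walks every branch (where the car is, which door you pick, which door Monty opens), multiplies the probabilities along each path and adds up the paths that end in a win. It assumes the standard rules: three doors, Monty always opens a goat door you didn’t pick, and chooses at random when he has two options. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def win_probability(switch: bool) -> Fraction:
    """Sum the probabilities of every winning path through the tree."""
    total = Fraction(0)
    for car, pick in product(DOORS, DOORS):
        # Car placement and first pick are each uniform over three doors.
        p_path = Fraction(1, 3) * Fraction(1, 3)
        # Monty may only open a door that is neither your pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            p_leaf = p_path * Fraction(1, len(openable))
            if switch:
                # Move to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d != pick and d != opened)
            else:
                final = pick
            if final == car:
                total += p_leaf
    return total

print("stay  :", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```

Using exact fractions rather than a random simulation keeps the answer free of sampling noise, which makes it easier to compare against a hand-drawn tree.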
To end this post, I’m going to complicate our main question a little bit and leave it out in the void. What if on your initial attempt you have no idea which of the answers is correct or incorrect but on your second attempt, your mind suddenly focuses on a structural flaw in one or more of the options? Assuming that an option with a structural flaw can’t be correct, wouldn’t this be akin to Monty showing the goat? One possible structural flaw could be an option that doesn’t make grammatical sense when combined with the stem of the question. Does that mean you should switch? Leave your comments below!
Hope you’ve found this post interesting. Adios for now!
Copyright © Firas MR. All rights reserved.
Lots of people have misguided notions as to the true nature of USMLE scores and what exactly they represent. In my opinion, this occurs in part due to a lack of interest in understanding the logistical considerations of the exam. Another contributing factor could be the borderline brainless, mentally zeroed-out scientific culture most exam goers happen to be cultivated in. Many if not most of these candidates, in their naive wisdom, got into Medicine hoping to rid themselves of numerical burdens forever!
The following, I hope, will help debunk some of these common myths.
Percentile? Uh…what percentile?
This myth is, without doubt, the king of all 🙂 . It isn’t uncommon to find a candidate basking in the self-righteous glory of having scored a ’99 percent’ or worse, a ’99th percentile’. The USMLE at one point used to provide percentile scores. That stopped sometime in the mid to late ’90s. Why? Well, the USMLE organization believed that scores were being given more weight than they ought to have in medics’ careers. This test is a licensure exam, period. That has always been the motto. Among other things, when residency programs started using the exam as a yardstick to differentiate and rank students, the USMLE saw this as contrary to its primary purpose and said enough is enough. To make such rankings difficult, the USMLE no longer provides percentile scores to exam takers.
The USMLE does have an extremely detailed FAQ on what the 2-digit (which people confuse with a percentage or percentile) and 3-digit scores mean. I strongly urge all test-takers to take a hard look at it and ponder some of what is said there.
Simply put, the exam is designed to measure a candidate’s level of knowledge and report it as a 3-digit score. This 3-digit score is an unfiltered indication of an individual’s USMLE know-how that, in theory, shouldn’t be influenced by variations in the content of the exam, be it across space (another exam center and/or questions from a different content pool) or time (exam content from the future or past). This means that provided a person’s knowledge remains constant, he or she should, in theory, achieve the same 3-digit score regardless of where and when he or she took the test. Supposedly so, at least. The minimum 3-digit score that is required to ‘pass’ the exam is revised on an annual basis to preserve this space-time independent nature of the score. For the last couple of years, the passing score has hovered around 185. A ‘pass’ score makes you eligible to apply for a license.
What then is the 2-digit score? For god knows what reason, the Federation of State Medical Boards (these people provide medics in the US with licenses based on their USMLE scores) has a 2-digit format for a ‘pass’ score on the USMLE exam. Unlike the 3-digit score, this passing score is fixed at 75 and isn’t revised every year.
How does one convert a 3-digit score to a 2-digit score? The exact conversion algorithm hasn’t been disclosed (among lots of other things). But for simplicity’s sake, I’m going to use a very crude approach to illustrate:
Equate the passing 3-digit score to 75. So if the passing 3-digit score is 180, then 180 = 75, 185 = 80, 190 = 85 … and so on.
I’m sure the real relationship isn’t linear as shown above. For one, by very definition, a 2-digit score ends at 99. 100 is a 3-digit number! So let’s see what happens with our example above:
190 = 85, 195 = 90, 200 = 95, 204 = 99. We’ve reached the 2-digit limit at this point. Any score higher than 204 will also be equated to 99. It doesn’t matter if you scored a 240 or 260 on the 3-digit scale. You immediately fall under the 99 bracket along with the lesser folk!
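To see the ceiling effect at a glance, here is a toy sketch of the crude mapping described above. To be clear, this is NOT the real USMLE conversion (that has never been disclosed). The function name, the one-point-for-one-point slope and the default passing score of 180 are just the assumptions of this illustration.

```python
def crude_two_digit(three_digit: int, passing_three_digit: int = 180) -> int:
    """Toy mapping: the passing score maps to 75, one point per point, capped at 99."""
    return min(75 + (three_digit - passing_three_digit), 99)

for score in (180, 185, 190, 240, 260):
    print(score, "->", crude_two_digit(score))
```

Whatever the true formula is, any mapping with a hard cap at 99 will squash a wide range of high 3-digit scores into the same 2-digit value.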
These distortions and constraints make the 2-digit score an unfair way to rank test-takers and today, most residency programs use the 3-digit score to compare people. Because the 3-digit to 2-digit scale conversion changes every year, it makes sense to stick to the 3-digit scale, which makes comparisons between old-timers and new-timers possible, besides the obvious advantage of helping comparisons between candidates who dealt with different exam content.
Making Assumptions And Approximate Guesses
The USMLE does provide Means and Standard Deviations on students’ score cards. But these statistics don’t strictly apply to them because they are derived from different test populations. The score card specifically mentions that these statistics are “for recent” instances of the test.
Each instance of an exam is directed at a group of people who form its test population. Each population has its own characteristics, such as whether or not it’s governed by Gaussian statistics, whether there is skew or kurtosis in its distribution, etc. The summary statistics such as the mean and standard deviation will also vary between different test populations. So unless you know the exact summary statistics and the nature of the distribution that describes the test population from which a candidate comes, you can’t possibly assign him/her a percentile rank. And because Joe and Jane can be from two entirely different test populations, percentiles in the end don’t carry much meaning. It’s that simple, folks.
You could, however, make assumptions and draw arbitrary conclusions about percentile ranks. Say, for argument’s sake, that all populations have a mean equal to 220 and a standard deviation equal to 20 and conform to Gaussian statistics. Then a 3-digit score of:
220 = 50th percentile
220 + 20 = 240 = 84th percentile
220 + 20 + 20 = 260 = roughly the 98th percentile (97.7 to be precise)
[Going back to our ’99 percentile’ myth and with the specific example we used, don’t you see how a score equal to 260 (with its 2-digit 99 equivalent) still doesn’t reach the 99th percentile? It’s amazing how severely people can delude themselves. A 99th percentile rank is no joke and I find it particularly fascinating to observe how hundreds of thousands of people ludicrously claim to have reached this magic rank with a 2-digit 99 score. I mean, doesn’t the sheer commonality hint that something in their thinking is off?]
This calculator makes it easy to calculate a percentile based on known Mean and Standard Deviations for Gaussian distributions. Just enter the values for Mean and Standard Deviation on the left, and in the ‘Probability’ field enter a percentile value in decimal form (97th percentile corresponds to 0.97 and so forth). Hit the ‘Compute x’ button and you will be given the corresponding value of ‘x’.
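If you’d rather not rely on an online calculator, the same numbers can be reproduced with Python’s standard library (version 3.8 or later). This is only a sketch under the same made-up assumptions as above: a Gaussian population with mean 220 and standard deviation 20. The helper names are mine.

```python
from statistics import NormalDist

# Assumed population: Gaussian, mean 220, standard deviation 20.
scores = NormalDist(mu=220, sigma=20)

def percentile_of(score: float) -> float:
    """Percentile rank (0 to 100) of a 3-digit score in the assumed population."""
    return 100 * scores.cdf(score)

def score_at(percentile: float) -> float:
    """3-digit score sitting at a given percentile (what 'Compute x' does)."""
    return scores.inv_cdf(percentile / 100)

for s in (220, 240, 260):
    print(f"{s}: {percentile_of(s):.1f}th percentile")
print(f"99th percentile needs about {score_at(99):.1f}")
```

With these assumed parameters, the 99th percentile sits a few points above 260, which is the point made in the bracketed aside.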
99th Percentile Ain’t Cake
Another point of note about a Gaussian distribution:
Strictly speaking, a Gaussian curve has no finite 0th or 100th percentile because its tails run on forever, so let’s use the 1st and 99th percentiles as the practical extremes. The distance from the 1st percentile to the 25th percentile is equal to the distance between the 75th and 99th percentiles. Let’s say this distance is x. The distance between the 25th percentile and the 50th percentile is also equal to the distance between the 50th percentile and the 75th percentile. Let’s say this distance is y.
It so happens that x is roughly two and a half times y (about 1.65 versus 0.67 standard deviations). In a crude sense, this means that it is disproportionately tougher for you to score extreme values than to stay closer to the mean. Going from a 50th percentile baseline, scoring a 99th percentile is disproportionately tougher than scoring a 75th percentile. If you aim to score a 99th percentile, you’re gonna have to seriously sweat it out!
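The two distances can be worked out on a standard Gaussian (mean 0, standard deviation 1) with a few lines of Python. One caveat about this sketch: a true Gaussian has no finite 0th or 100th percentile, so I use the 1st and 99th percentiles as stand-ins for the extremes.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# z-scores at the percentiles of interest
q01, q25, q50, q75, q99 = (z.inv_cdf(p) for p in (0.01, 0.25, 0.50, 0.75, 0.99))

x = q25 - q01  # distance from the 1st to the 25th percentile
y = q50 - q25  # distance from the 25th to the 50th percentile

print(f"x = {x:.2f} SD, y = {y:.2f} SD, x / y = {x / y:.2f}")
print(f"75th to 99th = {q99 - q75:.2f} SD (mirror image of x)")
```

The same number of percentile points costs far more raw score out in the tail than it does near the mean.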
It’s the interval, stupid
Say there are infinitely many clones of you in this world and you’re all like the Borg. Each of you is mentally indistinguishable from the others – possessing identical copies of USMLE know-how. Say that each of you took the USMLE and we then plotted the frequencies of these scores on a graph. We’d end up with a Gaussian curve depicting this sample of clones, with its own mean score and standard deviation. This process is called ‘parametric sampling’ and the distribution obtained is called a ‘sampling distribution’.
The idea behind what we just did is to determine the variation that we would expect in scores even if knowhow remained constant – either due to a flaw in the test or by random chance.
The standard deviation of a sampling distribution is also called ‘standard error’. As you’ll probably learn during your USMLE preparation, knowing the standard error helps calculate what are called ‘confidence intervals’.
A confidence interval for a given score can be calculated as follows (using the Z-statistic):
True score = Measured score ± 1.96 × (standard error of measurement) … for 95% confidence
True score = Measured score ± 2.58 × (standard error of measurement) … for 99% confidence
For many recent tests, the standard error for the 3-digit scale has been 6 [every score card quotes a certain SEM (Standard Error of Measurement) for the 3-digit scale]. This means that given a measured score of 240, we can be 95% certain that the true value of your performance lies between a low of 240 – 1.96 (6) ≈ 228 and a high of 240 + 1.96 (6) ≈ 252. Similarly we can say with 99% confidence that the true score lies between 240 – 2.58 (6) ≈ 225 and 240 + 2.58 (6) ≈ 255. Note that the interval by itself doesn’t tell you which value inside it is the true score; it only says that intervals built this way capture the true score 95% (or 99%) of the time.
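The interval arithmetic is simple enough to script. A minimal sketch, assuming the SEM of 6 quoted above and the usual z-multipliers (the function name is my own):

```python
def confidence_interval(measured: float, sem: float, z: float = 1.96):
    """Return (low, high) for measured ± z × SEM."""
    margin = z * sem
    return (measured - margin, measured + margin)

low95, high95 = confidence_interval(240, 6, z=1.96)
low99, high99 = confidence_interval(240, 6, z=2.58)

print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

Notice that asking for more confidence (99% instead of 95%) buys you a wider, and therefore less precise, interval.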
What this means is that when you compare two individuals and see their scores side by side, you ought to consider what’s going on with their respective confidence intervals. Do they overlap? Even a nanometer of overlap between CIs makes the two, statistically speaking, indistinguishable, even if in reality there is a difference. As far as the test is concerned, when two CIs overlap, the test failed to detect any difference between these two individuals. (Some statisticians disagree. How to interpret statistical significance when two or more CIs overlap is still a matter of debate! I’ve used the view of the authors of the Kaplan lecture notes here.) Capiche?
Beating competitors by intervals rather than pinpoint scores is a good idea to make sure you really did do better than them. The wider the distance separating two CIs, the larger is the difference between them.
There’s a special scenario that we need to think about here. What about the poor fellow who just missed the passing mark? For a passing mark of 180, what of the guy who scored, say 175? Given a standard error of 6, his 95% CI definitely does include 180 and there is no statistically significant (using a 5% margin of doubt) difference between him and another guy who scored just above 180. Yet this guy failed while the other passed! How do we account for this? I’ve been wondering about it and I think that perhaps, the pinpoint cutoffs for passing used by the USMLE exist as a matter of practicality. Using intervals to decide passing/failing results might be tedious, and maybe scientific endeavor ends at this point. Anyhow, I leave this question out in the void with the hope that it sparks discussions and clarifications.
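The overlap rule, and the awkward case of the near-miss candidate, can be put into a short sketch. It assumes an SEM of 6 and 95% intervals, and follows the “any overlap means indistinguishable” reading used in this post, which (as noted) not every statistician accepts. The scores 181 and 270 are made-up examples.

```python
def ci(measured: float, sem: float = 6.0, z: float = 1.96):
    """95% confidence interval (by default) around a measured score."""
    return (measured - z * sem, measured + z * sem)

def overlaps(a, b) -> bool:
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

failed = ci(175)   # just missed a passing mark of 180
passed = ci(181)   # just scraped through

print("175 vs 181 overlap:", overlaps(failed, passed))
print("CI of 175 contains 180:", failed[0] <= 180 <= failed[1])
print("240 vs 270 overlap:", overlaps(ci(240), ci(270)))
```

With an SEM of 6, two scores need to be roughly 24 points apart before their 95% intervals stop touching.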
If you care to give it a thought, the graphical subject-wise profile bands on the score card are actually confidence intervals (whether 95% or 99%, I don’t know). This is why the score card clearly states that if any two subject-wise profile bands overlap, performance in these subjects should be deemed equal.
I hope you’ve found this post interesting if not useful. Please feel free to leave behind your valuable suggestions, corrections, remarks or comments. Anything 🙂 !
Copyright © 2006 – 2008 Firas MR. All rights reserved.
Being face to face with writer’s block, I suppose there isn’t anything particularly exciting I feel like writing about for today. I will therefore talk about a couple of things that I’ve been learning from biostatistics and that I feel many of my fellow medics would benefit from.
We all make comparisons between numbers. If ‘A’ weighs 100 kg and ‘B’ weighs 50 kg, we often say A is twice as heavy as B (wt. of A / wt. of B). We can also say A is 50 kg heavier than B (weight of A – weight of B). Is the same true for temperature in Fahrenheit? Is 100F twice as hot as 50F? Well interestingly, no! A temperature of 100F is 50F hotter than a temperature of 50F but not twice as hot. Therein lies a fundamental difference between two different kinds of ‘Dimensional‘ (otherwise called ‘Continuous‘) data:
- Interval data: a dimensional data set that has values with an equal difference between them. So if temperatures in Fahrenheit are listed as 1, 2, 3, 4, … we clearly know that as we progress from 1 to 2 and then to 3, every subsequent number in that set is separated from its predecessor by an equal interval.
- Ratio data: a dimensional data set that has the properties of an Interval data set and, in addition, has an absolute zero. Kelvin vs. Fahrenheit is a classic example. Kelvin has an absolute zero while Fahrenheit does not. Weight in kg, too, belongs to the class of Ratio data.
The implications of the above dictate how we can manipulate and handle our data. In making comparisons between interval data such as Fahrenheit, we don’t have a universal reference against which to compare two different values – in our example 100F and 50F. The 0F standard is purely arbitrary. If in a fit of mad-hatter rage, we suddenly said that from now on 0F is no longer 0F but 10F, our original values for 100F and 50F now become 110F and 60F. The difference (110-60) remains the same as before (100-50) but the ratio (110/60) changes from the original (100/50). All of this occurs because there isn’t anything stopping you from making a change to your arbitrary 0F standard.
Ratio data sets on the other hand have an absolute standard – the absolute zero. By definition, you can’t change it! This standard is not subject to arbitrary whims and fancies. Taking our Kelvin example, 100K is 50K hotter than a temperature of 50K (100-50). Not only that, it is absolutely fine for you to say 100K is twice as hot as 50K (100/50). Similarly for weight in kilograms, 0kg is absolute. And thus 100kg is 50kg heavier than a weight measurement of 50kg (100-50) and it is also twice as heavy as 50kg (100/50).
The crude analogy is that of a sailor out in the sea. In order to navigate, he could use objects in the ocean such as rocks that could very well change their positions due to climatic conditions (~interval data). Or he could use the Pole Star to help him navigate (~ratio data).
You can compare interval data by calculating their difference. No matter what you set as your arbitrary standard, the difference will not change. For ratio data, in addition to calculating differences you also have the luxury of calculating ratios.
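Here’s a small sketch of the Fahrenheit example in code. It shifts the arbitrary zero by 10 degrees, as in the mad-hatter scenario, and shows that the difference survives while the ratio doesn’t. It then converts to Kelvin (using the standard conversion formula) to get a ratio that actually means something. The variable names are just for illustration.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Standard conversion: K = (F - 32) × 5/9 + 273.15."""
    return (f - 32) * 5 / 9 + 273.15

hot_f, cool_f = 100.0, 50.0
shift = 10.0  # the "mad-hatter" relabelling of the zero point

diff_before = hot_f - cool_f
diff_after = (hot_f + shift) - (cool_f + shift)
ratio_before = hot_f / cool_f
ratio_after = (hot_f + shift) / (cool_f + shift)
ratio_kelvin = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(cool_f)

print("difference before/after shift:", diff_before, diff_after)
print("ratio before/after shift:", ratio_before, round(ratio_after, 3))
print("ratio on the Kelvin scale:", round(ratio_kelvin, 3))
```

On the Kelvin scale, 100F turns out to be only about 10% hotter than 50F, nowhere near “twice as hot”.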
A Comedy of Errors
Most people don’t realize this but the IQ score is an example of interval data. A guy scoring 200 on the test did not do twice as well as another who scored 100. He did 100 points better. Standards for a given IQ testing method are set arbitrarily. Not only that, different testing methods could have different arbitrary standards. The WAIS has a different standard than the Stanford-Binet. Remember that.
[In real life, the IQ score isn’t truly interval in nature. How is one to assume that there’s an equal interval of ‘intelligence’ between subsequent scores of 100, 101, 102, … ? It’s analogous to cancer staging actually. Stage IV disease is no doubt worse than Stage III disease which in turn is worse than Stage II disease, … You don’t necessarily progress by equal intervals of ‘disease-ness’ with each subsequent stage from I to IV. Similar to numbers for cancer staging, numbers for IQ scores are actually ‘Ordinal‘ data in disguise.]
All data can be divided into the following types (from least informative to most informative):
- Categorical – Nominal : Distinct categories of data, that you assign names to and that you can’t rank. Eg. Smoker and Non-smoker; Asian, African, American, Australian, etc.
- Categorical – Ordinal : Distinct categories of data that you can not only assign names to but can also rank. Intervals between ranks aren’t equal. Eg. Gold medal, Silver medal, Bronze medal. Class rank, Cancer Staging, etc. are also examples of ordinal data; the only difference is that they are disguised as numbers.
- Dimensional – Interval : Numerical data with ranks. Ranks have equal intervals between them. There is no absolute zero.
- Dimensional – Ratio : Interval data with an absolute zero.
- Biostatistics: The Bare Essentials (by Geoffrey R. Norman and David L. Streiner)
- Principles of Medical Statistics (by Alvan R. Feinstein)