USMLE Scores – Debunking Common Myths
Lots of people have misguided notions about the true nature of USMLE scores and what exactly they represent. In my opinion, this stems partly from a lack of interest in understanding the logistical workings of the exam. Another contributing factor could be the borderline innumerate scientific culture most exam-goers happen to be cultivated in. Many, if not most, of these candidates, in their naive wisdom, got into Medicine hoping to rid themselves of numerical burdens forever!
The following, I hope, will help debunk some of these common myths.
Percentile? Uh…what percentile?
This myth is, without doubt, the king of them all :-) . It isn’t uncommon to find a candidate basking in the self-righteous glory of having scored a ’99 percent’ or, worse, a ’99 percentile’. The USMLE at one point did provide percentile scores, but that stopped sometime in the mid-to-late ’90s. Why? Well, the USMLE organization believed that scores were being given more weight in medics’ careers than they ought to have. This test is a licensure exam, period – that has always been the motto. So when residency programs started using the exam as a yardstick to differentiate and rank students, the USMLE saw this as contrary to its primary purpose and said enough is enough. To make such rankings difficult, the USMLE no longer provides percentile scores to exam takers.
The USMLE does have an extremely detailed FAQ on what the 2-digit score (which people confuse for a percentage or percentile) and the 3-digit score mean. I strongly urge all test-takers to take a hard look at it and ponder some of what’s said therein.
Simply put, the exam is designed to measure a candidate’s level of knowledge and report a 3-digit score with an important property: it is an unfiltered indication of an individual’s USMLE know-how that, in theory, shouldn’t be influenced by variations in the content of the exam, be it across space (another exam center and/or questions from a different content pool) or time (exam content from the future or past). This means that, provided a person’s knowledge remains constant, he or she should in theory achieve the same 3-digit score regardless of where and when the test was taken. Or supposedly so. The minimum 3-digit score required to ‘pass’ the exam is revised annually to preserve this space-time-independent nature of the score. For the last couple of years, the passing score has hovered around 185. A ‘pass’ score makes you eligible to apply for a license.
What then is the 2-digit score? For god knows what reason, the Federation of State Medical Boards (the body that grants medics in the US their licenses based on their USMLE scores) has a 2-digit format for a ‘pass’ score on the USMLE exam. Unlike the 3-digit passing score, this one is fixed at 75 and isn’t revised every year.
How does one convert a 3-digit score to a 2-digit score? The exact conversion algorithm hasn’t been disclosed (among lots of other things). But for simplicity, I’m going to use a very crude approach to illustrate:
Equate the passing 3-digit score to 75. So if the passing 3-digit score is 180, then 180 = 75. 185 = 80, 190 = 85 … and so on.
I’m sure the relationship isn’t linear as shown above. For one, by very definition, a 2-digit score ends at 99. 100 is a 3-digit number! So let’s see what happens with our example above:
190 = 85, 195 = 90, and by 204 = 99 we’ve hit the 2-digit ceiling. Any score higher than that will also be equated to 99. It doesn’t matter whether you scored a 240 or a 260 on the 3-digit scale – you immediately fall into the 99 bracket along with the lesser folk!
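To make the arithmetic concrete, here is that crude mapping as a few lines of Python. This is purely illustrative – a hypothetical linear rule with a cap at 99 – and bears no relation to the real, undisclosed algorithm:

```python
def two_digit(three_digit, passing=180):
    """Crude illustrative mapping: anchor the passing 3-digit score at 75,
    move point-for-point, and cap at the 2-digit ceiling of 99.
    NOT the real (undisclosed) USMLE conversion."""
    return min(75 + (three_digit - passing), 99)

for s in (180, 185, 190, 204, 240, 260):
    print(s, "->", two_digit(s))
```

Notice how every score from 204 upward collapses into the same 99 – the cap erases all distinctions at the top end.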
These distortions and constraints make the 2-digit score an unjust system for ranking test-takers, and today most residency programs use the 3-digit score to compare people. Because the 3-digit to 2-digit conversion changes every year, it makes sense to stick to the 3-digit scale, which makes comparisons between old-timers and new-timers possible – besides the obvious advantage of allowing comparisons between candidates who deal/dealt with different exam content.
Making Assumptions And Approximate Guesses
The USMLE does provide Means and Standard Deviations on students’ score cards. But these statistics don’t strictly apply to them because they are derived from different test populations. The score card specifically mentions that these statistics are “for recent” instances of the test.
Each instance of an exam is directed at a group of people which form its test population. Each population has its own characteristics such as whether or not it’s governed by Gaussian statistics, whether there is skew or kurtosis in its distribution, etc. The summary statistics such as the mean and standard deviation will also vary between different test populations. So unless you know the exact summary statistics and the nature of the distribution that describes the test population from which a candidate comes, you can’t possibly assign him/her a percentile rank. And because Joe and Jane can be from two entirely different test populations, percentiles in the end don’t carry much meaning. It’s that simple folks.
You could, however, make assumptions and draw rough conclusions about percentile ranks. Say, for argument’s sake, that all populations have a mean of 220, a standard deviation of 20, and conform to Gaussian statistics. Then a 3-digit score of:
220 = 50th percentile
220 + 20 = 84th percentile
220 + 20 + 20 = 97th percentile
[Going back to our '99 percentile' myth and with the specific example we used, don't you see how a score equal to 260 (with its 2-digit 99 equivalent) still doesn't reach the 99 percentile? It's amazing how severely people can delude themselves. A 99 percentile rank is no joke and I find it particularly fascinating to observe how hundreds of thousands of people ludicrously claim to have reached this magic rank with a 2-digit 99 score. I mean, doesn't the sheer commonality hint that something in their thinking is off?]
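Using nothing but Python’s standard library, we can compute percentile ranks under these assumed (and entirely hypothetical) population parameters – mean 220, standard deviation 20, Gaussian:

```python
from math import erf, sqrt

def percentile(score, mean=220.0, sd=20.0):
    """Percentile rank of `score` under an assumed Gaussian N(mean, sd).
    The parameters are hypothetical, purely for illustration."""
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))  # Gaussian CDF via the error function

print(round(percentile(240), 1))  # ~84.1 (one SD above the mean)
print(round(percentile(260), 1))  # ~97.7 -- even a 260 falls short of the 99th percentile
```

Under these assumptions, the 99th percentile only begins at around 266 on the 3-digit scale.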
This calculator makes it easy to calculate a percentile based on known Mean and Standard Deviations for Gaussian distributions. Just enter the values for Mean and Standard Deviation on the left, and in the ‘Probability’ field enter a percentile value in decimal form (97th percentile corresponds to 0.97 and so forth). Hit the ‘Compute x’ button and you will be given the corresponding value of ‘x’.
99th Percentile Ain’t Cake
Another point of note about a Gaussian distribution:
In a Gaussian distribution, the distance from the 0th percentile to the 25th percentile equals the distance between the 75th and the 100th percentile; call this distance x. Likewise, the distance between the 25th and 50th percentiles equals the distance between the 50th and 75th percentiles; call this distance y.
It so happens that x >>> y (strictly speaking, the extreme percentiles of a Gaussian stretch far out into the tails). In a crude sense, this means that it is disproportionately tougher to score extreme values than to stay close to the mean. Starting from a 50th-percentile baseline, reaching the 99th percentile is disproportionately tougher than reaching the 75th. If you aim to score in the 99th percentile, you’re gonna have to seriously sweat it out!
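Sticking with the same hypothetical parameters (mean 220, SD 20), Python’s `statistics.NormalDist` makes this disproportion easy to see:

```python
from statistics import NormalDist

# Hypothetical Gaussian population: mean 220, SD 20 (illustration only).
d = NormalDist(mu=220, sigma=20)

p50 = d.inv_cdf(0.50)  # the mean itself: 220
p75 = d.inv_cdf(0.75)  # ~233.5
p99 = d.inv_cdf(0.99)  # ~266.5

print(f"50th -> 75th percentile: {p75 - p50:.1f} points")  # ~13.5
print(f"75th -> 99th percentile: {p99 - p75:.1f} points")  # ~33.0
```

The second leg, from the 75th to the 99th percentile, costs well over twice as many scale points as the first, even though both legs span roughly the same number of percentile ranks.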
It’s the interval, stupid
Say there are infinite clones of you in this world and you’re all like the Borg – each of you mentally indistinguishable from the others, possessing identical copies of USMLE know-how. Say each of you took the USMLE, and we then plot the frequencies of these scores on a graph. We’ll end up with a Gaussian curve depicting this sample of clones, with its own mean score and standard deviation. This process is called ‘parametric sampling’ and the distribution obtained is called a ‘sampling distribution’.
The idea behind what we just did is to determine the variation that we would expect in scores even if knowhow remained constant – either due to a flaw in the test or by random chance.
The standard deviation of a sampling distribution is also called ‘standard error’. As you’ll probably learn during your USMLE preparation, knowing the standard error helps calculate what are called ‘confidence intervals’.
A confidence interval for a given score can be calculated as follows (using the Z-statistic):
True score = Measured score +/- 1.96 (standard error of measurement) … for 95% confidence
True score = Measured score +/- 2.58 (standard error of measurement) … for 99% confidence
For many recent tests, the standard error for the 3-digit scale has been 6 [every score card quotes a certain SEM (Standard Error of Measurement) for the 3-digit scale]. This means that given a measured score of 240, we can be 95% certain that the true value of your performance lies between a low of 240 – 1.96 (6) and a high of 240 + 1.96 (6). Similarly, we can say with 99% confidence that the true score lies between 240 – 2.58 (6) and 240 + 2.58 (6). Note that the confidence statement applies to the interval as a whole – it doesn’t single out any one value inside the interval as more likely than another to be the true score.
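As a quick sanity check, here is the interval arithmetic for that example – a measured score of 240 with the SEM of 6 quoted above:

```python
SEM = 6       # standard error of measurement, as quoted on the score card
score = 240

# 95% and 99% confidence intervals around the measured score
ci95 = (score - 1.96 * SEM, score + 1.96 * SEM)  # ~(228.2, 251.8)
ci99 = (score - 2.58 * SEM, score + 2.58 * SEM)  # ~(224.5, 255.5)

print(ci95)
print(ci99)
```

The 99% interval is wider than the 95% one, as it must be: the more confident you want to be of trapping the true score, the more ground the interval has to cover.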
What this means is that when you compare two individuals’ scores side by side, you ought to consider what’s going on with their respective confidence intervals. Do they overlap? Even a nanometer of overlap between CIs makes the two, statistically speaking, indistinguishable – even if in reality there is a difference. As far as the test is concerned, when two CIs overlap, it has failed to detect any difference between the two individuals. (Some statisticians disagree – how to interpret statistical significance when two or more CIs overlap is still a matter of debate! I’ve used the view of the authors of the Kaplan lecture notes here.) Capiche?
Beating competitors by whole intervals rather than by pinpoint scores is the way to make sure you really did do better than them. The wider the gap separating two CIs, the larger the difference between them.
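A tiny helper makes the overlap check mechanical. The pairs of scores below are made-up examples; the SEM of 6 is the figure quoted earlier:

```python
def ci95(score, sem=6):
    """95% confidence interval around a measured score."""
    return (score - 1.96 * sem, score + 1.96 * sem)

def overlap(a, b):
    """True if two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# 240 vs 250: the intervals overlap -> statistically indistinguishable
print(overlap(ci95(240), ci95(250)))  # True
# 240 vs 265: the intervals are disjoint -> a detectable difference
print(overlap(ci95(240), ci95(265)))  # False
```

With an SEM of 6, two scores have to sit more than about 2 × 1.96 × 6 ≈ 23.5 points apart before their 95% CIs separate.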
There’s a special scenario we need to think about here. What about the poor fellow who just missed the passing mark? For a passing mark of 180, what of the guy who scored, say, 175? Given a standard error of 6, his 95% CI definitely includes 180, and there is no statistically significant difference (using a 5% margin of doubt) between him and another guy who scored just above 180. Yet this guy failed while the other passed! How do we account for this? I’ve been wondering about it, and I think that perhaps the pinpoint pass/fail cutoffs used by the USMLE exist as a matter of practicality – using intervals to decide passing/failing results might be tedious, and maybe scientific endeavor ends at this point. Anyhow, I leave this question out in the void with the hope that it sparks discussions and clarifications.
If you care to give it a thought, the graphical subject-wise profile bands on the score card are actually confidence intervals (95%, 99% ?? I don’t know). This is why the score card clearly states that if any two subject-wise profile bands overlap, performance in these subjects should be deemed equal.
I hope you’ve found this post interesting if not useful. Please feel free to leave behind your valuable suggestions, corrections, remarks or comments. Anything :-) !
Copyright © 2006 – 2008 Firas MR. All rights reserved.