My Dominant Hemisphere

The Official Weblog of ‘The Basilic Insula’

Archive for the ‘Medical Education’ Category

Decision Tree Questions In Genetics And The USMLE

without comments

Courtesy cayusa@flickr. (creative commons by-nc license)

Just a quick thought. It just occurred to me that some of the USMLE questions involving pedigree analysis in genetics are actually typical decision tree questions. The probability that a certain individual, A, has a given disease (e.g. Huntington’s disease) purely by random chance is simply the disease’s prevalence in the general population. But what if you considered the following questions:

  • How much genetic code do A and B share if they are third cousins?
  • If you suddenly knew that B has Huntington’s disease, what is the new probability for A?
  • What is the disease probability for A’s children, given how much genetic code they share with B?
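
The first question above has a tidy textbook answer. Here is a minimal Python sketch, assuming A and B are full nth cousins in a pedigree with no inbreeding; the function name `cousin_relatedness` is made up for illustration. It returns the expected fraction of genetic code two such cousins share by descent.

```python
from fractions import Fraction

def cousin_relatedness(degree):
    """Expected shared fraction for full nth cousins (assumes no inbreeding).

    Two cousins of degree n are linked through two common ancestors, each
    by a path of 2 * (n + 1) parent-child steps, and every step halves the
    expected sharing: 2 * (1/2) ** (2n + 2) = (1/2) ** (2n + 1).
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return Fraction(1, 2) ** (2 * degree + 1)

for degree in (1, 2, 3):
    print(degree, cousin_relatedness(degree))
```

For third cousins this works out to 1/128, which is why knowing B's status shifts A's probability only slightly in a problem like this.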

When I first wrote about decision trees, it did not occur to me at all how familiar this stuff already was to me!

Apply a little Bayesian strategy to these questions and your mind is suddenly filled with all kinds of probability questions ripe for decision tree analysis:

  • If the genetic test I utilize to detect Huntington’s disease has a false-positive rate x and a false-negative rate y, now what is the probability for A?
  • If the pre-test likelihood is m and the post-test likelihood is n, now what is the probability for A?
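
The first of these Bayesian questions can be sketched in a few lines of Python. This is only an illustration: the function name and the numbers plugged in at the bottom are assumptions, not figures for any real Huntington’s test. It applies Bayes’ rule to a positive test result, with x as the false-positive rate and y as the false-negative rate.

```python
def probability_after_positive_test(prior, false_positive_rate, false_negative_rate):
    """Probability that A has the disease, given a positive test (Bayes' rule)."""
    sensitivity = 1 - false_negative_rate
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: prior of 1%, x = 5%, y = 10%.
posterior = probability_after_positive_test(0.01, 0.05, 0.10)
print(round(posterior, 3))  # roughly 0.15
```

Each branch of the decision tree is one of the products in the function: disease and positive test on one branch, no disease and positive test on the other.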

I find it truly amazing how so many geneticists and genetic counselors accomplish such complex calculations using decision trees without even realizing it! Don’t you :-) ?

Copyright © Firas MR. All rights reserved.

Keywords For Your Surgical Rotation In Med School

without comments

An Ongoing Surgery

Bonjour everyone! Today, I’m going to share with you some high yield keywords that should hopefully help you breeze through your surgical rotations in med school. Call it a checklist if you will. The objective is to facilitate memory recall and help you gear up with areas that you just have to familiarize yourself with, ideally before the start of your rotations. Understand that these are just keywords, with a special emphasis on surgical instruments, and you’ll really need to read some good books to develop your knowledge base. For a rapid-fire review I suggest Surgical Recall. For basic surgical skills, you might like RM Kirk’s Basic Surgical Techniques. It is also a good idea to refer to specific sections (for pictures of incisions, instruments, etc.) of a good reference book on the surgical specialty you’ll be rotating in. Finally, like we all know, surgery is an area that is incredibly skill based and different people have different preferences when carrying out the same thing – be it tying a knot, controlling a bleeder or what have you. You’ll learn to modify the way you do things depending on the specific ways of your surgical team.

I’ve also interspersed keywords specific to two areas that I have an interest in with regards to surgery, or rather surgical oncology to be exact – general thoracic surgery and colorectal surgery.

I shall be updating this list as the need comes. Comments, corrections and feedback are always welcome! Bye for now :-) !

Written by Firas MR

August 21, 2009 at 1:37 am

Why Equivalence Studies Are So Fascinating

without comments

Bronze balance pans and lead weights from the Vapheio tholos tomb, circa 15th century BC. National Museum, Athens. Shot courtesy dandiffendale@Flickr. by-nc-sa license.

Objectives and talking points:

  • To recap basic concepts of hypothesis testing in scientific experiments. Readers should read up on hypothesis testing in reference works.
  • To contrast drug vs. placebo and drug vs. standard drug study designs.
  • To contrast non-equivalence and equivalence studies.
  • To understand implications of these study designs, in terms of interpreting study results.

——————————————————————————————————–

Howdy readers! Today I’m going to share with you some very interesting concepts from a fabulous book that I finished recently – “Designing Clinical Research – An Epidemiologic Approach” by Stephen Hulley et al. Fairly early on, the book discusses what are called “equivalence studies”. Equivalence studies are truly fascinating. Let’s see how.

When a new drug is tested for efficacy, there are multiple ways for us to do so.

A Non-equivalence Study Of Drug vs. Placebo

A drug can be compared to something that doesn’t have any treatment effect whatsoever – a ‘placebo’. Examples of placebos include sugar tablets, distilled water, inert substances, etc. Because pharmaceutical companies try hard to make drugs that have a treatment effect and that are thus different from placebos, the objective of such a comparison is to answer the following question:

Is the new drug any different from the placebo?

Note the emphasis on ‘any different’. As is usually the case, a study of this kind is designed to test for differences between drug and placebo effects in both directions [1]. That is:

Is the new drug better than the placebo?

OR

Is the new drug worse than the placebo?

The boolean operator ‘OR’ is key here.

Since we cannot conduct such an experiment on all people in the target ‘population’ (e.g. all people with diabetes in the whole country), we conduct it on a random and representative ‘sample’ of this population (e.g. randomly selected diabetes patients from the whole country). Because of this, we cannot directly extrapolate our findings to the target population without doing some fancy roundabout thinking and a lot of voodoo first – a.k.a. ‘hypothesis testing’. Hypothesis testing is crucial for taking into account random chance (error) effects that might have crept into the experiment.

In this experiment:

  • The null hypothesis is that the drug and the placebo DO NOT differ in the real world [2].
  • The alternative hypothesis is that the drug and the placebo DO differ in the real world.

So off we go, with our experiment with an understanding that our results might be influenced by random chance (error) effects. Say that, before we start, we take the following error rates to be acceptable:

  1. Even if the null hypothesis is true in the real world, we would find that the drug and the placebo DO NOT differ only 95% of the time, purely by random chance. [Although this rate doesn't have a name, it is equal to (1 - Type 1 error)].
  2. Even if the null hypothesis is true in the real world, we would find that the drug and the placebo DO differ 5% of the time, purely by random chance. [This rate is also called our Type 1 error, or critical level of significance, or critical α level, or critical 'p' value].
  3. Even if the alternative hypothesis is true in the real world, we would find that the drug and the placebo DO differ only 80% of the time, purely by random chance. [This rate is also called the 'Power' of the experiment. It is equal to (1 - Type 2 error)].
  4. Even if the alternative hypothesis is true in the real world, we would find that the drug and the placebo DO NOT differ 20% of the time, purely by random chance. [This rate is also called our Type 2 error].

The strategy of the experiment is this:

If we are able to accept these error rates and show in our experiment that the null hypothesis is false (that is ‘reject‘ it), the only other hypothesis left on the table is the alternative hypothesis. This has then, GOT to be true and we thus ‘accept’ the alternative hypothesis.

Q: With what degree of uncertainty?

A: With the uncertainty that we might arrive at such a conclusion 5% of the time, even if the null hypothesis is true in the real world.

Q: In English please!

A: With the uncertainty that we might arrive at a conclusion that the drug DOES differ from the placebo 5% of the time, even if the drug DOES NOT differ from the placebo in the real world.

Our next question would be:

Q: How do we reject the null hypothesis?

A: We proceed by initially assuming that the null hypothesis is true in the real world (i.e. Drug effect DOES NOT differ from Placebo effect in the real world). We then use a ‘test of statistical significance’ to calculate the probability that, if the null hypothesis really were true, we would observe a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis. We do this with the belief that such a conclusion is within our pre-selected margin of error. Our pre-selected margin of error, as mentioned previously, is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate) [3].

If we fail to show that this calculated probability is <5%, we ‘fail to reject’ the null hypothesis and conclude that a difference in effect has not been proven [4].
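
To make the reject / fail-to-reject rule concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the outcome is a made-up continuous measurement with a standard deviation of 10, there are 50 subjects per arm, and the ‘test of statistical significance’ is a simple large-sample z-test standing in for whatever test a real study would pick. It simulates many experiments and counts how often we reject the null hypothesis.

```python
import random
from statistics import NormalDist, mean, variance

def two_sided_p_value(drug, placebo):
    """Large-sample z-test p-value for a difference in means (both directions)."""
    standard_error = (variance(drug) / len(drug) + variance(placebo) / len(placebo)) ** 0.5
    z = (mean(drug) - mean(placebo)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_difference, n_per_group=50, trials=1000, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        drug = [rng.gauss(true_difference, 10) for _ in range(n_per_group)]
        placebo = [rng.gauss(0, 10) for _ in range(n_per_group)]
        if two_sided_p_value(drug, placebo) < alpha:
            rejections += 1
    return rejections / trials

# Null hypothesis true: rejections here are Type 1 errors (expect about 5%).
false_positive_rate = rejection_rate(true_difference=0, seed=1)
# Alternative hypothesis true: rejections here reflect the study's Power.
observed_power = rejection_rate(true_difference=6, seed=2)
print(false_positive_rate, observed_power)
```

The first number should hover around the 5% Type 1 error rate, and the second is the simulated Power for that particular (hypothetical) true difference.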

A lot of scientific literature out there is riddled with drug vs. placebo studies. This kind of thing is good if we do not already have an effective drug for our needs. Usually though, we already have a standard drug that we know works well. It is of more interest to see how a new drug compares to our standard drug.

A Non-equivalence Study Of Drug vs. Standard Drug

These studies are conceptually the same as drug vs. placebo studies and the same reasoning for inference is applied. These studies ask the following question:

Is the new drug any different from the standard drug?

Note the emphasis on ‘any different’. As is often the case, a study of this kind is designed to test the difference between the two drugs in both directions [1]. That is:

Is the new drug better than the standard drug?

OR

Is the new drug worse than the standard drug?

Again, the boolean operator ‘OR’ is key here.

In this kind of experiment:

  • The null hypothesis is that the new drug and the standard drug DO NOT differ in the real world [2].
  • The alternative hypothesis is that the new drug and the standard drug DO differ in the real world.

Exactly like we discussed before, we initially assume that the null hypothesis is true in the real world (i.e. the new drug’s effect DOES NOT differ from the standard drug’s effect in the real world). We then use a ‘test of statistical significance’ to calculate the probability that, if the null hypothesis really were true, we would observe a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. Just to repeat ourselves here, our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate) [3].

If we fail to show that this calculated probability is <5%, we ‘fail to reject’ the null hypothesis and conclude that a difference in effect has not been proven [4].

An Equivalence Study Of Drug vs. Standard Drug

Sometimes all you want is a drug that is as good as the standard drug. This can be for various reasons – the standard drug is just too expensive, just too difficult to manufacture, just too difficult to administer, … and so on – whereas the new drug might not have these undesirable qualities and yet retain the same treatment effect.

In an equivalence study, the incentive is to prove that the two drugs are the same. Like we did before, let’s explicitly formulate our two hypotheses:

  • The null hypothesis is that the new drug and the standard drug DO NOT differ in the real world [2].
  • The alternative hypothesis is that the new drug and the standard drug DO differ in the real world.

We are mainly interested in proving the null hypothesis. Since this can’t be done [4], we’ll be content with ‘failing to reject’ the null hypothesis. Our strategy is to design a study powerful enough to detect a difference close to 0 and then ‘fail to reject’ the null hypothesis. In doing so, although we can’t ‘prove’ for sure that the null hypothesis is true, we can nevertheless be more comfortable saying that it in fact is true.

In order to detect a difference close to 0, we have to increase the Power of the study from the usual 80% to something like 95% or higher. We want to maximize power to detect the smallest difference possible. Usually though, it’s enough if we are able to detect the largest difference that doesn’t have clinical meaning (e.g. a difference of 4 mmHg on a blood pressure measurement). This way we can compromise a little on Power and choose a less extreme figure, say 88% or something.
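
To get a feel for what that trade-off costs, here is a small Python sketch using the standard normal-approximation formula for the sample size of a two-group comparison of means. The standard deviation of 10 mmHg is purely an assumption for illustration, and so is the function name; the 4 mmHg difference and the three Power figures are the ones mentioned above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(smallest_difference, standard_deviation, power, alpha=0.05):
    """Subjects needed per arm for a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type 1 error
    z_power = NormalDist().inv_cdf(power)          # Power = 1 - Type 2 error
    n = 2 * ((z_alpha + z_power) * standard_deviation / smallest_difference) ** 2
    return ceil(n)

# Assumed: BP readings with a standard deviation of 10 mmHg in each arm.
for power in (0.80, 0.88, 0.95):
    print(power, sample_size_per_group(4, 10, power))
```

Under these assumptions, going from 80% to 95% Power pushes the requirement from roughly a hundred subjects per arm to well over 150, and halving the detectable difference would quadruple it.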

And then just as in our previous examples, we proceed with the assumption that the null hypothesis is true in the real world. We then use a ‘test of statistical significance’ to calculate the probability that, if the null hypothesis really were true, we would observe a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. And to repeat ourselves yet again (boy, do we like doing this :-P ), our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate) [3].

If we fail to show that this calculated probability is <5%, we ‘fail to reject‘ the null hypothesis and conclude that although a difference in effect has not been proven, we can be reasonably comfortable saying that there is in fact no difference in effect.

So Where Are The Gotchas?

If your study isn’t designed or conducted properly (e.g. without enough power, with an inadequate sample size, improper randomization, loss of subjects to follow-up, inaccurate measurements, etc.), you might end up ‘failing to reject’ the null hypothesis, whereas had you taken the necessary precautions, you might have come to the opposite conclusion, purely because of random chance (error) effects. Such improper study designs usually dampen any obvious differences in treatment effect in the experiment.

In a non-equivalence study, researchers, whose incentive it is to reject the null hypothesis, are thus forced to make sure that their designs are rigorous.

In an equivalence study, this isn’t the case. Since researchers are motivated to ‘fail to reject’ the null hypothesis from the get go, it becomes an easy trap to conduct a study with all kinds of design flaws and very conveniently come to the conclusion that one has ‘failed to reject’ the null hypothesis!

Hence, it is extremely important, more so in equivalence studies than in non-equivalence studies, to have a critical and alert mind during all phases of the experiment. Interpreting an equivalence study published in a journal is hard, because one needs to know the very guts of everything the research team did!
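
Here is a rough Python sketch of that trap, under assumed numbers: a real difference of 4 units between the two drugs, a standard deviation of 10, and a two-sided test at the 5% level. The function name is hypothetical and the Power formula is the usual normal approximation. It shows how often a study of a given size would ‘fail to reject’ the null hypothesis even though the drugs really do differ.

```python
from statistics import NormalDist

def power_two_sided(true_difference, standard_deviation, n_per_group, alpha=0.05):
    """Approximate Power of a two-sided, two-group comparison of means."""
    normal = NormalDist()
    standard_error = standard_deviation * (2 / n_per_group) ** 0.5
    z_critical = normal.inv_cdf(1 - alpha / 2)
    shift = true_difference / standard_error
    return normal.cdf(shift - z_critical) + normal.cdf(-shift - z_critical)

# Chance of 'failing to reject' when the drugs truly differ by 4 units.
for n_per_group in (10, 50, 163):
    miss_rate = 1 - power_two_sided(4, 10, n_per_group)
    print(n_per_group, round(miss_rate, 2))
```

With only 10 subjects per arm, the sloppy study ‘fails to reject’ most of the time despite a real difference, which is exactly the convenient conclusion an equivalence study is hoping for.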

Even though we have discussed these concepts with drugs as an example, you could apply the same reasoning to many other forms of treatment interventions.

Hope you’ve found this post interesting :-) . Do send in your suggestions, corrections and comments!

Adios for now!

1. An alternative hypothesis for such a study is called a ‘two-tailed alternative hypothesis‘. A study that tests for differences in only one direction has an alternative hypothesis that is called a ‘one-tailed alternative hypothesis‘.
2. This situation is a good example of a ‘null’ hypothesis also being a ‘nil’ hypothesis. A null hypothesis is usually a nil hypothesis, but it’s important to realize that this isn’t always the case.
4. Note that we never use the term, ‘accept the null hypothesis’.

What You Might Not Know About Scientific Journals

with 11 comments

A reviewer at the National Institutes of Health evaluates a grant proposal. (Wikipedia)

I managed to read quite a number of interesting books in the last couple of months. Among them was Scientific Writing: Easy When You Know How by Jennifer Peat et al. It is a marvelous book and one that I highly recommend. The book has been written mainly for health professionals. It gives you an insider’s view of how the entire peer (expert) review process in scientific publishing works. There are also interesting nuggets on peer review outside of medical journals, such as at conferences, scientific meetings, etc.

The publishing process in a nutshell:

  1. Upon submission to a journal, a paper will first go through preliminary screening by special staff who check for typographical errors. Not scientific merit. Did you stick to the word limit? Are the margins, fonts and spaces in accordance with the journal’s ‘instructions to authors‘ policy? If not, the paper will bounce back like rejected email!
  2. If it does scrape through, it goes to an editorial committee. Editors in turn run an ambiguous check on the paper’s scientific rigor and impact, whether it appeals to their sensibilities and whether it makes business sense to get it out in their journal. It is then forwarded to external reviewers.
  3. Many journals maintain databases of potential external reviewers who are ‘experts’ in their fields, some of whom are on contract for the journal and others who are not. These reviewers have a track record of being active in other journals and meetings. Journals may even rank reviewers based on whether they review papers on time, their general demeanor with authors of papers, etc. Often these chaps are perched in just about every nook and corner of the world. They look at the paper’s strengths and weaknesses in terms of study design, whether the conclusions put forth are in accordance with the reported results, whether the statistics measure up, whether certain areas need clarification, whether some parts should be rephrased or even omitted altogether. Their comments and annotations are then forwarded to the editors and in turn to the authors.
  4. Both editors and reviewers often refer to checklists to standardize this process, even if it be somewhat ambiguous. Because different people have different mental cutoffs for ‘clinical significance’ when it comes to reported results, different people will reach different conclusions even if they look at the same ‘statistically significant’ data. When two reviewers differ in what they think about a paper, editors will often request a third reviewer to look at it.
  5. After a lot of back and forth communication between authors, editors and reviewers the paper is finally published. The editorial committee is the final arbiter that decides whether or not the paper gets published.
  6. This process usually takes months, unless there is a good reason to speed it up.

Here are some interesting facts that you might not know about scientific journals:

  1. Multiple surveys have shown that journals are more likely to publish ‘statistically significant’ findings. This is an important thing to realize. For any scientific study with a Type 1 error rate of 5%, if the null hypothesis were true you would get a statistically significant result 5% of the time, purely as a result of random chance. But it’s the 5% of studies that report such a ‘statistically significant’ result that are more likely to get published than the remaining 95% of studies that don’t.
  2. Most of the scientific literature is biased in favor of content produced in English. Translated works are an extreme minority.
  3. The most popular articles in a journal are reviews, editorials, letters, etc. and not research papers. Consequently, journals contain more narrative reviews than genuine research. It’s what keeps them in business.
  4. Being published is not necessarily something that is a natural consequence of your scientific caliber or contribution to mankind. It is a very political and arbitrary thing. Maybe the editors or reviewers for the journal are biased against your work. Or it could be that the editors do not think publishing your paper will increase their business, for obscure reasons. Maybe your paper is just too specialized and caters to a minority niche of readers. Editors usually want stuff that sells and increases readership (who by the way, more often care about narrative reviews as mentioned previously), impact factors and profits. Quite similar to newspapers actually. Editors may even decide to publish a paper regardless of what the reviewers think, as long as it makes sense to them to do so!
  5. When you submit a paper to a journal for consideration, you immediately transfer whole and sole copyrights to it. You are not permitted to share that paper outside of the research team without prior permission from the editors. Transfer of copyright to journals is pretty common and there are only a minority of fledgling journals out there that give you the luxury of retaining copyrights.
  6. Many journals have pre-publication ‘embargoes’. If you have discussed your paper in a scientific conference, meeting, on a random website, with the press … and so on, different journals will have different policies on whether or not such a paper constitutes ‘duplicate’ material. That depends on how many beans you spilled out during such conferences, talks, … etc. and under what circumstances. Did you discuss just the abstract, some random figures and tables or the whole thing? Did you submit the paper before or after such disclosure? Does it constitute a copyright violation? If it’s considered duplicate, it will not be published unless there is a good reason.
  7. Transfer of copyright also means that you cannot submit your paper elsewhere or hand out copies of it to colleagues in meetings, conferences, etc. You can’t show off the paper on a website either. As long as the paper is under consideration for publication, you need prior permission from the journal. If the paper is rejected or withdrawn from submission, the copyrights are transferred back to the authors.
  8. Different journals will have different time limits on copyright. Some will allow you to maintain a copy on a website or a repository after a number of years have passed. These can rightly be called post-publication ‘embargoes’ [2].
  9. Scientific knowledge is thus ultimately controlled by vested interests, which sits uneasily with a free and open society. This has led to calls for reform in peer-reviewed scientific publishing, including the open-access movement. There are two main models in open-access. Open-access journals make all peer-reviewed content free to the public. Journals from the Public Library Of Science (PLoS) are a good example. Open-access self-archives are the other model. Authors can deposit copies (a.k.a. ‘self-archives’) of pre-prints or post-prints of articles that they have submitted to non-open-access, peer-reviewed journals that agree to such activity. They can then share these self-archives using websites and other tools. Often, however, self-archives are deposited in repositories, which are usually institutional. Such repositories allow free public access not only to peer-reviewed scholarly content, but also to non-peer-reviewed content such as theses and other gray literature. OAIster is a good example of a cross-repository search engine [1].
  10. In certain cases you may want to submit your research for urgent publishing. Different journals will call these kinds of papers by different names – ‘rapid response’, ‘rapid paper‘ …, etc. Often they do not contain too much detail as to study design or statistical rigor. These papers will be submitted by editors to external reviewers on the condition that they be reviewed within a specified time frame. Once such a paper has been accepted and published, you may not be able to submit an addendum or supplement later as it might be considered ‘duplicate’ material!
  11. Following reporting guidelines such as those mentioned at the Equator Network, will improve your chances of being published.
  12. Submitting your paper to a specialty journal increases your chances of success. Most papers fulfill a niche and so do most specialty journals.
  13. The chances of you being struck by lightning are higher than the chances that your paper will be accepted without modification. Nearly always, editors and reviewers will get back asking you to change your paper in some way.
  14. In highly specialized fields, many journals will use the same set of reviewers. If you disagree with a reviewer and choose to withdraw your submission, it will not do you much good to submit to a different journal.
  15. Reviewers are usually free to remain anonymous to authors. And some journals will let authors be anonymous to reviewers in the interest of fairness. However, anonymity does not always happen.
  16. If you are well known in your field, don’t be surprised if you receive an offer to expert-review a paper from a random journal.
  17. Despite how enticing it sounds, reviewers do not make a lot of money from this business!
  18. Different journals select editors using different criteria. At the end of the day, it is the business team of a journal that usually decides. A candidate who can improve a journal’s appeal, impact factor and business profits ultimately wins.

Have anything else to share that’s not on the list? Send me your feedback and I’ll put it up here!

Your feedback counts:

1. Special thanks to Stevan Harnad of Open Access Archivangelism fame for corrections in the comments. Matt Warren writes in to talk about the NIH’s involvement in open-access. Their PubMed Central service is worth checking out.
2. With regards to ‘embargoes’ and copyrights, Christina Pikas writes in to say that most of this stuff is part of the ‘copyright transfer agreement’, which should always be examined carefully. She also says that many institutions can influence how many rights you have and that if your work was done for a corporation, a corporate lawyer will often help you in the process. Just to add a tiny point, the book that I referred to above mentions that many institutions have policies on copyright and intellectual property (IP) for their departments. Some will allow researchers to hold on to IP rights, while others will take over these IP rights from them. It’s always a good idea to check with your institution or department.

A Brief Tour Of The Field Of Bioinformatics

with 5 comments

This is an example of a full genome sequencing machine. It is the ABI PRISM 3100 Genetic Analyzer. Sequencers like it completely automate the process of sequencing the entire genome. Yes, even yours! [Courtesy: Wikipedia]

Some Background Before The Tour

Ahoy readers! I’ve had the opportunity to read a number of books recently. Among them is “Developing Bioinformatics Computer Skills” by Cynthia Gibas and Per Jambeck. I dived into the book straight away, having no basic knowledge at all of what comprises the field of bioinformatics. Actually, it was quite like the first time I started medical college. On our first day, we were handed a tiny handbook on human anatomy, called “Handbook Of General Anatomy” by B D Chaurasia. Until actually opening that book, absolutely no one in the class had any idea of what Medicine truly was. All we had with us were impressions of charismatic white-coats who could, as if by magic, diagnose all kinds of weird things by the mere touch of a hand. Not to mention, legendary tales from the likes of Discovery Channel. Oh yes, our expectations were of epic proportions :-P . As we flipped through the pages of that little book, we were flabbergasted by the sheer volume of information that one had to learn by rote. It soon became clear to us what medicine was all about – Physiology is the study of normal body functions akin to physics, Anatomy is the study of the structural organization of the human body a la geography … – and this set us on the path to learning to endure an avalanche of learn-by-rote information for the rest of our lives.

Bioinformatics is shrouded in mystery for most medics, because so many of these ideas are completely new. The technologies are new. The data available are new. Before the human genome was sequenced, there was virtually no point in using computers to understand genes and alleles. Most of what needed to be sorted out could be done by hand. But now that we have huge volumes of data, and data that are growing at an exponential rate at that, it makes sense to use computers to connect the dots and frame hypotheses. I guess bioinformatics is a conundrum to most other people too – whether you are coming from a math background, a computer science background or a biology background, we all have something missing from our repertoire of knowledge and skills.

What is the rationale behind using computation to understand genes? In days of yore, all we had were a couple of known genes. We had the tools of Mendelian genetics and linkage analysis to solve most of the genetic mysteries. The human genome project changed that. We are suddenly flooded not only with sequences that we don’t know anything about, but also with the gigantic hurdle of finding relationships between them. To give you a sense of the magnitude of numbers we’re talking about here: we could simplify DNA’s 3-D structure and represent the entire genetic code contained in a single polynucleotide strand of the human genome as a string of letters A, C, G or T, each representing a given nucleotide base in a long sequence (like so …..ATCGTTACGTAAAA…..). Since it has been found that this strand is approximately 3 billion bases long, its entire length comes to 3 billion bytes. That’s because each letter A, T, C or G could be thought of as being represented by a single ASCII character, and an ASCII character is conventionally stored in 1 byte of data. Since we are talking about two complementary strands within a molecule of DNA, the amount of information within the genome is 6 billion bytes§. But human cells are diploid! So the amount of DNA information in the nucleus of a single human cell is 12 billion bytes! That’s about 12 gigabytes of data neatly packed into the DNA sequence of every cell – and we haven’t even begun to talk about the 3-D structure of DNA or the sequence and 3-D structure of RNA and proteins yet!

§ Special thanks to Martijn for bringing this up in the comments: If you really think about it for a moment, bioinformaticians don’t need to store the sequences of both the DNA strands of a genome in a computer, because the sequence of one strand can be derived from the other – they are complementary by definition. If you store 3 billion bytes from one strand, you can easily derive the complementary 3 billion bytes of information on the other strand, provided that the two strands are truly complementary and there aren’t any blips of mismatch mutations between them. Using this concept, you can get away with storing 3 billion bytes and not 6 billion bytes to capture the information in the human genome.
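
Martijn’s point is easy to try out. Below is a minimal Python sketch (the function names are made up) that derives the second strand from the first, assuming perfect Watson-Crick pairing with no mismatches. Since the two strands run in opposite directions, the sketch also offers the reversed form, which is how the second strand reads from its own starting end.

```python
# Watson-Crick pairing: A with T, C with G.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base partner of a strand, in the same left-to-right order."""
    return strand.translate(PAIRING)

def reverse_complement(strand):
    """The partner strand read in its own direction (strands are antiparallel)."""
    return complement(strand)[::-1]

stored = "ATCGTTACGTAAAA"
print(complement(stored))          # TAGCAATGCATTTT
print(reverse_complement(stored))  # TTTTACGTAACGAT
```

So only one strand needs to live on disk; the other is a cheap computation away.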

Special thanks also to Dr. Atul Butte ¥ of Stanford University who dropped by to say that a programmer really doesn’t need a full byte to store a nucleotide base. A base can be represented by 2 bits (e.g. 00 for A, 11 for C, 01 for G and 10 for T). Since 1 byte contains 8 bits, a byte can actually hold 4 bases, without compression. So 3 billion bases can be held within 750,000,000 bytes. That’s 715 megabytes (1 megabyte = 1048576 bytes), which can easily fit on to an extended-length CD-ROM (not even a DVD). So the entire genetic code from a single polynucleotide strand of the human genome can easily fit on to a single CD-ROM. Since human cells are diploid, with two CD-ROMs – one CD-ROM for each set of chromosomes – you can capture this information for both sets of chromosomes.
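
Here is a small Python sketch of that 2-bit idea, using the same codes quoted above (00 for A, 11 for C, 01 for G, 10 for T). The function names and the choice to pad the final byte with zero bits are my own assumptions; real sequence formats handle details like unknown bases differently.

```python
CODES = {"A": 0b00, "C": 0b11, "G": 0b01, "T": 0b10}
BASES = {code: base for base, code in CODES.items()}

def pack(sequence):
    """Pack 4 bases into each byte, first base in the highest 2 bits."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        packed.append(byte)
    return bytes(packed)

def unpack(packed, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])

# The arithmetic from the text: 3 billion bases at 4 bases per byte.
genome_bytes = 3_000_000_000 // 4
genome_megabytes = genome_bytes / 1_048_576
print(genome_bytes, int(genome_megabytes))  # 750000000 715
```

The original length has to be stored alongside the bytes, because the zero padding in the last byte would otherwise decode as extra A’s.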

To compound the issue, we don’t have a taxonomy system in place to describe the sequences we have. When Linnaeus invented his taxonomy system for living things, he used basic morphologic criteria to classify organisms. If it walked like a duck and talked like a duck, it was a duck! But how do you apply this reasoning to genes? You might think, why not classify them by organism? But there’s a more subtle issue here too. Some of these genetic sequences can be classified in to various categories – is this gene a promoter, exon, intron or could it be a sequence that plays a role in growth, death, inflammatory response, and so on. Not only that, many sequences could be found in more than one organism. So how do you solve the problem of classification? Man’s answer to this problem is simple – you don’t!

Here’s how we can get away with that. Simply create a relational database using MySQL, PostgreSQL or what have you and create appropriate links between sequence entries, their functions, etc. Run queries to find relationships and voila, there you have it! This was our first step in developing bioinformatics as a field. Building databases. You can do this with a genetic sequence (a string of letters A for ‘adenine’, C for ‘cytosine’, G for ‘guanine’ and T for ‘thymine’ …represented like so ATGGCTCCTATGCGGTTAAAATTT….) or with an RNA sequence (a string of letters A for ‘adenine’, C for ‘cytosine’, G for ‘guanine’ and U for ‘uracil’ like so …AUGGCACCCU…) or even a protein sequence (a string of 20 letters, each letter representing one amino acid). By breaking down and simplifying a 3-D structure this way, you can suddenly enhance data storage, retrieval and more importantly, analysis between:

  1. Two or more sequences of DNA
  2. Two or more sequences of RNA
  3. Two or more sequences of Protein

You can even find relationships between:

  1. A DNA sequence and an RNA sequence
  2. An RNA sequence and a Protein sequence
  3. A DNA sequence and a Protein sequence
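The DNA, RNA and protein relationships listed above are themselves simple string operations once the molecules are written as letters. Here is a small Python sketch, assuming the input is the coding strand read from its first base and using the standard genetic code; the function names `transcribe` and `translate` are mine.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: the same letters with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read an RNA string codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTCCTATGCGGTTAAAATTT"  # the example string used earlier in this post
rna = transcribe(dna)
print(rna)
print(translate(rna))
```

Each amino acid comes out as its one-letter code, which is exactly the 20-letter alphabet mentioned above.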

If you can represent the positions of the atoms within a protein’s 3-D structure as cartesian coordinates (x, y, z), you can not only analyze structure within a given protein, but also try to predict the best possible 3-D structure for a protein that is hypothetically synthesized from a given DNA or RNA sequence. In fact that is the Holy Grail of bioinformatics today. How to predict protein structure from a DNA sequence? And consequently, how to manipulate protein structure to suit your needs.

The Tour Begins

Let’s take a tour of what bioinformatics holds for us.

The Ability To Build Relational Databases

We have already discussed this above.
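For the curious, here is roughly what such a database could look like. This is only a sketch using Python’s built-in `sqlite3` module rather than MySQL or PostgreSQL; the table names, the column names and the short sequence strings are all made up for illustration and are not real database entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE sequences (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        organism TEXT NOT NULL,
        kind     TEXT NOT NULL CHECK (kind IN ('DNA', 'RNA', 'protein')),
        residues TEXT NOT NULL
    );
    CREATE TABLE roles (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    );
    -- link table: a sequence can have several roles and a role many sequences
    CREATE TABLE sequence_roles (
        sequence_id INTEGER NOT NULL REFERENCES sequences(id),
        role_id     INTEGER NOT NULL REFERENCES roles(id),
        PRIMARY KEY (sequence_id, role_id)
    );
""")

# Toy rows: the residues are placeholder strings, not the real gene sequences.
conn.executemany(
    "INSERT INTO sequences (id, name, organism, kind, residues) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "eyeless", "Drosophila melanogaster", "DNA", "ATGGCTCCTATG"),
        (2, "aniridia", "Homo sapiens", "DNA", "ATGGCACCTATG"),
        (3, "toy_promoter", "Homo sapiens", "DNA", "TATAAA"),
    ],
)
conn.executemany(
    "INSERT INTO roles (id, label) VALUES (?, ?)",
    [(1, "eye development"), (2, "promoter")],
)
conn.executemany(
    "INSERT INTO sequence_roles (sequence_id, role_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2)],
)


def sequences_with_role(label):
    """Return (name, organism) for every sequence linked to the given role."""
    rows = conn.execute(
        """
        SELECT s.name, s.organism
        FROM sequences AS s
        JOIN sequence_roles AS sr ON sr.sequence_id = s.id
        JOIN roles AS r ON r.id = sr.role_id
        WHERE r.label = ?
        ORDER BY s.name
        """,
        (label,),
    )
    return rows.fetchall()


for name, organism in sequences_with_role("eye development"):
    print(name, "in", organism)
```

Because roles live in their own table, the same sequence can be filed under an organism and under any number of functions at once, which is how the classification problem gets sidestepped.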

Local Sequence Comparison

An example of sequence alignment. Alignment of 27 avian influenza hemagglutinin protein sequences colored by residue conservation (top) and residue properties (bottom) [Courtesy: Wikipedia]

Before we delve in to the idea of sequence comparisons further, let’s take an example from the bioinformatics book I mentioned to understand how sequence comparisons help in the real world. It speaks of a gene-knockout experiment that targets a specific sequence in the fruit fly’s (Drosophila melanogaster) genome. Knocking this sequence out results in the flies’ progeny being born without eyes. By knocking this gene – called eyeless – out you learn that it somehow plays an important role in eye development in the fruit fly. There’s a similar (but not quite the same) condition in humans called aniridia, in which eyes develop in the usual manner, except for the lack of an iris. Researchers were able to identify the particular gene that causes aniridia and called it aniridia. By inserting the aniridia gene in to an eyeless-knockout Drosophila’s genome, they observed that suddenly its offspring bore eyes! Remarkable isn’t it? Somehow there’s a connection between two genes separated not only by different species, but also by genera and phyla. To discern how each of these genes functions, you proceed by asking whether the two sequences could be the same. How similar might they be exactly? To answer this question you could do an alignment of the two sequences. This is the absolute basic kind of stuff when we do sequence analysis.

Instead of doing it by hand (which could be possible if the sequences being compared were small), you could find the best alignment between these two long sequences using a program such as BLAST. There are a number of ways BLAST can work. Because the two sequences may have only certain regions that fit nicely, with other regions that don’t, you often have to insert blank spaces – called gaps – in to one sequence or the other to make the matching regions line up, and so you can have multiple ways of aligning them side by side. But what you are interested in, is to find the best fit that maximizes how much they overlap with each other (and minimizes gaps). Here’s where computer science comes in to play. In order to maximize overlap, you use the concept of ‘dynamic programming’. It is helpful to understand dynamic programming as an algorithm rather than a program per se (it’s not like you’ll be sitting in front of a computer and programming code if you want to compare eyeless and aniridia; the BLAST program will do the dirty work for you. It uses fast shortcuts to find promising regions and dynamic programming code that’s built in to it to refine them). Amazingly enough, dynamic programming is not something as hi-fi as you might think. It is apparently the same strategy used in many computer spell-checkers! Little did the bioinformaticians who first applied dynamic programming techniques in genetics know that the concept had been discovered long before them. There are apparently many such cases in bioinformatics where scientists keep reinventing the wheel, purely because it is such an interdisciplinary field!

One of the most common dynamic programming algorithms, used for finding the best-matching local regions between two sequences, is called the Smith-Waterman algorithm. Like dynamic programming, another useful approach in bioinformatics is what is called a greedy algorithm. In a greedy algorithm, you are interested in maximizing overlap in each baby-step as you construct the alignment, without consideration to the final overlap. In other words, it doesn’t matter to you how the sequences overlap in the end as long as each step of the way during the alignment process, you maximize overlap. Other concepts in alignment include using a (substitution) matrix of possible scores when two letters – each in a sequence – overlap and trying to maximize scores using dynamic programming. Common matrices for this purpose are BLOSUM-62, BLOSUM-45 and the PAM (Point Accepted Mutation) family.
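To make the dynamic programming idea a little more concrete, here is a bare-bones Python sketch of Smith-Waterman local alignment. Instead of a substitution matrix like BLOSUM-62 it uses a flat score for a match, a mismatch and a gap; those three numbers are arbitrary assumptions of mine, and real tools are far more refined than this.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of any local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(0, diag, up, left)  # 0 lets an alignment start afresh
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = smith_waterman("TTTACGTAAA", "GGACGTCC")
print(best)
print(top)
print(bottom)
```

The table is filled in once, left to right and top to bottom, and every cell reuses the answers of its three neighbours. That reuse of already-solved sub-problems is all that ‘dynamic programming’ means here.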

So now that we know the basic idea behind sequence alignment, here’s what you can actually do in sequence analysis:

  1. Using alignment, find a sequence from a database (eg. GenBank from the NCBI) that maximizes overlap between it and a sequence that isn’t yet in the database. This way, if you discover some new sequence, you can find relationships between it and known sequences. If the sequence in the database is associated with a given protein, you might be able to look for it in your specimen. This is called pairwise alignment.
  2. Just as you can compare two sequences and find out if there is a statistically significant association between them or not, you can also compare multiple sequences at once. This is called multiple sequence alignment.
  3. If certain regions of two sequences are the same, it can be inferred that they are conserved across species or organisms despite environmental stresses and evolution. A sequence encoding development of the eye is very likely to remain unchanged across multiple species for which sight is an essential function to survive. Here comes another interesting concept – phylogenetic relationships between organisms at a genetic level. Using alignment it is possible to develop phylogenetic trees and phylogenetic networks that link two or more gene sequences and as a consequence find related proteins.
  4. Similar to finding evolutionary homology between sequences as above, one could also look for homology between protein structures – motifs – and then conclude that the regions of DNA encoding these proteins have a certain degree of homology.
  5. There are tools in sequence analysis that look at features characteristic of known functioning regions of DNA and see if the same features exist in a random sequence. This process is called gene finding. You’re trying to discover functionality in hitherto unknown sequences of DNA. This is important, as the vast majority of the genome does not code for protein and much of it has, as far as we know, no function and looks like random junk. Could there be some region in this vast ocean of randomness that might, just might have an interesting function? Gene finding uses software that looks for tRNA encoding regions, promoter sites, open reading frames, exon-intron splicing regions, … – in short, the whole gamut of what we know is characteristic of functional code – in random junk. Once a statistically significant result is obtained, you’re ready to test this in a lab!
  6. A special situation in sequence alignment is whole genome alignment, that is, finding the best fit between entire genomes of different organisms! Despite how arduous this sounds, the underlying ideas are pretty similar to local sequence alignment. A related idea is global alignment, in which two sequences are aligned end to end rather than region by region. The classic dynamic programming algorithm for global alignment is the Needleman–Wunsch algorithm.
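Of the signals mentioned under gene finding (item 5), the open reading frame is the easiest to demonstrate. The Python sketch below is a deliberately naive illustration: it only scans the three forward reading frames for a start codon followed by an in-frame stop codon, whereas real gene finders combine many signals statistically. The function name `find_orfs` and the `min_codons` cut-off are my own assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start is inclusive and end is exclusive (the stop codon is included).
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # close it and keep scanning this frame
    return orfs


dna = "CCATGGCTCCTTAAGG"
for start, end, frame in find_orfs(dna):
    print("frame", frame, "ORF:", dna[start:end])
```

A fuller version would also scan the reverse complement strand, since a gene can sit on either strand.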

Many of the things discussed for sequence analysis of DNA, have equal counterparts for RNA and proteins.

Protein Structure Property Analysis

Say that you have an amino acid sequence for a protein. There’s nothing in the databases that has your sequence. In order to build a 3-D model of this protein, you’ll need to predict what could be the best possible shape given the constraints of bond angles, electrostatic forces between constituent atoms, etc. There’s a specific technique that warrants mentioning here – the Ramachandran Plot – that plots the two backbone rotation angles (phi against psi) of each amino acid residue and, using information on steric hindrance, shows which combinations of angles, and hence which local conformations, are physically allowed. With a 3-D model, you could try to predict this protein’s chemical properties (such as pKa, etc.). You could also look for active sites on this protein that are the crucial regions that bind to substrates, based on known structures of active sites from other proteins… and so on.

This figure depicts an unrooted phylogenetic tree for myosin, a superfamily of proteins. [Courtesy: Wikipedia]

Protein Structure Alignment

This is when you try to find the best fit between two protein structures. The idea is very similar to sequence alignment, only this time the algorithms are a bit different. In most cases, the algorithms for this process are computationally intensive and rely on trial and error. You could build phylogenetic trees based on structural evolutionary homology too.

Protein Fingerprint Analysis

This is basically using computational tools to identify relationships between two or more proteins by analyzing their break-down products – their peptide fingerprints. Using protein fragments, it is possible to compare entire cocktails of different proteins. How does the protein mixture from a human retinal cell compare to a protein mixture from the retinal cell of a mouse? This kind of stuff is called Proteomics, because you’re comparing the entire set of proteins (the proteome) from one organism to that of another. You could also analyze protein fragments from different cells within the same organism to see how they might have evolved or developed.

DNA Micro-array Analysis

A DNA microarray is a slide with thousands of tiny dots on it. Each dot holds many copies of a known DNA sequence (a probe). RNA extracted from a population of cells is copied in to DNA, tagged with a fluorescent marker and washed over the slide, where it sticks (hybridizes) only to the dots whose probes are complementary to it. A dot therefore glows under laser (or another form of) light if the cells were expressing the corresponding genetic sequence (that is, transcribing it into RNA, which in turn is usually translated in to protein). By measuring the amount of light coming from these dots, you could develop a gene expression profile for these cells. You could then study the expression profiles of these cells under different environmental conditions to see how they behave and change.

You could also hybridize material from different cell populations, each tagged with a different coloured marker, and study how their expression profiles differ. Example: normal gastric epithelium vs cancerous gastric epithelium.

Of course you could try looking at all these light emitting dots with your eyes and count manually. If you want to take a shot at it, you might even be able to tell the difference between the different levels of brightness between dots! But why not use computers to do the job for you? There are software tools out there that can quantitatively measure these expression profiles for you.
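Once the brightness of each dot has been turned in to a number, comparing two expression profiles is mostly arithmetic. Here is a small Python sketch, assuming you already have background-corrected intensities for the same dots under two conditions. The gene names, the intensities, the pseudocount and the two-fold threshold are all invented for illustration.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Map each gene to log2((treated + p) / (control + p)).

    The pseudocount p keeps very dim dots from producing wild ratios.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


def changed_genes(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change is at least threshold (1.0 = two-fold)."""
    return sorted(gene for gene, fc in fold_changes.items() if abs(fc) >= threshold)


# Hypothetical dot intensities for normal vs cancerous tissue.
normal = {"geneA": 99, "geneB": 199, "geneC": 49}
cancer = {"geneA": 399, "geneB": 199, "geneC": 24}

changes = log2_fold_changes(normal, cancer)
for gene in sorted(changes):
    print(gene, round(changes[gene], 2))
print("changed at least two-fold:", changed_genes(changes))
```

Working on a log2 scale makes a doubling (+1) and a halving (-1) look symmetric, which is why expression changes are usually reported that way.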

Primer Design

There are many experiments and indeed diagnostic tests that use artificially synthesized DNA sequences to serve as anchors that flank a specific region of interest in the DNA of a cell, and amplify this region. By amplify – we mean, make multiple copies. These flanking sequences are also called primers. Applications include, for example, amplifying genetic material from HIV to better detect its presence or absence in the blood of a patient. The specific name for this kind of test or experiment is the polymerase chain reaction (PCR). There are a number of other applications of primers such as gene cloning, genetic hybridization, etc. Primers ought to be constructed in specific ways that prevent them from forming loops or binding to non-specific sites on cell DNA. How do you find the best candidate for a primer? Of course, computation!
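A few of the checks that primer design software runs are simple enough to sketch in Python. The example below assumes a primer written as a plain string of A, C, G and T. The melting temperature uses the rough ‘2 degrees per A/T, 4 degrees per G/C’ rule of thumb for short oligos, and the function names and the minimum stretch of 4 bases are assumptions of mine. Real tools use much better thermodynamic models.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """The strand that would pair with seq, read in its own 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of the bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def rough_melting_temp(seq):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_self_complement(seq, min_len=4):
    """Length of the longest stretch whose reverse complement also occurs in seq.

    A long stretch hints that the primer could fold back on itself (a loop)
    or pair with another copy of itself.
    """
    seq = seq.upper()
    best = 0
    for i in range(len(seq)):
        for j in range(i + min_len, len(seq) + 1):
            if reverse_complement(seq[i:j]) in seq:
                best = max(best, j - i)
    return best


primer = "ATGCATGCATGCATGCATGC"  # made-up 20-mer, not a real primer
print("GC fraction:", gc_fraction(primer))
print("rough Tm:", rough_melting_temp(primer))
print("longest self-complementary stretch:", longest_self_complement(primer))
```

A candidate with a long self-complementary stretch would be thrown out and the search moved a few bases along, which is exactly the kind of tedious trial and error a computer is good at.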

Metabolomics

A fancy word for modeling metabolic pathways and their relationships using computational analyses. How does the glycolytic pathway relate to some random metabolic pathway found in the neurons of the brain? Computational tools help identify potential relationships between all of these different pathways and help you map them. In fact, there are metabolic pathway maps out there on the web that continually get updated to reflect this fascinating area of ongoing research.

I guess that covers a whole lot of what bioinformatics is all about. When it comes to definitions, some people say that bioinformatics is the application part whereas computational biology is the part that mainly deals with the development of algorithms.

Neologisms Galore!

As you can see, some fancy new words have come into existence as a result of all this frenzied activity:

  • Genomics: Strictly speaking, the study of entire genomes of organisms/cells. In bioinformatics, this term is applied to any studies on DNA.
  • Transcriptomics: Strictly speaking, the study of entire transcriptomes (the complete set of RNA transcribed from the DNA) of organisms/cells. In bioinformatics, this term is applied to any studies on RNA.
  • Proteomics: Strictly speaking, the study of the entire set of proteins made by organisms/cells. In bioinformatics, this term is applied to any studies on proteins. Structural biology is a special branch of proteomics that explores the 3-D structure of proteins.
  • Metabolomics: The study of entire metabolic pathways in organisms/cells. In bioinformatics, this term is applied to any studies on metabolic pathways and their inter-relationships.

Real World Impact

So what can all of this theoretical ‘data-dredging’ give us anyway? Short answer – hypotheses. Once you have a theoretical hypothesis for something you can test it in the lab. Without forming intelligent hypotheses, humanity might very well take centuries to experiment with every possible permutation or combination of the data that has been amassed so far and which, mind you, continues to grow as we speak!

Thanks to bioinformatics, we are now discovering genetic relationships between different diseases that were hitherto considered completely unrelated – such as diabetes mellitus and rheumatoid arthritis! Scientists like Dr. Atul Butte and his team are trying to reclassify all known diseases using all of the data that we’ve been able to gather from Genomics. Soon, the days of the traditional International Classification of Diseases (ICD) might be gone. We might some day have a genetic ICD!

Sequencing of individual human genomes (technology for this already exists and many commercial entities out there will happily sequence your genome for a fee) could help in detecting or predicting disease susceptibility.

Proteins could be substituted between organisms (a la pig and human insulin) and better yet, completely manipulated to suit an objective – such as drug delivery or effectiveness. Knowing a DNA sequence could, in principle, give you enough information to predict protein structure and function, giving you yet another tool in diagnosis.

And the list of possibilities is endless!

Bioinformatics is thus man’s attempt to make biology and medicine a predictive science :-) .

Further Reading

I haven’t had the chance to read any other books on bioinformatics, what with exams just a couple of months away. Having read, “Developing Bioinformatics Computer Skills“, and found it a little too dense especially in the last couple of chapters, I would only recommend it as an introductory text to someone who already has some knowledge of computer algorithms. Because different algorithms have different caveats and statistical gotchas, it makes sense to have a sound understanding of what each of these algorithms do. Although the authors have done a pretty decent job in describing the essentials, the explanations of the algorithms and how they really function are a bit complicated for the average biologist. It’s difficult for me to recommend a book that I might not have read, but here are two I’m considering worth exploring in the future:

Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum

Introduction to Bioinformatics: A Theoretical And Practical Approach by Stephen Krawetz and David Womble

As books to refresh my knowledge of molecular biology and genetics I’m considering the following:

Molecular Biology Of The Cell by Bruce Alberts et al

Molecular Biology Of The Gene by none other than James D Watson himself et al (Of ‘Watson & Crick‘ model of DNA fame)

Let me know if you have any other suggested readings in the comments [1].

There are also a number of excellent Opencourseware lectures on bioinformatics out on the web (example: at AcademicEarth.org). For beginners though, I suggest Dr. Daniel Lopresti’s (Lehigh University) fantastic high level introduction to the field here. Also don’t forget to check out “A Short Course On Synthetic Genomics” by George Church and Craig Venter on Edge.org for a fascinating overview of what might lie ahead in the future! In the race to sequence the human genome, Craig Venter headed the main private company that posed competition to the NIH’s project. His group of researchers ultimately sequenced the genome much faster than had previously been imagined, using the whole-genome shotgun sequencing method.

Hope you’ve enjoyed this high level tour. Do send in your thoughts, suggestions and corrections!

Copyright © Firas MR. All rights reserved.

Your feedback counts:

1. Dr. Atul Butte ¥ suggests checking out some of the excellent material at NCBI’s Bookshelf.

Readability grades for this post:

Flesch reading ease score: 57.4
Automated readability index: 10.8
Flesch-Kincaid grade level: 9.7
Coleman-Liau index: 11.5
Gunning fog index: 13.4
SMOG index: 12.2


Does Changing Your Answer In The Exam Help?

with 6 comments


The Monty Hall Paradox

One of the 3 doors hides a car. The other two hide a goat each. In search of a new car, the player picks a door, say 1. The game host then opens one of the other doors, say 3, to reveal a goat and offers to let the player pick door 2 instead of door 1. Is there an advantage if the player decides to switch? (Courtesy: Wikipedia)

Hola amigos! Yes, I’m back! It’s been eons and I’m sure many of you may have been wondering why I was MIA. Let’s just say it was academia as usual.

This post is unique as it’s probably the first where I’ve actually learned something from contributors and feedback. A very critical audience and pure awesome discussion. The main thrust was going to be an analysis of the question, “If you had to pick an answer in an MCQ randomly, does changing your answer alter the probability of success?” and it was my hope to use decision trees to attack the question. I first learned about decision trees and decision analysis in Dr. Harvey Motulsky’s great book, “Intuitive Biostatistics”. I do highly recommend his book. As I pondered over the question, I drew a decision tree that I extrapolated from his book. Thanks to initial feedback from BrownSandokan (my venerable computer scientist friend from yore :P ) and Dr. Motulsky himself, who was so kind as to write back to just a random reader, it turned out that my diagram was wrong and so was the original analysis. The problem with the original tree (that I’m going to maintain for other readers to see and reflect on here) was that the tree in the book is specifically for a math (or rather logic) problem called the Monty Hall Paradox. You can read more about it here. As you can see, the Monty Hall Paradox is a special kind of conditional probability problem, in which knowing something for sure influences the probabilities of your guesstimates. It’s a very interesting problem, and has bewildered thousands of people, me included. When it was originally circulated in a popular magazine, “nearly 1000 PhDs” (cf. Wikipedia) wrote back to say that the solution put forth was wrong, prompting numerous psychological studies to understand human behavior. A decision tree for such a problem is conceptually different from a decision tree for our question and so my original analysis was incorrect.

So what the heck are decision trees anyway? They are basically conceptual tools that help you make the right decisions given a couple of known probabilities. You draw a line to represent a decision, and explicitly label it with a corresponding probability. To find the final probability for a number of decisions (or lines) in sequence, you multiply the probabilities along that path, and you add the probabilities of alternative paths that lead to the same outcome. It takes skill and a critical mind to build a correct tree, as I learned. But once you have a tree in front of you, it’s easier to see the whole picture.

Let’s just ignore decision trees completely for the moment and think in the usual sense. How good an idea is it to change an answer on an MCQ exam such as the USMLE? The Kaplan lecture notes will tell you that your chances of being correct are better off if you don’t. Let’s analyze this. If every question has 1 correct option and 4 incorrect options (the total number of options being 5), then any single try on a random choice gives you a probability of 20% for the correct choice and 80% for the incorrect choice. The odds are higher that on any given attempt, you’ll get the answer wrong. If your choice was correct the first time, it still doesn’t change these basic odds. You are still likely to pick the incorrect choice 80% of the time. Borrowing loosely from the concept of “regression towards the mean” (repeated measurements of something yield values closer to said thing’s mean), we can apply the same reasoning to this problem. Since the outcomes in question are categorical (binomial to be exact), the measure of central tendency used is the Mode (defined as the most commonly or frequently occurring thing in a series). In a categorical series – cat, dog, dog, dog, cat – the mode is ‘dog’. Since the Mode in this case happens to be the category “incorrect”, if you pick a random answer and repeat this multiple times, you are more likely to pick an incorrect answer! See, it all makes sense :) ! It’s not voodoo after all :D !

Coming back to decision analysis, just as there’s a way to prove the solution to the Monty Hall Paradox using decision trees, there’s also a way to prove our point on the MCQ problem using decision trees. While I study to polish my understanding of decision trees, building them for either of these problems will be a work in progress. And when I’ve figured it all out, I’ll put them up here. A decision tree for the Monty Hall Paradox can be accessed here.
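While the proper diagrams are pending, the Monty Hall tree can at least be walked by a computer. The Python sketch below visits every branch, multiplies the probabilities along each path and adds up the paths that end in a win. It assumes the standard rules (the car is equally likely to be behind any door, and the host always opens exactly one goat door that isn’t the player’s, choosing at random when he has a choice). The `n_doors` parameter is my own generalisation and is not part of the original puzzle.

```python
from fractions import Fraction


def monty_hall(n_doors=3, first_pick=0):
    """Walk every branch of the tree; return (P(win if stay), P(win if switch))."""
    doors = range(n_doors)
    stay = Fraction(0)
    switch = Fraction(0)
    for car in doors:
        p_car = Fraction(1, n_doors)  # first branch: where the car is
        # second branch: the host opens a goat door that is not the player's pick
        host_options = [d for d in doors if d != first_pick and d != car]
        for opened in host_options:
            p_branch = p_car * Fraction(1, len(host_options))
            if car == first_pick:
                stay += p_branch
            # third branch: a switching player picks among the other closed doors
            remaining = [d for d in doors if d != first_pick and d != opened]
            for new_pick in remaining:
                if new_pick == car:
                    switch += p_branch * Fraction(1, len(remaining))
    return stay, switch


stay, switch = monty_hall()
print("stay:", stay, "switch:", switch)
```

Using exact fractions instead of a random simulation means the answer comes out as the same 1/3 and 2/3 you would get by multiplying and adding along the branches by hand.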

To end this post, I’m going to complicate our main question a little bit and leave it out in the void. What if on your initial attempt you have no idea which of the answers is correct or incorrect but on your second attempt, your mind suddenly focuses on a structure flaw in one or more of the options? Assuming that an option with a structure flaw can’t be correct, wouldn’t this be akin to Monty showing the goat? One possible structure flaw could be an option that doesn’t make grammatical sense when combined with the stem of the question. Does that mean you should switch? Leave your comments below!

Hope you’ve found this post interesting. Adios for now!

Copyright © Firas MR. All rights reserved.

Readability grades for this post:

Flesch reading ease score:  72.4
Automated readability index: 7.8
Flesch-Kincaid grade level: 7.3
Coleman-Liau index: 8.5
Gunning fog index: 11.4
SMOG index: 10.7

Readings:

Intuitive Biostatistics, by Harvey Motulsky

The Monty Hall Problem: The Remarkable Story Of Math’s Most Contentious Brain Teaser, by Jason Rosenhouse



Infusions Redux, DNS And Cerebral Edema

without comments

Source, Author and License

There’s a book on fluid and electrolyte management that I’ve been reading recently. Called, “Practical Guideline on Fluid Therapy” and authored, as probably evident by the English used in the title, by a very Indian Sanjay Pandya, the book contains many interesting nuggets for day to day practice. Although like most Indian books there is a distinct absence of the emphasis on applying one’s brain, it is nevertheless worth the time to peruse. Today I will be discussing two equations from the book and a question that came up in my mind about the usage of a specific fluid.

Calculating ECF volume deficit (in dehydration, etc.)

  1. If the patient’s previous body weight is known, all you gotta do to obtain ECF deficit is find out the difference between his present and past weight.
  2. Another technique uses changes in the Hematocrit to discern ECF volume deficit. This method is applicable only if there is no hemorrhage, hemolysis or other situations involving loss of blood cells, the idea being that any change in blood volume is caused by plasma volume change. So if there’s dehydration and loss of ECF volume, plasma volume shrinks and causes the hematocrit to rise.

ECF Volume Deficit in liters = 0.2 * lean body weight * [(Current hematocrit/Desired hematocrit) - 1]

Can someone figure out the proof for the above equation and post it here? Like most other stuff, I absolutely hate rote-learning math formulas and prefer remembering their derivations. This equation is taking me some time to prove.

To help get started, here are a couple of possible pointers I’m currently exploring:

Total body water (TBW) when expressed as a percentage of Total body weight (TBwt), varies by gender and age. In young adult men for example

TBW = 60% TBwt

TBW in liters

TBwt in kg

Interestingly enough, TBW when expressed as a percentage of lean body weight (LBwt) is a constant and isn’t conditioned upon gender or age.

TBW = 70% LBwt

LBwt = (100/70) * TBW

= (100/70) * [(x/100) * TBwt]

= (x/70) * TBwt

x is the percentage of TBwt that is TBW

Plasma volume is related to blood volume as follows

Plasma volume = Blood volume * [(100 - Hematocrit)/100], with the hematocrit expressed as a percentage

Plasma volume is also 1/4 of ECF volume. ECF is 1/3 of TBW. So plasma volume is 1/12 of TBW.
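Until the proof turns up, the hematocrit formula for ECF volume deficit given earlier in this section can at least be put to work. Here is a small Python sketch that simply evaluates it, along with the compartment fractions listed in the pointers above. The function names are mine and the patient numbers are made up purely to show the arithmetic.

```python
def ecf_volume_deficit_liters(lean_body_weight_kg, current_hct, desired_hct):
    """ECF deficit = 0.2 * lean body weight * [(current Hct / desired Hct) - 1]."""
    if desired_hct <= 0:
        raise ValueError("desired hematocrit must be positive")
    return 0.2 * lean_body_weight_kg * (current_hct / desired_hct - 1)


def fluid_compartments_liters(lean_body_weight_kg):
    """TBW = 70% of lean body weight, ECF = 1/3 of TBW, plasma = 1/4 of ECF."""
    tbw = 0.70 * lean_body_weight_kg
    ecf = tbw / 3
    plasma = ecf / 4
    return {"TBW": tbw, "ECF": ecf, "plasma": plasma}


# Hypothetical patient: lean body weight 70 kg, hematocrit 50%, target 40%.
deficit = ecf_volume_deficit_liters(70, 50, 40)
print("ECF volume deficit:", round(deficit, 2), "liters")
print(fluid_compartments_liters(60))
```

The hematocrit can be entered either as a percentage or as a fraction, as long as both values use the same convention, because only their ratio enters the formula.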

Calculating Electrolyte Infusion Rates

Change in plasma electrolyte concentration in mEq/L when 1 liter of  infusate is given

= [Infusate electrolyte concentration in mEq/L - Actual electrolyte concentration in mEq/L] / (TBW + 1)

This one’s easy to derive. Taking Na+ as our electrolyte example,

Initial Na+ content = x * TBW

Initial Na+ concentration = (x * TBW)/TBW

Final Na+ content after infusing 1L infusate = (x * TBW) + {y * 1}

Final Na+ concentration = [(x * TBW) + {y}]/(TBW + 1)

Change in Na+ concentration due to infusion = {[(x * TBW) + y]/(TBW + 1)} – [(x * TBW)/TBW]

= [(x * TBW) + y – x * (TBW + 1)]/(TBW + 1)

= (y – x)/(TBW + 1)

x = mEq/L of Na+ initially in the body

y = mEq/L of Na+ in the infusate

And voila! There you have it!
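If you’d like to convince yourself numerically, here is a short Python sketch that computes the change both ways, once with the short formula and once by following the content and concentration steps of the derivation. The function names and the example values (an infusate at 154 mEq/L, a plasma Na+ of 117 mEq/L and a TBW of 36 liters) are assumptions chosen only to keep the arithmetic tidy.

```python
def change_per_liter(infusate_meq_per_l, plasma_meq_per_l, tbw_liters):
    """Short formula: (infusate - plasma) / (TBW + 1) for 1 liter of infusate."""
    return (infusate_meq_per_l - plasma_meq_per_l) / (tbw_liters + 1)


def change_by_mass_balance(infusate_meq_per_l, plasma_meq_per_l, tbw_liters):
    """Same quantity, following the content and concentration steps one by one."""
    initial_content = plasma_meq_per_l * tbw_liters            # mEq in the body
    final_content = initial_content + infusate_meq_per_l * 1   # add 1 liter of infusate
    final_concentration = final_content / (tbw_liters + 1)     # volume grew by 1 liter
    return final_concentration - plasma_meq_per_l


# Made-up values chosen only for easy arithmetic.
print(change_per_liter(154, 117, 36))
print(change_by_mass_balance(154, 117, 36))
```

Both functions should agree for any inputs, which is a quick sanity check that the algebra in the derivation holds.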

And now for that promised question:

Given the fact that DNS (Dextrose Normal Saline) only stays in the ECF, would it be right to assume that it’s contraindicated in cerebral edema?

The interesting thing is that on exploring the scientific literature, I found that recent research shows that it isn’t just the shifting of fluid into the brain parenchyma that should be avoided when infusing fluid; hyperglycemia is a real danger as well. How hyperglycemia contributes to cerebral edema and especially in situations of cerebral ischemia is a topic of ongoing research and multiple plausible hypotheses are being investigated.

As per Pandya’s book, by the way, it is best to restrict glucose infusion to ≤ 0.5 grams/kg/hour when infusing any glucose containing fluid to avoid complications of hyperglycemia.

Readability grades for this post:

Kincaid: 11.4
ARI: 12.4
Coleman-Liau: 11.2
Flesch Index: 57.0/100
Fog Index: 14.6
Lix: 46.9 = school year 8
SMOG-Grading: 12.4

Copyright © Firas MR. All rights reserved.

Written by Firas MR

June 28, 2008 at 1:54 pm

Infusion Confusion – How To Calculate Drug Infusion Rates

with 4 comments

Source, author and license

The erosion of math and analytical skills that occurs with medics is truly astounding. Not surprising some might argue, what with it being such a memory oriented field. One area that many medics struggle with is drug dosage calculations. In the ER, one often doesn’t have the luxury of time and instant thinking is absolutely critical. Numbers need to be played out in seconds and optimal drug regimens have to be formulated. I was helping a colleague understand calculations for dopamine infusion the other day and felt like sharing with you folks some of the things we talked about.

Dopamine is used especially in ER settings to increase perfusion/blood pressure by means of its vasopressor, inotropic and chronotropic effects. When re-establishing blood pressure in a patient, attention needs to be paid not only to the drugs that might be used but also to fluid replacement for any amount of fluid loss from the body. Two questions need to be asked before starting a dopamine infusion:

  1. How much dopamine?
  2. How much fluid and how fast?

The usual dosage of dopamine is somewhere between 5-10 μg/kg/min. For the following example I’ll use 10 μg/kg/min.

1μg = 0.001mg.

For a patient weighing x kg, the dosage is therefore 0.01x mg/min. Now that you’ve established how much dopamine you need to infuse per minute, here comes the second part.

Suppose you intend to infuse y ml of fluid (as part of the dopamine infusion, i.e. aside from any other fluid infusions already in place). Say also that you’ve added z mg of dopamine to form the infusate. Dopamine is supplied in liquid form, so any amount of dopamine occupies a certain volume in ml, which in most situations is negligible.

y ml of infusate = volume of Normal Saline, etc. + volume of dopamine

If z mg of dopamine is contained in y ml of infusate,

0.01x mg dopamine is contained in [0.01x/z] * y ml of infusate.

Thus you’re interested in giving [0.01x/z] * y ml of infusate every minute and a simple formula is derived where:

rate of dopamine infusion in ml/min = [0.01x/z] * y

and therefore, z = [0.01x/(rate of infusion in ml/min)] * y

x = body weight in kg

z = amount of dopamine added in mg

y = total volume of infusate in ml

For any drug infusion:

rate of infusion in ml/min = [(total drug dose in mg/min)/(amount of drug added in infusate in mg)] * volume of infusate in ml

This infusate is typically given via an infusion set that specifies a fixed drops-per-ml ratio. At our pediatrics ER for example, infusion sets come in two forms – microdrip infusion sets (1 ml = 60 drops) and macrodrip infusion sets (1 ml = 20 drops). Simply multiply the rate of infusion in ml/min by 60 or 20 to get the infusion rate in drops/min for micro and macro IV sets respectively.
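
The drops/min conversion is a single multiplication, sketched below in Python. The dictionary and function names are my own, and the 0.5 ml/min rate is just an illustrative number; the drop factors are the two quoted above.

```python
# Drop factors for the two infusion sets mentioned above (drops per ml)
DROPS_PER_ML = {"micro": 60, "macro": 20}


def drops_per_min(rate_ml_per_min, set_type):
    """Convert an infusion rate in ml/min to drops/min for a given IV set."""
    return rate_ml_per_min * DROPS_PER_ML[set_type]


print(drops_per_min(0.5, "micro"))  # 30.0
print(drops_per_min(0.5, "macro"))  # 10.0
```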

As seen from the formula above, when deciding to add a given amount of drug to form the infusate, three things need to be fixed first:-

  1. Dose of drug in the mg/min format (should be appropriate to the clinical condition of the patient).
  2. Total volume of infusate in ml (again, this depends on the clinical condition and hemodynamic stability of the patient).
  3. Speed or rate of fluid replacement in ml/min (this is important as sudden fluid-volume changes in the body can be problematic in certain cases and you want to go for a rate that is optimal, neither too slow nor too fast.)

And with that I end this post. Hope readers find this useful. Comments and corrections are welcome!

Readability grades for this post:

Kincaid: 8.4
ARI: 7.9
Coleman-Liau: 10.2
Flesch Index: 65.7/100 (plain English)
Fog Index: 12.7
Lix: 39.4 = school year 6
SMOG-Grading: 11.6

Written by Firas MR

June 13, 2008 at 1:41 pm

USMLE Scores – Debunking Common Myths

with 18 comments

Lots of people, particularly from the wider South Asian region, have misguided notions as to the true nature of USMLE scores and what exactly they represent. In my opinion, this occurs in part due to a lack of interest in understanding the logistic considerations of the exam. Another contributing factor could be the numerically indifferent, bordering-on-brainless scientific culture most exam-goers happen to be cultivated in. Many if not most of these candidates, in their naive wisdom, got into Medicine hoping to rid themselves of numerical burdens forever!

The following, I hope, will help debunk some of these common myths.

Percentile? Uh…what percentile?

This myth is, without doubt, the king of them all :-) . It isn’t uncommon to find a candidate basking in the self-righteous glory of having scored a ‘99 percent’ or worse, a ‘99 percentile’. The USMLE at one point used to provide percentile scores. That stopped sometime in the mid to late ’90s. Why? Well, the USMLE organization believed that scores were being given more weight than they ought to have in medics’ careers. This test is a licensure exam, period. That has always been the motto. Among other things, when residency programs started using the exam as a yardstick to differentiate and rank students, the USMLE saw this as contrary to its primary purpose and said enough is enough. To make such rankings difficult, the USMLE no longer provides percentile scores to exam takers.

The USMLE does have an extremely detailed FAQ on what the 2-digit (which people confuse with a percentage or percentile) and 3-digit scores mean. I strongly urge all test-takers to take a hard look at it and ponder some of the stuff said therein.

Simply put, the way the exam is designed, it measures a candidate’s level of knowledge and reports it as a 3-digit score with an important property. This 3-digit score is an unfiltered indication of an individual’s USMLE know-how, that in theory shouldn’t be influenced by variations in the content of the exam, be it across space (another exam center and/or questions from a different content pool) or time (exam content from the future or past). This means that provided a person’s knowledge remains constant, he or she should in theory, achieve the same 3-digit score regardless of where and when he or she took the test. Or, supposedly so. The minimum 3-digit score that is required to ‘pass’ the exam is revised on an annual basis to preserve this space-time independent nature of the score. For the last couple of years, the passing score has hovered around 185. A ‘pass’ score makes you eligible to apply for a license.

What then is the 2-digit score? For god knows what reason, the Federation of State Medical Boards (these people grant medics in the US their licenses based on their USMLE scores) has a 2-digit format for a ‘pass’ score on the USMLE exam. Unlike the 3-digit score, this passing score is fixed at 75 and isn’t revised every year.

How does one convert a 3-digit score to a 2-digit score? The exact conversion algorithm hasn’t been disclosed (among lots of other things). But for matters of simplicity, I’m going to use a very crude approach to illustrate:

Equate the passing 3-digit score to 75. So if the passing 3-digit score is 180, then 180 = 75. 185 = 80, 190 = 85 … and so on.

I’m sure the real relationship isn’t linear as shown above. For one, by its very definition, a 2-digit score ends at 99. 100 is a 3-digit number! So let’s see what happens with our example above:

190 = 85, 195 = 90, 204 = 99. We’ve reached the 2-digit limit at this point. Any score higher than 204 will also be equated to 99. It doesn’t matter if you scored a 240 or 260 on the 3-digit scale. You immediately fall under the 99 bracket along with the lesser folk!
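
The crude mapping described above can be sketched in a few lines of Python. To be clear, this is only my illustrative approximation (shift the passing score to 75, then cap at 99); the real conversion algorithm is undisclosed, and the function name and default passing score of 180 are assumptions taken from the example.

```python
def crude_two_digit(three_digit, passing_three_digit=180):
    """Illustrative only: shift so the pass mark maps to 75, then cap at 99."""
    shifted = three_digit - passing_three_digit + 75
    return min(shifted, 99)  # a 2-digit score cannot exceed 99


for score in (180, 190, 240, 260):
    print(score, "->", crude_two_digit(score))
# 180 -> 75
# 190 -> 85
# 240 -> 99
# 260 -> 99
```

Notice how 240 and 260 collapse onto the same 2-digit value, which is exactly the distortion being complained about.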

These distortions and constraints make the 2-digit score an unjust system to rank test-takers and today, most residency programs use the 3-digit score to compare people. Because the 3-digit to 2-digit scale conversion changes every year, it makes sense to stick to the 3-digit scale which makes comparisons between old-timers and new-timers possible, besides the obvious advantage in helping comparisons between candidates who deal/dealt with different exam content.

Making Assumptions And Approximate Guesses

The USMLE does provide Means and Standard Deviations on students’ score cards. But these statistics don’t strictly apply to them because they are derived from different test populations. The score card specifically mentions that these statistics are “for recent” instances of the test.

Each instance of an exam is directed at a group of people which form its test population. Each population has its own characteristics such as whether or not it’s governed by Gaussian statistics, whether there is skew or kurtosis in its distribution, etc. The summary statistics such as the mean and standard deviation will also vary between different test populations. So unless you know the exact summary statistics and the nature of the distribution that describes the test population from which a candidate comes, you can’t possibly assign him/her a percentile rank. And because Joe and Jane can be from two entirely different test populations, percentiles in the end don’t carry much meaning. It’s that simple folks.

You could, however, make assumptions and draw rough conclusions about percentile ranks. Say, for argument’s sake, all populations have a mean equal to 220 and a standard deviation equal to 20 and conform to Gaussian statistics. Then a 3-digit score of:

220 = 50th percentile

220 + 20 = 84th percentile

220 + 20 + 20 = 97th percentile
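
These percentiles can be checked with Python’s standard library. This is a sketch under the same assumption made above (a Gaussian population with mean 220 and standard deviation 20), which is hypothetical and not a published USMLE statistic.

```python
from statistics import NormalDist

# Assumed population: mean 220, SD 20, Gaussian (hypothetical values)
scores = NormalDist(mu=220, sigma=20)

for s in (220, 240, 260):
    percentile = scores.cdf(s) * 100  # share of the population scoring below s
    print(s, round(percentile, 1))
# 220 50.0
# 240 84.1
# 260 97.7
```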

[Going back to our '99 percentile' myth and with the specific example we used, don't you see how a score equal to 260 (with its 2-digit 99 equivalent) still doesn't reach the 99 percentile? It's amazing how severely people can delude themselves. A 99 percentile rank is no joke and I find it particularly fascinating to observe how hundreds of thousands of people ludicrously claim to have reached this magic rank with a 2-digit 99 score. I mean, doesn't the sheer commonality hint that something in their thinking is off?]

This calculator makes it easy to calculate a percentile based on known Mean and Standard Deviations for Gaussian distributions. Just enter the values for Mean and Standard Deviation on the left, and in the ‘Probability’ field enter a percentile value in decimal form (97th percentile corresponds to 0.97 and so forth). Hit the ‘Compute x’ button and you will be given the corresponding value of ‘x’.
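
If you’d rather not rely on an online calculator, the same ‘Compute x’ step can be sketched with Python’s `statistics.NormalDist`. The mean of 220 and standard deviation of 20 are again just the assumed values from the example above.

```python
from statistics import NormalDist

# Assumed population: mean 220, SD 20, Gaussian (hypothetical values)
scores = NormalDist(mu=220, sigma=20)

# inv_cdf takes the percentile in decimal form and returns the score x
print(round(scores.inv_cdf(0.97), 1))  # 257.6
print(round(scores.inv_cdf(0.99), 1))  # 266.5
```

Under these assumptions, a true 99th percentile would need a score of roughly 266 or more.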

99th Percentile Ain’t Cake

Another point of note about a Gaussian distribution:

The distance from the 1st percentile to the 25th percentile is equal to the distance between the 75th and the 99th percentile. Let’s say this distance is x. (Strictly speaking, the 0th and 100th percentiles of a Gaussian lie infinitely far from the mean, so the 1st and 99th stand in for the extremes here.) The distance between the 25th percentile and the 50th percentile is also equal to the distance between the 50th percentile and the 75th percentile. Let’s say this distance is y.

It so happens that x is much larger than y: about 1.65 standard deviations against about 0.67, and the gap keeps growing the further out into the tail you go. In a crude sense, this means that it is disproportionately tougher for you to score extreme values than to stay closer to the mean. Going from a 50th percentile baseline, scoring a 99th percentile is disproportionately tougher than scoring a 75th percentile. If you aim to score a 99 percentile, you’re gonna have to seriously sweat it out!
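
A quick numerical check of the two distances, in units of standard deviations. Because a Gaussian’s tails never actually end, this sketch uses the 99th percentile as a stand-in for the top extreme; the variable names are my own.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

x = z.inv_cdf(0.99) - z.inv_cdf(0.75)  # 75th -> 99th percentile
y = z.inv_cdf(0.75) - z.inv_cdf(0.50)  # 50th -> 75th percentile

print(round(x, 2), round(y, 2))  # 1.65 0.67
```

Pushing the upper end from 0.99 towards 1 makes x grow without bound, while y stays put.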

It’s the interval, stupid

Among all of the things that I respect about the USMLE, the one thing that I just totally appreciate, is that this is one of those rare exams that strive to make scoring and score reporting scientific. Things aren’t perfect, that’s for sure, but you’ve just gotta commend the effort.

Let’s talk about the 3-digit score in this context. Say there are infinitely many clones of you in this world and you’re all like the Borg. Each of you is mentally indistinguishable from the other – possessing ditto copies of USMLE knowhow. Say that each of you took the USMLE and then we plot the frequencies of these scores on a graph. We’re going to end up with a Gaussian curve depicting this sample of clones, with its own mean score and standard deviation. This process is called ‘parametric sampling’ and the distribution obtained is called a ‘sampling distribution’.

The idea behind what we just did is to determine the variation that we would expect in scores even if knowhow remained constant – either due to a flaw in the test or by random chance.

The standard deviation of a sampling distribution is also called the ‘standard error’. As you’ll probably learn during your USMLE preparation, knowing the standard error helps calculate what are called ‘confidence intervals’.

A confidence interval for a given score can be calculated as follows (using the Z-statistic):-

True score = Measured score +/- 1.96 (standard error of measurement) … for 95% confidence

True score = Measured score +/- 2.58 (standard error of measurement) … for 99% confidence

For many recent tests, the standard error for the 3-digit scale has been 6 [Every score card quotes a certain SEM (Standard Error of Measurement) for the 3-digit scale]. This means that given a measured score of 240, we can be 95% certain that the true value of your performance lies between a low of 240 – 1.96 (6) and a high of 240 + 1.96 (6). Similarly we can say with 99% confidence that the true score lies between 240 – 2.58 (6) and 240 + 2.58 (6). These score intervals are probabilistically flat when graphed – each true score value within the intervals calculated has an equal chance of being the right one.
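
Working those two intervals out explicitly, here is a minimal Python sketch using the numbers from the paragraph above (measured score 240, SEM 6). The function name is my own.

```python
def confidence_interval(measured, sem, z):
    """Return (low, high) = measured -/+ z * standard error of measurement."""
    margin = z * sem
    return measured - margin, measured + margin


low95, high95 = confidence_interval(240, 6, 1.96)
low99, high99 = confidence_interval(240, 6, 2.58)

print(round(low95, 2), round(high95, 2))  # 228.24 251.76
print(round(low99, 2), round(high99, 2))  # 224.52 255.48
```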

What this means is that, when you compare two individuals and see their scores side by side, you ought to consider what’s going on with their respective confidence intervals. Do they overlap? Even a nanometer of overlapping between CIs makes the two, statistically speaking, indistinguishable, even if in reality there is a difference. As far as the test is concerned, when two CIs overlap, the test failed to detect any difference between these two individuals. Capiche?

Beating competitors by intervals rather than pinpoint scores is a good idea to make sure you really did do better than them. The wider the distance separating two CIs, the larger is the difference between them.
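
The overlap test can be sketched as follows. The helper names are hypothetical, and the SEM of 6 and the 95% multiplier of 1.96 are the values used earlier in this post; the scores are made-up examples.

```python
def ci(score, sem=6, z=1.96):
    """95% confidence interval for a measured score (assumed SEM of 6)."""
    return score - z * sem, score + z * sem


def overlaps(a, b):
    """True if intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


print(overlaps(ci(240), ci(250)))  # True  -> statistically indistinguishable
print(overlaps(ci(240), ci(265)))  # False -> a detectable difference
```

With these numbers, two scores need to be more than 2 × 1.96 × 6 ≈ 23.5 points apart before their 95% intervals stop overlapping.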

There’s a special scenario that we need to think about here. What about the poor fellow who just missed the passing mark? For a passing mark of 180, what of the guy who scored, say 175? Given a standard error of 6, his 95% CI definitely does include 180 and there is no statistically significant (using a 5% margin of doubt) difference between him and another guy who scored just above 180. Yet this guy failed while the other passed! How do we account for this? I’ve been wondering about it and I think that perhaps, the pinpoint cutoffs for passing used by the USMLE exist as a matter of practicality. Using intervals to decide passing/failing results might be tedious, and maybe scientific endeavor ends at this point. Anyhow, I leave this question out in the void with the hope that it sparks discussions and clarifications.

If you care to give it a thought, the graphical subject-wise profile bands on the score card are actually confidence intervals (95%, 99% ?? I don’t know). This is why the score card clearly states that if any two subject-wise profile bands overlap, performance in these subjects should be deemed equal.

I hope you’ve found this post interesting if not useful. Please feel free to leave behind your valuable suggestions, corrections, remarks or comments. Anything :-) !

Readability grades for this post:

Kincaid: 8.8
ARI: 9.4
Coleman-Liau: 11.4
Flesch Index: 64.3/100 (plain English)
Fog Index: 12.0
Lix: 40.3 = school year 6
SMOG-Grading: 11.1

Powered by Kubuntu Linux 8.04

Copyright © 2006 – 2008 Firas MR. All rights reserved.

Calling For A Common Worldwide Medical Licensure Pathway

with 9 comments

Medicine – Realm Of The Unknown

For ages, the medical sphere has been shrouded in mystery – for people outside of medicine that is. And this hasn’t been too good for the medical profession because many policy makers on matters of healthcare/medicine aren’t sufficiently acquainted with its many nuances to make considered judgements. Sometimes you just can’t help but get the feeling that doctors have a language of their own, with a community so tightly knit that it borders on some sort of Illuminati-like cult.

Earlier, most of this mystery was limited to the knowledge base of medicine. Doctors were treated like gods walking on earth and people had no qualms whatsoever in having blind faith in them. With the rapid rise of web technologies however, doctors find themselves facing tough and pointed questions by their patients and policy makers about the decisions they make.

Some aspects, for the most part, still remain hidden away however: stuff that affects policy decisions and how medical communities across the world interact with each other. Issues concerning licensure and taxonomy immediately come to mind.

An aspect of medicine that, to this day, remains an enigma for many ‘outsiders’ is the entire academic hierarchy that applies to medical systems across the globe. Many ‘insiders’ end up at their wits’ end too. The taxonomy is definitely confusing. What the heck is a Senior Registrar? Or for that matter, what in god’s name is the difference between house surgeons/officers, resident medical officers, civil surgeons, residents, interns, attendings, senior house officers and all that jargon? The world could definitely use a universal taxonomic architecture for medical systems akin to the WHO’s International Classification of Diseases (ICD) to streamline stuff and make interactions between communities easier.

Licensure – One Too Many Exams For A Globalised Age

When medical students step into the medical world, being relatively new ‘insiders’ at this stage, very few are cognizant of the fact that their careers depend on satisfying licensure requirements before even thinking about pursuing higher education. Getting through medical school is one step. After that, students are required to go through long-winded licensure pathways before even beginning to gain higher training. Licensure serves as a quality control measure to ensure the safety of patients and is, arguably, a necessary evil.

Modern society depends on the exchange of ideas and talent between countries. The same applies to medicine as well. Unfortunately, due to the myriad medical licensure exams across different countries, this kind of exchange and collaboration can become extremely tedious and at times impractical. Getting into higher training for the international trainee becomes a daunting task. Take the following hypothetical scenario:-

Dr. Underdog went to medical school in a country bordering Angola and got his local medical license after graduating and passing local licensure exams. He now intends to gain higher training in colorectal surgery (… of all things :-) ) in the US. Before getting into a higher training program he needs an American license. He proceeds to sit for the United States Medical Licensing Examination (USMLE) and passes all 4 component exams in this process with flying colors. Good for him, Dr. Underdog’s thirst for knowledge is relentless. After gaining qualifications as a colorectal surgeon, he is now interested in learning a highly advanced and experimental procedure involving cosmic radiation and bizarre tumor polyps :-P , only available in Australia. He is now required to pass the Australian Medical Council licensure exams before he begins. He goes ahead with that and gains the skills he’s always dreamed about :-) . By now, Dr. Underdog has been through at least a dozen different licensure exams. The exams he took in the US and Australia weren’t directly related to the subjects he studied at those places. Seeing great potential in this emerging pioneer, a group of people from a country near Chile invite Dr. Underdog over. They’d like him to impart some of the training he received to a couple of their fortunate students. Unfortunately, he needs to clear their local licensure exams before he can begin. He candidly goes through that as well. In this new land, Dr. Underdog meets a fellow international doc who’s been through twice as many licensure exams as he has, to get to a position as senior faculty member while also dealing with some mind blowing research – literally involving blowing stuff :-P , partly as an outlet for his bottled up frustrations over licensure systems. … See how tedious it can get?

If I’m interested in gaining specialized skills and/or knowledge available in only certain parts of the world, I need to get straight down to business without having to worry about sitting for multiple licensure exams. Sitting for multiple licensure exams is not only wasteful of time and money, it is also redundant. Most of these exams test the same content anyway. Most importantly, as an aspiring international trainee, my focus has to be on the exams directly related to the training I intend to pursue rather than random licensure tests.

Solution? A universal licensure pathway ratified by an international body such as the WHO that should be acceptable to all countries.

At the moment, a few agencies such as the Medical Council of Canada and the Australian Medical Council are conducting joint licensure tests. Their efforts in this direction are laudable and should be wholeheartedly welcomed. Hopefully other countries will follow suit and some day a universal licensure pathway will become a reality. Until then, international trainees can only follow in Dr. Underdog’s tortuous footsteps!

Readability grades for this post:

Kincaid: 10.0
ARI: 11.2
Coleman-Liau: 14.4
Flesch Index: 53.2/100
Fog Index: 13.1
Lix: 48.9 = school year 9
SMOG-Grading: 12.0

Written by Firas MR

May 3, 2008 at 8:44 pm