Posts Tagged ‘Biostatistics’
I have been really enjoying Feinstein’s “Principles of Medical Statistics” for the past couple of days, and today I felt like sharing a nifty, pragmatic lesson from the book. I’d love to put an entire chunk of the book right here, but I’m not sure that would respect the copyright, so I’ll keep excerpts to a minimum. To really enjoy it, though, I recommend reading the entire section, so grab yourself a copy at a local library or wherever and dive in. The chapter of interest is Chapter 6 in Unit 1. Towards the end, there’s a section that goes into interesting detail about the merits and possible demerits of quantifying medicine. To demonstrate the delicate interplay of qualitative and quantitative descriptions in modern medicine, the author cites a number of research studies that investigated how qualitative terms like “more”, “a lot more”, “a great deal”, “often”, etc. meant different things to different people. The researchers did this using clever designs that let them correlate each qualitative term with the quantitative estimate people attached to it, and they did so for different groups of people: doctors, clerks, etc. Frustrated by the lack of consensus on the exact amount, probability or percentage behind mundane terms like these, one scientist even came up with a universal coding scheme for day-to-day use. What frustrations, you ask? For example, an ulcer deemed “large” on one visit to the clinic could be deemed “small” on a later visit to a different doctor, even though the ulcer might actually have grown larger in the meantime.
It is quite clear, then, that qualitativeness in medicine often seems like a roadblock of some sort. Don’t despair, however: Dr. Feinstein ends this chapter with a subsection called “virtues of imprecision”, which I found to be the part most worth savoring. He describes some of the advantages of qualitative terms and why, on some occasions, they might in fact communicate better:
- Qualitative terms allow you to convey a message without resorting to painstaking detail, detail that you might not even be able to perceive or compute.
- Patients find qualitative terms more intuitive, and so do doctors.
- Defining qualitative terms in quantitative ones, or replacing them outright, could lead to endless debates about where the cut-offs should lie (why should 1001 count as ‘large’ and 1000 as ‘small’? Hope you get the drift).
- Many statistical estimates, such as survival rates, come out of potentially biased studies, so it may be wrong to say that “good” survival is, say, 90% at 5 years and “better” is 99% at 5 years. In other words, it may be wrong to give an impression of precision when none is actually present.
- Perhaps the most important and pragmatic lesson he gave was about the false sense of security (or insecurity) that numbers can give to patients and doctors alike. Naivety plays the devil here. He demonstrated this using the cancer staging system. Each cancer stage has some sort of survival statistic attached to it, right? So, for example (the numbers here are purely arbitrary), the 5-year survival for Stage I cancer is 90%, whereas Stage III cancer is given a 5-year survival probability of 40%. A patient with Stage III cancer will be given this information by his or her physician, and management plans will be made accordingly. What the physician might not realize is that if Stage III is split further into sub-stages, say from Stage III-substage 1 to Stage III-substage 10, the survival probabilities range from 75% down to 5%. The 40% statistic is an ‘average’ and may not be particularly relevant to this patient, who for all we know could belong to Stage III-substage 1. So broad statistical numbers are not necessarily pertinent to individual cases.
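To make the averaging concrete, here is a small Python sketch. The ten sub-stage survival figures and the equal number of patients per sub-stage are made-up assumptions, chosen (like the numbers above) only so that the pooled Stage III figure comes out at 40%; `pooled_survival` is a hypothetical helper, not anything from Feinstein’s book.

```python
# Hypothetical 5-year survival (%) for Stage III sub-stages 1..10.
# Made-up numbers for illustration, just like the ones in the text.
substage_survival = [75, 70, 60, 50, 45, 35, 30, 20, 10, 5]

# Assumption: the same number of patients falls into each sub-stage.
patients_per_substage = [100] * 10

def pooled_survival(survival, counts):
    """Patient-weighted average survival across sub-stages."""
    total = sum(counts)
    return sum(s * n for s, n in zip(survival, counts)) / total

stage_iii = pooled_survival(substage_survival, patients_per_substage)
print(f"Stage III (pooled): {stage_iii:.0f}%")  # Stage III (pooled): 40%
print(f"Sub-stage range: {min(substage_survival)}% to {max(substage_survival)}%")
```

A patient in sub-stage 1 would face 75% survival, 35 points better than the pooled figure, while one in sub-stage 10 would face 5%. If the sub-stages held unequal numbers of patients, the pooled figure would shift as well, which is one more reason not to read it as any one individual’s prognosis.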
Oh, and did I mention keeping excerpts to a minimum? Ah, never mind. I’ve covered most of the juice by paraphrasing anyway.
Hope you’ve found this post interesting. And if you have, do send in your comments.
Readability grades for this post:
Flesch Index: 62.3/100 (plain English)
Fog Index: 12.2
Lix: 40.4 = school year 6
Powered by Kubuntu Linux 7.10
Copyright © 2006 – 2008 Firas MR. All rights reserved.
Facing writer’s block today, I can’t think of anything particularly exciting to write about. So instead I will talk about a couple of things I’ve been learning in biostatistics that I feel many of my fellow medics would benefit from.
We all make comparisons between numbers. If ‘A’ weighs 100 kg and ‘B’ weighs 50 kg, we often say A is twice as heavy as B (wt. of A / wt. of B). We can also say A is 50 kg heavier than B (wt. of A – wt. of B). Is the same true for temperature in Fahrenheit? Is 100F twice as hot as 50F? Interestingly, no! A temperature of 100F is 50F hotter than a temperature of 50F, but it is not twice as hot. Therein lies a fundamental difference between two kinds of ‘Dimensional’ (also called ‘Continuous’) data:
- Interval data: a dimensional data set whose values are separated by equal intervals. So if temperatures in Fahrenheit are listed as 1, 2, 3, 4, …, we know that as we progress from 1 to 2 and then to 3, each number is separated from its predecessor by the same interval (1F), and a 1F step means the same thing anywhere on the scale.
- Ratio data: a dimensional data set that has all the properties of interval data and, in addition, has an absolute zero. Kelvin vs. Fahrenheit is a classic example: Kelvin has an absolute zero while Fahrenheit does not. Weight in kg also belongs to the class of ratio data.
The implications of the above dictate how we can manipulate and handle our data. When comparing interval data such as Fahrenheit, we don’t have a universal reference against which to compare two different values (in our example, 100F and 50F). The 0F standard is purely arbitrary. If, in a fit of mad-hatter rage, we suddenly declared that from now on 0F is no longer 0F but 10F, our original values of 100F and 50F would become 110F and 60F. The difference (110-60) stays the same as before (100-50), but the ratio (110/60 ≈ 1.83) differs from the original (100/50 = 2). All of this happens because nothing stops you from changing your arbitrary 0F standard.
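The relabeling above is easy to check numerically. Here is a minimal Python sketch; `shift_zero` is a hypothetical helper that simply adds the offset (10, as in the example) to every reading.

```python
def shift_zero(reading, offset):
    """Re-express a reading after arbitrarily moving the zero point by `offset`."""
    return reading + offset

a, b = 100, 50                                 # original Fahrenheit readings
a2, b2 = shift_zero(a, 10), shift_zero(b, 10)  # 110 and 60 after the relabeling

print(a - b, a2 - b2)            # 50 50     -> the difference survives
print(a / b, round(a2 / b2, 2))  # 2.0 1.83  -> the ratio does not
```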
Ratio data sets on the other hand have an absolute standard – the absolute zero. By definition, you can’t change it! This standard is not subject to arbitrary whims and fancies. Taking our Kelvin example, 100K is 50K hotter than a temperature of 50K (100-50). Not only that, it is absolutely fine for you to say 100K is twice as hot as 50K (100/50). Similarly for weight in kilograms, 0kg is absolute. And thus 100kg is 50kg heavier than a weight measurement of 50kg (100-50) and it is also twice as heavy as 50kg (100/50).
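For temperature, the way back to meaningful ratios is to convert to the ratio scale first. A short Python sketch using the standard conversion K = (F - 32) × 5/9 + 273.15; `fahrenheit_to_kelvin` is just an illustrative helper name.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin: K = (F - 32) * 5/9 + 273.15."""
    return (f - 32) * 5 / 9 + 273.15

hot, mild = fahrenheit_to_kelvin(100), fahrenheit_to_kelvin(50)
print(f"{hot:.2f} K, {mild:.2f} K")  # 310.93 K, 283.15 K
print(f"ratio = {hot / mild:.2f}")   # ratio = 1.10
```

So, measured from absolute zero, 100F is only about 1.1 times as hot as 50F, nowhere near twice.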
A crude analogy is that of a sailor out at sea. To navigate, he could use objects in the ocean, such as rocks, whose positions could very well change with climatic conditions (~interval data). Or he could use the Pole Star to guide him (~ratio data).
You can compare interval data by calculating their difference. No matter what you set as your arbitrary standard, the difference will not change. For ratio data, in addition to calculating differences you also have the luxury of calculating ratios.
A Comedy of Errors
Most people don’t realize this, but the IQ score is an example of interval data. A guy scoring 200 on the test did not do twice as well as another who scored 100; he did 100 points better. The standards for a given IQ testing method are set arbitrarily. Not only that, different testing methods can have different arbitrary standards: the WAIS has a different standard from the Stanford-Binet. Remember that.
[In real life, the IQ score isn't truly interval in nature. How is one to assume that there's an equal interval of 'intelligence' between subsequent scores of 100, 101, 102, ... ? It's analogous to cancer staging actually. Stage IV disease is no doubt worse than Stage III disease which in turn is worse than Stage II disease, ... You don't necessarily progress by equal intervals of 'disease-ness' with each subsequent stage from I to IV. Similar to numbers for cancer staging, numbers for IQ scores are actually 'Ordinal' data in disguise.]
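The staging point can be illustrated with a small, made-up Python example. The cohort and the ‘unequal’ coding below are hypothetical; the second coding simply pretends that Stage IV is much further from Stage III than the stage numbers suggest, while keeping the order of the stages intact.

```python
from statistics import mean, median

# Hypothetical cohort, recorded by cancer stage (I-IV coded as 1-4).
stages = [1, 2, 2, 3, 4, 4, 4]

# Two codings that both preserve the order I < II < III < IV.
# The second is hypothetical: Stage IV is treated as far "sicker" than III.
equal_steps = {1: 1, 2: 2, 3: 3, 4: 4}
unequal_steps = {1: 1, 2: 2, 3: 3, 4: 10}

for coding in (equal_steps, unequal_steps):
    coded = [coding[s] for s in stages]
    # With this odd-sized cohort the median lands on the same patient
    # (Stage III) under both codings; the mean depends on the spacing.
    print(f"mean = {mean(coded):.2f}, median = {median(coded)}")
```

Because both codings respect the order of the stages, the rank-based median is unaffected, whereas the mean moves with the arbitrary spacing. That is why ordinal data are usually summarized with medians and ranks rather than means.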
All data can be divided into the following types (from least informative to most informative):
- Categorical – Nominal : Distinct categories of data that you can name but can’t rank. E.g. smoker and non-smoker; Asian, African, American, Australian, etc.
- Categorical – Ordinal : Distinct categories of data that you can not only name but also rank. The intervals between ranks aren’t necessarily equal. E.g. gold, silver and bronze medals. Class rank and cancer staging are also ordinal data; they are simply disguised as numbers.
- Dimensional – Interval : Numerical data with ranks. Ranks have equal intervals between them. There is no absolute zero.
- Dimensional – Ratio : Interval data with an absolute zero.
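One way to read this hierarchy is as a set of cumulative permissions: each level allows every comparison the level below it allows, plus one more. The Python sketch below encodes that idea; the `Scale` enum, the `REQUIRES` table and `allowed` are illustrative names, not taken from either book.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Types of data, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# The lowest type of data at which each comparison becomes meaningful.
REQUIRES = {
    "equal / not equal": Scale.NOMINAL,  # smoker vs non-smoker
    "greater / less": Scale.ORDINAL,     # Stage IV is worse than Stage III
    "difference": Scale.INTERVAL,        # 100F is 50F hotter than 50F
    "ratio": Scale.RATIO,                # 100 kg is twice as heavy as 50 kg
}

def allowed(scale):
    """Comparisons that are meaningful for data of the given type."""
    return [op for op, minimum in REQUIRES.items() if scale >= minimum]

examples = {
    "smoking status": Scale.NOMINAL,
    "cancer stage": Scale.ORDINAL,
    "temperature (F)": Scale.INTERVAL,
    "weight (kg)": Scale.RATIO,
}
for name, scale in examples.items():
    print(f"{name:16} {scale.name:9} {allowed(scale)}")
```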
References:
- Biostatistics: The Bare Essentials, by Geoffrey R. Norman and David L. Streiner
- Principles of Medical Statistics, by Alvan R. Feinstein
Readability grades for this post:
Flesch Index: 70.6/100
Fog Index: 9.8
Lix: 33.9 = below school year 5