Know Thy Numbers!
Face to face with writer’s block, I can’t think of anything particularly exciting to write about today. I will therefore talk about a couple of things I’ve been learning in biostatistics that I feel many of my fellow medics would benefit from.
We all make comparisons between numbers. If ‘A’ weighs 100 kg and ‘B’ weighs 50 kg, we often say A is twice as heavy as B (weight of A / weight of B). We can also say A is 50 kg heavier than B (weight of A - weight of B). Is the same true for temperature in Fahrenheit? Is 100F twice as hot as 50F? Interestingly, no! A temperature of 100F is 50F hotter than a temperature of 50F, but it is not twice as hot. Therein lies a fundamental difference between two kinds of ‘Dimensional’ (otherwise called ‘Continuous’) data:
- Interval data: a dimensional data set in which equal differences between values represent equal differences in the quantity being measured. If Fahrenheit temperatures are listed as 1F, 2F, 3F, 4F, …, each value is separated from its predecessor by the same interval of 1F, whether you step from 1F to 2F or from 3F to 4F.
- Ratio data: a dimensional data set that has all the properties of interval data and, in addition, an absolute zero, i.e. a zero point that is not arbitrary. Kelvin vs. Fahrenheit is a classic example: Kelvin has an absolute zero, while Fahrenheit does not. Weight in kg, too, belongs to the class of ratio data.
The implications of the above dictate how we can manipulate and handle our data. When comparing interval data such as Fahrenheit temperatures, we have no universal reference against which to compare two values, in our example 100F and 50F. The 0F standard is purely arbitrary. If, in a fit of mad-hatter rage, we suddenly decreed that from now on 0F is no longer 0F but 10F, our original values of 100F and 50F would become 110F and 60F. The difference (110 - 60 = 50) stays the same as before (100 - 50 = 50), but the ratio (110/60 ≈ 1.83) differs from the original (100/50 = 2). All of this happens because nothing stops you from moving your arbitrary 0F standard.
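The mad-hatter experiment is easy to reproduce. The Python sketch below is only an illustration; `shift_zero` and `compare` are hypothetical helper names, not part of any library. It moves the zero point by 10 degrees and compares the two readings before and after:

```python
def shift_zero(reading, offset):
    """Relabel a reading after the arbitrary zero point is moved by `offset` degrees."""
    return reading + offset


def compare(a, b):
    """Return the difference and the ratio of two readings."""
    return a - b, a / b


# The two Fahrenheit readings from the example.
hot, cool = 100.0, 50.0

before = compare(hot, cool)
after = compare(shift_zero(hot, 10.0), shift_zero(cool, 10.0))

print(f"before: difference = {before[0]}, ratio = {before[1]:.2f}")  # 50.0, 2.00
print(f"after:  difference = {after[0]}, ratio = {after[1]:.2f}")    # 50.0, 1.83
```

The difference survives the relabelling while the ratio does not, which is exactly why ratios of Fahrenheit readings carry no meaning.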
Ratio data sets, on the other hand, have an absolute standard: the absolute zero. By definition, you can’t change it! This standard is not subject to arbitrary whims and fancies. Taking our Kelvin example, 100K is 50K hotter than 50K (100 - 50). Not only that, it is perfectly fine to say that 100K is twice as hot as 50K (100/50). Similarly, for weight in kilograms, 0kg is absolute, so 100kg is 50kg heavier than 50kg (100 - 50) and also twice as heavy (100/50).
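So how does 100F really compare with 50F? One way to check, sketched here in Python, is to move both readings onto the Kelvin scale, which has an absolute zero, using the standard conversion K = (F - 32) × 5/9 + 273.15, and only then take the ratio. The function name is just an illustrative choice:

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to kelvins: K = (F - 32) * 5/9 + 273.15."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


hot_k = fahrenheit_to_kelvin(100.0)   # about 310.93 K
cool_k = fahrenheit_to_kelvin(50.0)   # 283.15 K

print(f"difference: {hot_k - cool_k:.2f} K")  # 27.78 K, the same 50F step in kelvins
print(f"ratio: {hot_k / cool_k:.3f}")          # 1.098, nowhere near 2
```

On the Kelvin scale, 100F turns out to be only about 10% ‘hotter’ than 50F in ratio terms.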
A crude analogy is that of a sailor out at sea. To navigate, he could use objects in the ocean, such as rocks, whose positions could very well shift with the weather (~interval data). Or he could use the Pole Star to help him navigate (~ratio data).
You can compare interval data by calculating differences: no matter where you set your arbitrary zero, the difference does not change. For ratio data, in addition to differences, you also have the luxury of calculating ratios.
A Comedy of Errors
Most people don’t realize this, but the IQ score is an example of interval data. A guy scoring 200 on the test did not do twice as well as another who scored 100; he did 100 points better. The standard for a given IQ testing method is set arbitrarily. Not only that, different testing methods can have different arbitrary standards: the WAIS has a different standard from the Stanford-Binet. Remember that.
[In real life, the IQ score isn’t truly interval in nature. How can one assume that there’s an equal interval of ‘intelligence’ between successive scores of 100, 101, 102, …? It’s actually analogous to cancer staging. Stage IV disease is no doubt worse than Stage III disease, which in turn is worse than Stage II disease, and so on. But you don’t necessarily progress by equal intervals of ‘disease-ness’ with each successive stage from I to IV. Like the numbers used for cancer staging, IQ scores are actually ‘Ordinal’ data in disguise.]
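Python’s `IntEnum` makes a handy (if slightly mischievous) illustration of ordinal data disguised as numbers. The `Stage` class below is a hypothetical sketch, not taken from any staging software: ordering comparisons are meaningful, but the arithmetic the language happily allows is not.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Cancer stages I-IV: ordered labels that merely look like numbers."""
    I = 1
    II = 2
    III = 3
    IV = 4


# Ordering is meaningful: Stage IV is worse than Stage II.
print(Stage.IV > Stage.II)  # True

# Subtraction "works" because IntEnum members behave like ints, but the result
# assumes equal steps of 'disease-ness' between stages, which staging never claims.
print(Stage.IV - Stage.II)  # 2
```

The subtraction prints 2, but nothing in the staging system says that the gap from Stage II to Stage IV is twice the gap from Stage II to Stage III.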
All data can be divided into the following types (from least informative to most informative):
- Categorical – Nominal : Distinct categories of data that you can name but can’t rank. E.g. smoker and non-smoker; Asian, African, American, Australian, etc.
- Categorical – Ordinal : Distinct categories of data that you can not only name but also rank. The intervals between ranks aren’t necessarily equal. E.g. gold, silver and bronze medals. Class rank and cancer staging are also ordinal data; the only difference is that they are disguised as numbers.
- Dimensional – Interval : Numerical data with ranks. Ranks have equal intervals between them. There is no absolute zero.
- Dimensional – Ratio : Interval data with an absolute zero.
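The four types are cumulative: each one supports every comparison of the type before it, plus one more. A minimal Python sketch of that idea follows; the `Level` enum, the comparison labels and the `allowed` helper are all hypothetical names chosen for illustration:

```python
from enum import IntEnum


class Level(IntEnum):
    """The four data types, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The lowest level at which each kind of comparison becomes meaningful.
MINIMUM_LEVEL = {
    "same or different": Level.NOMINAL,
    "greater or smaller": Level.ORDINAL,
    "difference (a - b)": Level.INTERVAL,
    "ratio (a / b)": Level.RATIO,
}


def allowed(level):
    """List the comparisons that make sense for data of the given level."""
    return [op for op, minimum in MINIMUM_LEVEL.items() if level >= minimum]


examples = {
    "smoker / non-smoker": Level.NOMINAL,
    "cancer stage": Level.ORDINAL,
    "temperature in F": Level.INTERVAL,
    "weight in kg": Level.RATIO,
}

for name, level in examples.items():
    print(f"{name:22} {level.name:9} {', '.join(allowed(level))}")
```

Using an ordered enum lets the single check `level >= minimum` express the ‘least to most informative’ ordering directly.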
Readability grades for this post:-
Flesch Index: 70.6/100
Fog Index: 9.8
Lix: 33.9 = below school year 5
Copyright © 2006 – 2008 Firas MR. All rights reserved.