My Dominant Hemisphere

The Official Weblog of 'The Basilic Insula'

Posts Tagged ‘Research’

Meeting Ghosts In The Chase For Reality



Sunrise (via faxpilot @ Flickr CC BY-NC-ND license)

Watching the morning sun beaming through the clouds during today’s jog, I was struck by an epiphany. What ultimately transpired was a streak of thoughts that left me with an overwhelming sense of awe and humility at its profound implications.

Perhaps the rejuvenating air, the moist earth from the previous night’s rains and the scent of the fresh Golden Flamboyant trees lining my path made the sun’s splendor all the more striking. Like a photograph coming to life, where objects elsewhere in the scene enhance the main subject’s impact.

As I gazed in its direction wondering about the sunspots that neither I nor anyone else around me could see (but that I knew were really there, from reading the work of astronomers), I began thinking about my own positional coordinates. So this was the East, I found. But how did I know that? Well, as you might have guessed, from the age-old phrase: “the sun rises in the East and sets in the West”. Known in Urdu as “سورج مشرق میں نکلتا ہے اور مغرب میں ڈوبتا ہے”, in Hindi as “सूरज पूरव में निकलता है और पश्चिम में डूबता है”, and indeed found in many other languages, the phrase shows how man has come to form an interesting model to wrap his mind around this majestic phenomenon. Indeed, many religious scriptures and books of wisdom, from ancient history to the very present, make use of this phrase in their deep moral teachings.

But we’ve come to think that this model is not really “correct”, haven’t we? We’ve developed this thinking with the benefit of hindsight (a relative term, given Einstein’s famous theory, by the way. One man’s hindsight could actually be another man’s foresight!). We’ve ventured beyond our usual abode and looked at our planet from a different vantage point – that of Space. From the Moon and from satellites. The sun doesn’t actually rise or set. That experience occurs because of our peculiar vantage point – that of relatively slow or immobile creatures grounded here on Earth. One could say that it is an interesting illusion. Indeed, you could sit in a plane and, with the appropriate speed, chase that sliver of sunlight as Sol (as scientists lovingly call our star) appears or disappears over the horizon, never letting it vanish from view, essentially indefinitely.
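
How fast would that chase have to be? Here’s a rough back-of-envelope sketch (my own numbers and function names, nothing from the post itself), using the Earth’s circumference and a 24-hour day:

```python
# Back-of-envelope sketch: the ground speed you'd need to keep pace with the
# sunset (i.e. with the day/night terminator) at a given latitude.
import math

EARTH_CIRCUMFERENCE_KM = 40075.0  # roughly, at the equator
DAY_HOURS = 24.0

def terminator_speed_kmh(latitude_deg: float) -> float:
    """Approximate westward speed of the day/night line at a given latitude."""
    return (EARTH_CIRCUMFERENCE_KM / DAY_HOURS) * math.cos(math.radians(latitude_deg))

for lat in (0, 30, 60):
    print(f"latitude {lat:>2} deg: ~{terminator_speed_kmh(lat):.0f} km/h")

# Roughly 1670 km/h at the equator and about 835 km/h at 60 degrees latitude --
# which is why Concorde could outrun the sunset, while an ordinary jet can only
# manage the trick at high latitudes.
```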

Notes In The Margin About Language

Coming back, for a moment, to this amusing English phrase that helped me gauge my position, I thought about how language itself can shape one’s thinking. A subject matter upon which I’ve reflected before. Our models of the world and the universe can get locked within the phraseology of a language, and a language can reach the limits of its power of expression quite unexpectedly. Speak in English and your view is different from that of somebody who can speak in Math. Even within Math, the advent of algebra expanded the language’s power of expression incredibly from its meager beginnings. New models get incorporated into the lexicon of a language, and because we tend to feed off of such phrases to make sense of ourselves and our universe, there is the potential for an inertia to develop, whereby it becomes easy to stay put with our abstractions of reality and not move on to radically new ones – models that are beyond the power of expression of a language and that haven’t yet been captured in its lexicon. In a way we find that models influence languages and languages themselves influence models, and ultimately there is this interesting potential for a peculiar steady state to be reached – which may or may not be such a good thing.

So when it comes to this phenomenon, we’ve moved from one model to another. We began with “primitive” maxims. Perhaps during a time when people used to think of the Earth as flat and stars as pin-point objects too. And then we progressed to geocentrism and then heliocentrism, both formulated from careful and detailed observations of the sky – first with the naked eye and later with telescopes – long before the luxury of satellites and space travel came into being. And now that we see the Earth from this improved vantage point – that of Space – our model for understanding reality has been refined. And actually, shifted in profound ways.

So what does this all mean? It looks like reality is one thing, that exists out there. And we as humans make sense of reality through abstractions or models. How accurate we are with our abstractions really depends on how much information we’ve been able to gather. New information (through ever more detailed experiments or observations and, as Gödel and Poincaré showed, sometimes by pure thought alone) drives us to alter our existing models. Sometimes in radically different ways (a classic example is our model of matter: one moment a particle, the next a wave). There is this continuous flux in how we make sense of the cosmos, and it will likely go on this way until the day mankind has been fully informed – which, pondered upon objectively, may never really happen. There have been moments in the past when man thought that this precipice had finally been reached, that he was at last fully informed, only to realize with utter embarrassment that this was not the case. Can man ever know, by himself, that he has finally reached such a point? Especially given that this is like a student judging his performance at an exam without the benefit of an independent evaluator? The truth is that we may never know. Whether we think we will ever reach such a precipice really does depend on a leap of faith. And scientists and explorers who would like to make progress depend on this faith – that either such a precipice will one day be reached, or at least that their next observation or experiment will add to their information on the path to such a glorious point. When at last, a gestalt vision of all of reality can be attained. It’s hard to stay motivated otherwise, you see. And you thought you had heard that faith had nothing to do with science or vice versa!

It is indeed quite remarkable the extent to which we get stuck in this or that model and keep fooling ourselves about reality. No sooner do we realize that we’ve been had and move on from our old abstraction to a new one – one that we think is much better – than we are struck with another blow. This actually reminds me of a favorite quote by a stalwart of modern Medicine:

And not only are the reactions themselves variable, but we, the doctors, are so fallible, ever beset with the common and fatal facility of reaching conclusions from superficial observations, and constantly misled by the ease with which our minds fall into the rut of one or two experiences.

William Osler in Counsels and Ideals


The World According To Anaximander (c. 610-546 BCE)

The phenomenon is really quite pervasive. The early cartographers who divided the world into various regions thought funny stuff by today’s standards. But you’ve got to understand that that’s how our forefathers modeled reality! And whether you like it or not someday many generations after our time, we will be looked upon with similar eyes.

Watching two interesting Royal Society lectures by Paul Nurse (The Great Ideas of Biology) and Eric Lander (Beyond The Human Genome Project: Medicine In The 21st Century) the other day, this thought kept coming back to me. Speaking about the advent of Genomic Medicine, Eric Lander (who trained as a mathematician, by the way) talked about the discovery of the EGFR gene and the realization that its mutations are strongly associated with a type of lung cancer called adenocarcinoma. He mentioned how clinical trials of the drug Iressa – a drug whose mechanism of action scientists weren’t yet sure of, but which was nevertheless proposed as a viable option for lung adenocarcinomas – failed to show statistically significant differences from standard therapy. Well, that was because the trial’s subjects were drawn from the broad population of all lung adenocarcinoma cases. Many doctors, seeing no conclusive evidence of a greater benefit, felt no reason to choose Iressa over standard therapy and drastically shift their practice. Which is what Evidence-Based Medical practice would have led them to do, really. But soon after the discovery of the EGFR gene, scientists decided to do a subgroup analysis of patients with EGFR mutations, and it was rapidly learned that Iressa did have a statistically significant effect in slowing tumor progression and improving survival in this particular subgroup. A significant section of patients could now have hope for a cure! And doctors suddenly began to prescribe Iressa as the therapy of choice for them.
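
To see why an effect confined to a subgroup can wash out in an unselected trial, here’s a toy simulation. All the numbers (mutation prevalence, response rates, arm sizes) are invented for illustration and are not the actual Iressa trial data:

```python
# Toy illustration: a drug that only helps mutation carriers can look useless
# when tested against an unselected population, yet emphatic in the subgroup.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(42)
n_per_arm = 500
mutant_fraction = 0.10          # assumed prevalence of the sensitising mutation
p_response_standard = 0.20      # assumed response rate on standard therapy
p_response_drug_mutant = 0.60   # assumed response rate on the drug, mutants only

def simulate_arm(on_drug: bool):
    mutant = rng.random(n_per_arm) < mutant_fraction
    p = np.where(mutant & on_drug, p_response_drug_mutant, p_response_standard)
    response = rng.random(n_per_arm) < p
    return mutant, response

mut_t, resp_t = simulate_arm(on_drug=True)
mut_c, resp_c = simulate_arm(on_drug=False)

def p_value(mask_t, mask_c):
    table = [[int(resp_t[mask_t].sum()), int((~resp_t[mask_t]).sum())],
             [int(resp_c[mask_c].sum()), int((~resp_c[mask_c]).sum())]]
    return fisher_exact(table)[1]

all_t = np.ones(n_per_arm, dtype=bool)
all_c = np.ones(n_per_arm, dtype=bool)
print("p-value, all comers:  ", round(p_value(all_t, all_c), 3))    # typically unimpressive
print("p-value, mutants only:", round(p_value(mut_t, mut_c), 6))    # typically tiny
```

Run it a few times and the all-comers comparison usually looks unconvincing while the mutant-only comparison is emphatic, which is exactly the kind of shift in abstraction that the subgroup analysis delivered.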

As I was thinking about what Lander had said, I remembered that Probability Theory as a science, which forms the bedrock of such things as clinical trials and indeed many other scientific studies, did not even begin to develop until the sixteenth and seventeenth centuries. At least, so far as we know. And modern, axiomatic probability theory really began much later, in the early 1900s.

Front page of "The Doctrine of Chances – a method of calculating the probability of events in play" by Abraham de Moivre, London, 1718

Abraham de Moivre's "Doctrine of Chances", published in 1718, was one of the first textbooks on Probability Theory

You begin to realize what a quantum leap this was in our history. We now think of patterns and randomness very differently from ancient times. Which is pretty significant, given that for some reason our minds are drawn to looking for patterns even where there might not be any. Over the years, we’ve developed the understanding that clusters (patterns) of events or cases can occur in a random system just as in a non-random one. Indeed, such clusters would be a fundamental defining characteristic of a random process; an absence of clusters would suggest that a process wasn’t truly random. Whether such clusters fit with a random process as opposed to a non-random one depends on whether or not we find an even greater pattern in how these clusters are distributed. A cluster of cases (such as an epidemic of cholera) would be considered non-random if, by hypothesis testing, we found that the probability of such a cluster coming about by random chance was so small as to be negligible. And even when thinking about randomness, we’ve learned to ask ourselves whether a process could be pseudo-random as opposed to truly random – which can sometimes be a difficult thing to establish. So unlike our forefathers, we don’t immediately jump to conclusions about what look to our eyes like patterns. It’s all quite marvelous to think about, really.

What’s even more fascinating is that Probability Theory is in a state of flux and continues to evolve to this day, as mathematicians gather new information. So what does this mean for the validity of our models that depend on Probability Theory? If a model can be thought of as a chain, it is only as strong as its weakest link! So we find that statisticians keep finding errors in how old epidemiologic studies were conducted and interpreted. And the science of Epidemiology itself improves as Probability Theory is continuously polished. This goes to show that the validity of our abstractions keeps shifting as the foundations upon which they are based themselves continue to transform. A truly intriguing idea when one thinks about it.
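
As a toy illustration of that point about clusters (a sketch of my own, with made-up numbers): scatter a year’s worth of cases uniformly at random across the calendar and ask how big the largest one-day pile-up tends to be.

```python
# Clumps appear even when nothing causes them: drop cases uniformly at random
# over a year and look at the biggest single-day "cluster" in each simulated year.
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_days, n_trials = 100, 365, 10_000

max_per_day = []
for _ in range(n_trials):
    days = rng.integers(0, n_days, size=n_cases)   # each case lands on a random day
    counts = np.bincount(days, minlength=n_days)
    max_per_day.append(counts.max())

max_per_day = np.array(max_per_day)
print("average size of the biggest one-day cluster:", max_per_day.mean())
print("fraction of simulated years with 4+ cases on a single day:",
      (max_per_day >= 4).mean())
```

Purely random scatter routinely piles several cases onto the same day, which is why an apparent cluster on its own proves nothing until it is put to a formal test.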

Some other examples of the shifting of abstractions with the gathering of new information come to mind.

Image from Andreas Vesalius's De humani corporis fabrica (1543), page 190.

An image from Vesalius's "De Humani Corporis Fabrica" (1543)

Like early cartographers, anatomists never really understood human anatomy very well back in the days of cutting open animals and extrapolating the findings to humans. There were these weird ideas that diseases were caused by a disturbance in the four humors. And then Vesalius came along and, by stressing the importance of dissecting cadavers, revolutionized how anatomy came to be understood and taught. But even then, our models for the human body were long plagued by ideas such as the notion that the seat of the soul lay in the pineal gland, and some of the other stuff now popularly characterized as folk-medicine. In our models for disease causation, we’ve progressed over the years from looking at purely environmental factors to purely DNA factors and now to a multifactorial model that stresses the idea that many diseases are caused by a mix of the two.

The Monty Hall paradox, about which I’ve written before, is another good example. You’re presented with new information midway through the game and you use this new information to re-adjust the old model of reality that you had in your mind. The use of decision trees in genetic counseling is yet another example. Given new information about a patient’s relatives and their genotype, your model for what is real improves in accuracy. You become better at diagnosis with each bit of new information.
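
For anyone who hasn’t tried it, a few lines of simulation make the switch-versus-stay update hard to argue with (a minimal sketch; the door labels are arbitrary):

```python
# Monty Hall in miniature: compare the win rate of staying vs. switching.
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
print("win rate if you stay:  ", sum(play(False) for _ in range(n)) / n)  # ~1/3
print("win rate if you switch:", sum(play(True) for _ in range(n)) / n)   # ~2/3
```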

The phenomenon can often be found in how people understand Scripture too. The mathematician Gary Miller has an interesting article that describes how some scholars examining the word Iram have gradually transformed their thinking based on new information gathered from archeological excavations.

So we see how abstractions play a fundamental role in our perceptions of reality.

One other peculiar thing to note is that sometimes, as we try to re-shape our abstractions to better fit any new information we get, there is the tendency to stick with the old as much as possible. A nick here or a nudge there is acceptable, but at its heart we are usually loath to discard our old model entirely. There is a potential danger in this. It could be that we inherit flaws from our old model without even realizing it, thus constraining the new one in ways yet to be understood. Especially when we are unaware of what these flaws could be. A good example of abstractions feeding off of each other is the pairing of the space-time fabric of relativity theory with the jitteriness of quantum mechanics. In our quest for a new model – a unified theory or abstraction – we are trying to mash these two abstractions together in curious ways, such that a serene space-time fabric exists when zoomed out, but when zoomed in we should expect to see it behave erratically, with jitters all over the place. Our manner of dealing with such inertia when it comes to building new abstractions is basically to see whether these mash-ups agree with experiments or observations better than our old models did. Which is an interesting way to go about doing things and could be something to think about.

Making Sense Of Reality Through The Looking Glass

Making Sense Of Reality Through The Looking Glass (via Jose @ Flickr, CC BY-SA-NC license)

Listening to Paul Nurse’s lecture, I also learned how Mendel chose pea plants for his studies on inheritance, rather than more complicated vegetation, because of the simplicity and clarity with which one could distinguish their phenotypes, making the experiment much easier to carry out. Depending on how one crossed them, one could trace the inheritance of traits – seed color, plant height, etc. – very quickly and very accurately. It actually reminded me of something I learned a long time ago about the various kinds of data in statistics. These data can be categorized into types based on the amount of information they contain. The highest amount of information is seen in Ratio data. The lowest is seen in Nominal data. The implication is that the more your experiment or scientific study uses Ratio data rather than Nominal data, the more accurate your inferences about reality will be. The more information you throw out, the weaker your model will be.

So we see that there is quite an important caveat when we build abstractions by keeping things simple and stripping away intricacy – when we are stuck with having to use an ape thumb on a fine instrument. It’s primitive, but it often gets us ahead in understanding reality much faster. The cost we pay, though, is that our abstraction fits a simpler and more artificial version of the reality that we seek to understand. And reality usually is quite complex. So when we limit ourselves to examining a handful of variables in, say, the clinical trial of a drug, and find that it has a treatment benefit, we can be a lot more certain that this would be the case in the real world too, provided that we prescribe the drug to a patient pool as similar as possible to the one in our experiment. Which rarely happens, as you might have guessed! And that’s why you find so many cases of treatment failure and unpredictable disease outcomes. How the validity of an abstraction is influenced by the KISS (Keep It Simple, Stupid) principle is something to think about. Epidemiologists get sleepless nights pondering over it sometimes. And a lot of time is spent trying to eliminate selection bias (i.e. when errors of inference creep in because the pool of patients in the study doesn’t match, to an acceptable degree, the kinds of patients doctors would interact with out in the real world). The goal is to make an abstraction agree with as much of reality as possible, but in doing so not to stray so far from the KISS principle that carrying out the experiment would be impractical or impossible. It’s such a delicate and fuzzy balance!
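
Here’s a rough sketch of that information-loss point (a toy example of my own, not something from the lecture): the same underlying treatment effect is detected more often when the outcome is analyzed as ratio data than when it is collapsed into a nominal improved/not-improved label.

```python
# Compare statistical power when the outcome is kept continuous (ratio data)
# versus dichotomised at a cutoff (nominal data), for the same true effect.
import numpy as np
from scipy.stats import ttest_ind, fisher_exact

rng = np.random.default_rng(1)
n, true_shift, n_trials, alpha = 40, 0.5, 1000, 0.05
hits_ratio = hits_nominal = 0

for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_shift, 1.0, n)

    # Ratio-scale analysis: compare the measurements themselves.
    if ttest_ind(treated, control).pvalue < alpha:
        hits_ratio += 1

    # Nominal analysis: collapse to "above cutoff / below cutoff" and compare counts.
    cutoff = 0.0
    table = [[int((treated > cutoff).sum()), int((treated <= cutoff).sum())],
             [int((control > cutoff).sum()), int((control <= cutoff).sum())]]
    if fisher_exact(table)[1] < alpha:
        hits_nominal += 1

print("power with ratio data:  ", hits_ratio / n_trials)
print("power with nominal data:", hits_nominal / n_trials)
```

Throwing away the fine grain of the measurement costs you sensitivity, which is the trade-off the KISS approach quietly makes.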

So again and again we find that abstractions define our experiences. Some people get so immersed in and attached to their models of reality that they make them their lifeblood, refusing to move on. And some people actually wonder if life as we know it is itself an abstraction :-D! I was struck by this when I came upon the idea of the Holographic principle in physics – that we and our universe are bounded by an enveloping surface, and that in a sense our real existence is on that surface. That what we see, touch or smell in our common experience is simply a projection of what is actually happening on that surface. That these everyday experiences are essentially holograms :-D! Talk about getting wild, eh :-D?!

The thought that I ultimately came away with at the end of my jog was that of maintaining humility in knowledge. For those of us in science, it is very common for arrogance to creep in. When the fact is that there is so much about reality that we know nothing about, and that our abstractions may never agree with it to full accuracy, ever! When pondered upon deeply, this is a very profound and humbling thing to realize.

Even the arrogance in Newton melted away for a moment when he proclaimed:

If I have seen a little further it is by standing on the shoulders of Giants.

Isaac Newton in a letter to rival Robert Hooke

Here’s to Isaac Newton for that spark of humility, even if it was rather fleeting :-). I’m guessing there must have been times when he might have had stray thoughts of cursing at himself for having said that :-)! Oh well, that’s how they all are …


Copyright Firas MR. All Rights Reserved.

“A mote of dust, suspended in a sunbeam.”



Written by Firas MR

November 16, 2010 at 12:18 am

Seeking Profundity In The Mundane



Seeking A New Vision (via Jared Rodriguez/Truthout CC BY-NC-SA license)

The astronomer, Carl Sagan once said:

It has been said that astronomy is a humbling and character-building experience. There is perhaps no better demonstration of the folly of human conceits than this distant image of our tiny world. To me, it underscores our responsibility to deal more kindly with one another, and to preserve and cherish the pale blue dot, the only home we’ve ever known.

— in the Pale Blue Dot

And likewise Frank Borman, astronaut and Commander of Apollo 8, the first mission to fly around the Moon said:

When you’re finally up on the moon, looking back at the earth, all these differences and nationalistic traits are pretty well going to blend and you’re going to get a concept that maybe this is really one world and why the hell can’t we learn to live together like decent people?

Why is it, I wonder, that we, the human race, have the tendency to reach such profound truths only when placed in an extraordinary environment? Do we have to train and become astronomers or cosmonauts to appreciate our place in the universe? To find respect for and to cherish what we’ve been bestowed with? To care about each other, our environment and this place that we are loath to remember is the one home for all of life as we know it?

There is much to be learned by reflecting upon this idea. Our capacity to gain wisdom and feel impressed really does depend on the level to which our experiences deviate from the banal, doesn’t it? Ask what a grain of food means to somebody who has never had the luxury of a mediocre middle-class life. Ask a lost child what it must be like to have finally found his mother. Or question the rejoicing farmer who has just felt rain-drops on his cheeks, bringing hope after a painful drought.

I’m sure you can think of other examples that speak volumes about the way we, consciously or not, program ourselves to look at things.

The other day, I was just re-reading an old article about the work of the biomathematician Steven Strogatz. He mentioned how, as a high-school student studying science, he was asked to drop down on his knees and measure the dimensions of floors, graph the time periods of pendulums, figure out the speed of sound from resonating air columns in hollow tubes partly filled with water, and so on. Each time, the initial reaction was one of dreariness and insipidity. But he would soon realize how these mundane experiments acted as windows to profound discoveries – such as the idea that resonance is something without which atoms wouldn’t come together to form material objects, or how a pendulum’s time period, when graphed, reflects a specific mathematical equation.
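
For the pendulum, the equation in question is the standard small-angle result (my gloss, not something spelled out in the article):

```latex
T = 2\pi\sqrt{\frac{L}{g}}
\qquad\Longrightarrow\qquad
T^{2} = \frac{4\pi^{2}}{g}\,L
```

so a plot of the squared period against the pendulum’s length falls on a straight line, and its slope quietly hands you the acceleration due to gravity.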

There he was – peering into the abstruse and finding elegance in the mundane. The phenomenon reminded me of a favorite quote:

The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.

Marcel Proust

For that’s what Strogatz, like Sagan and Borman, was essentially experiencing. A new vision about things. But with an important difference – he was doing it by looking at the ordinary. Not by gazing at extraordinary galaxies and stars through a telescope. Commonplace stuff that, when examined closely, suddenly was ordinary no more. Something that had just as much potential to change man’s perspective of himself and his place in the universe.

I think it’s important to realize this. The universe doesn’t just exist out there among the celestial bodies that lie beyond normal reach. It exists everywhere. Here; on this earth. Within yourself and your environment and much closer to home.

Perhaps, that’s why we’ve made much scientific progress by this kind of exploration. By looking at ordinary stuff using ordinary means. But with extra-ordinary vision. And successful scientists have proven again and again, the value of doing things this way.

The concept of hand-washing to prevent the spread of disease, for instance, wasn’t born out of a sophisticated randomized clinical trial, but out of a modest accounting of mortality rates in a far less developed kind of epidemiologic study. The obstetrician who stumbled upon this profound discovery, long before Pasteur postulated the germ theory of disease, was Ignaz Semmelweis, later to be known as the “savior of mothers”. His new vision led to the discovery of something so radical that the medical community of his day rejected it, and his results were never seriously looked at during his lifetime (so much for peer-review, eh?). The doctor struggled with this till his last breath, suffering at an insane asylum and ultimately dying at the young age of 47.

That smoking is linked to lung cancer was first conclusively learned from an important prospective cohort study that was largely done by mailing a series of questionnaires to smoking and non-smoking physicians over a period of time, asking how they were doing. Yes, even questionnaires, when used intelligently, can be more than just unremarkable pieces of paper; they can be gateways that open our eyes to our magnificent universe!

From the polymath and physician Copernicus’s seemingly pointless calculations on the positions of planets, to the dreary routine of looking at microbial growth in petri dishes by the physician Koch, to the physicist and polymath Young‘s proposal of a working theory for color vision, to the physician John Snow’s phenomenal work on preventing cholera by studying water wells long before the microbe was even identified, time and time again we have learned about the enormous implications of science on the cheap. And science of the mundane. There’s wisdom in applying the KISS (Keep It Simple, Stupid) principle to science after all! Even in the more advanced, technologically replete scientific studies.

More on the topic of finding extraordinary ideas in ordinary things: I was reminded recently of a couple of enchanting papers and lectures. One was about finding musical patterns in the sequence of our DNA. And the second was an old but interesting paper [1] that proposes a radical model for the biology of the cell and seeks to reconcile the paradoxes that we observe in biological experiments. That there could be some deep logical underpinning to the maxim, “biology is a science of exceptions”, is really quite an exciting idea:

Surprise is a sign of failed expectations. Expectations are always derived from some basic assumptions. Therefore, any surprising or paradoxical data challenges either the logical chain leading from assumptions to a failed expectation or the very assumptions on which failed expectations are based. When surprises are sporadic, it is more likely that a particular logical chain is faulty, rather than basic assumptions. However, when surprises and paradoxes in experimental data become systematic and overwhelming, and remain unresolved for decades despite intense research efforts, it is time to reconsider basic assumptions.

One of the basic assumptions that make proteomics data appear surprising is the conventional deterministic image of the cell. The cell is commonly perceived and traditionally presented in textbooks and research publications as a pre-defined molecular system organized and functioning in accord with the mechanisms and programs perfected by billions years of biological evolution, where every part has its role, structure, and localization, which are specified by the evolutionary design that researchers aim to crack by reverse engineering. When considered alone, surprising findings of proteomics studies are not, of course, convincing enough to challenge this image. What makes such a deterministic perception of the cell untenable today is the massive onslaught of paradoxical observations and surprising discoveries being generated with the help of advanced technologies in practically every specialized field of molecular and cell biology [12-17].

One of the aims of this article is to show that, when reconsidered within an alternative framework of new basic assumptions, virtually all recent surprising discoveries as well as old unresolved paradoxes fit together neatly, like pieces of a jigsaw puzzle, revealing a new image of the cell–and of biological organization in general–that is drastically different from the conventional one. Magically, what appears as paradoxical and surprising within the old image becomes natural and expected within the new one. Conceptually, the transition from the old image of biological organization to a new one resembles a gestalt switch in visual perception, meaning that the vast majority of existing data is not challenged or discarded but rather reinterpreted and rearranged into an alternative systemic perception of reality.

— (CC BY license)

Inveigled yet 🙂 ? Well then, go ahead and give it a look!

And as mentioned earlier in the post, one could extend this concept of seeking out phenomenal truths in everyday things to many other fields. As a photography buff, I can tell you that ordinary and boring objects can really start to get interesting when viewed up close and magnified. A traveler who takes the time to immerse himself in the communities he’s exploring, much like Xuan Zang or Wilfred Thesiger or Ibn Battuta, suddenly finds that what is to be learned is vast and all the more enjoyable.

The potential to find and learn things with this new way to envision our universe can be truly revolutionary. If you’re good at it, it soon becomes hard to ever get bored!

Footnotes:

  1. Kurakin, A. (2009). Scale-free flow of life: on the biology, economics, and physics of the cell. Theoretical Biology and Medical Modelling, 6(1), 6. doi:10.1186/1742-4682-6-6


Copyright Firas MR. All Rights Reserved.

“A mote of dust, suspended in a sunbeam.”



Written by Firas MR

November 13, 2010 at 10:48 am

Contrasts In Nerdity & What We Gain By Interdisciplinary Thinking


scientific fields and purity

Where Do You Fit In This Paradigm? (via xkcd CC BY-NC license)

I’ve always been struck by how nerds can act differently in different fields.

An art nerd is very different from a tech nerd. Whereas the former can go on and on about brush strokes, lighting patterns, mixtures of paint, which drawing belongs to which artist, etc., the latter can engage in ad infinitum discussions about the architecture of the internet, how operating systems work, whose grip on Assembly is better, why their code works better, etc.

And what about math and physics nerds? They tend to show their feathers off by displaying their understanding of chaos theory, why imaginary numbers matter, and how we are all governed by “laws of nature”, etc.

How about physicians and med students? Well, like most biologists, they’ll compete with each other by showing off how much anatomy, physiology, biochemistry or how many drug properties they can remember, who’s up to date on the most recent clinical trial statistics (sort of like a fan of cricket/baseball statistics), why their technique of proctoscopy is better than somebody else’s, the latest morbidity/mortality rates following a given procedure, etc.

And you could actually go on about nerds in other fields too – historians (who remembers what date or event), political analysts (who understands the Thai royal family better), farmers (who knows the latest in pesticides), etc.

Each type has its own traits that reflect the predominant mindset (at the highest of intellectual levels) when it comes to approaching their respective subject matter. And nerds, being who they are, tend to take it all to their heads and think they’ve found that place — of ultimate truth, peace and solace. That they are at last, “masters” of their subjects.

I’ve always found this phenomenon to be rather intriguing. Because in reality, things are rarely that simple – at least when it comes to “mastery”.

In medicine for instance, the nerdiest of nerds out there will be proud and rather content with the vast statistics, nomenclature, and learn-by-rote information that he has finally been able to contain within his head. Agreed, being able to keep such information at the tip of one’s tongue is an achievement, considering the bounds of average human memory. But what about the fact that he has no clue as to what fundamentally drives those statistics, why one drug works for a condition whereas another drug with the same properties (i.e. properties that medical science knows of) fails or has lower success rates, etc.? A physics nerd would approach this matter as something that lies at the crux of an issue — so much so that he would get sleepless nights without being able to find some model or theory that explains it mathematically, in a way that seems logical. But a medical nerd? He’s very different. His geekiness just refuses to go there, because of the discomforting feeling that he has no idea whatsoever! More stats and names to learn by rote please, thank you!

I think one of the biggest lessons we learn from the really great stalwarts in human history is that they refused to let such stuff get to their heads. The constant struggle to find and maintain humility in knowledge was central to how they saw themselves.

… I can live with doubt and uncertainty and not knowing. I think it’s much more interesting to live not knowing than to have answers which might be wrong. I have approximate answers and possible beliefs and different degrees of certainty about different things, but I’m not absolutely sure of anything and there are many things I don’t know anything about, such as whether it means anything to ask why we’re here, and what the question might mean. I might think about it a little bit and if I can’t figure it out, then I go on to something else, but I don’t have to know an answer, I don’t feel frightened by not knowing things, by being lost in a mysterious universe without having any purpose, which is the way it really is so far as I can tell. It doesn’t frighten me.

Richard Feynman speaking with Horizon, BBC (1981)

The scientist has a lot of experience with ignorance and doubt and uncertainty, and this experience is of great importance, I think. When a scientist doesn’t know the answer to a problem, he is ignorant. When he has a hunch as to what the result is, he is uncertain. And when he is pretty darn sure of what the result is going to be, he is in some doubt. We have found it of paramount importance that in order to progress we must recognize the ignorance and leave room for doubt. Scientific knowledge is a body of statements of varying degrees of certainty – some most unsure, some nearly sure, none absolutely certain.

Now, we scientists are used to this, and we take it for granted that it is perfectly consistent to be unsure – that it is possible to live and not know. But I don’t know whether everybody realizes that this is true. Our freedom to doubt was born of a struggle against authority in the early days of science. It was a very deep and very strong struggle. Permit us to question – to doubt, that’s all – not to be sure. And I think it is important that we do not forget the importance of this struggle and thus perhaps lose what we have gained.

What Do You Care What Other People Think?: Further Adventures of a Curious Character by Richard Feynman as told to Ralph Leighton


An Interdisciplinary Web of a Universe (via Clint Hamada @ Flickr; CC BY-NC-SA license)

Besides being an important aspect for high-school students to consider when deciding what career path to pursue, I think that these nerd-personality-traits also illustrate the role that interdisciplinary thinking can play in our lives and how it can add tremendous value in the way we think. The more one diversifies, the more his or her thinking expands — for the better, usually.

Just imagine a nerd who’s cool about art, physics, math or medicine, etc. — all put together, in varying degrees. What would his perspective of his subject matter and of himself be like? Would he make the ultimate translational research nerd? It’s not just the knowledge one could potentially piece together, but the mindset that one would begin to gradually develop. After all, we live in an enchanting web of a universe, where everything intersects everything!


Copyright Firas MR. All Rights Reserved.

“A mote of dust, suspended in a sunbeam.”




Written by Firas MR

November 12, 2010 at 12:00 am

Revitalizing Science Education


[Video]
Richard Feynman: “… But you’ve gotta stop and think about it. About the complexity to really get the pleasure. And it’s all really there … the inconceivable nature of nature! …”

And when I read Feynman’s description of a rose — in which he explained how he could experience the fragrance and beauty of the flower as fully as anyone, but how his knowledge of physics enriched the experience enormously because he could also take in the wonder and magnificence of the underlying molecular, atomic, and subatomic processes — I was hooked for good. I wanted what Feynman described: to assess life and to experience the universe on all possible levels, not just those that happened to be accessible to our frail human senses. The search for the deepest understanding of the cosmos became my lifeblood […] Progress can be slow. Promising ideas, more often than not, lead nowhere. That’s the nature of scientific research. Yet, even during periods of minimal progress, I’ve found that the effort spent puzzling and calculating has only made me feel a closer connection to the cosmos. I’ve found that you can come to know the universe not only by resolving its mysteries, but also by immersing yourself within them. Answers are great. Answers confirmed by experiment are greater still. But even answers that are ultimately proven wrong represent the result of a deep engagement with the cosmos — an engagement that sheds intense illumination on the questions, and hence on the universe itself. Even when the rock associated with a particular scientific exploration happens to roll back to square one, we nevertheless learn something and our experience of the cosmos is enriched.

Brian Greene, in The Fabric of The Cosmos

When people think of “science education”, they usually think about it in the context of high school or college. When in reality it should be thought of as encompassing life-long education, for if we analyze it deeply, we all realize that we never cease to educate ourselves, no matter what our trade. Because we understand that what life demands of us is the capacity to function efficiently in a complex society. As we gain or lose knowledge, our capacities keep fluctuating, and we always desire and often strive for them to be right at the very top of that graph.

When it comes to shaping attitudes towards science, which is what I’m concerned about in this post, I’ve noticed that this begins quite strongly during high school, but as students get to college and then university, it gradually begins to fade away, even in some of the more scientific career paths. By then, I guess, some of these things are assumed (at times, you could say, wrongly). We aren’t reminded of them as frequently, and they melt into the background as we begin coping with the vagaries of grad life. By the time we are out of university, for a lot of us the home projects, high-school science fests, etc. that we did in the past as a means to understand scientific attitude have ultimately become a fuzzy, distant dream.

I’ve observed this phenomenon as a student in my own life. As med students, we are seldom reminded by professors of what it is that constitutes scientific endeavor or ethic. Can you recall the last time you had a didactic discussion on the topic?

I came to realize this vacuum early on in med school. And a lot of the time this status quo doesn’t serve us well. Take Evidence-Based Medicine (EBM) for example. One of the reasons why people make errors in interpreting and applying EBM, in my humble opinion, is precisely the naivete that such a vacuum allows to fester. What ultimately happens is that students remain weak in EBM principles, go on to become professors, cannot teach EBM to the extent that they ought to, and a vicious cycle ensues whereby the full impact of man’s progress in Medicine is never realized. The same applies to how individuals, departments and institutions implement auditing, quality assurance and the like.

A random post that I recently came across in the blogosphere touched upon the interesting idea that, when you really think about it, most practicing physicians are ultimately technicians whose job it is to fix and maintain patients (much as a mechanic oils and fixes cars). The writer starts out with a provocative opening,

Is There A Doctor In The House?


[…]

Medical doctors often like to characterize themselves as scientists, and many others in the public are happy to join them in this.

I submit, however, that such a characterization is an error.

[…]

and divides science professionals into,

[…]

SCIENTIST: One whose inquiries are directed toward the discovery of new facts.

ENGINEER: One whose inquiries are directed toward the new applications of established facts.

TECHNICIAN: One whose inquiries are directed toward the maintenance of established facts.

[…]

and then segues into why even if that’s the case, being a technician in the end has profound value.

Regardless of where you find yourself in that spectrum within this paradigm, I think it’s obvious that gaining skills in one area helps you perform better in others. So, as technicians, I’m sure that practicing physicians will find that their appraisal and implementation of EBM improves if they delve into how discoverers work and learn about the pitfalls of their trade. The same could be said of learning about how inventors translate this knowledge from the bench to the bedside as new therapies, etc. are developed, and the caveats involved in the process.

Yet it is precisely in these aspects that I find that medical education requires urgent reform. Somehow, as if by magic, we are expected to do the work of a technician and to get a grip on EBM practices without a solid foundation for how discoverers and inventors work.

I think it’s about time that we re-kindled the spirit of understanding scientific attitude at our higher educational institutions and in our lives (for those of us who are already out of university).

From self-study and introspection, here are some points and questions that I’ve made a note of so far, as I strive to re-invigorate the scientific spirit within me, in my own way. As you reflect on them, I hope that they are useful to you in working to become a better science professional as well:

  1. Understand the three types of science professionals and their roles. Ask where in the spectrum you lie. What can you learn about the work professionals in the other categories do to improve how you yourself function?
  2. Learning about how discoverers work, helps us in getting an idea about the pitfalls of science. Ultimately, questions are far more profound than the answers we keep coming up with. Do we actually know the answer to a question? Or is it more correct to say that we think we know the answer? What we think we know, changes all the time. And this is perfectly acceptable, as long as you’re engaged as a discoverer.
  3. What are the caveats of using language such as the phrase “laws of nature”? Are they “laws”, really? Or abstractions of even deeper rules and/or non-rules that we cannot yet touch?
  4. Doesn’t the language we use influence how we think?
  5. Will we ever know if we have finally moved beyond abstractions to deeper rules and/or non-rules? Abstractions keep shifting, sometimes in diametrically opposite directions (eg: from Newton’s concepts of absolute space-time to Einstein’s concepts of relative space-time, the quirky and nutty ideas of quantum mechanics such as the dual nature of matter and the uncertainty principle, concepts of disease causation progressing from the four humours to microbes and DNA and ultimately a multifactorial model for etiopathogenesis). Is it a bad idea to pursue abstractions in your career? Just look at String Theorists; they have been doing this for a long time!
  6. Develop humility in approach and knowledge. Despite all the grand claims we make about our scientific “progress”, we’re just a tiny speck amongst the billions and billions of specks in the universe and limited by our senses and the biology of which we are made. The centuries-old debate among philosophers of whether man can ever claim to one day have found the “ultimate truth” still rages on. However, recently we think we know from Kurt Gödel’s work that there are truths out there in nature that man can never arrive at by scientific proof. In other words, truths that we may never ever know of! Our understanding of the universe and its things keeps shifting continuously, evolving as we ourselves as a species improve (or regress, depending on your point of view). Understanding that all of this is how science works is paramount. And there’s nothing wrong with that. It’s just the way it is! 🙂
  7. Understand the overwhelming bureaucracy in science these days. But don’t get side-tracked! It’s far too big a boatload to handle on one’s own! There is a real danger of people leaving science altogether because of this mountain of bureaucracy.
  8. Science for career’s sake is how many people get into it. Getting a paper out can be a good career move. But it’s far more fun and interesting to do science for science’s own sake, and the satisfaction you get by roaming free, untamed, and out there to do your own thing will be ever more lasting.
  9. Understand the peer-review process in science and its benefits and short-comings.
  10. Realize the extremely high failure rate in terms of the results you obtain. Over 90% by most anecdotal accounts – be that in terms of experimental results or publications. But it’s important to inculcate curiosity and to keep the propensity to question alive. To discover. And to have fun in the process. In short, the right attitude; despite knowing that you’re probably never going to earn a Fields medal or Nobel prize! Scientists like Carl Friedrich Gauss were known to dislike publishing innumerable papers, favoring quality over quantity. Quite contrary to the trends that Citation Metrics seem to have a hand in driving these days. It might be perfectly reasonable to not get published sometimes. Look at the lawyer-mathematician Pierre de Fermat of Fermat’s Last Theorem fame. He kept notes and wrote letters but rarely if ever published in journals. And he never did publish the proof of Fermat’s Last Theorem, claiming that it was too large to fit in the margins of a copy of a book he was reading as the thought occurred to him. He procrastinated until he passed away, when it became one of the most profound math mysteries ever to be tackled, only to be solved about 358 years later by Andrew Wiles. But the important thing to realize is that Fermat loved what he did, and did not judge himself by how many gazillion papers he could or could not have had to his name.
  11. Getting published does have a sensible purpose though. The general principle is that the more peer-review the better. But what form this peer-review takes does not necessarily have to be in the form of hundreds of thousands of journal papers. There’s freedom in how you go about getting it, if you get creative. And yes, sometimes, peer-review fails to serve its purpose. Due to egos and politics. The famous mathematician Évariste Galois was so fed up with it that he chose to publish a lot of his work privately. And the rest, as they say, is history.
  12. Making rigorous strides depends crucially on a solid grounding in Math, Probability and Logic. What are the pitfalls of hypothesis testing (see the small sketch after this list for one classic pitfall)? What is randomness and what does it mean? When do we know that something is truly random as opposed to pseudo-random? If we conclude that something is truly random, how can we ever be sure of it? What can we learn from how randomness is interpreted in inflationary cosmology, where there’s “jitter” over quantum distances but it begins to fade over larger ones (cf. Inhomogeneities in Space)? Are there caveats involved when you create models or conceptions about things based on one or the other definition of randomness? How important is mathematics to biology and vice versa? There’s value in gaining these skills for biologists. Check out this great paper [1] and my own posts here and here. Also see the following lecture, which stresses the importance of teaching probability concepts for today’s world and its problems:


    [Video]

  13. Developing collaborative skills helps. Lateral reading, attending seminars and discussions at various departments can help spark new ideas and perspectives. In Surely You’re Joking Mr. Feynman!, the famous scientist mentions how he always loved to dabble in other fields, attending random conferences, even once working on ribosomes! It was the pleasure of finding things out that mattered! 🙂
  14. Reading habits are particularly important in this respect. Diversify what you read. Focus on the science rather than the dreary politics of science. It’s a lot more fun! Learn the value of learning-by-self and taking interest in new things.
  15. Like it or not, it’s true that unchecked capitalism can ruin balanced lives, often rewarding workaholic self-destructive behavior. Learning to diversify interests helps take off the pressure and keeps you grounded in reality and connected to the majestic nature of the stuff that’s out there to explore.
  16. The rush that comes from all of this exploration has the potential to lead to unethical behavior. It’s important not to lose sight of the sanctity of life and the sanctity of our surroundings. Remember all the gory examples that WW2 gave rise to (from the Nazi doctors to all of those scientists whose work ultimately led to the loss of life that we’ve come to remember in the shorthand, “Hiroshima and Nagasaki”). Here’s where diversifying interests also helps. Think how a nuclear scientist’s perspective on his work could change if he spent time taking a little interest in wildlife and the environment. Also, check this.
  17. As you diversify, try seeing science in everything – eg: When you think about photography think not just about the art, but about the nature of the stuff you’re shooting, the wonders of the human eye and the consequences of the arrangement of rods and cones and the consequences of the eyeball being round, its tonal range compared to spectral cameras, the math of perspective, and the math of symmetry, etc.
  18. Just like setting photography assignments helps to ignite the creative spark in you, set projects and goals in every avenue that you diversify into. There’s no hurry. Take it one step at a time. And enjoy the process of discovery!
  19. How we study the scientific process/method should be integral to the way people think about education. A good analogy, although a sad one, is conservation and how biology is taught at schools. Very few teachers and schools will go out of their way to emphasize and interweave solutions for sustainable living and conserving wildlife within the material that they cover, even though they will more than easily get into the nitty-gritty of the taxonomy, the morphology, etc. You’ll find paragraphs and paragraphs of verbiage on the latter but not the former. This isn’t the model to replicate IMHO! There has to be a balance. We should be constantly reminded about what constitutes proper scientific ethic in our education, and it should not get to the point that it begins to fade away into the background.
  20. The current corporate-driven, public-interest research model is a mixed bag. Science shouldn’t in essence be something for the privileged or be monopolized in the hands of a few. Good ideas have the potential to get dropped if they don’t make business sense. Understand public and private funding models and their respective benefits and shortcomings. In the end realize that there are so many scientific questions out there to explore, that there’s enough to fill everybody’s plate! It’s not going to be the end of the world if your ideas or projects don’t receive the kind of funding you desire. It’s ultimately pretty arbitrary 🙂 ! Find creative solutions to modify your project or set more achievable goals. The other danger in monetizing scientific progress is the potential to inculcate the attitude of science for money. Doing science for the joy of it is much more satisfying than doing it for material gain IMHO. But different people have different preferences. It’s striking a balance that counts.
  21. The business model of science leads us into this whole concept of patent wars and Intellectual Property issues. IMHO there’s much value in having a free-culture attitude to knowledge, such as the open-access and open-source movements. Imagine what the world would be like if Gandhi had patented Satyagraha, requiring random licensing fees or other forms of bondage! 🙂
  22. It’s important to pursue science projects and conduct fairs and workshops even at the university level (just as much as it is emphasized in high school; I would say to an even greater degree actually). Just to keep the process of discovery and scientific spirit vibrant and alive, if for no other reason. Also, the more these activities reflect the inter-relationship between the three categories of science professionals and their work, the better. Institutions should recognize the need to encourage these activities for curricular credit, even if that means cutting down on other academic burdens. IMHO, on balance, the small sacrifice is worth it.
  23. Peer-review mechanisms currently reward originality. But at the same time, I think it’s important to reward repeatability/reproducibility. And to reward statistically insignificant findings. This not only helps remove bias in published research, but also helps keep the science community motivated in the face of a high failure rate in experiments, etc.
  24. Students should learn the art of benchmarking progress on a smaller scale, i.e. in the experiments, projects, etc. that they do. In the grand scheme of things however, we should realize that we may never be able to see humongous shifts in how we are doing in our lifetimes! 🙂

    Srinivasa Ramanujan

  25. A lot of stuff that happens at Ivy League universities can be classified as voodoo and marketing. So it’s important to not fret if you can’t get into your dream university. The ability to learn lies within and if appropriately tapped and channelized can be used to accomplish great stuff regardless of where you end up studying. People who graduate from Ivy League institutes form a wide spectrum, with a surprising number who could easily be regarded as brain-dead. IMHO what can be achieved is a lot more dependent on the person rather than the institution he or she goes to. If there’s a will, there’s a way! 🙂 Remember some of science’s most famous stalwarts like Michael Faraday and Srinivasa Ramanujan were largely self-taught!
  26. Understand the value of computing in science. Not only has this aspect been neglected at institutes (especially in Biology and Medicine), but it’s fast becoming indispensable because of the volume of data that one has to sift and process these days. I’ve recently written about bioinformatics and computer programming here and here.
  27. It’s important to develop a level of honesty and integrity that can withstand the onward thrust of cargo-cult science.
  28. Learn to choose wisely who your mentors are. Factor in student-friendliness, the time they can spend with you, and what motivates them to pursue science.
  29. I usually find myself repelled by hero-worship. But if you must, choose wisely who your scientific heroes are. Are they friendly to other beings and the environment? You’d be surprised at how many evil scientists there can be out there! 🙂
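
Here, as promised in point 12, is a minimal sketch (a toy setup of my own) of one classic pitfall of hypothesis testing: run enough tests at the 5% level and “significant” findings appear out of pure noise.

```python
# Multiple-comparisons pitfall: test twenty true null hypotheses per experiment
# and count how often at least one comes out "significant" purely by chance.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n_experiments, n_tests, n_per_group, alpha = 2000, 20, 30, 0.05

false_alarm_runs = 0
for _ in range(n_experiments):
    pvals = [ttest_ind(rng.normal(size=n_per_group),
                       rng.normal(size=n_per_group)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < alpha:
        false_alarm_runs += 1

print("runs with at least one spurious 'significant' result:",
      false_alarm_runs / n_experiments)   # roughly 1 - 0.95**20, i.e. ~0.64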

I’m sure there are many many points that I have missed and questions that I’ve left untouched. I’ll stop here though and add new stuff as and when it occurs to me later. Send me your comments, corrections and feedback and I’ll put them up here!

I have academic commitments headed my way and will be cutting down on my blogular activity for a while. But don’t worry, not for long! 🙂

I’d like to end now, by quoting one of my favorite photographers, George Steinmetz:

[Video]
George Steinmetz: “… I find that there is always more to explore, to question and, ultimately, to understand …”

Footnotes:

  1. Bialek, W., & Botstein, D. (2004). Introductory Science and Mathematics Education for 21st-Century Biologists. Science, 303(5659), 788-790. doi:10.1126/science.1095480


Copyright Firas MR. All Rights Reserved.

“A mote of dust, suspended in a sunbeam.”



Written by Firas MR

November 6, 2010 at 5:21 am

The Mucking About That Pervades Academia In Scientific Pursuit

with one comment

Bureaucracy (by Kongharald @ Flickr by-sa license)

Howdy readers!

I’ve not had the chance yet to delve into the bureaucracy of academia in science, having relegated it to future reading and followup. Some interesting reading material that I’ve put on my to-read list for future review is:


Academic medicine: a guide for clinicians
By Robert B. Taylor


Advice for a Young Investigator
By Santiago Ramón y Cajal, Neely Swanson, Larry W. Swanson

Do let me know if there are any others that you’ve found worth a look.

In the meantime, I just caught the following incisive read on the topic via a trackback to my blog from a generous reader:

    Lawrence, P. A. (2009). Real Lives and White Lies in the Funding of Scientific Research. PLoS Biol, 7(9), e1000197. doi:10.1371/journal.pbio.1000197

Writing about the odious bureaucratic tentacles that young academics have to navigate, author Peter Lawrence of Cambridge (UK) says that “the granting system turns young scientists into bureaucrats and then betrays them”.

He then goes on to describe in detail, with testimonies from scientists, how and why exactly that’s the case. He concludes that the status quo not only fundamentally perverts freedom in scientific pursuit, but also causes unnecessary waste, sometimes to the detriment of people’s careers and livelihoods despite their best endeavors to stay dedicated to the pursuit of scientific knowledge. And that this often leads to die-hard researchers dropping out of research altogether!

Some noteworthy excerpts (Creative Commons Attribution License):

[…]

The problem is, over and over again, that many very creative young people, who have demonstrated their creativity, can’t figure out what the system wants of them—which hoops should they jump through? By the time many young people figure out the system, they are so much a part of it, so obsessed with keeping their grants, that their imagination and instincts have been so muted (or corrupted) that their best work is already behind them. This is made much worse by the US system in which assistant professors in medical schools will soon have to raise their own salaries. Who would dare to pursue risky ideas under these circumstances? Who could dare change their research field, ever?—Ted Cox, Edwin Grant Conklin Professor of Biology, Director of the Program on Biophysics, Princeton University

[…]

the present funding system in science eats its own seed corn [2]. To expect a young scientist to recruit and train students and postdocs as well as producing and publishing new and original work within two years (in order to fuel the next grant application) is preposterous. It is neither right nor sensible to ask scientists to become astrologists and predict precisely the path their research will follow—and then to judge them on how persuasively they can put over this fiction. It takes far too long to write a grant because the requirements are so complex and demanding. Applications have become so detailed and so technical that trying to select the best proposals has become a dark art. For postdoctoral fellowships, there are so many arcane and restrictive rules that applicants frequently find themselves to be of the wrong nationality, in the wrong lab, too young, or too old. Young scientists who make the career mistake of concentrating on their research may easily miss the deadline for the only grant they might have won.

[…]

After more than 40 years of full-time research in developmental biology and genetics, I wrote my first grant and showed it to those experienced in grantsmanship. They advised me my application would not succeed. I had explained that we didn’t know what experiments might deliver, and had acknowledged the technical problems that beset research and the possibility that competitors might solve problems before we did. My advisors said these admissions made the project look precarious and would sink the application. I was counselled to produce a detailed, but straightforward, program that seemed realistic—no matter if it were science fiction. I had not mentioned any direct application of our work: we were told a plausible application should be found or created. I was also advised not to put our very best ideas into the application as it would be seen by competitors—it would be safer to keep those ideas secret.

The peculiar demands of our granting system have favoured an upper class of skilled scientists who know how to raise money for a big group [3]. They have mastered a glass bead game that rewards not only quality and honesty, but also salesmanship and networking. A large group is the secret because applications are currently judged in a way that makes it almost immaterial how many of that group fail, so long as two or three do well. Data from these successful underlings can be cleverly packaged to produce a flow of papers—essential to generate an overlapping portfolio of grants to avoid gaps in funding.

Thus, large groups can appear effective even when they are neither efficient nor innovative. Also, large groups breed a surplus of PhD students and postdocs that flood the market; many boost the careers of their supervisors while their own plans to continue in research are doomed from the outset. The system also helps larger groups outcompete smaller groups, like those headed by younger scientists such as K. It is no wonder that the average age of grant recipients continues to rise [4]. Even worse, sustained success is most likely when risky and original topics are avoided and projects tailored to fit prevailing fashions—a fact that sticks a knife into the back of true research [5]. As Sydney Brenner has said, “Innovation comes only from an assault on the unknown” [6].

How did all this come about? Perhaps because the selection process is influenced by two sets of people who see things differently. The first are the granting organisations whose employees are charged to spend the money wisely and who believe that the more detailed and complex the applications are, the more accurately they will be judged and compared. Over the years, the application forms have become encrusted with extra requirements.

Universities have whole departments devoted to filling in the financial sections of these forms. Liaison between the scientists and these departments and between the scientists and employees of the granting agencies has become more and more Kafkaesque.

The second set of people are the reviewers and the committee, usually busy scientists who themselves spend much time writing grants. They try to do their best as fast as they can. Generally, each reviewer reads just one or two applications and is asked to give each a semiquantitative rating (“outstanding,” “nationally competitive,” etc.). Any such rating must be whimsical because each reviewer sees few grants. It is particularly difficult to rank strongly original grants; for no one will know their chances of success. The committee are usually presented with only the applications that have received uniformly positive reviews—perhaps favouring conventional applications that upset no one. The committee might have 30 grants to place in order of priority, which is vital, as only the top few can be funded. I wonder if the semiquantitative and rather spurious ratings help make this ordering just [7]. I also suspect any gain in accuracy of assessment due to the detail provided in the applications does not justify the time it takes scientists to produce that detail.

[…]

At the moment, young people need a paper as a ticket for the next step, and we should therefore give deserving, but unlucky, students another chance. One way would be to put more emphasis on open interviews (with presentation by the candidate and questions from the audience) and references. Not objective? No, but only false objectivity is offered by evaluating real people using unreal calculations with numbers of papers, citations, and journal impact factors. These calculations have not only demoralised and demotivated the scientific community [13], they have also redirected our research and vitiated its purpose [14].

[…]

Reading the piece, one can’t help but get the feeling that the current paradigm – a “dark art”, as the author puts it – is a lot like lobbying in politics! It isn’t enough for someone to have an interest in pursuing a research career. Being successful at it requires an in-depth understanding of a lot of the red tape involved – something that is a fundamental aspect of academic life, and yet that isn’t usually brought up during career guidance talks, assessments of research aptitude, recruitment or what have you.

Do give the entire article a read. It’s worth it!

That does it for today. Until we meet again, cheers!

Copyright © Firas MR. All rights reserved.


Written by Firas MR

July 23, 2010 at 7:17 pm

On Literature Search Tools And Translational Medicine

with 2 comments

Courtesy danmachold@flickr (by-nc-sa license)

Howdy all!

Apologies for the lack of recent blogular activity. As usual, I’ve been swamped with academia.

Here are a couple of interesting pieces on literature search strategies & tools that caught my eye recently, some of which were quite new to me. Do check them out:

  • Matos, S., Arrais, J., Maia-Rodrigues, J., & Oliveira, J. (2010). Concept-based query expansion for retrieving gene related publications from MEDLINE. BMC Bioinformatics, 11(1), 212. doi:10.1186/1471-2105-11-212

[…]

The most popular biomedical information retrieval system, PubMed, gives researchers access to over 17 million citations from a broad collection of scientific journals, indexed by the MEDLINE literature database. PubMed facilitates access to the biomedical literature by combining the Medical Subject Headings (MeSH) based indexing from MEDLINE, with Boolean and vector space models for document retrieval, offering a single interface from which these journals can be searched [5]. However, and despite these strong points, there are some limitations in using PubMed or other similar tools. A first limitation comes from the fact that keyword-based searches usually lead to underspecified queries, which is a main problem in any information retrieval (IR) system [6]. This usually means that users will have to perform various iterations and modifications to their queries in order to satisfy their information needs. This process is well described in [7] in the context of information-seeking behaviour patterns in biomedical information retrieval. Another drawback is that PubMed does not sort the retrieved documents in terms of how relevant they are for the user query. Instead, the documents satisfying the query are retrieved and presented in reverse date order. This approach is suitable for such cases in which the user is familiar with a particular field and wants to find the most recent publications. However, if the user is looking for articles associated with several query terms and possibly describing relations between those terms, the most relevant documents may appear too far down the result list to be easily retrieved by the user.

To address the issues mentioned above, several tools have been developed in the past years that combine information extraction, text mining and natural language processing techniques to help retrieve relevant articles from the biomedical literature [8]. Most of these tools are based on the MEDLINE literature database and take advantage of the domain knowledge available in databases and resources like the Entrez Gene, UniProt, GO or UMLS to process the titles and abstracts of texts and present the extracted information in different forms: relevant sentences describing a biological process or linking two or more biological entities, networks of interrelations, or in terms of co-occurrence statistics between domain terms. One such example is the GoPubMed tool [9], which retrieves MEDLINE abstracts and categorizes them according to the Gene Ontology (GO) and MeSH terms. Another tool, iHOP [10], uses genes and proteins as links between sentences, allowing the navigation through sentences and abstracts. The AliBaba system [11] uses pattern matching and co-occurrence statistics to find associations between biological entities such as genes, proteins or diseases identified in MEDLINE abstracts, and presents the search results in the form of a graph. EBIMed [12] finds protein/gene names, GO annotations, drugs and species in PubMed abstracts showing the results in a table with links to the sentences and abstracts that support the corresponding associations. FACTA [13] retrieves abstracts from PubMed and identifies biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) co-occurring with the terms in the user’s query. The concepts are presented to the user in a tabular format and are ranked based on the co-occurrence statistics or on pointwise mutual information. More recently, there has been some focus on applying more detailed linguistic processing in order to improve information retrieval and extraction. Chilibot [14] retrieves sentences from MEDLINE abstracts relating to a pair (or a list) of proteins, genes, or keywords, and applies shallow parsing to classify these sentences as interactive, non-interactive or simple abstract co-occurrence. The identified relationships between entities or keywords are then displayed as a graph. Another tool, MEDIE [15], uses a deep-parser and a term recognizer to index abstracts based on pre-computed semantic annotations, allowing for real-time retrieval of sentences containing biological concepts that are related to the user query terms.

Despite the availability of several specific tools, such as the ones presented above, we feel that the demand for finding references relevant for a large set of [genes] is still not fully addressed. This constitutes an important query type, as it is a typical outcome of many experimental techniques. An example is a gene expression study, in which, after measuring the relative mRNA expression levels of thousands of genes, one usually obtains a subset of differentially expressed genes that are then considered for further analysis [16,17]. The ability to rapidly identify the literature describing relations between these differentially expressed genes is crucial for the success of data analysis. In such cases, the problem of obtaining the documents which are more relevant for the user becomes even more critical because of the large number of genes being studied, the high degree of synonymy and term variability, and the ambiguity in gene names.

While it is possible to perform a composite query in PubMed, or use a list of genes as input to some of the IR tools described above, these systems do not offer a retrieval and ranking strategy which ensures that the obtained results are sorted according to the relevance for the entire input list. A tool more oriented to analysing a set of genes is microGENIE [18], which accepts a set of genes as input and combines information from the UniGene and SwissProt databases to create an expanded query string that is submitted to PubMed. A more recently proposed tool, GeneE [19], follows a similar approach. In this tool, gene names in the user input are expanded to include known synonyms, which are obtained from four reference databases and filtered to eliminate ambiguous terms. The expanded query can then be submitted to different search engines, including PubMed. In this paper, we propose QuExT (Query Expansion Tool), a document indexing and retrieval application that obtains, from the MEDLINE database, a ranked list of publications that are most significant to a particular set of genes. Document retrieval and ranking are based on a concept-based methodology that broadens the resulting set of documents to include documents focusing on these gene-related concepts. Each gene in the input list is expanded to its various synonyms and to a network of biologically associated terms, namely proteins, metabolic pathways and diseases. Furthermore, the retrieved documents are ranked according to user-defined weights for each of these concept classes. By simply changing these weights, users can alter the order of the documents, allowing them to obtain for example, documents that are more focused on the metabolic pathways in which the initial genes are involved.

[…]

(Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.0)

  • Kim, J., & Rebholz-Schuhmann, D. (2008). Categorization of services for seeking information in biomedical literature: a typology for improvement of practice. Brief Bioinform, 9(6), 452-465. doi:10.1093/bib/bbn032
  • Weeber, M., Kors, J. A., & Mons, B. (2005). Online tools to support literature-based discovery in the life sciences. Brief Bioinform, 6(3), 277-286. doi:10.1093/bib/6.3.277

I’m sure there are many other nice ones out there. Don’t forget to also check out the NCBI Handbook – another great resource.
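As a small programmatic aside – and assuming the Biopython library is installed – here’s a minimal sketch of querying PubMed through NCBI’s E-utilities; the e-mail address and the search term below are placeholders of my own choosing, not anything from the papers above:

```python
# A minimal sketch of querying PubMed programmatically, assuming Biopython is installed
# (pip install biopython). The e-mail address is a placeholder you should replace,
# since NCBI asks for a contact address with E-utilities requests.
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder - use your own address

# Search MEDLINE/PubMed for a simple keyword query and fetch the matching PMIDs.
handle = Entrez.esearch(db="pubmed", term="aniridia AND PAX6", retmax=10)
record = Entrez.read(handle)
handle.close()

print("Total hits:", record["Count"])
print("First PMIDs:", record["IdList"])

# The returned PMIDs can then be passed to Entrez.efetch() to pull titles and abstracts.
```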

————————————————————————————————————

On a separate note, a couple of NIH-affiliated authors have written some thought-provoking stuff about Translational Medicine:

  • Nussenblatt, R., Marincola, F., & Schechter, A. (2010). Translational Medicine – doing it backwards. Journal of Translational Medicine, 8(1), 12. doi:10.1186/1479-5876-8-12

[…]

The present paradigm of hypothesis-driven research poorly suits the needs of biomedical research unless efforts are spent in identifying clinically relevant hypotheses. The dominant funding system favors hypotheses born from model systems and not humans, bypassing the Baconian principle of relevant observations and experimentation before hypotheses. Here, we argue that that this attitude has born two unfortunate results: lack of sufficient rigor in selecting hypotheses relevant to human disease and limitations of most clinical studies to certain outcome parameters rather than expanding knowledge of human pathophysiology; an illogical approach to translational medicine.

[…]

A recent candidate for a post-doctoral fellowship position came to the laboratory for an interview and spoke of the wish to leave in vitro work and enter into meaningful in vivo work. He spoke of an in vitro observation with mouse cells and said that it could be readily applied to treating human disease. Indeed his present mentor had told him that was the rationale for doing the studies. When asked if he knew whether the mechanisms he outlined in the mouse existed in humans, he said that he was unaware of such information and upon reflection wasn’t sure in any event how his approach could be used with patients. This is a scenario that is repeated again and again in the halls of great institutions dedicated to medical research. Any self respecting investigator (and those they mentor) knows that one of the most important new key words today is “translational”. However, in reality this clarion call for medical research, often termed “Bench to Bedside” is far more often ignored than followed. Indeed the paucity of real translational work can make one argue that we are not meeting our collective responsibility as stewards of advancing the health of the public. We see this failure in all areas of biomedical research, but as a community we do not wish to acknowledge it, perhaps in part because the system, as it is, supports superb science. Looking this from another perspective, Young et al [2] suggest that the peer-review of journal articles is one subtle way this concept is perpetuated. Their article suggests that the incentive structure built around impact and citations favors reiteration of popular work, i.e., more and more detailed mouse experiments, and that it can be difficult and dangerous for a career to move into a new arena, especially when human study is expensive of time and money.

[…]

(Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.0)

Well, I guess that does it for now. Hope those articles pique your interest as much as they did mine. Until we meet again, adios 🙂 !

Copyright © Firas MR. All rights reserved.

Written by Firas MR

June 29, 2010 at 4:33 pm

Why Equivalence Studies Are So Fascinating

with 4 comments

Bronze balance pans and lead weights from the Vapheio tholos tomb, circa 15th century BC. National Museum, Athens. Shot courtesy dandiffendale@Flickr. by-nc-sa license.

Objectives and talking points:

  • To recap basic concepts of hypothesis testing in scientific experiments. Readers should read up on hypothesis testing in reference works.
  • To contrast drug vs. placebo and drug vs. standard drug study designs.
  • To contrast non-equivalence and equivalence studies.
  • To understand implications of these study designs, in terms of interpreting study results.

——————————————————————————————————–

Howdy readers! Today I’m going to share with you some very interesting concepts from a fabulous book that I finished recently – “Designing Clinical Research – An Epidemiologic Approach” by Stephen Hulley et al. Fairly early on, the book discusses what are called “equivalence studies”. Equivalence studies are truly fascinating. Let’s see how.

When a new drug is tested for efficacy, there are multiple ways for us to do so.

A Non-equivalence Study Of Drug vs. Placebo

A drug can be compared to something that doesn’t have any treatment effect whatsoever – a ‘placebo’. Examples of placebos include sugar tablets, distilled water, inert substances, etc. Because pharmaceutical companies try hard to make drugs that have a treatment effect and that are thus different from placebos, the objective of such a comparison is to answer the following question:

Is the new drug any different from the placebo?

Note the emphasis on ‘any different’. As is usually the case, a study of this kind is designed to test for differences between drug and placebo effects in both directions1. That is:

Is the new drug better than the placebo?

OR

Is the new drug worse than the placebo?

The boolean operator ‘OR’, is key here.

Since we cannot conduct such an experiment on all people in the target ‘population’ (eg. all people with diabetes from the whole country), we conduct it on a random and representative ‘sample’ of this population (eg. randomly selected diabetes patients from the whole country). Because of this, we cannot directly extrapolate our findings to the target population without doing some fancy roundabout thinking and a lot of voodoo first – a.k.a. ‘hypothesis testing’. Hypothesis testing is crucial to take into account random chance (error) effects that might have crept into the experiment.

In this experiment:

  • The null hypothesis is that the drug and the placebo DO NOT differ in the real world2.
  • The alternative hypothesis is that the drug and the placebo DO differ in the real world.

So off we go, with our experiment with an understanding that our results might be influenced by random chance (error) effects. Say that, before we start, we take the following error rates to be acceptable:

  1. Even if the null hypothesis is true in the real world, we would find that the drug and the placebo DO NOT differ 95% of the time. [Although this rate doesn’t have a name, it is equal to (1 – Type 1 error)].
  2. Even if the null hypothesis is true in the real world, we would find that the drug and the placebo DO differ 5% of the time, purely by random chance. [This rate is also called our Type 1 error, or critical level of significance, or critical α level, or critical ‘p’ value].
  3. If the alternative hypothesis is true in the real world, we would find that the drug and the placebo DO differ only 80% of the time. [This rate is also called the ‘Power‘ of the experiment. It is equal to (1 – Type 2 error)].
  4. If the alternative hypothesis is true in the real world, we would find that the drug and the placebo DO NOT differ 20% of the time, purely by random chance. [This rate is also called our Type 2 error]. (A small simulation of these rates follows this list.)
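Here is the small simulation promised above – my own sketch, not something from the book – assuming Python with NumPy and SciPy; the sample size, effect size and number of simulated trials are arbitrary choices:

```python
# A toy simulation of Type 1 error and Power for a drug vs. placebo comparison.
# Assumptions (mine, for illustration): normally distributed outcomes,
# n = 50 per arm, and a "real" drug effect of 0.6 SD under the alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, alpha, trials = 50, 0.05, 5000

def fraction_rejected(true_effect):
    """Run many simulated experiments and count how often p < alpha."""
    rejections = 0
    for _ in range(trials):
        placebo = rng.normal(0.0, 1.0, n)
        drug = rng.normal(true_effect, 1.0, n)
        _, p = stats.ttest_ind(drug, placebo)
        if p < alpha:
            rejections += 1
    return rejections / trials

print("Type 1 error (null true):", fraction_rejected(0.0))   # should hover near 0.05
print("Power (alternative true):", fraction_rejected(0.6))   # fraction of real effects detected
```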

The strategy of the experiment is this:

If we are able to accept these error rates and show in our experiment that the null hypothesis is false (that is, ‘reject‘ it), the only other hypothesis left on the table is the alternative hypothesis. This then has GOT to be true, and we thus ‘accept’ the alternative hypothesis.

Q: With what degree of uncertainty?

A: With the uncertainty that we might arrive at such a conclusion 5% of the time, even if the null hypothesis is true in the real world.

Q: In English please!

A: With the uncertainty that we might arrive at a conclusion that the drug DOES differ from the placebo 5% of the time, even if the drug DOES NOT differ from the placebo in the real world.

Our next question would be:

Q: How do we reject the null hypothesis?

A: We proceed by initially assuming that the null hypothesis is true in the real world (i.e. Drug effect DOES NOT differ from Placebo effect in the real world). We then use a ‘test of statistical significance’ to calculate the probability – under that assumption – of observing a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis. We do this with the belief that such a conclusion is within our pre-selected margin of error. Our pre-selected margin of error, as mentioned previously, is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)3.

If we fail to show that this calculated probability is <5%, we ‘fail to reject‘ the null hypothesis and conclude that a difference in effect has not been proven4.

A lot of scientific literature out there is riddled with drug vs. placebo studies. This kind of thing is good if we do not already have an effective drug for our needs. Usually though, we already have a standard drug that we know works well. It is of more interest to see how a new drug compares to our standard drug.

A Non-equivalence Study Of Drug vs. Standard Drug

These studies are conceptually the same as drug vs. placebo studies and the same reasoning for inference is applied. These studies ask the following question:

Is the new drug any different than the standard drug?

Note the emphasis on ‘any different’. As is often the case, a study of this kind is designed to test the difference between the two drugs in both directions1. That is:

Is the new drug better than the standard drug?

OR

Is the new drug worse than the standard drug?

Again, the boolean operator ‘OR’, is key here.

In this kind of experiment:

  • The null hypothesis is that the new drug and the standard drug DO NOT differ in the real world2.
  • The alternative hypothesis is that the new drug and the standard drug DO differ in the real world.

Exactly like we discussed before, we initially assume that the null hypothesis is true in the real world (i.e. the new drug’s effect DOES NOT differ from the standard drug’s effect in the real world). We then use a ‘test of statistical significance’ to calculate the probability – under that assumption – of observing a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. Just to repeat ourselves here, our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)3.

If we fail to show that this calculated probability is <5%, we ‘fail to reject’ the null hypothesis and conclude that a difference in effect has not been proven4.

An Equivalence Study Of Drug vs. Standard Drug

Sometimes all you want is a drug that is as good as the standard drug. This can be for various reasons – the standard drug is just too expensive, just too difficult to manufacture, just too difficult to administer, … and so on – whereas the new drug might not have these undesirable qualities, yet retains the same treatment effect.

In an equivalence study, the incentive is to prove that the two drugs are the same. Like we did before, let’s explicitly formulate our two hypotheses:

  • The null hypothesis is that the new drug and the standard drug DO NOT differ in the real world2.
  • The alternative hypothesis is that the new drug and the standard drug DO differ in the real world.

We are mainly interested in proving the null hypothesis. Since this can’t be done4, we’ll be content with ‘failing to reject’ the null hypothesis. Our strategy is to design a study powerful enough to detect a difference close to 0 and then ‘fail to reject’ the null hypothesis. In doing so, although we can’t ‘prove’ for sure that the null hypothesis is true, we can nevertheless be more comfortable saying that it in fact is true.

In order to detect a difference close to 0, we have to increase the Power of the study from the usual 80% to something like 95% or higher. We want to maximize power to detect the smallest difference possible. Usually though, it’s enough if we are able to detect the largest difference that doesn’t have clinical meaning (eg: a 4 mm Hg difference on a BP measurement). This way we can compromise a little on Power and choose a less extreme figure, say 88% or something.
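To see why “powerful enough to detect a small difference” quickly translates into large sample sizes, here is a rough simulation sketch – my own illustration, assuming Python with NumPy and SciPy; the SD of 10 and the 4-unit true difference are arbitrary numbers, loosely echoing the BP example above:

```python
# A rough sketch of how sample size drives the power to detect a small difference
# (the kind of "largest difference without clinical meaning" mentioned above).
# Assumptions (mine): normally distributed outcomes with SD = 10 (think mm Hg),
# a true difference of 4, and a two-sided t-test at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, true_diff, sd, trials = 0.05, 4.0, 10.0, 2000

def estimated_power(n_per_arm):
    """Estimate power by counting how often the t-test detects the true difference."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, sd, n_per_arm)
        b = rng.normal(true_diff, sd, n_per_arm)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (25, 50, 100, 200, 400):
    print(f"n = {n:4d} per arm -> power ~ {estimated_power(n):.2f}")
```

The printed powers climb steadily with n, which is the whole point: chasing a tiny difference with very high power demands a much bigger (and costlier) study.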

And then, just as in our previous examples, we proceed with the assumption that the null hypothesis is true in the real world. We then use a ‘test of statistical significance’ to calculate the probability – under that assumption – of observing a difference in treatment effect as large as or larger than the one actually observed in the experiment. If this probability is <5%, we reject the null hypothesis – with the belief that such a conclusion is within our pre-selected margin of error. And to repeat ourselves yet again (boy, do we like doing this 😛 ), our pre-selected margin of error is that we would be wrong about rejecting the null hypothesis 5% of the time (our Type 1 error rate)3.

If we fail to show that this calculated probability is <5%, we ‘fail to reject‘ the null hypothesis and conclude that although a difference in effect has not been proven, we can be reasonably comfortable saying that there is in fact no difference in effect.

So Where Are The Gotchas?

If your study isn’t designed or conducted properly (eg: without enough power, an inadequate sample size, improper randomization, loss of subjects to follow-up, inaccurate measurements, etc.), you might end up ‘failing to reject’ the null hypothesis when, had you taken the necessary precautions, you would have come to the opposite conclusion – purely because of random chance (error) effects. Such improper study designs usually dampen any obvious differences in treatment effect in the experiment.

In a non-equivalence study, researchers, whose incentive it is to reject the null hypothesis, are thus forced to make sure that their designs are rigorous.

In an equivalence study, this isn’t the case. Since researchers are motivated to ‘fail to reject’ the null hypothesis from the get go, it becomes an easy trap to conduct a study with all kinds of design flaws and very conveniently come to the conclusion that one has ‘failed to reject’ the null hypothesis!

Hence, it is extremely important, more so in equivalence studies than in non-equivalence studies, to have a critical and alert mind during all phases of the experiment. Interpreting an equivalence study published in a journal is hard, because one needs to know the very guts of everything the research team did!

Even though we have discussed these concepts with drugs as an example, you could apply the same reasoning to many other forms of treatment interventions.

Hope you’ve found this post interesting 🙂 . Do send in your suggestions, corrections and comments!

Adios for now!

Copyright © Firas MR. All rights reserved.

Readability grades for this post:

Flesch reading ease score: 71.4
Automated readability index: 8.1
Flesch-Kincaid grade level: 7.4
Coleman-Liau index: 9
Gunning fog index: 11.8
SMOG index: 11

1. An alternative hypothesis for such a study is called a ‘two-tailed alternative hypothesis‘. A study that tests for differences in only one direction has an alternative hypothesis that is called a ‘one-tailed alternative hypothesis‘.
2. This situation is a good example of a ‘null’ hypothesis also being a ‘nil’ hypothesis. A null hypothesis is usually a nil hypothesis, but it’s important to realize that this isn’t always the case.
4. Note that we never use the term, ‘accept the null hypothesis’.

A Brief Tour Of The Field Of Bioinformatics

with 10 comments

This is an example of a full genome sequencing machine. It is the ABI PRISM 3100 Genetic Analyzer. Sequencers like it completely automate the process of sequencing the entire genome. Yes, even yours! [Courtesy: Wikipedia]

Some Background Before The Tour

Ahoy readers! I’ve had the opportunity to read a number of books recently. Among them, is “Developing Bioinformatics Computer Skills” by Cynthia Gibas and Per Jambeck. I dived into the book straight away, having no basic knowledge at all of what comprises the field of bioinformatics. Actually, it was quite like the first time I started medical college. On our first day, we were handed a tiny handbook on human anatomy, called “Handbook Of General Anatomy” by B D Chaurasia. Until actually opening that book, absolutely no one in the class had any idea of what Medicine truly was. All we had with us were impressions of charismatic white-coats who could, as if by magic, diagnose all kinds of weird things by the mere touch of a hand. Not to mention, legendary tales from the likes of Discovery Channel. Oh yes, our expectations were of epic proportions 😛 . As we flipped through the pages of that little book, we were flabbergasted by the sheer volume of information that one had to rote. It had soon become clear to us, what medicine was all about – Physiology is the study of normal body functions akin to physics, Anatomy is the study of the structural organization of the human body a la geography … – and this set us on the path to learning to endure an avalanche of learn-by-rote information for the rest of our lives.

Bioinformatics is shrouded in mystery for most medics, because so many of these ideas are completely new. The technologies are new. The data available are new. Before the human genome was sequenced, there was virtually no point in using computers to understand genes and alleles. Most of what needed to be sorted out could be done by hand. But now that we have huge volumes of data, and data that are growing at an exponential rate at that, it makes sense to use computers to connect the dots and frame hypotheses. I guess bioinformatics is a conundrum to most other people too – whether you are coming from a math background, a computer science background or a biology background, we all have something missing from our repertoire of knowledge and skills.

What is the rationale behind using computation to understand genes? In days of yore, all we had were a couple of known genes. We had the tools of Mendelian genetics and linkage analysis to solve most of the genetic mysteries. The human genome project changed that. We are suddenly flooded not only with sequences that we don’t know anything about, but also with the gigantic hurdle of finding relationships between them. To give you a sense of the magnitude of the numbers we’re talking about here: we could simplify DNA’s 3-D structure and represent the entire genetic code contained in a single polynucleotide strand of the human genome as a string of letters A, C, G or T, each representing a given nucleotide base in a long sequence (like so …..ATCGTTACGTAAAA…..). Since it has been found that this strand is approximately 3 billion bases long, its entire length comes to 3 billion bytes. That’s because each letter A, T, C or G could be thought of as being represented by a single ASCII character. And we all know that an ASCII character is equal to 1 byte of data. Since we are talking about two complementary strands within a molecule of DNA, the amount of information within the genome is 6 billion bytes§. But human cells are diploid! So the amount of DNA information in the nucleus of a single human cell is 12 billion bytes! That’s roughly 12 gigabytes of data neatly packed into the DNA sequence of every cell – and we haven’t even begun to talk about the 3-D structure of DNA or the sequence and 3-D structure of RNA and proteins yet!

§ Special thanks to Martijn for bringing this up in the comments: If you really think about it for a moment, bioinformaticians don’t need to store the sequences of both the DNA strands of a genome in a computer, because the sequence of one strand can be derived from the other – they are complementary by definition. If you store 3 billion bytes from one strand, you can easily derive the complementary 3 billion bytes of information on the other strand, provided that the two strands are truly complementary and there aren’t any blips of mismatch mutations between them. Using this concept, you can get away with storing 3 billion bytes and not 6 billion bytes to capture the information in the human genome.

Special thanks also to Dr. Atul Butte ¥ of Stanford University who dropped by to say that a programmer really doesn’t need a full byte to store a nucleic acid base. A base can be represented by 2 bits (eg. 00 for A, 11 for C, 01 for G and 10 for T). Since 1 byte contains 8 bits, a byte can actually hold 4 bases. Without compression. So 3 billion bases can be held within 750,000,000 bytes. That’s 715 megabytes (1 megabyte = 1048576 bytes), which can easily fit on to an extended-length CD-ROM (not even a DVD). So the entire genetic code from a single polynucleotide strand of the human genome can easily fit on to a single CD-ROM. Since human cells are diploid, with two CD-ROMs – one CD-ROM for each set of chromosomes – you can capture this information for both sets of chromosomes. [go back]
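To make the arithmetic in these footnotes concrete, here is a small sketch of the 2-bit packing idea in Python – my own illustration; the bit assignments follow the example above, and the short test sequence is just an arbitrary one:

```python
# A toy illustration of the storage arithmetic above: pack 4 bases into each byte
# using the 2-bit code suggested in the footnote (00 = A, 11 = C, 01 = G, 10 = T).
CODE = {"A": 0b00, "C": 0b11, "G": 0b01, "T": 0b10}

def pack(seq):
    """Pack a DNA string into a bytearray, 4 bases per byte (no compression)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return out

print(len(pack("ATGGCTCCTATGCGGTTAAAATTT")))       # 24 bases -> 6 bytes
print(f"{3_000_000_000 / 4 / 1_048_576:.0f} MB")   # one genome strand -> about 715 MB
```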

To compound the issue, we don’t have a taxonomy system in place to describe the sequences we have. When Linnaeus invented his taxonomy system for living things, he used basic morphologic criteria to classify organisms. If it walked like a duck and talked like a duck, it was a duck! But how do you apply this reasoning to genes? You might think, why not classify them by organism? But there’s a more subtle issue here too. Some of these genetic sequences can be classified into various categories – is this sequence a promoter, an exon, an intron, or could it be a sequence that plays a role in growth, death, inflammatory response, and so on? Not only that, many sequences could be found in more than one organism. So how do you solve the problem of classification? Man’s answer to this problem is simple – you don’t!

Here’s how we can get away with that. Simply create a relational database using MySQL, PostgreSQL or what have you, and create appropriate links between sequence entries, their functions, etc. Run queries to find relationships and voila, there you have it! This was our first step in developing bioinformatics as a field: building databases (a minimal sketch follows the two lists below). You can do this with a DNA sequence (a string of letters A for ‘adenine’, C for ‘cytosine’, G for ‘guanine’ and T for ‘thymine’, represented like so: …ATGGCTCCTATGCGGTTAAAATTT…) or with an RNA sequence (a string of letters A for ‘adenine’, C for ‘cytosine’, G for ‘guanine’ and U for ‘uracil’, like so: …AUGGCACCCU…) or even a protein sequence (a string over a 20-letter alphabet, each letter representing one amino acid). By breaking down and simplifying a 3-D structure this way, you can suddenly enhance data storage, retrieval and, more importantly, analysis between:

  1. Two or more sequences of DNA
  2. Two or more sequences of RNA
  3. Two or more sequences of Protein

You can even find relationships between:

  1. A DNA sequence and an RNA sequence
  2. An RNA sequence and a Protein sequence
  3. A DNA sequence and a Protein sequence
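Here is the minimal database sketch promised above – my own illustration, using Python’s built-in sqlite3 module as a stand-in for MySQL/PostgreSQL; the table layout, the organisms and the ‘eye development’ annotation are invented purely for demonstration:

```python
# A minimal sketch of the "just build a relational database" idea, using Python's
# built-in sqlite3 as a stand-in for MySQL/PostgreSQL. Table and column names here
# are invented for illustration only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sequence (id INTEGER PRIMARY KEY, organism TEXT, kind TEXT, residues TEXT);
CREATE TABLE annotation (seq_id INTEGER REFERENCES sequence(id), role TEXT);
""")
con.execute("INSERT INTO sequence VALUES (1, 'Drosophila melanogaster', 'DNA', 'ATGGCTCCTATG')")
con.execute("INSERT INTO sequence VALUES (2, 'Homo sapiens', 'DNA', 'ATGGCACCCGTT')")
con.execute("INSERT INTO annotation VALUES (1, 'eye development')")
con.execute("INSERT INTO annotation VALUES (2, 'eye development')")

# Query: which sequences, from any organism, share the same annotated role?
rows = con.execute("""
SELECT s.organism, s.residues, a.role
FROM sequence s JOIN annotation a ON a.seq_id = s.id
WHERE a.role = 'eye development'
""").fetchall()
for row in rows:
    print(row)
```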

If you can represent the spatial coordinates of the atoms within a protein’s 3-D structure as Cartesian coordinates (x, y, z), you can not only analyze structure within a given protein, but also try to predict the best possible 3-D structure for a protein that is hypothetically synthesized from a given DNA or RNA sequence. In fact, that is the Holy Grail of bioinformatics today: how to predict protein structure from a DNA sequence? And, consequently, how to manipulate protein structure to suit your needs.

The Tour Begins

Let’s take a tour of what bioinformatics holds for us.

The Ability To Build Relational Databases

We have already discussed this above.

Local Sequence Comparison

An example of sequence alignment. Alignment of 27 avian influenza hemagglutinin protein sequences colored by residue conservation (top) and residue properties (bottom) [Courtesy: Wikipedia]

Before we delve into the idea of sequence comparisons further, let’s take an example from the bioinformatics book I mentioned, to understand how sequence comparisons help in the real world. It speaks of a gene-knockout experiment that targets a specific sequence in the fruit fly’s (Drosophila melanogaster) genome. Knocking this sequence out results in the flies’ progeny being born without eyes. By knocking this gene – called eyeless – out, you learn that it somehow plays an important role in eye development in the fruit fly. There’s a similar (but not quite the same) condition in humans called aniridia, in which eyes develop in the usual manner, except for the lack of an iris. Researchers were able to identify the particular gene that causes aniridia and called it aniridia. By inserting the aniridia gene into an eyeless-knockout Drosophila’s genome, they observed that suddenly its offspring bore eyes! Remarkable, isn’t it? Somehow there’s a connection between two genes separated not only by different species, but also by genera and phyla. To discern how each of these genes functions, you proceed by asking whether the two sequences could be the same. How similar might they be, exactly? To answer this question you could do an alignment of the two sequences. This is the absolute basic kind of stuff we do in sequence analysis.

Instead of doing it by hand (which could be possible if the sequences being compared were small), you could find the best alignment between these two long sequences using a program such as BLAST. There are a number of ways BLAST can work. Because the two sequences may have only certain regions that fit nicely, with other regions that don’t – called gaps – you can have multiple ways of aligning them side by side. But what you are interested in is finding the best fit that maximizes how much they overlap with each other (and minimizes gaps). Here’s where computer science comes into play. In order to maximize overlap, you use the concept of ‘dynamic programming‘. It is helpful to understand dynamic programming as an algorithmic strategy rather than a program per se (it’s not like you’ll be sitting in front of a computer and programming code if you want to compare eyeless and aniridia; the BLAST program will do the dirty work for you, using dynamic programming code that’s built into it). Amazingly enough, dynamic programming is not something as hi-fi as you might think. It is apparently the same strategy used in many computer spell-checkers! Little did the bioinformaticians who first developed dynamic programming techniques in genetics know that the concept of dynamic programming had been discovered far earlier by others. There are apparently many such cases in bioinformatics where scientists keep reinventing the wheel, purely because it is such an interdisciplinary field! One of the most common dynamic programming algorithms, used for aligning specific sequences within a genome, is the Smith-Waterman algorithm. Besides dynamic programming, another useful approach in bioinformatics is what is called a greedy algorithm. In a greedy algorithm, you are interested in maximizing overlap at each baby-step as you construct the alignment, without regard to the final overlap. In other words, it doesn’t matter to you how the sequences overlap in the end, as long as at each step of the way during the alignment process you maximize overlap. Other concepts in alignment include using a (substitution) matrix of possible scores when two letters – one from each sequence – are paired, and trying to maximize the total score using dynamic programming. Common matrices for this purpose are BLOSUM-62, BLOSUM-45 and PAM (Point Accepted Mutation).
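To make the dynamic-programming idea concrete, here is a toy Smith-Waterman local alignment in Python; the +2/-1/-2 scoring scheme is my own arbitrary choice (standing in for a BLOSUM/PAM matrix), and real tools like BLAST layer fast heuristics on top of this kind of computation:

```python
# A toy Smith-Waterman local alignment: fill a score matrix with dynamic programming,
# then trace back from the best-scoring cell until the score drops to zero.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))

print(smith_waterman("ATGGCTCCTATG", "ATGGCACCTTTG"))
```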

So now that we know the basic idea behind sequence alignment, here’s what you can actually do in sequence analysis:

  1. Using alignment, find a sequence from a database (eg. GenBank from the NCBI) that maximizes overlap between it and a sequence that isn’t yet in the database. This way, if you discover some new sequence, you can find relationships between it and known sequences. If the sequence in the database is associated with a given protein, you might be able to look for it in your specimen. This is called pairwise alignment.
  2. Just as you can compare two sequences and find out if there is a statistically significant association between them or not, you can also compare multiple sequences at once. This is called multiple sequence alignment.
  3. If certain regions of two sequences are the same, it can be inferred that they are conserved across species or organisms despite environmental stresses and evolution. A sequence encoding development of the eye is very likely to remain unchanged across multiple species for which sight is an essential function to survive. Here comes another interesting concept – phylogenetic relationships between organisms at a genetic level. Using alignment it is possible to develop phylogenetic trees and phylogenetic networks that link two or more gene sequences and as a consequence find related proteins.
  4. Similar to finding evolutionary homology between sequences as above, one could also look for homology between protein structures – motifs – and then conclude that the regions of DNA encoding these proteins have a certain degree of homology.
  5. There are tools in sequence analysis that look at features characteristic of known functioning regions of DNA and see if the same features exist in a random sequence. This process is called gene finding. You’re trying to discover functionality in hitherto unknown sequences of DNA. This is important, as the vast majority of genetic code is, as far as we know, non-functional random junk. Could there be some region in this vast ocean of randomness that might, just might, have an interesting function? Gene finding uses software that looks for tRNA-encoding regions, promoter sites, open reading frames, exon-intron splicing regions, … – in short, the whole gamut of what we know is characteristic of functional code – in random junk. Once a statistically significant result is obtained, you’re ready to test this in a lab! (A toy ORF-finding sketch follows this list.)
  6. A special situation in sequence alignment is whole genome alignment (or global alignment). That is, finding the best fit between entire genomes of different organisms! Despite how arduous this sounds, the underlying ideas are pretty similar to local sequence alignment. One of the most common dynamic programming algorithms used in whole genome alignment is the Needleman–Wunsch algorithm.
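Here is the toy ORF-finding sketch referred to in point 5 above – my own crude illustration of one ingredient of gene finding; real gene finders weigh many more features (promoters, splice sites, codon bias, etc.) and also scan the reverse strand:

```python
# A crude toy of the "gene finding" idea: scan a DNA string, on the forward strand only,
# for open reading frames (ATG ... up to the first in-frame stop codon).
STOP = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, seq[i:j + 3]))
                i = j  # continue the scan after this ORF
            else:
                i += 3
    return orfs

for start, end, orf in find_orfs("CCATGGCTCCTTAAGGATGAAACGTTGACC"):
    print(start, end, orf)
```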

Many of the things discussed for sequence analysis of DNA, have equal counterparts for RNA and proteins.

Protein Structure Property Analysis

Say that you have an amino acid sequence for a protein, and there’s nothing in the databases that matches your sequence. In order to build a 3-D model of this protein, you’ll need to predict what the best possible shape could be, given the constraints of bond angles, electrostatic forces between constituent atoms, etc. There’s a specific technique that warrants mentioning here – the Ramachandran plot – which uses information on steric hindrance to show which combinations of backbone angles (and hence which local 3-D conformations) are plausible for an amino acid sequence. With a 3-D model, you could try to predict this protein’s chemical properties (such as pKa, etc.). You could also look for active sites on this protein – the crucial regions that bind to substrates – based on known structures of active sites from other proteins… and so on.
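As a taste of what such property calculations look like in practice, here is a minimal sketch assuming the Biopython library (Bio.SeqUtils.ProtParam) is installed; the input sequence is just an arbitrary example, and of course this computes bulk properties from the sequence alone rather than from an actual 3-D model:

```python
# A small sketch of computing bulk chemical properties for a protein sequence,
# assuming Biopython is installed. The sequence below is an arbitrary example.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

protein = ProteinAnalysis("MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGE")

print("Molecular weight:", round(protein.molecular_weight(), 1))
print("Isoelectric point:", round(protein.isoelectric_point(), 2))
print("Amino acid composition:", protein.count_amino_acids())
```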

This figure depicts an unrooted phylogenetic tree for myosin, a superfamily of proteins. [Courtesy: Wikipedia]

Protein Structure Alignment

This is when you try to find the best fit between two protein structures. The idea is very similar to sequence alignment, only this time the algorithms are a bit different. In most cases, the algorithms for this process are computationally intensive and rely on trial and error. You could build phylogenetic trees based on structural evolutionary homology too.

Protein Fingerprint Analysis

This is basically using computational tools to identify relationships between two or more proteins by analyzing their breakdown products – their peptide fingerprints. Using protein fragments, it is possible to compare entire cocktails of different proteins. How does the protein mixture from a human retinal cell compare to the protein mixture from the retinal cell of a mouse? This kind of stuff is called Proteomics, because you’re comparing the entire protein complement of one organism to that of another. You could also analyze protein fragments from different cells within the same organism to see how they might have evolved or developed.
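As a toy example of the fingerprinting idea, here is a minimal Python sketch of an in-silico trypsin digest (cleave after K or R, except before P) applied to two made-up sequences; real proteomics workflows match the fragments against mass spectra, but the set-comparison spirit is similar:

```python
# A toy in-silico "peptide fingerprint": digest two protein sequences with the usual
# trypsin rule (cut after K or R, but not when the next residue is P) and compare
# the resulting peptide sets. The sequences are invented for illustration.
def tryptic_digest(protein):
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return set(peptides)

seq_one = tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR")
seq_two = tryptic_digest("MKWVTFLLLLLFSSAYSRGVFRR")

print("Shared peptides:", seq_one & seq_two)
print("Unique to one sequence:", seq_one ^ seq_two)
```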

DNA Micro-array Analysis

A DNA microarray is a slide with hundreds of tiny dots on it, each dot carrying probes for a given genetic sequence. A dot glows under UV (or another form of) light when fluorescently labelled material from the cells being studied binds to it – which happens when those cells express that sequence (i.e. transcribe it into RNA, which in turn may be translated into protein). By applying the same population of cells to these dots and measuring the amount of light coming from each dot, you could develop a gene expression profile for those cells. You could then study the expression profiles of these cells under different environmental conditions to see how they behave and change.

You could also inoculate different dots with different cell populations and study how their expression profiles differ. Example: normal gastric epithelium vs cancerous gastric epithelium.

Of course, you could try looking at all these light-emitting dots with your own eyes and counting manually. If you want to take a shot at it, you might even be able to tell the difference between the different levels of brightness between dots! But why not use computers to do the job for you? There are software tools out there that can quantitatively measure these expression profiles for you.
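As a toy illustration of the quantitative side, here is a minimal sketch in Python; the spot intensities are invented numbers, and real microarray pipelines add normalization and proper statistics on top of this:

```python
# A toy version of what microarray analysis software does quantitatively: compare
# spot intensities between two conditions (numbers below are invented) and report
# a log2 fold change per gene/spot.
import math

normal = {"TP53": 1200.0, "MUC1": 300.0, "CDH1": 900.0}
cancer = {"TP53": 400.0, "MUC1": 2400.0, "CDH1": 850.0}

for gene in normal:
    fold = math.log2(cancer[gene] / normal[gene])
    flag = "up" if fold > 1 else "down" if fold < -1 else "unchanged"
    print(f"{gene}: log2 fold change = {fold:+.2f} ({flag} in cancerous epithelium)")
```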

Primer Design

There are many experiments, and indeed diagnostic tests, that use artificially synthesized DNA sequences as anchors that flank a specific region of interest in the DNA of a cell, so that this region can be amplified. By amplify, we mean: make multiple copies. These flanking sequences are called primers. Applications include, for example, amplifying DNA material of the HIV virus to better detect the presence or absence of HIV in the blood of a patient. This kind of test or experiment is called the polymerase chain reaction (PCR). There are a number of other applications of primers, such as gene cloning, genetic hybridization, etc. Primers ought to be constructed in specific ways that prevent them from forming loops or binding to non-specific sites on cell DNA. How do you find the best candidate for a primer? Of course, computation!
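Here is a minimal sketch of the kind of quick checks primer-design software performs, using the well-known Wallace rule for melting temperature; the candidate primers are arbitrary examples, and real tools screen for much more (hairpins, self-dimers, off-target binding):

```python
# A toy primer check: GC content and a rough melting temperature via the Wallace rule
# (Tm ~ 2 degrees C per A/T plus 4 degrees C per G/C), a common back-of-the-envelope
# estimate for short primers.
def primer_stats(primer):
    gc = sum(primer.count(b) for b in "GC")
    at = sum(primer.count(b) for b in "AT")
    gc_content = 100.0 * gc / len(primer)
    tm = 2 * at + 4 * gc
    return gc_content, tm

for candidate in ("ATGCGTACCTGAAGTCCA", "ATATATATATATATATAT"):
    gc_content, tm = primer_stats(candidate)
    print(f"{candidate}: GC = {gc_content:.0f}%, Wallace Tm ~ {tm} C")
```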

Metabolomics

A fancy word for modeling metabolic pathways and their relationships using computational analyses. How does the glycolytic pathway relate to some random metabolic pathway found in the neurons of the brain? Computational tools help identify potential relationships between all of these different pathways and help you map them. In fact, there are metabolic pathway maps out there on the web that continually get updated to reflect this fascinating area of ongoing research.
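To give a flavour of how such pathway relationships can be represented computationally, here is a toy sketch in Python; the pathway names and the links between them are drastically simplified placeholders of my own choosing, not real pathway data:

```python
# A toy metabolic "map": represent pathways as a graph of shared metabolites and ask
# whether two pathways are connected. The links below are simplified placeholders.
from collections import deque

shares_metabolites_with = {
    "glycolysis": ["TCA cycle", "pentose phosphate pathway"],
    "TCA cycle": ["glycolysis", "oxidative phosphorylation"],
    "pentose phosphate pathway": ["glycolysis"],
    "oxidative phosphorylation": ["TCA cycle", "some neuronal pathway"],
    "some neuronal pathway": ["oxidative phosphorylation"],
}

def connected(a, b):
    """Breadth-first search for a chain of shared metabolites linking pathway a to b."""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in shares_metabolites_with.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(connected("glycolysis", "some neuronal pathway"))  # True, via the TCA cycle and oxphos
```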

I guess that covers a whole lot of what bioinformatics is all about. When it comes to definitions, some people say that bioinformatics is the application part whereas computational biology is the part that mainly deals with the development of algorithms.

Neologisms Galore!

As you can see, some fancy new words have come into existence as a result of all this frenzied activity:

  • Genomics: Strictly speaking, the study of entire genomes of organisms/cells. In bioinformatics, this term is applied to any studies on DNA.
  • Transcriptomics: Strictly speaking, the study of entire transcriptomes (the complete set of RNA transcripts made from the DNA) of organisms/cells. In bioinformatics, this term is applied to any studies on RNA.
  • Proteomics: Strictly speaking, the study of the entire set of proteins made by organisms/cells. In bioinformatics, this term is applied to any studies on proteins. Structural biology is a special branch of proteomics that explores the 3-D structure of proteins.
  • Metabolomics: The study of entire metabolic pathways in organisms/cells. In bioinformatics, this term is applied to any studies on metabolic pathways and their inter-relationships.

Real World Impact

So what can all of this theoretical ‘data-dredging’ give us anyway? Short answer – hypotheses. Once you have a theoretical hypothesis for something, you can test it in the lab. Without forming intelligent hypotheses, humanity might very well take centuries to experiment with every possible permutation or combination of the data that has been amassed so far – data which, mind you, continues to grow as we speak!

Thanks to bioinformatics, we are now discovering genetic relationships between different diseases that were hitherto considered completely unrelated – such as diabetes mellitus and rheumatoid arthritis! Scientists like Dr. Atul Butte [go back] and his team are trying to reclassify all known diseases using all of the data that we’ve been able to gather from Genomics. Soon, the days of the traditional International Classification of Diseases (ICD) might be gone. We might some day have a genetic ICD!

Sequencing of individual human genomes (technology for this already exists and many commercial entities out there will happily sequence your genome for a fee) could help in detecting or predicting disease susceptibility.

Proteins could be substituted between organisms (a la pig and human insulin) and, better yet, completely manipulated to suit an objective – such as drug delivery or effectiveness. Knowing a DNA sequence would give you enough information to predict protein structure and function, giving you yet another tool in diagnosis.

And the list of possibilities is endless!

Bioinformatics is thus man’s attempt at making biology and medicine a predictive science 🙂 .

Further Reading

I haven’t had the chance to read any other books on bioinformatics, what with exams just a couple of months away. Having read “Developing Bioinformatics Computer Skills” and found it a little too dense, especially in the last couple of chapters, I would only recommend it as an introductory text to someone who already has some knowledge of computer algorithms. Because different algorithms have different caveats and statistical gotchas, it makes sense to have a sound understanding of what each of these algorithms does. Although the authors have done a pretty decent job of describing the essentials, the explanations of the algorithms and how they really function are a bit complicated for the average biologist. It’s difficult for me to recommend a book that I haven’t read, but here are two that I consider worth exploring in the future:

Understanding Bioinformatics
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum

Introduction to Bioinformatics: A Theoretical and Practical Approach
Introduction to Bioinformatics: A Theoretical And Practical Approach by Stephen Krawetz and David Womble

As books to refresh my knowledge of molecular biology and genetics I’m considering the following:

Molecular Biology of the Cell
Molecular Biology Of The Cell by Bruce Alberts et al


Molecular Biology Of The Gene by none other than James D Watson himself et al (Of ‘Watson & Crick‘ model of DNA fame)

Let me know if you have any other suggested readings in the comments1.

There are also a number of excellent OpenCourseWare lectures on bioinformatics out on the web (for example, at AcademicEarth.org). For beginners though, I suggest Dr. Daniel Lopresti’s (Lehigh University) fantastic high-level introduction to the field here. Also don’t forget to check out “A Short Course On Synthetic Genomics” by George Church and Craig Venter on Edge.org for a fascinating overview of what might lie ahead in the future! In the race to sequence the human genome, Craig Venter headed the main private company that posed competition to the NIH’s project. His group of researchers ultimately developed a much faster way to sequence the genome than had previously been imagined – the shotgun sequencing method.

Hope you’ve enjoyed this high level tour. Do send in your thoughts, suggestions and corrections!

UPDATE 1: Check out Dr. Eric Lander‘s (one of the stalwarts behind the Human Genome Project) excellent lecture at The Royal Society from 2005, called Beyond the Human Genome Project – Medicine in the 21st Century, which tries to give you the big picture on this topic.

UPDATE 2: Also check out NEJM’s special review on Genomics called Genomics — An Updated Primer.

Copyright © Firas MR. All rights reserved.

Your feedback counts:

1. Dr. Atul Butte ¥ suggests checking out some of the excellent material at NCBI’s Bookshelf. [go back]

Readability grades for this post:

Flesch reading ease score: 57.4
Automated readability index: 10.8
Flesch-Kincaid grade level: 9.7
Coleman-Liau index: 11.5
Gunning fog index: 13.4
SMOG index: 12.2
