Before I begin today’s discussion (since it concerns another book), a quick plug for Steve McCurry, whose photography I deeply admire and whose recent photo-essays on the subject of reading are especially inspirational and worth checking out. I quote:
“Reading is a means of thinking with another person’s mind; it forces you to stretch your own.” — Charles Scribner
Susan Sontag said: “The camera makes everyone a tourist in other people’s reality.” The same can be said for reading books.
Every once in a while, I receive feedback from readers as to how much they appreciate some of my writing on non-clinical/non-medical subjects. Sometimes, the subject matter concerns books or web resources that I’ve recently read. Occasionally, I also like taking notes as I happen to read this material. And often, friends, family and colleagues ask me questions on topics that I’ve either read a book about or have made notes on. Note-taking is a good habit as you grow your comprehension of things. In my opinion, it also helps you skeletonize reading material – sort of like building a quick ‘Table Of Contents’ – that you can utilize to build your knowledge base as you assimilate more and more.
If you’ve ever visited a college bookstore in India, you’ll find dozens and dozens of what are popularly referred to as “guides” or “guidebooks”. These contain summaries and notes on all kinds of subjects – from medicine to engineering and beyond. They help students:
- Get verbosity in their main coursebooks (often written in English that is more befitting the Middle Ages) out of the way to focus on skeletonizing material
- Cram before exams
I tend to think of my notes and summaries of recently-read books as guidebooks: anchor points that I (and often family or friends) can come back to later on, sometimes when I’ve long forgotten a lot of the material!
I write this summary in this spirit. So with all of that behind us, let’s begin.
I stumbled upon an enticing little book recently, called “Learning the BASH shell”, by Cameron Newham & Bill Rosenblatt. Being the technophile that I am, I just couldn’t resist taking a peek.
I’ve always been fascinated by the innards of computers – from how they’re made and assembled to how they are programmed and used. My first real foray into them began with learning some of the fundamentals of DOS and BASIC on an old 286 (I think) as a 7th grader. Those were the days of pizza-box styled CPU-case form factors, monochrome monitors that had a switch that would turn text green, hard disks that were in the MB range, RAM that was measured in KB, and people who thought 3.5 inch floppies were cool. Oh boy, I still do remember the way people used to go gaga over double-sided, high-density, pre-formatted and stuff! As I witnessed the emergence of CDs and then later DVDs and now SSDs and portable HDs, I got my hands dirty on the 386, the 486, the Pentium 1, the Pentium 3, the Pentium 4 (still working!) and my current main workstation, which is a Core 2 Duo. Boy, have I come a long way!
Over the years I’ve read a number of books on computer hardware (this one and this one recently – more on them for a future post) and software applications and Operating Systems (such as this one on GIMP, this one on GPG, this one, this one and this one on Linux and this one and this one on FreeBSD – again, more on them later!). But there was always one cranny that seemed far too daunting to approach. Yup, programming. Utterly jargoned, the world of modern programming has seemed really quite esoteric & complicated to me compared with the old days, when BASIC and dBASE were enough to fill your plate. Having forgotten more than 95% of my BASIC doesn’t help either. Ever since reading about computational biology or bioinformatics (see my summary of a book on the topic here), I’ve been convinced that getting at least a superficial handle on computer programming concepts can mean a lot in terms of having a competitive edge if you ever contemplate being in the research world.
This interplay between technology and biology, and the level to which our research has evolved over the past few decades, was further reinforced by something I read recently in an interview with Kary Mullis, the inventor of PCR, who eventually won the Nobel Prize for his work:
What I do personally is the research, which I can do from home because of the Internet, which pleases me immensely. I don’t need to go to a library; I don’t need to even talk to people face to face.
There are now whole books and articles geared towards programming and biology. I recommend the great introductory essay, Why Biologists Want to Program Computers, by James Tisdall.
“Learning the BASH shell” is a fascinating newbie-friendly introduction to the world of programming and assumes only rudimentary familiarity with how computers work or with computer programming in general. It certainly helps if you have a working understanding of Linux or any one of the Unix operating system flavors, but if you’re on Windows you can get by using Cygwin. I’ve been using Linux for the last couple of years (originally beginning with Ubuntu 6.06, then Arch Linux and Debian, Debian being my current favorite), so this background certainly helped me grasp some of the core concepts much faster.
So what exactly is programming anyway? Well, think of programming as a means to talk to your computer to carry out tasks. Deep down, computers understand nothing but the binary number system (e.g. “copy this file from here to there” translates into gibberish like …010001100001111000100110…). Not something that most humans would find even remotely appealing (apparently some geeks’ favorite pastime is reverse-engineering human-friendly language from binary!). Now most of us are familiar with using a mouse to point-and-click our way to getting tasks done. But sometimes it becomes necessary to speak to our computers in more direct terms. This ultimately comes down to entering a ‘programming environment’, typing words in a special syntax (depending on what programming language you use), saving these words in a file and then translating the file and the words it contains into a language the computer can understand (binary). The computer then executes tasks according to the words you typed. Most languages can broadly be divided into:
- Compiler-based: Words in the programming language need to be converted into binary using a program called a ‘compiler’. The binary file can then be run independently. (e.g. the C programming language)
- Interpreter-based: Words in the programming language are translated on-the-fly, each time the program runs, by an intermediary program called an ‘interpreter’. Because this translation happens while the program is running, interpreted programs generally run slower than their compiled counterparts. (e.g. the Perl or Python programming languages)
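To make the interpreter route a little more concrete, here is a minimal sketch of "save the words in a file, then let an interpreter deal with them". It assumes a Unix-like system with bash and mktemp available; the temporary file and the message are throwaway examples of mine, not something taken from the book.

```shell
#!/usr/bin/env bash
# Save a one-line program as plain text in a temporary file (a throwaway example).
script="$(mktemp)"
printf 'echo "Hello from an interpreted script"\n' > "$script"

# Hand the text file to the bash interpreter, which reads and runs it on the spot.
# No compiler is involved and no separate binary file is produced.
output="$(bash "$script")"
echo "$output"

# Tidy up the temporary file.
rm -f "$script"
```

With a compiled language such as C, there would be an extra step between saving the file and running it: the compiler would first turn the text into a binary.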
What is BASH?
BASH is first and foremost a ‘shell’. If you’ve ever opened up a Command Prompt or CLI (Command Line Interface) on Windows (Start Menu > Accessories > Command Prompt), then you’ve seen what a shell looks like: something that provides a text interface to communicate with the innards of your operating system. We’re used to doing stuff the GUI way (Graphical User Interface), using attractive buttons, windows and graphics. Think of the shell as just an alternative means to talk to your computer. Phone-line vs. paper-mail, if that metaphor helps.
Alright, so we get that BASH provides us with an interface. But what else does it do? Well, BASH is also an interpreted programming language! That is amazing because it allows you to use your shell to create programs for repetitive or complicated multi-step tasks. A little digression into Unix philosophy is worthwhile here. Unix-derivative operating systems, unlike others, stress breaking complicated tasks into tiny bits. Each bit is to be worked on by a program that specializes in that given component of a task. sort is a Unix program that sorts text. cut snips off a chunk of text from a larger whole. grep is used to find text. sed is used to replace text. The find program is used to find files and directories. And so on. If you need to find a given file, then look for certain text in it, yank out a portion of it, replace part of this chunk, then sort it in ascending or descending order, all you do is combine find, grep, sed, cut and sort using the proper syntax. But what if you didn’t really want to replace text? Then all you do is omit sed from the workflow. See, that’s the power of Unix-based OS(s) like Linux or FreeBSD. Flexibility.
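Here is a rough sketch of what chaining those five programs might look like. The directory, the file name genes.txt and its contents are all made up for illustration, and the function name pipeline is my own; only find, grep, sed, cut and sort are the real Unix programs discussed above.

```shell
#!/usr/bin/env bash
# Build a throwaway directory holding one sample file (name and contents are made up).
workdir="$(mktemp -d)"
printf 'gene:TP53:chr17\nnote:ignore me\ngene:BRCA1:chr17\ngene:APOE:chr19\n' > "$workdir/genes.txt"

pipeline() {
  find "$1" -type f -name '*.txt' -exec cat {} + |  # find: locate the files and print them
    grep '^gene:' |                                 # grep: keep lines that start with "gene:"
    sed 's/chr/chromosome /' |                      # sed: replace "chr" with "chromosome "
    cut -d: -f2,3 |                                 # cut: keep the 2nd and 3rd colon-separated fields
    sort                                            # sort: ascending order
}

result="$(pipeline "$workdir")"
printf '%s\n' "$result"

# Remove the throwaway directory.
rm -rf "$workdir"
```

Each program does one small job and passes its output to the next through the pipe symbol. Deleting the sed line leaves a perfectly valid pipeline that simply skips the replacement step.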
The BASH programming language takes simple text files as its input. Then an interpreter called bash reads the words (commands, etc.) and carries them out, one after another. It’s really as simple as that. Because BASH embraces the Unix philosophy, it assumes you’ll need to use the various Unix-type programs to get stuff done. So at the end of the day, a BASH program looks a lot like:
execute the Unix program date
assign the output of date to variable x
if x = 8 AM
then execute these Unix programs in this order (find, grep, sed, cut, sort, etc.)
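In actual BASH syntax, that outline might be sketched roughly as follows. The function name check_hour and the printed messages are my own placeholders, and the optional argument is only there so that the 8 AM branch can be tried at any time of day.

```shell
#!/usr/bin/env bash
check_hour() {
  # Execute the Unix program date and assign its output (the hour, 00 to 23) to x.
  # An optional argument overrides the clock so either branch can be tried out.
  local x="${1:-$(date +%H)}"

  # if x = 8 AM, then this is where the other Unix programs would be run.
  if [ "$x" = "08" ]; then
    echo "8 AM: run find, grep, sed, cut, sort here"
  else
    echo "hour $x: nothing to do"
  fi
}

check_hour      # uses the real clock
check_hour 08   # pretends it is 8 AM
```

The echo lines are stand-ins; in a real script they would be replaced by the actual chain of programs.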
Basic Elements of Programming
In general, programming consists of breaking down complicated tasks into bits using unambiguous language in a standard syntax.
The fundamental idea (using BASH as an example) is to:
- Construct variables.
- Manipulate variables. Add, subtract, change their text content, etc.
- Use Conditions such as if/then (referred to in technobabble as “Flow Control”)
- Execute Unix programs based on said Conditions
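The four steps above can be sketched in a few lines of BASH. The variable names and values (count, name, label) are invented purely for illustration.

```shell
#!/usr/bin/env bash
# 1. Construct variables.
count=3
name="sample"

# 2. Manipulate variables: arithmetic and changes to text content.
count=$((count + 2))        # 3 + 2, so count is now 5
label="${name}_${count}"    # join the two pieces into sample_5

# 3. Flow control: an if/then condition.
if [ "$count" -gt 4 ]; then
  # 4. Execute a Unix program based on the condition: tr converts the text to upper case.
  printf '%s\n' "$label" | tr '[:lower:]' '[:upper:]'
else
  printf '%s\n' "$label"
fi
```

Since count ends up greater than 4, the first branch runs and tr gets to do its job; change the starting value of count to 1 and the label is printed untouched instead.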
All it takes to get going is learning the syntax of framing your thoughts. And for some languages this can get hairy.
This explains why some of the most popular programming languages out there try to emulate human language as much as possible in their syntax. And why a popular language such as Perl was in fact developed by a linguist!
This was just a brief and extremely high-level introduction to basic concepts in programming. Do grab yourself a copy and dive into “Learning the BASH shell” with the aforementioned framework in mind. Before you know it, you’ll start putting two and two together and be on your way to developing your own nifty program!
I’m going to end for today with some of the additional excellent learning resources that I’m currently exploring to take my quest further:
- Steve Parker’s BASH tutorial (extremely easy to follow along)
- Greg’s BASH Guide (another one recommended for absolute noobs)
- Learning to Program Using Python – A Tutorial for Hobbyists, Self-Starters, and All Who Want to Learn the Art of Computer Programming by Alan Gauld
- How to think like a Computer Scientist – Learning with Python by Jeffrey Elkner, Allen B. Downey, and Chris Meyers
UPDATE 1: If you’re looking for a programming language to begin with and have come down to either Perl or Python, but are finding it difficult to choose one over the other, then I think you’ll find the following article by the famous Open Source Software advocate, Eric S. Raymond, a useful read: Why Python?
UPDATE 2: A number of resourceful, science-minded people at SciPy conduct workshops aimed at introducing Python and its applications in science. They have a great collection of introductory videos on Python programming concepts & syntax here. Another group, called FOSSEE, has a number of workshop videos introducing Python programming here. They also have a screencast series on the subject here.
UPDATE 3: AcademicEarth.org has quite a number of useful lecture series and Open Courseware material on learning programming and basic Computer Science concepts. Check out the MIT lecture, “Introduction to Computer Science and Programming” which is specifically designed for students with little to no programming experience. The lecture focuses on Python.
Copyright Firas MR. All rights reserved.
- Download the Stream Player plugin as a zip. Extract it locally. Rename the player.swf file to player-swf.jpg
- Upload player-swf.jpg to your WordPress.com Media Library. Don’t worry, WordPress.com will not complain since it thinks it’s being given a JPG file!
- Next insert the gigya shortcode as explained at Panos’ website. I inserted the following between square brackets, [ ] :
gigya src="http://mydominanthemisphere.files.wordpress.com/2010/11/player-swf.jpg" width="512" wmode="transparent" allowFullScreen="true" quality="high" flashvars="file=http://ia311014.us.archive.org/1/items/scipy09_introTutorialDay1_1/scipy09_introTutorialDay1_1_512kb.mp4&image=http://ia311014.us.archive.org/1/items/scipy09_introTutorialDay1_1/scipy09_introTutorialDay1_1.thumbs/scipy09_introTutorialDay1_1_000180.jpg&provider=http"
flashvars are separated by ampersands, like flashvars="file=MOVIE URL HERE&image=IMAGE URL HERE". The provider="http" parameter to flashvars states that we would like to enable skipping within the video stream.
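If you have several videos to embed, the ampersand-joining can be left to a tiny BASH helper. This is only a sketch: the function name build_flashvars is hypothetical, and the values are the same placeholders used above rather than real URLs.

```shell
#!/usr/bin/env bash
build_flashvars() {
  # Join every argument (each one a key=value pair) with "&".
  # "$*" glues the arguments together using the first character of IFS.
  local IFS='&'
  printf '%s\n' "$*"
}

flashvars="$(build_flashvars "file=MOVIE URL HERE" "image=IMAGE URL HERE" "provider=http")"
echo "flashvars=\"$flashvars\""
```

The printed line can then be pasted into the gigya shortcode in place of the hand-typed flashvars value.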