On Literature Search Tools And Translational Medicine
Apologies for the lack of recent blogular activity. As usual, I’ve been swamped with academia.
A couple of interesting pieces on literature search strategies & tools that caught my eye recently, some of which were quite new to me. Do check them out:
- Matos, S., Arrais, J., Maia-Rodrigues, J., & Oliveira, J. (2010). Concept-based query expansion for retrieving gene related publications from MEDLINE. BMC Bioinformatics, 11(1), 212. doi:10.1186/1471-2105-11-212
The most popular biomedical information retrieval system, PubMed, gives researchers access to over 17 million citations from a broad collection of scientific journals, indexed by the MEDLINE literature database. PubMed facilitates access to the biomedical literature by combining the Medical Subject Headings (MeSH) based indexing from MEDLINE, with Boolean and vector space models for document retrieval, offering a single interface from which these journals can be searched. However, and despite these strong points, there are some limitations in using PubMed or other similar tools. A first limitation comes from the fact that keyword-based searches usually lead to underspecified queries, which is a main problem in any information retrieval (IR) system. This usually means that users will have to perform various iterations and modifications to their queries in order to satisfy their information needs. This process is well described in […] in the context of information-seeking behaviour patterns in biomedical information retrieval. Another drawback is that PubMed does not sort the retrieved documents in terms of how relevant they are for the user query. Instead, the documents satisfying the query are retrieved and presented in reverse date order. This approach is suitable for such cases in which the user is familiar with a particular field and wants to find the most recent publications. However, if the user is looking for articles associated with several query terms and possibly describing relations between those terms, the most relevant documents may appear too far down the result list to be easily retrieved by the user.
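To make the difference between date ordering and relevance ranking concrete, here is a minimal Python sketch. It is an illustration only, not code from Matos et al. and not how PubMed itself is implemented. It compares a Boolean AND filter sorted newest-first with a textbook TF-IDF/cosine vector space ranking over a made-up four-document collection; the document IDs, years and texts are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: (doc_id, year, abstract text). Not real MEDLINE records.
DOCS = [
    ("A", 2010, "brca1 expression in mouse tissue"),
    ("B", 2008, "brca1 interacts with rad51 in dna repair"),
    ("C", 2009, "rad51 foci and brca1 brca2 in homologous repair"),
    ("D", 2011, "a review of imaging methods"),
]


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    tokenized = [tokenize(text) for _, _, text in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_date(query, docs):
    """Boolean AND match, newest first (the behaviour described above)."""
    terms = set(tokenize(query))
    hits = [d for d in docs if terms <= set(tokenize(d[2]))]
    return [doc_id for doc_id, _, _ in sorted(hits, key=lambda d: d[1], reverse=True)]


def rank_by_relevance(query, docs):
    """Vector space ranking: cosine similarity of TF-IDF vectors, best first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the collection get zero weight.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(cosine(q_vec, vec), doc[0]) for doc, vec in zip(docs, vectors)]
    scored = [s for s in scored if s[0] > 0]  # drop documents with zero similarity
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: s[0], reverse=True)]


print(rank_by_date("brca1 rad51 repair", DOCS))       # ['C', 'B']
print(rank_by_relevance("brca1 rad51 repair", DOCS))  # ['B', 'C', 'A']
```

With the query `brca1 rad51 repair`, the date-ordered Boolean result puts the newer abstract C first, while the cosine ranking puts B first and also surfaces A, which only partially matches the query.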
To address the issues mentioned above, several tools have been developed in the past years that combine information extraction, text mining and natural language processing techniques to help retrieve relevant articles from the biomedical literature. Most of these tools are based on the MEDLINE literature database and take advantage of the domain knowledge available in databases and resources like the Entrez Gene, UniProt, GO or UMLS to process the titles and abstracts of texts and present the extracted information in different forms: relevant sentences describing a biological process or linking two or more biological entities, networks of interrelations, or in terms of co-occurrence statistics between domain terms. One such example is the GoPubMed tool, which retrieves MEDLINE abstracts and categorizes them according to the Gene Ontology (GO) and MeSH terms. Another tool, iHOP, uses genes and proteins as links between sentences, allowing the navigation through sentences and abstracts. The AliBaba system uses pattern matching and co-occurrence statistics to find associations between biological entities such as genes, proteins or diseases identified in MEDLINE abstracts, and presents the search results in the form of a graph. EBIMed finds protein/gene names, GO annotations, drugs and species in PubMed abstracts showing the results in a table with links to the sentences and abstracts that support the corresponding associations. FACTA retrieves abstracts from PubMed and identifies biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) co-occurring with the terms in the user’s query. The concepts are presented to the user in a tabular format and are ranked based on the co-occurrence statistics or on pointwise mutual information. More recently, there has been some focus on applying more detailed linguistic processing in order to improve information retrieval and extraction.
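The two ranking options mentioned for FACTA differ in an instructive way. Pointwise mutual information compares how often a query term q and a concept c appear together with how often they would co-occur by chance: PMI(q, c) = log2 [ P(q, c) / (P(q) P(c)) ]. The Python sketch below estimates those probabilities as document frequencies; that estimator is an assumption on my part (the excerpt does not spell out FACTA's exact formula), and the concept names and counts are invented purely for illustration.

```python
import math


def pmi(n_query, n_concept, n_both, n_total):
    """Pointwise mutual information (base 2) estimated from document counts."""
    if n_both == 0:
        return float("-inf")  # never co-occur: treat PMI as minus infinity
    p_both = n_both / n_total
    p_query = n_query / n_total
    p_concept = n_concept / n_total
    return math.log2(p_both / (p_query * p_concept))


def rank_concepts(n_query, concept_counts, n_total, by="pmi"):
    """Rank concepts co-occurring with a query term.

    concept_counts maps concept -> (docs mentioning the concept,
                                    docs mentioning both concept and query).
    by="pmi" ranks by PMI; by="count" ranks by raw co-occurrence count.
    """
    def score(item):
        _, (n_concept, n_both) = item
        if by == "count":
            return n_both
        return pmi(n_query, n_concept, n_both, n_total)

    return [name for name, _ in sorted(concept_counts.items(), key=score, reverse=True)]


# Invented counts: 1,000,000 abstracts, 1,000 of which mention the query term.
N_TOTAL, N_QUERY = 1_000_000, 1_000
EXAMPLE_COUNTS = {
    "frequent_concept": (200_000, 400),  # PMI = log2(2)   = 1.0
    "moderate_concept": (10_000, 100),   # PMI = log2(10)  ~ 3.3
    "specific_concept": (500, 50),       # PMI = log2(100) ~ 6.6
}

print(rank_concepts(N_QUERY, EXAMPLE_COUNTS, N_TOTAL, by="count"))
# ['frequent_concept', 'moderate_concept', 'specific_concept']
print(rank_concepts(N_QUERY, EXAMPLE_COUNTS, N_TOTAL))
# ['specific_concept', 'moderate_concept', 'frequent_concept']
```

Raw co-occurrence favours the concept that shows up everywhere, whereas PMI promotes the rarer concept that co-occurs with the query far more often than chance would predict.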
Chilibot retrieves sentences from MEDLINE abstracts relating to a pair (or a list) of proteins, genes, or keywords, and applies shallow parsing to classify these sentences as interactive, non-interactive or simple abstract co-occurrence. The identified relationships between entities or keywords are then displayed as a graph. Another tool, MEDIE, uses a deep-parser and a term recognizer to index abstracts based on pre-computed semantic annotations, allowing for real-time retrieval of sentences containing biological concepts that are related to the user query terms.
Despite the availability of several specific tools, such as the ones presented above, we feel that the demand for finding references relevant for a large set of genes is still not fully addressed. This constitutes an important query type, as it is a typical outcome of many experimental techniques. An example is a gene expression study, in which, after measuring the relative mRNA expression levels of thousands of genes, one usually obtains a subset of differentially expressed genes that are then considered for further analysis [16,17]. The ability to rapidly identify the literature describing relations between these differentially expressed genes is crucial for the success of data analysis. In such cases, the problem of obtaining the documents which are more relevant for the user becomes even more critical because of the large number of genes being studied, the high degree of synonymy and term variability, and the ambiguity in gene names.
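A common way to cope with synonymy and ambiguity in gene names is to expand each gene into an OR-group of its synonyms and to drop any synonym that is shared by more than one gene. The Python sketch below does exactly that and then builds an NCBI E-utilities `esearch` URL for the resulting query (`db`, `term` and `retmax` are standard E-utilities parameters; the URL is only constructed here, not fetched). The synonym table is hand-written for illustration rather than pulled from Entrez Gene or UniProt, and the helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

# Illustrative, hand-written synonym table (NOT pulled from Entrez Gene/UniProt).
# "p21" is listed under two genes on purpose, to show ambiguity filtering.
SYNONYMS = {
    "TP53": ["TP53", "p53", "LFS1"],
    "CDKN1A": ["CDKN1A", "p21", "CIP1", "WAF1"],
    "HRAS": ["HRAS", "p21", "HRAS1"],
}

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def ambiguous_terms(synonyms):
    """Synonyms (compared case-insensitively) that appear under more than one gene."""
    counts = Counter(term for syns in synonyms.values() for term in {s.lower() for s in syns})
    return {term for term, n in counts.items() if n > 1}


def build_query(genes, synonyms):
    """OR together one parenthesised synonym group per gene, minus ambiguous synonyms."""
    drop = ambiguous_terms(synonyms)
    groups = []
    for gene in genes:
        terms = [s for s in synonyms.get(gene, [gene]) if s.lower() not in drop]
        terms = terms or [gene]  # never leave a gene with an empty group
        quoted = [f'"{t}"' if " " in t else t for t in terms]  # quote multi-word terms
        groups.append("(" + " OR ".join(quoted) + ")")
    return " OR ".join(groups)


def esearch_url(query, retmax=100):
    """Build (but do not fetch) an E-utilities esearch URL for PubMed."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


query = build_query(["TP53", "CDKN1A", "HRAS"], SYNONYMS)
print(query)
# (TP53 OR p53 OR LFS1) OR (CDKN1A OR CIP1 OR WAF1) OR (HRAS OR HRAS1)
print(esearch_url(query))
```

Here `p21` is discarded because the table lists it under both CDKN1A and HRAS. The genes are OR-ed together so that any abstract mentioning at least one of them is retrieved, which leaves the job of putting the most relevant ones first to a separate ranking step.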
While it is possible to perform a composite query in PubMed, or use a list of genes as input to some of the IR tools described above, these systems do not offer a retrieval and ranking strategy which ensures that the obtained results are sorted according to the relevance for the entire input list. A tool more oriented to analysing a set of genes is microGENIE, which accepts a set of genes as input and combines information from the UniGene and SwissProt databases to create an expanded query string that is submitted to PubMed. A more recently proposed tool, GeneE, follows a similar approach. In this tool, gene names in the user input are expanded to include known synonyms, which are obtained from four reference databases and filtered to eliminate ambiguous terms. The expanded query can then be submitted to different search engines, including PubMed. In this paper, we propose QuExT (Query Expansion Tool), a document indexing and retrieval application that obtains, from the MEDLINE database, a ranked list of publications that are most significant to a particular set of genes. Document retrieval and ranking are based on a concept-based methodology that broadens the resulting set of documents to include documents focusing on these gene-related concepts. Each gene in the input list is expanded to its various synonyms and to a network of biologically associated terms, namely proteins, metabolic pathways and diseases. Furthermore, the retrieved documents are ranked according to user-defined weights for each of these concept classes. By simply changing these weights, users can alter the order of the documents, allowing them to obtain, for example, documents that are more focused on the metabolic pathways in which the initial genes are involved.
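The excerpt does not give QuExT's actual scoring formula, so the Python sketch below uses an assumed stand-in: each document is scored by a weighted sum, over concept classes, of how many expanded concepts it mentions. The concept labels, documents and weights are all hypothetical; the point is simply that changing per-class weights reorders the same result set, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    # concept class -> concepts recognised in the abstract (hypothetical annotations)
    concepts: dict[str, set[str]] = field(default_factory=dict)


def score(doc, expanded, weights):
    """Assumed scoring: sum over classes of weight * number of expanded concepts matched."""
    total = 0.0
    for cls, wanted in expanded.items():
        matched = len(doc.concepts.get(cls, set()) & wanted)
        total += weights.get(cls, 0.0) * matched
    return total


def rank(docs, expanded, weights):
    return [d.doc_id for d in sorted(docs, key=lambda d: score(d, expanded, weights), reverse=True)]


# Hypothetical expansion of the input genes {TP53, CDKN1A} into concept classes.
EXPANDED = {
    "gene": {"TP53", "CDKN1A"},
    "protein": {"p53 protein", "p21 protein"},
    "pathway": {"p53 signaling", "cell cycle"},
    "disease": {"Li-Fraumeni syndrome"},
}

DOCS = [
    Doc("d1", {"gene": {"TP53", "CDKN1A"}}),
    Doc("d2", {"gene": {"TP53"}, "pathway": {"p53 signaling", "cell cycle"}}),
    Doc("d3", {"gene": {"TP53"}, "disease": {"Li-Fraumeni syndrome"}}),
]

# Two hypothetical user-defined weightings of the concept classes.
GENE_FOCUSED = {"gene": 1.0, "protein": 0.5, "pathway": 0.2, "disease": 0.5}
PATHWAY_FOCUSED = {"gene": 0.2, "protein": 0.2, "pathway": 1.0, "disease": 0.5}

print(rank(DOCS, EXPANDED, GENE_FOCUSED))     # ['d1', 'd3', 'd2']
print(rank(DOCS, EXPANDED, PATHWAY_FOCUSED))  # ['d2', 'd3', 'd1']
```

With the gene-focused weights, d1 (which mentions both input genes) comes first; shifting the weight to pathways moves d2, which mentions two related pathways, to the top.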
(Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.0)
- Kim, J., & Rebholz-Schuhmann, D. (2008). Categorization of services for seeking information in biomedical literature: a typology for improvement of practice. Brief Bioinform, 9(6), 452-465. doi:10.1093/bib/bbn032
- Weeber, M., Kors, J. A., & Mons, B. (2005). Online tools to support literature-based discovery in the life sciences. Brief Bioinform, 6(3), 277-286. doi:10.1093/bib/6.3.277
I’m sure there are many other nice ones out there. Don’t forget to also check out the NCBI Handbook. Another great resource …
On a separate note, a couple of NIH-affiliated authors have written some thought-provoking stuff about Translational Medicine:
- Nussenblatt, R., Marincola, F., & Schechter, A. (2010). Translational Medicine – doing it backwards. Journal of Translational Medicine, 8(1), 12. doi:10.1186/1479-5876-8-12
The present paradigm of hypothesis-driven research poorly suits the needs of biomedical research unless efforts are spent in identifying clinically relevant hypotheses. The dominant funding system favors hypotheses born from model systems and not humans, bypassing the Baconian principle of relevant observations and experimentation before hypotheses. Here, we argue that this attitude has borne two unfortunate results: lack of sufficient rigor in selecting hypotheses relevant to human disease and limitations of most clinical studies to certain outcome parameters rather than expanding knowledge of human pathophysiology; an illogical approach to translational medicine.
A recent candidate for a post-doctoral fellowship position came to the laboratory for an interview and spoke of the wish to leave in vitro work and enter into meaningful in vivo work. He spoke of an in vitro observation with mouse cells and said that it could be readily applied to treating human disease. Indeed his present mentor had told him that was the rationale for doing the studies. When asked if he knew whether the mechanisms he outlined in the mouse existed in humans, he said that he was unaware of such information and upon reflection wasn’t sure in any event how his approach could be used with patients. This is a scenario that is repeated again and again in the halls of great institutions dedicated to medical research. Any self-respecting investigator (and those they mentor) knows that one of the most important new key words today is “translational”. However, in reality this clarion call for medical research, often termed “Bench to Bedside” is far more often ignored than followed. Indeed the paucity of real translational work can make one argue that we are not meeting our collective responsibility as stewards of advancing the health of the public. We see this failure in all areas of biomedical research, but as a community we do not wish to acknowledge it, perhaps in part because the system, as it is, supports superb science. Looking at this from another perspective, Young et al. suggest that the peer-review of journal articles is one subtle way this concept is perpetuated. Their article suggests that the incentive structure built around impact and citations favors reiteration of popular work, i.e., more and more detailed mouse experiments, and that it can be difficult and dangerous for a career to move into a new arena, especially when human study is expensive in time and money.
(Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.0)
Well, I guess that does it for now. Hope those articles pique your interest as much as they did mine. Until we meet again, adios :-) !
Copyright © Firas MR. All rights reserved.