FABLE
Fast Automated Biomedical Literature Extraction

Overview

What is FABLE?

FABLE is a text-mining system designed to identify information in biomedical text more thoroughly. It is optimized for finding mentions of human genes and currently searches the MEDLINE®/PubMed® collection of research articles. FABLE offers three tools:
Article Finder identifies biomedical research articles that mention genes and proteins of interest. Search terms are normalized by default, so a search for "p53" will identify articles that mention "TP53", "P53", "tumor suppressor protein p53", or any other known synonym of the p53 gene.
Gene Lister identifies the genes mentioned in articles that contain one or more keyword search terms of interest. Searching for "Mendel AND peas" will list all genes mentioned in articles that contain both "Mendel" and "peas".
LitTrack visually aligns journal articles to genomic positions using their associated HGNC gene symbols. LitTrack is a local mirror of the UCSC Genome Browser that includes a literature track.
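
Article Finder and Gene Lister can both be pictured as lookups against a table that links each article to the normalized gene symbols it mentions: Article Finder goes from a normalized gene to articles, and Gene Lister goes from keyword-matched articles to genes. The following is a minimal sketch under that assumption, not FABLE's actual implementation. The SQLite schema (`synonyms`, `articles`, `mentions`), the function names `article_finder` and `gene_lister`, and the toy records and PMIDs are all hypothetical, and the "AND" query is simplified to requiring every keyword as a whole word.

```python
import re
import sqlite3

# Hypothetical schema and toy records for illustration only; these are not
# FABLE's real tables or real PubMed data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE synonyms (alias TEXT PRIMARY KEY, symbol TEXT);
CREATE TABLE articles (pmid INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE mentions (pmid INTEGER, mention TEXT, symbol TEXT);
""")
conn.executemany("INSERT INTO synonyms VALUES (?, ?)", [
    ("p53", "TP53"), ("tp53", "TP53"),
    ("tumor suppressor protein p53", "TP53"), ("brca1", "BRCA1"),
])
conn.executemany("INSERT INTO articles VALUES (?, ?)", [
    (1, "Mendel crossed peas; we also studied TP53."),
    (2, "Tumor suppressor protein p53 and BRCA1 in breast cancer."),
    (3, "Mendel and peas revisited: BRCA1 homologs."),
])
# Each row records the raw mention and the symbol it was normalized to.
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", [
    (1, "TP53", "TP53"),
    (2, "tumor suppressor protein p53", "TP53"),
    (2, "BRCA1", "BRCA1"),
    (3, "BRCA1", "BRCA1"),
])

def article_finder(term):
    """Normalize a search term to a gene symbol, then return matching PMIDs."""
    row = conn.execute(
        "SELECT symbol FROM synonyms WHERE alias = ?", (term.lower(),)
    ).fetchone()
    symbol = row[0] if row else term  # fall back to the raw term if unknown
    rows = conn.execute(
        "SELECT DISTINCT pmid FROM mentions WHERE symbol = ? ORDER BY pmid",
        (symbol,),
    )
    return [pmid for (pmid,) in rows]

def gene_lister(*keywords):
    """Return genes mentioned in articles containing ALL keywords (AND)."""
    wanted = {k.lower() for k in keywords}
    hits = [
        pmid
        for pmid, text in conn.execute("SELECT pmid, text FROM articles").fetchall()
        if wanted <= set(re.findall(r"[a-z0-9]+", text.lower()))
    ]
    if not hits:
        return []
    placeholders = ",".join("?" * len(hits))  # e.g. "?,?" for two hits
    rows = conn.execute(
        f"SELECT DISTINCT symbol FROM mentions WHERE pmid IN ({placeholders}) "
        "ORDER BY symbol",
        hits,
    )
    return [symbol for (symbol,) in rows]

print(article_finder("p53"))
print(gene_lister("Mendel", "peas"))
```

Here `article_finder("p53")` returns `[1, 2]`: article 2 never contains the string "p53" on its own as a gene symbol, but its mention "tumor suppressor protein p53" was stored under TP53, so querying by normalized symbol finds it. `gene_lister("Mendel", "peas")` returns `['BRCA1', 'TP53']`, the genes from articles 1 and 3.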

How does FABLE work?

FABLE identifies gene mentions in text with an entirely computational procedure. Briefly, a trained probabilistic model (the gene tagger) analyzes features of the text surrounding a possible gene mention to determine whether it is likely to be a gene (a task known as named entity recognition). Mentions with a sufficiently high likelihood of being a gene are then "normalized" to official gene symbols by comparing each mention to a set of known human gene names, using both exact and approximate matching. For example, if a document states "we studied the p53 gene", the tagger would likely identify "p53" as a gene, and the normalizer would map this mention to its official gene symbol, "TP53". The gene tagger and the normalizer are run on all documents in MEDLINE®/PubMed®, and the results are imported into a database that records each mention, its normalized symbol, and the article in which it appears. The FABLE website queries this database.
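
The likelihood threshold and the exact-then-approximate normalization step can be sketched as follows. This is an illustration under stated assumptions, not FABLE's normalizer: the tagger output is stubbed as hard-coded (mention, probability) pairs, and the five-entry synonym dictionary, the 0.5 likelihood threshold, and the 0.85 similarity cutoff are hypothetical. Python's `difflib` similarity ratio stands in for whatever approximate matching FABLE actually uses.

```python
import difflib
import re

# Hypothetical alias -> official symbol dictionary. A real normalizer would
# load many thousands of human gene aliases (e.g., from HGNC).
SYNONYMS = {
    "p53": "TP53",
    "TP53": "TP53",
    "tumor suppressor protein p53": "TP53",
    "BRCA1": "BRCA1",
    "breast cancer 1": "BRCA1",
}

def canonical(text):
    """Lower-case and drop all non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

# Index aliases by canonical form so exact lookup ignores case and punctuation.
INDEX = {canonical(alias): symbol for alias, symbol in SYNONYMS.items()}

def normalize(mention, cutoff=0.85):
    """Map a mention to an official symbol: exact match first, then approximate."""
    key = canonical(mention)
    if key in INDEX:
        return INDEX[key]
    # Approximate match: closest canonical alias with similarity >= cutoff.
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
    return INDEX[close[0]] if close else None

def process(tagged, threshold=0.5):
    """Keep tagger outputs at or above threshold and normalize them.

    `tagged` is an iterable of (mention, probability) pairs standing in for
    the output of a trained gene tagger.
    """
    results = []
    for mention, prob in tagged:
        if prob < threshold:
            continue  # not likely enough to be a gene
        symbol = normalize(mention)
        if symbol is not None:  # drop mentions with no known gene name
            results.append((mention, symbol))
    return results

tagged = [
    ("p53", 0.97),
    ("BRCA-1", 0.91),
    ("tumour suppressor protein p53", 0.88),
    ("kinase", 0.62),
    ("Mendel", 0.10),
]
print(process(tagged))
```

Exact matching on the canonical form already absorbs differences in case and punctuation, such as "BRCA-1" versus "BRCA1". The approximate pass then catches near-misses such as the British spelling "tumour", while "kinase" passes the tagger threshold but matches no known name closely enough and is dropped. Lowering the cutoff would recover more variants at the cost of mapping unrelated strings that merely look similar.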

What is different about FABLE?

Current tools for searching the biomedical literature, such as PubMed, rely on human annotators who read articles and record what they contain. FABLE uses this information too, but it also analyzes the text itself, applying the computational procedure described above to predict its intended meaning. This often identifies mentions of objects such as genes that the human annotators missed or did not index, and the analysis is much faster than manual annotation. FABLE also keeps track of synonyms and aliases, so it can find articles that use different names for the same thing. While FABLE is not perfect at distinguishing true mentions from false ones, we believe its ability to identify a much larger set of objects than existing manually assisted systems is of great benefit. In our tests, searching FABLE for a human gene identified approximately 25% more articles than PubMed, on average.

Why FABLE?

FABLE currently tags only genes and proteins, and normalizes only human genes. This narrow focus allows us to build a tool tailored specifically to this task, rather than a general-purpose tool such as PubMed, which provides broad search capabilities with less specificity.

More information

McDonald R, Pereira F. Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics 2005, 6(Suppl 1):S6.

Crim J, McDonald R, Pereira F. Automatically annotating documents with normalized gene lists. BMC Bioinformatics 2005, 6(Suppl 1):S13.

Fang H, Murphy K, Jin Y, Kim JS, White PS. Human gene name normalization using text matching with automatically extracted synonym dictionaries. Proceedings of BioNLP'06, 2006.