What is FABLE?
FABLE is a text mining process designed to identify information written
in biomedical text more thoroughly. FABLE is optimized for
finding mentions of human genes, and it currently searches the
MEDLINE®/PubMed®
set of research articles. FABLE has three tools:
Article Finder identifies
biomedical research articles that mention genes and proteins of interest.
Search terms are normalized by default, so that searching for "p53"
will identify articles that mention "TP53", "P53",
"tumor suppressor protein p53", and any other known synonym
used to describe the p53 gene.
Gene Lister identifies
sets of genes that are mentioned in articles containing one or more keyword
search terms of interest. Searching for "Mendel AND peas" will
identify all genes mentioned in articles that contain both of the words
"Mendel" and "peas".
LitTrack
visually aligns journal articles to genomic position by
associated HGNC gene symbol. LitTrack is a local mirror of the UCSC Genome Browser that includes a literature track.
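As a rough illustration of how the Article Finder and Gene Lister searches described above behave, here is a minimal sketch over a toy in-memory index. The synonym table, the three sample articles, and the function names `find_articles` and `list_genes` are all hypothetical and are not part of FABLE; the real tools query a database built from MEDLINE®/PubMed®.

```python
import re

# Hypothetical synonym table: lowercased alias -> official gene symbol.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "tumor suppressor protein p53": "TP53",
    "brca1": "BRCA1",
}

# Hypothetical toy index: article id -> (text, normalized gene symbols in it).
ARTICLES = {
    1: ("Mendel crossed peas; we revisit TP53.", {"TP53"}),
    2: ("The tumor suppressor protein p53 and BRCA1 in peas.", {"TP53", "BRCA1"}),
    3: ("Mendel and his garden.", set()),
}


def find_articles(term):
    """Article Finder sketch: normalize the search term, then match on symbol."""
    symbol = SYNONYMS.get(term.lower(), term.upper())
    return sorted(aid for aid, (_, genes) in ARTICLES.items() if symbol in genes)


def list_genes(*keywords):
    """Gene Lister sketch: genes in articles containing ALL keywords (AND)."""
    genes = set()
    for text, found in ARTICLES.values():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(k.lower() in words for k in keywords):
            genes |= found
    return sorted(genes)


if __name__ == "__main__":
    print(find_articles("p53"))
    print(list_genes("Mendel", "peas"))
```

Because the lookup goes through the official symbol rather than the literal search string, a query for "p53" and a query for "tumor suppressor protein p53" return the same articles in this sketch.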
How does FABLE work?
FABLE uses an entirely computational procedure to identify mentions of
genes in text. Briefly, a trained probabilistic model (gene tagger) analyzes
various features of the text surrounding a possible mention of a gene to determine
whether the mention is likely to be a gene (named entity recognition).
Mentions that are sufficiently likely to be genes are then "normalized"
to official gene symbols by comparing the text mention to a set of known
human gene names, using both exact and approximate matching. For example, if a document states
"we studied the p53 gene", the model would likely identify "p53"
as a gene, and the normalizer would map this mention to its official
gene symbol "TP53". The gene tagger and the normalizer are used
to analyze all documents in MEDLINE®/PubMed®. The results are
imported into a database where the mention, the normalized term, and the
article mentioning the gene are all recorded. The FABLE website queries
this database.
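The normalization and recording steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the gene tagger itself is not implemented, so its likelihood scores are supplied by hand; the name table `GENE_NAMES`, the thresholds, and the functions `normalize` and `build_records` are hypothetical; and Python's standard `difflib` stands in for whatever approximate matching FABLE actually uses.

```python
import difflib

# Hypothetical table of known human gene names: lowercased name -> symbol.
GENE_NAMES = {
    "tp53": "TP53",
    "p53": "TP53",
    "tumor suppressor protein p53": "TP53",
    "brca1": "BRCA1",
    "breast cancer 1": "BRCA1",
}


def normalize(mention, cutoff=0.85):
    """Map a text mention to an official symbol, or None if nothing matches."""
    key = mention.lower().strip()
    # Exact match first.
    if key in GENE_NAMES:
        return GENE_NAMES[key]
    # Approximate match as a fallback (difflib is an assumed stand-in).
    close = difflib.get_close_matches(key, list(GENE_NAMES), n=1, cutoff=cutoff)
    return GENE_NAMES[close[0]] if close else None


def build_records(pmid, scored_mentions, threshold=0.5):
    """Keep sufficiently likely mentions and record mention, symbol, article.

    scored_mentions is a list of (mention, likelihood) pairs, as a
    hypothetical gene tagger might produce them.
    """
    records = []
    for mention, likelihood in scored_mentions:
        if likelihood < threshold:
            continue  # the tagger judged this unlikely to be a gene
        symbol = normalize(mention)
        if symbol is not None:
            records.append({"pmid": pmid, "mention": mention, "symbol": symbol})
    return records


if __name__ == "__main__":
    # Hand-made tagger output for the sentence "we studied the p53 gene".
    print(build_records(12345, [("p53", 0.97), ("studied", 0.02)]))
```

Each record keeps the original mention, the normalized symbol, and the article identifier, mirroring the three pieces of information the text says are stored in the database.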
What is different about FABLE?
Current tools to search the biomedical literature, such as PubMed, use
human annotators to read articles and identify what they contain. FABLE
uses this information too, but FABLE also analyzes the text to predict
its intended meaning using sophisticated algorithms based upon linguistic
theory. This often identifies mentions of objects such as genes that were
missed or overlooked by the human annotators, and FABLE can perform
this analysis much more rapidly. FABLE also keeps track of synonyms and
aliases of mentions to identify articles that use different names
to mean the same thing. While FABLE is not perfect at distinguishing true
mentions from spurious ones, we believe that its ability to identify
a much larger set of objects than existing manually assisted systems is
of great benefit. Our tests show that searching FABLE for a human gene
identifies approximately 25% more articles than PubMed on average.
Why FABLE?
FABLE currently tags only genes and proteins and only normalizes human
genes. This narrow focus lets us build a tool tailored specifically
to this task, rather than a generalized tool such as PubMed that provides
broad search capabilities with less specificity.