FABLE
Forthright Automated Biomedical Literature Extraction
Gene Lister Article Finder
for support, email fable at the subdomain genome in the domain chop dot edu

Gene Lister: find human genes associated with keywords

Search terms (Gene Lister)

What can be searched for (anything)

A search term can be any keyword for which you wish to find associated human genes. You can search for drugs, diseases, people, places, institutions, genes, proteins — any text that occurs in the biomedical literature. See the search examples below.

It is possible to search for keywords containing special characters (such as ø) either by using the special characters in the keyword (e.g., searching for "Jørgensen"), or by using a corresponding regular character (e.g., searching for "Jorgensen" will find "Jørgensen"). In this example, a search for Jørgensen will yield more specific results.
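
The matching behavior described above can be illustrated with a small sketch. This is not FABLE's implementation; the function names and the character table are assumptions made for illustration. It shows why a query typed with regular characters matches both spellings, while a query typed with the special character matches only that spelling.

```python
import unicodedata

# Hypothetical table for letters that have no Unicode decomposition.
# Only the "ø" entry comes from the text above; the others are illustrative.
_NO_DECOMPOSITION = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})

def fold(text: str) -> str:
    """Map accented or special letters to regular look-alike characters."""
    text = text.translate(_NO_DECOMPOSITION)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents) that NFKD separates from base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, indexed_term: str) -> bool:
    """Regular-character queries match any spelling; special-character queries match exactly."""
    if fold(query) == query:  # the query uses only regular characters
        return fold(indexed_term).lower() == query.lower()
    return indexed_term.lower() == query.lower()
```

Under these assumptions, `matches("Jorgensen", "Jørgensen")` is true, while `matches("Jørgensen", "Jorgensen")` is false, which is why the special-character query is the more specific one.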

Search features:

  • Searches are case-insensitive.
  • To search for a phrase, place the phrase within quotation marks.
  • Boolean logic can be expressed using upper-case AND, OR, and NOT, and parentheses. See below.
  • See Query Parser Syntax for advanced searching techniques, including wildcard and fuzzy searches.

Search examples (Gene Lister)

What to search for:

Search for genes associated with a disease:

Schizophrenia AND bipolar NOT depression

Search for genes associated with a disease attribute:

Metastasis AND "colon cancer"

Search for genes associated with a person:

Asthma AND "Doe JA"

Search for genes associated with a place:

"University of Pennsylvania" AND "heart disease"

Search for genes associated with genes:

Myoglobin NOT hemoglobin

Search for genes associated with techniques:

Luciferase

Other helpful search examples:

Searches are case-insensitive:

PCR and Pcr are equivalent.

Synonym search (Gene Lister)

Synonym search ("Include synonyms" option checked)

If the Synonyms check box is checked, a keyword that is a gene or gene alias will act as a proxy for all known aliases of the gene. For instance, searching for p53 with Synonyms checked will find genes associated with p53, but also those associated with TP53, or tumor suppressor gene p53, or any other established synonym.

Exact search ("Include synonyms" option not checked)

If the Synonyms box is unchecked, all keywords are treated literally. For example, searching for p53 will find only genes mentioned in articles that use the literal term p53; articles that refer to the gene only as TP53 are not matched.

Alias search

Beside each gene symbol you will see a blue triangle. Clicking the blue triangle will display all synonyms of the gene known by FABLE. If a synonym is a link, clicking the link will retrieve any articles mentioning this gene and using this synonym. If a synonym is not linked, FABLE identified no articles using that synonym to mention the gene in question.

Alias matches include close variants

The aliases are matched using a form of fuzzy matching in which a match may occur for a term that differs in punctuation, spacing, or parenthesized substrings. Note that ignoring parenthesized substrings occasionally causes invalid matches.
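
One way to picture this kind of matching is to reduce each alias to a normalized key and compare keys. The sketch below is an assumption about how such matching could work, not a description of FABLE's actual algorithm; the function names and the sample aliases in the usage note are hypothetical.

```python
import re

def normalize_alias(alias: str) -> str:
    """Reduce an alias to a key that ignores case, punctuation, spaces, and parenthesized text."""
    without_parens = re.sub(r"\([^)]*\)", "", alias)
    # Keep only ASCII letters and digits; everything else is treated as punctuation.
    return re.sub(r"[^0-9a-z]", "", without_parens.lower())

def aliases_match(a: str, b: str) -> bool:
    """Two aliases match when their normalized keys are identical."""
    return normalize_alias(a) == normalize_alias(b)
```

With this sketch, `aliases_match("TP-53", "tp 53")` is true. It also reproduces the kind of invalid match mentioned above: two made-up aliases such as "ABC (1)" and "ABC (2)" collapse to the same key once the parenthesized text is dropped.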

Boolean searches (Gene Lister)

Boolean searches are supported by Gene Lister (unlike Article Finder). You can use the AND, OR, and NOT operators, along with parentheses for grouping.
  • mycn OR neuroblastoma finds genes associated with either MYCN or neuroblastoma.
  • apob "heart disease" and apob AND "heart disease" are equivalent; they find genes associated with both APOB and heart disease.
  • "heart disease" NOT insulin finds genes associated with heart disease but not with insulin.
  • cancer AND (cigarettes OR cigars OR "chewing tobacco") finds genes associated with cigarettes, cigars, or "chewing tobacco" and intersects them with the genes associated with cancer.
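
Following the wording of the examples above, the operators can be read as set operations on the genes associated with each keyword: OR is a union, AND an intersection, and NOT a difference. The sketch below uses an invented keyword-to-gene table purely for illustration; the associations are not real results, and the names `GENES_FOR` and `genes` are hypothetical.

```python
# Hypothetical index: keyword -> set of associated gene symbols (invented data).
GENES_FOR = {
    "cancer": {"TP53", "MYCN", "BRCA1"},
    "cigarettes": {"TP53", "CYP1A1"},
    "cigars": {"CYP1A1"},
    "chewing tobacco": {"BRCA1", "GSTM1"},
}

def genes(keyword: str) -> set[str]:
    """Look up the genes for a keyword, ignoring case."""
    return GENES_FOR.get(keyword.lower(), set())

# cancer AND (cigarettes OR cigars OR "chewing tobacco")
tobacco = genes("cigarettes") | genes("cigars") | genes("chewing tobacco")
result = genes("cancer") & tobacco

# cancer NOT cigarettes
without_cigarettes = genes("cancer") - genes("cigarettes")
```

With this invented table, `result` contains TP53 and BRCA1, and `without_cigarettes` contains MYCN and BRCA1.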

No ambiguous name resolution (Gene Lister)

In the current version of Gene Lister, if the "include synonyms" option is selected, and one of the search terms is a gene alias that maps to more than one gene symbol (delta, for example), then this search term is treated as literal text, i.e. no synonym search will be performed.
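
The rule can be sketched as follows. The tables and the function name `expand_term` are hypothetical and only illustrate the behavior described here; the p53 aliases and the two genes for BAP are taken from examples elsewhere on this page.

```python
# Hypothetical alias table: alias (lower case) -> official gene symbols.
ALIAS_TO_SYMBOLS = {
    "p53": {"TP53"},
    "tp53": {"TP53"},
    "bap": {"PHB2", "SIL1"},  # an alias shared by two genes
}

# Hypothetical reverse table: official symbol -> known aliases.
SYMBOL_TO_ALIASES = {
    "TP53": {"TP53", "p53", "tumor suppressor gene p53"},
    "PHB2": {"PHB2", "BAP"},
    "SIL1": {"SIL1", "BAP"},
}

def expand_term(term: str, include_synonyms: bool) -> set[str]:
    """Return the set of strings that would be searched for a single term."""
    symbols = ALIAS_TO_SYMBOLS.get(term.lower(), set())
    if include_synonyms and len(symbols) == 1:
        (symbol,) = symbols
        return set(SYMBOL_TO_ALIASES[symbol])
    # Not a gene alias, synonyms switched off, or an ambiguous alias:
    # the term is treated as literal text.
    return {term}
```

Here `expand_term("p53", True)` returns all three p53 aliases, while `expand_term("BAP", True)` returns only the literal term because the alias maps to more than one gene.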

Sorting search results (Gene Lister)

Sorting by frequency

Genes with the highest number of articles mentioning them are shown at the top of the list.

Sorting by gene

Selecting this option sorts matching gene symbols alphabetically.

Browsing search results (Gene Lister)

Browsing Gene Lister results is similar to browsing Article Finder results.

However, the displayed data is different. A Gene Lister results page displays several columns:

  • The Gene column lists the names of genes found in the articles identified by your search term(s).
  • The Articles column lists the number of articles mentioning the corresponding gene that also contain your search term(s).
  • The Synonyms column contains clickable drop-down lists of aliases for each gene.
  • The LitTrack column displays the literature-genome alignment for each gene.
  • The Links column contains links to external resources about each gene.

Click on a gene to be taken to the Article Finder page for that gene, which will display all articles associated with the gene (not just those matching your search criteria). The articles displayed are matched using synonym expansion, so articles matching any alias of the gene in question will be displayed, and the Synonyms checkbox will be automatically checked. Remember that the Synonyms checkbox on the Gene Lister results page will not influence the results of clicking on the gene name.

Click on the number of articles to the right of the gene name to be taken to a special Article Finder page that displays just the articles that match your search criteria and mention the gene. Just as for your original Gene Lister search, your search criteria will be interpreted according to the state of the Synonyms checkbox, but the gene name for this line will always be interpreted with synonym expansion.

Click on a blue gene symbol or the adjacent blue triangle (e.g., MYCN) to see aliases of a gene considered equivalent in the search. If a synonym is a link, clicking the link will return the articles of an exact Article Finder search for that synonym. If a synonym is not a link, an exact Article Finder search for that synonym would return no articles.

Click on a LitTrack link to see documents mentioning the gene aligned to the gene's genomic position in the UCSC Genome Browser. The text of the link is the cytogenetic localization or range of the gene. Note that a small fraction of genes cannot be identified in the UCSC Browser data, usually because the UCSC data does not yet include the most recent HGNC gene symbol.

Click on an external link to NCBI Gene, Google, UCSC, SymAtlas, GeneCards, or GeneLynx to find information on the gene.

Example:

Searching for spleen returns all genes mentioned in articles about the spleen. The most frequently mentioned gene is IL2. Clicking on IL2 displays all articles mentioning IL2 or its synonyms. Clicking on the number of articles for IL2 (~ 5500) displays just those articles mentioning spleen in addition to IL2.

Downloading search results (Gene Lister)

Downloading Gene Lister results is similar to downloading Article Finder results.

Using the search bar in the page header (Article Finder and Gene Lister)

A Gene Lister (or Article Finder) search can be initiated not only from the home page, but from any other page, using the search bar in the page header. See the description of this feature in the Article Finder section below.


Article Finder: find articles about human genes

Search terms (Article Finder)

What can be searched for (gene, RNA, or protein)

A search term must be a name, description, synonym, or identifier of an individual gene, RNA or protein.

What can NOT be searched for (non-gene terms and non-protein terms)

A search term other than a name, description, synonym, or identifier of an individual gene, RNA or protein will not be found.

Search features:

  • Searches are case-insensitive.
  • Boolean operators are not supported.
  • Multiple search terms result in the intersection of the searches for the individual terms (i.e., there is an implicit AND operator between multiple search terms).
  • Quotation marks must be placed around multi-word gene aliases.

Search examples (Article Finder)

Searches are case-insensitive:

MYC and myc will find the same set of articles.

To search for articles containing more than one search term (logical "AND" searches), separate the terms with a space:

MYCN p53 finds articles mentioning both MYCN and p53.

To search for a gene alias that contains spaces, place the phrase in quotation marks:

"CD77 synthase" finds articles mentioning the multi-word gene alias CD77 synthase

If you did not place quotation marks around CD77 synthase, then Article Finder would not find anything, because it would look for two separate gene aliases CD77 and synthase, which don't exist.
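
The effect of quotation marks and the implicit AND can be sketched as below. This is an illustration under stated assumptions, not Article Finder's code: the index, the PubMed IDs, and the function names are invented, and `shlex.split` is used only as a convenient way to keep quoted phrases together.

```python
import shlex

# Hypothetical index: gene alias (lower case) -> set of PubMed IDs (invented).
ARTICLES_FOR = {
    "mycn": {101, 102, 103},
    "p53": {102, 103, 104},
    "cd77 synthase": {103, 105},
}

def parse_terms(query: str) -> list[str]:
    """Split a query on spaces, keeping double-quoted phrases as one term."""
    return shlex.split(query)

def article_finder(query: str) -> set[int]:
    """Intersect the article sets of all terms (implicit AND)."""
    term_sets = [ARTICLES_FOR.get(term.lower(), set()) for term in parse_terms(query)]
    return set.intersection(*term_sets) if term_sets else set()
```

With this invented index, `article_finder("MYCN p53")` returns the two articles shared by both genes, and `article_finder('"CD77 synthase"')` finds the alias. The unquoted query `CD77 synthase` returns nothing, because it is split into two terms that are not gene aliases.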

Synonym search (Article Finder)

Synonym search ("Include synonyms" option checked)

If the "Include synonyms" box is checked during a search, articles using either the search term(s) or known alternative forms of the gene(s) being searched for will be identified. Searching for p53 will identify articles using the term p53, but also articles using the term TP53, or tumor suppressor gene p53, or any other established synonym.
Beside each gene symbol you will see a blue triangle. Clicking the blue triangle will display all synonyms of the gene known by FABLE. If a synonym is a link, clicking the link will retrieve any articles mentioning this gene and using this synonym. If a synonym is not linked, FABLE identified no articles using that synonym to mention the gene in question.

Alias matches include close variants

The aliases are matched using a form of fuzzy matching in which a match may occur for a term that differs in punctuation, spacing, or parenthesized substrings. Note that ignoring parenthesized substrings occasionally causes invalid matches.

Exact search ("Include synonyms" option not checked)

If the "Include synonyms" box is unchecked, the search will be restricted to all articles mentioning the exact gene or protein name(s) used as the search term(s). Searching for p53 without synonyms will identify articles mentioning the gene p53 only when it is referred to as p53.

No boolean searches (Article Finder)

Boolean searching (the use of AND, OR, and NOT operators) is not currently supported. To perform an "AND" search, enter the search terms separated by spaces, such as:
MYCN p53
which identifies articles mentioning both MYCN and p53.

Ambiguous name resolution (Article Finder)

When a search term has been used to describe more than one gene, FABLE will need assistance to determine which gene is of interest. An intermediate web page will present a list of all genes the search term is known to represent. For example, the term BAP refers to both PHB2 and SIL1. For these ambiguous searches, all known synonyms associated with each possible gene are displayed, along with links to online databases that may help users select the intended gene. Also, if the gene is present in the UCSC Genome Browser, a link to FABLE LitTrack is provided to show the gene's genomic positional context.

Sorting search results (Article Finder)

Sorting by relevance

The relevance option (default) sorts matching articles based upon a measure of their relevance to the search term(s). Relevance is measured as a combination of where the search term occurs in an article (title, abstract body, MeSH terms, keywords, etc.), how frequently it appears, and, if the "include synonyms" option is selected, how closely the actual search term resembles the mention in the article. Articles are ordered from most relevant to least relevant.

Sorting in chronological order

Selecting this option sorts matching articles based upon their publication dates, most recent first.

Sorting in reverse chronological order

Selecting this option sorts matching articles based upon their publication dates, oldest first.

Sorting by first author

Selecting this option sorts matching articles alphabetically by last name of the first author. Articles with no listed authors are placed at the beginning.

Sorting by impact factor

The articles in the result will be ranked by the ISI impact factors of the corresponding journals. Articles are sorted from highest to lowest impact factor. Articles from journals with no impact factor are listed at the end of the results.
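
Two of the orderings above have a special rule for missing values: articles without authors come first when sorting by first author, and journals without an impact factor come last when sorting by impact factor. A minimal sketch of such sort keys, using hypothetical records and field names, could look like this:

```python
from typing import NamedTuple, Optional

class Article(NamedTuple):
    pmid: int
    first_author: Optional[str]     # None when no authors are listed
    impact_factor: Optional[float]  # None when the journal has no impact factor

# Hypothetical records for illustration only.
articles = [
    Article(1, "Smith J", 4.2),
    Article(2, None, None),
    Article(3, "Doe JA", 9.8),
]

# Highest impact factor first; journals without one go to the end.
by_impact = sorted(
    articles,
    key=lambda a: (a.impact_factor is None, -(a.impact_factor or 0.0)),
)

# Alphabetical by first author; articles with no listed authors come first.
by_author = sorted(
    articles,
    key=lambda a: (a.first_author is not None, (a.first_author or "").lower()),
)
```

The tuple keys put the "missing" flag first, so the placement of missing values is decided before the actual value is compared.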

Browsing search results (Article Finder)

Clicking on the first line of a citation takes you to the NCBI PubMed page for the citation. Users can choose the number of articles they wish to appear on each result page by entering the number of articles desired in the Results/page box on the search interface. The default is 25 articles per page, unless there are fewer than 25 articles identified.

Citations are displayed in Unicode/UTF-8 format.

The pink summary bar just above the citation results provides the following functions:

  • Navigation links (First, Previous, Next, Last) that let a user browse through the pages of a search result (also displayed underneath the citations).
  • The number of documents identified by the search term(s).
  • A list of gene aliases for each gene being searched for. To view the list, click on the blue triangle.
  • A link to LitTrack, which will display the gene and its literature aligned to the gene's position in the human genome. The link includes the gene's cytogenetic position.
  • A tool for downloading citation results, described in the next section.

Downloading search results (Article Finder)

The downloading option allows results to be saved to a file on a user's local computer. Three formats are supported: Excel, XML, and CSV. Files include a header row that describes the content of each column. The "Derived Date" column extrapolates non-calendar MEDLINE dates to a precise day in a calendar year for sorting purposes.
The Excel (Microsoft Excel) file format organizes the results in a table, with each column corresponding to a column of Excel cells.
The XML (eXtensible Markup Language) file format organizes the results in a text file as citations with markup annotations denoting the field names.
The CSV (Comma Separated Value) file format organizes the values in a table as a series of Unicode/UTF-8 text lines, such that each column value is delimited by quotation marks and columns are separated by commas. Make sure your spreadsheet or other software interprets the file using the Unicode/UTF-8 character set and encoding.

Columns: the downloaded data includes the following fields: PMID, Authors, Title, Journal, Volume, Pages, Date, and Derived Date. PMID is the PubMed® ID for the article (see www.pubmed.gov). Date is the publication date, and it has no consistent format. Derived Date is the midpoint of the publication date (Date) range, to the extent it can be automatically determined. Derived Date is the same day as Date if the latter is a single day. Unlike Date, Derived Date has a consistent machine-usable format (YYYY-MM-DD).
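
A downloaded CSV file can be processed with standard tools. The sketch below assumes that the header row uses exactly the field names listed above; the two sample rows, the function name `read_citations`, and the file name in the final comment are invented for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical two-row sample in the layout described above (invented values).
SAMPLE = (
    '"PMID","Authors","Title","Journal","Volume","Pages","Date","Derived Date"\n'
    '"1","Jørgensen A","First title","J Example","12","1-9","2005 Jan-Feb","2005-01-30"\n'
    '"2","Doe JA","Second title","J Example","3","10-20","2004 Mar 15","2004-03-15"\n'
)

def read_citations(stream):
    """Return the rows as dicts, with Derived Date parsed into a datetime.date."""
    rows = list(csv.DictReader(stream))
    for row in rows:
        row["Derived Date"] = date.fromisoformat(row["Derived Date"])
    return rows

rows = read_citations(io.StringIO(SAMPLE))
oldest_first = sorted(rows, key=lambda r: r["Derived Date"])

# For a real download, open the file with an explicit encoding, for example:
# open("results.csv", encoding="utf-8", newline="")  # hypothetical file name
```

Sorting on Derived Date rather than Date works because only Derived Date has a consistent YYYY-MM-DD format.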

Using the search bar in the page header (Article Finder and Gene Lister)

An Article Finder (or Gene Lister) search can be initiated not only from the home page, but from any other page, using the search bar in the page header. A search can be quickly modified and re-run by using the search bar. Click the Go button to execute the search. Note that the Go button will run either Article Finder or Gene Lister, depending on which application has been selected in the application selector beneath the top-left FABLE logo.

When the application is switched, the default search terms and search options will change correspondingly in the search bar beneath.


LitTrack: align literature to genomic positions

Description

FABLE's LitTrack is a custom track built in a local instance of the UCSC Genome Browser, which is a popular tool for visualizing annotations of genomes. LitTrack is currently available only for human genes. LitTrack includes the following features:

  • A customized track that aligns the MEDLINE documents determined to be most relevant to a particular gene to the gene's determined genomic position.
  • A fully functional UCSC Genome Browser, including the ability to zoom in and out or search for an object of interest.
  • The ability to conduct FABLE Article Finder searches for a particular displayed gene.
  • The ability to retrieve the MEDLINE records for a displayed document.
  • Indicators for the gene's coding region, the number of documents associated with a gene, and the first author of the documents displayed.
  • A parameter that allows the number of documents displayed to be customized.

Accessing LitTrack

From the Main Page: Clicking on the "View Browser" button will link to the main UCSC Genome Browser search page. The search page can be used to specify the organism, sequence build, and genomic position or text search term. Note that while LitTrack is currently available only for human genes, the non-human data available at the main Genome Browser is also available at the FABLE site.

From the Article Finder Page: Clicking on "LitTrack", if present, within the pink bar above displayed citation results will generate a graphical depiction of the genomic region encompassing the gene of interest.

From the Gene Lister Page: Clicking on a cytogenetic position in the "LitTrack" column will generate a graphical depiction of the genomic region encompassing the gene of interest.

Layout

The layout places the LitTrack as the top-most annotation track. LitTrack depicts the following:

Gene Bar

The top, colored bar represents a known gene that has been associated with an HGNC symbol. The bar corresponds to the sequence position previously identified for a RefSeq RNA corresponding to the gene. The official gene symbol is shown to the left of each bar. The number to the right of the bar is the number of MEDLINE documents identified for the user's query. This total may reflect all documents found for a gene, if an Article Finder search was performed, or the subset of documents that mention both the gene and one or more other search terms, if a Gene Lister search was performed. Clicking on this bar will display a list of these citations. Holding the mouse over the bar will display the gene's official symbol.

Top bar colors: Orange=coding region of the gene specified in the FABLE search; Red=non-coding region of the gene specified in the FABLE search; Green=coding region of genes not specifically searched for; Purple=non-coding region of genes not specifically searched for.

Document Bars

Below the Gene Bar are one or more additional bars that represent the most relevant (as determined by FABLE's relevance algorithm) MEDLINE documents that are associated with the gene. The first author of the document is shown adjacent to each bar. Clicking on a bar will display the MEDLINE abstract for the document. Holding the mouse over the bar will display all or a portion of the document title. The default number of documents shown is 10, or fewer if the search identified fewer than 10 documents.

Document bar colors: Gold=associated with the gene specifically searched for; Grey=associated with a gene not specifically searched for.

LitTrack Settings

Below the genomic visualization window, parameter settings for the LitTrack and other annotation tracks are listed. The LitTrack settings are listed first, under the category "Literature Hits". Various setting options to control the amount of information displayed in the window are included, using the standard UCSC Genome Browser conventions. The default setting is "Pack", which displays all Gene Bars and Document Bars in a space-efficient manner.

The link "Literature Hits" displays an advanced configuration page for LitTrack. Parameters include a setting for changing the maximum number of documents to display for each gene, and several settings for customizing the track display.

Other browser features

In general, all other features will work exactly like the corresponding features in the UCSC Genome Browser.