Firefox is recommended for browsing subsystems
Overview of the BeeSpace Question-Answering System
            Our goal is to answer questions that biologists naturally ask, such as "What genes are involved in foraging behavior?" or "What are possible regulators of brain gene expression?" To this end, we identify entities and the relationships between them in a large body of literature. The initial collection of 38,844 documents came directly from FlyBase as their official set. We update the document collection weekly with the latest Medline abstracts.

Subsystems
  • Entity Enrichment
    This subsystem identifies important concepts, in the form of entities, related to a free text query.
  • Relation Mining
    This subsystem supports question-answering by utilizing entity-relation semantics.

Entities and Relations
            The system supports multiple types of entities of interest to insect and other biologists, including:
  • Genes          (The system accepts gene symbols of Drosophila genes from FlyBase)
  • Anatomy      (The system accepts FlyBase anatomy terms listed here)
  • Behavior     (curated by experts – the list can be downloaded here)
  • Chemical     (curated by experts – the list can be downloaded here)

            Entity recognition is done by matching words against entity dictionaries, supplemented by disambiguation techniques for gene names. The relations we recognize among entities include:
  • Regulation: an interaction relationship describing one gene being regulated by another, identified through matching with predefined syntactic patterns.
  • Expression: a gene-anatomy relationship describing the physical location where a gene is expressed, identified through keywords such as "expression" and "localized".
  • Function: a gene-chemical or gene-behavior relationship describing the molecular function or biological process that a gene regulates, identified through co-occurrence in the same sentence.
  • Annotation: the gene-GO relationship describing the GO terms of a gene, imported from FlyBase.
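The dictionary matching and same-sentence co-occurrence described above can be illustrated with a minimal Python sketch. It is not the BeeSpace implementation: the tiny dictionaries, the function names find_entities and function_relations, and the naive sentence splitter are all assumptions made for illustration, and gene-name disambiguation is left out entirely.

```python
import re

# Hypothetical miniature dictionaries (assumption); the real system uses
# FlyBase gene symbols and expert-curated behavior lists.
ENTITY_DICTIONARIES = {
    "gene": {"fru", "hb"},
    "behavior": {"courtship behavior", "foraging behavior"},
}

def find_entities(sentence):
    """Return (type, term) pairs whose dictionary term occurs in the sentence."""
    found = set()
    for entity_type, terms in ENTITY_DICTIONARIES.items():
        for term in terms:
            # Whole-word, case-sensitive match; no disambiguation is attempted.
            if re.search(r"\b" + re.escape(term) + r"\b", sentence):
                found.add((entity_type, term))
    return found

def function_relations(text):
    """Pair every gene with every behavior mentioned in the same sentence."""
    relations = set()
    # Naive sentence splitter (assumption): breaks after '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = find_entities(sentence)
        genes = [term for kind, term in entities if kind == "gene"]
        behaviors = [term for kind, term in entities if kind == "behavior"]
        for gene in genes:
            for behavior in behaviors:
                relations.add((gene, behavior))
    return relations

text = "The fru gene controls male courtship behavior. Larvae show foraging behavior."
print(sorted(function_relations(text)))  # [('fru', 'courtship behavior')]
```

The second sentence mentions a behavior but no gene, so it contributes no relation. A plain dictionary match like this would also tag ordinary words that happen to be gene symbols, which is why the system adds disambiguation for gene names.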
 

Entity Enrichment Subsystem
Overview:

            The first subsystem identifies important entities related to a free text query. After retrieving documents for a query, the system uses a statistical test to rank all entities of each type, based on their overrepresentation in returned documents against the entire collection. As one scenario, a user may query for a biological process and the system will retrieve related documents with the recognized entities highlighted (and color coded); the system will suggest a list of important genes involved in this process derived from the retrieved documents.
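The text above does not name the statistical test used. As a sketch only, the following Python code ranks entities by overrepresentation using a one-sided hypergeometric test, which is one common choice for this kind of comparison and is an assumption here. The function names, the counts, and the collection size of 1,000 are invented for illustration.

```python
from math import comb, log10

def enrichment_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: documents in the whole collection
    K: documents in the collection that mention the entity
    n: documents returned for the query
    k: returned documents that mention the entity
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def enrichment_score(k, n, K, N):
    """Higher score means stronger overrepresentation (-log10 of the p-value)."""
    p = enrichment_p_value(k, n, K, N)
    return float("inf") if p == 0 else -log10(p)

# Made-up counts (assumption): entity -> (hits in returned docs, hits in collection)
COLLECTION_SIZE = 1000
RETURNED = 50
counts = {"fru": (12, 20), "hb": (3, 200)}

ranked = sorted(
    ((name, enrichment_score(k, RETURNED, K, COLLECTION_SIZE))
     for name, (k, K) in counts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}\t{score:.2f}")
```

In this toy data "fru" appears in 12 of 50 returned documents but only 20 of 1,000 overall, so it ranks well above "hb", which is common throughout the collection and appears less often than chance would predict in the returned set.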

Running the system:

            To run the entity subsystem, enter your query in the search box in the BeeSpace Semantic Search tab and press Enter or the go button (magnifying glass icon). Once the area below populates with the pertinent abstracts, you can look for enriched entities. For example, if you are interested in genes associated with your query, click the “Show Enriched Genes” button along the left side of the interface, and then click the “Enriched Genes” tab. Similarly, if you are interested in enriched behaviors, chemicals, or anatomy terms associated with your query, click the relevant button along the left and proceed to the relevant tab. Each enriched-entity tab has three columns: entity identity, Score, and Doc List. The score denotes statistical significance, based on the entity's overrepresentation in the returned documents relative to the entire collection: the higher the score, the higher the significance. The Doc List contains the PMIDs of the abstracts that link the query with the entity.

            The BeeSpace Semantic Search tab contains all the abstracts returned for your query; an abstract can be viewed by clicking the “+” to the left of its title. Within the abstract, the relevant entities are highlighted: orange for anatomy, blue for behavior, yellow for chemical, and green for gene. Each highlighted entity is hyperlinked to its respective database: FlyBase (for genes and anatomy), PubMed (for behavior), and PubChem (for chemicals).


Examples to try on the entity enrichment subsystem:
- courtship
- chemosensory
- juvenile hormone

Relation Mining Subsystem
Overview:

            The second subsystem supports question-answering by utilizing entity-relation semantics. We provide template queries on the relational database, so the user can submit a question simply by filling in one or two fields. The templates correspond to the relations regulation, expression, and function. Returned abstracts contain highlighted (color-coded) entities with live links to FlyBase, PubMed, and PubChem, as in the entity subsystem. We also allow queries that join multiple relations, so one can ask sophisticated questions such as "find transcription factors expressed in honey bee brains". This is a powerful approach that can uncover facts described across articles.
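To make the idea of a template query that joins two relations concrete, here is a small Python sketch using an in-memory SQLite database. The schema (tables expression and annotation), the function name, the gene names geneA to geneC, and the placeholder PMIDs are all hypothetical; the actual BeeSpace database layout is not described here.

```python
import sqlite3

# Hypothetical schema (assumption): one table per relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expression (gene TEXT, anatomy TEXT, pmid INTEGER);
CREATE TABLE annotation (gene TEXT, go_term TEXT);
""")

# Placeholder rows for illustration only; these are not real findings or PMIDs.
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("geneA", "neuron", 1), ("geneB", "neuron", 2), ("geneC", "wing", 3)],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("geneA", "GO:0007417"), ("geneB", "GO:0005184"), ("geneC", "GO:0007417")],
)

def genes_expressed_in_with_go(conn, anatomy, go_term):
    """Template: genes expressed in body part X and annotated by GO term Y."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.gene
        FROM expression AS e
        JOIN annotation AS a ON a.gene = e.gene
        WHERE e.anatomy = ? AND a.go_term = ?
        ORDER BY e.gene
        """,
        (anatomy, go_term),
    )
    return [gene for (gene,) in rows]

# The user fills in only the two fields X and Y.
print(genes_expressed_in_with_go(conn, "neuron", "GO:0007417"))  # ['geneA']
```

Because the two relations are stored separately, the join can combine an expression fact from one article with an annotation imported from elsewhere, which is how a single template can surface facts that no one abstract states.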

Running the system:

           The relations subsystem allows extraction of information about several different combinations of relationships, as described below.
  • Find all genes that are expressed in the anatomic term X
    eg. X = brain
    eg. X = wing
    eg. X = wing margin
  • Find the body parts where the gene X is expressed
    eg. X = hb
    eg. X = Sxl
    eg. X = nos
  • Find all genes that may be related to the behavior X
    eg. X = foraging behavior
  • Find all types of behavior that the gene X is likely to influence
    eg. X = fru
    eg. X = tra
  • Find the target genes that are regulated by the regulatory gene X
    eg. X = bcd
  • Find the genes that regulate the expression of the gene X
    eg. X = hb
  • Find the genes that are expressed in body part X, and annotated by the GO term Y
    eg. X = neuron
    eg. Y = GO:0007417 (central nervous system development)
  • Find genes involved in behavior X with GO term Y
    eg. X = ecdysis behavior and Y = GO:0005184 (neuropeptide hormone activity)
  • Find genes that are expressed in anatomy part X and Y (Intersection)
    eg. X = mushroom body and Y = larva
  • Find genes that are expressed in anatomy part X or Y (Union)
    eg. X = mushroom body and Y = larva
  • Find pairs of regulator-target that are expressed in tissue X
    eg. X = embryo
    eg. X = larva
    eg. X = follicle cell
  • Find genes associated with behavior X and Y
    eg. X = courtship behavior and Y = reproductive behavior

           
For example, if you are interested in finding the anatomical regions where a certain gene is expressed:
  • Select “Find the body parts where the gene X is expressed” in the pull-down menu in the relation search panel (left panel).
  • Enter the gene name (gene symbol only) in the box next to “X:” and press the search button at the bottom of the left panel.
  • Once the search is done, the search results (right panel) will populate.
  • The abstracts that display the relationship will be clustered (in this example, by anatomy), and the number of items that show evidence for the relationship will be displayed. The ‘+’ and ‘–’ signs on the left let you see more or less information, respectively. All the entities will be highlighted (color-coded) and hyperlinked to FlyBase (genes and anatomy), PubMed (behaviors), and PubChem (chemicals), as in the entity subsystem described above.

Procedure and Evaluation of Entity and Relation Extraction
1. Entity Recognition
1.1 Method

The detailed description of the Gene Recognition can be downloaded here
1.2 Evaluation of Entity Recognition

The evaluation procedure and measure can be downloaded here
Evaluation data for all genes and anatomy terms can be downloaded here
Evaluation data for the ambiguous genes can be downloaded here

2. Relation Extraction

The method for extraction of relationships between entities can be downloaded here
2.1 Full list of 32 patterns for gene-gene relation extraction is available here
2.2 Full list of keywords for gene-anatomy relation extraction is available here

The BeeSpace Question-Answering System is developed by:

Faculty:
Prof. Bruce Schatz
Prof. Chengxiang Zhai

Students and staff:
Xin He
Yanen Li
Radhika Khetani
Barry Sanders
Yue Lu
Xu Ling

BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects. Xin He, Yanen Li, Radhika Khetani, Barry Sanders, Yue Lu, Xu Ling, ChengXiang Zhai, Bruce Schatz. Nucleic Acids Research 2010, 38:W175-W181.

Free-access links to the online article:
 Abstract:
 http://nar.oxfordjournals.org/cgi/content/abstract/38/suppl_2/W175
 Full Text:
 http://nar.oxfordjournals.org/cgi/content/full/38/suppl_2/W175
 PDF:
 http://nar.oxfordjournals.org/cgi/reprint/38/suppl_2/W175

If you have any questions or feedback, please send an email to: beespace-help@igb.uiuc.edu

 

 


Powered by The BeeSpace Team at University of Illinois
Copyright 2010