BeeSpace Data Sources

Genome Databases:

GenBank (Gene Bank) is the archive for DNA sequences.  Computing sequence homology is a common method for determining whether two sequences are equivalent (via heuristic string similarity).
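
To make "heuristic string similarity" concrete: homology searches against GenBank are usually run with BLAST, which gains speed by seeding on short exact word matches and extending them. The sketch below computes the exact quantity those heuristics approximate, a Smith-Waterman local alignment score. The function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not the parameters of any NCBI tool.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    The default scores are illustrative, not those of any NCBI tool.
    """
    a, b = a.upper(), b.upper()
    # prev[j] is the best score of a local alignment ending at a[i-1], b[j-1];
    # scores are floored at 0, meaning "start a fresh alignment here".
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap inserted in b
            left = curr[j - 1] + gap    # gap inserted in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGTTT", "ACGAAA"))  # 6: the shared "ACG"
```

Because only the best-scoring local region counts, two genes can be judged homologous even when they differ at their ends.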

PDB (Protein Data Bank) is for protein structures: it contains experimentally determined three-dimensional structures of proteins, which include their sequences.  Most protein sequences used for matching purposes in gene clusters or annotations, however, are not determined experimentally but computed from DNA sequences.
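
Such computed protein sequences are typically obtained by translation: reading the coding DNA three bases at a time and mapping each codon to an amino acid through the genetic code. A minimal sketch using the standard code (NCBI translation table 1); `CODON_TABLE` and `translate` are illustrative names, and real pipelines also handle alternative genetic codes and choose the reading frame first.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in the order T, C, A, G at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate coding DNA into a one-letter protein sequence.

    '*' marks a stop codon, codons with ambiguous bases (e.g. N) become 'X',
    and a trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


print(translate("ATGGCCTAA"))  # MA*
```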

The NCBI (National Center for Biotechnology Information), part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), hosts GenBank and the Entrez system for searching these archives:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

There are also databases specifically for the bee genome, such as the Ensembl honey bee database hosted by the EBI (European Bioinformatics Institute, near Cambridge, England).

http://www.ensembl.org/Apis_mellifera/

Our collaborator Christine Elsik at Texas A&M University is developing BeeBase.

http://racerx00.tamu.edu/cgi-bin/gbrowse/bee_genome

Genome Classifications:

Gene Ontology (GO) is a classification scheme for identifying the functions of genes.  It has three major meta-categories (ontologies): molecular function, biological process, and cellular component.  The genes of the major model organisms are all annotated with GO terms by the curators (human biologists) of each organism's database.  The website and overview paper are:

http://www.geneontology.org/

http://www.canis.uiuc.edu/~schatz/databases/gene.ontology.pdf
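
Because GO terms form a hierarchy (a directed acyclic graph, in which a term may have several parents), a gene annotated with a specific term is implicitly annotated with all of that term's ancestors. A minimal sketch of that propagation; the term names and parent links below are a made-up fragment for illustration, not taken from the GO release files, and `ancestors` and `propagate` are hypothetical helpers.

```python
from collections.abc import Iterable, Mapping

# Hypothetical ontology fragment: each term maps to its parent terms.
# Names echo GO's style, but the links are illustrative, not from GO files.
PARENTS = {
    "biological_process": [],
    "behavior": ["biological_process"],
    "learning": ["behavior"],
    "sensory perception of smell": ["biological_process"],
    "olfactory learning": ["learning", "sensory perception of smell"],
}


def ancestors(term: str, parents: Mapping[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent links."""
    found: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def propagate(annotations: Mapping[str, Iterable[str]],
              parents: Mapping[str, list[str]]) -> dict[str, set[str]]:
    """Expand each gene's direct annotations with all their ancestor terms."""
    result = {}
    for gene, terms in annotations.items():
        direct = set(terms)
        expanded = set(direct)
        for term in direct:
            expanded |= ancestors(term, parents)
        result[gene] = expanded
    return result


# A gene annotated only with "olfactory learning" also counts under
# "learning", "behavior", and the other ancestors.
print(sorted(propagate({"gene_1": ["olfactory learning"]}, PARENTS)["gene_1"]))
```

Walking the graph rather than a single parent chain matters because a term can sit under several branches at once, as "olfactory learning" does here.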

GO is often used as a cross-organism thesaurus, e.g. from flies to bees.  Our gene microarray experiments use computed sequence homology to assign GO categories to bee genes: bee protein sequences are matched to equivalent fly genes, whose categories have been assigned by human curators.  The website and overview paper are:

http://titan.biotec.uiuc.edu/bee/honeybee_project.htm

http://www.canis.uiuc.edu/~schatz/databases/bee.microarray.pdf
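
The transfer step can be pictured as: for each bee gene, take its best-scoring fly match and, if the match is strong enough, copy that fly gene's curated GO terms. This is a simplified best-hit sketch, not the project's actual pipeline; the gene identifiers, scores, the `transfer_go` helper, and the `min_score` cutoff are all hypothetical.

```python
def transfer_go(hits: dict[str, list[tuple[str, float]]],
                fly_go: dict[str, set[str]],
                min_score: float = 50.0) -> dict[str, set[str]]:
    """Assign GO terms to bee genes from their best-scoring fly homolog.

    hits:      bee gene -> list of (fly gene, similarity score)
    fly_go:    fly gene -> set of curator-assigned GO terms
    min_score: hypothetical cutoff below which nothing is transferred
    """
    assigned = {}
    for bee_gene, candidates in hits.items():
        if not candidates:
            continue  # no fly match found for this bee gene
        fly_gene, score = max(candidates, key=lambda hit: hit[1])
        if score >= min_score and fly_go.get(fly_gene):
            assigned[bee_gene] = set(fly_go[fly_gene])
    return assigned


hits = {
    "bee_gene_1": [("fly_gene_A", 120.0), ("fly_gene_B", 80.0)],
    "bee_gene_2": [("fly_gene_B", 30.0)],  # too weak: nothing transferred
    "bee_gene_3": [],                      # no fly match at all
}
fly_go = {
    "fly_gene_A": {"olfactory learning"},
    "fly_gene_B": {"cellular_component"},
}
print(transfer_go(hits, fly_go))
```

Taking only the single best hit keeps the sketch simple; a real pipeline might also require the match to be reciprocal or combine terms from several close homologs.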

Behavior is not well covered by existing classifications.  An international standards committee is developing a Behavior Ontology that, like the Gene Ontology, uses a hierarchical classification.  See the Animal Behavior Metadata Standard at:

http://ethodata.comm.nsdl.org/cgi-bin/wiki.pl

Kyoto Encyclopedia of Genes and Genomes (KEGG) is a classification scheme for identifying networks of genes.  Its pathway maps display graphs of genes in functional order, with live links into the gene description databases of the model organisms.  These graph networks represent the small set of known clusters of related genes, which have been assembled manually.  The website and overview paper are:

http://www.genome.jp/kegg/

http://www.canis.uiuc.edu/~schatz/databases/kegg.overview.nar.pdf
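
Since a pathway map is a directed graph, "functional order" amounts to a topological ordering of its genes, and the genes affected by a given one are those reachable downstream. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the four-enzyme chain is a hand-written fragment modeled on the first steps of glycolysis, not data downloaded from KEGG, and `functional_order` and `downstream` are hypothetical helpers. Real KEGG maps are branched networks rather than a single chain.

```python
from graphlib import TopologicalSorter

# Hand-written fragment modeled on the first four steps of glycolysis.
# Each enzyme maps to the set of enzymes whose product it consumes
# (its predecessors), which is the input format TopologicalSorter expects.
PATHWAY = {
    "phosphoglucose isomerase": {"hexokinase"},
    "phosphofructokinase": {"phosphoglucose isomerase"},
    "aldolase": {"phosphofructokinase"},
}


def functional_order(pathway: dict[str, set[str]]) -> list[str]:
    """List genes so that each appears after the genes feeding into it."""
    return list(TopologicalSorter(pathway).static_order())


def downstream(pathway: dict[str, set[str]], gene: str) -> set[str]:
    """Return every gene reachable from `gene` by following edges forward."""
    successors: dict[str, set[str]] = {}
    for node, preds in pathway.items():
        for pred in preds:
            successors.setdefault(pred, set()).add(node)
    seen: set[str] = set()
    stack = [gene]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(" -> ".join(functional_order(PATHWAY)))
print(sorted(downstream(PATHWAY, "phosphoglucose isomerase")))
```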

Model Organisms:

The major classical organisms have decades of genetics behind them, so their gene descriptions are based on experiments with actual organisms.  Other organisms typically have no such descriptions available; their genes are determined computationally, by software algorithms that predict genes from the genome sequence.
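
The simplest form of computational gene finding is scanning for open reading frames (ORFs): a start codon (ATG) followed, in the same frame, by a stop codon (TAA, TAG, or TGA). Gene predictors for eukaryotic genomes such as the bee's use far richer models (splice sites, codon statistics, homology evidence), so this forward-strand, intron-free sketch only illustrates the idea; `longest_orf` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop open reading frame on the forward strand.

    Only the three forward frames are scanned; the reverse complement,
    introns, and minimum-length filters are deliberately left out.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if j + 3 - i > len(best):
                            best = seq[i:j + 3]
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
```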

The classical models are worm and fly, mice and men, plus a single-celled organism and a model plant.  Each model typically has its own organism database (genome database) containing its genetic information.  In order of complexity: baker’s yeast (Saccharomyces) is a single-celled organism; the nematode worm (C. elegans) has about a thousand cells; the fruit fly (Drosophila) is an insect with millions of cells; the laboratory mouse (Mus musculus) is a mammal; humans are studied largely through their genetic diseases; and the mustard plant (Arabidopsis) is a model weed.

Fly     FlyBase:    http://flybase.bio.indiana.edu/
Worm    WormBase:   http://www.wormbase.org/
Mouse   MGD:        http://www.informatics.jax.org/
Human   OMIM:       http://www.nslij-genetics.org/search_omim.html
Yeast   SGD:        http://www.yeastgenome.org/
Plant   TAIR:       http://www.arabidopsis.org/