BeeSpace Data Sources
Genome Databases:
GenBank (Gene Bank) is for DNA sequences. Computing sequence homology is a common method for determining whether two sequences are equivalent (via heuristic string similarity).
PDB (Protein Data Bank) is for proteins. Its entries are experimentally determined (structures together with their sequences), although most protein sequences are computed when used for matching purposes in gene clusters or annotations.
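As an illustration of what "heuristic string similarity" can mean, here is a minimal sketch that scores two sequences by the overlap of their short substrings (k-mers). This is an assumed stand-in chosen for brevity, not the method used by GenBank or by alignment tools such as BLAST; the function names and the example sequences are hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences, between 0 and 1."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0  # both sequences are shorter than k: nothing to compare
    return len(ka & kb) / len(ka | kb)


def best_match(query, candidates, k=3):
    """Return (name, score) for the candidate sequence most similar to query."""
    scored = ((name, kmer_similarity(query, seq, k)) for name, seq in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical sequences, made up for this example only.
candidates = {"seq_x": "ACGTACGA", "seq_y": "TTTTTTTT"}
name, score = best_match("ACGTACGT", candidates)
print(name, score)
```

A real homology search would use an alignment-based score and a statistical significance cutoff; the set overlap here only conveys the idea of ranking candidates by approximate string similarity.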
The NCBI (National Center for Biotechnology Information) Entrez query interface is at:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
There are also databases specifically for the bee genome, such as the one hosted by the EBI (European Bioinformatics Institute) at:
http://www.ensembl.org/Apis_mellifera/
Our collaborator Christine Elsik also hosts a bee genome browser at:
http://racerx00.tamu.edu/cgi-bin/gbrowse/bee_genome
Genome Classifications:
Gene Ontology (GO) is a classification scheme for identifying the functions of genes. It has three major meta-categories (ontologies): molecular function, biological process, and cellular component. The major model organisms are all annotated using GO by their local curators (human biologists). The overview paper is:
http://www.canis.uiuc.edu/~schatz/databases/gene.ontology.pdf
GO is often used as a cross-organism thesaurus, e.g. from flies to bees. Our gene microarray experiments use computed sequence homology to assign GO categories to bee genes, by matching bee protein sequences to equivalent fly genes whose categories have been assigned by human curators. The website and overview paper are:
http://titan.biotec.uiuc.edu/bee/honeybee_project.htm
http://www.canis.uiuc.edu/~schatz/databases/bee.microarray.pdf
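The homology-based assignment of GO categories described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the microarray project: the similarity score is Python's difflib ratio (a stand-in for a real homology score), the 0.5 threshold is arbitrary, and all gene identifiers, sequences, and GO labels are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in homology score: difflib's matching ratio, between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def transfer_go_terms(bee_proteins, fly_proteins, fly_go, threshold=0.5):
    """Copy curated fly GO terms to each bee gene whose best fly match scores >= threshold."""
    assigned = {}
    for bee_id, bee_seq in bee_proteins.items():
        best_id, best_score = None, 0.0
        for fly_id, fly_seq in fly_proteins.items():
            score = similarity(bee_seq, fly_seq)
            if score > best_score:
                best_id, best_score = fly_id, score
        # Only transfer annotations when the best match is good enough.
        if best_id is not None and best_score >= threshold:
            assigned[bee_id] = {
                "fly_match": best_id,
                "score": best_score,
                "go_terms": sorted(fly_go.get(best_id, [])),
            }
    return assigned


# Hypothetical data, made up for this example only.
bee_proteins = {"bee_gene_1": "MKTAYIAK", "bee_gene_2": "WWWWWWWW"}
fly_proteins = {"fly_gene_A": "MKTAYIAK", "fly_gene_B": "QQQQGGGG"}
fly_go = {"fly_gene_A": ["GO:EXAMPLE2", "GO:EXAMPLE1"]}

result = transfer_go_terms(bee_proteins, fly_proteins, fly_go)
print(result)
```

Bee genes with no sufficiently similar fly gene are simply left unannotated in this sketch, which reflects the general limitation of annotation by homology.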
Behavior is not well covered by existing classifications. An international standards committee is developing a Behavior Ontology, similar to the Gene Ontology, with hierarchical classification. See the Animal Behavior Metadata Standard at:
http://ethodata.comm.nsdl.org/cgi-bin/wiki.pl
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a classification scheme for identifying networks of genes. The graphs of genes in functional order are displayed with live links into the gene description databases of the model organisms. These graph networks represent the small known set of manually constructed clusters of related genes. The overview paper is:
http://www.canis.uiuc.edu/~schatz/databases/kegg.overview.nar.pdf
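To make the idea of a gene network "in functional order" concrete, here is a minimal sketch that stores a pathway as a directed graph and lists its genes in an order consistent with the edges, each paired with the database that describes it. The pathway, the gene names, and the gene-to-database mapping are all hypothetical, and this is not KEGG's data format or API; it assumes Python 3.9 or later for the standard graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical pathway: each gene maps to the set of genes acting immediately before it.
pathway_predecessors = {
    "geneA": set(),
    "geneB": {"geneA"},
    "geneC": {"geneB"},
    "geneD": {"geneC"},
}

# Hypothetical links from each gene to the model-organism database describing it.
gene_database = {
    "geneA": "FlyBase",
    "geneB": "FlyBase",
    "geneC": "WormBase",
    "geneD": "SGD",
}


def functional_order(predecessors):
    """Return the genes in an order where every gene follows its predecessors."""
    return list(TopologicalSorter(predecessors).static_order())


def describe(predecessors, database):
    """Pair each gene, in functional order, with the database that describes it."""
    return [(gene, database.get(gene, "unknown")) for gene in functional_order(predecessors)]


for gene, db in describe(pathway_predecessors, gene_database):
    print(gene, "->", db)
```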
Model Organisms:
The major classical organisms have decades of genetics behind them. Thus there are gene descriptions based on experiments with actual organisms. Other organisms typically do not have gene descriptions available; their genes are computationally determined by software algorithms from the sequences.
The classical models are worm and fly, mice and men, plus a single-cell organism and a model plant. Each model typically has a genome database containing its genetic information. In order of complexity: baker’s yeast (Saccharomyces) is a single cell; the nematode worm (C. elegans) has thousands of cells; the fruit fly (Drosophila) is an insect with millions of cells; the laboratory mouse (Mus musculus) is a mammal; humans have genetic diseases; and the mustard plant (Arabidopsis) is a model weed.
Fly FlyBase: http://flybase.bio.indiana.edu/
Worm WormBase: http://www.wormbase.org/
Mouse MGD: http://www.informatics.jax.org/
Human OMIM: http://www.nslij-genetics.org/search_omim.html
Yeast SGD: http://www.yeastgenome.org/
Plant TAIR: http://www.arabidopsis.org/