Table 1. Databases, tools, resources for genomes and annotation.

From: Solving the Problem: Genome Annotation Standards before the Data Deluge

Category/Title

Description

Reference

URL

General

   

NCBI Genome Annotation Workshop

All information from this publication, the Annotation Workshop, and future announcements will be made available on this page

 

http://www.ncbi.nlm.nih.gov/genomes/AnnotationWorkshop.html

Difference between Archive and Curated Databases

GenBank, RefSeq, TPA and UniProt: What’s in a Name?

Microbe Online

http://www.microbemagazine.org/index.php?option=com_content&view=article&id=1270:genbank-refseq-tpa-and-uniprot-whats-in-a-name&catid=347:letters&Itemid=646

Difference between Archive and Curated Databases

GenBank, RefSeq, TPA and UniProt: What’s in a Name?

NCBI Handbook

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch1#GenBank_ASM

INSDC

International Nucleotide Sequence Database Collaboration

 

http://www.insdc.org

INSDC Feature Table

Feature table document

 

http://www.insdc.org/documents/feature_table.html

DDBJ

DNA Data Bank of Japan

[35]

http://www.ddbj.nig.ac.jp

ENA

European Nucleotide Archive

[36]

http://www.ebi.ac.uk/ena

GenBank

GenBank

[20]

http://www.ncbi.nlm.nih.gov/genbank/index.html

Automated Annotation Providers

   

NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP)

Intended for use during the annotation of prokaryotic genomes in preparation for submission to GenBank; capable of annotating complete genomes as well as WGS genomes

 

http://www.ncbi.nlm.nih.gov/genomes/static/Pipeline.html

JCVI Annotation Service

Anyone with a prokaryotic genome sequence in need of annotation may submit it to the JCVI Annotation Service completely free of charge

 

http://www.jcvi.org/cms/research/projects/annotation-service/overview

IGS Annotation Engine

A free resource for genomics researchers and educators bringing advanced bioinformatics tools to the lab bench and the classroom.

 

http://ae.igs.umaryland.edu/cgi/index.cgi

KAAS - KEGG Automatic Annotation Server

KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database, with resulting KO (KEGG Orthology) assignments and automatically generated KEGG pathways

[37]

http://www.genome.jp/tools/kaas

RAST

RAST (Rapid Annotation using Subsystem Technology) is a fully automated service for annotating bacterial and archaeal genomes; it provides high-quality genome annotations for these genomes across the whole phylogenetic tree

[38]

http://rast.nmpdr.org

DOE-JGI MAP

Expert Review Data Submission: Microbial Genomes & Management

[39]

http://img.jgi.doe.gov/cgi-bin/submit/main.cgi

Annotation Cleanup, Analyses, and Validation Tools

   

NCBI Submission Check Tool

For the validation of genome submissions to GenBank — utilizes a series of self-consistency checks as well as comparison of submitted annotations to computed annotations — web-based and downloadable versions available

 

http://www.ncbi.nlm.nih.gov/genomes/frameshifts/frameshifts.cgi

NCBI Sequin Validation

Sequin is a standalone tool for submitting and updating sequences

[20]

http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.html

NCBI TBL2ASN

Command-line tool that automates the creation of sequence records for submission to GenBank

[20]

http://www.ncbi.nlm.nih.gov/Genbank/tbl2asn2.html

NCBI Discrepancy report

Evaluation of ASN.1 files for annotation discrepancies; part of Sequin and of tbl2asn, and also available separately as a downloadable command-line version

[20]

http://www.ncbi.nlm.nih.gov/Genbank/asndisc.html

Broad’s Gene Pidgin (formerly BioName)


JCVI’s Protein Naming Utility


Frameshift Tool


Annotation Report


Annotation Guidelines

   

GenBank Bacterial Genome Submission Guidelines


Annotation Instructions


Project Submission


Locus_tag proposal


UniProt’s Protein Naming Guidelines

UniProt’s prokaryotic-specific protein naming guidelines — adopted by INSDC

 

http://www.uniprot.org/docs/nameprot

GSC Structured Format

Accepted structured format for genome metadata including SOPs

[43]

http://gensc.org/gc_wiki/index.php/MIGS/MIMS/MIENS

Insertion Sequences

Insertion sequence finder, nomenclature, and registry

[44]

http://www-is.biotoul.fr/

Transposons

Transposon nomenclature and registry

[45]

http://www.ucl.ac.uk/eastman/tn/

Enzyme Commission Numbers

Official NC-IUBMB site

 

http://www.chem.qmul.ac.uk/iubmb/enzyme/

UniProt ENZYME

ENZYME is a repository of information relative to the nomenclature of enzymes.

 

http://ca.expasy.org/enzyme/

Functional Annotation/Protein Families

   

NCBI COGs

Clusters of orthologous groups - no longer actively curated

[46]

http://www.ncbi.nlm.nih.gov/COG/

NCBI ProtClustDB

Cliques of related proteins, curated and uncurated, for multiple organism groups, including prokaryotes and viruses

[33]

http://www.ncbi.nlm.nih.gov/proteinclusters

NCBI Cluster Comparison Tool

Protein family comparison for functional annotation

 

http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi

NCBI Cluster Comparison Tool - Core Mode

Protein family core comparison for functional annotation

 

http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi?core=on

List of Core Clusters

Protein family core list

 

http://www.ncbi.nlm.nih.gov/sutils/clustcomp.cgi?report=core

UniProt HAMAP

A system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies in prokaryotes and plastids

[47]

http://ca.expasy.org/sprot/hamap/

KEGG Orthology Groups

Manually defined ortholog groups that correspond to KEGG pathway nodes and BRITE hierarchy nodes

[48]

http://www.genome.jp/kegg/ko.html

JCVI’s TIGRFAMs

Protein families based on Hidden Markov Models

[49]

http://www.jcvi.org/cms/research/projects/tigrfams/overview/

ACLAME

Database dedicated to the collection and classification of mobile genetic elements

[50]

http://aclame.ulb.ac.be/

E. coli CCDS Project

Comparison of annotations for the model strain E. coli K-12 MG1655

 

http://www.ncbi.nlm.nih.gov/genomes/MICROBES/ecok12.cgi
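The table points to Sequin and tbl2asn for preparing GenBank submissions and to the locus_tag rules for naming genes. As a minimal sketch, assuming the tab-delimited five-column feature table layout that Sequin and tbl2asn read (start, stop, feature key on feature lines; qualifier key and value after three empty columns on qualifier lines), the Python below writes gene/CDS features and rejects malformed locus_tags. The locus_tag pattern is an assumption (a 3-12 character alphanumeric prefix not starting with a digit, an underscore, then an alphanumeric suffix) and should be checked against current INSDC/NCBI guidance. The names `Gene`, `feature_table`, `LOCUS_TAG_RE`, the sequence ID "contig1", the prefix "ABC", and the product names are all hypothetical.

```python
"""Sketch: write gene/CDS features in the five-column feature table
format read by Sequin/tbl2asn. All IDs, prefixes, and product names
are hypothetical examples."""
import re
from dataclasses import dataclass

# ASSUMPTION: locus_tag = 3-12 alphanumeric prefix not starting with a
# digit, then "_", then an alphanumeric suffix. Verify against the
# current INSDC/NCBI locus_tag rules before relying on this pattern.
LOCUS_TAG_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,11}_[A-Za-z0-9]+")


@dataclass
class Gene:
    start: int      # 1-based position of the first base
    stop: int       # 1-based; stop < start denotes the minus strand
    locus_tag: str  # hypothetical, e.g. "ABC_0001"
    product: str    # protein name placed on the CDS


def feature_table(seq_id: str, genes: list[Gene]) -> str:
    """Return a feature table for one sequence as a single string."""
    lines = [f">Feature {seq_id}"]  # header names the FASTA sequence ID
    for g in genes:
        if not LOCUS_TAG_RE.fullmatch(g.locus_tag):
            raise ValueError(f"malformed locus_tag: {g.locus_tag!r}")
        # Feature lines fill columns 1-3: start, stop, feature key.
        lines.append(f"{g.start}\t{g.stop}\tgene")
        # Qualifier lines leave columns 1-3 empty, then key and value.
        lines.append(f"\t\t\tlocus_tag\t{g.locus_tag}")
        lines.append(f"{g.start}\t{g.stop}\tCDS")
        lines.append(f"\t\t\tproduct\t{g.product}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Gene(1, 300, "ABC_0001", "hypothetical protein"),
        Gene(900, 400, "ABC_0002", "DNA polymerase III subunit beta"),
    ]
    print(feature_table("contig1", demo), end="")
```

Running the demo prints a `>Feature contig1` header followed by a gene line and a CDS line, each with its qualifier, for every entry; the second demo gene uses stop < start to mark a minus-strand feature. Validating locus_tags before writing the file keeps this class of error out of the submission rather than waiting for a validator to report it.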