High quality draft genome sequence of Bacteroides barnesiae type strain BL2T (DSM 18169T) from chicken caecum

Bacteroides barnesiae Lan et al. 2006 is a species of the genus Bacteroides, which belongs to the family Bacteroidaceae. Strain BL2T is of interest because it was isolated from the gut of a chicken, and because of the growing awareness that the anaerobic microbiota of the caecum benefits the host and may affect poultry farming. The 3,621,509 bp genome, with its 3,059 protein-coding and 97 RNA genes, is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.


Introduction
Strain BL2T (= DSM 18169 = CCUG 54636 = JCM 13652) is the type strain of Bacteroides barnesiae, which belongs to the genus Bacteroides [1]. The species epithet is derived from the name of Ella M. Barnes, a British microbiologist who contributed much to our knowledge of intestinal bacteriology and of anaerobic bacteriology in general. B. barnesiae strain BL2T was isolated from the caecum of a healthy chicken. Four other strains belonging to the same species have been isolated from the same source [1]. The genus Bacteroides is one of the predominant anaerobic genera found in the chicken caecum [2][3][4]. Bacteroides species are thought to play a fundamental role in the breakdown of complex molecules (such as polysaccharides) into simpler compounds that are used by the animal host as well as by the microorganisms themselves [5,6], in the utilization of nitrogenous substances, and in the biotransformation of bile acids and other steroids [7]. They also help protect the gut against pathogenic microorganisms [8]. Here we present a summary classification and a set of features for B. barnesiae strain BL2T, together with a description of its draft genome sequencing and annotation.

Classification and features
A 1,301 bp contig contained the most complete 16S rRNA gene copy in the draft genome. This partial gene differed by 7 nucleotides (0.5 %) from the 16S rRNA reference sequence (AB253726) generated for the original description of B. barnesiae [1]. Such differences are not unusual when sequences generated at the time organisms were first described are compared with sequences from type-strain genomes sequenced in the KMG project [9], a problem that was only partially resolved by the Sequencing Orphan Species (SOS) initiative [10]. A representative 16S rRNA gene sequence of strain BL2T was compared with GenBank using NCBI BLAST. The most frequent genus among the hits was Bacteroides. The highest-scoring environmental sequences (up to 99.8 % sequence identity), including HQ784912 ('gastrointestinal specimens clone ELU0102-T240-S-NI_000093'), all came from a study of gastrointestinal specimens linked to inflammatory bowel disease phenotypes in the human ileum [11]; they suggest that close relatives of strain BL2T and representatives of B. barnesiae are also relevant to human health. Fig. 1 shows the phylogenetic position of B. barnesiae in a 16S rRNA gene sequence-based tree.
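The 0.5 % figure can be reproduced with simple arithmetic. The sketch below assumes that the 7 mismatches are counted over the full 1,301 bp of the partial gene; the original comparison may have used a slightly different aligned length, so treat the result as approximate. The function name `percent_divergence` is illustrative only.

```python
# Sketch: percent divergence and identity between two 16S rRNA gene copies.
# Assumption: the 7 mismatches are counted over the 1,301 bp partial gene;
# the published comparison may have used a slightly different aligned length.

def percent_divergence(mismatches: int, aligned_length: int) -> float:
    """Return the share of differing positions, in percent."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * mismatches / aligned_length


divergence = percent_divergence(7, 1301)  # about 0.54 %
identity = 100.0 - divergence             # about 99.46 %
print(f"divergence: {divergence:.2f} %, identity: {identity:.2f} %")
```

Rounded to one decimal place, the divergence is 0.5 %, matching the value given above.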

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position [12][13][14]. Sequencing of B. barnesiae strain BL2T is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project [9], which aims not only to increase the sequencing coverage of key reference microbial genomes [15] but also to generate a large genomic basis for the discovery of genes encoding novel enzymes [16]. The genome project is deposited in the Genomes OnLine Database [17], and the permanent draft genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute using state-of-the-art sequencing technology [18]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
B. barnesiae strain BL2T (DSM 18169) was grown anaerobically in DSMZ medium 429 (Columbia Blood Agar) at 37°C [19]. DNA was isolated from 0.5-1 g of cell paste using JetFlex genomic DNA purification (GENOMED), following the standard protocol recommended by the manufacturer with an additional proteinase K digestion (50 μl; 21 mg/ml) for 60 min at 58°C, followed by the addition of 200 μl Protein Precipitation Buffer after protein precipitation and overnight incubation on ice. DNA is available through the DNA Bank Network [20].

Genome sequencing and assembly
The permanent draft genome of B. barnesiae strain BL2T was generated using Illumina technology [18,21]. An Illumina standard shotgun library was constructed and sequenced on the Illumina HiSeq 2000 platform, which generated 11,109,700 reads totaling 1,666.5 Mb. All general aspects of library construction and sequencing performed at the DOE-JGI can be found in [22]. All raw Illumina sequence data were passed through DUK [23]. The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet [24], (2) 1-3 kb simulated paired-end reads were created from the Velvet contigs using wgsim [25], and (3) the Illumina reads were assembled together with the simulated read pairs using Allpaths-LG (version r41043) [26]. Parameters for the assembly steps were: 1) Velvet (velveth: 63 -shortPaired and velvetg: -very_clean yes -export-
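The reported read yield allows a rough estimate of the mean read length and of the average sequencing depth. This sketch assumes 1 Mb = 1,000,000 bp, treats the 1,666.5 Mb as raw output before DUK filtering, and uses the 3,621,509 bp assembly size as a stand-in for the true genome size; the depth actually used for assembly after filtering will be lower.

```python
# Rough sequencing-depth arithmetic from the reported read yield.
# Assumptions: 1 Mb = 1,000,000 bp; the yield refers to raw (pre-filtering)
# reads; the 3,621,509 bp assembly size stands in for the genome size.

READS = 11_109_700       # reads generated on the HiSeq 2000
YIELD_BP = 1666.5e6      # 1,666.5 Mb in total
GENOME_BP = 3_621_509    # size of the draft assembly

mean_read_length = YIELD_BP / READS   # average bases per read (about 150 bp)
raw_coverage = YIELD_BP / GENOME_BP   # average fold coverage before filtering (about 460x)

print(f"mean read length ~ {mean_read_length:.0f} bp")
print(f"raw coverage ~ {raw_coverage:.0f}x")
```

The mean of about 150 bp is consistent with single reads of 150 bp, although the read configuration is not stated in the text.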

Genome annotation
Genes were identified using Prodigal [27] as part of the DOE-JGI genome annotation pipeline [28,29], followed by a round of manual curation using the JGI GenePRIMP pipeline [30]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information non-redundant database, UniProt, TIGRFAM, Pfam, PRIAM, KEGG, COG and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes-Expert Review platform [31].

Genome properties
The assembly of the draft genome sequence consists of 43 scaffolds amounting to 3,621,509 bp, and the G + C content is 46.8 % (Table 3). Of the 3,156 genes predicted, 3,059 were protein-coding genes and 97 were RNA genes. The majority of the protein-coding genes (71.7 %) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
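The gene counts above can be cross-checked, and the G + C content is a simple ratio over the sequence. In the sketch below, the absolute number of genes with a putative function is derived from the rounded 71.7 % and is therefore approximate. The helper `gc_content` is a hypothetical illustration that ignores ambiguous bases such as N; the 46.8 % reported here was computed by the JGI pipeline, which may handle such bases differently.

```python
# Consistency check of the gene counts, plus a toy G+C content function.

PROTEIN_CODING = 3_059
RNA_GENES = 97
FRACTION_WITH_FUNCTION = 0.717  # share of protein-coding genes with a putative function

total_genes = PROTEIN_CODING + RNA_GENES                         # 3,156 as reported
with_function = round(PROTEIN_CODING * FRACTION_WITH_FUNCTION)  # approximate, from a rounded percentage
hypothetical = PROTEIN_CODING - with_function


def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(total_genes, with_function, hypothetical)
print(gc_content("ATGCGCNNAT"))  # toy sequence, not from the genome
```

Applied to the 43 scaffolds, the same function would give the genome-wide value; excluding N runs keeps scaffold gaps from lowering the estimate.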

Insights from the genome sequence
B. barnesiae strain BL2T, Bacteroides salanitronis strain BL78T and Bacteroides gallinarum strain C35T were isolated from the caecum of the same healthy chicken [1]. The GGDC (Genome-to-Genome Distance Calculator) web server (GGDC 2.0) [32] was used to estimate the overall similarity between the three Bacteroides genomes. The comparison of B. barnesiae with B. salanitronis and B. gallinarum revealed that 11.1 % and 5.2 %, respectively, of the average of the genome lengths are covered with HSPs (high-scoring segment pairs). The identity within the HSPs was 83.6 % and 84.6 %, respectively, whereas the identity over the whole genome was 9.3 % and 4.4 %, respectively. The comparison of B. gallinarum with B. salanitronis revealed that 5.4 % of the genome is covered with HSPs, with an identity within the HSPs of 84.1 % and an identity over the whole genome of 4.6 %. According to these calculations, B. barnesiae is more similar to B. salanitronis than it is to B. gallinarum, and more similar to B. salanitronis than B. gallinarum and B. salanitronis are to each other. The genome of B. barnesiae (3.6 Mb) is considerably smaller than those of B. salanitronis (4.3 Mb) and B. gallinarum (4.9 Mb).
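The three GGDC values reported for each pair are linked. Assuming that HSP coverage and whole-genome identity share the same denominator (total genome length), the identity over the whole genome equals the HSP coverage multiplied by the identity within the HSPs, since identities/total = (HSP length/total) × (identities/HSP length). As we understand GGDC, its distance formulas 1 to 3 are built from these same ratios. The sketch below applies this relationship to the reported values; the names `comparisons` and `whole_genome_identity` are illustrative, not part of GGDC.

```python
# Check that whole-genome identity ~ (HSP coverage) x (identity within HSPs).
# Assumption: coverage and whole-genome identity share the same denominator
# (total genome length), so identities/total = (HSP length/total) * (identities/HSP length).

comparisons = {
    # pair: (HSP coverage %, identity within HSPs %, reported whole-genome identity %)
    "B. barnesiae vs B. salanitronis": (11.1, 83.6, 9.3),
    "B. barnesiae vs B. gallinarum": (5.2, 84.6, 4.4),
    "B. gallinarum vs B. salanitronis": (5.4, 84.1, 4.6),
}


def whole_genome_identity(coverage_pct: float, hsp_identity_pct: float) -> float:
    """Identities per total genome length, in percent, from the two partial measures."""
    return coverage_pct * hsp_identity_pct / 100.0


for pair, (cov, hsp_id, reported) in comparisons.items():
    estimate = whole_genome_identity(cov, hsp_id)
    print(f"{pair}: estimated {estimate:.2f} %, reported {reported} %")
```

The estimates (about 9.28 %, 4.40 % and 4.54 %) agree with the reported values to within roughly 0.1 percentage points. The small gap in the B. gallinarum/B. salanitronis comparison is plausibly due to the inputs being rounded to one decimal place.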