High-quality draft genome sequences of five anaerobic oral bacteria and description of Peptoanaerobacter stomatis gen. nov., sp. nov., a new member of the family Peptostreptococcaceae

Here we report a summary classification and the features of five anaerobic oral bacteria from the family Peptostreptococcaceae. Bacterial strains were isolated from human subgingival plaque. Strains ACC19a, CM2, CM5, and OBRC8 represent the first known cultivable members of “yet uncultured” human oral taxon 081; strain AS15 belongs to “cultivable” human oral taxon 377. Based on 16S rRNA gene sequence comparisons, strains ACC19a, CM2, CM5, and OBRC8 are distantly related to Eubacterium yurii subsp. yurii and Filifactor alocis, with 93.2-94.4 % and 85.5 % sequence identity, respectively. The genomes of strains ACC19a, CM2, CM5, OBRC8 and AS15 are 2,541,543; 2,312,592; 2,594,242; 2,553,276; and 2,654,638 bp long, respectively. The genomes comprise 2277, 1973, 2325, 2277, and 2308 protein-coding genes and 54, 57, 54, 36, and 28 RNA genes, respectively. Based on the distinct characteristics presented here, we suggest that strains ACC19a, CM2, CM5, and OBRC8 represent a novel genus and species within the family Peptostreptococcaceae, for which we propose the name Peptoanaerobacter stomatis gen. nov., sp. nov. The type strain is strain ACC19aT (=HM-483T; =DSM 28705T; =ATCC BAA-2665T). Electronic supplementary material: the online version of this article (doi:10.1186/s40793-015-0027-8) contains supplementary material, which is available to authorized users.


Introduction
The oral cavity is a major gateway to the human body [1] and one of the principal sites of interest to the Human Microbiome Project (HMP), which aims to characterize this microbiome and understand its role in health and disease.
The 16S rRNA surveys and metagenomic analyses indicate that the typical oral community is comprised of over 700 bacterial species [2][3][4], approximately half of which have been isolated in culture and formally named. The rest remain uncultivated or unclassified [1,5]. Anaerobic species are of particular importance as they constitute approximately one half of the human oral microbiome [6][7][8] and likely play an important role in the function of the oral microbial community.
The Human Oral Microbiome Database (HOMD) provides comprehensive information on currently known prokaryote species and presents a provisional "oral taxa" naming scheme for the presently unnamed cultivable and uncultivable species. HOMD also provides links to genome sequencing projects of oral bacteria [9]. Annotated genomes for 381 oral taxa are currently available at HOMD.
Five anaerobic strains ACC19a, CM2, CM5, OBRC8, and AS15 from the family Peptostreptococcaceae were isolated earlier from the subgingival plaque obtained from two young African American and two young Caucasian females. Cultivation techniques were described before [10].
Family Peptostreptococcaceae currently is represented by five validly-named genera, Anaerosphaera, Filifactor, Peptostreptococcus, Sporacetigenium, and Tepidibacter [11,12], and several unclassified species. At this time, genome sequences of oral bacteria from the family Peptostreptococcaceae are available for three strains of Peptostreptococcus anaerobius, one strain of P. stomatis, one strain of Filifactor alocis, and one strain of unclassified Eubacterium yurii subsp. margaretiae.
According to HOMD, the genera Peptostreptococcus and Filifactor are represented by three oral taxa, while the other eleven Peptostreptococcaceae oral taxa remain formally unclassified. To date, only two unclassified oral taxa are represented by cultivable isolates, whereas nine stay "yet uncultured" and are known only by their molecular signatures. Strains ACC19a, CM2, CM5, and OBRC8 described here represent the first known cultivable members of "yet uncultured" human oral taxon 081; strain AS15 is classified as a member of "cultivable" oral taxon 377.
Cells of strains ACC19a, CM2, CM5, and OBRC8 are non-spore-forming, highly motile, peritrichous rods with round ends; cells often form chains. Cells of strain AS15 are motile, monotrichous, straight rods with square ends that often form rosettes or brushlike aggregates (Table 1, Fig. 2). On liquid TY medium, cells of strains ACC19a, CM2, CM5, and OBRC8 range from 1.0 to 3.4 μm in length and from 0.4 to 0.8 μm in width; cells of strain AS15 are 1.5-4.7 μm long and 0.4-0.5 μm wide (Table 1, Fig. 2). Cells are Gram-positive, structurally and by staining (Table 1, Fig. 2). After 48-72 h incubation on TY blood agar plates at 37 °C, strains ACC19a, CM2, CM5, and OBRC8 formed pin-point, beige, circular, convex, non-hemolytic colonies, approximately 0.5 mm in diameter. Colonies of strain AS15 are circular, umbonate, alpha-hemolytic, yellow-greenish in pigment, 1 mm in diameter after 48-72 h, and 2-3 mm in diameter after 168 h. Isolated strains grew only under strict anaerobic conditions. Growth occurred from 30 to 42 °C, with optimum growth at 37 °C.
Fig. 1 Maximum-Likelihood phylogenetic tree based on 16S rRNA gene sequence comparisons of strains ACC19a, CM2, CM5, OBRC8, and AS15 (shown in bold) together with other representatives of the Peptostreptococcaceae family and other related human bacteria. The tree was derived based on the Tamura-Nei model using MEGA 5 [39]. Bootstrap values > 50 % calculated for 1000 subsets are shown at branch-points. Bar, 0.02 substitutions per position. Strains whose genomes have been sequenced are marked with an asterisk.
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37,38].
All isolates were susceptible to discs containing 1 mg kanamycin, 2 units penicillin, 60 μg erythromycin, 30 μg chloramphenicol, 30 μg tetracycline and bile. Catalase, oxidase and urease activities were negative; nitrate reduction was not detected, gelatin was not liquefied, and aesculin was not hydrolyzed. Strains ACC19a, CM2, CM5, and OBRC8 did not produce indole, while strain AS15 did produce indole (Table 1). All strains were able to grow on 2.0-10 g l⁻¹ of yeast extract, but not on casamino acids. No visible biomass was formed in medium with 0.5-2.0 g l⁻¹ of yeast extract only. All five strains produced acid on API 20A media containing glucose, maltose and sucrose, but not lactose, arabinose, cellobiose, mannose, melezitose, raffinose, rhamnose, trehalose, xylose, glycerol, mannitol, salicin and sorbitol. All produced gas on TY liquid medium. In liquid medium supplemented with 5.0 g l⁻¹ of yeast extract, strains CM2, OBRC8 and AS15 fermented D-glucose, D-sucrose and D-maltose; strains ACC19a, CM2, CM5 and OBRC8 poorly fermented L-glutamine; strain CM2 fermented L-serine; strains ACC19a, CM5, and AS15 weakly fermented L-alanine; strains CM2, CM5, and AS15 poorly fermented L-valine. The major metabolic end products of strains ACC19a, CM2, and CM5 on TY medium were acetate and propionate (Table 1).

Genome sequencing information
Genome project history
The genomes were selected for sequencing in 2010-11 by the HMP. For strains ACC19a, CM2, and CM5, sequencing, finishing, and annotation were performed by the Broad Institute of Harvard and MIT. For strains OBRC8 and AS15, sequencing, finishing, and annotation were performed by the J. Craig Venter Institute (JCVI). The genomes were deposited in the Genome On-Line Database [16]; the complete genome sequences were deposited in GenBank and are available in the RefSeq database [17][18][19]. Project information and association with MIGS version 2.0 are presented in Table 3. The genome finishing quality for all strains was High-Quality Draft.

Growth conditions and genomic DNA preparation
Strains ACC19a, CM2, CM5, OBRC8, and AS15 were cultivated on liquid TY anaerobic medium as previously described [10]. Genomic DNA was extracted from microbial biomass with the PowerMicrobial® Maxi DNA Isolation Kit (MO BIO Laboratories, Inc.) using phenol:chloroform in combination with bead-beating cell lysis.

Genome sequencing and assembly
Strains ACC19a, CM2, and CM5 were sequenced using two 454 pyrosequence libraries on the 454 platform: one standard 0.6 kb fragment library and one 2.5 kb jump library [20]. Library construction and sequencing process details are available at www.broadinstitute.org and 454 technologies. For strain CM2, additional sequence data were generated using two Illumina libraries on the Illumina HiSeq 2000 platform: one standard 180 bp fragment library and one 3-5 kb jump library. Library construction and sequencing process details are available at www.broadinstitute.org. The 454 data sets for strains ACC19a and CM5 were assembled using Newbler Assembler version 2.3 PostRelease-11/19/2009, and the CM2 data sets were assembled using ALLPATHS version R39099 (Table 3).
All three assemblies are considered High-Quality Draft and consist of: 59 contigs with a total size of 2,541,543 bases for strain ACC19a; 106 contigs with a total size of 2,594,242 bases for strain CM5; and 19 contigs with a total size of 2,312,592 bases for strain CM2. The error rates of the draft genome sequences are estimated to be less than 1 in 10,000 (accuracy of ~Q40) for strains ACC19a and CM5, and less than 1 in 1,000,000 (accuracy of ~Q60) for strain CM2. Average sequence coverage for strains ACC19a and CM5 is 40× and 39×, respectively, and 282× for strain CM2 (Tables 2, 3 and 4, Additional file 1: Table S1).
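The correspondence between the quoted error rates and the ~Q40 and ~Q60 accuracy values follows from the standard Phred scale, Q = -10 log10(p). The short sketch below is illustrative only and is not part of the sequencing centers' pipelines; the function names are hypothetical. It converts the stated error-rate bounds to Phred scores and, using the assembly sizes given above, derives the implied upper bound on the number of erroneous bases per genome.

```python
import math


def phred_from_error_rate(p_error: float) -> float:
    """Return the Phred-scaled quality Q = -10 * log10(p_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def max_expected_errors(genome_size_bp: int, p_error: float) -> float:
    """Upper bound on erroneous bases implied by a per-base error bound."""
    return genome_size_bp * p_error


# Error-rate bounds and assembly sizes as stated in the text.
q_454 = phred_from_error_rate(1 / 10_000)      # strains ACC19a and CM5
q_cm2 = phred_from_error_rate(1 / 1_000_000)   # strain CM2

errors_acc19a = max_expected_errors(2_541_543, 1 / 10_000)
errors_cm2 = max_expected_errors(2_312_592, 1 / 1_000_000)

print(f"ACC19a/CM5: Q{q_454:.0f}; CM2: Q{q_cm2:.0f}")
print(f"ACC19a: fewer than {errors_acc19a:.0f} erroneous bases expected")
print(f"CM2: fewer than {errors_cm2:.1f} erroneous bases expected")
```

Under these bounds, the ACC19a assembly would be expected to contain at most a few hundred erroneous bases, and the CM2 assembly only a few.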
Strains OBRC8 and AS15 were sequenced on the Illumina HiSeq 2000 platform using one standard Illumina paired-end library per strain. Library construction and sequencing process details are available at www.jcvi.org. The Illumina data sets for strains OBRC8 and AS15 were assembled using Celera Assembler version 6.1.
Both assemblies are considered High-Quality Draft and consist of: 40 contigs with a total size of 2,553,276 bases for strain OBRC8 and 52 contigs with a total size of 2,654,638 bases for strain AS15. The error rates of the draft genome sequences for strains OBRC8 and AS15 are estimated to be less than 0.03 (3 %). Average sequence coverage for strains OBRC8 and AS15 is given in Tables 2, 3 and 4 and in Additional file 1: Table S1. Assessment of coverage, GC content, contig BLAST and 16S rRNA gene classification was consistent with the expected organism for all five genomes.

Genome annotation
Strains ACC19a, CM2, and CM5 were annotated using PRODIGAL [21] with no additional manual curation performed. For strains OBRC8 and AS15, genes were identified using GLIMMER, also with no additional manual curation. Table 2 summarizes statistics for each genome, including gene count, according to the original annotations and the Integrated Microbial Genomes (IMG) and Metagenomes website as of May 15, 2014 [22]. Additional annotations using RAST were performed for comparison [23].
COG values for the annotation data obtained directly from the sequencing centers were taken from the IMG website as of May 15, 2014 (Table 5). The percentages in Table 5 are the number of COG proteins out of the total number of annotated genes. For all strains, 32.9-39.8 % of the proteins were not predicted to be part of a COG category; strain ACC19a had the highest percentage of unassigned proteins (Table 5). Strain CM2 had the highest sequence coverage, at 282×, and the lowest percentage of unassigned proteins, at 32.9 % (Tables 3 and 5).
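The percentages described above are simple ratios of a per-category count to the total number of annotated genes. A minimal sketch of that calculation follows; the category counts and the gene total used here are hypothetical placeholders chosen for illustration and are not values from Table 5.

```python
def percent_of_total(count: int, total_genes: int) -> float:
    """Percentage of annotated genes that fall in one category."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * count / total_genes


def cog_percentages(category_counts: dict, total_genes: int) -> dict:
    """Map each category label to its percentage of all annotated genes."""
    return {
        label: percent_of_total(count, total_genes)
        for label, count in category_counts.items()
    }


# HYPOTHETICAL counts for illustration only (not taken from Table 5).
example_total_genes = 2000
example_counts = {
    "J (translation)": 140,
    "E (amino acid metabolism)": 120,
    "Not in COGs": 700,
}

example_table = cog_percentages(example_counts, example_total_genes)
for label, pct in example_table.items():
    print(f"{label}: {pct:.1f} %")
```

Because a protein can be assigned to more than one COG category, the per-category percentages need not sum to 100 %, which is why the unassigned fraction is taken here as its own count rather than derived by subtraction.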

Metabolic network analysis
The metabolic Pathway/Genome Databases (PGDBs) for strains ACC19a, CM2, and CM5 were generated on February 10, 2013 from genomic data obtained from RefSeq [17][18][19] by the PathoLogic program using Pathway Tools software version 17.0 [24] and MetaCyc version 17.0 [25]. These PGDBs are categorized as Tier 3, meaning that they were generated computationally, have undergone no subsequent manual curation, and may contain errors [26]. In addition, the RAST annotations of the genomic data for all five strains were uploaded to a downloadable version of Pathway Tools version 17.5 [24].
According to the RAST annotations, complete "sucrose degradation III (sucrose invertase)" pathways were predicted in Pathway Tools for strains ACC19a, CM2, and CM5, but were marked as not present based on the RefSeq data. Based on the RAST annotations, this pathway was also predicted in Pathway Tools for strains OBRC8 and AS15. Based on biological testing, strains CM2, OBRC8, and AS15, but not ACC19a and CM5, used sucrose as a carbon source. Strains CM2, OBRC8, and AS15 were also able to use glucose and maltose as carbon sources (Table 1). In Pathway Tools, glucose is part of multiple pathways, including glycolysis I and III, glucose and xylose degradation, and heterolactic fermentation pathways. For all five strains, there was a complete glycolysis III pathway. In Pathway Tools, maltose is also part of multiple pathways, including the starch degradation I through V and the glycogen degradation I pathways. In the starch degradation V pathway, a 4-α-glucanotransferase (EC 2.4.1.25) is required to degrade maltose into α-D-glucose. We confirmed that strains CM2, OBRC8, and AS15 have a gene for this protein.
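The idea of checking whether a predicted pathway is supported by the annotated enzymes can be sketched as a set comparison between the EC numbers a pathway requires and those present in a genome annotation. The example below is a simplified illustration, not the PathoLogic algorithm, which uses a more elaborate scoring scheme. The toy pathway definition and the strain labels (strain_A, strain_B) are hypothetical; only EC 2.4.1.25 is taken from the text, and the toy pathway is not the MetaCyc definition of any pathway.

```python
def missing_enzymes(pathway_ecs, annotated_ecs):
    """Return the sorted EC numbers required by a pathway but absent
    from a genome annotation."""
    return sorted(set(pathway_ecs) - set(annotated_ecs))


def is_complete(pathway_ecs, annotated_ecs):
    """A pathway is treated as complete when no required enzyme is missing."""
    return not missing_enzymes(pathway_ecs, annotated_ecs)


# TOY pathway for illustration (NOT a MetaCyc definition):
# 4-alpha-glucanotransferase (EC 2.4.1.25) followed by glucokinase (EC 2.7.1.2).
TOY_MALTOSE_PATHWAY = ["2.4.1.25", "2.7.1.2"]

# HYPOTHETICAL annotations: sets of EC numbers per (made-up) strain label.
toy_annotations = {
    "strain_A": {"2.4.1.25", "2.7.1.2", "3.2.1.26"},
    "strain_B": {"2.7.1.2"},
}

report = {
    strain: missing_enzymes(TOY_MALTOSE_PATHWAY, ecs)
    for strain, ecs in toy_annotations.items()
}

for strain, missing in report.items():
    status = "complete" if not missing else "missing " + ", ".join(missing)
    print(f"{strain}: {status}")
```

A check of this kind flags which enzyme to look for when a growth phenotype and a pathway prediction disagree, as with sucrose utilization in strains ACC19a and CM5.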
The number of genes identified by RAST [23] in biosynthetic pathways of strains ACC19a, CM2, CM5, OBRC8, AS15 and related organisms is shown in Table 7. Eight to nine genes associated with synthesis of teichoic and lipoteichoic acids, as annotated by RAST, were found in the genomes of strains ACC19a, CM2, CM5, and OBRC8; nine to eleven were found in the genomes of AS15 and [E.] yurii subsp. margaretiae; and four were found in the genome of F. alocis (Table 7). We detected one gene associated with synthesis of benzoquinones or naphthoquinones only in the genomes of strain AS15 and [E.] yurii subsp. margaretiae. There were no predicted gene sequences with recognizable homology to genes for mycolic acid or lipopolysaccharide biosynthesis. Three and six RAST-annotated genes associated with diaminopimelic acid (DAP) synthesis were present in the genomes of strains ACC19a, CM2, CM5, OBRC8, and AS15 and of [E.] yurii subsp. margaretiae, respectively. According to the RAST annotations, eight to nine genes associated with polyamine metabolism and eleven to eighteen genes associated with polar lipid metabolism were present in the genomes (Table 7).

Table 7 (fragment). Accession numbers: AFZE00000000, AFZF00000000, AFZG00000000, ALNK00000000, ALJM00000000, AEES00000000, CP002390. First row label: Teichoic and lipoteichoic acids.

Description of Peptoanaerobacter gen. nov.
Peptoanaerobacter (Gr. v. peptô, cook, digest; Gr. pref. an-, not; Gr. masc. n. aer, air; N.L. masc. n. bacter, rod, staff; N.L. masc. n. anaerobacter, the digesting rod not [living] in air). Cells are Gram-positive, structurally and after staining, motile peritrichous rods with round ends, about 1.2-2.5 μm long and 0.4-0.8 μm wide, often occurring in chains. No spores are formed. Strictly anaerobic. Catalase, oxidase and urease are negative. Nitrate is not reduced. Growth is supported by yeast extract but not Casamino acids. Yeast extract is required for growth on glucose, sucrose and maltose. The major metabolic end-products of glucose fermentation are acetate and propionate. Growth temperature range is 30-42 °C. Major fatty acids are C14:0, C16:0, and C16:1ω7c. Genes responsible for biosynthesis of teichoic and lipoteichoic acids, polar lipids, polyamines and DAP are present in the genome. There are no genes responsible for biosynthesis of respiratory benzoquinones or naphthoquinones, mycolic acids or lipopolysaccharides. The type species is Peptoanaerobacter stomatis.