Non-contiguous finished genome sequence and description of the gliding bacterium Flavobacterium seoulense sp. nov.

Flavobacterium seoulense strain EM1321T is the type strain of Flavobacterium seoulense sp. nov., a proposed novel species within the genus Flavobacterium. This strain is a Gram-reaction-negative, aerobic, rod-shaped bacterium isolated from stream water in Bukhansan National Park, Seoul. This organism is motile by gliding. Here, we describe the features of Flavobacterium seoulense EM1321T, together with its genome sequence and annotation. The genome comprised 3,792,640 bp, with 3,230 protein-coding genes and 52 RNA genes.


Introduction
Flavobacterium is the type genus of the family Flavobacteriaceae in the phylum Bacteroidetes. Flavobacterium was proposed by Bergey et al. [1,2] and the description was emended by Bernardet et al. [3]. Flavobacterium species have been isolated from various environments, including seawater, freshwater, river sediments, and soil [4-8]. Members of the genus Flavobacterium are Gram-negative, rod-shaped, yellow-pigmented, aerobic bacteria. At the time of writing, about 118 Flavobacterium species with validly published names have been described [9]; however, the genomes of only 14 type strains in this genus have been sequenced.
Flavobacterium seoulense sp. nov. strain EM1321T (= KACC 18114T = JCM 30145T) was isolated from stream water in Bukhansan National Park, Seoul, Korea. Here, we present a summary of the classification and features of Flavobacterium seoulense EM1321T, together with its genome sequence and annotation.

Classification and features
Based on its 16S rRNA gene phylogeny and phenotypic characteristics, strain EM1321T was classified as a member of the genus Flavobacterium (Table 1). Preliminary sequence-based identification using the 16S rRNA gene sequences in the EzTaxon database [10] indicated that strain EM1321T was most closely related to F. granuli Kw05T (GenBank accession no. AB180738), with a sequence similarity of 96.54%. This value is lower than the 98.7% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers [11] for delineating a new species without carrying out DNA-DNA hybridization. Subsequent phylogenetic analysis was performed using the 16S rRNA gene sequences of strain EM1321T and related species. Sequences were aligned according to the bacterial rRNA secondary structure model using jPHYDIT [12]. Phylogenetic trees were constructed using the neighbor-joining (NJ) and maximum-likelihood (ML) methods implemented in MEGA version 5 [13]. The resultant tree topologies were evaluated by bootstrap analyses with 1,000 random samplings. Strain EM1321T formed a monophyletic clade together with Flavobacterium soli [5] in both the NJ and ML trees; however, the clustering was not supported by the bootstrap analysis (Figure 1). Flavobacterium nitratireducens [8] was further recovered as a sister group of the monophyletic clade in the ML tree only. Based on these phylogenetic trees, F. soli KACC 17417T and F. nitratireducens JCM 17678T were selected as reference strains and were obtained from the corresponding culture collections for comparative study.
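The threshold comparison described above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to the same length and defines similarity as the percentage of identical bases over columns where neither sequence has a gap; the calculation actually used by EzTaxon may differ. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
# Threshold recommended by Stackebrandt and Ebers [11], in percent.
SPECIES_THRESHOLD = 98.7


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical bases over gap-free columns of a pairwise alignment.

    Assumption: both inputs come from the same alignment and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(similarity: float) -> bool:
    """True if the similarity falls below the 98.7% threshold."""
    return similarity < SPECIES_THRESHOLD


# Toy alignment (hypothetical data): 3 of 4 compared columns are identical.
toy_similarity = percent_identity("ACGT", "ACGA")

# Value reported in the text for strain EM1321T versus F. granuli Kw05T.
reported_is_below = below_species_threshold(96.54)
```

With the reported similarity of 96.54%, `below_species_threshold` returns `True`, which matches the reasoning in the text.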
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [24]. Deposits were made from 12 isolated colonies for each strain (strain EM1321T and the reference strains). Measurements were made with a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). The time of acquisition was between 30 seconds and 1 minute per spot. The twelve EM1321T spectra were imported into the MALDI BioTyper software (version 2.0; Bruker) and analyzed by standard pattern matching (with default parameter settings) against the 4,613 bacterial spectra, including those of eight Flavobacterium species, used as reference data in the BioTyper database. For the strain EM1321T spectrum (Figure 3), no significant score was obtained, suggesting that our isolate was not a member of any of the eight Flavobacterium species in the database. Spectrum differences with the two closely related Flavobacterium species are shown in Figure 4.
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [23]. If the evidence is IDA, the property was directly observed by one of the authors.

Genome sequencing information
Genome project history
Flavobacterium seoulense EM1321T was selected for genome sequencing based on its phylogenetic position and its 16S rRNA similarity to other members of the genus Flavobacterium. The genome sequence was deposited in GenBank under accession number JNCA00000000.1. A summary of the project and the Minimum Information about a Genome Sequence (MIGS) [14] are shown in Table 3.

Growth conditions and DNA isolation
Flavobacterium seoulense EM1321T was cultured aerobically on R2A agar medium at 30°C. Genomic DNA was extracted using the QIAamp DNA mini kit (Qiagen).

Genome sequencing and assembly
The genome of strain EM1321T was sequenced at ChunLab, Inc. using an Illumina MiSeq_PE_300 system with 2 × 300 paired-end reads. The Illumina platform provided 166× coverage (for a total of 3,792,640 sequencing reads) of the genome. CLC Genomics Workbench (ver. 6.5.1) was used for sequence assembly and quality assessment. The final draft assembly contained 56 contigs.

Genome annotation
The genes in the assembled genome were predicted with the Rapid Annotation using Subsystem Technology (RAST) server databases [25] and the gene-caller GLIMMER 3.02 [26]. The predicted ORFs were annotated by searching clusters of orthologous groups (COGs) [11] using the
Data from Nupur et al. [8]. *Data incongruent with a previous study [5].

Figure 3
Reference mass spectrum from Flavobacterium seoulense EM1321T. Spectra from 12 individual colonies were compared and a reference spectrum was generated.

Figure 4
Gel view comparing the Flavobacterium seoulense EM1321T spectrum with those of other members of the genus Flavobacterium. The gel view displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like look. The x-axis records the m/z value. Peak intensity is shown as a gray-scale scheme code. The color bar and the right y-axis indicate the relation between the color of a peak and peak intensity in arbitrary units.

Genome properties
The genome comprised a circular chromosome with a length of 3,792,640 bp and a G + C content of 33.25% (Figure 5 and Table 4). The draft assembly is composed of 56 contigs. Of the 3,282 predicted genes, 3,230 were protein-coding genes and 52 were RNA genes (2 rRNA genes and 50 tRNA genes). The sequencing coverage of the rRNA operon (673×) indicated that four copies of the rRNA operon exist in this genome. The majority of the protein-coding genes (2,054 genes, 62.58% of all predicted genes) were assigned putative functions, while the remaining genes were annotated as hypothetical proteins (1,176 genes, 35.83%). The properties of and statistics for the genome are summarized in Table 4. The distribution of genes into COG functional categories is presented in Table 5 and Figure 5.
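The arithmetic behind these figures can be checked with a short sketch. It assumes that the rRNA operon copy number was estimated as the ratio of the operon coverage (673×) to the genome-wide coverage (166×, from the sequencing section), rounded to the nearest integer; the text does not state the exact procedure, so this is one plausible reading. The function names are hypothetical. The percentages reproduce only when the denominator is the total number of predicted genes (3,282) rather than the number of protein-coding genes (3,230).

```python
# Values reported in the text.
GENOME_COVERAGE = 166.0       # mean sequencing depth of the genome (x)
RRNA_OPERON_COVERAGE = 673.0  # sequencing depth over the rRNA operon (x)
TOTAL_GENES = 3282
PROTEIN_CODING_GENES = 3230
RNA_GENES = 52
GENES_WITH_FUNCTION = 2054
HYPOTHETICAL_GENES = 1176


def estimate_copy_number(region_coverage: float, genome_coverage: float) -> int:
    """Estimate copy number as the rounded coverage ratio (assumed method)."""
    return round(region_coverage / genome_coverage)


def percent_of_total(count: int, total: int) -> float:
    """Share of `count` in `total`, in percent, rounded to two decimals."""
    return round(100.0 * count / total, 2)


rrna_copies = estimate_copy_number(RRNA_OPERON_COVERAGE, GENOME_COVERAGE)
pct_function = percent_of_total(GENES_WITH_FUNCTION, TOTAL_GENES)
pct_hypothetical = percent_of_total(HYPOTHETICAL_GENES, TOTAL_GENES)

print(rrna_copies)       # 4
print(pct_function)      # 62.58
print(pct_hypothetical)  # 35.83
```

The coverage ratio is about 4.05, consistent with four rRNA operon copies collapsed into a single assembled copy in the draft assembly.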

Conclusions
Based on the results of the phylogenetic and phenotypic analyses, we formally propose the creation of the new species Flavobacterium seoulense sp. nov. for strain EM1321T. The non-contiguous genome sequence of the type strain was determined and is described here.
Description of Flavobacterium seoulense sp. nov.
Flavobacterium seoulense (seo.ul.en'se. N.L. neut. adj., named after Seoul, Korea, the geographical origin of the type strain). Aerobic, Gram-reaction-negative. Cells are rod-shaped and motile by gliding. Does not have a flagellum. The colonies are yellow and translucent on R2A agar medium. Grows at 4-35°C, with optimum growth at 30°C, and in the presence of 0-4% (w/v) NaCl. Catalase- and oxidase-positive. Positive for alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, acid phosphatase, naphthol-AS-BI-phosphohydrolase, β-galactosidase, and valine arylamidase. Positive for nitrate reduction, but negative for indole production, glucose fermentation, arginine dihydrolase, urease activity, and aesculin and gelatin
The G + C content of the genome is 33.25%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers KJ461685 and JNCA00000000.1, respectively. The type strain EM1321T (= KACC 18114T = JCM 30145T) was isolated from stream water in Bukhansan National Park, Seoul, Korea.
The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
The total is based on the total number of protein-coding genes in the annotated genome.