High quality draft genomic sequence of Flavihumibacter solisilvae 3-3T

Flavihumibacter solisilvae 3-3T (= KACC 17917T = JCM 19891T) represents a type strain of the genus Flavihumibacter within the family Chitinophagaceae. This strain can use a variety of compounds as sole carbon sources, which makes it potentially applicable in industry and bioremediation. In this study, the draft genomic information of F. solisilvae 3-3T is described. F. solisilvae 3-3T has a genome size of 5.41 Mbp, a GC content of 47 % and a total of 4,698 genes, including 4,215 protein-coding genes, 439 pseudogenes and 44 RNA-encoding genes. Analysis of its genome reveals a high correlation between genotype and phenotype.


Introduction
The genus Flavihumibacter was established in 2010 [1] and comprises three recognized species, Flavihumibacter petaseus T41 T [1], Flavihumibacter cheonanensis WS16 T [2] and Flavihumibacter solisilvae 3-3 T [3], which were isolated from a subtropical rainforest soil, a shallow stream sediment and a forest soil, respectively. Flavihumibacter members are Gram-positive, rod-shaped, strictly aerobic, non-motile, yellow-pigmented bacteria. The strains all contain phosphatidylethanolamine as the major polar lipid, menaquinone-7 as the major respiratory quinone, and iso-C 15:0 and iso-C 15:1 G as the principal fatty acids. In addition, the strains are oxidase- and catalase-positive and have a G + C content in the range of 45.9-49.5 mol% [1][2][3]. To the best of our knowledge, no genomic information has been available for Flavihumibacter members. In this study, we present the draft genome information of F. solisilvae 3-3 T . A polyphasic taxonomic study revealed that F. solisilvae 3-3 T could utilize 33 sole carbon substrates, including 11 saccharides and 22 organic acids and amino acids [3]. Notably, this strain could utilize the aromatic compound 4-hydroxyphenylacetic acid as a sole carbon source, making it applicable to environmental bioremediation [4][5][6]. In addition, this strain could utilize quinic acid as a sole carbon source. Quinic acid is a substrate used to synthesize aromatic amino acids (phenylalanine, tyrosine and tryptophan) via the shikimate pathway. These aromatic amino acids are very useful as food additives, sweeteners and pharmaceutical intermediates [6,7]. Analysis of the F. solisilvae 3-3 T genome will provide the genomic basis for better understanding these mechanisms and for applying the strain to industry and bioremediation more efficiently.

Organism information
Classification and features
F. solisilvae 3-3 T was isolated from forest soil of Bac Kan province in Vietnam [3]. The classification and features of F. solisilvae 3-3 T are shown in Table 1. A maximum-likelihood tree was constructed based on the 16S rRNA gene sequences using MEGA 5.0 [8]. The bootstrap values were calculated based on 1,000 replications and distances were calculated in accordance with Kimura's two-parameter method [9]. The phylogenetic tree showed that F. solisilvae 3-3 T clustered with the other Flavihumibacter members (Fig. 1).
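As an illustration of the distance measure named above, the following is a minimal sketch of the Kimura two-parameter distance for one pair of aligned sequences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA implementation; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption for this sketch: sites where either sequence has a gap or
    an ambiguous base are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_1) - 0.25 * math.log(term_2)
```

For 16S rRNA gene sequences of closely related strains, P and Q are small, so the corrected distance stays close to the raw proportion of differing sites.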

Genome sequencing and assembly
The genome of F. solisilvae 3-3 T was sequenced using Illumina HiSeq 2000 technology [11] with a paired-end library strategy (300 bp insert size). TruSeq DNA Sample Preparation Kits are used to prepare DNA libraries with insert sizes of 300-500 bp for single, paired-end and multiplexed sequencing. The protocol supports shearing of 1 μg of DNA by either sonication or nebulization [12].

Genome annotation
Genome annotation was performed through the NCBI Prokaryotic Genome Annotation Pipeline, which combines the Best-Placed reference protein set and the gene caller GeneMarkS+. The WebMGA server [14], with an E-value cutoff of 1e-3, was used to assign COGs. The translated predicted CDSs were also searched against the Pfam protein families database [15]. TMHMM Server v.2.0 [16], SignalP 4.1 Server [17] and the CRISPRfinder program [18] were used to predict transmembrane helices, signal peptides and CRISPRs in the genome, respectively. The metabolic pathway analysis was performed using KEGG (Kyoto Encyclopedia of Genes and Genomes) [19].
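To make the role of the E-value cutoff in the COG assignment concrete, the sketch below filters a list of hits at a threshold (read here as 1e-3), keeps the best hit per protein, and tallies COG categories. This is not the WebMGA implementation: the record layout, the function name `tally_cog_categories`, the inclusive comparison and the toy hits are all hypothetical and serve only to illustrate the filtering step.

```python
from collections import Counter

EVALUE_CUTOFF = 1e-3  # cutoff as read from the text


def tally_cog_categories(hits, cutoff=EVALUE_CUTOFF):
    """Count COG categories using the best accepted hit per protein.

    `hits` is an iterable of (protein_id, cog_category, evalue) tuples.
    Assumption: a hit is accepted when its E-value is <= cutoff.
    """
    best = {}
    for protein_id, cog_category, evalue in hits:
        if evalue > cutoff:
            continue  # hit is not significant at the chosen cutoff
        # Keep only the lowest E-value hit seen so far for this protein.
        if protein_id not in best or evalue < best[protein_id][1]:
            best[protein_id] = (cog_category, evalue)
    return Counter(category for category, _ in best.values())


# Hypothetical toy hits, not taken from the F. solisilvae annotation.
toy_hits = [
    ("prot_1", "E", 1e-30),
    ("prot_1", "G", 1e-5),   # weaker hit for the same protein, ignored
    ("prot_2", "G", 5e-2),   # above the cutoff, rejected
    ("prot_3", "C", 1e-3),   # exactly at the cutoff, accepted here
]
category_counts = tally_cog_categories(toy_hits)
```

A looser cutoff assigns more proteins to COGs at the cost of more spurious assignments, so category counts are only comparable between genomes annotated at the same threshold.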

Genome properties
The draft genome of F. solisilvae 3-3 T is 5,410,659 bp in size, with 47 % GC content, and comprises 75 contigs. Of a total of 4,698 genes, 4,215 (89.72 %) are protein-coding genes, 439 (9.34 %) are pseudogenes and 44 (0.94 %) are RNA-encoding genes. The genome properties and statistics are shown in Table 3 and Fig. 3.

Table 3 note: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Fig. 3 A graphical circular map of the F. solisilvae 3-3 T genome. From outside to center: rings 1 and 4 show protein-coding genes colored by COG categories on the forward or reverse strand; rings 2 and 3 denote genes on the forward or reverse strand; ring 5 shows the G + C content plot; and the innermost ring represents GC skew.

[...] different metabolic pathways [20]. Putative enzymes responsible for the utilization of 20 sole carbon sources were found in the genome (Table 5). All key enzymes of the Embden-Meyerhof-Parnas pathway (glucokinase, pyruvate kinase and 6-phosphofructokinase) and of the TCA cycle are present in F. solisilvae 3-3 T . The key enzymes of the pentose phosphate pathway (glucose-6-phosphate dehydrogenase, 6-phosphogluconolactonase and 6-phosphogluconate dehydrogenase) were also found. The presence of 4-hydroxyphenylpyruvate dioxygenase (KIC95062), homogentisate 1,2-dioxygenase (KIC93392) and other related enzymes suggests that 4-hydroxyphenylacetic acid is degradable via the homogentisic acid pathway [21]. In addition, the presence of 3-dehydroquinate dehydratase (KIC93382), shikimate dehydrogenase (KIC92987), shikimate kinase (KIC93265), 3-phosphoshikimate 1-carboxyvinyltransferase (KIC94147) and chorismate synthase (KIC94148) indicates that F. solisilvae 3-3 T could probably utilize quinic acid to synthesize the three aromatic amino acids (tryptophan, tyrosine and phenylalanine) via the shikimate pathway [7].
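The gene-category percentages reported under Genome properties can be reproduced from the stated counts, and the two quantities plotted in the inner rings of Fig. 3 have simple standard definitions. The sketch below checks the arithmetic and shows those definitions; the helper names `gc_content` and `gc_skew` are chosen for this example, and any window size used to draw the figure is not stated in the text.

```python
# Gene counts as reported in the text.
TOTAL_GENES = 4698
gene_counts = {
    "protein-coding genes": 4215,
    "pseudogenes": 439,
    "RNA-encoding genes": 44,
}

# The three categories should account for every annotated gene.
assert sum(gene_counts.values()) == TOTAL_GENES

# Percentage of total genes per category, rounded to two decimals.
gene_percentages = {
    name: round(100 * count / TOTAL_GENES, 2)
    for name, count in gene_counts.items()
}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return (g + c) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C), for a sequence or sequence window."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


for name, pct in gene_percentages.items():
    print(f"{name}: {pct:.2f} %")
```

The computed values match the percentages given in the text (89.72 %, 9.34 % and 0.94 %), confirming that they are expressed relative to the total gene count of 4,698.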

Conclusion
To the best of our knowledge, this report provides the first genomic information for the genus Flavihumibacter. Analysis of the genome shows a high correlation between genotype and phenotype. The genome encodes many key proteins of central carbohydrate metabolism, which provide the genomic basis for utilizing the various carbon sources. In addition, analysis of the genome indicates that this strain has potential applications in the production of aromatic amino acids and in environmental bioremediation.