High-quality permanent draft genome sequence of the extremely osmotolerant diphenol degrading bacterium Halotalea alkalilenta AW-7T, and emended description of the genus Halotalea

Members of the genus Halotalea (family Halomonadaceae) are of high significance since they can tolerate the greatest glucose and maltose concentrations ever reported for known bacteria and are involved in the degradation of industrial effluents. Here, the characteristics and the permanent-draft genome sequence and annotation of Halotalea alkalilenta AW-7T are described. The microorganism was sequenced as a part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project at the DOE Joint Genome Institute, and it is the only strain within the genus Halotalea having its genome sequenced. The genome is 4,467,826 bp long and consists of 40 scaffolds with 64.62 % average GC content. A total of 4,104 genes were predicted, comprising 4,028 protein-coding and 76 RNA genes. Most protein-coding genes (87.79 %) were assigned to a putative function. Halotalea alkalilenta AW-7T encodes the degradation of catechol and protocatechuate to β-ketoadipate via the β-ketoadipate and protocatechuate ortho-cleavage degradation pathways, and it possesses the genetic ability to detoxify fluoroacetate, cyanate and acrylonitrile. An emended description of the genus Halotalea Ntougias et al. 2007 is also provided in order to describe the delayed fermentation ability of the type strain.


Introduction
The genus Halotalea includes a single species, i.e., H. alkalilenta, which is a motile, rod-shaped, alkalitolerant and halotolerant Gram-negative staining heterotrophic bacterium [1]. Strain AW-7T (=DSM 17697T =CECT 7134T =CIP 109710T) is the type strain of the species H. alkalilenta, which is the type species of the genus Halotalea [1]. The strain was isolated from alkaline olive mill waste, which was generated by a two-phase centrifugal olive oil extraction system located in the Toplou Monastery area, Sitia, Crete [1]. The Neo-Latin genus name is derived from the Greek and Latin nouns halos and talea, meaning salt-living and rod-shaped cells, respectively. The Neo-Latin species epithet alkalilenta is composed of the Arabic term al qaliy and the Latin epithet lentus (a), meaning alkali and slow, respectively, and refers to cells that grow slowly under alkaline conditions (alkalitolerant) [1].
Here, a summarized classification and key characteristics are presented for H. alkalilenta AW-7T, together with the description of the high-quality permanent draft genome sequence and annotation.

Classification and features
The 16S rRNA gene sequence of H. alkalilenta AW-7T was compared, using NCBI BLAST under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits), with the most recent release of the Greengenes database [17]. The relative frequencies of taxa and keywords (reduced to their stem [18]) were determined and weighted by BLAST scores. The frequency of genera that belonged to the family Halomonadaceae was 95.2 %. The closest matches to the 16S rRNA gene of H. alkalilenta AW-7T, deposited in INSDC (=EMBL/NCBI/DDBJ) under the accession number DQ421388 (=NR_043806), were Zymobacter palmae ATCC 51623T (NR_041786) [7] and Carnimonas nigrificans CTCBS1T (NR_029342) [8], showing BLAST similarities of 96.2 % and 95.3 %, and HSP coverages of 99.7 % and 100 %, respectively. Figure 1 shows the phylogenetic allocation of H. alkalilenta AW-7T within the family Halomonadaceae in a 16S rRNA gene sequence-based tree. The sequence of the only 16S rRNA gene copy in the genome differs by 5 nucleotides from the previously published 16S rRNA sequence (DQ421388 = NR_043806, coverage 95.0 %).
In the past, H. alkalilenta AW-7T and C. nigrificans CTCBS1T were reported as oxidase positive [1,8]. However, genome comparisons showed that both H. alkalilenta AW-7T and C. nigrificans CTCBS1T possess an identical oxidative phosphorylation pathway that lacks cytochrome c oxidase and is distinct from that of Z. palmae T109T. In addition, no fermentation ability was previously detected for H. alkalilenta AW-7T using standard incubation periods [1], although the pyruvate fermentation to acetate II MetaCyc pathway is encoded in both H. alkalilenta AW-7T and Z. palmae T109T. For this reason, the fermentation ability of H. alkalilenta AW-7T was re-examined under a prolonged incubation period using the EnteroPluri-Test (BD, USA). No fermentation reaction was observed for incubations of up to 4 days; thereafter, a positive reaction was obtained for glucose (at the 5th day of incubation, without gas production) and dulcitol (at the 9th day of incubation). H. alkalilenta AW-7T could not ferment adonitol, lactose, arabinose or sorbitol after a 9-day incubation period. In agreement with what was previously reported by Ntougias et al. [1], no growth of H. alkalilenta AW-7T was observed in the present study on yeast extract-peptone-glucose agar plates incubated for 1 month in an anaerobic jar containing the Anaerocult A system (Merck). However, exposure of the culture plates to oxygen led to fastidious growth. It is therefore concluded that H. alkalilenta AW-7T can tolerate anaerobic conditions through a slow fermentation mechanism.
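The score-weighted frequency mentioned above can be illustrated with a minimal sketch. The exact weighting scheme is not specified in this report, so the code assumes that each hit contributes its BLAST score to its taxon and that the frequency is that taxon's share of the summed scores. The function name and the example hits are hypothetical and were invented purely for illustration.

```python
from collections import defaultdict


def weighted_taxon_frequencies(hits):
    """Return each taxon's share of the total BLAST score, in percent.

    `hits` is an iterable of (taxon, score) pairs, one pair per hit.
    Assumption: a hit's weight is simply its score.
    """
    totals = defaultdict(float)
    for taxon, score in hits:
        totals[taxon] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: 100.0 * s / grand_total for taxon, s in totals.items()}


# Hypothetical hits, not taken from the actual BLAST search.
example_hits = [
    ("Halomonadaceae", 900.0),
    ("Halomonadaceae", 850.0),
    ("Oceanospirillaceae", 250.0),
]
frequencies = weighted_taxon_frequencies(example_hits)
```

With these invented scores, the first family accounts for 1750 of 2000 score units, so a high-scoring family dominates the frequency even when the number of hits per family is similar.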

Genome sequencing and annotation
Genome project history
H. alkalilenta AW-7T was selected for sequencing on the basis of its phylogenetic position [19][20][21], and is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project [22], which aims to increase the sequencing coverage of key reference microbial genomes [23]. The genome project is accessible in the Genomes On Line Database [24] and the entire genome sequence is deposited in GenBank. Sequencing, finishing and annotation were accomplished by the DOE Joint Genome Institute [25] using state of the art genome sequencing technology [26]. The project information is summarized in Table 2.

Fig. 1 Phylogenetic tree displaying the position of H. alkalilenta AW-7T among the type strains of other species within the Halomonadaceae. The tree was inferred from 1152 aligned characters [38,39] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [40]. Tree branches are scaled on the basis of the expected number of substitutions per site. Values above branches denote support values from 100 ML bootstrap replicates [41]. Members of different genera within the Halomonadaceae are depicted in different font colors. Lineages with strain genome sequencing projects registered in GOLD [24] are labeled with one asterisk, and those also listed as 'Complete and Published' with two asterisks.

Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [53].

Growth conditions and genomic DNA preparation
H. alkalilenta AW-7T was cultivated aerobically in trypticase soy yeast extract medium at 28 °C. Genomic DNA was obtained using the Invitrogen PureLink® Genomic DNA Mini Kit (Life Technologies Inc.) following the standard protocol. In addition, DNA prepared by the DSMZ is available via the DNA Bank Network [27].

Genome sequencing and assembly
The draft genome of H. alkalilenta AW-7T was generated at the DOE Joint Genome Institute using Illumina technology [28].
An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 13,537,536 reads totaling 2,030.6 Mb. All general aspects of library construction and sequencing can be found at the JGI website [29]. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun L, et al., unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.2.07) [30], (2) 1-3 kb simulated paired end reads were created from Velvet contigs using wgsim [31], (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r46652) [32]. Parameters for assembly steps were
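The yield figures above imply a mean read length and a raw sequencing depth that can be checked with simple arithmetic. The sketch below is illustrative only: it uses the read count, total yield and genome size quoted in this report, and the variable names are assumptions. The depth is computed from unfiltered reads against the final assembly size, so it is an upper bound rather than the coverage of the reads actually assembled.

```python
# Figures quoted in this report.
READ_COUNT = 13_537_536        # Illumina reads generated
TOTAL_YIELD_BP = 2_030.6e6     # 2,030.6 Mb of raw sequence
GENOME_SIZE_BP = 4_467_826     # size of the draft assembly

# Mean read length implied by the yield (bp per read).
mean_read_length = TOTAL_YIELD_BP / READ_COUNT

# Raw depth: total sequenced bases divided by the assembly size.
raw_depth = TOTAL_YIELD_BP / GENOME_SIZE_BP

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"raw depth: {raw_depth:.0f}x")
```

The implied mean read length is close to 150 bp, and the raw depth is on the order of 450-fold.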

Genome annotation
Genes were detected using the Prodigal software [33] at the DOE-JGI Genome Annotation pipeline [34,35]. The predicted CDSs were translated and searched against the National Center for Biotechnology Information nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction and functional annotation analyses were carried out [36]. The genome sequence and the annotations described in this paper are available from the Integrated Microbial Genome system [37].

Genome properties
The genome is 4,467,826 bp long and comprises 40 scaffolds with 64.62 % average GC content (Table 3). A total of 4,104 genes were predicted, consisting of 4,028 protein-coding and 76 RNA genes. The majority of protein-coding genes (87.79 %) were assigned to a putative function, whereas the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is displayed in Table 4.
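A few summary statistics follow directly from the counts above. The sketch below is a consistency check under stated assumptions: the derived quantities (gene-class shares, mean scaffold length, gene density) are not reported in the text, and the variable names are illustrative.

```python
# Counts quoted in this report.
GENOME_SIZE_BP = 4_467_826
SCAFFOLDS = 40
TOTAL_GENES = 4_104
PROTEIN_CODING = 4_028
RNA_GENES = 76

# The two gene classes should add up to the predicted total.
counts_consistent = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Share of each class among all predicted genes, in percent.
protein_coding_pct = 100.0 * PROTEIN_CODING / TOTAL_GENES
rna_pct = 100.0 * RNA_GENES / TOTAL_GENES

# Derived descriptors, not stated in the text.
mean_scaffold_bp = GENOME_SIZE_BP / SCAFFOLDS
genes_per_mb = TOTAL_GENES / (GENOME_SIZE_BP / 1e6)

print(f"protein-coding: {protein_coding_pct:.2f} %, RNA: {rna_pct:.2f} %")
print(f"mean scaffold length: {mean_scaffold_bp:,.0f} bp")
print(f"gene density: {genes_per_mb:.0f} genes per Mb")
```

The mean scaffold length is only a rough descriptor of contiguity, since scaffold sizes in a draft assembly are typically very uneven.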

Insights into the genome sequence
The genome size of H. alkalilenta AW-7T (4.47 Mbp) is approximately 64 % and 50 % greater than those of Z. palmae T109T and C. nigrificans CTCBS1T (2.73 and 2.98 Mbp), respectively. In H. alkalilenta AW-7T, protein-coding genes involved in the major functional categories (i.e., amino acid, carbohydrate and lipid metabolism, membrane transport, energy metabolism) are 50 % and 30 % greater in number than those detected in Z. palmae T109T and C. nigrificans CTCBS1T, respectively. Moreover, genes encoding xenobiotic metabolic proteins are 69 % and 57 % more numerous in H. alkalilenta AW-7T than in Z. palmae T109T and C. nigrificans CTCBS1T, respectively. Genome data revealed the genetic ability of H. alkalilenta AW-7T to degrade several recalcitrant substrates.
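The relative genome-size differences can be recomputed from the three sizes quoted above. The sketch below is a minimal illustration; the helper name `percent_greater` is hypothetical, and the sizes are the rounded Mbp values given in the text, so the results are approximate.

```python
def percent_greater(value, reference):
    """Return how much larger `value` is than `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Genome sizes in Mbp, as quoted in the text.
H_ALKALILENTA = 4.47
Z_PALMAE = 2.73
C_NIGRIFICANS = 2.98

vs_z_palmae = percent_greater(H_ALKALILENTA, Z_PALMAE)
vs_c_nigrificans = percent_greater(H_ALKALILENTA, C_NIGRIFICANS)

print(f"vs Z. palmae: {vs_z_palmae:.1f} % larger")
print(f"vs C. nigrificans: {vs_c_nigrificans:.1f} % larger")
```

The same helper could be applied to the gene counts per functional category, which are not listed in the text.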
H. alkalilenta AW-7T encodes the bioconversion of catechol and protocatechuate to β-ketoadipate via the β-ketoadipate and protocatechuate degradation II (ortho-cleavage) pathways, respectively, as verified by the ability of strain AW-7T to catabolize certain phenolic compounds. Aerobic benzoate degradation I is also encoded, permitting benzoate catabolism via the catechol degrading pathway. Genes encoding fluoroacetate dehalogenase were identified in the genome of H. alkalilenta AW-7T, indicating its ability to degrade fluoroacetate. The presence of genes involved in cyanate and acrylonitrile degradation was also verified. Lastly, H. alkalilenta AW-7T is genetically able to produce ectoine and glycine betaine, which appear to serve as the main osmolytes for the adaptation of this species to high osmotic conditions. Based on genome metabolic features, H. alkalilenta AW-7T is an L-arginine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-phenylalanine, L-tryptophan, L-tyrosine and L-valine auxotroph, and an L-aspartate, L-glutamate, L-glutamine and glycine prototroph. Strain AW-7T can synthesize selenocysteine but not biotin.

The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.

Conclusions
Genome sequence and biochemical data of the highly osmotolerant species Halotalea alkalilenta AW-7T revealed the presence of an oxidative phosphorylation pathway that lacks cytochrome c oxidase, and the encoding of the MetaCyc pathway pyruvate fermentation to acetate II. H. alkalilenta AW-7T could ferment glucose and dulcitol after a prolonged incubation period, which is indicative of the induction of a slow fermentation mechanism and results in the emendation of the genus Halotalea Ntougias et al. 2007. Comparisons with its closest phylogenetic relatives, Zymobacter palmae T109T and Carnimonas nigrificans CTCBS1T, confirm the distinct taxonomic position of H. alkalilenta AW-7T on the basis of its larger genome size and its greater number of protein-coding genes involved in the major functional categories and in xenobiotics metabolism. Furthermore, H. alkalilenta AW-7T encodes the biotransformation of catechol and protocatechuate to β-ketoadipate via the β-ketoadipate and protocatechuate degradation II (ortho-cleavage) pathways, respectively, verifying at the genome level the ability of strain AW-7T to degrade phenolic compounds.

Emended description of the genus Halotalea Ntougias et al. 2007
The description of the genus Halotalea is the one given by Ntougias et al. 2007 [1], with the following modification: Facultative anaerobe, which exhibits delayed glucose and dulcitol fermentation ability, and lacks cytochrome c oxidase activity.