Complete genome sequence of the molybdenum-resistant bacterium Bacillus subtilis strain LM 4–2

Bacillus subtilis LM 4–2, a Gram-positive bacterium, was isolated from a molybdenum mine in Luoyang city. Because of its strong resistance to molybdate and its potential use in the bioremediation of molybdate-polluted areas, we describe the features of this organism together with its complete genome sequence and annotation. The genome consists of a circular 4,069,266 bp chromosome with an average GC content of 43.83 %, which contains 4149 predicted ORFs and 116 RNA genes. Additionally, 687 transporter-coding and 116 redox protein-coding genes were identified in the strain LM 4–2 genome.

Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0118-6) contains supplementary material, which is available to authorized users.


Introduction
Bacillus subtilis LM 4-2 is a molybdenum-resistant strain isolated from a molybdenum mine. Many microbes have been reported to resist the toxicity of the molybdate ion through reduction of molybdate (Mo6+) to Mo-blue. Molybdenum-reducing microorganisms come from a variety of genera and include Klebsiella spp. [1,2], Acidithiobacillus ferrooxidans [3], Enterobacter cloacae [4], Serratia marcescens [5,6], Acinetobacter calcoaceticus [7], Pseudomonas spp. [8], and Escherichia coli K12 [9]. The capacity for molybdate reduction opens the possibility of molybdenum bioremediation in many polluted areas [10]. Strain LM 4-2 showed stronger resistance to molybdate (up to 850 mM Na2MoO4) than many other reported molybdenum-resistant bacteria [11,12]. However, the molecular mechanism of molybdenum resistance has not been characterized, including in the genus Bacillus. Strain LM 4-2 is therefore a promising subject for unveiling this mechanism and for evaluating its potential use in bioremediation. Here we present the complete genome sequence and detailed genomic features of B. subtilis LM 4-2.

Genome project history
Bacillus subtilis LM 4-2 was selected for sequencing due to its strong resistance to molybdate and potential utilization in bioremediation of molybdate-polluted areas.
The genome sequence was deposited in GenBank under accession number CP011101, and the genome project was deposited in the Genomes OnLine Database [42] under Gp0112736. Genome sequencing and annotation were performed by the Chinese National Human Genome Center at Shanghai. A summary of the project is given in Table 2.

[...] for 4 h. After centrifugation (12,000 rpm) for 10 min, genomic DNA was extracted by the phenol-chloroform method as described previously [43]. The DNA was dissolved in 2 mL of sterilized deionized water at a final concentration of 12.67 μg/μL, with an OD260/OD280 ratio of 2.04 as determined with a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). The genomic DNA was stored at −20 °C.

Genome sequencing and assembly
The genome of Bacillus subtilis LM 4-2 was sequenced with a dual approach that combined the PacBio RS II and Illumina Genome Analyzer IIx platforms. Approximately 121,583 PacBio reads and 1637 million Illumina reads (2 × 150 bp paired-end sequencing) were generated, with average sequence coverage of 213-fold and 409-fold, respectively. Reads from the PacBio RS II were assembled with the hierarchical genome-assembly process (HGAP) assembler, which yielded a single self-circularized super contig. The Illumina reads were quality trimmed with the CLC Genomics Workbench and then used for error correction of the PacBio reads with bowtie2 (version 2.1.0) [44].

Figure caption (circular chromosome map): The map was generated with DNAPlotter [54]. From the outside to the center: the first two outer circles represent the positions of genes on the chromosome (Circle 1: plus strand, Circle 2: minus strand). Circle 3 represents tRNA genes (blue), Circle 4 represents G + C content, and Circle 5 represents GC skew.
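The relationship between read counts and fold coverage in the sequencing summary can be checked with a little arithmetic. The sketch below assumes the usual definition (coverage = total sequenced bases / genome size). The function names are illustrative, and the mean PacBio read length it prints is only implied by the reported figures; it is not a value stated in the text.

```python
GENOME_SIZE_BP = 4_069_266  # chromosome length reported for strain LM 4-2


def fold_coverage(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


def implied_mean_read_length(n_reads: int, coverage: float,
                             genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean read length implied by a read count and a fold coverage."""
    return coverage * genome_size / n_reads


# Reported PacBio figures: about 121,583 reads at 213-fold coverage.
pacbio_mean_len = implied_mean_read_length(121_583, 213)
print(f"Implied mean PacBio read length: {pacbio_mean_len:.0f} bp")
```

Under these assumptions the PacBio figures imply a mean read length of roughly 7.1 kb, which is a plausible order of magnitude for this platform.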

Genome annotation
The Glimmer 3.02 and GeneMark programs were used to predict the positions of open reading frames [45,46]. Protein function was predicted by the following methods: 1) homology searches in the GenBank and UniProt protein databases [47]; 2) function assignment searches in the CDD database [48]; and 3) domain or motif searches in the Pfam database [49]. The KEGG database was used to reconstruct metabolic pathways [50]. Ribosomal RNAs and transfer RNAs were predicted with the RNAmmer and tRNAscan-SE programs, respectively [51,52]. Transporters were predicted by searching the TCDB database with the BLASTP program [27,53], using an expectation value (E-value) lower than 1e-05.
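The E-value cutoff used for the transporter search can be illustrated with a small filtering step. This is a minimal sketch, assuming the BLASTP results were exported in the standard 12-column tabular format (`-outfmt 6`), in which column 11 is the E-value and column 12 the bit score. The query and subject identifiers are hypothetical, and keeping only the best hit per query is an illustrative choice that the text does not describe.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_text: str, cutoff: float = EVALUE_CUTOFF) -> dict:
    """Keep, per query, the highest-scoring hit with E-value below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # hit is not significant at the chosen threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows (not real results from the LM 4-2 genome).
rows = [
    ["orf0001", "tcdb|A", "45.0", "300", "150", "5", "1", "300", "1", "298", "1e-30", "250"],
    ["orf0001", "tcdb|B", "30.0", "200", "130", "8", "1", "200", "5", "204", "2e-08", "60.5"],
    ["orf0002", "tcdb|C", "25.0", "80", "58", "2", "10", "90", "3", "82", "0.003", "32.1"],
]
hits = best_hits("\n".join("\t".join(r) for r in rows))
print(hits)
```

In this toy input only `orf0001` passes the cutoff, because the single hit for `orf0002` has an E-value of 0.003.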

Genome properties
The complete genome of strain LM 4-2 consists of a circular 4,069,266 bp chromosome with an overall G + C content of 43.83 %. A total of 4149 ORFs, 10 sets of rRNA operons, and 84 tRNAs were predicted in the LM 4-2 genome (Table 3 and Fig. 3). Of the 4149 predicted ORFs, 2742 could be assigned a function and 1415 were annotated as hypothetical proteins. When analyzed for biological roles according to COG categories, amino acid transport and metabolism proteins accounted for the largest share (7.18 %) of all functionally assigned proteins, followed by carbohydrate transport and metabolism proteins (6.89 %) and transcription proteins (6.43 %). In addition, 687 transporter-coding and 116 redox protein-coding genes were identified in the LM 4-2 genome. The distribution of genes into COG functional categories is presented in Table 4.
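The G + C content reported here, and the GC skew track drawn in the chromosome map, follow from simple base counts. The sketch below assumes the usual definitions, G + C content = (G + C) / length and GC skew = (G − C) / (G + C), computed over fixed windows. The function names, window sizes and toy sequences are illustrative and are not taken from the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """G + C content of a sequence, as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); defined as 0.0 when there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed(seq: str, window: int, step: int, stat) -> list:
    """Apply `stat` to consecutive windows; returns (start, value) pairs."""
    return [(start, stat(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Toy sequences for illustration only.
print(gc_content("GGGCAT"))                # 4 of 6 bases are G or C
print(windowed("GGCCGGGG", 4, 4, gc_skew)) # one balanced window, one G-only window
```

For a real chromosome the same functions would be applied to the full sequence with windows of several kilobases, which is the kind of profile that circular genome maps typically display.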

Conclusions
Molybdenum pollution has been reported in water and soils all around the world [55]. Some Mo-resistant bacteria can be used to immobilize soluble molybdenum into insoluble forms while reducing its toxicity. In this study we presented the complete genome sequence of Bacillus subtilis LM 4-2, which was isolated from a molybdenum mine in Luoyang city. Because of its strong resistance to molybdate and its potential use in the bioremediation of molybdate-polluted areas, we sequenced the genome as a basis for identifying the possible molecular mechanism of molybdenum resistance. Genomic analysis of strain LM 4-2 revealed 687 transporter-coding and 116 redox protein-coding genes distributed across the genome. Three genomic islands were identified in the strain LM 4-2 genome, covering 2.71 % of the whole genome. Three gene clusters were involved in the non-ribosomal synthesis of lipopeptides, such as surfactin and fengycin, and of the dipeptide bacilysin. Additionally, one gene cluster for subtilosin A synthesis and one gene cluster for polyketide synthesis were identified.
No CRISPRs were identified in the strain LM 4-2 genome.
The complete genome sequence of strain LM 4-2 will facilitate functional genomics studies to elucidate the molecular mechanisms that underlie molybdenum resistance, and it may facilitate the bioremediation of molybdenum-contaminated areas.

Additional file
Additional file 1: Table S1. ANI, AAI and GGDH values between the genome of strain LM 4-2 and those of 30 other completely sequenced B. subtilis strains. (DOC 58 kb)

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
X-YY and HW participated in the design of the study, carried out the molecular genetic studies and drafted the manuscript. G-YR and XD performed the laboratory experiments. J-JL prepared the genomic DNA. H-JZ performed the bioinformatics analysis. Z-QJ conceived of the study and helped to draft the manuscript. All authors read and approved the final manuscript.