Permanent draft genome sequence of sulfoquinovose-degrading Pseudomonas putida strain SQ1

Pseudomonas putida SQ1 was isolated for its ability to utilize the plant sugar sulfoquinovose (6-deoxy-6-sulfoglucose) for growth, in order to define its SQ-degradation pathway and the enzymes and genes involved. Here we describe the features of the organism, together with its draft genome sequence and annotation. The draft genome comprises 5,328,888 bp and is predicted to encode 5,824 protein-coding genes; the overall G + C content is 61.58 %. The genome annotation is being used for identification of proteins that might be involved in SQ degradation by peptide fingerprinting-mass spectrometry.


Introduction
Pseudomonas putida strain SQ1 belongs to the family of Pseudomonadaceae in the class of Gammaproteobacteria. The genus Pseudomonas was first described by Migula (in the year 1894 [1]) and the species Pseudomonas putida by Trevisan (in 1889 [2]). P. putida strain KT2440 was the first strain whose genome had been sequenced (in 2002 [3]), and it is the most well-studied P. putida strain thus far [4]. Currently, there are more than 30 genome sequences of P. putida strains available (e.g., 12 complete and 24 draft genomes in NCBI; January 2015), including the complete genome sequence of type strain NBRC 14164 T [5]. P. putida species are highly abundant in water, soil and in the rhizosphere [6,7], can be plant-beneficial [8], and are extensively studied for their capabilities to degrade a broad range of substrates, especially aromatic compounds [9][10][11][12].
P. putida strain SQ1 was isolated for its ability to utilize the sulfonated plant sugar sulfoquinovose (6-deoxy-6-sulfoglucose) as a sole source of carbon and energy for growth, and was enriched from a sample of littoral sediment of pre-Alpine Lake Constance, Germany [13]. SQ is the polar headgroup of the plant sulfolipid sulfoquinovosyl diacylglycerol, which is present in the photosynthetic membranes of all higher plants, mosses, ferns and algae and most photosynthetic bacteria [14]. SQ is one of the most abundant organosulfur compounds in the biosphere, following glutathione, cysteine, and methionine, and the global production of SQ is estimated at 10 gigatons (10 10 tons) per year [15]. Hence, the complete degradation of SQ concomitant with a recycling of the bound sulfur in form of inorganic sulfate is an important process of the carbon and sulfur cycle, e.g. in soils.
Until today only one bacterial degradation pathway for SQ has been identified, 'sulfoglycolysis' in Escherichia coli K-12 [16]. In this pathway, SQ is catabolized in analogy to glucose-6-phosphate via an adapted Embden-Meyerhof-Parnas (glycolysis) pathway, involving four newly identified enzymes and genes, and four newly identified metabolites. The pathway yields dihydroxyacetone phosphate (DHAP), which drives energy metabolism and growth of E. coli, and sulfolactaldehyde, which is reduced to dihydroxypropanesulfonate and excreted [16]. For Pseudomonas species, it is well-known that these bacteria lack the key enzyme for glycolysis, phosphofructokinase, but that the alternative Entner-Doudoroff pathway is operative, i.e., an oxidative entry into glucose-6-phosphate catabolism via a dehydrogenase enzyme. We detected a SQ-dehydrogenase activity in crude extract of SQ-grown P. putida SQ1 cells, and we therefore suspect that a 'Sulfo-Entner-Doudoroff'-type of pathway might be operative in P. putida SQ1 for catabolism of SQ, but not sulfoglycolysis.
A draft genome sequence of strain SQ1 has been established and annotated in the IMG pipeline, and the annotation has been transferred to a proteomics (Mascot) database for peptide fingerprinting-mass spectrometry: in our present (unpublished) work, the database is used to identify enzymes and genes that are specifically induced during growth with SQ, e.g. in comparison to cells grown with glucose, by two-dimensional protein gel electrophoresis. Here, we present a summary classification and a set of features for Pseudomonas putida strain SQ1, together with the description of the shotgun genomic sequencing and annotation.

Organism Information
Classification and features P. putida SQ1 is a rod-shaped ( Fig. 1), motile, Gramnegative bacterium that grows aerobically in complex medium (e.g. LB-medium), or prototrophically in mineralsalts medium with a single carbon source (e.g., succinate, glucose, SQ). Strain SQ1 grows overnight on LB-agar plates and forms beige-whitish, smooth colonies (Table 1). Pseudomonas putida SQ1 has been deposited in the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures under reference number DSM 100120.
Based on its 16S rRNA gene sequence, strain SQ1 is a member of the genus and species Pseudomonas putida, which is placed in the family Pseudomonadaceae within the order Pseudomonadales of Gammaproteobacteria, as illustrated by a phylogenetic tree shown in Fig. 2. Currently, 1,732 genome sequences of member of the order Pseudomonadales of Gammaproteobacteria, and 707 genome sequences within the family Pseudomonadaceae have been established (IMG JGI, January 2015).

Genome project history
The DNA sample was submitted to GATC Biotech (Konstanz, Germany) in December 2012 where the whole-genome shotgun sequencing phase was completed in April 2013; the whole-genome shotgun sequencing was performed by GATC using the Illumina HiSeq2000 platform and a 100-bp paired-end library. After the mapping and de-novo assembly of the unmapped reads, which was done at the Genomics Center of the University of Konstanz, the draft genome sequence was uploaded into the IMG Pipeline for annotation and presented for public access on December 2014. The draft genome annotation is available at IMG under the IMG submission ID 14279, and was also deposited in Genbank under the accession number JTCJ00000000. Table 2 presents the project information and its association with MIGS version 2.0 compliance [17].

Growth conditions and genomic DNA preparation
Genomic DNA was extracted from an overnight culture of P. putida SQ1 grown at 30°C in LB medium (500-ml

Genome sequencing and assembly
The whole-genome shotgun sequencing was performed under contract by GATC Biotech (Konstanz, Germany) using the Illumina HiSeq2000 platform and a 100-bp paired-end library, which resulted in 23,816,201 sequenced reads (1.85 × 10 9 total bases). The trimming, mapping, as well as the de novo assembly of the unmapped raw reads, was performed at the Genomics Center of the University of Konstanz, Germany. First, the remaining adapters were removed and reads were trimmed by quality in CLC Genomics Workbench v6.5 (CLC bio, Aarhus, Denmark). In the next step, Bowtie v2.2.3 [18] was used to align the filtered reads against the genome of the closest relative, P. putida strain W619, to which 21,943,994 reads matched. These mapped reads were assembled with a reference-guided approach using the Columbus module implemented in Velvet v1.2.10 [19]. Velvet was then used to de novo assemble also all unmatched 1,872,207 reads (8.5 % of total reads). The whole process resulted in a total number of 1,634 contigs larger than 200 bp; the largest contig is 37,533 bp. The size of the draft genome is 5.3 Mb with 4,750,611 DNA coding bases, which is a normal size compared to other known P. putida genomes (range 3.0 to 7.1 Mb). The average G + C content is 61.58 %. At this time, no additional work is planned for this genome sequencing project (labeled as Permanent Draft).

Genome annotation
Genes were identified and auto-annotated in the DOE-IMG pipeline [20]. Genes were identified using Prodigal [21] and the predicted CDGs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt [22], TIGRFam [23], Pfam [24], KEGG [25], COG [26], and InterPro [27] databases. The tRNAscan-SE tool [28] was used to identify tRNA sequences, whereas ribosomal Fig. 2 Phylogenetic tree based on the 16S rRNA gene sequence of P. putida SQ1, and sequences of other strains of the species P. putida, P. aeruginosa and P. fluorescens. The sequences were aligned with the CLUSTAL W program and the tree was built with the neighbor-joining algorithm integrated in the MEGA 6.0 program [31]. The phylogenetic tree was tested with 1000 bootstrap replicates; bootstrap values are shown at each node. The scale bar represents a 0.005 % nucleotide sequence divergence , not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32] RNA sequences were identified by searches against models of the ribosomal RNA genes built from SILVA [29]. The RNA components of the protein secretion complex and the RNaseP were identified by searching the genome of the corresponding Rfam profiles using INFERNAL.

Genome properties
The draft genome assembly of P. putida SQ1 consists of 1,634 contigs with an overall G + C content of 61.58 %. For these contigs, 5,925 complete genes or partial genes at ends of contigs have been predicted, 5,824 (98.30 %) of which for protein-coding genes. 4,624 (78.04 %) of these were assigned to a putative function with the remaining annotated as hypothetical proteins. The draft genome annotation predicted also 101 (1.70 %) sequences of RNA coding genes. The properties and the statistics of the draft genome annotation are summarized in Table 3 and the distribution of genes into COGs functional categories is presented in Table 4. Currently, there are 50 genome sequencing projects for Pseudomonas putida strains registered in the JGI Genomes Online Database (GOLD), and 32 P. putida genome sequences (finished or permanent draft) are accessible within the IMG database (January 2015) for direct comparison; their genome sizes range between 3.0 Mb (P. putida MR3) and 7.1 Mb (P. putida S12), and their overall G + C content ranges between 60.81 % (P. putida MR3) and 63.14 % (P. putida CSV86). The genome sequence of P. putida W619 was chosen as reference genome for the mapping, as this genome showed the highest overall nucleotide sequence identity (91.9 %) of all genomes of P. putida strains that had been available at the time of sequencing. For comparison, the genome of the most well-studied P. putida strain, strain KT2440, shows 49.3 % overall nucleotide sequence identity to that of strain SQ1.
No candidate genes for a sulfoglycolytic pathway for SQ, as found in E. coli K12 [16], were detected in the draft genome sequence of strain SQ1, which supports the notion that a novel, alternative pathway for SQ is operative in strain SQ1 (see Introduction). Neither P. putida strains W619, KT2440 nor F1 grew with SQ when tested ( [13] and this study). Further, our preliminary proteomic data (not shown) indicates that enzymes/ genes of the 'classical' Entner-Doudoroff pathway for glucose/glucose-6-phosphate (see above) are highly induced during growth with glucose, as expected, but not during growth with SQ. We concluded that additional genes in P. putida strain SQ1 are involved in the utilization of SQ, and that these genes might be located on contigs that resulted from the de novo assembly of the un-mapped reads. If appropriate, the proteomic identification of the core enzymes of this novel SQ degradation pathway based on the draft genome sequence established in this study, and their confirmation by biochemical and analytical-chemical methods, will be reported in a future communication.

Conclusions
Here, we present a summary classification and a set of features for Pseudomonas putida strain SQ1, together with the description of the shotgun genomic sequencing and annotation. The draft genome annotation contains no candidate genes for a sulfoglycolytic pathway for SQ, as found in E. coli K12, hence, the pathway operative in P. putida SQ1 represents a second, yet unknown bacterial degradation pathway for SQ. Furthermore, our preliminary proteomic data suggested that the 'classical' Entner-Doudoroff enzymes for a utilization of glucose/ glucose-6-phophate are not induced during growth with SQ and that, hence, additional enzymes in strain SQ1 are operative during utilization of SQ. Based on the draft genome sequence, these enzymes and genes can now be defined. The total is based on the total number of protein coding genes in the annotated genome