Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants, and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated from PAH-contaminated soil in Wisconsin by using phenanthrene as the sole carbon source. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (16,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired-end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data were assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 % G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3' end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the benzoyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.


Introduction
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants, and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated from PAH-contaminated soil in Wisconsin using phenanthrene as the sole carbon source [1], and its genome sequence was determined to gain insights into the mechanisms and pathways of PAH metabolism. D. acidovorans Cs1-4 is also notable as the strain in which the novel extracellular structures called nanopods were discovered [2]. Nanopods are extensions of the cell that consist of a surface layer protein encasing outer membrane vesicles (Fig. 1; [2]). The specific functions of nanopods in D. acidovorans Cs1-4 are unknown, but some connection to phenanthrene degradation is likely, as growth on that compound induces production of these structures [2] and mutants disabled in nanopod production are impaired in growth on phenanthrene [2,3].
Bacterial degraders of PAH can be grouped based on amino acid similarities in the large subunit of the ring-hydroxylating dioxygenase (RHD), which initiates PAH metabolism. Based on those RHD similarities, several groups of PAH-degrading bacteria have been resolved [4,5], and D. acidovorans Cs1-4 is the first representative of the group designated as the Phn family to have a full genome sequence determined. A draft genome sequence has also been determined for a second representative of the Phn family, Burkholderia sp. Ch1-1 (GenBank ADNR00000000; [6]).
Bacteria belonging to different RHD groups typically differ in other characteristics relevant to PAH metabolism including the range of PAH degraded, pathways for PAH metabolism and the structure of gene clusters encoding PAH degradation [7]. Furthermore, lateral transfer of RHD genes of different phylogenetic groups appears to be mediated by different types of mobile genetic elements [8]. The full genome sequence of D. acidovorans Cs1-4 can thus provide insights into a variety of aspects of PAH metabolism in general, and phenanthrene degradation in particular.

Classification and features
The genus Delftia is a phylogenetically cohesive group within the Comamonadaceae family of the Betaproteobacteria (Table 1, Fig. 2). At the time of writing, draft or finished genome sequences were publicly available for four strains of D. acidovorans (including D. acidovorans Cs1-4). However, of those four genomes, only those of strains Cs1-4 and SPH1 (NC_010002) appear to be bona fide representatives of this species, as the 16S rRNA sequences of the other two strains, CCUG 274B and CCUG 15835, best match Delftia tsuruhatensis (Table 2). These species affiliations were supported by a phylogenetic tree of the Comamonadaceae, which resolved three species clusters within Delftia: D. acidovorans, D. tsuruhatensis and D. litopenaei (Fig. 2). Strains CCUG 274B and CCUG 15835 clustered with D. tsuruhatensis instead of D. acidovorans. A fourth species, D. lacustris, clustered with D. tsuruhatensis and thus, in this analysis, did not have strong phylogenetic support as a distinct species (Fig. 2).
Delftia acidovorans Cs1-4 was isolated from soil contaminated by coal tar at the former site of a manufactured gas plant in Chippewa Falls, WI by using an enrichment culture with phenanthrene as the sole added carbon source [1]. Strain Cs1-4 has since been used as the model organism for the study of nanopods [2,3]. Delftia acidovorans Cs1-4 is deposited in the culture collection of the United States Department of Agriculture, Agricultural Research Service (Peoria, IL) as strain NRRL B-65277.

Genome sequencing and annotation
Genome project history
The genome sequence was completed in May 2011, presented for public access in December 2011 and is available in GenBank (NC_015563). Quality assurance of the genomic DNA preparation used for sequencing was done in the laboratory of W.J. Hickey at the University of Wisconsin-Madison. Sequencing, finishing and annotation were performed by the U.S. DOE JGI. A summary of the project information is shown in Table 3.

Growth conditions and genomic DNA preparation
Delftia acidovorans Cs1-4 was grown aerobically in Nutrient Broth at 30°C. DNA was isolated from 100 mL of overnight culture by a CTAB method [12]. Cells were collected by centrifugation (10,000 × g, 10 min) and then resuspended in Tris-EDTA (TE) buffer to an OD600 of 1.0. Lysozyme (100 mg/mL) was added, followed by 10 % SDS and proteinase K (10 mg/mL). After incubation for 1 h at 37°C, 5 M NaCl and CTAB/NaCl were added, and the solution was incubated at 65°C for 10 min. DNA was purified by phenol-chloroform extraction and then resuspended in TE buffer with RNase (10 mg/mL). For quality confirmation, the DNA preparation was run on a 1 % agarose gel with phage λ DNA as a mass standard (DOE JGI).

Genome sequencing and assembly
The draft genome of D. acidovorans Cs1-4 was generated at the DOE JGI using a combination of Illumina and 454 technologies. Three libraries were constructed and sequenced: an Illumina GAii shotgun library, which generated 16,416,493 reads totaling 591 Mb; a 454 Titanium standard library, which generated 770,171 reads; and one paired-end 454 library with an average insert size of 8 kb, which generated 508,092 reads, for a total of 372.8 Mb of 454 data. The initial draft assembly contained 40 contigs in 2 scaffolds. The 454 Titanium standard data and the 454 paired-end data were assembled together with Newbler, version 2.3. The Newbler consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data were assembled with VELVET, version 0.7.63, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. The 454 Newbler consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the 454 paired-end library were integrated using parallel phrap, version SPS -4.24 (High Performance Software, LLC).
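The computational shredding step described above can be sketched as follows. This is a minimal illustration only, not the JGI pipeline: the function name shred_consensus is hypothetical, and the 500 bp overlap is an assumed value because the text gives the shred lengths (2 kb and 1.5 kb) but not the overlap.

```python
def shred_consensus(consensus, shred_len=2000, overlap=500):
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    shred_len=2000 mirrors the 2 kb shreds described for the 454 consensus.
    overlap=500 is an ASSUMED value; the source does not state the overlap.
    Returns a list of (start_offset, shred_sequence) tuples.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("need shred_len > 0 and 0 <= overlap < shred_len")
    if not consensus:
        return []
    step = shred_len - overlap  # distance between consecutive shred starts
    shreds = []
    start = 0
    while True:
        end = min(start + shred_len, len(consensus))
        shreds.append((start, consensus[start:end]))
        if end == len(consensus):  # last shred reaches the contig end
            break
        start += step
    return shreds


# Toy 5 kb "contig"; real input would be assembler consensus sequences.
contig = "ACGT" * 1250
for offset, piece in shred_consensus(contig):
    print(offset, len(piece))
```

Shredding in this way turns each assembler's consensus into read-like fragments, so that consensus sequences from different platforms can be fed to a single downstream assembler; the final shred is simply allowed to be shorter when the contig length is not a multiple of the step.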

Genome annotation
Genes were identified using Prodigal as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline. The predicted CDSs were translated and used to search the National Center for Biotechnology Information non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE, RNAmmer, Rfam, TMHMM, and SignalP. The final genome sequence is deposited in GenBank under accession NC_015563.

Genome properties
The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 % G + C content) containing 6,028 predicted genes (Fig. 4, Table 4). There were 5,931 protein-encoding genes; 4,425 of these gene products were assigned to a putative function, with the remainder annotated as hypothetical proteins (Table 5). There were 80 tRNA genes and five 16S rRNA genes. Of the latter, three were 100 % identical to each other, while two were 99 % identical (DelCs14_R0076, DelCs14_R0098).
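The proportions implied by these counts can be worked out directly. The short calculation below uses only the figures quoted in the paragraph above; the variable names are illustrative, and the derived percentages are arithmetic on the published counts rather than values taken from Tables 4 or 5.

```python
# Counts as reported for the strain Cs1-4 chromosome.
genome_bp = 6_685_842
predicted_genes = 6_028
protein_coding = 5_931
function_assigned = 4_425

# Protein-coding genes without a putative function (hypothetical proteins).
hypothetical = protein_coding - function_assigned

# Share of predicted genes that encode proteins, in percent.
pct_protein_coding = 100 * protein_coding / predicted_genes

# Share of protein-coding genes with a putative function, in percent.
pct_function_assigned = 100 * function_assigned / protein_coding

# Gene density in predicted genes per megabase.
genes_per_mb = predicted_genes / (genome_bp / 1e6)

print(f"hypothetical proteins: {hypothetical}")
print(f"protein-coding share:  {pct_protein_coding:.1f} %")
print(f"function assigned:     {pct_function_assigned:.1f} %")
print(f"gene density:          {genes_per_mb:.0f} genes/Mb")
```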

Insights from the genome sequence
Comparisons with other sequenced Delftia genomes
The genome of strain Cs1-4 was similar to those of the other available Delftia genomes with respect to categories such as gene count and % G + C (Table 4). However, there was a clear distinction between these strains in the area of ribosomal genes, as their abundance was nearly identical in the genomes of strains Cs1-4 and SPH1, but much greater than that in strains CCUG 274B and CCUG 15835. There was also a marked difference between the organisms in the area of function prediction, with strains Cs1-4 and SPH1 having a much larger portion of genes with a predicted function than did strains CCUG 274B and CCUG 15835. This divergence of the latter two strains from strains Cs1-4 and SPH1 would further support re-assignment of strains CCUG 274B and CCUG 15835 to a Delftia species other than acidovorans (e.g., D. tsuruhatensis).
The closest relative of D. acidovorans Cs1-4 with genome sequence data is D. acidovorans SPH1. Strain SPH1 was isolated from a sewage treatment plant in Germany as a part of a microbial consortium that degraded linear alkylbenzene sulfonates [13]. Strains Cs1-4 and SPH1 had 99 % identity over the full length of the 16S rRNA gene. Compared to the type strain D. acidovorans ATCC 15668, both strains Cs1-4 and SPH1 had 99 % identity over 99 % of the 16S rRNA gene. The genomes of strain Cs1-4 and strain SPH1 had an average nucleotide identity (ANI) of 97.48 %, which qualified the strains as the same species using the 95 % ANI threshold [14,15]. Based on this high level of nucleotide sequence similarity, assignment of these bacteria to a common species is reasonable. However, while the genomic composition of strains SPH1 and Cs1-4 was similar, dot plots exhibited an "X" alignment (Fig. 5), which is indicative of inversions about the origin of replication [16]. Alignment of the D. acidovorans Cs1-4 and SPH1 genomes revealed in the former a large genomic island (232 kb), termed the phn island, that was absent from strain SPH1 [6]. The island contained near its 3' end a bacteriophage P4-like integrase, a type of enzyme often associated with chromosomal integration of mobile genetic elements [17]. Numerous close orthologs of the strain Cs1-4 integrase can be identified by BLASTP searches of GenBank, and these may serve as starting points for the elucidation of mobile genetic elements possibly related to the phn island.
Conversely, genomic alignment of strains Cs1-4 and SPH1 revealed in strain SPH1 a mobile genetic element of ca. 67 kb, which was absent from strain Cs1-4. The loci bounding this element in strain SPH1 were (gene name, equivalent locus in strain Cs1-4) Daci_1694 (rpoH, DelCs14_4885) and Daci_1756 (inclR, DelCs14_4884). The mobile genetic element in strain SPH1 contained metal detoxification functions, and had extensive similarity to a region in the genomes of strains CCUG 274B and CCUG 15835.
The secretory machinery of strain Cs1-4 was similar to that of strain SPH1, as well as to that of strains CCUG 274B and CCUG 15835. It consisted of Type II, Type IV and Type VI secretion systems (the last abbreviated T6SS) along with the components of a Sec-Signal Recognition Particle Translocon and the tatABC genes of the Twin-Arginine Translocation pathway. For all strains, there was a single T6SS cluster. The functions of T6SS have been explored mostly in pathogenic bacteria, where a common feature is mediation of intercellular contact in antagonistic interactions [18]. Functions of T6SS in environmental bacteria such as the D. acidovorans strains discussed here are unknown.
Delftia acidovorans Cs1-4 produces a novel surface layer protein, which is essential for the formation of the extracellular structures called nanopods [2]. The surface layer protein (termed Nanopod Protein A, NpdA) is encoded by DelCs14_5206 (npdA), and orthologs of npdA are present in the genomes of D. acidovorans SPH1 as well as D. acidovorans strains CCUG 274B and CCUG 15835.
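The way an inversion shows up in a dot plot, as noted above for the comparison of strains Cs1-4 and SPH1, can be illustrated with a small k-mer based sketch. This is not the method used to produce Fig. 5; the function names, the word size k=12 and the toy sequences are all assumptions made for illustration. Matches on the same strand fall on a rising diagonal, whereas matches to the reverse complement fall on a falling diagonal, and the two together give the "X" appearance.

```python
import random
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_dotplot(ref, query, k=12):
    """Return (forward, reverse) lists of (ref_pos, query_pos) k-mer matches.

    forward: query word equals the reference word (same strand).
    reverse: reverse complement of the query word equals the reference word.
    """
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    forward, reverse = [], []
    for j in range(len(query) - k + 1):
        word = query[j:j + k]
        for i in index.get(word, ()):
            forward.append((i, j))
        for i in index.get(reverse_complement(word), ()):
            reverse.append((i, j))
    return forward, reverse


# Toy example: invert the middle 200 bp of a random 400 bp "genome".
rng = random.Random(1)
genome_a = "".join(rng.choice("ACGT") for _ in range(400))
genome_b = genome_a[:100] + reverse_complement(genome_a[100:300]) + genome_a[300:]

fwd, rev = kmer_dotplot(genome_a, genome_b)
print(len(fwd), "same-strand matches;", len(rev), "reverse-strand matches")
```

Plotting the forward and reverse coordinate pairs in two colours (for example with a scatter plot) would show the inverted segment as a line running against the main diagonal. Whole-genome comparisons normally rely on dedicated aligners rather than exact k-mer lookups, so this sketch is meant only to show the geometry.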

Profiles of metabolic networks and pathways
Characterization of phenanthrene catabolism genes
Genes for the entire phenanthrene catabolic pathway were identified on a novel 232 kb genomic island named the phn island [6] and were segregated into three clusters that were predicted to encode the metabolism of phenanthrene to ortho-phthalate (phn genes), ortho-phthalate to protocatechuate (oph genes) and meta-cleavage of protocatechuate to pyruvate and oxaloacetate (pmd genes). These clusters were non-contiguous; the oph and pmd clusters were separated from each other by ca. 5 kb and located 60 kb upstream of the phn cluster. The phn gene cluster had a G + C content of ca. 50 %, which differed significantly from the 66.7 % G + C content of the Cs1-4 chromosome, while the % G + C of the oph and pmd genes was similar to that of the chromosomal backbone. The G + C content of Comamonadaceae genomes ranges from 60 % to 70 %; thus the phylogenetic origin of the phn genes is outside of this family and likely outside of the order Burkholderiales.
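The contrast in G + C content between the phn cluster and the chromosomal backbone is the kind of signal that a sliding-window scan can pick up. The sketch below is illustrative only and is not the analysis performed for this genome: the function names, the 5 kb window, the 1 kb step and the 0.10 deviation cutoff are all assumed values, and the test sequence is synthetic.

```python
def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def atypical_gc_windows(seq, window=5000, step=1000, min_delta=0.10):
    """Return (start, end, gc) for windows whose G + C fraction differs
    from the whole-sequence value by at least min_delta.

    window, step and min_delta are ASSUMED illustrative settings.
    """
    overall = gc_fraction(seq)
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(seq))
        gc = gc_fraction(seq[start:end])
        if abs(gc - overall) >= min_delta:
            hits.append((start, end, gc))
    return hits


# Synthetic example: a ~67 % G + C backbone with a 10 kb, 50 % G + C insert.
backbone = "GCA" * 10000   # 30 kb
island = "GCAT" * 2500     # 10 kb
toy_genome = backbone + island + backbone

for start, end, gc in atypical_gc_windows(toy_genome):
    print(f"{start:>6}-{end:<6} G+C = {gc:.3f}")
```

Windows that only partly overlap the low G + C insert fall below the cutoff, which is why the reported coordinates approximate rather than exactly delimit the inserted region; compositional scans of this kind suggest candidate foreign DNA but do not by themselves establish its origin.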

Styrene degradation via the phenylacetate pathway
Styrene is often a soil pollutant and strain Cs1-4 possessed genes predicted to encode its degradation by the phenylacetate pathway. The putative styrene operon (conversion of styrene to phenylacetate [19]) consisted of a regulatory element (marR, DelCs14_4846), dienelactone hydrolase (DelCs14_4847), flavin reductase (styB, DelCs14_4848), monooxygenase (styA, DelCs14_4849), short chain dehydrogenase (DelCs14_4850) and AraC-type transcriptional regulator (DelCs14_4851). The genetics of phenylacetate metabolism has been studied in a variety of bacteria, and thirteen genes encoding its transformation to succinyl-CoA and acetyl-CoA have been identified [20,21]. These genes often occur in a single cluster, but in strain Cs1-4, they are dispersed across the genome. The single largest cluster contained paaABCDE (DelCs14_5720-24) and paaK (DelCs14_5725), which were predicted to encode a phenylacetyl-CoA epoxidase and phenylacetate-CoA ligase, respectively. Other orthologs that were identified included an epoxyphenylacetyl-CoA isomerase (paaG, DelCs14_0512) and a ring-opening enzyme (paaN, DelCs14_5726).

Benzoate degradation by the benzoyl-CoA pathway
Metabolism of benzoate is important in the degradation of many aromatic compounds, and benzoate degradation by aerobic bacteria is most commonly initiated by oxygenases. In contrast, growth of strain Cs1-4 on benzoate was predicted to proceed by an alternative pathway in which benzoyl-CoA is formed as the primary metabolite [22]. The gene cluster putatively conferring this function included an ABC-type transporter (DelCs14_0078-82), a benzoate-CoA ligase (DelCs14_0073) and a benzoate oxygenase (boxABC, DelCs14_0073-0075).

Conclusions
Determination of the complete genome sequence of D. acidovorans Cs1-4 achieved the objective of enabling new insights into the genes underlying PAH metabolism and evolutionary mechanisms that may shape the process in the environment. Furthermore, the genome sequence data suggested that biodegradative capacity of D. acidovorans Cs1-4 extended beyond PAH, and the organism was well equipped for growth in soils contaminated by a variety of compounds such as N-heterocycles and pesticides.