Genome sequence of the dark pink pigmented Listia bainesii microsymbiont Methylobacterium sp. WSM2598

Strains of a pink-pigmented Methylobacterium sp. are effective nitrogen- (N2) fixing microsymbionts of species of the African crotalarioid genus Listia. Strain WSM2598 is an aerobic, motile, Gram-negative, non-spore-forming rod isolated in 2002 from a Listia bainesii root nodule collected at Estcourt Research Station in South Africa. Here we describe the features of Methylobacterium sp. WSM2598, together with information and annotation of a high-quality draft genome sequence. The 7,669,765 bp draft genome is arranged in 5 scaffolds of 83 contigs and contains 7,236 protein-coding genes and 18 RNA-only encoding genes. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Introduction
Nodulated legumes are important and established components of Australian agricultural systems: the atmospheric nitrogen (N2) fixed by rhizobia in symbiotic association with these legumes is estimated to be worth more than $2 billion annually [1,2]. The major agricultural region of south-western Australia has a Mediterranean climate, with soils that are often acid, have a low clay content and low organic matter, and tend to be inherently infertile [3,4]. The last forty years, however, have seen a sharp decrease in average winter rainfall of about 15-20% [5]. This, together with the development of dryland salinity [6], has challenged the sustainability of using the commonly sown subterranean clover and annual medics as pasture legumes in these systems. Alternative perennial legume species (and their associated rhizobia) are therefore being sought [2]. We have identified a suite of South African perennial, herbaceous forage legumes, including several species in the crotalarioid genus Listia (previously Lotononis) [7], that are potentially well-adapted to the arid climate and acid, infertile soils of the target agricultural areas.
Listia species are found in seasonally wet habitats throughout southern and tropical Africa [8]. They produce stoloniferous roots [8,9] and form lupinoid nodules rather than the indeterminate type found in other crotalarioid species [7,10]. Rhizobial infection occurs by epidermal entry rather than via root hair curling [7]. Listia-rhizobia symbioses are highly specific. The tropically distributed L. angolensis forms effective (i.e. N2-fixing) nodules with newly described species of Microvirga [11], while all other studied Listia species are nodulated only by strains of pigmented methylobacteria [7,10,12]. Unlike the methylotrophic Methylobacterium nodulans, which specifically nodulates some species of Crotalaria [13], the Listia methylobacteria are unable to utilize methanol as a sole carbon source [14]. In Australia, strains of pigmented methylobacteria have been used as commercial inoculants for Listia bainesii and are able to persist in acidic, sandy, infertile soils, while remaining symbiotically and serologically stable [10,15]. A pigmented Methylobacterium strain, WSM2598, isolated from a root nodule of L. bainesii cv. "Miles" in South Africa in 2002, was found to be a highly effective nitrogen-fixing microsymbiont of both L. bainesii and Listia heterophylla (previously Lotononis listii) [10]. Here we present a set of preliminary classification and general features for Methylobacterium sp. strain WSM2598, together with the description of the genome sequence and annotation.

Organism information
Methylobacterium sp. strain WSM2598 is a motile, non-sporulating, non-encapsulated, Gram-negative rod with one to several flagella. It is a member of the family Methylobacteriaceae in the class Alphaproteobacteria. The rod-shaped form varies in size, with dimensions of approximately 0.5 μm in width and 1.0-1.5 μm in length (Figure 1 Left and 1 Center). WSM2598 is medium to slow growing, forming 0.5-1.5 mm diameter colonies within 6-7 days at 28°C. WSM2598 is pigmented, an unusual property for rhizobia. When grown on half strength Lupin Agar (½LA) [10], WSM2598 forms dark pink pigmented, opaque, slightly domed colonies with smooth margins (Figure 1 Right).

[Phylogenetic tree figure; start of caption missing] All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [28]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [29] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain an accession number. Strains with a genome sequencing project registered in GOLD [30] are in bold print and the GOLD ID is mentioned after the accession number. Published genomes are designated with an asterisk.
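The bootstrap analysis mentioned in the phylogenetic tree caption rests on resampling alignment columns with replacement. The following is a minimal Python sketch of that resampling step only; it is not the MEGA implementation, it does not build trees, and the toy alignment and strain labels are hypothetical placeholders.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap pseudo-replicate of an aligned sequence set.

    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that the alignment stays consistent.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment; real input would be the aligned 16S rRNA sequences.
toy_alignment = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTTCGT",
    "strain_C": "ACGAACGA",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(500)]
```

Each replicate would then be passed to the tree-building method, and the support for a cluster is the fraction of replicate trees in which it appears.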

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [30] and an improved-high-quality-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 3.

Growth conditions and DNA isolation
Methylobacterium sp. WSM2598 was grown to mid-logarithmic phase in TY rich medium on a gyratory shaker at 28°C [32]. DNA was isolated from 60 mL of cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [33].

Genome sequencing and assembly
The draft genome of Methylobacterium sp. WSM2598 was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [34,35]. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 19,048,548 reads, and an Illumina long-insert paired-end library with an average insert size of 6,354.14 ± 3,100.07 bp, which generated 18,876,864 reads, totaling 5,689 Mbp of Illumina data (unpublished, Feng Chen). All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website. The initial draft assembly contained 141 contigs in 41 scaffolds. The initial draft data were assembled with Allpaths, version 39750, and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet, version 1.1.05 [36], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The Illumina draft data were assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies and a subset of the Illumina CLIP paired-end reads were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected with manual editing in Consed [37][38][39]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished), and sequencing of bridging PCR fragments with Sanger and/or PacBio (unpublished, Cliff
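The shredding of a consensus sequence into overlapping fake reads can be illustrated with a minimal Python sketch. The text gives the shred lengths (10 Kbp and 1.5 Kbp) but not the overlap between neighbouring shreds, so the `overlap` argument is an assumption, and `shred` is a hypothetical helper rather than the JGI's actual tooling.

```python
def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into fixed-length, overlapping pseudo-reads."""
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be non-negative and smaller than shred_len")
    if len(consensus) <= shred_len:
        # A short consensus yields a single shred (or none if it is empty).
        return [consensus] if consensus else []
    step = shred_len - overlap
    shreds = []
    start = 0
    while start + shred_len < len(consensus):
        shreds.append(consensus[start:start + shred_len])
        start += step
    # Anchor the final shred to the end so the tail is covered at full length.
    shreds.append(consensus[-shred_len:])
    return shreds


# Toy example: a 20 bp "consensus" cut into 8 bp shreds overlapping by 4 bp.
toy_consensus = "ACGT" * 5
toy_shreds = shred(toy_consensus, shred_len=8, overlap=4)
```

For a real contig the call might look like `shred(contig, shred_len=1500, overlap=750)`, where the 750 bp overlap is again only a placeholder value.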

Genome annotation
Genes were identified using Prodigal [40] as part of the DOE-JGI Annotation pipeline [41], followed by a round of manual curation using the JGI GenePRIMP pipeline [42]. Within the Integrated Microbial Genomes (IMG-ER) system [43], predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [44], RNAmmer [45], Rfam [46], TMHMM [47], and SignalP [48]. Additional gene prediction analyses and functional annotation were performed within IMG.

Genome properties
The genome is 7,669,765 nucleotides long with 71.17% GC content (Table 4) and comprises 5 scaffolds (Figure 3) of 83 contigs. Of a total of 7,349 genes, 7,236 were protein-encoding and 18 were RNA-only encoding genes. The majority of genes (71.22%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical.
The distribution of genes into COGs functional categories is presented in Table 5.
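As a quick consistency check, some summary figures can be derived from the counts reported in the text. The sketch below uses only numbers stated in this paper (genome size, gene counts, read counts and total Illumina yield); the variable names are illustrative, and the depth estimate assumes that all sequenced bases map to the final assembly, which will overstate the usable coverage.

```python
# Values reported in the text.
GENOME_BP = 7_669_765
TOTAL_GENES = 7_349
PROTEIN_CODING_GENES = 7_236
SHORT_INSERT_READS = 19_048_548
LONG_INSERT_READS = 18_876_864
ILLUMINA_BP = 5_689_000_000  # 5,689 Mbp of Illumina data


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


genome_mbp = GENOME_BP / 1e6
protein_coding_pct = percent(PROTEIN_CODING_GENES, TOTAL_GENES)

total_reads = SHORT_INSERT_READS + LONG_INSERT_READS
mean_read_len = ILLUMINA_BP / total_reads   # about 150 bp per read
nominal_depth = ILLUMINA_BP / GENOME_BP     # raw fold-coverage, upper bound

print(f"Genome size: {genome_mbp:.2f} Mbp")
print(f"Protein-coding genes: {protein_coding_pct:.2f}% of all genes")
print(f"Total reads: {total_reads:,}")
print(f"Mean read length: {mean_read_len:.0f} bp")
print(f"Nominal sequencing depth: {nominal_depth:.0f}x")
```

The mean read length of about 150 bp is inferred from the totals and is not stated directly in the text.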

Conclusion
WSM2598 was sequenced as part of the DOE Joint Genome Institute GEBA-RNB project. In common with other sequenced rhizobial strains, WSM2598 has a comparatively large genome of around 7.67 Mbp, with a high proportion of genes assigned to the COG functional categories associated with transcription control and signal transduction (14.69%), transport and metabolism (29.38%) and secondary metabolite biosynthesis (3.12%). These features are characteristic of soil bacteria, which inhabit oligotrophic environments with typically diverse but scarce nutrient sources. Rhizobial methylobacteria are unusual, however, in that they form symbiotic associations exclusively with African crotalarioid legume hosts, several species of which are well-adapted to arid climates and acid, infertile soils and are therefore potentially useful pasture plants in marginal agricultural systems. The molecular basis for this symbiotic specificity has yet to be determined. As WSM2598 is highly effective for N2 fixation on several of these hosts, its sequenced genome is a valuable resource for gaining an understanding of symbiotic specificity and N2 fixation in a currently understudied group of legumes and rhizobia.

Additional file
Additional file 1: Table S1. Associated MIGS record.