High-quality permanent draft genome sequence of Rhizobium leguminosarum bv. viciae strain GB30; an effective microsymbiont of Pisum sativum growing in Poland

Rhizobium leguminosarum bv. viciae GB30 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Pisum sativum. GB30 was isolated in Poland from a nodule recovered from the roots of Pisum sativum growing at Janow. GB30 is also an effective microsymbiont of the annual forage legumes vetch and pea. Here we describe the features of R. leguminosarum bv. viciae strain GB30, together with sequence and annotation. The 7,468,464 bp high-quality permanent draft genome is arranged in 78 scaffolds of 78 contigs containing 7,227 protein-coding genes and 75 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-015-0029-6) contains supplementary material, which is available to authorized users.


Introduction
The most efficient biological nitrogen fixation occurs when bacterial microsymbionts (rhizobia) form an effective symbiotic association with legume host plants. Legumes can develop these interactions with many different species of rhizobia belonging mainly to the Alphaproteobacteria, including Azorhizobium, Allorhizobium, Bradyrhizobium, Ensifer, Mesorhizobium and Rhizobium [1,2]. At the time of writing, the genus Rhizobium contains 71 species, and within a species there may be distinct symbiovars [3].
Within the species Rhizobium leguminosarum there are three distinct symbiovars [4,5]: bv. phaseoli, which forms nodules with Phaseolus vulgaris; bv. trifolii, which forms nodules with clover (Trifolium); and bv. viciae, which forms nodules with vetch, pea and lentil (Vicia, Lathyrus, Pisum and Lens). In R. leguminosarum, the nod genes that define these distinct host specificities are mostly located on the symbiotic plasmid, generically designated pSym. The genomes of R. leguminosarum strains are usually large and complex, containing, in addition to pSym, a chromosomal replicon and extra-chromosomal low-copy-number replicons characterized by the presence of repABC replication systems [6-8]. Recent studies have revealed that substantial divergence can occur in this genome organization and in the metabolic versatility of R. leguminosarum isolates [5,9-12]. Kumar et al. [5] demonstrated that the diversity of R. leguminosarum within a local population of nodule isolates was 10 times higher than that found for Ensifer medicae. It was also noted that the abundance of a particular genotype within the population can vary significantly, and adaptation to the edaphic environment is a sought-after trait, particularly for the development of inoculants [13,14].
R. leguminosarum bv. viciae GB30 was isolated as the most abundant nodule inhabitant (>42 %) of Pisum sativum cv. Ramrod plants cultivated at a field site in Janow, Poland [10]. In contrast to other abundant isolates, GB30 formed nodules and fixed nitrogen with both P. sativum and Vicia villosa (cv. Wista). A preliminary investigation of the genome architecture using Eckhardt analysis revealed that GB30 has a multipartite genome consisting of six replicons: one chromosome and five plasmids [10]. The genome of this strain could therefore provide important insights into the mechanisms required by effective R. leguminosarum microsymbionts to adapt to a particular edaphic environment. Here, we present a set of general features for Rhizobium leguminosarum bv. viciae GB30, together with a description of its high-quality permanent draft genome sequence and annotation.

Organism information
Classification and features
R. leguminosarum bv. viciae strain GB30 is a motile, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped cells vary in size, measuring 0.8-1 μm in width and 2.3-2.5 μm in length (Fig. 1 Left and Center). It is fast-growing, forming colonies within 3-4 days when grown on half-strength Lupin Agar (½LA) [15] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Fig. 1 Right). Figure 2 shows the phylogenetic relationship of Rhizobium leguminosarum bv. viciae GB30 in a 16S rRNA gene sequence-based tree. Based on the 16S rRNA gene alignment, this strain is phylogenetically most closely related to Rhizobium laguerreae FB206ᵀ and Rhizobium gallicum R602spᵀ, sharing 100 % sequence identity with both, as determined using the EzTaxon-e server [16]. Rhizobium laguerreae FB206ᵀ was isolated from effective Vicia faba root nodules in Tunisia [17], whereas Rhizobium gallicum R602spᵀ was isolated from effective Phaseolus vulgaris root nodules in France [18]. Sequence similarity was also investigated with strains from the GEBA-RNB project [12], and GB30 was found to be closely related to R. leguminosarum bv. trifolii WSM1689, with 100 % 16S rRNA gene sequence identity. R. leguminosarum bv. trifolii WSM1689 is a highly effective microsymbiont of the perennial clover Trifolium uniflorum and has been shown to have a remarkably narrow host range [19]. Minimum Information about the Genome Sequence (MIGS) is provided in Table 1 and Additional file 1: Table S1.
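The 100 % values above are pairwise identities over an alignment of the 16S rRNA gene. As an illustration only, the sketch below computes percent identity over the gap-free columns of two sequences that have already been aligned to equal length. Excluding gap columns is an assumption and is not necessarily how EzTaxon-e counts gaps; the function name and toy sequences are hypothetical, not GB30 data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so the result
    is identity over gap-free aligned positions only (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted in the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACGTTA"))  # the gap column is ignored
print(percent_identity("ACGT", "ACGA"))
```

The first call prints 100.0 because only the five gap-free columns are compared; the second prints 75.0 (three matches in four columns).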

Symbiotaxonomy
R. leguminosarum bv. viciae strain GB30 was obtained from pea nodules (P. sativum cv. Ramrod) growing in sandy loam (N:P:K 0.157:0.014:0.013 %) in Janow near Lublin (Poland). The soil contained relatively high numbers of R. leguminosarum bv. viciae, bv. trifolii and bv. phaseoli cells, i.e., 9.2 × 10³, 4.2 × 10³ and 1.5 × 10³ bacteria/g of soil, respectively, as determined by the most probable number (MPN) method [10]. Plants were grown on a 1 m² plot for six weeks between May and June 2008. Five randomly chosen pea plants growing close to one another were harvested; the nodules were collected and surface-sterilized, and the microsymbionts were isolated [10]. One of the most abundant isolates, GB30, formed nodules (Nod⁺) and fixed N₂ (Fix⁺) with P. sativum and Vicia villosa (cv. Wista), increasing plant wet mass by 54 and 38 %, respectively. Plants inoculated with GB30 also showed a 2.6-fold increase in nodule number and a 2.2-fold increase in seed pod number.
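The soil counts above come from the MPN method [10], whose dilution scheme is not described here. The sketch below therefore shows only the generic maximum-likelihood MPN calculation for a dilution series, assuming cells are Poisson-distributed so that a unit (for example a plant) receiving amount m of soil becomes positive with probability 1 - exp(-λm). The function name, argument names and example inputs are hypothetical and do not reproduce the Janow counts.

```python
import math
from typing import Sequence


def mpn_estimate(amounts: Sequence[float], n_units: Sequence[int],
                 n_positive: Sequence[int]) -> float:
    """Maximum-likelihood most probable number (cells per unit of amount).

    amounts[i]:    amount of sample (e.g. g of soil) given to each unit at dilution i
    n_units[i]:    number of replicate units (e.g. plants) at dilution i
    n_positive[i]: number of those units scored positive (e.g. nodulated)

    The MLE lam solves  sum_i p_i*m_i / (1 - exp(-lam*m_i)) = sum_i n_i*m_i.
    """
    if not (len(amounts) == len(n_units) == len(n_positive)):
        raise ValueError("inputs must have one entry per dilution")
    if any(m <= 0 for m in amounts):
        raise ValueError("amounts must be positive")
    if sum(n_positive) == 0:
        raise ValueError("no positive units: MPN is below the detection limit")
    if all(p >= n for p, n in zip(n_positive, n_units)):
        raise ValueError("all units positive: MPN is above the detection limit")

    target = sum(n * m for n, m in zip(n_units, amounts))

    def lhs(lam: float) -> float:
        # Strictly decreasing in lam; -expm1(-x) equals 1 - exp(-x), accurate for small x.
        return sum((p * m) / (-math.expm1(-lam * m))
                   for p, m in zip(n_positive, amounts) if p > 0)

    # Bracket the root so that lhs(lo) > target >= lhs(hi), then bisect on a log scale.
    lo, hi = 1e-12, 1.0
    while lhs(lo) <= target:
        lo /= 10.0
    while lhs(hi) > target:
        hi *= 10.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical three-step tenfold series, four plants per step.
print(mpn_estimate([0.1, 0.01, 0.001], [4, 4, 4], [4, 2, 0]))
```

As a sanity check on the design, a single dilution with 5 of 10 units positive reduces to the closed form λ = ln 2 per unit of amount. Bisection is used instead of Newton's method because the left-hand side is monotone, so the bracketed root is always found.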

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Genomic Encyclopedia of Bacteria and Archaea, The Root Nodulating Bacteria chapter (GEBA-RNB) project at the U.S. Department of Energy, Joint Genome Institute [12]. The genome project is deposited in the Genomes OnLine Database [20] and the high-quality permanent draft genome sequence in IMG [21]. Sequencing, finishing and annotation were performed by the JGI using state-of-the-art sequencing technology [22]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
R. leguminosarum bv. viciae strain GB30 was grown to mid-logarithmic phase in TY rich medium [23] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [24].
Fig. 2 Phylogenetic tree showing the relationship of Rhizobium leguminosarum bv. viciae GB30 (shown in blue print) relative to other type and non-type strains in the Rhizobium genus using a 901 bp internal region of the 16S rRNA gene. Bradyrhizobium elkanii ATCC 49852ᵀ was used as outgroup. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [36]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [20] are shown in bold and have the GOLD ID mentioned after the strain number; otherwise the NCBI accession number has been provided. Finished genomes are designated with an asterisk.
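Bootstrap support in Fig. 2 is obtained by rebuilding the tree from resampled alignments. The sketch below shows only the resampling step of Felsenstein's nonparametric bootstrap, assuming a gap-free alignment held as equal-length strings (consistent with the caption's note that there were no gap-containing sites). Tree building under the GTR model and counting clade frequencies, which MEGA performs, are not reproduced, and the function names are hypothetical.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """One nonparametric bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement, as many as the alignment has; every
    sequence uses the same sampled column indices, so sites stay aligned.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in alignment]


def bootstrap_replicates(alignment: List[str], n_replicates: int = 500,
                         seed: int = 0) -> List[List[str]]:
    """Generate n_replicates pseudo-alignments (500 matches the Fig. 2 analysis)."""
    rng = random.Random(seed)  # seeded so the replicate set is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


reps = bootstrap_replicates(["ACGT", "ACGA"], n_replicates=500, seed=1)
print(len(reps))
```

Each pseudo-alignment would then go through the same tree-building step, and the bootstrap value of a cluster is the percentage of replicate trees that recover it.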

Genome sequencing and assembly
The draft genome of Rhizobium leguminosarum bv. viciae GB30 was generated at the DOE Joint Genome Institute [22]. An Illumina Std shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 25,943,396 reads totaling 3,891.5 Mbp. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI web site [25]. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artefacts (Mingkun L, Copeland A, Han J. unpublished). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet version 1.1.04 [26]; (2) 1-3 kbp simulated paired-end reads were created
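The read and base totals above imply an average read length and a raw sequencing depth, which the text does not state explicitly. The short calculation below derives both, assuming the depth is taken against the final 7,468,464 bp assembly and that the totals are counted before DUK filtering; the helper names are hypothetical.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def fold_coverage(total_bases: float, genome_size: int) -> float:
    """Raw sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


reads = 25_943_396
total_bp = 3_891.5e6      # 3,891.5 Mbp reported for the HiSeq 2000 run
genome_bp = 7_468_464     # assembled draft genome size

print(f"mean read length ~ {mean_read_length(total_bp, reads):.1f} bp")
print(f"raw coverage ~ {fold_coverage(total_bp, genome_bp):.0f}x")
```

With the reported figures this gives roughly 150 bp per read and about 521-fold raw coverage; the effective depth after filtering would be somewhat lower.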

Genome annotation
Genes were identified using Prodigal [29] as part of the DOE-JGI genome annotation pipeline [30,31]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG and InterPro databases. The tRNAscan-SE tool [32] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [33]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [34]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) system [35] developed by the Joint Genome Institute, Walnut Creek, CA, USA.
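Prodigal [29] predicts protein-coding genes with a trained scoring model that also weighs start codons and ribosome-binding sites; the toy scan below does not reproduce it. It only illustrates the underlying idea of locating open reading frames, from an ATG start to the next in-frame TAA, TAG or TGA stop, on both strands. The function names and the minimum-length default are hypothetical, and real bacterial gene finders also consider alternative starts such as GTG and TTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A/C/G/T)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Naive ORF scan: first ATG to the next in-frame stop, on both strands.

    Returns (strand, start, end) tuples; coordinates are 0-based, half-open and
    refer to the scanned strand (reverse-strand hits are in reverse-complement
    coordinates). min_codons counts the stop codon; 100 is an arbitrary default.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # keep the most upstream ATG (longest ORF)
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("ATGAAATAA", min_codons=3))
```

The example reports a single three-codon ORF on the forward strand, ("+", 0, 9); the reverse strand of this toy sequence contains no ATG.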

Genome Properties
The genome is 7,468,464 nucleotides long, with a GC content of 60.81 % (Table 3), and comprises 78 scaffolds of 78 contigs. Of a total of 7,302 genes, 7,227 are protein-coding and 75 are RNA-only genes. The majority of genes (79.57 %) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
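The GC content is the fraction of G and C among the bases of the assembly. A minimal sketch of that calculation follows, assuming ambiguous bases (for example N) are excluded from the denominator, which may differ from the IMG convention; it also checks that the protein-coding and RNA-only gene counts add up to the reported total. The function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Gene counts reported for GB30
protein_coding = 7_227
rna_only = 75
total_genes = protein_coding + rna_only
print(total_genes)  # 7302, matching the reported total
```

Applied to the full set of 78 contig sequences, gc_content would be expected to return a value close to the reported 60.81 %.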

Conclusion
Rhizobium leguminosarum bv. viciae GB30 belongs to a group of Alpha-rhizobia strains isolated from Pisum sativum in Poland. Strain GB30 is part of the GEBA-RNB project that sequenced 24 R. leguminosarum strains and 12 R. leguminosarum bv. viciae strains [12]. Among the GEBA-RNB strains, phylogenetic analysis revealed that GB30 is most closely related to Rhizobium leguminosarum bv. trifolii CB782 and WSM1689 [12]. Full genome comparison of GB30 and WSM1689 [19] revealed that GB30 has the largest genome (7.47 Mbp), with the highest COG count (5,182), the lowest Pfam % (82.51 %) and the lowest TIGRFam % (22.13 %). The genome attributes of R. leguminosarum bv. viciae GB30, in conjunction with the other R. leguminosarum genomes, will be important for ongoing comparative and functional analyses of the plant-microbe interactions required for the successful establishment of agricultural crops.

Additional file
Additional file 1: Table S1. Associated MIGS record.