Complete genome sequence of the potato pathogen Ralstonia solanacearum UY031

Ralstonia solanacearum is the causative agent of bacterial wilt of potato. Ralstonia solanacearum strain UY031 belongs to the American phylotype IIB, sequevar 1, also classified as race 3 biovar 2. Here we report the completely sequenced genome of this strain, the first complete genome for phylotype IIB, sequevar 1, and the fourth for the R. solanacearum species complex. In addition to standard genome annotation, we have carried out a curated annotation of type III effector genes, an important pathogenicity-related class of genes for this organism. We identified 60 effector genes and observed that this effector repertoire is distinct from those of other phylotype IIB strains. Eleven of the effectors appear to be nonfunctional due to disruptive mutations. We also report a methylome analysis of this genome, the first for an R. solanacearum strain. This analysis helped us note the presence of a toxin gene within a region of probable phage origin, raising the hypothesis that this gene may play a role in this strain’s virulence.


Introduction
Ralstonia solanacearum is the causal agent of bacterial wilt, one of the most devastating plant diseases worldwide [1]. It is a highly diversified bacterial plant pathogen in terms of host range, geographical distribution, pathogenicity, epidemiological relationships, and physiological properties [2]. Strains are divided into four phylotypes, corresponding roughly to their geographic origin: Asia (phylotype I), the Americas (II), Africa (III), and Indonesia (IV) [3]. Strain UY031 belongs to phylotype IIB, sequevar 1 (IIB1), the group considered mainly responsible for bacterial wilt of potato in cold and temperate regions [4]. Phylotype IIB, sequevar 1 is also traditionally classified as race 3 biovar 2.
Strain UY031 was isolated in Uruguay from infected potato tubers in 2003 and displays high aggressiveness on both potato and tomato hosts [5]. This strain is being used as a model in plant-pathogen gene expression studies carried out by our group; having its genome available greatly facilitates the identification of pathogenicity-related genes. Four other IIB1 R. solanacearum strains have been partially sequenced: UW551 [6], IPO1609 [7], NCPPB909 [8], and CFIA906 [8]. This is the first genome of this group to be completely sequenced, and the fourth within the R. solanacearum species complex (the other three are strains GMI1000 [9], Po82 [10], and PSI07 [11]).

Classification and features
Ralstonia solanacearum strain UY031 is classified within the order Burkholderiales of the class Betaproteobacteria. It is an aerobic, non-sporulating, Gram-negative bacterium with rod-shaped cells ranging from 0.5 to 1.5 μm in length (Fig. 1, (a) and (b)). The strain is moderately fast-growing, forming 3-4 mm colonies within 2-3 days at 28°C. On a general nutrient medium containing tetrazolium chloride and high glucose content, strain UY031 usually produces a diffusible brown pigment and develops pearly cream-white, flat, irregular, and fluidal colonies with characteristic pink whorls in the centre (Fig. 1, (c)). Strain UY031 was isolated from a naturally infected potato tuber showing typical brown rot symptoms (creamy exudates from the vascular rings and eyes of the tuber). This strain is highly pathogenic in different solanaceous hosts, including important crops like tomato and potato [5]. Pathogenicity of this strain was also confirmed in several accessions of Solanum commersonii Dunal, a wild species considered a valuable source of resistance for potato breeding. Due to its great aggressiveness, strain UY031 is being used for selection of resistant germplasm as part of the potato breeding program developed in Uruguay. This strain has been deposited in the CFBP collection of plant-associated bacteria, and has received code CFBP 8401. Minimum Information about the Genome Sequence of R. solanacearum strain UY031 is summarized in Table 1, and a phylogenetic tree is shown in Fig. 2.
Table 1 note. Evidence codes: IDA, Inferred from direct assay; TAS, Traceable author statement (i.e., a direct report exists in the literature); NAS, Non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37].

Genome project history
This sequencing project was carried out in 2015; the result is a complete and finished genome. Project data are available from GenBank (Table 2). Accession codes for reads in the

Fig. 2 Phylogenetic tree highlighting the position of Ralstonia solanacearum UY031 (shown in bold) relative to other strains from the same species. The phylogenetic tree was constructed using four conserved prokaryotic marker genes, namely recA, rpoA, rpoB and rpoC. Each gene was aligned individually with MUSCLE [25]; the resulting multiple alignments were concatenated. PhyML [26] was used to perform tree reconstruction using the GTR model and 1,000 bootstrap replicates. Strain names are colour-coded according to the corresponding phylotype. GenBank accession numbers are displayed within brackets. Strains whose genome was completely sequenced are marked with an asterisk. Ralstonia pickettii 12J (NCBI accession NC_010682) was used as an outgroup.

Growth conditions and genomic DNA preparation
R. solanacearum strain UY031 was routinely grown in rich B medium (10 g/l bactopeptone, 1 g/l yeast extract and 1 g/l casamino acids). Genomic DNA was extracted from a bacterial culture grown to stationary phase to avoid overrepresentation of genomic sequences close to the origin of replication. Twelve ml of a culture grown for 16 h at 30°C with shaking at 200 rpm (OD600 = 0.87) were used to extract DNA with the Blood & Cell Culture DNA Midi kit (Qiagen), following the manufacturer's instructions for Gram-negative bacteria. DNA concentration and quality were measured in a Nanodrop (ND-8000 8-sample spectrophotometer).

Genome sequencing and assembly
Whole-genome sequencing was performed on the PacBio RS II platform at the Duke Center for Genomic and Computational Biology (USA). P5-C3 chemistry and a single SMRTcell were used, and quality control was performed with DUGSIM. The number of Pre-Filter Polymerase Read Bases was greater than 749 million (>130x genome coverage). Reads were assembled using the RS_HGAP_Assembly.2 protocol from SMRT Analysis 2.3 [12]. This resulted in one circular chromosome (3,412,138 bp) and one circular megaplasmid (1,999,545 bp). These lengths are very similar to those of the corresponding replicons in R. solanacearum Po82, a IIB sequevar 4 strain that is also a potato pathogen and has also been completely sequenced [10]. The origin of replication was defined for both replicons based on the putative origin for reference strain GMI1000 [9]. An assembly quality assessment was performed before all downstream analyses. All reads were mapped back to the assembled sequences using the RS_Resequencing.1 protocol from SMRT Analysis 2.3. This analysis revealed that the chromosome and megaplasmid sequences had 100 % of bases called (percentage of assembled sequence with coverage ≥ 1) and consensus concordance of 99.9999 % and 99.9992 %, respectively.
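The coverage figure quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming that fold coverage is taken as total read bases divided by the combined length of both replicons; the function name fold_coverage is illustrative and is not part of SMRT Analysis.

```python
# Replicon lengths reported in the text (bp).
CHROMOSOME_BP = 3_412_138
MEGAPLASMID_BP = 1_999_545

# Lower bound on Pre-Filter Polymerase Read Bases reported in the text.
MIN_READ_BASES = 749_000_000


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


genome_size = CHROMOSOME_BP + MEGAPLASMID_BP
coverage = fold_coverage(MIN_READ_BASES, genome_size)

print(f"genome size: {genome_size:,} bp")
print(f"minimum fold coverage: {coverage:.1f}x")
```

Under this assumption the lower bound works out to roughly 138x, consistent with the ">130x" stated in the text.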

Genome annotation
Genome annotation was done using Prokka [13] with the option for ncRNA search. Type III effectors (T3Es) of strain UY031 were identified and annotated in three steps. First, 17 of the T3Es from the R. solanacearum species complex [14] were identified based on the Prokka annotations. Second, the 15 T3Es annotated as "Type III Effector Protein", "Probable Type III Effector Protein" or "Putative Type III Effector Protein" by Prokka were manually annotated using the first BLAST [15] hits (usually 100 % identity) of their DNA sequences against genome sequences of phylotype IIB strains MOLK2 and Po82. Third, the UY031 genome was uploaded to the "Ralstonia T3E" web interface tool [14] to search for additional T3Es not annotated as such with Prokka. The additional 28 T3E genes identified were manually annotated as above. Homologous Gene Group (HGG) clustering was performed with get_homologues [16] using the orthoMCL program [17] and requiring a minimum sequence identity in BLAST query/subject pairs of 30 %.
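Once homologous gene groups have been clustered, each group can be classed by the number of strains in which it occurs. The sketch below illustrates that bookkeeping only; it does not reproduce get_homologues or orthoMCL. The cluster identifiers, their strain membership, and the function name partition_pangenome are hypothetical toy values chosen for illustration.

```python
def partition_pangenome(clusters: dict[str, set[str]], strains: set[str]):
    """Split clusters into core, variable, and strain-specific groups.

    clusters maps a cluster id to the set of strains carrying a member gene.
    Core: present in every strain. Strain-specific: present in exactly one.
    Variable: everything in between.
    """
    core, variable = [], []
    specific = {strain: [] for strain in strains}
    for cluster_id, members in clusters.items():
        present = members & strains
        if not present:
            continue  # cluster has no member in the strains of interest
        if present == strains:
            core.append(cluster_id)
        elif len(present) == 1:
            specific[next(iter(present))].append(cluster_id)
        else:
            variable.append(cluster_id)
    return core, variable, specific


# Strain names come from the text; cluster contents are invented toy data.
strains = {"UY031", "UW551", "IPO1609", "GMI1000", "Po82"}
toy_clusters = {
    "hgg_1": set(strains),            # found in all five strains
    "hgg_2": {"UY031", "UW551"},      # shared by a subset
    "hgg_3": {"UY031"},               # unique to one strain
    "hgg_4": {"GMI1000"},
    "hgg_5": {"Po82", "GMI1000"},
}

core, variable, specific = partition_pangenome(toy_clusters, strains)
print("core:", core)
print("variable:", sorted(variable))
print("UY031-specific:", specific["UY031"])
```

The pan-genome in this scheme is simply the union of the three classes, so its size equals the number of non-empty clusters.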
The sequencing platform used to assemble the genome (PacBio RS II) also provides kinetics information about the sequenced genome. The presence of a methylated base in the DNA template delays the incorporation of the complementary nucleotide; such changes in the kinetics can be used to identify methylated bases, including 6-mA, 5-mC and 4-mC [18]. The analysis of these modifications at genome-wide scale and single-base resolution allowed us to characterize the 'methylome' of this strain. These epigenetic marks are commonly used by bacteria, and their implications range from a defense mechanism, protecting the cell from invading bacteriophages or other foreign DNA, to bacterial virulence itself [19][20][21]. We performed methylome analysis and motif detection using the RS_Modification_and_Motif_analysis.1 protocol from SMRT Analysis 2.3. Such epigenetic marks arise from DNA methyltransferases, sometimes coupled with a restriction endonuclease (a Restriction-Modification System). We

Genome properties
The genome of R. solanacearum strain UY031 has one chromosome (3,412,138 bp) and one circular megaplasmid (1,999,545 bp) (Table 3). The average GC content of the chromosome is 66.5 % while that of the megaplasmid is 66.7 %. A total of 4,778 genes (4,683 CDSs and 95 RNAs) were predicted. Of the protein-coding genes, 3,566 (76.1 %) had functions assigned while 1,212 were considered hypothetical (Table 4). Of all CDSs, 76.6 % could be assigned to one COG functional category and for 83.1 % one or more conserved PFAM-A domains were identified (Table 5).
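A genome-wide GC content can be derived from the per-replicon values by weighting each replicon by its length. This is a small worked calculation using only the lengths and GC percentages given above; the function name weighted_gc is illustrative, and the result inherits the rounding of the reported percentages.

```python
# (length in bp, GC content in %) for each replicon, as reported in the text.
replicons = {
    "chromosome": (3_412_138, 66.5),
    "megaplasmid": (1_999_545, 66.7),
}


def weighted_gc(replicon_stats: dict[str, tuple[int, float]]) -> float:
    """Length-weighted mean GC percentage across replicons."""
    total_length = sum(length for length, _ in replicon_stats.values())
    gc_bases = sum(length * gc for length, gc in replicon_stats.values())
    return gc_bases / total_length


total_bp = sum(length for length, _ in replicons.values())
genome_gc = weighted_gc(replicons)

print(f"total genome size: {total_bp:,} bp")
print(f"length-weighted GC content: {genome_gc:.2f} %")
```

Because the chromosome is the larger replicon, the weighted value lies closer to 66.5 % than to 66.7 %.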

Insights from the genome sequence
We performed a pan-genome analysis of the R. solanacearum UY031 genome, comparing it to four other genomes: two closely related R. solanacearum strains (UW551 and IPO1609) and two others with complete genome sequences available (GMI1000 and Po82). The pan-genome consists of 7,594 HGGs while the core genome consists of 2,958 HGGs; the variable genome consists of 2,643 HGGs, and the number of strain-specific HGGs ranges from 193 to 774 (Fig. 3). We identified 193 HGGs that are UY031-specific; 75.1 % of them were annotated as hypothetical proteins.
Type III effector genes are among the most important virulence determinants in bacterial plant pathogens such as R. solanacearum [14]. Based on comparisons with effector gene sequences in public databases (see above) we have identified 60 T3Es (Table 6), of which 11 appear to be nonfunctional due to frameshifts or other mutations that disrupt the coding sequence. For example, the effector RipS5 is encoded by a gene that has been clearly interrupted by a 34 kbp prophage. Table 6 also shows the orthologs of these genes in the related strains GMI1000, Po82, IPO1609, and UW551. The table shows that the genes that code for RipAA and RipAR have frameshifts or truncations in strain UY031 only. The absence of a particular effector may be enough for a pathogen to avoid host defenses, and therefore cause disease. These two genes are therefore a good starting point for additional investigations of phenotypic differences between these strains. Other effector genes of interest are those that are present and do not have disrupting mutations in UY031 but are absent or appear to be nonfunctional in other strains. We have found several such cases (Table 6), but in all cases there is at least one other strain that also has the same gene in what appears to be a functional state.
Our modification analysis revealed two motifs that are essentially always methylated, namely CAACRAC and GTWWAC. Both are fairly frequent in the genome, occurring 2,144 and 716 times, respectively. Motif CAACRAC is associated with the product of gene RSUY_11320 (R. Roberts, personal communication), which is hypothesized to be an enzyme of a Restriction-Modification System with both restriction nuclease and DNA methyltransferase roles. This gene does not have homologs in other R. solanacearum strains and is located close to a region containing phage-related genes. This region contains gene RSUY_11410, which has been annotated as encoding a zonular occludens toxin. The provenance of this annotation is an enterotoxin gene found in Vibrio cholerae [23]; in R. solanacearum the role of this toxin gene is still unclear [24]. Motif GTWWAC is probably associated with the product of gene RSUY_22890 (R. Roberts, personal communication), which is hypothesized to be a solitary DNA methyltransferase (no restriction endonuclease linked). This gene does have homologs in other R. solanacearum strains (GMI1000, IPO1609, Po82 and PSI07). To our knowledge this is the first R. solanacearum genome with a methylome profile available.
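The two motifs are written with IUPAC ambiguity codes (R = A or G, W = A or T). As an illustration of how occurrences of such degenerate motifs can be counted in a sequence, the sketch below expands a motif into a regular expression and scans both strands. It is not the SMRT Analysis procedure, and the counting convention (overlapping matches, both strands, palindromic motifs counted once per site) is an assumption; the test sequence is invented.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement table that also handles ambiguity codes (R<->Y, K<->M).
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif into a lookahead regex so overlapping hits count."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def count_motif(seq: str, motif: str) -> int:
    """Count motif sites on both strands of seq."""
    seq = seq.upper()
    forward = len(motif_to_regex(motif).findall(seq))
    rc_motif = reverse_complement(motif)
    if rc_motif == motif.upper():
        return forward  # palindromic: each site is shared by both strands
    return forward + len(motif_to_regex(rc_motif).findall(seq))


toy_seq = "TTCAACGACTTGTATACGG"  # invented sequence, not from UY031
for motif in ("CAACRAC", "GTWWAC"):
    print(motif, count_motif(toy_seq, motif))
```

Note that GTWWAC is its own reverse complement, whereas CAACRAC is not (its reverse complement is GTYGTTG), so the two motifs need different handling when both strands are scanned.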

Conclusions
The complete sequence of R. solanacearum strain UY031 presented here should provide a rich platform upon which additional plant-pathogen studies can be carried out. Even though this is the fifth phylotype IIB1 strain sequenced, we found many differences with respect to the genomes of the other strains. In particular, the repertoire of T3E genes varies considerably among these strains, and this may help explain some of the most relevant pathogenicity-related phenotypes described in the literature, opening the way to new control methods for bacterial wilt.