High quality draft genomes of the Mycoplasma mycoides subsp. mycoides challenge strains Afadé and B237

Members of the 'Mycoplasma mycoides cluster' represent important livestock pathogens worldwide. Mycoplasma mycoides subsp. mycoides is the etiologic agent of contagious bovine pleuropneumonia (CBPP), which is still endemic in many parts of Africa. We report the genome sequences and annotation of two frequently used challenge strains of Mycoplasma mycoides subsp. mycoides, Afadé and B237. The information provided will enable downstream 'omics' applications such as proteomics, transcriptomics and reverse vaccinology approaches. Despite the absence of genes encoding Mycoplasma pneumoniae-like cytadhesins, the two strains showed the presence of protrusions. This phenotype is likely encoded by another set of genes.


Introduction
The 'Mycoplasma mycoides cluster' comprises five species/subspecies: Mycoplasma mycoides subsp. mycoides, Mycoplasma leachii, Mycoplasma mycoides subsp. capri, Mycoplasma capricolum subsp. capripneumoniae and Mycoplasma capricolum subsp. capricolum [1,2]. Among them, Mycoplasma mycoides subsp. mycoides, the causative agent of contagious bovine pleuropneumonia (CBPP), is an economically very important bacterial pathogen of cattle in sub-Saharan Africa. CBPP was first described in Europe as early as 1773 [3], and the causative Mycoplasma was cultivated and characterized in Europe in 1898 [4]. The disease has been shown to have spread from Europe to North America, Africa, Australia and Asia via livestock movements. Currently the disease is endemic and widespread in sub-Saharan Africa, ranging from western through central to eastern Africa. In Europe the last outbreaks were reported in Spain, Italy, Portugal and France in the 1980s and 1990s [5]. In comparison to other members of the 'Mycoplasma mycoides cluster', with the exception of Mycoplasma capricolum subsp. capripneumoniae, Mycoplasma mycoides subsp. mycoides shows limited sequence diversity, probably due to its recent emergence about 300 years ago [5,6].
Currently the complete genomes of only three Mycoplasma mycoides subsp. mycoides strains have been deposited in GenBank: the type strain PG1 [7], which is often used in laboratories but is considered to be avirulent, the Australian outbreak strain Gladysdale [8] and the European outbreak strain 57/13 [9]. PG1 has been shown to differ genetically and phenotypically from field strains of Mycoplasma mycoides subsp. mycoides, showing attenuated cytotoxicity and reduced adhesion to bovine epithelial cells [5,10,11], most likely because of the multiple in vitro passages this strain underwent before being deposited in the strain collections. In particular, strain PG1 contains two copies of a large 24 kb repeat, whereas 27 field strains isolated from three different continents contain only one [11]. Strain Gladysdale was isolated in Australia around 1953 [12]. Strain 57/13 was isolated in Italy in 1992. None of these three strains, therefore, represents a virulent African strain. The genetic diversity of Mycoplasma mycoides subsp. mycoides strains has been reported to be highest in Africa [5], where the disease is present in many countries south of the Sahara [13]. We sequenced and annotated the genomes of two virulent African strains, Afadé and B237, which are frequently used as challenge strains in animal experiments [14][15][16][17][18]. The strains were re-isolated directly from experimentally infected animals and were not passaged further, apart from filter cloning to promote uniformity, before genomic DNA was isolated for sequencing. The genomic sequence information from this work will contribute to comparative genomic analyses and therefore to the characterization of the core and pan genome of the 'Mycoplasma mycoides cluster' and of Mycoplasma mycoides subsp. mycoides in particular. The genomic information will also be useful for downstream 'omics' applications, such as proteomics, transcriptomics and reverse vaccinology approaches.

Classification and features
Mycoplasma mycoides subsp. mycoides is an obligate parasite, which resides in the respiratory tract of animals. It is a non-motile, non-sporulating bacterium. It lacks a cell wall and has a pleomorphic shape. Transmission electron microscopy images were generated for both Afadé and B237 strains (Fig. 1). Cell pellets were fixed in 150 mM HEPES, pH 7.35, containing 1.5 % formaldehyde and 1.5 % glutaraldehyde for 30 min at RT and then at 4 °C overnight. After dehydration in acetone and embedding in EPON, ultrathin sections of 40 nm were mounted on formvar-coated copper grids, poststained with uranyl acetate and lead citrate [19] and observed in a Morgagni TEM (FEI). Images were taken with a side-mounted Veleta CCD camera.
Interestingly, transmission electron microscopy revealed protrusions resembling the attachment organelle observed in Mycoplasma pneumoniae [20][21][22][23]. The physiological function of these protrusions and of the branching phenotype needs to be defined in future studies. The general features of Mycoplasma mycoides subsp. mycoides strains Afadé and B237 are presented in Table 1 and Appendix: Table 6.

Genome project history
The sequencing and quality assurance were performed at the Lausanne Genomic Technologies Facility, Center for Integrative Genomics, University of Lausanne, Switzerland. The assemblies and finishing were done at the Institute for Genome Sciences and the International Livestock Research Institute. Functional annotation was produced by the Institute for Genome Sciences Analysis Engine [26] (http://www.igs.umaryland.edu/research/bioinformatics/analysis/index.php). Table 2 presents the project information and its association with MIGS version 2.0 compliance [27].
Liquid cultures of Mycoplasma were filter cloned using a 0.22 μm filter to disrupt possible cell aggregates. A tenfold serial dilution (1/10 to 1/10,000,000,000) was made immediately and 50 μl was plated on PPLO agar.
After 3-4 days of incubation at 37°C, a single colony was picked and was used to inoculate 4 ml of PPLO medium which was aliquoted and stored at −80°C.
Filter-cloned Mycoplasma were grown overnight in 100 ml PPLO medium at 37°C. Before entering the stationary growth phase the culture was centrifuged at 2,862 g for 1 h, and the pellet was resuspended in 2.5 ml of TNE buffer (0.01 M Tris-HCl, pH 8.0; 0.01 M NaCl; 0.01 M EDTA). Subsequently 50 μl SDS (10 %) and 50 μl Proteinase K (20 mg/ml) were added and the tubes were incubated at 37°C for 2 h. After addition of 26 μl of 100 mM PMSF the tubes were incubated for 15 min at room temperature, 25 μl of RNase A (10 mg/ml) was added, followed by incubation at 37°C for 1 h. Sodium acetate and Phenol Saturated Buffer

Genome sequencing and assembly
The genome sequence of Mycoplasma mycoides subsp. mycoides strain Afadé was generated using a combination of Pacific Biosciences RS (PacBio) sequencing (65,280 reads/2,853 bp average read length) and Illumina MiSeq sequencing (7,078,010 reads/295 bp average read length), the latter downsampled to cover 50 times the expected genome size. The sequencing errors of the long PacBio single-molecule reads were corrected with the shorter, high-accuracy Illumina reads using the Celera Assembler (CA) PacBio correction module PBcR (version 7.0, [28]). The resulting corrected PacBio reads were randomly sampled to 25-fold genome coverage and assembled using CA (version 7.0, [29]), yielding 18 contigs with a total size of 1,278,455 bp. Eight contigs comprised the draft genome of strain Afadé.
The whole genome sequence of Mycoplasma mycoides subsp. mycoides strain B237 was obtained using PacBio sequencing (59,775 reads/2,674 bp average read length). PacBio reads were corrected with the PBcR self-correction module. Corrected reads, randomly sampled to 25-fold genome coverage, were assembled with CA and yielded 2 contigs with a total size of 1,208,895 bp. One long contig comprised the entire genome and contained the other contig (5,091 bp) in a repeat region. The final genome sequences had 24-fold coverage for Afadé and 23-fold coverage for B237.
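The relationship between read count, read length and fold coverage that underlies the downsampling steps above can be sketched as follows. This is only an illustration: the expected genome size of 1.2 Mb is an assumed round figure (the value used for downsampling is not stated in the text), and the function names are hypothetical.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Expected fold coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


def sampling_fraction(n_reads: int, mean_read_len: float,
                      genome_size: int, target_fold: float) -> float:
    """Fraction of reads to keep to reach target_fold (capped at 1.0)."""
    raw = fold_coverage(n_reads, mean_read_len, genome_size)
    return min(1.0, target_fold / raw)


# Assumed expected genome size (illustrative; not stated in the text).
EXPECTED_GENOME_SIZE = 1_200_000

# Read counts and mean read lengths as reported for strain Afadé.
pacbio_raw = fold_coverage(65_280, 2853, EXPECTED_GENOME_SIZE)
illumina_raw = fold_coverage(7_078_010, 295, EXPECTED_GENOME_SIZE)
illumina_keep = sampling_fraction(7_078_010, 295, EXPECTED_GENOME_SIZE, 50)

print(f"PacBio raw coverage:   {pacbio_raw:.0f}x")
print(f"Illumina raw coverage: {illumina_raw:.0f}x")
print(f"Illumina fraction to keep for 50x: {illumina_keep:.4f}")
```

Under this assumed genome size, only a small fraction of the Illumina reads would be needed to reach 50-fold coverage, which is consistent with the decision to downsample before error correction.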
The contigs of both assemblies were aligned against the two Mycoplasma mycoides subsp. mycoides reference genomes of Gladysdale [8] and PG1 [7] available in GenBank (CP002107, NC_005364) using MUMmer [30], and we noticed that all small contigs (<15,000 bp) aligned to places already covered by other, bigger contigs. On closer inspection, most of these contigs aligned to a previously characterized 26 kb region [11], consisting of a tandem repeat of three 8 kb segments, interspersed with transposon elements. Due to its repetitive nature, this 26 kb region was not clearly resolved during the assembly process. In order to resolve part of it, we designed unique primer pairs and amplified two long-range PCR fragments of 4,800 and 5,200 bp, respectively. For each genome, both Sanger-derived sequences were aligned to the assembled genomes before and after polishing with multiple iterations of the PacBio Quiver algorithm (version 0.9.0 [31]). We verified that, in the regions covered by the Sanger sequences, all substitution mismatches were resolved by Quiver; however, we manually fixed a few indels present in the post-polishing alignment that had not been corrected by Quiver.
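The check for small contigs that map to reference positions already covered by bigger contigs can be illustrated with a simple interval-containment test. The sketch below is a simplification and not the procedure used in this work: it assumes one reference placement per contig (real alignment output may contain several per contig), and the record type, contig names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    contig: str      # contig identifier (hypothetical names)
    length: int      # contig length in bp
    ref_start: int   # 1-based start on the reference
    ref_end: int     # 1-based inclusive end on the reference


def redundant_small_contigs(placements, max_small=15_000):
    """Return names of small contigs whose reference span lies inside
    the span of a larger contig."""
    large = [p for p in placements if p.length >= max_small]
    redundant = []
    for p in placements:
        if p.length >= max_small:
            continue
        if any(b.ref_start <= p.ref_start and p.ref_end <= b.ref_end
               for b in large):
            redundant.append(p.contig)
    return redundant


# Toy coordinates for illustration only; not taken from the assemblies.
toy = [
    Placement("big_1", 600_000, 1, 600_000),
    Placement("big_2", 590_000, 600_500, 1_190_500),
    Placement("small_1", 5_091, 300_000, 305_090),
    Placement("small_2", 8_000, 598_000, 605_999),
]
print(redundant_small_contigs(toy))
```

In the toy data, only the small contig that falls entirely within a larger contig's span is flagged; the one bridging two large contigs is kept, since it may carry sequence not represented elsewhere.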

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal 2.50 [32]. Functional annotation was produced by the Institute for Genome Sciences Analysis Engine [26].
We annotated the small contigs overlapping bigger ones described above separately and noticed that these contigs had more ambiguous characters and ORFs that were on average half the size of the corresponding ORFs in larger contigs (498 nt versus 920 nt). This was due to insertions and deletions. We therefore excluded the small contigs from the assemblies and report 1 contig for Mycoplasma mycoides subsp. mycoides strain B237 and 8 contigs for Mycoplasma mycoides subsp. mycoides strain Afadé.
We also reannotated the genomes of Mycoplasma mycoides subsp. mycoides strain PG1, Mycoplasma mycoides subsp. mycoides strain Gladysdale and Mycoplasma mycoides subsp. mycoides strain 57/13 using the same Engine, for ease of comparison.

Genome properties
The genomes of Mycoplasma mycoides subsp. mycoides strains Afadé and B237 have a total size of 1,190,241 bp and 1,203,804 bp, respectively. The GC-content of both genomes is 23.9 %. Both strains have two copies of the 12 kb and 13 kb repeat described in [11]; the difference in size between the two genomes is therefore not due to a missing copy in Afadé.
A total of 1,124 ORFs as well as 30 tRNA genes and 2 copies of the 23S, 16S and 5S rRNA operons were predicted. The average gene length is 920 bp and 927 bp for Afadé and B237, respectively. The coding density of the genome is 86.7 %. Signal peptides were detected using PSORTb v3.0 [33] and LipoP v1.0 [34]. Transmembrane helices were detected with the TMHMM server v2.0 [35,36]. CRISPR repeats were searched for with the online CRISPR Finding program. The properties and the statistics of both genomes are summarized in Tables 3, 4 and 5.
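Summary statistics such as GC-content and coding density can be derived from a genome sequence and its gene coordinates as sketched below. The definitions used here (GC over unambiguous bases; coding density as the percentage of positions covered by at least one gene) are common conventions assumed for illustration and are not necessarily those applied by the annotation engine; the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_density(gene_intervals, genome_length: int) -> float:
    """Percentage of the genome covered by at least one gene.

    gene_intervals: iterable of (start, end), 1-based inclusive.
    Overlapping genes are merged so shared bases are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(gene_intervals):
        if end <= current_end:
            continue  # interval fully inside what is already counted
        covered += end - max(start, current_end + 1) + 1
        current_end = end
    return 100.0 * covered / genome_length


# Toy inputs for illustration; not real genome data.
print(gc_content("ATATGCATAT"))
print(coding_density([(1, 10), (5, 15), (20, 25)], 100))
```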

Insights from the genome sequence
The genomes of the two African strains Mycoplasma mycoides subsp. mycoides Afadé and B237 were compared to the three previously sequenced Mycoplasma mycoides subsp. mycoides strains Gladysdale, PG1 and 57/13 using CloVR and Sybil [37,38]. Figure 3 shows a synteny gradient of the aligned genomes. Although there is a high number of transposable elements in all genomes, no major rearrangements were observed. These results fit well with the very recent emergence of the pathogen, estimated to be as young as 300 years, and the narrow host specificity of Mycoplasma mycoides subsp. mycoides [5].
The core genome length is 1,148,950 bp. A total of 773 SNPs were identified when comparing the five core genomes. Only 72 SNPs distinguish B237 from Afadé. Two hundred and sixty-six SNPs separate the Australian and European strains Gladysdale and 57/13. PG1 is the most distant from the other four genomes, with 399, 483, 465 and 425 SNPs when compared to Afadé, Gladysdale, 57/13 and B237, respectively. This confirms previous reports [5].
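Pairwise SNP counts of the kind reported above can be obtained from a core-genome alignment by counting positions at which two strains carry different bases. The following is a minimal sketch under stated assumptions: the alignment is given as equal-length strings, gaps and ambiguous characters are ignored, and the strain names and sequences are toy data, not output from CloVR or Sybil.

```python
from itertools import combinations


def snp_count(a: str, b: str) -> int:
    """Number of aligned positions where both sequences carry an
    unambiguous base (A, C, G, T) and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x in valid and y in valid and x != y)


def pairwise_snps(alignment: dict) -> dict:
    """Map each unordered pair of strain names to its SNP count."""
    return {(s, t): snp_count(alignment[s], alignment[t])
            for s, t in combinations(sorted(alignment), 2)}


# Toy alignment for illustration; not real sequence data.
toy = {
    "strainA": "ACGTACGTAC",
    "strainB": "ACGTACGTAT",
    "strainC": "ACGAAC-TAT",
}
for pair, n in pairwise_snps(toy).items():
    print(pair, n)
```

Note that the total number of variable positions across all strains is generally not the sum of the pairwise counts, since a single SNP position can distinguish several pairs at once.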
We looked for homologs of the cytadhesin proteins P1, P30, P40, P65, P90, HMW1 and HMW3 from Mycoplasma pneumoniae in the Afadé and B237 proteomes using blastp. No significant hits were found for any of the proteins. Other proteins might be involved in the adhesion process and will need to be identified and characterized.
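Screening blastp results for significant hits can be sketched as below. This assumes that the search was written in the standard BLAST+ tabular format (-outfmt 6) with its twelve default columns; the e-value cut-off of 1e-5 is an arbitrary illustrative threshold, since the criterion used in this work is not stated, and the example rows are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, max_evalue=1e-5):
    """Yield hits (as dicts) with e-value at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            yield hit


# Made-up example rows; identifiers and values are not real results.
example = io.StringIO(
    "queryA\tprot_0001\t24.1\t120\t80\t5\t10\t125\t3\t118\t2.3\t28.1\n"
    "queryB\tprot_0420\t41.0\t300\t160\t6\t1\t295\t5\t301\t1e-40\t150\n"
)
hits = list(significant_hits(example))
print([h["qseqid"] for h in hits])
```

A query with no rows passing the threshold would be reported as having no significant hit, which corresponds to the outcome described above for all seven cytadhesin queries.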

Conclusions
The genomes of the two African strains, as expected, differ from those of the laboratory type strain PG1, the European outbreak strain 57/13 and the Australian outbreak strain Gladysdale. These genome sequences should therefore be included in subsequent genome comparisons and 'omics' studies. The presence of protrusions and branching phenotypes in these two Mycoplasma strains, together with the absence of protein-encoding genes similar to those characterized in Mycoplasma pneumoniae, indicates that other, possibly novel, genes in the Mycoplasma genomes underlie the development of protrusions and branching.