Complete genome sequence of Staphylococcus aureus, strain ILRI_Eymole1/1, isolated from a Kenyan dromedary camel

We report the genome of a Staphylococcus aureus strain (ILRI_Eymole1/1) isolated from a nasal swab of a dromedary camel (Camelus dromedarius) in North Kenya. The complete genome sequence of this strain consists of a circular chromosome of 2,874,302 bp with a GC-content of 32.88 %. In silico annotation predicted 2755 protein-encoding genes and 76 non-coding genes. This isolate belongs to MLST sequence type 30 (ST30). Phylogenetic analysis based on a subset of 283 core genes revealed that it falls within the human clonal complex 30 (CC30) S. aureus isolate cluster but is genetically distinct. About 79 % of the protein encoding genes are part of the CC30 core genome (genes common to all CC30 S. aureus isolates), ~18 % were within the variable genome (shared among multiple but not all isolates) and ~ 3 % were found only in the genome of the camel isolate. Among the 85 isolate-specific genes, 79 were located within putative phages and pathogenicity islands. Protein encoding genes associated with bacterial adhesion, and secretory proteins that are essential components of the type VII secretion system were also identified. The complete genome sequence of S. aureus strain ILRI_Eymole1/1 has been deposited in the European Nucleotide Archive under the accession no LN626917.1. Electronic supplementary material The online version of this article (doi:10.1186/s40793-015-0098-6) contains supplementary material, which is available to authorized users.


Introduction
S. aureus is a bacterial species which has been isolated from diverse hosts including humans, other mammals and birds [1,2]. In humans, it is persistently present in the nares of approximately 20 % of all individuals and intermittently carried by nearly 30 % individuals [3]. S. aureus has been reported to be a common cause of wound infections, pneumonia and bacteraemia in humans in Kenya [4,5]. In small and large ruminants and pseudo ruminants such as dromedary camels (Camelus dromedaries), S. aureus causes mastitis and therefore negatively impacts the productivity of the dairy industry worldwide [6,7].
Zoonotic transmission of S. aureus has also been reported [8,9]. In arid and semi arid regions of the Greater Horn of Africa, camels represent an important and valuable livestock species that provides a significant percentage of the population with animal protein, particularly from milk [10]. Moreover, camel milk is often consumed raw without proper heat-treatment, which increases the risk of acquiring infections with zoonotic pathogens [11,12].
Currently our knowledge of bacterial pathogens in camels is rather limited [13]. S. aureus has been reported to cause infections of the skin, udder, eyes and joints [14][15][16][17] in camels. In North Kenya between 1999 and 2004, the prevalence of S. aureus in camels has been reported as 54 % in closed skin abscesses, 36 % in open skin abscesses, 39 % in skin necrosis and 31 % in lymph node abscesses [15]. A recent survey reports the prevalence of intramammary infections (IMI) associated with S. aureus as 11 % in lactating camels in Kenya [16]. A study has also reported genotype data and identified 'candidate' virulence factors of S. aureus strains in Middle Eastern camels [14].
Here we present the complete genome sequence, annotation and comparative analysis of the S. aureus ST30 strain ILRI_Eymole1/1 isolated from a nasal swab of a dromedary camel in Kenya.

Classification and features
The S. aureus strain ILRI_Eymole1/1 was isolated in Kenya in 2004 from a nasal swab of a camel. It was identified as a member of the Staphylococcus aureus species on the basis of standard microbiological procedures [18] combined with a species-specific PCR [19]. S. aureus is a Gram-positive, coccus shaped, non-motile, nonspore forming and facultative anaerobic bacterium. S. aureus were grown on agar. Agar pieces were cut out and fixed in 150 mM HEPES, pH 7.35, containing 1.5 % formaldehyde and 1.5 % glutaraldehyde for 30 min at room temperature and at 4°C over night. After dehydration in acetone and critical point drying, cells were gold sputtered and observed in a Philips SEM 505. Images were acquired using 10 kV at 10.000×/20 nm spot size or 40.00×/10 nm spot size. The bacterial cells are 0.5 to 1.0 mm in diameter, and occurs either singly or in the form of pairs or clusters (Fig. 1). The culture produces smooth, circular, glistening colonies of diameter > 5 mm. It produces a grey pigment. The general features of S. aureus strain Eymole1/1 are presented in Table 1 and Additional file 1: Table S1. The optimal growth temperature range is 37-42°C. Tolerance to NaCl was tested in liquid medium, LB with NaCl concentrations between 0 and 4 M NaCl. Cells were grown overnight at 37°C.
The sequence type of the S. aureus isolate was determined using a previously described MLST dataset [20]. ILRI_Eymole1/1 belongs to ST30 MLST group. A BLASTn search [21] of all five copies of 16S rRNA sequence of ILRI_Eymole1/1 using default parameters revealed 99-100 % identity (with 98-100 % coverage) with all available S. aureus genomes in the database. The phylogenetic relationship was established using the 16S rRNA sequences of the type strains defining the genus Staphylococcus (accession numbers are provided in Additional file 2: Table S2). In addition, the 16S rRNA sequences of 9 S. aureus isolates (NC_021554, NC_017333, NC_017349, NC_022113, NC_002952, NC_017342, NC_ 002758, NC_002745, NC_020529) were extracted from the genome sequences, and a neighbor joining phylogenetic tree was constructed with MEGA v.6.06 (Fig. 2). The tree illustrates the close relationship of S. aureus ILRI_Ey-mole1/1 with S. aureus isolates from ST 30, 36, 5, 45 and the S. aureus type strain L36472 (Fig. 2). The position relative to other species within the genus Staphylococcus is also illustrated. Bacillus subtilis type strain DSM10 was used as an outgroup for the genus Staphylococcus.

Genome sequencing information
Genome project history Twenty three strains of S. aureus have been isolated from healthy and diseased camels in East Africa using standard methods (1). The strains were isolated using primary cultivation on Columbia blood agar plates (Oxoid, UK) and were sub-cultured on mannitol-salt agar plates (Oxoid, UK). Afterwards the strains were subjected to multi locus sequence typing (2). Four strains belonged to sequence type 30, previously characterized in humans. The other isolates had novel sequence types that were likely to be camel specific. We selected the strain ILRI-Eymole1/1 for subsequent analysis since we wanted to elucidate the relationship of S. aureus isolates from camels and humans in order to project a zoonotic potential. S. aureus ILRI_Eymole1/1 was isolated from a nasal swab taken (transport Amies swab w/o charcoal, Copan, Italy) from a dromedary camel calf with rhinitis in Kenya in 2004.

Growth conditions and genomic DNA preparation
The strain was grown in 10 ml liquid Brain heart medium (Carl Roth, Germany) at 37°C and 200 rpm overnight. The strain was grown in 10 ml liquid Brain heart medium (Carl Roth, Germany) at 37°C and 200 rpm overnight. The bacterial cells were pelleted using centrifugation at 5000 × g for 20 min. The supernatant was discarded and cells were subjected to genomic DNA isolation using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA) according to vendor's instructions. The DNA was quantified using the Qubit® 3.0 Fluorometer (Thermo Scientific, Kenya) and the Qubit® dsDNA BR Assay Kit (Thermo Scientific, Kenya). The DNA concentration was 84.6 ng/ ul, the 260/280 and 260/230 ratios were 1.49 and 0.56, respectively. To remove impurities, the DNA was further cleaned using a ratio of 1.6 AMPure beads (ref).

Genome sequencing and assembly
Genome sequencing of S. aureus ILRI_Eymole1/1 was performed using the Illumina Genome Analyzer GAIIx platform. A 300 bp paired-end library with an average insert size of 550 bp was sequenced. The software MIRA v 4.0 [22] was used to assemble the S. aureus ILRI_Eymole1/ 1 genome, using as the input 1,154,246 Illumina pairedend reads. The de novo genome assembly generated a total Evidence codes -IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [62] of 118 contigs with average coverage of 109 × and average quality of 83 ( Table 2). The whole genome alignment tools Mauve [23] and MUMmer v 3.2.2 [24] were used to order contigs of length greater than 1000 bp (69 contigs) against a reference genome sequence MRSA252/NC_002952 [25]. A complete genome sequence was obtained by joining the ordered contigs on the basis of their overlaps. The assembly output ACE file was viewed and analyzed in Tablet viewer version 1.13.05.17 [26].

Genome annotation
The complete genome sequence of S. aureus ILRI_Ey-mole1/1 was annotated using RAST [27]. Ribosomal RNA genes were identified using RNAmmer server v 1.2  [60,63] implementing a Neighbor-Joining method with 1000 bootstrap replications and a Kimura 2-parameter model [28], and the tRNA genes were predicted using tRNA scan-SE v 1.21 [29]. The COG genes and associated functional categories information were downloaded from the COG database [30]. The COG categories were assigned to the ILRI_Eymole1/1 genome annotation using blastp v 2.2.28 [31] against the COG genes collection/ myva. Genes with more than 40 % amino acid identity and with e-values of less than 0.00005 were classified as putative homologues of genes within the COG database, and functional categories were assigned. Signal peptides were predicted using the SignalP v 4.1 server [32], and the transmembrane helices/ membrane spanning domains were identified using TopPred2 [33]. Phage-like sequences were predicted using PHAST [34].

Genome properties
The S. aureus ILRI_Eymole1/1 genome is a circular chromosome of 2,874,302 bp with a GC-content of 32.88 %. A total of 2831 genes were predicted comprising 2755 protein encoding genes, 60 tRNA genes and 16 rRNA genes (Table 3, Fig. 3). Five copies of both 16S and 23S rRNA genes and six copies of 5S rRNA genes were identified. Among the predicted protein encoding genes, 652 (23.66 %) were hypothetical proteins. A total of 162 genes (5.88 %) were predicted to encode proteins with secretory signal peptides (potentially targeted to the secretory pathway) and 1040 (37.75 %) were genes encoding proteins with transmembrane helices or membrane spanning proteins. A total of 2054 (74.56 %) predicted genes were assigned to COG functional categories, while 701 (25.44 %) were not present within the COG collection (Table 4).

Insights from the genome sequence
We performed a comparative analysis of the camel S. aureus ILRI Eymole1/1 isolate of sequence type 30 with 16 previously sequenced ST30 S. aureus isolates, two ST36 methicillin resistance Staphylococcus aureus isolates MRSA252, EMRSA16 and one ST431 S. aureus isolate M809, which together belong to the clonal complex 30 (CC30). ST36 and ST431 are single locus MLST variants of ST30. Previously sequenced S. aureus complete genome sequences were downloaded from the NCBI FTP site [35] (accession numbers are provided in Additional file 3: Table  S3), and CC30 isolates were selected by analyzing their house keeping genes using the S. aureus MLST database [20]. The collection of draft S. aureus CC30 genomes was derived from previous studies [36,37].

Core genome analysis and COG classification
All 20 S. aureus CC30 genomes were annotated using the RAST server, and amino acid sequences of protein encoding genes from all CC30 genomes were used for the core genome analysis. Blastp searching of protein sequences of all CC30 isolates was carried out using local blastp v 2.2.28 [31]. The genes matching in all CC30 genomes with > =80 % identity, e-value < 0.00005, and alignment length > = 50 % were classified as core genes using custom scripts (Additional file 4: Supplementary material S4). The core genes were further analyzed for their COG functional classification using a matching criterion of > = 40 % identity and an e-value < 0.00005. Among 2163 core genes, 1810 (83.68 %) were present in COG database, whereas 353 (16.32 %) were not present  in COG database. The functional classification of these genes is shown in Fig. 4. S. aureus ILRI_Eymole1/1 variable genes (shared with some of the CC30 S. aureus genomes), and isolatespecific genes were also identified. We identified 2163 core genes (78.51 % of the total protein encoding genes), 507 (18.40 %) variable protein-encoding genes and 85 (3.09 %) isolate-specific genes.

Bacterial adhesins
The colonization and adhesion of S. aureus to the nasal epithelial cells is thought to be mediated by surface proteins ClfB, IsdA and the serine-aspartic acid repeat proteins SdrC and SdrE. A published study demonstrated that a mutant lacking these four proteins did not exhibit the adherence phenotype [38]. S. aureus ILRI_Eymole1/1 possesses genes encoding fibrinogen-binding protein ClfB (CEH27447), adhesin proteins SdrC (CEH25318) and SdrE (CEH25319), in common with a subset of the CC30 S. aureus genomes. A gene encoding Heme/ Iron regulated surface protein IsdA (CEH26009) was present among the core protein repertoire of ILRI_Eymole1/1 isolate, and is known to be important for S. aureus infection of human skin, through mediating resistance to skin innate defense mechanisms [39].
S. aureus ILRI_Eymole1/1 possessed genes encoding many fibrinogen-binding proteins, including clumping factor/ fibrinogen binding protein ClfA, (CEH26520: variable Fig. 3 Circular map of the S. aureus ILRI_Eymole1/1 genome. From outer to inner circle; 1) Protein encoding genes in forward orientation are shown in dark blue, 2) Protein encoding genes in reverse orientation are shown in light blue, 3) Ribosomal RNAs are depicted in green while tRNAs are in red, 4) G + C-content plot and 5) GC-skew graph. The Graph was generated using DNAPlotter [64] gene), fibronectin/ fibrinogen binding protein FnBP (CEH25930: core gene), extracellular fibrinogen-binding protein Efb (CEH25981: core gene), fibronectin binding protein FnbB (CEH27312: variable gene), fibronectin binding protein FnbA (CEH27314: variable gene), and clumping factor ClfB, fibrinogen binding protein (CEH27447: variable gene). These FnBPs bind the host fibronectin receptor β1-integrins to promote S. aureus invasion of various mammalian cells including epithelial cells, endothelial cells and fibroblasts. These cells do not require specific co-receptors for S. aureus [40].
A gene encoding an extracellular adherence protein of broad specificity Eap/Map (CEH26760: core gene) was also identified in the camel S. aureus isolate. This protein has been reported to be involved in S. aureus internalization into the host cells. Eap is known to be responsible for agglutination of bacterial cells by rebinding to the surface of S. aureus. It shows dual affinity for the cell surface plasma proteins as well as the bacterial surface. Eap plays a complementary role together with FnBP, in the internalization and long time persistence of S. aureus within eukaryotic cells. It was found to be a key component of the novel internalization pathway that works either in parallel with, or in addition to, the FnBP dependent internalization pathway [41].
Sec-independent Ess secretion pathway/ Type VII secretion system, T7SS Many Gram-positive bacterial species, including S. aureus, secrete exotoxins or virulence factors across the membrane, through signal peptides or the Sec translocon. A Sec-independent translocation of these factors has also been reported in Gram-positive bacteria. Human S. aureus has been shown to secret the ESAT-6-like secretory proteins EsxA and EsxB. The genes encoding these proteins cluster in the genome as an operon together with several additional genes to form a secretion system, known as the type 7 secretion system (T7SS) that is involved in bacterial pathogenicity [42,43]. The S. aureus isolate ILRI_Eymole1/1 possessed genes encoding proteins related to T7SS; which were also present in all other CC30 S. aureus genomes. These encoded the secretory antigen precursor SsaA (CEH25002), ESAT-6/Esx family secreted protein EsxA/YukE (CEH25003), putative secretion accessory protein EsaA/YueB (CEH25004), putative secretion system component EssA (CEH25005), putative secretion accessory protein EsaB/YukD (CEH25006), putative secretion system component EssB/YukC (CEH25007), and a FtsK/SpoIIIE family protein, together with putative secretion system component EssC/YukA (CEH25008).

Isolate specific genes
Out of the total 85 isolate specific genes encoded by S. aureus ILRI_Eymole1/1, 79 genes (92.94 %) were clustered into six large insertions. Four insertions were putative bacteriophages comprising four complete phages with sizes of 52.5 kb, 30 kb, 60.3 kb and 58.8 kb, respectively. All phage sequences possessed attL and attR integration sequences at the forward and reverse ends. Superantigen pathogenicity islands (SaPI) are mobile genetic elements in Gram-positive bacteria including S. aureus that carry genes associated with superantigens, virulence, resistance and metabolic functions; also named as S. aureus pathogenicity islands 'SaPI'. These are known for their strong association with temperate phages and result in high transfer frequencies [44]. Two insertions constituted complete SaPI islands (SaPIcam1 and SaPIcam2) at positions 426,323-443,273 and 758,187-774,130. These were confirmed by the identification of forward and reverse  acquisition of these genes through lateral gene transfer from either phages or heterologous bacterial species harboring these insertions.
Phylogeny using polymorphic set of core genes We determined the phylogenetic relationship among the isolates using a stringently defined set of 283 core genes that were shared among the CC30 isolates. Two S. aureus genomes from ST1 and ST5 were also included in this analysis as outgroups. The core genes were defined among these 22 S. aureus genomes using the criteria of (identity > = 95 % and < 100 %), (e-value < 0.00005) and (alignment length > = 90 %). Duplicate copies of genes were filtered out, resulting in the final total of 283 core genes. Multiple sequence alignment of the concatenated sequences of these genes was performed using the Mugsy aligner [47], generating an alignment comprising 316,359 nucleotides from each isolate. We estimated a maximum likelihood phylogeny using PhyML v. 3.0 [48]. The General Time-Reversible (GTR) model was used, where the base frequencies and the relative substitution rates between them were estimated by maximizing the likelihood of the phylogeny. For estimating the tree topology both nearest neighbor interchange and subtree pruning and regrafting methods were used. One hundred bootstrap replicates were run ( Fig. 5a and b).
In both rooted and unrooted trees ( Fig. 5a and b), the human CC30 isolates group in three clusters, in agreement with a published study [36]. The camel S. aureus isolate ILRI_Eymole1/1 clusters in the CC30 (Fig. 5a), but is genetically distant from human CC30 S. aureus isolates (Fig. 5b).