Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission
© The Author(s) 2010
Published: 30 June 2010
Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be due to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain are of biological origin or are simply caused by differences in genome assembly procedures.
Gluconacetobacter diazotrophicus PAl 5 is a bacterial endophyte of sugarcane, originally isolated in Brazil, that provides fixed nitrogen to its plant host, in addition to increasing plant growth by mechanisms independent of nitrogen fixation [2,3]. The ability of G. diazotrophicus to increase growth and reduce plant dependence on nitrogen fertilization also makes it important for increasing the efficiency of biofuel production from sugarcane. Since it was first isolated, additional strains of G. diazotrophicus have been isolated in several other countries and plant hosts [5–10]. As a result, there has been great interest in sequencing the genome of G. diazotrophicus to guide further research on this bacterium and to better understand endophytic nitrogen fixation by comparative genomics with other sequenced nitrogen-fixing endophytic bacteria.
Genome sequences of G. diazotrophicus PAl 5 were recently completed by two groups, RioGene in Brazil, funded by FAPRJ, and the US DOE Joint Genome Institute (JGI) in California, USA. The RioGene sequence has been published.
Although both groups reported the genome sequence of the same strain, the two sequences differ in gene arrangement and plasmid content, suggesting either that the original templates for genome sequencing were different strains or that sequencing and/or assembly errors exist in one or both of the genome sequences. Here, optical mapping was used to determine which genome assembly more closely matches the physical genome of PAl 5.
Optical mapping creates a physical restriction map of a genome assembled from DNA molecules immobilized on a glass slide prior to digestion with a selected restriction enzyme, which maintains the original order of the restriction fragments. After digestion, DNA is stained and visualized by fluorescence microscopy, and the resulting digitized images are analyzed in an assembly program to construct an optical restriction map of the genome of interest [12,13]. These optical maps can be compared to in silico digests of DNA sequences and have been employed in many sequencing studies, serving as scaffolds for contig alignment, as well as an independent means of identifying errors (inversions, insertions, deletions, translocations, etc.) in previously assembled sequences [14–20]. Therefore, optical mapping was deemed an ideal tool to determine which PAl 5 genome sequence most closely matched the physical DNA of the strain (ATCC 49037).
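To illustrate how an ordered restriction map can expose an inversion, the following is a minimal sketch in Python. It is not the algorithm used by any optical mapping software; the function names, the 10% sizing tolerance, and the assumption that both maps contain the same number of fragments are all hypothetical simplifications. Real optical map alignment must also cope with missed or spurious cuts and fragment sizing error.

```python
from typing import List, Optional, Tuple


def fragments_match(a: List[float], b: List[float], tol: float = 0.1) -> bool:
    """Return True when two ordered fragment-length lists agree pairwise
    within a relative tolerance (hypothetical default of 10%)."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tol * max(x, y) for x, y in zip(a, b))


def find_inversion(optical: List[float], in_silico: List[float],
                   tol: float = 0.1) -> Optional[Tuple[int, int]]:
    """Search for one block of fragments whose reversal makes the in silico
    map agree with the optical map.

    Returns (start, end) fragment indices (end exclusive) of the inverted
    block, or None when the maps already agree or when no single inversion
    reconciles them.
    """
    if len(optical) != len(in_silico):
        return None
    if fragments_match(optical, in_silico, tol):
        return None
    n = len(in_silico)
    for start in range(n):
        for end in range(start + 2, n + 1):
            # Reverse the order of fragments in the candidate block.
            candidate = (in_silico[:start]
                         + in_silico[start:end][::-1]
                         + in_silico[end:])
            if fragments_match(optical, candidate, tol):
                return (start, end)
    return None


if __name__ == "__main__":
    optical_map = [10.0, 20.0, 30.0, 40.0, 50.0]    # fragment sizes in kb
    sequence_map = [10.0, 40.0, 30.0, 20.0, 50.0]   # middle block inverted
    print(find_inversion(optical_map, sequence_map))
```

The brute-force search is cubic in the number of fragments, which is acceptable for a toy example but would need a proper alignment method for a map with hundreds of fragments.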
Materials and Methods
The bacterial strain used in this study is Gluconacetobacter diazotrophicus PAl 5 obtained from the American Type Culture Collection (ATCC 49037). G. diazotrophicus PAl 5 was cultured on yeast mannitol (YM) agar and broth at 30°C.
Preparation of cells for optical mapping
G. diazotrophicus PAl 5 was grown in 5 mL of YM broth until the cells reached a density of 10^9 CFU/mL. The culture was dispensed into five 1.5 mL microcentrifuge tubes in 1 mL aliquots. Tubes were then centrifuged at 6,000 rpm for 10 minutes to pellet the cells. Tubes with cell pellets were shipped on dry ice to OpGen Technologies, Inc. (Madison, Wisconsin) for optical mapping.
Optical mapping and analysis
A BglII optical map of G. diazotrophicus PAl 5 was constructed by OpGen Technologies, Inc. (Madison, Wisconsin, USA). In silico BglII restriction maps of the two complete G. diazotrophicus PAl 5 genomic sequences on GenBank (GenBank # CP001189 and AM889285) were constructed from each sequence’s GenBank file and compared to the BglII optical map of PAl 5 using MapViewer version 2.1.1 (OpGen Technologies, Inc.). Plasmid sequences associated with each genome assembly did not align to the optical map and were therefore not included in the analysis.
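The in silico digest described above can be sketched as follows. This is an illustrative Python example, not the procedure implemented in MapViewer; the function names and the returned summary keys are assumptions made for this sketch. It relies only on the known BglII recognition sequence AGATCT, which is cut after the first base (A^GATCT), and it does not detect a site that spans the origin of a circular sequence.

```python
import re
from statistics import mean
from typing import Dict, List

BGLII_SITE = "AGATCT"  # BglII recognition sequence (palindromic)
CUT_OFFSET = 1         # BglII cuts after the first base: A^GATCT


def in_silico_digest(seq: str, site: str = BGLII_SITE,
                     cut_offset: int = CUT_OFFSET,
                     circular: bool = True) -> List[int]:
    """Return ordered fragment lengths (bp) from digesting seq at every site.

    Because the site is palindromic, scanning one strand finds all sites.
    For a circular molecule, the last and first pieces are joined.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping occurrences would also be found.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        frags = [b - a for a, b in zip(cuts, cuts[1:])]
        frags.append(len(seq) - cuts[-1] + cuts[0])  # wrap around the origin
        return frags
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def map_summary(frags: List[int]) -> Dict[str, float]:
    """Summarize a restriction map: total length, fragment count, and
    average, maximum, and minimum fragment length."""
    return {
        "map_length_bp": sum(frags),
        "n_fragments": len(frags),
        "mean_bp": mean(frags),
        "max_bp": max(frags),
        "min_bp": min(frags),
    }


if __name__ == "__main__":
    toy = "CCAGATCTGGGGAGATCTTT"  # toy 20 bp sequence with two BglII sites
    print(in_silico_digest(toy, circular=False))
    print(map_summary(in_silico_digest(toy)))
```

For a real comparison, the sequence would be read from the GenBank or FASTA record of each chromosome, and the resulting ordered fragment list would be aligned against the optical map rather than compared by summary statistics alone.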
Comparison of annotation
The annotations of the two genomic assemblies were determined using RAST ver. 2.0. Genome and plasmid sequences for RioGene (GenBank # AM889285, AM889286, and AM889287) and JGI (GenBank # CP001189 and CP001190) were concatenated into single FASTA files prior to RAST analysis. Annotations determined by RAST were compared using the SEED viewer (ver. 2.0) [22] based on percent identity between coding sequences (CDS) and the functional roles assigned to annotated genes.
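A comparison of CDS by percent identity amounts to binning each CDS by its best match in the other genome and reporting counts and percentages per bin. The following Python sketch shows that tabulation under stated assumptions: the bin boundaries, labels, and function names are hypothetical and are not taken from the SEED viewer, and the input is assumed to be one percent-identity value per CDS, with 0 meaning no match was found.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def identity_bin(pct: float) -> str:
    """Assign a percent-identity value to a (hypothetical) reporting bin."""
    if pct <= 0:
        return "0"
    if pct < 50:
        return "<50"
    if pct < 100:
        return "50 to <100"
    return "100"


def summarize_identities(identities: Iterable[float]) -> Dict[str, Tuple[int, float]]:
    """Return {bin label: (number of CDS, percent of total CDS)}."""
    values = list(identities)
    if not values:
        return {}
    counts = Counter(identity_bin(p) for p in values)
    total = len(values)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy input: best-hit percent identity for ten CDS in one genome.
    toy = [0, 0, 30, 75, 100, 100, 100, 100, 100, 100]
    for label, (n, pct) in summarize_identities(toy).items():
        print(f"{label}: {n} CDS ({pct:.1f}% of total)")
```

Running the same tabulation in both directions (JGI against RioGene and RioGene against JGI) gives the paired "number of CDS" and "percent of total CDS" columns that such a comparison typically reports.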
Optical map of G. diazotrophicus PAl 5
Table: Optical and in silico BglII restriction maps for G. diazotrophicus PAl 5. Column: in silico BglII restriction map. Rows: map length (bp); number of fragments; average, maximum, and minimum fragment length (bp).
Identification of sequence rearrangements using optical mapping
Table: Rearrangement positions in the G. diazotrophicus PAl 5 genome sequence from RioGene. Column: location in genome (bp).
Table: Regions of in silico maps not aligned to the G. diazotrophicus PAl 5 optical map. Rows: length of the fragments (bp); total length of unaligned regions; average, maximum, and minimum unaligned fragment length.
Differences in annotation between genome sequences
Table: Comparison of coding sequences between G. diazotrophicus PAl 5 genome sequences based on percent identity. Columns: percent identity to comparison genome; number of CDS and percent of total CDS for each of the two genome sequences.
Unique functional roles between G. diazotrophicus PAl 5 genome sequences
Roles Unique to JGI
Roles Unique to RioGene
Ribose ABC transport system, periplasmic ribose-binding protein RbsB (TC 3.A.1.2.1)
Sorbitol dehydrogenase (EC 1.1.1.14)
D-alanine--D-alanine ligase (EC 6.3.2.4)
Transketolase, C-terminal section (EC 2.2.1.1)
UDP-N-acetylenolpyruvoylglucosamine reductase (EC 1.1.1.158)
Transketolase, N-terminal section (EC 2.2.1.1)
Organic hydroperoxide resistance protein
COG0028: Thiamine pyrophosphate-requiring enzymes
Organic hydroperoxide resistance transcriptional regulator
D-galactonate regulator, IclR family
Molybdenum cofactor biosynthesis protein B
Epi-inositol hydrolase (EC 3.7.1.-)
Flagellar biosynthesis protein fliL
Chromosome partition protein smc
Flagellar hook-associated protein flgL
dTDP-rhamnosyl transferase RfbF (EC 2.-.-.-)
Deoxyuridine 5′-triphosphate nucleotidohydrolase (EC 3.6.1.23)
Protein of unknown function DUF374
Aminopeptidase S (Leu, Val, Phe, Tyr preference) (EC 3.4.11.-)
Nicotinate-nucleotide adenylyltransferase (EC 2.7.7.18)
Leucyl/phenylalanyl-tRNA--protein transferase (EC 2.3.2.6)
DNA repair exonuclease family protein YhaO
Cysteinyl-tRNA synthetase (EC 6.1.1.16)
ATP-dependent DNA helicase UvrD/PcrA, proteobacterial paralog
Outer membrane lipoprotein carrier protein LolA
DNA-binding response regulator KdpE
Osmosensitive K+ channel histidine kinase KdpD (EC 2.7.3.-)
Potassium-transporting ATPase A chain (EC 3.6.3.12) (TC 3.A.3.7.1)
Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1)
Beta-hexosaminidase (EC 3.2.1.52)
Potassium-transporting ATPase C chain (EC 3.6.3.12) (TC 3.A.3.7.1)
Protein-export membrane protein secD (TC 3.A.5.1.1)
H+/Cl- exchange transporter ClcA
Transposases in G. diazotrophicus PAl 5 genome sequences
Total transposase genes
Transposase (class II)
Transposase (class III)
Transposase (class IV)
Transposase IS3 family protein
Transposase IS3/IS911 family protein
Transposase IS4 family protein
Transposase IS5 family protein
Isrso16-transposase OrfA protein
Transposase and inactivated derivative
Transposase mutator type
Probable insertion sequence transposase protein
The construction of two different genome sequences from the same bacterium, G. diazotrophicus PAl 5 (ATCC 49037), demonstrated the need to confirm the sequence and assembly of these genomes through an independent method. In the current study, optical restriction mapping was used to distinguish between the discordant genomic assemblies, since this technique maintains the order of restriction fragments in the mapping process.
Comparison of the two PAl 5 genome sequences to an optical map indicated that the sequence reported by JGI is a more accurate representation of the PAl 5 strain (ATCC 49037), while the sequence reported by RioGene contained numerous rearrangements. The size and number of chromosomal rearrangements identified in the RioGene sequence of G. diazotrophicus PAl 5 were high, with nearly the entire sequence composed of regions that were inverted, translocated, or not aligned to the PAl 5 optical map. In contrast, only a few small inversions were detected in the JGI PAl 5 sequence. In addition, annotation of the two genome sequences found that approximately 5% of the CDS in each genome sequence were unique. This is a surprisingly high proportion considering that the two genomes are reported to be from the same strain, and it is much greater than would be expected from the observed sequence rearrangements.
There are a few possible explanations for the differences between these two PAl 5 genome sequences. One is natural divergence due to rapid evolution that could occur during culturing. This explanation was also suggested by the RioGene sequencing group. However, the extremely high level of differences between the two sequences indicates that other factors may also have contributed. For example, in the case of E. coli, comparison of the sequenced K-12 strain to the optically mapped H10407 strain revealed no major structural differences. In the case of M. avium subspecies paratuberculosis, only one inversion between the sequenced strain, K-10, and the optically mapped strain, ATCC 19698, was detected, and that inversion was subsequently determined to be an assembly error rather than a true chromosomal rearrangement.
Another explanation for the differences between the RioGene and JGI sequences is the different approaches taken by the two groups in genome assembly. To test this possibility, the raw reads from both projects are required. The 46,603 sequence traces from the JGI sequencing effort of this strain are publicly available, while the traces from the RioGene project are not. The quality scores of the bases are not available from either project. While many studies have reported using optical maps to aid in genome assembly and to identify assembly errors prior to completion, fewer have reported using this technique to identify errors in previously completed genomes. After successfully using optical mapping to aid in assembling the genome of Xenorhabdus nematophila, Latreille et al. used the same technique on a previously sequenced relative, X. bovienii, identifying a large inversion in a genome assembly that had previously been considered finished. Optical mapping has also been used to verify assemblies between strains of the same species. In the case of Mycobacterium avium subspecies paratuberculosis, an optical map of the ATCC type strain was used to reveal the presence of an inversion in the genome of the sequenced strain, which was determined to be due to an assembly error rather than genomic variation between strains. These two instances illustrate how even complete, published genome sequences may contain significant assembly errors, indicating that caution should be taken when looking at assemblies where optical mapping was not utilized.
If the breakpoints of assembly errors occur within a coding region, such errors could alter the annotation of the genome. For example, when the inversion in the sequence of M. avium subspecies paratuberculosis K-10 was corrected, two new genes were identified. Therefore, the annotations of the PAl 5 sequences from both RioGene and JGI were determined and compared using the RAST on-line annotation pipeline. Six percent of the CDS from each genome shared less than 50% identity when compared against each other, and approximately 5% shared zero percent identity. Again, this number of differences at the sequence and gene level was surprising considering that the genomes are reported to be from the same strain, even given chromosomal rearrangements. Annotation of both genomes also revealed that the RioGene sequence possessed almost twice as many transposases as the JGI sequence. The strikingly high number of transposases in the RioGene sequence relative to the JGI sequence suggests that some of the sequence rearrangements seen may be the result of transposition. Alternatively, since 16 of the transposases originated from IS sequences, which are flanked by inverted repeats, these repeated regions may have caused errors in assembly.
The observations made here confirm the utility of optical mapping in determining proper assembly of genomic sequences and identifying potential chromosomal rearrangements. They also highlight the need to provide raw reads and quality scores when submitting genomes to allow for independent confirmation of assembly. As technology advances, data from instances where contradictory sequences are observed could be reanalyzed in order to clarify results.
The rearrangements in the genome sequence of G. diazotrophicus PAl 5 may not have been identified had JGI not released a conflicting genome sequence of the same strain that prompted further investigation. As the genome of a single bacterial strain is not usually sequenced separately by different groups, the possibility remains that other previously released genomes could contain similar differences compared to other bacterial isolates under the same ATCC strain designation. Such rearrangements in genome sequences of the same strain could confound future work using comparative genomics to look for variations between closely related organisms. In such cases, the best tool to distinguish actual variations between organisms will be optical mapping. Consequently, the submission of raw sequencing reads with quality scores is proposed as a new genome sequencing standard when submitting completed genomes to GenBank or other repositories.
This research was supported by grants from the Florida Agricultural Experiment Station, the National Science Foundation (MCB-0454030), and the United States Department of Agriculture (2005-35319-16300). RAST is supported in part by the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health, Department of Health and Human Services, under contract HHSN266200400042C.
- Gillis M, Kersters K, Hoste B, Janssens D, Kroppenstedt RM, Stephan MP, Teixeira KRS, Dobereiner J, De Ley J. Acetobacter diazotrophicus sp. nov., a nitrogen-fixing acetic-acid bacterium associated with sugarcane. Int J Syst Bacteriol 1989; 39:361–364. doi:10.1099/00207713-39-3-361
- Lee S, Flores-Encarnacion M, Contreras-Zentella M, Garcia-Flores L, Escamilla JE, et al. Indole-3-acetic acid biosynthesis is deficient in Gluconacetobacter diazotrophicus strains with mutations in cytochrome c biogenesis genes. J Bacteriol 2004; 186:5384–5391. doi:10.1128/JB.186.16.5384-5391.2004
- Sevilla M, Burris RH, Gunapala N, Kennedy C. Comparison of benefit to sugarcane plant growth and 15N2 incorporation following inoculation of sterile plants with Acetobacter diazotrophicus wild type and Nif-mutant strains. Mol Plant Microbe Interact 2001; 14:358–366. doi:10.1094/MPMI.2001.14.3.358
- Boddey RM. Biological nitrogen-fixation in sugarcane: a key to energetically viable biofuel production. Crit Rev Plant Sci 1995; 14:263–279. doi:10.1080/713608118
- Hoefsloot G, Termorshuizen AJ, Watt DA, Cramer MD. Biological nitrogen fixation is not a major contributor to the nitrogen demand of a commercially grown South African sugarcane cultivar. Plant Soil 2005; 277:85–96. doi:10.1007/s11104-005-2581-0
- Boddey RM, de Oliveira OC, Urquiaga S, Reis VM, de Olivares FL, et al. Biological nitrogen-fixation associated with sugarcane and rice: contributions and prospects for improvement. Plant Soil 1995; 174:195–209. doi:10.1007/BF00032247
- Fuentes-Ramirez LE, Jimenez-Salgado T, Abarca-Ocampo IR, Caballero-Mellado J. Acetobacter diazotrophicus, an indoleacetic-acid producing bacterium isolated from sugarcane cultivars of Mexico. Plant Soil 1993; 154:145–150. doi:10.1007/BF00012519
- Paula MA, Reis VM, Döbereiner J. Interactions of Glomus clarum with Acetobacter diazotrophicus in infection of sweet-potato (Ipomoea batatas), sugarcane (Saccharum spp.), and sweet sorghum (Sorghum vulgare). Biol Fertil Soils 1991; 11:111–115. doi:10.1007/BF00336374
- Jimenez-Salgado T, Fuentes-Ramirez LE, Tapia-Hernandez A, Mascarua-Esparza MA, Martinez-Romero E, et al. Coffea arabica L., a new host plant for Acetobacter diazotrophicus, and isolation of other nitrogen-fixing acetobacteria. Appl Environ Microbiol 1997; 63:3676–3683.
- Tapia-Hernández A, Bustillos-Cristales MR, Jimenez-Salgado T, Caballero-Mellado J, Fuentes-Ramirez LE. Natural endophytic occurrence of Acetobacter diazotrophicus in pineapple plants. Microb Ecol 2000; 39:49–55. doi:10.1007/s002489900190
- Bertalan M, Albano R, Padua VD, Rouws L, Rojas C, et al. Complete genome sequence of the sugarcane nitrogen-fixing endophyte Gluconacetobacter diazotrophicus Pal5. BMC Genomics 2009; 10:450. doi:10.1186/1471-2164-10-450
- Aston C, Mishra B, Schwartz DC. Optical mapping and its potential for large-scale sequencing projects. Trends Biotechnol 1999; 17:297–302. doi:10.1016/S0167-7799(99)01326-8
- Zhou S, Kile A, Bechner M, Place M, Kvikstad E, et al. Single-molecule approach to bacterial genomic comparisons via optical mapping. J Bacteriol 2004; 186:7773–7782. doi:10.1128/JB.186.22.7773-7782.2004
- Latreille P, Norton S, Goldman BS, Henkhaus J, Miller N, et al. Optical mapping as a routine tool for bacterial genome sequence finishing. BMC Genomics 2007; 8:321. doi:10.1186/1471-2164-8-321
- Lim A, Dimalanta ET, Potamousis KD, Yen G, Apodoca J, et al. Shotgun optical maps of the whole Escherichia coli O157:H7 genome. Genome Res 2001; 11:1584–1593. doi:10.1101/gr.172101
- Reslewic S, Zhou SG, Place M, Zhang YP, Briska A, et al. Whole-genome shotgun optical mapping of Rhodospirillum rubrum. Appl Environ Microbiol 2005; 71:5511–5522. doi:10.1128/AEM.71.9.5511-5522.2005
- Wu CW, Schramm TM, Zhou SG, Schwartz DC, Talaat AM. Optical mapping of the Mycobacterium avium subspecies paratuberculosis genome. BMC Genomics 2009; 10:25. doi:10.1186/1471-2164-10-25
- Zhou S, Bechner MC, Place M, Churas CP, Pape L, et al. Validation of rice genome sequence by optical mapping. BMC Genomics 2007; 8:278. doi:10.1186/1471-2164-8-278
- Zhou S, Deng W, Anantharaman TS, Lim A, Dimalanta ET, et al. A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl Environ Microbiol 2002; 68:6321–6331. doi:10.1128/AEM.68.12.6321-6331.2002
- Zhou SG, Kile A, Kvikstad E, Bechner M, Severin J, et al. Shotgun optical mapping of the entire Leishmania major Friedlin genome. Mol Biochem Parasitol 2004; 138:97–106. doi:10.1016/j.molbiopara.2004.08.002
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, et al. The RAST Server: Rapid annotations using subsystems technology. BMC Genomics 2008; 9:75. doi:10.1186/1471-2164-9-75
- Meng X, Benson K, Chada K, Huff EJ, Schwartz DC. Optical mapping of bacteriophage-lambda clones using restriction endonucleases. Nat Genet 1995; 9:432–438. doi:10.1038/ng0495-432
- Chen Q, Savarino SJ, Venkatesan MM. Subtractive hybridization and optical mapping of the enterotoxigenic Escherichia coli H10407 chromosome: isolation of unique sequences and demonstration of significant similarity to the chromosome of E. coli K-12. Microbiology 2006; 152:1041–1054. doi:10.1099/mic.0.28648-0
- Mahillon J, Chandler M. Insertion sequences. Microbiol Mol Biol Rev 1998; 62:725–774.
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, et al. The subsystem approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005; 33:5691–5702. doi:10.1093/nar/gki866