Genome sequences of two closely related strains of Escherichia coli K-12 GM4792

Zhang, Yan-Cong; Zhang, Yan; Zhu, Bi-Ru; Zhang, Bo-Wen; Ni, Chuan; Zhang, Da-Yong; Huang, Ying; Pang, Erli; Lin, Kui

doi:10.1186/s40793-015-0114-x

Short genome report
Open access
Published: 10 December 2015

Yan-Cong Zhang¹,
Yan Zhang^1,3,
Bi-Ru Zhu¹,
Bo-Wen Zhang¹,
Chuan Ni^1,4,
Da-Yong Zhang¹,
Ying Huang²,
Erli Pang¹ &
…
Kui Lin ORCID: orcid.org/0000-0002-5993-1972¹

Standards in Genomic Sciences volume 10, Article number: 125 (2015)


Abstract

Escherichia coli laboratory strains K-12 GM4792 Lac⁺ and GM4792 Lac⁻ carry opposite lactose markers, which make evolved lines easy to distinguish because the two strains produce colonies of different colors. These two closely related strains were chosen as ancestors for our ongoing studies of experimental evolution. Here, we describe the genome sequences, annotation, and features of GM4792 Lac⁺ and GM4792 Lac⁻. GM4792 Lac⁺ has a 4,622,342-bp chromosome with 4,061 protein-coding genes and 83 RNA genes. Similarly, the genome of GM4792 Lac⁻ consists of a 4,621,656-bp chromosome containing 4,043 protein-coding genes and 74 RNA genes. Genome comparison reveals that the differences between GM4792 Lac⁺ and GM4792 Lac⁻ are minimal and that the only functional difference lies in the targeted lac region. Moreover, a previous competition experiment indicates that, apart from lactose utilization, the two strains are identical or nearly identical in survivability in a nitrogen-limited environment. Therefore, at both the genetic and the phenotypic level, GM4792 Lac⁺ and GM4792 Lac⁻, with opposite neutral markers, are ideal systems for future experimental evolution studies.

Introduction

Microbial experimental evolution systems can generate a ‘fossil’ record for later study and use replicate populations to test the predictability of evolution. They therefore offer a chance to ‘replay’ the evolutionary process, ‘watch’ evolution in action [1] and measure the fitness of evolved lines under the relevant environmental conditions [2]. However, microbial lines often lack obvious phenotypic differences, which makes them difficult to tell apart by observation. Fortunately, some neutral genetic markers help distinguish evolved lines by differences in colony color [2]. Typically, when a derived strain with an opposite marker relative to its progenitor is required, one can be selected using specific culture media [3]. Subsequently, the degree of neutrality of this marker is evaluated by comparing the fitness of the two strains carrying opposite markers under the culture conditions used in the study [4]. The lactose marker is one such marker. For the lac operon, a previous study used targeted sequencing to characterize the mutations that distinguish strains with opposite lactose markers [5].

Since the publication of the K-12 genome in 1997 [6], Escherichia coli has been thoroughly studied with regard to its genetics [7–9], biochemistry [10–12], metabolic reconstruction [10], pathway inference [13], genomics [14–16] and metabolic engineering [17]. E. coli strain K-12 GM4792, a laboratory strain, contains the chromosomal lacI33::lacZ allele and is unable to utilize lactose [18]. GM4792 was derived from the parent strain P90C [ara-600 del(gpt-lac)5 LAM⁻ relA1 spoT1 thiE1] [19–21] by homogenotizing a Pro⁺ Lac⁺/F' lacI33::lacZ and then curing the episome with acridine orange [20] (M. G. Marinus, personal communication). A previous study [22] produced two closely related strains, GM4792 Lac⁻ and GM4792 Lac⁺, which carry opposite lactose markers and from which plasmids have been eliminated, for further studies on experimental evolution. Here, Lac⁺ refers to the ability of the strain to utilize lactose and Lac⁻ refers to the inability to utilize lactose. These strains were chosen as ancestors for our ongoing studies of the experimental evolution of E. coli in a nitrogen-limited environment.

In this study, we summarize the classification and features of E. coli GM4792 Lac⁺ and GM4792 Lac⁻, together with a description of the genome sequencing and annotation. This work provides a foundation for future variant analysis of evolved lines at the genomic scale. To compare GM4792 Lac⁺ and GM4792 Lac⁻, we used the breseq pipeline v0.20 [23] to detect initial variants and subsequently applied a series of filters to eliminate false positives. Using this method, two significant variants were detected: a synonymous single nucleotide polymorphism and a 1-bp deletion responsible for lactose metabolism. A previous competition experiment [22] has shown that, apart from lactose utilization, these two strains are identical or nearly identical in survivability in a nitrogen-limited environment. Thus, both genetically and phenotypically, GM4792 Lac⁺ and GM4792 Lac⁻ carry neutral markers and are appropriate for future experimental evolution studies.

Organism information

Classification and features

GM4792 is a strain of E. coli K-12. It is asexual (F⁻), carries the lacI33::lacZ allele and cannot metabolize lactose [18]. This laboratory strain was a generous gift from M. G. Marinus (University of Massachusetts Medical School); we obtained it on October 7, 2007. First, GM4792 was grown in Luria-Bertani (LB) liquid medium for 24 h with shaking at 150 rpm. The culture was then streaked on LB solid medium. Twenty-four hours later, a single colony was transferred to LB liquid medium and grown with shaking for 24 h. The culture was mixed 1:1 with glycerol saline and stored in a –40 °C freezer, yielding a monoclonal GM4792 Lac⁻ strain. Cells of the monoclonal GM4792 Lac⁻ strain were grown in LB liquid medium, collected by centrifugation, and washed with the culture solution. Then, approximately 10⁹ cells were plated on Davis minimal medium [4] containing lactose as the sole carbon source. After a 4-day incubation period, colonies able to utilize the lactose in the medium appeared. One colony was selected, amplified in LB liquid medium and stored at –40 °C. Thus, the GM4792 Lac⁺ strain, which can metabolize lactose, was obtained. The genome of each strain is a single circular chromosome, the plasmids having been knocked out, so genetic variants between the two strains could arise only from de novo mutations.

Like most strains of E. coli [7], the cells of GM4792 are rod-shaped (Fig. 1, Additional file 1: Figure S1), Gram-negative, motile with peritrichous flagella, non-pigmented, chemo-organotrophic and facultatively anaerobic. As GM4792 does not ferment sucrose or salicin, the strain belongs to E. coli “var. communis” [24]. As previously described, GM4792 can grow at temperatures between 10 °C and 45 °C, with an optimum growth temperature of 37 °C, and at pH 5.5–8.0 [25, 26]. Strain characteristics of E. coli K-12 GM4792 are shown in Table 1.

Table 1 Classification and general features of Escherichia coli strain K-12 GM4792 according to the MIGS recommendations [58]


Because E. coli is a model organism, the molecular structure and chemical composition of its cell wall have been thoroughly studied and are described in detail by Scheutz and Strockbine [26]. Similar to other strains of E. coli, GM4792 has a single peptidoglycan layer within the periplasm, consisting of D-glutamic acid, D-alanine, meso-diaminopimelic acid, N-acetylglucosamine and N-acetylmuramic acid linked to the tetrapeptide L-alanine. The cells stain Gram-negative and contain an outer membrane with a lipopolysaccharide layer comprising lipid A, the core region of phosphorylated nonrepeating oligosaccharides and the O-antigen polymer [7, 25, 26].

Genome sequencing information

Genome project history

The two closely related E. coli laboratory strains K-12 GM4792 Lac⁺ and GM4792 Lac⁻ were selected for genome sequencing for subsequent use in experimental evolution studies. The genomes were sequenced in 2012. The genome project is deposited in the Genomes OnLine Database [27] and the NCBI BioProject database. The finished genome sequences are deposited in GenBank under the accession numbers CP011342 and CP011343. A summary of the project information is shown in Table 2.

Table 2 Project information


Growth conditions and genomic DNA preparation

After the laboratory strain GM4792 was received from M. G. Marinus, a single clone was randomly selected as the Lac⁻ strain. A single Lac⁺ clone was obtained after the Lac⁻ strain had been incubated for 4 days under selection conditions for lactose metabolism. Strains stored at –40 °C were thawed at room temperature. Each strain was streaked on LB solid medium with an inoculation needle and incubated for 24 h at 37 °C. Distinct monoclonal colonies grew, and a single colony was selected, inoculated into 5 ml LB liquid medium and grown at 37 °C with shaking for 24 h. Total genomic DNA was extracted using the TIANamp Bacteria DNA Kit (Code: DP302, TIANGEN BIOTECH, Beijing, China), according to the manufacturer’s instructions. Additional RNase A (Code: RT405-12, TIANGEN BIOTECH, Beijing, China) was added, following the manufacturer’s instructions. The quality and quantity of the genomic DNA were evaluated using agarose gel electrophoresis and the λ-Hind III digest DNA Marker (Code: D3403A, TaKaRa, China). For each sample, approximately 3 μg DNA with a concentration of 100 ng/μl was obtained.

Genome sequencing and assembly

Whole-genome sequencing was performed on the Illumina HiSeq 2000 platform using paired-end and mate-pair libraries with average insert sizes of 180 bp, 380 bp, 2 kbp and 6 kbp. The read length for each library was 100 bp. Duplicate read pairs were removed from each library with FastUniq v1.1 [28], and reads contaminated with Illumina adapter sequences were removed with cutadapt [29]. Subsequently, reads amounting to ~370×/~330×, ~100×, ~100× and ~100× coverage from the four libraries, respectively, were used for the assembly. ALLPATHS-LG Release 42411 [30], which begins by correcting sequencing errors, was used to assemble the genomes. GapCloser version 1.12 [31] was then run on the resulting scaffolds to close gaps, after which iCORN [32] was used to correct the assembly. Finally, the six remaining gaps were completely closed by additional PCR experiments. More details are given in Additional file 2.
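To make the coverage figures above concrete, the following minimal sketch applies the standard relation coverage = (number of reads × read length) / genome length. The genome length and the 100-bp read length are taken from this article; the function names are hypothetical, and the article does not state how the reads for each library were actually subsampled.

```python
import math

GENOME_BP = 4_622_342  # GM4792 Lac+ chromosome length reported in this article
READ_LEN_BP = 100      # read length reported for every library


def coverage(n_reads: int, read_len_bp: int, genome_bp: int) -> float:
    """Average depth of coverage: C = N * L / G."""
    return n_reads * read_len_bp / genome_bp


def reads_for_coverage(target_cov: float, read_len_bp: int, genome_bp: int) -> int:
    """Smallest number of single reads that reaches the target coverage."""
    return math.ceil(target_cov * genome_bp / read_len_bp)


# Hypothetical usage: how many 100-bp reads give roughly 100x on this genome?
n_reads = reads_for_coverage(100, READ_LEN_BP, GENOME_BP)
print(f"reads needed for ~100x: {n_reads:,} (about {n_reads // 2:,} read pairs)")
print(f"resulting coverage: {coverage(n_reads, READ_LEN_BP, GENOME_BP):.1f}x")
```

Because the read length equals the coverage target here, the number of reads needed for ~100× is the same as the genome length in base pairs.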

Genome annotation

As the GM4792 strains are very close to strain MG1655, their annotations were first transferred from MG1655 using RATT [33]. De novo annotation was then performed both on regions whose annotations were transferred imperfectly and on insertions relative to strain MG1655. tRNA and rRNA genes were identified using tRNAscan-SE v1.3.1 [34] and RNAmmer v1.2 [35], respectively. Coding sequences (CDSs) were identified using Prodigal v2.5 [36]. CDSs were translated and searched against the NCBI nonredundant database, UniProt (released 2012-10) [37], InterPro v40 [38], TIGRFAMs [39], Pfam [40], and COG [41] databases for functional annotation. Genes with transmembrane helices and signal peptides were predicted with TMHMM v2.0 [42] and SignalP v4.0 [43], respectively. Clustered regularly interspaced short palindromic repeats (CRISPR) were identified with CRT v1.2 [44]. Transcription factors were identified based on the results of domain identification and the DBD database v2.0 [45]. Gene ontology terms were assigned using the GO database (released 2013-3-30) [46] and the Blast2GO Pipeline v2.5.0 [47]. Metabolic pathways were reconstructed based on the KEGG database (Release 76.0) [48] and KAAS [49]. The complete sets of input parameters used for each program are shown in Table S7 of Additional file 1.

Genome properties

The GM4792 Lac⁺ genome consists of a 4,622,342-bp chromosome with a G + C content of 50.81 %. The GM4792 Lac⁻ genome consists of one circular chromosome of 4,621,656 bp with a G + C content of 50.80 %. In total, 4,144 genes were predicted for GM4792 Lac⁺, including 4,061 protein-coding genes and 83 RNA genes (tRNA and rRNA). Similarly, GM4792 Lac⁻ contains 4,117 genes (4,043 protein-coding genes and 74 RNA genes). The majority of protein-coding genes in both GM4792 Lac⁺ and GM4792 Lac⁻ were assigned a putative function (94.64 % and 94.73 %, respectively), and the remaining genes were annotated as hypothetical proteins. The properties and statistics of the two GM4792 strains are summarized in Tables 3 and 4, and the circular maps of the chromosomes are shown in Fig. 2 and in Figure S3 of Additional file 1.

As GM4792 is a K-12 strain, all fully assembled K-12 strains were used for phylogenetic analysis. All completely assembled and well-annotated K-12 strains were downloaded on 10 October 2015. To better characterize the phylogenetic relationships among the K-12 strains, Escherichia albertii KF1 was included as the outgroup. In total, 45 genomes, including 44 Escherichia coli K-12 strains, were analyzed (Additional file 1: Table S6). In the phylogeny based on whole-genome sequences, the two GM4792 strains cluster together and are next to E. coli RV308 with a high support value (Fig. 3); a similar pattern is supported by a concatenation of single-copy protein sequences (Additional file 1: Figure S2).
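As an illustration of how summary statistics such as the G + C content and gene totals above can be derived, here is a minimal sketch. It is not the authors' pipeline: the helper names and the toy FASTA record are hypothetical, and ambiguous bases are simply ignored, which is an assumption rather than something stated in the article.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            current = (line[1:].split() or ["unnamed"])[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Toy record for illustration only; it is not part of either GM4792 genome.
toy = ">toy_record hypothetical\nGGCCAATT\nGCGC\n"
for name, seq in parse_fasta(toy).items():
    print(f"{name}: {len(seq)} bp, G + C = {gc_content(seq):.2f} %")

# Gene totals reported in this article: protein-coding genes plus RNA genes.
reported = {"Lac+": (4061, 83), "Lac-": (4043, 74)}
for strain, (cds, rna) in reported.items():
    print(f"GM4792 {strain}: {cds} + {rna} = {cds + rna} genes")
```

Running the same `gc_content` function on the deposited chromosome sequences would be one way to check the reported percentages, provided the sequences are downloaded separately.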

Table 3 Genome statistics


Table 4 Number of genes associated with general COG functional categories


Insights from the genome sequence

The paired-end reads of Lac⁺ from the library with a 380-bp insert size and the scaffolds of Lac⁻ were analyzed using the breseq pipeline v0.20 [23] to identify mutations based on read alignments. Six types of variants could be identified: single-base substitutions, multiple-base substitutions, insertions, deletions, mobile element insertions, and sequence amplifications. All mutations with another variant within the adjacent 20 base pairs were removed. Then, mutations that persisted when the reads of Lac⁻ were mapped to the genome of Lac⁻ were removed. All retained mutations were manually reviewed using the graphical output of the mapping results. After filtering, only two significant variants remained: one 1-bp deletion in lacI and one synonymous SNP outside of the lac operon (Additional file 1: Table S1).

We performed a multiple sequence alignment of the three DNA segments containing lacI and the lac operon from the MG1655, Lac⁻ and Lac⁺ strains using CLUSTALW [50]. Compared to MG1655, both the Lac⁻ and Lac⁺ genomes carry a 212-bp deletion that removes the last 16 bp of lacI, the entire lac promoter and operator, and the first 74 bp of lacZ. In the Lac⁻ strain, an insertion of a C at bp 961 generates a stop codon at bp 1281. Because the promoter and operator are missing, the lac operon cannot be transcribed from its own promoter, and the Lac⁻ strain therefore cannot utilize lactose. In Lac⁺, the reverse occurred: a 1-bp deletion in this region. This frameshift deletion in lacI removes the stop codon, so lacI is fused to the lac operon and the fusion is transcribed from the lacI promoter (Additional file 1: Figure S4). Thus, GM4792 Lac⁺ can catabolize lactose. This transition is in agreement with previous studies [5, 18, 51].

In addition, the GM4792 strains were compared to MG1655 on the whole-genome scale with Mauve version snapshot_2015-02-25 [52]. For GM4792 Lac⁺, 450 SNPs and 112 indels were identified relative to MG1655. For GM4792 Lac⁻, a total of 441 SNPs and 109 indels were identified relative to MG1655. More details are shown in Additional file 1: Tables S2–S5.
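The two automatic filters described in this section (discarding calls that have another variant within 20 bp, then discarding calls that also appear when Lac⁻ reads are mapped to the Lac⁻ genome) can be sketched as follows. This is only an illustration under stated assumptions: it does not parse breseq output, the `Variant` type and function names are hypothetical, and all positions are invented toy data.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int      # 1-based coordinate on the reference (toy values below)
    description: str   # e.g. "A>G" or "del 1 bp"


def drop_clustered(variants: list[Variant], window: int = 20) -> list[Variant]:
    """Remove every variant that has another variant within `window` bp."""
    ordered = sorted(variants)
    kept = []
    for i, v in enumerate(ordered):
        near_prev = i > 0 and v.position - ordered[i - 1].position <= window
        near_next = (i + 1 < len(ordered)
                     and ordered[i + 1].position - v.position <= window)
        if not (near_prev or near_next):
            kept.append(v)
    return kept


def drop_control_calls(variants: list[Variant],
                       control: list[Variant]) -> list[Variant]:
    """Remove variants that are also called in the self-mapping control."""
    control_set = set(control)
    return [v for v in variants if v not in control_set]


# Hypothetical calls from mapping Lac+ reads to the Lac- reference.
calls = [
    Variant(1000, "A>G"), Variant(1010, "C>T"),   # clustered pair, both dropped
    Variant(5000, "del 1 bp"),
    Variant(9000, "G>A"),                          # also present in the control
    Variant(20000, "T>C"),
]
# Hypothetical calls from mapping Lac- reads to the Lac- reference.
control_calls = [Variant(9000, "G>A")]

filtered = drop_control_calls(drop_clustered(calls), control_calls)
for v in filtered:
    print(v.position, v.description)
```

Applying the clustering filter first mirrors the order given in the text; the manual review of the remaining calls is not modeled here.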

Phenotypic analysis revealed that the lactose marker was neutral under the conditions used in our studies of the experimental evolution of E. coli in a nitrogen-limited environment; the fitness ratio between GM4792 Lac⁻ and GM4792 Lac⁺ was 1.00 (95 % confidence interval: 0.994–1.036) [22]. Therefore, at both the genotypic and phenotypic levels, these two strains differ only in their ability to utilize lactose, indicating that GM4792 Lac⁺ and GM4792 Lac⁻ are a good system for studies of population evolution and adaptation.
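The article does not spell out how the fitness ratio was computed. One widely used convention in competition experiments, following Lenski et al. [4], defines relative fitness as the ratio of the two competitors' realized Malthusian parameters. The sketch below assumes that convention; the function names and cell counts are hypothetical and are not data from reference [22].

```python
import math


def malthusian_parameter(n_initial: float, n_final: float) -> float:
    """Realized growth over the competition: m = ln(N_final / N_initial)."""
    if n_initial <= 0 or n_final <= 0:
        raise ValueError("cell counts must be positive")
    return math.log(n_final / n_initial)


def relative_fitness(a_initial: float, a_final: float,
                     b_initial: float, b_final: float) -> float:
    """Fitness of competitor A relative to B: W = m_A / m_B."""
    m_b = malthusian_parameter(b_initial, b_final)
    if m_b == 0:
        raise ValueError("competitor B did not grow; ratio is undefined")
    return malthusian_parameter(a_initial, a_final) / m_b


# Hypothetical plate counts (cells per ml) before and after co-culture.
w = relative_fitness(a_initial=1e5, a_final=1e7,   # e.g. the Lac- competitor
                     b_initial=2e5, b_final=2e7)   # e.g. the Lac+ competitor
print(f"relative fitness W = {w:.3f}")
```

Under this definition, a value of 1 means that both competitors grew by the same factor, which is what a neutral marker is expected to produce.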

Conclusions

This study presents the genomes of two closely related E. coli laboratory strains, K-12 GM4792 Lac⁺ and GM4792 Lac⁻, which lay a solid foundation for future genome-scale variant analysis of evolved lines in evolutionary experiments. A whole-genome comparison of GM4792 Lac⁺ and GM4792 Lac⁻ reveals that the genome-wide differences between them are minimal and that the only functional difference lies in the locus related to lactose utilization. Only two significant variants were detected. One is a synonymous SNP, and the other is a 1-bp deletion that is responsible for lactose utilization in GM4792 Lac⁺. Moreover, phenotypic analysis showed that, apart from lactose utilization, GM4792 Lac⁺ and GM4792 Lac⁻ are nearly identical in survivability in a nitrogen-limited environment. All of these results indicate that GM4792 Lac⁺ and GM4792 Lac⁻, with neutral markers, are ideal systems for future experimental evolution studies.

References

Barrick JE, Lenski RE. Genome dynamics during experimental evolution. Nat Rev Genet. 2013;14(12):827–39.
Elena SF, Lenski RE. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat Rev Genet. 2003;4(6):457–69.
Barrick JE, Kauth MR, Strelioff CC, Lenski RE. Escherichia coli rpoB mutants have increased evolvability in proportion to their fitness defects. Mol Biol Evol. 2010;27(6):1338–47.
Lenski R, Rose M, Simpson S, Tadler S. Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. American naturalist. 1991;138(6):1315–41.
Foster PL, Trimarchi JM. Adaptive reversion of a frameshift mutation in Escherichia coli by simple base deletions in homopolymeric runs. Science. 1994;265(5170):407–9.
Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277(5331):1453.
Meier-Kolthoff JP, Hahnke RL, Petersen J, Scheuner C, Michael V, Fiebig A, et al. Complete genome sequence of DSM 30083ᵀ, the type strain (U5/41ᵀ) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genomic Sci. 2014;9(1):2.
Allocati N, Masulli M, Alexeyev MF, Di Ilio C. Escherichia coli in Europe: An Overview. Int J Environ Res Public Health. 2013;10(12):6235–54.
Kaper JB, Nataro JP, Mobley HLT. Pathogenic Escherichia coli. Nat Rev Microbiol. 2004;2(2):123–40.
Tee TW, Chowdhury A, Maranas CD, Shanks JV. Systems metabolic engineering design: Fatty acid production as an emerging case study. Biotechnol Bioeng. 2014;111(5):849–57.
Wen M, Bond-Watts BB, Chang MCY. Production of advanced biofuels in engineered E. coli. Curr Opin Chem Biol. 2013;17(3):472–9.
Donovan C, Bramkamp M. Cell division in Corynebacterineae. Frontiers in Microbiology. 2014;5.
Rosano GL, Ceccarelli EA. Recombinant protein expression in Escherichia coli: advances and challenges. Frontiers in Microbiology. 2014;5.
Kuzminov A. The chromosome cycle of prokaryotes. Mol Microbiol. 2013;90(2):214–27.
Whitfield C, Roberts IS. Structure, assembly and regulation of expression of capsules in Escherichia coli. Mol Microbiol. 1999;31(5):1307–19.
Cooper KK, Mandrell RE, Louie JW, Korlach J, Clark TA, Parker CT, et al. Comparative genomics of enterohemorrhagic Escherichia coli O145:H28 demonstrates a common evolutionary lineage with Escherichia coli O157:H7. BMC Genomics. 2014;15.
Kang Z, Zhang C, Zhang J, Jin P, Zhang J, Du G, et al. Small RNA regulators in bacteria: powerful tools for metabolic engineering and synthetic biology. Appl Microbiol Biotechnol. 2014;98(8):3413–24.
Foster PL, Trimarchi JM. Adaptive reversion of an episomal frameshift mutation in Escherichia coli requires conjugal functions but not actual conjugation. Proc Natl Acad Sci U S A. 1995;92(12):5487–90.
Coulondre C, Miller JH. Genetic studies of the lac repressor: III. Additional correlation of mutational sites with specific amino acid residues. J Mol Biol. 1977;117(3):525–67.
Miller JH. Experiments in molecular genetics. Cold Spring Harbor Laboratory: Cold Spring Harbor; 1972.
Miller JH. A short course in bacterial genetics. Cold Spring Harbor: Cold Spring Harbor Laboratory; 1992.
Ni C. The experimental evolution of Escherichia coli in nitrogen limited environment, PhD thesis. Beijing: Normal University, College of Life Sciences; 2010.
Deatherage DE, Barrick JE. Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol Biol. 2014;1151:165–88.
Topley WWC, Wilson GS. The Principles of Bacteriology and Immunity. 2nd ed. 1936.
Welch RA. The genus Escherichia. The Prokaryotes. New York: Springer; 2006. p. 60–71.
Scheutz F, Strockbine N. Genus I. Escherichia Castellani and Chalmers 1919, 941T^AL. In: Brenner DJ KN, Staley JT, editors. Bergey’s Manual of Systematic Bacteriology, vol. 2. 2nd ed. New York: Springer; 2005. p. 607–24. The Proteobacteria.
Pagani I, Liolios K, Jansson J, Chen IMA, Smirnova T, Nosrat B, et al. The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2012;40(D1):D571–9.
Xu H, Luo X, Qian J, Pang X, Song J, Qian G, et al. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One. 2012;7(12):e52249.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet journal. 2011;17(1):10–2.
Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108(4):1513–8.
Luo RB, Liu BH, Xie YL, Li ZY, Huang WH, Yuan JY, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1:6.
Otto TD, Sanders M, Berriman M, Newbold C. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010;26(14):1704–7.
Otto TD, Dillon GP, Degrave WS, Berriman M. RATT: rapid annotation transfer tool. Nucleic Acids Res. 2011;39(9):7.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.
Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35(9):3100–8.
Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004;32:D115–9.
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2011;40(D1):D306–12.
Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic Acids Res. 2003;31(1):371–3.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42(D1):D222–30.
Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33–6.
Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J Mol Biol. 2001;305(3):567–80.
Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004;340(4):783–95.
Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, et al. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007;8(1):209.
Kummerfeld SK. DBD: a transcription factor prediction database. Nucleic Acids Res. 2006;34:D74–81.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999;27(1):29–34.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35(Web Server):W182–5.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–80.
Müller-Hill B, Kania J. Lac repressor can be fused to β-galactosidase. Nature. 1974;249(5457):561–3.
Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394–403.
Zhang Y, Lin K. A phylogenomic analysis of Escherichia coli / Shigella group: implications of genomic features associated with pathogenicity and ecological adaptation. BMC Evol Biol. 2012;12:174.
Hazen TH, Sahl JW, Fraser CM, Donnenberg MS, Scheutz F, Rasko DA. Refining the pathovar paradigm via phylogenomics of the attaching and effacing Escherichia coli. Proc Natl Acad Sci U S A. 2013;110(31):12810–5.
Minkin I, Patel A, Kolmogorov M, Vyahhi N, Pham S. Sibelia: a scalable and comprehensive synteny block generation tool for closely related microbial genomes. Proceedings of Algorithms in Bioinformatics. Berlin: Springer; 2013. p. 215-29.
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17(6):368–76.
Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490.
Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 2008;26(5):541–7.
Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci. 1990;87(12):4576–9.
Garrity GM BJ, Lilburn T. Phylum XIV. Proteobacteria phyl nov. In: Brenner DJ KN, Stanley JT, Garrity GM, editors. Bergey’s Manual of Systematic Bacteriology, vol. 2. 2nd ed. New York: Springer; 2005. p. 1. The Proteobacteria part B The Gammaproteobacteria.
Garrity GMBD, Lilburn T. Class III. Gammaproteobacteria class. nov. In: Garrity GM BD, Krieg NR, Staley JT, editors. Bergey’s Manual of Systematic Bacteriology, vol. 2. 2nd ed. New York: Springer; 2005. p. 1. Part B.
Garrity GM, Holt JG. Taxonomic outline of the Archaea and Bacteria. Bergey’s Manual of Systematic Bacteriology. 2001;1:155–66.
Brenner DJ. Family I. Enterobacteriaceae Rahn 1937, Nom. fam. cons. Opin. 15, Jud. Com. 1958, 73; Ewing, Farmer, and Brenner 1980, 674; Judicial Commission 1981, 104. In: Krieg NRHJ, editor. Bergey’s Manual of Systematic Bacteriology, vol. 1. 1st ed. Baltimore: The Williams & Wilkins Co; 1984. p. 408–20.
Escherich T. Die Darmbakterien des Säuglings und ihre Beziehungen zur Physiologie der Verdauung. Stuttgart: Ferdinand Enke; 1886: p. 63–74.
Editorial Board (for the Judicial Commission of the International Committee on Bacteriological Nomenclature). Opinion 26: designation of neotype strains (cultures) of type species of the bacterial genera Salmonella, Shigella, Arizona, Escherichia, Citrobacter and Proteus of the family Enterobacteriaceae. Int J Syst Evol Microbiol. 1963;13:35–6.
List of growth media used at the DSMZ. [http://www.dsmz.de].

Acknowledgements

We thank two anonymous reviewers for their invaluable comments and suggestions. The authors gratefully acknowledge the generous help of M. G. Marinus in providing us with GM4792. We also thank Hong-Tao Song for useful comments on the manuscript. This work was supported by the National Natural Science Foundation of China (Grant No. 31421063) and the State Key Laboratory of Earth Surface Processes and Resource Ecology (Grant No. 2013-ZY-10).

Author information

Authors and Affiliations

State Key Laboratory of Earth Surface Processes and Resource Ecology and MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, 19 Xinjiekouwai Street, Beijing, 100875, China
Yan-Cong Zhang, Yan Zhang, Bi-Ru Zhu, Bo-Wen Zhang, Chuan Ni, Da-Yong Zhang, Erli Pang & Kui Lin
State Key Laboratory for Infectious Disease Prevention and Control, and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
Ying Huang
Present address: National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
Yan Zhang
Present address: The second high school attached to Beijing Normal University, Beijing, 100192, China
Chuan Ni

Corresponding author

Correspondence to Kui Lin.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KL and D-YZ designed and coordinated the study. Y-CZ, YZ and EP performed the bioinformatics analyses and wrote the manuscript, and KL assisted in writing the manuscript. B-WZ, B-RZ and CN performed the experiment. YH performed the electron micrograph scanning. All authors commented on the manuscript prior to submission. All authors read and approved the final manuscript.

Additional files

Additional file 1:

Supplementary Tables and Figures. Table S1. Genomic differences between E. coli GM4792 Lac⁺ and Lac⁻ detected via reads mapping with breseq pipeline. Table S2. Structural variations (insertions, deletions) of GM4792 Lac⁺ compared to MG1655 obtained with Mauve. Table S3. Structural variations (insertions, deletions) of GM4792 Lac⁻ compared to MG1655 obtained with Mauve. Table S4. Nonsynonymous changes in protein sequence of GM4792 Lac⁺ compared to MG1655 obtained with Mauve. Table S5. Nonsynonymous changes in protein sequence of GM4792 Lac⁻ compared to MG1655 obtained with Mauve. Table S6. 45 complete genomes used in this study. Table S7. The complete set of input parameters used for programs. Figure S1. Scanning-electron micrograph of strain E. coli GM4792 Lac⁻. Figure S2. Phylogenetic tree inferred from the supermatrix of proteome sequences under the Maximum-likelihood (ML) criterion. Figure S3. Graphical circular map of the chromosome of Escherichia coli K-12 GM4792 Lac⁻. Figure S4. Mutations related to lactose utilization. (PDF 4042 kb)

Additional file 2:

PCR experiment description. Table S8. The primer sequences for two GM4792 strains. Figure S5. The primer design for the large gap (~2,900 bp) in GM4792 Lac⁻. (PDF 124 kb)

Additional file 3:

Title of data: A document containing missing key taxonomic references. (PDF 96 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhang, YC., Zhang, Y., Zhu, BR. et al. Genome sequences of two closely related strains of Escherichia coli K-12 GM4792. Stand in Genomic Sci 10, 125 (2015). https://doi.org/10.1186/s40793-015-0114-x

Received: 05 June 2015
Accepted: 09 November 2015
Published: 10 December 2015
DOI: https://doi.org/10.1186/s40793-015-0114-x
