Towards long-read metagenomics: complete assembly of three novel genomes from bacteria dependent on a diazotrophic cyanobacterium in a freshwater lake co-culture
© The Author(s). 2017
Received: 28 April 2016
Accepted: 9 December 2016
Published: 19 January 2017
Here we report three complete bacterial genome assemblies from a PacBio shotgun metagenome of a co-culture from Upper Klamath Lake, OR. Genome annotations and culture conditions indicate these bacteria are dependent on carbon and nitrogen fixation from the cyanobacterium Aphanizomenon flos-aquae, whose genome was assembled to draft-quality. Due to their taxonomic novelty relative to previously sequenced bacteria, we have temporarily designated these bacteria as incertae sedis Hyphomonadaceae strain UKL13-1 (3,501,508 bp and 56.12% GC), incertae sedis Betaproteobacterium strain UKL13-2 (3,387,087 bp and 54.98% GC), and incertae sedis Bacteroidetes strain UKL13-3 (3,236,529 bp and 37.33% GC). Each genome consists of a single circular chromosome with no identified plasmids. When compared with binned Illumina assemblies of the same three genomes, there was ~7% discrepancy in total genome length. Gaps where Illumina assemblies broke were often due to repetitive elements. Within these missing sequences were essential genes and genes associated with a variety of functional categories. Annotated gene content reveals that both Proteobacteria are aerobic anoxygenic phototrophs, with Betaproteobacterium UKL13-2 potentially capable of phototrophic oxidation of sulfur compounds. Both proteobacterial genomes contain transporters suggesting they are scavenging fixed nitrogen from A. flos-aquae in the form of ammonium. Bacteroidetes UKL13-3 has few completely annotated biosynthetic pathways, and has a comparatively higher proportion of unannotated genes. The genomes were detected in only a few other freshwater metagenomes, suggesting that these bacteria are not ubiquitous in freshwater systems. Our results indicate that long-read sequencing is a viable method for sequencing dominant members from low-diversity microbial communities, and should be considered for environmental metagenomics when conditions meet these requirements.
KeywordsAerobic anoxygenic phototroph Aphanizomenon flos-aquae Hyphomonadaceae Betaproteobacterium Bacteroidetes PacBio SMRT sequencing
Metagenomic sequencing is the process of sampling DNA sequences from multiple genomes in a community of organisms, and has been applied to many environmental samples to assess both functional diversity and species richness of microbial communities [1, 2]. Recently, there has been a progression in metagenomic approaches associated with advances in sequencing technologies. Next-generation sequencing methods  such as 454 and Illumina HiSeq/MiSeq have greatly reduced sequencing costs per base relative to Sanger sequencing due to increased throughput, which has facilitated shotgun metagenomics (randomly sequencing all DNA in a sample). This has provided several advantages over amplicon sequencing. For example, focus has shifted from assigning taxa using single genes to using multiple genes and sequence composition instead [2, 4]. It has also permitted functional characterization of individual representatives or whole microbial communities [5, 6].
However, there are technical hurdles associated with short-read sequencing. Specifically, assembling short reads (50-300 bp) into contiguous sequences (contigs) rarely leads to complete genome assemblies due to repetitive genomic elements such as 16S rRNA genes and IS elements  that are 1 kb or greater in length. There are two consequences as a result. First, closing draft genomes by primer walking requires considerable manual effort and time. Second, if closure is not possible, contigs must be clustered and binned using methods like differential coverage , co-abundance [5, 9, 10], or gene/nucleotide composition . While useful, these methods are often not comprehensive and become even more difficult to implement when used in a metagenomic context, where multiple genomes (sometimes from closely related organisms) must be delineated .
Single-molecule real time sequencing technologies, such as PacBio and Oxford Nanopore, are part of the third-generation sequencing wave . These sequencers produce average read lengths in the 5–50 kb range, with ~50% of reads longer than 14 kb , which exceeds the size of repetitive elements in the average bacterial genome. Although more error-prone, these longer reads have proven advantageous for assembling closed genomes if sequencing depth is high enough to allow error correction . To date, long-read sequencing has rarely been used for metagenomics for several reasons: 1) the amount of sequence data returned is a fraction of an Illumina run (up to 750 Gb/flow cell of Illumina HiSeq 3000 vs. up to 1 Gb/SMRT cell of PacBio Sequel based on company specifications), 2) the sequencing cost per base pair is higher, and 3) PacBio does not rely upon DNA amplification, so high concentrations of raw DNA are required. Due to these limitations, long-read metagenomics has so far been limited to whole-16S amplicon sequencing  and to improving binning from fragmented (short-read) assemblies .
Here, we have generated a PacBio shotgun metagenome from a non-axenic cyanobacterium culture established in summer 2013 originating from Upper Klamath Lake, OR. In this freshwater lake, the N2-fixing filamentous cyanobacterium Aphanizomenon flos-aquae blooms annually. These blooms are harvested and sold as nutritional supplements. Little is known about the co-occurring microbial community in this lake, whose composition could be influenced by the presence of A. flos-aquae as the dominant primary producer [17, 18]. By applying a selective growth medium lacking nitrogen, our goal was to sequence and assemble complete genomes from a relatively simple community, in turn assessing the possibility for using PacBio shotgun sequencing for environmental metagenomics. We closed three novel bacterial genomes, which provide insight into putative metabolic dependencies of these bacteria on A. flos-aquae in the co-culture. However, we were unable to close the A. flos-aquae genome, which is in draft quality and will be discussed elsewhere.
Classification and features
Classification and general features of UKL genomes according to MIGS specifications 
Class incertae sedis
Order incertae sedis
Order incertae sedis
Family incertae sedis
Family incertae sedis
Genus incertae sedis
Genus incertae sedis
Genus incertae sedis
Species incertae sedis
Species incertae sedis
Species incertae sedis
pH range; Optimum
Terminal electron acceptor
Upper Klamath Lake, Oregon, USA
Upper Klamath Lake, Oregon, USA
Upper Klamath Lake, Oregon, USA
Aug 6, 2013
Aug 6, 2013
Aug 6, 2013
Although we initiated and maintained this mixed-community culture for 1 year, the culture was lost and we did not obtain physiological information regarding these organisms. Sustaining long-term A. flos-aquae cultures is often difficult, and it is common for cultures to die. In the absence of phenotypic information, we discuss insights from the genome annotations for the three novel bacteria.
Genome sequencing information
Genome project history
SMRT library prep
SMRT library prep
SMRT library prep
Gene calling method
GenBank date of release
March 30, 2016
March 30, 2016
March 30, 2016
Source material identifier
Growth conditions and genomic DNA preparation
DNA extraction procedures and respective sequencing technologies
1.2 μm GF/C filtration
GeneRiteKit (silica beads)
Illumina 100 bp paired-end HiSeq 2000
1.2 μm GF/C filtration and whole sample
Nov 2013 & Mar 2015
Genome sequencing and assembly
Genomes identified from PacBio assemblies. PacBio read coverage calculated by mapping with BLASR
Assembly Length (bp)
PB read coverage
Bacteroidetes bacterium UKL13-3
Unknown Caulobacterales bacterium
Illumina assembly statistics for each genome. Contig number and assembly length are from extracted bins. Illumina coverage calculated by mapping with BWA-MEM. Bin coverage parameters used to bin Illumina assemblies with mmgenome
# Illumina contigs
Bin coverage parameters
Bin assembly length (bp)
Bin assembly (% of genome)
Bin estimated completeness
Bin estimated contamination
Illumina: 15-40x PacBio: >49x
Illumina: 37-87x PacBio: 71-211x
Bacteroidetes bacterium UKL13-3
Illumina: 44-103x PacBio: >228x
All genomes were annotated with NCBI's PGAP  and PROKKA . Counts of features (Genes, CDS, pseudogenes, rRNAs, tRNAs, ncRNAs, and CRISPR arrays) come from PGAP annotations. Amino acid sequences were assigned to COG categories by searching against the COG protein database  using RAPSEARCH , taking only the top hits above an E-value of 1E-30. Amino acid sequences from each genome were also annotated using the KEGG database  with the GhostKOALA  pipeline and the “genus_prokaryotes” database on September 3, 2015.
Properties and statistics for each genome
% of total
% of total
% of total
Genome size (bp)
DNA coding (bp)
DNA G + C (bp)
Protein coding genes
Genes in internal clusters
Genes with function prediction
Genes assigned to COGs
Genes with Pfam domains
Genes with signal peptides
Genes with transmembrane helices (≥3)
Number and proportion of genes associated with COG functional categories
% of totala
% of totala
% of totala
RNA processing and modification
Chromatin structure and dynamics
Cell cycle control
Signal transduction mechanisms
Cell wall /membrane/biogenesis
Energy production and conversion
Carbohydrate transport and metabolism
Amino acid transport and metabolism
Nucleotide transport and metabolism
Coenzyme transport and metabolism
Lipid transport and metabolism
Inorganic ion transport and metabolism
Secondary metabolites biosynthesis
General function prediction only
Not in COGs
Insights from the genome sequence
PacBio metagenome and comparison to Illumina metagenome
The bacterial community associated with the Aphanizomenon flos-aquae culture was subjected to metagenomic analysis with sequencing on eight PacBio SMRT cells, resulting in three completed novel bacterial genomes: Hyphomonadaceae UKL13-1, Betaproteobacterium UKL13-2, and Bacteroidetes UKL13-3 (Table 4). There were insufficient reads to close the genome of A. flos-aquae, although 67 contigs could be clustered to represent an estimated 97% of the genome (Table 4). The A. flos-aquae genome was sequenced with lower coverage than the three completed genomes, and additional sequencing would be needed for genome completion. Contigs from partial genomes of two additional bacteria were also clustered: a novel Flavobacterium (63% estimated genome completeness) and a novel Brevundimonas bacterium (17% estimated genome completeness) (Table 4), which were taxonomically placed via PhylopythiaS+. The Flavobacterium genome contained 16S rDNA genes with 98% similarity to Flavobacterium aquatile DSM 1132, but no 16S gene was identified in the Brevundimonas contigs. Our results indicate the presence of at least six separate bacterial taxa in this non-axenic culture.
A parallel Illumina HiSeq 2000 metagenome allowed comparison of PacBio-only and Illumina-only assemblies. When assembled with Illumina reads, the three predominant genomes separated into bins containing ~100 or more contigs. The Betaproteobacterium genome bin contained more contigs than the Hyphomonadaceae and Bacteroidetes genomes, although it was sequenced at the highest Illumina depth of the three (63x coverage vs. 23x and 58x coverage, respectively) (Table 5). There was a ~200 kb discrepancy between Illumina bin length and completed genome length for each of the three genomes. The total binned contig lengths for the Bacteroidetes and Betaproteobacterium were each shorter than the completed genomes, while the Hyphomonadaceae total binned contig length was longer (Table 5). The additional sequences in the Hyphomonadaceae bin were primarily contigs shorter than 10 kb that were not part of the PacBio-assembled Hyphomonadaceae genome. The bin quality control program CheckM  overestimated genome completeness or underestimated contamination when compared with the finished genome size. For example, CheckM estimated that the Hyphomonadaceae UKL13-1 bin contained ~2% contamination, while comparing the bin length with the completed genome length suggests ~6% contamination (Table 5). These discrepancies indicate that genome binning has a tendency to exclude important sequences or include extraneous sequences, and reveals the difficulty of assessing binned genome completeness and contamination without a reference. Incomplete binning is common for draft genomes, particularly from metagenomic assemblies .
We also assessed the extent to which genome repeats affected Illumina assemblies. Repeats in each genome were identified by using BLASTN to align each genome with itself, with a minimum E-value cutoff of 1E-30. Both intragenome BLASTN hits and missing Illumina coverage were then visualized with a circular genome plot (Additional file 3: Figure S1a-c). Breaks in Illumina assemblies commonly co-localized with intragenomic repeats in each genome. In particular, the Betaproteobacterium UKL13-2 genome is enriched for repeat sequences relative to the other two genomes and contains larger regions unassembled by Illumina reads, factors that possibly contributed to the greater genome fragmentation (Table 5).
tRNA-Asn, Ile, Ala
Response regulator UvrY
16S ribosomal RNA
Ribosomal proteins S6, S18, S23, L9, L32
16S-23S-5S rRNA operons (2x)
16S-23S-5S rRNA operon
DNA polymerase III subunit alpha
tRNAs Ile (2x), Ala (2x), Asn, Ser
Elongation factor Tu
DNA ligase (2x)
50S ribosomal proteins L21, L27
Lipoprotein-releasing system proteins LolCDE
RNA polymerase-binding transcription factor DksA
DNA polymerase III subunit alpha
Glycerol-3-phosphate dehydrogenase [NAD(P)+]
Elongation factor Tu (2x)
Novel completed genomes
To functionally characterize the three novel genomes, we searched all protein-coding sequences against the COG database using Rapsearch 2.16 and a 1E-30 E-value cutoff. We then repeated this for all bacterial genomes in GenBank (collected on November 3, 2015) and compared these to our novel genomes to assess enrichment of protein-coding sequences associated with each COG functional category. These are shown as a percentage of all protein-coding sequences from each respective genome (Additional file 7: Figure S2). Our results indicate that the Hyphomonadaceae UKL13-1 genome contains more lipid metabolism genes than most bacteria (at 5.01% of predicted coding sequences vs. a mean of 2.96%), while the Bacteroidetes UKL13-3 genome contains more cell wall/envelope/membrane biogenesis genes (7.39%, vs. a mean of 4.61%).
The A. flos-aquae genome was the only identified source of nitrogen fixing genes in the culture. Since the growth medium was nitrogen-deplete, all other bacteria in the community likely depend on reduced N provided by the cyanobacterium. It has been shown that A. flos-aquae from the Baltic Sea fixes N2 and releases it as NH4 +, which is then taken up by surrounding heterotrophic or phototrophic bacteria [44, 45]. Both proteobacterial genomes contain the ammonia transporter gene amtB, which would allow uptake of NH4 + released by A. flos-aquae (Fig. 6). No ammonia channel transport genes were annotated in the Bacteroidetes UKL13-3 genome. The proteobacterial genomes both contain chemotaxis and flagellar genes, and the Betaproteobacterium genome also contains type IV pilus genes for twitching motility (Additional file 8: Table S6; Fig. 6). Motility may be necessary for these organisms to stay associated with and obtain benefits from A. flos-aquae, similar to other host-associated bacteria .
We searched the novel genomes for the presence of other transporters to inform of the needs for survival and growth. Both proteobacterial genomes contain transporters for alkanesulfonate, iron(III), phosphate, and phosphonate. The Hyphomonadaceae genome also contains a transporter for putrescine, while the Betaproteobacterium genome contains complete transporter modules for tungstate, molybdate, glutamate/aspartate, and branched-chain amino acids. Few, and only broadly functional, transporter modules were identified in the Bacteroidetes genome. All three genomes appear to carry complete genetic pathways for nucleotide biosynthesis, as well as genes for synthesis of all 20 amino acids, indicating these organisms are self-sufficient in this regard. Because the Flavobacterium and Brevundimonas genomes were so incomplete (Table 4), their gene content is not reported here.
We were unable to identify any plasmids in the assemblies. The distribution of all plasmids in GenBank shows that the majority are found in Proteobacteria (~47%), although most of these are associated with Gammaproteobacteria (~63%), rather than Alphaproteobacteria (~22%) or Betaproteobacteria (~8.7%) . Plasmids from Bacteroidetes were much rarer at ~1.6%. It may then be unsurprising that these bacteria lack plasmids.
Freshwater bacteria associated with cyanobacterial blooms
Bacteria from the Alphaproteobacteria , Betaproteobacteria , and Bacteroidetes are common in freshwater systems , are known to be commonly associated with cyanobacterial blooms, and can directly influence the growth of cyanobacteria in culture . We therefore propose that the three newly sequenced genomes represent a bacterial community that is dependent on Aphanizomenon flos-aquae (Fig. 6). Some Alphaproteobacteria have been identified in such cyanobacterial-associated communities . For example, Alphaproteobacteria 16S sequences have been detected in association with the nitrogen-fixing cyanobacterium Gloeotrichia echinulata . Interestingly, 16S rDNA from Hyphomonadaceae UKL13-1 shared significant identity (Additional file 1: Table S1) with one of these sequences (A0904), suggesting that bacteria related to Hyphomonadaceae UKL13-1 are associated with various bloom-forming cyanobacteria. However, the extent to which such co-occurrences reflect physiological interdependencies remains to be explored. Betaproteobacteria have been seen physically associated with cyanobacteria [18, 49]. The predicted chemotaxis and flagellar and twitching motility genes (Additional file 8: Table S6) would assist both Hyphomonadaceae UKL13-1 and Betaproteobacterium UKL13-2 to remain associated with A. flos-aquae colonies and obtain the benefits of ammonium and organic nutrients released by the cyanobacterium (Fig. 6). We have detected no genes by which these photoheterotrophic bacteria could obviously benefit A. flos-aquae.
Bacteria from the Bacteroidetes phylum are commonly identified in, and sometimes dominate, freshwater lake systems . They are also frequently found in particle-associated communities and commonly degrade extracellular polysaccharide matrices that are grazed via gliding motility . Bacteroidetes UKL13-3 possesses annotated gliding motility genes, which may indicate physical association with the originally isolated A. flos-aquae colony. Extracellular mucilage, as well as a range of nutrients (reduced C, N and S compounds) released by A. flos-aquae, may support the growth of Bacteroidetes UKL13-3, whose genome seems to lack many functionally annotated pathways. Bacteroidetes UKL13-3 has the only annotated extracellular peroxidase gene in the three genomes, which could protect against reactive oxygen species generated by photosynthesis in A. flos-aquae (Fig. 6), which itself lacks annotated peroxidase genes. This may indicate a mutual benefit for both bacteria, and conform to the Black Queen Hypothesis described for interactions between the unicellular cyanobacterium Prochlorococcus with other interacting bacteria . On the other hand, large populations of Bacteroidetes bacteria have been observed following cyanobacterial bloom decline  due to subsequently favorable conditions for copiotrophs  and cell turnover of A. flos-aquae and other cells may provide organic material for Bacteroidetes UKL13-3 growth in co-culture, with similar benefits from lysed cells for the two Proteobacteria .
Search for the novel genomes in freshwater metagenomes
We searched for the occurrence of the three novel bacteria in 62 freshwater lake metagenomes from 8 sampling sites across the United States, including Oregon, Washington state, California, Texas, and Kansas (BioProject accessions: PRJNA312985, PRJNA282166, PRJNA312830, PRJNA312986, and PRJNA294203, respectively). To do so, we mapped reads from these metagenomes to the references with BWA-MEM with default parameters (~0.067% mismatch rate) and calculated average genome coverage. Matches were found in two samples. A metagenome from Copco Reservoir, CA, on the Klamath River downstream of UKL on September 19, 2007 contained ~86x average read coverage depth of the Hyphomonadaceae UKL13-1 genome and ~151x coverage depth of the Bacteroidetes UKL13-3 genome from 398,356,734 Illumina read pairs. Additionally, a metagenome from Cranberry Lake, WA on August 11, 2014 contained the Betaproteobacterium UKL13-2 genome at ~99x average coverage depth from 13,955,857 Illumina read pairs. We also searched in 50 additional freshwater lake metagenomes in the IMG, MG-RAST, and SRA databases. The only detection found was the Betaproteobacterium UKL13-2 genome at ~19x average coverage depth in a metagenome consisting of 319,415,720 Illumina read pairs labeled “vibrio metagenome HEM-04” from a freshwater lake (BioProject accession: PRJNA64039). This initial analysis shows that the three novel bacteria are found elsewhere in freshwater habitats, although they do not appear to be ubiquitous or widely abundant.
Taxonomic placement and naming of genomes from uncharacterized bacteria
Currently, there is a lack of guidance and standardization for assigning taxonomic nomenclature to genomic sequences lacking phenotypic information. Until now, most of these sequences have been amplicons or draft assemblies from shotgun metagenomes, in which the genomes are usually incomplete and there are likely to be some contaminating contigs. We observed both of these defects in the draft genomes assembled from the short-read Illumina sequencing that paralleled the assembly of complete genomes from the PacBio long-read sequencing (Tables 4 and 5). Critical genes—including ribosomal RNA operons—were missing from the draft genomes (Table 8), making it perhaps premature to assign new taxonomic designations to such bacteria. Our work demonstrates that as long-read sequencing depth increases, so will the likelihood of assembling complete microbial genomes from uncultured samples of high relevance to extant microbial communities. Especially for circular chromosomes, the assembly of such complete genomes would seem to carry a low risk of artifactual assembly of the genome from a mixed-DNA sample in a metagenomic study, particularly when no PCR amplifications are used in the sequencing protocol, as in the present case. It would seem beneficial to the advancement of microbial ecology to develop new guidelines by which currently unclassifiable complete genomes derived from metagenomic data can be taxonomically placed with new genus and species names as appropriate.
Here, we have shown that completing multiple genome assemblies is possible from a simple microbial community using PacBio sequencing, a feat that is nearly impossible with short-read shotgun sequencing alone. There are several advantages to this approach. Completing genome assemblies from a shotgun metagenome avoids genome gaps and excludes contaminant sequences, which are significant issues with binned draft genomes. Absent sequences can contain functionally relevant information, such as gene clusters encoding secondary metabolites  or antibiotic resistance genes near mobile elements . Here we observed that key essential genes (Table 8) were missing from each short-read assembly. Also, short-read assemblers can compress small repeats, potentially removing important functional information . In addition to providing more complete genomic information, long-read sequencing of communities such as mixed cultures or environmental samples creates possibilities for new experimental designs. For example, complete genomes from novel organisms sequenced from the environment can be used as new references for culture-free resequencing efforts, such as to explore gene linkage patterns among alleles in a population. Further, long-read sequencers often detect DNA modifications, such as methylation, allowing capture of epigenetic information from environmental sequencing runs.
Although PacBio sequencing is low-throughput compared with short-read sequencers, our results suggest that the current state of this technology allows genome sequencing from communities with relatively low diversity, such as those in extreme environments  or when dominated by one or a few organisms . Platform improvement, such as the recently released PacBio Sequel instrument, is expected to make long-read sequencing increasingly desirable for shotgun metagenomics in the future.
Here, we have sequenced three novel genomes that may be associated with A. flos-aquae as part of the cyanobacterial phycosphere (Fig. 6). Based on gene annotations and growth medium, both Proteobacteria are motile aerobic anoxygenic phototrophs that utilize fixed nitrogen and carbon provided by A. flos-aquae. Bacteroidetes UKL13-3 is a heterotroph that likely has similar nutritional requirements, and may exist in a mutual relationship with A. flos-aquae through provision of an extracellular peroxidase. In future work, it will be interesting to explore the possible existence and nature of dependencies between these novel bacteria and A. flos-aquae colonies in blooms in Upper Klamath Lake and elsewhere.
Aerobic anoxygenic phototroph
Clusters of orthologous genes
Dissolved organic matter
Kyoto encyclopedia of genes and genomes
Prokaryotic genome annotation pipeline
Single molecule real-time
Upper Klamath Lake
We thank Sara Eldridge at the USGS for providing samples from UKL used for culture inoculum, Kim Halsey for discussions and Mark Wildung and Derek Pouchnik from the WSU Molecular Biology and Genomics Core for DNA preparation and PacBio sequencing in addition to providing HGAP assemblies. We also thank the NL Tartar Research Fellowship, the Oregon State University Agricultural Experiment Station, and the Mabel E. Pernot Trust for providing funding.
Conceived the project: TWD, TGO, NMB, CBD. Provided strains: TGO. Annotated the genomes: CBD. Analyzed and interpreted results: CBD, TWD, NMB. Wrote the manuscript: CBD, TWD, TGO. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Gilbert JA, Dupont CL. Microbial Metagenomics: Beyond the Genome. Annu Rev Mar Sci. 2011;3:347–71.View ArticleGoogle Scholar
- Escobar-Zepeda A, de León AV-P, Sanchez-Flores A. The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics. Front Genet. 2015;6:348.View ArticlePubMedPubMed CentralGoogle Scholar
- Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387–402.View ArticlePubMedGoogle Scholar
- Gregor ID J, Schirmer M, Quince C, McHardy AC. PhyloPythiaS+: A self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. arXiv. 2014;1406:7123.Google Scholar
- Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 2013;23(1):111–20.View ArticlePubMedPubMed CentralGoogle Scholar
- Evans PN, Parks DH, Chadwick GL, Robbins SJ, Orphan VJ, Golding SD, Tyson GW. Methane metabolism in the archaeal phylum Bathyarchaeota revealed by genome-centric metagenomics. Science. 2015;350(6259):434–8.View ArticlePubMedGoogle Scholar
- Koren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol. 2015;23:110–20.View ArticlePubMedGoogle Scholar
- Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol. 2013;31(6):533–8.View ArticlePubMedGoogle Scholar
- Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P, Tyson GW. GroopM: an automated tool for the recovery of population genomes from related metagenomes. Peer J. 2014;2:e603.View ArticlePubMedPubMed CentralGoogle Scholar
- Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. Peer J. 2015;3:e1165.View ArticlePubMedPubMed CentralGoogle Scholar
- Cleary B, Brito IL, Huang K, Gevers D, Shea T, Young S, Alm EJ. Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning. Nat Biotechnol. 2015;33:1053–60.View ArticlePubMedPubMed CentralGoogle Scholar
- Hess M, Sczyrba A, Egan R, Kim T-W, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science. 2011;331(6016):463–7.View ArticlePubMedGoogle Scholar
- Lee H, Gurtowski J, Yoo S, Marcus S, McCombie WR, Schatz M. Error correction and assembly complexity of single molecule sequencing reads. BioRxiv. 2014:doi: 10.1101/006395
- Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012;30(7):693–700.View ArticlePubMedPubMed CentralGoogle Scholar
- Fichot EB, Norman RS. Microbial phylogenetic profiling with the Pacific Biosciences sequencing platform. Microbiome. 2013;1(1):1.View ArticleGoogle Scholar
- Frank JA, Pan Y, Tooming-Klunderud A, Eijsink VG, McHardy AC, Nederbragt AJ, Pope PB. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data. Sci Rep. 2016;6:25373.View ArticlePubMedPubMed CentralGoogle Scholar
- Bagatini IL, Eiler A, Bertilsson S, Klaveness D, Tessarolli LP, Vieira AAH. Host-specificity and dynamics in bacterial communities associated with bloom-forming freshwater phytoplankton. PLoS ONE. 2014;9:e85950.View ArticlePubMedPubMed CentralGoogle Scholar
- Louati I, Pascault N, Debroas D, Bernard C, Humbert JF, Leloup J. Structural diversity of bacterial communities associated with bloom-forming freshwater cyanobacteria differs according to the cyanobacterial genus. PLoS ONE. 2015;10(11):e0140614.View ArticlePubMedPubMed CentralGoogle Scholar
- Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(D1):D590–6.View ArticlePubMedGoogle Scholar
- Pruesse E, Peplies J, Glöckner FO. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 2012;28(14):1823–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5(3):e9490.View ArticlePubMedPubMed CentralGoogle Scholar
- Darling AE, Jospin G, Lowe E, Matsen IV FA, Bik HM, Eisen JA. PhyloSift: phylogenetic analysis of genomes and metagenomes. Peer J. 2014;2:e243.View ArticlePubMedPubMed CentralGoogle Scholar
- Parker CT, Tindall BJ, Garrity GM. International code of nomenclature of prokaryotes. Int J Syst Evol Microbiol. 2015;doi: 10.1099/ijsem.0.000778
- Culture Collection of Algae at the University of Texas at Austin. B3N media recipe. [https://utex.org/products/bold-3n-medium]. Accessed 12 Jan 2017.
- Sambrook J, Russell DW. Purification of nucleic acids by extraction with phenol: chloroform. Cold Spring Harbor Protocols 2006, 2006(1):doi: 10.1101/pdb.prot4455.
- Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10(6):563–9.View ArticlePubMedGoogle Scholar
- Forde BM, Zakour NLB, Stanton-Cook M, Phan M-D, Totsika M, Peters KM, Chan KG, Schembri MA, Upton M, Beatson SA. The complete genome sequence of Escherichia coli EC958: a high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b: H4-ST131 clone. PLoS ONE. 2014;9(8):e104400.View ArticlePubMedPubMed CentralGoogle Scholar
- Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M, Otto TD. REAPR: a universal tool for genome assembly evaluation. Genome Biol. 2013;14(5):1.View ArticleGoogle Scholar
- Peng Y, Leung HC, Yiu SM, Chin FY. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–8.View ArticlePubMedGoogle Scholar
- Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:13033997 2013.
- Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics. 2012;13(1):238.View ArticlePubMedPubMed CentralGoogle Scholar
- NCBI Prokaryotic Genome Annotation Pipeline Release Notes, Version 3.3 [https://www.ncbi.nlm.nih.gov/genome/annotation_prok/release_notes/]. Accessed 12 Jan 2017.
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–9.View ArticlePubMedGoogle Scholar
- Galperin MY, Makarova KS, Wolf YI, Koonin EV. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 2014:doi: 10.1093/nar/gku1223.
- Zhao Y, Tang H, Ye Y. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics. 2012;28(1):125–6.View ArticlePubMedGoogle Scholar
- Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic acids research 2015:doi: 10.1093/nar/gkv1070.
- Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2016;428(4):726–31.View ArticlePubMedGoogle Scholar
- Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25(7):1043–55.View ArticlePubMedPubMed CentralGoogle Scholar
- Koren S, Harhay GP, Smith TP, Bono JL, Harhay DM, Mcvey SD, Radune D, Bergman NH, Phillippy AM. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol. 2013;14(9):1.View ArticleGoogle Scholar
- Darling AE, Mau B, Perna NT. ProgressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE. 2010;5(6):e11147.View ArticlePubMedPubMed CentralGoogle Scholar
- Moran MA, Miller WL. Resourceful heterotrophs make the most of light in the coastal ocean. Nat Rev Microbiol. 2007;5(10):792–800.View ArticlePubMedGoogle Scholar
- Yurkov V, Csotonyi JT. New light on aerobic anoxygenic phototrophs. In: Advances in Photosynthesis and Respiration. Dordrecht: Springer; 2009. p. 31-55. ISBN: 978-1-4020-8814-8.
- Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A guide to the natural history of freshwater lake bacteria. Microbiol Mol Biol Rev. 2011;75(1):14–49.View ArticlePubMedPubMed CentralGoogle Scholar
- Ploug H, Musat N, Adam B, Moraru CL, Lavik G, Vagner T, Bergman B, Kuypers MM. Carbon and nitrogen fluxes associated with the cyanobacterium Aphanizomenon sp. in the Baltic Sea. ISME J. 2010;4(9):1215–23.View ArticlePubMedGoogle Scholar
- Adam B, Klawonn I, Sveden JB, Bergkvist J, Nahar N, Walve J, Littmann S, Whitehouse MJ, Lavik G, Kuypers MM, et al. N-fixation, ammonium release and N-transfer to the microbial and classical food web within a plankton community. ISME J. 2016;10:450–9.View ArticlePubMedGoogle Scholar
- Lertsethtakarn P, Ottemann KM, Hendrixson DR. Motility and chemotaxis in Campylobacter and Helicobacter. Annu Rev Microbiol. 2011;65:389–410.View ArticlePubMedGoogle Scholar
- Shintani M, Sanchez ZK, Kimbara K. Genomics of microbial plasmids: classification and identification based on replication and transfer systems and host taxonomy. Front Microbiol. 2015;6.
- Berg KA, Lyra C, Sivonen K, Paulin L, Suomalainen S, Tuomi P, Rapala J. High diversity of cultivable heterotrophic bacteria in association with cyanobacterial water blooms. ISME J. 2009;3(3):314–25.View ArticlePubMedGoogle Scholar
- Eiler A, Olsson JA, Bertilsson S. Diurnal variations in the auto‐and heterotrophic activity of cyanobacterial phycospheres (Gloeotrichia echinulata) and the identity of attached bacteria. Freshw Biol. 2006;51(2):298–311.View ArticleGoogle Scholar
- Pernthaler J, Zöllner E, Warnecke F, Jürgens K. Bloom of filamentous bacteria in a mesotrophic lake: identity and potential controlling mechanism. Appl Environ Microbiol. 2004;70(10):6272–81.View ArticlePubMedPubMed CentralGoogle Scholar
- Lemarchand C, Jardillier L, Carrias J-F, Richardot M, Debroas D, Sime-Ngando T, Amblard C. Community composition and activity of prokaryotes associated to detrital particles in two contrasting lake ecosystems. FEMS Microbiol Ecol. 2006;57(3):442–51.View ArticlePubMedGoogle Scholar
- Morris JJ, Lenski RE, Zinser ER. The Black Queen Hypothesis: evolution of dependencies through adaptive gene loss. mBio 2012, 3(2):doi: 10.1128/mBio.00036-12.
- Eiler A, Bertilsson S. Flavobacteria blooms in four eutrophic lakes: linking population dynamics of freshwater bacterioplankton to resource availability. Appl Environ Microbiol. 2007;73(11):3511–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Zeder M, Peter S, Shabarova T, Pernthaler J. A small population of planktonic Flavobacteria with disproportionally high growth during the spring phytoplankton bloom in a prealpine lake. Environ Microbiol. 2009;11(10):2676–86.View ArticlePubMedGoogle Scholar
- Harrison J, Studholme DJ. Recently published Streptomyces genome sequences. Microb Biotechnol. 2014;7(5):373–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Zowawi HM, Forde BM, Alfaresi M, Alzarouni A, Farahat Y, Chong T-M, Yin W-F, Chan K-G, Li J, Schembri MA. Stepwise evolution of pandrug-resistance in Klebsiella pneumoniae. Sci Rep. 2015;5:15082.View ArticlePubMedPubMed CentralGoogle Scholar
- Brown NM, Mueller RS, Shepardson JW, Landry ZC, Morre JT, Maier CS, Hardy FJ, Dreher TW. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102. BMC Genomics. 2016;17(1):457.View ArticlePubMedPubMed CentralGoogle Scholar
- Méndez-García C, Peláez AI, Mesa V, Sánchez J, Golyshina OV, Ferrer M. Microbial diversity and metabolic networks in acid mine drainage habitats. Front Microbiol. 2015;6:475.PubMedPubMed CentralGoogle Scholar
- Lin K-H, Liao B-Y, Chang H-W, Huang S-W, Chang T-Y, Yang C-Y, Wang Y-B, Lin Y-TK WY-W, Tang S-L. Metabolic characteristics of dominant microbes and key rare species from an acidic hot spring in Taiwan revealed by metagenomics. BMC Genomics. 2015;16(1):1.View ArticleGoogle Scholar
- Abraham W-R, Rohde M. The Family Hyphomonadaceae. In: The Prokaryotes. Heidelberg: Springer; 2014. p. 283-99. ISBN: 978-3-642-30193-3.
- Prosser JI, Head IM, Stein LY. The family Nitrosomonadaceae. In: The Prokaryotes. Heidelberg: Springer; 2014. p. 901-18. ISBN: 978-3-642-30193-3.
- Oren A. The Family Rhodocyclaceae. In: The Prokaryotes. Heidelberg: Springer; 2014. p. 975-98. ISBN: 978-3-642-30193-3.
- Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 2008;26(5):541–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci. 1990;87(12):4576–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Garrity GM, Bell JA, Lilburn T. Phylum XIV. Proteobacteria phyl. In: Garrity GM, Brenner DJ, Krieg NR, Staley JT, editors. Bergey’s manual of systematic bacteriology, second edition, volume 2, part b. New York: Springer; 2005. p. 1.View ArticleGoogle Scholar
- Krieg NR, Ludwig W, Euzéby J, Whitman WB: Phylum XIV. Bacteroidetes phyl. nov. In: Bergey’s Manual® of Systematic Bacteriology. New York: Springer; 2010. p. 25-469. ISBN 0387950427.
- Garrity GM, Bell JA, Lilburn T. Class I. Alphaproteobacteria class. nov. In: Bergey’s Manual® of Systematic Bacteriology. New York: Springer; 2005. p. 1-574. ISBN 0387950427.
- Garrity GM, Bell JA, Lilburn T: Class II. Betaproteobacteria class. nov. In: Bergey’s manual® of systematic bacteriology. New York: Springer; 2005. p. 575-922. ISBN 0387950427.
- Garrity GM, Bell JA, Lilburn T. Order III. Rhodobacterales ord. nov. In: Bergey’s manual® of systematic bacteriology. Edited by Brenner DJ, Krieg NR, Staley JT, Garrity GM, vol. 2, part C: New York: Springer; 2005. p. 161. ISBN 0387950427.
- Lee K-B, Liu C-T, Anzai Y, Kim H, Aono T, Oyaizu H. The hierarchical system of the ‘Alphaproteobacteria’: description of Hyphomonadaceae fam. nov., Xanthobacteraceae fam. nov. and Erythrobacteraceae fam. nov. Int J Syst Evol Microbiol. 2005;55(5):1907–19.View ArticlePubMedGoogle Scholar
- Morris BE, Henneberger R, Huber H, Moissl-Eichinger C. Microbial syntrophy: interaction for the common good. FEMS Microbiol Rev. 2013;37(3):384–406.View ArticlePubMedGoogle Scholar