Complete genome sequence of the fish pathogen Flavobacterium psychrophilum ATCC 49418T

Flavobacterium psychrophilum is the causative agent of bacterial cold water disease and rainbow trout fry mortality syndrome in salmonid fishes and is associated with significant losses in the aquaculture industry. The virulence factors and molecular mechanisms of pathogenesis of F. psychrophilum are poorly understood. Moreover, there are currently no effective vaccines, and control with antimicrobial agents is problematic because of growing antimicrobial resistance and because sick fish do not feed. In the hope of identifying vaccine and therapeutic targets, we sequenced the genome of the type strain ATCC 49418, which was isolated from the kidney of a Coho salmon (Oncorhynchus kisutch) in Washington State (U.S.A.) in 1989. The genome is 2,715,909 bp with a G+C content of 32.75%. It contains 6 rRNA operons and 49 tRNA genes, and is predicted to encode 2,329 proteins.


Introduction
Flavobacterium psychrophilum is a Gram-negative pathogen that infects all species of salmonid fish and has also been found to infect eel and three species of cyprinids [1][2][3]. It causes bacterial cold water disease (BCWD) and rainbow trout fry mortality syndrome (RTFS) in fish and is responsible for significant losses in the salmonid aquaculture industry [1]. Water temperature plays a key role in the infection and development of disease [4], which occurs between 4 and 16°C and is most prevalent at 10°C or below [5]. The pathogen was originally thought to be limited to North America [6] but is now recognized in almost every country in Europe, in some parts of Asia, and in Australia [1,7].
Three serotypes and two biovars of F. psychrophilum have been described [7,8]. In addition, molecular analysis of the population structure of this bacterium suggests that there are a number of distinct lineages [7]. It has been speculated that some strains are species specific [9] while others are location specific [10]. Some strains have also been observed to cause only BCWD or only RTFS [7]. A recent study in Japan showed multiple sequence types infecting ayu (Plecoglossus altivelis) in a closed lake environment [11]. Phase variation is also known to occur, in which the colonial phenotype changes between "rough" and "smooth", perhaps to help evade the immune system [12]. Generally, F. psychrophilum populations are heterogeneous; however, a recent study showed closely related epidemic clones infecting rainbow trout (Oncorhynchus mykiss) in Nordic countries [13]. To date, only one genome sequence [14] of F. psychrophilum has been reported, and sequences of other strains are required to gain insight into the molecular mechanisms of virulence and why some strains are more virulent than others. Here we present a summary of the classification and features of the F. psychrophilum type strain ATCC 49418 (= DSM 3660 = NCMB 1947 = LMG 13179) [15] together with a description of the complete genome and its annotation.
A phylogenetic tree was constructed using the 16S rRNA sequences of F. psychrophilum ATCC 49418T, selected strains and species of the same genus, as well as selected species of other genera belonging to the family Flavobacteriaceae (Figure 2). The four F. psychrophilum strains are grouped together in the tree with ATCC
Figure 2 Phylogenetic tree displaying the relationship between F. psychrophilum ATCC 49418T and selected strains and species of the same genus. Other genera from the family Flavobacteriaceae were used as an outgroup. The phylogenetic tree was constructed using the "One Click" mode with default settings in the Phylogeny.fr platform [41]. This pipeline uses four different programs: MUSCLE [42], Gblocks [43], PhyML [44], and TreeDyn [45]. The numbers above the branches are tree support values generated by PhyML using the aLRT statistical test.

Genome sequencing information
Genome project history
The complete genome sequence and annotation data of F. psychrophilum ATCC 49418T have been deposited in DDBJ/EMBL/GenBank under the accession number CP007207. Sequencing and assembly steps as well as finishing were performed at McGill University and Génome Québec Innovation Centre. Annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline [46] and manually edited in Kodon (Applied Maths, Austin, TX). Table 2 presents a summary of the project information and its association with MIGS version 2.0 compliance [47].
3) created using CGView [65]. From the outside to the center: genes on forward strand (blue clockwise arrows), genes on reverse strand (blue counter-clockwise arrows), F. psychrophilum JIP02/86 genome (red), RNA genes (tRNAs orange, rRNAs violet, other RNAs gray), GC content (black), GC skew (purple/olive).
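The GC content and GC skew tracks named in the map legend can, in principle, be reproduced with a sliding-window calculation. The sketch below is a minimal illustration and is not taken from CGView; the function names, window size and step are assumptions, and skew uses the commonly cited definition (G - C)/(G + C).

```python
def gc_content(seq):
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq) if seq else 0.0


def gc_skew(seq):
    """GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along a linear sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy sequence; window and step are illustrative, not the values used for the figure.
toy = "GGGCATATCCCG"
profile = [(start, gc_content(w), gc_skew(w))
           for start, w in sliding_windows(toy, window=4, step=4)]
```

Applied to the full 2,715,909 bp sequence, `gc_content` would be expected to return a value close to the reported 32.75%, provided the same sequence is used.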

Genome sequencing and assembly
Genome sequencing of F. psychrophilum ATCC 49418T was performed using a PacBio RS II instrument. The reads were automatically processed through the Single Molecule Real Time (SMRT) software suite using the Hierarchical Genome Assembly Process (HGAP) pipeline [49]. The resulting reads (580,625,890 bp in total) were filtered and the longest reads with 20x coverage were selected as seeds for constructing preassemblies. The preassemblies were constructed by aligning the short reads to the long reads (seeds). Each read was mapped to multiple seeds using BLASR [50]. In total there were 8,073 long sequences totaling 90,000,401 bp with an average length of 11,148 bp and 162,858 short sequences totaling 490,625,489 bp with an average length of 3,013 bp. Since errors in PacBio reads are random, aligning multiple short reads onto the long reads allows errors in the long reads to be corrected. The number of sequences mapped onto the seeds is controlled by the "-bestn" parameter, and the optimal value was determined to be 12. The preassembled reads were generated from the seeds using PBDAG-Con [51], which creates corrected consensus sequences and performs quality analysis of the seeds. This script uses multiple sequence alignments and a directed acyclic graph to produce the best possible consensus reads, eliminating the insertion and deletion errors generated during the sequencing process. In addition, it avoids passing chimeric sequences (sequences with artifacts) to the assembler, because chimeric reads will have little or no short-sequence coverage. At the end of the process, only the best preassembled reads without artifacts are sent to the assembler [52]. After quality analysis and the elimination of some of the preassembled reads by PBDAG-Con, the remaining 6,009 reads were fed into the Celera assembler, which uses an overlap-layout-consensus strategy [49]. A total of 2 contigs were generated, with sizes of 1,647,861 bp and 1,076,634 bp.
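The read statistics quoted above can be cross-checked with simple arithmetic, and the seed-selection step can be sketched as picking the longest reads until a target coverage is reached. The following is a minimal sketch, not the HGAP implementation: the function name `select_seed_reads` and the toy genome size are assumptions made for illustration.

```python
def select_seed_reads(read_lengths, genome_size, target_coverage=20):
    """Pick the longest reads until their summed length reaches
    target_coverage * genome_size; returns the selected lengths."""
    needed = genome_size * target_coverage
    seeds, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= needed:
            break
        seeds.append(length)
        total += length
    return seeds


# Consistency check of the figures reported in the text.
long_bp, long_n = 90_000_401, 8_073
short_bp, short_n = 490_625_489, 162_858
total_bp = long_bp + short_bp            # should equal the 580,625,890 bp of reads
mean_long = round(long_bp / long_n)      # reported average: 11,148 bp
mean_short = round(short_bp / short_n)   # reported average: 3,013 bp

# Toy example: a 100 bp "genome" at 20x needs 2,000 bp of seed reads.
toy_seeds = select_seed_reads([500, 1200, 300, 900, 100], genome_size=100)
```

The two read classes sum to the reported total, and both averages round to the values given in the text.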
These contigs underwent an additional polishing step in which they were compared against the raw reads and any artifacts found were removed [49]. The final consensus was analyzed and improved using the multiread consensus algorithm Quiver. Quiver takes the two contigs and the initial sequencing reads and maps the reads onto the assemblies [49]. It then disregards the alignment between the reads and the assemblies and creates a consensus independently from the reads, allowing it to remove any fine-scale errors made by the Celera assembler [52]. Quiver then generates approximate copies of the consensus sequence that carry candidate insertions and deletions, and those edits that improve the likelihood are applied to the initial consensus sequence [53]. The two final contigs generated by Quiver were 1,648,613 bp and 1,077,094 bp.
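The idea of testing candidate edits to a consensus and keeping only those that improve its agreement with the reads can be illustrated with a toy example. This sketch is not Quiver: Quiver scores candidates with a probabilistic model of the sequencing process, whereas here the negative total edit distance to the reads is used as a crude stand-in for likelihood. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def score(candidate, reads):
    """Higher is better: negative summed edit distance to all reads."""
    return -sum(edit_distance(candidate, read) for read in reads)


def single_edits(seq, alphabet="ACGT"):
    """Yield every sequence one insertion, deletion or substitution from seq."""
    for i in range(len(seq) + 1):
        for base in alphabet:
            yield seq[:i] + base + seq[i:]              # insertion
        if i < len(seq):
            yield seq[:i] + seq[i + 1:]                 # deletion
            for base in alphabet:
                if base != seq[i]:
                    yield seq[:i] + base + seq[i + 1:]  # substitution


def polish(draft, reads, max_rounds=10):
    """Greedily apply single edits that improve the score, one per round."""
    best, best_score = draft, score(draft, reads)
    for _ in range(max_rounds):
        for candidate in single_edits(best):
            candidate_score = score(candidate, reads)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                break          # accept the edit and start a new round
        else:
            break              # no edit improved the score; stop
    return best


# The draft is missing one T; two of three reads are error-free.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
polished = polish("ACGACGT", reads)
```

In this toy case the missing base is restored because two of the three reads agree on it, while the single random error in the third read is outvoted.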
The two contigs underwent a finishing process using SeqMan Pro (DNASTAR Inc., Madison, WI). They were collapsed into one, and the sequence was then opened in a region homologous to the origin of replication (Ori) of F. psychrophilum JIP02/86, resulting in another two contigs. These were resealed using SeqMan Pro to create one final complete contig.
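Opening a circular sequence at a chosen position and resealing it amounts to rotating the linear representation. The sketch below illustrates that operation only; it does not reproduce SeqMan Pro, and the function names and the toy motif standing in for the Ori-homologous region are assumptions.

```python
def find_on_circle(sequence, motif):
    """Return the 0-based start of motif in a circular sequence, or -1.
    The search also covers matches that span the end/start junction."""
    extended = sequence + sequence[:max(len(motif) - 1, 0)]
    return extended.find(motif)


def rotate_circular(sequence, start):
    """Re-linearise a circular sequence so that it begins at index start."""
    if not 0 <= start < len(sequence):
        raise ValueError("start must lie within the sequence")
    return sequence[start:] + sequence[:start]


# Toy circular sequence in which the motif "ATGG" spans the junction.
circle = "GGTTAACCAT"
start = find_on_circle(circle, "ATGG")
reopened = rotate_circular(circle, start)
```

Rotation changes only the coordinate origin: the length and base composition of the sequence are unchanged.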

Genome annotation
The NCBI Prokaryotic Genome Annotation Pipeline was used to predict protein coding genes, structural RNAs (5S, 16S, 23S), tRNAs, and small non-coding RNAs [54]. Protein coding genes were predicted by protein alignment using ProSplign [55], where only complete alignments with 100% identity to a reference protein are kept for final annotation. Frameshifted or partial alignments were passed to GeneMarkS+ [56] for further analysis and gene prediction. Structural RNAs were found by a BLASTN search against a reference set of structural RNA genes from the NCBI Reference Sequence Collection, since these genes are highly conserved in closely related prokaryotes. tRNAscan-SE was used to identify the tRNAs [57]. Small RNAs were predicted using a BLASTN search against sequences of selected Rfam families, and the results were refined further using Cmsearch [58]. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) were identified by searching the CRISPR database with the CRISPRfinder program (http://crispr.u-psud.fr/Server/) [59][60][61][62].
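The routing rule described for protein alignments (keep complete, 100%-identity hits; send the rest to ab initio prediction) can be sketched as a simple filter. This is an illustration only, not the pipeline's code: the `ProteinAlignment` record and its fields are hypothetical, and treating complete alignments below 100% identity as needing further analysis is an assumption, since the text does not say how they are handled.

```python
from dataclasses import dataclass


@dataclass
class ProteinAlignment:
    """Hypothetical summary of one protein-to-genome alignment."""
    query_id: str
    identity: float     # percent identity to the reference protein
    coverage: float     # percent of the reference protein aligned
    frameshift: bool    # True if the alignment requires a frameshift


def route_alignments(alignments):
    """Split alignments into those kept directly and those needing
    further gene prediction (assumed rule, see lead-in)."""
    kept, further_analysis = [], []
    for aln in alignments:
        complete = aln.coverage == 100.0 and not aln.frameshift
        if complete and aln.identity == 100.0:
            kept.append(aln)
        else:
            further_analysis.append(aln)
    return kept, further_analysis


example = [
    ProteinAlignment("p1", identity=100.0, coverage=100.0, frameshift=False),
    ProteinAlignment("p2", identity=100.0, coverage=80.0, frameshift=False),
    ProteinAlignment("p3", identity=98.5, coverage=100.0, frameshift=True),
]
kept, further_analysis = route_alignments(example)
```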

Insights into the genome sequence
Numerous studies have examined the pathogenesis of F. psychrophilum but, to date, the exact mechanisms remain unknown [1]. Some putative and previously characterized virulence factors are listed in Table 5. Proteolytic enzymes are widely used by fish pathogens to cause tissue damage and allow invasion of the host [1]. The F. psychrophilum ATCC 49418T genome contains four metalloprotease-encoding genes: a predicted zinc metalloprotease (FPG3_00455), a predicted zinc peptidase (FPG3_06120), and the previously reported Fpp1 [66] and Fpp2 [67] metalloproteases. Rainbow trout with RTFS are anemic, and past studies have reported that the red blood cells of rainbow trout are partially lysed when infected by F. psychrophilum [68,69]. Homologs of two RTX hemolysin transporters (FPG3_06485, FPG3_10400) were identified, but they did not appear to be linked to any toxin or modification genes [70]. Six iron transport genes were also identified; these were anticipated since iron uptake is a well-known characteristic of most pathogens. Moreover, recent research has shown that attenuated F. psychrophilum strains cultured under iron-limiting conditions confer greater protection to fish when used as an experimental vaccine [71]. A hydroperoxidase with predicted catalase and peroxidase functions was also identified. In addition, there are 11 cell surface proteins with leucine-rich repeats that are predicted to be adhesins; several are listed in Table 5. These were very similar to the ones found in F. psychrophilum JIP02/86. Further research is required to determine what functions these adhesins have and how they help F. psychrophilum bind to the host.

Conclusion
Flavobacterium psychrophilum, the causative agent of BCWD and RTFS in salmonid fishes, causes significant economic losses in the aquaculture industry. The genome sequence of the ATCC 49418T strain will hopefully provide new insights into virulence mechanisms and pathogenesis of F. psychrophilum and help in the identification of suitable targets for vaccines and antimicrobial agents; however, much more analysis will be required to do this.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AW participated in genome sequencing analysis and bioinformatics analysis, drafted the original manuscript, and participated in the revision process. AK participated in genome sequence analysis and assembly refinement. JL and BD participated in the study design and provided funding for the project. JM conceived the study, provided funding for the project, and participated in the revision process. All authors read and approved the final manuscript.