Genome statistics
The MSH1 genome (based on substrain DK1) consists of a chromosome of 5,301,518 bp and seven plasmids. These comprise two large plasmids, pUSP1 (367,423 bp) and pUSP2 (365,485 bp), three smaller plasmids, pUSP3, pUSP4, and pUSP5 (97,029 bp, 64,122 bp, and 31,577 bp, respectively), and the two previously reported smaller catabolic plasmids pBAM1 and pBAM2 of 40,559 bp and 53,893 bp, respectively (Table 1). A total of 6,257 genes were predicted, including six rRNAs, 53 tRNAs, and four ncRNAs. A total of 6,194 CDS were predicted, including 190 pseudogenes (Table 2). Circular views of the chromosome and seven plasmids are shown in Figs. 1 and 2. The KU Leuven variant, designated substrain MK1, lacked one of the two larger plasmids, i.e. pUSP2, and the three smaller plasmids pUSP3, pUSP4, and pUSP5. Apart from this difference in plasmid content, the shared genomes (chromosome, pUSP1, pBAM1, and pBAM2) of the two substrains have an average nucleotide identity of 99.9925%. To determine whether differences between the two assemblies were due to assembly or sequencing errors, trimmed Illumina datasets from both substrains were mapped to the KU Leuven substrain MK1 assembly in CLC Genomics Workbench (Qiagen, Hilden, Germany), and simple sequence variants (SNPs, small deletions and insertions) were detected using either of the two read mappings, with a cutoff of 35% consensus for variant calling. Almost all small sequence variants between the two substrains were explained by heterogeneity in the Illumina reads, indicating that, for both substrains, the cultures from which DNA was extracted were already genetically heterogeneous. For most small variants, stochasticity appears to determine the final sequence in the assemblies. Hence, the actual sequence similarity between the shared genomes of both substrains is even higher than 99.9925% (not considering the plasmids missing in MK1).
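The replicon sizes and gene counts reported above can be tallied with a short script. This is only an illustrative sketch: the names `REPLICONS_BP`, `SHARED`, `GENE_COUNTS`, and `total_size` are introduced here for convenience and are not part of the original analysis. The shared size uses the DK1 replicon lengths, so it only approximates the MK1 assembly, which may differ by small insertions or deletions.

```python
# Replicon sizes in bp, as reported in the text for substrain DK1.
REPLICONS_BP = {
    "chromosome": 5_301_518,
    "pUSP1": 367_423,
    "pUSP2": 365_485,
    "pUSP3": 97_029,
    "pUSP4": 64_122,
    "pUSP5": 31_577,
    "pBAM1": 40_559,
    "pBAM2": 53_893,
}

# Replicons present in both substrains (MK1 lacks pUSP2 to pUSP5).
SHARED = ("chromosome", "pUSP1", "pBAM1", "pBAM2")

# Predicted gene counts by type, as reported in the text.
GENE_COUNTS = {"CDS": 6194, "rRNA": 6, "tRNA": 53, "ncRNA": 4}


def total_size(names):
    """Sum the sizes (bp) of the named replicons."""
    return sum(REPLICONS_BP[name] for name in names)


if __name__ == "__main__":
    print("DK1 genome size (bp):", total_size(REPLICONS_BP))
    print("Shared replicons (bp, DK1 lengths):", total_size(SHARED))
    print("Total predicted genes:", sum(GENE_COUNTS.values()))
```

The gene-type counts sum to the 6,257 predicted genes stated in the text, which is a quick consistency check on the reported numbers.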
The BAM-catabolic genes were manually checked for mutations that could indicate differences in degradation potential. A single nucleotide change was noted in the bbdb3 gene on pBAM2, encoding one of three subunits of a TRAP-type transport system potentially involved in the uptake of 2,6-DCBA8. In this gene, a non-synonymous substitution has changed a glycine to an arginine in the resulting protein in MK1. It is currently not known whether this change affects the putative function of this tripartite transport system. Furthermore, differences were found in the region of plasmid pUSP1 containing an IS30 family insertion sequence with 38 bp flanking, imperfect, inverted repeats (IRs). The repeats appear complete in DK1, but MK1 shows deletions of 56 bp and 34 bp up- and downstream of the IS30 transposase gene, respectively, including partial deletion of the IR at both ends, suggesting that the MK1 substrain has undergone further genetic changes. The deletions flanking the IS30 element on pUSP1 in MK1 may have been caused by an intramolecular transposition event42. However, this IS30 element with deletions in the IRs in MK1 may still be functional, as the functional core region of the IS30 IRs is only part of the complete IR43.


Circular view of the chromosome of Aminobacter sp. MSH1. From outer to inner circle: CDS on leading strand, scale (ticks: 100 kb), CDS on lagging strand, tRNA (purple) and rRNA (red) (only chromosome), GC plot and GC skew (> 0: green, < 0: red). CDS are colored according to COG functional categories determined with EggNOG mapper 4.5.1.


Circular view of the plasmids of the newly assigned Aminobacter niigataensis MSH1. From outer to inner circle: CDS on leading strand, scale (ticks: 100 kb), CDS on lagging strand, GC plot and GC skew (> 0: green, < 0: red). CDS are colored according to COG functional categories determined with EggNOG mapper 4.5.1. The KU Leuven substrain MK1 lacks plasmids pUSP2-5.
Phylogenetic assignment of MSH1 to Aminobacter niigataensis
A phylogenetic tree based on the 16S rRNA gene sequence indicating the position of MSH1 is shown in Fig. 3. The 1463 bp 16S rRNA gene sequence of MSH1 is 100% identical to that of Aminobacter niigataensis DSM 7050T and 99.6–99.8% identical to those of other Aminobacter species. The relationship with Aminobacter niigataensis is supported by whole-genome in silico digital DNA:DNA hybridization using TYGS, which reports that MSH1 (Aarhus substrain DK1) is 82.5% (recommended d4 formula) similar to A. niigataensis DSM 7050 (see Supplementary Table S1 online). Finally, comparison against all available Aminobacter genomes from NCBI (complete and incomplete assemblies; downloaded January 31, 2021) showed an ANI of 98% against A. niigataensis DSM 7050 (Fig. 4). Based on these analyses, we reassign Aminobacter sp. MSH1 as Aminobacter niigataensis MSH1.


Phylogenetic relationships of Aminobacter niigataensis MSH1 based on the 16S rRNA gene sequence. Maximum likelihood tree visualized as a cladogram with bootstrap values. This tree was created from a clustal-omega31 multiple sequence alignment using 16S rRNA genes from the set of type strains available in the Phyllobacteriaceae family (NCBI accession numbers in parentheses). The tree was inferred using PhyML32 with a GTR substitution model and a calculation of branch support values (bootstrap value of 1000). Variovorax sp. strain WDL1 was used as an outgroup58.


Heatmap of ANI values for all available Aminobacter genomes from NCBI (downloaded January 31, 2021). Genomes are clustered using hierarchical clustering of ANI values, as implemented in the R59 package “pheatmap”35 (v1.0.12).
Chromosomally encoded metabolic features of MSH1
We examined the occurrence of basic metabolic pathways in MSH1 that might be of importance for its fate in bioaugmentation applications. The chromosome of MSH1 possesses all genes required for glycolysis using the Embden-Meyerhof pathway and additionally possesses all genes for glucose metabolism through the Entner-Doudoroff pathway and the pentose phosphate pathway. It also contains all genes of the tricarboxylic acid cycle. MSH1 was previously shown to grow on several carbohydrates, with slower growth on succinate and acetic acid as carbon sources than on glucose, fructose, and glycerol44. These properties might be important when supplying MSH1 with an auxiliary C-source for improved maintenance in the oligotrophic waters where it finds its main application13,15. On the other hand, MSH1 does not possess genes involved in carbon fixation, which rules out autotrophic growth using CO2. MSH1 further displays the catechol ortho-cleavage pathway45 and possesses genes for conversion of benzoate to catechol, allowing the organism to grow on benzoate, which was confirmed by culturing the strain on benzoate (data not shown). With regard to nitrogen metabolism, MSH1 contains a gene cluster that encodes the transmembrane ammonium channel AmtB as well as its cognate protein GlnK46, indicating that MSH1 can use mineral ammonia as a nitrogen source directly from its environment, in addition to nitrogen released from BAM degradation. In addition, MSH1 carries genes encoding proteins involved in nitrate transport (NrtA and NrtT), located upstream of genes for assimilatory reduction of nitrate (nasDEA) to ammonium, suggesting that MSH1 can also use nitrate as a nitrogen source. Ammonia is also released from amino acid metabolism and is further incorporated in L-glutamate for biosynthesis.
Furthermore, despite its inability to grow under nitrate-reducing conditions6, MSH1 contains a gene cluster which encodes several proteins (NapAB, NapC, NirK, NorBC) involved in dissimilatory nitrate reduction. However, narG, encoding the typical cytoplasmically oriented dissimilatory nitrate reductase47, is lacking, as is nosZ for reduction of nitrous oxide to dinitrogen48. The exact function of the gene cluster containing napABC, nirK and norBC is therefore as yet unknown. For sulfur metabolism, MSH1 possesses two closely located gene clusters encoding the ABC transporter complex CysUWA involved in sulfate/thiosulfate import, linked with either sbp or cysP49, which encode the periplasmic proteins that deliver sulfate and thiosulfate, respectively, to the ABC transporter complex, leading to high-affinity uptake. Furthermore, the chromosome contains all genes (cysC, cysH, cysIJ, cysK) necessary for assimilatory sulfate reduction to sulfide and incorporation of sulfide into O-acetylserine to form cysteine49. The assimilation of thiosulfate is less clear, but MSH1 encodes a second homologue of CysK, as well as several glutaredoxin proteins required for incorporating thiosulfate in O-acetylserine and for the reductive cleavage of its disulfide bond to form cysteine49.
Plasmids of MSH1
Besides the previously described IncP1-β and repABC plasmids, pBAM1 and pBAM27,8, the Aarhus MSH1 substrain DK1 harbors the five pUSP1-5 plasmids (Fig. 2), while the KU Leuven substrain MK1 lacks pUSP2, pUSP3, pUSP4, and pUSP5. Catabolic genes on pBAM1 and pBAM2 enable MSH1 to mineralize the groundwater micropollutant BAM and use it as a source of carbon, nitrogen, and energy for growth. The amidase BbdA on pBAM1 transforms BAM to 2,6-dichlorobenzoic acid (DCBA)7, which is further metabolized by a series of catabolic enzymes encoded by pBAM28,9. As previously discussed8, the gene bbdI, encoding the glutathione-dependent thiolytic dehalogenase responsible for removal of one of the chlorines from BAM, together with bbdJ, encoding glutathione reductase, occur on pBAM2 in three consecutive, perfect repeats followed by a fourth, imperfect repeat. This, together with the placement of the BAM degradation genes on two separate plasmids (pBAM1 and pBAM2) and the bordering of the catabolic gene clusters by remnants of insertion sequences and integrase genes, suggests that the BAM catabolic genes in MSH1 were acquired by horizontal gene transfer and then evolved into their observed genomic organisation. In addition, pBAM2 has a considerably lower GC content (56%) than the chromosome and the other plasmids, which range between 60.0 and 64.4% (Table 2). This could indicate that pBAM2 was acquired from another, unknown, unrelated bacterium. It was previously shown that mineralization of DCBA is a common trait in bacteria in sand filters and soils, while BAM to DCBA conversion is the rate-limiting step in BAM mineralization and is rare in microbial communities50.
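The GC content comparison above rests on a simple per-replicon calculation. The following is a minimal sketch of how such a percentage could be computed from a nucleotide sequence. The function name `gc_content` and the choice to ignore ambiguous bases are assumptions made for illustration, not the method used to produce Table 2.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous bases (e.g. N) are ignored, so the percentage is
    computed over unambiguous A, C, G, and T only (an assumption
    of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence, not taken from the MSH1 genome.
    print(f"{gc_content('GGCCAT'):.1f}% GC")
```

Applied to each replicon of an assembly, a function like this would give the per-replicon values that make a deviating plasmid such as pBAM2 stand out.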
Like pBAM2, plasmids pUSP1, pUSP2, and pUSP3 belong to the repABC family. repABC replicons are known as typical genome components of Alphaproteobacteria species51. The occurrence of more than one repABC replicon in the same genome has been described before, and the plasmid family has been shown to consist of different incompatibility groups. For instance, Rhizobium etli CFN42 has 6 repABC plasmids52,53.
Plasmids pBAM2, pUSP2, pUSP3, and pUSP4 contain Type IV secretion system (T4SS) genes54, while pUSP1 does not. This indicates that pUSP1 is likely not self-transferable, unlike pBAM2, pUSP2, pUSP3, and pUSP4. Besides T4SS genes, plasmid pUSP4 contains a mobABC operon. The 31.6 kbp plasmid pUSP5 lacks conjugative transfer genes and appears to be a mobilizable plasmid, with genes encoding a VirD4-like coupling protein and a TraA conjugative transfer relaxase likely involved in nicking at an oriT site and unwinding DNA before transfer. Furthermore, MOB-suite predicted that pBAM1, pUSP2, and pUSP4 have MOBP-type relaxase genes, while pUSP1, pUSP3 and pUSP5 have MOBQ-type relaxase genes.
Specialized functions of plasmids pUSP1-5
In Table 3, all CDS of the different plasmids are categorized according to COGs. Half of the CDS annotated on plasmid pUSP1 (322 CDS) and pUSP2 (346 CDS) are genes primarily associated with the transport and metabolism of amino acids (20% and 12%, resp.), carbohydrates (6% and 6%, resp.) and inorganic compounds (10% and 3%, resp.), and genes for energy production and conversion (9% and 8%, resp.). For the plasmids pUSP3, pUSP4, and pUSP5, CDS categorized under the same COGs account for less than 18%. Together, pUSP1 and pUSP2 account for about 17% of all genes in MSH1 related to amino acid and carbohydrate transport and metabolism, and energy production and conversion. The transport systems encoded by pUSP1 and pUSP2 include multiple ABC-transporters for N and/or S-containing organic compounds. For amino acid, carbohydrate and inorganic compound metabolism and transport, ABC-type transport systems are predicted for polar amino acids (arginine, glutamine), branched chain amino acids, and multiple sugars. In addition, transport systems for spermidine/putrescine, taurine, aliphatic sulphonates, dipeptides, beta-methyl galactoside, polysialic acid, and phosphate were predicted. Putative functions could be assigned by Prokka to 64.2%, 56.8%, 27.4%, 42.2%, and 34.3% of CDS for pUSP1, pUSP2, pUSP3, pUSP4 and pUSP5, respectively. On pUSP1, found in both MSH1 substrains, multiple genes could be assigned to metabolic subsystems by RAST. These include folate biosynthesis, cytochrome oxidases and reductases, degradation of aromatic compounds (homogentisate pathway), ammonia assimilation, and several genes related to amino acid metabolism. Some of these functions on pUSP1 do not have functional analogs on the chromosome, which may help to explain why pUSP1 was retained in the KU Leuven substrain MK1 while the other pUSP plasmids were lost.
On pUSP2, which is absent in substrain MK1 from KU Leuven, some genes are predicted to be involved in acetyl-CoA fermentation to butyrate, creatine degradation, metabolism of butanol, fatty acids, and nitrile, and a few miscellaneous functions. A large number of CDS on pUSP1 (19%) and pUSP2 (23%) are homologous to CDS on the chromosome and could be considered dispensable genes, which may help to explain the plasmid loss in the KU Leuven MK1 substrain. However, although these CDS might be considered homologues, their functionality might differ considerably in terms of substrate specificity and kinetics.
Besides genes encoding conjugative transfer, plasmid replication, and plasmid stability functions, most genes on plasmids pUSP3, pUSP4, and pUSP5 could not be annotated with a function. However, several genes on pUSP3 may have functions related to metabolism of sugars, including inositol and mannose, which were not tested in an earlier growth optimization experiment44. On pUSP4, genes encoding a transmembrane amino acid transporter are situated next to an aspartate ammonia-lyase-encoding gene that enables conversion between aspartate and fumarate, which may enter the tricarboxylic acid cycle, as described above. A cytochrome bd-type quinol oxidase, encoded by two subunit genes on pUSP5, also occurs in some nitrogen-fixing bacteria, where it is responsible for removing oxygen in microaerobic conditions55. Furthermore, a pseudoazurin type I blue copper electron-transfer protein that may act as an electron donor in a denitrification pathway is encoded by a gene on pUSP5. A chromate transporter, ChrA, encoded by a gene on pUSP5 may confer resistance to chromate56. Future studies should examine whether the lack of plasmids pUSP2-5 in substrain MK1 has phenotypic consequences with regard to the predicted functions, including metabolism of sugars and aspartate, nitrogen metabolism, and resistance to chromate.
Plasmid stability and chromosome polyploidy
The Illumina sequencing coverage (1000 bp windows) of several plasmids relative to the chromosome (except for pBAM1) was lower than one, i.e., approximately 0.3 to 0.6 per chromosome. This suggests either that not all cells (only three to six out of ten) contain a copy of the same plasmid due to plasmid loss, or that there are multiple copies of the chromosome. Previously, in the MK1 substrain, we observed that pBAM2 is not always perfectly inherited by the daughter cells in cultures grown in R2B and R2B containing BAM17. To determine whether plasmid instability explained the copy number relative to the chromosome in the sequenced cultures, sequencing was performed directly on the cryostock as well as on colonies directly derived from it, mimicking the cell preparation used for whole genome sequencing. We hypothesized that if certain plasmids are not stably inherited (i.e. those with copy numbers of 0.3 to 0.6), only part of the cell population will harbour those plasmids, and picking multiple colonies from a plate will yield some colonies that have lost one or more plasmids.
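The relative coverage values discussed above can be illustrated with a short sketch that averages per-base read depth in 1000 bp windows and divides by a chromosome baseline. The function names and the use of the median chromosome window as the baseline are assumptions made here for illustration. The text states only that coverage was computed in 1000 bp windows and normalized to chromosome coverage.

```python
from statistics import mean, median


def window_means(depth, window=1000):
    """Mean depth in consecutive, non-overlapping windows.

    The final window may be shorter than `window` and is kept as is.
    """
    return [mean(depth[i:i + window]) for i in range(0, len(depth), window)]


def normalized_coverage(replicon_depth, chromosome_depth, window=1000):
    """Window coverage of a replicon relative to the chromosome.

    Assumption of this sketch: the baseline is the median of the
    chromosome's window means.
    """
    baseline = median(window_means(chromosome_depth, window))
    if baseline == 0:
        raise ValueError("chromosome baseline coverage is zero")
    return [w / baseline for w in window_means(replicon_depth, window)]


if __name__ == "__main__":
    # Toy per-base depths, not real sequencing data.
    chromosome = [100] * 3000
    plasmid = [40] * 2000
    print(normalized_coverage(plasmid, chromosome))
```

In this toy case each plasmid window has 0.4 times the chromosome depth, which corresponds to the range of 0.3 to 0.6 described in the text.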
The Aarhus MSH1 substrain containing all 7 plasmids was sequenced directly from the cryostock (grown to stationary phase prior to cryopreservation), from a single colony picked from R2A plates after spreading the cryostock (incubated for 7 days), and from the R2B broth culture that had been inoculated with the same single colony from the cryostock (incubated for 72 h). Moreover, after spreading the latter R2B culture on an R2A plate and incubating it for 11 days, an additional 14 MSH1 colonies were picked for sequencing. All samples represent stationary phase cultures/colonies, considering that MSH1 was previously shown to reach stationary phase after 30–35 hours44. Given a plasmid coverage of 0.3–0.6 per chromosome, we expected that around half of the 14 picked colonies would have lost one or more of the plasmids in the case of poor inheritance. However, only one of the colonies showed loss of a plasmid, i.e., plasmid pUSP1 (Fig. 5), indicating polyploidy of the chromosome rather than unstable inheritance of plasmids. The loss of the repABC megaplasmid pUSP1 shows that the possible metabolic features encoded by genes on pUSP1, as described above, are not essential for growth under these conditions, although, remarkably, this is the only pUSP plasmid still present in substrain MK1. Interestingly, the plasmid/chromosome ratio varied according to the growth medium from which DNA was isolated. When growing in R2B (broth), e.g. as done for DNA extraction for genome sequencing and for the cryostock and R2B culture (first and third green rings, Supplementary Fig. S1 online), all plasmids, except pBAM1, have a copy number lower than one per chromosome. When DNA was extracted from colonies grown on R2A plates (though resuspended in PBS prior to DNA extraction), plasmid copy numbers were approx. one per chromosome, except for pBAM1, which has a copy number of approx. 2.5 per chromosome.
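The reasoning behind the colony-picking experiment can be made concrete with a simple binomial model. This is a sketch under stated assumptions rather than an analysis performed in the study. It treats the relative coverage of a single plasmid (0.3 or 0.6) as the fraction of cells that carry it, and it treats the 14 colonies as independent draws. The function name `prob_at_most` is introduced here for illustration.

```python
from math import comb


def prob_at_most(k, n, p_loss):
    """Binomial probability of at most k plasmid-free colonies out of n."""
    return sum(
        comb(n, i) * p_loss**i * (1 - p_loss) ** (n - i)
        for i in range(k + 1)
    )


N_COLONIES = 14  # colonies picked for sequencing, as stated in the text

if __name__ == "__main__":
    # Retained fractions taken from the coverage range in the text.
    for retained in (0.3, 0.6):
        p_loss = 1 - retained
        expected = N_COLONIES * p_loss
        p = prob_at_most(1, N_COLONIES, p_loss)
        print(
            f"retained={retained}: expected plasmid-free colonies="
            f"{expected:.1f}, P(at most 1 lost)={p:.4f}"
        )
```

Under these assumptions, even the most favourable case (60% of cells retaining a given plasmid) gives a probability below 1% of observing at most one plasmid-free colony among 14, which is why the observed single loss argues against poor inheritance.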


Coverage in 1000 bp windows of replicons normalized to chromosome coverage (NormCov). A NormCov of 1 indicates a single copy per chromosome of a replicon. A NormCov above 1 indicates that there are more copies of a given plasmid than the chromosome per cell. Points have been slightly jittered horizontally to improve visualization of overlaps.
Except for the single loss of pUSP1, nothing here indicates unstable maintenance of plasmids and subsequent loss. Instead, our results indicate that MSH1 regulates its chromosome copy number according to whether it grows planktonically or fixed on an agar plate. The results shown here can be explained by MSH1 being polyploid with regard to its chromosome when growing in broth media. Single-copy plasmids (e.g. pBAM2, pUSP1-5) will thereby have copy numbers lower than one relative to the chromosome when growing in R2B broth. Polyploidy in prokaryotes has been described before, including in Deinococcus, Borrelia, Azotobacter, Neisseria, Buchnera, and Desulfovibrio57, and may be overlooked in many other bacteria. E. coli in stationary phase was shown to have two chromosome copies after growing in rich, complex medium, but only 60% of the cells had two copies in stationary phase after slower growth in a synthetic medium57. It was suggested that monoploidy is not typical for proteobacteria, and that many bacteria are polyploid when growing in exponential phase57. Possible advantages offered by polyploidy include resistance to DNA damage and mutations and global regulation of gene expression by changing chromosome copy number. Finally, polyploidy may enable heterozygosity in bacteria, where genes mutate to cope with challenging conditions while a copy of the original genes is preserved. Despite the stability of the plasmids in MSH1, the KU Leuven substrain MK1 lacks plasmids pUSP2-5, and loss of pBAM2 was previously observed17. Although pBAM2 encodes its own T4SS, the multiple losses of pBAM2 and pUSP2-5 in the KU Leuven MSH1 could hypothetically be explained by some uncharacterized plasmid codependence, where one loss leads to another. The dynamics of plasmid loss that led to formation of the KU Leuven substrain MK1 are still unknown.

