Adenylation subdomain exchange via CRISPR-Cas9 to generate new enduracidin analogues
Enduracidin (Fig. 1a) is produced by Streptomyces fungicidicus ATCC 21013 and is closely related to ramoplanin (Fig. 1b), from Actinoplanes sp. ATCC 33076, which entered phase III clinical trials for the treatment of vancomycin-resistant Enterococcus30,31. A total synthesis of ramoplanin has been developed and used to produce variants with improved attributes, but it requires over 40 chemical steps31, which is not viable for drug development. Biosynthetic engineering would therefore be a far more attractive route than chemical synthesis for diversifying these lipopeptide antibiotics. However, genetic manipulation of the huge and repetitive 17-module NRPS systems that deliver them would be extremely challenging using the conventional double-crossover homologous recombination method16. Enduracidin biosynthesis requires four NRPSs (2.0 MDa) encoded by the endA–D genes, which together span 57 kb (Fig. 1a). The first of these, EndA, is relatively small and contains just two modules (Fig. 1a). This relatively low complexity made EndA a good candidate for initial testing of FSD swaps using CRISPR-Cas9, whilst also allowing a comparison with conventional recombineering methods under optimal conditions (Fig. 2). An FSD of 139 amino acid residues within the second A domain of EndA, which recognizes L-Thr, was identified27,32 and selected for replacement with the FSD from the L-Ser-selective A domain of EndC (Supplementary Fig. 1a). A successful exchange of these two FSDs, which share 88% identity and 89% similarity (Figs. 2a and b, Supplementary Table 2), should result in enduracidin with L-Ser in place of L-Thr at position 2. Making this change at the native chromosomal locus with CRISPR-Cas9 should limit any disruption to the integrity of the NRPS machinery and allow the mutant strains to be created more rapidly than with the laborious multi-step gene knockout/complementation approach.
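In sequence terms, the swap designed above amounts to splicing the donor FSD into the acceptor A-domain coding sequence between two cut sites, while keeping unchanged flanking sequence on either side to act as homology arms. The sketch below illustrates this with plain Python strings. The function names, the 0-based end-exclusive coordinates, the arm length and the toy sequences are hypothetical; it assumes the gene string starts at its start codon so that a codon-boundary check reduces to modulo arithmetic, and it is not the authors' construction protocol.

```python
def swap_subdomain(gene: str, start: int, end: int, donor_fsd: str) -> str:
    """Replace gene[start:end] (the acceptor FSD) with donor_fsd.

    Coordinates are 0-based and end-exclusive. The gene string is assumed
    to begin at its start codon, so codon boundaries are multiples of 3.
    """
    if not 0 <= start < end <= len(gene):
        raise ValueError("FSD coordinates out of range")
    if start % 3 or end % 3 or len(donor_fsd) % 3:
        raise ValueError("swap would shift the reading frame")
    return gene[:start] + donor_fsd + gene[end:]


def build_repair_cassette(gene: str, start: int, end: int,
                          donor_fsd: str, arm_len: int) -> str:
    """Return the swapped FSD flanked by arm_len nt of unchanged gene sequence."""
    if start < arm_len or end + arm_len > len(gene):
        raise ValueError("not enough flanking sequence for the homology arms")
    edited = swap_subdomain(gene, start, end, donor_fsd)
    new_end = start + len(donor_fsd)  # end of the donor FSD in the edited gene
    return edited[start - arm_len:new_end + arm_len]


# Toy sequences only: ATG, a 4-codon left flank, a 3-codon "FSD",
# a 4-codon right flank and a stop codon.
toy_gene = "ATG" + "AAA" * 4 + "CCC" * 3 + "GGG" * 4 + "TAA"
print(build_repair_cassette(toy_gene, 15, 24, "TTT" * 2, arm_len=6))
# prints AAAAAATTTTTTGGGGGG
```

Rejecting frame-shifting swaps up front is a deliberate design choice: an FSD exchange is only meaningful if the downstream T, C and subsequent module sequences stay in frame.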


a The structure of a ligand-bound GrsA Phe-A domain (PDB: 1AMU)32. The flavodoxin-like subdomain (FSD) is highlighted in yellow, and the ligands phenylalanine and adenosine monophosphate (AMP) are shown in blue and red, respectively. Residues HKGISNLKVFFENSLNV, which form an α-helix, have been removed to improve visibility of the ligands. b Alignment of the grsA Phe-A domain with the endA Thr[2]-A domain, with the secondary structural features indicated and nine of the ten amino acid substrate-binding residues (all except Lys517) marked with red asterisks20. The FSD sequence is highlighted in yellow, and the black triangles show the cut sites for the subdomain swap. c Single-step CRISPR-Cas9 strategy for exchanging subdomains at the native locus. sgRNA-guided Cas9 cleaves within the Thr[2] A domain of endA. DNA repair uses a plasmid-borne copy of endA carrying a Thr[2] to Ser[12] subdomain swap, resulting in seamless exchange of the Thr[2] FSD for the Ser[12] FSD at its native locus within the BGC. d Schematic of the conventional multi-step gene knockout and complementation strategy. The wild type endA is replaced with an apramycin resistance cassette. The resultant ∆endA strain is complemented with an integrative plasmid containing endA (Swap 1), in which the subdomain sequence of the Thr[2] A domain is replaced with the subdomain from the Ser[12] A domain. Insertion of endA (Swap 1) occurs at the ΦC31 site outside the enduracidin biosynthetic gene cluster (BGC).
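Panel c relies on an sgRNA whose target lies inside the FSD being replaced. pCRISPomyces-2 carries the Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3′ of a 20-nt protospacer and makes a blunt cut 3 bp upstream of the PAM. A minimal sketch of scanning both strands for such sites whose cut falls within a chosen window (for example, the FSD coordinates) might look like this. The function names and window convention are illustrative assumptions, and practical guide design would also score potential off-target matches across the genome. Provided the donor FSD differs sufficiently at the chosen target site, the edited allele should no longer match the guide and so escape re-cleavage.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, window_start: int, window_end: int,
                      spacer_len: int = 20) -> list[tuple[str, str, int]]:
    """List (strand, spacer 5'->3', cut) for NGG-PAM sites cutting in a window.

    cut is the 0-based forward-strand index of the base just 3' of the blunt
    break, made 3 bp upstream of the PAM; the window is [window_start, window_end).
    """
    seq = seq.upper()
    sites = []
    # Forward strand: 5'-[spacer][NGG]-3'; the lookahead finds overlapping PAMs.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam = m.start()
        cut = pam - 3
        if pam >= spacer_len and window_start <= cut < window_end:
            sites.append(("+", seq[pam - spacer_len:pam], cut))
    # Reverse strand: its NGG PAM reads as CCN on the forward strand,
    # followed by the reverse-complemented spacer.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        pam = m.start()
        cut = pam + 6
        if pam + 3 + spacer_len <= len(seq) and window_start <= cut < window_end:
            spacer = reverse_complement(seq[pam + 3:pam + 3 + spacer_len])
            sites.append(("-", spacer, cut))
    return sites
```

Reporting cut positions on the forward strand for both orientations lets the window test use the same FSD coordinates regardless of which strand the guide targets.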
A pCRISPomyces-2-based plasmid28 was constructed containing an sgRNA that directs a double-strand break within the targeted L-Thr FSD, together with a repair cassette comprising endA in which the native FSD is exchanged for the alternative L-Ser FSD from endC (Supplementary Table 1 and Supplementary Data 1). This cassette serves as the template for repairing the break by homologous recombination, installing the copy of endA that carries the desired FSD change (Swap 1, Fig. 2c, and Supplementary Table 2). Introducing the plasmid into the wild type enduracidin producer, S. fungicidicus, resulted in a clean exchange of the original FSD for the new L-Ser-selective FSD at the native locus, as determined by DNA sequencing (Supplementary Fig. 1b). RP-HPLC and LC-HRMS/MS analysis of the resulting mutant strain F1 (Thr to Ser) revealed production of new L-Ser-containing variants of enduracidins a and b (2a & 2b), with observed m/z 780.9827 (C106H139Cl2N26O31) and 785.6545 (C107H141Cl2N26O31) corresponding to the predicted [M + 3H]3+ ions (Fig. 3a). These variants were produced at a high titre of 65 mg/L, with only trace levels (<3%) of wild type enduracidins (1a & 1b) evident. The strong preference of this strain (>97%) for producing the L-Ser-containing enduracidins (2a & 2b) rather than the wild type compounds confirms that introducing the subdomain change at the native chromosomal locus is very effective. The traces of 1a & 1b detected in the mutant strain (F1) could be due to the structural similarity between Thr and Ser. To confirm the structures of the new products, 2a was purified from strain F1 and compared with 1a isolated from wild type S. fungicidicus ATCC 21013. Tandem MS analysis of 2a revealed fragment ions corresponding to a loss of CH2 relative to the equivalent ions of 1a (Fig. 3c and Supplementary Fig. 2).
Fragment ions not predicted to contain the altered amino acid were identical for all four compounds, indicating that the observed loss of CH2 stems from a Thr to Ser switch at the desired position of enduracidin. The 1H NMR spectrum of 1a includes a doublet at 1.39 ppm, corresponding to the methyl group of the Thr residue, which, as expected, is absent from the spectrum of 2a. In addition, a new NOESY correlation from the NH of residue 2 at 8.52 ppm to a methylene (CH2) signal at 4.06 ppm in 2a provides further support that the Thr to Ser exchange (Swap 1) was successful (Supplementary Figs. 3–6 and Supplementary Table 3).
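The mass shifts discussed above can be checked with simple arithmetic. For two ions of the same charge z, the neutral mass difference is z × Δ(m/z), because the added proton masses cancel. The sketch below applies this to the [M + 3H]3+ ions reported for 2a and 2b, whose formulas differ by exactly CH2 (14.01565 Da). The same reasoning underlies the Thr-to-Ser comparison between 1a and 2a, where singly charged fragment ions differ by CH2 directly. Atomic masses are standard monoisotopic values; the function name is an illustrative choice.

```python
# Monoisotopic masses (Da).
MASS_C12 = 12.0
MASS_H1 = 1.00782503
MASS_CH2 = MASS_C12 + 2 * MASS_H1  # 14.01565 Da


def neutral_mass_difference(mz_light: float, mz_heavy: float, charge: int) -> float:
    """Neutral mass difference between two ions carrying the same charge.

    For [M + zH]z+ ions, m/z = (M + z * m_proton) / z, so the proton
    contribution cancels and delta_M = z * delta(m/z).
    """
    return charge * (mz_heavy - mz_light)


# Observed [M + 3H]3+ ions of 2a and 2b from strain F1.
delta = neutral_mass_difference(780.9827, 785.6545, charge=3)
print(f"{delta:.4f} Da observed vs {MASS_CH2:.5f} Da for CH2")
```

The observed difference (≈14.015 Da) agrees with CH2 to well within 1 mDa, as does the pair reported for strain F1a (780.9831 and 785.6550).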


a LC-HRMS comparison between CRISPR-Cas9 editing and conventional gene complementation methods for introducing subdomain swaps. The F1a (Swap 1-complement) complementation strain produces mostly 1a & 1b, whereas the CRISPR-Cas9 edited strain S. fungicidicus F1 (Swap 1) produces Ser-containing enduracidins 2a & 2b. b LC-HRMS analysis of all productive mutants that generate new enduracidin analogues. c LC-HRMS/MS analysis of 1a and 2a confirms that Thr has been replaced with Ser in compound 2a. Fragment ions containing residue 2 are 14 mass units higher in 1a compared with 2a (methyl group of Thr). Loss of residue 2 results in fragment ions with identical m/z values between 1a and 2a.
A-domain FSD exchange via conventional gene deletion and complementation
Given that endA is small, we chose to repeat Swap 1 using a more conventional gene knockout and complementation strategy for comparison with the CRISPR-Cas9-mediated procedure (Fig. 2d). First, we deleted endA from the S. fungicidicus chromosome by introducing an apramycin resistance cassette (apraR) in its place (Supplementary Fig. 7). The resultant ∆endA strain was selected on apramycin-containing media and was unable to produce wild type enduracidins a and b (1a & 1b) (Fig. 3a). A wild type copy of endA under the control of the constitutive ermE* promoter was then introduced back into this strain (forming strain ∆endA::endA) using a vector that integrates at the ΦC31 site, distal to the end biosynthetic gene cluster (BGC) locus. This restored production of 1a & 1b, albeit at a lower level (14 mg/L) than in the wild type strain (108 mg/L). Next, a copy of endA engineered with the L-Ser FSD from endC (Swap 1) was introduced into the ∆endA strain via integration at the ΦC31 site. Surprisingly, the resulting strain (F1a) still produced wild type enduracidins (1a & 1b) as the major products (9 mg/L), despite possessing a mutant EndA identical in sequence to that produced by the CRISPR-engineered strain (F1). LC-HRMS analysis revealed additional products with m/z 780.9831 and 785.6550, corresponding to the predicted [M + 3H]3+ ions of the desired L-Ser-containing enduracidin variants (2a & 2b) (Fig. 3a). However, these variants were observed only at very low levels (ca. 8% of the level of 1a & 1b, as determined by LC-HRMS). Whole genome sequencing of strain F1a confirmed that the engineered endA gene was indeed inserted at a different chromosomal locus. Because endA is then expressed independently of the native end BGC, the assembly of the EndA variant and its interactions with the other NRPSs and gene products of the end BGC may be perturbed.
This, compounded by the selectivity of the downstream C-domain of EndB and any proofreading machinery, may result in a preference for incorporation of L-Thr over L-Ser, especially if peptide assembly is slowed by poorly timed gene expression and assembly of the NRPS machinery. The single-step CRISPR-Cas9 strategy not only delivered the intended amino acid substitution far more selectively than complementation but also gave a much higher titre (65 mg/L for F1 versus 9 mg/L for F1a), indicating that it is a much more efficient way of introducing A domain specificity changes than conventional means. Moreover, multi-step conventional approaches often leave scars (in this case, an apramycin resistance gene), which can severely limit subsequent genetic manipulation. The CRISPR-Cas9 approach, by contrast, is scarless, facilitating further strain manipulation and potentially enabling combinatorial FSD exchanges.
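The contrast between strains F1 and F1a is easiest to see when both are expressed as selectivity. Assuming selectivity is the engineered product's share of total enduracidin (new plus wild type), which is how the "<3% wild type, >97% selectivity" figures for F1 read, a ratio such as the ca. 8% reported for F1a converts as follows. The function name is hypothetical.

```python
def selectivity(engineered: float, parent: float) -> float:
    """Engineered product as a percentage of total (engineered + parent) enduracidin."""
    if engineered < 0 or parent < 0 or engineered + parent == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return 100.0 * engineered / (engineered + parent)


# F1a: variants at ca. 8% of the level of 1a & 1b (relative amounts).
print(f"F1a selectivity: {selectivity(8, 100):.1f}%")
# F1: <3% wild type enduracidins remaining.
print(f"F1 selectivity: {selectivity(97, 3):.1f}%")
```

A variant present at 8% of the parent level therefore corresponds to a selectivity of about 7.4%, compared with about 97% for the CRISPR-edited strain.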
Exploring further FSD changes
We next sought to explore the broader applicability of CRISPR-Cas9 subdomain swapping. Genes endB and endC, encoding six and eight NRPS modules, respectively, are significantly larger than endA and contain multiple repeating sequences. Knocking out endB or endC and complementing the mutants with engineered variants possessing FSD swaps would present a significant challenge and may not be achievable with standard methodologies. To address this, three modules within EndC, the largest enduracidin NRPS, were targeted for FSD changes by CRISPR-Cas9 (Ser[12], Gly[14] and Ala[16]). Bioinformatics analysis of the A domain sequences of both the enduracidin and ramoplanin NRPSs30,33,34 (Fig. 1) guided the design of nine additional constructs for FSD replacement (Fig. 4 and Supplementary Table 2). First, the complementary exchange to Swap 1, in which the L-Ser FSD of EndC was swapped for the L-Thr FSD from EndA, was performed in an analogous fashion. This exchange (Swap 2) resulted in two new enduracidin products, 3a and 3b, with L-Thr in place of the original L-Ser residue at position 12. The change was confirmed by LC-HRMS and LC-HRMS/MS (Fig. 3b and Supplementary Fig. 8). Only minor quantities of wild type enduracidins 1a and 1b were evident, with the desired new compounds (3a & 3b) produced at a selectivity of 91% over the wild type. To explore whether subdomains from a different gene cluster and organism could be introduced, three FSDs from the ramoplanin NRPS that are selective for L-allo-Thr were exchanged with the L-Ser FSD of EndC (Swaps 3–5). The selected replacement FSDs came from the modules that incorporate allo-Thr at positions 5, 8 and 12 of ramoplanin, respectively (Fig. 1)31,33,34. All three resulting swaps generated two new products (4a & 4b), consistent with a Ser to allo-Thr exchange, with high selectivity: little or no 1a and 1b could be detected.
Although the new products (4a & 4b) have masses and MS/MS fragmentation patterns identical to those of the Thr-containing 3a and 3b, LC-HRMS showed shifted retention times, as expected for diastereoisomers (Fig. 3b).


Most of the subdomain swaps were designed for endC, the largest NRPS gene, which contains multiple repeating sequences and therefore presents a significant challenge for engineering by conventional homologous recombination methodologies. *Swap via gene complementation. † Titre of engineered variants (2–6) relative to the titre of 1 produced by the wild type. ‡ Selectivity for new products (2–6) over parent enduracidin (1) in the engineered NRPS. Coloured arrows indicate the different amino acid swaps: green (Thr[2] to Ser), blue (Ser[12] to Thr), pink (Ser[12] to aThr), brown (Ser[12] to Ala), purple (Gly[14] to Ala) and yellow (Ala[16] to Gly). Source data for yield and selectivity are provided with this paper.
Two further exchanges (Swaps 6 and 7) were then attempted in which the L-Ser FSD of EndC was replaced with Ala-activating FSDs from either EndC or the ramoplanin NRPS Ramo14. However, LC-HRMS analysis showed that these swaps failed to produce the Ala-containing variants, and enduracidin production was abolished in both mutants (Supplementary Fig. 9). The FSD swaps in these two strains may have disrupted the activity of the A domain, preventing activation of Ala. Alternatively, strict specificity of the downstream C-domain could have prevented processing of the Ala-containing peptide intermediate14,15,35. The same Ala FSD from EndC was, however, successfully used to replace the Gly FSD of EndC (Swap 8), resulting in new enduracidin a and b analogues with Ala in place of Gly (5a & 5b) (Fig. 3b). The reverse replacement of the Ala FSD of EndC with the Gly-activating FSDs from both EndC and Ramo14 (Swaps 9 and 10) was also successful, resulting in production of new enduracidin a and b variants consistent with a switch of Ala to Gly (6a & 6b) (Fig. 3b). Production of analogues 5a, 5b, 6a and 6b was, however, accompanied by 12–39% of the wild type enduracidins 1a & 1b. This indicates an incomplete switch in selectivity, which may be due to the close structural similarity between alanine and glycine. To confirm the structures of the new engineered products, the most abundant enduracidin a variants (3a, 4a, 5a & 6a) were isolated and subjected to tandem MS analysis; diagnostic fragment ions assigned from the fragmentation patterns confirmed the positions of the swapped amino acid residues (Supplementary Figs. 8 and 10–15).
To demonstrate the broader applicability of this method, seven additional FSD swap mutants were constructed using FSDs from diverse NRPSs of different bacterial strains (Fig. 5 and Supplementary Table 1). First, one swap mutant was constructed for each of the positions discussed above (Thr[2], Ser[12], Gly[14] and Ala[16]): Swap 11 (Thr[2] to Ser) used a Ser-selective FSD from the calcium-dependent antibiotic (CDA)36 NRPS of Streptomyces coelicolor M145; Swap 12 (Ser[12] to Thr) used a predicted Thr-activating FSD from a streptobactin-like37 BGC in Streptomyces griseolus NRRL 3739; Swap 13 (Gly[14] to Ala) employed an FSD predicted to activate Ala from a tyrobetaine-like38 BGC in Streptomyces rimosus subsp. paromomycinus NRRL 2455; and Swap 14 (Ala[16] to Gly) was constructed with a predicted Gly-selective FSD from a lipopeptide 8D1-1 BGC39 found in Streptomyces rochei NRRL B1559. antiSMASH40 was used to identify candidate FSDs for Swaps 12, 13 and 14 from NRPSs that exhibit high similarity to previously well-characterized NRPS pathways37,38,39. Among these four mutants, only Swap 12 generated the desired enduracidin variants (3a & 3b), with a selectivity of >99% and only traces of the wild type enduracidins detected (Fig. 5). Production was abolished in Swaps 13 and 14, while Swap 11 showed only traces (<1%) of wild type enduracidin 1a. That only Swap 12 produced the expected variants is most likely due to the high sequence identity (64%) and similarity (72%) between the exchanged FSDs. In comparison, the FSDs exchanged in Swaps 11, 13 and 14 share lower identity (29–44%) and similarity (40–59%) (Supplementary Table 2).
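The identity and similarity values compared above (Supplementary Table 2) come from pairwise alignments of the exchanged FSDs. As a rough sketch, assuming the two FSD sequences are already aligned (gaps written as "-"), identity is the share of alignment columns with the same residue, and similarity additionally counts conservative substitutions. Alignment tools such as EMBOSS Needle typically count a pair as similar when its substitution-matrix score (e.g. BLOSUM62) is positive; the simple residue groups below are a hypothetical stand-in, so values computed this way will differ from those reported.

```python
# Hypothetical conservative-substitution groups (an assumption, not BLOSUM62).
GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def identity_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Percent identity and similarity over the full alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count as neither identical nor similar
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    length = len(aln_a)  # denominator includes gap columns
    return 100.0 * identical / length, 100.0 * similar / length


# Toy aligned fragments, not real FSD sequences.
print(identity_similarity("HKGISNLKV", "HRGLTNLKV"))
```

In the toy example six of nine columns are identical and the remaining three (K/R, I/L, S/T) are conservative, giving about 66.7% identity and 100% similarity.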


a List of additional swap mutants constructed using FSDs from various NRPSs to demonstrate the broad applicability of this method. b Combined extracted ion chromatograms (EICs) of enduracidin a and b analogues from LC-HRMS analysis of extracts from the additional S. fungicidicus strains (F12 and F15–F17) used in this study (normalized to 100%). The product of Swap 16 had a retention time consistent with incorporation of allo-Thr. Source data for yield and selectivity are provided with this paper. *Natural products and FSDs were predicted using antiSMASH.
Based on these results, further FSD exchanges were implemented: Swap 15 (Ser[12] to Thr), using an FSD from an NRPS of a Streptomyces rochei NRRL B1559 BGC that antiSMASH predicts to produce antimycin41; Swap 16 (Ser[12] to Thr), introducing an FSD derived from the known pristinamycin pathway42 in Streptomyces sp. DSM 40338; and finally Swap 17 (Ser[12] to allo-Thr), using an FSD from a BGC of Pseudomonas syringae DSM 10604 that is known to produce syringafactin43. All three mutants generated new enduracidin analogues, as expected from the high sequence similarity between the exchanged FSDs (Supplementary Table 2). Swap 15 produced variants 3a & 3b, and Swaps 16 and 17 gave compounds 4a & 4b, at >99% selectivity (Fig. 5). Of the 17 FSD exchanges performed, 12 were successful in producing enduracidin variants. This provides a guideline for future engineering: successful swaps occurred between FSDs with >55% identity and >65% similarity (Supplementary Table 2).
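The guideline can be expressed as a simple screening rule for shortlisting donor FSDs. The sketch below encodes the thresholds stated above and checks them against identity and similarity values reported in this section, using the upper ends of the reported ranges for Swaps 11, 13 and 14. The function name is hypothetical, and because the rule was derived from 17 exchanges in a single host it is best treated as a heuristic rather than a guarantee of success.

```python
def meets_guideline(identity: float, similarity: float,
                    min_identity: float = 55.0,
                    min_similarity: float = 65.0) -> bool:
    """Empirical screen: True if an FSD pair exceeds both thresholds (in %)."""
    return identity > min_identity and similarity > min_similarity


# (identity %, similarity %, enduracidin variant produced?) as reported in the text.
reported = {
    "Swap 1": (88, 89, True),
    "Swap 12": (64, 72, True),
    "Swaps 11/13/14 (upper bound of ranges)": (44, 59, False),
}
for name, (ident, sim, produced) in reported.items():
    print(f"{name}: predicted {meets_guideline(ident, sim)}, observed {produced}")
```

Strict inequalities mirror the ">55%" and ">65%" wording of the guideline; candidates lying exactly on a threshold are therefore not flagged as likely to succeed.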
Effect of FSD changes on production titre
To quantify the effects of the subdomain changes on overall enduracidin titre, we cultivated all the subdomain swap mutants and wild type S. fungicidicus under identical conditions. Because Streptomyces strains are known to be highly susceptible to variations in culture conditions, ten replicates per strain were prepared and cultured simultaneously to minimize disparities in culture time and temperature. To accommodate parallel cultivation of a large number of strains and replicates, the fermentation conditions were optimized for higher throughput, which involved changing media components and reducing culture volumes. Under these new conditions, wild type S. fungicidicus produced significantly lower enduracidin titres (2.02 ± 0.45 mg/L, 1a & 1b) than under the previous conditions optimized for production of 1a and 1b (108 mg/L) (Fig. 6, Supplementary Fig. 16 and Supplementary Table 4). However, the new conditions did allow a reliable comparison between strains. Notably, several mutant strains produced enduracidin variants in yields approaching that of the wild type: strains F1 (Thr[2] to Ser, Swap 1), F2 (Ser[12] to Thr, Swap 2) and F10 (Ala[16] to Gly, Swap 10) produced variants 2a & 2b, 3a & 3b and 6a & 6b at 72%, 55% and 82% of the wild type titre, respectively. The three strains (F3–F5) containing Ser[12] to allo-Thr exchanges (Swaps 3–5) produced variants 4a and 4b in lower relative yields (16–32% of the wild type). Strains F8 (Gly[14] to Ala, Swap 8) and F9 (Ala[16] to Gly, Swap 9) also gave reduced yields of 16–20% (Fig. 4, Fig. 6 and Supplementary Table 4). Strains F12 and F15–F17, which were generated with FSDs from NRPSs of more diverse Streptomyces and Pseudomonas species, gave similar yields of engineered enduracidins (3a & 3b or 4a & 4b), ranging from 16% to 47% of the wild type (Fig. 6 and Supplementary Table 4).
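The comparisons above, and the mean ± standard error values plotted in Fig. 6, rest on two small calculations: the standard error of the mean over replicate flasks (sample standard deviation divided by √n) and the titre of each variant relative to the wild type grown in the same batch. The sketch below assumes combined a + b titres per flask in mg/L. The helper names and the toy replicate values are hypothetical and are not data from this study.

```python
import math
import statistics


def mean_and_sem(titres: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)) of replicate titres."""
    if len(titres) < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    return statistics.mean(titres), statistics.stdev(titres) / math.sqrt(len(titres))


def relative_titre(variant_mean: float, wild_type_mean: float) -> float:
    """Variant titre as a percentage of the wild type titre from the same batch."""
    return 100.0 * variant_mean / wild_type_mean


# Toy replicate values (mg/L, combined a + b), NOT data from this study.
mean, sem = mean_and_sem([1.0, 2.0, 3.0])
print(f"{mean:.2f} ± {sem:.2f} mg/L")
# A variant at 72% of the 2.02 mg/L wild type corresponds to about 1.45 mg/L.
print(f"{relative_titre(0.72 * 2.02, 2.02):.0f}%")
```

Normalizing to the wild type from the same batch matters here because the two batches in Fig. 6 were cultured separately, so absolute titres are not directly comparable between them.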


Production levels (mg/L) of engineered enduracidins from FSD swaps compared with wild type enduracidin (1a & 1b). Non-filled bars represent a batch of ten replicates (n = 10) for Swaps 1–5 and Swaps 8–10. Shaded bars represent a different batch of cultures carried out in triplicate (n = 3) for Swaps 12, 15, 16 and 17. Data are presented as mean values ± standard error, with each data point overlaid on the graph as a dot plot. The wild type strain is included in both batches as a positive control. Production for Swap 10 was calculated as the average of eight samples, omitting two statistical outliers (flasks 1 and 5). Because the ratio of enduracidin a and b variants produced by wild type and engineered strains varies, the combined yields (a & b) are reported in each case to allow a clearer comparison of the productivity of each strain. Each bar is colour-coded by the analogues produced: green (2a & 2b), blue (3a & 3b), pink (4a & 4b), purple (5a & 5b) and yellow (6a & 6b). Source data are provided with this paper.
Whole genome sequencing to evaluate off-target effects of CRISPR-Cas9 editing
To establish whether CRISPR-Cas9 gene editing caused any off-target effects that might have affected the mutant strains' ability to produce enduracidin, six mutant strains were subjected to PacBio whole genome sequencing. The wild type strain was also sequenced, and the mutant genomes were aligned with it to identify any off-target mutations. The sequenced mutants spanned a range of enduracidin production levels, from relatively high (F1, F2 and F10) to none (F7). All of the mutant strains carried the correct swapped subdomains, as previously confirmed by DNA sequencing. In addition, none of the sequenced strains possessed any other mutations in the enduracidin gene cluster, indicating that CRISPR-Cas9 editing of the NRPS genes did not cause the rearrangements that have been observed after genetic manipulation of other genes encoding related megasynthases17,18,19. Most mutant strains did carry a number of single-nucleotide variants (SNVs) or small insertions or deletions (indels) elsewhere in the genome (Supplementary Table 5). The three strains with the highest enduracidin titres (F1, F2 and F10) exhibited the fewest mutations across their genomes, with F2 having no mutations at all other than the FSD swap. The strain that produced no enduracidin (F7) also had a relatively low number of mutations (2 indels and 1 SNV), making off-target effects of the CRISPR-Cas9 editing an unlikely explanation for the lack of production in this strain. The other two sequenced strains (F8 and F9) had relatively low enduracidin production levels and more mutations within the genome (14 for F8 and 7 for F9), although none of these appears to explain the lower titres, so it is most likely that the swapped subdomains have reduced the efficiency of these NRPSs.
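The comparison described above can be sketched as a small filtering step. Given variants called in each mutant against the wild type assembly, skip the intended FSD swap, count the remaining changes as SNVs or indels, and flag any that overlap the enduracidin BGC. The sketch assumes VCF-style records with 1-based positions; the class and function names, coordinates and example calls are hypothetical and do not reproduce the study's variant-calling pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    contig: str
    pos: int  # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str


def variant_type(v: Variant) -> str:
    if len(v.ref) == len(v.alt) == 1:
        return "SNV"
    if len(v.ref) != len(v.alt):
        return "indel"
    return "MNV"  # equal-length multi-base substitution


def off_target_summary(variants: list[Variant], contig: str,
                       bgc_start: int, bgc_end: int,
                       edit_start: int, edit_end: int):
    """Count non-intended variants by type and list those overlapping the BGC.

    Variants lying entirely inside the intended edit [edit_start, edit_end]
    are skipped; all coordinates are 1-based and inclusive.
    """
    counts = {"SNV": 0, "indel": 0, "MNV": 0}
    in_bgc = []
    for v in variants:
        last = v.pos + len(v.ref) - 1
        if v.contig == contig and edit_start <= v.pos and last <= edit_end:
            continue  # the intended FSD swap, not an off-target change
        counts[variant_type(v)] += 1
        if v.contig == contig and v.pos <= bgc_end and last >= bgc_start:
            in_bgc.append(v)
    return counts, in_bgc
```

Excluding the intended edit before counting keeps the swap itself from being reported as a mutation, which is why a strain such as F2 can show no mutations other than the FSD exchange.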

