Identification of the angustmycin gene cluster from S. angustmyceticus JCM 4053
To identify the gene cluster for angustmycin biosynthesis, we investigated the genomes of two independent angustmycin producer strains, Streptomyces angustmyceticus JCM 4053 and Streptomyces decoyicus NRRL 2666, whose ability to produce angustmycins was confirmed by liquid chromatography-mass spectrometry (LC-MS) (Supplementary Fig. 1a–d). Since adenine is the direct precursor for the corresponding adenosyl motif of angustmycins13, we initially used Ari9 (the adenosine phosphorylase in the aristeromycin pathway) as a query sequence to perform BlastP analysis against the S. angustmyceticus JCM 4053 genome (accession no. CP082945, BLAG01000004). This identified a homolog with 45% identity to Ari9, but the surrounding genes (locus tag, K7396_21205-K7396_21225; accession no. CP082945) are apparently unrelated to angustmycin biosynthesis. We deduced that angustmycin biosynthesis may use a biosynthetic strategy similar to that of trans-zeatin to provide the purine-related moiety28, and we subsequently used the enzyme LOG (Lonely guy; accession no. AK071695) as the query sequence to conduct BlastP analysis (Supplementary Fig. 1e), leading to the discovery of five candidate enzymes. The surrounding regions of these candidates revealed a gene cluster (agm, accession no. MZ151497) that encodes enzymes related to allulose sugar biosynthesis and assembly. Furthermore, using AgmA as the query sequence led to the identification of a homologous gene cluster (accession no. MZ151498) in the S. decoyicus NRRL 2666 genome (accession no. CP082301) with identical organization and significant homology. Hence, these initial findings suggested that the target gene cluster (agm) may be directly responsible for angustmycin biosynthesis (Fig. 2a).


a Genetic organization of the angustmycin biosynthetic gene cluster. The agm gene cluster is responsible for angustmycin biosynthesis in S. angustmyceticus JCM 4053 and S. decoyicus NRRL 2666. The two gene clusters in different strains harbor significant homologies, and the gray shaded bars represent the identities of the two corresponding enzymes encoded by the two gene clusters. b Bioassays of the metabolites produced by related recombinants of S. coelicolor M1154. The indicator strain is Mycobacterium smegmatis mc2155. The Roman numerals I, II, III, IV, and V represent JCM 4053, pCHW501, pCHW501ΔagmF, pCHW501ΔagmR, and pSET152, respectively, and the denotations of them correspond to those shown in (c). c HPLC analysis (λ = 254 nm) of the metabolites produced by related recombinants of S. coelicolor M1154. JCM 4053, 1 and 2 produced by S. angustmyceticus JCM 4053 as positive controls; pCHW501, the metabolites of the recombinant S. coelicolor M1154 containing pCHW501; pCHW501ΔagmF, the metabolites of the recombinant S. coelicolor M1154 containing pCHW501ΔagmF; pCHW501ΔagmR, the metabolites of the recombinant S. coelicolor M1154 containing pCHW501ΔagmR; pSET152, the metabolites of the recombinant S. coelicolor M1154 containing pSET152 as a negative control. Allulose 6-P, Allulose 6-phosphate.
To correlate the agm gene cluster to angustmycin biosynthesis, the gene cluster from S. angustmyceticus JCM 4053 was directly cloned by two-fragment PCR coupled with Gibson assembly. The subsequent gene cluster and its variant (pCHW501 and pCHW501ΔagmF) were individually introduced into Streptomyces coelicolor M1154 (Supplementary Fig. 2a, b). After confirmation, the resultant conjugants were fermented for metabolite analysis. A bioassay indicated that the metabolite samples of M1154::pCHW501 and JCM 4053 show apparent inhibition against the indicator strain Mycobacterium smegmatis, while the samples of M1154::pCHW501ΔagmF and M1154::pSET152 lacked the relevant bioactivity (Fig. 2b). High-performance liquid chromatography (HPLC) analysis indicated that the sample from S. coelicolor M1154::pCHW501 generated target peaks corresponding to those of 2 and 1 of JCM 4053 metabolites, while these peaks were absent from the metabolites of the negative control (S. coelicolor M1154::pSET152). LC-MS analysis revealed that the target peaks of M1154::pCHW501 gave rise to distinctive [M + H]+ ions at m/z 298.1144 (with major fragment ions at m/z 280.0587 and 136.0617) and m/z 280.1040 (with fragment ions at m/z 262.0929 and 136.0618), which are fully consistent with the expected fragmentation patterns of 2 and 1 (Supplementary Fig. 3b, c). More interestingly, the metabolites recovered from M1154::pCHW501ΔagmF selectively produced the 2 peak, but not 1, suggesting that the agmF gene is likely responsible for the conversion of 2 to 1 (Fig. 2c and Supplementary Fig. 3a).
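As a consistency check on the LC-MS assignments above, the theoretical monoisotopic [M + H]+ values can be computed from elemental compositions and compared with the observed ions. The sketch below is illustrative only: it assumes the literature formulas C11H15N5O5 for 2 (angustmycin C), C11H13N5O4 for 1 (angustmycin A), and C5H5N5 for the adenine fragment, none of which is stated in the text, and the function and variable names are our own.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # mass of H+ (hydrogen atom minus one electron)

# Assumed elemental compositions (literature formulas, not given in the text).
FORMULAS = {
    "2": {"C": 11, "H": 15, "N": 5, "O": 5},   # angustmycin C
    "1": {"C": 11, "H": 13, "N": 5, "O": 4},   # angustmycin A
    "adenine": {"C": 5, "H": 5, "N": 5},       # common fragment ion
}

# Observed ions quoted in the text for M1154::pCHW501.
OBSERVED = {"2": 298.1144, "1": 280.1040, "adenine": 136.0617}


def mz_protonated(formula):
    """Theoretical m/z of the singly protonated species [M + H]+."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    return neutral + PROTON


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


for name, formula in FORMULAS.items():
    theo = mz_protonated(formula)
    err = ppm_error(OBSERVED[name], theo)
    print(f"{name}: theoretical {theo:.4f}, observed {OBSERVED[name]:.4f}, {err:+.2f} ppm")

# The mass difference between 2 and 1 should equal one water molecule.
water = 2 * MONOISOTOPIC["H"] + MONOISOTOPIC["O"]
delta = mz_protonated(FORMULAS["2"]) - mz_protonated(FORMULAS["1"])
print(f"2 - 1 = {delta:.4f} Da; H2O = {water:.4f} Da")
```

Under these assumed formulas, the difference between the two parent ions corresponds to the loss of one water molecule, in line with the proposed dehydration of 2 to 1.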
To establish the identities of the target metabolites accumulated by M1154::pCHW501 and M1154::pCHW501ΔagmF, they were purified by HPLC for 1D and 2D nuclear magnetic resonance (NMR) analysis. The proton-NMR (1H-NMR) and heteronuclear multiple quantum coherence NMR spectra of the metabolite from M1154::pCHW501ΔagmF displayed two methylenes, five methines (including two sp2 carbons), and four quaternary carbons (including three sp3). Detailed analysis of the 1D and 2D NMR data led to the identification of this metabolite as 2, which was further confirmed by 1H-1H correlation spectroscopy (COSY) correlations of H-3′‒H-4′‒H-5′‒H-6′, and by heteronuclear multiple bond correlations (HMBC) of H-1′ with C-2′ and C-3′; H-8 with C-4 and C-5; and H-2 with C-4, C-5, and C-6 (Supplementary Figs. 4 and 5 and Supplementary Table 1). Comparison of the 1D and 2D NMR data of the M1154::pCHW501 metabolites with those of 2 suggested structural similarities, except that the hydroxymethyl group at C-5′ in 2 was replaced by an exocyclic double bond. This was further confirmed by HMBC correlations from H-6′ to C-4′ and C-5′ (Supplementary Figs. 6 and 7 and Supplementary Table 2). Accordingly, the structures of the target metabolites were identical to those of 2 and 1 as shown in Fig. 1. Taken together, these data also demonstrated that the target gene cluster is responsible for the biosynthesis of angustmycin.
The nine-gene cluster agm is required for angustmycin biosynthesis
In silico analysis revealed a target 9.8 kb region (pCHW501) containing nine genes that are deduced to be involved in angustmycin biosynthesis. Of these, agmA-F, constituting the structural genes, are important for angustmycin biosynthesis on the basis of genetic and in silico analysis, and their functions are described in Supplementary Table 3. Moreover, AgmT1 and AgmT2 are annotated as MFS (major facilitator superfamily) transporters. In general, MFS proteins facilitate the transport of a variety of substrates across cytoplasmic or internal membranes29, and therefore, AgmT1 and AgmT2 are proposed to play the functional role of exporting angustmycins out of cells. AgmR shows 50% identity to A6A06_38595 (LacI family regulator) of Streptomyces sp. CB02923, and may have a regulatory role by binding to the promoter region during angustmycin biosynthesis. To establish the role of agmR, the pCHW501ΔagmR variant was conjugated into S. coelicolor M1154, and the resultant recombinant was then fermented for metabolite analysis (Supplementary Fig. 2c and Supplementary Table 4). HPLC analysis indicated that the sample from M1154::pCHW501ΔagmR was capable of producing two peaks corresponding to 2 (major component) and 1, which are quantitatively different from that of M1154::pCHW501, implying that there may be an endogenous regulator in S. coelicolor M1154, which also participates in upregulating the transcriptional level of the biosynthetic genes to control the ratio of 2 and 1 (Fig. 2c and Supplementary Fig. 3a, d). Related studies to investigate the regulatory mechanism during angustmycin biosynthesis are now underway in our laboratory.
AgmA functions as an AMP phosphoribohydrolase
As indicated by in silico analysis, AgmA shows considerable homology (54% identity) to the LOG gene of Oryza sativa Japonica Group (japonicum cultivar rice), which is a well-characterized phosphoribohydrolase involved in trans-zeatin biosynthesis, so we proposed that AgmA performs a similar function in angustmycin biosynthesis (Supplementary Fig. 8a). To determine if AgmA indeed fulfills this role, it was overexpressed and purified to near homogeneity from E. coli BL21(DE3). The in vitro activity was tested using AMP/dAMP as substrate. Consistent with our expectation, the HPLC results indicated that AgmA is capable of consuming AMP/dAMP as a substrate to produce a characteristic peak for adenine, which was absent from the negative control (Fig. 3a and Supplementary Fig. 8b–f). Further LC-MS analysis showed that the target peak gave rise to a distinctive [M + H]+ ion for adenine at m/z 136.0613 (with major fragment ions at m/z 118.8611 and 93.8582) and m/z 136.0613 (with major fragment ions at m/z 118.8811 and 93.9553), fully corresponding with those of the adenine authentic standard (Supplementary Fig. 9a–c). We then determined the specificity of AgmA against various other substrates related to AMP, including ATP, ADP, dATP, dADP, adenosine, and deoxyadenosine. The HPLC results from these experiments indicated that the AgmA enzyme could also recognize ATP, ADP, and dADP as substrates, but with lower relative activity (Supplementary Fig. 9d). These data establish that AgmA functions as an AMP phosphoribohydrolase for the supply of adenine.


a HPLC traces (λ = 254 nm) of the AgmA reactions using AMP or dAMP as the substrate. Std, the authentic standards of AMP, dAMP, and adenine; AgmA(AMP), AgmA reaction with AMP as substrate; -AgmA(AMP), reaction using AMP as substrate but without AgmA added; AgmA(dAMP), AgmA reaction with dAMP as substrate; -AgmA(dAMP), reaction using dAMP as substrate but without AgmA added; -AMP/dAMP, AgmA reaction without AMP or dAMP added. b Sequence similarity network (SSN) of 9058 AgmA homologs generated by Cytoscape (v3.8.2). Each node in the network represents a protein sequence and the alignment score is 70. AgmA falls into the actinobacteria cluster and is highlighted by a yellow point.
The LOG family of cytokinin-activating enzymes (InterPro: IPR005269), as revealed by BlastP analysis, is widespread among the three kingdoms of life (with ca. 9000 sequences retrieved from the UniProt database). These sequences were incorporated using an enzyme similarity tool (EFI-EST) to generate a sequence similarity network (SSN)30 with the UniRef90 database based on an alignment score of 80 (Fig. 3b). These results indicated that AgmA homologs are more widely distributed in bacterial genomes, and also in plant genomes. The related enzymes of actinobacterial origin, including the AgmA described in this study, cluster in the network (highlighted in red). These cytokinin-activating family proteins commonly contain a conserved PGGxGTxxE motif, which differs from the canonical lysine decarboxylases31. Their proposed catalytic activity in plants is to convert inactive cytokinin nucleotides to active forms. Foreseeably, the AgmA-related enzyme may be of considerable value for plant cytokinin studies and applications.
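The clustering idea behind a sequence similarity network can be sketched in a few lines: each sequence is a node, an edge joins two sequences whose pairwise alignment score meets a chosen threshold, and clusters are the connected components of the resulting graph. The following is a toy illustration, not the EFI-EST or Cytoscape implementation; the sequence names and scores are invented for demonstration, and only the threshold of 80 is taken from the text.

```python
def ssn_clusters(nodes, pair_scores, threshold):
    """Group sequences into clusters (connected components) of an SSN.

    nodes:       iterable of sequence identifiers
    pair_scores: dict mapping (id_a, id_b) -> pairwise alignment score
    threshold:   minimum score for an edge to be drawn
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Draw an edge (merge components) only when the score meets the threshold.
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    # Largest clusters first, members in alphabetical order.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical sequences and scores, for illustration only.
toy_nodes = ["actino_1", "actino_2", "actino_3", "plant_1", "plant_2"]
toy_scores = {
    ("actino_1", "actino_2"): 120,
    ("actino_2", "actino_3"): 95,
    ("actino_3", "plant_1"): 60,
    ("plant_1", "plant_2"): 110,
}

print(ssn_clusters(toy_nodes, toy_scores, threshold=80))
print(ssn_clusters(toy_nodes, toy_scores, threshold=50))
```

Raising the threshold splits the toy network into separate actinobacterial and plant groups, whereas a permissive threshold merges them, which is why the alignment score chosen for an SSN determines how finely homologs are partitioned.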
Angustmycin A biosynthesis features an unusual final dehydration step
The C-5′–C-6′ double bond present in the sugar moiety of 1 is chemically intriguing. In our initial studies, the truncated agm gene cluster (pCHW501ΔagmF) was revealed to confer on the host cell S. coelicolor M1154 the capability of selectively producing 2, implying that the candidate enzyme AgmF is likely to govern the final dehydration step of 2 to 1. Surprisingly, further bioinformatic analysis showed that AgmF was annotated as an S-adenosylhomocysteine (SAH) hydrolase, which usually catalyzes the reversible hydrolysis of SAH to adenosine and homocysteine32,33. SAH hydrolase is a ubiquitous enzyme that plays a central role in cellular methylation processes by maintaining the intracellular balance between SAH and S-adenosylmethionine34. To determine if AgmF also fulfills the role of an SAH hydrolase, it was overexpressed and purified from E. coli BL21(DE3), and we subsequently assayed its activity using SAH as substrate. More surprisingly, the results showed that the AgmF enzyme could not recognize SAH as a substrate. We, therefore, reinvestigated the enzymatic role of AgmF, and inferred that 2 is more likely to be its real substrate. Accordingly, we tested its activity with 2 in the presence or absence of the exogenous NAD+ (nicotinamide adenine dinucleotide) cofactor, and the HPLC results indicated that 2 is indeed dehydrated to form 1 (Supplementary Fig. 10a). The identity of the product was further confirmed by LC-MS, as it gave rise to [M + H]+ ions at m/z 280.1032 (with main fragment ions at m/z 262.0833 and 136.0000) and m/z 280.1031 (with main fragment ions at m/z 262.0000 and 135.9167), which fully matched those of an authentic standard of 1. More interestingly, AgmF is able to maintain its activity without the addition of exogenous NAD+ cofactor, implying that AgmF is a self-sufficient enzyme in which the NAD+ cofactor is tightly bound within the active pocket (Fig. 4a and Supplementary Fig. 12a).
To further test this hypothesis, the AgmF enzyme was heat-treated to release the potential cofactor, which was then identified by HPLC as a distinctive NAD+ peak, whose identity was further determined by LC-MS (Supplementary Fig. 10b–f). To compare the enzymatic activity with or without NAD+, we conducted a time-course experiment, illustrating that the AgmF reaction with NAD+ occurs at a higher rate in the early reaction stage (Supplementary Fig. 10g). Furthermore, we noticed that substrate 2 was not used up in the AgmF reaction, implying that this enzyme may catalyze a reversible reaction. Hence, AgmF may catalyze a reversible reaction between 2 and 1, with the chemical equilibrium shifted toward the formation of 1 (Fig. 4b and Supplementary Fig. 12b).


a HPLC analysis (λ = 254 nm) of AgmF reaction with 2 as substrate. Std, the authentic standards of 2 and 1; AgmF, AgmF reaction with 2 as substrate; -AgmF, reaction using 2 as substrate but without AgmF added; -(2), AgmF reaction without 2 added. b HPLC analysis (λ = 254 nm) of AgmF-catalyzed reversible reaction with 1 as substrate. Std, the authentic standards of 2 and 1; AgmF, AgmF reaction with 1 as substrate; -AgmF, reaction using 1 as substrate but without AgmF added; -(1), AgmF reaction without 1 added. c Homology structure model of AgmF. This structure is constructed on the basis of Rv3248c from Mycobacterium tuberculosis H37Rv (PDB 3CE6), and the active 2 and NAD+ binding sites are indicated in the rectangular region. The sites marked out are proposed to be essential for the binding of 2. d HPLC analysis (λ = 254 nm) of the reactions of AgmF and its variants. Std, the authentic standards of 2 and 1. AgmF, reaction with the intact AgmF added; lines labeled H56A, D129A, K184A, D188A, and H299A show the individual reactions of AgmF variants; -(2), AgmF reaction without substrate 2 added. e Proposed enzymatic mechanism for the AgmF-catalyzed reaction.
We next determined the kinetic parameters of AgmF for the substrate 2, with KM = 1.382 ± 0.121 mM, and kcat = 1.902 ± 0.061 s−1 (Supplementary Fig. 13a). The KM value is larger than those of the reported SAH hydrolases with SAH as substrate35. We also evaluated the substrate specificity of AgmF against several analogs of 2, including adenosine, 2′-deoxyadenosine, 2′-amino-2′-deoxyadenosine, aristeromycin, formycin, and tubercidin, with results showing that the enzyme specifically recognizes 2 as the primary substrate (Supplementary Fig. 12d, e). To further investigate the catalytic mechanism of AgmF, a homology structural model for AgmF was constructed based on an SAH hydrolase (containing NAD+ and adenosine binding pockets) of Mycobacterium tuberculosis H37Rv (PDB: 3CE6, 34% identity to AgmF)34. Accordingly, the side chains of Thr155, Thr156, Asn189, Thr240, Glu241, and Thr274 are identified as the candidate NAD+ binding sites that are generally required for this family of SAH hydrolase-related enzymes to execute the catalytic function. Specific residues from the C terminus of the adjacent subunit also interact with the NAD+, forming a covering over the binding site. This structure might partly explain the difficulty in the release of NAD+ (Fig. 4c). The binding of 2 to the active site also seems to be similar to that of adenosine in the model structure. His56 and His299 may form hydrogen bonds with the 6′-hydroxyl of 2, and residues Asp129, Asp188, and Lys184 probably interact with the hydroxyls of the furanosyl sugar (Fig. 4c and Supplementary Fig. 11). To verify these deductions, we performed selective mutational assays and determined that the AgmF variants H56A, D129A, K184A, D188A, and H299A completely lost the activity for converting 2 to 1, thereby demonstrating that these residues are functionally essential for AgmF activity (Fig. 4d and Supplementary Fig. 12c).
Hence, these combined data demonstrate that AgmF functions as an unusual NAD+-dependent dehydratase. On account of the structural similarity between 2 and adenosine, we also assessed if adenosine could act as a competitive inhibitor, and the results indicated that AgmF was almost completely inactivated in the presence of a 3-fold excess of exogenously added adenosine (Supplementary Fig. 14).
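To put the reported kinetic parameters in context, the standard Michaelis–Menten relationships can be evaluated directly. The sketch below uses only the KM and kcat values reported above for 2; the competitive-inhibition function follows the textbook rate law, and its inhibition constant (ki_mm) is a purely hypothetical parameter because no Ki for adenosine is reported here. Function names are our own.

```python
KM_MM = 1.382       # mM, reported KM of AgmF for 2
KCAT_PER_S = 1.902  # s^-1, reported kcat of AgmF for 2


def turnover_rate(s_mm, km=KM_MM, kcat=KCAT_PER_S):
    """Michaelis-Menten rate per enzyme (v/[E], s^-1) at substrate s_mm (mM)."""
    return kcat * s_mm / (km + s_mm)


def catalytic_efficiency(km=KM_MM, kcat=KCAT_PER_S):
    """kcat/KM in M^-1 s^-1 (KM converted from mM to M)."""
    return kcat / (km * 1e-3)


def competitive_rate(s_mm, i_mm, ki_mm, km=KM_MM, kcat=KCAT_PER_S):
    """Textbook competitive inhibition: KM is scaled by (1 + [I]/Ki).

    ki_mm is hypothetical; it is not a value reported in the text.
    """
    return kcat * s_mm / (km * (1.0 + i_mm / ki_mm) + s_mm)


print(f"kcat/KM = {catalytic_efficiency():.0f} M^-1 s^-1")
for s in (0.1, KM_MM, 10.0):
    print(f"[2] = {s:.3f} mM -> v/[E] = {turnover_rate(s):.3f} s^-1")
```

With these values, kcat/KM is roughly 1.4 × 10^3 M−1 s−1, and the rate at a substrate concentration equal to KM is half of kcat, as expected from the rate law.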
AgmF differs from the classical family of SAH hydrolases in catalytic function, particularly in the recognition of a distinct substrate, while the key catalytic residues are conserved, suggesting that AgmF may adopt a similar catalytic mechanism to that of the typical SAH hydrolase. Concerning the dehydration reaction, several enzymatic strategies are derived from the common dehydratase domain in polyketide synthase36, the heme iron-dependent aldoxime dehydratase37, the dihydroxyacid dehydratase (containing a 2Fe–2S cluster)38, and the glutamyl-tRNA-dependent dehydratase in RiPPs (ribosomally synthesized and post-translationally modified peptides) biosynthesis39. The generic mycobacterial SAH hydrolase has been shown to catalyze a reversible reaction via an oxidation–reduction mechanism, which is associated with a cryptic dehydration step. However, the self-sufficient strategy exploited for the NAD+ cofactor recycling in the AgmF reaction is interesting. We, therefore, propose that Lys184 is likely to act as the basic residue extracting the proton from the 4′-OH group during oxidation to 4′-keto angustmycin C (2a) and that Lys184 is more basic because of its proximity to Asp188 and Ser190 (Fig. 4e). The carboxyl group of Asp129 can make an interaction with H-4′ and catalyze a proton abstraction at C-4′ to form 2b (Fig. 4e). Moreover, the two histidine residues His56 and His299 are speculated to form hydrogen bonds with O6′ of 2 and may be involved in the removal of the 6′-OH of 2b to form 2c. Finally, the end product 1 is produced with recycling of the NAD+ cofactor (Fig. 4e and Supplementary Fig. 13b, c). The leaving group proposed for the AgmF-catalyzed reaction is water, which is relatively poor when compared to the 5′-homocysteine motif for the SAH hydrolase family enzymes32. However, this may be promoted by more favorable conditions within the active site of AgmF or possibly involve a transient leaving group at the 5′ position (Fig. 4e and Supplementary Fig. 13c).
We also note that AgmF homologs are actually far more widely distributed in the microbial genomes than was previously supposed, which could be shown in a colored SSN analysis (Supplementary Fig. 15). A phylogenetic analysis of AgmF against other SAH hydrolases showed that the AgmF-related dehydratases formed a separate cluster from other enzymes (Supplementary Fig. 16), suggesting that AgmF represents a family of dehydratases using an unusual self-sufficient strategy for the cofactor NAD+ recycling.
AgmE, C, and B harbor individual activities of APRTase, ribose 5-P pyrophosphokinase, and AMP phosphatase
An initial in silico analysis showed that AgmE contains a conserved phosphoribosyltransferase domain. To evaluate if the protein carries out the expected adenine phosphoribosyltransferase (APRTase) function, it was overexpressed and purified from E. coli BL21(DE3), and assayed in vitro with the substrates adenine and phosphoribosyl pyrophosphate (PRPP). HPLC analysis of the reaction products showed that the AgmE-catalyzed reaction is capable of generating a characteristic AMP peak, whose identity was further verified by LC-MS, indicating that AgmE plays a role similar to that of APRTase during 1 biosynthesis (Supplementary Fig. 17).
As for the catalytic function of AgmC, in silico analysis indicated that it harbors a conserved ribose-phosphate pyrophosphokinase domain. To determine if this enzyme executes a related activity, we performed an in vitro enzyme assay using ribose 5-phosphate (ribose 5-P) as substrate. To facilitate the detection of the enzymatic product (PRPP), we conducted an assay to indirectly monitor the conversion of ATP to AMP, using a coupled AgmC-AgmE reaction to trap PRPP to form the product AMP. As anticipated, the coupled reaction synthesized an enhanced amount of AMP in comparison with that of the negative control (without adenine added), showing that AgmC could utilize ATP to give AMP (Supplementary Fig. 18d). The AgmC-AgmE-coupled reactions lacking ribose 5-P (or lacking both ribose 5-P and ATP) also abolished the production of the target product AMP. These data demonstrate that AgmC performs a catalytic pyrophosphokinase-like function in the process of 1 biosynthesis (Supplementary Fig. 18).
Similar bioinformatic analysis for the function of AgmB revealed a conserved HAD signature motif with 51% identity to C5D20_10175 (HAD-like hydrolase) of Rathayibacter toxicus. This family of enzymes usually catalyzes nucleophilic substitution reactions at phosphorus or carbon centers, using a conserved Asp carboxylate anion in the active site40. All members of this family possess a conserved alpha/beta core domain. To assess the function of the purified AgmB enzyme, we tested its activity in vitro and, as expected, found that the AgmB enzyme is indeed able to convert AMP to adenosine, the identity of which was verified by LC-MS (Supplementary Fig. 19). Collectively, these results provide biochemical evidence that AgmE, AgmB, and AgmC have the activities of APRTase, AMP phosphatase, and ribose 5-P pyrophosphokinase, respectively, and they are most likely to play related functional roles during 1 biosynthesis.
Reconstitution of the complete six-enzyme pathway in vitro to 1 and 2
The functional role of AgmD in the 1 biosynthetic pathway also remained undetermined, and in silico analysis showed that it has 31% identity to AlsE of E. coli. The AlsE enzyme was previously characterized as a d-allulose 6-phosphate 3-epimerase that utilizes d-fructose 6-phosphate as substrate. In addition, earlier metabolic labeling experiments indicated that 1 arises directly from either d-glucose or d-fructose. This suggested that AgmD should play a 3-epimerase role similar to AlsE and provided a basis for further understanding of 1 biosynthesis (Supplementary Fig. 20a–c).
To further decipher the biosynthetic logic underlying 1 biosynthesis, we reconstituted the complete biosynthetic pathway to 1 in vitro. The six enzymes AgmA-F were individually overexpressed and purified from E. coli BL21(DE3), allowing for the construction of the biosynthetic pathway to 1 de novo (Fig. 5a, b). HPLC analysis of the products showed that the complete AgmA-F reactions are capable of producing the distinctive 2, adenine, and 1 peaks, whose corresponding [M + H]+ ions at m/z 298.1143, 136.0617, and 280.1039 (and related fragment ions) are fully consistent with those of authentic standards. It was also found that the enzymatic reaction including AlsE, AgmA-C, and AgmE-F is capable of producing the peaks of 2, adenine, and 1, but with a higher overall efficiency, showing that AlsE and AgmD share the same functional role. These data demonstrated that the six enzymes (AgmD, C, A, E, B, and F) catalyze the relay reactions with fructose 6-phosphate as the starter substrate in the biosynthesis of angustmycins (Fig. 5c and Supplementary Table 5).


a Diagrammatic sketch of the one-pot reaction in vitro for 2 and 1 biosynthesis. b HPLC analysis (λ = 254 nm) of the one-pot reaction mixtures of proteins responsible for angustmycin biosynthesis. Std, the authentic standards of 2, AMP, adenine, and 1; AlsE, reaction with AlsE added, together with fructose 6-P, metal ion, and cofactors; Agm(D + C + A + E + B + F), reaction with the complete six proteins (AgmD, AgmC, AgmA, AgmE, AgmB, and AgmF) added, together with related substrates, metal ion, and cofactors. Other samples are correspondingly assigned. ‘*’ represents NADP+-related compounds. c Bioassays of the 1 authentic standard and the products generated by related reaction mixtures of different protein compositions. The indicator strain is Mycobacterium smegmatis mc2155. The Roman numerals I, II, III, IV, V, VI, VII, and VIII represent AlsE, AlsE + AgmC, AlsE + Agm(C + A), AlsE + Agm(C + A + E), AlsE + Agm(C + A + E + B), AlsE + Agm(C + A + E + B + F), Agm(D + C + A + E + B + F), and 1 standard, respectively, and the denotations of them correspond to those shown in (b). d Confirmed biosynthetic pathway to 2 and 1 on the basis of the one-pot reaction. Fructose 6-P, Fructose 6-phosphate.
Based on these results, we then attempted to dissect the individual stepwise reactions occurring during 1 biosynthesis. HPLC analysis of the products indicated that the reaction (AlsE + AgmC) was able to generate a characteristic AMP peak, which was absent from the AlsE reaction. LC-MS analysis of this peak gave an [M + H]+ molecular ion at m/z 348.0703 and major fragment ions at m/z 136.0617 and 250.0937, which fully matched those of the AMP authentic standard. These data established that AgmC is a pyrophosphokinase that uses allulose 6-phosphate as the acceptor substrate. The coupled reactions containing AlsE, AgmC, and AgmA also gave the adenine peak, but without the coproduction of AMP. Hence, the substrate of AgmA is probably also the product of AgmC. This intermediate was further consumed by the addition of the AgmE enzyme in the related coupled reaction to form 5. The coupled LC-MS analysis of 5 gave an [M + H]+ ion at m/z 378.0806, and main fragment ions at m/z 360.0691, 262.0933, and 136.0619, demonstrating that AgmE is an adenine phosphoallulosyltransferase. 5 was revealed to be specifically dephosphorylated to render 2, whose identity was confirmed by LC-MS (Fig. 5c, Supplementary Fig. 21, and Supplementary Table 5). To analyze their relative catalytic activities, the generation of 2 by the five-enzyme coupled reaction was measured with either AlsE or AgmD. The relative activity of AgmD is 22% when compared with that of AlsE, consistent with the result of the one-pot in vitro reaction (Supplementary Fig. 20d). The stepwise reactions were also individually carried out to illustrate that AgmD has a function identical to AlsE (Supplementary Fig. 22 and Supplementary Table 5), and that the accumulation of adenine and consumption of 5 would affect the one-pot reaction efficiency (Supplementary Figs. 23 and 24).
In the present paper, we have demonstrated that the complete 1 biosynthetic pathway consisting of six enzymes is sequentially arranged to synthesize 1. The biosynthetic pathway starts with a glycosyl epimerization step from d-fructose 6-phosphate to d-allulose 6-phosphate as catalyzed by AgmD, and then successively undergoes pyrophosphorylation, adenine phosphoallulosyltransfer, and dephosphorylation to produce 2. After the final dehydration step, 2 is converted to 1. Notably, the cofactors involving NAD+ and NADP+ are recycled during 1 biosynthesis, and AMP plays dual roles as both the substrate and the product. These latter results suggest that 1 biosynthesis exploits a self-sufficient and efficient strategy for cofactor recycling and substrate utilization, which is unusual in natural product biosynthesis (Fig. 5d).
Utilization of E. coli as a cell factory for engineered production of 1 and 2
Finally, we have also explored the potential of utilizing E. coli as a cell factory for the production of 1. The six biosynthetic genes (agmDCAEBF, listed according to the reaction sequence) were introduced into E. coli GYJ23, a strain engineered for nucleoside production, via a three-plasmid system (Fig. 6a). After confirmation, the transformant GYJ23/agmDCAEBF was shown by HPLC to be capable of generating the target peaks of 1 and 2, whose identities were further confirmed by LC-MS analysis (Fig. 6b and Supplementary Table 5). The yields of 2 and 1 could reach 110 and 370 μg/mL, respectively, after 96 h fermentation. Interestingly, we found that the strain GYJ23/agmDCAEB, lacking agmF, selectively accumulates 2 with a yield of 420 μg/mL after 96 h fermentation (Fig. 6c, d).


a Diagrammatic sketch of the production of 2 and 1 in E. coli. The biosynthesis-related genes are cloned into pETDuet, pCDFDuet, and pRSFDuet and coexpressed in E. coli GYJ23. b HPLC analysis (λ = 254 nm) of metabolites produced by related recombinants of E. coli GYJ23. Std, the authentic standards of 2 and 1; alsE + agmCAEBF, the metabolites of the recombinant E. coli GYJ23 containing pETDuet/alsE + agmC, pCDFDuet/agmA + agmE, and pRSFDuet/agmB + agmF. agmD + agmCAEBF, the metabolites of the recombinant E. coli GYJ23 containing pETDuet/agmD + agmC, pCDFDuet/agmA + agmE, and pRSFDuet/agmB + agmF. Other samples are correspondingly assigned. c The production of 2 and 1 by the strains GYJ23/(alsE + agmCAEBF) and GYJ23/agmDCAEBF. The red bars refer to 2 production, while the green ones refer to 1 production, and the colors marked correspond to those in HPLC analysis of (alsE + agmCAEBF) and agmDCAEBF in (b). d The production of 2 by fermenting strains GYJ23/(alsE + agmCAEB) and GYJ23/agmDCAEB. The 24, 48, 72, and 96 h represent the corresponding time for fermentation sampling, and the values are the means ± s.d. measured from three biological replicates, and the colors marked correspond to those in HPLC analysis of (alsE + agmCAEB) and agmDCAEB in (b). Source data underlying (c) and (d) are provided as a Source Data file.
We also evaluated the production of 1 and 2 conferred by the set of six genes with agmD replaced by alsE. HPLC analysis showed that the strain GYJ23/(alsE + agmCAEBF) also produces 1 and 2, but with higher titers (Fig. 6b and Supplementary Table 5). Likewise, the titer of 2 in the strain GYJ23/(alsE + agmCAEB) was 780 μg/mL, which is considerably higher than that of the GYJ23/agmDCAEB strain. All of these data demonstrate that suitably engineered E. coli strains could be used as a robust cell factory for the efficient production of the nucleoside antibiotic 1 (Fig. 6c, d). The successful production of 1 and 2 in a heterologous E. coli cell factory raised the question of why the strain GYJ23/(alsE + agmCAE) is not capable of accumulating 5 and adenine. This may in part be because phosphate-containing compounds, such as nucleotides, are less easily transported out of cells. Adenine, as a primary metabolite, is also perhaps prone to being recycled during the cell cycle of E. coli.
In summary, we report the discovery and characterization of the gene cluster responsible for angustmycin A (1) biosynthesis and we provide evidence that AgmA (phosphoribohydrolase), AgmB (phosphatase), AgmC (pyrophosphokinase), AgmD (epimerase), and AgmE (phosphoallulosyltransferase) collaborate to biosynthesize the angustmycin C (2) molecule. We have also revealed that AgmF functions as an unusual dehydratase for the final tailoring step to 1 via a self-sufficient strategy for NAD+ cofactor recycling. Moreover, we have reconstituted the complete biosynthetic pathway to 1 in vitro and successfully engineered the production of 1 in a robust cell factory of E. coli. We anticipate that deciphering the angustmycin pathway will expand the biochemical repertoire for the intriguing enzymatic reactions leading to nucleoside natural product biosynthesis (while this paper was under review, partial results on angustmycin biosynthesis were reported by Shiraishi et al.41; they characterized the gene cluster and identified part of the AgmF function (dehydratase)), and open the way to the rapid and rational discovery of more purine nucleoside antibiotics related to 1 (Supplementary Fig. 25).

