We describe the definition of the docking poses used in this work and the details of MSPER below. When an enzyme catalyzes target products or byproducts, it must form the appropriate docking poses (target product docking poses or byproduct poses) in which the enzyme active site contacts the corresponding substrate reaction sites. It is believed that residues in contact with the substrate in a docking pose (contact residues) are important for stabilizing the pose. To enhance the substrate reaction site’s regioselectivity, contact residues in target product docking poses must not be substituted to maintain target product production. Contact residues in byproduct docking poses should actively be substituted to decrease the production of byproducts. Our prediction method, MSPER, identifies candidate substitution residues that maintain the target product docking poses and destabilize the byproduct docking poses.
Enzymes with low regioselectivity of substrate reaction sites such as a cytochrome P450 CYP102A1 can assume multiple docking poses corresponding to each product (Fig. 1). We defined the docking poses as follows. For each conformation, we calculated the distances between the active site of the enzyme and substrate reaction sites (in this study, the oxygen atom covalently bonded to the Fe atom in heme and the hydrogen atoms located at the substrate reaction sites [7 and 14 hydrogen atoms in (S)-(−)-limonene (1) and p-cymene (16), respectively, as shown in Fig. 2)]. When the minimum distance between these sites was less than the sum of the radii of the two atoms and the tolerance (oxygen vdW radius 1.48 Å + hydrogen’s radius 1.00 Å + tolerance 1.00 Å = 3.48 Å in this study), the conformation was considered a docking pose in which the closest reaction site contacts the active site. For example, if the closest hydrogen was the H2 atom, the docking pose was called the H2 docking pose and belonged to the H2 docking pose group. Note that a docking pose could not belong to multiple docking pose groups simultaneously. Thus, the simulated ensemble was divided into two docking pose groups, i.e., the target product docking pose group and the sum of the other docking pose groups (the byproduct docking group), and the substrate was not docked in the other conformations.


Various oxidized compounds detected from reaction products converted by wild-type CYP102A1. (A) (S)-(−)-Limonene (1) was converted to 14 oxidized compounds, i.e., limonene-1,2-epoxide and its isomer (2, 3), limonene-8,9-epoxide and its isomer (4, 5), cis-isopiperitenol (6), (−)-perillaldehyde (7), limonene diepoxide and its isomers (8–11), trans-carveol (12), cis-carveol (13), perillyl alcohol (14) and limonene-1,2-diol (15). (B) The enzyme converted p-cymene (16) to six compounds, i.e., p-cymenene (17), cuminaldehyde (18), p,α,α-trimethylbenzyl alcohol (19), 4-isopropylbenzyl alcohol (20), thymol (21), and carvacrol (22).


Chemical structures of substrates, such as (A) (S)-(−)-limonene (1) and (B) p-cymene (16), with a number to each carbon and hydrogen atom. Seven hydrogen atoms (H2, H3a, H3b, H6a, H6b, H9a, and H9b) in (S)-(−)-limonene and all 14 hydrogen atoms in p-cymene (16) corresponding to experimentally detected oxidation sites were used to define the docking pose groups.
To identify candidate substitution residues, we obtained enzyme–substrate contacts to stabilize the docking poses. We calculated the distances between an atom (a heavy atom, not hydrogen) in the enzyme (an enzyme heavy atom) and heavy atoms in the substrate (substrate heavy atoms). When the minimum distance was less than the sum of vdW radii of the enzyme heavy atom and the closest substrate atom + tolerance of 1.00 Å, the enzyme heavy atom was considered in contact with the substrate. For each docking pose group, the average contact rate of an enzyme heavy atom (C) was calculated as follows:
$$begin{array}{*{20}c} {C = mathop sum limits_{i = 1}^{N} cw_{i} /mathop sum limits_{i = 1}^{N} w_{i} } \ end{array} ,$$
(1)
where N is the total number of conformations in the docking pose group, c is the contact condition (when the enzyme heavy atom is in contact with the substrate, c = 1, otherwise c = 0). Though wi is a weight coefficient for the i-th conformation (see the reweighting factor of ALSD simulation in “Methods” section for the details), it is 1 if the docking pose is from X-ray crystal conformation, an NMR model, docking simulation, or a conventional MD simulation. To identify atoms with low contact rates in the target product docking pose and high contact rates in the byproduct docking pose, we calculated the difference in contact rate (D) as follows:
$$begin{array}{*{20}c} {D = C_{{{text{bp}}}} – C_{{{text{tp}}}} } \ end{array} ,$$
(2)
where Cbp and Ctp are the contact rates of the byproduct docking pose group and the target product group, respectively. Finally, the substitution candidate residue score (Sscr) for each residue was calculated as
$$begin{array}{*{20}c} {S_{{{text{scr}}}} = mathop sum limits_{i = 1}^{N} D_{i} } \ end{array} ,$$
(3)
where N is the number of heavy atoms in the residue, and Di is the difference in contact rate of the i-th heavy atom in the residue. In the current study, two simulations with different initial conformations were performed to check the convergence of the simulation results, and amino acid residues were ranked according to Sscr for each simulation. Note that residues corresponding to the active site (CYS 400 covalently bonded to heme in this study) were removed from the ranked list. The residues with the highest average rank from the two simulations were considered candidate substitution residues.
Wild-type CYP102A1 converted (S)-(−)-limonene (1) to 14 oxidized compounds (Fig. 1A and Supplementary Figs. 1 and 2). Epoxide compounds, i.e., limonene-1,2-epoxide (2 and 3), limonene-8,9-epoxide (4 and 5), limonene diepoxide (8, 9, 10, and 11), and these diastereomers comprised approximately 80% of the total reaction products. On the other hand, the composition of the target alcohols was as follows: 5.4% were composed of cis-isopiperitenol (6) and 9.8% were composed of trans-carveol (12) (Supplementary Table 1). Additionally, the wild-type enzyme catalyzed the conversion of 6 reaction products from p-cymene (16) (Fig. 1B and Supplementary Figs. 3 and 4). p,α,α-Trimethylbenzyl alcohol (19) was the main component, comprising 64.5% of the total reaction compounds, while thymol (21) and carvacrol (22), as target alcohols, comprised 4.7% and 13.5% of these compounds, respectively (Supplementary Table 14). See Fig. 2A,B for the names of the hydrogen atoms located at the substrate reaction sites (7 and 14 hydrogen atoms for (S)-(−)-limonene (1) and p-cymene (16), respectively.
To enhance the regioselectivity of (S)-(−)-limonene oxidization, we generated docking poses of the N-terminal P450 heme domain (BMP) of cytochrome P450 BM3 (CYP102A1) and (S)-(−)-limonene complex with MD simulation. In this study, we used Adaptive Lambda Square Dynamics (ALSD) simulation23,24, which is one of MD simulation techniques that enhances conformational changes of substrates and/or enzyme active sites to achieve efficient docking pose search in a shorter time than conventional MD simulation (see “Methods” section for the details). Examples of docking poses corresponding to modification sites for the two target products, H6b (for trans-carveol (12) and H3a (for cis-isopiperitenol (6)), and the largest component of the byproduct docking pose group, H9b, are shown in Fig. 3A–C, respectively. In these docking poses, conformational differences are observed in the β-turn and the side chain ring of F87 in the pocket. For example, in Fig. 3A, the β-turn moves toward the pocket entrance side, and the ring of F87 moves to be parallel to the heme plane to invite the substrate to the backside. On the other hand, in Fig. 3C, the β-turn moves toward the backside, and the ring of F87 becomes perpendicular, similar to the X-ray crystal structure (Fig. 4B), preventing the substrate from entering the backside. These results indicate that it is necessary to consider not only the arrangement of the substrate relative to the enzyme but also the conformational change in the enzyme required to obtain the correct docking poses.


Conformational examples of docking poses in which H6b [corresponding to trans-carveol (12), (A)], H3a [corresponding to cis-isopiperitenol (6), (B)], and H9b [the largest component of the byproduct docking pose group, (C)] of (S)-(−)-limonene contact the O atom bound to the Fe atom in heme. A view of the active site from the pocket entrance side. The heme and (S)-(−)-limonene are represented by orange and green sticks, respectively. The active site, heme oxygen is a red ball. The modification sites corresponding to H6b, H3a, and H9b are taken as magenta spheres. The orientation of the side chain ring of F87 (yellow stick) and the β-turn of T436-T438 (purple stick) changed depending on the modification sites on the substrate.


(A) A tentative complex conformation of the (S)-(−)-limonene docked into the BMP X-ray crystal conformation with the resolution of 1.65 Å (PDB ID: 1BU7) using ZDOCK software. The coordinates of the missing hydrogen atoms in the BMP crystal structure were automatically generated by an MD simulation program PRESTO ver. 3. See the legend in Fig. 3 for the coloring and representation in the figure. This conformation was used as a pre-initial conformation to generate the initial different conformations for 72 MD simulation runs. A view of the active site from the pocket entrance side. (B) A view of the active site from the F87 side. (C) The whole simulation system with water molecules and ions.
To identify candidate substitution residues, we calculated the contact rates of BMP atoms with (S)-(−)-limonene in docking poses where the H6b of (S)-(−)-limonene contacts the O atom bound to the Fe atom in heme, which corresponds to trans-carveol (12) (Fig. 5A) and those in the docking poses where one of the six hydrogen atoms located in the other oxidation sites (see Fig. 2) contacts the O atom, which corresponds to the generation of byproducts (Fig. 5B). The examples of contact rate maps (Fig. 5A,B) and the difference in the maps (Fig. 5C) are shown. To ensure the prediction accuracy of MSPER, a ranked list was generated from each of the two independent simulations, and residues with the highest average rank were selected as candidate substitution residues. We selected the top four residues (A330, L75, P329, and L437, see Fig. 5D) from the ranked list (Table 1, Supplementary Table 2) as candidate substitution residues. In addition, we conducted proof experiments for the bottom three residues (F87, I263, and T260, see Fig. 5D) to confirm whether substitution of the bottom-ranked residues resulted in the reduction of trans-carveol. We also applied MSPER on the docking poses generated by a docking simulation software, ZDOCK (https://zlab.umassmed.edu/zdockconv3d/)25, to evaluate the effect of calculation methods used to generate docking pose data on the generated poses and the mutation site rankings. Although there are some differences between docking poses by ZDOCK and those by ALSD, the ranked list using ZDOCK is almost equivalent to that using ALSD in this case. The candidate substitution residues we selected above (A330, L75, P329, and L437) are also ranked relatively high in that using ZDOCK (Supplementary Table 22; 1st, 6th, 5th, and 2nd, respectively). The 3rd (F331) and 4th (A328) mutation sites in the ranked list using ZDOCK are 9th and 6th in that using ALSD, respectively. For details of the ZDOCK calculation, see Supplementary Information.


Contact rate maps of BMP atoms with (S)-(−)-limonene. The higher the contact rate of the atoms is, the redder the atoms are. A view of the active site from the F87 side. (A) The map from the docking poses where H6b of (S)-(−)-limonene contact the O atom bound to the Fe atom in heme, which corresponds to the generation of trans-carveol (12). (B) The map from the docking poses where one of the six hydrogen atoms located in the other oxidation sites (see Fig. 2) contacts the O atom, which corresponds to the generation of byproducts (B). (C) The difference map between (B) and (A). Red-colored residues are identified as candidate substitution residues for enhancing regioselectivity. These maps were calculated from the first ALSD simulation and projected on the X-ray crystal structure. (D) Positions of the top 4 (red) and the bottom 3 (blue) residues were determined by MSPER. A view of the active site from the pocket entrance side. A β-turn (K41-T49), the lid of the pocket entrance, is transparently displayed to make it easier to see around the active site.
To investigate whether substitution of the top 4 residues enhances regioselective C6 (H6b as the hydrogen position) oxidation corresponding to trans-carveol as the reaction product (12), all single mutants for each position, the total number was 76, were constructed. The enzymatic activity of BMP in 53 of the single mutants yielded levels of trans-carveol above the experimental quantification limit. The trans-carveol reaction product ratio was increased by 81.1% (43 mutants) of the total activated mutants compared with the wild-type control (Supplementary Tables 7, 11, 12, and 13). At the top 4 residues, the A330P, L75F, P329L, and L437F mutants converted the maximum trans-carveol ratios, with trans-carveol comprising 16.1, 13.3, 22.5, and 24.7% of the reaction compounds, respectively, produced by these mutants (Table 2, Supplementary Fig. 5, and Supplementary Table 1). In particular, the L437F mutant produced 2.5-fold more trans-carveol (12) than the wild-type enzyme (Fig. 6A). In contrast, all mutants with substitutions at the bottom-ranked F87, I263, and T260 (i.e., 57 mutants) converted (S)-limonene, whereas 55 of these mutants yielded a lower trans-carveol ratio than the wild-type enzyme (Supplementary Table 8, 9, and 10). These results suggest that our method succeeds in identifying proper modification sites for regioselective C6 oxidation.


Comparison of reaction product compositions between wild-type CYP102A1 and CYP102A1 mutants. The percentages of oxidized compounds, cis-isopiperitenol (6) and trans-carveol (12), were calculated from the peak area of converted total products. These compound ratios of mutants were compared to those of wild type as the positive control for each experiment. (A) The data for the L437F mutant are presented as the averages of three independent experiments. The standard deviations for the percentages are included in Supplementary Table 1. (B) The data for the P329G mutant are presented as the averages of two independent experiments (Supplementary Table 11).
Furthermore, we made ranked lists for regioselective C3 oxidation corresponding to cis-isopiperitenol (6) and H3a of (S)-(−)-limonene as the hydrogen position, and the top two residues (A74 and P329, Supplementary Table 3) were selected as candidate substitution residues. A total of 63.1% of all single mutants (24 of 38) yielded a higher reaction product ratio of cis-isopiperiteol (6), and the maximum ratio of cis-isopiperiteol, which was 6.4-fold more than that produced by the wild-type ratio, was produced by P329G, with cis-isopiperiteol comprising 29.6% of the reaction products (Fig. 6B and Supplementary Tables 6 and 11). Therefore, these results suggest that MSPER can control regioselectivity not only for C6 but also for C3 oxidation.
Next, we show the simulation results of the BMP-p-cymene (16) complex. We identified candidate substitution residues for two target alcohols, carvacrol (22) and thymol (21), as shown in Supplementary Tables 4 and 5, respectively. Conversion of p-cymene to carvacrol was observed for 92 of 95 mutants (Supplementary Tables 14, 15, 17, 18, and 19). Substitutions at 5 candidate residues, namely, L75, A74, A330, P329, and M354, yielded the maximum carvacrol ratios, with carvacrol comprising 15.7%, 25.8%, 22.9%, > 99%, and 19.6% of the reaction compounds produced by the L75W, A74H, A330P, P329K, and M354R mutants, respectively. The carvacrol ratios for all P329 mutants were higher than the carvacrol ratio for the wild-type enzyme, although the amount of carvacrol was decreased for 5 mutants, i.e., as P329K, P329Q, P329R, P329M, and P329H (Supplementary Table 17). Moreover, the ratio and amount of carvacrol produced by the P329Y and P329G mutants were threefold higher than those produced by the wild-type enzyme. Meanwhile, the maximum ratio of carvacrol produced by mutants with substitutions at the bottom-ranked I263 for carvacrol was 0.8-fold that produced by the wild-type enzyme. The amount of carvacrol produced was below the quantification limit for 10 mutants (Supplementary Table 16).
Subsequently, the regioselectivity of C3 oxidation corresponding to H3 and H5 as the hydrogen positions and thymol as the product (21) by the mutants modified at the top-ranked 3 residues, L437, P329, and A74, was evaluated (Supplementary Tables 14, 17, and 20). Although the thymol ratio for the L437 mutants was reduced by > 30%, the thymol ratio for the P329R and A74H mutants was the highest, reaching 4.5- and 2.4-fold of the ratio for the wild-type control, respectively. These results demonstrated that MSPER can identify mutation sites for controlling the regioselectivity of CYP102A1 despite a structural difference in the substrates.

