
A general theoretical framework to design base editors with reduced bystander effects

Kinetic model of base editing

We developed a discrete-state stochastic model to describe the dynamics of target and bystander editing. This is a minimal chemical-kinetic approach that considers the most relevant chemical states and features of base editing. For convenience, unless noted otherwise, we will use A3A-BE3 to edit the EGFP site 1 as an example.

In this theoretical model (Fig. 1), the Cas9 domain of the CBE is assumed to bind ssDNA with a rate u0, initiating base editing (transition from state 0 to state 2). Alternatively, the protein complex can enter an unproductive state in which editing cannot take place, with a rate u4 (transition from state 0 to state 1). Next, either the Cas9 domain dissociates from DNA with a rate w0 (transition from state 2 to state 0), or the target cytidine binds to the deaminase catalytic site with a rate u1 (transition from state 2 to state 3). The cytidine can then either dissociate from the site with a rate w1 without being edited (backward step from state 3 to state 2), or be chemically transformed to uridine with a rate u3 (transition from state 3 to state 5). Similarly, the bystander cytidine may bind to the deaminase with a rate u2 (transition from state 2 to state 4), and subsequently it can either unbind with a rate w2 without being edited (transition from state 4 back to state 2), or be chemically transformed with a rate u3 (transition from state 4 to state 6). After that, while Cas9 is still bound to DNA (state 5 or 6), the deaminase can continue editing other cytidines in this region through the same sequence of events (transitions to states 9–12). Alternatively, if Cas9 dissociates from DNA, uridine will be transformed to thymidine through DNA repair (transitions from state 5 to states 7 and 13, or from state 6 to states 8 and 14). This U-to-T editing decreases the rebinding rate of Cas9 to ssDNA (transition \(7 \to 5\)) if the endogenous DNA repair and replication machinery has changed the DNA sequence from a G:C pair to an A:T pair. In this case, the new DNA sequence no longer perfectly matches the spacer sequence of the sgRNA. Because the repair rate is unknown, the rebinding rate is assumed to be \(m u_0\) with \(0 \le m \le 1\). The parameter m reflects how much the rebinding ability of the BE complex is lowered in comparison with the original substrate. If DNA repair is slow, m tends to be closer to 1; otherwise, m tends to be closer to 0. Note that the kinetic network in Fig. 1 is a minimal description of the complex chemical processes that take place during base editing.

Fig. 1: Chemical-kinetic model of A3A-BE3 editing the EGFP site 1.

Deaminase, Cas9, target and bystander base are represented by blue, orange, red and yellow squares, respectively. “C” represents cytosine, “T” represents thymine and “X” represents uridine or thymine. Editing is modeled as a multiple-step chemical reaction where Cas9 first binds to ssDNA, cytidine then binds to the catalytic site of the deaminase and is chemically converted to thymidine. Here, uj and wj (j = 0–4) represent the chemical-kinetic rates for various transitions. The model has a total of 15 states and can produce four outcomes (dashed squares): CTC (failed editing), CTT (only the target base is edited), TTC (only the bystander is edited), TTT (both target and bystander bases are edited).

To evaluate the dynamics of base editing, we used the first-passage probabilities method, which has been successfully applied to various problems in chemistry, physics, and biology15,16,17,18. In the case of EGFP site 1 editing by A3A-BE3 there are four possible outcomes, as shown in Fig. 1: CTC (state 1, failed editing), CTT (state 13, only the target base is edited), TTC (state 14, only the bystander is edited) and TTT (state 12, both the target base and the bystander are edited). The explicit solutions for the probability that the system ends up in each of these products are given below (see derivations in the Supplementary Information Appendix):

$$P_{CTT} = P_5 \cdot \frac{u_4 w_0 (u_3 + w_2)}{(u_2 + w_0)(u_3 + w_2)(u_4 + m u_0) - u_2 w_2 (u_4 + m u_0) - m u_0 w_0 (u_3 + w_2)}$$

(1)

$$P_{TTC} = P_6 \cdot \frac{u_4 w_0 (u_3 + w_1)}{(u_1 + w_0)(u_3 + w_1)(u_4 + m u_0) - u_1 w_1 (u_4 + m u_0) - m u_0 w_0 (u_3 + w_1)}$$

(2)

$$\begin{aligned} P_{TTT} ={}& P_5 \cdot \left[1 - \frac{u_4 w_0 (u_3 + w_2)}{(u_2 + w_0)(u_3 + w_2)(u_4 + m u_0) - u_2 w_2 (u_4 + m u_0) - m u_0 w_0 (u_3 + w_2)}\right] \\ &+ P_6 \cdot \left[1 - \frac{u_4 w_0 (u_3 + w_1)}{(u_1 + w_0)(u_3 + w_1)(u_4 + m u_0) - u_1 w_1 (u_4 + m u_0) - m u_0 w_0 (u_3 + w_1)}\right] \end{aligned}$$

(3)

\(P_{CTC}\) can be calculated as one minus the sum of the above three probabilities. In Eqs. (1–3), \(P_5\) and \(P_6\) are two intermediate parameters satisfying:

$$P_5 = \frac{u_0 u_1 u_3 (u_3 + w_2)}{(u_1 + u_2 + w_0)(u_3 + w_1)(u_3 + w_2)(u_0 + u_4) - u_1 w_1 (u_3 + w_2)(u_0 + u_4) - u_2 w_2 (u_3 + w_1)(u_0 + u_4) - u_0 w_0 (u_3 + w_1)(u_3 + w_2)}$$

(4)

$$P_6 = \frac{u_0 u_1 u_3 (u_3 + w_1)}{(u_1 + u_2 + w_0)(u_3 + w_1)(u_3 + w_2)(u_0 + u_4) - u_1 w_1 (u_3 + w_2)(u_0 + u_4) - u_2 w_2 (u_3 + w_1)(u_0 + u_4) - u_0 w_0 (u_3 + w_1)(u_3 + w_2)}$$

(5)

In experiments, a common way to quantify editing efficiency is to measure the overall probability of editing the target cytidine, \(P_t\)11,12. To compare these predictions with experimental results, \(P_t\) was calculated as:

$$P_t = P_{CTT} + P_{TTT}$$

(6)

Similarly, the overall probability of editing the bystander cytidine, \(P_b\), was calculated as:

$$P_b = P_{TTC} + P_{TTT}$$

(7)
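
Eqs. (1–7) can be evaluated directly once a set of rates is chosen. The sketch below is a minimal Python transcription of those closed forms, not the authors' code. The function name and the numerical rates in the usage lines are illustrative assumptions: u0 and u1 are set to 1 as an arbitrary time unit, and the remaining rates are chosen only so that the ratios u4/u0, w1/u3 and w0/u1 match the values quoted later in the text.

```python
import math

def editing_outcomes(u0, u1, u2, u3, u4, w0, w1, w2, m=0.0):
    """Evaluate Eqs. (1)-(7); all rates must share the same time unit."""
    # Shared denominator of Eqs. (4) and (5).
    denom = ((u1 + u2 + w0) * (u3 + w1) * (u3 + w2) * (u0 + u4)
             - u1 * w1 * (u3 + w2) * (u0 + u4)
             - u2 * w2 * (u3 + w1) * (u0 + u4)
             - u0 * w0 * (u3 + w1) * (u3 + w2))
    p5 = u0 * u1 * u3 * (u3 + w2) / denom  # Eq. (4)
    p6 = u0 * u1 * u3 * (u3 + w1) / denom  # Eq. (5), numerator as printed

    def release_fraction(u, w):
        # Fraction appearing in Eqs. (1)-(2): the complex is released before
        # the remaining cytidine (binding rate u, unbinding rate w) is edited.
        num = u4 * w0 * (u3 + w)
        den = ((u + w0) * (u3 + w) * (u4 + m * u0)
               - u * w * (u4 + m * u0)
               - m * u0 * w0 * (u3 + w))
        return num / den

    f5 = release_fraction(u2, w2)  # target edited first, bystander remains
    f6 = release_fraction(u1, w1)  # bystander edited first, target remains
    p_ctt = p5 * f5                           # Eq. (1)
    p_ttc = p6 * f6                           # Eq. (2)
    p_ttt = p5 * (1 - f5) + p6 * (1 - f6)     # Eq. (3)
    return {
        "CTT": p_ctt, "TTC": p_ttc, "TTT": p_ttt,
        "CTC": 1.0 - (p_ctt + p_ttc + p_ttt),
        "P_t": p_ctt + p_ttt,                 # Eq. (6)
        "P_b": p_ttc + p_ttt,                 # Eq. (7)
    }

# Illustrative rates only (assumed, not fitted values from the paper).
demo = editing_outcomes(u0=1.0, u1=1.0, u2=1.0, u3=1.1, u4=2.1,
                        w0=2.9e-5, w1=12.54, w2=12.54 * math.exp(6.24))
print(demo)
```

Because the four outcome probabilities must sum to one, checking that every returned value lies between 0 and 1 is a quick sanity test for any new set of rates.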

Our goal is to parameterize the model by reproducing the experimentally measured probabilities \(P_t\) and \(P_b\). Here, we assume that the binding between the cytidine (both target and bystander) and the deaminase is mainly a diffusion-controlled process. Therefore, considering that the target and bystander cytidines are chemically identical and spatially very close, we adopted the following approximations:

$$u_1 = u_2$$

(8)

$$w_2 = w_1 e^{\Delta\Delta E_0 / k_B T} = w_1 e^{\left[\Delta E_0(\mathrm{bystander}) - \Delta E_0(\mathrm{target})\right]/k_B T}$$

(9)

The physical meaning of these expressions is the following: the binding rates to the target and the bystander are the same, but unbinding is governed by the strength of the interactions between the DNA substrate and the protein complex. In Eq. (9), the term \(\Delta E_0\) represents the binding free energy between the ssDNA binding motif and the deaminase. \(\Delta\Delta E_0\) represents the difference in \(\Delta E_0\) between dissociation from the target base and dissociation from the bystander base. This difference arises from the sequence shift in the binding interface. An example is shown in Fig. 1, where the sequence of the ssDNA binding motif changes from “T-1C0” in the case of target editing to “G-1C0” in the case of bystander editing. This change can be formalized as a mutation from thymine to guanine at position -1, which perturbs the binding free energy and in turn influences the unbinding rate w of the cytidine from the catalytic site. Note that this approximation can also be explained using thermodynamic arguments, since the ratio between the rates of binding and unbinding is related to the free energy difference between two states: the state where the protein-RNA complex is bound to the DNA chain and the state where both the DNA and the protein complex are free, \(\frac{u_1}{w_1} = e^{-\Delta E_0(\mathrm{target})/k_B T}\), \(\frac{u_2}{w_2} = e^{-\Delta E_0(\mathrm{bystander})/k_B T}\). Using Eq. (8) one can derive the result in Eq. (9).

Similarly, any deaminase mutation can be represented as a perturbation in binding free energy relative to the wild type,

$$w_{1,\mathrm{mutation}} = w_{1,WT}\, e^{\Delta\Delta E_m / k_B T} = w_{1,WT}\, e^{\left[\Delta E_0(\mathrm{mutation}) - \Delta E_0(WT)\right]/k_B T}$$

(10)

\(\Delta\Delta E_m\) represents the difference in free energy due to mutations.

Substituting Eqs. (8–10) into Eqs. (6–7), we obtain:

$$P_t = \frac{\left(\gamma_1 + m + \gamma_1\gamma_3 + \gamma_1\gamma_2\gamma_3 e^{\frac{\Delta\Delta E_m}{k_B T}}\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right) + (\gamma_1 + m)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\right)}{\left(\gamma_1 + m + \gamma_1\gamma_3 + \gamma_1\gamma_2\gamma_3 e^{\frac{\Delta\Delta E_m}{k_B T}}\right) \cdot \left[\left(2 + 2\gamma_1 + \gamma_1\gamma_3\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right) - \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\left(1 + \gamma_1\right)\left(1 + 2\gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}} + e^{\frac{\Delta\Delta E_0}{k_B T}}\right)\right]}$$

(11)

$$P_b = \frac{\left(\gamma_1 + m + \gamma_1\gamma_3 + \gamma_1\gamma_2\gamma_3 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\right) + (\gamma_1 + m)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right)}{\left(\gamma_1 + m + \gamma_1\gamma_3 + \gamma_1\gamma_2\gamma_3 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right) \cdot \left[\left(2 + 2\gamma_1 + \gamma_1\gamma_3\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\right)\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right) - \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}\left(1 + \gamma_1\right)\left(1 + 2\gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}} + e^{\frac{\Delta\Delta E_0}{k_B T}}\right)\right]}$$

(12)

$$\gamma_1 = u_4 / u_0$$

(13)

$$\gamma_2 = w_{1,WT} / u_3$$

(14)

$$\gamma_3 = w_0 / u_1$$

(15)
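
As a rough illustration, Eqs. (11–12) can be coded in terms of the reduced parameters of Eqs. (13–15). The function name below is hypothetical, energies are passed in units of \(k_B T\), and the value \(\Delta\Delta E_0 = 6.24\ k_B T\) used in the usage lines is an assumption made here for illustration; the remaining numbers are the parameter values quoted in the text.

```python
import math

def target_and_bystander(gamma1, gamma2, gamma3, m, dde0, ddem):
    """Return (P_t, P_b) from Eqs. (11)-(12); dde0 and ddem are in k_B T."""
    a = gamma2 * math.exp(ddem)          # gamma2 * exp(ddE_m)
    b = gamma2 * math.exp(dde0 + ddem)   # gamma2 * exp(ddE_0 + ddE_m)
    # Leading factors of the two denominators.
    q_t = gamma1 + m + gamma1 * gamma3 + gamma1 * gamma3 * a
    q_b = gamma1 + m + gamma1 * gamma3 + gamma1 * gamma3 * b
    # Square bracket shared by Eqs. (11) and (12).
    bracket = ((2 + 2 * gamma1 + gamma1 * gamma3) * (1 + a) * (1 + b)
               - a * (1 + gamma1) * (1 + 2 * b + math.exp(dde0)))
    p_t = (q_t * (1 + b) + (gamma1 + m) * (1 + a)) / (q_t * bracket)
    p_b = (q_b * (1 + a) + (gamma1 + m) * (1 + b)) / (q_b * bracket)
    return p_t, p_b

# gamma values and m as quoted in the text; dde0 = 6.24 is an assumed value.
for ddem in (0.0, 4.5):
    p_t, p_b = target_and_bystander(2.1, 11.4, 2.9e-5, 0.0, 6.24, ddem)
    print(f"ddE_m = {ddem:.1f} kBT: P_t = {p_t:.3f}, P_b = {p_b:.3f}")
```

Grouping the exponentials into the two quantities `a` and `b` keeps the code close to the printed equations while avoiding repeated calls to `math.exp`.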

Equations (11–15) give the full analytical expressions, in terms of kinetic rates and binding affinities, that can be used to calculate the editing probabilities. Six free parameters describe the base editing process (\(\gamma_1\), \(\gamma_2\), \(\gamma_3\), \(m\), \(\Delta\Delta E_0\), \(\Delta\Delta E_m\)), but this number can be reduced using additional information. For example, previous binding experiments19 have indicated that A3A binds to ssDNA with \(K_d = 57\ \mu M\), \(K_M = 62\ \mu M\) and \(k_{cat} = 1.1/s\). From these values, one can infer that \(w_{1,WT} = 12.54/s\) and \(u_3 = 1.1/s\). Therefore, after the cytidine binds to the catalytic site, the relative probability between unbinding and the chemical transformation step, \(\gamma_2\), is 11.4. Next, if the changed ssDNA sequence no longer perfectly matches the sgRNA sequence, we assume that successful editing prevents rebinding of Cas9 to ssDNA, and therefore \(m = 0\). Nevertheless, we show below that this assumption has only a minor effect on the final results. Lastly, we performed all-atom computational simulations to estimate \(\Delta\Delta E_0\) and \(\Delta\Delta E_m\), as shown in the next section. As a result, only two free parameters remain in the model, \(\gamma_1\) and \(\gamma_3\) (Eqs. 13 and 15), both of which are parameterized by reproducing the experimental values of \(P_t\) and \(P_b\).
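
The text does not spell out how \(w_{1,WT}\) follows from the measured constants. One reading that reproduces the quoted numbers assumes a standard Michaelis–Menten scheme with \(K_d = w_1/k_{on}\) and \(K_M = (w_1 + k_{cat})/k_{on}\), so that \(K_M/K_d = 1 + k_{cat}/w_1\). The short sketch below works through that arithmetic; the function name is hypothetical and the scheme itself is an assumption, not a statement of the authors' procedure.

```python
def unbinding_rate(k_d, k_m, k_cat):
    """Solve K_M / K_d = 1 + k_cat / w1 for w1 (assumed Michaelis-Menten scheme)."""
    return k_cat * k_d / (k_m - k_d)

k_d, k_m = 57.0, 62.0   # micromolar, as quoted in the text
k_cat = 1.1             # per second, identified with u3 in the model

w1_wt = unbinding_rate(k_d, k_m, k_cat)   # unbinding rate of the wild type
gamma2 = w1_wt / k_cat                    # Eq. (14)
print(f"w1_WT = {w1_wt:.2f} /s, gamma2 = {gamma2:.1f}")
```

Under this assumption the on-rate cancels out, so only the ratio of the two equilibrium constants and \(k_{cat}\) are needed.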

Computational estimates of binding free energy changes

We chose four CBEs developed by the Joung group11, A3A(S99A), A3A(Y130F), A3A(N57Q), and A3A(N57G), to calculate the binding free energy changes between ssDNA and A3A. These CBE variants reduce the bystander effect to different extents while maintaining a high probability of on-target editing. The binding interface in the wild-type A3A-ssDNA complex is shown in the crystal structure (PDB ID: 5KEG) (Fig. 2a). The carbonyl oxygen of Ser99 forms a hydrogen bond with the N4 atom of the cytidine in the catalytic site (dC0). The hydroxyl group of Tyr130 forms a hydrogen bond with the 5’-phosphate of dC0. Lastly, the nitrogen atom in the sidechain of Asn57 forms a hydrogen bond with the O3 atom of dC0. Therefore, all four CBE variants appear to destabilize the binding between A3A and ssDNA (\(\Delta\Delta E_m > 0\)) by breaking this hydrogen-bonding network. In addition, since A3A recognizes the T-1C0 motif rather than the G-1C0 motif, the binding free energy to the deaminase should be higher (more repulsive) for the bystander cytidine than for the target cytidine (\(\Delta\Delta E_0 > 0\)). To quantitatively calculate \(\Delta\Delta E_0\) and \(\Delta\Delta E_m\), we used “alchemical free-energy calculations” based on MD simulations20,21. A thermodynamic cycle was constructed to convert \(\Delta\Delta E_0\) and \(\Delta\Delta E_m\) (Fig. 2b, \(\Delta G_3 - \Delta G_1\)) to the difference between two slow alchemical transitions (Fig. 2b, \(\Delta G_2 - \Delta G_4\)). One transition is the free energy change of the A3A-ssDNA complex due to mutations (Fig. 2b, \(\Delta G_2\)), whereas the other is the free energy change of A3A alone due to mutations (Fig. 2b, \(\Delta G_4\)). The calculated values indeed show that the mutations cause an apparent increase in the deaminase-ssDNA binding free energy (Fig. 2c), consistent with predictions based on the structural data.
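
The cycle-closure step can be written out explicitly. The assignment used here, with \(\Delta G_1\) and \(\Delta G_3\) taken as the ssDNA binding free energies before and after the mutation, is inferred from the description above rather than read from Fig. 2b, so it should be treated as an assumption. Because free energy is a state function, the two routes from free wild-type A3A to the mutated complex must give the same total:

$$\Delta G_1 + \Delta G_2 = \Delta G_4 + \Delta G_3 \quad\Longrightarrow\quad \Delta\Delta E = \Delta G_3 - \Delta G_1 = \Delta G_2 - \Delta G_4$$

The left-hand difference involves physical binding events that are expensive to sample, whereas the right-hand difference involves only the two alchemical transformations that are simulated.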

Fig. 2: Calculation of binding free energy changes between ssDNA binding motif and A3A deaminase under various mutations.

a Hydrogen-bonding network between cytidine dC0 and A3A residues 57, 99 and 130. b Thermodynamic cycle used to calculate the binding free energy changes: ssDNA (black line), A3A (blue balloons). A mutation at the binding interface is shown by a transition from small orange circle to orange triangle. c Computationally estimated changes in binding free energy for A3A deaminase mutations S99A, Y130F, N57Q, N57G and binding free energy change between A3A binding to target and bystander cytidines. The energy unit is kBT and T = 300 K. Data are presented as mean values ± s.e.m, estimated by Bennett’s acceptance ratio method for 200,000 data points. Source data are provided as a Source Data file.

The rationale for A3A mutants that reduce the bystander effect

To check whether our model can reproduce the experimentally measured on-target and bystander editing probabilities, we substituted the values of \(\Delta\Delta E_0\) and \(\Delta\Delta E_m\) calculated above into Eqs. (11–15) and adjusted \(\gamma_1\) and \(\gamma_3\). The resulting theoretical prediction is in very good agreement with the experimental measurements11 (Fig. 3a), with values \(\gamma_1 = \frac{u_4}{u_0} = 2.1\) and \(\gamma_3 = \frac{w_0}{u_1} = 2.9 \times 10^{-5}\). The value of \(\gamma_1\) indicates that a significant fraction of BEs fail to initiate editing, whereas the value of \(\gamma_3\) suggests that the residence time of Cas9 on ssDNA is sufficient for the deaminase to function. We note here that the choice of m, which quantifies the effect of the sgRNA mismatch on the rebinding rate of Cas9 to ssDNA, does not significantly affect the result (Fig. S1). The model also reproduced well the editing patterns at multiple genomic loci (Fig. S3), demonstrating its generality.

Fig. 3: Theoretical model of the A3A-BE3 editing system.

a Comparison between theoretical calculations (solid lines) and experimental data11 (circles, representing the mean ± s.e.m. of three independent biological replicates). \(\Delta\Delta E_m\) represents the perturbation in binding free energy due to the different mutations; kBT is the unit of energy; \(P_t\) and \(P_b\) represent the overall probability of editing the target and bystander cytidine, respectively. b Calculated editing probabilities for products CTT, TTC and TTT. \(P_t = P_{CTT} + P_{TTT}\); \(P_b = P_{TTC} + P_{TTT}\). Source data are provided as a Source Data file.

Our theoretical model can be used to explain why the single mutation N57G greatly improves the editing selectivity of A3A-BE3. First, the ratio between the probability of having the target cytidine edited before the bystander (Fig. 1, transitions \(2 \to 3 \to 5 \to \ldots\)) and that of the reverse order of events (Fig. 1, transitions \(2 \to 4 \to 6 \to \ldots\)) can be calculated as:

$$R_1 = \frac{P(\mathrm{state}\ 2 \to 3 \to \ldots)}{P(\mathrm{state}\ 2 \to 4 \to \ldots)} = \frac{\frac{u_3}{u_3 + w_1}}{\frac{u_3}{u_3 + w_2}} = \frac{1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}}{1 + \gamma_2 e^{\frac{\Delta\Delta E_m}{k_B T}}}$$

(16)

with \(\gamma_2 = 11.4\). In the case of A3A, \(R_1\) can be approximated as \(e^{\frac{\Delta\Delta E_0}{k_B T}}\). As A3A significantly prefers the TC motif to the GC motif (\(\Delta\Delta E_0 \sim 6\ k_B T\)), this ratio is larger than 400. As a result, for both A3A(WT) and A3A(N57G), the probability of having only the bystander edited is very low (Fig. 3b, blue line, almost zero). After the target cytidine is edited (Fig. 1, state 5), the system can either be released with the product CTT (Fig. 1, state 13) or continue to edit the bystander, leading to the product TTT (Fig. 1, state 12). The outcome is largely influenced by the ratio between \(w_2\) and u3. If \(w_2\) is significantly larger than u3, bystander editing will be blocked because the residence time of the bystander cytidine in the catalytic site is too short to complete the transition to thymidine. Analytically, after the target cytidine is edited, the probability ratio between the CTT and TTT outcomes can be calculated as:

$$R_2 = \frac{P(\mathrm{state}\ 5 \to 13)}{P(\mathrm{state}\ 5 \to 12)} = \frac{w_0}{u_2}\left(1 + \frac{w_2}{u_3}\right) = \gamma_3\left(1 + \gamma_2 e^{\frac{\Delta\Delta E_0 + \Delta\Delta E_m}{k_B T}}\right)$$

(17)
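
Eqs. (16–17) are simple enough to evaluate by hand, but a small script makes it easy to see how the two ratios respond to a mutation. This is a sketch under stated assumptions: the function name is hypothetical, and \(\Delta\Delta E_0 = 6.24\ k_B T\) is a value chosen here so that Eq. (17) returns roughly the wild-type ratio reported in the text, which itself only quotes \(\Delta\Delta E_0 \sim 6\ k_B T\).

```python
import math

def ordering_and_release_ratios(gamma2, gamma3, dde0, ddem):
    """Return (R1, R2) from Eqs. (16)-(17); energies are in units of k_B T."""
    a = gamma2 * math.exp(ddem)          # w1 / u3 for the variant
    b = gamma2 * math.exp(dde0 + ddem)   # w2 / u3 for the variant
    r1 = (1 + b) / (1 + a)               # target-first vs bystander-first
    r2 = gamma3 * (1 + b)                # CTT vs TTT once the target is edited
    return r1, r2

GAMMA2, GAMMA3 = 11.4, 2.9e-5   # values quoted in the text
DDE0 = 6.24                     # assumed; see the lead-in

for label, ddem in (("WT", 0.0), ("N57G", 4.5)):
    r1, r2 = ordering_and_release_ratios(GAMMA2, GAMMA3, DDE0, ddem)
    print(f"{label}: R1 = {r1:.0f}, R2 = {r2:.2f}")
```

Since the inputs are rounded, the printed ratios should be expected to land near, rather than exactly on, the values reported for the wild type and N57G.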

For A3A(WT), \(R_2\) is 0.17, meaning that the dominant product is TTT (Fig. 3b, purple square at \(\Delta\Delta E_m = 0\)). This explains why, for wild-type A3A, the editing efficiency of the target cytidine is similar to that of the bystander. In sharp contrast, as \(\Delta\Delta E_m\) increases by 4.5 \(k_B T\) for A3A(N57G), \(R_2\) rises to 14.8, and the dominant edited product changes to CTT (Fig. 3b, green line at \(\Delta\Delta E_m = 4.5\ k_B T\)). In this case, A3A(N57G) minimizes the bystander effect while maintaining a high probability of editing the target base.

Similar arguments can be presented for the other three A3A mutants (S99A, Y130F, and N57Q). Calculations (Table S1) show that \(R_2\) gradually increases from 0.42 for S99A to 4.91 for N57Q, indicating that the bystander effect gradually decreases. This is consistent with experimental findings (Fig. 3). However, it is critical to note that mutations at the deaminase-ssDNA interface have a non-monotonic effect on editing selectivity. Here, selectivity is defined as the difference between the probabilities of editing the target and editing the bystander. Weakening the binding interface by up to 4–6 kBT (depending on the system) greatly improves selectivity (Fig. 3a), but selectivity drops when \(\Delta\Delta E_m\) increases further. This result can be explained using the following physical considerations. Increasing \(\Delta\Delta E_m\) leads to faster unbinding of the cytidine from the deaminase. At moderate values of \(\Delta\Delta E_m\), target editing is only weakly affected (Fig. 1, state \(2 \to 5\)) but bystander editing is blocked (Fig. 1, state \(5 \to 12\)). However, for very large values of \(\Delta\Delta E_m\), both editing pathways are essentially blocked and the system prefers to go into the inactive state (Fig. 1, state 1). Therefore, proper modulation of the binding interface is the key to optimizing base editing selectivity. We further demonstrate this point in the next section.
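
The non-monotonic behaviour can be illustrated by scanning \(\Delta\Delta E_m\) and recording the selectivity \(P_t - P_b\). The sketch below uses an algebraic rearrangement of Eqs. (11–12) that holds under the approximation \(u_1 = u_2\) and avoids subtracting large, nearly equal terms; this rearrangement, the function names, the scan range, and the value \(\Delta\Delta E_0 = 6.24\ k_B T\) are all assumptions made here for illustration.

```python
import math

def pt_pb(ddem, dde0=6.24, gamma1=2.1, gamma2=11.4, gamma3=2.9e-5, m=0.0):
    """(P_t, P_b) from a rearranged form of Eqs. (11)-(12); energies in k_B T."""
    a = gamma2 * math.exp(ddem)
    b = gamma2 * math.exp(dde0 + ddem)
    # The square bracket of Eqs. (11)-(12), expanded so nothing is subtracted.
    big = (1 + gamma1) * (2 + a + b) + gamma1 * gamma3 * (1 + a) * (1 + b)
    p5 = (1 + b) / big   # target cytidine edited first
    p6 = (1 + a) / big   # bystander cytidine edited first
    q_a = gamma1 + m + gamma1 * gamma3 * (1 + a)
    q_b = gamma1 + m + gamma1 * gamma3 * (1 + b)
    p_t = p5 + p6 * (gamma1 + m) / q_a
    p_b = p6 + p5 * (gamma1 + m) / q_b
    return p_t, p_b

def selectivity(ddem):
    p_t, p_b = pt_pb(ddem)
    return p_t - p_b

# Scan 0 to 12 k_B T in steps of 0.1 k_B T (an arbitrary illustrative grid).
grid = [0.1 * i for i in range(121)]
best = max(grid, key=selectivity)
print(f"selectivity peaks near ddE_m = {best:.1f} kBT")
for ddem in (0.0, best, 12.0):
    p_t, p_b = pt_pb(ddem)
    print(f"ddE_m = {ddem:4.1f}: P_t = {p_t:.3f}, P_b = {p_b:.3f}")
```

With these inputs the maximum falls at a moderate \(\Delta\Delta E_m\), while at the upper end of the scan both \(P_t\) and \(P_b\) collapse, which mirrors the qualitative argument in the paragraph above.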

The computational model helps design new A3G-BEs with improved editing selectivity

In this section, we employ our theoretical model to optimize the editing selectivity of the base editor A3G3.1 (Fig. 4a). First, the editing profiles at both the target and bystander bases were calculated using Eqs. (11–15). Our calculations show that improving the editing selectivity of A3G3.1 requires mutations that increase \(\Delta\Delta E_m\) by 2–3 kBT (shaded area in Fig. 4a). Second, specific mutations were designed and \(\Delta\Delta E_m\) was calculated for each mutation by alchemical free-energy calculations as detailed above. Four mutations (T218S, T218N, T218I and T218G) met these selection criteria. An example of a mutation that fails is T218W, which loses editing activity at the target base owing to an excessive increase in \(\Delta\Delta E_m\) (Fig. 4a). We then experimentally tested these four mutations at three genomic loci containing the “TCC” motif: EMX1 #a3, PPP1R12C #a1, and ATM #1. We chose target sites with the “TCC” motif because they are generally more challenging than “ACC” or “GCC” sites for selectively editing the second C, since “T” and “C” are structurally more similar. In the “TCC” case, the deaminase tends to treat “T” as a “C” and preferentially edits the bystander first “C” as well. In our tests, A3G3.14 (A3G3.1 with T218S) and A3G3.15 (A3G3.1 with T218N) generally show much improved editing selectivity (Fig. 4b), with marginally or modestly decreased editing efficiency. Therefore, A3G3.14 and A3G3.15 were further tested at five other genomic loci: MMS22L #1, FANCE #1, MRPL44 #1, FANCF #c1, and MRPL40 #1 (Fig. 4c). Compared with the original A3G3.1, the average target-to-bystander editing ratio increases from 2.9-fold to 8.6-fold with the mutations.

Fig. 4: Engineering of A3G-BEs.

a Theoretical calculation. C6 represents the target base. C5 represents the bystander. The shaded area represents the region with improved editing selectivity. b Experimental measurements at three genomic loci for the four mutations selected by the theoretical model; A3G3.1 is the full-length APOBEC3G deaminase with a set of mutations that increase the catalytic efficiency. A3G3.8, 3.9, 3.14, and 3.15 are A3G3.1 with T218G, T218I, T218S, and T218N, respectively. Bar plots represent the mean ± s.d. of three independent biological replicates. c Experimental measurements at five other genomic loci for A3G3.14 (T218S) and A3G3.15 (T218N). Bar plots represent the mean ± s.d. of three independent biological replicates, except for the bar representing the editing efficiency of A3G3.14 at the FANCF #c1 site, which shows the mean ± s.d. of four biological replicates. Source data are provided as a Source Data file.

Our results indicate that mutagenesis stringency and genomic site are tightly coupled in determining the target-to-bystander editing ratio. Mutagenesis stringency influences the overall editing patterns, while the specific genomic site dictates which mutation performs best. The basic rule is that relatively high mutagenesis stringency (i.e., high \(\Delta\Delta E_m\)) is needed for genomic sites with low editing selectivity, and vice versa. The eight genomic loci we tested can be divided into two groups with regard to the A3G3.1 editing selectivity. The first group, including the EMX1 #a3, FANCE #1, and MRPL40 #1 sites, showed low selectivity, with a target-to-bystander editing ratio of around 1.07–1.23, whereas the second group, including the PPP1R12C #a1, ATM #1, MMS22L #1, MRPL44 #1 and FANCF #c1 sites, showed some selectivity, with a target-to-bystander editing ratio ranging from 2.87 to 5.97. This natural site-dependent difference in selectivity has multiple causes, such as sequence context and the level of DNA accessibility. Therefore, the first group needs mutations with higher \(\Delta\Delta E_m\) than the second group. The theoretical model predicts that \(\Delta\Delta E_m\) follows the order S < N (Fig. 4a). As a result, for the first group, A3G3.15 (T218N) generally performs better than A3G3.14 (T218S). In contrast, for the second group, A3G3.14 (T218S) performs better. These results indicate that mutagenesis stringency and genomic site should be considered simultaneously during the design process.

Currently, one difficulty in designing BEs is that there are few methods to predict the editing pattern of a novel mutation before experimental validation. In addition, the same mutation can function differently at different genomic loci. Using our model, the editing patterns of those two mutations on A3G3.1 are computationally predictable and well validated by experiments (Fig. 4b and 4c). This result demonstrates the power of combining theoretical and experimental approaches. The EMX1 #a3 site was also tested in three cell lines, K562, Jurkat, and HeLa (Supplementary Fig. 4). Although these cell lines generally have low transfection efficiency, we still observed an increase in the target-to-bystander editing ratio in A3G3.15-treated cells compared with those treated with A3G3.1 (Supplementary Fig. 4 and Fig. 4b) for two cell lines, K562 (two-tailed p = 0.0005 with unpaired t-test) and Jurkat (p = 0.0001). The improvement for HeLa cells is less significant and needs further optimization in the future.
