Generation of large datasets for adenine and cytosine base editing via high-throughput screening with self-targeting libraries
To capture base editing outcomes of SpCas9 CBEs and ABEs across thousands of sites in a single experiment, we generated a pooled lentiviral library of constructs encoding unique 20-nt sgRNA spacers paired with their corresponding target sequences (20-nt protospacer and a downstream NGG PAM site) (Fig. 1a). Our library included 23,123 randomly generated target sequences and 5,171 disease-associated human loci with transition mutations, yielding a comprehensive and diverse dataset for machine learning (Supplementary Data 1). Oligonucleotides containing the sgRNAs and corresponding target sequences were synthesized in a pool and cloned into a lentiviral backbone containing an upstream U6 promoter and a puromycin resistance cassette. HEK293T cells were then transduced at 1000× coverage with a multiplicity of infection (MOI) of 0.5 and selected with puromycin. Next, cells were transfected with Tol2-compatible plasmids encoding blasticidin resistance and one of the four commonly used base editors: ABEmax (containing ecTadA7.10), CBE4max (containing rAPOBEC1), ABE8e (containing ecTadA-8e), and Target-AID (containing the AID ortholog PmCDA1) (Supplementary Fig. 1). Co-transfection with a Tol2 transposase plasmid allowed stable integration and prolonged expression of the base editors. After 10 days in culture, cells were harvested, and genomic DNA was collected for amplicon high-throughput sequencing (HTS) (Fig. 1b and see the ‘Methods’ section).


a The design of the self-targeting library was adapted from refs. 27,28,29,30. The lentiviral library contains the sgRNA expression cassette and the target locus on the same DNA molecule. The sgRNA (spacer and scaffold) is transcribed under the control of a U6 promoter and is designed to direct the base editor (nCas9-deaminase fusion) to the 20-nt sequence upstream of the protospacer adjacent motif (PAM). hU6 human U6 promoter, ef1α elongation factor 1α promoter, nCas9 nickase Cas9, sgRNA single-guide RNA, Puro puromycin selection marker. b Overview of library screening. c–f Base editor profiles for loci above mean editing efficiency for c ABEmax, d CBE4max, e ABE8e, and f Target-AID. Plots show the average efficiency of A-to-G or C-to-T base conversions at each position across the protospacer target sequence. The top horizontal bar illustrates the favored activity window of the respective deaminase. g–j Proportion of the different tri-nucleotide motifs for loci above mean editing efficiency for g ABEmax, h CBE4max, i ABE8e, and j Target-AID. The numbers of analyzed target sequences shown in c–j are as follows: n = 8,558 (ABEmax); 9,534 (CBE4max); 3,416 (ABE8e); and 10,177 (Target-AID).
We observed high consistency between both experimental replicates (Pearson’s r² = 0.88 (ABEmax), 0.86 (CBE4max), 0.92 (ABE8e), and 0.88 (Target-AID)) (Supplementary Fig. 2), indicating comprehensive and robust sampling of edited target sites. Mean base editing efficiencies (defined here as the fraction of mutant reads over all sampled reads of a target site) were 4.26% for ABEmax, 3.61% for CBE4max, 3.15% for ABE8e, and 3.13% for Target-AID (Supplementary Fig. 3). In line with previous studies, we observed maximum editing at position 6 (counting from the PAM-distal end) with ABEmax, CBE4max, and ABE8e, and at position 3 for Target-AID (Fig. 1c–f)7,8,9,10. Interestingly, the editing window of ABE8e was broader than that of ABEmax, and that of Target-AID was shifted PAM-distally compared to CBE4max (Fig. 1e, f). Analysis of the trinucleotide sequence context, moreover, confirmed that ecTadA7.10 of ABEmax and rAPOBEC1 of CBE4max have a preference for editing bases that are preceded by a T (Fig. 1g, h)10,11,12,13. ecTadA7.10 additionally shows an aversion to an upstream A and a preference for a downstream C. Notably, ecTadA-8e of ABE8e displayed a reduced sequence preference, although editing of bases preceded by an A was still largely disfavored (Fig. 1i). Compared to rAPOBEC1, PmCDA1 of Target-AID lacked the requirement of a preceding T for efficient editing, but motifs where the targeted base is followed by a C were disfavored (Fig. 1j).
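As a simple illustration of how such per-target and per-position efficiencies can be derived from amplicon sequencing reads, the sketch below computes the fraction of mutant reads over all sampled reads of a target site and the average conversion rate at each substrate position. This is a minimal example under assumed inputs (`reads`, `ref`) and hypothetical function names, not the analysis pipeline used in this study.

```python
def editing_efficiency(reads, ref, sub_from="A", sub_to="G"):
    """Fraction of reads carrying at least one sub_from->sub_to conversion."""
    mutant = sum(
        any(r == sub_from and q == sub_to for r, q in zip(ref, read))
        for read in reads
    )
    return mutant / len(reads) if reads else 0.0

def per_position_efficiency(reads, ref, sub_from="A", sub_to="G"):
    """Average conversion rate at each substrate position of the protospacer."""
    rates = {}
    for pos, base in enumerate(ref, start=1):  # position 1 = PAM-distal end
        if base == sub_from:
            converted = sum(read[pos - 1] == sub_to for read in reads)
            rates[pos] = converted / len(reads)
    return rates

# Toy usage: two of three reads carry an A-to-G conversion at position 6.
ref = "GTCACATGGAGTCCAAGTCA"
reads = ["GTCACGTGGAGTCCAAGTCA", "GTCACGTGGAGTCCAAGTCA", ref]
print(editing_efficiency(reads, ref))        # 0.667
print(per_position_efficiency(reads, ref))   # conversion rate per A position
```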
Development of BE-DICT, an attention-based deep learning model predicting base editing outcomes
Potentially predictive features of CRISPR/Cas9 sgRNA activity, such as the GC content and minimum Gibbs free energy of the sgRNA, did not influence base editing rates (Supplementary Fig. 4). This prompted us to utilize the comprehensive base editing data generated in the ABE and CBE target library screens for designing and training a machine learning model capable of predicting base editing outcomes at any given target site. We established BE-DICT (Base Editing preDICTion via attention-based deep learning), an attention-based deep learning algorithm that models and interprets the dependency of base editing on the protospacer target sequence. The model is based on multi-head self-attention inspired by the Transformer encoder architecture14. It takes the nucleotide sequence of the protospacer as input and computes the probability of editing for each target nucleotide as output (Fig. 2a). The formal descriptions of the model and the different computations involved are reported in Supplementary Notes 1–3. In short, BE-DICT encodes each base within the protospacer as a learned fixed-length vector representation and assigns it a weight (attention score). The training labels are dichotomous: bases with editing efficiencies at or above the mean were classified as edited, and bases below the mean as non-edited. The output is a probability score, reflecting the likelihood (between 0 and 1) with which a target base will be edited (C-to-T or A-to-G). To train and test the model, we included all target sequences with at least one classified base edit (8,558 for ABEmax; 9,534 for CBE4max; 3,416 for ABE8e; 10,177 for Target-AID). To reduce the tendency towards edited target sequences, which could result in an inherent bias of the prediction tool, we also added unedited target sequences at a ratio of 1:4 (Supplementary Data 1). For model training, we used ∼80% of the dataset and performed stratified random splits of the remaining sequences to generate an equal ratio (1:1) between the test and validation datasets. We repeated this process five times (denoted as runs), training and evaluating a separate model for each base editor in every run. BE-DICT performance was then assessed using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR). All four models achieved an AUC between 0.92 and 0.95 and an AUPR between 0.733 and 0.806 (Fig. 2b–e). Notably, at positions within the activity window with a balanced distribution of edited vs. unedited substrate bases, BE-DICT performed with significantly higher accuracy than a per-position majority class predictor, a baseline model that treats nucleotide conversion as a Bernoulli trial and uses maximum-likelihood estimation to compute the probability of editing success at each position (Fig. 2f–i, Supplementary Fig. 5).
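To make the three-block architecture concrete, the sketch below shows a minimal per-base attention classifier in PyTorch. It is a simplified illustration under assumed hyperparameters (embedding size, number of heads, a single encoder block) and is not the published BE-DICT implementation, which is formally described in Supplementary Notes 1–3.

```python
import torch
import torch.nn as nn

class PerBaseEditingModel(nn.Module):
    def __init__(self, seq_len=20, d_model=64, n_heads=4, d_ff=128):
        super().__init__()
        # (1) Embedding block: nucleotide identity plus protospacer position.
        self.nuc_emb = nn.Embedding(4, d_model)          # A, C, G, T
        self.pos_emb = nn.Embedding(seq_len, d_model)
        # (2) Encoder block: multi-head self-attention and a feed-forward
        #     network, each with a residual connection and layer normalization.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        # (3) Output block: per-position classifier returning an editing
        #     probability for every protospacer position.
        self.classifier = nn.Linear(d_model, 1)

    def forward(self, tokens):              # tokens: (batch, seq_len) ints in 0-3
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.nuc_emb(tokens) + self.pos_emb(pos)
        attn_out, attn_weights = self.attn(x, x, x)   # weights ~ attention scores
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))
        probs = torch.sigmoid(self.classifier(x)).squeeze(-1)  # (batch, seq_len)
        return probs, attn_weights

# Usage on a dummy batch of two integer-encoded 20-nt protospacers.
model = PerBaseEditingModel()
dummy = torch.randint(0, 4, (2, 20))
probs, attn = model(dummy)   # probs[i, j]: predicted editing probability at position j
```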


a Design of an attention-based deep learning algorithm to predict base editing probabilities. Given a target sequence, the model returns a confidence score to predict the chance of target base conversions. The model has three main blocks: (1) An embedding block that embeds both the nucleotide and its corresponding position from a one-hot encoded representation to a dense fixed-length vector representation. (2) An encoder block that contains a self-attention layer (with multi-head support), layer normalization31 and residual connections, and a feed-forward network. (3) An output block that contains a position attention layer and a classifier layer. b–e The average AUC achieved across five runs (interpolated) for models trained on data from high-throughput base editing experiments. f–i Line plot of per-position accuracy of the trained models across five individual runs for each base editor in comparison to the accuracy of a majority class baseline predictor. Standard deviation is depicted as a band along the line plot.
BE-DICT can be utilized to predict editing efficiencies at endogenous loci and predominantly attends to bases flanking the target base
Base editing at endogenous loci may also be affected by protospacer sequence-independent factors, such as chromatin accessibility. We therefore tested the accuracy of BE-DICT in predicting base editing outcomes at 18 separate endogenous genomic loci for ABEmax and ABE8e, and 16 endogenous genomic loci for CBE4max and Target-AID. HEK293T cells were co-transfected with plasmids expressing the sgRNA and base editor, and genomic DNA was isolated after 4 days for targeted amplicon HTS analysis. Across all tested loci, we observed a strong correlation between experimental editing rates and the BE-DICT probability score (Pearson’s r = 0.78 for ABEmax, 0.68 for CBE4max, 0.57 for ABE8e, and 0.64 for Target-AID; Fig. 3a–d; Supplementary Data 2). Further validating our model, BE-DICT also accurately predicted base editing efficiencies from previously published experiments (Pearson’s r = 0.82 for ABEmax, 0.71 for CBE4max, 0.91 for ABE8e, and 0.76 for Target-AID; Supplementary Fig. 7; Source Data)8,15. These results demonstrate that the BE-DICT probability score can be used as a proxy to predict ABEmax and CBE4max editing efficiencies with high accuracy.


Endogenous genomic target sequences with at least two substrate nucleotides were targeted separately by co-transfection of the sgRNA and base editors a ABEmax, b CBE4max, c ABE8e, and d Target-AID. Heatmap shows the BE-DICT prediction score (green) and experimentally observed target base conversion (purple). Substrate bases for the respective base editor are outlined in bold. Pearson’s correlation (r) for all target bases is shown.
The attention-based BE-DICT model provides insights (attention scores) for each position within the protospacer with regard to the position’s influence on the editing outcome. These attention scores provide a proxy for identifying relevant motifs and sequence contexts for editing outcomes. Interestingly, we found that for all base editors (ABEmax, CBE4max, ABE8e, and Target-AID) BE-DICT attention was mainly focused on bases flanking the target base and on the target base position itself (Fig. 4a–d). In addition, we observed that base attention patterns were dependent on the position of the target base and occasionally consisted of complex gapped motifs rather than consecutive bases (Supplementary Fig. 6), underscoring the necessity of using machine learning for predicting base editing outcomes.
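As an illustration of how such per-position attention profiles can be aggregated, the sketch below averages the attention weights of all sequences predicted to be edited at a given position, using the simplified per-base model sketched above. The function name and the 0.5 decision threshold are hypothetical choices, not the published analysis code.

```python
import torch

def mean_attention_for_position(model, tokens, target_pos, threshold=0.5):
    """Mean attention profile of sequences predicted edited at `target_pos` (0-based)."""
    probs, attn = model(tokens)                    # attn: (batch, seq_len, seq_len)
    edited = probs[:, target_pos] >= threshold     # mask of predicted-edited sequences
    if edited.sum() == 0:
        return None
    # Row `target_pos` of the attention matrix describes how strongly that
    # position attends to every other position of the protospacer.
    return attn[edited, target_pos, :].mean(dim=0)  # (seq_len,)
```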


Attention weight is indicated by color from light to dark. Mean-aggregate attention scores of target sequences (BE-DICT test dataset) predicted to be edited at the respective position for a ABEmax, b CBE4max, c ABE8e, and d Target-AID are shown. The number of sequences used for the analysis of the respective target base position is indicated as n.
Development of the BE-DICT bystander module
Multiple A or C nucleotides within the editing window can lead to bystander base conversions. These are often undesired, in particular if they induce coding mutations in the targeted gene. Given that the BE-DICT per-base model predicts the ‘marginal probability’ of target base editing, i.e. a probability score for whether a single base will be edited, it does not directly predict the editing efficiency of a locus (i.e. it cannot predict co-occurrences of target base and bystander editing). Therefore, we next developed an extension module of BE-DICT that predicts the relative proportions of all different editing outcomes (combinations of target base and bystander transitions) per target locus (BE-DICT bystander module; Fig. 5a). The model is based on an encoder–decoder architecture (adapting the Transformer architecture used in the BE-DICT per-base model), which takes the nucleotide sequence of the protospacer as input and computes the probability of the different output sequences (i.e. probabilities for all combinations of sequences with target base and bystander transitions, as well as the probability of observing a wild-type sequence) (Fig. 5a). The formal description of the model is reported in Supplementary Notes 2 and 3. In short, an encoder module computes a vector representation for each nucleotide in the input protospacer sequence, and a decoder module with the same components as the encoder, except that its self-attention is masked and it contains an additional cross-attention layer, generates the output. The masked self-attention layer acts as an “autoregressive layer”, ensuring that only past information is used while computing the probability of the output. The cross-attention layer learns which parts of the input sequence are important when computing the vector representation of the nucleotides in the output sequence, subsequently allowing the model to compute the probability of each output sequence. For model training, we used the edited input sequences from the ABEmax, CBE4max, ABE8e, and Target-AID library screens that were already used to train and test the BE-DICT per-base model, and again partitioned them in an 8:1:1 ratio for training, testing, and validation. Unlike in the per-base BE-DICT model, however, the outcome is non-binary and represents the frequencies of all outcomes at the target site (the unedited read and the different edited outcomes) for a given input sequence (i.e. protospacer). The trained BE-DICT bystander module predicted various possible editing outcomes per target sequence, including combinations with multiple base conversions (Fig. 5b, c). Importantly, performance was reliable for all four base editors, as we achieved strong correlations between predicted and experimentally observed sequence proportions in the validation datasets (Pearson’s r = 0.86 for ABEmax, 0.94 for CBE4max, 0.66 for ABE8e, and 0.97 for Target-AID; Supplementary Fig. 8).
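The sketch below outlines such an encoder–decoder in simplified PyTorch code: the decoder scores a candidate outcome sequence autoregressively under a causal mask while cross-attending to the encoded protospacer. Hyperparameters, the start token, and the helper for computing an outcome’s log-probability are illustrative assumptions and not the published implementation (see Supplementary Notes 2 and 3).

```python
import torch
import torch.nn as nn

class BystanderModel(nn.Module):
    def __init__(self, seq_len=20, d_model=64, n_heads=4, d_ff=128):
        super().__init__()
        self.tok_emb = nn.Embedding(5, d_model)          # A, C, G, T, <start>
        self.pos_emb = nn.Embedding(seq_len, d_model)
        # Encoder: vector representation of each protospacer nucleotide.
        self.encoder = nn.TransformerEncoderLayer(d_model, n_heads, d_ff,
                                                  batch_first=True)
        # Decoder: masked ("autoregressive") self-attention, cross-attention
        # over the encoder output, and a feed-forward network.
        self.decoder = nn.TransformerDecoderLayer(d_model, n_heads, d_ff,
                                                  batch_first=True)
        self.out = nn.Linear(d_model, 4)                 # next-nucleotide logits

    def embed(self, tokens):
        pos = torch.arange(tokens.size(1), device=tokens.device)
        return self.tok_emb(tokens) + self.pos_emb(pos)

    def forward(self, protospacer, outcome):
        # protospacer, outcome: (batch, seq_len) integer-encoded sequences.
        memory = self.encoder(self.embed(protospacer))
        # Shift the outcome right behind a <start> token so that position t
        # only sees outcome nucleotides 0..t-1 when predicting nucleotide t.
        start = torch.full((outcome.size(0), 1), 4, dtype=torch.long,
                           device=outcome.device)
        dec_in = torch.cat([start, outcome[:, :-1]], dim=1)
        L = dec_in.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"),
                                       device=outcome.device), diagonal=1)
        dec = self.decoder(self.embed(dec_in), memory, tgt_mask=causal)
        return self.out(dec)                             # (batch, seq_len, 4)

def outcome_log_probability(model, protospacer, outcome):
    """Log-probability of one candidate outcome sequence given the protospacer."""
    log_p = torch.log_softmax(model(protospacer, outcome), dim=-1)
    return log_p.gather(-1, outcome.unsqueeze(-1)).squeeze(-1).sum(dim=-1)
```

Scoring every candidate outcome in this way yields a distribution over edited and wild-type sequences for a given protospacer, which is the quantity the bystander module is trained to reproduce.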


a An extension of the BE-DICT algorithm enables the prediction of the frequency of reads with bystander mutations. b, c Graphical representations of predicted and experimentally observed allele frequencies for an ABEmax and a CBE4max target site. d Workflow for benchmarking BE-DICT against BE-Hive and DeepBaseEditor. e–h Performance evaluation of different machine learning models for ABE and CBE on prediction of the proportion of edited outcomes (e, f) and the prediction of all outcomes (g, h). Pearson’s correlation (r) was calculated by comparison of predicted versus measured base editing outcome proportions in the datasets published by Arbab et al.10 and Song et al.13, and in this study. The numbers of analyzed outcomes in the dataset from Arbab et al. are n = 7,743 (ABE edited outcomes), n = 7,537 (CBE edited outcomes), n = 9,008 (ABE all outcomes), and n = 8,895 (CBE all outcomes), arising from a total of 1,265 unique sequences for ABE and 1,358 sequences for CBE. The numbers of analyzed outcomes in the dataset from Song et al. are n = 1,767 (ABE edited outcomes), n = 2,332 (CBE edited outcomes), n = 2,204 (ABE all outcomes), and n = 2,807 (CBE all outcomes), arising from a total of 437 unique sequences for ABE and 475 sequences for CBE. The numbers of analyzed outcomes in the dataset from this study are n = 3,844 (ABE edited outcomes), n = 4,502 (CBE edited outcomes), n = 5,510 (ABE all outcomes), and n = 6,176 (CBE all outcomes), arising from a total of 1,667 unique sequences for ABE and 1,675 sequences for CBE.
Recently, two other machine learning models capable of predicting base editing outcomes have been developed: BE-Hive10, a deep conditional autoregressive model, and DeepBaseEditor13, which is based on a convolutional neural network framework with two hidden layers. In contrast to the BE-DICT bystander module, which directly predicts the proportions of all outcomes at the target locus, both models separately predict the proportions of edited outcomes and the overall editing efficiency of the target site, and combining both values is required to estimate the frequency of precise target base conversion without bystander mutations (Fig. 5d). Since BE-Hive and DeepBaseEditor have also been trained and applied on TadA7.10-ABE and APOBEC1-CBE datasets, we decided to compare their performance to our attention-based machine learning model. First, we benchmarked only the ability of the three models to predict the proportions of edited outcomes. To this end, we adapted the BE-DICT bystander model to calculate only the proportions of edited outcomes, comparable to the BE-Hive bystander and DeepBaseEditor proportion models. When applied to the high-throughput datasets of the three studies, all models achieved similarly good correlations with the experimentally observed values using Pearson’s correlation (Fig. 5e, f; Supplementary Fig. 10a, b) or Spearman’s correlation (Supplementary Fig. 9). Next, we compared the ability of the three models to predict the proportions of all outcomes (including the wild-type sequence) at a target locus. Again, predicted values correlated well with the experimentally observed values for all three models (Fig. 5g, h; Supplementary Fig. 10c, d). Interestingly, the performance of the three models was not substantially affected by differences in the experimental setups of the three datasets (Fig. 5e–h; Supplementary Fig. 10), suggesting that they can tolerate variations in experimental procedures between laboratories. Confirming this hypothesis, when BE-DICT was retrained on the ABE datasets of Song et al.13 (HT_ABE_Train), correlations between predicted and experimentally observed editing outcomes on the HT_ABE_Test dataset of Song et al. increased only incrementally to r = 0.94 (Supplementary Fig. 11). Altogether, we conclude that the three machine learning models operate robustly on different experimental datasets and with comparable accuracy.
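For illustration, the short example below shows, with hypothetical numbers, how the two predictions made by BE-Hive or DeepBaseEditor would be combined to estimate the absolute frequency of the precise edit without bystander mutations, as outlined in Fig. 5d.

```python
# Illustrative arithmetic with hypothetical values (not measured data).
overall_efficiency = 0.40      # predicted fraction of edited reads at the locus
precise_among_edited = 0.55    # predicted proportion of the precise outcome
                               # among all edited outcomes
precise_frequency = overall_efficiency * precise_among_edited   # = 0.22
# The BE-DICT bystander module instead predicts outcome proportions over all
# reads (including the wild-type sequence) in a single step, so no such
# combination of two separate predictions is needed.
```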

