Generation of plasmid
pX330 SpyCas9-mSA was a gift from Janet Rossant (Addgene plasmid # 113096). sgRNAs targeting the Actb, Fah, or Pcsk9 gene (gP from VIVO) were cloned into plasmid pX330 SpyCas9-mSA67. pET-45b(+)-Tn5 was a gift from Frank Pugh (Addgene plasmid # 112112). Sequences of all sgRNAs are listed in Supplementary Table 1. The modified SpyCas9-mSA* construct was generated through Gibson assembly by combining the following four DNA fragments: (i) a PCR-amplified 3xc-Myc NLS fragment, (ii) a PCR-amplified mammalian codon-optimized Cas9 cassette, (iii) a DNA gblock (linker-mSA-2xNLS), and (iv) the AgeI/EcoRI-digested pX330 SpyCas9-mSA backbone. The SauCas9-mSA* construct was generated through Gibson assembly by combining the following two DNA fragments: (i) a DNA gblock (linker-mSA-2xNLS), and (ii) an EcoRI-digested customized pmax SauCas9 expression vector. The enAsCas12a-mSA* construct was generated through Gibson assembly by combining the following two DNA fragments: (i) a DNA gblock (linker-mSA-2xNLS), and (ii) an EcoRI-digested customized pmax enAsCas12a expression vector.
Donors for GUIDE-tag
To generate IRES-GFP donors for Actb gene, a vector (Addgene plasmid # 83575) containing IRES-GFP was used as a template for PCR amplification using Phusion master mix (ThermoFisher Scientific) with either biotinylated primers or standard primers (Supplementary Table 3). To generate exon 2–14 donors for the Fah gene, a gene block (IDT) containing a 3′ splice acceptor and Fah cDNA of exon 2–14 was used as a donor for PCR amplification with either biotinylated primers or standard primers (Supplementary Table 3). iGUIDE donor53 was prepared by annealing the following two oligos:
5′-P-G*C*TCGCGTTTAATTGAGTTGTCATATGTTAATAACGGTATACGC*G*A and 5′-P-T*C*GCGTATACCGTTATTAACATATGACAACTCAATTAAACGCGA*G*C.
GUIDE-seq donor4 was prepared by annealing the following two oligos:
5′-P-G*T*TTAATTGAGTTGTCATATGTTAATAACGGT*A*T and
5′-P-A*T*ACCGTTATTAACATATGACAACTCAATTAA*A*C.
Biotin-iGUIDE donor was prepared by annealing:
5′-biotin-G*C*TCGCGTTTAATTGAGTTGTCATATGTTAATAACGGTATACGC*G*A and 5′-biotin-T*C*GCGTATACCGTTATTAACATATGACAACTCAATTAAACGCGA*G*C.
Different Biotin-GUIDE donor combinations were prepared by annealing the following oligos:
GS1: 5′-P-G*T*TTAATTGAGTTGTCATATGTTAATAACGGT*A*T,
GS2: 5′-P-A*T*ACCGTTATTAACATATGACAACTCAATTAA*A*C,
GS1-5′Bio: 5′-biotin-G*T*TTAATTGAGTTGTCATATGTTAATAACGGT*A*T,
GS2-5′Bio: 5′-biotin-A*T*ACCGTTATTAACATATGACAACTCAATTAA*A*C,
GS1-IntBio: 5′-P-G*T*TTAATTGAGTTGTCAT(biotin)ATGTTAATAACGGT*A*T and
GS2-IntBio: 5′-P-A*T*ACCGTTATTAACATAT(biotin)GACAACTCAATTAA*A*C.
P is 5′ phosphorylation and * indicates a phosphorothioate bond.
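As a quick consistency check on the donor designs above, the two strands of each duplex should be exact reverse complements once the modification marks are removed. The following Python sketch illustrates that check; the helper names (strip_modifications, anneals_blunt) are hypothetical and are not part of the published pipeline.

```python
import re

# Watson-Crick complement table for unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def strip_modifications(oligo: str) -> str:
    """Return only the bases of an oligo written in the notation used above.

    Drops the leading 5' label (P or biotin), '*' phosphorothioate marks
    and the internal '(biotin)' mark.
    """
    seq = re.sub(r"^5.-(P|biotin)-", "", oligo.strip())
    seq = seq.replace("(biotin)", "").replace("*", "")
    return seq.upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_blunt(sense: str, antisense: str) -> bool:
    """True if the two oligos form a fully paired, blunt-ended duplex."""
    return strip_modifications(antisense) == reverse_complement(
        strip_modifications(sense)
    )


# Sequences copied from the text above.
IGUIDE_SENSE = "5′-P-G*C*TCGCGTTTAATTGAGTTGTCATATGTTAATAACGGTATACGC*G*A"
IGUIDE_ANTI = "5′-P-T*C*GCGTATACCGTTATTAACATATGACAACTCAATTAAACGCGA*G*C"
GS1_INTBIO = "5′-P-G*T*TTAATTGAGTTGTCAT(biotin)ATGTTAATAACGGT*A*T"
GS2_INTBIO = "5′-P-A*T*ACCGTTATTAACATAT(biotin)GACAACTCAATTAA*A*C"

if __name__ == "__main__":
    print(anneals_blunt(IGUIDE_SENSE, IGUIDE_ANTI))
    print(anneals_blunt(GS1_INTBIO, GS2_INTBIO))
```

The check only compares base sequences; it says nothing about how the chemical modifications affect annealing.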
Cell culture and transfection
Neuro 2A (N2A) cells were purchased from ATCC and maintained in Dulbecco’s Modified Eagle’s Medium supplemented with 10% FBS at 37 °C and 5% CO2.
For transfection-based editing experiments in N2A cells, cells were plated at 30,000 cells per well in a 12-well plate. Twenty-four hours later, the cells were co-transfected with the indicated dose of SpyCas9-mSA plasmid and biotinylated dsDNA donors. Lipofectamine 3000 (for plasmids) or CRISPRmax (for RNP) was used for the transfection according to the manufacturer’s instructions. FACS analysis was performed 4 days after transfection, and genomic DNA was isolated for PCR analysis.
For editing experiments in Hepa1-6 cells, 25,000 Hepa1-6 cells were electroporated with 2 pmol of 3xNLS-SpyCas9 sgPcsk9 RNP or 3xNLS-SpyCas9-mSA sgPcsk9 RNP (sgRNA from IDT) and 5 pmol of each different GUIDE-seq donor duplex DNA (except in the case of the RNA and donor titration experiments, where the dose range delivered is indicated in the figure legend). gDNA was isolated from each group 3 days after transfection, and the insertion and indel percentages were measured by deep sequencing of PCR amplicons spanning the Pcsk9 target site.
Animal studies
All animal experiments were authorized by the Institutional Animal Care and Use Committee (IACUC) at UMASS medical school. All DNA vectors were prepared by EndoFreeMaxi kit (Qiagen).
For in vivo Actb gene editing, FVB/NJ (Strain #001800) mice were purchased from Jackson Laboratories. Eight-week-old mice were injected with 2–2.5 ml 0.9% saline containing 20 μg sgActb-SpyCas9-mSA or sgActb-SpyCas9-mSA* and 4 μg of dsDNA donor or biotinylated dsDNA donor (IRES-GFP) into the tail vein in 5–7 s.
For in vivo Fah gene editing, Fah−/− (deltaExon5) mice were a gift from Dr. Markus Grompe (Oregon Health Science University) and were kept on 10 mg/L NTBC water. Eight-week-old mice were injected with 2–2.5 ml 0.9% saline containing 20 μg sgFah-SpyCas9-mSA and 4 μg of dsDNA donor or biotinylated dsDNA donor (Fah exon 2–14). NTBC water was removed 7 days post injection (defined as NTBC on, D0) to assess the functional correction of Fah. Two cycles of NTBC withdrawal (D0, D19, and D26) and reintroduction (D17 and D22) were performed to allow expansion of FAH+ hepatocytes.
For in vivo Pcsk9 gene editing, C57BL/6J (Strain #000664) mice were purchased from Jackson Laboratories. Fifteen eight-week-old mice were randomly allocated into five groups. Mice were injected with 2–2.5 ml 0.9% saline containing (i) 1 nmol of iGUIDE donor, or (ii) 1 nmol of biotinylated iGUIDE donor, or (iii) 30 μg sgPcsk9 gP-SpyCas9-mSA*, or (iv) 30 μg sgPcsk9 gP-SpyCas9-mSA* and 1 nmol of iGUIDE donor, or (v) 30 μg sgPcsk9 gP-SpyCas9-mSA* and 1 nmol of biotinylated iGUIDE donor.
Animals were sacrificed at the end of each experiment (7 days for the Actb and Pcsk9 editing). Livers were fixed with formalin or stored at −80 °C until further analyses. No animals were excluded from the analyses. No sample size calculation was performed and each group consisted of at least three mice for statistical analysis.
Lung RNP delivery
Alt-R sgRNA (sgPcsk9 and sgAi9) or Alt-R crRNA were chemically synthesized by Integrated DNA Technologies (IDT), resuspended in IDT duplex buffer at a concentration of 100 µM, and stored in aliquots at −80 °C (Supplementary Table 1). Ai9 mice were purchased from Jackson Laboratories (strain #007909). Fresh SpyCas9-mSA* RNPs or enAsCas12a RNPs were generated as previously described59. For each mouse, 4.5 nmol sgRNA was first mixed 1:0.8 v/v with 15–50 kDa PGA (100 mg/ml, Sigma-Aldrich) prior to complexing with 3 nmol Cas9-mSA* or Cas12a proteins for a final volume ratio of sgRNA:PGA:Cas9 of 1:0.8:1. For donor co-delivery, 1 nmol bio-iGUIDE donor was mixed with RNPs (after 10 min) to form complexes. The RNP complexes were then delivered to the mouse lung through intratracheal injection. tdTomato-positive cells were sorted from dissociated mouse lung as previously described68. gDNA was isolated from ~200,000 sorted cells from each treated mouse and GUIDE-tag libraries were prepared as below.
Whole exome sequencing (WES) and variant calling
The livers of 3 control mice that received biotin-Fah-dsDNA by HTLV and 3 mice that received sgFah-SpyCas9-mSA and biotin-Fah-dsDNA by HTLV were analyzed to determine the rate of genome-wide variants after hepatocyte expansion (at D34). 1.5 μg of gDNA per mouse was used for library preparation and an average of 120 Gb of deep sequencing data (~1000×) was generated per mouse (GENEWIZ). The original downstream analysis procedure has been described previously69. In brief, raw reads were processed with fastqc (Version 0.11.9) and trim_galore (Version 0.6.5) (https://www.bioinformatics.babraham.ac.uk/projects/) to remove low-quality reads and trim adapters. The processed reads were then mapped to Mouse GRCm38/mm10 with BWA-mem (v0.7.15)70. Picard (v1.119) (https://github.com/broadinstitute/picard) was used to mark duplicated reads. Genome Analysis Toolkit (GATK; version 4.1.6.0)70 was used for variant calling of low-frequency SNVs and indels with default parameters. Final SNVs in the 3 treated mice were extracted and filtered against the variants found in the 3 control mice.
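The final filtering step, in which variants seen in control animals are removed from the treated animals, can be sketched as a set difference. This is a minimal Python illustration under the assumption that a variant is identified by chromosome, position, reference allele and alternate allele; the function name and toy variants are hypothetical and do not reproduce the published workflow.

```python
from typing import Iterable, List, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def treatment_specific_variants(
    treated: Iterable[Set[Variant]],
    controls: Iterable[Set[Variant]],
) -> List[Set[Variant]]:
    """Remove every variant seen in any control animal from each treated animal."""
    background: Set[Variant] = set().union(*controls)
    return [calls - background for calls in treated]


if __name__ == "__main__":
    # Toy call sets, invented for illustration only.
    control_calls = [
        {("chr1", 100, "A", "G")},
        {("chr2", 200, "C", "T")},
    ]
    treated_calls = [
        {("chr1", 100, "A", "G"), ("chr7", 555, "G", "A")},
        {("chr2", 200, "C", "T")},
    ]
    for mouse, calls in enumerate(
        treatment_specific_variants(treated_calls, control_calls), start=1
    ):
        print(mouse, sorted(calls))
```

Pooling all control calls into one background set is the most conservative choice; filtering each treated animal against a matched control would be an alternative design.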
Electrophoretic mobility shift assays (EMSA)
An iGUIDE sense oligonucleotide with 5′-biotin and 3′-Cy3 terminal modifications was purchased from IDT. To make the biotin-iGUIDE-Cy3 donor duplex, the 5′-biotin and 3′-Cy3 labeled iGUIDE sense oligonucleotide and the anti-sense 5′-biotin iGUIDE oligonucleotide were mixed at a 1:1 mol/mol ratio and annealed. Ten pmol SpyCas9-mSA* (or SpyCas9 lacking mSA) in 7.5 µL was mixed at an equal molar ratio with 7.5 µL of sgRNA and incubated at room temperature for 10 min. Next, the 15 µL of Cas9-mSA* RNP was incubated with biotin-iGUIDE-Cy3 donor or a control Cy3-labeled duplex DNA lacking biotin at a 5:1 molar ratio, unless otherwise indicated, in a total volume of 30 µL of EMSA buffer [50 mM Tris-HCl, pH 7.0, 20 mM KCl, 1 mM MgCl2, 0.1% NP-40, 0.1% Tween20, 6% glycerol and 5 mM tris(2-carboxyethyl) phosphine (TCEP)]. Samples were analyzed by electrophoresis on a 4% native PAGE gel and DNA was visualized by Cy3 fluorescence.
Protein purification
Protein purification followed previously described protocols for Cas9-based proteins62 and Cas12a-based proteins71. Tn5 purification utilized a modified protocol72 that includes the addition of PEI and (NH4)2SO4 precipitations. pET-45b(+)-Tn5 (for Tn5 protein, a gift from Frank Pugh, Addgene plasmid # 112112) or pET-21a-SpyCas9-mSA* (for Cas9-mSA* protein) or pET-21a-3xNLS-SpyCas9-mSA (for 3xNLS-SpyCas9-mSA protein) or pET-21a-3xNLS-SpCas9 (Plasmid #114365) or pET-21a-enAsCas12a (for enAsCas12a protein) were introduced into E. coli Rosetta2(DE3)pLysS cells (EMD Millipore) for protein overexpression. Cells were grown at 37 °C to an OD600 of ~0.2, then shifted to 18 °C and, at an OD600 of ~0.4, induced for 16 h with IPTG (0.7 mM final concentration). Following induction, cells were pelleted by centrifugation, resuspended in Nickel-NTA buffer (20 mM TRIS + 1 M NaCl + 20 mM imidazole + 1 mM TCEP, pH 7.5) supplemented with HALT Protease Inhibitor Cocktail, EDTA-Free (100X) [ThermoFisher], and lysed with an LM-20 Microfluidizer (Microfluidics) following the manufacturer’s instructions. For Tn5 purification, prior to Ni-NTA purification the nucleic acids were removed by precipitation with 0.25% w/v PEI and centrifugation at 10,000 × g for 10 min. The PEI was removed by precipitating the protein with 70% (NH4)2SO4 at 4 °C and centrifuging at 12,000 × g for 15 min. The protein pellet was resuspended in Nickel-NTA buffer, purified with Ni-NTA resin and eluted with elution buffer (20 mM TRIS, 500 mM NaCl, 250 mM Imidazole, 10% w/v glycerol, pH 7.5). Tn5 protein was dialyzed overnight at 4 °C in 20 mM HEPES, 500 mM NaCl, 1 mM EDTA, 10% w/v (8% v/v) glycerol, pH 7.5. Subsequently, Tn5 protein was step dialyzed from 500 mM NaCl to 200 mM NaCl (Final dialysis buffer: 20 mM HEPES, 200 mM NaCl, 1 mM EDTA, 10% w/v glycerol, pH 7.5).
Next, the Tn5 protein was purified by cation exchange chromatography (Column = 5 ml HiTrap-S, Buffer A = 20 mM HEPES pH 7.5 + 1 mM TCEP, Buffer B = 20 mM HEPES pH 7.5 + 1 M NaCl + 1 mM TCEP, Flow rate = 5 ml/min, CV = column volume = 5 ml). The primary protein peak from the CEC was dialyzed into 2xTn5 buffer (100 mM HEPES-KOH at pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 2 mM DTT, 0.2% Triton X-100, 20% glycerol) and concentrated in an Ultra-15 Centrifugal Filters Ultracel −30K (Amicon) to a concentration of 63.5 µM. Finally, 0.827 volumes of 100% glycerol was added for a final concentration of 55% glycerol, and the Tn5 was stored at −20 °C until needed for transposome assembly.
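The effect of the final glycerol addition on the stock can be worked through with simple mixing arithmetic. The Python sketch below assumes that the 20% glycerol in the 2xTn5 buffer is a volume fraction and that volumes are additive on mixing; both are assumptions, and the function names are hypothetical. Under these assumptions the calculation gives roughly 56% glycerol, close to the stated 55%, and a protein concentration of roughly 35 µM after dilution.

```python
def mixed_fraction(start_fraction: float, added_fraction: float,
                   added_volumes: float) -> float:
    """Component fraction after adding `added_volumes` of a second solution
    to 1 volume of the starting solution (assumes additive volumes)."""
    return (start_fraction + added_fraction * added_volumes) / (1.0 + added_volumes)


def diluted_concentration(concentration: float, added_volumes: float) -> float:
    """Solute concentration after adding `added_volumes` to 1 starting volume."""
    return concentration / (1.0 + added_volumes)


if __name__ == "__main__":
    # Values taken from the text: 20% glycerol buffer, 0.827 volumes of
    # 100% glycerol added, 63.5 µM Tn5 before the addition.
    glycerol = mixed_fraction(0.20, 1.00, 0.827)
    tn5_um = diluted_concentration(63.5, 0.827)
    print(f"final glycerol fraction: {glycerol:.3f}")
    print(f"final Tn5 concentration: {tn5_um:.1f} µM")
```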
Synthesis of oligonucleotides
Biotinylated oligonucleotides generated in house were synthesized at 1 µmole scale on a Biolytic Dr. Oligo 48 synthesizer. Standard phosphoramidites were purchased from ChemGenes. 5′ biotin (10–5950) and internal biotin (Biotin dT; 10–1038) phosphoramidites were purchased from Glen Research. Oxidation to phosphodiester linkages was accomplished with 0.05 M Iodine in 90% pyridine/10% water. Sulfurization to phosphorothioate linkages was accomplished with 0.1 M DDTT solution (ChemGenes). Oligonucleotides were deprotected with 30% NH3 in water (16 h at 55 °C), and then the ammonia was removed under vacuum. The oligonucleotides were then desalted (3x RNase-free water wash, 14 K rpm, 15 min) using Amicon Ultra 0.5 mL 3 K filters (Millipore, Billerica, MA), and resuspended in 400 μL RNase-free water. Oligonucleotides were analyzed on an Agilent 6530 Q-TOF LC/MS system with electrospray ionization and time-of-flight ion separation in negative ionization mode. Liquid chromatography was performed using a 2.1 × 50 mm AdvanceBio oligonucleotide column (Agilent Technologies, Santa Clara, CA). The data were analyzed using Agilent Mass Hunter software. Buffer A: 100 mM hexafluoroisopropanol with 9 mM triethylamine in water; Buffer B: 100 mM hexafluoroisopropanol with 9 mM trimethylamine in methanol. Samples were resolved over an elution gradient from 0 to 100% Buffer B over 5 min.
Tn5 tagmentation and library preparation for GUIDE-tag and UDiTaS
25 mg of frozen liver tissue was lysed to isolate ~25 μg genomic DNA using DNeasy blood & tissue kits (Qiagen). Adaptor oligonucleotides were synthesized by IDT (Supplementary Table 3). Transposome assembly was done by incubating 158 μg Tn5 with 1.4 nmol annealed oligo (containing the full-length Illumina forward (i5) adapter, a sample barcode, and UMI)40 at room temperature for 60 min.
For tagmentation, 200 ng of genomic DNA was incubated with 2 μl of assembled transposome at 55 °C for 7 min, and the product was cleaned up (20 μl) with a Zymo column (Zymo Research, #D4013). Tagmented DNA was used for the 1st PCR using Platinum SuperFi DNA polymerase (Thermo) with the i5 primer and gene-specific primers (Supplementary Table 3). Four different libraries were prepared for gDNA from each mouse with different combinations of primers (i5+Locus_F [UDiTaS], i5+Locus_R [UDiTaS], i5+Insert_F [GUIDE-tag] and i5+Insert_R [GUIDE-tag]). The i7 index was added in the 2nd PCR and the PCR product was cleaned up with Ampure XP SPRI beads (Agencourt, 0.9X reaction volume). Completed libraries were quantified by Tapestation and Qubit (Agilent), pooled in equimolar amounts, and sequenced with 150 bp paired-end reads on an Illumina MiniSeq instrument.
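Equimolar pooling relies on converting each library's mass concentration and mean fragment size into molarity. A minimal Python sketch of that arithmetic follows, using the common approximation of 660 g/mol per base pair of double-stranded DNA; the library names, concentrations and sizes are invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Molarity in nM from a mass concentration (e.g. Qubit) and a mean
    fragment size (e.g. Tapestation), assuming 660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes_ul(
    libraries: Dict[str, Tuple[float, float]], fmol_each: float
) -> Dict[str, float]:
    """Volume of each library (µL) that contributes `fmol_each` femtomoles.

    `libraries` maps a name to (ng/µL, mean fragment size in bp).
    Note that 1 nM equals 1 fmol/µL.
    """
    return {
        name: fmol_each / library_molarity_nM(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements for two libraries.
    example = {"i5+Locus_F": (10.0, 500.0), "i5+Insert_F": (5.0, 500.0)}
    for name, volume in equimolar_volumes_ul(example, fmol_each=30.0).items():
        print(f"{name}: {volume:.2f} µL")
```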
GUIDE-tag and UDiTaS data analysis
The GUIDE-tag and UDiTaS analysis pipeline was built in Python. Code is available at https://github.com/locusliu/GUIDESeq-Preprocess_from_Demultiplexing_to_Analysis, as we previously reported72. Briefly, it consists of the following steps:
i. Demultiplexing and UMI extraction. Raw BCL files were converted and demultiplexed using the appropriate i5 and i7 sequencing barcodes, allowing up to one mismatch in each barcode. UMIs for each read were extracted into UMI.fastq files after filtering out the UMIs containing ‘N’ for further downstream analysis.
ii. Raw reads were processed with fastqc (Version 0.11.9) and trim_galore (Version 0.6.5) (https://www.bioinformatics.babraham.ac.uk/projects/) to remove reads with low quality and trim adapters (regular Illumina adapter sequences), inserted tag (GUIDE-seq, iGUIDE) sequences, locus-specific sequences in UDiTaS (gene-specific primers) or IRES-GFP and FAH repair cassette for GUIDE-tag.
iii. For UDiTaS, a reference sequence was created based on the UDiTaS locus-specific primer position and the donor map separately. Index files for the reference were built using bowtie2-index, version 2.4.0.
iv. Alignment analysis. Paired reads were then globally aligned (end-to-end mode) to the mouse genome (mm10) and all the reference amplicons using bowtie2’s very sensitive preset. Finally, Samtools (version 0.1.19) was used to create an indexed, sorted BAM file.
v. Data analysis:
a. For UDiTaS analysis at each target site, locus-specific primers were used to construct UDiTaS libraries, and precise editing or small indels were analyzed as previously described73. Pindel (version 0.2.5b8) was used to detect breakpoints of large deletions and donor integration. Raw sequencing reads that aligned to the reference sequence were collapsed to a single read by common UMI, and the exemplar read for each UMI was assigned to a specific category (for example, Wild Type, precise editing, small indel/substitution (<50 bp), or Large Deletions/Insertions (>50 bp)). The number of UMIs assigned per category was then determined to define the ratio of each event.
b. For GUIDE-Tag, iGUIDE-Tag or long donor-based (IRES-GFP and FAH repair cassette) analysis for off-target identification, the analysis pipeline used depends on the sequence of the DNA donor. For synthetic duplex donors containing the iGUIDE sequence, the data were preprocessed using the iGUIDE package53 to remove mispriming events (https://github.com/cnobles/iGUIDE) before running through the Bioconductor GUIDE-seq analysis pipeline as previously described74,75. After these preprocessing steps, all data were analyzed for off-target site identification through the Bioconductor GUIDE-seq analysis pipeline. Briefly, for GUIDE-seq analysis, processed paired reads were merged (if they overlapped) and then globally aligned to the mouse genome (mm10) using bowtie2. BAM files and UMI files were then used to aggregate unique reads. Default parameters were used for defining peaks composed of unique reads that may represent off-target sites. Potential off-target site identification within these peaks required the presence of a near-cognate recognition sequence for Cas9 with these parameters: a maximum of 6 mismatched positions, with one DNA/RNA bulge permitted, and the presence of an NNG or NGN PAM. The peaks that represent potential off-target sites were extracted from the GUIDE-Tag R package output files, which contain the location information, and the header and UMI sequence for each read were then extracted from the UMI.fastq files. Subsequently, UMIs within peaks that represent potential off-target sites were counted, where a UMI was required to have at least three read counts to be included, to reduce UMI singletons associated with sequencing errors.
a. For computational prediction comparison, we used CRISPRseek55 to predict potential off-target sites for sgActb and sgFah sgRNAs, allowing up to three mismatches.
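Two of the filters described in the steps above, UMI handling (discarding UMIs that contain 'N' and requiring at least three reads per UMI) and the mismatch/PAM criteria for candidate sites, can be sketched in a few lines of Python. This is an illustrative sketch and not the code from the repository cited above: the function names and input layout are hypothetical, the site check assumes a 3-nt PAM supplied separately from the protospacer, and bulges are not modeled.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def valid_umi(umi: str) -> bool:
    """A UMI is kept only if it contains no ambiguous 'N' base."""
    return "N" not in umi.upper()


def count_supported_umis(
    read_records: Iterable[Tuple[str, str]], min_reads: int = 3
) -> Dict[str, int]:
    """Count UMIs per peak that are backed by at least `min_reads` reads.

    `read_records` yields (peak_id, umi) pairs, one per read inside a peak.
    """
    per_peak: Dict[str, Counter] = defaultdict(Counter)
    for peak_id, umi in read_records:
        if valid_umi(umi):
            per_peak[peak_id][umi] += 1
    return {
        peak: sum(1 for n_reads in umis.values() if n_reads >= min_reads)
        for peak, umis in per_peak.items()
    }


def passes_site_filter(guide: str, site: str, pam: str,
                       max_mismatches: int = 6) -> bool:
    """Ungapped check: at most `max_mismatches` mismatches and an NNG or NGN PAM."""
    if len(guide) != len(site) or len(pam) != 3:
        return False
    mismatches = sum(a != b for a, b in zip(guide.upper(), site.upper()))
    pam = pam.upper()
    pam_ok = pam[2] == "G" or pam[1] == "G"  # NNG or NGN
    return mismatches <= max_mismatches and pam_ok


if __name__ == "__main__":
    # Toy reads, invented for illustration.
    reads = (
        [("peak1", "AAAA")] * 3
        + [("peak1", "CCCC")]
        + [("peak1", "ANAA")] * 5
        + [("peak2", "GGGG")] * 2
    )
    print(count_supported_umis(reads))
```

In this toy input, peak1 keeps one UMI (AAAA, three reads) and peak2 keeps none, since its only UMI has two reads.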
Targeted amplicon deep sequencing to assess editing rates
Genomic DNA was isolated for indel analysis from the frozen liver of mice injected with sgRNA+SpyCas9-mSA. For the Actb and Fah sgRNA, we validated all off-target sites identified by GUIDE-tag. For the Pcsk9 sgRNA, we selected 52 off-target sites (16 overlapping sites with VIVO, 24 overlapping sites with DISCOVER-seq, 17 sites identified by CIRCLE-seq but not validated by VIVO and 6 sites identified by GUIDE-tag) for amplicon deep sequencing to verify the presence of indels. 200 ng of genomic DNA was used for PCR using Phusion master mix (Thermo) with locus-specific primers. All primers used for amplicon sequencing are listed in Supplementary Table 3. PCR products were purified with Ampure beads (0.9X reaction volume), eluted with 25 μl of TE buffer, and quantified by Tapestation and Qubit. Equimolar amounts of each amplicon were pooled and sequenced using an Illumina MiniSeq. Amplicon sequencing data were analyzed with CRISPResso (https://crispresso.pinellolab.partners.org/).
Immunohistochemistry and immunofluorescence
For immunohistochemical studies, formalin-fixed, paraffin-embedded (FFPE) mouse liver samples were sectioned at 4 μm, deparaffinized, and subsequently stained with anti-GFP (1:200, CST, Cat. #2956) or anti-Fah antibody (1:100, Abcam, Cat. #83770). Visualization was performed using the DAB Quanto kit (Fisher Scientific, Cat. # TA-125-QHDX) as instructed by the manufacturer.
For immunofluorescence, N2A cells grown on coverslips were fixed with 4% paraformaldehyde for 15 min at room temperature (RT) and permeabilized with 0.1% Triton X-100/PBS at RT for 15 min. Cells were then incubated overnight at 4 °C with anti-streptavidin antibody (1:100, Vector, Cat. # BA-0500-.5) and for 1 h at RT with Alexa Fluor 647 Donkey anti-goat IgG (Invitrogen, Cat. #A32849). Nuclei were counterstained with DAPI. Images were acquired on a Leica DMi8 imaging microscope.
Statistical analysis
Statistical analyses for plotted data were performed using GraphPad Prism 8.4. Sample size was not pre-determined by statistical methods but was based on preliminary data. Group allocation was performed randomly. In all studies, data represent biological replicates (n) and are depicted as mean ± s.d., as indicated in the figure legends. Comparison of mean values was conducted with an unpaired, two-tailed Student’s t-test; one-way ANOVA; or two-way ANOVA with Tukey’s multiple comparisons test, as indicated in the figure legends. R (version 3.4.3), a system for statistical computation and graphics, was used for the analysis of the significance of indel and translocation rates76. Indel frequency and translocation rate were first arcsine transformed to homogenize the variance. For the experiment with more than two groups (Actin), Levene’s test indicated that the assumption of homogeneity of variances was met for the on-target and all off-targets. Therefore, one-way analysis of variance (ANOVA) with a completely randomized design was performed, followed by pre-specified contrasts for the on-target and each off-target. For experiments with two groups (Fah and Pcsk9), a Welch two-sample t-test was performed for the on-target and each off-target. P values were adjusted using the Benjamini & Hochberg (BH) method to correct for multiple inferences in each experiment77. Correlation coefficients (Spearman and Pearson) were calculated using R (version 3.4.3). In all analyses, P values < 0.05 were considered statistically significant.
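The variance-stabilizing transform and the BH adjustment can be illustrated with a short Python sketch (the analysis itself was run in R, as stated above). The text says only that rates were "arcsine transformed"; the arcsine square-root form used here is an assumption, as are the function names and example P values. The BH routine follows the standard step-up definition.

```python
import math
from typing import List, Sequence


def arcsine_sqrt(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1] (radians)."""
    return math.asin(math.sqrt(proportion))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values for an on-target site and three off-targets.
    raw = [0.01, 0.04, 0.03, 0.005]
    print([round(p, 4) for p in benjamini_hochberg(raw)])
    print(round(arcsine_sqrt(0.25), 4))
```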
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

