Crosstalk between CRISPR-Cas9 and the human transcriptome

Tissue culture

HEK 293T cells (Takara Bio Lenti-X 293T, #632180) were maintained in DMEM (4.5 g/L D-glucose) supplemented with 10% FBS (Gibco) at 37 °C with 5% CO₂. Cells were passaged at 70–90% confluency at a ratio of 1:10 by dissociating with TrypLE Express Enzyme (Gibco).

Plasmid construction

Protein-expressing plasmids were constructed from pCDNA3.1(-) (ThermoFisher Scientific) by Gibson cloning a protein with upstream Kozak and start codon sequences and downstream stop codon sequence into its EcoRI and BamHI restriction enzyme sites. Catalytically inactive dSpCas9 without an NLS was subcloned from Nelles et al.²⁸. Catalytically dead SpCas9 with an NLS was subcloned from lentiCRISPR v2 (AddGene #52961). A V5 peptide sequence with G linker (5′-GGCAAACCGATCCCGAATCCGCTTCTTGGTCTTGACTCCACGGGG-3′) was cloned upstream of the expressed protein in pCDNA3.1-V5-dSpCas9 and pCDNA3.1-V5-SpCas9-NLS. A 3xFLAG peptide sequence with G linker (5′-GACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGACGATGACAAGGGG-3′) was cloned upstream of the expressed protein in pCDNA3.1-3xFLAG-dSpCas9. The UnaG protein sequence (Fluorescent Protein Database) was human codon optimized with IDT’s codon optimization tool prior to ordering as a gBlock to clone into pCDNA3.1-UnaG. For the U6 promoter and U6 promoter-driven gRNA conditions, the U6 promoter sequence 5′-AGGTCGGGCAGGAAGAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACC-3′ with a 5′-TTTTTT-3′ terminator sequence was cloned into pCDNA3.1-V5-dSpCas9. The gRNA backbone sequence used was 5′-GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCT-3′. Sequences for guides 1 and 2 were 5′-GAGTGTCAGCCAGTATAACCC-3′ and 5′-GGCGCGGGCCGCTCGCTCTA-3′, respectively.

eCLIP experiment

HEK 293T cells were transfected at 60–80% confluence in 10 cm plates using the jetOPTIMUS transfection kit (Polyplus Transfection) with either pCDNA3.1-V5-dSpCas9, pCDNA3.1-3xFLAG-dSpCas9, or pCDNA3.1(-). Forty-eight hours post-transfection, biological replicates of confluent 10 cm plates of HEK 293T cells were treated with 400 mJ/cm² of UV using the Stratalinker 2400 and harvested in ice-cold PBS. Pellets were flash frozen in liquid nitrogen and stored at −80 °C until immunoprecipitation (IP) with either V5 Tag mouse monoclonal antibody (ThermoFisher Scientific #R960-25) or mouse monoclonal ANTI-FLAG M2 antibody (Sigma-Aldrich #F1804), each at a dilution of 1:3000. The subsequent protocol was followed exactly as detailed in Van Nostrand et al.⁶, cutting the nitrocellulose membrane from 115 kDa and up. The size-matched input not subjected to IP was cut from the identical region. Sequencing was performed on Illumina HiSeq 4000 with paired end reads.

eCLIP computational analysis

Data were processed through Dr. Yeo’s eCLIP pipeline version 0.4.0 (https://github.com/YeoLab), aligning reads to the human reference genome hg38. Reproducible peaks were assigned using IDR (Irreproducible Discovery Rate) in which entropy was used to rank replicate peaks, and further evaluated using self-consistency and rescue ratios according to Van Nostrand et al. (2016), where detailed information regarding peak calling used for Cas9 in this study can be found. Bed files representing each of the four sets of eCLIP peak IDR analyses (two for V5 and two for FLAG) were intersected with minimum 50% overlap, intersecting the V5 and FLAG eCLIP peaks separately at first. Maximum eCLIP peak enrichment per gene was taken from the V5 vs. size-matched input or FLAG vs. size-matched input IDR analysis, with R² statistics performed on these values. Gene regions for each eCLIP peak were determined based on all gene regions represented across all four IDR peaks whose intersection yielded that peak. For the gene region analysis of top three represented regions (5′ UTR, CDS, 3′ UTR), region lengths were normalized over average TPM (transcripts per million) across all four no IP/size-matched input eCLIP RNA-sequencing datasets (two for V5-dSpCas9; two for 3xFLAG-dSpCas9) for genes with TPM > 1. Paired nucleotide probabilities (Vienna RNAplfold default parameters, u = 1) were region length- and TPM-normalized for genes with TPM > 1.
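The 50% overlap criterion used when intersecting peak sets can be illustrated with a minimal sketch. The function names and tuple layout below are hypothetical, and the sketch assumes that the overlap fraction is measured relative to the peak in the first set (the text does not state whether the requirement was reciprocal).

```python
from typing import List, Tuple

# (chrom, start, end, strand); start/end are half-open, BED-style coordinates
Interval = Tuple[str, int, int, str]


def overlap_fraction(a: Interval, b: Interval) -> float:
    """Fraction of interval a that is covered by interval b."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0  # different chromosome or strand: no overlap
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    return max(overlap, 0) / (a[2] - a[1])


def intersect_peaks(set_a: List[Interval], set_b: List[Interval],
                    min_frac: float = 0.5) -> List[Interval]:
    """Return the intersected region for every pair of peaks in which
    at least min_frac of the peak from set_a is overlapped (assumption)."""
    out = []
    for a in set_a:
        for b in set_b:
            if overlap_fraction(a, b) >= min_frac:
                out.append((a[0], max(a[1], b[1]), min(a[2], b[2]), a[3]))
    return out
```

In this sketch, intersecting the two replicate-level sets for one tag first and then intersecting the V5 and FLAG results would mirror the order described above.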

Immunofluorescence (IF) imaging of Cas9 proteins

HEK 293T cells were transfected at 60–80% confluence in Nunc Lab-Tek II Chamber Slides (ThermoFisher Scientific) using the jetOPTIMUS transfection kit (Polyplus Transfection) with either pCDNA3.1-V5-dSpCas9 or pCDNA3.1-3xFLAG-dSpCas9. Slides were fixed with MeOH, blocked for 1 h at room temperature, and incubated under gentle orbital shaking with primary antibody overnight: either V5 Tag mouse monoclonal antibody (ThermoFisher Scientific #R960-25) at 1:3000 dilution or mouse monoclonal ANTI-FLAG M2 antibody (Sigma-Aldrich #F1804) at 1:1000 dilution. Slides were washed five times for 10 min with phosphate-buffered saline with Tween 20 (PBST), then incubated for 1 h at room temperature under gentle orbital shaking with secondary antibody: Goat anti-mouse IgG AlexaFluor 488 Superclonal Recombinant Secondary antibody (ThermoFisher Scientific #A28175) at 1:2000 dilution. Slides were washed five times for 10 min with PBST, then washed three more times with PBS before mounting overnight with 4′,6-diamidino-2-phenylindole (DAPI). All antibodies were incubated with 5% BSA in 0.1% Tween-PBS. Immunofluorescence images were taken at 63x objective with a Zeiss LSM 780 confocal microscope in 5–10 slices, with maximum intensity projections across the entire image plane generated in Zeiss ZEN 2010 for figures.

Gene ontology enrichment of eCLIP genes

Gene ontology enrichment was performed on the 381 eCLIP genes with Panther, using a background of 11,405 genes derived from size-matched inputs (average TPM > 1 among 1N, 4N, 6N, 2N).

Computational analysis of stress pathway-associated eCLIP genes

eCLIP genes associated with the Panther GO gene ontology accession GO:0033554 cellular response to stress (n = 66) were selected. For each gene, the mean of its log2 RPKM L1000 expression over 165 human cancer cell lines was taken from a Cas9 gene expression dataset in Enache et al.¹⁰, filtering out those with <1 log2 RPKM L1000 expression in any of the 165 control cell lines (n = 55 genes).
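The filtering and averaging step can be sketched as follows. This is an illustrative sketch only: the function name and the dictionary layout (gene name mapped to a list of per-cell-line log2 values) are assumptions and do not reflect the format of the published dataset.

```python
from statistics import mean


def mean_cas9_expression(cas9_expr, control_expr, min_log2=1.0):
    """Mean expression per gene in the Cas9 dataset, keeping only genes
    whose control expression is at least min_log2 in every cell line.

    cas9_expr, control_expr: dict mapping gene -> list of log2 values,
    one value per cell line (hypothetical layout).
    """
    result = {}
    for gene, values in cas9_expr.items():
        # Drop the gene if any control cell line falls below the threshold
        if all(v >= min_log2 for v in control_expr[gene]):
            result[gene] = mean(values)
    return result
```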

RT-qPCR of stress pathway-associated eCLIP genes

HEK 293T cells were transfected at 60–80% confluence in six-well plates using the jetOPTIMUS transfection kit (Polyplus Transfection) with either pCDNA3.1-V5-dSpCas9, pCDNA3.1-V5-dSpCas9-U6 promoter, pCDNA3.1-V5-dSpCas9-U6-gRNA 1, pCDNA3.1-V5-dSpCas9-U6 gRNA 2, pCDNA3.1-V5-SpCas9-NLS, pCDNA3.1(-), or pCDNA3.1-UnaG in three bioreplicates per condition. RNA was extracted from cells with the RNeasy Plus kit (Qiagen). Approximately 1 µg of RNA was converted into cDNA with the ProtoScript II First Strand cDNA Synthesis kit (NEB) with random primers. qPCR for two technical replicates of each of the three bioreplicates with a distinct pair of PCR primers per gene was performed on a CFX384 Touch Real-Time PCR Detection System (Bio-Rad) with 1/6 diluted cDNA samples at 2 µL input in PowerTrack SYBR Green Master Mix (ThermoFisher Scientific), with an initial incubation at 95 °C for 2 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Technical replicates were averaged for each of the three bioreplicates per condition. For analysis, each gene’s expression was compared to GAPDH housekeeping gene expression to compute ΔCt values. Then –ΔΔCt values were computed for each condition-bioreplicate-gene ΔCt with respect to the mean gene ΔCt of the pCDNA3.1(-) bioreplicates. Comparisons among a given gene’s condition-bioreplicate –ΔΔCt values were made pairwise with one-way ANOVA. PCR primer pairs for given genes are as follows: GAPDH (F: 5′-GTCTCCTCTGACTTCAACAGCG-3′, R: 5′-ACCACCCTGTTGCTGTAGCCAA-3′); ACTB (F: 5′-CACCATTGGCAATGAGCGGTTC-3′, R: 5′-AGGTCTTTGCGGATGTCCACGT-3′); p53 (F: 5′-GAGCTGAATGAGGCCTTGGA-3′, R: 5′-CTGAGTCAGGCCCTTCTGTCTT-3′); CDIP1 (F: 5′-ATTGGCTTGATGAATTTCGTGC-3′, R: 5′-GTGCGTCACATCCTTGAAGTC-3′); ATF3 (F: 5′-CCTCTGCGCTGGAATCAGTC-3′, R: 5′-TTCTTTCTCGTCGCCTCTTTTT-3′); CDKN1A (p21) (F: 5′-AGGTGGACCTGGAGACTCTCAG-3′, R: 5′-TCCTCTTGGAGAAGATCAGCCG-3′).
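The threshold-cycle arithmetic described above can be sketched as follows. The function names are hypothetical and the inputs are plain lists of Ct values; the sketch only illustrates the order of operations (average technical replicates, subtract the GAPDH reference, then reference each value to the mean of the empty-vector bioreplicates).

```python
from statistics import mean


def delta_ct(ct_gene, ct_ref):
    """Delta Ct for one bioreplicate.

    ct_gene, ct_ref: technical-replicate Ct values for the gene of
    interest and for the reference gene (GAPDH in the text).
    """
    return mean(ct_gene) - mean(ct_ref)


def neg_delta_delta_ct(sample_dcts, control_dcts):
    """Negative delta-delta Ct for each bioreplicate of one condition,
    relative to the mean delta Ct of the control bioreplicates
    (pCDNA3.1(-) in the text)."""
    control_mean = mean(control_dcts)
    return [-(d - control_mean) for d in sample_dcts]
```

With this sign convention, a positive value corresponds to higher expression than in the empty-vector control.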

Western blots

Frozen pellets containing 10 million cells were recovered from −80 °C. Protease inhibitor III (Millipore Sigma #539134) was combined with iCLIP lysis buffer (50 mM Tris-HCl pH 7.4, 100 mM NaCl, 1% NP-40 Igepal CA630, 0.1% SDS, 0.5% sodium deoxycholate). Cells were lysed in this buffer with protease inhibitor for 15 min on ice and then sonicated on low for 5 min, 30 s on/30 s off. Lysed cells were centrifuged at 15,000 × g for 4 min. The supernatant was divided into 100 µL aliquots and stored at −80 °C to prevent protein degradation. Protein concentration was measured by Pierce BCA Protein Assay (ThermoFisher Scientific #23227). Fifty micrograms of protein was run on NuPAGE 4–12% Bis-Tris Gel (ThermoFisher Scientific #NP0335BOX) at 150 V for 1.5 h. Gels were transferred via iBlot 2 Gel Transfer Device (ThermoFisher Scientific #IB21001), blocked in 5% milk, and incubated in primary antibody overnight. Fluorescent antibodies were utilized for multiplexing. For ATF3, primary antibody Recombinant Anti-ATF3 antibody [EPR22610-19] (Abcam #ab254268) at 1:1000 dilution and secondary antibody IRDye 680RD Goat anti-Rabbit IgG Secondary Antibody (Li-Cor #926-68071) at 1:20,000 dilution were used. For alpha tubulin, primary antibody Anti-alpha Tubulin antibody [DM1A] – Loading Control (Abcam #ab7291) at 1:5000 dilution and secondary antibody IRDye 800CW Goat anti-Mouse IgG Secondary Antibody (Li-Cor #926-32210) at 1:20,000 dilution were used. Membranes were visualized using the Azure Biosystems c600. Proteins were quantified using ImageJ Version 2.0.0-rc-69/1.52n (https://imagej.nih.gov/).

Electrophoretic mobility shift assays (EMSAs)

All EMSAs were performed with SpCas9-NLS protein (CAS9PROT from Sigma-Aldrich) in EMSA buffer (20 mM Tris-HCl pH 7.4, 150 mM KCl, 5 mM MgCl₂, 0.1% BSA, 1 mM DTT, 5 mM EDTA, 200 U/mL Superase-In RNase Inhibitor (ThermoFisher Scientific), 5% glycerol, 0.01% Tween 20, 50 µg/mL heparin). RNA was in vitro transcribed with the MEGAscript T7 Transcription kit (ThermoFisher Scientific) and purified with RNA Clean & Concentrator-5 (Zymo). RNA was 5′ end-labeled with the 5′ EndTag Labeling DNA/RNA kit (Vector Laboratories) and IRDye 800CW Maleimide (Li-COR Biosciences). After incubating protein and RNA in EMSA buffer for 30 min at room temperature, 10x Orange loading dye (Li-COR Biosciences) was added to samples before pipetting into gels pre-run for 20 min at 120 V at 4 °C. Gels were resolved by running for 1 h at 120 V at 4 °C on 6% Novex TBE gels (ThermoFisher Scientific) with 0.5x TBE buffer. Images were taken with the Azure Biosystems c600 imager.

Competitive EMSA

Cas9 protein at 640 nM was incubated with 20 nM 5′-end fluorescently labeled in vitro transcribed CDIP1 5′ UTR RNA (5′-UACCCGCCUCCUUGUGACAGAAGUGCGACUGCCAGCUGCCGAGGCGUUCGGUCCUGCUGUUGCGGCCGCUGCCCCAGGGCUGCGGGGACGGUGAGUCGACUGGA-3′) and either unlabeled in vitro transcribed Cas9 gRNA (5′-AUUAAUCGGUGGGAGUAUUCGUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCU-3′) or unlabeled in vitro transcribed non-specific N.S. RNA (5′-CUAUGCGGCAUCAGAGCAGAUUGUACUGAGAGUGCACCAUAUGCGGUGUGAAAUACCGCACAGAUGCGUAAGGAGAAAAUACCGCAUCAGGCGCCAUUCGCCAUUCAGGCUGCGCAACUGUUGG-3′) at molar ratios of 1:1 to 1:32 with respect to Cas9 protein.

Mutant CDIP1 EMSAs (Fig. 2)

Cas9 protein at 0, 198, 296, 444, 667, and 1000 nM was incubated with 20 nM 5′-end fluorescently labeled in vitro transcribed N.S. RNA, CDIP1 5′ UTR RNA, CDIP1 5′ UTR RNA (loop G > U), or CDIP1 5′ UTR RNA (loop U > A). GU-loop sequence is emboldened and underlined in the CDIP1 5′ UTR RNA sequence above.

Mutant CDIP1 EMSAs (Supplementary Fig. 6)

In vitro transcribed RNAs were 3′ end labeled with Terminal Deoxynucleotidyl Transferase (ThermoFisher Scientific) and Propargylamino-dCTP-Cy5 (Sigma-Aldrich). Cas9 protein at 0, 148, 222, 333, 500, and 750 nM was incubated with 5 nM 3′ end fluorescently labeled in vitro transcribed CDIP1 5′ UTR RNA and various GU-loop and 5nt-stem mutants depicted in the figure.
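The non-zero Cas9 concentrations in both mutant CDIP1 EMSA titrations are numerically consistent with a 1.5-fold dilution series from the top concentration. The short sketch below reproduces the listed values under that assumption; the text does not state how the dilutions were actually prepared, and the function name is hypothetical.

```python
def dilution_series(top_nM, fold=1.5, steps=5):
    """Concentrations (nM, rounded) of a serial dilution starting at top_nM.

    fold=1.5 is an assumption inferred from the listed concentrations.
    """
    return [round(top_nM / fold ** i) for i in range(steps)]


fig2_series = dilution_series(1000)   # titration listed for Fig. 2
supp6_series = dilution_series(750)   # titration listed for Supplementary Fig. 6
```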

Relevant uncropped EMSA gels can be found in Supplementary Figs. 11 and 12.

In silico RNA secondary structure modeling

The minimum free-energy secondary structure of CDIP1 5′ UTR RNA was predicted in RNAfold (Vienna RNA Websuite). A model for SpCas9 RNA binding to eCLIP peaks was developed with RNAplfold (Vienna RNA Websuite), based on the GU-loop:5nt-stem of SpCas9 in complex with its gRNA (base-pairing probability of loop U < 0.7; base-pairing probability of each of the five stem bases > 0.5), under default parameters with 50 nt padding on either side of an input RNA sequence (unspliced, for consistency given that some peaks are located on unspliced RNA). The prediction performance of this model was compared for eCLIP peak sequences (n = 478) and eCLIP genes with peak sequences (n = 381) against 1000 Monte Carlo simulations of random shuffles of the eCLIP peak sequences (n = 478).
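The thresholds of the binding model and the shuffled-sequence null can be sketched as follows. This is a simplified illustration, not the analysis code: it assumes that per-nucleotide base-pairing probabilities have already been obtained (for example from RNAplfold output), that the candidate loop U and the five stem positions are supplied as indices, and that shuffling is done at the mononucleotide level. All function names are hypothetical.

```python
import random


def matches_model(pair_prob, loop_u, stem, loop_max=0.7, stem_min=0.5):
    """True if the loop U is likely unpaired and every stem base is
    likely paired, using the thresholds given in the text.

    pair_prob: per-nucleotide base-pairing probabilities.
    loop_u: index of the loop U; stem: indices of the five stem bases.
    """
    return (pair_prob[loop_u] < loop_max
            and all(pair_prob[i] > stem_min for i in stem))


def shuffled_sequences(seq, n, seed=0):
    """n random mononucleotide shuffles of seq (composition preserved)."""
    rng = random.Random(seed)
    shuffles = []
    for _ in range(n):
        chars = list(seq)
        rng.shuffle(chars)
        shuffles.append("".join(chars))
    return shuffles
```

Each shuffled sequence would then be folded and scored with the same criterion, so that the fraction of matches in real peaks can be compared with the distribution over simulations.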

In vivo RNA secondary structure modeling

In vivo click selective 2′-hydroxyl acylation and profiling experiment (icSHAPE) structure probing data of HEK 293T transcripts¹⁸ were utilized. For inclusion in the analysis, Cas9-interacting RNA transcripts were required to have (i) an eCLIP peak covered by icSHAPE reactivity values from two replicates across the entire eCLIP peak, (ii) only one eCLIP peak per transcript, and (iii) a peak interval length of at least 50 nt. This quality control filter yielded a total of ten eCLIP peaks. Experimental folds were computed using RNApvmin and RNAfold (Vienna RNA Websuite; Mathews et al.²⁰; Deigan et al.¹⁹ parameters of slope 1.9 and intercept −0.7).
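The quality control filter and the Deigan-style pseudo-energy term can be sketched as follows. The pseudo-energy has the standard form ΔG(i) = m·ln(reactivity(i) + 1) + b, here with the slope m = 1.9 and intercept b = −0.7 quoted above. The function names and the use of None for missing reactivities are assumptions for illustration.

```python
import math


def passes_qc(rep1, rep2, peaks_on_transcript, min_len=50):
    """Apply the three inclusion criteria to one eCLIP peak.

    rep1, rep2: per-nucleotide icSHAPE reactivities across the peak for
    the two replicates, with None where no value is available
    (hypothetical encoding).
    """
    complete = (all(v is not None for v in rep1)
                and all(v is not None for v in rep2))
    return complete and peaks_on_transcript == 1 and len(rep1) >= min_len


def shape_pseudo_energy(reactivity, slope=1.9, intercept=-0.7):
    """Per-nucleotide pseudo-free energy (kcal/mol) applied to a paired
    base: slope * ln(reactivity + 1) + intercept."""
    return slope * math.log(reactivity + 1.0) + intercept
```

Under these parameters, a nucleotide with zero reactivity receives a stabilizing bonus of −0.7 kcal/mol when paired, while highly reactive nucleotides are penalized.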

Computational analysis of RNA-Seq data for nickase Cas9-APOBEC with gRNA co-expressed in HEK 293T cells

Editing sites with edit rates for replicates 1 and 2 of EMX, RNF2, and N.T. gRNA were taken from Supplementary Tables 11, 12, and 13 of Grünewald et al.²⁵ For each of the six replicates, the total number of editing sites per gene (as determined by alignment to GENCODE v29) was plotted in a box plot for non-eCLIP genes alongside eCLIP genes. Only genes with at least one editing site were plotted for each cohort. On a per gene basis C-to-U edit site counts were compared to the average TPM (transcripts per million) across all four no IP/size-matched input eCLIP RNA-sequencing datasets (two for V5-dSpCas9; two for 3xFLAG-dSpCas9), with R² statistics performed on these values. The mean fraction of edits within W (50, 100, 200, 500) nt distance of eCLIP peaks was calculated for each unique eCLIP peak whose midpoint mapped to spliced RNA. Briefly, for each eCLIP peak midpoint, the fraction of all C-to-U edit sites on its spliced transcript within W nt distance was calculated. For each of the six replicates, the mean of this value over all eCLIP peaks was then calculated. For the 10,000 Monte Carlo simulations, simulated eCLIP peaks were placed according to a uniform random distribution across their respective spliced RNA transcripts.
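The proximity statistic and its Monte Carlo null can be sketched as follows. This is an illustrative sketch with hypothetical function names; it assumes transcript coordinates are 0-based positions on the spliced RNA, that "within W nt" means an absolute distance of at most W, and that a transcript without edit sites contributes a fraction of zero (the text does not specify this case).

```python
import random
from statistics import mean


def fraction_within(midpoint, edit_sites, w):
    """Fraction of a transcript's edit sites within w nt of midpoint."""
    if not edit_sites:
        return 0.0  # assumption: no edits counts as zero
    near = sum(abs(site - midpoint) <= w for site in edit_sites)
    return near / len(edit_sites)


def mean_fraction(peaks, w):
    """peaks: list of (peak_midpoint, edit_sites_on_same_transcript)."""
    return mean(fraction_within(m, sites, w) for m, sites in peaks)


def simulate_null(transcripts, w, n_sim=10000, seed=0):
    """Null distribution of the mean fraction when each peak midpoint is
    placed uniformly at random along its spliced transcript.

    transcripts: list of (transcript_length, edit_sites), one per peak.
    """
    rng = random.Random(seed)
    return [
        mean(fraction_within(rng.randrange(length), sites, w)
             for length, sites in transcripts)
        for _ in range(n_sim)
    ]
```

The observed mean for each replicate and window size W can then be compared with the simulated distribution.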

Computational analysis of ChIP-Seq data for catalytically dead Cas9 with gRNA co-expressed in HEK 293T cells

ChIP-Seq data for replicates 1 and 2 of gRNAs 1, 2, and 3 were taken from GEO: GSE55887 of Kuscu et al.²² Reads were mapped to the human reference genome hg38 and converted to bedgraph format using bowtie (1.2.2) and bedtools (2.27.1), with read coverage normalized to reads per million. For each of the six replicates, the maximum single-base read coverage per gene (as determined by alignment to GENCODE v29) was plotted in a box plot for non-eCLIP genes alongside eCLIP genes. For inclusion in a cohort, genes were required to have at least one mapped read in the ChIP-Seq dataset and TPM > 1, where TPM are the average transcripts per million across all four no IP/size-matched input eCLIP RNA-sequencing datasets (two for V5-dSpCas9; two for 3xFLAG-dSpCas9). On a per-gene basis, maximum single-base read coverages were compared to this average TPM, with R² statistics performed on these values.
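The coverage normalization and the per-gene summary can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes per-base coverage for a single chromosome held as a list and gene coordinates given as half-open (start, end) intervals, whereas the actual analysis operated on bedgraph files.

```python
def reads_per_million(coverage, total_mapped_reads):
    """Scale raw per-base read coverage to reads per million mapped reads."""
    scale = 1e6 / total_mapped_reads
    return [c * scale for c in coverage]


def max_coverage_per_gene(coverage, genes):
    """Maximum single-base coverage within each gene.

    coverage: per-base coverage along one chromosome.
    genes: dict mapping gene name -> (start, end), half-open coordinates.
    """
    return {name: max(coverage[start:end])
            for name, (start, end) in genes.items()}
```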

Computational analysis of GUIDE-Seq data for Cas9 with gRNA co-expressed in HEK 293T cells

GUIDE-Seq data for the no Cas/no gRNA negative control and gRNAs 1, 2, 3, and 4 were taken from SRA: SRP050338 of Tsai et al.²¹ Reads were processed into unique reads from UMIs and then mapped to the human reference genome hg38 using the GUIDE-Seq pipeline (https://github.com/tsailabSJ/guideseq) and BWA (0.7.17), with reads normalized to reads per million. For each of the five conditions, the total number of mapped reads per gene (as determined by alignment to GENCODE v29) was plotted in a box plot for non-eCLIP genes alongside eCLIP genes. Only genes with at least one mapped read were plotted for each cohort.

General computational analysis

Custom scripts written in Python 3.7.7 and MATLAB 2019b were used to analyze and plot data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
