Preloader

AGO CLIP-based imputation of potent siRNA sequences targeting SARS-CoV-2 with antifibrotic miRNA-like activity

Bioinformatics and statistical analyses

We used Python scripts (Biopython; https://biopython.org/) and Integrative Genomics Viewer (IGV; https://software.broadinstitute.org/software/igv/) for bioinformatics analysis. RNA-Seq analysis was performed using Cufflinks and Cuffdiff (http://cole-trapnell-lab.github.io/cufflinks/).

A standard laboratory practice randomisation procedure was used for all experimental groups. The investigators were not blinded to the allocations assigned during the experiments and the outcome assessment. All statistical tests, including the Wilcoxon rank-sum test (two-sided), Kolmogorov–Smirnov test (two-sided), t-test (unpaired two-tailed), and chi-square test were conducted using R (https://www.r-project.org/), SciPy (http://www.scipy.org/), or Excel. The values are presented as mean ± SD. Statistical significance was set at P-value = 0.05, relative to control, equal variance, unless stated otherwise; repeated with biologically independent samples (n ≥ 3).

Sequence analysis of SARS-CoV-2

Different sequencing results of SARS-CoV-2 gRNAs from the infected human host, currently deposited in the GenBank (n = 5475; 7th June, 2020), were downloaded (https://www.ncbi.nlm.nih.gov/sars-cov-2/), aligned using the multiple sequence alignment program, MAFFT (https://mafft.cbrc.jp/alignment/software/), and used to calculate the conservation rate (%) at each position via a sliding 6 nucleotide window, in which the size reflects the minimum length of the miRNA seed. The conservation rates were visualised with the positions in SARS-CoV-2 gRNA (NC_045512; reference sequence in GenBank) using IGV.

Meta-analysis of AGO CLIP data for SARS-CoV-2

RNA sequences encoding the RdRP of RNA viruses were retrieved from the GenBank for SARS-CoV-2 (NC_045512, positions 13442–16236; nsp12), HCV (NC_009823, positions 7667–9439; NS5B), HAV (M59808, positions 5929–7395; 3D), CVB (NC_038307, positions 5912–7297; 3Dpol), SINV (NC_001547, positions 5751–7598; nsP4), CHIKV (NC_004162, positions 5648–7498; nsP4), and VEEV (NC_001449, positions 5703–7520; nsP4). All sequences were analysed using the Clustal Omega program (https://www.ebi.ac.uk/Tools/msa/clustalo/) to achieve multiple sequence alignments. These alignments were further applied to generate neighbouring phylogenetic trees using the PHYLIP package (DNAdist and Protdist; https://evolution.genetics.washington.edu/phylip.html). Based on the phylogenetic analysis results, RNA viruses related to the nsp12 sequence of SARS-CoV-2 were selected.

For the meta-analysis of the AGO CLIP data, pairwise sequence alignment of the nsp12 region in SARS-CoV-2 was initially performed for each RdRP region, as described above for different viruses (i.e. where AGO CLIP experiments have been performed in infected host cells)29. Of note, since Ago2 antibodies used in the experiments29 were proven to be anti-pan Ago antibodies24, we referred them as data from AGO. After directly retrieving AGO CLIP data29, originally derived from the GEO database for HCV (n = 7; GSM2041566, GSM2041567, GSM2041568, GSM2041569, GSM2041570, GSM2041571, GSM2041572), HAV (n = 7; GSM2041645, GSM2041646, GSM2041647, GSM2041648, GSM2041649, GSM2041650, GSM2041651), CVB (n = 4; GSM2041622, GSM2041623, GSM2041624, GSM2041625), SINV (n = 6; GSM2041626, GSM2041627, GSM2041628, GSM2041629, GSM2041630, GSM2041631), CHIKV (n = 4; GSM2041632, GSM2041633, GSM2041636, GSM2041637), and VEEV (n = 2; GSM2041640, GSM2041641), coordination of the compiled read-counts mapped onto the viral genome29, were lifted to the nsp12 region of SARS-CoV-2 using the pairwise sequence alignment results; the counts were then used to generate bedGraph files and further analysed using IGV.

Calculation of the exposure probability in SARS-CoV-2

The exposure probability (Pexp) of the siRNA target sites was determined using the RNAplfold program (https://www.tbi.univie.ac.at/RNA/RNAplfold.1.html) as previously described31. The partition function for all local structures within the 80 nucleotide-long window (W = 80, as previously defined as optimal)31 was calculated to derive the accessibility (log10(Pexp)) of a target site in a predefined length (u) within the specified maximal distance (L) between two base-pairing positions. To select the optimal folding parameters, the difference in exposure probability (ΔPexp) between two sets of AGO-bound and either AGO-unbound or total regions (defined in nsp12 of SARS-CoV-2 by the meta-analysis of AGO CLIP data) was examined to consider the length of the seed site (u = 6, 7, and 8) with different constraints (L = 20, 40, and 60), compared with the previously defined length (u = 16) for siRNA target accessibility31. P-values were calculated using the Wilcoxon rank-sum test (two-sided) in R. To design siRNAs with seed sequences (6mers or more in positions 2–8) of antifibrotic miRNAs (miR-27, miR-193a-5p, miR-486, miR-151, and miR-455)17, target sites were selected in the siRNA-accessible regions (Pexp > 0.1) of SARS-CoV-2 gRNA (NC_045512) based on calculations using the optimised parameters (u = 8 and L = 20).

RNA synthesis

siRNA and miRNA were synthesised (Supplementary Table S1) using custom RNA synthesis services provided by Bioneer (Korea). The quality of the synthesised RNAs and their modifications were monitored, reported, and confirmed by the manufacturer. To serve as a negative control, non-targeting miRNA (NT) derived from the cel-miR-67 sequence (Caenorhabditis elegans-specific miRNA) was synthesised as an siRNA with two thymidine deoxynucleotide (dT) overhang. To avoid miRNA-like repression, the siRNA was further modified to contain abasic pivot (abasic deoxynucleotide, dSpacer (Ø), at position 6) in both strands as previously reported19,20. Duplexes of the miRNA, miR-27a, and miR-193a-5p, were synthesised according to the human miRNA annotation in miRBase (http://www.mirbase.org/), with consideration of the two-nucleotide overhang. For RNA-Seq analysis, the miR-27a sequence was synthesised according to the annotation in miRbase; however, its passenger strands contain 2ʹ-O methylation at positions 1 and 2 (2ʹOMe) in the form of siRNA, preventing seed-mediated repression from the passenger strand. siRNAs designed to contain the miR-27a seed sequence were synthesised as indicated in Fig. 3d; these siRNAs also contain 2ʹOMe in the passenger strands. Details of the siRNA and miRNA sequences are provided in Supplementary Table S1.

Cell culture, transfection, and treatment

The human epithelial lung carcinoma cell line, A549 (Korean Cell Line Bank), was grown in RPMI-1640 (Hyclone). The human fibroblast lung cell line MRC-5 (Korean Cell Line Bank) was grown in MEM (Hyclone). HeLa cells (Korean Cell Line Bank) were grown in Dulbecco’s modified Eagle’s medium (DMEM; Hyclone). All media were supplemented with 10% fetal bovine serum (FBS; Gibco), 100 U·mL−1 penicillin, and 100 μg·mL−1 streptomycin (Welgene) and incubated at 37 °C with 5% CO2.

Unless otherwise indicated, transfection of siRNA or miRNA (50 nM) was performed using Lipofectamine RNAiMAX (Invitrogen) according to the general protocol provided by the manufacturer. Lipofectamine 3000 (Invitrogen) was used to co-transfect the RNA with plasmid vectors according to the manufacturer’s protocol. To seed cells prior to transfection, the number of cells was quantified using the Countess II Automated Cell Counter (Invitrogen). The cells were harvested 24 h after transfection unless otherwise indicated. To induce differentiation of the fibroblast lung cell line, MRC-5, 200,000 cells per well in 6 well plates were transfected with 50 nM RNA using Lipofectamine RNAiMAX (Invitrogen) and then 2.5 ng/mL of human recombinant transforming growth factor-β1 (TGF-β, Sigma-Aldrich) was added after 24 h. Cell morphology was examined using inverted microscopy (Leica DMi8).

Construction of luciferase reporter constructs

To measure the efficiency of RNAi-mediated repression, the psiCheck-2 vector (Promega) was used to construct luciferase reporters, which had an on-target site inserted into the 3′ untranslated region (3′UTR) of Renilla luciferase. Accordingly, synthetic duplex DNA oligos (Bionics, Korea) containing different on-target sites (bold) were cloned into the psiCheck-2 plasmid via the XhoI and NotI sites as indicated: 151/nsp3, forward: 5′-TCGAGATGAAGTTGCGAGAGACTTGTCACTACAGTTTAAAAGACCAATAAATCCTACTGACCAGTCTTCTTACATCGTTGATAGTGTTACAGTGAAGAATGGTTCCATCCATCTGC-3′, reverse: 5′-GGCCGCAGATGGATGGAACCATTCTTCACTGTAACACTATCAACGATGTAAGAAGACTGGTCAGTAGGATTTATTGGTCTTTTAAACTGTAGTGACAAGTCTCTCGCAACTTCATC-3′; 193/nsp3, forward: 5′-TCGAGGCTGGTAGTACATTTATTAGTGATGAAGTTGCGAGAGACTTGTCACTACAGTTTAAAAGACCAATAAATCCTACTGACCAGTCTTCTTACATCGTTGATAGTGTTACAGGC-3′, reverse: 5′-GGCCGCCTGTAACACTATCAACGATGTAAGAAGACTGGTCAGTAGGATTTATTGGTCTTTTAAACTGTAGTGACAAGTCTCTCGCAACTTCATCACTAATAAATGTACTACCAGCC-3′; 193/nsp5, forward: 5′-TCGAGAACTCTTAATGACTTTAACCTTGTGGCTATGAAGTACAATTATGAACCTCTAACACAAGACCATGTTGACATACTAGGACCTCTTTCTGCTCAAACTGGAATTGCCGTTGC-3′, reverse: 5′-GGCCGCAACGGCAATTCCAGTTTGAGCAGAAAGAGGTCCTAGTATGTCAACATGGTCTTGTGTTAGAGGTTCATAATTGTACTTCATAGCCACAAGGTTAAAGTCATTAAGAGTTC-3′; 151-seed, forward: 5′-TCGAGCAGTCTAACAGTCTATACAGTCTAATCAGTCTAAACAGTCTATGC-3′, reverse: 5′-GGCCGCATAGACTGTTTAGACTGATTAGACTGTATAGACTGTTAGACTGC-3′; 486/RdRP, forward: 5′-TCGAGACTGATGTCGTATACAGGAGC-3′, reverse: 5′-GGCCGCTCCTGTATACGACATCAGTC-3′; 27/RdRP, forward: 5′-TCGAGAACAGTACAATTCTGTGAAGC-3′, reverse: 5′-GGCCGCTTCACAGAATTGTACTGTTC-3′.

Luciferase reporter assays

Luciferase reporter assays were performed as previously described19. Briefly, psiCheck-2 plasmids (Promega) were co-transfected with different amounts of duplex RNA (up to 100 nM) using Lipofectamine 3000 (Invitrogen). Twenty-four hours after transfection, relative activity (Renilla luciferase activity normalised to firefly luciferase) was measured using the Dual-Luciferase Reporter Assay System (Promega) and the GloMax-Multi Detection System (Promega), with replicates (n = 6), according to the manufacturer’s protocol. The half-maximal inhibitory concentration (IC50) was calculated via nonlinear least-squares fitting for the sigmoid function using SciPy (scipy.optimize.curve_fit()). When the least-squares failed to fit the function, the approximate IC50 was estimated from the regression line.

RNA-Seq analysis

RNA-Seq libraries were constructed from a large RNA using the strand-displacement stop/ligation method. Briefly, 1.5 μg of large RNA (extracted using RNeasy Mini Kit; Qiagen), polyadenylated mRNA was extracted with 10 μL of Dynabead Oligo(dT)25 (Invitrogen), according to the manufacturer’s protocol. Thereafter, 10 ng of the purified mRNA was subjected to RNA-Seq library preparation using the CORALL Total RNA-Seq Library Prep Kit (Lexogen), according to the manufacturer’s instructions. After the cDNA was constructed, the libraries were amplified using PCR with the lowest optimal cycle, which was determined by performing qPCR (Rotor-Gene Q; Qiagen) with the addition of SYBR Geen I (1:10,000; Invitrogen) to the Q5 High-fidelity 2 × master mix (New England Biolabs). The quality of the amplified library was verified using Fragment Analyzer (Advanced Analytical) and the size distribution and quantity were estimated. After cross-checking the concentration of libraries using the Qubit DNA HS assay kit (Thermo Fisher Scientific), the prepared multiplexed libraries were pooled and sequenced as 75 single-end reads (SE75) using the MiniSeq system (Illumina). All sequence data were deposited in the SRA database (SRP270828).

The de-multiplexed sequencing reads were aligned with the human genome (hg19) using STAR (star—outSAMtype BAM SortedByCoordinate—outSAMattrIHstart 0—outFilterType BySJout—outFilterMultimapNmax 20—outFilterIntronMotifs RemoveNoncanonicalUnannotated—outMultimapperOrder Random—alignSJoverhangMin 8—alignSJDBoverhangMin 1) with RefSeq gene annotations. The first 12 nucleotides from the starter side of the reads were trimmed before mapping as instructed by the manufacturer.

Transcript abundance was quantified as reads per kilobase of transcript per million mapped reads (RPKM) using Cufflinks (cufflinks -b -G—compatible-hits-norm—library-type fr-secondstrand), and the statistical significance of the differential expression (P-value) was derived using Cuffdiff (Cuffdiff—FDR = 0.1 -b—compatible-hits-norm—library-type fr-secondstrand), with a supply of mapping results in groups under the same experimental conditions. Only values with a valid status were selected and further analysed to identify differentially expressed genes (DEGs) with statistical significance (P-value).

CDF and volcano plot analyses

To examine the propensity of miRNA-like target repression depending on the transfection of a given duplex RNA, the cumulative distribution function (CDF) according to fold change (log2 ratio) was analysed using RPKM values derived from Cufflinks. Putative miRNA targets were selected if they contained cognate seed sites in the 3′UTR (annotated by RefSeq, downloaded from the UCSC genome browser), and the 7mer-A1 seed site (match to positions 2–8 with A at position 1) was generally searched unless otherwise indicated. “No site” indicated transcripts that had no cognate 6mer sites from the seed (positions 2–8) in their mRNA sequences. “Cont site” denoted that the subset of “No site” transcripts with a control site in the 3′UTR (i.e. where the nucleotide should align with pivot (position 6) was substituted by the same nucleotide to achieve impairment26,27). Total mRNAs were selected for the expression of mRNAs (Cufflinks; RPKM ≥ 0.1). The KS test was performed using SciPy (scypy.stats.ks_2samp()) relative to total mRNAs or control sites. The volcano plot was analysed by calculating the fold change and significance (− log10 (p-value)) derived from Cuffdiff; the cutoffs were used to select the DEGs.

Gene ontology analysis

Gene ontology (GO) analysis was performed using DAVID (http://david.abcc.ncifcrf.gov/), with a supply of downregulated DEGs (selected by volcano plot analysis) under the background of expressed transcripts in a matched control set (selected as RPKM > 0 and log2 (volume) > 2, where volume is the average RPKM across different conditions), and default parameters were employed unless otherwise indicated. GO analysis results (EASE score < 0.15, or 0.2) were visualised using REVIGO (http://revigo.irb.hr/) in the network of biological process terms. An interconnected graph, which had highly similar GO terms linked to represent the degree of similarity based on width, was also analysed.

Overlap analysis of 27/RdRP and miR-27a downregulated targets

In RNA-Seq data, downregulated target transcripts (P < 0.05 or log2 fold change < -0.5) depending on 27/RdRP or miR-27a expression (6mer (positions 2–7) seed site in the 3′UTR) were selected from the total expressed transcripts (RPKM > 0 and log2 (volume) > 2). To determine the statistical significance of the overlap between the 27/RdRP- and miR-27a-dependent target transcripts, the average number of random overlaps (1000 iterations of random shuffling in DEGs using Python, random.shuffle()) was calculated to determine the P-value using the chi-square test (scipy.stats.chisquare). GO analysis of the overlapping target transcripts was performed using DAVID, with the background of expressed transcripts in both 27/RdRP and miR-27a (selected as RPKM > 0 and log2 (volume) > 2). GO analysis results were visualized by REVIGO (EASE score < 0.2) or clustered with low classification stringency (presented in Supplementary Table S2).

Prediction of the SARS-CoV-2 nsp12 secondary structure

The SARS-CoV-2 nsp12 gRNA sequence was downloaded from the GenBank (NC_045512:13442–16236), and the secondary structure was predicted using the mfold program (version 3.6)32 with default parameters. The minimum free energy of 27/RdRP and 486/RdRP seed regions (positions 2–8) that hybridise with target sites in the nsp12, was calculated using RNAduplex (version 2.4.15)33.

Quantitative RT–PCR analysis

Total RNA was isolated using the RNeasy Mini Kit (Qiagen) via on-column DNA digestion with the RNase-Free DNase Set (Qiagen). Reverse transcription was performed using SuperScript III Reverse Transcriptase (Invitrogen) and oligo(dT) primers, qPCR was performed using the QuantiNova SYBR Green PCR Kit (Qiagen) and custom primers for COL1A1 mRNA (forward primer: 5′-GCCAAGACGAAGACATCCCA-3′, reverse: 5′-CGTCATCGCACAACACCTTG-3′) on Rotor-Gene Q (Qiagen). All reactions were run in triplicate with the standard two-step cycling protocol. Relative quantification was performed using the ΔCT method, with GAPDH (forward: 5′-TGCACCACCAACTGCTTAGC-3′, reverse: 5′-GGCATGGACTGTGGTCATGAG-3′).

Quantification of SARS-CoV-2 target RNAs silenced by siRNA

To prepare the nsp12 transcript, we initially synthesised the DNA of the nsp12 transcript (NC_045512, positions 13442–16236; ~ 2.8 kb), whose 5′ end was fused with the T7 promoter (5′-GGATCCTAATACGACTCACTATAGG-3′) using a custom synthesis service provided by Bionics (Korea). The product was further amplified using PCR (forward primer, 5′-GGATCCTAATACGACTCACTAT-3′; reverse primer, 5′-CTGTAAGACTGTATGCGGTG-3′) using Q5 High-Fidelity 2 × Master Mix (New England Biolabs). To serve as a control transcript that harbours no target site, we produced a T7 promoter-containing DNA of codon-optimised nsp12, amplified from a pLVX-EF1alpha-SARS-CoV-2-nsp12-2 × Strep-IRES-Puro plasmid (Addgene plasmid # 141378, positions 3550–6453) using PCR (forward primer, 5′-GGATCCTAATACGACTCACTATAGGATGTCAGCAGACGCACAAAG-3′; reverse primer, 5′-CTGCAGGACGGTGTGAGGC-3′). After purification using the QIAquick Gel Extraction kit (Qiagen), the DNA was used as a template for in vitro transcription, which was performed using the T7 RiboMAX Express RNAi System (Promega) according to protocols provided by the manufacturer’s instructions. After the DNA templates were digested via treatment with RQ1 RNase-Free DNase (Promega) at 37 °C for 30 min, the synthesised RNAs were purified using RNA Clean & Concentrator Kit (Zymo Research) and further polyadenylated using a Poly(A) Tailing Kit (Invitrogen), according to the manufacturer’s instructions.

After purification with RNA Clean & Concentrator Kits (Zymo Research), the extracted RNA transcripts (250 ng) were co-transfected with 50 nM siRNA into A549 cells (0.2 × 106 cells) using Lipofectamine 3000 (Invitrogen), according to the manufacturer’s protocol. After 24 h, total RNA was purified using QIAzol (Qiagen), followed by isopropanol precipitation. The amount of transfected RNA was measured using qPCR. Reverse transcription was performed using SuperScript III Reverse Transcriptase (Invitrogen) and oligo(dT) primers. qPCR was conducted using the QuantiNova SYBR Green PCR Kit (Qiagen) and custom primers (nsp12, forward primer: 5′-AGGTGAACGTGTACGCCAAG-3′, reverse primer: 5′-CTGGCGTGGTTTGTATGAAA-3′; codon-optimised nsp12, forward primer: 5′-GACTTCGTCGAAAACCCTGA-3′, reverse primer: 5′-CGTAAGCACACCAACAATGC-3′) using Rotor-Gene Q (Qiagen). All reactions were run in triplicate with the standard two-step cycling protocol. Relative quantification was calculated using the ΔCT method, with nsp12 normalised by the codon-optimised nsp12.

To compare the activity of two siRNAs (27/RdRP and 486/RdRP) that target nsp12 of SARS-CoV-12, the RNA fragment containing each on-target site (27/RdRP or 486/RdRP) with ± 43 nucleotides flanking regions was produced using in vitro transcription. Briefly, DNA fragments of the SARS-CoV-2 RdRP (NC_045512; positions 13961–14065, 27/RdRP; positions 13479–13583, 486/RdRP) under the T7 promoter (5′-GGATCCTAATACGACTCACTATAGG-3′) were synthesised (Bionics, Korea) and used as templates for the T7 RiboMAX Express RNAi System (Promega). After purification using QIAzol (Qiagen) and isopropanol precipitation, the synthesised RNA fragments (60 ng) were co-transfected with 50 nM siRNA into A549 cells (0.2 × 106 cells) using Lipofectamine 3000 (Invitrogen) according to the manufacturer’s protocol. After 24 h, the cells were harvested for small RNA (< 200 nucleotides) isolation using the miRNeasy Mini kit (Qiagen). The purified small RNAs were then reverse transcribed using SuperScript III Reverse Transcriptase (Invitrogen) with specific primers for RNA fragments harbouring 27/RdRP (5′-CTAATGTCAGTACACCAACA-3′) or 486/RdRP (5′-TTAGCAAAACCAGCTACTTTATC-3′). The relative amount of RNA fragments containing 27/RdRP vs.486/RdRP was measured using qPCR. qPCR was performed using the QuantiFast SYBR Green PCR Kit (Qiagen) and custom primers (27/RdRP, forward primer: 5′-TACGCCAACTTAGGTGAACG-3′, reverse primer: 5′-CCAACAATACCAGCATTTCG-3′; 486/RdRP, forward primer: 5′-GTGTAAGTGCAGCCCGTCTT-3′, reverse primer: 5′-TGTCAAAAGCCCTGTATACGAC-3′) using Rotor-Gene Q (Qiagen). All qPCR reactions were performed in triplicate using the standard two-step cycling protocol. Relative quantification was performed using the ΔΔCT method. Of note, the 486/RdRP fragment was used as a control for the 27/RdRP fragment and vice versa.

Construction of the siRNA webserver for SARS-CoV-2

The siRNA webserver for SARS-CoV-2 (http://clip.korea.ac.kr/covid19) was constructed using the Django framework (https://www.djangoproject.com/) in Python and MySQL. The web interface was implemented using HTML, CSS, and JavaScript components, where the JavaScript components utilised the JQuery library (http://jquery.com/).

Source link