SARS-CoV-2 RNA is highly structured in host cells
To study the secondary structures of SARS-CoV-2 RNAs inside cells, we infected Vero-E6 cells with WT and Δ382 SARS-CoV-2 and performed structure probing using the compound NAI (Methods)10,25. We then performed mutational mapping (MaP) to determine the location of high reactivity bases, indicating single-stranded bases, along the virus genome. We confirmed that mutational rates along with NAI-treated, denatured and DMSO-treated samples are as expected (Supplementary Fig. 1a,b, Supplementary Data 1). Biological replicates of SARS-CoV-2 SHAPE-MaP show that the reactivities across replicates are highly reproducible (Supplementary Fig. 1c), cover around 80% of the entire SARS-CoV-2 genome (Fig. 1b), and map to known structures in the 5’ and 3’ UTR as expected (Supplementary Fig. 1d, Supplementary Data 2). SHAPE-MaP reactivities were used to constrain RNA secondary structure predictions to obtain accurate structure models of the entire SARS-CoV-2 genome (Methods, Fig. 1c,d, Fig. 2)26. Our structure models are consistent with previously identified structural elements in the 5’ and 3’ UTRs (Fig. 1c)27,28,29 and for TRS-L elements (Supplementary Fig. 2a)27, confirming that models based on our data are accurate. As the frameshifting element is a conserved element that is important for ribosome frameshifting and translation of ORF1b, multiple structure models based on high throughput structure probing data have been proposed. We observed that our structure model resembles the alternative structure presented in Lan et al. (Fig. 1d, Supplementary Fig. 2b), and mapped our SHAPE-MaP reactivities onto the different proposed models in the literature. Out of five different structures, our SHAPE-MaP reactivities (from both the WT and Δ382 SARS-CoV-2 virus) agree the most with the in-cell FSE model proposed by Lan et al. (Supplementary Fig. 3,4). Similar to other RNA viruses like DENV and ZIKV, 57% of the bases in the SARS-CoV-2 genome are predicted to be paired, with a median helix length of 5 bases in both WT and Δ382 genomes (Supplementary Fig. 4b)10. These short helices enable RNA viruses to escape from host immune responses.


a Regions (150 bases) with the lowest 20% SHAPE reactivity (in black), Shannon entropy (blue), and greatest difference (top 20%) between actual and shuffled energies (green) are shown along both the WT and Δ382 genomes. 12 consensus regions are consistently highlighted (4/6) across both WT and Δ382 genomes and are shown in fuchsia. The red box indicates the deletion region in the Δ382 genome. b Structure models of 5 consensus were generated using the program RNAstructure54, using SHAPE-MaP reactivities as constraints, and visualized using VARNA57. The SHAPE-reactivities are mapped onto the structure models. Source data are provided as a Source Data file.
To identify potentially functional structural RNA elements in the SARS-CoV-2 genome, we used a consensus model between WT and Δ382, incorporating local SHAPE reactivity, local Shannon entropy of the structure models and local ScanFold Z-scores in 150 nt windows (Fig. 2a)29,30,31,32. We evaluated window sizes of 50–300 nt which yielded consistent results (Supplementary Fig. 6a). We considered a position a consensus candidate if it had at least four out of six possible characteristics of ‘low average SHAPE’ as an indication of structuredness at a location, ‘low Shannon entropy’ as an indication of structural consistency and limited alternative folding, and ‘low ScanFold Z-score’ as a proxy for the high stability of putative structural elements in both WT and Δ382 (Fig. 2a). Local structure models of consensus regions were largely consistent with structures obtained in the global context and identified novel highly structured elements within the genome (Fig. 2b, Supplementary Fig. 5). As single-stranded regions present in the SARS-CoV-2 genome could be used for siRNA targeting, we also identified locations with high reactivities (top 20%) in both the WT and Δ382 genomes (Supplementary Fig. 6b, Supplementary Data 3). We identified a total of 21 regions that could be used for siRNA targeting, to facilitate potential treatments for COVID-19 disease.
SARS-CoV-2 genome contains hundreds of regions involved in intramolecular long-range interactions
In addition to determining which bases are paired or unpaired in the SARS-CoV-2 genome, we also wanted to know the identity of pairing partners within the genome. To identify pair-wise RNA interactions, we treated SARS-CoV-2 or Δ382 infected Vero-E6 cells with biotinylated psoralen and performed proximity ligation sequencing using SPLASH20. Biological replicates of SPLASH showed a good correlation of pair-wise interactions between the samples, indicating that our method is robust (Supplementary Fig. 7a,b). 82.9% of our chimeric interactions on the 18 S and 28 S rRNA fall within 30 Å in physical space, indicating that our data is capturing pair-wise interactions as expected (Supplementary Fig. 7c).
We identified 237 and 187 intramolecular interactions along the WT and Δ382 genomes, respectively (Fig. 3a, Supplementary Data 4,5). SPLASH pair-wise interaction patterns are largely consistent between WT and Δ382 genomes, indicating the robustness of our method (Fig. 3a, Supplementary Fig. 7d, e). 45.6% and 42.3% of the intramolecular interactions occur over a distance longer than 1 kb in the WT and Δ382 genomes respectively, indicating that the viral sequences are involved in extensive long-range interactions (Fig. 3b, Supplementary Fig. 7f–h). Longer-range interactions (>1 kb) tend to have a lower number of reads than shorter-range interactions, indicating that they are formed more transiently inside cells (Fig. 3c), consistent with previous literature that longer-range interactions tend to be disrupted33. As RNA structures could have an impact on the regulation of virus gene expression, we examined whether RNA pairing could be associated with translation using publicly available SARS-CoV-2 ribosome profiling data (Supplementary Data 6). We observed that ribosome pause sites from cycloheximide experiments have more pair-wise interactions than non-pause sites (Fig. 3d)34, suggesting that RNA structures could be associated with translational pauses and thus regulate the translation of SARS-CoV-2.


a Pair-wise RNA-RNA interactions along the WT (blue) and Δ382 (red) genomes. The thickness of the lines indicates the abundance of chimeric reads for that particular interaction. b Histogram showing the distribution of interactions that span different lengths along the WT SARS-CoV-2 genome. Interactions over a distance longer than 1 kb are classified as “long-range” and comprise 45.6% of all interactions. c Boxplot showing the distribution of the abundance of SPLASH chimeric reads for long (> 1 kb) (WT n = 195, Δ382 n = 264) and short (≤ 1 kb) (WT n = 140, Δ382 n = 150) pair-wise interactions in both WT (left) and Δ382 (right) genomes. Long interactions tend to have lower SPLASH interaction counts, suggesting that they may be formed more transiently. P-values were calculated using a two-sided Wilcoxon Rank Sum test without adjustments (p = 3.857 × 10−5 and 1.776 × 10−8 respectively). d Boxplot showing the distribution of SPLASH chimeric reads along the SARS-CoV-2 genome for all sites (left) (n = 29,847) and for sites that show ribosome pausing events (right) (n = 80). Sites with ribosome pausing events show higher SPLASH chimeric reads, indicating that they reside in more highly structured regions. P-value was calculated using the two-sided Wilcoxon Rank Sum test without adjustments. In (c, d), the box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. The outliers are presented as dots. e Bar-charts showing the proportion of unique pair-wise interactions, as well as interactions that have 2 or more alternative partners, along the WT and Δ382 genomes. f Arc plots showing the alternative interactions between N and other positions along the SARS-CoV-2 genome. g Representative structure models for interactions between N and other regions along the genome. Structure models are generated using RNAcofold from regions identified by SPLASH. For each interaction, the SPLASH count and predicted energy of folding from RNAcofold is shown next to the model35. SHAPE-MaP reactivities are mapped onto the bases in the structure models. Source data are provided as a Source Data file.
Interestingly, we observed that SARS-CoV-2 RNA exhibit more alternative interactions than DENV and ZIKV RNAs inside the cell, with 55.6% and 48.1% of the WT and Δ382 pair-wise interactions involving two or more partners (Fig. 3e)10. This suggests that SARS-CoV-2 RNA takes on numerous conformations that are present simultaneously inside host cells. We observed that a location at the 3’ end of sgRNA N is particularly promiscuous and interacts with regions throughout ORF1a (Fig. 3f). Structure modelling of SPLASH identified interactions using the program RNAcofold revealed that energies calculated from the predicted pairings are coherent with the SPLASH interaction counts (Fig. 3g)35, indicating that the relative abundance of SPLASH counts between different interactions could serve as a proxy for the relative prevalence of these interactions inside cells.
SARS-CoV-2 sgRNAs are structurally different
In addition to the synthesis of the full-length genomes, a nested set of 3’ co-terminal sgRNAs are made in SARS-CoV-2 infected cells using discontinuous RNA synthesis9 (Supplementary Fig. 8a). These sgRNAs range from 2–8 kb long, contain a leader sequence and are produced at different amounts. While SHAPE-MaP provides information on single nucleotide SHAPE along the genome, short-read sequencing makes it difficult to map structure information unambiguously to individual sgRNAs. While sgRNAs have been observed to have different structures from full-length genomic RNA using enrichments and proximity ligation and sequencing36, it is unclear whether each individual sgRNA contains unique structures that could be important for sgRNA-specific functions and regulation.
To address this, we utilize our previously developed method of coupling RNA structure probing with Nanopore direct-RNA sequencing (PORE-cupine) to allow us to read out SHAPE reactivities along long RNA molecules19. Sequencing of two biological replicates of RNAs extracted from NAI-treated, WT and Δ382 SARS-CoV-2 infected Vero-E6 cells showed good structure correlation, indicating that our data is reliable (Supplementary Fig. 8b, c). We also confirmed that PORE-cupine reactivity shows a good correlation with SHAPE-MaP reactivity along the SARS-CoV-2 genome (Supplementary Fig. 8d). By filtering for the full-length reads that contain leader sequences, we determined reactivities along individual ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8 (WT only), and N sgRNAs (Fig. 4a, Supplementary Fig. 9a, Supplementary Data 7). We observed that ORF7b sgRNA contains the highest average reactivities for both WT and Δ382, suggesting that it is likely to be the most single-stranded among the different sgRNAs of SARS-CoV-2 (Fig. 4b, Supplementary Fig. 9b). As structures around the leader sequences for each sgRNA were previously shown to have weak correlations with gene expression, we calculated the correlation between PORE-cupine reactivity around TRS-B sites for each sgRNA and their relative abundance from our Nanopore data. We observe a weak positive correlation between reactivity and transcript abundance, similar to previously published literature28, for both WT and Δ382 sgRNAs, suggesting that single-strandedness around the TRS-B region could result in increased synthesis of corresponding sgRNAs (Fig. 4c, Supplementary Fig. 9c).


a Top, SPLASH interactions along SARS-CoV-2 from the region 3a to the 3’ UTR. Bottom, PORE-cupine reactivity signals are averaged across all the signals from the sgRNAs (Sum). PORE-cupine reactivity signals are also shown for 3a (green), E (blue), M (brown), 6 (purple), 7a (light brown), 7b (navy), 8 (red) and N (dark green). PORE-cupine reactivity signals for each sgRNA are filtered for full-length sequences that contain leader sequences for each sgRNA. Regions with significant differences are highlighted in grey, p < 0.05, (Methods). b Violin plots showing the distribution of average reactivities for each sgRNA. Each sgRNA is subsampled for 500 strands before calculating its mean, n = 100. P-values were calculated by comparing the distribution of the reactivities in each sgRNA against all of the sgRNAs with a two-sided Wilcoxon Rank Sum test. n.s.: not statistically significant. The box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. c Scatterplot showing the correlation between the PORE-cupine reactivity around TRS-B for each sgRNA (x-axis) against transcript levels inside cells (y-axis). d,e Reactivity plots of regions that show significant structure differences between the sgRNAs. d P-value for the follow regions are: 27,750–27.850 (p-value = 3.52 × 10−6), 27,800–27,900 (p-value = 2.55 × 10−8), 27,850-27,950 (p-value = 0.02), 27,900–28,000 (p-value = 0.03) and 27,950-28,050 (p-value = 0.04). e P-value for the follow regions are: 29,250–29,350 (p-value = 0.02). f Boxplots showing the distribution of correlation between reactivities of different sgRNAs for regions that show unique SPLASH interactions (1) and regions that show alternative SPLASH interactions (≥ 2). Regions that show alternative SPLASH interactions take on different conformations and show lower reactivity correlations between sgRNAs. The p-value was calculated using a two-tailed Wilcoxon Rank Sum test. The box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. The outliers are presented as dots. g Structure models of WT ORF7b and M sgRNA are generated using the program RNA structure, using PORE-cupine reactivities as constraints. PORE-cupine reactivities are mapped onto the secondary structure models. The red and blue arrows indicate the same positions (start for red and end for blue) in the structure models. Source data are provided as a Source Data file.
To identify structures unique to each sgRNA, we compared the reactivities among individual sgRNAs to identify highly consistent as well as divergent structural regions (Fig. 4a, Methods). We found 4 regions in RNAs of WT SARS-CoV-2 that showed consistent structure differences between different sgRNAs, 3 of which are also seen in the RNAs of Δ382 sgRNAs (Supplementary Fig. 9a). While two regions centred around bases 27,800 and 28,250 correspond to the leader sequences of sgRNAs of ORF7b and N respectively, two other structurally different regions (centred around 29,300 and in 3’ UTR) are present within all sgRNAs, and hence cannot be identified using short-read sequencing (Fig. 4d, e, Supplementary Fig. 9a, d, e). We checked that the regions that show diverse structures in different sgRNAs also exhibit multiple interaction partners by SPLASH, confirming that those regions do exist in alternative conformations (Fig. 4f). We then visualized the sgRNA-specific structures by incorporating PORE-cupine reactivities into structure modelling and observed different structure models for the same sequence region in different sgRNAs (Fig. 4g, Supplementary Fig. 10a, b), further confirming that different sgRNAs could exist in different structures despite sharing the same sequences.
Genomes of WT and Δ382 SARS-CoV-2 contain different RNA structures
Viruses that contain genomes with various ORF8 deletions have been found in patients around the world7, however the mechanisms behind how such deletions impact the virus are still largely unknown. To determine whether virus phenotypes could be associated with structural differences, we performed correlations of SHAPE-MaP reactivities between the two genomes. As expected, structures in WT and Δ382 genomes are generally highly correlated (R = 0.62, Supplementary Fig. 11a), although we do observe local structure differences at the deletion region of around base 28000 (Supplementary Fig. 11b, c). SPLASH analysis around the deletion region also revealed differences in pair-wise interactions between WT and Δ382, confirming the local structure rearrangements between the two viruses (Supplementary Fig. 11d).
As the deletion occurs around base 28,000 (ORF8), it is present not only in the full-length genome but also in most of the sgRNAs (except for sgRNA of N, starts the site of which is located downstream of the deletion). Due to the extensive amount of sequence similarity between the different sgRNAs, it is difficult to map uniquely to each individual sgRNA using short-read sequencing. Consequently, it remained unclear whether the structure differences between WT and Δ382 are present in all the sgRNAs or only between some specific sgRNAs. To determine which sgRNA shows reactivity differences between WT and Δ382 genomes, we compared the PORE-cupine reactivity profiles of individual WT and Δ382 sgRNA to each other (Supplementary Fig. 12a). While we could only detect very local reactivity differences immediately before and after the deletion site when all the WT and Δ382 sgRNA reads are summed and compared to each other as an aggregate (similar reactivity profiles obtained using short-read Illumina sequencing), we observed additional structure differences when individual WT and Δ382 sgRNAs are compared to each other (Supplementary Fig. 12a–d). We observed the largest structure differences between WT and Δ382 in ORF3a and E sgRNAs (Supplementary Fig. 12a). We also consistently observed a second structurally different region between WT and Δ382 sgRNAs at the bases 29,200–29,400 (Supplementary Fig. 12b, d), indicating that the deletion could impact distal structures that are located more than 1 kb away. As expected, we did not observe reactivity differences between N-gene sgRNAs of WT and Δ382 viruses as this sgRNA is transcribed using TRS located downstream of the deletion region. This finding indicates that the reactivity differences between other sgRNAs of WT and Δ382 viruses are likely to occur in cis due to the deletion and not due to factors that may act in trans (Supplementary Fig. 12a, b). As sgRNA of N is by far the most abundant sgRNA of SARS-CoV-2 and it did not show structure differences between WT and Δ382 viruses9, differences in the reactivity between the 29,300 region in WT and Δ382 genomes were masked when an aggregate reactivity of all sgRNAs is used for comparison (Supplementary Fig. 12a, b). As such, using long-read sequencing to map RNA structures across sgRNAs can yield novel insights into sgRNA-specific RNA structures.
SARS-CoV-2 genome interacts strongly with mitochondrial RNAs and snoRNAs
The genomes of RNA viruses can interact directly with host RNAs to facilitate or restrict viral infection. By analysing the SPLASH interactions between SARS-CoV-2 and host cell RNAs, we identified 374 and 334 host RNAs that interact with the WT and Δ382 SARS-CoV-2 genomes, respectively (Fig. 5a, b, Supplementary Fig. 13a–c, Supplementary Data 8,9). The host RNA-virus genome interactions are preferentially enriched in the coding regions along host mRNAs (Fig. 5c). STRING analysis of the top 25% of SARS-CoV-2 interactors showed that they are enriched for proteins that physically interact with each other (PPI: p < 10−16)37, including genes that are involved in the mitochondria, ER, GTP hydrolysis, and translation processes (Fig. 5d). GO term enrichment of interacting RNAs showed similar enrichments, confirming the importance of SARS-CoV-2 interactors in mitochondrial and ER function (Fig. 5e)38,39.


a Pie-chart showing the number of host RNAs from different RNA classes that interact with WT SARS-CoV-2. b Line plot showing the number of SPLASH reads along the WT SARS-CoV-2 genome. The names of host RNAs that bind strongly to the virus at a particular location is labelled above the interaction peak. c Bar-chart showing the fraction of host interacting regions that fall in 5’ UTR, CDS, and 3’ UTR (black), as compared to what is expected from random (grey). Host interacting regions are enriched in CDS and depleted in 3’ UTRs. Significance was assessed using a one-way Chi-Square test without adjustments. d STRING analysis of the top 25% SARS-CoV-2 host interactors37. The networks were built based on high confidence (0.7) evidence from protein-protein interaction sources of experiments, databases, and text-mining where the line thickness indicates the strength of data support. Functional clusters in PPI networks were determined using the Markov Clustering algorithm (MCL). The PPI enrichment p-value < 10−16. e GO term enrichment of the top 25% SARS-CoV-2 interactors using David functional annotation analysis. SARS-CoV-2 interactors are enriched for transcripts that reside in the mitochondria, ER, and exosome, and are enriched for molecular functions for iron-binding, endopeptidase inhibitor activity, and cytochrome-c oxidation activities. The p-value was calculated using a hypergeometric test. f Boxplots showing the distribution of log2 fold change in gene expression upon SARS-CoV-2 infection in non-interacting genes, in 374 RNAs that interact with SARS-CoV-2 (All), in the top 100 interactors and top 20 interactors ranking by the chimeric read counts. SARS-CoV-2 interactors show a decrease in gene expression upon virus infection. However, the top interactors show an increase in gene expression upon virus infection, indicating that they are selectively stabilized. The expression data were calculated from the non-chimeric reads from SPLASH and quantified using DESeq258. The p-value was calculated using the two-tailed Wilcoxon Rank Sum test. The box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. g Volcano plot showing the distribution of host RNA gene expression upon SARS-CoV-2 infection. The top 20 interactors are highlighted in red and show a general stabilization in gene expression upon virus infection. The p-value was calculated using the two-tailed Wilcoxon Rank Sum test. h Boxplots showing the distribution of protein ratio after virus infection in all genes, in 374 RNAs that interact with SARS-CoV-2, in the top 100 interactors and top 20 interactors. The p-value was calculated using the two-tailed Wilcoxon Rank Sum test. The box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. Source data are provided as a Source Data file.
While SARS-CoV-2 RNAs bind to more than 300 RNAs inside cells, we observed that the top 10 (2.6%) of the strongest interactors contributed to 17.5% and 24.1% of all WT and Δ382 binding events, indicating that the virus binds to them particularly strongly (Fig. 5b). These strong interactors include mitochondrial RNAs such as the mRNA of COX1, which is a mitochondrially encoded cytochrome-c oxidase, mitochondrial rRNA and tRNA, and SNORD27, a snoRNA responsible for 18 S ribosomal RNA methylation (Supplementary Fig. 13b). Using the program RNAcofold, we observed strong pair-wise interactions between virus and mitochondria RNAs in SPLASH identified binding sites (Supplementary Fig. 13d). While generally, SARS-CoV-2 interacts stronger with more abundant host RNAs, we observed significantly more interactions between the virus and host mitochondrial and snoRNAs than expected from abundance alone (Supplementary Fig. 13e). A previous study using RNA-GPS had shown that part of SARS-CoV-2 RNAs localized to the mitochondria and the nucleolus40. SARS-CoV-2 infection also results in mitochondrial dysregulation23,41. Further experiments are needed to test whether the direct pairing between SARS-CoV-2 and mitochondrial RNAs contributes to mitochondrial dysregulation.
The SARS-CoV-2 infection has been found to have an impact on almost every aspect of the host transcriptome to control virus and host gene regulation42. We observed a general decrease of RNA abundance of SARS-CoV-2 interactors upon virus infection. Interestingly, however, an opposite trend was observed for the strong interactors that were selectively stabilized and their abundance is increased upon SARS-CoV-2 infection (Fig. 5f, g, Supplementary Fig. 13f, g). qRT-PCR analysis of key interactors such as COX1 mRNA and MT-rRNA showed that these RNAs are indeed stabilized upon virus infection, confirming our RNA sequencing results (Supplementary Fig. 13h–k). Mining of published SARS-CoV-2 proteomics data revealed that proteins encoded by SARS-CoV-2 interactors were also preferentially translated and/or stabilized at the protein level as compared to proteins produced by non-SARS-CoV-2 interactors (Fig. 5h)43. Thus, interaction with SARS-CoV-2 RNAs may confer a stabilizing effect on their overall gene regulation.
SARS-CoV-2 RNA binds to SNORD27 and is 2’-O-methylated
SNORD27 is one of the strongest host interaction partners for SARS-CoV-2 RNA (Fig. 6a) and is traditionally known to guide 2’-O-methylation of 18 S ribosomal RNA44. snoRNAs can bind and methylate cellular RNAs45, and methylation enzymes including fibrillarin (FBL), rRNA methyltransferase 2 and 3 (MRM2 and MRM3) have been found to be physically associated with SARS-CoV-2 genome23. We tested whether SARS-CoV-2 RNA could be 2’-O-methylated and whether host RNAs’ methylation levels are changed upon virus infection. We performed 2 biological replicates of Nm-seq on total RNA from HeLa cells, as well as from uninfected and SARS-CoV-2 infected Vero-E6 cells (Supplementary Fig. 14a, Supplementary Data 10,11, Methods)46. Biological replicates of Nm-seq from both cell types show that they are well correlated, suggesting that Nm-seq data is reproducible (Supplementary Fig. 14b,c). Nm-seq analysis on human 18 S rRNA accurately identified 36 out of 42 known 2’-O-methylation sites and had a high AUC-ROC curve of 0.96, suggesting that we are able to detect existing 2’-O-methylation sites accurately and sensitively (Supplementary Fig. 14d,e).


a Structure model of SARS-CoV-2 RNA before and after SNORD27 binding. The model of SARS-CoV-2 before SNORD27 binding is generated using the RNA structure program54 and incorporating SHAPE-MaP reactivity as constraints. The model of SARS-CoV-2:SNORD27 pairing is generated using RNAcofold35. SHAPE-MaP reactivities are mapped onto the structure models. b The distribution of 130 2’-O-methylation sites along WT SARS-CoV-2 genome. The y-axis is the 2’-O-methylation enrichment measured by the RT-stop fraction against the coverage at each nucleotide. The orange bars indicate enriched sites above control. c Boxplots showing the distribution of 2’-O-methylation frequency along with the host RNAs and on SARS-CoV-2 RNA from n = 2 biological replicates (total 4 technical replicates). P-value was calculated using the two-tailed Wilcoxon Rank Sum test. d Bar-plot showing the distribution of 2’-O-methylation sites (black), and control sites (grey), in the different amino acids, 5’ UTR, 3’ UTR, and non-coding regions (NC) along the SARS-CoV-2 genome. P-value was calculated using the chi-squared test. e Distribution of 2’-O-methylation sites on SARS-CoV-2 genome (orange) and Vero-E6 transcriptome (blue) at A, C, U, G bases. The proportion of each nucleotide was normalized by its prevalence in the host transcriptome and SARS-CoV-2 genome, respectively. 2’-O-methylation sites on SARS-CoV-2 are enriched in Us and depleted in Gs. P-value was calculated using the chi-squared test. f Boxplot showing the distribution of SPLASH chimeric reads at all sites (n = 29,847) versus 2’-O-methylation sites (n = 485) along the SARS-CoV-2 genome. P-value was calculated using the two-tailed Wilcoxon Rank Sum test. g Boxplot showing the distribution of 2’-O-methylation frequency along Vero-E6 mRNAs in uninfected and SARS-CoV-2 infected cells. P-value was calculated using the two-tailed Wilcoxon Rank Sum test. h, i Boxplot showing the distribution of increase (h) or decrease (i) in methylation frequency in non-interacting RNAs, in all interacting RNAs, in top 100 interacting RNAs, and in top 20 interacting RNAs. P-value was calculated using the two-tailed Wilcoxon Rank Sum test. j Violin plot showing the distribution of the minimum distance between host 2’-O-methylation site and the location of its interaction with SARS-CoV-2. P-value is calculated using the Wilcoxon Rank Sum test. k Boxplot showing the distribution of changes in gene expression in Vero-E6 mRNAs that either lost (decrease) or gained (increase) 2’-O-methylation sites. P-value is calculated using the Wilcoxon Rank Sum test. In (c, f–i, and k), the box represents the 25–75th percentiles, and the median is indicated. The whiskers show the minimum and maximum values. l Model of our hypothesis. SARS-CoV-2 binds to SNORD27 to sequester methylation enzymes to itself and away from host mRNA, enhancing host RNA decay. Source data are provided as a Source Data file.
Using Nm-seq, we identified a total of 130 2’-O-methylation sites in the SARS-CoV-2 genome (Fig. 6b), and 4931 sites in 4142 transcripts in the Vero-E6 transcriptome (Supplementary Fig. 14f,g). We observed that a 2’-O-methylated host mRNA contains approximately 1.1 modifications per transcript in the Vero-E6 transcriptome, similar to methylated RNAs in HeLa cells46. The majority of these host modification sites (60%) were located in the coding regions and were enriched for codons encoding charged amino acids (Supplementary Fig. 15a), as previously described46. In comparison, the SARS-CoV-2 genome is 19-fold more modified than host mRNAs after normalizing for transcript length (Fig. 6c). The 2’-O-methylations are enriched in the 5’ and 3’ UTRs of SARS-CoV-2 (Fig. 6d), depleted in position 2 of codons (Supplementary Fig. 15b), and are enriched for U and depleted for G bases along the genome (Fig. 6e). 2’-O-methylation sites on SARS-CoV-2 are also associated with high SPLASH reads, indicating that they are located near positions with abundant intramolecular pair-wise interactions (Fig. 6f). As bases that are 2’-O-methylated cannot be modified by NAI, we tested whether 2’-O-methylated bases have lower SHAPE-MaP reactivity. We did not observe a decrease in SHAPE-MaP reactivity in 2’-O-methylated bases as compared to non-methylated bases (Supplementary Fig. 15c), suggesting that only a fraction of the SARS-CoV-2 genome at the base is modified.
As the modification of the SARS-CoV-2 genome might sequester corresponding RNA modification enzymes away from the host transcriptome, we calculated the changes in modification rates in the host RNAs in the presence and absence of SARS-CoV-2 infection. We observed a decrease in host RNA 2’-O-methylation frequency upon virus infection, supporting our hypothesis that they become less methylated (Fig. 6g). In addition, we also observed that host RNAs that interact strongly with the SARS-CoV-2 genome show greater 2’-O-methylation changes, and show large losses and gains in modification sites within the RNAs (Fig. 6h, i, Supplementary Fig. 15d, e). We hypothesized that RNAs interacting with the SARS-CoV-2 genome could be methylated near their interacting regions, presumably due to proximity to SNORD27, while methylation sites that are located far away could be lost. To determine whether 2’-O-methylation sites on SARS-CoV-2 genome interacting RNAs are closer to the locations of virus-RNA interactions regions, we calculated the closest distance between virus-host interaction to host 2’-O-methylation site. We observed that sites that had 2’-O-methylation were indeed closer to virus-host RNA interactions sites (Fig. 6j), supporting the hypothesis that proximity to the SARS-CoV-2 genome might allow interacting RNAs to be methylated together within a hub.
2’-O-methylation has been shown to stabilize RNAs inside cells45. Therefore, we hypothesized that the loss of 2’-O-methylation on host RNA upon virus infection may affect the stability of the host RNAs. Indeed, we observed that the abundance of host RNAs that show a decrease in methylation sites was significantly decreased in infected cells. In contrast, the abundance of host RNAs that show an increase in methylation sites was increased (Fig. 6k). Thus, together with the production of Nsp1 from SARS-CoV-2 to cleave cellular RNAs, the binding of SARS-CoV-2 RNAs to SNORD27 could serve as part of a multi-prong mechanism to decrease cellular RNAs and maximize virus replication (Fig. 6l)47.

