Characterization of PAR adduct moieties upon chemical RNA cleavage
To identify the PAR-crosslinked RNA-binding sites, we needed to characterize the masses of PAR adducts on peptides. For this, HeLa cells were metabolically labeled with 4SU or 6SG and then irradiated with UVA (365 nm) to induce crosslinking between proteins and PARs in RNA. Following cell lysis, the protein-mRNA conjugates were enriched using oligo-d(T) beads4,5 and digested with trypsin. Then, the peptide-RNA conjugates were enriched using a 30 kDa size-cutoff cellulose membrane filter to deplete non-crosslinked peptides, which are smaller than the conjugates. The peptide-RNA conjugates were treated with hydrofluoride (HF) for complete RNA digestion15, which leaves a single nucleoside or base conjugated to peptides, and subjected to LC-MS/MS analysis (Fig. 1a). The MS/MS data consist of two layers: MS1 measures the mass of a whole peptide precursor ion, which is then isolated in the gas phase and broken into smaller fragment ions whose masses are measured in the MS2 stage.


a Experimental scheme for sample preparation for pRBS-ID. Upon photoactivatable ribonucleoside (PAR) labeling, cells are irradiated with UVA to induce protein-RNA crosslinking. The protein-RNA conjugates are enriched using oligo-d(T)4, 5, and the protein is digested into tryptic peptides. Then, the peptide-RNA conjugates are enriched by size-cutoff filtration, and HF is added to fully digest the RNA into monomers15. The resulting PAR adducts are characterized using two open search tools22, 23. This information serves as a basis for designing the MS/MS search pipeline for pRBS-ID. Open search for 4SU-RBS (b) or 6SG-RBS (c). MS1 modification masses were measured using MSFragger22 (top), and the MS2 fragment ion modifications on precursor ions with base (bottom left) or nucleoside (bottom right) adducts were analyzed. Modification mass ranges of 60−400 Da are shown.
First, we determined the PAR adduct masses on the peptide precursor ions. We utilized MSFragger22 as an open search tool, which detects the precursor mass shifts in MS1 with little dependency on MS2 fragment ions (Supplementary Data 1). As a result, the base and nucleoside forms with H2S loss were identified as 4SU and 6SG adducts (Of note, hereafter 4-thiouracil/6-thioguanine are referred to as “4SU/6SG-base”; 4-thiouridine/6-thioguanosine as “4SU/6SG-nucleoside”) (Fig. 1b, c, top). To understand how the PAR-base and PAR-nucleoside forms were generated, we carefully inspected the MS1 extracted ion chromatograms (XICs) of the same peptide with the respective adducts (Supplementary Fig. 1a, b). Among the XIC peaks corresponding to peptides with PAR-base adducts, some showed elution times distinct from the peaks from peptides with PAR-nucleoside. This indicates that HF digestion generates both the PAR-base and PAR-nucleoside forms in solution. Meanwhile, other XIC peaks from peptides with the PAR-base forms showed co-elution patterns with those of the nucleoside forms. This suggests that a portion of the PAR-nucleoside forms are in-source fragmented to the PAR-base forms during the ionization step prior to MS1 measurement. In addition, the loss of H2S is consistent with previous studies that used nuclease-based RNA digestion approaches9,13, suggesting that H2S is lost independent of the HF treatment. It is likely that H2S loss occurs during UVA-induced protein-RNA crosslinking (Supplementary Fig. 1e).
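The adduct masses described above can be reproduced from elemental compositions. The following is a minimal sketch and not part of the published pipeline: the neutral formulas for the four PAR species and the monoisotopic element masses are standard chemistry values assumed here rather than taken from the text, and the names `mono_mass`, `PAR_FORMULAS`, and `ADDUCT_MASS` are hypothetical.

```python
# Monoisotopic element masses in Da (standard values, assumed; not from the source text).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def subtract(a, b):
    """Elemental composition a minus b."""
    out = dict(a)
    for el, n in b.items():
        out[el] = out.get(el, 0) - n
    return out

H2S = {"H": 2, "S": 1}

# Assumed neutral formulas of the free base / nucleoside forms.
PAR_FORMULAS = {
    "4SU-base": {"C": 4, "H": 4, "N": 2, "O": 1, "S": 1},         # 4-thiouracil
    "4SU-nucleoside": {"C": 9, "H": 12, "N": 2, "O": 5, "S": 1},  # 4-thiouridine
    "6SG-base": {"C": 5, "H": 5, "N": 5, "S": 1},                 # 6-thioguanine
    "6SG-nucleoside": {"C": 10, "H": 13, "N": 5, "O": 4, "S": 1}, # 6-thioguanosine
}

# Mass shift on the peptide = PAR species minus H2S (lost upon crosslinking).
ADDUCT_MASS = {name: mono_mass(subtract(f, H2S)) for name, f in PAR_FORMULAS.items()}

# Difference between nucleoside and base adducts = ribose neutral loss (C5H8O4).
RIBOSE_LOSS = ADDUCT_MASS["4SU-nucleoside"] - ADDUCT_MASS["4SU-base"]

for name, mass in ADDUCT_MASS.items():
    print(f"{name}: {mass:.4f} Da")
print(f"ribose loss: {RIBOSE_LOSS:.4f} Da")
```

Under these assumptions, all four adduct masses fall inside the 60−400 Da window shown in Fig. 1b, c, and the nucleoside-to-base difference is the same for 4SU and 6SG.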
Next, we further characterized the fragmentation behavior of PAR adduct masses in MS2 (Fig. 1b, c, bottom). Notably, we observed a nearly complete transition of the nucleoside form in the MS1-level to the base form in the MS2-level owing to the neutral loss of ribose, while the base form in the MS1-level remained intact in the MS2-level. This unique phenomenon was initially noticed in an open search using the MODa algorithm23, which employs the adduct mass on MS2 fragment ions to find modifications in peptides (Supplementary Data 1). Here, we encountered an unexpected observation where the PAR-nucleoside forms were severely underestimated (Supplementary Fig. 1c, d), compared to the results obtained from MSFragger (Fig. 1b, c, top). This finding suggests partial or complete loss of the PAR-nucleoside modification during higher-energy collisional dissociation (HCD) for MS2 fragment ion generation. To accurately characterize the fragmentation patterns of PAR adducts in the MS/MS process, we carried out an in-depth analysis of the MS2 spectra from peptide-spectrum matches (PSMs) with MS1 adduct masses of PAR-nucleosides in MSFragger searches, matching the MS2 peaks to a list of all possible masses for MS2 fragment ions with adduct masses ranging from 1 to 400 m/z. For each nominal mass adduct, we collected well-matched MS2 fragment ions and summed up the MS2 peak intensities. As a result, we found that the nominal mass corresponding to the PAR-base form showed the largest intensity sum, indicating that the PAR-nucleoside form was further fragmented to the PAR-base form via partial neutral loss of ribose (Fig. 1b, c, each bottom right). When we further performed the same analysis on PSMs with MS1 adduct masses of PAR-bases, the most abundant nominal mass was found to be the same (Fig. 1b, c, each bottom left). We additionally verified these observations by carefully inspecting individual MS2 spectra (Supplementary Figs. 2–4).
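The intensity-summing analysis described above can be illustrated with a simplified sketch. It is not the authors' implementation: it assumes singly charged fragment ions (so an m/z shift equals a mass shift), uses a crude mass-defect window in place of a proper "well-matched" criterion, and the function name `adduct_intensity_sums` and the toy numbers are hypothetical.

```python
from collections import defaultdict

def adduct_intensity_sums(fragment_mzs, peaks, max_adduct=400, defect_window=0.1):
    """Sum MS2 peak intensities per nominal adduct mass.

    fragment_mzs: theoretical m/z values of unmodified, singly charged fragment ions.
    peaks: observed (m/z, intensity) pairs from one MS2 spectrum.
    A peak counts toward nominal adduct N when it lies about N Da above a
    theoretical fragment (assumed filter: within defect_window of the integer).
    """
    sums = defaultdict(float)
    for peak_mz, intensity in peaks:
        for frag_mz in fragment_mzs:
            delta = peak_mz - frag_mz
            nominal = round(delta)
            if 1 <= nominal <= max_adduct and abs(delta - nominal) <= defect_window:
                sums[nominal] += intensity
    return dict(sums)

# Toy spectrum: two fragments each carrying a ~94 Da adduct, plus one noise peak.
fragments = [200.10, 300.15]
spectrum = [(294.1167, 100.0), (394.1667, 80.0), (250.60, 50.0)]
sums = adduct_intensity_sums(fragments, spectrum)
top_nominal = max(sums, key=sums.get)
print(top_nominal, sums[top_nominal])
```

In this toy case the bin at nominal mass 94 collects both modified fragments and therefore has the largest intensity sum, while a chance pairing lands in a smaller bin; aggregating over many PSMs is what would let the true adduct mass stand out.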
Building an MS/MS search pipeline specific to PAR-modification
Having determined the chemical features of PAR adducts in both the solution and gas phases (Supplementary Fig. 1e), we next designed an MS/MS search pipeline that can robustly profile the RBSs crosslinked to PARs. We focused on the observation that the modification masses at the MS2 stage from both solid (PAR-base form) and partially labile (PAR-nucleoside form, ribose neutral loss) modification were the same (PAR-base form). In the case of partially labile PAR-nucleoside modification, if the MS1 precursor mass is adjusted into a PAR-base adduct, the resulting combination of MS1 and MS2 ion masses would be identical to that of the solid PAR-base form modification. This MS1 correction would minimize the number of variable modifications to a single PAR-base and allow simultaneous identification of RBSs crosslinked to either PAR-bases or PAR-nucleosides.
Therefore, we integrated this idea into the MS/MS search pipeline design for pRBS-ID (Fig. 2a). Given that the MS/MS scans from PAR-nucleoside forms could not be predetermined, we decided to allow MS1 correction for all MS/MS scans. Following LC-MS/MS spectral data acquisition (Fig. 2a, step 1), each MS/MS scan was duplicated. One copy was left unaltered, while the precursor mass of the other was adjusted by subtracting the ribose mass, accounting for the ribose loss from the PAR-nucleosides (Fig. 2a, step 2). Then, the duplicated spectral data were matched to peptides (in other words, PSMs were generated) in a target-decoy database with a single variable modification of the PAR-base. For each MS/MS scan, a single best PSM was selected across its uncorrected and corrected copies, which rules out PSM inflation because each MS/MS scan yields exactly one PSM. If the best match came from the uncorrected scan, the MS1 modification was assigned as the PAR-base form; if it came from the corrected scan, as the PAR-nucleoside form (Fig. 2a, step 3). In this way, both PAR-base and PAR-nucleoside modifications could be identified from a minimal MS/MS search space using a single base form modification. Notably, because both 4SU- and 6SG-nucleosides lose the identical mass of ribose, the same approach could be applied to both crosslink types.
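The scan duplication and best-PSM selection logic can be sketched as follows. This is an illustrative outline under stated assumptions, not the pRBS-ID code: the `Scan` and `PSM` classes, the `search` callback (standing in for the closed search engine), the convention that a higher score is better, and the ribose loss value of 132.0423 Da (monoisotopic C5H8O4) are all assumptions introduced here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

RIBOSE_LOSS = 132.0423  # Da; assumed monoisotopic mass of the ribose neutral loss (C5H8O4)

@dataclass(frozen=True)
class Scan:
    scan_id: int
    precursor_mass: float  # neutral precursor mass in Da

@dataclass(frozen=True)
class PSM:
    scan_id: int
    peptide: str
    score: float   # assumed: higher is better
    ms1_form: str  # "PAR-base" or "PAR-nucleoside"

def duplicate_scans(scans: Iterable[Scan]) -> Iterator[Tuple[Scan, str]]:
    """Yield each scan twice: unaltered, and with the precursor mass lowered by the ribose loss."""
    for scan in scans:
        yield scan, "PAR-base"  # uncorrected MS1
        corrected = replace(scan, precursor_mass=scan.precursor_mass - RIBOSE_LOSS)
        yield corrected, "PAR-nucleoside"  # corrected MS1

def best_psm_per_scan(
    scans: Iterable[Scan],
    search: Callable[[Scan], Optional[Tuple[str, float]]],
) -> Dict[int, PSM]:
    """Keep one best-scoring PSM per original scan across both copies."""
    best: Dict[int, PSM] = {}
    for scan, form in duplicate_scans(scans):
        hit = search(scan)  # hypothetical search: (peptide, score) or None
        if hit is None:
            continue
        peptide, score = hit
        current = best.get(scan.scan_id)
        if current is None or score > current.score:
            best[scan.scan_id] = PSM(scan.scan_id, peptide, score, form)
    return best

# Toy search: one peptide whose mass already includes the PAR-base modification.
def toy_search(scan: Scan) -> Optional[Tuple[str, float]]:
    modified_peptide_mass = 1000.0
    if abs(scan.precursor_mass - modified_peptide_mass) < 0.01:
        return "PEPTIDEK", 10.0
    return None

result = best_psm_per_scan([Scan(1, 1000.0), Scan(2, 1000.0 + RIBOSE_LOSS)], toy_search)
print(result[1].ms1_form, result[2].ms1_form)
```

Because the search itself only ever considers the base-form modification, the MS1 form is inferred purely from which copy of the scan produced the winning match.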


a Schematic of the MS/MS search pipeline for pRBS-ID. Following MS/MS spectra acquisition, each MS2 scan is duplicated. One is left unaltered (“Uncorrected MS1”), while the other’s precursor mass is corrected to account for the ribose neutral loss (“Corrected MS1”). The spectra are together searched against a target-decoy database, with a single variable modification of the base. Finally, a single best match is chosen for each scan. This allows for the simultaneous identification of peptides with base or nucleoside adducts. b Comparison of unique mass class (UMC) identification counts between conventional MS/MS search and pRBS-ID. In the former, variable modifications were defined as either base or nucleoside, without precursor correction considering ribose loss. c Comparison of identified 4SU-RBSs and 6SG-RBSs (left) or the respective protein groups (right). d MS1 intensity-based label-free quantification24 of peptides containing 4SU-RBS (left) or 6SG-RBS (right) co-identified in replicate experiments. Spearman’s correlation coefficients were calculated and rounded to two decimal places.
Through careful inspection of the MS/MS data, we noticed that MS1 chromatograms from the same peptide showed multiple peaks with distinct retention times (Supplementary Fig. 1a, b). These can originate from various RBS localizations or crosslink isoforms15 (i.e., different atom pairs in amino acid-PAR are crosslinked). Thus, each XIC peak should be distinguished for precise RBS localization and quantification. Therefore we integrated mPE-MMR24, a tool that marks individual XIC peaks as separate unique mass classes (UMCs) and assigns label-free quantification values, prior to the MS-GF+25 closed search in pRBS-ID (see online “Methods” for details). Notably, the assignment of one RBS localization per UMC allowed precise identification of multiple RBSs from a single peptide. This expanded the RBS repertoire compared to the previous analysis pipeline in RBS-ID, where only a single RBS was chosen for each peptide15.
Finally, we evaluated the performance of the pRBS-ID MS/MS search pipeline against a conventional search method using the same dataset. For the “conventional” search, we analyzed PAR-base and nucleoside modifications independently, without MS1 correction or consideration of the ribose loss. pRBS-ID showed a remarkable increase in the PAR-nucleoside form identification while that of PAR-base form was comparable between the two methods (Fig. 2b). This was because pRBS-ID considers the transition of PAR-nucleoside forms in MS1 into PAR-base forms in MS2, enabling more sensitive identification compared to the conventional search.
Robust and base-specific identification and quantification of PAR-RBS
Using pRBS-ID, we identified 1,463 4SU-RBSs and 70 6SG-RBSs from 288 and 49 proteins, respectively (Fig. 2c, Supplementary Fig. 5, and Supplementary Data 2). As expected, the sites were found mainly in previously annotated RBPs26 (Supplementary Fig. 6a, b). The number of 6SG-RBSs was smaller than that of 4SU-RBSs, although we used twice the amount of the sample, which is likely due to the relatively low UV-crosslinking efficiency of 6SG17. Twenty-three RBSs were found specific to 6SG (Fig. 2c). Other 6SG-RBSs crosslinked to both 4SU and 6SG, suggesting that these RBSs may be in close contact with adjacent U/G sequences or that the interactions may not be sequence-specific.
Moreover, pRBS-ID enabled accurate MS1 intensity-based label-free quantification24 of RBSs. While pRBS-ID showed modest overlap between replicate experiments in terms of identification (Supplementary Fig. 5), the MS1 intensities of RBS-containing peptides co-identified in the replicate experiments showed high quantitative reproducibility (Fig. 2d).
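For readers who want to reproduce the reproducibility metric, Spearman's correlation between replicate MS1 intensities can be computed as below. This is a generic, dependency-free sketch; the intensity values are made-up toy numbers and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Toy MS1 intensities of four co-identified peptides in two replicates.
rep1 = [1e5, 2e6, 3e7, 4e8]
rep2 = [2e5, 1.5e6, 5e7, 3e8]
rho = spearman(rep1, rep2)
print(round(rho, 2))
```

Because the statistic depends only on ranks, it is insensitive to the wide dynamic range typical of MS1 intensities.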
Comparison of base-specific features of RBSs
Having generated the 4SU-RBS dataset by pRBS-ID, we first examined the differences in RBSs crosslinked to the same base type (U), but using different UV irradiation (UVA for 4SU; UVC for natural U). To obtain a comparable UVC-RBS dataset, we re-analyzed our previously published oligo d(T)-enriched mRNA-RBS dataset15 by integrating mPE-MMR24 into the MS/MS search pipeline in RBS-ID (see online methods for details). As a result, from the previous RBS-ID data (generated from 2.5 times more cells than the pRBS-ID experiment), 2,030 RBSs were identified (Supplementary Data 2). These RBSs were from 429 protein groups that are annotated mostly as RBPs (Supplementary Fig. 6c) and located mainly in canonical RNA-binding domains (RBDs) (Fig. 3a). Out of 2,030 RBSs identified by RBS-ID, only 427 sites were commonly detected in both RBS-ID and pRBS-ID experiments (Fig. 3b). pRBS-ID revealed 1,036 additional sites, which substantially expanded the list of U-contacting RBS to 3,066 sites in total. Thus, these two methods can be used to complement each other.
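The site counts above follow from simple inclusion-exclusion, using the 1,463 4SU-RBSs reported for pRBS-ID. A short check, with variable names chosen here for illustration:

```python
rbs_id_sites = 2030   # UVC-RBSs from the re-analyzed RBS-ID dataset
prbs_id_sites = 1463  # 4SU-RBSs identified by pRBS-ID
shared_sites = 427    # sites detected by both methods

prbs_only = prbs_id_sites - shared_sites  # sites added by pRBS-ID
total_u_sites = rbs_id_sites + prbs_id_sites - shared_sites  # union of both datasets

print(prbs_only, total_u_sites)
```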


a Domain annotation26 of identified UVC-RBSs (left), 4SU-RBSs (middle), and 6SG-RBSs (right). Comparison of identified 4SU-RBSs and UVC-RBSs (b) or the respective protein groups (c). Comparison of amino acid frequencies between 4SU-RBS and UVC-RBS (d) or 6SG-RBS (e), respectively normalized by those from all sequences in proteins where each RBS type was identified. Amino acids that ranked in the top five are shown.
Of note, at the protein level, the RBS-ID and pRBS-ID datasets overlap more than they do at the RBS level (Fig. 3c). In terms of protein domains, the two datasets were highly comparable (Fig. 3a), including the identification of RNA-binding regions without domain annotation in canonical RBPs (Supplementary Fig. 7). These observations suggest that 4SU and U crosslink to similar proteins and domains but to different amino acids in the vicinity, likely by virtue of their distinct preferences for UV-crosslinking with different amino acid types.
Hence, we compared the amino acid type frequencies between the 4SU- and UVC-RBSs (Fig. 3d). RNA base-interactors (Trp, Tyr, Phe; aromatic amino acids) and an efficient electron acceptor (Cys; thiol-containing amino acid) were the top four amino acid types in common, all showing modest enrichment as UVC-RBS except for Phe. Meanwhile, Met (thioether-containing amino acid) showed a high frequency only as UVC-RBS. This finding suggests that natural U crosslinks readily with any sulfur-containing amino acid (Cys, Met), while 4SU crosslinks only with one containing a thiol group (Cys). On the other hand, 4SU exhibited a broader preference for His, Arg, Lys (basic amino acids), and Pro, compared with natural U (Supplementary Fig. 8). Thus, 4SU would be useful to broaden the coverage of RBS mapping. Overall, the overlapping yet distinct crosslinking preferences of the two UV-crosslinking methods indicate that they can complement one another to profile RBSs of different amino acid types, thereby effectively expanding the repertoire of RBSs.
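One plausible way to compute such normalized amino acid frequencies is sketched below. The exact normalization used for Fig. 3d, e is not spelled out here, so this assumes a simple ratio: the fraction of RBSs that are a given amino acid divided by that amino acid's fraction in the sequences of the RBS-containing proteins. The function name and toy inputs are hypothetical.

```python
from collections import Counter

def normalized_frequencies(rbs_residues, protein_sequences):
    """Ratio of each amino acid's share among RBSs to its share in the background proteins."""
    site_counts = Counter(rbs_residues)
    background = Counter("".join(protein_sequences))
    n_sites = sum(site_counts.values())
    n_background = sum(background.values())
    return {
        aa: (site_counts[aa] / n_sites) / (background[aa] / n_background)
        for aa in site_counts
        if background[aa] > 0
    }

# Toy input: four crosslinked residues and the sequences of their source proteins.
toy_sites = ["W", "W", "C", "K"]
toy_proteins = ["WKKKKKKC", "KKKKKKKK"]
enrichment = normalized_frequencies(toy_sites, toy_proteins)
ranking = sorted(enrichment, key=enrichment.get, reverse=True)
print(ranking, enrichment)
```

A value above 1 means the amino acid is captured as an RBS more often than its abundance in the background would predict; in the toy data, rare Trp ranks first even though Lys dominates the sequences.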
Next, we analyzed the differences in RBSs crosslinked to distinct base types (U or G) using the 4SU-RBS and 6SG-RBS datasets. Although the number of identified 6SG-RBSs may be rather small to make highly accurate comparisons, the analysis can serve as a proof-of-principle to dissect the base-specific features of RNA−protein interactions. First, we compared the domain-level characteristics of these two types of PAR-RBSs26 (Fig. 3a). 4SU-RBSs were mainly identified within annotated RBDs, with the RRM domain as the top structural motif. The same was true for 6SG-RBSs, but notably, two zinc finger motifs, ZNF_CCHC and ZNF_RanBP2, were found more frequently with 6SG-RBSs than with 4SU-RBSs. ZNF_CCHC is known to bind G-rich RNA, with a notable example being LIN28 binding to the GGAG motif27. ZNF_RanBP2 binds to RNA with a GGU core motif28, as exemplified in the interaction between FUS and the hnRNP A2/B1 pre-mRNA29. This indicates that pRBS-ID using 6SG successfully captures the G base-protein interactions in the cell.
In addition to the domain-level analysis, we further dissected the frequency of amino acid types in the two PAR-RBS datasets (Fig. 3e). Both PARs shared Trp, Tyr, Phe, His, and Cys as the top five most frequently captured amino acids. They were either RNA base-interactors (Trp, Tyr, Phe, His; aromatic ring- or π-bond-containing amino acids) or an efficient electron acceptor (Cys; thiol-containing amino acid) that can facilitate efficient UV-crosslinking. Among the five amino acids, Trp, Cys, and Tyr showed higher frequencies as 6SG-RBS, while His and Phe were preferred as 4SU-RBS. We predict that the preferences may result from a combined effect of the prevalence of a base type at RNA-protein interfaces and differential UV-crosslinking efficiency for each amino acid type.
Structural insights into RNA−protein interactions using pRBS-ID
pRBS-ID can pinpoint the exact amino acids that interact with RNA at “zero distance”. We revisited previous high-resolution structural datasets on RNA-protein complexes and checked whether 4SU- or 6SG-RBSs interact with U or G bases, respectively. In PTBP1, two prominent 4SU-RBSs (H411 and F487) are located close to the U bases30 (Fig. 4a). Furthermore, in hnRNP F protein, Y298 was prominently identified as both 4SU- and 6SG-RBS (Fig. 4b). Y298 is adjacent to the G base in the structure31. Although no close interactions with a U residue could be found due to the particular RNA sequence they used for structural studies, our data suggest that, in cells, hnRNP F Y298 may interact with both U and G in diverse pre-mRNA partners.


a−d RBSs identified in this study (top) and their positions in known structures (bottom). One or two top-ranking RBSs from each protein are highlighted in green. The U and G residues proximal to RBSs are shown in blue and red, respectively. a Interaction between H411 or F487 in PTBP1 (4SU-RBSs) with proximal Us in structural data (PDB: 2ADC30). b Interaction between Y298 in HNRNPF (both 4SU- and 6SG-RBS) with proximal G in structural data (PDB: 2KFY31). c Potential interaction between F49 in FAU (4SU-RBS) and most proximal U in structural data (PDB: 4V6X33). d Potential interaction between W44 in EDF1 (6SG-RBS) and proximal Gs in structural data (PDB: 6ZVH34).
Structural approaches are instrumental in understanding the molecular basis of RNA−protein interactions32, yet the resolved structures are static snapshots. Hence, dynamic intermolecular binding events, particularly those involving flexible loops and intrinsically disordered regions (IDRs), may not be captured in the conformation resolved by X-ray crystallography or cryo-EM. In theory, pRBS-ID can identify RBSs in RNP complexes at any conformational state. Thus, we examined RBSs on flexible regions to assess our hypothesis. In FAU, a component of the human 40S ribosome, F49 was identified as 4SU-RBS33 (Fig. 4c). According to the cryo-EM structure, this RBS does not directly interact with rRNA. However, F49, as part of a disordered region, may be brought closer to U607 in the 18S rRNA for direct interaction. Another example was W44 (6SG-RBS) in EDF1, which binds to collided ribosomes to mediate the ribosome-associated protein quality control pathway34 (Fig. 4d). W44 of EDF1 is located within a long disordered region and may contact G534, G535, or G552 in the 18S rRNA, which could not be captured in the cryo-EM structure. Taken together, these results suggest that pRBS-ID datasets can be valuable resources for studies of RNP complexes to elucidate the RNA−protein interactions in both stable and dynamic states.

