
Glyco-Decipher enables glycan database-independent peptide matching and in-depth characterization of site-specific N-glycosylation

Development of Glyco-Decipher

Glyco-Decipher begins with in silico deglycosylation25,26 to identify peptide backbones in glycopeptides without setting any glycan modifications (Fig. 1). The core structure of N-glycan on glycopeptides generates a series of Y fragment ions (i.e., Y0, Y-HexNAc(1), Y-HexNAc(2), Y-Hex(1)HexNAc(2), Y-Hex(2)HexNAc(2), and Y-Hex(3)HexNAc(2)) with known mass gaps during HCD fragmentation (Fig. 2a). These Y fragment ions in the glycopeptide spectra are determined by matching mass gaps to derive the masses of peptide backbones. The in silico deglycosylated spectra are generated by the removal of B/Y ions originating from glycan fragmentation followed by the reassignment of the precursors to peptides (“Methods”). MS-GF+27, which is an open-source database search tool integrated in Glyco-Decipher, is then used to search the resulting spectra against proteome databases for the initial identification of peptide backbones. For a second-stage search, the fragmentation patterns of each peptide backbone are extracted from the peptide-spectrum matches (PSMs) and applied to match glycopeptide spectra that remained unassigned in the initial search. This strategy, called spectrum expansion, exploits the high similarity of peptide fragmentation patterns for glycopeptides with identical backbones and enables the identification of glycopeptide spectra with poor peptide fragmentation. No glycan information is used in either stage of peptide identification. After the identification of peptide backbones, the mass of the glycan part is derived from the mass difference between the precursor and the peptide backbone. This glycan database-independent peptide identification pipeline enables Glyco-Decipher to profile glycans at the proteome level and to discover glycans not present in databases.
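The mass-gap matching described above can be illustrated with a minimal Python sketch. It is not the Glyco-Decipher implementation: it assumes that fragment peaks have already been converted to neutral monoisotopic masses, and the function names, the tolerance of 0.02 Da and the minimum of three matched core ions are illustrative assumptions. Each peak is tried as a candidate Y0 (bare peptide), and the candidate supported by the most core-structure Y ions is kept; the glycan mass then follows from the precursor mass.

```python
from bisect import bisect_left

# Monoisotopic residue masses (Da).
HEX = 162.052824
HEXNAC = 203.079373

# Mass offsets of the N-glycan core Y ions relative to Y0 (bare peptide):
# Y0, HexNAc(1), HexNAc(2), Hex(1)HexNAc(2), Hex(2)HexNAc(2), Hex(3)HexNAc(2).
CORE_OFFSETS = [
    0.0,
    HEXNAC,
    2 * HEXNAC,
    2 * HEXNAC + HEX,
    2 * HEXNAC + 2 * HEX,
    2 * HEXNAC + 3 * HEX,
]


def has_peak(sorted_masses, target, tol_da):
    """Return True if any mass lies within +/- tol_da of target."""
    i = bisect_left(sorted_masses, target - tol_da)
    return i < len(sorted_masses) and sorted_masses[i] <= target + tol_da


def derive_peptide_mass(neutral_masses, tol_da=0.02, min_matches=3):
    """Pick the peak that best explains the core Y-ion ladder as Y0.

    tol_da and min_matches are hypothetical settings for illustration.
    Returns None if no candidate reaches min_matches ladder ions.
    """
    masses = sorted(neutral_masses)
    best_mass, best_hits = None, 0
    for candidate in masses:
        hits = sum(has_peak(masses, candidate + off, tol_da) for off in CORE_OFFSETS)
        if hits > best_hits:
            best_mass, best_hits = candidate, hits
    return best_mass if best_hits >= min_matches else None


def glycan_mass(precursor_neutral_mass, peptide_mass):
    """Glycan mass as the precursor-minus-peptide mass difference."""
    return precursor_neutral_mass - peptide_mass
```

For a spectrum containing the full core ladder of a 1500.0 Da peptide plus a few unrelated peaks, `derive_peptide_mass` returns 1500.0, and a precursor carrying Hex(9)HexNAc(2) yields a glycan mass of about 1864.63 Da.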

Fig. 2: Peptide fragmentation patterns improve spectrum interpretation and glycopeptide identification.

a Tandem mass spectrum example of glycopeptide “MHLNGSNVQVLHRLTIR-Hex(9)HexNAc(2)” (top) and the in silico deglycosylation result of the spectrum (bottom). Blue: b ions of peptide; red: y ions of peptide; purple: b/y ions with HexNAc residue; green: B ions; orange: Y ions of glycan with intact peptide backbone attached. b Fragmentation patterns of the peptide backbone “MHLNGSNVQVLHR” modified by different glycans and/or with different precursor charge states in six glycopeptide spectra. c Distributions of similarities between the peptide fragmentation pattern in each peptide-spectrum match (PSM) from in silico deglycosylation and the averaged pattern of the corresponding peptide backbone. The quartiles of the distributions are indicated by inner dashed lines. The medians and the spectrum numbers are labeled in the plot. The similarity value of 0.9 is indicated by an outer dashed line. Source data are provided as a Source Data file. d Distribution of matching scores for the peptide backbone “MHLNGSNVQVLHR” with glycopeptide spectra across the entire data acquisition time window. Note that the scores of PSMs from in silico deglycosylation were re-calculated with the consideration of the peptide fragmentation pattern. Inset: The score distribution of PSMs after score filtration and core structure peak matching in spectrum expansion. The dashed line indicates the PSM score threshold of 52.75 derived from the e-value filtration method. Green box: target PSMs obtained by in silico deglycosylation; blue circle: target PSMs obtained in spectrum expansion; orange cross: decoy PSMs generated in spectrum expansion for quality control. e Comparison of identifications before (pale blue) and after (blue) spectrum expansion. The glycan part was matched with the GlyTouCan database to obtain glycopeptide-spectrum match (GPSM, PSM with definite glycan composition) identifications.
Site-specific glycans were classified into three categories based on the glycan composition: truncated glycans (Hex(<4)HexNAc(<3)Fuc(<2)), oligo-mannose glycans (Hex(>3)HexNAc(2)Fuc(<2)) and complex/hybrid glycans. f Performance comparison between Glyco-Decipher and MSFragger (V3.1.1) in open search mode. Green: the consistently identified spectra. Orange: the spectra commonly identified, but matched to different peptides. Gray: spectra specifically matched by MSFragger.

To identify the attached glycans, candidate glycans in the built-in (GlyTouCan) or user-provided database are scored by matching their theoretical fragment ions with experimental B/Y ions in glycopeptide spectra (Supplementary Figs. 1–3). For glycans that do not match any database entries, Glyco-Decipher performs monosaccharide stepping to reveal the composition of the glycans and their potential modifications. Target-decoy spectrum matching and expectation value calculation28 are adopted to evaluate the false discovery rate (FDR) at both identification stages of peptides and glycans (Supplementary Fig. 4). Glycans attached to peptide backbones are mapped to the site level for further analysis. A quantification module that extracts the elution profiles of intact glycopeptides in MS1 is embedded in Glyco-Decipher for abundance analysis of protein glycosylation.

Utilization of peptide fragmentation patterns to improve spectrum interpretation

A dataset containing 1,386,844 tandem mass spectra acquired by stepped collision energy HCD (sceHCD) fragmentation for intact glycopeptides from five mouse tissues9 (brain, heart, kidney, liver, and lung) was used for the performance evaluation. From the 1,282,263 oxonium ion-containing glycopeptide spectra, Glyco-Decipher yielded a total of 140,250 peptide-spectrum matches by the in silico deglycosylation scheme (Supplementary Note 1 and Supplementary Figs. 45–48). Notably, the in silico deglycosylation scheme only permits the identification of peptide backbones for the glycopeptide spectra with sufficient peptide fragment ions. We then proposed that the peptide fragmentation patterns in these identified spectra could be used to facilitate the interpretation of spectra with poor peptide fragmentation. To this end, we first evaluated the similarity of peptide fragmentation patterns in the spectra of glycopeptides sharing the same backbones. Figure 2b presents fragmentation patterns of the peptide backbone “MHLNGSNVQVLHR” modified by diverse glycans in the six glycopeptide spectra identified by the in silico deglycosylation scheme from a single liquid chromatography (LC)-MS/MS analysis of mouse liver samples. Their patterns are highly similar: the b2 ion is always the most abundant peptide fragment ion, followed by the b3 and other ions. We then investigated the fragmentation patterns of the peptide backbones in the identified glycopeptide spectra of the five mouse tissue datasets. Among the fragmentation patterns in the 140,250 PSMs achieved by in silico deglycosylation, the majority (127,095, 90.6%) exhibited over 0.9 cosine similarities with the average pattern of the corresponding peptide backbone (Fig. 2c), regardless of the attached glycans and/or precursor charge states (Supplementary Fig. 5). This result confirmed that N-glycopeptides sharing the same peptide backbone exhibit similar peptide fragmentation patterns.
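The similarity measure used above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes each fragmentation pattern is stored as a vector of intensities over a fixed, ordered list of peptide fragment ions (b2, b3, ...), and the averaging step is a plain element-wise mean; the exact normalization used by Glyco-Decipher is described in the Methods and is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally long intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty pattern is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def average_pattern(patterns):
    """Element-wise mean of several patterns of the same backbone."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]


def fraction_above(patterns, threshold=0.9):
    """Fraction of patterns whose similarity to the average exceeds threshold."""
    avg = average_pattern(patterns)
    hits = sum(cosine_similarity(p, avg) > threshold for p in patterns)
    return hits / len(patterns)
```

Because cosine similarity ignores overall scale, two spectra of the same backbone with different absolute intensities but the same relative ion abundances score close to 1.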

We then applied the obtained peptide fragmentation patterns to improve the interpretation of glycopeptide spectra (Methods) and termed this process “spectrum expansion”. Figure 2d illustrates an example of spectrum expansion: the peptide backbone “MHLNGSNVQVLHR” was initially identified by in silico deglycosylation of six glycopeptide spectra (green, Fig. 2d), and the averaged fragmentation pattern was used to match the unassigned spectra in spectrum expansion to achieve additional PSMs (blue). The results demonstrated that high-score PSMs were observed within three narrow retention time windows at 70, 80, and 92 min, respectively (note that an LC gradient as long as 360 min was adopted for glycopeptide separation in this dataset), whereas random matches (orange) and low-score PSMs spanned the whole chromatography gradient. From the final glycopeptide identification results (Supplementary Table 3), we found that those made in the in silico deglycosylation stage (green, approximately 70 min) corresponded mainly to glycopeptides with oligo-mannose glycans, while those from spectrum expansion were glycopeptides with glycans containing one (~80 min) or two (~92 min) sialic acids. Comparative analysis of their spectra indicates that fewer and lower-intensity peptide fragment ions were generated in the spectra of sialic acid glycopeptides than in the spectra of oligo-mannose glycopeptides (Supplementary Fig. 6). However, the consistent fragmentation pattern of the peptides elevated the matching scores of true-positive PSMs and thereby enabled the interpretation of these glycopeptide spectra. More examples of peptide identification in spectrum expansion are shown in Supplementary Figs. 7–9.

By employing the spectrum expansion strategy (Supplementary Fig. 10), peptide backbones in 53.3% (74,760/140,250) more spectra were uncovered, resulting in 215,010 PSMs without the use of glycan databases (Fig. 2e, top). The glycans in glycopeptide spectra were matched to the downloaded GlyTouCan database, which contains 1,766 unique monosaccharide compositions (Methods). In total, a 48.4% increase (before: 91,535, after: 135,840) in the number of glycopeptide-spectrum matches (GPSMs, PSMs with annotated glycan compositions) and a 31.2% increase (before: 12,934, after: 16,971) in the identification of unique glycopeptides were achieved after spectrum expansion (Fig. 2e, top, Supplementary Data 2). A substantial number of site-specific glycans of the complex/hybrid type were revealed at the expansion stage, resulting in an improvement of 44.5% in the number of identification results after expansion (Fig. 2e, bottom). Statistical analysis also confirmed that the spectrum expansion strategy can be used to determine peptide backbones for glycopeptide spectra with poor peptide fragmentation (Supplementary Fig. 11). Notably, the peptide fragmentation patterns are nearly unchanged in the expansion results (Supplementary Fig. 12), indicating the high confidence of spectrum expansion. Overall, 53.3% more glycopeptide spectra and 31.2% more unique glycopeptides were uncovered in the five mouse tissues after spectrum expansion.
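The three glycan categories used in Fig. 2e (defined in the figure legend by composition) can be written as a simple rule. The sketch below is a hypothetical helper, not part of Glyco-Decipher; in particular, treating any composition with other monosaccharides (for example NeuAc) as complex/hybrid is an assumption, since the legend states its rules only in terms of Hex, HexNAc and Fuc.

```python
def classify_glycan(hex_n, hexnac_n, fuc_n=0, other_n=0):
    """Classify a glycan composition into the three categories of Fig. 2e.

    truncated:      Hex(<4)HexNAc(<3)Fuc(<2)
    oligo-mannose:  Hex(>3)HexNAc(2)Fuc(<2)
    complex/hybrid: everything else

    other_n counts monosaccharides besides Hex, HexNAc and Fuc; sending
    such compositions to complex/hybrid is an assumption of this sketch.
    """
    if other_n == 0 and fuc_n < 2:
        if hex_n < 4 and hexnac_n < 3:
            return "truncated"
        if hex_n > 3 and hexnac_n == 2:
            return "oligo-mannose"
    return "complex/hybrid"
```

For example, Hex(9)HexNAc(2) is classified as oligo-mannose, whereas Hex(5)HexNAc(4)Fuc(1) falls into the complex/hybrid category.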

Comparison with open search method in blind glycan modification search

Based on the in silico deglycosylation and the subsequent spectrum expansion, the peptide backbones in glycopeptides were confidently identified. In addition, the mass difference between the precursor and peptide backbone was recognized as the attached glycan, and the distribution of delta mass values, i.e., the glycan profile for a specific sample, could be obtained without the use of glycan databases. As noted above, 215,010 PSMs were identified from the glycopeptide spectra of five mouse tissues. After derivation of glycan masses, distinct glycan profiles were observed across these tissues and the determined glycan masses were located mainly in the range of 1200–4000 Da (Supplementary Fig. 13). The glycan profiles of the mouse tissues led to the observation of 580 distinct mass bins (>10 PSMs for each), of which only 313 were matched to GlyTouCan database entries, indicating the high diversity of glycans and the presence of unexpected glycans. An alternative approach for profiling N-glycans is to analyze the enzymatically released glycans, from which information on glycosites and glycoproteins is missing. Relevant information is retained in the glycopeptide spectra and is provided by Glyco-Decipher with the glycan database-independent pipeline.
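A glycan mass profile of this kind can be sketched as a histogram of the derived glycan masses. The snippet below is illustrative only: the 1 Da bins centered at integer values follow the Fig. 3a legend, whether the same binning underlies the 580 mass bins reported here is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def glycan_mass_bins(glycan_masses, min_psms_exclusive=10):
    """Histogram glycan masses into 1 Da bins centered at integer values.

    Only bins with more than min_psms_exclusive PSMs are returned,
    mirroring the ">10 PSMs for each" criterion in the text.
    """
    # floor(m + 0.5) rounds halves upward, avoiding banker's rounding.
    counts = Counter(int(math.floor(m + 0.5)) for m in glycan_masses)
    return {b: n for b, n in counts.items() if n > min_psms_exclusive}
```

Bins returned by this function could then be compared with the masses of database compositions to separate annotated from unannotated glycan masses.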

Peptide matching in modification-blind mode could also be achieved by the open search method, which is an effective way to discover unknown modifications on a proteome scale16,22,23,24. The underlying principle of this method is matching the spectrum with all peptide candidates within a wide window around the precursor mass, and the mass difference between the precursor and the peptide sequence is attributed to the detected modification. The open search method was also applied in recent glycoproteomics studies. A total of 300 O-glycan mass values were identified from human kidney tissue, serum, and T cells by using MSFragger in open search mode10, and the glycan diversity within bacterial proteomes was investigated by wide-tolerance (up to 2000 Da) open searching29.

To compare the performance between Glyco-Decipher and the open search method in blind glycan searching, we downloaded MSFragger (v3.1.1) and performed an open search with various parameters (Fig. 2f, mass window: 3000/6000 Da, labile search for N-glycan: off/on). Compared to the traditional open search and the N-glycan mode open search of MSFragger, Glyco-Decipher offered over onefold and ~20% increases in the number of PSMs, respectively (Fig. 2f). Approximately 60% of glycopeptide spectra identified by the open search were commonly matched by Glyco-Decipher, and ~90% of these common spectra were matched to identical peptide sequences. The theoretical Y ions of the N-glycan core structure were deduced based on the peptide results and were matched in a much higher proportion of the PSMs identified by Glyco-Decipher than of those identified by MSFragger (Supplementary Fig. 14a). The Y1 (peptide + HexNAc) ion, which is a general feature of glycopeptide spectra in the dataset acquired upon optimal fragmentation energy9,30, was matched in all the PSMs of Glyco-Decipher since corresponding core structure ion screening was performed, whereas only 50% (traditional mode) or 60% (N-glycan mode) of the PSMs specifically identified by MSFragger matched corresponding Y1 ions (Supplementary Fig. 14). Further analysis of the spectra that were inconsistently identified in Glyco-Decipher and MSFragger suggests that the lack of corresponding core structure ions in MSFragger may originate from false-positive peptide identifications in open search (Supplementary Fig. 15). Unlike routine modifications attached to peptides, N-glycans have a much wider mass distribution, which greatly exceeds the mass window of 500 Da that is typically adopted in an open search22,23.
An enlarged window (3000/6000 Da in this study) is needed in an open search to cover the mass of N-glycans, leading to a substantial increase in the peptide search space, which results in limited identification sensitivity and highly random hits due to the interference of glycan ions. In stark contrast, a mass tolerance of several ppm was adopted in Glyco-Decipher for closed peptide searching without setting any glycan modifications, which entails a search space several orders of magnitude smaller than that of the open search.
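The difference in search space can be made concrete with a toy count of peptide candidates. The sketch below is a simplification under stated assumptions: the peptide database is reduced to a sorted list of masses, the open-search window is taken as one-sided (peptides lighter than the precursor by up to the window width), and all function names are hypothetical rather than taken from MSFragger or Glyco-Decipher.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_masses, low, high):
    """Number of peptide masses in the closed interval [low, high]."""
    return bisect_right(sorted_masses, high) - bisect_left(sorted_masses, low)


def open_search_candidates(sorted_masses, precursor_mass, window_da):
    """Candidates whose mass is up to window_da below the precursor mass."""
    return count_in_range(sorted_masses, precursor_mass - window_da, precursor_mass)


def closed_search_candidates(sorted_masses, peptide_mass, ppm):
    """Candidates within +/- ppm of a peptide mass derived beforehand."""
    tol = peptide_mass * ppm * 1e-6
    return count_in_range(sorted_masses, peptide_mass - tol, peptide_mass + tol)
```

With a real proteome digest, the first count grows roughly in proportion to the window width, while the second stays at a handful of candidates, which is the contrast drawn in the paragraph above.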

Monosaccharide stepping enables the discovery of modified glycans at the proteome level

After profiling the glycans in mouse tissues, we compared the derived mass values (Supplementary Fig. 13) with GlyTouCan database values. The glycan masses of many PSMs with confident peptide matching did not match any database glycans. For example, from the mouse liver dataset, we identified peptide sequences for 44,646 glycopeptide spectra. Glycans in 27,855 (62.4%) PSMs were annotated by database entries, leaving a high proportion of PSMs (37.6%, 16,791) with unannotated glycans (Fig. 3a). To deduce the saccharide compositions of the unannotated glycans, we developed a monosaccharide stepping method (Fig. 3b and Supplementary Fig. 16) to match Y ions from the stepwise fragmentation of glycans. In addition to the basic saccharide composition of glycans (e.g., Hex(7)HexNAc(2)), the mass of unexpected glycan modifications was also determined. Supplementary Fig. 17 presents two spectrum examples in which the compositions of Hex(5)HexNAc(2) with the moiety of 179 Da and Hex(6)HexNAc(2) with the moiety of 242 Da were deduced by the monosaccharide stepping strategy. By using this method, we revealed the glycan compositions for 21.2% (9,482/44,646) of the PSMs in the mouse liver dataset. From the five mouse tissue datasets, the glycans in 25,850 additional PSMs were uncovered, accounting for 12.0% (25,850/215,010) of the total PSM results (Fig. 3c).
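The idea of monosaccharide stepping can be sketched as a search over compositions that are supported step by step by Y ions. The code below is a simplified illustration under stated assumptions, not the published algorithm: Y ions are given as neutral masses, only four monosaccharides are considered, the tolerance is arbitrary, and the largest composition reachable through an unbroken chain of matched Y ions is reported together with the residual mass, interpreted as the modification moiety.

```python
from bisect import bisect_left

# Monoisotopic residue masses (Da); the choice of residues is an assumption.
MONO_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}
NAMES = list(MONO_MASS)


def _has_peak(sorted_masses, target, tol_da):
    i = bisect_left(sorted_masses, target - tol_da)
    return i < len(sorted_masses) and sorted_masses[i] <= target + tol_da


def monosaccharide_stepping(peptide_mass, glycan_mass, y_neutral_masses, tol_da=0.02):
    """Return (composition, residual_mass) supported by stepwise Y ions.

    A composition is extended by one monosaccharide only if a Y ion at
    peptide_mass + composition_mass is present in the spectrum.
    """
    peaks = sorted(y_neutral_masses)
    start = (0,) * len(NAMES)
    seen = {start}
    stack = [(start, 0.0)]
    best, best_mass = start, 0.0
    while stack:
        comp, comp_mass = stack.pop()
        if comp_mass > best_mass:
            best, best_mass = comp, comp_mass
        for i, name in enumerate(NAMES):
            new_mass = comp_mass + MONO_MASS[name]
            if new_mass > glycan_mass + tol_da:
                continue  # cannot exceed the total glycan mass
            new_comp = comp[:i] + (comp[i] + 1,) + comp[i + 1:]
            if new_comp in seen:
                continue
            if _has_peak(peaks, peptide_mass + new_mass, tol_da):
                seen.add(new_comp)
                stack.append((new_comp, new_mass))
    composition = {n: c for n, c in zip(NAMES, best) if c}
    return composition, glycan_mass - best_mass
```

For a spectrum with Y ions up to Hex(5)HexNAc(2) and a glycan mass that is 179.08 Da heavier than that composition, the function returns Hex(5)HexNAc(2) with a residual of about 179.08 Da, analogous to the first example in Supplementary Fig. 17.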

Fig. 3: Elucidation of the modification moieties on glycans.

a Mass histogram of glycans in glycopeptides of the mouse liver dataset. The glycan masses were obtained by the glycan database-independent pipeline in Glyco-Decipher. Blue: PSMs with glycan mass that matched GlyTouCan glycans; Orange: PSMs with glycan mass that could not be annotated by GlyTouCan glycans. The bin width was set to 1 Da and centered at integer values. b A monosaccharide stepping method was designed for the deduction of the composition and the mass of modification moiety on modified glycans. c Percentage of PSMs with glycan parts annotated by database glycans and modified glycans. d Mass values of the ten most abundant modification moieties on glycans from glycopeptide spectra of the five mouse tissues. e Mass profiles of the modification moieties with known chemical compositions on modified glycans. f Examples of mass profiles of glycan modification moieties with unannotated composition. Source data are provided as a Source Data file.

Among the deduced modification moieties (Supplementary Fig. 18), some matched a monosaccharide (e.g., 163 Da for the isotopic-shifted mass of Hex), suggesting the addition of a corresponding monosaccharide to the deduced composition. Furthermore, there were 27 modification moieties (>20 GPSMs for each) that did not match any known monosaccharide (Supplementary Fig. 19), leading to the discovery of 164 modified glycans whose mass values could not be annotated by GlyTouCan entries (Supplementary Data 3). The mass values for the 10 most abundant modification moieties are listed in Fig. 3d. The 179-Da moiety was the one with the most PSMs and was mostly added to oligo-mannose glycans (Supplementary Fig. 18). The observed mass of the 179-Da moiety coincided with the mass value of reported ammonium-adducted Hex (Hex+NH3)31,32 (Fig. 3e). The high similarity between the spectra of +179-Da-modified glycopeptides and their unmodified counterparts indicated that ammonium adduction occurs at the terminus of glycan structures (Supplementary Fig. 20), which is in line with a previous report33. Glyco-Decipher also demonstrated its ability to detect many other low-abundance glycan modifications. For example, glycans linked with a moiety of 242 Da (Hex + phosphorylation) (Supplementary Fig. 17), which is known as mannose-6-phosphate34 (M6P), were detected in 1,144 glycopeptide spectra of the mouse tissues. As a key targeting signal in lysosomal hydrolase transfer35,36, M6P exhibited distinct abundance distributions across the datasets of different tissues (Fig. 3e), implying disparate lysosomal activity in these tissues. Glycans with a moiety of 215 Da (Hex + 53 Da) were also detected (Fig. 3e), and intriguing dissociation of 53 Da from Y fragment ions was observed in MS2 (Supplementary Fig. 21), indicating that displacement of three protons by iron ions (3+) occurred on glycopeptides23.
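The nominal moiety masses quoted above can be checked with simple arithmetic on monoisotopic masses. The following sketch uses standard monoisotopic values; the dictionary keys are descriptive labels chosen here for illustration, and the interpretation of the 163 Da value as a Hex residue plus one 13C-12C mass difference is an assumption about what "isotopic-shifted" refers to.

```python
# Monoisotopic masses (Da).
HEX = 162.052824        # hexose residue, C6H10O5
H = 1.007825            # hydrogen atom
NH3 = 14.003074 + 3 * H                   # ammonia
HPO3 = H + 30.973762 + 3 * 15.994915      # phosphorylation
FE = 55.934936          # 56Fe
C13_SHIFT = 1.003355    # 13C minus 12C

MOIETY_MASS = {
    "Hex + NH3 (ammonium adduct)": HEX + NH3,
    "Hex + HPO3 (mannose-6-phosphate)": HEX + HPO3,
    # Fe(3+) replacing three protons: electron masses cancel, leaving Fe - 3H.
    "Hex + Fe - 3H (iron adduct)": HEX + FE - 3 * H,
    "Hex + one 13C shift": HEX + C13_SHIFT,
}

for label, mass in MOIETY_MASS.items():
    print(f"{label}: {mass:.4f} Da (nominal {round(mass)})")
```

The computed values round to 179, 242, 215 and 163 Da, matching the nominal moiety masses discussed in the text.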

Glyco-Decipher also discovered many unassigned moieties on glycans, including +220 Da and +262 Da at the terminus of glycan structures37,38 (Fig. 3f and Supplementary Fig. 22). After adopting the monosaccharide stepping method, ~24.8% of the PSMs remained with unannotated glycan composition due to the lack of sufficient Y fragment ions in their spectra (Fig. 3c and Supplementary Figs. 23 and 24). Some of them were deduced to have glycans of high mass values (Supplementary Fig. 24), suggesting high diversity of glycosylation. Overall, Glyco-Decipher can comprehensively profile the glycans attached to glycopeptides and enables the discovery of unexpected modified glycans (Supplementary Note 2 and Supplementary Figs. 49–52). The dedicated monosaccharide stepping method demonstrated its ability to reveal unexpected modifications on glycans which could not be achieved by other tools.

Systematic evaluation of the performance of Glyco-Decipher in glycoproteomics analysis

To evaluate the performance of Glyco-Decipher in large-scale N-glycoproteomics analysis, we systematically compared Glyco-Decipher with four software tools, including Byonic8 (v3.11.3), MSFragger-Glyco10 (v3.1.1), StrucGP12 (v1.0.0), and pGlyco 3.013 (v20210615), by using the dataset of mouse tissues (Supplementary Fig. 25). Notably, the peptide matching in Glyco-Decipher and StrucGP is glycan database-independent, while a list of glycans (pGlyco 3.0 and Byonic) or glycan masses (MSFragger-Glyco) is required for the other tools. Compared with the listed search tools, Glyco-Decipher reported the most PSMs and provided a 33.5–178.5% increase in the number of identified glycopeptide spectra due to the glycan database-independent search and spectrum expansion strategy (Fig. 4a, top). The consistency of peptide identification with the other tools was observed at both identification stages (i.e., before and after spectrum expansion) of Glyco-Decipher (Supplementary Fig. 26a). As a result, most of the glycopeptide spectra commonly identified by Glyco-Decipher and the other tools were matched to identical peptides. For example, 98.3% (69,349/70,576) of the spectra that were commonly identified by Glyco-Decipher and StrucGP were matched to the same peptides. Glyco-Decipher also reported the most peptide identifications from the glycopeptide spectra of mouse tissues and covered 72% or more of the peptide sequences in paired comparisons with the other tools (Fig. 4a, bottom).

Fig. 4: Comparison between Glyco-Decipher and other software tools.

a Comparison of the performance between Glyco-Decipher, StrucGP, pGlyco 3.0, MSFragger-Glyco and Byonic using the dataset of mouse tissues. Top: Comparison between Glyco-Decipher and other software tools in the performance of glycopeptide-spectrum interpretation. Green: the spectra matched to identical peptide backbones in Glyco-Decipher and another tool. Orange: the spectra commonly identified in Glyco-Decipher and another tool but matched to different peptide backbones. Gray: spectra specifically matched by another tool. Bottom: Comparison between Glyco-Decipher and other software tools in peptide sequence identification. Green: the peptide sequence commonly identified by Glyco-Decipher and another tool in pair comparison. Gray: the peptide sequence specifically matched by another tool in pair comparison. b FDR analysis using the 13C/15N metabolically labeled yeast dataset. The isotope-based FDR was calculated by matching 13C/15N isotopic peak pairs in MS1 spectra (line). Red line: FDR analysis based on the total GPSM results of each tool; blue line: FDR analysis based on the GPSM results that overlapped with Glyco-Decipher; purple line: FDR analysis based on the GPSMs specifically identified by each tool. The proportions of GPSMs of oligo-mannose glycans with composition of Hex(n)HexNAc(2) (green pie), NeuAc (orange pie) or NeuGc (red pie) containing glycans identified by each tool are shown in the bottom table. Source data are provided as a Source Data file. c Distributions of GPSMs of ammonium-adducted glycans reported by Glyco-Decipher and pGlyco 3.0. d Number of ammonium adduction GPSMs with/without oligo-mannose glycan composition. e Comparison of glycopeptide identification results between Glyco-Decipher and StrucGP/pGlyco 3.0. The additional (gain), overlap and lost identifications of Glyco-Decipher compared to other software tools are indicated by blue, green and orange bars, respectively. 
f Distributions of mannose-6-phosphate (M6P) GPSMs (bars) and intact glycopeptides (dots) across mouse tissues reported by Glyco-Decipher and pGlyco 3.0. All M6P identifications were validated by the diagnostic oxonium ion (phosphorylated hexose, m/z = 243.0269) in the glycopeptide spectra.

In terms of glycopeptide identification, we first investigated the confidence of these tools in multiple aspects (Fig. 4b). The dataset containing HCD spectra of glycopeptides from 13C/15N metabolically labeled yeast9 was searched using these tools with the same parameters (“Methods”). True-positive identifications would result in 13C/15N peak pairs in MS1, which correlate to the number of carbon/nitrogen atoms in the elemental composition of glycopeptides9. From the matching results of isotopic peak pairs, only Glyco-Decipher, StrucGP and pGlyco 3.0 correctly estimated the FDR of glycopeptide identification at the spectrum level (line plot in Fig. 4b and Supplementary Data 4). In the analysis of glycan compositions, glycans in almost all the GPSMs (>98.5%, pie plot in Fig. 4b) reported by the three tools agreed with known biosynthesis rules39, in which only oligo-mannose glycans with the composition of Hex(n)HexNAc(2) are linked on yeast proteins. The above analysis indicated that reliable glycan assignments were achieved in Glyco-Decipher and StrucGP/pGlyco 3.0 by comprehensively matching B/Y ions of glycans. Additional FDR analysis of the results gained from spectrum expansion also suggested the high confidence of Glyco-Decipher in glycopeptide identification with the adoption of the peptide fragmentation pattern (Supplementary Fig. 26b). However, the number of glycopeptide spectra interpreted by StrucGP is significantly lower than that of Glyco-Decipher and pGlyco 3.0 on the yeast dataset (Fig. 4b), because StrucGP is designed mainly for human and mammalian models12. Higher proportions of incorrect glycans were reported by Byonic (variable modification search) and MSFragger-Glyco (mass offset search) from the proteome-scale glycan database because only the mass values of glycans were utilized in matching (pie plot in Fig. 4b and Supplementary Fig. 27).

In addition to database glycan matching, Glyco-Decipher retained high confidence in modified glycan discovery. The identification of modified glycans could also be achieved in pGlyco 3.0 by enumerating all possible modified forms of database entries with user-provided monosaccharide modification. From the yeast dataset, 2528 and 2055 GPSMs of ammonium-adducted glycans were identified in Glyco-Decipher and pGlyco 3.0, respectively (Fig. 4c). Note that all the ammonium-adducted glycans matched in Glyco-Decipher are oligo-mannose glycans, which is in line with the glycosylation rule in yeast. In contrast, 1.8% of the ammonium adduction spectra reported by pGlyco 3.0 were matched to non-oligo-mannose glycans, e.g., Hex(11)HexNAc(2)Fuc(1) (Fig. 4d). The analysis of the identification results indicated that the incorrect glycan assignments in pGlyco 3.0 may originate from the largely increased glycan search space of the enumeration method (Supplementary Fig. 28). Benefiting from glycan database-independent searching, Glyco-Decipher enables the identification of modified glycans in discovery mode and the stepwise Y ion matching in monosaccharide stepping also ensures reliable elucidation of modified glycan compositions.

The performance of Glyco-Decipher was further compared with StrucGP and pGlyco 3.0 in detail due to their comparable confidence in glycopeptide identification. Taking advantage of the sensitivity of spectrum expansion, Glyco-Decipher reported more glycopeptide identifications even when only the results with GlyTouCan glycans were considered (Supplementary Fig. 29a), suggesting improved coverage of glycosylation analysis based on database glycan compositions. Combined with the results of discovered modified glycans, Glyco-Decipher yielded a total of 20,777 unique intact glycopeptide identifications from mouse tissues, leading to increases of 84.0% and 29.2% compared to StrucGP and pGlyco 3.0, respectively (Fig. 4e). Similar distributions of glycans in different categories were observed in the results of Glyco-Decipher and pGlyco 3.0 (Supplementary Fig. 29b). However, compared to the results of Glyco-Decipher and pGlyco 3.0, only ~1/3 of the complex/hybrid glycans were identified by the modularization strategy in StrucGP due to the increased complexity of glycan ions in glycopeptide spectra, leading to a biased interpretation of glycans (Supplementary Fig. 29b). Nevertheless, the modularization strategy has great potential for interpreting glycopeptide spectra with unexpected glycans beyond database entries. Apart from searching with GlyTouCan glycans, phosphorylation on hexose was set in pGlyco 3.0, and monosaccharide stepping was performed in Glyco-Decipher to evaluate their performance in identifying low-abundance modified glycans. Benefiting from glycan-independent peptide matching, Glyco-Decipher is able to profile glycans in complex samples (Supplementary Fig. 13), and most of the unexpected/modified glycans identified by StrucGP/pGlyco 3.0 from the mouse dataset were also successfully detected by Glyco-Decipher (Supplementary Fig. 29c).
The detailed comparison of the glycopeptide results of M6P glycans also demonstrated the high sensitivity of Glyco-Decipher in glycoproteomics analysis: pGlyco 3.0 reported 157 M6P glycopeptides in 1044 spectra by enumerating phosphorylation on database glycans (Fig. 4f). In contrast, phosphorylation on hexose was confidently detected in 1134 glycopeptide spectra with diagnostic oxonium ion (phosphorylated hexose, m/z = 243.0269) by Glyco-Decipher, leading to the identification of 171 M6P glycopeptides (Fig. 4f and Supplementary Fig. 29d).

To further evaluate the ability of Glyco-Decipher in unveiling the heterogeneity of glycosylation in complex samples, glycopeptides were enriched from the tryptic digest of a human serum sample and analyzed by LC–MS/MS under 80, 160, and 210 min reversed-phase chromatographic gradients, respectively. The tandem mass spectra were collected upon stepped-energy HCD fragmentation at 20–30–40% with an Orbitrap Exploris 480 (“Methods”). We then evaluated the performance of Glyco-Decipher with the acquired serum data (Supplementary Fig. 30a, b). In total, Glyco-Decipher provided 132.4% and 51.8% more interpreted glycopeptide spectra compared to StrucGP and pGlyco 3.0, respectively. The improvements in spectrum interpretation also elevated the glycoproteome coverage of human serum, leading to the identification of 6478 site-specific glycans in human serum, including 1523 modified site-specific glycans (Supplementary Data 5). Glycoproteins in human serum are heavily sialylated, which is a useful feature in clinical application for the detection, staging and prognosis of different diseases and cancers40. Compared to StrucGP and pGlyco 3.0, more sialic acid-containing site-specific glycans, including the multi-sialylated ones, were uncovered by Glyco-Decipher on the serum glycoproteins (Supplementary Fig. 30c, d). In addition, NeuGc-containing glycans are negligible for the glycosylation of human serum41 (Supplementary Fig. 31a), and the feature has also been used to evaluate the accuracy of glycopeptide identification14. The absence of NeuGc-containing glycan in the results of Glyco-Decipher also suggests its reliability in glycan assignment (Supplementary Fig. 31b).

To assess the capability of Glyco-Decipher in benefiting biological research by offering deeper coverage of glycosylation heterogeneity, we also reanalyzed N-glycosylation datasets of the SARS-CoV-2 spike and its receptor angiotensin-converting enzyme 2 (ACE2) from a recent study by Zhao et al.42. The SARS-CoV-2 coronavirus is responsible for severe acute respiratory syndrome and utilizes a spike glycoprotein trimer for host cell receptor binding43. ACE2, known as a receptor of SARS-CoV-2 in the human body44, is also a glycoprotein that acts as a regulator of cardiovascular homeostasis. Reanalysis of the MS data revealed that Glyco-Decipher identified 1640 and 310 site-specific glycans on the SARS-CoV-2 spike and ACE2, respectively, which is an ~70% increase in the identification number compared to that of the original study in which the data were searched with pGlyco 2.09 (Supplementary Fig. 32a, Supplementary Data 6). More glycan identifications were reported by Glyco-Decipher on all 22 glycosites of the SARS-CoV-2 spike and 6 glycosites of ACE2 (Supplementary Fig. 32b, c). It should be noted that glycans containing NeuGc were not observed in the identification results of Glyco-Decipher (Supplementary Fig. 33a), which is in line with the glycosylation rule of SARS-CoV-2 spike protein cultured in HEK293T cells42. More detailed information on the site-specific glycans identified on the SARS-CoV-2 spike and ACE2 is provided in Supplementary Fig. 33. Extended comparison with StrucGP/pGlyco 3.0 on this dataset also demonstrates the improvements of Glyco-Decipher in glycoproteomics analysis (Supplementary Fig. 34).

Glyco-Decipher allows determination of the abundance distribution of site-specific glycans

In addition to the identification of intact glycopeptides, quantitation of glycopeptides is another essential issue in assessing glycosylation levels for functional studies. A quantification module was developed and embedded in Glyco-Decipher. In this module, the theoretical isotopic peak list of the precursor was calculated based on the elemental composition of the intact glycopeptide. The elution profile of the precursor was extracted from the MS1 spectra around its retention time, and the area under the elution profile was used for quantification (Supplementary Fig. 35). This module enables the determination of the abundance distribution of each glycan or a specific glycosylation type in a system (Supplementary Fig. 36). Correlation analysis of glycosylation at the sample level or the site level revealed the similarity of glycosylation between systems (Supplementary Figs. 37 and 38).
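The extraction-and-integration step described above can be illustrated with a minimal Python sketch. It is not the Glyco-Decipher implementation: the function names, the scan layout (a list of `(retention_time, peaks)` tuples), the retention-time window and the ppm tolerance are all assumptions made for illustration. For brevity it also tracks a single precursor m/z, whereas the module described in the text matches a theoretical isotopic peak list.

```python
def extract_profile(scans, target_mz, rt_center, rt_window=1.0, tol_ppm=10.0):
    """Collect the precursor signal from MS1 scans near a retention time.

    scans: list of (rt, [(mz, intensity), ...]) tuples sorted by rt
           (hypothetical layout, not a Glyco-Decipher data structure).
    Returns two parallel lists: retention times and summed intensities.
    """
    tol = target_mz * tol_ppm * 1e-6  # absolute m/z tolerance
    times, intensities = [], []
    for rt, peaks in scans:
        if abs(rt - rt_center) > rt_window:
            continue  # scan lies outside the elution window
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        times.append(rt)
        intensities.append(signal)
    return times, intensities


def elution_area(times, intensities):
    """Area under the elution profile by the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have equal length")
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += 0.5 * (intensities[k] + intensities[k - 1]) * width
    return area


# Toy data: five MS1 scans, precursor at m/z 1000.0 eluting around 10.5 min.
scans = [
    (9.0, [(1000.0, 5.0)]),
    (10.0, [(1000.0, 10.0), (1200.0, 99.0)]),
    (10.5, [(1000.001, 20.0)]),
    (11.0, [(1000.0, 10.0)]),
    (13.0, [(1000.0, 50.0)]),
]
times, intensities = extract_profile(scans, target_mz=1000.0, rt_center=10.5)
area = elution_area(times, intensities)
```

In this toy example only the three scans within 1 min of the retention time contribute, and the unrelated peak at m/z 1200.0 is ignored by the mass tolerance.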

The occupation of a glycosite by diverse glycans is regulated by glycosyltransferases and glycosidases, and the abundance distribution of site-specific glycans can be readily determined by Glyco-Decipher. Prosaposin (UniProt: Q61207), the precursor of saposin A-D, which acts as a cofactor for lysosomal hydrolysis of sphingolipids, is examined in detail as an example (Supplementary Data 7). Fig. 5 illustrates the abundance distributions of different types of glycans at glycosites in prosaposin. Complex/hybrid glycans of ten distinct compositions account for a large proportion (43.9%) of glycosylation at glycosite N80 of prosaposin in the lung, including those with relatively low abundance (e.g., Hex(4)HexNAc(3)NeuAc(1)Fuc(1) at 5.4%). In contrast, fewer than five complex/hybrid glycans were identified by StrucGP and pGlyco 3.0 at glycosite N80 in the lung, leading to a biased distribution calculation (Supplementary Figs. 39 and 40).
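As a rough illustration of how a per-site abundance distribution can be derived from quantified glycopeptides, the sketch below normalizes elution areas within each glycosite. The record layout `(glycosite, glycan, area)`, the function name and the numbers are hypothetical and are not taken from Glyco-Decipher or from the data in this study.

```python
from collections import defaultdict


def glycan_distribution(records):
    """Relative abundance (%) of each glycan at each glycosite.

    records: iterable of (glycosite, glycan, area) tuples, where area is
             the quantified elution area of one glycopeptide precursor.
    Returns {glycosite: {glycan: percent_of_site_total}}.
    """
    site_totals = defaultdict(float)
    glycan_areas = defaultdict(lambda: defaultdict(float))
    for site, glycan, area in records:
        site_totals[site] += area
        glycan_areas[site][glycan] += area  # merge repeated precursors
    return {
        site: {g: 100.0 * a / site_totals[site] for g, a in glycans.items()}
        for site, glycans in glycan_areas.items()
        if site_totals[site] > 0
    }


# Toy areas (arbitrary units); glycan labels are placeholders.
records = [
    ("N80", "glycan_A", 30.0),
    ("N80", "glycan_B", 10.0),
    ("N80", "glycan_A", 10.0),
    ("N334", "glycan_C", 5.0),
]
dist = glycan_distribution(records)
```

Because each percentage is taken relative to the summed area at its site, a glycan that goes unidentified shrinks the denominator and inflates the apparent share of every other glycan at that site, which is the bias the comparison above points to.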

Fig. 5: Quantitative analysis of site-specific glycosylation on prosaposin by Glyco-Decipher.

Relative abundance distribution of glycans at each glycosite in prosaposin across five mouse tissues (heat map). Compositions and possible structure illustration of the high-abundance glycans are annotated at the top. For a more intuitive demonstration, the abundance distributions of glycans at each glycosite are listed in the right radial diagrams. In each radial diagram, nodes around the circle denote glycans linked to prosaposin, and donuts in the center denote glycosites identified in prosaposin. Linkage between the node and the center donut indicates that the glycosite was modified by the corresponding glycan. The percentage value in each donut indicates the relative abundance for a certain type of glycan. All M6P identifications were validated by the diagnostic oxonium ion (phosphorylated hexose, m/z = 243.0269) in glycopeptide spectra. H Hex, N HexNAc, A NeuAc, G NeuGc, F Fuc, P Phosphorylation. See Supplementary Data 7 for detailed information of glycan nodes.

Notably, modified glycans that are missed in glycan database-dependent tools also contribute to the determination of distributions of site-specific glycans. For example, the ammonium-adducted glycan was specifically identified by Glyco-Decipher at N334 of prosaposin in the kidney (Fig. 5 and Supplementary Fig. 41). M6P glycans, which are associated with lysosomal trafficking of prosaposin via the cation-independent mannose-6-phosphate receptor45 (M6PR), were detected with relatively high abundance at the glycosite N459 in brain and heart. For example, M6P glycans accounted for 74.1% of the abundance distribution at this site in mouse heart, as determined by Glyco-Decipher (Fig. 5). Similar abundance distributions of M6P glycans were observed based on the results of pGlyco 3.0 (Supplementary Fig. 40). However, despite their relatively high abundance, these modified glycans were not identified by the other glycan database-dependent tools. Overall, the in-depth coverage of glycosylation heterogeneity of Glyco-Decipher offers more accurate relative quantification of site-specific glycosylation.
