
High-throughput and high-efficiency sample preparation for single-cell proteomics using a nested nanowell chip

Design and operation of the N2 chip

The N2 chip is distinct from previous nanoPOTS chips6,11,15,16. We cluster an array of nanowells in high density and use each cluster for one multiplexed TMT experiment. In this proof-of-concept study, we designed 9 (3 × 3) nanowells in each cluster and 27 (3 × 9) clusters, resulting in a total of 243 nanowells on one chip (Figs. 1a and S1a). Additionally, we designed a hydrophilic ring surrounding the nested nanowells to confine the droplet position and facilitate the TMT pooling and retrieval steps. Compared with previous nanoPOTS chips6,8,15, we reduced the nanowell diameters from 1.2 to 0.5 mm, corresponding to an 82% decrease in contact areas and an 85% decrease in total processing volumes (Table 1). The miniaturized volume resulted in a ~45× increase in trypsin digestion kinetics because both trypsin and protein concentrations were increased by 6.67×. Both the reduced contact area and increased digestion kinetics were expected to enhance scProteomics sensitivity and reproducibility.
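The scaling figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes circular wells (contact area proportional to diameter squared) and a digestion rate proportional to the product of trypsin and protein concentrations; both are simplifying assumptions rather than statements taken from the original methods.

```python
# Geometric scaling between the earlier nanowell design and the N2 chip.
old_diameter_mm = 1.2
new_diameter_mm = 0.5

# Assumption: circular wells, so contact area scales with diameter squared.
area_ratio = (new_diameter_mm / old_diameter_mm) ** 2
area_decrease_pct = (1 - area_ratio) * 100

# The text reports an 85% decrease in total processing volume.
volume_ratio = 1 - 0.85
# Same mass of trypsin and protein in a smaller volume.
concentration_gain = 1 / volume_ratio

# Assumption: rate ~ [trypsin] x [protein], so the gain is squared.
kinetics_gain = concentration_gain ** 2

print(f"contact area decrease: {area_decrease_pct:.1f}%")
print(f"concentration gain: {concentration_gain:.2f}x")
print(f"digestion kinetics gain: {kinetics_gain:.1f}x")
```

Under these assumptions the numbers agree with the reported 82% area decrease, 6.67× concentration increase, and ~45× kinetics increase.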

Fig. 1: The design and operation of the nested nanoPOTS (N2) chip.

a A 3D illustration (top) and a photo (bottom) of the N2 chip. Nine nanowells are nested together and surrounded by a hydrophilic ring for a TMT set. The length of scale bar is 5 mm. b Single-cell proteomics workflow using the N2 chip. The length of scale bars is 0.5 mm.

Table 1 Comparison of technical characteristics between N2 and nanowell chips.

The scProteomics sample preparation workflow using the N2 chip is illustrated in Fig. 1b. To sort single cells into the miniaturized nanowells, we employed an image-based single-cell isolation system (IBSCI, cellenONE F1.4). The cellenONE system also allowed us to dispense low-nanoliter volumes of reagents for cell lysis, protein reduction, alkylation, and digestion. After protein digestion, TMT reagent was dispensed to label the peptides in each nanowell uniquely. Finally, we distributed 10 ng of boosting/carrier peptide and 0.5 ng of reference peptide into each nanowell cluster to improve the protein identification rate (Fig. S1b)14. To integrate the N2 chip into our LC-MS workflow, we loaded the chip in a nanoPOTS autosampler6. We applied a 3 µL droplet on top of the nested nanowells, combined the TMT set, and extracted the peptide mixture for LC-MS analysis (Fig. 1b). Compared with our previous nanoPOTS-TMT workflow6,15,16, the total processing time of each chip was reduced from 36.5 to 18 min (Fig. S1c), which is equivalent to a reduction from 0.83 to 0.07 min per single cell. As such, the N2 chip increases the single-cell processing throughput by >10×.
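The per-cell timing can be reproduced from the chip layout. This is a minimal sketch that assumes every one of the 243 nanowells holds one cell; the per-cell time for the previous workflow is taken as reported rather than derived.

```python
# N2 chip layout: 27 clusters of 9 nanowells each.
wells_per_chip = 27 * 9

# Reported total processing time per N2 chip, in minutes.
n2_chip_minutes = 18.0
# Assumption: one cell per nanowell, all wells used.
n2_minutes_per_cell = n2_chip_minutes / wells_per_chip

# Reported per-cell time for the previous nanoPOTS-TMT workflow.
previous_minutes_per_cell = 0.83

speedup = previous_minutes_per_cell / n2_minutes_per_cell

print(f"N2: {n2_minutes_per_cell:.2f} min per cell")
print(f"throughput gain: {speedup:.1f}x")
```

The result is consistent with the stated >10× throughput gain.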

It should be noted that the N2 chip can be coupled with conventional LC systems without the use of the customized nanoPOTS autosampler. As shown in Fig. S1d, the user can manually add an 8-µL droplet inside the hydrophilic ring to pool the TMT-labeled single-cell samples and transfer it into an autosampler vial for LC injection. Recently, Schoof et al.19 and Liang et al.20 have demonstrated the Opentrons OT-2 liquid handler can reliably pipette low-µL-scale solutions for preparing single-cell samples. Similarly, the TMT pooling step for the N2 chip could be automated with conventional LC systems using the OT-2 robot.

Sensitivity and reproducibility of the N2 chip

We first benchmarked the performance of the N2 chip against our previous nanowell chip using diluted peptide samples from three murine cell lines (C10, Raw, SVEC). To mimic the scProteomics sample preparation process, we loaded 0.1 ng of peptide in each nanowell of both N2 and nanowell chips (Fig. S1b) and then incubated the chips at room temperature for 2 h. The long incubation would allow peptides to adsorb onto nanowell surfaces and lead to differential sample recoveries. The combined TMT samples were analyzed by the same LC-MS system. When peptides with at least one valid reporter ion value were counted as identified, an average of 5706 peptides were identified with the N2 chip, compared with only 4614 with the nanowell chip. The increased peptide identifications resulted in a 15% improvement in proteome coverage; the average number of identified proteins increased from 1082 ± 22 using nanowell chips to 1246 ± 6 using N2 chips (Fig. 2a). We also observed significant increases in protein intensities with N2 chips. The median log2-transformed protein intensities were 13.21 and 11.49 for N2 and nanowell chips, respectively, corresponding to a ~230% improvement in protein recovery (Fig. 2b). Together, these results demonstrate that the N2 chip can improve sample recovery and proteomics sensitivity.
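Because the intensities are reported on a log2 scale, the recovery improvement follows from the difference of the medians. The short check below converts the reported values back to a linear fold change and also recomputes the coverage gain; the variable names are illustrative only.

```python
# Reported median log2-transformed protein intensities.
median_log2_n2 = 13.21
median_log2_nanowell = 11.49

# A difference on the log2 scale is a ratio on the linear scale.
fold_change = 2 ** (median_log2_n2 - median_log2_nanowell)
recovery_improvement_pct = (fold_change - 1) * 100

# Reported average protein identifications per chip type.
proteins_n2 = 1246
proteins_nanowell = 1082
coverage_gain_pct = (proteins_n2 / proteins_nanowell - 1) * 100

print(f"fold change: {fold_change:.2f}x")
print(f"recovery improvement: {recovery_improvement_pct:.0f}%")
print(f"coverage gain: {coverage_gain_pct:.0f}%")
```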

Fig. 2: Performance comparison between nanowell chip and N2 chip.

a The numbers of protein identifications from 0.1 ng tryptic peptides from three cell lines (four TMT sets from nanowell chip and four TMT sets from N2 chip). b The distributions of log2-transformed protein intensities in each TMT channel (n = 870 proteins). c Venn diagram of quantifiable proteins between nanowell and N2 chips. d The distributions of the coefficients of variation (CVs) for proteins identified in each cell type. Protein CVs were calculated within single TMT batches (left), among different TMT batches without batch correction (middle), and with batch correction (right). From left to right, the numbers of proteins (n) are: 1005, 1002, 975, 1213, 1233, 1221, 745, 747, 759, 938, 927, 944, 736, 738, 747, 937, 924, and 940, respectively. In b and d, center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles. Source data are provided.

We assessed whether the N2 chip could provide comparable or better quantitative performance compared with nanowell chips (Fig. S2a). As expected, more proteins were quantifiable with the N2 chip when 70% valid values in each cell line were required; the quantifiable protein numbers were 870 and 1123 for nanowell and N2 chips, respectively (Fig. 2c). For nanowell chips, pairwise analysis of any two samples showed Pearson’s correlation coefficients from 0.97 to 0.99 between the same cell types and from 0.87 to 0.95 between different cell types (Fig. S2b and S2c). With N2 chips, Pearson’s correlation coefficients increased to a range of 0.98–0.99 for the same cell types and 0.91–0.96 for different cell types. We next evaluated the quantification reproducibility by measuring the coefficients of variation (CVs) of samples from the same cell types. In intra-batch calculations, we obtained median protein CVs of <9.6% from N2 chips, which is more than two-fold lower than those from nanowell chips (median CVs of <24.9%) (Fig. 2d). Higher CVs were obtained between different TMT batches, which is known as the TMT batch effect21. When the ComBat algorithm22 was applied to remove the batch effect, the median protein CVs from the N2 chip dropped to <6.7%. Such low CVs are comparable with bulk-scale TMT data, demonstrating that the N2 chip can provide high reproducibility for robust protein quantification in single cells.
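As an illustration of the reproducibility metric, the sketch below computes a per-protein CV across replicate channels and then takes the median over proteins. It assumes linear-scale intensities and the sample standard deviation; the original analysis may differ in both respects, and the toy table is hypothetical.

```python
from statistics import mean, median, stdev


def protein_cv(intensities):
    """Coefficient of variation (%) for one protein across replicate channels."""
    return stdev(intensities) / mean(intensities) * 100


def median_cv(protein_table):
    """Median CV (%) over all proteins; protein_table maps name -> intensities."""
    return median(protein_cv(values) for values in protein_table.values())


# Hypothetical linear-scale intensities for three proteins in three channels.
toy_table = {
    "P1": [100, 110, 90],
    "P2": [200, 200, 200],
    "P3": [50, 60, 40],
}

print(f"median CV: {median_cv(toy_table):.1f}%")
```

Batch correction (for example with ComBat) would be applied to the intensity table before this calculation; it is not part of the sketch.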

Proteome coverage of single cells with the N2 chip

We analyzed a total of 108 single cells (12 TMT sets) from three murine cell lines, including epithelial cells (C10), immune cells (Raw264.7), and endothelial cells (SVEC) (Fig. 3a). Notably, these three cell types have different sizes, which allows us to evaluate whether the workflow introduces a bias in protein identification or quantification based on cell size. Specifically, Raw cells have a diameter of 8 µm, SVEC of 15 µm, and C10 of 20 µm (Fig. S3a).

Fig. 3: Single-cell proteomics with N2 chip.

a Experiment design showing single-cell isolation and TMT labeling on the N2 chip. b, c The numbers of identified peptides and proteins in 12 TMT sets. At least one valid value in the nine single-cell channels is required to count as an identification. Centerlines show the medians; top and bottom horizontal lines indicate the 25th and 75th percentiles, respectively. The data point (n) to generate the violin plots is 9. d The numbers of quantifiable proteins based on different percentages of required valid values. e Box plot showing the distributions of protein identification numbers (n = 36 single cells for each cell type). Centerlines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles. f Violin plot showing the distributions of signal to noise ratio (SNR) per channel for raw single-cell signals calculated by SCPCompanion17. Two published TMT scProteomics datasets from our group using nanowell chip15,16 were used to benchmark the data generated in this study. Centerlines show the medians; top and bottom horizontal lines indicate the 25th and 75th percentiles, respectively. The numbers (n) of SNR datapoints are 284,193 in N2 datasets, 282,190 in Tsai et al. datasets, and 260,039 in Dou et al. datasets. Source data are provided.

Among the 12 TMT sets, our platform identified an average of ~7369 unique peptides and ~1716 proteins from each set with at least one valid value in the nine single-cell channels (Fig. 3b, c). We identified a total of 2457 proteins, of which 2407 had reporter ion intensities in at least one single cell across the 108 cells (Fig. 3d). When a stringent criterion of >70% valid values was applied, the number of proteins dropped to 1437. As expected, the numbers of proteins identified for the three cell types ranked according to cell size (Fig. S3a). An average of 1735, 1690, and 1725 proteins were identified in C10, RAW, and SVEC cells, respectively (Fig. 3e). Similar trends were also observed in the distribution of protein intensities (Fig. S3b).
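The valid-value filtering described here can be sketched as follows. The example treats a missing reporter ion value as `None` and keeps a protein when its fraction of non-missing values reaches the threshold; the function name, the `>=` comparison, and the toy table are assumptions made for illustration.

```python
def quantifiable_proteins(table, min_valid_fraction):
    """Return proteins whose fraction of non-missing values meets the threshold."""
    kept = []
    for protein, values in table.items():
        valid = sum(value is not None for value in values)
        if valid / len(values) >= min_valid_fraction:
            kept.append(protein)
    return kept


# Hypothetical reporter ion intensities for three proteins in four cells.
toy_table = {
    "P1": [1.0, 2.0, 3.0, 4.0],    # 100% valid
    "P2": [1.0, None, None, None],  # 25% valid
    "P3": [1.0, 2.0, 3.0, None],    # 75% valid
}

# "At least one valid value" corresponds to a threshold of 1/4 here.
print(quantifiable_proteins(toy_table, 0.25))
# The stringent filter keeps only well-covered proteins.
print(quantifiable_proteins(toy_table, 0.70))
```

Raising the threshold trades proteome depth for completeness of the data matrix, which is the pattern reported in Fig. 3d.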

Cheung and coworkers17 recently introduced the software SCPCompanion to characterize the quality of the data generated from single-cell proteomics experiments employing isobaric stable isotope labels and a carrier proteome. SCPCompanion extracts the signal-to-noise ratio (SNR) of single-cell channels and provides suggested cutoff values to filter out low-quality spectra and obtain high-quality protein quantitation. In line with our experimental design, SCPCompanion estimated that single cells contained ~0.1 ng of protein and that the boost-to-single ratio was ~100 (Supplementary Data 1), indicating minimal peptide losses in the N2 chip. More importantly, the median SNR per single-cell sample was 14.4, which is very close to the suggested cutoff value of 15.5, indicating that ~50% of raw MS/MS spectra can provide robust quantification. We also compared the data quality with that generated using previous nanowell chips and a similar LC-MS setup15,16. The median SNR values per sample were 7.0 (ref. 15) and 6.4 (ref. 16), indicating that the N2 chip-based workflow increased the SNRs by 106% and 125%, respectively (Fig. 3f). Recently, Hartlmayr et al.23 observed that the use of TMTpro 16plex can give higher SNRs compared with TMT 10plex. To verify that the performance improvement observed with the N2 chips was not solely due to the change of TMT reagents, we labeled single-cell-level peptides (0.1 ng) with both TMT 10plex and TMTpro 16plex. We analyzed them with the same MS using four different normalized HCD collision energy levels. As shown in Fig. S4b, MS1 peak intensities show similar distributions between the two TMT reagents. At the MS2 level, we consistently observed that TMTpro gave higher signal intensities (Fig. S4c) and SNRs (Fig. S4d), which agrees with the previous report23. The differences were much larger at lower HCD energies than at higher ones. The SNRs were increased by 212%, 119%, 67%, and 66% at HCD energies of 26%, 29%, 32%, and 35%, respectively. Because we used similar normalized HCD collision energies in our current N2 chip work (34%) and previous nanowell chip-based work (35%), we reason that the TMTpro reagent could lead to a similar improvement of ~66%, which accounts for ~40–50% of the total contributions.
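The SNR improvements can be recomputed from the reported medians. The check below reads the two previous-dataset medians as 7.0 and 6.4, with the trailing digits in the source being the citation numbers 15 and 16; that reading is an interpretation, supported by the fact that it reproduces the stated 106% and 125%.

```python
# Reported median SNR per single-cell sample with the N2 chip.
n2_median_snr = 14.4

# Medians from the two earlier nanowell datasets, keyed by citation number.
previous_median_snr = {"ref. 15": 7.0, "ref. 16": 6.4}

snr_increase_pct = {
    ref: (n2_median_snr / snr - 1) * 100
    for ref, snr in previous_median_snr.items()
}

for ref, pct in snr_increase_pct.items():
    print(f"{ref}: SNR increased by {pct:.0f}%")
```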

Cell typing with scProteomics

To assess the quantitative performance of the N2 chip-based scProteomics platform, we first performed a pairwise correlation analysis using the 1437 proteins across the 108 single cells. As expected, higher correlations were observed among cells of the same type and lower correlations among cells of different types (Fig. 4a). The median Pearson correlation coefficients were 0.98, 0.97, and 0.97 for C10, RAW, and SVEC cells, respectively. We next calculated the coefficients of variation (CVs) using protein abundances for the three cell populations. Interestingly, we observed very low variation, with median CVs <16.3% (Fig. S5), indicating that protein expression is very stable for cultured cells under identical conditions. Principal component analysis (PCA) showed strong clustering of single cells based on cell type, and the three clusters were well separated from one another (Fig. 4b). We compared these results to our previous PCA result obtained from the same three cell types using the nanowell-based platform (Fig. S6a)16. The median intra-cluster distances for the two-component PCA were relatively similar, at 1.16 and 0.92 for nanowell and N2 chips, respectively (Fig. S6b). Conversely, the inter-cluster distances were 4.93 and 8.68 for nanowell and N2 chips, respectively, demonstrating that the data generated from N2 chips have higher classification power for different cell populations.
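The pairwise correlation summary can be illustrated with a small, dependency-free sketch. It computes Pearson's r for every pair of cells within one population and reports the median; the helper names and the toy log2 intensity vectors are hypothetical, and a real analysis would first restrict the table to proteins quantified in both cells.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_pairwise_correlation(cells):
    """Median Pearson r over all unordered pairs of cells."""
    return median(pearson(a, b) for a, b in combinations(cells, 2))


# Hypothetical log2 intensities of four proteins in three cells of one type.
toy_cells = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.0, 3.0, 5.0],
]

print(f"median r: {median_pairwise_correlation(toy_cells):.3f}")
```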

Fig. 4: Evaluation of quantification performance for single cells.

a Clustering matrix showing Pearson correlations across 108 single cells using log2-transformed protein intensities. The color scale indicates the range of Pearson correlation coefficients. b PCA plot showing the clustering of single cells by cell type. A total of 1437 proteins were used in the PCA projection. c Heatmap with hierarchical clustering showing 1127 significant proteins based on the ANOVA test. The three protein clusters used for pathway analysis are labeled and highlighted. The color scale indicates the range of Z-score values. Source data are provided.

To identify the proteins driving the clustering of the three cell populations, an ANOVA test was performed (permutation-based FDR < 0.05, S0 = 1). Of the total 1437 proteins, 1127 differed significantly in abundance across the three cell types (Fig. 4c). Among them, 237 proteins were enriched in C10 cells, 203 proteins were enriched in SVEC cells, and 275 proteins were enriched in RAW cells. Proteins enriched in each cell type revealed differences in molecular pathways based on the REACTOME pathway analysis (Fig. S7). For example, the proteins higher in abundance in C10 cells were significantly enriched in REACTOME terms such as “vesicle-mediated transport”, “membrane trafficking”, “innate immune system”, or “antigen processing-cross presentation”. These functions are in line with the known functions of lung epithelial cells, from which the C10 cells are derived24. The proteins more abundant in RAW cells, which derive from murine bone marrow macrophages, were enriched in REACTOME terms such as “neutrophil degranulation” and “innate immune system”, in line with their immune function. Other REACTOME terms related to the “ribosome” and the “pentose phosphate pathway” were also enriched. These pathways not only suggest that there is intricate cooperation between macrophages and neutrophils to orchestrate the resolution of inflammation and the immune system25, but also show that system metabolism strongly interconnects with macrophage phenotype and function26. Proteins more abundant in SVEC cells (murine endothelial cells) were enriched in pathways including “processing pre-mRNA”, “cell cycle”, or “G2/M checkpoints”. This suggests proliferation, migration, or coalescing of the endothelial cells to form primitive vascular labyrinths during angiogenesis27.

Identifying cell surface markers with scProteomics

One of the unique advantages of scProteomics over single-cell transcriptomics is the capability to identify cell surface protein markers, which can be readily used to enrich selected cell populations for deep functional annotations. We assessed whether our scProteomics data can be used to identify cell-type-specific membrane surface proteins for the three cell populations. We matched the enriched protein lists to a subcellular-localization database from UniProtKB, which consists of 2871 reviewed plasma membrane proteins for Mus musculus (updated on 01/04/2021). We generated a list containing 64 plasma membrane proteins (Supplementary Data 2). Among them, 17 proteins were highly expressed in C10 compared to RAW and SVEC cells, while 34 and 13 plasma membrane proteins were significantly enriched in RAW and SVEC cells, respectively. For example, NCAM1 (ref. 28), EZRI (ref. 29), and JAM1 (ref. 30), which are known to protect the barrier function of respiratory epithelial cells by enhancing cell-cell adhesion, are highly expressed in C10 cells (Fig. 5a, left panel). Among the RAW-enriched membrane proteins, CD14 (ref. 31) and CD68 (refs. 31, 32) are widely used as histochemical or cytochemical markers for inflammation-related macrophages (Fig. 5a, middle panel). CY24A is a sub-component of the superoxide-generating NOX2 enzyme on the macrophage membrane33. Among the SVEC-enriched protein markers, BST2 is known to be highly expressed in blood vessels throughout the body as an intrinsic immunity factor (Fig. 5a, right panel)34. Both HMGB1 and DDX58 were found to be highly expressed in endothelial cells in lymph node tissue based on tissue microarray (TMA) results in the Human Protein Atlas. We also attempted a comparison with our previous results obtained using nanowell chips (Fig. S8)16. Only five of the nine membrane proteins were significantly enriched in one of the cell types, and three were not detected, likely due to the lower sensitivity and reproducibility of the previous nanowell devices and workflows.

Fig. 5: Prediction of cell-type-specific surface proteins with single-cell proteomics.

a Violin plots showing nine putative plasma membrane proteins enriched in three cell types. Proteins in each column are significantly enriched (two-sided t-test, ***p-value < 0.001) in the specific cell type. Centerlines show the medians; top and bottom horizontal lines indicate the 25th and 75th percentiles, respectively. For C10, n = 34 single cells; for RAW, n = 35 single cells; for SVEC, n = 35 single cells. b Immunofluorescence images showing the expression of NCAM1_MOUSE, CD14_MOUSE, and BST2_MOUSE in three cell populations. The protein abundance is visualized with red fluorescence, and DNA is visualized by DAPI staining (blue). The length of scale bar is 20 µm. Source data are provided.

To evaluate the usability of scProteomics for identifying cell-type-specific surface marker proteins, we selected one protein from each of the three cell populations (NCAM1_MOUSE for C10; CD14_MOUSE for RAW; BST2_MOUSE for SVEC) and evaluated their specificity using an immunofluorescence imaging approach. As shown in Fig. 5b, immunofluorescence imaging validated the enrichment of the three marker proteins in their assigned cell types. It also confirmed their expected subcellular localization at the surface of the plasma membrane. Next, we assessed whether these protein markers are specifically expressed in similar cell types in tissue samples. We verified the localization of the markers on human immunoperoxidase histology images generated by the Human Protein Atlas, focusing on respiratory organs (lung, bronchi, and nasopharynx)35. While the general organization of the lung differs between humans and mice (e.g., number of lobes, airway, and bronchi organization), the cell types composing the organ are almost identical, as evidenced in a scRNA-seq study36. Thus, we speculated that human and mouse cells should share many similarities in terms of protein expression patterns. As anticipated, the localization of the protein markers for similar cell types in human tissues is in agreement with our scProteomics data (e.g., C10 and RAW). Both EZRI and JAM1, enriched in C10, are localized in human epithelial cells (Fig. S9). The immune-cell-related markers CD14, CD68, and CYBA (UniProt protein name: CY24A_HUMAN) are localized explicitly in macrophage cells in human lung tissues. Together, these results demonstrate that cell-type-specific surface markers can be effectively identified by combining scProteomics with subcellular-localization information.

Comparing scProteomics with scRNA-seq measurements

We compared the scProteomics results to previously published scRNA-seq datasets containing 11 C10 cells37 and 185 Raw cells38 generated with SMART-Seq2 workflows. Compared with scRNA-seq, we observed higher Pearson correlation coefficients from scProteomics for both cell types (Fig. 6a). Specifically, the median correlation coefficients of mRNA abundances are 0.60 (C10) and 0.71 (RAW), while the coefficients of protein abundances are significantly higher at 0.98 (C10) and 0.97 (RAW). The low variation in protein abundances can also be observed in the CV distributions of protein or mRNA abundances (Fig. 6b). Previous work has suggested moderate correlations between protein and mRNA abundances of the same genes39,40. Our cross-correlation analysis between protein and mRNA shows similar trends, with coefficients of 0.22 for C10 and 0.36 for RAW (Fig. 6c, d). These low correlations agree with bulk-scale measurements41 and suggest that scProteomics could provide additional information on cell functions.

Fig. 6: Integrative analysis of single-cell proteomics and single-cell transcriptomics datasets.

a Box plots showing the distributions of Pearson correlation coefficients and b coefficients of variation of transcript and protein abundances from scRNA-seq (Pearson correlation plot: n = 11 single C10 cells and 186 single RAW cells; coefficient of variation plot: n = 1292 genes for both C10 and RAW) and scProteomics (Pearson correlation plot: n = 36 single cells for both C10 and RAW; coefficient of variation plot: n = 1292 proteins for both C10 and RAW cells), respectively. Centerlines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles. c Clustering matrix showing Pearson correlations of transcript and protein abundances for C10 and d RAW cells. The color scale indicates the range of Pearson correlation coefficients. e The linear correlation of log2-transformed fold changes of C10 and RAW cells between scRNA-seq and scProteomics. The color scale indicates the range of data point density. f Venn diagrams showing the overlap of membrane protein markers predicted by RNA-seq and proteomics. Source data are provided.

To identify the differentially expressed proteins and mRNAs between the two cell types, we performed a t-test for both datasets. The proteins and mRNAs enriched in either C10 or RAW were moderately correlated. The overlaps between enriched mRNAs and proteins were 44% for C10 and 40% for RAW cells (Fig. S10a). Most proteins and mRNAs followed a similar abundance pattern between the two cell types (Fig. 6e). The linear correlation coefficient of the log2(fold-changes) of the protein-mRNA pairs is 0.55. The magnitude of the fold-change appeared higher for mRNA than for protein (linear regression slope 0.08). This difference may indicate that a large change in mRNA abundance is required to produce a moderate change in protein abundance. Another explanation is that the amplification steps employed in single-cell RNA sequencing may result in artificially inflated fold-changes42. Reactome pathway analysis of the significantly enriched proteins and mRNAs indicates general agreement between the two measurement types (Fig. S10b). However, a few enriched pathways were unique to either single-cell proteomics or RNA sequencing. For example, among pathways enriched in C10 cells, downstream signaling events of the B cell receptor (BCR) were only detected at the protein level, and the adaptive immune system was only seen at the mRNA level. Among pathways enriched in RAW cells, the innate immune system was only observed at the protein level. However, several immune-related Reactome pathway terms were unique to the mRNA level.

Finally, we assessed whether mRNA and protein measurements predict the same membrane protein markers. After matching to the same subcellular-localization database, scRNA-seq measurements identified 30 membrane proteins for RAW cells and 40 proteins for C10 cells (Supplementary Data 2). The overlaps between the two measurements were moderate for both cell types (Fig. 6f). Fewer than 32.5% of the protein targets predicted by RNA-seq were found by proteomics measurements, indicating that mRNA abundances cannot precisely infer membrane protein abundances. Interestingly, both protein and mRNA measurements identified the six proteins shown in Fig. 5 as significantly enriched markers. Overall, our analysis suggests that the combination of the two modalities provides the most reliable membrane protein markers.
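The overlap statistic can be expressed as a simple set operation. The sketch below uses placeholder marker identifiers, not the actual lists in Supplementary Data 2, and defines the overlap as the share of RNA-predicted markers that proteomics also reports; that denominator is an assumption based on the wording above.

```python
def overlap_fraction(rna_markers, protein_markers):
    """Fraction of RNA-predicted markers also found by proteomics."""
    rna_set = set(rna_markers)
    shared = rna_set & set(protein_markers)
    return len(shared) / len(rna_set)


# Placeholder identifiers; the real lists are in Supplementary Data 2.
toy_rna_markers = {"M1", "M2", "M3", "M4"}
toy_protein_markers = {"M3", "M4", "M5"}

print(f"overlap: {overlap_fraction(toy_rna_markers, toy_protein_markers):.0%}")

# If 32.5% is the C10 figure, it corresponds to this many of the 40 candidates.
c10_shared_estimate = round(0.325 * 40)
print(f"estimated shared C10 markers: {c10_shared_estimate}")
```

Whether the 32.5% upper bound refers to C10, RAW, or both is not stated in the text, so the last two lines are only a plausibility check.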
