Recent advances in both sequencing and mass spectrometry technologies have enabled the generation of high-quality data sets for expression systems such as Chinese hamster ovary (CHO) and others11,15,16. To obtain even greater insights from these data sets, an emerging approach called comparative systeomics was used in this study to analyze whole-cell proteomic and transcriptomic data of CHO and SP2/0. First, a comprehensive omics study was performed on both the exponential and stationary phases of two model cell lines, M-CHO and SP2/0, to evaluate differences in their proteome and transcriptome expression patterns, as well as the changes in each cell line between the exponential and stationary phases (see Fig. 1). To increase the solubilization of whole-cell proteins, including membrane proteins, the filter-aided sample preparation (FASP) method was used17, and high- and low-pH reversed-phase liquid chromatography was coupled prior to MS/MS to substantially increase the proteome coverage18. Digests from the exponential phase were separated into 24 fractions and run twice on LC/MS/MS, whereas the digests from the stationary phase were separated into 48 fractions prior to LC/MS/MS analysis. Interestingly, separation into 24 and 48 fractions identified a similar number of proteins, as shown in Table 1, suggesting that separating the lysate into 24 fractions with duplicate runs can be sufficient to reach high numbers of identified proteins. A box plot confirming that the same amount of protein was injected into the LC/MS/MS is given in Supplementary Fig. S2.
RNAseq identified and quantified more than 10,500 transcripts for M-CHO and more than 13,500 transcripts for SP2/0, the higher number likely reflecting the superior annotation of the mouse genome. The identified mRNAs, along with their normalized values across triplicates, are tabulated in Supplementary Table 1 for the M-CHO exponential, M-CHO stationary, and CHO-K1 stationary phases, whereas Supplementary Table 2 includes the mRNA values measured for the SP2/0 exponential and SP2/0 stationary phases. Analogously, label-free proteomic experiments identified 45,000–55,000 unique peptides belonging to 6000–7000 grouped proteins at a 1% false discovery rate (FDR) for both peptides and proteins. The average number of peptides identified per protein was around 7–8, providing high coverage for most proteins. This represents a deep whole-cell proteomic profiling of SP2/0 and a serum-free suspension CHO cell line, yielding 7118 and 7410 identified proteins for the SP2/0 and CHO cell lines, respectively. A previous analysis of MS/MS spectra of a serum-bearing and attachment-dependent model CHO-K1 cell line by our group identified 6358 proteins using the same search criteria11, and another analysis of two CHO cell lines (CHO-S and CHO DG44) identified 9359 unique proteins12. The protein and peptide information for M-CHO exponential, M-CHO stationary, SP2/0 exponential, SP2/0 stationary and a control CHO-K1 ATCC stationary is compiled in Supplementary Tables 3, 4, 5, 6, and 7, respectively, with a summary of these results in Table 1.
Correlation and comparison of CHO and SP2/0 proteomes and transcriptomes
Due to the lack of comprehensive omics data sets, little is known regarding the differences in proteome and transcriptome expression patterns between CHO and SP2/0 or about the changes between the exponential and stationary phases of these cells at the protein or mRNA level. To perform a comprehensive comparison, the mRNA and protein levels were compared between the M-CHO and SP2/0 cell lines and between growth phases. The standard normalized FPKM (fragments per kilobase of exon model per million mapped fragments) was used to correlate and compare the mRNA values of the samples, whereas protein abundance across proteins and between samples was compared using the normalized spectral abundance factor (NSAF), which accounts for the length of the identified proteins19. First, the genes having both mRNA and protein expression were mapped for both cell lines under the two conditions, resulting in 5500–6000 genes exhibiting both mRNA and protein expression in the separate phases (exponential and stationary) for each cell line (Fig. 2Aa–d). An additional 4000–8000 genes were identified and quantified only in the mRNA transcripts for each cell line, while 500–600 additional genes were found only in the proteome data. An alternative evaluation examined which of these genes were found in both the exponential and stationary phases in the transcriptomic and proteomic data for each cell line, as shown in Fig. 2Ae–h. Over 10,000 genes were identified from the transcriptomic data in both exponential and stationary phases for each cell line, while more than 5500 genes were detected in the proteome of each cell type for both phases.
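The two normalizations named above can be written out in a few lines. The sketch below uses the standard textbook definitions of NSAF and FPKM; the source does not give the exact formulas or software used, so the function names and the toy numbers are assumptions made purely for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j),
    where SpC is the spectral count and L the protein length in residues.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Toy values invented for illustration only (not data from this study).
counts = {"protA": 120, "protB": 30, "protC": 60}
lengths = {"protA": 600, "protB": 150, "protC": 300}
nsaf_values = nsaf(counts, lengths)
toy_fpkm = fpkm(fragments=500, exon_length_bp=2000, total_mapped_fragments=20_000_000)
```

In this toy sample the three proteins have very different raw spectral counts but identical counts per residue, so they receive the same NSAF value. That is the length correction the text refers to.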


(A) Number of genes identified in mRNA and/or protein levels (a) M-CHO exponential phase mRNA versus protein comparison (b) M-CHO stationary mRNA versus protein comparison (c) SP2/0 exponential mRNA versus protein comparison (d) SP2/0 stationary mRNA versus protein comparison (e) M-CHO exponential versus stationary mRNA comparison (f) M-CHO exponential and stationary protein comparison (g) SP2/0 exponential and stationary mRNA comparison (h) SP2/0 exponential and stationary protein comparison (B) (a–d) Comparison and correlation of mRNA and protein data in M-CHO or SP2/0 cell lines in exponential and/or stationary phases (C) (a–d) Comparison and correlation of mRNA and protein data between the M-CHO and SP2/0 cell lines in exponential or stationary phases (D) Downregulated proteins in M-CHO cells compared to SP2/0 cells.
Next, pair-wise comparisons were performed; as indicated in Figs. S1 and S2, the relative expression levels between the two phases were similar for both SP2/0 and M-CHO cells. Protein and mRNA expression levels were then compared for each growth phase in each cell line (Fig. 2B), and mRNA and protein expression levels were compared between the two cell lines on a logarithmic scale (Fig. 2C). The confidence level calculations showed that the majority of genes fell within the 90% or 95% confidence interval. Examples of groups of genes that lie outside the 95% confidence interval are shown in Fig. 2D for the stationary phase proteomics comparison between the SP2/0 and M-CHO cell lines. These groups, which are at least 1.8-fold downregulated in M-CHO cells compared to SP2/0 cells based on NSAF, are associated with protein folding, protein synthesis and protein metabolism. For example, PPIA (cyclophilin A), known to accelerate protein folding, and HSPD1, which plays a role in protein folding and assembly, are lower in CHO cell lines during the stationary phase. The translation initiation factor EIF3K also displayed a lower NSAF value in M-CHO cells. Interestingly, co-expression of translation initiation factors such as EIF4A was previously shown to increase the expression of an antibody more than 3–4-fold in one mammalian cell line (COS)20. The growth curves of these two cell lines can be found in Figs. S3a,b and S4.
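The 1.8-fold NSAF criterion mentioned above can be illustrated with a small filter on log2 ratios. This is a minimal sketch only: the function name, the placeholder protein identifiers and the NSAF values are hypothetical, and the confidence-interval calculation used in the study is not reproduced here because the text does not specify it.

```python
import math


def flag_lower_in_a(nsaf_a, nsaf_b, min_fold=1.8):
    """Return {protein: log2(A/B)} for proteins at least min_fold lower in A than in B.

    Only proteins quantified (NSAF > 0) in both samples are considered.
    """
    threshold = -math.log2(min_fold)
    flagged = {}
    for protein in nsaf_a.keys() & nsaf_b.keys():
        a, b = nsaf_a[protein], nsaf_b[protein]
        if a > 0 and b > 0:
            log2_ratio = math.log2(a / b)
            if log2_ratio <= threshold:
                flagged[protein] = log2_ratio
    return flagged


# Hypothetical NSAF values for placeholder proteins P1..P4 (not study data).
m_cho = {"P1": 1.0e-4, "P2": 2.0e-4, "P3": 5.0e-5}
sp20 = {"P1": 2.0e-4, "P2": 2.1e-4, "P3": 8.0e-5, "P4": 1.0e-4}
lower_in_m_cho = flag_lower_in_a(m_cho, sp20)
```

With these toy values only P1 (two-fold lower) passes the threshold; P3 is 1.6-fold lower and P4 is skipped because it is absent from one sample.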
In addition to the cell cycle and protein folding pathways, apoptosis and actin cytoskeleton signaling pathways were found to be differentially expressed between the two cell lines. The actin cytoskeleton expression was found to be lower in the exponential phase SP2/0 proteome data relative to exponential phase in M-CHO. Interestingly, the actin cytoskeleton was also found to be a biological hub, providing crosstalk with PAK and RAC signaling (Supplementary Figs. S5 and S6). Previous research has shown that destabilizing the actin cytoskeleton with either MTX or Cytochalasin D in CHO cells can increase the production of recombinant secreted alkaline phosphatase by 50–150 fold21.
Pathway analysis of CHO and SP2/0 cell lines
To further explore differences between CHO and SP2/0 cell lines at the systems level, we applied pathway analysis tools, including KEGG and IPA, along with biological, molecular and cellular functional analysis tools such as GO (Fig. 1). The CHO (M-CHO and CHO-K1) and SP2/0 RNAseq and proteome data were mapped to the Cricetulus griseus and Mus musculus KEGG identifiers and pathways, respectively, with enrichment and depletion analyses performed using a hypergeometric distribution. The p-values from both tests are listed in Supplementary Table 8, with CHO-K1 data included to determine whether the results vary across different CHO cell lines. In this analysis, we focused on (1) comparing the enrichment and depletion results of the stationary and exponential phases for both cell lines and (2) comparing the over-represented and under-represented pathways for the M-CHO, SP2/0, and CHO-K1 ATCC cell lines. When the hypergeometric distribution test was applied to compare exponential and stationary phases, whole proteomics and transcriptomics p-values indicated that several pathways, such as apoptosis, RNA degradation, and proteasome, exhibited a higher representation in the CHO stationary phase. In addition, analyzing the proteomics for both exponential and stationary phases increased the number of proteins identified in the CHO proteome compared to previous studies3,11,22. For instance, proteins such as TNFSF10 (TRAIL) from the apoptosis pathway, and EDEM1, CRYAB, and Mbtps1 from the protein processing pathway, were shown to be expressed in the current M-CHO study. Other proteins, such as ERGL and S2P involved in protein processing pathways, were identified in SP2/0 cells in this study even though they were absent from the CHO proteome.
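A hypergeometric enrichment or depletion test of the kind described above can be sketched as follows. The tail conventions (upper tail for enrichment, lower tail for depletion), the function names and the toy numbers are assumptions; the source does not state which software or which background gene set was used.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k pathway genes among n detected genes, when the background
    holds N annotated genes of which K belong to the pathway."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_pvalue(k, N, K, n):
    """Upper tail P(X >= k): probability of seeing at least k pathway genes."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def depletion_pvalue(k, N, K, n):
    """Lower tail P(X <= k): probability of seeing at most k pathway genes."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))


# Toy example (invented numbers): 20 background genes, 5 in the pathway,
# 10 genes detected, 4 of them in the pathway.
p_enriched = enrichment_pvalue(k=4, N=20, K=5, n=10)
p_depleted = depletion_pvalue(k=4, N=20, K=5, n=10)
```

A small enrichment p-value marks a pathway as over-represented in the detected set and a small depletion p-value marks it as under-represented. The same two functions can be applied to each pathway and each sample in turn.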
Fig. 3A shows a heatmap of the proteomic pathway depletion p-values for the exponential and stationary phases of M-CHO and SP2/0, with stationary phase CHO-K1 as a control. In all three cell lines, ribosome, RNA transport, and spliceosome were found to be the most highly enriched pathways, whereas metabolic pathways such as glycerolipid and glycerophospholipid metabolism were found to be depleted in CHO cells compared to SP2/0 cells. The 288 pathways shared between CHO and SP2/0 cells were further investigated. More pathways showed significant depletion in CHO cells than in SP2/0 cells. Retinol metabolism was the only group showing slight under-representation in SP2/0 cells for both phases compared to M-CHO cells, while all other groups were over-represented in SP2/0 compared to CHO.


(A) Heatmap of p-values from hypergeometric analysis of exponential and stationary phase M-CHO and SP2/0 cell lines and exponential phase CHO-K1 cell line (B) Heatmap of selected pathways that are upregulated in SP2/0 exponential and SP2/0 stationary.
A heat map for a group of pathways found to be more depleted in CHO cells compared to SP2/0 cells is shown in Fig. 3B. For example, glycosphingolipid biosynthesis, ABC transporters, PPAR signaling, calcium signaling, cell adhesion molecules, mucin-type O-glycan biosynthesis, and secretion-associated pathways were found to be under-represented in both CHO-K1 and M-CHO cell lines compared to SP2/0 cells, with calcium signaling and pancreatic secretion selected for further analysis. Since the pancreas has the highest protein synthesis rate among mammalian organs23, we were especially interested in looking for differences between SP2/0 cells, which originate from mouse spleen, and CHO cells. The KEGG pathway analysis of calcium signaling and pancreatic secretion in Fig. 4 helped to further elucidate potential functions under-represented in CHO cells. SPHK2, CD38, Slc8a and many other genes involved in calcium signaling were not detected in either the deep sequencing transcriptomic or the proteomic analysis for both M-CHO and CHO-K1, while these genes were present in the proteomic and/or transcriptomic data sets of SP2/0. Calcium signaling is a versatile signaling network affecting a wide range of cellular functions, including gene transcription, cell proliferation, secretion and exocytosis24, and the importance of calcium signaling in both endocrine and exocrine secretory cells has been previously demonstrated25. Pla2, a calcium-dependent lipase associated with phospholipid remodeling of bio-membranes in many cell types, and MaxiK (large conductance, voltage and calcium sensitive potassium channel), which plays a key role in regulating membrane potential and is important to exocytosis, mapped to SP2/0 but were depleted in M-CHO and CHO-K1. This result is not unexpected since calcium signaling is important to the development and function of B cells26.


KEGG analysis of calcium signaling and pancreatic secretion pathways for (A) M-CHO (B) SP2/0 (C) CHO-K1 cell lines using proteomics and transcriptomics analysis. Orange stands for transcriptome and proteome data, magenta stands for only proteome data, green stands for only mRNA data and yellow stands for neither transcriptome nor proteome data (https://www.kegg.jp/kegg/kegg1.html).
Functional analysis of CHO and SP2/0 cell lines
To gain a better understanding of the biological process (BP), molecular function (MF) and cellular component (CC) categories in the transcriptomic and proteomic profiles of M-CHO and SP2/0 cells, gene ontology (GO) analysis was implemented to identify over-represented (enriched) and under-represented (depleted) categories. The GO-CHO database27,28 was used, and the enrichment and depletion p-values for MF, BP and CC are listed in Supplementary Tables 9 to 11. The enrichment results of the M-CHO and SP2/0 cell lines are summarized in Fig. 5. DNA and RNA binding, ubiquitin transferase activity and ligase activity were observed among the top 15 enriched molecular functions in both M-CHO and SP2/0 cells. In contrast, biological processes such as transport, phosphorylation, and apoptosis were more enriched in SP2/0 cell lines. In terms of depletion, signal transducer activity and G-protein-coupled receptor activity were among the top 15 depleted categories of both M-CHO and SP2/0 cells, while ion and potassium channel activities and cell-to-cell signaling were more depleted in M-CHO.


Enrichment results of gene ontology analysis of (A) M-CHO molecular function (B) M-CHO biological process (C) SP2/0 molecular function (D) SP2/0 biological process (E) Secretory granule genes present in SP2/0 and CHO cells (F) Ingenuity Pathway Analysis of secretory granule genes present in SP2/0 cells.
Interestingly, the integral components of membrane, cell surface and plasma membrane terms, and secretory granules were found to be under-represented for the CC analysis in M-CHO cells and enriched in SP2/0 cells. Individual genes representing the secretory granule category were compared between the SP2/0 and CHO cells with the resulting overlap shown in Fig. 5E. The eighteen genes, found only in the SP2/0 cell data, were then subjected to Ingenuity Pathway Analysis (IPA). Interestingly, proteins such as RAB3B, SYT1, SYT9 and RAB11FIP5 involved in secretion of proteins and vesicle exocytosis were found in SP2/0 data but were missing from the M-CHO data, as shown in Fig. 5F and Table S12.
CHO membranome exposure
Together, the transcriptomic and proteomic data and the gene ontology and KEGG pathway analyses revealed that membrane- or secretion-associated pathways were often depleted in M-CHO or CHO-K1 cells, whereas these pathways were enriched in SP2/0 cells. Membrane biogenesis is known to be enriched in murine cells, but these findings also suggest that this category of proteins may be low in M-CHO cells29,30. Although M-CHO cells are widely used for both secreted and membrane protein expression, poor expression of membrane proteins has been previously reported31. To further examine the presence of key membrane and vesicle proteins in M-CHO cells, we applied three different enrichment methods to explore the M-CHO membranome. Two-step ultracentrifugation, cell surface biotinylation and hydrazide chemistry-based glycoproteome enrichment methods were coupled with LC/MS/MS, as shown in Fig. 6, to evaluate both membrane and secretory vesicle proteins. While cell surface biotinylation identifies plasma membrane proteins, glycoproteome enrichment identifies proteins traveling through the ER and Golgi apparatus along the secretory pathway. The two-step ultracentrifugation technique, based on sucrose and Na2CO3 treatments, allowed for the isolation of the vesicular proteome, exosome and plasma membranome.


CHO membranome analysis with cell surface biotinylation, glycoproteome and two step ultracentrifugation based enrichment methods.
The unique peptide numbers and protein groups for each analysis are summarized in Table 2, and the data for glycoproteome, ultracentrifugation and biotinylation can be found in Supplementary Tables S13, S14 and S15, respectively. The proteins from each isolation were subjected to a variety of bioinformatics tools, including TMHMM32, SignalP28, TargetP33, Phobius34, and WolfPSort35, to identify those containing transmembrane domains and/or signal peptides27. Although glycoproteome enrichment provided the highest percentage of membrane or secreted proteins, the ultracentrifugation-based membrane proteomics technique revealed the highest number (1483) of membrane and/or secreted proteins. For this reason, peptides from the ultracentrifugation enrichment were separated into 48 fractions using bRPLC followed by tandem mass spectrometry analysis. Coupling the enrichment technology with the two-dimensional fractionation technique identified 86,646 peptides belonging to 8736 proteins, with an average of 10 peptides per protein. Of these proteins, 2478 were predicted to be on the membrane, based on WolfPSort, TMHMM and Phobius-TM, whereas 2804 were predicted to be secreted, based on WolfPSort, SignalP, TargetP and Phobius-SP (Supplementary Table 16), while some were predicted to be both membrane and secreted. As a result, approximately 47% (or 4160) of the total proteins identified were predicted to be membrane and/or secreted. When we combined all the proteins from the cell proteome, glycoproteome, cell surface biotinylation and ultracentrifugation experiments, the total number of proteins increased to 9941, with the membrane enrichment work described above identifying an additional 1889 proteins. Furthermore, of these 1889 proteins, 529 were not found in the RNAseq data.
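The relationship between the membrane, secreted and combined counts reported above follows from inclusion-exclusion. The short check below uses only the counts stated in the text and assumes that 4160 is the union of the two prediction sets; the overlap it derives is implied by those counts rather than reported in the source.

```python
# Counts as stated in the text for the fractionated ultracentrifugation sample.
total_proteins = 8736
predicted_membrane = 2478
predicted_secreted = 2804
membrane_or_secreted = 4160  # assumed to be the union of the two sets

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
predicted_both = predicted_membrane + predicted_secreted - membrane_or_secreted

# Share of all identified proteins predicted as membrane and/or secreted.
fraction_membrane_or_secreted = membrane_or_secreted / total_proteins

print(predicted_both)
print(round(100 * fraction_membrane_or_secreted, 1))
```

Under that assumption, the proteins predicted to be both membrane and secreted account for the difference between the sum of the two individual counts and the combined figure.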
GO cellular component analysis of these newly elucidated membrane-associated proteins showed that 68% of the proteins identified in M-CHO were localized either on the membrane or in the extracellular space, including important vesicular transport genes such as Ap3b2, A2m, and Srebf2, along with Rab proteins such as Rab33a, Rab40b, Rab19, and Rab11fip2. However, when we mapped the newly identified proteins from the membranome to the secretory granule pathway, we were only able to identify BRCA2 out of the 16 secretory granule proteins listed in Supplementary Table 12.
Even after the secondary membrane proteomics experiments, many of the proteins and pathways associated with secretion and membranes were still depleted in CHO cells compared to SP2/0 cells, with depletion values listed in Supplementary Table 17. Thus, most of the membrane and vesicle proteins appear to remain in low abundance in CHO cells even after these secondary isolation approaches.

