Predicted structure of PKCι and post translational modifications
Three dimensional (3D) structures of the protein were predicted via I-TASSER6 which is the most advance and reliable tool. It predicts protein structure based on multiple threading approach. Model 1 with a c-score of − 2.30 was selected7. The protein structure was predicted from I-TASSER because the complete structure with all important domains was not found in the protein structure data bank. The structure was then validated via INTERPRO8 and different domains of the protein were identified (Fig. 1a). Domains of the protein were highlighted through PYMOL. The protein is found to have 596 amino acids with four important domains: PB1 domain, C1 domain (Pseudo substrate domain), protein kinase domain containing the active site of the protein and AGC kinase domain. Alignment of PKCι structure was performed through PYMOL. Protein kinase domain of PKCι was aligned with crystal structure of kinase domain (38AX:ID from protein data base). An RMSD score of 0.944 was obtained. C1 and kinase domain of predicted structure were then aligned against solution structure of PKC-theta (1XJD, C1 and kinase domain). An RMSD score of 0.88 indicates that the structures are well aligned (Supplementary file 4, Fig. 1a,b).


(a) Predicted Structure of Protein PKCι; It contains 4 domains. Red color represents PB1 domain (2–108 AA), C1 domain (123–192 AA) is highlighted in blue color, pink color shows Protein kinase domain (254–522) while dark grey color represents AGC kinase domain (523–596). (b) PKCɩ structure with predicted post translational modification sites distributed through its domains. Yellow pentagon shape is the representation of phosphorylation sites, red triangle is depicting ubiquitination sites, while blue circle is for methylation and acetylation is illustrated by brown oval shape.
Protein kinase domain is predicted to have maximum number of post translational modifications 17 phosphorylation sites (yellow pentagon), 6 ubiquitination sites (red triangles) and 1 acetylation (brown oval). AGC kinase and PB1 domain hosted only phosphorylation sites that were five and eleven in number respectively, while in C1 domain a total of three phosphorylation, four ubiquitination and two acetylation sites are observed. One methylation, one phosphorylation and a few phosphorylation sites were noticed in region 193–254 that comes in the regulatory domain (Fig. 1b) (Supplementary file 2; Data Tables 3, 4 and 5).
Identification of variants in PKCɩ and calculation of %SNP effect
A total of 1317 SNPs of PKCɩ were collected from ENSEMBL data base (Fig. 2a). ENSEMBL data consists of variant information, protein functional annotations, disease association, and sequence data. The coding SNPs are found across 596 amino acid residues in PKCɩ. Only missense SNPs (301) were selected for the further analysis, because mostly missense variants are found to be associated with diseases. A frequency of non-sense variants is very less as compared to missense variants and are concentrated in the Protein kinase and AGC kinase domain (Fig. 2b).


(a) Count of synonymous, stop gained, missense, frame shift, 5 and 3 prime UTR variants included in the study, (b) Mutational allocation of missense (green) and nonsense (blue) variation across protein co-ordinates of PKCɩ, (c) Distribution of missense (blue) and nonsense (orange) variations across 18 exons of PKCɩ, (d) Heat map drawn for missense SNPs (i), non-sense variations are illustrated in row (ii), while (iii) is the exons over which SNPs distribution are highlighted. Row (iv) is the null set of SNPs where probability of occurrence of SNPs is not high.
Exon wise relative abundance analysis of coding SNPs illustrated that exon one has the highest number of mutations (thirty-two in number), all of which are missense SNPs. Exon one encodes the PB1 domain of the protein. The lowest number of variations are displayed by fifteen exons containing a total of eight SNPs out of which seven are missense and one is non-sense. Exons that encode PB1 and C1 domain contained the highest number of variations. AGC domain has the lowest number of variations (Fig. 2c,d).
Deleterious SNPs in PKCι
The missense SNPs were analyzed on seven tools SIFT (≤ 0.05), POLYPHEN (> 0.9), REVEL (> 0.5), Mutation Accessor (≥ 0.8 = Medium, > 1.9 = high), CADD (≥ 30), MetaLR (> 0.5), and PROVEAN (≤ − 2.5), the cut off criterion for deleteriousness is shown in parenthesis (Supplementary file 1). The percentage SNP effect of missense variations were determined in each residue of the protein (Fig. 3a). The total number of identified SNPs varied in all the tools. SNP was considered as deleterious if pathogenicity was confirmed by > 75% tools (Table 1). After final scrutiny of deleterious and non-deleterious nine missense SNPs were identified to be highly deleterious with two SNPs residing in the PB1 domain, five in the C1 domain, one in the AGC domain while one in the protein kinase domain (Table 2) (Fig. 3b). The highest number of deleterious SNPs are found in C1 and pseudo kinase domain. This region is encoded by four, five, six, seven and eight exons. This region of the protein can be regarded as mutationally sensitive region of the protein. In atypical type of PKCs C1 domain is not responsive to DAG, it is catalytically activated by phospholipids that releases it from its membrane bound state to perform activities like wound healing, chemotaxis, and migration. The C1 domain can be targeted for drug development along with its hinge region9.


(a) Summary of percentage SNP effect of missense variations in each residue of the protein. (b) PKCɩ protein structure with domain wise distributed variants; PB1 domain (G34W, F66Y), C1 domain (R127K, R130C, R130H), Kinase domain of the protein (G398S) and AGC kinase domain (G581V). (c) Stability change DDG values of SNPs depicting destabilizing effect of selected SNPs.
Stability changes in mutant structures of protein
The effect of change in the stability of protein was predicted for nine selected SNPs. The stability prediction is based on DDG (delta delta G) values, it is metric of prediction of how single nucleotide polymorphism can affect the stability of a protein. It is the change in Gibbs free energy. A DDG score less than zero indicates lower stability. G34W, R127k, R130C, Y169H, and G398S are found to decrease the stability and F66Y, R130H, G165E, and G581V were found to destabilize the protein (Fig. 3c) (Supplementary file 3; Data Tables 1 and 2). Destabilization of a protein structure can alter its functional dynamics and can affect the normal pathways of the protein.
Functional and physio-chemical analysis of selected SNPs
Project HOPE predicts the effects of Amino acid substitutions on the structural confirmations and functions of the protein. Project HOPE reveals some structural and functional changes in the protein because of these mutations that cause heritable diseases. Protein sequence and mutation are inserted as input query for HOPE. In case of 6 SNPs, the resultant residues are bigger than the wild one while in three SNPs the size has been reduced. This change can affect the overall structure and function of the protein. Most of the SNPs are in the regulatory region of the protein, making it mutationally sensitive region and affecting the regulatory function of the protein. In case of R130C, R130H and G165E the charge of the residue is also affected, changing from positive to neutral in case of R130C and R130H (Tables 3 and 4).
The most evolutionary conversed domain
According to ConSurf results for the PKCɩ, the protein kinase domain is found to be evolutionary more conserved with a greater number of conserved amino acid residues. Literature also suggests that protein kinase domain is most conserved domain in PKC family members10. Few residues in the hinge region are conserved but its most residues were are variable. PB1 domain and pseudo substrate domain are found to be least conserved with very less evolutionary conserved residues (Fig. 4). Mutations in the conserved region of the protein are expected to be more damaging as compared to those in the less conserved region. Surface accessibility analysis gives an insight into the structure and function of Amino acid. The buried residues usually play a role in maintaining the structural integrity while the exposed residues are important for the protein- protein interactions. The SNPs G34W, R127K, R130C, R130H and G581V were found in exposed confirmations while F66Y, G165E, Y169H and G398S were found buried in the structure.


Protein conservation and surface accessibility analysis performed for 596 residues of protein PKCɩ on ConSurf tool.
Flexibility analysis of wild and mutant protein
The change in flexibility of the protein caused by nine SNPs was analyzed by DynaMut. The stabilizing effect was analyzed through ENcom values. According to the ENCom, values all of them are found to have destabilizing effect on the protein structure and function. Five variations (G34W, R127K, R130C, R130H, and Y169H) had increased molecular flexibility that was caused by increased vibrational entropy. Four variations (F66Y, G165E, G398S, G581V) were within the cut off value > 0.5 with decreased molecular flexibility. None of them has increased molecular rigidity. This change in the overall flexibility of the protein can affect the intramolecular interactions of the protein. A comparison of intramolecular interactions of wild and mutated structures is done in Fig. 5.


A comparison of molecular flexibility and destabilizing effect of mutants along with interatomic interactions in wild type and mutants by DynaMut tool.
Molecular dynamic simulations
Molecular dynamic simulations for nine SNPs were performed for 20 ns to get an insight into the confirmational changes in the protein structure due to missense mutations. The time scale of 20 ns is enough for the rearrangements of side chain in the wild structure and various parameters such as RMSF, RMSD, Radius of gyration, total number of intra-molecular hydrogen bonds, and SASA. Domain wise comparison of changes in mutants with wild structure was performed.
Molecular dynamic simulation analysis in PB1 domain of the protein
Two variants (G34W and F66Y) occupy the PB1 domain. The compactness of protein and mutants was examined by the radius of gyration. Wild protein has radius of gyration around 2.8 nm while highest gyration value of 2.87 nm is shown by F66Y at 3 ns. It is illustrated by the data that these structural destabilizations can lead to the loss of compactness to the protein structure as compared to the wild type PKCι (Fig. 6a). In wildtype protein as well in mutants, the total number of intramolecular hydrogen bonds contributes to the stability of the structure. Lowest number of hydrogen bonds are observed in F66Y around 310, followed by G34W having a mean of 320 H-bonds, with wild structure having around 400 bonds. The data suggest lower flexibility in structure with F66Y and G34W mutations (Fig. 6b).


Simulation analysis of the missense SNPs in PB1 domain (G34W, F66Y) (a) Radius of gyration of the protein back bone (compactness of protein), (b) Total number of Hydrogen bonds throughout simulations of wild and mutant structure, (c) RMSF values of Carbon alpha in the simulation, (d) RMSD values of Cα atoms of wild and mutants, (e) Solvent accessible surface of wild and variants.
For each residue of wild type and mutated protein fluctuations in RMSF were monitored to check the effect of mutation on dynamic behavior of protein residues. It is known from Fig. 6c that in G34W and F66Y the residue level fluctuations are quite high as compared to wild structure and other mutations. The wild protein has the highest fluctuation of 0.9 nm in residue 461. G34W has the highest fluctuation value of 1.4 nm in residue 576 while F66Y had the highest value of 0.9 nm in residue 580 residues (Fig. 6c). The effect of mutations on the structure of PKCɩ was analyzed by RMSD values. It is revealed from RMSD values that mutant structures are significantly unstable as compared to the wild structure (Fig. 6d).
It was showed that F66Y has higher SASA values followed by G34W. Both values are greater than wild structure. A higher SASA value indicates expansion of a protein, the results indicate that mutants are more unstable as compared to wild protein with F66Y being more unstable than G34W (Fig. 6e).
Molecular dynamic simulations analysis of (C1 and pseudo substrate) regulatory domain of protein
A total of 5 SNPs is observed in the C1 and pseudo substrate region. This region with PB1 and C1 domain make the regulatory region of the protein. In mutant R130C the radius of gyration has significantly reduced as compared to wild and other mutants, indicating a major change in the backbone of the protein structure, and altered compactness of the protein. Radius of gyration of other mutants is also changed (Fig. 7a). Maximum number of intramolecular hydrogen bonds in wild structure are around 400 while in mutants the number has been reduced. In Y169H lowest number of hydrogen bonds (an average of 360) are seen during 1–4 ns duration depicting decreased flexibility in its structure. Minor fluctuations in number of hydrogen bonds of other mutants are also observed (Fig. 7b).


Simulation analysis of the missense SNPs in C1 domain (R127K. R130K, R130C, G165E, R130H, Y169H), (a) Radius of gyration of the protein back bone, (b) Total number of Hydrogen bonds throughout simulations of wild and mutant structure, (c) RMSF values of Carbon alpha in the simulation, (d) RMSD values of Cα atoms of wild and mutants, (e) Solvent accessible surface of wild and variants.
Root mean square fluctuation (RMSF) values for each residue of native and mutant protein was examined. R127k had the highest RMSF of 1.3 nm at residues 561–596, followed by another fluctuation of 1 nm at residues 1–4. Y169H has the maximum fluctuation value of 0.7 nm from 441 to 481aminoacid residues. 0.9 nm fluctuation was recorded for R130C from 561 till last residues. G165E has a fluctuation of 0.5 nm from 161 to 201 residues. Maximum fluctuation of 0.6 nm was recorded for R130 from 281 to 290 residues (Fig. 7c). The effect of mutations on the structure of PKCι was analyzed by RMSD values. RMSD values showed that mutant structures are significantly unstable as compared to the wild structure (Fig. 7d). Maximum RMSD values were recorded for R130H, followed by G165E and then R130C. The difference between wild and R127K RMSD is not significant. It is demonstrated from the figure that mutation has considerable effect on the structure of PKCɩ (Fig. 7d).
From analysis solvent accessible surface area (SASA) it is exposed that Y169H has higher SASA values followed by G165E. After that R130H is found with higher values, with R127K values close to the wild structure. All mutant values are greater than the wild structure. A higher SASA value indicates expansion of a protein, the results indicate that mutants are more unstable as compared to wild protein with Y169H and G165E being more unstable than wild structure and other mutants (Fig. 7e).
Molecular dynamic simulation analysis of protein kinase domain of protein
SNP G398S is in the protein kinase domain of PKCɩ. This domain is the most conserved domain of the family. The SNP is observed to cause alterations to the protein. The compactness of the protein is predicted to be majorly affected by this mutation (Fig. 8a). Radius of gyration has reached to a maximum of 3 nm during 20 ns duration. This is a huge increase as compared to the gyration values of wild protein (Fig. 8b). Overall number of hydrogen bonds in G398S have been reduced when seen in comparison to wild structure, depicting a decreased flexibility of structure (Fig. 8c).


Simulation analysis of the missense SNP in Protein kinase domain (G398S), (a) Radius of gyration of the protein back bone, (b) Total number of Hydrogen bonds throughout simulations of wild and mutant structure, (c) RMSF values of Carbon alpha in the simulation, (d) RMSD values of Cα atoms of wild and mutants, (e) Solvent accessible surface of wild and variants.
In the root mean square fluctuation values, a noticeable change at each domain was observed. The highest RMSF peak of G398S was observed at 0.9 nm in residue 200 of the protein. Overall RMSF values of mutant were noticed to be higher as compared to wild protein. Root mean square deviation values were compared for wild and G398S mutant, a major change in stability was under observation depicting a highly unstable state of protein (Fig. 8d). From analysis solvent accessible surface area (SASA) it was illustrated that mutant G398S has higher SASA values than the wild structure. Indicating that the mutant is unstable as compared to wild protein (Fig. 8e).
Molecular dynamic simulation analysis of AGC kinase domain of protein
Radius of gyration for mutant G581V is higher than the native protein and is increasing with the passage of time predicting a decrease in flexibility of the mutant structure (Fig. 9a). From hydrogen bond analysis wild structure was found to form more bonds than the mutant. The stability of mutant is therefore affected by fewer number of hydrogen bonds (Fig. 9b).


Simulation analysis of the missense SNP in AGC kinase domain (G581V), (a) Radius of gyration of the protein back bone, (b) Total number of Hydrogen bonds throughout simulations of wild and mutant structure, (c) RMSF values of Carbon alpha in the simulation. (d) RMSD values of Cα atoms of wild and mutants, (e) Solvent accessible surface of wild and variants.
A significant difference in the RMSF values of wild and G581V was noticed. The highest peak of wild structure observed was 0.9 nm, while that of mutant was 0.8 nm at residue 500 of the protein (Fig. 9c). Other than that, the mutant peaks are at increasing trend when compared with peaks of wild structure. This imparts significant deviations in both structures. Also, RMSD values of G581V are higher than the wild structure (Fig. 9d). SASA analysis indicated that mutant has greater values than mutant. The reason for this change could be effect of substitution of amino acids by change in size of surface of protein (Fig. 9e).
Association of pathogenic SNPs with cancer
The oncogenicity of selected SNPs were predicted through two tools. The FATHMM results for individual mutations are in the form of functional scores, SNP having score above 1 are considered as deleterious. CScape predicts the SNP as deleterious if the score is above 0.5. From FATHMM results, F66Y, G398S and G581V were predicted to be associated with cancer. CScape predicted all nine variants to be cancer drivers and oncogenic with a score greater than 0.6. Six variants F66Y, R127K, R130H, G165E, Y169H, G398S are categorized as high confidence oncogenic having score above 0.9.This suggests that all nine SNPs specifically F66Y, G398S and G581V can have possible role in protein dysregulation and causation of cancer (Table 5).
Connection of PKCɩ with cancer through Kaplan–Meier plotter
The effect of expression of PKCι on survival of cancer types like breast cancer, ovarian cancer, lung cancer and gastric cancer was determined through Kaplan–Meier Plotter. The red line is depicting the survival period of cancer patients having high expression levels of PKCι, while the black line illustrates survival period of cancer patients with low expression levels of the protein (Fig. 10). This representation is in the form of Kaplan–Meier curve that shows probability of survival of patients at a certain time period.


Probability survival curves for (a) Breast cancer, (b) Ovarian Cancer, (c) lung cancer, (d) Gastric cancer based on high and low expression of PKCɩ (Red line indicates higher expression of PKCɩ, while black line shows expression below the median line).
Kaplan–Meier plotter analysis revealed that high and low expression of PKCɩ was found to have no significant link on survival of breast cancer, lung cancer, gastric cancer and Ovarian cancer patients. (Fig. 10a–d).
Authentication of results through control study
For validation of our results, we performed a control analysis of the SNP, K274R which is proven as non-deleterious experimentally11. The score from Sift, Polyphen-2, CADD, MetaLR, PROVEAN, Mutation Assessor and REVEL predicted the SNP as non-deleterious, proving that these tools have a good accuracy level. The stability assessment of K274R was done through I-mutant, SDM, mCSM, MuPro and Dynamut. Except I-mutant & mCSM through which SNP is predicted as destabilizing, the other three tools prove it as stabilizing to the protein structure and function. Hope analysis also revealed that the SNP is possibly not damaging to the protein. Fathmm, and CScape both predicted the SNP to be benign. These results illustrated that all these tools have some accuracy level and can be used for filtration of deleterious SNPs that are to be tested experimentally (Supplementary file 3, Tables 1, 2, 3 and 4).

