Retrieval of anti-viral AMPs (VAP-AMPs) and profile creation using HMM
Literature mining of the CAMP, APD3, and AVPDB databases uncovered a total of 176 experimentally validated anti-viral pneumonia antimicrobial peptides (VAP-AMPs): 112 against Respiratory Syncytial Virus, 52 against Influenza A, and 12 against Influenza B. The first step in the profile construction pipeline was to randomly divide the experimentally validated AMPs of each class into a ¾ portion and a ¼ portion (Table 1). The ¾ portion is the training dataset, used to train the HMM software, after which the ¼ portion tests whether the functionally significant amino acid consensus has been captured. Multiple sequence alignments were then produced with ClustalW as input for the HMM profiles. In total, three AMP profiles were produced, one for each class: anti-Respiratory Syncytial Virus (RSVM), anti-Influenza A (INFA), and anti-Influenza B (INFB).
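The random ¾ and ¼ grouping described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the class sizes come from the text, while the peptide identifiers, the `split_train_test` helper, and the fixed seed are assumptions made for the example.

```python
import random


def split_train_test(sequences, train_fraction=0.75, seed=0):
    """Randomly split sequences into a training list and a testing list."""
    shuffled = list(sequences)
    # A seeded generator keeps the (assumed) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Class sizes reported in the text; the peptide IDs are placeholders.
class_sizes = {"RSVM": 112, "INFA": 52, "INFB": 12}

splits = {}
for name, size in class_sizes.items():
    peptides = [f"{name}_{i:03d}" for i in range(1, size + 1)]
    splits[name] = split_train_test(peptides)
    train, test = splits[name]
    print(name, "training:", len(train), "testing:", len(test))
```

With these class sizes the testing sets contain 28 (RSVM) and 13 (INFA) sequences, which agrees with the testing set sizes quoted for Table 2.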
Independent testing of the created profiles and evaluation of the independent testing results
Each constructed profile was tested against a positive dataset (the testing dataset), which comprised about 25% of the validated sequences (Table 1). Because experimentally validated AMPs were used, the profiles are expected to recognize other sequences with the same activity and to reject sequences that have no anti-pneumonia activity against the same microorganism. The profiles were also tested against a negative control dataset, composed of random fragments of 17,236 neuropeptides with no recorded anti-pneumonia activity, to confirm that the trained profiles would reject non-anti-pneumonia peptides.
The independent testing of the profiles was evaluated using the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). A cut-off E-value of 0.05 was applied in the HMM tool to strengthen each profile's capacity to separate true-positive anti-pneumonia AMPs from false-negative ones. TP is the number of correctly predicted positive sequences (anti-pneumonia AMPs), TN is the number of correctly predicted negative sequences (non-anti-pneumonia AMPs), FP is the number of non-anti-pneumonia AMPs wrongly predicted as anti-pneumonia AMPs (AP-AMPs), and FN is the number of anti-pneumonia AMPs wrongly predicted as non-anti-pneumonia AMPs. The number of TP AMPs could be determined from the total number of input sequences, and the FP number could be derived accordingly; the results are shown in Table 2, which reflects the capacity of each profile to distinguish true anti-pneumonia AMPs from false ones. In Table 2, INFB had all of its testing sequences as TP, while RSVM had 22 of its 28 testing sequences as TP. INFA, however, had only 6 of its 13 testing sequences as TP, which could be due to overlapping homologous relationships among the AMPs used to build its profile.
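The counting of TP, FN, FP, and TN at the 0.05 E-value cut-off can be illustrated with a short sketch. The E-values below are invented for the example, the `confusion_counts` helper is hypothetical, and treating the cut-off as inclusive (E-value ≤ 0.05 counts as a hit) is an assumption; the text does not state how the HMM tool output was parsed.

```python
E_VALUE_CUTOFF = 0.05  # cut-off stated in the text


def confusion_counts(positive_evalues, negative_evalues, cutoff=E_VALUE_CUTOFF):
    """Count TP, FN, FP, TN from per-sequence E-values.

    None means the profile reported no hit for that sequence.
    """
    def is_hit(evalue):
        # Assumption: a sequence is a predicted positive if E-value <= cutoff.
        return evalue is not None and evalue <= cutoff

    tp = sum(is_hit(e) for e in positive_evalues)
    fn = len(positive_evalues) - tp
    fp = sum(is_hit(e) for e in negative_evalues)
    tn = len(negative_evalues) - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical E-values, for illustration only.
positives = [1e-12, 3e-5, 0.02, 0.3, None]  # known anti-pneumonia AMPs
negatives = [None, 0.8, 2.5, None]          # neuropeptide fragments

counts = confusion_counts(positives, negatives)
print(counts)
```

Here three of the five positive sequences pass the cut-off, so two are counted as false negatives, and none of the negative sequences is reported as a hit.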
Performance measurement of the target-specific profiles
After the independent testing, the performance of each profile was calculated in terms of specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC)33, which the biochemist Brian W. Matthews introduced in 1975. These measures were determined as detailed in Table 3.
From the results in Table 3, sensitivity was high for the anti-Influenza B virus (INFB) and anti-Respiratory Syncytial Virus (RSVM) profiles, indicating that most positive test sequences were predicted correctly. The moderate sensitivity of INFA could be ascribed to the large overlap in the conserved regions of the AMPs used to build its profile17. Specificity was 100% for all profiles, meaning that no negative sequence was predicted as positive. The accuracy results indicate correct prediction once misclassified AMPs from both the positive and negative datasets are accounted for. MCC values for all the profiles were significant, with the lowest value recorded for the anti-Influenza A virus (INFA) profile (0.50). An MCC value of 0.5 to 1 is taken to indicate correct prediction (1 being perfect), while 0 indicates random prediction. Hence all profiles showed correct prediction (INFB > RSVM > INFA). The MCC is considered to give the best performance estimate for models because it combines sensitivity, specificity, and accuracy33.
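The four performance measures follow from the confusion counts by their standard definitions, sketched below. The TP and FN values for RSVM and INFA are those quoted from Table 2, and FP = 0 reflects the reported 100% specificity; the TN count of 100 is a placeholder assumption, so the accuracy and MCC printed here are not expected to reproduce Table 3.

```python
import math


def performance(tp, fn, fp, tn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)                 # TP / (TP + FN)
    specificity = tn / (tn + fp)                 # TN / (TN + FP)
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # correct / all predictions
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


ASSUMED_TN = 100  # placeholder; the real negative counts are in Table 2

rsvm = performance(tp=22, fn=6, fp=0, tn=ASSUMED_TN)
infa = performance(tp=6, fn=7, fp=0, tn=ASSUMED_TN)

for name, result in (("RSVM", rsvm), ("INFA", infa)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Sensitivity depends only on TP and FN, so the values of 22/28 for RSVM and 6/13 for INFA do not change with the assumed TN count; accuracy and MCC do.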
Proteome sequence databases query and discovery of putative anti-pneumonia AMPs
The discovery stage (Table 4) searched proteome sequences for novel anti-viral pneumonia AMPs against the pneumonia pathogens (Influenza A, Influenza B, and Respiratory Syncytial Virus), in order to identify peptides with signatures/motifs and properties similar to those of the input sequences used to build the RSVM, INFA, and INFB profiles. Matches of each profile to the proteome sequences were reported with their E-values (Table 4), and a cut-off E-value of 0.05 was used to identify putative AMPs. The final list of anti-viral AMPs was ranked by E-value, with the smallest E-values marking the most probable putative anti-viral pneumonia AMPs.
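The ranking step can be pictured with a few lines of code. The hit names and E-values are invented, and `rank_hits` is a hypothetical helper; the only details taken from the text are the 0.05 cut-off and the ordering by ascending E-value.

```python
def rank_hits(hits, cutoff=0.05):
    """Keep hits at or below the E-value cut-off, smallest E-value first."""
    kept = [(name, evalue) for name, evalue in hits if evalue <= cutoff]
    return sorted(kept, key=lambda item: item[1])


# Hypothetical profile matches against a proteome: (sequence name, E-value).
hits = [
    ("protein_A", 0.04),
    ("protein_B", 2e-9),
    ("protein_C", 0.7),
    ("protein_D", 1e-3),
]

ranked = rank_hits(hits)
for name, evalue in ranked:
    print(name, evalue)
```

In this example protein_C is discarded and protein_B, with the smallest E-value, heads the list as the most probable candidate.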
Physicochemical properties of the AMPs
The physicochemical parameters of the putative AMPs were determined using APD3 and BACTIBASE to confirm that the AMP sequences conform to other known AMPs. Parameters such as molecular weight, amino acid composition, hydrophobicity, Boman index, net charge, isoelectric point (pI), and half-life were used to assess the anti-viral AMPs (Table 5). The amino acid composition of an AMP determines its molecular weight and can be a distinctive feature for separating two classes of proteins/peptides34. In addition, the anti-viral pneumonia AMPs have common amino acids that could distinguish them from one another. BOPAM-INFA1, 6, and 8 had proline; BOPAM-INFA2 had proline, valine, isoleucine, and leucine; BOPAM-INFA3 had threonine; BOPAM-INFA4-5 had proline and glutamine; while BOPAM-INFA7 had proline, isoleucine, serine, and valine. BOPAM-INFB1 had valine and proline; BOPAM-INFB2-3 had asparagine and leucine; BOPAM-INFB4-5 had leucine; while BOPAM-INFB6 had asparagine and leucine. BOPAM-RSV1 had isoleucine and lysine; BOPAM-RSV2 had serine; BOPAM-RSV3 and 12 had isoleucine and asparagine; BOPAM-RSV4, 5, 7, 8, 9, 10, and 11 had asparagine; BOPAM-RSV6 had isoleucine; while BOPAM-RSV13 had aspartate. BOPAM-INFA1, 3, 4, 5, 6, and 8 and BOPAM-RSV9 had hydrophobicity below 30%, owing to a larger proportion of polar amino acid residues. BOPAM-INFB3 and 6, BOPAM-INFA1, 2, 3, 4, 6, 7, and 8, and BOPAM-RSV2, 3, 4, 6, 7, 8, 10, 11, 12, and 13 were predominantly neutral or negatively charged. A cationic charge in AMPs is reported to correlate positively with increased antimicrobial activity35.
Nonetheless, the absence of a net positive charge in these anti-viral AMPs does not imply an absence of antimicrobial activity, since negatively charged AMPs have been discovered: for example, the surfactant-associated anionic peptide in the APD3 database (AP00528), which has a net charge of −5 and antibacterial activity, and maximin H5, which has a charge between −1 and −7 and inhibits the growth of Listeria monocytogenes33. The pI of the anti-viral pneumonia AMPs ranged from 3.85 to 12.50, indicating that the AMPs are soluble across acidic and alkaline media regardless of their differences in charge36. The isoelectric point (pI) of a peptide is a function of the individual amino acids in its backbone. At a pH below the pI, an AMP carries a net positive charge; at a pH above the pI, it carries a net negative charge. The Boman index was negative for BOPAM-INFA2 and 7 and for BOPAM-INFB4 and 5. A negative Boman index is reported to be associated with a more hydrophobic peptide, showing a high protein binding potential, while a more hydrophilic peptide tends to have a more positive index37. Nevertheless, peptides with a positive Boman index have been reported to detect HIV in a lateral flow device13. Among the anti-viral pneumonia AMPs, BOPAM-INFA1, 3, 4, 5, 6, and 8 had a half-life of 7.2 h, while BOPAM-INFA2 and 7 had a half-life of 1.2 h. BOPAM-INFB1 and 2 had a half-life of 30 h, and BOPAM-INFB3-6 had a half-life of 5.5 h. BOPAM-RSV1 had a half-life of 20 h, and BOPAM-RSV10 had a half-life of 100 h; all other BOPAM-RSV peptides had half-lives between 1 and 4.5 h. AMPs are generally reported to have short half-lives because they are not stable. Half-life values as low as 1 h have been reported for AMP molecules used for HIV diagnosis13.
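Two of the parameters discussed above, hydrophobicity and net charge, can be approximated directly from a sequence. The sketch below is only an approximation under stated assumptions: the hydrophobic residue set and the rule of counting K/R as +1 and D/E as −1 (ignoring histidine and the termini) are simplifications chosen for the example, and APD3 or BACTIBASE may compute these values differently. The sequence is hypothetical and is not one of the BOPAM peptides.

```python
# Assumed residue classes; the servers used in the study may differ.
HYDROPHOBIC = set("AILMFVWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def hydrophobic_percent(sequence):
    """Percentage of residues that belong to the assumed hydrophobic set."""
    return 100.0 * sum(res in HYDROPHOBIC for res in sequence) / len(sequence)


def net_charge(sequence):
    """Approximate net charge: +1 per K/R, -1 per D/E."""
    positives = sum(res in POSITIVE for res in sequence)
    negatives = sum(res in NEGATIVE for res in sequence)
    return positives - negatives


peptide = "GLDENPTQSAPE"  # hypothetical 12-residue peptide

hydro = hydrophobic_percent(peptide)
charge = net_charge(peptide)
print(f"hydrophobicity: {hydro:.1f}%  net charge: {charge}")
```

A polar-rich peptide such as this one falls below the 30% hydrophobicity mark and carries a net negative charge, the same pattern reported for several of the putative anti-viral AMPs.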
Retrieval of protein receptors of pneumonia pathogens
This stage assessed the diagnostic potential of immunogenic proteins of viral pneumonia pathogens as targets for the putative antimicrobial peptides in detecting these microbes. Pneumonia proteins, including cell surface receptors and nucleoproteins, were analyzed for Influenza A virus, Influenza B virus, and Respiratory Syncytial Virus. The retrieved protein receptors are expected to be applicable in the diagnosis of viral pneumonia associated with these viruses. The Respiratory Syncytial Virus has immunogenic receptors of potential diagnostic relevance, such as the membrane fusion core protein chains38. Its Human RSV fusion protein core chain A has a molecular weight of 4869.38 Da, isoelectric point 4.38, hydrophobicity 39.53%, charge −4, instability index 49.41, and a half-life of 1 h in mammals. Influenza A virus has protein receptors of potential importance for its detection. Its 416a monomeric nucleoprotein has a molecular weight of 56,297.78 Da, isoelectric point 9.45, hydrophobicity 29.52%, charge +12, instability index 36.35, and a half-life of 30 h in mammals. Influenza B virus receptor proteins of diagnostic potential were also identified and investigated. The Influenza B virus nucleoprotein has a molecular weight of 61,644.09 Da, isoelectric point 9.43, hydrophobicity 31.61%, charge +18, instability index 39.98, and a half-life of 30 h in mammals (Table 6). Instability index, molecular weight, and half-life reflect how stable a protein is, and any protein with an instability index lower than or equal to 40 is considered stable; hydrophobicity enhances protein binding to ligands; and the net charge determines the behavior of the protein in acidic or alkaline solution, with all proteins having a net charge of zero at the isoelectric point39.
Structure prediction of the putative anti-pneumonia AMPs and Pneumonia protein receptors
Representative 3-D structures of the anti-pneumonia AMPs (ligands) and the protein receptors, as predicted by the I-TASSER server, are shown in Fig. 1. The predicted AMPs showed a variety of secondary structures, including α-helices, parallel β-sheets, anti-parallel β-sheets, extended, and loop conformations.


3-D structures of the AMPs and pneumonia protein as determined by I-TASSER (https://zhanggroup.org/I-TASSER/)29 and visualized using PyMOL30 (version 1.3). 3-D structure of (a) Respiratory syncytial virus X fusion core protein, (b1) alpha-helical AMP, (b2) beta-sheet AMP, (b3) extended sheet AMP.
For the assessment of structure prediction with I-TASSER (Table 7), the confidence score (C-score), template modeling score (TM-score), and root mean square deviation (RMSD) were used for the predicted 3-D structures of the putative AMPs and pneumonia protein receptors. The C-scores of all the predicted 3-D structures of the anti-viral pneumonia AMPs and the pneumonia receptor proteins fell between −5 and 2 (see Table 7), which suggests that I-TASSER had an existing template for their structure prediction40. The C-score of BOPAM-RSV11 was lower than that of the other AMPs, which could indicate that this molecule had no accessible template in I-TASSER, although its prediction was not random29. The TM-score was proposed for measuring the structural similarity between two structures41. A TM-score > 0.5 indicates a model of correct topology, and a TM-score < 0.17 implies random similarity. The TM-scores of the predicted structures of the AMPs and protein receptors were higher than the cut-off value of 0.5, which signifies that these structures had a correct topology with structural similarity to the templates used to predict them29,41. Although there is no defined RMSD threshold for 3-D structure prediction, an RMSD of 2–4 Å is considered good, and an RMSD ≤ 1 Å is considered ideal. Thus, all anti-viral pneumonia AMPs and receptor proteins with RMSD within the accepted range (Table 7) had small atomic deviations between the predicted structures and the templates used for their 3-D structure prediction42,43.
RMSD is sensitive to local error because it averages the distances over all residue pairs in two structures, which is the reason the TM-score was proposed. For example, a misorientation of part of the structure will increase the RMSD even though the global topology of the structure is right. The TM-score is not sensitive to such local misorientation, which makes it insensitive to local modelling errors and therefore a more reliable measure.
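The difference in sensitivity between the two measures can be seen in a small numerical sketch. It assumes the per-residue distances between aligned residue pairs are already known for a fixed superposition, whereas the actual TM-score is the maximum over superpositions, and it uses the standard length-dependent scale d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. The 100-residue length and the distance values are invented for illustration, and short peptides (L ≤ 15) would need a separate convention for d0 that is not covered here.

```python
import math


def rmsd(distances):
    """Root mean square deviation over aligned residue-pair distances (in Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for a fixed alignment and superposition (no optimisation)."""
    if target_length <= 15:
        raise ValueError("d0 formula used here requires target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical 100-residue model: every residue is 1 Å from the reference.
good_model = [1.0] * 100
# Same model, but 5 residues are locally misoriented by 20 Å.
local_error = [1.0] * 95 + [20.0] * 5

rmsd_good, rmsd_local = rmsd(good_model), rmsd(local_error)
tm_good, tm_local = tm_score(good_model, 100), tm_score(local_error, 100)

print(f"RMSD:     {rmsd_good:.2f} Å -> {rmsd_local:.2f} Å")
print(f"TM-score: {tm_good:.2f} -> {tm_local:.2f}")
```

Five misplaced residues raise the RMSD more than fourfold, while the TM-score stays well above the 0.5 threshold for correct topology, because each residue's contribution to the TM-score is bounded between 0 and 1.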
Docking interaction analysis of the putative anti-pneumonia AMPs with viral pneumonia receptors
The outputs from the PATCHDOCK and HDock servers for the docking interactions between the anti-pneumonia AMPs (ligands) and the protein receptors were analyzed (Fig. 2). The spatial docking analysis indicated that all the AMPs bound firmly to their receptor proteins. Computational analysis was also carried out to identify the AMPs with the highest binding potential, the amino acid residues that took part in complex formation, and the terminal of the protein toward which binding occurred. Among the anti-Influenza A AMPs, only BOPAM-INFA1 bound to the nucleoprotein receptor in a different orientation. Likewise, BOPAM-INFB4 bound differently to the influenza B nucleoprotein compared with the other anti-Influenza B AMPs. All anti-Respiratory Syncytial Virus AMPs bound to the chain A fusion protein in the same orientation, except BOPAM-RSV2, 6, and 9.


Docking interaction of the pneumonia protein receptors and putative anti-pneumonia AMPs produced from PATCHDOCK (https://bioinfo3d.cs.tau.ac.il/PatchDock/php.php)32 and visualized using PyMOL30 (version 1.3). The receptor proteins are shown in blue and the ligands in red, in their orientations on their respective receptors.
The BOPAM-RSVs bound firmly to the chain A protein, with the highest binding geometry score observed for BOPAM-RSV4. Similarly, the BOPAM-INFAs bound firmly to the nucleoprotein, with the highest binding geometry score observed for BOPAM-INFA4. In Table 8, BOPAM-INFA5, BOPAM-INFB4, and BOPAM-RSV4 have the highest area scores of 1601.80, 1740.90, and 1244.20, respectively, which denote the approximate interface areas of their complexes with their respective receptors. BOPAM-INFA7, BOPAM-INFB6, and BOPAM-RSV8 have the lowest ACE scores of −474.03, −259.94, and −368.59, respectively; the ACE (atomic contact energy) is the desolvation free energy needed to transfer atoms from water to the interior of the protein receptors44.
The putative anti-influenza A AMPs displayed high docking energy scores in HDock, with BOPAM-INFA8 showing the highest energy of −199 kJ/mol. Similarly, all anti-influenza B AMPs displayed high binding energy to their receptor, with BOPAM-INFB2 having the highest docking energy score. The anti-respiratory syncytial virus AMPs also showed high docking energy scores, with BOPAM-RSV4 and 3 having the highest docking energy scores to the receptor protein. The root-mean-square values generated by the HDock server are given in Table 9, alongside the hotspot interacting residues of the anti-viral pneumonia AMPs and their respective receptor proteins. The results from the HDock server are consistent with those from the PatchDock server.

