Given prior evidence that viral infections alter the structure of biomolecules, in this research we examined the FTIR spectra of COVID-19 patients and healthy individuals, seeking to discriminate between these two populations through spectral analysis and an MLRM. Although ATR-FTIR is not an established diagnostic technique, several authors have reported its use for virus detection. For example, Erukhimovitch et al. (2005) stated that FTIR microscopy can be applied as a sensitive and effective assay for detecting cells infected with various members of the herpesvirus family and with retroviruses23. Furthermore, Lee-Montiel et al. (2011) evaluated the utility of FTIR spectroscopy for the rapid detection of infective poliovirus particles in cell cultures24, and Santos et al. (2020) reported several spectral changes in hepatitis-infected patients17. More recently, Banerjee et al. developed a predictive algorithm based on ATR-FTIR spectra to stratify COVID-19 into severe and non-severe disease25. Therefore, in the search for new techniques to detect SARS-CoV-2, FTIR spectroscopy was considered in this research.
Regarding the characteristics of the COVID-19 population, although Peckham et al. demonstrated that there is no difference in the proportion of males and females infected with SARS-CoV-226, the COVID-19 group in this research comprised 160 men (62.7%) and 95 women (37.3%), probably because the samples were obtained from hospitalized patients. Indeed, the same authors reported that males face higher odds of intensive therapy unit (ITU) admission and death than females, which is consistent with a male predominance among hospitalized patients.
Regarding age, although Hu et al. reported that all age groups appear to be susceptible to SARS-CoV-2 infection, the median age of infected patients is around 50 years27, which is consistent with the mean age of 54.3 ± 14.7 years observed in this research.
As previously mentioned, the only altered vital sign in the COVID-19 group was SaO2, with a mean of 90%. It should be remembered, however, that these patients were hospitalized, and besides computed tomography evidence of pulmonary involvement, one of the main criteria for hospitalization is a low PO2, which entails a low SaO2. Furthermore, Hu et al. reported that the most common symptoms in COVID-19 patients are fever, dry cough, and fatigue in patients younger than 50 years, with dyspnea added in patients older than 60 years27. Similarly, the main symptoms reported in this research were cough, dyspnea, headache, and fever.
Regarding comorbidities, as previously mentioned, obesity, diabetes, and hypertension were the most frequently reported conditions in this study. These results agree with Ortiz-Brizuela et al., Berumen et al., and Petrova et al., who identified these pathologies as the main risk factors for COVID-19 infection and hospitalization28,29,30.
Regarding blood group, although Zhao et al. reported that blood group O is associated with a lower risk of infection than non-O blood groups31, the most frequent blood type in this research was O, probably because it is the most common blood type in Mexico32, the country where this research took place.
Regarding laboratory blood tests, Velavan and Meyer stated that CRP, d-dimer, ferritin, cardiac troponin, and IL-6 could be used in risk stratification to predict severe and fatal COVID-19 in hospitalized patients33. In this study, neutrophil count, glucose, CRP, LDH, fibrinogen, d-dimer, and ferritin were increased; i.e., the patients in this study presented three of the laboratory risk markers mentioned by Velavan and Meyer, probably because they were hospitalized and required specialized medical attention. As expected, we detected neutrophilia: the primary function of neutrophils is the clearance of pathogens and debris through phagocytosis, the release of neutrophil extracellular traps is needed for viral inactivation and restriction of viral replication, and neutrophils are the first cells recruited in COVID-1934. In addition, hypoxia and hypocapnia are seen in severe COVID-19 cases; Wang et al. reported a median PaO2 of 68 mmHg and a median PaCO2 of 34 mmHg in 138 COVID-19 patients35, values similar to those obtained in this research (PaO2 66 mmHg and PaCO2 31.1 mmHg).
Regarding the FTIR spectra, the spectra obtained were similar to those reported by Caetano et al., showing the characteristics of biological samples16. However, the population evaluated by Caetano et al. was instructed to abstain from food and caffeine products for at least two hours before saliva collection and to rinse their mouths with distilled water. In contrast, in this study a fasting period of at least 8 h was required, and patients who had brushed their teeth or rinsed the oral cavity with mouthwash before sampling were excluded.
As previously mentioned, the FTIR spectra of the COVID-19 group exhibited a slight displacement and a decrease in absorbance in the amide I and amide II regions, which may be attributed to decreased protein production. This agrees with Bojkova et al., who observed decreased expression of proteins, especially those related to cholesterol metabolism, in Caco-2 cells infected by SARS-CoV-236. Likewise, Bouhaddou et al. reported a decrease in the abundance of host proteins and a predominance of viral proteins, consistent with the mechanisms by which other viruses inhibit host protein translation37. Similarly, in Vero cells infected by herpesviruses, protein synthesis and cellular metabolism decrease in the initial stages of infection, as cellular metabolites such as nucleotides, amino acids, and cellular enzymes are consumed36,37,38. Notably, Barauna et al. reported a decrease in the amide I peak in saliva combined with inactivated SARS-CoV-2 compared with saliva without infection39.
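The displacement and absorbance decrease described above can be quantified by locating the maximum of each band within a fixed window and comparing positions and heights between groups. The following is a minimal sketch on synthetic Gaussian bands; the band centers, heights, and widths, as well as the helper names `gaussian` and `peak_position`, are hypothetical and do not come from the study's data.

```python
import numpy as np


def gaussian(wn, center, height, width):
    """Synthetic absorption band (illustrative only)."""
    return height * np.exp(-0.5 * ((wn - center) / width) ** 2)


def peak_position(wn, absorbance, low, high):
    """Wavenumber and absorbance of the maximum inside [low, high] cm^-1."""
    mask = (wn >= low) & (wn <= high)
    idx = np.argmax(absorbance[mask])
    return float(wn[mask][idx]), float(absorbance[mask][idx])


# Wavenumber axis from 1800 to 900 cm^-1 in 1 cm^-1 steps (descending, as in FTIR plots).
wn = np.arange(1800.0, 899.0, -1.0)

# Hypothetical mean spectra: amide I (~1650) and amide II (~1545) bands,
# slightly shifted and attenuated in the "infected" spectrum.
healthy = gaussian(wn, 1652, 1.0, 15) + gaussian(wn, 1545, 0.6, 15)
infected = gaussian(wn, 1648, 0.8, 15) + gaussian(wn, 1541, 0.5, 15)

for name, (lo, hi) in {"amide I": (1600, 1700), "amide II": (1500, 1580)}.items():
    wn_h, a_h = peak_position(wn, healthy, lo, hi)
    wn_i, a_i = peak_position(wn, infected, lo, hi)
    print(f"{name}: shift {wn_i - wn_h:+.0f} cm^-1, absorbance change {a_i - a_h:+.3f}")
```

With real data, the same comparison would be run on baseline-corrected, normalized group-mean spectra rather than on synthetic bands.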
Similarly, the peaks at 1240 cm−1 and 1076 cm−1, which are related to phosphorylated molecules, were increased in COVID-19 patients with respect to healthy individuals. Bouhaddou et al. reported an increase in phosphorylated proteins alongside a decrease in protein abundance, as well as hyperphosphorylation in the CK2 and p38 MAPK pathways related to cytokine production37, which is consistent with the findings of Diamond et al.40. Moreover, Erukhimovitch et al. reported an increase in the peak at 1240 cm−1 in cells infected with herpesvirus38.
Regarding the band at 1030 cm−1, attributed to carbohydrates (including glucose, fructose, and glycogen), it is known that the SARS-CoV-2 spike glycoprotein (S-protein) has 66 glycosylation sites, each of which can be occupied by up to 10 different glycans (carbohydrates) upon infection. After entering the human body through the respiratory tract, viruses usually exploit the sugar chains (glycans) present on the surface of host cells. Thus, the virus is coated with glycans that are resistant to mutation throughout its development41. In this research, the band correlated with carbohydrates showed a higher absorbance in the COVID-19 group, probably due to the high concentration of spike glycoprotein.
The nucleic acid region (1100–850 cm−1) showed a higher absorbance in the COVID-19 group, probably because SARS-CoV-2 can be detected in more than 95% of saliva samples. Moreover, the virus can be cultured from saliva, which shows that it is present in this biofluid. Virus detection in saliva has also been used to monitor viral load dynamics over time42.
Regarding the immune response, the combination of IgG and IgM has been reported to achieve an overall sensitivity of 87.8% and a specificity of 98.9% for detecting SARS-CoV-2; nevertheless, the complexity of the humoral response in COVID-19 has not been fully elucidated, and the relevance of the SARS-CoV-2 antibody response to long-term clinical outcome and viral clearance remains unclear. Furthermore, some authors have reported that the time to IgM positivity ranges from 5 to 10 days after disease onset, whereas IgG positivity occurs between 13 and 21 days. Others have stated that the earliest detection of IgM was at 5 days after symptom onset and the earliest detection of IgG at 7 days after symptom onset43,44,45.
Similarly, IgA has been reported to play an essential role in mucosal immunity, being the most important immunoglobulin for fighting infectious pathogens in the respiratory system46. Furthermore, salivary testing has been described as the most convenient way to measure IgA, which is why it has been used to characterize mucosal immune responses to many viral infections, such as SARS, MERS, influenza, HIV, and RSV. Serum IgA has been detected in COVID-19 patients and appears to be detectable earlier than IgM or IgG antibodies, possibly as early as two days after symptom onset, suggesting that IgA may be the first antibody to appear in response to SARS-CoV-2 infection47. In this research, absorbance changes were observed in the regions related to IgG (1560–1464 cm−1), IgM (1420–1289 cm−1 and 1160–1028 cm−1), and IgA (1285–1237 cm−1), with higher absorbance in the spectra of the COVID-19 group, which is consistent with the reports above.
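The immunoglobulin comparison above relies on absorbance within fixed spectral windows. One way to summarize each window is to integrate the absorbance with the trapezoidal rule over the assigned ranges. The following is a minimal sketch: the window limits are taken from the text, but the use of integrated area (rather than peak height), the assumption of baseline-corrected, normalized spectra, and the helper names `band_area` and `immunoglobulin_areas` are ours.

```python
import numpy as np

# Band windows (cm^-1) as assigned in the text; IgM uses two windows.
IG_BANDS = {
    "IgG": [(1464, 1560)],
    "IgM": [(1289, 1420), (1028, 1160)],
    "IgA": [(1237, 1285)],
}


def band_area(wn, absorbance, low, high):
    """Trapezoidal area under the spectrum between low and high cm^-1."""
    order = np.argsort(wn)  # works for ascending or descending axes
    x, y = wn[order], absorbance[order]
    mask = (x >= low) & (x <= high)
    xs, ys = x[mask], y[mask]
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))


def immunoglobulin_areas(wn, absorbance):
    """Sum of band areas over every window assigned to each immunoglobulin."""
    return {name: sum(band_area(wn, absorbance, lo, hi) for lo, hi in windows)
            for name, windows in IG_BANDS.items()}


# Sanity check on a flat spectrum of absorbance 1.0: each area equals the total window width.
wn = np.arange(1800.0, 899.0, -1.0)
flat = np.ones_like(wn)
print(immunoglobulin_areas(wn, flat))  # {'IgG': 96.0, 'IgM': 263.0, 'IgA': 48.0}
```

Applied per spectrum, these areas could then be compared between groups with a standard two-sample test.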
On the other hand, in the amide I region of proteins (1700–1600 cm−1), the second-derivative spectra of COVID-19 patients showed decreased absorbance and a band displacement, suggesting changes in protein structure and a lower content of these secondary structures. Usoltsev et al., who studied secondary structure changes in human serum albumin under various denaturation conditions, attributed the ranges described here to β-turns (1689–1660 cm−1), α-helices (1660–1650 cm−1), β-sheets (1639–1620 cm−1), and intermolecular β-sheets (1619–1610 cm−1) in the secondary structure of human serum albumin22. This protein has been studied as key to the clinical evolution of COVID-19; Viana-Llamas et al. reported that hypoalbuminemia is a predictor of mortality. Hypoalbuminemia is associated with the inflammatory response in critical illness because the released cytokines and chemokines increase capillary leakage, altering the distribution of albumin between the intravascular and extravascular compartments. Viana-Llamas et al. reported that, in COVID-19 patients, the mean serum albumin concentration at hospital admission was 34.4 ± 4.0 g/L, and 32.3 ± 4.1 g/L in deceased patients48, which is consistent with the results of this research, since the second-derivative analysis of the bands associated with albumin showed a decrease in the COVID-19 group. Moreover, laboratory blood tests showed hypoalbuminemia (3.3 g/dL) at hospital admission; given that the saliva sample was collected within the first 3 weeks of hospitalization, the albumin concentration might have been even lower had the blood sample been taken at the same time as the saliva sample.
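Second-derivative spectra turn overlapping amide I components into separate negative minima, so each secondary-structure window can be summarized by the position and depth of its minimum. The following is a minimal sketch assuming plain finite differences via `np.gradient`; the smoothing and derivative settings used in the study are not stated in this section (Savitzky–Golay differentiation is a common alternative). The window limits follow the albumin assignments cited from Usoltsev et al.; the synthetic bands and the helper names are hypothetical.

```python
import numpy as np

# Secondary-structure windows (cm^-1) for albumin, as cited in the text.
AMIDE_I_BANDS = {
    "beta-turn": (1660, 1689),
    "alpha-helix": (1650, 1660),
    "beta-sheet": (1620, 1639),
    "intermolecular beta-sheet": (1610, 1619),
}


def gaussian(wn, center, height, width):
    """Synthetic absorption band (illustrative only)."""
    return height * np.exp(-0.5 * ((wn - center) / width) ** 2)


def second_derivative(wn, absorbance):
    """d2A/dwn2 by repeated central differences (no smoothing applied)."""
    d1 = np.gradient(absorbance, wn)
    return np.gradient(d1, wn)


def band_minima(wn, d2):
    """Position and value of the most negative second-derivative point per window."""
    out = {}
    for name, (lo, hi) in AMIDE_I_BANDS.items():
        mask = (wn >= lo) & (wn <= hi)
        i = np.argmin(d2[mask])
        out[name] = (float(wn[mask][i]), float(d2[mask][i]))
    return out


# Hypothetical amide I profile: an alpha-helix component plus a beta-sheet component.
wn = np.arange(1700.0, 1599.0, -1.0)
spectrum = gaussian(wn, 1655, 1.0, 6) + gaussian(wn, 1630, 0.7, 6)
d2 = second_derivative(wn, spectrum)
for name, (pos, depth) in band_minima(wn, d2).items():
    print(f"{name}: minimum at {pos:.0f} cm^-1, depth {depth:.4f}")
```

In this representation, a lower content of a given structure would show up as a shallower (less negative) minimum in the group-mean second derivative.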
Furthermore, Diamond et al. reported decreased expression of ACE2 and IL-6 mRNA in saliva samples, which would correspond to the decrease in secondary structures reported by Meirson et al., who, through a bioinformatic analysis, described that the main secondary structure at the binding interface between SARS-CoV-2 and ACE2 is the κ-helix (polyproline II), followed by the α-helix and β-strand, with changes in disulfide bonds40,49. Moreover, Giubertoni et al. assigned the peak at 1619 ± 2 cm−1 to a helical conformation and the peak at 1659 ± 2 cm−1 to α-helix, both of which are also diminished here50.
As expected, the COVID-19 group showed a higher IgA, IgM, and IgG content than the healthy group. Moreover, within the COVID-19 group, IgA was the least expressed immunoglobulin, followed by IgM, with IgG being the most expressed. This may be attributed to the fact that, on average, samples were collected 9.24 days after PCR diagnosis, whereas, as mentioned above, IgM is detected from 5 days after symptom onset and the earliest detection of IgG is at 7 days after symptom onset. Nevertheless, some samples were obtained on the first day of symptoms, so IgA was also detected in this population.
When comparing DNA and nucleic acid content, the COVID-19 group showed a higher content of these molecules. Besides the aforementioned presence of the virus in saliva, Paolini et al. stated that SARS-CoV-2 promotes cell death51, and Zelig et al. reported that in necrotic cell death the DNA is completely unwound, which is why 100% of the DNA is visible to IR at this stage, with an increase of ∼65% in DNA absorbance in necrosis compared with the control. They also reported a decrease in the random coil structure of total protein52, similar to the COVID-19 group of this research. This also agrees with the second-derivative results for proteins, in which the COVID-19 group showed decreased absorption in the random coil range as well as an increase in the bands related to nucleic acids. Moreover, Wood et al. reported that the band at 1078 cm−1, among others, allowed positive COVID-19 samples to be distinguished from negative ones53; in this research, this band was found at 1076 cm−1 and showed a higher absorbance in the COVID-19 group.
On the other hand, as previously mentioned, characterizing two or more populations from the FTIR spectra of their individuals is not an easy task; the more complex the sample, the harder it is to find patterns characteristic of each population. This is because the absorption bands of the different components can overlap with the bands of the components that characterize each sample. Moreover, each type of sample (fluids, tissues, or cells, among others) has its own particularities.
Different methodologies have been proposed to identify populations from FTIR spectra, which facilitates the adoption of a classification method by allowing experimentation to focus only on the most promising ones. In this sense, in another work, we first tested linear classification models to discriminate COVID-19 patients, although these models were affected by spectral overlap arising from the variance of the absorbances/transmittances within each population; this problem can be overcome with a large population thanks to the central limit theorem. In this work, we discriminated between our groups using an MLRM, which was validated with leave-one-out cross-validation (LOOCV), following our previous research.
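The exact form of the MLRM (Eq. (1)) is not reproduced in this section, so the following is only a minimal sketch of the general idea. It assumes an ordinary least-squares linear regression of a 0/1 class label (0 = healthy, 1 = COVID-19) on absorbance features, a decision threshold of 0.5, and LOOCV in which each spectrum is predicted by a model fitted on all the others. The synthetic data and the function names `fit_mlrm`, `predict_mlrm`, and `loocv_predictions` are hypothetical.

```python
import numpy as np


def fit_mlrm(X, y):
    """Least-squares fit of y ~ b0 + X @ b (intercept column prepended)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlrm(coef, X):
    return coef[0] + X @ coef[1:]


def loocv_predictions(X, y):
    """Predict each sample with a model trained on the remaining n - 1 samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        coef = fit_mlrm(X[train], y[train])
        preds[i] = predict_mlrm(coef, X[i:i + 1])[0]
    return preds


# Hypothetical data: 20 spectra per group, 3 absorbance features each.
rng = np.random.default_rng(0)
healthy = rng.normal([1.0, 0.6, 0.3], 0.02, size=(20, 3))
covid = rng.normal([0.8, 0.5, 0.4], 0.02, size=(20, 3))
X = np.vstack([healthy, covid])
y = np.concatenate([np.zeros(20), np.ones(20)])

preds = loocv_predictions(X, y)
accuracy = np.mean((preds >= 0.5) == (y == 1))
print(f"LOOCV accuracy: {accuracy:.3f}")
```

With real spectra, the number of wavenumbers usually exceeds the number of patients; `np.linalg.lstsq` then returns the minimum-norm solution, so restricting the features to one region (e.g., amide I) keeps the fit better conditioned.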
The absorbance variations, and principally the peak displacements associated with viral infection shown in Fig. 2A,B, contributed to the excellent performance of the MLRM. As noted in (1), the slope plays an essential role in MLRM models, because a displacement of any peak means that one population has already reached its maximum absorbance while the other is still increasing, so their slopes have opposite signs. Thus, our results presented in Fig. 6 suggest that the amide I region of proteins (1700–1600 cm−1) is the best region to identify possible virus carriers, as it compacts the predictions within each population while separating them from those of the other population.
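The argument about slopes can be illustrated with two synthetic bands whose maxima are displaced by a few wavenumbers: between the two maxima, one band is already descending while the other is still rising, so their first derivatives have opposite signs. This toy example uses hypothetical band positions and widths, not the study's spectra, and it only illustrates that effect; it does not reproduce Eq. (1).

```python
import numpy as np


def gaussian(wn, center, height, width):
    """Synthetic absorption band (illustrative only)."""
    return height * np.exp(-0.5 * ((wn - center) / width) ** 2)


wn = np.arange(1600.0, 1701.0, 1.0)     # ascending axis, 1 cm^-1 steps
healthy = gaussian(wn, 1652, 1.0, 15)   # hypothetical amide I maximum
infected = gaussian(wn, 1648, 0.8, 15)  # displaced, attenuated maximum

# First derivative (slope) of each spectrum with respect to wavenumber.
slope_h = np.gradient(healthy, wn)
slope_i = np.gradient(infected, wn)

# Wavenumbers where the two slopes have strictly opposite signs.
opposite = wn[slope_h * slope_i < 0]
print(opposite)  # [1649. 1650. 1651.]
```

Only the points strictly between the two maxima qualify; at each maximum the central-difference slope is exactly zero for a symmetric band.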
Although the spectral analysis allowed us to detect the molecular components that characterize SARS-CoV-2-positive patients, and the MLRM-based data analysis allowed us to discriminate these patients from healthy individuals, more assays need to be done. One of them should consider the time elapsed from symptom onset to diagnosis and categorize the population accordingly. Another should corroborate the diagnosis through serological tests (IgA, IgM, and IgG), correlating these results with the FTIR spectra.
Herein, we propose a new diagnostic strategy that could be used for screening owing to its low cost, since the technique does not require consumables, while recognizing RT-PCR as the gold-standard diagnostic test. Nevertheless, there are large discrepancies regarding the effectiveness of RT-PCR. For example, Hellewell et al. reported that the probability of a positive PCR test is 77% four days after infection, decreasing to 50% by ten days and reaching 0% by 30 days after infection, with the probability of detection increasing during the first days after infection54. On the other hand, Jarrom et al. estimated a sensitivity of 87.8%55. In comparison, we reached a sensitivity of 99.2% and a specificity of 100% in the amide I region, although more studies need to be done in a larger population.
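For reference, the sensitivity and specificity quoted above follow the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), where positives are RT-PCR-confirmed COVID-19 cases. A minimal sketch computing both from per-patient labels follows; the example labels are hypothetical and are not the study's confusion matrix.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 4 positives (one missed) and 3 negatives (none misclassified).
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 1.0)
```

In an LOOCV setting, `y_pred` would be the thresholded held-out predictions, so both figures reflect performance on samples not used for fitting.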

