Prediction of coating thickness for polyelectrolyte multilayers via machine learning

Data collection

The first step of the work consisted in data collection from the literature, which represents the first dataset. It should be stated that currently in the literature no established database is available related to LbL coating thickness, and the available experimental data is relatively scarce compared to the number of LbL related articles. The second dataset was based on the QCM-D experiments done in the laboratory. For the data extracted from the literature, different ways of thickness calculation/estimation were used, such as AFM, ellipsometry, confocal microscopy. Coating thickness determined by these methods may differ, but due to the limited amount of the available data, we selected to include all the data regardless of the thickness measurement method. All the results extracted from the literature, as well as obtained in the laboratory, were entered in the tables describing different parameters (Table S1). Of note, all the multilayers used in the study were prepared by dip-coating or a similar technique (simple polymer solution deposition on the substrate followed by adsorption time and rinsing). Thus, the coating preparation method was not among the parameters influencing thickness.

Influence of different construction parameters on coating thickness

As a first step, distribution of the thickness values in the coatings made of different polymers was studied (Fig. 2). The results show that for polycations, the coatings made of poly(L-lysine) (PLL) have the greatest thickness median values with large interquartile range (IQR), which overlaps the thickness distribution of chitosan (CHI)-made coatings (Fig. 2A). In Fig. 2B, coatings having poly(L-glutamic acid) (PGA) have the greatest thickness median value with IQR overlapping with hyaluronic acid (HA)-containing coatings thickness distribution. The large thickness distribution of the LbL films containing the aforementioned polymers is probably due to the high frequency of their utilization and therefore to the wide variety of molecular weights (MW) that have been used.

Next, linear relationships between the coating thickness and different parameters were evaluated using the Pearson correlation, and non-linear relationships were analyzed with the predictive power score (PPS) method (Fig. 3). The PPS allows seeing non-symmetrical relationships between variables: features located on the x-axis are independent variables (influencers), and features on the y-axis are dependent variables (influenced by x-features).

The results show that concentrations of polymers and the number of bilayers in the coating have a strong positive linear relationship with the resulting thickness (Fig. 3A), while polyanion molecular weight and buffer properties have some non-linear relationships with this property (Fig. 3B). The effect of concentration and the number of bilayers are intuitive, however, their relative importance in an overall tendency sense cannot be extracted from a single or few types of LbL films, which is the currently common practice. Similarly, the discrepancy between the relative importance of polyanion (PA) MW and polycation (PC) MW is also not evident.

One of the most important features which have strong linear relationships with the coating thickness is the number of bilayers in the coating (Fig. 3).

As stated above, this dependency is largely obvious, and therefore, the data had to be unified by this number. For this, we decided to calculate the thickness at eight bilayers for each coating in the literature-generated data using the growth function (Eq. 1).

The growth function, describing changes in coating thickness d with the number of layers, N, has three coefficients, a₀, a, and b, which vary depending on the coatings^16,17. In this function parameter, b defines function curvature: for b ≥ 0.05, the growth is exponential, for b < 0.05, it is nearly linear.

$$d = a_{0} + a*e^{bN}$$

(1)

We extracted data on the dynamics of each coating growth from the original research papers. Then we used these data to calculate coefficients of growth function. Having these coefficients, we defined the thickness of coatings having eight bilayers (N = 16). With this, we created a new target value, the thickness of the 8-bilayer coating, which was not dependent on the number of bilayers.

In this configuration, the type of polyanion, its concentration and its molecular weight were found to positively influence the thickness of the coating, while the charge density of polyanion negatively correlated with the coating thickness (Fig. 4A). Three features (on the x-axis) contributed to the coating thickness (on the y-axis) (Fig. 4B): type of polyanion, polyanion unit molecular weight, and resulting molecular weight. The reason why polyanion characteristics appear more important than those of polycation can be explained by pH values at which polyelectrolyte multilayers were built (pH values are close to the pKa of the acid groups).

Coating thickness predictions

After the determination of the most influential parameters, the next step was the build-up of a predictive model. We constructed a Bagging Regressor model¹⁸ to make predictions about coating thickness using ten features from the original data set: presence of HA, presence of poly(styrene sulfonate) (PSS), resulting MW of the polycation and of the polyanion, unit MW of the polycation and of the polyanion, concentration of polycation and concentration of polyanion, the concentration of NaCl, and charge density of polyanion. Two quantitative structure–property relationship (QSPR) regression models were constructed using the selected features, Bagging Regressor and support vector regression (SVR) (Table 1).

Table 1 Performance of the constructed models measured as root-mean-square error (RMSE, nm) for three data sets. QSPR: quantitative structure–property relationship, SVR: support vector regression.

These models have smaller root-mean-square error (RMSE) values for the validation set than the Bagging Regressor constructed using the original dataset features. This fact indicates their greater potential for generalization. The model performance was evaluated with RMSE values calculated for the training set, test set, and validation set. All the models were fivefold cross-validated. All metrics are mean values for scores from different folds from cross-validation.

However, we encountered a classical Machine Learning challenge: overfitting, when the model makes good predictions for the instances it was built on (training set), but fails to “generalize”, i.e. make good predictions for the unseen items (validation and test sets). Therefore, the large gap between training and test/validation error values is the major sign of overfitting. This is the case for the Bagging Regressor constructed on the original data set features. As we can see, this model generates a large RMSE value for the validation set, which is 4 times larger than the error for the training set. From here, we conclude that polymers as specific chemical entities are not good features by themselves, and generating numerical features that describe the chemical properties of polymers can improve the model performance.

To get features of a molecule, firstly we had to get information about its structure, which is commonly represented in simplified molecular-input line-entry system (SMILES) format, available in the PubChem database. This information is further used to predict molecular features by deep learning pre-trained models available in the DeepChem library.

For each polycation and polyanion, 123 molecular descriptors were generated, therefore each coating in the dataset was characterized by 246 molecular descriptors. Many molecular features have a moderate correlation with the thickness of the coating, so we assume that they can be used to predict this target value (Fig. 5). Mostly, molecular descriptors with high correlations with the coating thickness are related to polyanions, not polycations; this is in line with the observations in the previous section.

In the next step, we combined features generated by DeepChem and numerical features of the polymers from the original data set. After that, we performed feature elimination with the recursive feature elimination (RFE) algorithm leaving ten features that will be used by the models. These features are MW of the polycation, MW of the polyanion, NaCl concentration, six polyanion RDKit descriptors (MinAbsEStateIndex, FpDensityMorgan3, BalabanJ, PEOE_VSA8, VSA_EState2, VSA_EState6), and one polycation descriptor Kappa1. The most significant molecular descriptors demonstrate the importance of the topological features of the polyanions (such as Balaban distance connectivity index (BalabanJ) and also polycations (Kappa1) in the formation of the supramolecular LbL assemblies in addition to the electrostatic interactions which are described by molecular operation environment (MOE) type electrotopological descriptors. Descriptors related to van der Walls forces and also partial charges underline the highly intricate nature of the LbL film formation at molecular level.

As the last step, in order to further test the generalization capacity of the final model, we have tested polymers which were not in the training set that are described solely by molecular descriptors.

In this configuration, we observed more inaccurate predictions (Table 2, Fig. 6). However, this could be expected, given the size of the training dataset. As we can see, the best results are obtained with the use of the QSPR RFE/Bagging regressor. It can predict the thickness of the coating better than two other models. We can also see that predictions of QSPR models are better than for the model RFE/Bagging constructed with original features, which predicts almost constant thickness values for all coatings (302 ± 25 nm). Predictions of QSPR models are correlating, and there are four coatings for which both regressors failed to predict thickness correctly (relative error > 100%): PAR30/PSS0.2, PAR30/CARiota_2, PAR30/PSS4, PAR30/CARiota_1 (Fig. 6B).

Table 2 The thickness of the coatings predicted with three models and thickness determined experimentally (ground truth).

Two other coatings, that have inaccurate predictions, contain CARiota polymer which was not in the dataset during training. The fact that the model fails to make predictions for the coatings which are made of unseen polymers confirms the assumption the generalization is not complete. It is interesting that despite this, there is one coating with CARiota for which relative error is small (47%), and the model was able to make an accurate prediction in this case.

Below, we discuss some of the possible reasons that may have caused large errors in the predictions made by the best of the constructed models, QSPR RFE/Bagging. For the PAR30/PSS0.2 and PAR30/PSS4, a large error may be caused by the too low molecular weight of the PSS polyanion compared to the one that the model has seen in the training set. The smallest polyanion in the dataset used for training was DEX with MW = 7.2 kDa, hereas in the validation set we have coatings where PSS has lower MW values (0.2 and 4 kDa). Moreover, in the training set, PSS MW was greater than in the validation set (60–70 kDa), so the combination of polyanion/MW of polyanion is new to the model. Because the model has made inaccurate predictions for the kind of polymers that it did not see during training, we assume that it can not generalize well and this is the reason for the prediction failures.

In the end, we observe that the model makes acceptable predictions for the coatings made of combinations of polymers that were present in the data set (like CHI/FUC, CHI/HA, and PAR/DEX). However, it fails to do so for the unseen polymers tested, due to the lack of extensive training data. We believe that the model potentially can be improved by generating and using more data for training. More data points will cover more chemical parameters, and the dependencies between features and thickness will be more informative and will have more predicting power.

Source link

Vasiprak Blog

Data collection

Influence of different construction parameters on coating thickness

Coating thickness predictions

You might also like

Among surveyed adults, those with depressive symptoms were more likely to believe false statements about COVID-19 vaccines — ScienceDaily

Physicist develops next-generation cancer photodrug — ScienceDaily

People with high-risk prediabetes benefit from intensive lifestyle intervention — ScienceDaily

Stay tuned!