
A generative adversarial network for synthetization of regions of interest based on digital mammograms

This section presents the results of the experiments described in the previous section and compares the model's performance with that of similar models. We also discuss the benefits of the proposed model and the training techniques adopted to achieve a stable, converging GAN model that does not fail during training. We allowed the GAN model to train on each category of abnormality for as long as the generated images continued to improve; as a result, the number of training epochs differs across abnormality classes. We plotted the loss and accuracy of the generator and discriminator during training and during image synthesis.

The loss values and accuracies obtained during training on samples with architectural distortion, asymmetry, microcalcification, and mass are plotted in Figs. 8, 9, 10, and 11, respectively. These plots illustrate how our model learns to generate images with different abnormalities. The losses of both the generator (G) and the discriminator (D) were recorded, and the plots show how each rises and falls over the course of training.
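For reference, the per-step losses plotted in such figures are typically binary cross-entropy terms. The minimal sketch below assumes the standard non-saturating GAN formulation; this section does not state the exact loss functions, so the averaging of the real and fake discriminator terms and the function names (`bce`, `gan_losses`) are illustrative assumptions. It shows why a confident discriminator drives its own loss down while pushing the generator's loss up.

```python
import numpy as np


def bce(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    labels = np.asarray(labels, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def gan_losses(d_real, d_fake):
    """Return (d_loss, g_loss) for one training step (assumed standard GAN losses).

    d_real: discriminator probabilities on a batch of real ROIs.
    d_fake: discriminator probabilities on a batch of generated ROIs.
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # Discriminator: real ROIs are labelled 1, generated ROIs 0 (the two terms are averaged).
    d_loss = 0.5 * (bce(np.ones_like(d_real), d_real) + bce(np.zeros_like(d_fake), d_fake))
    # Generator (non-saturating form): rewarded when D labels its ROIs as real.
    g_loss = bce(np.ones_like(d_fake), d_fake)
    return d_loss, g_loss


# A confident discriminator: low D loss, high G loss.
d_loss, g_loss = gan_losses([0.9, 0.8], [0.1, 0.2])
print(f"D loss {d_loss:.3f}, G loss {g_loss:.3f}")
```

As the discriminator becomes better at separating real from generated ROIs, its outputs on generated samples approach 0, which lowers `d_loss` and raises `g_loss`; this is the opposite movement of the two curves described above.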

Figure 8

Training output of the proposed GAN model for architectural distortion, showing how the generator learns to synthesize images similar to real samples with architectural distortion.

Figure 9

Training output of the proposed GAN model for asymmetry, showing how the generator learns to synthesize images similar to real samples with asymmetry.

Figure 10

Training output of the proposed GAN model for microcalcification, showing how the generator learns to synthesize images similar to real samples with microcalcification.

Figure 11

Training output of the proposed GAN model for mass, showing how the generator learns to synthesize images similar to real samples with mass.

We observed the training of the GAN model on the samples with architectural distortion and found that the generator is able to generate images with an average accuracy of 85%, as shown in Fig. 8. The same figure shows the generator's loss rising while the discriminator's loss falls. The loss curves for asymmetry, microcalcification, and mass follow a similar trend, which indicates a progressive and useful learning pattern. The accuracy of the samples generated for architectural distortion, asymmetry, microcalcification, and mass has also been plotted. For instance, the model appears to learn quickly to generate samples of asymmetry and microcalcification with significant accuracy, whereas for mass abnormalities it learns more slowly to reach sufficient accuracy. Furthermore, the accuracies on the real and generated (fake) images were summed, expressed on a 100% scale, and recorded during training. This combined value was computed to assess whether the accuracies on real and fake images, taken together, approach those expected when the generated images are of good quality.
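A minimal sketch of how such a combined real/fake accuracy could be tracked is shown below. It assumes the accuracies are the discriminator's classification accuracies at a 0.5 threshold and that "summed and evaluated to 100%" means the two percentages are added and halved; the text does not state the exact normalization, and the function names are hypothetical.

```python
import numpy as np


def real_fake_accuracy(d_real, d_fake, threshold=0.5):
    """Discriminator accuracy (in %) on a real batch and on a generated batch."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    acc_real = 100.0 * np.mean(d_real >= threshold)  # real ROIs classified as real
    acc_fake = 100.0 * np.mean(d_fake < threshold)   # generated ROIs classified as fake
    return float(acc_real), float(acc_fake)


def combined_accuracy(acc_real, acc_fake):
    """Sum the two accuracies, then halve so the result stays on a 0-100% scale (assumed)."""
    return (acc_real + acc_fake) / 2.0


acc_real, acc_fake = real_fake_accuracy([0.9, 0.6, 0.4, 0.8], [0.1, 0.7, 0.6, 0.2])
print(acc_real, acc_fake, combined_accuracy(acc_real, acc_fake))
```

Under this reading, a generator that fools the discriminator lowers `acc_fake` and therefore the combined value, which is one way to interpret the early-training behavior reported for asymmetry below.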

The outcomes of this analysis are shown in Fig. 12a–d for architectural distortion, asymmetry, microcalcification, and mass, respectively. For samples with architectural distortion, the GAN model improves significantly between epochs 20,000 and 25,000. A similar result is seen for microcalcification and mass abnormalities, as the combined accuracy on fake and real images improves progressively, although microcalcification shows spurious performance in the early phase of training. Meanwhile, the discriminator performs poorly on samples with asymmetry, as the generator is able to fool it in the early phase of training.

Figure 12

Plot of the combined real and generated (fake) accuracies for (a) architectural distortion, (b) asymmetry, (c) microcalcification, and (d) mass during training, expressed on a 100% scale.

We collected sample images generated during training for the four classes of abnormalities, namely, architectural distortion, asymmetry, microcalcification, and mass. Figure 13 shows these samples, illustrating how the GAN model progresses toward synthesizing good-quality ROI-based digital mammograms for those abnormalities. The four cases are shown in Fig. 13a–d for architectural distortion, asymmetry, microcalcification, and mass, respectively.

Figure 13

(a) Sample images generated during training on inputs with architectural distortion. (b) Sample images generated during training on inputs with asymmetry. (c) Sample images generated during training on inputs with microcalcification. (d) Sample images generated during training on inputs with mass.

The model state at the epoch where the outputs reach good quality is saved and used to synthesize new samples that support convolutional neural network (CNN) models aimed at classifying abnormalities in digital mammograms. Using these stored models for each abnormality, we generated sample images and applied the computational metrics described in the “Performance evaluation metrics” section. Tables 6, 7, 8, and 9 list the values obtained for the PSNR, SSIM, MSE, FSIM, BRISQUE, PIQE, and NIQE metrics for architectural distortion, asymmetry, microcalcification, and mass, respectively. Ten (10) real samples were drawn for each abnormality and compared with corresponding synthesized samples to make the analysis more robust. To illustrate the distribution of these metrics for each abnormality, we use the boxplots in Figs. 14, 15, 16, and 17.
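The evaluation protocol above (ten real samples per abnormality, each compared with a synthesized counterpart) can be sketched as follows. This is an illustration under assumptions: the text does not say how real and synthesized samples are matched, so random pairing is used here, and `evaluate_pairs` and the stand-in data are hypothetical. The per-sample values are what a boxplot displays, and their mean is what a summary table would report.

```python
import numpy as np


def evaluate_pairs(real_images, synth_images, metrics, n_samples=10, seed=0):
    """Score n randomly drawn (real, synthesized) image pairs with each metric.

    real_images, synth_images: arrays of shape (N, H, W) for one abnormality class.
    metrics: dict mapping a metric name to a function f(real, synth) -> float.
    Returns {name: (per_sample_values, mean)}.
    """
    rng = np.random.default_rng(seed)
    # Random pairing is an assumption; the study's matching rule is not stated here.
    idx_real = rng.choice(len(real_images), size=n_samples, replace=False)
    idx_synth = rng.choice(len(synth_images), size=n_samples, replace=False)
    results = {}
    for name, fn in metrics.items():
        values = np.array([fn(real_images[i], synth_images[j])
                           for i, j in zip(idx_real, idx_synth)])
        results[name] = (values, float(values.mean()))
    return results


def mse(a, b):
    """Mean squared pixel difference (one example metric)."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))


# Hypothetical stand-in data: 20 random 64x64 "real" and "synthesized" ROIs.
data_rng = np.random.default_rng(42)
real = data_rng.integers(0, 256, size=(20, 64, 64)).astype(np.float64)
synth = data_rng.integers(0, 256, size=(20, 64, 64)).astype(np.float64)
values, mean_mse = evaluate_pairs(real, synth, {"MSE": mse})["MSE"]
print(len(values), round(mean_mse, 1))
```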

Table 6 Quantitative image quality analysis of ten (10) randomly selected synthesized images with architectural distortion (AD), using reference-based, nonreference-based, and feature-based metrics.
Table 7 Quantitative image quality analysis of ten (10) randomly selected synthesized images with asymmetry (ASY), using reference-based, nonreference-based, and feature-based metrics.
Table 8 Quantitative image quality analysis of ten (10) randomly selected synthesized images with microcalcification (CALC), using reference-based, nonreference-based, and feature-based metrics.
Table 9 Quantitative image quality analysis of ten (10) randomly selected synthesized images with mass (MS), using reference-based, nonreference-based, and feature-based metrics.
Figure 14

Boxplot showing the distribution of values obtained for ten randomly selected samples of architectural distortion for the computational metrics PSNR, SSIM, FSIM, BRISQUE, PIQE, and NIQE.

Figure 15

Boxplot showing the distribution of values obtained for ten randomly selected samples of asymmetry for the computational metrics PSNR, SSIM, FSIM, BRISQUE, PIQE, and NIQE.

Figure 16

Boxplot showing the distribution of values obtained for ten randomly selected samples of microcalcification for the computational metrics PSNR, SSIM, MSE, FSIM, BRISQUE, PIQE, and NIQE.

Figure 17

Boxplot showing the distribution of values obtained for ten randomly selected samples of mass for the computational metrics PSNR, SSIM, MSE, FSIM, BRISQUE, PIQE, and NIQE.

The results of the reference-based metrics applied to the synthesized images, listed in Tables 6, 7, and 8, reveal that the MSE (the average squared difference between the real and synthesized images) is small compared with that in Table 9. This indicates that our GAN model learns the representation of samples with architectural distortion, asymmetry, and microcalcification abnormalities well, although it encounters some challenges with mass abnormalities. To further investigate image quality, we computed the PSNR, SSIM, DSSIM, and FSIM metrics. For PSNR, average values of 27.97, 27.69, and 27.93 dB were obtained for architectural distortion, asymmetry, and microcalcification, respectively, whereas the mass abnormality yielded 8.26 dB. For SSIM and DSSIM, which also evaluate image quality, the paired values were 0.04 and 0.48, 0.74 and 0.13, and 0.05 and 0.48 for architectural distortion, asymmetry, and microcalcification, respectively, and 0.35 and 0.32 for mass abnormalities. With FSIM, the GAN model obtained average values of 0.88, 0.78, 0.85, and 0.85 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively, which are very competitive. Overall, the reference-based metrics imply that the images generated by the proposed GAN model are acceptable. The distributions of these metrics across the ten (10) synthesized samples are plotted in Figs. 14, 15, 16, and 17 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively.
Because only a few outliers appear in these boxplots, the distributions suggest that the GAN model's success in generating samples is determined primarily by the features it has learned.
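The reference-based quantities discussed above can be sketched with NumPy as shown below. The sketch assumes 8-bit images (a data range of 255) and uses the common definition DSSIM = (1 − SSIM)/2, which is consistent with the paired SSIM/DSSIM values reported above (for example, 0.74 gives 0.13 and 0.04 gives 0.48). SSIM and FSIM themselves are not reimplemented here, and the function names are illustrative.

```python
import numpy as np


def mse(real, synth):
    """Mean squared pixel difference between a real and a synthesized image."""
    real = np.asarray(real, dtype=np.float64)
    synth = np.asarray(synth, dtype=np.float64)
    return float(np.mean((real - synth) ** 2))


def psnr(real, synth, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE); assumes 8-bit data."""
    err = mse(real, synth)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))


def dssim(ssim_value):
    """Structural dissimilarity derived from an SSIM score."""
    return (1.0 - ssim_value) / 2.0


real = np.zeros((4, 4))
synth = np.full((4, 4), 10.0)       # every pixel off by 10 grey levels
print(mse(real, synth))             # 100.0
print(round(psnr(real, synth), 2))  # 28.13
print(round(dssim(0.74), 2))        # 0.13, matching the asymmetry pair above
```

Because PSNR is a decreasing function of MSE, the large MSE reported for mass abnormalities and their much lower PSNR describe the same gap from two directions.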

Nonreference-based metrics were also used to evaluate the proposed GAN model, and the results are listed in Tables 6, 7, 8, and 9 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively. For all three metrics in this category, lower scores indicate better perceptual quality. For BRISQUE, average values of 25.84, 115.58, 20.39, and 55.59 were obtained for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively. The GAN model performed appreciably well for architectural distortion and microcalcification, whereas asymmetry and mass abnormalities trailed behind. Similarly, for PIQE, average values of 23.44, 61.07, 16.35, and 69.91 were obtained for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively. The proposed GAN model shows a similar pattern here, with architectural distortion and microcalcification yielding better outcomes than asymmetry and mass. This consistency indicates that the model maintains a stable synthesis pattern based on what it learned for each abnormality. Finally, NIQE was evaluated for all abnormalities, yielding average values of 27.60, 58.62, 33.18, and 144.74 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively. The distributions of these three nonreference-based metrics over the ten (10) synthesized samples are plotted in Figs. 14, 15, 16, and 17 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively.
Consistent with their average values, the BRISQUE, PIQE, and NIQE values for architectural distortion, microcalcification, and asymmetry are closely spaced with minimal outliers, whereas those for mass abnormalities show more outliers and a wider spread.
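The text does not state which rule the boxplots in Figs. 14–17 use to flag outliers; the conventional choice, and the default whisker length of matplotlib's `boxplot` (`whis=1.5`), is Tukey's 1.5 × IQR fence. A minimal sketch under that assumption follows; the scores are hypothetical values, not data from this study.

```python
import numpy as np


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    v = np.asarray(values, dtype=np.float64)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return v[(v < low) | (v > high)]


# Hypothetical metric values for ten synthesized samples (not data from this study).
scores = [20, 21, 22, 22, 23, 23, 24, 25, 26, 90]
print(boxplot_outliers(scores))  # [90.]
```

A wider interquartile range, as described for mass abnormalities, widens the fences as well, so "more outliers" and "wider spread" are distinct observations worth reporting separately.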

Feature-based metrics were also used to evaluate the GAN model proposed in this study. Specifically, the geometry score (GS) and the Fréchet inception distance (FID) were computed to compare the feature composition of the real images with that of the images synthesized by the GAN model. For architectural distortion, asymmetry, microcalcification, and mass abnormalities alike, GS tends toward zero (0), confirming the good visual quality of the images generated by the proposed GAN model and indicating that, compared with the real distribution, the synthesized images are almost identical and differ little in their topology. Similarly, FID measures the distance between the feature distributions of the real and synthesized images, with lower values indicating closer distributions. As seen in Tables 6, 7, 8, and 9 for architectural distortion, asymmetry, microcalcification, and mass abnormalities, respectively, the FID values obtained are significant for all abnormalities.
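FID is the Fréchet distance between two Gaussians fitted to feature vectors of the real and synthesized images: ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). It is conventionally computed on Inception-v3 pooling features; the feature extractor used for this study is not stated in this section, so the sketch below takes arbitrary feature arrays and uses random placeholders. It relies only on NumPy, computing the trace of the matrix square root through an equivalent symmetric form. The geometry score is not sketched, since it needs topological (persistent homology) machinery beyond a short example.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real, feats_synth):
    """Frechet distance between Gaussians fitted to two sets of feature vectors.

    feats_*: arrays of shape (n_samples, n_features), e.g. embeddings of real and
    synthesized ROIs produced by a pretrained network (not shown here).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_synth.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_synth, rowvar=False)
    # Tr(sqrtm(S1 @ S2)) equals Tr(sqrtm(R @ S2 @ R)) with R = sqrtm(S1); the
    # latter matrix is symmetric PSD, so eigh can be used safely.
    r1 = _sqrtm_psd(s1)
    inner = r1 @ s2 @ r1
    inner = (inner + inner.T) / 2.0  # enforce exact symmetry
    tr_covmean = np.trace(_sqrtm_psd(inner))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))               # hypothetical feature vectors
print(frechet_distance(feats, feats))           # close to 0 for identical sets
print(frechet_distance(feats, feats + 1.0))     # close to 8: unit shift in 8 features
```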

In Fig. 18, sample images synthesized using the fully trained GAN model are presented. These represent regions of interest (ROIs) with different forms of abnormalities.

Figure 18

Sample images with architectural distortion synthesized over 1000 iterations using the fully trained generator.

The results plotted in Fig. 19a–c present the loss values and accuracy obtained for the first fifty (50) samples generated with the trained GAN model for architectural distortion, asymmetry, and microcalcification abnormalities, respectively. Interestingly, the accuracy for asymmetry remains consistently at 0.1, whereas those for architectural distortion and microcalcification peak at approximately 0.7 and 0.78, respectively. To compare the performance of the proposed ROImammoGAN with state-of-the-art image synthesis models, we carried out a comparative analysis and present the results in Table 10. A corresponding plot of the SSIM, PSNR, and MSE values in the table is shown in Fig. 20.

Figure 19

Plot of accuracy and loss values for testing the trained model on samples with (a) architectural distortion, (b) asymmetry, and (c) microcalcification.

Table 10 Comparison of the performance of the GAN proposed in this study with state-of-the-art GANs using reference-based metrics.
Figure 20

Graphical comparison of the performance of the proposed GAN model with similar state-of-the-art medical image GANs: (a) SSIM, (b) PSNR, and (c) MSE.

The comparison in Table 10 reveals that, based on SSIM, which measures the structural similarity between the real and synthesized images, the proposed ROImammoGAN (0.8000) trails cGAN, perceptual GAN, SC-GAN, and MedGAN, which achieve 0.8960, 0.9071, 0.9046, and 0.9160, respectively. Nevertheless, our value lies in a range comparable to those of similar studies, which indicates that the quality of our generated samples is significant. This comparison is plotted in Fig. 20a. We also compared our model with the other state-of-the-art GAN models using structural dissimilarity (DSSIM), for which lower values are better: cGAN, perceptual GAN, SC-GAN, and MedGAN obtain 0.05, 0.05, 0.05, and 0.04, respectively, compared with 0.10 for our model, so all of these values, including ours, are good. PSNR, which measures the fidelity of a synthesized image with respect to the real image, was also used to compare our GAN model with similar models. Our GAN model performs better, attaining a PSNR of 27.72 dB compared with 23.65, 24.20, 24.12, and 24.62 dB for cGAN, perceptual GAN, SC-GAN, and MedGAN, respectively. These values are illustrated in Fig. 20b and further confirm that the images synthesized by our proposed GAN model are of acceptable quality. Finally, for MSE, Table 10 shows that our proposed GAN model achieves the lowest mean squared error among the state-of-the-art GAN models used for comparison, as the plot in Fig. 20c also confirms. In summary, the experimental results demonstrate that the proposed ROImammoGAN is useful for generating image samples of different abnormalities in digital mammography images.
Therefore, the outcome of this study is a GAN model capable of synthesizing ROI-based image samples in the categories of architectural distortion, asymmetry, microcalcification, and mass abnormalities. These synthesized images may be used to augment class-imbalanced datasets, which may in turn be used to train CNN architectures for classification.
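The Table 10 figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes DSSIM = (1 − SSIM)/2, which reproduces the quoted DSSIM values after rounding, and converts the PSNR gap between ROImammoGAN and MedGAN into an MSE ratio. That conversion only holds when both models are evaluated on the same intensity range and data, which comparisons across separate studies need not satisfy, so the ratio is illustrative rather than a reported result.

```python
# SSIM values quoted in the text for Table 10.
ssim = {"cGAN": 0.8960, "perceptual GAN": 0.9071, "SC-GAN": 0.9046,
        "MedGAN": 0.9160, "ROImammoGAN": 0.8000}

# DSSIM = (1 - SSIM) / 2, rounded to two decimals as in the text.
dssim = {name: round((1.0 - s) / 2.0, 2) for name, s in ssim.items()}
print(dssim)

# Since PSNR = 10 * log10(MAX^2 / MSE), a PSNR gap of delta dB between two
# models evaluated on the same data range means an MSE ratio of 10 ** (delta / 10).
psnr_gap = 27.72 - 24.62  # ROImammoGAN vs MedGAN, values quoted above
mse_ratio = 10 ** (psnr_gap / 10)
print(f"PSNR gap {psnr_gap:.2f} dB -> MSE about {mse_ratio:.2f}x lower")
```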
