3. Results
3.2 Density Estimation
3.2.5 Test on a different vendor: Hologic Selenia Dimension
As a last test, to further validate the model, the final model, trained on the mammograms of the Siemens device, was applied to mammograms of the same breast phantoms (belonging to the test set) obtained from another type of system: Hologic Selenia Dimension.
As reported in paragraph 2.3, the main difference between the two vendors lies in the correspondence between the breast thickness and the tube voltage (kV) used by the device. Furthermore, the Hologic system switches the filter material from Rh to Ag above a certain breast thickness (70 mm) in order to return to a lower voltage, which did not happen in the Siemens simulations.
Since these differences manifest themselves as different pixel values in the mammograms, they should be almost completely compensated by the normalization for the only-adipose-pixel.
The density predictions for the two systems are reported in the same scatter plot (Figure 3.7), in different colours, to make the comparison easier. In both cases, the normalization for the only-adipose-pixel was performed with the automatic masks.
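The compensation argument can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this section: the image is treated as a raw transmission image, the only-adipose-pixel is taken as the brightest pixel inside the constant-thickness mask (adipose tissue attenuates less than glandular tissue), and the normalization is a simple division. The function name `normalize_by_adipose` and the toy data are hypothetical.

```python
import numpy as np


def normalize_by_adipose(image, mask):
    """Divide an image by a reference only-adipose pixel value.

    Assumption (illustrative only): the reference is the brightest pixel
    inside the constant-thickness mask of a raw transmission image.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask is empty")
    reference = image[mask].max()
    return image / reference


# Toy example: the same phantom seen with two different overall signal levels.
rng = np.random.default_rng(0)
base = rng.uniform(0.2, 1.0, size=(8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

vendor_a = 1000.0 * base  # hypothetical signal level of system A
vendor_b = 1750.0 * base  # hypothetical signal level of system B

norm_a = normalize_by_adipose(vendor_a, mask)
norm_b = normalize_by_adipose(vendor_b, mask)
```

In this toy case the two normalized images coincide because the systems differ only by a multiplicative factor. Spectral differences (kV, filter material) are not purely multiplicative, which is consistent with the compensation being described as almost, rather than fully, complete.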
Fig 3.7 Scatter plot representing the relationship between the Ground-Truth Densities (x-axis) and the Estimated Densities (y-axis). The estimations for the Siemens system are reported in blue and those for the Hologic device in red.
Comparing the estimations for the Hologic device with the previous ones, it can be seen that for low densities the performance does not change much, and the dots approximately follow the bisector (except for a few isolated cases). For densities higher than 40%, instead, the negative bias is present and seems to be more pronounced for the new system. Again, the limited number of cases with density higher than 40% (only 17.61% of the dataset) may have made the model less robust in the upper part of the scale.
Discussion
The proposed algorithm for breast density estimation, which exploits a DL-based method, has been shown to produce accurate predictions from simulated mammographic images. The presented method is fully automatic and can accurately estimate breast density from simulated mammograms of different vendors, without the need for re-training.
A few algorithms for breast density estimation already exist in the literature but, unlike most previous studies [20]-[24], this approach was validated in terms of accuracy against an objective ground truth. This was made possible by the use of patient-based phantoms with known density, obtained from breast CT. Furthermore, the algorithm does not aim at a division into density classes (four in most studies: almost entirely fatty, scattered fibroglandular densities, heterogeneously dense and extremely dense), but at predicting the density as a continuous value, since it involves a linear regression model.
Training the U-Net with the aid of the breast thickness maps helps to locate the zone of the breast with constant thickness, and has paved the way for estimating the breast density from the mammograms automatically (eliminating the need for the thickness maps, which would not be available in the clinic).
However, this work has some limitations that must be taken into account. The study should be considered preliminary mainly for two reasons: the limited dataset size (only 88 different patient phantoms were present in the two views, and even after augmentation the cases numbered 528) and the simplifications intrinsic to the mammogram simulations (only primary, non-scattered x-rays and an ideal detector). Nevertheless, due to the relatively large pixel size after resizing (at most about 0.907 mm in the x-direction and 1.284 mm in the y-direction), the influence of noise and resolution loss should be low. Furthermore, the use of an anti-scatter grid in mammography should also limit the influence of scatter (or the lack thereof) on the results obtained in this work.
The results from all the metrics evaluated in this work were satisfactory, both for the U-Net and for the CNN used for the estimation. The segmentation metrics indicate very accurate masks in every set for both views (CC: DSC_TR = 0.9926, DSC_VL = 0.9616, DSC_TS = 0.9597; MLO: DSC_TR = 0.9915, DSC_VL = 0.9622, DSC_TS = 0.9632). However, looking at Tables 3.1 and 3.2, the automatic masks have a sensitivity higher than the precision, which means that the masks obtained from the U-Net are slightly larger than the manual ones. For this reason, before being used on the test set, the automatic masks were eroded with a circular structuring element with a radius of 3 pixels, to avoid including zones where the breast is no longer at constant thickness.
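The link between sensitivity, precision and mask size, as well as the erosion step, can be sketched as follows. The function names, the toy masks and the use of `scipy.ndimage.binary_erosion` are assumptions made for illustration; they do not reproduce the implementation used in this work.

```python
import numpy as np
from scipy import ndimage


def mask_metrics(pred, truth):
    """Return (DSC, sensitivity, precision) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    dsc = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return dsc, sensitivity, precision


def disk(radius):
    """Boolean disk-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def erode_mask(mask, radius=3):
    """Shrink a binary mask with a disk-shaped structuring element."""
    mask = np.asarray(mask, dtype=bool)
    return ndimage.binary_erosion(mask, structure=disk(radius))


# Toy masks: the "automatic" mask is slightly larger than the "manual" one.
manual = np.zeros((64, 64), dtype=bool)
manual[22:42, 22:42] = True
automatic = np.zeros((64, 64), dtype=bool)
automatic[20:44, 20:44] = True

dsc, sens, prec = mask_metrics(automatic, manual)
eroded = erode_mask(automatic, radius=3)
```

In this toy case the automatic mask fully covers the manual one, so the sensitivity is 1.0 while the precision is about 0.69: a sensitivity above the precision signals over-segmentation. After the erosion, the mask lies entirely inside the manual one.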
Regarding the breast density estimation, the results obtained with the fully automatic method are promising for future studies in this field. Applying the model to the Siemens mammograms, the median absolute error and the median percentage error over the whole test set are 0.0357 [0.0187 - 0.0531] and 14.69% [8.23% - 24.95%], respectively. These results can be considered accurate, especially considering that density is nowadays often estimated visually by radiologists and assigned to one of the four classes previously mentioned.
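A minimal sketch of how such summary statistics could be computed is given below. It assumes that the bracketed values denote the interquartile range (25th to 75th percentile) and that the percentage error is the absolute error relative to the ground-truth density; both are assumptions for illustration, as are the function names and the toy values.

```python
import numpy as np


def density_errors(ground_truth, estimated):
    """Return per-case absolute errors and percentage errors."""
    gt = np.asarray(ground_truth, dtype=float)
    est = np.asarray(estimated, dtype=float)
    abs_err = np.abs(est - gt)
    pct_err = 100.0 * abs_err / gt  # assumes all ground-truth densities > 0
    return abs_err, pct_err


def median_with_iqr(values):
    """Return (median, 25th percentile, 75th percentile)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q1, q3


# Toy values, not taken from the thesis dataset.
gt = np.array([0.10, 0.20, 0.25, 0.40, 0.50])
est = np.array([0.12, 0.19, 0.25, 0.35, 0.40])

abs_err, pct_err = density_errors(gt, est)
mae_med, mae_q1, mae_q3 = median_with_iqr(abs_err)
mpe_med, mpe_q1, mpe_q3 = median_with_iqr(pct_err)
print(f"median AE = {mae_med:.4f} [{mae_q1:.4f} - {mae_q3:.4f}]")
print(f"median PE = {mpe_med:.2f}% [{mpe_q1:.2f}% - {mpe_q3:.2f}%]")
```

The median and quartiles are less sensitive to a few badly estimated cases than the mean and standard deviation, which makes them a reasonable choice for a small test set.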
Looking at the performance of the algorithm in more detail, the median absolute error (mAE) increases with the density (e.g., the mAE for densities between 0.25 and 0.6 is 0.0351, while the mAE for densities above 0.6 is 0.1482). Especially above a density of 50%, the negative bias in the density estimation is evident on the test set (Figure 3.5, panel C). This is not surprising, since most of the cases in the dataset have a density lower than 50% (89.96% of the dataset), which explains why the model is less robust at higher densities. However, this may not be a major problem, since it can probably be solved with a larger dataset, which is one of the upcoming improvements. Furthermore, it is important to remember that most women worldwide have a breast density between 0.05 and 0.25, which is why the high-density cases can be regarded as outliers.
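The density-dependent analysis described above can be sketched as a per-range computation of the median absolute error and of the median signed error (a negative value indicating underestimation). The thresholds 0.25 and 0.6 follow the ranges quoted in the text; the function name, the handling of the range boundaries and the toy values are assumptions for illustration.

```python
import numpy as np


def binned_median_errors(ground_truth, estimated, thresholds=(0.25, 0.6)):
    """Median absolute and median signed error per ground-truth density range.

    Returns a list of (bin_index, n_cases, median_abs_error, median_signed_error).
    Bin 0 holds densities below the first threshold, the last bin those at or
    above the last threshold.
    """
    gt = np.asarray(ground_truth, dtype=float)
    est = np.asarray(estimated, dtype=float)
    signed = est - gt
    bin_index = np.digitize(gt, thresholds)
    results = []
    for b in range(len(thresholds) + 1):
        in_bin = bin_index == b
        if in_bin.any():
            results.append((
                b,
                int(in_bin.sum()),
                float(np.median(np.abs(signed[in_bin]))),
                float(np.median(signed[in_bin])),
            ))
    return results


# Toy values, not taken from the thesis dataset.
gt = np.array([0.10, 0.20, 0.30, 0.50, 0.70, 0.80])
est = np.array([0.11, 0.19, 0.28, 0.46, 0.55, 0.66])

results = binned_median_errors(gt, est)
for b, n, mae, bias in results:
    print(f"bin {b}: n = {n}, mAE = {mae:.4f}, median signed error = {bias:+.4f}")
```

Reporting the number of cases per range alongside the errors makes it easier to judge whether a larger error at high density reflects a systematic bias or simply too few samples.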
The model trained with the Siemens Mammomat Inspiration mammograms also proved to be accurate when applied to mammographic images from another vendor: Hologic Selenia Dimension. Even though the voltage-thickness and filter material-thickness combinations are different, the overall results on the test set remain good, considering that the training was done entirely on another vendor. In particular, the median absolute error is 0.0373 [0.0171 - 0.0699], compared with 0.0357 for the original vendor, and the median percentage error becomes 15.35% [7.46% - 32.36%], compared with 14.69%. In terms of median values, the performances are comparable, thanks to the normalization for the only-adipose-pixel found in the constant-thickness zone. The purpose of this normalization, in addition to correcting any potential bias in pixel values introduced by the x-ray spectrum being discretized for a given thickness range, was to standardise the values among different vendors as much as possible. However, even if the overall performance on the new vendor is acceptable, Figure 3.7 shows that the negative bias becomes more evident when moving to a new vendor. This can again be explained by the lack of breast phantoms with high densities in the training set. Indeed, even though the values do not change much between vendors thanks to the normalization, the low robustness of the model at high densities results in an incorrect estimation of those densities.
Conclusion
The proposed algorithm for breast density estimation produced accurate predictions from simulated mammograms. The method does not require additional data besides the mammograms, thanks to the automatic detection of the constant-thickness zone directly from the mammographic images, carried out by the U-Net.
Compared with previous software or studies, the use of patient-based breast phantoms allows validation in terms of accuracy against an objective ground truth, which is normally not feasible because the true breast density cannot otherwise be determined.
Even if only on simulated mammograms, the algorithm also appears to be accurate when applied to images from a different vendor: the DL model was trained using Siemens mammograms, but the performance was also good on Hologic ones.
In conclusion, the method proposed in this thesis yielded promising results for breast density estimation with deep learning approaches.
Future Works
Considering the above-mentioned limitations, further studies will have to be carried out to complete this work, which is currently to be considered preliminary. First of all, future work has to include a larger dataset to train and test the model; indeed, it is well known that the performance of DL models generally improves as more data become available. It would also be more comprehensive to include patient-based phantoms from different countries, to account for the variability among the world population.
Regarding the mammogram simulation, further studies should include other factors in the image generation, such as scattering, blurring and noise. The simulations would thus be more accurate and closer to real acquisitions, even though the contribution of these factors is minimal in mammography. Furthermore, only two vendors are implemented in this study (Siemens and Hologic), but it would be useful to test the model on more of them (the next ones to be implemented will be Fujifilm and GE).
Ultimately, the model should be re-trained on processed simulated mammograms and evaluated on real patient data.
Bibliography
[1] “European Cancer Information System”, link: Cancer burden statistics and trends across Europe | ECIS (europa.eu) (online).
[2] Constance D. Lehman, Adam Yala, Tal Schuster, Brian Dontchos, Manisha Bahl, Kyle Swanson, Regina Barzilay, “Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation” (2018).
[3] Marco Caballo “Towards Precision Medicine in Breast Cancer Imaging: from 3D Breast CT Radiomics to 4D Perfusion” (2021).
[4] Perry Sprawls, “Mammography Physics and Technology for effective clinical imaging” (online).
[5] “Breast Imaging: Mammography”, Radiology Key (online).
[6] E. N. Manson, V. Atuwo Ampoh, E. Fiagbedzi, J. H. Amuasi, J. J. Flether, C. Schandorf, “Image Noise in Radiography and Tomography: Causes, Effects and Reduction Techniques” (2019).
[7] Vivek Kaul, Sarah Enslin, Seth A. Gross, “History of artificial intelligence in medicine”, Gastrointestinal Endoscopy, Volume 92, Issue 4 (2020).
[8] “Deep Learning vs Machine Learning”, link: Levity.ai/difference-machine-learning-deep-learning (online).
[9] “Simplifying the difference: Machine Learning vs Deep Learning”, link: https://www.scs.org.sg/articles/machine-learning-vs-deep-learning (online).
[10] “Neural Network”, link: https://www.ibm.com/it-it/cloud/learn/neural-networks (online).
[11] Emma Amor, “Understanding Non-Linear Activation Functions in Neural Networks” (2020).
[12] Marco Caballo et al., Phys. Med. Biol. 63, 225017 (2018).
[13] N. Moriakov, J.-J. Sonke, and J. Teuwen, “LIRE: Learned Invertible Reconstruction for Cone Beam CT,” arXiv, 2022. Available: http://arxiv.org/abs/2205.07358. (online)
[14] Andrew M. Hernandez, John M. Boone “Tungsten anode spectral model using interpolating cubic splines: unfiltered x-ray spectra from 20 kV to 640 kV” (2014).