Halgamuge MN. Supervised Machine Learning Algorithms for Bioelectromagnetics: Prediction Models and Feature Selection Techniques Using Data from Weak Radiofrequency Radiation Effect on Human and Animals Cells.
INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020;
17:E4595. [PMID:
32604814 PMCID:
PMC7345599 DOI:
10.3390/ijerph17124595]
[Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/31/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 11/20/2022]
Abstract
The emergence of new technologies to incorporate and analyze data with high-performance computing has expanded our capability to accurately predict any incident. Supervised Machine learning (ML) can be utilized for a fast and consistent prediction, and to obtain the underlying pattern of the data better. We develop a prediction strategy, for the first time, using supervised ML to observe the possible impact of weak radiofrequency electromagnetic field (RF-EMF) on human and animal cells without performing in-vitro laboratory experiments. We extracted laboratory experimental data from 300 peer-reviewed scientific publications (1990-2015) describing 1127 experimental case studies of human and animal cells response to RF-EMF. We used domain knowledge, Principal Component Analysis (PCA), and the Chi-squared feature selection techniques to select six optimal features for computation and cost-efficiency. We then develop grouping or clustering strategies to allocate these selected features into five different laboratory experiment scenarios. The dataset has been tested with ten different classifiers, and the outputs are estimated using the k-fold cross-validation method. The assessment of a classifier's prediction performance is critical for assessing its suitability. Hence, a detailed comparison of the percentage of the model accuracy (PCC), Root Mean Squared Error (RMSE), precision, sensitivity (recall), 1 - specificity, Area under the ROC Curve (AUC), and precision-recall (PRC Area) for each classification method were observed. Our findings suggest that the Random Forest algorithm exceeds in all groups in terms of all performance measures and shows AUC = 0.903 where k-fold = 60. A robust correlation was observed in the specific absorption rate (SAR) with frequency and cumulative effect or exposure time with SAR×time (impact of accumulated SAR within the exposure time) of RF-EMF. In contrast, the relationship between frequency and exposure time was not significant. In future, with more experimental data, the sample size can be increased, leading to more accurate work.
Collapse