1
Li M, Guo H, Wang K, Kang C, Yin Y, Zhang H. AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification. Comput Biol Med 2024; 177:108614. [PMID: 38796884 DOI: 10.1016/j.compbiomed.2024.108614] [Received: 10/07/2023] [Revised: 02/27/2024] [Accepted: 05/11/2024] [Indexed: 05/29/2024]
Abstract
Integration analysis of cancer multi-omics data for pan-cancer classification has the potential for clinical applications in various aspects such as tumor diagnosis, analyzing clinically significant features, and providing precision medicine. In these applications, embedding and feature selection on high-dimensional multi-omics data are clinically necessary. Recently, deep learning algorithms have become the most promising methods for cancer multi-omics integration analysis, owing to their powerful capability of capturing nonlinear relationships. Developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers in view of high dimensionality and heterogeneity. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding by a multi2multi autoencoder based on the adversarial variational Bayes method and further performs feature selection utilizing a dual-net-based feature ranking method. AVBAE-MODFR utilizes AVBAE to pre-train the network parameters, which improves the classification performance and enhances feature ranking stability in MODFR. Firstly, AVBAE learns high-quality representation among multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture to distinguish the latent distributions for updating forward variational parameters. Secondly, we propose MODFR to simultaneously evaluate multi-omics feature importance for feature selection by training a designed multi2one selector network, where the efficient evaluation approach based on the average gradient of random mask subsets can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare AVBAE-MODFR with four state-of-the-art (SOTA) methods for each phase. The results show the superiority of AVBAE-MODFR over the SOTA methods.
Affiliation(s)
- Minghe Li
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Huike Guo
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Keao Wang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Chuanze Kang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Yanbin Yin
  - Department of Food Science and Technology, University of Nebraska - Lincoln, NE, USA
- Han Zhang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
2
Zayed A, Belhadj N, Ben Khalifa K, Bedoui MH, Valderrama C. Efficient Generalized Electroencephalography-Based Drowsiness Detection Approach with Minimal Electrodes. Sensors (Basel, Switzerland) 2024; 24:4256. [PMID: 39001037 PMCID: PMC11244425 DOI: 10.3390/s24134256] [Received: 05/13/2024] [Revised: 06/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]
Abstract
Drowsiness is a main factor for various costly defects, even fatal accidents in areas such as construction, transportation, industry and medicine, due to the lack of monitoring vigilance in the mentioned areas. The implementation of a drowsiness detection system can greatly help to reduce the defects and accident rates by alerting individuals when they enter a drowsy state. This research proposes an electroencephalography (EEG)-based approach for detecting drowsiness. EEG signals are passed through a preprocessing chain composed of artifact removal and segmentation to ensure accurate detection followed by different feature extraction methods to extract the different features related to drowsiness. This work explores the use of various machine learning algorithms such as Support Vector Machine (SVM), the K nearest neighbor (KNN), the Naive Bayes (NB), the Decision Tree (DT), and the Multilayer Perceptron (MLP) to analyze EEG signals sourced from the DROZY database, carefully labeled into two distinct states of alertness (awake and drowsy). Segmentation into 10 s intervals ensures precise detection, while a relevant feature selection layer enhances accuracy and generalizability. The proposed approach achieves high accuracy rates of 99.84% and 96.4% for intra (subject by subject) and inter (cross-subject) modes, respectively. SVM emerges as the most effective model for drowsiness detection in the intra mode, while MLP demonstrates superior accuracy in the inter mode. This research offers a promising avenue for implementing proactive drowsiness detection systems to enhance occupational safety across various industries.
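The 10 s segmentation step mentioned in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name `segment_signal`, the non-overlapping windowing, the dropping of a trailing partial window, and the toy 4 Hz sampling rate are all assumptions made for the example.

```python
from typing import List, Sequence


def segment_signal(samples: Sequence[float], sampling_rate_hz: int,
                   window_seconds: float = 10.0) -> List[List[float]]:
    """Split a 1-D signal into consecutive, non-overlapping windows.

    A trailing remainder shorter than one full window is dropped
    (an assumption; the abstract does not say how remainders are handled).
    """
    window_len = int(round(sampling_rate_hz * window_seconds))
    if window_len <= 0:
        raise ValueError("window length must be positive")
    n_windows = len(samples) // window_len
    return [list(samples[i * window_len:(i + 1) * window_len])
            for i in range(n_windows)]


# Hypothetical input: 35 s of one channel at a toy rate of 4 Hz.
toy_signal = [float(i) for i in range(35 * 4)]
epochs = segment_signal(toy_signal, sampling_rate_hz=4)
```

Each returned window would then be passed to feature extraction; with the toy input above, the last 5 s of signal do not fill a window and are discarded.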
Affiliation(s)
- Aymen Zayed
  - Technology and Medical Imaging Laboratory, Faculty of Medicine Monastir, University of Monastir, Monastir 5019, Tunisia
  - National Engineering School of Sousse, University of Sousse, BP 264 Erriyadh, Sousse 4023, Tunisia
  - Department of Electronics and Microelectronics (SEMi), University of Mons, 7000 Mons, Belgium
- Nidhameddine Belhadj
  - Laboratory of Electronics and Microelectronics, Faculty of Sciences of Monastir, Monastir 5019, Tunisia
- Khaled Ben Khalifa
  - Technology and Medical Imaging Laboratory, Faculty of Medicine Monastir, University of Monastir, Monastir 5019, Tunisia
  - Higher Institute of Applied Science and Technology of Sousse, University of Sousse, Sousse 4003, Tunisia
- Mohamed Hedi Bedoui
  - Technology and Medical Imaging Laboratory, Faculty of Medicine Monastir, University of Monastir, Monastir 5019, Tunisia
- Carlos Valderrama
  - Department of Electronics and Microelectronics (SEMi), University of Mons, 7000 Mons, Belgium
3
Dai J, Li W, Dong G. Dung Beetle Optimizer Algorithm and Machine Learning-Based Genome Analysis of Lactococcus lactis: Predicting Electronic Sensory Properties of Fermented Milk. Foods 2024; 13:1958. [PMID: 38998464 PMCID: PMC11241492 DOI: 10.3390/foods13131958] [Received: 05/08/2024] [Revised: 06/11/2024] [Accepted: 06/19/2024] [Indexed: 07/14/2024] Open
Abstract
In the global food industry, fermented dairy products are valued for their unique flavors and nutrients. Lactococcus lactis is crucial in developing these flavors during fermentation. Meeting diverse consumer flavor preferences requires the careful selection of fermentation agents. Traditional assessment methods are slow, costly, and subjective. Although electronic-nose and -tongue technologies provide objective assessments, they are mostly limited to laboratory environments. Therefore, this study developed a model to predict the electronic sensory characteristics of fermented milk. This model is based on the genomic data of Lactococcus lactis, using the DBO (Dung Beetle Optimizer) optimization algorithm combined with 10 different machine learning methods. The research results show that the combination of the DBO optimization algorithm and multi-round feature selection with a ridge regression model significantly improved the performance of the model. In the 10-fold cross-validation, the R2 values of all the electronic sensory phenotypes exceeded 0.895, indicating an excellent performance. In addition, a deep analysis of the electronic sensory data revealed an important phenomenon: the correlation between the electronic sensory phenotypes is positively related to the number of features jointly selected. Generally, a higher correlation among the electronic sensory phenotypes corresponds to a greater number of features being jointly selected. Specifically, phenotypes with high correlations exhibit from 2 to 60 times more jointly selected features than those with low correlations. This suggests that our feature selection strategy effectively identifies the key features impacting multiple phenotypes, likely originating from their regulation by similar biological pathways or metabolic processes. Overall, this study proposes a more efficient and cost-effective method for predicting the electronic sensory characteristics of milk fermented by Lactococcus lactis. 
It helps to screen and optimize fermenting agents with desirable flavor characteristics, thereby driving innovation and development in the dairy industry and enhancing the product quality and market competitiveness.
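The evaluation protocol named in the abstract (ridge regression scored by R2 under 10-fold cross-validation) can be illustrated with a small standard-library sketch. It is not the authors' model: it uses a single feature with a closed-form ridge fit, interleaved folds, synthetic data, and hypothetical function names, all chosen only to keep the example self-contained.

```python
from typing import List, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k interleaved folds."""
    splits = []
    for fold in range(k):
        test_idx = list(range(fold, n_samples, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits


def fit_ridge_1d(x: Sequence[float], y: Sequence[float],
                 alpha: float) -> Tuple[float, float]:
    """Closed-form ridge fit for one feature with an unpenalized intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, mean_y - slope * mean_x


def cross_validated_r2(x: Sequence[float], y: Sequence[float],
                       k: int, alpha: float) -> float:
    """Pool out-of-fold predictions and score them with R2."""
    y_pred = [0.0] * len(y)
    for train_idx, test_idx in k_fold_indices(len(x), k):
        slope, intercept = fit_ridge_1d([x[i] for i in train_idx],
                                        [y[i] for i in train_idx], alpha)
        for i in test_idx:
            y_pred[i] = slope * x[i] + intercept
    return r2_score(y, y_pred)


# Synthetic, noise-free toy data (not from the study).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
cv_r2 = cross_validated_r2(x, y, k=10, alpha=1.0)
```

Scoring only held-out predictions is what makes the reported R2 an estimate of generalization rather than of fit to the training data.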
Affiliation(s)
- Jinhui Dai
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010011, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot 010011, China
- Weicheng Li
  - Key Laboratory of Dairy Biotechnology and Engineering (IMAU), Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
  - Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, China
  - Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
  - Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
- Gaifang Dong
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010011, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot 010011, China
4
Saini R, Tiwari AK, Nath A, Singh P, Maurya SP, Shah MA. Covering assisted intuitionistic fuzzy bi-selection technique for data reduction and its applications. Sci Rep 2024; 14:13568. [PMID: 38866851 DOI: 10.1038/s41598-024-62099-8] [Received: 12/07/2023] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
The dimension and size of data are growing rapidly with the extensive applications of computer science and lab-based engineering in daily life. The presence of vagueness, later uncertainty, redundancy, irrelevancy, and noise raises concerns in building effective learning models. Fuzzy rough sets and their extensions have been applied to deal with these issues through various data reduction approaches. However, constructing a model that can cope with all these issues simultaneously is a challenging task, and none of the studies to date has addressed all of them simultaneously. This paper investigates a method based on the notions of intuitionistic fuzzy (IF) and rough sets to avoid these obstacles simultaneously by putting forward a data reduction technique. To accomplish this task, firstly, a novel IF similarity relation is addressed. Secondly, we establish an IF rough set model on the basis of this similarity relation. Thirdly, an IF granular structure is presented by using the established similarity relation and the lower approximation. Next, mathematical theorems are used to validate the proposed notions. Then, the importance-degree of the IF granules is employed for redundant size elimination. Further, significance-degree-preserved dimensionality reduction is discussed. Hence, simultaneous instance and feature selection for large volumes of high-dimensional datasets can be performed to eliminate redundancy and irrelevancy in both dimension and size, where vagueness and later uncertainty are handled with rough and IF sets respectively, whilst noise is tackled with the IF granular structure. Thereafter, a comprehensive experiment is carried out over benchmark datasets to demonstrate the effectiveness of the simultaneous feature and data point selection methods. Finally, a framework aided by our proposed methodology is discussed to enhance the regression performance for IC50 of Antiviral Peptides.
Affiliation(s)
- Rajat Saini
  - Department of Mathematics, School of Basic Sciences, Central University of Haryana, Mahendergarh, 123031, India
- Anoop Kumar Tiwari
  - Department of Computer Science and Information Technology, Central University of Haryana, Mahendergarh, 123031, India
- Abhigyan Nath
  - Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
- Phool Singh
  - Department of Mathematics (SoET), Central University of Haryana, Mahendergarh, 123031, India
- S P Maurya
  - Department of Geophysics, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Mohd Asif Shah
  - Department of Economics, Kebri Dehar University, 250, Kebri Dehar, Somali, Ethiopia
  - Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144001, India
  - Department of Economics, Kardan University, Parwan e Du, Kabul, 1001, Afghanistan
5
Iqbal A, Amin R, Alsubaei FS, Alzahrani A. Anomaly detection in multivariate time series data using deep ensemble models. PLoS One 2024; 19:e0303890. [PMID: 38843255 PMCID: PMC11156414 DOI: 10.1371/journal.pone.0303890] [Received: 01/03/2024] [Accepted: 05/03/2024] [Indexed: 06/09/2024] Open
Abstract
Anomaly detection in time series data is essential for fraud detection and intrusion monitoring applications. However, it poses challenges due to data complexity and high dimensionality. Industrial applications struggle to process high-dimensional, complex data streams in real time despite existing solutions. This study introduces deep ensemble models to improve traditional time series analysis and anomaly detection methods. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks effectively handle variable-length sequences and capture long-term relationships. Convolutional Neural Networks (CNNs) are also investigated, especially for univariate or multivariate time series forecasting. The Transformer, an architecture based on Artificial Neural Networks (ANN), has demonstrated promising results in various applications, including time series prediction and anomaly detection. Graph Neural Networks (GNNs) identify time series anomalies by capturing temporal connections and interdependencies between periods, leveraging the underlying graph structure of time series data. A novel feature selection approach is proposed to address challenges posed by high-dimensional data, improving anomaly detection by selecting different or more critical features from the data. This approach outperforms previous techniques in several aspects. Overall, this research introduces state-of-the-art algorithms for anomaly detection in time series data, offering advancements in real-time processing and decision-making across various industrial sectors.
Affiliation(s)
- Amjad Iqbal
  - Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan
- Rashid Amin
  - Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan
  - Department of Computer Science and Information Technology, University of Chakwal, Chakwal, Pakistan
- Faisal S. Alsubaei
  - Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Abdulrahman Alzahrani
  - Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
6
Rostamzadeh S, Abouhossein A, Alam K, Vosoughi S, Sattari SS. Exploratory analysis using machine learning algorithms to predict pinch strength by anthropometric and socio-demographic features. International Journal of Occupational Safety and Ergonomics 2024; 30:518-531. [PMID: 38553890 DOI: 10.1080/10803548.2024.2322888] [Indexed: 05/08/2024]
Abstract
Objectives. This study examines the role of different machine learning (ML) algorithms to determine which socio-demographic factors and hand-forearm anthropometric dimensions can be used to accurately predict hand function. Methods. The cross-sectional study was conducted with 7119 healthy Iranian participants (3525 males and 3594 females) aged 10-89 years. Seventeen hand-forearm anthropometric dimensions were measured by JEGS digital caliper and a measuring tape. Tip-to-tip, key and three-jaw chuck pinches were measured using a calibrated pinch gauge. Subsequently, 21 features pertinent to socio-demographic factors and hand-forearm anthropometric dimensions were used for classification. Furthermore, 12 well-known classifiers were implemented and evaluated to predict pinches. Results. Among the 21 features considered in this study, hand length, stature, age, thumb length and index finger length were found to be the most relevant and effective components for each of the three pinch predictions. The k-nearest neighbor, adaptive boosting (AdaBoost) and random forest classifiers achieved the highest classification accuracy of 96.75, 86.49 and 84.66% to predict three pinches, respectively. Conclusions. Predicting pinch strength and determining the predictive hand-forearm anthropometric and socio-demographic characteristics using ML may pave the way to designing an enhanced tool handle and reduce common musculoskeletal disorders of the hand.
Affiliation(s)
- Sajjad Rostamzadeh
  - Department of Ergonomics, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Alireza Abouhossein
  - Department of Ergonomics, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Khurshid Alam
  - Department of Mechanical and Industrial Engineering, College of Engineering, Sultan Qaboos University, Muscat, Oman
- Shahram Vosoughi
  - Department of Occupational Health Engineering, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
7
Park JY, Lee SH, Kim YJ, Kim KG, Lee GJ. Machine learning model based on radiomics features for AO/OTA classification of pelvic fractures on pelvic radiographs. PLoS One 2024; 19:e0304350. [PMID: 38814948 PMCID: PMC11139281 DOI: 10.1371/journal.pone.0304350] [Received: 01/07/2024] [Accepted: 05/10/2024] [Indexed: 06/01/2024] Open
Abstract
Depending on the degree of fracture, pelvic fracture can be accompanied by vascular damage, and in severe cases, it may progress to hemorrhagic shock. Pelvic radiography can quickly diagnose pelvic fractures, and the Association for Osteosynthesis Foundation and Orthopedic Trauma Association (AO/OTA) classification system is useful for evaluating pelvic fracture instability. This study aimed to develop a radiomics-based machine-learning algorithm to quickly diagnose fractures on pelvic X-ray and classify their instability. The data used were pelvic anteroposterior radiographs of 990 adults over 18 years of age diagnosed with pelvic fractures, and 200 normal subjects. A total of 93 features were extracted based on radiomics: 18 first-order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM features. To improve the performance of machine learning, the feature selection methods RFE, SFS, LASSO, and Ridge were used, and the machine learning models used were LR, SVM, RF, XGB, MLP, KNN, and LGBM. Performance was evaluated by the area under the curve (AUC) of the receiver operating characteristic curve. The machine learning models were trained on the features selected by each of the four feature-selection methods. When the RFE feature selection method was used, the average AUC was higher than that of the other methods. Among them, the combination with the machine learning model SVM showed the best performance, with an average AUC of 0.75±0.06. By obtaining a feature-importance graph for the combination of RFE and SVM, it is possible to identify features with high importance. The AO/OTA classification of normal pelvic rings and pelvic fractures on pelvic AP radiographs using a radiomics-based machine learning model showed the highest AUC when using the SVM classification combination. Further research on the radiomic features of each part of the pelvic bone constituting the pelvic ring is needed.
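For readers unfamiliar with the AUC metric used above, the following sketch computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are made up for illustration and are unrelated to the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive cases, 0 for negative cases.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives and 2 negatives with arbitrary scores.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(toy_labels, toy_scores)  # 4 of 6 pairs are ranked correctly
```

This quadratic-time form is meant for clarity; a rank-based computation gives the same value more efficiently on large datasets.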
Affiliation(s)
- Jun Young Park
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
- Seung Hwan Lee
  - Department of Trauma Surgery, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Traumatology, Gachon University College of Medicine, Gachon University, Incheon, Republic of Korea
- Young Jae Kim
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
  - Department of Medical Devices R&D Center, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Biomedical Engineering, Pre-medical Course, College of Medicine, Gachon University, Incheon, Republic of Korea
- Kwang Gi Kim
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
  - Department of Medical Devices R&D Center, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Biomedical Engineering, Pre-medical Course, College of Medicine, Gachon University, Incheon, Republic of Korea
- Gil Jae Lee
  - Department of Trauma Surgery, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Traumatology, Gachon University College of Medicine, Gachon University, Incheon, Republic of Korea
8
Canero FM, Rodriguez-Galiano V, Aragones D. Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates. Heliyon 2024; 10:e30228. [PMID: 38707402 PMCID: PMC11066688 DOI: 10.1016/j.heliyon.2024.e30228] [Received: 10/05/2023] [Revised: 04/19/2024] [Accepted: 04/22/2024] [Indexed: 05/07/2024] Open
Abstract
Soil spectroscopy estimates soil properties using the absorption features in soil spectra. However, modelling soil properties with soil spectroscopy is challenging due to the high dimensionality of spectral data. Feature Selection wrapper methods are promising approaches to reduce the dimensionality but are barely used in soil spectroscopy. The aim of this study is to evaluate the performance of two feature selection wrapper methods, Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) built using the Random Forest (RF) algorithm, for dimensionality reduction of spectral data and predictive modelling of soil organic matter (SOM), clay and carbonates. The reflectance of 100 soil samples, acquired from Sierra de las Nieves (Spain), was measured under laboratory conditions using ASD FieldSpec Pro JR. Four different datasets were obtained after applying two spectral preprocessing methods to raw spectra: raw spectra, Continuum Removal (CR), Multiplicative Scatter Correction (MSC), and a so-called "Global" dataset composed of raw, CR and MSC features. The performance of RF models built with feature selection methods was compared to that of Partial Least Squares Regression (PLSR) and RF (alone). RF models built with SFS and SFFS outperformed PLSR and RF alone models: the best RF models with feature selection had a respective ratio of performance to interquartile distance of 1.93, 0.38 and 2.56. PLSR models had an accuracy of 1.41, 0.29 and 1.81 for SOM, carbonates, and clay, respectively. RF alone had a respective performance of 1.29, 0.29 and 1.81. The application of feature selection wrapper methods reduced the number of features to less than 1 % of the starting features. Features were selected across all spectra for SOM and clay, and around 900 nm, 1900 nm, and 2350 nm for carbonates.
However, feature selection highlighted features around 1100 nm in SOM modelling, as well as other features around 2200 nm, which is considered a main absorption feature of clay. The application of feature selection with Random Forest was very important in improving modelling accuracy, reducing the redundant features and avoiding the curse of dimensionality or Hughes effect. Thus, this research showed an alternative to dimensionality reduction approaches that have been applied to date to model soil properties with spectroscopy and paves the way for further scientific investigation based on feature selection methods and machine learning.
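The greedy search behind Sequential Forward Selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_fn` is a hypothetical stand-in for whatever model score the wrapper optimizes (in the study, the performance of a Random Forest), and the toy scoring rule and feature indices are invented for the example. The floating variant (SFFS), which can also remove previously added features, is not shown.

```python
from typing import Callable, List


def sequential_forward_selection(n_features: int,
                                 score_fn: Callable[[List[int]], float],
                                 max_features: int) -> List[int]:
    """Greedily add the feature that most improves score_fn.

    Stops when max_features is reached or no candidate improves the score.
    """
    selected: List[int] = []
    remaining = set(range(n_features))
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every candidate subset formed by adding one more feature.
        candidate_scores = {f: score_fn(selected + [f]) for f in sorted(remaining)}
        best_f = max(sorted(candidate_scores), key=candidate_scores.get)
        if candidate_scores[best_f] <= best_score:
            break  # no improvement: stop early
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = candidate_scores[best_f]
    return selected


def toy_score(subset: List[int]) -> float:
    """Hypothetical score: reward informative features, penalize subset size."""
    informative = {2, 5}  # made-up "useful" feature indices
    return len(informative & set(subset)) - 0.1 * len(subset)


chosen = sequential_forward_selection(n_features=8, score_fn=toy_score,
                                      max_features=5)
```

In a real wrapper the score would come from cross-validated model performance, which makes each step costly; this is why wrapper methods are usually run with a cap on the subset size.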
Affiliation(s)
- Francisco M. Canero
  - Department of Physical Geography and Regional Geographic Analysis, Universidad de Sevilla, 41004, Seville, Spain
- Victor Rodriguez-Galiano
  - Department of Physical Geography and Regional Geographic Analysis, Universidad de Sevilla, 41004, Seville, Spain
- David Aragones
  - Remote Sensing and Geographic Information Systems Lab (LAST-EBD), Doñana Biological Station, C.S.I.C., 41092, Seville, Spain
9
Tiwari AK, Saini R, Nath A, Singh P, Shah MA. Hybrid similarity relation based mutual information for feature selection in intuitionistic fuzzy rough framework and its applications. Sci Rep 2024; 14:5958. [PMID: 38472266 PMCID: PMC10933482 DOI: 10.1038/s41598-024-55902-z] [Received: 11/26/2023] [Accepted: 02/28/2024] [Indexed: 03/14/2024] Open
Abstract
Fuzzy rough entropy, established on the notion of fuzzy rough set theory, has been effectively and efficiently applied for feature selection to handle the uncertainty in real-valued datasets. Further, fuzzy rough mutual information has been presented by integrating information entropy with fuzzy rough sets to measure the importance of features. However, none of the methods to date can handle noise, uncertainty and vagueness simultaneously due to both judgement and identification, which degrades the overall performance of the learning algorithms as the number of mixed-valued conditional features increases. In the current study, these issues are tackled by presenting a novel intuitionistic fuzzy (IF) assisted mutual information concept along with an IF granular structure. Initially, a hybrid IF similarity relation is introduced. Based on this relation, an IF granular structure is introduced. Then, IF rough conditional and joint entropies are established. Further, mutual information based on these concepts is discussed. Next, mathematical theorems are proved to demonstrate the validity of the given notions. Thereafter, the significance of a feature subset is computed by using this mutual information, and a corresponding feature selection is suggested to delete the irrelevant and redundant features. The current approach effectively handles noise and subsequent uncertainty in both nominal and mixed data (including both nominal and category variables). Moreover, comprehensive experimental performances are evaluated on real-valued benchmark datasets to demonstrate the practical validation and effectiveness of the addressed technique. Finally, an application of the proposed method is exhibited to improve the prediction of phospholipidosis positive molecules. RF(h2o) produces the most effective results to date based on our proposed methodology, with sensitivity, accuracy, specificity, MCC, and AUC of 86.7%, 90.1%, 93.0%, 0.808, and 0.922 respectively.
Affiliation(s)
- Anoop Kumar Tiwari
  - Department of Computer Science and Information Technology, Central University of Haryana, Mahendergarh, 123031, India
- Rajat Saini
  - Department of Mathematics, School of Basic Sciences, Central University of Haryana, Mahendergarh, 123031, India
- Abhigyan Nath
  - Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
- Phool Singh
  - Department of Mathematics (SoET), Central University of Haryana, Mahendergarh, 123031, India
- Mohd Asif Shah
  - Department of Economics, Kebri Dehar University, 250, Kebri Dehar, Somali, Ethiopia
  - Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, 140401, Punjab, India
  - Division of Research and Development, Lovely Professional University, Phagwara, 144001, Punjab, India
10
Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024; 14:5905. [PMID: 38467662 PMCID: PMC10928191 DOI: 10.1038/s41598-024-55243-x] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024] Open
Abstract
To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42 k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After evaluation with four metrics, R2 (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and obtained better prediction accuracy when used for phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, no notable distinction was observed in their feature importance ranking scores, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection.
The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.
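The four evaluation metrics named in this abstract (R2, MAE, MSE, MAPE) can be sketched in a few lines. This is only an illustration using the standard textbook definitions, not the authors' code; the function name `regression_metrics` and the sample values are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # MAPE divides by the true value, so it is undefined for zeros.
        raise ValueError("MAPE is undefined when a true value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "mse": mse, "mape": mape}


# Made-up branch counts and predictions, for illustration only.
metrics = regression_metrics([2, 4, 6, 8], [2, 5, 6, 7])
print(metrics)
```

Reporting all four together is useful because they penalize errors differently: MSE emphasizes large misses, MAE treats all misses linearly, and MAPE scales each miss by the size of the true value.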
Affiliation(s)
- Wei Zhou
- Florida Agricultural and Mechanical University, Tallahassee, FL, 32307, USA.
- Zhengxiao Yan
- Florida State University, Tallahassee, FL, 32306, USA
- Liting Zhang
- Florida State University, Tallahassee, FL, 32306, USA
|
11
|
Atimbire SA, Appati JK, Owusu E. Empirical exploration of whale optimisation algorithm for heart disease prediction. Sci Rep 2024; 14:4530. [PMID: 38402276 PMCID: PMC10894250 DOI: 10.1038/s41598-024-54990-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 02/19/2024] [Indexed: 02/26/2024] Open
Abstract
Heart Diseases have the highest mortality worldwide, necessitating precise predictive models for early risk assessment. Much existing research has focused on improving model accuracy with single datasets, often neglecting the need for comprehensive evaluation metrics and utilization of different datasets in the same domain (heart disease). This research introduces a heart disease risk prediction approach by harnessing the whale optimization algorithm (WOA) for feature selection and implementing a comprehensive evaluation framework. The study leverages five distinct datasets, including the combined dataset comprising the Cleveland, Long Beach VA, Switzerland, and Hungarian heart disease datasets. The others are the Z-AlizadehSani, Framingham, South African, and Cleveland heart datasets. The WOA-guided feature selection identifies optimal features, subsequently integrated into ten classification models. Comprehensive model evaluation reveals significant improvements across critical performance metrics, including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve. These enhancements consistently outperform state-of-the-art methods using the same dataset, validating the effectiveness of our methodology. The comprehensive evaluation framework provides a robust assessment of the model's adaptability, underscoring the WOA's effectiveness in identifying optimal features in multiple datasets in the same domain.
Affiliation(s)
- Ebenezer Owusu
- Department of Computer Science, University of Ghana, Accra, Ghana
|
12
|
Yang K, Liu L, Wen Y. The impact of Bayesian optimization on feature selection. Sci Rep 2024; 14:3948. [PMID: 38366092 PMCID: PMC10873405 DOI: 10.1038/s41598-024-54515-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 02/13/2024] [Indexed: 02/18/2024] Open
Abstract
Feature selection is an indispensable step for the analysis of high-dimensional molecular data. Despite its importance, consensus is lacking on how to choose the most appropriate feature selection methods, especially when the performance of the feature selection methods itself depends on hyper-parameters. Bayesian optimization has demonstrated its advantages in automatically configuring the settings of hyper-parameters for various models. However, it remains unclear whether Bayesian optimization can benefit feature selection methods. In this research, we conducted extensive simulation studies to compare the performance of various feature selection methods, with a particular focus on the impact of Bayesian optimization on those where hyper-parameters tuning is needed. We further utilized the gene expression data obtained from the Alzheimer's Disease Neuroimaging Initiative to predict various brain imaging-related phenotypes, where various feature selection methods were employed to mine the data. We found through simulation studies that feature selection methods with hyper-parameters tuned using Bayesian optimization often yield better recall rates, and the analysis of transcriptomic data further revealed that Bayesian optimization-guided feature selection can improve the accuracy of disease risk prediction models. In conclusion, Bayesian optimization can facilitate feature selection methods when hyper-parameter tuning is needed and has the potential to substantially benefit downstream tasks.
Affiliation(s)
- Kaixin Yang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No 56 Xinjian South Road, Yingze District, Taiyuan, Shanxi, China
- Long Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No 56 Xinjian South Road, Yingze District, Taiyuan, Shanxi, China.
- Yalu Wen
- Department of Statistics, University of Auckland, 38 Princes Street, Auckland Central, Auckland, 1010, New Zealand.
|
13
|
Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med 2024; 22:140. [PMID: 38321494 PMCID: PMC10848444 DOI: 10.1186/s12967-024-04940-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024] Open
Abstract
Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expression of multiple genes. Existing SSP methods often rely on Top Scoring Pairs (TSP), which are platform-independent and easy to interpret through the concept of "relative expression reversals". Nevertheless, TSP methods face limitations in classifying complex patterns involving comparisons of more than two gene expressions. To overcome these constraints, we introduce a novel approach that extends TSP rules by constructing rank-based trees capable of encompassing extensive gene-gene comparisons. This method is bolstered by incorporating two ensemble strategies, boosting and random forest, to mitigate the risk of overfitting. Our implementation of ensemble rank-based trees employs boosting with LogitBoost cost and random forests, addressing both binary and multi-class classification problems. In a comparative analysis across 12 cancer gene expression datasets, our proposed methods demonstrate superior performance over both the k-TSP classifier and nearest template prediction methods. We have further refined our approach to facilitate variable selection and the generation of clear, precise decision rules from rank-based trees, enhancing interpretability. The cumulative evidence from our research underscores the significant potential of ensemble rank-based trees in advancing disease classification via gene expression data, offering a robust, interpretable, and scalable solution. Our software is available at https://CRAN.R-project.org/package=ranktreeEnsemble .
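The Top Scoring Pairs idea mentioned in this abstract can be illustrated with a minimal sketch: a rule compares the expression of two genes within one sample, and a k-TSP style classifier takes a majority vote over several such rules. This is not the API of the ranktreeEnsemble package and not the authors' rank-based tree method; the function names, gene names, and class labels below are all hypothetical.

```python
from collections import Counter


def tsp_predict(sample, gene_a, gene_b, class_if_less, class_otherwise):
    """Apply one pair rule: predict class_if_less when gene_a < gene_b."""
    return class_if_less if sample[gene_a] < sample[gene_b] else class_otherwise


def k_tsp_predict(sample, rules):
    """Majority vote over several pair rules (use an odd number to avoid ties)."""
    votes = Counter(tsp_predict(sample, *rule) for rule in rules)
    return votes.most_common(1)[0][0]


# Hypothetical expression values for a single sample.
sample = {"G1": 5.0, "G2": 7.5, "G3": 1.2}

# Hypothetical rules: (gene_a, gene_b, class if a < b, class otherwise).
rules = [
    ("G1", "G2", "tumor", "normal"),
    ("G3", "G1", "tumor", "normal"),
    ("G2", "G3", "tumor", "normal"),
]

print(k_tsp_predict(sample, rules))

# Only the within-sample ordering matters, so a monotone rescaling of the
# measurements (here, multiplying by 10) leaves the prediction unchanged.
rescaled = {gene: value * 10 for gene, value in sample.items()}
print(k_tsp_predict(rescaled, rules))
```

Because each rule depends only on the ordering of two values inside a single sample, the prediction needs no cross-platform calibration, which is the property the abstract describes as platform independence.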
Affiliation(s)
- Min Lu
- Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA.
- Ruijie Yin
- Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
- X Steven Chen
- Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA.
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, 1475 NW 12th Ave, Miami, FL, 33136, USA.
|
14
|
Sheng J, Lam S, Zhang J, Zhang Y, Cai J. Multi-omics fusion with soft labeling for enhanced prediction of distant metastasis in nasopharyngeal carcinoma patients after radiotherapy. Comput Biol Med 2024; 168:107684. [PMID: 38039891 DOI: 10.1016/j.compbiomed.2023.107684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/06/2023] [Accepted: 11/06/2023] [Indexed: 12/03/2023]
Abstract
Omics fusion has emerged as a crucial preprocessing approach in medical image processing, significantly assisting several studies. One of the challenges encountered in integrating omics data is the unpredictability arising from disparities in data sources and medical imaging equipment. Due to these differences, the distribution of omics features exhibits spatial heterogeneity, diminishing their capacity to enhance subsequent tasks. To overcome this challenge and facilitate the integration of their joint application to specific medical objectives, this study aims to develop a fusion methodology for nasopharyngeal carcinoma (NPC) distant metastasis prediction to mitigate the disparities inherent in omics data. The multi-kernel late-fusion method can reduce the impact of these differences by mapping the features using the most suitable single-kernel function and then combining them in a high-dimensional space that can effectively represent the data. The proposed approach in this study employs a distinctive framework incorporating a label-softening technique alongside a multi-kernel-based Radial basis function (RBF) neural network to address these limitations. An efficient representation of the data may be achieved by utilizing the multi-kernel to map the inherent features and then merging them in a space with many dimensions. However, the inflexibility of label fitting poses a constraint on using multi-kernel late-fusion methods in complex NPC datasets, hence affecting the efficacy of general classifiers in dealing with high-dimensional characteristics. The label softening increases the disparity between the two cohorts, providing a more flexible structure for allocating labels. The proposed model is evaluated on multi-omics datasets, and the results demonstrate its strength and effectiveness in predicting distant metastasis of NPC patients.
Affiliation(s)
- Jiabao Sheng
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; Research Institute for Smart Ageing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China.
- SaiKit Lam
- Research Institute for Smart Ageing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China.
- Jiang Zhang
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China.
- Yuanpeng Zhang
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China.
- Jing Cai
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; Research Institute for Smart Ageing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China.
|
15
|
Hosseiniyan Khatibi SM, Rahbar Saadat Y, Hejazian SM, Sharifi S, Ardalan M, Teshnehlab M, Zununi Vahed S, Pirmoradi S. Decoding the Possible Molecular Mechanisms in Pediatric Wilms Tumor and Rhabdoid Tumor of the Kidney through Machine Learning Approaches. Fetal Pediatr Pathol 2023; 42:825-844. [PMID: 37548233 DOI: 10.1080/15513815.2023.2242979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 07/26/2023] [Indexed: 08/08/2023]
Abstract
Objective: Wilms tumor (WT) and Rhabdoid tumor (RT) are pediatric renal tumors and their differentiation is based on histopathological and molecular analysis. The present study aimed to introduce the panels of mRNAs and microRNAs involved in the pathogenesis of these cancers using deep learning algorithms. Methods: Filter, graph, and association rule mining algorithms were applied to the mRNAs/microRNAs data. Results: Candidate miRNAs and mRNAs with high accuracy (AUC: 97%/93% and 94%/97%, respectively) could differentiate the WT and RT classes in training and test data. Let-7a-2 and C19orf24 were identified in the WT, while miR-199b and RP1-3E10.2 were detected in the RT by analysis of Association Rule Mining. Conclusion: The application of the machine learning methods could identify mRNA/miRNA patterns to discriminate WT from RT. The identified miRNAs/mRNAs panels could offer novel insights into the underlying molecular mechanisms that are responsible for the initiation and development of these cancers. They may provide further insight into the pathogenesis, prognosis, diagnosis, and molecular-targeted therapy in pediatric renal tumors.
Affiliation(s)
- Seyed Mahdi Hosseiniyan Khatibi
- Kidney Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Clinical Research Development Unit of Tabriz Valiasr Hospital, Tabriz University of Medical Sciences, Tabriz, Iran
- Simin Sharifi
- Dental and Periodontal Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Mohammad Teshnehlab
- Department of Electrical and Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran
- Saeed Pirmoradi
- Clinical Research Development Unit of Tabriz Valiasr Hospital, Tabriz University of Medical Sciences, Tabriz, Iran
|
16
|
Sun S, Alkahtani ME, Gaisford S, Basit AW, Elbadawi M, Orlu M. Virtually Possible: Enhancing Quality Control of 3D-Printed Medicines with Machine Vision Trained on Photorealistic Images. Pharmaceutics 2023; 15:2630. [PMID: 38004607 PMCID: PMC10674815 DOI: 10.3390/pharmaceutics15112630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/01/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
Three-dimensional (3D) printing is an advanced pharmaceutical manufacturing technology, and concerted efforts are underway to establish its applicability to various industries. However, for any technology to achieve widespread adoption, robustness and reliability are critical factors. Machine vision (MV), a subset of artificial intelligence (AI), has emerged as a powerful tool to replace human inspection with unprecedented speed and accuracy. Previous studies have demonstrated the potential of MV in pharmaceutical processes. However, training models using real images proves to be both costly and time consuming. In this study, we present an alternative approach, where synthetic images were used to train models to classify the quality of dosage forms. We generated 200 photorealistic virtual images that replicated 3D-printed dosage forms, where seven machine learning techniques (MLTs) were used to perform image classification. By exploring various MV pipelines, including image resizing and transformation, we achieved remarkable classification accuracies of 80.8%, 74.3%, and 75.5% for capsules, tablets, and films, respectively, for classifying stereolithography (SLA)-printed dosage forms. Additionally, we subjected the MLTs to rigorous stress tests, evaluating their scalability to classify over 3000 images and their ability to handle irrelevant images, where accuracies of 66.5% (capsules), 72.0% (tablets), and 70.9% (films) were obtained. Moreover, model confidence was also measured, and Brier scores ranged from 0.20 to 0.40. Our results demonstrate promising proof of concept that virtual images exhibit great potential for image classification of SLA-printed dosage forms. By using photorealistic virtual images, which are faster and cheaper to generate, we pave the way for accelerated, reliable, and sustainable AI model development to enhance the quality control of 3D-printed medicines.
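The Brier score used above to measure model confidence is the mean squared difference between predicted probabilities and observed outcomes. The sketch below shows the binary form only, which is an assumption, since the abstract does not say which formulation the authors used. The function name and the sample probabilities are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Binary Brier score: mean of (predicted probability - outcome)^2.

    `outcomes` holds 0 or 1; lower scores indicate better-calibrated,
    more accurate probability estimates. Hypothetical helper for illustration.
    """
    if len(probabilities) != len(outcomes) or len(outcomes) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


# Made-up predictions for four images (1 = acceptable, 0 = defective).
print(brier_score([0.9, 0.2, 0.6, 0.5], [1, 0, 1, 0]))

# A model that always answers 0.5 scores exactly 0.25 on binary outcomes,
# which gives a reference point for reading reported scores.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
```

In the binary form a perfect model scores 0 and a constant 0.5 prediction scores 0.25, so the reported range of 0.20 to 0.40 can be read against that baseline if the same formulation applies.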
Affiliation(s)
- Siyuan Sun
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
- Manal E. Alkahtani
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
- Department of Pharmaceutics, College of Pharmacy, Prince Sattam bin Abdulaziz University, Alkharj 11942, Saudi Arabia
- Simon Gaisford
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
- Abdul W. Basit
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
- Moe Elbadawi
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
- School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, London E1 4DQ, UK
- Mine Orlu
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK; (S.S.); (M.E.A.); (S.G.)
|
17
|
Alahdab F, El Shawi R, Ahmed AI, Han Y, Al-Mallah M. Patient-level explainable machine learning to predict major adverse cardiovascular events from SPECT MPI and CCTA imaging. PLoS One 2023; 18:e0291451. [PMID: 37967112 PMCID: PMC10651041 DOI: 10.1371/journal.pone.0291451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 08/30/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Machine learning (ML) has shown promise in improving the risk prediction in non-invasive cardiovascular imaging, including SPECT MPI and coronary CT angiography. However, most algorithms used remain black boxes to clinicians in how they compute their predictions. Furthermore, objective consideration of the multitude of available clinical data, along with the visual and quantitative assessments from CCTA and SPECT, is critical for optimal patient risk stratification. We aim to provide an explainable ML approach to predict MACE using clinical, CCTA, and SPECT data. METHODS Consecutive patients who underwent clinically indicated CCTA and SPECT myocardial imaging for suspected CAD were included and followed up for MACEs. A MACE was defined as a composite outcome that included all-cause mortality, myocardial infarction, or late revascularization. We employed an Automated Machine Learning (AutoML) approach to predict MACE using clinical, CCTA, and SPECT data. Various mainstream models with different sets of hyperparameters have been explored, and critical predictors of risk are obtained using explainable techniques on the global and patient levels. Ten-fold cross-validation was used in training and evaluating the AutoML model. RESULTS A total of 956 patients were included (mean age 61.1 ± 14.2 years, 54% men, 89% hypertension, 81% diabetes, 84% dyslipidemia). Obstructive CAD on CCTA and ischemia on SPECT were observed in 14% of patients, and 11% experienced MACE. ML prediction's sensitivity, specificity, and accuracy in predicting a MACE were 69.61%, 99.77%, and 96.54%, respectively.
The top 10 global predictive features included 8 CCTA attributes (segment involvement score, number of vessels with severe plaque ≥70, ≥50% stenosis in the left marginal coronary artery, calcified plaque, ≥50% stenosis in the left circumflex coronary artery, plaque type in the left marginal coronary artery, stenosis degree in the second obtuse marginal of the left circumflex artery, and stenosis category in the marginals of the left circumflex artery) and 2 clinical features (past medical history of MI or left bundle branch block, being an ever smoker). CONCLUSION ML can accurately predict risk of developing a MACE in patients suspected of CAD undergoing SPECT MPI and CCTA. ML feature-ranking can also show, at a sample- as well as at a patient-level, which features are key in making such a prediction.
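The sensitivity, specificity, and accuracy figures reported in this abstract all derive from a confusion matrix. The sketch below uses the standard definitions with made-up counts; these are not the study's data, and the function name is hypothetical. It illustrates how accuracy can stay high while sensitivity is noticeably lower when events are rare.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: events correctly/incorrectly classified;
    tn/fp: non-events correctly/incorrectly classified.
    Hypothetical helper for illustration.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / positives,   # share of true events detected
        "specificity": tn / negatives,   # share of non-events correctly cleared
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Made-up counts: 10 events among 100 patients, 3 of them missed.
print(diagnostic_metrics(tp=7, fn=3, tn=88, fp=2))
```

With a 10% event rate in this toy example, missing 3 of 10 events costs 30 points of sensitivity but only 3 points of accuracy, which is why the three figures are best read together.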
Affiliation(s)
- Fares Alahdab
- Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, United States of America
- Radwa El Shawi
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Ahmed Ibrahim Ahmed
- Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, United States of America
- Yushui Han
- Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, United States of America
- Mouaz Al-Mallah
- Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, United States of America
|
18
|
Connor M, Salans M, Karunamuni R, Unnikrishnan S, Huynh-Le MP, Tibbs M, Qian A, Reyes A, Stasenko A, McDonald C, Moiseenko V, El-Naqa I, Hattangadi-Gluth JA. Fine Motor Skill Decline After Brain Radiation Therapy-A Multivariate Normal Tissue Complication Probability Study of a Prospective Trial. Int J Radiat Oncol Biol Phys 2023; 117:581-593. [PMID: 37150258 PMCID: PMC10911396 DOI: 10.1016/j.ijrobp.2023.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 03/20/2023] [Accepted: 04/29/2023] [Indexed: 05/09/2023]
Abstract
PURPOSE Brain radiation therapy can impair fine motor skills (FMS). Fine motor skills are essential for activities of daily living, enabling hand-eye coordination for manipulative movements. We developed normal tissue complication probability (NTCP) models for the decline in FMS after fractionated brain radiation therapy (RT). METHODS AND MATERIALS On a prospective trial, 44 patients with primary brain tumors received fractionated RT and underwent high-resolution volumetric magnetic resonance imaging, diffusion tensor imaging, and comprehensive FMS assessments (Delis-Kaplan Executive Function System Trail Making Test Motor Speed [DKEFS-MS]; and Grooved Pegboard dominant/nondominant hands) at baseline and 6 months post-RT. Regions of interest subserving motor function (including cortex, superficial white matter, thalamus, basal ganglia, cerebellum, and white matter tracts) were autosegmented using validated methods and manually verified. Dosimetric and clinical variables were included in multivariate NTCP models using automated bootstrapped logistic regression, least absolute shrinkage and selection operator logistic regression, and random forests with nested cross-validation. RESULTS Half of the patients showed a decline on the grooved pegboard test of nondominant hands, 17 of 42 (40.4%) on the grooved pegboard test of dominant hands, and 11 of 44 (25%) on DKEFS-MS. Automated bootstrapped logistic regression selected a 1-term model including maximum dose to dominant postcentral white matter. The least absolute shrinkage and selection operator logistic regression selected this term and steroid use. The top 5 variables in the random forest were all dosimetric: maximum dose to dominant thalamus, mean dose to dominant caudate, mean and maximum dose to the dominant corticospinal tract, and maximum dose to dominant postcentral white matter. This technique performed best with an area under the curve of 0.69 (95% CI, 0.68-0.70) on nested cross-validation.
CONCLUSIONS We present the first NTCP models for FMS impairment after brain RT. Dose to several supratentorial motor-associated regions of interest correlated with a decline in dominant-hand fine motor dexterity in patients with primary brain tumors in multivariate models, outperforming clinical variables. These data can guide prospective fine motor-sparing strategies for brain RT.
Affiliation(s)
- Michael Connor
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Mia Salans
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Roshan Karunamuni
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Soumya Unnikrishnan
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Michelle Tibbs
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Alexander Qian
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Anny Reyes
- Department of Psychiatry, University of California San Diego, San Diego, California
- Alena Stasenko
- Department of Psychiatry, University of California San Diego, San Diego, California
- Carrie McDonald
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California; Department of Psychiatry, University of California San Diego, San Diego, California
- Vitali Moiseenko
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California
- Issam El-Naqa
- Department of Radiation Oncology, Moffitt Cancer Center and Research Institute, Tampa, Florida
- Jona A Hattangadi-Gluth
- Department of Radiation Medicine and Applied Sciences, University of California San Diego, San Diego, California.
|
19
|
Fu X, Song C, Zhang R, Shi H, Jiao Z. Multimodal Classification Framework Based on Hypergraph Latent Relation for End-Stage Renal Disease Associated with Mild Cognitive Impairment. Bioengineering (Basel) 2023; 10:958. [PMID: 37627843 PMCID: PMC10451373 DOI: 10.3390/bioengineering10080958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023] Open
Abstract
Combined arterial spin labeling (ASL) and functional magnetic resonance imaging (fMRI) can reveal the spatiotemporal and quantitative properties of brain networks more comprehensively. Imaging markers of end-stage renal disease associated with mild cognitive impairment (ESRDaMCI) will be sought from these properties. The current multimodal classification methods often neglect to collect high-order relationships of brain regions and remove noise from the feature matrix. A multimodal classification framework is proposed to address this issue using hypergraph latent relation (HLR). A brain functional network with hypergraph structural information is constructed from fMRI data. The feature matrix is obtained through graph theory (GT). The cerebral blood flow (CBF) from ASL is selected as the second modal feature matrix. Then, the adaptive similarity matrix is constructed by learning the latent relation between feature matrices. Latent relation adaptive similarity learning (LRAS) is introduced to multi-task feature learning to construct a multimodal feature selection method based on latent relation (LRMFS). The experimental results show that the best classification accuracy (ACC) reaches 88.67%, at least 2.84% better than the state-of-the-art methods. The proposed framework preserves more valuable information between brain regions and reduces noise among feature matrices. It provides an essential reference value for ESRDaMCI recognition.
Affiliation(s)
- Xidong Fu
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
- Chaofan Song
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
- Rupu Zhang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
- Haifeng Shi
- Department of Radiology, The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213003, China
- Zhuqing Jiao
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
|
20
|
Wang H, Doumard E, Soule-Dupuy C, Kemoun P, Aligon J, Monsarrat P. Explanations as a New Metric for Feature Selection: A Systematic Approach. IEEE J Biomed Health Inform 2023; 27:4131-4142. [PMID: 37220033 DOI: 10.1109/jbhi.2023.3279340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
With the extensive use of Machine Learning (ML) in the biomedical field, there was an increasing need for Explainable Artificial Intelligence (XAI) to improve transparency and reveal complex hidden relationships between variables for medical practitioners, while meeting regulatory requirements. Feature Selection (FS) is widely used as a part of a biomedical ML pipeline to significantly reduce the number of variables while preserving as much information as possible. However, the choice of FS methods affects the entire pipeline including the final prediction explanations, whereas very few works investigate the relationship between FS and model explanations. Through a systematic workflow performed on 145 datasets and an illustration on medical data, the present work demonstrated the promising complementarity of two metrics based on explanations (using ranking and influence changes) in addition to accuracy and retention rate to select the most appropriate FS/ML models. Measuring how much explanations differ with/without FS is particularly promising for recommending FS methods. While reliefF generally performs the best on average, the optimal choice may vary for each dataset. Positioning FS methods in a tridimensional space, integrating explanations-based metrics, accuracy and retention rate, would allow the user to choose the priorities to be given on each of the dimensions. In biomedical applications, where each medical condition may have its own preferences, this framework will make it possible to offer the healthcare professional the appropriate FS technique, to select the variables that have an important explainable impact, even if this comes at the expense of a limited drop in accuracy.
|
21
|
Ribeiro C, Farmer CK, de Magalhães JP, Freitas AA. Predicting lifespan-extending chemical compounds for C. elegans with machine learning and biologically interpretable features. Aging (Albany NY) 2023; 15:6073-6099. [PMID: 37450404 PMCID: PMC10373959 DOI: 10.18632/aging.204866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 06/19/2023] [Indexed: 07/18/2023]
Abstract
Recently, there has been a growing interest in the development of pharmacological interventions targeting ageing, as well as in the use of machine learning for analysing ageing-related data. In this work, we use machine learning methods to analyse data from DrugAge, a database of chemical compounds (including drugs) modulating lifespan in model organisms. To this end, we created four types of datasets for predicting whether or not a compound extends the lifespan of C. elegans (the most frequent model organism in DrugAge), using four different types of predictive biological features, based on: compound-protein interactions, interactions between compounds and proteins encoded by ageing-related genes, and two types of terms annotated for proteins targeted by the compounds, namely Gene Ontology (GO) terms and physiology terms from the WormBase's Phenotype Ontology. To analyse these datasets, we used a combination of feature selection methods in a data pre-processing phase and the well-established random forest algorithm for learning predictive models from the selected features. In addition, we interpreted the most important features in the two best models in light of the biology of ageing. One noteworthy feature was the GO term "Glutathione metabolic process", which plays an important role in cellular redox homeostasis and detoxification. We also predicted the most promising novel compounds for extending lifespan from a list of previously unlabelled compounds. These include nitroprusside, which is used as an antihypertensive medication. Overall, our work opens avenues for future work in employing machine learning to predict novel life-extending compounds.
Affiliation(s)
- Caio Ribeiro
- School of Computing, University of Kent, Canterbury, Kent, UK
| | | | - João Pedro de Magalhães
- Genomics of Ageing and Rejuvenation Lab, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
| | - Alex A. Freitas
- School of Computing, University of Kent, Canterbury, Kent, UK
|
22
|
Rostamzadeh S, Abouhossein A, Saremi M, Taheri F, Ebrahimian M, Vosoughi S. A comparative investigation of machine learning algorithms for predicting safety signs comprehension based on socio-demographic factors and cognitive sign features. Sci Rep 2023; 13:10843. [PMID: 37407611 DOI: 10.1038/s41598-023-38065-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Accepted: 07/02/2023] [Indexed: 07/07/2023] Open
Abstract
This study examines whether socio-demographic factors and cognitive sign features can be used to predict safety sign comprehensibility using machine learning (ML) techniques. It also examines the role of different ML components, such as feature selection and classification, in identifying suitable factors for construction safety sign comprehensibility. A total of 2310 participants were asked to guess the meaning of 20 construction safety signs (four items for each of the mandatory, prohibition, emergency, warning, and firefighting signs) using the open-ended method. Moreover, the participants were asked to rate the cognitive design features of each sign in terms of familiarity, concreteness, simplicity, meaningfulness, and semantic closeness on a 0-100 rating scale. Subsequently, all eight features (age, experience, education level, familiarity, concreteness, meaningfulness, semantic closeness, and simplicity) were used for classification. Furthermore, the 14 most popular supervised classifiers were implemented and evaluated for safety sign comprehensibility prediction using these eight features. Also, filter and wrapper methods were used as feature selection techniques. The feature selection results indicate that, among the eight features considered in this study, familiarity, simplicity, and meaningfulness are the most relevant and effective components in predicting the comprehensibility of the selected safety signs. Further, when these three features are used for classification, the K-NN classifier achieves the highest classification accuracy of 94.369%, followed by the medium Gaussian SVM, which achieves a classification accuracy of 76.075% under the hold-out data division protocol. Machine learning was adopted as a promising approach to addressing the issue of comprehensibility, especially for determining the factors affecting safety sign comprehension. The cognitive sign features of familiarity, simplicity, and meaningfulness can provide useful information for designing user-friendly safety signs.
Affiliation(s)
- Sajjad Rostamzadeh
- Department of Ergonomics, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Alireza Abouhossein
- Department of Ergonomics, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Mahnaz Saremi
- Department of Ergonomics, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Fereshteh Taheri
- Occupational Health Research Center, Iran University of Medical Sciences, Shahid Hemmat Highway, Tehran, 1449614535, Iran
| | - Mobin Ebrahimian
- Department of Health in Disasters and Emergencies, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Shahram Vosoughi
- Occupational Health Research Center, Iran University of Medical Sciences, Shahid Hemmat Highway, Tehran, 1449614535, Iran.
|
23
|
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. 
CONCLUSIONS This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
Affiliation(s)
| | | | - Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy
| | - Lara Lusa
- Department of Mathematics, Faculty of Mathematics, Natural Sciences and Information Technology, University of Primorska, Koper, Slovenia
- Institute of Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany
| | | | - Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Stefan Michiels
- Service de Biostatistique et d'Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Labeled Ligue Contre le Cancer, Villejuif, France
| | - Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Lisa McShane
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
|
24
|
Francis DP, Laustsen M, Dossi E, Treiberg T, Hardy I, Shiv SH, Hansen BS, Mogensen J, Jakobsen MH, Alstrøm TS. Machine learning methods for the detection of explosives, drugs and precursor chemicals gathered using a colorimetric sniffer sensor. Analytical Methods: Advancing Methods and Applications 2023; 15:2343-2354. [PMID: 37157832 DOI: 10.1039/d3ay00247k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/10/2023]
Abstract
Colorimetric sensing technology is an important and effective approach for the detection of explosives, drugs, and their precursor chemicals. In this work, we use various machine learning models to detect these substances from colorimetric sensing experiments conducted in controlled environments. The detection experiments based on the response of a colorimetric chip containing 26 chemo-responsive dyes indicate that homemade explosives (HMEs) such as hexamethylene triperoxide diamine (HMTD), triacetone triperoxide (TATP), and methyl ethyl ketone peroxide (MEKP) used in improvised explosive devices are detected with true positive rates (TPR) of 70-75%, 73-90% and 60-82%, respectively. Time series classifiers such as Convolutional Neural Networks (CNN) are explored, and the results indicate that improvements can be achieved by using the kinetics of the chemical responses. The use of CNNs is limited, however, to scenarios where a large number of measurements, typically in the range of a few hundred, of each analyte are available. Feature selection of important dyes using the Group Lasso (GPLASSO) algorithm indicated that certain dyes are more important in discriminating an analyte from ambient air. This information could be used to optimize the colorimetric sensor and extend detection to more analytes.
Affiliation(s)
- Deena P Francis
- DTU Compute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
| | | | - Eleftheria Dossi
- Centre for Defence Chemistry, Cranfield University, Defence Academy of the United Kingdom, Shrivenham, SN6 8LA, UK
| | - Tuule Treiberg
- DTU Chemistry, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Iona Hardy
- Centre for Defence Chemistry, Cranfield University, Defence Academy of the United Kingdom, Shrivenham, SN6 8LA, UK
| | - Shai Hvid Shiv
- DTU Chemistry, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | | | - Jesper Mogensen
- Danish Emergency Management Agency, Chemical Division, Nørre Allé 67, 2100 Copenhagen, Denmark
| | - Mogens H Jakobsen
- DTU Chemistry, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Tommy S Alstrøm
- DTU Compute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
|
25
|
Doherty T, Dempster E, Hannon E, Mill J, Poulton R, Corcoran D, Sugden K, Williams B, Caspi A, Moffitt TE, Delany SJ, Murphy TM. A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC Bioinformatics 2023; 24:178. [PMID: 37127563 PMCID: PMC10152624 DOI: 10.1186/s12859-023-05282-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2022] [Accepted: 04/11/2023] [Indexed: 05/03/2023] Open
Abstract
BACKGROUND The field of epigenomics holds great promise in understanding and treating disease, with advances in machine learning (ML) and artificial intelligence being vitally important in this pursuit. Increasingly, research now utilises DNA methylation measures at cytosine-guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Given the challenge of high dimensionality of DNA methylation data, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons. RESULTS We found that principal component analysis in advance of elastic net regression led to the overall best performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. In contrast, the baseline model of elastic net regression with no prior feature reduction stage performed less well in general, suggesting a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few common CpGs, are associated with many of the same biological entities. CONCLUSIONS The variance in performance across tested approaches shows that estimators are sensitive to data set heterogeneity and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology, which utilises a range of feature-selection approaches and ML algorithms, could be applied to other biological markers and disease phenotypes, to examine their relationship with DNA methylation and predictive value.
Affiliation(s)
- Trevor Doherty
- School of Biological, Health and Sports Sciences, Technological University Dublin, Dublin, Ireland.
- SFI Centre for Research Training in Machine Learning, Technological University Dublin, Dublin, Ireland.
| | - Emma Dempster
- University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Eilis Hannon
- University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Jonathan Mill
- University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Richie Poulton
- Department of Psychology, University of Otago, Dunedin, 9016, New Zealand
| | - David Corcoran
- Center for Genomic and Computational Biology, Duke University, Durham, NC, 27708, USA
| | - Karen Sugden
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Ben Williams
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Avshalom Caspi
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Terrie E Moffitt
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Sarah Jane Delany
- School of Computer Science, Technological University Dublin, Dublin, Ireland
| | - Therese M Murphy
- School of Biological, Health and Sports Sciences, Technological University Dublin, Dublin, Ireland
|
26
|
Pan Q, Hu W, He D, He C, Zhang L, Shi Q. Machine-learning assisted molecular formula assignment to high-resolution mass spectrometry data of dissolved organic matter. Talanta 2023; 259:124484. [PMID: 37001397 DOI: 10.1016/j.talanta.2023.124484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/22/2023] [Indexed: 03/29/2023]
Abstract
High-resolution mass spectrometry (HRMS) provides molecular compositional information of dissolved organic matter (DOM) through isotopic assignment from the molecular mass. However, due to the inevitable deviation of molecular mass measurement and the limitation of resolving power, multiple possible solutions frequently occur for a given molecular mass. Lowering the mass deviation threshold and adding assignment restriction rules are often applied to exclude the incorrect solutions, which generally involves time-consuming manual post-processing of mass data. To improve the result accuracy in an automated manner, we developed a molecular formula assignment algorithm based on machine-learning technology. The method integrated a logistic regression model using manually corrected isotopic composition and the peak features of HRMS data (m/z, signal-to-noise ratio, isotope type and number, etc.) as training data. The developed model can evaluate the correctness of a candidate formula for the given mass peak based on the peak features. The method was verified on FT-ICR MS data from various DOM samples (direct-infusion, negative-mode electrospray), achieving ∼90% accuracy (compared to the traditional approach) for formula assignment. The method was applied to a series of natural organic matter (NOM) samples and showed a significant improvement in formula assignment compared with the mass matching method.
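As a rough illustration of the mass-deviation threshold mentioned in the abstract above, the sketch below computes the parts-per-million (ppm) error between a measured mass and each candidate formula and keeps the candidates inside a threshold. It shows only the conventional mass-matching step, not the authors' logistic regression model. The measured mass, the candidate list, the 1 ppm threshold, and all function names are assumptions made for this example.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def formula_mass(formula):
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def ppm_error(measured, theoretical):
    """Signed mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def candidates_within(measured, formulas, threshold_ppm):
    """Keep the candidate formulas whose absolute ppm error is within the threshold."""
    kept = []
    for formula in formulas:
        error = ppm_error(measured, formula_mass(formula))
        if abs(error) <= threshold_ppm:
            kept.append((formula, error))
    return kept

# Hypothetical neutral mass and candidate list, for illustration only.
measured_mass = 180.0634
candidate_formulas = [
    {"C": 6, "H": 12, "O": 6},
    {"C": 10, "H": 12, "O": 3},
]
matches = candidates_within(measured_mass, candidate_formulas, threshold_ppm=1.0)
for formula, error in matches:
    print(formula, round(error, 3))
```

When several candidates survive the threshold, a learned model such as the one described in the abstract could be used to score them on additional peak features.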
|
27
|
Wang XW, Wang T, Schaub DP, Chen C, Sun Z, Ke S, Hecker J, Maaser-Hecker A, Zeleznik OA, Zeleznik R, Litonjua AA, DeMeo DL, Lasky-Su J, Silverman EK, Liu YY, Weiss ST. Benchmarking omics-based prediction of asthma development in children. Respir Res 2023; 24:63. [PMID: 36842969 PMCID: PMC9969629 DOI: 10.1186/s12931-023-02368-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/04/2023] [Accepted: 02/16/2023] [Indexed: 02/27/2023] Open
Abstract
BACKGROUND Asthma is a heterogeneous disease with high morbidity. Advances in high-throughput multi-omics approaches have enabled the collection of molecular assessments at different layers, providing a complementary perspective on complex diseases. Numerous computational methods have been developed for omics-based patient classification or disease outcome prediction. Yet, a systematic benchmarking of those methods using various combinations of omics data for the prediction of asthma development is still lacking. OBJECTIVE We aimed to investigate computational methods for disease status prediction using multi-omics data. METHOD We systematically benchmarked 18 computational methods using all 63 combinations of six omics data types (GWAS, miRNA, mRNA, microbiome, metabolome, DNA methylation) collected in The Vitamin D Antenatal Asthma Reduction Trial (VDAART) cohort. We evaluated each method using standard performance metrics for each of the 63 omics combinations. RESULTS Our results indicate that overall Logistic Regression, Multi-Layer Perceptron, and MOGONET display superior performance, and the combination of transcriptional, genomic and microbiome data achieves the best prediction. Moreover, we find that including the clinical data can further improve the prediction performance for some but not all of the omics combinations. CONCLUSIONS Specific omics combinations can reach the optimal prediction of asthma development in children, and certain computational methods outperform others.
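The figure of 63 combinations follows from counting the non-empty subsets of six omics layers, 2^6 - 1 = 63. A minimal sketch of that enumeration is given below; the layer names are taken from the abstract, while the function and variable names are illustrative assumptions.

```python
from itertools import combinations

# The six omics layers named in the abstract.
OMICS = ["GWAS", "miRNA", "mRNA", "microbiome", "metabolome", "DNA methylation"]

def nonempty_combinations(layers):
    """Return every non-empty subset of `layers` as a tuple, smallest subsets first."""
    return [
        subset
        for size in range(1, len(layers) + 1)
        for subset in combinations(layers, size)
    ]

all_combos = nonempty_combinations(OMICS)
print(len(all_combos))  # 2**6 - 1 = 63
```

A benchmark of this kind would then loop over `all_combos` and evaluate every method on each subset.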
Affiliation(s)
- Xu-Wen Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Tong Wang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Darius P Schaub
- Department of Mathematics, University of Hamburg, 21109, Hamburg, Germany
| | - Can Chen
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Zheng Sun
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Shanlin Ke
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Julian Hecker
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Anna Maaser-Hecker
- Genetics and Aging Research Unit, Department of Neurology, McCance Center for Brain Health, Mass General Institute for Neurodegenerative Disease, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
| | - Oana A Zeleznik
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Roman Zeleznik
- Department of Radiation Oncology, Brigham and Women's Hospital, Boston, MA, USA
| | - Augusto A Litonjua
- Division of Pediatric Pulmonology, Golisano Children's Hospital, Rochester, NY, USA
| | - Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Yang-Yu Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
- Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
|
28
|
Mohiuddin S, Sheikh KH, Malakar S, Velásquez JD, Sarkar R. A hierarchical feature selection strategy for deepfake video detection. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08201-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
|
29
|
Ensemble filters with harmonize PSO-SVM algorithm for optimal hearing disorder prediction. Neural Comput Appl 2023; 35:10473-10496. [PMID: 36747886 PMCID: PMC9894525 DOI: 10.1007/s00521-023-08244-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Accepted: 01/06/2023] [Indexed: 02/05/2023]
Abstract
Discovering a hearing disorder early is critical for reducing the effects of hearing loss, so that approaches to increase the remaining hearing ability can be implemented to support the successful development of human communication. Recently, the explosive growth in dataset features has made it harder for audiologists to decide on the proper treatment for a patient. In most cases, irrelevant features and improper classifier parameters have a crucial influence on the accuracy of the audiometry system. The two are interdependent, so classification accuracy can worsen if feature selection and parameter tuning are conducted independently. Although a filter algorithm is capable of eliminating irrelevant features, it cannot account for dependence between features, which results in a poor selection of significant features. Improper kernel parameter settings may also contribute to poor accuracy. In this paper, an ensemble-filter feature selection method based on Information Gain (IG), Gain Ratio (GR), Chi-squared (CS), and Relief-F (RF), with harmonized optimization of Particle Swarm Optimization (PSO) and Support Vector Machine (SVM), is presented to mitigate these problems. Ensemble filters are utilized so that the initial top dominant features relevant for classification can be considered. Then, PSO and SVM are optimized simultaneously to achieve the optimal solution. The results on a standard Audiology dataset show that the proposed method achieves 96.50% accuracy with the optimal solution, outperforming classical SVM, which indicates that the proposed method is effective in handling high-dimensional data for hearing disorder prediction.
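The abstract does not state how the four filter rankings are combined. One common choice, assumed here purely for illustration, is to average each feature's rank across the filters and keep the top-ranked features. In the sketch below the scores are made-up toy values and the feature names `f1` to `f4` are hypothetical; in practice each inner dictionary would hold the scores produced by the corresponding filter (IG, GR, CS, RF).

```python
def ranks(scores):
    """Map feature -> rank (1 = highest score); ties are broken by feature name."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position for position, feature in enumerate(ordered, start=1)}

def ensemble_top_k(filter_scores, k):
    """Average each feature's rank across all filters and return the k best features."""
    per_filter = [ranks(scores) for scores in filter_scores.values()]
    features = list(per_filter[0])
    mean_rank = {f: sum(r[f] for r in per_filter) / len(per_filter) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]

# Made-up filter scores for four hypothetical features (higher = more relevant).
toy_scores = {
    "IG": {"f1": 0.40, "f2": 0.10, "f3": 0.30, "f4": 0.05},
    "GR": {"f1": 0.35, "f2": 0.20, "f3": 0.30, "f4": 0.01},
    "CS": {"f1": 12.0, "f2": 3.0, "f3": 15.0, "f4": 1.0},
    "RF": {"f1": 0.25, "f2": 0.05, "f3": 0.20, "f4": 0.02},
}
selected = ensemble_top_k(toy_scores, k=2)
print(selected)
```

Rank averaging is used here because the four filters produce scores on different scales; the selected subset would then be passed on to the classifier-tuning stage.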
|
30
|
Chen Y, Liu Y, Zuo X, Zhao Q, Sun M, Cui M, Zhao X, Du Y. Identification of significant imaging features for sensing oocyte viability. Microsc Res Tech 2023; 86:181-192. [PMID: 36278826 DOI: 10.1002/jemt.24248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/26/2022] [Accepted: 10/06/2022] [Indexed: 01/21/2023]
Abstract
The evaluation of oocyte viability in the laboratory is limited to the morphological assessment by naked eyes, but the realization that most normal-appearing oocytes may conceal abnormalities prompts the search for automated approaches that can detect the abnormalities imperceptible to naked eyes. In this study, we developed an image processing pipeline applicable to bright-field microscope images to quantify the causal relationship between the quantitative imaging features and the developmental potential of oocytes. We acquired 19 imaging features of approximately 700 oocytes and determined two imaging subtypes, namely viable and nonviable subtypes that correlated closely with a viability fluorescence indicator and cleavage rates. The causal relationship between these imaging features and oocyte viability was derived from a viability-oriented Bayesian network that was developed based on the Bayesian information criterion and Tabu search. Our experimental results revealed that entropy with mean Gray Level Co-Occurrence Matrix energy describing the uniformity and texture roughness of cytoplasm were salient features for the automated selection of promising oocytes that exhibited excellent developmental potential.
Affiliation(s)
- Yizhe Chen
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Yaowei Liu
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Xiaoying Zuo
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Qili Zhao
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Mingzhu Sun
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Maosheng Cui
- Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China; Innovation Team of Pig Feeding, Institute of Animal Science and Veterinary of Tianjin, Tianjin, China
| | - Xin Zhao
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
| | - Yue Du
- Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
|
31
|
An improved feature selection approach using global best guided Gaussian artificial bee colony for EMG classification. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
32
|
Zhang M, Wang JS, Liu Y, Wang M, Li XD, Guo FJ. Feature selection method based on stochastic fractal search henry gas solubility optimization algorithm. Journal of Intelligent & Fuzzy Systems 2023. [DOI: 10.3233/jifs-221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
In most data mining tasks, feature selection is an essential preprocessing stage. Henry’s Gas Solubility Optimization (HGSO) algorithm is a physics-based heuristic algorithm built on Henry’s law, which simulates how the solubility of a gas in a liquid changes with temperature. In this paper, an improved Henry’s Gas Solubility Optimization based on stochastic fractal search (SFS-HGSO) is proposed for feature selection and engineering optimization. Three stochastic fractal strategies based on Gaussian walk, Lévy flight and Brownian motion are adopted respectively, and the diffusion is based on the high-quality solutions obtained by the original algorithm. Individuals with different fitness are assigned different energies, and the number of diffusing individuals is determined according to individual energy. This strategy increases the diversity of search strategies and enhances the ability of local search. It greatly mitigates the shortcomings of the original HGSO, namely that its position updating method is single and its convergence speed is slow. This algorithm is used to solve the problem of feature selection, and a KNN classifier is used to evaluate the effectiveness of the selected features. In order to verify the performance of the proposed feature selection method, 20 standard UCI benchmark datasets are used, and the performance is compared with other swarm intelligence optimization algorithms, such as WOA, HHO and HBA. The algorithm is also applied to the solution of benchmark functions. Experimental results show that these three improved strategies can effectively improve the performance of the HGSO algorithm and achieve excellent results in feature selection and engineering optimization problems.
Affiliation(s)
- Min Zhang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Jie-Sheng Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Yu Liu
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Min Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Xu-Dong Li
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Fu-Jun Guo
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| |
|
33
|
Hapfelmeier A, Hornung R, Haller B. Efficient permutation testing of variable importance measures by the example of random forests. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
|
34
|
Lap BQ, Phan TTH, Nguyen HD, Quang LX, Hang PT, Phi NQ, Hoang VT, Linh PG, Thanh Hang BT. Predicting Water Quality Index (WQI) by feature selection and machine learning: A case study of An Kim Hai irrigation system. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.101991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
|
35
|
Signol F, Arnal L, Navarro-Cerdán JR, Llobet R, Arlandis J, Perez-Cortes JC. SEQENS: An ensemble method for relevant gene identification in microarray data. Comput Biol Med 2023; 152:106413. [PMID: 36521355 DOI: 10.1016/j.compbiomed.2022.106413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Revised: 11/25/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
This paper describes an ensemble feature identification algorithm called SEQENS, and measures its capability to identify the relevant variables in a case-control study using a genetic expression microarray dataset. SEQENS uses Sequential Feature Search on multiple sample splits to select variables showing a stronger relation with the target, and a variable relevance ranking is finally produced. Although designed for feature identification, SEQENS could also serve as a basis for feature selection (classifier optimisation). Cliff, a ranking evaluation metric, is also presented and used to assess the feature identification algorithms when a groundtruth of relevant variables is available. To test performance, three types of synthetic groundtruths emulating fictitious diseases are generated from ten randomly chosen variables following different target pattern distributions using the E-MTAB-3732 dataset. Several sample-to-dimensionality ratios ranging from 300 to 3,000 observations and 854 to 54,675 variables are explored. SEQENS is compared with other state-of-the-art feature selection or identification methods. On average, the proposed algorithm identifies the relevant genes better and exhibits greater stability. The algorithm is available to the community.
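The general idea described above, a sequential feature search repeated over many sample splits and summarised as a relevance ranking, can be sketched as follows. This is a loose illustration and not SEQENS itself: the nearest-centroid scorer, the subsample fraction, the number of splits, and the toy data are all hypothetical choices.

```python
import random
from collections import Counter

def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid rule on the chosen columns."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append([row[j] for j in cols])
    centroids = {
        label: [sum(values) / len(rows) for values in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(row):
        point = [row[j] for j in cols]
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )

    return sum(predict(row) == label for row, label in zip(X, y)) / len(X)

def forward_search(X, y, n_select):
    """Greedy sequential forward search: add the best-scoring feature each round."""
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(n_select):
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

def ensemble_ranking(X, y, n_splits=20, n_select=1, frac=0.7, seed=0):
    """Rank features by how often the search picks them across random subsamples."""
    rng = random.Random(seed)
    counts = Counter()
    indices = list(range(len(X)))
    for _ in range(n_splits):
        sample = rng.sample(indices, max(2, int(frac * len(indices))))
        counts.update(
            forward_search([X[i] for i in sample], [y[i] for i in sample], n_select)
        )
    return [feature for feature, _ in counts.most_common()]

# Hypothetical toy data: column 0 is noise, column 1 tracks the class label.
toy_X = [[0, 0.1], [1, 0.2], [0, 0.0], [1, 0.3], [0, 5.1], [1, 5.0], [0, 5.2], [1, 4.9]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
print(ensemble_ranking(toy_X, toy_y))
```

Counting selections across splits, rather than trusting a single run, is what gives an ensemble ranking of this kind its stability.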
Affiliation(s)
- François Signol
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| | - Laura Arnal
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| | - J Ramón Navarro-Cerdán
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| | - Rafael Llobet
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| | - Joaquim Arlandis
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| | - Juan-Carlos Perez-Cortes
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Camino de Vera, s/n, 46022 València, Spain.
| |
|
36
|
Jia Z, Ou C, Sun S, Wang J, Liu J, Sun M, Ma W, Li M, Jia S, Mao P. Integrating optical imaging techniques for a novel approach to evaluate Siberian wild rye seed maturity. FRONTIERS IN PLANT SCIENCE 2023; 14:1170947. [PMID: 37152128 PMCID: PMC10157248 DOI: 10.3389/fpls.2023.1170947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 04/03/2023] [Indexed: 05/09/2023]
Abstract
Advances in optical imaging technology using rapid and non-destructive methods have led to improvements in the efficiency of seed quality detection. Accurately timing the harvest is crucial for maximizing the yield of higher-quality Siberian wild rye seeds by minimizing excessive shattering during harvesting. This research applied integrated optical imaging techniques and machine learning algorithms to develop different models for classifying Siberian wild rye seeds based on different maturity stages and grain positions. The multi-source fusion of morphological, multispectral, and autofluorescence data provided more comprehensive information but also increased the performance requirements of the equipment. Therefore, we employed three filtering algorithms, namely minimal joint mutual information maximization (JMIM), information gain, and Gini impurity, and set up two control methods (feature union and no-filtering) to assess the impact of retaining only 20% of the features on the model performance. Both JMIM and information gain revealed autofluorescence and morphological features (CIELab A, CIELab B, hue and saturation), with these two filtering algorithms showing shorter run times. Furthermore, a strong correlation was observed between shoot length and morphological and autofluorescence spectral features. Machine learning models based on linear discriminant analysis (LDA), random forests (RF) and support vector machines (SVM) showed high performance (>0.78 accuracies) in classifying seeds at different maturity stages. Furthermore, it was found that there was considerable variation in the different grain positions at the maturity stage, and the K-means approach was used to improve the model performance by 5.8%-9.24%. 
In conclusion, our study demonstrated that feature filtering algorithms combined with machine learning algorithms offer high performance and low cost in identifying seed maturity stages and that the application of k-means techniques for inconsistent maturity improves classification accuracy. Therefore, this technique could be employed for the classification of seed maturity and superior physiological quality in Siberian wild rye seeds.
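As a rough illustration of the filtering step described above (score every feature, keep only the top 20%), the sketch below ranks features by information gain. It assumes features that are already discrete, which the abstract does not state; the data and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [l for v, l in zip(feature_values, labels) if v == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

def keep_top_fraction(X, y, fraction=0.2):
    """Return the (sorted) indices of the highest-scoring fraction of features."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    n_keep = max(1, round(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical toy data: five discrete features, only column 3 follows the label.
toy_X = [
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1],
]
toy_y = [0, 0, 1, 1]
print(keep_top_fraction(toy_X, toy_y))
```

Continuous measurements such as spectral intensities would need to be binned first, or scored with an estimator designed for continuous variables.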
|
37
|
Parkinson E, Liberatore F, Watkins WJ, Andrews R, Edkins S, Hibbert J, Strunk T, Currie A, Ghazal P. Gene filtering strategies for machine learning guided biomarker discovery using neonatal sepsis RNA-seq data. Front Genet 2023; 14:1158352. [PMID: 37113992 PMCID: PMC10126415 DOI: 10.3389/fgene.2023.1158352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Accepted: 03/29/2023] [Indexed: 04/29/2023] Open
Abstract
Machine learning (ML) algorithms are powerful tools that are increasingly being used for sepsis biomarker discovery in RNA-Seq data. RNA-Seq datasets contain multiple sources and types of noise (operator, technical and non-systematic) that may bias ML classification. Normalisation and independent gene filtering approaches described in RNA-Seq workflows account for some of this variability and are typically only targeted at differential expression analysis rather than ML applications. Pre-processing normalisation steps significantly reduce the number of variables in the data and thereby increase the power of statistical testing, but can potentially discard valuable and insightful classification features. A systematic assessment of applying transcript level filtering on the robustness and stability of ML based RNA-seq classification remains to be fully explored. In this report we examine the impact of filtering out low count transcripts and those with influential outlier read counts on downstream ML analysis for sepsis biomarker discovery using elastic net regularised logistic regression, L1-regularised support vector machines and random forests. We demonstrate that applying a systematic objective strategy for removal of uninformative and potentially biasing biomarkers representing up to 60% of transcripts in different sample size datasets, including two illustrative neonatal sepsis cohorts, leads to substantial improvements in classification performance, higher stability of the resulting gene signatures, and better agreement with previously reported sepsis biomarkers. We also demonstrate that the performance uplift from gene filtering depends on the ML classifier chosen, with L1-regularised support vector machines showing the greatest performance improvements with our experimental data.
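A low-count filter of the kind discussed above can be as simple as the sketch below. The rule used here (at least `min_count` reads in at least `min_samples` samples) and both threshold values are assumptions for illustration, as are the gene names and counts; the abstract does not specify the authors' criteria.

```python
def filter_low_count(counts, min_count=10, min_samples=2):
    """Keep transcripts with at least `min_count` reads in at least `min_samples` samples."""
    return {
        transcript: values
        for transcript, values in counts.items()
        if sum(v >= min_count for v in values) >= min_samples
    }

# Hypothetical read counts per transcript across four samples.
toy_counts = {
    "geneA": [0, 1, 0, 2],     # low everywhere
    "geneB": [15, 30, 12, 8],  # expressed in most samples
    "geneC": [0, 0, 250, 0],   # a single extreme sample
    "geneD": [11, 9, 10, 3],   # just passes the threshold
}

kept = filter_low_count(toy_counts)
removed_fraction = 1 - len(kept) / len(toy_counts)
print(sorted(kept), removed_fraction)
```

Requiring several samples above the threshold also drops transcripts such as `geneC`, whose signal comes from one sample only.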
Affiliation(s)
- Edward Parkinson
- Department of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- *Correspondence: Edward Parkinson,
| | - Federico Liberatore
- Department of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
| | - W. John Watkins
- Project Sepsis, Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Robert Andrews
- Project Sepsis, Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Sarah Edkins
- Project Sepsis, Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Julie Hibbert
- Wesfarmers Centre of Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, WA, Australia
- Medical School, University of Western Australia, Perth, WA, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA, Australia
| | - Tobias Strunk
- Wesfarmers Centre of Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, WA, Australia
- Medical School, University of Western Australia, Perth, WA, Australia
- Neonatal Directorate, Child and Adolescent Health Service, Perth, WA, Australia
| | - Andrew Currie
- Wesfarmers Centre of Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, WA, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA, Australia
| | - Peter Ghazal
- Project Sepsis, Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| |
|
38
|
Bertolini R, Finch SJ. Stability of filter feature selection methods in data pipelines: a simulation study. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00373-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
39
|
BF2SkNet: best deep learning features fusion-assisted framework for multiclass skin lesion classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08084-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
40
|
Pan X, Zhang G, Lin A, Guan X, Chen P, Ge Y, Chen X. An evaluation model for children's foot & ankle deformity severity using sparse multi-objective feature selection algorithm. Comput Biol Med 2022; 151:106229. [PMID: 36308897 DOI: 10.1016/j.compbiomed.2022.106229] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Revised: 10/08/2022] [Accepted: 10/16/2022] [Indexed: 12/27/2022]
Abstract
Foot & ankle deformity is a chronic disease with high incidence and is best treated in childhood. However, the current diagnostic procedures rely on doctors' consultation and empirical judgment, and lack objective and quantitative evaluation methods, resulting in low screening rates. To solve this problem, this paper aims to construct an evaluation model for children's foot & ankle deformity through data mining and machine learning technologies. Firstly, it proposes the grading rules for children's foot & ankle deformity severity based on analyzing the existing quantitative indexes and expert experience. Then a 3D foot scanner is used to collect the sample data, including 30 foot structure indexes. Finally, an advanced sparse multi-objective evolutionary algorithm (sparse MO-FS) is presented for feature selection. The effectiveness of the proposed sparse MO-FS and its search efficiency are demonstrated by comparison with 8 feature selection methods and 7 search strategies. Using sparse MO-FS, foot length, arch index, ankle index, and hallux valgus index are selected, which not only simplifies the evaluation model but also improves the average classification accuracy of random forest to more than 98%.
Affiliation(s)
- Xiaotian Pan
- School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou 310018, China.
| | - Guodao Zhang
- School of Media and Design, Hangzhou Dianzi University, Hangzhou 310018, China.
| | - Aiju Lin
- College of international Education, Wenzhou University, Wenzhou 325035, China.
| | - Xiaochun Guan
- Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China.
| | - PingKuo Chen
- Great Bay University, Dongguan City 523000, China.
| | - Yisu Ge
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325100, China.
| | - Xin Chen
- Orthopedics Department of The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China.
| |
|
41
|
Xu J, Lu W, Li J, Yuan H. Dependency maximization forward feature selection algorithms based on normalized cross-covariance operator and its approximated form for high-dimensional data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
42
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
| | - Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
| | - Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
| | - Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
| | - Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
| | - Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| |
|
43
|
Jeon Y, Hwang G. Feature Selection with Scalable Variational Gaussian Process via Sensitivity Analysis based on L2 Divergence. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
44
|
Feature selection for distance-based regression: An umbrella review and a one-shot wrapper. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
45
|
Schumann P, Scholz M, Trentzsch K, Jochim T, Śliwiński G, Malberg H, Ziemssen T. Detection of Fall Risk in Multiple Sclerosis by Gait Analysis-An Innovative Approach Using Feature Selection Ensemble and Machine Learning Algorithms. Brain Sci 2022; 12:1477. [PMID: 36358403 PMCID: PMC9688245 DOI: 10.3390/brainsci12111477] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 10/15/2023] Open
Abstract
One of the common causes of falls in people with Multiple Sclerosis (pwMS) is walking impairment. Therefore, assessment of gait is of importance in MS. Gait analysis and fall detection can take place in the clinical context using a wide variety of available methods. However, combining these methods while using machine learning algorithms for detecting falls has not been performed. Our objective was to determine the most relevant method for determining fall risk by analyzing eleven different gait data sets with machine learning algorithms. In addition, we examined the most important features of fall detection. A new feature selection ensemble (FS-Ensemble) and four classification models (Gaussian Naive Bayes, Decision Tree, k-Nearest Neighbor, Support Vector Machine) were used. The FS-Ensemble consisted of four filter methods: Chi-square test, information gain, Minimum Redundancy Maximum Relevance and RelieF. Various thresholds (50%, 25% and 10%) and combination methods (Union, Union 2, Union 3 and Intersection) were examined. Patient-reported outcomes using specialized walking questionnaires such as the 12-item Multiple Sclerosis Walking Scale (MSWS-12) and the Early Mobility Impairment Questionnaire (EMIQ) achieved the best performances with an F1 score of 0.54 for detecting falls. A combination of selected features of MSWS-12 and EMIQ, including the estimation of walking, running and stair climbing ability, the subjective effort as well as necessary concentration and walking fluency during walking, the frequency of stumbling and the indication of avoidance of social activity achieved the best recall of 75%. The Gaussian Naive Bayes was the best classification model for detecting falls with almost all data sets. FS-Ensemble improved the classification models and is an appropriate technique for reducing data sets with a large number of features. Future research on other risk factors, such as fear of falling, could provide further insights.
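The thresholds and combination methods listed above can be pictured with a small voting sketch. The reading of "Union 2" and "Union 3" as "selected by at least two (or three) filters" is an assumption, not something the abstract confirms; the feature names and scores are hypothetical.

```python
from collections import Counter

def top_fraction(scores, fraction):
    """Names of the highest-scoring `fraction` of features for one filter method."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])

def combine(selections, min_votes):
    """Features chosen by at least `min_votes` of the filter methods."""
    votes = Counter(feature for selected in selections for feature in selected)
    return {feature for feature, n in votes.items() if n >= min_votes}

# Hypothetical relevance scores from four filter methods.
filter_scores = {
    "chi2":      {"speed": 0.9, "cadence": 0.7, "stride": 0.2, "sway": 0.1},
    "info_gain": {"speed": 0.8, "cadence": 0.1, "stride": 0.6, "sway": 0.3},
    "mrmr":      {"speed": 0.7, "cadence": 0.6, "stride": 0.1, "sway": 0.2},
    "relief":    {"speed": 0.5, "cadence": 0.2, "stride": 0.1, "sway": 0.9},
}

# Apply a 50% threshold per filter, then combine the four selections.
selections = [top_fraction(scores, 0.5) for scores in filter_scores.values()]
union = combine(selections, 1)
union_2 = combine(selections, 2)
union_3 = combine(selections, 3)
intersection = combine(selections, len(selections))
print(sorted(union), sorted(union_2), sorted(union_3), sorted(intersection))
```

Raising the vote requirement trades coverage for agreement: the plain union keeps every feature any filter liked, while the intersection keeps only those all filters agree on.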
Affiliation(s)
- Paula Schumann
- Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307 Dresden, Germany
| | - Maria Scholz
- Center of Clinical Neuroscience, Neurological Clinic, University Hospital Carl Gustav Carus, TU Dresden, Fetscherstr. 74, 01307 Dresden, Germany
| | - Katrin Trentzsch
- Center of Clinical Neuroscience, Neurological Clinic, University Hospital Carl Gustav Carus, TU Dresden, Fetscherstr. 74, 01307 Dresden, Germany
| | - Thurid Jochim
- Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307 Dresden, Germany
| | - Grzegorz Śliwiński
- Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307 Dresden, Germany
| | - Hagen Malberg
- Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307 Dresden, Germany
| | - Tjalf Ziemssen
- Center of Clinical Neuroscience, Neurological Clinic, University Hospital Carl Gustav Carus, TU Dresden, Fetscherstr. 74, 01307 Dresden, Germany
| |
|
46
|
Identification of Candidate Salivary, Urinary and Serum Metabolic Biomarkers for High Litter Size Potential in Sows (Sus scrofa). Metabolites 2022; 12:metabo12111045. [DOI: 10.3390/metabo12111045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 11/16/2022] Open
Abstract
The selection of sows that are reproductively fit and produce large litters of piglets is imperative for success in the pork industry. Currently, low heritability of reproductive and litter-related traits and unfavourable genetic correlations are slowing the improvement of pig selection efficiency. The integration of biomarkers as a supplement or alternative to the use of genetic markers may permit the optimization and increase of selection protocol efficiency. Metabolite biomarkers are an advantageous class of biomarkers that can facilitate the identification of cellular processes implicated in reproductive condition. Metabolism and metabolic biomarkers have been previously implicated in studies of female mammalian fertility; however, a systematic analysis across multiple biofluids in infertile and high reproductive potential phenotypes has not been explored. In the current study, the serum, urinary and salivary metabolomes of infertile (INF) sows and high reproductive potential (HRP) sows with a live litter size ≥ 13 piglets were examined using LC-MS/MS techniques, and a data pipeline was used to highlight possible metabolite reproductive biomarkers discriminating the reproductive groups. The metabolomes of HRP and INF sows were distinct, including significant alterations in amino acid, fatty acid, membrane lipid and steroid hormone metabolism. Carnitines and fatty acid related metabolites were most discriminatory in separating and classifying the HRP and INF sows based on their biofluid metabolome. It appears that urine is superior to saliva and serum as a biofluid for potentially predicting the reproductive potential level of a given female pig, based on the performance of the resultant biomarker models. This study lays the groundwork for improving gilt and sow selection protocols using metabolomics as a tool for the prediction of reproductive potential.
|
47
|
Bernau CR, Knödler M, Emonts J, Jäpel RC, Buyel JF. The use of predictive models to develop chromatography-based purification processes. Front Bioeng Biotechnol 2022; 10:1009102. [PMID: 36312533 PMCID: PMC9605695 DOI: 10.3389/fbioe.2022.1009102] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Chromatography is the workhorse of biopharmaceutical downstream processing because it can selectively enrich a target product while removing impurities from complex feed streams. This is achieved by exploiting differences in molecular properties, such as size, charge and hydrophobicity (alone or in different combinations). Accordingly, many parameters must be tested during process development in order to maximize product purity and recovery, including resin and ligand types, conductivity, pH, gradient profiles, and the sequence of separation operations. The number of possible experimental conditions quickly becomes unmanageable. Although the range of suitable conditions can be narrowed based on experience, the time and cost of the work remain high even when using high-throughput laboratory automation. In contrast, chromatography modeling using inexpensive, parallelized computer hardware can provide expert knowledge, predicting conditions that achieve high purity and efficient recovery. The prediction of suitable conditions in silico reduces the number of empirical tests required and provides in-depth process understanding, which is recommended by regulatory authorities. In this article, we discuss the benefits and specific challenges of chromatography modeling. We describe the experimental characterization of chromatography devices and settings prior to modeling, such as the determination of column porosity. We also consider the challenges that must be overcome when models are set up and calibrated, including the cross-validation and verification of data-driven and hybrid (combined data-driven and mechanistic) models. This review will therefore support researchers intending to establish a chromatography modeling workflow in their laboratory.
Affiliation(s)
- C. R. Bernau
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
| | - M. Knödler
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
| | - J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
| | - R. C. Jäpel
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
| | - J. F. Buyel
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Vienna, Austria
- *Correspondence: J. F. Buyel,
| |
|
48
|
Dweekat OY, Lam SS. Cervical Cancer Diagnosis Using an Integrated System of Principal Component Analysis, Genetic Algorithm, and Multilayer Perceptron. Healthcare (Basel) 2022; 10:healthcare10102002. [PMID: 36292449 PMCID: PMC9601935 DOI: 10.3390/healthcare10102002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 11/04/2022] Open
Abstract
Cervical cancer is one of the most dangerous diseases that affect women worldwide. The diagnosis of cervical cancer is challenging, costly, and time-consuming. Existing literature has focused on traditional machine learning techniques and deep learning to identify and predict cervical cancer. This research proposes an integrated system of Genetic Algorithm (GA), Multilayer Perceptron (MLP), and Principal Component Analysis (PCA) that accurately predicts cervical cancer. GA is used to optimize the MLP hyperparameters, and the MLPs act as simulators within the GA to provide the prediction accuracy of the solutions. The proposed method uses PCA to transform the available factors; the transformed features are subsequently used as inputs to the MLP for model training. To contrast with the PCA method, different subsets of the original factors are selected. The performance of the integrated system of PCA–GA–MLP is compared with nine different classification algorithms. The results indicate that the proposed method outperforms the studied classification algorithms. The PCA–GA–MLP model achieves the best accuracy in diagnosing Hinselmann, Biopsy, and Cytology when compared to existing approaches in the literature that were implemented on the same dataset. This study introduces a robust tool that allows medical teams to predict cervical cancer in its early stage.
|
49
|
Colombelli F, Kowalski TW, Recamonde-Mendoza M. A hybrid ensemble feature selection design for candidate biomarkers discovery from transcriptome profiles. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/15/2022]
|
50
|
Romanishkin I, Savelieva T, Kosyrkova A, Okhlopkov V, Shugai S, Orlov A, Kravchuk A, Goryaynov S, Golbin D, Pavlova G, Pronin I, Loschenov V. Differentiation of glioblastoma tissues using spontaneous Raman scattering with dimensionality reduction and data classification. Front Oncol 2022; 12:944210. [PMID: 36185245 PMCID: PMC9520479 DOI: 10.3389/fonc.2022.944210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
The neurosurgery of intracranial tumors is often complicated by the difficulty of distinguishing tumor center, infiltration area, and normal tissue. The current standard for intraoperative navigation is fluorescent diagnostics with a fluorescent agent. This approach can be further enhanced by measuring the Raman spectrum of the tissue, which would provide additional information on its composition even in the absence of fluorescence. However, for the Raman spectra to be immediately helpful for a neurosurgeon, they must be additionally processed. In this work, we analyzed the Raman spectra of human brain glioblastoma multiforme tissue samples obtained during surgery and investigated several approaches to dimensionality reduction and data classification to distinguish different types of tissues. In our study, two approaches to Raman spectra dimensionality reduction were tested, and as a result we formulated a new technique combining both of them: feature filtering based on the selection of those shifts which correspond to the biochemical components providing statistically significant differences between groups of examined tissues (center of glioblastoma multiforme, tissues from the infiltration area and normal-appearing white matter) and principal component analysis. We applied the support vector machine to classify tissues after dimensionality reduction of registered Raman spectra. The accuracy of the classification of malignant tissues (tumor edge and center) and normal ones using the principal component analysis alone was 83% with sensitivity of 96% and specificity of 44%. With a combined technique of dimensionality reduction we obtained 83% accuracy with 77% sensitivity and 92% specificity for tumor tissue classification.
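The abstract above reports the same accuracy for two classifiers whose sensitivity and specificity differ sharply. The sketch below shows how the three figures follow from a confusion matrix; the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 50 tumor samples (positives) and 20 normal samples (negatives).
metrics = classification_metrics(tp=45, fn=5, tn=8, fp=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity is 0.9 and specificity only 0.4, yet accuracy stays near 0.76 because positives dominate the sample. This is why accuracy alone can hide a classifier that mislabels most normal tissue.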
Affiliation(s)
- Igor Romanishkin
- Prokhorov General Physics Institute of the Russian Academy of Sciences, Moscow, Russia
- *Correspondence: Igor Romanishkin,
| | - Tatiana Savelieva
- Prokhorov General Physics Institute of the Russian Academy of Sciences, Moscow, Russia
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
| | - Alexandra Kosyrkova
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Vladimir Okhlopkov
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Svetlana Shugai
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Arseniy Orlov
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
| | - Alexander Kravchuk
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Sergey Goryaynov
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Denis Golbin
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Galina Pavlova
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
- Institute of Higher Nervous Activity and Neurophysiology of the Russian Academy of Sciences, Moscow, Russia
| | - Igor Pronin
- N.N. Burdenko National Medical Research Center of Neurosurgery, Moscow, Russia
| | - Victor Loschenov
- Prokhorov General Physics Institute of the Russian Academy of Sciences, Moscow, Russia
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
| |
|