Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106839] [Citation(s) in RCA: 206] [Impact Index Per Article: 51.5] [Indexed: 02/06/2023]

Cited by Other Article(s)

Li M, Guo H, Wang K, Kang C, Yin Y, Zhang H. AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification. Comput Biol Med 2024;177:108614. [PMID: 38796884 DOI: 10.1016/j.compbiomed.2024.108614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 02/27/2024] [Accepted: 05/11/2024] [Indexed: 05/29/2024]

Abstract

Integration analysis of cancer multi-omics data for pan-cancer classification has the potential for clinical applications in various aspects such as tumor diagnosis, analyzing clinically significant features, and providing precision medicine. In these applications, the embedding and feature selection on high-dimensional multi-omics data is clinically necessary. Recently, deep learning algorithms have become the most promising cancer multi-omics integration analysis methods, due to their powerful capability of capturing nonlinear relationships. Developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers in view of high dimensionality and heterogeneity. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding by a multi2multi autoencoder based on the adversarial variational Bayes method and further performs feature selection utilizing a dual-net-based feature ranking method. AVBAE-MODFR utilizes AVBAE to pre-train the network parameters, which improves the classification performance and enhances feature ranking stability in MODFR. Firstly, AVBAE learns high-quality representation among multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture to distinguish the latent distributions for updating forward variational parameters. Secondly, we propose MODFR to simultaneously evaluate multi-omics feature importance for feature selection by training a designed multi2one selector network, where the efficient evaluation approach based on the average gradient of random mask subsets can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare it with four state-of-the-art methods for each phase. The results show the superiority of AVBAE-MODFR over SOTA methods.

Zayed A, Belhadj N, Ben Khalifa K, Bedoui MH, Valderrama C. Efficient Generalized Electroencephalography-Based Drowsiness Detection Approach with Minimal Electrodes. Sensors (Basel, Switzerland) 2024;24:4256. [PMID: 39001037 PMCID: PMC11244425 DOI: 10.3390/s24134256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 06/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]

Dai J, Li W, Dong G. Dung Beetle Optimizer Algorithm and Machine Learning-Based Genome Analysis of Lactococcus lactis: Predicting Electronic Sensory Properties of Fermented Milk. Foods 2024;13:1958. [PMID: 38998464 PMCID: PMC11241492 DOI: 10.3390/foods13131958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 06/11/2024] [Accepted: 06/19/2024] [Indexed: 07/14/2024]

Abstract

In the global food industry, fermented dairy products are valued for their unique flavors and nutrients. Lactococcus lactis is crucial in developing these flavors during fermentation. Meeting diverse consumer flavor preferences requires the careful selection of fermentation agents. Traditional assessment methods are slow, costly, and subjective. Although electronic-nose and -tongue technologies provide objective assessments, they are mostly limited to laboratory environments. Therefore, this study developed a model to predict the electronic sensory characteristics of fermented milk. This model is based on the genomic data of Lactococcus lactis, using the DBO (Dung Beetle Optimizer) optimization algorithm combined with 10 different machine learning methods. The research results show that the combination of the DBO optimization algorithm and multi-round feature selection with a ridge regression model significantly improved the performance of the model. In the 10-fold cross-validation, the R2 values of all the electronic sensory phenotypes exceeded 0.895, indicating an excellent performance. In addition, a deep analysis of the electronic sensory data revealed an important phenomenon: the correlation between the electronic sensory phenotypes is positively related to the number of features jointly selected. Generally, a higher correlation among the electronic sensory phenotypes corresponds to a greater number of features being jointly selected. Specifically, phenotypes with high correlations exhibit from 2 to 60 times more jointly selected features than those with low correlations. This suggests that our feature selection strategy effectively identifies the key features impacting multiple phenotypes, likely originating from their regulation by similar biological pathways or metabolic processes. Overall, this study proposes a more efficient and cost-effective method for predicting the electronic sensory characteristics of milk fermented by Lactococcus lactis. It helps to screen and optimize fermenting agents with desirable flavor characteristics, thereby driving innovation and development in the dairy industry and enhancing the product quality and market competitiveness.

Saini R, Tiwari AK, Nath A, Singh P, Maurya SP, Shah MA. Covering assisted intuitionistic fuzzy bi-selection technique for data reduction and its applications. Sci Rep 2024;14:13568. [PMID: 38866851 DOI: 10.1038/s41598-024-62099-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 05/13/2024] [Indexed: 06/14/2024]

Abstract

The dimension and size of data are growing rapidly with the extensive applications of computer science and lab based engineering in daily life. The presence of vagueness, later uncertainty, redundancy, irrelevancy, and noise raises concerns in building effective learning models. Fuzzy rough set and its extensions have been applied to deal with these issues by various data reduction approaches. However, construction of a model that can cope with all these issues simultaneously is always a challenging task. None of the studies to date has addressed all these issues simultaneously. This paper investigates a method based on the notions of intuitionistic fuzzy (IF) and rough sets to avoid these obstacles simultaneously by putting forward an interesting data reduction technique. To accomplish this task, firstly, a novel IF similarity relation is addressed. Secondly, we establish an IF rough set model on the basis of this similarity relation. Thirdly, an IF granular structure is presented by using the established similarity relation and the lower approximation. Next, the mathematical theorems are used to validate the proposed notions. Then, the importance-degree of the IF granules is employed for redundant size elimination. Further, significance-degree-preserved dimensionality reduction is discussed. Hence, simultaneous instance and feature selection for large volume of high-dimensional datasets can be performed to eliminate redundancy and irrelevancy in both dimension and size, where vagueness and later uncertainty are handled with rough and IF sets respectively, whilst noise is tackled with IF granular structure. Thereafter, a comprehensive experiment is carried out over the benchmark datasets to demonstrate the effectiveness of simultaneous feature and data point selection methods. Finally, our proposed methodology aided framework is discussed to enhance the regression performance for IC50 of Antiviral Peptides.

Iqbal A, Amin R, Alsubaei FS, Alzahrani A. Anomaly detection in multivariate time series data using deep ensemble models. PLoS One 2024;19:e0303890. [PMID: 38843255 PMCID: PMC11156414 DOI: 10.1371/journal.pone.0303890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 05/03/2024] [Indexed: 06/09/2024]

Rostamzadeh S, Abouhossein A, Alam K, Vosoughi S, Sattari SS. Exploratory analysis using machine learning algorithms to predict pinch strength by anthropometric and socio-demographic features. International Journal of Occupational Safety and Ergonomics 2024;30:518-531. [PMID: 38553890 DOI: 10.1080/10803548.2024.2322888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]

Park JY, Lee SH, Kim YJ, Kim KG, Lee GJ. Machine learning model based on radiomics features for AO/OTA classification of pelvic fractures on pelvic radiographs. PLoS One 2024;19:e0304350. [PMID: 38814948 PMCID: PMC11139281 DOI: 10.1371/journal.pone.0304350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 05/10/2024] [Indexed: 06/01/2024]

Abstract

Depending on the degree of fracture, pelvic fracture can be accompanied by vascular damage, and in severe cases, it may progress to hemorrhagic shock. Pelvic radiography can quickly diagnose pelvic fractures, and the Association for Osteosynthesis Foundation and Orthopedic Trauma Association (AO/OTA) classification system is useful for evaluating pelvic fracture instability. This study aimed to develop a radiomics-based machine-learning algorithm to quickly diagnose fractures on pelvic X-ray and classify their instability. The data used were pelvic anteroposterior radiographs of 990 adults over 18 years of age diagnosed with pelvic fractures, and 200 normal subjects. A total of 93 features were extracted based on radiomics: 18 first-order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM features. To improve the performance of machine learning, the feature selection methods RFE, SFS, LASSO, and Ridge were used, and the machine learning models used were LR, SVM, RF, XGB, MLP, KNN, and LGBM. Performance was evaluated by the area under the curve (AUC) of the receiver operating characteristic curve. The machine learning model was trained based on the selected features using four feature-selection methods. When the RFE feature selection method was used, the average AUC was higher than that of the other methods. Among them, the combination with the machine learning model SVM showed the best performance, with an average AUC of 0.75±0.06. By obtaining a feature-importance graph for the combination of RFE and SVM, it is possible to identify features with high importance. The AO/OTA classification of normal pelvic rings and pelvic fractures on pelvic AP radiographs using a radiomics-based machine learning model showed the highest AUC when using the SVM classification combination. Further research on the radiomic features of each part of the pelvic bone constituting the pelvic ring is needed.

Canero FM, Rodriguez-Galiano V, Aragones D. Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates. Heliyon 2024;10:e30228. [PMID: 38707402 PMCID: PMC11066688 DOI: 10.1016/j.heliyon.2024.e30228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 04/19/2024] [Accepted: 04/22/2024] [Indexed: 05/07/2024]

Abstract

Soil spectroscopy estimates soil properties using the absorption features in soil spectra. However, modelling soil properties with soil spectroscopy is challenging due to the high dimensionality of spectral data. Feature Selection wrapper methods are promising approaches to reduce the dimensionality but are barely used in soil spectroscopy. The aim of this study is to evaluate the performance of two feature selection wrapper methods, Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) built using the Random Forest (RF) algorithm, for dimensionality reduction of spectral data and predictive modelling of soil organic matter (SOM), clay and carbonates. The reflectance of 100 soil samples, acquired from Sierra de las Nieves (Spain), was measured under laboratory conditions using ASD FieldSpec Pro JR. Four different datasets were obtained after applying two spectral preprocessing methods to raw spectra: raw spectra, Continuum Removal (CR), Multiplicative Scatter Correction (MSC), and a so-called "Global" dataset composed of raw, CR and MSC features. The performance of RF models built with feature selection methods was compared to that of Partial Least Squares Regression (PLSR) and RF (alone). RF models built with SFS and SFFS outperformed PLSR and RF alone models: The best RF models with feature selection had a respective ratio of performance to interquartile distance of 1.93, 0.38 and 2.56. PLSR models had an accuracy of 1.41, 0.29 and 1.81 for SOM, carbonates, and clay, respectively. RF alone had a respective performance of 1.29, 0.29 and 1.81. The application of feature selection wrapper methods reduced the number of features to less than 1 % of the starting features. Features were selected across all spectra for SOM and clay, and around 900 nm, 1900 nm, and 2350 nm for carbonates. However, feature selection highlighted features around 1100 nm in SOM modelling, as well as other features around 2200 nm, which is considered a main absorption feature of clay. The application of feature selection with Random Forest was very important in improving modelling accuracy, reducing the redundant features and avoiding the curse of dimensionality or Hughes effect. Thus, this research showed an alternative to dimensionality reduction approaches that have been applied to date to model soil properties with spectroscopy and paves the way for further scientific investigation based on feature selection methods and machine learning.

Tiwari AK, Saini R, Nath A, Singh P, Shah MA. Hybrid similarity relation based mutual information for feature selection in intuitionistic fuzzy rough framework and its applications. Sci Rep 2024;14:5958. [PMID: 38472266 PMCID: PMC10933482 DOI: 10.1038/s41598-024-55902-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 02/28/2024] [Indexed: 03/14/2024]

Abstract

Fuzzy rough entropy is established in the notion of fuzzy rough set theory and has been effectively and efficiently applied for feature selection to handle the uncertainty in real-valued datasets. Further, fuzzy rough mutual information has been presented by integrating information entropy with fuzzy rough set to measure the importance of features. However, none of the methods to date can handle noise, uncertainty and vagueness simultaneously due to both judgement and identification, which degrades the overall performance of the learning algorithms as the number of mixed valued conditional features increases. In the current study, these issues are tackled by presenting a novel intuitionistic fuzzy (IF) assisted mutual information concept along with IF granular structure. Initially, a hybrid IF similarity relation is introduced. Based on this relation, an IF granular structure is introduced. Then, IF rough conditional and joint entropies are established. Further, mutual information based on these concepts is discussed. Next, mathematical theorems are proved to demonstrate the validity of the given notions. Thereafter, significance of the features subset is computed by using this mutual information, and corresponding feature selection is suggested to delete the irrelevant and redundant features. The current approach effectively handles noise and subsequent uncertainty in both nominal and mixed data (including both nominal and category variables). Moreover, comprehensive experimental performances are evaluated on real-valued benchmark datasets to demonstrate the practical validation and effectiveness of the addressed technique. Finally, an application of the proposed method is exhibited to improve the prediction of phospholipidosis positive molecules. RF(h2o) produces the most effective results to date based on our proposed methodology with sensitivity, accuracy, specificity, MCC, and AUC of 86.7%, 90.1%, 93.0%, 0.808, and 0.922 respectively.

Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024;14:5905. [PMID: 38467662 PMCID: PMC10928191 DOI: 10.1038/s41598-024-55243-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024]

Abstract

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42 k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (e.g., SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After being evaluated by four valuation metrics: R2 (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that the SVR, Polynomial Regression, DBN, and Autoencoder outperformed other models and could obtain a better prediction accuracy when they were used for phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, no notable distinction was observed in the feature importance ranking scores across the four algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.

Atimbire SA, Appati JK, Owusu E. Empirical exploration of whale optimisation algorithm for heart disease prediction. Sci Rep 2024;14:4530. [PMID: 38402276 PMCID: PMC10894250 DOI: 10.1038/s41598-024-54990-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 02/19/2024] [Indexed: 02/26/2024]

Yang K, Liu L, Wen Y. The impact of Bayesian optimization on feature selection. Sci Rep 2024;14:3948. [PMID: 38366092 PMCID: PMC10873405 DOI: 10.1038/s41598-024-54515-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 02/13/2024] [Indexed: 02/18/2024]

Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med 2024;22:140. [PMID: 38321494 PMCID: PMC10848444 DOI: 10.1186/s12967-024-04940-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024]

Sheng J, Lam S, Zhang J, Zhang Y, Cai J. Multi-omics fusion with soft labeling for enhanced prediction of distant metastasis in nasopharyngeal carcinoma patients after radiotherapy. Comput Biol Med 2024;168:107684. [PMID: 38039891 DOI: 10.1016/j.compbiomed.2023.107684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/06/2023] [Accepted: 11/06/2023] [Indexed: 12/03/2023]

Abstract

Omics fusion has emerged as a crucial preprocessing approach in medical image processing, significantly assisting several studies. One of the challenges encountered in integrating omics data is the unpredictability arising from disparities in data sources and medical imaging equipment. Due to these differences, the distribution of omics features exhibits spatial heterogeneity, diminishing their capacity to enhance subsequent tasks. To overcome this challenge and facilitate the integration of their joint application to specific medical objectives, this study aims to develop a fusion methodology for nasopharyngeal carcinoma (NPC) distant metastasis prediction to mitigate the disparities inherent in omics data. The multi-kernel late-fusion method can reduce the impact of these differences by mapping the features using the most suitable single-kernel function and then combining them in a high-dimensional space that can effectively represent the data. The proposed approach in this study employs a distinctive framework incorporating a label-softening technique alongside a multi-kernel-based Radial basis function (RBF) neural network to address these limitations. An efficient representation of the data may be achieved by utilizing the multi-kernel to map the inherent features and then merging them in a space with many dimensions. However, the inflexibility of label fitting poses a constraint on using multi-kernel late-fusion methods in complex NPC datasets, hence affecting the efficacy of general classifiers in dealing with high-dimensional characteristics. The label softening increases the disparity between the two cohorts, providing a more flexible structure for allocating labels. The proposed model is evaluated on multi-omics datasets, and the results demonstrate its strength and effectiveness in predicting distant metastasis of NPC patients.

Hosseiniyan Khatibi SM, Rahbar Saadat Y, Hejazian SM, Sharifi S, Ardalan M, Teshnehlab M, Zununi Vahed S, Pirmoradi S. Decoding the Possible Molecular Mechanisms in Pediatric Wilms Tumor and Rhabdoid Tumor of the Kidney through Machine Learning Approaches. Fetal Pediatr Pathol 2023;42:825-844. [PMID: 37548233 DOI: 10.1080/15513815.2023.2242979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 07/26/2023] [Indexed: 08/08/2023]

Sun S, Alkahtani ME, Gaisford S, Basit AW, Elbadawi M, Orlu M. Virtually Possible: Enhancing Quality Control of 3D-Printed Medicines with Machine Vision Trained on Photorealistic Images. Pharmaceutics 2023;15:2630. [PMID: 38004607 PMCID: PMC10674815 DOI: 10.3390/pharmaceutics15112630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/01/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023]

Abstract

Three-dimensional (3D) printing is an advanced pharmaceutical manufacturing technology, and concerted efforts are underway to establish its applicability to various industries. However, for any technology to achieve widespread adoption, robustness and reliability are critical factors. Machine vision (MV), a subset of artificial intelligence (AI), has emerged as a powerful tool to replace human inspection with unprecedented speed and accuracy. Previous studies have demonstrated the potential of MV in pharmaceutical processes. However, training models using real images proves to be both costly and time consuming. In this study, we present an alternative approach, where synthetic images were used to train models to classify the quality of dosage forms. We generated 200 photorealistic virtual images that replicated 3D-printed dosage forms, where seven machine learning techniques (MLTs) were used to perform image classification. By exploring various MV pipelines, including image resizing and transformation, we achieved remarkable classification accuracies of 80.8%, 74.3%, and 75.5% for capsules, tablets, and films, respectively, for classifying stereolithography (SLA)-printed dosage forms. Additionally, we subjected the MLTs to rigorous stress tests, evaluating their scalability to classify over 3000 images and their ability to handle irrelevant images, where accuracies of 66.5% (capsules), 72.0% (tablets), and 70.9% (films) were obtained. Moreover, model confidence was also measured, and Brier scores ranged from 0.20 to 0.40. Our results demonstrate promising proof of concept that virtual images exhibit great potential for image classification of SLA-printed dosage forms. By using photorealistic virtual images, which are faster and cheaper to generate, we pave the way for accelerated, reliable, and sustainable AI model development to enhance the quality control of 3D-printed medicines.

Alahdab F, El Shawi R, Ahmed AI, Han Y, Al-Mallah M. Patient-level explainable machine learning to predict major adverse cardiovascular events from SPECT MPI and CCTA imaging. PLoS One 2023;18:e0291451. [PMID: 37967112 PMCID: PMC10651041 DOI: 10.1371/journal.pone.0291451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 08/30/2023] [Indexed: 11/17/2023]

Abstract

BACKGROUND

Machine learning (ML) has shown promise in improving the risk prediction in non-invasive cardiovascular imaging, including SPECT MPI and coronary CT angiography. However, most algorithms used remain black boxes to clinicians in how they compute their predictions. Furthermore, objective consideration of the multitude of available clinical data, along with the visual and quantitative assessments from CCTA and SPECT, are critical for optimal patient risk stratification. We aim to provide an explainable ML approach to predict MACE using clinical, CCTA, and SPECT data.

METHODS

Consecutive patients who underwent clinically indicated CCTA and SPECT myocardial imaging for suspected CAD were included and followed up for MACEs. A MACE was defined as a composite outcome that included all-cause mortality, myocardial infarction, or late revascularization. We employed an Automated Machine Learning (AutoML) approach to predict MACE using clinical, CCTA, and SPECT data. Various mainstream models with different sets of hyperparameters have been explored, and critical predictors of risk are obtained using explainable techniques on the global and patient levels. Ten-fold cross-validation was used in training and evaluating the AutoML model.

RESULTS

A total of 956 patients were included (mean age 61.1 ± 14.2 years, 54% men, 89% hypertension, 81% diabetes, 84% dyslipidemia). Obstructive CAD on CCTA and ischemia on SPECT were observed in 14% of patients, and 11% experienced MACE. ML prediction's sensitivity, specificity, and accuracy in predicting a MACE were 69.61%, 99.77%, and 96.54%, respectively. The top 10 global predictive features included 8 CCTA attributes (segment involvement score, number of vessels with severe plaque ≥70, ≥50% stenosis in the left marginal coronary artery, calcified plaque, ≥50% stenosis in the left circumflex coronary artery, plaque type in the left marginal coronary artery, stenosis degree in the second obtuse marginal of the left circumflex artery, and stenosis category in the marginals of the left circumflex artery) and 2 clinical features (past medical history of MI or left bundle branch block, being an ever smoker).

CONCLUSION

ML can accurately predict risk of developing a MACE in patients suspected of CAD undergoing SPECT MPI and CCTA. ML feature-ranking can also show, at a sample- as well as at a patient-level, which features are key in making such a prediction.

Connor M, Salans M, Karunamuni R, Unnikrishnan S, Huynh-Le MP, Tibbs M, Qian A, Reyes A, Stasenko A, McDonald C, Moiseenko V, El-Naqa I, Hattangadi-Gluth JA. Fine Motor Skill Decline After Brain Radiation Therapy-A Multivariate Normal Tissue Complication Probability Study of a Prospective Trial. Int J Radiat Oncol Biol Phys 2023;117:581-593. [PMID: 37150258 PMCID: PMC10911396 DOI: 10.1016/j.ijrobp.2023.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 03/20/2023] [Accepted: 04/29/2023] [Indexed: 05/09/2023]

Abstract

PURPOSE

Brain radiation therapy can impair fine motor skills (FMS). Fine motor skills are essential for activities of daily living, enabling hand-eye coordination for manipulative movements. We developed normal tissue complication probability (NTCP) models for the decline in FMS after fractionated brain radiation therapy (RT).

METHODS AND MATERIALS

In a prospective trial, 44 patients with primary brain tumors received fractionated RT and underwent high-resolution volumetric magnetic resonance imaging, diffusion tensor imaging, and comprehensive FMS assessments (Delis-Kaplan Executive Function System Trail Making Test Motor Speed [DKEFS-MS] and Grooved Pegboard dominant/nondominant hands) at baseline and 6 months post-RT. Regions of interest subserving motor function (including cortex, superficial white matter, thalamus, basal ganglia, cerebellum, and white matter tracts) were autosegmented using validated methods and manually verified. Dosimetric and clinical variables were included in multivariate NTCP models using automated bootstrapped logistic regression, least absolute shrinkage and selection operator logistic regression, and random forests with nested cross-validation.

RESULTS

Half of the patients showed a decline on the grooved pegboard test of nondominant hands, 17 of 42 (40.4%) on the grooved pegboard test of dominant hands, and 11 of 44 (25%) on DKEFS-MS. Automated bootstrapped logistic regression selected a 1-term model including maximum dose to dominant postcentral white matter. The least absolute shrinkage and selection operator logistic regression selected this term and steroid use. The top 5 variables in the random forest were all dosimetric: maximum dose to dominant thalamus, mean dose to dominant caudate, mean and maximum dose to the dominant corticospinal tract, and maximum dose to dominant postcentral white matter. The random forest performed best, with an area under the curve of 0.69 (95% CI, 0.68-0.70) on nested cross-validation.

CONCLUSIONS

We present the first NTCP models for FMS impairment after brain RT. Dose to several supratentorial motor-associated regions of interest correlated with a decline in dominant-hand fine motor dexterity in patients with primary brain tumors in multivariate models, outperforming clinical variables. These data can guide prospective fine motor-sparing strategies for brain RT.
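Nested cross-validation, as mentioned in the abstract above, keeps each outer test fold out of every tuning step. The following is a minimal sketch of the index bookkeeping only, assuming interleaved folds and hypothetical fold counts (5 outer, 3 inner); the abstract does not state the fold counts or the splitting scheme, and no model is fitted here.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def nested_cv(n_samples: int, outer_k: int, inner_k: int):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    Inner splits are built from outer_train only, so hyperparameters tuned
    on them never see the outer test samples.
    """
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        yield outer_train, outer_test, list(k_fold(outer_train, inner_k))


# 44 matches the cohort size in the abstract; fold counts are assumptions.
splits = list(nested_cv(n_samples=44, outer_k=5, inner_k=3))
for outer_train, outer_test, inner in splits:
    print(len(outer_train), len(outer_test), [len(val) for _, val in inner])
```

In a full pipeline, the inner splits would select the model settings, and the chosen settings would then be refitted on `outer_train` and scored once on `outer_test`.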

Fu X, Song C, Zhang R, Shi H, Jiao Z. Multimodal Classification Framework Based on Hypergraph Latent Relation for End-Stage Renal Disease Associated with Mild Cognitive Impairment. Bioengineering (Basel) 2023;10:958. [PMID: 37627843 PMCID: PMC10451373 DOI: 10.3390/bioengineering10080958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023] Open

Wang H, Doumard E, Soule-Dupuy C, Kemoun P, Aligon J, Monsarrat P. Explanations as a New Metric for Feature Selection: A Systematic Approach. IEEE J Biomed Health Inform 2023;27:4131-4142. [PMID: 37220033 DOI: 10.1109/jbhi.2023.3279340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]

Ribeiro C, Farmer CK, de Magalhães JP, Freitas AA. Predicting lifespan-extending chemical compounds for C. elegans with machine learning and biologically interpretable features. Aging (Albany NY) 2023;15:6073-6099. [PMID: 37450404 PMCID: PMC10373959 DOI: 10.18632/aging.204866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 06/19/2023] [Indexed: 07/18/2023]

Rostamzadeh S, Abouhossein A, Saremi M, Taheri F, Ebrahimian M, Vosoughi S. A comparative investigation of machine learning algorithms for predicting safety signs comprehension based on socio-demographic factors and cognitive sign features. Sci Rep 2023;13:10843. [PMID: 37407611 DOI: 10.1038/s41598-023-38065-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2022] [Accepted: 07/02/2023] [Indexed: 07/07/2023] Open

Abstract

This study examines whether socio-demographic factors and cognitive sign features can be used to predict the comprehensibility of safety signs using machine learning (ML) techniques. It also examines the role of different ML components, such as feature selection and classification, in determining suitable factors for the comprehensibility of construction safety signs. A total of 2310 participants were asked to guess the meaning of 20 construction safety signs (four items for each of the mandatory, prohibition, emergency, warning, and firefighting signs) using the open-ended method. Moreover, the participants were asked to rate the cognitive design features of each sign in terms of familiarity, concreteness, simplicity, meaningfulness, and semantic closeness on a 0-100 rating scale. Subsequently, all eight features (age, experience, education level, familiarity, concreteness, meaningfulness, semantic closeness, and simplicity) were used for classification. Furthermore, the 14 most popular supervised classifiers were implemented and evaluated for safety sign comprehensibility prediction using these eight features. Also, filter and wrapper methods were used as feature selection techniques. The feature selection results indicate that, among the eight features considered in this study, familiarity, simplicity, and meaningfulness are the most relevant and effective components in predicting the comprehensibility of the selected safety signs. Further, when these three features are used for classification, the K-NN classifier achieves the highest classification accuracy of 94.369%, followed by the medium Gaussian SVM, which achieves a classification accuracy of 76.075% under the hold-out data division protocol. ML was adopted as a promising approach to addressing the issue of comprehensibility, especially in terms of determining the factors affecting the comprehension of safety signs. The cognitive sign features of familiarity, simplicity, and meaningfulness can provide useful information for designing user-friendly safety signs.

Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023;21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023] Open

Abstract

BACKGROUND

In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions.

METHODS

Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.

RESULTS

The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.

CONCLUSIONS

This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

Francis DP, Laustsen M, Dossi E, Treiberg T, Hardy I, Shiv SH, Hansen BS, Mogensen J, Jakobsen MH, Alstrøm TS. Machine learning methods for the detection of explosives, drugs and precursor chemicals gathered using a colorimetric sniffer sensor. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2023;15:2343-2354. [PMID: 37157832 DOI: 10.1039/d3ay00247k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]

Doherty T, Dempster E, Hannon E, Mill J, Poulton R, Corcoran D, Sugden K, Williams B, Caspi A, Moffitt TE, Delany SJ, Murphy TM. A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC Bioinformatics 2023;24:178. [PMID: 37127563 PMCID: PMC10152624 DOI: 10.1186/s12859-023-05282-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 04/11/2023] [Indexed: 05/03/2023] Open

Abstract

BACKGROUND

The field of epigenomics holds great promise in understanding and treating disease with advances in machine learning (ML) and artificial intelligence being vitally important in this pursuit. Increasingly, research now utilises DNA methylation measures at cytosine-guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Given the challenge of high dimensionality of DNA methylation data, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons.

RESULTS

We found that principal component analysis in advance of elastic net regression led to the overall best performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. In contrast, the baseline model of elastic net regression with no prior feature reduction stage performed less well in general, suggesting that a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few common CpGs, are associated with many of the same biological entities.

CONCLUSIONS

The variance in performance across tested approaches shows that estimators are sensitive to data set heterogeneity and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology which utilises a range of feature-selection approaches and ML algorithms could be applied to other biological markers and disease phenotypes, to examine their relationship with DNA methylation and predictive value.

Pan Q, Hu W, He D, He C, Zhang L, Shi Q. Machine-learning assisted molecular formula assignment to high-resolution mass spectrometry data of dissolved organic matter. Talanta 2023;259:124484. [PMID: 37001397 DOI: 10.1016/j.talanta.2023.124484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/22/2023] [Indexed: 03/29/2023]

Wang XW, Wang T, Schaub DP, Chen C, Sun Z, Ke S, Hecker J, Maaser-Hecker A, Zeleznik OA, Zeleznik R, Litonjua AA, DeMeo DL, Lasky-Su J, Silverman EK, Liu YY, Weiss ST. Benchmarking omics-based prediction of asthma development in children. Respir Res 2023;24:63. [PMID: 36842969 PMCID: PMC9969629 DOI: 10.1186/s12931-023-02368-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2023] [Accepted: 02/16/2023] [Indexed: 02/27/2023] Open

Affiliation(s)

Xu-Wen Wang Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Tong Wang Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Darius P Schaub Department of Mathematics, University of Hamburg, 21109, Hamburg, Germany
Can Chen Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Zheng Sun Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Shanlin Ke Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Julian Hecker Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Anna Maaser-Hecker Genetics and Aging Research Unit, Department of Neurology, McCance Center for Brain Health, Mass General Institute for Neurodegenerative Disease, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
Oana A Zeleznik Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Roman Zeleznik Department of Radiation Oncology, Brigham and Women's Hospital, Boston, MA, USA
Augusto A Litonjua Division of Pediatric Pulmonology, Golisano Children's Hospital, Rochester, NY, USA
Dawn L DeMeo Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Jessica Lasky-Su Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Edwin K Silverman Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Yang-Yu Liu Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA. Center for Artificial Intelligence and Modeling, The Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
Scott T Weiss Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.

Mohiuddin S, Sheikh KH, Malakar S, Velásquez JD, Sarkar R. A hierarchical feature selection strategy for deepfake video detection. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08201-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/09/2023]

Ensemble filters with harmonize PSO-SVM algorithm for optimal hearing disorder prediction. Neural Comput Appl 2023;35:10473-10496. [PMID: 36747886 PMCID: PMC9894525 DOI: 10.1007/s00521-023-08244-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2021] [Accepted: 01/06/2023] [Indexed: 02/05/2023]

Chen Y, Liu Y, Zuo X, Zhao Q, Sun M, Cui M, Zhao X, Du Y. Identification of significant imaging features for sensing oocyte viability. Microsc Res Tech 2023;86:181-192. [PMID: 36278826 DOI: 10.1002/jemt.24248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 09/26/2022] [Accepted: 10/06/2022] [Indexed: 01/21/2023]

Affiliation(s)

Yizhe Chen Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Yaowei Liu Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Xiaoying Zuo Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Qili Zhao Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Mingzhu Sun Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Maosheng Cui Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China; Innovation Team of Pig Feeding, Institute of Animal Science and Veterinary of Tianjin, Tianjin, China
Xin Zhao Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China
Yue Du Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin, China; Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China; Institute of Intelligence Technology and Robotic Systems, Shenzhen Research Institute of Nankai University, Tianjin, China

An improved feature selection approach using global best guided Gaussian artificial bee colony for EMG classification. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Zhang M, Wang JS, Liu Y, Wang M, Li XD, Guo FJ. Feature selection method based on stochastic fractal search henry gas solubility optimization algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2023. [DOI: 10.3233/jifs-221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Hapfelmeier A, Hornung R, Haller B. Efficient permutation testing of variable importance measures by the example of random forests. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Lap BQ, Phan TTH, Nguyen HD, Quang LX, Hang PT, Phi NQ, Hoang VT, Linh PG, Thanh Hang BT. Predicting Water Quality Index (WQI) by feature selection and machine learning: A case study of An Kim Hai irrigation system. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.101991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]

Signol F, Arnal L, Navarro-Cerdán JR, Llobet R, Arlandis J, Perez-Cortes JC. SEQENS: An ensemble method for relevant gene identification in microarray data. Comput Biol Med 2023;152:106413. [PMID: 36521355 DOI: 10.1016/j.compbiomed.2022.106413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Revised: 11/25/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]

Jia Z, Ou C, Sun S, Wang J, Liu J, Sun M, Ma W, Li M, Jia S, Mao P. Integrating optical imaging techniques for a novel approach to evaluate Siberian wild rye seed maturity. FRONTIERS IN PLANT SCIENCE 2023;14:1170947. [PMID: 37152128 PMCID: PMC10157248 DOI: 10.3389/fpls.2023.1170947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 04/03/2023] [Indexed: 05/09/2023]

Abstract

Advances in optical imaging technology using rapid and non-destructive methods have led to improvements in the efficiency of seed quality detection. Accurately timing the harvest is crucial for maximizing the yield of higher-quality Siberian wild rye seeds by minimizing excessive shattering during harvesting. This research applied integrated optical imaging techniques and machine learning algorithms to develop different models for classifying Siberian wild rye seeds based on different maturity stages and grain positions. The multi-source fusion of morphological, multispectral, and autofluorescence data provided more comprehensive information but also increased the performance requirements of the equipment. Therefore, we employed three filtering algorithms, namely minimal joint mutual information maximization (JMIM), information gain, and Gini impurity, and set up two control methods (feature union and no-filtering) to assess the impact of retaining only 20% of the features on the model performance. Both JMIM and information gain revealed autofluorescence and morphological features (CIELab A, CIELab B, hue and saturation), with these two filtering algorithms showing shorter run times. Furthermore, a strong correlation was observed between shoot length and morphological and autofluorescence spectral features. Machine learning models based on linear discriminant analysis (LDA), random forests (RF) and support vector machines (SVM) showed high performance (>0.78 accuracies) in classifying seeds at different maturity stages. Furthermore, it was found that there was considerable variation in the different grain positions at the maturity stage, and the K-means approach was used to improve the model performance by 5.8%-9.24%. In conclusion, our study demonstrated that feature filtering algorithms combined with machine learning algorithms offer high performance and low cost in identifying seed maturity stages and that the application of K-means techniques for inconsistent maturity improves classification accuracy. Therefore, this technique could be employed for classification of seed maturity and superior physiological quality for Siberian wild rye seeds.
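The filtering step described in the abstract above (score each feature, then retain only 20% of them) can be sketched with information gain as the score. This is a minimal sketch, assuming features that are already discretized; the feature names, the toy data, and the tie-breaking by name are hypothetical and are not taken from the study.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence, labels: Sequence) -> float:
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups: Dict[object, list] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_fraction(features: Dict[str, Sequence], labels: Sequence,
                 fraction: float = 0.2) -> List[str]:
    """Rank features by information gain and keep the leading fraction."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    keep = max(1, math.ceil(fraction * len(scores)))
    # Sort by descending score; ties are broken by name for reproducibility.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:keep]


# Hypothetical toy data: four seeds, two maturity classes, five features.
labels = ["early", "early", "late", "late"]
features = {
    "f_perfect": [0, 0, 1, 1],
    "f_partial": [0, 0, 0, 1],
    "f_noise": [0, 1, 0, 1],
    "f_other": [1, 0, 1, 0],
    "f_const": [0, 0, 0, 0],
}
selected = top_fraction(features, labels, fraction=0.2)
print(selected)
```

Continuous measurements such as spectral intensities would need to be binned before this score applies, and the binning choice can change the ranking.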

Parkinson E, Liberatore F, Watkins WJ, Andrews R, Edkins S, Hibbert J, Strunk T, Currie A, Ghazal P. Gene filtering strategies for machine learning guided biomarker discovery using neonatal sepsis RNA-seq data. Front Genet 2023;14:1158352. [PMID: 37113992 PMCID: PMC10126415 DOI: 10.3389/fgene.2023.1158352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Accepted: 03/29/2023] [Indexed: 04/29/2023] Open

Bertolini R, Finch SJ. Stability of filter feature selection methods in data pipelines: a simulation study. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00373-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

BF2SkNet: best deep learning features fusion-assisted framework for multiclass skin lesion classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08084-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Pan X, Zhang G, Lin A, Guan X, Chen P, Ge Y, Chen X. An evaluation model for children's foot & ankle deformity severity using sparse multi-objective feature selection algorithm. Comput Biol Med 2022;151:106229. [PMID: 36308897 DOI: 10.1016/j.compbiomed.2022.106229] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Revised: 10/08/2022] [Accepted: 10/16/2022] [Indexed: 12/27/2022]

Xu J, Lu W, Li J, Yuan H. Dependency maximization forward feature selection algorithms based on normalized cross-covariance operator and its approximated form for high-dimensional data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022;15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]

Abstract

The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.

Jeon Y, Hwang G. Feature Selection with Scalable Variational Gaussian Process via Sensitivity Analysis based on L2 Divergence. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Feature selection for distance-based regression: An umbrella review and a one-shot wrapper. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Schumann P, Scholz M, Trentzsch K, Jochim T, Śliwiński G, Malberg H, Ziemssen T. Detection of Fall Risk in Multiple Sclerosis by Gait Analysis-An Innovative Approach Using Feature Selection Ensemble and Machine Learning Algorithms. Brain Sci 2022;12:1477. [PMID: 36358403 PMCID: PMC9688245 DOI: 10.3390/brainsci12111477] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 10/15/2023] Open

Abstract

One of the common causes of falls in people with Multiple Sclerosis (pwMS) is walking impairment. Therefore, assessment of gait is of importance in MS. Gait analysis and fall detection can take place in the clinical context using a wide variety of available methods. However, combining these methods while using machine learning algorithms for detecting falls has not been performed. Our objective was to determine the most relevant method for determining fall risk by analyzing eleven different gait data sets with machine learning algorithms. In addition, we examined the most important features of fall detection. A new feature selection ensemble (FS-Ensemble) and four classification models (Gaussian Naive Bayes, Decision Tree, k-Nearest Neighbor, Support Vector Machine) were used. The FS-Ensemble consisted of four filter methods: Chi-square test, information gain, Minimum Redundancy Maximum Relevance and RelieF. Various thresholds (50%, 25% and 10%) and combination methods (Union, Union 2, Union 3 and Intersection) were examined. Patient-reported outcomes using specialized walking questionnaires such as the 12-item Multiple Sclerosis Walking Scale (MSWS-12) and the Early Mobility Impairment Questionnaire (EMIQ) achieved the best performances with an F1 score of 0.54 for detecting falls. A combination of selected features of MSWS-12 and EMIQ, including the estimation of walking, running and stair climbing ability, the subjective effort as well as necessary concentration and walking fluency during walking, the frequency of stumbling and the indication of avoidance of social activity achieved the best recall of 75%. The Gaussian Naive Bayes was the best classification model for detecting falls with almost all data sets. FS-Ensemble improved the classification models and is an appropriate technique for reducing data sets with a large number of features. Future research on other risk factors, such as fear of falling, could provide further insights.
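The ensemble idea in the abstract above (several filters each nominate their top-ranked features, and the nominations are then combined) can be sketched as a vote count. This is a minimal sketch under stated assumptions: "Union 2" and "Union 3" are read here as "selected by at least 2 (or 3) filters", which the abstract does not define, and the rankings and feature names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def top_share(ranking: List[str], share: float) -> Set[str]:
    """Keep the leading `share` of a ranked feature list (at least one feature)."""
    keep = max(1, math.ceil(share * len(ranking)))
    return set(ranking[:keep])


def combine(rankings: Dict[str, List[str]], share: float, min_votes: int) -> Set[str]:
    """Keep features that at least `min_votes` filters place in their top share."""
    votes: Counter = Counter()
    for ranking in rankings.values():
        votes.update(top_share(ranking, share))
    return {feature for feature, count in votes.items() if count >= min_votes}


# Hypothetical best-first rankings from four filter methods.
rankings = {
    "chi_square": ["f1", "f2", "f3", "f4"],
    "information_gain": ["f1", "f3", "f2", "f4"],
    "mrmr": ["f1", "f2", "f4", "f3"],
    "relief": ["f1", "f4", "f2", "f3"],
}

# With a 50% threshold, each filter nominates its two best features.
union = combine(rankings, share=0.5, min_votes=1)
union_2 = combine(rankings, share=0.5, min_votes=2)   # assumed meaning of "Union 2"
union_3 = combine(rankings, share=0.5, min_votes=3)   # assumed meaning of "Union 3"
intersection = combine(rankings, share=0.5, min_votes=len(rankings))
print(sorted(union), sorted(union_2), sorted(union_3), sorted(intersection))
```

Raising `min_votes` trades coverage for agreement: the plain union keeps anything a single filter favours, while the intersection keeps only features every filter agrees on.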

Identification of Candidate Salivary, Urinary and Serum Metabolic Biomarkers for High Litter Size Potential in Sows (Sus scrofa). Metabolites 2022;12:metabo12111045. [DOI: 10.3390/metabo12111045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 11/16/2022] Open

Abstract

The selection of sows that are reproductively fit and produce large litters of piglets is imperative for success in the pork industry. Currently, low heritability of reproductive and litter-related traits and unfavourable genetic correlations are slowing the improvement of pig selection efficiency. The integration of biomarkers as a supplement or alternative to the use of genetic markers may permit the optimization and increase of selection protocol efficiency. Metabolite biomarkers are an advantageous class of biomarkers that can facilitate the identification of cellular processes implicated in reproductive condition. Metabolism and metabolic biomarkers have been previously implicated in studies of female mammalian fertility, however a systematic analysis across multiple biofluids in infertile and high reproductive potential phenotypes has not been explored. In the current study, the serum, urinary and salivary metabolomes of infertile (INF) sows and high reproductive potential (HRP) sows with a live litter size ≥ 13 piglets were examined using LC-MS/MS techniques, and a data pipeline was used to highlight possible metabolite reproductive biomarkers discriminating the reproductive groups. The metabolomes of HRP and INF sows were distinct, including significant alterations in amino acid, fatty acid, membrane lipid and steroid hormone metabolism. Carnitines and fatty acid related metabolites were most discriminatory in separating and classifying the HRP and INF sows based on their biofluid metabolome. It appears that urine is a superior biofluid than saliva and serum for potentially predicting the reproductive potential level of a given female pig based on the performance of the resultant biomarker models. This study lays the groundwork for improving gilt and sow selection protocols using metabolomics as a tool for the prediction of reproductive potential.

Bernau CR, Knödler M, Emonts J, Jäpel RC, Buyel JF. The use of predictive models to develop chromatography-based purification processes. Front Bioeng Biotechnol 2022;10:1009102. [PMID: 36312533 PMCID: PMC9605695 DOI: 10.3389/fbioe.2022.1009102] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open

Dweekat OY, Lam SS. Cervical Cancer Diagnosis Using an Integrated System of Principal Component Analysis, Genetic Algorithm, and Multilayer Perceptron. Healthcare (Basel) 2022;10:healthcare10102002. [PMID: 36292449 PMCID: PMC9601935 DOI: 10.3390/healthcare10102002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 11/04/2022] Open

Colombelli F, Kowalski TW, Recamonde-Mendoza M. A hybrid ensemble feature selection design for candidate biomarkers discovery from transcriptome profiles. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/15/2022]

Romanishkin I, Savelieva T, Kosyrkova A, Okhlopkov V, Shugai S, Orlov A, Kravchuk A, Goryaynov S, Golbin D, Pavlova G, Pronin I, Loschenov V. Differentiation of glioblastoma tissues using spontaneous Raman scattering with dimensionality reduction and data classification. Front Oncol 2022;12:944210. [PMID: 36185245 PMCID: PMC9520479 DOI: 10.3389/fonc.2022.944210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open