Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Luo G. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. ACTA ACUST UNITED AC 2016;5. [DOI: 10.1007/s13721-016-0125-6] [Citation(s) in RCA: 130] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Cited by Other Article(s)

Yuan Y, Hu R, Chen S, Zhang X, Liu Z, Zhou G. CKG-IMC: An inductive matrix completion method enhanced by CKG and GNN for Alzheimer's disease compound-protein interactions prediction. Comput Biol Med 2024;177:108612. [PMID: 38838556 DOI: 10.1016/j.compbiomed.2024.108612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Revised: 04/17/2024] [Accepted: 05/11/2024] [Indexed: 06/07/2024]

Scott IA, De Guzman KR, Falconer N, Canaris S, Bonilla O, McPhail SM, Marxen S, Van Garderen A, Abdel-Hafez A, Barras M. Evaluating automated machine learning platforms for use in healthcare. JAMIA Open 2024;7:ooae031. [PMID: 38863963 PMCID: PMC11165368 DOI: 10.1093/jamiaopen/ooae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Revised: 03/06/2024] [Accepted: 04/22/2024] [Indexed: 06/13/2024] Open

Tan JM, Liao H, Liu W, Fan C, Huang J, Liu Z, Yan J. Hyperparameter optimization: Classics, acceleration, online, multi-objective, and tools. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024;21:6289-6335. [PMID: 39176427 DOI: 10.3934/mbe.2024275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/24/2024]

Cheng Z, Aitha M, Thomas CA, Sturgill A, Fairweather M, Hu A, Bethel CR, Rivera DD, Dranchak P, Thomas PW, Li H, Feng Q, Tao K, Song M, Sun N, Wang S, Silwal SB, Page RC, Fast W, Bonomo RA, Weese M, Martinez W, Inglese J, Crowder MW. Machine Learning Models Identify Inhibitors of New Delhi Metallo-β-lactamase. J Chem Inf Model 2024;64:3977-3991. [PMID: 38727192 PMCID: PMC11129921 DOI: 10.1021/acs.jcim.3c02015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/13/2024]

Affiliation(s)

Zishuo Cheng Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Mahesh Aitha Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
Caitlyn A. Thomas Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Aidan Sturgill Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Mitch Fairweather Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Amy Hu Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Christopher R. Bethel Research Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH 44106, USA
Dann D. Rivera Division of Chemical Biology and Medicinal Chemistry, College of Pharmacy, University of Texas, Austin, TX 78712, USA
Patricia Dranchak Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
Pei W. Thomas Division of Chemical Biology and Medicinal Chemistry, College of Pharmacy, University of Texas, Austin, TX 78712, USA
Han Li Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Qi Feng Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Kaicheng Tao Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Minshuai Song Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Na Sun Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Shuo Wang Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Surendra Bikram Silwal Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Richard C. Page Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Walt Fast Division of Chemical Biology and Medicinal Chemistry, College of Pharmacy, University of Texas, Austin, TX 78712, USA
Robert A. Bonomo Research Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH 44106, USA; Departments of Medicine, Biochemistry, Molecular Biology and Microbiology, Pharmacology, and Proteomics and Bioinformatics, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA; Clinician Scientist Investigator, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH 44106, USA; CWRU-Cleveland VAMC Center for Antimicrobial Resistance and Epidemiology (Case VA CARES), Cleveland, OH 44106, USA
Maria Weese Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
Waldyn Martinez Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA
James Inglese Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA; Metabolic Medicine Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20817, USA
Michael W. Crowder Department of Chemistry and Biochemistry, Miami University, Oxford, OH 45056, USA

Darsha Jayamini WK, Mirza F, Asif Naeem M, Chan AHY. Investigating Machine Learning Techniques for Predicting Risk of Asthma Exacerbations: A Systematic Review. J Med Syst 2024;48:49. [PMID: 38739297 PMCID: PMC11090925 DOI: 10.1007/s10916-024-02061-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 04/04/2024] [Indexed: 05/14/2024]

Vecchi E, Bassetti D, Graziato F, Pospíšil L, Horenko I. Gauge-Optimal Approximate Learning for Small Data Classification. Neural Comput 2024;36:1198-1227. [PMID: 38669692 DOI: 10.1162/neco_a_01664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 01/16/2024] [Indexed: 04/28/2024]

Prinzi F, Currieri T, Gaglio S, Vitabile S. Shallow and deep learning classifiers in medical image analysis. Eur Radiol Exp 2024;8:26. [PMID: 38438821 PMCID: PMC10912073 DOI: 10.1186/s41747-024-00428-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2023] [Accepted: 01/03/2024] [Indexed: 03/06/2024] Open

Abstract

An increasingly strong connection between artificial intelligence and medicine has enabled the development of predictive models capable of supporting physicians' decision-making. Artificial intelligence encompasses much more than machine learning, which has nevertheless been its most cited and most used sub-branch over the last decade. Since most clinical problems can be modeled through machine learning classifiers, it is essential to discuss their main elements. This review aims to give primary educational insights on the most accessible and widely employed classifiers in the radiology field, distinguishing between "shallow" learning (i.e., traditional machine learning) algorithms, including support vector machines, random forest and XGBoost, and "deep" learning architectures, including convolutional neural networks and vision transformers. In addition, the paper outlines the key steps for classifier training and highlights the differences between the most common algorithms and architectures. Although the choice of an algorithm depends on the task and the dataset at hand, general guidelines for classifier selection are proposed in relation to task analysis, dataset size, explainability requirements, and available computing resources. Considering the enormous interest in these innovative models and architectures, the problem of machine learning algorithm interpretability is finally discussed, providing a future perspective on trustworthy artificial intelligence.

Relevance statement

The growing synergy between artificial intelligence and medicine fosters predictive models aiding physicians. Machine learning classifiers, from shallow learning to deep learning, are offering crucial insights for the development of clinical decision support systems in healthcare. Explainability is a key feature of models that leads systems toward integration into clinical practice.

Key points

• Training a shallow classifier requires extracting disease-related features from regions of interest (e.g., radiomics).
• Deep classifiers implement automatic feature extraction and classification.
• Classifier selection is based on data and computational resource availability, task, and explanation needs.

Zhang Y, Li Q, Xin Y. Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks. Front Comput Neurosci 2024;18:1345575. [PMID: 38356726 PMCID: PMC10864458 DOI: 10.3389/fncom.2024.1345575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024] Open

Teng Z, Chen J, Wang J, Wu S, Chen R, Lin Y, Shen L, Jackson R, Zhou J, Yang C. Panicle-Cloud: An Open and AI-Powered Cloud Computing Platform for Quantifying Rice Panicles from Drone-Collected Imagery to Enable the Classification of Yield Production in Rice. PLANT PHENOMICS (WASHINGTON, D.C.) 2023;5:0105. [PMID: 37850120 PMCID: PMC10578299 DOI: 10.34133/plantphenomics.0105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023]

Abstract

Rice (Oryza sativa) is an essential staple food for many rice-consuming nations, hence the importance of improving its yield production under global climate change. To evaluate the yield performance of different rice varieties, yield-related traits such as panicle number per unit area (PNpM2) are key indicators, which have attracted much attention from many plant research groups. Nevertheless, it is still challenging to conduct large-scale screening of rice panicles to quantify the PNpM2 trait due to complex field conditions, a large variation of rice cultivars, and their panicle morphological features. Here, we present Panicle-Cloud, an open and artificial intelligence (AI)-powered cloud computing platform that is capable of quantifying rice panicles from drone-collected imagery. To facilitate the development of AI-powered detection models, we first established an open, diverse rice panicle detection dataset that was annotated by a group of rice specialists; then, we integrated several state-of-the-art deep learning models (including a preferred model called Panicle-AI) into the Panicle-Cloud platform, so that nonexpert users could select a pretrained model to detect rice panicles from their own aerial images. We trialed the AI models with images collected at different altitudes and growth stages, through which the right timing and preferred image resolutions for phenotyping rice panicles in the field were identified. Then, we applied the platform in a 2-season rice breeding trial to validate its biological relevance and classified yield production using the platform-derived PNpM2 trait from hundreds of rice varieties. Through correlation analysis between computational analysis and manual scoring, we found that the platform could quantify the PNpM2 trait reliably, based on which yield production was classified with high accuracy.
Hence, we trust that our work demonstrates a valuable advance in phenotyping the PNpM2 trait in rice, which provides a useful toolkit to enable rice breeders to screen and select desired rice varieties under field conditions.

Affiliation(s)

Zixuan Teng Digital Fujian Research Institute of Big Data for Agriculture and Forestry, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China; Key Laboratory of Smart Agriculture and Forestry (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou 350002, China
Jiawei Chen State Key Laboratory of Crop Genetics & Germplasm Enhancement, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, China
Jian Wang Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan 750002, China
Shuixiu Wu Digital Fujian Research Institute of Big Data for Agriculture and Forestry, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China; College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Riqing Chen Digital Fujian Research Institute of Big Data for Agriculture and Forestry, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Yaohai Lin Digital Fujian Research Institute of Big Data for Agriculture and Forestry, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Liyan Shen State Key Laboratory of Crop Genetics & Germplasm Enhancement, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, China
Robert Jackson Cambridge Crop Research, National Institute of Agricultural Botany (NIAB), Cambridge CB3 0LE, UK
Ji Zhou State Key Laboratory of Crop Genetics & Germplasm Enhancement, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, China; Cambridge Crop Research, National Institute of Agricultural Botany (NIAB), Cambridge CB3 0LE, UK
Changcai Yang Digital Fujian Research Institute of Big Data for Agriculture and Forestry, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China; Center for Agroforestry Mega Data Science, School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China

Abbas F, Zhang F, Ismail M, Khan G, Iqbal J, Alrefaei AF, Albeshr MF. Optimizing Machine Learning Algorithms for Landslide Susceptibility Mapping along the Karakoram Highway, Gilgit Baltistan, Pakistan: A Comparative Study of Baseline, Bayesian, and Metaheuristic Hyperparameter Optimization Techniques. SENSORS (BASEL, SWITZERLAND) 2023;23:6843. [PMID: 37571627 PMCID: PMC10422586 DOI: 10.3390/s23156843] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/24/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 08/13/2023]

Abstract

Machine learning algorithms have found extensive use in numerous fields and applications. One important aspect of using these algorithms effectively is tuning the hyperparameters to match the specific task at hand. The selection and configuration of hyperparameters directly impact the performance of machine learning models. Achieving optimal hyperparameter settings often requires a deep understanding of the underlying models and the appropriate optimization techniques. While there are many automatic optimization techniques available, each with its own advantages and disadvantages, this article focuses on hyperparameter optimization for well-known machine learning models. It explores cutting-edge optimization methods such as metaheuristic algorithms, deep learning-based optimization, Bayesian optimization, and quantum optimization, focusing mainly on metaheuristic and Bayesian optimization techniques, and provides guidance on applying them to different machine learning algorithms. The article also presents real-world applications of hyperparameter optimization by conducting tests on spatial data collections for landslide susceptibility mapping. Based on the experiment's results, both Bayesian optimization and metaheuristic algorithms showed promising performance compared to baseline algorithms. For instance, the metaheuristic algorithm boosted the random forest model's overall accuracy by 5% and 3%, respectively, from baseline optimization methods GS and RS, and by 4% and 2% from baseline optimization methods GA and PSO. Additionally, for models like KNN and SVM, Bayesian methods with Gaussian processes performed well. When compared to the baseline algorithms RS and GS, the accuracy of the KNN model was enhanced by BO-TPE by 1% and 11%, respectively, and by BO-GP by 2% and 12%, respectively. For SVM, BO-TPE outperformed GS and RS by 6%, while BO-GP improved results by 5%.
The paper thoroughly discusses the reasons behind the efficiency of these algorithms. By successfully identifying appropriate hyperparameter configurations, this research paper aims to assist researchers, spatial data analysts, and industrial users in developing machine learning models more effectively. The findings and insights provided in this paper can contribute to enhancing the performance and applicability of machine learning algorithms in various domains.
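
The two baseline strategies named in this abstract can be illustrated with a minimal sketch, assuming GS and RS denote grid search and random search. The objective `toy_validation_error`, the hyperparameter names, and the search ranges are hypothetical stand-ins chosen for illustration; they are not taken from the paper, and no model is actually trained.

```python
import itertools
import random


def toy_validation_error(n_neighbors: int, weight_power: float) -> float:
    """Hypothetical stand-in for a cross-validated error; lowest at (7, 1.5)."""
    return (n_neighbors - 7) ** 2 / 100.0 + (weight_power - 1.5) ** 2


def grid_search(k_values, p_values):
    """Evaluate every grid combination; return (error, k, p) of the best one."""
    return min(
        (toy_validation_error(k, p), k, p)
        for k, p in itertools.product(k_values, p_values)
    )


def random_search(n_trials, k_range, p_range, seed=0):
    """Sample configurations uniformly at random; return (error, k, p) of the best."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    best = None
    for _ in range(n_trials):
        k = rng.randint(*k_range)  # integer hyperparameter, inclusive bounds
        p = rng.uniform(*p_range)  # continuous hyperparameter
        candidate = (toy_validation_error(k, p), k, p)
        if best is None or candidate < best:
            best = candidate
    return best


# Both strategies get the same budget of 16 evaluations.
gs_best = grid_search(k_values=[1, 5, 10, 15], p_values=[0.0, 1.0, 2.0, 3.0])
rs_best = random_search(n_trials=16, k_range=(1, 15), p_range=(0.0, 3.0))
print("grid search best:  ", gs_best)
print("random search best:", rs_best)
```

Grid search can only return points that lie on the predefined grid, whereas random search may sample values between grid points. Bayesian and metaheuristic methods, as discussed in the abstract, instead use the results of earlier evaluations to decide where to search next.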

Musigmann M, Nacul NG, Kasap DN, Heindel W, Mannil M. Use Test of Automated Machine Learning in Cancer Diagnostics. Diagnostics (Basel) 2023;13:2315. [PMID: 37510059 PMCID: PMC10378334 DOI: 10.3390/diagnostics13142315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 06/30/2023] [Accepted: 07/03/2023] [Indexed: 07/30/2023] Open

Thölke P, Mantilla-Ramos YJ, Abdelhedi H, Maschke C, Dehgan A, Harel Y, Kemtur A, Mekki Berrada L, Sahraoui M, Young T, Bellemare Pépin A, El Khantour C, Landry M, Pascarella A, Hadid V, Combrisson E, O'Byrne J, Jerbi K. Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data. Neuroimage 2023:120253. [PMID: 37385392 DOI: 10.1016/j.neuroimage.2023.120253] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 06/05/2023] [Accepted: 06/26/2023] [Indexed: 07/01/2023] Open

Abstract

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
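
The contrast between Acc and BAcc described in this abstract can be reproduced with a minimal sketch. It assumes a binary problem with a 90/10 class split and a classifier that always predicts the majority class; the data and function names are illustrative and do not come from the authors' open-source code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions (Acc)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Arithmetic mean of sensitivity and specificity (BAcc).

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return (sensitivity + specificity) / 2


# Illustrative imbalanced data: 90 majority samples (0), 10 minority samples (1).
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always votes for the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Acc rewards the degenerate classifier with 0.9, which simply mirrors the class ratio, while BAcc stays at the chance level of 0.5.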

Affiliation(s)

Philipp Thölke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institute of Cognitive Science, Osnabrück University, Neuer Graben 29/Schloss, Osnabrück, 49074, Lower Saxony, Germany.
Yorguin-Jose Mantilla-Ramos Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Neuropsychology and Behavior Group (GRUNECO), Faculty of Medicine, Universidad de Antioquia, 53-108, Medellin, Aranjuez, Medellin, 050010, Colombia
Hamza Abdelhedi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Charlotte Maschke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Integrated Program in Neuroscience, McGill University, 1033 Pine Ave, Montreal, H3A 0G4, Canada
Arthur Dehgan Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Yann Harel Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Anirudha Kemtur Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Loubna Mekki Berrada Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Myriam Sahraoui Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Tammy Young Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Computing Science, University of Alberta, 116 St & 85 Ave, Edmonton, T6G 2R3, AB, Canada
Antoine Bellemare Pépin Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Music, Concordia University, 1550 De Maisonneuve Blvd. W., Montreal, H3H 1G8, QC, Canada
Clara El Khantour Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Mathieu Landry Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Annalisa Pascarella Institute for Applied Mathematics Mauro Picone, National Research Council, Roma, Italy
Vanessa Hadid Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Etienne Combrisson Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Jordan O'Byrne Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Karim Jerbi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Mila (Quebec Machine Learning Institute), 6666 Rue Saint-Urbain, Montreal, H2S 3H1, QC, Canada; UNIQUE Centre (Quebec Neuro-AI Research Centre), 3744 rue Jean-Brillant, Montreal, H3T 1P1, QC, Canada

Haredasht FN, Vanhoutte L, Vens C, Pottel H, Viaene L, De Corte W. Validated risk prediction models for outcomes of acute kidney injury: a systematic review. BMC Nephrol 2023;24:133. [PMID: 37161365 PMCID: PMC10170731 DOI: 10.1186/s12882-023-03150-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 04/03/2023] [Indexed: 05/11/2023] Open

Machine Learning Models to Forecast Outcomes of Pituitary Surgery: A Systematic Review in Quality of Reporting and Current Evidence. Brain Sci 2023;13:brainsci13030495. [PMID: 36979305 PMCID: PMC10046799 DOI: 10.3390/brainsci13030495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Revised: 03/08/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023] Open

Abstract

Background: The complex nature and heterogeneity involving pituitary surgery results have increased interest in machine learning (ML) applications for prediction of outcomes over the last decade. This study aims to systematically review the characteristics of ML models involving pituitary surgery outcome prediction and assess their reporting quality. Methods: We searched the PubMed, Scopus, and Web of Knowledge databases for publications on the use of ML to predict pituitary surgery outcomes. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) to assess report quality. Our search strategy was based on the terms “artificial intelligence”, “machine learning”, and “pituitary”. Results: 20 studies were included in this review. The principal models reported in each article were post-surgical endocrine outcomes (n = 10), tumor management (n = 3), and intra- and postoperative complications (n = 7). Overall, the included studies adhered to a median of 65% (IQR = 60–72%) of TRIPOD criteria, ranging from 43% to 83%. The median reported AUC was 0.84 (IQR = 0.80–0.91). The most popular algorithms were support vector machine (n = 5) and random forest (n = 5). Only two studies reported external validation and adherence to any reporting guideline. Calibration methods were not reported in 15 studies. No model achieved the phase of actual clinical applicability. Conclusion: Applications of ML in the prediction of pituitary outcomes are still nascent, as evidenced by the lack of any model validated for clinical practice. Although studies have demonstrated promising results, greater transparency in model development and reporting is needed to enable their use in clinical practice. Further adherence to reporting guidelines can help increase AI’s real-world utility and improve clinical practice.

Clinicians’ Guide to Artificial Intelligence in Colon Capsule Endoscopy—Technology Made Simple. Diagnostics (Basel) 2023;13:diagnostics13061038. [PMID: 36980347 PMCID: PMC10047552 DOI: 10.3390/diagnostics13061038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/12/2023] Open

Le TD, Noumeir R, Rambaud J, Sans G, Jouvet P. Adaptation of Autoencoder for Sparsity Reduction From Clinical Notes Representation Learning. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2023;11:469-478. [PMID: 37817825 PMCID: PMC10561736 DOI: 10.1109/jtehm.2023.3241635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/01/2022] [Revised: 01/08/2023] [Accepted: 01/30/2023] [Indexed: 10/12/2023]

Abstract

When dealing with clinical text classification on a small dataset, recent studies have confirmed that a well-tuned multilayer perceptron outperforms other generative classifiers, including deep learning ones. To increase the performance of the neural network classifier, feature selection for the learning representation can effectively be used. However, most feature selection methods only estimate the degree of linear dependency between variables and select the best features based on univariate statistical tests. Furthermore, the sparsity of the feature space involved in the learning representation is ignored.

GOAL

Our aim is, therefore, to assess an alternative approach that tackles the sparsity by compressing the clinical representation feature space, where limited French clinical notes can also be dealt with effectively.

METHODS

This study proposed an autoencoder learning algorithm to take advantage of sparsity reduction in clinical note representation. The motivation was to determine how to compress sparse, high-dimensional data by reducing the dimension of the clinical note representation feature space. The classification performance of the classifiers was then evaluated in the trained and compressed feature space.

RESULTS

The proposed approach provided overall performance gains of up to 3% for each test set evaluation. Finally, the classifier achieved 92% accuracy, 91% recall, 91% precision, and 91% f1-score in detecting the patient's condition. Furthermore, the compression working mechanism and the autoencoder prediction process were demonstrated by applying the theoretic information bottleneck framework.

Clinical and Translational Impact Statement: An autoencoder learning algorithm effectively tackles the problem of sparsity in the representation feature space from a small clinical narrative dataset. Significantly, it can learn the best representation of the training data because of its lossless compression capacity compared to other approaches. Consequently, its downstream classification ability can be significantly improved, which cannot be done using deep learning models.

Physics-Informed Recurrent Neural Networks and Hyper-parameter Optimization for Dynamic Process Systems. Comput Chem Eng 2023. [DOI: 10.1016/j.compchemeng.2023.108195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/21/2023]

Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes (Basel) 2023. [DOI: 10.3390/pr11020349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023] Open

Analysis of Chest X-ray for COVID-19 Diagnosis as a Use Case for an HPC-Enabled Data Analysis and Machine Learning Platform for Medical Diagnosis Support. Diagnostics (Basel) 2023;13:diagnostics13030391. [PMID: 36766496 PMCID: PMC9914706 DOI: 10.3390/diagnostics13030391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Revised: 01/14/2023] [Accepted: 01/18/2023] [Indexed: 01/24/2023] Open

Babayoff O, Shehory O, Geller S, Shitrit-Niselbaum C, Weiss-Meilik A, Sprecher E. Improving Hospital Outpatient Clinics Appointment Schedules by Prediction Models. J Med Syst 2022;47:5. [PMID: 36585996 DOI: 10.1007/s10916-022-01902-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2022] [Accepted: 12/14/2022] [Indexed: 01/01/2023]

A survey on multi-objective hyperparameter optimization algorithms for machine learning. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

Reuse, Reduce, Support: Design Principles for Green Data Mining. BUSINESS & INFORMATION SYSTEMS ENGINEERING 2022. [DOI: 10.1007/s12599-022-00780-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

García-Gutierrez F, Díaz-Álvarez J, Matias-Guiu JA, Pytel V, Matías-Guiu J, Cabrera-Martín MN, Ayala JL. GA-MADRID: design and validation of a machine learning tool for the diagnosis of Alzheimer’s disease and frontotemporal dementia using genetic algorithms. Med Biol Eng Comput 2022;60:2737-2756. [PMID: 35852735 PMCID: PMC9365756 DOI: 10.1007/s11517-022-02630-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Accepted: 06/29/2022] [Indexed: 01/03/2023]

Jalaeian Zaferani E, Teshnehlab M, Khodadadian A, Heitzinger C, Vali M, Noii N, Wick T. Hyper-Parameter Optimization of Stacked Asymmetric Auto-Encoders for Automatic Personality Traits Perception. SENSORS (BASEL, SWITZERLAND) 2022;22:s22166206. [PMID: 36015967 PMCID: PMC9413006 DOI: 10.3390/s22166206] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Revised: 08/15/2022] [Accepted: 08/16/2022] [Indexed: 05/27/2023]

Sun Y, Pfahringer B, Gomes HM, Bifet A. SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams. Data Min Knowl Discov 2022. [DOI: 10.1007/s10618-022-00858-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Musigmann M, Akkurt BH, Krähling H, Nacul NG, Remonda L, Sartoretti T, Henssen D, Brokinkel B, Stummer W, Heindel W, Mannil M. Testing the applicability and performance of Auto ML for potential applications in diagnostic neuroradiology. Sci Rep 2022;12:13648. [PMID: 35953588 PMCID: PMC9366823 DOI: 10.1038/s41598-022-18028-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2022] [Accepted: 08/03/2022] [Indexed: 11/25/2022] Open

Abstract

To investigate the applicability and performance of automated machine learning (AutoML) for potential applications in diagnostic neuroradiology. In the medical sector, there is a rapidly growing demand for machine learning methods, but only a limited number of corresponding experts. The comparatively simple handling of AutoML should enable even non-experts to develop adequate machine learning models with manageable effort. We aim to investigate the feasibility as well as the advantages and disadvantages of developing AutoML models compared to developing conventional machine learning models. We discuss the results in relation to a concrete example of a medical prediction application. In this retrospective IRB-approved study, a cohort of 107 patients who underwent gross total meningioma resection and a second cohort of 31 patients who underwent subtotal resection were included. Image segmentation of the contrast enhancing parts of the tumor was performed semi-automatically using the open-source software platform 3D Slicer. A total of 107 radiomic features were extracted from hand-delineated regions of interest in the pre-treatment MRI images of each patient. Within the AutoML approach, 20 different machine learning algorithms were trained and tested simultaneously. For comparison, a neural network and different conventional machine learning algorithms were trained and tested. With respect to the exemplary medical prediction application used in this study to evaluate the performance of AutoML, namely the pre-treatment prediction of the achievable resection status of meningioma, AutoML achieved remarkable performance nearly equivalent to that of a feed-forward neural network with a single hidden layer. However, in the clinical case study considered here, logistic regression outperformed the AutoML algorithm.
Using independent test data, we observed the following classification results (AutoML/neural network/logistic regression): mean area under the curve = 0.849/0.879/0.900, mean accuracy = 0.821/0.839/0.881, mean kappa = 0.465/0.491/0.644, mean sensitivity = 0.578/0.577/0.692 and mean specificity = 0.891/0.914/0.936. The results obtained with AutoML are therefore very promising. However, the AutoML models in our study did not yet show the corresponding performance of the best models obtained with conventional machine learning methods. While AutoML may facilitate and simplify the task of training and testing machine learning algorithms as applied in the field of neuroradiology and medical imaging, a considerable amount of expert knowledge may still be needed to develop models with the highest possible discriminatory power for diagnostic neuroradiology.
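The metrics listed in this abstract (accuracy, kappa, sensitivity, specificity) can all be derived from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the study's code; the function name `binary_metrics` and the counts in the example are invented for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate

    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    chance_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=6, fp=2, tn=18, fn=4)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because kappa subtracts the agreement expected by chance, it can be much lower than accuracy when one class dominates, which is one reason to report both.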

Cao X, Chen H, Li Y, Peng Y, Zhou Y, Cheng L, Liu T, Shen D. Auto-DenseUNet: Searchable neural network architecture for mass segmentation in 3D automated breast ultrasound. Med Image Anal 2022;82:102589. [DOI: 10.1016/j.media.2022.102589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2021] [Revised: 07/18/2022] [Accepted: 08/17/2022] [Indexed: 11/15/2022]

A Romero RA, Y Deypalan MN, Mehrotra S, Jungao JT, Sheils NE, Manduchi E, Moore JH. Benchmarking AutoML frameworks for disease prediction using medical claims. BioData Min 2022;15:15. [PMID: 35883154 PMCID: PMC9327416 DOI: 10.1186/s13040-022-00300-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 06/27/2022] [Indexed: 11/10/2022] Open

Abstract

Objectives

Ascertain and compare the performances of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.

Materials and Methods

We generated a large dataset using historical de-identified administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics.

Results

The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications.

Discussion

Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance.

Conclusion

Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.

Supplementary Information

The online version contains supplementary material available at (10.1186/s13040-022-00300-2).
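The abstract above mentions selecting a threshold that balances true and false positive rates but does not say which criterion was used. One common choice, assumed here purely for illustration, is Youden's J statistic (TPR minus FPR). The function name `best_threshold` and the scores and labels below are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / positives
        fpr = fp / negatives
        j = tpr - fpr
        if best is None or j > best[0]:
            best = (j, t, tpr, fpr)

    _, threshold, tpr, fpr = best
    return threshold, tpr, fpr


# Hypothetical model scores and true labels (1 = disease present).
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
print(best_threshold(scores, labels))
```

Youden's J weights both error types equally. In a clinical setting where a missed case costs more than a false alarm, a different criterion may be more appropriate, in line with the abstract's point that the threshold should follow the practical application.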

Batta I, Abrol A, Fu Z, Preda A, van Erp TG, Calhoun VD. Building Models of Functional Interactions Among Brain Domains that Encode Varying Information Complexity: A Schizophrenia Case Study. Neuroinformatics 2022;20:777-791. [PMID: 35267145 PMCID: PMC9463406 DOI: 10.1007/s12021-022-09563-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/12/2022] [Indexed: 12/31/2022]

Abstract

Revealing associations among various structural and functional patterns of the brain can yield highly informative results about the healthy and disordered brain. Studies using neuroimaging data have more recently begun to utilize the information within as well as across various functional and anatomical domains (i.e., groups of brain networks). However, most whole-brain approaches assume similar complexity of interactions throughout the brain. Here we investigate the hypothesis that interactions between brain networks capture varying amounts of complexity, and that we can better capture this information by varying the complexity of the model subspace structure based on available training data. To do this, we employ a Bayesian optimization-based framework known as the Tree Parzen Estimator (TPE) to identify, exploit and analyze patterns of variation in the information encoded by temporal information extracted from functional magnetic resonance imaging (fMRI) subdomains of the brain. Using a repeated cross-validation procedure on a schizophrenia classification task, we demonstrate evidence that interactions between specific functional subdomains are better characterized by more sophisticated model architectures compared to less complicated ones required by the others for optimally contributing towards classification and understanding the brain's functional interactions. We show that functional subdomains known to be involved in schizophrenia require more complex architectures to optimally unravel discriminatory information about the disorder. Our study points to the need for adaptive, hierarchical learning frameworks that cater differently to the features from different subdomains, not only for a better prediction but also for enabling the identification of features predicting the outcome of interest.

Determining the Capability of the Tree-Based Pipeline Optimization Tool (TPOT) in Mapping Parthenium Weed Using Multi-Date Sentinel-2 Image Data. REMOTE SENSING 2022. [DOI: 10.3390/rs14071687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]

KOJIMA T, OISHI K, AOKI N, MATSUBARA Y, UETE T, FUKUSHIMA Y, INOUE G, SATO S, SHIRAISHI T, HIROOKA H, MASUDA T. Estimation of beef cow body condition score: a machine learning approach using three-dimensional image data and a simple approach with heart girth measurements. Livest Sci 2022. [DOI: 10.1016/j.livsci.2021.104816] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Al Handawi K, Kokkolaras M. Optimization of Infectious Disease Prevention and Control Policies Using Artificial Life. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2022. [DOI: 10.1109/tetci.2021.3107496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

Liang R, Duan X, Zhang J, Yuan Z. Bayesian based reaction optimization for complex continuous gas–liquid–solid reactions. REACT CHEM ENG 2022. [DOI: 10.1039/d1re00397f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Razzaq M, Clément F, Yvinec R. An overview of deep learning applications in precocious puberty and thyroid dysfunction. Front Endocrinol (Lausanne) 2022;13:959546. [PMID: 36339395 PMCID: PMC9632447 DOI: 10.3389/fendo.2022.959546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Accepted: 09/16/2022] [Indexed: 11/24/2022] Open

Kingsmore KM, Puglisi CE, Grammer AC, Lipsky PE. An introduction to machine learning and analysis of its use in rheumatic diseases. Nat Rev Rheumatol 2021;17:710-730. [PMID: 34728818 DOI: 10.1038/s41584-021-00708-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/04/2021] [Indexed: 02/07/2023]

Feature-Based Multi-Class Classification and Novelty Detection for Fault Diagnosis of Industrial Machinery. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11209580] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Sethi M, Ahuja S, Rani S, Bawa P, Zaguia A. Classification of Alzheimer's Disease Using Gaussian-Based Bayesian Parameter Optimization for Deep Convolutional LSTM Network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021;2021:4186666. [PMID: 34646334 PMCID: PMC8505090 DOI: 10.1155/2021/4186666] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2021] [Revised: 09/21/2021] [Accepted: 09/22/2021] [Indexed: 01/22/2023]

Rashidi HH, Tran N, Albahra S, Dang LT. Machine learning in health care and laboratory medicine: General overview of supervised learning and Auto-ML. Int J Lab Hematol 2021;43 Suppl 1:15-22. [PMID: 34288435 DOI: 10.1111/ijlh.13537] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Revised: 03/17/2021] [Accepted: 03/25/2021] [Indexed: 11/27/2022]

Rahmatbakhsh M, Gagarinova A, Babu M. Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections. Front Genet 2021;12:667936. [PMID: 34276775 PMCID: PMC8283032 DOI: 10.3389/fgene.2021.667936] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Accepted: 06/08/2021] [Indexed: 12/13/2022] Open

Abstract

Microbial pathogens have evolved numerous mechanisms to hijack host's systems, thus causing disease. This is mediated by alterations in the combined host-pathogen proteome in time and space. Mass spectrometry-based proteomics approaches have been developed and tailored to map disease progression. The result is complex multidimensional data that pose numerous analytic challenges for downstream interpretation. However, a systematic review of approaches for the downstream analysis of such data has been lacking in the field. In this review, we detail the steps of a typical temporal and spatial analysis, including data pre-processing steps (i.e., quality control, data normalization, the imputation of missing values, and dimensionality reduction), different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data. We also discuss current best practices for these steps based on a collection of independent studies to guide users in selecting the most suitable strategies for their dataset and analysis objectives. Moreover, we also compiled the list of commonly used R software packages for each step of the analysis. These could be easily integrated into one's analysis pipeline. Furthermore, we guide readers through various analysis steps by applying these workflows to mock and host-pathogen interaction data from public datasets. The workflows presented in this review will serve as an introduction for data analysis novices, while also helping established users update their data analysis pipelines. We conclude the review by discussing future directions and developments in temporal and spatial proteomics and data analysis approaches. Data analysis codes, prepared for this review are available from https://github.com/BabuLab-UofR/TempSpac, where guidelines and sample datasets are also offered for testing purposes.

Meti N, Sadeghi-Naini A, Tran WT. Reply to A. Pfob et al. JCO Clin Cancer Inform 2021;5:656-657. [PMID: 34110932 DOI: 10.1200/cci.21.00059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Wani MA, Roy KK. Development and validation of consensus machine learning-based models for the prediction of novel small molecules as potential anti-tubercular agents. Mol Divers 2021;26:1345-1356. [PMID: 34110578 DOI: 10.1007/s11030-021-10238-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Accepted: 05/27/2021] [Indexed: 11/30/2022]

Abstract

Tuberculosis (TB) is an infectious disease and the leading cause of death globally. The rapidly emerging cases of drug resistance among pathogenic mycobacteria have been a global threat, creating an urgent need for new drug discovery and development. However, because new drug discovery and development are commonly lengthy and costly processes, strategic use of cutting-edge machine learning (ML) algorithms may help reduce both the cost and time involved. Considering the urgency of new drugs for TB, herein, we have attempted to develop predictive ML-based models useful in the selection of novel potential small molecules for subsequent in vitro validation. For this purpose, we used the GlaxoSmithKline (GSK) TCAMS TB dataset comprising a total of 776 hits that were made publicly available to the wider scientific community through the ChEMBL Neglected Tropical Diseases (ChEMBL-NTD) database. After exploring the different ML classifiers, viz. decision trees (DT), support vector machine (SVM), random forest (RF), Bernoulli Naive Bayes (BNB), K-nearest neighbors (k-NN), and linear logistic regression (LLR), and ensemble learning models (bagging and Adaboost) for training the model using the GSK dataset, we arrived at the three best models, viz. Adaboost decision tree (ABDT), RF classifier, and k-NN models, which gave the top prediction results for both the training and test sets. However, during the prediction of the external set of known anti-tubercular compounds/drugs, we found that each of these models had some limitations. The ABDT model correctly predicted 22 molecules as actives, while both the RF and k-NN models predicted 18 molecules correctly as actives; a number of molecules were predicted as actives by two of these models, while the third model predicted these compounds as inactives.
Therefore, we concluded that while deciding the anti-tubercular potential of a new molecule, one should rely on the use of consensus predictions using these three models; it may lessen the attrition rate during the in vitro validation. We believe that this study may assist the wider anti-tuberculosis research community by providing a platform for predicting small molecules with subsequent validation for drug discovery and development.
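The abstract recommends consensus predictions across the ABDT, RF, and k-NN models but does not spell out the consensus rule. A minimal sketch, assuming a simple majority vote (at least two of three models predict "active"), might look like this; the function name, the `min_agree` parameter, and the compound names and votes are all hypothetical.

```python
def consensus_prediction(votes, min_agree=2):
    """Return 1 (active) if at least `min_agree` models predict active, else 0.

    `votes` maps a model name to its binary prediction (1 = active, 0 = inactive).
    """
    return 1 if sum(votes.values()) >= min_agree else 0


# Hypothetical per-model predictions for three made-up compounds.
predictions = {
    "compound_A": {"ABDT": 1, "RF": 1, "k-NN": 0},  # two of three agree: active
    "compound_B": {"ABDT": 1, "RF": 0, "k-NN": 0},  # only one model: inactive
    "compound_C": {"ABDT": 1, "RF": 1, "k-NN": 1},  # unanimous: active
}

for name, votes in predictions.items():
    print(name, consensus_prediction(votes))
```

Raising `min_agree` to 3 would require unanimity, trading fewer false actives for more missed actives.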

Pascual-Triana JD, Charte D, Andrés Arroyo M, Fernández A, Herrera F. Revisiting data complexity metrics based on morphology for overlap and imbalance: snapshot, new overlap number of balls metrics and singular problems prospect. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01577-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Krittanawong C, Virk HUH, Kumar A, Aydar M, Wang Z, Stewart MP, Halperin JL. Machine learning and deep learning to predict mortality in patients with spontaneous coronary artery dissection. Sci Rep 2021;11:8992. [PMID: 33903608 PMCID: PMC8076284 DOI: 10.1038/s41598-021-88172-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Accepted: 03/23/2021] [Indexed: 12/30/2022] Open

Abstract

Machine learning (ML) and deep learning (DL) can successfully predict high prevalence events in very large databases (big data), but the value of this methodology for risk prediction in smaller cohorts with uncommon diseases and infrequent events is uncertain. The clinical course of spontaneous coronary artery dissection (SCAD) is variable, and no reliable methods are available to predict mortality. Based on the hypothesis that machine learning (ML) and deep learning (DL) techniques could enhance the identification of patients at risk, we applied a deep neural network to information available in electronic health records (EHR) to predict in-hospital mortality in patients with SCAD. We extracted patient data from the EHR of an extensive urban health system and applied several ML and DL models using candidate clinical variables potentially associated with mortality. We partitioned the data into training and evaluation sets with cross-validation. We estimated model performance based on the area under the receiver-operator characteristics curve (AUC) and balanced accuracy. As sensitivity analyses, we examined results limited to cases with complete clinical information available. We identified 375 SCAD patients of which mortality during the index hospitalization was 11.5%. The best-performing DL algorithm identified in-hospital mortality with AUC 0.98 (95% CI 0.97-0.99), compared to other ML models (P < 0.0001). For prediction of mortality using ML models in patients with SCAD, the AUC ranged from 0.50 with the random forest method (95% CI 0.41-0.58) to 0.95 with the AdaBoost model (95% CI 0.93-0.96), with intermediate performance using logistic regression, decision tree, support vector machine, K-nearest neighbors, and extreme gradient boosting methods. A deep neural network model was associated with higher predictive accuracy and discriminative power than logistic regression or ML models for identification of patients with ACS due to SCAD prone to early mortality.

Paleico ML, Behler J. A bin and hash method for analyzing reference data and descriptors in machine learning potentials. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abe663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Abstract

In recent years the development of machine learning potentials (MLPs) has become a very active field of research. Numerous approaches have been proposed, which allow one to perform extended simulations of large systems at a small fraction of the computational costs of electronic structure calculations. The key to the success of modern MLPs is the close-to first principles quality description of the atomic interactions. This accuracy is reached by using very flexible functional forms in combination with high-level reference data from electronic structure calculations. These data sets can include up to hundreds of thousands of structures covering millions of atomic environments to ensure that all relevant features of the potential energy surface are well represented. The handling of such large data sets is nowadays becoming one of the main challenges in the construction of MLPs. In this paper we present a method, the bin-and-hash (BAH) algorithm, to overcome this problem by enabling the efficient identification and comparison of large numbers of multidimensional vectors. Such vectors emerge in multiple contexts in the construction of MLPs. Examples are the comparison of local atomic environments to identify and avoid unnecessary redundant information in the reference data sets that is costly in terms of both the electronic structure calculations as well as the training process, the assessment of the quality of the descriptors used as structural fingerprints in many types of MLPs, and the detection of possibly unreliable data points. The BAH algorithm is illustrated for the example of high-dimensional neural network potentials using atom-centered symmetry functions for the geometrical description of the atomic environments, but the method is general and can be combined with any current type of MLP.

Li J, Zhou Z, Dong J, Fu Y, Li Y, Luan Z, Peng X. Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS One 2021;16:e0250370. [PMID: 33861809 PMCID: PMC8051758 DOI: 10.1371/journal.pone.0250370] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 04/06/2021] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Accurately predicting the survival rate of breast cancer patients is a major issue for cancer researchers. Machine learning (ML) has attracted much attention with the hope that it could provide accurate results, but its modeling methods and prediction performance remain controversial. The aim of this systematic review is to identify and critically appraise current studies regarding the application of ML in predicting the 5-year survival rate of breast cancer.

METHODS

In accordance with the PRISMA guidelines, two researchers independently searched the PubMed (including MEDLINE), Embase, and Web of Science Core databases from inception to November 30, 2020. The search terms included breast neoplasms, survival, machine learning, and specific algorithm names. Studies were included if they used ML to build a breast cancer survival prediction model whose performance could be measured by validation results. Studies were excluded if the modeling process was not explained clearly or the information was incomplete. The extracted information included literature information, database information, data preparation and modeling process information, model construction and performance evaluation information, and candidate predictor information.

RESULTS

Thirty-one studies that met the inclusion criteria were included, most of which were published after 2013. The most frequently used ML methods were decision trees (19 studies, 61.3%), artificial neural networks (18 studies, 58.1%), support vector machines (16 studies, 51.6%), and ensemble learning (10 studies, 32.3%). The median sample size was 37256 (range 200 to 659820) patients, and the median number of predictors was 16 (range 3 to 625). The accuracy of 29 studies ranged from 0.510 to 0.971. The sensitivity of 25 studies ranged from 0.037 to 1. The specificity of 24 studies ranged from 0.008 to 0.993. The AUC of 20 studies ranged from 0.500 to 0.972. The precision of 6 studies ranged from 0.549 to 1. All of the models were internally validated, and only one was externally validated.

CONCLUSIONS

Overall, compared with traditional statistical methods, the performance of ML models does not necessarily show any improvement, and this area of research still faces limitations related to a lack of data preprocessing steps, the excessive differences of sample feature selection, and issues related to validation. Further optimization of the performance of the proposed model is also needed in the future, which requires more standardization and subsequent validation.

Mapping Opuntia stricta in the Arid and Semi-Arid Environment of Kenya Using Sentinel-2 Imagery and Ensemble Machine Learning Classifiers. REMOTE SENSING 2021. [DOI: 10.3390/rs13081494] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]

Abstract

Globally, grassland biomes form one of the largest terrestrial covers and present critical social–ecological benefits. In Kenya, Arid and Semi-arid Lands (ASAL) occupy 80% of the landscape and are critical for the livelihoods of millions of pastoralists. However, they have been invaded by Invasive Plant Species (IPS), thereby compromising their ecosystem functionality. Opuntia stricta, a well-known IPS, has invaded the ASAL in Kenya and poses a threat to pastoralism, leading to livestock mortality and land degradation. Thus, identification and detailed estimation of its cover is essential for drawing up an effective management strategy. The study aimed at utilizing the Sentinel-2 multispectral sensor to detect Opuntia stricta in a heterogeneous ASAL in Laikipia County, using ensemble machine learning classifiers. To illustrate the potential of Sentinel-2, the detection of Opuntia stricta was based on only the spectral bands as well as in combination with vegetation and topographic indices using Extreme Gradient Boost (XGBoost) and Random Forest (RF) classifiers to detect the abundance. Study results showed that the overall accuracies of Sentinel-2 spectral bands were 80% and 84.4%, while that of combined spectral bands, vegetation, and topographic indices was 89.2% and 92.4% for XGBoost and RF classifiers, respectively. The inclusion of topographic indices that enhance characterization of biological processes, and vegetation indices that minimize the influence of soil and the effects of atmosphere, contributed to improving the accuracy of the classification. Qualitatively, Opuntia stricta was found along river banks, flood plains, and near settlements but was limited in forested areas. Our results demonstrated the potential of Sentinel-2 multispectral sensors to effectively detect and map Opuntia stricta in a complex heterogeneous ASAL, which can support conservation and rangeland management policies that aim to map and list threatened areas, and conserve the biodiversity and productivity of rangeland ecosystems.

Kalina J, Neoral A, Vidnerová P. Effective Automatic Method Selection for Nonlinear Regression Modeling. Int J Neural Syst 2021;31:2150020. [PMID: 33787471 DOI: 10.1142/s0129065721500209] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

Zhou W, Luo G. Parameter Sensitivity Analysis for the Progressive Sampling-Based Bayesian Optimization Method for Automated Machine Learning Model Selection. Heterogeneous Data Management, Polystores, and Analytics for Healthcare: VLDB Workshops, Poly 2020 and DMAH 2020, Virtual Event, August 31 and September 4, 2020: Revised Selected Papers 2021;12633:213-227. [PMID: 33768220 DOI: 10.1007/978-3-030-71055-2_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]

Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers 2021. [DOI: 10.3390/computers10020024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022]

Abstract: Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied to easier settings with structured data such as tabular lab data. However, there is still a need to apply AutoML to interpreting medical text, which is being generated at a tremendous rate. A promising method for this is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study of AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature on AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss the challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, can revolutionize patient outcomes.

Shields BJ, Stevens J, Li J, Parasram M, Damani F, Alvarado JIM, Janey JM, Adams RP, Doyle AG. Bayesian reaction optimization as a tool for chemical synthesis. Nature 2021;590:89-96. [DOI: 10.1038/s41586-021-03213-y] [Citation(s) in RCA: 132] [Impact Index Per Article: 44.0] [Received: 06/24/2020] [Accepted: 12/11/2020] [Indexed: 02/04/2023]