51
Ibrahim M, Beneyto A, Contreras I, Vehi J. An ensemble machine learning approach for the detection of unannounced meals to enhance postprandial glucose control. Comput Biol Med 2024; 171:108154. [PMID: 38382387 DOI: 10.1016/j.compbiomed.2024.108154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 02/02/2024] [Accepted: 02/12/2024] [Indexed: 02/23/2024]
Abstract
BACKGROUND: Hybrid automated insulin delivery systems enhance postprandial glucose control in type 1 diabetes; however, meal announcements are burdensome. To overcome this, we propose a machine learning-based automated meal detection approach. METHODS: A heterogeneous ensemble method combining an artificial neural network, random forest, and logistic regression was employed. It was trained and tested on data from two in-silico cohorts comprising 20 and 47 patients, and it accounted for various meal sizes (moderate to high) and glucose appearance rates (slow and rapid absorbing). To produce an optimal prediction model, three ensemble configurations were used: logical AND, majority voting, and logical OR. In addition to the in-silico data, the proposed meal detector was also trained and tested using the OhioT1DM dataset. Finally, the meal detector was combined with a bolus insulin compensation scheme. RESULTS: The majority-voting ensemble obtained the best meal detection results for the two in-silico cohorts and the OhioT1DM cohort, with sensitivities of 77%, 94%, and 61%; precisions of 96%, 89%, and 72%; F1-scores of 85%, 91%, and 66%; and false positives per day of 0.05, 0.19, and 0.17, respectively. Automatic meal detection with insulin compensation was performed in open-loop insulin therapy using the AND ensemble, chosen for its lower false positive rate. Time-in-range increased significantly by 10.48% and 16.03%, and time above range was reduced by 5.16% and 11.85%, with a minimal time below range increase of 0.35% and 2.69% for the two in-silico cohorts, respectively, compared to the results without a meal detector. CONCLUSION: To increase the overall accuracy and robustness of the predictions, this ensemble methodology aims to take advantage of each base model's strengths. All of the results point to the potential application of the proposed meal detector as a separate module for the detection of meals in automated insulin delivery systems to achieve improved glycemic control.
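The three ensemble configurations named in the abstract (logical AND, majority voting, logical OR) can be illustrated with a minimal sketch. The function names and the boolean vote representation are assumptions made for illustration; the abstract does not describe the authors' implementation. The second helper recomputes the F1-score as the harmonic mean of sensitivity and precision, which reproduces the reported 85%, 91%, and 66% from the reported sensitivity and precision pairs.

```python
def combine_votes(votes, rule="majority"):
    """Combine binary meal / no-meal votes from several base classifiers.

    `votes` is a list of booleans, one per base model (for example an ANN,
    a random forest, and a logistic regression). Hypothetical helper.
    """
    positives = sum(1 for v in votes if v)
    if rule == "and":        # every model must flag a meal
        return positives == len(votes)
    if rule == "or":         # a single positive vote is enough
        return positives >= 1
    if rule == "majority":   # more than half of the models must agree
        return positives > len(votes) / 2
    raise ValueError(f"unknown rule: {rule}")


def f1_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    return 2 * sensitivity * precision / (sensitivity + precision)


if __name__ == "__main__":
    votes = [True, False, True]
    for rule in ("and", "majority", "or"):
        print(rule, combine_votes(votes, rule))
    print(round(f1_score(0.77, 0.96), 2))
```

The AND rule fires least often and the OR rule most often, which is consistent with the abstract's choice of the AND ensemble where a lower false positive rate matters.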
Affiliation(s)
- Muhammad Ibrahim
  - Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Aleix Beneyto
  - Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Ivan Contreras
  - Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Josep Vehi
  - Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid, Spain
52
Chen D, Gu X, Guo H, Cheng T, Yang J, Zhan Y, Fu Q. Spatiotemporally continuous PM2.5 dataset in the Mekong River Basin from 2015 to 2022 using a stacking model. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 914:169801. [PMID: 38184264 DOI: 10.1016/j.scitotenv.2023.169801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/13/2023] [Accepted: 12/29/2023] [Indexed: 01/08/2024]
Abstract
With the potential to cause millions of deaths, PM2.5 pollution has become a global concern. In Southeast Asia, the Mekong River Basin (MRB) is experiencing heavy PM2.5 pollution and the existing PM2.5 studies in the MRB are limited in terms of accuracy and spatiotemporal coverage. To achieve high-accuracy and long-term PM2.5 monitoring of the MRB, fused aerosol optical depth (AOD) data and multi-source auxiliary data are fed into a stacking model to estimate PM2.5 concentrations. The proposed stacking model takes advantage of convolutional neural network (CNN) and Light Gradient Boosting Machine (LightGBM) models and can well represent the spatiotemporal heterogeneity of the PM2.5-AOD relationship. In the cross-validation (CV), comparison with CNN and LightGBM models shows that the stacking model can better suppress overfitting, with a higher coefficient of determination (R²) of 0.92, a lower root mean square error (RMSE) of 5.58 μg/m³, and a lower mean absolute error (MAE) of 3.44 μg/m³. For the first time, the high-accuracy PM2.5 dataset reveals spatially and temporally continuous PM2.5 pollution and variations in the MRB from 2015 to 2022. Moreover, the spatiotemporal variations of annual and monthly PM2.5 pollution are also investigated at the regional and national scales. The dataset will contribute to the analysis of the causes of PM2.5 pollution and the development of mitigation policies in the MRB.
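The three evaluation metrics quoted in the abstract (R², RMSE, MAE) follow standard definitions, sketched below in plain Python. The function name and the toy values in the usage lines are illustrative assumptions and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


if __name__ == "__main__":
    # Toy numbers only; units would be ug/m^3 for PM2.5 concentrations.
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
```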
Affiliation(s)
- Debao Chen
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Xingfa Gu
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China; School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China
- Hong Guo
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Tianhai Cheng
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Jian Yang
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Yulin Zhan
  - National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Qiming Fu
  - School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China
53
Kabir E, Guikema SD, Quiring SM. Power outage prediction using data streams: An adaptive ensemble learning approach with a feature- and performance-based weighting mechanism. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2024; 44:686-704. [PMID: 37666505 DOI: 10.1111/risa.14211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
A wide variety of weather conditions, from windstorms to prolonged heat events, can substantially impact power systems, posing many risks and inconveniences due to power outages. Accurately estimating the probability distribution of the number of customers without power using data about the power utility system and environmental and weather conditions can help utilities restore power more quickly and efficiently. However, the critical shortcoming of current models lies in the difficulties of handling (i) data streams and (ii) model uncertainty due to combining data from various weather events. Accordingly, this article proposes an adaptive ensemble learning algorithm for data streams, which deploys a feature- and performance-based weighting mechanism to adaptively combine outputs from multiple competitive base learners. As a proof of concept, we use a large, real data set of daily customer interruptions to develop the first adaptive all-weather outage prediction model using data streams. We benchmark several approaches to demonstrate the advantage of our approach in offering more accurate probabilistic predictions. The results show that the proposed algorithm reduces the error of the base learners' probabilistic predictions by between 4% and 22%, with an average of 8%, which also results in substantially more accurate point predictions. The improvement made by our algorithm increases as we exchange base learners for simpler models.
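The abstract does not specify the weighting mechanism, so the sketch below shows only the general idea of performance-based weighting: base learners with lower recent error receive larger weights when their outputs are combined. Inverse-error weighting, the function names, and the `eps` guard are assumptions chosen for illustration, not the authors' algorithm, and the feature-based part of their mechanism is not modeled.

```python
def inverse_error_weights(recent_errors, eps=1e-9):
    """Turn each learner's recent error into a normalized weight.

    Assumption: weight is proportional to 1 / error. `eps` avoids division
    by zero for a learner with zero recent error.
    """
    inverse = [1.0 / (e + eps) for e in recent_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_prediction(predictions, weights):
    """Combine the learners' point predictions with the given weights."""
    return sum(p * w for p, w in zip(predictions, weights))


if __name__ == "__main__":
    weights = inverse_error_weights([1.0, 3.0])   # second learner is 3x worse
    print(weights)
    print(weighted_prediction([10.0, 20.0], weights))
```

In a data-stream setting the errors would be recomputed as new observations arrive, so the weights adapt over time.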
Affiliation(s)
- Elnaz Kabir
  - Department of Engineering Technology & Industrial Distribution, Texas A&M University, College Station, Texas, USA
- Seth D Guikema
  - Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Steven M Quiring
  - Department of Geography, The Ohio State University, Columbus, Ohio, USA
54
Kikuchi Y, Kawczynski MG, Anegondi N, Neubert A, Dai J, Ferrara D, Quezada-Ruiz C. Machine Learning to Predict Faricimab Treatment Outcome in Neovascular Age-Related Macular Degeneration. OPHTHALMOLOGY SCIENCE 2024; 4:100385. [PMID: 37868796 PMCID: PMC10585644 DOI: 10.1016/j.xops.2023.100385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 08/07/2023] [Accepted: 08/10/2023] [Indexed: 10/24/2023]
Abstract
Purpose: To develop machine learning (ML) models to predict, at baseline, treatment outcomes at month 9 in patients with neovascular age-related macular degeneration (nAMD) receiving faricimab. Design: Retrospective proof-of-concept study. Participants: Patients enrolled in the phase II AVENUE trial (NCT02484690) of faricimab in nAMD. Methods: Baseline characteristics and spectral domain-OCT (SD-OCT) image data from 185 faricimab-treated eyes were split into 80% training and 20% test sets at the patient level. Input variables were baseline age, sex, best-corrected visual acuity (BCVA), central subfield thickness (CST), low luminance deficit, treatment arm, and SD-OCT images. A regression problem (BCVA) and a binary classification problem (reduction of CST by 35%) were considered. Overall, 10 models were developed and tested for each problem. Benchmark classical ML models (linear, random forest, extreme gradient boosting) were trained on baseline characteristics; benchmark deep neural networks (DNNs) were trained on baseline SD-OCT B-scans. Baseline characteristics and SD-OCT data were merged using 2 approaches: model stacking (using DNN prediction as an input feature for classical ML models) and model averaging (which averaged predictions from the DNN using SD-OCT volume and from classical ML models using baseline characteristics). Main Outcome Measures: Treatment outcomes were defined by 2 target variables: functional (BCVA letter score) and anatomical (percent decrease in CST from baseline) outcomes at month 9. Results: The best-performing BCVA regression model with respect to the test coefficient of determination (R²) was the linear model in the model-stacking approach, with R² of 0.31. The best-performing CST classification model with respect to the test area under the receiver operating characteristic curve (AUROC) was the benchmark linear model, with AUROC of 0.87. A post hoc analysis showed that baseline BCVA and baseline CST had the greatest effect in the all-model prediction for BCVA regression and CST classification, respectively. Conclusions: Promising signals for predicting treatment outcomes from baseline characteristics were detected; however, the predictive benefit of baseline images was unclear in this proof-of-concept study. Further testing and validation with larger, independent datasets is required to fully explore the predictive capacity of ML models using baseline imaging data. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
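The difference between the two merging approaches in the Methods can be sketched as follows: stacking appends the DNN prediction to the tabular feature row that a classical model then consumes, whereas averaging combines two finished predictions. The function names and toy values are assumptions for illustration; no real DNN or clinical data are involved.

```python
def stack_features(baseline_rows, dnn_predictions):
    """Model stacking: add the DNN output as one more input feature per eye."""
    return [list(row) + [pred] for row, pred in zip(baseline_rows, dnn_predictions)]


def average_predictions(dnn_predictions, ml_predictions):
    """Model averaging: mean of the DNN and classical-model predictions."""
    return [(d + m) / 2 for d, m in zip(dnn_predictions, ml_predictions)]


if __name__ == "__main__":
    # Hypothetical rows of (age, baseline BCVA); DNN outputs are made up.
    rows = [[70, 55.0], [81, 62.0]]
    dnn = [0.25, 1.0]
    print(stack_features(rows, dnn))                 # input for a classical model
    print(average_predictions(dnn, [0.75, 0.0]))     # final averaged predictions
```

In the stacking case the classical model still has to be trained on the augmented rows; in the averaging case no further training is needed.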
Affiliation(s)
- Yusuke Kikuchi
  - Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
  - Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California
- Michael G. Kawczynski
  - Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Neha Anegondi
  - Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
  - Clinical Imaging Group, Genentech, Inc., South San Francisco, California
- Ales Neubert
  - Data & Analytics, Roche Pharma Research and Early Development, Basel, Switzerland
- Jian Dai
  - Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Daniela Ferrara
  - Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Carlos Quezada-Ruiz
  - Clinical Science, Genentech, Inc., South San Francisco, California
  - Department of Ophthalmology, Clínica de Ojos Garza Viejo, San Pedro Garza, Garcia, Nuevo Leon, Mexico
55
Song A, Lusk JB, Roh KM, Hsu ST, Valikodath NG, Lad EM, Muir KW, Engelhard MM, Limkakeng AT, Izatt JA, McNabb RP, Kuo AN. RobOCTNet: Robotics and Deep Learning for Referable Posterior Segment Pathology Detection in an Emergency Department Population. Transl Vis Sci Technol 2024; 13:12. [PMID: 38488431 PMCID: PMC10946693 DOI: 10.1167/tvst.13.3.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/31/2024] [Indexed: 03/19/2024] Open
Abstract
Purpose: To evaluate the diagnostic performance of a robotically aligned optical coherence tomography (RAOCT) system coupled with a deep learning model in detecting referable posterior segment pathology in OCT images of emergency department patients. Methods: A deep learning model, RobOCTNet, was trained and internally tested to classify OCT images as referable versus non-referable for ophthalmology consultation. For external testing, emergency department patients with signs or symptoms warranting evaluation of the posterior segment were imaged with RAOCT. RobOCTNet was used to classify the images. Model performance was evaluated against a reference standard based on clinical diagnosis and retina specialist OCT review. Results: We included 90,250 OCT images for training and 1489 images for internal testing. RobOCTNet achieved an area under the curve (AUC) of 1.00 (95% confidence interval [CI], 0.99-1.00) for detection of referable posterior segment pathology in the internal test set. For external testing, RAOCT was used to image 72 eyes of 38 emergency department patients. In this set, RobOCTNet had an AUC of 0.91 (95% CI, 0.82-0.97), a sensitivity of 95% (95% CI, 87%-100%), and a specificity of 76% (95% CI, 62%-91%). The model's performance was comparable to two human experts' performance. Conclusions: A robotically aligned OCT coupled with a deep learning model demonstrated high diagnostic performance in detecting referable posterior segment pathology in a cohort of emergency department patients. Translational Relevance: Robotically aligned OCT coupled with a deep learning model may have the potential to improve emergency department patient triage for ophthalmology referral.
Affiliation(s)
- Ailin Song
  - Duke University School of Medicine, Durham, NC, USA
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- Jay B. Lusk
  - Duke University School of Medicine, Durham, NC, USA
- Kyung-Min Roh
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- S. Tammy Hsu
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- Eleonora M. Lad
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- Kelly W. Muir
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- Matthew M. Engelhard
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Joseph A. Izatt
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Ryan P. McNabb
  - Department of Ophthalmology, Duke University, Durham, NC, USA
- Anthony N. Kuo
  - Department of Ophthalmology, Duke University, Durham, NC, USA
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
56
Li X, Gao P. Significant duration prediction of seismic ground motions using machine learning algorithms. PLoS One 2024; 19:e0299639. [PMID: 38416770 PMCID: PMC10901361 DOI: 10.1371/journal.pone.0299639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 01/29/2024] [Indexed: 03/01/2024] Open
Abstract
This study aims to predict the significant duration (D5-75, D5-95) of seismic motion by employing machine learning algorithms. In addition to three established parameters (moment magnitude, fault distance, and average shear wave velocity), two further parameters (fault top depth and epicenter mechanism parameters) were introduced in this study. The XGBoost algorithm is utilized for characteristic parameter optimization analysis to obtain the optimal combination of four parameters. We compare the prediction results of four machine learning algorithms (random forest, XGBoost, BP neural network, and SVM) and develop a new method of significant duration prediction by constructing two fusion models (stacking and weighted averaging). Based on evaluation indicators and residual values, the fusion model improves the prediction accuracy and generalization ability for the significant duration compared to single-algorithm models. The accuracy and rationality of the fusion model are validated through comparison with existing research.
Affiliation(s)
- Xinle Li
  - College of Civil Engineering, Dalian Minzu University, Dalian, 116600, Liaoning, China
- Pei Gao
  - College of Civil Engineering, Dalian Minzu University, Dalian, 116600, Liaoning, China
57
Sanchez-Pinto LN, Bennett TD, DeWitt PE, Russell S, Rebull MN, Martin B, Akech S, Albers DJ, Alpern ER, Balamuth F, Bembea M, Chisti MJ, Evans I, Horvat CM, Jaramillo-Bustamante JC, Kissoon N, Menon K, Scott HF, Weiss SL, Wiens MO, Zimmerman JJ, Argent AC, Sorce LR, Schlapbach LJ, Watson RS. Development and Validation of the Phoenix Criteria for Pediatric Sepsis and Septic Shock. JAMA 2024; 331:675-686. [PMID: 38245897 PMCID: PMC10900964 DOI: 10.1001/jama.2024.0196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/05/2024] [Indexed: 01/23/2024]
Abstract
Importance: The Society of Critical Care Medicine Pediatric Sepsis Definition Task Force sought to develop and validate new clinical criteria for pediatric sepsis and septic shock using measures of organ dysfunction through a data-driven approach. Objective: To derive and validate novel criteria for pediatric sepsis and septic shock across differently resourced settings. Design, Setting, and Participants: Multicenter, international, retrospective cohort study in 10 health systems in the US, Colombia, Bangladesh, China, and Kenya, 3 of which were used as external validation sites. Data were collected from emergency and inpatient encounters for children (aged <18 years) from 2010 to 2019: 3 049 699 in the development (including derivation and internal validation) set and 581 317 in the external validation set. Exposure: Stacked regression models to predict mortality in children with suspected infection were derived and validated using the best-performing organ dysfunction subscores from 8 existing scores. The final model was then translated into an integer-based score used to establish binary criteria for sepsis and septic shock. Main Outcomes and Measures: The primary outcome for all analyses was in-hospital mortality. Model- and integer-based score performance measures included the area under the precision recall curve (AUPRC; primary) and area under the receiver operating characteristic curve (AUROC; secondary). For binary criteria, primary performance measures were positive predictive value and sensitivity. Results: Among the 172 984 children with suspected infection in the first 24 hours (development set; 1.2% mortality), a 4-organ-system model performed best. The integer version of that model, the Phoenix Sepsis Score, had AUPRCs of 0.23 to 0.38 (95% CI range, 0.20-0.39) and AUROCs of 0.71 to 0.92 (95% CI range, 0.70-0.92) to predict mortality in the validation sets. Using a Phoenix Sepsis Score of 2 points or higher in children with suspected infection as criteria for sepsis and sepsis plus 1 or more cardiovascular point as criteria for septic shock resulted in a higher positive predictive value and higher or similar sensitivity compared with the 2005 International Pediatric Sepsis Consensus Conference (IPSCC) criteria across differently resourced settings. Conclusions and Relevance: The novel Phoenix sepsis criteria, which were derived and validated using data from higher- and lower-resource settings, had improved performance for the diagnosis of pediatric sepsis and septic shock compared with the existing IPSCC criteria.
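The binary criteria stated in the Results can be written as a small decision function. This is a minimal sketch of the thresholds given in the abstract only: the function name, argument names, and return labels are assumptions, and the calculation of the score and the cardiovascular points from organ-dysfunction measurements is not described in the abstract, so both are taken as inputs.

```python
def phoenix_category(suspected_infection, total_score, cardiovascular_points):
    """Apply the thresholds from the abstract to an already-computed score.

    Sepsis: suspected infection and a score of 2 points or higher.
    Septic shock: sepsis plus 1 or more cardiovascular points.
    """
    if not suspected_infection or total_score < 2:
        return "criteria not met"
    if cardiovascular_points >= 1:
        return "septic shock"
    return "sepsis"


if __name__ == "__main__":
    print(phoenix_category(True, 2, 0))
    print(phoenix_category(True, 3, 1))
    print(phoenix_category(True, 1, 1))
```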
Affiliation(s)
- L. Nelson Sanchez-Pinto
  - Departments of Pediatrics (Critical Care) and Preventive Medicine (Health and Biomedical Informatics), Northwestern University Feinberg School of Medicine, and Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois
- Tellen D. Bennett
  - Departments of Biomedical Informatics and Pediatrics (Critical Care Medicine), University of Colorado School of Medicine, and Children’s Hospital Colorado, Aurora
- Peter E. DeWitt
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora
- Seth Russell
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora
- Margaret N. Rebull
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics (Critical Care Medicine), University of Colorado School of Medicine, and Children’s Hospital Colorado, Aurora
- Samuel Akech
  - Kenya Medical Research Institute (KEMRI)–Wellcome Trust Research Programme, Nairobi, Kenya
- David J. Albers
  - Departments of Biomedical Informatics, Bioengineering, Biostatistics, and Informatics, University of Colorado School of Medicine, Aurora
  - Department of Biomedical Informatics, Columbia University, New York, New York
- Elizabeth R. Alpern
  - Division of Emergency Medicine, Department of Pediatrics, Ann and Robert H. Lurie Children’s Hospital of Chicago, and Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Fran Balamuth
  - Department of Pediatrics, University of Pennsylvania, Perelman School of Medicine and Division of Emergency Medicine, Children’s Hospital of Philadelphia, Philadelphia
- Melania Bembea
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Mohammod Jobayer Chisti
  - Intensive Care Unit, Dhaka Hospital, Nutrition Research Division, International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Idris Evans
  - Clinical Research, Investigation, and Systems Modeling of Acute Illness (CRISMA) Center, Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Christopher M. Horvat
  - Clinical Research, Investigation, and Systems Modeling of Acute Illness (CRISMA) Center, Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Juan Camilo Jaramillo-Bustamante
  - Pediatric Intensive Care Unit, Hospital General de Medellín Luz Castro de Gutiérrez and Hospital Pablo Tobón Uribe, and Red Colaborativa Pediátrica de Latinoamérica (LARed Network), Medellín, Colombia
- Niranjan Kissoon
  - Department of Pediatrics, University of British Columbia, Vancouver, Canada
- Kusum Menon
  - Department of Pediatrics, Children’s Hospital of Eastern Ontario and University of Ottawa, Ottawa, Canada
- Halden F. Scott
  - Department of Pediatrics (Pediatric Emergency Medicine), University of Colorado School of Medicine, and Children’s Hospital Colorado, Aurora
- Scott L. Weiss
  - Division of Critical Care, Department of Pediatrics, Nemours Children’s Health, Wilmington, Delaware
  - Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania
- Matthew O. Wiens
  - Department of Anesthesiology, Pharmacology, and Therapeutics, Faculty of Medicine, University of British Columbia, Vancouver, Canada
  - Institute for Global Health, BC Children’s Hospital, Vancouver, British Columbia, Canada
  - Walimu, Kampala, Uganda
- Jerry J. Zimmerman
  - Seattle Children’s Hospital and Department of Pediatrics, University of Washington School of Medicine, Seattle
- Andrew C. Argent
  - Paediatrics and Child Health, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa
- Lauren R. Sorce
  - Department of Pediatrics, Northwestern University Feinberg School of Medicine, and Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois
- Luregn J. Schlapbach
  - Department of Intensive Care and Neonatology, Children’s Research Center, University Children’s Hospital Zurich, University of Zurich, Zurich, Switzerland
  - Child Health Research Centre, The University of Queensland, Brisbane, Australia
- R. Scott Watson
  - Department of Pediatrics, University of Washington, and Center for Child Health, Behavior, and Development and Pediatric Critical Care, Seattle Children’s Hospital, Seattle
58
Ju CW, Shen Y, French EJ, Yi J, Bi H, Tian A, Lin Z. Accurate Electronic and Optical Properties of Organic Doublet Radicals Using Machine Learned Range-Separated Functionals. J Phys Chem A 2024. [PMID: 38382058 DOI: 10.1021/acs.jpca.3c07437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Luminescent organic semiconducting doublet-spin radicals are unique and emergent optical materials because their fluorescent quantum yields (Φfl) are not compromised by the spin-flipping intersystem crossing (ISC) into a dark high-spin state. The multiconfigurational nature of these radicals challenges their electronic structure calculations in the framework of single-reference density functional theory (DFT) and introduces room for method improvement. In the present study, we extended our earlier development of ML-ωPBE [J. Phys. Chem. Lett., 2021, 12, 9516-9524], a range-separated hybrid (RSH) exchange-correlation (XC) functional constructed using the stacked ensemble machine learning (SEML) algorithm, from closed-shell organic semiconducting molecules to doublet-spin organic semiconducting radicals. We assessed its performance for a new test set of 64 doublet-spin radicals from five categories while placing all previously compiled 3926 closed-shell molecules in the new training set. Interestingly, ML-ωPBE agrees with the nonempirical OT-ωPBE functional regarding the prediction of the molecule-dependent range-separation parameter (ω), with a small mean absolute error (MAE) of 0.0197 a₀⁻¹, but saves the computational cost by 2.46 orders of magnitude. This result demonstrates an outstanding domain adaptation capacity of ML-ωPBE for diverse organic semiconducting species. To further assess the predictive power of ML-ωPBE in experimental observables, we also applied it to evaluate absorption and fluorescence energies (Eabs and Efl) using linear-response time-dependent DFT (TDDFT), and we compared its behavior with nine popular XC functionals. For most radicals, ML-ωPBE reproduces experimental measurements of Eabs and Efl with small MAEs of 0.299 and 0.254 eV, only marginally different from those of OT-ωPBE. 
Our work illustrates a successful extension of the SEML framework from closed-shell molecules to doublet-spin radicals and will open the venue for calculating optical properties for organic semiconductors using single-reference TDDFT.
Affiliation(s)
- Cheng-Wei Ju
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Yili Shen
  - Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Ethan J French
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts 02129, United States
- Jun Yi
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Chemistry, Wake Forest University, Winston-Salem, North Carolina 27109, United States
- Hongshan Bi
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Aaron Tian
  - Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
  - Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Zhou Lin
  - Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
59
He X, Ghasemian A, Lee E, Clauset A, Mucha PJ. Sequential stacking link prediction algorithms for temporal networks. Nat Commun 2024; 15:1364. [PMID: 38355612 PMCID: PMC10866871 DOI: 10.1038/s41467-024-45598-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 01/29/2024] [Indexed: 02/16/2024] Open
Abstract
Link prediction algorithms are indispensable tools in many scientific applications by speeding up network data collection and imputing missing connections. However, in many systems, links change over time and it remains unclear how to optimally exploit such temporal information for link predictions in such networks. Here, we show that many temporal topological features, in addition to having high computational cost, are less accurate in temporal link prediction than sequentially stacked static network features. This sequential stacking link prediction method uses 41 static network features that avoid detailed feature engineering choices and is capable of learning a highly accurate predictive distribution of future connections from historical data. We demonstrate that this algorithm works well for both partially observed and completely unobserved target layers, and on two temporal stochastic block models achieves near-oracle-level performance when combined with other single predictor methods as an ensemble learning method. Finally, we empirically illustrate that stacking multiple predictive methods together further improves performance on 19 real-world temporal networks from different domains.
Affiliation(s)
- Xie He
- Department of Mathematics, Dartmouth College, Hanover, NH, USA
- Amir Ghasemian
- Yale Institute for Network Science, Yale University, New Haven, CT, USA
- Eun Lee
- Department of Scientific Computing, Pukyong National University, Busan, South Korea
- Aaron Clauset
- Department of Computer Science, University of Colorado, Boulder, CO, USA
- BioFrontiers Institute, University of Colorado, Boulder, Boulder, CO, USA
- Santa Fe Institute, Santa Fe, NM, USA
- Peter J Mucha
- Department of Mathematics, Dartmouth College, Hanover, NH, USA.
|
60
|
Qifeng Y, Longsheng C, Naeem MT. Hidden Markov Models based intelligent health assessment and fault diagnosis of rolling element bearings. PLoS One 2024; 19:e0297513. [PMID: 38324594 PMCID: PMC10849232 DOI: 10.1371/journal.pone.0297513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 01/07/2024] [Indexed: 02/09/2024] Open
Abstract
Hidden Markov Models (HMMs) have become an immensely popular tool for health assessment and fault diagnosis of rolling element bearings. The advantages of an HMM include its simplicity, robustness, and interpretability, while the generalization capability of the model still needs to be enhanced. The Dempster-Shafer theory of evidence can be used to conduct a comprehensive evaluation, and Stacking provides a novel training strategy. Therefore, the HMM-based fusion method and ensemble learning method are proposed to increase the credibility of quantitative analysis and optimize classifiers respectively. Firstly, vibration signals captured from bearings are decomposed into intrinsic mode functions (IMFs) using ensemble empirical mode decomposition (EEMD), and then the Hilbert envelope spectra of main components are obtained; Secondly, multi-domain features are extracted as model input from preprocessed signals; Finally, HMM-based intelligent health assessment framework and fault diagnosis framework are established. In this work, the life cycle health assessment modeling is performed using a few training samples, the bearing degradation state is quantitatively evaluated, normal and abnormal samples are effectively distinguished, and the accuracy of fault diagnosis is significantly improved.
Affiliation(s)
- Yao Qifeng
- School of Economics and Management, Nanjing University of Science and Technology, Nanjing, China
- Cheng Longsheng
- School of Economics and Management, Nanjing University of Science and Technology, Nanjing, China
- Muhammad Tariq Naeem
- Department of Neurosurgery, Nishtar Medical College and Hospital, Multan, Pakistan
|
61
|
Li D, Pan C, Zhao J, Luo A. A penalized variable selection ensemble algorithm for high-dimensional group-structured data. PLoS One 2024; 19:e0296748. [PMID: 38315712 PMCID: PMC10843621 DOI: 10.1371/journal.pone.0296748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/19/2023] [Indexed: 02/07/2024] Open
Abstract
This paper presents a multi-algorithm fusion model (StackingGroup) based on the Stacking ensemble learning framework to address the variable selection problem in high-dimensional group-structured data. The proposed algorithm takes into account the differences in data observation and training principles of different algorithms. It leverages the strengths of each model and incorporates Stacking ensemble learning with multiple group structure regularization methods. The main approach involves dividing the data set into K equal parts, using more than 10 algorithms as basic learning models, and selecting the base learners based on low correlation, strong prediction ability, and small model error. We selected the grSubset + grLasso, grLasso, and grSCAD algorithms as the base learners for the Stacking algorithm. The Lasso algorithm was used as the meta-learner to create a comprehensive algorithm called StackingGroup, which is designed to handle high-dimensional group-structured data. Simulation experiments showed that the proposed method outperformed other prediction methods in terms of R2, RMSE, and MAE. Lastly, we applied the proposed algorithm to investigate the risk factors of low birth weight in infants and young children. The final results demonstrate that the proposed method achieves a mean absolute error (MAE) of 0.508 and a root mean square error (RMSE) of 0.668. These values are smaller than those obtained from a single model, indicating that the proposed method surpasses other algorithms in terms of prediction accuracy.
Affiliation(s)
- Dongsheng Li
- School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Key Laboratory of Complex Systems and Intelligent Optimization of Guizhou Province, Duyun, China
- Chunyan Pan
- School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Jing Zhao
- School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Anfei Luo
- Department of Computer Science, Guizhou Police College, Guiyang, Guizhou, China
|
62
|
Koehler JC, Dong MS, Bierlich AM, Fischer S, Späth J, Plank IS, Koutsouleris N, Falter-Wagner CM. Machine learning classification of autism spectrum disorder based on reciprocity in naturalistic social interactions. Transl Psychiatry 2024; 14:76. [PMID: 38310111 PMCID: PMC10838326 DOI: 10.1038/s41398-024-02802-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/05/2024] Open
Abstract
Autism spectrum disorder is characterized by impaired social communication and interaction. As a neurodevelopmental disorder typically diagnosed during childhood, diagnosis in adulthood is preceded by a resource-heavy clinical assessment period. The ongoing developments in digital phenotyping give rise to novel opportunities within the screening and diagnostic process. Our aim was to quantify multiple non-verbal social interaction characteristics in autism and build diagnostic classification models independent of clinical ratings. We analyzed videos of naturalistic social interactions in a sample including 28 autistic and 60 non-autistic adults paired in dyads and engaging in two conversational tasks. We used existing open-source computer vision algorithms for objective annotation to extract information based on the synchrony of movement and facial expression. These were subsequently used as features in a support vector machine learning model to predict whether an individual was part of an autistic or non-autistic interaction dyad. The two prediction models based on reciprocal adaptation in facial movements, as well as individual amounts of head and body motion and facial expressiveness showed the highest precision (balanced accuracies: 79.5% and 68.8%, respectively), followed by models based on reciprocal coordination of head (balanced accuracy: 62.1%) and body (balanced accuracy: 56.7%) motion, as well as intrapersonal coordination processes (balanced accuracy: 44.2%). Combinations of these models did not increase overall predictive performance. Our work highlights the distinctive nature of non-verbal behavior in autism and its utility for digital phenotyping-based classification. Future research needs to both explore the performance of different prediction algorithms to reveal underlying mechanisms and interactions, as well as investigate the prospective generalizability and robustness of these algorithms in routine clinical care.
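Because the sample is imbalanced (28 autistic versus 60 non-autistic adults), the abstract reports balanced accuracy rather than plain accuracy. A minimal sketch of that metric, using its standard definition as the mean of sensitivity and specificity, is shown below; the toy labels are hypothetical and unrelated to the study data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels: 4 positives followed by 6 negatives.
y_true = [1] * 4 + [0] * 6
y_model = [1, 1, 1, 0] + [0, 0, 0, 0, 1, 1]
y_all_negative = [0] * 10  # trivial classifier that always predicts 0

ba_model = balanced_accuracy(y_true, y_model)
ba_trivial = balanced_accuracy(y_true, y_all_negative)
print(ba_model, ba_trivial)
```

The trivial classifier reaches 60% plain accuracy on these labels but only 0.5 balanced accuracy, which is why values near or below 50%, such as the 44.2% reported for intrapersonal coordination, indicate no useful discrimination.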
Affiliation(s)
- Mark Sen Dong
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Afton M Bierlich
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Stefanie Fischer
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt am Main, Germany
- Johanna Späth
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Irene Sophia Plank
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Nikolaos Koutsouleris
- Department of Psychiatry and Psychotherapy, Medical Faculty, LMU, Munich, Germany
- Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Psychiatry, Psychology and Neuroscience, King's College, London, UK
|
63
|
Holler S, Kübler D, Conrad O, Schmitz O, Bonannella C, Hengl T, Böhner J, Günter S, Lippe M. Quo vadis, smallholder forest landscape? An introduction to the LPB-RAP model. PLoS One 2024; 19:e0297439. [PMID: 38306349 PMCID: PMC10836681 DOI: 10.1371/journal.pone.0297439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 01/04/2024] [Indexed: 02/04/2024] Open
Abstract
The impacts of the Anthropocene on climate and biodiversity pose societal and ecological problems that may only be solved by ecosystem restoration. Local to regional actions are required, which need to consider the prevailing present and future conditions of a certain landscape extent. Modeling approaches can be of help to support management efforts and to provide advice to policy making. We present stage one of the LaForeT-PLUC-BE model (Landscape Forestry in the Tropics-PCRaster Land Use Change-Biogeographic & Economic model; in short: LPB) and its thematic expansion module RAP (Restoration Areas Potentials). LPB-RAP is a high-resolution pixel-based scenario tool that relies on a range of explicit land use types (LUTs) to describe various forest types and the environment. It simulates and analyzes future landscape configurations under consideration of climate, population and land use change long-term. Simulated Land Use Land Cover Change (LULCC) builds on dynamic, probabilistic modeling incorporating climatic and anthropogenic determinants as well as restriction parameters to depict a sub-national regional smallholder-dominated forest landscape. The model delivers results for contrasting scenario settings by simulating without and with potential Forest and Landscape Restoration (FLR) measures. FLR potentials are depicted by up to five RAP-LUTs. The model builds on user-defined scenario inputs, such as the Shared Socioeconomic Pathways (SSP) and Representative Concentration Pathways (RCP). Model application is here exemplified for the SSP2-RCP4.5 scenario in the time frame 2018-2100 on the hectare scale in annual resolution using Esmeraldas province, Ecuador, as a case study area. The LPB-RAP model is a novel, heuristic Spatial Decision Support System (SDSS) tool for smallholder-dominated forest landscapes, supporting near-time top-down planning measures with long-term bottom-up modeling. 
Its application should be followed up by FLR on-site investigations and stakeholder participation across all involved scales.
Affiliation(s)
- Sonja Holler
- Thünen Institute of Forestry, Hamburg, Germany
- Center for Earth System Research and Sustainability (CEN), Hamburg University, Hamburg, Germany
- Olaf Conrad
- Center for Earth System Research and Sustainability (CEN), Hamburg University, Hamburg, Germany
- Oliver Schmitz
- Department of Physical Geography, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- Carmelo Bonannella
- OpenGeoHub, Wageningen, The Netherlands
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Wageningen, The Netherlands
- Jürgen Böhner
- Center for Earth System Research and Sustainability (CEN), Hamburg University, Hamburg, Germany
- Sven Günter
- Thünen Institute of Forestry, Hamburg, Germany
|
64
|
Wang X, Sun H, Dong Y, Huang J, Bai L, Tang Z, Liu S, Chen S. Development and validation of a cuproptosis-related prognostic model for acute myeloid leukemia patients using machine learning with stacking. Sci Rep 2024; 14:2802. [PMID: 38307903 PMCID: PMC10837443 DOI: 10.1038/s41598-024-53306-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 01/30/2024] [Indexed: 02/04/2024] Open
Abstract
Our objective is to develop a prognostic model focused on cuproptosis, aimed at predicting overall survival (OS) outcomes among acute myeloid leukemia (AML) patients. The model utilized machine learning algorithms incorporating stacking. The GSE37642 dataset was used as the training data, and the GSE12417 and TCGA-LAML cohorts were used as the validation data. Stacking was used to merge three prediction models, and a random survival forests algorithm was then used to refit the final model using the stacking linear predictor and clinical factors. The prediction model, featuring the stacking linear predictor and clinical factors, achieved AUC values of 0.840, 0.876 and 0.892 at 1, 2 and 3 years within the GSE37642 dataset. In the external validation dataset, the corresponding AUCs were 0.741, 0.754 and 0.783. The predictive performance of the model in the external dataset surpassed that of a model that simply incorporates all predictors. Additionally, the final model exhibited good calibration accuracy. In conclusion, our findings indicate that the novel prediction model refines the prognostic prediction for AML patients, while the stacking strategy displays potential for model integration.
Affiliation(s)
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Jie Huang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Lu Bai
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, P. R. China.
- Songbai Liu
- Suzhou Key Laboratory of Medical Biotechnology, Suzhou Vocational Health College, Suzhou, 215009, Jiangsu, China.
- Suning Chen
- National Clinical Research Center for Hematologic Diseases, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China.
|
65
|
Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, Caspers S, Jockwitz C. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience 2024; 46:283-308. [PMID: 37308769 PMCID: PMC10828156 DOI: 10.1007/s11357-023-00831-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 05/17/2023] [Indexed: 06/14/2023] Open
Abstract
Differences in brain structure and functional and structural network architecture have been found to partly explain cognitive performance differences in older ages. Thus, they may serve as potential markers for these differences. Initial unimodal studies, however, have reported mixed prediction results of selective cognitive variables based on these brain features using machine learning (ML). Thus, the aim of the current study was to investigate the general validity of cognitive performance prediction from imaging data in healthy older adults. In particular, the focus was on examining whether (1) multimodal information, i.e., region-wise grey matter volume (GMV), resting-state functional connectivity (RSFC), and structural connectivity (SC) estimates, may improve predictability of cognitive targets, (2) predictability differences arise for global cognition and distinct cognitive profiles, and (3) results generalize across different ML approaches in 594 healthy older adults (age range: 55-85 years) from the 1000BRAINS study. Prediction potential was examined for each modality and all multimodal combinations, with and without confound (i.e., age, education, and sex) regression across different analytic options, i.e., variations in algorithms, feature sets, and multimodal approaches (i.e., concatenation vs. stacking). Results showed that prediction performance differed considerably between deconfounding strategies. In the absence of demographic confounder control, successful prediction of cognitive performance could be observed across analytic choices. Combination of different modalities tended to marginally improve predictability of cognitive performance compared to single modalities. Importantly, all previously described effects vanished in the strict confounder control condition. Despite a small trend for a multimodal benefit, developing a biomarker for cognitive aging remains challenging.
Affiliation(s)
- Camilla Krämer
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Johanna Stumme
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lucas da Costa Campos
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Paulo Dellani
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Christian Rubbert
- Department of Diagnostic and Interventional Radiology, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Julian Caspers
- Department of Diagnostic and Interventional Radiology, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Svenja Caspers
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Christiane Jockwitz
- Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany.
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
|
66
|
Wang W, Harrou F, Dairi A, Sun Y. Stacked deep learning approach for efficient SARS-CoV-2 detection in blood samples. Artif Intell Med 2024; 148:102767. [PMID: 38325923 DOI: 10.1016/j.artmed.2024.102767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 01/02/2024] [Accepted: 01/05/2024] [Indexed: 02/09/2024]
Abstract
Identifying COVID-19 through blood sample analysis is crucial in managing the disease and improving patient outcomes. Despite its advantages, the current test demands certified laboratories, expensive equipment, trained personnel, and 3-4 h for results, with a notable false-negative rate of 15%-20%. This study proposes a stacked deep-learning approach for detecting COVID-19 in blood samples to distinguish uninfected individuals from those infected with the virus. Three stacked deep learning architectures, namely the StackMean, StackMax, and StackRF algorithms, are introduced to improve the detection quality of single deep learning models. To counter the class imbalance phenomenon in the training data, the Synthetic Minority Oversampling Technique (SMOTE) algorithm is also implemented, resulting in increased specificity and sensitivity. The efficacy of the methods is assessed by utilizing blood samples obtained from hospitals in Brazil and Italy. Results revealed that the StackMax method greatly boosted the deep learning and traditional machine learning methods' capability to distinguish COVID-19-positive cases from normal cases, while SMOTE increased the specificity and sensitivity of the stacked models. Hypothesis testing is performed to determine if there is a significant statistical difference in the performance between the compared detection methods. Additionally, the significance of blood sample features in identifying COVID-19 is analyzed using the XGBoost (eXtreme Gradient Boosting) technique for feature importance identification. Overall, this methodology could potentially enhance the timely and precise identification of COVID-19 in blood samples.
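The abstract names StackMean, StackMax, and StackRF without defining them. One plausible reading, assumed here purely for illustration, is that StackMean and StackMax combine the base models' predicted probabilities by taking the mean or the maximum per sample, while StackRF would train a random forest on those probabilities (not shown). The function names, the 0.5 threshold, and the toy probabilities below are hypothetical.

```python
from statistics import mean
from typing import List


def stack_mean(base_probs: List[List[float]]) -> List[float]:
    """Per-sample mean of the base models' positive-class probabilities."""
    return [mean(column) for column in zip(*base_probs)]


def stack_max(base_probs: List[List[float]]) -> List[float]:
    """Per-sample maximum of the base models' positive-class probabilities."""
    return [max(column) for column in zip(*base_probs)]


def to_labels(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn combined scores into binary predictions (assumed threshold)."""
    return [int(score >= threshold) for score in scores]


# Rows are hypothetical base models, columns are three blood samples.
base_probs = [
    [0.9, 0.4, 0.2],
    [0.7, 0.6, 0.1],
    [0.8, 0.2, 0.3],
]

labels_mean = to_labels(stack_mean(base_probs))
labels_max = to_labels(stack_max(base_probs))
print(labels_mean, labels_max)
```

Under this reading, the maximum rule flags a sample as positive whenever any single base model is confident, which tends to raise sensitivity at the cost of more false positives; the second sample above is classified differently by the two rules.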
Affiliation(s)
- Wu Wang
- Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing 100872, China.
- Fouzi Harrou
- King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.
- Abdelkader Dairi
- Computer Science Department, University of Science and Technology of Oran-Mohamed Boudiaf (USTO-MB), El Mnaouar, BP 1505, 31000, Bir El Djir, Algeria.
- Ying Sun
- King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
|
67
|
Xu W, Wei L, Cheng W, Yi X, Lin Y. Non-destructive assessment of soluble solids content in kiwifruit using hyperspectral imaging coupled with feature engineering. Front Plant Sci 2024; 15:1292365. [PMID: 38357269 PMCID: PMC10864577 DOI: 10.3389/fpls.2024.1292365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
The maturity of kiwifruit is widely gauged by its soluble solids content (SSC), with accurate assessment being essential to guarantee the fruit's quality. Hyperspectral imaging offers a non-destructive alternative to traditional destructive methods for SSC evaluation, though its efficacy is often hindered by the redundancy and external disturbances of spectral images. This study aims to enhance the accuracy of SSC predictions by employing feature engineering to meticulously select optimal spectral features and mitigate disturbance effects. We conducted a comprehensive investigation of four spectral pre-processing and nine spectral feature selection methods, as components of feature engineering, to determine their influence on the performance of a linear regression model based on ordinary least squares (OLS). Additionally, the stacking generalization technique was employed to amalgamate the strengths of the two most effective models derived from feature engineering. Our findings demonstrate a considerable improvement in SSC prediction accuracy post feature engineering. The most effective model, when considering both feature engineering and stacking generalization, achieved an RMSE_p of 0.721, a MAPE_p of 0.046, and an RPD_p of 1.394 in the prediction set. The study confirms that feature engineering, especially the careful selection of spectral features, and the stacking generalization technique are instrumental in bolstering SSC prediction in kiwifruit. This advancement enhances the application of hyperspectral imaging for quality assessment, offering benefits that extend across the agricultural industry.
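The three prediction-set metrics quoted above can be computed as in the sketch below. It uses the usual chemometrics definitions: RMSE as the root mean squared error, MAPE as the mean absolute percentage error expressed as a fraction, and RPD as the standard deviation of the reference values divided by the RMSE. Using the sample standard deviation is an assumption, since the abstract does not say which variant was used, and the toy SSC values are hypothetical.

```python
from math import sqrt
from statistics import stdev
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.046 means 4.6%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rpd(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Ratio of performance to deviation: SD of references over RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Hypothetical reference and predicted SSC values for a prediction set.
ssc_true = [10.0, 12.0, 14.0, 16.0]
ssc_pred = [11.0, 12.0, 13.0, 16.0]

rmse_p = rmse(ssc_true, ssc_pred)
mape_p = mape(ssc_true, ssc_pred)
rpd_p = rpd(ssc_true, ssc_pred)
print(rmse_p, mape_p, rpd_p)
```

An RPD close to 1 means the prediction error is about as large as the natural spread of the reference values, so RPD helps put an RMSE into context.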
Affiliation(s)
- Wei Xu
- Institute for Electric Light Sources, School of Information Science and Technology, Fudan University, Shanghai, China
- Institute for Six-sector Economy, Fudan University, Shanghai, China
- Liangzhuang Wei
- Academy for Engineering & Technology, Fudan University, Shanghai, China
- Wei Cheng
- Institute for Electric Light Sources, School of Information Science and Technology, Fudan University, Shanghai, China
- Xiangwei Yi
- Academy for Engineering & Technology, Fudan University, Shanghai, China
- Yandan Lin
- Institute for Electric Light Sources, School of Information Science and Technology, Fudan University, Shanghai, China
- Institute for Six-sector Economy, Fudan University, Shanghai, China
- Academy for Engineering & Technology, Fudan University, Shanghai, China
|
68
|
Setiya A, Jani V, Sonavane U, Joshi R. MolToxPred: small molecule toxicity prediction using machine learning approach. RSC Adv 2024; 14:4201-4220. [PMID: 38292268 PMCID: PMC10826801 DOI: 10.1039/d3ra07322j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 01/23/2024] [Indexed: 02/01/2024] Open
Abstract
Different types of chemicals and products may exhibit various health risks when administered into the human body. For toxicity reasons, the number of new drugs entering the market through the conventional drug development process has been reduced over the years. However, with the advent of big data and artificial intelligence, machine learning techniques have emerged as a potential solution for predicting toxicity and ensuring efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animals and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and Logistic Regression as the meta classifier. For training and validation purposes, a comprehensive set of toxic and non-toxic molecules is curated. Different structural and physicochemical-based features in the form of molecular descriptors and fingerprints were employed. MolToxPred utilizes a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine if the stacked models' performance was significantly different compared to the base learners. The developed stacked model outperformed its base classifiers and an existing tool in the literature, reaffirming its better performance. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods. 
In addition to this, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
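The McNemar test mentioned in the abstract compares two classifiers on the same test items by looking only at the cases where they disagree. The sketch below uses the continuity-corrected chi-square form of the test; whether the authors used this form or the exact binomial variant is not stated, so that choice and the example counts are assumptions.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def discordant_counts(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[int, int]:
    """Count items only model A gets right (b) and only model B gets right (c)."""
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar statistic and its p-value (1 d.o.f.)."""
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: the stacked model alone is right on 15 items,
# a base learner alone is right on 5 items.
stat, p_value = mcnemar(15, 5)
print(stat, p_value)
```

A small p-value suggests that the two models' error rates differ by more than chance would explain on the shared test set.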
Affiliation(s)
- Anjali Setiya
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Vinod Jani
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Uddhavesh Sonavane
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Rajendra Joshi
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
|
69
|
Arora HC, Bhushan B, Kumar A, Kumar P, Hadzima-Nyarko M, Radu D, Cazacu CE, Kapoor NR. Ensemble learning based compressive strength prediction of concrete structures through real-time non-destructive testing. Sci Rep 2024; 14:1824. [PMID: 38245574 PMCID: PMC10799911 DOI: 10.1038/s41598-024-52046-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 01/12/2024] [Indexed: 01/22/2024] Open
Abstract
This study conducts an extensive comparative analysis of computational intelligence approaches aimed at predicting the compressive strength (CS) of concrete, utilizing two non-destructive testing (NDT) methods: the rebound hammer (RH) and the ultrasonic pulse velocity (UPV) test. In the ensemble learning approach, six popular algorithms (AdaBoost, CatBoost, gradient boosting tree (GBT), random forest (RF), stacking, and extreme gradient boosting (XGB)) were used to develop prediction models for the CS of concrete based on NDT. The ML models were developed using a total of 721 samples, of which 111 were cast in the laboratory, 134 were obtained from in-situ testing, and the other samples were gathered from the literature. Across the three categories of analytical models (RH models, UPV models, and combined RH and UPV models), seven, ten, and thirteen models were used, respectively. AdaBoost, CatBoost, GBT, RF, stacking, and XGB models were used to improve the accuracy and dependability of the analytical models. The RH-M5, UPV-M6, and C-M6 (combined UPV and RH model) models showed the highest performance among all the analytical models. The MAPE value of XGB was observed to be 84.37%, 83.24%, 77.33%, 59.46%, and 81.08% lower than those of AdaBoost, CatBoost, GBT, RF, and stacking, respectively. The XGB model was found to perform better than the other soft computing techniques and the existing traditional predictive models.
Affiliation(s)
- Harish Chandra Arora
- AcSIR-Academy of Scientific and Innovative Research, Ghaziabad, 201002, India
- Structural Engineering Department, CSIR-Central Building Research Institute, Roorkee, 247667, India
| | - Bharat Bhushan
- Structural Engineering Department, CSIR-Central Building Research Institute, Roorkee, 247667, India
| | - Aman Kumar
- AcSIR-Academy of Scientific and Innovative Research, Ghaziabad, 201002, India.
- Structural Engineering Department, CSIR-Central Building Research Institute, Roorkee, 247667, India.
| | - Prashant Kumar
- AcSIR-Academy of Scientific and Innovative Research, Ghaziabad, 201002, India
- Structural Engineering Department, CSIR-Central Building Research Institute, Roorkee, 247667, India
| | - Marijana Hadzima-Nyarko
- Faculty of Civil Engineering and Architecture Osijek, J. J. Strossmayer University of Osijek, Vladimira Preloga, Osijek, Croatia
- Faculty of Civil Engineering, Transilvania University of Brașov, 500152, Brașov, Romania
| | - Dorin Radu
- Faculty of Civil Engineering, Transilvania University of Brașov, 500152, Brașov, Romania
| | | | - Nishant Raj Kapoor
- AcSIR-Academy of Scientific and Innovative Research, Ghaziabad, 201002, India
|
70
|
Wang M, Qian Y, Yang Y, Chen H, Rao WF. Improved stacking ensemble learning based on feature selection to accurately predict warfarin dose. Front Cardiovasc Med 2024; 10:1320938. [PMID: 38312950 PMCID: PMC10834785 DOI: 10.3389/fcvm.2023.1320938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 12/26/2023] [Indexed: 02/06/2024] Open
Abstract
Background: With the rapid development of artificial intelligence, prediction of warfarin dose via machine learning has received more and more attention. Since dose prediction involves both linear and nonlinear problems, traditional machine learning algorithms are ineffective at solving both at once. Objective: To achieve higher prediction accuracy with an improved stacking ensemble learning method based on the characteristics of clinical data from Chinese warfarin patients. Methods: Information on 641 patients from southern China who had reached a steady state on warfarin was collected, including demographic information, medical history, genotype, and co-medication status. The dataset was randomly divided into a training set (90%) and a test set (10%). The predictive capability is evaluated on a new test set generated by stacking ensemble learning. Additional factors associated with warfarin dose were discovered by feature selection methods. Results: A newly proposed heuristic-stacking ensemble learning method performs better than traditional-stacking ensemble learning in key metrics such as accuracy of ideal dose (73.44% vs. 71.88%), mean absolute error (0.11 mg/day vs. 0.13 mg/day), root mean square error (0.18 mg/day vs. 0.20 mg/day) and R² (0.87 vs. 0.82). Conclusions: The developed heuristic-stacking ensemble learning method can satisfactorily predict warfarin dose with high accuracy. A relationship between hypertension, a history of severe preoperative embolism, and warfarin dose is found, which provides a useful reference for warfarin dose administration in the future.
Affiliation(s)
- Mingyuan Wang
- Department of Pharmacy, Fuwai Yunnan Cardiovascular Hospital, Kunming, China
- School of Mechanical Engineering (Shandong Institute of Mechanical Design and Research), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
| | - Yiyi Qian
- Department of Pharmacy, Fuwai Yunnan Cardiovascular Hospital, Kunming, China
| | - Yaodong Yang
- School of Mechanical Engineering (Shandong Institute of Mechanical Design and Research), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
| | - Haobin Chen
- Department of Pathology, Qujing First People's Hospital, Qujing, Yunnan, China
| | - Wei-Feng Rao
- School of Mechanical Engineering (Shandong Institute of Mechanical Design and Research), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
|
71
|
Yoo D, Divard G, Raynaud M, Cohen A, Mone TD, Rosenthal JT, Bentall AJ, Stegall MD, Naesens M, Zhang H, Wang C, Gueguen J, Kamar N, Bouquegneau A, Batal I, Coley SM, Gill JS, Oppenheimer F, De Sousa-Amorim E, Kuypers DRJ, Durrbach A, Seron D, Rabant M, Van Huyen JPD, Campbell P, Shojai S, Mengel M, Bestard O, Basic-Jukic N, Jurić I, Boor P, Cornell LD, Alexander MP, Toby Coates P, Legendre C, Reese PP, Lefaucheur C, Aubert O, Loupy A. A Machine Learning-Driven Virtual Biopsy System For Kidney Transplant Patients. Nat Commun 2024; 15:554. [PMID: 38228634 DOI: 10.1038/s41467-023-44595-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 12/21/2023] [Indexed: 01/18/2024] Open
Abstract
In kidney transplantation, day-zero biopsies are used to assess organ quality and discriminate between donor-inherited lesions and those acquired post-transplantation. However, many centers do not perform such biopsies since they are invasive, costly and may delay the transplant procedure. We aim to generate a non-invasive virtual biopsy system using routinely collected donor parameters. Using 14,032 day-zero kidney biopsies from 17 international centers, we develop a virtual biopsy system. Eleven basic donor parameters are used to predict four Banff kidney lesions: arteriosclerosis, arteriolar hyalinosis, interstitial fibrosis and tubular atrophy, and the percentage of renal sclerotic glomeruli. Six machine learning models are aggregated into an ensemble model. The virtual biopsy system shows good performance in the internal and external validation sets. We confirm the generalizability of the system in various scenarios. This system could assist physicians in assessing organ quality, optimizing allograft allocation, and discriminating between donor-derived lesions and those acquired post-transplantation.
Affiliation(s)
- Daniel Yoo
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
| | - Gillian Divard
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Kidney Transplant Department, Saint-Louis Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Marc Raynaud
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
| | | | | | | | - Andrew J Bentall
- Division of Nephrology and Hypertension, Mayo Clinic Transplant Center, Rochester, MN, USA
| | | | - Maarten Naesens
- Department of Microbiology, Immunology and Transplantation, KU Leuven, Leuven, Belgium
| | - Huanxi Zhang
- Organ Transplant Center, First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Changxi Wang
- Organ Transplant Center, First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Juliette Gueguen
- Néphrologie-Immunologie Clinique, Hôpital Bretonneau, CHU Tours, Tours, France
| | - Nassim Kamar
- Department of Nephrology and Organ Transplantation, Paul Sabatier University, INSERM, Toulouse, France
| | - Antoine Bouquegneau
- Department of Nephrology-Dialysis-Transplantation, Centre hospitalier universitaire de Liège, Liège, Belgium
| | - Ibrahim Batal
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Shana M Coley
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - John S Gill
- Division of Nephrology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Federico Oppenheimer
- Kidney Transplant Department, Hospital Clínic i Provincial de Barcelona, Barcelona, Spain
| | - Erika De Sousa-Amorim
- Kidney Transplant Department, Hospital Clínic i Provincial de Barcelona, Barcelona, Spain
| | - Dirk R J Kuypers
- Department of Microbiology, Immunology and Transplantation, KU Leuven, Leuven, Belgium
| | - Antoine Durrbach
- Department of Nephrology, AP-HP Hôpital Henri Mondor, Créteil, Île de France, France
| | - Daniel Seron
- Nephrology Department, Hospital Vall d'Hebrón, Autonomous University of Barcelona, Barcelona, Spain
| | - Marion Rabant
- Department of Pathology, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Jean-Paul Duong Van Huyen
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Department of Pathology, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Patricia Campbell
- Faculty of Medicine & Dentistry - Laboratory Medicine & Pathology Dept, University of Alberta, Edmonton, AB, Canada
| | - Soroush Shojai
- Faculty of Medicine & Dentistry - Laboratory Medicine & Pathology Dept, University of Alberta, Edmonton, AB, Canada
| | - Michael Mengel
- Faculty of Medicine & Dentistry - Laboratory Medicine & Pathology Dept, University of Alberta, Edmonton, AB, Canada
| | - Oriol Bestard
- Nephrology Department, Hospital Vall d'Hebrón, Autonomous University of Barcelona, Barcelona, Spain
| | - Nikolina Basic-Jukic
- Department of nephrology, arterial hypertension, dialysis and transplantation, University Hospital Centre Zagreb, Zagreb, Croatia
| | - Ivana Jurić
- Department of nephrology, arterial hypertension, dialysis and transplantation, University Hospital Centre Zagreb, Zagreb, Croatia
| | - Peter Boor
- Institute of Pathology, RWTH Aachen University Hospital, Aachen, Germany
| | - Lynn D Cornell
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Mariam P Alexander
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - P Toby Coates
- Department of Renal and Transplantation, University of Adelaide, Royal Adelaide Hospital Campus, Adelaide, SA, Australia
| | - Christophe Legendre
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Department of Kidney Transplantation, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Peter P Reese
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Renal-Electrolyte and Hypertension Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Carmen Lefaucheur
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Kidney Transplant Department, Saint-Louis Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Olivier Aubert
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France
- Department of Kidney Transplantation, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
| | - Alexandre Loupy
- Université Paris Cité, INSERM U970 PARCC, Paris Institute for Transplantation and Organ Regeneration, F-75015, Paris, France.
- Department of Kidney Transplantation, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France.
|
72
|
Ma Z, Wang R, Song G, Zhang K, Zhao Z, Wang J. Interpretable ensemble prediction for anaerobic digestion performance of hydrothermal carbonization wastewater. Sci Total Environ 2024; 908:168279. [PMID: 37926246 DOI: 10.1016/j.scitotenv.2023.168279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 10/12/2023] [Accepted: 10/31/2023] [Indexed: 11/07/2023]
Abstract
Hydrothermal carbonization (HTC) is a method to improve fuel quality that can directly treat wet solid waste, but the treatment produces large amounts of wastewater. Treating HTC wastewater by anaerobic digestion for methane production can lead to waste utilization and energy saving. However, predicting the anaerobic digestion performance of HTC wastewater is challenging due to the complexity of influencing factors. This study applies interpretable machine learning combined with ensemble learning to construct ensemble prediction models for the biogas yield and CH₄ concentration. The machine learning ensemble model can integrate the advantages of single models and effectively improve the prediction accuracy of the anaerobic digestion performance of HTC wastewater, with the best R² reaching 0.836 and 0.820, respectively, which is better than the 0.780 and 0.802 of the best single models. The SHapley Additive exPlanations theory is combined with the ensemble models to show that anaerobic digestion reaction time, together with HTC temperature, pH, and COD, has a coupling effect on daily biogas yield and CH₄ concentration.
Affiliation(s)
- Zherui Ma
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
| | - Ruikun Wang
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China.
| | - Gaoke Song
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
| | - Kai Zhang
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
| | - Zhenghui Zhao
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
| | - Jiangjiang Wang
- Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
|
73
|
Varghese J, Brenner A, Fujarski M, van Alen CM, Plagwitz L, Warnecke T. Machine Learning in the Parkinson's disease smartwatch (PADS) dataset. NPJ Parkinsons Dis 2024; 10:9. [PMID: 38182602 PMCID: PMC10770131 DOI: 10.1038/s41531-023-00625-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 12/18/2023] [Indexed: 01/07/2024] Open
Abstract
The utilisation of smart devices, such as smartwatches and smartphones, in the field of movement disorders research has gained significant attention. However, the absence of a comprehensive dataset with movement data and clinical annotations, encompassing a wide range of movement disorders including Parkinson's disease (PD) and its differential diagnoses (DD), presents a significant gap. The availability of such a dataset is crucial for the development of reliable machine learning (ML) models on smart devices, enabling the detection of diseases and monitoring of treatment efficacy in a home-based setting. We conducted a three-year cross-sectional study at a large tertiary care hospital. A multi-modal smartphone app integrated electronic questionnaires and smartwatch measures during an interactive assessment designed by neurologists to provoke subtle changes in movement pathologies. We captured over 5000 clinical assessment steps from 504 participants, including PD, DD, and healthy controls (HC). After age-matching, an integrative ML approach combining classical signal processing and advanced deep learning techniques was implemented and cross-validated. The models achieved an average balanced accuracy of 91.16% in classifying PD vs. HC, while PD vs. DD scored 72.42%. The numbers suggest promising performance, although distinguishing similar disorders remains challenging. The extensive annotations, including details on demographics, medical history, symptoms, and movement steps, provide a comprehensive database for ML techniques and encourage further investigations into phenotypical biomarkers related to movement disorders.
Affiliation(s)
- Julian Varghese
- Institute of Medical Informatics, University of Münster, Münster, Germany.
- European Research Centre of Information Systems, University of Münster, Münster, Germany.
| | - Alexander Brenner
- Institute of Medical Informatics, University of Münster, Münster, Germany
| | - Michael Fujarski
- Institute of Medical Informatics, University of Münster, Münster, Germany
| | | | - Lucas Plagwitz
- Institute of Medical Informatics, University of Münster, Münster, Germany
| | - Tobias Warnecke
- Department of Neurology and Neurorehabilitation, Klinikum Osnabrück - Academic teaching hospital of the University of Münster, Osnabrück, Germany
|
74
|
Wulan N, An L, Zhang C, Kong R, Chen P, Bzdok D, Eickhoff SB, Holmes AJ, Yeo BTT. Translating phenotypic prediction models from big to small anatomical MRI data using meta-matching. bioRxiv 2024:2023.12.31.573801. [PMID: 38260665 PMCID: PMC10802307 DOI: 10.1101/2023.12.31.573801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Individualized phenotypic prediction based on structural MRI is an important goal in neuroscience. Prediction performance increases with larger samples, but small-scale datasets with fewer than 200 participants are often unavoidable. We have previously proposed a "meta-matching" framework to translate models trained from large datasets to improve the prediction of new unseen phenotypes in small collection efforts. Meta-matching exploits correlations between phenotypes, yielding large improvements over classical machine learning when applied to prediction models using resting-state functional connectivity as input features. Here, we adapt the two best performing meta-matching variants ("meta-matching finetune" and "meta-matching stacking") from our previous study to work with T1-weighted MRI data by changing the base neural network architecture to a 3D convolutional neural network. We compare the two meta-matching variants with elastic net and classical transfer learning using the UK Biobank (N = 36,461), Human Connectome Project Young Adults (HCP-YA) dataset (N = 1,017) and HCP-Aging dataset (N = 656). We find that meta-matching outperforms elastic net and classical transfer learning by a large margin, both when translating models within the same dataset and when translating models across datasets with different MRI scanners, acquisition protocols and demographics. For example, when translating a UK Biobank model to 100 HCP-YA participants, meta-matching finetune yielded a 136% improvement in variance explained over transfer learning, with an average absolute gain of 2.6% (minimum = -0.9%, maximum = 17.6%) across 35 phenotypes. Overall, our results highlight the versatility of the meta-matching framework.
Affiliation(s)
- Naren Wulan
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Lijun An
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Chen Zhang
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Ru Kong
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Pansheng Chen
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Danilo Bzdok
- Department of Biomedical Engineering, McConnell Brain Imaging Centre (BIC), Montreal Neurological Institute (MNI), Faculty of Medicine, School of Computer Science, McGill University, Montreal QC, Canada
- Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada
| | - Simon B Eickhoff
- Institute for Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Neuroscience and Medicine, Brain & Behavior (INM-7), Research Center Jülich, Jülich, Germany
| | - Avram J Holmes
- Department of Psychiatry, Brain Health Institute, Rutgers University, Piscataway, NJ, USA
| | - B T Thomas Yeo
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
|
75
|
Yan W, Chiu B, Shen Z, Yang Q, Syer T, Min Z, Punwani S, Emberton M, Atkinson D, Barratt DC, Hu Y. Combiner and HyperCombiner networks: Rules to combine multimodality MR images for prostate cancer localisation. Med Image Anal 2024; 91:103030. [PMID: 37995627 DOI: 10.1016/j.media.2023.103030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 09/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
One of the distinct characteristics of radiologists reading multiparametric prostate MR scans, using reporting systems like PI-RADS v2.1, is to score individual types of MR modalities, including T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels. First, we demonstrate that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these combining models are proposed as hyperparameters, weighing independent representations of individual image modalities in the Combiner network training, as opposed to end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference for much-improved efficiency. Experimental results based on 751 cases from 651 patients compare the proposed rule-modelling approaches with other commonly-adopted end-to-end networks, in this downstream application of automating radiologist labelling on multiparametric MR. By acquiring and interpreting the modality combining rules, specifically the linear-weights or odds ratios associated with individual image modalities, three clinical applications are quantitatively presented and contextualised in the prostate cancer segmentation application, including modality availability assessment, importance quantification and rule discovery.
Affiliation(s)
- Wen Yan
- Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong China; Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Bernard Chiu
- Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong China; Department of Physics & Computer Science, Wilfrid Laurier University, 75 University Avenue West Waterloo, Ontario N2L 3C5, Canada.
| | - Ziyi Shen
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Qianye Yang
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Tom Syer
- Centre for Medical Imaging, Division of Medicine, University College London, London W1W 7TS, UK.
| | - Zhe Min
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Shonit Punwani
- Centre for Medical Imaging, Division of Medicine, University College London, London W1W 7TS, UK.
| | - Mark Emberton
- Division of Surgery & Interventional Science, University College London, Gower St, WC1E 6BT, London, UK.
| | - David Atkinson
- Centre for Medical Imaging, Division of Medicine, University College London, London W1W 7TS, UK.
| | - Dean C Barratt
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Yipeng Hu
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
|
76
|
Hermes S, Cady J, Armentrout S, O’Connor J, Holdaway SC, Cruchaga C, Wingo T, Greytak EM. Epistatic Features and Machine Learning Improve Alzheimer's Disease Risk Prediction Over Polygenic Risk Scores. J Alzheimers Dis 2024; 99:1425-1440. [PMID: 38788065 PMCID: PMC11284654 DOI: 10.3233/jad-230236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Background: Polygenic risk scores (PRS) are linear combinations of genetic markers weighted by effect size that are commonly used to predict disease risk. For complex heritable diseases such as late-onset Alzheimer's disease (LOAD), PRS models fail to capture much of the heritability. Additionally, PRS models are highly dependent on the population structure of the data on which effect sizes are assessed and have poor generalizability to new data. Objective: The goal of this study is to construct a paragenic risk score that, in addition to single genetic marker data used in PRS, incorporates epistatic interaction features and machine learning methods to predict risk for LOAD. Methods: We construct a new state-of-the-art genetic model for risk of Alzheimer's disease. Our approach innovates over PRS models in two ways: first, by directly incorporating epistatic interactions between SNP loci using an evolutionary algorithm guided by shared pathway information; and second, by estimating risk via an ensemble of non-linear machine learning models rather than a single linear model. We compare the paragenic model to several PRS models from the literature trained on the same dataset. Results: The paragenic model is significantly more accurate than the PRS models under 10-fold cross-validation, obtaining an AUC of 83% and near-clinically significant matched sensitivity/specificity of 75%. It remains significantly more accurate when evaluated on an independent holdout dataset and maintains accuracy within APOE genotype strata. Conclusions: Paragenic models show potential for improving disease risk prediction for complex heritable diseases such as LOAD over PRS models.
Affiliation(s)
| | | | | | | | | | - Carlos Cruchaga
- Department of Psychiatry, Washington University, St. Louis, MO, USA
- Hope Center Program on Protein Aggregation and Neurodegeneration, Washington University, St. Louis, MO, USA
| | - Thomas Wingo
- Goizueta Alzheimer’s Disease Center, Emory University School of Medicine, Atlanta, GA, USA
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
|
77
|
Hu WH, Lin SY, Hu YJ, Huang HY, Lu PL. Application of machine learning for mortality prediction in patients with candidemia: Feasibility verification and comparison with clinical severity scores. Mycoses 2024; 67:e13667. [PMID: 37914666 DOI: 10.1111/myc.13667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 10/12/2023] [Accepted: 10/18/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND: Clinical severity scores, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II), Sequential Organ Failure Assessment (SOFA), Pitt Bacteremia Score (PBS), and European Confederation of Medical Mycology Quality (EQUAL) score, may not reliably predict candidemia prognosis because their prespecified scoring limits their adaptability and applicability. OBJECTIVES: Unlike these fixed, prespecified scores, we aim to develop and validate a machine learning (ML) approach that learns predictive models adaptively from available patient data to increase adaptability and applicability. METHODS: Different ML algorithms follow different design philosophies and consequently carry different learning biases. We designed an ensemble meta-learner based on stacked generalisation that integrates multiple learners so that they work in synergy to improve predictive performance. RESULTS: In this multicenter retrospective study, we analysed 512 patients with candidemia from January 2014 to July 2019 and compared a stacked generalisation model (SGM) with the APACHE II, SOFA, PBS and EQUAL scores for predicting 14-day mortality. The cross-validation results showed that the SGM significantly outperformed the APACHE II, SOFA, PBS, and EQUAL scores across several metrics, including F1-score (0.68, p < .005), Matthews correlation coefficient (0.54, p < .05 vs. SOFA, p < .005 vs. the others) and the area under the curve (AUC; 0.87, p < .005). In addition, in an independent external test, the model effectively predicted patients' mortality in the external validation cohort, with an AUC of 0.77. CONCLUSIONS: ML models show potential for improving mortality prediction amongst patients with candidemia compared to clinical severity scores.
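Two of the metrics reported in this abstract, the F1-score and the Matthews correlation coefficient (MCC), are both computed from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the function names and the example counts are hypothetical and are not the study's data.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a 14-day mortality classifier (illustrative only).
tp, tn, fp, fn = 30, 50, 10, 10
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

F1 ignores true negatives, whereas MCC uses all four cells, which is one reason the two are often reported together when the outcome classes are imbalanced.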
Affiliation(s)
- Wei-Huan Hu
- College of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Shang-Yi Lin
- Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Yuh-Jyh Hu
- College of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Institute of Biomedical Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Ho-Yin Huang
- Department of Pharmacy, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Po-Liang Lu
- Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- School of Post-Baccalaureate Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
| |
|
78
|
Hussain S, Songhua X, Aslam MU, Hussain F. Clinical predictions of COVID-19 patients using deep stacking neural networks. J Investig Med 2024; 72:112-127. [PMID: 37712431 DOI: 10.1177/10815589231201103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic, which emerged in late 2019, has caused millions of infections and fatalities globally, disrupting various aspects of human society, including socioeconomic, political, and educational systems. One of the key challenges during the COVID-19 pandemic is accurately predicting the clinical development and outcome of infected patients. In response, scientists and medical professionals globally have mobilized to develop prognostic strategies such as risk scores, biomarkers, and machine learning models to predict the clinical course and outcomes of COVID-19 patients. In this contribution, we deployed a mathematical approach called matrix factorization feature selection to select the most relevant features from the anonymized laboratory biomarkers and demographic data of COVID-19 patients. Based on these features, we developed a model that leverages the deep stacking neural network (DSNN) to aid in clinical care by predicting patients' mortality risk. To gauge the performance of our suggested model, we performed a comparative analysis with principal component analysis plus support vector machine, deep learning, and random forest. The DSNN model outperformed all the other models in terms of area under the curve (96.0%), F1-score (98.1%), recall (98.5%), accuracy (99.0%), precision (97.7%), specificity (97.0%), and maximum probability of correct decision (93.4%). Our model outperforms the clinical predictive models in the literature regarding patient mortality risk and classification. Therefore, we conclude that our robust model can help healthcare professionals to manage COVID-19 patients more effectively. We expect that early prediction of outcomes for COVID-19 patients and preventive interventions can reduce the mortality risk of patients.
Affiliation(s)
- Sajid Hussain
- School of Mathematics and Statistics XJTU, Xian, Shaanxi, China
| | - Xu Songhua
- School of Mathematics and Statistics XJTU, Xian, Shaanxi, China
| | | | - Fida Hussain
- School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Nuevo León, Mexico
| |
|
79
|
Díaz I, Hoffman KL, Hejazi NS. Causal survival analysis under competing risks using longitudinal modified treatment policies. Lifetime Data Analysis 2024; 30:213-236. [PMID: 37620504 DOI: 10.1007/s10985-023-09606-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 07/17/2023] [Indexed: 08/26/2023]
Abstract
Longitudinal modified treatment policies (LMTPs) have recently been developed as a novel method to define and estimate causal parameters that depend on the natural value of treatment. LMTPs represent an important advancement in causal inference for longitudinal studies as they allow the non-parametric definition and estimation of the joint effect of multiple categorical, ordinal, or continuous treatments measured at several time points. We extend the LMTP methodology to problems in which the outcome is a time-to-event variable subject to a competing event that precludes observation of the event of interest. We present identification results and non-parametric locally efficient estimators that use flexible data-adaptive regression techniques to alleviate model misspecification bias, while retaining important asymptotic properties such as √n-consistency. We present an application to the estimation of the effect of the time-to-intubation on acute kidney injury amongst COVID-19 hospitalized patients, where death by other causes is taken to be the competing event.
Affiliation(s)
- Iván Díaz
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY, 10016, USA
| | - Katherine L Hoffman
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
| | - Nima S Hejazi
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, 02115, USA
| |
|
80
|
Srisongkram T. Ensemble Quantitative Read-Across Structure-Activity Relationship Algorithm for Predicting Skin Cytotoxicity. Chem Res Toxicol 2023; 36:1961-1972. [PMID: 38047785 DOI: 10.1021/acs.chemrestox.3c00238] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 12/05/2023]
Abstract
Read-across (RA) and quantitative structure-activity relationship (QSAR) are two alternative methods commonly used to fill data gaps in chemical registrations. These approaches use physicochemical properties or molecular fingerprints of source substances to predict the properties of unknown substances that have similar chemical structures or physicochemical properties. Research on RA and QSAR is essential to minimize the time, money, and animal testing needed to determine biological properties that are not currently known. This study developed a stacked ensemble quantitative read-across structure-activity relationship algorithm (enQRASAR) for predicting skin irritation toxicity, using the negative log of the half-maximal cell viability inhibition concentration (pIC50) against skin keratinocytes as the end point. The goodness-of-fit and predictability of this algorithm were validated using leave-one-out cross-validation and external test data sets. The results obtained were statistically reliable in terms of goodness-of-fit, robustness, and predictability metrics. Additionally, the developed model demonstrated a low prediction error when predicting FDA-approved drugs. These results confirm that the enQRASAR algorithm can be used to predict skin cytotoxicity of chemicals. The model has therefore been made publicly available to further facilitate toxicity predictions of unknown compounds in chemical registrations.
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40000, Thailand
| |
|
81
|
Einhaus J, Gaudilliere DK, Hedou J, Feyaerts D, Ozawa MG, Sato M, Ganio EA, Tsai AS, Stelzer IA, Bruckman KC, Amar JN, Sabayev M, Bonham TA, Gillard J, Diop M, Cambriel A, Mihalic ZN, Valdez T, Liu SY, Feirrera L, Lam DK, Sunwoo JB, Schürch CM, Gaudilliere B, Han X. Spatial subsetting enables integrative modeling of oral squamous cell carcinoma multiplex imaging data. iScience 2023; 26:108486. [PMID: 38125025 PMCID: PMC10730356 DOI: 10.1016/j.isci.2023.108486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 11/01/2023] [Accepted: 11/16/2023] [Indexed: 12/23/2023] Open
Abstract
Oral squamous cell carcinoma (OSCC), a prevalent and aggressive neoplasm, poses a significant challenge due to poor prognosis and limited prognostic biomarkers. Leveraging highly multiplexed imaging mass cytometry, we investigated the tumor immune microenvironment (TIME) in OSCC biopsies, characterizing immune cell distribution and signaling activity at the tumor-invasive front. Our spatial subsetting approach standardized cellular populations by tissue zone, improving feature reproducibility and revealing TIME patterns accompanying loss-of-differentiation. Employing a machine-learning pipeline combining reliable feature selection with multivariable modeling, we achieved accurate histological grade classification (AUC = 0.88). Three model features correlated with clinical outcomes in an independent cohort: granulocyte MAPKAPK2 signaling at the tumor front, stromal CD4+ memory T cell size, and the distance of fibroblasts from the tumor border. This study establishes a robust modeling framework for distilling complex imaging data, uncovering sentinel characteristics of the OSCC TIME to facilitate prognostic biomarker discovery for recurrence risk stratification and immunomodulatory therapy development.
Affiliation(s)
- Jakob Einhaus
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Pathology and Neuropathology, University Hospital and Comprehensive Cancer Center Tübingen, Tübingen, Germany
| | - Dyani K. Gaudilliere
- Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
| | - Julien Hedou
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Dorien Feyaerts
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Michael G. Ozawa
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
| | - Masaki Sato
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Edward A. Ganio
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Amy S. Tsai
- Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
| | - Ina A. Stelzer
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Karl C. Bruckman
- Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
| | - Jonas N. Amar
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Maximilian Sabayev
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Thomas A. Bonham
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Joshua Gillard
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Maïgane Diop
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Amelie Cambriel
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Zala N. Mihalic
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Tulio Valdez
- Division of Pediatrics, Department of Otolaryngology, Stanford University School of Medicine, Stanford, CA, USA
| | - Stanley Y. Liu
- Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- Division of Sleep Surgery, Department of Otolaryngology, Stanford University School of Medicine, Stanford, CA, USA
| | - Leticia Feirrera
- Department of Oral and Maxillofacial Surgery, University of the Pacific, Arthur A. Dugoni School of Dentistry, San Francisco, CA, USA
| | - David K. Lam
- Department of Oral and Maxillofacial Surgery, University of the Pacific, Arthur A. Dugoni School of Dentistry, San Francisco, CA, USA
| | - John B. Sunwoo
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University School of Medicine, Stanford, CA, USA
| | - Christian M. Schürch
- Department of Pathology and Neuropathology, University Hospital and Comprehensive Cancer Center Tübingen, Tübingen, Germany
- Cluster of Excellence iFIT (EXC 2180) “Image-Guided and Functionally Instructed Tumor Therapies”, University of Tübingen, Tübingen, Germany
| | - Brice Gaudilliere
- Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Xiaoyuan Han
- Department of Biomedical Sciences, University of the Pacific, Arthur A. Dugoni School of Dentistry, San Francisco, CA, USA
| |
|
82
|
Nguyen LTH, Fukumoto Y, Cesana P, Staykov A. Fully Automatized Optimization of Ring-Opening Reactions in Lactone Derivatives via Two-Step Machine Learning. J Phys Chem A 2023; 127:10159-10170. [PMID: 37982574 DOI: 10.1021/acs.jpca.3c05887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2023]
Abstract
Cyclization and cycloreversion of organic compounds are fundamental kinetic processes in the design of functional molecules, molecular machines, nanoscale sensors, and switches in the field of molecular and nanoelectronics. We present a fully automatic computational platform for the design of a class of five- and six-membered ring lactones by optimizing the ring-opening reaction rate. Starting from a minimal initial parent set, our algorithm iteratively generates cascades of pools of candidate lactone derivatives, where optimization and down-selection are performed without human supervision. We employ density functional theory combined with transition state theory to elucidate the exact mechanism leading to the lactone ring-opening reaction. On the basis of the analysis of the reaction pathway and the frontier molecular orbitals, we identify a simple descriptor that correlates with the reaction rate. Consequently, we can omit computationally expensive transition state calculations and deduce the reaction rate from simple ground-state and ionic calculations. To accelerate the platform, we use a data set on the order of 800 molecules to train machine learning models for the prediction of targeted chemical properties, reducing the computational time by 90%. We developed an evolutionary algorithm capable of generating data sets 3 orders of magnitude larger than the initial parent set. Thus, we can explore a large domain of chemical space using minimal computational effort. Our entire platform is modular, and our current implementation for lactones can be further generalized to more complex systems via substitution of the quantum chemical and fingerprinting modules.
Affiliation(s)
- Linh Thi Hoai Nguyen
- Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
| | - Yasuhide Fukumoto
- Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
| | - Pierluigi Cesana
- Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
| | - Aleksandar Staykov
- International Institute for Carbon-neutral Energy Research (WPI-I2CNER), Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
| |
|
83
|
Chen P, An L, Wulan N, Zhang C, Zhang S, Ooi LQR, Kong R, Chen J, Wu J, Chopra S, Bzdok D, Eickhoff SB, Holmes AJ, Yeo BT. Multilayer meta-matching: translating phenotypic prediction models from multiple datasets to small data. bioRxiv 2023:2023.12.05.569848. [PMID: 38106085 PMCID: PMC10723283 DOI: 10.1101/2023.12.05.569848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Resting-state functional connectivity (RSFC) is widely used to predict phenotypic traits in individuals. Large sample sizes can significantly improve prediction accuracies. However, for studies of certain clinical populations or focused neuroscience inquiries, small-scale datasets often remain a necessity. We have previously proposed a "meta-matching" approach to translate prediction models from large datasets to predict new phenotypes in small datasets. We demonstrated a large improvement of meta-matching over classical kernel ridge regression (KRR) when translating models from a single source dataset (UK Biobank) to the Human Connectome Project Young Adults (HCP-YA) dataset. In the current study, we propose two meta-matching variants ("meta-matching with dataset stacking" and "multilayer meta-matching") to translate models from multiple source datasets across disparate sample sizes to predict new phenotypes in small target datasets. We evaluate both approaches by translating models trained from five source datasets (with sample sizes ranging from 862 participants to 36,834 participants) to predict phenotypes in the HCP-YA and HCP-Aging datasets. We find that multilayer meta-matching modestly outperforms meta-matching with dataset stacking. Both meta-matching variants perform better than the original "meta-matching with stacking" approach trained only on the UK Biobank. All meta-matching variants outperform classical KRR and transfer learning by a large margin. In fact, KRR is better than classical transfer learning when fewer than 50 participants are available for finetuning, suggesting the difficulty of classical transfer learning in the very small sample regime. The multilayer meta-matching model is publicly available at GITHUB_LINK.
Affiliation(s)
- Pansheng Chen
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Lijun An
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Naren Wulan
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Chen Zhang
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Shaoshi Zhang
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
| | - Leon Qi Rong Ooi
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
| | - Ru Kong
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Jianzhong Chen
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
| | - Jianxiao Wu
- Institute for Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Neuroscience and Medicine, Brain & Behavior (INM-7), Research Center Jülich, Jülich, Germany
| | - Sidhant Chopra
- Department of Psychology, Yale University, New Haven, CT, USA
| | - Danilo Bzdok
- Department of Biomedical Engineering, McConnell Brain Imaging Centre (BIC), Montreal Neurological Institute (MNI), Faculty of Medicine, School of Computer Science, McGill University, Montreal QC, Canada
- Mila – Quebec Artificial Intelligence Institute, Montreal, QC, Canada
| | - Simon B Eickhoff
- Institute for Systems Neuroscience, Medical Faculty, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Neuroscience and Medicine, Brain & Behavior (INM-7), Research Center Jülich, Jülich, Germany
| | - Avram J Holmes
- Department of Psychiatry, Brain Health Institute, Rutgers University, Piscataway, NJ, USA
| | - B.T. Thomas Yeo
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
| |
|
84
|
Asadi‐Pooya AA, Fattahi D, Abolpour N, Boostani R, Farazdaghi M, Sharifi M. Epilepsy classification using artificial intelligence: A web-based application. Epilepsia Open 2023; 8:1362-1368. [PMID: 37565252 PMCID: PMC10690646 DOI: 10.1002/epi4.12800] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2023] [Accepted: 07/29/2023] [Indexed: 08/12/2023] Open
Abstract
OBJECTIVE The purpose of this study was to evaluate the feasibility of using easily accessible and applicable clinical information (based on history taking and physical examination) to reliably differentiate idiopathic generalized epilepsy (IGE) from focal epilepsy using machine learning (ML) methods. METHODS The first phase of the study was a retrospective study of a prospectively developed and maintained database. All patients with an electro-clinical diagnosis of IGE or focal epilepsy at the outpatient epilepsy clinic at Shiraz University of Medical Sciences, Shiraz, Iran, from 2008 until 2022 were included. The first author selected a set of clinical features. Using stratified random partitioning, the dataset was divided into training (70%) and test (30%) subsets. Different types of classifiers were assessed, and the final classification was made based on their best results using the stacking method. RESULTS A total of 1445 patients were studied: 964 with focal epilepsy and 481 with IGE. The stacking classifier generally led to better results than the base classifiers. This algorithm has the following characteristics: precision: 0.81, sensitivity: 0.81, and specificity: 0.77. SIGNIFICANCE We developed a pragmatic algorithm aimed at facilitating epilepsy classification for individuals whose epilepsy begins at age 10 years or older. Also, to enable and facilitate future external validation studies by other peers and professionals, the developed and trained ML model was implemented and published via an online web-based application that is freely available at http://www.epiclass.ir/f-ige.
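The abstract above describes a stratified random 70%/30% division of the dataset. The following minimal sketch illustrates what such a split does: each class is shuffled and divided separately, so both subsets keep roughly the same class ratio. This is not the authors' code; the function name `stratified_split`, the seed, and the rounding rule are assumptions made for illustration, and only the class counts (964 focal, 481 IGE) come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test while preserving class ratios.

    labels: sequence of class labels, one per sample.
    Returns two sorted lists of indices (train, test).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumed rounding rule: nearest integer number of test samples.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class counts as reported in the abstract; the labels are placeholders.
labels = ["focal"] * 964 + ["IGE"] * 481
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class matters here because the two diagnoses are imbalanced (about two focal cases for each IGE case); a plain random split could leave the test subset with a noticeably different ratio.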
Affiliation(s)
- Ali A. Asadi‐Pooya
- Epilepsy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Neurology, Jefferson Comprehensive Epilepsy Center, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
| | - Davood Fattahi
- Epilepsy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Nahid Abolpour
- Epilepsy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Reza Boostani
- Department of Computer Science Engineering and Information Technology, Shiraz University, Shiraz, Iran
| | - Mohsen Farazdaghi
- Epilepsy Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Mehrdad Sharifi
- Vice‐Chancellery for Treatment Affairs, Shiraz University of Medical Sciences, Shiraz, Iran
- Emergency Medicine Department, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Emergency Medicine Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| |
|
85
|
Rauschenberger A, Landoulsi Z, van de Wiel MA, Glaab E. Penalized regression with multiple sources of prior effects. Bioinformatics 2023; 39:btad680. [PMID: 37951587 PMCID: PMC10699841 DOI: 10.1093/bioinformatics/btad680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/19/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023] Open
Abstract
MOTIVATION In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provide an insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. RESULTS We propose an approach for integrating multiple sources of such prior information into penalized regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented in the R package transreg (https://github.com/lcsb-bds/transreg, https://cran.r-project.org/package=transreg).
Affiliation(s)
- Armin Rauschenberger
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 4362 Esch-sur-Alzette, Luxembourg
| | - Zied Landoulsi
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 4362 Esch-sur-Alzette, Luxembourg
| | - Mark A van de Wiel
- Department of Epidemiology and Data Science (EDS), Amsterdam University Medical Centers (Amsterdam UMC), 1081 HV Amsterdam, The Netherlands
| | - Enrico Glaab
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 4362 Esch-sur-Alzette, Luxembourg
| |
|
86
|
Garcia GGP, Czerniak LL, Lavieri MS, Liebel SW, Van Pelt KL, Pasquina PF, McAllister TW, McCrea MA, Broglio SP. Estimating the Relationship Between the Symptom-Free Waiting Period and Injury Rates After Return-to-Play from Concussion: A Simulation Analysis Using CARE Consortium Data. Sports Med 2023; 53:2513-2528. [PMID: 37610654 DOI: 10.1007/s40279-023-01901-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 07/25/2023] [Indexed: 08/24/2023]
Abstract
BACKGROUND A key component of return-to-play (RTP) from sport-related concussion is the symptom-free waiting period (SFWP), i.e., the period during which athletes must remain symptom-free before permitting RTP. Yet, the exact relationship between SFWP and post-RTP injury rates is unclear. OBJECTIVE We design computational simulations to estimate the relationship between the SFWP and rates of repeat concussion and non-concussion time-loss injury up to 30 days post-RTP for male and female collegiate athletes across 13 sports. METHODS We leverage N = 735 female and N = 1,094 male post-injury trajectories from the National Collegiate Athletic Association-Department of Defense Concussion Assessment, Research, and Education Consortium. RESULTS With a 6-day SFWP, the mean [95% CI] rate of repeat concussion per 1,000 simulations was greatest in ice hockey for females (20.31, [20.16, 20.46]) and American football for males (24.16, [24.05, 24.28]). Non-concussion time-loss injury rates were greatest in field hockey for females (153.66, [152.59, 154.74]) and wrestling for males (247.34, [246.20, 248.48]). Increasing to a 13-day SFWP, ice hockey for females (18.88, [18.79, 18.98]) and American football for males (23.16, [23.09, 24.22]) exhibit the greatest decrease in repeat concussion rates across all sports within their respective sexes. Field hockey for females (143.24, [142.53, 143.94]) and wrestling for males (237.73, [236.67, 237.90]) exhibit the greatest decrease in non-concussion time-loss injury rates. Males receive marginally smaller reductions in injury rates for increased SFWP compared to females (OR = 1.003, p ≤ 0.002). CONCLUSION Longer SFWPs lead to greater reductions in post-RTP injury rates for athletes in higher risk sports. Moreover, SFWPs should be tailored to sport-specific post-RTP injury risks.
Affiliation(s)
- Gian-Gabriel P Garcia
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Lauren L Czerniak
- Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Mariel S Lavieri
- Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Spencer W Liebel
- Department of Neurology, University of Utah, Salt Lake City, UT, USA
| | | | - Paul F Pasquina
- Department of Physical Medicine and Rehabilitation, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Thomas W McAllister
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Michael A McCrea
- Departments of Neurosurgery and Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Steven P Broglio
- Michigan Concussion Center, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
87
|
Turhan S, Canbek U, Dubektas-Canbek T, Dogu E. Predicting Prolonged Wound Drainage after Hemiarthroplasty for Hip Fractures: A Stacked Machine Learning Study. Clin Orthop Surg 2023; 15:894-901. [PMID: 38045590 PMCID: PMC10689231 DOI: 10.4055/cios22181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 09/24/2022] [Accepted: 09/27/2022] [Indexed: 12/05/2023] Open
Abstract
Background Prolonged wound drainage (PWD) is one of the most important factors increasing the risk of early periprosthetic joint infection after arthroplasty. It is very important to evaluate the risk factors for PWD in the surgical field after arthroplasty surgery. This can be accomplished using machine learning or artificial intelligence methods. Our aim in this study was to compare machine learning methods in predicting possible PWD. Methods The study was carried out on clinical, laboratory, and radiological data of 313 patients who underwent hemiarthroplasty (HA) for proximal femur fractures. We preprocessed the dataset and trained and tested machine learning methods using cross validation. We compared various machine learning algorithms (linear discriminant analysis, decision tree, k-nearest neighbors, gradient boosting machine, and logistic regression [LR]) based on performance measures. We also combined the most successful algorithms with a metaclassifier. To help understand the relationship between risk factors, we provided a risk factor severity ranking. Results To estimate the risk of PWD, classification was performed with first-level classifiers, which were then integrated using an LR-based meta-learner stacking method. The stacking method yielded further performance improvements. Conclusions We found that the stacking method was superior to other methods in PWD classification. We determined that the volume of fluid collected from the drain, morbid obesity class, blood transfusion, and body mass index score were the four most important risk factors according to stacking.
Affiliation(s)
- Sultan Turhan
  - Department of Statistics, Mugla Sitki Kocman University, Mugla, Türkiye
- Umut Canbek
  - Department of Orthopedics and Traumatology, Mugla Sitki Kocman University College of Medicine, Mugla, Türkiye
- Tugba Dubektas-Canbek
  - Department of Internal Medicine, Mugla Sitki Kocman University College of Medicine, Mugla, Türkiye
- Eralp Dogu
  - Department of Statistics, Mugla Sitki Kocman University, Mugla, Türkiye
|
88
|
Mir BA, Rehman MU, Tayara H, Chong KT. Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model. J Mol Biol 2023; 435:168314. [PMID: 37852600 DOI: 10.1016/j.jmb.2023.168314] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/13/2023] [Revised: 10/06/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023]
Abstract
Enhancers are DNA regions that are responsible for controlling the expression of genes. Enhancers are usually found upstream or downstream of a gene, or even inside a gene's intron region, but are normally located at a distant location from the genes they control. By integrating experimental and computational approaches, it is possible to uncover enhancers within DNA sequences, which possess regulatory properties. Experimental techniques such as ChIP-seq and ATAC-seq can identify genomic regions that are associated with transcription factors or accessible to regulatory proteins. On the other hand, computational techniques can predict enhancers based on sequence features and epigenetic modifications. In our study, we have developed a multi-classifier stacked ensemble (MCSE-enhancer) model that can accurately identify enhancers. We utilized feature descriptors from various physicochemical properties as input for our six baseline classifiers and built a stacked classifier, which outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, representing a 2-3% improvement over existing models.
Affiliation(s)
- Bilal Ahmad Mir
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
  - Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, Abu Dhabi 127788, United Arab Emirates
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
|
89
|
Ly QV, Tong NA, Lee BM, Nguyen MH, Trung HT, Le Nguyen P, Hoang THT, Hwang Y, Hur J. Improving algal bloom detection using spectroscopic analysis and machine learning: A case study in a large artificial reservoir, South Korea. The Science of the Total Environment 2023; 901:166467. [PMID: 37611716 DOI: 10.1016/j.scitotenv.2023.166467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 08/17/2023] [Accepted: 08/19/2023] [Indexed: 08/25/2023]
Abstract
The prediction of algal blooms using traditional water quality indicators is expensive, labor-intensive, and time-consuming, making it challenging to meet the critical requirement of timely monitoring for prompt management. Using optical measures for forecasting algal blooms is a feasible and useful method to overcome these problems. This study explores the potential application of optical measures to enhance algal bloom prediction in terms of prediction accuracy and workload reduction, aided by machine learning (ML) models. Compared to absorption-derived parameters, commonly used fluorescence indices such as the fluorescence index (FI), humification index (HIX), biological index (BIX), and protein-like component improved the prediction accuracy. However, the prediction accuracy was decreased when all optical indices were considered for computation due to increased noise and uncertainty in the models. With the exception of chemical oxygen demand (COD), this study successfully replaced biochemical oxygen demand (BOD), dissolved organic carbon (DOC), and nutrients with selected fluorescence indices, demonstrating relatively analogous performance in either training or testing data, with consistent and good coefficient of determination (R2) values of approximately 0.85 and 0.74, respectively. Among all models considered, ensemble learning models consistently outperformed conventional regression models and artificial neural networks (ANNs). However, there was a trade-off between accuracy and computation efficiency among the ensemble learning models (i.e., Stacking and XGBoost) for algal bloom prediction. Our study offers a glimpse of the potential application of spectroscopic measures to improve accuracy and efficiency in algal bloom prediction, but further work should be carried out in other water bodies to further validate our proposed hypothesis.
Affiliation(s)
- Quang Viet Ly
  - Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea
- Ngoc Anh Tong
  - School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
- Bo-Mi Lee
  - Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon 22689, South Korea
- Minh Hieu Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam; School of Information and Communication Technology, Griffith University, Gold Coast, Australia
- Huynh Thanh Trung
  - Ecole Polytechnique Federale de Lausanne, 1015 Lausanne, Switzerland
- Phi Le Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
- Thu-Huong T Hoang
  - School of Chemistry and Life Science, Hanoi University of Science and Technology, Hanoi 10000, Vietnam
- Yuhoon Hwang
  - Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea
- Jin Hur
  - Department of Environment and Energy, Sejong University, Seoul 05006, South Korea
|
90
|
Twala B, Molloy E. On effectively predicting autism spectrum disorder therapy using an ensemble of classifiers. Sci Rep 2023; 13:19957. [PMID: 37968315 PMCID: PMC10651853 DOI: 10.1038/s41598-023-46379-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 10/31/2023] [Indexed: 11/17/2023] Open
Abstract
An ensemble of classifiers combines several single classifiers to deliver a final prediction or classification decision. An increasingly provoking question is whether such an ensemble can outperform the single best classifier. If so, what form of ensemble learning system (also known as multiple classifier learning systems) yields the most significant benefits in the size or diversity of the ensemble? In this paper, the ability of ensemble learning to predict and identify factors that influence or contribute to autism spectrum disorder therapy (ASDT) for intervention purposes is investigated. Given that most interventions are typically short-term in nature, henceforth, developing a robotic system that will provide the best outcome and measurement of ASDT therapy has never been so critical. In this paper, the performance of five single classifiers against several multiple classifier learning systems in exploring and predicting ASDT is investigated using a dataset of behavioural data and robot-enhanced therapy against standard human treatment based on 3000 sessions and 300 h, recorded from 61 autistic children. Experimental results show statistically significant differences in performance among the single classifiers for ASDT prediction with decision trees as the more accurate classifier. The results further show multiple classifier learning systems (MCLS) achieving better performance for ASDT prediction (especially those ensembles with three core classifiers). Additionally, the results show bagging and boosting ensemble learning as robust when predicting ASDT with multi-stage design as the most dominant architecture. It also appears that eye contact and social interaction are the most critical contributing factors to the ASDT problem among children.
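The idea of combining several single classifiers into one decision can be illustrated with a minimal majority-voting sketch. This is not the authors' implementation: the three threshold classifiers, the function names, and the input values are hypothetical and serve only to show how individual votes are aggregated.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the base-classifier votes.

    Ties are resolved in favor of the label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, sample):
    """Collect one vote per base classifier and aggregate by majority."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical base classifiers: each thresholds a single score differently.
base_classifiers = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

if __name__ == "__main__":
    for score in (0.4, 0.6):
        print(score, ensemble_predict(base_classifiers, score))
```

An odd number of base classifiers, as in this three-member example, avoids tied votes for binary labels.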
Affiliation(s)
- Bhekisipho Twala
  - Office of the Deputy Vice-Chancellor (Digital Transformation), Tshwane University of Technology, Private Bag x680, Pretoria, 001, South Africa
- Eamon Molloy
  - Waterford Institute of Technology, School of Science & Computing, Waterford, Ireland
|
91
|
Wang GA, Yan X, Li X, Liu Y, Xia J, Zhu X. MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy. ACS Omega 2023; 8:41930-41942. [PMID: 37969991 PMCID: PMC10634282 DOI: 10.1021/acsomega.3c07086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 11/17/2023]
Abstract
As one of the most important post-translational modifications (PTM), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning methods have been developed for Kace site prediction, and hand-crafted features have been used to encode the protein sequences. However, there are still two challenges: the complex biological information may be under-represented by these manmade features and the small sample issue of some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on transfer learning strategy with pretrained bidirectional encoder representations from transformers (BERT) model. In this model, the high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to deal with small sample issue. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets, which was then fine-tuned, or two-stage fine-tuned based on the training data set of each species to obtain the species-specific BERT models. Afterward, the embeddings of residues were extracted from the fine-tuned model and fed to the different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built by using a random forest. The results for the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source codes and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.
Affiliation(s)
- Gang-Ao Wang
  - School of Sciences, Anhui Agricultural University, Hefei 230036, Anhui, China
- Xiaodi Yan
  - School of Sciences, Anhui Agricultural University, Hefei 230036, Anhui, China
- Xiang Li
  - School of Sciences, Anhui Agricultural University, Hefei 230036, Anhui, China
- Yinbo Liu
  - School of Sciences, Anhui Agricultural University, Hefei 230036, Anhui, China
- Junfeng Xia
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, Anhui, China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei 230036, Anhui, China
|
92
|
Ran R, Brubaker DK. Enhanced annotation of CD45RA to distinguish T cell subsets in single-cell RNA-seq via machine learning. Bioinformatics Advances 2023; 3:vbad159. [PMID: 38023329 PMCID: PMC10676521 DOI: 10.1093/bioadv/vbad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/20/2023] [Accepted: 11/03/2023] [Indexed: 12/01/2023]
Abstract
Motivation T cell heterogeneity presents a challenge for accurate cell identification, understanding their inherent plasticity, and characterizing their critical role in adaptive immunity. Immunologists have traditionally employed techniques such as flow cytometry to identify T cell subtypes based on a well-established set of surface protein markers. With the advent of single-cell RNA sequencing (scRNA-seq), researchers can now investigate the gene expression profiles of these surface proteins at the single-cell level. The insights gleaned from these profiles offer valuable clues and a deeper understanding of cell identity. However, CD45RA, the isoform of CD45 which distinguishes between naive/central memory T cells and effector memory/effector memory cells re-expressing CD45RA T cells, cannot be well profiled by scRNA-seq due to the difficulty in mapping short reads to genes. Results In order to facilitate cell-type annotation in T cell scRNA-seq analysis, we employed machine learning and trained a CD45RA+/- classifier on single-cell mRNA count data annotated with known CD45RA antibody levels provided by cellular indexing of transcriptomes and epitopes sequencing data. Among all the algorithms we tested, the trained support vector machine with a radial basis function kernel with optimized hyperparameters achieved a 99.96% accuracy on an unseen dataset. The multilayer perceptron classifier, the second most predictive method overall, also achieved a decent accuracy of 99.74%. Our simple yet robust machine learning approach provides a valid inference on the CD45RA level, assisting the cell identity annotation and further exploring the heterogeneity within human T cells. Based on the overall performance, we chose the support vector machine with a radial basis function kernel as the model implemented in our Python package scCD45RA.
Availability and implementation The resultant package scCD45RA can be found at https://github.com/BrubakerLab/ScCD45RA and can be installed from the Python Package Index (PyPI) using the command "pip install sccd45ra."
Affiliation(s)
- Ran Ran
  - Department of Pathology, Center for Global Health and Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States
- Douglas K Brubaker
  - Department of Pathology, Center for Global Health and Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States
  - The Blood, Heart, Lung, and Immunology Research Center, Case Western Reserve University, University Hospitals of Cleveland, Cleveland, OH 44106, United States
|
93
|
Wang B, Finazzo M, Artsimovitch I. Machine Learning Suggests That Small Size Helps Broaden Plasmid Host Range. Genes (Basel) 2023; 14:2044. [PMID: 38002987 PMCID: PMC10670969 DOI: 10.3390/genes14112044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 11/01/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023] Open
Abstract
Plasmids mediate gene exchange across taxonomic barriers through conjugation, shaping bacterial evolution for billions of years. While plasmid mobility can be harnessed for genetic engineering and drug-delivery applications, rapid plasmid-mediated spread of resistance genes has rendered most clinical antibiotics useless. To solve this urgent and growing problem, we must understand how plasmids spread across bacterial communities. Here, we applied machine-learning models to identify features that are important for extending the plasmid host range. We assembled an up-to-date dataset of more than thirty thousand bacterial plasmids, separated them into 1125 clusters, and assigned each cluster a distribution possibility score, taking into account the host distribution of each taxonomic rank and the sampling bias of the existing sequencing data. Using this score and an optimized plasmid feature pool, we built a model stack consisting of DecisionTreeRegressor, EvoTreeRegressor, and LGBMRegressor as base models and LinearRegressor as a meta-learner. Our mathematical modeling revealed that sequence brevity is the most important determinant for plasmid spread, followed by P-loop NTPases, mobility factors, and β-lactamases. Ours and other recent results suggest that small plasmids may broaden their range by evading host defenses and using alternative modes of transfer instead of autonomous conjugation.
Affiliation(s)
- Bing Wang
  - Department of Microbiology and Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA
- Irina Artsimovitch
  - Department of Microbiology and Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA
|
94
|
Li M, Wang H, Yang Z, Zhang L, Zhu Y. DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences. Comput Struct Biotechnol J 2023; 21:5544-5560. [PMID: 38034401 PMCID: PMC10681957 DOI: 10.1016/j.csbj.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/02/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] Open
Abstract
Thermally stable proteins find extensive applications in industrial production, pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (Tm). However, due to the limited availability of experimentally determined Tm data and the insufficient accuracy of existing computational methods in predicting Tm, there is an urgent need for a computational approach to accurately forecast the Tm values of thermophilic proteins. Here, we present a deep learning-based model, called DeepTM, which exclusively utilizes protein sequences as input and accurately predicts the Tm values of target thermophilic proteins on a dataset consisting of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM demonstrates excellent performance with a coefficient of determination (R2) of 0.75, Pearson correlation coefficient (P) of 0.87, and root mean square error (RMSE) of 6.24 ℃. We further analyzed the sequence features that determine the thermal stability of thermophilic proteins and found that dipeptide frequency, optimal growth temperature (OGT) of the host organisms, and the evolutionary information of the protein significantly affect its melting temperature. We compared the performance of DeepTM with recently reported methods, ProTstab2 and DeepSTABp, in predicting the Tm values on two blind test datasets. One dataset comprised 22 PET plastic-degrading enzymes, while the other included 29 thermally stable proteins of broader classification. In the PET plastic-degrading enzyme dataset, DeepTM achieved RMSE of 8.25 ℃. Compared to ProTstab2 (20.05 ℃) and DeepSTABp (20.97 ℃), DeepTM demonstrated a reduction in RMSE of 58.85% and 60.66%, respectively. In the dataset of thermally stable proteins, DeepTM (RMSE=7.66 ℃) demonstrated a 51.73% reduction in RMSE compared to ProTstab2 (RMSE=15.87 ℃). 
DeepTM, with the sole requirement of protein sequence information, accurately predicts the melting temperature and achieves a fully end-to-end prediction process, thus providing enhanced convenience and expediency for further protein engineering.
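The relative RMSE reductions quoted in this abstract follow from the reported RMSE values as (baseline - improved) / baseline. The short sketch below reproduces that arithmetic; the function name is an illustrative choice, and only the RMSE figures come from the abstract.

```python
def relative_reduction(baseline_rmse, improved_rmse):
    """Percentage reduction of improved_rmse relative to baseline_rmse."""
    return 100.0 * (baseline_rmse - improved_rmse) / baseline_rmse


# RMSE values (in degrees Celsius) as reported in the abstract.
pet_deeptm, pet_protstab2, pet_deepstabp = 8.25, 20.05, 20.97
stable_deeptm, stable_protstab2 = 7.66, 15.87

if __name__ == "__main__":
    print(round(relative_reduction(pet_protstab2, pet_deeptm), 2))        # 58.85
    print(round(relative_reduction(pet_deepstabp, pet_deeptm), 2))        # 60.66
    print(round(relative_reduction(stable_protstab2, stable_deeptm), 2))  # 51.73
```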
Affiliation(s)
- Mengyu Li
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Hongzhao Wang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Zhenwu Yang
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Longgui Zhang
  - SINOPEC Beijing Research Institute of Chemical Industry, Beijing 100013, China
- Yushan Zhu
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, Beijing 100029, China
|
95
|
Holzhauer B, Adewuyi ET. "Super-covariates": Using predicted control group outcome as a covariate in randomized clinical trials. Pharm Stat 2023; 22:1062-1075. [PMID: 37553959 DOI: 10.1002/pst.2329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 07/01/2023] [Accepted: 07/14/2023] [Indexed: 08/10/2023]
Abstract
The power of randomized controlled clinical trials to demonstrate the efficacy of a drug compared with a control group depends not just on how efficacious the drug is, but also on the variation in patients' outcomes. Adjusting for prognostic covariates during trial analysis can reduce this variation. For this reason, the primary statistical analysis of a clinical trial is often based on regression models that, besides terms for treatment and some further terms (e.g., stratification factors used in the randomization scheme of the trial), also include a baseline (pre-treatment) assessment of the primary outcome. We suggest including a "super-covariate", that is, a patient-specific prediction of the control group outcome, as a further covariate (but not as an offset). We train a prognostic model or ensembles of such models on the individual patient (or aggregate) data of other studies in similar patients, but not the new trial under analysis. This has the potential to use historical data to increase the power of clinical trials and avoids the concern of type I error inflation with Bayesian approaches, but in contrast to them has a greater benefit for larger sample sizes. It is important for prognostic models behind "super-covariates" to generalize well across different patient populations in order to similarly reduce unexplained variability whether the trial(s) to develop the model are identical to the new trial or not. In an example in neovascular age-related macular degeneration we saw efficiency gains from the use of a "super-covariate".
|
96
|
Dhandapani A, Iqbal J, Kumar RN. Application of machine learning (individual vs stacking) models on MERRA-2 data to predict surface PM2.5 concentrations over India. Chemosphere 2023; 340:139966. [PMID: 37634588 DOI: 10.1016/j.chemosphere.2023.139966] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/18/2023] [Revised: 07/31/2023] [Accepted: 08/24/2023] [Indexed: 08/29/2023]
Abstract
The spatial coverage of PM2.5 monitoring is non-uniform across India due to the limited number of ground monitoring stations. Alternatively, Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), is an atmospheric reanalysis data used for estimating PM2.5. MERRA-2 does not explicitly measure PM2.5 but rather follows an empirical model. MERRA-2 data were spatiotemporally collocated with ground observation for validation across India. Significant underestimation in MERRA-2 prediction of PM2.5 was observed over many monitoring stations ranging from -20 to 60 μg m-3. The utility of Machine Learning (ML) models to overcome this challenge was assessed. MERRA-2 aerosol and meteorological parameters were the input features used to train and test the individual ML models and compare them with the stacking technique. Initially, with 10% of randomly selected data, individual model performance was assessed to identify the best model. XGBoost (XGB) was the best model (r2 = 0.73) compared to Random Forest (RF) and LightGBM (LGBM). Stacking was then applied by keeping XGB as a meta-regressor. Stacked model results (r2 = 0.77) outperformed the best standalone estimate of XGB. Stacking technique was used to predict hourly and daily PM2.5 in different regions across India and each monitoring station. The eastern region exhibited the best hourly prediction (r2 = 0.80) and substantial reduction in Mean Bias (MB = -0.03 μg m-3), followed by the northern region (r2 = 0.63 and MB = -0.10 μg m-3), which showed better output due to the frequent observation of PM2.5 >100 μg m-3. Due to sparse data availability to train the ML models, the lowest performance was for the central region (r2 = 0.46 and MB = -0.60 μg m-3). Overall, India's PM2.5 prediction was good on an hourly basis compared to a daily basis using the ML stacking technique.
Affiliation(s)
- Abisheg Dhandapani
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
- Jawed Iqbal
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
- R Naresh Kumar
  - Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
|
97
|
Lewandowska E, Węsierski D, Mazur-Milecka M, Liss J, Jezierska A. Ensembling noisy segmentation masks of blurred sperm images. Comput Biol Med 2023; 166:107520. [PMID: 37804777 DOI: 10.1016/j.compbiomed.2023.107520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 08/11/2023] [Accepted: 09/19/2023] [Indexed: 10/09/2023]
Abstract
BACKGROUND Sperm tail morphology and motility have been demonstrated to be important factors in determining sperm quality for in vitro fertilization. However, many existing computer-aided sperm analysis systems leave the sperm tail out of the analysis, as detecting a few tail pixels is challenging. Moreover, some publicly available datasets for classifying morphological defects contain images limited only to the sperm head. This study focuses on the segmentation of full sperm, which consists of the head and tail parts, and appear alone and in groups. METHODS We re-purpose the Feature Pyramid Network to ensemble an input image with multiple masks from state-of-the-art segmentation algorithms using a scale-specific cross-attention module. We normalize homogeneous backgrounds for improved training. The low field depth of microscopes blurs the images, easily confusing human raters in discerning minuscule sperm from large backgrounds. We thus propose evaluation protocols for scoring segmentation models trained on imbalanced data and noisy ground truth. RESULTS The neural ensembling of noisy segmentation masks outperforms all single, state-of-the-art segmentation algorithms in full sperm segmentation. Human raters agree more on the head than tail masks. The algorithms also segment the head better than the tail. CONCLUSIONS The extensive evaluation of state-of-the-art segmentation algorithms shows that full sperm segmentation is challenging. We release the SegSperm dataset of images from Intracytoplasmic Sperm Injection procedures to spur further progress on full sperm segmentation with noisy and imbalanced ground truth. The dataset is publicly available at https://doi.org/10.34808/6wm7-1159.
Affiliation(s)
- Daniel Węsierski
  - Cameras and Algorithms Lab, Gdańsk University of Technology, Poland; Multimedia Systems Department, Faculty of Electronics, Telecommunication, and Informatics, Gdańsk University of Technology, Poland
- Magdalena Mazur-Milecka
  - Department of Biomedical Engineering, Faculty of Electronics, Telecommunications, and Informatics, Gdańsk University of Technology, Poland
- Joanna Liss
  - Invicta Research and Development Center, Sopot, Poland; Department of Medical Biology and Genetics, University of Gdańsk, Poland
- Anna Jezierska
  - Cameras and Algorithms Lab, Gdańsk University of Technology, Poland; Department of Biomedical Engineering, Faculty of Electronics, Telecommunications, and Informatics, Gdańsk University of Technology, Poland; Department of Modelling and Optimization of Dynamical Systems, Systems Research Institute Warsaw, Poland
|
98
|
Rajaraman S, Yang F, Zamzmi G, Xue Z, Antani S. Can Deep Adult Lung Segmentation Models Generalize to the Pediatric Population? Expert Systems with Applications 2023; 229:120531. [PMID: 37397242 PMCID: PMC10310063 DOI: 10.1016/j.eswa.2023.120531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to be significantly different across the developmental stages from infancy to adulthood. This might result in age-related data domain shifts that would adversely impact lung segmentation performance when the models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to (i) analyze the generalizability of deep adult lung segmentation models to the pediatric population and (ii) improve performance through a stage-wise, systematic approach consisting of CXR modality-specific weight initializations, stacked ensembles, and an ensemble of stacked ensembles. To evaluate segmentation performance and generalizability, novel evaluation metrics consisting of mean lung contour distance (MLCD) and average hash score (AHS) are proposed in addition to the multi-scale structural similarity index measure (MS-SSIM), the intersection over union (IoU), Dice score, 95% Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Our results showed a significant improvement (p < 0.05) in cross-domain generalization through our approach. This study could serve as a paradigm to analyze the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
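Two of the overlap metrics named above, IoU and the Dice score, can be sketched for binary masks as follows. This is a minimal illustration using the standard definitions, not the authors' evaluation code; the masks are toy data, the function name is hypothetical, and treating two empty masks as a perfect match is an assumption of this sketch.

```python
def iou_and_dice(pred, truth):
    """Compute IoU and Dice for two binary masks given as nested lists."""
    pred_px = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    truth_px = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v}

    intersection = len(pred_px & truth_px)
    union = len(pred_px | truth_px)
    total = len(pred_px) + len(truth_px)

    # Assumption: two empty masks count as a perfect match.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return iou, dice


# Toy 2x3 masks: 2 pixels overlap out of 4 pixels in the union.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]

if __name__ == "__main__":
    iou, dice = iou_and_dice(predicted, reference)
    print(f"IoU={iou:.3f} Dice={dice:.3f}")
```

For any pair of masks the two scores are related by Dice = 2 * IoU / (1 + IoU), so they rank segmentations identically on a single image but differ in scale.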
Affiliation(s)
- Sivaramakrishnan Rajaraman
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Feng Yang
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Ghada Zamzmi
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Zhiyun Xue
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sameer Antani
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
99
|
Boussioux L, Ma Y, Thomas NK, Bertsimas D, Shusharina N, Pursley J, Chen YL, DeLaney TF, Qian J, Bortfeld T. Automated Segmentation of Sacral Chordoma and Surrounding Muscles Using Deep Learning Ensemble. Int J Radiat Oncol Biol Phys 2023; 117:738-749. [PMID: 37451472 PMCID: PMC10665084 DOI: 10.1016/j.ijrobp.2023.03.078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Revised: 03/18/2023] [Accepted: 03/30/2023] [Indexed: 07/18/2023]
Abstract
PURPOSE: The manual segmentation of organ structures in radiation oncology treatment planning is a time-consuming and highly skilled task, particularly when treating rare tumors like sacral chordomas. This study evaluates the performance of automated deep learning (DL) models in accurately segmenting the gross tumor volume (GTV) and surrounding muscle structures of sacral chordomas. METHODS AND MATERIALS: An expert radiation oncologist contoured 5 muscle structures (gluteus maximus, gluteus medius, gluteus minimus, paraspinal, piriformis) and sacral chordoma GTV on computed tomography images from 48 patients. We trained 6 DL auto-segmentation models based on 3-dimensional U-Net and residual 3-dimensional U-Net architectures. We then implemented an average and an optimally weighted average ensemble to improve prediction performance. We evaluated algorithms with the average and standard deviation of the volumetric Dice similarity coefficient, surface Dice similarity coefficient with 2- and 3-mm thresholds, and average symmetric surface distance. One independent expert radiation oncologist assessed the clinical viability of the DL contours and determined the necessary amount of editing before they could be used in clinical practice. RESULTS: Quantitatively, the ensembles performed the best across all structures. The optimal ensemble (volumetric Dice similarity coefficient, average symmetric surface distance) was (85.5 ± 6.4, 2.6 ± 0.8; GTV), (94.4 ± 1.5, 1.0 ± 0.4; gluteus maximus), (92.6 ± 0.9, 0.9 ± 0.1; gluteus medius), (85.0 ± 2.7, 1.1 ± 0.3; gluteus minimus), (92.1 ± 1.5, 0.8 ± 0.2; paraspinal), and (78.3 ± 5.7, 1.5 ± 0.6; piriformis). The qualitative evaluation suggested that the best model could reduce the total muscle and tumor delineation time to a 19-minute average. CONCLUSIONS: Our methodology produces expert-level muscle and sacral chordoma tumor segmentation using DL and ensemble modeling. It can substantially streamline treatment planning and improve its accuracy, and it represents a critical step toward automated delineation of the clinical target volume in sarcoma and other disease sites.
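The averaging step behind an average or weighted average ensemble can be sketched as follows. This is an illustration only: the abstract does not say how the optimal weights were obtained, so the weights, the 0.5 threshold, and the function name `weighted_average_ensemble` are assumptions made for this example.

```python
def weighted_average_ensemble(prob_maps, weights, threshold=0.5):
    """Fuse per-voxel foreground probabilities from several models.

    prob_maps: one flat list of probabilities per model (equal lengths).
    weights:   one non-negative weight per model (hypothetical values here).
    Returns the fused probabilities and the thresholded binary mask.
    """
    if len(prob_maps) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_voxels = len(prob_maps[0])
    if any(len(m) != n_voxels for m in prob_maps):
        raise ValueError("all probability maps must have the same length")

    # Normalised weighted mean of the model outputs at each voxel.
    fused = [
        sum(w * m[i] for w, m in zip(weights, prob_maps)) / total
        for i in range(n_voxels)
    ]
    mask = [1 if p >= threshold else 0 for p in fused]
    return fused, mask


# Three hypothetical models scoring three voxels.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.8, 0.7, 0.1],
    [0.6, 0.3, 0.6],
]
fused, mask = weighted_average_ensemble(model_probs, weights=[2.0, 1.0, 1.0])
print(fused, mask)
```

Setting all weights equal reduces this to the plain average ensemble. In practice the weights would typically be tuned on a held-out validation set rather than fixed by hand.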
Affiliation(s)
- Leonard Boussioux
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts; Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts; University of Washington, Michael G. Foster School of Business, Department of Information Systems and Operations Management, Seattle, Washington
- Yu Ma
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts; Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Nancy Knight Thomas
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts; Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Dimitris Bertsimas
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts; Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Nadya Shusharina
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Jennifer Pursley
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Yen-Lin Chen
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Thomas F DeLaney
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Jack Qian
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Thomas Bortfeld
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
|
100
|
Srisongkram T, Syahid NF, Tookkane D, Weerapreeyakul N, Puthongking P. Stacked ensemble learning on HaCaT cytotoxicity for skin irritation prediction: A case study on dipterocarpol. Food Chem Toxicol 2023; 181:114115. [PMID: 37863382 DOI: 10.1016/j.fct.2023.114115] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/08/2023] [Revised: 09/26/2023] [Accepted: 10/17/2023] [Indexed: 10/22/2023]
Abstract
Skin irritation is an adverse effect associated with various substances, including chemicals, drugs, and natural products. Dipterocarpol, extracted from Dipterocarpus alatus, offers several skin benefits, notably anticancer, wound-healing, and antibacterial properties. However, the skin irritation potential of dipterocarpol remains unassessed. Quantitative structure-activity relationship (QSAR) modeling is a recommended tool for toxicity assessment that requires less time, money, and animal testing to obtain otherwise unavailable acute toxicity data. Therefore, our study aimed to develop a highly accurate machine learning-based QSAR model for predicting skin irritation. We utilized a stacked ensemble learning model with 1064 chemicals. We also adhered to the recommendations from the OECD for QSAR validation. Subsequently, we used the proposed model to explore the cytotoxicity of dipterocarpol on keratinocytes. Our findings indicate that the model displayed promising statistical quality in terms of accuracy, precision, and recall in both 10-fold cross-validation and test datasets. Moreover, the model predicted that dipterocarpol does not cause skin irritation, which was confirmed by the cell-based assay. In conclusion, our proposed model can be applied for the risk assessment of skin irritation in untested compounds that fall within its applicability domain. The web application of this model is available at https://qsarlabs.com/#stackhacat.
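The three statistics this abstract reports (accuracy, precision, and recall) can be computed from binary labels as in the sketch below. The labels are made-up toy data, with 1 standing for "irritant" by assumption, and `classification_metrics` is a hypothetical helper rather than code from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        "accuracy": (tp + tn) / len(y_true),
        # Report 0.0 when a denominator is empty (a convention for this sketch).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for eight hypothetical compounds.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Under 10-fold cross-validation, these quantities would be computed on each held-out fold and then averaged across folds.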
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand; Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen, 40002, Thailand
- Nur Fadhilah Syahid
- Graduate School in the Program of Pharmaceutical Chemistry and Natural Products, Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand
- Dheerapat Tookkane
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand
- Natthida Weerapreeyakul
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand; Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen, 40002, Thailand
- Ploenthip Puthongking
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand
|