1
|
Murata MM, Igari F, Urbanowicz R, Mouakkad L, Kim S, Chen Z, DiVizio D, Posadas EM, Giuliano AE, Tanaka H. A Practical Approach for Targeting Structural Variants Genome-wide in Plasma Cell-free DNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.10.25.564058. [PMID: 37961589 PMCID: PMC10634834 DOI: 10.1101/2023.10.25.564058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Plasma cell-free DNA (cfDNA) is a promising source of gene mutations for cancer detection by liquid biopsy. However, no current tests interrogate chromosomal structural variants (SVs) genome-wide. Here, we report a simple molecular and sequencing workflow called Genome-wide Analysis of Palindrome Formation (GAPF-seq) to probe DNA palindromes, a type of SV that often demarcates gene amplification. With low-throughput next-generation sequencing and automated machine learning, tumor DNA showed skewed chromosomal distributions of high-coverage 1-kb bins (HCBs), which differentiated 39 breast tumors from matched normal DNA with an average Area Under the Curve (AUC) of 0.9819. A proof-of-concept liquid biopsy study using cfDNA from prostate cancer patients and healthy individuals yielded an average AUC of 0.965. HCBs on the X chromosome emerged as a determinant feature and were associated with androgen receptor gene amplification. As a novel agnostic liquid biopsy approach, GAPF-seq could fill the technological gap offering unique cancer-specific SV profiles.
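The abstract describes tumor signal as a skewed chromosomal distribution of high-coverage 1-kb bins (HCBs). A minimal sketch of that idea follows, assuming reads are given as (chromosome, position) pairs and that a bin counts as "high coverage" when it holds at least a chosen number of reads. The function name `hcb_distribution` and the `coverage_threshold` parameter are hypothetical; the abstract does not state how the authors define high coverage or which features feed the machine learning step.

```python
from collections import Counter

BIN_SIZE = 1_000  # 1-kb bins, as stated in the abstract


def hcb_distribution(read_starts, coverage_threshold):
    """Return the fraction of high-coverage bins (HCBs) found on each chromosome.

    read_starts: iterable of (chromosome, position) pairs.
    coverage_threshold: hypothetical cutoff; a bin with at least this many
    reads is treated as an HCB (an assumption, not the authors' definition).
    """
    # Count reads falling into each 1-kb bin.
    bin_counts = Counter((chrom, pos // BIN_SIZE) for chrom, pos in read_starts)
    # Count how many bins per chromosome reach the threshold.
    hcbs = Counter(
        chrom for (chrom, _bin), n in bin_counts.items() if n >= coverage_threshold
    )
    total = sum(hcbs.values())
    if total == 0:
        return {}
    return {chrom: n / total for chrom, n in hcbs.items()}


# Toy input: two HCBs on chrX, one on chr1.
reads = [
    ("chrX", 100), ("chrX", 250), ("chrX", 900),
    ("chrX", 5_100), ("chrX", 5_200),
    ("chr1", 42), ("chr1", 1_500), ("chr1", 1_700),
]
dist = hcb_distribution(reads, coverage_threshold=2)
```

In this toy input the per-chromosome fractions lean toward chrX; a vector of such fractions is one plausible form for the features a classifier would consume.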
|
2
|
Tanaka H, Murata M, Igari F, Urbanowicz R, Mouakkad L, Kim S, Chen Z, Di Vizio D, Posadas E, Giuliano A. A Practical Approach for Targeting Structural Variants Genome-wide in Plasma Cell-free DNA. RESEARCH SQUARE 2024:rs.3.rs-3492157. [PMID: 38260372 PMCID: PMC10802711 DOI: 10.21203/rs.3.rs-3492157/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Interrogating plasma cell-free DNA (cfDNA) to detect cancer offers promise; however, no current tests scan structural variants (SVs) throughout the genome. Here, we report a simple molecular workflow to enrich a tumorigenic SV (DNA palindromes/fold-back inversions) that often demarcates genomic amplification and its feasibility for cancer detection by combining low-throughput next-generation sequencing with automated machine learning (Genome-wide Analysis of Palindrome Formation, GAPF-seq). Tumor DNA signal manifested as skewed chromosomal distributions of high-coverage 1-kb bins (HCBs), differentiating 39 matched breast tumor DNA from normal DNA with an average AUC of 0.9819. In a proof-of-concept liquid biopsy study, cfDNA from 0.5 mL plasma from prostate cancer patients was sufficient for binary classification against matched buffy coat DNA with an average AUC of 0.965. HCBs on the X chromosome emerged as a determinant feature and were associated with AR amplification. GAPF-seq could generate unique cancer-specific SV profiles in an agnostic liquid biopsy setting.
|
3
|
Meyer C, Romero NB, Evangelista T, Cadot B, Laporte J, Jeannin-Girardon A, Collet P, Ayadi A, Chennen K, Poch O. IMPatienT: An Integrated Web Application to Digitize, Process and Explore Multimodal PATIENt daTa. J Neuromuscul Dis 2024; 11:855-870. [PMID: 38701156 PMCID: PMC11307071 DOI: 10.3233/jnd-230085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2024] [Indexed: 05/05/2024]
Abstract
Medical acts, such as imaging, lead to the production of various medical text reports that describe the relevant findings. This induces multimodality in patient data by combining image data with free text; consequently, multimodal data have become central to driving research and improving diagnoses. However, the exploitation of patient data is problematic, as the ecosystem of analysis tools is fragmented according to the type of data (images, text, genetics), the task (processing, exploration) and the domain of interest (clinical phenotype, histology). To address these challenges, we developed IMPatienT (Integrated digital Multimodal PATIENt daTa), a simple, flexible and open-source web application to digitize, process and explore multimodal patient data. IMPatienT has a modular architecture that allows users to: (i) create a standard vocabulary for a domain, (ii) digitize and process free-text data, (iii) annotate images and perform image segmentation, (iv) generate a visualization dashboard and provide diagnosis decision support. To demonstrate the advantages of IMPatienT, we present a use case on a corpus of 40 simulated muscle biopsy reports of congenital myopathy patients. As IMPatienT provides users with the ability to design their own vocabulary, it can be adapted to any research domain and can be used as a patient registry for exploratory data analysis. A demo instance of the application is available at https://impatient.lbgi.fr/.
Affiliation(s)
- Corentin Meyer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
| | - Norma Beatriz Romero
- Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
| | - Teresinha Evangelista
- Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
| | - Brunot Cadot
- Sorbonne Université, INSERM, Center for Research in Myology, Myology Institute, GHU Pitié-Salpêtrière, Paris, France
| | - Jocelyn Laporte
- Department Translational Medicine, IGBMC, CNRS UMR 7104, Illkirch, France
| | - Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
| | - Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
| | - Ali Ayadi
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
| | - Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
| | - Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
|
4
|
Ravindra NG, Espinosa C, Berson E, Phongpreecha T, Zhao P, Becker M, Chang AL, Shome S, Marić I, De Francesco D, Mataraso S, Saarunya G, Thuraiappah M, Xue L, Gaudillière B, Angst MS, Shaw GM, Herzog ED, Stevenson DK, England SK, Aghaeepour N. Deep representation learning identifies associations between physical activity and sleep patterns during pregnancy and prematurity. NPJ Digit Med 2023; 6:171. [PMID: 37770643 PMCID: PMC10539360 DOI: 10.1038/s41746-023-00911-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 08/21/2023] [Indexed: 09/30/2023] Open
Abstract
Preterm birth (PTB) is the leading cause of infant mortality globally. Research has focused on developing predictive models for PTB without prioritizing cost-effective interventions. Physical activity and sleep present unique opportunities for interventions in low- and middle-income populations (LMICs). However, objective measurement of physical activity and sleep remains challenging and self-reported metrics suffer from low resolution and accuracy. In this study, we use physical activity data collected using a wearable device comprising over 181,944 h of data across N = 1083 patients. Using a new state-of-the-art deep learning time-series classification architecture, we develop a 'clock' of healthy dynamics during pregnancy by using gestational age (GA) as a surrogate for progression of pregnancy. We also develop novel interpretability algorithms that integrate unsupervised clustering, model error analysis, feature attribution, and automated actigraphy analysis, allowing for model interpretation with respect to sleep, activity, and clinical variables. Our model performs significantly better than 7 other machine learning and AI methods for modeling the progression of pregnancy. We found that deviations from a normal 'clock' of physical activity and sleep changes during pregnancy are strongly associated with pregnancy outcomes. When our model underestimates GA, there are 0.52 fewer preterm births than expected (P = 1.01e-67, permutation test) and when our model overestimates GA, there are 1.44 times (P = 2.82e-39, permutation test) more preterm births than expected. Model error is negatively correlated with interdaily stability (P = 0.043, Spearman's), indicating that our model assigns a more advanced GA when an individual's daily rhythms are less precise. Supporting this, our model attributes higher importance to sleep periods in predicting higher-than-actual GA, relative to lower-than-actual GA (P = 1.01e-21, Mann-Whitney U). Combining prediction and interpretability allows us to signal when activity behaviors alter the likelihood of preterm birth and advocates for the development of clinical decision support through passive monitoring and exercise habit and sleep recommendations, which can be easily implemented in LMICs.
Affiliation(s)
- Neal G Ravindra
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Eloïse Berson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford School of Medicine, Stanford, CA, USA
| | - Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford School of Medicine, Stanford, CA, USA
| | - Peinan Zhao
- Department of Biology, Washington University in St. Louis, St. Louis, MO, USA
- Department of Obstetrics and Gynecology, Washington University in St. Louis, St. Louis, MO, USA
| | - Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Alan L Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Sayane Shome
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Davide De Francesco
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Samson Mataraso
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Geetha Saarunya
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Melan Thuraiappah
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Lei Xue
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Brice Gaudillière
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Martin S Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Gary M Shaw
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
| | - Erik D Herzog
- Department of Biology, Washington University in St. Louis, St. Louis, MO, USA
| | - David K Stevenson
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA
| | - Sarah K England
- Department of Obstetrics and Gynecology, Washington University in St. Louis, St. Louis, MO, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA.
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
|
5
|
Weerasinghe MMA, Wang G, Whalley J, Crook-Rumsey M. Mental stress recognition on the fly using neuroplasticity spiking neural networks. Sci Rep 2023; 13:14962. [PMID: 37696860 PMCID: PMC10495416 DOI: 10.1038/s41598-023-34517-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Accepted: 05/03/2023] [Indexed: 09/13/2023] Open
Abstract
Mental stress is found to be strongly connected with human cognition and wellbeing. As the complexities of human life increase, the effects of mental stress have impacted human health and cognitive performance across the globe. This highlights the need for effective non-invasive stress detection methods. In this work, we introduce a novel, artificial spiking neural network model called Online Neuroplasticity Spiking Neural Network (O-NSNN) that utilizes a repertoire of learning concepts inspired by the brain to classify mental stress using Electroencephalogram (EEG) data. These models are personalized and tested on EEG data recorded during sessions in which participants listen to different types of audio comments designed to induce acute stress. Our O-NSNN models learn on the fly producing an average accuracy of 90.76% (σ = 2.09) when classifying EEG signals of brain states associated with these audio comments. The brain-inspired nature of the individual models makes them robust and efficient and has the potential to be integrated into wearable technology. Furthermore, this article presents an exploratory analysis of trained O-NSNNs to discover links between perceived and acute mental stress. The O-NSNN algorithm proved to be better for personalized stress recognition in terms of accuracy, efficiency, and model interpretability.
Affiliation(s)
- Mahima Milinda Alwis Weerasinghe
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand.
- Brain-Inspired AI and Neuroinformatics Lab, Department of Data Science, Sri Lanka Technological Campus, Padukka, Sri Lanka.
| | - Grace Wang
- School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba, Australia
- Centre for Health Research, University of Southern Queensland, Toowoomba, Australia
| | - Jacqueline Whalley
- Department of Computer Science and Software Engineering, Auckland University of Technology, Auckland, New Zealand
| | - Mark Crook-Rumsey
- Department of Basic and Clinical Neuroscience, King's College London, London, UK
- UK Dementia Research Institute, Centre for Care Research and Technology, Imperial College London, London, UK
|
6
|
Kennedy EE, Davoudi A, Hwang S, Freda PJ, Urbanowicz R, Bowles KH, Mowery DL. Identifying Barriers to Post-Acute Care Referral and Characterizing Negative Patient Preferences Among Hospitalized Older Adults Using Natural Language Processing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2023; 2022:606-615. [PMID: 37128417 PMCID: PMC10148308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Our objective was to detect common barriers to post-acute care (B2PAC) among hospitalized older adults using natural language processing (NLP) of clinical notes from patients discharged home when a clinical decision support system recommended post-acute care. We annotated B2PAC sentences from discharge planning notes and developed an NLP classifier to identify the highest-value B2PAC class (negative patient preferences). Thirteen machine learning models were compared with Amazon's AutoGluon deep learning model. The study included 594 acute care notes from 100 patient encounters (1156 sentences contained 11 B2PAC) in a large academic health system. The most frequent and modifiable B2PAC class was negative patient preferences (18.3%). The best supervised model was Extreme Gradient Boosting (F1: 0.859), but the deep learning model performed better (F1: 0.916). Alerting clinicians of negative patient preferences early in the hospitalization can prompt interventions such as patient education to ensure patients receive the right level of care and avoid negative outcomes.
Affiliation(s)
- Erin E Kennedy
- University of Pennsylvania School of Nursing, NewCourtland Center for Transitions and Health, Philadelphia, PA
| | - Anahita Davoudi
- University of Pennsylvania, Institute for Biomedical Informatics, Philadelphia, PA
| | - Sy Hwang
- University of Pennsylvania, Institute for Biomedical Informatics, Philadelphia, PA
| | - Philip J Freda
- University of Pennsylvania, Institute for Biomedical Informatics, Philadelphia, PA
- Cedars-Sinai Medical Center, Department of Computational Biomedicine, Los Angeles, California
| | - Ryan Urbanowicz
- University of Pennsylvania, Institute for Biomedical Informatics, Philadelphia, PA
- Cedars-Sinai Medical Center, Department of Computational Biomedicine, Los Angeles, California
| | - Kathryn H Bowles
- University of Pennsylvania School of Nursing, NewCourtland Center for Transitions and Health, Philadelphia, PA
| | - Danielle L Mowery
- University of Pennsylvania, Institute for Biomedical Informatics, Philadelphia, PA
|
7
|
Hwang S, Urbanowicz R, Lynch S, Vernon T, Bresz K, Giraldo C, Kennedy E, Leabhart M, Bleacher T, Ripchinski MR, Mowery DL, Oyer RA. Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors. JCO Clin Cancer Inform 2023; 7:e2200097. [PMID: 36809006 PMCID: PMC10476733 DOI: 10.1200/cci.22.00097] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/01/2022] [Revised: 09/05/2022] [Accepted: 01/13/2023] [Indexed: 02/23/2023] Open
Abstract
PURPOSE Predicting 30-day readmission risk is paramount to improving the quality of patient care. In this study, we compare sets of patient-, provider-, and community-level variables that are available at two different points of a patient's inpatient encounter (first 48 hours and the full encounter) to train readmission prediction models and identify possible targets for appropriate interventions that can potentially reduce avoidable readmissions. METHODS Using electronic health record data from a retrospective cohort of 2,460 oncology patients and a comprehensive machine learning analysis pipeline, we trained and tested models predicting 30-day readmission on the basis of data available within the first 48 hours of admission and from the entire hospital encounter. RESULTS Leveraging all features, the light gradient boosting model produced higher but comparable performance (area under receiver operating characteristic curve [AUROC]: 0.711) relative to the Epic model (AUROC: 0.697). Given features in the first 48 hours, the random forest model produced a higher AUROC (0.684) than the Epic model (AUROC: 0.676). Both models flagged patients with a similar distribution of race and sex; however, our light gradient boosting and random forest models were more inclusive, flagging more patients among younger age groups. The Epic models were more sensitive in identifying patients with a lower average zip income. Our 48-hour models were powered by novel features at various levels: patient (weight change over 365 days, depression symptoms, laboratory values, and cancer type), hospital (winter discharge and hospital admission type), and community (zip income and marital status of partner). CONCLUSION We developed and validated models comparable with the existing Epic 30-day readmission models with several novel actionable insights that could create service interventions deployed by the case management or discharge planning teams that may decrease readmission rates over time.
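The abstract compares models by AUROC. As a reminder of what that number measures, here is a minimal sketch that computes AUROC as the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. readmitted), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 0.75
```

The pairwise form is quadratic in the number of patients, so it suits illustration rather than large cohorts; rank-based implementations give the same value more efficiently.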
Affiliation(s)
- Sy Hwang
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
| | - Ryan Urbanowicz
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
- Department of Biostatistics, Epidemiology, & Informatics, University of Pennsylvania, Philadelphia, PA
| | - Selah Lynch
- Department of Biostatistics, Epidemiology, & Informatics, University of Pennsylvania, Philadelphia, PA
| | - Tawnya Vernon
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
| | - Kellie Bresz
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
| | - Carolina Giraldo
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
- Osteopathic Medicine, Philadelphia College of Osteopathic Medicine, Philadelphia, PA
| | - Erin Kennedy
- Department of Nursing, University of Pennsylvania, Philadelphia, PA
| | - Max Leabhart
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
| | - Troy Bleacher
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
| | - Michael R. Ripchinski
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
| | - Danielle L. Mowery
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
- Department of Biostatistics, Epidemiology, & Informatics, University of Pennsylvania, Philadelphia, PA
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA
| | - Randall A. Oyer
- Ann B. Barshinger Cancer Institute (ABBCI), University of Pennsylvania, Philadelphia, PA
|
8
|
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
| | - Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
|
9
|
Ramírez-del Real T, Martínez-García M, Márquez MF, López-Trejo L, Gutiérrez-Esparza G, Hernández-Lemus E. Individual Factors Associated With COVID-19 Infection: A Machine Learning Study. Front Public Health 2022; 10:912099. [PMID: 35844896 PMCID: PMC9279686 DOI: 10.3389/fpubh.2022.912099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 05/24/2022] [Indexed: 11/13/2022] Open
Abstract
The fast, exponential increase of COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that support health systems in the quick and efficient diagnosis and prognosis of this disease. In this context, the present study aims to identify the potential factors associated with COVID-19 infections by applying machine learning techniques, particularly random forest, chi-squared, XGBoost, and rpart for feature selection; ROSE and SMOTE were used as resampling methods because of class imbalance. Similarly, machine and deep learning algorithms such as support vector machines, C4.5, random forest, rpart, and deep neural networks were explored during the train/test phase to select the best prediction model. The dataset used in this study contains clinical data, anthropometric measurements, and other health parameters related to smoking habits, alcohol consumption, quality of sleep, physical activity, and health status during confinement due to the pandemic associated with COVID-19. The results showed that the XGBoost model identified the best features associated with COVID-19 infection, and random forest approximated the best predictive model with a balanced accuracy of 90.41% using SMOTE as a resampling technique. The model with the best performance provides a tool to help prevent contracting SARS-CoV-2, since the variables with the highest risk factor are detected and some of them are, to a certain extent, controllable.
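The abstract reports balanced accuracy rather than plain accuracy because the classes are imbalanced. A minimal sketch of the metric for a binary problem follows: it is the mean of sensitivity and specificity. The function name `balanced_accuracy` and the toy labels are illustrative assumptions and do not come from the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy imbalanced example: 2 positives, 8 negatives, one positive missed.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
plain_acc = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)  # 0.9
bal_acc = balanced_accuracy(toy_true, toy_pred)  # 0.75
```

In the toy data plain accuracy looks high because the majority class dominates, while balanced accuracy exposes the missed positive case.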
Affiliation(s)
- Tania Ramírez-del Real
- Cátedras Conacyt, National Council on Science and Technology, Mexico City, Mexico
- Center for Research in Geospatial Information Sciences, Mexico City, Mexico
| | - Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
| | - Manlio F. Márquez
- Clinical Research Division, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
| | - Laura López-Trejo
- Institute for Security and Social Services of State Workers, Mexico City, Mexico
| | - Guadalupe Gutiérrez-Esparza
- Cátedras Conacyt, National Council on Science and Technology, Mexico City, Mexico
- Clinical Research Division, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
10
|
Rane RP, de Man EF, Kim J, Görgen K, Tschorn M, Rapp MA, Banaschewski T, Bokde ALW, Desrivieres S, Flor H, Grigis A, Garavan H, Gowland PA, Brühl R, Martinot JL, Martinot MLP, Artiges E, Nees F, Papadopoulos Orfanos D, Lemaitre H, Paus T, Poustka L, Fröhner J, Robinson L, Smolka MN, Winterer J, Whelan R, Schumann G, Walter H, Heinz A, Ritter K. Structural differences in adolescent brains can predict alcohol misuse. eLife 2022; 11:e77545. [PMID: 35616520 PMCID: PMC9255959 DOI: 10.7554/elife.77545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 12/02/2022] Open
Abstract
Alcohol misuse during adolescence (AAM) has been associated with disruptive development of adolescent brains. In this longitudinal machine learning (ML) study, we could predict AAM significantly from brain structure (T1-weighted imaging and DTI) with accuracies of 73-78% in the IMAGEN dataset (n∼1182). Our results not only show that structural differences in the brain can predict AAM, but also suggest that such differences might precede AAM behavior in the data. We predicted 10 phenotypes of AAM at age 22 using brain MRI features at ages 14, 19, and 22. Binge drinking was found to be the most predictable phenotype. The most informative brain features were located in the ventricular CSF, and in white matter tracts of the corpus callosum, internal capsule, and brain stem. In the cortex, they were spread across the occipital, frontal, and temporal lobes and in the cingulate cortex. We also experimented with four different ML models and several confound control techniques. Support Vector Machine (SVM) with rbf kernel and Gradient Boosting consistently performed better than the linear models, linear SVM and Logistic Regression. Our study also demonstrates how the choice of the predicted phenotype, ML model, and confound correction technique are all crucial decisions in an explorative ML study analyzing psychiatric disorders with small effect sizes such as AAM.
Affiliation(s)
- Roshan Prakash Rane
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Evert Ferdinand de Man
- Faculty IV – Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
| | - JiHoon Kim
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
| | - Kai Görgen
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
- Science of Intelligence, Research Cluster of Excellence, Berlin, Germany
| | - Mira Tschorn
- Social and Preventive Medicine, Department of Sports and Health Sciences, Intra-faculty unit “Cognitive Sciences”, Faculty of Human Science, and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health, University of Potsdam, Potsdam, Germany
| | - Michael A Rapp
- Social and Preventive Medicine, Department of Sports and Health Sciences, Intra-faculty unit “Cognitive Sciences”, Faculty of Human Science, and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health, University of Potsdam, Potsdam, Germany
| | - Tobias Banaschewski
- Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Arun LW Bokde
- Discipline of Psychiatry, School of Medicine and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Sylvane Desrivieres
- Centre for Population Neuroscience and Precision Medicine (PONS), Institute of Psychiatry, Psychology & Neuroscience, SGDP Centre, King’s College London, London, United Kingdom
| | - Herta Flor
- Institute of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Department of Psychology, School of Social Sciences, University of Mannheim, Mannheim, Germany
| | | | - Hugh Garavan
- Departments of Psychiatry and Psychology, University of Vermont, Burlington, United States
| | - Penny A Gowland
- Sir Peter Mansfield Imaging Centre School of Physics and Astronomy, University of Nottingham, Nottingham, United Kingdom
| | | | - Jean-Luc Martinot
- Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
| | - Marie-Laure Paillere Martinot
- Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
- AP-HP Sorbonne Université, Department of Child and Adolescent Psychiatry, Pitié-Salpêtrière Hospital, Paris, France
| | - Eric Artiges
- Institut National de la Santé et de la Recherche Médicale, INSERM U A10 “Trajectoires développementales en psychiatrie”, Université Paris-Saclay, Ecole Normale Supérieure Paris-Saclay, CNRS, Centre Borelli, Gif-sur-Yvette, France
- Psychiatry Department, EPS Barthélémy Durand, Etampes, France
| | - Frauke Nees
- Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Institute of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- PONS Research Group, Dept of Psychiatry and Psychotherapy, Campus Charite Mitte, Humboldt University, Berlin, Germany
| | | | - Herve Lemaitre
- NeuroSpin, CEA, Université Paris-Saclay, Paris, France
- Institut des Maladies Neurodégénératives, UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
| | - Tomas Paus
- Department of Psychiatry, Faculty of Medicine and Centre Hospitalier Universitaire Sainte-Justine, University of Montreal, Montreal, Canada
- Departments of Psychiatry and Psychology, University of Toronto, Toronto, Canada
| | - Luise Poustka
- Department of Child and Adolescent Psychiatry and Psychotherapy, University Medical Centre Göttingen, Göttingen, Germany
| | - Juliane Fröhner
- Department of Psychiatry and Neuroimaging Center, Technische Universität Dresden, Dresden, Germany
| | - Lauren Robinson
- Department of Psychological Medicine, Section for Eating Disorders, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
| | - Michael N Smolka
- Department of Psychiatry and Neuroimaging Center, Technische Universität Dresden, Dresden, Germany
| | - Jeanne Winterer
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
| | - Robert Whelan
- School of Psychology and Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland
| | - Gunter Schumann
- PONS Research Group, Dept of Psychiatry and Psychotherapy, Campus Charite Mitte, Humboldt University, Berlin, Germany
| | - Henrik Walter
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Andreas Heinz
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Kerstin Ritter
- Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Department of Psychiatry and Psychotherapy, Bernstein Center for Computational Neuroscience, Berlin, Germany
|
11
|
Lukic YX, Teepe GW, Fleisch E, Kowatsch T. Breathing as Input Modality in a Gameful Breathing Training App: Development and Evaluation of Breeze 2 (Preprint). JMIR Serious Games 2022; 10:e39186. [PMID: 35972793 PMCID: PMC9428773 DOI: 10.2196/39186] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/06/2022] [Revised: 06/28/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022]
Abstract
Background Slow-paced breathing training can have positive effects on physiological and psychological well-being. Unfortunately, use statistics indicate that adherence to breathing training apps is low. Recent work suggests that gameful breathing training may help overcome this challenge. Objective This study aimed to introduce and evaluate the gameful breathing training app Breeze 2 and its novel real-time breathing detection algorithm that enables the interactive components of the app. Methods We developed the breathing detection algorithm by using deep transfer learning to detect inhalation, exhalation, and nonbreathing sounds (including silence). An additional heuristic prolongs detected exhalations to stabilize the algorithm’s predictions. We evaluated Breeze 2 with 30 participants (women: n=14, 47%; age: mean 29.77, SD 7.33 years). Participants performed breathing training with Breeze 2 in 2 sessions with and without headphones. They answered questions regarding user engagement (User Engagement Scale Short Form [UES-SF]), perceived effectiveness (PE), perceived relaxation effectiveness, and perceived breathing detection accuracy. We used Wilcoxon signed-rank tests to compare the UES-SF, PE, and perceived relaxation effectiveness scores with neutral scores. Furthermore, we correlated perceived breathing detection accuracy with actual multi-class balanced accuracy to determine whether participants could perceive the actual breathing detection performance. We also conducted a repeated-measure ANOVA to investigate breathing detection differences in balanced accuracy with and without the heuristic and when classifying data captured from headphones and smartphone microphones. The analysis controlled for potential between-subject effects of the participants’ sex. Results Our results show scores that were significantly higher than neutral scores for the UES-SF (W=459; P<.001), PE (W=465; P<.001), and perceived relaxation effectiveness (W=358; P<.001). 
Perceived breathing detection accuracy correlated significantly with the actual multi-class balanced accuracy (r=0.51; P<.001). Furthermore, we found that the heuristic significantly improved the breathing detection balanced accuracy (F(1,25)=6.23; P=.02) and that detection performed better on data captured from smartphone microphones than on data from headphones (F(1,25)=17.61; P<.001). We did not observe any significant between-subject effects of sex. Breathing detection without the heuristic reached a multi-class balanced accuracy of 74% on the collected audio recordings. Conclusions Most participants (28/30, 93%) perceived Breeze 2 as engaging and effective. Furthermore, breathing detection worked well for most participants, as indicated by the perceived detection accuracy and actual detection accuracy. In future work, we aim to use the collected breathing sounds to improve breathing detection with regard to its stability and performance. We also plan to use Breeze 2 as an intervention tool in various studies targeting the prevention and management of noncommunicable diseases.
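Note on the metric: the abstract reports multi-class balanced accuracy, which is commonly defined as the unweighted mean of per-class recall. The sketch below illustrates that definition only; the class names and frame labels are hypothetical and are not data from the study, and the function name `balanced_accuracy` is an assumption made for illustration.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical frame-level labels (illustration only, not study data).
y_true = ["inhale"] * 4 + ["exhale"] * 4 + ["none"] * 12
y_pred = (["inhale"] * 3 + ["none"]        # 3 of 4 inhalations detected
          + ["exhale"] * 2 + ["none"] * 2  # 2 of 4 exhalations detected
          + ["none"] * 12)                 # all non-breathing frames detected

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

Because the non-breathing class dominates this toy sample, plain accuracy is higher than balanced accuracy, which is one reason the balanced variant is often preferred for imbalanced sound classes.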
Affiliation(s)
- Yanick Xavier Lukic
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
| | - Gisbert Wilhelm Teepe
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
| | - Elgar Fleisch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Centre for Digital Health Interventions, Institute of Technology Management, University of St.Gallen, St.Gallen, Switzerland
| | - Tobias Kowatsch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
- School of Medicine, University of St.Gallen, St.Gallen, Switzerland
|
12
|
Nazmi S, Yan X, Homaifar A, Anwar M. Multi-label classification with local pairwise and high-order label correlations using graph partitioning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
13
|
Yang Z, Benhabiles H, Hammoudi K, Windal F, He R, Collard D. A generalized deep learning-based framework for assistance to the human malaria diagnosis from microscopic images. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06604-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
|
14
|
Vidula MK, Orlenko A, Zhao L, Salvador L, Small AM, Horton E, Cohen JB, Adusumalli S, Denduluri S, Kobayashi T, Hyman M, Fiorilli P, Magro C, Singh B, Pourmussa B, Greczylo C, Basso M, Ebert C, Yarde M, Li Z, Cvijic ME, Wang Z, Walsh A, Maranville J, Kick E, Luettgen J, Adam L, Schafer P, Ramirez-Valle F, Seiffert D, Moore JH, Gordon D, Chirinos JA. Plasma biomarkers associated with adverse outcomes in patients with calcific aortic stenosis. Eur J Heart Fail 2021; 23:2021-2032. [PMID: 34632675 DOI: 10.1002/ejhf.2361] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/19/2021] [Revised: 09/29/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]
Abstract
AIMS Enhanced risk stratification of patients with aortic stenosis (AS) is necessary to identify patients at high risk for adverse outcomes, and may allow for better management of patient subgroups at high risk of myocardial damage. The objective of this study was to identify plasma biomarkers and multimarker profiles associated with adverse outcomes in AS. METHODS AND RESULTS We studied 708 patients with calcific AS and measured 49 biomarkers using a Luminex platform. We studied the correlation between biomarkers and the risk of (i) death and (ii) death or heart failure-related hospital admission (DHFA). We also utilized machine-learning methods (a tree-based pipeline optimizer platform) to develop multimarker models associated with the risk of death and DHFA. In this cohort with a median follow-up of 2.8 years, multiple biomarkers were significantly predictive of death in analyses adjusted for clinical confounders, including tumour necrosis factor (TNF)-α [hazard ratio (HR) 1.28, P < 0.0001], TNF receptor 1 (TNFRSF1A; HR 1.38, P < 0.0001), fibroblast growth factor (FGF)-23 (HR 1.22, P < 0.0001), N-terminal pro B-type natriuretic peptide (NT-proBNP) (HR 1.58, P < 0.0001), matrix metalloproteinase-7 (HR 1.24, P = 0.0002), syndecan-1 (HR 1.27, P = 0.0002), suppression of tumorigenicity-2 (ST2) (IL1RL1; HR 1.22, P = 0.0002), interleukin (IL)-8 (CXCL8; HR 1.22, P = 0.0005), pentraxin (PTX)-3 (HR 1.17, P = 0.001), neutrophil gelatinase-associated lipocalin (LCN2; HR 1.18, P < 0.0001), osteoprotegerin (OPG) (TNFRSF11B; HR 1.26, P = 0.0002), and endostatin (COL18A1; HR 1.28, P = 0.0012). 
Several biomarkers were also significantly predictive of DHFA in adjusted analyses including FGF-23 (HR 1.36, P < 0.0001), TNF-α (HR 1.26, P < 0.0001), TNFR1 (HR 1.34, P < 0.0001), angiopoietin-2 (HR 1.26, P < 0.0001), syndecan-1 (HR 1.23, P = 0.0006), ST2 (HR 1.27, P < 0.0001), IL-8 (HR 1.18, P = 0.0009), PTX-3 (HR 1.18, P = 0.0002), OPG (HR 1.20, P = 0.0013), and NT-proBNP (HR 1.63, P < 0.0001). Machine-learning multimarker models were strongly associated with adverse outcomes (mean 1-year probability of death of 0%, 2%, and 60%; mean 1-year probability of DHFA of 0%, 4%, 97%; P < 0.0001). In these models, IL-6 (a biomarker of inflammation) and FGF-23 (a biomarker of calcification) emerged as the biomarkers of highest importance. CONCLUSIONS Plasma biomarkers are strongly associated with the risk of adverse outcomes in patients with AS. Biomarkers of inflammation and calcification were most strongly related to prognosis.
Affiliation(s)
- Mahesh K Vidula
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Alena Orlenko
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Lei Zhao
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Lisa Salvador
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Aeron M Small
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Edward Horton
- Department of Internal Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Jordana B Cohen
- Renal-Electrolyte and Hypertension Division, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.,Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Srinath Adusumalli
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Srinivas Denduluri
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Taisei Kobayashi
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Matthew Hyman
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Paul Fiorilli
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Caroline Magro
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Bibi Singh
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Bianca Pourmussa
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Candy Greczylo
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Michael Basso
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | | | - Melissa Yarde
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Zhuyin Li
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | | | - Zhaoqing Wang
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Alice Walsh
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | | | - Ellen Kick
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | | | - Leonard Adam
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Peter Schafer
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | | | | | - Jason H Moore
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - David Gordon
- Bristol Myers Squibb Company, Lawrenceville, NJ, USA
| | - Julio A Chirinos
- Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.,University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
|
15
|
Grzesiak E, Bent B, McClain MT, Woods CW, Tsalik EL, Nicholson BP, Veldman T, Burke TW, Gardener Z, Bergstrom E, Turner RB, Chiu C, Doraiswamy PM, Hero A, Henao R, Ginsburg GS, Dunn J. Assessment of the Feasibility of Using Noninvasive Wearable Biometric Monitoring Sensors to Detect Influenza and the Common Cold Before Symptom Onset. JAMA Netw Open 2021; 4:e2128534. [PMID: 34586364 PMCID: PMC8482058 DOI: 10.1001/jamanetworkopen.2021.28534] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/23/2022]
Abstract
IMPORTANCE Currently, there are no presymptomatic screening methods to identify individuals infected with a respiratory virus to prevent disease spread and to predict their trajectory for resource allocation. OBJECTIVE To evaluate the feasibility of using noninvasive, wrist-worn wearable biometric monitoring sensors to detect presymptomatic viral infection after exposure and predict infection severity in patients exposed to H1N1 influenza or human rhinovirus. DESIGN, SETTING, AND PARTICIPANTS The cohort H1N1 viral challenge study was conducted during 2018; data were collected from September 11, 2017, to May 4, 2018. The cohort rhinovirus challenge study was conducted during 2015; data were collected from September 14 to 21, 2015. A total of 39 adult participants were recruited for the H1N1 challenge study, and 24 adult participants were recruited for the rhinovirus challenge study. Exclusion criteria for both challenges included chronic respiratory illness and high levels of serum antibodies. Participants in the H1N1 challenge study were isolated in a clinic for a minimum of 8 days after inoculation. The rhinovirus challenge took place on a college campus, and participants were not isolated. EXPOSURES Participants in the H1N1 challenge study were inoculated via intranasal drops of diluted influenza A/California/03/09 (H1N1) virus with a mean count of 10^6 using the median tissue culture infectious dose (TCID50) assay. Participants in the rhinovirus challenge study were inoculated via intranasal drops of diluted human rhinovirus strain type 16 with a count of 100 using the TCID50 assay. MAIN OUTCOMES AND MEASURES The primary outcome measures included cross-validated performance metrics of random forest models to screen for presymptomatic infection and predict infection severity, including accuracy, precision, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve (AUC).
RESULTS A total of 31 participants with H1N1 (24 men [77.4%]; mean [SD] age, 34.7 [12.3] years) and 18 participants with rhinovirus (11 men [61.1%]; mean [SD] age, 21.7 [3.1] years) were included in the analysis after data preprocessing. Separate H1N1 and rhinovirus detection models, using only data on wearable devices as input, were able to distinguish between infection and noninfection with accuracies of up to 92% for H1N1 (90% precision, 90% sensitivity, 93% specificity, and 90% F1 score, 0.85 [95% CI, 0.70-1.00] AUC) and 88% for rhinovirus (100% precision, 78% sensitivity, 100% specificity, 88% F1 score, and 0.96 [95% CI, 0.85-1.00] AUC). The infection severity prediction model was able to distinguish between mild and moderate infection 24 hours prior to symptom onset with an accuracy of 90% for H1N1 (88% precision, 88% sensitivity, 92% specificity, 88% F1 score, and 0.88 [95% CI, 0.72-1.00] AUC) and 89% for rhinovirus (100% precision, 75% sensitivity, 100% specificity, 86% F1 score, and 0.95 [95% CI, 0.79-1.00] AUC). CONCLUSIONS AND RELEVANCE This cohort study suggests that the use of a noninvasive, wrist-worn wearable device to predict an individual's response to viral exposure prior to symptoms is feasible. Harnessing this technology would support early interventions to limit presymptomatic spread of viral respiratory infections, which is timely in the era of COVID-19.
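Note on the metrics: the accuracy, precision, sensitivity, specificity, and F1 score listed above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts passed to it are hypothetical and do not reproduce the study's confusion matrices, which the abstract does not report.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    precision = tp / (tp + fp)            # predicted positives that are correct
    sensitivity = tp / (tp + fn)          # actual positives that are found (recall)
    specificity = tn / (tn + fp)          # actual negatives that are found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts chosen for illustration only.
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With small samples such as those in challenge studies, a single misclassified participant shifts these ratios by several percentage points, which is consistent with the wide confidence intervals reported for the AUC.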
Affiliation(s)
- Emilia Grzesiak
- Biomedical Engineering Department, Duke University, Durham, North Carolina
| | - Brinnae Bent
- Biomedical Engineering Department, Duke University, Durham, North Carolina
| | - Micah T. McClain
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
| | - Christopher W. Woods
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
- Durham Veterans Affairs Medical Center, Durham, North Carolina
- Department of Medicine, Duke Global Health Institute, Durham, North Carolina
| | - Ephraim L. Tsalik
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
- Durham Veterans Affairs Medical Center, Durham, North Carolina
| | | | - Timothy Veldman
- Department of Medicine, Duke Global Health Institute, Durham, North Carolina
| | - Thomas W. Burke
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
| | - Zoe Gardener
- Department of Infectious Disease, Imperial College London, London, United Kingdom
| | - Emma Bergstrom
- Department of Infectious Disease, Imperial College London, London, United Kingdom
| | - Ronald B. Turner
- Department of Pediatrics, University of Virginia School of Medicine, Charlottesville
| | - Christopher Chiu
- Department of Infectious Disease, Imperial College London, London, United Kingdom
| | - P. Murali Doraiswamy
- Department of Psychiatry, Duke University School of Medicine, Durham, North Carolina
- Department of Medicine, Duke University School of Medicine, Durham, North Carolina
| | - Alfred Hero
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor
| | - Ricardo Henao
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
| | - Geoffrey S. Ginsburg
- Duke Center for Applied Genomics and Precision Medicine, Duke University Medical Center, Durham, North Carolina
| | - Jessilyn Dunn
- Biomedical Engineering Department, Duke University, Durham, North Carolina
- Department of Medicine, Duke University School of Medicine, Durham, North Carolina
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina
|
16
|
Ancien F, Pucci F, Rooman M. In Silico Analysis of the Molecular-Level Impact of SMPD1 Variants on Niemann-Pick Disease Severity. Int J Mol Sci 2021; 22:4516. [PMID: 33925997 PMCID: PMC8123603 DOI: 10.3390/ijms22094516] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2021] [Revised: 04/10/2021] [Accepted: 04/20/2021] [Indexed: 12/12/2022]
Abstract
Sphingomyelin phosphodiesterase (SMPD1) is a key enzyme in sphingolipid metabolism. Genetic SMPD1 variants have been related to the Niemann-Pick lysosomal storage disorder, which has different degrees of phenotypic severity ranging from severe symptomatology involving the central nervous system (type A) to milder ones (type B). They have also been linked to neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. In this paper, we leveraged structural, evolutionary and stability information on SMPD1 to predict and analyze the impact of variants at the molecular level. We developed the SMPD1-ZooM algorithm, which is able to predict with good accuracy whether variants cause Niemann-Pick disease and its phenotypic severity; the predictor is freely available for download. We performed a large-scale analysis of all possible SMPD1 variants, which led us to identify protein regions that are either robust or fragile with respect to amino acid variations, and show the importance of aromatic-involving interactions in SMPD1 function and stability. Our study also revealed a good correlation between SMPD1-ZooM scores and in vitro loss of SMPD1 activity. The understanding of the molecular effects of SMPD1 variants is of crucial importance to improve genetic screening of SMPD1-related disorders and to develop personalized treatments that restore SMPD1 functionality.
Affiliation(s)
- François Ancien
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
| | - Fabrizio Pucci
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
| | - Marianne Rooman
- 3BIO—Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50, 1050 Brussels, Belgium
- (IB)—Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
|
17
|
Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11031098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Initially, Anticipatory Classifier Systems (ACS) were designed to address both single and multistep decision problems. In the latter case, the objective was to maximize the total discounted rewards, usually based on Q-learning algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a relevant modification toward the learning component, allowing us to address such problems. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.
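Note on the two objectives: the abstract contrasts maximizing total discounted reward (as in Q-learning) with maximizing the average of successive rewards. The abstract does not state the AACS2 update rule, so the sketch below is only a generic tabular illustration: a standard Q-learning step next to one common textbook form of an average-reward step in the style of R-learning. The function names, the learning rates `beta` and `zeta`, and the toy states are all assumptions.

```python
from collections import defaultdict

def q_learning_update(q, s, a, r, s_next, actions, beta=0.1, gamma=0.95):
    """Discounted criterion: future value is shrunk by gamma."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += beta * (r + gamma * best_next - q[(s, a)])

def r_learning_update(q, rho, s, a, r, s_next, actions, beta=0.1, zeta=0.01):
    """Average-reward criterion: no discount; reward is measured relative to
    rho, a running estimate of the average reward per step."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += beta * (r - rho + best_next - q[(s, a)])
    best_here = max(q[(s, b)] for b in actions)
    if q[(s, a)] == best_here:  # update rho only when the action taken is greedy
        rho += zeta * (r + best_next - best_here - rho)
    return rho

actions = [0, 1]

# One discounted step on an empty table (hypothetical states "s0", "s1").
q_disc = defaultdict(float)
q_learning_update(q_disc, "s0", 0, 1.0, "s1", actions)

# One average-reward step on an empty table, starting from rho = 0.
q_avg = defaultdict(float)
rho = r_learning_update(q_avg, 0.0, "s0", 0, 1.0, "s1", actions)
print(q_disc[("s0", 0)], q_avg[("s0", 0)], rho)
```

The design difference is that the average-reward step replaces the discount factor with the subtraction of `rho`, so values express how much better than average an action is rather than a discounted sum.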
|
18
|
Jojoa Acosta MF, Caballero Tovar LY, Garcia-Zapirain MB, Percybrooks WS. Melanoma diagnosis using deep learning techniques on dermatoscopic images. BMC Med Imaging 2021; 21:6. [PMID: 33407213 PMCID: PMC7789790 DOI: 10.1186/s12880-020-00534-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 07/05/2020] [Accepted: 12/08/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND Melanoma has become more widespread over the past 30 years and early detection is a major factor in reducing mortality rates associated with this type of skin cancer. Therefore, having access to an automatic, reliable system that is able to detect the presence of melanoma via a dermatoscopic image of lesions and/or skin pigmentation can be a very useful tool in the area of medical diagnosis. METHODS Among state-of-the-art methods used for automated or computer assisted medical diagnosis, attention should be drawn to Deep Learning based on Convolutional Neural Networks, with which segmentation, classification and detection systems for several diseases have been implemented. The method proposed in this paper involves an initial stage that automatically crops the region of interest within a dermatoscopic image using the Mask and Region-based Convolutional Neural Network technique, and a second stage based on a ResNet152 structure, which classifies lesions as either "benign" or "malignant". RESULTS Training, validation and testing of the proposed model were carried out using the database associated with the challenge set out at the 2017 International Symposium on Biomedical Imaging. On the test data set, the proposed model achieves an increase in accuracy and balanced accuracy of 3.66% and 9.96%, respectively, with respect to the best accuracy and the best sensitivity/specificity ratio reported to date for melanoma detection in this challenge. Additionally, unlike previous models, the specificity and sensitivity achieve a high score (greater than 0.8) simultaneously, which indicates that the model is good for accurate discrimination between benign and malignant lesions and is not biased towards either class. CONCLUSIONS The results achieved with the proposed model suggest a significant improvement over the results obtained in the state of the art as far as performance of skin lesion classifiers (malignant/benign) is concerned.
Affiliation(s)
| | | | | | - Winston Spencer Percybrooks
- Department of Electrical and Electronics Engineering, Universidad del Norte, Km.5 Vía Puerto Colombia, Barranquilla, Colombia
|
19
|
|
20
|
Nguyen HH, Saarakkala S, Blaschko MB, Tiulpin A. Semixup: In- and Out-of-Manifold Regularization for Deep Semi-Supervised Knee Osteoarthritis Severity Grading From Plain Radiographs. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:4346-4356. [PMID: 32804644 DOI: 10.1109/tmi.2020.3017007] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 05/22/2023]
Abstract
Knee osteoarthritis (OA) is one of the highest disability factors in the world. This musculoskeletal disorder is assessed from clinical symptoms, and typically confirmed via radiographic assessment. This visual assessment done by a radiologist requires experience, and suffers from moderate to high inter-observer variability. The recent literature has shown that deep learning methods can reliably perform the OA severity assessment according to the gold standard Kellgren-Lawrence (KL) grading system. However, these methods require large amounts of labeled data, which are costly to obtain. In this study, we propose the Semixup algorithm, a semi-supervised learning (SSL) approach to leverage unlabeled data. Semixup relies on consistency regularization using in- and out-of-manifold samples, together with interpolated consistency. On an independent test set, our method significantly outperformed other state-of-the-art SSL methods in most cases. Finally, when compared to a well-tuned fully supervised baseline that yielded a balanced accuracy (BA) of 70.9 ± 0.8% on the test set, Semixup had comparable performance, with a BA of 71 ± 0.8% (p=0.368), while requiring 6 times less labeled data. These results show that our proposed SSL method allows building fully automatic OA severity assessment tools with datasets that are available outside research settings.
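Note on interpolated consistency: the abstract names this term without defining it. In its generic form, a model's prediction on a mixup-style blend of two inputs is encouraged to match the same blend of its predictions on each input, which needs no labels. The sketch below illustrates only that generic idea with a toy softmax model; the model, weights, and function names are hypothetical, and this is not the Semixup loss, which also involves in- and out-of-manifold samples.

```python
import math

def mix(u, v, lam):
    """Convex combination lam * u + (1 - lam) * v of two equal-length vectors."""
    return [lam * a + (1 - lam) * b for a, b in zip(u, v)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_toy_model(weights):
    """Hypothetical classifier: one linear layer followed by softmax."""
    def model(x):
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
    return model

def interpolation_consistency(model, x1, x2, lam):
    """Mean squared gap between f(mix(x1, x2)) and mix(f(x1), f(x2))."""
    pred_on_mix = model(mix(x1, x2, lam))
    mix_of_preds = mix(model(x1), model(x2), lam)
    return sum((a - b) ** 2 for a, b in zip(pred_on_mix, mix_of_preds)) / len(pred_on_mix)

# Two unlabeled toy inputs and arbitrary weights (illustration only).
model = make_toy_model([[1.0, -2.0], [0.5, 0.3], [-1.0, 1.5]])
x1, x2 = [1.0, 0.0], [0.0, 1.0]
loss_half = interpolation_consistency(model, x1, x2, lam=0.5)
loss_one = interpolation_consistency(model, x1, x2, lam=1.0)
print(loss_half, loss_one)
```

At `lam=1.0` the blended input is `x1` itself, so the gap vanishes; for intermediate values the gap measures how far the model departs from linear behavior between the two samples.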

21
Tibrewala R, Ozhinsky E, Shah R, Flament I, Crossley K, Srinivasan R, Souza R, Link TM, Pedoia V, Majumdar S. Computer-Aided Detection AI Reduces Interreader Variability in Grading Hip Abnormalities With MRI. J Magn Reson Imaging 2020; 52:1163-1172. [PMID: 32293775 PMCID: PMC10230649 DOI: 10.1002/jmri.27164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/17/2019] [Revised: 03/19/2020] [Accepted: 03/20/2020] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND: Accurate interpretation of hip MRI is time-intensive and difficult, prone to inter- and intrareviewer variability, and lacks a universally accepted grading scale to evaluate morphological abnormalities.
PURPOSE: To 1) develop and evaluate a deep-learning-based model for binary classification of hip osteoarthritis (OA) morphological abnormalities on MR images, and 2) develop an artificial intelligence (AI)-based assist tool to find if using the model predictions improves interreader agreement in hip grading.
STUDY TYPE: Retrospective study aimed to evaluate a technical development.
POPULATION: A total of 764 MRI volumes (364 patients) obtained from two studies (242 patients from LASEM [FORCe] and 122 patients from UCSF), split into a 65-25-10% train, validation, test set for network training.
FIELD STRENGTH/SEQUENCE: 3T MRI, 2D T2 FSE, PD SPAIR.
ASSESSMENT: Automatic binary classification of cartilage lesions, bone marrow edema-like lesions, and subchondral cyst-like lesions using the MRNet, interreader agreement before and after using network predictions.
STATISTICAL TESTS: Receiver operating characteristic (ROC) curve, area under curve (AUC), specificity and sensitivity, and balanced accuracy.
RESULTS: For cartilage lesions, bone marrow edema-like lesions and subchondral cyst-like lesions the AUCs were: 0.80 (95% confidence interval [CI] 0.65, 0.95), 0.84 (95% CI 0.67, 1.00), and 0.77 (95% CI 0.66, 0.85), respectively. The sensitivity and specificity of the radiologist for binary classification were: 0.79 (95% CI 0.65, 0.93) and 0.80 (95% CI 0.59, 1.02), 0.40 (95% CI -0.02, 0.83) and 0.72 (95% CI 0.59, 0.86), 0.75 (95% CI 0.45, 1.05) and 0.88 (95% CI 0.77, 0.98). The interreader balanced accuracy increased from 53%, 71% and 56% to 60%, 73% and 68% after using the network predictions and saliency maps.
DATA CONCLUSION: We have shown that a deep-learning approach achieved high performance in clinical classification tasks on hip MR images, and that using the predictions from the deep-learning model improved the interreader agreement in all pathologies.
LEVEL OF EVIDENCE: 3
TECHNICAL EFFICACY STAGE: 1
J. Magn. Reson. Imaging 2020;52:1163-1172.
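The statistical measures named in this abstract (sensitivity, specificity, balanced accuracy) all derive from a binary confusion matrix. The following is a minimal Python sketch of that arithmetic; the helper name `binary_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # fraction of positives that were detected
    specificity = tn / (tn + fp)  # fraction of negatives that were rejected
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical toy labels (1 = lesion present, 0 = lesion absent).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, bal_acc = binary_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced accuracy={bal_acc:.2f}")
```

Balanced accuracy averages the two per-class rates, so it is not inflated when one class dominates the test set.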
Affiliation(s)
- Radhika Tibrewala
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Eugene Ozhinsky
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Rutwik Shah
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Io Flament
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Kay Crossley
  - La Trobe Sport and Exercise Medicine Research Centre, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
- Ramya Srinivasan
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Richard Souza
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
  - Department of Physical Therapy and Rehabilitation Science, University of California San Francisco, San Francisco, California, USA
- Thomas M. Link
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Valentina Pedoia
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA
- Sharmila Majumdar
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, California, USA

22
Chirinos JA, Orlenko A, Zhao L, Basso MD, Cvijic ME, Li Z, Spires TE, Yarde M, Wang Z, Seiffert DA, Prenner S, Zamani P, Bhattacharya P, Kumar A, Margulies KB, Car BD, Gordon DA, Moore JH, Cappola TP. Multiple Plasma Biomarkers for Risk Stratification in Patients With Heart Failure and Preserved Ejection Fraction. J Am Coll Cardiol 2020; 75:1281-1295. [PMID: 32192654 PMCID: PMC7147356 DOI: 10.1016/j.jacc.2019.12.069] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 07/18/2019] [Revised: 12/22/2019] [Accepted: 12/23/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Better risk stratification strategies are needed to enhance clinical care and trial design in heart failure with preserved ejection fraction (HFpEF).
OBJECTIVES: The purpose of this study was to assess the value of a targeted plasma multi-marker approach to enhance our phenotypic characterization and risk prediction in HFpEF.
METHODS: In this study, the authors measured 49 plasma biomarkers from TOPCAT (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist) trial participants (n = 379) using a Multiplex assay. The relationship between biomarkers and the risk of all-cause death or heart failure-related hospital admission (DHFA) was assessed. A tree-based pipeline optimizer platform was used to generate a multimarker predictive model for DHFA. We validated the model in an independent cohort of HFpEF patients enrolled in the PHFS (Penn Heart Failure Study) (n = 156).
RESULTS: Two large, tightly related dominant biomarker clusters were found, which included biomarkers of fibrosis/tissue remodeling, inflammation, renal injury/dysfunction, and liver fibrosis. Other clusters were composed of neurohormonal regulators of mineral metabolism, intermediary metabolism, and biomarkers of myocardial injury. Multiple biomarkers predicted incident DHFA, including 2 biomarkers related to mineral metabolism/calcification (fibroblast growth factor-23 and OPG [osteoprotegerin]), 3 inflammatory biomarkers (tumor necrosis factor-alpha, sTNFRI [soluble tumor necrosis factor-receptor I], and interleukin-6), YKL-40 (related to liver injury and inflammation), 2 biomarkers related to intermediary metabolism and adipocyte biology (fatty acid binding protein-4 and growth differentiation factor-15), angiopoietin-2 (related to angiogenesis), matrix metalloproteinase-7 (related to extracellular matrix turnover), ST-2, and N-terminal pro-B-type natriuretic peptide. A machine-learning-derived model using a combination of biomarkers was strongly predictive of the risk of DHFA (standardized hazard ratio: 2.85; 95% confidence interval: 2.03 to 4.02; p < 0.0001) and markedly improved the risk prediction when added to the MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure Risk Score) risk score. In an independent cohort (PHFS), the model strongly predicted the risk of DHFA (standardized hazard ratio: 2.74; 95% confidence interval: 1.93 to 3.90; p < 0.0001), which was also independent of the MAGGIC risk score.
CONCLUSIONS: Various novel circulating biomarkers in key pathophysiological domains are predictive of outcomes in HFpEF, and a multimarker approach coupled with machine-learning represents a promising strategy for enhancing risk stratification in HFpEF.
Affiliation(s)
- Julio A Chirinos
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Alena Orlenko
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Lei Zhao
  - Bristol-Myers Squibb Company, Lawrenceville, New Jersey
- Zhuyin Li
  - Bristol-Myers Squibb Company, Lawrenceville, New Jersey
- Melissa Yarde
  - Bristol-Myers Squibb Company, Lawrenceville, New Jersey
- Zhaoqing Wang
  - Bristol-Myers Squibb Company, Lawrenceville, New Jersey
- Stuart Prenner
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Payman Zamani
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Priyanka Bhattacharya
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Anupam Kumar
  - Vanderbilt University Medical Center, Nashville, Tennessee
- Kenneth B Margulies
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Bruce D Car
  - Bristol-Myers Squibb Company, Lawrenceville, New Jersey
- Jason H Moore
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Thomas P Cappola
  - Division of Cardiovascular Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania; University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania

23
Tantin A, Bou Assi E, van Asselt E, Hached S, Sawan M. Predicting urinary bladder voiding by means of a linear discriminant analysis: Validation in rats. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101667] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]

24
Borna K, Hoseini S, Aghaei MAM. Customer satisfaction prediction with Michigan-style learning classifier system. SN APPLIED SCIENCES 2019. [DOI: 10.1007/s42452-019-1493-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022] Open
Abstract
Many different classification algorithms can be used to analyze, classify, and predict data. A learning classifier system (LCS), known as a genetics-based machine learning system, combines machine learning with evolutionary computing and other heuristics to produce an adaptive system that learns to solve a particular problem. This paper uses a Michigan-style LCS in the context of bank customer satisfaction to classify customers into two groups: satisfied and unsatisfied. Three different rule compaction strategies are used to compare the rule population’s accuracy and micro/macro population size. The results identify the features that most influence prediction.

25
Hanley JP, Rizzo DM, Buzas JS, Eppstein MJ. A Tandem Evolutionary Algorithm for Identifying Causal Rules from Complex Data. EVOLUTIONARY COMPUTATION 2019; 28:87-114. [PMID: 30817200 DOI: 10.1162/evco_a_00252] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]
Abstract
We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled statistic for assessing fitness that quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories, (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and disjunctions of these conjunctions, each of which has probabilistically significant associations with outcome classes, and (c) separate archive bins for clauses of different orders, with dynamically adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.
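The fitness statistic in aspect (a) can be illustrated with the standard hypergeometric probability mass function. This is only a sketch under stated assumptions, not the authors' implementation: the function name `hypergeom_pmf`, the argument names, and the toy counts are hypothetical, and the handling of missing data mentioned in the abstract is omitted.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability that exactly k of the n samples matched by a clause fall in
    the target class, if the n matches were drawn at random (without
    replacement) from N samples of which K belong to the target class."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Hypothetical counts: 100 samples, 30 in the target class,
# and a clause that matches 10 samples.
N, K, n = 100, 30, 10
p_strong = hypergeom_pmf(9, N, K, n)  # 9 of the 10 matches are in the target class
p_chance = hypergeom_pmf(3, N, K, n)  # 3 of 10, roughly the base rate
print(f"k=9: {p_strong:.2e}   k=3: {p_chance:.2e}")
```

A smaller probability means the observed clause-class association is less likely to be a chance draw, so under this reading lower values would correspond to fitter clauses.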
Affiliation(s)
- John P Hanley
  - Department of Civil and Environmental Engineering, University of Vermont, Burlington, 05405, USA
- Donna M Rizzo
  - Department of Civil and Environmental Engineering, University of Vermont, Burlington, 05405, USA
- Jeffrey S Buzas
  - Department of Mathematics and Statistics, University of Vermont, Burlington, 05405, USA
- Margaret J Eppstein
  - Department of Computer Science, University of Vermont, Burlington, 05405, USA

26
Preoperative and postoperative prediction of long-term meningioma outcomes. PLoS One 2018; 13:e0204161. [PMID: 30235308 PMCID: PMC6147484 DOI: 10.1371/journal.pone.0204161] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/16/2018] [Accepted: 08/20/2018] [Indexed: 12/22/2022] Open
Abstract
Background: Meningiomas are stratified according to tumor grade and extent of resection, often in isolation of other clinical variables. Here, we use machine learning (ML) to integrate demographic, clinical, radiographic and pathologic data to develop predictive models for meningioma outcomes.
Methods and findings: We developed a comprehensive database containing information from 235 patients who underwent surgery for 257 meningiomas at a single institution from 1990 to 2015. The median follow-up was 4.3 years, and resection specimens were re-evaluated according to current diagnostic criteria, revealing 128 WHO grade I, 104 grade II and 25 grade III meningiomas. A series of ML algorithms were trained and tuned by nested resampling to create models based on preoperative features, conventional postoperative features, or both. We compared different algorithms’ accuracy as well as the unique insights they offered into the data. Machine learning models restricted to preoperative information, such as patient demographics and radiographic features, had similar accuracy for predicting local failure (AUC = 0.74) or overall survival (AUC = 0.68) as models based on meningioma grade and extent of resection (AUC = 0.73 and AUC = 0.72, respectively). Integrated models incorporating all available demographic, clinical, radiographic and pathologic data provided the most accurate estimates (AUC = 0.78 and AUC = 0.74, respectively). From these models, we developed decision trees and nomograms to estimate the risks of local failure or overall survival for meningioma patients.
Conclusions: Clinical information has been historically underutilized in the prediction of meningioma outcomes. Predictive models trained on preoperative clinical data perform comparably to conventional models trained on meningioma grade and extent of resection. Combination of all available information can help stratify meningioma patients more accurately.

27
Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: Introduction and review. J Biomed Inform 2018; 85:189-203. [PMID: 30031057 PMCID: PMC6299836 DOI: 10.1016/j.jbi.2018.07.014] [Citation(s) in RCA: 314] [Impact Index Per Article: 52.3] [Received: 01/15/2018] [Revised: 06/29/2018] [Accepted: 07/14/2018] [Indexed: 01/25/2023]
Abstract
Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
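The intuition behind the original Relief algorithm can be sketched in a few lines: each target instance pulls feature weights down by how much it differs from its nearest same-class neighbor (hit) and up by how much it differs from its nearest other-class neighbor (miss). The Python sketch below is an illustration, not the reviewed software. It assumes binary classes and discrete features, uses every instance as a target instead of a random sample, and breaks distance ties by list order; the function name `relief` and the XOR toy data are hypothetical.

```python
from itertools import product


def relief(X, y):
    """Relief-style feature weights for binary classes and discrete features.

    Simplification (assumption): every instance serves once as the target,
    rather than sampling m random instances as in the original algorithm.
    """
    n, d = len(X), len(X[0])

    def diff(a, b):
        # Per-feature difference for discrete values: 0 if equal, 1 otherwise.
        return 0.0 if a == b else 1.0

    def dist(u, v):
        # Manhattan distance built from the per-feature differences.
        return sum(diff(a, b) for a, b in zip(u, v))

    weights = [0.0] * d
    for i in range(n):
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: dist(X[i], X[j]))
        for a in range(d):
            # Reward features that separate classes, penalize those that
            # differ between same-class neighbors.
            weights[a] += (diff(X[i][a], X[miss][a]) - diff(X[i][a], X[hit][a])) / n
    return weights


# Toy data: the class is the XOR of features 0 and 1; feature 2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [row[0] ^ row[1] for row in X]
weights = relief(X, y)
print(weights)
```

Neither feature 0 nor feature 1 is associated with the class on its own here, yet both should outrank the irrelevant feature, which is the sense in which Relief responds to interactions without enumerating feature combinations.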
Affiliation(s)
- Ryan J Urbanowicz
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- William La Cava
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Randal S Olson
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Jason H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.

28
Urbanowicz RJ, Olson RS, Schmitt P, Meeker M, Moore JH. Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 2018; 85:168-188. [PMID: 30030120 PMCID: PMC6299838 DOI: 10.1016/j.jbi.2018.07.015] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 01/15/2018] [Revised: 06/30/2018] [Accepted: 07/14/2018] [Indexed: 11/23/2022]
Abstract
Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) remain computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF* performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.
Affiliation(s)
- Ryan J Urbanowicz
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Randal S Olson
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Peter Schmitt
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Jason H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.

29
Abstract
Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, and crossover and mutation rates. Through an extensive series of experiments over multiple evolutionary algorithm implementations and 25 problems we show that parameter space tends to be rife with viable parameters, at least for the problems studied herein. We discuss the implications of this finding in practice for the researcher employing EC.

30
Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Min 2017; 10:36. [PMID: 29238404 PMCID: PMC5725843 DOI: 10.1186/s13040-017-0154-4] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 03/09/2017] [Accepted: 11/07/2017] [Indexed: 11/10/2022] Open
Abstract
Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.
Results: The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered.
Conclusions: This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Affiliation(s)
- Randal S Olson
  - Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA
- William La Cava
  - Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA
- Patryk Orzechowski
  - Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA
  - Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Kraków, Poland
- Ryan J Urbanowicz
  - Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA
- Jason H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA

31
Implications of the curse of dimensionality for supervised learning classifier systems: theoretical and empirical analyses. Pattern Anal Appl 2017. [DOI: 10.1007/s10044-017-0649-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 10/19/2022]

32
Iqbal M, Browne WN, Zhang M. Extending XCS with Cyclic Graphs for Scalability on Complex Boolean Problems. EVOLUTIONARY COMPUTATION 2015; 25:173-204. [PMID: 26406166 DOI: 10.1162/evco_a_00167] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
A main research direction in the field of evolutionary machine learning is to develop a scalable classifier system to solve high-dimensional problems. Recently work has begun on autonomously reusing learned building blocks of knowledge to scale from low-dimensional problems to high-dimensional ones. An XCS-based classifier system, known as XCSCFC, has been shown to be scalable, through the addition of expression tree-like code fragments, to a limit beyond standard learning classifier systems. XCSCFC is especially beneficial if the target problem can be divided into a hierarchy of subproblems and each of them is solvable in a bottom-up fashion. However, if the hierarchy of subproblems is too deep, then XCSCFC becomes impractical because of the needed computational time and thus eventually hits a limit in problem size. A limitation in this technique is the lack of a cyclic representation, which is inherent in finite state machines (FSMs). However, the evolution of FSMs is a hard task owing to the combinatorially large number of possible states, connections, and interaction. Usually this requires supervised learning to minimize inappropriate FSMs, which for high-dimensional problems necessitates subsampling or incremental testing. To avoid these constraints, this work introduces a state-machine-based encoding scheme into XCS for the first time, termed XCSSMA. The proposed system has been tested on six complex Boolean problem domains: multiplexer, majority-on, carry, even-parity, count ones, and digital design verification problems. The proposed approach outperforms XCSCFA (an XCS that computes actions) and XCSF (an XCS that computes predictions) in three of the six problem domains, while the performance in others is similar. 
In addition, XCSSMA evolved, for the first time, compact and human readable general classifiers (i.e., solving any n-bit problems) for the even-parity and carry problem domains, demonstrating its ability to produce scalable solutions using a cyclic representation.
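Three of the Boolean benchmark domains named in this abstract have short, commonly used definitions, sketched below in Python. The bit-ordering conventions (address bits first, most significant bit first) and the function names are assumptions made for illustration; the paper's exact encodings may differ.

```python
def multiplexer(bits, k):
    """k address bits select one of the 2**k data bits that follow them.

    Assumed convention: address bits come first, most significant bit first.
    """
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]


def majority_on(bits):
    """Return 1 if more than half of the bits are 1, else 0."""
    return int(sum(bits) > len(bits) / 2)


def even_parity(bits):
    """Return 1 if the number of 1 bits is even, else 0."""
    return int(sum(bits) % 2 == 0)


# 6-bit multiplexer: address "10" (2) selects the third data bit.
example = [1, 0, 0, 0, 1, 0]
print(multiplexer(example, k=2), majority_on(example), even_parity(example))
```

These functions scale to any input width, which is why such domains are used to probe whether a learned rule set generalizes to n-bit problems.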
Affiliation(s)
- Muhammad Iqbal
  - School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Will N Browne
  - School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Mengjie Zhang
  - School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand

33