1
Samadi ME, Mirzaieazar H, Mitsos A, Schuppert A. Noisecut: a python package for noise-tolerant classification of binary data using prior knowledge integration and max-cut solutions. BMC Bioinformatics 2024; 25:155. [PMID: 38641616 PMCID: PMC11031902 DOI: 10.1186/s12859-024-05769-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 04/09/2024] [Indexed: 04/21/2024] Open
Abstract
BACKGROUND Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is to avoid overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies. One such approach is hybrid mechanistic/data-driven modeling, which integrates prior knowledge on input features into the learning process, enhancing the model's ability to extrapolate.
RESULTS We present NoiseCut, a Python package for noise-tolerant classification of binary data by employing a hybrid modeling approach that leverages solutions of defined max-cut problems. In a comparative analysis conducted on synthetically generated binary datasets, NoiseCut exhibits better overfitting prevention compared to the early stopping technique employed by different supervised machine learning algorithms. The noise tolerance of NoiseCut stems from a dropout strategy that leverages prior knowledge of input features and is further enhanced by the integration of max-cut problems into the learning process.
CONCLUSIONS NoiseCut is a Python package for the implementation of hybrid modeling for the classification of binary data. It facilitates the integration of mechanistic knowledge on the input features into learning from data in a structured manner and proves to be a valuable classification tool when the available training data is noisy and/or limited in size. This advantage is especially prominent in medical and biomedical applications where data scarcity and noise are common challenges.
The codebase, illustrations, and documentation for NoiseCut are accessible for download at https://pypi.org/project/noisecut/ . The implementation detailed in this paper corresponds to the version 0.2.1 release of the software.
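For readers unfamiliar with the max-cut problem mentioned in the abstract, the following is a minimal sketch of what such a problem looks like. It is not NoiseCut's API or algorithm: the function name `max_cut`, the toy graph, and the exhaustive search are all illustrative assumptions, and brute force is only practical for very small graphs.

```python
from itertools import product


def max_cut(n_nodes, weighted_edges):
    """Return (best_weight, best_partition) by exhaustive search.

    weighted_edges: list of (u, v, w) tuples with 0 <= u, v < n_nodes.
    Hypothetical helper for illustration; not part of the noisecut package.
    """
    best_weight, best_partition = float("-inf"), None
    # Fix node 0 on side 0 so mirror-image partitions are not counted twice.
    for rest in product((0, 1), repeat=n_nodes - 1):
        sides = (0,) + rest
        # An edge contributes its weight only if its endpoints are separated.
        weight = sum(w for u, v, w in weighted_edges if sides[u] != sides[v])
        if weight > best_weight:
            best_weight, best_partition = weight, sides
    return best_weight, best_partition


# Toy graph (an assumption): a 4-cycle with unit weights plus a heavier chord.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 2.0)]
best_weight, best_partition = max_cut(4, edges)
print(best_weight, best_partition)
```

The search cost doubles with every added node, which is why practical tools rely on dedicated solvers or approximations rather than enumeration.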
Affiliation(s)
- Moein E Samadi
  - Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Hedieh Mirzaieazar
  - Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Alexander Mitsos
  - Process Systems Engineering (AVT.SVT), RWTH Aachen University, Aachen, Germany
- Andreas Schuppert
  - Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
2
Xu Z, Xu X, Zhu X, Niu K, Dong J, He Z. Attention-Based Deep Learning Model for Prediction of Major Adverse Cardiovascular Events in Peritoneal Dialysis Patients. IEEE J Biomed Health Inform 2024; 28:1101-1109. [PMID: 38048232 DOI: 10.1109/jbhi.2023.3338729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2023]
Abstract
Major adverse cardiovascular events (MACE) encompass pivotal cardiovascular outcomes such as myocardial infarction, unstable angina, and cardiovascular-related mortality. Patients undergoing peritoneal dialysis (PD) exhibit specific cardiovascular risk factors during treatment, which can increase the likelihood of cardiovascular events. Hence, the prediction and key factor analysis of MACE are of paramount importance for peritoneal dialysis patients. Current pathological methodologies for prognosis prediction are not only costly but also cumbersome when processing electronic health record (EHR) data, which are high-dimensional, heterogeneous, and time-dependent. Therefore, in this study, we propose CVEformer, an attention-based neural network designed to predict MACE and analyze risk factors. CVEformer leverages the self-attention mechanism to capture temporal correlations among time series variables, allowing for weighted integration of variables and estimation of the probability of MACE. CVEformer first captures the correlations among heterogeneous variables through attention scores. Then, it analyzes the correlations within the time series data to identify key risk variables and predict the probability of MACE. When trained and evaluated on data from a large cohort of peritoneal dialysis patients across multiple centers, CVEformer outperforms existing models in terms of predictive performance.
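The self-attention mechanism referred to in this abstract can be illustrated with a minimal sketch of scaled dot-product self-attention over a short sequence of visits. This is not the CVEformer architecture: the learned query, key, and value projections are replaced by the identity as a simplifying assumption, and the function names and toy data are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity projections.

    seq: list of T time steps, each a list of d feature values.
    Returns (outputs, weights); weights[t][s] is how strongly step t
    attends to step s, and each row of weights sums to 1.
    """
    d = len(seq[0])
    weights = []
    for q in seq:
        # Similarity of this step to every step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all time steps.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, seq)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy input (an assumption): three visits with two measured variables each.
visits = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(visits)
```

Inspecting `weights` is what allows attention scores to be read as a rough indication of which time steps influence each other, which is the kind of analysis the abstract describes.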
3
An Y, Tang K, Wang J. Time-Aware Multi-Type Data Fusion Representation Learning Framework for Risk Prediction of Cardiovascular Diseases. IEEE/ACM Trans Comput Biol Bioinform 2021; PP:1-1. [PMID: 34618675 DOI: 10.1109/tcbb.2021.3118418] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Predicting the future risk of cardiovascular diseases (CVDs) from historical Electronic Health Records (EHRs) is a significant research task in personalized healthcare. In recent years, many deep neural network-based methods have emerged, which model patient disease progression by capturing the temporal patterns in sequential visit data. However, existing methods usually cannot effectively integrate the features of heterogeneous clinical data, and do not fully consider the impact of patients' age and of irregular time intervals between consecutive medical records on patients' disease development. To address these challenges, we propose a Time-Aware Multi-type Data fUsion Representation learning framework (TAMDUR) for CVD risk prediction. In this framework, we design a time-aware decay function, which is based on the patient's age and the elapsed time between visits, to model the disease progression pattern. A parallel combination of BiLSTM and CNN is constructed to learn the temporal and non-temporal features, respectively, from various types of clinical data. Finally, a multi-type data fusion representation layer based on self-attention is utilized to integrate various features and their correlations to obtain the final patient representation. We evaluate our model on a real medical dataset, and the experimental results demonstrate that TAMDUR outperforms the state-of-the-art approaches.
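The abstract does not give the form of the time-aware decay function, so the following is only a sketch of the general idea under stated assumptions. It uses the monotone form 1 / log(e + x), which is common in time-aware recurrent models, and stretches the elapsed time by an invented age factor. The function name, the `age_scale` parameter, and the way age enters the formula are all hypothetical and should not be read as the TAMDUR definition.

```python
import math


def time_aware_decay(elapsed_days, age_years, age_scale=100.0):
    """Hypothetical decay weight in (0, 1]; NOT the TAMDUR formula.

    The weight is 1 for a zero gap and shrinks as the gap between visits
    grows. The assumed age factor makes the weight drop faster for
    older patients.
    """
    if elapsed_days < 0 or age_years < 0:
        raise ValueError("elapsed_days and age_years must be non-negative")
    effective_gap = elapsed_days * (1.0 + age_years / age_scale)
    return 1.0 / math.log(math.e + effective_gap)


# Example: the same 30-day gap is discounted more at age 70 than at age 40.
w_younger = time_aware_decay(30, 40)
w_older = time_aware_decay(30, 70)
```

In a model of this kind, such a weight would typically scale how much of the previous visit's hidden state is carried forward to the next visit.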
4
An Y, Huang N, Chen X, Wu F, Wang J. High-Risk Prediction of Cardiovascular Diseases via Attention-Based Deep Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1093-1105. [PMID: 31425047 DOI: 10.1109/tcbb.2019.2935059] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/10/2023]
Abstract
High-risk prediction of cardiovascular disease is of great significance and urgency in medicine, given the increasing prevalence of sub-health in recent years. Most existing pathological methods for prognosis prediction are either costly or prone to misjudgement. Therefore, many automated models based on machine learning have been proposed to predict the onset of cardiovascular disease from the premorbid information of patients extracted from their historical Electronic Health Records (EHRs). However, it is difficult to select proper features from longitudinal and heterogeneous EHRs, and also a great challenge to obtain accurate and robust representations for patients. In this paper, we propose an entirely end-to-end model called DeepRisk, based on the attention mechanism and deep neural networks, which can not only learn high-quality features automatically from EHRs, but also efficiently integrate heterogeneous and time-ordered medical data, and finally predict patients' risk of cardiovascular diseases. Experiments are carried out on a real medical dataset, and the results show that DeepRisk can significantly improve the high-risk prediction accuracy for cardiovascular disease compared with state-of-the-art approaches.
5
Haug N, Deischinger C, Gyimesi M, Kautzky-Willer A, Thurner S, Klimek P. High-risk multimorbidity patterns on the road to cardiovascular mortality. BMC Med 2020; 18:44. [PMID: 32151252 PMCID: PMC7063814 DOI: 10.1186/s12916-020-1508-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/28/2019] [Accepted: 02/03/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Multimorbidity, the co-occurrence of two or more diseases in one patient, is a frequent phenomenon. Understanding how different diseases condition each other over the lifetime of a patient could significantly contribute to personalised prevention efforts. However, most of our current knowledge on the long-term development of the health of patients (their disease trajectories) is either confined to narrow time spans or specific (sets of) diseases. Here, we aim to identify decisive events that potentially determine the future disease progression of patients.
METHODS Health states of patients are described by algorithmically identified multimorbidity patterns (groups of included or excluded diseases) in a population-wide analysis of 9,000,000 patient histories of hospital diagnoses observed over 17 years. Over time, patients might acquire new diagnoses that change their health state; they describe a disease trajectory. We measure the age- and sex-specific risks that patients will acquire certain sets of diseases in the future depending on their current health state.
RESULTS In the present analysis, the population is described by a set of 132 different multimorbidity patterns. For elderly patients, we find 3 groups of multimorbidity patterns associated with low (yearly in-hospital mortality of 0.2-0.3%), medium (0.3-1%) and high in-hospital mortality (2-11%). We identify combinations of diseases that significantly increase the risk of reaching the high-mortality health states in later life. For instance, in men (women) aged 50-59 diagnosed with diabetes and hypertension, the risk of moving into the high-mortality region within 1 year is increased by a factor of 1.96 ± 0.11 (2.60 ± 0.18) compared with all patients of the same age and sex, respectively, and by a factor of 2.09 ± 0.12 (3.04 ± 0.18) if additionally diagnosed with metabolic disorders.
CONCLUSIONS Our approach can be used both to forecast future disease burdens and to identify the critical events in the careers of patients which strongly determine their disease progression, therefore constituting targets for efficient prevention measures. We show that the risk for cardiovascular diseases increases significantly more in females than in males when diagnosed with diabetes, hypertension and metabolic disorders.
Affiliation(s)
- Nina Haug
  - Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, Vienna, A-1090, Austria; Complexity Science Hub Vienna, Josefstädter Straße 39, Vienna, A-1080, Austria
- Carola Deischinger
  - Gender Medicine Unit, Division of Endocrinology and Metabolism, Department of Internal Medicine III, Medical University of Vienna, Spitalgasse 23, Vienna, A-1090, Austria
- Michael Gyimesi
  - Gesundheit Österreich GmbH, Stubenring 6, Vienna, A-1010, Austria
- Alexandra Kautzky-Willer
  - Gender Medicine Unit, Division of Endocrinology and Metabolism, Department of Internal Medicine III, Medical University of Vienna, Spitalgasse 23, Vienna, A-1090, Austria
- Stefan Thurner
  - Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, Vienna, A-1090, Austria; Complexity Science Hub Vienna, Josefstädter Straße 39, Vienna, A-1080, Austria; IIASA, Schloßplatz 1, Laxenburg, A-2361, Austria; Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, 85701, NM, USA
- Peter Klimek
  - Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, Vienna, A-1090, Austria; Complexity Science Hub Vienna, Josefstädter Straße 39, Vienna, A-1080, Austria
8
Arandjelovic O. Intuitive and interpretable visual communication of a complex statistical model of disease progression and risk. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2017:4199-4202. [PMID: 29060823 DOI: 10.1109/embc.2017.8037782] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Computer science, and machine learning in particular, are increasingly lauded for their potential to aid medical practice. However, the highly technical nature of state-of-the-art techniques can be a major obstacle to their usability by health care professionals and thus to their adoption and actual practical benefit. In this paper we describe a software tool which focuses on the visualization of predictions made by a recently developed method which leverages data in the form of large-scale electronic records for making diagnostic predictions. Guided by risk predictions, our tool allows the user to interactively explore different diagnostic trajectories, or display cumulative long-term prognostics, in an intuitive and easily interpretable manner.