1
Guazzo A, Atzeni M, Idi E, Trescato I, Tavazzi E, Longato E, Manera U, Chió A, Gromicho M, Alves I, de Carvalho M, Vettoretti M, Di Camillo B. Predicting clinical events characterizing the progression of amyotrophic lateral sclerosis via machine learning approaches using routine visits data: a feasibility study. BMC Med Inform Decis Mak 2024; 24:318. [PMID: 39472842 PMCID: PMC11523576 DOI: 10.1186/s12911-024-02719-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 10/10/2024] [Indexed: 11/02/2024] Open
Abstract
BACKGROUND Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that results in death within a short time span (3-5 years). One of the major challenges in treating ALS is its highly heterogeneous disease progression and the lack of effective prognostic tools to forecast it. The main aim of this study was, then, to test the feasibility of predicting relevant clinical outcomes that characterize the progression of ALS with a two-year prediction horizon via artificial intelligence techniques using routine visits data.
METHODS Three classification problems were considered: predicting death (binary problem), predicting death or percutaneous endoscopic gastrostomy (PEG) (multiclass problem), and predicting death or non-invasive ventilation (NIV) (multiclass problem). Two supervised learning models, a logistic regression (LR) and a deep learning multilayer perceptron (MLP), were trained ensuring technical robustness and reproducibility. Moreover, to provide insights into model explainability and result interpretability, model coefficients for LR and Shapley values for both LR and MLP were considered to characterize the relationship between each variable and the outcome.
RESULTS On the one hand, predicting death was successful as both models yielded F1 scores and accuracy well above 0.7. The model explainability analysis performed for this outcome allowed for the understanding of how different methodological approaches consider the input variables when performing the prediction. On the other hand, predicting death alongside PEG or NIV proved to be much more challenging (F1 scores and accuracy in the 0.4-0.6 interval).
CONCLUSIONS In conclusion, predicting death due to ALS proved to be feasible. However, predicting PEG or NIV in a multiclass fashion proved to be unfeasible with these data, regardless of the complexity of the methodological approach. The observed results suggest a potential ceiling on the amount of information extractable from the database, e.g., due to the intrinsic difficulty of the prediction tasks at hand, or to the absence of crucial predictors that are, however, not currently collected during routine practice.
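The abstract above reports F1 scores and accuracy for a binary and two multiclass problems. As a reading aid, here is a minimal sketch of how these two metrics can be computed from predicted labels. The macro averaging, the label names, and the toy data are assumptions made for illustration; the abstract does not state which averaging scheme was used, and none of these numbers come from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for every label seen in either list."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical three-class toy labels (not study data).
y_true = ["none", "death", "PEG", "death", "none", "PEG"]
y_pred = ["none", "death", "death", "death", "none", "none"]

print(accuracy(y_true, y_pred))
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

In this toy case the classifier never predicts the rarest class, so its per-class F1 is zero and the macro average falls well below accuracy, which is one reason the two metrics are usually reported together for multiclass problems.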
Affiliation(s)
- Alessandro Guazzo
  - Department of Information Engineering, University of Padova, Padua, Italy
- Michele Atzeni
  - Department of Information Engineering, University of Padova, Padua, Italy
- Elena Idi
  - Department of Information Engineering, University of Padova, Padua, Italy
- Isotta Trescato
  - Department of Information Engineering, University of Padova, Padua, Italy
- Erica Tavazzi
  - Department of Information Engineering, University of Padova, Padua, Italy
- Enrico Longato
  - Department of Information Engineering, University of Padova, Padua, Italy
- Umberto Manera
  - Department of Neurosciences Rita Levi Montalcini, University of Turin, Turin, Italy
- Adriano Chió
  - Department of Neurosciences Rita Levi Montalcini, University of Turin, Turin, Italy
- Marta Gromicho
  - Faculdade de Medicina, IMM J. L. Antunes, Universidade de Lisboa, Lisbon, Portugal
- Inês Alves
  - Faculdade de Medicina, IMM J. L. Antunes, Universidade de Lisboa, Lisbon, Portugal
- Mamede de Carvalho
  - Faculdade de Medicina, IMM J. L. Antunes, Universidade de Lisboa, Lisbon, Portugal
- Martina Vettoretti
  - Department of Information Engineering, University of Padova, Padua, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Padua, Italy
2
Amaral DM, Soares DF, Gromicho M, de Carvalho M, Madeira SC, Tomás P, Aidos H. Temporal stratification of amyotrophic lateral sclerosis patients using disease progression patterns. Nat Commun 2024; 15:5717. [PMID: 38977678 PMCID: PMC11231290 DOI: 10.1038/s41467-024-49954-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 06/25/2024] [Indexed: 07/10/2024] Open
Abstract
Identifying groups of patients with similar disease progression patterns is key to understanding disease heterogeneity, guiding clinical decisions and improving patient care. In this paper, we propose a data-driven temporal stratification approach, ClusTric, combining triclustering and hierarchical clustering. The proposed approach enables the discovery of complex disease progression patterns not found by univariate temporal analyses. As a case study, we use Amyotrophic Lateral Sclerosis (ALS), a neurodegenerative disease with a non-linear and heterogeneous disease progression. In this context, we applied ClusTric to stratify a hospital-based population (Lisbon ALS Clinic dataset) and validated it in a clinical trial population. The results revealed four clinically relevant disease progression groups: slow progressors, moderate bulbar progressors, moderate spinal progressors, and fast progressors. We compared ClusTric with a state-of-the-art method, showing its effectiveness in capturing the heterogeneity of ALS disease progression in a lower number of clinically relevant progression groups.
Affiliation(s)
- Daniela M Amaral
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Diogo F Soares
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Marta Gromicho
  - Instituto de Medicina Molecular and Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Mamede de Carvalho
  - Instituto de Medicina Molecular and Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Pedro Tomás
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Helena Aidos
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
3
Castanho EN, Aidos H, Madeira SC. Biclustering data analysis: a comprehensive survey. Brief Bioinform 2024; 25:bbae342. [PMID: 39007596 PMCID: PMC11247412 DOI: 10.1093/bib/bbae342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 05/16/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] Open
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
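To make the notion of a bicluster (a subset of rows paired with a subset of columns) concrete, the following minimal sketch scores a candidate submatrix with the mean squared residue, a coherence measure commonly associated with Cheng and Church's biclustering work. The choice of this measure, the function name, and the toy matrix are illustrative assumptions; they are not taken from the survey abstract.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by `rows` and `cols`.

    A value near 0 means the selected rows follow the same additive pattern
    across the selected columns (a coherent bicluster).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical 4x4 data matrix; rows 0, 1, 3 and columns 0, 1, 3 hide an
# additive pattern (each selected row is a shifted copy of the others).
data = [
    [1, 2, 9, 3],
    [2, 3, 0, 4],
    [7, 1, 5, 8],
    [4, 5, 2, 6],
]

local_score = mean_squared_residue(data, [0, 1, 3], [0, 1, 3])
global_score = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print(local_score, global_score)
```

The local submatrix scores essentially zero while the full matrix does not, which illustrates the "local instead of global models" point: a pattern can hold on a subset of rows and columns without holding on the whole table.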
Affiliation(s)
- Eduardo N Castanho
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Helena Aidos
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
4
Soares DF, Henriques R, Gromicho M, de Carvalho M, Madeira SC. Triclustering-based classification of longitudinal data for prognostic prediction: targeting relevant clinical endpoints in amyotrophic lateral sclerosis. Sci Rep 2023; 13:6182. [PMID: 37061549 PMCID: PMC10105751 DOI: 10.1038/s41598-023-33223-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 04/10/2023] [Indexed: 04/17/2023] Open
Abstract
This work proposes a new class of explainable prognostic models for longitudinal data classification using triclusters. A new temporally constrained triclustering algorithm, termed TCtriCluster, is proposed to comprehensively find informative temporal patterns common to a subset of patients in a subset of features (triclusters), and use them as discriminative features within a state-of-the-art classifier with guarantees of interpretability. The proposed approach further enhances prediction with the potentialities of model explainability by revealing clinically relevant disease progression patterns underlying prognostics, describing features used for classification. The proposed methodology is used in the Amyotrophic Lateral Sclerosis (ALS) Portuguese cohort (N = 1321), providing the first comprehensive assessment of the prognostic limits of five notable clinical endpoints: need for non-invasive ventilation (NIV); need for an auxiliary communication device; need for percutaneous endoscopic gastrostomy (PEG); need for a caregiver; and need for a wheelchair. Triclustering-based predictors outperform state-of-the-art alternatives, being able to predict the need for auxiliary communication device (within 180 days) and the need for PEG (within 90 days) with an AUC above 90%. The approach was validated in clinical practice, supporting healthcare professionals in understanding the link between the highly heterogeneous patterns of ALS disease progression and the prognosis.
Affiliation(s)
- Diogo F Soares
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Rui Henriques
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Marta Gromicho
  - Instituto de Medicina Molecular and Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Mamede de Carvalho
  - Instituto de Medicina Molecular and Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
5
Anjum M, Shahab S, Yu Y. Syndrome Pattern Recognition Method Using Sensed Patient Data for Neurodegenerative Disease Progression Identification. Diagnostics (Basel) 2023; 13:887. [PMID: 36900031 PMCID: PMC10000542 DOI: 10.3390/diagnostics13050887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 03/03/2023] Open
Abstract
Neurodegenerative diseases are a group of conditions that involve the progressive loss of function of neurons in the brain and spinal cord. These conditions can result in a wide range of symptoms, such as difficulty with movement, speech, and cognition. The causes of neurodegenerative diseases are poorly understood, but many factors are believed to contribute to the development of these conditions. The most important risk factors include ageing, genetics, abnormal medical conditions, toxins, and environmental exposures. A slow decline in visible cognitive functions characterises the progression of these diseases. If left unattended or unnoticed, disease progression can result in serious issues such as the cessation of motor function or even paralysis. Therefore, early recognition of neurodegenerative diseases is becoming increasingly important in modern healthcare. Many sophisticated artificial intelligence technologies are incorporated into modern healthcare systems for the early recognition of these diseases. This research article introduces a Syndrome-dependent Pattern Recognition Method for the early detection and progression monitoring of neurodegenerative diseases. The proposed method determines the variance between normal and abnormal intrinsic neural connectivity data. The observed data is combined with previous and healthy function examination data to identify the variance. In this combined analysis, deep recurrent learning is exploited by tuning the analysis layer based on variance suppressed by identifying normal and abnormal patterns in the combined analysis. This variance from different patterns is recurrently used to train the learning model to maximise recognition accuracy. The proposed method achieves 16.77% higher accuracy, 10.55% higher precision, and 7.69% higher pattern verification. It reduces the variance and verification time by 12.08% and 12.02%, respectively.
Affiliation(s)
- Mohd Anjum
  - Department of Computer Engineering, Aligarh Muslim University, Aligarh 202001, India
- Sana Shahab
  - Department of Business Administration, College of Business Administration, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Yang Yu
  - Centre for Infrastructure Engineering and Safety (CIES), University of New South Wales, Sydney, NSW 2052, Australia
6
Tavazzi E, Gatta R, Vallati M, Cotti Piccinelli S, Filosto M, Padovani A, Castellano M, Di Camillo B. Leveraging process mining for modeling progression trajectories in amyotrophic lateral sclerosis. BMC Med Inform Decis Mak 2023; 22:346. [PMID: 36732801 PMCID: PMC9896660 DOI: 10.1186/s12911-023-02113-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/20/2022] [Accepted: 01/13/2023] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease whose spreading and progression mechanisms are still unclear. The ability to predict ALS prognosis would improve the patients' quality of life and support clinicians in planning treatments. In this paper, we investigate ALS evolution trajectories using Process Mining (PM) techniques enriched to both easily mine processes and automatically reveal how the pathways differentiate according to patients' characteristics.
METHODS We consider data collected in two distinct data sources, namely the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) dataset and a real-world clinical register (ALS-BS) including data of patients followed up in two tertiary clinical centers of Brescia (Italy). With a focus on the functional abilities progressively impaired as the disease progresses, we use two Process Discovery methods, namely the Directly-Follows Graph and the CareFlow Miner, to mine the population disease trajectories on the PRO-ACT dataset. We characterize the impairment trajectories in terms of patterns, timing, and probabilities, and investigate the effect of some patients' characteristics at onset on the followed paths. Finally, we perform a comparative study of the impairment trajectories mined in PRO-ACT versus ALS-BS.
RESULTS We delineate the progression pathways on PRO-ACT, identifying the predominant disabilities at different stages of the disease: for instance, 85% of patients enter the trials without disabilities, and 48% of them experience the impairment of Walking/Self-care abilities first. We then test how a spinal onset increases the risk of experiencing the loss of Walking/Self-care ability as first impairment (52% vs. 27% of patients develop it as the first impairment in the spinal vs. the bulbar cohorts, respectively), as well as how an older age at onset corresponds to a more rapid progression to death. When compared, the PRO-ACT and the ALS-BS patient populations present some similarities in terms of natural progression of the disease, as well as some differences in terms of observed trajectories plausibly due to the trial scheduling and recruitment criteria.
CONCLUSIONS We exploited PM to provide an overview of the evolution scenarios of an ALS trial population and to preliminarily compare it to the progression observed in a clinical cohort. Future work will focus on further improving the understanding of the disease progression mechanisms, by including additional real-world subjects as well as by extending the set of events considered in the impairment trajectories.
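The Directly-Follows Graph named in the methods can be illustrated with a minimal sketch: for each patient's ordered list of events, count how often one event is immediately followed by another. The event names other than Walking/Self-care, the three toy traces, and both function names are hypothetical; this is not the authors' pipeline and uses no PRO-ACT or ALS-BS data.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count, over all traces, how often event a is immediately followed by b."""
    edges = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            edges[(src, dst)] += 1
    return edges


def outgoing_proportions(edges, source):
    """Share of each successor among all transitions leaving `source`."""
    total = sum(n for (src, _), n in edges.items() if src == source)
    if total == 0:
        return {}
    return {dst: n / total for (src, dst), n in edges.items() if src == source}


# Hypothetical impairment trajectories, one ordered event list per patient.
traces = [
    ["No impairment", "Walking/Self-care", "Swallowing", "Death"],
    ["No impairment", "Walking/Self-care", "Breathing"],
    ["No impairment", "Swallowing", "Death"],
]

dfg = directly_follows_graph(traces)
for (src, dst), count in sorted(dfg.items()):
    print(f"{src} -> {dst}: {count}")
print(outgoing_proportions(dfg, "No impairment"))
```

The edge counts give the graph's arcs, and normalising the arcs that leave a node yields first-impairment proportions of the kind the results report (for example, the share of patients whose first impairment is Walking/Self-care).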
Affiliation(s)
- Erica Tavazzi
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, 35131 Padua, Italy
- Roberto Gatta
  - Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa 11, 25121 Brescia, Italy
- Mauro Vallati
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, HD1 3DH UK
- Stefano Cotti Piccinelli
  - Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa 11, 25121 Brescia, Italy
  - NeMO-Brescia Clinical Center for Neuromuscular Diseases, Via Paolo Richiedei 16, 25064 Gussago, Italy
- Massimiliano Filosto
  - Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa 11, 25121 Brescia, Italy
  - NeMO-Brescia Clinical Center for Neuromuscular Diseases, Via Paolo Richiedei 16, 25064 Gussago, Italy
- Alessandro Padovani
  - Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa 11, 25121 Brescia, Italy
  - Unit of Neurology, ASST Spedali Civili, Piazzale Spedali Civili 1, 25123 Brescia, Italy
- Maurizio Castellano
  - Department of Clinical and Experimental Sciences, University of Brescia, Viale Europa 11, 25121 Brescia, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, 35131 Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Agripolis, Viale dell’Università, 16, 35020 Legnaro, Italy
7
Soares DF, Henriques R, Gromicho M, de Carvalho M, Madeira SC. Learning prognostic models using a mixture of biclustering and triclustering: Predicting the need for non-invasive ventilation in Amyotrophic Lateral Sclerosis. J Biomed Inform 2022; 134:104172. [PMID: 36055638 DOI: 10.1016/j.jbi.2022.104172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 03/31/2022] [Accepted: 08/15/2022] [Indexed: 11/26/2022]
Abstract
Longitudinal cohort studies of disease progression generally combine temporal features produced under periodic assessments (clinical follow-up) with static features associated with single-time assessments, genetic, psychophysiological, and demographic profiles. Subspace clustering, including biclustering and triclustering stances, enables the discovery of local and discriminative patterns from such multidimensional cohort data. These patterns, highly interpretable, are relevant to identifying groups of patients with similar traits or progression patterns. Despite their potential, their use for improving predictive tasks in clinical domains remains unexplored. In this work, we propose to learn predictive models from static and temporal data using discriminative patterns, obtained via biclustering and triclustering, as features within a state-of-the-art classifier, thus enhancing model interpretation. triCluster is extended to find time-contiguous triclusters in temporal data (temporal patterns) and a biclustering algorithm to discover coherent patterns in static data. The transformed data space, composed of bicluster and tricluster features, captures local and cross-variable associations with discriminative power, yielding unique statistical properties of interest. As a case study, we applied our methodology to follow-up data from Portuguese patients with Amyotrophic Lateral Sclerosis (ALS) to predict the need for non-invasive ventilation (NIV) since the last appointment. The results showed that, in general, our methodology outperformed baseline results using the original features. Furthermore, the bicluster/tricluster-based patterns used by the classifier can be used by clinicians to understand the models by highlighting relevant prognostic patterns.
Affiliation(s)
- Diogo F Soares
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Rui Henriques
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Marta Gromicho
  - Instituto de Medicina Molecular, Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Mamede de Carvalho
  - Instituto de Medicina Molecular, Instituto de Fisiologia, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal