1
Guazzo A, Longato E, Fadini GP, Morieri ML, Sparacino G, Di Camillo B. Deep-learning-based natural-language-processing models to identify cardiovascular disease hospitalisations of patients with diabetes from routine visits' text. Sci Rep 2023; 13:19132. [PMID: 37926737 PMCID: PMC10625981 DOI: 10.1038/s41598-023-45115-1] [Received: 07/07/2023] [Accepted: 10/16/2023] [Indexed: 11/07/2023] Open
Abstract
Writing notes is the most widespread method of reporting clinical events. As a result, most of the information about a patient's disease history remains locked in free-form text. Natural language processing (NLP) offers a way to transform free-form text into structured data automatically. In the present work, electronic healthcare record data of patients with diabetes were used to develop deep-learning-based NLP models that automatically identify, within free-form text describing routine visits, the occurrence of hospitalisations related to cardiovascular diseases (CVDs), an outcome of diabetes. Four time windows of increasing expected difficulty were considered: infinite, 24 months, 12 months, and 6 months. Model performance was evaluated by means of the area under the precision-recall curve, as well as precision, recall, and F1-score after thresholding. Results showed that the proposed NLP approach was successful for both the infinite and 24-month windows, while, as expected, performance deteriorated with shorter time windows. Possible clinical applications of tools based on the proposed NLP approach include the retrospective filling of medical records with respect to a patient's CVD history for epidemiological and research purposes as well as for clinical decision making.
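The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical, and the area under the precision-recall curve is approximated here by average precision, which is one common step-wise estimator and assumes no tied scores.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 after turning scores into 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision measured at the rank of each positive example,
    with examples ranked by descending score (assumes no tied scores).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Hypothetical toy data: 1 = visit text reports a CVD hospitalisation.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

precision, recall, f1 = precision_recall_f1(y_true, scores, threshold=0.5)
auprc = average_precision(y_true, scores)
```

Precision, recall and F1 depend on the chosen threshold, whereas the area under the precision-recall curve summarises performance across all thresholds, which is presumably why the abstract reports both kinds of metric.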
Affiliation(s)
- Alessandro Guazzo
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Enrico Longato
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Giovanni Sparacino
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, 35131, Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Legnaro, Italy
3
Tavazzi E, Longato E, Vettoretti M, Aidos H, Trescato I, Roversi C, Martins AS, Castanho EN, Branco R, Soares DF, Guazzo A, Birolo G, Pala D, Bosoni P, Chiò A, Manera U, de Carvalho M, Miranda B, Gromicho M, Alves I, Bellazzi R, Dagliati A, Fariselli P, Madeira SC, Di Camillo B. Artificial intelligence and statistical methods for stratification and prediction of progression in amyotrophic lateral sclerosis: A systematic review. Artif Intell Med 2023; 142:102588. [PMID: 37316101 DOI: 10.1016/j.artmed.2023.102588] [Received: 12/22/2022] [Revised: 04/14/2023] [Accepted: 05/16/2023] [Indexed: 06/16/2023]
Abstract
BACKGROUND Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterised by the progressive loss of motor neurons in the brain and spinal cord. The highly heterogeneous course of ALS and its incompletely known determinants, combined with the disease's relatively low prevalence, make the successful application of artificial intelligence (AI) techniques particularly arduous.
OBJECTIVE This systematic review aims to identify areas of agreement and unanswered questions regarding two notable applications of AI in ALS, namely the automatic, data-driven stratification of patients according to their phenotype, and the prediction of ALS progression. Unlike previous works, this review focuses on the methodological landscape of AI in ALS.
METHODS We conducted a systematic search of the Scopus and PubMed databases, looking for studies on data-driven stratification methods based on unsupervised techniques resulting in (A) automatic group discovery or (B) a transformation of the feature space allowing patient subgroups to be identified; and for studies on internally or externally validated methods for the prediction of ALS progression. We described the selected studies according to the following characteristics, when applicable: variables used, methodology, splitting criteria and number of groups, prediction outcomes, validation schemes, and metrics.
RESULTS Of the starting 1604 unique reports (2837 combined hits between Scopus and PubMed), 239 were selected for thorough screening, leading to the inclusion of 15 studies on patient stratification, 28 on prediction of ALS progression, and 6 on both stratification and prediction. In terms of variables used, most stratification and prediction studies included demographics and features derived from the ALSFRS or ALSFRS-R scores, which were also the main prediction targets. The most represented stratification methods were K-means, hierarchical clustering, and expectation-maximisation clustering, while random forests, logistic regression, the Cox proportional hazards model, and various flavours of deep learning were the most widely used prediction methods. Unexpectedly, predictive model validation was quite rarely performed in absolute terms (leading to the exclusion of 78 eligible studies), with the overwhelming majority of included studies resorting to internal validation only.
CONCLUSION This systematic review highlighted a general agreement in terms of input variable selection for both stratification and prediction of ALS progression, and in terms of prediction targets. A striking lack of validated models emerged, as well as a general difficulty in reproducing many published studies, mainly due to the absence of the corresponding parameter lists. While deep learning seems promising for prediction applications, its superiority with respect to traditional methods has not been established; there is, instead, ample room for its application in the subfield of patient stratification. Finally, an open question remains on the role of new environmental and behavioural variables collected via novel, real-time sensors.
Affiliation(s)
- Erica Tavazzi
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Enrico Longato
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Martina Vettoretti
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Helena Aidos
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Isotta Trescato
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Chiara Roversi
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Andreia S Martins
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Eduardo N Castanho
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Ruben Branco
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Diogo F Soares
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Alessandro Guazzo
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
- Giovanni Birolo
  - Department of Medical Sciences, University of Torino, Corso Dogliotti 14, Turin, 10126, Italy
- Daniele Pala
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, Pavia, 27100, Italy
- Pietro Bosoni
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, Pavia, 27100, Italy
- Adriano Chiò
  - Department of Neurosciences "Rita Levi Montalcini", University of Turin, Via Cherasco 15, Turin, 10126, Italy
- Umberto Manera
  - Department of Neurosciences "Rita Levi Montalcini", University of Turin, Via Cherasco 15, Turin, 10126, Italy
- Mamede de Carvalho
  - Faculdade de Medicina, Instituto de Medicina Molecular João Lobo Antunes, Universidade de Lisboa, Av. Prof. Egas Moniz, Lisbon, 1649-028, Portugal
- Bruno Miranda
  - Faculdade de Medicina, Instituto de Medicina Molecular João Lobo Antunes, Universidade de Lisboa, Av. Prof. Egas Moniz, Lisbon, 1649-028, Portugal
- Marta Gromicho
  - Faculdade de Medicina, Instituto de Medicina Molecular João Lobo Antunes, Universidade de Lisboa, Av. Prof. Egas Moniz, Lisbon, 1649-028, Portugal
- Inês Alves
  - Faculdade de Medicina, Instituto de Medicina Molecular João Lobo Antunes, Universidade de Lisboa, Av. Prof. Egas Moniz, Lisbon, 1649-028, Portugal
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, Pavia, 27100, Italy
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, Pavia, 27100, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Corso Dogliotti 14, Turin, 10126, Italy
- Sara C Madeira
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/b, Padua, 35131, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Agripolis, Viale dell'Università, 16, Legnaro (PD), 35020, Italy
4
Guazzo A, Longato E, Morieri ML, Sparacino G, Franco-Novelletto B, Cancian M, Fusello M, Tramontan L, Battaggia A, Avogaro A, Fadini GP, Di Camillo B. Performance assessment across different care settings of a heart failure hospitalisation risk-score for type 2 diabetes using administrative claims. Sci Rep 2022; 12:7762. [PMID: 35545655 PMCID: PMC9095603 DOI: 10.1038/s41598-022-11758-9] [Received: 08/13/2021] [Accepted: 04/19/2022] [Indexed: 11/25/2022] Open
Abstract
Predicting the risk of cardiovascular complications, in particular heart failure hospitalisation (HHF), can improve the management of type 2 diabetes (T2D). Most predictive models proposed so far rely on clinical data that are not available at the higher institutional level. It is therefore of interest to assess the risk of HHF in people with T2D using administrative claims data only, which are more easily obtainable and could allow public health systems to identify high-risk individuals. In this paper, the administrative claims of >175,000 patients with T2D were used to develop a new risk score for HHF based on Cox regression. Internal validation on the administrative data cohort yielded satisfactory results in terms of discrimination (max AUROC = 0.792, C-index = 0.786) and calibration (Hosmer-Lemeshow test p value < 0.05). The risk score was then tested on data gathered from two independent centers (one diabetes outpatient clinic and one primary care network) to demonstrate its applicability to different care settings in the medium-long term. Thanks to the large size and broad demographics of the administrative dataset used for training, the proposed model was able to predict HHF without significant performance loss compared with bespoke models developed within each setting using more informative, but harder-to-acquire, clinical variables.
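The C-index reported above measures how well a risk score ranks patients by time to event. The following is a minimal Python sketch of Harrell's concordance index, one common definition; the abstract does not state which estimator the authors used, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    strictly before patient j's last follow-up time. The pair is concordant
    when the patient with the earlier event has the higher risk score; tied
    risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: C-index is undefined")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, 1 = HHF observed, 0 = censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is quadratic in cohort size, so for a cohort as large as the one described in the abstract a survival-analysis library would be the practical choice.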
Affiliation(s)
- Alessandro Guazzo
  - Department of Information Engineering, University of Padova, 35122, Padua, Italy
- Enrico Longato
  - Department of Information Engineering, University of Padova, 35122, Padua, Italy
- Giovanni Sparacino
  - Department of Information Engineering, University of Padova, 35122, Padua, Italy
- Bruno Franco-Novelletto
  - Scuola Veneta di Medicina Generale (SVEMG), Padua, Italy
  - Società Italiana di Medicina Generale e delle Cure Primarie (SIMG), Florence, Italy
- Maurizio Cancian
  - Scuola Veneta di Medicina Generale (SVEMG), Padua, Italy
  - Società Italiana di Medicina Generale e delle Cure Primarie (SIMG), Florence, Italy
- Lara Tramontan
  - Arsenàl.IT, Veneto's Research Centre for eHealth Innovation, 31100, Treviso, Italy
- Alessandro Battaggia
  - Scuola Veneta di Medicina Generale (SVEMG), Padua, Italy
  - Società Italiana di Medicina Generale e delle Cure Primarie (SIMG), Florence, Italy
- Angelo Avogaro
  - Department of Medicine, University of Padova, 35128, Padua, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, 35122, Padua, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, 35020, Legnaro, PD, Italy