1
Faviez C, Chen X, Garcelon N, Zaidan M, Billot K, Petzold F, Faour H, Douillet M, Rozet JM, Cormier-Daire V, Attié-Bitach T, Lyonnet S, Saunier S, Burgun A. Objectivizing issues in the diagnosis of complex rare diseases: lessons learned from testing existing diagnosis support systems on ciliopathies. BMC Med Inform Decis Mak 2024; 24:134. [PMID: 38789985 PMCID: PMC11127295 DOI: 10.1186/s12911-024-02538-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 05/17/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND There are approximately 8,000 different rare diseases that affect roughly 400 million people worldwide. Many of them suffer from delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by a significant phenotypic and genetic heterogeneity that raises an important challenge for clinical diagnosis. Diagnosis support systems (DSS) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance to improve patients' care. Our objective was to evaluate three online-accessible rare disease DSSs using phenotypes derived from EHRs for the diagnosis of ciliopathies. METHODS Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to diagnose cases in contrast to controls based on Orphanet ontology. RESULTS A total of 79 cases and 38 controls were selected. Performances of the DSSs on ciliopathy real world data (best DSS with area under the ROC curve = 0.72) were not as good as published performances on the test set used in the DSS development phase. None of these systems obtained results which could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathy generally affected multiple organs and had overlapping phenotypes. Four challenges need to be considered to improve the performances: to make the DSSs interoperable with EHR systems, to validate the performances in real-life settings, to deal with data quality, and to leverage methods and resources for rare and complex diseases. 
CONCLUSION Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are not only beneficial for ciliopathy diagnosis but also relevant to the enhancement of DSSs for various complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and ultimately make more patients eligible for treatment.
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Université Paris Cité, Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Nicolas Garcelon
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Mohamad Zaidan
  - Service de Néphrologie, Dialyse et Transplantation, Hôpital Universitaire Bicêtre, Assistance Publique-Hôpitaux de Paris (AP-HP), Kremlin Bicêtre, F-94270, France
- Katy Billot
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Friederike Petzold
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Division of Nephrology, University of Leipzig Medical Center, Leipzig, Germany
- Hassan Faour
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Maxime Douillet
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Jean-Michel Rozet
  - Laboratory of Genetics in Ophthalmology, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Valérie Cormier-Daire
  - Reference Centre for Constitutional Bone Diseases, laboratory of Osteochondrodysplasia, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Tania Attié-Bitach
  - Service d'Histologie-Embryologie-Cytogénétique, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Stanislas Lyonnet
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
  - Laboratory of Embryology and Genetics of Congenital Malformations, INSERM UMR 1163, Imagine Institute, Paris Cité, Paris, F-75015, France
- Sophie Saunier
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Department of Medical Informatics, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
2
Faviez C, Vincent M, Garcelon N, Boyer O, Knebelmann B, Heidet L, Saunier S, Chen X, Burgun A. Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity. Orphanet J Rare Dis 2024; 19:55. [PMID: 38336713 PMCID: PMC10858490 DOI: 10.1186/s13023-024-03063-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 02/03/2024] [Indexed: 02/12/2024] Open
Abstract
BACKGROUND Rare diseases affect approximately 400 million people worldwide. Many of them suffer from delayed diagnosis. Among them, NPHP1-related renal ciliopathies need to be diagnosed as early as possible as potential treatments have been recently investigated with promising results. Our objective was to develop a supervised machine learning pipeline for the detection of NPHP1 ciliopathy patients from a large number of nephrology patients using electronic health records (EHRs). METHODS AND RESULTS We designed a pipeline combining a phenotyping module re-using unstructured EHR data, a semantic similarity module to address the phenotype dependence, a feature selection step to deal with high dimensionality, an undersampling step to address the class imbalance, and a classification step with multiple train-test split for the small number of rare cases. The pipeline was applied to thirty NPHP1 patients and 7231 controls and achieved good performances (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions. CONCLUSIONS Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.
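The undersampling step named in this abstract can be illustrated with a minimal sketch. It assumes plain random undersampling of the control class to a 1:1 ratio; the abstract does not state which undersampling method or ratio the authors used, and the function name `undersample`, the `ratio` and `seed` parameters, and the placeholder patient identifiers are all hypothetical.

```python
import random


def undersample(cases, controls, ratio=1, seed=0):
    """Randomly keep at most ratio * len(cases) controls (hypothetical helper).

    This is plain random undersampling, chosen for illustration only;
    it is not taken from the cited paper.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_keep = min(len(controls), ratio * len(cases))
    return list(cases), rng.sample(controls, n_keep)


# Placeholder identifiers sized like the cohort in the abstract
# (30 NPHP1 patients, 7231 controls); no real patient data is involved.
cases = [f"case_{i}" for i in range(30)]
controls = [f"control_{i}" for i in range(7231)]

kept_cases, kept_controls = undersample(cases, controls)
print(len(kept_cases), len(kept_controls))
```

Repeating this with different seeds gives one balanced training set per train-test split, which is one plausible way to combine undersampling with the multiple splits the abstract mentions.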
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
- Marc Vincent
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Nicolas Garcelon
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Olivia Boyer
  - Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
  - Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Bertrand Knebelmann
  - Nephrology and Transplantation Department, MARHEA, Hôpital Necker-Enfants Malades, AP-HP, Université Paris Cité, 75015, Paris, France
- Laurence Heidet
  - Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
- Sophie Saunier
  - Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Département d'informatique Médicale, Hôpital Necker-Enfants Malades, AP-HP, 75015, Paris, France
3
Stasolla F, Passaro A, Di Gioia M, Curcio E, Zullo A. Combined extended reality and reinforcement learning to promote healthcare and reduce social anxiety in fragile X syndrome: a new assessment tool and a rehabilitative strategy. Front Psychol 2023; 14:1273117. [PMID: 38179497 PMCID: PMC10765535 DOI: 10.3389/fpsyg.2023.1273117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 11/30/2023] [Indexed: 01/06/2024] Open
Affiliation(s)
- Anna Passaro
  - University “Giustino Fortunato” of Benevento, Benevento, Italy
- Enza Curcio
  - University “Giustino Fortunato” of Benevento, Benevento, Italy
4
Tinker RJ, Peterson J, Bastarache L. Phenotypic presentation of Mendelian disease across the diagnostic trajectory in electronic health records. Genet Med 2023; 25:100921. [PMID: 37337966 PMCID: PMC11092403 DOI: 10.1016/j.gim.2023.100921] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 06/21/2023] Open
Abstract
PURPOSE To investigate the phenotypic presentation of Mendelian disease across the diagnostic trajectory in the electronic health record (EHR). METHODS We applied a conceptual model to delineate the diagnostic trajectory of Mendelian disease to the EHRs of patients affected by 1 of 9 Mendelian diseases. We assessed data availability and phenotype ascertainment across the diagnostic trajectory using phenotype risk scores and validated our findings via chart review of patients with hereditary connective tissue disorders. RESULTS We identified 896 individuals with genetically confirmed diagnoses, 216 (24%) of whom had fully ascertained diagnostic trajectories. Phenotype risk scores increased following clinical suspicion and diagnosis (P < 1 × 10⁻⁴, Wilcoxon rank sum test). We found that of all International Classification of Disease-based phenotypes in the EHR, 66% were recorded after clinical suspicion, and manual chart review yielded consistent results. CONCLUSION Using a novel conceptual model to study the diagnostic trajectory of genetic disease in the EHR, we demonstrated that phenotype ascertainment is, in large part, driven by the clinical examinations and studies prompted by clinical suspicion of a genetic disease, a process we term diagnostic convergence. Algorithms designed to detect undiagnosed genetic disease should consider censoring EHR data at the first date of clinical suspicion to avoid data leakage.
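The censoring recommendation in this abstract can be illustrated with a minimal sketch. It assumes each EHR entry is a `(date, code)` pair and that "censoring at the first date of clinical suspicion" means keeping only entries dated strictly before that date; the abstract does not specify the data layout or whether the suspicion date itself is included. The function name `censor_at_suspicion` and the placeholder codes are hypothetical.

```python
from datetime import date


def censor_at_suspicion(records, suspicion_date):
    """Keep only records dated strictly before the suspicion date.

    Hypothetical helper: excluding the suspicion date itself is an
    assumption made for this sketch, not a detail from the cited paper.
    """
    return [(day, code) for day, code in records if day < suspicion_date]


# Placeholder records; the codes are illustrative labels, not real diagnoses.
records = [
    (date(2019, 3, 1), "code_A"),
    (date(2020, 6, 15), "code_B"),
    (date(2021, 1, 10), "code_C"),
]

censored = censor_at_suspicion(records, date(2020, 6, 15))
print(censored)
```

Features for a detection model would then be built from `censored` only, so that phenotypes documented because of the diagnostic work-up cannot leak into training.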
Affiliation(s)
- Rory J Tinker
  - Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN
- Josh Peterson
  - Vanderbilt University Medical Center, Department of Medicine, Nashville, TN; Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN
- Lisa Bastarache
  - Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN
5
Surianarayanan C, Lawrence JJ, Chelliah PR, Prakash E, Hewage C. Convergence of Artificial Intelligence and Neuroscience towards the Diagnosis of Neurological Disorders-A Scoping Review. Sensors (Basel) 2023; 23:3062. [PMID: 36991773 PMCID: PMC10053494 DOI: 10.3390/s23063062] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/25/2023] [Revised: 03/09/2023] [Accepted: 03/09/2023] [Indexed: 06/19/2023]
Abstract
Artificial intelligence (AI) is a field of computer science that deals with the simulation of human intelligence using machines so that such machines gain problem-solving and decision-making capabilities similar to those of the human brain. Neuroscience is the scientific study of the structure and cognitive functions of the brain. Neuroscience and AI are mutually interrelated. These two fields help each other in their advancements. The theory of neuroscience has brought many distinct improvisations into the AI field. The biological neural network has led to the realization of complex deep neural network architectures that are used to develop versatile applications, such as text processing, speech recognition, object detection, etc. Additionally, neuroscience helps to validate the existing AI-based models. Reinforcement learning in humans and animals has inspired computer scientists to develop algorithms for reinforcement learning in artificial systems, which enables those systems to learn complex strategies without explicit instruction. Such learning helps in building complex applications, like robot-based surgery, autonomous vehicles, gaming applications, etc. In turn, with its ability to intelligently analyze complex data and extract hidden patterns, AI fits as a perfect choice for analyzing neuroscience data that are very complex. Large-scale AI-based simulations help neuroscientists test their hypotheses. Through an interface with the brain, an AI-based system can extract the brain signals and commands that are generated according to the signals. These commands are fed into devices, such as a robotic arm, which helps in the movement of paralyzed muscles or other human parts. AI has several use cases in analyzing neuroimaging data and reducing the workload of radiologists. The study of neuroscience helps in the early detection and diagnosis of neurological disorders.
In the same way, AI can effectively be applied to the prediction and detection of neurological disorders. Thus, in this paper, a scoping review has been carried out on the mutual relationship between AI and neuroscience, emphasizing the convergence between AI and neuroscience in order to detect and predict various neurological disorders.
Affiliation(s)
- Edmond Prakash
  - Research Center for Creative Arts, University for the Creative Arts (UCA), Farnham GU9 7DS, UK
- Chaminda Hewage
  - Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff CF5 2YB, UK
6
Tinker RJ, Peterson J, Bastarache L. Phenotypic convergence: a novel phenomenon in the diagnostic process of Mendelian genetic disorders. medRxiv 2023:2023.01.17.23284691. [PMID: 36711865 PMCID: PMC9882467 DOI: 10.1101/2023.01.17.23284691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
Introduction The study of Mendelian disease has yielded a large body of knowledge about the phenotypic presentation of disease. Less is known about the way the diseases are reflected in the electronic health record (EHR). Aim To develop an EHR-based model of the diagnostic trajectory and investigate data availability and the longitudinal distribution of signs and symptoms of a Mendelian disorder within EHRs. Methods We created a conceptual model to specify key time points of the diagnostic trajectory and applied it to individuals with genetically confirmed hereditary connective tissue diseases (HCTD). Using the model, we assessed EHR data availability within each time interval. We tested the performance of phenotype risk scores (PheRS), an algorithm that detects Mendelian disease patterns and assessed the phenotypic expression of HCTD over the diagnostic trajectory. Results We identified 251 individuals with HCTD; 79 (35%) of these patients had a fully ascertained diagnostic trajectory. There were few documented signs and symptoms prior to clinical suspicion that evoked an HCTD disorder (median PheRS 0.14); once suspicion was documented, median PheRS increased to 1.87 (SD). The majority (72%) of phenotypic features were identified post clinical suspicion. Discussion Using a novel conceptual model for the diagnostic trajectory of Mendelian disease, we demonstrated that phenotype ascertainment is, in part, driven by the diagnostic process and that many findings are only documented following clinical suspicion and diagnosis, a process we term phenotypic convergence. Therefore, algorithms that aim to detect undiagnosed Mendelian disease should censor EHR data to avoid data leakage.