1
Guérin J, Hennocq Q, Paternoster G, Arnaud É, Khonsari RH. Distractor position and distraction amplitude in fronto-facial monobloc advancement: A case series. Journal of Stomatology, Oral and Maxillofacial Surgery 2024:101942. [PMID: 38897383 DOI: 10.1016/j.jormas.2024.101942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/21/2024]
Abstract
Fronto-facial monobloc advancement with internal distraction (FFMBA) is a central procedure in the management of faciocraniosynostoses. In techniques with internal distraction, two sets of devices are generally positioned: bilateral fronto-orbital and temporo-zygomatic distractors, using a temporal tongue and groove osteotomy design. It is believed that distractors must be positioned as parallel as possible in the horizontal and sagittal planes to avoid mechanical conflicts between the sliding bone fragments of the tongue and groove during distraction, and thus optimize the advancement amplitude. Several approaches involving surgical planning and guides for distractor positioning have thus been proposed to monitor distractor placement. To explore the need for surgical planning in distractor placement, we assessed the parallelism of the 4 distractors in 19 FFMBA procedures and correlated a set of 10 distractor angles with the degree of advancement. We report that the horizontal cut of the tongue and groove can be used as a landmark for the positioning of the lower (temporo-zygomatic) distractor in fronto-facial monobloc advancement. Other parameters (relative position of the two homolateral and the two contralateral distractors and the orientations of the vertical and horizontal cuts of the tongue and groove) do not interfere with distraction, other things being equal. Our results indicate that distractor orientation is not a critical issue in fronto-facial monobloc advancement when devices are positioned as parallel as possible based on visual monitoring.
Affiliation(s)
- Jade Guérin
- Laboratoire 'Forme et Croissance du Crâne', Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Paris, France
- Quentin Hennocq
- Laboratoire 'Forme et Croissance du Crâne', Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Paris, France; Unité fonctionnelle de chirurgie craniofaciale, Service de neurochirurgie pédiatrique, CRMR CRANIOST, Filière TeteCou, Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Faculté de Médecine, Université Paris Cité, Paris, France
- Giovanna Paternoster
- Unité fonctionnelle de chirurgie craniofaciale, Service de neurochirurgie pédiatrique, CRMR CRANIOST, Filière TeteCou, Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Faculté de Médecine, Université Paris Cité, Paris, France
- Éric Arnaud
- Unité fonctionnelle de chirurgie craniofaciale, Service de neurochirurgie pédiatrique, CRMR CRANIOST, Filière TeteCou, Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Faculté de Médecine, Université Paris Cité, Paris, France; Clinique Marcel Sembat, Ramsay, Boulogne-Billancourt, France
- Roman Hossein Khonsari
- Laboratoire 'Forme et Croissance du Crâne', Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Paris, France; Unité fonctionnelle de chirurgie craniofaciale, Service de neurochirurgie pédiatrique, CRMR CRANIOST, Filière TeteCou, Hôpital Necker - Enfants Malades, Assistance Publique - Hôpitaux de Paris, Faculté de Médecine, Université Paris Cité, Paris, France.
2
Wang A, Liu C, Yang J, Weng C. Fine-tuning Large Language Models for Rare Disease Concept Normalization. bioRxiv: the preprint server for biology 2024:2023.12.28.573586. [PMID: 38234802 PMCID: PMC10793431 DOI: 10.1101/2023.12.28.573586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Objective We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN is 10.2% and 36.1%, respectively, but increases to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion Our fine-tuned models demonstrate ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLM to identify named medical entities from the clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
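The template-based corpus generation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' templates or script, so the sentence templates, the `build_corpora` helper, and the toy concept table below are all hypothetical; only the overall design (a NAME corpus with standardized names, and a NAME+SYN corpus that adds half of each concept's synonyms) comes from the text.

```python
import random

# Hypothetical templates; the paper's actual templates are not stated in the abstract.
TEMPLATES = [
    "The Human Phenotype Ontology identifier of {term} is {hpo_id}.",
    "{term} is assigned the HPO ID {hpo_id}.",
]


def build_corpora(concepts, seed=0):
    """Return (name_corpus, name_syn_corpus, held_out_synonyms).

    concepts maps an HPO identifier to (standard name, list of synonyms).
    Half of each concept's synonyms go into the NAME+SYN corpus; the other
    half is held out so it can serve as "unseen synonym" test terms.
    """
    rng = random.Random(seed)
    name_corpus, name_syn_corpus, held_out = [], [], []
    for hpo_id, (name, synonyms) in concepts.items():
        name_sentences = [t.format(term=name, hpo_id=hpo_id) for t in TEMPLATES]
        name_corpus.extend(name_sentences)
        name_syn_corpus.extend(name_sentences)
        shuffled = list(synonyms)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for synonym in shuffled[:half]:
            name_syn_corpus.extend(
                t.format(term=synonym, hpo_id=hpo_id) for t in TEMPLATES
            )
        held_out.extend((synonym, hpo_id) for synonym in shuffled[half:])
    return name_corpus, name_syn_corpus, held_out


# Toy concept table for illustration only, not an excerpt of the HPO release.
concepts = {
    "HP:0001250": ("Seizure", ["Seizures", "Epileptic seizure", "Fits", "Convulsions"]),
}
name_corpus, name_syn_corpus, held_out = build_corpora(concepts)
```

Holding out the second half of the synonyms mirrors the evaluation on "unseen synonyms" reported in the abstract, although the actual split procedure used by the authors may differ.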
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
3
Wang A, Liu C, Yang J, Weng C. Fine-tuning large language models for rare disease concept normalization. J Am Med Inform Assoc 2024:ocae133. [PMID: 38829731 DOI: 10.1093/jamia/ocae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/05/2024] Open
Abstract
OBJECTIVE We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). METHODS We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. RESULTS When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN is 10.2% and 36.1%, respectively, but increases to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. CONCLUSION Our fine-tuned models demonstrate ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
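The robustness test with single-character typos can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not say which edit types were used, so `add_single_char_typo` applies one random substitution, deletion, or insertion, and the exact-match lookup stands in for a model purely to show how accuracy would be scored. The two-entry lookup table is a toy example.

```python
import random
import string


def add_single_char_typo(term, rng):
    """Apply one random substitution, deletion, or insertion to a non-empty term."""
    pos = rng.randrange(len(term))
    kind = rng.choice(["substitute", "delete", "insert"])
    letter = rng.choice(string.ascii_lowercase)
    if kind == "substitute":
        # Redraw until the new letter differs from the one being replaced.
        while letter == term[pos].lower():
            letter = rng.choice(string.ascii_lowercase)
        return term[:pos] + letter + term[pos + 1:]
    if kind == "delete":
        return term[:pos] + term[pos + 1:]
    return term[:pos] + letter + term[pos:]


def accuracy(predict, pairs):
    """Fraction of (term, expected_id) pairs for which predict(term) is correct."""
    hits = sum(1 for term, expected in pairs if predict(term) == expected)
    return hits / len(pairs)


# Toy lookup table for illustration only.
lookup = {"seizure": "HP:0001250", "ataxia": "HP:0001251"}


def exact_match(term):
    """Hypothetical baseline: case-insensitive dictionary lookup."""
    return lookup.get(term.lower())


clean_pairs = [("Seizure", "HP:0001250"), ("Ataxia", "HP:0001251")]
rng = random.Random(0)
typo_pairs = [(add_single_char_typo(term, rng), hpo_id) for term, hpo_id in clean_pairs]
```

Any single-character edit defeats the exact-match baseline, which is why a drop in accuracy on perturbed terms is a useful probe of how much a fine-tuned model relies on surface form.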
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ 08520, United States
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
4
Lo Barco T, Garcelon N, Neuraz A, Nabbout R. Natural history of rare diseases using natural language processing of narrative unstructured electronic health records: The example of Dravet syndrome. Epilepsia 2024; 65:350-361. [PMID: 38065926 DOI: 10.1111/epi.17855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2023] [Revised: 12/07/2023] [Accepted: 12/07/2023] [Indexed: 12/31/2023]
Abstract
OBJECTIVE The increasing implementation of electronic health records allows the use of advanced text-mining methods for establishing new patient phenotypes and stratification, and for revealing outcome correlations. In this study, we aimed to explore the electronic narrative clinical reports of a cohort of patients with Dravet syndrome (DS) longitudinally followed at our center, to identify the capacity of this methodology to retrace the natural history of DS during the early years. METHODS We used a document-based clinical data warehouse employing natural language processing to recognize the phenotype concepts in the narrative medical reports. We included patients with DS who had a medical report produced before the age of 2 years and a follow-up after the age of 3 years ("DS cohort," 56 individuals). We selected two control populations, a "general control cohort" (275 individuals) and a "neurological control cohort" (281 individuals), with similar characteristics in terms of gender, number of reports, and age at last report. To find concepts specifically associated with DS, we performed a phenome-wide association study using Cox regression, comparing the reports of the three cohorts. We then performed a qualitative analysis of the surviving concepts based on their median age at first appearance. RESULTS A total of 76 concepts were prevalent in the reports of children with DS. Concepts appearing during the first 2 years were mostly related to the epilepsy features at the onset of DS (convulsive and prolonged seizures triggered by fever, often requiring in-hospital care). Subsequently, concepts related to new types of seizures and to drug resistance appeared. A series of non-seizure-related concepts emerged after the age of 2-3 years, referring to the nonseizure comorbidities classically associated with DS.
SIGNIFICANCE The extraction of clinical terms from narrative reports of children with DS allows outlining the known natural history of this rare disease in early childhood. This original model of "longitudinal phenotyping" could be applied to other rare and very rare conditions with poor natural history description.
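The "median age at first appearance" used to order concepts along the natural history can be sketched in a few lines. The record layout `(patient_id, concept, age_in_years)`, the function name, and the toy data are assumptions made for illustration; the abstract does not describe the data warehouse export format.

```python
from collections import defaultdict
from statistics import median


def median_age_at_first_appearance(mentions):
    """Map each concept to the median, across patients, of the age at first mention.

    mentions is an iterable of (patient_id, concept, age_in_years) tuples,
    one per report in which the concept was recognized.
    """
    first_seen = {}
    for patient_id, concept, age in mentions:
        key = (patient_id, concept)
        if key not in first_seen or age < first_seen[key]:
            first_seen[key] = age
    ages_by_concept = defaultdict(list)
    for (_, concept), age in first_seen.items():
        ages_by_concept[concept].append(age)
    return {concept: median(ages) for concept, ages in ages_by_concept.items()}


# Toy data for illustration only.
mentions = [
    ("p1", "febrile seizure", 0.6),
    ("p1", "febrile seizure", 1.1),
    ("p2", "febrile seizure", 0.8),
    ("p3", "febrile seizure", 1.0),
    ("p1", "ataxia", 3.5),
    ("p2", "ataxia", 4.5),
]
timeline = median_age_at_first_appearance(mentions)
ordered_concepts = sorted(timeline, key=timeline.get)
```

Sorting concepts by this median gives a simple chronological view of when each clinical feature tends to enter the record.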
Affiliation(s)
- Tommaso Lo Barco
- Department of Pediatric Neurology, Necker-Enfants Malades Hospital, Assistance Publique-Hôpitaux de Paris, Reference Center for Rare Epilepsies, Member of European Reference Network EpiCARE, Université Paris Cité, Paris, France
- Nicolas Garcelon
- Data Science Platform, Institut National de la Santé et de la Recherche Médicale Unité Mixte de Recherche 1163, Imagine Institute, Université Paris Cité, Paris, France
- Antoine Neuraz
- Data Science Platform, Institut National de la Santé et de la Recherche Médicale Unité Mixte de Recherche 1163, Imagine Institute, Université Paris Cité, Paris, France
- Rima Nabbout
- Department of Pediatric Neurology, Necker-Enfants Malades Hospital, Assistance Publique-Hôpitaux de Paris, Reference Center for Rare Epilepsies, Member of European Reference Network EpiCARE, Université Paris Cité, Paris, France
- Translational Research for Neurological Disorders, Institut National de la Santé et de la Recherche Médicale Unité Mixte de Recherche 1163, Imagine Institute, Université Paris Cité, Paris, France
5
Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
Affiliation(s)
- Adrien Bazoge
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
6
Lovis C, Mageau A, Mékinian A, Tannier X, Carrat F. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study. JMIR Med Inform 2022; 10:e42379. [PMID: 36534446 PMCID: PMC9808583 DOI: 10.2196/42379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Revised: 10/17/2022] [Accepted: 10/22/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English. OBJECTIVE We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases. METHODS Our multistep algorithm includes a named-entity recognition step, a multilabel classification using medical subject headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. We evaluated the 3 patients closest to the index patient for each phenotype with precision-at-3, recall, and average precision. RESULTS For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes. CONCLUSIONS Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
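The ranking metrics named in this abstract can be sketched as below. The sketch assumes that, for one index patient, the other patients have already been sorted by decreasing similarity and flagged as sharing the phenotype or not; the similarity computation itself is not reproduced, and the exact metric definitions used by the authors are not stated in the abstract. Here average precision is normalized by the number of relevant patients in the ranked list.

```python
def precision_at_k(ranked_labels, k=3):
    """Share of relevant patients among the k most similar ones."""
    top = ranked_labels[:k]
    return sum(top) / len(top) if top else 0.0


def average_precision(ranked_labels):
    """Mean of the precision values measured at the rank of each relevant patient."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy ranking for one index patient: True means the neighbour shares the phenotype.
ranked = [True, False, True, True, False]
p_at_3 = precision_at_k(ranked, k=3)
ap = average_precision(ranked)
```

For the toy ranking, two of the three closest patients share the phenotype, so precision-at-3 is 2/3, and the average precision is (1/1 + 2/3 + 3/4) / 3.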
Affiliation(s)
- Arthur Mageau
- Institut National de la Santé et de la Recherche Médicale, Unité Mixte de Recherche 1137 Infection Antimicrobials Modelling Evolution, Team Decision Sciences in Infectious Diseases, Université Paris Cité, Paris, France
- Arsène Mékinian
- Service de Médecine Interne, Inflammation-Immunopathology-Biotherapy Department, Hôpital Saint-Antoine, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Paris, France
- Xavier Tannier
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, Institut National de la Santé et de la Recherche Médicale, Université Sorbonne, Paris, France
- Fabrice Carrat
- Institute Pierre Louis Epidemiology and Public Health, Institut National de la Santé et de la Recherche Médicale, Sorbonne Université, Paris, France; Public Health Department, Hopital Saint-Antoine, Assistance Publique-Hôpitaux de Paris, Paris, France
7
Friedlander L, Vincent M, Berdal A, Cormier-Daire V, Lyonnet S, Garcelon N. Consideration of oral health in rare disease expertise centres: a retrospective study on 39 rare diseases using text mining extraction method. Orphanet J Rare Dis 2022; 17:317. [PMID: 35987771 PMCID: PMC9392290 DOI: 10.1186/s13023-022-02467-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/04/2022] [Accepted: 08/13/2022] [Indexed: 11/13/2022] Open
Abstract
Background Around 8000 rare diseases are currently defined. In the context of individual vulnerability and more specifically the one induced by rare diseases, ensuring oral health is a particularly important issue. The objective of the study is to evaluate the pattern of oral health care course for patients with any rare genetic disease. Description of oral phenotypic signs—which predict a theoretical dental health care course—and effective orientation into an oral healthcare were evaluated.
Materials and methods We set up a retrospective cohort study to describe the consideration of oral health, and the potential orientation to an oral health care course, for patients seen at least once between 1 January 2017 and 1 January 2020 at Necker-Enfants Malades Hospital. We recruited patients for this study using the data warehouse, Dr Warehouse® (DrWH), from Necker-Enfants Malades Hospital.
Results The study sample included 39 rare diseases and 2712 patients, with 54.7% girls and 45.3% boys. In the sample studied, 27.9% of patients had an acquisition delay or a pervasive developmental disorder. Among the patient files studied, oral and dental phenotypic signs were described for 18.40% of the patients, and an orientation to oral health care was made for 15.60% of patients. The overall "network" effect was significantly associated with description of phenotypic signs (corrected p = 1.44e−77) and orientation to oral health care (corrected p = 23.58e−44). Taking the Defiscience network (rare diseases of cerebral development and intellectual disability) as a reference for the odds ratio analysis, the OSCAR, TETECOU, FILNEMUS, FIMARAD, and MHEMO networks stand out from the other networks for their significantly higher consideration of oral phenotypic signs and orientation to oral health care.
Conclusion To our knowledge, no study has explored the management of oral health in so many rare diseases. The expected benefits of this study are, among others, a better understanding and a better knowledge of oral care, or at least of the consideration of oral care, in patients with rare diseases. Moreover, given the aim of improving knowledge of genetic diseases, oral health must have a major place in deep patient phenotyping. Therefore, interdisciplinary consultations with health professionals from different fields are crucial.
8
The prediction of hospital length of stay using unstructured data. BMC Med Inform Decis Mak 2021; 21:351. [PMID: 34922532 PMCID: PMC8684269 DOI: 10.1186/s12911-021-01722-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/24/2021] [Accepted: 12/13/2021] [Indexed: 11/10/2022] Open
Abstract
Objective This study aimed to assess the performance improvement for machine learning-based hospital length of stay (LOS) predictions when clinical signs written in text are accounted for and compared to the traditional approach of solely considering structured information such as age, gender and major ICD diagnosis.
Methods This study was an observational retrospective cohort study and analyzed patient stays admitted between 1 January and 24 September 2019. For each stay, a patient was admitted through the Emergency Department (ED) and stayed for more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). A word-embedding algorithm based on UMLS terminology with exact matching restricted to patient-centric affirmation sentences was used to assess the EHR data. The second model was primarily based on structured data in the form of diagnoses coded from the International Classification of Disease 10th Edition (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of data and performance was evaluated by accuracy on the remaining 20% test data.
Results The model using unstructured data had a 75.0% accuracy compared to 74.1% for the model containing structured data. The two models produced a similar prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of both models was also similar (76.3% vs 75.0%).
Conclusions LOS prediction using unstructured data had similar accuracy to using structured data and can be considered of use to accurately model LOS. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01722-4.
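The two quantities reported in the results, per-model accuracy and the share of cases where both models give the same prediction, can be computed as in this minimal sketch. The "short"/"long" class labels and the toy predictions are hypothetical; the abstract does not state how length of stay was categorized, and the random forest models themselves are not reproduced here.

```python
def accuracy(predicted, actual):
    """Fraction of test stays whose predicted class matches the observed class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(pred_a, pred_b):
    """Fraction of test stays on which two models output the same class."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy held-out test set for illustration only (hypothetical class labels).
observed = ["short", "long", "long", "short", "long"]
text_model = ["short", "long", "short", "short", "long"]
structured_model = ["short", "long", "long", "long", "long"]

acc_text = accuracy(text_model, observed)
acc_structured = accuracy(structured_model, observed)
same_prediction = agreement(text_model, structured_model)
```

The toy data show why both numbers are worth reporting: two models can reach the same accuracy (0.8 each here) while disagreeing on a substantial share of individual cases (agreement of 0.6).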
9
Barco TL, Kuchenbuch M, Garcelon N, Neuraz A, Nabbout R. Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome. Orphanet J Rare Dis 2021; 16:309. [PMID: 34256808 PMCID: PMC8278630 DOI: 10.1186/s13023-021-01936-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/13/2021] [Accepted: 06/27/2021] [Indexed: 12/01/2022] Open
Abstract
Background The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed until after 2 years of age, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of 2 years compared to those of individuals with FS. These concepts would allow an earlier detection of patients with DS, resulting in an earlier orientation toward expert centers that can provide early diagnosis and care. Methods Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing, a computer technology that processes written information. Using the Unified Medical Language System Meta-thesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) with confirmed diagnosis after the age of 4 years. A phenome-wide analysis was performed evaluating the statistical associations between the phenotypes of DS and FS, based on concepts found in the reports produced before 2 years and using a series of logistic regressions. Results We found a significantly higher representation of concepts related to seizure phenotypes distinguishing DS from FS in the first phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early onset non-seizure concepts also emerged, in relation to neurodevelopment and gait disorders.
Conclusions Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to decrease time of diagnosis of DS and could be transposed to other rare diseases.
Affiliation(s)
- Tommaso Lo Barco
- Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris, Paris, France; Child Neuropsychiatry, Department of Surgical Sciences, Dentistry, Gynecology and Pediatrics, University of Verona, Verona, Italy
- Mathieu Kuchenbuch
- Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris, Paris, France; Imagine Institute, INSERM, UMR 1163, Université de Paris, 75015, Paris, France
- Nicolas Garcelon
- Imagine Institute, INSERM, UMR 1163, Université de Paris, 75015, Paris, France
- Antoine Neuraz
- Université de Paris, Paris, France; INSERM, UMR1138, Centre de Recherche Des Cordeliers, Paris, France; Department of Medical Informatics, University Hospital Necker-Enfants Malades, APHP, Paris, France
- Rima Nabbout
- Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris, Paris, France; Imagine Institute, INSERM, UMR 1163, Université de Paris, 75015, Paris, France; Université de Paris, Paris, France
10
Bastard P, Galerne A, Lefevre-Utile A, Briand C, Baruchel A, Durand P, Landman-Parker J, Gouache E, Boddaert N, Moshous D, Gaudelus J, Cohen R, Deschenes G, Fischer A, Blanche S, de Pontual L, Neven B. Different Clinical Presentations and Outcomes of Disseminated Varicella in Children With Primary and Acquired Immunodeficiencies. Front Immunol 2021; 11:595478. [PMID: 33250898 PMCID: PMC7674974 DOI: 10.3389/fimmu.2020.595478] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/16/2020] [Accepted: 10/09/2020] [Indexed: 11/13/2022] Open
Abstract
Primary infection with varicella-zoster virus (VZV) causes chickenpox, a benign and self-limited disease in healthy children. In patients with primary or acquired immunodeficiencies, primary infection can be life-threatening, due to rapid dissemination of the virus to various organs [lung, gastrointestinal tract, liver, eye, central nervous system (CNS)]. We retrospectively described and compared the clinical presentations and outcomes of disseminated varicella infection (DV) in patients with acquired (AID) (n = 7) and primary (PID) (n = 12) immunodeficiencies. Patients with AID were on immunosuppression (mostly steroids) for nephrotic syndrome, solid organ transplantation or the treatment of hemopathies, whereas those with PID had combined immunodeficiency (CID) or severe CID (SCID). The course of the disease was severe and fulminant in patients with AID, with multiple organ failure, no rash or a delayed rash, whereas patients with CID and SCID presented typical signs of chickenpox, including a rash, with dissemination to other organs, including the lungs and CNS. In the PID group, antiviral treatment was prolonged until immune reconstitution after bone marrow transplantation, which was performed in 10/12 patients. Four patients died, and three experienced neurological sequelae. SCID patients had the worst outcome. Our findings highlight substantial differences in the clinical presentation and course of DV between children with AID and PID, suggesting differences in pathophysiology. Prevention, early diagnosis and treatment are required to improve outcome.
Affiliation(s)
- Paul Bastard
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France.,Service d'Immunologie et Hématologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Paris, France
| | - Aurélien Galerne
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France
| | - Alain Lefevre-Utile
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France.,INSERM U976-Human Systems Immunology and Inflammatory Networks, Institut de Recherche de Saint Louis, Paris, France.,Université de Paris, Paris, France
| | - Coralie Briand
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France
| | - André Baruchel
- Université de Paris, Paris, France.,Département d'Hématologie Pédiatrique, Hôpital Robert-Debré, AP-HP, Paris, France
| | - Philippe Durand
- Service de Réanimation Pédiatrique, Hôpital du Kremlin-Bicêtre, Kremlin-Bicêtre, France.,Université Paris XI, AP-HP, Paris.,Université Paris Saclay, Saint-Aubin, France
| | - Judith Landman-Parker
- Sorbonne Université, Service d'Hématologie Oncologie Pédiatrique, Hôpital Armand Trousseau, AP-HP, Paris, France
| | - Elodie Gouache
- Sorbonne Université, Service d'Hématologie Oncologie Pédiatrique, Hôpital Armand Trousseau, AP-HP, Paris, France
| | - Nathalie Boddaert
- Université de Paris, Paris, France.,Service de Radiologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Université de Paris, Paris, France.,INSERM U1163, Institut IMAGINE, Paris, France
| | - Despina Moshous
- Service d'Immunologie et Hématologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Paris, France.,Université de Paris, Paris, France.,INSERM U1163, Institut IMAGINE, Paris, France
| | - Joel Gaudelus
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France.,Sorbonne Paris Nord University, Bobigny, France
| | - Robert Cohen
- ACTIV Centre Hospitalier Intercommunal de Créteil, Créteil, France
| | - Georges Deschenes
- Service de Néphrologie Pédiatrique, Hôpital Robert-Debré, AP-HP, Paris, France
| | - Alain Fischer
- Service d'Immunologie et Hématologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Paris, France.,Université de Paris, Paris, France.,INSERM U1163, Institut IMAGINE, Paris, France.,Experimental Medicine, Collège de France, Paris, France
| | - Stéphane Blanche
- Service d'Immunologie et Hématologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Paris, France.,Université de Paris, Paris, France
| | - Loïc de Pontual
- Service de Pédiatrie, Hôpital Jean Verdier, Bondy, AP-HP (Assistance-Publique-Hôpitaux de Paris), France.,Sorbonne Paris Nord University, Bobigny, France
| | - Bénédicte Neven
- Service d'Immunologie et Hématologie Pédiatrique, Hôpital Necker Enfants Malades, AP-HP, Paris, France.,Université de Paris, Paris, France.,INSERM U1163, Institut IMAGINE, Paris, France
|
11
|
Zhan X, Humbert-Droz M, Mukherjee P, Gevaert O. Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases. PATTERNS (NEW YORK, N.Y.) 2021; 2:100289. [PMID: 34286303 PMCID: PMC8276012 DOI: 10.1016/j.patter.2021.100289] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Revised: 02/24/2021] [Accepted: 05/19/2021] [Indexed: 11/20/2022]
Abstract
Free-text clinical notes in electronic health records are more difficult for data mining while the structured diagnostic codes can be missing or erroneous. To improve the quality of diagnostic codes, this work extracts diagnostic codes from free-text notes: five old and new word vectorization methods were used to vectorize Stanford progress notes and predict eight ICD-10 codes of common cardiovascular diseases with logistic regression. The models showed good performance, with TF-IDF as the best vectorization model showing the highest AUROC (0.9499-0.9915) and AUPRC (0.2956-0.8072). The models also showed transferability when tested on MIMIC-III data with AUROC from 0.7952 to 0.9790 and AUPRC from 0.2353 to 0.8084. Model interpretability was shown by the important words with clinical meanings matching each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes for information retrieval and downstream machine-learning applications.
Affiliation(s)
- Xianghao Zhan
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Marie Humbert-Droz
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Pritam Mukherjee
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
|
12
|
Si Y, Bernstam EV, Roberts K. Generalized and transferable patient language representation for phenotyping with limited data. J Biomed Inform 2021; 116:103726. [PMID: 33711541 DOI: 10.1016/j.jbi.2021.103726] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 02/23/2021] [Indexed: 12/19/2022]
Abstract
The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most important, the multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robust. All these results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.
Affiliation(s)
- Yuqi Si
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
| | - Elmer V Bernstam
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA; Division of General Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, TX, USA
| | - Kirk Roberts
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA.
|
13
|
Deep phenotyping unstructured data mining in an extensive pediatric database to unravel a common KCNA2 variant in neurodevelopmental syndromes. Genet Med 2021; 23:968-971. [PMID: 33500571 PMCID: PMC8105164 DOI: 10.1038/s41436-020-01039-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2020] [Revised: 10/29/2020] [Accepted: 10/29/2020] [Indexed: 01/08/2023] Open
Abstract
Purpose: Electronic health records are gaining popularity to detect and propose interdisciplinary treatments for patients with similar medical histories, diagnoses, and outcomes. These files are compiled by different nonexperts and expert clinicians. Data mining in these unstructured data is a transposable and sustainable methodology to search for patients presenting a high similitude of clinical features. Methods: Exome and targeted next-generation sequencing bioinformatics analyses were performed at the Imagine Institute. Similarity Index (SI), an algorithm based on a vector space model (VSM) that exploits concepts extracted from clinical narrative reports, was used to identify patients with highly similar clinical features. Results: Here we describe a case of “automated diagnosis” indicated by Dr. Warehouse, a biomedical data warehouse oriented toward clinical narrative reports, developed at Necker Children’s Hospital using around 500,000 patients’ records. Through the use of this warehouse, we were able to match and identify two patients sharing very specific clinical neonatal and childhood features harboring the same de novo variant in KCNA2. Conclusion: This innovative application of database clustering clinical features could advance identification of patients with rare and common genetic conditions and detect with high accuracy the natural history of patients harboring similar genetic pathogenic variants.
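As a rough illustration of how a vector space model can rank patients by shared clinical concepts, the sketch below scores records with cosine similarity over concept counts. The abstract does not give the Similarity Index formula, so cosine similarity is an assumption here, and the function names, patient identifiers, and concept lists are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two concept-count vectors."""
    dot = sum(a[c] * b[c] for c in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query_id: str, records: dict) -> list:
    """Other patients sorted from most to least similar to the query."""
    query = records[query_id]
    scores = [
        (pid, cosine_similarity(query, vec))
        for pid, vec in records.items()
        if pid != query_id
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical concept lists, one per patient record (illustrative only).
records = {
    "patient_A": Counter(["seizure", "ataxia", "hypotonia"]),
    "patient_B": Counter(["seizure", "ataxia", "tremor"]),
    "patient_C": Counter(["fracture", "short stature"]),
}

ranking = rank_similar("patient_A", records)
print(ranking)
```

In this toy data, patient_B shares two of three concepts with patient_A and ranks first, while patient_C shares none. A production system would likely weight concepts (for example by rarity) rather than use raw counts.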
|
14
|
Gagalova KK, Leon Elizalde MA, Portales-Casamar E, Görges M. What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions. JMIR Form Res 2020; 4:e17687. [PMID: 32852280 PMCID: PMC7484778 DOI: 10.2196/17687] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Revised: 06/09/2020] [Accepted: 07/17/2020] [Indexed: 12/23/2022] Open
Abstract
Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
Affiliation(s)
- Kristina K Gagalova
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada.,Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.,Research Institute, BC Children's Hospital, Vancouver, BC, Canada
| | - M Angelica Leon Elizalde
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
| | - Elodie Portales-Casamar
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
| | - Matthias Görges
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, BC, Canada
|
15
|
Clark MM, Hildreth A, Batalov S, Ding Y, Chowdhury S, Watkins K, Ellsworth K, Camp B, Kint CI, Yacoubian C, Farnaes L, Bainbridge MN, Beebe C, Braun JJA, Bray M, Carroll J, Cakici JA, Caylor SA, Clarke C, Creed MP, Friedman J, Frith A, Gain R, Gaughran M, George S, Gilmer S, Gleeson J, Gore J, Grunenwald H, Hovey RL, Janes ML, Lin K, McDonagh PD, McBride K, Mulrooney P, Nahas S, Oh D, Oriol A, Puckett L, Rady Z, Reese MG, Ryu J, Salz L, Sanford E, Stewart L, Sweeney N, Tokita M, Van Der Kraan L, White S, Wigby K, Williams B, Wong T, Wright MS, Yamada C, Schols P, Reynders J, Hall K, Dimmock D, Veeraraghavan N, Defay T, Kingsmore SF. Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Sci Transl Med 2020; 11:11/489/eaat6177. [PMID: 31019026 DOI: 10.1126/scitranslmed.aat6177] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2018] [Revised: 10/24/2018] [Accepted: 04/01/2019] [Indexed: 12/19/2022]
Abstract
By informing timely targeted treatments, rapid whole-genome sequencing can improve the outcomes of seriously ill children with genetic diseases, particularly infants in neonatal and pediatric intensive care units (ICUs). The need for highly qualified professionals to decipher results, however, precludes widespread implementation. We describe a platform for population-scale, provisional diagnosis of genetic diseases with automated phenotyping and interpretation. Genome sequencing was expedited by bead-based genome library preparation directly from blood samples and sequencing of paired 100-nt reads in 15.5 hours. Clinical natural language processing (CNLP) automatically extracted children's deep phenomes from electronic health records with 80% precision and 93% recall. In 101 children with 105 genetic diseases, a mean of 4.3 CNLP-extracted phenotypic features matched the expected phenotypic features of those diseases, compared with a match of 0.9 phenotypic features used in manual interpretation. We automated provisional diagnosis by combining the ranking of the similarity of a patient's CNLP phenome with respect to the expected phenotypic features of all genetic diseases, together with the ranking of the pathogenicity of all of the patient's genomic variants. Automated, retrospective diagnoses concurred well with expert manual interpretation (97% recall and 99% precision in 95 children with 97 genetic diseases). Prospectively, our platform correctly diagnosed three of seven seriously ill ICU infants (100% precision and recall) with a mean time saving of 22:19 hours. In each case, the diagnosis affected treatment. Genome sequencing with automated phenotyping and interpretation in a median of 20:10 hours may increase adoption in ICUs and, thereby, timely implementation of precise treatments.
Affiliation(s)
- Michelle M Clark
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Amber Hildreth
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA.,Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
| | - Sergey Batalov
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Yan Ding
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Kelly Watkins
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | - Brandon Camp
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | | | - Lauge Farnaes
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | - Matthew N Bainbridge
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Codified Genomics, LLC, Houston, TX 77033, USA
| | - Curtis Beebe
- Rady Children's Hospital, San Diego, CA 92123, USA
| | - Joshua J A Braun
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Margaret Bray
- Alexion Pharmaceuticals Inc., New Haven, CT 06510, USA
| | - Jeanne Carroll
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | - Julie A Cakici
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Sara A Caylor
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Christina Clarke
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Mitchell P Creed
- University of Kansas School of Medicine, Kansas City, MO 66160, USA
| | - Jennifer Friedman
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Neurosciences, University of California San Diego, San Diego, CA 92093, USA
| | | | | | - Mary Gaughran
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | | | - Joseph Gleeson
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Neurosciences, University of California San Diego, San Diego, CA 92093, USA
| | | | | | - Raymond L Hovey
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Marie L Janes
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Kejia Lin
- Rady Children's Hospital, San Diego, CA 92123, USA
| | | | - Kyle McBride
- Rady Children's Hospital, San Diego, CA 92123, USA
| | - Patrick Mulrooney
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Shareef Nahas
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Daeheon Oh
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Albert Oriol
- Rady Children's Hospital, San Diego, CA 92123, USA
| | - Laura Puckett
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Zia Rady
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | - Julie Ryu
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | - Lisa Salz
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Erica Sanford
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | | | - Nathaly Sweeney
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | - Mari Tokita
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Luca Van Der Kraan
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Sarah White
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Kristen Wigby
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.,Department of Pediatrics, University of California San Diego, San Diego, CA 92093, USA
| | | | - Terence Wong
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Meredith S Wright
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | - Catherine Yamada
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | - John Reynders
- Alexion Pharmaceuticals Inc., New Haven, CT 06510, USA
| | | | - David Dimmock
- Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA
| | | | - Thomas Defay
- Alexion Pharmaceuticals Inc., New Haven, CT 06510, USA
|
16
|
Electronic health records for the diagnosis of rare diseases. Kidney Int 2020; 97:676-686. [DOI: 10.1016/j.kint.2019.11.037] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Revised: 11/15/2019] [Accepted: 11/22/2019] [Indexed: 01/13/2023]
|
17
|
Yang DD, Baujat G, Neuraz A, Garcelon N, Messiaen C, Sandrin A, Cheron G, Burgun A, Pejin Z, Cormier-Daire V, Angoulvant F. Healthcare trajectory of children with rare bone disease attending pediatric emergency departments. Orphanet J Rare Dis 2020; 15:2. [PMID: 31900214 PMCID: PMC6942261 DOI: 10.1186/s13023-019-1284-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Accepted: 12/19/2019] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Children with rare bone diseases (RBDs), whether medically complex or not, raise multiple issues in emergency situations. The healthcare burden of children with RBD in emergency structures remains unknown. The objective of this study was to describe the place of the pediatric emergency department (PED) in the healthcare of children with RBD. METHODS We performed a retrospective single-center cohort study at a French university hospital. We included all children under the age of 18 years with RBD who visited the PED in 2017. By cross-checking data from the hospital clinical data warehouse, we were able to trace the healthcare trajectories of the patients. The main outcome of interest was the incidence (IR) of a second healthcare visit (HCV) within 30 days of the index visit to the PED. The secondary outcomes were the IR of planned and unplanned second HCVs and the proportion of patients classified as having chronic medically complex (CMC) disease at the PED visit. RESULTS The 141 visits to the PED were followed by 84 second HCVs, giving an IR of 0.60 [95% CI: 0.48-0.74]. These second HCVs were planned in 60 cases (IR = 0.43 [95% CI: 0.33-0.55]) and unplanned in 24 (IR = 0.17 [95% CI: 0.11-0.25]). Patients with CMC diseases accounted for 59 index visits (42%) and 43 second HCVs (51%). Multivariate analysis including CMC status as an independent variable, with adjustment for age, yielded an incidence rate ratio (IRR) of second HCVs of 1.51 [95% CI: 0.98-2.32]. The IRR of planned second HCVs was 1.20 [95% CI: 0.76-1.90] and that of unplanned second HCVs was 2.81 [95% CI: 1.20-6.58]. CONCLUSION An index PED visit is often associated with further HCVs in patients with RBD. The IRR of unplanned second HCVs was high, highlighting the major burden of HCVs for patients with chronic and severe disease.
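The crude incidence figures in the RESULTS can be checked from the visit counts alone. The short sketch below assumes the IR is simply the number of second visits divided by the number of index visits; the function and variable names are illustrative, and the confidence intervals and adjusted rate ratios are not reproduced because the abstract does not describe how they were computed.

```python
def incidence(events: int, index_visits: int) -> float:
    """Events per index visit, rounded to two decimals."""
    return round(events / index_visits, 2)


INDEX_VISITS = 141   # PED visits in 2017
SECOND_HCVS = 84     # second healthcare visits within 30 days
PLANNED = 60
UNPLANNED = 24
CMC_INDEX = 59       # index visits by patients with CMC disease
CMC_SECOND = 43      # second visits by patients with CMC disease

ir_all = incidence(SECOND_HCVS, INDEX_VISITS)
ir_planned = incidence(PLANNED, INDEX_VISITS)
ir_unplanned = incidence(UNPLANNED, INDEX_VISITS)

# Shares of visits attributable to CMC patients, as whole percentages.
cmc_index_pct = round(100 * CMC_INDEX / INDEX_VISITS)
cmc_second_pct = round(100 * CMC_SECOND / SECOND_HCVS)

print(ir_all, ir_planned, ir_unplanned, cmc_index_pct, cmc_second_pct)
```

The planned and unplanned counts sum to the total (60 + 24 = 84), so the two component rates add up to the overall rate.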
Affiliation(s)
- David Dawei Yang
- Assistance Publique - Hôpitaux de Paris, Pediatric Emergency Department, Necker-Enfants Malades Hospital, Paris Descartes University - Sorbonne Paris Cité, Paris, France.
| | - Geneviève Baujat
- Assistance Publique - Hôpitaux de Paris, Department of Genetics, National Reference Center for Skeletal Dysplasia Hôpital Necker-Enfants Malades, Paris, France
- Département de Génétique, Université Paris Descartes-Sorbonne Paris Cité, INSERM UMR1163, Institut IMAGINE, Hôpital Necker-Enfants Malades, Paris, France
| | - Antoine Neuraz
- INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Assistance Publique - Hôpitaux de Paris, Department of Medical Informatics, Necker-Enfants Malades Hospital, Paris Descartes University, Sorbonne Paris Cité, 75015, Paris, France
| | - Nicolas Garcelon
- INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Institut IMAGINE, Plateforme de Data Science, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Claude Messiaen
- Banque Nationale de Données Maladies Rares, Hôpitaux de Paris, Hôpital Necker-Enfants Malades, Paris, France
| | - Arnaud Sandrin
- Banque Nationale de Données Maladies Rares, Hôpitaux de Paris, Hôpital Necker-Enfants Malades, Paris, France
| | - Gérard Cheron
- Assistance Publique - Hôpitaux de Paris, Pediatric Emergency Department, Necker-Enfants Malades Hospital, Paris Descartes University - Sorbonne Paris Cité, Paris, France
| | - Anita Burgun
- INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Assistance Publique - Hôpitaux de Paris, Department of Medical Informatics, Necker-Enfants Malades Hospital, Paris Descartes University, Sorbonne Paris Cité, 75015, Paris, France
| | - Zagorka Pejin
- Hôpitaux de Paris, Department of Pediatric Orthopedics, Necker-Enfants Malades Hospital, Paris Descartes University, Sorbonne Paris Cité, 75015, Paris, France
| | - Valérie Cormier-Daire
- Assistance Publique - Hôpitaux de Paris, Department of Genetics, National Reference Center for Skeletal Dysplasia Hôpital Necker-Enfants Malades, Paris, France
- Département de Génétique, Université Paris Descartes-Sorbonne Paris Cité, INSERM UMR1163, Institut IMAGINE, Hôpital Necker-Enfants Malades, Paris, France
| | - François Angoulvant
- Assistance Publique - Hôpitaux de Paris, Pediatric Emergency Department, Necker-Enfants Malades Hospital, Paris Descartes University - Sorbonne Paris Cité, Paris, France.
- INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France.
|
18
|
[Basis and perspectives of artificial intelligence in radiation therapy]. Cancer Radiother 2019; 23:913-916. [PMID: 31645301 DOI: 10.1016/j.canrad.2019.08.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2019] [Revised: 08/15/2019] [Accepted: 08/20/2019] [Indexed: 11/23/2022]
Abstract
Artificial intelligence is a highly polysemic term. In computer science, with the objective of being able to solve totally new problems in new contexts, artificial intelligence includes connectionism (neural networks) for learning and logics for reasoning. Artificial intelligence algorithms mimic tasks normally requiring human intelligence, like deduction, induction, and abduction. All apply to radiation oncology. Combined with radiomics, neural networks have obtained good results in image classification, natural language processing, phenotyping based on electronic health records, and adaptive radiation therapy. Generative adversarial networks have been tested to generate synthetic data. Logic-based systems have been developed for providing formal domain ontologies, supporting clinical decisions and checking consistency of the systems. Artificial intelligence must integrate both deep learning and logic approaches to perform complex tasks and go beyond the so-called narrow artificial intelligence that is tailored to perform some highly specialized task. Combined with mechanistic models, artificial intelligence has the potential to provide new tools such as digital twins for precision oncology.
|
19
|
Grabar N, Grouin C. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook. Yearb Med Inform 2019; 28:218-222. [PMID: 31419835 PMCID: PMC6697498 DOI: 10.1055/s-0039-1677937] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
OBJECTIVES To analyze the content of publications within the medical Natural Language Processing (NLP) domain in 2018. METHODS Automatic and manual pre-selection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS Two best papers have been selected this year. One is dedicated to the generation of multi-document summaries and the other to the generation of imaging reports. We also propose an analysis of the main research trends in NLP publications in 2018. CONCLUSIONS The year 2018 is very rich with regard to the NLP issues and topics addressed. It shows the will of researchers to move towards robust and reproducible results. Researchers also prove creative in tackling original issues and approaches.
Affiliation(s)
- Natalia Grabar
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France
- STL, CNRS, Université de Lille, Villeneuve-d'Ascq, France
| | - Cyril Grouin
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France
|
20
|
Abstract
OBJECTIVES To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2018. METHOD A bibliographic search using a combination of MeSH descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting of the editorial team was organized to conclude on the selection of best papers. RESULTS Among the 1,469 retrieved papers published in 2018 in the various areas of CRI, the full review process selected four best papers. The first best paper describes a simple algorithm detecting co-morbidities in Electronic Healthcare Records (EHRs) using a clinical data warehouse and a knowledge base. The authors of the second best paper present a federated algorithm for predicting heart failure hospital admissions based on patients' medical history described in their distributed EHRs. The third best paper reports the evaluation of an open source, interoperable, and scalable data quality assessment tool measuring completeness of data items, which can be run on different architectures (EHRs and Clinical Data Warehouses (CDWs) based on PCORnet or OMOP data models). The fourth best paper reports a data quality program conducted across 37 hospitals addressing data quality issues through the whole data life cycle from patient to researcher. CONCLUSIONS Research efforts in the CRI field currently focus on consolidating promises of early Distributed Research Networks aimed at maximizing the potential of large-scale, harmonized data from diverse, quickly developing digital sources. Data quality assessment methods and tools as well as privacy-enhancing techniques are major concerns. It is also notable that, following examples in the US and Asia, ambitious regional or national plans in Europe are launched that aim at developing big data and new artificial intelligence technologies to contribute to the understanding of health and diseases in whole populations and whole health systems, and returning actionable feedback loops to improve existing models of research and care. The use of "real-world" data is continuously increasing but the ultimate role of this data in clinical research remains to be determined.
Affiliation(s)
- Christel Daniel
- AP-HP Information Systems Direction, Paris, France
- Sorbonne University, University Paris 13, Sorbonne Paris Cité, INSERM UMR_S 1142, LIMICS, Paris, France
|
21
|
The Korea Cancer Big Data Platform (K-CBP) for Cancer Research. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16132290. [PMID: 31261630 PMCID: PMC6651426 DOI: 10.3390/ijerph16132290] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/08/2019] [Revised: 05/31/2019] [Accepted: 06/24/2019] [Indexed: 12/23/2022]
Abstract
Data warehousing is the most important technology to address recent advances in precision medicine. However, a generic clinical data warehouse does not address unstructured and insufficient data. In precision medicine, it is essential to develop a platform that can collect and utilize data. Data were collected from electronic medical records, genomic sequences, tumor biopsy specimens, and national cancer control initiative databases in the National Cancer Center (NCC), Korea. Data were de-identified and stored in a safe and independent space. Unstructured clinical data were standardized and incorporated into cancer registries and linked to cancer genome sequences and tumor biopsy specimens. Finally, national cancer control initiative data from the public domain were independently organized and linked to cancer registries. We constructed a system for integrating and providing various cancer data called the Korea Cancer Big Data Platform (K-CBP). Although the K-CBP could be used for cancer research, the legal and regulatory aspects of data distribution and usage need to be addressed first. Nonetheless, the system will continue collecting data from cancer-related resources that will hopefully facilitate precision-based research.
|
22
|
Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC SYSTEMS BIOLOGY 2019; 13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
Background Improving the efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been effectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements consider only the annotations and hierarchical structure of HPO, neglecting the textual definitions of phenotype terms. Results In this paper, we propose a novel phenotype similarity measurement, termed DisPheno, which incorporates the definitions of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patient datasets.
Affiliation(s)
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
|
23
|
Shen F, Zhao Y, Wang L, Mojarad MR, Wang Y, Liu S, Liu H. Rare disease knowledge enrichment through a data-driven approach. BMC Med Inform Decis Mak 2019; 19:32. [PMID: 30764825 PMCID: PMC6376651 DOI: 10.1186/s12911-019-0752-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/13/2018] [Accepted: 02/01/2019] [Indexed: 01/03/2023]
Abstract
BACKGROUND Existing resources to assist the diagnosis of rare diseases are usually curated from the literature, which can limit their clinical use. Substantial effort is often needed before a rare disease is even suspected and those resources are consulted. The primary goal of this study was to apply a data-driven approach to enrich existing rare disease resources by mining phenotype-disease associations from electronic medical records (EMRs). METHODS We first applied association rule mining algorithms to EMR data to extract significant phenotype-disease associations and enriched existing rare disease resources (Human Phenotype Ontology and Orphanet (HPO-Orphanet)). We generated phenotype-disease bipartite graphs for HPO-Orphanet, EMR, and the enriched knowledge base HPO-Orphanet+, and conducted a case study on Hodgkin lymphoma to compare differential diagnosis performance among these three graphs. RESULTS We used disease-disease similarity generated by eRAM, an existing rare disease encyclopedia, as a gold standard to compare the three graphs, obtaining sensitivities of (0.17, 0.36, 0.46) and specificities of (0.52, 0.47, 0.51) for the three graphs, respectively. We also compared the top 15 diseases generated by the HPO-Orphanet+ graph with eRAM and another clinical diagnostic tool, the Phenomizer. CONCLUSIONS Per our evaluation results, our approach was able to enrich existing rare disease knowledge resources with phenotype-disease associations from EMR data and thus support rare disease differential diagnosis.
Affiliation(s)
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Majid Rastegar Mojarad
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
|