51
|
Neuraz A, Looten V, Rance B, Daniel N, Garcelon N, Llanos LC, Burgun A, Rosset S. Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task? Stud Health Technol Inform 2019; 264:1558-1559. [PMID: 31438230 DOI: 10.3233/shti190533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
We explore the impact of data source on word representations for different NLP tasks in the clinical domain in French (natural language understanding and text classification). We compared word embeddings (fastText) and language models (ELMo), learned either on the general domain (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for one of the two tasks (+7% and +8% gain in F1-score).
|
52
|
Digan W, Wack M, Looten V, Neuraz A, Burgun A, Rance B. Evaluating the Impact of Text Duplications on a Corpus of More than 600,000 Clinical Narratives in a French Hospital. Stud Health Technol Inform 2019; 264:103-107. [PMID: 31437894 DOI: 10.3233/shti190192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
A significant part of medical knowledge is stored as unstructured free text. However, clinical narratives are known to contain duplicated sections because clinicians copy and paste parts of a former report into a new one. In this study, we aim to evaluate the duplications found within patient records in more than 650,000 French clinical narratives. We adapted a method to efficiently identify duplicated zones in a reasonable time. We evaluated the potential impact of duplications in two use cases: the presence of (i) treatments and/or (ii) relative dates. We identified an average duplication rate of 33%. We found that 20% of the documents contained drugs mentioned only in duplicated zones and that 1.45% of the documents contained mentions of relative dates in duplicated zones, which could potentially lead to erroneous interpretation. We suggest the systematic identification and annotation of duplicated zones in clinical narratives for information extraction and temporal-oriented tasks.
|
53
|
Arnoux-Guenegou A, Girardeau Y, Chen X, Deldossi M, Aboukhamis R, Faviez C, Dahamna B, Karapetiantz P, Guillemin-Lanne S, Lillo-Le Louët A, Texier N, Burgun A, Katsahian S. The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard. JMIR Res Protoc 2019; 8:e11448. [PMID: 31066711 PMCID: PMC6528435 DOI: 10.2196/11448] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/29/2018] [Revised: 11/16/2018] [Accepted: 12/21/2018] [Indexed: 12/30/2022]
Abstract
Background Social media is a potential source of information for postmarketing drug safety surveillance that remains unexploited. Information technology solutions that aim to extract adverse drug reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of ADRs by human experts on the corpus extracted from social media, must be built and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts. Objective We propose a standardized protocol for the evaluation of software that extracts ADRs from messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project. Methods Messages from French health forums were extracted. Entity recognition was based on the Racine Pharma lexicon for drugs and the Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing-based techniques automated the ADR information extraction (relation between the drug and AE entities). The evaluation corpus was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus according to an annotation guideline, thus creating the gold standard. We will evaluate our tool against the gold standard with recall, precision, and F-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored.
Results The necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the evaluation corpus is already finished. The 2 annotators have already completed the annotation process. The analysis of the performance of the ADR information extraction module compared with the gold standard is ongoing. Conclusions This protocol is based on standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such an evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance. International Registered Report Identifier (IRRID) RR1-10.2196/11448
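The figure of 384 ADR concepts quoted above is consistent with the usual normal-approximation sample size for a proportion. The abstract does not state which formula the authors used, so the following is only a minimal sketch under that assumption; the function name is hypothetical.

```python
def sample_size_for_proportion(p, total_width, z=1.96):
    """Approximate n so that a two-sided CI around p has the given total width.

    Assumes the normal (Wald) approximation: half_width = z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to a 95% confidence level.
    """
    half_width = total_width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


# Expected recall of 0.5 and a 95% CI with total width 0.10 (values from the abstract).
n = sample_size_for_proportion(p=0.5, total_width=0.10)
print(f"{n:.2f}")
```

Under this assumption the formula gives about 384.16, in line with the 384 reported in the abstract.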
|
54
|
Boulet S, Ursino M, Thall P, Burgun A, Zaanan A, Zohar S, Jannot A. Intégration de l’élicitation d’experts dans une méthode de sélection de variables en Bayésien par la méthode de « power prior ». Application au cancer du colon [Integrating expert elicitation into a Bayesian variable selection method via the "power prior" method. Application to colon cancer]. Rev Epidemiol Sante Publique 2019. [DOI: 10.1016/j.respe.2019.03.097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
55
|
Boulet S, Ursino M, Thall P, Landi B, Lepère C, Pernot S, Burgun A, Taieb J, Zaanan A, Zohar S, Jannot AS. Integration of elicited expert information via a power prior in Bayesian variable selection: Application to colon cancer data. Stat Methods Med Res 2019; 29:541-567. [PMID: 30963815 DOI: 10.1177/0962280219841082] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Abstract
BACKGROUND Building tools to support personalized medicine requires modeling medical decision-making. For this purpose, both expert and real-world data provide a rich source of information. Currently, machine learning techniques are being developed to select relevant variables for decision-making. Rather than using data-driven analysis alone, eliciting prior information from physicians about their medical decision-making processes can be useful in variable selection. Our setting is electronic health record data on repeated dose adjustment of irinotecan for the treatment of metastatic colorectal cancer. We propose a method that incorporates elicited expert weights associated with variables involved in dose reduction decisions into Stochastic Search Variable Selection (SSVS), a Bayesian variable selection method, by using a power prior. METHODS Clinician experts were first asked to provide numerical clinical relevance weights to express their beliefs about the importance of each variable in their medical decision-making. Then, we modeled the link between repeated dose reduction, patient characteristics, and toxicities by assuming a logistic mixed-effects model. Simulated data were generated based on the elicited weights and combined with the observed dose reduction data via a power prior. We compared the performance of the power prior-based SSVS with that of the usual SSVS in our case study, including a sensitivity analysis on the power prior parameter. RESULTS The selected variables differ when using only expert knowledge, only the usual SSVS, or a combination of both. Our method makes it possible to select rare variables that may be missed using only the observed data and to discard variables that appear to be relevant based on the data but are not relevant from the expert perspective. CONCLUSION We introduce an innovative Bayesian variable selection method that adaptively combines elicited expert information and real-world data. The method selects a set of variables relevant for modeling the medical decision process.
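For readers unfamiliar with the power prior mentioned above, its general textbook form can be written as follows. This is a sketch of the generic construction, not the authors' exact model: the symbols $\theta$, $D_0$, $D$, $a_0$ and $\pi_0$ are notation assumed here, with $D_0$ standing for the data simulated from the elicited expert weights and $D$ for the observed dose reduction data.

```latex
% Power prior: the likelihood of the prior (here, expert-derived) data D_0
% is raised to a power a_0 in [0, 1] that controls how much it is trusted.
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
\qquad 0 \le a_0 \le 1

% Posterior after combining with the observed data D:
\pi(\theta \mid D, D_0, a_0) \;\propto\; L(\theta \mid D)\, L(\theta \mid D_0)^{a_0}\, \pi_0(\theta)
```

With $a_0 = 0$ the expert-derived data are ignored, and with $a_0 = 1$ they are weighted like observed data, which is why a sensitivity analysis on this parameter is informative.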
|
56
|
Giraud P, Giraud P, Gasnier A, El Ayachy R, Kreps S, Foy JP, Durdux C, Huguet F, Burgun A, Bibault JE. Radiomics and Machine Learning for Radiotherapy in Head and Neck Cancers. Front Oncol 2019; 9:174. [PMID: 30972291 PMCID: PMC6445892 DOI: 10.3389/fonc.2019.00174] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 12/12/2018] [Accepted: 02/28/2019] [Indexed: 12/13/2022]
Abstract
Introduction: An increasing number of parameters can be considered when making decisions in oncology. Tumor characteristics can also be extracted from imaging through the use of radiomics and add to this wealth of clinical data. Machine learning can encompass these parameters and thus enhance clinical decision-making as well as the radiotherapy workflow. Methods: We described machine learning applications at each step of radiotherapy treatment in head and neck cancers. We then performed a systematic review of radiomics and machine learning outcome prediction models in head and neck cancers. Results: Machine learning has several promising applications in treatment planning, with improvements in automatic organ-at-risk delineation and automation of the adaptive radiotherapy workflow. It may also provide new approaches for Normal Tissue Complication Probability models. Radiomics may provide additional data on tumors for improved machine learning-powered predictive models, not only of survival, but also of risk of distant metastasis, in-field recurrence, HPV status and extranodal spread. However, most studies provide preliminary data requiring further validation. Conclusion: Promising perspectives arise from machine learning applications and radiomics-based models, yet further data are necessary for their implementation in daily care.
|
57
|
Bussy S, Veil R, Looten V, Burgun A, Gaïffas S, Guilloux A, Ranque B, Jannot AS. Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework. BMC Med Res Methodol 2019; 19:50. [PMID: 30841867 PMCID: PMC6404305 DOI: 10.1186/s12874-019-0673-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/19/2018] [Accepted: 02/04/2019] [Indexed: 01/13/2023]
Abstract
BACKGROUND Choosing the best-performing method in terms of outcome prediction or variable selection is a recurring problem in prognosis studies, leading to many publications on method comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are either compared within a binary outcome setting (where we want to predict whether the readmission will occur within an arbitrarily chosen delay or not) or within a survival analysis setting (where the outcomes are directly the censored times), but not both. In this paper, we propose a comparison methodology to weigh up those different settings in terms of both prediction and variable selection, while incorporating advanced machine learning strategies. METHODS Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); in the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We also propose a method using Gaussian Processes to extract meaningful structured covariates from longitudinal data. RESULTS Among all assessed statistical methods, the survival analysis ones obtain the best results. In particular, the C-mix model yields the best performance in both settings (AUC = 0.94 in the binary outcome setting), as well as interesting interpretation aspects. There is some consistency in selected covariates across methods within a setting, but not much across the two settings.
CONCLUSIONS It appears that learning within the survival analysis setting first (thus using all the temporal information), and then going back to a binary prediction using the survival estimates, gives significantly better prediction performance than that obtained by models trained "directly" within the binary outcome setting.
|
58
|
Looten V, Neuraz A, Garcelon N, Burgun A, Chatellier G, Rance B. Description des courriels des patients pris en charge à l’Hôpital Européen Georges Pompidou, Paris, France [Description of emails from patients cared for at the Hôpital Européen Georges Pompidou, Paris, France]. Rev Epidemiol Sante Publique 2019. [DOI: 10.1016/j.respe.2019.01.090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
59
|
Looten V, Le Faou AL, de la Rocque de Severac PL, Burgun A, Boussadi A. Renseignement du statut tabagique dans un système d’information hospitalier : une étude observationnelle à partir de l’entrepôt de données cliniques de l’Hôpital Européen Georges Pompidou, Paris, France [Recording of smoking status in a hospital information system: an observational study based on the clinical data warehouse of the Hôpital Européen Georges Pompidou, Paris, France]. Rev Epidemiol Sante Publique 2019. [DOI: 10.1016/j.respe.2019.01.097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
60
|
Looten V, Gariepy J, Simon M, Villey V, Burgun A, Chatellier G. L’entrepôt de données cliniques comme nouvel acteur du codage partagé : l’exemple de la dénutrition à l’hôpital européen Georges-Pompidou, Paris, France [The clinical data warehouse as a new player in shared coding: the example of malnutrition at the Hôpital Européen Georges-Pompidou, Paris, France]. Rev Epidemiol Sante Publique 2019. [DOI: 10.1016/j.respe.2019.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
61
|
Cohen S, Jannot AS, Iserin L, Bonnet D, Burgun A, Escudié JB. Accuracy of claim data in the identification and classification of adults with congenital heart diseases in electronic medical records. Arch Cardiovasc Dis 2019; 112:31-43. [PMID: 30612895 DOI: 10.1016/j.acvd.2018.07.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/21/2018] [Accepted: 07/23/2018] [Indexed: 11/17/2022]
Abstract
BACKGROUND The content of electronic medical records (EMRs) encompasses both structured data, such as billing codes, and unstructured data, including free-text reports. Epidemiological and clinical research into adult congenital heart disease (ACHD) increasingly relies on administrative claim data using the International Classification of Diseases (9th revision) (ICD-9). In France, administrative databases use ICD-10, the reliability of which is largely unknown in this context. AIMS To assess the accuracy of ICD-10 codes retrieved from administrative claim data in the identification and classification of ACHD. METHODS We randomly included 6000 patients hospitalized at least once in 2000-2014 in a cardiology department with a dedicated specialized ACHD Unit. For each patient, the clinical diagnosis extracted from the EMR was compared with the assigned ICD-10 codes. Performance of ICD-10 codes in the identification and classification of ACHD was assessed by estimating sensitivity, specificity and positive predictive value. RESULTS Among the 6000 patients included, 780 (13%) patients with ACHD were manually identified from EMRs (107,092 documents). ICD-10 codes correctly categorized 629 as having ACHD (sensitivity 0.81, 95% confidence interval 0.78-0.83), with a specificity of 0.99 (95% confidence interval 0.99-1). The performance of ICD-10 codes in correctly categorizing the ACHD defect subtype depended on the defect, with sensitivity ranging from 0 (e.g. unspecified congenital malformation of tricuspid valve) to 1 (e.g. common arterial trunk), and specificity ranging from 0.99 to 1. CONCLUSIONS Administrative data using ICD-10 codes is a precise tool for detecting ACHD, and may be used to establish a national cohort. Mining free-text reports in addition to coded administrative data may offset the lack of sensitivity and accuracy when describing the spectrum of congenital heart disease using ICD-10 codes.
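The sensitivity figure above can be reproduced from the counts given in the abstract (629 correctly coded out of 780 manually identified ACHD patients). The abstract does not say which interval method was used; the sketch below assumes a simple normal-approximation (Wald) interval, and the function name is hypothetical.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion with a Wald interval.

    Assumption: normal approximation, z = 1.96 for a 95% confidence level.
    """
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se


# Counts taken from the abstract: 629 of 780 ACHD patients correctly categorized.
sensitivity, lower, upper = proportion_with_wald_ci(629, 780)
print(f"sensitivity={sensitivity:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Under this assumption the result rounds to 0.81 (0.78-0.83), matching the reported values; an exact or Wilson interval would differ only slightly at this sample size.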
|
62
|
Gruson D, Petrelluzzi J, Mehl J, Burgun A, Garcelon N. [Ethical, legal and operational issues of artificial intelligence]. LA REVUE DU PRATICIEN 2018; 68:1145-1148. [PMID: 30869229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Ethical, legal and operational issues of artificial intelligence. Mastering the ethical issues associated with artificial intelligence in healthcare without curbing its diffusion, a source of innovations and advances for our health system: this is the meaning of the idea of "positive regulation" of AI in healthcare. This article presents the ethical, legal and operational issues associated with this approach, which is at the heart of the Ethik-IA initiative. It shows that the answers to be provided are first and foremost recommendations of good practice, as in the case of the prototype standard of good practice for AI applied to genomic data, developed with the teams of the Imagine University Hospital Institute.
|
63
|
Bibault JE, Giraud P, Housset M, Durdux C, Taieb J, Berger A, Coriat R, Chaussade S, Dousset B, Nordlinger B, Burgun A. Author Correction: Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep 2018; 8:16914. [PMID: 30420742 PMCID: PMC6232138 DOI: 10.1038/s41598-018-35359-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
|
64
|
Bibault JE, Giraud P, Burgun A. Prédiction par deep learning de la réponse complète après chimioradiothérapie néoadjuvante dans le cancer du rectum localement évolué [Deep learning prediction of complete response after neoadjuvant chemoradiotherapy in locally advanced rectal cancer]. Cancer Radiother 2018. [DOI: 10.1016/j.canrad.2018.07.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
65
|
Digan W, Countouris H, Barritault M, Baudoin D, Laurent-Puig P, Blons H, Burgun A, Rance B. An architecture for genomics analysis in a clinical setting using Galaxy and Docker. Gigascience 2018; 6:1-9. [PMID: 29048555 PMCID: PMC5691353 DOI: 10.1093/gigascience/gix099] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/02/2017] [Accepted: 10/09/2017] [Indexed: 12/12/2022]
Abstract
Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from generation to analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular open-source bioinformatics analysis support software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Alongside the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures data traceability by recording analytical actions and by associating the inputs and outputs of the tools with the EDAM ontology through ReGaTe. The source code is freely available on GitHub at https://github.com/CARPEM/GalaxyDocker.
|
66
|
Djian J, Lellouch AG, Botter C, Levy J, Burgun A, Hivelin M, Lantieri L. [Clinical photography by smartphone in plastic surgery and protection of personal data: Development of a secured platform and application on 979 patients]. ANN CHIR PLAST ESTH 2018; 64:33-43. [PMID: 30001862 DOI: 10.1016/j.anplas.2018.06.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/01/2018] [Accepted: 06/11/2018] [Indexed: 10/28/2022]
Abstract
BACKGROUND Clinical photography in plastic and reconstructive surgery has undergone a digital breakthrough. Online data storage and large-scale analysis tools such as facial recognition algorithms pose a serious issue when it comes to the protection of personal data. We assess the benefits of a platform connected to the computerized medical record, which allows photos to be filed and centralized in a smart and secure manner. METHOD We interviewed 300 plastic surgeons about the role of the smartphone in their clinical practice. Concomitantly, we developed an innovative platform called Surgeon©, a secure way to index, file and send photographs from a smartphone to our hospital's server. Each photographic sequence was qualified using a specific form. We then prospectively collected, between May 1st 2017 and March 30th 2018, the number of patients photographed, the number of sequences and photographs taken and the average number of sequences per patient. RESULTS Out of 86 French plastic surgeons surveyed, 81% said that they could not carry on their daily practice today without their smartphone. Photographs taken were stored on their smartphones (50%) or synced with virtual storage (25.6%). A majority (80.2%) would use a dedicated secured smartphone application. Our application allowed us to photograph 979 patients, or 2345 sequences and 8112 photographs, with an average of 2.28 sequences per patient. CONCLUSION Thanks to its ergonomics and security, this platform can be set up in a hospital ward and beyond.
|
67
|
Garcelon N, Neuraz A, Salomon R, Bahi-Buisson N, Amiel J, Picard C, Mahlaoui N, Benoit V, Burgun A, Rance B. Next generation phenotyping using narrative reports in a rare disease clinical data warehouse. Orphanet J Rare Dis 2018; 13:85. [PMID: 29855327 PMCID: PMC5984368 DOI: 10.1186/s13023-018-0830-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/04/2017] [Accepted: 05/23/2018] [Indexed: 01/10/2023]
Abstract
BACKGROUND Secondary use of data collected in Electronic Health Records opens perspectives for increasing our knowledge of rare diseases. The clinical data warehouse (named Dr. Warehouse) at the Necker-Enfants Malades Children's Hospital contains data collected during normal care for thousands of patients. Dr. Warehouse is oriented toward the exploration of clinical narratives. In this study, we present our method to find phenotypes associated with diseases of interest. METHODS We leveraged the frequency and TF-IDF to explore the association between clinical phenotypes and rare diseases. We applied our method in six use cases: phenotypes associated with the Rett, Lowe, Silver Russell, Bardet-Biedl syndromes, DOCK8 deficiency and Activated PI3-kinase Delta Syndrome (APDS). We asked domain experts to evaluate the relevance of the top-50 (for frequency and TF-IDF) phenotypes identified by Dr. Warehouse and computed the average precision and mean average precision. RESULTS Experts concluded that between 16 and 39 phenotypes could be considered as relevant in the top-50 phenotypes ranked by descending frequency discovered by Dr. Warehouse (resp. between 11 and 41 for TF-IDF). Average precision ranges from 0.55 to 0.91 for frequency and 0.52 to 0.95 for TF-IDF. Mean average precision was 0.79. Our study suggests that phenotypes identified in clinical narratives stored in Electronic Health Record can provide rare disease specialists with candidate phenotypes that can be used in addition to the literature. CONCLUSIONS Clinical Data Warehouses can be used to perform Next Generation Phenotyping, especially in the context of rare diseases. We have developed a method to detect phenotypes associated with a group of patients using medical concepts extracted from free-text clinical narratives.
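The evaluation metrics named above (average precision per use case, then mean average precision across use cases) can be illustrated with a short sketch. It uses one common definition of average precision, the mean of precision@k over the ranks where a relevant item appears; the abstract does not specify the exact variant, and the relevance lists below are made-up toy data, not results from the study.

```python
def average_precision(relevance):
    """Average precision for one ranked list of booleans (True = judged relevant).

    Assumed definition: mean of precision@k over the ranks k holding a relevant item.
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several ranked lists (e.g. one per disease)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical expert judgments for two short ranked phenotype lists.
toy_rankings = [
    [True, False, True, True, False],
    [False, True, True],
]
ap_first = average_precision(toy_rankings[0])
map_score = mean_average_precision(toy_rankings)
print(f"AP={ap_first:.3f}, MAP={map_score:.3f}")
```

In the study's setting each list would hold the expert judgments for the top-50 phenotypes of one disease, ranked by frequency or by TF-IDF.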
|
68
|
Zapletal E, Bibault JE, Giraud P, Burgun A. Integrating Multimodal Radiation Therapy Data into i2b2. Appl Clin Inform 2018; 9:377-390. [PMID: 29847842 PMCID: PMC5976493 DOI: 10.1055/s-0038-1651497] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Abstract
Background
Clinical data warehouses are now widely used to foster clinical and translational research and the Informatics for Integrating Biology and the Bedside (i2b2) platform has become a de facto standard for storing clinical data in many projects. However, to design predictive models and assist in personalized treatment planning in cancer or radiation oncology, all available patient data need to be integrated into i2b2, including radiation therapy data that are currently not addressed in many existing i2b2 sites.
Objective
To use radiation therapy data in projects related to rectal cancer patients, we assessed the feasibility of integrating radiation oncology data into the i2b2 platform.
Methods
The Georges Pompidou European Hospital, a hospital from the Assistance Publique – Hôpitaux de Paris group, has developed an i2b2-based clinical data warehouse of various structured and unstructured clinical data for research since 2008. To store and reuse various radiation therapy data—dose details, activities scheduling, and dose-volume histogram (DVH) curves—in this repository, we first extracted raw data by using some reverse engineering techniques and a vendor's application programming interface. Then, we implemented a hybrid storage approach by combining the standard i2b2 “Entity-Attribute-Value” storage mechanism with a “JavaScript Object Notation (JSON) document-based” storage mechanism without modifying the i2b2 core tables. Validation was performed using (1) the Business Objects framework for replicating vendor's application screens showing dose details and activities scheduling data and (2) the R software for displaying the DVH curves.
Results
We developed a pipeline to integrate the radiation therapy data into the Georges Pompidou European Hospital i2b2 instance and evaluated it on a cohort of 262 patients. We were able to use the radiation therapy data in a preliminary use case by fetching the DVH curve data from the clinical data warehouse and displaying them in an R chart.
Conclusion
By adding radiation therapy data into the clinical data warehouse, we were able to analyze radiation therapy response in cancer patients. We leveraged the i2b2 platform to store radiation therapy data, including detailed information such as the DVH, and to create new ontology-based modules that provide research investigators with a wider spectrum of clinical data.
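The hybrid storage approach described in the Methods above (Entity-Attribute-Value rows for simple values, JSON documents for complex ones such as DVH curves) can be sketched as follows. This is an illustration only: the `fact` table, its columns, the concept codes and all values are hypothetical and greatly simplified, and they are not the actual i2b2 schema or the authors' implementation.

```python
import json
import sqlite3

# Hypothetical, simplified fact table (NOT the real i2b2 schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE fact (
           patient_num INTEGER,
           concept_cd  TEXT,
           nval_num    REAL,
           json_blob   TEXT
       )"""
)

# Simple values follow the Entity-Attribute-Value pattern: one row per value.
conn.execute(
    "INSERT INTO fact VALUES (?, ?, ?, NULL)",
    (1, "RT:TOTAL_DOSE_GY", 50.4),
)

# Complex values (here a made-up DVH curve) are serialized as a JSON document
# and stored in a text column of the same table.
dvh = {"structure": "rectum", "dose_gy": [0, 10, 20, 30], "volume_pct": [100, 80, 45, 10]}
conn.execute(
    "INSERT INTO fact VALUES (?, ?, NULL, ?)",
    (1, "RT:DVH", json.dumps(dvh)),
)

# Fetching the curve back: query by patient and concept, then decode the JSON.
row = conn.execute(
    "SELECT json_blob FROM fact WHERE patient_num = ? AND concept_cd = ?",
    (1, "RT:DVH"),
).fetchone()
curve = json.loads(row[0])
print(curve["dose_gy"], curve["volume_pct"])
```

Keeping the JSON document in an extra column of the same table is one way to add document-style data without altering how the simple rows are queried, which is in the spirit of the "without modifying the i2b2 core tables" constraint stated in the abstract.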
|
69
|
Chen X, Faviez C, Schuck S, Lillo-Le-Louët A, Texier N, Dahamna B, Huot C, Foulquié P, Pereira S, Leroux V, Karapetiantz P, Guenegou-Arnoux A, Katsahian S, Bousquet C, Burgun A. Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate. Front Pharmacol 2018; 9:541. [PMID: 29881351 PMCID: PMC5978246 DOI: 10.3389/fphar.2018.00541] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/15/2017] [Accepted: 05/04/2018] [Indexed: 12/29/2022]
Abstract
Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety. Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus. Methods: We applied text mining methods based on named entity recognition and relation extraction to the corpus, followed by signal detection using the proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of themes in the corpus and classify the messages based on their topics. Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADRs) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of the automatic identification, and an F-measure of 0.57 was reached. Patients' reports mainly concerned neuropsychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuropsychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse. Conclusion: Named entity recognition combined with signal detection and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide a better understanding of patients' behaviors regarding drugs, including misuse.
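The proportional reporting ratio used above for signal detection compares how often an event is reported with the drug of interest versus with all other drugs. The sketch below shows the standard 2x2-table computation; the counts are invented for illustration and the signal threshold applied in the study is not given in the abstract.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 contingency table of reports (or posts).

    a: drug of interest, event of interest
    b: drug of interest, other events
    c: other drugs, event of interest
    d: other drugs, other events
    """
    rate_with_drug = a / (a + b)
    rate_with_other_drugs = c / (c + d)
    return rate_with_drug / rate_with_other_drugs


# Hypothetical counts: the event appears in 10% of posts about the drug
# and in 5% of posts about all other drugs.
prr = proportional_reporting_ratio(a=10, b=90, c=50, d=950)
print(f"PRR = {prr:.2f}")
```

A PRR well above 1 means the event is reported disproportionately often with the drug; in practice it is usually combined with a minimum case count and a significance criterion before being called a signal.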
|
70
|
Jantzen R, Looten V, Deborde T, Amar L, Bobrie G, Postel-Vinay N, Battaglia C, Tache A, Chedid A, Dhib MM, Plouin PF, Chatellier G, Rey G, Burgun A, Azizi M, Jannot AS. Chaînage de données hospitalières de patients produites en routine avec leurs données issues du registre national d’identification des personnes physiques : retour d’expérience [Linking routinely produced hospital patient data with their data from the national registry for the identification of natural persons: lessons learned]. Rev Epidemiol Sante Publique 2018. [DOI: 10.1016/j.respe.2018.03.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
71
|
Karapetiantz P, Bellet F, Audeh B, Lardon J, Leprovost D, Aboukhamis R, Morlane-Hondère F, Grouin C, Burgun A, Katsahian S, Jaulent MC, Beyens MN, Lillo-Le Louët A, Bousquet C. Descriptions of Adverse Drug Reactions Are Less Informative in Forums Than in the French Pharmacovigilance Database but Provide More Unexpected Reactions. Front Pharmacol 2018; 9:439. [PMID: 29765326 PMCID: PMC5938397 DOI: 10.3389/fphar.2018.00439] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/31/2018] [Accepted: 04/13/2018] [Indexed: 01/28/2023] Open
Abstract
Background: Social media have drawn attention for their potential use in Pharmacovigilance. Recent work showed that it is possible to extract information concerning adverse drug reactions (ADRs) from posts in social media. The main objective of the Vigi4MED project was to evaluate the relevance and quality of the information shared by patients on web forums about drug safety and its potential utility for pharmacovigilance. Methods: After selecting websites of interest, we manually evaluated the relevance of the content of posts for pharmacovigilance related to six drugs (agomelatine, baclofen, duloxetine, exenatide, strontium ranelate, and tetrazepam). We compared forums to the French Pharmacovigilance Database (FPVD) to (1) evaluate whether they contained relevant information to characterize a pharmacovigilance case report (patient’s age and sex; treatment indication, dose and duration; time-to-onset (TTO) and outcome of the ADR, and drug dechallenge and rechallenge) and (2) perform impact analysis (nature, seriousness, unexpectedness, and outcome of the ADR). Results: The cases in the FPVD were significantly more informative than posts in forums for patient description (age, sex), treatment description (dose, duration, TTO), and outcome of the ADR, but the indication for the treatment was more often found in forums. Cases were more often serious in the FPVD than in forums (46% vs. 4%), but forums more often contained an unexpected ADR than the FPVD (24% vs. 17%). Moreover, 197 unexpected ADRs identified in forums were absent from the FPVD and the distribution of the MedDRA System Organ Classes (SOCs) was different between the two data sources. Discussion: This study is the first to evaluate if patients’ posts may qualify as potential and informative case reports that should be stored in a pharmacovigilance database in the same way as case reports submitted by health professionals. 
The posts were less informative (except for the indication) and focused on less serious ADRs than the FPVD cases, but more unexpected ADRs were presented in forums than in the FPVD and their SOCs were different. Thus, web forums should be considered as a secondary, but complementary source for pharmacovigilance.
|
72
|
Garcelon N, Neuraz A, Salomon R, Faour H, Benoit V, Delapalme A, Munnich A, Burgun A, Rance B. A clinician friendly data warehouse oriented toward narrative reports: Dr. Warehouse. J Biomed Inform 2018; 80:52-63. [DOI: 10.1016/j.jbi.2018.02.019] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 09/21/2017] [Revised: 02/22/2018] [Accepted: 02/28/2018] [Indexed: 01/26/2023]
|
73
|
Ethier J, McGilchrist M, Barton A, Cloutier A, Curcin V, Delaney BC, Burgun A. The TRANSFoRm project: Experience and lessons learned regarding functional and interoperability requirements to support primary care. Learn Health Syst 2018; 2:e10037. [PMID: 31245579 PMCID: PMC6508823 DOI: 10.1002/lrh2.10037] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/08/2016] [Revised: 07/05/2017] [Accepted: 07/12/2017] [Indexed: 01/02/2023] Open
Abstract
INTRODUCTION The current model of medical knowledge production, transfer, and application suffers from serious shortcomings. Learning health systems (LHS) have recently emerged as a potential solution: systems in which health information generated from patients is continuously analyzed to improve knowledge that will be transferred to patient care. METHOD Various approaches to data integration already exist and could be considered for the implementation of an LHS. We discuss the possible informatics approaches to address the functional requirements of an LHS, in the specific context of primary care, and present the experience and lessons learned from the TRANSFoRm project. RESULT Implemented in 4 countries around 5 systems, TRANSFoRm is based on a local-as-view data mediation approach integrating the structural and terminological models in the same framework. It clearly demonstrated that it has the potential to address the requirements for an LHS in primary care, by dealing with data fragmented across multiple points of service. Also, it has the potential to support the generation of hypotheses from the context of clinical care, retrospective and prospective research, and decision support systems that improve the relevance of medical decisions. CONCLUSION The LHS approach embodies a shift from an institution-centered to a patient-centered perspective in knowledge production and transfer and can address important challenges in the primary care setting.
|
74
|
Abdellaoui R, Foulquié P, Texier N, Faviez C, Burgun A, Schück S. Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach. J Med Internet Res 2018. [PMID: 29540337 PMCID: PMC5874436 DOI: 10.2196/jmir.9222] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 11/15/2022] Open
Abstract
Background Medication nonadherence is a major impediment to the management of many health conditions. A better understanding of the factors underlying noncompliance to treatment may help health professionals to address it. Patients use peer-to-peer virtual communities and social media to share their experiences regarding their treatments and diseases. Using topic models makes it possible to model themes present in a collection of posts, and thus to identify cases of noncompliance. Objective The aim of this study was to detect messages describing patients’ noncompliant behaviors associated with a drug of interest. Thus, the objective was the clustering of posts featuring a homogeneous vocabulary related to nonadherent attitudes. Methods We focused on escitalopram and aripiprazole used to treat depression and psychotic conditions, respectively. We implemented a probabilistic topic model to identify the topics that occurred in a corpus of messages mentioning these drugs, posted from 2004 to 2013 on three of the most popular French forums. Data were collected using a Web crawler designed by Kappa Santé as part of the Detec’t project to analyze social media for drug safety. Several topics were related to noncompliance to treatment. Results Starting from a corpus of 3650 posts related to an antidepressant drug (escitalopram) and 2164 posts related to an antipsychotic drug (aripiprazole), the use of latent Dirichlet allocation allowed us to model several themes, including interruptions of treatment and changes in dosage. The topic model approach detected cases of noncompliant behavior with a recall of 98.5% (272/276) and a precision of 32.6% (272/844). Conclusions Topic models enabled us to explore patients’ discussions on community websites and to identify posts related to noncompliant behaviors. After a manual review of the messages in the noncompliance topics, we found that noncompliance to treatment was present in 6.17% (276/4469) of the posts.
|
75
|
Bodenreider O, Burgun A. Accessing and Integrating Data and Knowledge for Biomedical Research. Yearb Med Inform 2018. [DOI: 10.1055/s-0038-1638588] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022] Open
Abstract
Summary
Objectives To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and survey current efforts to address these issues.
Methods Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge.
Results New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies.
Conclusion As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
|