101
|
Rosier A, Mabo P, Temal L, Van Hille P, Dameron O, Deleger L, Grouin C, Zweigenbaum P, Jacques J, Chazard E, Laporte L, Henry C, Burgun A. Remote Monitoring of Cardiac Implantable Devices: Ontology Driven Classification of the Alerts. Stud Health Technol Inform 2016; 221:59-63. [PMID: 27071877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
The number of patients that benefit from remote monitoring of cardiac implantable electronic devices, such as pacemakers and defibrillators, is growing rapidly. Consequently, the huge number of alerts that are generated and transmitted to the physicians represents a challenge to handle. We have developed a system based on a formal ontology that integrates the alert information and the patient data extracted from the electronic health record in order to better classify the importance of alerts. A pilot study was conducted on atrial fibrillation alerts. We show some examples of alert processing. The results suggest that this approach has the potential to significantly reduce the alert burden in telecardiology. The methods may be extended to other types of connected devices.
Collapse
|
102
|
Escudié JB, Jannot AS, Zapletal E, Cohen S, Malamut G, Burgun A, Rance B. Reviewing 741 patients records in two hours with FASTVISU. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:553-559. [PMID: 26958189 PMCID: PMC4765586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
The secondary use of electronic health records opens up new perspectives. They provide researchers with structured data and unstructured data, including free text reports. Many applications been developed to leverage knowledge from free-text reports, but manual review of documents is still a complex process. We developed FASTVISU a web-based application to assist clinicians in reviewing documents. We used FASTVISU to review a set of 6340 documents from 741 patients suffering from the celiac disease. A first automated selection pruned the original set to 847 documents from 276 patients' records. The records were reviewed by two trained physicians to identify the presence of 15 auto-immune diseases. It took respectively two hours and two hours and a half to evaluate the entire corpus. Inter-annotator agreement was high (Cohen's kappa at 0.89). FASTVISU is a user-friendly modular solution to validate entities extracted by NLP methods from free-text documents stored in clinical data warehouses.
Collapse
|
103
|
Cohen S, Burgun A, Iserin L, Escudié JB. CO 10 Different tools to identify congenital heart diseases in databases: Electronic medical record and International classification of diseases. ARCHIVES OF CARDIOVASCULAR DISEASES SUPPLEMENTS 2015. [DOI: 10.1016/s1878-6480(15)30304-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
104
|
Rosier A, Mabo P, Temal L, Van Hille P, Dameron O, Deléger L, Grouin C, Zweigenbaum P, Jacques J, Chazard E, Laporte L, Henry C, Burgun A. Personalized and automated remote monitoring of atrial fibrillation. Europace 2015; 18:347-52. [DOI: 10.1093/europace/euv234] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 06/10/2015] [Indexed: 11/14/2022] Open
|
105
|
Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, Jaulent MC, Beyens MN, Burgun A, Bousquet C. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res 2015; 17:e171. [PMID: 26163365 PMCID: PMC4526988 DOI: 10.2196/jmir.4304] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Revised: 04/09/2015] [Accepted: 04/22/2015] [Indexed: 02/06/2023] Open
Abstract
Background The underreporting of adverse drug reactions (ADRs) through traditional reporting channels is a limitation in the efficiency of the current pharmacovigilance system. Patients’ experiences with drugs that they report on social media represent a new source of data that may have some value in postmarketing safety surveillance. Objective A scoping review was undertaken to explore the breadth of evidence about the use of social media as a new source of knowledge for pharmacovigilance. Methods Daubt et al’s recommendations for scoping reviews were followed. The research questions were as follows: How can social media be used as a data source for postmarketing drug surveillance? What are the available methods for extracting data? What are the different ways to use these data? We queried PubMed, Embase, and Google Scholar to extract relevant articles that were published before June 2014 and with no lower date limit. Two pairs of reviewers independently screened the selected studies and proposed two themes of review: manual ADR identification (theme 1) and automated ADR extraction from social media (theme 2). Descriptive characteristics were collected from the publications to create a database for themes 1 and 2. Results Of the 1032 citations from PubMed and Embase, 11 were relevant to the research question. An additional 13 citations were added after further research on the Internet and in reference lists. Themes 1 and 2 explored 11 and 13 articles, respectively. Ways of approaching the use of social media as a pharmacovigilance data source were identified. Conclusions This scoping review noted multiple methods for identifying target data, extracting them, and evaluating the quality of medical information from social media. It also showed some remaining gaps in the field. Studies related to the identification theme usually failed to accurately assess the completeness, quality, and reliability of the data that were analyzed from social media. Regarding extraction, no study proposed a generic approach to easily adding a new site or data source. Additional studies are required to precisely determine the role of social media in the pharmacovigilance system.
Collapse
|
106
|
Burgun A. Infrastructures de recherche pour l’accès aux bases de données médico-administratives à l’échelle nationale et européenne. Rev Epidemiol Sante Publique 2015. [DOI: 10.1016/j.respe.2015.03.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
|
107
|
Burgun A, Oksen DV, Kuchinke W, Prokosch HU, Ganslandt T, Buchan I, van Staa T, Cunningham J, Gjerstorff ML, Dufour JC, Gibrat JF, Nikolski M, Verger P, Cambon-Thomsen A, Masella C, Lettieri E, Bertele P, Salokannel M, Thiebaut R, Persoz C, Chêne G, Ohmann C. Proposal for a European Public Health Research Infrastructure for Sharing of health and Medical administrative data (PHRIMA). Stud Health Technol Inform 2015; 216:1005. [PMID: 26262306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In Europe, health and medical administrative data is increasingly accumulating on a national level. Looking further than re-use of this data on a national level, sharing health and medical administrative data would enable large-scale analyses and European-level public health projects. There is currently no research infrastructure for this type of sharing. The PHRIMA consortium proposes to realise the Public Health Research Infrastructure for Sharing of health and Medical Administrative data (PHRIMA) which will enable and facilitate the efficient and secure sharing of healthcare data.
Collapse
|
108
|
Katsahian S, Simond Moreau E, Leprovost D, Lardon J, Bousquet C, Kerdelhué G, Abdellaoui R, Texier N, Burgun A, Boussadi A, Faviez C. Evaluation of Internet Social Networks using Net scoring Tool: A Case Study in Adverse Drug Reaction Mining. Stud Health Technol Inform 2015; 210:526-530. [PMID: 25991203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
BACKGROUND AND OBJECTIVES Suspected adverse drug reactions (ADR) reported by patients through social media can be a complementary tool to already existing ADRs signal detection processes. However, several studies have shown that the quality of medical information published online varies drastically whatever the health topic addressed. The aim of this study is to use an existing rating tool on a set of social network web sites in order to assess the capabilities of these tools to guide experts for selecting the most adapted social network web site to mine ADRs. METHODS First, we reviewed and rated 132 Internet forums and social networks according to three major criteria: the number of visits, the notoriety of the forum and the number of messages posted in relation with health and drug therapy. Second, the pharmacist reviewed the topic-oriented message boards with a small number of drug names to ensure that they were not off topic. Six experts have been chosen to assess the selected internet forums using a French scoring tool: Net scoring. Three different scores and the agreement between experts according to each set of scores using weighted kappa pooled using mean have been computed. RESULTS Three internet forums were chosen at the end of the selection step. Some criteria get high score (scores 3-4) no matter the website evaluated like accessibility (45-46) or design (34-36), at the opposite some criteria always have bad scores like quantitative (40-42) and ethical aspect (43-44), hyperlinks actualization (30-33). Kappa were positives but very small which corresponds to a weak agreement between experts. CONCLUSION The personal opinion of the expert seems to have a major impact, undermining the relevance of the criterion. Our future work is to collect results given by this evaluation grid and proposes a new scoring tool for Internet social networks assessment.
Collapse
|
109
|
Pham AD, Névéol A, Lavergne T, Yasunaga D, Clément O, Meyer G, Morello R, Burgun A. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics 2014; 15:266. [PMID: 25099227 PMCID: PMC4133634 DOI: 10.1186/1471-2105-15-266] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2014] [Accepted: 07/19/2014] [Indexed: 12/21/2022] Open
Abstract
Background Natural Language Processing (NLP) has been shown effective to analyze the content of radiology reports and identify diagnosis or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. Results The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performances in all cases. Conclusions This study demonstrates the benefits of developing an automated method to identify medical concepts, modality and relations from radiology reports in French. An end-to-end automatic system for annotation and classification which could be applied to other radiology reports databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.
Collapse
|
110
|
Rosier A, Mabo P, Chauvin M, Burgun A. An ontology-based annotation of cardiac implantable electronic devices to detect therapy changes in a national registry. IEEE J Biomed Health Inform 2014; 19:971-8. [PMID: 25029524 DOI: 10.1109/jbhi.2014.2338741] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The patient population benefitting from cardiac implantable electronic devices (CIEDs) is increasing. This study introduces a device annotation method that supports the consistent description of the functional attributes of cardiac devices and evaluates how this method can detect device changes from a CIED registry. We designed the Cardiac Device Ontology, an ontology of CIEDs and device functions. We annotated 146 cardiac devices with this ontology and used it to detect therapy changes with respect to atrioventricular pacing, cardiac resynchronization therapy, and defibrillation capability in a French national registry of patients with implants (STIDEFIX). We then analyzed a set of 6905 device replacements from the STIDEFIX registry. Ontology-based identification of therapy changes (upgraded, downgraded, or similar) was accurate (6905 cases) and performed better than straightforward analysis of the registry codes (F-measure 1.00 versus 0.75 to 0.97). This study demonstrates the feasibility and effectiveness of ontology-based functional annotation of devices in the cardiac domain. Such annotation allowed a better description and in-depth analysis of STIDEFIX. This method was useful for the automatic detection of therapy changes and may be reused for analyzing data from other device registries.
Collapse
|
111
|
Ethier JF, Curcin V, Barton A, McGilchrist MM, Bastiaens H, Andreasson A, Rossiter J, Zhao L, Arvanitis TN, Taweel A, Delaney BC, Burgun A. Clinical data integration model. Core interoperability ontology for research using primary care data. Methods Inf Med 2014; 54:16-23. [PMID: 24954896 DOI: 10.3414/me13-02-0024] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2013] [Accepted: 04/23/2014] [Indexed: 12/18/2022]
Abstract
INTRODUCTION This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". BACKGROUND Primary care data is the single richest source of routine health care data. However its use, both in research and clinical work, often requires data from multiple clinical sites, clinical trials databases and registries. Data integration and interoperability are therefore of utmost importance. OBJECTIVES TRANSFoRm's general approach relies on a unified interoperability framework, described in a previous paper. We developed a core ontology for an interoperability framework based on data mediation. This article presents how such an ontology, the Clinical Data Integration Model (CDIM), can be designed to support, in conjunction with appropriate terminologies, biomedical data federation within TRANSFoRm, an EU FP7 project that aims to develop the digital infrastructure for a learning healthcare system in European Primary Care. METHODS TRANSFoRm utilizes a unified structural / terminological interoperability framework, based on the local-as-view mediation paradigm. Such an approach mandates the global information model to describe the domain of interest independently of the data sources to be explored. Following a requirement analysis process, no ontology focusing on primary care research was identified and, thus we designed a realist ontology based on Basic Formal Ontology to support our framework in collaboration with various terminologies used in primary care. RESULTS The resulting ontology has 549 classes and 82 object properties and is used to support data integration for TRANSFoRm's use cases. Concepts identified by researchers were successfully expressed in queries using CDIM and pertinent terminologies. As an example, we illustrate how, in TRANSFoRm, the Query Formulation Workbench can capture eligibility criteria in a computable representation, which is based on CDIM. CONCLUSION A unified mediation approach to semantic interoperability provides a flexible and extensible framework for all types of interaction between health record systems and research systems. CDIM, as core ontology of such an approach, enables simplicity and consistency of design across the heterogeneous software landscape and can support the specific needs of EHR-driven phenotyping research using primary care data.
Collapse
|
112
|
Lim Choi Keung SN, Zhao L, Rossiter J, McGilchrist M, Culross F, Ethier JF, Burgun A, Verheij RA, Khan N, Taweel A, Curcin V, Delaney BC, Arvanitis TN. Detailed clinical modelling approach to data extraction from heterogeneous data sources for clinical research. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:55-9. [PMID: 25954578 PMCID: PMC4419774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
The reuse of routinely collected clinical data for clinical research is being explored as part of the drive to reduce duplicate data entry and to start making full use of the big data potential in the healthcare domain. Clinical researchers often need to extract data from patient registries and other patient record datasets for data analysis as part of clinical studies. In the TRANSFoRm project, researchers define their study requirements via a Query Formulation Workbench. We use a standardised approach to data extraction to retrieve relevant information from heterogeneous data sources, using semantic interoperability enabled via detailed clinical modelling. This approach is used for data extraction from data sources for analysis and for pre-population of electronic Case Report Forms from electronic health records in primary care clinical systems.
Collapse
|
113
|
Canuel V, Rance B, Avillach P, Degoulet P, Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform 2014; 16:280-90. [PMID: 24608524 PMCID: PMC4364065 DOI: 10.1093/bib/bbu006] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
The rise of personalized medicine and the availability of high-throughput molecular analyses in the context of clinical care have increased the need for adequate tools for translational researchers to manage and explore these data. We reviewed the biomedical literature for translational platforms allowing the management and exploration of clinical and omics data, and identified seven platforms: BRISK, caTRIP, cBio Cancer Portal, G-DOC, iCOD, iDASH and tranSMART. We analyzed these platforms along seven major axes. (1) The community axis regrouped information regarding initiators and funders of the project, as well as availability status and references. (2) We regrouped under the information content axis the nature of the clinical and omics data handled by each system. (3) The privacy management environment axis encompassed functionalities allowing control over data privacy. (4) In the analysis support axis, we detailed the analytical and statistical tools provided by the platforms. We also explored (5) interoperability support and (6) system requirements. The final axis (7) platform support listed the availability of documentation and installation procedures. A large heterogeneity was observed in regard to the capability to manage phenotype information in addition to omics data, their security and interoperability features. The analytical and visualization features strongly depend on the considered platform. Similarly, the availability of the systems is variable. This review aims at providing the reader with the background to choose the platform best suited to their needs. To conclude, we discuss the desiderata for optimal translational research platforms, in terms of privacy, interoperability and technical features.
Collapse
|
114
|
Neuraz A, Chouchana L, Malamut G, Le Beller C, Roche D, Beaune P, Degoulet P, Burgun A, Loriot MA, Avillach P. Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput Biol 2013; 9:e1003405. [PMID: 24385893 PMCID: PMC3873228 DOI: 10.1371/journal.pcbi.1003405] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2013] [Accepted: 11/08/2013] [Indexed: 02/04/2023] Open
Abstract
Phenome-Wide Association Studies (PheWAS) investigate whether genetic polymorphisms associated with a phenotype are also associated with other diagnoses. In this study, we have developed new methods to perform a PheWAS based on ICD-10 codes and biological test results, and to use a quantitative trait as the selection criterion. We tested our approach on thiopurine S-methyltransferase (TPMT) activity in patients treated by thiopurine drugs. We developed 2 aggregation methods for the ICD-10 codes: an ICD-10 hierarchy and a mapping to existing ICD-9-CM based PheWAS codes. Eleven biological test results were also analyzed using discretization algorithms. We applied these methods in patients having a TPMT activity assessment from the clinical data warehouse of a French academic hospital between January 2000 and July 2013. Data after initiation of thiopurine treatment were analyzed and patient groups were compared according to their TPMT activity level. A total of 442 patient records were analyzed representing 10,252 ICD-10 codes and 72,711 biological test results. The results from the ICD-9-CM based PheWAS codes and ICD-10 hierarchy codes were concordant. Cross-validation with the biological test results allowed us to validate the ICD phenotypes. Iron-deficiency anemia and diabetes mellitus were associated with a very high TPMT activity (p = 0.0004 and p = 0.0015, respectively). We describe here an original method to perform PheWAS on a quantitative trait using both ICD-10 diagnosis codes and biological test results to identify associated phenotypes. In the field of pharmacogenomics, PheWAS allow for the identification of new subgroups of patients who require personalized clinical and therapeutic management. The use of underlying molecular mechanisms and other factors to describe and classify diseases is a major challenge for future treatment strategies. New methods are needed to achieve this goal. The phenome wide association study (PheWAS) methodology was initially developed to unveil unknown associations between a specific genetic status and phenotypic features (e.g. diagnoses from electronic health records). We initially propose to extend this method to assessment of the relationships between the levels of a quantitative trait and diagnosis codes. We also assess the relationships between this quantitative trait and the biological test results. We tested this method using the levels of enzymatic activity of thiopurine S-methyltransferase (TPMT) that is involved in the metabolism of thiopurine drugs used in inflammatory bowel diseases for example. We discovered an association between a very high TPMT activity and nutritional anemia and diabetes. These results could be used to describe a new subgroup of patients in order to optimize drug treatments.
Collapse
|
115
|
Dameron O, Besana P, Zekri O, Bourdé A, Burgun A, Cuggia M. OWL model of clinical trial eligibility criteria compatible with partially-known information. J Biomed Semantics 2013; 4:17. [PMID: 24034867 PMCID: PMC3852288 DOI: 10.1186/2041-1480-4-17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2013] [Accepted: 08/21/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clinical trials are important for patients, for researchers and for companies. One of the major bottlenecks is patient recruitment. This task requires the matching of a large volume of information about the patient with numerous eligibility criteria, in a logically-complex combination. Moreover, some of the patient's information necessary to determine the status of the eligibility criteria may not be available at the time of pre-screening. RESULTS We showed that the classic approach based on negation as failure over-estimates rejection when confronted with partially-known information about the eligibility criteria because it ignores the distinction between a trial for which patient eligibility should be rejected and trials for which patient eligibility cannot be asserted. We have also shown that 58.64% of the values were unknown in the 286 prostate cancer cases examined during the weekly urology multidisciplinary meetings at Rennes' university hospital between October 2008 and March 2009.We propose an OWL design pattern for modeling eligibility criteria based on the open world assumption to address the missing information problem. We validate our model on a fictitious clinical trial and evaluate it on two real clinical trials. Our approach successfully distinguished clinical trials for which the patient is eligible, clinical trials for which we know that the patient is not eligible and clinical trials for which the patient may be eligible provided that further pieces of information (which we can identify) can be obtained. CONCLUSIONS OWL-based reasoning based on the open world assumption provides an adequate framework for distinguishing those patients who can confidently be rejected from those whose status cannot be determined. The expected benefits are a reduction of the workload of the physicians and a higher efficiency by allowing them to focus on the patients whose eligibility actually require expertise.
Collapse
|
116
|
Ethier JF, Dameron O, Curcin V, McGilchrist MM, Verheij RA, Arvanitis TN, Taweel A, Delaney BC, Burgun A. A unified structural/terminological interoperability framework based on LexEVS: application to TRANSFoRm. J Am Med Inform Assoc 2013; 20:986-94. [PMID: 23571850 PMCID: PMC3756256 DOI: 10.1136/amiajnl-2012-001312] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2012] [Revised: 03/10/2013] [Accepted: 03/14/2013] [Indexed: 12/17/2022] Open
Abstract
OBJECTIVE Biomedical research increasingly relies on the integration of information from multiple heterogeneous data sources. Despite the fact that structural and terminological aspects of interoperability are interdependent and rely on a common set of requirements, current efforts typically address them in isolation. We propose a unified ontology-based knowledge framework to facilitate interoperability between heterogeneous sources, and investigate if using the LexEVS terminology server is a viable implementation method. MATERIALS AND METHODS We developed a framework based on an ontology, the general information model (GIM), to unify structural models and terminologies, together with relevant mapping sets. This allowed a uniform access to these resources within LexEVS to facilitate interoperability by various components and data sources from implementing architectures. RESULTS Our unified framework has been tested in the context of the EU Framework Program 7 TRANSFoRm project, where it was used to achieve data integration in a retrospective diabetes cohort study. The GIM was successfully instantiated in TRANSFoRm as the clinical data integration model, and necessary mappings were created to support effective information retrieval for software tools in the project. CONCLUSIONS We present a novel, unifying approach to address interoperability challenges in heterogeneous data sources, by representing structural and semantic models in one framework. Systems using this architecture can rely solely on the GIM that abstracts over both the structure and coding. Information models, terminologies and mappings are all stored in LexEVS and can be accessed in a uniform manner (implementing the HL7 CTS2 service functional model). The system is flexible and should reduce the effort needed from data sources personnel for implementing and managing the integration.
Collapse
|
117
|
Bettembourg C, Diot C, Burgun A, Dameron O. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms. J Biomed Semantics 2012; 3:7. [PMID: 22958570 PMCID: PMC3599846 DOI: 10.1186/2041-1480-3-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Accepted: 08/26/2012] [Indexed: 11/27/2022] Open
Abstract
Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. Conclusions We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that both tools are complementary. The analysis of the randomly-generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org.
Collapse
|
118
|
Abstract
Ontology and associated generic tools are appropriate for knowledge modeling and reasoning, but most of the time, disease definitions in existing description logic (DL) ontology are not sufficient to classify patient's characteristics under a particular disease because they do not formalize operational definitions of diseases (association of signs and symptoms=diagnostic criteria). The main objective of this study is to propose an ontological representation which takes into account the diagnostic criteria on which specific patient conditions may be classified under a specific disease. This method needs as a prerequisite a clear list of necessary and sufficient diagnostic criteria as defined for lots of diseases by learned societies. It does not include probability/uncertainty which Web Ontology Language (OWL 2.0) cannot handle. We illustrate it with spondyloarthritis (SpA). Ontology has been designed in Protégé 4.1 OWL-DL2.0. Several kinds of criteria were formalized: (1) mandatory criteria, (2) picking two criteria among several diagnostic criteria, (3) numeric criteria. Thirty real patient cases were successfully classified with the reasoner. This study shows that it is possible to represent operational definitions of diseases with OWL and successfully classify real patient cases. Representing diagnostic criteria as descriptive knowledge (instead of rules in Semantic Web Rule Language or Prolog) allows us to take advantage of tools already available for OWL. While we focused on Assessment of SpondyloArthritis international Society SpA criteria, we believe that many of the representation issues addressed here are relevant to using OWL-DL for operational definition of other diseases in ontology.
Collapse
|
119
|
Lasbleiz J, Morelli J, Schnel N, Burgun A, Duvauferrier R, Saint Jalmes H. MRI image artifact ontology: a proposed method for improved recognition. Stud Health Technol Inform 2012; 180:103-107. [PMID: 22874161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
UNLABELLED Magnetic Resonance Imaging is an essential diagnostic imaging technique. The complexity of interpreting MRI images is often compounded by the presence of a wide range of artifacts which are often challenging to identify and eliminate. Ontology permits the construction of a knowledge database with which users can interact given an appropriate interface. The goal of this work is to create an interactive tool for the ontology of MRI artifacts that will allow a radiologist to compare any given MRI artifact image with those contained in the ontology. MATERIAL AND METHOD Using Protégé 4, we have constructed the ontology with input from an expert in MRI artifacts and utilizing images exemplifying such artifacts. The graphical user interface has been built in Java and the linkage with the ontology made with Owl API. RESULTS Using the tool, users can compare imaging artifacts encountered in daily practice to those in the database. Once a user has identified the image the most similar to their own, they then have instantaneous access to the knowledge contained in the ontology about the artifact. Individual users can also submit images and have access to DICOM data.
Collapse
|
120
|
Van Hille P, Jacques J, Taillard J, Rosier A, Delerue D, Burgun A, Dameron O. Comparing Drools and ontology reasoning approaches for telecardiology decision support. Stud Health Technol Inform 2012; 180:300-304. [PMID: 22874200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Implantable cardioverter defibrillators can generate numerous alerts. Automatically classifying these alerts according to their severity hinges on the CHA2DS2VASc score. It requires some reasoning capabilities for interpreting the patient's data. We compared two approaches for implementing the reasoning module. One is based on the Drools engine, and the other is based on semantic web formalisms. Both were valid approaches with correct performances. For a broader domain, their limitations are the number and complexity of Drools rules and the performances of ontology-based reasoning, which suggests using the ontology for automatically generating a part of the Drools rules.
Collapse
|
121
|
Grouin C, Deléger L, Rosier A, Temal L, Dameron O, Van Hille P, Burgun A, Zweigenbaum P. Automatic computation of CHA2DS2-VASc score: information extraction from clinical texts for thromboembolism risk assessment. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:501-10. [PMID: 22195104 PMCID: PMC3243195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores trigger the same alert, which means that system performance on this test set yields similar results to human reading of the texts.
Collapse
|
122
|
Jadeau F, Barthelaix A, Burgun A, Deleval C, Garcelon N, Gelé P, Libersa C, Savonnet M, Tabone É, Combet C. L’annuaire des centres de ressources biologiques français. Med Sci (Paris) 2011; 27:895-7. [DOI: 10.1051/medsci/20112710019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
123
|
Jouhet V, Defossez G, Burgun A, le Beux P, Levillain P, Ingrand P, Claveau V. Automated classification of free-text pathology reports for registration of incident cases of cancer. Methods Inf Med 2011; 51:242-51. [PMID: 21792466 DOI: 10.3414/me11-01-0005] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2011] [Accepted: 05/30/2011] [Indexed: 11/09/2022]
Abstract
OBJECTIVE Our study aimed to construct and evaluate functions called "classifiers", produced by supervised machine learning techniques, in order to categorize automatically pathology reports using solely their content. METHODS Patients from the Poitou-Charentes Cancer Registry having at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weighting function accounting for the distribution of terms among targeted classes was developed and compared to classic methods based on inverse document frequencies. The classification was performed with support vector machine (SVM) and Naive Bayes classifiers. Two levels of granularity were tested for both the topographical and the morphological axes of the ICD-O3 code. The ability to correctly attribute a precise ICD-O3 code and the ability to attribute the broad category defined by the International Agency for Research on Cancer (IARC) for the multiple primary cancer registration rules were evaluated using F1-measures. RESULTS 5121 pathology reports produced by 35 pathologists were selected. The best performance was achieved by our class-weighted descriptor, associated with a SVM classifier. Using this method, the pathology reports were properly classified in the IARC categories with F1-measures of 0.967 for both topography and morphology. The ICD-O3 code attribution had lower performance with a 0.715 F1-measure for topography and 0.854 for morphology. CONCLUSION These results suggest that free-text pathology reports could be useful as a data source for automated systems in order to identify and notify new cases of cancer. Future work is needed to evaluate the improvement in performance obtained from the use of natural language processing, including the case of multiple tumor description and possible incorporation of other medical documents such as surgical reports.
Collapse
|
124
|
Brochhausen M, Burgun A, Ceusters W, Hasman A, Leong TY, Musen M, Oliveira JL, Peleg M, Rector A, Schulz S. Discussion of "biomedical ontologies: toward scientific debate". Methods Inf Med 2011; 50:217-236. [PMID: 21566855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
|
125
|
Lasbleiz J, Saint-Jalmes H, Duvauferrier R, Burgun A. Creating a magnetic resonance imaging ontology. Stud Health Technol Inform 2011; 169:784-8. [PMID: 21893854 PMCID: PMC3391186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
The goal of this work is to build an ontology of Magnetic Resonance Imaging. The MRI domain has been analysed regarding MRI simulators and the DICOM standard. Tow MRI simulators have been analysed: JEMRIS, which is developed in XML and C++, has a hierarchical organisation and SIMRI, which is developed in C, has a good representation of MRI physical processes. To build the ontology we have used Protégé 4, owl2 that allows quantitative representations. The ontology has been validated by a reasoner (Fact++) and by a good representation of DICOM headers and of MRI processes. The MRI ontology would improved MRI simulators and eased semantic interoperability.
Collapse
|