1
Kartheeswaran KP, Rayan AXA, Varrieth GT. Enhanced disease-disease association with information enriched disease representation. Mathematical Biosciences and Engineering: MBE 2023; 20:8892-8932. [PMID: 37161227 DOI: 10.3934/mbe.2023391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
OBJECTIVE Quantification of disease-disease association (DDA) enables the understanding of disease relationships for discovering disease progression and finding comorbidity. For effective DDA strength calculation, the main challenge is to integrate the various biomedical aspects of DDA into an information-rich disease representation. MATERIALS AND METHODS An enhanced and integrated DDA framework is developed that integrates an enriched literature-based DDA representation with a concept-based one. The literature component of the proposed framework uses PubMed abstracts and consists of an improved neural network model that classifies DDAs for an enhanced literature-based DDA representation. Similarly, an ontology-based joint multi-source association embedding model is proposed in the ontology component using Disease Ontology (DO), UMLS, insurance claims, clinical notes, etc. RESULTS AND DISCUSSION The obtained information-rich disease representation is evaluated on datasets covering different aspects of DDA, such as Gene, Variant, Gene Ontology (GO) and a human-rated benchmark dataset. The DDA scores calculated using the proposed method achieved a high correlation, mainly on the gene-based dataset. The quantified scores also showed a better correlation of 0.821 when evaluated on 213 human-rated disease pairs. In addition, the generated disease representation is shown to have a substantial effect on the correlation of DDA scores for different categories of disease pairs. CONCLUSION The enhanced context and semantic DDA framework provides an enriched disease representation, resulting in highly correlated results with different DDA datasets. We have also presented the biological interpretation of disease pairs. The developed framework can also be used for deriving the strength of other biomedical associations.
2
Chen DTL, Chang JPC, Cheng SW, Chang HC, Hsu JH, Chang HH, Chiu WC, Su KP. Kawasaki disease in childhood and psychiatric disorders: A population-based case-control prospective study in Taiwan. Brain Behav Immun 2022; 100:105-111. [PMID: 34848339 DOI: 10.1016/j.bbi.2021.11.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Revised: 11/09/2021] [Accepted: 11/22/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND Kawasaki disease (KD) is a common acute inflammatory disease of childhood and potentially triggers chronic inflammation. Although some studies have investigated neurodevelopmental consequences following KD, the findings have been inconsistent. This is the first population-based study focused on KD and common psychiatric disorders. OBJECTIVES We aimed to investigate the association between KD and psychiatric disorders and hypothesized that standard anti-inflammatory treatment with intravenous immunoglobulin (IVIG) may protect against the development of psychiatric disorders. METHOD We retrieved data from Taiwan's National Health Insurance Research Database (NHIRD). Patients (n = 282,513) with psychiatric disorders (the case group) during 1997-2013 were included, and the control group was matched by age, sex, income and urbanization (1:1). We calculated the prevalence of KD in both groups and estimated odds ratios (ORs) and 95% confidence intervals (CIs) in subgroup analyses of KD by age, severity, and common psychiatric comorbidity. RESULTS There were 460 patients with KD among the cases and 380 among the controls (p = .006), giving a crude OR for KD of 1.21 (95% CI = 1.06-1.39, p = .006) in the case group relative to the control group. KD patients without IVIG treatment were more numerous among the cases (n = 126) than among the controls (n = 54), with an OR of 2.33 (95% CI = 1.70-3.21, p < .0001). Subgroup analyses showed that KD survivors were at significant risk for autism spectrum disorders (ASD) (OR = 2.15, 95% CI = 1.27-3.65; p = .005) and attention deficit and hyperactivity disorders (ADHD) (OR = 1.19, 95% CI = 1.02-1.39; p = 0.03), with a trend toward increased risk for anxiety disorders (OR = 1.36, 95% CI = 0.99-1.86; p = 0.05). CONCLUSIONS Patients with KD were more likely to have comorbid psychiatric disorders, including ASD and ADHD. Moreover, anti-inflammatory treatment with IVIG may have potential prophylactic effects against the development of psychiatric disorders.
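The crude odds ratios quoted in this abstract can be checked from the counts it reports. The following is a minimal sketch, not the authors' analysis: it assumes the control group has the same size as the case group (282,513, implied by the 1:1 matching) and uses a Wald interval on the log odds ratio, which the abstract does not state. The function name `odds_ratio_ci` and the constant `N_PER_GROUP` are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, n_cases, exposed_controls, n_controls, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    Assumption: a Wald interval on the log odds ratio; the source abstract
    does not say which interval method was used.
    """
    a = exposed_cases                   # cases with the exposure (KD)
    b = n_cases - exposed_cases         # cases without the exposure
    c = exposed_controls                # controls with the exposure
    d = n_controls - exposed_controls   # controls without the exposure
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Assumption: the control group size equals the case group size (1:1 matching).
N_PER_GROUP = 282_513

overall = odds_ratio_ci(460, N_PER_GROUP, 380, N_PER_GROUP)  # all KD patients
no_ivig = odds_ratio_ci(126, N_PER_GROUP, 54, N_PER_GROUP)   # KD without IVIG

for label, (or_, lo, hi) in [("KD overall", overall), ("KD without IVIG", no_ivig)]:
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions the overall estimate rounds to the reported OR of 1.21 (95% CI 1.06-1.39). A matched analysis, such as conditional logistic regression, could give slightly different values.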
Affiliation(s)
- Daniel Tzu-Li Chen
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan; Department of Psychiatry and Mind-Body Interface Laboratory (MBI-Lab), China Medical University Hospital, Taichung, Taiwan; Graduate Institute of Biomedicine, College of Medicine, China Medical University, Taichung, Taiwan
- Jane Pei-Chen Chang
  - Department of Psychiatry and Mind-Body Interface Laboratory (MBI-Lab), China Medical University Hospital, Taichung, Taiwan; Graduate Institute of Biomedicine, College of Medicine, China Medical University, Taichung, Taiwan; School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan
- Szu-Wei Cheng
  - Department of Psychiatry and Mind-Body Interface Laboratory (MBI-Lab), China Medical University Hospital, Taichung, Taiwan; School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan
- Hui-Chih Chang
  - Department of Psychiatry and Mind-Body Interface Laboratory (MBI-Lab), China Medical University Hospital, Taichung, Taiwan
- Jong-Hau Hsu
  - Department of Pediatrics, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hen-Hong Chang
  - Graduate Institute of Integrated Medicine, College of Chinese Medicine, and Chinese Medicine Research Center, China Medical University, Taiwan; Department of Chinese Medicine, China Medical University Hospital, Taichung, Taiwan
- Wei-Che Chiu
  - School of Medicine, Fu Jen Catholic University, Taipei, Taiwan; Department of Psychiatry, Cathay General Hospital, Taipei, Taiwan
- Kuan-Pin Su
  - Department of Psychiatry and Mind-Body Interface Laboratory (MBI-Lab), China Medical University Hospital, Taichung, Taiwan; Graduate Institute of Biomedicine, College of Medicine, China Medical University, Taichung, Taiwan; School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan; An-Nan Hospital, China Medical University, Tainan, Taiwan
3
Agmon S, Gillis P, Horvitz E, Radinsky K. Gender-sensitive word embeddings for healthcare. J Am Med Inform Assoc 2021; 29:415-423. [PMID: 34918101 DOI: 10.1093/jamia/ocab279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 11/30/2021] [Accepted: 12/10/2021] [Indexed: 11/14/2022]
Abstract
OBJECTIVE To analyze gender bias in clinical trials, to design an algorithm that mitigates the effects of biases of gender representation on natural language processing (NLP) systems trained on text drawn from clinical trials, and to evaluate its performance. MATERIALS AND METHODS We analyze gender bias in clinical trials described by 16 772 PubMed abstracts (2008-2018). We present a method to augment word embeddings, the core building block of NLP-centric representations, by weighting abstracts by the number of women participants in the trial. We evaluate the performance of the resulting gender-sensitive embeddings on several clinical prediction tasks: comorbidity classification, hospital length of stay prediction, and intensive care unit (ICU) readmission prediction. RESULTS For female patients, the gender-sensitive model's area under the receiver operating characteristic curve (AUROC) is 0.86 versus the baseline of 0.81 for comorbidity classification, its mean absolute error is 4.59 versus the baseline of 4.66 for length of stay prediction, and its AUROC is 0.69 versus 0.67 for ICU readmission. All results are statistically significant. DISCUSSION Women have been underrepresented in clinical trials. Thus, using the broad clinical trials literature as training data for statistical language models could result in biased models, with deficits in knowledge about women. The method presented enables gender-sensitive use of publications as training data for word embeddings. In experiments, the gender-sensitive embeddings show better performance than baseline embeddings for the clinical tasks studied. The results highlight opportunities for recognizing and addressing gender and other representational biases in the clinical trials literature. CONCLUSION Addressing representational biases in data for training NLP embeddings can lead to better results on downstream tasks for underrepresented populations.
Affiliation(s)
- Shunit Agmon
  - Computer Science Faculty, Technion - Israel Institute of Technology, Haifa, Israel
- Plia Gillis
  - Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Kira Radinsky
  - Computer Science Faculty, Technion - Israel Institute of Technology, Haifa, Israel
4
Robinson C, Lao F, Chanchlani R, Gayowsky A, Darling E, Batthish M. Long-term hearing and neurodevelopmental outcomes following Kawasaki disease: A population-based cohort study. Brain Dev 2021; 43:735-744. [PMID: 33824025 DOI: 10.1016/j.braindev.2021.03.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/21/2020] [Revised: 03/05/2021] [Accepted: 03/07/2021] [Indexed: 12/20/2022]
Abstract
BACKGROUND Kawasaki disease (KD) incidence is increasing in Ontario. Cardiovascular sequelae following KD are well-described. However, there are limited data on non-cardiovascular outcomes. OBJECTIVES To determine the risk of hearing loss, anxiety, developmental disorders, intellectual disabilities and attention-deficit/hyperactivity disorder (ADHD) among KD survivors vs. non-exposed children. METHODS We included all Ontario children (≤18 yr) surviving hospitalization with a KD diagnosis between 1995 and 2018, using population-based health administrative databases. We excluded children with prior KD diagnoses and non-residents. KD cases were matched with 100 non-exposed children by age, sex and year. Follow-up continued until death or March 2019. We calculated the prevalence, incidence and adjusted hazard ratios (aHR [95%CI]) of outcomes between 0-1 yr, 1-5 yr, 5-10 yr and >10 yr follow-up. RESULTS Among 4597 KD survivors, 364 (7.9%) were diagnosed with hearing loss, 1213 (26.4%) anxiety disorders, 398 (8.7%) developmental disorders, 51 (1.1%) intellectual disability and 21 (0.5%) ADHD, during median 11 year follow-up. Compared to 459,700 non-exposed children, KD survivors were not at increased risk of hearing loss after adjustment for potential confounders. KD survivors were at increased risk of anxiety disorders between 0-1 yr (aHR 1.75 [1.46-2.10]), 1-5 yr (aHR 1.13 [1.01-1.28]), 5-10 yr (aHR 1.14 [1.03-1.28]) and >10 yr (aHR 1.11 [1.02-1.22]); developmental disorders between 0-1 yr (aHR 1.49 [1.28-1.74]) and 1-5 yr (aHR 1.19 [1.02-1.40]); intellectual disabilities >10 yr (aHR 2.36 [1.36-4.10]); and ADHD >10 yr (aHR 2.01 [1.14-3.57]). CONCLUSIONS KD survivors are at increased risk of being diagnosed with anxiety disorders sooner, being diagnosed with developmental disorders between 0 and 5 yr and being diagnosed with intellectual disabilities or ADHD >10 yr after KD diagnosis. This may justify enhanced developmental and audiological surveillance of KD survivors.
Affiliation(s)
- Cal Robinson
  - Department of Pediatrics, McMaster University, Hamilton, ON, Canada; Division of Nephrology, Department of Paediatrics, The Hospital for Sick Children, Toronto, ON, Canada
- Francis Lao
  - Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Rahul Chanchlani
  - Division of Nephrology, Department of Pediatrics, McMaster University, Hamilton, ON, Canada; Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada; ICES McMaster, Hamilton, Ontario, Canada
- Elizabeth Darling
  - McMaster Midwifery Research Centre, McMaster University, Hamilton, ON, Canada
- Michelle Batthish
  - Division of Rheumatology, Department of Pediatrics, McMaster University, Hamilton, ON, Canada
5
Huang L, Luo H, Li S, Wu FX, Wang J. Drug-drug similarity measure and its applications. Brief Bioinform 2020; 22:5956929. [PMID: 33152756 DOI: 10.1093/bib/bbaa265] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/01/2020] [Revised: 09/13/2020] [Accepted: 09/14/2020] [Indexed: 02/01/2023]
Abstract
Drug similarities play an important role in modern biology and medicine, as they help scientists gain deep insights into drugs' therapeutic mechanisms and conduct wet-lab experiments that may significantly improve the efficiency of drug research and development. A number of drug-related databases have now been constructed, and many methods built on them have been developed for computing similarities between drugs in order to study associations between drugs, human diseases, proteins (drug targets) and more. In this review, we first briefly introduce the publicly available drug-related databases. Second, we summarize in detail the similarity calculation methods based on different drug features, interaction relationships and multimodal data. Then, we discuss the applications of drug similarities in various biological and medical areas. Finally, we evaluate drug similarity calculation methods with common evaluation metrics to illustrate the important roles of drug similarity measures in different applications.
Affiliation(s)
- Lan Huang
  - Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, China
- Huimin Luo
  - School of Computer and Information Engineering at Henan University, Kaifeng, China
- Suning Li
  - Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Fang-Xiang Wu
  - College of Engineering and Department of Computer Sciences, University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, China
6
Ronzano F, Gutiérrez-Sacristán A, Furlong LI. Comorbidity4j: a tool for interactive analysis of disease comorbidities over large patient datasets. Bioinformatics 2020; 35:3530-3532. [PMID: 30689768 DOI: 10.1093/bioinformatics/btz061] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/09/2018] [Revised: 12/09/2018] [Accepted: 01/22/2019] [Indexed: 12/22/2022]
Abstract
SUMMARY Driven by the growing availability of Electronic Health Records for data mining, the identification of relevant patterns of co-occurring diseases over a population of individuals, referred to as comorbidity analysis, has become a common practice due to its great impact on life expectancy, quality of life and healthcare costs. In this scenario, the availability of scalable, easy-to-use software frameworks tailored to support the study of comorbidities over large datasets of patients is essential. We introduce Comorbidity4j, an open-source Java tool to perform systematic analyses of comorbidities by generating interactive Web visualizations to explore and refine results. Comorbidity4j processes user-provided clinical data by identifying significant disease co-occurrences and computing a comprehensive set of comorbidity indices. Patients can be stratified by sex, age and user-defined criteria. Comorbidity4j supports the analysis of the temporal directionality and the sex ratio of diseases. The incremental upload and validation of clinical input data and the customization of comorbidity analyses are performed through an interactive Web interface. With a Web browser, the results of such analyses can be filtered by comorbidity index and disease name and explored by means of heat maps and network charts of disease associations. Comorbidity4j is optimized to efficiently process large clinical datasets. Besides a software tool for local execution, we provide Comorbidity4j as a Web service to enable users to perform online comorbidity analyses. AVAILABILITY AND IMPLEMENTATION Doc: http://comorbidity4j.readthedocs.io/; Source code: https://github.com/fra82/comorbidity4j; Web tool: http://comorbidity.eu/comorbidity4web/.
Affiliation(s)
- Francesco Ronzano
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laura I Furlong
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), Barcelona, Spain
7
Prakash AV, Park JW, Seong JW, Kang TJ. Repositioned Drugs for Inflammatory Diseases such as Sepsis, Asthma, and Atopic Dermatitis. Biomol Ther (Seoul) 2020; 28:222-229. [PMID: 32133828 PMCID: PMC7216745 DOI: 10.4062/biomolther.2020.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/01/2020] [Revised: 02/11/2020] [Accepted: 02/17/2020] [Indexed: 12/14/2022]
Abstract
Drug discovery and development consume billions of dollars to bring a new drug to the market. Drug development is time-consuming, and failure rates are sometimes high. Thus, the pharmaceutical industry is looking for a better option for new drug discovery. Drug repositioning is a good alternative that has demonstrated many advantages over de novo drug development, the most important being shorter development timelines. In the last two decades, drug repositioning has made a tremendous impact on drug development technologies. In this review, we focus on the recent advances in drug repositioning technologies and discuss the repositioned drugs used for inflammatory diseases such as sepsis, asthma, and atopic dermatitis.
Affiliation(s)
- Annamneedi Venkata Prakash
  - Convergence Research Center, Department of Pharmacy and Institute of Chronic Disease, Sahmyook University, Seoul 01795, Republic of Korea
- Jun Woo Park
  - Convergence Research Center, Department of Pharmacy and Institute of Chronic Disease, Sahmyook University, Seoul 01795, Republic of Korea
- Ju-Won Seong
  - Convergence Research Center, Department of Pharmacy and Institute of Chronic Disease, Sahmyook University, Seoul 01795, Republic of Korea
- Tae Jin Kang
  - Convergence Research Center, Department of Pharmacy and Institute of Chronic Disease, Sahmyook University, Seoul 01795, Republic of Korea
8
Electronic health records for the diagnosis of rare diseases. Kidney Int 2020; 97:676-686. [DOI: 10.1016/j.kint.2019.11.037] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/15/2019] [Revised: 11/15/2019] [Accepted: 11/22/2019] [Indexed: 01/13/2023]
9
Chaganti S, Welty VF, Taylor W, Albert K, Failla MD, Cascio C, Smith S, Mawn L, Resnick SM, Beason-Held LL, Bagnato F, Lasko T, Blume JD, Landman BA. Discovering novel disease comorbidities using electronic medical records. PLoS One 2019; 14:e0225495. [PMID: 31774837 PMCID: PMC6880990 DOI: 10.1371/journal.pone.0225495] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/07/2019] [Accepted: 09/22/2019] [Indexed: 11/18/2022]
Abstract
Increasing reliance on electronic medical records at large medical centers provides unique opportunities to perform population level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are (now) well known to the scientific community, there is no doubt many more relationships are still to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from real-time PubMed abstracts to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which better identify associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer’s Disease, and Optic Neuritis. Our results demonstrate wide utility for identifying new associations in EMR data that have the highest priority among the complex web of correlations and causalities. Data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
Affiliation(s)
- Shikha Chaganti
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
- Valerie F. Welty
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America
- Warren Taylor
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Kimberly Albert
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Michelle D. Failla
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Carissa Cascio
  - Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Seth Smith
  - Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Louise Mawn
  - Department of Ophthalmology and Visual Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Susan M. Resnick
  - Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America
- Lori L. Beason-Held
  - Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America
- Francesca Bagnato
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Thomas Lasko
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jeffrey D. Blume
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America
- Bennett A. Landman
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
10
Gutiérrez-Sacristán A, Bravo À, Giannoula A, Mayer MA, Sanz F, Furlong LI. comoRbidity: an R package for the systematic analysis of disease comorbidities. Bioinformatics 2019; 34:3228-3230. [PMID: 29897411 PMCID: PMC6137966 DOI: 10.1093/bioinformatics/bty315] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/01/2017] [Accepted: 04/19/2018] [Indexed: 12/11/2022]
Abstract
Motivation The study of comorbidities is a major priority due to their impact on life expectancy, quality of life and healthcare cost. The availability of electronic health records (EHRs) for data mining offers the opportunity to discover disease associations and comorbidity patterns from the clinical history of patients gathered during routine medical care. This creates a need for analytical tools for the detection of disease comorbidities, including the investigation of their underlying genetic basis. Results We present comoRbidity, an R package aimed at providing a systematic and comprehensive analysis of disease comorbidities from both the clinical and molecular perspectives. comoRbidity leverages (i) user-provided clinical data from EHR databases (the clinical comorbidity analysis) and (ii) genotype-phenotype information on the diseases under study (the molecular comorbidity analysis) for a comprehensive analysis of disease comorbidities. The clinical comorbidity analysis enables the identification of significant disease comorbidities from clinical data, including sex and age stratification and temporal directionality analyses, while the molecular comorbidity analysis supports the generation of hypotheses on the underlying mechanisms of the disease comorbidities by exploring shared genes among disorders. The open-source comoRbidity package is a software tool aimed at expediting the integrative analysis of disease comorbidities by incorporating several analytical and visualization functions. Availability and implementation https://bitbucket.org/ibi_group/comorbidity Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alba Gutiérrez-Sacristán
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Àlex Bravo
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Large-Scale Text Understanding Systems Lab, TALN Research Group, Department of Information and Communication Technologies (DTIC), Universitat Pompeu Fabra, Barcelona, Spain
- Alexia Giannoula
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Miguel A Mayer
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laura I Furlong
  - Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
11
Vizza P, Tradigo G, Guzzi PH, Curia R, Sisca L, Aiello F, Fragomeni G, Cannataro M, Cascini GL, Veltri P. An Innovative Framework for Bioimage Annotation and Studies. Interdiscip Sci 2018; 10:544-557. [PMID: 29094319 DOI: 10.1007/s12539-017-0264-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/07/2017] [Revised: 09/11/2017] [Accepted: 09/13/2017] [Indexed: 06/07/2023]
Abstract
The collection and analysis of clinical data are needed to investigate diseases and to define medical protocols and treatments. Bioimages, medical annotations and patient history are clinical data acquired and studied to reach a correct diagnosis and to propose an appropriate therapy. Currently, hospital departments manage these data using legacy systems, which often do not allow data integration among different departments or health structures. Thus, in many cases clinical information sharing and exchange are difficult to implement. This is also the case for biomedical images, for which data integration or data overlapping is usually not available. Image annotation and comparison can be crucial for physicians in many case studies. In this paper, a general-purpose framework for bioimage management and annotation is proposed. Moreover, a simple-to-use information system has been developed to integrate clinical and diagnosis codes. The framework allows physicians (1) to integrate DICOM images from different platforms and (2) to record notes and highlights directly on images, thus offering, among other things, the ability to query and compare similar clinical cases. This contribution is the result of a framework aimed at supporting oncologists in managing DICOM images and clinical data from different departments. Data integration is performed using an XML-based module proposed here, which is also used to trace temporal changes in image annotations.
Affiliation(s)
- Patrizia Vizza
  - Department of Surgical and Medical Science, Magna Graecia University, Catanzaro, Italy
- Giuseppe Tradigo
  - Department of Computer, Modeling, Electronics and Systems Engineering, University of Calabria, Cosenza, Italy
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Science, Magna Graecia University, Catanzaro, Italy
- Gionata Fragomeni
  - Department of Surgical and Medical Science, Magna Graecia University, Catanzaro, Italy
- Mario Cannataro
  - Department of Surgical and Medical Science, Magna Graecia University, Catanzaro, Italy
- Giuseppe Lucio Cascini
  - Department of Experimental and Clinical Medicine, Magna Graecia University, Catanzaro, Italy
- Pierangelo Veltri
  - Department of Surgical and Clinical Science, University Magna Graecia of Catanzaro, Catanzaro, Italy
12
Hemingway H, Asselbergs FW, Danesh J, Dobson R, Maniadakis N, Maggioni A, van Thiel GJM, Cronin M, Brobert G, Vardas P, Anker SD, Grobbee DE, Denaxas S. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. Eur Heart J 2018; 39:1481-1495. [PMID: 29370377 PMCID: PMC6019015 DOI: 10.1093/eurheartj/ehx487] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Received: 06/01/2017] [Revised: 07/19/2017] [Accepted: 08/08/2017] [Indexed: 12/13/2022]
Abstract
Aims Cohorts of millions of people's health records, whole genome sequencing, imaging, sensor, societal and publicly available data present a rapidly expanding digital trace of health. We aimed to critically review, for the first time, the challenges and potential of big data across early and late stages of translational cardiovascular disease research. Methods and results We sought exemplars based on literature reviews and expertise across the BigData@Heart Consortium. We identified formidable challenges including: data quality, knowing what data exist, the legal and ethical framework for their use, data sharing, building and maintaining public trust, developing standards for defining disease, developing tools for scalable, replicable science and equipping the clinical and scientific work force with new inter-disciplinary skills. Opportunities claimed for big health record data include: richer profiles of health and disease from birth to death and from the molecular to the societal scale; accelerated understanding of disease causation and progression, discovery of new mechanisms and treatment-relevant disease sub-phenotypes, understanding health and diseases in whole populations and whole health systems and returning actionable feedback loops to improve (and potentially disrupt) existing models of research and care, with greater efficiency. In early translational research we identified exemplars including: discovery of fundamental biological processes e.g. linking exome sequences to lifelong electronic health records (EHR) (e.g. human knockout experiments); drug development: genomic approaches to drug target validation; precision medicine: e.g. DNA integrated into hospital EHR for pre-emptive pharmacogenomics. 
In late translational research we identified exemplars including: learning health systems with outcome trials integrated into clinical care; citizen driven health with 24/7 multi-parameter patient monitoring to improve outcomes and population-based linkages of multiple EHR sources for higher resolution clinical epidemiology and public health. Conclusion High volumes of inherently diverse ('big') EHR data are beginning to disrupt the nature of cardiovascular research and care. Such big data have the potential to improve our understanding of disease causation and classification relevant for early translation and to contribute actionable analytics to improve health and healthcare.
Affiliation(s)
- Harry Hemingway
- Research Department of Clinical Epidemiology, The Farr Institute of Health Informatics Research, University College London, 222 Euston Road, London NW1 2DA, UK
- The National Institute for Health Research, Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, University College London, 222 Euston Road, London NW1 2DA, UK
- Folkert W Asselbergs
- Research Department of Clinical Epidemiology, The Farr Institute of Health Informatics Research, University College London, 222 Euston Road, London NW1 2DA, UK
- The National Institute for Health Research, Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, University College London, 222 Euston Road, London NW1 2DA, UK
- Department of Cardiology, University Medical Center Utrecht, Heidelberglaan 100, Utrecht 3584 CX, The Netherlands
- John Danesh
- MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Worts Causeway, Cambridge CB1 8RN, UK
- Richard Dobson
- Research Department of Clinical Epidemiology, The Farr Institute of Health Informatics Research, University College London, 222 Euston Road, London NW1 2DA, UK
- The National Institute for Health Research, Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, University College London, 222 Euston Road, London NW1 2DA, UK
- NIHR Biomedical Research Centre for Mental Health (IOP), King’s College London, De Crespigny Park, London SE5 8AF, UK
- Nikolaos Maniadakis
- European Society of Cardiology (ESC), 2035 Route des Colles, Les Templiers - CS 80179 Biot, 06903 Sophia Antipolis, France
- Aldo Maggioni
- European Society of Cardiology (ESC), 2035 Route des Colles, Les Templiers - CS 80179 Biot, 06903 Sophia Antipolis, France
- Ghislaine J M van Thiel
- Department of Cardiology, University Medical Center Utrecht, Heidelberglaan 100, Utrecht 3584 CX, The Netherlands
- Maureen Cronin
- Vifor Pharma Ltd, Flughofstrasse 61, 8152 Glattbrugg, Zurich, Switzerland
- Gunnar Brobert
- Department of Epidemiology, Bayer Pharma AG, Müllerstrasse 178, 13353 Berlin, Germany
- Panos Vardas
- European Society of Cardiology (ESC), 2035 Route des Colles, Les Templiers - CS 80179 Biot, 06903 Sophia Antipolis, France
- Stefan D Anker
- Division of Cardiology and Metabolism—Heart Failure, Cachexia & Sarcopenia; Department of Cardiology (CVK), Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité University Medicine, Charitépl. 1, 10117 Berlin, Germany
- Department of Cardiology and Pneumology, University Medicine Göttingen (UMG), Robert-Koch-Strasse 40, 37099, Göttingen, Germany
- Diederick E Grobbee
- Julius Centre for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Spiros Denaxas
- Research Department of Clinical Epidemiology, The Farr Institute of Health Informatics Research, University College London, 222 Euston Road, London NW1 2DA, UK
- The National Institute for Health Research, Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, University College London, 222 Euston Road, London NW1 2DA, UK
|
13
|
Brunson JC, Laubenbacher RC. Applications of network analysis to routinely collected health care data: a systematic review. J Am Med Inform Assoc 2018; 25:210-221. [PMID: 29025116 PMCID: PMC6664849 DOI: 10.1093/jamia/ocx052] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 01/18/2017] [Revised: 04/18/2017] [Accepted: 04/23/2017] [Indexed: 01/21/2023]
Abstract
Objective To survey network analyses of datasets collected in the course of routine operations in health care settings and identify driving questions, methods, needs, and potential for future research. Materials and Methods A search strategy was designed to find studies that applied network analysis to routinely collected health care datasets and was adapted to 3 bibliographic databases. The results were grouped according to a thematic analysis of their settings, objectives, data, and methods. Each group received a methodological synthesis. Results The search found 189 distinct studies reported before August 2016. We manually partitioned the sample into 4 groups, which investigated institutional exchange, physician collaboration, clinical co-occurrence, and workplace interaction networks. Several robust and ongoing research programs were discerned within (and sometimes across) the groups. Little interaction was observed between these programs, despite conceptual and methodological similarities. Discussion We use the literature sample to inform a discussion of good practice at this methodological interface, including the concordance of motivations, study design, data, and tools and the validation and standardization of techniques. We then highlight instances of positive feedback between methodological development and knowledge domains and assess the overall cohesion of the sample.
|
14
|
Jang D, Lee S, Lee J, Kim K, Lee D. Inferring new drug indications using the complementarity between clinical disease signatures and drug effects. J Biomed Inform 2015; 59:248-57. [PMID: 26707452 DOI: 10.1016/j.jbi.2015.12.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/05/2015] [Revised: 10/31/2015] [Accepted: 12/09/2015] [Indexed: 11/17/2022]
Abstract
BACKGROUND Drug repositioning is the process of finding new indications for existing drugs. Its importance has been dramatically increasing recently due to the enormous increase in new drug discovery cost. However, most of the previous molecular-centered drug repositioning work is not able to reflect the end-point physiological activities of drugs because of the inherent complexity of human physiological systems. METHODS Here, we suggest a novel computational framework to make inferences for alternative indications of marketed drugs by using electronic clinical information which reflects the end-point physiological results of drug's effects on the biological activities of humans. In this work, we use the concept of complementarity between clinical disease signatures and clinical drug effects. With this framework, we establish disease-related clinical variable vectors (clinical disease signature vectors) and drug-related clinical variable vectors (clinical drug effect vectors) by applying two methodologies (i.e., statistical analysis and literature mining). Finally, we assign a repositioning possibility score to each disease-drug pair by the calculation of complementarity (anti-correlation) and association between clinical states ("up" or "down") of disease signatures and clinical effects ("up", "down" or "association") of drugs. A total of 717 clinical variables in the electronic clinical dataset (NHANES), are considered in this study. RESULTS The statistical significance of our prediction results is supported through two benchmark datasets (Comparative Toxicogenomics Database and Clinical Trials). We discovered not only lots of known relationships between diseases and drugs, but also many hidden disease-drug relationships. For example, glutathione and edetic-acid may be investigated as candidate drugs for asthma treatment. 
We examined the prediction results using statistical experiments (enrichment verification, hypergeometric and permutation tests, P<0.009 in the Comparative Toxicogenomics Database and Clinical Trials) and presented supporting evidence from the published literature. CONCLUSION The results show that electronic clinical information is a feasible data resource and that utilizing the complementarity (anti-correlated relationships) between clinical signatures of disease and clinical effects of drugs is a potentially predictive concept in drug repositioning research. This makes the proposed approach useful for identifying novel relationships between diseases and drugs that have a high probability of being biologically valid.
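To make the complementarity idea in this abstract concrete, here is a minimal sketch. It is not the authors' scoring function: it assumes each clinical variable is coded +1 ("up"), -1 ("down") or 0 (no signal), and it scores a disease-drug pair as the share of common variables that move in opposite directions minus the share that move in the same direction. The function name, the variable names and the example vectors are all hypothetical.

```python
def complementarity(disease_sig, drug_effect):
    """Return a score in [-1, 1]; positive means the drug tends to oppose
    the disease signature (illustrative rule, not the published method).

    Both arguments map a clinical variable name to +1 ("up"), -1 ("down")
    or 0 (no signal). Only variables that are non-zero in both are compared.
    """
    shared = [v for v in disease_sig
              if disease_sig[v] != 0 and drug_effect.get(v, 0) != 0]
    if not shared:
        return 0.0
    opposed = sum(1 for v in shared if disease_sig[v] == -drug_effect[v])
    aligned = len(shared) - opposed
    return (opposed - aligned) / len(shared)


# Hypothetical vectors, for illustration only.
disease = {"glucose": +1, "hdl": -1, "crp": +1}
drug = {"glucose": -1, "hdl": +1, "crp": +1, "alt": +1}
score = complementarity(disease, drug)
```

In this toy case two of the three shared variables are anti-correlated, so the score is positive; a real pipeline would also need the "association" state the abstract mentions, which this sketch leaves out.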
Affiliation(s)
- Dongjin Jang
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea.
- Sejoon Lee
- Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea.
- Jaehyun Lee
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea.
- Kiseong Kim
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea.
- Doheon Lee
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701, Republic of Korea.
|
15
|
Contribution of Electronic Medical Records to the Management of Rare Diseases. BIOMED RESEARCH INTERNATIONAL 2015; 2015:954283. [PMID: 26539543 PMCID: PMC4619907 DOI: 10.1155/2015/954283] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/05/2015] [Accepted: 07/21/2015] [Indexed: 11/17/2022]
Abstract
Purpose. Electronic health record systems provide a great opportunity to study most diseases. The objective of this study was to determine whether electronic medical records (EMR) in ophthalmology contribute to the management of rare eye diseases, isolated or in syndromes. The study was designed to identify and collect patients' data with an ophthalmology-specific EMR. Methods. Ophthalmology-specific EMR software (Softalmo software Corilus) was used to acquire ophthalmological consultation data from patients with five rare eye diseases. The rare eye diseases and data were selected and collected according to the expertise of the eye center. Results. A total of 135,206 outpatient consultations were performed between 2011 and 2014 in our medical center specialized in rare eye diseases. The search software identified 29 cases of congenital aniridia, 6 of Axenfeld/Rieger syndrome, 11 of BEPS, 3 of nanophthalmos, and 3 of Rubinstein-Taybi syndrome. Discussion. EMR provides advantages for medical care. The use of ophthalmology-specific EMR is reliable and can contribute to a comprehensive ocular visual phenotype useful for clinical research. Conclusion. EMR data routinely acquired with specific software dedicated to ophthalmology provide sufficient detail for rare diseases. These software-collected data appear useful for creating patient cohorts and recording ocular examinations, avoiding the time-consuming analysis of paper records and investigation, in a University Hospital linked to a National Reference Center for Rare Diseases.
|
16
|
Boland MR, Shahn Z, Madigan D, Hripcsak G, Tatonetti NP. Birth month affects lifetime disease risk: a phenome-wide method. J Am Med Inform Assoc 2015; 22:1042-53. [PMID: 26041386 PMCID: PMC4986668 DOI: 10.1093/jamia/ocv046] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/07/2015] [Accepted: 04/18/2015] [Indexed: 11/14/2022]
Abstract
OBJECTIVE An individual's birth month has a significant impact on the diseases they develop during their lifetime. Previous studies reveal relationships between birth month and several diseases including atherothrombosis, asthma, attention deficit hyperactivity disorder, and myopia, leaving most diseases completely unexplored. This retrospective population study systematically explores the relationship between seasonal effects at birth and lifetime disease risk for 1688 conditions. METHODS We developed a hypothesis-free method that minimizes publication and disease selection biases by systematically investigating disease-birth month patterns across all conditions. Our dataset includes 1 749 400 individuals with records at New York-Presbyterian/Columbia University Medical Center born between 1900 and 2000 inclusive. We modeled associations between birth month and 1688 diseases using logistic regression. Significance was tested using a chi-squared test with multiplicity correction. RESULTS We found 55 diseases that were significantly dependent on birth month. Of these, 19 were previously reported in the literature (P < .001), 20 were for conditions with close relationships to those reported, and 16 were previously unreported. We found distinct incidence patterns across disease categories. CONCLUSIONS Lifetime disease risk is affected by birth month. Seasonally dependent early developmental mechanisms may play a role in increasing lifetime risk of disease.
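The testing step described in this abstract can be illustrated with a simplified stand-in. The sketch below does not reproduce the study's logistic regression: it runs a chi-squared goodness-of-fit test of one disease's monthly case counts against the birth-month distribution of the whole population, and it assumes a Bonferroni correction, since the abstract does not say which multiplicity correction was used. All counts and function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-squared distribution with integer df,
    built from the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def birth_month_test(case_counts, baseline_counts, n_tests, alpha=0.05):
    """Return (statistic, p_value, significant) for one disease.

    case_counts: cases born in each month; baseline_counts: all individuals
    born in each month; n_tests: number of diseases tested (Bonferroni).
    """
    total_cases = sum(case_counts)
    total_base = sum(baseline_counts)
    stat = 0.0
    for observed, base in zip(case_counts, baseline_counts):
        expected = total_cases * base / total_base
        stat += (observed - expected) ** 2 / expected
    p_value = chi2_sf(stat, df=len(case_counts) - 1)
    return stat, p_value, p_value < alpha / n_tests


# Hypothetical counts: a January excess against a flat baseline.
flat = [1000] * 12
skewed = [200] + [100] * 11
result = birth_month_test(skewed, flat, n_tests=1688)
```

With 1688 conditions tested, the Bonferroni threshold is 0.05 / 1688, so only strong departures from the baseline distribution would be flagged.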
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI)
- David Madigan
- Observational Health Data Sciences and Informatics (OHDSI); Department of Statistics
- George Hripcsak
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI)
- Nicholas P Tatonetti
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI); Department of Systems Biology; Department of Medicine, Columbia University, New York, NY, USA
|
17
|
Boland MR, Tatonetti NP, Hripcsak G. Development and validation of a classification approach for extracting severity automatically from electronic health records. J Biomed Semantics 2015; 6:14. [PMID: 25848530 PMCID: PMC4386082 DOI: 10.1186/s13326-015-0010-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/03/2014] [Accepted: 03/03/2015] [Indexed: 12/29/2022]
Abstract
Background Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient’s state and is relative to other phenotypes. Further, phenotype-level severity does not change based on the individual patient. For example, acne is mild at the phenotype-level and relative to other phenotypes. Therefore, a given patient may have a severe form of acne (this is the patient-level severity), but this does not affect its overall designation as a mild phenotype at the phenotype-level. Methods We present a method for classifying severity at the phenotype-level that uses the Systematized Nomenclature of Medicine – Clinical Terms. Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures – number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes. Results Using a random forest algorithm and these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67, specificity = 77.78) when compared to a manually evaluated reference standard (k = 0.716). Conclusions CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.
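The evaluation figures quoted in this abstract (sensitivity, specificity and a kappa agreement statistic) are all derived from 2x2 tables. The sketch below shows how such figures are computed in general; the counts are hypothetical and are not taken from the CAESAR paper, and the function name is an assumption.

```python
def binary_agreement(tp, fp, fn, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table.

    tp/fn: severe phenotypes classified as severe/mild;
    tn/fp: mild phenotypes classified as mild/severe.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of each rater.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts, for illustration only.
sens, spec, kappa = binary_agreement(tp=40, fp=10, fn=10, tn=40)
```

Kappa discounts the agreement that two raters would reach by chance, which is why it is usually lower than the raw proportion of matching labels.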
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA; Department of Systems Biology, Columbia University, New York, NY USA; Department of Medicine, Columbia University, New York, NY USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
|
18
|
Wang L, Uesugi S, Ting IH, Okuhara K, Wang K. Network Analysis of Comorbidities: Case Study of HIV/AIDS in Taiwan. COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE 2015. [PMCID: PMC7122503 DOI: 10.1007/978-3-662-48319-0_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Comorbidities are the presence of one or more additional disorders or diseases co-occurring with a primary disease or disorder. The purpose of this study is to identify diseases that co-occur with HIV/AIDS and analyze the gender differences. Data were collected from 536 HIV/AIDS admission medical records out of 1,377,469 admission medical records from 1997 to 2010 in Taiwan. In this study, the comorbidity relationships are presented in the phenotypic disease network (PDN), and φ-correlation is used to measure the distance between two diseases on the network. The results show that there is a high correlation in the following pairs/triad of diseases: human immunodeficiency virus infection with specified conditions (042) and pneumocystosis pneumonia (1363), human immunodeficiency virus infection with specified malignant neoplasms (0422) and Kaposi’s sarcoma of other specified sites (1768), human immunodeficiency virus acquired immunodeficiency syndrome, and unspecified (0429) and progressive multifocal leukoencephalopathy (0463), and lastly, human immunodeficiency virus infection with specified infections (0420), meningoencephalitis due to toxoplasmosis (1300), and human immunodeficiency virus infection specified infections causing other specified infections (0421).
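The φ-correlation mentioned in this abstract is the Pearson correlation of two binary variables. A minimal sketch follows, assuming each admission record is reduced to the presence or absence of each diagnosis code; the function and argument names are illustrative, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def phi_correlation(n_both, n_i, n_j, n_total):
    """Phi coefficient between diseases i and j.

    n_both: records with both diagnoses; n_i, n_j: records with each
    diagnosis; n_total: all records. Ranges from -1 to 1.
    """
    denom = math.sqrt(n_i * n_j * (n_total - n_i) * (n_total - n_j))
    if denom == 0:
        raise ValueError("phi is undefined when a disease is in none or all records")
    return (n_both * n_total - n_i * n_j) / denom


# Hypothetical counts: two diagnoses that always appear together.
phi = phi_correlation(n_both=10, n_i=10, n_j=10, n_total=100)
```

A value near 0 means the two diagnoses co-occur about as often as independence would predict, and positive values indicate more co-occurrence than expected by chance.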
Affiliation(s)
- Leon Wang
- National University of Kaohsiung, Kaohsiung City, Taiwan
- I-Hsien Ting
- National University of Kaohsiung, Kaohsiung City, Taiwan
- Kai Wang
- National University of Kaohsiung, Kaohsiung City, Taiwan
|
19
|
Vilar S, Ryan PB, Madigan D, Stang PE, Schuemie MJ, Friedman C, Tatonetti NP, Hripcsak G. Similarity-based modeling applied to signal detection in pharmacovigilance. CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY 2014; 3:e137. [PMID: 25250527 PMCID: PMC4211266 DOI: 10.1038/psp.2014.35] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/10/2014] [Accepted: 07/06/2014] [Indexed: 12/31/2022]
Abstract
One of the main objectives in pharmacovigilance is the detection of adverse drug events (ADEs) through mining of healthcare databases, such as electronic health records or administrative claims data. Although different approaches have been shown to be of great value, research is still focusing on the enhancement of signal detection to gain efficiency in further assessment and follow-up. We applied similarity-based modeling techniques, using 2D and 3D molecular structure, ADE, target, and ATC (anatomical therapeutic chemical) similarity measures, to the candidate associations selected previously in a medication-wide association study for four ADE outcomes. Our results showed an improvement in the precision when we ranked the subset of ADE candidates using similarity scorings. This method is simple, useful to strengthen or prioritize signals generated from healthcare databases, and facilitates ADE detection through the identification of the most similar drugs for which ADE information is available.
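One way to picture the similarity-based ranking described in this abstract is shown below. This is a sketch under assumptions, not the authors' model: it represents each drug by a set of feature identifiers (for example, bits of a 2D structural fingerprint), uses the Tanimoto coefficient as the similarity measure, and scores a candidate by its highest similarity to any drug already known to cause the adverse event. The drug names and feature sets are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates, known_ade_drugs):
    """Rank candidate drugs by their best similarity to a known ADE drug.

    Both arguments map a drug name to its feature set. Returns a list of
    (name, score) pairs, highest score first.
    """
    scores = {
        name: max((tanimoto(fp, ref) for ref in known_ade_drugs.values()),
                  default=0.0)
        for name, fp in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical fingerprints, for illustration only.
known = {"reference_drug": {1, 2, 3, 4}}
candidates = {"drug_x": {1, 2, 3}, "drug_y": {7, 8}}
ranking = rank_candidates(candidates, known)
```

The abstract lists several similarity types (2D and 3D structure, ADE, target and ATC); the same ranking function could be applied to each feature space separately, though how the study combined them is not stated here.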
Affiliation(s)
- S Vilar
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
- P B Ryan
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
- D Madigan
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Department of Statistics, Columbia University, New York, New York, USA
- P E Stang
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
- M J Schuemie
- [1] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [2] Janssen Research and Development, Titusville, New Jersey, USA
- C Friedman
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
- N P Tatonetti
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA [3] Department of Systems Biology, Columbia University Medical Center, New York, New York, USA [4] Department of Medicine, Columbia University Medical Center, New York, New York, USA
- G Hripcsak
- [1] Department of Biomedical Informatics, Columbia University, New York, New York, USA [2] Observational Health Data Sciences and Informatics (OHDSI), New York, New York, USA
|
20
|
Finlayson SG, LePendu P, Shah NH. Building the graph of medicine from millions of clinical narratives. Sci Data 2014; 1:140032. [PMID: 25977789 PMCID: PMC4322575 DOI: 10.1038/sdata.2014.32] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 06/16/2014] [Accepted: 08/18/2014] [Indexed: 01/08/2023]
Abstract
Electronic health records (EHR) represent a rich and relatively untapped resource for characterizing the true nature of clinical practice and for quantifying the degree of inter-relatedness of medical entities such as drugs, diseases, procedures and devices. We provide a unique set of co-occurrence matrices, quantifying the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts, calculated from the raw text of 20 million clinical notes spanning 19 years of data. Co-frequencies were computed by means of a parallelized annotation, hashing, and counting pipeline that was applied over clinical notes from Stanford Hospitals and Clinics. The co-occurrence matrix quantifies the relatedness among medical concepts which can serve as the basis for many statistical tests, and can be used to directly compute Bayesian conditional probabilities, association rules, as well as a range of test statistics such as relative risks and odds ratios. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications.
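The statistics this abstract says can be derived from a co-occurrence matrix all come from the same 2x2 table. The sketch below shows the arithmetic for one pair of concepts; it assumes counts of notes mentioning each concept and both concepts are available, and the function name, argument names and counts are hypothetical, not part of the published dataset's interface.

```python
def association_stats(n_ab, n_a, n_b, n_total):
    """Conditional probability, relative risk and odds ratio of concept B
    given concept A, from co-occurrence counts.

    n_ab: notes mentioning both; n_a, n_b: notes mentioning each concept;
    n_total: all notes. Assumes no cell of the 2x2 table is empty.
    """
    a = n_ab                          # A and B
    b = n_a - n_ab                    # A without B
    c = n_b - n_ab                    # B without A
    d = n_total - n_a - n_b + n_ab    # neither
    p_b_given_a = a / (a + b)
    p_b_given_not_a = c / (c + d)
    return {
        "p_b_given_a": p_b_given_a,
        "relative_risk": p_b_given_a / p_b_given_not_a,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical counts, for illustration only.
stats = association_stats(n_ab=30, n_a=100, n_b=130, n_total=1000)
```

Because only the pairwise count and the two marginal counts are needed, every one of these measures can be computed directly from a row and column lookup in the matrix without returning to the raw notes.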
Affiliation(s)
- Samuel G. Finlayson
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
- Paea LePendu
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, California 94305, USA
|
21
|
Hanauer DA, Saeed M, Zheng K, Mei Q, Shedden K, Aronson AR, Ramakrishnan N. Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis. J Am Med Inform Assoc 2014; 21:925-37. [PMID: 24928177 PMCID: PMC4147617 DOI: 10.1136/amiajnl-2014-002767] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/01/2014] [Revised: 05/23/2014] [Accepted: 05/27/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprised of International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel. METHODS Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations. RESULTS The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations. DISCUSSION Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations. CONCLUSIONS In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.
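The overlap comparison described in this abstract amounts to a set intersection over unordered code pairs. A minimal sketch, assuming each association is a pair of ICD-9 code strings and that order within a pair does not matter; the codes and the function name are illustrative and are not drawn from the study's data.

```python
def overlap_fraction(clinical_pairs, literature_pairs):
    """Return (fraction of clinical associations also found in the
    literature, set of clinical-only associations)."""
    clinical = {frozenset(p) for p in clinical_pairs}
    literature = {frozenset(p) for p in literature_pairs}
    if not clinical:
        return 0.0, set()
    shared = clinical & literature
    return len(shared) / len(clinical), clinical - literature


# Hypothetical ICD-9 code pairs, for illustration only.
clinical = [("250.00", "401.9"), ("401.9", "272.4"), ("493.90", "477.9")]
literature = [("401.9", "250.00")]
fraction, candidates_novel = overlap_fraction(clinical, literature)
```

As the abstract cautions, an association that is absent from the literature set is only potentially novel; it may simply be missing because of annotation errors or limited coverage.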
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Mohammed Saeed
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Kai Zheng
- Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan, USA
- School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Qiaozhu Mei
- School of Information, University of Michigan, Ann Arbor, Michigan, USA
- Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Kerby Shedden
- Center for Statistical Consultation and Research, University of Michigan, Ann Arbor, Michigan, USA
- Alan R Aronson
- Lister Hill Center, National Library of Medicine, Bethesda, Maryland, USA
- Naren Ramakrishnan
- Department of Computer Science, Discovery Analytics Center, Virginia Tech, Arlington, Virginia, USA
|
22
|
Data-driven discovery of seasonally linked diseases from an Electronic Health Records system. BMC Bioinformatics 2014; 15 Suppl 6:S3. [PMID: 25078762 PMCID: PMC4158606 DOI: 10.1186/1471-2105-15-s6-s3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
Abstract
BACKGROUND Patterns of disease incidence can identify new risk factors for the disease or provide insight into the etiology. For example, allergies and infectious diseases have been shown to follow periodic temporal patterns due to seasonal changes in environmental or infectious agents. Previous work searching for seasonal or other temporal patterns in disease diagnosis rates has been limited both in the scope of the diseases examined and in the ability to distinguish unexpected seasonal patterns. Electronic Health Records (EHR) compile extensive longitudinal clinical information, constituting a unique source for discovery of trends in occurrence of disease. However, the data suffer from inherent biases that preclude an identification of temporal trends. METHODS Motivated by observation of the biases in this data source, we developed a method (Lomb-Scargle periodograms in detrended data, LSP-detrend) to find periodic patterns by adjusting the temporal information for broad trends in incidence, as well as seasonal changes in total hospitalizations. LSP-detrend can sensitively uncover periodic temporal patterns in the corrected data and identify the significance of the trend. We apply LSP-detrend to a compilation of records from 1.5 million patients encoded by ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification), including 2,805 disorders with more than 500 occurrences across a 12-year period. RESULTS AND CONCLUSIONS Although EHR data, and ICD-9 coded records in particular, were not created with the intention of aggregated use for research, these data can in fact be mined for periodic patterns in incidence of disease, if confounders are properly removed. Of all diagnoses, around 10% are identified as seasonal by LSP-detrend, including many known phenomena. We robustly reproduce previous findings, even for relatively rare diseases.
For instance, Kawasaki disease, a rare childhood disease that has been associated with weather patterns, is detected as strongly linked with winter months. Among the novel results, we find a bi-annual increase in exacerbations of myasthenia gravis, a potentially life-threatening complication of an autoimmune disease. We dissect the causes of this seasonal incidence and propose that factors predisposing patients to this event vary through the year.
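The core idea named in this abstract, a Lomb-Scargle periodogram applied to detrended data, can be sketched in a few lines. This is a simplified illustration and not the LSP-detrend implementation: it removes only a linear trend (the published method also adjusts for seasonal changes in total hospitalizations), it computes the classical unnormalized periodogram, and it does no significance testing. The monthly series is synthetic and the function names are assumptions.

```python
import math


def detrend_linear(t, y):
    """Subtract the least-squares straight line (and the mean) from y."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    return [yi - (y_mean + slope * (ti - t_mean)) for ti, yi in zip(t, y)]


def lomb_scargle(t, y, periods):
    """Classical Lomb-Scargle power of zero-mean y at each trial period."""
    powers = []
    for period in periods:
        w = 2 * math.pi / period
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum(yi * ci for yi, ci in zip(y, c))
        ys = sum(yi * si for yi, si in zip(y, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        powers.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return powers


# Synthetic series: 12 years of monthly counts with a rising trend
# and an annual cycle.
t = list(range(144))
y = [50 + 0.2 * ti + 10 * math.sin(2 * math.pi * ti / 12) for ti in t]
periods = list(range(3, 37))  # trial periods in months
power = lomb_scargle(t, detrend_linear(t, y), periods)
best_period = periods[power.index(max(power))]
```

The Lomb-Scargle form is useful for clinical data because it does not require evenly spaced observations, although the synthetic series here happens to be evenly sampled.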
|
23
|
Association between Kawasaki disease and autism: a population-based study in Taiwan. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2014; 11:3705-16. [PMID: 24705358 PMCID: PMC4025040 DOI: 10.3390/ijerph110403705] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/30/2014] [Revised: 03/14/2014] [Accepted: 03/24/2014] [Indexed: 12/29/2022]
Abstract
Objective: The association between Kawasaki disease and autism has rarely been studied in Asian populations. By using a nationwide Taiwanese population-based claims database, we tested the hypothesis that Kawasaki disease may increase the risk of autism in Taiwan. Materials and Methods: Our study cohort consisted of patients who had received the diagnosis of Kawasaki disease (ICD-9-CM: 446.1) between 1997 and 2005 (N = 563). For a comparison cohort, five age- and gender-matched control patients for every patient in the study cohort were selected using random sampling (N = 2,815). All subjects were tracked for 5 years from the date of cohort entry to identify whether they had developed autism (ICD-9-CM code 299.0) or not. Cox proportional hazard regressions were then performed to evaluate 5-year autism-free survival rates. Results: The main finding of this study was that patients with Kawasaki disease do not seem to be at increased risk of developing autism. Of the total patients, four patients developed autism during the 5-year follow-up period, among whom two were Kawasaki disease patients and two were in the comparison cohort. Further, the adjusted hazard ratio (AHR: 4.81; 95% confidence interval: 0.68–34.35; P = 0.117) showed no statistically significant difference between the Kawasaki disease group and the control group during the 5-year follow-up. Conclusion: Our study indicated that patients with Kawasaki disease are not at increased risk of autism.
|
24
|
Yeleswarapu S, Rao A, Joseph T, Saipradeep VG, Srinivasan R. A pipeline to extract drug-adverse event pairs from multiple data sources. BMC Med Inform Decis Mak 2014; 14:13. [PMID: 24559132 PMCID: PMC3936866 DOI: 10.1186/1472-6947-14-13] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 06/11/2013] [Accepted: 02/14/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Pharmacovigilance aims to uncover and understand harmful side-effects of drugs, termed adverse events (AEs). Although the current process of pharmacovigilance is very systematic, the increasing amount of information available in specialized health-related websites as well as the exponential growth in medical literature present a unique opportunity to supplement traditional adverse event gathering mechanisms with new-age ones. METHOD We present a semi-automated pipeline to extract associations between drugs and side effects from traditional structured adverse event databases, enhanced by potential drug-adverse event pairs mined from user comments on health-related websites and from MEDLINE abstracts. The pipeline was tested using a set of 12 drugs representative of two previous studies of adverse event extraction from health-related websites and MEDLINE abstracts. RESULTS Testing the pipeline shows that mining non-traditional sources helps substantiate the adverse event databases. The non-traditional sources not only contain the known AEs, but also suggest some unreported AEs for drugs, which can then be analyzed further. CONCLUSION A semi-automated pipeline is presented that extracts AE pairs from adverse event databases as well as potential AE pairs from non-traditional sources such as text from MEDLINE abstracts and user comments from health-related websites.
Affiliation(s)
- Aditya Rao
- TCS Innovation Labs, Tata Consultancy Services Ltd, Deccan Park, 1, Software Units Layout, Madhapur, Hyderabad 500081, Andhra Pradesh, India.
|
25
|
Abstract
The growing amount and availability of electronic health record (EHR) data present enhanced opportunities for discovering new knowledge about diseases. In the past decade, there has been an increasing number of data and text mining studies focused on the identification of disease associations (e.g., disease-disease, disease-drug, and disease-gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each step for data selection, preprocessing, transformation, data mining, and interpretation/validation. Topics including natural language processing, standards, and data privacy and security are also discussed in the context of this framework.
Affiliation(s)
- Elizabeth S Chen
- Center for Clinical and Translational Science, University of Vermont, Burlington, VT, USA.
|
26
|
Sengupta D, Naik PK. SN algorithm: analysis of temporal clinical data for mining periodic patterns and impending augury. J Clin Bioinforma 2013; 3:24. [PMID: 24283349 PMCID: PMC4177143 DOI: 10.1186/2043-9113-3-24] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/27/2013] [Accepted: 11/25/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Electronic Health Record (EHR) system has led to the development of a specialized form of clinical database that enables storage of information in a temporal perspective. Mining this form of clinical data has been a big challenge, given the varied temporal points. This study proposes a conjoined solution to analyze the clinical parameters associated with a disease. We have used an association rule mining algorithm to discover association rules among clinical parameters that can be augmented with the disease. Furthermore, we have proposed a new algorithm, the SN algorithm, to map clinical parameters along with a disease state at various temporal points. RESULT The SN algorithm is based on a Jacobian approach, which augurs the state of a disease 'Sn' at a given temporal point 'Tn' by mapping the derivatives with the temporal point 'T0', whose state of disease 'S0' is known. The predictive ability of the proposed algorithm is evaluated on a temporal clinical data set of brain tumor patients. We have obtained a very high prediction accuracy of ~97% for a brain tumor state 'Sn' for any temporal point 'Tn'. CONCLUSION The results indicate that the methodology followed may be of good value to the diagnostic procedure, especially for analyzing temporal clinical data.
Affiliation(s)
- Dipankar Sengupta
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, H.P., India.
|
27
|
Hanauer DA, Ramakrishnan N, Seyfried LS. Describing the relationship between cat bites and human depression using data from an electronic health record. PLoS One 2013; 8:e70585. [PMID: 23936453 PMCID: PMC3731284 DOI: 10.1371/journal.pone.0070585] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/14/2013] [Accepted: 06/20/2013] [Indexed: 01/09/2023] Open
Abstract
Data mining approaches have been increasingly applied to the electronic health record and have led to the discovery of numerous clinical associations. Recent data mining studies have suggested a potential association between cat bites and human depression. To explore this possible association in more detail we first used administrative diagnosis codes to identify patients with either depression or bites, drawn from a population of 1.3 million patients. We then conducted a manual chart review in the electronic health record of all patients with a code for a bite to accurately determine which were from cats or dogs. Overall there were 750 patients with cat bites, 1,108 with dog bites, and approximately 117,000 patients with depression. Depression was found in 41.3% of patients with cat bites and 28.7% of those with dog bites. Furthermore, 85.5% of those with both cat bites and depression were women, compared to 64.5% of those with dog bites and depression. The probability of a woman being diagnosed with depression at some point in her life if she presented to our health system with a cat bite was 47.0%, compared to 24.2% of men presenting with a similar bite. The high proportion of depression in patients who had cat bites, especially among women, suggests that screening for depression could be appropriate in patients who present to a clinical provider with a cat bite. Additionally, while no causative link is known to explain this association, there is growing evidence to suggest that the relationship between cats and human mental illness, such as depression, warrants further investigation.
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan, Ann Arbor, Michigan, USA.
|
28
|
Jung J, Lee D. Inferring disease association using clinical factors in a combinatorial manner and their use in drug repositioning. Bioinformatics 2013; 29:2017-23. [DOI: 10.1093/bioinformatics/btt327] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/22/2022] Open
|
29
|
Sahoo SS, Zhao M, Luo L, Bozorgi A, Gupta D, Lhatoo SD, Zhang GQ. OPIC: Ontology-driven Patient Information Capturing system for epilepsy. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:799-808. [PMID: 23304354 PMCID: PMC3540561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
The widespread use of paper or document-based forms for capturing patient information in various clinical settings, for example in epilepsy centers, is a critical barrier for large-scale, multi-center research studies that require interoperable, consistent, and error-free data collection. This challenge can be addressed by a web-accessible and flexible patient data capture system that is supported by a common terminological system to facilitate data re-usability, sharing, and integration. We present the Ontology-driven Patient Information Capture (OPIC) system, which uses a domain-specific epilepsy and seizure ontology (EpSO) to (1) support structured entry of multi-modal epilepsy data, (2) proactively ensure quality of data through use of ontology terms in drop-down menus, and (3) identify and index clinically relevant ontology terms in free-text fields to improve accuracy of subsequent analytical queries (e.g. cohort identification). EpSO, modeled using the Web Ontology Language (OWL), conforms to the recommendations of the International League Against Epilepsy (ILAE) classification and terminological commission. OPIC has been developed using agile software engineering methodology for rapid development cycles in close collaboration with domain experts and end users. We report the results from the initial deployment of OPIC at the University Hospitals Case Medical Center (UH CMC) epilepsy monitoring unit (EMU) as part of the NIH-funded project on Sudden Unexpected Death in Epilepsy (SUDEP). Preliminary user evaluation shows that OPIC has achieved its design objectives to be an intuitive patient information capturing system that also reduces the potential for data entry errors and variability in use of epilepsy terms.
Affiliation(s)
- Satya S Sahoo
- Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
|
30
|
Hanauer DA, Ramakrishnan N. Modeling temporal relationships in large scale clinical associations. J Am Med Inform Assoc 2012; 20:332-41. [PMID: 23019240 DOI: 10.1136/amiajnl-2012-001117] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVE We describe an approach for modeling temporal relationships in a large scale association analysis of electronic health record data. The addition of temporal information can inform hypothesis generation and help to explain the relationships. We applied this approach on a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients. METHODS We performed two independent analyses including a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance. RESULTS We found nearly 400 000 highly associated pairs of ICD-9 codes with varying numbers of strong temporal associations ranging from ≥1 day to ≥10 years apart. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that diabetes usually preceded the diagnoses of H pylori, raising questions about possible cause and effect. DISCUSSION Such analyses have significant limitations, some of which are due to known problems with ICD-9 codes and others to potentially incomplete data even at a health system level. Nevertheless, large scale association analyses with temporal modeling can help provide a mechanism for novel discovery in support of hypothesis generation. CONCLUSIONS Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.
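As an illustration of the two analyses named in this abstract, the following Python sketch runs a Pearson chi-square test on a 2×2 co-occurrence table and an exact two-sided binomial test on the ordering of two codes. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the table layout, and all counts are hypothetical, and because it uses only the standard library it is suited to small counts rather than a cohort of millions.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 2x2 table.

    Hypothetical layout: a = patients with both codes, b = only code X,
    c = only code Y, d = neither code.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    stat = numerator / denominator
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def binom_two_sided(k, n, p0=0.5):
    """Exact two-sided binomial test.

    k = patients in whom code X was recorded before code Y, out of n
    patients with both codes. Sums the probabilities of all outcomes
    that are no more likely than the observed one under p0.
    """
    pmf = [math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    stat, p_assoc = chi2_2x2(20, 10, 10, 20)
    print("association:", stat, p_assoc)
    print("ordering:", binom_two_sided(17, 20))
```

A small association p-value suggests that the two codes co-occur more often than chance would predict; a small ordering p-value suggests that one code tends to precede the other, which is the kind of signal the abstract uses to raise questions about cause and effect.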
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109-5940, USA.
|
31
|
Enhancing adverse drug event detection in electronic health records using molecular structure similarity: application to pancreatitis. PLoS One 2012; 7:e41471. [PMID: 22911794 PMCID: PMC3404072 DOI: 10.1371/journal.pone.0041471] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/01/2012] [Accepted: 06/21/2012] [Indexed: 12/14/2022] Open
Abstract
Background Detection and assessment of adverse drug events (ADEs) is at the center of pharmacovigilance. Data mining of systems such as the FDA’s Adverse Event Reporting System (AERS) and, more recently, Electronic Health Records (EHRs) can aid in the automatic detection and analysis of ADEs. Although different data mining approaches have been shown to be valuable, it is still crucial to improve the quality of the generated signals. Objective To leverage structural similarity by developing molecular fingerprint-based models (MFBMs) to strengthen ADE signals generated from EHR data. Methods A reference standard of drugs known to be causally associated with the adverse event pancreatitis was used to create an MFBM. EHRs from the New York Presbyterian Hospital were mined to generate structured data. Disproportionality Analysis (DPA) was applied to the data, and 278 possible signals related to the ADE pancreatitis were detected. Candidate drugs associated with these signals were then assessed using the MFBM to find the most promising candidates based on structural similarity. Results The use of MFBM as a means to strengthen or prioritize signals generated from the EHR significantly improved the detection accuracy of ADEs related to pancreatitis. MFBM also highlights the etiology of the ADE by identifying structurally similar drugs, which could follow a similar mechanism of action. Conclusion The method proposed in this paper provides evidence of being a promising adjunct to existing automated ADE detection and analysis approaches.
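The prioritization step in this abstract, scoring candidate drugs by structural similarity to a reference set of drugs known to cause the adverse event, can be sketched as follows. The abstract does not state which fingerprint type or similarity measure was used; the Tanimoto coefficient over sets of "on" bit indices is an assumption made here for illustration, and the drug names and fingerprints are invented placeholders.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates, reference):
    """Rank candidate drugs by their highest similarity to any reference drug.

    candidates, reference: dicts mapping a drug name to its fingerprint
    (a set of bit indices). Returns (name, score) pairs, best first.
    """
    scores = {
        name: max(tanimoto(fp, ref_fp) for ref_fp in reference.values())
        for name, fp in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
    reference = {"ref_drug_1": {2, 3, 4}, "ref_drug_2": {10, 11}}
    candidates = {"candidate_a": {1, 2, 3}, "candidate_b": {7, 8}}
    for name, score in rank_candidates(candidates, reference):
        print(name, score)
```

Taking the maximum over the reference set means a candidate ranks highly if it closely resembles any one known causal drug; averaging over the reference set would be an alternative design with different behavior.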
|
32
|
|
33
|
Aslakson E, Szekely S, Vernon SD, Bateman L, Baumbach J, Setty Y. Live sequence charts to model medical information. Theor Biol Med Model 2012; 9:22. [PMID: 22703558 PMCID: PMC3536704 DOI: 10.1186/1742-4682-9-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2012] [Accepted: 05/31/2012] [Indexed: 11/12/2022] Open
Abstract
Background Medical records accumulate data concerning patient health and the natural history of disease progression. However, methods to mine information systematically in a form other than an electronic health record are not yet available. The purpose of this study was to develop an object modeling technique as a first step towards a formal database of medical records. Method Live Sequence Charts (LSC) were used to formalize the narrative text obtained during a patient interview. LSCs utilize a visual scenario-based programming language to build object models. LSC extends the classical language of UML message sequence charts (MSC), predominantly through the addition of modalities and by providing executable semantics. Inter-object scenarios were defined to specify natural history event interactions and different scenarios in the narrative text. Result A simulated medical record was specified in the LSC formalism by translating the text into an object model that comprised a set of entities and events. The entities described the participating components (i.e., doctor, patient and record) and the events described the interactions between elements. A conceptual model is presented to illustrate the approach. An object model was generated from data extracted from an actual new patient interview, where the individual was eventually diagnosed as suffering from Chronic Fatigue Syndrome (CFS). This yielded a preliminary formal designated vocabulary for CFS development that provided a basis for future formalization of these records. Conclusions Translation of medical records into object models created the basis for a formal database of the patient narrative that temporally depicts the events preceding disease, the diagnosis and treatment approach. The LSC object model of the medical narrative provided an intuitive, visual representation of the natural history of the patient’s disease.
Affiliation(s)
- Eric Aslakson
- Poiema, LLC, 375 Chelsea Cir NE, Atlanta, GA 30307, USA
|
34
|
Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012; 13:395-405. [PMID: 22549152 DOI: 10.1038/nrg3208] [Citation(s) in RCA: 708] [Impact Index Per Article: 59.0] [Indexed: 02/07/2023]
Abstract
Clinical data describing the phenotypes and treatment of patients represents an underused data source that has much greater research potential than is currently realized. Mining of electronic health records (EHRs) has the potential for establishing new patient-stratification principles and for revealing unknown disease correlations. Integrating EHR data with genetic data will also give a finer understanding of genotype-phenotype relationships. However, a broad range of ethical, legal and technical reasons currently hinder the systematic deposition of these data in EHRs and their mining. Here, we consider the potential for furthering medical research and clinical care using EHR data and the challenges that must be overcome before this is a reality.
|
35
|
Trifonov V, Pasqualucci L, Dalla-Favera R, Rabadan R. Fractal-like distributions over the rational numbers in high-throughput biological and clinical data. Sci Rep 2012; 1:191. [PMID: 22355706 PMCID: PMC3240948 DOI: 10.1038/srep00191] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/08/2011] [Accepted: 10/31/2011] [Indexed: 11/29/2022] Open
Abstract
Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment. Here, the distributions appear as part of the evaluation of the error rate of the sequencing and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and comorbidity as these appear in EHR. The distributions are also relevant to identification of subclonal populations in tumors and the study of quasi-species and intrahost diversity of viral populations.
Affiliation(s)
- Vladimir Trifonov
- Department of Biomedical Informatics, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA.
|