1
Thangudu RR, Holck M, Singhal D, Pilozzi A, Edwards N, Rudnick PA, Domagalski MJ, Chilappagari P, Ma L, Xin Y, Le T, Nyce K, Chaudhary R, Ketchum KA, Maurais A, Connolly B, Riffle M, Chambers MC, MacLean B, MacCoss MJ, McGarvey PB, Basu A, Otridge J, Casas-Silva E, Venkatachari S, Rodriguez H, Zhang X. NCI's Proteomic Data Commons: A Cloud-Based Proteomics Repository Empowering Comprehensive Cancer Analysis through Cross-Referencing with Genomic and Imaging Data. Cancer Research Communications 2024; 4:2480-2488. [PMID: 39225545] [DOI: 10.1158/2767-9764.crc-24-0243]
Abstract
Proteomics has emerged as a powerful tool for studying cancer biology and for developing diagnostics and therapies. With the continuous improvement and widespread availability of high-throughput proteomic technologies, the generation of large-scale proteomic data has become more common in cancer research, and there is a growing need for resources that support the sharing and integration of multi-omics datasets. Such datasets require extensive metadata, including clinical, biospecimen, experimental, and workflow annotations, that are crucial for data interpretation and reanalysis. The need to integrate, analyze, and share these data has led to the development of NCI's Proteomic Data Commons (PDC), accessible at https://pdc.cancer.gov. As a specialized repository within the NCI Cancer Research Data Commons (CRDC), the PDC enables researchers to locate and analyze proteomic data from various cancer types and to connect with genomic and imaging data available for the same samples in other CRDC nodes. At present, the PDC houses annotated data from more than 160 datasets across 19 cancer types, generated by several large-scale cancer research programs with cohort sizes exceeding 100 samples (tumor and associated normal tissue, when available). In this article, we review the current state of the PDC in cancer research, discuss the opportunities and challenges associated with data sharing in proteomics, and propose future directions for the resource. SIGNIFICANCE The Proteomic Data Commons (PDC) plays a crucial role in advancing cancer research by providing a centralized repository of high-quality cancer proteomic data enriched with extensive clinical annotations. By integrating and cross-referencing with complementary genomic and imaging data, the PDC facilitates multi-omics analyses, driving comprehensive insights and accelerating discoveries across various cancer types.
Affiliation(s)
- Paul A Rudnick
- Spectragen Informatics LLC, Bainbridge Island, Washington
- Esmeralda Casas-Silva
- Center for Biomedical Informatics & Information Technology, National Cancer Institute, Rockville, Maryland
- Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, Maryland
- Xu Zhang
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, Maryland
2
Adams MCB, Hurley RW, Siddons A, Topaloglu U, Wandner LD. NIH HEAL Clinical Data Elements (CDE) implementation: NIH HEAL Initiative IMPOWR network IDEA-CC. Pain Medicine (Malden, Mass.) 2023; 24:743-749. [PMID: 36799548] [PMCID: PMC10321760] [DOI: 10.1093/pm/pnad018]
Abstract
OBJECTIVE The National Institutes of Health (NIH) HEAL Initiative is making data findable, accessible, interoperable, and reusable (FAIR) to maximize the value of the unprecedented federal investment in pain and opioid-use disorder research. This involves standardizing the use of common data elements (CDEs) for clinical research. METHODS This work describes the selection, processing, harmonization, and design constraints of CDEs across a pain and opioid use disorder clinical trials network (NIH HEAL IMPOWR). RESULTS Aligning the network allowed newer data standards to be incorporated across the clinical trials. Specific advances included geographic coding (RUCA), deidentified patient identifiers (GUID), shareable clinical survey libraries (REDCap), and concept mapping to standardized concepts (UMLS). CONCLUSIONS While complex, harmonization across a network of chronic pain and opioid use disorder clinical trials with separate interventions can be optimized through the use of CDEs and data standardization processes. This standardization will support robust secondary data analyses. Scaling this process could standardize CDE results across interventions or disease states, which could help inform insurance companies or government organizations about coverage determinations. The HEAL CDE program connects isolated studies and solutions to each other, but the practical aspects may be challenging for some studies to implement. Leveraging tools and technology to simplify the process and create ready-to-use resources may support wider adoption of consistent data standards.
Affiliation(s)
- Meredith C B Adams
- Departments of Anesthesiology, Biomedical Informatics, and Public Health Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157, United States
- Robert W Hurley
- Departments of Anesthesiology, Translational Neuroscience, and Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Andrew Siddons
- National Institute of Neurological Disorders and Stroke, Bethesda, MD, United States
- Umit Topaloglu
- Department of Cancer Biology, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Laura D Wandner
- National Institute of Neurological Disorders and Stroke, Bethesda, MD, United States
3
Vesteghem C, Brøndum RF, Sønderkær M, Sommer M, Schmitz A, Bødker JS, Dybkær K, El-Galaly TC, Bøgsted M. Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives. Brief Bioinform 2021; 21:936-945. [PMID: 31263868] [PMCID: PMC7299292] [DOI: 10.1093/bib/bbz044]
Abstract
Compelling research has recently shown that cancer is so heterogeneous that single research centres cannot produce enough data to fit prognostic and predictive models of sufficient accuracy. Data sharing in precision oncology is therefore of utmost importance. The Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles have been developed to define good practices in data sharing. Motivated by the ambition of applying the FAIR Data Principles to our own clinical precision oncology implementations and research, we have performed a systematic literature review of potentially relevant initiatives. For clinical data, we suggest using the Genomic Data Commons model as a reference, as it provides a field-tested and well-documented solution. Regarding the classification of diagnosis, morphology and topography, and drugs, we chose to follow the World Health Organization standards, i.e. the ICD-10, ICD-O-3 and Anatomical Therapeutic Chemical classifications, respectively. For the bioinformatics pipeline, the Genome Analysis ToolKit Best Practices using Docker containers offer a coherent solution and have therefore been selected. Regarding the naming of variants, we follow the Human Genome Variation Society's standard. For the IT infrastructure, we have built a centralized solution to participate in data sharing through federated solutions such as the Beacon Networks.
Affiliation(s)
- Charles Vesteghem
- Department of Clinical Medicine, Aalborg University, Denmark; Department of Haematology, Aalborg University Hospital, Denmark
- Mads Sønderkær
- Department of Haematology, Aalborg University Hospital, Denmark
- Mia Sommer
- Department of Clinical Medicine, Aalborg University, Denmark; Department of Haematology, Aalborg University Hospital, Denmark
- Karen Dybkær
- Department of Clinical Medicine, Aalborg University, Denmark; Department of Haematology, Aalborg University Hospital, Denmark; Clinical Cancer Research Center, Aalborg University Hospital, Denmark
- Tarec Christoffer El-Galaly
- Department of Clinical Medicine, Aalborg University, Denmark; Department of Haematology, Aalborg University Hospital, Denmark; Clinical Cancer Research Center, Aalborg University Hospital, Denmark
- Martin Bøgsted
- Department of Clinical Medicine, Aalborg University, Denmark; Department of Haematology, Aalborg University Hospital, Denmark; Clinical Cancer Research Center, Aalborg University Hospital, Denmark
4
Renner R, Li S, Huang Y, van der Zijp-Tan AC, Tan S, Li D, Kasukurthi MV, Benton R, Borchert GM, Huang J, Jiang G. Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner. BMC Med Inform Decis Mak 2019; 19:276. [PMID: 31865899] [PMCID: PMC6927104] [DOI: 10.1186/s12911-019-0979-5]
Abstract
BACKGROUND The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard, providing robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping CDEs to the BRIDG model is important; in particular, it can facilitate mapping CDEs to other standards. Unfortunately, manual mapping, the current method for creating CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who utilize CDEs. METHODS In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the algorithm and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong. RESULTS For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved. DISCUSSION Our semi-automated mapping process reduces the burden on domain experts. All six attribute weights were significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training to verification samples.
CONCLUSIONS Experimental results on real-world use cases have proven the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
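The candidate-ranking step this abstract describes can be sketched as a weighted sum of per-attribute similarities. This is a minimal illustrative reconstruction, not the authors' code: the attribute names, the weights, and the token-overlap similarity are hypothetical stand-ins for the paper's trained ANN weights and similarity measure.

```python
# Illustrative sketch: rank candidate classes for a CDE by a weighted
# sum of per-attribute string similarities. Attribute names, weights,
# and the Jaccard token similarity are hypothetical placeholders.

ATTRIBUTE_WEIGHTS = {  # hypothetical trained weights for six CDE attributes
    "name": 0.30, "definition": 0.25, "object_class": 0.15,
    "property": 0.15, "value_domain": 0.10, "context": 0.05,
}

def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (placeholder similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def rank_candidates(cde: dict, classes: dict, top_k: int = 3) -> list:
    """Return the top_k candidate class names, best match first."""
    scores = {}
    for cls_name, cls_attrs in classes.items():
        scores[cls_name] = sum(
            w * token_similarity(cde.get(attr, ""), cls_attrs.get(attr, ""))
            for attr, w in ATTRIBUTE_WEIGHTS.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical example: one CDE scored against two candidate classes
cde = {"name": "patient birth date", "definition": "date of birth of the patient"}
candidates = {
    "Person": {"name": "person birth date", "definition": "birth date of a person"},
    "Specimen": {"name": "specimen collection", "definition": "specimen collected"},
}
best = rank_candidates(cde, candidates, top_k=1)
```

The semi-automated character of the approach comes from returning a ranked candidate list rather than a single answer, leaving the final mapping decision to a domain expert.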
Affiliation(s)
- Shengyu Li
- School of Computing, University of South Alabama, Mobile, AL 36688 USA
- Yulong Huang
- College of Allied Health Professions, University of South Alabama, Mobile, AL 36608 USA
- Shaobo Tan
- School of Computing, University of South Alabama, Mobile, AL 36688 USA
- Dongqi Li
- School of Computing, University of South Alabama, Mobile, AL 36688 USA
- Ryan Benton
- School of Computing, University of South Alabama, Mobile, AL 36688 USA
- Glen M. Borchert
- College of Medicine, University of South Alabama, Mobile, AL 36688 USA
- Jingshan Huang
- School of Computing, University of South Alabama, Mobile, AL 36688 USA
- Qilu University of Technology (Shandong Academy of Science), Jinan, China
5
Yang S, Guo J, Wei R. Semantic interoperability with heterogeneous information systems on the internet through automatic tabular document exchange. Inform Syst 2017. [DOI: 10.1016/j.is.2016.10.010]
6
Li G, Bankhead P, Dunne PD, O’Reilly PG, James JA, Salto-Tellez M, Hamilton PW, McArt DG. Embracing an integromic approach to tissue biomarker research in cancer: Perspectives and lessons learned. Brief Bioinform 2017; 18:634-646. [PMID: 27255914] [PMCID: PMC5862317] [DOI: 10.1093/bib/bbw044]
Abstract
Modern approaches to biomedical research and diagnostics targeted towards precision medicine are generating 'big data' across a range of high-throughput experimental and analytical platforms. Integrative analysis of this rich clinical, pathological, molecular and imaging data represents one of the greatest bottlenecks in biomarker discovery research in cancer and other diseases. Following on from the publication of our successful framework for multimodal data amalgamation and integrative analysis, Pathology Integromics in Cancer (PICan), this article will explore the essential elements of assembling an integromics framework from a more detailed perspective. PICan, built around a relational database storing curated multimodal data, is the research tool sitting at the heart of our interdisciplinary efforts to streamline biomarker discovery and validation. While recognizing that every institution has a unique set of priorities and challenges, we will use our experiences with PICan as a case study and starting point, rationalizing the design choices we made within the context of our local infrastructure and specific needs, but also highlighting alternative approaches that may better suit other programmes of research and discovery. Along the way, we stress that integromics is not just a set of tools, but rather a cohesive paradigm for how modern bioinformatics can be enhanced. Successful implementation of an integromics framework is a collaborative team effort that is built with an eye to the future and greatly accelerates the processes of biomarker discovery, validation and translation into clinical practice.
Affiliation(s)
- Gerald Li
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Peter Bankhead
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Philip D Dunne
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Paul G O’Reilly
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Jacqueline A James
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Manuel Salto-Tellez
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Peter W Hamilton
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
- Darragh G McArt
- Centre for Cancer Research and Cell Biology (CCRCB), Queen’s University Belfast, Belfast, United Kingdom
7
Hochheiser H, Castine M, Harris D, Savova G, Jacobson RS. An information model for computable cancer phenotypes. BMC Med Inform Decis Mak 2016; 16:121. [PMID: 27629872] [PMCID: PMC5024416] [DOI: 10.1186/s12911-016-0358-4]
Abstract
BACKGROUND Standards, methods, and tools supporting the integration of clinical data and genomic information are an area of significant need and rapid growth in biomedical informatics. Integration of cancer clinical data and cancer genomic information poses unique challenges because of the high volume and complexity of clinical data, as well as the heterogeneity and instability of cancer genome data when compared with germline data. Current information models of clinical and genomic data are not sufficiently expressive to represent individual observations and to aggregate those observations into longitudinal summaries over the course of cancer care. Such models are acutely needed to support the development of systems and tools for generating the so-called clinical "deep phenotype" of individual cancer patients, a process which remains almost entirely manual in cancer research and precision medicine. METHODS Reviews of existing ontologies and interviews with cancer researchers were used to inform iterative development of a cancer phenotype information model. We translated a subset of the Fast Healthcare Interoperability Resources (FHIR) models into the OWL 2 Description Logic (DL) representation and added extensions as needed for modeling cancer phenotypes, with terms derived from the NCI Thesaurus. Models were validated with domain experts and evaluated against competency questions. RESULTS The DeepPhe information model represents cancer phenotype data at increasing levels of abstraction, from mentions in clinical documents to summaries of key events and findings. We describe the model using breast cancer as an example, depicting methods to represent phenotypic features of cancers, tumors, treatment regimens, and specific biologic behaviors that span the entire course of a patient's disease.
CONCLUSIONS We present a multi-scale information model for representing individual document mentions, document level classifications, episodes along a disease course, and phenotype summarization, linking individual observations to high-level summaries in support of subsequent integration and analysis.
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, PA 15206-3701, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Melissa Castine
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, PA 15206-3701, USA
- David Harris
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Guergana Savova
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Rebecca S Jacobson
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, PA 15206-3701, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
8
Jiang G, Kiefer RC, Rasmussen LV, Solbrig HR, Mo H, Pacheco JA, Xu J, Montague E, Thompson WK, Denny JC, Chute CG, Pathak J. Developing a data element repository to support EHR-driven phenotype algorithm authoring and execution. J Biomed Inform 2016; 62:232-42. [PMID: 27392645] [DOI: 10.1016/j.jbi.2016.07.008]
Abstract
The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure of each data element, and leveraged Semantic Web technologies to facilitate semantic representation of these metadata. We observed a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiative (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.
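The ISO/IEC 11179 structure mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification for illustration only; the field names below are a reduced subset and not the DER's actual schema or service APIs.

```python
# Hypothetical sketch of an ISO/IEC 11179-style data element record:
# an object class, a characteristic (property), and a value domain,
# optionally constrained by a pre-defined value set. Simplified for
# illustration; not the DER schema from the paper.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataElement:
    identifier: str                      # registry identifier
    object_class: str                    # the thing described, e.g. "Patient"
    characteristic: str                  # the property, e.g. "birth date"
    value_domain: str                    # how values are represented
    definition: str = ""
    permissible_values: List[str] = field(default_factory=list)

    def name(self) -> str:
        """Data element name formed from object class and characteristic."""
        return f"{self.object_class} {self.characteristic}"

# Example: a coded element carrying its own value set
gender = DataElement(
    identifier="DE-0001",
    object_class="Patient",
    characteristic="administrative gender",
    value_domain="coded",
    permissible_values=["male", "female", "other", "unknown"],
)
```

A machine-readable registry of such records is what lets a phenotype authoring tool query element structure and value sets directly, instead of a human interpreting descriptive documents.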
Affiliation(s)
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA.
- Richard C Kiefer
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- Luke V Rasmussen
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Harold R Solbrig
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- Huan Mo
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Jie Xu
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Enid Montague
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; School of Computing, DePaul University, Chicago, IL, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Jyotishman Pathak
- Division of Health Informatics, Weill Cornell Medical College, Cornell University, New York City, NY, USA
9
Shats O, Goldner W, Feng J, Sherman A, Smith RB, Sherman S. Thyroid Cancer and Tumor Collaborative Registry (TCCR). Cancer Inform 2016; 15:73-9. [PMID: 27168721] [PMCID: PMC4856228] [DOI: 10.4137/cin.s32470]
Abstract
A multicenter, web-based Thyroid Cancer and Tumor Collaborative Registry (TCCR, http://tccr.unmc.edu) allows for the collection and management of various data on thyroid cancer (TC) and thyroid nodule (TN) patients. The TCCR is coupled with OpenSpecimen, an open-source biobank management system, to annotate biospecimens obtained from TCCR subjects. Demographic, lifestyle, physical activity, dietary habit, family history, medical history, and quality of life data may be entered into the registry by the subjects themselves; information on diagnosis, treatment, and outcome is entered by clinical personnel. The TCCR uses advanced technical and organizational practices, namely (i) metadata-driven software architecture (design); (ii) modern standards and best practices for data sharing and interoperability (standardization); (iii) Agile methodology (project management); (iv) Software as a Service (SaaS) as a software distribution model (operation); and (v) the confederation principle as a business model (governance). This allowed us to create a secure, reliable, user-friendly, and self-sustainable system for TC and TN data collection and management that is compatible with various end-user devices and easily adaptable to a rapidly changing environment. Currently, the TCCR contains data on 2,261 subjects and more than 28,000 biospecimens. Data and biological samples collected by the TCCR are used in developing diagnostic, prevention, treatment, and survivorship strategies against TC.
Affiliation(s)
- Oleg Shats
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, USA; Progenomix, Inc., Omaha, NE, USA
- Whitney Goldner
- College of Medicine, University of Nebraska Medical Center, Omaha, NE, USA
- Jianmin Feng
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, USA
- Alexander Sherman
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, USA
- Russell B Smith
- College of Medicine, University of Nebraska Medical Center, Omaha, NE, USA; Nebraska Methodist Hospital, Omaha, NE, USA
- Simon Sherman
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE, USA; Progenomix, Inc., Omaha, NE, USA
10
Jiang G, Evans J, Endle CM, Solbrig HR, Chute CG. Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards. J Biomed Semantics 2016; 7:10. [PMID: 26949508] [PMCID: PMC4778326] [DOI: 10.1186/s13326-016-0053-5]
Abstract
Background The Biomedical Research Integrated Domain Group (BRIDG) model is a formal domain analysis model for protocol-driven biomedical research, and serves as a semantic foundation for application and message development in standards developing organizations (SDOs). The increasing sophistication and complexity of the BRIDG model require new approaches to the management and utilization of the underlying semantics to harmonize domain-specific standards. The objective of this study is to develop and evaluate a Semantic Web-based approach that integrates the BRIDG model with ISO 21090 data types to generate domain-specific templates supporting clinical study metadata standards development. Methods We developed a template generation and visualization system based on an open source Resource Description Framework (RDF) store backend, a SmartGWT-based web user interface, and a “mind map” based tool for visualizing the generated domain-specific templates. We also developed a RESTful web service, informed by the Clinical Information Modeling Initiative (CIMI) reference model, for access to the generated domain-specific templates. Results In a preliminary usability study, all reviewers (n = 3) responded very positively to the evaluation questions on usability and on the system's ability to meet the requirements (average score 4.6). Conclusions Semantic Web technologies provide a scalable infrastructure and have great potential to enable computable semantic interoperability of models at the intersection of health care and clinical research.
Affiliation(s)
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN 55905 USA
- Julie Evans
- Clinical Data Interchange Standards Consortium (CDISC), Austin, TX USA
- Cory M Endle
- Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN 55905 USA
- Harold R Solbrig
- Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN 55905 USA
11
Noor AM, Holmberg L, Gillett C, Grigoriadis A. Big Data: the challenge for small research groups in the era of cancer genomics. Br J Cancer 2015; 113:1405-12. [PMID: 26492224] [PMCID: PMC4815885] [DOI: 10.1038/bjc.2015.341]
Abstract
In the past decade, cancer research has seen an increasing trend towards high-throughput techniques and translational approaches. The increasing availability of assays that utilise smaller quantities of source material and produce higher volumes of data output has resulted in the need for data storage solutions beyond those previously used. Multifactorial data, both large in sample size and heterogeneous in context, need to be integrated in a standardised, cost-effective and secure manner. This requires technical solutions and administrative support not normally accounted for financially in small- to moderate-sized research groups. In this review, we highlight the Big Data challenges faced by translational research groups in the precision medicine era, an era in which the genomes of over 75,000 patients will be sequenced by the National Health Service over the next 3 years to advance healthcare. In particular, we look at three main themes of data management in relation to cancer research, namely (1) cancer ontology management, (2) IT infrastructures that have been developed to support data management and (3) the unique ethical challenges introduced by utilising Big Data in research.
Affiliation(s)
- Aisyah Mohd Noor
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK
- Lars Holmberg
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Department of Surgical Sciences, Uppsala University, Uppsala 751 85, Sweden
- Cheryl Gillett
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; King's Health Partners Cancer Biobank, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK
- Anita Grigoriadis
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Breast Cancer Now Research Unit, Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK
12
Hicks KA, Tcheng JE, Bozkurt B, Chaitman BR, Cutlip DE, Farb A, Fonarow GC, Jacobs JP, Jaff MR, Lichtman JH, Limacher MC, Mahaffey KW, Mehran R, Nissen SE, Smith EE, Targum SL. 2014 ACC/AHA Key Data Elements and Definitions for Cardiovascular Endpoint Events in Clinical Trials: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards (Writing Committee to Develop Cardiovascular Endpoints Data Standards). J Nucl Cardiol 2015; 22:1041-144. [PMID: 26204990] [DOI: 10.1007/s12350-015-0209-1]
13
Hicks KA, Tcheng JE, Bozkurt B, Chaitman BR, Cutlip DE, Farb A, Fonarow GC, Jacobs JP, Jaff MR, Lichtman JH, Limacher MC, Mahaffey KW, Mehran R, Nissen SE, Smith EE, Targum SL. 2014 ACC/AHA Key Data Elements and Definitions for Cardiovascular Endpoint Events in Clinical Trials. Circulation 2015; 132:302-61. [DOI: 10.1161/cir.0000000000000156] [Citation(s) in RCA: 186] [Impact Index Per Article: 20.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
14
Ganzinger M, Knaup P. Requirements for data integration platforms in biomedical research networks: a reference model. PeerJ 2015; 3:e755. [PMID: 25699205 PMCID: PMC4327254 DOI: 10.7717/peerj.755] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2014] [Accepted: 01/19/2015] [Indexed: 11/20/2022] Open
Abstract
Biomedical research networks need to integrate research data among their members and with external partners. To support such data sharing activities, an adequate information technology infrastructure is necessary. To facilitate the establishment of such an infrastructure, we developed a reference model for the requirements. The reference model consists of five reference goals and 15 reference requirements. Using the Unified Modeling Language, the goals and requirements are set into relation to each other. In addition, all goals and requirements are described textually in tables. This reference model can be used by research networks as a basis for a resource efficient acquisition of their project specific requirements. Furthermore, a concrete instance of the reference model is described for a research network on liver cancer. The reference model is transferred into a requirements model of the specific network. Based on this concrete requirements model, a service-oriented information technology architecture is derived and also described in this paper.
Affiliation(s)
- Matthias Ganzinger
- Institute of Medical Biometry and Informatics, Heidelberg University, Heidelberg, Germany
- Petra Knaup
- Institute of Medical Biometry and Informatics, Heidelberg University, Heidelberg, Germany
15
2014 ACC/AHA Key Data Elements and Definitions for Cardiovascular Endpoint Events in Clinical Trials: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards (Writing Committee to Develop Cardiovascular Endpoints Data Standards). J Am Coll Cardiol 2014; 66:403-69. [PMID: 25553722 DOI: 10.1016/j.jacc.2014.12.018] [Citation(s) in RCA: 427] [Impact Index Per Article: 42.7] [Reference Citation Analysis] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
16
Abstract
The understanding of certain data often requires the collection of similar data from different places to be analysed and interpreted. Interoperability standards and ontologies are facilitating data interchange around the world. However, beyond the existing networks and advances in data transfer, data sharing protocols that support multilateral agreements are useful for exploiting the knowledge of distributed Data Warehouses. Access to a certain data set in a federated Data Warehouse may be constrained by the requirement to deliver another specific data set. When bilateral agreements between two nodes of a network are not enough to resolve the constraints on accessing a certain data set, multilateral agreements for data exchange are needed. We present the implementation of a Multi-Agent System for multilateral exchange agreements of clinical data, and evaluate how those multilateral agreements increase the percentage of data collected by a single node from the total amount of data available in the network. Different strategies to reduce the number of messages needed to reach an agreement are also considered. The results show that in this collaborative sharing scenario the percentage of data collected improves dramatically when moving from bilateral to multilateral agreements, reaching almost all data available in the network.
17
Lin CH, Wu NY, Liou DM. A multi-technique approach to bridge electronic case report form design and data standard adoption. J Biomed Inform 2014; 53:49-57. [PMID: 25200473 DOI: 10.1016/j.jbi.2014.08.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2013] [Revised: 08/22/2014] [Accepted: 08/30/2014] [Indexed: 10/24/2022]
Abstract
BACKGROUND AND OBJECTIVE The importance of data standards when integrating clinical research data has been recognized. The common data element (CDE) is a consensus-based data element for data harmonization and sharing between clinical researchers; it can support data standards adoption and mapping. However, the lack of a suitable methodology has become a barrier to data standard adoption. Our aim was to demonstrate an approach that allowed clinical researchers to design electronic case report forms (eCRFs) that complied with the data standard. METHODS We used a multi-technique approach, including information retrieval, natural language processing and an ontology-based knowledgebase, to facilitate data standard adoption in eCRF design. The approach took research questions as query texts with the aim of retrieving and associating relevant CDEs with the research questions. RESULTS The approach was implemented using a CDE-based eCRF builder, which was evaluated using CDE-related questions from CRFs used in the Parkinson Disease Biomarker Program, as well as CDE-unrelated questions from a technique support website. Our approach had a precision of 0.84, a recall of 0.80, an F-measure of 0.82 and an error of 0.31. Using the 303 testing CDE-related questions, our approach responded and provided suggested CDEs for 88.8% (269/303) of the study questions with 90.3% accuracy (243/269). The reasons for missed and failed responses were also analyzed. CONCLUSION This study demonstrates an approach that helps to cross the barrier that inhibits data standard adoption in eCRF building, and our evaluation reveals the approach has satisfactory performance. Our CDE-based form builder provides an alternative perspective regarding data-standard-compliant eCRF design.
Affiliation(s)
- Ching-Heng Lin
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Nai-Yuan Wu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Der-Ming Liou
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
18
Volpe G, Nickman NA, Bussard WE, Giacomelli B, Ferer DS, Urbanski C, Brookins L. Automation and improved technology to promote database synchronization. Am J Health Syst Pharm 2014; 71:675-8. [PMID: 24688043 DOI: 10.2146/ajhp130286] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Affiliation(s)
- Gwen Volpe
- Gwen Volpe, B.S.Pharm., LSS, BB, is Pharmacist Consultant, Omnicell, Mountain View, CA. Nancy A. Nickman, Ph.D., B.S.Pharm., is Professor of Pharmacotherapy and Clinical Coordinator of Analytics and Outcomes, University of Utah Hospitals and Clinics, and College of Pharmacy and L. S. Skaggs Pharmacy Institute, University of Utah, Salt Lake City. Wendy E. Bussard, Pharm.D., is Clinical Pharmacist/Application Coordinator, Department of Pharmacy Services, University of Michigan Health System, Ann Arbor. Barbara Giacomelli, Pharm.D., FASHP, is Managing Consultant, McKesson Pharmacy Optimization, Vineland, NJ. Darren S. Ferer, B.S.Pharm., is Pharmacy Informatics Coordinator, Kaleida Health, Buffalo, NY. Chris Urbanski, B.S.Pharm., M.S., is Director of Pharmacy Informatics and Medication Integration, Indiana University Health, Indianapolis. Leslie Brookins, B.S.Pharm., M.S., is Pharmacy IT Manager, Saint Luke's Health System, Kansas City, MO
19
Tenenbaum JD, Sansone SA, Haendel M. A sea of standards for omics data: sink or swim? J Am Med Inform Assoc 2014; 21:200-3. [PMID: 24076747 PMCID: PMC3932466 DOI: 10.1136/amiajnl-2013-002066] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2013] [Revised: 07/08/2013] [Accepted: 09/10/2013] [Indexed: 11/29/2022] Open
Abstract
In the era of Big Data, omic-scale technologies, and increasing calls for data sharing, it is generally agreed that the use of community-developed, open data standards is critical. Far less agreed upon is exactly which data standards should be used, the criteria by which one should choose a standard, or even what constitutes a data standard. It is impossible simply to choose a domain and have it naturally follow which data standards should be used in all cases. The 'right' standards to use is often dependent on the use case scenarios for a given project. Potential downstream applications for the data, however, may not always be apparent at the time the data are generated. Similarly, technology evolves, adding further complexity. Would-be standards adopters must strike a balance between planning for the future and minimizing the burden of compliance. Better tools and resources are required to help guide this balancing act.
Affiliation(s)
- Jessica D Tenenbaum
- Duke Translational Medicine Institute, Duke University, Durham, North Carolina, USA
- Melissa Haendel
- Library and Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
20
Coorevits P, Sundgren M, Klein GO, Bahr A, Claerhout B, Daniel C, Dugas M, Dupont D, Schmidt A, Singleton P, De Moor G, Kalra D. Electronic health records: new opportunities for clinical research. J Intern Med 2013; 274:547-60. [PMID: 23952476 DOI: 10.1111/joim.12119] [Citation(s) in RCA: 166] [Impact Index Per Article: 15.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Clinical research is on the threshold of a new era in which electronic health records (EHRs) are gaining an important novel supporting role. Whilst EHRs used for routine clinical care have some limitations at present, as discussed in this review, new improved systems and emerging research infrastructures are being developed to ensure that EHRs can be used for secondary purposes such as clinical research, including the design and execution of clinical trials for new medicines. EHR systems should be able to exchange information through the use of recently published international standards for their interoperability and clinically validated information structures (such as archetypes and international health terminologies), to ensure consistent and more complete recording and sharing of data for various patient groups. Such systems will counteract the obstacles of differing clinical languages and styles of documentation as well as the recognized incompleteness of routine records. Here, we discuss some of the legal and ethical concerns of clinical research data reuse and technical security measures that can enable such research while protecting privacy. In the emerging research landscape, cooperation infrastructures are being built where research projects can utilize the availability of patient data from federated EHR systems from many different sites, as well as in international multilingual settings. Amongst several initiatives described, the EHR4CR project offers a promising method for clinical research. One of the first achievements of this project was the development of a protocol feasibility prototype which is used for finding patients eligible for clinical trials from multiple sources.
Affiliation(s)
- P Coorevits
- Department of Medical Informatics and Statistics, Ghent University, Ghent, Belgium; The European Institute for Health Records (EuroRec), Sint-Martens-Latem, Belgium
21
Sinaci AA, Laleci Erturkmen GB. A federated semantic metadata registry framework for enabling interoperability across clinical research and care domains. J Biomed Inform 2013; 46:784-94. [PMID: 23751263 DOI: 10.1016/j.jbi.2013.05.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2013] [Revised: 05/23/2013] [Accepted: 05/25/2013] [Indexed: 10/26/2022]
Abstract
In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between the clinical care and research domains, this paper introduces a unified methodology and supporting framework that brings together the power of metadata registries (MDRs) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles, whereby each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and its components are maintained as LOD resources, enabling semantic links with other CDEs, terminology systems and implementation-dependent content models, thus facilitating semantic search, more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain, such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability to a greater extent. The open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project, which enables the execution of post-marketing safety analysis studies on top of existing EHR systems.
Affiliation(s)
- A Anil Sinaci
- Department of Computer Engineering, Middle East Technical University, 06800 Ankara, Turkey; SRDC Software Research & Development and Consultancy Ltd., ODTU Teknokent Silikon Blok No. 14, 06800 Ankara, Turkey.
22
Anderson HV, Weintraub WS, Radford MJ, Kremers MS, Roe MT, Shaw RE, Pinchotti DM, Tcheng JE. Standardized cardiovascular data for clinical research, registries, and patient care: a report from the Data Standards Workgroup of the National Cardiovascular Research Infrastructure project. J Am Coll Cardiol 2013; 61:1835-46. [PMID: 23500238 PMCID: PMC3664644 DOI: 10.1016/j.jacc.2012.12.047] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/13/2012] [Accepted: 12/19/2012] [Indexed: 11/23/2022]
Abstract
Relatively little attention has been focused on standardization of data exchange in clinical research studies and patient care activities. Both are usually managed locally using separate and generally incompatible data systems at individual hospitals or clinics. In the past decade there have been nascent efforts to create data standards for clinical research and patient care data, and to some extent these are helpful in providing a degree of uniformity. Nonetheless, these data standards generally have not been converted into accepted computer-based language structures that could permit reliable data exchange across computer networks. The National Cardiovascular Research Infrastructure (NCRI) project was initiated with a major objective of creating a model framework for standard data exchange in all clinical research, clinical registry, and patient care environments, including all electronic health records. The goal is complete syntactic and semantic interoperability. A Data Standards Workgroup was established to create or identify and then harmonize clinical definitions for a base set of standardized cardiovascular data elements that could be used in this network infrastructure. Recognizing the need for continuity with prior efforts, the Workgroup examined existing data standards sources. A basic set of 353 elements was selected. The NCRI staff then collaborated with the 2 major technical standards organizations in health care, the Clinical Data Interchange Standards Consortium and Health Level Seven International, as well as with staff from the National Cancer Institute Enterprise Vocabulary Services. Modeling and mapping were performed to represent (instantiate) the data elements in appropriate technical computer language structures for endorsement as an accepted data standard for public access and use. Fully implemented, these elements will facilitate clinical research, registry reporting, administrative reporting and regulatory compliance, and patient care.
Affiliation(s)
- H Vernon Anderson
- University of Texas Health Science Center, Houston, Texas 77030, USA.
23
The ISO/IEC 11179 norm for metadata registries: Does it cover healthcare standards in empirical research? J Biomed Inform 2013; 46:318-27. [DOI: 10.1016/j.jbi.2012.11.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2012] [Revised: 11/21/2012] [Accepted: 11/24/2012] [Indexed: 11/18/2022]
24
Abstract
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice has demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets. 
In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems.
Affiliation(s)
- Philip R O Payne
- The Ohio State University, Department of Biomedical Informatics, Columbus, Ohio, United States of America.
25
Weber SC, Seto T, Olson C, Kenkare P, Kurian AW, Das AK. Oncoshare: lessons learned from building an integrated multi-institutional database for comparative effectiveness research. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:970-978. [PMID: 23304372 PMCID: PMC3540570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Comparative effectiveness research (CER) using observational data requires informatics methods for the extraction, standardization, sharing, and integration of data derived from a variety of electronic sources. In the Oncoshare project, we have developed such methods as part of a collaborative multi-institutional CER study of patterns, predictors, and outcome of breast cancer care. In this paper, we present an evaluation of the approaches we undertook and the lessons we learned in building and validating the Oncoshare data resource. Specifically, we determined that 1) the state or regional cancer registry makes the most efficient starting point for determining inclusion of subjects; 2) the data dictionary should be based on existing registry standards, such as Surveillance, Epidemiology and End Results (SEER), when applicable; 3) the Social Security Administration Death Master File (SSA DMF), rather than clinical resources, provides standardized ascertainment of mortality outcomes; and 4) CER database development efforts, despite the immediate availability of electronic data, may take as long as two years to produce validated, reliable data for research. Through our efforts using these methods, Oncoshare integrates complex, longitudinal data from multiple electronic medical records and registries and provides a rich, validated resource for research on oncology care.
Affiliation(s)
- Susan C Weber
- Center for Clinical Informatics, Stanford University, USA
26
Jiang G, Solbrig HR, Chute CG. Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups. J Am Med Inform Assoc 2012; 19:e129-36. [PMID: 22511016 PMCID: PMC3392855 DOI: 10.1136/amiajnl-2011-000739] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (i.e., enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both Unified Medical Language System (UMLS) semantic types and groups. MATERIALS AND METHODS The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies was isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. RESULTS Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. DISCUSSION The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. CONCLUSION This approach produces a useful quality assurance mechanism for a clinical study CDE repository.
Affiliation(s)
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA.
27
González-Beltrán AN, Yong MY, Dancey G, Begent R. Guidelines for information about therapy experiments: a proposal on best practice for recording experimental data on cancer therapy. BMC Res Notes 2012; 5:10. [PMID: 22226027 PMCID: PMC3285520 DOI: 10.1186/1756-0500-5-10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2011] [Accepted: 01/06/2012] [Indexed: 12/03/2022] Open
Abstract
Background Biology, biomedicine and healthcare have become data-driven enterprises, where scientists and clinicians need to generate, access, validate, interpret and integrate different kinds of experimental and patient-related data. Thus, recording and reporting of data in a systematic and unambiguous fashion is crucial to allow aggregation and re-use of data. This paper reviews the benefits of existing biomedical data standards and focuses on key elements to record experiments for therapy development. Specifically, we describe the experiments performed in molecular, cellular, animal and clinical models. We also provide an example set of elements for a therapy tested in a phase I clinical trial. Findings We introduce the Guidelines for Information About Therapy Experiments (GIATE), a minimum information checklist creating a consistent framework to transparently report the purpose, methods and results of the therapeutic experiments. A discussion on the scope, design and structure of the guidelines is presented, together with a description of the intended audience. We also present complementary resources such as a classification scheme, and two alternative ways of creating GIATE information: an electronic lab notebook and a simple spreadsheet-based format. Finally, we use GIATE to record the details of the phase I clinical trial of CHT-25 for patients with refractory lymphomas. The benefits of using GIATE for this experiment are discussed. Conclusions While data standards are being developed to facilitate data sharing and integration in various aspects of experimental medicine, such as genomics and clinical data, no previous work has focused on therapy development. We propose a checklist for therapy experiments and demonstrate its use in the 131-Iodine-labeled CHT-25 chimeric antibody cancer therapy. As future work, we will expand the set of GIATE tools to continue to encourage its use by cancer researchers, and we will engineer an ontology to annotate GIATE elements and facilitate unambiguous interpretation and data integration.
28
Jiang G, Solbrig HR, Chute CG. Quality evaluation of cancer study Common Data Elements using the UMLS Semantic Network. J Biomed Inform 2011; 44 Suppl 1:S78-S85. [DOI: 10.1016/j.jbi.2011.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2011] [Revised: 07/29/2011] [Accepted: 08/01/2011] [Indexed: 11/27/2022]
29
Hu H, Correll M, Kvecher L, Osmond M, Clark J, Bekhash A, Schwab G, Gao D, Gao J, Kubatin V, Shriver CD, Hooke JA, Maxwell LG, Kovatich AJ, Sheldon JG, Liebman MN, Mural RJ. DW4TR: A Data Warehouse for Translational Research. J Biomed Inform 2011; 44:1004-19. [PMID: 21872681 DOI: 10.1016/j.jbi.2011.08.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2010] [Revised: 07/05/2011] [Accepted: 08/04/2011] [Indexed: 10/17/2022]
Abstract
The linkage between the clinical and laboratory research domains is a key issue in translational research. Integration of clinicopathologic data alone is a major task given the number of data elements involved. For a translational research environment, it is critical to make these data usable at the point-of-need. Individual systems have been developed to meet the needs of particular projects though the need for a generalizable system has been recognized. Increased use of Electronic Medical Record data in translational research will demand generalizing the system for integrating clinical data to support the study of a broad range of human diseases. To ultimately satisfy these needs, we have developed a system to support multiple translational research projects. This system, the Data Warehouse for Translational Research (DW4TR), is based on a light-weight, patient-centric modularly-structured clinical data model and a specimen-centric molecular data model. The temporal relationships of the data are also part of the model. The data are accessed through an interface composed of an Aggregated Biomedical-Information Browser (ABB) and an Individual Subject Information Viewer (ISIV) which target general users. The system was developed to support a breast cancer translational research program and has been extended to support a gynecological disease program. Further extensions of the DW4TR are underway. We believe that the DW4TR will play an important role in translational research across multiple disease types.
Affiliation(s)
- Hai Hu
- Windber Research Institute, Windber, PA 15963, USA.
30
ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. J Am Coll Cardiol 2011; 58:202-22. [PMID: 21652161 DOI: 10.1016/j.jacc.2011.05.001] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
31
Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Hendel RC, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Tcheng JE, Wang T. ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011; 124:103-23. [PMID: 21646493 DOI: 10.1161/cir.0b013e31821ccf71] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
32
Costa CM, Menárguez-Tortosa M, Fernández-Breis JT. Clinical data interoperability based on archetype transformation. J Biomed Inform 2011; 44:869-80. [PMID: 21645637 DOI: 10.1016/j.jbi.2011.05.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2011] [Revised: 05/13/2011] [Accepted: 05/17/2011] [Indexed: 10/18/2022]
Abstract
The semantic interoperability between health information systems is a major challenge to improve the quality of clinical practice and patient safety. In recent years many projects have faced this problem and provided solutions based on specific standards and technologies in order to satisfy the needs of a particular scenario. Most of such solutions cannot be easily adapted to new scenarios, thus more global solutions are needed. In this work, we have focused on the semantic interoperability of electronic healthcare records standards based on the dual model architecture and we have developed a solution that has been applied to ISO 13606 and openEHR. The technological infrastructure combines reference models, archetypes and ontologies, with the support of Model-driven Engineering techniques. For this purpose, the interoperability infrastructure developed in previous work by our group has been reused and extended to cover the requirements of data transformation.
|
33
|
Boyd LB, Hunicke-Smith SP, Stafford GA, Freund ET, Ehlman M, Chandran U, Dennis R, Fernandez AT, Goldstein S, Steffen D, Tycko B, Klemm JD. The caBIG® Life Science Business Architecture Model. Bioinformatics 2011; 27:1429-35. [PMID: 21450709 PMCID: PMC3087952 DOI: 10.1093/bioinformatics/btr141] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2010] [Revised: 01/31/2011] [Accepted: 03/12/2011] [Indexed: 11/28/2022] Open
Abstract
MOTIVATION Business Architecture Models (BAMs) describe what a business does, who performs the activities, where and when activities are performed, how activities are accomplished and which data are present. The purpose of a BAM is to provide a common resource for understanding business functions and requirements and to guide software development. The cancer Biomedical Informatics Grid (caBIG®) Life Science BAM (LS BAM) provides a shared understanding of the vocabulary, goals and processes that are common in the business of LS research. RESULTS LS BAM 1.1 includes 90 goals and 61 people and groups within Use Case and Activity Unified Modeling Language (UML) Diagrams. Here we report on the model's current release, LS BAM 1.1, its utility and usage, and plans for future use and continuing development for future releases. AVAILABILITY AND IMPLEMENTATION The LS BAM is freely available as UML, PDF and HTML (https://wiki.nci.nih.gov/x/OFNyAQ).
Affiliation(s)
- Lauren Becnel Boyd
- Department of Medicine, Hematology/Oncology, Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA.
|
34
|
Livne OE, Schultz ND, Narus SP. Federated querying architecture with clinical & translational health IT application. J Med Syst 2011; 35:1211-24. [PMID: 21537849 DOI: 10.1007/s10916-011-9720-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2011] [Accepted: 04/13/2011] [Indexed: 11/28/2022]
Abstract
We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks in innovative ways. It consists of (a) federated query engine, which manages federated queries and result set aggregation via a patient identification service; and (b) data source facades, which translate the physical data models into a common model on-the-fly and handle large result set streaming. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that allows users to construct queries via the i2b2 web front-end, and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population database. Our system can be easily adopted, extended and integrated with existing SOA Healthcare and HL7 frameworks such as i2b2 and caGrid.
Affiliation(s)
- Oren E Livne
- Office of AVP for Health Sciences IT, University of Utah, Salt Lake City, UT 84112, USA.
|
35
|
Tao C, Jiang G, Wei W, Solbrig HR, Chute CG. Towards Semantic-Web Based Representation and Harmonization of Standard Meta-data Models for Clinical Studies. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2011; 2011:59-63. [PMID: 22211181 PMCID: PMC3248749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
In this paper, we introduce our case studies for representing clinical study meta-data models, such as the HL7 Detailed Clinical Models (DCMs) and the ISO 11179 model, in a framework based on Semantic Web technology. We consider that such a harmonization would provide computable semantics for the models, thus facilitating model reuse, model harmonization, and data integration.
|
36
|
Scotch M, Mattocks K, Rabinowitz P, Brandt C. A qualitative study of state-level zoonotic disease surveillance in New England. Zoonoses Public Health 2011; 58:131-9. [PMID: 20163575 PMCID: PMC3857965 DOI: 10.1111/j.1863-2378.2009.01319.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
Zoonotic diseases are infectious diseases transmittable between animals and humans and outbreaks of these diseases in animals can signify that humans are also infected (or vice versa). Thus, communication between animal and human health agencies is critical for surveillance. Understanding how these agencies conduct surveillance and share information is important for the development of successful automated zoonotic monitoring systems. Individual interviews were conducted with 13 professionals who perform animal or human zoonotic disease surveillance in one of the New England states. Questions centred on existing surveillance methods, collaborations between animal and human health agencies, and technological and data needs. The results showed that agencies routinely communicate over suspected zoonotic disease cases, yet there are barriers preventing automated electronic linking of health data of animals and humans. These include technological barriers and barriers due to sensitivity and confidentiality of information. Addressing these will facilitate the development of electronic systems for integrating animal and human zoonotic disease surveillance data.
Affiliation(s)
- M Scotch
- Yale Center for Medical Informatics, Yale University, New Haven, CT, USA.
|
37
|
Pathak J, Peters L, Chute CG, Bodenreider O. Comparing and evaluating terminology services application programming interfaces: RxNav, UMLSKS and LexBIG. J Am Med Inform Assoc 2011; 17:714-9. [PMID: 20962136 DOI: 10.1136/jamia.2009.001149] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
To facilitate the integration of terminologies into applications, various terminology services application programming interfaces (API) have been developed in the recent past. In this study, three publicly available terminology services API, RxNav, UMLSKS and LexBIG, are compared and functionally evaluated with respect to the retrieval of information from one biomedical terminology, RxNorm, to which all three services provide access. A list of queries is established covering a wide spectrum of terminology services functionalities such as finding RxNorm concepts by their name, or navigating different types of relationships. Test data were generated from the RxNorm dataset to evaluate the implementation of the functionalities in the three API. The results revealed issues with various aspects of the API implementation (eg, handling of obsolete terms by LexBIG) and documentation (eg, navigational paths used in RxNav) that were subsequently addressed by the development teams of the three API investigated. Knowledge about such discrepancies helps inform the choice of an API for a given use case.
Affiliation(s)
- Jyotishman Pathak
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota 55905, USA.
|
38
|
Brochhausen M, Spear AD, Cocos C, Weiler G, Martín L, Anguita A, Stenzhorn H, Daskalaki E, Schera F, Schwarz U, Sfakianakis S, Kiefer S, Dörr M, Graf N, Tsiknakis M. The ACGT Master Ontology and its applications--towards an ontology-driven cancer research and management system. J Biomed Inform 2011; 44:8-25. [PMID: 20438862 PMCID: PMC5755590 DOI: 10.1016/j.jbi.2010.04.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2009] [Revised: 04/23/2010] [Accepted: 04/27/2010] [Indexed: 11/28/2022]
Abstract
OBJECTIVE This paper introduces the objectives, methods and results of ontology development in the EU co-funded project Advancing Clinico-genomic Trials on Cancer-Open Grid Services for Improving Medical Knowledge Discovery (ACGT). While the available data in the life sciences has recently grown both in amount and quality, the full exploitation of it is being hindered by the use of different underlying technologies, coding systems, category schemes and reporting methods on the part of different research groups. The goal of the ACGT project is to contribute to the resolution of these problems by developing an ontology-driven, semantic grid services infrastructure that will enable efficient execution of discovery-driven scientific workflows in the context of multi-centric, post-genomic clinical trials. The focus of the present paper is the ACGT Master Ontology (MO). METHODS ACGT project researchers undertook a systematic review of existing domain and upper-level ontologies, as well as of existing ontology design software, implementation methods, and end-user interfaces. This included the careful study of best practices, design principles and evaluation methods for ontology design, maintenance, implementation, and versioning, as well as for use on the part of domain experts and clinicians. 
RESULTS To date, the results of the ACGT project include (i) the development of a master ontology (the ACGT-MO) based on clearly defined principles of ontology development and evaluation; (ii) the development of a technical infrastructure (the ACGT Platform) that implements the ACGT-MO utilizing independent tools, components and resources that have been developed based on open architectural standards, and which includes an application updating and evolving the ontology efficiently in response to end-user needs; and (iii) the development of an Ontology-based Trial Management Application (ObTiMA) that integrates the ACGT-MO into the design process of clinical trials in order to guarantee automatic semantic integration without the need to perform a separate mapping process.
Affiliation(s)
- Mathias Brochhausen
- Institute of Formal Ontology and Medical, Information Science, Saarland University, P.O. Box 15 11 50, 66041 Saarbrücken, Germany.
|
39
|
Sreenivasaiah PK, Kim DH. Current trends and new challenges of databases and web applications for systems driven biological research. Front Physiol 2010; 1:147. [PMID: 21423387 PMCID: PMC3059952 DOI: 10.3389/fphys.2010.00147] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2010] [Accepted: 10/18/2010] [Indexed: 12/17/2022] Open
Abstract
Dynamic and rapidly evolving nature of systems driven research imposes special requirements on the technology, approach, design and architecture of computational infrastructure including database and Web application. Several solutions have been proposed to meet the expectations and novel methods have been developed to address the persisting problems of data integration. It is important for researchers to understand different technologies and approaches. Having familiarized with the pros and cons of the existing technologies, researchers can exploit its capabilities to the maximum potential for integrating data. In this review we discuss the architecture, design and key technologies underlying some of the prominent databases and Web applications. We will mention their roles in integration of biological data and investigate some of the emerging design concepts and computational technologies that are likely to have a key role in the future of systems driven biomedical research.
Affiliation(s)
- Pradeep Kumar Sreenivasaiah
- Systems Biology Research Center and College of Life Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Do Han Kim
- Systems Biology Research Center and College of Life Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
|
40
|
Arenson AD, Bakhireva LN, Chambers CD, Deximo CA, Foroud T, Jacobson JL, Jacobson SW, Jones KL, Mattson SN, May PA, Moore ES, Ogle K, Riley EP, Robinson LK, Rogers J, Streissguth AP, Tavares MC, Urbanski J, Yezerets Y, Surya R, Stewart CA, Barnett WK. Implementation of a shared data repository and common data dictionary for fetal alcohol spectrum disorders research. Alcohol 2010; 44:643-7. [PMID: 20036486 DOI: 10.1016/j.alcohol.2009.08.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2009] [Revised: 08/03/2009] [Accepted: 08/04/2009] [Indexed: 10/20/2022]
Abstract
Many previous attempts by fetal alcohol spectrum disorders researchers to compare data across multiple prospective and retrospective human studies have failed because of both structural differences in the collected data and difficulty in coming to agreement on the precise meaning of the terminology used to describe the collected data. Although some groups of researchers have an established track record of successfully integrating data, attempts to integrate data more broadly among different groups of researchers have generally faltered. Lack of tools to help researchers share and integrate data has also hampered data analysis. This situation has delayed improving diagnosis, intervention, and treatment before and after birth. We worked with various researchers and research programs in the Collaborative Initiative on Fetal Alcohol Spectrum Disorders (CI-FASD) to develop a set of common data dictionaries to describe the data to be collected, including definitions of terms and specification of allowable values. The resulting data dictionaries were the basis for creating a central data repository (CI-FASD Central Repository) and software tools to input and query data. Data entry restrictions ensure that only data that conform to the data dictionaries reach the CI-FASD Central Repository. The result is an effective system for centralized and unified management of the data collected and analyzed by the initiative, including a secure, long-term data repository. CI-FASD researchers are able to integrate and analyze data of different types, using multiple methods, and collected from multiple populations, and data are retained for future reuse in a secure, robust repository.
|
41
|
Krikov S, Price RC, Matney SA, Allen-Brady K, Facelli JC. Enabling GeneHunter as a grid service: a case study for implementing analytical services in biomedical grids. Methods Inf Med 2010; 50:364-71. [PMID: 20963257 DOI: 10.3414/me10-01-0005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2010] [Accepted: 05/30/2010] [Indexed: 11/09/2022]
Abstract
BACKGROUND A cursory analysis of the biomedical grid literature shows that most projects emphasize data sharing and the development of new applications for the grid environment. Much less is known about the best practices for the migration of existing analytical tools into the grid environment. OBJECTIVES To make GeneHunter available as a grid service and to evaluate the effort and best practices needed to enable a legacy application as a grid service when addressing semantic integration and using the caBIG tools. METHODS We used the tools available in the caBIG environment because these tools are quite general and they may be used to deploy services in similar biomedical grids that are OGSA-compliant. RESULTS We achieved semantic integration of GeneHunter within the caBIG by creating a new UML model, LinkageX, for the LINKAGE data format. The LinkageX UML model has been published in the caDSR and it is publicly available for use with GeneHunter or any other program using this data format. CONCLUSIONS While achieving semantic interoperability is still a time-consuming task, the tools available in caBIG can greatly enhance productivity and decrease errors.
Affiliation(s)
- S Krikov
- Department of Biomedical Informatics, The University of Utah, 26 South 2000 East, Room 5775 HSEB, Salt Lake City, Utah 84108, USA
|
42
|
Wang H, Bouzyk E, Kuehn A, Muller S, Chen Z, Khuri FR, Shin DM, Rogatko A, Tighiouart M. caGrid-Enabled caBIG™ Silver Level Compatible Head and Neck Cancer Tissue Database System. Open Med Inform J 2010; 4:171-8. [PMID: 21589853 PMCID: PMC3095113 DOI: 10.2174/1874431101004010171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2009] [Revised: 04/23/2010] [Accepted: 06/21/2010] [Indexed: 11/25/2022] Open
Abstract
There are huge amounts of biomedical data generated by research labs in each cancer institution. The data are stored in various formats and accessed through numerous interfaces. It is very difficult to exchange and integrate the data among different cancer institutions, even among different research labs within the same institution, in order to discover useful biomedical knowledge for the healthcare community. In this paper, we present the design and implementation of a caGrid-enabled caBIG™ silver level compatible head and neck cancer tissue database system. The system is implemented using a set of open source software and tools developed by the NCI, such as the caCORE SDK and caGrid. The head and neck cancer tissue database system has four interfaces: Web-based, Java API, XML utility, and Web service. The system has been shown to provide robust and programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources.
|
43
|
|
44
|
Jiang G, Solbrig HR, Iberson-Hurst D, Kush RD, Chute CG. A Collaborative Framework for Representation and Harmonization of Clinical Study Data Elements Using Semantic MediaWiki. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2010; 2010:11-5. [PMID: 21347136 PMCID: PMC3041544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Semantic interoperability among terminologies, data elements, and information models is fundamental and critical for sharing information from the scientific bench to the clinical bedside and back among systems. To meet this need, the vision for CDISC is to build a global, accessible electronic library, which enables precise and standardized data element definitions that can be used in applications and studies to improve biomedical research and its link with health care. As a pilot study, we propose a representation and harmonization framework for clinical study data elements and implement a prototype CDISC Shared Health and Research Electronic Library (CSHARE) using Semantic MediaWiki. We report the preliminary observations of how the components worked and the lessons learnt. In summary, the wiki provided a useful prototyping tool from a process standpoint.
Affiliation(s)
- Guoqian Jiang
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, 55905
- Harold R. Solbrig
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, 55905
- Dave Iberson-Hurst
- Clinical Data Interchange Standards Consortium (CDISC), Austin, TX, 78746
- Rebecca D. Kush
- Clinical Data Interchange Standards Consortium (CDISC), Austin, TX, 78746
- Christopher G. Chute
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, 55905
|
45
|
Hartung M, Gross A, Kirsten T, Rahm E. Discovering Evolving Regions in Life Science Ontologies. LECTURE NOTES IN COMPUTER SCIENCE 2010. [DOI: 10.1007/978-3-642-15120-0_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
|
46
|
McCusker JP, Phillips JA, Beltrán AG, Finkelstein A, Krauthammer M. Semantic web data warehousing for caGrid. BMC Bioinformatics 2009; 10 Suppl 10:S2. [PMID: 19796399 PMCID: PMC2755823 DOI: 10.1186/1471-2105-10-s10-s2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.
Affiliation(s)
- Jamie P McCusker
- Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
- Anthony Finkelstein
- Department of Computer Science, University College London, London, UK
- Michael Krauthammer
- Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
|
47
|
Payne PRO, Embi PJ, Sen CK. Translational informatics: enabling high-throughput research paradigms. Physiol Genomics 2009; 39:131-40. [PMID: 19737991 DOI: 10.1152/physiolgenomics.00050.2009] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022] Open
Abstract
A common thread throughout the clinical and translational research domains is the need to collect, manage, integrate, analyze, and disseminate large-scale, heterogeneous biomedical data sets. However, well-established and broadly adopted theoretical and practical frameworks and models intended to address such needs are conspicuously absent in the published literature or other reputable knowledge sources. Instead, the development and execution of multidisciplinary, clinical, or translational studies are significantly limited by the propagation of "silos" of both data and expertise. Motivated by this fundamental challenge, we report upon the current state and evolution of biomedical informatics as it pertains to the conduct of high-throughput clinical and translational research and will present both a conceptual and practical framework for the design and execution of informatics-enabled studies. The objective of presenting such findings and constructs is to provide the clinical and translational research community with a common frame of reference for discussing and expanding upon such models and methodologies.
Affiliation(s)
- Philip R O Payne
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, USA.
|
48
|
Abstract
The National Cancer Institute Enterprise Vocabulary Services (NCI EVS) uses a wide range of quality assurance (QA) techniques to maintain and extend NCI Thesaurus (NCIt). NCIt is a reference terminology and biomedical ontology used in a growing number of NCI and other systems that extend from translational and basic research through clinical care to public information and administrative activities. Both automated and manual QA techniques are employed throughout the editing and publication cycle, which includes inserting and editing NCIt in NCI Metathesaurus. NCI EVS conducts its own additional periodic and ongoing content QA. External reviews, and extensive evaluation by and interaction with EVS partners and other users, have also played an important part in the QA process. There have always been tensions and compromises between meeting the needs of dependent systems and providing consistent and well-structured content; external QA and feedback have been important in identifying and addressing such issues. Currently, NCI EVS is exploring new approaches to broaden external participation in the terminology development and QA process.
|
49
|
Min H, Manion FJ, Goralczyk E, Wong YN, Ross E, Beck JR. Integration of prostate cancer clinical data using an ontology. J Biomed Inform 2009; 42:1035-45. [PMID: 19497389 DOI: 10.1016/j.jbi.2009.05.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2009] [Revised: 05/21/2009] [Accepted: 05/22/2009] [Indexed: 10/20/2022]
Abstract
It is increasingly important for investigators to efficiently and effectively access, interpret, and analyze the data from diverse biological, literature, and annotation sources in a unified way. The heterogeneity of biomedical data and the lack of metadata are the primary sources of the difficulty for integration, presenting major challenges to effective search and retrieval of the information. As a proof of concept, the Prostate Cancer Ontology (PCO) is created for the development of the Prostate Cancer Information System (PCIS). PCIS is applied to demonstrate how the ontology is utilized to solve the semantic heterogeneity problem from the integration of two prostate cancer related database systems at the Fox Chase Cancer Center. As the results of the integration process, the semantic query language SPARQL is applied to perform the integrated queries across the two database systems based on PCO.
Affiliation(s)
- Hua Min
- Fox Chase Cancer Center, Philadelphia, PA 19111, USA.
|
50
|
Cimino JJ, Hayamizu TF, Bodenreider O, Davis B, Stafford GA, Ringwald M. The caBIG terminology review process. J Biomed Inform 2009; 42:571-80. [PMID: 19154797 PMCID: PMC2729758 DOI: 10.1016/j.jbi.2008.12.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2008] [Revised: 10/23/2008] [Accepted: 12/14/2008] [Indexed: 10/21/2022]
Abstract
The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. In order to evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that serve to assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology for Adverse Events (known as CTCAE), and the laboratory portion of the Logical Objects, Identifiers, Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort.
Affiliation(s)
- James J Cimino
- National Institutes of Health, Laboratory for Informatics Development, Clinical Center, Room 6-2551, 10 Center Drive, Bethesda, MD 20892, USA.
|