1
Abstract
Data integration occurs when a query proceeds through multiple data sets, thereby relating diverse data extracted from different data sources. Data integration is particularly important to biomedical researchers since data obtained from experiments on human tissue specimens have little applied value unless they can be combined with medical data (i.e., pathologic and clinical information). In the past, research data were correlated with medical data by manually retrieving, reading, assembling and abstracting patient charts, pathology reports, radiology reports and the results of special tests and procedures. Manual annotation of research data is impractical when experiments involve hundreds or thousands of tissue specimens resulting in large, complex data collections. The purpose of this paper is to review how XML (eXtensible Markup Language) provides the fundamental tools that support biomedical data integration. The article also discusses some of the most important challenges that block the widespread availability of annotated biomedical data sets.
Affiliation(s)
- Jules J Berman
- Pathology Informatics, Cancer Diagnosis Program, National Cancer Institute, Rockville, MD 20892, USA.
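The abstract describes data integration as a query that relates research data to medical data drawn from separate sources. The sketch below shows such a join using Python's standard `xml.etree.ElementTree`. It assumes two hypothetical XML documents that share a `specimen` attribute: one holds research results, the other holds de-identified pathologic and clinical annotation. The element names, attribute name and sample values are invented for illustration and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical research results, one <result> per tissue specimen.
RESEARCH_XML = """
<experiment>
  <result specimen="S001"><marker>HER2</marker><score>3+</score></result>
  <result specimen="S002"><marker>HER2</marker><score>0</score></result>
  <result specimen="S003"><marker>HER2</marker><score>1+</score></result>
</experiment>
"""

# Hypothetical de-identified medical annotation keyed by the same identifier.
CLINICAL_XML = """
<cases>
  <case specimen="S001"><diagnosis>invasive ductal carcinoma</diagnosis><stage>II</stage></case>
  <case specimen="S002"><diagnosis>invasive lobular carcinoma</diagnosis><stage>I</stage></case>
</cases>
"""


def fields(element: ET.Element) -> dict:
    """Collect the child elements of one record as a tag -> text mapping."""
    return {child.tag: child.text for child in element}


def integrate(research_xml: str, clinical_xml: str) -> list:
    """Relate each research result to the clinical record for the same specimen."""
    clinical = {
        case.get("specimen"): fields(case)
        for case in ET.fromstring(clinical_xml.strip()).iter("case")
    }
    rows = []
    for result in ET.fromstring(research_xml.strip()).iter("result"):
        specimen = result.get("specimen")
        row = {"specimen": specimen, **fields(result)}
        # Results with no matching clinical record are kept without annotation.
        row.update(clinical.get(specimen, {}))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in integrate(RESEARCH_XML, CLINICAL_XML):
        print(row)
```

This is a left join: results without annotation remain visible, so gaps in the medical record are easy to spot. An inner join would silently drop them.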
2
Lyttleton O, Wright A, Treanor D, Lewis P. Using XML to encode TMA DES metadata. J Pathol Inform 2011; 2:40. [PMID: 21969921 PMCID: PMC3169921 DOI: 10.4103/2153-3539.84233] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/17/2010] [Accepted: 06/27/2011] [Indexed: 12/02/2022]
Abstract
Background: The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML-encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and use of programs that validate and parse TMA DES data. XML Schema has advantages over DTDs, such as support for data types and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. Materials and Methods: We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. Results: Using our XML Schema, we validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. Conclusions: All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.
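The abstract reports a converter from ISO 11179 definitions to RDF encoded in XML, but it does not give the mapping. The sketch below shows one possible mapping. It models each data element as an `rdf:Property` with three parts: an `rdfs:label`, an `rdfs:comment` that holds the definition, and an `rdfs:range` that names an XML Schema datatype. The base URI, the input dictionary keys, the example element and the mapping itself are hypothetical choices, not the authors' converter.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/tmades#"  # hypothetical namespace for converted elements

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def iso11179_to_rdf(elements: list) -> str:
    """Serialize simplified ISO 11179-style definitions as RDF/XML.

    Each input is a dict with hypothetical keys 'name', 'definition', 'datatype'.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for de in elements:
        desc = ET.SubElement(root, f"{{{RDF}}}Description",
                             {f"{{{RDF}}}about": BASE + de["name"]})
        # Model the data element as an RDF property whose values are typed.
        ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": RDF + "Property"})
        ET.SubElement(desc, f"{{{RDFS}}}label").text = de["name"]
        ET.SubElement(desc, f"{{{RDFS}}}comment").text = de["definition"]
        ET.SubElement(desc, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": XSD + de["datatype"]})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(iso11179_to_rdf([
        {"name": "core_diameter",  # hypothetical element, not from TMA DES
         "definition": "Diameter of a tissue core in millimetres.",
         "datatype": "decimal"},
    ]))
```

Because the output is ordinary RDF/XML, it can be checked with any RDF validator.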
3
Haroske G, Kramm T, Mörz M, Oberholzer M. [Oncological data elements in histopathology]. Pathologe 2010; 31:385-92. [PMID: 20544201 DOI: 10.1007/s00292-010-1289-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
In order to cope with increasing demands to supply information to a variety of documentation systems outside pathology, pathologists need to set standards for both the content and the use of the information they generate. Oncological datasets based on a defined vocabulary are urgently required for use both in pathology and in further processing. Data elements were defined according to German pathology report guidelines for colorectal cancers, in line with ISO 11179 requirements for the relations between data element concepts and value domains and for further formal conditions; they can be exported in XML together with their metadata. Tests on 100 conventionally written diagnoses showed that the data elements were usable in principle, and that the diagnoses conformed more closely to the guidelines as training time increased. This set of oncological data elements is a valuable checklist tool for pathologists, enabling formatted information export for further use and saving documentation effort.
Affiliation(s)
- G Haroske
- Institut für Pathologie des Krankenhauses Dresden-Friedrichstadt, Friedrichstr. 41, 01067, Dresden, Germany.
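In ISO 11179, a data element pairs a data element concept (what is being described) with a value domain (which values are allowed). The Python sketch below illustrates that split and adds a validating XML export. The class names, the XML element and attribute names, and the pT value list are illustrative assumptions. They are not the authors' data element definitions.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class ValueDomain:
    """An enumerated value domain: the set of permissible values."""
    name: str
    permissible_values: tuple


@dataclass(frozen=True)
class DataElement:
    """A data element pairs a data element concept with a value domain."""
    concept: str
    domain: ValueDomain

    def to_xml(self, value: str) -> ET.Element:
        """Export one reported value as XML, rejecting values outside the domain."""
        if value not in self.domain.permissible_values:
            raise ValueError(f"{value!r} is not a permissible value for {self.concept!r}")
        element = ET.Element("dataElement", concept=self.concept, valueDomain=self.domain.name)
        element.text = value
        return element


# Illustrative element for a colorectal cancer report (hypothetical value list).
PT_CATEGORY = DataElement(
    concept="colorectal carcinoma local tumour extent",
    domain=ValueDomain("pT category", ("pTis", "pT1", "pT2", "pT3", "pT4a", "pT4b")),
)

if __name__ == "__main__":
    print(ET.tostring(PT_CATEGORY.to_xml("pT3"), encoding="unicode"))
```

Invalid values fail at export time, so the same definitions can also serve as a checklist during reporting.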
4
Abstract
Managing patient test data and documenting regulatory compliance for tests performed at the point of care have traditionally been significant problems. In many situations, manual record-keeping has proven entirely inadequate for maintaining the integrity of the patient medical record or for providing an audit trail for quality assurance activities. Starting in the 1990s, a number of companies began to develop and market point-of-care data management systems. Over time, these data management systems have become increasingly sophisticated. It is now possible to interface multiple point-of-care devices from different manufacturers to a central data manager that is bidirectionally interfaced to the laboratory and hospital information systems. Despite these advances, many challenges remain. True real-time point-of-care "connectivity" across an entire institution has yet to be achieved, and there is still no satisfactory solution for manually performed visually read tests, some of which are commonly performed at the point of care. In the future, wireless point-of-care connectivity solutions hold great promise, but these technologies are yet to be fully developed.
Affiliation(s)
- Ji Yeon Kim
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Pathology Service, WRN 219, Massachusetts General Hospital, Boston, MA 02114, USA
5
Sintchenko V, Gallego B. Laboratory-guided detection of disease outbreaks: three generations of surveillance systems. Arch Pathol Lab Med 2009; 133:916-25. [DOI: 10.5858/133.6.916] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Accepted: 01/06/2009] [Indexed: 11/06/2022]
Abstract
Context.—Traditional biothreat surveillance systems are vulnerable to incomplete and delayed reporting of public health threats.
Objective.—To review current and emerging approaches to detection and monitoring of biothreats enabled by laboratory methods of diagnosis and to identify trends in the biosurveillance research.
Data Sources.—PubMed (1995 to December 2007) was searched with the combined search terms “surveillance” and “infectious diseases.” Additional articles were identified by hand searching the bibliographies of selected papers. Additional search terms were “public health,” “disease monitoring,” “cluster,” “outbreak,” “laboratory notification,” “molecular,” “detection,” “evaluation,” “genomics,” “communicable diseases,” “geographic information systems,” “bioterrorism,” “genotyping,” and “informatics.” Publication language was restricted to English. The bibliographies of key references were later hand searched to identify articles missing in the database search. Three approaches to infectious disease surveillance that involve clinical laboratories are contrasted: (1) laboratory-initiated infectious disease notifications, (2) syndromic surveillance based on health indicators, and (3) genotyping based surveillance of biothreats. Advances in molecular diagnostics enable rapid genotyping of biothreats and investigations of genes that were not previously identifiable by traditional methods. There is a need for coordination between syndromic and laboratory-based surveillance. Insufficient and delayed decision support and inadequate integration of surveillance signals into action plans remain the 2 main barriers to efficient public health monitoring and response. Decision support for public health users of biosurveillance alerts is often lacking.
Conclusions.—The merger of the 3 scientific fields of surveillance, genomics, and informatics offers an opportunity for the development of effective and rapid biosurveillance methods and tools.
Affiliation(s)
- Vitali Sintchenko
- Blanca Gallego
- From the Centre for Infectious Diseases and Microbiology, Western Clinical School, The University of Sydney, Westmead Hospital (Dr Sintchenko), and the Centre for Health Informatics, University of New South Wales (Drs Sintchenko and Gallego), Sydney, Australia
6
Abstract
The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization also varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share data and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.
Affiliation(s)
- Nadine Schuurman
- Department of Geography, Simon Fraser University RCB 7123, 8888 University Drive, Burnaby, British Columbia, Canada.
7
Abstract
Laboratory informatics is the application of computers and information systems to information management in the pathology laboratory. Effective information management is crucial to the success of pathologists and laboratorians. Informatics has become one of the key pillars of pathology, and the requirement for skilled informaticists in the laboratory has quickly grown. This article provides a wide-ranging review of pertinent aspects of laboratory informatics, and deals with important technical and management processes. Topics covered include personal computing, networks, databases, fundamentals and advanced functions of the laboratory information system, interfaces and standards, digital imaging, coding, hospital information systems and electronic medical records.
Affiliation(s)
- Liron Pantanowitz
- Department of Pathology, Baystate Medical Center, Tufts University School of Medicine, 759 Chestnut Street, Springfield, MA 01199, USA.
8
Abstract
The usefulness of rapid pathogen genotyping is widely recognized, but its effective interpretation and application require integration into clinical and public health decision-making. How can pathogen genotyping data best be translated to inform disease management and surveillance? Pathogen profiling integrates microbial genomics data into communicable disease control by consolidating phenotypic identity-based methods with DNA microarrays, proteomics, metabolomics and sequence-based typing. Sharing data on pathogen profiles should facilitate our understanding of transmission patterns and the dynamics of epidemics.
Affiliation(s)
- Vitali Sintchenko
- Centre for Infectious Diseases and Microbiology Public Health, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Western Clinical School, The University of Sydney, New South Wales, Australia.
9
Drake TA, Braun J, Marchevsky A, Kohane IS, Fletcher C, Chueh H, Beckwith B, Berkowicz D, Kuo F, Zeng QT, Balis U, Holzbach A, McMurry A, Gee CE, McDonald CJ, Schadow G, Davis M, Hattab EM, Blevins L, Hook J, Becich M, Crowley RS, Taube SE, Berman J. A system for sharing routine surgical pathology specimens across institutions: the Shared Pathology Informatics Network. Hum Pathol 2007; 38:1212-25. [PMID: 17490722 DOI: 10.1016/j.humpath.2007.01.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 06/26/2006] [Revised: 01/06/2007] [Accepted: 01/11/2007] [Indexed: 10/23/2022]
Abstract
This report presents an overview for pathologists of the development and potential applications of a novel Web-enabled system allowing indexing and retrieval of pathology specimens across multiple institutions. The system was developed through the National Cancer Institute's Shared Pathology Informatics Network program with the goal of creating a prototype system to find existing pathology specimens derived from routine surgical and autopsy procedures ("paraffin blocks") that may be relevant to cancer research. To reach this goal, a number of challenges needed to be met. A central aspect was the development of an informatics system that supported Web-based searching while retaining local control of data. Additional aspects included the development of an eXtensible Markup Language schema, representation of tissue specimen annotation, methods for deidentifying pathology reports, tools for autocoding critical data from these reports using the Unified Medical Language System, and hierarchies of confidentiality and consent that met or exceeded federal requirements. The prototype system supported Web-based querying of millions of pathology reports from 6 participating institutions across the country in a matter of seconds to minutes, and it enabled bona fide researchers to identify and potentially request specific paraffin blocks from the participating institutions. With the addition of associated clinical and outcome information, this system could vastly expand the pool of annotated tissues available for research on cancer as well as other diseases.
Affiliation(s)
- Thomas A Drake
- Department of Pathology and Laboratory Medicine, UCLA Medical Center, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
10
Burgoon LD. Clearing the standards landscape: the semantics of terminology and their impact on toxicogenomics. Toxicol Sci 2007; 99:403-12. [PMID: 17483496 DOI: 10.1093/toxsci/kfm108] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
The emergence of the microarray data standards, especially the Minimum Information About a Microarray Experiment (MIAME), has spurred several organizations to develop their own standards for a myriad of technologies, including proteomics and metabolomics. These efforts have facilitated the creation of several large-scale gene expression repositories, including the toxicology-focused Chemical Effects in Biological Systems Knowledgebase at the National Institute of Environmental Health Sciences. Recently, efforts have moved toward developing toxicogenomic data standards (e.g., MIAME-Tox), and the U.S. Food and Drug Administration and the U.S. Environmental Protection Agency either have developed or are developing regulatory guidance with respect to pharmaco- and toxicogenomics. However, for the toxicology community to be engaged in the process of standards development and approval, there needs to be a more thorough understanding of the terms associated with electronic data sharing and communication, especially with respect to defining the terms "standard," "controlled vocabulary," "object model," "markup language," and "ontology." This review will discuss these terms, especially as they pertain to toxicogenomics, how they relate to one another, and what current efforts exist that may impact toxicology.
Affiliation(s)
- Lyle D Burgoon
- Toxicogenomic Informatics and Solutions, LLC, P.O. Box 27482, Lansing, Michigan 48909, USA.
11
McMurry AJ, Gilbert CA, Reis BY, Chueh HC, Kohane IS, Mandl KD. A self-scaling, distributed information architecture for public health, research, and clinical care. J Am Med Inform Assoc 2007; 14:527-33. [PMID: 17460129 PMCID: PMC2244902 DOI: 10.1197/jamia.m2371] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
OBJECTIVE This study sought to define a scalable architecture to support the National Health Information Network (NHIN). This architecture must concurrently support a wide range of public health, research, and clinical care activities. STUDY DESIGN The architecture fulfils five desiderata: (1) adopt a distributed approach to data storage to protect privacy, (2) enable strong institutional autonomy to engender participation, (3) provide oversight and transparency to ensure patient trust, (4) allow variable levels of access according to investigator needs and institutional policies, (5) define a self-scaling architecture that encourages voluntary regional collaborations that coalesce to form a nationwide network. RESULTS Our model has been validated by a large-scale, multi-institution study involving seven medical centers for cancer research. It is the basis of one of four open architectures developed under funding from the Office of the National Coordinator for Health Information Technology, fulfilling the biosurveillance use case defined by the American Health Information Community. The model supports broad applicability for regional and national clinical information exchanges. CONCLUSIONS This model shows the feasibility of an architecture wherein the requirements of care providers, investigators, and public health authorities are served by a distributed model that grants autonomy, protects privacy, and promotes participation.
Affiliation(s)
- Andrew J McMurry
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, 300 Longwood Ave., Enders Room 150, Boston, MA 02115, USA.
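Desiderata (1), (2) and (4) of the abstract are distributed storage, institutional autonomy and variable levels of access. The toy Python sketch below illustrates that pattern. Each hypothetical `Site` keeps its records locally, answers only aggregate count queries, and applies its own reporting policy. The class, the small-count suppression threshold and the sample records are assumptions made for illustration; the abstract does not describe the model's query protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class Site:
    """One institution: records stay local; only counts leave the site."""
    name: str
    records: List[Record] = field(default_factory=list)
    min_reportable: int = 5  # hypothetical local policy: suppress small counts

    def count(self, predicate: Callable[[Record], bool]) -> Optional[int]:
        n = sum(1 for record in self.records if predicate(record))
        # Each site decides what it discloses; small counts are withheld.
        return n if n >= self.min_reportable else None


def federated_count(sites: List[Site], predicate: Callable[[Record], bool]) -> Dict[str, Optional[int]]:
    """Broadcast one query and collect each site's (possibly suppressed) answer."""
    return {site.name: site.count(predicate) for site in sites}


def reportable_total(answers: Dict[str, Optional[int]]) -> int:
    """Sum only the counts that sites agreed to release."""
    return sum(n for n in answers.values() if n is not None)


if __name__ == "__main__":
    site_a = Site("A", [{"dx": "prostate cancer"}] * 12 + [{"dx": "BPH"}] * 4)
    site_b = Site("B", [{"dx": "prostate cancer"}] * 3, min_reportable=3)
    site_c = Site("C", [{"dx": "prostate cancer"}] * 2)
    answers = federated_count([site_a, site_b, site_c], lambda r: r["dx"] == "prostate cancer")
    print(answers, reportable_total(answers))
```

The per-site `min_reportable` value stands in for institution-specific access policies. Letting each site set its own threshold reflects the autonomy goal rather than a rule imposed centrally.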
12
Paananen J, Wong G. Integration of genomic data for pharmacology and toxicology using Internet resources. SAR QSAR Environ Res 2006; 17:25-36. [PMID: 16513550 DOI: 10.1080/10659360600562053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/06/2023]
Abstract
Genome-based technologies such as sequencing and gene expression profiling using microarrays are creating massive amounts of data. Results from these studies have provided unique insights into targets, biochemical pathways, and biological systems affected by drug or xenobiotic chemical treatments. Moreover, these genomic technologies offer the potential to identify biomarkers for pharmacological development or toxicological prediction. Nonetheless, microarray studies involving a single compound produce useful although limited data. To gain further power from these individual studies, the ability to combine datasets through integration schemes has become imperative. In the current study, we describe and analyze currently available Internet resources designed to address this problem. Many functionalities, such as the ability to cross-reference orthologous genes across species or to combine data from the same technology platform, are present in these resources. Nonetheless, these resources are limited in the number of technology platforms they can support. While the ability to integrate all currently existing gene expression datasets remains elusive, the current tools provide a partial solution that may still yield unique insights into the effects of exogenous molecules at the level of gene expression.
Affiliation(s)
- J Paananen
- Department of Computer Science, University of Kuopio, Finland
13
Patel AA, Kajdacsy-Balla A, Berman JJ, Bosland M, Datta MW, Dhir R, Gilbertson J, Melamed J, Orenstein J, Tai KF, Becich MJ. The development of common data elements for a multi-institute prostate cancer tissue bank: the Cooperative Prostate Cancer Tissue Resource (CPCTR) experience. BMC Cancer 2005; 5:108. [PMID: 16111498 PMCID: PMC1236914 DOI: 10.1186/1471-2407-5-108] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Received: 12/24/2004] [Accepted: 08/21/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Cooperative Prostate Cancer Tissue Resource (CPCTR) is a consortium of four geographically dispersed institutions that are funded by the U.S. National Cancer Institute (NCI) to provide clinically annotated prostate cancer tissue samples to researchers. To facilitate this effort, it was critical to arrive at agreed-upon common data elements (CDEs) that could be used to collect demographic, pathologic, treatment and clinical outcome data. METHODS The CPCTR investigators convened a CDE curation subcommittee to develop and implement CDEs for the annotation of collected prostate tissues. The draft CDEs were refined and progressively annotated to make them ISO 11179 compliant. The CDEs were implemented in the CPCTR database and tested using software query tools developed by the investigators. RESULTS By collaborative consensus the CPCTR CDE subcommittee developed 145 data elements to annotate the tissue samples collected. These included for each case: 1) demographic data, 2) clinical history, 3) pathology specimen level elements to describe the staging, grading and other characteristics of individual surgical pathology cases, 4) tissue block level annotation critical to managing a virtual inventory of cases and facilitating case selection, and 5) clinical outcome data including treatment, recurrence and vital status. These elements have been used successfully to respond to over 60 requests by end-users for tissue, including paraffin blocks from cases with 5 to 10 years of follow-up, tissue microarrays (TMAs), as well as frozen tissue collected prospectively for genomic profiling and genetic studies. The CPCTR CDEs have been fully implemented in two major tissue banks and have been shared with dozens of other tissue banking efforts. CONCLUSION The freely available CDEs developed by the CPCTR are robust, based on "best practices" for tissue resources, and are ISO 11179 compliant. The process for CDE development described in this manuscript provides a framework model for other organ sites and has been used as a model for breast and melanoma tissue banking efforts.
Affiliation(s)
- Ashokkumar A Patel
- Department of Pathology, Center for Pathology Informatics, Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, PA, USA
- Jules J Berman
- Cancer Diagnosis Program, National Cancer Institute, Bethesda, MD, USA
- Maarten Bosland
- Departments of Environmental Medicine and Urology, New York University, New York, NY, USA
- Milton W Datta
- Departments of Pathology and Urology, Emory University, Atlanta, GA, USA
- Rajiv Dhir
- Department of Pathology, Center for Pathology Informatics, Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, PA, USA
- John Gilbertson
- Department of Pathology, Center for Pathology Informatics, Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, PA, USA
- Jan Orenstein
- Department of Pathology, George Washington University, Washington, DC, USA
- Kuei-Fang Tai
- Bioinformatics Program, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Michael J Becich
- Department of Pathology, Center for Pathology Informatics, Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, PA, USA