1
Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature. BMC Bioinformatics 2016; 17:250. [PMID: 27333889 PMCID: PMC4918084 DOI: 10.1186/s12859-016-1080-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/30/2015] [Accepted: 05/11/2016] [Indexed: 01/12/2023] Open
Abstract
Background Identification of associations between marketed drugs and adverse events from the biomedical literature assists drug safety monitoring efforts. Assessing the significance of such literature-derived associations and determining the granularity at which they should be captured remains a challenge. Here, we assess how defining a selection of adverse event terms from MeSH, based on information content, can improve the detection of adverse events for drugs and drug classes. Results We analyze a set of 105,354 candidate drug-adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms’ information content. Then, we determine statistical enrichment of adverse events associated with drugs and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection with our generalized enrichment analysis (GEA) approach using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (Precision: .92/Recall: .71/F1-measure: .80) outperforms the best PRR-based method (.69/.69/.69) on all four adverse event outcomes in our gold standard. For drug classes, our GEA performs similarly (.85/.69/.74) when increasing the level of abstraction for adverse event terms. Finally, on examining the 1609 individual drugs in our MEDLINE set, which map to chemical substances in ATC, we find signals for 1379 drugs (10,122 unique adverse event associations) on applying GEA with p < 0.005.
Conclusions We present an approach based on generalized enrichment analysis that can be used to detect associations between drugs, drug classes and adverse events at a given level of granularity, at the same time correcting for known dependencies among events. Our study demonstrates the use of GEA, and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploration of alternative abstraction levels of adverse event terms based on information content. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1080-z) contains supplementary material, which is available to authorized users.
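The quantities named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R package: it uses a plain (unconditional) hypergeometric tail test rather than the conditional test described above, and the function names and 2x2 table layout are assumptions made for illustration.

```python
from math import comb


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 co-occurrence table.

    a: articles with the drug and the event
    b: articles with the drug and other events
    c: articles with other drugs and the event
    d: articles with other drugs and other events
    """
    return (a / (a + b)) / (c / (c + d))


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail probability P(X >= k) of the hypergeometric distribution.

    N: all articles, K: articles annotated with the event,
    n: articles mentioning the drug, k: articles with both.
    Plain test only; it does not adjust for dependencies among terms.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

With the single-drug figures reported above, `f1_measure(0.92, 0.71)` evaluates to roughly 0.80, which agrees with the stated F1-measure.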
2
Statin Intensity or Achieved LDL? Practice-based Evidence for the Evaluation of New Cholesterol Treatment Guidelines. PLoS One 2016; 11:e0154952. [PMID: 27227451 PMCID: PMC4881915 DOI: 10.1371/journal.pone.0154952] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/24/2016] [Accepted: 04/21/2016] [Indexed: 01/14/2023] Open
Abstract
Background The recently updated American College of Cardiology/American Heart Association cholesterol treatment guidelines outline a paradigm shift in the approach to cardiovascular risk reduction. One major change included a recommendation that practitioners prescribe fixed dose statin regimens rather than focus on specific LDL targets. The goal of this study was to determine whether achieved LDL or statin intensity was more strongly associated with major adverse cardiac events (MACE) using practice-based data from electronic health records (EHR). Methods We analyzed the EHR data of more than 40,000 adult patients on statin therapy between 1995 and 2013. Demographic and clinical variables were extracted from coded data and unstructured clinical text. To account for treatment selection bias we performed propensity score stratification as well as 1:1 propensity score matched analyses. Conditional Cox proportional hazards modeling was used to identify variables associated with MACE. Results We identified 7,373 adults with complete data whose cholesterol appeared to be actively managed. In a stratified propensity score analysis of the entire cohort over 3.3 years of follow-up, achieved LDL was a significant predictor of MACE outcome (Hazard Ratio 1.1; 95% confidence interval, 1.05–1.2; P < 0.0004), while statin intensity was not. In a 1:1 propensity score matched analysis performed to more aggressively control for covariate balance between treatment groups, achieved LDL remained significantly associated with MACE (HR 1.3; 95% CI, 1.03–1.7; P = 0.03) while treatment intensity again was not a significant predictor. Conclusions Using EHR data we found that on-treatment achieved LDL level was a significant predictor of MACE. Statin intensity alone was not associated with outcomes. These findings imply that despite recent guidelines, achieved LDL levels are clinically important and LDL titration strategies warrant further investigation in clinical trials.
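The abstract does not state how the 1:1 matches were chosen. As a rough sketch of one common option, greedy nearest-neighbor matching without replacement on precomputed propensity scores could look like the following; the function name, the caliper value and the patient identifiers are all hypothetical.

```python
def match_one_to_one(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    caliper: largest allowed score difference for a pair (assumed value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs
```

Treated patients with no control inside the caliper stay unmatched, which is one reason a matched cohort is smaller than the full cohort.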
3
Poole S, Schroeder LF, Shah N. An unsupervised learning method to identify reference intervals from a clinical database. J Biomed Inform 2015; 59:276-84. [PMID: 26707631 DOI: 10.1016/j.jbi.2015.12.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/18/2015] [Revised: 12/08/2015] [Accepted: 12/13/2015] [Indexed: 12/15/2022]
Abstract
Reference intervals are critical for the interpretation of laboratory results. The development of reference intervals using traditional methods is time-consuming and costly. An alternative approach, known as an a posteriori method, requires an expert to enumerate diagnoses and procedures that can affect the measurement of interest. We develop a method, LIMIT, to use laboratory test results from a clinical database to identify ICD9 codes that are associated with extreme laboratory results, thus automating the a posteriori method. LIMIT was developed using sodium serum levels, and validated using potassium serum levels, both tests for which harmonized reference intervals already exist. To test LIMIT, reference intervals for total hemoglobin in whole blood were learned, and were compared with the hemoglobin reference intervals found using an existing a posteriori approach. In addition, prescriptions of iron supplements were used to identify individuals whose hemoglobin levels were low enough for a clinician to choose to take action. This prescription data indicating clinical action was then used to estimate the validity of the hemoglobin reference interval sets. Results show that LIMIT produces usable reference intervals for sodium, potassium and hemoglobin laboratory tests. The hemoglobin intervals produced using the data-driven approaches consistently had higher positive predictive value and specificity in predicting an iron supplement prescription than the existing intervals. LIMIT represents a fast and inexpensive solution for calculating reference intervals, and shows that it is possible to use laboratory results and coded diagnoses to learn laboratory test reference intervals from clinical data warehouses.
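To make the a posteriori idea concrete, here is a small Python sketch that drops results from patients carrying excluded diagnosis codes and reports the central 95% of what remains. It is not LIMIT itself: the exclusion codes are supplied by hand rather than learned, and the function name and record layout are assumptions.

```python
from statistics import quantiles


def reference_interval(results, excluded_codes):
    """Central 95% interval of lab values after excluding coded patients.

    results: list of (value, codes) tuples, where codes is a set of
             diagnosis codes recorded for the patient.
    excluded_codes: set of codes assumed to distort the measurement.
    Returns (2.5th percentile, 97.5th percentile).
    """
    kept = [value for value, codes in results if not (codes & excluded_codes)]
    # n=40 yields 39 cut points spaced 2.5 percentage points apart.
    cuts = quantiles(kept, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

In this sketch the expert's task of listing the codes is the input; the abstract describes automating exactly that step.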
Affiliation(s)
- Sarah Poole
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States.
- Lee Frederick Schroeder
- Department of Pathology, University of Michigan School of Medicine, Ann Arbor, MI, United States
- Nigam Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
4
Peng J, Wang T, Wang J, Wang Y, Chen J. Extending gene ontology with gene association networks. Bioinformatics 2015; 32:1185-94. [PMID: 26644414 DOI: 10.1093/bioinformatics/btv712] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 06/17/2015] [Accepted: 11/26/2015] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Gene ontology (GO) is a widely used resource to describe the attributes of gene products. However, automatic GO maintenance remains difficult because of the complex logical reasoning involved and the need for biological knowledge that is not explicitly represented in the GO. Existing studies either construct the whole GO based on network data or only infer the relations between existing GO terms. None is designed to add new terms automatically to the existing GO. RESULTS We proposed a new algorithm 'GOExtender' to efficiently identify all the connected gene pairs labeled by the same parent GO terms. GOExtender is used to predict new GO terms with biological network data, and connect them to the existing GO. Evaluation tests on biological process and cellular component categories of different GO releases showed that GOExtender can extend new GO terms automatically based on the biological network. Furthermore, we applied GOExtender to the recent release of GO and discovered new GO terms with strong support from literature. AVAILABILITY AND IMPLEMENTATION Software and supplementary document are available at www.msu.edu/%7Ejinchen/GOExtender CONTACT jinchen@msu.edu or ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China, Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jixuan Wang
- School of Software, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
5
Cherry JM. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component. Cold Spring Harb Protoc 2015; 2015:pdb.prot088914. [PMID: 26631125 PMCID: PMC5673600 DOI: 10.1101/pdb.prot088914] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
Abstract
An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology.
Affiliation(s)
- J. Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5120
6
Younesi E, Malhotra A, Gündel M, Scordis P, Kodamullil AT, Page M, Müller B, Springstubbe S, Wüllner U, Scheller D, Hofmann-Apitius M. PDON: Parkinson's disease ontology for representation and modeling of the Parkinson's disease knowledge domain. Theor Biol Med Model 2015; 12:20. [PMID: 26395080 PMCID: PMC4580356 DOI: 10.1186/s12976-015-0017-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 05/12/2015] [Accepted: 09/14/2015] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Despite the unprecedented and increasing amount of data, relatively little progress has been made in molecular characterization of mechanisms underlying Parkinson's disease. In the area of Parkinson's research, there is a pressing need to integrate various pieces of information into a meaningful context of presumed disease mechanism(s). Disease ontologies provide a novel means for organizing, integrating, and standardizing the knowledge domains specific to disease in a compact, formalized and computer-readable form and serve as a reference for knowledge exchange or systems modeling of disease mechanism. METHODS The Parkinson's disease ontology was built according to the life cycle of ontology building. Structural, functional, and expert evaluation of the ontology was performed to ensure the quality and usability of the ontology. A novelty metric has been introduced to measure the gain of new knowledge using the ontology. Finally, a cause-and-effect model was built around PINK1 and two gene expression studies from the Gene Expression Omnibus database were re-annotated to demonstrate the usability of the ontology. RESULTS The Parkinson's disease ontology with a subclass-based taxonomic hierarchy covers the broad spectrum of major biomedical concepts from molecular to clinical features of the disease, and also reflects different views on disease features held by molecular biologists, clinicians and drug developers. The current version of the ontology contains 632 concepts, which are organized under nine views. The structural evaluation showed the balanced dispersion of concept classes throughout the ontology. The functional evaluation demonstrated that the ontology-driven literature search could gain novel knowledge not present in the reference Parkinson's knowledge map. The ontology was able to answer specific questions related to Parkinson's when evaluated by experts. 
Finally, the added value of the Parkinson's disease ontology is demonstrated by ontology-driven modeling of PINK1 and re-annotation of gene expression datasets relevant to Parkinson's disease. CONCLUSIONS Parkinson's disease ontology delivers the knowledge domain of Parkinson's disease in a compact, computer-readable form, which can be further edited and enriched by the scientific community and can also be used to construct, represent and automatically extend Parkinson's-related computable models. A practical version of the Parkinson's disease ontology for browsing and editing can be publicly accessed at http://bioportal.bioontology.org/ontologies/PDON .
Affiliation(s)
- Erfan Younesi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Ashutosh Malhotra
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, 53113, Bonn, Germany.
- Michaela Gündel
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, 53113, Bonn, Germany.
- Phil Scordis
- Informatics group, UCB Pharma, 208 Bath Road, Slough, UK.
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, 53113, Bonn, Germany.
- Matt Page
- Informatics group, UCB Pharma, 208 Bath Road, Slough, UK.
- Bernd Müller
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Stephan Springstubbe
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Ullrich Wüllner
- Department of Neurology, University of Bonn, 53105, Bonn, Germany.
- Dieter Scheller
- Pharmacology Parkinson's Disease and Movement Disorders, UCB Pharma S.A., Chemin du Foriest, B-1420, Braine-l'Alleud, Belgium.
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany.
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, 53113, Bonn, Germany.
7
Regan K, Payne PRO. From Molecules to Patients: The Clinical Applications of Translational Bioinformatics. Yearb Med Inform 2015; 10:164-9. [PMID: 26293863 PMCID: PMC4587059 DOI: 10.15265/iy-2015-005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE In order to realize the promise of personalized medicine, Translational Bioinformatics (TBI) research will need to continue to address implementation issues across the clinical spectrum. In this review, we aim to evaluate the expanding field of TBI towards clinical applications, and define common themes and current gaps in order to motivate future research. METHODS Here we present the state-of-the-art of clinical implementation of TBI-based tools and resources. Our thematic analyses of a targeted literature search of recent TBI-related articles ranged across topics in genomics, data management, hypothesis generation, molecular epidemiology, diagnostics, therapeutics and personalized medicine. RESULTS Open areas of clinically-relevant TBI research identified in this review include developing data standards and best practices, publicly available resources, integrative systems-level approaches, user-friendly tools for clinical support, cloud computing solutions, emerging technologies and means to address pressing legal, ethical and social issues. CONCLUSIONS There is a need for further research bridging the gap from foundational TBI-based theories and methodologies to clinical implementation. We have organized the topic themes presented in this review into four conceptual foci (domain analyses, knowledge engineering, computational architectures and computation methods) alongside three stages of knowledge development in order to orient future TBI efforts to accelerate the goals of personalized medicine.
Affiliation(s)
- P R O Payne
- Philip R.O. Payne, PhD, FACMI, The Ohio State University, Department of Biomedical Informatics, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA, Tel: +1 614 292 4778, E-mail:
8
Liang X, Li H, Li S. A novel network pharmacology approach to analyse traditional herbal formulae: the Liu-Wei-Di-Huang pill as a case study. Mol Biosyst 2014; 10:1014-22. [PMID: 24492828 DOI: 10.1039/c3mb70507b] [Citation(s) in RCA: 184] [Impact Index Per Article: 18.4] [Indexed: 01/14/2023]
Abstract
Understanding the mechanisms of the pharmacological effects of herbal formulae from traditional Chinese medicine (TCM) is important for their appropriate application. However, this understanding has been impeded by the complex nature of herbal formulae. A herbal formula is a mixture of hundreds of chemical ingredients with multiple potential targets. The effects produced by an entire herbal formula cannot be adequately explained by considering separately each ingredient in it. This is a recognised problem that remains in need of methods to solve it. Here we introduce a holistic analysis method to decipher the molecular mechanisms of herbal formulae. This method combines chemical and therapeutic properties with network pharmacology, using a novel approach to evaluate the importance of the targets and ingredients of herbal formulae. We used the Liu-Wei-Di-Huang (LWDH) pill, a classic herbal formula, as an example to illustrate our method and validated some results by a following experiment. We revealed the core molecular targets and bioprocess network of the pharmacological effects of LWDH and inferred its therapeutic indications. This method provides a novel strategy to understand the mechanisms of herbal formulae in a holistic way and implies new applications of classic herbal formulae.
Affiliation(s)
- Xujun Liang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China.
9
Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, Mungall CJ, Binder JX, Malone J, Vasant D, Parkinson H, Schriml LM. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res 2014; 43:D1071-8. [PMID: 25348409 PMCID: PMC4383880 DOI: 10.1093/nar/gku1011] [Citation(s) in RCA: 371] [Impact Index Per Article: 37.1] [Indexed: 01/02/2023] Open
Abstract
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in web ontology language. This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative among others, and enhance DO's current asserted view and multiple inferred views through reasoning.
Affiliation(s)
- Warren A Kibbe
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Cesar Arze
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Victor Felix
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Elvira Mitraka
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Evan Bolton
- PubChem, National Center for Biotechnology Information, National Library of Medicine National Institutes of Health Department of Health and Human Services 8600 Rockville Pike, Bethesda, MD 20894, USA
- Gang Fu
- PubChem, National Center for Biotechnology Information, National Library of Medicine National Institutes of Health Department of Health and Human Services 8600 Rockville Pike, Bethesda, MD 20894, USA
- Janos X Binder
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, 69117, Germany Bioinformatics Core Facility, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, 4362, Luxembourg
- James Malone
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Drashtti Vasant
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lynn M Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201, USA
10
Gan Z, Wang J, Salomonis N, Stowe JC, Haddad GG, McCulloch AD, Altintas I, Zambon AC. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data. BMC Bioinformatics 2014; 15:69. [PMID: 24621103 PMCID: PMC3975178 DOI: 10.1186/1471-2105-15-69] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 10/09/2013] [Accepted: 02/27/2014] [Indexed: 12/16/2022] Open
Abstract
Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time-consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in about one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons.
Affiliation(s)
- Alexander C Zambon
- Department of Pharmacology, University of California, San Diego, La Jolla, CA, USA.
11
Mechanistic phenotypes: an aggregative phenotyping strategy to identify disease mechanisms using GWAS data. PLoS One 2013; 8:e81503. [PMID: 24349080 PMCID: PMC3861317 DOI: 10.1371/journal.pone.0081503] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/19/2013] [Accepted: 10/23/2013] [Indexed: 11/19/2022] Open
Abstract
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF<0.1) non-synonymous SNPs (nsSNPs) associated with “mechanistic phenotypes”, comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2×10⁻⁵, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4×10⁻⁶, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
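The abstract reports both raw enrichment p-values and FDR-adjusted ones but does not name the adjustment procedure. Assuming the widely used Benjamini-Hochberg step-up method, a minimal Python sketch of the adjustment is shown below; the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Each adjusted value is min over ranks j >= i of p_(j) * m / j,
    where p_(j) is the j-th smallest of the m raw p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the minimum carries to lower ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Because the adjustment scales with the number of ontology terms tested, a raw p-value as small as 0.0001 can still correspond to an adjusted value near 0.03 when several hundred terms are examined.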
12
Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013; 425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]
Abstract
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, only accessible by manual curation or sophisticated text-mining technology to extract the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Emily Doughty
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Maricel G Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
13
Brinkley JF, Borromeo C, Clarkson M, Cox TC, Cunningham MJ, Detwiler LT, Heike CL, Hochheiser H, Mejino JLV, Travillian RS, Shapiro LG. The ontology of craniofacial development and malformation for translational craniofacial research. Am J Med Genet C Semin Med Genet 2013; 163C:232-45. [PMID: 24124010 DOI: 10.1002/ajmg.c.31377] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 01/04/2023]
Abstract
We introduce the Ontology of Craniofacial Development and Malformation (OCDM) as a mechanism for representing knowledge about craniofacial development and malformation, and for using that knowledge to facilitate integrating craniofacial data obtained via multiple techniques from multiple labs and at multiple levels of granularity. The OCDM is a project of the NIDCR-sponsored FaceBase Consortium, whose goal is to promote and enable research into the genetic and epigenetic causes of specific craniofacial abnormalities through the provision of publicly accessible, integrated craniofacial data. However, the OCDM should be usable for integrating any web-accessible craniofacial data, not just those data available through FaceBase. The OCDM is based on the Foundational Model of Anatomy (FMA), our comprehensive ontology of canonical human adult anatomy, and includes modules to represent adult and developmental craniofacial anatomy in both human and mouse, mappings between homologous structures in human and mouse, and associated malformations. We describe these modules, as well as prototype uses of the OCDM for integrating craniofacial data. By using the terms from the OCDM to annotate data, and by combining queries over the ontology with those over annotated data, it becomes possible to create "intelligent" queries that can, for example, find gene expression data obtained from mouse structures that are precursors to homologous human structures involved in malformations such as cleft lip. We suggest that the OCDM can be useful not only for integrating craniofacial data, but also for expressing new knowledge gained from analyzing the integrated data.