1
|
Liu J, Liu Q, Zhang L, Su S, Liu Y. Enabling Massive XML-Based Biological Data Management in HBase. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1994-2004. [PMID: 31094692 DOI: 10.1109/tcbb.2019.2915811] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/09/2023]
Abstract
Publishing biological data in XML formats is attractive for organizations that would like to provide their bioinformatics resources in an extensible and machine-readable format. In the era of big data, massive XML-based biological data management has emerged as a challenging issue. With the continuous growth of XML-based biological data sets, it is usually frustrating to use traditional declarative query languages to provide efficient query capabilities in terms of processing speed and scale. In this study, we report a novel platform to store and query massive XML-based biological data collections. A prototype tool for constructing HBase tables from XML-based biological data collections is first developed, and then a formal approach to transform the XML query model into the MapReduce query model is proposed. Finally, an evaluation of the query performance of the proposed approach on existing XML-based biological databases is presented, showing the performance advantages of the proposed solution. The source code of the massive XML-based biological data management platform is freely available at https://github.com/lyotvincent/X2H.
|
2
|
Wu C, Devkota B, Evans P, Zhao X, Baker SW, Niazi R, Cao K, Gonzalez MA, Jayaraman P, Conlin LK, Krock BL, Deardorff MA, Spinner NB, Krantz ID, Santani AB, Tayoun ANA, Sarmady M. Rapid and accurate interpretation of clinical exomes using Phenoxome: a computational phenotype-driven approach. Eur J Hum Genet 2019; 27:612-620. [PMID: 30626929 PMCID: PMC6460638 DOI: 10.1038/s41431-018-0328-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/28/2018] [Revised: 10/10/2018] [Accepted: 12/11/2018] [Indexed: 01/26/2023]
Abstract
Clinical exome sequencing (CES) has become the preferred diagnostic platform for complex pediatric disorders with suspected monogenic etiologies. Despite rapid advancements, the major challenge still resides in identifying the causal variants among the thousands of variants detected during CES testing, and thus establishing a molecular diagnosis. To improve the clinical exome diagnostic efficiency, we developed Phenoxome, a robust phenotype-driven model that adopts a network-based approach to facilitate automated variant prioritization. Phenoxome dissects the phenotypic manifestation of a patient in concert with their genomic profile to filter and then prioritize variants that are likely to affect the function of the gene (potentially pathogenic variants). To validate our method, we have compiled a clinical cohort of 105 positive patient samples that represent a wide range of genetic heterogeneity. Phenoxome identifies the causative variants within the top 5, 10, or 25 candidates in more than 50%, 71%, or 88% of these exomes, respectively. Furthermore, we show that our method is optimized for clinical testing by outperforming the current state-of-the-art method. We have demonstrated the performance of Phenoxome using a clinical cohort and showed that it enables rapid and accurate interpretation of clinical exomes. Phenoxome is available at https://phenoxome.chop.edu/.
Affiliation(s)
- Chao Wu
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Batsal Devkota
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Perry Evans
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Xiaonan Zhao
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Samuel W Baker
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Rojeen Niazi
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Kajia Cao
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Michael A Gonzalez
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Pushkala Jayaraman
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Laura K Conlin
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Bryan L Krock
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Matthew A Deardorff
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Division of Human Genetics, Department of Pediatrics, Roberts Individualized Medical Genetics Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Nancy B Spinner
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ian D Krantz
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Division of Human Genetics, Department of Pediatrics, Roberts Individualized Medical Genetics Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Avni B Santani
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ahmad N Abou Tayoun
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mahdi Sarmady
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
3
|
Yang C, Huang C, Su J. An improved SAO network-based method for technology trend analysis: A case study of graphene. J Informetr 2018. [DOI: 10.1016/j.joi.2018.01.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]
|
4
|
Kaalia R, Ghosh I. Semantics based approach for analyzing disease-target associations. J Biomed Inform 2016; 62:125-35. [PMID: 27349858 DOI: 10.1016/j.jbi.2016.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 06/23/2016] [Accepted: 06/24/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts to understand the cause of these diseases using experimental, statistical and computational methods. In the present work the objective is to address the challenge of representation and integration of information from heterogeneous biomedical aspects of a complex disease using a semantics-based approach. METHODS Semantic web technology is used to design Disease Association Ontology (DAO-db) for representation and integration of disease associated information with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. RESULTS Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins, pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in the present work that helps in adding instances to DAO-db on a large scale. CONCLUSIONS Our ontology provides a framework for querying and analyzing the disease associated information in the form of RDF graphs. The above developed methodology is used to predict novel potential targets involved in diabetes disease from the long list of loose (statistically associated) gene-disease associations.
Affiliation(s)
- Rama Kaalia
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Indira Ghosh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
|
5
|
Han Y, Li L, Zhang Y, Yuan H, Ye L, Zhao J, Duan DD. Phenomics of Vascular Disease: The Systematic Approach to the Combination Therapy. Curr Vasc Pharmacol 2016; 13:433-40. [PMID: 25313004 PMCID: PMC4397150 DOI: 10.2174/1570161112666141014144829] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/17/2013] [Revised: 02/15/2014] [Accepted: 05/21/2014] [Indexed: 12/28/2022]
Abstract
Vascular diseases are usually caused by multifactorial pathogeneses involving genetic and environmental factors. Our current understanding of vascular disease is, however, based on the focused genotype/phenotype studies driven by the “one-gene/one-phenotype” hypothesis. Drugs with “pure target” at individual molecules involved in the pathophysiological pathways are the mainstream of current clinical treatments and the basis of combination therapy of vascular diseases. Recently, the combination of genomics, proteomics, and metabolomics has unraveled the etiology and pathophysiology of vascular disease in a big-data fashion and also revealed unmatched relationships between the omic variability and the much narrower definition of various clinical phenotypes of vascular disease in individual patients. Here, we introduce the phenomics strategy that will change the conventional focused phenotype/genotype/genome study to a new systematic phenome/genome/proteome approach to the understanding of pathophysiology and combination therapy of vascular disease. A phenome is the sum total of an organism’s phenotypic traits that signify the expression of genome and specific environmental influence. Phenomics is the study of phenome to quantitatively correlate complex traits to variability not only in genome, but also in transcriptome, proteome, metabolome, interactome, and environmental factors by exploring the systems biology that links the genomic and phenomic spaces. The application of phenomics and the phenome-wide associated study (PheWAS) will not only identify a systemically-integrated set of biomarkers for diagnosis and prognosis of vascular disease but also provide novel treatment targets for combination therapy and thus make a revolutionary paradigm shift in the clinical treatment of these devastating diseases.
Affiliation(s)
- Dayue Darrel Duan
- Laboratory of Cardiovascular Phenomics, Department of Pharmacology, University of Nevada School of Medicine, Center for Molecular Medicine 303F, 1664 N Virginia Street/MS 318, Reno, Nevada 89557-0318, USA
|
6
|
Luo J, Liang S. Prioritization of potential candidate disease genes by topological similarity of protein–protein interaction network and phenotype data. J Biomed Inform 2015; 53:229-36. [DOI: 10.1016/j.jbi.2014.11.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/05/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 11/28/2022]
|
7
|
Machado CM, Rebholz-Schuhmann D, Freitas AT, Couto FM. The semantic web in translational medicine: current applications and future directions. Brief Bioinform 2015; 16:89-103. [PMID: 24197933 PMCID: PMC4293377 DOI: 10.1093/bib/bbt079] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/22/2013] [Accepted: 10/08/2013] [Indexed: 11/14/2022]
Abstract
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice.
Affiliation(s)
- Catia M. Machado
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal and Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Universidade de Lisboa, Portugal (corresponding author)
|
8
|
Shin D, Arthur G, Popescu M, Korkin D, Shyu CR. Uncovering influence links in molecular knowledge networks to streamline personalized medicine. J Biomed Inform 2014; 52:394-405. [PMID: 25150201 DOI: 10.1016/j.jbi.2014.08.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/28/2014] [Revised: 08/04/2014] [Accepted: 08/08/2014] [Indexed: 01/10/2023]
Abstract
OBJECTIVES We developed Resource Description Framework (RDF)-induced InfluGrams (RIIG), an informatics formalism to uncover complex relationships among biomarker proteins and biological pathways using the biomedical knowledge bases. We demonstrate an application of RIIG in morphoproteomics, a theranostic technique aimed at comprehensive analysis of protein circuitries to design effective therapeutic strategies in a personalized medicine setting. METHODS RIIG uses an RDF "mashup" knowledge base that integrates publicly available pathway and protein data with ontologies. To mine for RDF-induced Influence Links, RIIG introduces notions of RDF relevancy and RDF collider, which mimic conditional independence and the "explaining away" mechanism in probabilistic systems. Using these notions and constraint-based structure learning algorithms, the formalism generates the morphoproteomic diagrams, which we call InfluGrams, for further analysis by experts. RESULTS RIIG was able to recover up to 90% of predefined influence links in a simulated environment using synthetic data and outperformed a naïve Monte Carlo sampling of random links. In clinical cases of Acute Lymphoblastic Leukemia (ALL) and Mesenchymal Chondrosarcoma, a significant level of concordance between the RIIG-generated and expert-built morphoproteomic diagrams was observed. In a clinical case of Squamous Cell Carcinoma, RIIG allowed selection of alternative therapeutic targets, the validity of which was supported by a systematic literature review. We have also illustrated an ability of RIIG to discover novel influence links in the general case of the ALL. CONCLUSIONS Applications of the RIIG formalism demonstrated its potential to uncover patient-specific complex relationships among biological entities to find effective drug targets in a personalized medicine setting.
We conclude that RIIG provides an effective means not only to streamline morphoproteomic studies, but also to bridge curated biomedical knowledge and causal reasoning with the clinical data in general.
Affiliation(s)
- Dmitriy Shin
- University of Missouri, School of Medicine, Department of Pathology and Anatomical Sciences, Columbia, MO 65212, United States; University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
- Gerald Arthur
- University of Missouri, School of Medicine, Department of Pathology and Anatomical Sciences, Columbia, MO 65212, United States; University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States
- Mihail Popescu
- University of Missouri, School of Medicine, Department of Health Management and Informatics, Columbia, MO 65212, United States; University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States; University of Missouri, College of Engineering, Department of Computer Science, Columbia, MO 65211, United States
- Dmitry Korkin
- Worcester Polytechnic Institute, Department of Computer Science, Department of Biology and Biotechnology, Department of Applied Math, Worcester, MA 01609, United States
- Chi-Ren Shyu
- University of Missouri, Graduate School, MU Informatics Institute, Columbia, MO 65211, United States; University of Missouri, College of Engineering, Department of Electrical and Computer Engineering, Columbia, MO 65211, United States
|
9
|
Masino AJ, Dechene ET, Dulik MC, Wilkens A, Spinner NB, Krantz ID, Pennington JW, Robinson PN, White PS. Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology. BMC Bioinformatics 2014; 15:248. [PMID: 25047600 PMCID: PMC4117966 DOI: 10.1186/1471-2105-15-248] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 03/21/2014] [Accepted: 07/16/2014] [Indexed: 12/21/2022]
Abstract
Background Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient’s sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term’s information content. Results Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient’s exome data and filtering non-exomic and common variants, the median rank improved to 3. Conclusions Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. 
We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
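The ranking idea in this abstract (genes ordered by information-content-based semantic similarity between their phenotype terms and the patient's terms) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measure, so a Resnik-style measure (information content of the most informative common ancestor) with a best-match average is assumed here. The toy hierarchy, term names, gene names, and all function names are hypothetical and are not taken from the Human Phenotype Ontology.

```python
import math

# Toy is-a hierarchy (child -> parents); term names are invented, not real HPO terms.
PARENTS = {
    "root": [],
    "ear": ["root"],
    "eye": ["root"],
    "hearing_loss": ["ear"],
    "cataract": ["eye"],
}

# Toy gene annotations, invented for illustration only.
GENE_TERMS = {
    "GENE_A": {"hearing_loss"},
    "GENE_B": {"cataract"},
    "GENE_C": {"ear", "cataract"},
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = -log(fraction of genes annotated with t or a descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in GENE_TERMS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    total = len(GENE_TERMS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}

IC = information_content()

def resnik(t1, t2):
    """Assumed measure: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)

def gene_score(patient_terms, gene):
    """Average, over patient terms, of the best match among the gene's terms."""
    best = [max(resnik(p, g) for g in GENE_TERMS[gene]) for p in patient_terms]
    return sum(best) / len(best)

def rank_genes(patient_terms):
    """Order genes from most to least similar to the patient's phenotype."""
    return sorted(GENE_TERMS, key=lambda g: gene_score(patient_terms, g), reverse=True)

ranking = rank_genes({"hearing_loss"})
print(ranking)
```

In this toy setup a specific patient term shared with a gene scores higher than a match that only meets at a general ancestor, which mirrors the abstract's observation that less specific (imprecise) terms degrade the causative gene's rank.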
Affiliation(s)
- Peter S White
- Department of Pediatrics, Cincinnati Children's Hospital and Medical Center, Cincinnati, OH, USA
|
10
|
Zhan Y, Zhang R, Lv H, Song X, Xu X, Chai L, Lv W, Shang Z, Jiang Y, Zhang R. Prioritization of candidate genes for periodontitis using multiple computational tools. J Periodontol 2014; 85:1059-69. [PMID: 24476546 DOI: 10.1902/jop.2014.130523] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/31/2022]
Abstract
BACKGROUND Both genetic and environmental factors contribute to the development of periodontitis. Genetic studies identified a variety of candidate genes for periodontitis. The aim of the present study is to identify the most promising candidate genes for periodontitis using an integrative gene ranking method. METHODS Seed genes that were confirmed to be associated with periodontitis were identified using text mining. Three types of candidate genes were then extracted from different resources (expression profiles, genome-wide association studies). Combining the seed genes, four freely available bioinformatics tools (ToppGene, DIR, Endeavour, and GPEC) were integrated for prioritization of candidate genes. Candidate genes that were identified by at least three programs and ranked in the top 20 by each program were considered the most promising. RESULTS Prioritization analysis resulted in 21 promising genes involved or potentially involved in periodontitis. Among them, IL18 (interleukin 18), CD44 (CD44 molecule), CXCL1 (chemokine [CXC motif] ligand 1), IL6ST (interleukin 6 signal transducer), MMP3 (matrix metallopeptidase 3), MMP7, CCR1 (chemokine [C-C motif] receptor 1), MMP13, and TLR9 (Toll-like receptor 9) had been associated with periodontitis. However, the roles of other genes, such as CSF3 (colony stimulating factor 3 receptor), CD40, TNFSF14 (tumor necrosis factor receptor superfamily, member 14), IFNB1 (interferon-β1), TIRAP (toll-interleukin 1 receptor domain containing adaptor protein), IL2RA (interleukin 2 receptor α), ETS1 (v-ets avian erythroblastosis virus E26 oncogene homolog 1), GADD45B (growth arrest and DNA-damage-inducible 45 β), BIRC3 (baculoviral IAP repeat containing 3), VAV1 (vav 1 guanine nucleotide exchange factor), COL5A1 (collagen, type V, α1), and C3 (complement component 3), have not been investigated thoroughly in the process of periodontitis.
These genes are mainly involved in bacterial infection, immune response, and inflammatory reaction, suggesting that further characterizing their roles in periodontitis will be important. CONCLUSIONS A combination of computational tools will be useful in mining candidate genes for periodontitis. These theoretical results provide new clues for experimental biologists to plan targeted experiments.
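The selection rule described in the METHODS (a gene counts as promising when at least three of the four tools place it in their top 20) can be illustrated with a short sketch. This is one reading of the abstract's wording, not the authors' code; the function name, parameters, and the toy rankings below are hypothetical, and the toy lists are invented rather than taken from the study.

```python
from collections import Counter

def consensus_genes(rankings, top_n=20, min_tools=3):
    """Return genes ranked within top_n by at least min_tools tools.

    rankings: dict mapping a tool name to its ordered gene list (best first).
    """
    votes = Counter()
    for ordered_genes in rankings.values():
        # set() guards against a tool listing the same gene twice
        votes.update(set(ordered_genes[:top_n]))
    return sorted(g for g, n in votes.items() if n >= min_tools)

# Toy rankings, invented for illustration (not results from the paper).
toy = {
    "ToppGene": ["IL18", "CD44", "MMP3", "TLR9"],
    "DIR": ["CD44", "IL18", "CCR1"],
    "Endeavour": ["IL18", "MMP3", "CD44"],
    "GPEC": ["TLR9", "IL18", "CCR1"],
}

# A cutoff of 3 is used only because the toy lists are short.
result = consensus_genes(toy, top_n=3, min_tools=3)
print(result)
```

Raising `min_tools` or lowering `top_n` tightens the consensus, trading recall for confidence in the retained candidates.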
Affiliation(s)
- Yuanbo Zhan
- Department of Periodontology and Oral Mucosa, Second Affiliated Hospital of Harbin Medical University, Harbin, China
|
11
|
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 511] [Impact Index Per Article: 46.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
|
12
|
Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I. Empowering industrial research with shared biomedical vocabularies. Drug Discov Today 2011; 16:940-7. [PMID: 21963522 PMCID: PMC7098809 DOI: 10.1016/j.drudis.2011.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/27/2011] [Revised: 07/29/2011] [Accepted: 09/19/2011] [Indexed: 10/17/2022]
Abstract
The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.
|
13
|
Sardana D, Zhu C, Zhang M, Gudivada RC, Yang L, Jegga AG. Drug repositioning for orphan diseases. Brief Bioinform 2011; 12:346-56. [PMID: 21504985 DOI: 10.1093/bib/bbr021] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Indexed: 01/13/2023]
Abstract
The need and opportunity to discover therapeutics for rare or orphan diseases are enormous. Due to limited prevalence and/or commercial potential, of the approximately 6000 orphan diseases (defined by the FDA Orphan Drug Act as <200 000 US prevalence), only a small fraction (5%) is of interest to the biopharmaceutical industry. The fact that drug development is complicated, time-consuming and expensive with extremely low success rates only adds to the low rate of therapeutics available for orphan diseases. An alternative and efficient strategy to boost the discovery of orphan disease therapeutics is to find connections between an existing drug product and an orphan disease. Drug Repositioning or Drug Repurposing (finding a new indication for a drug) is one way to maximize the potential of a drug. The advantages of this approach are manifold, but rational drug repositioning for orphan diseases is not trivial and poses several formidable challenges, both pharmacological and computational. Most of the repositioned drugs currently on the market are the result of serendipity. One reason the connections between drug candidates and their potential new applications are not identified in an earlier or more systematic fashion is that the underlying mechanism 'connecting' them is either very intricate and unknown or indirect or dispersed and buried in an ever-increasing sea of information, much of which is emerging only recently and therefore is not well organized. In this study, we will review some of these issues and the current methodologies adopted or proposed to overcome them and translate chemical and biological discoveries into safe and effective orphan disease therapeutics.
Affiliation(s)
- Divya Sardana
- Department of Computer Science, University of Cincinnati, OH, USA
|
14
|
Webster YW, Dow ER, Koehler J, Gudivada RC, Palakal MJ. Leveraging health social networking communities in translational research. J Biomed Inform 2011; 44:536-44. [PMID: 21284958 DOI: 10.1016/j.jbi.2011.01.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2010] [Revised: 12/17/2010] [Accepted: 01/25/2011] [Indexed: 01/19/2023]
Abstract
Health social networking communities are emerging resources for translational research. We have designed and implemented a framework called HyGen, which combines Semantic Web technologies, graph algorithms and user profiling to discover and prioritize novel associations across disciplines. This manuscript focuses on the key strategies developed to overcome the challenges in handling patient-generated content in health social networking communities. Heuristic and quantitative evaluations were carried out in colorectal cancer. The results demonstrate the potential of our approach to bridge silos and to identify hidden links among clinical observations, drugs, genes and diseases. In amyotrophic lateral sclerosis case studies, HyGen identified 15 of the 20 published disease genes. Additionally, HyGen highlighted new candidates for future investigations, as well as a scientifically meaningful connection between riluzole and alcohol abuse.
Affiliation(s)
- Yue W Webster
- School of Informatics, Indiana University Purdue University, IN, USA.
|
15
|
Miles A, Zhao J, Klyne G, White-Cooper H, Shotton D. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. J Biomed Inform 2010; 43:752-61. [PMID: 20382263 DOI: 10.1016/j.jbi.2010.04.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2009] [Revised: 04/01/2010] [Accepted: 04/05/2010] [Indexed: 01/02/2023]
Abstract
MOTIVATION Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavyweight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. RESULTS We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with lightweight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. AVAILABILITY The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.
Affiliation(s)
- Alistair Miles
- Department of Zoology, University of Oxford, Oxford OX1 3PS, UK
|
16
|
Kann MG. Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief Bioinform 2010; 11:96-110. [PMID: 20007728 PMCID: PMC2810112 DOI: 10.1093/bib/bbp048] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2009] [Revised: 09/15/2009] [Indexed: 12/29/2022] Open
Abstract
Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for the cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential of leading to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on the advances of computational techniques to search for and classify disease genes.
Affiliation(s)
- Maricel G Kann
- University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.
|
17
|
Qu XA, Gudivada RC, Jegga AG, Neumann EK, Aronow BJ. Inferring novel disease indications for known drugs by semantically linking drug action and disease mechanism relationships. BMC Bioinformatics 2009; 10 Suppl 5:S4. [PMID: 19426461 PMCID: PMC2679404 DOI: 10.1186/1471-2105-10-s5-s4] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Discovering that drug entities already approved for one disease are effective treatments for other distinct diseases can be highly beneficial and cost effective. To do this predictively, our conjecture is that a semantic infrastructure linking mechanistic relationships between pharmacologic entities and multidimensional knowledge of biological systems and disease processes will be highly enabling. RESULTS To develop a knowledge framework capable of modeling and interconnecting drug actions and disease mechanisms across diverse biological systems contexts, we designed a Disease-Drug Correlation Ontology (DDCO), formalized in OWL, that integrates multiple ontologies, controlled vocabularies, and data schemas and interlinks these with diverse datasets extracted from pharmacological and biological domains. Using the complex disease Systemic Lupus Erythematosus (SLE) as an example, a high-dimensional pharmacome-diseasome graph network was generated as RDF XML, and subjected to graph-theoretic proximity and connectivity analytic approaches to rank drugs versus the compendium of SLE-associated genes, pathways, and clinical features. Tamoxifen, a current candidate therapeutic for SLE, was the highest ranked drug. CONCLUSION This early stage demonstration highlights critical directions to follow that will enable translational pharmacotherapeutic research. The uniform application of Semantic Web methodology to problems in data integration, knowledge representation, and analysis provides an efficient and potentially powerful means to allow mining of drug action and disease mechanism relationships. Further improvements in semantic representation of mechanistic relationships will provide a fertile basis for accelerated drug repositioning, reasoning, and discovery across the spectrum of human disease.
Affiliation(s)
- Xiaoyan A Qu
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH, USA.
|
18
|
PADMANABHAN SANDOSH, HASTIE CLAIRE, SAINSBURY CHRISTOPHERA, MCBRIDE MARTINW, CONNELL JOHNM, DOMINICZAK ANNAF. THE CAT, THE FLY AND THE BEETLE — WHY GENETICS NEEDS A SEMANTIC EDUCATION. INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING 2009. [DOI: 10.1142/s1793351x09000665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Major advances have been made in the understanding of the genetic basis of diseases since Mendel's publication of the results of plant breeding experiments in 1866. To date these advances have been largely confined to the monogenic disorders, which are caused by mutations in a single gene. The public-health implications of these advances are relatively limited. In this review we explore our current understanding of the genetic basis of human traits and the reasons why current theories may account for the difficulties in identifying the genes for common diseases. We then postulate that semantic computing may be rightly poised to help understand complex disease causation, and explore the efforts that have been made to date to develop the necessary technological approach to the problem.
Affiliation(s)
- SANDOSH PADMANABHAN
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow G12 8TA, UK
- CLAIRE HASTIE
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow G12 8TA, UK
- MARTIN W. MCBRIDE
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow G12 8TA, UK
- JOHN M. CONNELL
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow G12 8TA, UK
- ANNA F. DOMINICZAK
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow G12 8TA, UK
|
19
|
Cheung KH, Kashyap V, Luciano JS, Chen H, Wang Y, Stephens S. Semantic mashup of biomedical data. J Biomed Inform 2008; 41:683-6. [PMID: 18703163 PMCID: PMC3742004 DOI: 10.1016/j.jbi.2008.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2008] [Revised: 07/30/2008] [Accepted: 08/05/2008] [Indexed: 12/24/2022]
Affiliation(s)
- KH Cheung
- Yale Center for Medical Informatics and Departments of Anesthesiology and Genetics, School of Medicine, Computer Science Department, Yale University, P.O. Box 208009, New Haven, CT 06520, USA
- V Kashyap
- Clinical Informatics R&D, Partners HealthCare System, Wellesley, Massachusetts, USA
- H Chen
- College of Computer Science, Zhejiang University, Hangzhou, China
- Y Wang
- Lilly Singapore Centre for Drug Discovery, Singapore
- S Stephens
- Discovery IT, Eli Lilly, Boston, MA, USA
|