151
|
A review of drug-induced liver injury databases. Arch Toxicol 2017; 91:3039-3049. [DOI: 10.1007/s00204-017-2024-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/24/2017] [Accepted: 06/28/2017] [Indexed: 01/23/2023]
|
152
|
Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I. GWAB: a web server for the network-based boosting of human genome-wide association data. Nucleic Acids Res 2017; 45:W154-W161. [PMID: 28449091 PMCID: PMC5793838 DOI: 10.1093/nar/gkx284] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/11/2017] [Revised: 04/01/2017] [Accepted: 04/17/2017] [Indexed: 12/29/2022]
Abstract
During the last decade, genome-wide association studies (GWAS) have represented a major approach to dissect complex human genetic diseases. Due in part to limited statistical power, most studies identify only small numbers of candidate genes that pass the conventional significance thresholds (e.g. P ≤ 5 × 10⁻⁸). This limitation can be partly overcome by increasing the sample size, but this comes at a higher cost. Alternatively, weak association signals can be boosted by incorporating independent data. Previously, we demonstrated the feasibility of boosting GWAS disease associations using gene networks. Here, we present a web server, GWAB (www.inetbio.org/gwab), for the network-based boosting of human GWAS data. Using GWAS summary statistics (P-values) for SNPs along with reference genes for a disease of interest, GWAB reprioritizes candidate disease genes by integrating the GWAS and network data. We found that GWAB could more effectively retrieve disease-associated reference genes than GWAS alone could. As an example, we describe GWAB-boosted candidate genes for coronary artery disease and supporting data in the literature. These results highlight the inherent value in sub-threshold GWAS associations, which are often not publicly released. GWAB offers a feasible general approach to boost such associations for human disease genetics.
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Changbae Bang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Sohyun Hwang
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
| | - Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - U Martin Singh-Blom
- Cognition Group, Schibsted Products & Technologies, Västra Järnvägsgatan 21, 111 64 Stockholm, Sweden
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
- Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| |
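The GWAB abstract above describes boosting weak GWAS signals with gene-network data but does not give the scoring formula. The following is only a minimal sketch of the general idea, under loudly stated assumptions: each gene's own evidence is taken as -log10(P), and a fixed fraction of its neighbours' evidence is added. The function name, the neighbour_weight parameter, the gene names and the P-values are all hypothetical and are not taken from GWAB.

```python
import math


def boost_scores(gene_pvalues, network, neighbour_weight=0.5):
    """Add a share of each neighbour's GWAS evidence to a gene's own evidence.

    This additive rule is an illustrative assumption, not GWAB's published formula.
    """
    # Own evidence: smaller P-values give larger scores.
    own = {gene: -math.log10(p) for gene, p in gene_pvalues.items()}
    boosted = {}
    for gene, score in own.items():
        neighbour_sum = sum(
            edge_weight * own[nbr]
            for nbr, edge_weight in network.get(gene, {}).items()
            if nbr in own
        )
        boosted[gene] = score + neighbour_weight * neighbour_sum
    return boosted


# Hypothetical gene-level P-values; GENE_B and GENE_C are both sub-threshold.
toy_pvalues = {"GENE_A": 1e-9, "GENE_B": 1e-5, "GENE_C": 1e-5, "GENE_D": 1e-8}
# Hypothetical undirected network stored as weighted adjacency dictionaries.
toy_network = {
    "GENE_A": {"GENE_B": 1.0},
    "GENE_B": {"GENE_A": 1.0, "GENE_D": 1.0},
    "GENE_C": {},
    "GENE_D": {"GENE_B": 1.0},
}

boosted = boost_scores(toy_pvalues, toy_network)
ranking = sorted(boosted, key=boosted.get, reverse=True)
print(ranking)
```

In this toy example GENE_B and GENE_C start with the same sub-threshold P-value, but only GENE_B is linked to strongly associated genes, so only GENE_B moves up the ranking. That is the kind of reprioritization the abstract refers to, shown here with an invented scoring rule.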
|
153
|
Baas M, Stubbs AP, van Zessen DB, Galjaard RJH, van der Spek PJ, Hovius SER, van Nieuwenhoven CA. Identification of Associated Genes and Diseases in Patients With Congenital Upper-Limb Anomalies: A Novel Application of the OMT Classification. J Hand Surg Am 2017; 42:533-545.e4. [PMID: 28669419 DOI: 10.1016/j.jhsa.2017.03.043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/05/2016] [Revised: 01/09/2017] [Accepted: 03/30/2017] [Indexed: 02/02/2023]
Abstract
PURPOSE Congenital upper-limb anomalies (CULA) can present as a part of a syndrome or association. There is a wide spectrum of CULA, each of which might be related to different diseases. The structure provided by the Oberg, Manske, and Tonkin (OMT) classification could aid in differential diagnosis formulation in patients with CULA. The aims of this study were to review the Human Phenotype Ontology (HPO) project database for diseases and causative genes related to the CULA described in the OMT classification and to develop a methodology for differential diagnosis formulation based on the observed congenital anomalies, CulaPhen. METHODS We reviewed the HPO database for all diseases, including causative genes related to CULA. All CULA were classified according to the OMT classification; associated non-hand phenotypes were classified into 12 anatomical groups. We analyzed the contribution of each anatomical group to a given disease and developed a tool for differential diagnosis formulation based on these contributions. We compared our results with cases from the literature and with a current HPO tool, Phenomizer. RESULTS In total, 514 hand phenotypes were obtained, 384 of which could be classified in the OMT classification. A total of 1,403 diseases could be related to those CULA. A comparison with 10 recently published cases with CULA revealed that the presented phenotype matched the descriptions in our dataset. The differential diagnosis produced using our methodology was more accurate than Phenomizer in 4 of 5 examples. CONCLUSIONS The OMT classification can be used to describe hand anomalies that may present in over 1,400 diseases. CulaPhen was developed to provide a (hand) phenotype-based differential diagnosis. Differential diagnosis formulation based on the proposed system outperforms the system in current use. 
CLINICAL RELEVANCE This study illustrates that the OMT diagnoses, either individually or combined, can be cross-referenced with different diseases and syndromes. Therefore, use of the OMT classification can aid differential diagnosis formulation for CULA patients.
Affiliation(s)
- Martijn Baas
- Department of Plastic and Reconstructive Surgery and Hand Surgery, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Andrew P Stubbs
- Department of Clinical Genetics, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - David B van Zessen
- Department of Clinical Genetics, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Robert-Jan H Galjaard
- Department of Bioinformatics, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Peter J van der Spek
- Department of Clinical Genetics, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Steven E R Hovius
- Department of Plastic and Reconstructive Surgery and Hand Surgery, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Christianne A van Nieuwenhoven
- Department of Plastic and Reconstructive Surgery and Hand Surgery, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands.
| |
|
154
|
Ankeny RA. Geneticization in MIM/OMIM®? Exploring Historic and Epistemic Drivers of Contemporary Understandings of Genetic Disease. THE JOURNAL OF MEDICINE AND PHILOSOPHY 2017. [DOI: 10.1093/jmp/jhx013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
|
155
|
Le DH, Pham VH. HGPEC: a Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network. BMC SYSTEMS BIOLOGY 2017; 11:61. [PMID: 28619054 PMCID: PMC5472867 DOI: 10.1186/s12918-017-0437-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/03/2017] [Accepted: 05/31/2017] [Indexed: 12/31/2022]
Abstract
Background Finding gene-disease and disease-disease associations plays an important role in biomedicine, and many prioritization methods have been proposed for this goal. Among them, approaches based on a heterogeneous network of genes and diseases are considered state-of-the-art; they achieve high prediction performance and can be used for diseases with or without a known molecular basis. Results Here, we developed a Cytoscape app, HGPEC, based on a random walk with restart algorithm on a heterogeneous network of genes and diseases. This app can prioritize candidate genes and diseases by employing a heterogeneous network consisting of a network of genes/proteins and a phenotypic disease similarity network. Based on the rankings, novel disease-gene and disease-disease associations can be identified. These associations can be supported with network- and rank-based visualization as well as evidence and annotations from biomedical data. A case study on prediction of novel breast cancer-associated genes and diseases shows the abilities of HGPEC. In addition, we showed that HGPEC performed better than other tools for prioritization of candidate disease genes. Conclusions Taken together, our app is expected to effectively predict novel disease-gene and disease-disease associations and to support network- and rank-based visualization as well as biomedical evidence for such associations. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0437-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Duc-Hau Le
- Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam
- Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
| | - Van-Huy Pham
- Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam.
| |
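The HGPEC abstract above names random walk with restart as its ranking algorithm. As a rough illustration of that technique only, the sketch below runs the standard iteration p ← r·p0 + (1 − r)·W·p on a small, single-layer toy gene network. HGPEC itself works on a heterogeneous gene and disease network, which is not reproduced here. The function name, the restart probability of 0.5 and the node labels G1 to G5 are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that restarts at the seeds.

    adjacency maps each node to a dict of neighbour -> edge weight (toy, undirected).
    """
    nodes = sorted(adjacency)
    seed_set = set(seeds)
    # Restart distribution: uniform over the seed nodes.
    start = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    total_weight = {n: sum(adjacency[n].values()) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        updated = {n: restart * start[n] for n in nodes}
        # ... otherwise it moves to a neighbour in proportion to edge weight.
        for source in nodes:
            if total_weight[source] == 0:
                continue  # isolated node: nothing to propagate
            for target, weight in adjacency[source].items():
                updated[target] += (
                    (1.0 - restart) * current[source] * weight / total_weight[source]
                )
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical chain-shaped network G1 - G2 - G3 - G4 - G5 with G1 as the known disease gene.
toy_network = {
    "G1": {"G2": 1.0},
    "G2": {"G1": 1.0, "G3": 1.0},
    "G3": {"G2": 1.0, "G4": 1.0},
    "G4": {"G3": 1.0, "G5": 1.0},
    "G5": {"G4": 1.0},
}

scores = random_walk_with_restart(toy_network, seeds=["G1"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On this chain the scores fall off with distance from the seed, so candidates closer to known disease genes rank higher. A higher restart probability keeps the walker nearer the seeds, and a lower one lets evidence spread further through the network.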
|
156
|
Holmes RS, Spradling-Reeves KD, Cox LA. Mammalian Glutamyl Aminopeptidase Genes (ENPEP) and Proteins: Comparative Studies of a Major Contributor to Arterial Hypertension. JOURNAL OF DATA MINING IN GENOMICS & PROTEOMICS 2017; 8:2. [PMID: 29900035 PMCID: PMC5995572 DOI: 10.4172/2153-0602.1000211] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022]
Abstract
Glutamyl aminopeptidase (ENPEP) is a member of the M1 family of endopeptidases which are mammalian type II integral membrane zinc-containing endopeptidases. ENPEP is involved in the catabolic pathway of the renin-angiotensin system forming angiotensin III, which participates in blood pressure regulation and blood vessel formation. Comparative ENPEP amino acid sequences and structures and ENPEP gene locations were examined using data from several mammalian genome projects. Mammalian ENPEP sequences shared 71-98% identities. Five N-glycosylation sites were conserved for all mammalian ENPEP proteins examined although 9-18 sites were observed, in each case. Sequence alignments, key amino acid residues and predicted secondary and tertiary structures were also studied, including transmembrane and cytoplasmic sequences and active site residues. Highest levels of human ENPEP expression were observed in the terminal ileum of the small intestine and in the kidney cortex. Mammalian ENPEP genes contained 20 coding exons. The human ENPEP gene promoter and first coding exon contained a CpG island (CpG27) and at least 6 transcription factor binding sites, whereas the 3'-UTR region contained 7 miRNA target sites, which may contribute to the regulation of ENPEP gene expression in tissues of the body. Phylogenetic analyses examined the relationships of mammalian ENPEP genes and proteins, including primate, other eutherian, marsupial and monotreme sources, using chicken ENPEP as a primordial sequence for comparative purposes.
Affiliation(s)
- Roger S Holmes
- Department of Genetics and Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
- Griffith Institute for Drug Design and School of Natural Sciences, Griffith University, Nathan, QLD, Australia
| | - Kimberly D Spradling-Reeves
- Department of Genetics and Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
| | - Laura A Cox
- Department of Genetics and Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
| |
|
157
|
Abstract
Deciphering gene–disease association is a crucial step in designing therapeutic strategies against diseases. There are experimental methods for identifying gene–disease associations, such as genome-wide association studies and linkage analysis, but these can be expensive and time consuming. As a result, various in silico methods for predicting associations from these and other data have been developed using different approaches. In this article, we review some of the recent approaches to the computational prediction of gene–disease association. We look at recent advancements in algorithms, categorising them into those based on genome variation, networks, text mining, and crowdsourcing. We also look at some of the challenges faced in the computational prediction of gene–disease associations.
Affiliation(s)
- Kenneth Opap
- University of Cape Town, Cape Town, South Africa
| | | |
|
158
|
A Computational Approach to Identify the Biophysical and Structural Aspects of Methylenetetrahydrofolate Reductase (MTHFR) Mutations (A222V, E429A, and R594Q) Leading to Schizophrenia. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2017; 108:105-125. [PMID: 28427558 DOI: 10.1016/bs.apcsb.2017.01.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
The association between depression and methylenetetrahydrofolate reductase (MTHFR) has been continually demonstrated in clinical studies, yet there are sparse resources available to build a relationship between the mutations associated with MTHFR and depression. The common mutations found to be associated with schizophrenia and MTHFR are A222V, E429A, and R594Q. Although abundant research on the structural and functional effects caused by the A222V mutation is available, very few studies have been done on the other two mutants (E429A and R594Q). Hence, in this study, a comparative analysis was carried out between the most common A222V mutation, a prevalent E429A mutation, and a less prevalent and less deleterious R594Q mutation. To predict structural rearrangements upon mutation, we proposed a computational pipeline using in silico prediction tools, molecular docking, and molecular dynamics simulation analysis. Since the association of flavin adenine dinucleotide (FAD) is important for the functioning of the protein, binding analysis between the protein and the coenzyme was performed. This would enable us to understand the interference level of each mutation over FAD-binding activity. Consequently, we found that two mutations (A222V and E429A) showed lower binding activity and structural deviations when compared to the native molecule and mutant R594Q. Comparatively, greater structural changes were observed with the A222V mutant complex than with the other mutant complexes. Computational studies like this could render better insights into the structural changes in the protein and their relationship with the disease condition.
|
159
|
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
INTRODUCTION Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Affiliation(s)
- Shuyue Fu
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
| | - Xiang Liu
- Department of Pathology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
| | - Maochao Luo
- West China School of Public Health, Sichuan University, Chengdu, P.R. China
| | - Ke Xie
- Department of Oncology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
| | - Edouard C Nice
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
| | - Haiyuan Zhang
- School of Medicine, Yangtze University, P.R. China
| | - Canhua Huang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
| |
|
160
|
McClelland KS, Yao HHC. Leveraging Online Resources to Prioritize Candidate Genes for Functional Analyses: Using the Fetal Testis as a Test Case. Sex Dev 2017; 11:1-20. [PMID: 28196369 PMCID: PMC6171109 DOI: 10.1159/000455113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 11/29/2016] [Indexed: 01/03/2023]
Abstract
With each new microarray or RNA-seq experiment, massive quantities of transcriptomic information are generated with the aim of producing a list of candidate genes for functional analyses. Yet an effective strategy for prioritizing the genes on these candidate lists remains elusive. In this review, we outline a prioritizing strategy by taking a step back from the bench and leveraging the rich range of public databases. This in silico approach provides an economical, less biased, and more effective solution. We discuss the publicly available online resources that can be used to answer a range of questions about a gene. Is the gene of interest expressed in the system of interest (using expression databases)? Where else is this gene expressed (using added-value transcriptomic resources)? What pathways and processes is the gene involved in (using enriched gene pathway analysis and mouse knockout databases)? Is this gene correlated with human diseases (using human disease variant databases)? Using mouse fetal testis as an example, our strategies identified 298 genes annotated as expressed in the fetal testis. We cross-referenced these genes to existing microarray data and narrowed the list down to cell-type-specific candidates (35 for Sertoli cells, 11 for Leydig cells, and 25 for germ cells). Our strategies can be customized so that they allow researchers to effectively and confidently prioritize genes for functional analysis.
Affiliation(s)
- Kathryn S McClelland
- Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
| | | |
|
161
|
|
162
|
Chadaeva IV, Ponomarenko MP, Rasskazov DA, Sharypova EB, Kashina EV, Matveeva MY, Arshinova TV, Ponomarenko PM, Arkova OV, Bondar NP, Savinkova LK, Kolchanov NA. Candidate SNP markers of aggressiveness-related complications and comorbidities of genetic diseases are predicted by a significant change in the affinity of TATA-binding protein for human gene promoters. BMC Genomics 2016; 17:995. [PMID: 28105927 PMCID: PMC5249025 DOI: 10.1186/s12864-016-3353-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
BACKGROUND Aggressiveness in humans is a hereditary behavioral trait that mobilizes all systems of the body (first of all, the nervous and endocrine systems, and then the respiratory, vascular, muscular, and others), e.g., for the defense of oneself, children, family, shelter, territory, and other possessions as well as personal interests. The level of aggressiveness of a person determines many other characteristics of quality of life and lifespan, acting as a stress factor. Aggressive behavior depends on many parameters such as age, gender, diseases and treatment, diet, and environmental conditions. Among them, genetic factors are believed to be the main parameters that are well-studied at the factual level, but in actuality, genome-wide studies of aggressive behavior appeared relatively recently. One of the biggest projects of modern science, 1000 Genomes, involves identification of single nucleotide polymorphisms (SNPs), i.e., differences of individual genomes from the reference genome. SNPs can be associated with hereditary diseases, their complications, comorbidities, and responses to stress or a drug. Clinical comparisons between cohorts of patients and healthy volunteers (as a control) allow for identifying SNPs whose allele frequencies significantly separate them from one another as markers of the above conditions. Computer-based preliminary analysis of millions of SNPs detected by the 1000 Genomes project can accelerate clinical search for SNP markers due to preliminary whole-genome search for the most meaningful candidate SNP markers and discarding of neutral and poorly substantiated SNPs. RESULTS Here, we combine two computer-based search methods for SNPs (that alter gene expression): (i) the Web service SNP_TATA_Comparator (DNA sequence analysis) and (ii) a PubMed-based manual search for articles on aggressiveness using heuristic keywords. Near the known binding sites for TATA-binding protein (TBP) in human gene promoters, we found aggressiveness-related candidate SNP markers, including rs1143627 (associated with higher aggressiveness in patients undergoing cytokine immunotherapy), rs544850971 (higher aggressiveness in old women taking lipid-lowering medication), and rs10895068 (childhood aggressiveness-related obesity in adolescence with cardiovascular complications in adulthood). CONCLUSIONS After validation of these candidate markers by clinical protocols, these SNPs may become useful for physicians (may help to improve treatment of patients) and for the general population (a lifestyle choice preventing aggressiveness-related complications).
Affiliation(s)
- Irina V. Chadaeva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
| | - Mikhail P. Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
| | - Dmitry A. Rasskazov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Ekaterina B. Sharypova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Elena V. Kashina
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Marina Yu Matveeva
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Tatjana V. Arshinova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Petr M. Ponomarenko
- Children’s Hospital Los Angeles, 4640 Hollywood Boulevard, University of Southern California, Los Angeles, CA 90027 USA
| | - Olga V. Arkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Vector-Best Inc, Koltsovo, Novosibirsk Region 630559 Russia
| | - Natalia P. Bondar
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Ludmila K. Savinkova
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
| | - Nikolay A. Kolchanov
- Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk, 630090 Russia
- Novosibirsk State University, 2 Pirogova Street, Novosibirsk, 630090 Russia
| |
|
163
|
Raghuraman P, Jesu Jaya Sudan R, Lesitha Jeeva Kumari J, Sudandiradoss C. Casting the critical regions in nucleotide binding oligomerization domain 2 protein: a signature mediated structural dynamics approach. J Biomol Struct Dyn 2016; 35:3297-3315. [PMID: 27790943 DOI: 10.1080/07391102.2016.1254116] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Nucleotide binding oligomerization domain 2 (NOD2), a protein involved in the first-line defence mechanism, has a pivotal role in innate immunity. Impaired function of this protein is implicated in disorders such as Blau syndrome and Crohn's disease. Since altered function is linked to the protein's structure, we framed a systematic strategy to interpret the structure-function relationship of the protein. We began with mutation-based pattern prediction and identified a distant ortholog (DO) of NOD2, from which the intra-residue interaction network was elucidated. The network was used to identify hotspots that serve as critical points to maintain the stable architecture of the protein. Structural comparison of NOD2 domains with the DO revealed the minimal number of intra-protein interactions required by the protein to maintain the structural fold. In addition, conventional molecular dynamics simulation highlighted the conformational transitions at hotspot residues between the native NOD2 domains and their respective mutant (G116R, R42W and R54A) structures. The analysis of intra-protein interactions globally and the displacement of residues locally around the mutational site revealed loss of several critical bonds and residues vital for the protein's function. In conclusion, we report that about 10 residues in the leucine-rich repeat, 13 residues in the NOD and 6 residues in the CARD domain are required by NOD2 to maintain its function. This protocol will help researchers pursue further studies to assess the utility of druggable sites in discovering novel drug candidates.
Affiliation(s)
- P Raghuraman
- Department of Biotechnology, School of Bioscience and Technology, VIT University, Vellore 632301, India
| | - R Jesu Jaya Sudan
- Department of Biotechnology, School of Bioscience and Technology, VIT University, Vellore 632301, India
| | - J Lesitha Jeeva Kumari
- Department of Biotechnology, School of Bioscience and Technology, VIT University, Vellore 632301, India
| | - C Sudandiradoss
- Department of Biotechnology, School of Bioscience and Technology, VIT University, Vellore 632301, India
| |
|
164
|
Guo Y, Alexander K, Clark AG, Grimson A, Yu H. Integrated network analysis reveals distinct regulatory roles of transcription factors and microRNAs. RNA (NEW YORK, N.Y.) 2016; 22:1663-1672. [PMID: 27604961 PMCID: PMC5066619 DOI: 10.1261/rna.048025.114] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/08/2014] [Accepted: 07/25/2016] [Indexed: 06/06/2023]
Abstract
Analysis of transcription regulatory networks has revealed many principal features that govern gene expression regulation. MicroRNAs (miRNAs) have emerged as another major class of gene regulators that influence gene expression post-transcriptionally, but there remains a need to assess quantitatively their global roles in gene regulation. Here, we have constructed an integrated gene regulatory network comprised of transcription factors (TFs), miRNAs, and their target genes and analyzed the effect of regulation on target mRNA expression, target protein expression, protein-protein interaction, and disease association. We found that while target genes regulated by the same TFs tend to be co-expressed, co-regulation by miRNAs does not lead to co-expression assessed at either mRNA or protein levels. Analysis of interacting protein pairs in the regulatory network revealed that compared to genes co-regulated by miRNAs, a higher fraction of genes co-regulated by TFs encode proteins in the same complex. Although these results suggest that genes co-regulated by TFs are more functionally related than those co-regulated by miRNAs, genes that share either TF or miRNA regulators are more likely to cause the same disease. Further analysis on the interplay between TFs and miRNAs suggests that TFs tend to regulate intramodule/pathway clusters, while miRNAs tend to regulate intermodule/pathway clusters. These results demonstrate that although TFs and miRNAs both regulate gene expression, they occupy distinct niches in the overall regulatory network within the cell.
Affiliation(s)
- Yu Guo
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Katherine Alexander
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew G Clark
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew Grimson
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Haiyuan Yu
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
| |
|
165
|
Abstract
The fact that lung cancer is a heterogeneous disease suggests that there is a high likelihood that effective lung cancer biomarkers will need to address patient-specific molecular defects, clinical characteristics, and aspects of the tumor microenvironment. In this transition, clinical bioinformatics tools and resources are the most appropriate means to improve the analysis, as major biological databases now contain clinical data alongside genomics, proteomics, and other biological data. Clinical bioinformatics comprises a series of concepts and approaches that have been used successfully both to delineate novel biological mechanisms and to drive translational advances in individualized healthcare. In this article, we outline several emerging clinical bioinformatics-based strategies as they apply specifically to lung cancer.
Affiliation(s)
- Duojiao Wu
- Zhongshan Hospital of Fudan University, Biomedical Research Center, Shanghai Institute of Clinical Bioinformatics, Fudan University Center for Clinical Bioinformatics, Shanghai, 200032, China
| | | |
|
166
|
Raimondi D, Orlando G, Messens J, Vranken WF. Investigating the Molecular Mechanisms Behind Uncharacterized Cysteine Losses from Prediction of Their Oxidation State. Hum Mutat 2016; 38:86-94. [PMID: 27667481 DOI: 10.1002/humu.23129] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/16/2016] [Revised: 09/13/2016] [Accepted: 09/20/2016] [Indexed: 01/08/2023]
Abstract
Cysteines are among the rarest amino acids in nature, and are both functionally and structurally very important for proteins. The ability of cysteines to form disulfide bonds is especially relevant, both for constraining the folded state of the protein and for performing enzymatic duties. But how does the variation record of human proteins reflect their functional importance and structural role, especially with regard to deleterious mutations? We created HUMCYS, a manually curated dataset of single amino acid variants that (1) have a known disease/neutral phenotypic outcome and (2) cause the loss of a cysteine, in order to investigate how mutated cysteines relate to structural aspects such as surface accessibility and cysteine oxidation state. We also have developed a sequence-based in silico cysteine oxidation predictor to overcome the scarcity of experimentally derived oxidation annotations, and applied it to extend our analysis to classes of proteins for which the experimental determination of their structure is technically challenging, such as transmembrane proteins. Our investigation shows that we can gain insights into the reason behind the outcome of cysteine losses in otherwise uncharacterized proteins, and we discuss the possible molecular mechanisms leading to deleterious phenotypes, such as the involvement of the mutated cysteine in a structurally or enzymatically relevant disulfide bond.
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,Structural Biology Research Center (SBRC), VIB, Brussels, Belgium.,Machine Learning Group, ULB, Brussels, Belgium
| | - Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,Structural Biology Research Center (SBRC), VIB, Brussels, Belgium.,Machine Learning Group, ULB, Brussels, Belgium
| | - Joris Messens
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,Structural Biology Research Center (SBRC), VIB, Brussels, Belgium
| | - Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.,Structural Biology Research Center (SBRC), VIB, Brussels, Belgium
| |
|
167
|
Döring K, Grüning BA, Telukunta KK, Thomas P, Günther S. PubMedPortable: A Framework for Supporting the Development of Text Mining Applications. PLoS One 2016; 11:e0163794. [PMID: 27706202 PMCID: PMC5051953 DOI: 10.1371/journal.pone.0163794] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/22/2016] [Accepted: 09/14/2016] [Indexed: 11/18/2022] Open
Abstract
Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
Affiliation(s)
- Kersten Döring
- Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs University, 79104 Freiburg, Germany
| | - Björn A. Grüning
- Bioinformatics, Institute of Computer Science, Albert-Ludwigs University, 79110 Freiburg, Germany
| | - Kiran K. Telukunta
- Bioinformatics, Institute of Computer Science, Albert-Ludwigs University, 79110 Freiburg, Germany
| | - Philippe Thomas
- Language Technology Lab, German Research Center for Artificial Intelligence, DFKI GmbH, 10559 Berlin, Germany
| | - Stefan Günther
- Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs University, 79104 Freiburg, Germany
| |
|
168
|
Ivanov PC, Liu KKL, Bartsch RP. Focus on the emerging new fields of Network Physiology and Network Medicine. New J Phys 2016; 18:100201. [PMID: 30881198 PMCID: PMC6415921 DOI: 10.1088/1367-2630/18/10/100201] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 05/15/2023]
Abstract
Despite the vast progress and achievements in systems biology and integrative physiology in the last decades, there is still a significant gap in understanding the mechanisms through which (i) genomic, proteomic and metabolic factors and signaling pathways impact vertical processes across cells, tissues and organs leading to the expression of different disease phenotypes and influence the functional and clinical associations between diseases, and (ii) how diverse physiological systems and organs coordinate their functions over a broad range of space and time scales and horizontally integrate to generate distinct physiologic states at the organism level. Two emerging fields, network medicine and network physiology, aim to address these fundamental questions. Novel concepts and approaches derived from recent advances in network theory, coupled dynamical systems, statistical and computational physics show promise to provide new insights into the complexity of physiological structure and function in health and disease, bridging the genetic and sub-cellular level with inter-cellular interactions and communications among integrated organ systems and sub-systems. These advances form first building blocks in the methodological formalism and theoretical framework necessary to address fundamental problems and challenges in physiology and medicine. This 'focus on' issue contains 26 articles representing state-of-the-art contributions covering diverse systems from the sub-cellular to the organism level where physicists have key role in laying the foundations of these new fields.
Affiliation(s)
- Plamen Ch Ivanov
- Keck Laboratory for Network Physiology, Department of Physics, Boston University, Boston, Massachusetts, USA
- Harvard Medical School and Division of Sleep Medicine, Brigham and Women Hospital, Boston, MA 02115, USA
- Institute of Solid State Physics, Bulgarian Academy of Sciences, Sofia 1784, Bulgaria
- (Editor of the ‘focus on’ issue)
| | - Kang K L Liu
- Keck Laboratory for Network Physiology, Department of Physics, Boston University, Boston, Massachusetts, USA
- Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
| | - Ronny P Bartsch
- Department of Physics, Bar-Ilan University, Ramat Gan, 5290002, Israel
| |
|
169
|
Ali SK, Sneha P, Priyadharshini Christy J, Zayed H, George Priya Doss C. Molecular dynamics-based analyses of the structural instability and secondary structure of the fibrinogen gamma chain protein with the D356V mutation. J Biomol Struct Dyn 2016; 35:2714-2724. [PMID: 27677677 DOI: 10.1080/07391102.2016.1229634] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 12/28/2022]
Abstract
Mutations in the fibrinogen gamma chain (FGG) gene have been associated with various disorders, such as dysfibrinogenemia, thrombophilia, and hypofibrinogenemia. A literature survey showed that a residue exchange in fibrinogen Milano I from γ Asp to Val at position 330 impairs fibrin polymerization. The D356V (D330V) mutation located in the C-terminus was predicted to be highly deleterious and to affect the function of the protein. The pathogenicity of the altered gene and changes in protein functions were predicted using in silico methods, such as SIFT, PolyPhen 2, I-Mutant 3.0, Align GV-GD, PhD-SNP, and SNPs&GO. The secondary structure of the mutant protein was unwound by the end of the 50-ns simulation period, and a structural change in the helix-turn transition of the alpha-helical (352-356) region residues was observed. Moreover, a change in the length of the helical region was visualized in the mutant trajectory file, indicating the local transient unfolding of the protein. The obtained computational results suggest that the substitution of the neutral amino acid valine for the acidic amino acid aspartic acid at position 356 results in an unwound conformation within 50 ns, which might contribute to defective polymerization. Our analysis also provides insights into the effect of the conformational change in the D356V (D330V) mutant on protein structure and function.
Affiliation(s)
- Shabana Kouser Ali
- a Department of Integrative Biology, School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - P Sneha
- a Department of Integrative Biology, School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - J Priyadharshini Christy
- a Department of Integrative Biology, School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - Hatem Zayed
- b Biomedical Sciences Program, College of Health Sciences , Qatar University , P.O. Box 2713, Doha , Qatar
| | - C George Priya Doss
- a Department of Integrative Biology, School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| |
|
170
|
Literature-based knowledgebase of pancreatic cancer gene to prioritize the key genes and pathways. J Genet Genomics 2016; 43:569-571. [DOI: 10.1016/j.jgg.2016.04.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/21/2016] [Revised: 03/22/2016] [Accepted: 04/17/2016] [Indexed: 11/19/2022]
|
171
|
Kwon YK, Kim J, Cho KH. Dynamical Robustness against Multiple Mutations in Signaling Networks. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:996-1002. [PMID: 26529781 DOI: 10.1109/tcbb.2015.2495251] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
It is known that the robust behavior of a cellular signaling network is strongly related to the structural characteristics of the network, such as connectivity, the number of feedback loops, and the number of feed-forward loops. Previous studies demonstrated such relationships through dynamical simulations of various random network models. Most of them, however, focused on robustness against a single node mutation. Considering that complex diseases such as cancer are mostly caused by simultaneous dysfunction of multiple genes, it is necessary to investigate the robustness of a network against multiple node mutations. In this paper, we investigated the robustness of a network against multiple node mutations through extensive simulations on the basis of Boolean network models. We found that the robustness against multiple mutations is, in most cases, weaker than the robustness against a single node mutation on average. Moreover, we found that the robustness against multiple mutations is strongly positively correlated with the robustness against a single mutation. The difference between the multiple- and single-mutation robustness became larger as the number of mutated nodes increased or the number of nodes that are robust to a single mutation decreased. We further found that a node with relatively large connectivity or involved in many feedback loops tends to be non-robust against multiple mutations. This finding is supported by the observation that poly-genic disease genes have higher connectivity and are involved in a larger number of feedback loops than mono-genic disease genes in a human signaling network. Together, our study shows that previous studies of a single node mutation can be extended to understand the network dynamics for multiple node mutations.
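The comparison of single- and multiple-mutation robustness described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation setup: it assumes a synchronous Boolean network, models a "mutation" as flipping the initial state of the chosen nodes, and uses a hypothetical two-node toy network. The names `attractor`, `robustness`, and `toy_update` are illustrative only.

```python
from itertools import product


def attractor(state, update):
    """Iterate the synchronous update until a state repeats; return the cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    # States from the first repeated state onward form the attractor.
    return frozenset(trajectory[seen[state]:])


def robustness(n, update, mutated):
    """Fraction of initial states whose attractor is unchanged after
    flipping the initial values of the nodes in `mutated` (assumed model)."""
    states = list(product((0, 1), repeat=n))
    same = 0
    for s in states:
        flipped = tuple(1 - v if i in mutated else v for i, v in enumerate(s))
        same += attractor(s, update) == attractor(flipped, update)
    return same / len(states)


def toy_update(s):
    """Hypothetical toy network: node 0 is a source, node 1 copies node 0."""
    return (s[0], s[0])


print(robustness(2, toy_update, {1}))     # 1.0: node 1 is overwritten by node 0
print(robustness(2, toy_update, {0}))     # 0.0: flipping the source changes the fixed point
print(robustness(2, toy_update, {0, 1}))  # 0.0: the double mutation is no more robust
```

In this toy case the double mutation is as fragile as the least robust single mutation, which is in line with the abstract's observation that robustness against multiple mutations is typically weaker than robustness against a single one.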
|
172
|
Kwon YK. Properties of Boolean dynamics by node classification using feedback loops in a network. BMC Syst Biol 2016; 10:83. [PMID: 27558408 PMCID: PMC4997653 DOI: 10.1186/s12918-016-0322-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/20/2016] [Accepted: 07/14/2016] [Indexed: 11/23/2022]
Abstract
BACKGROUND Biological networks keep their functions robust against perturbations. Many previous studies through simulations or experiments have shown that feedback loop (FBL) structures play an important role in controlling the network robustness without fully explaining how they do it. Hence, there is a pressing need to more rigorously analyze the influence of FBL structures on network robustness. RESULTS In this paper, I propose a novel node classification notion based on the FBL structures involved. More specifically, I classify a node as a no-FBL-in-upstream (NFU) or no-FBL-in-downstream (NFD) node if no feedback loop is involved with any upstream or downstream path of the node, respectively. Based on those definitions, I first prove that every NFU node is eventually frozen in Boolean dynamics. Thus, NFU nodes converge to a fixed value determined by the upstream source nodes. Second, I prove that a network is robust against an arbitrary state perturbation subject to a non-source NFD node. This implies that a network state eventually sustains the attractor despite a perturbation subject to a non-source NFD node. Inspired by this result, I further propose a perturbation-sustainable probability that indicates how likely a perturbation effect is to be sustained through propagations. I show that genes with a high perturbation-sustainable probability are likely to be essential, disease, and drug-target genes in large human signaling networks. CONCLUSION Taken together, these results will promote understanding of the effects of FBL on network robustness in a more rigorous manner.
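The NFU/NFD node classification defined in this abstract can be sketched on a small directed graph. This is one reading of the definitions, not the paper's formal statement: it assumes that a node is NFU when neither it nor any node that can reach it lies on a directed cycle, and NFD when neither it nor any node reachable from it lies on a directed cycle. The function names and the example edge list are hypothetical.

```python
from collections import defaultdict


def reachability(edges):
    """Return (nodes, reach) where reach[n] is the set of nodes reachable
    from n by following one or more directed edges."""
    succ = defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        nodes.update((u, v))

    reach = {}
    for start in nodes:
        seen, stack = set(), list(succ[start])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        reach[start] = seen
    return nodes, reach


def classify_nfu_nfd(edges):
    """Classify nodes as NFU / NFD under the assumptions stated above."""
    nodes, reach = reachability(edges)
    # A node lies on a feedback loop if it can reach itself.
    on_cycle = {n for n in nodes if n in reach[n]}
    nfu, nfd = set(), set()
    for n in nodes:
        upstream = {m for m in nodes if n in reach[m]} | {n}
        downstream = reach[n] | {n}
        if not upstream & on_cycle:
            nfu.add(n)
        if not downstream & on_cycle:
            nfd.add(n)
    return nfu, nfd


# Hypothetical network: s -> a -> b <-> c -> d (one feedback loop between b and c)
example_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
nfu_nodes, nfd_nodes = classify_nfu_nfd(example_edges)
print(sorted(nfu_nodes))  # ['a', 's']
print(sorted(nfd_nodes))  # ['d']
```

Here `s` and `a` sit entirely upstream of the loop, so they are NFU, while `d` sits entirely downstream of it and is the only NFD node.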
Affiliation(s)
- Yung-Keun Kwon
- School of Electrical Engineering, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan, 44610, Republic of Korea.
| |
|
173
|
iFish: predicting the pathogenicity of human nonsynonymous variants using gene-specific/family-specific attributes and classifiers. Sci Rep 2016; 6:31321. [PMID: 27527004 PMCID: PMC4985647 DOI: 10.1038/srep31321] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/05/2016] [Accepted: 07/18/2016] [Indexed: 12/30/2022] Open
Abstract
Accurate prediction of the pathogenicity of genomic variants, especially nonsynonymous single nucleotide variants (nsSNVs), is essential in biomedical research and clinical genetics. Most current prediction methods build a generic classifier for all genes. However, different genes and gene families have different features. We investigated whether gene-specific and family-specific customized classifiers could improve prediction accuracy. Customized gene-specific and family-specific attributes were selected with AIC, BIC, and LASSO, and Support Vector Machine classifiers were generated for 254 genes and 152 gene families, covering a total of 5,985 genes. Our results showed that the customized attributes reflected key features of the genes and gene families, and the customized classifiers achieved higher prediction accuracy than the generic classifier. The customized classifiers and the generic classifier for other genes and families were integrated into a new tool named iFish (integrated Functional inference of SNVs in human, http://ifish.cbi.pku.edu.cn). iFish outperformed other methods on benchmark datasets as well as on prioritization of candidate causal variants from whole exome sequencing. iFish provides a user-friendly web-based interface and supports other functionalities such as integration of genetic evidence. iFish would facilitate high-throughput evaluation and prioritization of nsSNVs in human genetics research.
|
174
|
Kamaraj B, Purohit R. Mutational Analysis on Membrane Associated Transporter Protein (MATP) and Their Structural Consequences in Oculocutaeous Albinism Type 4 (OCA4)-A Molecular Dynamics Approach. J Cell Biochem 2016; 117:2608-19. [PMID: 27019209 DOI: 10.1002/jcb.25555] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 01/08/2016] [Accepted: 03/24/2016] [Indexed: 12/11/2022]
Abstract
Oculocutaneous albinism type IV (OCA4) is an autosomal recessive disorder characterized by reduced biosynthesis of melanin pigment in the skin, hair, and eyes, and caused by mutations in the membrane-associated transporter protein (MATP) encoded by the SLC45A2 gene. The MATP protein consists of 530 amino acids, contains 12 putative transmembrane domains, plays an important role in pigmentation, and probably functions as a membrane transporter in melanosomes. We scrutinized the OCA4 disease-associated mutations in the SLC45A2 gene and their structural consequences. To understand the atomic arrangement in 3D space, the native and mutant structures were modeled. Further, the structural behavior of the native and mutant MATP proteins was investigated by a molecular dynamics simulation (MDS) approach in an explicit lipid and water background. We found Y317C to be the most deleterious disease-associated SNP in the SLC45A2 gene. In MDS, mutant MATP proteins lost stability and became more flexible, which alters their structural conformation and function. This phenomenon indicates a significant role in inducing OCA4. Our study explores the molecular mechanism of the MATP protein upon mutation at the atomic level and may help the field of pharmacogenomics develop personalized medicine for OCA4. J. Cell. Biochem. 117: 2608-2619, 2016. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Balu Kamaraj
- Research Group PLASMANT, Department of Chemistry, University of Antwerp, Universiteitsplein 1, 2610, Wilrijk-Antwerp, Belgium
| | - Rituraj Purohit
- Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India.
| |
|
175
|
Griffon N, Schuers M, Dhombres F, Merabti T, Kerdelhué G, Rollin L, Darmoni SJ. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge. BMC Med Inform Decis Mak 2016; 16:101. [PMID: 27484923 PMCID: PMC4970261 DOI: 10.1186/s12911-016-0333-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/26/2016] [Accepted: 07/09/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. METHODS Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. RESULTS For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). CONCLUSION The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for either precision- or recall-oriented literature search.
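The three evaluation measures named in the METHODS section can be illustrated with a short sketch. It assumes the common definitions: precision is the share of a query's retrieved citations judged relevant, relative recall is the share of all relevant citations found by either query that this query retrieved, and the F-measure is their harmonic mean. The abstract does not spell out the exact formulas used in the study, and the citation identifiers below are invented toy values.

```python
def evaluate_query(retrieved, relevant_pool):
    """Return (precision, relative_recall, f_measure) for one query.

    retrieved      -- set of citation IDs returned by the query
    relevant_pool  -- set of relevant citation IDs found by any compared query
    """
    hits = retrieved & relevant_pool
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    relative_recall = len(hits) / len(relevant_pool) if relevant_pool else 0.0
    denom = precision + relative_recall
    f_measure = 2 * precision * relative_recall / denom if denom else 0.0
    return precision, relative_recall, f_measure


# Toy data (hypothetical): two queries and the reviewers' relevance judgments.
expert_hits = {1, 2, 3, 4}
terminology_hits = {3, 4, 5, 6, 7}
judged_relevant = {1, 3, 4, 5}

# Relative recall is measured against relevant citations retrieved by either query.
pool = judged_relevant & (expert_hits | terminology_hits)

expert_scores = evaluate_query(expert_hits, pool)
terminology_scores = evaluate_query(terminology_hits, pool)
print(expert_scores)  # (0.75, 0.75, 0.75)
print(terminology_scores)
```

Relative recall is used instead of absolute recall because the full set of relevant PubMed citations for a disease is unknown; only the pooled results of the compared queries can be judged.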
Affiliation(s)
- N Griffon
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France.,INSERM, U1142, LIMICS, 75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06 UMR_S 1142, LIMICS, 75006, Paris, France; Univ Paris 13, Sorbonne Paris Cité, LIMICS (UMR_S 1142), 93430, Villetaneuse, France.
| | - M Schuers
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France.,Department of Family Practice, Rouen University, Rouen, France
| | - F Dhombres
- INSERM, U1142, LIMICS, 75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06 UMR_S 1142, LIMICS, 75006, Paris, France; Univ Paris 13, Sorbonne Paris Cité, LIMICS (UMR_S 1142), 93430, Villetaneuse, France.,Service de Médecine Fœtale, Hôpital Trousseau - Hôpitaux Universitaires de l'Est Parisien (APHP), Université Pierre et Marie Curie, Paris, France
| | - T Merabti
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France
| | - G Kerdelhué
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France
| | - L Rollin
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France.,Department of Occupational Medicine, Rouen University Hospital, Rouen, France
| | - S J Darmoni
- Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108, Rouen University, 76031, Rouen Cedex, France.,INSERM, U1142, LIMICS, 75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06 UMR_S 1142, LIMICS, 75006, Paris, France; Univ Paris 13, Sorbonne Paris Cité, LIMICS (UMR_S 1142), 93430, Villetaneuse, France
| |
|
176
|
A gene browser of colorectal cancer with literature evidence and pre-computed regulatory information to identify key tumor suppressors and oncogenes. Sci Rep 2016; 6:30624. [PMID: 27477450 PMCID: PMC4967895 DOI: 10.1038/srep30624] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/02/2016] [Accepted: 07/06/2016] [Indexed: 02/07/2023] Open
Abstract
Colorectal cancer (CRC) is a cancer of growing incidence that is associated with a high mortality rate worldwide. There is a poor understanding of the heterogeneity of CRC with regard to causative genetic mutations and gene regulatory mechanisms. Previous studies have identified several susceptibility genes in small-scale experiments. However, the information has not been comprehensively and systematically compiled and interpreted. In this study, we constructed the gbCRC, the first literature-based gene resource for investigating CRC-related human genes. The features of our database include: (i) manual curation of experimentally-verified genes reported in the literature; (ii) comprehensive integration of five reliable data sources; and (iii) pre-computed regulatory patterns involving transcription factors, microRNAs and long non-coding RNAs. In total, 2067 genes associated with 2819 PubMed abstracts were compiled. Comprehensive functional annotations are associated with all the genes, including gene expression profiles, homologous genes in other model species, protein-protein interactions, somatic mutations, and potential methylation sites. These comprehensive annotations and this pre-computed regulatory information highlight the importance of the gbCRC with regard to the unexplored regulatory network of CRC. This information is available in a plain text format that is free to download.
|
177
|
Vieira Braga FA, Teichmann SA, Chen X. Genetics and immunity in the era of single-cell genomics. Hum Mol Genet 2016; 25:R141-R148. [PMID: 27412011 PMCID: PMC5036872 DOI: 10.1093/hmg/ddw192] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/06/2016] [Accepted: 06/15/2016] [Indexed: 12/28/2022] Open
Abstract
Recent developments in the field of single-cell genomics (SCG) are changing our understanding of how functional phenotypes of cell populations emerge from the behaviour of individual cells. Some of the applications of SCG include the discovery of new gene networks and novel cell subpopulations, fine mapping of transcription kinetics, and the relationships between cell clonality and their functional phenotypes. Immunology is one of the fields that is benefiting the most from such advancements, providing us with completely new insights into mammalian immunity. In this review, we start by covering new immunological insights originating from the use of single-cell genomic tools, specifically single-cell RNA-sequencing. Furthermore, we discuss how new genetic study designs are starting to explain inter-individual variation in the immune response. We conclude with a perspective on new multi-omics technologies capable of integrating several readouts from the same single cell and how such techniques might push our biological understanding of mammalian immunity to a new level.
Affiliation(s)
| | - Sarah A Teichmann
- Wellcome Trust Sanger Institute European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) Cavendish Laboratory, Cambridge University, Cambridge, UK
| | - Xi Chen
- Wellcome Trust Sanger Institute
| |
|
178
|
Kaplun A, Krull M, Lakshman K, Matys V, Lewicki B, Hogan JD. Establishing and validating regulatory regions for variant annotation and expression analysis. BMC Genomics 2016; 17 Suppl 2:393. [PMID: 27357948 PMCID: PMC4928138 DOI: 10.1186/s12864-016-2724-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND The regulatory effect of inherited or de novo genetic variants occurring in promoters as well as in transcribed or even coding gene regions is gaining greater recognition as a contributing factor to disease processes in addition to mutations affecting protein functionality. Thousands of such regulatory mutations are already recorded in HGMD, OMIM, ClinVar and other databases containing published disease causing and associated mutations. It is therefore important to properly annotate genetic variants occurring in experimentally verified and predicted transcription factor binding sites (TFBS) that could thus influence the factor binding event. Selection of the promoter sequence used is an important factor in the analysis as it directly influences the composition of the sequence available for transcription factor binding analysis. RESULTS In this study we first establish genomic regions likely to be involved in regulation of gene expression. TRANSFAC uses a method of virtual transcription start sites (vTSS) calculation to define the best supported promoter for a gene. We have performed a comparison of the virtually calculated promoters between the best supported and secondary promoters in hg19 and hg38 reference genomes to test and validate the approach. Next we create and utilize a workflow for systematic analysis of causal disease associated variants in TFBS using Genome Trax and TRANSFAC databases. A total of 841 and 736 experimentally verified TFBSs within best supported promoters were mapped over HGMD and ClinVar mutation sites respectively. Tens of thousands of predicted ChIP-Seq derived TFBSs were mapped over mutations as well. We have further analyzed some of these mutations for potential gain or loss in transcription factor binding. CONCLUSIONS We have confirmed the validity of TRANSFAC's approach to define the best supported promoters and established a workflow of their use in annotation of regulatory genetic variants.
Affiliation(s)
- Alexander Kaplun
- QIAGEN Bioinformatics, 35 Gatehouse Drive, Waltham, MA, 02451, USA.
| | - Mathias Krull
- QIAGEN Bioinformatics, 35 Gatehouse Drive, Waltham, MA, 02451, USA
| | | | - Volker Matys
- QIAGEN Bioinformatics, 35 Gatehouse Drive, Waltham, MA, 02451, USA
| | - Birgit Lewicki
- QIAGEN Bioinformatics, 35 Gatehouse Drive, Waltham, MA, 02451, USA
| | - Jennifer D Hogan
- QIAGEN Bioinformatics, 35 Gatehouse Drive, Waltham, MA, 02451, USA
| |
|
179
|
Zhao M, Chen L, Liu Y, Qu H. GCGene: a gene resource for gastric cancer with literature evidence. Oncotarget 2016; 7:33983-93. [PMID: 27127885 PMCID: PMC5085132 DOI: 10.18632/oncotarget.9030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/23/2015] [Accepted: 04/16/2016] [Indexed: 12/31/2022] Open
Abstract
Gastric cancer (GC) is the fifth most common cancer and third leading cause of cancer-related deaths worldwide. Its lethality primarily stems from a lack of detection strategies for early stages of GC and a lack of noninvasive detection strategies for advanced stages. The development of early diagnostic biomarkers largely depends on understanding the biological pathways and regulatory mechanisms associated with putative GC genes. Unfortunately, the GC-implicated genes that have been identified thus far are scattered among thousands of published studies, and no systematic summary is available, which hinders the development of a large-scale genetic screen. To provide a publicly accessible resource to meet this need, we constructed a literature-based database GCGene (Gastric Cancer Gene database) with comprehensive annotations supported by a user-friendly website. In the current release, we have collected 1,815 unique human genes including 1,678 protein-coding and 137 non-coding genes curated from extensive examination of 3,142 PubMed abstracts. The resulting database has a convenient web-based interface to facilitate both textual and sequence-based searches. All curated genes in GCGene are downloadable for advanced bioinformatics data mining. Gene prioritization was performed to rank the relative relevance of these genes in GC development. The 100 top-ranked genes are highly mutated according to the cohort of published studies we reviewed. By conducting a network analysis of these top-ranked GC-associated genes in the human interactome, we were able to identify strong links between 8 highly connected genes with low expression and patient survival time. GCGene is freely available to academic users at http://gcgene.bioinfo-minzhao.org/.
Affiliation(s)
- Min Zhao
- School of Engineering, Faculty of Science, Health, Education and Engineering, University of The Sunshine Coast, Maroochydore DC, Queensland, Australia
| | - Luming Chen
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, P.R. China
| | - Yining Liu
- School of Engineering, Faculty of Science, Health, Education and Engineering, University of The Sunshine Coast, Maroochydore DC, Queensland, Australia
| | - Hong Qu
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, P.R. China
| |
|
180
|
Thirumal Kumar D, George Priya Doss C, Sneha P, Tayubi IA, Siva R, Chakraborty C, Magesh R. Influence of V54M mutation in giant muscle protein titin: a computational screening and molecular dynamics approach. J Biomol Struct Dyn 2016; 35:917-928. [PMID: 27125723 DOI: 10.1080/07391102.2016.1166456] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 01/26/2023]
Abstract
Recent genetic studies have revealed the impact of mutations in genes associated with cardiac sarcomere components, which lead to dilated cardiomyopathy (DCM). The cardiac sarcomere is composed of thick and thin filaments and a giant muscle protein known as titin or connectin. Titin interacts with T-cap/telethonin in the Z-line region and plays a vital role in regulating sarcomere assembly. Initially, we screened all the variants associated with the giant protein titin and analyzed their impact with the aid of pathogenicity and stability prediction methods. The V54M mutation, which lies in the hydrophobic core region of the protein and is associated with an abnormal clinical phenotype leading to DCM, was selected for further analysis. To address this issue, we mapped the deleterious mutant V54M, modeled the mutant protein complex, and deciphered the impact of the mutation on binding with its partner telethonin in the titin crystal structure (PDB ID: 1YA5) with the aid of docking analysis. Furthermore, two molecular dynamics simulation runs were initiated to understand the mechanistic action of the V54M mutation in altering the protein structure, dynamics, and stability. According to the results obtained from the repeated 50 ns trajectory files, the overall effect of the V54M mutation was destabilizing, and a bend-to-coil transition in the secondary structure was observed. Furthermore, MMPBSA analysis indicated that V54M, found in the Z-line region of titin, decreases the binding affinity of titin for the Z-line protein T-cap/telethonin, thereby hindering the protein-protein interaction.
Affiliation(s)
- D Thirumal Kumar
- a School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - C George Priya Doss
- a School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - P Sneha
- a School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - Iftikhar Aslam Tayubi
- a School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India.,b Faculty of Computing and Information Technology , King Abdulaziz University , Rabigh 21911 , Saudi Arabia
| | - R Siva
- a School of Biosciences and Technology , VIT University , Vellore , Tamil Nadu 632014 , India
| | - Chiranjib Chakraborty
- c Department of Bio-informatics , School of Computer and Information Sciences, Galgotias University , Greater Noida , Uttar Pradesh 201306 , India
| | - R Magesh
- d Faculty of Biomedical Sciences, Technology & Research, Department of Biotechnology , Sri Ramachandra University , Chennai , Tamil Nadu 600116 , India
| |
|
181
|
Yang J, Wu SJ, Yang SY, Peng JW, Wang SN, Wang FY, Song YX, Qi T, Li YX, Li YY. DNetDB: The human disease network database based on dysfunctional regulation mechanism. BMC Syst Biol 2016; 10:36. [PMID: 27209279 PMCID: PMC4875653 DOI: 10.1186/s12918-016-0280-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/31/2015] [Accepted: 05/05/2016] [Indexed: 11/18/2022]
Abstract
The study of disease similarity provides new insights into disease taxonomy and pathogenesis, which play a guiding role in diagnosis and treatment. Early studies were limited to estimating disease similarities based on clinical manifestations, disease-related genes, medical vocabulary concepts or registry data, which were inevitably biased toward well-studied diseases and offered little chance of discovering novel disease relationships. Genome-scale expression data offer another angle on this problem, since simultaneous measurement of the expression of thousands of genes allows for the exploration of gene transcriptional regulation, which is believed to be crucial to biological functions. Although methods based on differential expression analysis have the potential to reveal new disease relationships, it is difficult for them to unravel the upstream dysregulation mechanisms of diseases. We therefore estimated disease similarities from gene expression data by using differential coexpression analysis, a recently emerging method that has been shown to have greater potential to capture dysfunctional regulation mechanisms than differential expression analysis. A total of 1,326 disease relationships among 108 diseases were identified, and the relevant information constituted the human disease network database (DNetDB). Benefiting from the use of differential coexpression analysis, the potential common dysfunctional regulation mechanisms shared by disease pairs (i.e. disease relationships) were extracted and presented. Statistical indicators, common disease-related genes and drugs shared by disease pairs were also included in DNetDB. In total, 1,326 disease relationships among 108 diseases, 5,598 pathways, 7,357 disease-related genes and 342 disease drugs are recorded in DNetDB, among which 3,762 genes and 148 drugs are shared by at least two diseases.
DNetDB is the first database focusing on disease similarity from the viewpoint of gene regulation mechanism. It provides an easy-to-use web interface to search and browse the disease relationships and thus helps to systematically investigate etiology and pathogenesis, perform drug repositioning, and design novel therapeutic interventions. Database URL: http://app.scbit.org/DNetDB/#.
Affiliation(s)
- Jing Yang
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P.R. China
| | - Su-Juan Wu
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China
| | - Shao-You Yang
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Jia-Wei Peng
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Shi-Nuo Wang
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Fu-Yan Wang
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Yu-Xing Song
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Ting Qi
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China.,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China
| | - Yi-Xue Li
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China. .,Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P.R. China. .,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China. .,Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
| | - Yuan-Yuan Li
- Shanghai Center for Bioinformation Technology, Shanghai, 200235, P.R. China. .,Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China. .,Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
| |
|
182
|
Zhang Q, Nogales-Cadenas R, Lin JR, Zhang W, Cai Y, Vijg J, Zhang ZD. Systems-level analysis of human aging genes shed new light on mechanisms of aging. Hum Mol Genet 2016; 25:2934-2947. [PMID: 27179790 DOI: 10.1093/hmg/ddw145] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/07/2016] [Revised: 04/07/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022] Open
Abstract
Although studies over the last decades have firmly connected a number of genes and molecular pathways to aging, the aging process as a whole still remains poorly understood. To gain novel insights into the mechanisms underlying aging, instead of considering aging genes individually, we studied their characteristics at the systems level in the context of biological networks. We calculated a comprehensive set of network characteristics for human aging-related genes from the GenAge database. By comparing them with other functional groups of genes, we identified a robust group of aging-specific network characteristics. To find the structural basis and the molecular mechanisms underlying this aging-related network specificity, we also analyzed protein domain interactions and gene expression patterns across different tissues. Our study revealed that aging genes not only tend to be network hubs, playing important roles in communication among different functional modules or pathways, but also are more likely to physically interact and be co-expressed with essential genes. The high expression of aging genes across a large number of tissue types also points to a high level of connectivity among aging genes. Unexpectedly, contrary to the depletion of interactions among hub genes in biological networks, we observed close interactions among aging hubs, which render the aging subnetworks vulnerable to random attacks and thus may contribute to the aging process. Comparison across species reveals the evolution of the aging subnetwork. As organisms become more complex, the complexity of their aging mechanisms increases and their aging hub genes become more functionally connected.
Affiliation(s)
| | | | | | | | | | - Jan Vijg
- Department of Genetics.,Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY, USA
| | | |
|
183
|
Singhal A, Simmons M, Lu Z. Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature. J Am Med Inform Assoc 2016; 23:766-72. [PMID: 27121612 DOI: 10.1093/jamia/ocw041] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 09/15/2015] [Accepted: 02/19/2016] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine. MATERIALS AND METHODS We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora. RESULTS The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our proposed approach gains significant improvement over the previous state of the art and obtains F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively. DISCUSSION To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature. CONCLUSIONS The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.
Affiliation(s)
- Ayush Singhal
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
| | - Michael Simmons
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
| |
|
184
|
Nguyen VN, Huang KY, Weng JTY, Lai KR, Lee TY. UbiNet: an online resource for exploring the functional associations and regulatory networks of protein ubiquitylation. Database (Oxford) 2016; 2016:baw054. [PMID: 27114492 PMCID: PMC4843525 DOI: 10.1093/database/baw054] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/18/2015] [Accepted: 03/20/2016] [Indexed: 12/19/2022]
Abstract
Protein ubiquitylation catalyzed by E3 ubiquitin ligases is crucial in the regulation of many cellular processes. Owing to the high throughput of mass spectrometry-based proteomics, a number of methods have been developed for the experimental determination of ubiquitylation sites, leading to a large collection of ubiquitylation data. However, there exist no resources for the exploration of E3-ligase-associated regulatory networks for ubiquitylated proteins in humans. Therefore, the UbiNet database was developed to provide a full investigation of protein ubiquitylation networks by incorporating experimentally verified E3 ligases, ubiquitylated substrates and protein-protein interactions (PPIs). To date, UbiNet has accumulated 43 948 experimentally verified ubiquitylation sites from 14 692 human ubiquitylated proteins. Additionally, we have manually curated 499 E3 ligases as well as two E1 activating and 46 E2 conjugating enzymes. To delineate the regulatory networks among E3 ligases and ubiquitylated proteins, a total of 430 530 PPIs were integrated into UbiNet for the exploration of ubiquitylation networks with an interactive network viewer. A case study demonstrated that UbiNet was able to decipher a scheme for the ubiquitylation of tumor proteins p63 and p73 that is consistent with their functions. Although the essential role of Mdm2 in p53 regulation is well studied, UbiNet revealed that Mdm2 and additional E3 ligases might be implicated in the regulation of other tumor proteins by protein ubiquitylation. Moreover, UbiNet could identify potential substrates for a specific E3 ligase based on PPIs and substrate motifs. With limited knowledge about the mechanisms through which ubiquitylated proteins are regulated by E3 ligases, UbiNet offers users an effective means for conducting preliminary analyses of protein ubiquitylation.
The UbiNet database is now freely accessible via http://csb.cse.yzu.edu.tw/UbiNet/. The content is regularly updated with the literature and newly released data. Database URL: http://csb.cse.yzu.edu.tw/UbiNet/.
Affiliation(s)
- Van-Nui Nguyen
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan University of Information and Communication Technology, Thai Nguyen University, Vietnam and
| | - Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
| | - Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| | - K Robert Lai
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
| |
|
185
|
Di Fruscio G, Schulz A, De Cegli R, Savarese M, Mutarelli M, Parenti G, Banfi S, Braulke T, Nigro V, Ballabio A. Lysoplex: An efficient toolkit to detect DNA sequence variations in the autophagy-lysosomal pathway. Autophagy 2016; 11:928-38. [PMID: 26075876 PMCID: PMC4502703 DOI: 10.1080/15548627.2015.1043077] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/15/2022] Open
Abstract
The autophagy-lysosomal pathway (ALP) regulates cell homeostasis and plays a crucial role in human diseases, such as lysosomal storage disorders (LSDs) and common neurodegenerative diseases. Therefore, the identification of DNA sequence variations in genes involved in this pathway and their association with human diseases would have a significant impact on health. To this aim, we developed Lysoplex, a targeted next-generation sequencing (NGS) approach, which allowed us to obtain a uniform and accurate coding sequence coverage of a comprehensive set of 891 genes involved in lysosomal, endocytic, and autophagic pathways. Lysoplex was successfully validated on 14 different types of LSDs and then used to analyze 48 mutation-unknown patients with a clinical phenotype of neuronal ceroid lipofuscinosis (NCL), a genetically heterogeneous subtype of LSD. Lysoplex allowed us to identify pathogenic mutations in 67% of patients, most of whom had been unsuccessfully analyzed by several sequencing approaches. In addition, in 3 patients, we found potential disease-causing variants in novel NCL candidate genes. We then compared the variant detection power of Lysoplex with data derived from public whole exome sequencing (WES) efforts. On average, a 50% higher number of validated amino acid changes and truncating variations per gene were identified. Overall, we identified 61 truncating sequence variations and 488 missense variations with a high probability to cause loss of function in a total of 316 genes. Interestingly, some loss-of-function variations of genes involved in the ALP pathway were found in homozygosity in the normal population, suggesting that their role is not essential. Thus, Lysoplex provided a comprehensive catalog of sequence variants in ALP genes and allows the assessment of their relevance in cell biology as well as their contribution to human disease.
|
186
|
Piñero J, Berenstein A, Gonzalez-Perez A, Chernomoretz A, Furlong LI. Uncovering disease mechanisms through network biology in the era of Next Generation Sequencing. Sci Rep 2016; 6:24570. [PMID: 27080396 PMCID: PMC4832203 DOI: 10.1038/srep24570] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/11/2015] [Accepted: 03/31/2016] [Indexed: 12/25/2022] Open
Abstract
Characterizing the behavior of disease genes in the context of biological networks has the potential to shed light on disease mechanisms, and to reveal both new candidate disease genes and therapeutic targets. Previous studies addressing the network properties of disease genes have produced contradictory results. Here we have explored the causes of these discrepancies and assessed the relationship between the network roles of disease genes and their tolerance to deleterious germline variants in human populations, leveraging the abundance of interactome resources, a comprehensive catalog of disease genes, and exome variation data. We found that the most salient network features of disease genes are driven by cancer genes and that genes related to different types of diseases play network roles whose centrality is inversely correlated with their tolerance to likely deleterious germline mutations. This proved to be a multiscale signature, including global, mesoscopic and local network centrality features. Cancer driver genes, the most sensitive to deleterious variants, occupy the most central positions, followed by dominant disease genes and then by recessive disease genes, which are tolerant to variants and isolated within their network modules.
Affiliation(s)
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), DCEXS, Pompeu Fabra University (UPF). C/Dr. Aiguader, 88. 08003- Barcelona, Spain
| | - Ariel Berenstein
- Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires. Pabellón 1, Ciudad Universitaria, Buenos Aires, Argentina.,Instituto de Física de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Pabellón 1, Ciudad Universitaria, Buenos Aires, Argentina
| | - Abel Gonzalez-Perez
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), DCEXS, Pompeu Fabra University (UPF). C/Dr. Aiguader, 88. 08003- Barcelona, Spain
| | - Ariel Chernomoretz
- Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires. Pabellón 1, Ciudad Universitaria, Buenos Aires, Argentina.,Instituto de Física de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Pabellón 1, Ciudad Universitaria, Buenos Aires, Argentina.,Laboratorio de Biología de Sistemas Integrativa, Fundación Instituto Leloir, Buenos Aires, Argentina
| | - Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), DCEXS, Pompeu Fabra University (UPF). C/Dr. Aiguader, 88. 08003- Barcelona, Spain
| |
|
187
|
Mahmood ASMA, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One 2016; 11:e0152725. [PMID: 27073839 PMCID: PMC4830514 DOI: 10.1371/journal.pone.0152725] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 04/17/2015] [Accepted: 03/19/2016] [Indexed: 11/22/2022] Open
Abstract
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
Affiliation(s)
- A. S. M. Ashique Mahmood
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
| | - Tsung-Jung Wu
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, District of Columbia, United States of America
| | - Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, District of Columbia, United States of America
- McCormick Genomic and Proteomic Center, George Washington University, Washington, District of Columbia, United States of America
| | - K. Vijay-Shanker
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
| |
|
188
|
Li J, Batcha AMN, Grüning B, Mansmann UR. An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology. Cancer Inform 2016; 14:87-107. [PMID: 27081306 PMCID: PMC4827795 DOI: 10.4137/cin.s30793] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/22/2015] [Revised: 03/02/2016] [Accepted: 03/17/2016] [Indexed: 12/23/2022] Open
Abstract
Next-generation sequencing (NGS) technologies that have advanced rapidly in the past few years possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map for the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS is facing several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflows. In order to address these challenges, we conducted a literature search and summarized a four-stage NGS workflow, providing a systematic review of NGS-based analysis, explaining the strengths and weaknesses of diverse NGS-based software tools, and elucidating its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide a minimal structural layout required for NGS data storage and reproducibility.
Affiliation(s)
- Jian Li
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany.; German Cancer Consortium (DKTK), Heidelberg, Germany.; German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Aarif Mohamed Nazeer Batcha
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany.; German Cancer Consortium (DKTK), Heidelberg, Germany.; German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany.; Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Ulrich R Mansmann
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany.; German Cancer Consortium (DKTK), Heidelberg, Germany
| |
|
189
|
Turnaev II, Rasskazov DA, Arkova OV, Ponomarenko MP, Ponomarenko PM, Savinkova LK, Kolchanov NA. Hypothetical SNP markers that significantly affect the affinity of the TATA-binding protein to VEGFA, ERBB2, IGF1R, FLT1, KDR, and MET oncogene promoters as chemotherapy targets. Mol Biol 2016. [DOI: 10.1134/s0026893316010209] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
|
190
|
James RA, Campbell IM, Chen ES, Boone PM, Rao MA, Bainbridge MN, Lupski JR, Yang Y, Eng CM, Posey JE, Shaw CA. A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics. Genome Med 2016; 8:13. [PMID: 26838676 PMCID: PMC4736244 DOI: 10.1186/s13073-016-0261-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 10/13/2015] [Accepted: 01/05/2016] [Indexed: 12/22/2022] Open
Abstract
Background: Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives.
Methods: Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries—in essence, generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes.
Results: We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. Our tool assigned to clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants.
Conclusions: Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-016-0261-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Regis A James
  - Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, 77030, USA
- Ian M Campbell
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Edward S Chen
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Philip M Boone
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Mitchell A Rao
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Matthew N Bainbridge
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- James R Lupski
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA; Department of Pediatrics, Texas Children's Hospital, Houston, TX, USA
- Yaping Yang
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Miraca Genetics Laboratories, Baylor College of Medicine, Houston, TX, USA
- Christine M Eng
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Miraca Genetics Laboratories, Baylor College of Medicine, Houston, TX, USA
- Jennifer E Posey
  - Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Chad A Shaw
  - Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA; Department of Statistics, Rice University, Houston, TX, 77005, USA
|
191
|
Abstract
The increasing cost of drug development together with a significant drop in the number of new drug approvals raises the need for innovative approaches for target identification and efficacy prediction. Here, we take advantage of our increasing understanding of the network-based origins of diseases to introduce a drug-disease proximity measure that quantifies the interplay between drug targets and diseases. By correcting for the known biases of the interactome, proximity helps us uncover the therapeutic effect of drugs, as well as to distinguish palliative from effective treatments. Our analysis of 238 drugs used in 78 diseases indicates that the therapeutic effect of drugs is localized in a small network neighborhood of the disease genes and highlights efficacy issues for drugs used in Parkinson and several inflammatory disorders. Finally, network-based proximity allows us to predict novel drug-disease associations that offer unprecedented opportunities for drug repurposing and the detection of adverse effects. Attempts to predict novel use for existing drugs rarely consider information on the impact on the genes perturbed in a given disease. Here, the authors present a novel network-based drug-disease proximity measure that provides insight on gene specific therapeutic effect of drugs and may facilitate drug repurposing.
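The abstract above does not spell out how proximity is computed. As an illustration only, the sketch below assumes a "closest" distance: for each drug target, take the shortest-path length to the nearest disease gene in the interactome, then average over targets. The toy network and the node names (`T1`, `D1`, and so on) are hypothetical, and the bias correction mentioned in the abstract (comparison against randomized gene sets) is deliberately left out.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(graph, targets, disease_genes):
    """Average, over drug targets, of the distance to the nearest disease gene.

    Assumed definition for illustration; not taken from the abstract.
    """
    total = 0
    for target in targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            raise ValueError(f"no disease gene reachable from {target}")
        total += min(reachable)
    return total / len(targets)


# Hypothetical undirected toy interactome (adjacency lists), a simple chain:
# T1 - A - D1 - D2 - T2
toy_interactome = {
    "T1": ["A"],
    "A": ["T1", "D1"],
    "D1": ["A", "D2"],
    "D2": ["D1", "T2"],
    "T2": ["D2"],
}

proximity = closest_proximity(toy_interactome, ["T1", "T2"], ["D1", "D2"])
print(proximity)
```

A smaller value means the drug's targets sit nearer to the disease genes. In practice the raw distance would be compared against a reference distribution from randomly chosen gene sets before drawing any conclusion.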
|
192
|
Gawad C, Koh W, Quake SR. Single-cell genome sequencing: current state of the science. Nat Rev Genet 2016; 17:175-88. [PMID: 26806412 DOI: 10.1038/nrg.2015.16] [Citation(s) in RCA: 875] [Impact Index Per Article: 109.4] [Indexed: 12/14/2022]
Abstract
The field of single-cell genomics is advancing rapidly and is generating many new insights into complex biological systems, ranging from the diversity of microbial ecosystems to the genomics of human cancer. In this Review, we provide an overview of the current state of the field of single-cell genome sequencing. First, we focus on the technical challenges of making measurements that start from a single molecule of DNA, and then explore how some of these recent methodological advancements have enabled the discovery of unexpected new biology. Areas highlighted include the application of single-cell genomics to interrogate microbial dark matter and to evaluate the pathogenic roles of genetic mosaicism in multicellular organisms, with a focus on cancer. We then attempt to predict advances we expect to see in the next few years.
Affiliation(s)
- Charles Gawad
  - Departments of Oncology and Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Winston Koh
  - Departments of Bioengineering and Applied Physics, Stanford University, Stanford, California 94304, USA; Howard Hughes Medical Institute, Stanford University, California 94304, USA
- Stephen R Quake
  - Departments of Bioengineering and Applied Physics, Stanford University, Stanford, California 94304, USA; Howard Hughes Medical Institute, Stanford University, California 94304, USA
|
193
|
Abstract
Network alignment has become a standard tool in comparative biology, allowing the inference of protein function, interaction, and orthology. However, current alignment techniques are based on topological properties of networks and do not take into account their functional implications. Here we propose, for the first time, an algorithm to align two metabolic networks by taking advantage of their coupled metabolic models. These models allow us to assess the functional implications of genes or reactions, captured by the metabolic fluxes that are altered following their deletion from the network. Such implications may spread far beyond the region of the network where the gene or reaction lies. We apply our algorithm to align metabolic networks from various organisms, ranging from bacteria to humans, showing that our alignment can reveal functional orthology relations that are missed by conventional topological alignments.
Affiliation(s)
- Arnon Mazza
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Allon Wagner
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel; Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, California
- Eytan Ruppin
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel; The Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Center for Bioinformatics and Computational Biology and Department of Computer Science, University of Maryland, College Park, Maryland
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
194
|
Zhao M, Liu Y, O'Mara TA. ECGene: A Literature-Based Knowledgebase of Endometrial Cancer Genes. Hum Mutat 2016; 37:337-43. [PMID: 26699919 PMCID: PMC5066700 DOI: 10.1002/humu.22950] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/19/2015] [Accepted: 12/16/2015] [Indexed: 12/14/2022]
Abstract
Endometrial cancer (EC) ranks as the sixth most common cancer for women worldwide. To better distinguish cancer subtypes and identify effective early diagnostic biomarkers, we need improved understanding of the biological mechanisms associated with EC dysregulated genes. Although there is a wealth of clinical and molecular information relevant to EC in the literature, there has been no systematic summary of EC-implicated genes. In this study, we developed a literature-based database ECGene (Endometrial Cancer Gene database) with comprehensive annotations. ECGene features manual curation of 414 genes from thousands of publications, results from eight EC gene expression datasets, precomputation of coexpressed long noncoding RNAs, and an EC-implicated gene interactome. In the current release, we generated and comprehensively annotated a list of 458 EC-implicated genes. We found the top-ranked EC-implicated genes are frequently mutated in The Cancer Genome Atlas (TCGA) tumor samples. Furthermore, systematic analysis of coexpressed lncRNAs provided insight into the important roles of lncRNA in EC development. ECGene has a user-friendly Web interface and is freely available at http://ecgene.bioinfo-minzhao.org/. As the first literature-based online resource for EC, ECGene serves as a useful gateway for researchers to explore EC genetics.
Affiliation(s)
- Min Zhao
  - School of Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, 4558, Australia
- Yining Liu
  - School of Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, 4558, Australia
- Tracy A O'Mara
  - Genetics and Computational Biology Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, 4006, Australia
|
195
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2016; 44:D7-19. [PMID: 26615191 PMCID: PMC4702911 DOI: 10.1093/nar/gkv1290] [Citation(s) in RCA: 1048] [Impact Index Per Article: 131.0] [Received: 09/16/2015] [Revised: 11/04/2015] [Accepted: 11/05/2015] [Indexed: 11/25/2022] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
|
196
|
Darrason M. Mechanistic and topological explanations in medicine: the case of medical genetics and network medicine. Synthese 2015; 195:147-173. [PMID: 32214509 PMCID: PMC7089272 DOI: 10.1007/s11229-015-0983-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/15/2014] [Accepted: 11/28/2015] [Indexed: 06/10/2023]
Abstract
Medical explanations have often been thought on the model of biological ones and are frequently defined as mechanistic explanations of a biological dysfunction. In this paper, I argue that topological explanations, which have been described in ecology or in cognitive sciences, can also be found in medicine and I discuss the relationships between mechanistic and topological explanations in medicine, through the example of network medicine and medical genetics. Network medicine is a recent discipline that relies on the analysis of various disease networks (including disease-gene networks) in order to find organizing principles in disease explanation. My aim is to show how topological explanations in network medicine can help solving the conceptual issues that pure mechanistic explanations of the genetics of disease are currently facing, namely the crisis of the concept of genetic disease, the progressive geneticization of diseases and the dissolution of the distinction between monogenic and polygenic diseases. However, I will also argue that topological explanations should not be considered as independent and radically different from mechanistic explanations for at least two reasons. First, in network medicine, topological explanations depend on and use mechanistic information. Second, they leave out some missing gaps in disease explanation that require, in turn, the development of new mechanistic explanations. Finally, I will insist on the specific contribution of topological explanations in medicine: they push us to develop an explanation of disease in general, instead of focusing on single explanations of individual diseases. This last point may have major consequences for biomedical research.
Affiliation(s)
- Marie Darrason
  - Institut d’Histoire et de Philosophie des Sciences et des Techniques (IHPST - CNRS / Université Paris 1 Panthéon Sorbonne / ENS), 13 rue du Four, 75006 Paris, France
|
197
|
Arkova OV, Ponomarenko MP, Rasskazov DA, Drachkova IA, Arshinova TV, Ponomarenko PM, Savinkova LK, Kolchanov NA. Obesity-related known and candidate SNP markers can significantly change affinity of TATA-binding protein for human gene promoters. BMC Genomics 2015; 16 Suppl 13:S5. [PMID: 26694100 PMCID: PMC4686794 DOI: 10.1186/1471-2164-16-s13-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND: Obesity affects quality of life and life expectancy and is associated with cardiovascular disorders, cancer, diabetes, reproductive disorders in women, prostate diseases in men, and congenital anomalies in children. The use of single nucleotide polymorphism (SNP) markers of diseases and drug responses (i.e., significant differences of personal genomes of patients from the reference human genome) can help physicians to improve treatment. Clinical research can validate SNP markers via genotyping of patients and demonstration that SNP alleles are significantly more frequent in patients than in healthy people. The search for biomedical SNP markers of interest can be accelerated by computer-based analysis of hundreds of millions of SNPs in the 1000 Genomes project because of selection of the most meaningful candidate SNP markers and elimination of neutral SNPs.
RESULTS: We cross-validated the output of two computer-based methods: DNA sequence analysis using Web service SNP_TATA_Comparator and keyword search for articles on comorbidities of obesity. Near the sites binding to TATA-binding protein (TBP) in human gene promoters, we found 22 obesity-related candidate SNP markers, including rs10895068 (male breast cancer in obesity); rs35036378 (reduced risk of obesity after ovariectomy); rs201739205 (reduced risk of obesity-related cancers due to weight loss by diet/exercise in obese postmenopausal women); rs183433761 (obesity resistance during a high-fat diet); rs367732974 and rs549591993 (both: cardiovascular complications in obese patients with type 2 diabetes mellitus); rs200487063 and rs34104384 (both: obesity-caused hypertension); rs35518301, rs72661131, and rs562962093 (all: obesity); and rs397509430, rs33980857, rs34598529, rs33931746, rs33981098, rs34500389, rs63750953, rs281864525, rs35518301, and rs34166473 (all: chronic inflammation in comorbidities of obesity). Using an electrophoretic mobility shift assay under nonequilibrium conditions, we empirically validated the statistical significance (α < 0.00025) of the differences in TBP affinity values between the minor and ancestral alleles of 4 out of the 22 SNPs: rs200487063, rs201381696, rs34104384, and rs183433761. We also measured half-life (t1/2), Gibbs free energy change (ΔG), and the association and dissociation rate constants, ka and kd, of the TBP-DNA complex for these SNPs.
CONCLUSIONS: Validation of the 22 candidate SNP markers by proper clinical protocols appears to have a strong rationale and may advance postgenomic predictive preventive personalized medicine.
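The kinetic quantities named in this abstract (ka, kd, t1/2, ΔG) are linked by standard textbook relations, which the short sketch below works through. It assumes simple 1:1 binding with first-order dissociation, a temperature of 298.15 K, and a 1 M standard state; the rate constants in the example are hypothetical placeholders, not values reported in the paper, and the function name is invented for illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol * K)


def complex_kinetics(ka, kd, temperature_k=298.15):
    """Derive equilibrium and stability figures from rate constants.

    ka: association rate constant, 1 / (M * s)
    kd: dissociation rate constant, 1 / s
    Returns (equilibrium dissociation constant in M,
             half-life of the complex in s,
             binding free energy change in kJ/mol).
    """
    dissociation_constant = kd / ka            # K_D = kd / ka
    half_life = math.log(2) / kd               # t1/2 = ln 2 / kd
    delta_g = (GAS_CONSTANT * temperature_k
               * math.log(dissociation_constant)) / 1000.0  # dG = RT ln K_D
    return dissociation_constant, half_life, delta_g


# Hypothetical rate constants chosen only to exercise the formulas.
k_d, t_half, delta_g = complex_kinetics(ka=1.0e5, kd=1.0e-3)
print(f"K_D = {k_d:.2e} M, t1/2 = {t_half:.0f} s, dG = {delta_g:.1f} kJ/mol")
```

Under these relations a variant allele that raises kd shortens the half-life of the complex and makes ΔG less negative, which is the sense in which a SNP can change TBP affinity.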
Affiliation(s)
- Olga V Arkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Mikhail P Ponomarenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
  - Novosibirsk State University, 2 Pirogova Street, Novosibirsk 630090, Russia
  - Laboratory of Evolutionary Bioinformatics and Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyev Avenue, Novosibirsk 630090, Russia
- Dmitry A Rasskazov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Irina A Drachkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Tatjana V Arshinova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Petr M Ponomarenko
  - Children's Hospital Los Angeles, 4640 Hollywood Boulevard, University of Southern California, Los Angeles, CA 90027, USA
- Ludmila K Savinkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
- Nikolay A Kolchanov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 10 Lavrentyeva Avenue, Novosibirsk 630090, Russia
  - Novosibirsk State University, 2 Pirogova Street, Novosibirsk 630090, Russia
|
198
|
Cherry JM. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining. Cold Spring Harb Protoc 2015; 2015:pdb.prot088906. [PMID: 26631124 PMCID: PMC5673598 DOI: 10.1101/pdb.prot088906] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.
Affiliation(s)
- J. Michael Cherry
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5120
|
199
|
Network regularised Cox regression and multiplex network models to predict disease comorbidities and survival of cancer. Comput Biol Chem 2015; 59 Pt B:15-31. [DOI: 10.1016/j.compbiolchem.2015.08.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/13/2015] [Revised: 08/21/2015] [Accepted: 08/25/2015] [Indexed: 12/17/2022]
|
200
|
Reprint of “Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction”. Comput Biol Chem 2015; 59 Pt B:123-38. [DOI: 10.1016/j.compbiolchem.2015.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/09/2015] [Revised: 06/04/2015] [Accepted: 06/05/2015] [Indexed: 12/21/2022]
|