101
Kahanda I, Funk CS, Ullah F, Verspoor KM, Ben-Hur A. A close look at protein function prediction evaluation protocols. Gigascience 2015; 4:41. [PMID: 26380075 PMCID: PMC4570743 DOI: 10.1186/s13742-015-0082-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/01/2015] [Accepted: 08/24/2015] [Indexed: 01/04/2023] Open
Abstract
Background: The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance.
Results: The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods.
Conclusions: These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions.
Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0082-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Indika Kahanda
  - Department of Computer Science, Colorado State University, Fort Collins, 80523 CO USA
- Christopher S Funk
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, 80045 CO USA
- Fahad Ullah
  - Department of Computer Science, Colorado State University, Fort Collins, 80523 CO USA
- Karin M Verspoor
  - Department of Computing and Information Systems, University of Melbourne, 3010 Parkville, Victoria, Australia
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, 80523 CO USA
102
Bassim S, Chapman RW, Tanguy A, Moraga D, Tremblay R. Predicting growth and mortality of bivalve larvae using gene expression and supervised machine learning. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY D-GENOMICS & PROTEOMICS 2015; 16:59-72. [PMID: 26282335 DOI: 10.1016/j.cbd.2015.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/11/2015] [Revised: 07/13/2015] [Accepted: 07/24/2015] [Indexed: 10/23/2022]
Abstract
It is commonly known that the nature of the diet has diverse consequences for larval performance and longevity; however, it is still unclear which genes have critical impacts on bivalve development and which pathways are of particular importance in their vulnerability or resistance. First, we show that a diet deficient in essential fatty acids (EFAs) produces higher larval mortality rates, reduced shell growth, and lower postlarval performance, all of which are positively correlated with a decline in the levels of arachidonic and eicosapentaenoic acids, two EFAs known as eicosanoid precursors. Eicosanoids affect the cell inflammatory reactions and are synthesized from long-chain EFAs. Second, we show for the first time that a deficiency in eicosanoid precursors is associated with a network of 29 genes. Their differential regulation can lead to slower growth and higher mortality of Mytilus edulis larvae. Some of these genes are specific to bivalves and others are implicated at the same time in lipid metabolism and defense. Several genes are expressed only during pre-metamorphosis, where they are essential for muscle or neurone development and biomineralization, but only in stress-induced larvae. Finally, we discuss how our networks of differentially expressed genes might dynamically alter the development of marine bivalves, especially under dietary influence.
Affiliation(s)
- Sleiman Bassim
  - Institut des Sciences de la mer de Rimouski, Universite du Quebec a Rimouski, 310, allee des Ursulines, Rimouski Quebec G5L 3A1, Canada; Laboratoire des Sciences de l'Environnement Marin, Institut Universitaire Europeen de la Mer, Universite de Bretagne Occidentale, Rue Dumont d'Urville, 29280 Plouzane, France
- Robert W Chapman
  - Marine Resources Research Institute, South Carolina Department of Natural Resources and Hollings Marine Laboratory, 331 Ft. Johnson Road, Charleston, SC 29412, USA
- Arnaud Tanguy
  - UPMC Universite Paris 6, UMR 7144, Genetique et Adaptation en Milieu Extreme, Station Biologique de Roscoff, France
- Dario Moraga
  - Laboratoire des Sciences de l'Environnement Marin, Institut Universitaire Europeen de la Mer, Universite de Bretagne Occidentale, Rue Dumont d'Urville, 29280 Plouzane, France
- Rejean Tremblay
  - Institut des Sciences de la mer de Rimouski, Universite du Quebec a Rimouski, 310, allee des Ursulines, Rimouski Quebec G5L 3A1, Canada.
103
Towner RA, Wren JD. Prioritizing uncharacterized genes in the search for glioma biomarkers. CNS Oncol 2015; 3:93-5. [PMID: 25055012 DOI: 10.2217/cns.14.8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Affiliation(s)
- Rheal A Towner
  - Advanced Magnetic Resonance Center, Oklahoma Medical Research Foundation, 825 NE 13th Street, Oklahoma City, OK 73104, USA
104
Nelson RM, Pettersson ME. Degrees of separation as a statistical tool for evaluating candidate genes. Comput Biol Med 2014; 55:49-52. [PMID: 25450218 DOI: 10.1016/j.compbiomed.2014.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2014] [Revised: 09/22/2014] [Accepted: 10/01/2014] [Indexed: 11/29/2022]
Abstract
Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, this practice can give misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes in currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available.
Affiliation(s)
- Ronald M Nelson
  - Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- Mats E Pettersson
  - Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
105
Abayomi O, Amato D, Bailey C, Bitanihirwe B, Bowen L, Burshtein S, Cullen A, Fusté M, Herrmann AP, Khodaie B, Kilian S, Lang QA, Manning EE, Massuda R, Nurjono M, Sadiq S, Sanchez-Gutierrez T, Sheinbaum T, Shivakumar V, Simon N, Spiteri-Staines A, Sirijit S, Toftdahl NG, Wadehra S, Wang Y, Wigton R, Wright S, Yagoda S, Zaytseva Y, O'Shea A, DeLisi LE. The 4th Schizophrenia International Research Society Conference, 5-9 April 2014, Florence, Italy: a summary of topics and trends. Schizophr Res 2014; 159:e1-22. [PMID: 25306204 PMCID: PMC4394607 DOI: 10.1016/j.schres.2014.08.032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/09/2014] [Revised: 08/07/2014] [Accepted: 08/26/2014] [Indexed: 11/26/2022]
Abstract
The 4th Schizophrenia International Research Society Conference was held in Florence, Italy, April 5-9, 2014 and this year had as its emphasis, "Fostering Collaboration in Schizophrenia Research". Student travel awardees served as rapporteurs for each oral session, summarized the important contributions of each session and then each report was integrated into a final summary of data discussed at the entire conference by topic. It is hoped that by combining data from different presentations, patterns of interest will emerge and thus lead to new progress for the future. In addition, the following report provides an overview of the conference for those who were present, but could not participate in all sessions, and those who did not have the opportunity to attend, but who would be interested in an update on current investigations ongoing in the field of schizophrenia research.
Affiliation(s)
- Olukayode Abayomi
  - Ladoke Akintola University of Technology Teaching Hospital, PMB 4007, Ogbomoso, Oyo, Nigeria
- Davide Amato
  - Department of Psychiatry and Psychotherapy, Friedrich-Alexander-University of Erlangen-Nuremberg, Ulmenweg 19, 91054 Erlangen, Germany
- Candace Bailey
  - University of Texas Medical Branch, School of Medicine, 215 Mechanic Street, Apt. M206, Galveston 77550, TX, United States
- Byron Bitanihirwe
  - Laboratory of System and Cell Biology of Neurodegeneration, University of Zurich, Wagistrasse 12, 8952 Schlieren, Zurich, Switzerland
- Lynneice Bowen
  - Morehouse School of Medicine, 720 Westview Dr. SW, Atlanta, GA 30310, United States
- Alexis Cullen
  - Health Services and Population Research Department, David Goldberg Centre, Institute of Psychiatry, De Crespigny Park, Denmark Hill, London SE5 8AF, UK
- Montserrat Fusté
  - Department of Psychosis Studies, Institute of Psychiatry, King's College London, 16 De Crespigny Park, SE5 8AF London, UK
- Ana P Herrmann
  - Pharmacology Department, Basic Health Sciences Institute, Universidade Federal do Rio Grande do Sul, Rua Sarmento Leite, 500, 90050-170 Porto Alegre, RS, Brazil
- Sanja Kilian
  - Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, Cape Town, South Africa
- Qortni A Lang
  - Howard University College of Medicine, 520 W Street, Washington, DC 20059, United States
- Elizabeth E Manning
  - The Florey Institute of Neuroscience and Mental Health, Kenneth Myer Building, 30 Royal Parade, Parkville 3052, VIC, Australia
- Raffael Massuda
  - Laboratory of Molecular Psychiatry, INCT for Translational Medicine, Hospital de Clínicas de Porto Alegre, Universidade Federal do Rio Grande do Sul, Rua Ramiro Barcelos, 2350 Santa Cecília, Porto Alegre, RS 90035-903, Brazil
- Milawaty Nurjono
  - Saw Swee Hock School of Public Health, National University of Singapore, MD3, 16 Medical Drive, Singapore 117597, Singapore
- Sarosh Sadiq
  - Government College University, 170-S, 19/B, College Road, New Samanabad, Lahore, Pakistan
- Teresa Sanchez-Gutierrez
  - Child and Adolescent Psychiatry Department, Hospital General Universitario Gregorio Marañón, School of Medicine, Universidad Complutense, IiSGM, CIBERSAM, C/Ibiza, 43 28009, Madrid, Spain
- Tamara Sheinbaum
  - Departament de Psicologia Clínica i de la Salut, Universitat Autònoma de Barcelona, Edifici B, 08193 Bellaterra, Barcelona, Spain
- Nicholas Simon
  - Department of Neuroscience, A210 Langley Hall, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Anneliese Spiteri-Staines
  - Centre for Youth Mental Health, The University of Melbourne, 35 Poplar Road, Parkville 3052, Victoria, Australia
- Suttajit Sirijit
  - Department of Psychiatry, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Nanna Gilliam Toftdahl
  - Mental Health Centre Copenhagen, Bispebjerg Bakke 23, Entrance 13A, 3rd floor, DK-2400, Copenhagen NV, Denmark
- Sunali Wadehra
  - Wayne State University School of Medicine, 469 West Hancock, Detroit 48201, MI, United States
- Yi Wang
  - Neuropsychology and Applied Cognitive Neuroscience Laboratory, Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China
- Rebekah Wigton
  - Cognition and Schizophrenia Imaging Laboratory, Institute of Psychiatry, King's College, 16 De Crespigny Park Rd, Denmark Hill, London SE5 8AF, UK
- Susan Wright
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Neuroimaging Research Program, P.O. Box 21247, Baltimore, MD 21228, United States
- Sergey Yagoda
  - Department of Psychiatry, Psychotherapy and Medical Psychology of Stavropol State Medical University, 28b Aivazovsky str, Stavropol 355007, Russia
- Yuliya Zaytseva
  - Moscow Research Institute of Psychiatry, Russian Federation/Prague Psychiatric Centre affiliated with 3rd Faculty of Medicine, Charles University in Prague, Czech Republic
- Anne O'Shea
  - Harvard Medical School, Brockton, MA 02301, United States. anne_o'
- Lynn E DeLisi
  - Department of Psychiatry, Harvard Medical School, 940 Belmont Street, Brockton, MA 02301, United States; VA Boston Healthcare System, 940 Belmont Street, Brockton, MA 02301, United States.
106
Luo Y, Riedlinger G, Szolovits P. Text mining in cancer gene and pathway prioritization. Cancer Inform 2014; 13:69-79. [PMID: 25392685 PMCID: PMC4216063 DOI: 10.4137/cin.s13874] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2014] [Revised: 05/18/2014] [Accepted: 05/18/2014] [Indexed: 12/18/2022] Open
Abstract
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Affiliation(s)
- Yuan Luo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory Riedlinger
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Peter Szolovits
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
107
Clancy T, Hovig E. From proteomes to complexomes in the era of systems biology. Proteomics 2014; 14:24-41. [PMID: 24243660 DOI: 10.1002/pmic.201300230] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/12/2013] [Revised: 10/22/2013] [Accepted: 11/06/2013] [Indexed: 01/16/2023]
Abstract
Protein complexes carry out almost all signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex-complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances, and their discovery potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.
Affiliation(s)
- Trevor Clancy
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
108
Mulder NJ, Akinola RO, Mazandu GK, Rapanoel H. Using biological networks to improve our understanding of infectious diseases. Comput Struct Biotechnol J 2014; 11:1-10. [PMID: 25379138 PMCID: PMC4212278 DOI: 10.1016/j.csbj.2014.08.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/17/2022] Open
Abstract
Infectious diseases are the leading cause of death, particularly in developing countries. Although many drugs are available for treating the most common infectious diseases, in many cases the mechanism of action of these drugs or even their targets in the pathogen remain unknown. In addition, the key factors or processes in pathogens that facilitate infection and disease progression are often not well understood. Since proteins do not work in isolation, understanding biological systems requires a better understanding of the interconnectivity between proteins in different pathways and processes, which includes both physical and other functional interactions. Such biological networks can be generated within organisms or between organisms sharing a common environment using experimental data and computational predictions. Though different data sources provide different levels of accuracy, confidence in interactions can be measured using interaction scores. Connections between interacting proteins in biological networks can be represented as graphs and edges, and thus studied using existing algorithms and tools from graph theory. There are many different applications of biological networks, and here we discuss three such applications, specifically applied to the infectious disease tuberculosis, with its causative agent Mycobacterium tuberculosis and host, Homo sapiens. The applications include the use of the networks for function prediction, comparison of networks for evolutionary studies, and the generation and use of host–pathogen interaction networks.
Affiliation(s)
- Nicola J Mulder
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Richard O Akinola
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Gaston K Mazandu
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Holifidy Rapanoel
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
109
Montojo J, Zuberi K, Shao Q, Bader GD, Morris Q. Network Assessor: an automated method for quantitative assessment of a network's potential for gene function prediction. Front Genet 2014; 5:123. [PMID: 24904632 PMCID: PMC4032932 DOI: 10.3389/fgene.2014.00123] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/06/2014] [Accepted: 04/21/2014] [Indexed: 01/17/2023] Open
Abstract
Significant effort has been invested in network-based gene function prediction algorithms based on the guilt by association (GBA) principle. Existing approaches for assessing prediction performance typically compute evaluation metrics, either averaged across all functions being considered, or strictly from properties of the network. Since the success of GBA algorithms depends on the specific function being predicted, evaluation metrics should instead be computed for each function. We describe a novel method for computing the usefulness of a network by measuring its impact on gene function cross validation prediction performance across all gene functions. We have implemented this in software called Network Assessor, and describe its use in the GeneMANIA (GM) quality control system. Network Assessor is part of the GM command line tools.
Affiliation(s)
- Jason Montojo
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Khalid Zuberi
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Quentin Shao
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Gary D Bader
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Quaid Morris
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
110
Grennan KS, Chen C, Gershon ES, Liu C. Molecular network analysis enhances understanding of the biology of mental disorders. Bioessays 2014; 36:606-16. [PMID: 24733456 DOI: 10.1002/bies.201300147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
We provide an introduction to network theory, evidence to support a connection between molecular network structure and neuropsychiatric disease, and examples of how network approaches can expand our knowledge of the molecular bases of these diseases. Without systematic methods to derive their biological meanings and inter-relatedness, the many molecular changes associated with neuropsychiatric disease, including genetic variants, gene expression changes, and protein differences, present an impenetrably complex set of findings. Network approaches can potentially help integrate and reconcile these findings, as well as provide new insights into the molecular architecture of neuropsychiatric diseases. Network approaches to neuropsychiatric disease are still in their infancy, and we discuss what might be done to improve their prospects.
Affiliation(s)
- Kay S Grennan
  - Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
111
Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014; 362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]
Abstract
Network modeling has proven to be a fundamental tool in analyzing the inner workings of a cell. It has revolutionized our understanding of biological processes and made significant contributions to the discovery of disease biomarkers. Much effort has been devoted to reconstructing various types of biochemical networks using functional genomic datasets generated by high-throughput technologies. This paper discusses statistical methods used to reconstruct gene regulatory networks using gene expression data. In particular, we highlight progress made and challenges yet to be met in the problems involved in estimating gene interactions, inferring causality and modeling temporal changes of regulation behaviors. As rapid advances in technologies have made available diverse, large-scale genomic data, we also survey methods of incorporating all these additional data to achieve better, more accurate inference of gene networks.
Affiliation(s)
- Y X Rachel Wang
  - Department of Statistics, University of California, Berkeley, CA 94720, USA.
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley, CA 94720, USA.
112
Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. TRENDS IN PLANT SCIENCE 2014; 19:212-21. [PMID: 24231067 DOI: 10.1016/j.tplants.2013.10.006] [Citation(s) in RCA: 146] [Impact Index Per Article: 14.6] [Received: 08/06/2013] [Revised: 10/10/2013] [Accepted: 10/16/2013] [Indexed: 05/19/2023]
Abstract
The great recent progress made in identifying the molecular parts lists of organisms revealed the paucity of our understanding of what most of the parts do. In this review, we introduce computational and statistical approaches and omics data used for inferring gene function in plants, with an emphasis on network-based inference. We also discuss caveats associated with network-based function predictions such as performance assessment, annotation propagation, the guilt-by-association concept, and the meaning of hubs. Finally, we note the current limitations and possible future directions such as the need for gold standard data from several species, unified access to data and tools, quantitative comparison of data and tool quality, and high-throughput experimental validation platforms for systematic gene function elucidation in plants.
Affiliation(s)
- Seung Yon Rhee
  - Carnegie Institution for Science, Department of Plant Biology, 260 Panama St, Stanford, CA 94305, USA.
- Marek Mutwil
  - Max Planck Institute for Molecular Plant Physiology, 14476 Potsdam, Germany.
113
Jiménez-Gómez JM. Network types and their application in natural variation studies in plants. CURRENT OPINION IN PLANT BIOLOGY 2014; 18:80-86. [PMID: 24632305 DOI: 10.1016/j.pbi.2014.02.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/20/2013] [Revised: 02/06/2014] [Accepted: 02/17/2014] [Indexed: 06/03/2023]
Abstract
We are in the age of data-driven biology. Not even a decade after the invention of high-throughput sequencing technologies, there are methods that accurately monitor DNA polymorphisms, transcription profiles, methylation states, transcription factor binding sites, chromatin compactness, nucleosome positions, dynamic histone marks, and so on. We are starting to generate comparable amounts of protein or metabolite data. A key issue is how we are going to make sense of all this information. Network analysis is the most promising method to integrate, query and display large amounts of data for human interpretation. This review briefly summarizes the basic types of networks, their properties and limitations. In addition, I introduce the application of networks to the study of the molecular mechanisms behind natural phenotypic variation.
Affiliation(s)
- José M Jiménez-Gómez
  - INRA - Institut National de la Recherche Agronomique, UMR 1318, Institut Jean-Pierre Bourgin, Versailles, France; Max Planck Institute for Plant Breeding Research, Department of Plant Breeding and Genetics, Carl-von-Linné-Weg 10, 50829 Cologne, Germany.
114
Gulati J, Baldwin IT, Gaquerel E. The roots of plant defenses: integrative multivariate analyses uncover dynamic behaviors of gene and metabolic networks of roots elicited by leaf herbivory. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2014; 77:880-92. [PMID: 24456376 PMCID: PMC4190575 DOI: 10.1111/tpj.12439] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/20/2013] [Revised: 12/11/2013] [Accepted: 01/09/2014] [Indexed: 05/08/2023]
Abstract
High-throughput analyses have frequently been used to characterize herbivory-induced reconfigurations in plant primary and secondary metabolism in above- and below-ground tissues, but the conclusions drawn from these analyses are often limited by the univariate methods used to analyze the data. Here we use our previously described multivariate time-series data analysis to evaluate leaf herbivory-elicited transcriptional and metabolic dynamics in the roots of Nicotiana attenuata. We observed large, but transient, systemic responses in the roots that contrasted with the pattern of co-linearity observed in the up- and downregulation of genes and metabolites across the entire time series in treated and systemic leaves. Using this newly developed approach for the analysis of whole-plant molecular responses in a time-course multivariate data set, we simultaneously analyzed stress responses in leaves and roots in response to the elicitation of a leaf. We found that transient systemic responses in roots resolved into two principal trends characterized by: (i) an inversion of root-specific semi-diurnal (12 h) transcript oscillations and (ii) transcriptional changes with major amplitude effects that translated into a distinct suite of root-specific secondary metabolites (e.g. alkaloids synthesized in the roots of N. attenuata). These findings underscore the importance of understanding tissue-specific stress responses in the correct day-night phase context and provide a holistic framework for the important role played by roots in above-ground stress responses.
Collapse
Affiliation(s)
- Jyotasana Gulati
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knoell-Str. 8, 07745 Jena, Germany
| | - Ian T. Baldwin
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knoell-Str. 8, 07745 Jena, Germany
| | - Emmanuel Gaquerel
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knoell-Str. 8, 07745 Jena, Germany
- Centre for Organismal Studies, University of Heidelberg, Im Neuenheimer Feld 360, 69120 Heidelberg, Germany
| |
|
115
|
Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: principles, methods and perspectives. Hum Genet 2014; 133:125-38. [PMID: 24122152 PMCID: PMC3943795 DOI: 10.1007/s00439-013-1377-1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2012] [Accepted: 10/03/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies (GWAS) have rapidly become a powerful tool in genetic studies of complex diseases and traits. Traditionally, single marker-based tests have been used prevalently in GWAS and have uncovered tens of thousands of disease-associated SNPs. Network-assisted analysis (NAA) of GWAS data is an emerging area in which network-related approaches are developed and utilized to perform advanced analyses of GWAS data in order to study various human diseases or traits. Progress has been made in both methodology development and applications of NAA in GWAS data, and it has already been demonstrated that NAA results may enhance our interpretation and prioritization of candidate genes and markers. Inspired by the strong interest in and high demand for advanced GWAS data analysis, in this review article, we discuss the methodologies and strategies that have been reported for the NAA of GWAS data. Many NAA approaches search for subnetworks and assess the combined effects of multiple genes participating in the resultant subnetworks through a gene set analysis. With no restriction to pre-defined canonical pathways, NAA has the advantage of defining subnetworks with the guidance of the GWAS data under investigation. In addition, some NAA methods prioritize genes from GWAS data based on their interconnections in the reference network. Here, we summarize NAA applications to various diseases and discuss the available options and potential caveats related to their practical usage. Additionally, we provide perspectives regarding this rapidly growing research area.
|
116
|
Zhan Y, Zhang R, Lv H, Song X, Xu X, Chai L, Lv W, Shang Z, Jiang Y, Zhang R. Prioritization of candidate genes for periodontitis using multiple computational tools. J Periodontol 2014; 85:1059-69. [PMID: 24476546 DOI: 10.1902/jop.2014.130523] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
BACKGROUND Both genetic and environmental factors contribute to the development of periodontitis. Genetic studies identified a variety of candidate genes for periodontitis. The aim of the present study is to identify the most promising candidate genes for periodontitis using an integrative gene ranking method. METHODS Seed genes that were confirmed to be associated with periodontitis were identified using text mining. Three types of candidate genes were then extracted from different resources (expression profiles, genome-wide association studies). Combining the seed genes, four freely available bioinformatics tools (ToppGene, DIR, Endeavour, and GPEC) were integrated for prioritization of candidate genes. Candidate genes that were identified by at least three programs and ranked in the top 20 by each program were considered the most promising. RESULTS Prioritization analysis resulted in 21 promising genes involved or potentially involved in periodontitis. Among them, IL18 (interleukin 18), CD44 (CD44 molecule), CXCL1 (chemokine [CXC motif] ligand 1), IL6ST (interleukin 6 signal transducer), MMP3 (matrix metallopeptidase 3), MMP7, CCR1 (chemokine [C-C motif] receptor 1), MMP13, and TLR9 (Toll-like receptor 9) had been associated with periodontitis. However, the roles of other genes, such as CSF3 (colony stimulating factor 3 receptor), CD40, TNFSF14 (tumor necrosis factor receptor superfamily, member 14), IFNB1 (interferon-β1), TIRAP (toll-interleukin 1 receptor domain containing adaptor protein), IL2RA (interleukin 2 receptor α), ETS1 (v-ets avian erythroblastosis virus E26 oncogene homolog 1), GADD45B (growth arrest and DNA-damage-inducible 45 β), BIRC3 (baculoviral IAP repeat containing 3), VAV1 (vav 1 guanine nucleotide exchange factor), COL5A1 (collagen, type V, α1), and C3 (complement component 3), have not been investigated thoroughly in the process of periodontitis. These genes are mainly involved in bacterial infection, immune response, and inflammatory reaction, suggesting that further characterizing their roles in periodontitis will be important. CONCLUSIONS A combination of computational tools will be useful in mining candidate genes for periodontitis. These theoretical results provide new clues for experimental biologists to plan targeted experiments.
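The selection rule described in the METHODS above can be illustrated with a minimal sketch. It assumes one reading of the rule: a gene is kept when it appears in the top 20 of at least three of the four tools. The function name `consensus_candidates`, its parameters, and the toy gene labels (`G1` to `G6`) are hypothetical and are not taken from the study; only the tool names come from the abstract.

```python
from collections import Counter


def consensus_candidates(rankings, top_n=20, min_tools=3):
    """Return genes that appear in the top_n of at least min_tools rankings.

    rankings: dict mapping a tool name to a list of gene symbols,
    ordered from best-ranked to worst-ranked (an assumed input format).
    """
    counts = Counter()
    for ranked_genes in rankings.values():
        # Count each gene at most once per tool, and only if it is
        # within that tool's top_n positions.
        counts.update(set(ranked_genes[:top_n]))
    return sorted(gene for gene, n in counts.items() if n >= min_tools)


# Toy rankings with placeholder gene labels (not real results).
toy_rankings = {
    "ToppGene": ["G1", "G2", "G3"],
    "DIR": ["G2", "G1", "G4"],
    "Endeavour": ["G1", "G5", "G2"],
    "GPEC": ["G6", "G1", "G3"],
}

# With top_n=2 only G1 is in the top two of three or more tools.
print(consensus_candidates(toy_rankings, top_n=2, min_tools=3))
```

Counting by set membership rather than by rank position keeps the rule insensitive to how each tool scores genes, which is one plausible reason to combine tools this way; a rank-aggregation scheme would be a different technique.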
Affiliation(s)
- Yuanbo Zhan
- Department of Periodontology and Oral Mucosa, Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | | | | | | | | | | | | | | | | | | |
|
117
|
Gillis J, Ballouz S, Pavlidis P. Bias tradeoffs in the creation and analysis of protein-protein interaction networks. J Proteomics 2014; 100:44-54. [PMID: 24480284 DOI: 10.1016/j.jprot.2014.01.020] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2013] [Revised: 01/13/2014] [Accepted: 01/17/2014] [Indexed: 02/04/2023]
Abstract
Networks constructed from aggregated protein-protein interaction data are commonplace in biology. But the studies these data are derived from were conducted with their own hypotheses and foci. Focusing on data from budding yeast present in BioGRID, we determine that many of the downstream signals present in network data are significantly impacted by biases in the original data. We determine the degree to which selection bias in favor of biologically interesting bait proteins goes down with study size, while we also find that promiscuity in prey contributes more substantially in larger studies. We analyze interaction studies over time with respect to data in the Gene Ontology and find that reproducibly observed interactions are less likely to favor multifunctional proteins. We find that strong alignment between co-expression and protein-protein interaction data occurs only for extreme co-expression values, and use this data to suggest candidates for targets likely to reveal novel biology in follow-up studies. BIOLOGICAL SIGNIFICANCE Protein-protein interaction data finds particularly heavy use in the interpretation of disease-causal variants. In principle, network data allows researchers to find novel commonalities among candidate genes. In this study, we detail several of the most salient biases contributing to aggregated protein-protein interaction databases. We find strong evidence for the role of selection and laboratory biases. Many of these effects contribute to the commonalities researchers find for disease genes. In order for characterization of disease genes and their interactions to not simply be an artifact of researcher preference, it is imperative to identify data biases explicitly. Based on this, we also suggest ways to move forward in producing candidates less influenced by prior knowledge. This article is part of a Special Issue entitled: Can Proteomics Fill the Gap Between Genomics and Phenotypes?
Affiliation(s)
- Jesse Gillis
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY 11797, United States.
| | - Sara Ballouz
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY 11797, United States.
| | - Paul Pavlidis
- Department of Psychiatry and Centre for High-Throughput Biology, University of British Columbia, 2185 East Mall, Vancouver, BC V6T 1Z4, Canada.
| |
|
118
|
Santoni D, Swiercz A, Zmieńko A, Kasprzak M, Blazewicz M, Bertolazzi P, Felici G. An integrated approach (CLuster Analysis Integration Method) to combine expression data and protein-protein interaction networks in agrigenomics: application on Arabidopsis thaliana. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2014; 18:155-65. [PMID: 24404838 DOI: 10.1089/omi.2013.0050] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Experimental co-expression data and protein-protein interaction networks are frequently used to analyze the interactions among genes or proteins. Recent studies have investigated methods to integrate these two sources of information. We propose a new method to integrate co-expression data obtained through DNA microarray analysis (MA) and protein-protein interaction (PPI) network data, and apply it to Arabidopsis thaliana. The proposed method identifies small subsets of highly interacting proteins. Based on an analysis of co-localization and mRNA developmental expression, we show that these groups provide important biological insights; additionally, these subsets are significantly enriched with respect to KEGG Pathways and can be used to predict successfully whether proteins belong to known pathways. Thus, the method is able to provide relevant biological information and support the functional identification of complex genetic traits of economic value in plant agrigenomics research. The method has been implemented in a prototype software tool named CLAIM (CLuster Analysis Integration Method) and can be downloaded from http://bio.cs.put.poznan.pl/research_fields . CLAIM is based on the separate clustering of MA and PPI data; the clusters are merged in a special graph; cliques of this graph are subsets of strongly connected proteins. The proposed method was successfully compared with existing methods. CLAIM appears to be a useful semi-automated tool for protein functional analysis and warrants further evaluation in agrigenomics research.
Affiliation(s)
- Daniele Santoni
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Rome, Italy
| | | | | | | | | | | | | |
|
119
|
Gaiteri C, Ding Y, French B, Tseng GC, Sibille E. Beyond modules and hubs: the potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. GENES, BRAIN, AND BEHAVIOR 2014; 13:13-24. [PMID: 24320616 PMCID: PMC3896950 DOI: 10.1111/gbb.12106] [Citation(s) in RCA: 187] [Impact Index Per Article: 18.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2013] [Revised: 09/25/2013] [Accepted: 11/10/2013] [Indexed: 12/12/2022]
Abstract
In a research environment dominated by reductionist approaches to brain disease mechanisms, gene network analysis provides a complementary framework in which to tackle the complex dysregulations that occur in neuropsychiatric and other neurological disorders. Gene-gene expression correlations are a common source of molecular networks because they can be extracted from high-dimensional disease data and encapsulate the activity of multiple regulatory systems. However, the analysis of gene coexpression patterns is often treated as a mechanistic black box, in which looming 'hub genes' direct cellular networks, and where other features are obscured. By examining the biophysical bases of coexpression and gene regulatory changes that occur in disease, recent studies suggest it is possible to use coexpression networks as a multi-omic screening procedure to generate novel hypotheses for disease mechanisms. Because technical processing steps can affect the outcome and interpretation of coexpression networks, we examine the assumptions and alternatives to common patterns of coexpression analysis and discuss additional topics such as acceptable datasets for coexpression analysis, the robust identification of modules, disease-related prioritization of genes and molecular systems and network meta-analysis. To accelerate coexpression research beyond modules and hubs, we highlight some emerging directions for coexpression network research that are especially relevant to complex brain disease, including the centrality-lethality relationship, integration with machine learning approaches and network pharmacology.
Affiliation(s)
- Chris Gaiteri
- Modeling, Analysis and Theory Group, Allen Institute for Brain Science, Seattle WA, USA
| | - Ying Ding
- Carnegie Mellon-University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA
| | - Beverly French
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Etienne Sibille
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
120
|
Spanagel R. Convergent functional genomics in addiction research - a translational approach to study candidate genes and gene networks. In Silico Pharmacol 2013; 1:18. [PMID: 25505662 PMCID: PMC4230431 DOI: 10.1186/2193-9616-1-18] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2013] [Accepted: 11/12/2013] [Indexed: 01/16/2023] Open
Abstract
Convergent functional genomics (CFG) is a translational methodology that integrates, in a Bayesian fashion, multiple lines of evidence from studies in human and animal models to get a better understanding of the genetics of a disease or pathological behavior. Here the integration of data sets that derive from forward genetics in animals and genetic association studies, including genome-wide association studies (GWAS), in humans is described for addictive behavior. The aim of forward genetics in animals and association studies in humans is to identify mutations (e.g. SNPs) that produce a certain phenotype; i.e. "from phenotype to genotype". Most powerful in terms of forward genetics is combined quantitative trait loci (QTL) analysis and gene expression profiling in recombinant inbred rodent lines or in animals genetically selected for a specific phenotype, e.g. high vs. low drug consumption. By Bayesian scoring, genomic information from forward genetics in animals is then combined with human GWAS data on a similar addiction-relevant phenotype. This integrative approach generates a robust candidate gene list that has to be functionally validated by means of reverse genetics in animals; i.e. "from genotype to phenotype". It is proposed that studying addiction-relevant phenotypes and endophenotypes by this CFG approach will allow a better determination of the genetics of addictive behavior.
Affiliation(s)
- Rainer Spanagel
- Institute of Psychopharmacology, Central Institute of Mental Health, Faculty of Medicine Mannheim, University of Heidelberg, J5, 68159 Mannheim, Germany
| |
|
121
|
Pavlidis P, Gillis J. Progress and challenges in the computational prediction of gene function using networks: 2012-2013 update. F1000Res 2013; 2:230. [PMID: 24715959 PMCID: PMC3962002 DOI: 10.12688/f1000research.2-230.v1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/21/2013] [Indexed: 12/12/2022] Open
Abstract
In an opinion published in 2012, we reviewed and discussed our studies of how gene network-based guilt-by-association (GBA) is impacted by confounds related to gene multifunctionality. We found such confounds account for a significant part of the GBA signal, and as a result meaningfully evaluating and applying computationally-guided GBA is more challenging than generally appreciated. We proposed that effort currently spent on incrementally improving algorithms would be better spent in identifying the features of data that do yield novel functional insights. We also suggested that part of the problem is the reliance by computational biologists on gold standard annotations such as the Gene Ontology. In the year since, there has been continued heavy activity in GBA-based research, including work that contributes to our understanding of the issues we raised. Here we review some of the recent work that is most relevant or that points to new areas of progress and challenges.
Affiliation(s)
- Paul Pavlidis
- Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, Vancouver, V6T1Z4, Canada
| | - Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY, 11797, USA
| |
|
122
|
Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics 2013; 30:719-25. [PMID: 24158600 PMCID: PMC3933875 DOI: 10.1093/bioinformatics/btt613] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Motivation: Methods for computational drug target identification use information from diverse information sources to predict or prioritize drug targets for known drugs. One set of resources that has been relatively neglected for drug repurposing is animal model phenotype. Results: We investigate the use of mouse model phenotypes for drug target identification. To achieve this goal, we first integrate mouse model phenotypes and drug effects, and then systematically compare the phenotypic similarity between mouse models and drug effect profiles. We find a high similarity between phenotypes resulting from loss-of-function mutations and drug effects resulting from the inhibition of a protein through a drug action, and demonstrate how this approach can be used to suggest candidate drug targets. Availability and implementation: Analysis code and supplementary data files are available on the project Web site at https://drugeffects.googlecode.com. Contact: leechuck@leechuck.de or roh25@aber.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, University of Aberystwyth, Old College, King Street, Aberystwyth SY23 2AX, Department of Biology, Institute of Biochemistry and School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1S 5B6, Canada and Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
| | | | | | | | | | | |
|
123
|
Abstract
High throughput technologies have been applied to investigate the underlying mechanisms of complex diseases, identify disease-associations and help to improve treatment. However it is challenging to derive biological insight from conventional single gene based analysis of "omics" data from high throughput experiments due to sample and patient heterogeneity. To address these challenges, many novel pathway and network based approaches were developed to integrate various "omics" data, such as gene expression, copy number alteration, Genome Wide Association Studies, and interaction data. This review will cover recent methodological developments in pathway analysis for the detection of dysregulated interactions and disease-associated subnetworks, prioritization of candidate disease genes, and disease classifications. For each application, we will also discuss the associated challenges and potential future directions.
|
124
|
Stojanova D, Ceci M, Malerba D, Dzeroski S. Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction. BMC Bioinformatics 2013; 14:285. [PMID: 24070402 PMCID: PMC3850549 DOI: 10.1186/1471-2105-14-285] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2012] [Accepted: 09/18/2013] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. RESULTS This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. CONCLUSIONS Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
Affiliation(s)
- Daniela Stojanova
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia.
| | | | | | | |
|
125
|
Mabbott NA, Baillie JK, Brown H, Freeman TC, Hume DA. An expression atlas of human primary cells: inference of gene function from coexpression networks. BMC Genomics 2013; 14:632. [PMID: 24053356 PMCID: PMC3849585 DOI: 10.1186/1471-2164-14-632] [Citation(s) in RCA: 276] [Impact Index Per Article: 25.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2013] [Accepted: 06/25/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND The specialisation of mammalian cells in time and space requires genes associated with specific pathways and functions to be co-ordinately expressed. Here we have combined a large number of publicly available microarray datasets derived from human primary cells and analysed large correlation graphs of these data. RESULTS Using the network analysis tool BioLayout Express3D we identify robust co-associations of genes expressed in a wide variety of cell lineages. We discuss the biological significance of a number of these associations, in particular the coexpression of key transcription factors with the genes that they are likely to control. CONCLUSIONS We consider the regulation of genes in human primary cells and specifically in the human mononuclear phagocyte system. Of particular note is the fact that these data do not support the identity of putative markers of antigen-presenting dendritic cells, nor classification of M1 and M2 activation states, a current subject of debate within the immunological field. We have provided this data resource on the BioGPS web site (http://biogps.org/dataset/2429/primary-cell-atlas/) and on macrophages.com (http://www.macrophages.com/hu-cell-atlas).
Affiliation(s)
- Neil A Mabbott
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Edinburgh EH25 9RG, UK.
| | | | | | | | | |
|
126
|
Winterbach W, Mieghem PV, Reinders M, Wang H, Ridder DD. Topology of molecular interaction networks. BMC SYSTEMS BIOLOGY 2013; 7:90. [PMID: 24041013 PMCID: PMC4231395 DOI: 10.1186/1752-0509-7-90] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2013] [Accepted: 08/01/2013] [Indexed: 12/23/2022]
Abstract
Molecular interactions are often represented as network models which have become the common language of many areas of biology. Graphs serve as convenient mathematical representations of network models and have themselves become objects of study. Their topology has been intensively researched over the last decade after evidence was found that they share underlying design principles with many other types of networks. Initial studies suggested that molecular interaction network topology is related to biological function and evolution. However, further whole-network analyses did not lead to a unified view on what this relation may look like, with conclusions highly dependent on the type of molecular interactions considered and the metrics used to study them. It is unclear whether global network topology drives function, as suggested by some researchers, or whether it is simply a byproduct of evolution or even an artefact of representing complex molecular interaction networks as graphs. Nevertheless, network biology has progressed significantly over the last years. We review the literature, focusing on two major developments. First, realizing that molecular interaction networks can be naturally decomposed into subsystems (such as modules and pathways), topology is increasingly studied locally rather than globally. Second, there is a move from a descriptive approach to a predictive one: rather than correlating biological network topology to generic properties such as robustness, it is used to predict specific functions or phenotypes. Taken together, this change in focus from globally descriptive to locally predictive points to new avenues of research. In particular, multi-scale approaches are developments promising to drive the study of molecular interaction networks further.
Affiliation(s)
- Wynand Winterbach
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Piet Van Mieghem
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Marcel Reinders
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Netherlands Bioinformatics Center, 6500 HB Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, 2600 GA Delft, The Netherlands
| | - Huijuan Wang
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Dick de Ridder
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Netherlands Bioinformatics Center, 6500 HB Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, 2600 GA Delft, The Netherlands
| |
|
127
|
Abstract
Next-generation sequencing projects continue to drive a vast accumulation of metagenomic sequence data. Given the growth rate of this data, automated approaches to functional annotation are indispensable and a cornerstone heuristic of many computational protocols is the concept of guilt by association. The guilt by association paradigm has been heavily exploited by genomic context methods that offer functional predictions that are complementary to homology-based annotations, thereby offering a means to extend functional annotation. In particular, operon methods that exploit co-directional intergenic distances can provide homology-free functional annotation through the transfer of functions among co-operonic genes, under the assumption that guilt by association is indeed applicable. Although guilt by association is a well-accepted annotative device, its applicability to metagenomic functional annotation has not been definitively demonstrated. Here a large-scale assessment of metagenomic guilt by association is undertaken where functional associations are predicted on the basis of co-directional intergenic distances. Specifically, functional annotations are compared within pairs of adjacent co-directional genes, as well as operons of various lengths (i.e. number of member genes), in order to reveal new information about annotative cohesion versus operon length. The results suggest that co-directional gene pairs offer reduced confidence for metagenomic guilt by association due to difficulty in resolving the existence of functional associations when intergenic distance is the sole predictor of pairwise gene interactions. However, metagenomic operons, particularly those with substantial lengths, appear to be capable of providing a superior basis for metagenomic guilt by association due to increased annotative stability. The need for improved recognition of metagenomic operons is discussed, as well as the limitations of the present work.
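The idea of predicting operons from co-directional intergenic distances can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the `Gene` record, the function `group_operons`, the single-contig input, and the distance cutoff `max_gap=100` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate on the contig
    end: int     # 1-based end coordinate (inclusive)
    strand: str  # "+" or "-"


def group_operons(genes, max_gap=100):
    """Group adjacent same-strand genes separated by at most max_gap bases.

    Assumes all genes lie on one contig. max_gap is a hypothetical
    cutoff, not a value taken from the abstract.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons = []
    current = []
    for gene in ordered:
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)  # extend the running operon
                continue
            operons.append(current)   # close the running operon
        current = [gene]              # start a new candidate operon
    if current:
        operons.append(current)
    return operons


# Toy contig: a and b are close and co-directional; c is far from b;
# d is close to c but on the opposite strand.
toy_genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 150, 300, "+"),
    Gene("c", 1000, 1200, "+"),
    Gene("d", 1250, 1400, "-"),
]
operon_names = [[g.name for g in op] for op in group_operons(toy_genes)]
print(operon_names)
```

Annotation transfer under guilt by association would then compare or copy functional labels among the members of each multi-gene group; single-gene groups carry no co-operonic evidence.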
Affiliation(s)
- Gregory Vey
- Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.
| |
|
128
|
Gillis J, Pavlidis P. Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA). BMC Bioinformatics 2013; 14 Suppl 3:S15. [PMID: 23630983 PMCID: PMC3633048 DOI: 10.1186/1471-2105-14-s3-s15] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
Abstract
The assignment of gene function remains a difficult but important task in computational biology. The establishment of the first Critical Assessment of Functional Annotation (CAFA) was aimed at increasing progress in the field. We present an independent analysis of the results of CAFA, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out to be either recovering existing knowledge about sequence similarity or were "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward.
Affiliation(s)
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 196 Genome Research Center, 500 Sunnyside Boulevard, Woodbury, NY 11797, USA
|
129
|
Hoehndorf R, Schofield PN, Gkoutos GV. An integrative, translational approach to understanding rare and orphan genetically based diseases. Interface Focus 2013; 3:20120055. [PMID: 23853703 PMCID: PMC3638468 DOI: 10.1098/rsfs.2012.0055] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 10/28/2012] [Accepted: 12/07/2012] [Indexed: 01/15/2023] Open
Abstract
PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to ‘guilt-by-association’ approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases.
Affiliation(s)
- Robert Hoehndorf
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK ; Department of Computer Science, University of Aberystwyth, Old College, King Street, Aberystwyth SY23 2AX, UK
|
130
|
Abstract
Many neurologic diseases cause discrete episodic impairment in contrast with progressive deterioration. The symptoms of these episodic disorders exhibit striking variety. Herein we review what is known of the phenotypes, genetics, and pathophysiology of episodic neurologic disorders. Of these, most are genetically complex, with unknown or polygenic inheritance. In contrast, a fascinating panoply of episodic disorders exhibit Mendelian inheritance. We classify episodic Mendelian disorders according to the primary neuroanatomical location affected: skeletal muscle, cardiac muscle, neuromuscular junction, peripheral nerve, or central nervous system (CNS). Most known Mendelian mutations alter genes that encode membrane-bound ion channels. These mutations cause ion channel dysfunction, which ultimately leads to altered membrane excitability as manifested by episodic disease. Other Mendelian disease genes encode proteins essential for ion channel trafficking or stability. These observations have cemented the channelopathy paradigm, in which episodic disorders are conceptualized as disorders of ion channels. However, we expand on this paradigm to propose that dysfunction at the synaptic and neuronal circuit levels may underlie some episodic neurologic entities.
Affiliation(s)
- Jonathan F Russell
- Department of Neurology, Howard Hughes Medical Institute, School of Medicine, University of California-San Francisco, CA 94158, USA.
|
131
|
Lan P, Li W, Lin WD, Santi S, Schmidt W. Mapping gene activity of Arabidopsis root hairs. Genome Biol 2013; 14:R67. [PMID: 23800126 PMCID: PMC3707065 DOI: 10.1186/gb-2013-14-6-r67] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 02/07/2013] [Accepted: 06/25/2013] [Indexed: 11/30/2022] Open
Abstract
Background Quantitative information on gene activity at single cell-type resolution is essential for the understanding of how cells work and interact. Root hairs, or trichoblasts, tubular-shaped outgrowths of specialized cells in the epidermis, represent an ideal model for cell fate acquisition and differentiation in plants. Results Here, we provide an atlas of gene and protein expression in Arabidopsis root hair cells, generated by paired-end RNA sequencing and LC/MS-MS analysis of protoplasts from plants containing a pEXP7-GFP reporter construct. In total, transcripts of 23,034 genes were detected in root hairs. High-resolution proteome analysis led to the reliable identification of 2,447 proteins, 129 of which were differentially expressed between root hairs and non-root hair tissue. Dissection of pre-mRNA splicing patterns showed that all types of alternative splicing were cell type-dependent, and less complex in EXP7-expressing cells when compared to non-root hair cells. Intron retention was repressed in several transcripts functionally related to root hair morphogenesis, indicative of a cell type-specific control of gene expression by alternative splicing of pre-mRNA. Concordance between mRNA and protein expression was generally high, but in many cases mRNA expression was not predictive for protein abundance. Conclusions The integrated analysis shows that gene activity in root hairs is dictated by orchestrated, multilayered regulatory mechanisms that allow for a cell type-specific composition of functional components.
|
132
|
Kleessen S, Klie S, Nikoloski Z. Data integration through proximity-based networks provides biological principles of organization across scales. THE PLANT CELL 2013; 25:1917-27. [PMID: 23749845 PMCID: PMC3723603 DOI: 10.1105/tpc.113.111039] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/27/2013] [Revised: 04/30/2013] [Accepted: 05/16/2013] [Indexed: 05/18/2023]
Abstract
Plant behaviors across levels of cellular organization, from biochemical components to tissues and organs, relate and reflect growth habitats. Quantification of the relationship between behaviors captured in various phenotypic characteristics and growth habitats can help reveal molecular mechanisms of plant adaptation. The aim of this article is to introduce the power of using statistics originally developed in the field of geographic variability analysis together with prominent network models in elucidating principles of biological organization. We provide a critical systematic review of the existing statistical and network-based approaches that can be employed to determine patterns of covariation from both uni- and multivariate phenotypic characteristics in plants. We demonstrate that parameter-independent network-based approaches result in robust insights about phenotypic covariation. These insights can be quantified and tested by applying well-established statistics combining the network structure with the phenotypic characteristics. We show that the reviewed network-based approaches are applicable from the level of genes to the study of individuals in a population of Arabidopsis thaliana. Finally, we demonstrate that the patterns of covariation can be generalized to quantifiable biological principles of organization. Therefore, these network-based approaches facilitate not only interpretation of large-scale data sets, but also prediction of biochemical and biological behaviors based on measurable characteristics.
Affiliation(s)
- Sabrina Kleessen
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Sebastian Klie
- Genes and Small Molecules Group, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Address correspondence to
|
133
|
Inferring gene function and network organization in Drosophila signaling by combined analysis of pleiotropy and epistasis. G3-GENES GENOMES GENETICS 2013; 3:807-14. [PMID: 23550134 PMCID: PMC3656728 DOI: 10.1534/g3.113.005710] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023]
Abstract
High-throughput genetic interaction screens have enabled functional genomics on a network scale. Groups of cofunctional genes commonly exhibit similar interaction patterns across a large network, leading to novel functional inferences for a minority of previously uncharacterized genes within a group. However, such analyses are often unsuited to cases with a few relevant gene variants or sparse annotation. Here we describe an alternative analysis of cell growth signaling using a computational strategy that integrates patterns of pleiotropy and epistasis to infer how gene knockdowns enhance or suppress the effects of other knockdowns. We analyzed the interaction network for RNAi knockdowns of a set of 93 incompletely annotated genes in a Drosophila melanogaster model of cellular signaling. We inferred novel functional relationships between genes by modeling genetic interactions in terms of knockdown-to-knockdown influences. The method simultaneously analyzes the effects of partially pleiotropic genes on multiple quantitative phenotypes to infer a consistent model of each genetic interaction. From these models we proposed novel candidate Ras inhibitors and their Ras signaling interaction partners, and each of these hypotheses can be inferred independent of network-wide patterns. At the same time, the network-scale interaction patterns consistently mapped pathway organization. The analysis therefore assigns functional relevance to individual genetic interactions while also revealing global genetic architecture.
|
134
|
Rouchka EC, Flight RM. Proceedings of the 12th Annual UT-ORNL-KBRIN Bioinformatics Summit 2013. BMC Bioinformatics 2013; 14 Suppl 17:A1. [PMID: 24625056 PMCID: PMC3853103 DOI: 10.1186/1471-2105-14-s17-a1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Eric C Rouchka
- Department of Computer Engineering and Computer Science, University of Louisville, Duthie Center for Engineering, Louisville, KY 40292, USA
- Robert M Flight
- Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
|
135
|
Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA. Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate. PLoS One 2013; 8:e56810. [PMID: 23468881 PMCID: PMC3585227 DOI: 10.1371/journal.pone.0056810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/21/2012] [Accepted: 01/14/2013] [Indexed: 01/25/2023] Open
Abstract
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Affiliation(s)
- Karen G. Dowell
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Allen K. Simons
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Zack Z. Wang
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Johns Hopkins University, Baltimore, Maryland, United States of America
- Kyuson Yun
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Matthew A. Hibbs
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Trinity University, Department of Computer Science, San Antonio, Texas, United States of America
- * E-mail:
|
136
|
Gillis J, Pavlidis P. Assessing identity, redundancy and confounds in Gene Ontology annotations over time. Bioinformatics 2013; 29:476-82. [PMID: 23297035 DOI: 10.1093/bioinformatics/bts727] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022]
Abstract
MOTIVATION The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. RESULTS We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. AVAILABILITY Data available at http://chibi.ubc.ca/assessGO.
Affiliation(s)
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 192B Genome Research Center, 500 Sunnyside Boulevard, Woodbury, NY 11797, USA
|
137
|
CINPER: an interactive web system for pathway prediction for prokaryotes. PLoS One 2012; 7:e51252. [PMID: 23236458 PMCID: PMC3517448 DOI: 10.1371/journal.pone.0051252] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/03/2012] [Accepted: 10/30/2012] [Indexed: 11/19/2022] Open
Abstract
We present a web-based network-construction system, CINPER (CSBL INteractive Pathway BuildER), to assist a user to build a user-specified gene network for a prokaryotic organism in an intuitive manner. CINPER builds a network model based on different types of information provided by the user and stored in the system. CINPER’s prediction process has four steps: (i) collection of template networks based on (partially) known pathways of related organism(s) from the SEED or BioCyc database and the published literature; (ii) construction of an initial network model based on the template networks using the P-Map program; (iii) expansion of the initial model, based on the association information derived from operons, protein-protein interactions, co-expression modules and phylogenetic profiles; and (iv) computational validation of the predicted models based on gene expression data. To facilitate easy applications, CINPER provides an interactive visualization environment for a user to enter, search and edit relevant data and for the system to display (partial) results and prompt for additional data. Evaluation of CINPER on 17 well-studied pathways in the MetaCyc database shows that the program achieves an average recall rate of 76% and an average precision rate of 90% on the initial models; and a higher average recall rate at 87% and an average precision rate at 28% on the final models. The reduced precision rate in the final models versus the initial models reflects the reality that the final models have large numbers of novel genes that have no experimental evidences and hence are not yet collected in the MetaCyc database. To demonstrate the usefulness of this server, we have predicted an iron homeostasis gene network of Synechocystis sp. PCC6803 using the server. The predicted models along with the server can be accessed at http://csbl.bmb.uga.edu/cinper/.
|
138
|
Abstract
Molecular network data are increasingly becoming available, necessitating the development of well performing computational tools for their analyses. Such tools enabled conceptually different approaches for exploring human diseases to be undertaken, in particular, those that study the relationship between a multitude of biomolecules within a cell. Hence, a new field of network biology has emerged as part of systems biology, aiming to untangle the complexity of cellular network organization. We survey current network analysis methods that aim to give insight into human disease.
Affiliation(s)
- Vuk Janjić
- Department of Computing, Imperial College London, 180 Queen's Gate, SW7 2AZ London, UK
|
139
|
|
140
|
Bassel GW, Gaudinier A, Brady SM, Hennig L, Rhee SY, De Smet I. Systems analysis of plant functional, transcriptional, physical interaction, and metabolic networks. THE PLANT CELL 2012; 24:3859-75. [PMID: 23110892 PMCID: PMC3517224 DOI: 10.1105/tpc.112.100776] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 06/14/2012] [Revised: 08/21/2012] [Accepted: 10/11/2012] [Indexed: 05/19/2023]
Abstract
Physiological responses, developmental programs, and cellular functions rely on complex networks of interactions at different levels and scales. Systems biology brings together high-throughput biochemical, genetic, and molecular approaches to generate omics data that can be analyzed and used in mathematical and computational models toward uncovering these networks on a global scale. Various approaches, including transcriptomics, proteomics, interactomics, and metabolomics, have been employed to obtain these data on the cellular, tissue, organ, and whole-plant level. We summarize progress on gene regulatory, cofunction, protein interaction, and metabolic networks. We also illustrate the main approaches that have been used to obtain these networks, with specific examples from Arabidopsis thaliana, and describe the pros and cons of each approach.
Affiliation(s)
- George W. Bassel
- School of Biosciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Division of Plant and Crop Sciences, School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Loughborough LE12 5RD, United Kingdom
- Allison Gaudinier
- Department of Plant Biology and Genome Center, University of California, Davis, California 95616
- Siobhan M. Brady
- Department of Plant Biology and Genome Center, University of California, Davis, California 95616
- Lars Hennig
- Department of Plant Biology and Forest Genetics, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, SE-75007 Uppsala, Sweden
- Seung Y. Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
- Ive De Smet
- Division of Plant and Crop Sciences, School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Loughborough LE12 5RD, United Kingdom
|
141
|
Pavlidis P, Gillis J. Progress and challenges in the computational prediction of gene function using networks. F1000Res 2012; 1:14. [PMID: 23936626 PMCID: PMC3782350 DOI: 10.12688/f1000research.1-14.v1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Accepted: 09/05/2012] [Indexed: 01/13/2023] Open
Abstract
In this opinion piece, we attempt to unify recent arguments we have made that serious confounds affect the use of network data to predict and characterize gene function. The development of computational approaches to determine gene function is a major strand of computational genomics research. However, progress beyond using BLAST to transfer annotations has been surprisingly slow. We have previously argued that a large part of the reported success in using "guilt by association" in network data is due to the tendency of methods to simply assign new functions to already well-annotated genes. While such predictions will tend to be correct, they are generic; it is true, but not very helpful, that a gene with many functions is more likely to have any function. We have also presented evidence that much of the remaining performance in cross-validation cannot be usefully generalized to new predictions, making progressive improvement in analysis difficult to engineer. Here we summarize our findings about how these problems will affect network analysis, discuss some ongoing responses within the field to these issues, and consolidate some recommendations and speculation, which we hope will modestly increase the reliability and specificity of gene function prediction.
Affiliation(s)
- Paul Pavlidis
- Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, Vancouver, V6T1Z4, Canada
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY, 11797, USA
|
142
|
|
143
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/14/2022]
|
144
|
Improving disease gene prioritization by comparing the semantic similarity of phenotypes in mice with those of human diseases. PLoS One 2012; 7:e38937. [PMID: 22719993 PMCID: PMC3375301 DOI: 10.1371/journal.pone.0038937] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/08/2011] [Accepted: 05/16/2012] [Indexed: 12/14/2022] Open
Abstract
Despite considerable progress in understanding the molecular origins of hereditary human diseases, the molecular basis of several thousand genetic diseases still remains unknown. High-throughput phenotype studies are underway to systematically assess the phenotype outcome of targeted mutations in model organisms. Thus, comparing the similarity between experimentally identified phenotypes and the phenotypes associated with human diseases can be used to suggest causal genes underlying a disease. In this manuscript, we present a method for disease gene prioritization based on comparing phenotypes of mouse models with those of human diseases. For this purpose, either human disease phenotypes are “translated” into a mouse-based representation (using the Mammalian Phenotype Ontology), or mouse phenotypes are “translated” into a human-based representation (using the Human Phenotype Ontology). We apply a measure of semantic similarity and rank experimentally identified phenotypes in mice with respect to their phenotypic similarity to human diseases. Our method is evaluated on manually curated and experimentally verified gene–disease associations for human and for mouse. We evaluate our approach using a Receiver Operating Characteristic (ROC) analysis and obtain an area under the ROC curve of up to . Furthermore, we are able to confirm previous results that the Vax1 gene is involved in Septo-Optic Dysplasia and suggest Gdf6 and Marcks as further potential candidates. Our method significantly outperforms previous phenotype-based approaches of prioritizing gene–disease associations. To enable the adaption of our method to the analysis of other phenotype data, our software and prioritization results are freely available under a BSD licence at http://code.google.com/p/phenomeblast/wiki/CAMP. Furthermore, our method has been integrated in PhenomeNET and the results can be explored using the PhenomeBrowser at http://phenomebrowser.net.
|