1
|
Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multi-network topology for gene function prediction using deep neural networks. Brief Bioinform 2020; 22:2096-2105. [PMID: 32249297 DOI: 10.1093/bib/bbaa036] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2019] [Revised: 02/09/2020] [Accepted: 02/25/2020] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. RESULTS Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. AVAILABILITY DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN. CONTACT jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Hansheng Xue
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Zhongyu Wei
- Research School of Computer Science, Australian National University, Canberra, 2601, Australia
| | - Idil Tuncali
- School of Data Science, Fudan University, Shanghai, 200433, China
| | | | | |
Collapse
|
2
|
Gligorijevic V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics 2019; 34:3873-3881. [PMID: 29868758 PMCID: PMC6223364 DOI: 10.1093/bioinformatics/bty440] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Accepted: 05/28/2018] [Indexed: 01/10/2023] Open
Abstract
Motivation The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly non-linear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. Results We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting gene ontology terms of varying type and specificity. Availability and implementation deepNF is freely available at: https://github.com/VGligorijevic/deepNF. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Vladimir Gligorijevic
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Meet Barot
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.,Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA.,Center for Data Science, New York University, New York, NY, USA
| |
Collapse
|
3
|
Kacsoh BZ, Barton S, Jiang Y, Zhou N, Mooney SD, Friedberg I, Radivojac P, Greene CS, Bosco G. New Drosophila Long-Term Memory Genes Revealed by Assessing Computational Function Prediction Methods. G3 (BETHESDA, MD.) 2019; 9:251-267. [PMID: 30463884 PMCID: PMC6325913 DOI: 10.1534/g3.118.200867] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/11/2018] [Accepted: 11/20/2018] [Indexed: 01/26/2023]
Abstract
A major bottleneck to our understanding of the genetic and molecular foundation of life lies in the ability to assign function to a gene and, subsequently, a protein. Traditional molecular and genetic experiments can provide the most reliable forms of identification, but are generally low-throughput, making such discovery and assignment a daunting task. The bottleneck has led to an increasing role for computational approaches. The Critical Assessment of Functional Annotation (CAFA) effort seeks to measure the performance of computational methods. In CAFA3, we performed selected screens, including an effort focused on long-term memory. We used homology and previous CAFA predictions to identify 29 key Drosophila genes, which we tested via a long-term memory screen. We identify 11 novel genes that are involved in long-term memory formation and show a high level of connectivity with previously identified learning and memory genes. Our study provides first higher-order behavioral assay and organism screen used for CAFA assessments and revealed previously uncharacterized roles of multiple genes as possible regulators of neuronal plasticity at the boundary of information acquisition and memory formation.
Collapse
Affiliation(s)
- Balint Z Kacsoh
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| | - Stephen Barton
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| | - Yuxiang Jiang
- Department of Computer Science, Indiana University, Bloomington, IN
| | - Naihui Zhou
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
| | - Predrag Radivojac
- College of Computer and Information Science, Northeastern University, Boston, MA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, 19104
| | - Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| |
Collapse
|
4
|
Kacsoh BZ, Greene CS, Bosco G. Machine Learning Analysis Identifies Drosophila Grunge/Atrophin as an Important Learning and Memory Gene Required for Memory Retention and Social Learning. G3 (BETHESDA, MD.) 2017; 7:3705-3718. [PMID: 28889104 PMCID: PMC5677163 DOI: 10.1534/g3.117.300172] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Accepted: 08/07/2017] [Indexed: 12/12/2022]
Abstract
High-throughput experiments are becoming increasingly common, and scientists must balance hypothesis-driven experiments with genome-wide data acquisition. We sought to predict novel genes involved in Drosophila learning and long-term memory from existing public high-throughput data. We performed an analysis using PILGRM, which analyzes public gene expression compendia using machine learning. We evaluated the top prediction alongside genes involved in learning and memory in IMP, an interface for functional relationship networks. We identified Grunge/Atrophin (Gug/Atro), a transcriptional repressor, histone deacetylase, as our top candidate. We find, through multiple, distinct assays, that Gug has an active role as a modulator of memory retention in the fly and its function is required in the adult mushroom body. Depletion of Gug specifically in neurons of the adult mushroom body, after cell division and neuronal development is complete, suggests that Gug function is important for memory retention through regulation of neuronal activity, and not by altering neurodevelopment. Our study provides a previously uncharacterized role for Gug as a possible regulator of neuronal plasticity at the interface of memory retention and memory extinction.
Collapse
Affiliation(s)
- Balint Z Kacsoh
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
| | - Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
| |
Collapse
|
5
|
Abstract
“Big Data” has surpassed “systems biology” and “omics” as the hottest buzzword in the biological sciences, but is there any substance behind the hype? Certainly, we have learned about various aspects of cell and molecular biology from the many individual high-throughput data sets that have been published in the past 15–20 years. These data, although useful as individual data sets, can provide much more knowledge when interrogated with Big Data approaches, such as applying integrative methods that leverage the heterogeneous data compendia in their entirety. Here we discuss the benefits and challenges of such Big Data approaches in biology and how cell and molecular biologists can best take advantage of them.
Collapse
Affiliation(s)
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540
| | - Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540 Department of Computer Science, Princeton University, Princeton, NJ 08540 Simons Center for Data Analysis, Simons Foundation, New York, NY 10010
| |
Collapse
|
6
|
Pazos Obregón F, Papalardo C, Castro S, Guerberoff G, Cantera R. Putative synaptic genes defined from a Drosophila whole body developmental transcriptome by a machine learning approach. BMC Genomics 2015; 16:694. [PMID: 26370122 PMCID: PMC4570697 DOI: 10.1186/s12864-015-1888-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2015] [Accepted: 09/01/2015] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, just a few hundreds of them are known so far. RESULTS In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others. Using statistical and biological criteria to analyze and combine the predictions, we obtained a gene catalogue that is highly enriched in genes of relevance for Drosophila synapse assembly and function but still not recognized as such. CONCLUSIONS The utility of our approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.
Collapse
Affiliation(s)
- Flavio Pazos Obregón
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Avenida Italia 3318, PC 11600, Montevideo, Uruguay.
| | - Cecilia Papalardo
- Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay.
| | - Sebastián Castro
- Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay.
| | - Gustavo Guerberoff
- Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay.
| | - Rafael Cantera
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Avenida Italia 3318, PC 11600, Montevideo, Uruguay.
- Zoology Department, Stockholm University, Stockholm, Sweden.
| |
Collapse
|
7
|
Kramer M, Dutkowski J, Yu M, Bafna V, Ideker T. Inferring gene ontologies from pairwise similarity data. Bioinformatics 2014; 30:i34-42. [PMID: 24932003 PMCID: PMC4058954 DOI: 10.1093/bioinformatics/btu282] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge-none has been evaluated for GO inference. METHODS We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. RESULTS For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall). CONCLUSION This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
Collapse
Affiliation(s)
- Michael Kramer
- Department of Medicine and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Janusz Dutkowski
- Department of Medicine and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Michael Yu
- Department of Medicine and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Vineet Bafna
- Department of Medicine and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Trey Ideker
- Department of Medicine and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| |
Collapse
|
8
|
ŽITNIK MARINKA, ZUPAN BLAŽ. Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2014:400-411. [PMID: 24297565 PMCID: PMC3902649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.
Collapse
Affiliation(s)
- MARINKA ŽITNIK
- Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, SI-1000, Slovenia,
| | - BLAŽ ZUPAN
- Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, SI-1000, Slovenia; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX-77030, USA,
| |
Collapse
|
9
|
Mitsakakis N, Razak Z, Escobar M, Westwood JT. Prediction of Drosophila melanogaster gene function using Support Vector Machines. BioData Min 2013; 6:8. [PMID: 23547736 PMCID: PMC3669044 DOI: 10.1186/1756-0381-6-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2012] [Accepted: 02/11/2013] [Indexed: 11/23/2022] Open
Abstract
Background While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from microarray experiments already exist and many of these can be used to help assign potential functions to these genes. We have applied Support Vector Machines (SVM), a sigmoid fitting function and a stratified cross‐validation approach to analyze a large microarray experiment dataset from Drosophila melanogaster in order to predict possible functions for previously un‐annotated genes. A total of approximately 5043 different genes, or about one‐third of the predicted genes in the D. melanogaster genome, are represented in the dataset and 1854 (or 37%) of these genes are un‐annotated. Results 39 Gene Ontology Biological Process (GO‐BP) categories were found with precision value equal or larger than 0.75, when recall was fixed at the 0.4 level. For two of those categories, we have provided additional support for assigning given genes to the category by showing that the majority of transcripts for the genes belonging in a given category have a similar localization pattern during embryogenesis. Additionally, by assessing the predictions using a confidence score, we have been able to provide a putative GO‐BP term for 1422 previously un‐annotated genes or about 77% of the un‐annotated genes represented on the microarray and about 19% of all of the un‐annotated genes in the D. melanogaster genome. Conclusions Our study successfully employs a number of SVM classifiers, accompanied by detailed calibration and validation techniques, to generate a number of predictions for new annotations for D. melanogaster genes. The applied probabilistic analysis to SVM output improves the interpretability of the prediction results and the objectivity of the validation procedure.
Collapse
Affiliation(s)
- Nicholas Mitsakakis
- Toronto Health Economics and Technology Assessment (THETA) Collaborative, University of Toronto, Toronto, Canada.
| | | | | | | |
Collapse
|
10
|
Ekins S, Shigeta R, Bunin BA. Bottlenecks caused by software gaps in miRNA and RNAi research. Pharm Res 2012; 29:1717-21. [PMID: 22362409 DOI: 10.1007/s11095-012-0712-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2012] [Accepted: 02/15/2012] [Indexed: 10/28/2022]
Abstract
Understanding the regulation of gene expression is critical to many areas of biology while control via RNAs has found considerable interest as a tool for scientific discovery and potential therapeutic applications. For example whole genome RNA interference (RNAi) screens and whole proteome scans provide views of how the entire transcriptome or proteome responds to biological, chemical or environmental perturbations of a gene's activity. Small RNA (sRNA) or MicroRNA (miRNA) are known to regulate pathways and bind mRNA, while the function of miRNAs discovered in experimental studies is often unknown. In both cases, RNAi and miRNA require labor intensive studies to tease out their functions within gene networks. Available software to analyze relationships is currently an ad hoc and often a manual process that can take up to several hours to analyze a single candidate RNAi or miRNA. With experiments frequently highlighting tens to hundreds of candidates this represents a considerable bottleneck. We suggest there is a gap in miRNA and RNAi research caused by inadequate current software that could be improved. For example a new software application could be created that provides interactive, comprehensive target analysis that leverages past datasets to lead to statistically stronger analyses.
Collapse
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, USA.
| | | | | |
Collapse
|
11
|
Flockhart IT, Booker M, Hu Y, McElvany B, Gilly Q, Mathey-Prevot B, Perrimon N, Mohr SE. FlyRNAi.org--the database of the Drosophila RNAi screening center: 2012 update. Nucleic Acids Res 2011; 40:D715-9. [PMID: 22067456 PMCID: PMC3245182 DOI: 10.1093/nar/gkr953] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
FlyRNAi (http://www.flyrnai.org), the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website is used as a platform for community availability of protocols, tools, and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.
Collapse
Affiliation(s)
- Ian T Flockhart
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | | | | | | | | | | | | | | |
Collapse
|
12
|
Greene CS, Troyanskaya OG. PILGRM: an interactive data-driven discovery platform for expert biologists. Nucleic Acids Res 2011; 39:W368-74. [PMID: 21653547 PMCID: PMC3125802 DOI: 10.1093/nar/gkr440] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
PILGRM (the platform for interactive learning by genomics results mining) puts advanced supervised analysis techniques applied to enormous gene expression compendia into the hands of bench biologists. This flexible system empowers its users to answer diverse biological questions that are often outside of the scope of common databases in a data-driven manner. This capability allows domain experts to quickly and easily generate hypotheses about biological processes, tissues or diseases of interest. Specifically PILGRM helps biologists generate these hypotheses by analyzing the expression levels of known relevant genes in large compendia of microarray data. Because PILGRM is data-driven, it complements a user’s knowledge and literature analysis with mining of diverse functional genomic data, thereby generating novel predictions that can drive experimental follow-up. This server is free, does not require registration and is available for use at http://pilgrm.princeton.edu.
Collapse
Affiliation(s)
- Casey S Greene
- Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
| | | |
Collapse
|
13
|
Korcsmáros T, Szalay MS, Rovó P, Palotai R, Fazekas D, Lenti K, Farkas IJ, Csermely P, Vellai T. Signalogs: orthology-based identification of novel signaling pathway components in three metazoans. PLoS One 2011; 6:e19240. [PMID: 21559328 PMCID: PMC3086880 DOI: 10.1371/journal.pone.0019240] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2010] [Accepted: 03/29/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Uncovering novel components of signal transduction pathways and their interactions within species is a central task in current biological research. Orthology alignment and functional genomics approaches allow the effective identification of signaling proteins by cross-species data integration. Recently, functional annotation of orthologs was transferred across organisms to predict novel roles for proteins. Despite the wide use of these methods, annotation of complete signaling pathways has not yet been transferred systematically between species. PRINCIPAL FINDINGS Here we introduce the concept of 'signalog' to describe potential novel signaling function of a protein on the basis of the known signaling role(s) of its ortholog(s). To identify signalogs on genomic scale, we systematically transferred signaling pathway annotations among three animal species, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and humans. Using orthology data from InParanoid and signaling pathway information from the SignaLink database, we predict 88 worm, 92 fly, and 73 human novel signaling components. Furthermore, we developed an on-line tool and an interactive orthology network viewer to allow users to predict and visualize components of orthologous pathways. We verified the novelty of the predicted signalogs by literature search and comparison to known pathway annotations. In C. elegans, 6 out of the predicted novel Notch pathway members were validated experimentally. Our approach predicts signaling roles for 19 human orthodisease proteins and 5 known drug targets, and suggests 14 novel drug target candidates. CONCLUSIONS Orthology-based pathway membership prediction between species enables the identification of novel signaling pathway components that we referred to as signalogs. Signalogs can be used to build a comprehensive signaling network in a given species. Such networks may increase the biomedical utilization of C. elegans and D. melanogaster. In humans, signalogs may identify novel drug targets and new signaling mechanisms for approved drugs.
Collapse
Affiliation(s)
- Tamás Korcsmáros
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | | | | | | | | | | | | | | | | |
Collapse
|