1
Lu X, Chen G, Li J, Hu X, Sun F. MAGCN: A Multiple Attention Graph Convolution Networks for Predicting Synthetic Lethality. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2681-2689. [PMID: 36374879 DOI: 10.1109/tcbb.2022.3221736] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Synthetic lethality (SL) is a potential strategy for cancer therapy and drug discovery. Computational approaches to identifying synthetic lethal gene pairs have become an effective complement to wet-lab experiments, which are time-consuming and costly. Graph convolutional networks (GCNs) have been applied to this prediction task because they are good at capturing neighborhood dependencies in a graph. However, a mechanism for aggregating complementary neighborhood information from multiple heterogeneous graphs is still lacking. Here, we propose Multiple Attention Graph Convolution Networks for predicting synthetic lethality (MAGCN). First, we obtain functional similarity features and topological structure features of genes from different data sources, such as Gene Ontology data and protein-protein interaction data. Then, a graph convolutional network is used to accumulate knowledge from neighbor nodes according to synthetic lethal associations. Meanwhile, we propose a multiple-graph attention model and construct a multiple-graph attention network that learns the contribution of each graph and generates an embedded representation by aggregating these graphs. Finally, the generated feature matrix is decoded to predict potential synthetic lethal interactions. Experimental results show that MAGCN outperforms the baseline methods. A case study demonstrates the ability of MAGCN to predict human SL gene pairs.
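The pipeline described in this abstract (per-source features, graph convolution over known SL pairs, attention across graphs, decoding) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the attention form (a softmax over mean-pooled `tanh` embeddings scored against a hypothetical `query` vector) and the inner-product decoder are choices made here, and all names such as `x_go`, `x_ppi` and `attention_fuse` are hypothetical.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # self-loops keep degrees >= 1
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, feats, weight):
    """One propagation step with ReLU; neighbors are defined by known SL pairs."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)

def attention_fuse(embeddings, query):
    """Softmax attention over graphs; returns the fused embedding and graph weights."""
    scores = np.array([np.tanh(h).mean(axis=0) @ query for h in embeddings])
    alpha = np.exp(scores - scores.max())  # subtract the max for numerical stability
    alpha /= alpha.sum()
    fused = sum(a * h for a, h in zip(alpha, embeddings))
    return fused, alpha

def decode(z):
    """Inner-product decoder (assumed): sigmoid(z z^T) scores every gene pair."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Toy data: 5 genes, 3 known SL pairs, two hypothetical feature sources.
rng = np.random.default_rng(0)
n_genes, d_in, d_hid = 5, 4, 3
sl_adj = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    sl_adj[i, j] = sl_adj[j, i] = 1.0
x_go = rng.random((n_genes, d_in))   # stand-in for GO functional similarity features
x_ppi = rng.random((n_genes, d_in))  # stand-in for PPI topological features

adj_norm = normalize_adj(sl_adj)
h_go = gcn_layer(adj_norm, x_go, rng.normal(size=(d_in, d_hid)))
h_ppi = gcn_layer(adj_norm, x_ppi, rng.normal(size=(d_in, d_hid)))
fused, alpha = attention_fuse([h_go, h_ppi], rng.normal(size=d_hid))
sl_scores = decode(fused)  # (n_genes, n_genes) matrix of predicted SL scores
```

In a trained model the layer weights and the attention parameters would be learned from known SL pairs; here `alpha` only shows how the per-graph contribution factors enter the aggregation.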
2
Wang J, Zhang Q, Han J, Zhao Y, Zhao C, Yan B, Dai C, Wu L, Wen Y, Zhang Y, Leng D, Wang Z, Yang X, He S, Bo X. Computational methods, databases and tools for synthetic lethality prediction. Brief Bioinform 2022; 23:6555403. [PMID: 35352098 PMCID: PMC9116379 DOI: 10.1093/bib/bbac106] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/09/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 12/17/2022]
Abstract
Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade, as PARP inhibitors have achieved great success in the clinic. The key to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab methods have been developed to screen SL pairs, known SL pairs account for less than 0.1% of all potential pairs because of the large number of human gene combinations. Computational prediction methods complement wet-lab methods by effectively reducing the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.
Affiliation(s)
- Jing Wang, Qinglong Zhang, Junshan Han, Yanpeng Zhao, Caiyun Zhao, Bowei Yan, Chong Dai, Lianlian Wu, Yuqi Wen, Yixin Zhang, Dongjin Leng, Zhongming Wang, Xiaoxi Yang, Song He, Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
3
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022]
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
4
Cáceres JJ, Paccanaro A. Disease gene prediction for molecularly uncharacterized diseases. PLoS Comput Biol 2019; 15:e1007078. [PMID: 31276496 PMCID: PMC6636748 DOI: 10.1371/journal.pcbi.1007078] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/05/2018] [Revised: 07/17/2019] [Accepted: 05/09/2019] [Indexed: 02/06/2023]
Abstract
Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of disease genes associated with a disease, neighbourhood-based methods and random walkers exploit the interactome allowing the prediction of further genes for that disease. In general, however, diseases with no known molecular basis constitute a challenge. Here we present a novel network approach to prioritize gene-disease associations that is able to also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases in OMIM, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, prove that Cardigan is able to accurately predict disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross validation tests show how our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 14%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 87%-299%.
Affiliation(s)
- Juan J. Cáceres
  - Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
- Alberto Paccanaro
  - Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
5
Kiblawi S, Chasman D, Henning A, Park E, Poon H, Gould M, Ahlquist P, Craven M. Augmenting subnetwork inference with information extracted from the scientific literature. PLoS Comput Biol 2019; 15:e1006758. [PMID: 31246951 PMCID: PMC6619809 DOI: 10.1371/journal.pcbi.1006758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2018] [Revised: 07/10/2019] [Accepted: 01/04/2019] [Indexed: 11/20/2022]
Abstract
Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.
There is a multitude of publicly available databases that contain information about biological entities (i.e., genes, proteins, and other small molecules) as well as information about how these entities interact together. However, these databases are often incomplete. There is a wealth of information present in the text of the scientific literature that is not yet available in these databases. Using tools that mine the scientific literature we are able to extract some of this potentially relevant information. In this work we show how we can use publicly available databases in conjunction with the information extracted from the scientific literature to infer the networks that are involved in specific biological processes, such as viral replication and cancer tumor growth.
Affiliation(s)
- Sid Kiblawi
  - Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- Deborah Chasman
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI, USA
- Amanda Henning
  - Department of Oncology, University of Wisconsin, Madison, WI, USA
- Eunju Park
  - Institute for Molecular Virology, University of Wisconsin, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
- Michael Gould
  - Department of Oncology, University of Wisconsin, Madison, WI, USA
- Paul Ahlquist
  - Institute for Molecular Virology, University of Wisconsin, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
  - Howard Hughes Medical Institute, University of Wisconsin, Madison, WI, USA
- Mark Craven
  - Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
6
Frasca M, Bertoni A, Valentini G. UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions. J Comput Biol 2015; 22:1057-74. [PMID: 26402488 DOI: 10.1089/cmb.2014.0110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022]
Abstract
The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins represent two of the main issues of the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not consider these two issues jointly. This has a negative impact on both sensitivity and precision, because of the unbalance between annotated and unannotated proteins that characterizes the majority of functional classes and because of the specific and complementary information content embedded in each available source of data. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.
Affiliation(s)
- Marco Frasca
  - DI - Department of Computer Science, University of Milan, Milan, Italy
- Alberto Bertoni
  - DI - Department of Computer Science, University of Milan, Milan, Italy
- Giorgio Valentini
  - DI - Department of Computer Science, University of Milan, Milan, Italy
7
Qin T, Matmati N, Tsoi LC, Mohanty BK, Gao N, Tang J, Lawson AB, Hannun YA, Zheng WJ. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Res 2014; 42:e138. [PMID: 25063300 PMCID: PMC4191379 DOI: 10.1093/nar/gku678] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023]
Abstract
To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general.
Affiliation(s)
- Tingting Qin
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Nabil Matmati
  - The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- Lam C Tsoi
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Bidyut K Mohanty
  - Department of Biochemistry & Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA
- Nan Gao
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
  - Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Andrew B Lawson
  - Department of Public Health Science, Medical University of South Carolina, Charleston, SC 29425, USA
- Yusuf A Hannun
  - The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- W Jim Zheng
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
8
Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014; 61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. MATERIALS AND METHODS We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. RESULTS The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. CONCLUSIONS Network integration is necessary to boost the performances of gene prioritization methods. 
Moreover, methods based on kernelized score functions can further improve disease gene ranking by adopting both local and global learning strategies that exploit the overall topology of the network.
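As a rough illustration of two ingredients named in this abstract, weighted network integration and random walk with restart, the sketch below combines adjacency matrices with per-network weights and then ranks genes from a set of seed genes. It is a minimal sketch under stated assumptions: the weights, the restart probability of 0.3, and the function names `integrate` and `random_walk_with_restart` are hypothetical and are not taken from the paper.

```python
import numpy as np

def integrate(networks, weights=None):
    """Weighted average of adjacency matrices; equal weights give unweighted integration."""
    if weights is None:
        weights = np.ones(len(networks))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * net for wi, net in zip(w, networks))

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * T p + r * p0 until the L1 change drops below tol."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    trans = adj / col_sums             # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)       # restart distribution over known disease genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (trans @ p) + restart * p0
        delta = np.abs(p_next - p).sum()
        p = p_next
        if delta < tol:
            break
    return p, trans, p0

# Toy example: two 4-gene networks, gene 0 is the only known disease gene.
net_a = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
net_b = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
combined = integrate([net_a, net_b], weights=[0.7, 0.3])  # hypothetical weights
scores, trans, p0 = random_walk_with_restart(combined, seeds=[0])
ranking = np.argsort(-scores)  # genes ordered from most to least related to the seeds
```

Passing `weights=None` gives the unweighted integration used as the comparison point in the abstract; how the per-network weights were actually chosen is not stated here.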
Affiliation(s)
- Giorgio Valentini
  - AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
- Alberto Paccanaro
  - Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Horacio Caniza
  - Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Alfonso E Romero
  - Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Matteo Re
  - AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
9
Mesiti M, Re M, Valentini G. Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction. Gigascience 2014; 3:5. [PMID: 24843788 PMCID: PMC4006453 DOI: 10.1186/2047-217x-3-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/04/2013] [Accepted: 04/01/2014] [Indexed: 01/08/2023]
Abstract
Background Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks. Distributed computation or the usage of big computers could in principle respond to these issues, but raises further algorithmic problems and require resources not satisfiable with simple off-the-shelf computers. Results We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve “locally” the AFP problem, by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally” by exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 Eukaryotic species. To our knowledge this is the first work where secondary-memory based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. 
Conclusions The combination of these algorithmic and technological approaches makes feasible the analysis of large multi-species networks using ordinary computers with limited speed and primary memory, and in perspective could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt), using well-equipped stand-alone machines.
Affiliation(s)
- Marco Mesiti
  - AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
- Matteo Re
  - AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
- Giorgio Valentini
  - AnacletoLab - Department of Computer Science, University of Milano, Via Comelico 39/41, 20135 Milano, Italy
10
Re M, Valentini G. Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:1359-1371. [PMID: 24407295 DOI: 10.1109/tcbb.2013.62] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 06/03/2023]
Abstract
Drug repositioning is a challenging computational problem involving the integration of heterogeneous sources of biomolecular data and the design of label ranking algorithms able to exploit the overall topology of the underlying pharmacological network. In this context, we propose a novel semisupervised drug ranking problem: prioritizing drugs in integrated biochemical networks according to specific DrugBank therapeutic categories. Algorithms for drug repositioning usually perform the inference step into an inhomogeneous similarity space induced by the relationships existing between drugs and a second type of entity (e.g., disease, target, ligand set), thus making unfeasible a drug ranking within a homogeneous pharmacological space. To deal with this problem, we designed a general framework based on bipartite network projections by which homogeneous pharmacological networks can be constructed and integrated from heterogeneous and complementary sources of chemical, biomolecular and clinical information. Moreover, we present a novel algorithmic scheme based on kernelized score functions that adopts both local and global learning strategies to effectively rank drugs in the integrated pharmacological space using different network combination methods. Detailed experiments with more than 80 DrugBank therapeutic categories involving about 1,300 FDA-approved drugs show the effectiveness of the proposed approach.
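The bipartite network projection mentioned in this abstract can be illustrated with a small example: a drug-by-target incidence matrix is projected onto the drug side so that two drugs are linked when they share at least one target. This is a generic one-mode projection written as a sketch; the matrix values and the name `project_bipartite` are hypothetical, and the paper may weight or normalize the projection differently.

```python
import numpy as np

def project_bipartite(incidence):
    """One-mode projection: entry (i, j) counts the targets shared by drugs i and j."""
    proj = incidence @ incidence.T
    np.fill_diagonal(proj, 0)  # drop self-links
    return proj

# Rows are drugs, columns are targets (1 = the drug acts on the target).
drug_target = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])
drug_net = project_bipartite(drug_target)
# drug_net is a homogeneous drug-drug network: drugs 0 and 1 share one target,
# drugs 1 and 2 share one target, and drugs 0 and 2 share none.
```

Projections built this way from different second entity types (targets, diseases, chemical features) all live on the same set of drug nodes, which is what makes them straightforward to integrate afterwards.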
Affiliation(s)
- Matteo Re
  - Universita degli Studi di Milano, Milano
11
Re M, Mesiti M, Valentini G. A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1812-1818. [PMID: 23221088 DOI: 10.1109/tcbb.2012.114] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 06/01/2023]
Abstract
Ranking genes in functional networks according to a specific biological function is a challenging task that raises both prediction performance and computational complexity problems. To cope with both problems, we developed a transductive gene ranking method based on kernelized score functions that fully exploits the topology and graph structure of biomolecular networks and captures significant functional relationships between genes. We ran the method on a network constructed by integrating multiple biomolecular data sources in the yeast model organism, achieving significantly better results than the state-of-the-art network-based algorithms for gene function prediction that we compared against, with substantial savings in computational time. The proposed approach is general and fast enough to be applied in the future to other node ranking problems in large and complex biological networks.
Affiliation(s)
- Matteo Re
  - Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39/41, I-20135 Milano, Italia
12
Abstract
Background Co-expression based Cancer Modules (CMs) are sets of genes that act in concert to carry out specific functions in different cancer types, and are constructed by exploiting gene expression profiles related to specific clinical conditions or expression signatures associated to specific processes altered in cancer. Unfortunately, genes involved in cancer are not always detectable using only expression signatures or co-expressed sets of genes, and in principle other types of functional interactions should be exploited to obtain a comprehensive picture of the molecular mechanisms underlying the onset and progression of cancer. Results We propose a novel semi-supervised method to rank genes with respect to CMs using networks constructed from different sources of functional information, not limited to gene expression data. It exploits on the one hand local learning strategies through score functions that extend the guilt-by-association approach, and on the other hand global learning strategies through graph kernels embedded in the score functions, able to take into account the overall topology of the network. The proposed kernelized score functions compare favorably with other state-of-the-art semi-supervised machine learning methods for gene ranking in biological networks and scales well with the number of genes, thus allowing fast processing of very large gene networks. Conclusions The modular nature of kernelized score functions provides an algorithmic scheme from which different gene ranking algorithms can be derived, and the results show that using integrated functional networks we can successfully predict CMs defined mainly through expression signatures obtained from gene expression data profiling. A preliminary analysis of top ranked "false positive" genes shows that our approach could be in perspective applied to discover novel genes involved in the onset and progression of tumors related to specific CMs.
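A kernelized score function of the kind described in this abstract can be sketched as follows: a graph kernel supplies the global view of the network, and a simple local score (here, the average kernel value between a gene and the known module genes) extends guilt-by-association. The specific choices below are assumptions made for illustration and are not confirmed by the abstract: the p-step random walk kernel K = ((a - 1) I + D^-1/2 W D^-1/2)^p, the average score, and the names `random_walk_kernel` and `average_score`.

```python
import numpy as np

def random_walk_kernel(adj, a=2.0, steps=1):
    """p-step random walk kernel ((a - 1) I + D^-1/2 W D^-1/2)^p (assumed form)."""
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5  # isolated nodes keep a zero scaling factor
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    one_step = (a - 1.0) * np.eye(adj.shape[0]) + norm_adj
    return np.linalg.matrix_power(one_step, steps)

def average_score(kernel, positives):
    """Score each gene by its mean kernel similarity to the known positive genes."""
    return kernel[:, positives].mean(axis=1)

# Toy network: a path 0 - 1 - 2 - 3, with gene 0 as the only known module gene.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
kernel = random_walk_kernel(adj, a=2.0, steps=2)
scores = average_score(kernel, positives=[0])
ranking = np.argsort(-scores)  # candidate genes ordered by decreasing score
```

With `steps=1` only direct neighbors of the positives receive a nonzero score, which is close to plain guilt-by-association; raising `steps` lets the score reach genes further away in the network, which is the sense in which the kernel brings in the global topology.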
Affiliation(s)
- Matteo Re
  - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano MI, Italia
13
Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics data and methods. Ann N Y Acad Sci 2012; 1260:95-100. [PMID: 22268703 DOI: 10.1111/j.1749-6632.2011.06383.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/27/2022]
Abstract
The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more-accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner.
Affiliation(s)
- Casey S Greene
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
14
Large Scale Ranking and Repositioning of Drugs with Respect to DrugBank Therapeutic Categories. BIOINFORMATICS RESEARCH AND APPLICATIONS 2012. [DOI: 10.1007/978-3-642-30191-9_21] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]