51
Xiang J, Zhang NR, Zhang JS, Lv XY, Li M. PrGeFNE: Predicting disease-related genes by fast network embedding. Methods 2020; 192:3-12. [PMID: 32610158 DOI: 10.1016/j.ymeth.2020.06.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/30/2020] [Revised: 06/13/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022] Open
Abstract
Identifying disease-related genes is important for understanding the molecular mechanisms of diseases, as well as for their diagnosis and treatment. Many computational methods have been proposed to predict disease-related genes, but making full use of multi-source biological data to improve disease-gene prediction remains challenging. In this paper, we propose a novel method for predicting disease-related genes by using fast network embedding (PrGeFNE), which can integrate multiple types of associations related to diseases and genes. Specifically, we first construct a heterogeneous network from phenotype-disease, disease-gene, protein-protein and gene-GO associations, and extract low-dimensional representations of its nodes with a fast network embedding algorithm. A dual-layer heterogeneous network is then reconstructed from the low-dimensional representations, and network propagation is applied to this dual-layer network to predict disease-related genes. Through cross-validation and validation on newly added associations, we show the important roles of the different types of association data in improving disease-gene prediction, and confirm the excellent performance of PrGeFNE in comparison with state-of-the-art algorithms. Furthermore, we developed a web tool that lets researchers search for candidate genes of different diseases predicted by PrGeFNE, along with GO and pathway enrichment analysis of the candidate gene set. This may be useful for investigating the molecular mechanisms of diseases and for their experimental validation. The web tool is available at http://bioinformatics.csu.edu.cn/prgefne/.
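As a rough illustration of the network-propagation step mentioned in this abstract, the sketch below runs a random walk with restart on a tiny single-layer gene network. It is not the PrGeFNE implementation: the abstract does not specify the propagation variant, so random walk with restart is an assumed stand-in, and the function name `random_walk_with_restart`, the toy network, the gene labels `g1` to `g4` and the restart probability of 0.5 are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate scores from seed nodes over an undirected network.

    adj   : dict mapping each node to the set of its neighbours
    seeds : nodes already known to be associated with the disease
    Returns a dict of steady-state visiting probabilities.
    Assumes every node has at least one neighbour, so probability mass is conserved.
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    # Initial distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise, move to a uniformly chosen neighbour.
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple path g1 - g2 - g3 - g4, with g1 as the known disease gene.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
}
scores = random_walk_with_restart(toy_network, seeds=["g1"])

# Rank the non-seed genes by propagated score (higher means closer to the seed).
candidates = sorted((g for g in scores if g != "g1"), key=scores.get, reverse=True)
print(candidates)
```

In a dual-layer setting the same iteration would run over a combined disease and gene network, with extra parameters controlling jumps between the two layers; those details are left out here because the abstract does not give them.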
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, 410219 Hunan, China
- Ning-Rui Zhang
  - College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Jia-Shuai Zhang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiao-Yi Lv
  - School of Software, Xinjiang University, Urumqi 830046, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
52
Zhao T, Hu Y, Zang T, Wang Y. Identifying Protein Biomarkers in Blood for Alzheimer's Disease. Front Cell Dev Biol 2020; 8:472. [PMID: 32626709 PMCID: PMC7314983 DOI: 10.3389/fcell.2020.00472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/12/2020] [Accepted: 05/20/2020] [Indexed: 12/26/2022] Open
Abstract
Background: At present, the main diagnostic methods for Alzheimer's disease (AD) are positron emission tomography (PET) scanning of the brain and analysis of cerebrospinal fluid (CSF) samples, but these methods are expensive and harmful to patients. Recently, more researchers have focused on diagnosing AD by detecting biomarkers in blood, which is cheaper and harmless. Therefore, identifying AD-related proteins in blood can help treatment and diagnosis. Methods: We hypothesized that similar diseases share similar proteins, that is, diseases with similar symptoms are caused by abnormalities of similar proteins. Assuming that the similarities between AD and other diseases follow a normal distribution, we developed an iterative method based on disease similarity (IBDS). We combined Elastic Network (EN) with Minimum angle regression (MAR) to find the optimal solution. Finally, we used case studies and summary data-based Mendelian randomization (SMR) to verify our method. Results: We selected 39 diseases that are highly related to AD. They correspond to 1,481 kinds of proteins. One hundred and eighty-four proteins are reported to be related to AD in UniProt, and the number would be 284 with our method. The AUC of our method by cross-validation is 0.9251, which is much higher than that of previous methods. Conclusion: In this paper, we presented a novel method for prioritizing AD-related proteins. Seven of these 284 proteins have tissue specificity in blood and could be used to diagnose AD in the future. Case studies and SMR were used to prove the relationship between these 7 proteins and AD. Availability and Implementation: https://github.com/zty2009/Identifying-Protein-Biomarkers-in-Blood-for-Alzheimer-s-Disease.
Affiliation(s)
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
53
Ivanov S, Lagunin A, Filimonov D, Tarasova O. Network-Based Analysis of OMICs Data to Understand the HIV-Host Interaction. Front Microbiol 2020; 11:1314. [PMID: 32625189 PMCID: PMC7311653 DOI: 10.3389/fmicb.2020.01314] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/09/2020] [Accepted: 05/25/2020] [Indexed: 12/22/2022] Open
Abstract
The interaction of human immunodeficiency virus with human cells is responsible for all stages of the viral life cycle, from the infection of CD4+ cells to reverse transcription, integration, and the assembly of new viral particles. To date, a large amount of OMICs data as well as information from functional genomics screenings regarding the HIV–host interaction has been accumulated in the literature and in public databases. We processed databases containing HIV–host interactions and found 2910 HIV-1-human protein-protein interactions, mostly related to viral group M subtype B, 137 interactions between human and HIV-1 coding and non-coding RNAs, essential for viral lifecycle and cell defense mechanisms, 232 transcriptomics, 27 proteomics, and 34 epigenomics HIV-related experiments. Numerous studies regarding network-based analysis of corresponding OMICs data have been published in recent years. We overview various types of molecular networks, which can be created using OMICs data, including HIV–human protein–protein interaction networks, co-expression networks, gene regulatory and signaling networks, and approaches for the analysis of their topology and dynamics. The network-based analysis can be used to determine the critical pathways and key proteins involved in the HIV life cycle, cellular and immune responses to infection, viral escape from host defense mechanisms, and mechanisms mediating different susceptibility of humans to infection. The proteins and pathways identified in these studies represent a basis for developing new anti-HIV therapeutic strategies such as new drugs preventing infection of CD4+ cells and viral replication, effective vaccines, “shock and kill” and “block and lock” approaches to cure latent infection.
Affiliation(s)
- Sergey Ivanov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia; Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia; Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
- Dmitry Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Olga Tarasova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
54
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020; 36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT arjun@msu.edu.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering
- Kayla A Johnson
  - Department of Computational Mathematics, Science and Engineering
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
  - Department of Computational Mathematics, Science and Engineering
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
55
Yan W, Liu X, Wang Y, Han S, Wang F, Liu X, Xiao F, Hu G. Identifying Drug Targets in Pancreatic Ductal Adenocarcinoma Through Machine Learning, Analyzing Biomolecular Networks, and Structural Modeling. Front Pharmacol 2020; 11:534. [PMID: 32425783 PMCID: PMC7204992 DOI: 10.3389/fphar.2020.00534] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/04/2020] [Accepted: 04/06/2020] [Indexed: 12/16/2022] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is one of the leading causes of cancer-related death and has an extremely poor prognosis. Thus, identifying new disease-associated genes and targets for PDAC diagnosis and therapy is urgently needed. This requires investigations into the underlying molecular mechanisms of PDAC at both the systems and molecular levels. Herein, we developed a computational method of predicting cancer genes and anticancer drug targets that combined three independent expression microarray datasets of PDAC patients and protein-protein interaction data. First, Support Vector Machine–Recursive Feature Elimination was applied to the gene expression data to rank the differentially expressed genes (DEGs) between PDAC patients and controls. Then, protein-protein interaction networks were constructed based on the DEGs, and a new score comprising gene expression and network topological information was proposed to identify cancer genes. Finally, these genes were validated by “druggability” prediction, survival and common network analysis, and functional enrichment analysis. Furthermore, two integrins were screened to investigate their structures and dynamics as potential drug targets for PDAC. Collectively, 17 disease genes and some stroma-related pathways including extracellular matrix-receptor interactions were predicted to be potential drug targets and important pathways for treating PDAC. The protein-drug interactions and hinge-site predictions for ITGAV and ITGA2 suggest potential drug binding residues in the Thigh domain. These findings provide new possibilities for targeted therapeutic interventions in PDAC, which may have further applications in other cancer types.
Affiliation(s)
- Wenying Yan
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Xingyi Liu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Yibo Wang
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Shuqing Han
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Fan Wang
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Xin Liu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Fei Xiao
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Guang Hu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China; State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
56
Zeng M, Li M, Wu FX, Li Y, Pan Y. DeepEP: a deep learning framework for identifying essential proteins. BMC Bioinformatics 2019; 20:506. [PMID: 31787076 PMCID: PMC6886168 DOI: 10.1186/s12859-019-3076-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/20/2022] Open
Abstract
Background: Essential proteins are crucial for cellular life, so identifying them is an important and challenging problem. Recently, many computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics. Results: We develop DeepEP, a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the imbalanced characteristics. The sampling method draws the same number of majority and minority samples in each training epoch, so training is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance, indicating that node2vec effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins. Conclusion: We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.
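The per-epoch balanced sampling described in this abstract can be sketched as follows. This is only one plausible reading, not DeepEP's code: it assumes that all minority-class samples are kept and an equally sized random subset of the majority class is redrawn every epoch. The function name `balanced_epoch_indices` and the toy labels are hypothetical.

```python
import random


def balanced_epoch_indices(labels, rng):
    """Return shuffled sample indices with equal numbers of both classes.

    labels : sequence of 0/1 class labels (1 = essential, 0 = non-essential)
    rng    : a random.Random instance, so the draw is reproducible
    Assumption: every minority sample is used, and the majority class is
    subsampled without replacement to the same size.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    picked = minority + rng.sample(majority, len(minority))
    rng.shuffle(picked)
    return picked


# Hypothetical imbalanced data set: 3 essential proteins among 20.
toy_labels = [1] * 3 + [0] * 17
rng = random.Random(0)
for epoch in range(2):
    batch = balanced_epoch_indices(toy_labels, rng)
    # A new majority subset is drawn each epoch; the class counts stay equal.
    print(epoch, len(batch), sum(toy_labels[i] for i in batch))
```

Redrawing the majority subset every epoch lets the model eventually see most majority samples while each individual epoch stays balanced.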
Affiliation(s)
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, People's Republic of China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, People's Republic of China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Yi Pan
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
57
Sabetian S, Shamsir MS. Computer aided analysis of disease linked protein networks. Bioinformation 2019; 15:513-522. [PMID: 31485137 PMCID: PMC6704336 DOI: 10.6026/97320630015513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/21/2019] [Revised: 04/16/2019] [Accepted: 04/17/2019] [Indexed: 12/26/2022] Open
Abstract
Proteins can interact in various ways, ranging from direct physical relationships to indirect interactions, forming a protein-protein interaction network. Identifying protein connections is critical for identifying various cellular pathways. Today, constructing and analyzing protein interaction networks is being developed as a powerful approach in network pharmacology for detecting unknown genes and proteins associated with diseases. The discovery of drug targets to inform therapeutic decisions is an exciting outcome of studying disease networks. Protein connections may be identified by experimental approaches and by more recent computational approaches. Because of the difficulties in analyzing protein interactions in vivo, many researchers have been encouraged to improve computational methods for constructing protein interaction networks. This review covers the experimental and computational approaches, together with their advantages and disadvantages, for identifying new interactions in a molecular mechanism. Systematic analysis of complex biological systems, including network pharmacology and disease networks, is also discussed.
Affiliation(s)
- Soudabeh Sabetian
  - Department of Biological and Health Sciences, Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
  - Infertility Research Center, Shiraz University, Shiraz 71454, Iran, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohd Shahir Shamsir
  - Department of Biological and Health Sciences, Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
58
Wen QF, Liu S, Dong C, Guo HX, Gao YZ, Guo FB. Geptop 2.0: An Updated, More Precise, and Faster Geptop Server for Identification of Prokaryotic Essential Genes. Front Microbiol 2019; 10:1236. [PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/02/2019] [Accepted: 05/17/2019] [Indexed: 12/16/2022] Open
Abstract
Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and the information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species) and by introducing multi-process technology to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 generates more stable predictions and finishes the computation in a shorter time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research in gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
Affiliation(s)
- Qing-Feng Wen
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Shuo Liu
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chuan Dong
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hai-Xia Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yi-Zhou Gao
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China