1
Xu M, Abdullah NA, Md Sabri AQ. A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Comput Biol Chem 2024; 108:107997. [PMID: 38154318 DOI: 10.1016/j.compbiolchem.2023.107997] [Received: 07/16/2023] [Revised: 11/03/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023]
Abstract
This work focuses on data sampling in cancer-gene association prediction. Researchers currently use machine learning methods to predict the genes most likely to carry cancer-causing mutations. Several approaches have been proposed to improve the performance of these models, one of which is to improve the quality of the training data. Existing methods mainly screen the positive data, i.e., cancer driver genes. This paper proposes a method, based on gene network data and graph theory algorithms, for screening genes with low cancer relevance in order to improve negative sample selection. Genes with low cancer correlation are then used as negative training samples. Experiments show that training a cancer gene classification model on the negative samples screened by this method improves prediction performance. The main advantage of the method is that it can easily be combined with other methods that enhance the quality of positive training samples. Combining it with three state-of-the-art cancer gene prediction methods yielded significant improvements.
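The abstract does not state which graph algorithm is used to screen low-cancer-related genes. As a hypothetical illustration only, one simple graph-theoretic screen ranks genes by their shortest-path (hop) distance from known driver genes and keeps the distant ones as candidate negatives. The toy network, the gene labels `G1`-`G4` and the `min_distance` threshold are assumptions, not the authors' method.

```python
from collections import deque

def distance_to_seeds(graph, seeds):
    """Multi-source BFS: hop distance from every reachable gene to its nearest seed."""
    dist = {s: 0 for s in seeds if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def screen_negatives(graph, drivers, min_distance=2):
    """Keep non-driver genes at least `min_distance` hops from every driver.
    Genes unreachable from all drivers are kept as well (distance = infinity)."""
    dist = distance_to_seeds(graph, drivers)
    return sorted(
        gene for gene in graph
        if gene not in drivers and dist.get(gene, float("inf")) >= min_distance
    )

# Hypothetical toy gene network (undirected adjacency sets); TP53 is the known driver.
network = {
    "TP53": {"MDM2", "ATM"},
    "MDM2": {"TP53", "G1"},
    "ATM": {"TP53"},
    "G1": {"MDM2", "G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2"},
    "G4": set(),
}
print(screen_negatives(network, {"TP53"}))  # ['G1', 'G2', 'G3', 'G4']
```

Raising `min_distance` trades the number of negatives for stricter separation from the known drivers.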
Affiliation(s)
- Mingzhe Xu
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia; School of Energy and Intelligence Engineering, Henan University of Animal Husbandry and Economy, #6 North Longzihu Rd, Zhengzhou 450000, China.
- Nor Aniza Abdullah
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
- Aznul Qalid Md Sabri
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
2
Renaux A, Terwagne C, Cochez M, Tiddi I, Nowé A, Lenaerts T. A knowledge graph approach to predict and interpret disease-causing gene interactions. BMC Bioinformatics 2023; 24:324. [PMID: 37644440 PMCID: PMC10463539 DOI: 10.1186/s12859-023-05451-5] [Received: 06/07/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected in the growing amount of clinical research on oligogenic diseases, in which disease manifestations are influenced by combinations of variants in a few specific genes. Although statistical machine-learning methods have been developed to identify relevant combinations of genetic variants or genes associated with oligogenic diseases, they rely on abstract features and black-box models, which makes them hard for medical experts to interpret and impedes their ability to understand and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, which integrates curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogeneous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, an explanation in the form of a subgraph, revealing the specific entities and relationships that led to that pathogenic prediction. CONCLUSION Our method, built with interpretability in mind, leverages heterogeneous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
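The abstract does not detail how BOCK's paths are mined. Assuming the knowledge graph is stored as (head, relation, tail) triples, a minimal sketch of one common way to turn heterogeneous paths between a gene pair into interpretable features is to enumerate simple paths up to a fixed length and count them by their sequence of relation types. The entities, relation names and `max_len` below are hypothetical; this is not the BOCK implementation.

```python
from collections import Counter, defaultdict

def build_graph(triples):
    """Index (head, relation, tail) triples as an undirected multigraph."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
        adj[tail].append((rel, head))
    return adj

def metapath_counts(adj, source, target, max_len=3):
    """Count simple paths source -> target with at most max_len edges,
    keyed by their sequence of relation types."""
    counts = Counter()

    def dfs(node, visited, rels):
        if node == target and rels:
            counts[tuple(rels)] += 1
            return  # do not extend paths through the target
        if len(rels) == max_len:
            return
        for rel, nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)
                rels.append(rel)
                dfs(nbr, visited, rels)
                rels.pop()
                visited.remove(nbr)

    dfs(source, {source}, [])
    return counts

# Hypothetical toy knowledge graph
kg = [
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "involved_in", "PATHWAY_X"),
    ("GENE_B", "involved_in", "PATHWAY_X"),
    ("GENE_A", "expressed_in", "TISSUE_T"),
    ("GENE_B", "expressed_in", "TISSUE_T"),
]
print(metapath_counts(build_graph(kg), "GENE_A", "GENE_B"))
```

Each key of the returned counter, such as `('involved_in', 'involved_in')` for a shared pathway, is human-readable, which is one reason path-based features lend themselves to explanation.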
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Chloé Terwagne
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Michael Cochez
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Ilaria Tiddi
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
3
Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. Predicting disease genes based on multi-head attention fusion. BMC Bioinformatics 2023; 24:162. [PMID: 37085750 PMCID: PMC10122338 DOI: 10.1186/s12859-023-05285-1] [Received: 12/27/2022] [Accepted: 04/12/2023] [Indexed: 04/23/2023]
Abstract
BACKGROUND The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Owing to the sparsity and complexity of biomedical data, developing an effective multi-feature fusion model to identify disease genes remains a challenge. RESULTS This paper proposes an approach to predict pathogenic genes based on multi-head attention fusion (MHAGP). First, heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Second, two graph representation learning algorithms are used to extract feature vectors of gene-disease pairs from the network, and these features are fused with multi-head attention. Finally, a multi-layer perceptron model is used to predict gene-disease associations. CONCLUSIONS The MHAGP model outperforms all other compared methods in comparative experiments. Case studies also show that MHAGP can predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug and disease phenotype-gene ontology associations, can be added to expand the information in the heterogeneous biological networks and achieve more accurate predictions. In addition, because MHAGP is highly extensible, it can be applied to related tasks such as gene-drug and drug-disease association prediction.
Affiliation(s)
- Linlin Zhang
- College of Software Engineering, Xinjiang University, Urumqi, China.
- Dianrong Lu
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xuehua Bi
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Kai Zhao
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Guanglei Yu
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Na Quan
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
4
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023]
Abstract
Graph analytical approaches make it possible to identify novel genes involved in complex diseases, but they are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, which miss the complexity of gene-disease associations; (iii) relying on similarities between disease/gene-phenotype associations, which involve highly incomplete data; (iv) using binary classification with gene-disease edges as positive training samples and non-associated gene and disease nodes as negative samples, which may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. To address these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model, which includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional node representations that account for network structure, and we extend beyond network structure with the newly developed Gene-Disease Prioritization Score (GDPS), which reflects the degree of gene-disease association via gene co-expression data. As negative training samples, we select non-associated gene and disease nodes with lower GDPS, which are less likely to be associated. We evaluate the model's success in predicting novel disease genes by analyzing the predicted probabilities of gene-disease associations. HetIG-PreDiG successfully predicts gene-disease associations (Micro-F1 = 0.95), outperforms baseline models, and is validated against published literature, thus advancing our understanding of complex genetic diseases.
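The abstract does not give the GDPS formula. A minimal sketch of the general idea of co-expression-based negative sampling, using a hypothetical stand-in score (mean absolute Pearson correlation between a candidate gene and the disease's known genes) in place of GDPS: the candidates least co-expressed with the disease genes are taken as negatives. Gene names and expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def coexpression_score(gene, disease_genes, expr):
    """Hypothetical stand-in for GDPS: mean |r| with the disease's known genes."""
    return sum(abs(pearson(expr[gene], expr[g])) for g in disease_genes) / len(disease_genes)

def pick_negatives(candidates, disease_genes, expr, k):
    """Return the k non-associated candidates least co-expressed with the disease genes."""
    return sorted(candidates, key=lambda g: coexpression_score(g, disease_genes, expr))[:k]

# Invented expression profiles over four samples.
expr = {
    "D1": [1.0, 2.0, 3.0, 4.0],    # known disease gene
    "D2": [2.0, 4.0, 6.0, 8.0],    # known disease gene
    "C1": [1.1, 2.1, 2.9, 4.2],    # strongly co-expressed candidate
    "C2": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated candidate
    "C3": [1.0, -1.0, -1.0, 1.0],  # uncorrelated candidate
}
print(pick_negatives(["C1", "C2", "C3"], ["D1", "D2"], expr, k=1))  # ['C3']
```

Using |r| treats strong anti-correlation as evidence of association too; whether GDPS does the same is not stated in the abstract.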
Affiliation(s)
- Kathleen M. Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
- Yael Shvili
- Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
- Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
5
Zheng K, Zhang XL, Wang L, You ZH, Ji BY, Liang X, Li ZW. SPRDA: a link prediction approach based on the structural perturbation to infer disease-associated Piwi-interacting RNAs. Brief Bioinform 2023; 24:6850564. [PMID: 36445194 DOI: 10.1093/bib/bbac498] [Received: 08/15/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 11/30/2022]
Abstract
Because of their abnormal expression in various cancers, piRNAs and PIWI proteins have been confirmed as novel biomarkers for disease diagnosis and treatment. However, current research is not sufficient to clarify the functions of piRNAs in cancer or their underlying mechanisms. Therefore, providing large-scale, credible piRNA candidates for biological research has become a pressing issue. In this study, a novel computational model based on the structural perturbation method, called SPRDA, is proposed to predict potential disease-associated piRNAs. Notably, SPRDA is a positive-unlabeled learning method and, unlike previous approaches, is unaffected by negative examples. In 5-fold cross-validation, SPRDA shows high performance on the benchmark dataset piRDisease, with an AUC of 0.9529. Furthermore, the predictive performance of SPRDA on 10 diseases demonstrates the robustness of the proposed method. Overall, the proposed approach can provide unique insights into disease pathogenesis and help advance oncology diagnosis and treatment.
Affiliation(s)
- Kai Zheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China; College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Xin-Lu Zhang
- Civil Product General Research Institute, The 36th Research Institute of China Electronics Technology Group Corporation, Jiaxing, 314000, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China; Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
- Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
- Bo-Ya Ji
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410006, China
- Xiao Liang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Zheng-Wei Li
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China; Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
6
Luo J, Ouyang W, Shen C, Cai J. Multi-relation graph embedding for predicting miRNA-target gene interactions by integrating gene sequence information. IEEE J Biomed Health Inform 2022; 26:4345-4353. [PMID: 35439150 DOI: 10.1109/jbhi.2022.3168008] [Indexed: 11/07/2022]
Abstract
Accumulated studies have found that miRNAs are involved in many complex diseases, such as cancers, by modulating gene expression. Predicting miRNA-target interactions helps uncover the crucial roles of miRNAs in regulating target genes and in disease progression. The emergence of large-scale genomic and biological data, together with recent developments in heterogeneous networks, provides new opportunities for miRNA target identification. Compared with conventional methods, computational methods offer a highly efficient solution. Thus, a method that can extract useful information from both the heterogeneous network and gene sequences is in great demand for improving prediction accuracy. In this study, we proposed a graph-based model named MRMTI for predicting miRNA-target interactions. MRMTI uses a multi-relation graph convolution module and a Bi-LSTM module to incorporate both network topology and sequence information. The learned embeddings of miRNAs and genes were then used to calculate prediction scores for miRNA-target pairs. Comparisons with other state-of-the-art graph embedding methods and existing bioinformatic tools demonstrated the superiority of MRMTI across multiple evaluation metrics. Experiments with three variants of MRMTI indicated the positive effect of modeling multiple relations. Case studies further demonstrated MRMTI's strong ability to predict novel associations.
7
Du J, Lin D, Yuan R, Chen X, Liu X, Yan J. Graph Embedding Based Novel Gene Discovery Associated With Diabetes Mellitus. Front Genet 2021; 12:779186. [PMID: 34899863 PMCID: PMC8657768 DOI: 10.3389/fgene.2021.779186] [Received: 09/18/2021] [Accepted: 10/20/2021] [Indexed: 11/25/2022]
Abstract
Diabetes mellitus is a group of complex metabolic disorders that affects hundreds of millions of patients worldwide. The underlying pathogenesis of the various types of diabetes is still unclear, which hinders the development of more efficient therapies. Although many genes have been found to be associated with diabetes mellitus, more novel genes remain to be discovered to obtain a complete picture of the underlying mechanism. With the development of complex molecular networks, many network-based disease-gene prediction methods have been proposed. However, most existing methods are based on the guilt-by-association hypothesis and often handcraft node features from local topological structures. Advances in graph embedding techniques have enabled automatic extraction of global features from molecular networks. Inspired by the successful application of cutting-edge graph embedding methods to complex diseases, we proposed a computational framework to investigate novel genes associated with diabetes mellitus. The framework has three main steps: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best-performing workflow for predicting genes associated with diabetes mellitus. Functional enrichment analysis based on the Human Phenotype Ontology (HPO), KEGG, and GO biological processes, together with a publication search, further evaluated the predicted novel genes.
Affiliation(s)
- Jing Yan
- Zhejiang Hospital, Hangzhou, China; Zhejiang Provincial Key Lab of Geriatrics, Zhejiang Hospital, Hangzhou, China
8
Feng S, Heath E, Jefferson B, Joslyn C, Kvinge H, Mitchell HD, Praggastis B, Eisfeld AJ, Sims AC, Thackray LB, Fan S, Walters KB, Halfmann PJ, Westhoff-Smith D, Tan Q, Menachery VD, Sheahan TP, Cockrell AS, Kocher JF, Stratton KG, Heller NC, Bramer LM, Diamond MS, Baric RS, Waters KM, Kawaoka Y, McDermott JE, Purvine E. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinformatics 2021; 22:287. [PMID: 34051754 PMCID: PMC8164482 DOI: 10.1186/s12859-021-04197-2] [Received: 10/01/2020] [Accepted: 05/13/2021] [Indexed: 12/25/2022]
Abstract
Background Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets. Results We compiled a novel data set of transcriptional host responses to pathogenic viral infections and formulated the relationships between genes as a hypergraph in which hyperedges represent significantly perturbed genes and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is superior to graph centrality for identifying genes important to viral response. Conclusions Our results demonstrate the utility of hypergraphs for representing complex biological systems and highlight central, important responses common to a variety of highly pathogenic viruses. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04197-2.
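One hypergraph betweenness measure is s-betweenness centrality: build the s-line graph, in which two hyperedges are adjacent when they share at least s vertices, then compute ordinary betweenness on that graph (the HyperNetX library offers s-centrality measures of this kind). The pure-Python sketch below applies the idea to the gene-as-hyperedge, sample-as-vertex encoding described in the abstract. The toy data, the choice s = 1 and the unnormalized scores are assumptions; this is not the authors' analysis pipeline.

```python
from collections import deque
from itertools import combinations

def s_line_graph(hyperedges, s=1):
    """Connect two hyperedges (genes) if they share at least s vertices (samples)."""
    adj = {e: set() for e in hyperedges}
    for a, b in combinations(hyperedges, 2):
        if len(hyperedges[a] & hyperedges[b]) >= s:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for src in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from src
        dist = dict.fromkeys(adj, -1)
        sigma[src], dist[src] = 1, 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != src:
                bc[w] += delta[w]
    # each undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical data: each gene (hyperedge) is the set of samples in which it is perturbed.
genes = {
    "G1": {"s1", "s2"},
    "G2": {"s2", "s3"},
    "G3": {"s3", "s4"},
    "G4": {"s4", "s5"},
}
print(betweenness(s_line_graph(genes, s=1)))  # {'G1': 0.0, 'G2': 2.0, 'G3': 2.0, 'G4': 0.0}
```

Larger values of s keep only hyperedges with substantial sample overlap; with s = 2 the toy line graph above has no edges and every score is zero.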
Affiliation(s)
- Song Feng
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Emily Heath
- Department of Mathematics, University of Illinois, Urbana-Champaign, IL, USA
- Brett Jefferson
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA
- Cliff Joslyn
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA; Systems Science Program, Portland State University, Portland, OR, USA
- Henry Kvinge
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA
- Hugh D Mitchell
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Brenda Praggastis
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA
- Amie J Eisfeld
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA
- Amy C Sims
- Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Larissa B Thackray
- Department of Medicine, Washington University School of Medicine, 63110, Saint Louis, MO, USA
- Shufang Fan
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA
- Kevin B Walters
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA
- Peter J Halfmann
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA
- Danielle Westhoff-Smith
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA
- Qing Tan
- Department of Medicine, Washington University School of Medicine, 63110, Saint Louis, MO, USA
- Vineet D Menachery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Microbiology and Immunology, University of Texas Medical Branch, Galveston, TX, USA
- Timothy P Sheahan
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jacob F Kocher
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kelly G Stratton
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Natalie C Heller
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA
- Lisa M Bramer
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Michael S Diamond
- Department of Medicine, Washington University School of Medicine, 63110, Saint Louis, MO, USA; Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA; Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
- Ralph S Baric
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Katrina M Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA; Department of Comparative Medicine, University of Washington, Seattle, WA, USA
- Yoshihiro Kawaoka
- Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, 575 Science Drive, 53711, Madison, WI, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, 108-8639, Japan; ERATO Infection-Induced Host Responses Project, Saitama, 332-0012, Japan; Department of Special Pathogens, International Research Center for Infectious Diseases, Institute of Medical Science, University of Tokyo, Tokyo, 108-8639, Japan
- Jason E McDermott
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA; Department of Molecular Microbiology and Immunology, Oregon Health and Science University, Portland, OR, USA
- Emilie Purvine
- Computing and Analytics Division, Pacific Northwest National Laboratory, Seattle, WA, USA.
9
Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020; 22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023]
Abstract
Identifying disease-gene associations through genome-wide association studies (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases requires statistical analysis of associations. Besides its high cost, GWAS analysis has another important drawback: given the huge number of possible mutations, it produces a large number of false positives. Researchers therefore seek additional evidence from different sources to cross-check their results. Computational approaches come into play to provide researchers with alternative, complementary, low-cost evidence of disease-gene associations. Because molecular networks capture the complex interplay among molecules in diseases, they have become one of the most extensively used data sources for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction, and we conduct an empirical analysis of 14 state-of-the-art methods. First, we define the disease gene prediction task. Second, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features, and graph representation learning methods. Third, we empirically evaluate the performance of the selected methods across seven diseases and report distinguishing findings about the discussed methods. Finally, we highlight potential research directions for future studies on disease gene prediction.
Affiliation(s)
- Sezin Kircali Ata
- School of Computer Science and Engineering, Nanyang Technological University (NTU)
- Min Wu
- Institute for Infocomm Research (I2R), A*STAR, Singapore
- Yuan Fang
- School of Information Systems, Singapore Management University, Singapore
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Li Li
- Department head and principal scientist at I2R, A*STAR, Singapore
10
Peng L, Shen L, Liao L, Liu G, Zhou L. RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization. Front Microbiol 2020; 11:592430. [PMID: 33193260 PMCID: PMC7652725 DOI: 10.3389/fmicb.2020.592430] [Received: 08/07/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022]
Abstract
Microbes present at abnormal levels have important impacts on the formation and development of various complex diseases. Identifying possible microbe-disease associations (MDAs) helps in understanding the mechanisms of complex diseases. However, experimental methods for MDA identification are costly and time-consuming. In this study, a new computational model, RNMFMDA, was developed to find possible MDAs. RNMFMDA consists of two main steps. First, reliable negative MDA samples were selected based on positive-unlabeled (PU) learning and random walk with restart on the heterogeneous microbe-disease network. Second, logistic matrix factorization with neighborhood regularization (LMFNR) was developed to compute association probabilities for all microbe-disease pairs. To evaluate the performance of RNMFMDA, we compared it with five state-of-the-art MDA prediction methods using five-fold cross-validation on microbes, diseases, and MDAs. RNMFMDA obtained the best AUCs, 0.6332, 0.8669, and 0.9081, respectively, for the three five-fold cross-validation settings, significantly outperforming the other models. This promising prediction performance may be attributed to three features: high-quality negative MDA sample selection, the LMFNR-based MDA prediction model, and the integration of diverse biological information. In addition, several predicted microbe-disease pairs with high association scores merit further experimental validation.
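RNMFMDA's negative selection combines PU learning with random walk with restart (RWR). The sketch below illustrates only the RWR part, iterating p_next = r * p0 + (1 - r) * W p with W the degree-normalized adjacency, under stated assumptions: a restart probability r = 0.3, an unweighted toy network, and the hypothetical rule "take the k unlabeled microbes with the lowest RWR score from the disease as reliable negatives". Node names and these choices are illustrative; the paper's actual PU-learning procedure is not reproduced.

```python
def rwr(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart by power iteration:
    p_next = restart * p0 + (1 - restart) * W p, with W column-normalized by degree.
    Assumes an undirected, unweighted graph and a non-empty seed set."""
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            if adj[v]:  # mass on isolated nodes is not propagated in this sketch
                share = (1 - restart) * p[v] / len(adj[v])
                for w in adj[v]:
                    nxt[w] += share
        converged = sum(abs(nxt[v] - p[v]) for v in nodes) < tol
        p = nxt
        if converged:
            break
    return p

def reliable_negatives(adj, disease, microbes, known, k):
    """Hypothetical rule: the k unlabeled microbes with the lowest RWR score from `disease`."""
    scores = rwr(adj, {disease})
    unlabeled = [m for m in microbes if m not in known]
    return sorted(unlabeled, key=lambda m: scores[m])[:k]

# Toy heterogeneous network: one disease-microbe edge plus microbe-microbe similarity edges.
adj = {
    "D1": {"M1"},
    "M1": {"D1", "M2"},
    "M2": {"M1", "M3"},
    "M3": {"M2", "M4"},
    "M4": {"M3"},
}
print(reliable_negatives(adj, "D1", ["M1", "M2", "M3", "M4"], known={"M1"}, k=2))  # ['M4', 'M3']
```

Microbes far from the disease in the network receive little walk probability, which is the intuition behind treating them as more reliable negatives than randomly sampled unlabeled pairs.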
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Longjie Liao
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Guangyi Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China