1
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024; 40:btae306. [PMID: 38715444 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024]
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signal, resulting in limited biological interpretability and efficiency to capture association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Different from existing predictors, DiSMVC designs a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representation, and then association pattern joint learning is implemented to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs, and outperform other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source codes are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Shuai Wu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Yina Jiang
  - Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Bin Liu
  - Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China

2
Gottesman L. The History and Physical, R.I.P. Dis Colon Rectum 2024; 67:487-490. [PMID: 38150312 DOI: 10.1097/dcr.0000000000003171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Affiliation(s)
- Lester Gottesman
  - Division of Colorectal Surgery, Icahn School of Medicine at Mount Sinai, New York, New York

3
Pedrosa VB, Chen SY, Gloria LS, Doucette JS, Boerman JP, Rosa GJM, Brito LF. Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle. J Dairy Sci 2024:S0022-0302(24)00497-1. [PMID: 38395400 DOI: 10.3168/jds.2023-24082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 01/18/2024] [Indexed: 02/25/2024]
Abstract
Identifying genome-enabled methods that provide more accurate genomic prediction is crucial when evaluating complex traits such as dairy cow behavior. In this study, we aimed to compare the predictive performance of traditional genomic prediction methods and deep learning algorithms for genomic prediction of milking refusals (MREF) and milking failures (MFAIL) in North American Holstein cows measured by automatic milking systems (milking robots). A total of 1,993,509 daily records from 4,511 genotyped Holstein cows were collected by 36 milking robot stations. After quality control, 57,600 single nucleotide polymorphisms (SNP) were available for the analyses. Four genomic prediction methods were considered: Bayesian Lasso (LASSO), Multiple Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Genomic Best Linear Unbiased Prediction (GBLUP). We implemented the first 3 methods using the Keras and TensorFlow libraries in Python (v.3.9), while the GBLUP method was implemented using the BLUPF90+ family programs. The accuracy of genomic prediction (Mean Square Error) for MREF and MFAIL was 0.34 (0.08) and 0.27 (0.08) based on LASSO, 0.36 (0.09) and 0.32 (0.09) for MLP, 0.37 (0.08) and 0.30 (0.09) for CNN, and 0.35 (0.09) and 0.31 (0.09) based on GBLUP, respectively. Additionally, we observed a lower re-ranking of top selected individuals based on the MLP versus CNN methods compared with the other approaches for both MREF and MFAIL. Although the deep learning methods showed slightly higher accuracies than GBLUP, the results may not be sufficient to justify their use over traditional methods due to their higher computational demand and the difficulty of performing genomic prediction for non-genotyped individuals using deep learning procedures. Overall, this study provides insights into the potential feasibility of using deep learning methods to enhance genomic prediction accuracy for behavioral traits in livestock.
Further research is needed to determine their practical applicability to large dairy cattle breeding programs.
Affiliation(s)
- Victor B Pedrosa
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Shi-Yi Chen
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Leonardo S Gloria
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Jarrod S Doucette
  - Agriculture Information Technology (AgIT), Purdue University, West Lafayette, IN, 47907, USA
- Jacquelyn P Boerman
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Guilherme J M Rosa
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Luiz F Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA.

4
Na D, Lim DH, Hong JS, Lee HM, Cho D, Yu MS, Shaker B, Ren J, Lee B, Song JG, Oh Y, Lee K, Oh KS, Lee MY, Choi MS, Choi HS, Kim YH, Bui JM, Lee K, Kim HW, Lee YS, Gsponer J. A multi-layered network model identifies Akt1 as a common modulator of neurodegeneration. Mol Syst Biol 2023; 19:e11801. [PMID: 37984409 DOI: 10.15252/msb.202311801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/25/2023] [Accepted: 10/27/2023] [Indexed: 11/22/2023] Open
Abstract
The accumulation of misfolded and aggregated proteins is a hallmark of neurodegenerative proteinopathies. Although multiple genetic loci have been associated with specific neurodegenerative diseases (NDs), molecular mechanisms that may have a broader relevance for most or all proteinopathies remain poorly resolved. In this study, we developed a multi-layered network expansion (MLnet) model to predict protein modifiers that are common to a group of diseases and, therefore, may have broader pathophysiological relevance for that group. When applied to the four NDs Alzheimer's disease (AD), Huntington's disease, and spinocerebellar ataxia types 1 and 3, we predicted multiple members of the insulin pathway, including PDK1, Akt1, InR, and sgg (GSK-3β), as common modifiers. We validated these modifiers with the help of four Drosophila ND models. Further evaluation of Akt1 in human cell-based ND models revealed that activation of Akt1 signaling by the small molecule SC79 increased cell viability in all models. Moreover, treatment of AD model mice with SC79 enhanced their long-term memory and ameliorated dysregulated anxiety levels, which are commonly affected in AD patients. These findings validate MLnet as a valuable tool to uncover molecular pathways and proteins involved in the pathophysiology of entire disease groups and identify potential therapeutic targets that have relevance across disease boundaries. MLnet can be used for any group of diseases and is available as a web tool at http://ssbio.cau.ac.kr/software/mlnet.
Affiliation(s)
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Do-Hwan Lim
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Jae-Sang Hong
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
  - Center for Systems Biology, Massachusetts General Hospital, Boston, MA, USA
- Hyang-Mi Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Daeahn Cho
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Myeong-Sang Yu
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Bilal Shaker
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Jun Ren
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Bomi Lee
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Jae Gwang Song
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Yuna Oh
  - Korea Institute of Science and Technology, Seoul, Republic of Korea
- Kyungeun Lee
  - Korea Institute of Science and Technology, Seoul, Republic of Korea
- Kwang-Seok Oh
  - Information-based Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Mi Young Lee
  - Information-based Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Min-Seok Choi
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Han Saem Choi
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Yang-Hee Kim
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Jennifer M Bui
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Kangseok Lee
  - Department of Life Science, Chung-Ang University, Seoul, Republic of Korea
- Hyung Wook Kim
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Young Sik Lee
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Jörg Gsponer
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada

5
Yu S, Wang Z, Nan J, Li A, Yang X, Tang X. Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation. JMIR Form Res 2023; 7:e50998. [PMID: 37966892 PMCID: PMC10687686 DOI: 10.2196/50998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/28/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to study complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets. OBJECTIVE The aim of this study was to identify potential schizophrenia risk genes by employing machine learning methods to extract topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network and understand the complex disease-causing property. Consequently, a PPIK-based metagraph representation approach is proposed. METHODS To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented proteins with a series of vectors. We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models. RESULTS Comprehensive experiments demonstrated the good performance of our proposed method with an area under the receiver operating characteristic curve (AUC) value between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network constructed from the baseline models, complementation of keywords in the PPIK network improved the performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average compared with that of the baseline methods. 
According to the comprehensive performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We transformed these proteins to their encoding gene IDs and identified the top 20 genes as the most probable schizophrenia-risk genes, including the EYA3, CNTN4, HSPA8, LRRK2, and AFP genes. We further validated these outcomes against metagraph features and evidence from the literature, performed a features analysis, and exploited evidence from the literature to interpret the correlation between the predicted genes and diseases. CONCLUSIONS The metagraph representation based on the PPIK network framework was found to be effective for potential schizophrenia risk genes identification. The results are quite reliable as evidence can be found in the literature to support our prediction. Our approach can provide more biological insights into the pathogenesis of schizophrenia.
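The abstract names random walk with restart (RWR) as one of the baseline methods. For readers unfamiliar with it, here is a minimal sketch of RWR on a toy graph. The function name, the restart probability of 0.3, and the four-protein chain are all hypothetical choices made for illustration; this is not the authors' implementation or their data.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing.

    adjacency maps each node to a list of neighbours (undirected graph with no
    isolated nodes); W spreads a node's probability evenly over its neighbours.
    """
    nodes = list(adjacency)
    # Restart distribution: uniform over the seed (known disease) nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for node in nodes:
            share = (1.0 - restart) * p[node] / len(adjacency[node])
            for neighbour in adjacency[node]:
                nxt[neighbour] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical chain P1 - P2 - P3 - P4 with P1 as the only known disease protein.
toy_ppi = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
scores = random_walk_with_restart(toy_ppi, seeds={"P1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Proteins closer to the seed in the network receive higher steady-state scores, which is why RWR is commonly used to rank candidate disease proteins.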
Affiliation(s)
- Shirui Yu
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Ziyang Wang
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Jiale Nan
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Aihua Li
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Xuemei Yang
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Xiaoli Tang
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China

6
Wang Z, Gu Y, Zheng S, Yang L, Li J. MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction. Comput Biol Med 2023; 155:106642. [PMID: 36805231 DOI: 10.1016/j.compbiomed.2023.106642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Revised: 01/15/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The identification of gene-disease associations plays an important role in the exploration of pathogenic mechanisms and therapeutic targets. In recent years, computational methods have been regarded as an effective way to discover potential gene-disease associations. However, most of them ignore the combination of abundant genetic and therapeutic information with gene-disease network topology. To this end, we re-organized the current gene-disease association benchmark dataset by extracting the newest gene-disease associations from the OMIM database. Then, we developed a multi-graph representation learning-based ensemble model, named MGREL, to predict gene-disease associations. MGREL integrated two feature generation channels to extract gene and disease features: a knowledge extraction channel, which learned high-order representations from genetic and therapeutic information, and a graph learning channel, which acquired network topological representations through multiple advanced graph representation learning methods. Then, an ensemble learning method with 5 machine learning models was used as the classifier to predict the gene-disease association. Comprehensive experiments have demonstrated the significant performance achieved by MGREL compared to 5 state-of-the-art methods. For the major measurements (AUC = 0.925, AUPR = 0.935), the relative improvements of MGREL compared to the suboptimal methods are 3.24% and 2.75%, respectively. MGREL also achieved impressive improvements in the challenging tasks of predicting potential associations for unknown genes/diseases. In addition, case studies implied potential applications for MGREL in the discovery of potential therapeutic targets.
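The relative improvements quoted in this abstract can be inverted to estimate what the runner-up methods scored. The sketch below assumes that "relative improvement" means (new - old) / old, which the abstract does not state explicitly; the function name is hypothetical, and the resulting baselines are back-of-the-envelope estimates rather than values reported by the authors.

```python
def baseline_from_relative_gain(score: float, gain_percent: float) -> float:
    """Invert score = baseline * (1 + gain / 100) to recover the baseline."""
    return score / (1.0 + gain_percent / 100.0)


# MGREL values as quoted in the abstract.
auc_baseline = baseline_from_relative_gain(0.925, 3.24)
aupr_baseline = baseline_from_relative_gain(0.935, 2.75)
print(f"implied runner-up AUC  ~ {auc_baseline:.3f}")
print(f"implied runner-up AUPR ~ {aupr_baseline:.3f}")
```

Under that assumption the second-best methods would sit at roughly 0.90 AUC and 0.91 AUPR.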
Affiliation(s)
- Ziyang Wang
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
- Yaowen Gu
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
- Si Zheng
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Lin Yang
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
- Jiao Li
  - Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China.

7
Identifying Tumor-Associated Genes from Bilayer Networks of DNA Methylation Sites and RNAs. Life (Basel) 2022; 13:life13010076. [PMID: 36676027 PMCID: PMC9861397 DOI: 10.3390/life13010076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/10/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022]
Abstract
Network theory has attracted much attention from the biological community because of its high efficacy in identifying tumor-associated genes. However, most researchers have focused on single networks of single omics, which have less predictive power. With the available multiomics data, multilayer networks can now be used in molecular research. In this study, we achieved this with the construction of a bilayer network of DNA methylation sites and RNAs. We applied the network model to five types of tumor data to identify key genes associated with tumors. Compared with the single network, the proposed bilayer network resulted in more tumor-associated DNA methylation sites and genes, which we verified with prognostic and KEGG enrichment analyses.

8
Predicting Genetic Disorder and Types of Disorder Using Chain Classifier Approach. Genes (Basel) 2022; 14:genes14010071. [PMID: 36672812 PMCID: PMC9858679 DOI: 10.3390/genes14010071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022] Open
Abstract
Genetic disorders are the result of mutations in the deoxyribonucleic acid (DNA) sequence, which can be developed or inherited from parents. Such mutations may lead to fatal diseases such as Alzheimer's disease, cancer, and hemochromatosis. Recently, artificial intelligence-based methods have shown great success in the prediction and prognosis of different diseases. The potential of such methods can be utilized to predict genetic disorders at an early stage using genome data, enabling timely treatment. This study focuses on the multi-label multi-class problem and makes two major contributions to genetic disorder prediction. First, a novel feature engineering approach is proposed in which the class probabilities from an extra tree (ET) and a random forest (RF) are joined to make a feature set for model training. Second, the study utilizes the classifier chain approach, in which multiple classifiers are joined in a chain and the predictions from all the preceding classifiers are used by the succeeding classifiers to make the final prediction. Because the data are multi-label and multi-class, macro accuracy, Hamming loss, and α-evaluation score are used to evaluate the performance. Results suggest that extreme gradient boosting (XGB) produces the best scores, with a 92% α-evaluation score and an 84% macro accuracy score. XGB performs much better than state-of-the-art approaches in terms of both predictive performance and computational complexity.
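Hamming loss, one of the evaluation measures named in this abstract, is simple to compute by hand. The following is a minimal sketch, assuming each sample carries a fixed-length list of label values and that a position counts as wrong whenever the predicted value differs from the true one. The function name and the toy labels are hypothetical and do not come from the paper.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) positions where prediction and truth differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    total = 0
    wrong = 0
    for truth_row, pred_row in zip(y_true, y_pred):
        if len(truth_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        total += len(truth_row)
        wrong += sum(t != p for t, p in zip(truth_row, pred_row))
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


# Toy data: column 0 = disorder class, column 1 = disorder subclass (hypothetical).
y_true = [[0, 2], [1, 0], [1, 1]]
y_pred = [[0, 2], [1, 1], [0, 1]]
loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss = {loss:.3f}")
```

Two of the six label positions are wrong here, so the loss is one third; lower values are better, with 0 meaning every label of every sample was predicted correctly.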

9
Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022; 23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
PURPOSE Autism spectrum disorder (ASD) is the most prevalent disease today. The causes of its infection may be attributed to genetic causes by 80% and environmental causes by 20%. In spite of this, the majority of the current research is concerned with environmental causes, and the least proportion with the genetic causes of the disease. Autism is a complex disease, which makes it difficult to identify the genes that cause the disease. METHODS Hybrid ensemble-based classification (HEC-ASD) model for predicting ASD genes using gradient boosting machines is proposed. The proposed model utilizes gene ontology (GO) to construct a gene functional similarity matrix using hybrid gene similarity (HGS) method. HGS measures the semantic similarity between genes effectively. It combines the graph-based method, such as Wang method with the number of directed children's nodes of gene term from GO. Moreover, an ensemble gradient boosting classifier is adapted to enhance the prediction of genes forming a robust classification model. RESULTS The proposed model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database. The experimental results are promising as they improve the classification performance for predicting ASD genes. The results are compared with other approaches that used gene regulatory network (GRN), protein to protein interaction network (PPI), or GO. The HEC-ASD model reaches the highest prediction accuracy of 0.88% using ensemble learning classifiers. CONCLUSION The proposed model demonstrates that ensemble learning technique using gradient boosting is effective in predicting autism spectrum disorder genes. Moreover, the HEC-ASD model utilized GO rather than using PPI network and GRN.
Affiliation(s)
- Eman Ismail
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Walaa Gad
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Mohamed Hashem
  - Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

10
Fischer S, Gillis J. Defining the extent of gene function using ROC curvature. Bioinformatics 2022; 38:5390-5397. [PMID: 36271855 PMCID: PMC9750128 DOI: 10.1093/bioinformatics/btac692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 09/19/2022] [Accepted: 10/20/2022] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect 'ground truth' information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves. RESULTS We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods, they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10-50 genes), and tissue-specific secondary markers (100-500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets. AVAILABILITY AND IMPLEMENTATION Code for analyses and figures is available at https://github.com/yexilein/pyroc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
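To make the idea of straight lines in ROC curves concrete, the sketch below builds ROC points from gene scores and shows one simple way a straight segment can arise: when a predictor assigns the same score to a block of genes, it cannot order them, and the curve crosses that block in a single linear step. This is only an illustration under that assumption; it is not the FEC detection procedure from the paper or the pyroc code, and the function name and toy data are hypothetical.

```python
from itertools import groupby


def roc_points(scores, labels):
    """Return ROC points as (FPR, TPR); tied scores collapse into one segment."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Rank genes from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Genes sharing a score are consumed together, giving one straight segment.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in group:
            if label:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Two well-separated positives, then four genes the predictor cannot order,
# then two clearly negative genes (all values hypothetical).
scores = [0.9, 0.8, 0.5, 0.5, 0.5, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
curve = roc_points(scores, labels)
print(curve)
```

In this toy curve the tied block spans the segment from (0.0, 0.5) to (0.5, 1.0): within it, annotated and unannotated genes are interchangeable as far as the predictor is concerned.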
Affiliation(s)
- Stephan Fischer
  - Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris F-75015, France
- Jesse Gillis
  - Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
  - Department of Physiology, University of Toronto, Toronto, ON, Canada

11
Xie M, Lei X, Zhong J, Ouyang J, Li G. Drug response prediction using graph representation learning and Laplacian feature selection. BMC Bioinformatics 2022; 23:532. [PMID: 36494630 PMCID: PMC9733001 DOI: 10.1186/s12859-022-05080-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Knowing the responses of a patient to drugs is essential to make personalized medicine practical. Since current clinical drug response experiments are time-consuming and expensive, utilizing human genomic information and drug molecular characteristics to predict drug responses is of urgent importance. Although a variety of computational drug response prediction methods have been proposed, their effectiveness is still not satisfactory. RESULTS In this study, we propose a method called LGRDRP (Learning Graph Representation for Drug Response Prediction) to predict cell line-drug responses. First, LGRDRP constructs a heterogeneous network integrating multiple kinds of information: cell line miRNA expression profiles, drug chemical structure similarity, gene-gene interaction, cell line-gene interaction and known cell line-drug responses. Then, for each cell line, learning graph representation and Laplacian feature selection are combined to obtain network topology features related to the cell line. The learning graph representation method learns network topology structure features, and the Laplacian feature selection method then selects the most important of them. Finally, LGRDRP trains an SVM model to predict drug responses based on the selected features of the known cell line-drug responses. Our five-fold cross-validation results show that LGRDRP is significantly superior to state-of-the-art methods in terms of the average area under the receiver operating characteristic curve, the average area under the precision-recall curve and the recall rate of the top-k predicted sensitive cell lines. CONCLUSIONS Our results demonstrate that using multiple types of information about cell lines and drugs, the learning graph representation method, and Laplacian feature selection improves performance in predicting drug responses.
We believe that such an approach would be easily extended to similar problems such as miRNA-disease relationship inference.
Affiliation(s)
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
  - Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, China
- Xiaowen Lei
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianchen Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianxing Ouyang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Guijing Li
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China

12
Yang H, Ding Y, Tang J, Guo F. Inferring human microbe–drug associations via multiple kernel fusion on graph neural network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]

13
Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021] [Indexed: 12/27/2022] Open
Abstract
Brain disease gene identification is critical for revealing the biological mechanisms of brain diseases and for developing drugs to treat them. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted to narrow down the search space. However, these network-based methods use only molecular networks and ignore brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. To represent molecular-based network data and brain connectome data consistently, brainMI first constructs a novel gene network, called the brain functional connectivity (BFC)-based gene network, from resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are used to predict brain disease genes with a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves scores of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.
Affiliation(s)
- Wei Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Ruijiang Han
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Menghan Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Yuxian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
| |
|
14
|
Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. NATURE COMPUTATIONAL SCIENCE 2021; 1:395-402. [PMID: 38217236 DOI: 10.1038/s43588-021-00086-z] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 01/15/2024]
Abstract
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
Affiliation(s)
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Angeles Arzalluz-Luque
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Ana Conesa
- Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, Gainesville, FL, USA.
- Genetics Institute, University of Florida, Gainesville, FL, USA.
- Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain.
| |
|