1
Saranya KR, Vimina ER, Pinto FR. TransNeT-CGP: A cluster-based comorbid gene prioritization by integrating transcriptomics and network-topological features. Comput Biol Chem 2024; 110:108038. [PMID: 38461796 DOI: 10.1016/j.compbiolchem.2024.108038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/11/2024] [Accepted: 02/25/2024] [Indexed: 03/12/2024]
Abstract
The local disruptions caused by the genes of one disease can influence the pathways associated with other diseases, resulting in comorbidity. For gene therapies, it is necessary to prioritize the key genes that regulate common biological mechanisms in order to tackle the issues caused by overlapping diseases. This work proposes a clustering-based computational approach for prioritizing the comorbid genes within overlapping disease modules by analyzing Protein-Protein Interaction networks. For this, a sub-network containing the gene interactions of the disease pair is extracted from the interactome. The edge weights are assigned by combining the pairwise gene expression correlation and betweenness centrality scores. A weighted graph clustering algorithm is then applied, and the dominant nodes of high-density clusters are ranked based on clustering coefficients and neighborhood connectivity. Case studies on a pair of neurodegenerative diseases, Amyotrophic Lateral Sclerosis and Spinal Muscular Atrophy (ALS-SMA), and a pair of cancers, Ovarian Carcinoma and Invasive Ductal Breast Carcinoma (OC-IDBC), were conducted to examine the efficacy of the proposed method. To identify the mechanistic role of the top-ranked genes, we used functional and pathway enrichment analysis, connectivity analysis with the leave-one-out (LOO) method, analysis of associated disease-related protein complexes, and prioritization tools such as TOPPGENE and Heml2.0. Pathway analysis showed that the top 10 genes obtained using the proposed method were associated with 10 pathways in ALS-SMA comorbidity and 15 in OC-IDBC, whereas the similar methods SAPDSB and S2B yielded 4 and 6 pathways, respectively, for ALS-SMA and 9 and 10, respectively, for OC-IDBC. In both case studies, 70% of the disease-specific benchmark protein complexes were linked to the top-ranked genes of the proposed method, compared with 55% and 60% for SAPDSB and S2B, respectively. Additionally, removing the top 10 genes disconnects the network into 14 distinct components in the case of ALS-SMA and 9 in the case of OC-IDBC. The experimental results show that the proposed method can be effectively used for identifying key genes in comorbidity and can offer insights into the intricate molecular relationships driving comorbid diseases.
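The connectivity analysis mentioned in this abstract (counting how many components remain once top-ranked genes are removed) can be illustrated with a minimal sketch. The toy network, the gene names, and the `count_components` helper are hypothetical and are not taken from the paper; the sketch only shows the general idea of a node-removal test on an undirected graph.

```python
from collections import deque

def count_components(edges, removed=frozenset()):
    """Count connected components of an undirected graph after deleting `removed` nodes."""
    adjacency = {}
    for u, v in edges:
        # Register every surviving endpoint, even if its partner was removed.
        for node in (u, v):
            if node not in removed:
                adjacency.setdefault(node, set())
        if u not in removed and v not in removed:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return components

# Hypothetical toy network: hub "G1" holds three otherwise separate branches together.
toy_edges = [("G1", "A"), ("G1", "B"), ("G1", "C"), ("A", "A2"), ("B", "B2")]
before = count_components(toy_edges)
after = count_components(toy_edges, removed={"G1"})
print(before, after)
```

A gene whose removal raises the component count sharply is acting as a connector between modules, which is the intuition behind using this test to check prioritized genes.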
Affiliation(s)
- K R Saranya
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
- E R Vimina
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
- F R Pinto
  - Chemistry and Biochemistry Department, Faculty of Sciences, University of Lisbon, Portugal.
2
Emam SM, Moussa N. Signaling pathways of dental implants' osseointegration: a narrative review on two of the most relevant; NF-κB and Wnt pathways. BDJ Open 2024; 10:29. [PMID: 38580623 PMCID: PMC10997788 DOI: 10.1038/s41405-024-00211-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/09/2024] [Accepted: 03/11/2024] [Indexed: 04/07/2024] Open
Abstract
INTRODUCTION Cell signaling pathways are the biological reactions that control cell functions and fate. They also directly affect the body's reactions to implanted biomaterials. It is well known that dental implant success depends on successful integration with the alveolar bone, termed "osseointegration", whose events comprise early and later responses to the implanted biomaterial. The early events are mainly immune-inflammatory responses to the implant, which its microenvironment treats as a foreign body. Later reactions are osteogenic, aiming to regulate bone formation and remodeling. All these events are controlled by cell signaling pathways in remarkably harmonious coordination. AIM The number of pathways with a role in osseointegration is too large to be reviewed in a single article. The aim of this review was therefore to study only two of the most relevant ones: the inflammatory Nuclear Factor Kappa B (NF-κB) pathway, which regulates the early osseointegration events, and the osteogenic Wnt pathway, which regulates later events. METHODS We conducted a literature review using key databases to provide an overview of the NF-κB and Wnt cell signaling pathways and their mutual relationship with dental implants. A simplified narrative approach was used to explain these cell signaling pathways, their mode of activation and how they are related to the cellular events of osseointegration. RESULTS AND CONCLUSION NF-κB and Wnt cell signaling pathways are important cross-talking pathways that are affected by the implant's material and surface characteristics. The presence of the implant itself in the bone alters the intracellular events of both pathways in the cellular microenvironment adjacent to the implant. Both pathways play a major role in the success or failure of osseointegration. Such knowledge can offer new hope for treating failed implants and enhancing osseointegration in difficult cases. This is consistent with advances in Omics technologies that can change the paradigm of dental implant therapy.
Affiliation(s)
- Samar Mohamed Emam
  - Department of Prosthodontics, Faculty of Dentistry, Alexandria University, Alexandria, Egypt.
  - Department of Biotechnology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt.
- Nermine Moussa
  - Department of Biotechnology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
3
Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging when identifying pathogenic genes for brain diseases. RESULTS In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
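Since the abstract relies on Kullback-Leibler divergence to align feature distributions, a short sketch of the quantity itself may help. This is only the textbook definition for discrete distributions, D_KL(p || q) = sum_i p_i log(p_i / q_i); the function name, the `eps` guard, and the example distributions are assumptions for illustration and say nothing about how M-GBBD builds its objective.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))  # eps avoids division by zero
    return total

# Hypothetical feature distributions from two networks.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # identical distributions give zero divergence
print(kl_divergence(p, q), kl_divergence(q, p))  # the measure is not symmetric
```

Because the divergence is zero only when the two distributions agree, driving it down pulls the per-network distributions toward a shared representation.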
Affiliation(s)
- Ping Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China.
- Leon Wong
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
4
Yang X, Yang G, Chu J. The Neural Metric Factorization for Computational Drug Repositioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:731-741. [PMID: 35061591 DOI: 10.1109/tcbb.2022.3144429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Computational drug repositioning aims to discover new diseases that marketed drugs can treat and has the advantages of low cost, a short development cycle, and high controllability compared to traditional drug development. The matrix factorization model has become the cornerstone technique for computational drug repositioning due to its ease of implementation and excellent scalability. However, the matrix factorization model uses the inner product operation to represent the association between drugs and diseases, which lacks expressive ability. Moreover, the degree of similarity between drugs or between diseases is not reflected in their respective latent factor vectors, which does not match the common sense of drug discovery. Therefore, a neural metric factorization model for computational drug repositioning (NMFDR) is proposed in this work. We treat the latent factor vectors of drugs and diseases as points in a high-dimensional coordinate system and propose a generalized Euclidean distance to represent the association between drugs and diseases, compensating for the shortcomings of the inner product operation. Furthermore, by embedding multiple drug (disease) metrics into the encoding space of the latent factor vectors, the similarity between drugs (diseases) can be reflected in the distance between latent factor vectors. Finally, we conduct extensive analysis experiments on three real datasets to demonstrate the effectiveness of these improvements and the superiority of the NMFDR model.
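The contrast between inner-product scoring and distance-based scoring can be made concrete with a small sketch. The latent vectors below are hypothetical, and plain Euclidean distance stands in for the paper's "generalized Euclidean distance", whose exact form the abstract does not give.

```python
import math

def inner_product(u, v):
    """Association score used by classical matrix factorization."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Plain Euclidean distance between two latent vectors (smaller means more associated)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2-D latent factors.
drug_a = [1.0, 0.0]
drug_b = [0.0, 1.0]
disease = [1.0, 1.0]

# Both drugs get the same inner-product score for the disease,
# yet their own inner product is zero, so the score says nothing about drug similarity.
print(inner_product(drug_a, disease), inner_product(drug_b, disease), inner_product(drug_a, drug_b))

# A distance obeys the triangle inequality, so two drugs that are close to the
# same disease cannot be arbitrarily far from each other.
d_ab = euclidean_distance(drug_a, drug_b)
bound = euclidean_distance(drug_a, disease) + euclidean_distance(drug_b, disease)
print(d_ab <= bound)
```

The triangle inequality is what lets a metric carry similarity information between drugs (or between diseases) implicitly, which the inner product cannot guarantee.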
5
Zhang SW, Xu JY, Zhang T. DGMP: Identifying Cancer Driver Genes by Jointing DGCN and MLP from Multi-omics Genomic Data. Genomics, Proteomics & Bioinformatics 2022; 20:928-938. [PMID: 36464123 PMCID: PMC10025764 DOI: 10.1016/j.gpb.2022.11.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/02/2021] [Revised: 10/21/2022] [Accepted: 11/04/2022] [Indexed: 12/03/2022]
Abstract
Identification of cancer driver genes plays an important role in precision oncology research and helps in understanding cancer initiation and progression. However, most existing computational methods either rely mainly on protein-protein interaction (PPI) networks or treat directed gene regulatory networks (GRNs) as undirected gene-gene association networks when identifying cancer driver genes. This discards the unique structural regulatory information in the directed GRNs and in turn affects the outcome of cancer driver gene identification. Here, based on multi-omics pan-cancer data (i.e., gene expression, mutation, copy number variation, and DNA methylation), we propose a novel method (called DGMP) to identify cancer driver genes by combining a directed graph convolutional network (DGCN) with a multilayer perceptron (MLP). DGMP learns the multi-omics features of genes as well as the topological structure features of the GRN with the DGCN model, and uses the MLP to put more weight on gene features, mitigating the bias toward graph topological features in the DGCN learning process. The results on three GRNs show that DGMP outperforms other existing state-of-the-art methods. The ablation results on the DawnNet network indicate that introducing the MLP into the DGCN can offset the performance degradation of the DGCN, and that combining the MLP and the DGCN effectively improves the performance of cancer driver gene identification. DGMP can identify not only highly mutated cancer driver genes but also driver genes harboring other kinds of alterations (e.g., differential expression and aberrant DNA methylation) or genes involved in GRNs with other cancer genes. The source code of DGMP can be freely downloaded from https://github.com/NWPU-903PR/DGMP.
Affiliation(s)
- Shao-Wu Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Jing-Yu Xu
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Tong Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
6
Hou S, Zhang P, Yang K, Wang L, Ma C, Li Y, Li S. Decoding multilevel relationships with the human tissue-cell-molecule network. Brief Bioinform 2022; 23:6585388. [PMID: 35551347 DOI: 10.1093/bib/bbac170] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/10/2022] [Revised: 04/09/2022] [Accepted: 04/16/2022] [Indexed: 02/01/2023] Open
Abstract
Understanding the biological functions of molecules in specific human tissues or cell types is crucial for gaining insights into human physiology and disease. To address this issue, it is essential to systematically uncover associations among multilevel elements consisting of disease phenotypes, tissues, cell types and molecules, which could pose a challenge because of their heterogeneity and incompleteness. To address this challenge, we describe a new methodological framework, called Graph Local InfoMax (GLIM), based on a human multilevel network (HMLN) that we established by introducing multiple tissues and cell types on top of molecular networks. GLIM can systematically mine the potential relationships between multilevel elements by embedding the features of the HMLN through contrastive learning. Our simulation results demonstrated that GLIM consistently outperforms other state-of-the-art algorithms in disease gene prediction. Moreover, GLIM was also successfully used to infer cell markers and rewire intercellular and molecular interactions in the context of specific tissues or diseases. As a typical case, the tissue-cell-molecule network underlying gastritis and gastric cancer was first uncovered by GLIM, providing systematic insights into the mechanism underlying the occurrence and development of gastric cancer. Overall, our constructed methodological framework has the potential to systematically uncover complex disease mechanisms and mine high-quality relationships among phenotypical, tissue, cellular and molecular elements.
Affiliation(s)
- Siyu Hou
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Peng Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Kuo Yang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
  - School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Lan Wang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Changzheng Ma
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Yanda Li
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Shao Li
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
7
Yin Q, Liu Q, Fu Z, Zeng W, Zhang B, Zhang X, Jiang R, Lv H. scGraph: a graph neural network-based approach to automatically identify cell types. Bioinformatics 2022; 38:2996-3003. [PMID: 35394015 DOI: 10.1093/bioinformatics/btac199] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/22/2021] [Revised: 12/13/2021] [Accepted: 04/07/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Single-cell technologies have played a crucial role in revolutionizing biological research over the past decade, strengthening our understanding of cell differentiation, development, and regulation from a single-cell perspective. Single-cell RNA sequencing (scRNA-seq) is one of the most common single-cell technologies and enables probing transcriptional states in thousands of cells in one experiment. Identifying cell types from scRNA-seq measurements is a fundamental and crucial problem. Most previous studies directly take gene expression as input while ignoring the comprehensive gene-gene interactions. RESULTS We propose scGraph, an automatic cell identification algorithm that leverages gene interaction relationships to enhance the performance of cell type identification. scGraph is based on a graph neural network that aggregates the information of interacting genes. In a series of experiments, we demonstrate that scGraph is accurate and outperforms eight comparison methods in the task of cell type identification. Moreover, scGraph automatically learns the gene interaction relationships from biological data, and the pathway enrichment analysis shows findings consistent with previous analyses, providing insights into regulatory mechanisms. AVAILABILITY scGraph is freely available at https://github.com/QijinYin/scGraph and https://figshare.com/articles/software/scGraph/17157743. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
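To give a feel for what "aggregating the information of interacting genes" means, here is a minimal sketch of one round of neighborhood averaging over a gene interaction graph. It deliberately omits learned weights and nonlinearities, so it is not the scGraph architecture; the gene names, expression values, and the `aggregate_neighbors` helper are hypothetical.

```python
def aggregate_neighbors(expression, interactions):
    """One round of mean aggregation: each gene's value becomes the mean
    of its own expression and that of its interaction partners."""
    neighbors = {gene: {gene} for gene in expression}  # include a self-loop
    for u, v in interactions:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {
        gene: sum(expression[n] for n in partners) / len(partners)
        for gene, partners in neighbors.items()
    }

# Hypothetical expression values for one cell and one known interaction.
expression = {"GENE_A": 3.0, "GENE_B": 0.0, "GENE_C": 6.0}
interactions = [("GENE_A", "GENE_B")]
smoothed = aggregate_neighbors(expression, interactions)
print(smoothed)
```

In this toy case the zero reading for `GENE_B` is pulled toward its partner's value while the unconnected `GENE_C` is left unchanged, which is the kind of context a graph layer adds on top of raw expression.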
Affiliation(s)
- Qijin Yin
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Qiao Liu
  - Department of Statistics, Stanford University, Stanford, CA 94305
- Zhuoran Fu
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Wanwen Zeng
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - College of Software, Nankai University, Tianjin, 300350, China
- Boheng Zhang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Hairong Lv
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
  - Fuzhou Institute of Data Technology, Changle, Fuzhou, 350200, China
8
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure is widespread in biomolecular networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in these networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure remains challenging in tasks such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for the integration of gene rankings is designed to combine multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as the further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in tasks such as disease-gene prediction.
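The idea of estimating disease relatedness "by the abundance of disease-related genes within modules" can be sketched at a single scale as follows. This is a simplified illustration under stated assumptions: the score is just the fraction of known disease genes in a gene's module, and the module partition and gene names are hypothetical. HyMM's actual estimator, its multiscale partitions, and its probabilistic rank integration are not reproduced here.

```python
def module_abundance_scores(modules, known_disease_genes):
    """Score every gene by the fraction of known disease genes in its module."""
    scores = {}
    for module in modules:
        if not module:
            continue  # skip empty modules to avoid dividing by zero
        hits = sum(1 for gene in module if gene in known_disease_genes)
        fraction = hits / len(module)
        for gene in module:
            scores[gene] = fraction
    return scores

# Hypothetical partition of six genes into two modules.
modules = [{"g1", "g2", "g3", "g4"}, {"g5", "g6"}]
known = {"g1", "g2", "g3"}
scores = module_abundance_scores(modules, known)
print(scores["g4"], scores["g6"])
```

Here the unlabeled gene `g4` inherits a high score because it shares a module with three known disease genes, while `g6` receives none; repeating this over partitions at several scales yields the multiple rankings that a method like HyMM then has to integrate.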
Affiliation(s)
- Ju Xiang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
9
Zhang Y, Chen L, Li S. CIPHER-SC: Disease-Gene Association Inference Using Graph Convolution on a Context-Aware Network With Single-Cell Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:819-829. [PMID: 32809944 DOI: 10.1109/tcbb.2020.3017547] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Inference of disease-gene associations helps unravel the pathogenesis of diseases and contributes to their treatment. Although many machine learning-based methods have been developed to predict causative genes, accurate association inference remains challenging. One major reason is inaccurate feature selection and the accumulation of error introduced by the commonly used multi-stage training architecture. In addition, existing methods do not incorporate cell-type-specific information and thus fail to study gene functions at a higher resolution. Therefore, we introduce single-cell transcriptome data and construct a context-aware network to integrate all data sources without bias. We then develop a graph convolution-based approach named CIPHER-SC to realize a complete end-to-end learning architecture. Our approach outperforms four state-of-the-art approaches in five-fold cross-validation on three distinct test sets, with a best AUC of 0.9501, demonstrating its stable ability either to predict novel genes or to predict with a genetic basis. The ablation study shows that our complete end-to-end design and unbiased data integration boost the AUC from 0.8727 to 0.9443. The addition of single-cell data further improves the prediction accuracy and enriches our results for cell-type-specific genes. These results confirm the ability of CIPHER-SC to discover reliable disease genes. Our implementation is available at http://github.com/YidingZhang117/CIPHER-SC.
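The AUC figures quoted in this abstract can be read as the probability that a randomly chosen true disease gene is scored above a randomly chosen non-disease gene. The sketch below computes AUC directly from that pairwise definition; the scores are made up for illustration and are unrelated to the CIPHER-SC results.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical prediction scores for known disease genes and for non-disease genes.
disease_gene_scores = [0.9, 0.8, 0.4]
other_gene_scores = [0.7, 0.3]
auc = pairwise_auc(disease_gene_scores, other_gene_scores)
print(auc)
```

Five of the six positive-negative pairs are ordered correctly in this toy example, so the AUC is 5/6; a value near 0.95 therefore means roughly 19 of every 20 such pairs are ordered correctly.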
10
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular components of human cells, with their functional interdependencies, form a complicated biological network. Diseases are mostly caused by perturbations of multiple interacting biomolecules rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes can be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying the complex biological system instead of individual biological components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of an HIN and describe the biological system as a complex HIN. Then, we discuss the topological properties of HINs and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights into how HINs aid drug development and the exploration of the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
  - School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
11
Zhang M, Hu Y, Zhu M. EPIsHilbert: Prediction of Enhancer-Promoter Interactions via Hilbert Curve Encoding and Transfer Learning. Genes (Basel) 2021; 12:genes12091385. [PMID: 34573367 PMCID: PMC8472018 DOI: 10.3390/genes12091385] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 08/31/2021] [Accepted: 09/01/2021] [Indexed: 12/19/2022] Open
Abstract
Enhancer-promoter interactions (EPIs) play a significant role in the regulation of gene transcription. However, enhancers do not necessarily interact with the closest promoters and may instead interact with distant promoters via chromatin looping. Considering the spatial relationship between enhancers and their target promoters is therefore important for predicting EPIs. Most existing methods consider only sequence information and ignore spatial information. In addition, recent computational methods lack the ability to generalize across different cell line datasets. In this paper, we propose EPIsHilbert, which uses Hilbert curve encoding and two transfer learning approaches. Hilbert curve encoding can preserve the spatial position information between enhancers and promoters. Additionally, we use visualization techniques to explore important sequence fragments that have a high impact on EPIs and the spatial relationships between them. Transfer learning can improve prediction performance across cell lines. To further demonstrate the effectiveness of transfer learning, we analyze the sequence coincidence of different cell lines. Experimental results demonstrate that EPIsHilbert is a state-of-the-art model that is superior to most existing methods both within specific cell lines and across cell lines.
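A Hilbert curve maps positions along a one-dimensional sequence onto a two-dimensional grid so that neighboring positions stay neighbors on the grid. The sketch below uses the standard index-to-coordinate conversion for a Hilbert curve on an n x n grid (n a power of two) and lays a short sequence out along it. The function names and the toy sequence are hypothetical, and the paper's actual encoding (for example its k-mer or one-hot channels) is not reproduced.

```python
def hilbert_index_to_xy(n, d):
    """Convert position d along the Hilbert curve into (x, y) on an n x n grid.
    n must be a power of two and 0 <= d < n * n."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_layout(sequence, n):
    """Place the symbols of `sequence` on an n x n grid following the Hilbert curve."""
    if len(sequence) > n * n:
        raise ValueError("sequence does not fit on the grid")
    grid = [[None] * n for _ in range(n)]
    for d, symbol in enumerate(sequence):
        x, y = hilbert_index_to_xy(n, d)
        grid[y][x] = symbol
    return grid

grid = hilbert_layout("ACGT", 2)
print(grid)
points = [hilbert_index_to_xy(4, d) for d in range(16)]
```

The property that matters for sequence models is locality: consecutive bases always land in adjacent grid cells, so a 2-D convolution over the grid still sees contiguous sequence fragments.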
12
Zhang H, Ferguson A, Robertson G, Jiang M, Zhang T, Sudlow C, Smith K, Rannikmae K, Wu H. Benchmarking network-based gene prioritization methods for cerebral small vessel disease. Brief Bioinform 2021; 22:bbab006. [PMID: 33634312 PMCID: PMC8425308 DOI: 10.1093/bib/bbab006] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/19/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 12/25/2022] Open
Abstract
Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms resulted in seven representative algorithms to be benchmarked. Performance of algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed best LOOCV performance, with median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had most GWAS-confirmable genes in top 200 predictions, while RWRH had best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has overall better performance for application in cSVD despite its susceptibility to bias caused by degree centrality. Choice of algorithms should be determined before applying to specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized for other diseases.
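Random walk with restart and leave-one-out rediscovery ranks, both central to this benchmark, can be sketched on a toy network. This is a simplified illustration on a single homogeneous graph rather than the heterogeneous disease-gene network used by RWRH; the adjacency list, the restart probability of 0.7, and the helper names are assumptions, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, iterations=100):
    """Iterate p <- restart * p0 + (1 - restart) * W p on an undirected graph."""
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:  # spread the walker evenly over neighbors
                nxt[v] += share
        p = nxt
    return p

def leave_one_out_rank(adjacency, known_genes, held_out, restart=0.7):
    """Hide one known gene, seed with the rest, and report where it is re-ranked."""
    seeds = set(known_genes) - {held_out}
    scores = random_walk_with_restart(adjacency, seeds, restart)
    candidates = [n for n in adjacency if n not in seeds]
    ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
    return ranked.index(held_out) + 1

# Hypothetical path-shaped network a - b - c - d.
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(toy_network, seeds={"a"})
rank = leave_one_out_rank(toy_network, known_genes={"a", "b"}, held_out="b")
print(scores, rank)
```

Scores fall off with distance from the seed, which is also why such methods favor genes near already known ones, consistent with the abstract's remark that purely network-based approaches are unlikely to find genes unconnected to known ones.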
Affiliation(s)
- Huayu Zhang
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Amy Ferguson
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Grant Robertson
  - Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Muchen Jiang
  - Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Teng Zhang
  - Department of Orthopaedics and Traumatology, the University of Hong Kong, Hong Kong, China
- Cathie Sudlow
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Keith Smith
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Kristiina Rannikmae
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Honghan Wu
  - Health Data Research UK, London, United Kingdom
  - Institute of Health Informatics, University College London, London, United Kingdom

13
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, which is caused by high-dimensionality and low-sample-size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having such few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods are trying to provide interpretability, but rarely apply to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or sample space, which restrict their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.

14
Watson J, Schwartz JM, Francavilla C. Using Multilayer Heterogeneous Networks to Infer Functions of Phosphorylated Sites. J Proteome Res 2021; 20:3532-3548. [PMID: 34164982 PMCID: PMC8256419 DOI: 10.1021/acs.jproteome.1c00150] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/21/2021] [Indexed: 01/23/2023]
Abstract
Mass spectrometry-based quantitative phosphoproteomics has become an essential approach in the study of cellular processes such as signaling. Commonly used methods to analyze phosphoproteomics datasets depend on generic, gene-centric annotations such as Gene Ontology terms, which do not account for the function of a protein in a particular phosphorylation state. Analysis of phosphoproteomics data is hampered by a lack of phosphorylated site-specific annotations. We propose a method that combines shotgun phosphoproteomics data, protein-protein interactions, and functional annotations into a heterogeneous multilayer network. Phosphorylation sites are associated to potential functions using a random walk on the heterogeneous network (RWHN) algorithm. We validated our approach against a model of the MAPK/ERK pathway and functional annotations from PhosphoSitePlus and were able to associate differentially regulated sites on the same proteins to their previously described specific functions. We further tested the algorithm on three previously published datasets and were able to reproduce their experimentally validated conclusions and to associate phosphorylation sites with known functions based on their regulatory patterns. Our approach provides a refinement of commonly used analysis methods and accurately predicts context-specific functions for sites with similar phosphorylation profiles.
Affiliation(s)
- Joanne Watson
  - Division of Evolution & Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
- Jean-Marc Schwartz
  - Division of Evolution & Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.
- Chiara Francavilla
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine & Health, University of Manchester, Manchester M13 9PT, U.K.

15
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022]
Abstract
The prediction of genes related to diseases is important to the study of the diseases due to high cost and time consumption of biological experiments. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM is to identify disease-related genes by mining the dynamical responses of nodes to impulsive signals being exerted at specific nodes. By a series of experimental evaluations in various types of biological networks, we confirmed the advantage of multiplex network and the important roles of functional associations in disease-gene prediction, demonstrated superior performance of NIDM compared with four types of network-based algorithms and then gave the effective recommendations of NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated to specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, China
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China

16
Huang Q, Wang J, Zhang X, Guo M, Yu G. IsoDA: Isoform-Disease Association Prediction by Multiomics Data Fusion. J Comput Biol 2021; 28:804-819. [PMID: 33826865 DOI: 10.1089/cmb.2020.0626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
A gene can be spliced into different isoforms by alternative splicing, which contributes to the functional diversity of protein species. Computational prediction of gene-disease associations (GDAs) has been studied for decades. However, the process of identifying the isoform-disease associations (IDAs) at a large scale is rarely explored, which can decipher the pathology at a more granular level. The main bottleneck is the lack of IDAs in current databases and the multilevel omics data fusion. To bridge this gap, we propose a computational approach called Isoform-Disease Association prediction by multiomics data fusion (IsoDA) to predict IDAs. Based on the relationship between a gene and its spliced isoforms, IsoDA first introduces a dispatch and aggregation term to dispatch gene-disease associations to individual isoforms, and reversely aggregate these dispatched associations to their hosting genes. At the same time, it fuses the genome, transcriptome, and proteome data by joint matrix factorization to improve the prediction of IDAs. Experimental results show that IsoDA significantly outperforms the related state-of-the-art methods at both the gene level and isoform level. A case study further shows that IsoDA credibly identifies three isoforms spliced from apolipoprotein E, which have individual associations with Alzheimer's disease, and two isoforms spliced from vascular endothelial growth factor A, which have different associations with coronary heart disease. The codes of IsoDA are available at http://mlda.swu.edu.cn/codes.php?name=IsoDA.
Affiliation(s)
- Qiuyue Huang
  - College of Computer and Information Science, Southwest University, Chongqing, China
  - School of Software, Shandong University, Jinan, China
- Jun Wang
  - School of Software, Shandong University, Jinan, China
- Xiangliang Zhang
  - Department of Computer Science, Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Maozu Guo
  - Department of Computer Science, College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing, China
  - School of Software, Shandong University, Jinan, China
  - Department of Computer Science, Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

17
Luo P, Chen B, Liao B, Wu F. Predicting disease‐associated genes: Computational methods, databases, and evaluations. WIREs Data Mining and Knowledge Discovery 2021; 11. [DOI: 10.1002/widm.1383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2019] [Accepted: 06/13/2020] [Indexed: 09/09/2024]
Abstract
Complex diseases are associated with a set of genes (called disease genes), the identification of which can help scientists uncover the mechanisms of diseases and develop new drugs and treatment strategies. Due to the huge cost and time of experimental identification techniques, many computational algorithms have been proposed to predict disease genes. Although several review publications in recent years have discussed many computational methods, some of them focus on cancer driver genes while others focus on biomolecular networks, which only cover a specific aspect of existing methods. In this review, we summarize existing methods and classify them into three categories based on their rationales. Then, the algorithms, biological data, and evaluation methods used in the computational prediction are discussed. Finally, we highlight the limitations of existing methods and point out some future directions for improving these algorithms. This review could help investigators understand the principles of existing methods, and thus develop new methods to advance the computational prediction of disease genes. This article is categorized under: Technologies > Machine Learning; Technologies > Prediction; Algorithmic Development > Biological Data Mining.
Affiliation(s)
- Ping Luo
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Bolin Chen
  - School of Computer Science and Technology, Northwestern Polytechnical University, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fang‐Xiang Wu
  - Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, Canada

18
Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, Chen J, Zhou X. A network-based machine-learning framework to identify both functional modules and disease genes. Hum Genet 2021; 140:897-913. [PMID: 33409574 DOI: 10.1007/s00439-020-02253-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2020] [Accepted: 12/22/2020] [Indexed: 01/20/2023]
Abstract
Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computing methods, candidate gene identification remains a severe challenge owing to the connectivity of an incomplete interactome network, which hampers the discovery of true novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. In this framework, we designed a semi-supervised non-negative matrix factorization model to obtain the functional modules related to the diseases and genes. Of note, we proposed a disease gene-prioritizing method called MapGene that integrates the correlations from both functional modules and network closeness. Our framework identified a set of functional modules with highly functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than the state-of-the-art algorithms. Further analysis demonstrates MapGene can effectively relieve the impact of the incompleteness of interactome networks and obtain highly reliable rankings of candidate genes. In addition, disease cases on Parkinson's disease and diabetes mellitus confirmed the generalization of MapGene for novel candidate gene identification. This work proposed, for the first time, an integrated computing framework to predict both functional modules and disease candidate genes. The methodology and results support that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
Affiliation(s)
- Kuo Yang
  - School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics / Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, 10084, China
- Kezhi Lu
  - School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China
  - imec-DistriNet, KU Leuven, Leuven, 3001, Belgium
- Yang Wu
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jian Yu
  - Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Baoyan Liu
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Yi Zhao
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jianxin Chen
  - Beijing University of Chinese Medicine, Beijing, 100029, China
- Xuezhong Zhou
  - School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China

19
Lu J, Wilfred P, Korbie D, Trau M. Regulation of Canonical Oncogenic Signaling Pathways in Cancer via DNA Methylation. Cancers (Basel) 2020; 12:E3199. [PMID: 33143142 PMCID: PMC7692324 DOI: 10.3390/cancers12113199] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/31/2020] [Revised: 10/24/2020] [Accepted: 10/28/2020] [Indexed: 02/07/2023]
Abstract
Disruption of signaling pathways that play a role in normal development and cellular homeostasis may lead to the dysregulation of cellular signaling and bring about the onset of different diseases, including cancer. In addition to genetic aberrations, DNA methylation also acts as an epigenetic modifier to drive the onset and progression of cancer by mediating the reversible transcription of related genes. Although the role of DNA methylation as an alternative driver of carcinogenesis has been well established, the global effects of DNA methylation on oncogenic signaling pathways and the presentation of cancer are only emerging. In this article, we introduced a differential methylation parsing pipeline (MethylMine) which mined for epigenetic biomarkers based on feature selection. This pipeline was used to mine for biomarkers that presented a substantial difference in methylation between the tumor and the matching normal tissue samples. Combined with the Data Integration Analysis for Biomarker discovery (DIABLO) framework for machine learning and multi-omic analysis, we revisited the TCGA DNA methylation and RNA-Seq datasets for breast, colorectal, lung, and prostate cancer, and identified differentially methylated genes within the NRF2-KEAP1/PI3K oncogenic pathway, which regulates the expression of cytoprotective genes, that serve as potential therapeutic targets to treat different cancers.
Affiliation(s)
- Jennifer Lu
  - Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Premila Wilfred
  - Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Darren Korbie
  - Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Matt Trau
  - Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia

20
Chen Z, Bai X, Ma L, Wang X, Liu X, Liu Y, Chen L, Wan L. A Branch Point on Differentiation Trajectory is the Bifurcating Event Revealed by Dynamical Network Biomarker Analysis of Single-Cell Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:366-375. [PMID: 29994127 DOI: 10.1109/tcbb.2018.2847690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The advance in single-cell profiling technologies and the development in computational algorithms provide the opportunity to reconstruct pseudo temporal trajectory with branch point of cellular development. On the other hand, theories such as dynamical network biomarkers (DNB) theory have been recently proposed to characterize the pre-transition state in biological systems. Few studies have validated whether the branch point identified in pseudo time is the critical point in dynamical system. In this study, the dynamical behavior of the branch point on the pseudo trajectory has been investigated. We study the pseudo temporal trajectories reconstructed by Wishbone and diffusion pseudotime analysis (DPT) algorithms, as well as the simulated trajectory. DNB theory is applied to justify the bifurcating event on the pseudo trajectories. Our results demonstrate that the branch point recovered by Wishbone and DPT algorithms is confirmed as a transition state in cell differentiation process by DNB theory. Furthermore, we show that an appropriate DNB group will amplify the comprehensive index of critical event as defined in DNB theory. Our study provides biological insights on pseudo trajectory with branch point in a dynamical view and also indicates that DNB theory may serve as a benchmark to check the validity of branch point.

21
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022]
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, making up for the lack of sparse data. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm has higher priority than those from other methods, such as IMC and CATAPULT.
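The evaluation indexes named in the abstract can be made concrete with a short sketch. It assumes one common definition of mean percentile ranking (the average position of the true genes in a ranked candidate list, expressed as a percentage of the list length, lower being better) and of recall at a cutoff; the paper's exact definitions may differ, and the function names and toy gene labels are hypothetical.

```python
def mean_percentile_rank(ranked_candidates, true_genes):
    """Average percentile position (in %) of the true genes; lower is better."""
    position = {gene: i + 1 for i, gene in enumerate(ranked_candidates)}
    total = len(ranked_candidates)
    percentiles = [100.0 * position[g] / total for g in true_genes if g in position]
    if not percentiles:
        raise ValueError("none of the true genes appear in the ranking")
    return sum(percentiles) / len(percentiles)


def recall_at_k(ranked_candidates, true_genes, k):
    """Fraction of the true genes that appear among the top k candidates."""
    top_k = set(ranked_candidates[:k])
    return sum(1 for g in true_genes if g in top_k) / len(true_genes)


# Hypothetical ranking of ten candidate genes for one disease, best first.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
known = ["g1", "g3"]  # held-out true associations (toy data)

mpr = mean_percentile_rank(ranking, known)
recall_top2 = recall_at_k(ranking, known, 2)
print(mpr, recall_top2)
```

Here the true genes sit at positions 1 and 3 of 10, so their percentile positions are 10% and 30%, and only one of the two falls within the top two candidates.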
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jie Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xueping Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ke Han
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China

22
Yang K, Wang R, Liu G, Shu Z, Wang N, Zhang R, Yu J, Chen J, Li X, Zhou X. HerGePred: Heterogeneous Network Embedding Representation for Disease Gene Prediction. IEEE J Biomed Health Inform 2020; 23:1805-1815. [PMID: 31283472 DOI: 10.1109/jbhi.2018.2870728] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/06/2022]
Abstract
The discovery of disease-causing genes is a critical step towards understanding the nature of a disease and determining a possible cure for it. In recent years, many computational methods to identify disease genes have been proposed. However, making full use of disease-related (e.g., symptoms) and gene-related (e.g., gene ontology and protein-protein interactions) information to improve the performance of disease gene prediction is still an issue. Here, we develop a heterogeneous disease-gene-related network (HDGN) embedding representation framework for disease gene prediction (called HerGePred). Based on this framework, a low-dimensional vector representation (LVR) of the nodes in the HDGN can be obtained. Then, we propose two specific algorithms, namely, an LVR-based similarity prediction and a random walk with restart on a reconstructed heterogeneous disease-gene network (RW-RDGN), to predict disease genes with high performance. First, to validate the rationality of the framework, we analyze the similarity-based overlap distribution of disease pairs and design an experiment for disease-gene association recovery, the results of which revealed that the LVR of nodes performs well at preserving the local and global network structure of the HDGN. Then, we apply tenfold cross validation and external validation to compare our methods with other well-known disease gene prediction algorithms. The experimental results show that the RW-RDGN performs better than the state-of-the-art algorithm. The prediction results of disease candidate genes are essential for molecular mechanism investigation and experimental validation. The source codes of HerGePred and experimental data are available at https://github.com/yangkuoone/HerGePred.
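A similarity prediction over low-dimensional vectors, as the abstract describes, can be sketched as follows. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the two-dimensional vectors and gene labels are toy values rather than embeddings produced by HerGePred.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vector, gene_vectors):
    """Order gene names by decreasing similarity to the disease vector."""
    return sorted(
        gene_vectors,
        key=lambda gene: cosine(disease_vector, gene_vectors[gene]),
        reverse=True,
    )


# Toy embeddings (assumed for illustration only).
disease = [1.0, 0.0]
genes = {"G1": [0.9, 0.1], "G2": [0.0, 1.0], "G3": [0.5, 0.5]}

gene_order = rank_genes(disease, genes)
print(gene_order)
```

Genes whose vectors point in nearly the same direction as the disease vector come first, which is the sense in which an embedding that preserves network structure can be used for candidate ranking.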

23
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022]
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among about a hundred of published gene prioritization tools, we select and briefly describe 14 most up-to-date and user-friendly. Also, we discuss the advantages and disadvantages of existing tools, challenges of their validation, and the directions for future research.
Affiliation(s)
- Olga Zolotareva
  - Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
  - Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany

24
Ma L, Rolls ET, Liu X, Liu Y, Jiao Z, Wang Y, Gong W, Ma Z, Gong F, Wan L. Multi-scale analysis of schizophrenia risk genes, brain structure, and clinical symptoms reveals integrative clues for subtyping schizophrenia patients. J Mol Cell Biol 2019; 11:678-687. [PMID: 30508120 PMCID: PMC6788727 DOI: 10.1093/jmcb/mjy071] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/30/2018] [Revised: 11/01/2018] [Accepted: 11/20/2018] [Indexed: 12/30/2022]
Abstract
Analysis directly linking genomics, neuroimaging phenotypes and clinical measurements is crucial for understanding psychiatric disorders, but remains rare. Here, we describe a multi-scale analysis using genome-wide SNPs, gene expression, grey matter volume (GMV), and the positive and negative syndrome scale scores (PANSS) to explore the etiology of schizophrenia. With 72 drug-naive schizophrenic first episode patients (FEPs) and 73 matched healthy controls, we identified 108 genes, from schizophrenia risk genes, that correlated significantly with GMV, which are highly co-expressed in the brain during development. Among these 108 candidates, 19 distinct genes were found associated with 16 brain regions referred to as hot clusters (HCs), primarily in the frontal cortex, sensory-motor regions and temporal and parietal regions. The patients were subtyped into three groups with distinguishable PANSS scores by the GMV of the identified HCs. Furthermore, we found that HCs with common GMV among patient groups are related to genes that mostly mapped to pathways relevant to neural signaling, which are associated with the risk for schizophrenia. Our results provide an integrated view of how genetic variants may affect brain structures that lead to distinct disease phenotypes. The method of multi-scale analysis described in this research may help to advance the understanding of the etiology of schizophrenia.
Affiliation(s)
- Liang Ma
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Edmund T Rolls
  - Department of Computer Science, University of Warwick, Coventry, UK
  - Oxford Centre for Computational Neuroscience, Oxford, UK
- Xiuqin Liu
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Yuting Liu
  - School of Science, Beijing Jiaotong University, Beijing, China
- Zeyu Jiao
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
- Yue Wang
  - School of Science, Beijing Jiaotong University, Beijing, China
- Weikang Gong
  - CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiming Ma
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Fuzhou Gong
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Lin Wan
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
|
25
|
Wu M, Lin Z, Ma S, Chen T, Jiang R, Wong WH. Simultaneous inference of phenotype-associated genes and relevant tissues from GWAS data via Bayesian integration of multiple tissue-specific gene networks. J Mol Cell Biol 2019; 9:436-452. [PMID: 29300920 DOI: 10.1093/jmcb/mjx059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/30/2017] [Accepted: 12/20/2017] [Indexed: 02/07/2023]
Abstract
Although genome-wide association studies (GWAS) have identified thousands of genomic loci associated with hundreds of complex traits in the past decade, problems such as missing heritability and weak interpretability call for effective computational methods to support advanced analysis of the vast volume of existing and anticipated genetic data. Towards this goal, gene-level integrative GWAS analysis, which assumes that genes associated with a phenotype tend to be enriched in biological gene sets or gene networks, has recently attracted much attention because of its straightforward interpretation, lower multiple-testing burden, and robustness across studies. However, existing methods in this category usually exploit non-tissue-specific gene networks and thus cannot use informative tissue-specific characteristics. To overcome this limitation, we proposed a Bayesian approach called SIGNET (Simultaneously Inference of GeNEs and Tissues) to integrate GWAS data and multiple tissue-specific gene networks for the simultaneous inference of phenotype-associated genes and relevant tissues. Through extensive simulation studies, we showed the effectiveness of our method in finding both associated genes and relevant tissues for a phenotype. In applications to real GWAS data of 14 complex phenotypes, we demonstrated the power of our method both in deciphering the genetic basis of a phenotype and in yielding biological insights. With this understanding, we expect SIGNET to be a valuable tool for integrative GWAS analysis, thereby boosting the prevention, diagnosis, and treatment of human inherited diseases and eventually facilitating precision medicine.
Affiliation(s)
- Mengmeng Wu
  - Department of Computer Science, Tsinghua University, Beijing 100084, China
  - Ministry of Education Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
  - Department of Statistics, Stanford University, CA 94305, USA
- Zhixiang Lin
  - Department of Statistics, Stanford University, CA 94305, USA
- Shining Ma
  - Department of Statistics, Stanford University, CA 94305, USA
- Ting Chen
  - Department of Computer Science, Tsinghua University, Beijing 100084, China
  - Ministry of Education Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
  - Department of Statistics, Stanford University, CA 94305, USA
|
26
|
Li W, Wong WH, Jiang R. DeepTACT: predicting 3D chromatin contacts via bootstrapping deep learning. Nucleic Acids Res 2019; 47:e60. [PMID: 30869141 PMCID: PMC6547469 DOI: 10.1093/nar/gkz167] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 10/09/2018] [Revised: 02/08/2019] [Accepted: 02/28/2019] [Indexed: 12/20/2022]
Abstract
Interactions between regulatory elements are of crucial importance for the understanding of transcriptional regulation and the interpretation of disease mechanisms. The Hi-C technique has been developed for genome-wide detection of chromatin contacts. However, unless extremely deep sequencing is performed on a very large number of input cells, which is technically limited and expensive, current Hi-C experiments do not have high enough resolution to resolve contacts between regulatory elements. Here, we develop DeepTACT, a bootstrapping deep learning model, to integrate genome sequences and chromatin accessibility data for the prediction of chromatin contacts between regulatory elements. DeepTACT can infer not only promoter-enhancer interactions, but also promoter-promoter interactions. In tests based on promoter capture Hi-C data, DeepTACT performs better than existing methods. DeepTACT analysis also identifies a class of hub promoters, which are correlated with transcriptional activation across cell lines, enriched in housekeeping genes, functionally related to fundamental biological processes, and capable of reflecting cell similarity. Finally, the utility of chromatin contacts in the study of human diseases is illustrated by the association of IFNA2 with coronary artery disease via an integrative analysis of GWAS data and interactions predicted by DeepTACT.
Affiliation(s)
- Wenran Li
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Wing Hung Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
|
27
|
Predicting disease-genes based on network information loss and protein complexes in heterogeneous network. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.12.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/12/2023]
|
28
|
Ma S, Jiang T, Jiang R. Constructing tissue-specific transcriptional regulatory networks via a Markov random field. BMC Genomics 2018; 19:884. [PMID: 30598101 PMCID: PMC6311931 DOI: 10.1186/s12864-018-5277-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Recent advances in sequencing technologies have enabled parallel assays of chromatin accessibility and gene expression for major human cell lines. Such innovation provides a great opportunity to decode phenotypic consequences of genetic variation via the construction of predictive gene regulatory network models. However, a computational method that systematically integrates chromatin accessibility information with gene expression data to recover complicated regulatory relationships between genes in a tissue-specific manner is still lacking. RESULTS: We propose a Markov random field (MRF) model for constructing tissue-specific transcriptional regulatory networks via integrative analysis of DNase-seq and RNA-seq data. Our method, named CSNets (cell-line specific regulatory networks), first infers regulatory networks for individual cell lines using chromatin accessibility information, and then fine-tunes these networks using the MRF based on pairwise similarity between cell lines derived from gene expression data. Using this method, we constructed regulatory networks specific to 110 human cell lines and 13 major tissues with the use of ENCODE data. We demonstrated the high quality of these networks via comprehensive statistical analysis based on ChIP-seq profiles, functional annotations, taxonomic analysis, and literature surveys. We further applied these networks to analyze GWAS data of Crohn's disease and prostate cancer. Results were either consistent with the literature or provided biological insights into regulatory mechanisms of these two complex diseases. The website of CSNets is freely available at http://bioinfo.au.tsinghua.edu.cn/jianglab/CSNETS/ . CONCLUSIONS: CSNets demonstrated the power of joint analysis of epigenomic and transcriptomic data for the accurate construction of gene regulatory networks. Our work provides not only a useful resource of regulatory networks to the community, but also valuable experience in methodology development for multi-omics data integration.
Affiliation(s)
- Shining Ma
  - Department of Statistics, Department of Biomedical Data Science, Bio-X Program Stanford University, Stanford, CA 94305 USA
- Tao Jiang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084 China
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084 China
|
29
|
Kabir MH, Patrick R, Ho JWK, O'Connor MD. Identification of active signaling pathways by integrating gene expression and protein interaction data. BMC Syst Biol 2018; 12:120. [PMID: 30598083 PMCID: PMC6311899 DOI: 10.1186/s12918-018-0655-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/21/2023]
Abstract
Background: Signaling pathways are the key biological mechanisms that transduce extracellular signals to affect transcription factor mediated gene regulation within cells. A number of computational methods have been developed to identify the topological structure of a specific signaling pathway using protein-protein interaction data, but they are not designed for identifying active signaling pathways in an unbiased manner. On the other hand, there are statistical methods based on gene sets or pathway data that can prioritize likely active signaling pathways, but they do not make full use of the active pathway structure that links receptors, kinases, and downstream transcription factors. Results: Here, we present a method to simultaneously predict the set of active signaling pathways, together with their pathway structure, by integrating protein-protein interaction network and gene expression data. We evaluated the capacity of our method to predict active signaling pathways for dental epithelial cells, ocular lens epithelial cells, human pluripotent stem cell-derived lens epithelial cells, and lens fiber cells. This analysis showed that our approach could identify all the known active pathways that are associated with tooth formation and lens development. Conclusions: The results suggest that SPAGI can be a useful approach to identify the potential active signaling pathways given a gene expression profile. Our method is implemented as an open source R package, available via https://github.com/VCCRI/SPAGI/.
Affiliation(s)
- Md Humayun Kabir
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
- Ralph Patrick
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - St. Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
  - Stem Cells Australia, Melbourne Brain Centre, University of Melbourne, Parkville, VIC, 3010, Australia
- Joshua W K Ho
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - St. Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
  - School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, SAR, China
- Michael D O'Connor
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
  - Molecular Medicine Research Group, Western Sydney University, Campbelltown, NSW, Australia
|
30
|
Yang K, Wang N, Liu G, Wang R, Yu J, Zhang R, Chen J, Zhou X. Heterogeneous network embedding for identifying symptom candidate genes. J Am Med Inform Assoc 2018; 25:1452-1459. [PMID: 30357378 PMCID: PMC7646926 DOI: 10.1093/jamia/ocy117] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/08/2018] [Revised: 07/24/2018] [Accepted: 08/11/2018] [Indexed: 11/12/2022]
Abstract
Objective: Investigating the molecular mechanisms of symptoms is a vital task in precision medicine to refine disease taxonomy and improve the personalized management of chronic diseases. Although there are abundant experimental studies and computational efforts to obtain the candidate genes of diseases, the identification of symptom genes is rarely addressed. We curated a high-quality benchmark dataset of symptom-gene associations and proposed a heterogeneous network embedding for identifying symptom genes. Methods: We proposed a heterogeneous network embedding representation algorithm, which constructed a heterogeneous symptom-related network that integrated symptom-related associations and applied an embedding representation algorithm to obtain the low-dimensional vector representation of nodes. By measuring the relevance between symptoms and genes via calculating the similarities of their vectors, the candidate genes of given symptoms can be obtained. Results: A benchmark dataset of 18 270 symptom-gene associations between 505 symptoms and 4549 genes was curated. We compared our method to baseline algorithms (FSGER and PRINCE). The experimental results indicated our algorithm achieved a significant improvement over the state-of-the-art method, with precision and recall improved by 66.80% (0.844 vs 0.506) and 53.96% (0.311 vs 0.202), respectively, for TOP@3 and association precision improved by 37.71% (0.723 vs 0.525) over PRINCE. Conclusions: The experimental validation of the algorithms and the literature validation of typical symptoms indicated our method achieved excellent performance. Hence, we curated a prediction dataset of 17 479 symptom-candidate genes. The benchmark and prediction datasets have the potential to promote investigations of the molecular mechanisms of symptoms and provide candidate genes for validation in experimental settings.
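The ranking step described in this abstract (scoring genes by the similarity of their embedding vectors to a symptom's vector) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the vectors, the gene labels `G1` to `G3`, and the function names are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_genes(symptom_vec, gene_vecs, top_k=3):
    """Return the top_k (gene, score) pairs, most similar to the symptom first."""
    scored = [(gene, cosine(symptom_vec, vec)) for gene, vec in gene_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; real embeddings would be learned
# from the heterogeneous symptom-related network.
symptom_embedding = [1.0, 0.0, 1.0]
gene_embeddings = {
    "G1": [0.9, 0.1, 0.8],
    "G2": [0.0, 1.0, 0.0],
    "G3": [0.5, 0.5, 0.5],
}

for gene, score in rank_genes(symptom_embedding, gene_embeddings):
    print(gene, round(score, 3))
```

The `top_k=3` default mirrors the TOP@3 evaluation mentioned in the abstract; any other cut-off works the same way.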
Affiliation(s)
- Kuo Yang
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Ning Wang
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Guangming Liu
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Ruyu Wang
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Jian Yu
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Runshun Zhang
  - Guanganmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Jianxin Chen
  - Beijing University of Chinese Medicine, Beijing, China
- Xuezhong Zhou
  - School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
|
31
|
Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning. Methods 2018; 145:41-50. [DOI: 10.1016/j.ymeth.2018.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/30/2018] [Revised: 04/10/2018] [Accepted: 06/01/2018] [Indexed: 12/20/2022]
|
32
|
Li W, Wang M, Sun J, Wang Y, Jiang R. Gene co-opening network deciphers gene functional relationships. Mol Biosyst 2018; 13:2428-2439. [PMID: 28976510 DOI: 10.1039/c7mb00430c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
Abstract
Genome sequencing technology has generated a vast amount of genomic and epigenomic data, and has provided us a great opportunity to study gene functions on a global scale from an epigenomic view. In the last decade, network-based studies, such as those based on PPI networks and co-expression networks, have shown good performance in capturing functional relationships between genes. However, the functions of a gene and the mechanism of interaction of genes with each other to elucidate their functions are still not entirely clear. Here, we construct a gene co-opening network based on chromatin accessibility of genes. We show that genes related to a specific biological process or the same disease tend to be clustered in the co-opening network. This understanding allows us to detect functional clusters from the network and to predict new functions for genes. We further apply the network to prioritize disease genes for Psoriasis, and demonstrate the power of the joint analysis of the co-opening network and GWAS data in identifying disease genes. Taken together, the co-opening network provides a new viewpoint for the elucidation of gene associations and the interpretation of disease mechanisms.
Affiliation(s)
- Wenran Li
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
|
33
|
Abstract
Background: Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation, and disease mechanisms. Currently, it is challenging to distinguish true interactions from nearby non-interacting ones, since the power of traditional experimental methods is limited by low resolution or low throughput. Results: We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of sequence embedding features by carrying out sensitivity analysis. In addition, we identify motifs that represent cell line-specific information by analyzing the learned sequence embedding features with an attention mechanism. Last, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. Conclusions: EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4459-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wanwen Zeng
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Beijing, 100084, China
  - Department of Automation, Tsinghua University, Beijing, 100084, China
- Mengmeng Wu
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Beijing, 100084, China
  - Department of Computer Science, Tsinghua University, Beijing, 100084, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Beijing, 100084, China
  - Department of Automation, Tsinghua University, Beijing, 100084, China
|
34
|
Wang Y, Fu L, Ren J, Yu Z, Chen T, Sun F. Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures. Front Microbiol 2018; 9:872. [PMID: 29774017 PMCID: PMC5943621 DOI: 10.3389/fmicb.2018.00872] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/15/2017] [Accepted: 04/16/2018] [Indexed: 12/19/2022]
Abstract
Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered "group-specific" in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. All the assembled 11 LC-specific sequences can be mapped to two strains of Veillonella parvula: UTDB1-3 and DSM2008. The experiments on the other two real datasets related to Inflammatory Bowel Disease and Type 2 Diabetes in Women consistently demonstrated that MetaGO achieved better prediction accuracy with fewer features compared to previous studies. The experiments showed that MetaGO is a powerful tool for identifying group-specific k-mers, which would be clinically applicable for disease prediction. MetaGO is available at https://github.com/VVsmileyx/MetaGO.
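The notion of a "group-specific" k-mer defined in this abstract can be illustrated with a small presence/absence sketch. This is not the MetaGO pipeline: it assumes the strictest reading (a k-mer found in every case sample and in no control sample), skips the statistical testing and abundance ("numerical") features the abstract mentions, and uses k = 4 on made-up sequences purely so the toy example stays readable; the abstract works with k ≥ 30. The function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_specific_kmers(case_samples, control_samples, k):
    """k-mers present in every case sample and absent from every control sample."""
    if not case_samples:
        return set()
    shared_by_cases = set.intersection(*(kmers(s, k) for s in case_samples))
    seen_in_controls = set().union(*(kmers(s, k) for s in control_samples))
    return shared_by_cases - seen_in_controls


# Made-up toy sequences standing in for metagenomic reads.
cases = ["ACGTTGCA", "TTACGTTG"]
controls = ["ACGAAGCA", "GGGTTGCA"]

markers = group_specific_kmers(cases, controls, k=4)
print(sorted(markers))
```

Because the comparison is set-based, no reference genome, alignment, or assembly is needed, which is the property the abstract emphasizes.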
Affiliation(s)
- Ying Wang
  - Department of Automation, Xiamen University, Xiamen, China
- Lei Fu
  - Department of Automation, Xiamen University, Xiamen, China
- Jie Ren
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
- Zhaoxia Yu
  - Department of Statistics, University of California, Irvine, Irvine, CA, United States
- Ting Chen
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
  - Bioinformatics Division, Tsinghua National Laboratory of Information Science and Technology, Tsinghua University, Beijing, China
  - Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
  - Center for Computational Systems Biology, Fudan University, Shanghai, China
|
35
|
Tu R, Qian J, Rui M, Tao N, Sun M, Zhuang Y, Lv H, Han J, Li M, Xie W. Proteolytic cleavage is required for functional neuroligin 2 maturation and trafficking in Drosophila. J Mol Cell Biol 2018; 9:231-242. [PMID: 28498949 PMCID: PMC5907836 DOI: 10.1093/jmcb/mjx015] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/26/2016] [Accepted: 05/03/2017] [Indexed: 01/15/2023]
Abstract
Neuroligins (Nlgs) are transmembrane cell adhesion molecules playing essential roles in synapse development and function. Genetic mutations in neuroligin genes have been linked with some neurodevelopmental disorders such as autism. These mutated Nlgs are mostly retained in the endoplasmic reticulum (ER). However, the mechanisms underlying normal Nlg maturation and trafficking have remained largely unknown. Here, we found that Drosophila neuroligin 2 (DNlg2) undergoes proteolytic cleavage in the ER in a variety of Drosophila tissues throughout developmental stages. A region encompassing Y642-T698 is required for this process. The immature non-cleavable DNlg2 is retained in the ER and non-functional. The C-terminal fragment of DNlg2 instead of the full-length or non-cleavable DNlg2 is able to rescue neuromuscular junction defects and GluRIIB reduction induced by dnlg2 deletion. Intriguingly, the autism-associated R598C mutation in DNlg2 leads to similar marked defects in DNlg2 proteolytic process and ER export, revealing a potential role of the improper Nlg cleavage in autism pathogenesis. Collectively, our findings uncover a specific mechanism that controls DNlg2 maturation and trafficking via proteolytic cleavage in the ER, suggesting that the perturbed proteolytic cleavage of Nlgs likely contributes to autism disorder.
Affiliation(s)
- Renjun Tu
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Jinjun Qian
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Menglong Rui
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Nana Tao
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Mingkuan Sun
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Yan Zhuang
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Huihui Lv
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Junhai Han
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Moyi Li
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
- Wei Xie
  - Institute of Life Sciences, The Collaborative Innovation Center for Brain Science, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
  - The Key Laboratory of Developmental Genes and Human Disease, Jiangsu Co-innovation Center of Neuroregeneration, Southeast University, 2 SiPaiLou Road, Nanjing 210096, China
|
36
|
Yang K, Liu G, Wang N, Zhang R, Yu J, Chen J, Zhou X. Heterogeneous network propagation for herb target identification. BMC Med Inform Decis Mak 2018; 18:17. [PMID: 29589568 PMCID: PMC5872392 DOI: 10.1186/s12911-018-0592-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Identifying the targets of herbs is a primary step in investigating the pharmacological mechanisms of herbal drugs in Traditional Chinese medicine (TCM). Experimental identification of herb targets is difficult and time-consuming, and computational identification is an efficient alternative. However, how to make full use of heterogeneous network data about herbs and targets to improve the performance of herb target prediction remains an open problem. METHODS: In our study, a random walk algorithm on the heterogeneous herb-target network (named heNetRW) is proposed to identify protein targets of herbs. By building a heterogeneous herb-target network involving herbs, targets, and their interactions, and simulating a random walk on the network, the candidate targets of a given herb can be predicted. RESULTS: The experimental results on a large-scale dataset showed that heNetRW had higher target prediction performance than PRINCE (improved F1-score by 0.08 and Hit@1 by 21.34% in one validation setting, and improved F1-score by 0.54 and Hit@1 by 69.08% in the other validation setting). Furthermore, we evaluated novel candidate targets of two herbs (rhizoma coptidis and turmeric), which showed that our approach could generate potential targets that are valuable for further experimental investigations. CONCLUSIONS: Compared with the PRINCE algorithm, the heNetRW algorithm can fuse more known information (such as known herb-target associations and pathway-based similarities of protein pairs) to improve prediction performance. Experimental results also indicated that heNetRW had higher performance than PRINCE. The prediction results can not only guide the selection of candidate targets of herbs, but also help to reveal the molecular mechanisms of herbal drugs.
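The general idea of scoring candidate targets by simulating a random walk from a herb node can be sketched with a random walk with restart. This is an assumption-laden illustration, not heNetRW itself: the abstract does not give the transition rules, restart probability, or edge weights, so the 0.5 restart value, the unit weights, and the five-edge toy network (nodes `herb_A`, `herb_B`, `T1` to `T3`) are all hypothetical.

```python
def build_graph(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the L1 change < tol.

    W spreads each node's probability over its neighbours in proportion to
    edge weight. Assumes every node has at least one edge.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / degree[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: herb-target links plus target-target similarity links.
edges = [
    ("herb_A", "T1", 1.0),  # known herb-target association
    ("herb_B", "T1", 1.0),
    ("herb_B", "T2", 1.0),
    ("T1", "T2", 1.0),      # similarity between two targets
    ("T2", "T3", 1.0),
]
graph = build_graph(edges)
scores = random_walk_with_restart(graph, seeds={"herb_A"})

# Rank targets that are not already linked to the query herb.
known = set(graph["herb_A"])
candidates = sorted(
    (n for n in scores if n.startswith("T") and n not in known),
    key=lambda n: scores[n],
    reverse=True,
)
print(candidates)
```

In this toy network `T2` outranks `T3` because it is reachable from the seed herb both through the shared target `T1` and through `herb_B`, whereas `T3` receives probability only via `T2`.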
Affiliation(s)
- Kuo Yang
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044 China
| | - Guangming Liu
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044 China
| | - Ning Wang
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044 China
| | - Runshun Zhang
- Guanganmen Hospital, China Academy of Chinese Medical Sciences, Beijing, 100053 China
| | - Jian Yu
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044 China
| | - Jianxin Chen
- Beijing University of Chinese Medicine, Beijing, 100029 China
| | - Xuezhong Zhou
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044 China
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700 China
| |
|
37
|
Zhao XM, Li S. HISP: a hybrid intelligent approach for identifying directed signaling pathways. J Mol Cell Biol 2018; 9:453-462. [DOI: 10.1093/jmcb/mjx054] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/01/2017] [Accepted: 12/20/2017] [Indexed: 01/15/2023] Open
Affiliation(s)
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Shan Li
- Department of Mathematics, Shanghai University, Shanghai 200444, China
| |
|
38
|
|
39
|
Lin L, Yang T, Fang L, Yang J, Yang F, Zhao J. Gene gravity-like algorithm for disease gene prediction based on phenotype-specific network. BMC Syst Biol 2017; 11:121. [PMID: 29212543 PMCID: PMC5718078 DOI: 10.1186/s12918-017-0519-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/10/2017] [Accepted: 11/24/2017] [Indexed: 01/24/2023]
Abstract
Background Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understand the genetic landscape of diseases at the molecular level. With the advent of the ‘omic’ data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge. Results In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping phenotype similarity information for the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross validation and compared them with state-of-the-art algorithms. The results favored both the phenotype-specific network and the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms on a test gene set derived from the DisGeNET database. Also, potential disease genes of three polygenic diseases, obesity, prostate cancer and lung cancer, were predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence. Conclusions The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating the predictive value of combining topological similarity with functional similarity. Our work offers insight into the discovery of disease genes by fusing multiple similarities of genes and diseases. Electronic supplementary material The online version of this article (10.1186/s12918-017-0519-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Limei Lin
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Tinghong Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Ling Fang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jian Yang
- School of Pharmacy, Second Military Medical University, Shanghai, China
| | - Fan Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jing Zhao
- Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
| |
|
40
|
Zhang L, Liu Y, Wang M, Wu Z, Li N, Zhang J, Yang C. EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients. J Mol Cell Biol 2017; 9:477-488. [PMID: 29272522 PMCID: PMC5907834 DOI: 10.1093/jmcb/mjx056] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 08/30/2017] [Revised: 11/12/2017] [Accepted: 12/18/2017] [Indexed: 12/13/2022] Open
Abstract
Glioma is a complex disease with limited treatment options. Recent advances have identified isocitrate dehydrogenase (IDH) mutations in up to 80% of lower grade gliomas (LGG) and in 76% of secondary glioblastomas (GBM). IDH mutations are also seen in 10%-20% of acute myeloid leukemia (AML). In AML, it was determined that mutations of IDH and other genes involved in epigenetic regulation are early events, emerging in the pre-leukemic stem cell (pre-LSC) stage, whereas mutations in genes propagating oncogenic signals are late events in leukemia. IDH mutations are also early events in glioma, occurring before TP53 mutation, 1p/19q deletion, etc. Despite these advances in glioma research, studies into other molecular alterations have lagged considerably. In this study, we analyzed currently available databases. We identified EZH2, KMT2C, and CHD4 as important genes in glioma in addition to the known gene IDH1/2. We also showed that genomic alterations of PIK3CA, CDKN2A, CDK4, FIP1L1, or FUBP1 collaborate with IDH mutations to negatively affect patients' survival in LGG. In LGG patients with TP53 mutations or IDH1/2 mutations, additional genomic alterations of EZH2, KMT2C, and CHD4, individually or in combination, were associated with markedly shorter disease-free survival than in patients without such alterations. Alterations of EZH2, KMT2C, and CHD4 at the genetic or protein level could perturb the epigenetic program, leading to malignant transformation in glioma. By reviewing current literature on both AML and glioma and performing bioinformatics analysis on available datasets, we developed a hypothetical model of tumorigenesis from premalignant stem cells to glioma.
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Ying Liu
- The Vivian Smith Department of Neurosurgery, Center for Stem Cell and Regenerative Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Mengning Wang
- Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
| | - Zhenhai Wu
- Department of neurosurgery, ShouGuang People’s Hospital, Shandong, China
| | - Na Li
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jinsong Zhang
- Pharmacological & Physiological Science, School of Medicine, Saint Louis University, St. Louis, MO, USA
| | - Chuanwei Yang
- Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
41
|
Gan M, Li W, Zeng W, Wang X, Jiang R. Mimvec: a deep learning approach for analyzing the human phenome. BMC Syst Biol 2017; 11:76. [PMID: 28950906 PMCID: PMC5615244 DOI: 10.1186/s12918-017-0451-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/04/2022]
Abstract
Background The human phenome has been widely used with a variety of genomic data sources in the inference of disease genes. However, most existing methods thus far derive phenotype similarity from the analysis of biomedical databases using the traditional term frequency-inverse document frequency (TF-IDF) formulation. This framework, though intuitive, not only ignores semantic relationships between words but also tends to produce high-dimensional vectors, and hence lacks the ability to precisely capture the intrinsic semantic characteristics of biomedical documents. To overcome these limitations, we propose a framework called mimvec to analyze the human phenome by making use of a state-of-the-art deep learning technique in natural language processing. Results We converted 24,061 records in the Online Mendelian Inheritance in Man (OMIM) database to low-dimensional vectors using our method. We demonstrated that the vector representation not only effectively enabled classification of phenotype records against gene records, but also succeeded in discriminating diseases of different inheritance styles and different mechanisms. We further derived pairwise phenotype similarities between 7988 human inherited diseases using their vector representations. With a joint analysis of this phenome with multiple genomic data, we showed that phenotype overlap indeed implied genotype overlap. We finally used the derived phenotype similarities with genomic data to prioritize candidate genes and demonstrated advantages of this method over existing ones. Conclusions Our method is capable of not only capturing semantic relationships between words in biomedical records but also alleviating the curse of dimensionality that accompanies the traditional TF-IDF framework. With the approach of precision medicine, there will be abundant electronic medical and health records awaiting deep analysis, and we expect to see a wide spectrum of applications borrowing the idea of our method in the near future.
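For context, the TF-IDF baseline that this abstract contrasts with mimvec can be sketched in a few lines. This is a minimal illustration of plain TF-IDF with cosine similarity, not the mimvec method; the whitespace tokenizer, the unsmoothed IDF, and the three toy phenotype descriptions are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dictionary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical phenotype descriptions, for illustration only.
records = [
    "muscle weakness and atrophy",
    "progressive muscle weakness",
    "retinal degeneration and blindness",
]
vecs = tfidf_vectors(records)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]), cosine(vecs[1], vecs[2]))
```

Each vector has one dimension per vocabulary term, and two records that share no literal words score exactly zero even if they describe related phenotypes. These are the two limitations (high dimensionality and blindness to semantic relationships) that the abstract attributes to the TF-IDF framework.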
Affiliation(s)
- Mingxin Gan
- Department of Management Science and Engineering, Dongling School of Economics and Management, University of Science and Technology Beijing, Beijing, 100083, China
| | - Wenran Li
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Department of Automation and Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
| | - Wanwen Zeng
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Department of Automation and Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
| | - Xiaojian Wang
- State Key Laboratory of Cardiovascular Disease, Fu Wai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100037, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Department of Automation and Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China
- Institute for Data Science, Tsinghua University, Beijing, 100084, China
| |
|
42
|
Wu M, Chen T, Jiang R. Leveraging multiple genomic data to prioritize disease-causing indels from exome sequencing data. Sci Rep 2017; 7:1804. [PMID: 28496131 PMCID: PMC5431795 DOI: 10.1038/s41598-017-01834-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/09/2016] [Accepted: 04/05/2017] [Indexed: 01/26/2023] Open
Abstract
The emergence of exome sequencing in recent years has enabled rapid and cost-effective detection of genetic variants in coding regions and offers a great opportunity to combine sequencing experiments with subsequent computational analysis for dissecting the genetic basis of human inherited diseases. However, this strategy, though successful in practice, still faces challenges such as limited sample size and the substantial number and diversity of candidate variants. To overcome these obstacles, researchers have concentrated on the development of advanced computational methods and have recently made great progress in analysing single nucleotide variants. Nevertheless, it remains unclear how to analyse indels, another type of genetic variant that accounts for a substantial proportion of known disease-causing variants. In this paper, we propose an integrative method to effectively identify disease-causing indels from exome sequencing data. Specifically, we put forward a statistical method that combines five functional prediction scores, four genic association scores and a genic intolerance score to produce an integrated p-value, which can then be used for prioritizing candidate indels. We performed extensive simulation studies and demonstrated that our method achieved high accuracy in uncovering disease-causing indels. Our software is available at http://bioinfo.au.tsinghua.edu.cn/jianglab/IndelPrioritizer/.
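The abstract does not say which statistical procedure produces the integrated p-value. As one common way to combine several per-score p-values into a single value, the sketch below uses Fisher's combined probability test, which assumes the individual p-values are independent. It is a stand-in for illustration and should not be read as the authors' method; the function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom its
    survival function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        series += term
    return min(1.0, math.exp(-half_stat) * series)


# Hypothetical p-values for one candidate indel, one per evidence source.
candidate = [0.01, 0.20, 0.03]
print(fisher_combined_pvalue(candidate))
```

Candidates would then be ranked by ascending combined p-value. If the underlying scores are correlated, as functional prediction scores often are, the independence assumption is violated and a correlation-aware combination would be more appropriate.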
Affiliation(s)
- Mengmeng Wu
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Department of Computer Science, Tsinghua University, Beijing, 100084, China
| | - Ting Chen
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Department of Computer Science, Tsinghua University, Beijing, 100084, China
| | - Rui Jiang
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Department of Automation, Tsinghua University, Beijing, 100084, China
| |
|
43
|
Köhler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Aymé S, Baynam G, Bello SM, Boerkoel CF, Boycott KM, Brudno M, Buske OJ, Chinnery PF, Cipriani V, Connell LE, Dawkins HJS, DeMare LE, Devereau AD, de Vries BBA, Firth HV, Freson K, Greene D, Hamosh A, Helbig I, Hum C, Jähn JA, James R, Krause R, Laulederkind SJF, Lochmüller H, Lyon GJ, Ogishima S, Olry A, Ouwehand WH, Pontikos N, Rath A, Schaefer F, Scott RH, Segal M, Sergouniotis PI, Sever R, Smith CL, Straub V, Thompson R, Turner C, Turro E, Veltman MWM, Vulliamy T, Yu J, von Ziegenweidt J, Zankl A, Züchner S, Zemojtel T, Jacobsen JOB, Groza T, Smedley D, Mungall CJ, Haendel M, Robinson PN. The Human Phenotype Ontology in 2017. Nucleic Acids Res 2016; 45:D865-D876. [PMID: 27899602 PMCID: PMC5210535 DOI: 10.1093/nar/gkw1039] [Citation(s) in RCA: 501] [Impact Index Per Article: 62.6] [Received: 09/20/2016] [Accepted: 10/28/2016] [Indexed: 12/14/2022] Open
Abstract
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Nicole A Vasilevsky
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Mark Engelstad
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Erin Foster
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Julie McMurry
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Ségolène Aymé
- Institut du Cerveau et de la Moelle épinière-ICM, CNRS UMR 7225-Inserm U 1127-UPMC-P6 UMR S 1127, Hôpital Pitié-Salpêtrière, 47, bd de l'Hôpital, 75013 Paris, France
| | - Gareth Baynam
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, King Edward Memorial Hospital Department of Health, Government of Western Australia, Perth, WA 6008, Australia
- School of Paediatrics and Child Health, University of Western Australia, Perth, WA 6008, Australia
| | - Susan M Bello
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USA
| | - Cornelius F Boerkoel
- Imagenetics Research, Sanford Health, PO Box 5039, Route 5001, Sioux Falls, SD 57117-5039, USA
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
| | - Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1L7, Canada
| | - Orion J Buske
- Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1L7, Canada
| | - Patrick F Chinnery
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Valentina Cipriani
- UCL Institute of Ophthalmology, Department of Ocular Biology and Therapeutics, 11-43 Bath Street, London EC1V 9EL, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | | | - Hugh J S Dawkins
- Office of Population Health Genomics, Public Health Division, Health Department of Western Australia, 189 Royal Street, Perth, WA, 6004 Australia
| | - Laura E DeMare
- Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
| | - Andrew D Devereau
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Bert B A de Vries
- Department of Human Genetics, Radboud University, University Medical Centre, Nijmegen, The Netherlands
| | - Helen V Firth
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Kathleen Freson
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, Leuven, Belgium
| | - Daniel Greene
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
| | - Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Ingo Helbig
- Division of Neurology, The Children's Hospital of Philadelphia, 3501 Civic Center Blvd, Philadelphia, PA 19104, USA
- Department of Neuropediatrics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
| | - Courtney Hum
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON M5G 1H3, Canada
| | - Johanna A Jähn
- Department of Neuropediatrics, University Medical Center Schleswig-Holstein (UKSH), Kiel, Germany
| | - Roger James
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
| | - Roland Krause
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg
| | | | - Hanns Lochmüller
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
| | - Gholson J Lyon
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, NY 11797, USA
| | - Soichi Ogishima
- Dept of Bioclinical Informatics, Tohoku Medical Megabank Organization, Tohoku University, Tohoku Medical Megabank Organization Bldg 7F room #741,736, Seiryo 2-1, Aoba-ku, Sendai Miyagi 980-8573 Japan
| | - Annie Olry
- Orphanet-INSERM, US14, Plateforme Maladies Rares, 96 rue Didot, 75014 Paris, France
| | - Willem H Ouwehand
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
| | - Nikolas Pontikos
- UCL Institute of Ophthalmology, Department of Ocular Biology and Therapeutics, 11-43 Bath Street, London EC1V 9EL, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Ana Rath
- Orphanet-INSERM, US14, Plateforme Maladies Rares, 96 rue Didot, 75014 Paris, France
| | - Franz Schaefer
- Division of Pediatric Nephrology and KFH Children's Kidney Center, Center for Pediatrics and Adolescent Medicine, 69120 Heidelberg, Germany
| | - Richard H Scott
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Michael Segal
- SimulConsult Inc., 27 Crafts Road, Chestnut Hill, MA 02467, USA
| | | | - Richard Sever
- Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
| | - Cynthia L Smith
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USA
| | - Volker Straub
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
| | - Rachel Thompson
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
| | - Catherine Turner
- John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, University of Newcastle, Newcastle upon Tyne, UK
| | - Ernest Turro
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
| | - Marijcke W M Veltman
- NIHR Rare Diseases Translational Research Collaboration, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
| | - Tom Vulliamy
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
| | - Jing Yu
- Nuffield Department of Clinical Neurosciences, University of Oxford, Level 6, West Wing, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Julie von Ziegenweidt
- Department of Haematology, University of Cambridge, NHS Blood and Transplant Centre, Long Road, Cambridge CB2 0PT, UK
| | - Andreas Zankl
- Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, Australia
- Academic Department of Medical Genetics, Sydney Childrens Hospitals Network (Westmead), Australia
| | - Stephan Züchner
- JD McDonald Department of Human Genetics and Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Tomasz Zemojtel
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Julius O B Jacobsen
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Tudor Groza
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- St Vincent's Clinical School, Faculty of Medicine, UNSW Australia
| | - Damian Smedley
- Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
| | - Melissa Haendel
- Library and Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
| |
|
44
|
Qin GM, Li RY, Zhao XM. Identifying Disease Associated miRNAs Based on Protein Domains. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:1027-1035. [PMID: 26829801 DOI: 10.1109/tcbb.2016.2515608] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/05/2023]
Abstract
MicroRNAs (miRNAs) are a class of small endogenous non-coding genes that act as regulators in post-transcriptional processes. Recently, miRNAs have been found to be widely involved in different types of diseases. Therefore, the identification of disease-associated miRNAs can help to understand the mechanisms that underlie disease and to identify new biomarkers. However, it is not easy to identify the miRNAs related to diseases owing to their extensive involvement in various biological processes. In this work, we present a new approach to identify disease-associated miRNAs based on domains, the functional and structural blocks of proteins. The results on real datasets demonstrate that our method can effectively identify disease-related miRNAs with high precision.
|
45
|
Zhang Y, Huang H, Dong X, Fang Y, Wang K, Zhu L, Wang K, Huang T, Yang J. A Dynamic 3D Graphical Representation for RNA Structure Analysis and Its Application in Non-Coding RNA Classification. PLoS One 2016; 11:e0152238. [PMID: 27213271 PMCID: PMC4877074 DOI: 10.1371/journal.pone.0152238] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/22/2015] [Accepted: 03/10/2016] [Indexed: 12/21/2022] Open
Abstract
With the development of new technologies in transcriptomics and epigenetics, RNAs have been found to play increasingly important roles in life processes. Consequently, various methods have been proposed to assess the biological functions of RNAs and thus classify them functionally, among which the comparative study of RNA structures is perhaps the most important. To measure the structural similarity of RNAs and classify them, we propose a novel three-dimensional (3D) graphical representation of RNA secondary structure, in which an RNA secondary structure is first transformed into a characteristic sequence based on the chemical properties of nucleic acids; a dynamic 3D graph is then constructed for the characteristic sequence; and lastly a numerical characterization of the 3D graph is used to represent the RNA secondary structure. We tested our algorithm on three datasets: (1) Dataset I consisting of nine RNA secondary structures of viruses, (2) Dataset II consisting of complex RNA secondary structures including pseudo-knots, and (3) Dataset III consisting of 18 non-coding RNA families. We also compared our method with nine other existing methods using Datasets II and III. The results demonstrate that our method is better than the other methods in similarity measurement and classification of RNA secondary structures.
Affiliation(s)
- Yi Zhang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
- Hebei Laboratory of Pharmaceutic Molecular Chemistry, Shijiazhuang, Hebei 050018, People's Republic of China
- * E-mail: (JY); (YZ); (TH)
| | - Haiyun Huang
- Department of Information Retrieval of Library, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Xiaoqing Dong
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Yiliang Fang
- International Travel Healthcare Center, Fuzhou, Fujian 350001, People's Republic of China
| | - Kejing Wang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Lijuan Zhu
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Ke Wang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- * E-mail: (JY); (YZ); (TH)
| | - Jialiang Yang
- Department of Mathematics, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, People's Republic of China
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States of America
- * E-mail: (JY); (YZ); (TH)
| |
|
46
|
Wang Y, Jiang R, Wong WH. Modeling the causal regulatory network by integrating chromatin accessibility and transcriptome data. Natl Sci Rev 2016; 3:240-251. [PMID: 28690910 DOI: 10.1093/nsr/nww025] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
The cell packs a great deal of genetic and regulatory information into a structure known as chromatin, in which DNA is wrapped around histone proteins and tightly packed in a remarkable way. To express a gene in a specific coding region, the chromatin opens up and a DNA loop may be formed by interacting enhancers and promoters. Furthermore, the mediator and cohesion complexes, sequence-specific transcription factors, and RNA polymerase II are recruited and work together to elaborately regulate the expression level. There is a pressing need to understand how the information about when, where, and to what degree genes should be expressed is embedded in chromatin structure and gene regulatory elements. Thanks to large consortia such as the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects, extensive data on chromatin accessibility and transcript abundance are available across many tissues and cell types. These rich data offer an exciting opportunity to model the causal regulatory relationship. Here, we review the current experimental approaches, foundational data, computational problems, interpretive frameworks, and integrative models that will enable accurate interpretation of the regulatory landscape. In particular, we discuss efforts to organize, analyze, model, and integrate DNA accessibility data, transcriptional data, and functional genomic regions. We believe that these efforts will eventually help us understand the information flow within the cell and will influence research directions across many fields.
Affiliation(s)
- Yong Wang
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA; Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100080, China
- Rui Jiang
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA
|
47
|
Wu J, Wu M, Li L, Liu Z, Zeng W, Jiang R. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions. Database: The Journal of Biological Databases and Curation 2016; 2016:baw024. [PMID: 26989155 PMCID: PMC4795934 DOI: 10.1093/database/baw024] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 08/19/2015] [Accepted: 02/14/2016] [Indexed: 12/26/2022]
Abstract
Recent advances in next-generation sequencing technology have enabled fast, low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common approach in the study of disease-causing genetic variants. Nevertheless, a repository that collects predictions of the functionally damaging effects of human genetic variants is still lacking, even though it is well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering the genetic bases of human inherited diseases.
Affiliation(s)
- Jiaxin Wu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Mengmeng Wu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Lianshuo Li
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Zhuo Liu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Wanwen Zeng
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
|
48
|
Zhang W, Coba MP, Sun F. Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships. BMC Systems Biology 2016; 10 Suppl 1:4. [PMID: 26818594 PMCID: PMC4895779 DOI: 10.1186/s12918-015-0247-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Background: Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its protein domains might also be associated with it and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identifying domains associated with human diseases would greatly improve our understanding of the mechanisms of complex human diseases and further improve the prevention, diagnosis and treatment of these diseases. Methods: Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods, including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches, are developed to predict domain-disease associations. Results: We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the parameters involved. We also study the effects of disease modularization in inferring novel domain-disease associations. In validation, the AUC (area under the receiver operating characteristic curve) scores for the Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn’s disease and type 2 diabetes. Conclusions: The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view of the disease mechanisms. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-015-0247-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wangshu Zhang
- Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA
- Marcelo P Coba
- Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
|
49
|
Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genomics 2015; 15:55-64. [PMID: 26134276 DOI: 10.1093/bfgp/elv024] [Citation(s) in RCA: 141] [Impact Index Per Article: 15.7] [Indexed: 12/27/2022]
Abstract
Various microRNAs have been shown to play roles in a number of human diseases. Several microRNA-disease network reconstruction methods have been used to describe these associations from a systems biology perspective. The key problem for such a network is the similarity computation model. In this article, we review the main similarity computation methods and discuss them along with future work. This survey may prompt and guide systems biology and bioinformatics researchers to build better microRNA-disease associations and may clarify the network relationships for medical researchers.
|