1
Sgariglia D, Carneiro FRG, Vidal de Carvalho LA, Pedreira CE, Carels N, da Silva FAB. Optimizing therapeutic targets for breast cancer using boolean network models. Comput Biol Chem 2024; 109:108022. [PMID: 38350182 DOI: 10.1016/j.compbiolchem.2024.108022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 09/18/2023] [Accepted: 01/31/2024] [Indexed: 02/15/2024]
Abstract
Studying gene regulatory networks associated with cancer provides valuable insights for therapeutic purposes, given that cancer is fundamentally a genetic disease. However, as the number of genes in the system increases, the complexity arising from the interconnections between network components grows exponentially. In this study, using Boolean logic to adjust the existing relationships between network components has facilitated simplifying the modeling process, enabling the generation of attractors that represent cell phenotypes based on breast cancer RNA-seq data. A key therapeutic objective is to guide cells, through targeted interventions, to transition from the current cancer attractor to a physiologically distinct attractor unrelated to cancer. To achieve this, we developed a computational method that identifies network nodes whose inhibition can facilitate the desired transition from one tumor attractor to another associated with apoptosis, leveraging transcriptomic data from cell lines. To validate the model, we utilized previously published in vitro experiments where the downregulation of specific proteins resulted in cell growth arrest and death of a breast cancer cell line. The method proposed in this manuscript combines diverse data sources, conducts structural network analysis, and incorporates relevant biological knowledge on apoptosis in cancer cells. This comprehensive approach aims to identify potential targets of significance for personalized medicine.
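The attractor idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' model: the three node names, their update rules, and the synchronous update scheme are all assumptions chosen for illustration. It enumerates the attractors of a small Boolean network and shows how clamping one node to "off" (a simulated inhibition) can move the system from one fixed point to another.

```python
from itertools import product

# Hypothetical three-node network; names and rules are illustrative only
# and are not taken from the paper.
RULES = {
    "growth":    lambda s: s["growth"] and not s["apoptosis"],
    "survival":  lambda s: s["growth"],
    "apoptosis": lambda s: not s["survival"],
}
NODES = tuple(RULES)

def step(state, knocked_out=frozenset()):
    """One synchronous update; knocked-out nodes are clamped to False."""
    current = dict(zip(NODES, state))
    return tuple(
        False if node in knocked_out else bool(RULES[node](current))
        for node in NODES
    )

def attractor_from(state, knocked_out=frozenset()):
    """Iterate until a state repeats and return the resulting cycle (sorted)."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, knocked_out)
    return tuple(sorted(seen[seen.index(state):]))

def all_attractors(knocked_out=frozenset()):
    """Collect the attractors reached from every possible initial state."""
    return {
        attractor_from(s, knocked_out)
        for s in product((False, True), repeat=len(NODES))
    }

if __name__ == "__main__":
    print("attractors:", all_attractors())
    tumor_like = (True, True, False)  # growth on, survival on, apoptosis off
    print("after inhibiting 'survival':",
          attractor_from(tumor_like, frozenset({"survival"})))
```

In this toy network the unperturbed system has two fixed points, one with growth and survival on and one with only apoptosis on; clamping the hypothetical `survival` node drives the first into the second. Real models would use many more nodes and rules inferred from data.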
Affiliation(s)
- Flavia Raquel Gonçalves Carneiro
- Center of Technological Development in Health (CDTS), FIOCRUZ, Rio de Janeiro, Brazil; Laboratório Interdisciplinar de Pesquisas Médicas Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro, Brazil; Program of Immunology and Tumor Biology, Brazilian National Cancer Institute (INCA), Rio de Janeiro 20231050, Brazil
- Nicolas Carels
- Platform of Biological System Modeling, Center of Technological Development in Health (CDTS), FIOCRUZ, Rio de Janeiro, Brazil
2
Lv Y, Wen L, Hu WJ, Deng C, Ren HW, Bao YN, Su BW, Gao P, Man ZY, Luo YY, Li CJ, Xiang ZX, Wang B, Luan ZL. Schizophrenia in the genetic era: a review from development history, clinical features and genomic research approaches to insights of susceptibility genes. Metab Brain Dis 2024; 39:147-171. [PMID: 37542622 DOI: 10.1007/s11011-023-01271-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 07/27/2023] [Indexed: 08/07/2023]
Abstract
Schizophrenia is a devastating neuropsychiatric disorder that affects 1% of the world population and ranks among the disorders imposing the most severe burden on society. Its etiology remains obscure and involves multiple risk factors, including genetic, environmental, nutritional, and developmental factors. Complex interactions between genetic and environmental factors have been implicated in the etiology of schizophrenia. This review provides an overview of the historical origins, pathophysiological mechanisms, diagnosis, clinical symptoms and corresponding treatment of schizophrenia. In addition, as schizophrenia is a polygenic disorder caused by the combined action of multiple genes of small effect, we further detail several approaches, such as the candidate gene association study (CGAS) and the genome-wide association study (GWAS), which are commonly used in schizophrenia genomics studies. A number of GWASs of schizophrenia have been performed in the hope of identifying novel, consistent and influential genetic risk factors. Finally, some schizophrenia susceptibility genes identified and reported in recent years are listed together with their biological functions. This review may serve as a summary of past research on schizophrenia genomics and susceptibility genes (NRG1, DISC1, RELN, BDNF, MSI2), which may point the way to future schizophrenia genetics research. In addition, building on the discovery of these susceptibility genes and their exact functions, the development and application of antipsychotic drugs will be promoted in the future.
Affiliation(s)
- Ye Lv
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Lin Wen
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Wen-Juan Hu
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Chong Deng
- Department of Neurosurgery, The Second Affiliated Hospital of Dalian Medical University, Dalian, 116027, China
- Hui-Wen Ren
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Ya-Nan Bao
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Bo-Wei Su
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Ping Gao
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Zi-Yue Man
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Yi-Yang Luo
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Cheng-Jie Li
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Zhi-Xin Xiang
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
- Bing Wang
- Department of Endocrinology and Metabolism, The Central Hospital of Dalian University of Technology, Dalian, 116000, China
- Zhi-Lin Luan
- Advanced Institute for Medical Sciences, Dalian Medical University, Dalian, 116044, China
3
Zhong J, Han C, Wang Y, Chen P, Liu R. Identifying the critical state of complex biological systems by the directed-network rank score method. Bioinformatics 2022; 38:5398-5405. [PMID: 36282843 PMCID: PMC9750123 DOI: 10.1093/bioinformatics/btac707] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/27/2022] [Revised: 09/21/2022] [Accepted: 10/24/2022] [Indexed: 12/25/2022]
Abstract
MOTIVATION Catastrophic transitions are ubiquitous in the dynamic progression of complex biological systems; that is, a critical transition at which complex systems suddenly shift from one stable state to another occurs. Identifying such a critical point or tipping point is essential for revealing the underlying mechanism of complex biological systems. However, it is difficult to identify the tipping point since few significant differences in the critical state are detected in terms of traditional static measurements. RESULTS In this study, by exploring the dynamic changes in gene cooperative effects between the before-transition and critical states, we presented a model-free approach, the directed-network rank score (DNRS), to detect the early-warning signal of critical transition in complex biological systems. The proposed method is applicable to both bulk and single-cell RNA-sequencing (scRNA-seq) data. This computational method was validated by the successful identification of the critical or pre-transition state for both simulated and six real datasets, including three scRNA-seq datasets of embryonic development and three tumor datasets. In addition, the functional and pathway enrichment analyses suggested that the corresponding DNRS signaling biomarkers were involved in key biological processes. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/zhongjiayuan/DNRS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yangkai Wang
- School of Mathematics, South China University of Technology, Guangzhou 510640, China
- Pei Chen
- To whom correspondence should be addressed.
- Rui Liu
- To whom correspondence should be addressed.
4
Yang K, Zheng Y, Lu K, Chang K, Wang N, Shu Z, Yu J, Liu B, Gao Z, Zhou X. PDGNet: Predicting Disease Genes Using a Deep Neural Network With Multi-View Features. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:575-584. [PMID: 32750864 DOI: 10.1109/tcbb.2020.3002771] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 06/11/2023]
Abstract
Knowledge of phenotype-genotype associations is crucial for understanding disease mechanisms. Numerous studies have focused on developing efficient and accurate computational approaches to predict disease genes. However, owing to the sparseness and complexity of medical data, developing an efficient deep neural network model to identify disease genes remains a major challenge. We therefore develop a novel deep neural network model, termed PDGNet, that fuses the multi-view features of phenotypes and genotypes to identify disease genes. Our model integrates the multi-view features of diseases and genes and leverages the feedback information of training samples to optimize the parameters of the deep neural network and obtain deep vector features of diseases and genes. Evaluation experiments on a large data set indicated that PDGNet obtained higher performance than the state-of-the-art method (precision and recall improved by 9.55 and 9.63 percent). Analysis of the candidate genes indicated that the predicted genes have strong functional homogeneity and dense interactions with known genes. We validated the top predicted genes for Parkinson's disease against external curated data and the published medical literature, which indicated that the candidate genes have considerable potential to guide the selection of causal genes in the 'wet experiment'. The source code and data of PDGNet are available at https://github.com/yangkuoone/PDGNet.
5
Umlai UKI, Bangarusamy DK, Estivill X, Jithesh PV. Genome sequencing data analysis for rare disease gene discovery. Brief Bioinform 2021; 23:6366880. [PMID: 34498682 DOI: 10.1093/bib/bbab363] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/02/2021] [Revised: 07/24/2021] [Accepted: 08/17/2021] [Indexed: 12/14/2022]
Abstract
Rare diseases affect a small proportion of the general population, variously defined as fewer than 200 000 individuals (US) or fewer than 1 in 2000 individuals (Europe). Although individually rare, they collectively comprise approximately 7000 different disorders, the majority of genetic origin, and affect roughly 300 million people globally. Most patients and their families undergo a long and frustrating diagnostic odyssey. Advances in genomics have started to facilitate diagnosis, though progress is hindered by the difficulty of genome data analysis and interpretation. A major impediment to diagnosis is understanding the diverse approaches, tools and datasets available for variant prioritization, the most important step in reducing millions of variants to a few potential candidates. Here we present a review of the latest methodological developments and the spectrum of tools available for rare disease genetic variant discovery and recommend appropriate data interpretation methods for variant prioritization. We have categorized the resources by step of the variant interpretation workflow, starting from data processing, variant calling, annotation and filtration and ending with prioritization, with a special emphasis on the last two steps. The methods discussed here pertain to elucidating the genetic basis of disease in individual patient cases via trio- or family-based analysis of genome data. We advocate using a combination of tools and datasets and following multiple iterative approaches to elucidate the potential causative variant.
Affiliation(s)
- Umm-Kulthum Ismail Umlai
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
- Dhinoth Kumar Bangarusamy
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
- Xavier Estivill
- Quantitative Genomics Laboratories (qGenomics), Barcelona, Catalonia, Spain
- Puthen Veettil Jithesh
- Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar
6
Thummadi NB, Vishnu E, Subbiah EV, Manimaran P. A graph centrality-based approach for candidate gene prediction for type 1 diabetes. Immunol Res 2021; 69:422-428. [PMID: 34297307 DOI: 10.1007/s12026-021-09217-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 07/15/2021] [Indexed: 10/20/2022]
Abstract
Type 1 diabetes mellitus (T1DM), or insulin-dependent diabetes, is an autoimmune disease that can be life-threatening. In most cases, cytotoxic T lymphocytes (CTLs) promote the killing of the islets of Langerhans in the pancreas, which harbour insulin-producing beta cells. The trigger for the autoimmune attack is still unclear; therefore, identifying and targeting candidate genes is imperative to hinder its deleterious effects. In the present study, we focused on identifying new candidate genes for T1DM. We selected immune-related genes exclusively, as they play a crucial role in T1DM. We constructed and analysed a human immunome signalling network (a directed network) to identify new candidate genes through various graph centrality measures combined with Gene Ontology (GO). As a result, we identified 4 new candidate genes that may act as potential drug targets for T1DM. We further validated their disease relevance through a literature survey and pathway analysis and found that 3 of the 4 predicted genes already have well-established roles as potential targets for T1DM.
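As a rough illustration of centrality-based ranking on a directed network, the sketch below computes two common measures, out-degree and harmonic closeness, on a toy graph. The abstract does not say which centrality measures were used, so both the choice of measures and the node names here are assumptions, not the authors' pipeline.

```python
from collections import deque

def out_degree(graph):
    """Number of outgoing edges per node."""
    return {node: len(targets) for node, targets in graph.items()}

def harmonic_closeness(graph):
    """Sum of 1/d(u, v) over all nodes v reachable from u along directed edges."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[source] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores

if __name__ == "__main__":
    # Hypothetical signalling cascade: receptor -> two kinases -> transcription factor.
    toy = {"R": ["K1", "K2"], "K1": ["TF"], "K2": ["TF"], "TF": []}
    closeness = harmonic_closeness(toy)
    for node in sorted(toy, key=closeness.get, reverse=True):
        print(node, out_degree(toy)[node], closeness[node])
```

Every node must appear as a key of the adjacency dictionary, including sinks with an empty target list. In practice the top-ranked nodes would then be filtered by annotation (for example GO terms) before being proposed as candidates.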
Affiliation(s)
- N B Thummadi
- Department of Animal Biology, University of Hyderabad, Gachibowli, Hyderabad, 500046, India
- E Vishnu
- School of Physics, University of Hyderabad, Gachibowli, Hyderabad, 500046, Telangana, India
- E V Subbiah
- Department of Sports Biosciences, Central University of Rajasthan, Kishangarh, Ajmer, 305817, India
- P Manimaran
- School of Physics, University of Hyderabad, Gachibowli, Hyderabad, 500046, Telangana, India
7
An integrative network-based approach for drug target indication expansion. PLoS One 2021; 16:e0253614. [PMID: 34242265 PMCID: PMC8270215 DOI: 10.1371/journal.pone.0253614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/01/2021] [Accepted: 06/08/2021] [Indexed: 11/19/2022]
Abstract
BACKGROUND The identification of a target-indication pair is regarded as the first step in a traditional drug discovery and development process. Significant investment and attrition occur during discovery and development before a molecule is shown to be safe and efficacious for the selected indication and becomes an approved drug. Many drug targets are functionally pleiotropic and might be good targets for multiple indications. Methodologies that leverage years of scientific contributions on drug targets to allow systematic evaluation of other indication opportunities are critical for both patients and drug discovery and development scientists. METHODS We introduced a network-based approach to systematically screen and prioritize disease indications for drug targets. The approach fundamentally integrates disease genomics data and protein interaction network. Further, the methodology allows for indication identification by leveraging state-of-art network algorithms to generate and compare the target and disease subnetworks. RESULTS We first evaluated the performance of our method on recovering FDA approved indications for 15 randomly selected drug targets. The results showed superior performance when compared with other state-of-art approaches. Using this approach, we predicted novel indications supported by literature evidence for several highly pursued drug targets such as IL12/IL23 combination. CONCLUSIONS Our results demonstrated a potential global approach for indication expansion strategies. The proposed methodology enables rapid and systematic evaluation of both individual and combined drug targets for novel indications. Additionally, this approach provides novel insights on expanding the role of genes and pathways for developing therapeutic intervention strategies.
8
Yang K, Wang R, Liu G, Shu Z, Wang N, Zhang R, Yu J, Chen J, Li X, Zhou X. HerGePred: Heterogeneous Network Embedding Representation for Disease Gene Prediction. IEEE J Biomed Health Inform 2020; 23:1805-1815. [PMID: 31283472 DOI: 10.1109/jbhi.2018.2870728] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/06/2022]
Abstract
The discovery of disease-causing genes is a critical step towards understanding the nature of a disease and determining a possible cure for it. In recent years, many computational methods to identify disease genes have been proposed. However, making full use of disease-related (e.g., symptoms) and gene-related (e.g., gene ontology and protein-protein interactions) information to improve the performance of disease gene prediction is still an issue. Here, we develop a heterogeneous disease-gene-related network (HDGN) embedding representation framework for disease gene prediction (called HerGePred). Based on this framework, a low-dimensional vector representation (LVR) of the nodes in the HDGN can be obtained. Then, we propose two specific algorithms, namely, an LVR-based similarity prediction and a random walk with restart on a reconstructed heterogeneous disease-gene network (RW-RDGN), to predict disease genes with high performance. First, to validate the rationality of the framework, we analyze the similarity-based overlap distribution of disease pairs and design an experiment for disease-gene association recovery, the results of which revealed that the LVR of nodes performs well at preserving the local and global network structure of the HDGN. Then, we apply tenfold cross validation and external validation to compare our methods with other well-known disease gene prediction algorithms. The experimental results show that the RW-RDGN performs better than the state-of-the-art algorithm. The prediction results of disease candidate genes are essential for molecular mechanism investigation and experimental validation. The source codes of HerGePred and experimental data are available at https://github.com/yangkuoone/HerGePred.
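For readers unfamiliar with random walk with restart (RWR), the sketch below shows the generic iteration on a small undirected graph: at each step the walker follows a random edge with probability 1 - r or jumps back to a seed node with probability r. This is the textbook form of RWR only, not the RW-RDGN algorithm from the paper; the graph, the node names and the restart value of 0.7 are assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: set of nodes the walker restarts from (e.g. known disease genes).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue  # isolated node: nothing to propagate
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

if __name__ == "__main__":
    # Hypothetical chain: disease D1 - gene G1 - gene G2 - gene G3.
    toy = {"D1": ["G1"], "G1": ["D1", "G2"], "G2": ["G1", "G3"], "G3": ["G2"]}
    scores = random_walk_with_restart(toy, seeds={"D1"})
    for node in sorted(scores, key=scores.get, reverse=True):
        print(node, round(scores[node], 4))
```

Nodes closer to the seed end up with higher stationary scores, which is what makes the scores usable as a ranking of candidate genes.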
9
Barman RK, Mukhopadhyay A, Maulik U, Das S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinformatics 2019; 20:736. [PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023]
Abstract
Background With the global spread of multidrug resistance in pathogenic microbes, infectious diseases emerge as a key public health concern of the recent time. Identification of host genes associated with infectious diseases will improve our understanding about the mechanisms behind their development and help to identify novel therapeutic targets. Results We developed a machine learning techniques-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among different methods, Deep Neural Networks (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33% with sensitivity of 85.61% and specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly-predicted infectious disease-associated genes from our study were also found in experimentally-verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly-predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more of the other diseases, such as cancer, metabolic and immune related diseases. Conclusions To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious-diseases. 
However, our results indicated that for small datasets, advanced DNN-based method does not offer significant advantage over the simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF) for the prediction of infectious disease-associated host genes. Significant overlap of infectious disease with cancer and metabolic disease on disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.
Affiliation(s)
- Ranjan Kumar Barman
- Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Ujjwal Maulik
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Santasabuj Das
- Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, P-33, C.I.T. Road Scheme XM, Beliaghata-700010, Kolkata, West Bengal, India
10
Xu W, Li S, Zhang Z, Hu J, Zhao Y. Prioritization of differentially expressed genes through integrating public expression data. Anim Genet 2019; 50:726-732. [PMID: 31512747 DOI: 10.1111/age.12855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2019] [Indexed: 11/29/2022]
Abstract
Differentially expressed gene (DEG) analysis is a major approach for interpreting phenotype differences and produces a large number of candidate genes. Given that it is burdensome to validate too many genes through benchwork, an urgent need exists for DEG prioritization. Here, a novel method is proposed for prioritizing bona fide DEGs by constructing the normal range of gene expression through integrating public expression data. Prioritization was performed by ranking the differences in cumulative probability for genes in case and control groups. DEGs from a study on pig muscle tissue were used to evaluate the prioritization accuracy. The results showed that the method reached an area under the receiver operating characteristic curve of 96.42% and can effectively shorten the list of candidate genes from a differential expression experiment to find novel causal genes. Our method can be easily extended to other tissues or species to promote functional research in broad applications.
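One way to read "ranking the differences in cumulative probability" is sketched below: build an empirical distribution of each gene's expression from reference (public) samples, look up where the case and control values fall in that distribution, and rank genes by the absolute difference. This is an interpretation of the abstract under stated assumptions, not the authors' implementation; the empirical CDF, the use of a single case and control value per gene, and all names are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(reference_values, value):
    """Fraction of reference samples with expression <= value."""
    ordered = sorted(reference_values)
    return bisect_right(ordered, value) / len(ordered)

def prioritize(reference, case, control):
    """Rank genes by |CDF(case) - CDF(control)| within the reference distribution.

    reference: dict gene -> list of expression values from public samples.
    case, control: dict gene -> summary expression value in each group.
    Returns (ranked gene list, score dict).
    """
    scores = {
        gene: abs(empirical_cdf(values, case[gene]) - empirical_cdf(values, control[gene]))
        for gene, values in reference.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores

if __name__ == "__main__":
    # Toy numbers only; they are not data from the study.
    reference = {"geneA": list(range(1, 11)), "geneB": list(range(1, 11))}
    case = {"geneA": 9.5, "geneB": 5.5}
    control = {"geneA": 1.5, "geneB": 4.5}
    ranked, scores = prioritize(reference, case, control)
    print(ranked, scores)
```

A gene whose case and control values sit at opposite ends of its normal range scores high, whereas a gene that shifts only slightly within its usual range scores low even if the fold change looks similar.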
Affiliation(s)
- W Xu
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- S Li
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- Z Zhang
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- J Hu
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- Y Zhao
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
11
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022]
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritizing genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. From about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
12
Sun D, Ren X, Ari E, Korcsmaros T, Csermely P, Wu LY. Discovering cooperative biomarkers for heterogeneous complex disease diagnoses. Brief Bioinform 2019; 20:89-101. [PMID: 28968712 DOI: 10.1093/bib/bbx090] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/24/2017] [Indexed: 12/13/2022]
Abstract
Biomarkers with high reproducibility and accurate prediction performance can contribute to comprehending the underlying pathogenesis of related complex diseases and further facilitate disease diagnosis and therapy. Techniques integrating gene expression profiles and biological networks for the identification of network-based disease biomarkers are receiving increasing interest. The biomarkers for heterogeneous diseases often exhibit strong cooperative effects, which implies that a set of genes may achieve more accurate outcome prediction than any single gene. In this study, we evaluated various biomarker identification methods that consider gene cooperative effects implicitly or explicitly, and proposed the gene cooperation network to explicitly model the cooperative effects of gene combinations. The gene cooperation network-enhanced method, named as MarkRank, achieves superior performance compared with traditional biomarker identification methods in both simulation studies and real data sets. The biomarkers identified by MarkRank not only have a better prediction accuracy but also have stronger topological relationships in the biological network and exhibit high specificity associated with the related diseases. Furthermore, the top genes identified by MarkRank involve crucial biological processes of related diseases and give a good prioritization for known disease genes. In conclusion, MarkRank suggests that explicit modeling of gene cooperative effects can greatly improve biomarker identification for complex diseases, especially for diseases with high heterogeneity.
Affiliation(s)
- Duanchen Sun
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xianwen Ren
- Biodynamic Optical Imaging Center, Peking University, Beijing, China
- Eszter Ari
- Department of Genetics, Eötvös Loránd University, Budapest
- Tamas Korcsmaros
- Institute of Food Research and the Earlham Institute, Norwich, UK
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Ling-Yun Wu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
13
Cáceres JJ, Paccanaro A. Disease gene prediction for molecularly uncharacterized diseases. PLoS Comput Biol 2019; 15:e1007078. [PMID: 31276496 PMCID: PMC6636748 DOI: 10.1371/journal.pcbi.1007078] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/05/2018] [Revised: 07/17/2019] [Accepted: 05/09/2019] [Indexed: 02/06/2023]
Abstract
Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of disease genes associated with a disease, neighbourhood-based methods and random walkers exploit the interactome allowing the prediction of further genes for that disease. In general, however, diseases with no known molecular basis constitute a challenge. Here we present a novel network approach to prioritize gene-disease associations that is able to also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases in OMIM, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, prove that Cardigan is able to accurately predict disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross validation tests show how our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 14%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 87%-299%.
Affiliation(s)
- Juan J. Cáceres
- Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| | - Alberto Paccanaro
- Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
|
14
|
Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 2018; 35:497-505. [DOI: 10.1093/bioinformatics/bty637] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 09/20/2017] [Accepted: 07/16/2018] [Indexed: 01/04/2023] Open
Affiliation(s)
- Alberto Valdeolivas
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- ProGeLife, Marseille
| | - Laurent Tichit
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Claire Navarro
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Sophie Perrin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Gaëlle Odelin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Nicolas Levy
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Pierre Cau
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Elisabeth Remy
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Anaïs Baudot
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
|
15
|
Vlaic S, Conrad T, Tokarski-Schnelle C, Gustafsson M, Dahmen U, Guthke R, Schuster S. ModuleDiscoverer: Identification of regulatory modules in protein-protein interaction networks. Sci Rep 2018; 8:433. [PMID: 29323246 PMCID: PMC5764996 DOI: 10.1038/s41598-017-18370-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/25/2017] [Accepted: 12/06/2017] [Indexed: 02/08/2023] Open
Abstract
The identification of disease-associated modules based on protein-protein interaction networks (PPINs) and gene expression data has provided new insights into the mechanistic nature of diverse diseases. However, their identification is hampered by the difficulty of detecting protein communities within large-scale, whole-genome PPINs. One successful strategy presented previously detects a PPIN's community structure based on the maximal clique enumeration (MCE) problem, which is NP-hard (non-deterministic polynomial-time hard). This renders the approach computationally challenging for large PPINs, implying the need for new strategies. We present ModuleDiscoverer, a novel approach for the identification of regulatory modules from PPINs and gene expression data. Following the MCE-based approach, ModuleDiscoverer uses a randomization heuristic-based approximation of the community structure. Given a PPIN of Rattus norvegicus and public gene expression data, we identify the regulatory module underlying a rodent model of non-alcoholic steatohepatitis (NASH), a severe form of non-alcoholic fatty liver disease (NAFLD). The module is validated using single-nucleotide polymorphism (SNP) data from independent genome-wide association studies and gene enrichment tests. Based on gene enrichment tests, we find that ModuleDiscoverer performs comparably to three existing module-detecting algorithms. However, only our NASH-module is significantly enriched with genes linked to NAFLD-associated SNPs. ModuleDiscoverer is available at http://www.hki-jena.de/index.php/0/2/490 (Others/ModuleDiscoverer).
Affiliation(s)
- Sebastian Vlaic
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
- Friedrich-Schiller-University, Department of Bioinformatics, Jena, 07743, Germany
| | - Theresia Conrad
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
| | - Christian Tokarski-Schnelle
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
- University Hospital Jena, Friedrich-Schiller-University, General, Visceral and Vascular Surgery, Jena, 07749, Germany
| | - Mika Gustafsson
- Linköping University, Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 581 83, Sweden
| | - Uta Dahmen
- University Hospital Jena, Friedrich-Schiller-University, General, Visceral and Vascular Surgery, Jena, 07749, Germany
| | - Reinhard Guthke
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
| | - Stefan Schuster
- Friedrich-Schiller-University, Department of Bioinformatics, Jena, 07743, Germany
|
16
|
Peng C, Li A, Wang M. Discovery of Bladder Cancer-related Genes Using Integrative Heterogeneous Network Modeling of Multi-omics Data. Sci Rep 2017; 7:15639. [PMID: 29142286 PMCID: PMC5688092 DOI: 10.1038/s41598-017-15890-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/09/2017] [Accepted: 11/02/2017] [Indexed: 02/06/2023] Open
Abstract
In human health, a fundamental challenge is the identification of disease-related genes. Bladder cancer (BC) is a malignant tumor found worldwide, which resulted in 170,000 deaths in 2010, up from 114,000 in 1990. Moreover, with the emergence of multi-omics data, more comprehensive analysis of human diseases becomes possible. In this study, we propose a multi-step approach for the identification of BC-related genes by using integrative Heterogeneous Network Modeling of Multi-Omics data (iHNMMO). The heterogeneous network model properly and comprehensively reflects the multiple kinds of relationships between genes in the multi-omics data of BC, including general relationships, unique relationships under the BC condition, correlational relationships within each omics layer and regulatory relationships between different omics layers. In addition, a network-based propagation algorithm with resistance is used to quantify the relationships between genes and BC precisely. The results of a comprehensive performance evaluation suggest that iHNMMO significantly outperforms other approaches. Moreover, further analysis suggests that the top ranked genes may be functionally implicated in BC, which also confirms the superiority of iHNMMO. In summary, this study shows that disease-related genes can be better identified through reasonable integration of multi-omics data.
Affiliation(s)
- Chen Peng
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230037, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230037, China
|
17
|
Tian Z, Guo M, Wang C, Xing L, Wang L, Zhang Y. Constructing an integrated gene similarity network for the identification of disease genes. J Biomed Semantics 2017; 8:32. [PMID: 29297379 PMCID: PMC5763299 DOI: 10.1186/s13326-017-0141-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. RESULTS We proposed a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB stems from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. CONCLUSIONS RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
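The random walk with restart step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single small undirected toy graph rather than the phenotype-gene bilayer network used by RWRB; the function name `random_walk_with_restart`, the gene labels, and the restart probability of 0.7 are hypothetical choices for illustration, not values taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected toy graph; every node is assumed to have
               at least one neighbour so that probability mass is conserved).
    seeds:     set of known disease genes the walker restarts from.
    """
    nodes = sorted(adjacency)
    # Restart vector: uniform probability over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: edges A-B, A-C, B-C, C-D, D-E.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, candidate genes would be ranked by their steady-state probability, so nodes close to the seed gene in the network receive higher priority than distant ones.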
Affiliation(s)
- Zhen Tian
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
| | - Maozu Guo
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
| | - Chunyu Wang
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
| | - LinLin Xing
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
| | - Lei Wang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
| | - Yin Zhang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
|
18
|
Caldera M, Buphamalai P, Müller F, Menche J. Interactome-based approaches to human disease. Curr Opin Syst Biol 2017. [DOI: 10.1016/j.coisb.2017.04.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/21/2022]
|
19
|
Peng C, Li A. A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:713-720. [PMID: 28113912 DOI: 10.1109/tcbb.2016.2555314] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and, therefore, for improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network based method that integrates multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method is that the multi-dimensional GBM data from the TCGA dataset, which provide comprehensive information on genes, are combined with protein-protein interactions to construct a weighted heterogeneous network that reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of a comprehensive performance evaluation show that the proposed method significantly outperforms the network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .
|
20
|
Kaalia R, Ghosh I. Semantics based approach for analyzing disease-target associations. J Biomed Inform 2016; 62:125-35. [PMID: 27349858 DOI: 10.1016/j.jbi.2016.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 06/23/2016] [Accepted: 06/24/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts to understand the cause of these diseases using experimental, statistical and computational methods. In the present work, the objective is to address the challenge of representing and integrating information from heterogeneous biomedical aspects of a complex disease using a semantics-based approach. METHODS Semantic web technology is used to design the Disease Association Ontology (DAO-db) for representation and integration of disease-associated information, with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. RESULTS The Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins and pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in the present work that helps in adding instances to DAO-db on a large scale. CONCLUSIONS Our ontology provides a framework for querying and analyzing the disease-associated information in the form of RDF graphs. The methodology developed above is used to predict novel potential targets involved in diabetes from the long list of loose (statistically associated) gene-disease associations.
Affiliation(s)
- Rama Kaalia
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
| | - Indira Ghosh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
|
21
|
Predicting Abdominal Aortic Aneurysm Target Genes by Level-2 Protein-Protein Interaction. PLoS One 2015; 10:e0140888. [PMID: 26496478 PMCID: PMC4619739 DOI: 10.1371/journal.pone.0140888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/01/2015] [Accepted: 09/30/2015] [Indexed: 12/22/2022] Open
Abstract
Abdominal aortic aneurysm (AAA) is frequently lethal and has no effective pharmaceutical treatment, posing a great threat to human health. Previous bioinformatics studies of the mechanisms underlying AAA relied largely on the detection of direct protein-protein interactions (level-1 PPI) between the products of reported AAA-related genes. Thus, some proteins not suspected to be directly linked to previously reported genes of pivotal importance to AAA might have been missed. In this study, we constructed an indirect protein-protein interaction (level-2 PPI) network based on common interacting proteins encoded by known AAA-related genes and successfully predicted previously unreported AAA-related genes using this network. We used four methods to test and verify the performance of this level-2 PPI network: cross validation, human AAA mRNA chip array comparison, literature mining, and verification in a mouse CaPO4 AAA model. We confirmed that the new level-2 PPI network is superior to the original level-1 PPI network and proved that the top 100 candidate genes predicted by the level-2 PPI network shared similar GO functions and KEGG pathways compared with positive genes.
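The idea of linking candidates to known disease genes through common interacting proteins can be sketched as follows. This is only an illustration under stated assumptions, not the authors' scoring scheme: the function name `level2_candidates`, the protein labels, and the choice to score a candidate by its number of distinct shared intermediaries are all hypothetical. Direct (level-1) partners are excluded from the output here so that the result shows what a level-1 analysis would miss.

```python
def level2_candidates(ppi, seeds):
    """Rank proteins that reach a seed gene only through a shared partner.

    ppi:   dict mapping each protein to the set of its direct interactors
           (symmetric toy network).
    seeds: set of known disease-related genes.
    Returns a list of (protein, number_of_intermediaries), best first.
    """
    seeds = set(seeds)
    # Level-1 partners: proteins that interact directly with at least one seed.
    intermediaries = set()
    for seed in seeds:
        intermediaries |= ppi.get(seed, set())
    intermediaries -= seeds

    # Level-2 candidates: partners of the intermediaries that are neither
    # seeds nor direct partners of a seed (an assumption of this sketch).
    scores = {}
    for mid in intermediaries:
        for cand in ppi.get(mid, set()):
            if cand in seeds or cand in intermediaries:
                continue
            scores[cand] = scores.get(cand, 0) + 1
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: S1 and S2 are known disease genes.
toy_ppi = {
    "S1": {"M1", "M2"},
    "S2": {"M2", "M3"},
    "M1": {"S1", "X"},
    "M2": {"S1", "S2", "X", "Y"},
    "M3": {"S2", "Y", "Z"},
    "X": {"M1", "M2"},
    "Y": {"M2", "M3"},
    "Z": {"M3"},
}
candidates = level2_candidates(toy_ppi, {"S1", "S2"})
```

In the toy network, X and Y each connect to the seeds through two intermediaries and Z through one, so X and Y rank above Z.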
|
22
|
Abstract
Background Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards treatment of CAD. Methods Previously, we were able to predict 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid (http://www.gentrepid.org), a candidate gene prediction platform with two bioinformatic modules to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study, using five bioinformatic modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower quality genotype-phenotype data, using an independent knowledgebase of existing biological data. Here, we used our methodology to integrate drug data from three drug databases: the Therapeutic Target Database, PharmGKB and DrugBank, with the 647 candidate gene predictions from Gentrepid. We utilized known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis study as benchmarks to validate Gentrepid predictions for CAD. Results Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards treatment of CAD. The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05). Conclusions We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.
|
23
|
Luo Y, Riedlinger G, Szolovits P. Text mining in cancer gene and pathway prioritization. Cancer Inform 2014; 13:69-79. [PMID: 25392685 PMCID: PMC4216063 DOI: 10.4137/cin.s13874] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2014] [Revised: 05/18/2014] [Accepted: 05/18/2014] [Indexed: 12/18/2022] Open
Abstract
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Affiliation(s)
- Yuan Luo
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gregory Riedlinger
- Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
| | - Peter Szolovits
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
|
24
|
Smedley D, Köhler S, Czeschik JC, Amberger J, Bocchini C, Hamosh A, Veldboer J, Zemojtel T, Robinson PN. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases. Bioinformatics 2014; 30:3215-22. [PMID: 25078397 PMCID: PMC4221119 DOI: 10.1093/bioinformatics/btu508] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 12/27/2022] Open
Abstract
Motivation: Whole-exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging. Results: Here, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. We implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation. Availability and implementation: http://compbio.charite.de/ExomeWalker Contact: peter.robinson@charite.de
Affiliation(s)
- Damian Smedley
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Sebastian Köhler
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Johanna Christina Czeschik
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Joanna Amberger
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Carol Bocchini
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Ada Hamosh
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Julian Veldboer
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Tomasz Zemojtel
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
| | - Peter N Robinson
- Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany Mouse Informatics Group, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK, Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Genome Informatics Department, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany, McKusick-Nathans Institute of Genetic Medicine, John Hopkins University School of Medicine, Baltimore, MD 21205, USA, Department of Mathematics and Computer Science, Institute for Bioinformatics, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-701 Poznan, Poland, Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin and Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany Mouse Informatics Group, The Wellcome Trust Sang
| |
Collapse
|
25
|
Li X, Zhou X, Peng Y, Liu B, Zhang R, Hu J, Yu J, Jia C, Sun C. Network based integrated analysis of phenotype-genotype data for prioritization of candidate symptom genes. BIOMED RESEARCH INTERNATIONAL 2014; 2014:435853. [PMID: 24991551 PMCID: PMC4060751 DOI: 10.1155/2014/435853] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/15/2014] [Accepted: 04/30/2014] [Indexed: 11/17/2022]
Abstract
BACKGROUND Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insight into the molecular mechanisms of symptoms, we developed a computational approach to identify candidate genes for symptoms. METHODS This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of candidate genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Disease-gene associations and protein-protein interactions are then used to construct a phenotype-genotype network. Finally, the PRINCE algorithm is used to rank the potential genes for the associated symptoms. RESULTS The proposed method produces a reliable gene rank list, with an AUC (area under the curve) of 0.616 in classification. Some novel genes, such as CALCA, ESR1, and MTHFR, were predicted to be associated with headache symptoms; these are not recorded in the benchmark data set but have been reported in recently published literature. CONCLUSIONS Our study demonstrates that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identifying candidate genes of symptoms.
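As a rough illustration of what an AUC figure such as the one reported above measures for a gene ranking, the sketch below computes it as the fraction of (known gene, other gene) pairs in which the known gene receives the higher score. The gene labels `G1` to `G5`, their scores, and the function name `ranking_auc` are hypothetical and are not taken from the paper; the abstract does not state how the authors computed their AUC.

```python
def ranking_auc(scores, positives):
    """AUC of a ranking: probability that a randomly chosen positive gene
    scores higher than a randomly chosen negative gene (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical prioritization scores and a hypothetical benchmark set.
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.4, "G4": 0.3, "G5": 0.1}
known = {"G1", "G4"}
auc = ranking_auc(scores, known)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a random ordering and 1.0 to a ranking that places every benchmark gene above every other gene.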
Affiliation(s)
- Xing Li
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Xuezhong Zhou
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Yonghong Peng
- School of Engineering and Informatics, University of Bradford, West Yorkshire BD7 1DP, UK
- Baoyan Liu
- China Academy of Chinese Medical Sciences, Beijing 100700, China
- Runshun Zhang
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Jingqing Hu
- Institute of Basic Theory of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Jian Yu
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Caiyan Jia
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Changkai Sun
- Liaoning Provincial Key Laboratory of Cerebral Diseases, Institute for Brain Disorders, Dalian Medical University, Dalian 116044, China
|
26
|
Gleditsia sinensis: transcriptome sequencing, construction, and application of its protein-protein interaction network. BIOMED RESEARCH INTERNATIONAL 2014; 2014:404578. [PMID: 24982878 PMCID: PMC4058233 DOI: 10.1155/2014/404578] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/11/2014] [Accepted: 04/21/2014] [Indexed: 11/18/2022]
Abstract
Gleditsia sinensis is a species of deciduous tree in the subfamily Caesalpinioideae, native to China, and is of great economic importance. However, despite its economic value, gene sequence information is largely lacking. In the present study, transcriptome sequencing of G. sinensis yielded approximately 75.5 million clean reads, which were assembled into 142155 unique transcripts and 58583 unigenes. The average length of the unigenes was 900 bp, with an N50 of 549 bp. The unigene sequences were then compared against four protein databases: NCBI nonredundant protein (NRDB), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Cluster of Orthologous Groups (COG). Using BLAST, 31385 unigenes (53.6%) were assigned functional annotations. Additionally, sequence homologies between the identified unigenes and genes of known species in a protein-protein interaction (PPI) network facilitated construction of a G. sinensis PPI network. Based on this network, new stress resistance genes (for cold, drought, and high salinity) were predicted. The present study is the first investigation of genome-wide gene expression in G. sinensis, and its results provide a basis for future functional genomic studies of this species.
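For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The unigene lengths used here are made-up illustrative values, not data from the study, and the function name `n50` is only a convenient label.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical unigene lengths in bp.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 400: 500 + 400 = 900 covers at least half of 1500
```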
|
27
|
Jin Z, Kotera M, Goto S. Virus proteins similar to human proteins as possible disturbance on human pathways. SYSTEMS AND SYNTHETIC BIOLOGY 2014; 8:283-95. [PMID: 26396652 DOI: 10.1007/s11693-014-9141-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/22/2013] [Revised: 02/19/2014] [Accepted: 03/21/2014] [Indexed: 10/25/2022]
Abstract
Cancer is no longer rare anywhere in the world, and the global burden of cancer continues to increase substantially every year. Previous research on infections and cancers reported that about 17.8% of cancers worldwide, more than 1.9 million cases, are related to viral infections. At least six oncoviruses (cancer-causing viruses) are known so far: hepatitis B virus, hepatitis C virus, Epstein-Barr virus (EBV or HHV-4), human papillomavirus, human T lymphotropic virus type 1, and Kaposi's sarcoma-associated herpesvirus (KSHV or HHV-8). Their pathogenic mechanisms, however, are far from completely understood. In this study, assuming that finding human proteins significantly similar to viral oncoproteins leads to a categorization of cancer-related pathways that are currently not clearly known, we analyzed different types of virus-caused cancers based on their similarity in order to clarify the unknown cancer mechanisms. As a result, we obtained several potential tumor pathways that may be significant and essential in the oncogenic process, which will be helpful for further study of cancer mechanisms and the development of new drug targets.
Affiliation(s)
- Zhao Jin
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011 Japan
- Masaaki Kotera
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011 Japan
- Susumu Goto
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011 Japan
|
28
|
Grover MP, Ballouz S, Mohanasundaram KA, George RA, Sherman CDH, Crowley TM, Wouters MA. Identification of novel therapeutics for complex diseases from genome-wide association data. BMC Med Genomics 2014; 7 Suppl 1:S8. [PMID: 25077696 PMCID: PMC4101352 DOI: 10.1186/1755-8794-7-s1-s8] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/03/2022] Open
Abstract
Background Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials. Methods We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study; namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate drug data from three publicly available drug databases: the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank; with candidate gene predictions from Gentrepid at the systems level. Results Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets. 
Conclusions By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.
|
29
|
Zhu C, Wu C, Aronow BJ, Jegga AG. Computational approaches for human disease gene prediction and ranking. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 799:69-84. [PMID: 24292962 DOI: 10.1007/978-1-4614-8778-4_4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
While candidate gene association studies continue to be the most practical and frequently employed approach in disease gene investigation for complex disorders, selecting suitable genes to test is a challenge. Several computational approaches are available for selecting and prioritizing disease candidate genes. The majority of these tools are based on the guilt-by-association principle, in which novel disease candidate genes are identified and prioritized based on either functional or topological similarity to known disease genes. In this chapter we review the prioritization criteria and the algorithms, along with some use cases that demonstrate how these tools can be used for identifying and ranking human disease candidate genes.
Affiliation(s)
- Cheng Zhu
- Department of Computer Science, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH, USA
|
30
|
Wu J, Li Y, Jiang R. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies. PLoS Genet 2014; 10:e1004237. [PMID: 24651380 PMCID: PMC3961190 DOI: 10.1371/journal.pgen.1004237] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 07/26/2013] [Accepted: 01/27/2014] [Indexed: 01/06/2023] Open
Abstract
Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, owing to factors such as the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, the widespread use of exome sequencing calls for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.
The detection of causative nonsynonymous single nucleotide variants (SNVs) is essential for the understanding of the pathogenesis of human inherited diseases. In this paper, we propose a statistical method called SPRING (Snv PRioritization via the INtegration of Genomic data) to combine six functional effect scores calculated by existing methods and five association scores derived from multiple genomic data sources to estimate the statistical significance that a nonsynonymous SNV is pathogenic for a query disease. We find that SPRING is effective in identifying disease-causing SNVs for diseases whose genetic bases are either partly known or completely unknown across a variety of inheritance styles. With real exome sequencing data, we show the qualified potential of SPRING in not only the detection of causative SNVs in simulation studies but also the identification of pathogenic de novo mutations for autism, epileptic encephalopathies and intellectual disability.
Affiliation(s)
- Jiaxin Wu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
- Yanda Li
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
|
31
|
Hindumathi V, Kranthi T, Rao SB, Manimaran P. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach. MOLECULAR BIOSYSTEMS 2014; 10:1450-60. [PMID: 24647578 DOI: 10.1039/c4mb00004h] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The abundant availability of protein interaction data, coupled with gene annotation, eases the accurate determination of disease-specific candidate genes. In our work we prioritized candidate genes for cervix-related cancer by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. Using human protein interaction data, cervical cancer gene sets and ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the predicted candidate genes was corroborated through a literature survey. The presence of drugs for these candidates was also detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which suggests that they may serve as potential drug targets for cervical cancer.
Affiliation(s)
- V Hindumathi
- C R Rao Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Prof. C R Rao Road, Gachibowli, Hyderabad - 500046, India.
|
32
|
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
|
33
|
Abstract
A rare or orphan disorder is any disease that affects a small percentage of the population. Most genes and pathways underlying these disorders remain unknown. High-throughput techniques are frequently applied to detect disease candidate genes. The speed and affordability of sequencing following recent technological advances, while advantageous, are accompanied by the problem of data deluge. Furthermore, experimental validation of disease candidate genes is both time-consuming and expensive. Therefore, several computational approaches have been developed to identify the most promising candidates for follow-up studies. Based on the guilt-by-association principle, most of these approaches use prior knowledge about a disease of interest to discover and rank novel candidate genes. In this chapter, a brief overview of some of the in silico strategies for candidate gene prioritization is provided. To demonstrate their utility in rare disease research, typical queries are run on a Web-based computational suite of tools that uses integrated heterogeneous data sources for ranking disease candidate genes.
Affiliation(s)
- Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, ML 7024, Cincinnati, OH, 45229, USA
|
34
|
Abstract
The exponential growth of experimental and clinical data generated from systematic studies, the complexity of health and disease, and the demand for systems models are bringing bioinformatics to the center stage of pharmacogenomics and systems biology. Bioinformatics plays an essential role in bridging the gap among different knowledge domains for the translation of the voluminous data into better diagnosis, prognosis, prevention, and treatment. Bioinformatics is essential in finding the spatiotemporal patterns in pharmacogenomics, including the time-series analyses of the associations between genetic structural variations and functional alterations such as drug responses. The elucidation of the cross talk among different systems levels and time scales can contribute to the discovery of accurate and robust biomarkers at various disease stages for the development of systems and dynamical medicine. Various resources are available for such purposes, including databases and tools supporting "omics" studies such as genomics, proteomics, epigenomics, transcriptomics, metabolomics, lipidomics, pharmacogenomics, and chronomics. The combination of bioinformatics and health informatics methods would provide powerful decision support in both scientific and clinical environments. Data integration, data mining, and knowledge discovery (KD) methods would enable the simulation of complex systems and dynamical networks to establish predictive models for achieving predictive, preventive, and personalized medicine.
Affiliation(s)
- Qing Yan
- PharmTao, 5672, 4601 Lafayette Street, Santa Clara, CA, 95056-5672, USA
|
35
|
Safari-Alighiarloo N, Taghizadeh M, Rezaei-Tavirani M, Goliaei B, Peyvandi AA. Protein-protein interaction networks (PPI) and complex diseases. GASTROENTEROLOGY AND HEPATOLOGY FROM BED TO BENCH 2014; 7:17-31. [PMID: 25436094 PMCID: PMC4017556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2013] [Accepted: 12/23/2013] [Indexed: 11/16/2022]
Abstract
The physical interactions of proteins, which assemble them into large, densely connected networks, are a notable subject of investigation. Protein interaction networks are useful because they provide a basic scientific abstraction and improve biological and biomedical applications. Given the principal roles of proteins in biological function, their interactions determine the molecular and cellular mechanisms that control healthy and diseased states in organisms. Such networks therefore facilitate the understanding of the pathogenic (and physiologic) mechanisms that trigger the onset and progression of diseases. Consequently, this knowledge can be translated into effective diagnostic and therapeutic strategies. Furthermore, several studies have shown that the structure and dynamics of protein networks are disturbed in complex diseases such as cancer and autoimmune disorders. Based on this relationship, a novel paradigm has been suggested in which protein interaction networks, rather than individual molecules considered without regard to the network, are the target of therapy for complex multigenic diseases.
Affiliation(s)
- Nahid Safari-Alighiarloo
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Taghizadeh
- Bioinformatics Department, Institute of Biochemistry and Biophysics, Tehran University, Tehran, Iran
- Mostafa Rezaei-Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Bahram Goliaei
- Bioinformatics Department, Institute of Biochemistry and Biophysics, Tehran University, Tehran, Iran
- Ali Asghar Peyvandi
- Hearing Disorders Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
36
|
Ballouz S, Liu JY, Oti M, Gaeta B, Fatkin D, Bahlo M, Wouters MA. Candidate disease gene prediction using Gentrepid: application to a genome-wide association study on coronary artery disease. Mol Genet Genomic Med 2013; 2:44-57. [PMID: 24498628 PMCID: PMC3907915 DOI: 10.1002/mgg3.40] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 12/05/2012] [Accepted: 08/19/2013] [Indexed: 12/12/2022] Open
Abstract
Current single-locus-based analyses and candidate disease gene prediction methodologies used in genome-wide association studies (GWAS) do not capitalize on the wealth of the underlying genetic data, nor functional data available from molecular biology. Here, we analyzed GWAS data from the Wellcome Trust Case Control Consortium (WTCCC) on coronary artery disease (CAD). Gentrepid uses a multiple-locus-based approach, drawing on protein pathway- or domain-based data to make predictions. Known disease genes may be used as additional information (seeded method) or predictions can be based entirely on GWAS single nucleotide polymorphisms (SNPs) (ab initio method). We looked in detail at specific predictions made by Gentrepid for CAD and compared these with known genetic data and the scientific literature. Gentrepid was able to extract known disease genes from the candidate search space and predict plausible novel disease genes from both known and novel WTCCC-implicated loci. The disease gene candidates are consistent with known biological information. The results demonstrate that this computational approach is feasible and a valuable discovery tool for geneticists.
Affiliation(s)
- Sara Ballouz
- Structural and Computational Biology Division, Victor Chang Cardiac Research Institute Darlinghurst, NSW, 2010, Australia ; School of Computer Science and Engineering, University of New South Wales Kensington, NSW, 2052, Australia
- Jason Y Liu
- Structural and Computational Biology Division, Victor Chang Cardiac Research Institute Darlinghurst, NSW, 2010, Australia
- Martin Oti
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre Nijmegen, The Netherlands
- Bruno Gaeta
- School of Computer Science and Engineering, University of New South Wales Kensington, NSW, 2052, Australia
- Diane Fatkin
- School of Medical Sciences, University of New South Wales Kensington, NSW, 2052, Australia ; Molecular Cardiology and Biophysics Division, Victor Chang Cardiac Research Institute Darlinghurst, NSW, 2010, Australia
- Melanie Bahlo
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research Parkville, VIC, 3052, Australia
- Merridee A Wouters
- School of Medicine, Deakin University Geelong, VIC, 3217, Australia ; School of Life and Environmental Sciences, Deakin University Geelong, VIC, 3217, Australia
|
37
|
Yang JS, Kim J, Park S, Jeon J, Shin YE, Kim S. Spatial and functional organization of mitochondrial protein network. Sci Rep 2013; 3:1403. [PMID: 23466738 PMCID: PMC3590558 DOI: 10.1038/srep01403] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 11/03/2012] [Accepted: 02/21/2013] [Indexed: 12/24/2022] Open
Abstract
Characterizing the spatial organization of the human mitochondrial proteome will enhance our understanding of mitochondrial functions at the molecular level and provide key insight into protein-disease associations. However, the sub-organellar location and possible association with mitochondrial diseases are not annotated for most mitochondrial proteins. Here, we characterized the functional and spatial organization of mitochondrial proteins by assessing their position in the Mitochondrial Protein Functional (MPF) network. Network position was assigned to the MPF network and facilitated the determination of sub-organellar location and functional organization of mitochondrial proteins. Moreover, network position successfully identified candidate disease genes of several mitochondrial disorders. Thus, our data support the use of network position as a novel method to explore the molecular function and pathogenesis of mitochondrial proteins.
Affiliation(s)
- Jae-Seong Yang
- School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, Pohang, Gyeongbuk, Korea, 790-784
|
38
|
Ballouz S, Liu JY, George RA, Bains N, Liu A, Oti M, Gaeta B, Fatkin D, Wouters MA. Gentrepid V2.0: a web server for candidate disease gene prediction. BMC Bioinformatics 2013; 14:249. [PMID: 23947436 PMCID: PMC3844418 DOI: 10.1186/1471-2105-14-249] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/22/2012] [Accepted: 08/13/2013] [Indexed: 01/06/2023] Open
Abstract
Background Candidate disease gene prediction is a rapidly developing area of bioinformatics research with the potential to deliver great benefits to human health. As experimental studies detecting associations between genetic intervals and disease proliferate, better bioinformatic techniques that can expand and exploit the data are required. Description Gentrepid is a web resource which predicts and prioritizes candidate disease genes for both Mendelian and complex diseases. The system can take input from linkage analysis of single genetic intervals or multiple marker loci from genome-wide association studies. The underlying database of the Gentrepid tool sources data from numerous gene and protein resources, taking advantage of the wealth of biological information available. Using known disease gene information from OMIM, the system predicts and prioritizes disease gene candidates that participate in the same protein pathways or share similar protein domains. Alternatively, using an ab initio approach, the system can detect enrichment of these protein annotations without prior knowledge of the phenotype. Conclusions The system aims to integrate the wealth of protein information currently available with known and novel phenotype/genotype information to acquire knowledge of biological mechanisms underpinning disease. We have updated the system to facilitate analysis of GWAS data and the study of complex diseases. Application of the system to GWAS data on hypertension using the ICBP data is provided as an example. An interesting prediction is a ZIP transporter additional to the one found by the ICBP analysis. The webserver URL is https://www.gentrepid.org/.
Affiliation(s)
- Sara Ballouz
- School of Medicine, Deakin University, Geelong, VIC 3217, Australia.
|
39
|
Nie Y, Yu J. Mining breast cancer genes with a network based noise-tolerant approach. BMC Systems Biology 2013; 7:49. [PMID: 23799982 PMCID: PMC3702465 DOI: 10.1186/1752-0509-7-49] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/28/2012] [Accepted: 06/21/2013] [Indexed: 12/22/2022]
Abstract
BACKGROUND: Mining novel breast cancer genes is an important task in breast cancer research. Many approaches prioritize candidate genes based on their similarity to known cancer genes, usually by integrating multiple data sources. However, different types of data often contain varying degrees of noise. For effective data integration, it is important to design methods that are robust to noise. RESULTS: Gene Ontology (GO) annotations have often been used in cancer gene mining. However, the vast majority of GO annotations are computationally derived and thus not completely accurate. A set of genes annotated with breast cancer enriched GO terms was adopted here as source data with realistic noise. A novel noise-tolerant approach was proposed to rank candidate breast cancer genes using noisy source data within the framework of a comprehensive human Protein-Protein Interaction (PPI) network. Performance of the proposed method was quantitatively evaluated by comparing it with the more established random walk approach. Results showed that the proposed method exhibited better performance in ranking known breast cancer genes and higher robustness against data noise than the random walk approach. When noise started to increase, the proposed method was able to maintain relatively stable performance, while the random walk approach showed a drastic performance decline; when noise increased to a large extent, the proposed method still achieved better performance than random walk did. CONCLUSIONS: A novel noise-tolerant method was proposed to mine breast cancer genes. Compared to the well-established random walk approach, it showed better performance in correctly ranking cancer genes and worked robustly with respect to noise within source data. To the best of our knowledge, it is the first such effort to quantitatively analyze noise tolerance between different breast cancer gene mining methods. The sorted gene list can be valuable for breast cancer research. The proposed quantitative noise analysis method may also prove useful for other data integration efforts. It is hoped that the current work can lead to more discussions about the influence of data noise on different computational methods for mining disease genes.
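The abstract names a random walk approach as its baseline but does not spell it out. As a rough illustration only, the following is a minimal sketch of random walk with restart on a toy undirected network; the node labels, the restart probability of 0.3, and the function name are assumptions made for this example, not details taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected, no isolated nodes).
    seeds: known disease genes; the walker jumps back to them with probability `restart`.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes to the seeds; the rest is spread over neighbours.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path-shaped network A - B - C - D - E with A as the only seed.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted((n for n in toy_network if n != "A"), key=scores.get, reverse=True)
print(ranking)
```

Candidates closer to the seed receive higher scores, which is the behaviour a similarity-to-known-genes ranking relies on; noisy seeds would shift the restart mass to the wrong nodes.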
Affiliation(s)
- Yaling Nie
- National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
|
40
|
Abstract
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cure for many diseases.
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, New Jersey, USA.
|
41
|
Rule extraction in gene-disease relationship discovery. Gene 2013; 518:132-8. [PMID: 23235120 DOI: 10.1016/j.gene.2012.11.060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/05/2012] [Accepted: 11/27/2012] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Biomedical data available to researchers and clinicians have increased dramatically over the past years because of the exponential growth of knowledge in medical biology. It is difficult for curators to go through all of the unstructured documents in order to curate the information into a database. Associating genes with diseases is important because it is a fundamental challenge in human health with applications to understanding disease properties and developing new techniques for prevention, diagnosis and therapy. METHODS: Our study uses an automatic rule-learning approach to gene-disease relationship extraction. We first prepare the experimental corpus from MEDLINE and OMIM. A parser is applied to produce grammatical information. We then learn all possible rules that discriminate relevant from irrelevant sentences. After that, we compute the scores of the learned rules in order to select rules of interest. As a result, a set of rules is generated. RESULTS: We produce the learned rules automatically from 1000 positive and 1000 negative sentences. The test set includes 400 sentences composed of 200 positives and 200 negatives. Precision, recall and F-score served as our evaluation metrics. The results reveal that the maximal precision rate is 77.8% and the maximal recall rate is 63.5%. The maximal F-score is 66.9%, where the precision rate is 70.6% and the recall rate is 63.5%. CONCLUSIONS: We employ the rule-learning approach to extract gene-disease relationships. Our main contributions are to build rules automatically and to support a more complete set of rules than a manually generated one. The experiments show encouraging results, and further improvements will be made in the future.
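The reported figures can be checked against each other, assuming the F-score here is the usual balanced F1 (the harmonic mean of precision and recall); the abstract does not state which variant was used. A short sketch:

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) quoted for the best-F operating point.
best_f = f_score(70.6, 63.5)
print(round(best_f, 1))  # 66.9
```

Under that assumption the quoted precision of 70.6% and recall of 63.5% reproduce the reported maximal F-score of 66.9%.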
|
42
|
Jia M, Liu Y, Shen Z, Zhao C, Zhang M, Yi Z, Wen C, Deng Y, Shi T. HDAM: a resource of human disease associated mutations from next generation sequencing studies. BMC Med Genomics 2013; 6 Suppl 1:S16. [PMID: 23369322 PMCID: PMC3552701 DOI: 10.1186/1755-8794-6-s1-s16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
Background: Next generation sequencing (NGS) technologies have greatly facilitated the rapid and economical detection of pathogenic mutations in human disorders. However, mutation descriptions are hard to compare and integrate because different articles adopt different reference sequences, annotation tools, and nomenclature for diseases/traits. Description: The Human Disease Associated Mutation (HDAM) database is dedicated to collecting, standardizing and re-annotating mutations for human diseases discovered by NGS studies. In the current release, HDAM contains 1,114 mutations, located in 669 genes and associated with 125 human diseases through literature mining. All mutation records have uniform and unequivocal descriptions of sequence changes according to the Human Genome Variation Society (HGVS) nomenclature recommendations. Each entry displays comprehensive information, including mutation location in the genome (hg18/hg19), gene functional annotation, protein domain annotation, susceptible diseases, the first literature report of the mutation, etc. Moreover, new mutation-disease relationships predicted by a Bayesian network are also presented under each mutation. Conclusion: HDAM contains hundreds of rigorously curated human mutations from NGS studies and was created to provide a comprehensive view of the mutations that confer susceptibility to common disorders. HDAM can be freely accessed at http://www.megabionet.org/HDAM.
Affiliation(s)
- Meiwei Jia
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Science, East China Normal University, Shanghai 200241, China
|
43
|
Magger O, Waldman YY, Ruppin E, Sharan R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol 2012; 8:e1002690. [PMID: 23028288 PMCID: PMC3459874 DOI: 10.1371/journal.pcbi.1002690] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 10/15/2011] [Accepted: 07/28/2012] [Indexed: 01/07/2023]
Abstract
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
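One simple way to picture the integration of expression data with PPI information is to keep only interactions whose two partners are both expressed in the tissue of interest. The sketch below does exactly that; the expression threshold, the gene labels, and the filtering rule itself are assumptions for illustration, since the abstract does not describe how the tissue-specific networks were actually built.

```python
def tissue_specific_network(ppi_edges, expression, tissue, threshold=1.0):
    """Keep only interactions whose two partners are both expressed in `tissue`."""
    expressed = {
        gene
        for gene, levels in expression.items()
        if levels.get(tissue, 0.0) >= threshold
    }
    return {(a, b) for a, b in ppi_edges if a in expressed and b in expressed}


# Hypothetical generic PPI network and per-tissue expression levels.
ppi_edges = {("G1", "G2"), ("G2", "G3"), ("G3", "G4")}
expression = {
    "G1": {"heart": 5.2, "liver": 0.1},
    "G2": {"heart": 3.1, "liver": 4.0},
    "G3": {"heart": 0.0, "liver": 2.5},
    "G4": {"liver": 6.3},
}
heart_net = tissue_specific_network(ppi_edges, expression, "heart")
liver_net = tissue_specific_network(ppi_edges, expression, "liver")
print(sorted(heart_net), sorted(liver_net))
```

The same generic network yields different neighbourhoods per tissue, so any distance-based prioritization run on it will rank candidates differently from one tissue to the next.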
Affiliation(s)
- Oded Magger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
44
|
Gao S, Jia S, Hessner MJ, Wang X. Predicting disease-related subnetworks for type 1 diabetes using a new network activity score. OMICS: A Journal of Integrative Biology 2012; 16:566-78. [PMID: 22917479 DOI: 10.1089/omi.2012.0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
In this study we investigated the advantage of including network information in prioritizing disease genes of type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considered the contribution both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome wide association study (GWAS), and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels showed correlation with disease only in cohorts with low-risk HLA genotypes. The results suggested the importance of mapping disease gene networks in dissecting the genetics of complex diseases, and offered a general approach to network-based disease gene prioritization from multiple data sources.
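The idea of combining independent evidence sources into a single per-gene probability can be sketched with the textbook naive Bayes update, in which prior odds are multiplied by one likelihood ratio per source. The prior and the likelihood ratios below are made-up numbers, and the function is a generic illustration rather than the model described in the paper.

```python
def probability_score(prior, likelihood_ratios):
    """Posterior probability after multiplying prior odds by each likelihood ratio.

    Assumes the evidence sources are conditionally independent (the naive Bayes assumption).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical gene: 1% prior, two supportive evidence sources (LR 10 and LR 5).
ps = probability_score(0.01, [10.0, 5.0])
print(round(ps, 4))
```

Each source contributes multiplicatively, so a gene with modest support from several sources can outrank one with strong support from a single source.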
Affiliation(s)
- Shouguo Gao
- Department of Physics, the University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
|
45
|
Masoudi-Nejad A, Meshkin A, Haji-Eghrari B, Bidkhori G. RETRACTED ARTICLE: Candidate gene prioritization. Mol Genet Genomics 2012; 287:679-98. [DOI: 10.1007/s00438-012-0710-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/12/2012] [Accepted: 07/12/2012] [Indexed: 01/16/2023]
|
46
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/14/2022]
|
47
|
Ma H, Xu D, Fu Q. Identification of ankylosing spondylitis-associated genes by expression profiling. Int J Mol Med 2012; 30:693-6. [PMID: 22751785 DOI: 10.3892/ijmm.2012.1047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/02/2012] [Accepted: 05/11/2012] [Indexed: 11/06/2022]
Abstract
Ankylosing spondylitis (AS) is a chronic inflammatory disease affecting the sacroiliac joints and the spine. Certain genes have been associated with the occurrence of AS. Gene chip data were used to identify AS-associated genes relevant to both clinical diagnosis and biomedical research. Microarray expression data for AS were acquired from the public microarray database GEO (Gene Expression Omnibus), and AS-related genes were obtained by differential gene expression profiling. The transcriptional and translational levels of these genes were further examined. The transcriptional and translational levels of three genes were shown to be upregulated in a mouse model of AS by real-time PCR and ELISA, respectively. Differential expression of AS-related genes was identified by analysis of gene chip data, contributing to a better understanding of the pathogenesis of AS.
Affiliation(s)
- Hui Ma
- Department of Orthopaedics, Changhai Hospital, Second Military Medical University, Shanghai, P.R. China
|
48
|
Zhang L, Li X, Tai J, Li W, Chen L. Predicting candidate genes based on combined network topological features: a case study in coronary artery disease. PLoS One 2012; 7:e39542. [PMID: 22761820 PMCID: PMC3382204 DOI: 10.1371/journal.pone.0039542] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 03/28/2012] [Accepted: 05/22/2012] [Indexed: 11/26/2022]
Abstract
Predicting candidate genes using gene expression profiles and unbiased protein-protein interactions (PPI) contributes substantially to deciphering the pathogenesis of complex diseases. Recent studies showed that there are significant disparities in network topological features between non-disease and disease genes in protein-protein interaction settings. Integrated methods could consider their characteristics comprehensively in a biological network. In this study, we introduce a novel computational method, based on combined network topological features, to construct a combined classifier and then use it to predict candidate genes for coronary artery disease (CAD). As a result, 276 novel candidate genes were predicted and were found to share similar functions with known disease genes. The majority of the candidate genes were cross-validated by three other methods. Our method will be useful in the search for candidate genes of other diseases.
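The abstract does not list which topological features were combined. As an assumed illustration, the sketch below computes two common ones, node degree and local clustering coefficient, which could then be fed to any classifier as a feature vector; the toy network and feature choice are hypothetical.

```python
from itertools import combinations


def topological_features(adjacency, node):
    """Return degree and local clustering coefficient of `node`.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    """
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        clustering = 0.0
    else:
        # Count neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        clustering = 2.0 * links / (degree * (degree - 1))
    return {"degree": degree, "clustering": clustering}


# Hypothetical toy PPI network.
toy_ppi = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
features = {gene: topological_features(toy_ppi, gene) for gene in toy_ppi}
print(features["A"])
```

Node A has three neighbours of which only one pair (B, C) interacts, giving a clustering coefficient of one third, while B and C sit in a fully connected triangle.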
Affiliation(s)
- Liangcai Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Jingxie Tai
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
|
49
|
Eronen L, Toivonen H. Biomine: predicting links between biological entities using network models of heterogeneous databases. BMC Bioinformatics 2012; 13:119. [PMID: 22672646 PMCID: PMC3505483 DOI: 10.1186/1471-2105-13-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 08/31/2011] [Accepted: 04/17/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
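The abstract mentions several proximity measures without defining them. One plausible example, assumed here purely for illustration, treats each edge weight as a probability in (0, 1] and scores a node pair by the probability of the single best path between them. The graph, weights, and function name below are hypothetical.

```python
import heapq


def best_path_probability(graph, source, target):
    """Highest product of edge weights over any path from source to target.

    graph: dict mapping node -> dict of neighbour -> weight in (0, 1].
    Returns 0.0 if target is unreachable.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated probabilities
    while heap:
        neg_p, node = heapq.heappop(heap)
        p = -neg_p
        if node == target:
            return p
        if p < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            q = p * weight
            if q > best.get(neighbour, 0.0):
                best[neighbour] = q
                heapq.heappush(heap, (-q, neighbour))
    return 0.0


# Hypothetical integrated graph with typed, weighted links folded into one weight per edge.
toy_graph = {
    "geneX": {"proteinX": 0.9, "diseaseY": 0.5},
    "proteinX": {"geneX": 0.9, "diseaseY": 0.8},
    "diseaseY": {"geneX": 0.5, "proteinX": 0.8},
}
proximity = best_path_probability(toy_graph, "geneX", "diseaseY")
print(round(proximity, 2))
```

Here the indirect route through the protein (0.9 × 0.8) beats the weak direct link (0.5), showing how integrating several edge types can surface a stronger connection than any single source provides.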
Affiliation(s)
- Lauri Eronen
- Biocomputing Platforms Ltd, Innopoli 2, Tekniikantie 14, FI-02150 Espoo, Finland.
|
50
|
Britto R, Sallou O, Collin O, Michaux G, Primig M, Chalmel F. GPSy: a cross-species gene prioritization system for conserved biological processes--application in male gamete development. Nucleic Acids Res 2012; 40:W458-65. [PMID: 22570409 PMCID: PMC3394256 DOI: 10.1093/nar/gks380] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
We present gene prioritization system (GPSy), a cross-species gene prioritization system that facilitates the arduous but critical task of prioritizing genes for follow-up functional analyses. GPSy’s modular design with regard to species, data sets and scoring strategies enables users to formulate queries in a highly flexible manner. Currently, the system encompasses 20 topics related to conserved biological processes including male gamete development discussed in this article. The web server-based tool is freely available at http://gpsy.genouest.org.
|