1
He B, Wang K, Xiang J, Bing P, Tang M, Tian G, Guo C, Xu M, Yang J. DGHNE: network enhancement-based method in identifying disease-causing genes through a heterogeneous biomedical network. Brief Bioinform 2022; 23:6712302. [PMID: 36151744 DOI: 10.1093/bib/bbac405] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2022] [Revised: 08/01/2022] [Accepted: 08/21/2022] [Indexed: 12/14/2022] Open
Abstract
The identification of disease-causing genes is critical for mechanistic understanding of disease etiology and clinical manipulation in disease prevention and treatment. Yet the existing approaches in tackling this question are inadequate in accuracy and efficiency, demanding computational methods with higher identification power. Here, we proposed a new method called DGHNE to identify disease-causing genes through a heterogeneous biomedical network empowered by network enhancement. First, a disease-disease association network was constructed by the cosine similarity scores between phenotype annotation vectors of diseases, and a new heterogeneous biomedical network was constructed by using disease-gene associations to connect the disease-disease network and gene-gene network. Then, the heterogeneous biomedical network was further enhanced by using network embedding based on the Gaussian random projection. Finally, network propagation was used to identify candidate genes in the enhanced network. We applied DGHNE together with five other methods into the most updated disease-gene association database termed DisGeNet. Compared with all other methods, DGHNE displayed the highest area under the receiver operating characteristic curve and the precision-recall curve, as well as the highest precision and recall, in both the global 5-fold cross-validation and predicting new disease-gene associations. We further performed DGHNE in identifying the candidate causal genes of Parkinson's disease and diabetes mellitus, and the genes connecting hyperglycemia and diabetes mellitus. In all cases, the predicted causing genes were enriched in disease-associated gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways, and the gene-disease associations were highly evidenced by independent experimental studies.
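The first step described in this abstract, scoring disease pairs by the cosine similarity of their phenotype annotation vectors, can be sketched as follows. This is a minimal illustration under assumed inputs, not the DGHNE implementation: the function names, the toy binary annotation vectors, and the choice to drop zero-similarity pairs are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a disease with no annotations is similar to nothing
    return dot / (norm_u * norm_v)


def disease_similarity_network(annotations):
    """Build weighted disease-disease edges from phenotype annotation vectors.

    `annotations` maps a disease name to its annotation vector
    (hypothetical input format). Pairs with zero similarity get no edge.
    """
    names = sorted(annotations)
    edges = {}
    for i, d1 in enumerate(names):
        for d2 in names[i + 1:]:
            score = cosine_similarity(annotations[d1], annotations[d2])
            if score > 0:
                edges[(d1, d2)] = score
    return edges


# Toy data: each position marks presence (1) or absence (0) of one phenotype term.
toy_annotations = {
    "disease_A": [1, 1, 0, 0],
    "disease_B": [1, 0, 1, 0],
    "disease_C": [0, 0, 0, 1],
}
network = disease_similarity_network(toy_annotations)
print(network)
```

In this toy case only disease_A and disease_B share a phenotype term, so they are the only pair connected; real annotation vectors would typically be much longer and possibly weighted.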
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Kun Wang
  - School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China
- Ju Xiang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang 212001, Jiangsu, China
- Geng Tian
  - Geneis (Beijing) Co., Ltd., Beijing 100102, China
- Cheng Guo
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
- Miao Xu
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China; Geneis (Beijing) Co., Ltd., Beijing 100102, China
2
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network based coexpression studies have for example been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods on how to analyze a single network or a group of networks. The described methods range from topological metrics, functional group identification to data integration strategies, topological pathway analysis as well as graphical models.
Affiliation(s)
- Alisa Pavel
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
3
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among about a hundred of published gene prioritization tools, we select and briefly describe 14 most up-to-date and user-friendly. Also, we discuss the advantages and disadvantages of existing tools, challenges of their validation, and the directions for future research.
Affiliation(s)
- Olga Zolotareva
  - Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
  - Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
4
GPS: Identification of disease genes by rank aggregation of multi-genomic scoring schemes. Genomics 2019; 111:612-618. [DOI: 10.1016/j.ygeno.2018.03.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 03/16/2018] [Accepted: 03/21/2018] [Indexed: 12/19/2022]
5
Freytag S, Burgess R, Oliver KL, Bahlo M. brain-coX: investigating and visualising gene co-expression in seven human brain transcriptomic datasets. Genome Med 2017; 9:55. [PMID: 28595657 PMCID: PMC5465565 DOI: 10.1186/s13073-017-0444-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/21/2016] [Accepted: 05/26/2017] [Indexed: 12/17/2022] Open
Abstract
Background The pathogenesis of neurological and mental health disorders often involves multiple genes, complex interactions, as well as brain- and development-specific biological mechanisms. These characteristics make identification of disease genes for such disorders challenging, as conventional prioritisation tools are not specifically tailored to deal with the complexity of the human brain. Thus, we developed a novel web-application—brain-coX—that offers gene prioritisation with accompanying visualisations based on seven gene expression datasets in the post-mortem human brain, the largest such resource ever assembled. Results We tested whether our tool can correctly prioritise known genes from 37 brain-specific KEGG pathways and 17 psychiatric conditions. We achieved average sensitivity of nearly 50%, at the same time reaching a specificity of approximately 75%. We also compared brain-coX’s performance to that of its main competitors, Endeavour and ToppGene, focusing on the ability to discover novel associations. Using a subset of the curated SFARI autism gene collection we show that brain-coX’s prioritisations are most similar to SFARI’s own curated gene classifications. Conclusions brain-coX is the first prioritisation and visualisation web-tool targeted to the human brain and can be freely accessed via http://shiny.bioinf.wehi.edu.au/freytag.s/.
Affiliation(s)
- Saskia Freytag
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia; Department of Medical Biology, University of Melbourne, 1G Royale Parade, 3052, Parkville, Australia
- Rosemary Burgess
  - Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Karen L Oliver
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia; Epilepsy Research Centre, Department of Medicine, Austin Health, University of Melbourne, 245 Burgundy Street, 3084, Heidelberg, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royale Parade, 3052, Parkville, Australia; Department of Medical Biology, University of Melbourne, 1G Royale Parade, 3052, Parkville, Australia; School of Mathematics and Statistics, University of Melbourne, 3010, Parkville, Australia
6
Li S, Li R, Wang H, Li L, Li H, Li Y. The Key Genes of Chronic Pancreatitis which Bridge Chronic Pancreatitis and Pancreatic Cancer Can be Therapeutic Targets. Pathol Oncol Res 2017; 24:215-222. [PMID: 28435988 DOI: 10.1007/s12253-017-0217-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/22/2016] [Accepted: 03/24/2017] [Indexed: 01/15/2023]
Abstract
An important question in systems biology is what role the underlying molecular mechanisms play in disease progression. The relationship between chronic pancreatitis and pancreatic cancer needs further exploration in a system view. We constructed the disease network based on gene expression data and protein-protein interaction. We proposed an approach to discover the underlying core network and molecular factors in the progression of pancreatic diseases, which contain stages of chronic pancreatitis and pancreatic cancer. The chronic pancreatitis and pancreatic cancer core network and key factors were revealed and then verified by gene set enrichment analysis of pathways and diseases. The key factors provide the microenvironment for tumor initiation and the change of gene expression level of key factors bridge chronic pancreatitis and pancreatic cancer. Some new candidate genes need further verification by experiments. Transcriptome profiling-based network analysis reveals the importance of chronic pancreatitis genes and pathways in pancreatic cancer development on a system level by computational method and they can be therapeutic targets.
Affiliation(s)
- Shuang Li
  - The Key Laboratory of Pathobiology, Ministry of Education, College of Basic Medical Sciences, Jilin University, Changchun, China
- Rui Li
  - National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China
- Heping Wang
  - Department of Neurosurgery, Tongji Hospital, Tongji Medical School, Wuhan, China
- Lisha Li
  - The Key Laboratory of Pathobiology, Ministry of Education, College of Basic Medical Sciences, Jilin University, Changchun, China
- Huiyu Li
  - The Key Laboratory of Pathobiology, Ministry of Education, College of Basic Medical Sciences, Jilin University, Changchun, China
- Yulin Li
  - The Key Laboratory of Pathobiology, Ministry of Education, College of Basic Medical Sciences, Jilin University, Changchun, China
7
Abstract
MOTIVATION Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. RESULTS To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. AVAILABILITY AND IMPLEMENTATION nlp.case.edu/public/data/DMN
Affiliation(s)
- Yang Chen
  - Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Li Li
  - Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Guo-Qiang Zhang
  - Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Rong Xu
  - Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
8
Network regularised Cox regression and multiplex network models to predict disease comorbidities and survival of cancer. Comput Biol Chem 2015; 59 Pt B:15-31. [DOI: 10.1016/j.compbiolchem.2015.08.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/13/2015] [Revised: 08/21/2015] [Accepted: 08/25/2015] [Indexed: 12/17/2022]
9
Theofilatos KA, Likothanassis S, Mavroudi S. Quo vadis computational analysis of PPI data or why the future isn't here yet. Front Genet 2015; 6:289. [PMID: 26442107 PMCID: PMC4584938 DOI: 10.3389/fgene.2015.00289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2015] [Accepted: 08/31/2015] [Indexed: 11/13/2022] Open
Affiliation(s)
- Spiros Likothanassis
  - InSyBio Ltd., London, UK; Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
- Seferina Mavroudi
  - InSyBio Ltd., London, UK; Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece; Department of Social Work, School of Sciences of Health and Care, Technological Educational Institute of Western Greece, Patras, Greece
10
Emran NA. Data Completeness Measures. Pattern Analysis, Intelligent Security and the Internet of Things 2015:117-130. [DOI: 10.1007/978-3-319-17398-6_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 09/01/2023]
11
Assessment of curated phenotype mining in neuropsychiatric disorder literature. Methods 2014; 74:90-6. [PMID: 25484337 DOI: 10.1016/j.ymeth.2014.11.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/25/2014] [Revised: 11/25/2014] [Accepted: 11/27/2014] [Indexed: 12/14/2022] Open
Abstract
Clinical evaluation of patients and diagnosis of disorder is crucial to make decisions on appropriate therapies. In addition, in the case of genetic disorders resulting from gene abnormalities, phenotypic effects may guide basic research on the mechanisms of a disorder to find the mutated gene and therefore to propose novel targets for drug therapy. However, this approach is complicated by two facts. First, the relationship between genes and disorders is not simple: one gene may be related to multiple disorders and a disorder may be caused by mutations in different genes. Second, recognizing relevant phenotypes might be difficult for clinicians working with patients of closely related complex disorders. Neuropsychiatric disorders best illustrate these difficulties since phenotypes range from metabolic to behavioral aspects, the latter extremely complex. Based on our clinical expertise on five neurodegenerative disorders, and from the wealth of bibliographical data on neuropsychiatric disorders, we have built a resource to infer associations between genes, chemicals, phenotypes for a total of 31 disorders. An initial step of automated text mining of the literature related to 31 disorders returned thousands of enriched terms. Fewer relevant phenotypic terms were manually selected by clinicians as relevant to the five neural disorders of their expertise and used to analyze the complete set of disorders. Analysis of the data indicates general relationships between neuropsychiatric disorders, which can be used to classify and characterize them. Correlation analyses allowed us to propose novel associations of genes and drugs with disorders. 
More generally, the results led us to uncovering mechanisms of disease that span multiple neuropsychiatric disorders, for example that genes related to synaptic transmission and receptor functions tend to be involved in many disorders, whereas genes related to sensory perception and channel transport functions are associated with fewer disorders. Our study shows that starting from expertise covering a limited set of neurological disorders and using text and data mining methods, meaningful and novel associations regarding genes, chemicals and phenotypes can be derived for an expanded set of neuropsychiatric disorders. Our results are intended for clinicians to help them evaluate patients, and for basic scientists to propose new gene targets for drug therapies. This strategy can be extended to virtually all diseases and takes advantage of the ever increasing amount of biomedical literature.
12
Chen Y, Xu R. Mining cancer-specific disease comorbidities from a large observational health database. Cancer Inform 2014; 13:37-44. [PMID: 25392682 PMCID: PMC4216041 DOI: 10.4137/cin.s13893] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/19/2014] [Revised: 04/29/2014] [Accepted: 04/30/2014] [Indexed: 12/28/2022] Open
Abstract
Cancer comorbidities often reflect the complex pathogenesis of cancers and provide valuable clues to discover the underlying genetic mechanisms of cancers. In this study, we systematically mine and analyze cancer-specific comorbidity from the FDA Adverse Event Reporting System. We stratified 3,354,043 patients based on age and gender, and developed a network-based approach to extract comorbidity patterns from each patient group. We compared the comorbidity patterns among different patient groups and investigated the effect of age and gender on cancer comorbidity patterns. The results demonstrated that the comorbidity relationships between cancers and non-cancer diseases largely depend on age and gender. A few exceptions are depression, anxiety, and metabolic syndrome, whose comorbidity relationships with cancers are relatively stable among all patients. Literature evidences demonstrate that these stable cancer comorbidities reflect the pathogenesis of cancers. We applied our comorbidity mining approach on colorectal cancer and detected its comorbid associations with metabolic syndrome components, diabetes, and osteoporosis. Our results not only confirmed known cancer comorbidities but also generated novel hypotheses, which can illuminate the common pathophysiology between cancers and their co-occurring diseases.
Affiliation(s)
- Yang Chen
  - Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
  - Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
13
Chen Y, Zhang X, Zhang GQ, Xu R. Comparative analysis of a novel disease phenotype network based on clinical manifestations. J Biomed Inform 2014; 53:113-20. [PMID: 25277758 DOI: 10.1016/j.jbi.2014.09.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/16/2014] [Revised: 08/18/2014] [Accepted: 09/21/2014] [Indexed: 12/21/2022]
Abstract
Systems approaches to analyzing disease phenotype networks in combination with protein functional interaction networks have great potential in illuminating disease pathophysiological mechanisms. While many genetic networks are readily available, disease phenotype networks remain largely incomplete. In this study, we built a large-scale Disease Manifestation Network (DMN) from 50,543 highly accurate disease-manifestation semantic relationships in the Unified Medical Language System (UMLS). Our new phenotype network contains 2305 nodes and 373,527 weighted edges to represent the disease phenotypic similarities. We first compared DMN with the networks representing genetic relationships among diseases, and demonstrated that the phenotype clustering in DMN reflects common disease genetics. Then we compared DMN with a widely-used disease phenotype network in previous gene discovery studies, called mimMiner, which was extracted from the textual descriptions in Online Mendelian Inheritance in Man (OMIM). We demonstrated that DMN contains different knowledge from the existing phenotype data source. Finally, a case study on Marfan syndrome further proved that DMN contains useful information and can provide leads to discover unknown disease causes. Integrating DMN in systems approaches with mimMiner and other data offers the opportunities to predict novel disease genetics. We made DMN publicly available at nlp.case.edu/public/data/DMN.
Affiliation(s)
- Yang Chen
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Xiang Zhang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States
- Guo-Qiang Zhang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Rong Xu
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
14
Liu RL, Shih CC. Identification of highly related references about gene-disease association. BMC Bioinformatics 2014; 15:286. [PMID: 25155502 PMCID: PMC4162969 DOI: 10.1186/1471-2105-15-286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/07/2013] [Accepted: 08/12/2014] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Curation of gene-disease associations published in literature should be based on careful and frequent survey of the references that are highly related to specific gene-disease associations. Retrieval of the references is thus essential for timely and complete curation. RESULTS We present a technique CRFref (Conclusive, Rich, and Focused References) that, given a gene-disease pair < g, d>, ranks high those biomedical references that are likely to provide conclusive, rich, and focused results about g and d. Such references are expected to be highly related to the association between g and d. CRFref ranks candidate references based on their scores. To estimate the score of a reference r, CRFref estimates and integrates three measures: degree of conclusiveness, degree of richness, and degree of focus of r with respect to < g, d>. To evaluate CRFref, experiments are conducted on over one hundred thousand references for over one thousand gene-disease pairs. Experimental results show that CRFref performs significantly better than several typical types of baselines in ranking high those references that expert curators select to develop the summaries for specific gene-disease associations. CONCLUSION CRFref is a good technique to rank high those references that are highly related to specific gene-disease associations. It can be incorporated into existing search engines to prioritize biomedical references for curators and researchers, as well as those text mining systems that aim at the study of gene-disease associations.
Affiliation(s)
- Rey-Long Liu
  - Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
15
Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014; 61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. MATERIALS AND METHODS We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. RESULTS The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. CONCLUSIONS Network integration is necessary to boost the performances of gene prioritization methods. 
Moreover the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network.
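The random walk with restart algorithm named in the abstract above can be illustrated with a minimal sketch. The toy network, the restart probability of 0.3 and the function name are assumptions made for illustration; they are not taken from the paper, and the kernelized score functions it proposes are not reproduced here.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-8, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seed genes."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform start over known disease genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 4-gene path network: 0 - 1 - 2 - 3, with gene 0 as the only seed.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
scores = random_walk_with_restart(adj, seeds=[0])
ranking = [int(i) for i in np.argsort(-scores)]  # candidate genes, best first
```

In this toy example the scores fall off with distance from the seed, which is the guilt-by-association behaviour the abstract relies on.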
Affiliation(s)
- Giorgio Valentini
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
- Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Horacio Caniza
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Matteo Re
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
|
16
|
High-Throughput Translational Medicine: Challenges and Solutions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 799:39-67. [DOI: 10.1007/978-1-4614-8778-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
|
17
|
Network Analysis of Human Disease Comorbidity Patterns Based on Large-Scale Data Mining. BIOINFORMATICS RESEARCH AND APPLICATIONS 2014. [DOI: 10.1007/978-3-319-08171-7_22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
|
18
|
Emran NA, Embury S, Missier P. Measuring Population-Based Completeness for Single Nucleotide Polymorphism (SNP) Databases. ADVANCED APPROACHES TO INTELLIGENT INFORMATION AND DATABASE SYSTEMS 2014:173-182. [DOI: 10.1007/978-3-319-05503-9_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 09/01/2023]
|
19
|
Nayak L, Tunga H, De RK. Disease co-morbidity and the human Wnt signaling pathway: a network-wise study. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2013; 17:318-37. [PMID: 23692364 DOI: 10.1089/omi.2012.0053] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
The human Wnt signaling pathway contains 57 genes communicating among themselves by 70 experimentally established associations, as given in the KEGG/PATHWAY database. It is responsible for a variety of crucial biological functions such as regulation of cell fate determination, proliferation, differentiation, migration, and apoptosis. Abnormal behavior of its members causes numerous types of human cancers, dramatic changes in bone mass density that lead to diseases such as osteoporosis-pseudo-glioma syndrome, Van-Buchem disease, skeletal malformation, autosomal dominant sclerosteosis, and osteoporosis type I syndromes. So far, single genes have been investigated for their disease-causing properties, and single diseases have been traced backwards to discover foul-play of the system pathways. Differential expression of the whole genome has been mapped by microarray. But how all the genes involved in a pathway affect each other in single/multiple disease state(s) and whether the presence of one disease state makes a person prone to another kind of disease(s) (i.e., co-morbidity among diseases associated with a certain important biological pathway) is still unknown. We have developed a human Wnt signaling pathway diseasome and analyzed it for finding answers to such questions. Data used in constructing the diseasome can be downloaded from the publicly accessible webserver http://www.isical.ac.in/-rajat/diseasome/index.php.
Affiliation(s)
- Losiana Nayak
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
|
20
|
Kamphans T, Sabri P, Zhu N, Heinrich V, Mundlos S, Robinson PN, Parkhomchuk D, Krawitz PM. Filtering for compound heterozygous sequence variants in non-consanguineous pedigrees. PLoS One 2013; 8:e70151. [PMID: 23940540 PMCID: PMC3734130 DOI: 10.1371/journal.pone.0070151] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/20/2013] [Accepted: 06/20/2013] [Indexed: 01/06/2023] Open
Abstract
The identification of disease-causing mutations in next-generation sequencing (NGS) data requires efficient filtering techniques. In patients with rare recessive diseases, compound heterozygosity of pathogenic mutations is the most likely inheritance model if the parents are non-consanguineous. We developed a web-based compound heterozygous filter that is suited for data from NGS projects and that is easy to use for non-bioinformaticians. We analyzed the power of compound heterozygous mutation filtering by deriving background distributions for healthy individuals from different ethnicities and studied the effectiveness in trios as well as more complex pedigree structures. While usually more than 30 genes harbor potential compound heterozygotes in single exomes, this number can be markedly reduced with every additional member of the pedigree that is included in the analysis. In a real data set with exomes of four family members, two sisters affected by Mabry syndrome and their healthy parents, the disease-causing gene PIGO, which harbors the pathogenic compound heterozygous variants, could be readily identified. Compound heterozygous filtering is an efficient means to reduce the number of candidate mutations in studies aiming at identifying recessive disease genes in non-consanguineous families. A web-server is provided to make this filtering strategy available at www.gene-talk.de.
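The core idea of compound heterozygous filtering in a trio can be sketched as follows. This is a simplified illustration and not the GeneTalk implementation: the record layout, the gene and variant names, and the encoding of genotypes as alternate-allele counts (0, 1, 2) are all assumptions.

```python
from collections import defaultdict

def compound_het_genes(variants):
    """Return genes in which the affected child carries at least one heterozygous
    variant inherited only from the mother and one inherited only from the father.

    Each variant is a dict with keys 'gene', 'id', 'child', 'mother', 'father';
    genotype values are alternate-allele counts (0 = ref, 1 = het, 2 = hom alt).
    """
    by_gene = defaultdict(list)
    for v in variants:
        if v["child"] == 1:                      # heterozygous in the child
            by_gene[v["gene"]].append(v)

    hits = {}
    for gene, vs in by_gene.items():
        maternal = [v for v in vs if v["mother"] == 1 and v["father"] == 0]
        paternal = [v for v in vs if v["father"] == 1 and v["mother"] == 0]
        pairs = [(m["id"], p["id"]) for m in maternal for p in paternal]
        if pairs:                                # one hit on each parental allele
            hits[gene] = pairs
    return hits

# Hypothetical trio data: GENE_A has one maternal and one paternal variant,
# GENE_B has two variants that both come from the mother.
trio = [
    {"gene": "GENE_A", "id": "a1", "child": 1, "mother": 1, "father": 0},
    {"gene": "GENE_A", "id": "a2", "child": 1, "mother": 0, "father": 1},
    {"gene": "GENE_B", "id": "b1", "child": 1, "mother": 1, "father": 0},
    {"gene": "GENE_B", "id": "b2", "child": 1, "mother": 1, "father": 0},
]
candidates = compound_het_genes(trio)
```

Only GENE_A survives the filter here. Requiring the same pair in an affected sibling would be one way to narrow the list further, in line with the abstract's observation that each additional pedigree member reduces the number of candidates.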
Affiliation(s)
- Peggy Sabri
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Na Zhu
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Verena Heinrich
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Stefan Mundlos
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Dmitri Parkhomchuk
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Peter M. Krawitz
- Institute for Medical Genetics and Human Genetics, Charité Universitätsmedizin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
|
21
|
Xu R, Li L, Wang Q. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature. Bioinformatics 2013; 29:2186-94. [PMID: 23828786 DOI: 10.1093/bioinformatics/btt359] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease-phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease-manifestation (D-M) pairs (one specific type of disease-phenotype relationship) from the wide body of published biomedical literature. DATA AND METHODS Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M-specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. RESULTS In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. CONCLUSIONS The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. AVAILABILITY http://nlp.case.edu/public/data/DMPatternUMLS/
Affiliation(s)
- Rong Xu
- Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, USA.
|
22
|
Nie Y, Yu J. Mining breast cancer genes with a network based noise-tolerant approach. BMC SYSTEMS BIOLOGY 2013; 7:49. [PMID: 23799982 PMCID: PMC3702465 DOI: 10.1186/1752-0509-7-49] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/28/2012] [Accepted: 06/21/2013] [Indexed: 12/22/2022]
Abstract
BACKGROUND Mining novel breast cancer genes is an important task in breast cancer research. Many approaches prioritize candidate genes based on their similarity to known cancer genes, usually by integrating multiple data sources. However, different types of data often contain varying degrees of noise. For effective data integration, it's important to design methods that work robustly with respect to noise. RESULTS Gene Ontology (GO) annotations were often utilized in cancer gene mining works. However, the vast majority of GO annotations were computationally derived, thus not completely accurate. A set of genes annotated with breast cancer enriched GO terms was adopted here as a set of source data with realistic noise. A novel noise tolerant approach was proposed to rank candidate breast cancer genes using noisy source data within the framework of a comprehensive human Protein-Protein Interaction (PPI) network. Performance of the proposed method was quantitatively evaluated by comparing it with the more established random walk approach. Results showed that the proposed method exhibited better performance in ranking known breast cancer genes and higher robustness against data noise than the random walk approach. When noise started to increase, the proposed method was able to maintain relatively stable performance, while the random walk approach showed drastic performance decline; when noise increased to a large extent, the proposed method was still able to achieve better performance than random walk did. CONCLUSIONS A novel noise tolerant method was proposed to mine breast cancer genes. Compared to the well established random walk approach, it showed better performance in correctly ranking cancer genes and worked robustly with respect to noise within source data. To the best of our knowledge, it's the first such effort to quantitatively analyze noise tolerance between different breast cancer gene mining methods. 
The sorted gene list can be valuable for breast cancer research. The proposed quantitative noise analysis method may also prove useful for other data integration efforts. It is hoped that the current work can lead to more discussions about influence of data noise on different computational methods for mining disease genes.
Affiliation(s)
- Yaling Nie
- National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
|
23
|
Emran NA, Embury S, Missier P, Ahmad N. Reference Architectures to Measure Data Completeness across Integrated Databases. INTELLIGENT INFORMATION AND DATABASE SYSTEMS 2013:216-225. [DOI: 10.1007/978-3-642-36546-1_23] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 09/01/2023]
|
24
|
Emran NA, Embury S, Missier P, Isa MNM, Muda AK. Measuring Data Completeness for Microbial Genomics Database. INTELLIGENT INFORMATION AND DATABASE SYSTEMS 2013:186-195. [DOI: 10.1007/978-3-642-36546-1_20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 09/01/2023]
|
25
|
Gonçalves JP, Francisco AP, Moreau Y, Madeira SC. Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores. PLoS One 2012. [PMID: 23185389 PMCID: PMC3501465 DOI: 10.1371/journal.pone.0049634] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022] Open
Abstract
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. 
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
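Heat diffusion ranking on a weighted association network can be sketched with a generic graph heat kernel. The abstract does not give the exact formulation used in the paper, so the combinatorial Laplacian, the diffusion time t = 0.5, the edge weights and the function name below are all assumptions.

```python
import numpy as np

def heat_diffusion_scores(adj, seeds, t=0.5):
    """Diffuse unit heat from the seed genes over a weighted, undirected network.

    Computes exp(-t * L) @ h0, where L = D - A is the graph Laplacian and h0
    places one unit of heat on every seed.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)          # L is symmetric
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    h0 = np.zeros(adj.shape[0])
    h0[list(seeds)] = 1.0
    return kernel @ h0

# Hypothetical confidence-weighted network: gene 0 is the known disease gene,
# strongly linked to gene 1 (weight 1.0) and weakly to gene 2 (weight 0.2);
# gene 3 is reachable only through gene 2.
adj = np.array([[0.0, 1.0, 0.2, 0.0],
                [1.0, 0.0, 0.0, 0.0],
                [0.2, 0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])
heat = heat_diffusion_scores(adj, seeds=[0])
```

Because the kernel spreads heat along every path rather than only along direct links or shortest paths, gene 3 receives a nonzero score despite having no direct edge to the seed. This corresponds to the "full topology" property the abstract emphasises.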
Affiliation(s)
- Joana P. Gonçalves
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- * E-mail: (JPG); (SCM)
- Alexandre P. Francisco
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- Yves Moreau
- Electrical Engineering Department, Katholieke Universiteit Leuven, Leuven, Belgium
- Sara C. Madeira
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- * E-mail: (JPG); (SCM)
|
26
|
Andrade-Navarro MA. Mining the literature: new methods to exploit keyword profiles. Genome Med 2012; 4:81. [PMID: 23114100 PMCID: PMC3580450 DOI: 10.1186/gm382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Bibliographic records in the PubMed database of biomedical literature are annotated with Medical Subject Headings (MeSH) by curators, which summarize the content of the articles. Two recent publications explain how to generate profiles of MeSH terms for a set of bibliographic records and to use them to define any given concept by its associated literature. These concepts can then be related by their keyword profiles, and this can be used, for example, to detect new associations between genes and inherited diseases. See related research articles: http://www.biomedcentral.com/1471-2105/13/249/abstract and http://genomemedicine.com/content/4/9/75/abstract
|
27
|
Börnigen D, Tranchevent LC, Bonachela-Capdevila F, Devriendt K, De Moor B, De Causmaecker P, Moreau Y. An unbiased evaluation of gene prioritization tools. Bioinformatics 2012; 28:3081-8. [PMID: 23047555 DOI: 10.1093/bioinformatics/bts581] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Gene prioritization aims at identifying the most promising candidate genes among a large pool of candidates, so as to maximize the yield and biological relevance of further downstream validation experiments and functional studies. During the past few years, several gene prioritization tools have been defined, and some of them have been implemented and made available through freely available web tools. In this study, we aim at comparing the predictive performance of eight publicly available prioritization tools on novel data. We have performed an analysis in which 42 recently reported disease-gene associations from literature are used to benchmark these tools before the underlying databases are updated. RESULTS Cross-validation on retrospective data provides performance estimates that are likely to be overoptimistic because some of the data sources are contaminated with knowledge from disease-gene association. Our approach mimics a novel discovery more closely and thus provides more realistic performance estimates. There are, however, marked differences, and tools that rely on more advanced data integration schemes appear more powerful. CONTACT yves.moreau@esat.kuleuven.be SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniela Börnigen
- Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven, Leuven, Belgium
|
28
|
Cheung WA, Ouellette BF, Wasserman WW. Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles. Genome Med 2012; 4:75. [PMID: 23021552 PMCID: PMC3580445 DOI: 10.1186/gm376] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 04/28/2012] [Revised: 09/11/2012] [Accepted: 09/28/2012] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND MEDLINE®/PubMed® currently indexes over 18 million biomedical articles, providing unprecedented opportunities and challenges for text analysis. Using Medical Subject Heading Over-representation Profiles (MeSHOPs), an entity of interest can be robustly summarized, quantitatively identifying associated biomedical terms and predicting novel indirect associations. METHODS A procedure is introduced for quantitative comparison of MeSHOPs derived from a group of MEDLINE® articles for a biomedical topic (for example, articles for a specific gene or disease). Similarity scores are computed to compare MeSHOPs of genes and diseases. RESULTS Similarity scores successfully infer novel associations between diseases and genes. The number of papers addressing a gene or disease has a strong influence on predicted associations, revealing an important bias for gene-disease relationship prediction. Predictions derived from comparisons of MeSHOPs achieve a mean 8% AUC improvement in the identification of gene-disease relationships compared to gene-independent baseline properties. CONCLUSIONS MeSHOP comparisons are demonstrated to provide predictive capacity for novel relationships between genes and human diseases. We demonstrate the impact of literature bias on the performance of gene-disease prediction methods. MeSHOPs provide a rich source of annotation to facilitate relationship discovery in biomedical informatics.
Affiliation(s)
- Warren A Cheung
- Bioinformatics Graduate Program, Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, University of British Columbia, 980 W. 28th Ave, Vancouver, V5Z 4H4, Canada
- Bf Francis Ouellette
- Department of Cells and Systems Biology, Ontario Institute for Cancer Research, University of Toronto, 101 College Street, Toronto, M5G 0A3, Canada
- Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, University of British Columbia, 980 W. 28th Ave, Vancouver, V5Z 4H4, Canada
|
29
|
Magger O, Waldman YY, Ruppin E, Sharan R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol 2012; 8:e1002690. [PMID: 23028288 PMCID: PMC3459874 DOI: 10.1371/journal.pcbi.1002690] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 10/15/2011] [Accepted: 07/28/2012] [Indexed: 01/07/2023] Open
Abstract
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
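One simple way to derive a tissue-specific network from a generic PPI network and expression data is to keep only interactions whose two partners are both expressed in the tissue. The abstract does not state how the integration was done, so this node-filtering rule, the expression threshold and all gene and tissue names below are assumptions for illustration only.

```python
def tissue_specific_network(ppi_edges, expression, tissue, threshold=1.0):
    """Keep PPI edges whose two proteins are both expressed in the given tissue.

    ppi_edges:  iterable of (gene_a, gene_b) pairs from a generic PPI network
    expression: dict mapping gene -> {tissue: expression level}
    threshold:  minimum level at which a gene counts as expressed (assumed value)
    """
    expressed = {
        gene for gene, levels in expression.items()
        if levels.get(tissue, 0.0) >= threshold
    }
    return [(a, b) for a, b in ppi_edges if a in expressed and b in expressed]

# Hypothetical generic network and expression table.
generic_ppi = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
expression = {
    "G1": {"heart": 5.0, "liver": 0.1},
    "G2": {"heart": 3.2, "liver": 2.4},
    "G3": {"heart": 0.0, "liver": 4.1},
    "G4": {"liver": 1.5},
}
heart_ppi = tissue_specific_network(generic_ppi, expression, "heart")
liver_ppi = tissue_specific_network(generic_ppi, expression, "liver")
```

The two resulting edge lists differ, so a prioritization method run on each would see a different neighbourhood around the same disease genes.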
Affiliation(s)
- Oded Magger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
30
|
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 2012; 13:523-36. [DOI: 10.1038/nrg3253] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/14/2022]
|
31
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|
32
|
Gilissen C, Hoischen A, Brunner HG, Veltman JA. Disease gene identification strategies for exome sequencing. Eur J Hum Genet 2012; 20:490-7. [PMID: 22258526 PMCID: PMC3330229 DOI: 10.1038/ejhg.2011.258] [Citation(s) in RCA: 310] [Impact Index Per Article: 25.8] [Received: 08/24/2011] [Revised: 10/31/2011] [Accepted: 12/07/2011] [Indexed: 12/16/2022] Open
Abstract
Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major challenge, and novel variant prioritization strategies are required. The choice of these strategies depends on the availability of well-phenotyped patients and family members, the mode of inheritance, the severity of the disease and its population frequency. In this review, we discuss the current strategies for Mendelian disease gene identification by exome resequencing. We conclude that exome strategies are successful and identify new Mendelian disease genes in approximately 60% of the projects. Improvements in bioinformatics as well as in sequencing technology will likely increase the success rate even further. Exome sequencing is likely to become the most commonly used tool for Mendelian disease gene identification for the coming years.
Affiliation(s)
- Christian Gilissen
- Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences and Institute for Genetic and Metabolic Disorders, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands.
|
33
|
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
|
34
|
Li X, Li C, Shang D, Li J, Han J, Miao Y, Wang Y, Wang Q, Li W, Wu C, Zhang Y, Li X, Yao Q. The implications of relationships between human diseases and metabolic subpathways. PLoS One 2011; 6:e21131. [PMID: 21695054 PMCID: PMC3117879 DOI: 10.1371/journal.pone.0021131] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 01/26/2011] [Accepted: 05/20/2011] [Indexed: 01/08/2023] Open
Abstract
One of the challenging problems in the etiology of diseases is to explore the relationships between initiation and progression of diseases and abnormalities in local regions of metabolic pathways. To gain insight into such relationships, we applied the “k-clique” subpathway identification method to all disease-related gene sets. For each disease, the disease risk regions of metabolic pathways were then identified and considered as subpathways associated with the disease. We finally built a disease-metabolic subpathway network (DMSPN). Through analyses based on network biology, we found that a few subpathways, such as that of cytochrome P450, were highly connected with many diseases, and most belonged to fundamental metabolisms, suggesting that abnormalities of fundamental metabolic processes tend to cause more types of diseases. According to the categories of diseases and subpathways, we tested the clustering phenomenon of diseases and metabolic subpathways in the DMSPN. The results showed that both disease nodes and subpathway nodes displayed slight clustering phenomenon. We also tested correlations between network topology and genes within disease-related metabolic subpathways, and found that within a disease-related subpathway in the DMSPN, the ratio of disease genes and the ratio of tissue-specific genes significantly increased as the number of diseases caused by the subpathway increased. Surprisingly, the ratio of essential genes significantly decreased and the ratio of housekeeping genes remained relatively unchanged. Furthermore, the coexpression levels between disease genes and other types of genes were calculated for each subpathway in the DMSPN. 
The results indicated that those genes intensely influenced by disease genes, including essential genes and tissue-specific genes, might be significantly associated with the disease diversity of subpathways, suggesting that different kinds of genes within a disease-related subpathway may play significantly differential roles on the diversity of diseases caused by the corresponding subpathway.
Affiliation(s)
- Xia Li
- Bio-Pharmaceutical Key Laboratory of Heilongjiang Province, and College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
|
35
|
Abstract
Despite increasing sequencing capacity, genetic disease investigation still frequently results in the identification of loci containing multiple candidate disease genes that need to be tested for involvement in the disease. This process can be expedited by prioritizing the candidates prior to testing. Over the last decade, a large number of computational methods and tools have been developed to assist the clinical geneticist in prioritizing candidate disease genes. In this chapter, we give an overview of computational tools that can be used for this purpose, all of which are freely available over the web.
Affiliation(s)
- Martin Oti
- Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, 2010, Darlinghurst, NSW, Australia.
|
36
|
Lacson R, Mbagwu M, Yousif H, Ohno-Machado L. Assessing the quality of annotations in asthma gene expression experiments. BMC Bioinformatics 2010; 11 Suppl 9:S8. [PMID: 21044366 PMCID: PMC2967749 DOI: 10.1186/1471-2105-11-s9-s8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
Background: The amount of data deposited in the Gene Expression Omnibus (GEO) has expanded significantly. It is important to ensure that these data are properly annotated with clinical data and descriptions of experimental conditions so that they can be useful for future analysis. This study assesses the adequacy of documented asthma markers in GEO. Three objective measures (coverage, consistency and association) were used for evaluation of annotations contained in 17 asthma studies. Results: There were 918 asthma samples with 20,640 annotated markers. Of these markers, only 10,419 had documented values (50% coverage). In one study carefully examined for consistency, there were discrepancies in drug name usage, with brand name and generic name used in different sections to refer to the same drug. Annotated markers showed adequate association with other relevant variables (i.e., the use of medication only when its corresponding disease state was present). Conclusions: There is inadequate variable coverage within GEO and usage of terms lacks consistency. Association between relevant variables, however, was adequate.
Affiliation(s)
- Ronilda Lacson
- Decision Systems Group, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA.
|