1
Yu X, Lai S, Chen H, Chen M. Protein–protein interaction network with machine learning models and multiomics data reveal potential neurodegenerative disease-related proteins. Hum Mol Genet 2020; 29:1378-1387. [DOI: 10.1093/hmg/ddaa065] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/14/2019] [Revised: 12/22/2019] [Accepted: 04/01/2020] [Indexed: 12/18/2022]
Abstract
Research on protein–protein interactions in several model organisms has been accumulating since the development of high-throughput experimental technologies and computational methods. The protein–protein interaction network (PPIN) allows biological processes to be examined in a systematic manner and has already been used to predict potential disease-related proteins and drug targets. Based on the topological characteristics of the PPIN, we investigated the application of the random forest classification algorithm to predict proteins that may cause neurodegenerative disease, a set of pathological changes characterized by protein malfunction. By integrating multiomics data, we further demonstrated the validity of our machine learning model and narrowed the prediction results down to several hub proteins that play essential roles in the PPIN. The novel insights into the pathogenesis of neurodegeneration provided by this computational study can indicate promising directions for future experimental research.
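As a rough illustration of what a "topological characteristic" and a "hub protein" can mean in a PPIN, the sketch below computes node degree from an undirected edge list and ranks proteins by it. The edge list, the protein names `P1` to `P5`, and the helper names `degree_table` and `top_hubs` are all hypothetical; the study's actual feature set and random forest model are not reproduced here.

```python
from collections import defaultdict


def degree_table(edges):
    """Return {protein: degree} for an undirected list of interactions."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_hubs(edges, k=1):
    """Return the k proteins with the highest degree (ties broken by name)."""
    degrees = degree_table(edges)
    return sorted(degrees, key=lambda p: (-degrees[p], p))[:k]


# Hypothetical toy network, not data from the study.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

if __name__ == "__main__":
    print(degree_table(toy_edges))
    print(top_hubs(toy_edges, k=2))
```

Degree is only one of many possible topological features; a per-protein table of such features is the kind of input a classifier like a random forest could be trained on.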
Affiliation(s)
- Xinjian Yu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Siqi Lai
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Hongjun Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
2
Barman RK, Mukhopadhyay A, Maulik U, Das S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinformatics 2019; 20:736. [PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023]
Abstract
Background: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identifying host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets.
Results: We developed a machine learning-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among the methods tested, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six of the 100 highly predicted infectious disease-associated genes from our study were also found in experimentally verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared with one or more other diseases, such as cancer and metabolic and immune-related diseases.
Conclusions: To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases. However, our results indicated that for small datasets, the advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for the prediction of infectious disease-associated host genes. The significant overlap of infectious disease with cancer and metabolic disease in the disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identifying novel candidate genes associated with infectious diseases would help to further explain disease pathogenesis and to develop novel therapeutics.
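For readers less familiar with the three metrics quoted in this abstract, the sketch below shows their standard definitions in terms of confusion-matrix counts. The counts used in the example are hypothetical and are not taken from the study; the function name `classification_metrics` is likewise only illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: disease-associated genes predicted correctly / missed.
    tn/fp: non-associated genes predicted correctly / wrongly flagged.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / total,    # fraction of all calls that are correct
        "sensitivity": tp / (tp + fn),    # true positive rate (recall)
        "specificity": tn / (tn + fp),    # true negative rate
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = classification_metrics(tp=85, fp=10, tn=90, fn=15)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters when the positive and negative classes differ in size, since accuracy alone can hide poor performance on the smaller class.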
Affiliation(s)
- Ranjan Kumar Barman
- Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Ujjwal Maulik
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Santasabuj Das
- Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, P-33, C.I.T. Road Scheme XM, Beliaghata-700010, Kolkata, West Bengal, India
3
Jamal S, Goyal S, Shanker A, Grover A. Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes. BMC Genomics 2016; 17:807. [PMID: 27756223 PMCID: PMC5070370 DOI: 10.1186/s12864-016-3108-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/29/2016] [Accepted: 09/20/2016] [Indexed: 01/01/2023]
Abstract
Background: Alzheimer’s disease (AD) is a complex progressive neurodegenerative disorder commonly characterized by short-term memory loss. Presently, no effective therapeutic treatments exist that can completely cure this disease. The cause of Alzheimer’s is still unclear; however, genetic factors are among the major factors involved in AD pathogenesis, and around 70% of the risk of the disease is assumed to be due to the large number of genes involved. Although genetic association studies have revealed a number of potential AD susceptibility genes, there is still a need to identify further AD-associated genes and therapeutic targets in order to better understand the disease-causing mechanisms of Alzheimer’s and to develop effective AD therapeutics.
Results: In the present study, we used a machine learning approach to identify candidate AD-associated genes by integrating topological properties of the genes from protein-protein interaction networks, sequence features and functional annotations. We also used a molecular docking approach to screen known anti-Alzheimer drugs against the newly predicted probable targets of AD and observed that an investigational drug, AL-108, had high affinity for the majority of the possible therapeutic targets. Furthermore, we performed molecular dynamics simulations and MM/GBSA calculations on the docked complexes to validate our preliminary findings.
Conclusions: To the best of our knowledge, this is the first comprehensive study of its kind to identify putative Alzheimer-associated genes using machine learning approaches, and we propose that such computational studies can improve our understanding of the core etiology of AD, which could lead to the development of effective anti-Alzheimer drugs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3108-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Salma Jamal
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067, India; Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, 304022, India
- Sukriti Goyal
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067, India; Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, 304022, India
- Asheesh Shanker
- Bioinformatics Programme, Centre for Biological Sciences, Central University of South Bihar, BIT Campus, Patna, Bihar, India
- Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067, India
4
Barresi S, Niceta M, Alfieri P, Brankovic V, Piccini G, Bruselles A, Barone MR, Cusmai R, Tartaglia M, Bertini E, Zanni G. Mutations in the IRBIT domain of ITPR1 are a frequent cause of autosomal dominant nonprogressive congenital ataxia. Clin Genet 2016; 91:86-91. [PMID: 27062503 DOI: 10.1111/cge.12783] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/15/2016] [Revised: 04/01/2016] [Accepted: 04/01/2016] [Indexed: 01/23/2023]
Abstract
Congenital ataxias are nonprogressive neurological disorders characterized by neonatal hypotonia, developmental delay and ataxia, variably associated with intellectual disability and other neurological or extraneurological features. We performed trio-based whole-exome sequencing of 12 families with congenital cerebellar and/or vermis atrophy, in parallel with targeted next-generation sequencing of known ataxia genes (CACNA1A, ITPR1, KCNC3, ATP2B3 and GRM1) in 12 additional patients with a similar phenotype. Novel pathological mutations of ITPR1 (inositol 1,4,5-trisphosphate receptor, type 1) were found in seven patients from four families (4/24, ∼16.8%), all localized in the IRBIT (inositol triphosphate receptor binding protein) domain, which plays an essential role in the regulation of neuronal plasticity and development. Our study expands the mutational spectrum of ITPR1-related congenital ataxia and indicates that ITPR1 gene screening should be implemented in this subgroup of ataxias.
Affiliation(s)
- S Barresi
- Department of Neurosciences, Unit of Molecular Medicine for Neuromuscular and Neurodegenerative Disorders, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy; Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- M Niceta
- Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- P Alfieri
- Department of Neurosciences, Child Neuropsychiatry, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- V Brankovic
- Clinic for Child Neurology and Psychiatry, Medical Faculty, University of Belgrade, Belgrade, Serbia
- G Piccini
- Department of Neurosciences, Child Neuropsychiatry, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- A Bruselles
- Department of Hematology, Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- M R Barone
- Centro ambulatoriale di Riabilitazione, Fondazione Betania Onlus, Catanzaro, Italy
- R Cusmai
- Department of Neurosciences, Neurology, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- M Tartaglia
- Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- E Bertini
- Department of Neurosciences, Unit of Molecular Medicine for Neuromuscular and Neurodegenerative Disorders, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- G Zanni
- Department of Neurosciences, Unit of Molecular Medicine for Neuromuscular and Neurodegenerative Disorders, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
5
An O, Dall'Olio GM, Mourikis TP, Ciccarelli FD. NCG 5.0: updates of a manually curated repository of cancer genes and associated properties from cancer mutational screenings. Nucleic Acids Res 2015; 44:D992-9. [PMID: 26516186 PMCID: PMC4702816 DOI: 10.1093/nar/gkv1123] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 09/14/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. Due to the increasing amount of cancer genomic data, we have introduced a more robust procedure to extract cancer genes from published cancer mutational screenings and two curators independently reviewed each publication. NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13 315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs).
Affiliation(s)
- Omer An
- Division of Cancer Studies, King's College London, London SE11UL, UK
- Thanos P Mourikis
- Division of Cancer Studies, King's College London, London SE11UL, UK