1
Cui H, Srinivasan S, Gao Z, Korkin D. The Extent of Edgetic Perturbations in the Human Interactome Caused by Population-Specific Mutations. Biomolecules 2023; 14:40. [PMID: 38254640 PMCID: PMC11154503 DOI: 10.3390/biom14010040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 11/30/2023] [Accepted: 12/03/2023] [Indexed: 01/24/2024] Open
Abstract
Until recently, efforts in population genetics have been focused primarily on people of European ancestry. To attenuate this bias, global population studies, such as the 1000 Genomes Project, have revealed differences in genetic variation across ethnic groups. How many of these differences can be attributed to population-specific traits? To answer this question, the mutation data must be linked with functional outcomes. A new "edgotype" concept has been proposed, which emphasizes the interaction-specific, "edgetic", perturbations caused by mutations in the interacting proteins. In this work, we performed systematic in silico edgetic profiling of ~50,000 non-synonymous SNVs (nsSNVs) from the 1000 Genomes Project by leveraging our semi-supervised learning approach SNP-IN tool on a comprehensive set of over 10,000 protein interaction complexes. We interrogated the functional roles of the variants and their impact on the human interactome and compared the results with the pathogenic variants disrupting PPIs in the same interactome. Our results demonstrated that a considerable number of nsSNVs from healthy populations could rewire the interactome. We also showed that the proteins enriched with interaction-disrupting mutations were associated with diverse functions and had implications in a broad spectrum of diseases. Further analysis indicated that distinct gene edgetic profiles among major populations could shed light on the molecular mechanisms behind the population phenotypic variances. Finally, the network analysis revealed that the disease-associated modules surprisingly harbored a higher density of interaction-disrupting mutations from healthy populations. The variation in the cumulative network damage within these modules could potentially account for the observed disparities in disease susceptibility, which are distinctly specific to certain populations. 
Our work demonstrates the feasibility of a large-scale in silico edgetic study, and reveals insights into the orchestrated play of population-specific mutations in the human interactome.
Affiliation(s)
- Hongzhu Cui
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Chromatography and Mass Spectrometry Division, Thermo Fisher Scientific, San Jose, CA 95134, USA
- Suhas Srinivasan
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Program in Epithelial Biology, Stanford School of Medicine, Stanford, CA 94305, USA
  - Center for Personal Dynamic Regulomes, Stanford School of Medicine, Stanford, CA 94305, USA
- Ziyang Gao
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
2
Fernando PC, Mabee PM, Zeng E. Protein-protein interaction network module changes associated with the vertebrate fin-to-limb transition. Sci Rep 2023; 13:22594. [PMID: 38114646 PMCID: PMC10730527 DOI: 10.1038/s41598-023-50050-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] Open
Abstract
Evolutionary phenotypic transitions, such as the fin-to-limb transition in vertebrates, result from modifications in related proteins and their interactions, often in response to changing environment. Identifying these alterations in protein networks is crucial for a more comprehensive understanding of these transitions. However, previous research has not attempted to compare protein-protein interaction (PPI) networks associated with evolutionary transitions, and most experimental studies concentrate on a limited set of proteins. Therefore, the goal of this work was to develop a network-based platform for investigating the fin-to-limb transition using PPI networks. Quality-enhanced protein networks, constructed by integrating PPI networks with anatomy ontology data, were leveraged to compare protein modules for paired fins (pectoral fin and pelvic fin) of fishes (zebrafish) to those of the paired limbs (forelimb and hindlimb) of mammals (mouse). This also included prediction of novel protein candidates and their validation by enrichment and homology analyses. Hub proteins such as shh and bmp4, which are crucial for module stability, were identified, and their changing roles throughout the transition were examined. Proteins with preserved roles during the fin-to-limb transition were more likely to be hub proteins. This study also addressed hypotheses regarding the role of non-preserved proteins associated with the transition.
Affiliation(s)
- Pasan C Fernando
  - Department of Plant Sciences, University of Colombo, Colombo, Sri Lanka
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle, 1625 38th St. #100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Departments of Preventive & Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Departments of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Departments of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
3
Melograna F, Li Z, Galazzo G, van Best N, Mommers M, Penders J, Stella F, Van Steen K. Edge and modular significance assessment in individual-specific networks. Sci Rep 2023; 13:7868. [PMID: 37188794 PMCID: PMC10185658 DOI: 10.1038/s41598-023-34759-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 05/07/2023] [Indexed: 05/17/2023] Open
Abstract
Individual-specific networks, defined as networks of nodes and connecting edges that are specific to an individual, are promising tools for precision medicine. When such networks are biological, interpretation of functional modules at an individual level becomes possible. An under-investigated problem is the relevance or "significance" assessment of each individual-specific network. This paper proposes novel edge and module significance assessment procedures for weighted and unweighted individual-specific networks. Specifically, we propose a modular Cook's distance using a method that involves iterative modeling of one edge versus all the others within a module. Two procedures assessing changes between using all individuals and using all individuals but leaving one individual out (LOO) are proposed as well (LOO-ISN, MultiLOO-ISN), relying on empirically derived edges. We compare our proposals to competitors, including adaptations of the OPTICS, kNN, and Spoutlier methods, in an extensive simulation study modeled on real-life scenarios for gene co-expression and microbial interaction networks. Results show the advantages of performing modular versus edge-wise significance assessments for individual-specific networks. Furthermore, modular Cook's distance is among the top performers across all considered simulation settings. Finally, identifying outlying individuals on the basis of their individual-specific networks is meaningful for precision medicine purposes, as confirmed by network analysis of microbiome abundance profiles.
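The leave-one-out idea in this abstract (compare an edge estimated from all individuals with the same edge estimated after dropping one individual) can be illustrated with a minimal sketch. It assumes Pearson correlation as the edge weight and uses hypothetical function names and toy data; it is not the paper's LOO-ISN or MultiLOO-ISN procedure, which also includes a significance assessment step.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def loo_edge_changes(x, y):
    """For each individual i, return corr(all) - corr(all but i).

    Assumption: the edge between two variables is their Pearson
    correlation across individuals (hypothetical choice for this sketch).
    """
    full = pearson(x, y)
    changes = []
    for i in range(len(x)):
        x_loo = x[:i] + x[i + 1:]
        y_loo = y[:i] + y[i + 1:]
        changes.append(full - pearson(x_loo, y_loo))
    return changes

# Toy data: the first five individuals follow y = 2x; the last one does not.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0, 0.0]

changes = loo_edge_changes(gene_a, gene_b)
# Index of the individual whose removal changes the edge the most.
outlier = max(range(len(changes)), key=lambda i: abs(changes[i]))
```

A large absolute change flags an individual whose data strongly shapes the population-level edge; deciding how large is "significant" is the problem the paper addresses and is outside this sketch.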
Affiliation(s)
- Federico Melograna
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Zuqi Li
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Gianluca Galazzo
  - School of Nutrition and Translational Research in Metabolism (NUTRIM), Department of Medical Microbiology Infectious Diseases and Infection Prevention, Maastricht University Medical Center+, Maastricht, The Netherlands
- Niels van Best
  - Institute of Medical Microbiology, RWTH University Hospital Aachen, RWTH University, Aachen, Germany
  - Department of Epidemiology, Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands
- Monique Mommers
  - Department of Epidemiology, Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands
- John Penders
  - School of Nutrition and Translational Research in Metabolism (NUTRIM), Department of Medical Microbiology Infectious Diseases and Infection Prevention, Maastricht University Medical Center+, Maastricht, The Netherlands
  - Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands
- Fabio Stella
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, 20126, Milan, Italy
- Kristel Van Steen
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
4
Jiang L, Hao S, Lin L, Gao X, Xu J. fRNC: Uncovering the dynamic and condition-specific RBP-ncRNA circuits from multi-omics data. Comput Struct Biotechnol J 2023; 21:2276-2285. [PMID: 37035550 PMCID: PMC10073992 DOI: 10.1016/j.csbj.2023.03.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 03/15/2023] [Accepted: 03/21/2023] [Indexed: 03/30/2023] Open
Abstract
RNA binding protein (RBP) and non-coding RNA (ncRNA) interaction networks are increasingly recognized as a main mechanism of gene regulation and are tightly associated with cellular malfunction and disease. Here, we present fRNC, a systems biology tool to uncover the dynamic spectrum of RBP-ncRNA circuits (RNC) by integrating transcriptomics, interactomics and proteomics data. fRNC constructs the RBP-ncRNA network from CLIP-seq or PARE experiments. Given node and edge scores from differential analysis of expression data, it finds the globally maximum-scoring RNC of significant RBPs and ncRNAs. Alternatively, it can capture the locally maximum-scoring RNC starting from user-defined nodes using a greedy search. Compared with existing tools, fRNC detects more accurate and robust sub-networks while remaining scalable. As shown in the cases of esophageal carcinoma, breast cancer and Alzheimer's disease, fRNC enables users to analyze the collective behaviors of RBPs and their interacting ncRNAs, and reveals novel insights into disease-associated processes. The fRNC R package is available at https://github.com/BioinformaticsSTU/fRNC.
5
Alcalá-Corona SA, Sandoval-Motta S, Espinal-Enríquez J, Hernández-Lemus E. Modularity in Biological Networks. Front Genet 2021; 12:701331. [PMID: 34594357 PMCID: PMC8477004 DOI: 10.3389/fgene.2021.701331] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/27/2021] [Accepted: 08/23/2021] [Indexed: 01/13/2023] Open
Abstract
Network modeling, from the ecological to the molecular scale, has become an essential tool for studying the structure, dynamics and complex behavior of living systems. Graph representations of the relationships between biological components open up a wide variety of methods for discovering the mechanistic and functional properties of biological systems. Many biological networks are organized into a modular structure, so methods to discover such modules are essential if we are to understand the biological system as a whole. However, most of the methods used in biology to this end have limited applicability, as they are very specific to the system they were developed for. Conversely, from the statistical physics and network science perspective, graph modularity has been studied theoretically and several methods of a very general nature have been developed. It is our perspective that, particularly for the modularity detection problem, biology and theoretical physics/network science are less connected than they should be. The central goal of this review is to provide the necessary background and present the most applicable and pertinent methods for community detection in a way that motivates their further usage in biological research.
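As a concrete reference point for the graph modularity mentioned in this abstract, the following minimal sketch computes the standard Newman-Girvan modularity Q of a given partition of an undirected, unweighted graph, Q = sum over communities of (L_c / m - (d_c / 2m)^2), where L_c is the number of internal edges, d_c the total degree of the community, and m the number of edges. The function name and the toy graph are assumptions made for illustration; the review covers many detection methods that this sketch does not implement.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: list of sets of nodes that together cover every node.
    """
    m = len(edges)
    community_of = {
        node: idx for idx, nodes in enumerate(communities) for node in nodes
    }
    internal = [0] * len(communities)    # L_c: edges inside community c
    degree_sum = [0] * len(communities)  # d_c: summed degree of community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_split = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])     # 5/14
q_single = modularity(toy_edges, [{0, 1, 2, 3, 4, 5}])      # 0
```

Splitting the toy graph at the bridge scores higher than leaving it as one community, which is the behavior modularity-maximizing methods exploit.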
Affiliation(s)
- Sergio Antonio Alcalá-Corona
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Santiago Sandoval-Motta
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - National Council on Science and Technology, Mexico City, Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
6
Das AB. Lung disease network reveals impact of comorbidity on SARS-CoV-2 infection and opportunities of drug repurposing. BMC Med Genomics 2021; 14:226. [PMID: 34535131 PMCID: PMC8447809 DOI: 10.1186/s12920-021-01079-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/29/2020] [Accepted: 09/08/2021] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Higher mortality of COVID-19 patients with lung disease is a formidable challenge for the health care system. The genetic association between COVID-19 and various lung disorders must be understood to comprehend the molecular basis of comorbidity and accelerate drug development. METHODS A lung tissue-specific neighborhood network of human targets of SARS-CoV-2 was constructed. This network was integrated with lung diseases to build a disease-gene and disease-disease association network. A network-based toolset was used to identify the overlapping disease modules and drug targets. The functional protein modules were identified using community detection algorithms together with biological process and pathway enrichment analysis. RESULTS In total, 141 lung diseases were linked to a neighborhood network of SARS-CoV-2 targets, and 59 lung diseases were found to overlap topologically with the COVID-19 module. Topological overlap with various lung disorders allows repurposing of drugs used for these disorders to hit the closely associated COVID-19 module. Further analysis showed that functional protein-protein interaction modules in the lungs, substantially hijacked by SARS-CoV-2, are connected to several lung disorders. FDA-approved targets in the hijacked protein modules were identified; these can be hit by existing drugs to rescue the modules from virus possession. CONCLUSION Lung diseases are clustered with COVID-19 in the same network vicinity, indicating the potential threat for patients with respiratory diseases after SARS-CoV-2 infection. Pathobiological similarities between lung diseases and COVID-19 and clinical evidence suggest that shared molecular features are the probable reason for comorbidity. Network-based drug repurposing approaches can be applied to improve the clinical conditions of COVID-19 patients.
Affiliation(s)
- Asim Bikas Das
  - Department of Biotechnology, National Institute of Technology Warangal, Warangal, 506004, Telangana, India
7
Prasad K, AlOmar SY, Alqahtani SAM, Malik MZ, Kumar V. Brain Disease Network Analysis to Elucidate the Neurological Manifestations of COVID-19. Mol Neurobiol 2021; 58:1875-1893. [PMID: 33409839 PMCID: PMC7787249 DOI: 10.1007/s12035-020-02266-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 10/23/2020] [Accepted: 12/16/2020] [Indexed: 01/08/2023]
Abstract
Although COVID-19 largely causes respiratory complications, it can also lead to various extrapulmonary manifestations resulting in higher mortality, and these comorbidities pose a challenge to the health care system. Reports indicate that 30–60% of patients with COVID-19 suffer from neurological symptoms. To understand the molecular basis of the neurologic comorbidity in COVID-19 patients, we investigated the genetic association between COVID-19 and various brain disorders through a systems biology-based network approach and observed a remarkable resemblance. Our results showed 123 brain-related disorders that are associated with COVID-19 and form a high-density disease-disease network. The brain-disease-gene network revealed five highly clustered modules, demonstrating a greater complexity of COVID-19 infection. Moreover, we identified 35 hub proteins of the network, which were largely involved in the protein catabolic process, cell cycle, RNA metabolic process, and nuclear transport. Perturbing these hub proteins by drug repurposing could improve the clinical conditions in comorbidity. We anticipate that many other neurological manifestations will surface in COVID-19 patients in the near future. Thus, understanding the infection mechanisms of SARS-CoV-2 and associated comorbidity is a high priority to contain its short- and long-term effects on human health. Our network-based analysis strengthens the understanding of the molecular basis of the neurological manifestations observed in COVID-19 and also suggests drugs for repurposing.
Affiliation(s)
- Kartikay Prasad
  - Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, 201303, India
- Suliman Yousef AlOmar
  - Doping research chair, Department of Zoology, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia
- Md Zubbair Malik
  - School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
- Vijay Kumar
  - Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, 201303, India
8
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023] Open
Abstract
Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. 
Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
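Two of the semantic similarity measures named in this abstract, Resnik and Lin, have compact standard definitions: Resnik similarity is the information content (IC) of the most informative common ancestor of two terms, and Lin similarity normalizes that value by the summed IC of the two terms. The sketch below applies them to a tiny hypothetical ontology with made-up annotation frequencies. The term names, the frequencies and the function names are assumptions for illustration only; the study works with Uberon annotations and gene-level similarities, which this term-level sketch does not reproduce.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "appendage": [],
    "paired fin": ["appendage"],
    "pectoral fin": ["paired fin"],
    "pelvic fin": ["paired fin"],
}

# Hypothetical annotation probabilities p(term), including descendants.
PROBABILITY = {
    "appendage": 1.0,
    "paired fin": 0.5,
    "pectoral fin": 0.25,
    "pelvic fin": 0.25,
}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(term):
    """IC(t) = -log p(t); rarer terms are more informative."""
    return -math.log(PROBABILITY[term])

def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))

def lin(a, b):
    """Resnik similarity normalized by the ICs of both terms."""
    denom = information_content(a) + information_content(b)
    # Convention chosen for this sketch: return 0.0 when both ICs are zero.
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0

sim_resnik = resnik("pectoral fin", "pelvic fin")
sim_lin = lin("pectoral fin", "pelvic fin")
```

Lin similarity is bounded between 0 and 1, which makes it easier to combine with other edge weights than the unbounded Resnik score.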
Affiliation(s)
- Pasan C Fernando
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
9
Halder AK, Denkiewicz M, Sengupta K, Basu S, Plewczynski D. Aggregated network centrality shows non-random structure of genomic and proteomic networks. Methods 2020; 181-182:5-14. [PMID: 31740366 DOI: 10.1016/j.ymeth.2019.11.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/16/2019] [Revised: 11/02/2019] [Accepted: 11/08/2019] [Indexed: 11/25/2022] Open
Abstract
Network analysis is a powerful tool for modelling biological systems. We propose a new approach that integrates genomic interaction data at the population level with proteomic interaction data. In our approach, we use chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) data from the human genome to construct a set of genomic interaction networks, considering the natural partitioning of chromatin into chromatin contact domains (CCD). The genomic networks are then mapped onto proteomic interactions to create protein-protein interaction (PPI) subnetworks. Furthermore, the network-based topological properties of these proteomic subnetworks are investigated, namely closeness centrality, betweenness centrality and clustering coefficient. We statistically confirm that the networks identified by our method differ significantly from random networks in these network properties. Additionally, we identify one of the regions, namely chr6:32014923-33217929, as having an above-random concentration of single nucleotide polymorphisms (SNPs) related to autoimmune diseases. We then present it in the form of a meta-network, which includes multi-omic data: genomic contact sites (anchors), genes, proteins and SNPs. Using this example, we demonstrate that the created networks provide a valid mapping of genes to SNPs, expanding on the raw SNP dataset used.
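Two of the topological properties listed in this abstract, closeness centrality and the local clustering coefficient, can be computed with a few lines of standard-library Python. The sketch below uses the textbook definitions on an undirected, unweighted toy graph; the graph and function names are assumptions for illustration, betweenness centrality is omitted for brevity, and nothing here reproduces the paper's aggregation or randomization procedure.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    """(reachable nodes - 1) / sum of distances; standard on connected graphs."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def clustering(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

closeness_by_node = {n: closeness(toy_adj, n) for n in toy_adj}
clustering_by_node = {n: clustering(toy_adj, n) for n in toy_adj}
```

In the toy graph, node 2 is closest to everything else but has a low clustering coefficient because its pendant neighbor is not linked to the triangle.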
Affiliation(s)
- Anup Kumar Halder
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Michał Denkiewicz
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Kaustav Sengupta
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Dariusz Plewczynski
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Computer Science Department, University of California, 2063 Kemper Hall, One Shields Avenue, Davis, CA 95616-8562, United States
10
The Eminence of Co-Expressed Ties in Schizophrenia Network Communities. Data 2019. [DOI: 10.3390/data4040149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Exploring gene networks is crucial for identifying significant biological interactions occurring in a disease condition. These interactions can be identified by modeling the tie structure of networks. Such tie orientations are often detected within embedded community structures. However, most of the prevailing community detection methods are designed to capture information from nodes and their attributes, usually ignoring the ties. In this study, a modularity maximization algorithm is proposed based on a nonlinear representation of local tangent space alignment (LTSA). Initially, the tangent coordinates are computed locally to identify k-nearest neighbors across the genes. These local neighbors are further optimized by generating a nonlinear network embedding function for detecting gene communities based on eigenvector decomposition. Experimental results suggest that this algorithm detects gene modules with a better modularity index, 0.9256, than other traditional community detection algorithms. Furthermore, co-expressed genes across these communities are identified by discovering the characteristic tie structures. These detected ties are known to have substantial biological influence in the progression of schizophrenia, thereby signifying the influence of tie patterns in biological networks. This technique can be extended to other disease networks for detecting substantial gene “hotspots”.
11
Cui H, Srinivasan S, Korkin D. Enriching Human Interactome with Functional Mutations to Detect High-Impact Network Modules Underlying Complex Diseases. Genes (Basel) 2019; 10:E933. [PMID: 31731769 PMCID: PMC6895925 DOI: 10.3390/genes10110933] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/29/2019] [Revised: 11/04/2019] [Accepted: 11/11/2019] [Indexed: 11/16/2022] Open
Abstract
Rapid progress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the volumes of data already generated, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for a better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which enables the integration of genome-wide association studies (GWAS) and functional effects of mutations into the protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest functional impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have a stronger association with the studied disease. We expect our method to become a part of the common toolbox for disease module analysis, facilitating the discovery of new disease markers.
Affiliation(s)
- Hongzhu Cui
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Suhas Srinivasan
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
12
Musa A, Tripathi S, Dehmer M, Yli-Harja O, Kauffman SA, Emmert-Streib F. Systems Pharmacogenomic Landscape of Drug Similarities from LINCS data: Drug Association Networks. Sci Rep 2019; 9:7849. [PMID: 31127155 PMCID: PMC6534546 DOI: 10.1038/s41598-019-44291-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/12/2018] [Accepted: 05/08/2019] [Indexed: 02/01/2023] Open
Abstract
Modern research in the biomedical sciences is data-driven, utilizing high-throughput technologies to generate big genomic data. The Library of Integrated Network-based Cellular Signatures (LINCS) is an example of a large-scale genomic data repository, providing hundreds of thousands of high-dimensional gene expression measurements for thousands of drugs and dozens of cell lines. However, the remaining challenge is how to use these data effectively for pharmacogenomics. In this paper, we use LINCS data to construct drug association networks (DANs) representing the relationships between drugs. By using the Anatomical Therapeutic Chemical (ATC) classification of drugs, we demonstrate that the DANs represent a systems pharmacogenomic landscape of drugs that meaningfully summarizes the entire LINCS repository on a genomic scale. Here we identify the modules of the DANs as therapeutic attractors of the ATC drug classes.
Affiliation(s)
- Aliyu Musa
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Shailesh Tripathi
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Wehrgrabengasse 1-3, 4400, Steyr, Austria
- Matthias Dehmer
  - Department for Biomedical Computer Science and Mechatronics, UMIT - The Health and Lifesciences University, Eduard Wallnoefer Zentrum 1, 6060, Hall in Tyrol, Austria
  - College of Computer and Control Engineering, Nankai University, Tianjin, 300350, P.R. China
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Wehrgrabengasse 1-3, 4400, Steyr, Austria
- Olli Yli-Harja
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Computational Systems Biology Lab, Tampere University of Technology, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Systems Biology, Seattle, WA, 98109, USA
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
13
Evaluation of Regression Models: Model Assessment, Model Selection and Generalization Error. Machine Learning and Knowledge Extraction 2019. [DOI: 10.3390/make1010032] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/20/2022]
Abstract
When performing a regression or classification analysis, one needs to specify a statistical model. This model should avoid overfitting and underfitting the data, and achieve a low generalization error that characterizes its prediction performance. In order to identify such a model, one needs to decide which model to select from candidate model families based on performance evaluations. In this paper, we review the theoretical framework of model selection and model assessment, including error-complexity curves, the bias-variance tradeoff, and learning curves for evaluating statistical models. We discuss criterion-based, step-wise selection procedures and resampling methods for model selection, whereas cross-validation provides the simplest and most generic means for computationally estimating all required entities. To make the theoretical concepts transparent, we present worked examples for linear regression models. However, our conceptual presentation is extensible to more general models, as well as classification problems.
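As an illustration of the cross-validation idea mentioned in this abstract, the following is a minimal sketch and is not taken from the paper: it estimates the generalization error (mean squared error) of a one-predictor linear regression by k-fold cross-validation. The function names `fit_line` and `k_fold_mse`, the synthetic data, and the choice of k = 5 are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (hypothetical helper)."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so folds are disjoint.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Error is measured only on points the model did not see.
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic data (assumed for illustration): y = 2 + 3x plus Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]
print("5-fold CV estimate of MSE:", k_fold_mse(xs, ys, k=5))
```

Because every point is held out exactly once, the averaged error approximates performance on unseen data rather than the fit to the training set; comparing this estimate across candidate models is one simple way to carry out model selection.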
14
Kaalia R, Rajapakse JC. Functional homogeneity and specificity of topological modules in human proteome. BMC Bioinformatics 2019; 19:553. [PMID: 30717667 PMCID: PMC7394330 DOI: 10.1186/s12859-018-2549-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/24/2018] [Accepted: 11/30/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Functional modules in protein-protein interaction networks (PPIN) are defined by maximal sets of functionally associated proteins and are vital to understanding cellular mechanisms and identifying disease-associated proteins. Topological modules of the human proteome have been shown to be related to functional modules of PPIN. However, the effects of the weights of interactions between protein pairs and the integration of physical (direct) interactions with functional (indirect expression-based) interactions have not been investigated in the detection of functional modules of the human proteome. RESULTS: We investigated functional homogeneity and specificity of topological modules of the human proteome and validated them with known biological and disease pathways. Specifically, we determined the effects on functional homogeneity and heterogeneity of topological modules (i) with both physical and functional protein-protein interactions; and (ii) with incorporation of functional similarities between proteins as weights of interactions. With functional enrichment analyses and a novel measure for functional specificity, we evaluated functional relevance and specificity of topological modules of the human proteome. CONCLUSIONS: The topological modules ranked using specificity scores show high enrichment with gene sets of known functions. Physical interactions in PPIN contribute to high specificity of the topological modules of the human proteome, whereas functional interactions contribute to high homogeneity of the modules. Weighted networks resulted in a larger number of topological modules but did not affect their functional propensity. Modules of the human proteome are more homogeneous for molecular functions than for biological processes.
Affiliation(s)
- Rama Kaalia
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Jagath C. Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
15
Luecken MD, Page MJT, Crosby AJ, Mason S, Reinert G, Deane CM. CommWalker: correctly evaluating modules in molecular networks in light of annotation bias. Bioinformatics 2019; 34:994-1000. [PMID: 29112702 PMCID: PMC5860269 DOI: 10.1093/bioinformatics/btx706] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/14/2016] [Accepted: 11/02/2017] [Indexed: 11/24/2022]
Abstract
Motivation: Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. Results: We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker's ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker are co-expressed to a similar degree as those accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. Availability and implementation: The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M D Luecken
  - Department of Statistics, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
- M J T Page
  - Department of Informatics, UCB Pharma, Slough, UK
- A J Crosby
  - Immunology Therapeutic Area, UCB Pharma, Slough, UK
- S Mason
  - Immunology Therapeutic Area, UCB Pharma, Slough, UK
- G Reinert
  - Department of Statistics, University of Oxford, Oxford, UK
- C M Deane
  - Department of Statistics, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
  - To whom correspondence should be addressed.
16
MTGO: PPI Network Analysis Via Topological and Functional Module Identification. Sci Rep 2018; 8:5499. [PMID: 29615773 PMCID: PMC5882952 DOI: 10.1038/s41598-018-23672-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 11/01/2017] [Accepted: 02/28/2018] [Indexed: 11/08/2022]
Abstract
Protein-protein interaction (PPI) networks are viable tools to understand cell functions, disease machinery, and drug design/repositioning. Interpreting a PPI network, however, is a particularly challenging task because of network complexity. Several algorithms have been proposed for automatic PPI interpretation, at first by solely considering the network topology, and later by integrating Gene Ontology (GO) terms as node similarity attributes. Here we present MTGO (Module detection via Topological information and GO knowledge), a novel functional module identification approach. MTGO brings out the biomolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. In particular, it directly exploits GO terms during the module assembling process, and labels each module with its best-fit GO term, easing its functional interpretation. MTGO shows largely better results than other state-of-the-art algorithms (including recent GO-based ones) when searching for small or sparse functional modules, while providing comparable or better results in all other cases. MTGO correctly identifies molecular complexes and literature-consistent processes in an experimentally derived PPI network of myocardial infarction. A software version of MTGO is available freely for non-commercial purposes at https://gitlab.com/d1vella/MTGO.
17
Metri R, Mohan A, Nsengimana J, Pozniak J, Molina-Paris C, Newton-Bishop J, Bishop D, Chandra N. Identification of a gene signature for discriminating metastatic from primary melanoma using a molecular interaction network approach. Sci Rep 2017; 7:17314. [PMID: 29229936 PMCID: PMC5725601 DOI: 10.1038/s41598-017-17330-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/01/2017] [Accepted: 11/10/2017] [Indexed: 01/15/2023]
Abstract
Understanding the biological factors that are characteristic of metastasis in melanoma remains a key approach to improving treatment. In this study, we seek to identify a gene signature of metastatic melanoma. We configured a new network-based computational pipeline, combined with a machine learning method, to mine publicly available transcriptomic data from melanoma patient samples. Our method is unbiased and scans a genome-wide protein-protein interaction network using a novel formulation for network scoring. Using this, we identify the most influential, differentially expressed nodes in metastatic as compared to primary melanoma. We evaluated the shortlisted genes by a machine learning method to rank them by their discriminatory capacities. From this, we identified a panel of six genes, ALDH1A1, HSP90AB1, KIT, KRT16, SPRR3 and TMEM45B, whose expression values discriminated metastatic from primary melanoma (87% classification accuracy). In an independent transcriptomic data set derived from 703 primary melanomas, we showed that all six genes were significant in predicting melanoma-specific survival (MSS) in a univariate analysis, which was also consistent with AJCC staging. Further, three of these genes, HSP90AB1, SPRR3 and KRT16, remained significant predictors of MSS in a joint analysis (HR = 2.3, P = 0.03), although HSP90AB1 alone (HR = 1.9, P = 2 × 10⁻⁴) remained predictive after adjusting for clinical predictors.
Affiliation(s)
- Rahul Metri
  - IISc Mathematics Initiative (IMI), Indian Institute of Science, Bangalore, Karnataka, India
- Abhilash Mohan
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, India
- Jérémie Nsengimana
  - Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- Joanna Pozniak
  - Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- Carmen Molina-Paris
  - Department of Applied Mathematics, School of Mathematics, University of Leeds, Leeds, UK
- Julia Newton-Bishop
  - Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- David Bishop
  - Section of Epidemiology and Biostatistics, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- Nagasuma Chandra
  - IISc Mathematics Initiative (IMI), Indian Institute of Science, Bangalore, Karnataka, India
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, India
18
Abstract
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that act on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application.
Affiliation(s)
- Lawrence E Hunter
  - Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045, USA; ORCID: https://orcid.org/0000-0003-1455-3370