1
Abstract
Link prediction in complex networks is an important problem in network science. Recently, various structure-based similarity methods have been proposed. Most algorithms analyze the topology of the network and judge whether two nodes are connected by calculating their similarity. However, obtaining extra attribute information about the nodes in advance is very difficult. By comparison, the topology of the network is easy to obtain, and because structure is an inherent attribute of the network, it is also more reliable. The proposed method measures several kinds of similarity between nodes based on the non-trivial eigenvectors of the Laplacian matrix of the network, such as Euclidean distance, Manhattan distance, and angular distance. A classical machine learning algorithm can then be used for classification (binary classification in this case) to achieve link prediction. Based on this process, a spectral analysis-based link prediction algorithm is proposed and named LPbSA (Link Prediction based on Spectral Analysis). Experimental results on seven real-world networks demonstrate that LPbSA performs better than ten other classic methods on accuracy, precision, the receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), the precision-recall (PR) curve, and the balanced F-score (F-score curve).
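The pair features described in this abstract can be illustrated with a minimal sketch. It is not the LPbSA implementation: the toy 4-node path graph, the use of that graph's closed-form Laplacian eigenvectors in place of an eigen-solver, the choice of the first two non-trivial eigenvectors, and the scaling of the angular distance to [0, 1] are all assumptions made for illustration.

```python
import math

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def is_eigenvector(lap, vec, lam, tol=1e-9):
    """True if lap @ vec equals lam * vec within tol."""
    n = len(vec)
    return all(
        abs(sum(lap[i][j] * vec[j] for j in range(n)) - lam * vec[i]) < tol
        for i in range(n)
    )

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def angular(u, v):
    """Angle between two non-zero vectors, scaled to [0, 1] (assumed convention)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    cos = sum(a * b for a, b in zip(u, v)) / norm
    return math.acos(max(-1.0, min(1.0, cos))) / math.pi

def pair_features(coords, i, j):
    """Feature vector for the node pair (i, j), to be fed to a classifier."""
    return [euclidean(coords[i], coords[j]),
            manhattan(coords[i], coords[j]),
            angular(coords[i], coords[j])]

def path_eigenpair(k, n):
    """Closed-form k-th Laplacian eigenpair of the n-node path graph."""
    lam = 2 - 2 * math.cos(math.pi * k / n)
    vec = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
    return lam, vec

# Toy graph (hypothetical): the path 0 - 1 - 2 - 3.
N = 4
ADJ = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
LAP = laplacian(ADJ)

# k = 0 is the trivial constant eigenvector, so start at k = 1.
PAIRS = [path_eigenpair(k, N) for k in (1, 2)]
assert all(is_eigenvector(LAP, vec, lam) for lam, vec in PAIRS)

# Row i holds the spectral coordinates of node i.
COORDS = [[vec[i] for _, vec in PAIRS] for i in range(N)]

for i, j in [(1, 2), (0, 3)]:
    print((i, j), [round(x, 3) for x in pair_features(COORDS, i, j)])
```

In this toy graph the adjacent pair (1, 2) ends up closer in spectral coordinates than the two end nodes (0, 3). For a real network the eigenvectors would come from a numerical eigen-solver, and the labelled feature vectors would be passed to a binary classifier.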
Affiliation(s)
- Chun Gui
  - College of Mathematics and Computer Science, Northwest Minzu University, Lanzhou, China
  - Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, China
2
Cui H, Srinivasan S, Gao Z, Korkin D. The Extent of Edgetic Perturbations in the Human Interactome Caused by Population-Specific Mutations. Biomolecules 2023; 14:40. [PMID: 38254640 PMCID: PMC11154503 DOI: 10.3390/biom14010040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 11/30/2023] [Accepted: 12/03/2023] [Indexed: 01/24/2024] Open
Abstract
Until recently, efforts in population genetics have been focused primarily on people of European ancestry. To attenuate this bias, global population studies, such as the 1000 Genomes Project, have revealed differences in genetic variation across ethnic groups. How many of these differences can be attributed to population-specific traits? To answer this question, the mutation data must be linked with functional outcomes. A new "edgotype" concept has been proposed, which emphasizes the interaction-specific, "edgetic", perturbations caused by mutations in the interacting proteins. In this work, we performed systematic in silico edgetic profiling of ~50,000 non-synonymous SNVs (nsSNVs) from the 1000 Genomes Project by leveraging our semi-supervised learning approach SNP-IN tool on a comprehensive set of over 10,000 protein interaction complexes. We interrogated the functional roles of the variants and their impact on the human interactome and compared the results with the pathogenic variants disrupting PPIs in the same interactome. Our results demonstrated that a considerable number of nsSNVs from healthy populations could rewire the interactome. We also showed that the proteins enriched with interaction-disrupting mutations were associated with diverse functions and had implications in a broad spectrum of diseases. Further analysis indicated that distinct gene edgetic profiles among major populations could shed light on the molecular mechanisms behind the population phenotypic variances. Finally, the network analysis revealed that the disease-associated modules surprisingly harbored a higher density of interaction-disrupting mutations from healthy populations. The variation in the cumulative network damage within these modules could potentially account for the observed disparities in disease susceptibility, which are distinctly specific to certain populations. 
Our work demonstrates the feasibility of a large-scale in silico edgetic study, and reveals insights into the orchestrated play of population-specific mutations in the human interactome.
Affiliation(s)
- Hongzhu Cui
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Chromatography and Mass Spectrometry Division, Thermo Fisher Scientific, San Jose, CA 95134, USA
- Suhas Srinivasan
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Program in Epithelial Biology, Stanford School of Medicine, Stanford, CA 94305, USA
  - Center for Personal Dynamic Regulomes, Stanford School of Medicine, Stanford, CA 94305, USA
- Ziyang Gao
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
3
Gao Z, Jiang C, Zhang J, Jiang X, Li L, Zhao P, Yang H, Huang Y, Li J. Hierarchical graph learning for protein-protein interaction. Nat Commun 2023; 14:1093. [PMID: 36841846 PMCID: PMC9968329 DOI: 10.1038/s41467-023-36736-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/18/2022] [Accepted: 02/14/2023] [Indexed: 02/27/2023] Open
Abstract
Protein-Protein Interactions (PPIs) are fundamental means of functions and signalings in biological systems. The massive growth in demand and cost associated with experimental PPI studies calls for computational tools for automated prediction and understanding of PPIs. Despite recent progress, in silico methods remain inadequate in modeling the natural PPI hierarchy. Here we present a double-viewed hierarchical graph learning model, HIGH-PPI, to predict PPIs and extrapolate the molecular details involved. In this model, we create a hierarchical graph, in which a node in the PPI network (top outside-of-protein view) is a protein graph (bottom inside-of-protein view). In the bottom view, a group of chemically relevant descriptors, instead of the protein sequences, are used to better capture the structure-function relationship of the protein. HIGH-PPI examines both outside-of-protein and inside-of-protein of the human interactome to establish a robust machine understanding of PPIs. This model demonstrates high accuracy and robustness in predicting PPIs. Moreover, HIGH-PPI can interpret the modes of action of PPIs by identifying important binding and catalytic sites precisely. Overall, HIGH-PPI (https://github.com/zqgao22/HIGH-PPI) is a domain-knowledge-driven and interpretable framework for PPI prediction studies.
Affiliation(s)
- Ziqi Gao
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China
  - Division of Emerging Interdisciplinary Areas, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Chenran Jiang
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Jiawen Zhang
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China
- Xiaosen Jiang
  - The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Chinese Academy of Sciences, Hangzhou, 310022, China
- Lanqing Li
  - AI Lab, Tencent, Shenzhen, 518000, China
- Huanming Yang
  - The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Chinese Academy of Sciences, Hangzhou, 310022, China
- Yong Huang
  - Department of Chemistry, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jia Li
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China
  - Division of Emerging Interdisciplinary Areas, The Hong Kong University of Science and Technology, Hong Kong SAR, China
4
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Yogesh Kalakoti
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Durai Sundar
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
  - School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
5
Tang S, Gökbağ B, Fan K, Shao S, Huo Y, Wu X, Cheng L, Li L. Synthetic lethal gene pairs: Experimental approaches and predictive models. Front Genet 2022; 13:961611. [PMID: 36531238 PMCID: PMC9751344 DOI: 10.3389/fgene.2022.961611] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/05/2022] [Accepted: 11/07/2022] [Indexed: 03/27/2024] Open
Abstract
Synthetic lethality (SL) refers to a genetic interaction in which the simultaneous perturbation of two genes leads to cell or organism death, whereas viability is maintained when only one of the pair is altered. The experimental exploration of these pairs and predictive modeling in computational biology contribute to our understanding of cancer biology and the development of cancer therapies. We extensively reviewed experimental technologies, public data sources, and predictive models in the study of synthetic lethal gene pairs and herein detail biological assumptions, experimental data, statistical models, and computational schemes of various predictive models, speculate regarding their influence on individual sample- and population-based synthetic lethal interactions, discuss the pros and cons of existing SL data and models, and highlight potential research directions in SL discovery.
Affiliation(s)
- Shan Tang
  - College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Birkan Gökbağ
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Kunjie Fan
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Shuai Shao
  - College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Yang Huo
  - Indiana University, Bloomington, IN, United States
- Xue Wu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Lijun Cheng
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
6
Peel L, Peixoto TP, De Domenico M. Statistical inference links data and theory in network science. Nat Commun 2022; 13:6794. [PMID: 36357376 PMCID: PMC9649740 DOI: 10.1038/s41467-022-34267-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2021] [Accepted: 10/18/2022] [Indexed: 11/11/2022] Open
Abstract
The number of network science applications across many different fields has been rapidly increasing. Surprisingly, the development of theory and domain-specific applications often occur in isolation, risking an effective disconnect between theoretical and methodological advances and the way network science is employed in practice. Here we address this risk constructively, discussing good practices to guarantee more successful applications and reproducible results. We endorse designing statistically grounded methodologies to address challenges in network science. This approach allows one to explain observational data in terms of generative models, naturally deal with intrinsic uncertainties, and strengthen the link between theory and applications. Theoretical models and structures recovered from measured data serve for analysis of complex networks. The authors discuss here existing gaps between theoretical methods and real-world applied networks, and potential ways to improve the interplay between theory and applications.
7
Neal ZP. backbone: An R package to extract network backbones. PLoS One 2022; 17:e0269137. [PMID: 35639738 PMCID: PMC9154188 DOI: 10.1371/journal.pone.0269137] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/16/2022] [Accepted: 05/13/2022] [Indexed: 11/19/2022] Open
Abstract
Networks are useful for representing phenomena in a broad range of domains. Although their ability to represent complexity can be a virtue, it is sometimes useful to focus on a simplified network that contains only the most important edges: the backbone. This paper introduces and demonstrates a substantially expanded version of the backbone package for R, which now provides methods for extracting backbones from weighted networks, weighted bipartite projections, and unweighted networks. For each type of network, fully replicable code is presented first for small toy examples, then for complete empirical examples using transportation, political, and social networks. The paper also demonstrates the implications of several issues of statistical inference that arise in backbone extraction. It concludes by briefly reviewing existing applications of backbone extraction using the backbone package, and future directions for research on network backbone extraction.
Affiliation(s)
- Zachary P. Neal
  - Psychology Department, Michigan State University, East Lansing, MI, United States of America
8
Dunham B, Ganapathiraju MK. Benchmark Evaluation of Protein-Protein Interaction Prediction Algorithms. Molecules 2021; 27:41. [PMID: 35011283 PMCID: PMC8746451 DOI: 10.3390/molecules27010041] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/05/2021] [Accepted: 11/23/2021] [Indexed: 11/16/2022] Open
Abstract
Protein-protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and the performance of algorithms must be assessed accurately before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, which leads to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on 'illogical' and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and by the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only-based algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available in open source.
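The effect of an unrealistic positive-class proportion on reported performance can be seen from the standard relation between precision, true-positive rate, false-positive rate, and class prevalence. The sketch below is illustrative only: the rates and prevalences are hypothetical numbers, not values from the paper.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision of a classifier with fixed TPR and FPR at a given
    fraction of positive examples in the evaluation set."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier: 90% true-positive rate, 5% false-positive rate.
TPR, FPR = 0.90, 0.05

# Balanced test set versus a test set where 1 pair in 1000 interacts.
for prevalence in (0.5, 0.001):
    p = precision_at_prevalence(TPR, FPR, prevalence)
    print(f"prevalence={prevalence}: precision={p:.3f}")
```

The same classifier that looks precise on a balanced set yields mostly false positives when interacting pairs are rare, which is the kind of overstatement the abstract attributes to incorrectly composed evaluation datasets.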
9
Das JK, Roy S, Guzzi PH. Analyzing host-viral interactome of SARS-CoV-2 for identifying vulnerable host proteins during COVID-19 pathogenesis. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2021; 93:104921. [PMID: 34004362 PMCID: PMC8123524 DOI: 10.1016/j.meegid.2021.104921] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/07/2021] [Revised: 05/04/2021] [Accepted: 05/07/2021] [Indexed: 02/07/2023]
Abstract
The development of therapeutic targets for COVID-19 relies on understanding the molecular mechanism of pathogenesis. Identifying genes or proteins involved in the infection mechanism is the key to shedding light on the complex molecular mechanisms. The combined effort of many laboratories distributed throughout the world has produced protein and genetic interaction data. We integrated available results and obtained a host protein-protein interaction network composed of 1432 human proteins. Next, we performed network centrality analysis to identify critical proteins in the derived network. Finally, we performed a functional enrichment analysis of central proteins. We observed that the identified proteins are primarily associated with several crucial pathways, including cellular processes, signal transduction, and neurodegenerative diseases. We focused on the proteins that are involved in human respiratory tract diseases. We highlighted many potential therapeutic targets, including RBX1, HSPA5, ITCH, RAB7A, RAB5A, RAB8A, PSMC5, CAPZB, CANX, IGF2R, and HSPA1A, which are central and also associated with multiple diseases.
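As a rough illustration of network centrality analysis, the sketch below ranks nodes by degree centrality. The abstract does not say which centrality measures were used, so degree centrality is an assumption, and the edge list with node names `A` to `E` is a hypothetical toy network rather than data from the study.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality: number of distinct neighbours divided by (n - 1)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}

# Hypothetical toy interaction network.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]

scores = degree_centrality(EDGES)
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for node, score in ranked:
    print(node, round(score, 2))
```

Nodes at the top of such a ranking are the candidates that would go on to functional enrichment analysis.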
Affiliation(s)
- Jayanta Kumar Das
  - Department of Pediatrics, School of Medicine, Johns Hopkins University, MD, USA
- Swarup Roy
  - Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India (corresponding author)
- Pietro Hiram Guzzi
  - Department of Medical and Surgical Sciences, Magna Graecia University, Catanzaro, Italy (corresponding author)
10
Zhao Z, Xu W, Chen A, Han Y, Xia S, Xiang C, Wang C, Jiao J, Wang H, Yuan X, Gu L. Protein functional module identification method combining topological features and gene expression data. BMC Genomics 2021; 22:423. [PMID: 34103008 PMCID: PMC8185953 DOI: 10.1186/s12864-021-07620-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2021] [Accepted: 04/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The study of protein complexes and protein functional modules has become an important method to further understand the mechanism and organization of life activities. Clustering algorithms that analyze the information contained in protein-protein interaction networks are effective ways to explore the characteristics of protein functional modules. RESULTS: This paper conducts an intensive study of the problems of low recognition efficiency and noise in the overlapping structure of protein functional modules, based on the topological characteristics of the PPI network. We develop ECTG, a protein functional module recognition method based on topological features and gene expression data for protein complex identification. CONCLUSIONS: The algorithm uses the similarity of gene expression patterns to effectively remove the noise reflected in the topological characteristic values of the PPI network, and it also makes proper use of the information hidden in the gene expression data. The experimental results show that the ECTG algorithm can detect protein functional modules better.
Affiliation(s)
- Zihao Zhao
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Wenjun Xu
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Aiwen Chen
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Yueyue Han
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Shengrong Xia
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- ChuLei Xiang
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Chao Wang
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Jun Jiao
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Hui Wang
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Xiaohui Yuan
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX, 76203, United States
- Lichuan Gu
  - School of Computer and Information, Anhui Agricultural University, Hefei, Anhui, 230036, China
11
Overton IM, Sims AH, Owen JA, Heale BSE, Ford MJ, Lubbock ALR, Pairo-Castineira E, Essafi A. Functional Transcription Factor Target Networks Illuminate Control of Epithelial Remodelling. Cancers (Basel) 2020; 12:2823. [PMID: 33007944 PMCID: PMC7652213 DOI: 10.3390/cancers12102823] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/09/2020] [Revised: 09/16/2020] [Accepted: 09/24/2020] [Indexed: 12/15/2022] Open
Abstract
Cell identity is governed by gene expression, regulated by transcription factor (TF) binding at cis-regulatory modules. Decoding the relationship between TF binding patterns and gene regulation is nontrivial, remaining a fundamental limitation in understanding cell decision-making. We developed the NetNC software to predict functionally active regulation of TF targets; demonstrated on nine datasets for the TFs Snail, Twist, and modENCODE Highly Occupied Target (HOT) regions. Snail and Twist are canonical drivers of epithelial to mesenchymal transition (EMT), a cell programme important in development, tumour progression and fibrosis. Predicted "neutral" (non-functional) TF binding always accounted for the majority (50% to 95%) of candidate target genes from statistically significant peaks and HOT regions had higher functional binding than most of the Snail and Twist datasets examined. Our results illuminated conserved gene networks that control epithelial plasticity in development and disease. We identified new gene functions and network modules including crosstalk with notch signalling and regulation of chromatin organisation, evidencing networks that reshape Waddington's epigenetic landscape during epithelial remodelling. Expression of orthologous functional TF targets discriminated breast cancer molecular subtypes and predicted novel tumour biology, with implications for precision medicine. Predicted invasion roles were validated using a tractable cell model, supporting our approach.
Affiliation(s)
- Ian M. Overton
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
  - Department of Systems Biology, Harvard University, Boston, MA 02115, USA
  - Centre for Synthetic and Systems Biology (SynthSys), University of Edinburgh, Edinburgh EH9 3BF, UK
  - Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
  - Corresponding author
- Andrew H. Sims
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Jeremy A. Owen
  - Department of Systems Biology, Harvard University, Boston, MA 02115, USA
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Bret S. E. Heale
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Matthew J. Ford
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Alexander L. R. Lubbock
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Erola Pairo-Castineira
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Abdelkader Essafi
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
12
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By efficiently integrating experimental findings on PPIs with computational prediction techniques, researchers have been able to obtain much valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
13
Zhang G, Zhang W. Direct protein-protein interaction network for insecticide resistance based on subcellular localization analysis in Drosophila melanogaster. Journal of Environmental Science and Health. Part B, Pesticides, Food Contaminants, and Agricultural Wastes 2020; 55:732-748. [PMID: 32567974 DOI: 10.1080/03601234.2020.1782114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
The problem of insecticide resistance has not been solved fundamentally, because the development of new insecticides cannot keep pace with the development of resistance and because the molecular mechanism of resistance is poorly understood. To reduce data noise, we constructed the direct protein-protein interaction network of insecticide resistance based on subcellular localization analysis. Interactions between proteins located at the same subcellular location are treated as direct interactions, which eliminates indirect interactions. In total, 177 of 528 resistance proteins were identified, and they were located in 11 subcellular localizations. We further analyzed topological properties of the network and the biological characteristics of resistance proteins, such as k-core, neighborhood connectivity, instability index and aliphatic index. These can be used to predict the hub proteins and potential mechanisms from a macro perspective. This is the first study to explore the molecular mechanism of insecticide resistance in Drosophila melanogaster based on subcellular localization analysis. It provides a bioinformatics foundation for further understanding the mechanisms of insecticide resistance and a reference for studying the molecular mechanism of insecticide resistance in other insects.
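One of the topological properties named in this abstract, the k-core, can be computed by repeatedly peeling off the node of lowest remaining degree. The sketch below uses a simple quadratic-time version of that peeling procedure on a hypothetical toy graph; it is not the authors' pipeline, and the node names are placeholders.

```python
def core_numbers(adj):
    """Return the core number of every node in an undirected graph.

    adj maps each node to the set of its neighbours. A node has core
    number k if it belongs to the k-core but not to the (k + 1)-core.
    """
    degree = {node: len(nb) for node, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb in adj[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core

# Hypothetical toy graph: a triangle A-B-C with a pendant node D on A.
ADJ = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

cores = core_numbers(ADJ)
print(sorted(cores.items()))
```

Here the triangle forms the 2-core and the pendant node only reaches the 1-core; in a protein network, nodes with high core numbers sit in the densely connected centre and are natural hub candidates.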
Affiliation(s)
- Guilu Zhang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Wenjun Zhang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
14
Li Y, Sun C, Li P, Zhao Y, Mensah GK, Xu Y, Guo H, Chen J. Hypernetwork Construction and Feature Fusion Analysis Based on Sparse Group Lasso Method on fMRI Dataset. Front Neurosci 2020; 14:60. [PMID: 32116508 PMCID: PMC7029661 DOI: 10.3389/fnins.2020.00060] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/30/2019] [Accepted: 01/15/2020] [Indexed: 01/21/2023] Open
Abstract
Recent works have shown that resting-state brain functional connectivity hypernetworks, in which multiple nodes can be connected, are an effective tool for brain disease diagnosis and classification research. In previous research, the lasso method was used to construct hypernetworks by solving sparse linear regression models. However, a hypernetwork constructed with the lasso method selects only single variables and lacks the ability to explain the grouping effect. To address this group structure problem, a previous study proposed constructing hypernetworks with the elastic net and group lasso methods, and the results showed that the former had the best classification performance. However, the highly correlated variables selected by the elastic net method were not necessarily in the active set of the group. Therefore, we extended our research to address this issue. Herein, we propose a new method that introduces the sparse group lasso method to improve the construction of the hypernetwork by solving the group structure problem of the brain regions. We used the traditional lasso, group lasso, and sparse group lasso methods to construct hypernetworks in patients with depression and normal subjects. In addition, clustering coefficients based on pairs of nodes were introduced alongside traditional clustering coefficients to extract features. The two types of features with significant differences obtained after feature selection were subjected to multi-kernel learning for feature fusion and classification using each method. The network topology results revealed differences among the three networks: the hypernetwork built with the lasso method was the strictest, the one built with the group lasso method was the most lenient, and the one built with the sparse group lasso method was moderate. The network topology of the sparse group lasso method was similar to that of the group lasso method but different from that of the lasso method. The classification results show that the sparse group lasso method achieves the best classification accuracy by using multi-kernel learning, which indicates that better classification performance can be achieved when the group structure exists and is properly extended.
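For orientation, the three regression penalties compared in this abstract can be written in their commonly used textbook form. The notation is an assumption made for illustration and is not taken from the paper: y is the response (for example, one brain region's time series), X holds the remaining regions as predictors, the coefficients are split into G groups β^(g) of size p_g, and λ and α are tuning parameters.

```latex
\begin{aligned}
\text{lasso:} \quad
  & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_1 \\
\text{group lasso:} \quad
  & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
    + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2 \\
\text{sparse group lasso:} \quad
  & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
    + (1-\alpha)\lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2
    + \alpha\lambda \lVert \beta \rVert_1
\end{aligned}
```

In this form, α = 1 reduces the sparse group lasso to the lasso and α = 0 reduces it to the group lasso, which fits the abstract's description of the sparse group lasso network as intermediate between the other two.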
Affiliation(s)
- Yao Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Chao Sun
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Pengzu Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Yunpeng Zhao
  - College of Arts, Taiyuan University of Technology, Taiyuan, China
- Godfred Kim Mensah
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Yong Xu
  - Department of Psychiatry, First Hospital of Shanxi Medical University, Taiyuan, China
- Hao Guo
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Junjie Chen
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
15
Gupta SK, Srivastava M, Osmanoglu Ö, Dandekar T. Genome-wide inference of the Camponotus floridanus protein-protein interaction network using homologous mapping and interacting domain profile pairs. Sci Rep 2020; 10:2334. [PMID: 32047225 PMCID: PMC7012867 DOI: 10.1038/s41598-020-59344-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/09/2019] [Accepted: 01/22/2020] [Indexed: 12/18/2022] Open
Abstract
Apart from some model organisms, the interactome of most organisms is largely unidentified. High-throughput experimental techniques to determine protein-protein interactions (PPIs) are resource intensive and highly susceptible to noise. Computational methods of PPI determination can accelerate biological discovery by identifying the most promising interacting pairs of proteins and by assessing the reliability of identified PPIs. Here we present a first in-depth study describing a global view of the ant Camponotus floridanus interactome. Although several ant genomes have been sequenced in the last eight years, studies exploring and investigating PPIs in ants are lacking. Our study attempts to fill this gap, and the presented interactome will also serve as a template for determining PPIs in other ants in the future. Our C. floridanus interactome covers 51,866 non-redundant PPIs among 6,274 proteins, including 20,544 interactions supported by domain-domain interactions (DDIs), 13,640 interactions supported by DDIs and subcellular localization, and 10,834 high-confidence interactions mediated by 3,289 proteins. These interactions cover 30.6% of the entire C. floridanus proteome.
Affiliation(s)
- Shishir K Gupta
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, D-97074, Würzburg, Germany
  - Department of Microbiology, Biocenter, Am Hubland, D-97074, Würzburg, Germany
- Mugdha Srivastava
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, D-97074, Würzburg, Germany
- Özge Osmanoglu
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, D-97074, Würzburg, Germany
- Thomas Dandekar
  - Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, D-97074, Würzburg, Germany
  - EMBL Heidelberg, BioComputing Unit, Meyerhofstraße 1, 69117, Heidelberg, Germany
16
Zhang J, Pham VVH, Liu L, Xu T, Truong B, Li J, Rao N, Le TD. Identifying miRNA synergism using multiple-intervention causal inference. BMC Bioinformatics 2019; 20:613. [PMID: 31881825 PMCID: PMC6933624 DOI: 10.1186/s12859-019-3215-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/08/2019] [Accepted: 11/12/2019] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Studying the synergism of multiple microRNAs (miRNAs) in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level work synergistically. However, it is unclear whether miRNAs with shared targets work in concert to regulate the targets or regulate them individually at different time points or in different biological processes. A standard method to test synergistic activity is to knock down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical, as there would be too many sets of miRNAs to test. RESULTS In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics multiple-intervention experiments, e.g. knocking down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level do not work synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying the miRNA synergistic network. CONCLUSIONS Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.
Affiliation(s)
- Junpeng Zhang
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Vu Viet Hoang Pham
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Lin Liu
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Taosheng Xu
  - Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Buu Truong
  - Pham Ngoc Thach University of Medicine, Ho Chi Minh, Vietnam
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Nini Rao
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, Sichuan, China
- Thuc Duy Le
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
17
Wang R, Liu G, Wang C. Identifying protein complexes based on an edge weight algorithm and core-attachment structure. BMC Bioinformatics 2019; 20:471. [PMID: 31521132 PMCID: PMC6744658 DOI: 10.1186/s12859-019-3007-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/20/2018] [Accepted: 07/26/2019] [Indexed: 02/02/2023] Open
Abstract
Background Protein complex identification from protein-protein interaction (PPI) networks is crucial for understanding cellular organization principles and functional mechanisms. In recent decades, numerous computational methods have been proposed to identify protein complexes. However, most current state-of-the-art methods still face challenges, including high false-positive rates, the inability to identify overlapping complexes, a lack of consideration for the inherent organization within protein complexes, and the absence of some biological attachment proteins. Results In this paper, to overcome these limitations, we present a protein complex identification method based on an edge weight method and core-attachment structure (EWCA), in which a complex consists of a core and some sparse attachment proteins. First, we propose a new weighting method to assess the reliability of interactions. Second, we identify protein complex cores by using the structural similarity between a seed and its direct neighbors. Third, we introduce a new method to detect attachment proteins that is able to distinguish and identify peripheral proteins and overlapping proteins. Finally, we bind attachment proteins to their corresponding complex cores to form protein complexes and discard redundant protein complexes. The experimental results indicate that EWCA outperforms existing state-of-the-art methods in terms of both accuracy and p-value. Furthermore, EWCA could identify many more protein complexes with statistical significance. Additionally, EWCA could achieve a better balance between accuracy and efficiency than some state-of-the-art methods with high accuracy. Conclusions In summary, a comprehensive comparison with twelve algorithms across different evaluation metrics shows that EWCA performs better at protein complex identification. The datasets and software are freely available for academic research at https://github.com/RongquanWang/EWCA.
Affiliation(s)
- Rongquan Wang
  - College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Guixia Liu
  - College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Caixia Wang
  - School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
18
Bin Jang H, Bolduc B, Zablocki O, Kuhn JH, Roux S, Adriaenssens EM, Brister JR, Kropinski AM, Krupovic M, Lavigne R, Turner D, Sullivan MB. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat Biotechnol 2019; 37:632-639. [PMID: 31061483 DOI: 10.1038/s41587-019-0100-8] [Citation(s) in RCA: 460] [Impact Index Per Article: 92.0] [Received: 07/25/2018] [Accepted: 03/11/2019] [Indexed: 01/03/2023]
Abstract
Microbiomes from every environment contain a myriad of uncultivated archaeal and bacterial viruses, but studying these viruses is hampered by the lack of a universal, scalable taxonomic framework. We present vConTACT v.2.0, a network-based application utilizing whole genome gene-sharing profiles for virus taxonomy that integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions. We report near-identical (96%) replication of existing genus-level viral taxonomy assignments from the International Committee on Taxonomy of Viruses for National Center for Biotechnology Information virus RefSeq. Application of vConTACT v.2.0 to 1,364 previously unclassified viruses deposited in virus RefSeq as reference genomes produced automatic, high-confidence genus assignments for 820 of the 1,364. We applied vConTACT v.2.0 to analyze 15,280 Global Ocean Virome genome fragments and were able to provide taxonomic assignments for 31% of these data, which shows that our algorithm is scalable to very large metagenomic datasets. Our taxonomy tool can be automated and applied to metagenomes from any environment for virus classification.
Affiliation(s)
- Ho Bin Jang
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
- Benjamin Bolduc
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
- Olivier Zablocki
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
- Jens H Kuhn
  - Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD, USA
- Simon Roux
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Evelien M Adriaenssens
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- J Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Andrew M Kropinski
  - Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
  - Department of Food Science, University of Guelph, Guelph, Ontario, Canada
- Mart Krupovic
  - Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Institut Pasteur, Paris, France
- Rob Lavigne
  - Laboratory of Gene Technology, Department of Biosystems, Faculty of BioScience Engineering, KU Leuven, Leuven, Belgium
- Dann Turner
  - Centre for Research in Biosciences, Department of Applied Sciences, Faculty of Health and Applied Sciences, University of the West of England, Bristol, UK
- Matthew B Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, USA
19
Kovács IA, Luck K, Spirohn K, Wang Y, Pollis C, Schlabach S, Bian W, Kim DK, Kishore N, Hao T, Calderwood MA, Vidal M, Barabási AL. Network-based prediction of protein interactions. Nat Commun 2019; 10:1240. [PMID: 30886144 PMCID: PMC6423278 DOI: 10.1038/s41467-019-09177-y] [Citation(s) in RCA: 161] [Impact Index Per Article: 32.2] [Received: 09/27/2018] [Accepted: 02/22/2019] [Indexed: 12/15/2022] Open
Abstract
Despite exceptional experimental efforts to map out the human interactome, the continued data incompleteness limits our ability to understand the molecular roots of human disease. Computational tools offer a promising alternative, helping identify biologically significant, yet unmapped protein-protein interactions (PPIs). While link prediction methods connect proteins on the basis of biological or network-based similarity, interacting proteins are not necessarily similar and similar proteins do not necessarily interact. Here, we offer structural and evolutionary evidence that proteins interact not if they are similar to each other, but if one of them is similar to the other's partners. This approach, which mathematically relies on network paths of length three (L3), significantly outperforms all existing link prediction methods. Given its high accuracy, we show that L3 can offer mechanistic insights into disease mechanisms and can complement future experimental efforts to complete the human interactome.
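To make the idea of scoring candidate links by paths of length three concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give a formula, so the degree normalization used here (dividing each path X-U-V-Y by the square root of the product of the degrees of U and V) is an assumed variant, and the function names `l3_scores` and `rank_candidates` are hypothetical. The input is assumed to be a symmetric 0/1 adjacency matrix without self-loops.

```python
import numpy as np


def l3_scores(adj):
    """Score every node pair by degree-normalized paths of length three.

    Assumed formula (not stated in the abstract):
        score[x, y] = sum over u, v of a[x,u] * a[u,v] * a[v,y] / sqrt(k_u * k_v)
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # 1 / sqrt(degree), leaving isolated nodes at zero to avoid division by zero
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    # Middle edge of each path, damped by the degrees of its two endpoints
    middle = adj * np.outer(inv_sqrt, inv_sqrt)
    return adj @ middle @ adj


def rank_candidates(adj):
    """Return (score, i, j) for all non-adjacent pairs i < j, best first."""
    adj = np.asarray(adj)
    scores = l3_scores(adj)
    n = adj.shape[0]
    pairs = [
        (float(scores[i, j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i, j] == 0
    ]
    return sorted(pairs, key=lambda item: -item[0])
```

On a four-node chain 0-1-2-3, the only non-adjacent pair joined by a path of length three is (0, 3), so it ranks first. Pairs that merely share a neighbor, such as (0, 2), receive a score of zero, which reflects the abstract's point that similarity between two proteins is not what this approach rewards.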
Affiliation(s)
- István A Kovács
  - Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Wigner Research Centre for Physics, Institute for Solid State Physics and Optics, H-1525, Budapest, P.O.Box 49, Hungary
- Katja Luck
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Kerstin Spirohn
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Yang Wang
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Carl Pollis
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Sadie Schlabach
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Wenting Bian
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Dae-Kyum Kim
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Donnelly Centre, Toronto, Ontario, Canada, Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Nishka Kishore
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Donnelly Centre, Toronto, Ontario, Canada, Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Tong Hao
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Michael A Calderwood
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Marc Vidal
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Albert-László Barabási
  - Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
  - Division of Network Medicine and Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Network and Data Science, Central European University, Budapest, H-1051, Hungary
20
Xu B, Guan J, Wang Y, Wang Z. Essential Protein Detection by Random Walk on Weighted Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:377-387. [PMID: 28504946 DOI: 10.1109/tcbb.2017.2701824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/07/2023]
Abstract
Essential proteins are critical to the development and survival of cells. Identification of essential proteins is helpful for understanding the minimal set of required genes in a living cell and for designing new drugs. To detect essential proteins, various computational methods have been proposed based on protein-protein interaction (PPI) networks. However, protein interaction data obtained by high-throughput experiments usually contain many false positives, which negatively impact the accuracy of essential protein detection. Moreover, most existing studies focused on the local information of proteins in PPI networks, while ignoring the influence of indirect protein interactions on essentiality. In this paper, we propose a novel method, called Essentiality Ranking (EssRank for short), to boost the accuracy of essential protein detection. To deal with the inaccuracy of PPI data, confidence scores of interactions are evaluated by integrating various biological information. A weighted edge clustering coefficient (WECC), which considers both interaction confidence scores and network topology, is proposed to calculate edge weights in PPI networks. The weight of each node is evaluated as the sum of the WECC values of its linking edges. A random walk method, making use of both direct and indirect protein interactions, is then employed to calculate protein essentiality iteratively. Experimental results on the yeast PPI network show that EssRank outperforms most existing methods, including the most commonly used centrality measures (SC, DC, BC, CC, IC, and EC), topology-based methods (DMNC and NC) and the data-integrating method IEW.
21
Cuesta-Astroz Y, Santos A, Oliveira G, Jensen LJ. Analysis of Predicted Host-Parasite Interactomes Reveals Commonalities and Specificities Related to Parasitic Lifestyle and Tissues Tropism. Front Immunol 2019; 10:212. [PMID: 30815000 PMCID: PMC6381214 DOI: 10.3389/fimmu.2019.00212] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/18/2018] [Accepted: 01/24/2019] [Indexed: 01/03/2023] Open
Abstract
The study of molecular host–parasite interactions is essential to understand parasitic infection and adaptation within the host system. Likewise, prevention and treatment of infectious diseases require a clear understanding of the molecular crosstalk between parasites and their hosts. Yet, large-scale experimental identification of host–parasite molecular interactions remains challenging, and the use of computational predictions then becomes necessary. Here, we propose a computational integrative approach to predict host–parasite protein–protein interaction (PPI) networks resulting from the human infection by 15 different eukaryotic parasites. We used an orthology-based approach to transfer high-confidence intraspecies interactions obtained from the STRING database to the corresponding interspecies homolog protein pairs in the host–parasite system. Our approach uses either the parasite's predicted secretome and membrane proteins, or only the secretome, depending on whether the parasite is uni- or multi-cellular, respectively, to reduce the number of false predictions. Moreover, the host proteome is filtered for proteins expressed in selected cellular localizations and tissues supporting the parasite growth. We evaluated the inferred interactions by analyzing the enriched biological processes and pathways in the predicted networks and their association with known parasitic invasion and evasion mechanisms. The resulting PPI networks were compared across parasites to identify common mechanisms that may define a global pathogenic hallmark. We also provide a case study that examines the predicted human–S. mansoni interactome more closely, detecting central proteins that have relevant roles in the human–S. mansoni network and identifying tissue-specific interactions with key roles in the life cycle of the parasite. The predicted PPI networks can be visualized and downloaded at http://orthohpi.jensenlab.org.
Affiliation(s)
- Yesid Cuesta-Astroz
  - Instituto René Rachou, Fundação Oswaldo Cruz - FIOCRUZ, Belo Horizonte, Brazil
- Alberto Santos
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Lars J Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
22
Williams N, Arnulfo G, Wang SH, Nobili L, Palva S, Palva JM. Comparison of Methods to Identify Modules in Noisy or Incomplete Brain Networks. Brain Connect 2018; 9:128-143. [PMID: 30543117 DOI: 10.1089/brain.2018.0603] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
Community structure, or "modularity," is a fundamentally important aspect of the organization of structural and functional brain networks, but the identification of modules with community detection methods is confounded by noisy or missing connections. Although several methods have been used to account for missing data, the performance of these methods has not been compared quantitatively so far. In this study, we compared four different approaches to account for missing connections when identifying modules in binary and weighted networks using both the Louvain and Infomap community detection algorithms. The four methods are "zeros," "row-column mean," "common neighbors," and "consensus clustering." Using Lancichinetti-Fortunato-Radicchi benchmark-simulated binary and weighted networks, we find that the "zeros," "row-column mean," and "common neighbors" approaches perform well with both Louvain and Infomap, whereas "consensus clustering" performs well with Louvain but not Infomap. A similar pattern of results was observed with empirical networks from stereotactical electroencephalography data, except that "consensus clustering" outperforms other approaches on weighted networks with Louvain. Based on these results, we recommend any of the four methods when using Louvain on binary networks, whereas "consensus clustering" is superior with Louvain clustering of weighted networks. When using Infomap, "zeros" or "common neighbors" should be used for both binary and weighted networks. These findings provide a basis for accounting for noisy or missing connections when identifying modules in brain networks.
Affiliation(s)
- Nitin Williams
  - Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Finland
- Gabriele Arnulfo
  - Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Finland
  - Department of Informatics, Bioengineering, Robotics and System Engineering, University of Genoa, Genoa, Italy
- Sheng H Wang
  - Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Finland
  - Doctoral Programme Brain & Mind, University of Helsinki, Finland
- Lino Nobili
  - Claudio Munari Epilepsy Surgery Centre, Niguarda Hospital, Milan, Italy
  - Child Neuropsychiatry, IRCCS, Gaslini Institute, DINOGMI, University of Genoa, Genoa, Italy
- Satu Palva
  - Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Finland
  - BioMag laboratory, HUS Medical Imaging Center, Helsinki, Finland
  - Center for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
- J Matias Palva
  - Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Finland
  - Center for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
23
Barel G, Herwig R. Network and Pathway Analysis of Toxicogenomics Data. Front Genet 2018; 9:484. [PMID: 30405693 PMCID: PMC6204403 DOI: 10.3389/fgene.2018.00484] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/19/2018] [Accepted: 09/28/2018] [Indexed: 12/20/2022] Open
Abstract
Toxicogenomics is the study of the molecular effects of chemical, biological and physical agents in biological systems, with the aim of elucidating toxicological mechanisms, building predictive models and improving diagnostics. The vast majority of toxicogenomics data has been generated at the transcriptome level, including RNA-seq and microarrays, and large quantities of drug-treatment data have been made publicly available through databases and repositories. Besides the identification of differentially expressed genes (DEGs) from case-control studies or drug treatment time series studies, bioinformatics methods have emerged that infer gene expression data at the molecular network and pathway level in order to reveal mechanistic information. In this work we describe different resources and tools, developed by us and others, that relate gene expression measurements to known pathway information, such as over-representation and gene set enrichment analyses. Furthermore, we highlight approaches that integrate gene expression data with molecular interaction networks in order to derive network modules related to drug toxicity. We describe the two main parts of the approach, i.e., the construction of a suitable molecular interaction network and the propagation of the experimental data through that network. In all cases we apply methods and tools to publicly available rat in vivo data on anthracyclines, an important class of anti-cancer drugs that are known to induce severe cardiotoxicity in patients. We report the results and functional implications achieved for four anthracyclines (doxorubicin, epirubicin, idarubicin, and daunorubicin) and compare the information content inherent in the different computational approaches.
Affiliation(s)
- Ralf Herwig
  - Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
24
Hou P, Cai J, Qu S, Xu M. Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-Based Approach. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2018; 52:5259-5267. [PMID: 29601197 DOI: 10.1021/acs.est.7b05366] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
In life cycle assessment (LCA), collecting unit process data from empirical sources (e.g., meter readings, operation logs/journals) is often costly and time-consuming. We propose a new computational approach that estimates missing unit process data from limited known data using a similarity-based link prediction method. The intuition is that similar processes in a unit process network tend to have similar material/energy inputs and waste/emission outputs. We use the ecoinvent 3.1 unit process data sets to test our method in four steps: (1) dividing the data sets into a training set and a test set; (2) randomly removing a certain number of data points from the test set and marking them as missing; (3) using similarity-weighted means of various numbers of the most similar processes in the training set to estimate the missing data in the test set; and (4) comparing the estimated data with the original values to determine the performance of the estimation. The results show that missing data can be accurately estimated when less than 5% of the data are missing in one process. The estimation performance decreases as the percentage of missing data increases. This study provides a new approach to compiling unit process data and demonstrates the promising potential of computational approaches for LCA data compilation.
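Step (3) of the procedure, estimating missing values from a similarity-weighted mean of the most similar training processes, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the abstract does not name the similarity measure, so cosine similarity over the known entries is assumed here, flows are assumed to be non-negative, missing values are assumed to be encoded as NaN, and the function name `estimate_missing` and the parameter `k` are hypothetical.

```python
import numpy as np


def estimate_missing(target, training, k=3):
    """Fill NaN entries of `target` using its k most similar training rows.

    target:   1-D array of flows for one process, NaN where data are missing
    training: 2-D array (processes x flows) with no missing values
    Similarity is cosine similarity on the known flows (an assumption).
    """
    target = np.asarray(target, dtype=float)
    training = np.asarray(training, dtype=float)
    known = ~np.isnan(target)

    # Cosine similarity between the target and each training process,
    # computed only on the flows that are known for the target
    t = target[known]
    ref = training[:, known]
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(t)
    sims = np.divide(ref @ t, norms, out=np.zeros(len(ref)), where=norms > 0)

    # Keep the k most similar processes and use their similarities as weights
    top = np.argsort(sims)[::-1][:k]
    weights = sims[top]

    filled = target.copy()
    if weights.sum() > 0:
        neighbors = training[top][:, ~known]
        filled[~known] = weights @ neighbors / weights.sum()
    return filled
```

For example, with training rows `[1, 2, 3]`, `[2, 4, 6]` and `[3, 0, 0]`, a target of `[1, 2, nan]` and `k=2`, the two proportional rows are selected with near-equal weight, and the missing value is estimated as roughly 4.5. Step (4) would then compare such estimates with the withheld original values.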
Affiliation(s)
- Ping Hou
  - School for Environment and Sustainability, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Michigan Institute for Computational Discovery and Engineering, University of Michigan, Ann Arbor, Michigan 48104, United States
- Jiarui Cai
  - School for Environment and Sustainability, University of Michigan, Ann Arbor, Michigan 48109, United States
- Shen Qu
  - School for Environment and Sustainability, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ming Xu
  - School for Environment and Sustainability, University of Michigan, Ann Arbor, Michigan 48109, United States
  - Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan 48109, United States
25
|
Aghabozorgi F, Reza Khayyambashi M. A new study of using temporality and weights to improve similarity measures for link prediction of social networks. Journal of Intelligent & Fuzzy Systems 2018. [DOI: 10.3233/jifs-17770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
26
Abstract
The knowledge of protein-protein interactions (PPIs) and PPI networks (PPINs) is the key to starting to understand the biological processes inside the cell. Many computational tools have been designed to help explore PPIs and PPINs, such as those for interaction detection, reliability assessment and interaction network construction. Here, the application of computational tools is reviewed from three perspectives: PPI database construction, PPI prediction, and interaction network construction and analysis. This overview will provide researchers guidance on choosing appropriate methods for exploring PPIs.
Affiliation(s)
- Shaowei Dong
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
- Nicholas J Provart
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
27
Hardt C, Bauer C, Schuchhardt J, Herwig R. Computational Network Analysis for Drug Toxicity Prediction. Methods Mol Biol 2018; 1819:335-355. [PMID: 30421412 DOI: 10.1007/978-1-4939-8618-7_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
The computational prediction of compound effects from molecular data is an important task in hazard and risk assessment and pivotal for judging the safety of any drug, chemical or cosmetic compound. In particular, the identification of such compound effects at the level of molecular interaction networks can be helpful for the construction of adverse outcome pathways (AOPs). AOPs emerged as a guiding concept for toxicity prediction because of the inherent mechanistic information of such networks. In fact, integrating molecular interactions in transcriptome analysis and observing expression changes in closely interacting genes might allow identifying the key molecular initiating events of compound toxicity. In this work, we describe a computational approach that is suitable for the identification of such network modules from transcriptomics data, which is the major molecular readout of toxicogenomics studies. The approach is composed of different tools (1) for primary data analysis, i.e., the biostatistical quantification of the gene expression changes, (2) for functional annotation and prioritization of genes using literature mining, and (3) for the construction of an interaction network that consists of interactions with high confidence and the identification of predictive modules from these networks. We describe the different steps of the approach and demonstrate its performance with public data on drugs that induce hepatic and cardiac toxicity.
Affiliation(s)
- C Hardt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, D-14195, Berlin, Germany
- C Bauer
  - MicroDiscovery GmbH, Marienburgerstr. 1, D-10405, Berlin, Germany
- J Schuchhardt
  - MicroDiscovery GmbH, Marienburgerstr. 1, D-10405, Berlin, Germany
- R Herwig
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, D-14195, Berlin, Germany
28
Abstract
Two-hybrid methods remain among the most preferred choices for detecting protein-protein interactions (PPIs) and much of the PPI data in databases have been produced using yeast two-hybrid (Y2H) screens. The Y2H methods are extensively used to detect PPIs because of their scalability and accessibility. Several variants of Y2H methods have been developed and used by different research groups, increasing the accessibility of these methods and their applications in detecting different types of PPIs. However, the availability of variations on the same core methodology emphasizes the need to have a systematic comparison of available Y2H methods in the context of their applicability, coverage and efficiency. In this chapter, we discuss the key parameters of Y2H methods, namely proteins of interest, vectors, libraries, screening strategies, data analysis, and provide a flowchart that should help to decide which Y2H strategy is most appropriate for a protein interaction screen.
29
Peng X, Wang J, Peng W, Wu FX, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017; 18:798-819. [PMID: 27444371 DOI: 10.1093/bib/bbw066] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 03/22/2016] [Indexed: 01/06/2023] Open
Abstract
Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the functional mechanisms in cells, it is important to identify the PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, to obtain reliable PPIs, many computational methods have been proposed. Generally, these methods can be classified into two categories. One category includes the methods that are designed to determine new reliable PPIs. The other is designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some typical methods. We also enumerate several PPI network-based applications that take a reliability assessment of the PPI data into consideration. Finally, we discuss the challenges of obtaining reliable PPIs and future directions for the construction of reliable PPI networks. Our research will provide readers with some guidance for choosing appropriate methods and features for obtaining reliable PPIs.
30
Kotlyar M, Rossos AEM, Jurisica I. Prediction of Protein-Protein Interactions. Curr Protoc Bioinformatics 2017; 60:8.2.1-8.2.14. [PMID: 29220074 DOI: 10.1002/cpbi.38] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
The authors provide an overview of physical protein-protein interaction prediction, covering the main strategies for predicting interactions, approaches for assessing predictions, and online resources for accessing predictions. This unit focuses on the main advancements in each of these areas over the last decade. The methods and resources that are presented here are not an exhaustive set, but characterize the current state of the field, highlighting key challenges and achievements. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Max Kotlyar
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Andrea E M Rossos
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Igor Jurisica
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
  - Departments of Medical Biophysics and Computer Science, University of Toronto, Ontario, Canada
  - Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
31
Ryan CJ, Kennedy S, Bajrami I, Matallanas D, Lord CJ. A Compendium of Co-regulated Protein Complexes in Breast Cancer Reveals Collateral Loss Events. Cell Syst 2017; 5:399-409.e5. [PMID: 29032073 PMCID: PMC5660599 DOI: 10.1016/j.cels.2017.09.011] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 03/15/2017] [Revised: 07/31/2017] [Accepted: 09/18/2017] [Indexed: 12/19/2022]
Abstract
Protein complexes are responsible for the bulk of activities within the cell, but how their behavior and abundance vary across tumors remains poorly understood. By combining proteomic profiles of breast tumors with a large-scale protein-protein interaction network, we have identified a set of 285 high-confidence protein complexes whose subunits have highly correlated protein abundance across tumor samples. We used this set to identify complexes that are reproducibly under- or overexpressed in specific breast cancer subtypes. We found that mutation or deletion of one subunit of a co-regulated complex was often associated with a collateral reduction in protein expression of additional complex members. This collateral loss phenomenon was typically evident from proteomic, but not transcriptomic, profiles, suggesting post-transcriptional control. Mutation of the tumor suppressor E-cadherin (CDH1) was associated with a collateral loss of members of the adherens junction complex, an effect we validated using an engineered model of E-cadherin loss.
Affiliation(s)
- Colm J Ryan
  - School of Computer Science, University College Dublin, Dublin 4, Ireland
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Susan Kennedy
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Ilirjana Bajrami
  - The Breast Cancer Now Toby Robins Breast Cancer Research Centre and CRUK Gene Function Laboratory, The Institute of Cancer Research, London SW3 6JB, UK
- David Matallanas
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Christopher J Lord
  - The Breast Cancer Now Toby Robins Breast Cancer Research Centre and CRUK Gene Function Laboratory, The Institute of Cancer Research, London SW3 6JB, UK
32
Frenkel-Morgenstern M, Gorohovski A, Tagore S, Sekar V, Vazquez M, Valencia A. ChiPPI: a novel method for mapping chimeric protein-protein interactions uncovers selection principles of protein fusion events in cancer. Nucleic Acids Res 2017; 45:7094-7105. [PMID: 28549153 PMCID: PMC5499553 DOI: 10.1093/nar/gkx423] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/19/2016] [Accepted: 05/07/2017] [Indexed: 12/20/2022] Open
Abstract
Fusion proteins, comprising peptides deriving from the translation of two parental genes, are produced in cancer by chromosomal aberrations. The expressed fusion protein incorporates domains of both parental proteins. Using a methodology that treats discrete protein domains as binding sites for specific domains of interacting proteins, we have cataloged the protein interaction networks for 11 528 cancer fusions (ChiTaRS-3.1). Here, we present our novel method, chimeric protein–protein interactions (ChiPPI) that uses the domain–domain co-occurrence scores in order to identify preserved interactors of chimeric proteins. Mapping the influence of fusion proteins on cell metabolism and pathways reveals that ChiPPI networks often lose tumor suppressor proteins and gain oncoproteins. Furthermore, fusions often induce novel connections between non-interactors skewing interaction networks and signaling pathways. We compared fusion protein PPI networks in leukemia/lymphoma, sarcoma and solid tumors finding distinct enrichment patterns for each disease type. While certain pathways are enriched in all three diseases (Wnt, Notch and TGF β), there are distinct patterns for leukemia (EGFR signaling, DNA replication and CCKR signaling), for sarcoma (p53 pathway and CCKR signaling) and solid tumors (FGFR and EGFR signaling). Thus, the ChiPPI method represents a comprehensive tool for studying the anomaly of skewed cellular networks produced by fusion proteins in cancer.
Affiliation(s)
- Somnath Tagore
  - Faculty of Medicine, Bar-Ilan-University, Henrietta Szold 8, Safed 1311502, Israel
- Vaishnovi Sekar
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
- Miguel Vazquez
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
- Alfonso Valencia
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
33
Dutta P, Saha S. Fusion of expression values and protein interaction information using multi-objective optimization for improving gene clustering. Comput Biol Med 2017; 89:31-43. [PMID: 28783536 DOI: 10.1016/j.compbiomed.2017.07.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/05/2017] [Revised: 07/28/2017] [Accepted: 07/28/2017] [Indexed: 11/29/2022]
Abstract
One of the crucial problems in the field of functional genomics is to identify a set of genes which are responsible for a particular cellular mechanism. The current work explores the use of a multi-objective optimization-based genetic clustering technique to classify genes into groups with respect to their functional similarities and biological relevance. Our contribution is two-fold. First, a new quality measure of the goodness of gene clusters, namely the protein-protein interaction confidence score, is developed. It uses the confidence scores of protein-protein interaction networks to measure the similarity between the genes of a particular cluster with respect to their biochemical protein products. Second, a multi-objective clustering approach is developed which uses integrated information from the expression values of a microarray dataset and protein-protein interaction confidence scores to select genes that are both statistically and biologically relevant. For that purpose, two biological cluster validity indices, viz. the biological homogeneity index and the protein-protein interaction confidence score, along with two traditional internal cluster validity indices, viz. the fuzzy partition coefficient and the Pakhira-Bandyopadhyay-Maulik index, are simultaneously optimized during the clustering process. Experimental results on three real-life gene expression datasets show that adding the new objective capturing protein-protein interaction information aids in clustering the genes compared with existing techniques. The observations are further supported by biological and statistical significance tests.
Affiliation(s)
- Pratik Dutta
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihar, India
- Sriparna Saha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihar, India
34
Liu Y, Yang Z, Du F, Yang Q, Hou J, Yan X, Geng Y, Zhao Y, Wang H. Molecular mechanisms of pathogenesis in hepatocellular carcinoma revealed by RNA-sequencing. Mol Med Rep 2017; 16:6674-6682. [PMID: 28901494 PMCID: PMC5865798 DOI: 10.3892/mmr.2017.7457] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 02/22/2016] [Accepted: 02/22/2017] [Indexed: 12/23/2022] Open
Abstract
The present study aimed to explore the underlying molecular mechanisms of hepatocellular carcinoma (HCC). RNA-sequencing profiles GSM629264 and GSM629265, from the GSE25599 data set, were downloaded from the Gene Expression Omnibus database and processed by quality evaluation. GSM629264 and GSM629265 were from HCC and adjacent non-cancerous tissues, respectively. TopHat software was used for alignment analysis, followed by the detection of novel splicing sites. In addition, the Cufflinks software package was used to analyze gene expression, and the Cuffdiff program was used to screen for differentially expressed genes (DEGs) and differentially expressed splicing variants. Gene ontology functional enrichment and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of DEGs were also performed. Transcription factors (TFs) and microRNAs (miRNAs) that regulate DEGs were identified, and a protein-protein interaction (PPI) network was constructed. The hub node in the PPI network was obtained, and the TFs and miRNAs that regulated the hub node were further predicted. The quality of the sequencing data met the standards for analysis, and the clean reads were ~65%. Most sequencing reads mapped into coding sequence exons (CDS_exons), whereas other reads mapped into exon 3' untranslated regions (UTR_Exons), 5'UTR_Exons and Introns. Upregulated and downregulated DEGs between HCC and adjacent non-cancerous tissues were screened. Genes of differentially expressed splicing variants were identified, including vesicle-associated membrane protein 4, phosphatidylinositol glycan anchor biosynthesis class C, protein disulfide isomerase family A member 4 and growth arrest specific 5. Screened DEGs were enriched in the complement pathway. In the PPI network, ubiquitin C (UBC) was the hub node. UBC was predicted to be regulated by several TFs, including specificity protein 1 (SP1), FBJ murine osteosarcoma viral oncogene homolog (FOS), proto-oncogene c-JUN (JUN), FOS-like antigen 2 (FOSL2) and SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A, member 4 (SMARCA4), and several miRNAs, including miR-30 and miR-181. Results from the present study demonstrated that UBC, SP1, FOS, JUN, FOSL2, SMARCA4, miR-30 and miR-181 may participate in the development of HCC.
Affiliation(s)
- Yao Liu
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Zhe Yang
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Feng Du
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Qiao Yang
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Jie Hou
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Xiaohong Yan
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Yi Geng
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Yaning Zhao
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
- Hua Wang
  - Department of Infectious Diseases, Baoji Municipal Central Hospital, Baoji, Shaanxi 721008, P.R. China
35
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrometry Reviews 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most suitable ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Kevin Titeca
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Sven Eyckerman
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Jan Tavernier
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Lennart Martens
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Dirk Valkenborg
  - Flemish Institute for Technological Research (VITO), Mol, Belgium
  - IBioStat, Hasselt University, Hasselt, Belgium
  - CFP-CeProMa, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
36
Connor N, Barberán A, Clauset A. Using null models to infer microbial co-occurrence networks. PLoS One 2017; 12:e0176751. [PMID: 28493918 PMCID: PMC5426617 DOI: 10.1371/journal.pone.0176751] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 01/02/2017] [Accepted: 04/12/2017] [Indexed: 12/21/2022] Open
Abstract
Although microbial communities are ubiquitous in nature, relatively little is known about the structural and functional roles of their constituent organisms' underlying interactions. A common approach to study such questions begins with extracting a network of statistically significant pairwise co-occurrences from a matrix of observed operational taxonomic unit (OTU) abundances across sites. The structure of this network is assumed to encode information about ecological interactions and processes, resistance to perturbation, and the identity of keystone species. However, common methods for identifying these pairwise interactions can contaminate the network with spurious patterns that obscure true ecological signals. Here, we describe this problem in detail and develop a solution that incorporates null models to distinguish ecological signals from statistical noise. We apply these methods to the initial OTU abundance matrix and to the extracted network. We demonstrate this approach by applying it to a large soil microbiome data set and show that many previously reported patterns for these data are statistical artifacts. In contrast, we find the frequency of three-way interactions among microbial OTUs to be highly statistically significant. These results demonstrate the importance of using appropriate null models when studying observational microbiome data, and suggest that extracting and characterizing three-way interactions among OTUs is a promising direction for unraveling the structure and function of microbial ecosystems.
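The idea of testing a pairwise co-occurrence against a null model can be sketched as follows. This is a generic permutation null, not the specific null models developed in the paper: the presence/absence of one OTU is shuffled across sites, and the observed number of shared sites is compared with the shuffled counts. The function name `cooccurrence_pvalue` and the toy vectors are hypothetical.

```python
import numpy as np

def cooccurrence_pvalue(x, y, n_perm=2000, seed=0):
    """One-sided permutation p-value for the co-occurrence of two OTUs.

    x, y: presence/absence vectors over the same sites.
    Null (an assumption for this sketch): OTU x is placed at random
    sites while keeping its number of occupied sites fixed.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    observed = int(np.sum(x & y))

    # Shared-site counts under the shuffled null.
    null = np.array([np.sum(rng.permutation(x) & y) for _ in range(n_perm)])

    # Add-one correction so the estimated p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p)

# Two OTUs that occupy exactly the same 10 of 20 sites.
same = [1] * 10 + [0] * 10
print(cooccurrence_pvalue(same, same))

# Two OTUs whose overlap matches what chance alone would give.
print(cooccurrence_pvalue([1, 0] * 10, [1, 1, 0, 0] * 5))
```

An edge would be kept in the co-occurrence network only when the p-value is small, ideally after correcting for the number of OTU pairs tested.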
Affiliation(s)
- Nora Connor
  - Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America
- Albert Barberán
  - Department of Soil, Water, and Environmental Science, University of Arizona, Tucson, Arizona, United States of America
- Aaron Clauset
  - Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
  - Santa Fe Institute, Santa Fe, New Mexico, United States of America
37
Statistically validated network of portfolio overlaps and systemic risk. Sci Rep 2016; 6:39467. [PMID: 28000764 PMCID: PMC5175158 DOI: 10.1038/srep39467] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 09/30/2016] [Accepted: 11/22/2016] [Indexed: 11/08/2022] Open
Abstract
Common asset holding by financial institutions (portfolio overlap) is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and severe losses at the systemic level. We propose a method to assess the statistical significance of the overlap between heterogeneously diversified portfolios, which we use to build a validated network of financial institutions where links indicate potential contagion channels. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can be applied to any bipartite network. We find that the proportion of validated links (i.e. of significant overlaps) increased steadily before the 2007-2008 financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about institutions that are about to suffer (enjoy) the most significant losses (gains).
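To make "statistical significance of the overlap" concrete, here is a minimal sketch using a plain hypergeometric null: if institution A holds `size_a` of `n_assets` securities and institution B holds `size_b`, how likely is an overlap at least as large as the observed one if B's holdings were drawn at random? The abstract does not specify the test, and a plain hypergeometric null ignores the heterogeneity the paper says it accounts for, so this is an assumption for illustration only. The name `overlap_pvalue` is hypothetical.

```python
from math import comb

def overlap_pvalue(n_assets, size_a, size_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(n_assets, size_a, size_b).

    Assumes 0 <= size_a, size_b <= n_assets and 0 <= overlap.
    """
    total = comb(n_assets, size_b)
    upper = min(size_a, size_b)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(size_a, k) * comb(n_assets - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# 10 securities, two portfolios of 3 securities each, all 3 in common.
print(overlap_pvalue(10, 3, 3, 3))  # 1/120, about 0.0083
```

A link between two institutions would be "validated" when this p-value falls below a threshold adjusted for the number of pairs tested.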
38
Srivastava A, Mazzocco G, Kel A, Wyrwicz LS, Plewczynski D. Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein-protein interactions using machine learning methods. Molecular BioSystems 2016; 12:778-85. [PMID: 26738778 DOI: 10.1039/c5mb00672d] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Protein-protein interactions (PPIs) play a vital role in most biological processes. Hence their comprehension can promote a better understanding of the mechanisms underlying living systems. However, besides the cost and time limitations involved in the detection of experimentally validated PPIs, the noise in the data is still an important issue to overcome. In the last decade, several in silico PPI prediction methods using both structural and genomic information were developed for this purpose. Here we introduce a unique validation approach aimed at collecting reliable non-interacting proteins (NIPs). Thereafter, the most relevant protein/protein-pair related features were selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performance can be considerably improved by focusing on data preparation.
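For readers less familiar with the two metrics reported above, the following sketch shows how sensitivity and specificity are computed from binary labels, with interacting pairs (PPIs) coded as 1 and non-interacting pairs (NIPs) coded as 0. The coding, the function name and the toy labels are assumptions for illustration and are unrelated to the paper's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = PPI, 0 = NIP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # PPIs correctly predicted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # PPIs missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # NIPs correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # NIPs flagged as PPIs

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy labels: 3 true PPIs followed by 4 true NIPs.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```

Specificity depends entirely on the quality of the negative set, which is why a reliable collection of NIPs matters for the reported figures.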
Affiliation(s)
- A Srivastava
  - Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
- G Mazzocco
  - Centre of New Technologies, University of Warsaw, Banacha 2c Str., 02-097 Warsaw, Poland
  - Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
- A Kel
  - GeneXplain GmbH, Am Exer 10b, D-38302, Wolfenbüttel, Germany
- L S Wyrwicz
  - Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
- D Plewczynski
  - Centre of New Technologies, University of Warsaw, Banacha 2c Str., 02-097 Warsaw, Poland
39
HybridRanker: Integrating network topology and biomedical knowledge to prioritize cancer candidate genes. J Biomed Inform 2016; 64:139-146. [DOI: 10.1016/j.jbi.2016.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/10/2016] [Revised: 08/13/2016] [Accepted: 10/06/2016] [Indexed: 11/20/2022]
40
Zhang J, Duy Le T, Liu L, He J, Li J. Identifying miRNA synergistic regulatory networks in heterogeneous human data via network motifs. Molecular BioSystems 2016; 12:454-63. [PMID: 26660849 DOI: 10.1039/c5mb00562k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/16/2023]
Abstract
Understanding the synergism of multiple microRNAs (miRNAs) in gene regulation can provide important insights into the mechanisms of complex human diseases caused by miRNA regulation. Therefore, it is important to identify miRNA synergism and study miRNA characteristics in miRNA synergistic regulatory networks. A number of methods have been proposed to identify miRNA synergism. However, most of the methods only use downstream target genes of miRNAs to infer miRNA synergism when miRNAs can also be regulated by upstream transcription factors (TFs) at the transcriptional level. Additionally, most methods are based on statistical associations identified from data without considering the causal nature of gene regulation. In this paper, we present a causality based framework, called mirSRN (miRNA synergistic regulatory network), to infer miRNA synergism in human molecular systems by considering both downstream miRNA targets and upstream TF regulation. We apply the proposed framework to two real world datasets and discover that almost all the top 10 miRNAs with the largest node degree in the mirSRNs are associated with different human diseases, including cancer, and that the mirSRNs are approximately scale-free and small-world networks. We also find that most miRNAs in the networks are frequently synergistic with other miRNAs, and miRNAs related to the same disease are likely to be synergistic and in a cluster linked to a biological function. Synergistic miRNA pairs show higher co-expression level, and may have potential functional relationships indicating collaboration between the miRNAs. Functional validation of the identified synergistic miRNAs demonstrates that these miRNAs cause different kinds of diseases. These results deepen our understanding of the biological meaning of miRNA synergism.
Affiliation(s)
- Junpeng Zhang
- School of Engineering, Dali University, Dali, Yunnan 671003, P. R. China
- Thuc Duy Le
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
- Lin Liu
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
- Jianfeng He
- Institute of Biomedical Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Jiuyong Li
- School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
|
41
|
Characterize the relationship between essential and TATA-containing genes for S. cerevisiae by network topologies in the perturbation sensitivity network. Genomics 2016; 108:177-183. [DOI: 10.1016/j.ygeno.2016.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/26/2016] [Revised: 09/01/2016] [Accepted: 09/01/2016] [Indexed: 01/11/2023]
|
42
|
Analyzing and interpreting genome data at the network level with ConsensusPathDB. Nat Protoc 2016; 11:1889-907. [PMID: 27606777 DOI: 10.1038/nprot.2016.117] [Citation(s) in RCA: 321] [Impact Index Per Article: 40.1] [Indexed: 12/18/2022]
Abstract
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
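The statistical over-representation step mentioned in this abstract is commonly computed as an upper-tail hypergeometric probability. The sketch below illustrates that general test only; it is not ConsensusPathDB's implementation, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: number of background genes
    annotated:  background genes that belong to the pathway
    sample:     size of the submitted gene list
    overlap:    submitted genes that belong to the pathway
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical counts, chosen only to show the call.
p_value = hypergeom_upper_tail(population=20000, annotated=100, sample=50, overlap=5)
print(f"over-representation p-value: {p_value:.3g}")
```

A small p-value suggests the pathway appears in the submitted list more often than chance would predict; in practice the values would still need correction for multiple testing across pathways.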
|
43
|
Bhajun R, Guyon L, Gidrol X. MicroRNA degeneracy and pluripotentiality within a Lavallière-tie architecture confers robustness to gene expression networks. Cell Mol Life Sci 2016; 73:2821-7. [PMID: 27038488 PMCID: PMC4937071 DOI: 10.1007/s00018-016-2186-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2015] [Revised: 02/24/2016] [Accepted: 03/18/2016] [Indexed: 11/03/2022]
Abstract
Modularity, feedback control, functional redundancy and bowtie architecture have been proposed as key factors that confer robustness to complex biological systems. MicroRNAs (miRNAs) are highly conserved but functionally dispensable. These antinomic properties suggest that miRNAs fine-tune gene expression rather than act as genetic switches. We synthesize published and unpublished data and hypothesize that miRNA pluripotentiality acts to buffer gene expression, while miRNA degeneracy tunes the expression of targets, thus providing robustness to gene expression networks. Furthermore, we propose a Lavallière-tie architecture by integrating signal transduction, miRNAs and protein expression data to model complex gene expression networks.
Affiliation(s)
- Ricky Bhajun
- CEA, BIG, BGE, 17, rue des Martyrs, 38000, Grenoble, France
- University Grenoble Alpes, BGE, 38000, Grenoble, France
- INSERM, U1038, 38000, Grenoble, France
- Laurent Guyon
- CEA, BIG, BGE, 17, rue des Martyrs, 38000, Grenoble, France
- University Grenoble Alpes, BGE, 38000, Grenoble, France
- INSERM, U1038, 38000, Grenoble, France
- Xavier Gidrol
- CEA, BIG, BGE, 17, rue des Martyrs, 38000, Grenoble, France
- University Grenoble Alpes, BGE, 38000, Grenoble, France
- INSERM, U1038, 38000, Grenoble, France
|
44
|
Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 2016; 19:1454-1462. [PMID: 27479844 DOI: 10.1038/nn.4353] [Citation(s) in RCA: 249] [Impact Index Per Article: 31.1] [Received: 08/28/2015] [Accepted: 07/01/2016] [Indexed: 02/08/2023]
Abstract
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet, only a small fraction of potentially causal genes (about 65 genes out of an estimated several hundred) are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case-control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.
|
45
|
Spitz A, Gimmler A, Stoeck T, Zweig KA, Horvát EÁ. Assessing Low-Intensity Relationships in Complex Networks. PLoS One 2016; 11:e0152536. [PMID: 27096435 PMCID: PMC4838277 DOI: 10.1371/journal.pone.0152536] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/21/2015] [Accepted: 03/15/2016] [Indexed: 12/27/2022] Open
Abstract
Many large network data sets are noisy and contain links representing low-intensity relationships that are difficult to differentiate from random interactions. This is especially relevant for high-throughput data from systems biology, large-scale ecological data, but also for Web 2.0 data on human interactions. In these networks with missing and spurious links, it is possible to refine the data based on the principle of structural similarity, which assesses the shared neighborhood of two nodes. By using similarity measures to globally rank all possible links and choosing the top-ranked pairs, true links can be validated, missing links inferred, and spurious observations removed. While many similarity measures have been proposed to this end, there is no general consensus on which one to use. In this article, we first contribute a set of benchmarks for complex networks from three different settings (e-commerce, systems biology, and social networks) and thus enable a quantitative performance analysis of classic node similarity measures. Based on this, we then propose a new methodology for link assessment called z* that assesses the statistical significance of the number of their common neighbors by comparison with the expected value in a suitably chosen random graph model and which is a consistently top-performing algorithm for all benchmarks. In addition to a global ranking of links, we also use this method to identify the most similar neighbors of each single node in a local ranking, thereby showing the versatility of the method in two distinct scenarios and augmenting its applicability. Finally, we perform an exploratory analysis on an oceanographic plankton data set and find that the distribution of microbes follows similar biogeographic rules as those of macroorganisms, a result that rejects the global dispersal hypothesis for microbes.
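The idea of scoring a node pair by how far its common-neighbor count deviates from a random expectation can be sketched as a z-score. The abstract does not specify the random graph model behind z*, so the code below assumes a simple hypergeometric null (one node's neighbors drawn uniformly from the remaining nodes). It illustrates the principle only and is not the authors' z* measure; the function name and toy graph are hypothetical.

```python
from math import sqrt


def common_neighbor_z(adj, u, v):
    """z-score of the common-neighbor count of u and v under an assumed hypergeometric null.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    n_other = len(adj) - 2            # candidate neighbors, excluding u and v themselves
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    observed = len(nu & nv)
    ku, kv = len(nu), len(nv)
    mean = ku * kv / n_other
    frac = ku / n_other
    var = kv * frac * (1 - frac) * (n_other - kv) / (n_other - 1)
    if var == 0:
        return 0.0                    # degenerate case: no variation possible under the null
    return (observed - mean) / sqrt(var)


# Hypothetical toy graph: a and b share both of their neighbors; e and f form a separate pair.
graph = {
    "a": {"c", "d"}, "b": {"c", "d"},
    "c": {"a", "b"}, "d": {"a", "b"},
    "e": {"f"}, "f": {"e"},
}
print(common_neighbor_z(graph, "a", "b"))  # positive: more shared neighbors than expected
print(common_neighbor_z(graph, "a", "e"))  # negative: fewer shared neighbors than expected
```

Ranking all node pairs by such a score, globally or per node, mirrors the two usage scenarios described in the abstract.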
Affiliation(s)
- Andreas Spitz
- Institute of Computer Science, Heidelberg University, Heidelberg, BW, Germany
- Anna Gimmler
- Department of Ecology, University of Kaiserslautern, Kaiserslautern, RP, Germany
- Thorsten Stoeck
- Department of Ecology, University of Kaiserslautern, Kaiserslautern, RP, Germany
- Katharina Anna Zweig
- Department of Computer Science, University of Kaiserslautern, Kaiserslautern, RP, Germany
- Emőke-Ágnes Horvát
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, IL, United States of America
|
46
|
Keskin O, Tuncbag N, Gursoy A. Predicting Protein–Protein Interactions from the Molecular to the Proteome Level. Chem Rev 2016; 116:4884-909. [DOI: 10.1021/acs.chemrev.5b00683] [Citation(s) in RCA: 207] [Impact Index Per Article: 25.9] [Indexed: 01/17/2023]
Affiliation(s)
- Nurcan Tuncbag
- Graduate School of Informatics, Department of Health Informatics, Middle East Technical University, 06800 Ankara, Turkey
|
47
|
Snider J, Kotlyar M, Saraon P, Yao Z, Jurisica I, Stagljar I. Fundamentals of protein interaction network mapping. Mol Syst Biol 2015; 11:848. [PMID: 26681426 PMCID: PMC4704491 DOI: 10.15252/msb.20156351] [Citation(s) in RCA: 180] [Impact Index Per Article: 20.0] [Indexed: 12/13/2022] Open
Abstract
Studying protein interaction networks of all proteins in an organism (“interactomes”) remains one of the major challenges in modern biomedicine. Such information is crucial to understanding cellular pathways and developing effective therapies for the treatment of human diseases. Over the past two decades, diverse biochemical, genetic, and cell biological methods have been developed to map interactomes. In this review, we highlight basic principles of interactome mapping. Specifically, we discuss the strengths and weaknesses of individual assays, how to select a method appropriate for the problem being studied, and provide general guidelines for carrying out the necessary follow‐up analyses. In addition, we discuss computational methods to predict, map, and visualize interactomes, and provide a summary of some of the most important interactome resources. We hope that this review serves as both a useful overview of the field and a guide to help more scientists actively employ these powerful approaches in their research.
Affiliation(s)
- Jamie Snider
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Max Kotlyar
- Princess Margaret Cancer Center, IBM Life Sciences Discovery Centre, University Health Network, Ontario, Canada
- Punit Saraon
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Zhong Yao
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Igor Jurisica
- Princess Margaret Cancer Center, IBM Life Sciences Discovery Centre, University Health Network, Ontario, Canada
- Igor Stagljar
- Donnelly Centre, Department of Biochemistry, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
48
|
Alvarez AJ, Sanz-Rodríguez CE, Cabrera JL. Weighting dissimilarities to detect communities in networks. Philos Trans A Math Phys Eng Sci 2015; 373:rsta.2015.0108. [PMID: 26527808 DOI: 10.1098/rsta.2015.0108] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 08/19/2015] [Indexed: 06/05/2023]
Abstract
Many complex systems can be described as networks exhibiting inner organization as communities of nodes. The identification of communities is a key factor to understand community-based functionality. We propose a family of measures based on the weighted sum of two dissimilarity quantifiers that facilitates efficient classification of communities by tuning the quantifiers' relative weight to the network's particularities. Additionally, two new dissimilarities are introduced and incorporated in our analysis. The effectiveness of our approach is tested by examining Zachary's Karate Club network and the Caenorhabditis elegans reactions network. The analysis reveals the method's classification power as confirmed by the efficient detection of intrapathway metabolic functions in C. elegans.
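A weighted sum of two dissimilarity quantifiers can be sketched as a convex combination controlled by a single weight. The abstract does not name the two quantifiers, so the code below substitutes two stand-ins chosen for illustration (a Jaccard dissimilarity of closed neighborhoods and a relative degree difference). All function names and the toy graph are hypothetical.

```python
def jaccard_dissimilarity(adj, u, v):
    """1 minus the Jaccard index of the closed neighborhoods of u and v."""
    a, b = adj[u] | {u}, adj[v] | {v}
    return 1 - len(a & b) / len(a | b)


def degree_dissimilarity(adj, u, v):
    """Relative degree difference, in [0, 1]."""
    ku, kv = len(adj[u]), len(adj[v])
    return abs(ku - kv) / max(ku, kv, 1)


def weighted_dissimilarity(adj, u, v, weight):
    """Convex combination of the two stand-in quantifiers; weight lies in [0, 1]."""
    return (weight * jaccard_dissimilarity(adj, u, v)
            + (1 - weight) * degree_dissimilarity(adj, u, v))


# Hypothetical toy graph: two triangles joined by the bridge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(weighted_dissimilarity(graph, 0, 1, weight=0.5))   # same triangle
print(weighted_dissimilarity(graph, 0, 5, weight=0.75))  # different triangles
```

Feeding the resulting pairwise values to any distance-based clustering routine and sweeping the weight is one way to tune the measure to a particular network.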
Affiliation(s)
- Alejandro J Alvarez
- Stochastic Dynamics Laboratory, Center for Physics, Venezuelan Institute for Scientific Research, Caracas 1020-A, Venezuela
- Departamento de Física, FCFM, Universidad de Chile, Santiago, Chile
- Carlos E Sanz-Rodríguez
- Stochastic Dynamics Laboratory, Center for Physics, Venezuelan Institute for Scientific Research, Caracas 1020-A, Venezuela
- Juan Luis Cabrera
- Stochastic Dynamics Laboratory, Center for Physics, Venezuelan Institute for Scientific Research, Caracas 1020-A, Venezuela
|
49
|
Gallagher SR, Goldberg DS. Characterization of known protein complexes using k-connectivity and other topological measures. F1000Res 2015; 2:172. [PMID: 26913183 PMCID: PMC4743144 DOI: 10.12688/f1000research.2-172.v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/03/2015] [Indexed: 11/20/2022] Open
Abstract
Many protein complexes are densely packed, so proteins within complexes often interact with several other proteins in the complex. Steric constraints prevent most proteins from simultaneously binding more than a handful of other proteins, regardless of the number of proteins in the complex. Because of this, as complex size increases, several measures of the complex decrease within protein-protein interaction networks. However, k-connectivity, the number of vertices or edges that need to be removed in order to disconnect a graph, may be consistently high for protein complexes. The property of k-connectivity has been little used previously in the investigation of protein-protein interactions. To understand the discriminative power of k-connectivity and other topological measures for identifying unknown protein complexes, we characterized these properties in known Saccharomyces cerevisiae protein complexes in networks generated both from highly accurate X-ray crystallography experiments which give an accurate model of each complex, and also as the complexes appear in high-throughput yeast 2-hybrid studies in which new complexes may be discovered. We also computed these properties for appropriate random subgraphs. We found that clustering coefficient, mutual clustering coefficient, and k-connectivity are better indicators of known protein complexes than edge density, degree, or betweenness. This suggests new directions for future protein complex-finding algorithms.
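The definition of k-connectivity given in this abstract, the number of vertices that must be removed to disconnect a graph, can be made concrete with a brute-force sketch for small graphs. This is not the authors' code; it assumes an undirected graph stored as a dict of neighbor sets, the function names are hypothetical, and the exhaustive search is only practical for a handful of nodes.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """Check connectivity of the graph after deleting the nodes in `removed`."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbor in adj[current]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph (brute force)."""
    n = len(adj)
    if not is_connected(adj):
        return 0
    for k in range(1, n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, set(cut)):
                return k
    return n - 1  # usual convention for complete graphs


def edge_density(adj):
    """Fraction of possible edges that are present."""
    n = len(adj)
    edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    return 2 * edges / (n * (n - 1))


# Hypothetical toy graphs.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(vertex_connectivity(cycle4), edge_density(cycle4))
```

Comparing the two measures on the same graph shows how they can diverge: a cycle stays 2-connected at any size while its edge density falls as nodes are added.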
Affiliation(s)
- Suzanne R Gallagher
- Department of Computer Science, University of Colorado, Boulder CO, 80302, USA
- Debra S Goldberg
- Department of Computer Science, University of Colorado, Boulder CO, 80302, USA
|
50
|
Hu GM, Mai TL, Chen CM. Clustering and visualizing similarity networks of membrane proteins. Proteins 2015; 83:1450-61. [PMID: 26011797 DOI: 10.1002/prot.24832] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/17/2015] [Revised: 04/23/2015] [Accepted: 05/17/2015] [Indexed: 01/05/2023]
Abstract
We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information.
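The sparsification idea in this abstract, dropping links longer than a characteristic threshold distance and reading clusters off what remains, can be sketched as thresholding followed by connected components. The code below is a generic illustration and not the MSC algorithm, whose details the abstract does not give. The function name, distance matrix, and threshold values are hypothetical.

```python
def threshold_clusters(dist, threshold):
    """Cluster items by keeping only pairs with distance <= threshold.

    dist is a symmetric matrix (list of lists); clusters are the connected
    components of the sparsified graph, found with a small union-find.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical distances: items 0 and 1 are close, items 2 and 3 are close.
distances = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(threshold_clusters(distances, threshold=0.5))
```

Sweeping the threshold and watching where the number of clusters stabilizes is one simple way to look for a characteristic boundary distance of the kind the abstract describes.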
Affiliation(s)
- Geng-Ming Hu
- Department of Physics, National Taiwan Normal University, Taipei, Taiwan
- Te-Lun Mai
- Department of Physics, National Taiwan Normal University, Taipei, Taiwan
- Chi-Ming Chen
- Department of Physics, National Taiwan Normal University, Taipei, Taiwan
|