1
Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:217-237. [PMID: 34951849 DOI: 10.1109/tcbb.2021.3138142] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The ability to identify and characterize not only the protein-protein interactions but also their internal modular organization through network analysis is fundamental for understanding the mechanisms of biological processes at the molecular level. Indeed, the detection of the network communities can enhance our understanding of the molecular basis of disease pathology, and promote drug discovery and disease treatment in personalized medicine. This work gives an overview of recent computational methods for the detection of protein complexes and functional modules in protein-protein interaction networks, with a focus on some of their applications. We propose a systematic reformulation of frequently adopted taxonomies for these methods and introduce new categories to keep up with the most recent research. We review the literature of the last five years (2017-2021) and provide links to existing data and software resources. Finally, we survey recent works exploiting module identification and analysis, in the context of a variety of disease processes for biomarker identification and therapeutic target detection. Our review provides the interested reader with an up-to-date and self-contained view of the existing research, with links to state-of-the-art literature and resources, as well as hints on open issues and future research directions in complex detection and its applications.
2
Ray A. Machine learning in postgenomic biology and personalized medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, Artificial Intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, requires data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's-eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
  - Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
3
Wang X, Zhang N, Zhao Y, Wang J. A New Method for Recognizing Protein Complexes Based on Protein Interaction Networks and GO Terms. Front Genet 2021; 12:792265. [PMID: 34966415 PMCID: PMC8711776 DOI: 10.3389/fgene.2021.792265] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2021] [Accepted: 11/10/2021] [Indexed: 01/29/2023] Open
Abstract
Motivation: A protein complex is a combination of proteins that interact with each other. Protein–protein interaction (PPI) networks are composed of multiple protein complexes. It is very difficult to recognize protein complexes from PPI data because the data are noisy. Results: We proposed a new method, called Topology and Semantic Similarity Network (TSSN), based on topological structure characteristics and biological characteristics, to construct the PPI network. Experiments show that the TSSN can filter the noise of PPI data. We proposed a new algorithm, called Neighbor Nodes of Proteins (NNP), for recognizing protein complexes by considering their topology information. Experiments show that the algorithm identifies more protein complexes, and identifies them more accurately. The recognition of protein complexes is vital in research on evolution analysis. Availability and implementation: https://github.com/bioinformatical-code/NNP.
Affiliation(s)
- Xiaoting Wang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Nan Zhang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Yulan Zhao
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Juan Wang
  - School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
4
Ying KC, Lin SW. Maximizing cohesion and separation for detecting protein functional modules in protein-protein interaction networks. PLoS One 2020; 15:e0240628. [PMID: 33048996 PMCID: PMC7553341 DOI: 10.1371/journal.pone.0240628] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/09/2020] [Accepted: 09/29/2020] [Indexed: 12/26/2022] Open
Abstract
Protein Function Module (PFM) identification in Protein-Protein Interaction Networks (PPINs) is one of the most important and challenging tasks in computational biology. The quick and accurate detection of PFMs in PPINs can contribute greatly to the understanding of the functions, properties, and biological mechanisms in research on various diseases and the development of new medicines. Despite the performance of existing detection approaches being improved to some extent, there are still opportunities for further enhancements in the efficiency, accuracy, and robustness of such detection methods. Based on the uniqueness of the network-clustering problem in the context of PPINs, this study proposed a very effective and efficient model based on the Lin-Kernighan-Helsgaun algorithm for detecting PFMs in PPINs. To demonstrate the effectiveness and efficiency of the proposed model, computational experiments are performed using three different categories of species datasets. The computational results reveal that the proposed model outperforms existing detection techniques in terms of two key performance indices, i.e., the degree of polymerization inside PFMs (cohesion) and the deviation degree between PFMs (separation), while being very fast and robust. The proposed model can be used to help researchers decide whether to conduct further expensive and time-consuming biological experiments and to select target proteins from large-scale PPI data for further detailed research.
Affiliation(s)
- Kuo-Ching Ying
  - Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei, Taiwan
- Shih-Wei Lin
  - Department of Information Management, Chang Gung University, Taoyuan, Taiwan
  - Linkou Chang Gung Memorial Hospital, Taoyuan, Taiwan
  - Ming Chi University of Technology, Taipei, Taiwan
5
Liu W, Ma L, Jeon B, Chen L, Chen B. A Network Hierarchy-Based method for functional module detection in protein-protein interaction networks. J Theor Biol 2018; 455:26-38. [PMID: 29981337 DOI: 10.1016/j.jtbi.2018.06.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/28/2018] [Revised: 06/27/2018] [Accepted: 06/29/2018] [Indexed: 02/02/2023]
Abstract
In the post-genomic era, one of the important tasks is to identify protein complexes and functional modules from high-throughput protein-protein interaction data, so that we can systematically analyze and understand the molecular functions and biological processes of cells. Although many functional module detection methods have been proposed, designing correct and efficient functional module detection algorithms is still a challenging and important scientific problem in computational biology. In this paper, we present a novel Network Hierarchy-Based method to detect functional modules in PPI networks (named NHB-FMD). NHB-FMD first constructs the hierarchy tree corresponding to the PPI network and then encodes the tree so that a genetic algorithm can be employed to obtain the hierarchy tree with maximum likelihood. Functional module partitioning is then performed based on this tree, and the best partitioning is selected as the result. Experimental results on real PPI networks show that the proposed algorithm not only significantly outperforms state-of-the-art methods but also detects protein modules more effectively and accurately.
Affiliation(s)
- Wei Liu
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
  - The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
  - School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
- Liangyu Ma
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Byeungwoo Jeon
  - School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
- Ling Chen
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Bolun Chen
  - The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
6
Protein Complexes Prediction Method Based on Core-Attachment Structure and Functional Annotations. Int J Mol Sci 2017; 18:1910. [PMID: 28878201 PMCID: PMC5618559 DOI: 10.3390/ijms18091910] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/25/2017] [Revised: 08/31/2017] [Accepted: 09/01/2017] [Indexed: 11/17/2022] Open
Abstract
Recent advances in high-throughput laboratory techniques have captured large-scale protein–protein interaction (PPI) data, making it possible to create a detailed map of protein interaction networks and thus to detect protein complexes from these PPI networks. However, most current state-of-the-art methods still have shortcomings: they cannot identify overlapping clusters, they do not consider the inherent organization within protein complexes, and they overlook the biological meaning of complexes. Therefore, we present a novel overlapping protein complexes prediction method based on core–attachment structure and function annotations (CFOCM), which works in two stages. First, it detects protein complex cores with the maximum value of our defined cluster closeness function, in which the proteins are also closely related to at least one common function. Then it appends attachment proteins to these detected cores to form the returned complexes. For performance evaluation, CFOCM and six classical methods were used to identify protein complexes on three different yeast PPI networks, with three sets of real complexes as benchmarks: the Munich Information Center for Protein Sequences (MIPS), the Saccharomyces Genome Database (SGD), and the Catalogues of Yeast protein Complexes (CYC2008). The results show that CFOCM is effective and robust, achieving the highest F-measure values in all tests.
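Illustrative note: the abstract does not say how predicted and benchmark complexes are matched when the F-measure is computed. The sketch below assumes one scheme that is common in complex-prediction studies: two complexes match when the overlap score |A ∩ B|² / (|A| · |B|) reaches a threshold. The threshold of 0.2, the function names, and the protein labels are all assumptions made for illustration, not details taken from the paper.

```python
def overlap_score(a, b):
    """Overlap score between two non-empty protein sets: |a & b|^2 / (|a| * |b|)."""
    return len(a & b) ** 2 / (len(a) * len(b))


def f_measure(predicted, benchmark, threshold=0.2):
    """F-measure of predicted complexes against benchmark complexes.

    predicted, benchmark: non-empty lists of non-empty sets of protein names.
    threshold: assumed matching cutoff (hypothetical default).
    """
    # Precision: share of predicted complexes that match some benchmark complex.
    matched_pred = sum(
        any(overlap_score(p, b) >= threshold for b in benchmark) for p in predicted
    )
    # Recall: share of benchmark complexes matched by some predicted complex.
    matched_bench = sum(
        any(overlap_score(p, b) >= threshold for p in predicted) for b in benchmark
    )
    precision = matched_pred / len(predicted)
    recall = matched_bench / len(benchmark)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with made-up protein labels.
predicted = [{"A", "B", "C"}, {"X", "Y"}]
benchmark = [{"A", "B", "C", "D"}, {"E", "F", "G"}]
print(f_measure(predicted, benchmark))
```

In the toy data, one of the two predicted complexes matches a benchmark complex and one of the two benchmark complexes is recovered, so precision and recall are both 0.5.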
7
Fu J, Zhang W, Wu J. Identification of leader and self-organizing communities in complex networks. Sci Rep 2017; 7:704. [PMID: 28386089 PMCID: PMC5429660 DOI: 10.1038/s41598-017-00718-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/19/2016] [Accepted: 03/09/2017] [Indexed: 11/22/2022] Open
Abstract
Community or module structure is a natural property of complex networks. Leader communities and self-organizing communities have been introduced recently to characterize networks and understand how communities arise in complex networks. However, identification of leader and self-organizing communities is technically challenging since no adequate quantification has been developed to properly separate the two types of communities. We introduced a new measure, called ratio of node degree variances, to distinguish leader communities from self-organizing communities, and developed a statistical model to quantitatively characterize the two types of communities. We experimentally studied the power and robustness of the new method on several real-world networks in combination with some of the existing community identification methods. Our results revealed that social networks and citation networks contain more leader communities, whereas technological networks such as power grid networks have more self-organizing communities. Moreover, our results also indicated that self-organizing communities tend to be smaller than leader communities. The results shed new light on community formation and module structures in complex systems.
Affiliation(s)
- Jingcheng Fu
  - School of Mathematics, Shandong University, Jinan, 250100, China
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO, 63130, USA
- Weixiong Zhang
  - College of Math and Computer Science, Institute for Systems Biology, Jianghan University, Wuhan, 430056, China
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO, 63130, USA
- Jianliang Wu
  - School of Mathematics, Shandong University, Jinan, 250100, China
8
Yang C, Ji J, Zhang A. BFO-FMD: bacterial foraging optimization for functional module detection in protein–protein interaction networks. Soft Comput 2017. [DOI: 10.1007/s00500-017-2584-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/02/2023]
9
Ji J, Lv J, Yang C, Zhang A. Detecting Functional Modules Based on a Multiple-Grain Model in Large-Scale Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:610-622. [PMID: 26394434 DOI: 10.1109/tcbb.2015.2480066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Detecting functional modules from a Protein-Protein Interaction (PPI) network is a fundamental and active topic in proteomics research, where many computational approaches have played an important role in recent years. However, how to effectively and efficiently detect functional modules in large-scale PPI networks is still a challenging problem. We present a new framework, based on a multiple-grain model of PPI networks, to detect functional modules in PPI networks. First, we give a multiple-grain representation model of a PPI network, which has a smaller scale with super nodes. Second, we design the protein grain partitioning method, which employs a functional similarity or a structural similarity to merge some proteins layer by layer. Third, a refining mechanism with border node tests is proposed to address the protein overlapping of different modules during the grain eliminating process. Finally, systematic experiments are conducted on five large-scale yeast and human networks. The results show that the framework not only significantly reduces the running time of functional module detection, but also effectively identifies overlapping modules while maintaining competitive performance; it is therefore well suited to detecting functional modules in large-scale PPI networks.
10
Aldecoa R, Marín I. Exploring the limits of community detection strategies in complex networks. Sci Rep 2013; 3:2216. [PMID: 23860510 PMCID: PMC3713530 DOI: 10.1038/srep02216] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 03/26/2013] [Accepted: 06/18/2013] [Indexed: 12/05/2022] Open
Abstract
The characterization of network community structure has profound implications in several scientific areas. Therefore, testing the algorithms developed to establish the optimal division of a network into communities is a fundamental problem in the field. Here we performed a highly detailed evaluation of community detection algorithms, which has two main novelties: 1) the use of complex closed benchmarks, which provide precise ways to assess whether the solutions generated by the algorithms are optimal; and 2) a novel type of analysis, based on hierarchically clustering the solutions suggested by multiple community detection algorithms, which makes it easy to visualize how different those solutions are. Surprise, a global parameter that evaluates the quality of a partition, confirms the power of these analyses. We show that none of the community detection algorithms tested provides consistently optimal results in all networks and that Surprise maximization, obtained by combining multiple algorithms, achieves quasi-optimal performance in these difficult benchmarks.
Affiliation(s)
- Rodrigo Aldecoa
  - Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Calle Jaime Roig 11, 46010, Valencia, Spain
11
Aldecoa R, Marín I. SurpriseMe: an integrated tool for network community structure characterization using Surprise maximization. Bioinformatics 2013; 30:1041-2. [DOI: 10.1093/bioinformatics/btt741] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023] Open
12
Surprise maximization reveals the community structure of complex networks. Sci Rep 2013; 3:1060. [PMID: 23320141 PMCID: PMC3544010 DOI: 10.1038/srep01060] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 09/25/2012] [Accepted: 12/28/2012] [Indexed: 11/09/2022] Open
Abstract
How to determine the community structure of complex networks is an open question. It is critical to establish the best strategies for community detection in networks of unknown structure. Here, using standard synthetic benchmarks, we show that none of the algorithms hitherto developed for community structure characterization performs optimally. Significantly, evaluating the results according to their modularity, the most popular measure of the quality of a partition, systematically provides mistaken solutions. However, a novel quality function, called Surprise, can be used to elucidate which is the optimal division into communities. Consequently, we show that the best strategy to find the community structure of all the networks examined involves choosing, among the solutions provided by multiple algorithms, the one with the highest Surprise value. We conclude that Surprise maximization precisely reveals the community structure of complex networks.
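Illustrative note: modularity, the quality measure that this abstract reports as misleading, is conventionally written as Q = Σ_c [ l_c / m − (d_c / 2m)² ], where m is the number of edges, l_c the number of edges inside community c, and d_c the total degree of the nodes in c. The sketch below assumes an undirected, unweighted graph without self-loops; the function name and the toy graph are illustrative and do not come from the paper.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    communities: list of disjoint node sets that cover every node.
    """
    m = len(edges)
    node_to_comm = {v: i for i, c in enumerate(communities) for v in c}
    intra = [0] * len(communities)    # l_c: edges with both ends in community c
    degree = [0] * len(communities)   # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = node_to_comm[u], node_to_comm[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(intra, degree))


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))    # two communities
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))      # everything in one community
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing all nodes in one community gives Q = 0.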
13
Valdiani A, Kadir MA, Saad MS, Talei D, Tan SG. Intra-specific hybridization: Generator of genetic diversification and heterosis in Andrographis paniculata Nees. A bridge from extinction to survival. Gene 2012; 505:23-36. [DOI: 10.1016/j.gene.2012.05.056] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 12/27/2011] [Revised: 04/04/2012] [Accepted: 05/28/2012] [Indexed: 12/18/2022]
14
Aldecoa R, Marín I. Closed benchmarks for network community structure characterization. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 2012; 85:026109. [PMID: 22463281 DOI: 10.1103/physreve.85.026109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/14/2011] [Revised: 12/23/2011] [Indexed: 05/31/2023]
Abstract
Characterizing the community structure of complex networks is a key challenge in many scientific fields. Very diverse algorithms and methods have been proposed to this end, many working reasonably well in specific situations. However, no consensus has emerged on which of these methods is the best to use in practice. In part, this is due to the fact that testing their performance requires the generation of a comprehensive, standard set of synthetic benchmarks, a goal not yet fully achieved. Here, we present a type of benchmark that we call "closed," in which an initial network of known community structure is progressively converted into a second network whose communities are also known. This approach differs from all previously published ones, in which networks evolve toward randomness. The use of this type of benchmark allows us to monitor the transformation of the community structure of a network. Moreover, we can predict the optimal behavior of the variation of information, a measure of the quality of the partitions obtained, at any moment of the process. This enables us in many cases to determine the best partition among those suggested by different algorithms. Also, since any network can be used as a starting point, extensive studies and comparisons can be performed using a heterogeneous set of structures, including random ones. These properties make our benchmarks a general standard for comparing community detection algorithms.
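Illustrative note: the variation of information (VI) mentioned in this abstract is a standard distance between two partitions of the same node set, VI(X, Y) = H(X | Y) + H(Y | X). It is zero exactly when the two partitions coincide. The sketch below assumes partitions given as label lists and natural logarithms; the function name is illustrative and the code is not taken from the paper.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """VI between two partitions, each given as one community label per node.

    labels_a[i] and labels_b[i] are the communities of node i in each partition.
    Uses natural logarithms (an assumption; other bases only rescale the value).
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        # Contribution of this cell to H(X|Y) + H(Y|X).
        vi -= (n_ab / n) * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical partitions
print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))  # unrelated partitions
```

For the two four-node examples, identical partitions give a VI of zero and the two unrelated two-way splits give 2 ln 2.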
Affiliation(s)
- Rodrigo Aldecoa
  - Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Calle Jaime Roig 11, E-46010 Valencia, Spain
15
Baum A, Agger J, Meyer AS, Egebo M, Mikkelsen JD. Rapid near infrared spectroscopy for prediction of enzymatic hydrolysis of corn bran after various pretreatments. N Biotechnol 2012; 29:293-301. [DOI: 10.1016/j.nbt.2011.11.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 09/19/2011] [Accepted: 11/22/2011] [Indexed: 11/28/2022]
16
Abstract
The analysis of complex networks permeates all sciences, from biology to sociology. A fundamental, unsolved problem is how to characterize the community structure of a network. Here, using both standard and novel benchmarks, we show that maximization of a simple global parameter, which we call Surprise (S), leads to a very efficient characterization of the community structure of complex synthetic networks. Particularly, S qualitatively outperforms the most commonly used criterion to define communities, Newman and Girvan's modularity (Q). Applying S maximization to real networks often provides natural, well-supported partitions, but also sometimes counterintuitive solutions that expose the limitations of our previous knowledge. These results indicate that it is possible to define an effective global criterion for community structure and open new routes for the understanding of complex networks.
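Illustrative note: the abstract does not give the formula for Surprise. In the related literature it is usually written as the negative logarithm of a cumulative hypergeometric tail, S = −log Σ_{j=p}^{min(M, n)} C(M, j) · C(F − M, n − j) / C(F, n), where F is the number of possible links, M the number of possible intra-community links, n the number of observed links, and p the number of observed intra-community links. The sketch below assumes that definition, natural logarithms, and an undirected simple graph; the function name and the toy graph are illustrative.

```python
from math import comb, log


def surprise(edges, communities):
    """Surprise of a partition of an undirected simple graph (no self-loops).

    edges: iterable of (u, v) pairs; communities: list of disjoint node sets.
    Assumes the cumulative hypergeometric definition with natural logarithms.
    """
    edge_set = {frozenset(e) for e in edges}
    node_to_comm = {v: i for i, c in enumerate(communities) for v in c}
    k = len(node_to_comm)
    F = comb(k, 2)                                   # possible links overall
    M = sum(comb(len(c), 2) for c in communities)    # possible intra-community links
    n = len(edge_set)                                # observed links
    p = sum(                                         # observed intra-community links
        1 for e in edge_set if len({node_to_comm[v] for v in e}) == 1
    )
    # Unnormalised probability of seeing at least p intra-community links by chance.
    tail = sum(comb(M, j) * comb(F - M, n - j) for j in range(p, min(M, n) + 1))
    return -(log(tail) - log(comb(F, n)))


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(surprise(edges, [{0, 1, 2}, {3, 4, 5}]))    # two communities
print(surprise(edges, [{0, 1, 2, 3, 4, 5}]))      # everything in one community
```

On the toy graph the two-triangle partition scores ln 715 and the single-community partition scores 0, so maximizing S prefers the two-triangle split.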