1
Castanho EN, Aidos H, Madeira SC. Biclustering data analysis: a comprehensive survey. Brief Bioinform 2024; 25:bbae342. [PMID: 39007596 PMCID: PMC11247412 DOI: 10.1093/bib/bbae342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 05/16/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] [Open Access]
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
Affiliation(s)
- Eduardo N Castanho
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Helena Aidos
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
2
Su X, Xue S, Liu F, Wu J, Yang J, Zhou C, Hu W, Paris C, Nepal S, Jin D, Sheng QZ, Yu PS. A Comprehensive Survey on Community Detection With Deep Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4682-4702. [PMID: 35263257 DOI: 10.1109/tnnls.2021.3137396] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Indexed: 06/14/2023]
Abstract
Detecting a community in a network is a matter of discerning the distinct features and connections of a group of members that are different from those in other communities. The ability to do this is of great significance in network analysis. However, beyond the classic spectral clustering and statistical inference methods, there have been significant developments with deep learning techniques for community detection in recent years, particularly when it comes to handling high-dimensional network data. Hence, a comprehensive review of the latest progress in community detection through deep learning is timely. To frame the survey, we have devised a new taxonomy covering different state-of-the-art methods, including deep learning models based on deep neural networks (DNNs), deep nonnegative matrix factorization, and deep sparse filtering. The main category, i.e., DNNs, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The popular benchmark datasets, evaluation metrics, and open-source implementations to address experimentation settings are also summarized. This is followed by a discussion on the practical applications of community detection in various domains. The survey concludes with suggestions of challenging topics that would make for fruitful future research directions in this fast-growing deep learning field.
3
Spampinato AG, Scollo RA, Cutello V, Pavone M. Random search immune algorithm for community detection. Soft Comput 2023. [DOI: 10.1007/s00500-023-07999-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Abstract
Community detection is a prominent research topic in complex network analysis, and it is an important research field in all those areas where complex networks are a powerful tool for describing and understanding systems, including neuroscience, biology, social science, economics, and many others. A challenging approach to uncovering the community structure of a complex network, and thereby revealing the internal organization of its nodes, is modularity optimization. In this research paper, we present an immune optimization algorithm (opt-IA) developed to detect community structures, with the main aim of maximizing the modularity produced by the discovered communities. To assess the performance of opt-IA, we compared it with a total of 20 heuristics and metaheuristics, including one hyper-heuristic method, using social and biological complex networks as data sets. Unlike these algorithms, opt-IA is based entirely on a fully random search process, which in turn is combined with purely stochastic operators. According to the obtained outcomes, opt-IA performs strictly better than almost all of the heuristics and metaheuristics to which it was compared, while it turns out to be comparable with the hyper-heuristic method. Overall, it can be claimed that opt-IA, even if driven by a purely random process, proves to be reliable and efficient. Furthermore, to support the latter claim, a sensitivity analysis of the functionality was conducted, using the classic metrics NMI, ARI and NVI.
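The quantity that opt-IA is said to maximize can be illustrated with a short computation. The sketch below assumes the standard Newman-Girvan modularity for an undirected, unweighted graph, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The abstract does not state which variant the paper uses, and the function name `modularity` and the toy graph are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges: iterable of (u, v) pairs, each undirected edge listed once,
           without self-loops.
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_single = modularity(edges, single)  # everything in one community
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-maximizing search performs over many candidate partitions.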
4
Autoencoder Model Using Edge Enhancement to Detect Communities in Complex Networks. Arabian Journal for Science and Engineering 2022. [DOI: 10.1007/s13369-022-06747-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
5
Affiliation(s)
- Mingao Yuan
  - Department of Statistics, North Dakota State University
- Ruiqi Liu
  - Department of Mathematical Sciences, Texas Tech University
- Yang Feng
  - Department of Biostatistics, New York University
- Zuofeng Shang
  - Department of Mathematical Sciences, New Jersey Institute of Technology
6
Affiliation(s)
- Yaoming Zhen
  - School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Junhui Wang
  - School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
7
Information Limits for Community Detection in Hypergraph with Label Information. Symmetry (Basel) 2021. [DOI: 10.3390/sym13112060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] [Open Access]
Abstract
In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to be an exact recovery of the true labels (communities) if it coincides with the true labels with a probability convergent to one. In this work, we consider the effect of label information on the exact recovery of communities in an m-uniform Hypergraph Stochastic Block Model (HSBM). We investigate two scenarios of label information: (1) a noisy label for each node is observed independently, with $1-\alpha_n$ as the probability that the noisy label will match the true label; (2) the true label of each node is observed independently, with the probability of $1-\alpha_n$. We derive sharp boundaries for exact recovery under both scenarios from an information-theoretical point of view. The label information improves the sharp detection boundary if and only if $\alpha_n = n^{-\beta+o(1)}$ for a constant $\beta > 0$.
8
Yuan M, Yang F, Shang Z. Hypothesis testing in sparse weighted stochastic block model. Stat Pap (Berl) 2021. [DOI: 10.1007/s00362-021-01269-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
9
Chen L, Zhou J, Lin L. Hypothesis testing for populations of networks. Commun Stat-Theor M 2021. [DOI: 10.1080/03610926.2021.1977961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Li Chen
  - College of Mathematics, Southwest Minzu University, Chengdu, China
- Jie Zhou
  - College of Mathematics, Sichuan University, Chengdu, China
- Lizhen Lin
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, South Bend, Indiana, USA
10
Mattsson CES, Takes FW, Heemskerk EM, Diks C, Buiten G, Faber A, Sloot PMA. Functional Structure in Production Networks. Front Big Data 2021; 4:666712. [PMID: 34095822 PMCID: PMC8176009 DOI: 10.3389/fdata.2021.666712] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/10/2021] [Accepted: 04/19/2021] [Indexed: 12/02/2022] [Open Access]
Abstract
Production networks are integral to economic dynamics, yet dis-aggregated network data on inter-firm trade is rarely collected and often proprietary. Here we situate company-level production networks within a wider space of networks that are different in nature, but similar in local connectivity structure. Through this lens, we study a regional and a national network of inferred trade relationships reconstructed from Dutch national economic statistics and re-interpret prior empirical findings. We find that company-level production networks have so-called functional structure, as previously identified in protein-protein interaction (PPI) networks. Functional networks are distinctive in their over-representation of closed squares, which we quantify using an existing measure called spectral bipartivity. Shared local connectivity structure lets us ferry insights between domains. PPI networks are shaped by complementarity, rather than homophily, and we use multi-layer directed configuration models to show that this principle explains the emergence of functional structure in production networks. Companies are especially similar to their close competitors, not to their trading partners. Our findings have practical implications for the analysis of production networks and give us precise terms for the local structural features that may be key to understanding their routine function, failure, and growth.
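To make the spectral bipartivity measure mentioned above concrete, here is a minimal sketch. It assumes the commonly used definition, the ratio of the trace of cosh(A) to the trace of exp(A) for the adjacency matrix A, so that even-length closed walks (such as closed squares) raise the score and odd-length ones (such as triangles) lower it. The abstract does not spell out the formula or any normalization the authors apply, so the function names, the truncated power series, and the toy graphs below are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_bipartivity(adj, terms=60):
    """Approximate Tr(cosh A) / Tr(exp A) with a truncated power series.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    terms: number of series terms; ample for small graphs, an assumption
           that would need revisiting for large or dense ones.
    """
    n = len(adj)
    # term holds A^k / k!, starting from the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    even_sum = 0.0   # accumulates Tr(cosh A): even powers only
    total_sum = 0.0  # accumulates Tr(exp A): all powers
    for k in range(terms):
        if k > 0:
            product = matmul(term, adj)
            term = [[x / k for x in row] for row in product]
        trace = sum(term[i][i] for i in range(n))
        total_sum += trace
        if k % 2 == 0:
            even_sum += trace
    return even_sum / total_sum


# Toy graphs (hypothetical): a closed square and a triangle.
square = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

b_square = spectral_bipartivity(square)      # bipartite graph, score 1
b_triangle = spectral_bipartivity(triangle)  # odd cycle, score below 1
```

The closed square has no odd closed walks and reaches the maximum score of 1, while the triangle scores lower.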
Affiliation(s)
- Carolina E. S. Mattsson
  - Computational Network Science Lab, Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
  - Network Science Institute, Boston, MA, United States
- Frank W. Takes
  - Computational Network Science Lab, Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
  - CORPNET, University of Amsterdam, Amsterdam, Netherlands
- Eelke M. Heemskerk
  - CORPNET, University of Amsterdam, Amsterdam, Netherlands
  - Department of Political Science, University of Amsterdam, Amsterdam, Netherlands
- Cees Diks
  - Faculty Economics and Business, University of Amsterdam, Amsterdam, Netherlands
  - Tinbergen Institute, Amsterdam, Netherlands
- Gert Buiten
  - Statistics Netherlands, The Hague, Netherlands
- Albert Faber
  - Ministry of Economic Affairs & Climate, The Hague, Netherlands
- Peter M. A. Sloot
  - Computational Science Lab, Faculty of Science, University of Amsterdam, Amsterdam, Netherlands
  - Institute for Advanced Study, University of Amsterdam, Amsterdam, Netherlands
  - Complexity Institute, Nanyang Technological University, Singapore, Singapore
  - Complexity Science Hub Vienna, Vienna, Austria
  - National Center for Cognitive Research, ITMO University, Saint Petersburg, Russia
11
Tang F, Feng Y, Chiheb H, Fan J. The Interplay of Demographic Variables and Social Distancing Scores in Deep Prediction of U.S. COVID-19 Cases. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1901717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
Affiliation(s)
- Francesca Tang
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
- Yang Feng
  - Department of Biostatistics, New York University, New York City, NY
- Jianqing Fan
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
12
Meng Z, Kuang L, Chen Z, Zhang Z, Tan Y, Li X, Wang L. Method for Essential Protein Prediction Based on a Novel Weighted Protein-Domain Interaction Network. Front Genet 2021; 12:645932. [PMID: 33815480 PMCID: PMC8010314 DOI: 10.3389/fgene.2021.645932] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/24/2020] [Accepted: 02/15/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
In recent years, a number of computational models based on protein-protein interaction (PPI) networks have been proposed. However, due to false positives, false negatives, and the incompleteness of PPI networks, there are still many challenges affecting the design of computational models with satisfactory predictive accuracy when inferring key proteins. This study proposes a prediction model called WPDINM for detecting key proteins based on a novel weighted protein-domain interaction (PDI) network. In WPDINM, a weighted PPI network is constructed first by combining the gene expression data of proteins with topological information extracted from the original PPI network. Simultaneously, a weighted domain-domain interaction (DDI) network is constructed based on the original PDI network. Next, by integrating the newly obtained weighted PPI network and weighted DDI network with the original PDI network, a weighted PDI network is constructed. Then, based on topological features and biological information, including the subcellular localization and orthologous information of proteins, a novel PageRank-based iterative algorithm is designed and implemented on the newly constructed weighted PDI network to estimate the criticality of proteins. Finally, to assess the prediction performance of WPDINM, we compared it with 12 competing measures. Experimental results show that WPDINM can achieve a predictive accuracy rate of 90.19%, 81.96%, 70.72%, 62.04%, 55.83%, and 51.13% in the top 1%, top 5%, top 10%, top 15%, top 20%, and top 25%, respectively, which exceeds the prediction accuracy achieved by traditional state-of-the-art competing measures. Owing to the satisfactory identification effect, the WPDINM measure may contribute to the further development of key protein identification.
Affiliation(s)
- Zixuan Meng
  - College of Computer, Xiangtan University, Xiangtan, China
- Linai Kuang
  - College of Computer, Xiangtan University, Xiangtan, China
- Zhiping Chen
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Zhen Zhang
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Xueyong Li
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
  - College of Computer, Xiangtan University, Xiangtan, China
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
13
Abstract
Network (graph) data analysis is a popular research topic in statistics and machine learning. In application, one is frequently confronted with graph two-sample hypothesis testing where the goal is to test the difference between two graph populations. Several statistical tests have been devised for this purpose in the context of binary graphs. However, many of the practical networks are weighted and existing procedures cannot be directly applied to weighted graphs. In this paper, we study the weighted graph two-sample hypothesis testing problem and propose a practical test statistic. We prove that the proposed test statistic converges in distribution to the standard normal distribution under the null hypothesis and analyze its power theoretically. The simulation study shows that the proposed test has satisfactory performance and it substantially outperforms the existing counterpart in the binary graph case. A real data application is provided to illustrate the method.
Affiliation(s)
- Mingao Yuan
  - Department of Statistics, North Dakota State University, Fargo, ND 58102, USA
- Qian Wen
  - Department of Statistics, North Dakota State University, Fargo, ND, USA
14
Zhong X, Rajapakse JC. Graph embeddings on gene ontology annotations for protein-protein interaction prediction. BMC Bioinformatics 2020; 21:560. [PMID: 33323115 PMCID: PMC7739483 DOI: 10.1186/s12859-020-03816-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 01/15/2023] [Open Access]
Abstract
Background: Protein–protein interaction (PPI) prediction is an important task towards the understanding of many bioinformatics functions and applications, such as predicting protein functions, gene-disease associations and disease-drug associations. However, many previous PPI prediction studies do not consider the missing and spurious interactions inherent in PPI networks. To address these two issues, we define two corresponding tasks, namely missing PPI prediction and spurious PPI prediction, and propose a method that employs graph embeddings to learn vector representations from constructed Gene Ontology Annotation (GOA) graphs and then uses the embedded vectors to achieve the two tasks. Our method leverages information from both term–term relations among GO terms and term-protein annotations between GO terms and proteins, and preserves properties of both local and global structural information of the GO annotation graph. Results: We compare our method with methods that are based on information content (IC) and one method that is based on word embeddings, with experiments on three PPI datasets from the STRING database. Experimental results demonstrate that our method is more effective than the compared methods. Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GOA graphs for our defined missing and spurious PPI tasks.
Affiliation(s)
- Xiaoshi Zhong
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, Singapore
15
Chakraborty A, Ikeda Y. Testing "efficient supply chain propositions" using topological characterization of the global supply chain network. PLoS One 2020; 15:e0239669. [PMID: 33002029 PMCID: PMC7529254 DOI: 10.1371/journal.pone.0239669] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/29/2020] [Accepted: 09/08/2020] [Indexed: 11/23/2022] [Open Access]
Abstract
In this paper, we study the topological properties of the global supply chain network in terms of its degree distribution, clustering coefficient, degree-degree correlation, bow-tie structure, and community structure to test the efficient supply chain propositions proposed by E. J. S. Hearnshaw et al. The global supply chain data for the year 2017 are constructed by collecting various company data from the web site of Standard & Poor’s Capital IQ platform. The in- and out-degree distributions are characterized by a power law with exponents $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$. The clustering coefficient decays as $\langle C(k) \rangle \sim k^{-\beta_k}$ with an exponent $\beta_k = 0.46$. The nodal degree-degree correlations $\langle k_{nn}(k) \rangle$ indicate the absence of assortativity. The bow-tie structure of the giant weakly connected component (GWCC) reveals that the OUT component is the largest and comprises 41.1% of all firms. The giant strongly connected component (GSCC) comprises 16.4% of all firms. We observe that upstream or downstream firms are located a few steps away from the GSCC. Furthermore, we uncover the community structures of the network and characterize them according to their location and industry classification. We observe that the largest community consists of the consumer discretionary sector based mainly in the United States (US). These firms belong to the OUT component in the bow-tie structure of the global supply chain network. Finally, we confirm the validity of Hearnshaw et al.’s efficient supply chain propositions, namely Proposition S1 (short path length), Proposition S2 (power-law degree distribution), Proposition S3 (high clustering coefficient), Proposition S4 (“fit-gets-richer” growth mechanism), Proposition S5 (truncation of power-law degree distribution), and Proposition S7 (community structure with overlapping boundaries) regarding the global supply chain network.
While the original Proposition S1 mentions only a short path length, we found short paths from the GSCC to the IN and OUT components by analyzing the bow-tie structure. Therefore, the short path length in the bow-tie structure is a conceptual addition to the original propositions of Hearnshaw et al.
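The bow-tie decomposition referred to in this abstract can be sketched in a few lines. The version below is a simplified illustration, not the authors' procedure: it takes a directed edge list, finds the largest strongly connected component with a quadratic-time reachability search, and labels nodes that reach it as IN and nodes reachable from it as OUT. It does not first restrict the graph to the giant weakly connected component, and it lumps tendrils and disconnected nodes into a single OTHER set. The function names and the toy supplier graph are hypothetical.

```python
from collections import defaultdict


def reachable(start_nodes, adjacency):
    """Return every node reachable from start_nodes (inclusive) via adjacency."""
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split the nodes of a directed graph into GSCC, IN, OUT and OTHER."""
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))
    # The SCC of a seed is the intersection of its forward and backward
    # reachable sets. Simple but O(n * (n + m)); fine for a small sketch.
    unassigned = set(nodes)
    gscc = set()
    while unassigned:
        seed = next(iter(unassigned))
        scc = reachable([seed], succ) & reachable([seed], pred)
        unassigned -= scc
        if len(scc) > len(gscc):
            gscc = scc
    out_part = reachable(gscc, succ) - gscc  # downstream of the core
    in_part = reachable(gscc, pred) - gscc   # upstream of the core
    other = nodes - gscc - in_part - out_part
    return {"GSCC": gscc, "IN": in_part, "OUT": out_part, "OTHER": other}


# Toy directed graph (hypothetical): a 3-cycle core with one upstream node,
# one downstream node, and one node that only feeds the downstream node.
edges = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("c", "t"), ("x", "t")]
parts = bow_tie(edges)
```

On real supply chain data, a linear-time algorithm such as Tarjan's would replace the quadratic search used here.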
Affiliation(s)
- Yuichi Ikeda
  - Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
16
Perscheid C. Integrative biomarker detection on high-dimensional gene expression data sets: a survey on prior knowledge approaches. Brief Bioinform 2020; 22:5881664. [PMID: 32761115 DOI: 10.1093/bib/bbaa151] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/17/2020] [Revised: 06/15/2020] [Accepted: 06/16/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
Affiliation(s)
- Cindy Perscheid
  - Hasso Plattner Institute, University of Potsdam, Potsdam, 14482, Germany
17
Hwang JY, Lee JO, Yang W. Local law and Tracy–Widom limit for sparse stochastic block models. Bernoulli 2020. [DOI: 10.3150/20-bej1201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
18
Hwang S, Lee T, Yoon Y. Exploring disease comorbidity in a module-module interaction network. J Bioinform Comput Biol 2020; 18:2050010. [PMID: 32404015 DOI: 10.1142/s0219720020500109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Understanding disease comorbidity contributes to improved quality of life in patients who are suffering from multiple diseases. Therefore, to better explore comorbid diseases, the clarification of associations between diseases based on biological functions is essential. In our study, we propose a method for identifying disease comorbidity in a module-based network, named the module-module interaction (MMI) network, which represents how biological functions influence each other. To construct the MMI network, we detected gene modules - sets of genes that have a higher probability of taking part in specific functions - and established a link between these modules. Subsequently, we constructed disease-related networks in the MMI network to understand inherent disease mechanisms and calculated comorbidity scores of disease pairs using Gene Ontology (GO) terms. Our results show that we can obtain further information on disease mechanisms by considering interactions between functional modules instead of between genes. In addition, we verified that predicted comorbid relationships of disease pairs based on the MMI network are more significant than those based on the protein-protein interaction (PPI) network. This study can be useful to elucidate the mechanisms underlying comorbidities for further study, which will provide a broader insight into the pathogenesis of diseases.
Affiliation(s)
- Soyoun Hwang
  - Department of IT Convergence Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
- Taekeon Lee
  - Department of Computer Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
- Youngmi Yoon
  - Department of Computer Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
19
20
Nonreconstruction of high-dimensional stochastic block model with bounded degree. Stat Probab Lett 2020. [DOI: 10.1016/j.spl.2019.108675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
21
Huang J, Hou Y, Li Y. Efficient community detection algorithm based on higher-order structures in complex networks. Chaos (Woodbury, N.Y.) 2020; 30:023114. [PMID: 32113221 DOI: 10.1063/1.5130523] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/04/2019] [Accepted: 01/14/2020] [Indexed: 06/10/2023]
Abstract
Assigning communities in a complex network so that nodes in a community are tightly connected on the basis of higher-order connectivity patterns, such as motifs, is a challenging problem. In this paper, we develop an efficient algorithm that detects communities based on higher-order structures. Our algorithm can also detect communities based on a signed motif, a colored motif, a weighted motif, as well as multiple motifs. We also introduce stochastic block models on the basis of higher-order structures. Then, we test our community detection algorithm on real-world networks and computer-generated graphs drawn from the stochastic block models. The results of the tests indicate that our community detection algorithm is effective at identifying communities on the basis of higher-order connectivity patterns.
Affiliation(s)
- Jinyu Huang
  - College of Computer Science, Sichuan University of Science and Engineering, Zigong 643000, People's Republic of China
- Yani Hou
  - College of Computer Science, Sichuan University of Science and Engineering, Zigong 643000, People's Republic of China
- Yuansong Li
  - College of Computer Science, Sichuan University of Science and Engineering, Zigong 643000, People's Republic of China
22
23
Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein–protein interaction networks. Brief Bioinform 2019; 21:1531-1548. [DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 02/04/2023] [Open Access]
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organizations of cells. With the increment of genome-scale protein–protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review on the state-of-the-art computational methods in the field of protein complex identification, especially focusing on the newly developed approaches. The computational methods are organized into three categories, including cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of different methods are discussed, and then, the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
24
Xie D, Yi Y, Zhou J, Li X, Wu H. A novel temporal protein complexes identification framework based on density–distance and heuristic algorithm. Neural Comput Appl 2019. [DOI: 10.1007/s00521-018-3660-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
25
|
Omony J, de Jong A, Kok J, van Hijum SAFT. Reconstruction and inference of the Lactococcus lactis MG1363 gene co-expression network. PLoS One 2019; 14:e0214868. [PMID: 31116749 PMCID: PMC6530827 DOI: 10.1371/journal.pone.0214868] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/15/2018] [Accepted: 03/21/2019] [Indexed: 01/30/2023] Open
Abstract
Lactic acid bacteria are Gram-positive bacteria used throughout the world in many industrial applications for their acidification, flavor and texture formation attributes. One of the species, Lactococcus lactis, is employed for the production of fermented milk products like cheese, buttermilk and quark. It ferments lactose to lactic acid and, thus, helps improve the shelf life of the products. Many physiological and transcriptome studies have been performed in L. lactis in order to comprehend and improve its biotechnological assets. Using large amounts of transcriptome data to understand and predict the behavior of biological processes in bacterial or other cell types is a complex task. Gene networks enable predicting gene behavior and function in the context of transcriptionally linked processes. We reconstruct and present the gene co-expression network (GCN) for the most widely studied L. lactis strain, MG1363, using publicly available transcriptome data. Several methods exist to generate and judge the quality of GCNs. Different reconstruction methods lead to networks with varying structural properties, consequently altering gene clusters. We compared the structural properties of the MG1363 GCNs generated by five methods, namely Pearson correlation, Spearman correlation, GeneNet, Weighted Gene Co-expression Network Analysis (WGCNA), and Sparse PArtial Correlation Estimation (SPACE). Using SPACE, we generated an L. lactis MG1363 GCN and assessed its quality using modularity and structural and biological criteria. The L. lactis MG1363 GCN has structural properties similar to those of the gold-standard networks of Escherichia coli K-12 and Bacillus subtilis 168. We showcase that the network can be used to mine for genes with similar expression profiles that are also generally linked to the same biological process.
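Of the five reconstruction methods the abstract lists, Pearson correlation is the simplest to illustrate. The following is a minimal sketch, not the authors' pipeline: the function names, the toy expression values, and the 0.8 cut-off are all assumptions made for illustration, and the abstract does not say how edges were thresholded.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return {(gene_a, gene_b): r} for every gene pair with |r| >= threshold.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Toy expression profiles (hypothetical values, one entry per condition).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 1.0, 3.5, 1.2],
}
edges = coexpression_edges(profiles)
```

In this toy input only geneA and geneB end up connected. Partial-correlation methods such as SPACE differ in that they try to remove indirect associations, which a plain pairwise correlation cannot do.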
Affiliation(s)
- Jimmy Omony
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
| | - Anne de Jong
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
| | - Jan Kok
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
|
26
|
Chen J, Liu J, Calhoun VD. The Translational Potential of Neuroimaging Genomic Analyses To Diagnosis And Treatment In The Mental Disorders. Proceedings of the IEEE. Institute of Electrical and Electronics Engineers 2019; 107:912-927. [PMID: 32051642 PMCID: PMC7015534 DOI: 10.1109/jproc.2019.2913145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/15/2023]
Abstract
Imaging genomics focuses on characterizing genomic influence on the variation of neurobiological traits, holding promise for illuminating the pathogenesis, reforming the diagnostic system, and precision medicine of mental disorders. This paper aims to provide an overall picture of the current status of neuroimaging-genomic analyses in mental disorders, and how we can increase their translational potential into clinical practice. The review is organized around three perspectives. (a) Towards reliability, generalizability and interpretability, where we summarize the multivariate models and discuss the considerations and trade-offs of using these methods and how reliable findings may be reached, to serve as ground for further delineation. (b) Towards improved diagnosis, where we outline the advantages and challenges of constructing a dimensional transdiagnostic model and how imaging genomic analyses map into this framework to aid in deconstructing heterogeneity and achieving an optimal stratification of patients that better inform treatment planning. (c) Towards improved treatment. Here we highlight recent efforts and progress in elucidating the functional annotations that bridge between genomic risk and neurobiological abnormalities, in detecting genomic predisposition and prodromal neurodevelopmental changes, as well as in identifying imaging genomic biomarkers for predicting treatment response. Providing an overview of the challenges and promises, this review hopefully motivates imaging genomic studies with multivariate, dimensional and transdiagnostic designs for generalizable and interpretable findings that facilitate development of personalized treatment.
Affiliation(s)
- Jiayu Chen
- The Mind Research Network, Albuquerque, NM 87106 USA
| | - Jingyu Liu
- The Mind Research Network, Albuquerque, NM 87106 USA, and also with the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131 USA
| | - Vince D Calhoun
- The Mind Research Network, Albuquerque, NM 87106 USA, and also with the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131 USA
|
27
|
Mei JP, Lv H, Yang L, Li Y. Clustering for heterogeneous information networks with extended star-structure. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-019-00626-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
|
28
|
Gligorijevic V, Panagakis Y, Zafeiriou S. Non-Negative Matrix Factorizations for Multiplex Network Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019; 41:928-940. [PMID: 29993651 DOI: 10.1109/tpami.2018.2821146] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/08/2023]
Abstract
Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains. One of the most important aspects of network analysis is community detection or network clustering. Until recently, the major focus has been on discovering community structure in single (i.e., monoplex) networks. However, with the advent of relational data with multiple modalities, multiplex networks, i.e., networks composed of multiple layers representing different aspects of relations, have emerged. Consequently, community detection in multiplex networks, i.e., detecting clusters of nodes shared by all layers, has become a new challenge. In this paper, we propose Network Fusion for Composite Community Extraction (NF-CCE), a new class of algorithms, based on four different non-negative matrix factorization models, capable of extracting composite communities in multiplex networks. Each algorithm works in two steps: first, it finds a non-negative, low-dimensional feature representation of each network layer; then, it fuses the feature representation of layers into a common non-negative, low-dimensional feature representation via collective factorization. The composite clusters are extracted from the common feature representation. We demonstrate the superior performance of our algorithms over the state-of-the-art methods on various types of multiplex networks, including biological, social, economic, citation, phone communication, and brain multiplex networks.
|
29
|
Xu B, Guan J, Wang Y, Wang Z. Essential Protein Detection by Random Walk on Weighted Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:377-387. [PMID: 28504946 DOI: 10.1109/tcbb.2017.2701824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/07/2023]
Abstract
Essential proteins are critical to the development and survival of cells. Identification of essential proteins is helpful for understanding the minimal set of required genes in a living cell and for designing new drugs. To detect essential proteins, various computational methods have been proposed based on protein-protein interaction (PPI) networks. However, protein interaction data obtained by high-throughput experiments usually contain high false positives, which negatively impacts the accuracy of essential protein detection. Moreover, most existing studies focused on the local information of proteins in PPI networks, while ignoring the influence of indirect protein interactions on essentiality. In this paper, we propose a novel method, called Essentiality Ranking (EssRank in short), to boost the accuracy of essential protein detection. To deal with the inaccuracy of PPI data, confidence scores of interactions are evaluated by integrating various biological information. Weighted edge clustering coefficient (WECC), considering both interaction confidence scores and network topology, is proposed to calculate edge weights in PPI networks. The weight of each node is evaluated by the sum of WECC values of its linking edges. A random walk method, making use of both direct and indirect protein interactions, is then employed to calculate protein essentiality iteratively. Experimental results on the yeast PPI network show that EssRank outperforms most existing methods, including the most commonly-used centrality measures (SC, DC, BC, CC, IC, and EC), topology based methods (DMNC and NC) and the data integrating method IEW.
|
30
|
Abstract
Network structures, consisting of nodes and edges, have applications in almost all subjects. A set of nodes is called a community if the nodes have strong interrelations. Industries (including cell phone carriers and online social media companies) need community structures to allocate network resources and provide proper and accurate services. However, most detection algorithms are derived independently, which is arduous and even unnecessary. Although recent research shows that a general detection method that serves all purposes does not exist, we believe that there is a general procedure for deriving detection algorithms. In this paper, we present such a general scheme. We mainly focus on two types of networks: transmission networks and similarity networks. We reduce them to a unified graph model, based on which we propose a method to define and detect community structures. Finally, we also give a demonstration to show how our design scheme works.
|
31
|
Luecken MD, Page MJT, Crosby AJ, Mason S, Reinert G, Deane CM. CommWalker: correctly evaluating modules in molecular networks in light of annotation bias. Bioinformatics 2019; 34:994-1000. [PMID: 29112702 PMCID: PMC5860269 DOI: 10.1093/bioinformatics/btx706] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/14/2016] [Accepted: 11/02/2017] [Indexed: 11/24/2022] Open
Abstract
Motivation: Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. Results: We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker’s ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker are similarly co-expressed as those accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. Availability and implementation: The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M D Luecken
- Department of Statistics, University of Oxford, Oxford, UK
- Doctoral Training Centre, University of Oxford, Oxford, UK
| | - M J T Page
- Department of Informatics, UCB Pharma, Slough, UK
| | - A J Crosby
- Immunology Therapeutic Area, UCB Pharma, Slough, UK
| | - S Mason
- Immunology Therapeutic Area, UCB Pharma, Slough, UK
| | - G Reinert
- Department of Statistics, University of Oxford, Oxford, UK
| | - C M Deane
- Department of Statistics, University of Oxford, Oxford, UK
- Doctoral Training Centre, University of Oxford, Oxford, UK
- To whom correspondence should be addressed.
|
32
|
Rai A, Shinde P, Jalan S. Network spectra for drug-target identification in complex diseases: new guns against old foes. Applied Network Science 2018; 3:51. [PMID: 30596144 PMCID: PMC6297166 DOI: 10.1007/s41109-018-0107-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/29/2018] [Accepted: 10/30/2018] [Indexed: 05/07/2023]
Abstract
The fundamental understanding of altered complex molecular interactions in a diseased condition is the key to its cure. The overall functioning of these molecules resembles a juggler's play in the cell orchestra, and anticipating these relationships among the molecules is one of the greatest challenges in modern biology and medicine. Network science has turned out to provide a successful and simple platform to understand complex interactions among healthy and diseased tissues. Furthermore, much information about the structure and dynamics of a network is concealed in the eigenvalues of its adjacency matrix. In this review, we illustrate rapid advancements in the field of network science in combination with spectral graph theory that enable us to uncover the complexities of various diseases. Interpretations offered by the network science approach have yielded insights into molecular relationships and have reported novel drug targets and biomarkers in various complex diseases.
Affiliation(s)
- Aparna Rai
- Aushadhi Open Innovation Programme, Indian Institute of Technology Guwahati, Guwahati, 781039 India
| | - Pramod Shinde
- Discipline of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552 India
| | - Sarika Jalan
- Discipline of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552 India
- Complex Systems Lab, Discipline of Physics, Indian Institute of Technology Indore, Khandwa Road, Indore, 453552 India
- Lobachevsky University, Gagarin avenue 23, Nizhny Novgorod, 603950 Russia
|
33
|
Jiao P, Yu W, Wang W, Li X, Sun Y. Exploring temporal community structure and constant evolutionary pattern hiding in dynamic networks. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.03.065] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
|
34
|
Chakraborty A, Kichikawa Y, Iino T, Iyetomi H, Inoue H, Fujiwara Y, Aoyama H. Hierarchical communities in the walnut structure of the Japanese production network. PLoS One 2018; 13:e0202739. [PMID: 30157210 PMCID: PMC6114793 DOI: 10.1371/journal.pone.0202739] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/20/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022] Open
Abstract
This paper studies the structure of the Japanese production network, which includes one million firms and five million supplier-customer links. This study finds that this network forms a tightly-knit structure with a core giant strongly connected component (GSCC) surrounded by IN and OUT components constituting two half-shells of the GSCC, which we call a walnut structure because of its shape. The hierarchical structure of the communities is studied by the Infomap method, and most of the irreducible communities are found to be at the second level. The composition of some of the major communities, including overexpressions regarding their industrial or regional nature, and the connections that exist between the communities are studied in detail. The findings obtained here cause us to question the validity and accuracy of using the conventional input-output analysis, which is expected to be useful when firms in the same sectors are highly connected to each other.
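The GSCC / IN / OUT split described in this abstract can be sketched on a toy directed graph. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the example edges are hypothetical, edges are read as supplier to customer, and the SCC search used here is a simple quadratic one that would not scale to a million firms.

```python
from collections import defaultdict


def reachable(start_nodes, adjacency):
    """Return all nodes reachable from start_nodes by following adjacency."""
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph into its largest SCC, IN, OUT, and the rest."""
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))

    # The SCC of a node is the set of nodes it reaches that also reach it.
    unassigned, largest = set(nodes), set()
    while unassigned:
        node = next(iter(unassigned))
        scc = reachable([node], succ) & reachable([node], pred)
        unassigned -= scc
        if len(scc) > len(largest):
            largest = scc

    out_part = reachable(largest, succ) - largest  # reached from the core
    in_part = reachable(largest, pred) - largest   # can reach the core
    others = nodes - largest - in_part - out_part
    return largest, in_part, out_part, others


# Hypothetical supplier -> customer links.
edges = [("s1", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")]
gscc, in_part, out_part, others = bow_tie(edges)
```

Here the cycle a, b, c forms the core, s1 feeds into it (IN), d is fed by it (OUT), and the disconnected pair x, y falls outside all three.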
Affiliation(s)
- Abhijit Chakraborty
- Graduate School of Simulation Studies, The University of Hyogo, Kobe, Japan
| | | | - Takashi Iino
- Faculty of Science, Niigata University, Niigata, Japan
| | | | - Hiroyasu Inoue
- Graduate School of Simulation Studies, The University of Hyogo, Kobe, Japan
| | - Yoshi Fujiwara
- Graduate School of Simulation Studies, The University of Hyogo, Kobe, Japan
| | - Hideaki Aoyama
- Graduate School of Science, Kyoto University, Kyoto, Japan
|
35
|
Bao W, Michailidis G. Core community structure recovery and phase transition detection in temporally evolving networks. Sci Rep 2018; 8:12938. [PMID: 30154531 PMCID: PMC6113337 DOI: 10.1038/s41598-018-29964-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/23/2018] [Accepted: 07/16/2018] [Indexed: 11/08/2022] Open
Abstract
Community detection in time series networks represents a timely and significant research topic due to its applications in a broad range of scientific fields, including biology, social sciences and engineering. In this work, we introduce methodology to address this problem, based on a decomposition of the network adjacency matrices into low-rank components that capture the community structure and sparse & dense noise perturbation components. It is further assumed that the low-rank structure exhibits sharp changes (phase transitions) at certain epochs that our methodology successfully detects and identifies. The latter is achieved by averaging the low-rank component over time windows, which in turn enables us to precisely select the correct rank and monitor its evolution over time and thus identify the phase transition epochs. The methodology is illustrated on both synthetic networks generated by various network formation models, as well as the Kuramoto model of coupled oscillators and on real data reflecting the US Senate's voting record from 1979-2014. In the latter application, we identify that party polarization exhibited a sharp change and increased after 1993, a finding broadly concordant with the political science literature on the subject.
Affiliation(s)
- Wei Bao
- Department of Physics, University of Michigan, Ann Arbor, USA
| | - George Michailidis
- Department of Statistics and the Informatics Institute, University of Florida, Gainesville, USA.
|
36
|
Liu W, Ma L, Jeon B, Chen L, Chen B. A Network Hierarchy-Based method for functional module detection in protein-protein interaction networks. J Theor Biol 2018; 455:26-38. [PMID: 29981337 DOI: 10.1016/j.jtbi.2018.06.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/28/2018] [Revised: 06/27/2018] [Accepted: 06/29/2018] [Indexed: 02/02/2023]
Abstract
In the post-genomic era, one of the important tasks is to identify protein complexes and functional modules from high-throughput protein-protein interaction data, so that we can systematically analyze and understand the molecular functions and biological processes of cells. Although many functional module detection methods have been proposed, how to design correct and efficient functional module detection algorithms is still a challenging and important scientific problem in computational biology. In this paper, we present a novel Network Hierarchy-Based method to detect functional modules in PPI networks (named NHB-FMD). NHB-FMD first constructs the hierarchy tree corresponding to the PPI network and then encodes the tree so that a genetic algorithm can be employed to obtain the hierarchy tree with Maximum Likelihood. After that, functional module partitioning is performed based on it and the best partitioning is selected as the result. Experimental results on real PPI networks have shown that the proposed algorithm not only significantly outperforms the state-of-the-art methods but also can detect protein modules more effectively and accurately.
Affiliation(s)
- Wei Liu
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China; The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China; School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea.
| | - Liangyu Ma
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
| | - Byeungwoo Jeon
- School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
| | - Ling Chen
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
| | - Bolun Chen
- The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
|
37
|
Garcia JO, Ashourvan A, Muldoon SF, Vettel JM, Bassett DS. Applications of community detection techniques to brain graphs: Algorithmic considerations and implications for neural function. Proceedings of the IEEE. Institute of Electrical and Electronics Engineers 2018; 106:846-867. [PMID: 30559531 PMCID: PMC6294140 DOI: 10.1109/jproc.2017.2786710] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 05/08/2023]
Abstract
The human brain can be represented as a graph in which neural units such as cells or small volumes of tissue are heterogeneously connected to one another through structural or functional links. Brain graphs are parsimonious representations of neural systems that have begun to offer fundamental insights into healthy human cognition, as well as its alteration in disease. A critical open question in network neuroscience lies in how neural units cluster into densely interconnected groups that can provide the coordinated activity that is characteristic of perception, action, and adaptive behaviors. Tools that have proven particularly useful for addressing this question are community detection approaches, which can identify communities or modules: groups of neural units that are densely interconnected with other units in their own group but sparsely interconnected with units in other groups. In this paper, we describe a common community detection algorithm known as modularity maximization, and we detail its applications to brain graphs constructed from neuroimaging data. We pay particular attention to important algorithmic considerations, especially in recent extensions of these techniques to graphs that evolve in time. After recounting a few fundamental insights that these techniques have provided into brain function, we highlight potential avenues of methodological advancements for future studies seeking to better characterize the patterns of coordinated activity in the brain that accompany human behavior. This tutorial provides a naive reader with an introduction to theoretical considerations pertinent to the generation of brain graphs, an understanding of modularity maximization for community detection, a resource of statistical measures that can be used to characterize community structure, and an appreciation of the usefulness of these approaches in uncovering behaviorally-relevant network dynamics in neuroimaging data.
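The quantity that modularity maximization optimizes can be made concrete with a short sketch. The abstract does not state the objective, so the standard Newman-Girvan modularity for an undirected, unweighted graph is assumed here; the function name and the toy graph are hypothetical, and extensions such as a resolution parameter or multilayer coupling are left out.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the graph into its two triangles scores Q = 5/14, while lumping every node into one community scores 0. A maximization routine searches over partitions for the highest such score.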
Affiliation(s)
- Javier O Garcia
- U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005 USA
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Mathematics and CDSE Program, University at Buffalo, Buffalo, NY 14260 USA
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, 93106 USA
- Department of Electrical & Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Arian Ashourvan
- U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005 USA
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Mathematics and CDSE Program, University at Buffalo, Buffalo, NY 14260 USA
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, 93106 USA
- Department of Electrical & Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Sarah F Muldoon
- U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005 USA
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Mathematics and CDSE Program, University at Buffalo, Buffalo, NY 14260 USA
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, 93106 USA
- Department of Electrical & Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Jean M Vettel
- U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005 USA
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Mathematics and CDSE Program, University at Buffalo, Buffalo, NY 14260 USA
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, 93106 USA
- Department of Electrical & Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Danielle S Bassett
- U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005 USA
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Mathematics and CDSE Program, University at Buffalo, Buffalo, NY 14260 USA
- Department of Psychological & Brain Sciences, University of California, Santa Barbara, CA, 93106 USA
- Department of Electrical & Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
|
38
|
Community Detection in Complex Networks via Clique Conductance. Sci Rep 2018; 8:5982. [PMID: 29654276 PMCID: PMC5899156 DOI: 10.1038/s41598-018-23932-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 11/08/2017] [Accepted: 03/20/2018] [Indexed: 11/08/2022] Open
Abstract
Network science plays a central role in understanding and modeling complex systems in many areas including physics, sociology, biology, computer science, economics, politics, and neuroscience. One of the most important features of networks is community structure, i.e., clustering of nodes that are locally densely interconnected. Communities reveal the hierarchical organization of nodes, and detecting communities is of great importance in the study of complex systems. Most existing community-detection methods consider low-order connection patterns at the level of individual links. But high-order connection patterns, at the level of small subnetworks, are generally not considered. In this paper, we develop a novel community-detection method based on cliques, i.e., local complete subnetworks. The proposed method overcomes the deficiencies of previous similar community-detection methods by considering the mathematical properties of cliques. We apply the proposed method to computer-generated graphs and real-world network datasets. When applied to networks with known community structure, the proposed method detects the structure with high fidelity and sensitivity. When applied to networks with no a priori information regarding community structure, the proposed method yields insightful results revealing the organization of these complex networks. We also show that the proposed method is guaranteed to detect near-optimal clusters in the bipartition case.
|
39
|
Chang H, Feng Z, Ren Z. Community Detection Using Dual-Representation Chemical Reaction Optimization. IEEE Transactions on Cybernetics 2017; 47:4328-4341. [PMID: 28113998 DOI: 10.1109/tcyb.2016.2607782] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Many complex networks have been shown to have community structures. Detecting those structures is very important for understanding the organization and function of networks. Because this problem is NP-hard, it is appropriate to resort to evolutionary algorithms. Chemical reaction optimization (CRO) is a novel evolutionary algorithm inspired by the interactions among molecules during chemical reactions. In this paper, we propose a CRO variant named dual-representation CRO (DCRO) to address the community detection problem. DCRO encodes a solution in two representations: one is locus-based and the other is vector-based. The former representation can ensure the validity of a solution and fits for diversification search, and the latter is convenient for intensification search. We thus design two operators for CRO based on these two representations. Their cooperation enables DCRO to achieve a good balance between exploration and exploitation. Experimental results on synthetic and real-life networks show that DCRO can find community structures close to the actual ones and is capable of achieving solutions comparable to several state-of-the-art methods.
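The abstract does not spell out the locus-based representation. Assuming it follows the locus-based adjacency encoding commonly used in evolutionary community detection, where position i stores one node that node i is linked to, a solution can be decoded into communities as sketched below. The function name and the example genotype are hypothetical, and this is not claimed to be DCRO's actual decoder.

```python
def decode_locus(genotype):
    """Decode a locus-based genotype into communities.

    genotype[i] = j means node i is linked to node j; the connected
    components of these links are taken as the communities.
    """
    parent = list(range(len(genotype)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(genotype):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for i in range(len(genotype)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical genotype over six nodes: 0->1, 1->2, 2->0, 3->4, 4->3, 5->4.
communities = decode_locus([1, 2, 0, 4, 3, 4])
```

Under this assumed encoding every genotype decodes to some partition of the nodes, which is consistent with the abstract's remark that the locus-based form ensures validity. A vector-based form would instead store one community label per node.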
|
40
|
|
41
|
Jogwar SS, Daoutidis P. Community-based synthesis of distributed control architectures for integrated process networks. Chem Eng Sci 2017. [DOI: 10.1016/j.ces.2017.06.043] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 10/19/2022]
42
Identifying protein complexes in PPI network using non-cooperative sequential game. Sci Rep 2017; 7:8410. [PMID: 28827597 PMCID: PMC5566343 DOI: 10.1038/s41598-017-08760-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/27/2017] [Accepted: 07/13/2017] [Indexed: 11/14/2022] Open
Abstract
Identifying protein complexes from protein-protein interaction (PPI) networks is an important and challenging task in computational biology, as it helps in better understanding cellular mechanisms in various organisms. In this paper we propose a noncooperative sequential game based model for protein complex detection from PPI networks. The key hypothesis is that protein complex formation is driven by a mechanism that eventually optimizes the number of interactions within the complex, leading to a dense subgraph. The hypothesis is drawn from the observed network property known as small world. The proposed multi-player game model translates the hypothesis into the game strategies. The Nash equilibrium of the game corresponds to a network partition where each protein either belongs to a complex or forms a singleton cluster. We further propose an algorithm to find the Nash equilibrium of the sequential game. The exhaustive experiment on synthetic benchmark and real-life yeast networks evaluates the structural as well as biological significance of the network partitions.
43
Fu J, Zhang W, Wu J. Identification of leader and self-organizing communities in complex networks. Sci Rep 2017; 7:704. [PMID: 28386089 PMCID: PMC5429660 DOI: 10.1038/s41598-017-00718-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/19/2016] [Accepted: 03/09/2017] [Indexed: 11/22/2022] Open
Abstract
Community or module structure is a natural property of complex networks. Leader communities and self-organizing communities have been introduced recently to characterize networks and understand how communities arise in complex networks. However, identification of leader and self-organizing communities is technically challenging since no adequate quantification has been developed to properly separate the two types of communities. We introduced a new measure, called ratio of node degree variances, to distinguish leader communities from self-organizing communities, and developed a statistical model to quantitatively characterize the two types of communities. We experimentally studied the power and robustness of the new method on several real-world networks in combination with some of the existing community identification methods. Our results revealed that social networks and citation networks contain more leader communities, whereas technological networks such as power grid networks have more self-organizing communities. Moreover, our results also indicated that self-organizing communities tend to be smaller than leader communities. The results shed new light on community formation and module structures in complex systems.
Affiliation(s)
- Jingcheng Fu
  - School of Mathematics, Shandong University, Jinan, 250100, China
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO, 63130, USA
- Weixiong Zhang
  - College of Math and Computer Science, Institute for Systems Biology, Jianghan University, Wuhan, 430056, China
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO, 63130, USA
- Jianliang Wu
  - School of Mathematics, Shandong University, Jinan, 250100, China
44
Xu Y, Guo M, Liu X, Wang C, Liu Y, Liu G. Identify bilayer modules via pseudo-3D clustering: applications to miRNA-gene bilayer networks. Nucleic Acids Res 2016; 44:e152. [PMID: 27484480 PMCID: PMC5741208 DOI: 10.1093/nar/gkw679] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/18/2015] [Revised: 06/30/2016] [Accepted: 07/18/2016] [Indexed: 12/11/2022] Open
Abstract
Module identification is a frequently used approach for mining local structures with more significance in global networks. Recently, a wide variety of bilayer networks are emerging to characterize the more complex biological processes. In the light of special topological properties of bilayer networks and the accompanying challenges, there is yet no effective method aiming at bilayer module identification to probe the modular organizations from the more inspiring bilayer networks. To this end, we proposed the pseudo-3D clustering algorithm, which starts from extracting initial non-hierarchically organized modules and then iteratively deciphers the hierarchical organization of modules according to a bottom-up strategy. Specifically, a modularity function for bilayer modules was proposed to facilitate the algorithm reporting the optimal partition that gives the most accurate characterization of the bilayer network. Simulation studies demonstrated its robustness and outperformance against alternative competing methods. Specific applications to both the soybean and human miRNA-gene bilayer networks demonstrated that the pseudo-3D clustering algorithm successfully identified the overlapping, hierarchically organized and highly cohesive bilayer modules. The analyses on topology, functional and human disease enrichment and the bilayer subnetwork involved in soybean fat biosynthesis provided both the theoretical and biological evidence supporting the effectiveness and robustness of pseudo-3D clustering algorithm.
Affiliation(s)
- Yungang Xu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Guojun Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
45
Takaguchi T, Yoshida Y. Cycle and flow trusses in directed networks. ROYAL SOCIETY OPEN SCIENCE 2016; 3:160270. [PMID: 28018610 PMCID: PMC5180108 DOI: 10.1098/rsos.160270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/19/2016] [Accepted: 10/31/2016] [Indexed: 06/06/2023]
Abstract
When we represent real-world systems as networks, the directions of links often convey valuable information. Finding module structures that respect link directions is one of the most important tasks for analysing directed networks. Although many notions of a directed module have been proposed, no consensus has been reached. This lack of consensus results partly because there might exist distinct types of modules in a single directed network, whereas most previous studies focused on an independent criterion for modules. To address this issue, we propose a generic notion of the so-called truss structures in directed networks. Our definition of truss is able to extract two distinct types of trusses, named the cycle truss and the flow truss, from a unified framework. By applying the method for finding trusses to empirical networks obtained from a wide range of research fields, we find that most real networks contain both cycle and flow trusses. In addition, the abundance of (and the overlap between) the two types of trusses may be useful to characterize module structures in a wide variety of empirical networks. Our findings shed light on the importance of simultaneously considering different types of modules in directed networks.
Affiliation(s)
- Taro Takaguchi
  - National Institute of Informatics, ERATO, Kawarabayashi Large Graph Project, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430 Tokyo, Japan
  - JST, ERATO, Kawarabayashi Large Graph Project, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430 Tokyo, Japan
- Yuichi Yoshida
  - National Institute of Informatics, ERATO, Kawarabayashi Large Graph Project, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430 Tokyo, Japan
  - Preferred Infrastructure, 1-6-1 Otemachi, Chiyoda-ku, 100-0004 Tokyo, Japan
46
Tang X, Hu X, Yang X, Fan Y, Li Y, Hu W, Liao Y, Zheng MC, Peng W, Gao L. Predicting diabetes mellitus genes via protein-protein interaction and protein subcellular localization information. BMC Genomics 2016; 17 Suppl 4:433. [PMID: 27535125 PMCID: PMC5001230 DOI: 10.1186/s12864-016-2795-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Diabetes mellitus, characterized by hyperglycemia as a result of insufficient production of or reduced sensitivity to insulin, poses a growing threat to public health. It is a heterogeneous disorder with multiple etiologies, comprising type 1 diabetes, type 2 diabetes, gestational diabetes and other forms. Diabetes-associated protein/gene prediction is a key step toward understanding the cellular mechanisms related to diabetes mellitus. Compared with experimental methods, computational prediction of candidate proteins/genes is cheaper and requires less effort. Protein-protein interaction (PPI) data produced by high-throughput technology have been used to prioritize candidate disease genes/proteins. However, false interactions in the PPI data seriously hurt the performance of computational methods. To address this problem, new methods have been developed to identify candidate disease genes/proteins by integrating biological data from other sources.
RESULTS In this study, a new framework called PDMG is proposed to predict candidate disease genes/proteins. First, weighted networks are built by combining subcellular localization information with PPI data. To form the weighted networks, the importance of each compartment is evaluated based on the number of interacting proteins in that compartment, because different compartments play very different roles in cell activities and some compartments are more important than others. Based on the evaluated compartments, the interactions between proteins are scored and the weighted PPI networks are constructed. Second, the known disease genes are extracted from the OMIM database as seed genes to expand disease-specific networks based on the weighted networks. Third, the weights between a protein and its neighbors in the disease-related networks are added together, and the sum is taken as the score of the protein. Finally, the proteins are ranked in descending order of their scores. The top-ranked candidate proteins are considered to be associated with the disease and are potential disease-related proteins. Various types of data, such as type 2 diabetes-associated genes, subcellular localizations and protein interactions, are used to test the PDMG method.
CONCLUSIONS The results show that the proteins/genes that functionally exert a direct influence on diabetes are consistently placed at the top of the ranking. PDMG expands and ranks 445 candidate proteins from a seed set comprising the original 27 type 2 diabetes proteins. Out of the top 27 proteins, 14 are known type 2 diabetes proteins. Literature extracted from the PubMed database indicates that, of the 13 novel proteins, 8 are associated with diabetes.
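The scoring and ranking steps described in the abstract above (sum the weights between a protein and its neighbors, then sort in descending order) can be illustrated with a minimal sketch. This is not the PDMG implementation: the function name `rank_proteins`, the edge-list input format, the protein labels, and the example weights are all hypothetical, and the compartment-based weighting and seed-based network expansion are assumed to have been done already.

```python
from collections import defaultdict


def rank_proteins(weighted_edges):
    """Rank proteins by the sum of the weights of their incident edges.

    weighted_edges: iterable of (protein_a, protein_b, weight) tuples
    describing an undirected, already-weighted disease-related network
    (hypothetical input format, not taken from the paper).
    Returns a list of (protein, score) pairs, highest score first;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for a, b, weight in weighted_edges:
        # Each edge contributes its weight to both endpoints.
        scores[a] += weight
        scores[b] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network with made-up weights, for illustration only.
edges = [
    ("P1", "P2", 0.5),
    ("P1", "P3", 0.25),
    ("P2", "P3", 1.0),
    ("P3", "P4", 0.25),
]
ranking = rank_proteins(edges)
for protein, score in ranking:
    print(protein, score)
```

In this toy network, P2 and P3 tie for the highest score and P4, with a single weak interaction, ranks last. A real pipeline would take only the top-ranked candidates forward for validation.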
Affiliation(s)
- Xiwei Tang
  - School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
  - College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA
  - College of Computer, National University of Defense Technology, Changsha, 410073, China
- Xiaohua Hu
  - College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA
  - School of Computer, Central China Normal University, Hubei, 430079, China
- Xuejun Yang
  - College of Computer, National University of Defense Technology, Changsha, 410073, China
- Yetian Fan
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, 116023, China
- Yongfan Li
  - School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Wei Hu
  - School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Yongzhong Liao
  - School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Ming Cai Zheng
  - School of Information Science and Engineering, Hunan First Normal University, Changsha, 410205, China
- Wei Peng
  - Computer Center, Kunming University of Science and Technology, Kunming, 650500, China
- Li Gao
  - School of Computer, Central China Normal University, Hubei, 430079, China
47
Alcalá-Corona SA, Velázquez-Caldelas TE, Espinal-Enríquez J, Hernández-Lemus E. Community Structure Reveals Biologically Functional Modules in MEF2C Transcriptional Regulatory Network. Front Physiol 2016; 7:184. [PMID: 27252657 PMCID: PMC4878384 DOI: 10.3389/fphys.2016.00184] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/09/2016] [Accepted: 05/06/2016] [Indexed: 01/04/2023] Open
Abstract
Gene regulatory networks are useful to understand the activity behind the complex mechanisms in transcriptional regulation. A main goal in contemporary biology is using such networks to understand the systemic regulation of gene expression. In this work, we carried out a systematic study of a transcriptional regulatory network derived from a comprehensive selection of all potential transcription factor interactions downstream from MEF2C, a human transcription factor master regulator. By analyzing the connectivity structure of this network, we were able to find different biologically functional processes and specific biochemical pathways statistically enriched in communities of genes within the network; these processes are related to cell signaling, cell cycle and metabolism. In this way we further support the hypothesis that structural properties of biological networks encode an important part of their functional behavior in eukaryotic cells.
Affiliation(s)
- Sergio A Alcalá-Corona
  - Computational Genomics Department, National Institute of Genomic Medicine, Mexico City, Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Department, National Institute of Genomic Medicine, Mexico City, Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Department, National Institute of Genomic Medicine, Mexico City, Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
48
Henriques R, Madeira SC. BicNET: Flexible module discovery in large-scale biological networks using biclustering. Algorithms Mol Biol 2016; 11:14. [PMID: 27213009 PMCID: PMC4875761 DOI: 10.1186/s13015-016-0074-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/11/2015] [Accepted: 04/22/2016] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Although state-of-the-art biclustering algorithms are able to discover modules with varying coherency and robustness to noise, their application for the discovery of non-dense modules in biological networks has been poorly explored and it is further challenged by efficiency bottlenecks.
METHODS This work proposes Biclustering NETworks (BicNET), a biclustering algorithm to discover non-trivial yet coherent modules in weighted biological networks with heightened efficiency. Three major contributions are provided. First, we motivate the relevance of discovering network modules given by constant, symmetric, plaid and order-preserving biclustering models. Second, we propose an algorithm to discover these modules and to robustly handle noisy and missing interactions. Finally, we provide new searches to tackle time and memory bottlenecks by effectively exploring the inherent structural sparsity of network data.
RESULTS Results in synthetic network data confirm the soundness, efficiency and superiority of BicNET. The application of BicNET on protein interaction and gene interaction networks from yeast, E. coli and Human reveals new modules with heightened biological significance.
CONCLUSIONS BicNET is, to our knowledge, the first method enabling the efficient unsupervised analysis of large-scale network data for the discovery of coherent modules with parameterizable homogeneity.
Affiliation(s)
- Rui Henriques
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Sara C. Madeira
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
49
Ye H, Luo H, Ng HW, Meehan J, Ge W, Tong W, Hong H. Applying network analysis and Nebula (neighbor-edges based and unbiased leverage algorithm) to ToxCast data. ENVIRONMENT INTERNATIONAL 2016; 89-90:81-92. [PMID: 26826365 DOI: 10.1016/j.envint.2016.01.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/11/2015] [Revised: 01/08/2016] [Accepted: 01/13/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND ToxCast data have been used to develop models for predicting in vivo toxicity. To predict the in vivo toxicity of a new chemical using a ToxCast data based model, its ToxCast bioactivity data are needed but not normally available. The capability of predicting ToxCast bioactivity data is necessary to fully utilize ToxCast data in the risk assessment of chemicals.
OBJECTIVES We aimed to understand and elucidate the relationships between the chemicals and bioactivity data of the assays in ToxCast and to develop a network analysis based method for predicting ToxCast bioactivity data.
METHODS We conducted modularity analysis on a quantitative network constructed from ToxCast data to explore the relationships between the assays and chemicals. We further developed Nebula (neighbor-edges based and unbiased leverage algorithm) for predicting ToxCast bioactivity data.
RESULTS Modularity analysis on the network constructed from ToxCast data yielded seven modules. Assays and chemicals in the seven modules were distinct. Leave-one-out cross-validation yielded a Q² of 0.5416, indicating ToxCast bioactivity data can be predicted by Nebula. Prediction domain analysis showed some types of ToxCast assay data could be more reliably predicted by Nebula than others.
CONCLUSIONS Network analysis is a promising approach to understand ToxCast data. Nebula is an effective algorithm for predicting ToxCast bioactivity data, helping fully utilize ToxCast data in the risk assessment of chemicals.
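The abstract above reports a leave-one-out cross-validated Q² without giving its formula. As a reading aid, the conventional definition is shown below; this is an assumption, since the abstract does not say which variant the authors used. Here $y_i$ is the observed value for item $i$, $\hat{y}_{(i)}$ is the prediction for item $i$ from a model built without it, and $\bar{y}$ is the mean of the observed values (all symbols are introduced here for illustration).

```latex
Q^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a Q² of 0.5416 would mean that the summed squared leave-one-out prediction error is about 46% of the total sum of squares around the mean, since 1 − 0.5416 = 0.4584.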
Affiliation(s)
- Hao Ye
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Heng Luo
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Hui Wen Ng
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Joe Meehan
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Weigong Ge
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
50
Kouhsar M, Zare-Mirakabad F, Jamali Y. WCOACH: Protein complex prediction in weighted PPI networks. Genes Genet Syst 2016; 90:317-24. [PMID: 26781082 DOI: 10.1266/ggs.15-00032] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023] Open
Abstract
Protein complexes are aggregates of protein molecules that play important roles in biological processes. Detecting protein complexes from protein-protein interaction (PPI) networks is one of the most challenging problems in computational biology, and many computational methods have been developed to solve this problem. Generally, these methods yield high false positive rates. In this article, a semantic similarity measure between proteins, based on Gene Ontology (GO) structure, is applied to weigh PPI networks. Consequently, one of the well-known methods, COACH, has been improved to be compatible with weighted PPI networks for protein complex detection. The new method, WCOACH, is compared to the COACH, ClusterOne, IPCA, CORE, OH-PIN, HC-PIN and MCODE methods on several PPI networks such as DIP, Krogan, Gavin 2002 and MIPS. WCOACH can be applied as a fast and high-performance algorithm to predict protein complexes in weighted PPI networks. All data and programs are freely available at http://bioinformatics.aut.ac.ir/wcoach.
Affiliation(s)
- Morteza Kouhsar
  - Department of Computer Science, School of Mathematical Sciences, Tarbiat Modares University