1
Baek B, Jang E, Park S, Park SH, Williams DR, Jung DW, Lee H. Integrated drug response prediction models pinpoint repurposed drugs with effectiveness against rhabdomyosarcoma. PLoS One 2024; 19:e0295629. [PMID: 38277404 PMCID: PMC10817174 DOI: 10.1371/journal.pone.0295629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/24/2023] [Indexed: 01/28/2024] Open
Abstract
Targeted therapies for inhibiting the growth of cancer cells or inducing apoptosis are urgently needed for effective rhabdomyosarcoma (RMS) treatment. However, identifying cancer-targeting compounds with few side effects, among the many potential compounds, is expensive and time-consuming. A computational approach to reduce the number of potential candidate drugs can facilitate the discovery of attractive lead compounds. To address this and obtain reliable predictions of novel cell-line-specific drugs, we apply prediction models that have the potential to improve drug discovery approaches for RMS treatment. The results of two prediction models were combined into an ensemble and validated via in vitro experiments. The computational models were trained using data extracted from the Genomics of Drug Sensitivity in Cancer database and tested on two RMS cell lines to select potential RMS drug candidates. Among 235 candidate drugs, 22 were selected based on the results of the computational approach, and three candidate drugs (NSC207895, vorinostat, and belinostat) were identified that showed selective effectiveness in RMS cell lines in vitro via the induction of apoptosis. Our in vitro experiments demonstrate that the proposed methods can effectively identify and repurpose drugs for treating RMS.
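The abstract does not say how the two models' outputs were combined. As a minimal sketch of one common option, the Python snippet below averages the per-model ranks of each drug and keeps the best-ranked candidates. The function name `ensemble_rank`, the drug names, and the convention that a lower predicted score means a stronger predicted response are all assumptions made for illustration, not the authors' method.

```python
def ensemble_rank(scores_a, scores_b, top_k):
    """Average the per-model ranks of each drug and return the top_k drug names.

    scores_a / scores_b: dict mapping drug name -> predicted score from one model.
    Assumption: a lower score (e.g. a lower predicted IC50) means a stronger response.
    """
    def ranks(scores):
        # Rank 0 is the drug with the lowest predicted score.
        ordered = sorted(scores, key=scores.get)
        return {drug: position for position, drug in enumerate(ordered)}

    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    # Only drugs scored by both models can be ensembled.
    common = set(rank_a) & set(rank_b)
    mean_rank = {drug: (rank_a[drug] + rank_b[drug]) / 2 for drug in common}
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    return sorted(mean_rank, key=lambda drug: (mean_rank[drug], drug))[:top_k]


if __name__ == "__main__":
    model_a = {"drugA": 0.2, "drugB": 0.9, "drugC": 0.5}
    model_b = {"drugA": 0.4, "drugB": 0.8, "drugC": 0.1}
    print(ensemble_rank(model_a, model_b, top_k=2))
```

Rank averaging is used here because it does not require the two models to produce scores on the same scale; averaging raw scores would be an equally plausible choice when the scales match.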
Affiliation(s)
- Bin Baek
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Eunmi Jang
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Sejin Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Sung-Hye Park
  - Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Institute of Neuroscience, Seoul National University Hospital, Seoul, Republic of Korea
- Darren Reece Williams
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Da-Woon Jung
  - School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
  - Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
2
Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet 2023; 14:1222346. [PMID: 37811150 PMCID: PMC10556742 DOI: 10.3389/fgene.2023.1222346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 09/11/2023] [Indexed: 10/10/2023] Open
Abstract
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease-gene network, mouse gene-phenotype network, human-mouse homologous gene network, and human protein-protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
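The final step, scoring disease pairs from their embedding vectors, can be illustrated with a short sketch. The abstract does not state which similarity measure DapBCH uses, so cosine similarity is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_disease_pairs(embeddings):
    """Score every unordered disease pair and return them sorted by descending similarity.

    embeddings: dict mapping disease name -> feature vector (list of floats),
    e.g. the output of a graph-embedding step.
    """
    names = sorted(embeddings)
    pairs = [
        ((a, b), cosine_similarity(embeddings[a], embeddings[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(pairs, key=lambda item: -item[1])


if __name__ == "__main__":
    toy = {"d1": [1.0, 0.0], "d2": [2.0, 0.0], "d3": [0.0, 1.0]}
    for pair, score in rank_disease_pairs(toy):
        print(pair, round(score, 3))
```

High-scoring pairs that are not already linked in the known disease network would be the candidate associations in a link-prediction setting.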
Affiliation(s)
- Wanqi Shi
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Hailin Feng
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Jian Li
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Tongcun Liu
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Zhe Liu
  - College of Media Engineering, Zhejiang University of Media and Communications, Hangzhou, Zhejiang, China
3
Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:217-237. [PMID: 34951849 DOI: 10.1109/tcbb.2021.3138142] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The ability to identify and characterize not only the protein-protein interactions but also their internal modular organization through network analysis is fundamental for understanding the mechanisms of biological processes at the molecular level. Indeed, the detection of the network communities can enhance our understanding of the molecular basis of disease pathology, and promote drug discovery and disease treatment in personalized medicine. This work gives an overview of recent computational methods for the detection of protein complexes and functional modules in protein-protein interaction networks, also providing a focus on some of its applications. We propose a systematic reformulation of frequently adopted taxonomies for these methods, also proposing new categories to keep up with the most recent research. We review the literature of the last five years (2017-2021) and provide links to existing data and software resources. Finally, we survey recent works exploiting module identification and analysis, in the context of a variety of disease processes for biomarker identification and therapeutic target detection. Our review provides the interested reader with an up-to-date and self-contained view of the existing research, with links to state-of-the-art literature and resources, as well as hints on open issues and future research directions in complex detection and its applications.
4
Chitra U, Park TY, Raphael BJ. NetMix2: A Principled Network Propagation Algorithm for Identifying Altered Subnetworks. J Comput Biol 2022; 29:1305-1323. [PMID: 36525308 PMCID: PMC9917315 DOI: 10.1089/cmb.2022.0336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022] Open
Abstract
A standard paradigm in computational biology is to leverage interaction networks as prior knowledge in analyzing high-throughput biological data, where the data give a score for each vertex in the network. One classical approach is the identification of altered subnetworks, or subnetworks of the interaction network that have both outlier vertex scores and a defined network topology. One class of algorithms for identifying altered subnetworks searches for high-scoring subnetworks in subnetwork families with simple topological constraints, such as connected subnetworks, and has sound statistical guarantees. A second class of algorithms employs network propagation (the smoothing of vertex scores over the network using a random walk or diffusion process) and utilizes the global structure of the network. However, network propagation algorithms often rely on ad hoc heuristics that lack a rigorous statistical foundation. In this work, we unify the subnetwork family and network propagation approaches by deriving the propagation family, a subnetwork family that approximates the sets of vertices ranked highly by network propagation approaches. We introduce NetMix2, a principled algorithm for identifying altered subnetworks from a wide range of subnetwork families. When using the propagation family, NetMix2 combines the advantages of the subnetwork family and network propagation approaches. NetMix2 outperforms other methods, including network propagation, on simulated data, pan-cancer somatic mutation data, and genome-wide association data from multiple human diseases.
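Network propagation, as described in the abstract, smooths vertex scores over the network with a random walk. The sketch below shows a plain random walk with restart on a small undirected graph; it illustrates the generic propagation step only and is not the NetMix2 algorithm. The function name, the restart probability of 0.5, and the toy graph are assumptions.

```python
def random_walk_with_restart(adjacency, scores, restart=0.5, iterations=100):
    """Propagate vertex scores with the update p <- restart * p0 + (1 - restart) * W p.

    adjacency: dict mapping each vertex to a list of its neighbours; every
               neighbour must itself be a key of the dict.
    scores:    dict mapping vertex -> initial score (missing vertices count as 0).
    W spreads each vertex's current mass equally over its neighbours.
    """
    nodes = list(adjacency)
    initial = {n: float(scores.get(n, 0.0)) for n in nodes}
    current = dict(initial)
    for _ in range(iterations):
        # Every vertex first receives its restart share of the initial score.
        updated = {n: restart * initial[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # An isolated vertex has nowhere to send its mass, so that mass is dropped.
                continue
            share = (1 - restart) * current[u] / len(neighbours)
            for v in neighbours:
                updated[v] += share
        current = updated
    return current


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    smoothed = random_walk_with_restart(path, {"a": 1.0})
    print({node: round(value, 4) for node, value in smoothed.items()})
```

On the three-vertex path above, the score seeded at `a` decays with distance from the seed, which is the smoothing behaviour that propagation-based methods rank vertices by.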
Affiliation(s)
- Uthsav Chitra
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Tae Yoon Park
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Benjamin J. Raphael
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
5
Winkler S, Winkler I, Figaschewski M, Tiede T, Nordheim A, Kohlbacher O. De novo identification of maximally deregulated subnetworks based on multi-omics data with DeRegNet. BMC Bioinformatics 2022; 23:139. [PMID: 35439941 PMCID: PMC9020058 DOI: 10.1186/s12859-022-04670-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/05/2021] [Accepted: 03/29/2022] [Indexed: 12/14/2022] Open
Abstract
Background: With a growing amount of (multi-)omics data being available, the extraction of knowledge from these datasets is still a difficult problem. Classical enrichment-style analyses require predefined pathways or gene sets that are tested for significant deregulation to assess whether the pathway is functionally involved in the biological process under study. De novo identification of these pathways can reduce the bias inherent in predefined pathways or gene sets. At the same time, the definition and efficient identification of these pathways de novo from large biological networks is a challenging problem.
Results: We present a novel algorithm, DeRegNet, for the identification of maximally deregulated subnetworks on directed graphs based on deregulation scores derived from (multi-)omics data. DeRegNet can be interpreted as maximum likelihood estimation given a certain probabilistic model for de novo subgraph identification. We use fractional integer programming to solve the resulting combinatorial optimization problem. We show that the approach outperforms related algorithms on simulated data with known ground truths. On a publicly available liver cancer dataset, we show that DeRegNet can identify biologically meaningful subgraphs suitable for patient stratification. DeRegNet can also be used to find explicitly multi-omics subgraphs, which we demonstrate by presenting subgraphs with consistent methylation-transcription patterns. DeRegNet is freely available as open-source software.
Conclusion: The proposed algorithmic framework and its available implementation can serve as a valuable heuristic hypothesis generation tool contextualizing omics data within biomolecular networks.
Affiliation(s)
- Sebastian Winkler
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
  - International Max Planck Research School (IMPRS) "From Molecules to Organism", Tübingen, Germany
- Ivana Winkler
  - International Max Planck Research School (IMPRS) "From Molecules to Organism", Tübingen, Germany
  - Interfaculty Institute for Cell Biology (IFIZ), University of Tuebingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mirjam Figaschewski
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
- Thorsten Tiede
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
- Alfred Nordheim
  - Interfaculty Institute for Cell Biology (IFIZ), University of Tuebingen, Tübingen, Germany
  - Leibniz Institute on Aging (FLI), Jena, Germany
- Oliver Kohlbacher
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tübingen, Germany
  - Translational Bioinformatics, University Hospital Tuebingen, Tübingen, Germany
6
Smell Detection Agent Optimisation Framework and Systems Biology Approach to Detect Dys-Regulated Subnetwork in Cancer Data. Biomolecules 2021; 12:biom12010037. [PMID: 35053185 PMCID: PMC8774275 DOI: 10.3390/biom12010037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 11/23/2022] Open
Abstract
Network biology has become a key tool in unravelling the mechanisms of complex diseases. Detecting dys-regulated subnetworks from molecular networks is a task that needs efficient computational methods. In this work, we constructed an integrated network using gene interaction data as well as protein-protein interaction data of differentially expressed genes derived from microarray gene expression data. We considered the level of differential expression as well as the topological weight of proteins in the interaction network to quantify dys-regulation. Then, a nature-inspired Smell Detection Agent (SDA) optimisation algorithm is designed with multiple agents traversing various paths in the network. Finally, the algorithm provides a maximum weighted module as the optimum dys-regulated subnetwork. The analysis is performed for samples of triple-negative breast cancer as well as colorectal cancer. Biological significance analysis of module genes is also done to validate the results. The breast cancer subnetwork is found to contain (i) valid biomarkers including PIK3CA, PTEN, BRCA1, AR and EGFR; (ii) validated drug targets TOP2A, CDK4, HDAC1, IL6, BRCA1, HSP90AA1 and AR; (iii) synergistic drug targets EGFR and BIRC5. Moreover, based on the weight values assigned to nodes in the subnetwork, PLK1, CTNNB1, IGF1, AURKA, PCNA, HSPA4 and GAPDH are proposed as drug targets for further studies. For the colorectal cancer module, the analysis revealed the occurrence of approved drug targets TYMS, TOP1, BRAF and EGFR. Considering the higher weight values, HSP90AA1, CCNB1, AKT1 and CXCL8 are proposed as drug targets for experimentation. The derived subnetworks possess cancer-related pathways as well. The SDA-derived breast cancer subnetwork is compared with those of tools such as MCODE and Minimum Spanning Tree, and a higher enrichment (75%) of significant elements is observed. Thus, the proposed nature-inspired algorithm is a novel approach to derive the optimum dys-regulated subnetwork from a huge molecular network.
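The idea of weighting nodes by both differential expression and network topology, then extracting a high-weight connected module, can be sketched as follows. This is not the SDA algorithm: a simple greedy expansion is swapped in for the agent-based search, and the weight formula (absolute log fold change times degree), the function names, and the toy data are all assumptions made for illustration.

```python
def node_weights(adjacency, log_fold_change):
    """Hypothetical node weight: |log fold change| multiplied by node degree."""
    return {
        node: abs(log_fold_change.get(node, 0.0)) * len(neighbours)
        for node, neighbours in adjacency.items()
    }


def greedy_module(adjacency, weight, seed, size):
    """Grow a connected module from `seed` by repeatedly adding the heaviest neighbour.

    Stops when the module has `size` nodes or no neighbouring node is left.
    """
    module = {seed}
    while len(module) < size:
        frontier = {v for u in module for v in adjacency[u]} - module
        if not frontier:
            break
        # Sorting first makes tie-breaking deterministic (alphabetical).
        module.add(max(sorted(frontier), key=lambda v: weight[v]))
    return module


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
    weights = node_weights(graph, {"A": 1.0, "B": -0.5, "C": 2.0, "D": 3.0})
    start = max(sorted(weights), key=weights.get)
    print(sorted(greedy_module(graph, weights, start, size=3)))
```

A greedy search like this can get stuck in locally heavy regions, which is one motivation for population-based or multi-agent searches of the kind the abstract describes.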
7
Pasquier C, Robichon A. Temporal and sequential order of nonoverlapping gene networks unraveled in mated female Drosophila. Life Sci Alliance 2021; 5:5/2/e202101119. [PMID: 34844981 PMCID: PMC8645335 DOI: 10.26508/lsa.202101119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Revised: 11/11/2021] [Accepted: 11/12/2021] [Indexed: 12/13/2022] Open
Abstract
Mating triggers successive waves of temporal transcriptomic changes within independent gene networks in female Drosophila, suggesting a recruitment of interconnected modules that vanish in late life. In this study, we reanalyzed available datasets of gene expression changes in female Drosophila head induced by mating. Mated females present metabolic phenotypic changes and display behavioral characteristics that are not observed in virgin females, such as repulsion to male sexual aggressiveness, fidelity to food spots selected for oviposition, and restriction to the colonization of new niches. We characterize gene networks that play a role in female brain plasticity after mating using AMINE, a novel algorithm to find dysregulated modules of interacting genes. The uncovered networks of altered genes revealed a strong specificity for each successive period of life span after mating in the female head, with little conservation between them. This finding highlights a temporal order of recruitment of waves of interconnected genes which are apparently transiently modified: the first wave disappears before the emergence of the second wave in a reversible manner and ends with few consolidated gene expression changes at day 20. This analysis might document an extended field of a programmatic control of female phenotypic traits by male seminal fluid.
8
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022] Open
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever that cancer should be viewed not only as a disease of the genome but also as a disease of the cellular system. Integrative multilayer approaches are emerging as vigorous assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches, in which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations. Network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
  - Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
- Ulf Schmitz
  - Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
9
A multi-objective genetic algorithm to find active modules in multiplex biological networks. PLoS Comput Biol 2021; 17:e1009263. [PMID: 34460810 PMCID: PMC8452006 DOI: 10.1371/journal.pcbi.1009263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/23/2020] [Revised: 09/20/2021] [Accepted: 07/09/2021] [Indexed: 12/13/2022] Open
Abstract
The identification of subnetworks of interest (or active modules) by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated with its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease.
Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html.
Contact: anais.baudot@univ-amu.fr
Integrating different sources of biological information is a powerful way to uncover the functioning of biological systems. In network biology, in particular, integrating interaction data with expression profiles helps contextualize the networks and identify subnetworks of interest, also known as active modules. We here propose MOGAMUN, a multi-objective genetic algorithm that optimizes both the overall deregulation and the density to identify active modules, considering jointly multiple sources of biological interactions. We demonstrate the performance of MOGAMUN over state-of-the-art methods, and illustrate its usefulness in unveiling perturbed biological processes in Facio-Scapulo-Humeral muscular Dystrophy.
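Optimizing two objectives at once, here node score and interaction density, usually means keeping the candidates that no other candidate beats on both. The sketch below shows that Pareto-dominance filter on its own; it is not MOGAMUN's implementation, and the function name and the candidate modules with their objective values are hypothetical.

```python
def pareto_front(candidates):
    """Return the names of the non-dominated candidates, sorted alphabetically.

    candidates: dict mapping module name -> (average node score, density).
    Both objectives are maximised. A candidate is dominated when another one
    is at least as good on both objectives and strictly better on at least one.
    """
    front = []
    for name, (score, density) in candidates.items():
        dominated = any(
            other_score >= score
            and other_density >= density
            and (other_score > score or other_density > density)
            for other, (other_score, other_density) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


if __name__ == "__main__":
    modules = {"m1": (0.9, 0.2), "m2": (0.5, 0.8), "m3": (0.4, 0.1)}
    print(pareto_front(modules))
```

In the toy data, `m3` is worse than `m1` on both objectives and is discarded, while `m1` and `m2` trade score against density and both remain on the front.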
10
Suresh NT, Ravindran VE, Krishnakumar U. A Computational Framework to Identify Cross Association Between Complex Disorders by Protein-protein Interaction Network Analysis. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200724145434] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Objective: It is known that numerous complex disorders do not occur in isolation, which indicates a plausible set of shared causes common to several different diseases. Hence, analysis of comorbidity can be used to explore the association between several disorders. In this study, we propose a network-based computational approach in which genes are organized based on the topological characteristics of the constructed Protein-Protein Interaction Network (PPIN), followed by a network prioritization scheme, to identify distinctive key genes and biological pathways shared among diseases.
Methods: The proposed approach starts from a PPIN constructed from the genes of any randomly chosen disease in order to infer its associations with other diseases in terms of shared pathways, coexpression, co-occurrence, etc. First, proteins associated with a randomly chosen disease are identified. Second, the PPIN is organized through topological analysis to define hub genes. Finally, a prioritization algorithm generates a ranked list of newly predicted multimorbidity-associated proteins. Using Gene Ontology (GO), cellular pathways involving multimorbidity-associated proteins are mined.
Result and Conclusion: The proposed methodology is tested using three disorders, namely diabetes, obesity and blood pressure, at an atomic level, and the results suggest the comorbidity of other complex diseases that are associated, through shared proteins and pathways, with the proteins of the diseases in the present study. For diabetes, we obtained key genes such as GAPDH, TNF, IL6, AKT1, ALB, TP53, IL10, MAPK3, TLR4 and EGF, with key pathways such as the P53 pathway, VEGF signaling pathway, Ras pathway, Interleukin signaling pathway, Endothelin signaling pathway, Huntington disease, etc. Studies on other disorders such as obesity and blood pressure also revealed promising results.
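The topological step in the Methods, defining hub genes from the PPIN, can be illustrated with the simplest topological measure, node degree. The abstract does not say which measure the authors use, so degree is an assumption, and the function name and the toy edge list (built from gene symbols mentioned above, with made-up interactions) are purely illustrative.

```python
from collections import Counter


def hub_genes(edges, top_k):
    """Return the top_k genes by degree in an undirected interaction network.

    edges: iterable of (gene_a, gene_b) pairs. Self-loops are ignored and
    duplicate edges (in either orientation) are counted once.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for gene in edge:
            degree[gene] += 1
    # Highest degree first; ties are broken alphabetically.
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:top_k]


if __name__ == "__main__":
    toy_edges = [
        ("TP53", "AKT1"),
        ("TP53", "IL6"),
        ("TP53", "TNF"),
        ("AKT1", "IL6"),
        ("IL6", "AKT1"),  # duplicate of the previous edge, counted once
    ]
    print(hub_genes(toy_edges, top_k=2))
```

Other centrality measures, such as betweenness or closeness, could be substituted for degree without changing the overall shape of the step.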
Affiliation(s)
- Nikhila T. Suresh
  - Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
- Vimina E. Ravindran
  - Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
- Ullattil Krishnakumar
  - Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
11
Liu W, Su Y, Li S, Chen H, Liu Y, Li X, Shen W, Zhong X, Wu F, Meng Q, Jiang X. Weighted gene coexpression network reveals downregulation of genes in bronchopulmonary dysplasia. Pediatr Pulmonol 2021; 56:392-399. [PMID: 33118673 DOI: 10.1002/ppul.25141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2020] [Revised: 09/14/2020] [Accepted: 10/09/2020] [Indexed: 11/12/2022]
Abstract
Background: Bronchopulmonary dysplasia (BPD) is a serious lung disease observed in premature infants, known to cause considerable morbidity and mortality. Its prognosis is influenced by a complex network of genetic interactions. In this study, we determined the potential key factors in the pathogenesis of this condition.
Methods: We constructed a scale-free gene coexpression network using weighted gene coexpression network analysis. The analysis was carried out on the GSE8586 dataset, which contains the expression profiles of umbilical cord tissue homogenates from 20 neonates with BPD and 34 unaffected controls.
Results: Our analysis identified one significantly downregulated coexpression module related to the BPD phenotype. It was significantly enriched in genes related to human T-cell leukemia virus infection and the mitogen-activated protein kinase pathway. In this module, the expression of the following four hub genes was significantly decreased in infants with BPD: Fos proto-oncogene (FOS), BTG antiproliferation factor 2 (BTG2), Jun proto-oncogene (JUN), and early growth response protein 1 (EGR1). The downregulation of these hub genes was verified in clinical samples derived from blood and umbilical cord tissue.
Conclusion: The decreased expression of FOS, BTG2, JUN, and EGR1 is associated with BPD, and these genes could therefore be used as biomarkers to diagnose early BPD.
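Weighted gene coexpression network analysis builds its network by raising the absolute correlation between two genes' expression profiles to a soft-thresholding power. The sketch below shows that single step for an unsigned network in plain Python; it is not the WGCNA R package, and the function names, the power value, and the toy profiles are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def soft_adjacency(x, y, beta=6):
    """Unsigned soft-threshold adjacency |cor(x, y)| ** beta between two expression profiles.

    beta is the soft-thresholding power; in practice it is chosen so that the
    resulting network is approximately scale-free. The default here is illustrative.
    """
    return abs(pearson(x, y)) ** beta


if __name__ == "__main__":
    gene_1 = [1.0, 2.0, 3.0, 4.0]
    gene_2 = [1.0, 3.0, 2.0, 4.0]
    print(round(soft_adjacency(gene_1, gene_2, beta=2), 4))
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, which avoids the information loss of a hard cutoff.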
Affiliation(s)
- Wangkai Liu
  - Department of Pediatrics, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yihua Su
  - Department of Ophthalmology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Sitao Li
  - Department of Pediatrics, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Haitian Chen
  - Department of Obstetrics, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yumei Liu
  - Department of Neonatology, Guangdong Academy of Medical Sciences, Guangdong Provincial People's Hospital, Guangzhou, China
- Xiaoyu Li
  - Department of Pediatrics, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Wei Shen
  - Department of Pediatrics, Southern Medical University, Guangzhou, Guangdong, China
- Xinqi Zhong
  - Department of Pediatrics, Third Affiliated Hospital, Guangzhou Medical University, Guangzhou, China
- Fan Wu
  - Department of Pediatrics, Third Affiliated Hospital, Guangzhou Medical University, Guangzhou, China
- Qiong Meng
  - Department of Pediatrics, Guangdong Second Provincial General Hospital, Guangzhou, China
- Xiaoyun Jiang
  - Department of Pediatrics, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
12
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. Molecular Plant 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries are key solutions for mitigating the current threats to agriculture posed by population growth and individual food preferences. Due to advances in high-throughput multi-omics technologies, we are stepping into an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies to explore genetic mysteries. We then introduce the methods for constructing and analyzing the Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which involves mining information hidden in Big Data to identify the genetic models or networks that control various traits, and we also provide a detailed procedure for the systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies utilizing the Interactome Big Data to improve crop yields and quality.
Affiliation(s)
- Leiming Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
  - Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
  - Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
13
Thistlethwaite LR, Petrosyan V, Li X, Miller MJ, Elsea SH, Milosavljevic A. CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models. PLoS Comput Biol 2021; 17:e1008550. [PMID: 33513132 PMCID: PMC7875364 DOI: 10.1371/journal.pcbi.1008550] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2020] [Revised: 02/10/2021] [Accepted: 11/16/2020] [Indexed: 01/17/2023] Open
Abstract
We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.
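The Maximum Clique score mentioned in this abstract can be illustrated with a small sketch. This is not the CTD algorithm; it is a brute-force toy, and the function name `largest_clique_in_subset` and the edge-list input format are assumptions made for illustration only.

```python
from itertools import combinations


def largest_clique_in_subset(edges, subset):
    """Return a largest group of nodes from `subset` that is fully connected in the graph.

    `edges` is an iterable of undirected (u, v) pairs (hypothetical input format).
    The search is exponential, so this is only suitable for toy-sized inputs.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    nodes = sorted(subset)
    # Try candidate sizes from largest to smallest and stop at the first clique found.
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            if all(b in adjacency.get(a, set()) for a, b in combinations(candidate, 2)):
                return set(candidate)
    return set()


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(largest_clique_in_subset(toy_edges, {"a", "b", "c", "d"}))
```

The score in this example is simply the size of the returned set. Scores that take edge weights into account, which the abstract identifies as the costly case, would need a different scoring function.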
Affiliation(s)
- Lillian R. Thistlethwaite
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Varduhi Petrosyan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Marcus J. Miller
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Sarah H. Elsea
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Aleksandar Milosavljevic
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| |
|
14
|
Ahmed KT, Park S, Jiang Q, Yeu Y, Hwang T, Zhang W. Network-based drug sensitivity prediction. BMC Med Genomics 2020; 13:193. [PMID: 33371891 PMCID: PMC7771088 DOI: 10.1186/s12920-020-00829-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/10/2020] [Accepted: 11/17/2020] [Indexed: 12/15/2022] Open
Abstract
Background: Drug sensitivity prediction and drug-responsive biomarker selection on high-throughput genomic data are critical steps in drug discovery. Many computational methods have been developed to serve this purpose, including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network in drug sensitivity prediction is investigated in this study. Methods: In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed, and both models integrate gene network information directly into the neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction. Results: In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to the deep neural network model; (4) the prediction performance is drug dependent and may relate to the drug’s mechanism of action.
Conclusions: The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimension, low-sample-size genomic datasets.
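As an illustration of what a gene co-expression network is, the sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. It is a generic construction, not the feature selection method of this paper; the function names, the dictionary input format, and the 0.8 threshold are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(expression, threshold=0.8):
    """Return (gene, gene, r) triples for gene pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples
    (hypothetical input format); `threshold` is an arbitrary illustrative cutoff.
    """
    genes = sorted(expression)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expression[g], expression[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges


if __name__ == "__main__":
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
    for g, h, r in coexpression_edges(toy):
        print(g, h, round(r, 3))
```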
Affiliation(s)
- Khandakar Tanvir Ahmed
- Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
| | - Sunho Park
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
| | - Qibing Jiang
- Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
| | - Yunku Yeu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
| | - TaeHyun Hwang
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9211 Euclid Ave, Cleveland, OH, 44106, USA
| | - Wei Zhang
- Department of Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA.
| |
|
15
|
Chitoiu L, Dobranici A, Gherghiceanu M, Dinescu S, Costache M. Multi-Omics Data Integration in Extracellular Vesicle Biology-Utopia or Future Reality? Int J Mol Sci 2020; 21:ijms21228550. [PMID: 33202771 PMCID: PMC7697477 DOI: 10.3390/ijms21228550] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/15/2020] [Revised: 11/10/2020] [Accepted: 11/11/2020] [Indexed: 12/15/2022] Open
Abstract
Extracellular vesicles (EVs) are membranous structures derived from the endosomal system or generated by plasma membrane shedding. Due to their composition of DNA, RNA, proteins, and lipids, EVs have garnered a lot of attention as an essential mechanism of cell-to-cell communication, with various implications in physiological and pathological processes. EVs are not only a highly heterogeneous population by means of size and biogenesis, but they are also a source of diverse, functionally rich biomolecules. Recent advances in high-throughput processing of biological samples have facilitated the development of databases comprised of characteristic genomic, transcriptomic, proteomic, metabolomic, and lipidomic profiles for EV cargo. Despite the in-depth approach used to map functional molecules in EV-mediated cellular cross-talk, few integrative methods have been applied to analyze the molecular interplay in these targeted delivery systems. New perspectives arise from the field of systems biology, where accounting for heterogeneity may lead to finding patterns in an apparently random pool of data. In this review, we map the biological and methodological causes of heterogeneity in EV multi-omics data and present current applications or possible statistical methods for integrating such data while keeping track of the current bottlenecks in the field.
Affiliation(s)
- Leona Chitoiu
- Ultrastructural Pathology and Bioimaging Laboratory, ‘Victor Babeș’ National Institute of Pathology, Bucharest 050096, Romania; (L.C.); (M.G.)
| | - Alexandra Dobranici
- Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania; (A.D.); (M.C.)
| | - Mihaela Gherghiceanu
- Ultrastructural Pathology and Bioimaging Laboratory, ‘Victor Babeș’ National Institute of Pathology, Bucharest 050096, Romania; (L.C.); (M.G.)
- Department of Cellular, Molecular Biology and Histology, ‘Carol Davila’ University of Medicine and Pharmacy, Bucharest 050474, Romania
| | - Sorina Dinescu
- Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania; (A.D.); (M.C.)
- Research Institute of the University of Bucharest, University of Bucharest, Bucharest 050663, Romania
- Correspondence:
| | - Marieta Costache
- Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania; (A.D.); (M.C.)
- Research Institute of the University of Bucharest, University of Bucharest, Bucharest 050663, Romania
| |
|
16
|
Lucchetta M, Pellegrini M. Finding disease modules for cancer and COVID-19 in gene co-expression networks with the Core&Peel method. Sci Rep 2020; 10:17628. [PMID: 33077837 PMCID: PMC7573595 DOI: 10.1038/s41598-020-74705-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/08/2020] [Accepted: 09/30/2020] [Indexed: 12/21/2022] Open
Abstract
Genes are organized in functional modules (or pathways); thus, their action and their dysregulation in diseases may be better understood by identifying the modules most affected by the disease (also known as disease modules or active subnetworks). We describe how an algorithm based on the Core&Peel method is used to detect disease modules in co-expression networks of genes. We first validate Core&Peel for the general task of functional module detection by comparison with 42 methods participating in the Disease Module Identification DREAM challenge. Next, we use four specific disease test cases (colorectal cancer, prostate cancer, asthma, and rheumatoid arthritis), four state-of-the-art algorithms (ModuleDiscoverer, Degas, KeyPathwayMiner, and ClustEx), and several pathway databases to validate the proposed algorithm. Core&Peel is the only method able to find significant associations of the predicted disease module with known validated relevant pathways for all four diseases. Moreover, for the two cancer datasets, Core&Peel detects eight further relevant pathways not discovered by the other methods used in the comparative analysis. Finally, we apply Core&Peel and other methods to explore the transcriptional response of human cells to SARS-CoV-2 infection, finding supporting evidence for drug repositioning efforts at a pre-clinical level.
Affiliation(s)
- Marta Lucchetta
- Institute of Informatics and Telematics (IIT), CNR, Pisa, 56124, Italy
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, 53100, Italy
| | - Marco Pellegrini
- Institute of Informatics and Telematics (IIT), CNR, Pisa, 56124, Italy.
| |
|
17
|
Rappoport N, Safra R, Shamir R. MONET: Multi-omic module discovery by omic selection. PLoS Comput Biol 2020; 16:e1008182. [PMID: 32931516 PMCID: PMC7518594 DOI: 10.1371/journal.pcbi.1008182] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/28/2020] [Revised: 09/25/2020] [Accepted: 07/22/2020] [Indexed: 01/19/2023] Open
Abstract
Recent advances in experimental biology allow creation of datasets where several genome-wide data types (called omics) are measured per sample. Integrative analysis of multi-omic datasets in general, and clustering of samples in such datasets specifically, can improve our understanding of biological processes and discover different disease subtypes. In this work we present MONET (Multi Omic clustering by Non-Exhaustive Types), which takes a unique approach to multi-omic clustering. MONET discovers modules of similar samples, such that each module is allowed to have a clustering structure for only a subset of the omics. This approach differs from most existing multi-omic clustering algorithms, which assume a common structure across all omics, and from several recent algorithms that model distinct cluster structures. We tested MONET extensively on simulated data, on an image dataset, and on ten multi-omic cancer datasets from TCGA. Our analysis shows that MONET compares favorably with other multi-omic clustering methods. We demonstrate MONET's biological and clinical relevance by analyzing its results for Ovarian Serous Cystadenocarcinoma. We also show that MONET is robust to missing data, can cluster genes in multi-omic datasets, and can reveal modules of cell types in single-cell multi-omic data. Our work shows that MONET is a valuable tool that can provide complementary results to those provided by existing algorithms for multi-omic analysis.
Affiliation(s)
- Nimrod Rappoport
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Roy Safra
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| |
|
18
|
Yao H, Shi Y, Guan J, Zhou S. Accurately Detecting Protein Complexes by Graph Embedding and Combining Functions with Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:777-787. [PMID: 30736004 DOI: 10.1109/tcbb.2019.2897769] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/09/2023]
Abstract
Identifying protein complexes is helpful for understanding cellular functions and designing drugs. In the last decades, many computational methods have been proposed based on detecting dense subgraphs or subnetworks in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents satisfactory detection results from being obtained directly from PINs, because most existing methods rely mainly on topological information for network partitioning. In this paper, we propose a new approach for protein complex detection that merges topological information from PINs with functional information about proteins. We first split proteins into a number of protein groups from the perspective of protein function by using FunCat data. Then, for each of the resulting protein groups, we calculate two protein-protein similarity matrices, one computed by graph embedding over a PIN and the other from GO terms, and combine these two matrices into an integrated similarity matrix. Following that, we cluster the proteins in each group based on the corresponding integrated similarity matrix and obtain a number of small protein clusters. We map these clusters of proteins onto the PIN and get a number of connected subgraphs. Finally, after a round of merging overlapping subgraphs, we obtain the detected complexes. We conduct empirical evaluation on four PPI datasets (Collins, Gavin, Krogan, and Wiphi) with two complex benchmarks (CYC2008 and MIPS). Experimental results show that our method performs better than the state-of-the-art methods.
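The step that combines an embedding-based similarity with a GO-term similarity can be sketched as follows. The abstract does not state how the two matrices are computed or combined, so everything here is an assumption: cosine similarity for embeddings, Jaccard overlap for annotation sets, a convex combination with a hypothetical weight `alpha`, and placeholder protein and term names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(s, t):
    """Jaccard overlap of two sets (0.0 if both are empty)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def integrated_similarity(embeddings, annotations, alpha=0.5):
    """Blend topological and functional similarity for every ordered protein pair.

    `embeddings` maps protein -> vector and `annotations` maps protein -> set of terms;
    `alpha` weights the embedding similarity (assumed, not taken from the paper).
    """
    proteins = sorted(embeddings)
    return {
        (p, q): alpha * cosine(embeddings[p], embeddings[q])
        + (1 - alpha) * jaccard(annotations[p], annotations[q])
        for p in proteins
        for q in proteins
    }


if __name__ == "__main__":
    emb = {"P1": [1.0, 0.0], "P2": [1.0, 0.0], "P3": [0.0, 1.0]}
    ann = {"P1": {"term_a", "term_b"}, "P2": {"term_a", "term_b"}, "P3": {"term_c"}}
    sim = integrated_similarity(emb, ann)
    print(sim[("P1", "P2")], sim[("P1", "P3")])
```

The resulting dictionary plays the role of the integrated similarity matrix; any clustering routine that accepts pairwise similarities could then be applied within each protein group.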
|
19
|
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS: A Journal of Integrative Biology 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower researchers to analyze, integrate, and make sense of big data. There are currently a large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of the most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes, and Search Tool for the Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Achraf El Allali
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Dilek Colak
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| |
|
20
|
Salviato E, Djordjilović V, Chiogna M, Romualdi C. SourceSet: A graphical model approach to identify primary genes in perturbed biological pathways. PLoS Comput Biol 2019; 15:e1007357. [PMID: 31652275 PMCID: PMC6834292 DOI: 10.1371/journal.pcbi.1007357] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2018] [Revised: 11/06/2019] [Accepted: 08/23/2019] [Indexed: 11/24/2022] Open
Abstract
Topological gene-set analysis has emerged as a powerful means for omic data interpretation. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. Here, we propose a new method, called SourceSet, able to distinguish between the primary and the secondary dysregulation within a Gaussian graphical model context. The proposed method compares gene expression profiles in the control and in the perturbed condition and detects the differences in both the mean and the covariance parameters with a series of likelihood ratio tests. The resulting evidence is used to infer the primary and the secondary set, i.e. the genes responsible for the primary dysregulation, and the genes affected by the perturbation through network propagation. The proposed method demonstrates high specificity and sensitivity in different simulated scenarios and on several real biological case studies. In order to fit into the more traditional pathway analysis framework, SourceSet R package also extends the analysis from a single to multiple pathways and provides several graphical outputs, including Cytoscape visualization to browse the results.
The rapid increase in omic studies has created a need to understand the biological implications of their results. Gene-set analysis has emerged as a powerful means for gaining such understanding, evolving in the last decade from the classical enrichment analysis to the more powerful topological approaches. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. This distinction is crucial for network medicine, where the prioritization of the effect of biological perturbations may help in the molecular understanding of drug treatments and diseases. Here we propose a new method, called SourceSet, able to distinguish between primary and secondary dysregulation within a graphical model context, demonstrating a high specificity and sensitivity in different simulated scenarios and on real biological case studies.
Affiliation(s)
- Elisa Salviato
- IFOM - The FIRC Institute of Molecular Oncology, Milan, Italy
- * E-mail: (ES); (CR)
| | | | - Monica Chiogna
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
| | - Chiara Romualdi
- Department of Biology, University of Padova, Padova, Italy
- * E-mail: (ES); (CR)
| |
|
21
|
Ulgen E, Ozisik O, Sezerman OU. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front Genet 2019; 10:858. [PMID: 31608109 PMCID: PMC6773876 DOI: 10.3389/fgene.2019.00858] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 09/17/2018] [Accepted: 08/16/2019] [Indexed: 12/13/2022] Open
Abstract
Pathway analysis is often the first choice for studying the mechanisms underlying a phenotype. However, conventional methods for pathway analysis do not take into account complex protein-protein interaction information, resulting in incomplete conclusions. Previously, numerous approaches that utilize protein-protein interaction information to enhance pathway analysis yielded superior results compared to conventional methods. Here, we present pathfindR, another approach exploiting protein-protein interaction information and the first R package for active-subnetwork-oriented pathway enrichment analyses for class comparison omics experiments. Using the list of genes obtained from an omics experiment comparing two groups of samples, pathfindR identifies active subnetworks in a protein-protein interaction network. It then performs pathway enrichment analyses on these identified subnetworks. To further reduce the complexity, it provides functionality for clustering the resulting pathways. Moreover, through a scoring function, the overall activity of each pathway in each sample can be estimated. We illustrate the capabilities of our pathway analysis method on three gene expression datasets and compare our results with those obtained from three popular pathway analysis tools. The results demonstrate that literature-supported disease-related pathways ranked higher in our approach compared to the others. Moreover, pathfindR identified additional pathways relevant to the conditions that were not identified by other tools, including pathways named after the conditions.
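For readers unfamiliar with pathway enrichment, the sketch below computes a one-sided hypergeometric (over-representation) p-value for the overlap between a gene set, such as the genes in an active subnetwork, and a pathway. This is a common textbook choice and an assumption here; the abstract does not specify which test pathfindR uses, and the function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, selected_size, overlap):
    """Return P(X >= overlap) when `selected_size` genes are drawn without replacement
    from `universe_size` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 4 in the pathway, 3 selected, all 3 in the pathway.
    print(enrichment_p_value(10, 4, 3, 3))
```

In practice such p-values are computed for many pathways at once, so a multiple-testing correction would normally follow.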
Affiliation(s)
- Ege Ulgen
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
| | - Ozan Ozisik
- Department of Computer Engineering, Electrical & Electronics Faculty, Yildiz Technical University, Istanbul, Turkey
| | - Osman Ugur Sezerman
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
| |
|
22
|
Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data. Eur J Nucl Med Mol Imaging 2019; 46:2722-2730. [PMID: 31203421 DOI: 10.1007/s00259-019-04382-9] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 05/12/2019] [Accepted: 05/28/2019] [Indexed: 12/13/2022]
Abstract
Artificial intelligence (AI) is currently regaining enormous interest due to the success of machine learning (ML), and in particular deep learning (DL). Image analysis, and thus radiomics, strongly benefits from this research. However, effectively and efficiently integrating diverse clinical, imaging, and molecular profile data is necessary to understand complex diseases, and to achieve accurate diagnosis in order to provide the best possible treatment. In addition to the need for sufficient computing resources, suitable algorithms, models, and data infrastructure, three important aspects are often neglected: (1) the need for multiple independent, sufficiently large and, above all, high-quality data sets; (2) the need for domain knowledge and ontologies; and (3) the requirement for multiple networks that provide relevant relationships among biological entities. While one will always get results out of high-dimensional data, all three aspects are essential to provide robust training and validation of ML models, to provide explainable hypotheses and results, and to achieve the necessary trust in AI and confidence for clinical applications.
|
23
|
The EXPANDER Integrated Platform for Transcriptome Analysis. J Mol Biol 2019; 431:2398-2406. [PMID: 31100387 DOI: 10.1016/j.jmb.2019.05.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/07/2019] [Revised: 05/07/2019] [Accepted: 05/07/2019] [Indexed: 11/21/2022]
Abstract
Genome-wide analysis of cellular transcriptomes using RNA-seq or expression arrays is a mainstay of current biological and biomedical research. EXPANDER (EXPression ANalyzer and DisplayER) is a comprehensive software package for analysis of expression data, with built-in support for 18 different organisms. It is designed as a "one-stop shop" platform for transcriptomic analysis, allowing for execution of all analysis steps starting with a gene expression data matrix. Analyses offered include low-level preprocessing and normalization, differential expression analysis, clustering, bi-clustering, supervised grouping, high-level functional and pathway enrichment tests, and network and motif analyses. A variety of options is offered for each step, using established algorithms, including many developed and published by our laboratory. EXPANDER has been continuously developed since 2003, having to date over 18,000 downloads and 540 citations. One of the innovations in the recent version is support for combined analysis of gene expression and ChIP-seq data to enhance the inference of transcriptional networks and their functional interpretation. EXPANDER implements cutting-edge algorithms and makes them accessible to users through a user-friendly interface and intuitive visualizations. It is freely available to users at http://acgt.cs.tau.ac.il/expander/.
|
24
|
Nguyen H, Shrestha S, Tran D, Shafi A, Draghici S, Nguyen T. A Comprehensive Survey of Tools and Software for Active Subnetwork Identification. Front Genet 2019; 10:155. [PMID: 30891064 PMCID: PMC6411791 DOI: 10.3389/fgene.2019.00155] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 09/25/2018] [Accepted: 02/13/2019] [Indexed: 12/13/2022] Open
Abstract
A recent focus of computational biology has been to integrate the complementary information available in molecular profiles as well as in multiple network databases in order to identify connected regions that show significant changes under different conditions. This allows for capturing dynamic and condition-specific mechanisms of the underlying phenomena and disease stages. Here we review 22 such integrative approaches for active module identification published over the last decade. This article only focuses on tools that are currently available for use and are well-maintained. We compare these methods focusing on their primary features, integrative abilities, network structures, mathematical models, and implementations. We also provide real-world scenarios in which these methods have been successfully applied, as well as highlight outstanding challenges in the field that remain to be addressed. The main objective of this review is to help potential users and researchers to choose the best method that is suitable for their data and analysis purpose.
Affiliation(s)
- Hung Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
| | - Sangam Shrestha
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
| | - Duc Tran
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
| | - Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, MI, United States
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, United States
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, United States
| | - Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
| |
|
25
|
Kusonmano K, Halle MK, Wik E, Hoivik EA, Krakstad C, Mauland KK, Tangen IL, Berg A, Werner HMJ, Trovik J, Øyan AM, Kalland KH, Jonassen I, Salvesen HB, Petersen K. Identification of highly connected and differentially expressed gene subnetworks in metastasizing endometrial cancer. PLoS One 2018; 13:e0206665. [PMID: 30383835 PMCID: PMC6211718 DOI: 10.1371/journal.pone.0206665] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/05/2018] [Accepted: 10/17/2018] [Indexed: 12/22/2022] Open
Abstract
We have identified nine highly connected and differentially expressed gene subnetworks between aggressive primary tumors and metastatic lesions in endometrial carcinomas. We implemented a novel pipeline combining gene set and network approaches, which here allows integration of protein-protein interactions and gene expression data. The resulting subnetworks are significantly associated with disease progression across tumor stages, from complex atypical hyperplasia through primary tumors to metastatic lesions. The nine subnetworks include genes related to metastasizing features such as epithelial-mesenchymal transition (EMT), hypoxia and cell proliferation. TCF4 and TWIST2 were found as central genes in the subnetwork related to EMT. Two of the identified subnetworks display a statistically significant association with patient survival, which was further supported by an independent validation in data from The Cancer Genome Atlas. The first subnetwork contains genes related to cell proliferation and cell cycle, while the second contains genes involved in hypoxia such as HIF1A and EGLN3. Our findings provide a promising context to elucidate the biological mechanisms of metastasis, suggest potential prognostic markers and further identify therapeutic targets. The pipeline R source code is freely available, including permutation tests to assess statistical significance of the identified subnetworks.
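The abstract mentions permutation tests for subnetwork significance without describing their design. A generic version can be sketched as below, assuming the subnetwork score is the mean of per-gene scores and the null distribution comes from random gene sets of the same size. The function name, the scoring rule, and the parameters `n_permutations` and `seed` are illustrative assumptions, not the authors' R pipeline.

```python
import random


def permutation_p_value(gene_scores, subnetwork, n_permutations=999, seed=0):
    """Estimate how often a random gene set of the same size scores at least as high.

    `gene_scores` maps gene -> numeric score (for example a differential expression
    statistic); `subnetwork` is a list of genes present in `gene_scores`.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    genes = sorted(gene_scores)
    size = len(subnetwork)
    observed = sum(gene_scores[g] for g in subnetwork) / size

    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(genes, size)
        if sum(gene_scores[g] for g in sample) / size >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    scores = {f"gene{i:02d}": i for i in range(20)}
    print(permutation_p_value(scores, ["gene17", "gene18", "gene19"]))
```

Sampling genes uniformly ignores network topology; a stricter null model would draw connected gene sets of matching size from the interaction network.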
Affiliation(s)
- Kanthida Kusonmano
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- * E-mail:
| | - Mari K. Halle
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Elisabeth Wik
- Centre for Cancer Biomarkers, Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Department of Pathology, The Gade Institute, Haukeland University Hospital, Bergen, Norway
| | - Erling A. Hoivik
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Camilla Krakstad
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Karen K. Mauland
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Ingvild L. Tangen
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Anna Berg
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Henrica M. J. Werner
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Jone Trovik
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Anne M. Øyan
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Microbiology, Haukeland University Hospital, Bergen, Norway
| | - Karl-Henning Kalland
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Microbiology, Haukeland University Hospital, Bergen, Norway
| | - Inge Jonassen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Informatics, University of Bergen, Bergen, Norway
| | - Helga B. Salvesen
- Department of Obstetrics and Gynecology, Haukeland University Hospital, Bergen, Norway
- Centre for Cancer Biomarkers, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Kjell Petersen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
26
|
Zhang W, Xu J, Li Y, Zou X. Integrating network topology, gene expression data and GO annotation information for protein complex prediction. J Bioinform Comput Biol 2018; 17:1950001. [PMID: 30803297 DOI: 10.1142/s021972001950001x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/06/2023]
Abstract
The prediction of protein complexes based on the protein interaction network is a fundamental task for the understanding of cellular life as well as the mechanisms underlying complex disease. A great number of methods have been developed to predict protein complexes based on protein-protein interaction (PPI) networks in recent years. However, because the high throughput data obtained from experimental biotechnology are incomplete, and usually contain a large number of spurious interactions, most of the network-based protein complex identification methods are sensitive to the reliability of the PPI network. In this paper, we propose a new method, Identification of Protein Complex based on Refined Protein Interaction Network (IPC-RPIN), which integrates the topology, gene expression profiles and GO functional annotation information to predict protein complexes from the reconstructed networks. To demonstrate the performance of the IPC-RPIN method, we evaluated the IPC-RPIN on three PPI networks of Saccharomyces cerevisiae and compared it with four state-of-the-art methods. The simulation results show that the IPC-RPIN achieved a better result than the other methods on most of the measurements and is able to discover small protein complexes which have traditionally been neglected.
Affiliation(s)
- Wei Zhang
- * School of Science, East China Jiaotong University, Nanchang 330013, P. R. China
| | - Jia Xu
- † School of Mechatronic Engineering, East China Jiaotong University, Nanchang 330013, P. R. China
| | - Yuanyuan Li
- ‡ School of Mathematics and Statistics, Wuhan Institute of Technology in Wuhan, Wuhan 430072, P. R. China
| | - Xiufen Zou
- § School of Mathematics and Statistics, Wuhan University, Wuhan 430072, P. R. China
|
27
|
Conrad T, Kniemeyer O, Henkel SG, Krüger T, Mattern DJ, Valiante V, Guthke R, Jacobsen ID, Brakhage AA, Vlaic S, Linde J. Module-detection approaches for the integration of multilevel omics data highlight the comprehensive response of Aspergillus fumigatus to caspofungin. BMC SYSTEMS BIOLOGY 2018; 12:88. [PMID: 30342519 PMCID: PMC6195963 DOI: 10.1186/s12918-018-0620-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/12/2018] [Accepted: 10/08/2018] [Indexed: 12/20/2022]
Abstract
Background: Omics data provide deep insights into overall biological processes of organisms. However, integration of data from different molecular levels such as transcriptomics and proteomics, still remains challenging. Analyzing lists of differentially abundant molecules from diverse molecular levels often results in a small overlap mainly due to different regulatory mechanisms, temporal scales, and/or inherent properties of measurement methods. Module-detecting algorithms identifying sets of closely related proteins from protein-protein interaction networks (PPINs) are promising approaches for a better data integration. Results: Here, we made use of transcriptome, proteome and secretome data from the human pathogenic fungus Aspergillus fumigatus challenged with the antifungal drug caspofungin. Caspofungin targets the fungal cell wall which leads to a compensatory stress response. We analyzed the omics data using two different approaches: First, we applied a simple, classical approach by comparing lists of differentially expressed genes (DEGs), differentially synthesized proteins (DSyPs) and differentially secreted proteins (DSePs); second, we used a recently published module-detecting approach, ModuleDiscoverer, to identify regulatory modules from PPINs in conjunction with the experimental data. Our results demonstrate that regulatory modules show a notably higher overlap between the different molecular levels and time points than the classical approach. The additional structural information provided by regulatory modules allows for topological analyses. As a result, we detected a significant association of omics data with distinct biological processes such as regulation of kinase activity, transport mechanisms or amino acid metabolism. We also found a previously unreported increased production of the secondary metabolite fumagillin by A. fumigatus upon exposure to caspofungin. Furthermore, a topology-based analysis of potential key factors contributing to drug-caused side effects identified the highly conserved protein polyubiquitin as a central regulator. Interestingly, polyubiquitin UbiD neither belonged to the groups of DEGs, DSyPs nor DSePs but most likely strongly influenced their levels. Conclusion: Module-detecting approaches support the effective integration of multilevel omics data and provide a deep insight into complex biological relationships connecting these levels. They facilitate the identification of potential key players in the organism’s stress response which cannot be detected by commonly used approaches comparing lists of differentially abundant molecules. Electronic supplementary material: The online version of this article (10.1186/s12918-018-0620-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- T Conrad
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany.
| | - O Kniemeyer
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany
| | | | - T Krüger
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany
| | - D J Mattern
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany.,Present address: PerkinElmer Inc., Rodgau, Germany
| | - V Valiante
- Biobricks of Microbial Natural Product Syntheses, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany
| | - R Guthke
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany
| | - I D Jacobsen
- Microbial Immunology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany.,Institute for Microbiology, Friedrich Schiller University, Jena, Germany
| | - A A Brakhage
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany.,Institute for Microbiology, Friedrich Schiller University, Jena, Germany
| | - S Vlaic
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany
| | - J Linde
- Research Group PiDOMICs, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, Jena, Germany.,Institute for Bacterial Infections and Zoonoses, Federal Research Institute for Animal Health - Friedrich Loeffler Institute, Jena, Germany
|
28
|
Jalili M, Gebhardt T, Wolkenhauer O, Salehzadeh-Yazdi A. Unveiling network-based functional features through integration of gene expression into protein networks. Biochim Biophys Acta Mol Basis Dis 2018; 1864:2349-2359. [PMID: 29466699 DOI: 10.1016/j.bbadis.2018.02.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/01/2017] [Revised: 01/31/2018] [Accepted: 02/13/2018] [Indexed: 02/02/2023]
Abstract
Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches, integrating gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease related genes, gene prioritization, clustering protein interactions, developing the modules, extract active subnetworks and static protein complexes or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers.
Affiliation(s)
- Mahdi Jalili
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran; Hematologic Malignancies Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Tom Gebhardt
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
| | - Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
| | - Ali Salehzadeh-Yazdi
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany.
|
29
|
Cao B, Deng S, Luo J, Ding P, Wang S. Identification of overlapping protein complexes by fuzzy K-medoids clustering algorithm in yeast protein-protein interaction networks. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-17026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Affiliation(s)
- Buwen Cao
- School of Information Science and Engineering, Hunan City University, Yiyang, China
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shuguang Deng
- College of Communication and Electronic Engineering, Hunan City University, Yiyang, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
30
|
Vlaic S, Conrad T, Tokarski-Schnelle C, Gustafsson M, Dahmen U, Guthke R, Schuster S. ModuleDiscoverer: Identification of regulatory modules in protein-protein interaction networks. Sci Rep 2018; 8:433. [PMID: 29323246 PMCID: PMC5764996 DOI: 10.1038/s41598-017-18370-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/25/2017] [Accepted: 12/06/2017] [Indexed: 02/08/2023] Open
Abstract
The identification of disease-associated modules based on protein-protein interaction networks (PPINs) and gene expression data has provided new insights into the mechanistic nature of diverse diseases. However, their identification is hampered by the detection of protein communities within large-scale, whole-genome PPINs. A previously presented, successful strategy detects a PPIN's community structure based on the maximal clique enumeration problem (MCE), which is a non-deterministic polynomial time-hard problem. This renders the approach computationally challenging for large PPINs, implying the need for new strategies. We present ModuleDiscoverer, a novel approach for the identification of regulatory modules from PPINs and gene expression data. Following the MCE-based approach, ModuleDiscoverer uses a randomization heuristic-based approximation of the community structure. Given a PPIN of Rattus norvegicus and public gene expression data, we identify the regulatory module underlying a rodent model of non-alcoholic steatohepatitis (NASH), a severe form of non-alcoholic fatty liver disease (NAFLD). The module is validated using single-nucleotide polymorphism (SNP) data from independent genome-wide association studies and gene enrichment tests. Based on gene enrichment tests, we find that ModuleDiscoverer performs comparably to three existing module-detecting algorithms. However, only our NASH-module is significantly enriched with genes linked to NAFLD-associated SNPs. ModuleDiscoverer is available at http://www.hki-jena.de/index.php/0/2/490 (Others/ModuleDiscoverer).
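To illustrate why maximal clique enumeration becomes expensive on large networks, the following minimal sketch enumerates all maximal cliques of a toy interaction graph with the textbook Bron-Kerbosch algorithm with pivoting. It is an exact enumerator shown for illustration only, not ModuleDiscoverer's randomization heuristic; the node names and edges are invented.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each node to the set of its neighbours (no self-loops).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = sorted(sorted(c) for c in maximal_cliques(adj))
print(cliques)  # [['A', 'B', 'C'], ['C', 'D'], ['D', 'E']]
```

The number of maximal cliques can grow exponentially with the number of nodes, which is the practical reason an approximation is attractive for whole-genome PPINs.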
Affiliation(s)
- Sebastian Vlaic
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany.
- Friedrich-Schiller-University, Department of Bioinformatics, Jena, 07743, Germany.
| | - Theresia Conrad
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
| | - Christian Tokarski-Schnelle
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
- University Hospital Jena, Friedrich-Schiller-University, General, Visceral and Vascular Surgery, Jena, 07749, Germany
| | - Mika Gustafsson
- Linköping University, Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 581 83, Sweden
| | - Uta Dahmen
- University Hospital Jena, Friedrich-Schiller-University, General, Visceral and Vascular Surgery, Jena, 07749, Germany
| | - Reinhard Guthke
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Systems Biology and Bioinformatics, Jena, 07745, Germany
| | - Stefan Schuster
- Friedrich-Schiller-University, Department of Bioinformatics, Jena, 07743, Germany
|
31
|
CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations. BMC SYSTEMS BIOLOGY 2017; 11:135. [PMID: 29322927 PMCID: PMC5763309 DOI: 10.1186/s12918-017-0504-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but also is useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems. RESULTS: In this paper, based on our previous works CPredictor and CPredictor2.0, we present a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations, which is called CPredictor3.0. This new method follows the viewpoint that proteins in the same complex should roughly have similar functions and are active at the same time and place in cellular systems. We first detect active proteins by using gene expression data of different time points and cluster proteins by using gene ontology (GO) functional annotations, respectively. Then, for each time point, we do set intersections with one set corresponding to active proteins generated from expression data and the other set corresponding to a protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network, and get a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, by expanding and merging these candidate complexes, the predicted complexes are obtained. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmarking complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall. CONCLUSION: CPredictor3.0 can serve as a promising tool for protein complex prediction.
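The intersection and mapping steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the active-protein sets and functional clusters are taken as given, the final expanding and merging step is omitted, and the function names, the `min_size` filter and all protein identifiers are hypothetical.

```python
from collections import deque


def connected_components(nodes, adj):
    """Connected components of the subgraph of `adj` induced by `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components


def candidate_complexes(active_by_time, functional_clusters, adj, min_size=2):
    """Intersect active sets with functional clusters, then keep the
    connected pieces of each intersection in the static PPI network."""
    candidates = set()
    for active in active_by_time.values():
        for cluster in functional_clusters:
            shared = active & cluster
            for comp in connected_components(shared, adj):
                if len(comp) >= min_size:  # assumed filter, not from the paper
                    candidates.add(frozenset(comp))
    return candidates


# Hypothetical toy inputs.
adj = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"},
       "P4": {"P5"}, "P5": {"P4"}}
active_by_time = {"t0": {"P1", "P2", "P4"}, "t1": {"P2", "P3", "P4", "P5"}}
functional_clusters = [{"P1", "P2", "P3"}, {"P4", "P5"}]

result = sorted(sorted(c) for c in
                candidate_complexes(active_by_time, functional_clusters, adj))
print(result)  # [['P1', 'P2'], ['P2', 'P3'], ['P4', 'P5']]
```

Storing candidates as frozensets keeps only the unique sets across time points, which matches the abstract's phrase "each resulting unique set".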
|
32
|
Alcaraz N, List M, Batra R, Vandin F, Ditzel HJ, Baumbach J. De novo pathway-based biomarker identification. Nucleic Acids Res 2017; 45:e151. [PMID: 28934488 PMCID: PMC5766193 DOI: 10.1093/nar/gkx642] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 03/14/2017] [Accepted: 07/13/2017] [Indexed: 02/07/2023] Open
Abstract
Gene expression profiles have been extensively discussed as an aid to guide the therapy by predicting disease outcome for the patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good versus poor outcome prediction. Stratifying patients based on their molecular subtypes can provide a detailed view of the disease and lead to more personalized therapies. We propose and discuss a novel MG approach based on de novo pathways, which for the first time have been used as features in a multi-class setting to predict cancer subtypes. Comprehensive evaluation in a large cohort of breast cancer samples from The Cancer Genome Atlas (TCGA) revealed that MGs are considerably more stable than SG models, while also providing valuable insight into the cancer hallmarks that drive them. In addition, when tested on an independent benchmark non-TCGA dataset, MG features consistently outperformed SG models. We provide an easy-to-use web service at http://pathclass.compbio.sdu.dk where users can upload their own gene expression datasets from breast cancer studies and obtain the subtype predictions from all the classifiers.
Affiliation(s)
- Nicolas Alcaraz
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark.,The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Markus List
- Computational Biology and Applied Algorithms, Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
| | - Richa Batra
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Munich, Germany.,Department of Dermatology and Allergy, Technical University of Munich, 80802 Munich, Germany
| | - Fabio Vandin
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Department of Information and Engineering, University of Padowa, 35122 Padowa, Italy
| | - Henrik J Ditzel
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark.,Department of Oncology, Odense University Hospital, 5000 Odense, Denmark
| | - Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Computational Systems Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
|
33
|
Xu B, Wang Y, Wang Z, Zhou J, Zhou S, Guan J. An effective approach to detecting both small and large complexes from protein-protein interaction networks. BMC Bioinformatics 2017; 18:419. [PMID: 29072136 PMCID: PMC5657047 DOI: 10.1186/s12859-017-1820-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022] Open
Abstract
Background: Predicting protein complexes from protein-protein interaction (PPI) networks has been studied for a decade. Various methods have been proposed to address some challenging issues of this problem, including overlapping clusters, high false positive/negative rates of PPI data and diverse complex structures. It is well known that most current methods can detect effectively only complexes of size ≥3, which account for only about half of the total existing complexes. Recently, a method was proposed specifically for finding small complexes (size = 2 and 3) from PPI networks. However, up to now there is no effective approach that can predict both small (size ≤ 3) and large (size >3) complexes from PPI networks. Results: In this paper, we propose a novel method, called CPredictor2.0, that can detect both small and large complexes under a unified framework. Concretely, we first group proteins of similar functions. Then, the Markov clustering algorithm is employed to discover clusters in each group. Finally, we merge all discovered clusters that overlap with each other to a certain degree, and the merged clusters as well as the remaining clusters constitute the set of detected complexes. Extensive experiments have shown that the new method can more effectively predict both small and large complexes, in comparison with the state-of-the-art methods. Conclusions: The proposed method, CPredictor2.0, can be applied to accurately predict both small and large protein complexes.
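The final merging step can be pictured with a small sketch. The abstract does not state how overlap is measured or which threshold is used, so the overlap coefficient (shared members divided by the size of the smaller cluster) and the 0.5 cutoff below are assumptions chosen purely for illustration, as are the function names and cluster contents.

```python
def overlap(a, b):
    """Overlap coefficient: shared members relative to the smaller cluster."""
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(clusters, threshold=0.5):
    """Repeatedly merge any pair of clusters whose overlap reaches the
    threshold, until no such pair remains."""
    merged = [set(c) for c in clusters if c]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the list has changed
    return merged


# Hypothetical clusters, e.g. as produced by a clustering step.
clusters = [{"A", "B", "C"}, {"B", "C", "D"}, {"E", "F"}, {"G", "H"}]
detected = sorted(sorted(c) for c in merge_overlapping(clusters))
print(detected)  # [['A', 'B', 'C', 'D'], ['E', 'F'], ['G', 'H']]
```

Clusters that never reach the threshold with any other cluster pass through unchanged, so small complexes such as pairs survive alongside the merged larger ones.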
Affiliation(s)
- Bin Xu
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai, 201804, China
| | - Yang Wang
- School of Software, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China
| | - Zewei Wang
- Shanghai Southwest Model Middle School, 67 Huicheng Vallige-1, Baise Road, Shanghai, 200237, China
| | - Jiaogen Zhou
- The institute of subtropical Agriculture, China Academy of Sciences, 444 Yuandaer Road, Mapoling, Changsha, 410125, China
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 220 Handan Road, Shanghai, 200433, China.,The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai, 201804, China.
|
34
|
Garland J. Unravelling the complexity of signalling networks in cancer: A review of the increasing role for computational modelling. Crit Rev Oncol Hematol 2017; 117:73-113. [PMID: 28807238 DOI: 10.1016/j.critrevonc.2017.06.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/25/2016] [Revised: 06/01/2017] [Accepted: 06/08/2017] [Indexed: 02/06/2023] Open
Abstract
Cancer induction is a highly complex process involving hundreds of different inducers but whose eventual outcome is the same. Clearly, it is essential to understand how signalling pathways and networks generated by these inducers interact to regulate cell behaviour and create the cancer phenotype. While enormous strides have been made in identifying key networking profiles, the amount of data generated far exceeds our ability to understand how it all "fits together". The number of potential interactions is astronomically large and requires novel approaches and extreme computation methods to dissect them out. However, such methodologies have high intrinsic mathematical and conceptual content which is difficult to follow. This review explains how computation modelling is progressively finding solutions and also revealing unexpected and unpredictable nano-scale molecular behaviours extremely relevant to how signalling and networking are coherently integrated. It is divided into linked sections illustrated by numerous figures from the literature describing different approaches and offering visual portrayals of networking and major conceptual advances in the field. First, the problem of signalling complexity and data collection is illustrated for only a small selection of known oncogenes. Next, new concepts from biophysics, molecular behaviours, kinetics, organisation at the nano level and predictive models are presented. These areas include: visual representations of networking, Energy Landscapes and energy transfer/dissemination (entropy); diffusion, percolation; molecular crowding; protein allostery; quinary structure and fractal distributions; energy management, metabolism and re-examination of the Warburg effect. The importance of unravelling complex network interactions is then illustrated for some widely-used drugs in cancer therapy whose interactions are very extensive. 
Finally, use of computational modelling to develop micro- and nano- functional models ("bottom-up" research) is highlighted. The review concludes that computational modelling is an essential part of cancer research and is vital to understanding network formation and molecular behaviours that are associated with it. Its role is increasingly essential because it is unravelling the huge complexity of cancer induction otherwise unattainable by any other approach.
Affiliation(s)
- John Garland
- Manchester Interdisciplinary Biocentre, Manchester University, Manchester, UK.
|
35
|
Amar D, Izraeli S, Shamir R. Utilizing somatic mutation data from numerous studies for cancer research: proof of concept and applications. Oncogene 2017; 36:3375-3383. [PMID: 28092680 PMCID: PMC5485176 DOI: 10.1038/onc.2016.489] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/12/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 02/07/2023]
Abstract
Large cancer projects measure somatic mutations in thousands of samples, gradually assembling a catalog of recurring mutations in cancer. Many methods analyze these data jointly with auxiliary information with the aim of identifying subtype-specific results. Here, we show that somatic gene mutations alone can reliably and specifically predict cancer subtypes. Interpretation of the classifiers provides useful insights for several biomedical applications. We analyze the COSMIC database, which collects somatic mutations from The Cancer Genome Atlas (TCGA) as well as from many smaller scale studies. We use multi-label classification techniques and the Disease Ontology hierarchy in order to identify cancer subtype-specific biomarkers. Cancer subtype classifiers based on TCGA and the smaller studies have comparable performance, and the smaller studies add a substantial value in terms of validation, coverage of additional subtypes, and improved classification. The gene sets of the classifiers are used for threefold contribution. First, we refine the associations of genes to cancer subtypes and identify novel compelling candidate driver genes. Second, using our classifiers we successfully predict the primary site of metastatic samples. Third, we provide novel hypotheses regarding detection of subtype-specific synthetic lethality interactions. From the cancer research community perspective, our results suggest that curation efforts, such as COSMIC, have great added and complementary value even in the era of large international cancer projects.
Affiliation(s)
- D Amar
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - S Izraeli
- Department of Pediatric Hematology-Oncology, Safra Children’s Hospital, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel-Aviv, Israel
| | - R Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| |
|
36
|
Ma CY, Chen YPP, Berger B, Liao CS. Identification of protein complexes by integrating multiple alignment of protein interaction networks. Bioinformatics 2017; 33:1681-1688. [PMID: 28130237 PMCID: PMC5860626 DOI: 10.1093/bioinformatics/btx043] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/31/2016] [Revised: 11/22/2016] [Accepted: 01/20/2017] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Protein complexes are one of the keys to studying the behavior of a cell system. Many biological functions are carried out by protein complexes. During the past decade, the main strategy used to identify protein complexes from high-throughput network data has been to extract near-cliques or highly dense subgraphs from a single protein-protein interaction (PPI) network. Although experimental PPI data have increased significantly over recent years, most PPI networks still have many false positive interactions and false negative edge loss due to the limitations of high-throughput experiments. In particular, the false negative errors restrict the search space of such conventional protein complex identification approaches. Thus, it has become one of the most challenging tasks in systems biology to automatically identify protein complexes. RESULTS In this study, we propose a new algorithm, NEOComplex (NECC- and Ortholog-based Complex identification by multiple network alignment), which integrates functional orthology information that can be obtained from different types of multiple network alignment (MNA) approaches to expand the search space of protein complex detection. As part of our approach, we also define a new edge clustering coefficient (NECC) to assign weights to interaction edges in PPI networks so that protein complexes can be identified more accurately. The NECC is based on the intuition that there is functional information captured in the common neighbors of the common neighbors as well. Our results show that our algorithm outperforms well-known protein complex identification tools in a balance between precision and recall on three eukaryotic species: human, yeast, and fly. As a result of MNAs of the species, the proposed approach can tolerate edge loss in PPI networks and even discover sparse protein complexes which have traditionally been a challenge to predict.
AVAILABILITY AND IMPLEMENTATION http://acolab.ie.nthu.edu.tw/bionetwork/NEOComplex. CONTACT bab@csail.mit.edu or csliao@ie.nthu.edu.tw. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
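For orientation, the idea of weighting PPI edges by a clustering coefficient can be sketched as follows. This is not the NECC defined in the paper, whose formula the abstract does not give; it is one commonly used form of the conventional edge clustering coefficient, in which the number of common neighbors of an edge's endpoints is divided by the largest number of triangles the edge could take part in. The function name and the toy network are illustrative assumptions.

```python
def edge_clustering_coefficients(edges):
    """Weight each undirected edge by a conventional edge clustering coefficient.

    Illustrative baseline only: this is NOT the NECC from the abstract above.
    ecc(u, v) = |N(u) & N(v)| / min(deg(u) - 1, deg(v) - 1), or 0.0 when the
    denominator is zero (the edge cannot be part of any triangle).
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        if u == v:
            continue
        shared = len(neighbors[u] & neighbors[v])
        max_triangles = min(len(neighbors[u]) - 1, len(neighbors[v]) - 1)
        weights[(u, v)] = shared / max_triangles if max_triangles > 0 else 0.0
    return weights


# Hypothetical toy network: a triangle A-B-C with a pendant protein D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
ecc = edge_clustering_coefficients(toy_edges)
for edge, weight in ecc.items():
    print(edge, weight)
```

Edges inside the triangle receive the highest weight, while the pendant edge receives zero. The abstract's point is that NECC additionally draws on the common neighbors of the common neighbors, which this baseline ignores.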
Affiliation(s)
- Cheng-Yu Ma
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Vic, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Vic, Australia
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Chung-Shou Liao
- Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, Taiwan
| |
|
37
|
Wu M, Ou-Yang L, Li XL. Protein Complex Detection via Effective Integration of Base Clustering Solutions and Co-Complex Affinity Scores. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:733-739. [PMID: 27071190 DOI: 10.1109/tcbb.2016.2552176] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
With the increasing availability of protein interaction data, various computational methods have been developed to predict protein complexes. However, different computational methods may have their own advantages and limitations. Ensemble clustering has thus been studied to minimize the potential bias and risk of individual methods and generate prediction results with better coverage and accuracy. In this paper, we extend the traditional ensemble clustering by taking into account the co-complex affinity scores and present an Ensemble Hierarchical Clustering framework (EnsemHC) to detect protein complexes. First, we construct co-cluster matrices by integrating the clustering results with the co-complex evidence. Second, we sum up the constructed co-cluster matrices to derive a final ensemble matrix via a novel iterative weighting scheme. Finally, we apply hierarchical clustering to generate protein complexes from the final ensemble matrix. Experimental results demonstrate that our EnsemHC performs better than its base clustering methods and various existing integrative methods. In addition, we also observed that integrating the clusters and co-complex affinity scores from different data sources will improve the prediction performance, e.g., integrating the clusters from TAP data and co-complex affinities from binary PPI data achieved the best performance in our experiments.
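The three steps in this abstract (co-cluster matrices, a weighted sum into an ensemble matrix, hierarchical clustering) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: uniform weights stand in for the iterative weighting scheme, co-complex affinity scores are left out, and a plain average-linkage procedure with a similarity cut-off stands in for the hierarchical clustering step. All names and the toy data are hypothetical.

```python
def co_cluster_matrix(items, clustering):
    """Binary matrix: entry (i, j) is 1.0 if items i and j share a cluster."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for cluster in clustering:
        members = [index[x] for x in cluster if x in index]
        for i in members:
            for j in members:
                if i != j:
                    matrix[i][j] = 1.0
    return matrix


def ensemble_matrix(items, clusterings, weights=None):
    """Weighted sum of co-cluster matrices (uniform weights are an assumption)."""
    if weights is None:
        weights = [1.0 / len(clusterings)] * len(clusterings)
    n = len(items)
    total = [[0.0] * n for _ in range(n)]
    for weight, clustering in zip(weights, clusterings):
        matrix = co_cluster_matrix(items, clustering)
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * matrix[i][j]
    return total


def average_linkage(items, similarity, threshold):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(items))]

    def avg(a, b):
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (a, b) = max(
            (avg(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(items[i] for i in cluster) for cluster in clusters]


# Hypothetical base clusterings of five proteins from three methods.
proteins = ["P1", "P2", "P3", "P4", "P5"]
base_clusterings = [
    [["P1", "P2", "P3"], ["P4", "P5"]],
    [["P1", "P2"], ["P3", "P4", "P5"]],
    [["P1", "P2", "P3"], ["P4", "P5"]],
]
ensemble = ensemble_matrix(proteins, base_clusterings)
complexes = average_linkage(proteins, ensemble, threshold=0.5)
print(complexes)
```

In this toy case two of the three base methods place P3 with P1 and P2, so the consensus follows the majority. A non-uniform weighting would let more reliable base methods dominate the ensemble matrix.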
|
38
|
Active module identification in intracellular networks using a memetic algorithm with a new binary decoding scheme. BMC Genomics 2017; 18:209. [PMID: 28361692 PMCID: PMC5374686 DOI: 10.1186/s12864-017-3495-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022] Open
Abstract
Background Active modules are connected regions in biological networks that show significant changes in expression under particular conditions. The identification of such modules is important since it may reveal the regulatory and signaling mechanisms associated with a given cellular response. Results In this paper, we propose a novel active module identification algorithm based on a memetic algorithm. We propose a novel encoding/decoding scheme to ensure the connectedness of the identified active modules. Based on the scheme, we also design and incorporate a local search operator into the memetic algorithm to improve its performance. Conclusion The effectiveness of the proposed algorithm is validated on both small and large protein interaction networks.
|
39
|
Abstract
De novo pathway enrichment is a powerful approach to discover previously uncharacterized molecular mechanisms in addition to already known pathways. To achieve this, condition-specific functional modules are extracted from large interaction networks. Here, we give an overview of the state of the art and present the first framework for assessing the performance of existing methods. We identified 19 tools and selected seven representative candidates for a comparative analysis with more than 12,000 runs, spanning different biological networks, molecular profiles, and parameters. Our results show that none of the methods consistently outperforms the others. To mitigate this issue for biomedical researchers, we provide guidelines to choose the appropriate tool for a given dataset. Moreover, our framework is the first attempt for a quantitative evaluation of de novo methods, which will allow the bioinformatics community to objectively compare future tools against the state of the art. De novo pathway enrichment methods are essential to understand disease complexity. They can uncover disease-specific functional modules by integrating molecular interaction networks with expression profiles. However, how should researchers choose one method out of several? In this article, a group of scientists from Denmark and Germany presents the first attempt to quantitatively evaluate existing methods. This framework will help the biomedical community to find the appropriate tool(s) for their data. They created synthetic gold standards and simulated expression profiles to perform a systematic assessment of various tools. They observed that the choice of interaction network, parameter settings, preprocessing of expression data and statistical properties of the expression profiles influence the results to a large extent. The results reveal strengths and limitations of the individual methods and suggest using two or more tools to obtain comprehensive disease-modules.
|
40
|
Gladilin E. Graph-theoretical model of global human interactome reveals enhanced long-range communicability in cancer networks. PLoS One 2017; 12:e0170953. [PMID: 28141819 PMCID: PMC5283687 DOI: 10.1371/journal.pone.0170953] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/24/2016] [Accepted: 01/13/2017] [Indexed: 12/22/2022] Open
Abstract
Malignant transformation is known to involve substantial rearrangement of the molecular genetic landscape of the cell. A common approach to analysis of these alterations is a reductionist one and consists of finding a compact set of differentially expressed genes or associated signaling pathways. However, due to intrinsic tumor heterogeneity and tissue specificity, biomarkers defined by a small number of genes/pathways exhibit substantial variability. As an alternative to compact differential signatures, global features of genetic cell machinery are conceivable. Global network descriptors suggested in previous works are, however, known to potentially be biased by overrepresentation of interactions between frequently studied genes-proteins. Here, we construct a cellular network of 74538 directional and differential gene expression weighted protein-protein and gene regulatory interactions, and perform graph-theoretical analysis of global human interactome using a novel, degree-independent feature—the normalized total communicability (NTC). We apply this framework to assess differences in total information flow between different cancer (BRCA/COAD/GBM) and non-cancer interactomes. Our experimental results reveal that different cancer interactomes are characterized by significant enhancement of long-range NTC, which arises from circulation of information flow within robustly organized gene subnetworks. Although enhancement of NTC emerges in different cancer types from different genomic profiles, we identified a subset of 90 common genes that are related to elevated NTC in all studied tumors. Our ontological analysis shows that these genes are associated with enhanced cell division, DNA replication, stress response, and other cellular functions and processes typically upregulated in cancer. 
We conclude that enhancement of long-range NTC is manifested in the correlated activity of genes whose tight coordination is required for the survival and proliferation of all tumor cells and thus can be seen as a graph-theoretical equivalent to some hallmarks of cancer. The computational framework for differential network analysis presented herein is of potential interest for a wide range of network perturbation problems given by single or multiple gene-protein activation-inhibition.
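As background for the communicability measure, the unnormalized total communicability of a network is usually defined as the sum of all entries of the matrix exponential of its adjacency matrix. The sketch below computes that quantity with a truncated Taylor series. The degree-independent normalization that turns it into the paper's NTC is not stated in the abstract and is deliberately not reproduced; the function name, the number of series terms, and the toy matrix are assumptions.

```python
import math


def total_communicability(adjacency, terms=30):
    """Sum of all entries of exp(A), via the series sum_k A^k / k!.

    Illustrative only: this is the standard unnormalized quantity, not the
    normalized total communicability (NTC) used in the paper. The series is
    truncated after `terms` terms, which is adequate for small toy matrices.
    """
    n = len(adjacency)
    # `power` holds A^k / k!, starting from the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = float(n)  # entries of the identity sum to n
    for k in range(1, terms):
        power = [
            [sum(power[i][m] * adjacency[m][j] for m in range(n)) / k
             for j in range(n)]
            for i in range(n)
        ]
        total += sum(map(sum, power))
    return total


# Hypothetical two-node network joined by a single undirected edge.
toy_adjacency = [[0.0, 1.0], [1.0, 0.0]]
tc = total_communicability(toy_adjacency)
print(tc, 2 * math.e)
```

For this two-node case the closed form is 2e, since exp(A) has cosh(1) on the diagonal and sinh(1) off the diagonal. The same routine accepts weighted or directed adjacency matrices, which is closer to the weighted, directional network the abstract describes.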
Affiliation(s)
- Evgeny Gladilin
- Division of Theoretical Bioinformatics, German Cancer Research Center, Berliner Str. 41, 69120 Heidelberg, Germany
- BioQuant and IPMB, University Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
| |
|
41
|
Exome sequencing of Pakistani consanguineous families identifies 30 novel candidate genes for recessive intellectual disability. Mol Psychiatry 2017; 22:1604-1614. [PMID: 27457812 PMCID: PMC5658665 DOI: 10.1038/mp.2016.109] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 11/16/2015] [Revised: 05/18/2016] [Accepted: 06/01/2016] [Indexed: 12/13/2022]
Abstract
Intellectual disability (ID) is a clinically and genetically heterogeneous disorder, affecting 1-3% of the general population. Although research into the genetic causes of ID has recently gained momentum, identification of pathogenic mutations that cause autosomal recessive ID (ARID) has lagged behind, predominantly due to non-availability of sizeable families. Here we present the results of exome sequencing in 121 large consanguineous Pakistani ID families. In 60 families, we identified homozygous or compound heterozygous DNA variants in a single gene, 30 affecting reported ID genes and 30 affecting novel candidate ID genes. Potential pathogenicity of these alleles was supported by co-segregation with the phenotype, low frequency in control populations and the application of stringent bioinformatics analyses. In another eight families segregation of multiple pathogenic variants was observed, affecting 19 genes that were either known or are novel candidates for ID. Transcriptome profiles of normal human brain tissues showed that the novel candidate ID genes formed a network significantly enriched for transcriptional co-expression (P<0.0001) in the frontal cortex during fetal development and in the temporal-parietal and sub-cortex during infancy through adulthood. In addition, proteins encoded by 12 novel ID genes directly interact with previously reported ID proteins in six known pathways essential for cognitive function (P<0.0001). These results suggest that disruptions of temporal parietal and sub-cortical neurogenesis during infancy are critical to the pathophysiology of ID. These findings further expand the existing repertoire of genes involved in ARID, and provide new insights into the molecular mechanisms and the transcriptome map of ID.
|
42
|
Modos D, Brooks J, Fazekas D, Ari E, Vellai T, Csermely P, Korcsmaros T, Lenti K. Identification of critical paralog groups with indispensable roles in the regulation of signaling flow. Sci Rep 2016; 6:38588. [PMID: 27922122 PMCID: PMC5138592 DOI: 10.1038/srep38588] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/17/2016] [Accepted: 11/11/2016] [Indexed: 01/21/2023] Open
Abstract
Extensive cross-talk between signaling pathways is required to integrate the myriad of extracellular signal combinations at the cellular level. Gene duplication events may lead to the emergence of novel functions, leaving groups of similar genes - termed paralogs - in the genome. To distinguish critical paralog groups (CPGs) from other paralogs in human signaling networks, we developed a signaling network-based method using cross-talk annotation and tissue-specific signaling flow analysis. 75 CPGs were found with higher degree, betweenness centrality, closeness, and ‘bowtieness’ when compared to other paralogs or other proteins in the signaling network. CPGs had higher diversity in all these measures, with more varied biological functions and more specific post-transcriptional regulation than non-critical paralog groups (non-CPG). Using TGF-beta, Notch and MAPK pathways as examples, SMAD2/3, NOTCH1/2/3 and MEK3/6-p38 CPGs were found to regulate the signaling flow of their respective pathways. Additionally, CPGs showed a higher mutation rate in both inherited diseases and cancer, and were enriched in drug targets. In conclusion, the results revealed two distinct types of paralog groups in the signaling network, CPGs and non-CPGs, highlighting the importance of CPGs as compared to non-CPGs in drug discovery and disease pathogenesis.
Affiliation(s)
- Dezso Modos
- Department of Morphology and Physiology, Faculty of Health Sciences, Semmelweis University, Budapest, Hungary; Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Earlham Institute, Norwich Research Park, Norwich, UK
| | - Johanne Brooks
- Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, UK; Faculty of Medicine and Health, University of East Anglia, Norwich, UK; Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
| | - David Fazekas
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary
| | - Eszter Ari
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary
| | - Tibor Vellai
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary
| | - Peter Csermely
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
| | - Tamas Korcsmaros
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Earlham Institute, Norwich Research Park, Norwich, UK; Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, UK
| | - Katalin Lenti
- Department of Morphology and Physiology, Faculty of Health Sciences, Semmelweis University, Budapest, Hungary
| |
|
43
|
Kondofersky I, Theis FJ, Fuchs C. Inferring catalysis in biological systems. IET Syst Biol 2016; 10:210-218. [PMID: 27879475 PMCID: PMC8687166 DOI: 10.1049/iet-syb.2015.0087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/04/2015] [Revised: 03/24/2016] [Accepted: 04/06/2016] [Indexed: 09/22/2023] Open
Abstract
In systems biology, one is often interested in the communication patterns between several species, such as genes, enzymes or proteins. These patterns become more recognisable when temporal experiments are performed. This temporal communication can be structured by reaction networks such as gene regulatory networks or signalling pathways. Mathematical modelling of data arising from such networks can reveal important details, thus helping to understand the studied system. In many cases, however, corresponding models still deviate from the observed data. This may be due to unknown but present catalytic reactions. From a modelling perspective, the question of whether a certain reaction is catalysed leads to a large increase in the number of model candidates. For large networks the calibration of all possible models becomes computationally infeasible. We propose a method which determines a substantially reduced set of appropriate model candidates and identifies the catalyst of each reaction at the same time. This is incorporated in a multiple-step procedure which first extends the network by additional latent variables and subsequently identifies catalyst candidates using similarity analysis methods. Results from synthetic data examples suggest a good performance even for non-informative data with few observations. Applied to the CD95 apoptotic pathway, our method provides new insights into apoptosis regulation.
Affiliation(s)
- Ivan Kondofersky
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany
| | - Fabian J Theis
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany
| | - Christiane Fuchs
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany.
| |
|
44
|
Xu Y, Guo M, Liu X, Wang C, Liu Y, Liu G. Identify bilayer modules via pseudo-3D clustering: applications to miRNA-gene bilayer networks. Nucleic Acids Res 2016; 44:e152. [PMID: 27484480 PMCID: PMC5741208 DOI: 10.1093/nar/gkw679] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/18/2015] [Revised: 06/30/2016] [Accepted: 07/18/2016] [Indexed: 12/11/2022] Open
Abstract
Module identification is a frequently used approach for mining significant local structures in global networks. Recently, a wide variety of bilayer networks have emerged to characterize more complex biological processes. Given the special topological properties of bilayer networks and the accompanying challenges, there is as yet no effective method for bilayer module identification that can probe the modular organization of these networks. To this end, we proposed the pseudo-3D clustering algorithm, which starts by extracting initial non-hierarchically organized modules and then iteratively deciphers the hierarchical organization of modules according to a bottom-up strategy. Specifically, a modularity function for bilayer modules was proposed so that the algorithm reports the optimal partition that gives the most accurate characterization of the bilayer network. Simulation studies demonstrated its robustness and superior performance over competing methods. Specific applications to both the soybean and human miRNA-gene bilayer networks demonstrated that the pseudo-3D clustering algorithm successfully identified overlapping, hierarchically organized and highly cohesive bilayer modules. The analyses of topology, functional and human disease enrichment, and the bilayer subnetwork involved in soybean fat biosynthesis provided both theoretical and biological evidence supporting the effectiveness and robustness of the pseudo-3D clustering algorithm.
Affiliation(s)
- Yungang Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Guojun Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| |
|
45
|
Singh NK, Ernst M, Liebscher V, Fuellen G, Taher L. Revealing complex function, process and pathway interactions with high-throughput expression and biological annotation data. Mol Biosyst 2016; 12:3196-208. [PMID: 27507577 DOI: 10.1039/c6mb00280c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The biological relationships both between and within the functions, processes and pathways that operate within complex biological systems are only poorly characterized, making the interpretation of large scale gene expression datasets extremely challenging. Here, we present an approach that integrates gene expression and biological annotation data to identify and describe the interactions between biological functions, processes and pathways that govern a phenotype of interest. The product is a global, interconnected network, not of genes but of functions, processes and pathways, that represents the biological relationships within the system. We validated our approach on two high-throughput expression datasets describing organismal and organ development. Our findings are well supported by the available literature, confirming that developmental processes and apoptosis play key roles in cell differentiation. Furthermore, our results suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. Moreover, we provide evidence that supports the relevance of cell spatial organization in the developing liver for proper liver function. Our strategy can be viewed as an abstraction that is useful to interpret high-throughput data and devise further experiments.
Affiliation(s)
- Nitesh Kumar Singh
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Ernst-Heydemann-Str. 8, 18057 Rostock, Germany.
| |
|
46
|
Cao B, Luo J, Liang C, Wang S, Ding P. PCE-FR: A Novel Method for Identifying Overlapping Protein Complexes in Weighted Protein-Protein Interaction Networks Using Pseudo-Clique Extension Based on Fuzzy Relation. IEEE Trans Nanobioscience 2016; 15:728-738. [PMID: 27662678 DOI: 10.1109/tnb.2016.2611683] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022]
Abstract
Identifying overlapping protein complexes in protein-protein interaction (PPI) networks can provide insight into cellular functional organization and thus elucidate underlying cellular mechanisms. Recently, various algorithms for protein complex detection have been developed for PPI networks. However, the majority of algorithms depend primarily on network topological features and/or gene expression profiles, failing to consider the inherent biological meaning of protein pairs. In this paper, we propose a novel method to detect protein complexes using pseudo-clique extension based on fuzzy relation (PCE-FR). Our algorithm operates in three stages: it first forms the nonoverlapping protein substructure based on fuzzy relation and then expands each substructure by adding neighbor proteins to maximize the cohesive score. Finally, highly overlapped candidate protein complexes are merged to form the final protein complex set. In particular, our algorithm employs the biological significance hidden in protein pairs to construct edge weights for protein interaction networks. The experimental results show that our method not only outperforms classical algorithms such as CFinder, ClusterONE, CMC, RRW, HC-PIN, and ProRank+, but also achieves ideal overall performance in most of the yeast PPI datasets in terms of a composite score consisting of precision, accuracy, and separation. We further apply our method to a human PPI network from the HPRD dataset and demonstrate that it is very effective in detecting protein complexes compared to other algorithms.
|
47
|
Domingo A, Amar D, Grütz K, Lee LV, Rosales R, Brüggemann N, Jamora RD, Cutiongco-Dela Paz E, Rolfs A, Dressler D, Walter U, Krainc D, Lohmann K, Shamir R, Klein C, Westenberger A. Evidence of TAF1 dysfunction in peripheral models of X-linked dystonia-parkinsonism. Cell Mol Life Sci 2016; 73:3205-15. [PMID: 26879577 PMCID: PMC11108471 DOI: 10.1007/s00018-016-2159-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/21/2015] [Revised: 01/30/2016] [Accepted: 02/04/2016] [Indexed: 11/30/2022]
Abstract
The molecular dysfunction in X-linked dystonia-parkinsonism is not completely understood. Thus far, only noncoding alterations have been found in genetic analyses, located in or near the TATA-box binding protein-associated factor 1 (TAF1) gene. Given that this gene is ubiquitously expressed and is a critical component of the cellular transcription machinery, we sought to study differential gene expression in peripheral models by performing microarray-based expression profiling in blood and fibroblasts, and comparing gene expression in affected individuals vs. ethnically matched controls. Validation was performed via quantitative polymerase chain reaction in discovery and independent replication sets. We observed consistent downregulation of common TAF1 transcripts in samples from affected individuals in gene-level and high-throughput experiments. This signal was accompanied by a downstream effect in the microarray, reflected by the dysregulation of 307 genes in the disease group. Gene Ontology and network analyses revealed enrichment of genes involved in RNA polymerase II-dependent transcription, a pathway relevant to TAF1 function. Thus, the results converge on TAF1 dysfunction in peripheral models of X-linked dystonia-parkinsonism, and provide evidence of altered expression of a canonical gene in this disease. Furthermore, our study illustrates a link between the previously described genetic alterations and TAF1 dysfunction at the transcriptome level.
Affiliation(s)
- Aloysius Domingo
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany
- Graduate School Lübeck, University of Lübeck, Lübeck, Germany
| | - David Amar
- Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv, Israel
| | - Karen Grütz
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany
| | - Lillian V Lee
- XDP Study Group, Philippine Children's Medical Center, Quezon City, Philippines
| | - Raymond Rosales
- Department of Neurology and Psychiatry, University of Santo Tomas, Manila, Philippines
| | - Norbert Brüggemann
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany
- Department of Neurology, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Roland Dominic Jamora
- Department of Neurosciences, College of Medicine, Philippine General Hospital, University of the Philippines Manila, Manila, Philippines
| | - Eva Cutiongco-Dela Paz
- National Institutes of Health, University of the Philippines Manila, Manila, Philippines
- Philippine Genome Center, University of the Philippines, Diliman, Quezon City, Philippines
| | - Arndt Rolfs
- Albrecht-Kossel-Institute for Neuroregeneration, University of Rostock, Rostock, Germany
| | - Dirk Dressler
- Department of Neurology, Hannover Medical School, Hannover, Germany
| | - Uwe Walter
- Department of Neurology, University of Rostock, Rostock, Germany
| | - Dimitri Krainc
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Katja Lohmann
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany
| | - Ron Shamir
- Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv, Israel
| | - Christine Klein
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany.
| | - Ana Westenberger
- Institute of Neurogenetics, University of Lübeck, Maria Goeppert Str. 1, 23562, Lübeck, Germany
| |
|
48
|
Alcaraz N, List M, Dissing-Hansen M, Rehmsmeier M, Tan Q, Mollenhauer J, Ditzel HJ, Baumbach J. Robust de novo pathway enrichment with KeyPathwayMiner 5. F1000Res 2016; 5:1531. [PMID: 27540470 PMCID: PMC4965696 DOI: 10.12688/f1000research.9054.1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Accepted: 06/22/2016] [Indexed: 01/26/2023] Open
Abstract
Identifying functional modules or novel active pathways, recently termed de novo pathway enrichment, is a computational systems biology challenge that has gained much attention during the last decade. Given a large biological interaction network, KeyPathwayMiner extracts connected subnetworks that are enriched for differentially active entities from a series of molecular profiles encoded as binary indicator matrices. Since interaction networks constantly evolve, an important question is how robust the extracted results are when the network is modified. We enable users to study this effect through several network perturbation techniques and over a range of perturbation degrees. In addition, users may now provide a gold-standard set to determine how enriched extracted pathways are with relevant genes compared to randomized versions of the original network.
Affiliation(s)
- Nicolas Alcaraz
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark; Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark
- Markus List
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark; Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark; Lundbeckfonden Center of Excellence in Nanomedicine NanoCAN, University of Southern Denmark, 5000 Odense, Denmark; Institute of Clinical Research, University of Southern Denmark, 5000 Odense, Denmark; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Martin Dissing-Hansen
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
- Marc Rehmsmeier
- Integrated Research Institute (IRI) for the Life Sciences and Department of Biology, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
- Qihua Tan
- Institute of Clinical Research, University of Southern Denmark, 5000 Odense, Denmark; Epidemiology, Biostatistics and Biodemography, Institute of Public Health, University of Southern Denmark, 5000 Odense, Denmark
- Jan Mollenhauer
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark; Lundbeckfonden Center of Excellence in Nanomedicine NanoCAN, University of Southern Denmark, 5000 Odense, Denmark
- Henrik J Ditzel
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark; Lundbeckfonden Center of Excellence in Nanomedicine NanoCAN, University of Southern Denmark, 5000 Odense, Denmark; Department of Oncology, Odense University Hospital, 5000 Odense, Denmark
- Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
|
49
|
Ou-Yang L, Wu M, Zhang XF, Dai DQ, Li XL, Yan H. A two-layer integration framework for protein complex detection. BMC Bioinformatics 2016; 17:100. [PMID: 26911324 PMCID: PMC4765032 DOI: 10.1186/s12859-016-0939-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/12/2015] [Accepted: 01/27/2016] [Indexed: 01/05/2023]
Abstract
Background: Protein complexes carry out nearly all signaling and functional processes within cells. The study of protein complexes is an effective strategy to analyze cellular functions and biological processes. With the increasing availability of proteomics data, various computational methods have recently been developed to predict protein complexes. However, different computational methods are based on their own assumptions and designed to work on different data sources, and various biological screening methods have their unique experiment conditions, and are often different in scale and noise level. Therefore, a single computational method on a specific data source is generally not able to generate comprehensive and reliable prediction results. Results: In this paper, we develop a novel Two-layer INtegrative Complex Detection (TINCD) model to detect protein complexes, leveraging the information from both clustering results and raw data sources. In particular, we first integrate various clustering results to construct consensus matrices for proteins to measure their overall co-complex propensity. Second, we combine these consensus matrices with the co-complex score matrix derived from Tandem Affinity Purification/Mass Spectrometry (TAP) data and obtain an integrated co-complex similarity network via an unsupervised metric fusion method. Finally, a novel graph regularized doubly stochastic matrix decomposition model is proposed to detect overlapping protein complexes from the integrated similarity network. Conclusions: Extensive experimental results demonstrate that TINCD performs much better than 21 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0939-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Le Ou-Yang
- College of Information Engineering, Shenzhen University, Shenzhen, 518060, China; Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China; Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China.
- Min Wu
- Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore, Singapore.
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China.
- Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China.
- Xiao-Li Li
- Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore, Singapore.
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China.
|
50
|
Structural and Functional Characterization of a Caenorhabditis elegans Genetic Interaction Network within Pathways. PLoS Comput Biol 2016; 12:e1004738. [PMID: 26871911 PMCID: PMC4752231 DOI: 10.1371/journal.pcbi.1004738] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/01/2014] [Accepted: 01/05/2016] [Indexed: 12/02/2022]
Abstract
A genetic interaction (GI) is defined when the mutation of one gene modifies the phenotypic expression associated with the mutation of a second gene. Genome-wide efforts to map GIs in yeast revealed structural and functional properties of a GI network. This provided insights into the mechanisms underlying the robustness of yeast to genetic and environmental insults, and also into the link between genotype and phenotype. While significant conservation of GIs and GI network structure has been reported between distant yeast species, such conservation is not clear between unicellular and multicellular organisms. Structural and functional characterization of a GI network in these latter organisms is consequently of high interest. In this study, we present an in-depth characterization of ~1.5K GIs in the nematode Caenorhabditis elegans. We identify and characterize six distinct classes of GIs by examining a wide range of structural and functional properties of genes and network, including co-expression, phenotypical manifestations, relationship with protein-protein interaction dense subnetworks (PDS) and pathways, molecular and biological functions, gene essentiality and pleiotropy. Our study shows that GI classes link genes within pathways and display distinctive properties, specifically towards PDS. It suggests a model in which pathways are composed of PDS-centric and PDS-independent GIs coordinating molecular machines through two specific classes of GIs involving pleiotropic and non-pleiotropic connectors. Our study provides the first in-depth characterization of a GI network within pathways of a multicellular organism. It also suggests a model to understand better how GIs control system robustness and evolution.
Network biology has focused for years on protein-protein interaction (PPI) networks, identifying nodes with central structural functions and modules associated with bioprocesses, phenotypes and diseases. The network biology field moved to a higher level of abstraction and started characterizing a less intuitive kind of interaction, called genetic interactions (GIs) or epistasis. Mostly due to technical challenges associated with the genome-wide mapping of GIs, these studies primarily focused on unicellular organisms. They uncovered modules embedded within the structure of these networks and started characterizing their relationship with the PPI network and biological functions. We provide here the first in-depth characterization of a network composed of ~600 GIs within signaling and metabolic pathways of a multicellular organism, the nematode Caenorhabditis elegans. We characterize the structure of this network, and the function of GI classes found in this network. We also discuss how these GI classes contribute to the genomic robustness and the adaptive evolution of multicellular organisms.
|