1
Tilajka Á, Kurilla A, László L, Lovrics A, Novák J, Takács T, Buday L, Vas V. Predictive value analysis of the interaction network of Tks4 scaffold protein in colon cancer. Front Mol Biosci 2024; 11:1414805. [PMID: 39234565 PMCID: PMC11371697 DOI: 10.3389/fmolb.2024.1414805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/31/2024] [Indexed: 09/06/2024] Open
Abstract
Background: Colorectal carcinoma (CRC) has emerged as one of the most widespread cancers and was the third leading cause of cancer-related mortality in 2020. The role of the podosomal protein Tks4 in tumor formation and progression is well established, including its involvement in gastric carcinoma and hepatocellular carcinoma; however, Tks4 and its associated EMT-regulating interactome remain largely unexplored in the context of colon cancer.
Methods: We conducted a comprehensive bioinformatic analysis to investigate the mRNA and protein expression levels of Tks4 and its associated partner molecules (CD2AP, GRB2, WASL, SRC, CTTN, and CAPZA1) across different tumor types. We quantified the expression levels of Tks4 and its partner molecules by qPCR on a TissueScan colon cancer array. We then validated the usefulness of Tks4 and its associated molecules as biomarkers via statistical analyses, including Pearson's correlation analysis, principal component analysis (PCA), multiple logistic regression, confusion matrix analysis, and ROC analysis.
Results: Our findings indicate that the co-expression patterns of the seven examined biomarker candidates differentiate between tumor and normal samples better than the expression levels of the individual genes. Moreover, variable importance analysis of these seven genes revealed four core genes that yield results consistent with those of the full seven-gene set. Thus, these four core genes from the Tks4 interactome hold promise as potential combined biomarkers for colon adenocarcinoma diagnosis and prognosis.
Conclusion: Our proposed biomarker set from the Tks4 interactome shows promising sensitivity and specificity, aiding in colon cancer prevention and diagnosis.
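The confusion matrix and ROC analyses named in the Methods can be illustrated with a minimal Python sketch. The labels and scores below are invented toy values, not data from the study, and the function names are hypothetical. The AUC is computed with the rank-based (Mann-Whitney) formulation, which may differ from whatever software the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (tumor) and 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): classifier scores for 3 tumor and 3 normal samples.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 cut-off is an arbitrary choice

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In a setting like the one described, the scores would come from a logistic regression fitted on the expression levels of the candidate genes; here they are hard-coded so the sketch stays self-contained.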
Affiliation(s)
- Álmos Tilajka
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
  - Doctoral School of Biology, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Anita Kurilla
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Loretta László
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
  - Doctoral School of Biology, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Anna Lovrics
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Julianna Novák
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Tamás Takács
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
  - Doctoral School of Biology, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- László Buday
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Virag Vas
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
2
Pollex T, Marco-Ferreres R, Ciglar L, Ghavi-Helm Y, Rabinowitz A, Viales RR, Schaub C, Jankowski A, Girardot C, Furlong EEM. Chromatin gene-gene loops support the cross-regulation of genes with related function. Mol Cell 2024; 84:822-838.e8. [PMID: 38157845 DOI: 10.1016/j.molcel.2023.12.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2023] [Revised: 10/31/2023] [Accepted: 12/14/2023] [Indexed: 01/03/2024]
Abstract
Chromatin loops between gene pairs have been observed in diverse contexts in both flies and vertebrates. Combining high-resolution Capture-C, DNA fluorescence in situ hybridization, and genetic perturbations, we dissect the functional role of three loops between genes with related function during Drosophila embryogenesis. By mutating the loop anchor (but not the gene) or the gene (but not loop anchor), we disentangle loop formation and gene expression and show that the 3D proximity of paralogous gene loci supports their co-regulation. Breaking the loop leads to either an attenuation or enhancement of expression and perturbs their relative levels of expression and cross-regulation. Although many loops appear constitutive across embryogenesis, their function can change in different developmental contexts. Taken together, our results indicate that chromatin gene-gene loops act as architectural scaffolds that can be used in different ways in different contexts to fine-tune the coordinated expression of genes with related functions and sustain their cross-regulation.
Affiliation(s)
- Tim Pollex
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Raquel Marco-Ferreres
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Lucia Ciglar
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Yad Ghavi-Helm
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Adam Rabinowitz
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Christoph Schaub
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Aleksander Jankowski
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Charles Girardot
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany.
3
Zhao S, Yang X, Zeng Z, Qian P, Zhao Z, Dai L, Prabhu N, Nordlund P, Tam WL. Deep learning based CETSA feature prediction cross multiple cell lines with latent space representation. Sci Rep 2024; 14:1878. [PMID: 38253642 PMCID: PMC10810365 DOI: 10.1038/s41598-024-51193-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 01/01/2024] [Indexed: 01/24/2024] Open
Abstract
Mass spectrometry-coupled cellular thermal shift assay (MS-CETSA), a biophysical principle-based technique that measures the thermal stability of proteins at the proteome level inside the cell, has contributed significantly to the understanding of drug mechanisms of action and the dissection of protein interaction dynamics in different cellular states. One of the barriers to the wide applications of MS-CETSA is that MS-CETSA experiments must be performed on the specific cell lines of interest, which is typically time-consuming and costly in terms of labeling reagents and mass spectrometry time. In this study, we aim to predict CETSA features in various cell lines by introducing a computational framework called CycleDNN based on deep neural network technology. For a given set of n cell lines, CycleDNN comprises n auto-encoders. Each auto-encoder includes an encoder to convert CETSA features from one cell line into latent features in a latent space [Formula: see text]. It also features a decoder that transforms the latent features back into CETSA features for another cell line. In such a way, the proposed CycleDNN creates a cyclic prediction of CETSA features across different cell lines. The prediction loss, cycle-consistency loss, and latent space regularization loss are used to guide the model training. Experimental results on a public CETSA dataset demonstrate the effectiveness of our proposed approach. Furthermore, we confirm the validity of the predicted MS-CETSA data from our proposed CycleDNN through validation in protein-protein interaction prediction.
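The three training signals named in the abstract (prediction loss, cycle-consistency loss, and latent space regularization) can be sketched for two cell lines as follows. This is only an illustration under stated assumptions: the encoders and decoders are hand-picked affine maps rather than trained neural networks, the cell-line labels "A" and "B" are placeholders, and the squared-norm latent penalty is a guess, since the abstract does not say which regularizer CycleDNN uses.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def affine(scale, shift):
    """Hypothetical stand-in for a trained network: an elementwise affine map."""
    return lambda v: [scale * x + shift for x in v]


# One encoder and one decoder per cell line (toy parameters, chosen by hand).
enc = {"A": affine(2.0, 0.0), "B": affine(0.5, 1.0)}
dec = {"A": affine(0.5, 0.0), "B": affine(2.0, -2.0)}


def cycle_losses(x_src, x_dst, src, dst):
    """Losses for translating features from cell line src to dst and back again."""
    z = enc[src](x_src)            # source features -> shared latent space
    pred_dst = dec[dst](z)         # latent -> predicted features in the target line
    z_back = enc[dst](pred_dst)    # re-encode the prediction
    recon_src = dec[src](z_back)   # decode back into the source line
    return {
        "prediction": mse(pred_dst, x_dst),
        "cycle": mse(recon_src, x_src),
        "latent_reg": sum(v * v for v in z) / len(z),  # assumed L2 penalty
    }


losses = cycle_losses([1.0, 2.0], [2.0, 5.0], "A", "B")
```

With these toy maps the round trip A to B to A reproduces the input exactly, so the cycle term is zero while the prediction term still reflects the mismatch with the measured target features. In training, a weighted sum of the three terms would be minimized over the network parameters.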
Affiliation(s)
- Shenghao Zhao
  - Institute for Infocomm Research (I2R), A*STAR, Singapore, 138632, Singapore
  - National University of Singapore (NUS), Singapore, 119077, Singapore
- Xulei Yang
  - Institute for Infocomm Research (I2R), A*STAR, Singapore, 138632, Singapore.
- Zeng Zeng
  - Institute for Infocomm Research (I2R), A*STAR, Singapore, 138632, Singapore
- Peisheng Qian
  - Institute for Infocomm Research (I2R), A*STAR, Singapore, 138632, Singapore
- Ziyuan Zhao
  - Institute for Infocomm Research (I2R), A*STAR, Singapore, 138632, Singapore
- Lingyun Dai
  - Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore, 138632, Singapore
  - The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People's Hospital, Shenzhen, 518020, China
- Nayana Prabhu
  - Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore, 138632, Singapore
- Pär Nordlund
  - Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore, 138632, Singapore
  - Department of Oncology and Pathology, Karolinska Institutet, 171 77, Stockholm, Sweden
- Wai Leong Tam
  - Genome Institute of Singapore (GIS), A*STAR, Singapore, 138632, Singapore.
4
Campelo dos Santos AL, DeGiorgio M, Assis R. Predicting evolutionary targets and parameters of gene deletion from expression data. BIOINFORMATICS ADVANCES 2024; 4:vbae002. [PMID: 38282974 PMCID: PMC10812876 DOI: 10.1093/bioadv/vbae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 12/08/2023] [Accepted: 01/04/2024] [Indexed: 01/30/2024]
Abstract
Motivation: Gene deletion is traditionally thought of as a nonadaptive process that removes functional redundancy from genomes, such that it generally receives less attention than duplication in evolutionary turnover studies. Yet, mounting evidence suggests that deletion may promote adaptation via the "less-is-more" evolutionary hypothesis, as it often targets genes harboring unique sequences, expression profiles, and molecular functions. Hence, predicting the relative prevalence of redundant and unique functions among genes targeted by deletion, as well as the parameters underlying their evolution, can shed light on the role of gene deletion in adaptation.
Results: Here, we present CLOUDe, a suite of machine learning methods for predicting evolutionary targets of gene deletion events from expression data. Specifically, CLOUDe models expression evolution as an Ornstein-Uhlenbeck process, and uses multi-layer neural network, extreme gradient boosting, random forest, and support vector machine architectures to predict whether deleted genes are "redundant" or "unique", as well as several parameters underlying their evolution. We show that CLOUDe boasts high power and accuracy in differentiating between classes, and high accuracy and precision in estimating evolutionary parameters, with optimal performance achieved by its neural network architecture. Application of CLOUDe to empirical data from Drosophila suggests that deletion primarily targets genes with unique functions, with further analysis showing these functions to be enriched for protein deubiquitination. Thus, CLOUDe represents a key advance in learning about the role of gene deletion in functional evolution and adaptation.
Availability and implementation: CLOUDe is freely available on GitHub (https://github.com/anddssan/CLOUDe).
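For readers unfamiliar with the Ornstein-Uhlenbeck (OU) process mentioned above, the following minimal sketch simulates a single OU trajectory with the Euler-Maruyama scheme. It is a generic illustration, not CLOUDe's implementation: the parameter names (theta for the optimum, alpha for the strength of the pull toward it, sigma for the noise scale) follow common usage and are assumptions here, as is the single-lineage setup without a phylogeny.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, steps, seed=0):
    """Euler-Maruyama simulation of dX = alpha * (theta - X) dt + sigma dW."""
    rng = random.Random(seed)  # seeded so the trajectory is reproducible
    x = x0
    path = [x0]
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        # Deterministic pull toward theta plus a Gaussian random shock.
        x = x + alpha * (theta - x) * dt + sigma * math.sqrt(dt) * noise
        path.append(x)
    return path


# With sigma = 0 the expression level relaxes smoothly toward the optimum theta.
deterministic = simulate_ou(x0=0.0, theta=1.0, alpha=1.0, sigma=0.0, dt=0.01, steps=1000)
# With sigma > 0 it fluctuates around theta instead.
noisy = simulate_ou(x0=0.0, theta=1.0, alpha=1.0, sigma=0.3, dt=0.01, steps=1000)
```

A classifier of the kind the abstract describes would be trained on data simulated under different parameter settings; that step is not shown.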
Affiliation(s)
- Andre Luiz Campelo dos Santos
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
- Raquel Assis
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
  - Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL 33431, United States
5
Merle DA, Sen M, Armento A, Stanton CM, Thee EF, Meester-Smoor MA, Kaiser M, Clark SJ, Klaver CCW, Keane PA, Wright AF, Ehrmann M, Ueffing M. 10q26 - The enigma in age-related macular degeneration. Prog Retin Eye Res 2023; 96:101154. [PMID: 36513584 DOI: 10.1016/j.preteyeres.2022.101154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/21/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022]
Abstract
Despite comprehensive research efforts over the last decades, the pathomechanisms of age-related macular degeneration (AMD) remain far from being understood. Large-scale genome-wide association studies (GWAS) were able to provide a defined set of genetic aberrations which contribute to disease risk, with the strongest contributors mapping to distinct regions on chromosomes 1 and 10. While the chromosome 1 locus comprises factors of the complement system with well-known functions, the role of the 10q26 locus in AMD pathophysiology remains enigmatic. 10q26 harbors a cluster of three functional genes, namely PLEKHA1, ARMS2 and HTRA1, with most of the AMD-associated genetic variants mapping to the latter two genes. High linkage disequilibrium between ARMS2 and HTRA1 long kept association studies from reliably defining the risk-causing gene, and only very recently has the genetic risk region been narrowed to ARMS2, suggesting that this is the true AMD gene at this locus. However, genetic associations alone do not suffice to prove causality, and one or more of the 14 SNPs on this haplotype may be involved in long-range control of gene expression, leaving HTRA1 and PLEKHA1 as suspects in the pathogenic pathway. Both ARMS2 and HTRA1 have been linked to extracellular matrix homeostasis, yet their exact molecular functions as well as their roles in AMD pathogenesis remain to be uncovered. The transcriptional regulation of the 10q26 locus adds a further level of complexity, given that gene-regulatory as well as epigenetic alterations may influence expression levels from 10q26 in diseased individuals. Here, we provide a comprehensive overview of the 10q26 locus and its three gene products on various levels of biological complexity and discuss current and future research strategies to shed light on one of the remaining enigmatic spots in the AMD landscape.
Affiliation(s)
- David A Merle
  - Institute for Ophthalmic Research, Department for Ophthalmology, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany; Department for Ophthalmology, University Eye Clinic, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany; Department of Ophthalmology, Medical University of Graz, 8036, Graz, Austria.
- Merve Sen
  - Institute for Ophthalmic Research, Department for Ophthalmology, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany
- Angela Armento
  - Institute for Ophthalmic Research, Department for Ophthalmology, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany
- Chloe M Stanton
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Eric F Thee
  - Department of Ophthalmology, Erasmus University Medical Center, 3015GD, Rotterdam, Netherlands; Department of Epidemiology, Erasmus University Medical Center, 3015CE, Rotterdam, Netherlands
- Magda A Meester-Smoor
  - Department of Ophthalmology, Erasmus University Medical Center, 3015GD, Rotterdam, Netherlands; Department of Epidemiology, Erasmus University Medical Center, 3015CE, Rotterdam, Netherlands
- Markus Kaiser
  - Center of Medical Biotechnology, Faculty of Biology, University Duisburg-Essen, 45117, Essen, Germany
- Simon J Clark
  - Institute for Ophthalmic Research, Department for Ophthalmology, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany; Department for Ophthalmology, University Eye Clinic, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany; Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PT, UK
- Caroline C W Klaver
  - Department of Ophthalmology, Erasmus University Medical Center, 3015GD, Rotterdam, Netherlands; Department of Epidemiology, Erasmus University Medical Center, 3015CE, Rotterdam, Netherlands; Department of Ophthalmology, Radboudumc, 6525EX, Nijmegen, Netherlands; Institute of Molecular and Clinical Ophthalmology Basel, CH-4031, Basel, Switzerland
- Pearse A Keane
  - Institute for Health Research, Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust, UCL Institute of Ophthalmology, London, EC1V 2PD, UK
- Alan F Wright
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Michael Ehrmann
  - Center of Medical Biotechnology, Faculty of Biology, University Duisburg-Essen, 45117, Essen, Germany
- Marius Ueffing
  - Institute for Ophthalmic Research, Department for Ophthalmology, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany; Department for Ophthalmology, University Eye Clinic, Eberhard Karls University of Tübingen, 72076, Tübingen, Germany.
6
Ye J, Li A, Zheng H, Yang B, Lu Y. Machine Learning Advances in Predicting Peptide/Protein-Protein Interactions Based on Sequence Information for Lead Peptides Discovery. Adv Biol (Weinh) 2023; 7:e2200232. [PMID: 36775876 DOI: 10.1002/adbi.202200232] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/24/2022] [Revised: 12/30/2022] [Indexed: 02/14/2023]
Abstract
Peptides have shown increasing advantages and significant clinical value in drug discovery and development. With the development of high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have been expanded and incorporated into rational drug design. Predictions of peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) are both opportunities and challenges in computational biology, which will help to better understand the mechanisms of disease and provide the impetus for the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI predictions. It begins with an introduction of various databases of peptide ligands and target proteins. Then it discusses data formats and feature representations for proteins and peptides. Furthermore, classical ML methods and emerging deep learning (DL) methods that can be used to train prediction models of PepPI and PPI are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, different validation protocols and evaluation indexes are discussed. The goal of this review is to help researchers quickly get started to develop computational frameworks using these integrated resources and eventually promote the discovery of lead peptides.
Affiliation(s)
- Jiahao Ye
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- An Li
  - Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
  - Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Hao Zheng
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- Banghua Yang
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- Yiming Lu
  - School of Medicine, Shanghai University, Shanghai, 200444, China
  - Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
  - Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
7
Xie P, Zhuang J, Tian G, Yang J. Emvirus: An embedding-based neural framework for human-virus protein-protein interactions prediction. BIOSAFETY AND HEALTH 2023; 5:152-158. [PMID: 37362223 PMCID: PMC10166638 DOI: 10.1016/j.bsheal.2023.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 04/23/2023] [Accepted: 04/23/2023] [Indexed: 06/28/2023] Open
Abstract
Human-virus protein-protein interactions (PPIs) play critical roles in viral infection. For example, the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) binds primarily to human angiotensin-converting enzyme 2 (ACE2) protein to infect human cells. Thus, identifying and blocking these PPIs contribute to controlling and preventing viruses. However, wet-lab experiment-based identification of human-virus PPIs is usually expensive, labor-intensive, and time-consuming, which presents the need for computational methods. Many machine-learning methods have been proposed recently and achieved good results in predicting human-virus PPIs. However, most methods are based on protein sequence features and apply manually extracted features, such as statistical characteristics, phylogenetic profiles, and physicochemical properties. In this work, we present an embedding-based neural framework with convolutional neural network (CNN) and bi-directional long short-term memory unit (Bi-LSTM) architecture, named Emvirus, to predict human-virus PPIs (including human-SARS-CoV-2 PPIs). In addition, we conduct cross-viral experiments to explore the generalization ability of Emvirus. Compared to other feature extraction methods, Emvirus achieves better prediction accuracy.
Affiliation(s)
- Pengfei Xie
  - College of Transportation Engineering, Dalian Maritime University, Dalian 116026, China
- Jujuan Zhuang
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Geng Tian
  - Geneis Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Jialiang Yang
  - Geneis Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
8
Ghosh N, Saha I, Gambin A. Interactome-Based Machine Learning Predicts Potential Therapeutics for COVID-19. ACS OMEGA 2023; 8:13840-13854. [PMID: 37163139 PMCID: PMC10084923 DOI: 10.1021/acsomega.3c00030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/05/2023] [Accepted: 02/22/2023] [Indexed: 05/11/2023]
Abstract
COVID-19, the disease caused by SARS-CoV-2, has been disrupting our lives for more than two years now. SARS-CoV-2 interacts with human proteins to pave its way into the human body, thereby wreaking havoc. Moreover, mutations in the SARS-CoV-2 genome that give rise to new variants of the virus are also a cause of concern among the masses. Thus, it is very important to understand human-spike protein-protein interactions (PPIs) in order to predict new PPIs and consequently propose drugs for the human proteins in order to fight the virus and its different mutated variants, with the mutations occurring in the spike protein. This fact motivated us to develop a complete pipeline where PPIs and drug-protein interactions can be predicted for human-SARS-CoV-2 interactions. In this regard, interacting data sets are first collected from the literature, and noninteracting data sets are subsequently created for human-SARS-CoV-2 by considering only the spike glycoprotein. For drug-protein interactions, both interacting and noninteracting data sets are taken from the DrugBank and ChEMBL databases. Thereafter, the protein sequences of human and spike proteins are encoded as sequence-based features using the well-known Moran autocorrelation technique, while the drugs are encoded using another well-known technique, viz., PaDEL descriptors, to predict new human-spike PPIs and eventually new drug-protein interactions for the top 20 predicted human proteins interacting with the original spike protein and its different mutated variants like Alpha, Beta, Delta, Gamma, and Omicron. Such predictions are carried out by random forest as it is found to perform better than other predictors, providing an accuracy of 90.53% for human-spike PPI and 96.15% for drug-protein interactions. Finally, 40 unique drugs like eicosapentaenoic acid, doxercalciferol, ciclesonide, dexamethasone, methylprednisolone, etc. are identified that target 32 human proteins like ACACA, DST, DYNC1H1, etc.
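As an illustration of the Moran autocorrelation encoding mentioned in the abstract, the sketch below computes the descriptor for a short peptide from a single physicochemical property. It assumes the commonly used definition, in which the lag-d covariance of the mean-centered property values is divided by their variance along the sequence. The Kyte-Doolittle hydropathy scale, the function names, and the choice of lags are assumptions made for this example; the abstract does not state which properties or lags the authors used.

```python
# Kyte-Doolittle hydropathy index per amino acid (one possible property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(seq, lag, prop=KYTE_DOOLITTLE):
    """Moran autocorrelation of one property along a protein sequence at a given lag."""
    values = [prop[aa] for aa in seq]
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(seq)")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("property is constant along the sequence")
    # Average product of deviations for residues that are `lag` positions apart.
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def moran_descriptor(seq, max_lag):
    """Feature vector with one Moran value per lag from 1 to max_lag."""
    return [moran_autocorrelation(seq, lag) for lag in range(1, max_lag + 1)]


# Alternating residues give perfect negative correlation at lag 1, positive at lag 2.
features = moran_descriptor("AIAIAI", max_lag=2)
```

In a pipeline like the one described, vectors of this kind for a human protein and for the spike protein would be concatenated and passed to a classifier such as a random forest.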
Affiliation(s)
- Nimisha Ghosh
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 00-927 Warsaw, Poland
  - Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan, Bhubaneswar, 751030 Odisha, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, 700106 West Bengal, India
- Anna Gambin
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 00-927 Warsaw, Poland
9
Wang X, Yang W, Yang Y, He Y, Zhang J, Wang L, Hu L. PPISB: A Novel Network-Based Algorithm of Predicting Protein-Protein Interactions With Mixed Membership Stochastic Blockmodel. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1606-1612. [PMID: 35939453 DOI: 10.1109/tcbb.2022.3196336] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions (PPIs) play an essential role in most biological processes in cells. Many computational algorithms have thus been proposed to predict PPIs. However, most of them rely heavily on the biological information of proteins while ignoring the latent structural features of proteins present in a PPI network. In this paper, we propose an efficient network-based prediction algorithm, namely PPISB, based on a mixed membership stochastic blockmodel. By simulating the generative process of a PPI network, PPISB is able to capture the latent community structures. The inference procedure adopted by PPISB further optimizes the membership distributions of proteins over different complexes. After that, a distance measure is designed to compute the similarity between two proteins in terms of their likelihoods of being in the same complex, thus determining whether they interact with each other or not. To evaluate the performance of PPISB, a series of extensive experiments have been conducted with five PPI networks collected from different species, and the results demonstrate that PPISB performs promisingly in predicting PPIs in terms of several evaluation metrics. Hence, we reason that PPISB is preferred over state-of-the-art network-based prediction algorithms, especially for predicting potential PPIs.
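The idea of scoring a protein pair by its likelihood of sharing a complex can be sketched as follows. The abstract does not give PPISB's actual distance measure, so the score used here (the probability that two independent draws from the proteins' membership distributions pick the same complex) and the decision threshold are assumptions made purely for illustration.

```python
def same_complex_score(pi_i, pi_j):
    """Probability that two proteins fall in the same complex, assuming
    independent draws from their membership distributions pi_i and pi_j."""
    if len(pi_i) != len(pi_j):
        raise ValueError("membership vectors must cover the same complexes")
    return sum(a * b for a, b in zip(pi_i, pi_j))


def predict_interaction(pi_i, pi_j, threshold=0.5):
    """Call a pair interacting when the score reaches a chosen (hypothetical) threshold."""
    return same_complex_score(pi_i, pi_j) >= threshold


# Toy membership distributions over two complexes (invented values).
protein_a = [0.9, 0.1]
protein_b = [0.8, 0.2]
protein_c = [0.1, 0.9]

score_ab = same_complex_score(protein_a, protein_b)  # mostly in the same complex
score_ac = same_complex_score(protein_a, protein_c)  # mostly in different complexes
```

In a mixed membership stochastic blockmodel the membership vectors would be inferred from the observed network rather than set by hand as they are here.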
10
Zhao H, Datta S, Duan ZH. An Integrated Approach of Learning Genetic Networks From Genome-Wide Gene Expression Data Using Gaussian Graphical Model and Monte Carlo Method. Bioinform Biol Insights 2023; 17:11779322231152972. [PMID: 36865982 PMCID: PMC9972065 DOI: 10.1177/11779322231152972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Accepted: 01/02/2023] [Indexed: 03/02/2023] Open
Abstract
Global genetic networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single genes or local networks. The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph decoding the conditional dependence between genes. Many algorithms based on the GGM have been proposed for learning genetic network structures. Because the number of gene variables is typically far greater than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of GGM has become a popular tool for inferring the conditional interdependence among genes. However, graphical lasso, although showing good performance on low-dimensional data sets, is computationally expensive and inefficient, or even unable to work directly, on genome-wide gene expression data sets. In this study, the Monte Carlo Gaussian graphical model (MCGGM) method was proposed to learn global genetic networks of genes. This method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and graphical lasso to learn the structures of the subnetworks. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was evaluated with a relatively small real data set of RNA-seq expression levels. The results indicate that the proposed method shows a strong ability to decode the interactions with high conditional dependence among genes. The method was then applied to genome-wide data sets of RNA-seq expression levels. Most of the high-interdependence gene-gene interactions predicted from the estimated global networks have been reported in the literature as playing important roles in different human cancers. The results also validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets.
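The sample-learn-integrate loop described in the abstract can be sketched as below. This is a schematic under stated assumptions, not the MCGGM implementation: the subnetwork learner is passed in as a callable (graphical lasso in the paper, replaced in the demo by a toy lookup against a hypothetical set of true edges), and integration by selection frequency is an assumed rule, since the abstract does not specify how subnetworks are combined.

```python
import random
from collections import Counter
from itertools import combinations


def monte_carlo_network(genes, learn_edges, subset_size, n_draws, seed=0):
    """Approximate a global network by repeatedly learning small subnetworks.

    learn_edges(subset) must return an iterable of (gene_a, gene_b) edges
    found among the sampled genes.
    """
    rng = random.Random(seed)
    selected = Counter()  # how often each pair was returned as an edge
    sampled = Counter()   # how often each pair was sampled together
    for _ in range(n_draws):
        subset = rng.sample(genes, subset_size)
        for pair in combinations(sorted(subset), 2):
            sampled[pair] += 1
        for a, b in learn_edges(subset):
            selected[tuple(sorted((a, b)))] += 1
    # Selection frequency: fraction of co-samplings in which the edge was learned.
    return {pair: selected[pair] / sampled[pair] for pair in selected}


# Demo with a stand-in learner (NOT graphical lasso): it simply reports which
# of two hypothetical true edges fall inside the sampled subset.
genes = ["g1", "g2", "g3", "g4", "g5"]
true_edges = {("g1", "g2"), ("g3", "g4")}


def toy_learner(subset):
    return [e for e in true_edges if e[0] in subset and e[1] in subset]


network = monte_carlo_network(genes, toy_learner, subset_size=3, n_draws=50)
```

Replacing `toy_learner` with a function that fits a sparse precision matrix on the expression columns of the sampled genes and returns its nonzero off-diagonal entries would bring the sketch closer to the procedure the abstract describes.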
Affiliation(s)
- Haitao Zhao
  - Department of Mathematics and Computer Science, The University of North Carolina at Pembroke, Pembroke, NC, USA.
- Sujay Datta
  - Department of Statistics, The University of Akron, Akron, OH, USA
- Zhong-Hui Duan
  - Department of Computer Science, The University of Akron, Akron, OH, USA
11
Sun KF, Sun LM, Zhou D, Chen YY, Hao XW, Liu HR, Liu X, Chen JJ. XGBG: A Novel Method for Identifying Ovarian Carcinoma Susceptible Genes Based on Deep Learning. Front Oncol 2022; 12:897503. [PMID: 35646648 PMCID: PMC9133413 DOI: 10.3389/fonc.2022.897503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 04/08/2022] [Indexed: 11/30/2022] Open
Abstract
Ovarian carcinomas (OCs) represent a heterogeneous group of neoplasms consisting of several entities that differ in pathogenesis, molecular profiles, risk factors, and outcomes. OC has been regarded as the most lethal cancer among women all around the world. There are at least five main types of OCs according to the fifth edition of the World Health Organization classification of tumors: high-/low-grade serous carcinoma, mucinous carcinoma, clear cell carcinoma, and endometrioid carcinoma. With improved genome-wide association study (GWAS) and expression quantitative trait locus (eQTL) analyses, the genomic landscape of complex diseases has been largely uncovered. Moreover, pathway analyses also play an important role in exploring the underlying mechanisms of complex diseases by providing curated pathway models and information about molecular dynamics and cellular processes. To investigate OCs in greater depth, we introduced a novel disease susceptible gene prediction method, XGBG, which can be used to identify OC-related genes based on different omics data and deep learning methods. We first employed a graph convolutional network (GCN) to reconstruct the gene features based on both the original gene features and the network topological structure. Then, a boosting method was used to predict OC susceptible genes. As a result, our model achieved an AUC of 0.7541 and an AUPR of 0.8051, which indicates the effectiveness of XGBG. For the newly predicted OC susceptible genes, we surveyed the related literature to support the results, which may help in understanding the pathogenesis and mechanisms of the disease.
Affiliation(s)
- Ke Feng Sun
- Department of Obstetrics and Gynecology, First Affiliated Hospital, Heilongjiang University of Chinese Medicine, Harbin, China
| | - Li Min Sun
- Department of Oncology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Dong Zhou
- Department of Oncology, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Ying Ying Chen
- Department of Nephrology, The First Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
| | - Xi Wen Hao
- Heilongjiang University of Chinese Medicine, Harbin, China
| | - Hong Ruo Liu
- Department of Oncology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Xin Liu
- Department of Oncology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Jing Jing Chen
- Department of Rheumatology and Immunology, The First Hospital Affiliated to Army Medical University, Chongqing, China
|
12
|
Harrison BR, Hoffman JM, Samuelson A, Raftery D, Promislow DEL. Modular Evolution of the Drosophila Metabolome. Mol Biol Evol 2022; 39:msab307. [PMID: 34662414 PMCID: PMC8760934 DOI: 10.1093/molbev/msab307] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/04/2023] Open
Abstract
Comparative phylogenetic studies offer a powerful approach to study the evolution of complex traits. Although much effort has been devoted to the evolution of the genome and to organismal phenotypes, until now relatively little work has been done on the evolution of the metabolome, despite the fact that it is composed of the basic structural and functional building blocks of all organisms. Here we explore variation in metabolite levels across 50 My of evolution in the genus Drosophila, employing a common garden design to measure the metabolome within and among 11 species of Drosophila. We find that both sex and age have dramatic and evolutionarily conserved effects on the metabolome. We also find substantial evidence that many metabolite pairs covary after phylogenetic correction, and that such metabolome coevolution is modular. Some of these modules are enriched for specific biochemical pathways and show different evolutionary trajectories, with some showing signs of stabilizing selection. Both observations suggest that functional relationships may ultimately cause such modularity. These coevolutionary patterns also differ between sexes and are affected by age. We explore the relevance of modular evolution to fitness by associating modules with lifespan variation measured in the same common garden. We find several modules associated with lifespan, particularly in the metabolome of older flies. Oxaloacetate levels in older females appear to coevolve with lifespan, and a lifespan-associated module in older females suggests that metabolic associations could underlie 50 My of lifespan evolution.
Affiliation(s)
- Benjamin R Harrison
- Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
| | - Jessica M Hoffman
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ariana Samuelson
- Department of Biology, University of Washington, Seattle, WA, USA
| | - Daniel Raftery
- Department of Anesthesiology & Pain Medicine, University of Washington School of Medicine, Seattle, WA, USA
| | - Daniel E L Promislow
- Department of Lab Medicine & Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Department of Biology, University of Washington, Seattle, WA, USA
|
13
|
Ma JX, Yang Y, Li G, Ma BG. Computationally Reconstructed Interactome of Bradyrhizobium diazoefficiens USDA110 Reveals Novel Functional Modules and Protein Hubs for Symbiotic Nitrogen Fixation. Int J Mol Sci 2021; 22:11907. [PMID: 34769335 PMCID: PMC8584416 DOI: 10.3390/ijms222111907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 10/22/2021] [Accepted: 10/28/2021] [Indexed: 11/16/2022] Open
Abstract
Symbiotic nitrogen fixation is an important part of the nitrogen biogeochemical cycle and the main nitrogen source of the biosphere. As a classical model system for symbiotic nitrogen fixation, rhizobium-legume systems have been studied extensively for decades. Details about the molecular mechanisms of the communication and coordination between rhizobia and host plants are becoming clearer. For more systematic insights, there is an increasing demand for new studies integrating multiomics information. Here, we present a comprehensive computational framework integrating the reconstructed protein interactome of B. diazoefficiens USDA110 with its transcriptome and proteome data to study the complex protein-protein interaction (PPI) network involved in the symbiosis system. We reconstructed the interactome of B. diazoefficiens USDA110 by computational approaches. Based on the comparison of interactomes between B. diazoefficiens USDA110 and other rhizobia, we inferred that the slow growth of B. diazoefficiens USDA110 may be due to the requirement of more protein modifications, and we further identified 36 conserved functional PPI modules. Integrated with transcriptome and proteome data, interactomes representing the free-living cell and the symbiotic nitrogen-fixing (SNF) bacteroid were obtained. Based on the SNF interactome, a core-sub-PPI-network for symbiotic nitrogen fixation was determined, and nine novel functional modules and eleven key protein hubs playing key roles in symbiosis were identified. The reconstructed interactome of B. diazoefficiens USDA110 may serve as a valuable reference for studying the mechanism underlying the SNF system of rhizobia and legumes.
Affiliation(s)
- Bin-Guang Ma
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; (J.-X.M.); (Y.Y.); (G.L.)
|
14
|
Ghosh N, Saha I, Sharma N. Interactome of human and SARS-CoV-2 proteins to identify human hub proteins associated with comorbidities. Comput Biol Med 2021; 138:104889. [PMID: 34655901 PMCID: PMC8492901 DOI: 10.1016/j.compbiomed.2021.104889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 02/06/2023]
Abstract
SARS-CoV-2 has a higher chance of progression in adults of any age with certain underlying health conditions or comorbidities, such as cancer and neurological diseases, and in certain cases may even lead to death. Like other viruses, SARS-CoV-2 interacts with host proteins to pave its entry into host cells. Therefore, host-virus protein-protein interactions (PPIs) can be very useful for understanding the behaviour of SARS-CoV-2 and designing effective antiviral drugs. In this regard, we first created a human-SARS-CoV-2 PPI database from existing works in the literature, which resulted in 7085 unique PPIs. Subsequently, for each virus protein, we identified at most 10 hub proteins, i.e. the interacting human proteins with the highest degrees. The identification of these hub proteins is important as they are connected to most of the other human proteins. Consequently, when they are affected, potential diseases are triggered in the corresponding pathways, thereby leading to comorbidities. Furthermore, the biological significance of the identified hub proteins is shown using KEGG pathway and GO enrichment analysis. KEGG pathway analysis is also essential for identifying the pathways leading to comorbidities. Among others, the SARS-CoV-2 proteins NSP2, NSP5, Envelope and ORF10 interact with human hub proteins such as COX4I1, COX5A, COX5B, NDUFS1, CANX, HSP90AA1 and TP53, leading to comorbidities. Such comorbidities include Alzheimer disease, Parkinson disease, Huntington disease, HTLV-1 infection, prostate cancer and viral carcinogenesis. Possible repurposable drugs that target the human hub proteins, identified using the Enrichr tool, are also reported in this paper. This work therefore provides a consolidated study of human-SARS-CoV-2 protein interactions to understand the relationship between comorbidity and hub proteins, so that it may pave the way for the development of antiviral drugs.
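The hub-selection step described in this abstract (at most 10 highest-degree human interactors per virus protein) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: degree is assumed here to mean the number of distinct partners in a human-human PPI edge list, and the function name `top_hubs` and the placeholder identifiers (`V1`, `H1`, ...) are hypothetical.

```python
from collections import defaultdict


def top_hubs(virus_human_ppis, human_ppis, k=10):
    """For each virus protein, return up to k human interactors ranked by degree."""
    # Degree of a human protein = number of distinct human partners (assumption).
    neighbours = defaultdict(set)
    for a, b in human_ppis:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect the human interactors of each virus protein.
    interactors = defaultdict(set)
    for virus, human in virus_human_ppis:
        interactors[virus].add(human)

    hubs = {}
    for virus, humans in interactors.items():
        # Highest degree first; ties broken alphabetically for determinism.
        ranked = sorted(humans, key=lambda p: (-len(neighbours[p]), p))
        hubs[virus] = ranked[:k]
    return hubs


# Toy example with placeholder identifiers (not real interaction data).
toy_virus_human = [("V1", "H1"), ("V1", "H2"), ("V1", "H3"), ("V2", "H3")]
toy_human = [("H1", "H2"), ("H1", "H3"), ("H1", "H4"),
             ("H2", "H3"), ("H3", "H4"), ("H3", "H5")]
toy_hubs = top_hubs(toy_virus_human, toy_human, k=2)
```

The cap `k` mirrors the "at most 10" in the abstract; a virus protein with fewer than `k` interactors simply yields a shorter list.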
Affiliation(s)
- Nimisha Ghosh
- Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to Be University), Bhubaneswar, Odisha, India; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Indrajit Saha
- Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, West Bengal, India.
| | - Nikhil Sharma
- Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
|
15
|
Chiliński M, Sengupta K, Plewczynski D. From DNA human sequence to the chromatin higher order organisation and its biological meaning: Using biomolecular interaction networks to understand the influence of structural variation on spatial genome organisation and its functional effect. Semin Cell Dev Biol 2021; 121:171-185. [PMID: 34429265 DOI: 10.1016/j.semcdb.2021.08.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/19/2021] [Revised: 08/06/2021] [Accepted: 08/12/2021] [Indexed: 12/30/2022]
Abstract
The three-dimensional structure of the human genome has been proven to have a significant functional impact on gene expression. The high-order spatial chromatin is organised first by looping mediated by multiple protein factors, and then it is further formed into larger structures of topologically associated domains (TADs) or chromatin contact domains (CCDs), followed by A/B compartments and finally the chromosomal territories (CTs). The genetic variation observed in the human population influences these multi-scale structures, posing a question regarding the functional impact of structural variants as reflected by the variability of gene expression patterns. The current methods of evaluating the functional effect include eQTL analysis, which uses statistical testing of the influence of variants on spatially close genes. Non-coding DNA sequence changes are rarely evaluated by their impact on the biomolecular interaction network (BIN) reflecting the cellular interactome, which can be analysed by classical graph-theoretic algorithms. Therefore, in the second part of the review, we introduce the concept of the BIN, i.e. a meta-network model of the complete molecular interactome developed by integrating various biological networks. The BIN meta-network model includes DNA-protein binding by the plethora of protein factors as well as chromatin interactions, therefore allowing connection of genomics with the downstream biomolecular processes present in a cell. As an illustration, we scrutinise the chromatin interactions mediated by the CTCF protein detected in a ChIA-PET experiment in the human lymphoblastoid cell line GM12878. In the corresponding BIN meta-network, the DNA spatial proximity is represented as a graph model, combined with the Proteins-Interaction Network (PIN) of the human proteome using the Gene Association Network (GAN). Furthermore, we enriched the BIN with the signalling and metabolic pathways and Gene Ontology (GO) terms to assert its functional context.
Finally, we mapped the Single Nucleotide Polymorphisms (SNPs) from GWAS studies and identified the chromatin mutational hot-spots associated with a significant enrichment of SNPs related to autoimmune diseases. Afterwards, we mapped Structural Variants (SVs) from healthy individuals of the 1000 Genomes Project and identified an interesting example of a missing protein complex associated with protein Q6GYQ0 due to a deletion on chromosome 14. Such an analysis using the meta-network BIN model is therefore helpful in evaluating the influence of genetic variation on the spatial organisation of the genome and its functional effect in a cell.
Affiliation(s)
- Mateusz Chiliński
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland; Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
| | - Kaustav Sengupta
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
| | - Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland; Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland.
|
16
|
Dandage R, Berger CM, Gagnon-Arsenault I, Moon KM, Stacey RG, Foster LJ, Landry CR. Frequent Assembly of Chimeric Complexes in the Protein Interaction Network of an Interspecies Yeast Hybrid. Mol Biol Evol 2021; 38:1384-1401. [PMID: 33252673 PMCID: PMC8042767 DOI: 10.1093/molbev/msaa298] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/17/2022] Open
Abstract
Hybrids between species often show extreme phenotypes, including some that take place at the molecular level. In this study, we investigated the phenotypes of an interspecies diploid hybrid in terms of protein–protein interactions inferred from protein correlation profiling. We used two yeast species, Saccharomyces cerevisiae and Saccharomyces uvarum, which are interfertile yet have proteins that have diverged enough to be differentiated using mass spectrometry. Most of the protein–protein interactions are similar between hybrid and parents, and are consistent with the assembly of chimeric complexes, which we validated using an orthogonal approach for the prefoldin complex. We also identified instances of altered protein–protein interactions in the hybrid, for instance, in complexes related to proteostasis and in mitochondrial protein complexes. Overall, this study uncovers the likely frequent occurrence of chimeric protein complexes with few exceptions, which may result from incompatibilities or imbalances between the parental proteomes.
Affiliation(s)
- Rohan Dandage
- Département de Biochimie, Microbiologie et Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada.,PROTEO, Le Réseau Québécois de Recherche sur la Fonction, la Structure et L'ingénierie des Protéines, Université Laval, Québec, QC, Canada.,Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada.,Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada
| | - Caroline M Berger
- Département de Biochimie, Microbiologie et Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada.,PROTEO, Le Réseau Québécois de Recherche sur la Fonction, la Structure et L'ingénierie des Protéines, Université Laval, Québec, QC, Canada.,Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada.,Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada
| | - Isabelle Gagnon-Arsenault
- Département de Biochimie, Microbiologie et Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada.,PROTEO, Le Réseau Québécois de Recherche sur la Fonction, la Structure et L'ingénierie des Protéines, Université Laval, Québec, QC, Canada.,Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada.,Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada
| | - Kyung-Mee Moon
- Department of Biochemistry & Molecular Biology, and Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Richard Greg Stacey
- Department of Biochemistry & Molecular Biology, and Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Leonard J Foster
- Department of Biochemistry & Molecular Biology, and Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Christian R Landry
- Département de Biochimie, Microbiologie et Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada.,PROTEO, Le Réseau Québécois de Recherche sur la Fonction, la Structure et L'ingénierie des Protéines, Université Laval, Québec, QC, Canada.,Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, QC, Canada.,Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, QC, Canada
|
17
|
Wang W, Tan H, Sun M, Han Y, Chen W, Qiu S, Zheng K, Wei G, Ni T. Independent component analysis based gene co-expression network inference (ICAnet) to decipher functional modules for better single-cell clustering and batch integration. Nucleic Acids Res 2021; 49:e54. [PMID: 33619563 PMCID: PMC8136772 DOI: 10.1093/nar/gkab089] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/01/2020] [Revised: 01/26/2021] [Accepted: 02/02/2021] [Indexed: 12/18/2022] Open
Abstract
With the tremendous increase of publicly available single-cell RNA-sequencing (scRNA-seq) datasets, bioinformatics methods based on gene co-expression networks are becoming efficient tools for analyzing scRNA-seq data, improving cell type prediction accuracy and in turn facilitating biological discovery. However, the current methods are mainly based on overall co-expression correlation and overlook co-expression that exists in only a subset of cells; they thus fail to discover certain rare cell types and are sensitive to batch effects. Here, we developed independent component analysis-based gene co-expression network inference (ICAnet), which decomposes scRNA-seq data into a series of independent gene expression components and infers co-expression modules, improving cell clustering and rare cell-type discovery. ICAnet showed efficient performance for cell clustering and batch integration using scRNA-seq datasets spanning multiple cells/tissues/donors/library types. It works stably on datasets produced by different library construction strategies and with different sequencing depths and cell numbers. We demonstrated the capability of ICAnet to discover rare cell types in multiple independent scRNA-seq datasets from different sources. Importantly, the identified modules activated in acute myeloid leukemia scRNA-seq datasets have the potential to serve as new diagnostic markers. Thus, ICAnet is a competitive tool for cell clustering and biological interpretation in single-cell RNA-seq data analysis.
Affiliation(s)
- Weixu Wang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
| | - Huanhuan Tan
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
| | - Mingwan Sun
- College of Life Science, South China Agricultural University, Guangzhou 510642, P.R. China
| | - Yiqing Han
- College of Agriculture, South China Agricultural University, Guangzhou 510642, P.R. China
| | - Wei Chen
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
| | - Shengnu Qiu
- Division of Biosciences, Faculty of Life Sciences, University College London, London, WC1E 6BT, UK
| | - Ke Zheng
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
| | - Gang Wei
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China.,MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, P.R. China
| | - Ting Ni
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
|
18
|
Spatiotemporal 22q11.21 Protein Network Implicates DGCR8-Dependent MicroRNA Biogenesis as a Risk for Late-Fetal Cortical Development in Psychiatric Diseases. Life (Basel) 2021; 11:life11060514. [PMID: 34073122 PMCID: PMC8227527 DOI: 10.3390/life11060514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Revised: 05/28/2021] [Accepted: 05/31/2021] [Indexed: 12/28/2022] Open
Abstract
The chromosome 22q11.21 copy number variant (CNV) is a vital risk factor that can confer a genetic predisposition to neurodevelopmental disorders (NDD). As the 22q11.21 CNV affects multiple genes, the causal disease genes and affected mechanisms are still poorly understood. Thus, we aimed to identify the most impactful 22q11.21 CNV genes and the potentially impacted human brain regions, developmental stages and signaling pathways. We constructed the spatiotemporal dynamic networks of 22q11.21 CNV genes using the brain developmental transcriptome and physical protein–protein interactions. The affected brain regions, developmental stages, driver genes and pathways were subsequently investigated via integrated bioinformatics analysis. As a result, we first identified that 22q11.21 CNV genes affect the cortical area mainly during late fetal periods. Interestingly, we observed that connections between a driver gene, DGCR8, and its interacting partners, MECP2 and CUL3, which are also network hubs, existed only in the network of the late fetal period within the cortical region, suggesting their functional specificity during brain development. We also confirmed the physical interaction between DGCR8 and CUL3 by liquid chromatography-tandem mass spectrometry. In conclusion, our results suggest that the disruption of DGCR8-dependent microRNA biogenesis plays a vital role in NDD for late fetal cortical development.
|
19
|
Salomé PA, Merchant SS. Co-expression networks in Chlamydomonas reveal significant rhythmicity in batch cultures and empower gene function discovery. Plant Cell 2021; 33:1058-1082. [PMID: 33793846 PMCID: PMC8226298 DOI: 10.1093/plcell/koab042] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/05/2020] [Accepted: 01/25/2021] [Indexed: 05/18/2023]
Abstract
The unicellular green alga Chlamydomonas reinhardtii is a choice reference system for the study of photosynthesis and chloroplast metabolism, cilium assembly and function, lipid and starch metabolism, and metal homeostasis. Despite decades of research, the functions of thousands of genes remain largely unknown, and new approaches are needed to categorically assign genes to cellular pathways. Growing collections of transcriptome and proteome data now allow a systematic approach based on integrative co-expression analysis. We used a dataset comprising 518 deep transcriptome samples derived from 58 independent experiments to identify potential co-expression relationships between genes. We visualized co-expression potential with the R package corrplot, to easily assess co-expression and anti-correlation between genes. We extracted several hundred high-confidence genes at the intersection of multiple curated lists involved in cilia, cell division, and photosynthesis, illustrating the power of our method. Surprisingly, Chlamydomonas experiments retained a significant rhythmic component across the transcriptome, suggesting an underappreciated variable during sample collection, even in samples collected in constant light. Our results therefore document substantial residual synchronization in batch cultures, contrary to assumptions of asynchrony. We provide step-by-step protocols for the analysis of co-expression across transcriptome data sets from Chlamydomonas and other species to help foster gene function discovery.
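The co-expression analysis described in this abstract rests on pairwise correlation of gene expression profiles across samples. The abstract names the R package corrplot for visualization; the Python sketch below is only a stand-in that shows the underlying idea with Pearson correlation. The gene names and expression values are hypothetical toy data, and the choice of Pearson correlation is an assumption rather than a detail confirmed by the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_matrix(expression):
    """Map every ordered gene pair to the correlation of their profiles."""
    genes = sorted(expression)
    return {(g, h): pearson(expression[g], expression[h])
            for g in genes for h in genes}


# Toy expression profiles across four samples (hypothetical values).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA: co-expressed
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises: anti-correlated
}
toy_matrix = coexpression_matrix(toy_expression)
```

Values near 1 indicate co-expression and values near -1 indicate anti-correlation, which is the distinction the abstract says the visualization is meant to expose.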
Affiliation(s)
- Patrice A Salomé
- Department of Chemistry and Biochemistry, University of California—Los Angeles, Los Angeles California 90095
| | - Sabeeha S Merchant
- Department of Chemistry and Biochemistry, University of California—Los Angeles, Los Angeles California 90095
- Departments of Molecular and Cell Biology and Plant and Microbial Biology, University of California-Berkeley, Berkeley, California 94720 and Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
|
20
|
Spatiotemporal 7q11.23 protein network analysis implicates the role of DNA repair pathway during human brain development. Sci Rep 2021; 11:8246. [PMID: 33859276 PMCID: PMC8050238 DOI: 10.1038/s41598-021-87632-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/17/2020] [Accepted: 03/25/2021] [Indexed: 01/10/2023] Open
Abstract
Recurrent deletions and duplications of chromosome 7q11.23 copy number variants (CNVs) are associated with several psychiatric disorders. Although phenotypic abnormalities have been observed in patients, the causal genes responsible for CNV-associated diagnoses and traits are still poorly understood. Furthermore, the targeted human brain regions, developmental stages, protein networks, and signaling pathways influenced by this CNV remain unclear. Previous work showed that GTF2I is involved in Williams-Beuren syndrome, but the pathways affected by GTF2I are unclear. We first constructed dynamic spatiotemporal networks of 7q11.23 genes by combining data from the brain developmental transcriptome with physical interactions of 7q11.23 proteins. Topological changes were observed in protein-protein interaction (PPI) networks throughout different stages of brain development. Early and late fetal periods of development in the cortex, striatum, hippocampus, and amygdala were identified as the vital periods and regions for 7q11.23 CNV proteins. CNV proteins and their partners are significantly enriched in the DNA repair pathway. As a driver gene, GTF2I interacted with PRKDC and BRCA1 to participate in the DNA repair pathway. The physical interaction between GTF2I and PRKDC was confirmed experimentally by liquid chromatography-tandem mass spectrometry (LC-MS/MS). We identified that early and late fetal periods are crucial for 7q11.23 genes to affect brain development. Our results suggest that 7q11.23 CNV genes converge on the DNA repair pathway to contribute to the pathogenesis of psychiatric diseases.
|
21
|
DeGiorgio M, Assis R. Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data. Mol Biol Evol 2021; 38:1209-1224. [PMID: 33045078 PMCID: PMC7947822 DOI: 10.1093/molbev/msaa267] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022] Open
Abstract
Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
Affiliation(s)
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431
- Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL 33431
| | - Raquel Assis
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431
- Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL 33431
|
22
|
Hu L, Wang X, Huang YA, Hu P, You ZH. A survey on computational models for predicting protein-protein interactions. Brief Bioinform 2021; 22:6159365. [PMID: 33693513 DOI: 10.1093/bib/bbab036] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 11/20/2020] [Revised: 12/31/2020] [Indexed: 12/24/2022] Open
Abstract
Proteins interact with each other to play critical roles in many biological processes in cells. Although promising, laboratory experiments usually suffer from the disadvantages of being time-consuming and labor-intensive, and the results obtained are often not robust and carry considerable uncertainty. Owing to recent advances in high-throughput technologies, a large amount of proteomics data has been collected, which presents both a significant opportunity and a challenge for developing computational models to predict protein-protein interactions (PPIs) from these data. In this paper, we present a comprehensive survey of the recent efforts that have been made towards the development of effective computational models for PPI prediction. The survey introduces the algorithms that can be used to learn computational models for predicting PPIs, and it classifies these models into different categories. To understand their relative merits, the paper discusses different validation schemes and metrics to evaluate the prediction performance. Biological databases that are commonly used in different experiments for performance comparison are also described, and their use in a series of extensive experiments to compare different prediction models is discussed. Finally, we present some open issues in PPI prediction for future work. We explain how the performance of PPI prediction can be improved if these issues are effectively tackled.
Affiliation(s)
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 830011, Urumqi, China
- Xiaojuan Wang
  - School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, 518060, Shenzhen, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 830011, Urumqi, China
|
23
|
Chen Q, Li Y, Tan K, Qiao Y, Pan S, Jiang T, Chen YPP. Network-based methods for gene function prediction. Brief Funct Genomics 2021; 20:249-257. [PMID: 33686431 DOI: 10.1093/bfgp/elab006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/02/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 12/23/2022] Open
Abstract
The rapid development of high-throughput technology has generated a large number of biological networks. Network-based methods can provide rich information for inferring gene function. Doing so involves analyzing the topological characteristics of genes in related networks, integrating biological information, and combining data from different sources. To promote network biology and related biotechnology research, this article surveys the state of the art in network-based gene function prediction and discusses the potential challenges.
Affiliation(s)
- Qingfeng Chen
  - University of Technology Sydney, China and Hundred-Talent Program
- Yongjie Li
  - School of Computer and Electronic Information at Guangxi University
- Kai Tan
  - School of Computer and Electronic Information at Guangxi University
- Yvlu Qiao
  - School of Computer and Electronic Information at Guangxi University
- Shirui Pan
  - Computer science from the University of Technology Sydney
- Taijiao Jiang
  - Suzhou Institute of System Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
|
24
|
Ji H, Liu D, Yang Z. High oil accumulation in tuber of yellow nutsedge compared to purple nutsedge is associated with more abundant expression of genes involved in fatty acid synthesis and triacylglycerol storage. Biotechnol Biofuels 2021; 14:54. [PMID: 33653389 PMCID: PMC7923336 DOI: 10.1186/s13068-021-01909-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Accepted: 02/18/2021] [Indexed: 05/10/2023]
Abstract
BACKGROUND Yellow nutsedge is a unique plant species that can accumulate oil up to 35% of tuber dry weight, perhaps the highest level observed in the tuber tissues of the plant kingdom. To gain insight into the molecular mechanism that leads to high oil accumulation in yellow nutsedge, gene expression profiles of oil production pathways involving carbon metabolism, fatty acid synthesis, triacylglycerol synthesis, and triacylglycerol storage during tuber development were compared with those of purple nutsedge, the closest relative of yellow nutsedge, which is poor in oil accumulation. RESULTS Compared with purple nutsedge, high oil accumulation in yellow nutsedge was associated with significant up-regulation of specific key enzymes of the plastidial RubisCO bypass as well as malate and pyruvate metabolism, almost all fatty acid synthesis enzymes, and seed-like oil-body proteins. However, overall transcript levels for carbon metabolism toward carbon precursors for fatty acid synthesis were comparable, and those for triacylglycerol synthesis were similar, in both species. Two seed-like master transcription factors, ABI3 and WRI1, were found to display similar transcript patterns but were expressed at 6.5- and 14.3-fold higher levels in yellow nutsedge than in purple nutsedge, respectively. A weighted gene co-expression network analysis revealed that ABI3 was in strong transcriptional coordination with WRI1 and other key oil-related genes. CONCLUSIONS These results imply that pyruvate availability and fatty acid synthesis in the plastid, along with triacylglycerol storage in oil bodies, rather than triacylglycerol synthesis in the endoplasmic reticulum, are the major factors responsible for high oil production in the tuber of yellow nutsedge, and that ABI3 most likely plays a critical role in regulating oil accumulation. This study is significant for understanding the molecular mechanism controlling carbon partitioning toward oil production in oil-rich tubers and provides a valuable reference for enhancing oil accumulation in non-seed tissues of crops through genetic breeding or metabolic engineering.
Affiliation(s)
- Hongying Ji
  - Key Lab of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
- Dantong Liu
  - Key Lab of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
- Zhenle Yang
  - Key Lab of Plant Resources, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093 China
|
25
|
Swamy KBS, Schuyler SC, Leu JY. Protein Complexes Form a Basis for Complex Hybrid Incompatibility. Front Genet 2021; 12:609766. [PMID: 33633780 PMCID: PMC7900514 DOI: 10.3389/fgene.2021.609766] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
Proteins are the workhorses of the cell and execute many of their functions by interacting with other proteins to form protein complexes. Multi-protein complexes are an admixture of subunits; they change their interaction partners and modulate their functions and cellular physiology in response to environmental changes. When two species mate, the hybrid offspring are usually inviable or sterile because large-scale differences in the genetic makeup of the two parents cause incompatible genetic interactions. Such reciprocal-sign epistasis between inter-specific alleles is not limited to incompatible interactions between just one gene pair and usually involves multiple genes. Many of these multi-locus incompatibilities show visible defects only in the presence of all the interactions, making them hard to characterize. Understanding the dynamics of protein-protein interactions (PPIs) leading to multi-protein complexes is better suited to characterizing multi-locus incompatibilities than the traditional approaches of genetics and molecular biology. Advances in omics technologies, which include genomics, transcriptomics, and proteomics, can help achieve this end. This is especially relevant when studying non-model organisms. Here, we discuss recent progress in the understanding of hybrid genetic incompatibility and in omics technologies, and how together they have helped in characterizing protein complexes and, in turn, multi-locus incompatibilities. We also review advances in bioinformatic techniques suitable for this purpose and propose directions for leveraging the knowledge gained from model organisms to identify genetic incompatibilities in non-model organisms.
Affiliation(s)
- Krishna B. S. Swamy
  - Division of Biological and Life Sciences, School of Arts and Sciences, Ahmedabad University, Ahmedabad, India
- Scott C. Schuyler
  - Department of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Division of Head and Neck Surgery, Department of Otolaryngology, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Jun-Yi Leu
  - Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
|
26
|
AUTS2 isoforms control neuronal differentiation. Mol Psychiatry 2021; 26:666-681. [PMID: 30953002 DOI: 10.1038/s41380-019-0409-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/27/2015] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 01/07/2023]
Abstract
Mutations in AUTS2 are associated with autism, intellectual disability, and microcephaly. AUTS2 is expressed in the brain and interacts with polycomb proteins, yet it is still unclear how mutations in AUTS2 lead to neurodevelopmental phenotypes. Here we report that when neuronal differentiation is initiated, there is a shift in expression from a long isoform to a short AUTS2 isoform. A yeast two-hybrid screen identified the splicing factor SF3B1 as an interactor of both isoforms, whereas the polycomb group proteins PCGF3 and PCGF5 were found to interact exclusively with the long AUTS2 isoform. Reporter assays showed that the first exons of the long AUTS2 isoform function as a transcriptional repressor, but the part that consists of the short isoform acts as a transcriptional activator, both influenced by the cellular context. The expression levels of PCGF3 influenced the ability of the long AUTS2 isoform to activate or repress transcription. Mouse embryonic stem cells (mESCs) with heterozygous mutations in Auts2 had an increase in cell death during in vitro corticogenesis, which was significantly rescued by overexpressing the human AUTS2 transcripts. mESCs with a truncated AUTS2 protein (missing exons 12-20) showed premature neuronal differentiation, whereas cells overexpressing AUTS2, especially the long transcript, showed an increase in expression of pluripotency markers and delayed differentiation. Taken together, our data suggest that the precise expression of AUTS2 isoforms is essential for regulating transcription and the timing of neuronal differentiation.
|
27
|
Leroux M, Boutchueng-Djidjou M, Faure R. Insulin's Discovery: New Insights on Its Hundredth Birthday: From Insulin Action and Clearance to Sweet Networks. Int J Mol Sci 2021; 22:ijms22031030. [PMID: 33494161 PMCID: PMC7864324 DOI: 10.3390/ijms22031030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2021] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 11/28/2022] Open
Abstract
In 2021, the 100th anniversary of the isolation of insulin and the rescue of a child with type 1 diabetes from death will be marked. In this review, we highlight advances since the ingenious work of the four discoverers, Frederick Grant Banting, John James Rickard Macleod, James Bertram Collip and Charles Herbert Best. Macleod closed his Nobel Lecture by raising the question of the mechanism of insulin action in the body. This challenge attracted many investigators, and the question remained unanswered until the third part of the 20th century. We summarize what has been learned, from the discovery of cell surface receptors, insulin action, and clearance, to network and precision medicine.
|
28
|
Veenstra TD. Omics in Systems Biology: Current Progress and Future Outlook. Proteomics 2021; 21:e2000235. [PMID: 33320441 DOI: 10.1002/pmic.202000235] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/11/2020] [Revised: 11/25/2020] [Indexed: 12/16/2022]
Abstract
Biological research has undergone tremendous changes over the past three decades. Research used to focus almost exclusively on a single aspect of a single molecule per experiment. Modern technologies have enabled thousands of molecules to be analyzed simultaneously and the way that these molecules influence each other to be discerned. The change is so dramatic that it has given rise to a whole new descriptive suffix (i.e., omics) to describe these fields of study. While genomics was arguably the initial driver of this new trend, it quickly spread to other biological entities, resulting in the creation of transcriptomics, proteomics, metabolomics, etc. The development of these "big four omics" created a wave of other omic fields, such as epigenomics, glycomics, lipidomics, microbiomics, and even foodomics, all with the purpose of comprehensively studying all the molecular entities or processes within their respective domain. The large number of omic fields that have been invented has even led to the term "panomics" as a way to classify them all under one category. Ultimately, all of these omic fields are setting the foundation for developing systems biology, in which the focus will be on determining the complex interactions that occur within biological systems.
|
29
|
Ramundo S, Asakura Y, Salomé PA, Strenkert D, Boone M, Mackinder LCM, Takafuji K, Dinc E, Rahire M, Crèvecoeur M, Magneschi L, Schaad O, Hippler M, Jonikas MC, Merchant S, Nakai M, Rochaix JD, Walter P. Coexpressed subunits of dual genetic origin define a conserved supercomplex mediating essential protein import into chloroplasts. Proc Natl Acad Sci U S A 2020; 117:32739-32749. [PMID: 33273113 PMCID: PMC7768757 DOI: 10.1073/pnas.2014294117] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/11/2022] Open
Abstract
In photosynthetic eukaryotes, thousands of proteins are translated in the cytosol and imported into the chloroplast through the concerted action of two translocons, termed TOC and TIC, located in the outer and inner membranes of the chloroplast envelope, respectively. The degree to which the molecular composition of the TOC and TIC complexes is conserved over phylogenetic distances has remained controversial. Here, we combine transcriptomic, biochemical, and genetic tools in the green alga Chlamydomonas (Chlamydomonas reinhardtii) to demonstrate that, despite a lack of evident sequence conservation for some of its components, the algal TIC complex mirrors the molecular composition of a TIC complex from Arabidopsis thaliana. The Chlamydomonas TIC complex contains three nuclear-encoded subunits, Tic20, Tic56, and Tic100, and one chloroplast-encoded subunit, Tic214, and interacts with the TOC complex, as well as with several uncharacterized proteins to form a stable supercomplex (TIC-TOC), indicating that protein import across both envelope membranes is mechanistically coupled. Expression of the nuclear and chloroplast genes encoding both known and uncharacterized TIC-TOC components is highly coordinated, suggesting that a mechanism for regulating its biogenesis across compartmental boundaries must exist. Conditional repression of Tic214, the only chloroplast-encoded subunit in the TIC-TOC complex, impairs the import of chloroplast proteins with essential roles in chloroplast ribosome biogenesis and protein folding and induces a pleiotropic stress response, including several proteins involved in the chloroplast unfolded protein response. These findings underscore the functional importance of the TIC-TOC supercomplex in maintaining chloroplast proteostasis.
Affiliation(s)
- Silvia Ramundo
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815
- Yukari Asakura
  - Laboratory of Organelle Biology, Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
- Patrice A Salomé
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095
- Daniela Strenkert
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095
- Morgane Boone
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815
- Luke C M Mackinder
  - Department of Biology, University of York, York YO10 5DD, United Kingdom
- Kazuaki Takafuji
  - Graduate School of Medicine, Osaka University, Osaka 565-0871, Japan
- Emine Dinc
  - Department of Molecular Biology, University of Geneva, Geneva CH-1211, Switzerland
  - Department of Plant Biology, University of Geneva, Geneva CH-1211, Switzerland
- Michèle Rahire
  - Department of Molecular Biology, University of Geneva, Geneva CH-1211, Switzerland
  - Department of Plant Biology, University of Geneva, Geneva CH-1211, Switzerland
- Michèle Crèvecoeur
  - Department of Molecular Biology, University of Geneva, Geneva CH-1211, Switzerland
  - Department of Plant Biology, University of Geneva, Geneva CH-1211, Switzerland
- Leonardo Magneschi
  - Institute of Plant Biology and Biotechnology, University of Münster, Münster 48143, Germany
- Olivier Schaad
  - Department of Biochemistry, University of Geneva, Geneva CH-1211, Switzerland
- Michael Hippler
  - Institute of Plant Biology and Biotechnology, University of Münster, Münster 48143, Germany
  - Institute of Plant Science and Resources, Okayama University, Kurashiki 710-0046, Japan
- Martin C Jonikas
  - Department of Molecular Biology, Princeton University, Princeton, NJ 08540
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815
- Sabeeha Merchant
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095
- Masato Nakai
  - Laboratory of Organelle Biology, Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
- Jean-David Rochaix
  - Department of Molecular Biology, University of Geneva, Geneva CH-1211, Switzerland
  - Department of Plant Biology, University of Geneva, Geneva CH-1211, Switzerland
- Peter Walter
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815
|
30
|
Savino A, Provero P, Poli V. Differential Co-Expression Analyses Allow the Identification of Critical Signalling Pathways Altered during Tumour Transformation and Progression. Int J Mol Sci 2020; 21:E9461. [PMID: 33322692 PMCID: PMC7764314 DOI: 10.3390/ijms21249461] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/30/2020] [Revised: 12/02/2020] [Accepted: 12/09/2020] [Indexed: 02/02/2023] Open
Abstract
Biological systems respond to perturbations through the rewiring of molecular interactions, organised in gene regulatory networks (GRNs). Among these, the increasingly high availability of transcriptomic data makes gene co-expression networks the most exploited ones. Differential co-expression networks are useful tools to identify changes in response to an external perturbation, such as mutations predisposing to cancer development, and leading to changes in the activity of gene expression regulators or signalling. They can help explain the robustness of cancer cells to perturbations and identify promising candidates for targeted therapy, moreover providing higher specificity with respect to standard co-expression methods. Here, we comprehensively review the literature about the methods developed to assess differential co-expression and their applications to cancer biology. Via the comparison of normal and diseased conditions and of different tumour stages, studies based on these methods led to the definition of pathways involved in gene network reorganisation upon oncogenes' mutations and tumour progression, often converging on immune system signalling. A relevant implementation still lagging behind is the integration of different data types, which would greatly improve network interpretability. Most importantly, performance and predictivity evaluation of the large variety of mathematical models proposed would urgently require experimental validations and systematic comparisons. We believe that future work on differential gene co-expression networks, complemented with additional omics data and experimentally tested, will considerably improve our insights into the biology of tumours.
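As a rough illustration of what "differential co-expression" means for a single gene pair, the sketch below compares the Pearson correlation of two genes in two conditions using Fisher's z-transformation. This is only one common way to test a change in correlation and is an assumption made here for illustration; the abstract does not say which of the reviewed methods use it. The function names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher z-transform; r is clipped so that atanh stays finite at |r| = 1."""
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return math.atanh(r)


def differential_coexpression_z(x_a, y_a, x_b, y_b):
    """z-score for the change in correlation of one gene pair from condition A to B.

    x_a, y_a: expression of gene X and gene Y across samples in condition A.
    x_b, y_b: the same two genes across samples in condition B.
    Each condition needs more than three samples.
    """
    z_a = fisher_z(pearson(x_a, y_a))
    z_b = fisher_z(pearson(x_b, y_b))
    se = math.sqrt(1.0 / (len(x_a) - 3) + 1.0 / (len(x_b) - 3))
    return (z_b - z_a) / se


# Toy data (hypothetical): the pair is co-expressed in "normal" and anti-correlated in "tumour".
gene_x = [1, 2, 3, 4, 5, 6, 7, 8]
normal_y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2]
tumour_y = [8.0, 7.1, 5.9, 5.2, 3.8, 3.1, 2.2, 0.9]
score = differential_coexpression_z(gene_x, normal_y, gene_x, tumour_y)
```

A strongly negative score indicates that the correlation dropped from condition A to condition B; applying the same function to every gene pair would give the edge weights of a differential co-expression network.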
Affiliation(s)
- Aurora Savino
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
- Paolo Provero
  - Department of Neurosciences “Rita Levi Montalcini”, University of Turin, Corso Massimo D'Azeglio 52, 10126 Turin, Italy
  - Center for Omics Sciences, Ospedale San Raffaele IRCCS, Via Olgettina 60, 20132 Milan, Italy
- Valeria Poli
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
|
31
|
de Groot NS, Torrent Burgas M. Bacteria use structural imperfect mimicry to hijack the host interactome. PLoS Comput Biol 2020; 16:e1008395. [PMID: 33275611 PMCID: PMC7744059 DOI: 10.1371/journal.pcbi.1008395] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/07/2020] [Revised: 12/16/2020] [Accepted: 09/23/2020] [Indexed: 12/25/2022] Open
Abstract
Bacteria use protein-protein interactions to infect their hosts and hijack fundamental pathways, which ensures their survival and proliferation. Hence, the infectious capacity of the pathogen is closely related to its ability to interact with host proteins. Here, we show that hubs in the host-pathogen interactome are isolated in the pathogen network by adapting the geometry of the interacting interfaces. An imperfect mimicry of the eukaryotic interfaces allows pathogen proteins to actively bind to the host's target while preventing deleterious effects on the pathogen interactome. Understanding how bacteria recognize eukaryotic proteins may pave the way for the rational design of new antibiotic molecules.
Affiliation(s)
- Natalia Sanchez de Groot
  - Gene Function and Evolution Lab, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona, Spain
  - * E-mail: (NSdG); (MTB)
- Marc Torrent Burgas
  - Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
  - * E-mail: (NSdG); (MTB)
|
32
|
Identification and expression profiling of HvMADS57 and HvD14 in a barley tb1 mutant. J Genet 2020. [DOI: 10.1007/s12041-020-1190-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
33
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Whereas experimental techniques are limited, various computational tools make it possible to analyze, filter, and combine interaction data to obtain comprehensive information about biological pathways. By efficiently integrating experimental findings on PPIs with computational prediction techniques, researchers have obtained much valuable PPI data, including several advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
|
34
|
Cope AL, O'Meara BC, Gilchrist MA. Gene expression of functionally-related genes coevolves across fungal species: detecting coevolution of gene expression using phylogenetic comparative methods. BMC Genomics 2020; 21:370. [PMID: 32434474 PMCID: PMC7240986 DOI: 10.1186/s12864-020-6761-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 04/29/2020] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Researchers often measure changes in gene expression across conditions to better understand the shared functional roles and regulatory mechanisms of different genes. Analogous to this is comparing gene expression across species, which can improve our understanding of the evolutionary processes shaping both individual genes and functional pathways. One area of interest is identifying genes that show signals of coevolution, which can also indicate potential functional similarity, analogous to the co-expression analysis often performed across conditions for a single species. However, as with any trait, comparing gene expression across species can be confounded by the non-independence of species due to shared ancestry, making standard hypothesis testing inappropriate. RESULTS We compared RNA-Seq data across 18 fungal species using a multivariate Brownian Motion phylogenetic comparative method (PCM), which allowed us to quantify coevolution between protein pairs while directly accounting for the shared ancestry of the species. Our work indicates that proteins that physically interact show stronger signals of coevolution than randomly generated pairs. Interactions with stronger empirical and computational evidence also show stronger signals of coevolution. We examined the effects of the number of protein interactions and gene expression levels on coevolution, finding that both factors are overall poor predictors of the strength of coevolution between a protein pair. Simulations further demonstrate the potential issues of analyzing gene expression coevolution without accounting for shared ancestry in a standard hypothesis testing framework. Furthermore, our simulations indicate that using a randomly generated null distribution to determine statistical significance when detecting coevolving genes with phylogenetically uncorrected correlations, as has previously been done, is less accurate than PCMs, although it is a significant improvement over standard hypothesis testing. These methods are further improved by using a phylogenetically corrected correlation metric. CONCLUSIONS Our work highlights potential benefits of using PCMs to detect gene expression coevolution from high-throughput omics-scale data. This framework can be built upon to investigate other evolutionary hypotheses, such as changes in transcription regulatory mechanisms across species.
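The "randomly generated null distribution" mentioned in the abstract can be pictured with a minimal sketch: correlate the expression of randomly drawn gene pairs across species, then ask how often those random correlations reach the value observed for a pair of interest. This is a simplified, phylogenetically uncorrected illustration, which is the approach the abstract reports as less accurate than PCMs; it is not the authors' implementation, and the function names, the +1 correction in the p-value, and the simulated data are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_pair_null(expression, n_pairs, rng):
    """Correlations of randomly drawn gene pairs.

    expression: dict mapping gene name -> list of expression values, one per species.
    rng: a random.Random instance, passed in so that results are reproducible.
    """
    genes = sorted(expression)
    null = []
    for _ in range(n_pairs):
        g1, g2 = rng.sample(genes, 2)  # two distinct genes
        null.append(pearson(expression[g1], expression[g2]))
    return null


def empirical_p_value(observed, null_values):
    """Share of null correlations at least as large as the observed one.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Simulated example (hypothetical): 6 genes measured in 18 species.
rng = random.Random(0)
expr = {f"g{i}": [rng.gauss(0.0, 1.0) for _ in range(18)] for i in range(6)}
null = random_pair_null(expr, 200, rng)
p = empirical_p_value(pearson(expr["g0"], expr["g1"]), null)
```

Because species share ancestry, the per-species values are not independent observations, so a p-value obtained this way should be read with the caution the abstract describes.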
Affiliation(s)
- Alexander L Cope
  - Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, USA
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Brian C O'Meara
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee, USA
  - National Institute of Mathematical and Biological Synthesis, University of Tennessee, Knoxville, Tennessee, USA
- Michael A Gilchrist
  - Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, USA
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee, USA
  - National Institute of Mathematical and Biological Synthesis, University of Tennessee, Knoxville, Tennessee, USA
|
35
|
Carianopol CS, Chan AL, Dong S, Provart NJ, Lumba S, Gazzarrini S. An abscisic acid-responsive protein interaction network for sucrose non-fermenting related kinase1 in abiotic stress response. Commun Biol 2020; 3:145. [PMID: 32218501 PMCID: PMC7099082 DOI: 10.1038/s42003-020-0866-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/23/2019] [Accepted: 02/24/2020] [Indexed: 12/13/2022] Open
Abstract
Yeast Snf1 (Sucrose non-fermenting1), mammalian AMPK (5′ AMP-activated protein kinase) and plant SnRK1 (Snf1-Related Kinase1) are conserved heterotrimeric kinase complexes that re-establish energy homeostasis following stress. The hormone abscisic acid (ABA) plays a crucial role in plant stress response. Activation of SnRK1 or ABA signaling results in overlapping transcriptional changes, suggesting these stress pathways share common targets. To investigate how SnRK1 and ABA interact during stress response in Arabidopsis thaliana, we screened the SnRK1 complex by yeast two-hybrid against a library of proteins encoded by 258 ABA-regulated genes. Here, we identify 125 SnRK1-interacting proteins (SnIPs). Network analysis indicates that a subset of SnIPs form signaling modules in response to abiotic stress. Functional studies show the involvement of SnRK1 and select SnIPs in abiotic stress responses. This targeted study uncovers the largest set of SnRK1 interactors, which can be used to further characterize the role of SnRK1 in plant survival under stress. Carianopol et al. construct a detailed protein interaction network for the SnRK1 kinase complex to investigate the interaction of SnRK1 and ABA during stress response. They identify 125 proteins that interact with SnRK1, which can be used further to characterise the role of SnRK1 in plant survival under stress.
Affiliation(s)
- Carina Steliana Carianopol
- Department of Biological Sciences, University of Toronto Scarborough, 1265 Military Trail, Toronto, ON, M1C 1A4, Canada
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Aaron Lorheed Chan
- Department of Biological Sciences, University of Toronto Scarborough, 1265 Military Trail, Toronto, ON, M1C 1A4, Canada
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Shaowei Dong
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Nicholas J Provart
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Centre for the Analysis of Genome Evolution and Function, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Shelley Lumba
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
- Sonia Gazzarrini
- Department of Biological Sciences, University of Toronto Scarborough, 1265 Military Trail, Toronto, ON, M1C 1A4, Canada
- Department of Cell and Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
|
36
|
Yang X, Yang S, Qi H, Wang T, Li H, Zhang Z. PlaPPISite: a comprehensive resource for plant protein-protein interaction sites. BMC PLANT BIOLOGY 2020; 20:61. [PMID: 32028878 PMCID: PMC7006421 DOI: 10.1186/s12870-020-2254-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/21/2019] [Accepted: 01/16/2020] [Indexed: 05/02/2023]
Abstract
BACKGROUND Protein-protein interactions (PPIs) play very important roles in diverse biological processes. Experimentally validated or predicted PPI data have become increasingly available in diverse plant species. To further explore the biological functions of PPIs, understanding the interaction details of plant PPIs (e.g., the 3D structural contexts of interaction sites) is necessary. By integrating bioinformatics algorithms, interaction details can be annotated at different levels and then compiled into user-friendly databases. In our previous study, we developed AraPPISite, which aimed to provide interaction site information for PPIs in the model plant Arabidopsis thaliana. Because the application of AraPPISite is limited to one species, it is natural to extend it into a new database that can provide interaction details of PPIs in multiple plants.
DESCRIPTION PlaPPISite (http://zzdlab.com/plappisite/index.php) is a comprehensive, high-coverage and interaction details-oriented database for 13 plant interactomes. In addition to collecting 121 experimentally verified structures of protein complexes, the complex structures of experimental/predicted PPIs in the 13 plants were also constructed, and the corresponding interaction sites were annotated. For the PPIs whose 3D structures could not be modelled, the associated domain-domain interactions (DDIs) and domain-motif interactions (DMIs) were inferred. To facilitate the reliability assessment of predicted PPIs, the source species of interolog templates, GO annotations, subcellular localizations and gene expression similarities are also provided. JavaScript packages were employed to visualize structures of protein complexes, protein interaction sites and protein interaction networks. We also developed an online tool for homology modelling and protein interaction site annotation of protein complexes. All data contained in PlaPPISite are also freely available on the Download page.
CONCLUSION PlaPPISite provides the plant research community with an easy-to-use and comprehensive data resource for the search and analysis of protein interaction details from the 13 important plant species.
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193 China
- Shiping Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193 China
- Huan Qi
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193 China
- Tianpeng Wang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193 China
- Hong Li
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, 570228 China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193 China
|
37
|
A Novel Stochastic Block Model for Network-Based Prediction of Protein-Protein Interactions. INTELLIGENT COMPUTING THEORIES AND APPLICATION 2020. [DOI: 10.1007/978-3-030-60802-6_54] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
|
38
|
Yang X, Yang S, Li Q, Wuchty S, Zhang Z. Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method. Comput Struct Biotechnol J 2019; 18:153-161. [PMID: 31969974 PMCID: PMC6961065 DOI: 10.1016/j.csbj.2019.12.005] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 10/10/2019] [Revised: 11/29/2019] [Accepted: 12/10/2019] [Indexed: 12/11/2022] Open
Abstract
The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computational methods are playing an important role in providing testable hypotheses, complementing the determination of large-scale interactomes between species. In this work, we applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich feature vectors of low dimensionality. Training a Random Forest (RF) classifier on a dataset that covers known PPIs between human and all viruses, we obtained excellent predictive accuracy, outperforming various combinations of machine learning algorithms and commonly used sequence encoding schemes. In a rigorous comparison with three existing human-virus PPI prediction methods, our proposed computational framework further provided very competitive and promising performance, suggesting that the doc2vec encoding scheme effectively captures context information of protein sequences pertaining to the corresponding protein-protein interactions. Our approach is freely accessible through our web server as part of our host-pathogen PPI prediction platform (http://zzdlab.com/InterSPPI/). Taken together, we hope the current work not only contributes a useful predictor to accelerate the exploration of human-virus PPIs, but also provides some meaningful insights into human-virus relationships.
Key Words
- AC, Auto Covariance
- ACC, Accuracy
- AUC, area under the ROC curve
- AUPRC, area under the PR curve
- Adaboost, Adaptive Boosting
- CT, Conjoint Triad
- Doc2vec
- Embedding
- Human-virus interaction
- LD, Local Descriptor
- MCC, Matthews correlation coefficient
- ML, machine learning
- MLP, Multiple Layer Perceptron
- MS, mass spectroscopy
- Machine learning
- PPIs, protein-protein interactions
- PR, Precision-Recall
- Prediction
- Protein-protein interaction
- RBF, radial basis function
- RF, Random Forest
- ROC, Receiver Operating Characteristic
- SGD, stochastic gradient descent
- SVM, Support Vector Machine
- Y2H, yeast two-hybrid
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Qinmengge Li
- National Demonstration Center for Experimental Biological Sciences Education, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Stefan Wuchty
- Dept. of Computer Science, University of Miami, Miami, FL 33146, USA
- Dept. of Biology, University of Miami, Miami, FL 33146, USA
- Center of Computational Science, University of Miami, Miami, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
39
|
Zhang J, Ju S. Identifying genuine protein-protein interactions within communities of gene co-expression networks using a deconvolution method. IET Syst Biol 2019; 13:290-296. [PMID: 31778125 PMCID: PMC8687158 DOI: 10.1049/iet-syb.2019.0060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/09/2019] [Revised: 06/24/2019] [Accepted: 07/09/2019] [Indexed: 11/20/2022] Open
Abstract
Direct relationships between biological molecules connected in a gene co-expression network tend to reflect real biological activities such as gene regulation, protein-protein interactions (PPIs), and metabolisation. As correlation-based networks contain numerous indirect connections, those direct relationships are always 'hidden' in them. Compared with the global network, network communities imply more biological significance on predicting protein function, detecting protein complexes and studying network evolution. Therefore, identifying direct relationships in communities is a pervasive and important topic in the biological sciences. Unfortunately, this field has not been well studied. A major thrust of this study is to apply a deconvolution algorithm on communities stemming from different gene co-expression networks, which are constructed by fixing different thresholds for robustness analysis. Using the fifth Dialogue on Reverse Engineering Assessment and Methods challenge (DREAM5) framework, the authors demonstrate that nearly all new communities extracted from a 'deconvolution filter' contain more genuine PPIs than before deconvolution.
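The abstract above mentions building gene co-expression networks "by fixing different thresholds". As a rough illustration of that one step only (not the deconvolution algorithm and not the authors' code), the sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a chosen threshold. The function names, gene labels and expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold):
    """Return gene pairs whose |correlation| is at least `threshold`."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical toy expression profiles (four samples per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with geneA
}

strict_network = coexpression_edges(profiles, 0.9)
loose_network = coexpression_edges(profiles, 0.7)
```

Lowering the threshold admits weaker correlations and therefore more edges, many of them indirect; that is the kind of network a deconvolution step would then be expected to prune.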
Affiliation(s)
- Jin Zhang
- School of Information Science and Engineering, University of Jinan, Jinan 250022, People's Republic of China
- Shan Ju
- School of International Trade and Economics, Shandong University of Finance and Economics, Jinan 250014, People's Republic of China
|
40
|
Reyna MA, Leiserson MDM, Raphael BJ. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 2019; 34:i972-i980. [PMID: 30423088 PMCID: PMC6129270 DOI: 10.1093/bioinformatics/bty613] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 01/06/2023] Open
Abstract
Motivation The analysis of high-dimensional 'omics data is often informed by the use of biological interaction networks. For example, protein–protein interaction networks have been used to analyze gene expression data, to prioritize germline variants, and to identify somatic driver mutations in cancer. In these and other applications, the underlying computational problem is to identify altered subnetworks containing genes that are both highly altered in an 'omics dataset and are topologically close (e.g. connected) on an interaction network.
Results We introduce Hierarchical HotNet, an algorithm that finds a hierarchy of altered subnetworks. Hierarchical HotNet assesses the statistical significance of the resulting subnetworks over a range of biological scales and explicitly controls for ascertainment bias in the network. We evaluate the performance of Hierarchical HotNet and several other algorithms that identify altered subnetworks on the problem of predicting cancer genes and significantly mutated subnetworks. On somatic mutation data from The Cancer Genome Atlas, Hierarchical HotNet outperforms other methods and identifies significantly mutated subnetworks containing both well-known cancer genes and candidate cancer genes that are rarely mutated in the cohort. Hierarchical HotNet is a robust algorithm for identifying altered subnetworks across different 'omics datasets.
Availability and implementation http://github.com/raphael-group/hierarchical-hotnet.
Supplementary information Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Matthew A Reyna
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mark D M Leiserson
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ, USA
|
41
|
Identifying Protein Complexes from Dynamic Temporal Interval Protein-Protein Interaction Networks. BIOMED RESEARCH INTERNATIONAL 2019; 2019:3726721. [PMID: 31531351 PMCID: PMC6720829 DOI: 10.1155/2019/3726721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/24/2019] [Revised: 05/22/2019] [Accepted: 07/04/2019] [Indexed: 11/26/2022]
Abstract
Identification of protein complexes is very important for revealing the underlying mechanism of biological processes. Many computational methods have been developed to identify protein complexes from static protein-protein interaction (PPI) networks. Recently, researchers have begun to consider the dynamics of protein-protein interactions. Dynamic PPI networks are closer to reality in the cell system, and it is expected that more protein complexes can be accurately identified from them. In this paper, we use the undulating degree above the base level of gene expression, instead of the gene expression level itself, to construct dynamic temporal PPI networks. We then convert dynamic temporal PPI networks into dynamic Temporal Interval Protein Interaction Networks (TI-PINs) and propose a novel method to accurately identify more protein complexes from the constructed TI-PINs. Because they preserve continuous interactions within a temporal interval, the constructed TI-PINs contain more dynamical information for accurately identifying more protein complexes. Our proposed identification method uses multisource biological data to judge whether the joint colocalization condition, the joint coexpression condition, and the expanding cluster condition are satisfied; this ensures that the identified protein complexes have the features of colocalization, coexpression, and functional homogeneity. The experimental results on yeast data sets demonstrated that using the constructed TI-PINs yields better identification of protein complexes than five existing dynamic PPI networks, and that our proposed identification method can accurately find more protein complexes than four other methods.
|
42
|
Sügis E, Dauvillier J, Leontjeva A, Adler P, Hindie V, Moncion T, Collura V, Daudin R, Loe-Mie Y, Herault Y, Lambert JC, Hermjakob H, Pupko T, Rain JC, Xenarios I, Vilo J, Simonneau M, Peterson H. HENA, heterogeneous network-based data set for Alzheimer's disease. Sci Data 2019; 6:151. [PMID: 31413325 PMCID: PMC6694132 DOI: 10.1038/s41597-019-0152-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 02/12/2019] [Accepted: 06/18/2019] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease and other types of dementia are the top cause of disability in later life, and various types of experiments have been performed to understand the underlying mechanisms of the disease with the aim of identifying potential drug targets. These experiments have been carried out by scientists working in different domains such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in databases designed for collecting data of similar types. However, in order to get a systematic view of the disease from these independent but complementary data sets, it is necessary to combine them. In this study we describe a heterogeneous network-based data set for Alzheimer's disease (HENA). Additionally, we demonstrate the application of state-of-the-art graph convolutional networks, i.e. deep learning methods, for the analysis of such large heterogeneous biological data sets. We expect HENA to allow scientists to explore and analyze their own results in the broader context of Alzheimer's disease research.
Affiliation(s)
- Elena Sügis
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Jerome Dauvillier
- Swiss Institute of Bioinformatics, Vital-IT group, Unil Quartier Sorge, Genopode building, CH-1015, Lausanne, Switzerland
- Anna Leontjeva
- CSIRO Data 61, 5/13 Garden St, Eveleigh, NSW, 2015, Australia
- Priit Adler
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Valerie Hindie
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Thomas Moncion
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Rachel Daudin
- Institut national de la santé et de la recherche médicale, INSERM U894 2 ter rue d'Alésia, 75014, Paris, France
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France
- Yann Loe-Mie
- (Epi)genomics of Animal Development Unit, Institut Pasteur, CNRS UMR3738, Paris, 75015, France
- Yann Herault
- Centre Européen de Recherche en Biologie et Médecine, 1 rue Laurent Fries, 67404, Illkirch, France
- Jean-Charles Lambert
- Institut Pasteur de Lille, UMR 744 1 rue du Pr. Calmette BP 245, 59019, Lille cedex, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, United Kingdom
- Tal Pupko
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, P.O. Box 39040, 6997801, Tel Aviv, Israel
- Ioannis Xenarios
- Center for Integrative Genomics University of Lausanne, Genopode, 1015, Lausanne, Switzerland
- Genome Center Health 2030, Analytical Platform Department, Chemin des Mines 9, 1202, Genève, Switzerland
- DFR CHUV, Rue du Bugnon 21, 1011, Lausanne, Switzerland
- Agora Center, LICR/Department of Oncology, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Jaak Vilo
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Michel Simonneau
- Institut national de la santé et de la recherche médicale, INSERM U894 2 ter rue d'Alésia, 75014, Paris, France
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France
- Hedi Peterson
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
|
43
|
Zhang J, Zhong C, Huang Y, Lin HX, Wang M. A method for identifying protein complexes with the features of joint co-localization and joint co-expression in static PPI networks. Comput Biol Med 2019; 111:103333. [PMID: 31376777 DOI: 10.1016/j.compbiomed.2019.103333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2019] [Revised: 06/01/2019] [Accepted: 06/17/2019] [Indexed: 02/09/2023]
Abstract
Identifying protein complexes in static protein-protein interaction (PPI) networks is essential for understanding the underlying mechanism of biological processes. Proteins in a complex are co-localized at the same place and co-expressed at the same time. We propose a novel method to identify protein complexes with the features of joint co-localization and joint co-expression in static PPI networks. To achieve this goal, we define a joint localization vector to construct a joint co-localization criterion for a protein group, and define a joint gene expression to construct a joint co-expression criterion for a gene group. Moreover, the functional similarity of proteins in a complex is an important characteristic. Thus, we use the CC-based, MF-based, and BP-based protein similarities to devise a functional similarity criterion that determines whether a protein is functionally similar to a protein cluster. Based on the core-attachment structure and following a seed-expanding strategy, we use four types of biological data, namely PPI data with reliability scores, protein localization data, gene expression data, and gene ontology annotations, to identify protein complexes. The experimental results on yeast data show that, compared with existing methods, our proposed method can efficiently and exactly identify more protein complexes, especially protein complexes of sizes from 2 to 6. Furthermore, the enrichment analysis demonstrates that the protein complexes identified by our method have significant biological meaning.
Affiliation(s)
- Jinxiong Zhang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Yiran Huang
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Hai Xiang Lin
- Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands
- Mian Wang
- College of Life Science and Technology, Guangxi University, Nanning, China
|
44
|
Guala D, Ogris C, Müller N, Sonnhammer ELL. Genome-wide functional association networks: background, data & state-of-the-art resources. Brief Bioinform 2019; 21:1224-1237. [PMID: 31281921 PMCID: PMC7373183 DOI: 10.1093/bib/bbz064] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/05/2019] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 02/06/2023] Open
Abstract
The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration.
Affiliation(s)
- Dimitri Guala
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
- Christoph Ogris
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Nikola Müller
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Erik L L Sonnhammer
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
|
45
|
Feng J, Li D, Tang Y, Du R, Liu L. Molecular cloning of the Rab7 effector RILP (Rab-interacting lysosomal protein) in Litopenaeus vannamei and preliminary analysis of its role in white spot syndrome virus infection. FISH & SHELLFISH IMMUNOLOGY 2019; 90:126-133. [PMID: 31059814 DOI: 10.1016/j.fsi.2019.04.306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2019] [Revised: 04/26/2019] [Accepted: 04/30/2019] [Indexed: 06/09/2023]
Abstract
To investigate the role of the Rab7 effector RILP (Rab-interacting lysosomal protein) in white spot syndrome virus (WSSV) infection, the full-length cDNA of RILP (LvRILP) was cloned from Litopenaeus vannamei; it consists of 1595 bp and encodes a polypeptide of 411 amino acids. Sequence analysis and multiple sequence alignment showed that LvRILP contains a conserved RILP region from amino acid 277 to amino acid 325. Both LvRILP and Rab7 mRNA were most highly expressed in stomach and least expressed in hemocytes, and both were significantly up-regulated with similar kinetics after WSSV infection. The interaction of Rab7 with LvRILP was verified by both GST pull-down and ELISA. Meanwhile, the results of pull-down assays showed that GST-tagged VP28 (GST-VP28), His-tagged Rab7 (His-Rab7) and His-RILP formed a tripartite complex. After silencing with specific LvRILP dsRNA, the LvRILP mRNA level was significantly reduced, and the expression levels of three WSSV genes (ie1, wsv477 and vp28) all decreased at 24, 36 and 48 h post WSSV infection. These results suggest that the Rab7 effector RILP is involved in WSSV infection.
Affiliation(s)
- Jixing Feng
- Laboratory of Pathology of Aquatic Animals, Yantai University, Yantai, 264005, PR China
- Denglai Li
- Laboratory of Pathology of Aquatic Animals, Yantai University, Yantai, 264005, PR China
- Yongzheng Tang
- Laboratory of Pathology of Aquatic Animals, Yantai University, Yantai, 264005, PR China
- Rongbin Du
- Laboratory of Pathology of Aquatic Animals, Yantai University, Yantai, 264005, PR China
- Liming Liu
- Laboratory of Pathology of Aquatic Animals, Yantai University, Yantai, 264005, PR China
|
46
|
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2019; 21:566-583. [DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 12/14/2022] Open
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanism of complex diseases, studying the minimal genome required for living cells and developing new drug targets. As laboratory methods are often complicated, costly and time-consuming, and with the deepening understanding of network biology and the rapid development of biotechnologies, a great many computational methods have been proposed to identify essential genes/proteins at the network level. By analyzing the topological characteristics of essential genes/proteins in protein–protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have proved effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
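One of the simplest topological characteristics used in this line of work is degree centrality: proteins with many interaction partners are ranked as more likely to be essential. The sketch below is a minimal illustration of that single idea, assuming an undirected PIN given as a list of protein pairs; the protein labels and function names are hypothetical, and the surveyed methods combine far richer topological and biological features.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each protein divided by (number of proteins - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {protein: 0.0 for protein in neighbors}
    return {protein: len(nb) / (n - 1) for protein, nb in neighbors.items()}


def rank_candidates(edges, top_k):
    """Return the top_k proteins by degree centrality (ties broken by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Hypothetical toy interaction network.
toy_pin = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

centrality = degree_centrality(toy_pin)
candidates = rank_candidates(toy_pin, 2)
```

Degree is only a baseline score. Swapping in another centrality measure, or weighting edges by co-expression, would change `degree_centrality` but leave the ranking step unchanged.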
Affiliation(s)
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
|
47
|
Assis R. Lineage-Specific Expression Divergence in Grasses Is Associated with Male Reproduction, Host-Pathogen Defense, and Domestication. Genome Biol Evol 2019; 11:207-219. [PMID: 30398650 PMCID: PMC6331041 DOI: 10.1093/gbe/evy245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 11/03/2018] [Indexed: 02/02/2023] Open
Abstract
Poaceae (grasses) is an agriculturally important and widely distributed family of plants with extraordinary phenotypic diversity, much of which was generated under recent lineage-specific evolution. Yet, little is known about the genes and functional modules involved in the lineage-specific divergence of grasses. Here, I address this question on a genome-wide scale by applying a novel branch-based statistic of lineage-specific expression divergence, LED, to RNA-seq data from nine tissues of the wild grass Brachypodium distachyon and its domesticated relatives Oryza sativa japonica (rice) and Sorghum bicolor (sorghum). I find that LED is generally smallest in B. distachyon and largest in O. sativa japonica, which underwent domestication earlier than S. bicolor, supporting the hypothesis that domestication may increase the rate of lineage-specific expression divergence in grasses. Moreover, in all three species, LED is positively correlated with protein-coding sequence divergence and tissue specificity, and negatively correlated with network connectivity. Further analysis reveals that genes with large LED are often primarily expressed in anther, implicating lineage-specific expression divergence in the evolution of male reproductive phenotypes. Gene ontology enrichment analysis also identifies an overrepresentation of terms related to male reproduction in the two domesticated grasses, as well as to those involved in host-pathogen defense in all three species. Last, examinations of genes with the largest LED reveal that their lineage-specific expression divergence may have contributed to antimicrobial functions in B. distachyon, to enhanced adaptation and yield during domestication in O. sativa japonica, and to defense against a widespread and devastating fungal pathogen in S. bicolor. 
Together, these findings suggest that lineage-specific expression divergence in grasses may increase under domestication and preferentially target rapidly evolving genes involved in male reproduction, host-pathogen defense, and the origin of domesticated phenotypes.
Affiliation(s)
- Raquel Assis
- Department of Biology, Pennsylvania State University, University Park
|
48
|
Manners HN, Roy S, Kalita JK. Intrinsic-overlapping co-expression module detection with application to Alzheimer's Disease. Comput Biol Chem 2018; 77:373-389. [PMID: 30466046 DOI: 10.1016/j.compbiolchem.2018.10.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/07/2018] [Revised: 10/28/2018] [Accepted: 10/29/2018] [Indexed: 11/18/2022]
Abstract
Genes interact with each other and may cause perturbation in the molecular pathways leading to complex diseases. Often, instead of any single gene, a subset of genes interact, forming a network, to share common biological functions. Such a subnetwork is called a functional module or motif. Identifying such modules, and the central key genes in them that may be responsible for a disease, may help design patient-specific drugs. In this study, we consider the neurodegenerative Alzheimer's Disease (AD) and identify potentially responsible genes from functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to a disease under investigation, and identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive or overlapping in nature. Moreover, they sometimes show intrinsic or hierarchical distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network), that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed using microarray expression profiles. We compare our method with existing methods to evaluate the quality of the modules extracted. CluViaN reports the presence of intrinsic and overlapping motifs in different species not reported by any other research. We further apply CluViaN to extract significant AD-specific modules and rank them based on the number of genes from a module involved in the disease pathways. Finally, top central genes are identified by topological analysis of the modules. We use two different AD phenotype data sets for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play significant roles in AD.
Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins which can cause neurodegenerative disorders. MUC4, another hub gene that we find experimentally, is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaN Software.rar.
Affiliation(s)
- Hazel Nicolette Manners
  - Department of Information Technology, North Eastern Hill University, Shillong, Meghalaya, India.
- Swarup Roy
  - Department of Computer Applications, Sikkim University, Gangtok, Sikkim, India; Department of Information Technology, North Eastern Hill University, Shillong, Meghalaya, India.
- Jugal K Kalita
  - Department of Computer Science, University of Colorado, Colorado Springs, USA.
|
49
|
Zhang L, Liu JY, Gu H, Du Y, Zuo JF, Zhang Z, Zhang M, Li P, Dunwell JM, Cao Y, Zhang Z, Zhang YM. Bradyrhizobium diazoefficiens USDA 110-Glycine max Interactome Provides Candidate Proteins Associated with Symbiosis. J Proteome Res 2018; 17:3061-3074. [PMID: 30091610 DOI: 10.1021/acs.jproteome.8b00209] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Although the legume-rhizobium symbiosis is an important biological process, there is limited knowledge about the protein interaction network between host and symbiont. Using interolog- and domain-based approaches, we constructed an interspecies protein interactome containing 5115 protein-protein interactions between 2291 Glycine max and 290 Bradyrhizobium diazoefficiens USDA 110 proteins. The interactome was further validated by expression pattern analysis in nodules, gene ontology term semantic similarity, co-expression analysis, and luciferase complementation image assay. In the G. max-B. diazoefficiens interactome, bacterial proteins are mainly ion channels and transporters of carbohydrates and cations, while G. max proteins are mainly involved in the processes of metabolism, signal transduction, and transport. We also identified the top 10 highly interacting proteins (hubs) for each species. Kyoto Encyclopedia of Genes and Genomes pathway analysis for each hub showed that a pair of 14-3-3 proteins (SGF14g and SGF14k) and 5 heat shock proteins in G. max are possibly involved in symbiosis, and 10 hubs in B. diazoefficiens may be important symbiotic effectors. Subnetwork analysis showed that 18 symbiosis-related soluble N-ethylmaleimide sensitive factor attachment protein receptor proteins may play roles in regulating bacterial ion channels, and SGF14g and SGF14k possibly regulate the rhizobium dicarboxylate transport protein DctA. The predicted interactome provides a valuable basis for understanding the molecular mechanism of nodulation in soybean.
Affiliation(s)
- Li Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - School of Public Health, Xinxiang Medical University, Xinxiang 453003, China
- Jin-Yang Liu
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Huan Gu
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Yanfang Du
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Jian-Fang Zuo
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Zhibin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Menglin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Pan Li
  - School of Public Health, Xinxiang Medical University, Xinxiang 453003, China
- Jim M Dunwell
  - School of Agriculture, Policy and Development, University of Reading, Reading RG6 6AR, United Kingdom
- Yangrong Cao
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Zuxin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yuan-Ming Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
|
50
|
Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. Current Protocols in Protein Science 2018; 93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/24/2022]
Abstract
Understanding protein-protein interactions (PPIs) in a cell is essential for learning protein functions, pathways, and mechanisms of disease. PPIs are also important targets for developing drugs. Experimental methods, both small-scale and large-scale, have identified PPIs in several model organisms. However, these results cover only part of the PPIs in those organisms; moreover, there are many organisms whose PPIs have not yet been investigated. To complement experimental methods, many computational methods have been developed that predict PPIs from various characteristics of proteins. Here we provide an overview of literature reports and classify computational PPI prediction methods by the protein features they consider, including protein sequence, genomes, protein structure, function, and PPI network topology, as well as methods that integrate multiple approaches. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ziyun Ding
  - Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Daisuke Kihara
  - Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA
  - Corresponding author: DK; Phone: 1-765-496-2284
|