1
Martins YC, Ziviani A, Cerqueira e Costa MDO, Cavalcanti MCR, Nicolás MF, de Vasconcelos ATR. PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets. Bioinformatics Advances 2023; 3:vbad067. [PMID: 37359724 PMCID: PMC10290227 DOI: 10.1093/bioadv/vbad067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 04/28/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
Summary: Semantic web standards have proven important over the last 20 years in promoting data formalization and interlinking between existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the widely used Gene Ontology, which contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein-protein interactions (PPIs), which have applications such as protein function inference. Current PPI databases have heterogeneous export methods that challenge their integration and analysis. Presently, several ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, efforts to establish guidelines for automatic semantic data integration and analysis for PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system.
Availability and implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.
Affiliation(s)
- Yasmmin Côrtes Martins
  - Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Artur Ziviani
  - Data Extreme Laboratory (DEXL), National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Marisa Fabiana Nicolás
  - Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil

2
Ibrahim AH, Karabulut OC, Karpuzcu BA, Türk E, Süzek BE. A correlation coefficient-based feature selection approach for virus-host protein-protein interaction prediction. PLoS One 2023; 18:e0285168. [PMID: 37130110 PMCID: PMC10153705 DOI: 10.1371/journal.pone.0285168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 04/17/2023] [Indexed: 05/03/2023]
Abstract
Prediction of virus-host protein-protein interactions (PPI) is a broad research area where various machine-learning-based classifiers are developed. Transforming biological data into machine-usable features is a preliminary step in constructing these virus-host PPI prediction tools. In this study, we have adopted a virus-host PPI dataset and a reduced amino acid alphabet to create tripeptide features and introduced a correlation coefficient-based feature selection. We applied feature selection across several correlation coefficient metrics and statistically tested their relevance in a structural context. We compared the performance of feature-selection models against that of the baseline virus-host PPI prediction models created using different classification algorithms without the feature selection. We also tested the performance of these baseline models against the previously available tools to ensure their predictive power is acceptable. Here, the Pearson coefficient provides the best performance with respect to the baseline model as measured by AUPR: a drop of 0.003 in AUPR while achieving a 73.3% reduction (from 686 to 183) in the number of tripeptide features for random forest. The results suggest our correlation coefficient-based feature selection approach, while decreasing the computation time and space complexity, has a limited impact on the prediction performance of virus-host PPI prediction tools.
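The feature pipeline summarized in this abstract can be illustrated with a minimal sketch. Several details are assumptions, not taken from the paper: the seven-group reduced alphabet below is a conjoint-triad-style grouping chosen for illustration (the abstract does not name the alphabet), features are scored by their Pearson correlation with the class label (the abstract does not say whether correlation is computed against labels or between features), and the function names and threshold parameter are hypothetical.

```python
from itertools import product
from math import sqrt

# Hypothetical seven-group reduced alphabet (conjoint-triad style).
# The abstract does not specify which grouping was used.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}

# All tripeptides over the reduced alphabet: 7**3 = 343 per protein.
TRIPEPTIDES = ["".join(t) for t in product("0123456", repeat=3)]


def tripeptide_features(seq):
    """Return normalized tripeptide frequencies over the reduced alphabet."""
    reduced = [AA_TO_GROUP[aa] for aa in seq if aa in AA_TO_GROUP]
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    for i in range(len(reduced) - 2):
        counts["".join(reduced[i:i + 3])] += 1
    total = max(len(reduced) - 2, 1)  # avoid division by zero on short input
    return [counts[t] / total for t in TRIPEPTIDES]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(X, y, threshold):
    """Keep column indices whose |r| with the label reaches the threshold.

    Assumption: selection is by correlation with the class label.
    """
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns)
            if abs(pearson(col, y)) >= threshold]
```

The reported reduction is consistent with the counts given: (686 - 183) / 686 is about 73.3%. Note that 686 equals 2 × 343, which would fit one 343-dimensional tripeptide vector per protein in a pair, although the abstract does not state this.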
Affiliation(s)
- Ahmed Hassan Ibrahim
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Onur Can Karabulut
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Georgetown University Medical Center, Biochemistry and Molecular & Cellular Biology, Washington DC, United States of America

3
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023]
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Junna Kawasaki
  - Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kento Kubo
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
- Koki Hokao
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Gentaro Yokoyama
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Akiko Ichinose
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Kanta Suga
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan

4
Karpuzcu BA, Türk E, Ibrahim AH, Karabulut OC, Süzek BE. Machine Learning Methods for Virus-Host Protein-Protein Interaction Prediction. Methods Mol Biol 2023; 2690:401-417. [PMID: 37450162 DOI: 10.1007/978-1-0716-3327-4_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
The attachment of a virion to its cellular receptor on the host organism, which occurs through virus-host protein-protein interactions (PPIs), is a decisive step for viral pathogenicity and infectivity. Therefore, a vast number of wet-lab experimental techniques are used to study virus-host PPIs. Given the great number and enormous variety of virus-host PPIs, as well as the cost and labor of laboratory work, computational approaches toward analyzing the available interaction data and predicting previously unidentified interactions have been on the rise. Among them, machine-learning-based models are receiving increasing attention, with a large body of resources and tools proposed recently. In this chapter, we first provide the methodology with major steps toward the development of a virus-host PPI prediction tool. Next, we discuss the challenges involved and evaluate several existing machine-learning-based virus-host PPI prediction tools. Finally, we describe our experience with several ensemble techniques as utilized on available prediction results retrieved from individual PPI prediction tools. Overall, based on our experience, we recognize there is still room for the development of new individual and/or ensemble virus-host PPI prediction tools that leverage existing tools.
Affiliation(s)
- Betül Asiye Karpuzcu
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Ahmad Hassan Ibrahim
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Onur Can Karabulut
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey

5
Neumann D, Roy S, Minhas FUAA, Ben-Hur A. On the choice of negative examples for prediction of host-pathogen protein interactions. Frontiers in Bioinformatics 2022; 2:1083292. [PMID: 36591335 PMCID: PMC9798088 DOI: 10.3389/fbinf.2022.1083292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 11/14/2022] [Indexed: 12/23/2022]
Abstract
As practitioners of machine learning in the area of bioinformatics we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally as important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of non-interacting proteins as negative examples. Since the interactomes of all species are only partially known, this leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance for false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.
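The "standard approach" mentioned in this abstract, drawing random host-pathogen pairs that are not known to interact, can be sketched as follows. This is an illustrative sketch only: the function name, arguments, and identifiers are hypothetical and not taken from the paper, and the exhaustive enumeration of candidate pairs is practical only for small protein sets.

```python
import random


def sample_random_negatives(hosts, pathogens, positives, n, seed=0):
    """Draw n host-pathogen pairs that are absent from the positive set.

    Pairs absent from `positives` are only *unlabeled*: some may be true
    interactions that have not been observed yet (false negatives).
    """
    known = set(positives)
    candidates = [(h, p) for h in hosts for p in pathogens
                  if (h, p) not in known]
    if n > len(candidates):
        raise ValueError("not enough unlabeled pairs to sample from")
    rng = random.Random(seed)  # fixed seed for reproducible datasets
    return rng.sample(candidates, n)
```

For example, with hosts `["H1", "H2"]`, pathogens `["P1", "P2"]`, and the single positive pair `("H1", "P1")`, the function can return at most three negatives. The alternative discussed in the abstract, filtering candidates by low sequence similarity to the positives, would add a further condition to the candidate list; the abstract argues that such filtering makes the prediction task artificially easy.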
Affiliation(s)
- Don Neumann
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States (corresponding author)
- Soumyadip Roy
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States (corresponding author)

6
Asim MN, Fazeel A, Ibrahim MA, Dengel A, Ahmed S. MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses. Front Med (Lausanne) 2022; 9:1025887. [PMID: 36465911 PMCID: PMC9709337 DOI: 10.3389/fmed.2022.1025887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/17/2022] [Indexed: 09/19/2023]
Abstract
Viral-host protein-protein interaction (VHPPI) prediction is essential to decoding molecular mechanisms of viral pathogens and host immunity processes that eventually help to control the propagation of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts; however, these predictors perform well only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) capable of more accurately predicting VHPPIs across multiple hosts and viruses. The proposed meta predictor makes use of two well-known encoding methods, Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order, which capture amino acid sequence order and distributional information to generate a numerical representation of complete viral-host raw protein sequences. A feature agglomeration method is utilized to transform the original feature space into a more informative feature space. Random forest (RF) and Extra tree (ET) classifiers are trained on the optimized feature space of the APAAC and QS order encoders separately and on the combination of both encodings. The predictions of both classifiers are then fed to a Support Vector Machine (SVM) classifier that makes the final predictions. The proposed meta predictor is evaluated over 7 different benchmark datasets, where it outperforms existing VHPPI predictors by an average margin of 3.07, 6.07, 2.95, and 2.85% in terms of accuracy, Matthews correlation coefficient, precision, and sensitivity, respectively. To facilitate the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.
Affiliation(s)
- Muhammad Nabeel Asim
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Ahtisham Fazeel
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany

7
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022]
Abstract
Several new viral infections have emerged in the human population and are establishing themselves as global pandemics. With advancements in translational research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), still prevail due to a lack of specific therapeutics, while new pathogenic viral strains or variants are emerging because of high genetic recombination or cross-species transmission. Consequently, to combat emerging viral infections, bioinformatics-based strategies have been developed for viral characterization and for developing new effective therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including bioinformatics methods for the identification and management of emerging or evolved viral strains, genome analysis concerning pathogenicity and epidemiological analysis, computational methods for designing viral therapeutics, and consolidated information in the form of databases on the known pathogenic viruses. This review of generally applicable viral informatics approaches aims to provide an overview of available resources capable of carrying out the desired task and may be utilized to develop additional strategies to improve the quality of translational viral informatics research.
Affiliation(s)
- Sanjay Kumar
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Geethu S Kumar
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Petr Malý
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Shiv Bharadwaj
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Pradeep Sharma
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Vivek Dhar Dwivedi
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
  - Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden

8
Hussain A, Asif N, Pirzada AR, Noureen A, Shaukat J, Burhan A, Zaynab M, Ali E, Imran K, Ameen A, Mahmood MA, Nazar A, Mukhtar MS. Genome wide study of cysteine rich receptor like proteins in Gossypium sp. Sci Rep 2022; 12:4885. [PMID: 35318409 PMCID: PMC8941122 DOI: 10.1038/s41598-022-08943-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2021] [Accepted: 03/11/2022] [Indexed: 02/08/2023]
Abstract
Cysteine-rich receptor-like kinases (CRKs), a transmembrane subfamily of receptor-like kinases, play crucial roles in plant adaptation. Cotton is the major source of fiber for the textile industry, but environmental stresses limit its growth and production. Here, we have performed a deep computational analysis of CRKs in five Gossypium species, including G. arboreum (60 genes), G. raimondii (74 genes), G. herbaceum (65 genes), G. hirsutum (118 genes), and G. barbadense (120 genes). All identified CRKs were classified into 11 major classes and 43 subclasses, with the finding of several novel CRK-associated domains including ALMT, FUSC_2, Cript, FYVE, and Pkinase. Of these, DUF26_DUF26_Pkinase_Tyr was common and had elevated expression under different biotic and abiotic stresses. Moreover, a comparison of 35 land plants identified several new CRK domain architectures. Likewise, several SNPs and InDels were observed in CLCuD-resistant G. hirsutum. The miRNA target site prediction and expression profiling in different tissues predicted miR172 as a major CRK-regulating miRNA. The expression profiling of CRKs identified multiple clusters with co-expression under certain stress conditions. The expression analysis under CLCuD highlighted the role of GhCRK057, GhCRK059, GhCRK058, and GhCRK081 in the resistant accession. Overall, these results provide primary data for future functional analysis as well as a reference study for other agronomically important crops.
Affiliation(s)
- Athar Hussain
  - Genomics Lab, School of Food and Agricultural Sciences (SFAS), University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Naila Asif
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Abdul Rafay Pirzada
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Azka Noureen
  - National Institute for Biotechnology and Genetic Engineering (NIBGE), College of Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, 38000, Pakistan
  - PMAS-Arid Agriculture University Rawalpindi, Rawalpindi, 46300, Pakistan
- Javeria Shaukat
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Akif Burhan
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Madiha Zaynab
  - Shenzhen Key Laboratory of Marine Bioresource & Eco-Environmental Sciences, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, 51807, China
- Ejaz Ali
  - Center of Excellence in Molecular Biology, University of Punjab, Lahore, 54000, Pakistan
- Koukab Imran
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Ayesha Ameen
  - Office of Research Innovation and Commercialization, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- Muhammad Arslan Mahmood
  - National Institute for Biotechnology and Genetic Engineering (NIBGE), College of Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, 38000, Pakistan
- Aquib Nazar
  - Department of Life Sciences, School of Science, University of Management and Technology (UMT), Lahore, 54000, Pakistan
- M Shahid Mukhtar
  - Department of Biology, the University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL, 35294, USA

9
Dong TN, Brogden G, Gerold G, Khosla M. A multitask transfer learning framework for the prediction of virus-human protein-protein interactions. BMC Bioinformatics 2021; 22:572. [PMID: 34837942 PMCID: PMC8626732 DOI: 10.1186/s12859-021-04484-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/08/2021] [Accepted: 11/15/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Viral infections are causing significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in the prevention and treatment of virus-related diseases. However, the task of predicting protein-protein interactions between a new virus and human cells is extremely challenging due to scarce data on virus-human interactions and the fast mutation rates of most viruses.
RESULTS: We developed a multitask transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome to counter the problem of small training datasets. Instead of using hand-crafted protein features, we utilize statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. We also employ an additional objective that aims to maximize the probability of observing human protein-protein interactions. This additional task objective acts as a regularizer and also allows us to incorporate domain knowledge to inform the virus-human protein-protein interaction prediction model.
CONCLUSIONS: Our approach achieved competitive results on 13 benchmark datasets and in the case study for the SARS-CoV-2 virus receptor. Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein-protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer.
Affiliation(s)
- Thi Ngan Dong
  - L3S Research Center, Leibniz University Hannover, Hannover, Germany
- Graham Brogden
  - Institute for Biochemistry, University of Veterinary Medicine, Hannover, Germany
  - Institute of Experimental Virology, TWINCORE, Center for Experimental and Clinical Infection Research Hannover, Hannover, Germany
- Gisa Gerold
  - Institute for Biochemistry, University of Veterinary Medicine, Hannover, Germany
  - Institute of Experimental Virology, TWINCORE, Center for Experimental and Clinical Infection Research Hannover, Hannover, Germany
  - Department of Clinical Microbiology, Umeå University, Umeå, Sweden
  - Wallenberg Centre for Molecular Medicine (WCMM), Umeå University, Umeå, Sweden
- Megha Khosla
  - L3S Research Center, Leibniz University Hannover, Hannover, Germany

10
Loaiza CD, Kaundal R. PredHPI: an integrated web server platform for the detection and visualization of host-pathogen interactions using sequence-based methods. Bioinformatics 2021; 37:622-624. [PMID: 33027504 DOI: 10.1093/bioinformatics/btaa862] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/29/2019] [Revised: 07/21/2020] [Accepted: 09/22/2020] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Understanding the mechanisms underlying infectious diseases is fundamental to developing prevention strategies. Host-pathogen interactions (HPIs) are actively studied worldwide to find potential genomic targets for the development of novel drugs, vaccines and other therapeutics. Determining which proteins are involved in the interaction system behind an infectious process is the first step in developing an efficient disease control strategy. Very few computational methods have been implemented as web services to infer novel HPIs, and there is no single framework that combines several of those approaches to produce and visualize a comprehensive analysis of HPIs.
RESULTS: Here, we introduce PredHPI, a powerful framework that integrates both the detection and visualization of interaction networks in a single web service, facilitating the understanding of model and non-model host-pathogen systems to aid biologists in building hypotheses and designing appropriate experiments. PredHPI is built on high-performance computing resources on the backend, capable of handling proteome-scale sequence data from both the host and the pathogen. Data are displayed in an information-rich and interactive visualization, which can be further customized with user-defined layouts. We believe PredHPI will serve as an invaluable resource to diverse experimental biologists and will help advance research into the understanding of complex infectious diseases.
AVAILABILITY AND IMPLEMENTATION: The PredHPI tool is freely available at http://bioinfo.usu.edu/PredHPI/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cristian D Loaiza
  - Bioinformatics Lab, Department of Plants, Soils and Climate, Logan, UT 84322, USA
  - Bioinformatics Facility, Center for Integrated BioSystems (CIB), Logan, UT 84322, USA
- Rakesh Kaundal
  - Bioinformatics Lab, Department of Plants, Soils and Climate, Logan, UT 84322, USA
  - Bioinformatics Facility, Center for Integrated BioSystems (CIB), Logan, UT 84322, USA
  - Department of Computer Science, Utah State University, Logan, UT 84322, USA

11
Tian H, Jiang X, Tao P. PASSer: Prediction of Allosteric Sites Server. Machine Learning: Science and Technology 2021; 2. [PMID: 34396127 DOI: 10.1088/2632-2153/abe6d6] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 11/12/2022]
Abstract
Allostery is considered important in regulating protein activity. Drug development depends on the understanding of allosteric mechanisms, especially the identification of allosteric sites, which is a prerequisite in drug discovery and design. Many computational methods have been developed for allosteric site prediction using pocket features and protein dynamics. Here, we present an ensemble learning method, consisting of eXtreme gradient boosting (XGBoost) and a graph convolutional neural network (GCNN), to predict allosteric sites. Our model can learn physical properties and topology without any prior information, and shows good performance under multiple indicators. Prediction results showed that 84.9% of allosteric pockets in the test set appeared in the top 3 positions. The PASSer: Protein Allosteric Sites Server (https://passer.smu.edu), along with a command line interface (CLI, https://github.com/smutaogroup/passerCLI), provides insights for further analysis in drug discovery.
Affiliation(s)
- Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, United States of America
- Xi Jiang
- Department of Statistical Science, Southern Methodist University, Dallas, Texas, United States of America
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, United States of America
12
|
Sudhakar P, Machiels K, Verstockt B, Korcsmaros T, Vermeire S. Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions. Front Microbiol 2021; 12:618856. [PMID: 34046017 PMCID: PMC8148342 DOI: 10.3389/fmicb.2021.618856] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/18/2020] [Accepted: 03/19/2021] [Indexed: 12/11/2022]
Abstract
The microbiome, by virtue of its interactions with the host, is implicated in various host functions including its influence on nutrition and homeostasis. Many chronic diseases such as diabetes, cancer and inflammatory bowel diseases are characterized by a disruption of microbial communities in at least one biological niche/organ system. Various molecular mechanisms between microbial and host components such as proteins, RNAs and metabolites have recently been identified, thus filling many gaps in our understanding of how the microbiome modulates host processes. Concurrently, high-throughput technologies have enabled the profiling of heterogeneous datasets capturing community-level changes in the microbiome as well as the host responses. However, due to limitations in parallel sampling and analytical procedures, big gaps still exist in terms of how the microbiome mechanistically influences host functions at a system and community level. In the past decade, computational biology and machine learning methodologies have been developed with the aim of filling the existing gaps. Due to the agnostic nature of the tools, they have been applied in diverse disease contexts to analyze and infer the interactions between the microbiome and host molecular components. Some of these approaches allow the identification and analysis of affected downstream host processes. Most of the tools statistically or mechanistically integrate different types of -omic and meta-omic datasets followed by functional/biological interpretation. In this review, we provide an overview of the landscape of computational approaches for investigating mechanistic interactions between individual microbes/microbiome and the host and the opportunities for basic and clinical research. These could include but are not limited to the development of activity- and mechanism-based biomarkers, uncovering mechanisms for therapeutic interventions and generating integrated signatures to stratify patients.
Affiliation(s)
- Padhmanand Sudhakar
- Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Earlham Institute, Norwich, United Kingdom
- Quadram Institute Bioscience, Norwich, United Kingdom
- Kathleen Machiels
- Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Bram Verstockt
- Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
- Tamas Korcsmaros
- Earlham Institute, Norwich, United Kingdom
- Quadram Institute Bioscience, Norwich, United Kingdom
- Séverine Vermeire
- Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
13
|
Karabulut OC, Karpuzcu BA, Türk E, Ibrahim AH, Süzek BE. ML-AdVInfect: A Machine-Learning Based Adenoviral Infection Predictor. Front Mol Biosci 2021; 8:647424. [PMID: 34026828 PMCID: PMC8139618 DOI: 10.3389/fmolb.2021.647424] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/29/2020] [Accepted: 04/22/2021] [Indexed: 01/08/2023]
Abstract
Adenoviruses (AdVs) constitute a diverse family with many pathogenic types that infect a broad range of hosts. Understanding the pathogenesis of adenoviral infections is not only clinically relevant but also important to elucidate the potential use of AdVs as vectors in therapeutic applications. For an adenoviral infection to occur, attachment of the viral ligand to a cellular receptor on the host organism is a prerequisite and, in this sense, it is a criterion to decide whether an adenoviral infection can potentially happen. The interaction between any virus and its corresponding host organism is a specific kind of protein-protein interaction (PPI), and several experimental techniques, including high-throughput methods, are being used in exploring such interactions. As a result, there has been accumulating data on virus-host interactions, including a significant portion reported in publicly available bioinformatics resources. There is not, however, a computational model to integrate and interpret the existing data to draw out concise decisions, such as whether an infection happens or not. In this study, accepting the cellular entry of AdV as a decisive parameter for infectivity, we have developed a machine learning, more precisely support vector machine (SVM), based methodology to predict whether adenoviral infection can take place in a given host. For this purpose, we used the sequence data of the known receptors of AdVs, we identified sets of adenoviral ligands and their respective host species, and eventually, we constructed a comprehensive adenovirus–host interaction dataset. Then, we carried out interaction predictions through publicly available virus-host PPI tools and constructed an AdV infection predictor model using SVM with RBF kernel, with overall sensitivity, specificity, and AUC of 0.88 ± 0.011, 0.83 ± 0.064, and 0.86 ± 0.030, respectively.
ML-AdVInfect is the first of its kind as an effective predictor to screen the infection capacity along with anticipating any cross-species shifts. We anticipate that the approach that led to ML-AdVInfect can be adapted to make predictions for other viral infections.
Affiliation(s)
- Onur Can Karabulut
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Ahmad Hassan Ibrahim
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey; Georgetown University Medical Center, Biochemistry and Molecular and Cellular Biology, Washington, DC, United States
14
|
Ke Y, Rao J, Zhao H, Lu Y, Xiao N, Yang Y. Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting. Bioinformatics 2021; 36:4576-4582. [PMID: 32467966 DOI: 10.1093/bioinformatics/btaa534] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/18/2019] [Revised: 05/01/2020] [Accepted: 05/23/2020] [Indexed: 12/20/2022]
Abstract
MOTIVATION RNA secondary structure plays a vital role in fundamental cellular processes, and identification of RNA secondary structure is a key step to understand RNA functions. Recently, a few experimental methods were developed to profile genome-wide RNA secondary structure, i.e. the pairing probability of each nucleotide, through high-throughput sequencing techniques. However, these high-throughput methods have low precision and cannot cover all nucleotides due to limited sequencing coverage. RESULTS Here, we have developed a new method for the prediction of genome-wide RNA secondary structure profile from RNA sequence based on the extreme gradient boosting technique. The method achieves predictions with areas under the receiver operating characteristic curve (AUC) >0.9 on three different datasets, and an AUC of 0.888 in another independent test on the recently released Zika virus data. These AUCs are consistently >5% greater than those of the CROSS method recently developed based on a shallow neural network. Further analysis on the 1000 Genomes Project data showed that our predicted unpaired probabilities are highly correlated (>0.8) with the minor allele frequencies at synonymous, non-synonymous mutations, and mutations in untranslated regions, which were higher than those generated by RNAplfold. Moreover, the prediction over all human mRNA indicated a consistent result with the previous observation that there is a periodic distribution of unpaired probability on codons. The accurate predictions by our method indicate that such a model trained on genome-wide experimental data might be an alternative to analytical methods. AVAILABILITY AND IMPLEMENTATION The GRASP is available for academic use at https://github.com/sysu-yanglab/GRASP. SUPPLEMENTARY INFORMATION Supplementary data are available online.
Affiliation(s)
- Yaobin Ke
- School of Data and Computer Science, Guangzhou 510000, China
- Jiahua Rao
- School of Data and Computer Science, Guangzhou 510000, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Guangzhou 510000, China
- Yutong Lu
- School of Data and Computer Science, Guangzhou 510000, China
- Nong Xiao
- School of Data and Computer Science, Guangzhou 510000, China
- Yuedong Yang
- School of Data and Computer Science, Guangzhou 510000, China; Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of Ministry of Education, Guangzhou 510000, China
15
|
Dey L, Chakraborty S, Mukhopadhyay A. Machine learning techniques for sequence-based prediction of viral-host interactions between SARS-CoV-2 and human proteins. Biomed J 2020; 43:438-450. [PMID: 33036956 PMCID: PMC7470713 DOI: 10.1016/j.bj.2020.08.003] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 05/13/2020] [Revised: 07/22/2020] [Accepted: 08/05/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. Over 15 million people have already been affected worldwide by COVID-19, resulting in more than 0.6 million deaths. Protein-protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 virus infection in the human body. A recent study reported some SARS-CoV-2 proteins that interact with several human proteins, while many potential interactions remain to be identified. METHOD In this article, various machine learning models are built to predict the PPIs between the virus and human proteins that are further validated using biological experiments. The classification models are prepared based on different sequence-based features of human proteins like amino acid composition, pseudo amino acid composition, and conjoint triad. RESULT We have built an ensemble voting classifier using SVMRadial, SVMPolynomial, and Random Forest techniques that gives greater accuracy, precision, specificity, recall, and F1 score compared to all other models used in the work. A total of 1326 potential human target proteins of SARS-CoV-2 have been predicted by the proposed ensemble model and validated using gene ontology and KEGG pathway enrichment analysis. Several repurposable drugs targeting the predicted interactions are also reported. CONCLUSION This study may encourage the identification of potential targets for more effective anti-COVID drug discovery.
Affiliation(s)
- Lopamudra Dey
- Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India; Department of Information Technology, Techno Main, Saltlake, Kolkata, India; Department of Computer Science & Engineering, University of Kalyani, Kalyani, India
- Sanjay Chakraborty
- Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India; Department of Information Technology, Techno Main, Saltlake, Kolkata, India; Department of Computer Science & Engineering, University of Kalyani, Kalyani, India
- Anirban Mukhopadhyay
- Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India; Department of Information Technology, Techno Main, Saltlake, Kolkata, India; Department of Computer Science & Engineering, University of Kalyani, Kalyani, India
16
|
Khorsand B, Savadi A, Naghibzadeh M. SARS-CoV-2-human protein-protein interaction network. INFORMATICS IN MEDICINE UNLOCKED 2020; 20:100413. [PMID: 32838020 PMCID: PMC7425553 DOI: 10.1016/j.imu.2020.100413] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 06/20/2020] [Revised: 07/11/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the novel coronavirus which caused the coronavirus disease 2019 pandemic, infected more than 12 million people and resulted in over 560,000 deaths in 213 countries around the world. Having no symptoms in the first week of infection increases the rate of spreading the virus. The increasing number of infected individuals and the high mortality necessitate the immediate development of proper diagnostic methods and effective treatments. SARS-CoV-2, similar to other viruses, needs to interact with the host proteins to reach the host cells and replicate its genome. Consequently, virus-host protein-protein interaction (PPI) identification could be useful in predicting the behavior of the virus and the design of antiviral drugs. Identification of virus-host PPIs using experimental approaches is very time-consuming and expensive. Computational approaches could be acceptable alternatives for many preliminary investigations. In this study, we developed a new method to predict SARS-CoV-2-human PPIs. Our model is a three-layer network in which the first layer contains the most similar Alphainfluenzavirus proteins to SARS-CoV-2 proteins. The second layer contains protein-protein interactions between Alphainfluenzavirus proteins and human proteins. The last layer reveals protein-protein interactions between SARS-CoV-2 proteins and human proteins by using the clustering coefficient network property on the first two layers. To further analyze the results of our prediction network, we investigated human proteins targeted by SARS-CoV-2 proteins and reported the most central human proteins in the human PPI network. Moreover, differentially expressed genes from previous studies were investigated, and PPIs of the SARS-CoV-2-human network whose human proteins were related to upregulated genes were reported.
Affiliation(s)
- Babak Khorsand
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Abdorreza Savadi
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
17
|
Mendik P, Dobronyi L, Hári F, Kerepesi C, Maia-Moço L, Buszlai D, Csermely P, Veres DV. Translocatome: a novel resource for the analysis of protein translocation between cellular organelles. Nucleic Acids Res 2020; 47:D495-D505. [PMID: 30380112 PMCID: PMC6324082 DOI: 10.1093/nar/gky1044] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/15/2018] [Accepted: 10/25/2018] [Indexed: 01/02/2023]
Abstract
Here we present Translocatome, the first dedicated database of human translocating proteins (URL: http://translocatome.linkgroup.hu). The core of the Translocatome database is the manually curated data set of 213 human translocating proteins listing the source of their experimental validation, several details of their translocation mechanism, their local compartmentalized interactome, as well as their involvement in signalling pathways and disease development. In addition, using the well-established and widely used gradient boosting machine learning tool, XGBoost, Translocatome provides translocation probability values for 13 066 human proteins, identifying 1133 and 3268 high- and low-confidence translocating proteins, respectively. The database has user-friendly search options with a UniProt autocomplete quick search and advanced search for proteins filtered by their localization, UniProt identifiers, translocation likelihood or data complexity. Download options for search results and for the manually curated and predicted translocating protein sets are available on its website. The update of the database is helped by its manual curation framework and connection to the previously published ComPPI compartmentalized protein–protein interaction database (http://comppi.linkgroup.hu). As shown by the application examples of merlin (NF2) and tumor protein 63 (TP63), Translocatome allows a better comprehension of protein translocation as a systems biology phenomenon and can be used as a discovery tool in the protein translocation field.
Affiliation(s)
- Péter Mendik
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Levente Dobronyi
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Ferenc Hári
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Csaba Kerepesi
- Institute for Computer Science and Control (MTA SZTAKI), Hungarian Academy of Sciences, Budapest, Hungary; Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary
- Leonardo Maia-Moço
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary; Cancer Biology and Epigenetics Group, Research Center of Portuguese Oncology Institute of Porto, Portugal
- Donát Buszlai
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Daniel V Veres
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary; Turbine Ltd., Budapest, Hungary
18
|
Guven-Maiorov E, Hakouz A, Valjevac S, Keskin O, Tsai CJ, Gursoy A, Nussinov R. HMI-PRED: A Web Server for Structural Prediction of Host-Microbe Interactions Based on Interface Mimicry. J Mol Biol 2020; 432:3395-3403. [PMID: 32061934 PMCID: PMC7261632 DOI: 10.1016/j.jmb.2020.01.025] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Revised: 11/28/2019] [Accepted: 01/14/2020] [Indexed: 02/07/2023]
Abstract
Microbes, both commensals and pathogens, control numerous functions in host cells. They can alter host signaling and modulate immune surveillance by interacting with the host proteins. To shed light on the contribution of microbes to health and disease, it is vital to discern how microbial proteins rewire host signaling and through which host proteins they do this. Host-Microbe Interaction PREDictor (HMI-PRED) is a user-friendly web server for structural prediction of protein-protein interactions (PPIs) between the host and a microbial species, including bacteria, viruses, fungi, and protozoa. HMI-PRED relies on "interface mimicry" through which the microbial proteins hijack host binding surfaces. Given the structure of a microbial protein of interest, HMI-PRED will return structural models of potential host-microbe interaction (HMI) complexes, the list of host endogenous and exogenous PPIs that can be disrupted, and tissue expression of the microbe-targeted host proteins. The server also allows users to upload homology models of microbial proteins. Broadly, it aims at large-scale, efficient identification of HMIs. The prediction results are stored in a repository for community access. HMI-PRED is free and available at https://interactome.ku.edu.tr/hmi.
Affiliation(s)
- Emine Guven-Maiorov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
- Asma Hakouz
- Department of Computer Engineering, Koc University, Istanbul, 34450, Turkey.
- Sukejna Valjevac
- Department of Computer Engineering, Koc University, Istanbul, 34450, Turkey.
- Ozlem Keskin
- Department of Chemical and Biological Engineering, Koc University, Istanbul, 34450, Turkey.
- Chung-Jung Tsai
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
- Attila Gursoy
- Department of Computer Engineering, Koc University, Istanbul, 34450, Turkey.
- Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA; Sackler Inst. of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, 69978, Israel.
19
|
Evolution and diversity of the EMA families of the divergent equid parasites, Theileria equi and T. haneyi. INFECTION GENETICS AND EVOLUTION 2019; 68:153-160. [DOI: 10.1016/j.meegid.2018.12.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/12/2018] [Revised: 12/04/2018] [Accepted: 12/17/2018] [Indexed: 11/30/2022]
|