1
Le TD, Nguyen PD, Korkin D, Thieu T. PHILM2Web: A high-throughput database of macromolecular host–pathogen interactions on the Web. Database (Oxford) 2022; 2022:6625823. [PMID: 35776535 PMCID: PMC9248916 DOI: 10.1093/database/baac042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Revised: 04/27/2022] [Accepted: 05/31/2022] [Indexed: 12/02/2022]
Abstract
During infection, the pathogen’s entry into the host organism, breaching the host immune defense, spread and multiplication are frequently mediated by multiple interactions between the host and pathogen proteins. Systematic study of host–pathogen interactions (HPIs) is a challenging task for both experimental and computational approaches and is critically dependent on the previously obtained knowledge about these interactions found in the biomedical literature. While several HPI databases exist that manually filter HPI protein–protein interactions from the generic databases and curated experimental interactomic studies, no comprehensive database on HPIs obtained from the biomedical literature is currently available. Here, we introduce a high-throughput literature-mining platform for extracting HPI data that includes the most comprehensive collection to date of HPIs obtained from PubMed abstracts. Our HPI data portal, PHILM2Web (Pathogen–Host Interactions by Literature Mining on the Web), integrates an automatically generated database of interactions extracted by PHILM, our high-precision HPI literature-mining algorithm. Currently, the database contains 23 581 generic HPIs between 157 host and 403 pathogen organisms from 11 609 abstracts. The interactions were obtained from processing 608 972 PubMed abstracts, each containing mentions of at least one host and one pathogen organism. In response to the coronavirus disease 2019 (COVID-19) pandemic, we also utilized PHILM to process 25 796 PubMed abstracts obtained by the same query as the COVID-19 Open Research Dataset. This COVID-19 processing batch resulted in 257 HPIs between 19 host and 31 pathogen organisms from 167 abstracts. The entire HPI dataset is accessible via a searchable PHILM2Web interface; scientists can also download the entire database in bulk for offline processing. Database URL: http://philm2web.live
Affiliation(s)
- Tuan-Dung Le
  - Department of Computer Science, Oklahoma State University, Stillwater, OK, USA
- Phuong D Nguyen
  - Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA
- Dmitry Korkin
  - Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Thanh Thieu
  - Machine Learning Department, Moffitt Cancer Center and Research Institute, Tampa, FL, USA
2
Stephens PR, Gottdenker N, Schatz AM, Schmidt JP, Drake JM. Characteristics of the 100 largest modern zoonotic disease outbreaks. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200535. [PMID: 34538141 PMCID: PMC8450623 DOI: 10.1098/rstb.2020.0535] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Accepted: 08/14/2021] [Indexed: 12/19/2022] Open
Abstract
Zoonotic disease outbreaks are an important threat to human health and numerous drivers have been recognized as contributing to their increasing frequency. Identifying and quantifying relationships between drivers of zoonotic disease outbreaks and outbreak severity is critical to developing targeted zoonotic disease surveillance and outbreak prevention strategies. However, quantitative studies of outbreak drivers on a global scale are lacking. Attributes of countries such as press freedom, surveillance capabilities and latitude also bias global outbreak data. To illustrate these issues, we review the characteristics of the 100 largest outbreaks in a global dataset (n = 4463 bacterial and viral zoonotic outbreaks), and compare them with 200 randomly chosen background controls. Large outbreaks tended to have more drivers than background outbreaks and were related to large-scale environmental and demographic factors such as changes in vector abundance, human population density, unusual weather conditions and water contamination. Pathogens of large outbreaks were more likely to be viral and vector-borne than background outbreaks. Overall, our case study shows that the characteristics of large zoonotic outbreaks with thousands to millions of cases differ consistently from those of more typical outbreaks. We also discuss the limitations of our work, hoping to pave the way for more comprehensive future studies. This article is part of the theme issue 'Infectious disease macroecology: parasite diversity and dynamics across the globe'.
Affiliation(s)
- Patrick R. Stephens
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, 30602 GA, USA
- N. Gottdenker
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, 30602 GA, USA
  - Department of Pathology, College of Veterinary Medicine, University of Georgia, Athens, 30602 GA, USA
- A. M. Schatz
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, 30602 GA, USA
- J. P. Schmidt
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, 30602 GA, USA
- John M. Drake
  - Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia, Athens, 30602 GA, USA
3
Doherty JF, Chai X, Cope LE, de Angeli Dutra D, Milotic M, Ni S, Park E, Filion A. The rise of big data in disease ecology. Trends Parasitol 2021; 37:1034-1037. [PMID: 34602364 DOI: 10.1016/j.pt.2021.09.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/06/2021] [Revised: 09/03/2021] [Accepted: 09/08/2021] [Indexed: 10/20/2022]
Abstract
Big data have become readily available to explore patterns in large-scale disease ecology. However, the rate at which these public databases are exploited remains unknown. We highlight trends in big data usage in disease ecology during the past decade and encourage researchers to integrate big data into their study framework.
Affiliation(s)
- Xuhong Chai
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Laurie E Cope
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Marin Milotic
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Steven Ni
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Eunji Park
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Antoine Filion
  - Department of Zoology, University of Otago, Dunedin, New Zealand
4
Sudhakar P, Machiels K, Verstockt B, Korcsmaros T, Vermeire S. Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions. Front Microbiol 2021; 12:618856. [PMID: 34046017 PMCID: PMC8148342 DOI: 10.3389/fmicb.2021.618856] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/18/2020] [Accepted: 03/19/2021] [Indexed: 12/11/2022] Open
Abstract
The microbiome, by virtue of its interactions with the host, is implicated in various host functions including its influence on nutrition and homeostasis. Many chronic diseases such as diabetes, cancer, inflammatory bowel diseases are characterized by a disruption of microbial communities in at least one biological niche/organ system. Various molecular mechanisms between microbial and host components such as proteins, RNAs, metabolites have recently been identified, thus filling many gaps in our understanding of how the microbiome modulates host processes. Concurrently, high-throughput technologies have enabled the profiling of heterogeneous datasets capturing community level changes in the microbiome as well as the host responses. However, due to limitations in parallel sampling and analytical procedures, big gaps still exist in terms of how the microbiome mechanistically influences host functions at a system and community level. In the past decade, computational biology and machine learning methodologies have been developed with the aim of filling the existing gaps. Due to the agnostic nature of the tools, they have been applied in diverse disease contexts to analyze and infer the interactions between the microbiome and host molecular components. Some of these approaches allow the identification and analysis of affected downstream host processes. Most of the tools statistically or mechanistically integrate different types of -omic and meta -omic datasets followed by functional/biological interpretation. In this review, we provide an overview of the landscape of computational approaches for investigating mechanistic interactions between individual microbes/microbiome and the host and the opportunities for basic and clinical research. These could include but are not limited to the development of activity- and mechanism-based biomarkers, uncovering mechanisms for therapeutic interventions and generating integrated signatures to stratify patients.
Affiliation(s)
- Padhmanand Sudhakar
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Kathleen Machiels
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Bram Verstockt
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
- Tamas Korcsmaros
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Séverine Vermeire
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
5
Agany DD, Pietri JE, Gnimpieba EZ. Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J 2020; 18:1704-1721. [PMID: 32670510 PMCID: PMC7340972 DOI: 10.1016/j.csbj.2020.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022] Open
Abstract
Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.
Affiliation(s)
- Diing D.M. Agany
  - University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
  - 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
- Jose E. Pietri
  - University of South Dakota, Sanford School of Medicine, Division of Basic Biomedical Sciences, Vermillion, SD, United States
- Etienne Z. Gnimpieba
  - University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
  - 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
6
Kafkas Ş, Hoehndorf R. Ontology based mining of pathogen-disease associations from literature. J Biomed Semantics 2019; 10:15. [PMID: 31533864 PMCID: PMC6751637 DOI: 10.1186/s13326-019-0208-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/03/2018] [Accepted: 09/02/2019] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Infectious diseases claim millions of lives each year, especially in developing countries. Accurate and rapid identification of causative pathogens plays a key role in the success of treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations are available from the literature in unstructured form, and automated methods are needed to extract the data. RESULTS: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3420 pathogen-disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. CONCLUSIONS: To the best of our knowledge, we present the first study focusing on extracting pathogen-disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/.
Affiliation(s)
- Şenay Kafkas
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Saudi Arabia
7
Badal VD, Kundrotas PJ, Vakser IA. Natural language processing in text mining for structural modeling of protein complexes. BMC Bioinformatics 2018; 19:84. [PMID: 29506465 PMCID: PMC5838950 DOI: 10.1186/s12859-018-2079-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/24/2017] [Accepted: 02/20/2018] [Indexed: 12/04/2022] Open
Abstract
Background Structural modeling of protein-protein interactions produces a large number of putative configurations of the protein complexes. Identification of the near-native models among them is a serious challenge. Publicly available results of biomedical research may provide constraints on the binding mode, which can be essential for the docking. Our text-mining (TM) tool, which extracts binding site residues from the PubMed abstracts, was successfully applied to protein docking (Badal et al., PLoS Comput Biol, 2015; 11: e1004630). Still, many extracted residues were not relevant to the docking. Results We present an extension of the TM tool, which utilizes natural language processing (NLP) for analyzing the context of the residue occurrence. The procedure was tested using generic and specialized dictionaries. The results showed that the keyword dictionaries designed for identification of protein interactions are not adequate for the TM prediction of the binding mode. However, our dictionary designed to distinguish keywords relevant to the protein binding sites led to considerable improvement in the TM performance. We investigated the utility of several methods of context analysis, based on dissection of the sentence parse trees. The machine learning-based NLP filtered the pool of the mined residues significantly more efficiently than the rule-based NLP. Constraints generated by NLP were tested in docking of unbound proteins from the DOCKGROUND X-ray benchmark set 4. The output of the global low-resolution docking scan was post-processed, separately, by constraints from the basic TM, constraints re-ranked by NLP, and the reference constraints. The quality of a match was assessed by the interface root-mean-square deviation. The results showed significant improvement of the docking output when using the constraints generated by the advanced TM with NLP. 
Conclusions The basic TM procedure for extracting protein-protein binding site residues from the PubMed abstracts was significantly advanced by the deep parsing (NLP techniques for contextual analysis) in purging of the initial pool of the extracted residues. Benchmarking showed a substantial increase of the docking success rate based on the constraints generated by the advanced TM with NLP.
Affiliation(s)
- Varsha D Badal
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
- Petras J Kundrotas
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
- Ilya A Vakser
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
8
Vyas R, Bapat S, Goel P, Karthikeyan M, Tambe SS, Kulkarni BD. Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:27-37. [PMID: 28113781 DOI: 10.1109/tcbb.2016.2621042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a vital role in the biological processes involved in the cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous efforts and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of one hundred and thirty five PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893, low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied on a dataset of diabetes complexes where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) a predictor of the binding energy of cancer related PPI complexes, and (b) a classifier for discriminating PPI complexes related to cancer from those of other diseases.
9
Chiang AWT, Wu WYL, Wang T, Hwang MJ. Identification of Entry Factors Involved in Hepatitis C Virus Infection Based on Host-Mimicking Short Linear Motifs. PLoS Comput Biol 2017; 13:e1005368. [PMID: 28129350 PMCID: PMC5302801 DOI: 10.1371/journal.pcbi.1005368] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/08/2016] [Revised: 02/10/2017] [Accepted: 01/17/2017] [Indexed: 12/15/2022] Open
Abstract
Host factors that facilitate viral entry into cells can, in principle, be identified from a virus-host protein interaction network, but for most viruses information for such a network is limited. To help fill this void, we developed a bioinformatics approach and applied it to hepatitis C virus (HCV) infection, which is a current concern for global health. Using this approach, we identified short linear sequence motifs, conserved in the envelope proteins of HCV (E1/E2), that potentially can bind human proteins present on the surface of hepatocytes so as to construct an HCV (envelope)-host protein interaction network. Gene Ontology functional and KEGG pathway analyses showed that the identified host proteins are enriched in cell entry and carcinogenesis functionalities. The validity of our results is supported by much published experimental data. Our general approach should be useful when developing antiviral agents, particularly those that target virus-host interactions. Viruses recruit host proteins, called entry factors, to help gain entry to host cells. Identification of entry factors can provide targets for developing antiviral drugs. By exploring the concept that short linear peptide motifs involved in human protein-protein interactions may be mimicked by viruses to hijack certain host cellular processes and thereby assist viral infection/survival, we developed a bioinformatics strategy to computationally identify entry factors of hepatitis C virus (HCV) infection, which is a worldwide health problem. Analysis of cellular functions and biochemical pathways indicated that the human proteins we identified usually play a role in cell entry and/or carcinogenesis, and results of the analysis are generally supported by experimental studies on HCV infection, including the ~80% (15 of 19) prediction rate of known HCV hepatocyte entry factors. 
Because molecular mimicry is a general concept, our bioinformatics strategy is a timely approach to identify new targets for antiviral research, not only for HCV but also for other viruses.
Affiliation(s)
- Walt Y. L. Wu
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ting Wang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ming-Jing Hwang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
10
Sen R, Nayak L, De RK. A review on host-pathogen interactions: classification and prediction. Eur J Clin Microbiol Infect Dis 2016; 35:1581-99. [PMID: 27470504 DOI: 10.1007/s10096-016-2716-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 04/27/2016] [Accepted: 06/22/2016] [Indexed: 01/01/2023]
Abstract
Research on host-pathogen interactions is an ever-emerging and evolving field. New pathogens are discovered frequently, and with each comes the challenge of its prevention and cure. Because prevention is better than cure, understanding the mechanisms of host-pathogen interactions takes priority. Many mechanisms on both the pathogen and the host sides are involved when an interaction happens. It is a head-to-head fight between the opposing genes and proteins of both sides, and whether the host becomes infected depends on which side wins. Moreover, a higher level of complexity arises when the pathogens evolve and become resistant to a host's defense mechanisms. Such pathogens pose serious challenges for treatment. The entire human population is in danger of such long-lasting persistent infections. Some of these infections even increase the rate of mortality. Hence there is an urgent need to understand how the pathogens interact with their host for successful invasion. This understanding may lead to the discovery of appropriate preventive measures, and the development of rational therapeutic measures and medication against such infections and diseases. This review, an up-to-date survey of host-pathogen interaction research, was written with this urgency in mind. It covers the biological and computational aspects of host-pathogen interactions, classification of the methods by which the pathogens interact with their hosts, different machine learning techniques for prediction of host-pathogen interactions, and future directions for this research field.
Affiliation(s)
- R Sen
  - Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
- L Nayak
  - Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
- R K De
  - Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
11
Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015; 11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/08/2015] [Accepted: 10/29/2015] [Indexed: 11/18/2022] Open
Abstract
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. 
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. Protein interactions are central for many cellular processes. Physical characterization of these interactions is essential for understanding of life processes and applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied the text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
Affiliation(s)
- Varsha D. Badal
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Petras J. Kundrotas
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Ilya A. Vakser
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
12
Karadeniz İ, Hur J, He Y, Özgür A. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network. Front Microbiol 2015; 6:1386. [PMID: 26696993 PMCID: PMC4673313 DOI: 10.3389/fmicb.2015.01386] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/19/2015] [Accepted: 11/20/2015] [Indexed: 01/27/2023] Open
Abstract
Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host–pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene–gene interactions from the abstracts of articles in PubMed. The gene–gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence-based approaches, as well as sentence-level machine-learning-based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified from sentence-level processing. Twenty-two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure. Ontological modeling of specific gene–gene interactions demonstrates that host–pathogen gene–gene interactions occur under experimental conditions which can be ontologically represented. Our results show that the introduced literature mining and ontology-based modeling approaches are effective in retrieving and analyzing host–pathogen gene–gene interaction networks.
Affiliation(s)
- İlknur Karadeniz
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Junguk Hur
- Department of Basic Sciences, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND, USA
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA; Comprehensive Cancer Center, University of Michigan Health System, Ann Arbor, MI, USA
- Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
|
13
|
Durmuş S, Çakır T, Özgür A, Guthke R. A review on computational systems biology of pathogen-host interactions. Front Microbiol 2015; 6:235. [PMID: 25914674 PMCID: PMC4391036 DOI: 10.3389/fmicb.2015.00235] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 12/01/2014] [Accepted: 03/10/2015] [Indexed: 12/27/2022] Open
Abstract
Pathogens manipulate the cellular mechanisms of host organisms via pathogen-host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach, considering the PHI system as a whole, is indispensable to elucidate the mechanisms of infection. Following the technological advances in the post-genomic era, PHI data have been produced on a large scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic, and protein-protein networks to shed light on infection mechanisms are in increasing demand thanks to the availability of omics data. The knowledge derived from the PHIs may largely contribute to the identification of new and more efficient therapeutics to prevent or cure infections. There are recent efforts for the detailed documentation of these experimentally verified PHI data through Web-based databases. Despite these advances in data archiving, there are still large amounts of PHI data in the biomedical literature yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on the methods for the inference and analysis of PHI networks, covering also the Web-based databases and text-mining efforts to unravel the data hidden in the literature.
Affiliation(s)
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Tunahan Çakır
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Reinhard Guthke
- Leibniz Institute for Natural Product Research and Infection Biology – Hans-Knoell-Institute, Jena, Germany
|
14
|
Subramanian N, Torabi-Parizi P, Gottschalk RA, Germain RN, Dutta B. Network representations of immune system complexity. Wiley Interdiscip Rev Syst Biol Med 2015; 7:13-38. [PMID: 25625853 PMCID: PMC4339634 DOI: 10.1002/wsbm.1288] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 08/25/2014] [Revised: 12/09/2014] [Accepted: 12/11/2014] [Indexed: 12/25/2022]
Abstract
The mammalian immune system is a dynamic multiscale system composed of a hierarchically organized set of molecular, cellular, and organismal networks that act in concert to promote effective host defense. These networks range from those involving gene regulatory and protein–protein interactions underlying intracellular signaling pathways and single‐cell responses to increasingly complex networks of in vivo cellular interaction, positioning, and migration that determine the overall immune response of an organism. Immunity is thus not the product of simple signaling events but rather nonlinear behaviors arising from dynamic, feedback‐regulated interactions among many components. One of the major goals of systems immunology is to quantitatively measure these complex multiscale spatial and temporal interactions, permitting development of computational models that can be used to predict responses to perturbation. Recent technological advances permit collection of comprehensive datasets at multiple molecular and cellular levels, while advances in network biology support representation of the relationships of components at each level as physical or functional interaction networks. The latter facilitate effective visualization of patterns and recognition of emergent properties arising from the many interactions of genes, molecules, and cells of the immune system. We illustrate the power of integrating ‘omics’ and network modeling approaches for unbiased reconstruction of signaling and transcriptional networks with a focus on applications involving the innate immune system. We further discuss future possibilities for reconstruction of increasingly complex cellular‐ and organism‐level networks and development of sophisticated computational tools for prediction of emergent immune behavior arising from the concerted action of these networks. WIREs Syst Biol Med 2015, 7:13–38. doi: 10.1002/wsbm.1288 This article is categorized under:
- Analytical and Computational Methods > Computational Methods
- Laboratory Methods and Technologies > Macromolecular Interactions, Methods
Affiliation(s)
- Naeha Subramanian
- Institute for Systems Biology, Seattle, WA, USA; Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
|
15
|
Wang L, Ji P, Qi J, Shan S, Bi Z, Deng W, Zhang N. Feature weighted naïve Bayes algorithm for information retrieval of enterprise systems. Enterp Inf Syst 2013. [DOI: 10.1080/17517575.2013.860481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
|
16
|
Donati C, Rappuoli R. Reverse vaccinology in the 21st century: improvements over the original design. Ann N Y Acad Sci 2013; 1285:115-32. [PMID: 23527566 DOI: 10.1111/nyas.12046] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Indexed: 01/06/2023]
Abstract
Reverse vaccinology (RV), the first application of genomic technologies in vaccine research, represented a major revolution in the process of discovering novel vaccines. By determining their entire antigenic repertoire, researchers could identify protective targets and design efficacious vaccines for pathogens where conventional approaches had failed. Bexsero, the first vaccine developed using RV, has recently received positive opinion from the European Medicines Agency. The use of RV initiated a cascade of changes that affected the entire vaccine development process, shifting the focus from the identification of a list of vaccine candidates to the definition of a set of high throughput screens to reduce the need for costly and labor intensive tests in animal models. It is now clear that a deep understanding of the epidemiology of vaccine candidates, and their regulation and role in host-pathogen interactions, must become an integral component of the screening workflow. Far from being outdated by technological advancements, RV still represents a paradigm of how high-throughput technologies and scientific insight can be integrated into biotechnology research.
|
17
|
Li C, Liakata M, Rebholz-Schuhmann D. Biological network extraction from scientific literature: state of the art and challenges. Brief Bioinform 2013; 15:856-77. [PMID: 23434632 DOI: 10.1093/bib/bbt006] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022] Open
Abstract
Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research.
|
18
|
Durmuş Tekir SD, Ülgen KÖ. Systems biology of pathogen-host interaction: networks of protein-protein interaction within pathogens and pathogen-human interactions in the post-genomic era. Biotechnol J 2013; 8:85-96. [PMID: 23193100 PMCID: PMC7161785 DOI: 10.1002/biot.201200110] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 06/18/2012] [Revised: 09/17/2012] [Accepted: 10/11/2012] [Indexed: 12/13/2022]
Abstract
Infectious diseases comprise some of the leading causes of death and disability worldwide. Interactions between pathogen and host proteins underlie the process of infection. Improved understanding of pathogen-host molecular interactions will increase our knowledge of the mechanisms involved in infection, and allow novel therapeutic solutions to be devised. Complete genome sequences for a number of pathogenic microorganisms, as well as the human host, have led to the revelation of their protein-protein interaction (PPI) networks. In this post-genomic era, pathogen-host interactions (PHIs) operating during infection can also be mapped. Detailed systematic analyses of PPI and PHI data together are required for a complete understanding of pathogenesis of infections. Here we review the striking results recently obtained during the construction and investigation of these networks. Emphasis is placed on studies producing large-scale interaction data by high-throughput experimental techniques.
Affiliation(s)
- Kutlu Ö. Ülgen
- Department of Chemical Engineering, Boğaziçi University, Istanbul, Turkey
|
19
|
Franzosa EA, Garamszegi S, Xia Y. Toward a three-dimensional view of protein networks between species. Front Microbiol 2012; 3:428. [PMID: 23267356 PMCID: PMC3528071 DOI: 10.3389/fmicb.2012.00428] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 09/09/2012] [Accepted: 12/06/2012] [Indexed: 01/27/2023] Open
Abstract
General principles governing biomolecular interactions between species are expected to differ significantly from known principles governing the interactions within species, yet these principles remain poorly understood at the systems level. A key reason for this knowledge gap is the lack of a detailed three-dimensional (3D), atomistic view of biomolecular interaction networks between species. Recent progress in structural biology, systems biology, and computational biology has enabled accurate and large-scale construction of 3D structural models of nodes and edges for protein–protein interaction networks within and between species. The resulting within- and between-species structural interaction networks have provided new biophysical, functional, and evolutionary insights into species interactions and infectious disease. Here, we review the nascent field of between-species structural systems biology, focusing on interactions between host and pathogens such as viruses.
|
20
|
Zhou H, Jin J, Wong L. Progress in computational studies of host-pathogen interactions. J Bioinform Comput Biol 2012; 11:1230001. [PMID: 23600809 DOI: 10.1142/s0219720012300018] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022]
Abstract
Host-pathogen interactions are important for understanding infection mechanism and developing better treatment and prevention of infectious diseases. Many computational studies on host-pathogen interactions have been published. Here, we review recent progress and results in this field and provide a systematic summary, comparison and discussion of computational studies on host-pathogen interactions, including prediction and analysis of host-pathogen protein-protein interactions; basic principles revealed from host-pathogen interactions; and database and software tools for host-pathogen interaction data collection, integration and analysis.
Affiliation(s)
- Hufeng Zhou
- NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore 117456, Singapore.
|
21
|
Arnold R, Boonen K, Sun MG, Kim PM. Computational analysis of interactomes: current and future perspectives for bioinformatics approaches to model the host-pathogen interaction space. Methods 2012; 57:508-18. [PMID: 22750305 PMCID: PMC7128575 DOI: 10.1016/j.ymeth.2012.06.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 03/13/2012] [Revised: 06/20/2012] [Accepted: 06/21/2012] [Indexed: 11/05/2022] Open
Abstract
Bacterial and viral pathogens affect their eukaryotic host partly by interacting with proteins of the host cell. Hence, to investigate infection from a systems perspective we need to construct complete and accurate host-pathogen protein-protein interaction networks. Because of the paucity of available data and the cost associated with experimental approaches, any construction and analysis of such a network in the near future has to rely on computational predictions. Specifically, this challenge consists of a number of sub-problems: First, prediction of possible pathogen interactors (e.g. effector proteins) is necessary for bacteria and protozoa. Second, the prospective host binding partners have to be determined, and finally, the impact on the host cell has to be analyzed. This review gives an overview of current bioinformatics approaches to obtain and understand host-pathogen interactions. As an application example of the methods covered, we predict host-pathogen interactions of Salmonella and discuss the value of these predictions as a prospect for further research.
Affiliation(s)
- Roland Arnold
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Kurt Boonen
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Mark G.F. Sun
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada M5S 3E1
- Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3E1
|