1. Jia J, Fan X, Zhang W, Xu Z, Wu M, Zhan Y, Fan B. Predictive model for totally implanted venous access ports-related long-term complications in patients with lung cancer. Oncol Lett 2024; 28:326. [PMID: 38807672 PMCID: PMC11130750 DOI: 10.3892/ol.2024.14459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 04/30/2024] [Indexed: 05/30/2024]
Abstract
Totally implanted venous access ports (TIVAPs), which are typically used in oncological chemotherapy and parenteral nutritional support, are convenient and safe, and thus offer patients a higher quality of life. However, insertion or removal of the device requires a minor surgical operation. Long-term complications (>30 days post insertion), such as catheter migration, catheter-related thrombosis and infection, are major reasons for TIVAP removal and are associated with a number of factors such as body mass index and hemoglobin count. Since management of complications is typically time-consuming and costly, a predictive model of such events may be of great value. Therefore, in the present study, a predictive model for long-term complications following TIVAP implantation in patients with lung cancer was developed. After excluding patients with a large amount of missing data, 902 patients admitted to The First Affiliated Hospital with Nanjing Medical University (Nanjing, China) were ultimately included in the present study. Of the included patients, 28 had complications, indicating an incidence rate of 3.1%. Patients were randomly divided into training and test cohorts (7:3), and three machine learning-based anomaly detection algorithms, namely, the Isolation Forest, one-class Support Vector Machines (one-class SVM) and Local Outlier Factor, were used to construct a model. The performance of the model was initially evaluated by the Matthews correlation coefficient (MCC), area under the curve (AUC) and accuracy. The one-class SVM model demonstrated the highest performance in classifying the risk of complications associated with the use of the intracavitary electrocardiogram method for TIVAP implantation in patients with lung cancer (MCC, 0.078; AUC, 0.62; accuracy, 66.0%). In conclusion, the predictive model developed in the present study may be used to improve the early detection of TIVAP-related complications in patients with lung cancer, which could lead to the conservation of medical resources and the promotion of medical advances.
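The abstract reports MCC alongside accuracy, which matters when only about 3% of cases are positive. The following is a minimal sketch, not the authors' code, of how the two measures can be computed from a confusion matrix. The function names and the convention of returning 0 when the MCC denominator is zero are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient in [-1, 1].

    Assumption: return 0.0 when the denominator is zero, a common
    convention for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

Under these definitions, a model that never flags a complication in a cohort with 28 events among 902 patients would reach about 96.9% accuracy while scoring an MCC of 0, which is why accuracy alone is a weak guide for rare events.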
Affiliation(s)
- Jian Jia
  - Department of General Practice, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
  - School of Business, Nanjing University, Nanjing, Jiangsu 210093, P.R. China
- Xutong Fan
  - Department of Geriatrics, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
- Wenhong Zhang
  - School of Business, Nanjing University, Nanjing, Jiangsu 210093, P.R. China
  - National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, P.R. China
- Zhiyang Xu
  - Department of Geriatrics, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
- Mian Wu
  - Department of Geriatrics, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
- Yiyang Zhan
  - Department of Geriatrics, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
- Boqiang Fan
  - Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu 210029, P.R. China
2. Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023]
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has exposed major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively survey algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Junna Kawasaki
  - Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kento Kubo
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
- Koki Hokao
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Gentaro Yokoyama
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Akiko Ichinose
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Kanta Suga
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
  - Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
3. Dong TN, Brogden G, Gerold G, Khosla M. A multitask transfer learning framework for the prediction of virus-human protein-protein interactions. BMC Bioinformatics 2021; 22:572. [PMID: 34837942 PMCID: PMC8626732 DOI: 10.1186/s12859-021-04484-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/08/2021] [Accepted: 11/15/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Viral infections cause significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in the prevention and treatment of virus-related diseases. However, predicting protein-protein interactions between a new virus and human cells is extremely challenging due to scarce data on virus-human interactions and the fast mutation rates of most viruses. RESULTS: We developed a multitask transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome to counter the problem of small training datasets. Instead of using hand-crafted protein features, we utilize statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. Additionally, we employ a second objective that aims to maximize the probability of observing human protein-protein interactions. This additional task objective acts as a regularizer and also allows domain knowledge to be incorporated to inform the virus-human protein-protein interaction prediction model. CONCLUSIONS: Our approach achieved competitive results on 13 benchmark datasets and in the case study for the SARS-CoV-2 virus receptor. Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein-protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer .
Affiliation(s)
- Thi Ngan Dong
  - L3S Research Center, Leibniz University Hannover, Hannover, Germany
- Graham Brogden
  - Institute for Biochemistry, University of Veterinary Medicine, Hannover, Germany
  - Institute of Experimental Virology, TWINCORE, Center for Experimental and Clinical Infection Research Hannover, Hannover, Germany
- Gisa Gerold
  - Institute for Biochemistry, University of Veterinary Medicine, Hannover, Germany
  - Institute of Experimental Virology, TWINCORE, Center for Experimental and Clinical Infection Research Hannover, Hannover, Germany
  - Department of Clinical Microbiology, Umeå University, Umeå, Sweden
  - Wallenberg Centre for Molecular Medicine (WCMM), Umeå University, Umeå, Sweden
- Megha Khosla
  - L3S Research Center, Leibniz University Hannover, Hannover, Germany
4. Sharma G, Rana PS, Bawa S. Hybrid Machine Learning Models for Predicting Types of Human T-cell Lymphotropic Virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1524-1534. [PMID: 31567100 DOI: 10.1109/tcbb.2019.2944610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Life-threatening diseases such as adult T-cell leukemia, neurodegenerative diseases, demyelinating diseases such as HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP), hypocalcaemia, and bone lesions are caused by a group of human retroviruses known as human T-cell lymphotropic viruses (HTLVs). Of the four types of HTLV, HTLV-1 is the most prominent, afflicting over 20 million people worldwide, yet little effort has been made to understand the epidemiology or control the prevalence of this virus. The situation worsens because most infected individuals remain asymptomatic throughout their lifetime and the limited diagnostic methods are frequently unavailable for the timely detection of infected individuals. Moreover, at present, there is no licensed vaccine for HTLV-1 infection. Therefore, there is a need to develop a faster and more efficient diagnostic method for the detection of HTLV-1. Motivated by the success of machine learning techniques in bioinformatics, this is the first study in which 64 hybrid machine learning techniques are proposed for predicting the type of HTLV (HTLV-1, HTLV-2, and HTLV-3). The hybrid techniques are built by combining four classification methods, four feature weighting techniques, and four feature selection techniques (4 × 4 × 4 = 64 combinations). When evaluated on various model evaluation measures, the proposed hybrid models are found to be capable of efficiently predicting the type of HTLV. The best hybrid model achieved an accuracy, AUROC, and F1 score of 99.85%, 0.99, and 0.99, respectively. Such a system can assist current diagnostics for HTLV-1 because, after screening for HTLV with tests such as enzyme-linked immunoassay or particle agglutination assays, confirmatory tests such as western blotting, immunofluorescence assay, or radioimmunoprecipitation assay are always needed to distinguish HTLV-1 from HTLV-2. These confirmatory tests are very complex analytical techniques involving various steps. The proposed hybrid techniques can be used to support and verify the results of confirmatory tests from the protein mixture. Furthermore, better insights about the virus can be obtained by exploring the physicochemical properties of the protein sequences of HTLVs.
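The figure of 64 hybrid techniques follows from taking one option from each of three four-element sets. A minimal sketch of that enumeration is given here. The abstract does not name the individual methods, so every component name below is a hypothetical placeholder.

```python
from itertools import product

# Hypothetical placeholder names: the abstract does not list the actual methods.
classifiers = ["classifier_1", "classifier_2", "classifier_3", "classifier_4"]
feature_weightings = ["weighting_1", "weighting_2", "weighting_3", "weighting_4"]
feature_selectors = ["selector_1", "selector_2", "selector_3", "selector_4"]

# Each hybrid model pairs one classifier with one weighting and one selector.
hybrid_models = list(product(classifiers, feature_weightings, feature_selectors))

print(len(hybrid_models))  # 4 * 4 * 4 = 64
```

Each tuple in `hybrid_models` would then be trained and scored separately to pick the best combination.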
5. Li J, Wang S, Chen Z, Wang Y. A Bipartite Network Module-Based Project to Predict Pathogen-Host Association. Front Genet 2020; 10:1357. [PMID: 32038713 PMCID: PMC6992693 DOI: 10.3389/fgene.2019.01357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/27/2019] [Accepted: 12/11/2019] [Indexed: 12/23/2022]
Abstract
Pathogen-host interactions play an important role in understanding the mechanism by which a pathogen can infect its host. Some approaches for predicting pathogen-host association have been developed, but prediction accuracy is still low. In this paper, we propose a bipartite network module-based approach to improve prediction accuracy. First, a bipartite network with pathogens and hosts is constructed. Next, pathogens and hosts are divided into different modules respectively. Then, modular information on the pathogens and hosts is added into a bipartite network projection model and the association scores between pathogens and hosts are calculated. Finally, leave-one-out cross-validation is used to estimate the performance of the proposed method. Experimental results show that the proposed method performs better in predicting pathogen-host association than other methods, and some potential pathogen-host associations with higher prediction scores are also confirmed by the results of biological experiments in the publicly available literature.
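To make the idea of a bipartite network projection score concrete, here is a minimal sketch of a plain two-step resource-allocation projection. It is an assumed stand-in, not the paper's model: the module information described in the abstract is not included, and the function name and toy pathogen/host labels are hypothetical.

```python
def projection_scores(associations, pathogen):
    """Score hosts for one pathogen by two-step resource allocation.

    associations: dict mapping each pathogen to the set of hosts it is
    known to infect. Each known host of `pathogen` starts with one unit
    of resource, which flows hosts -> pathogens -> hosts, split equally
    among neighbours at each step.
    """
    # Build the reverse index: host -> set of pathogens.
    host_to_pathogens = {}
    for p, hosts in associations.items():
        for h in hosts:
            host_to_pathogens.setdefault(h, set()).add(p)

    # Step 1: every known host shares its unit among its pathogens.
    pathogen_resource = {}
    for h in associations[pathogen]:
        share = 1.0 / len(host_to_pathogens[h])
        for p in host_to_pathogens[h]:
            pathogen_resource[p] = pathogen_resource.get(p, 0.0) + share

    # Step 2: every pathogen shares what it received among its hosts.
    host_scores = {}
    for p, resource in pathogen_resource.items():
        share = resource / len(associations[p])
        for h in associations[p]:
            host_scores[h] = host_scores.get(h, 0.0) + share
    return host_scores


# Toy network with hypothetical labels.
toy = {"P1": {"H1", "H2"}, "P2": {"H2", "H3"}, "P3": {"H3"}}
scores = projection_scores(toy, "P1")
```

In this toy network, H3 is not a known host of P1 but receives a non-zero score through the shared host H2, so it would be ranked as a candidate association. A leave-one-out evaluation would hide one known pair at a time and check how highly the hidden host is ranked.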
Affiliation(s)
- Jie Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
6. Mei S, Zhang K. In silico unravelling pathogen-host signaling cross-talks via pathogen mimicry and human protein-protein interaction networks. Comput Struct Biotechnol J 2019; 18:100-113. [PMID: 31956393 PMCID: PMC6956678 DOI: 10.1016/j.csbj.2019.12.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/26/2019] [Revised: 12/07/2019] [Accepted: 12/14/2019] [Indexed: 01/08/2023]
Abstract
Pathogen-host protein interactions are fundamental for pathogens to manipulate host signaling pathways and subvert host immune defense. For most pathogens, very few or no experimental studies have been conducted to investigate their signaling cross-talks with the host. In this study, we propose a computational framework to validate the biological assumption that human protein-protein interaction (PPI) networks alone are sufficient to infer pathogen-host PPIs via pathogen functional mimicry. Pathogen functional mimicry assumes that a pathogen functionally mimics and substitutes host counterpart proteins in order for the pathogen to get involved in or hijack the host cellular processes. Through pathogen functional mimicry defined via gene ontology (GO) semantic similarity, we first use the known human PPIs as templates to infer pathogen-host PPIs, and the PPIs are further used as training data to build an l2-regularized logistic regression model for novel pathogen-host PPI prediction. Independent tests on the experimental data from human immunodeficiency virus and Francisella tularensis validate the effectiveness of the proposed pathogen functional mimicry technique. Performance comparisons also show that the proposed technique outperforms the existing pathogen sequence mimicry approaches and transfer learning methods. The proposed framework provides a new avenue to study experimentally less-studied pathogens in the worst-case scenario in which very few or no experimental pathogen-host PPIs are available. As two case studies, we apply the proposed framework to Salmonella typhimurium and Human respiratory syncytial virus to reconstruct the pathogen-host PPI networks and further investigate the interference of these two pathogens with human immune signaling and transcription regulatory system.
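Several entries in this list rely on l2-regularized logistic regression. As an illustration of what the penalty does, the following is a minimal sketch trained by batch gradient descent on toy two-feature data. It is not the authors' implementation: the function names, hyperparameters (`lam`, `lr`, `epochs`) and data are assumptions, and the bias term is deliberately left unpenalized.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 term adds lam * w to the weight gradient and shrinks w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (hypothetical): two interacting pairs, two non-interacting pairs.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_l2_logreg(X, y)
```

A larger `lam` pulls the weights toward zero, which is the mechanism the abstracts appeal to when they describe the penalty as counteracting noisy training instances.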
Affiliation(s)
- Suyu Mei
  - Software College, Shenyang Normal University, Shenyang 110034, China
- Kun Zhang
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
7. Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019; 20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND: Targeting critical viral-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. METHODS: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they use, including classical and novel methods. RESULTS: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. CONCLUSION: The findings of this review confirm the importance of computational methods for finding the potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is a lot of room for improvement in virus-host PPI prediction.
Affiliation(s)
- Nantao Zheng
  - School of Software, Central South University, Changsha, 410075, China
- Kairou Wang
  - School of Software, Central South University, Changsha, 410075, China
- Weihua Zhan
  - School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
- Lei Deng
  - School of Software, Central South University, Changsha, 410075, China
  - Shanghai Key Lab of Intelligent Information Processing, Shanghai 200433, China
8. Application of Support Vector Machines in Viral Biology. Global Virology III: Virology in the 21st Century 2019. [PMCID: PMC7114997 DOI: 10.1007/978-3-030-29022-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Novel experimental and sequencing techniques have led to an exponential explosion and spiraling of data in viral genomics. To analyse such data, rapidly gain information, and transform this information to knowledge, interdisciplinary approaches involving several different types of expertise are necessary. Machine learning has been in the forefront of providing models with increasing accuracy due to development of newer paradigms with strong fundamental bases. Support Vector Machines (SVM) is one such robust tool, based rigorously on statistical learning theory. SVM provides very high quality and robust solutions to classification and regression problems. Several studies in virology employ high performance tools including SVM for identification of potentially important gene and protein functions. This is mainly due to the highly beneficial aspects of SVM. In this chapter we briefly provide lucid and easy to understand details of SVM algorithms along with applications in virology.
9. Sun J, Yang LL, Chen X, Kong DX, Liu R. Integrating Multifaceted Information to Predict Mycobacterium tuberculosis-Human Protein-Protein Interactions. J Proteome Res 2018; 17:3810-3823. [PMID: 30269499 DOI: 10.1021/acs.jproteome.8b00497] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
Abstract
Tuberculosis (TB) is one of the biggest infectious disease killers caused by Mycobacterium tuberculosis (MTB). Studying the protein-protein interactions (PPIs) between MTB and human can deepen our understanding of the pathogenesis of TB and offer new clues to the treatment against MTB infection, but the experimentally validated interactions are especially scarce in this regard. Herein we proposed an integrated framework that combined template-, domain-domain interaction-, and machine learning-based methods to predict MTB-human PPIs. As a result, we established a network composed of 13 758 PPIs including 451 MTB proteins and 3167 human proteins ( http://liulab.hzau.edu.cn/MTB/ ). Compared to known human targets of various pathogens, our predicted human targets show a similar tendency in terms of the network topological properties and enrichment in important functional genes. Additionally, these human targets largely have longer sequence lengths, more protein domains, more disordered residues, lower evolutionary rates, and older protein ages. Functional analysis demonstrates that these proteins show strong preferences toward the phosphorylation, kinase activity, and signaling transduction processes and the disease and immune related pathways. Dissecting the cross-talk among top-ranked pathways suggests that the cancer pathway may serve as a bridge in MTB infection. Triplet analysis illustrates that the paired targets interacting with the same partner are adjacent to each other in the intraspecies network and tend to share similar expression patterns. Finally, we identified 36 potential anti-MTB human targets by integrating known drug target information and molecular properties of proteins.
10. Mei S, Flemington EK, Zhang K. Transferring knowledge of bacterial protein interaction networks to predict pathogen targeted human genes and immune signaling pathways: a case study on M. tuberculosis. BMC Genomics 2018; 19:505. [PMID: 29954330 PMCID: PMC6027805 DOI: 10.1186/s12864-018-4873-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/21/2017] [Accepted: 06/18/2018] [Indexed: 12/11/2022]
Abstract
Background: Bacterial invasive infection and host immune response are fundamental to the understanding of pathogen pathogenesis and the discovery of effective therapeutic drugs. However, there are very few experimental studies on the signaling cross-talks between bacteria and human host to date. Methods: In this work, taking M. tuberculosis H37Rv (MTB) that is co-evolving with its human host as an example, we propose a general computational framework that exploits the known bacterial pathogen protein interaction networks in the STRING database to predict pathogen-host protein interactions and their signaling cross-talks. In this framework, significant interlogs are derived from the known pathogen protein interaction networks to train a predictive l2-regularized logistic regression model. Results: The computational results show that the proposed method achieves excellent cross-validation performance as well as low predicted positive rates on the less significant interlogs and non-interlogs, indicating a low risk of false discovery. We further conduct gene ontology (GO) and pathway enrichment analyses of the predicted pathogen-host protein interaction networks, which potentially provide insights into the machinery by which M. tuberculosis H37Rv targets human genes and signaling pathways. In addition, we analyse the pathogen-host protein interactions related to drug resistance, inhibition of which potentially provides an alternative solution to M. tuberculosis H37Rv drug resistance. Conclusions: The proposed machine learning framework has been verified to be effective for predicting bacteria-host protein interactions via known bacterial protein interaction networks. For the vast majority of bacterial pathogens that lack experimental studies of bacteria-host protein interactions, this framework is expected to be generally applicable. The predicted protein interaction networks between M. tuberculosis H37Rv and Homo sapiens, provided in the Additional files, promise applications in two areas: (1) providing an alternative solution to drug resistance; (2) revealing the patterns by which M. tuberculosis H37Rv genes target human immune signaling pathways. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4873-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Suyu Mei
  - Software College, Shenyang Normal University, Shenyang, 110034, China
- Erik K Flemington
  - Department of Pathology, Tulane Cancer Center, Tulane University, New Orleans, LA, 70112, USA
- Kun Zhang
  - Department of Computer Science, Bioinformatics facility of Xavier NIH RCMI Cancer Research Center, Xavier University of Louisiana, New Orleans, LA, 70125, USA
11. Mei S, Flemington EK, Zhang K. A computational framework for distinguishing direct versus indirect interactions in human functional protein-protein interaction networks. Integr Biol (Camb) 2018; 9:595-606. [PMID: 28524201 DOI: 10.1039/c7ib00013h] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Recognition of indirect interactions is instrumental to in silico reconstruction of signaling pathways and sheds light on the exploration of unknown physical paths between two indirectly interacting genes. However, very limited computational methods have explicitly exploited the indirect interactions with experimental evidence thus far. In this work, we attempt to distinguish direct versus indirect interactions in human functional protein-protein interaction (PPI) networks via a predictive l2-regularized logistic regression model built on the experimental data. The l2-regularized logistic regression method is adopted to counteract the potential homolog noise and reduce the computational complexity on large training data. Computational results show that the proposed model demonstrates promising performance even though the training data are highly skewed. From the 304 799 PPIs that are curated in several databases, the proposed method detects 23 131 indirect interactions, most of which have been verified by the breadth-first graph search algorithm to find dozens of physical paths between the interacting partners. Pathway enrichment analysis shows that most of the physical paths can be mapped onto more than one human signaling pathway, indicating that there do exist a series of biochemical signals between the two indirectly interacting genes. The interactome-scale computational results promise to provide useful cues to the following applications: (1) exploration of unknown physical PPIs or physical paths between two indirectly interacting genes; (2) amending or extending the existing signaling pathways; (3) recognition of the physical PPIs for druggable target discovery.
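The abstract mentions a breadth-first graph search for physical paths between indirectly interacting partners. A minimal sketch of such a search over an undirected graph of physical interactions follows. It returns one shortest path rather than all paths, and the function name and protein labels are hypothetical, so it should be read as an illustration rather than the authors' procedure.

```python
from collections import deque


def shortest_physical_path(physical_edges, source, target):
    """Return one shortest path from source to target, or None.

    physical_edges: iterable of (protein_a, protein_b) pairs, treated
    as undirected physical interactions.
    """
    graph = {}
    for u, v in physical_edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Toy network with hypothetical protein labels.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
path = shortest_physical_path(edges, "A", "D")
```

Here A and D share no direct edge, and the returned chain of intermediate proteins is the kind of physical path that would support calling their functional link indirect.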
Affiliation(s)
- Suyu Mei
  - Software College, Shenyang Normal University, Shenyang, 110034, China
12. Hsu YY, Wei CH, Lu Z. Assisting document triage for human kinome curation via machine learning. Database: The Journal of Biological Databases and Curation 2018; 2018:5094578. [PMID: 30239677 PMCID: PMC6146134 DOI: 10.1093/database/bay091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Accepted: 08/13/2018] [Indexed: 11/16/2022]
Abstract
In the era of data explosion, the increasing frequency of published articles presents unorthodox challenges to fulfilling specific curation requirements for bio-literature databases. Recognizing these demands, we designed a document triage system with automatic methods that can improve the efficiency of retrieving the most relevant articles in curation workflows and reduce workloads for biocurators. Since BioCreative VI (2017), we have implemented text mining processing in our system in hopes of providing higher effectiveness for curating articles related to human kinase proteins. We tested several machine learning methods together with state-of-the-art concept extraction tools. For features, we extracted rich co-occurrence and linguistic information to model the curation process of human kinome articles by the neXtProt database. As shown in the official evaluation on the human kinome curation task in BioCreative VI, our system can effectively retrieve 5.2 and 6.5 kinase articles with the relevant disease (DIS) and biological process (BP) information, respectively, among the top 100 returned results. Compared with neXtA5, our system demonstrates significant improvements in prioritizing kinome-related articles: for the DIS axis, our system achieves a mean average precision (MAP) of 0.458 and a maximum precision observed of 0.109, whereas the best values reported for neXtA5 are 0.41 and 0.04, respectively. Our system also outperforms neXtA5 on the BP axis, with a MAP of 0.195 against the value of 0.11 reported for neXtA5. These results suggest that our system may be able to assist neXtProt biocurators in practice.
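Mean average precision (MAP) is the headline ranking measure in this abstract. A minimal sketch of one common way to compute it from ranked relevance flags is shown here. It assumes that average precision is normalised by the number of relevant items actually retrieved, which may differ from the normalisation used in the official BioCreative evaluation, and the function names are illustrative.

```python
def average_precision(relevance):
    """Average precision for one ranked list of 0/1 relevance flags.

    Assumption: normalise by the number of relevant items that appear
    in the list, not by the total number relevant in the collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy queries: relevant articles at ranks 1 and 3, then at rank 2.
toy_rankings = [[1, 0, 1, 0], [0, 1]]
toy_map = mean_average_precision(toy_rankings)
```

For the toy queries the per-query values are (1/1 + 2/3) / 2 and 1/2, so MAP rewards systems that place relevant articles near the top of each returned list.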
Affiliation(s)
- Yi-Yu Hsu
  - National Center for Biotechnology Information, Bethesda, MD, USA
- Chih-Hsuan Wei
  - National Center for Biotechnology Information, Bethesda, MD, USA
- Zhiyong Lu
  - National Center for Biotechnology Information, Bethesda, MD, USA
13. Vyas R, Bapat S, Goel P, Karthikeyan M, Tambe SS, Kulkarni BD. Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:27-37. [PMID: 28113781 DOI: 10.1109/tcbb.2016.2621042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a vital role in the biological processes involved in cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous effort, and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of 135 PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893 and low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of diabetes complexes, where it yielded significantly lower CC values. Thus, the GP model developed here serves a dual purpose: (a) a predictor of the binding energy of cancer-related PPI complexes, and (b) a classifier for discriminating PPI complexes related to cancer from those of other diseases.
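The three regression metrics named in this abstract can be written out as a short sketch. The function names are hypothetical, the correlation coefficient is assumed to be Pearson's, and MAPE is returned as a fraction rather than a percentage; the abstract does not say which convention its reported value uses.

```python
import math
from typing import Sequence


def pearson_cc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; assumes no true value is zero."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

Because RMSE carries the units of the target while MAPE is scale-free, the two values reported together are not directly comparable in magnitude.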
|
14
|
Multi-label ℓ2-regularized logistic regression for predicting activation/inhibition relationships in human protein-protein interaction networks. Sci Rep 2016; 6:36453. [PMID: 27819359 PMCID: PMC5098220 DOI: 10.1038/srep36453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/13/2016] [Accepted: 10/17/2016] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interaction (PPI) networks are naturally viewed as infrastructure for inferring signalling pathways. Descriptors of signal events between two interacting proteins, such as upstream/downstream signal flow, activation/inhibition relationship and protein modification, are indispensable for inferring signalling pathways from PPI networks. However, such descriptors are not available in most cases because PPI networks are seldom semantically annotated. In this work, we extend ℓ2-regularized logistic regression to the scenario of multi-label learning for predicting the activation/inhibition relationships in human PPI networks. The phenomenon that both activation and inhibition relationships exist between two interacting proteins is computationally modelled by a multi-label learning framework. The problem of GO (gene ontology) sparsity is tackled by introducing the homolog knowledge as independent homolog instances. ℓ2-regularized logistic regression is accordingly adopted here to penalize the homolog noise and to reduce the computational complexity of the double-sized training data. Computational results show that the proposed method achieves satisfactory multi-label learning performance and outperforms the existing phenotype correlation method on the experimental data of Drosophila melanogaster. Several predictions have been validated against recent literature. The predicted activation/inhibition relationships in human PPI networks are provided in the supplementary file for further biomedical research.
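One common way to apply ℓ2-regularized logistic regression to a multi-label problem is to train one binary classifier per label (binary relevance), so that a protein pair can be predicted as activation, inhibition, both, or neither. The sketch below illustrates that idea only; the decomposition, the plain gradient-descent optimizer, the hyperparameters, and the toy two-feature data are all assumptions and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) for one label


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.01, lr: float = 0.5, epochs: int = 2000) -> Model:
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the l2 penalty (bias unpenalized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_multilabel(X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]) -> List[Model]:
    """Binary relevance: one independent l2-regularized classifier per label column."""
    return [train_l2_logreg(X, [row[k] for row in Y]) for k in range(len(Y[0]))]


def predict_multilabel(models: Sequence[Model], x: Sequence[float],
                       threshold: float = 0.5) -> List[int]:
    """Return a 0/1 decision per label; several labels may be 1 at once."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)
            for w, b in models]


# Toy data (hypothetical): label 0 = activation, label 1 = inhibition.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y_toy = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_models = train_multilabel(X_toy, Y_toy)
```

Treating labels independently keeps each subproblem a standard convex logistic regression, at the cost of ignoring any correlation between the activation and inhibition labels.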
|
15
|
Mei S, Zhang K. Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways. Sci Rep 2016; 6:30612. [PMID: 27470517 PMCID: PMC4965740 DOI: 10.1038/srep30612] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/07/2016] [Accepted: 07/05/2016] [Indexed: 12/22/2022] Open
Abstract
Epstein-Barr virus (EBV) plays important roles in the origin and progression of human carcinomas, e.g. diffuse large B cell tumors, T cell lymphomas, etc. Discovering EBV-targeted human genes and signaling pathways is vital to understanding EBV tumorigenesis. In this study we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interaction (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances and the homolog noise is counteracted by a support vector machine (SVM). Additionally, we propose two methods to define subcellular co-localization (i.e. stringent and relaxed), from which physical PPI networks are further derived. Computational results show that the proposed method achieves sound performance in cross-validation and independent testing. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight pieces of evidence were found in the latest database and recent literature to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell life, such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling, Hedgehog signaling, etc. The proteome-wide predictions are provided in the supplementary file for further biomedical research.
Affiliation(s)
- Suyu Mei, Software College, Shenyang Normal University, Shenyang, 110034, China
- Kun Zhang, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
|
16
|
Uncovering New Pathogen-Host Protein-Protein Interactions by Pairwise Structure Similarity. PLoS One 2016; 11:e0147612. [PMID: 26799490 PMCID: PMC4723085 DOI: 10.1371/journal.pone.0147612] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/31/2015] [Accepted: 01/06/2016] [Indexed: 01/31/2023] Open
Abstract
Pathogens usually evade and manipulate host-immune pathways through pathogen-host protein-protein interactions (PPIs) to avoid being killed by the host immune system. Therefore, uncovering pathogen-host PPIs is critical for determining the mechanisms underlying pathogen infection and survival. In this study, we developed a computational method, which we named pairwise structure similarity (PSS)-PPI, to predict pathogen-host PPIs. First, a high-quality and non-redundant structure-structure interaction (SSI) template library was constructed by exhaustively exploring heteromeric protein complex structures in the PDB database. New interactions were then predicted by searching for PSS with complex structures in the SSI template library. A quantitative score named the PSS score, which integrated structure similarity and residue-residue contact-coverage information, was used to describe the overall similarity of each predicted interaction with the corresponding SSI template. Notably, PSS-PPI yielded experimentally confirmed pathogen-host PPIs of human immunodeficiency virus type 1 (HIV-1) with performance close to that of in vitro high-throughput screening approaches. Finally, a pathogen-host PPI network of human pathogen Mycobacterium tuberculosis, the causative agent of tuberculosis, was constructed using PSS-PPI and refined using filtration steps based on cellular localization information. Analysis of the resulting network indicated that secreted proteins of the STPK, ESX-1, and PE/PPE family in M. tuberculosis targeted human proteins involved in immune response and phagocytosis. M. tuberculosis also targeted host factors known to regulate HIV replication. Taken together, our findings provide insights into the survival mechanisms of M. tuberculosis in human hosts, as well as co-infection of tuberculosis and HIV. 
With the rapid pace of three-dimensional protein structure discovery, the SSI template library we constructed and the PSS-PPI method we devised can be used to uncover new pathogen-host PPIs in the future.
|
17
|
Eid FE, ElHefnawi M, Heath LS. DeNovo: virus-host sequence-based protein–protein interaction prediction. Bioinformatics 2015; 32:1144-50. [DOI: 10.1093/bioinformatics/btv737] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 07/29/2015] [Accepted: 12/12/2015] [Indexed: 01/02/2023] Open
|
18
|
Mei S, Zhu H. A simple feature construction method for predicting upstream/downstream signal flow in human protein-protein interaction networks. Sci Rep 2015; 5:17983. [PMID: 26648121 PMCID: PMC4673612 DOI: 10.1038/srep17983] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/10/2015] [Accepted: 11/10/2015] [Indexed: 12/24/2022] Open
Abstract
Signaling pathways play important roles in understanding the underlying mechanisms of cell growth, cell apoptosis, organismal development and pathway-aberrant diseases. Protein-protein interaction (PPI) networks are commonly used infrastructure for inferring signaling pathways. However, PPI networks generally carry no information on the upstream/downstream relationship between interacting proteins, which hinders inference of the signal flow of signaling pathways. In this work, we propose a simple feature construction method to train an SVM (support vector machine) classifier to predict PPI upstream/downstream relations. The domain-based asymmetric feature representation naturally embodies domain-domain upstream/downstream relations, providing an unconventional avenue to predict the directionality between two objects. Moreover, we propose a semantically interpretable decision function and a macro bag-level performance metric to satisfy the need for a two-instance depiction of an interacting protein pair. Experimental results show that the proposed method achieves satisfactory cross-validation performance and independent test performance. Lastly, we use the trained model to predict the PPIs in HPRD, Reactome and IntAct. Some predictions have been validated against recent literature.
Affiliation(s)
- Suyu Mei, Software College, Shenyang Normal University, Shenyang, China; Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Hao Zhu, Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
|