1
Ozger ZB. A robust protein language model for SARS-CoV-2 protein-protein interaction network prediction. Artif Intell Med 2023; 142:102574. [PMID: 37316102 DOI: 10.1016/j.artmed.2023.102574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/17/2023] [Accepted: 04/27/2023] [Indexed: 06/16/2023]
Abstract
Protein-protein interaction is one of the ways viruses interact with their hosts. Identifying protein-protein interactions (PPIs) between viruses and hosts therefore helps explain how viral proteins function, how viruses replicate, and how they cause disease. SARS-CoV-2 is a novel coronavirus that emerged in 2019 and caused a worldwide pandemic. Detecting the human proteins that interact with this novel virus strain plays an important role in monitoring the cellular processes of virus-associated infection. In this study, a natural language processing-based collective learning method is proposed for predicting potential SARS-CoV-2-human PPIs. Protein language models were built with the prediction-based word2vec and doc2vec embedding methods and the frequency-based tf-idf method. Known interactions were represented with the proposed language models and with traditional feature extraction methods (conjoint triad and repeat pattern), and their performances were compared. Models were trained on the interaction data with support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (KNN), naive Bayes (NB), decision tree (DT), and ensemble algorithms. The experimental results show that protein language models are a promising protein representation method for PPI prediction. The term frequency-inverse document frequency (tf-idf)-based language model performed SARS-CoV-2 PPI prediction with an error of 1.4%. In addition, the decisions of the high-performing learning models for the different feature extraction methods were combined by collective voting to make new interaction predictions. Using the combined decisions of these models, 285 new potential interactions were predicted for 10,000 human proteins.
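The abstract does not state how protein sequences were tokenized for the tf-idf language model. A minimal sketch, assuming overlapping amino-acid 3-mers serve as the "words" of each protein "document" (the helper names `kmers` and `tfidf_vectors` and the choice k = 3 are illustrative assumptions, not taken from the paper):

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_vectors(sequences, k=3):
    """Return one sparse {k-mer: tf-idf} dict per sequence (each sequence is a 'document')."""
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: the number of sequences in which each k-mer occurs.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in doc.items()
        })
    return vectors


seqs = ["MKTAYIAK", "MKTLLVAG", "GAVLIMKT"]
vecs = tfidf_vectors(seqs)
print(vecs[0]["MKT"], round(vecs[0]["TAY"], 4))  # 0.0 0.1831
```

A k-mer that occurs in every sequence (here "MKT") gets an idf of zero and contributes nothing. To feed a classifier, the sparse dicts would still have to be mapped onto a shared vocabulary, and the vectors of a virus-human pair combined, for example by concatenation.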
Affiliation(s)
- Zeynep Banu Ozger
- Department of Computer Engineering, Sutcu Imam University, 46040, Kahramanmaras, Turkey.
2
Pan J, You W, Lu X, Wang S, You Z, Sun Y. GSPHI: A novel deep learning model for predicting phage-host interactions via multiple biological information. Comput Struct Biotechnol J 2023; 21:3404-3413. [PMID: 37397626 PMCID: PMC10314231 DOI: 10.1016/j.csbj.2023.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 06/14/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023] Open
Abstract
Emerging evidence suggests that, because of the misuse of antibiotics, bacteriophage (phage) therapy has come to be recognized as one of the most promising strategies for treating human infections caused by antibiotic-resistant bacteria. Identifying phage-host interactions (PHIs) can help explore the mechanisms by which bacteria respond to phages and provide new insights into effective therapeutic approaches. Compared with conventional wet-lab experiments, computational models for predicting PHIs save both time and cost. In this study, we developed a deep learning prediction framework called GSPHI to identify potential phage-target bacterium pairs from DNA and protein sequence information. Specifically, GSPHI first initializes the node representations of phages and their target bacterial hosts with a natural language processing algorithm. It then applies a graph embedding algorithm, structural deep network embedding (SDNE), to extract local and global information from the interaction network, and finally uses a deep neural network (DNN) to detect interactions between phages and their bacterial hosts. On the drug-resistant bacteria dataset ESKAPE, GSPHI achieved a prediction accuracy of 86.65% and an area under the curve (AUC) of 0.9208 under 5-fold cross-validation, significantly better than other methods. In addition, case studies on Gram-positive and Gram-negative bacterial species demonstrated that GSPHI can detect potential phage-host interactions. Taken together, these results indicate that GSPHI can provide reasonable candidate phage-sensitive bacteria for biological experiments. The GSPHI web server is freely available at http://120.77.11.78/GSPHI/.
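GSPHI's headline figures are an AUC measured under 5-fold cross-validation. As a generic illustration of those two evaluation pieces (not GSPHI's own code), the sketch below computes ROC AUC as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, and splits n samples into k folds. `roc_auc` and `k_fold_indices` are hypothetical helper names.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds; shuffle the data beforehand."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


print(roc_auc([1, 0, 1, 0], [0.9, 0.7, 0.6, 0.2]))  # 0.75
```

The per-fold AUCs would then be averaged. The pairwise loop is O(P·N) and is fine for small sets; rank-based formulas scale better on large ones.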
Affiliation(s)
- Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, The College of Life Sciences, Northwest University, Xi’an 710069, China
- Wencai You
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, The College of Life Sciences, Northwest University, Xi’an 710069, China
- Xiaoliang Lu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, The College of Life Sciences, Northwest University, Xi’an 710069, China
- Shiwei Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, The College of Life Sciences, Northwest University, Xi’an 710069, China
- Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
- Yanmei Sun
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, The College of Life Sciences, Northwest University, Xi’an 710069, China
3
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022] Open
Abstract
In the current era, one of the major challenges is managing the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics for treating bacterial infections caused by drug-resistant strains. This review systematically summarizes phage-based therapy in depth and is divided into two sections: general information and computer-aided phage therapy (CAPT). The general information section covers the history of phage therapy, its mechanism of action, the status of phage-based products (approved and in clinical trials) and the challenges. The review emphasizes CAPT, covering primary phage-associated resources, phage prediction methods and pipelines. It covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant bacterial strain is crucial for developing alternative treatments for drug-resistant bacteria, yet it remains a challenging problem. We therefore compile all phage-associated prediction methods, including those that predict phages for a bacterial strain, predict the host of a phage, and identify interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discusses recent advances in CAPT, briefly describing the computational tools available for predicting phage virions, the phage life cycle and prophages. Finally, we describe the advantages, challenges and opportunities of phage-based therapy.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
4
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023] Open
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has exposed major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively survey the algorithms used to predict virus-host interactions. We also discuss current challenges, such as dataset biases toward highly pathogenic viruses, and potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Junna Kawasaki
- Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kento Kubo
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
- Koki Hokao
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Gentaro Yokoyama
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Akiko Ichinose
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Kanta Suga
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
5
Liu X, Wang L, Liang CH, Lu YP, Yang T, Zhang X. An enhanced methodology for predicting protein-protein interactions between human and hepatitis C virus via ensemble learning algorithms. J Biomol Struct Dyn 2022; 40:10592-10602. [PMID: 34251992 DOI: 10.1080/07391102.2021.1946429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Hepatitis C virus (HCV) is responsible for a variety of life-threatening human diseases, including liver cirrhosis, chronic hepatitis, fibrosis and hepatocellular carcinoma (HCC). Computational study of protein-protein interactions between human and HCV could accelerate the discovery of antiviral drugs for HCV therapy and might help optimize treatment of HCV infections. In this analysis, we constructed a prediction model for protein-protein interactions between HCV and human using features generated by pseudo amino acid composition, which were then analyzed at two levels: categories and individual features. In brief, extra-tree was first used for feature selection, and SVM was then used to build the classification model. After that, the most suitable model for each category and each feature was selected by comparison with three ensemble learning algorithms: Random Forest, AdaBoost, and XGBoost. According to our results, profile-based features were the most suitable of the four categories for building predictive models, and the AUC of the model constructed with the XGBoost algorithm on the independent dataset reached 92.66%. Moreover, Distance-based Residue, Physicochemical Distance Transformation and Profile-based Physicochemical Distance Transformation performed much better than the other features among the 17 evaluated. The AUC of the AdaBoost classifier built on Profile-based Physicochemical Distance Transformation features reached 93.74% on the independent dataset. Taken together, we propose a model with improved capacity to predict protein-protein interactions between human and HCV, which could provide a practical reference for future experimental investigation of HCV-related diseases.
Communicated by Ramaswamy H. Sarma.
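Pseudo amino acid composition (PseAAC) extends the 20 amino-acid frequencies with sequence-order correlation factors θ_j, the mean squared difference of a standardized physicochemical property between residues j positions apart. A minimal type-I sketch follows, assuming a single property: it uses the Kyte-Doolittle hydropathy scale purely for illustration (Chou's original formulation averages several standardized properties), and the defaults for λ (`lam`) and the weight `w` are arbitrary choices, not the settings used in this study.

```python
import statistics

# Kyte-Doolittle hydropathy values, used here only as one illustrative property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def standardize(scale):
    """Rescale a property to zero mean and unit (population) standard deviation over the 20 residues."""
    values = list(scale.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {aa: (v - mean) / sd for aa, v in scale.items()}


def pseaac(seq, lam=3, w=0.05, scale=KYTE_DOOLITTLE):
    """Type-I PseAAC: 20 composition terms plus `lam` sequence-order terms (standard residues only)."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    n = len(seq)
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j positions apart.
    thetas = [
        sum((h[seq[i + j]] - h[seq[i]]) ** 2 for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    # Shared normalizer so that all 20 + lam components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


features = pseaac("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))  # 23
```

The weight `w` controls how much the sequence-order terms count relative to plain composition. A non-standard residue letter raises a `KeyError` in this sketch.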
Affiliation(s)
- Xin Liu
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Liang Wang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China; Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Cheng-Hao Liang
- School of Life Science, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ya-Ping Lu
- College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Ting Yang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Xiao Zhang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, China
6
Asim MN, Fazeel A, Ibrahim MA, Dengel A, Ahmed S. MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses. Front Med (Lausanne) 2022; 9:1025887. [PMID: 36465911 PMCID: PMC9709337 DOI: 10.3389/fmed.2022.1025887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/17/2022] [Indexed: 09/19/2023] Open
Abstract
Viral-host protein-protein interaction (VHPPI) prediction is essential for decoding the molecular mechanisms of viral pathogens and host immune processes, which ultimately helps to control the spread of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts; however, these predictors perform well only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) that predicts VHPPIs more accurately across multiple hosts and viruses. The proposed meta predictor uses two well-known encoding methods, Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order, which capture amino acid sequence-order and distributional information, to generate numerical representations of complete raw viral and host protein sequences. A feature agglomeration method transforms the original feature space into a more informative one. Random forest (RF) and extra tree (ET) classifiers are trained on the optimized feature spaces of the APAAC and QS order encoders, both separately and combined. The predictions of both classifiers are then fed to a support vector machine (SVM) classifier, which makes the final predictions. The proposed meta predictor is evaluated on 7 benchmark datasets, where it outperforms existing VHPPI predictors by an average of 3.07%, 6.07%, 2.95%, and 2.85% in terms of accuracy, Matthews correlation coefficient, precision, and sensitivity, respectively. To facilitate the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.
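The reported gains are in accuracy, Matthews correlation coefficient (MCC), precision, and sensitivity. For reference, here is a small sketch of how these four metrics follow from a binary confusion matrix. `confusion_metrics` is a hypothetical helper. Returning 0.0 when a denominator is zero is one common convention, not something the authors specify.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Accuracy, MCC, precision and sensitivity for binary labels (1 = interacting pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # also called recall
    }


print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike accuracy, MCC uses all four confusion-matrix cells. That makes it more informative when interacting and non-interacting pairs are imbalanced.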
Affiliation(s)
- Muhammad Nabeel Asim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Ahtisham Fazeel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Muhammad Ali Ibrahim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Andreas Dengel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
- Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
7
Li B, Tian Y, Tian Y, Zhang S, Zhang X. Predicting Cancer Lymph-Node Metastasis From LncRNA Expression Profiles Using Local Linear Reconstruction Guided Distance Metric Learning. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3179-3189. [PMID: 35139024 DOI: 10.1109/tcbb.2022.3149791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Lymph-node metastasis is the most perilous stage of cancer progression, and long non-coding RNA (lncRNA) has been confirmed as an important genetic indicator for cancer prediction. However, lncRNA expression profiles typically contain many features but few samples, so an efficient method for handling such high-dimensional lncRNA data is urgently needed to support targeted clinical treatment. In this study, therefore, a local linear reconstruction guided distance metric learning method is proposed for analyzing lncRNA data to determine cancer lymph-node metastasis. As in the original locally linear embedding (LLE) approach, each point is approximately reconstructed as a linear combination of its nearest neighbors; a novel distance metric is then learned from these reconstructions by imposing both nonnegativity and sum-to-one constraints on the reconstruction weights. Using the learned distance metric together with the supervised label information of the lncRNA data, a local margin model is derived to find a low-dimensional subspace for extracting lncRNA signatures. Finally, a classifier that also adopts the learned distance metric is constructed to predict cancer lymph-node metastasis. Experiments on several lncRNA datasets compare the proposed method with related dimensionality reduction methods and classical classifier models, and the results show its performance.
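The reconstruction step borrowed from LLE can be sketched as follows. This is a minimal, dependency-free illustration of standard LLE weights, which enforce only the sum-to-one constraint. The nonnegativity constraint described above would require a constrained solver (for example, nonnegative least squares) and is omitted here, as is the metric learning built on these weights. The function names and the regularization constant `reg` are assumptions.

```python
def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def reconstruction_weights(x, neighbors, reg=1e-3):
    """Sum-to-one weights w minimizing |x - sum_j w_j * neighbors[j]|^2 (standard LLE step)."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    # Local Gram matrix of the offsets from x to each neighbor.
    gram = [[sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)] for i in range(k)]
    # Regularize so the system stays solvable when k exceeds the data dimension.
    trace = sum(gram[i][i] for i in range(k))
    for i in range(k):
        gram[i][i] += reg * (trace if trace > 0 else 1.0)
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


print(reconstruction_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]]))
```

A point halfway between two neighbors gets weights of about 0.5 each. With three affinely independent neighbors in 2D, the weights approach the point's barycentric coordinates.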
8
Cui Z, Chen ZH, Zhang QH, Gribova V, Filaretov VF, Huang DS. RMSCNN: A Random Multi-Scale Convolutional Neural Network for Marine Microbial Bacteriocins Identification. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3663-3672. [PMID: 34699364 DOI: 10.1109/tcbb.2021.3122183] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
The abuse of traditional antibiotics has led to increased resistance in bacteria and viruses. Like antibacterial peptides, bacteriocins are a common class of peptides produced by bacteria that have bactericidal or bacteriostatic effects. Importantly, the marine environment is one of the richest sources of marine microbial bacteriocins (MMBs). Identifying bacteriocins from marine microorganisms is a common goal in the development of new drugs, and effective use of MMBs will greatly alleviate the current problem of antibiotic abuse. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method in which the scale values, rather than being set manually, are updated randomly by a random model. Under certain conditions, this scale selection method can reduce the arbitrariness introduced by manual settings, making the method more broadly applicable. The results show that the proposed method achieves better classification performance than state-of-the-art classification methods. In addition, several potential MMBs are predicted, and various sequence analyses are performed on these candidates. Notably, the sequence analysis suggests that the HNH endonucleases of different marine bacteria are potential bacteriocins.
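The abstract does not detail how scales are randomized. As a loose sketch of the general idea, assuming "scale" means the width of a 1D convolution kernel over one-hot encoded residues, the code below draws several kernel widths at random, applies untrained random filters at each width, and global-max-pools the ReLU outputs into a feature vector. All names, ranges, and the random initialization are hypothetical. In a real network the filter weights would be learned by training.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein as an (L x 20) list of one-hot rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv_relu_maxpool(x, kernel):
    """Valid 1D convolution of an (L x C) input with a (K x C) kernel, then ReLU and global max pooling."""
    k, channels = len(kernel), len(x[0])
    activations = [
        max(0.0, sum(kernel[i][c] * x[start + i][c] for i in range(k) for c in range(channels)))
        for start in range(len(x) - k + 1)
    ]
    return max(activations) if activations else 0.0


def random_multiscale_features(seq, n_scales=3, filters_per_scale=4, width_range=(2, 9), seed=0):
    """Draw random kernel widths (scales), apply random untrained filters, pool each to one value."""
    rng = random.Random(seed)
    widths = sorted(rng.sample(range(width_range[0], width_range[1] + 1), n_scales))
    x = one_hot(seq)
    features = []
    for width in widths:
        for _ in range(filters_per_scale):
            kernel = [[rng.gauss(0.0, 0.1) for _ in AMINO_ACIDS] for _ in range(width)]
            features.append(conv_relu_maxpool(x, kernel))
    return widths, features


widths, feats = random_multiscale_features("MKKIILSLLALGGVAAGCSS", seed=1)
print(widths, len(feats))
```

Each width acts as a motif length. Sampling the widths instead of fixing them is the "random scale" idea in miniature.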
9
Cross-attention PHV: Prediction of human and virus protein-protein interactions using cross-attention-based neural networks. Comput Struct Biotechnol J 2022; 20:5564-5573. [PMID: 36249566 PMCID: PMC9546503 DOI: 10.1016/j.csbj.2022.10.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/09/2022] [Revised: 10/05/2022] [Accepted: 10/05/2022] [Indexed: 11/30/2022] Open
Abstract
Cross-attention PHV implements two key technologies: cross-attention mechanism and 1D-CNN. It accurately predicts PPIs between human and unknown influenza viruses/SARS-CoV-2. It extracts critical taxonomic and evolutionary differences responsible for PPI prediction.
Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spread, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein–protein interactions (PPIs) play critical roles. Therefore, identifying PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections, and for discovering effective drugs. Experimental methods, including mass spectrometry-based proteomics and yeast two-hybrid assays, are widely used to identify human-virus PPIs, but they are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies: the cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanism was very effective in enhancing prediction and generalization abilities. Applying the 1D-CNN to the word2vec-generated feature matrices reduced computational costs, extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models on a benchmark dataset and accurately predicted PPIs for unknown viruses. It also predicted human–SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source code are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
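Here is a bare-bones sketch of the cross-attention idea: the residue embeddings of one protein act as queries over the residue embeddings of its partner through scaled dot-product attention. It assumes identity query/key/value projections. It also omits the learned weights, multiple heads, and the 1D-CNN that Cross-attention PHV adds. The toy matrices stand in for word2vec feature matrices, with one row per residue or k-mer.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each row of `queries` attends over the rows of `context` (scaled dot-product attention).

    Identity Q/K/V projections are assumed; a trained model would learn these matrices.
    """
    d = len(queries[0])
    context_t = [list(col) for col in zip(*context)]
    scores = matmul(queries, context_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context), weights


human = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # toy residue embeddings (3 x 2)
virus = [[1.0, 0.0], [0.0, 1.0]]              # toy residue embeddings (2 x 2)
human_ctx, attn = cross_attention(human, virus)
print(len(human_ctx), len(human_ctx[0]))  # 3 2
```

Calling it in both directions, `cross_attention(human, virus)` and `cross_attention(virus, human)`, gives each protein a representation conditioned on its partner. Each output keeps the length of its own query sequence.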
Key Words
- 1D-CNN, One-dimensional-CNN
- AC, Accuracy
- AUC, Area under the curve
- CNN, Convolutional neural network
- Convolutional neural network
- DT, Decision tree
- F1, F1-score
- HV-PPIs, Human-virus PPIs
- HuV-PPI, Human–unknown virus PPI
- Human
- LR, Linear regression
- MCC, Matthews correlation coefficient
- PPIs, Protein-protein interactions
- Protein–protein interaction
- RF, Random forest
- SARS-CoV-2
- SARS-CoV-2, Severe acute respiratory syndrome coronavirus 2
- SN, Sensitivity
- SP, Specificity
- SVM, Support vector machine
- T-SNE, T-distributed stochastic neighbor embedding
- Virus
- W2V, Word2vec
- Word2vec
10
Koca MB, Nourani E, Abbasoğlu F, Karadeniz İ, Sevilgen FE. Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. Comput Biol Chem 2022; 101:107755. [PMID: 36037723 DOI: 10.1016/j.compbiolchem.2022.107755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 07/07/2022] [Accepted: 08/10/2022] [Indexed: 11/03/2022]
Abstract
Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step toward understanding infection mechanisms, and analysis of PHI networks is important for identifying pathogenic diseases. Because experimental detection of PHIs is both time-consuming and expensive, predicting these interactions is a popular problem. Available methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks improve prediction performance; basic network projections, random-walk-based models, or graph neural networks are used to generate these topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences with the Doc2Vec and Byte Pair Encoding methods. These amino acid embeddings are then used as node features to train a modified GraphSAGE model, an improved version of the graph convolutional network. Finally, the hybrid protein embeddings are used to train a binary classifier that predicts whether two given proteins interact. The proposed method is evaluated in comprehensive experiments that test its functionality and compare it with state-of-the-art methods. On the benchmark dataset, the proposed model achieves an area under the curve (AUC) score 3-23% higher than those of its competitors.
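The second stage can be illustrated with one GraphSAGE layer using the mean aggregator. Each node concatenates its own features with the average of its neighbors' features, applies a weight matrix and ReLU, and L2-normalizes the result. This is the standard layer, not the authors' "modified" variant, whose changes the abstract does not describe. `sage_layer` and the toy inputs are hypothetical. In the pipeline, the input features would be the Doc2Vec sequence embeddings and the weights would be trained.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sage_layer(features, adjacency, weight):
    """One GraphSAGE layer with the mean aggregator.

    features:  {node: feature list of length d}
    adjacency: {node: list of neighbor nodes}
    weight:    matrix with 2*d columns, applied to [own features || mean of neighbors]
    """
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h in features.items():
        nbrs = adjacency.get(node, [])
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no neighborhood signal
        out = relu(matvec(weight, h + agg))
        norm = sum(x * x for x in out) ** 0.5
        updated[node] = [x / norm for x in out] if norm > 0 else out
    return updated


feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b", "c"], "b": ["a"], "c": []}
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(sage_layer(feats, adj, identity)["a"])
```

For a virus-human pair, the two resulting hybrid embeddings would then be combined (for example, concatenated) as input to the binary interaction classifier of the third stage.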
Affiliation(s)
- Mehmet Burak Koca
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey
- Esmaeil Nourani
- Department of Information Technology, Faculty of Computer Engineering and Information Technology, Azarbaijan Shahid Madani University, Tabriz, Iran
- Ferda Abbasoğlu
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey
- İlknur Karadeniz
- Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Işık University, İstanbul, Turkey.
- Fatih Erdoğan Sevilgen
- Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey; Institute for Data Science and Artificial Intelligence, Boğaziçi University, İstanbul, Turkey
11
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open
Abstract
Several new viral infections have emerged in the human population and become established as global pandemics. With advances in translational research, the scientific community has developed therapeutics to eradicate or control certain viral infections, such as smallpox and polio, which were responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), still prevail owing to a lack of specific therapeutics, while new pathogenic viral strains or variants continue to emerge because of high genetic recombination or cross-species transmission. Consequently, to combat emerging viral infections, bioinformatics-based strategies have been developed for characterizing viruses and for developing effective new therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including methods for identifying and managing emerging or evolved viral strains, genome analysis of pathogenicity and epidemiology, computational methods for designing viral therapeutics, and databases consolidating information on known pathogenic viruses. This review of generally applicable viral informatics approaches aims to provide an overview of the resources available for these tasks, which may be used to develop additional strategies that improve the quality of translational viral informatics research.
Affiliation(s)
- Sanjay Kumar
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India; Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Geethu S Kumar
- Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India; Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Petr Malý
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
| | - Shiv Bharadwaj
- Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
| | - Pradeep Sharma
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
| | - Vivek Dhar Dwivedi
- Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India.,Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden
12
Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. BIOLOGY 2022; 11:biology11070995. [PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 05/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary: Most traditional high-throughput experiments for identifying potential protein–protein interactions are tedious and laborious. To improve the accuracy of protein–protein interaction prediction, we propose a novel computational method that can efficiently identify unknown protein–protein interactions, and we hope it can provide a helpful idea and tool for proteomics research. Abstract: Protein–protein interactions (PPIs) play an essential role in many cellular functions. However, identifying PPIs through traditional experimental methods is still tedious and time-consuming, so it is necessary to develop computational methods that predict PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences that mainly combines the Locality Preserving Projections (LPP) feature extraction method with a Rotation Forest (RF) classifier. Specifically, we first employ the Position-Specific Scoring Matrix (PSSM), which retains the evolutionary information of a protein, to represent each protein sequence efficiently. The LPP descriptor is then applied to extract feature vectors from the PSSM, and these feature vectors are fed into the RF to obtain the final results. The proposed method was applied to two datasets, Yeast and H. pylori, and obtained average accuracies of 92.81% and 92.56%, respectively. We also compared it with K-nearest neighbors (KNN) and support vector machine (SVM) classifiers to better evaluate the performance of the proposed method. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and promises to be a useful tool for proteomics research.
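The abstract applies Locality Preserving Projections (LPP) to PSSM-derived vectors but gives no parameters. The sketch below is a minimal illustration, not the authors' implementation: it shows only the first stage of LPP as introduced by He and Niyogi, building a k-nearest-neighbour graph with heat-kernel weights and forming the graph Laplacian L = D - W. The projection itself would then come from the generalized eigenproblem X L X^T a = lambda X D X^T a (smallest eigenvalues), which is omitted here. The function names and the parameters `k` and `t` are hypothetical.

```python
import math

def heat_kernel_graph(points, k=2, t=1.0):
    """Build the symmetric k-nearest-neighbour heat-kernel weight matrix W used by LPP."""
    n = len(points)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # indices of the k nearest neighbours of point i (excluding i itself)
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sqdist(points[i], points[j]))
        for j in order[:k]:
            w = math.exp(-sqdist(points[i], points[j]) / t)
            # connect i and j if either one is among the other's k nearest neighbours
            W[i][j] = W[j][i] = w
    return W

def laplacian(W):
    """Return the degree vector D (diagonal entries) and the Laplacian L = D - W."""
    n = len(W)
    D = [sum(row) for row in W]
    L = [[(D[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    return D, L
```

Each row of `L` sums to zero by construction, which is a quick sanity check on the graph before solving the eigenproblem.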
13
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022]
Abstract
Proteins are the basic organic substances that constitute cells; they are the material basis of life activities and guarantee biological function. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. Self-interacting proteins (SIPs) are an important class of protein interaction and play a critical role. The rapid growth of high-throughput experimental techniques for biomolecules has led to a massive influx of available SIP data, and how to use this massive amount of data for scientific research has become a new challenge in related fields such as biology and medicine. In this work, we design SIPGCN, an SIP prediction method that uses a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which describes evolutionary information; their hidden features are then extracted by the GCN; and, finally, a random forest is used to predict whether the proteins interact. In cross-validation experiments, SIPGCN achieved 93.65% accuracy and 99.64% specificity on the human data set, and 90.69% and 99.08%, respectively, on the yeast data set. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIPs and a reliable candidate for guiding future wet-lab experiments.
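The abstract does not specify how SIPGCN builds its graph or how many layers it uses. As a hedged illustration, the sketch below implements one layer of the widely used graph convolution propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. The inputs `A` (adjacency), `H` (node features, for example PSSM-derived values) and `W` (layer weights) are hypothetical and do not reflect SIPGCN's actual configuration.

```python
import math

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    # add self-loops so each node keeps its own features
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # symmetric normalisation of the self-looped adjacency matrix
    norm = [[inv_sqrt[i] * A_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
                for i in range(len(X))]

    out = matmul(matmul(norm, H), W)
    # element-wise ReLU activation
    return [[max(0.0, v) for v in row] for row in out]
```

Stacking several such layers and passing the final node features to a separate classifier (a random forest, in SIPGCN's case) is one way the pipeline described above could be assembled.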
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lin-Lin Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Correspondence: L.-L.W.; L.W.
- Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Correspondence: L.-L.W.; L.W.
- Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
14
Wang Y, Wang L, Wong L, Zhao B, Su X, Li Y, You Z. RoFDT: Identification of Drug–Target Interactions from Protein Sequence and Drug Molecular Structure Using Rotation Forest. BIOLOGY 2022; 11:biology11050741. [PMID: 35625469 PMCID: PMC9138819 DOI: 10.3390/biology11050741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 11/16/2022]
Abstract
As the basis for screening drug candidates, the identification of drug–target interactions (DTIs) plays a crucial role in innovative drug research. However, because wet experiments are inherently small-scale and time-consuming, DTI identification is usually difficult to carry out. In the present study, we developed a computational approach called RoFDT that predicts DTIs by combining a feature-weighted Rotation Forest (FwRF) with protein sequence information. In particular, we first encode protein sequences as numerical matrices using the Position-Specific Scoring Matrix (PSSM), then extract their features using the Pseudo Position-Specific Scoring Matrix (PsePSSM), combine them with drug structure information in the form of molecular fingerprints, and finally feed them into the FwRF classifier. We validated the performance of RoFDT on the Enzyme, GPCR, Ion Channel, and Nuclear Receptor datasets, on which it achieved accuracies of 91.68%, 84.72%, 88.11%, and 78.33%, respectively. RoFDT shows excellent performance in comparison with support vector machine models and previous superior approaches. Furthermore, 7 of the top 10 DTIs ranked by RoFDT score were confirmed by the relevant database. These results demonstrate that RoFDT can be employed as a powerful predictive approach for DTIs, providing theoretical support for innovative drug discovery.
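PsePSSM turns a variable-length L x 20 PSSM into a fixed-length vector. The sketch below assumes the common formulation (the mean of each column, plus, for each lag, the mean squared difference between rows that far apart) and assumes the PSSM has already been normalized. The lag RoFDT actually uses is not stated in the abstract, so `max_lag` is a hypothetical parameter.

```python
def pse_pssm(pssm, max_lag=1):
    """PsePSSM-style descriptor: column means plus one lag term per column per lag.

    pssm: list of L rows, each a list of per-residue scores (20 columns for a real PSSM).
    Assumes the matrix is already normalised.
    """
    L = len(pssm)
    cols = len(pssm[0])
    # column averages summarise overall evolutionary conservation
    features = [sum(row[j] for row in pssm) / L for j in range(cols)]
    for lag in range(1, max_lag + 1):
        if lag >= L:
            raise ValueError("lag must be shorter than the sequence")
        for j in range(cols):
            # mean squared difference between residues `lag` positions apart
            theta = sum((pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(L - lag)) / (L - lag)
            features.append(theta)
    return features
```

For a 20-column PSSM the output has 20 + 20 x `max_lag` values regardless of sequence length, which is what lets proteins of different lengths be concatenated with fixed-size drug fingerprints.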
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Correspondence: L.W.; Z.Y.; Tel.: +86-151-0632-2257 (L.W.); +86-173-9276-3836 (Z.Y.)
- Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Bowei Zhao
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Xiaorui Su
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Zhuhong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
- Correspondence: L.W.; Z.Y.; Tel.: +86-151-0632-2257 (L.W.); +86-173-9276-3836 (Z.Y.)
15
Okoh OS, Nii-Trebi NI, Jakkari A, Olaniran TT, Senbadejo TY, Kafintu-Kwashie AA, Dairo EO, Ganiyu TO, Akaninyene IE, Ezediuno LO, Adeosun IJ, Ockiya MA, Jimah EM, Spiro DJ, Oladipo EK, Trovão NS. Epidemiology and genetic diversity of SARS-CoV-2 lineages circulating in Africa. iScience 2022; 25:103880. [PMID: 35156006 PMCID: PMC8817759 DOI: 10.1016/j.isci.2022.103880] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2021] [Revised: 11/29/2021] [Accepted: 02/03/2022] [Indexed: 12/15/2022]
Abstract
There is a dearth of information on COVID-19 disease dynamics in Africa. To fill this gap, we investigated the epidemiology and genetic diversity of SARS-CoV-2 lineages circulating in the continent. We retrieved 5229 complete genomes collected in 33 African countries from the GISAID database. We investigated the circulating diversity, reconstructed the viral evolutionary divergence and history, and studied the case and death trends in the continent. Almost a fifth (144/782, 18.4%) of Pango lineages found worldwide circulated in Africa, with five different lineages dominating over time. Phylogenetic analysis revealed that African viruses cluster more closely with those from Europe. We also identified two motifs that could function as integrin-binding sites and N-glycosylation domains. These results shed light on the epidemiological and evolutionary dynamics of the circulating viral diversity in Africa. They also emphasize the need to expand surveillance efforts in Africa to help inform and implement better public health measures.
Highlights
- SARS-CoV-2 viruses from Africa cluster predominantly with European strains
- Lower viral diversity observed in Africa is likely due to genomic under-surveillance
- Number of cases, deaths, and testing show substantial heterogeneity across Africa
- Two motifs could function as integrin-binding sites and N-glycosylation domains
Affiliation(s)
- Nicholas Israel Nii-Trebi
- Department of Medical Laboratory Sciences, School of Biomedical and Allied Health Sciences, University of Ghana, Accra, Ghana
- Abdulrokeeb Jakkari
- Department of Microbiology, Faculty of Science, Lagos State University, Ojo, Lagos, Nigeria
- Tosin Titus Olaniran
- Department of Pure and Applied Biology (Microbiology Unit), Ladoke Akintola University of Technology, Ogbomoso, Nigeria; Helix Biogen Institute, Ogbomoso, Nigeria
- Tosin Yetunde Senbadejo
- Department of Biological Sciences, College of Natural and Applied Sciences, Fountain University, Osogbo, Nigeria
- Anna Aba Kafintu-Kwashie
- Department of Medical Microbiology, Clinical Virology Unit, University of Ghana Medical School, Accra, Ghana
- Emmanuel Oluwatobi Dairo
- Helix Biogen Institute, Ogbomoso, Nigeria; Department of Virology, College of Medicine, University of Ibadan, Ibadan, Nigeria
- Tajudeen Oladunni Ganiyu
- Department of Biological Sciences, College of Natural and Applied Sciences, Fountain University, Osogbo, Nigeria
- Ifiokakaninyene Ekpo Akaninyene
- Department of Pure and Applied Biology (Microbiology Unit), Ladoke Akintola University of Technology, Ogbomoso, Nigeria; Helix Biogen Institute, Ogbomoso, Nigeria
- Louis Odinakaose Ezediuno
- Department of Microbiology, Faculty of Life Sciences, University of Ilorin, P.M.B. 1515, Ilorin, Nigeria
- Idowu Jesulayomi Adeosun
- Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun, Nigeria; Division of Microbiology, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Private Bag X20, Hatfield Pretoria 0028, South Africa
- Michael Asebake Ockiya
- Department of Animal Science, Niger Delta University, Wilberforce Island, Bayelsa, Nigeria
- Esther Moradeyo Jimah
- Helix Biogen Institute, Ogbomoso, Nigeria; Department of Medical Microbiology and Parasitology, University of Ilorin, P.M.B. 1515, Ilorin, Nigeria
- David J Spiro
- Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
- Elijah Kolawole Oladipo
- Helix Biogen Institute, Ogbomoso, Nigeria; Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun, Nigeria
- Nídia S Trovão
- Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
16
Machine Learning Approaches for Discriminating Bacterial and Viral Targeted Human Proteins. Processes (Basel) 2022. [DOI: 10.3390/pr10020291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
Infectious diseases are among the core biological complications affecting public health, and recognizing pathogen-specific mechanisms is important for improving our understanding of them. Differentiating between bacterial- and viral-targeted human proteins is important for improving both prognosis and treatment for the patient. Here, we introduce machine learning-based classifiers to discriminate between these two groups of human proteins, using sequence, network, and gene ontology features. Among the different classifiers and features, the deep neural network (DNN) classifier with amino acid composition (AAC), dipeptide composition (DC), and pseudo-amino acid composition (PAAC) features (445 features in total) achieved the best area under the curve (AUC) value (0.939), F1-score (94.9%), and Matthews correlation coefficient (MCC) value (0.81). We found that the top 100 bacteria-targeted and top 100 virus-targeted human proteins, selected from candidate pools of 1618 and 3916 proteins, respectively, were each part of distinct enriched biological processes and pathways. Our proposed method will help to differentiate between bacterial and viral infections based on the targeted human proteins on a global scale. Furthermore, identifying the crucial pathogen targets in the human proteome would help us better understand pathogen-specific infection strategies and develop novel therapeutics.
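AAC and DC are simple composition descriptors: AAC contributes 20 features and DC 400, so together they account for 420 of the 445 features mentioned, with the remainder presumably coming from PAAC, whose parameters the abstract does not give. The sketch below computes AAC and DC only; it is an illustration under those assumptions, not the authors' feature pipeline, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues (20 features)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)  # denominator still counts all adjacent pairs
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        if p in counts:  # pairs containing non-standard residues are not counted
            counts[p] += 1
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating `aac(seq) + dipeptide_composition(seq)` gives a 420-dimensional vector per protein that a classifier such as a DNN can consume directly.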
17
Li M, Zhang W. PHIAF: prediction of phage-host interactions with GAN-based data augmentation and sequence-based feature fusion. Brief Bioinform 2021; 23:6362109. [PMID: 34472593 DOI: 10.1093/bib/bbab348] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/02/2021] [Revised: 07/05/2021] [Accepted: 07/18/2021] [Indexed: 01/01/2023]
Abstract
Phage therapy has become one of the most promising alternatives to antibiotics in the treatment of bacterial diseases, and identifying phage-host interactions (PHIs) helps to understand the possible mechanisms through which a phage infects bacteria, which can guide the development of phage therapy. Compared with wet experiments, computational methods for identifying PHIs reduce costs, save time, and are more effective and economical. In this paper, we propose PHIAF, a PHI prediction method with generative adversarial network (GAN)-based data augmentation and sequence-based feature fusion. First, PHIAF applies a GAN-based data augmentation module that generates pseudo PHIs to alleviate data scarcity. Second, PHIAF fuses features derived from DNA and protein sequences for better performance. Third, PHIAF uses an attention mechanism to weigh the different contributions of DNA- and protein-sequence-derived features, which also makes the prediction model interpretable. In computational experiments, PHIAF outperforms other state-of-the-art PHI prediction methods when evaluated via 5-fold cross-validation (AUC and AUPR of 0.88 and 0.86, respectively). An ablation study shows that data augmentation, feature fusion, and the attention mechanism all improve the prediction performance of PHIAF. Additionally, four new PHIs with the highest PHIAF scores in the case study were verified by recent literature. In conclusion, PHIAF is a promising tool to accelerate the exploration of phage therapy.
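The abstract does not describe PHIAF's attention module in detail. As a hedged, generic illustration of attention-weighted fusion, the sketch below scores each feature source (for example, a DNA-derived and a protein-derived vector of equal length) with a hypothetical learned vector, turns the scores into softmax weights, and returns the weighted sum. The names and the scoring scheme are assumptions, not PHIAF's architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(feature_vectors, score_weights):
    """Fuse equal-length feature vectors with softmax attention weights.

    score_weights holds one hypothetical learned vector per feature source;
    a source's attention score is the dot product of its features with that vector.
    """
    scores = [sum(f * w for f, w in zip(vec, sw))
              for vec, sw in zip(feature_vectors, score_weights)]
    alphas = softmax(scores)
    dim = len(feature_vectors[0])
    fused = [sum(a * vec[k] for a, vec in zip(alphas, feature_vectors)) for k in range(dim)]
    return fused, alphas
```

The returned `alphas` are what gives this style of fusion its interpretability: they show how much each source contributed to a given prediction.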
Affiliation(s)
- Menglu Li
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
18
Computational predictions for protein sequences of COVID-19 virus via machine learning algorithms. Med Biol Eng Comput 2021; 59:1723-1734. [PMID: 34291385 PMCID: PMC8295007 DOI: 10.1007/s11517-021-02412-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 07/09/2021] [Indexed: 10/31/2022]
Abstract
The rapid spread of coronavirus disease (COVID-19) has become a worldwide pandemic, affecting more than 15 million reported patients in 27 countries. Therefore, the computational biology of this virus in relation to the human population urgently needs to be understood. In this paper, machine learning algorithms are used to classify the human protein sequences of COVID-19 according to country. The proposed model distinguishes 9238 sequences in three stages: data preprocessing, data labeling, and classification. In the first stage, data preprocessing converts the amino acids of COVID-19 protein sequences into eight groups of numbers based on the amino acids' volume and dipole, following the conjoint triad (CT) method. In the second stage, two methods are used to label the data from 27 countries (0 to 26): the first assigns one number to each country according to the countries' code numbers, while the second uses binary elements for each country. In the last stage, machine learning algorithms are used to classify the COVID-19 protein sequences according to their countries. The obtained results demonstrate 100% accuracy, 100% sensitivity, and 90% specificity with the country-based binary labeling method and a linear support vector machine (SVM) classifier. Furthermore, with its large volume of infection data, the USA is more likely to be classified correctly than other countries with fewer data. Data imbalance across COVID-19 protein sequences is considered a major issue, especially as the USA's data represent 76% of the total 9238 sequences. The proposed model will act as a prediction tool for COVID-19 protein sequences in different countries.
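The abstract groups amino acids by volume and dipole following the conjoint triad (CT) method but mentions eight groups, while the original CT method of Shen et al. (2007) uses the seven classes listed below. This sketch uses those seven classes and simply skips windows that contain non-standard residues; how the cited paper defines its eighth group is not stated, so that part is left out as an assumption. It returns raw triad counts, since published variants normalize them differently.

```python
from itertools import product

# Seven amino-acid classes of the original conjoint triad method (Shen et al., 2007),
# grouped by dipole and side-chain volume.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CT_CLASSES) for aa in group}
TRIADS = list(product(range(len(CT_CLASSES)), repeat=3))  # 7**3 = 343 triads

def conjoint_triad(seq):
    """Return raw counts of the 343 class triads over a sliding window of width 3."""
    classes = [CLASS_OF.get(aa) for aa in seq.upper()]
    counts = {t: 0 for t in TRIADS}
    for i in range(len(classes) - 2):
        window = tuple(classes[i:i + 3])
        if None not in window:  # skip windows containing non-standard residues
            counts[window] += 1
    return [counts[t] for t in TRIADS]
```

Because every sequence maps to the same 343-dimensional vector, sequences of different lengths can be fed directly to a linear SVM, as the abstract describes.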
19
Yang X, Yang S, Lian X, Wuchty S, Zhang Z. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction. Bioinformatics 2021; 37:4771-4778. [PMID: 34273146 PMCID: PMC8406877 DOI: 10.1093/bioinformatics/btab533] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 02/23/2021] [Revised: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 11/20/2022]
Abstract
Motivation: To complement experimental efforts, machine learning-based computational methods are playing an increasingly important role in predicting human–virus protein–protein interactions (PPIs). Furthermore, transfer learning can effectively apply prior knowledge obtained from a large source dataset/task to a small target dataset/task, improving prediction performance. Results: To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron. Our architecture outperforms machine learning methods based on various feature encodings as well as state-of-the-art prediction methods. As our main contribution, we introduce two transfer learning methods (i.e. ‘frozen’ type and ‘fine-tuning’ type) that reliably predict interactions in a target human–virus domain based on training in a source human–virus domain, by retraining CNN layers. Finally, we use the ‘frozen’ type transfer learning approach to predict human–SARS-CoV-2 PPIs, indicating that our predictions are topologically and functionally similar to experimentally known interactions. Availability and implementation: The source code and datasets are available at https://github.com/XiaodiYangCAU/TransPPI/. Supplementary information: Supplementary data are available at Bioinformatics online.
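The core idea of a Siamese architecture is that both proteins pass through the same encoder with shared weights. The sketch below is a deliberately tiny, hypothetical illustration in plain Python: one convolution layer with global max pooling serves as the shared encoder, the two embeddings are combined by an element-wise product (which makes the score independent of input order), and a single linear layer stands in for the multi-layer perceptron. It is not the TransPPI code; consult the linked repository for the authors' implementation.

```python
def conv1d_maxpool(profile, kernel):
    """Shared encoder: 1-D convolution over positions, ReLU, then global max pooling.

    profile: list of per-position feature vectors (e.g. rows of a sequence profile).
    kernel: list of filters; each filter is a list of `width` weight vectors.
    Assumes the profile is at least as long as the filter width.
    """
    width = len(kernel[0])
    pooled = []
    for filt in kernel:
        responses = []
        for start in range(len(profile) - width + 1):
            s = sum(w * x
                    for offset, wvec in enumerate(filt)
                    for w, x in zip(wvec, profile[start + offset]))
            responses.append(max(0.0, s))  # ReLU
        pooled.append(max(responses))  # global max pooling over positions
    return pooled

def siamese_score(profile_a, profile_b, kernel, out_weights):
    """Encode both proteins with the SAME kernel, then score the pair linearly."""
    ea = conv1d_maxpool(profile_a, kernel)
    eb = conv1d_maxpool(profile_b, kernel)
    # element-wise product makes the pair representation order-invariant
    pair = [x * y for x, y in zip(ea, eb)]
    return sum(w * p for w, p in zip(out_weights, pair))
```

In a typical 'frozen' transfer setup, encoder weights such as `kernel` learned on the source domain would be kept fixed while later layers are retrained on the target domain; in 'fine-tuning' they would be updated as well. Which layers the authors freeze or retrain is not stated in the abstract.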
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xianyi Lian
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Stefan Wuchty
- Dept. of Computer Science, University of Miami, Miami, FL 33146, USA; Dept. of Biology, University of Miami, Miami, FL 33146, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
20
Okoh OS, Nii-Trebi NI, Jakkari A, Olaniran TT, Senbadejo TY, Kafintu-kwashie AA, Dairo EO, Ganiyu TO, Akaninyene IE, Ezediuno LO, Adeosun IJ, Ockiya MA, Jimah EM, Spiro DJ, Oladipo EK, Trovão NS. Epidemiology and genetic diversity of SARS-CoV-2 lineages circulating in Africa. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.05.17.21257341. [PMID: 34031660 PMCID: PMC8142660 DOI: 10.1101/2021.05.17.21257341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
COVID-19 disease dynamics have been widely studied in different settings around the globe, but little is known about these patterns in the African continent. To investigate the epidemiology and genetic diversity of SARS-CoV-2 lineages circulating in Africa, more than 2400 complete genomes from 33 African countries were retrieved from the GISAID database and analyzed. We investigated their diversity using various clade and lineage nomenclature systems, reconstructed their evolutionary divergence and history using maximum likelihood inference methods, and studied the case and death trends in the continent. We also examined potential repeat patterns and motifs across the sequences. In this study, we show that after almost one year of the COVID-19 pandemic, only 143 out of the 782 Pango lineages found worldwide circulated in Africa, with five different lineages dominating in distinct periods of the pandemic. Analysis of the number of reported deaths in Africa also revealed large heterogeneity across the continent. Phylogenetic analysis revealed that African viruses cluster closely with those from all continents but more notably with viruses from Europe. However, the extent of viral diversity observed among African genomes is closest to that of the Oceania outbreak, most likely due to genomic under-surveillance in Africa. We also identified two motifs that could function as integrin-binding sites and N-glycosylation domains. These results shed light on the evolutionary dynamics of the circulating viral strains in Africa, elucidate the functions of protein motifs present in the genome sequences, and emphasize the need to expand genomic surveillance efforts in the continent to better understand the molecular, evolutionary, epidemiological, and spatiotemporal dynamics of the COVID-19 pandemic in Africa.
Affiliation(s)
- Nicholas Israel Nii-Trebi
- Department of Medical Laboratory Sciences, School of Biomedical and Allied Health Sciences, University of Ghana, Accra, Ghana
- Abdulrokeeb Jakkari
- Department of Microbiology, Faculty of Science, Lagos State University, Ojo, Lagos, Nigeria
- Tosin Titus Olaniran
- Department of Pure and Applied Biology (Microbiology Unit), Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Helix Biogen Institute, Ogbomoso, Nigeria
- Tosin Yetunde Senbadejo
- Department of Biological Sciences, College of Natural and Applied Sciences, Fountain University, Osogbo, Nigeria
- Anna Aba Kafintu-kwashie
- Department of Medical Microbiology, Clinical Virology Unit, University of Ghana Medical School, Accra, Ghana
- Emmanuel Oluwatobi Dairo
- Helix Biogen Institute, Ogbomoso, Nigeria
- Department of Virology, College of Medicine, University of Ibadan, Ibadan, Nigeria
- Tajudeen Oladunni Ganiyu
- Department of Biological Sciences, College of Natural and Applied Sciences, Fountain University, Osogbo, Nigeria
- Ifiokakaninyene Ekpo Akaninyene
- Department of Pure and Applied Biology (Microbiology Unit), Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Helix Biogen Institute, Ogbomoso, Nigeria
- Idowu Jesulayomi Adeosun
- Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun State, Nigeria
- Michael Asebake Ockiya
- Department of Animal Science, Niger Delta University, Wilberforce Island, Bayelsa State, Nigeria
- Esther Moradeyo Jimah
- Helix Biogen Institute, Ogbomoso, Nigeria
- Department of Medical Microbiology and Parasitology, University of Ilorin, Nigeria
- David J. Spiro
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA
- Elijah Kolawole Oladipo
- Helix Biogen Institute, Ogbomoso, Nigeria
- Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun State, Nigeria
- Nídia S. Trovão
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA
21
Das JK, Chakraborty S, Roy S. A scheme for inferring viral-host associations based on codon usage patterns identifies the most affected signaling pathways during COVID-19. J Biomed Inform 2021; 118:103801. [PMID: 33965637 PMCID: PMC8102073 DOI: 10.1016/j.jbi.2021.103801] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/24/2020] [Revised: 05/02/2021] [Accepted: 05/03/2021] [Indexed: 12/16/2022]
Abstract
Understanding the molecular mechanism of COVID-19 pathogenesis helps in rapid therapeutic target identification. Usually, viral proteins target host proteins in an organized fashion, and the expression of any viral gene depends mostly on the host translational machinery. Recent studies report the great significance of codon usage biases in establishing host-viral protein–protein interactions (PPIs). Exploring the codon usage patterns of pairs of co-evolved host and viral proteins may offer novel insight into host-viral protein interactomes during disease pathogenesis. Leveraging the similarity in codon usage patterns, we propose a computational scheme to recreate the host-viral protein–protein interaction network. We use host proteins from seventeen (17) essential signaling pathways to understand the possible targeting mechanism of SARS-CoV-2 proteins, and we infer both negatively and positively interacting edges in the network. Further, extensive analysis is performed to understand the topology of the host PPI network and the attacking behavior of the viral proteins. Our study reveals that viral proteins mostly use codons that are rare in the targeted host proteins (negatively correlated interactions). Among the viral proteins, the non-structural protein NSP3 and the structural protein Spike (S) are the most influential in interacting with multiple host proteins. When ranking the most affected pathways, MAPK pathways are observed to be the worst affected during SARS-CoV-2 infection. Several proteins participating in multiple pathways are highly central in the host PPI network and are mostly targeted by multiple viral proteins. We observe many potential targets (host proteins) in the affected pathways that are associated with various drug molecules, including Arsenic trioxide, Dexamethasone, Hydroxychloroquine, Ritonavir, and Interferon beta, which are either under clinical trial or in use during COVID-19.
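The abstract does not give the exact similarity measure or edge-inference rule, so the sketch below is a hedged illustration of the general idea only: represent each coding sequence by its 64 codon frequencies, correlate a host and a viral profile with Pearson's r, and call a pair a positive or negative edge when the correlation exceeds a hypothetical `threshold` in either direction. Pearson correlation, the threshold value, and all function names are assumptions, not the authors' scheme.

```python
import math
from collections import Counter

BASES = "ACGT"
ALL_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

def codon_frequencies(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    total = len(codons)
    return [counts[c] / total for c in ALL_CODONS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def codon_edge(host_cds, viral_cds, threshold=0.3):
    """Label a host-viral pair as a 'positive' or 'negative' edge, or None if weak."""
    r = pearson(codon_frequencies(host_cds), codon_frequencies(viral_cds))
    if r >= threshold:
        return "positive", r
    if r <= -threshold:
        return "negative", r
    return None, r
```

Under this reading, the abstract's "negatively correlated interactions" would correspond to pairs where the viral gene favours codons that the host gene rarely uses.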
Affiliation(s)
- Jayanta Kumar Das
- Department of Pediatrics, Johns Hopkins University, School of Medicine, MD, USA
- Swarup Roy
- Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India.
22
Karabulut OC, Karpuzcu BA, Türk E, Ibrahim AH, Süzek BE. ML-AdVInfect: A Machine-Learning Based Adenoviral Infection Predictor. Front Mol Biosci 2021; 8:647424. [PMID: 34026828 PMCID: PMC8139618 DOI: 10.3389/fmolb.2021.647424] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/29/2020] [Accepted: 04/22/2021] [Indexed: 01/08/2023]
Abstract
Adenoviruses (AdVs) constitute a diverse family with many pathogenic types that infect a broad range of hosts. Understanding the pathogenesis of adenoviral infections is not only clinically relevant but also important to elucidate the potential use of AdVs as vectors in therapeutic applications. For an adenoviral infection to occur, attachment of the viral ligand to a cellular receptor on the host organism is a prerequisite and, in this sense, it is a criterion to decide whether an adenoviral infection can potentially happen. The interaction between any virus and its corresponding host organism is a specific kind of protein-protein interaction (PPI) and several experimental techniques, including high-throughput methods are being used in exploring such interactions. As a result, there has been accumulating data on virus-host interactions including a significant portion reported at publicly available bioinformatics resources. There is not, however, a computational model to integrate and interpret the existing data to draw out concise decisions, such as whether an infection happens or not. In this study, accepting the cellular entry of AdV as a decisive parameter for infectivity, we have developed a machine learning, more precisely support vector machine (SVM), based methodology to predict whether adenoviral infection can take place in a given host. For this purpose, we used the sequence data of the known receptors of AdVs, we identified sets of adenoviral ligands and their respective host species, and eventually, we have constructed a comprehensive adenovirus–host interaction dataset. Then, we committed interaction predictions through publicly available virus-host PPI tools and constructed an AdV infection predictor model using SVM with RBF kernel, with the overall sensitivity, specificity, and AUC of 0.88 ± 0.011, 0.83 ± 0.064, and 0.86 ± 0.030, respectively. 
ML-AdVInfect is the first effective predictor of its kind for screening infection capacity and anticipating cross-species shifts. We anticipate that the approach that led to ML-AdVInfect can be adapted to make predictions for other viral infections.
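The abstract names an SVM with an RBF kernel and reports sensitivity and specificity. As a minimal sketch (not the authors' code), the kernel and the two metrics can be written out directly; the `gamma` value and the toy label vectors below are assumptions chosen only for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).

    gamma=0.5 is an illustrative default, not a value from the study.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 = infection predicted/observed, 0 = no infection.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return sensitivity, specificity


if __name__ == "__main__":
    print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # exp(-1), about 0.368
    # Hypothetical labels: 3 of 4 positives and 3 of 4 negatives are correct.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
```

Sensitivity measures how many true infections the model catches, while specificity measures how many non-infections it correctly rejects; reporting both, as the abstract does, guards against a classifier that simply favours one class.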
Affiliation(s)
- Onur Can Karabulut
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Ahmad Hassan Ibrahim
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey; Georgetown University Medical Center, Biochemistry and Molecular and Cellular Biology, Washington, DC, United States
23
Liu-Wei W, Kafkas Ş, Chen J, Dimonaco NJ, Tegnér J, Hoehndorf R. DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021; 37:2722-2729. [PMID: 33682875 PMCID: PMC8428617 DOI: 10.1093/bioinformatics/btab147] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/13/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. RESULTS We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. AVAILABILITY Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.
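DeepViral jointly learns from protein sequences and phenotype-based embeddings. The sketch below only illustrates the two general ideas involved, under stated assumptions: concatenating a sequence-derived vector with a phenotype embedding into one feature vector, and comparing a human protein and a virus by cosine similarity once both live in the same space. The vectors and helper names are hypothetical; this is not DeepViral's architecture or scoring function, which are in the authors' repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def joint_features(sequence_vec, phenotype_vec):
    """Concatenate a sequence-derived vector with a phenotype-derived embedding
    (hypothetical helper illustrating joint input features)."""
    return list(sequence_vec) + list(phenotype_vec)


if __name__ == "__main__":
    # Hypothetical 3-dimensional phenotype embeddings in a shared space.
    human_protein = [0.9, 0.1, 0.0]
    virus = [0.8, 0.2, 0.1]
    print(round(cosine_similarity(human_protein, virus), 3))
    # Hypothetical 2-dimensional sequence features joined with the embedding.
    print(joint_features([1.0, 0.0], human_protein))
```

The point of a shared space is that entities of different kinds (host proteins and whole viruses) become directly comparable vectors; a learned model can then combine that signal with sequence features rather than relying on sequences alone.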
Affiliation(s)
- Wang Liu-Wei
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Jun Chen
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Nicholas J Dimonaco
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY23 3BQ, Wales, UK
- Jesper Tegnér
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia; Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
24
Khorsand B, Savadi A, Naghibzadeh M. SARS-CoV-2-human protein-protein interaction network. Inform Med Unlocked 2020; 20:100413. [PMID: 32838020 PMCID: PMC7425553 DOI: 10.1016/j.imu.2020.100413] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 06/20/2020] [Revised: 07/11/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the novel coronavirus that caused the coronavirus disease 2019 pandemic, infecting more than 12 million people and resulting in over 560,000 deaths in 213 countries around the world. The absence of symptoms during the first week of infection increases the rate at which the virus spreads. The rising number of infected individuals and the high mortality necessitate the immediate development of proper diagnostic methods and effective treatments. SARS-CoV-2, like other viruses, needs to interact with host proteins to reach the host cells and replicate its genome. Consequently, identification of virus-host protein-protein interactions (PPIs) could be useful in predicting the behavior of the virus and in designing antiviral drugs. Identifying virus-host PPIs with experimental approaches is very time-consuming and expensive, so computational approaches could be acceptable alternatives for many preliminary investigations. In this study, we developed a new method to predict SARS-CoV-2-human PPIs. Our model is a three-layer network in which the first layer contains the Alphainfluenzavirus proteins most similar to SARS-CoV-2 proteins. The second layer contains protein-protein interactions between Alphainfluenzavirus proteins and human proteins. The last layer reveals protein-protein interactions between SARS-CoV-2 proteins and human proteins by applying the clustering coefficient network property to the first two layers. To further analyze the results of our prediction network, we investigated the human proteins targeted by SARS-CoV-2 proteins and reported the most central human proteins in the human PPI network. Moreover, differentially expressed genes from previous studies were investigated, and the SARS-CoV-2-human PPIs whose human proteins correspond to upregulated genes were reported.
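The last layer of the model relies on the clustering coefficient network property. For reference, the standard local clustering coefficient of a node v with degree k is C(v) = 2L / (k(k - 1)), where L is the number of edges among v's neighbours. A minimal Python sketch on a toy undirected graph follows; how the authors combine this property with the Alphainfluenzavirus similarity layer is not shown, and the node names are hypothetical.

```python
from itertools import combinations


def add_edge(graph, u, v):
    """Add an undirected edge u-v to an adjacency-set graph."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def local_clustering(graph, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `graph` maps each node to the set of its neighbours.
    C(v) = 2 * L / (k * (k - 1)), where k is the degree of v and
    L is the number of edges among v's neighbours.
    """
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for degree < 2; conventionally set to 0
    links = sum(1 for u, w in combinations(neighbours, 2) if w in graph.get(u, set()))
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network: A-B-C form a triangle, D hangs off A.
g = {}
for u, v in [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]:
    add_edge(g, u, v)

print(local_clustering(g, "A"))  # 1 of 3 neighbour pairs linked
print(local_clustering(g, "B"))  # 1.0
```

A high clustering coefficient means a node's neighbours tend to interact with each other, which is why the property is often used as evidence that a candidate edge fits into an existing densely connected neighbourhood.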
Affiliation(s)
- Babak Khorsand
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Abdorreza Savadi
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
- Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
25
Khorsand B, Savadi A, Zahiri J, Naghibzadeh M. Alpha influenza virus infiltration prediction using virus-human protein-protein interaction network. Math Biosci Eng 2020; 17:3109-3129. [PMID: 32987519 DOI: 10.3934/mbe.2020176] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
More than ten million deaths make the influenza virus one of the deadliest in history. About half a million severe illnesses are reported annually as a consequence of influenza. Influenza is a parasite that needs the host cellular machinery to replicate its genome. To reach the host, viral proteins need to interact with host proteins. Therefore, identifying the host-virus protein interaction network (HVIN) is one of the crucial steps in treating viral diseases. Because experimental identification of the HVIN is expensive, time-consuming and laborious, researchers are turning to computational methods to obtain a better understanding of it. In this study, several features are extracted from the physicochemical properties of amino acids and combined with different centralities of the human protein-protein interaction network (HPPIN) to predict protein-protein interactions between human proteins and Alphainfluenzavirus proteins (HI-PPIs). Ensemble learning methods were used to predict such PPIs. Our model reached 0.93 accuracy, 0.91 sensitivity and 0.95 specificity. Moreover, a database of 694,522 new PPIs was constructed from the model's prediction results. Further analysis showed that HPPIN centralities, gene ontology semantic similarity and the conjoint triad of virus proteins are the most important features for predicting HI-PPIs.
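The conjoint triad (CT) of virus proteins is named as one of the most important features. As a hedged sketch of the commonly used CT encoding (not necessarily the exact variant used in this study), the 20 standard amino acids are grouped into seven classes and each sequence is described by the counts of its 7^3 = 343 consecutive class triads. Many implementations then normalize these counts; that step is omitted here.

```python
from itertools import product

# Seven amino-acid classes commonly used for the conjoint triad (CT) encoding.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}
# All 7**3 = 343 possible class triads, in a fixed order.
TRIADS = list(product(range(len(CT_CLASSES)), repeat=3))
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-length vector of class-triad counts for a protein sequence.

    Windows containing a residue outside the 20 standard amino acids
    (e.g. X or U) are skipped.
    """
    counts = [0] * len(TRIADS)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for i in range(len(classes) - 2):
        window = classes[i:i + 3]
        if None in window:
            continue
        counts[TRIAD_INDEX[tuple(window)]] += 1
    return counts


if __name__ == "__main__":
    vec = conjoint_triad("AGVC")
    # Two windows: AGV -> (0, 0, 0) and GVC -> (0, 0, 6).
    print(len(vec), sum(vec))  # 343 2
```

Grouping residues into classes keeps the feature vector at a fixed 343 dimensions regardless of sequence length, which is what makes CT convenient to concatenate with other features such as network centralities.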
Affiliation(s)
- Babak Khorsand
- Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
- Abdorreza Savadi
- Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
- Javad Zahiri
- Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mahmoud Naghibzadeh
- Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
26
Sen R, Tagore S, De RK. ASAPP: Architectural Similarity-Based Automated Pathway Prediction System and Its Application in Host-Pathogen Interactions. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:506-515. [PMID: 30281472 DOI: 10.1109/tcbb.2018.2872527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The significance of metabolic pathway prediction is that it can envision viable, unknown transformations that may occur provided the appropriate enzymes are present. It can also facilitate prediction of the consequences of host-pathogen interactions. In this article, we propose a new algorithm, Architectural Similarity-based Automated Pathway Prediction (ASAPP), to predict metabolic pathways based on structural similarity among metabolites. ASAPP takes the two-dimensional structure and molecular weight of metabolites as input and, without knowledge of any externally established reactions, generates a list of probable transformations with an accuracy of 85.09 percent. ASAPP has also been applied to predict the effect of pathogen-liberated toxins on the carbohydrate and lipid pathways of hosts. We analyzed the disruption of host pathways in the presence of toxins and found that some metabolites in glycolysis and the TCA cycle have a high chance of being breakpoints in the pathway. The tool is available at http://asapp.droppages.com/.
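ASAPP ranks probable transformations from metabolite structure and molecular weight. The exact ASAPP scoring is not described here, so the sketch below swaps in a generic, widely used measure of structural similarity, the Tanimoto coefficient over sets of structural features, combined with a molecular-weight window. The metabolite names, feature labels, masses and the `max_mass_delta` threshold are all hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(query, candidates, max_mass_delta=100.0):
    """Rank (name, features, mass) candidates by similarity to `query`.

    Only candidates whose mass lies within `max_mass_delta` of the query's
    mass are kept; this threshold is an illustrative assumption.
    """
    q_name, q_feats, q_mass = query
    scored = [
        (name, tanimoto(q_feats, feats))
        for name, feats, mass in candidates
        if name != q_name and abs(mass - q_mass) <= max_mass_delta
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical metabolites described by made-up fragment labels.
    query = ("M1", {"ring6", "OH", "CHO"}, 180.0)
    candidates = [
        ("M2", {"ring6", "OH", "PO4"}, 260.0),
        ("M3", {"COOH", "CH3"}, 60.0),
        ("M4", {"ring6", "OH", "CHO", "PO4"}, 400.0),
    ]
    print(rank_candidates(query, candidates, max_mass_delta=150.0))
```

Here M4 shares the most features with M1 but is excluded by the mass window, which shows how a second criterion such as molecular weight can prune structurally plausible but chemically unlikely transformations.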
27
Yang X, Yang S, Li Q, Wuchty S, Zhang Z. Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method. Comput Struct Biotechnol J 2019; 18:153-161. [PMID: 31969974 PMCID: PMC6961065 DOI: 10.1016/j.csbj.2019.12.005] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 10/10/2019] [Revised: 11/29/2019] [Accepted: 12/10/2019] [Indexed: 12/11/2022]
Abstract
The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computational methods play an important role in providing testable hypotheses, complementing the experimental determination of large-scale interactomes between species. In this work, we applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich, low-dimensional feature vectors. Training a Random Forest (RF) classifier on a dataset that covers known PPIs between humans and all viruses, we obtained excellent predictive accuracy, outperforming various combinations of machine learning algorithms and commonly used sequence encoding schemes. In a rigorous comparison with three existing human-virus PPI prediction methods, our proposed computational framework also delivered very competitive and promising performance, suggesting that the doc2vec encoding scheme effectively captures the context information of protein sequences pertaining to the corresponding protein-protein interactions. Our approach is freely accessible through our web server as part of our host-pathogen PPI prediction platform (http://zzdlab.com/InterSPPI/). Taken together, we hope the current work not only contributes a useful predictor to accelerate the exploration of human-virus PPIs, but also provides some meaningful insights into human-virus relationships.
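To apply doc2vec, a protein sequence has to be turned into a "document" of "words". A common preprocessing choice, assumed here rather than taken from the paper (the k-mer length and splitting scheme Yang et al. used may differ), is to split each sequence into non-overlapping k-mers in each of the k possible reading frames. A minimal sketch with k = 3:

```python
def kmer_sentences(sequence, k=3):
    """Split a protein sequence into k lists of non-overlapping k-mers,
    one list per reading frame (offsets 0 .. k-1).

    k=3 is an illustrative assumption, not necessarily the value used in the study.
    """
    sentences = []
    for offset in range(k):
        words = [sequence[i:i + k] for i in range(offset, len(sequence) - k + 1, k)]
        sentences.append(words)
    return sentences


if __name__ == "__main__":
    # Hypothetical 8-residue sequence.
    for frame, words in enumerate(kmer_sentences("MKTAYIAK")):
        print(frame, words)
```

The resulting word lists could then be wrapped as tagged documents for a doc2vec implementation such as gensim's `Doc2Vec`, which learns one fixed-length vector per sequence; those vectors become the input features for a classifier like the Random Forest described above.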
Key Words
- AC, Auto Covariance
- ACC, Accuracy
- AUC, area under the ROC curve
- AUPRC, area under the PR curve
- Adaboost, Adaptive Boosting
- CT, Conjoint Triad
- Doc2vec
- Embedding
- Human-virus interaction
- LD, Local Descriptor
- MCC, Matthews correlation coefficient
- ML, machine learning
- MLP, Multiple Layer Perceptron
- MS, mass spectroscopy
- Machine learning
- PPIs, protein-protein interactions
- PR, Precision-Recall
- Prediction
- Protein-protein interaction
- RBF, radial basis function
- RF, Random Forest
- ROC, Receiver Operating Characteristic
- SGD, stochastic gradient descent
- SVM, Support Vector Machine
- Y2H, yeast two-hybrid
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Qinmengge Li
- National Demonstration Center for Experimental Biological Sciences Education, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Stefan Wuchty
- Dept. of Computer Science, University of Miami, Miami, FL 33146, USA
- Dept. of Biology, University of Miami, Miami, FL 33146, USA
- Center of Computational Science, University of Miami, Miami, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China