1
D'Souza SE, Khan K, Jalal K, Hassam M, Uddin R. The Gene Network Correlation Analysis of Obesity to Type 1 Diabetes and Cardiovascular Disorders: An Interactome-Based Bioinformatics Approach. Mol Biotechnol 2024; 66:2123-2143. [PMID: 37606877 DOI: 10.1007/s12033-023-00845-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/29/2023] [Indexed: 08/23/2023]
Abstract
The current study focuses on the importance of Protein-Protein Interactions (PPIs) in biological processes and the potential of targeting PPIs as a new treatment strategy for diseases. Specifically, the study explores the cross-links of the PPI network associated with obesity, type 1 diabetes mellitus (T1DM), and cardiac disease (CD), which is an unexplored area of research. The research aimed to understand the role of highly connected proteins in the network and their potential as drug targets. The methodology involves retrieving genes from the NCBI online gene database, intersecting genes among the three diseases (type 1 diabetes, obesity, and cardiovascular disease) using InteractiVenn, determining suitable drug molecules using NetworkAnalyst, and performing various bioinformatics analyses such as Generic Protein-Protein Interactions, topological properties analysis, function enrichment analysis in terms of GO and Kyoto Encyclopedia of Genes and Genomes (KEGG), gene co-expression network, and protein-drug as well as protein-chemical interaction networks. The study focuses on human subjects. The results identified 12 genes [VEGFA (Vascular Endothelial Growth Factor A), IL6 (Interleukin 6), MTHFR (Methylenetetrahydrofolate reductase), NPPB (Natriuretic Peptide B), RAC1 (Rac Family Small GTPase 1), LMNA (Lamin A/C), UGT1A1 (UDP-glucuronosyltransferase family 1 member A1), RETN (Resistin), GCG (Glucagon), NPPA (Natriuretic Peptide A), RYR2 (Ryanodine receptor 2), and PRKAG2 (Protein Kinase AMP-Activated Non-Catalytic Subunit Gamma 2)] that were shared across the three diseases and could be used as key proteins for protein-drug/chemical interaction. Additionally, the study provides an in-depth understanding of the complex molecular and biological relationships between the three diseases and the cellular mechanisms that lead to their development. The findings have potentially significant implications for the therapy and management of these disorders, including improved treatment efficacy, simpler treatment regimens, cost-effectiveness, a better understanding of the underlying mechanisms, early diagnosis, and personalized medicine. In conclusion, the current study provides new insights into the cross-links of the PPI network associated with obesity, T1DM, and CD, and highlights the potential of targeting PPIs as a new treatment strategy for these prevalent diseases.
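The gene-intersection step described in the abstract can be illustrated with plain set operations. The following is only a minimal sketch, not the authors' pipeline: the 12 shared symbols are taken from the abstract, while the `PLACEHOLDER_*` entries and the `shared_genes` helper are hypothetical and exist only to show how disease-specific genes drop out.

```python
# The 12 symbols below are the shared genes named in the abstract.
SHARED = {
    "VEGFA", "IL6", "MTHFR", "NPPB", "RAC1", "LMNA",
    "UGT1A1", "RETN", "GCG", "NPPA", "RYR2", "PRKAG2",
}


def shared_genes(*gene_sets):
    """Return the sorted gene symbols present in every input collection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)  # keep only symbols seen in all sets so far
    return sorted(common)


# Hypothetical per-disease lists: PLACEHOLDER_* names are NOT real data.
obesity = SHARED | {"PLACEHOLDER_OB1", "PLACEHOLDER_OB2"}
t1dm = SHARED | {"PLACEHOLDER_T1D1"}
cardiac = SHARED | {"PLACEHOLDER_CD1", "PLACEHOLDER_OB1"}

common = shared_genes(obesity, t1dm, cardiac)
print(len(common), common)
```

A symbol that appears in only two of the three lists (here `PLACEHOLDER_OB1`) is excluded, which is the behavior a three-way Venn intersection requires.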
Affiliation(s)
- Sharon Elaine D'Souza
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Lab 103 PCMD Ext., Karachi, 75270, Pakistan
- Kanwal Khan
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Lab 103 PCMD Ext., Karachi, 75270, Pakistan
- Khurshid Jalal
  - HEJ Research Institute of Chemistry International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
- Muhammad Hassam
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Lab 103 PCMD Ext., Karachi, 75270, Pakistan
- Reaz Uddin
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Lab 103 PCMD Ext., Karachi, 75270, Pakistan
2
Gong Y, Li R, Liu Y, Wang J, Cao B, Fu X, Li R, Chen DZ. MR2CPPIS: Accurate prediction of protein-protein interaction sites based on multi-scale Res2Net with coordinate attention mechanism. Comput Biol Med 2024; 176:108543. [PMID: 38744015 DOI: 10.1016/j.compbiomed.2024.108543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/09/2024] [Accepted: 04/28/2024] [Indexed: 05/16/2024]
Abstract
Proteins play a vital role in various biological processes and achieve their functions through protein-protein interactions (PPIs). Thus, accurate identification of PPI sites is essential. Traditional biological methods for identifying PPIs are costly, labor-intensive, and time-consuming. The development of computational prediction methods for PPI sites offers promising alternatives. Most known deep learning (DL) methods employ layer-wise multi-scale CNNs to extract features from protein sequences. But, these methods usually neglect the spatial positions and hierarchical information embedded within protein sequences, which are actually crucial for PPI site prediction. In this paper, we propose MR2CPPIS, a novel sequence-based DL model that utilizes the multi-scale Res2Net with coordinate attention mechanism to exploit multi-scale features and enhance PPI site prediction capability. We leverage the multi-scale Res2Net to expand the receptive field for each network layer, thus capturing multi-scale information of protein sequences at a granular level. To further explore the local contextual features of each target residue, we employ a coordinate attention block to characterize the precise spatial position information, enabling the network to effectively extract long-range dependencies. We evaluate our MR2CPPIS on three public benchmark datasets (Dset 72, Dset 186, and PDBset 164), achieving state-of-the-art performance. The source codes are available at https://github.com/YyinGong/MR2CPPIS.
Affiliation(s)
- Yinyin Gong
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Rui Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Yan Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Jilong Wang
  - Peng Cheng Laboratory, Shenzhen, 518066, China
- Buwen Cao
  - College of Information and Electronic Engineering, Hunan City University, Yiyang, 413002, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Renfa Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China; Hunan Engineering Research Center of Advanced Embedded Computing and Intelligent Medical Systems, Hunan University, Changsha, 410082, China
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
3
Yuan Q, Tian C, Yang Y. Genome-scale annotation of protein binding sites via language model and geometric deep learning. eLife 2024; 13:RP93695. [PMID: 38630609 PMCID: PMC11023698 DOI: 10.7554/elife.93695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Revealing protein binding sites with other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanism elucidation and novel drug design. With the explosive growth of proteins in sequence databases, how to accurately and efficiently identify these binding sites from sequences becomes essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, limiting their genome-scale applications. Besides, these methods haven't fully explored the geometry of the protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well-predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotations for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Chong Tian
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
4
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
5
Song FV, Su J, Huang S, Zhang N, Li K, Ni M, Liao M. DeepSS2GO: protein function prediction from secondary structure. Brief Bioinform 2024; 25:bbae196. [PMID: 38701416 PMCID: PMC11066904 DOI: 10.1093/bib/bbae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/31/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024] Open
Abstract
Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein in three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm expertly combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that the prediction performance surpasses state-of-the-art algorithms. It has the ability to predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.
Affiliation(s)
- Fu V Song
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Jiaqi Su
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Sixing Huang
  - Gemini Data Japan, Kitaku Oujikamiya 1-11-11, 115-0043, Tokyo, Japan
- Neng Zhang
  - Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
- Kaiyue Li
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Ming Ni
  - MGI Tech, Beishan Industrial Zone, 518083, Shenzhen, China
- Maofu Liao
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
  - Institute for Biological Electron Microscopy, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
6
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
7
Fu X, Yuan Y, Qiu H, Suo H, Song Y, Li A, Zhang Y, Xiao C, Li Y, Dou L, Zhang Z, Cui F. AGF-PPIS: A protein-protein interaction site predictor based on an attention mechanism and graph convolutional networks. Methods 2024; 222:142-151. [PMID: 38242383 DOI: 10.1016/j.ymeth.2024.01.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/21/2024] Open
Abstract
Protein-protein interactions play an important role in various biological processes. Interaction among proteins has a wide range of applications. Therefore, the correct identification of protein-protein interaction sites is crucial. In this paper, we propose a novel predictor for protein-protein interaction sites, AGF-PPIS, where we utilize a multi-head self-attention mechanism (introducing a graph structure), graph convolutional network, and feed-forward neural network. We use the Euclidean distances between protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS achieves superior performance over comparative methods in terms of seven different evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which fully demonstrates the validity and superiority of the proposed AGF-PPIS model. The source codes and the steps for usage of AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.
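As an illustration of how a protein graph can be derived from Euclidean distances between residues, the sketch below links two residues whenever their coordinates lie within a distance cutoff. This is a generic construction, not the AGF-PPIS code: the abstract does not state a cutoff or which atom represents a residue, so `cutoff`, `residue_graph`, and the toy coordinates are all assumptions.

```python
import math


def residue_graph(coords, cutoff):
    """Build an undirected adjacency list over residues.

    coords: one (x, y, z) tuple per residue (e.g. one representative atom).
    cutoff: distance threshold; an assumed parameter, not given in the abstract.
    """
    n = len(coords)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


# Toy coordinates (hypothetical): three nearby residues and one distant one.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = residue_graph(toy_coords, cutoff=8.0)
print(graph)
```

The pairwise loop is quadratic in the number of residues, which is acceptable for single chains; a spatial index would be the usual choice for much larger inputs.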
Affiliation(s)
- Xiuhao Fu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Haoye Qiu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Haodong Suo
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yingying Song
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Anqi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yupeng Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Cuilin Xiao
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yazi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, USA
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
8
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations. bioRxiv 2024:2023.04.24.538110. [PMID: 37162909 PMCID: PMC10168245 DOI: 10.1101/2023.04.24.538110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Human genome sequencing studies have identified numerous loci associated with complex diseases. However, translating human genetic and genomic findings to disease pathobiology and therapeutic discovery remains a major challenge at multiscale interactome network levels. Here, we present a deep-learning-based ensemble framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that accurately predicts protein binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms, generating comprehensive structurally-informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods. We further systematically validated PIONEER predictions experimentally through generating 2,395 mutations and testing their impact on 6,754 mutation-interaction pairs, confirming the high quality and validity of PIONEER predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces after mapping mutations from ~60,000 germline exomes and ~36,000 somatic genomes. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from pan-cancer analysis of ~11,000 tumor whole-exomes across 33 cancer types. We show that PIONEER-predicted oncoPPIs are significantly associated with patient survival and drug responses from both cancer cell lines and patient-derived xenograft mouse models. We identify a landscape of PPI-perturbing tumor alleles upon ubiquitination by E3 ligases, and we experimentally validate the tumorigenic KEAP1-NRF2 interface mutation p.Thr80Lys in non-small cell lung cancer. We show that PIONEER-predicted PPI-perturbing alleles alter protein abundance and correlate with drug responses and patient survival in colon and uterine cancers as demonstrated by proteogenomic data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Affiliation(s)
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Yunguang Qiu
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Junfei Zhao
  - Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY 10032, USA
- Yadi Zhou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Dongjin Lee
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Shobhita Gupta
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
  - Biophysics Program, Cornell University, Ithaca, NY 14853, USA
- Mateo Torres
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Weiqiang Lu
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Siqi Liang
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Charis Eng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Joseph Loscalzo
  - Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
9
Ding H, Li X, Han P, Tian X, Jing F, Wang S, Song T, Fu H, Kang N. MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network. Bioinformatics 2024; 40:btae269. [PMID: 38640481 PMCID: PMC11252844 DOI: 10.1093/bioinformatics/btae269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/19/2024] [Accepted: 04/17/2024] [Indexed: 04/21/2024] Open
Abstract
MOTIVATION: Protein-protein interaction sites (PPIS) are crucial for deciphering protein action mechanisms and related medical research, which is the key issue in protein action research. Recent studies have shown that graph neural networks have achieved outstanding performance in predicting PPIS. However, these studies often neglect the modeling of information at different scales in the graph and the symmetry of protein molecules within three-dimensional space. RESULTS: In response to this gap, this article proposes the MEG-PPIS approach, a PPIS prediction method based on multi-scale graph information and E(n) equivariant graph neural network (EGNN). There are two channels in MEG-PPIS: the original graph and the subgraph obtained by graph pooling. The model can iteratively update the features of the original graph and subgraph through the weight-sharing EGNN. Subsequently, the max-pooling operation aggregates the updated features of the original graph and subgraph. Ultimately, the model feeds node features into the prediction layer to obtain prediction results. Comparative assessments against other methods on benchmark datasets reveal that MEG-PPIS achieves optimal performance across all evaluation metrics and gets the fastest runtime. Furthermore, specific case studies demonstrate that our method can predict more true positive and true negative sites than the current best method, proving that our model achieves better performance in the PPIS prediction task. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/dhz234/MEG-PPIS.git.
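The true positive and true negative counts mentioned in the abstract are the inputs to standard binary classification metrics commonly reported for interaction-site predictors. The sketch below computes accuracy, precision, recall, F1, and MCC from per-residue labels using their textbook definitions; `binary_metrics` and the toy labels are hypothetical, and this is not the evaluation script of any paper listed here.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute confusion counts and common metrics for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "acc": (tp + tn) / len(pairs),
        "precision": precision, "recall": recall, "f1": f1, "mcc": mcc,
    }


# Toy labels (hypothetical): 1 marks an interaction-site residue.
metrics = binary_metrics([1, 1, 0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 1, 1, 0, 0])
print(metrics)
```

Because interaction sites are usually a minority of residues, MCC and F1 tend to be more informative than accuracy alone on such imbalanced labels.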
Affiliation(s)
- Hongzhen Ding
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Xue Li
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Peifu Han
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Xu Tian
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Fengrui Jing
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Shuang Wang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Tao Song
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Hanjiao Fu
  - School of Humanities and Law, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Na Kang
  - The Ninth Department of Health Care Administration, the Second Medical Center, Chinese PLA General Hospital, Beijing, 100853, China
10
Hosseini S, Golding GB, Ilie L. Seq-InSite: sequence supersedes structure for protein interaction site prediction. Bioinformatics 2024; 40:btad738. [PMID: 38212995 PMCID: PMC10796176 DOI: 10.1093/bioinformatics/btad738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 11/17/2023] [Accepted: 01/10/2024] [Indexed: 01/13/2024] Open
Abstract
MOTIVATION: Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time-consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable, as the sequences available outnumber the structures by two orders of magnitude. Ideally, we would like a tool that has the quality of the former and the applicability of the latter. RESULTS: We provide here the first solution that achieves these two goals. Our new sequence-based program, Seq-InSite, greatly surpasses the performance of sequence-based models, matching the quality of state-of-the-art structure-based predictors, thus effectively superseding the need for models requiring structure. The predictive power of Seq-InSite is illustrated using an analysis of evolutionary conservation for four protein sequences. AVAILABILITY AND IMPLEMENTATION: Seq-InSite is freely available as a web server at http://seq-insite.csd.uwo.ca/ and as free source code, including trained models and all datasets used for training and testing, at https://github.com/lucian-ilie/Seq-InSite.
Affiliation(s)
- SeyedMohsen Hosseini
  - Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada
- G Brian Golding
  - Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Lucian Ilie
  - Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada
11
Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast, from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues such as obtaining high-quality data in huge quantities. Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, at the time you read this chapter, further development of the algorithms described, or the emergence of new (and better) ones, should come as no surprise.
Affiliation(s)
- Ilona Michalik
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
12
Zeng X, Meng FF, Li X, Zhong KY, Jiang B, Li Y. GHGPR-PPIS: A graph convolutional network for identifying protein-protein interaction site using heat kernel with Generalized PageRank techniques and edge self-attention feature processing block. Comput Biol Med 2024; 168:107683. [PMID: 37984202 DOI: 10.1016/j.compbiomed.2023.107683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/10/2023] [Accepted: 11/06/2023] [Indexed: 11/22/2023]
Abstract
Accurately pinpointing protein-protein interaction sites (PPIS) on the molecular level is of utmost significance for annotating protein function and comprehending the mechanisms underpinning various diseases. Numerous computational methods for predicting PPIS have emerged and have mitigated the labor and time constraints associated with traditional experimental methods. However, the predictive accuracy of these methods has yet to reach the desired threshold. In this context, we proposed a groundbreaking graph-based computational model called GHGPR-PPIS. This innovative model leveraged a graph convolutional network using heat kernel (GraphHeat) in conjunction with Generalized PageRank techniques (GHGPR) to predict PPIS. Additionally, building upon the GHGPR framework, we devised an edge self-attention feature processing block, further augmenting the performance of the model. Experimental findings demonstrated that GHGPR-PPIS surpassed all competing state-of-the-art models when evaluated on the benchmark test set. On two distinct independent test sets and a specific protein chain, GHGPR-PPIS consistently demonstrated superior generalization performance and practical applicability compared to the comparative model, AGAT-PPIS. Lastly, leveraging the t-SNE dimensionality reduction algorithm and clustering visualization technique, we conducted an interpretability analysis of the effectiveness of GHGPR-PPIS by comparing the outputs from different stages of the model.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Fan-Fang Meng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Xin Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Bei Jiang
  - Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from Western Yunnan, Dali University, Dali, 671000, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
13
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND: Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS: We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation, where the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves significant improvement on highly unbalanced data. CONCLUSION: The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Compared with current state-of-the-art methods, our method performs better. In the most important metrics, such as AUPRC and recall, it surpasses the second-best program on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrated that the improvement is essentially due to using the ensemble model and, especially, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
14
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023] Open
Abstract
MOTIVATION: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS: In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION: The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Affiliation(s)
- Yitian Fang
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
- Yi Jiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Leyi Wei
  - School of Software, Shandong University, Jinan, Shandong 250100, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
15
Kewalramani N, Emili A, Crovella M. State-of-the-art computational methods to predict protein-protein interactions with high accuracy and coverage. Proteomics 2023; 23:e2200292. [PMID: 37401192 DOI: 10.1002/pmic.202200292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/24/2023] [Accepted: 06/09/2023] [Indexed: 07/05/2023]
Abstract
Prediction of protein-protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.
Affiliation(s)
- Neal Kewalramani
  - Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
- Andrew Emili
  - OHSU Knight Cancer Institute, Portland, Oregon, USA
- Mark Crovella
  - Department of Computer Science and Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
16
He J, Zhang S, Fang C. AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model. J Bioinform Comput Biol 2023; 21:2350022. [PMID: 37899354 DOI: 10.1142/s0219720023500221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
The polyproline-II (PPII) structure domain is crucial in organisms' signal transduction, transcription, cell metabolism, and immune response. It is also a critical structural domain for specific vital disease-associated proteins. Recognizing PPII is essential for understanding protein structure and function. To accurately predict PPII in proteins, we propose a novel method, AAindex-PPII, which adopts only amino acid indexes to characterize protein sequences and uses a Bidirectional Gated Recurrent Unit (BiGRU)-Improved TextCNN composite deep learning model to predict PPII in proteins. Experimental results show that, when tested on the same datasets, our method outperforms the state-of-the-art BERT-PPII method, achieving an AUC value of 0.845 on the strict data and an AUC value of 0.813 on the non-strict data, which are 0.024 and 0.03 higher, respectively, than those of the BERT-PPII method. This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.
Affiliation(s)
- Jiasheng He
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Shun Zhang
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Chun Fang
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
17
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. Research (Washington, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope because of their heavy dependence on precise structural information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, was therefore proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS was unique in (a) extracting residue interactions from protein sequences with transformer and (b) further integrating global and local sequential features with the ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
18
Wu H, Han J, Zhang S, Xin G, Mou C, Liu J. Spatom: a graph neural network for structure-based protein-protein interaction site prediction. Brief Bioinform 2023; 24:bbad345. [PMID: 37779247 DOI: 10.1093/bib/bbad345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/22/2023] [Accepted: 09/13/2023] [Indexed: 10/03/2023] Open
Abstract
Accurate identification of protein-protein interaction (PPI) sites remains a computational challenge. We propose Spatom, a novel framework for PPI site prediction. This framework first defines a weighted digraph for a protein structure to precisely characterize the spatial contacts of residues, then performs a weighted digraph convolution to aggregate both spatial local and global information and finally adds an improved graph attention layer to drive the predicted sites to form more continuous region(s). Spatom was tested on a diverse set of challenging protein-protein complexes and demonstrated the best performance among all the compared methods. Furthermore, when tested on multiple popular proteins in a case study, Spatom clearly identifies the interaction interfaces and captures the majority of hotspots. Spatom is expected to contribute to the understanding of protein interactions and drug designs targeting protein binding.
Affiliation(s)
- Haonan Wu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
  - School of Mathematics, Shandong University, Jinan 250100, China
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Gaojia Xin
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Chaozhou Mou
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
19
Liu T, Gao H, Ren X, Xu G, Liu B, Wu N, Luo H, Wang Y, Tu T, Yao B, Guan F, Teng Y, Huang H, Tian J. Protein-protein interaction and site prediction using transfer learning. Brief Bioinform 2023; 24:bbad376. [PMID: 37870286 DOI: 10.1093/bib/bbad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/14/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023] Open
Abstract
Advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites using protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representation from Transformers, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained model (MP-BERT) was fine-tuned as MPB-PPI (MP-BERT on PPI) and demonstrated its superiority over the state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs was evaluated across multiple organisms. An amalgamated organism model was designed, exhibiting a high level of generalization across the majority of organisms and attaining an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning in dealing with the protein pair task.
Affiliation(s)
- Tuoyu Liu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Han Gao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaopu Ren
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Guoshun Xu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Liu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ningfeng Wu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Huiying Luo
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yuan Wang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tao Tu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bin Yao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Feifei Guan
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yue Teng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing 100071, China
- Huoqing Huang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Jian Tian
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
20
Roche R, Moussad B, Shuvo MH, Bhattacharya D. E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction. PLoS Comput Biol 2023; 19:e1011435. [PMID: 37651442 PMCID: PMC10499216 DOI: 10.1371/journal.pcbi.1011435] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/10/2023] [Revised: 09/13/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023] Open
Abstract
Artificial intelligence-powered protein structure prediction methods have led to a paradigm-shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available at https://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
21
Li K, Wu H, Yue Z, Sun Y, Xia C. A convolutional network and attention mechanism-based approach to predict protein-RNA binding residues. Comput Biol Chem 2023; 105:107901. [PMID: 37327559 DOI: 10.1016/j.compbiolchem.2023.107901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 06/18/2023]
Abstract
Protein-RNA interactions play a key role in various biological cellular processes, and many experimental and computational studies have been initiated to analyze their interactions. However, experimental determination is quite complex and expensive. Therefore, researchers have worked to develop efficient computational tools to detect protein-RNA binding residues. The accuracy of existing methods is limited by the features of the target and the performance of the computational models; there remains room for improvement. To solve the problem of the accurate detection of protein-RNA binding residues, we propose a convolutional network model named PBRPre based on improved MobileNet. First, by extracting the position information of the target complex and the 3-mer amino acid feature data, the position-specific scoring matrix (PSSM) is improved by using spatial neighbor smoothing processing and discrete wavelet transform to fully exploit the spatial structure information of the target and enrich the feature dataset. Second, the deep learning model MobileNet is used to integrate and optimize the potential features in the target complexes; then, by introducing the Vision Transformer (ViT) network classification layer, the deep-level information of the target is mined to enhance the processing ability of the model for global information and to improve the detection accuracy of the classifiers. The results show that the AUC value of the model can reach 0.866 in the independent testing dataset, which shows that PBRPre can effectively detect protein-RNA binding residues. All datasets and source code of PBRPre are available at https://github.com/linglewu/PBRPre for academic use.
Collapse
Affiliation(s)
- Ke Li
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, Anhui 230601, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China.
- Hongwei Wu
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zhenyu Yue
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yu Sun
- School of Information & Computer, Anhui Agricultural University, Hefei, Anhui 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
- Chuan Xia
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui 230036, China
|
22
|
Yin R, Luo Z, Zhuang P, Zeng M, Li M, Lin Z, Kwoh CK. ViPal: A framework for virulence prediction of influenza viruses with prior viral knowledge using genomic sequences. J Biomed Inform 2023; 142:104388. [PMID: 37178781 PMCID: PMC10602211 DOI: 10.1016/j.jbi.2023.104388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 04/30/2023] [Accepted: 05/07/2023] [Indexed: 05/15/2023]
Abstract
Influenza viruses pose great threats to public health and cause enormous economic losses every year. Previous work has revealed the viral factors associated with the virulence of influenza viruses in mammals. However, taking prior viral knowledge represented by heterogeneous categorical and discrete information into account to explore virus virulence is scarce in the existing work. How to make full use of the preceding domain knowledge in virulence study is challenging but beneficial. This paper proposes a general framework named ViPal for virulence prediction in mice that incorporates discrete prior viral mutation and reassortment information based on all eight influenza segments. The posterior regularization technique is leveraged to transform prior viral knowledge into constraint features and integrated into the machine learning models. Experimental results on influenza genomic datasets validate that our proposed framework can improve virulence prediction performance over baselines. The comparison between ViPal and other existing methods shows the computational efficiency of our framework with comparable or superior performance. Moreover, the interpretable analysis through SHAP (SHapley Additive exPlanations) identifies the scores of constraint features contributing to the prediction. We hope this framework could provide assistance for the accurate detection of influenza virulence and facilitate flu surveillance.
Affiliation(s)
- Rui Yin
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA; School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore.
- Zihan Luo
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Pei Zhuang
- Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Zhuoyi Lin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
|
23
|
Li M, Shi W, Zhang F, Zeng M, Li Y. A Deep Learning Framework for Predicting Protein Functions With Co-Occurrence of GO Terms. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:833-842. [PMID: 35476573 DOI: 10.1109/tcbb.2022.3170719] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The understanding of protein functions is critical to many biological problems such as the development of new drugs and new crops. To reduce the huge gap between the increase of protein sequences and annotations of protein functions, many methods have been proposed to deal with this problem. These methods use Gene Ontology (GO) to classify the functions of proteins and consider one GO term as a class label. However, they ignore the co-occurrence of GO terms that is helpful for protein function prediction. We propose a new deep learning model, named DeepPFP-CO, which uses Graph Convolutional Network (GCN) to explore and capture the co-occurrence of GO terms to improve the protein function prediction performance. In this way, we can further deduce the protein functions by fusing the predicted propensity of the center function and its co-occurrence functions. We use Fmax and AUPR to evaluate the performance of DeepPFP-CO and compare DeepPFP-CO with state-of-the-art methods such as DeepGOPlus and DeepGOA. The computational results show that DeepPFP-CO outperforms DeepGOPlus and other methods. Moreover, we further analyze our model at the protein level. The results have demonstrated that DeepPFP-CO improves the performance of protein function prediction. DeepPFP-CO is available at https://csuligroup.com/DeepPFP/.
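The Fmax metric mentioned in this abstract can be sketched as follows. This is a minimal, protein-centric version in the style commonly used for function-prediction benchmarks, and it is not taken from the DeepPFP-CO code. It assumes that precision is averaged only over proteins with at least one prediction at the threshold, that recall is averaged over all proteins, and that thresholds run from 0.01 to 1.00; the function name `fmax` and the toy GO identifiers are hypothetical.

```python
def fmax(y_true, y_score, thresholds=None):
    """y_true: list of sets of GO terms; y_score: list of dicts mapping term -> score."""
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:  # precision is undefined for proteins with no prediction
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(truth) if truth else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

truth = [{"GO:1", "GO:2"}, {"GO:3"}]
scores = [{"GO:1": 0.9, "GO:2": 0.4, "GO:4": 0.6}, {"GO:3": 0.8}]
score = fmax(truth, scores)
```

Fmax reports the best F-measure over all thresholds, so it does not depend on choosing a single decision cutoff in advance.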
|
24
|
Tian Z, Fang H, Teng Z, Ye Y. GOGCN: Graph Convolutional Network on Gene Ontology for Functional Similarity Analysis of Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1053-1064. [PMID: 35687647 DOI: 10.1109/tcbb.2022.3181300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The measurement of gene functional similarity plays a critical role in numerous biological applications, such as gene clustering and the construction of gene similarity networks. However, most existing approaches still rely heavily on traditional computational strategies, which are not guaranteed to achieve satisfactory performance. In this study, we propose a novel computational approach called GOGCN to measure gene functional similarity by modeling the Gene Ontology (GO) through a Graph Convolutional Network (GCN). GOGCN is a graph-based approach that performs sufficient representation learning for terms and relations in the GO graph. First, GOGCN employs a GCN-based knowledge graph embedding (KGE) model to learn vector representations (i.e., embeddings) for all entities (i.e., terms). Second, GOGCN calculates the semantic similarity between two terms based on their corresponding vector representations. Finally, GOGCN estimates gene functional similarity by using a pair-wise strategy. During representation learning, GOGCN promotes semantic interaction between terms through the GCN, thereby capturing the rich structural information of the GO graph. Experimental results on various datasets suggest that GOGCN is superior to other state-of-the-art approaches, which shows its reliability and effectiveness.
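The last two steps described above, term similarity from embeddings and a pair-wise combination into a gene-level score, can be sketched as below. This is only an illustration under assumptions: cosine similarity is assumed as the term-level measure and best-match average as the pair-wise strategy, neither of which the abstract specifies. The embedding values and term names are made up, and `gene_similarity` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def gene_similarity(terms_a, terms_b, embedding):
    """Best-match average over the GO terms annotating two genes."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(cosine(embedding[s], embedding[t]) for t in terms_b) for s in terms_a]
    best_b = [max(cosine(embedding[t], embedding[s]) for s in terms_a) for t in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))

# Made-up 2-dimensional embeddings standing in for learned GO term vectors.
embedding = {"GO:a": [1.0, 0.0], "GO:b": [0.0, 1.0], "GO:c": [1.0, 1.0]}
sim = gene_similarity(["GO:a", "GO:b"], ["GO:c"], embedding)
```

Best-match average pairs every term of one gene with its most similar term in the other gene, so a gene with many annotations is not penalized for terms that have no counterpart.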
|
25
|
Aybey E, Gümüş Ö. SENSDeep: An Ensemble Deep Learning Method for Protein-Protein Interaction Sites Prediction. Interdiscip Sci 2023; 15:55-87. [PMID: 36346583 DOI: 10.1007/s12539-022-00543-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Revised: 10/15/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022]
Abstract
PURPOSE The determination of which amino acid in a protein interacts with other proteins is important in understanding the functional mechanism of that protein. Although there are experimental methods to detect protein-protein interaction sites (PPISs), these are costly, time-consuming, and require expertise. Therefore, many computational methods have been proposed to accelerate this type of research, but they are generally insufficient to predict PPISs accurately. There is a need for development in this field. METHODS In this study, we introduce a new PPISs prediction method. This method is a sequence-based Stacking ENSemble Deep (SENSDeep) learning method that has an ensemble learning model including the models of RNN, CNN, GRU sequence to sequence (GRUs2s), GRU sequence to sequence with an attention layer (GRUs2satt) and a multilayer perceptron. Two embedded features, secondary structure, and protein sequence information are added to the training data set in addition to twelve existing features to improve the prediction performance of the method. RESULTS SENSDeep trained on the training data set without two extra features obtains a better performance on some of the independent testing data sets than that of the other methods in the literature, especially on scoring metrics of sensitivity, F1, MCC, and AUPRC, having increments up to 63.5%, 19.3%, 18.5%, 11.4%, respectively. It is shown that the added extra features improve the performance of the method by having almost the same performance with less data as the method trained on the data set without these added features. On the other hand, different sizes of the sliding window are tried on the data sets and an optimal sliding window size for SENSDeep is found. Moreover, SENSDeep has also been compared to structure-based methods. Some of these methods have been found to perform better. 
Using SENSDeep obtained by training with both training data sets, PPISs prediction examples of various proteins that are not in these training data sets are also presented. Furthermore, execution times for SENSDeep and its submodels are shown. AVAILABILITY AND IMPLEMENTATION https://github.com/enginaybey/SENSDeep.
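The sliding-window idea mentioned in this abstract can be sketched as follows: each residue is represented by the feature vectors of itself and its sequence neighbors. This is a generic illustration, not the SENSDeep code. Zero padding at the sequence ends and the window size of 3 are assumptions, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(features, size, pad=None):
    """Return one window of `size` feature vectors centred on each residue."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    if pad is None:
        pad = [0.0] * len(features[0])  # assumption: zero vectors beyond the termini
    padded = [pad] * half + list(features) + [pad] * half
    return [padded[i:i + size] for i in range(len(features))]

# Toy input: 3 residues with a single feature each.
windows = sliding_windows([[1.0], [2.0], [3.0]], size=3)
```

Trying several values of `size` and comparing validation scores is one simple way to search for a suitable window, which is the kind of experiment the abstract reports.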
Affiliation(s)
- Engin Aybey
- Department of Health Bioinformatics, Ege University, 35100, Bornova, Izmir, Turkey.
- Rectorate, Marmara University, 34722, Kadıköy, Istanbul, Turkey.
- Özgür Gümüş
- Department of Computer Engineering, Ege University, 35100, Bornova, Izmir, Turkey
|
26
|
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although without well-defined structures, they participate in many important biological processes. In addition, they are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, the computational methods related to IDPs/IDRs have been developed vigorously, including predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs according to different tasks. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
|
27
|
Hosseini S, Ilie L. Predicting Protein Interaction Sites Using PITHIA. Methods Mol Biol 2023; 2690:375-383. [PMID: 37450160 DOI: 10.1007/978-1-0716-3327-4_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
Several proteins work independently, but the majority work together to maintain the functions of the cell. Thus, it is crucial to know the interaction sites that facilitate protein-protein interactions. The development of effective computational methods is essential because experimental methods are expensive and time-consuming. This chapter is a guide to predicting protein interaction sites using the program "PITHIA." First, some installation guides are presented, followed by descriptions of input file formats. Afterward, PITHIA's commands and options are outlined with examples. Moreover, some notes are provided on how to extend PITHIA's installation and usage.
Affiliation(s)
- SeyedMohsen Hosseini
- Department of Computer Science, University of Western Ontario, London, ON, Canada
- Lucian Ilie
- Department of Computer Science, University of Western Ontario, London, ON, Canada.
|
28
|
Li K, Quan L, Jiang Y, Li Y, Zhou Y, Wu T, Lyu Q. ctP2ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:297-306. [PMID: 35213314 DOI: 10.1109/tcbb.2022.3154413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a preprocessing of data augmentation with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
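A weighted loss for imbalanced binary labels, as mentioned in this abstract, can be sketched as below. It is a generic class-weighted binary cross-entropy and not the loss defined in ctP2ISP. The inverse-frequency weight is an assumption, and the names `weighted_bce` and `inverse_frequency_weight` are hypothetical.

```python
import math

def inverse_frequency_weight(y_true):
    """Weight for the positive class: number of negatives per positive."""
    positives = sum(y_true)
    return (len(y_true) - positives) / positives if positives else 1.0

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

labels = [1, 0, 0, 0]  # one interaction site among four residues
weight = inverse_frequency_weight(labels)
loss = weighted_bce(labels, [0.5, 0.5, 0.5, 0.5], weight)
```

With this weighting, the single positive residue contributes as much to the loss as the three negatives combined, which counteracts the bias toward the more frequent class.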
|
29
|
Wang S, Chen W, Han P, Li X, Song T. RGN: Residue-Based Graph Attention and Convolutional Network for Protein-Protein Interaction Site Prediction. J Chem Inf Model 2022; 62:5961-5974. [PMID: 36398714 DOI: 10.1021/acs.jcim.2c01092] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
Abstract
The prediction of protein-protein interaction sites (PPI sites) plays a very important role in biochemical processes, and many computational methods have been proposed in the past. However, the majority of these methods are time-consuming and lack accuracy. Hence, an effective computational method is necessary. In this article, we present a novel computational model called RGN (residue-based graph attention and convolutional network) to predict PPI sites. In our paper, the protein is treated as a graph, and each amino acid is a node in the graph structure. The position-specific scoring matrix, hidden Markov model, hydrogen bond estimation algorithm, and ProtBert are applied as node features. The edges are decided by the spatial distance between the amino acids. Then, we utilize a residue-based graph convolutional network and graph attention network to extract deeper features. Finally, the processed node features are fed into the prediction layer. We show the superiority of our model by comparing it with four other protein structure-based methods and five protein sequence-based methods. Our model obtains the best performance on all the evaluation metrics (accuracy, precision, recall, F1 score, Matthews correlation coefficient, area under the receiver operating characteristic curve, and area under the precision recall curve). We also conduct a case study to demonstrate that extracting protein information from the structure perspective is effective and to point out the difficult aspects of PPI site prediction.
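The graph construction described above, residues as nodes and edges decided by spatial distance, can be sketched as follows. This is not the RGN implementation: the abstract gives no cutoff, so the 8.0 distance threshold and the use of one representative coordinate per residue are assumptions, and `contact_edges` is a hypothetical name.

```python
import math

def contact_edges(coords, cutoff):
    """Undirected edges (i, j), i < j, between residues closer than `cutoff`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges

# Toy coordinates for three residues (one representative atom each).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(coords, cutoff=8.0)  # assumed cutoff, not from the paper
```

The resulting edge list, together with per-residue feature vectors, is the usual input to graph convolution or graph attention layers.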
Affiliation(s)
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Wenqi Chen
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Peifu Han
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Xue Li
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Tao Song
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China; Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Madrid 28031, Spain
|
30
|
PITHIA: Protein Interaction Site Prediction Using Multiple Sequence Alignments and Attention. Int J Mol Sci 2022; 23:ijms232112814. [PMID: 36361606 PMCID: PMC9657891 DOI: 10.3390/ijms232112814] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/22/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 11/22/2022] Open
Abstract
Cellular functions are governed by proteins, and, while some proteins work independently, most work by interacting with other proteins. As a result it is crucially important to know the interaction sites that facilitate the interactions between the proteins. Since the experimental methods are costly and time consuming, it is essential to develop effective computational methods. We present PITHIA, a sequence-based deep learning model for protein interaction site prediction that exploits the combination of multiple sequence alignments and learning attention. We demonstrate that our new model clearly outperforms the state-of-the-art models on a wide range of metrics. In order to provide meaningful comparison, we update existing test datasets with new information regarding interaction site, as well as introduce an additional new testing dataset which resolves the shortcomings of the existing ones.
|
31
|
Zhang Y, Hu Y, Li H, Liu X. Drug-protein interaction prediction via variational autoencoders and attention mechanisms. Front Genet 2022; 13:1032779. [PMID: 36313473 PMCID: PMC9614151 DOI: 10.3389/fgene.2022.1032779] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Accepted: 09/30/2022] [Indexed: 09/29/2023] Open
Abstract
During the process of drug discovery, exploring drug-protein interactions (DPIs) is a key step. With the rapid development of biological data, computer-aided methods are much faster than biological experiments. Deep learning methods have become popular and are mainly used to extract the characteristics of drugs and proteins for further DPI prediction. Since the prediction of DPIs through machine learning cannot fully extract effective features, we propose a deep learning framework that uses variational autoencoders and attention mechanisms; it utilizes convolutional neural networks (CNNs) to obtain local features and attention mechanisms to obtain important information about drugs and proteins, which is very important for predicting DPIs. Compared with some machine learning methods on the C. elegans and human datasets, our approach performs better. On the BindingDB dataset, its accuracy (ACC) and area under the curve (AUC) reach 0.862 and 0.913, respectively. To verify the robustness of the model, multiclass classification tasks are performed on the Davis and KIBA datasets, and the ACC values reach 0.850 and 0.841, respectively, thus further demonstrating the effectiveness of the model.
Affiliation(s)
- Yue Zhang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
|
32
|
ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides. Int J Mol Sci 2022; 23:ijms232012194. [PMID: 36293050 PMCID: PMC9603247 DOI: 10.3390/ijms232012194] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/29/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 11/30/2022] Open
Abstract
Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet-lab experiments are expensive and inefficient for identifying novel anticancer peptides (ACPs); therefore, the development of an effective computational approach is essential to recognize ACP candidates before experimental methods are used. In this study, we proposed ACP-ADA, an AdaBoost algorithm with random forest as the base learner, which integrates binary profile features, the amino acid index, and amino acid composition into a 210-dimensional feature vector to represent the peptides. Training samples in the feature space were augmented to increase the sample size and further improve the performance of the model in the case of insufficient samples. Furthermore, we used five-fold cross-validation to find model parameters, and the cross-validation results showed that ACP-ADA outperforms existing methods for this feature combination with data augmentation in terms of performance metrics. Specifically, ACP-ADA recorded an average accuracy of 86.4% and a Matthews correlation coefficient of 74.01% for dataset ACP740, and 90.83% and 81.65% for dataset ACP240; consequently, it can be a very useful tool in drug development and biomedical research.
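Two of the feature types named in this abstract, amino acid composition and the binary profile, can be sketched as below. This is an illustration only and does not reproduce the 210-dimensional ACP-ADA feature vector. The residue ordering, the zero padding of short peptides, and the prefix length of 3 are assumptions, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by letter

def amino_acid_composition(peptide):
    """Fraction of each standard amino acid in the peptide (20 values)."""
    peptide = peptide.upper()
    counts = [peptide.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def binary_profile(peptide, length):
    """One-hot encode the first `length` residues, zero-padding short peptides."""
    profile = []
    for i in range(length):
        aa = peptide[i].upper() if i < len(peptide) else None
        profile.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return profile

composition = amino_acid_composition("AAKK")
profile = binary_profile("AK", length=3)
```

Feature vectors like these would then be concatenated and passed to a classifier; the boosting model itself is omitted here.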
|
33
|
Agapito G, Milano M, Cannataro M. A Python Clustering Analysis Protocol of Genes Expression Data Sets. Genes (Basel) 2022; 13:genes13101839. [PMID: 36292724 PMCID: PMC9601308 DOI: 10.3390/genes13101839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 10/05/2022] [Accepted: 10/08/2022] [Indexed: 11/16/2022] Open
Abstract
Gene expression and SNP data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluations. Cluster analysis is used to analyze data that have no predefined subgroups. The goal is to use the data itself to recognize meaningful and informative subgroups. In addition, cluster analysis aids data reduction, exposes hidden patterns, and generates hypotheses regarding the relationship between genes and phenotypes. Cluster analysis could also be used to identify biomarkers and yield computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results. Therefore, a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides an analysis protocol to effectively analyze gene expression data sets through the K-means and DBSCAN algorithms. The general protocol enables analyzing omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. In addition, to demonstrate the effectiveness of our clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides best practices and tips to overcome some issues in the analysis of omics data sets through unsupervised learning.
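Of the two algorithms named in this abstract, K-means is the simpler to illustrate; a minimal standard-library sketch follows. It is not the protocol from the manuscript: seeding the centroids with the first k points is a simplifying assumption (libraries typically use randomized seeding), the toy points stand in for expression profiles, and `kmeans` is a hypothetical name. DBSCAN is not shown.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: seed with the first k points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids

# Toy "expression profiles": each point is one gene measured in two samples.
profiles = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(profiles, k=2)
```

In practice, expression values are usually normalized before clustering so that no single sample dominates the Euclidean distance.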
Affiliation(s)
- Giuseppe Agapito
- Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Marianna Milano
- Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Mario Cannataro
- Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
|
34
|
Xu D, Liu B, Wang J, Zhang Z. Bibliometric analysis of artificial intelligence for biotechnology and applied microbiology: Exploring research hotspots and frontiers. Front Bioeng Biotechnol 2022; 10:998298. [PMID: 36277390 PMCID: PMC9585160 DOI: 10.3389/fbioe.2022.998298] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/19/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Background: In the biotechnology and applied microbiology sectors, artificial intelligence (AI) has been extensively used in disease diagnostics, drug research and development, functional genomics, biomarker recognition, and medical imaging diagnostics. In our study, from 2000 to 2021, science publications focusing on AI in biotechnology were reviewed, and quantitative, qualitative, and modeling analyses were performed. Methods: On 6 May 2022, the Web of Science Core Collection (WoSCC) was screened for AI applications in biotechnology and applied microbiology; 3,529 studies were identified between 2000 and 2022, and analyzed. The following information was collected: publication, country or region, references, knowledgebase, institution, keywords, journal name, and research hotspots, and examined using VOSviewer and CiteSpace V bibliometric platforms. Results: We showed that 128 countries published articles related to AI in biotechnology and applied microbiology; the United States had the most publications. In addition, 584 global institutions contributed to publications, with the Chinese Academy of Science publishing the most. Reference clusters from studies were categorized into ten headings: deep learning, prediction, support vector machines (SVM), object detection, feature representation, synthetic biology, amyloid, human microRNA precursors, systems biology, and single cell RNA-Sequencing. Research frontier keywords were represented by microRNA (2012–2020) and protein-protein interactions (PPIs) (2012–2020). Conclusion: We systematically, objectively, and comprehensively analyzed AI-related biotechnology and applied microbiology literature, and additionally, identified current hot spots and future trends in this area. Our review provides researchers with a comprehensive overview of the dynamic evolution of AI in biotechnology and applied microbiology and identifies future key research areas.
Affiliation(s)
- Dongyu Xu
- Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
- Bing Liu
- Department of Bone Oncology, The People’s Hospital of Liaoning Province, Shenyang, Liaoning, China
- Jian Wang
- Department of Pathogenic Biology, School of Basic Medicine, China Medical University, Shenyang, Liaoning, China
- Zhichang Zhang
- Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
- *Correspondence: Zhichang Zhang,
|
35
|
Evaluation of the Effectiveness of Derived Features of AlphaFold2 on Single-Sequence Protein Binding Site Prediction. BIOLOGY 2022; 11:biology11101454. [PMID: 36290358 PMCID: PMC9598995 DOI: 10.3390/biology11101454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Simple Summary With the development of artificial intelligence, researchers can roughly predict the crystal structure of a protein by computer without the need for biological experiments, which provides new ideas and solutions to problems, such as protein-protein interaction and drug-target predictions. In this study, we proposed strategies to combine predicted protein structures with deep learning networks and evaluated them on different protein binding site prediction tasks. Our computational experiment results showed that all proposed strategies could effectively encode structural information for deep learning models. Abstract Though AlphaFold2 has attained considerably high precision on protein structure prediction, it is reported that directly inputting coordinates into deep learning networks cannot achieve desirable results on downstream tasks. Thus, how to process and encode the predicted results into effective forms that deep learning models can understand to improve the performance of downstream tasks is worth exploring. In this study, we tested the effects of five processing strategies of coordinates on two single-sequence protein binding site prediction tasks. These five strategies are spatial filtering, the singular value decomposition of a distance map, calculating the secondary structure feature, and the relative accessible surface area feature of proteins. The computational experiment results showed that all strategies were suitable and effective methods to encode structural information for deep learning models. In addition, by performing a case study of a mutated protein, we showed that the spatial filtering strategy could introduce structural changes into HHblits profiles and deep learning networks when protein mutation happens. In sum, this work provides new insight into the downstream tasks of protein-molecule interaction prediction, such as predicting the binding residues of proteins and estimating the effects of mutations.
36
Ni Y, Fan L, Wang M, Zhang N, Zuo Y, Liao M. EPI-Mind: Identifying Enhancer-Promoter Interactions Based on Transformer Mechanism. Interdiscip Sci 2022; 14:786-794. [PMID: 35633468 DOI: 10.1007/s12539-022-00525-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Revised: 03/24/2022] [Accepted: 04/25/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION: Enhancer-promoter interactions (EPIs) are an essential step in the gene regulation process. However, detecting EPIs with traditional wet-lab techniques is time-consuming and expensive, so computational methods are valuable for understanding the mechanism of EPIs. A number of approaches have been proposed to address this problem, but existing methods still leave room for exploration and improvement. METHODS: In this study, a novel deep-learning model named EPI-Mind was proposed to predict EPIs from sequence features. First, we encoded enhancer and promoter sequences with pre-trained DNA vectors. Then, a Convolutional Neural Network (CNN) was used to roughly extract global and local features. Finally, a transformer mechanism was introduced to extract features further. We first trained a model named EPI-Mind_spe, which can predict EPIs in one cell line. To achieve general prediction across different cell lines and further improve the performance of the model, a second round of training was carried out: the redivided dataset was used to train a new model called EPI-Mind_gen, which can predict EPIs across different cell lines. To further improve the accuracy, a model named EPI-Mind_best was trained using EPI-Mind_gen as a pre-trained model. RESULTS: EPI-Mind_spe can predict EPIs with an average AUROC above 90% and an average AUPR above 70%, but only in a cell-line-specific manner. EPI-Mind_gen can predict EPIs across different cell lines, and its average AUROC is about 4.8% higher than that of EPI-Mind_spe. EPI-Mind_best is superior to the state-of-the-art predictors on benchmarking datasets: it achieved the best result on 5 of 12 indicators (comprising AUPR and AUROC), outperforming earlier predictors. CONCLUSION: This research proposed EPI-Mind, a deep-learning framework that predicts EPIs using only enhancer and promoter sequences. This manuscript may provide a new route to solve the problem.
Affiliation(s)
- Yu Ni
  - College of Life Sciences, Northwest A&F University, Taicheng Road, Yangling, 712100, China
  - College of Information Engineering, Northwest A&F University, Taicheng Road, Yangling, 712100, China
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Linqi Fan
  - The 5th Paradigm Technology Co., Ltd, Yuanjiang Street, Shanghai, 200000, China
- Miao Wang
  - College of Information Engineering, Northwest A&F University, Taicheng Road, Yangling, 712100, China
- Ning Zhang
  - College of Life Sciences, Northwest A&F University, Taicheng Road, Yangling, 712100, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Mingzhi Liao
  - College of Life Sciences, Northwest A&F University, Taicheng Road, Yangling, 712100, China
37
Wei X, Yang J, Li S, Li B, Chen M, Lu Y, Wu X, Cheng Z, Zhang X, Chen Z, Wang C, Wang E, Zheng R, Xu X, Shang H. TAIGET: A small-molecule target identification and annotation web server. Front Pharmacol 2022; 13:898519. [PMID: 36105222 PMCID: PMC9465370 DOI: 10.3389/fphar.2022.898519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 07/19/2022] [Indexed: 11/28/2022] Open
Abstract
Background: Accurate target identification of small molecules and downstream target annotation are important in pharmaceutical research and drug development. Methods: We present TAIGET, a friendly and easy to operate graphical web interface, which consists of a docking module based on AutoDock Vina and LeDock, a target screen module based on a Bayesian–Gaussian mixture model (BGMM), and a target annotation module derived from >14,000 cancer-related literature works. Results: TAIGET produces binding poses by selecting ≤5 proteins at a time from the UniProt ID-PDB network and submitting ≤3 ligands at a time with the SMILES format. Once the identification process of binding poses is complete, TAIGET then screens potential targets based on the BGMM. In addition, three medical experts and 10 medical students curated associations among drugs, genes, gene regulation, cancer outcome phenotype, 2,170 cancer cell types, and 73 cancer types from the PubMed literature, with the aim to construct a target annotation module. A target-related PPI network can be visualized by an interactive interface. Conclusion: This online tool significantly lowers the entry barrier of virtual identification of targets for users who are not experts in the technical aspects of virtual drug discovery. The web server is available free of charge at http://www.taiget.cn/.
Affiliation(s)
- Xuxu Wei
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
  - Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Jiarui Yang
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Simin Li
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Boyuan Li
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Mengzhen Chen
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Yukang Lu
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Xiang Wu
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Zeyu Cheng
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Xiaoyu Zhang
  - Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhao Chen
  - Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Chunxia Wang
  - Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Edwin Wang
  - Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - *Correspondence: Ruiqing Zheng; Xue Xu; Hongcai Shang
- Xue Xu
  - Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, China
- Hongcai Shang
  - Key Laboratory of Chinese Internal Medicine of MOE, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
38
Jena M, Mishra D, Mishra SP, Mallick PK. A Tailored Complex Medical Decision Analysis Model for Diabetic Retinopathy Classification Based on Optimized Un-Supervised Feature Learning Approach. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-022-07057-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
39
ProB-Site: Protein Binding Site Prediction Using Local Features. Cells 2022; 11:cells11132117. [PMID: 35805201 PMCID: PMC9266162 DOI: 10.3390/cells11132117] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 01/16/2023] Open
Abstract
Protein–protein interactions (PPIs) are responsible for various essential biological processes, and knowledge of them can help in developing new drugs against diseases. Various experimental methods have been employed to identify PPIs; however, their application is limited by their cost and time consumption. Computational methods are therefore considered a viable alternative for this crucial task. Various techniques that use the sequential information of amino acids in a protein sequence have been explored in the literature, including machine learning and deep learning techniques. Current interaction-site prediction performance still has room for improvement. Hence, a deep neural network-based model, ProB-site, is proposed. ProB-site utilizes sequential information of a protein to predict its binding sites. The proposed model uses evolutionary information and predicted structural information extracted from sequential information of proteins, generating three unique feature sets for every amino acid in a protein sequence. Then, these feature sets are fed to their respective sub-CNN architectures to acquire complex features. Finally, the acquired features are concatenated and classified using fully connected layers. This methodology performed better than state-of-the-art techniques because of the selection of the best features and the consideration of local information for each amino acid.
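The idea of using local information around each amino acid can be sketched as a sliding window over the sequence. This is only a generic illustration: the function name `residue_windows`, the window half-width, and the padding symbol `X` are assumptions, since the abstract does not state how the model defines its local context.

```python
def residue_windows(sequence, half_width=3, pad="X"):
    """Return one fixed-length window per residue, centered on that residue.

    Positions that fall outside the sequence are filled with the pad symbol,
    so every residue gets a window of length 2 * half_width + 1.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]

# Hypothetical five-residue peptide with a window of three residues.
windows = residue_windows("MKTAY", half_width=1)
print(windows)  # ['XMK', 'MKT', 'KTA', 'TAY', 'AYX']
```

Each window could then be converted into per-residue feature vectors (for example evolutionary profiles) before being passed to a convolutional sub-network.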
40
Lu S, Li Y, Ma Q, Nan X, Zhang S. A Structure-Based B-cell Epitope Prediction Model Through Combing Local and Global Features. Front Immunol 2022; 13:890943. [PMID: 35844532 PMCID: PMC9283778 DOI: 10.3389/fimmu.2022.890943] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/07/2022] [Accepted: 05/23/2022] [Indexed: 11/24/2022] Open
Abstract
B-cell epitopes (BCEs) are a set of specific sites on the surface of an antigen that bind to an antibody produced by a B cell. The recognition of BCEs is a major challenge for drug design and vaccine development. Compared with experimental methods, computational approaches have strong potential for BCE prediction at much lower cost. However, most current methods focus on local information around the target residue without taking the global information of the whole antigen sequence into consideration. We propose a novel deep learning method that combines local features and global features for BCE prediction. In our model, two parallel modules are built to extract local and global features from the antigen separately. For local features, we use Graph Convolutional Networks (GCNs) to capture information about the spatial neighbors of a target residue. For global features, Attention-Based Bidirectional Long Short-Term Memory (Att-BLSTM) networks are applied to extract information from the whole antigen sequence. Then the local and global features are combined to predict BCEs. The experiments show that the proposed method achieves superior performance over the state-of-the-art BCE prediction methods on benchmark datasets. We also compare performance with and without global features. The experimental results show that global features play an important role in BCE prediction. Our detailed case study on BCE prediction for the SARS-CoV-2 receptor binding domain confirms that our method is effective for predicting and clustering true BCEs.
Affiliation(s)
- Shuai Lu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- Yuguang Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- Qiang Ma
  - School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
  - *Correspondence: Xiaofei Nan; Shoutao Zhang
- Shoutao Zhang
  - School of Life Sciences, Zhengzhou University, Zhengzhou, China
  - Longhu Laboratory of Advanced Immunology, Zhengzhou, China
41
Luo X, Ju W, Qu M, Gu Y, Chen C, Deng M, Hua XS, Zhang M. CLEAR: Cluster-Enhanced Contrast for Self-Supervised Graph Representation Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:899-912. [PMID: 35675236 DOI: 10.1109/tnnls.2022.3177775] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
This article studies self-supervised graph representation learning, which is critical to various tasks, such as protein property prediction. Existing methods typically aggregate representations of each individual node as graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In this article, we propose a self-supervised graph representation learning framework named cluster-enhanced Contrast (CLEAR) that models the structural semantics of a graph from graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies followed by a graph neural network-based encoder to explore global semantics. As for local semantics, we first use graph clustering techniques to partition each whole graph into several subgraphs while preserving as much semantic information as possible. We further employ a self-attention interaction module to aggregate the semantics of all subgraphs into a local-view graph representation. Moreover, we integrate both global semantics and local semantics into a multiview graph contrastive learning framework, enhancing the semantic-discriminative ability of graph representations. Extensive experiments on various real-world benchmarks demonstrate the efficacy of the proposed framework over current graph self-supervised representation learning approaches on both graph classification and transfer learning tasks.
42
A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences. BIOLOGY 2022; 11:biology11050775. [PMID: 35625503 PMCID: PMC9139052 DOI: 10.3390/biology11050775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 05/10/2022] [Accepted: 05/11/2022] [Indexed: 11/16/2022]
Abstract
Simple Summary: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model to predict potential PPIs using only protein sequence information. The presented method used the Discrete Hilbert transform to extract amino acid sequence information from position-specific scoring matrices. These extracted features were then fed into a rotation forest for training and prediction. When applying our method to three datasets (Yeast, Human, and Oryza sativa) for detecting PPIs, we obtained excellent prediction performance. Furthermore, the comparison results indicated that our computational model is effective and robust in predicting potential PPI pairs.
Abstract: Protein–protein interactions (PPIs) are crucial for understanding cellular processes, including signal cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming, laborious, and always suffer from high false negative rates. Therefore, there is a great need for new computational methods as a supplemental tool for PPI prediction. In this article, we present a novel sequence-based model to predict PPIs that combines the Discrete Hilbert transform (DHT) and Rotation Forest (RoF). This method contains three stages: first, a Position-Specific Scoring Matrix (PSSM) was used to transform the amino acid sequence into a matrix that contains rich information about protein evolution. Then, the 400-dimensional DHT descriptor was constructed for each protein pair. Finally, these feature descriptors were fed to the RoF classifier for identifying the potential PPI class. When evaluated on the Yeast, Human, and Oryza sativa PPI datasets, the proposed model yielded prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. In addition, we conducted numerous experiments on cross-species PPI datasets, on which the method also showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with some strong existing methods. These comprehensive experimental results further confirm the excellence and feasibility of the proposed approach. In future work, we hope it can be a supplemental tool for proteomics analysis.
43
Lin K, Quan X, Jin C, Shi Z, Yang J. An Interpretable Double-Scale Attention Model for Enzyme Protein Class Prediction Based on Transformer Encoders and Multi-Scale Convolutions. Front Genet 2022; 13:885627. [PMID: 35432476 PMCID: PMC9012241 DOI: 10.3389/fgene.2022.885627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 03/07/2022] [Indexed: 12/01/2022] Open
Abstract
Background: Classification and annotation of enzyme proteins are fundamental for enzyme research on biological metabolism. Enzyme Commission (EC) numbers provide a standard for hierarchical enzyme class prediction, for which several computational methods have been proposed. However, most of these methods depend on prior distribution information, and none explicitly quantifies amino-acid-level relations and the possible contribution of sub-sequences. Methods: In this study, we propose a double-scale attention enzyme class prediction model named DAttProt with high reusability and interpretability. DAttProt encodes sequences with self-supervised Transformer encoders in pre-training and gathers local features with multi-scale convolutions in fine-tuning. Specifically, a probabilistic double-scale attention weight matrix is designed to aggregate multi-scale features and positional prediction scores. Finally, a fully connected linear classifier conducts the final inference from the aggregated features and prediction scores. Results: On the DEEPre and ECPred datasets, DAttProt is competitive with the compared methods on level 0 and outperforms them on deeper task levels, reaching 0.788 accuracy on level 2 of DEEPre and 0.967 macro-F1 on level 1 of ECPred. Moreover, through a case study, we demonstrate that the double-scale attention matrix learns to discover and focus on the positions and scales of bio-functional sub-sequences in the protein. Conclusion: DAttProt provides an effective and interpretable method for enzyme class prediction. It can predict enzyme protein classes accurately and, furthermore, discover enzymatic functional sub-sequences such as protein motifs at both positional and spatial scales.
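For readers unfamiliar with the macro-F1 metric reported in this abstract, the sketch below shows the standard definition: F1 is computed separately for each class and the per-class values are averaged with equal weight. The label lists are invented toy data, and `macro_f1` is an illustrative helper, not code from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (F1 = 2TP / (2TP + FP + FN))."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0 by convention here.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical enzyme-class labels for five proteins.
y_true = [1, 1, 2, 2, 3]
y_pred = [1, 2, 2, 2, 3]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class counts equally, macro-F1 penalizes a model that ignores rare enzyme classes more strongly than plain accuracy does.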
Affiliation(s)
- Ken Lin
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Xiongwen Quan
  - College of Artificial Intelligence, Nankai University, Tianjin, China
  - *Correspondence: Xiongwen Quan
- Chen Jin
  - College of Computer Science, Nankai University, Tianjin, China
- Zhuangwei Shi
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Jinglong Yang
  - College of Artificial Intelligence, Nankai University, Tianjin, China
44
Pan J, You ZH, Li LP, Huang WZ, Guo JX, Yu CQ, Wang LP, Zhao ZY. DWPPI: A Deep Learning Approach for Predicting Protein–Protein Interactions in Plants Based on Multi-Source Information With a Large-Scale Biological Network. Front Bioeng Biotechnol 2022; 10:807522. [PMID: 35387292 PMCID: PMC8978800 DOI: 10.3389/fbioe.2022.807522] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/02/2021] [Accepted: 02/25/2022] [Indexed: 12/30/2022] Open
Abstract
The prediction of protein–protein interactions (PPIs) in plants is vital for probing cell function. Although multiple high-throughput approaches have been developed in the biological domain to identify PPIs, these methods become laborious and time-consuming as the complexity of the PPI network increases. Thus, it is essential to develop an effective and feasible computational method for the prediction of PPIs in plants. In this study, we present a network embedding-based method, called DWPPI, for predicting the interactions between different plant proteins based on multi-source information combined with deep neural networks (DNN). The DWPPI model fuses the protein natural language sequence information (attribute information) and protein behavior information to represent plant proteins as feature vectors and finally sends these features to a deep learning–based classifier for prediction. To validate the prediction performance of DWPPI, we evaluated it on three model plant datasets: Arabidopsis thaliana (A. thaliana), maize (Zea mays), and rice (Oryza sativa). The experimental results with fivefold cross-validation demonstrated that DWPPI performs well, with AUC (area under the ROC curve) values of 0.9548, 0.9867, and 0.9213, respectively. To further verify the predictive capacity of DWPPI, we compared it with several state-of-the-art machine learning classifiers. Moreover, case studies were performed with the AC149810.2_FGP003 protein. As a result, 14 of the top 20 PPI pairs identified by DWPPI with the highest scores were confirmed by the literature. These results suggest that the DWPPI model can act as a promising tool for related plant molecular biology.
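The evaluation protocol mentioned in this abstract, fivefold cross-validation scored by AUC, can be sketched with the standard library alone. The helper names `roc_auc` and `kfold_indices`, the interleaved fold assignment, and the toy scores are assumptions for illustration; the paper's own splitting procedure is not described in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n_samples, k=5):
    """Split sample indices into k interleaved, non-overlapping test folds."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]

# Hypothetical classifier scores for four protein pairs (1 = interacting).
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
folds = kfold_indices(10, k=5)
print(auc)    # 0.75
print(folds)  # [[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]
```

In a full experiment each fold would serve once as the test set while the model is trained on the other four, and the five AUC values would be averaged.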
Affiliation(s)
- Jie Pan
  - School of Information Engineering, Xijing University, Xi’an, China
- Zhu-Hong You
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - College of Grassland and Environment Science, Xinjiang Agricultural University, Urumqi, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Wen-Zhun Huang
  - School of Information Engineering, Xijing University, Xi’an, China
- Jian-Xin Guo
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Wang
  - School of Information Engineering, Xijing University, Xi’an, China
- Zheng-Yang Zhao
  - School of Information Engineering, Xijing University, Xi’an, China
45
Zhou X, Song H, Li J. Residue-Frustration-Based Prediction of Protein-Protein Interactions Using Machine Learning. J Phys Chem B 2022; 126:1719-1727. [PMID: 35170967 DOI: 10.1021/acs.jpcb.1c10525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
The study of protein-protein interactions (PPIs) is important in understanding the function of proteins. However, it is still a challenge to investigate transient protein-protein interactions by experiment. Hence, the computational prediction of protein-protein interactions draws growing attention. Statistics-based features have been widely used in studies of protein structure prediction and protein folding. Due to the scarcity of experimental PPI data, it is difficult to construct a conventional statistical feature for PPI prediction, and the application of statistics-based features is very limited in this field. In this paper, we explored the application of frustration, a statistical potential, in PPI prediction. By comparing the energetic contribution of the extra stabilization energy from a given residue pair in the native protein with the statistics of the energies, we obtained the residue pair's frustration index. By counting the residue pairs with a high frustration index, we then obtained the highly frustrated density, a residue-frustration-based feature that describes the tendency of residues to be involved in PPI. Highly frustrated density, as well as structure-based features, were then used to describe protein residues and combined with a long short-term memory (LSTM) neural network to predict PPI residue pairs. Our model correctly predicted 75% of dimers when only the top 2‰ of residue pairs were selected in each dimer. Our model, which considers statistics-based features, is significantly different from models based on the chemical features of residues. We found that frustration can effectively describe the tendency of residues to be involved in PPI. Frustration-based features can replace chemical features in combination with machine learning and achieve better PPI prediction performance. This reveals the great potential of statistical potentials such as frustration in PPI prediction.
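The comparison of a native pair energy with the statistics of alternative energies can be illustrated as a Z-score. This is a hedged sketch rather than the authors' implementation: the sign convention (a native pair that is more stabilizing than the alternatives gets a positive index), the cutoff of -1 for "highly frustrated", the normalization of the density as a fraction, and all energy values are assumptions introduced here.

```python
from statistics import mean, pstdev

def frustration_index(native_energy, decoy_energies):
    """Z-score of the native pair energy against a set of decoy energies.

    Assumed convention: lower energy is more stabilizing, so a native pair
    that is better than typical decoys receives a positive index.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean(decoy_energies) - native_energy) / sigma

def highly_frustrated_density(indices, threshold=-1.0):
    """Fraction of residue pairs whose index falls below the assumed cutoff."""
    return sum(1 for f in indices if f < threshold) / len(indices)

# Invented energies (arbitrary units) for one residue pair and its decoys.
decoys = [0.0, 1.0, -1.0, 0.0]
favourable = frustration_index(-2.0, decoys)    # native is more stable than decoys
unfavourable = frustration_index(2.0, decoys)   # native is less stable than decoys
density = highly_frustrated_density([-2.8, 0.5, -1.2, 1.0])
print(round(favourable, 3), round(unfavourable, 3), density)
```

A per-residue version of the density would restrict the count to pairs that involve, or lie near, the residue of interest.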
Affiliation(s)
- Xiaozhou Zhou
  - Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Haoyu Song
  - Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
- Jingyuan Li
  - Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
46
Liu Z, Ren Z, Yan L, Li F. DeepLRR: An Online Webserver for Leucine-Rich-Repeat Containing Protein Characterization Based on Deep Learning. PLANTS (BASEL, SWITZERLAND) 2022; 11:plants11010136. [PMID: 35009139 PMCID: PMC8796025 DOI: 10.3390/plants11010136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Revised: 12/31/2021] [Accepted: 01/01/2022] [Indexed: 05/26/2023]
Abstract
Members of the leucine-rich repeat (LRR) superfamily play critical roles in multiple biological processes. As the LRR unit sequence is highly variable, accurately predicting the number and location of LRR units in proteins is a highly challenging task in the field of bioinformatics. Existing methods still need to be improved, especially when it comes to similarity-based methods. We introduce our DeepLRR method based on a convolutional neural network (CNN) model and LRR features to predict the number and location of LRR units in proteins. We compared DeepLRR with six existing methods using a dataset containing 572 LRR proteins and it outperformed all of them when it comes to overall F1 score. In addition, DeepLRR has integrated identifying plant disease-resistance proteins (NLR, LRR-RLK, LRR-RLP) and non-canonical domains. With DeepLRR, 223, 191 and 183 LRR-RLK genes in Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa ssp. Japonica) and tomato (Solanum lycopersicum) genomes were re-annotated, respectively. Chromosome mapping and gene cluster analysis revealed that 24.2% (54/223), 29.8% (57/191) and 16.9% (31/183) of LRR-RLK genes formed gene cluster structures in Arabidopsis, rice and tomato, respectively. Finally, we explored the evolutionary relationship and domain composition of LRR-RLK genes in each plant and distributions of known receptor and co-receptor pairs. This provides a new perspective for the identification of potential receptors and co-receptors.
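The gene-cluster percentages quoted in this abstract follow directly from the stated counts, as the short check below shows. Only the numbers come from the abstract; the dictionary layout and variable names are illustrative.

```python
# (genes in clusters, re-annotated LRR-RLK genes) as stated in the abstract.
counts = {
    "Arabidopsis": (54, 223),
    "rice": (57, 191),
    "tomato": (31, 183),
}

# Percentage of LRR-RLK genes that form gene clusters, rounded to one decimal.
percentages = {
    species: round(100 * clustered / total, 1)
    for species, (clustered, total) in counts.items()
}
print(percentages)  # {'Arabidopsis': 24.2, 'rice': 29.8, 'tomato': 16.9}
```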
Affiliation(s)
- Zhenya Liu
  - Key Lab of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
- Zirui Ren
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Lunyi Yan
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Feng Li
  - Key Lab of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
47
Wang L, Zhong C. gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network. BMC Bioinformatics 2022; 23:11. [PMID: 34983363 PMCID: PMC8729153 DOI: 10.1186/s12859-021-04548-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/18/2021] [Accepted: 12/21/2021] [Indexed: 01/20/2023] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) are related to human diseases through their regulation of gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs by biological experiments is time-consuming, costly, and inefficient. Therefore, the development of efficient and high-accuracy computational methods for predicting LDAs is of great significance. Results: In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on a graph-level graph attention network. Firstly, we extract the enclosing subgraph of each lncRNA-disease pair. Secondly, we construct feature vectors by integrating lncRNA similarity and disease similarity as node attributes in the subgraphs. Finally, we train a graph neural network (GNN) model by feeding it the subgraphs and feature vectors, and use the trained GNN model to predict potential lncRNA-disease association scores. The experimental results show that our method achieves a higher area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPR), accuracy, and F1-score than the state-of-the-art methods in fivefold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion: The experimental results indicate that our method is a useful approach for predicting potential LDAs.
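The first step described in this abstract, extracting the enclosing subgraph of a lncRNA-disease pair, can be sketched as follows. The definition used here (the union of the k-hop neighborhoods of both endpoints, with the target link itself hidden) is a common convention for subgraph-based link prediction and is an assumption; the abstract does not give the exact rule. The toy association graph and all function names are hypothetical.

```python
from collections import deque

def k_hop_neighbors(adj, start, k):
    """Breadth-first search returning all nodes within k hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

def enclosing_subgraph(adj, lncrna, disease, k=1):
    """Nodes and edges induced by the k-hop neighborhoods of both endpoints."""
    nodes = k_hop_neighbors(adj, lncrna, k) | k_hop_neighbors(adj, disease, k)
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, ()) if v in nodes}
    # Hide the link being predicted so the model cannot simply read it off.
    edges.discard(tuple(sorted((lncrna, disease))))
    return nodes, edges

# Hypothetical bipartite association graph stored as an adjacency dict.
adj = {
    "lnc1": {"disA", "disB"}, "lnc2": {"disA"}, "lnc3": {"disC"},
    "disA": {"lnc1", "lnc2"}, "disB": {"lnc1"}, "disC": {"lnc3"},
}
nodes, edges = enclosing_subgraph(adj, "lnc1", "disA", k=1)
print(sorted(nodes))
print(sorted(edges))
```

Similarity-based node attributes would then be attached to each node of the subgraph before it is passed to the graph neural network.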
Affiliation(s)
- Li Wang
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  - School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Cheng Zhong
  - School of Computer, Electronics and Information, Guangxi University, Nanning, China
  - Key Laboratory of Parallel and Distributed Computing in Guangxi Colleges and Universities, Guangxi University, Nanning, China
48
From complete cross-docking to partners identification and binding sites predictions. PLoS Comput Biol 2022; 18:e1009825. [PMID: 35089918 PMCID: PMC8827487 DOI: 10.1371/journal.pcbi.1009825] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/29/2021] [Revised: 02/09/2022] [Accepted: 01/11/2022] [Indexed: 11/19/2022] Open
Abstract
Proteins carry out their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental to our understanding of the cellular machinery, and to improving medicine and bioengineering. Over the past years, a large body of experimental data has accumulated on who interacts with whom and in what manner. However, these data are highly heterogeneous and sometimes contradictory, noisy, and biased. Ab initio methods provide a means of "blind" protein-protein interaction network reconstruction. Here, we report on a molecular cross-docking-based approach for the identification of protein partners. The docking algorithm uses a coarse-grained representation of the protein structures and treats them as rigid bodies. We applied the approach to a few hundred proteins in their unbound conformations, and we systematically investigated the influence of several key ingredients, such as the size and quality of the interfaces and the scoring function. We achieved a significant improvement over previous works, and very high discriminative power on some specific functional classes. We provide a readout of the contributions of shape and physico-chemical complementarity, interface matching, and specificity to the predictions. In addition, we assessed the ability of the approach to account for multiple usages of protein surfaces, and we compared it with a sequence-based deep learning method. This work may help guide the exploitation of the large number of protein structural models now available toward the discovery of unexpected partners and the characterisation of the structures of their complexes.
|
49
|
Wu Y, Zeng M, Fei Z, Yu Y, Wu FX, Li M. KAICD: A knowledge attention-based deep learning framework for automatic ICD coding. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2020.05.115] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
|
50
|
Tang M, Wu L, Yu X, Chu Z, Jin S, Liu J. Prediction of Protein-Protein Interaction Sites Based on Stratified Attentional Mechanisms. Front Genet 2021; 12:784863. [PMID: 34880910 PMCID: PMC8647646 DOI: 10.3389/fgene.2021.784863] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Accepted: 10/08/2021] [Indexed: 11/19/2022] Open
Abstract
Proteins are the basic substances that carry out life activities, and they often perform their biological functions, such as cell transmission and signal transduction, through interactions with other biological macromolecules. Predicting the interaction sites between proteins can deepen our understanding of the principles of protein interactions, but traditional experimental methods are time-consuming and labor-intensive. In this study, we propose a new hierarchical attention network, named HANPPIS, which incorporates six features (protein sequence, position-specific scoring matrix (PSSM), secondary structure, pre-training vector, hydrophilicity, and amino acid position) to predict protein–protein interaction (PPI) sites. Experiments show that our model achieves effective results and outperforms existing advanced computational methods. More importantly, we use a double-layer attention mechanism to improve the interpretability of the model, which to a certain extent addresses the "black box" problem of deep neural networks and can serve as a reference for locating sites at the biological level.
Affiliation(s)
- Minli Tang
- Department of Computer Science and Technology, Xiamen University, Xiamen, China; School of Big Data Engineering, Kaili University, Kaili, China
- Longxin Wu
- Department of Computer Science and Technology, Xiamen University, Xiamen, China
- Xinyu Yu
- Department of Computer Science and Technology, Xiamen University, Xiamen, China
- Zhaoqi Chu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen, China
- Shuting Jin
- Department of Computer Science and Technology, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen, China
|