Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jeong JC, Lin X, Chen XW. On position-specific scoring matrix for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 2011;8:308-315. [PMID: 20855926 DOI: 10.1109/tcbb.2010.93] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

For:	Jeong JC, Lin X, Chen XW. On position-specific scoring matrix for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 2011;8:308-315. [PMID: 20855926 DOI: 10.1109/tcbb.2010.93] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

Number

Cited by Other Article(s)

Zhu Y, Sun A. LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning. Front Genet 2024;15:1411847. [PMID: 38903752 PMCID: PMC11188361 DOI: 10.3389/fgene.2024.1411847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2024] [Accepted: 05/14/2024] [Indexed: 06/22/2024] Open

Zeng X, Meng FF, Wen ML, Li SJ, Li Y. GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs. BMC Genomics 2024;25:406. [PMID: 38724906 PMCID: PMC11080243 DOI: 10.1186/s12864-024-10299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 04/10/2024] [Indexed: 05/13/2024] Open

Ullah M, Akbar S, Raza A, Zou Q. DeepAVP-TPPred: identification of antiviral peptides using transformed image-based localized descriptors and binary tree growth algorithm. Bioinformatics 2024;40:btae305. [PMID: 38710482 PMCID: PMC11256913 DOI: 10.1093/bioinformatics/btae305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Revised: 04/08/2024] [Accepted: 05/03/2024] [Indexed: 05/08/2024] Open

Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024;174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]

Song FV, Su J, Huang S, Zhang N, Li K, Ni M, Liao M. DeepSS2GO: protein function prediction from secondary structure. Brief Bioinform 2024;25:bbae196. [PMID: 38701416 PMCID: PMC11066904 DOI: 10.1093/bib/bbae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Revised: 03/31/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024] Open

Sun Y, Li YY, Leung CK, Hu P. iNGNN-DTI: prediction of drug-target interaction with interpretable nested graph neural network and pretrained molecule models. Bioinformatics 2024;40:btae135. [PMID: 38449285 PMCID: PMC10957515 DOI: 10.1093/bioinformatics/btae135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2023] [Revised: 12/31/2023] [Accepted: 03/05/2024] [Indexed: 03/08/2024] Open

Zhao H, Zhu B, Jiang T, Cui Z, Wu H. Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024;21:170-185. [PMID: 38303418 DOI: 10.3934/mbe.2024008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2024]

Xu B, Chen Y, Xue W. Computational Protein Design - Where it goes? Curr Med Chem 2024;31:2841-2854. [PMID: 37272467 DOI: 10.2174/0929867330666230602143700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Revised: 02/18/2023] [Accepted: 03/15/2023] [Indexed: 06/06/2023]

Zeng X, Zhong KY, Jiang B, Li Y. Fusing Sequence and Structural Knowledge by Heterogeneous Models to Accurately and Interpretively Predict Drug-Target Affinity. Molecules 2023;28:8005. [PMID: 38138496 PMCID: PMC10745601 DOI: 10.3390/molecules28248005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Revised: 12/06/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open

Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. MEDICAL REVIEW (2021) 2023;3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]

Zou K, Wang S, Wang Z, Zou H, Yang F. Dual-Signal Feature Spaces Map Protein Subcellular Locations Based on Immunohistochemistry Image and Protein Sequence. SENSORS (BASEL, SWITZERLAND) 2023;23:9014. [PMID: 38005402 PMCID: PMC10675401 DOI: 10.3390/s23229014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 10/29/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]

Abstract

Protein is one of the primary biochemical macromolecular regulators in the compartmental cellular structure, and the subcellular locations of proteins can therefore provide information on the function of subcellular structures and physiological environments. Recently, data-driven systems have been developed to predict the subcellular location of proteins based on protein sequence, immunohistochemistry (IHC) images, or immunofluorescence (IF) images. However, the research on the fusion of multiple protein signals has received little attention. In this study, we developed a dual-signal computational protocol by incorporating IHC images into protein sequences to learn protein subcellular localization. Three major steps can be summarized as follows in this protocol: first, a benchmark database that includes 281 proteins sorted out from 4722 proteins of the Human Protein Atlas (HPA) and Swiss-Prot database, which is involved in the endoplasmic reticulum (ER), Golgi apparatus, cytosol, and nucleoplasm; second, discriminative feature operators were first employed to quantitate protein image-sequence samples that include IHC images and protein sequence; finally, the feature subspace of different protein signals is absorbed to construct multiple sub-classifiers via dimensionality reduction and binary relevance (BR), and multiple confidence derived from multiple sub-classifiers is adopted to decide subcellular location by the centralized voting mechanism at the decision layer. The experimental results indicated that the dual-signal model embedded IHC images and protein sequences outperformed the single-signal models with accuracy, precision, and recall of 75.41%, 80.38%, and 74.38%, respectively. It is enlightening for further research on protein subcellular location prediction under multi-signal fusion of protein.

Collapse

Shen J, Xia Y, Lu Y, Lu W, Qian M, Wu H, Fu Q, Chen J. Identification of membrane protein types via deep residual hypergraph neural network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:20188-20212. [PMID: 38052642 DOI: 10.3934/mbe.2023894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/07/2023]

Djulbegovic M, Taylor Gonzalez DJ, Antonietti M, Uversky VN, Shields CL, Karp CL. Intrinsic disorder may drive the interaction of PROS1 and MERTK in uveal melanoma. Int J Biol Macromol 2023;250:126027. [PMID: 37506796 PMCID: PMC11182630 DOI: 10.1016/j.ijbiomac.2023.126027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Revised: 07/23/2023] [Accepted: 07/25/2023] [Indexed: 07/30/2023]

Abstract

BACKGROUND

Class 2 uveal melanomas are associated with the inactivation of the BRCA1 ((breast cancer type 1 susceptibility protein)-associated protein 1 (BAP1)) gene. Inactivation of BAP1 promotes the upregulation of vitamin K-dependent protein S (PROS1), which interacts with the tyrosine-protein kinase Mer (MERTK) receptor on M2 macrophages to induce an immunosuppressive environment.

METHODS

We simulated the interaction of PROS1 with MERTK with ColabFold. We evaluated PROS1 and MERTK for the presence of intrinsically disordered protein regions (IDPRs) and disorder-to-order (DOT) regions to understand their protein-protein interaction (PPI). We first evaluated the structure of each protein with AlphaFold. We then analyzed specific sequence-based features of the PROS1 and MERTK with a suite of bioinformatics tools.

RESULTS

With high-resolution, moderate confidence, we successfully modeled the interaction between PROS1 and MERTK (predicted local distance difference test score (pDLLT) = 70.68). Our structural analysis qualitatively demonstrated IDPRs (i.e., spaghetti-like entities) in PROS1 and MERK. PROS1 was 23.37 % disordered, and MERTK was 23.09 % disordered, classifying them as moderately disordered and flexible proteins. PROS1 was significantly enriched in cysteine, the most order-promoting residue (p-value <0.05). Our IUPred analysis demonstrated that there are two disorder-to-order transition (DOT) regions in PROS1. MERTK was significantly enriched in proline, the most disorder-promoting residue (p-value <0.05), but did not contain DOT regions. Our STRING analysis demonstrated that the PPI network between PROS1 and MERTK is more complex than their assumed one-to-one binding (p-value <2.0 × 10-6).

CONCLUSION

Our findings present a novel prediction for the interaction between PROS1 and MERTK. Our findings show that PROS1 and MERTK contain elements of intrinsic disorder. PROS1 has two DOT regions that are attractive immunotherapy targets. We recommend that IDPRs and DOT regions found in PROS1 and MERTK should be considered when developing immunotherapies targeting this PPI.

Collapse

Zhou S, Zhou Y, Liu T, Zheng J, Jia C. PredLLPS_PSSM: a novel predictor for liquid-liquid protein separation identification based on evolutionary information and a deep neural network. Brief Bioinform 2023;24:bbad299. [PMID: 37609923 DOI: 10.1093/bib/bbad299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/24/2023] Open

Liu Y, Guan S, Jiang T, Fu Q, Ma J, Cui Z, Ding Y, Wu H. DNA protein binding recognition based on lifelong learning. Comput Biol Med 2023;164:107094. [PMID: 37459792 DOI: 10.1016/j.compbiomed.2023.107094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2023] [Revised: 05/09/2023] [Accepted: 05/27/2023] [Indexed: 09/09/2023]

Qian Y, Shang T, Guo F, Wang C, Cui Z, Ding Y, Wu H. Identification of DNA-binding protein based multiple kernel model. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:13149-13170. [PMID: 37501482 DOI: 10.3934/mbe.2023586] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]

Muazzam Ali Shah S, Ou YY. Disto-TRP: An approach for identifying transient receptor potential (TRP) channels using structural information generated by AlphaFold. Gene 2023;871:147435. [PMID: 37075925 DOI: 10.1016/j.gene.2023.147435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 03/13/2023] [Accepted: 04/13/2023] [Indexed: 04/21/2023]

Ming Y, Liu H, Cui Y, Guo S, Ding Y, Liu R. Identification of DNA-binding proteins by Kernel Sparse Representation via L_2,1-matrix norm. Comput Biol Med 2023;159:106849. [PMID: 37060772 DOI: 10.1016/j.compbiomed.2023.106849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]

Guan S, Qian Y, Jiang T, Jiang M, Ding Y, Wu H. MV-H-RKM: A Multiple View-Based Hypergraph Regularized Restricted Kernel Machine for Predicting DNA-Binding Proteins. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:1246-1256. [PMID: 35731758 DOI: 10.1109/tcbb.2022.3183191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Qian Y, Ding Y, Zou Q, Guo F. Multi-View Kernel Sparse Representation for Identification of Membrane Protein Types. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:1234-1245. [PMID: 35857734 DOI: 10.1109/tcbb.2022.3191325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Ji B, Pi W, Liu W, Liu Y, Cui Y, Zhang X, Peng S. HyperVR: a hybrid deep ensemble learning approach for simultaneously predicting virulence factors and antibiotic resistance genes. NAR Genom Bioinform 2023;5:lqad012. [PMID: 36789031 PMCID: PMC9918863 DOI: 10.1093/nargab/lqad012] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/13/2023] Open

Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023;98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]

Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023;2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]

Ma D, Li S, Chen Z. Drug-target binding affinity prediction method based on a deep graph neural network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:269-282. [PMID: 36650765 DOI: 10.3934/mbe.2023012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]

Li Y, Zeng M, Wu Y, Li Y, Li M. Accurate Prediction of Human Essential Proteins Using Ensemble Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:3263-3271. [PMID: 34699365 DOI: 10.1109/tcbb.2021.3122294] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Nourani E, Asgari E, McHardy AC, Mofrad MRK. TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:3744-3753. [PMID: 34460382 DOI: 10.1109/tcbb.2021.3108718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Shah AA, Alturise F, Alkhalifah T, Khan YD. Evaluation of deep learning techniques for identification of sarcoma-causing carcinogenic mutations. Digit Health 2022;8:20552076221133703. [PMID: 36312852 PMCID: PMC9597026 DOI: 10.1177/20552076221133703] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022] Open

Kha QH, Ho QT, Le NQK. Identifying SNARE Proteins Using an Alignment-Free Method Based on Multiscan Convolutional Neural Network and PSSM Profiles. J Chem Inf Model 2022;62:4820-4826. [PMID: 36166351 PMCID: PMC9554904 DOI: 10.1021/acs.jcim.2c01034] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]

Nguyen Q, Tran HV, Nguyen BP, Do TTT. Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition. ACS OMEGA 2022;7:32322-32330. [PMID: 36119976 PMCID: PMC9475634 DOI: 10.1021/acsomega.2c03696] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]

Arif M, Kabir M, Ahmed S, Khan A, Ge F, Khelifi A, Yu DJ. DeepCPPred: A Deep Learning Framework for the Discrimination of Cell-Penetrating Peptides and Their Uptake Efficiencies. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2749-2759. [PMID: 34347603 DOI: 10.1109/tcbb.2021.3102133] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Ranjan A, Fernandez-Baca D, Tripathi S, Deepak A. An Ensemble Tf-Idf Based Approach to Protein Function Prediction via Sequence Segmentation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2685-2696. [PMID: 34185646 DOI: 10.1109/tcbb.2021.3093060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Wang N, Zhang J, Liu B. IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2284-2293. [PMID: 33780341 DOI: 10.1109/tcbb.2021.3069263] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022;2022:5847242. [PMID: 35799660 PMCID: PMC9256349 DOI: 10.1155/2022/5847242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Accepted: 06/07/2022] [Indexed: 11/17/2022]

Lu W, Shen J, Zhang Y, Wu H, Qian Y, Chen X, Fu Q. Identifying Membrane Protein Types Based on Lifelong Learning With Dynamically Scalable Networks. Front Genet 2022;12:834488. [PMID: 35371189 PMCID: PMC8964460 DOI: 10.3389/fgene.2021.834488] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 12/21/2021] [Indexed: 11/13/2022] Open

Mohammadi A, Zahiri J, Mohammadi S, Khodarahmi M, Arab SS. PSSMCOOL: A Comprehensive R Package for Generating Evolutionary-based Descriptors of Protein Sequences from PSSM Profiles. BIOLOGY METHODS AND PROTOCOLS 2022;7:bpac008. [PMID: 35388370 PMCID: PMC8977839 DOI: 10.1093/biomethods/bpac008] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/03/2021] [Revised: 01/21/2022] [Indexed: 11/14/2022]

Kumar A, Sharma P, Arun A, Meena LS. Development of peptide vaccine candidate using highly antigenic PE-PGRS family proteins to stimulate the host immune response against Mycobacterium tuberculosis H₃₇Rv: an immuno-informatics approach. J Biomol Struct Dyn 2022;41:3382-3404. [PMID: 35293852 DOI: 10.1080/07391102.2022.2048079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences. BIOLOGY 2022;11:biology11030418. [PMID: 35336792 PMCID: PMC8945605 DOI: 10.3390/biology11030418] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Revised: 02/24/2022] [Accepted: 03/07/2022] [Indexed: 01/14/2023]

Abstract

Simple Summary

The family of coronaviruses comprises a diverse set of strains and variants which cause diseases from the common cold to COVID-19. Moreover, they infect a wide array of hosts from bats, camels, birds, to humans. Studying coronaviruses through the lens of host specificity provides a unique perspective to understanding the evolution, diversity and dynamics of this family. In particular, this can reveal groups of different hosts infected by similar strains, giving clues on strains which were more likely to have evolved to jump from one host to another. In this work, we frame host specificity as a classification task, in designing a very compact numerical representation of the spike sequences of different coronaviruses. Based on this numerical representation, classification methods are able to detect the target host with high accuracy. Such an approach can used to efficiently scale to large volumes of sequences, in order to unveil trends in the host specificity of different coronavirus strains.

Abstract

The study of host specificity has important connections to the question about the origin of SARS-CoV-2 in humans which led to the COVID-19 pandemic—an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona)viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating, and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is important in determining host specificity, since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among birds, bats, camels, swine, humans, and weasels, to name a few. We propose a feature embedding based on the well-known position weight matrix (PWM), which we call PWM2Vec, and we use it to generate feature vectors from the spike protein sequences of these coronaviruses. While our embedding is inspired by the success of PWMs in biological applications, such as determining protein function and identifying transcription factor binding sites, we are the first (to the best of our knowledge) to use PWMs from viral sequences to generate fixed-length feature vector representations, and use them in the context of host classification. The results on real world data show that when using PWM2Vec, machine learning classifiers are able to perform comparably to the baseline models in terms of predictive performance and runtime—in some cases, the performance is better. We also measure the importance of different amino acids using information gain to show the amino acids which are important for predicting the host of a given coronavirus. Finally, we perform some statistical analyses on these results to show that our embedding is more compact than the embeddings of the baseline models.

Collapse

Yu B, Wang X, Zhang Y, Gao H, Wang Y, Liu Y, Gao X. RPI-MDLStack: Predicting RNA-protein interactions through deep learning with stacking strategy and LASSO. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022;12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022] Open

Pan G, Sun C, Liao Z, Tang J. Machine and Deep Learning for Prediction of Subcellular Localization. Methods Mol Biol 2022;2361:249-261. [PMID: 34236666 DOI: 10.1007/978-1-0716-1641-3_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Qian Y, Meng H, Lu W, Liao Z, Ding Y, Wu H. Identification of DNA-Binding Proteins via Hypergraph Based Laplacian Support Vector Machine. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210806091922] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Tang M, Wu L, Yu X, Chu Z, Jin S, Liu J. Prediction of Protein-Protein Interaction Sites Based on Stratified Attentional Mechanisms. Front Genet 2021;12:784863. [PMID: 34880910 PMCID: PMC8647646 DOI: 10.3389/fgene.2021.784863] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Accepted: 10/08/2021] [Indexed: 11/19/2022] Open

Liu Y, Jin S, Gao H, Wang X, Wang C, Zhou W, Yu B. Predicting the multi-label protein subcellular localization through multi-information fusion and MLSI dimensionality reduction based on MLFE classifier. Bioinformatics 2021;38:1223-1230. [PMID: 34864897 PMCID: PMC8690230 DOI: 10.1093/bioinformatics/btab811] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2021] [Revised: 11/17/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023] Open

Abstract

MOTIVATION

Multi-label (ML) protein subcellular localization (SCL) is an indispensable way to study protein function. It can locate a certain protein (such as the human transmembrane protein that promotes the invasion of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) or expression product at a specific location in a cell, which can provide a reference for clinical treatment of diseases such as coronavirus disease 2019 (COVID-19).

RESULTS

The article proposes a novel method named ML-locMLFE. First of all, six feature extraction methods are adopted to obtain protein effective information. These methods include pseudo amino acid composition, encoding based on grouped weight, gene ontology, multi-scale continuous and discontinuous, residue probing transformation and evolutionary distance transformation. In the next part, we utilize the ML information latent semantic index method to avoid the interference of redundant information. In the end, ML learning with feature-induced labeling information enrichment is adopted to predict the ML protein SCL. The Gram-positive bacteria dataset is chosen as a training set, while the Gram-negative bacteria dataset, virus dataset, newPlant dataset and SARS-CoV-2 dataset as the test sets. The overall actual accuracy of the first four datasets are 99.23%, 93.82%, 93.24% and 96.72% by the leave-one-out cross validation. It is worth mentioning that the overall actual accuracy prediction result of our predictor on the SARS-CoV-2 dataset is 72.73%. The results indicate that the ML-locMLFE method has obvious advantages in predicting the SCL of ML protein, which provides new ideas for further research on the SCL of ML protein.

AVAILABILITY AND IMPLEMENTATION

The source codes and datasets are publicly available at https://github.com/QUST-AIBBDRC/ML-locMLFE/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Zou Y, Ding Y, Peng L, Zou Q. FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation. Interdiscip Sci 2021;14:372-384. [PMID: 34743286 DOI: 10.1007/s12539-021-00489-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2021] [Revised: 10/11/2021] [Accepted: 10/24/2021] [Indexed: 12/01/2022]

Ullah M, Han K, Hadi F, Xu J, Song J, Yu DJ. PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection. Brief Bioinform 2021;22:bbab278. [PMID: 34337652 PMCID: PMC8574991 DOI: 10.1093/bib/bbab278] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2021] [Revised: 06/30/2021] [Accepted: 07/01/2021] [Indexed: 01/17/2023] Open

Abstract

Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods have limited capability in terms of the overall accuracy, time consumption and generalization power. To address these problems, in this study, we developed a novel computational approach based on human protein atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (by employing pretrained deep learning model) features from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To further obtain a more informative feature subset, support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was applied to the integrated feature set. Finally, the classification models, namely support vector machine with radial basis function (SVM-RBF) and support vector machine with linear kernel (SVM-LNR), were learned on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the maximum performance on 10-fold cross validation test on this dataset and showed a better efficacy over existing predictors. Furthermore, we also illustrated the generalization ability of the proposed method by conducting a stringent independent validation test.

Collapse

Ali Shah SM, Ou YY. TRP-BERT: Discrimination of transient receptor potential (TRP) channels using contextual representations from deep bidirectional transformer based on BERT. Comput Biol Med 2021;137:104821. [PMID: 34508974 DOI: 10.1016/j.compbiomed.2021.104821] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/16/2022]

Lyu Z, Wang Z, Luo F, Shuai J, Huang Y. Protein Secondary Structure Prediction With a Reductive Deep Learning Method. Front Bioeng Biotechnol 2021;9:687426. [PMID: 34211967 PMCID: PMC8240957 DOI: 10.3389/fbioe.2021.687426] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Accepted: 04/26/2021] [Indexed: 12/12/2022] Open

Qian Y, Jiang L, Ding Y, Tang J, Guo F. A sequence-based multiple kernel model for identifying DNA-binding proteins. BMC Bioinformatics 2021;22:291. [PMID: 34058979 PMCID: PMC8167993 DOI: 10.1186/s12859-020-03875-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2020] [Accepted: 11/13/2020] [Indexed: 11/18/2022] Open

Component Parts of Bacteriophage Virions Accurately Defined by a Machine-Learning Approach Built on Evolutionary Features. mSystems 2021;6:e0024221. [PMID: 34042467 PMCID: PMC8269216 DOI: 10.1128/msystems.00242-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Abstract

Antimicrobial resistance (AMR) continues to evolve as a major threat to human health, and new strategies are required for the treatment of AMR infections. Bacteriophages (phages) that kill bacterial pathogens are being identified for use in phage therapies, with the intention to apply these bactericidal viruses directly into the infection sites in bespoke phage cocktails. Despite the great unsampled phage diversity for this purpose, an issue hampering the roll out of phage therapy is the poor quality annotation of many of the phage genomes, particularly for those from infrequently sampled environmental sources. We developed a computational tool called STEP³ to use the “evolutionary features” that can be recognized in genome sequences of diverse phages. These features, when integrated into an ensemble framework, achieved a stable and robust prediction performance when benchmarked against other prediction tools using phages from diverse sources. Validation of the prediction accuracy of STEP³ was conducted with high-resolution mass spectrometry analysis of two novel phages, isolated from a watercourse in the Southern Hemisphere. STEP³ provides a robust computational approach to distinguish specific and universal features in phages to improve the quality of phage cocktails and is available for use at http://step3.erc.monash.edu/.

IMPORTANCE In response to the global problem of antimicrobial resistance, there are moves to use bacteriophages (phages) as therapeutic agents. Selecting which phages will be effective therapeutics relies on interpreting features contributing to shelf-life and applicability to diagnosed infections. However, the protein components of the phage virions that dictate these properties vary so much in sequence that best estimates suggest failure to recognize up to 90% of them. We have utilized this diversity in evolutionary features as an advantage, to apply machine learning for prediction accuracy for diverse components in phage virions. We benchmark this new tool showing the accurate recognition and evaluation of phage component parts using genome sequence data of phages from undersampled environments, where the richest diversity of phage still lies.

Collapse

Nanni L, Brahnam S. Robust ensemble of handcrafted and learned approaches for DNA-binding proteins. APPLIED COMPUTING AND INFORMATICS 2021. [DOI: 10.1108/aci-03-2021-0051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Abstract Purpose Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks. Design/methodology/approach Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system. Findings The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system. Originality/value Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field. Collapse