Reference Citation Analysis

For: Tan JX, Lv H, Wang F, Dao FY, Chen W, Ding H. A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods. Curr Drug Targets 2020;20:540-550. [PMID: 30277150 DOI: 10.2174/1389450119666181002143355] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/21/2018] [Revised: 08/17/2018] [Accepted: 09/04/2018] [Indexed: 12/13/2022]

Cited by Other Article(s)

Singh L, Singh S, Singh DD. A Machine Learning Approach to Identify C Type Lectin Domain (CTLD) Containing Proteins. Protein J 2024:10.1007/s10930-024-10224-x. [PMID: 39068630 DOI: 10.1007/s10930-024-10224-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/07/2024] [Indexed: 07/30/2024]

Abstract

Lectins are sugar-interacting proteins that reversibly bind specific glycans and are present in all forms of life. They have diverse biological functions, such as cell signaling and molecular recognition. C-type lectins (CTL) are a group of proteins in the lectin family that have been studied extensively in animals and are reported to be involved in immune functions, carcinogenesis and cell signaling, among others. The carbohydrate recognition domain (CRD) of CTL has a highly variable protein sequence, and proteins carrying this domain are also referred to as C-type lectin domain containing proteins (CTLD). Because of this low sequence homology, homology-based programs are limited in their ability to identify CTLD among the hypothetical proteins of sequenced genomes. Machine learning (ML) tools use characteristic sequence features to identify homologous sequences, and this approach has been used to develop a tool for identifying CTLD. Initially, 500 well-annotated CTLD sequences and 500 non-CTLD sequences were used to develop the ML model. The LinearSVC classifier from Python's scikit-learn library was used, and characteristic features of CTLD sequences, such as dipeptide and tripeptide composition, were used as training attributes in various classifiers. Precision, recall and Matthews correlation coefficient (MCC) values of 0.92, 0.91 and 0.82, respectively, were obtained on an external test set. After fine-tuning parameters such as kernel, C value, gamma and degree, and increasing the number of non-CTLD sequences, precision, recall and MCC improved to 0.99, 0.99 and 0.96, respectively. The trained model has also identified new CTLD among the hypothetical proteins of the human genome. The tool is available on our local server for interested users.
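
The abstract names dipeptide and tripeptide composition as training features and reports precision, recall and MCC. The stdlib-only Python sketch below illustrates those quantities under stated assumptions: only the 20 standard residues are counted, windows overlap, and counts are normalized to frequencies. The function names are hypothetical and are not taken from the authors' tool.

```python
from __future__ import annotations

import math
from itertools import product

# The 20 standard amino acids; residues outside this set (e.g. X, B, U) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq: str, k: int = 2) -> list[float]:
    """Frequency of every overlapping k-residue peptide in `seq`.

    k=2 gives the 400-dim dipeptide composition, k=3 the 8000-dim
    tripeptide composition. Windows containing non-standard residues
    are skipped, and frequencies are normalized over the counted windows.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Usage with a toy sequence and toy confusion counts (not the paper's data).
vec = kmer_composition("ACACX", k=2)
print(len(vec), vec[1], vec[20])
print(precision(90, 10), recall(90, 15), mcc(90, 85, 10, 15))
```

Vectors produced this way could be stacked into a feature matrix for scikit-learn's `LinearSVC`, the classifier the abstract names; MCC is reported alongside precision and recall because it also accounts for true negatives.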

Huang Y, Lin Y, Lan W, Huang C, Zhong C. GloEC: a hierarchical-aware global model for predicting enzyme function. Brief Bioinform 2024;25:bbae365. [PMID: 39073830 DOI: 10.1093/bib/bbae365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 06/18/2024] [Accepted: 07/12/2024] [Indexed: 07/30/2024]

Yadav AK, Gupta PK, Singh TR. PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs. Mol Divers 2024:10.1007/s11030-024-10937-2. [PMID: 39033257 DOI: 10.1007/s11030-024-10937-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 07/10/2024] [Indexed: 07/23/2024]

Idhaya T, Suruliandi A, Raja SP. A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction. Protein J 2024;43:171-186. [PMID: 38427271 DOI: 10.1007/s10930-024-10181-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/02/2024]

Abstract

Proteomics is a field dedicated to the analysis of proteins in cells, tissues, and organisms, aiming to gain insights into their structures, functions, and interactions. A crucial aspect within proteomics is protein family prediction, which involves identifying evolutionary relationships between proteins by examining similarities in their sequences or structures. This approach holds great potential for applications such as drug discovery and functional annotation of genomes. However, current methods for protein family prediction have certain limitations, including limited accuracy, high false positive rates, and challenges in handling large datasets. Some methods also rely on homologous sequences or protein structures, which introduce biases and restrict their applicability to specific protein families or structures. To overcome these limitations, researchers have turned to machine learning (ML) approaches that can identify connections between protein features and simplify complex high-dimensional datasets. This paper presents a comprehensive survey of articles that employ various ML techniques for predicting protein families. The primary objective is to explore and improve ML techniques specifically for protein family prediction, thus advancing future research in the field. Through qualitative and quantitative analyses of ML techniques, it is evident that multiple methods utilizing a range of classifiers have been applied for protein family prediction. However, there has been limited focus on developing novel classifiers for protein family classification, highlighting the urgent need for improved approaches in this area. By addressing these challenges, this research aims to enhance the accuracy and effectiveness of protein family prediction, ultimately facilitating advancements in proteomics and its diverse applications.

Chen L, Zhang C, Xu J. PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes. BMC Bioinformatics 2024;25:50. [PMID: 38291384 PMCID: PMC10829269 DOI: 10.1186/s12859-024-05665-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 01/22/2024] [Indexed: 02/01/2024]

Abstract

BACKGROUND

Enzymes play an irreplaceable role in sustaining living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correctly identifying the first digit (family class) of the EC number of a given enzyme has been a hot topic over the past twenty years. Several previous methods represented enzymes by their functional domain composition. However, this leads to the curse of dimensionality, which reduces the efficiency of those methods. Moreover, most previous methods can only handle enzymes belonging to a single family class, whereas some enzymes belong to two or more family classes.

RESULTS

In this study, a fast and efficient multi-label classifier named PredictEFC was designed. To construct it, a novel feature extraction scheme was devised for processing the functional domain information of enzymes: it counts the distribution of each functional domain entry across the seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded as a 7-dimensional vector by fusing its functional domain information with these statistics. Random k-labelsets (RAKEL) was adopted to build the classifier, with random forest as the base classification algorithm. The two tenfold cross-validation results on the training dataset showed that PredictEFC reached accuracies of 0.8493 and 0.8370. Independent tests on two datasets yielded accuracies of 0.9118 and 0.8777.

CONCLUSION

The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved: its running time was less than one-tenth of that classifier's. In addition, PredictEFC outperformed classifiers using traditional dimensionality reduction methods as well as some previous methods, and it can be adapted to predict the enzyme family classes of other species. Finally, a web server was set up at http://124.221.158.221/ for easy use.

Ge F, Chen G, Qian M, Xu C, Liu J, Cao J, Li X, Hu D, Xu Y, Xin Y, Wang D, Zhou J, Shi H, Tan Z. Artificial Intelligence Aided Lipase Production and Engineering for Enzymatic Performance Improvement. J Agric Food Chem 2023;71:14911-14930. [PMID: 37800676 DOI: 10.1021/acs.jafc.3c05029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]

Alletto P, Garcia AM, Marchesan S. Short Peptides for Hydrolase Supramolecular Mimicry and Their Potential Applications. Gels 2023;9:678. [PMID: 37754360 PMCID: PMC10529927 DOI: 10.3390/gels9090678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 08/19/2023] [Accepted: 08/21/2023] [Indexed: 09/28/2023]

Khosravi F, Fard EM, Hosseininezhad M, Shoorideh H. Identification and characterization of inulinases by bioinformatics analysis of bacterial glycoside hydrolases family 32 (GH32). Eng Life Sci 2023;23:e2300003. [PMID: 37533727 PMCID: PMC10390659 DOI: 10.1002/elsc.202300003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 05/15/2023] [Accepted: 06/26/2023] [Indexed: 08/04/2023]

Abstract

The glycoside hydrolase family contains enzymes that break the glycosidic bonds of carbohydrates by hydrolysis. Inulinase is one of the most important industrial enzymes in glycoside hydrolase family 32 (GH32). In this study, to identify and classify bacterial inulinases, 16,002 protein sequences belonging to the GH32 family were first retrieved from various databases. The inulin-active enzymes (endoinulinases and exoinulinases) were then identified: eight endoinulinases (EC 3.2.1.7) and 4318 exoinulinases (EC 3.2.1.80). Next, the subcellular localization of these enzymes was predicted, revealing two extracellular endoinulinases and 1232 extracellular exoinulinases. The biochemical properties of 363 enzymes from the genera Arthrobacter, Bacillus and Streptomyces (the most abundant) showed that the isoelectric point (pI) of exoinulinases rises from acidic values with increasing amino acid length: the smallest protein (336 aa) had the most acidic pI (4.39), while the largest (1207 aa) had a pI of 8.84. In addition, a negative grand average of hydropathicity (GRAVY) index indicates that exoinulinases are hydrophilic. Finally, considering the biochemical properties affecting protein stability and post-translational modifications, one endoinulinase and 40 enzymes with desirable characteristics were selected, and their production sources were identified. With the expansion of databases and the development of bioinformatics tools, it is now possible to classify, review and analyze large amounts of data on enzyme-producing strains when screening and isolating candidates, whereas laboratory studies can examine at most 20 to 30 strains. Examining many more strains in silico therefore allowed strains with more stable and efficient enzymes to be selected and introduced for laboratory work.
The findings of this study can help researchers select an appropriate gene source from the introduced strains for cloning and heterologous expression of inulinase, or for extracting native inulinase from these strains.
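
The abstract relies on a negative GRAVY index as evidence that exoinulinases are hydrophilic. A minimal Python sketch of how that index is computed, assuming the standard Kyte-Doolittle hydropathy scale and simply ignoring non-standard residues (other tools may treat those differently); the function name is hypothetical and the code is not from the study.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: the mean Kyte-Doolittle value over
    the standard residues in `seq` (non-standard residues are skipped).
    Negative values indicate an overall hydrophilic protein."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


# Usage with toy peptides: charged residues pull GRAVY below zero.
print(gravy("DEKR"), gravy("IVL"))
```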

Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]

Affiliation(s)

Shanghua Liu State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
Yuchao Liang State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
Jinzhao Li State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
Siqi Yang State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
Ming Liu State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
Chengfang Liu State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
Dezhi Yang Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
Yongchun Zuo State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China

Duong-Trung N, Born S, Kim JW, Schermeyer MT, Paulick K, Borisyak M, Cruz-Bournazou MN, Werner T, Scholz R, Schmidt-Thieme L, Neubauer P, Martinez E. When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development. Biochem Eng J 2022. [DOI: 10.1016/j.bej.2022.108764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]

Nallapareddy MV, Dwivedula R. ABLE: Attention based learning for enzyme classification. Comput Biol Chem 2021;94:107558. [PMID: 34481129 DOI: 10.1016/j.compbiolchem.2021.107558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/12/2020] [Revised: 07/28/2021] [Accepted: 08/10/2021] [Indexed: 11/19/2022]

Yan K, Wen J, Liu JX, Xu Y, Liu B. Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2008-2016. [PMID: 31940548 DOI: 10.1109/tcbb.2020.2966450] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]

Baldazzi D, Savojardo C, Martelli PL, Casadio R. BENZ WS: the Bologna ENZyme Web Server for four-level EC number annotation. Nucleic Acids Res 2021;49:W60-W66. [PMID: 33963861 PMCID: PMC8262719 DOI: 10.1093/nar/gkab328] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/03/2021] [Revised: 04/01/2021] [Accepted: 04/20/2021] [Indexed: 11/12/2022]

Zhao S, Ju Y, Ye X, Zhang J, Han S. Bioluminescent Proteins Prediction with Voting Strategy. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200601122328] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/01/2023]

4mCPred-CNN-Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network. Genes (Basel) 2021;12:genes12020296. [PMID: 33672576 PMCID: PMC7924022 DOI: 10.3390/genes12020296] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/24/2021] [Revised: 02/16/2021] [Accepted: 02/17/2021] [Indexed: 02/07/2023]

Wang H, Xi Q, Liang P, Zheng L, Hong Y, Zuo Y. IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy. Amino Acids 2021;53:239-251. [PMID: 33486591 DOI: 10.1007/s00726-021-02941-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/17/2020] [Accepted: 01/11/2021] [Indexed: 12/18/2022]

Shi W, Chen X, Deng L. A Review of Recent Developments and Progress in Computational Drug Repositioning. Curr Pharm Des 2021;26:3059-3068. [PMID: 31951162 DOI: 10.2174/1381612826666200116145559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/04/2019] [Accepted: 01/09/2020] [Indexed: 12/27/2022]

Rahman A, Susmi TF, Yasmin F, Karim ME, Hossain MU. Functional annotation of an ecologically important protein from Chloroflexus aurantiacus involved in polyhydroxyalkanoates (PHA) biosynthetic pathway. SN Appl Sci 2020. [DOI: 10.1007/s42452-020-03598-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

Wang XF, Gao P, Liu YF, Li HF, Lu F. Predicting Thermophilic Proteins by Machine Learning. Curr Bioinform 2020. [DOI: 10.2174/1574893615666200207094357] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Indexed: 01/05/2023]

Wahab A, Mahmoudi O, Kim J, Chong KT. DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning. Cells 2020;9:E1756. [PMID: 32707969 PMCID: PMC7465362 DOI: 10.3390/cells9081756] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/26/2020] [Revised: 07/17/2020] [Accepted: 07/17/2020] [Indexed: 11/24/2022]

Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. Biomed Res Int 2020;2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]

Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020;26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]

Meng C, Zhang J, Ye X, Guo F, Zou Q. Review and comparative analysis of machine learning-based phage virion protein identification methods. Biochim Biophys Acta Proteins Proteom 2020;1868:140406. [PMID: 32135196 DOI: 10.1016/j.bbapap.2020.140406] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/01/2020] [Revised: 02/14/2020] [Accepted: 02/27/2020] [Indexed: 02/01/2023]

Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N⁶-Methyladenine Sites in the Rice Genome Based on Feature Fusion. Front Plant Sci 2020;11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]

Ostermeier L, Oliva R, Winter R. The multifaceted effects of DMSO and high hydrostatic pressure on the kinetic constants of hydrolysis reactions catalyzed by α-chymotrypsin. Phys Chem Chem Phys 2020;22:16325-16333. [DOI: 10.1039/d0cp03062g] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]

Sun S, Wang C, Ding H, Zou Q. Machine learning and its applications in plant molecular studies. Brief Funct Genomics 2019;19:40-48. [DOI: 10.1093/bfgp/elz036] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 01/16/2023]