1
Teng S, Wang B, Yang F, Yi X, Zhang X, Sun Y. MediDRNet: Tackling category imbalance in diabetic retinopathy classification with dual-branch learning and prototypical contrastive learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 253:108230. [PMID: 38810377 DOI: 10.1016/j.cmpb.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 04/17/2024] [Accepted: 05/14/2024] [Indexed: 05/31/2024]
Abstract
BACKGROUND AND OBJECTIVE The classification of diabetic retinopathy (DR) aims to utilize the implicit information in images for early diagnosis, to prevent and mitigate the further worsening of the condition. However, existing methods are often limited by the need to operate within large, annotated datasets to show significant advantages. Additionally, the number of samples for different categories within the dataset needs to be evenly distributed, because the characteristic of sample imbalance distribution can lead to an excessive focus on high-frequency disease categories, while neglecting the less common but equally important disease categories. Therefore, there is an urgent need to develop a new classification method that can effectively alleviate the issue of sample distribution imbalance, thereby enhancing the accuracy of diabetic retinopathy classification.
METHODS In this work, we propose MediDRNet, a dual-branch network model based on prototypical contrastive learning. This model adopts prototype contrastive learning, creating prototypes for different levels of lesions, ensuring they represent the core features of each lesion level. It classifies by comparing the similarity between data points and their category prototypes. Our dual-branch network structure effectively resolves the issue of category imbalance and improves classification accuracy by emphasizing subtle differences in retinal lesions. Moreover, our approach combines a dual-branch network with specific lesion-level prototypes for core feature representation and incorporates the convolutional block attention module for enhanced lesion feature identification.
RESULTS Our experiments using both the Kaggle and UWF classification datasets have demonstrated that MediDRNet exhibits exceptional performance compared to other advanced models in the industry, especially on the UWF DR classification dataset where it achieved state-of-the-art performance across all metrics. On the Kaggle DR classification dataset, it achieved the highest average classification accuracy (0.6327) and Macro-F1 score (0.6361). Particularly in the classification tasks for minority categories of diabetic retinopathy on the Kaggle dataset (Grades 1, 2, 3, and 4), the model reached high classification accuracies of 58.08%, 55.32%, 69.73%, and 90.21%, respectively. In the ablation study, the MediDRNet model proved to be more effective in feature extraction from diabetic retinal fundus images compared to other feature extraction methods.
CONCLUSIONS This study employed prototype contrastive learning and bidirectional branch learning strategies, successfully constructing a grading system for diabetic retinopathy lesions within imbalanced diabetic retinopathy datasets. Through a dual-branch network, the feature learning branch effectively facilitated a smooth transition of features from the grading network to the classification learning branch, accurately identifying minority sample categories. This method not only effectively resolved the issue of sample imbalance but also provided strong support for the precise grading and early diagnosis of diabetic retinopathy in clinical applications, showcasing exceptional performance in handling complex diabetic retinopathy datasets. Moreover, this research significantly improved the efficiency of prevention and management of disease progression in diabetic retinopathy patients within medical practice. We encourage the use and modification of our code, which is publicly accessible on GitHub: https://github.com/ReinforceLove/MediDRNet.
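The prototype-comparison step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a prototype is simply the mean embedding of a class and that similarity is cosine similarity; the abstract does not state either detail, and the function names and toy embeddings below are hypothetical rather than taken from the MediDRNet repository.

```python
import math

def class_prototypes(embeddings, labels):
    """Build one prototype per class as the mean of its embeddings (an assumption)."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(vec, prototypes):
    """Assign the class whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))

# Toy, imbalanced data: two samples of grade 0, a single sample of grade 4.
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
train_y = [0, 0, 4]
prototypes = class_prototypes(train_x, train_y)
prediction = predict([0.2, 0.8], prototypes)  # closest to the grade 4 prototype
```

Because every class contributes exactly one prototype regardless of how many samples it has, the comparison step itself does not favor the majority class, which is one plausible reading of why prototypes help with imbalance.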
Affiliation(s)
- Siying Teng
  - Department of Ophthalmology, the First Hospital of Jilin University, Changchun, 130021, Jilin, China
- Bo Wang
  - University of Minho, Braga, 4710-057, Braga District, Portugal
- Feiyang Yang
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
- Xingcheng Yi
  - Laboratory of Cancer Precision Medicine, the First Hospital of Jilin University, Changchun, 130013, Jilin, China
- Xinmin Zhang
  - Department of Regenerative Medicine, School of Pharmaceutical Science, Jilin University, Changchun, 130021, Jilin, China
- Yabin Sun
  - Department of Ophthalmology, the First Hospital of Jilin University, Changchun, 130021, Jilin, China
2
Mahmud MS, Fattah SA, Saquib M, Saha O. Emotion recognition with reduced channels using CWT based EEG feature representation and a CNN classifier. Biomed Phys Eng Express 2024; 10:045003. [PMID: 38457844 DOI: 10.1088/2057-1976/ad31f9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 03/08/2024] [Indexed: 03/10/2024]
Abstract
Objective. Although emotion recognition has been studied for decades, a more accurate classification method that requires less computing is still needed. At present, many studies extract EEG features from all channels to recognize emotional states; however, there is a lack of an efficient feature domain that improves classification performance and reduces the number of EEG channels.
Approach. In this study, a continuous wavelet transform (CWT)-based feature representation of multi-channel EEG data is proposed for automatic emotion recognition. In the proposed feature, the time-frequency domain information is preserved by using CWT coefficients. For a particular EEG channel, each CWT coefficient is mapped into a strength-to-entropy component ratio to obtain a 2D representation. Finally, a 2D feature matrix, namely CEF2D, is created by concatenating these representations from different channels and fed into a deep convolutional neural network architecture. Based on the CWT domain energy-to-entropy ratio, effective channel and CWT scale selection schemes are also proposed to reduce computational complexity.
Main results. Compared with previous studies, the results of this study show that valence and arousal classification accuracy has improved in both 3-class and 2-class cases. For the 2-class problem, the average accuracies obtained for valence and arousal dimensions are 98.83% and 98.95%, respectively, and for the 3-class problem, the accuracies are 98.25% and 98.68%, respectively.
Significance. Our findings show that the entropy-based feature of EEG data in the CWT domain is effective for emotion recognition. Utilizing the proposed feature domain, an effective channel selection method can reduce computational complexity.
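To make the energy-to-entropy idea concrete, here is a rough sketch of ranking CWT scales by such a ratio. The abstract does not give the exact definition, so this assumes energy is the sum of squared coefficients at a scale and entropy is the Shannon entropy of the normalized squared coefficients; the coefficient rows are toy values standing in for the output of a real CWT routine, and all names are hypothetical.

```python
import math

def energy_entropy_ratio(coeffs, eps=1e-12):
    """Energy divided by Shannon entropy of the normalized energy distribution."""
    energy = sum(c * c for c in coeffs)
    if energy == 0:
        return 0.0
    probs = [c * c / energy for c in coeffs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return energy / (entropy + eps)  # eps guards against zero entropy

def select_scales(coeffs_by_scale, k):
    """Return the names of the k scales with the largest ratio."""
    ranked = sorted(
        coeffs_by_scale,
        key=lambda name: energy_entropy_ratio(coeffs_by_scale[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy coefficients for one EEG channel at three scales (not real data).
coeffs_by_scale = {
    "scale_1": [1.0, 1.0, 1.0, 1.0],
    "scale_2": [2.0, -2.0, 2.0, -2.0],
    "scale_3": [0.1, 0.1, 0.1, 0.1],
}
selected = select_scales(coeffs_by_scale, k=2)
```

The same ranking could be applied across channels instead of scales; dropping low-ranked rows before building the 2D feature matrix is what would reduce the computation.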
Affiliation(s)
- Md Sultan Mahmud
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park-16802, PA, United States of America
- Shaikh Anowarul Fattah
  - Department of Electrical and Electronic Engineering (EEE), Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Mohammad Saquib
  - Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson-75080, TX, United States of America
- Oishy Saha
  - Department of Electrical and Computer Engineering, The University of Maryland-College Park, College Park-20742, MD, United States of America
3
Wang JM, Cui RK, Qian ZK, Yang ZZ, Li Y. Mining channel-regulated peptides from animal venom by integrating sequence semantics and structural information. Comput Biol Chem 2024; 109:108027. [PMID: 38340414 DOI: 10.1016/j.compbiolchem.2024.108027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/24/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
Channel-regulated peptides (CRPs) derived from animal venom hold great promise as potential drug candidates for numerous diseases associated with channel proteins. However, discovering and identifying CRPs using traditional bio-experimental methods is a time-consuming and laborious process. Although there have been a few computational studies on CRPs, they were limited to specific channel proteins, relied heavily on complex feature engineering, and lacked the incorporation of multi-source information. To address these problems, we proposed a novel deep learning model, called DeepCRPs, based on graph neural networks for systematically mining CRPs from animal venom. By combining the sequence semantic and structural information, the classification performance of four CRPs was significantly enhanced, reaching an accuracy of 0.92. This performance surpassed baseline models with accuracies ranging from 0.77 to 0.89. Furthermore, we employed advanced interpretable techniques to explore sequence and structural determinants relevant to the classification of CRPs, yielding potentially valuable bio-function interpretations. Comprehensive experimental results demonstrated the precision and interpretive capability of DeepCRPs, making it an accurate and bio-explainable suite for the identification and categorization of CRPs. Our research will contribute to the discovery and development of toxin peptides targeting channel proteins. The source data and code are freely available at https://github.com/liyigerry/DeepCRPs.
Affiliation(s)
- Jian-Ming Wang
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Rong-Kai Cui
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Zheng-Kun Qian
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Zi-Zhong Yang
  - Yunnan Provincial Key Laboratory of Entomological Biopharmaceutical R&D, College of Pharmacy, Dali University, Dali, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China
4
Yan J, Zhang B, Zhou M, Campbell-Valois FX, Siu SWI. A deep learning method for predicting the minimum inhibitory concentration of antimicrobial peptides against Escherichia coli using Multi-Branch-CNN and Attention. mSystems 2023; 8:e0034523. [PMID: 37431995 PMCID: PMC10506472 DOI: 10.1128/msystems.00345-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 05/31/2023] [Indexed: 07/12/2023] Open
Abstract
Antimicrobial peptides (AMPs) are a promising alternative to antibiotics to combat drug resistance in pathogenic bacteria. However, the development of AMPs with high potency and specificity remains a challenge, and new tools to evaluate antimicrobial activity are needed to accelerate the discovery process. Therefore, we proposed MBC-Attention, a combination of a multi-branch convolution neural network architecture and attention mechanisms to predict the experimental minimum inhibitory concentration of peptides against Escherichia coli. The optimal MBC-Attention model achieved an average Pearson correlation coefficient (PCC) of 0.775 and a root mean squared error (RMSE) of 0.533 (log μM) in three independent tests of randomly drawn sequences from the data set. This results in a 5-12% improvement in PCC and a 6-13% improvement in RMSE compared to 17 traditional machine learning models and 2 optimally tuned models using random forest and support vector machine. Ablation studies confirmed that the two proposed attention mechanisms, global attention and local attention, contributed largely to performance improvement.
IMPORTANCE Antimicrobial peptides (AMPs) are potential candidates for replacing conventional antibiotics to combat drug resistance in pathogenic bacteria. Therefore, it is necessary to evaluate the antimicrobial activity of AMPs quantitatively. However, wet-lab experiments are labor-intensive and time-consuming. To accelerate the evaluation process, we develop a deep learning method called MBC-Attention to regress the experimental minimum inhibitory concentration of AMPs against Escherichia coli. The proposed model outperforms traditional machine learning methods. Data, scripts to reproduce experiments, and the final production models are available on GitHub.
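For readers less familiar with the two regression metrics quoted in this abstract, the following sketch computes a Pearson correlation coefficient and a root mean squared error from paired values. The numbers are invented toy values on a log scale, not data from the study, and the function names are illustrative.

```python
import math

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Toy measured vs. predicted values (hypothetical, log scale).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
pcc = pearson(measured, predicted)   # perfectly correlated despite the offset
error = rmse(measured, predicted)    # constant offset of 0.5
```

The toy case shows why both metrics are reported together: a constant bias leaves the correlation at 1 while the RMSE still registers the error.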
Affiliation(s)
- Jielu Yan
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Bob Zhang
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Mingliang Zhou
  - School of Computer Science, Chongqing University, Shapingba, Chongqing, China
- François-Xavier Campbell-Valois
  - Host-Microbe Interactions Laboratory, Center for Chemical and Synthetic Biology, Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
  - Centre for Infection, Immunity, and Inflammation, University of Ottawa, Ottawa, Ontario, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, Canada
- Shirley W. I. Siu
  - Institute of Science and Environment, University of Saint Joseph, Macau, China
5
Lerksuthirat T, On‐yam P, Chitphuk S, Stitchantrakul W, Newburg DS, Morrow AL, Hongeng S, Chiangjong W, Chutipongtanate S. ALA-A2 Is a Novel Anticancer Peptide Inspired by Alpha-Lactalbumin: A Discovery from a Computational Peptide Library, In Silico Anticancer Peptide Screening and In Vitro Experimental Validation. GLOBAL CHALLENGES (HOBOKEN, NJ) 2023; 7:2200213. [PMID: 36910465 PMCID: PMC10000267 DOI: 10.1002/gch2.202200213] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/18/2022] [Indexed: 06/18/2023]
Abstract
Anticancer peptides (ACPs) are rising as a new strategy for cancer therapy. However, traditional laboratory screening to find and identify novel ACPs from hundreds to thousands of peptides is costly and time consuming. Here, a sequential procedure is applied to identify candidate ACPs from a computer-generated peptide library inspired by alpha-lactalbumin, a milk protein with known anticancer properties. A total of 2688 distinct peptides, 5-25 amino acids in length, are generated from alpha-lactalbumin. In silico ACP screening using the physicochemical and structural filters and three machine learning models leads to the top candidate peptides ALA-A1 and ALA-A2. In vitro screening against five human cancer cell lines supports ALA-A2 as the positive hit. ALA-A2 selectively kills A549 lung cancer cells in a dose-dependent manner, with no hemolytic side effects, and acts as a cell penetrating peptide without membranolytic effects. Sequential window acquisition of all theoretical fragment ions-proteomics and functional validation reveal that ALA-A2 induces autophagy to mediate lung cancer cell death. This approach to identify ALA-A2 is time- and cost-effective. Further investigations are warranted to elucidate the exact intracellular targets of ALA-A2. Moreover, these findings support the use of larger computational peptide libraries built upon multiple proteins to further advance ACP research and development.
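One simple way to build a peptide library of the kind described is to enumerate every contiguous fragment of the parent protein within the stated length range. The abstract does not say how the library was generated, so the sliding-window approach below is an assumption, and the short parent sequence is a placeholder rather than the alpha-lactalbumin sequence.

```python
def peptide_library(protein, min_len=5, max_len=25):
    """Enumerate distinct contiguous fragments with lengths in [min_len, max_len]."""
    peptides = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            peptides.add(protein[start:start + length])
    # Sort by length, then alphabetically, for a reproducible order.
    return sorted(peptides, key=lambda p: (len(p), p))

# Placeholder 10-residue parent sequence (hypothetical, not a real protein).
parent = "ACDEFGHIKL"
library = peptide_library(parent)
```

Using a set removes duplicate fragments, which matters for proteins with repeated motifs. Each fragment would then pass through the physicochemical filters and machine learning models mentioned in the abstract, which are not sketched here.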
Affiliation(s)
- Tassanee Lerksuthirat
  - Research Center, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Pasinee On‐yam
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
  - Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Sermsiri Chitphuk
  - Research Center, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Wasana Stitchantrakul
  - Research Center, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- David S. Newburg
  - Division of Epidemiology, Department of Environmental and Public Health Sciences, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Ardythe L. Morrow
  - Division of Epidemiology, Department of Environmental and Public Health Sciences, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
  - Division of Infectious Diseases, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Suradej Hongeng
  - Division of Hematology and Oncology, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Somchai Chutipongtanate
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
  - Division of Epidemiology, Department of Environmental and Public Health Sciences, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
6
Singh A, Tiwari AK. Machine learning-based approach for prediction of ion channels and their subclasses. J Cell Biochem 2023; 124:72-88. [PMID: 36271914 DOI: 10.1002/jcb.30343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 10/10/2022] [Accepted: 10/12/2022] [Indexed: 01/25/2023]
Abstract
Ion channels are ion-permeable protein pores that are found in all cell lipid membranes. Distinct ion channels play multiple roles in biological processes. Proteomic data is accumulating quickly as a result of the fast growth of mass spectrometry, giving us the chance to comprehensively explore ion channel classes along with their subclasses. This paper proposes an eXtreme Gradient Boosting (XGBoost)-based method to estimate the ion channel classes and their subclasses. Here, 12 feature vectors are applied to better characterize protein sequences: amino acid composition, pseudo-amino acid composition, normalized Moreau-Broto autocorrelation, amphiphilic pseudo-amino acid composition, dipeptide composition, Geary autocorrelation, tripeptide composition, sequence-order-coupling number, composition/transition/distribution, conjoint triad, Moran autocorrelation, and quasi-sequence-order descriptors. A total of 9920 features are extracted from the protein sequence. Principal component analysis is applied to determine the optimal number of features to optimize the performance. In 10-fold cross-validation, the proposed XGBoost-based approach with the optimal 50 features achieved accuracies of 100%, 98.70%, 98.77%, 97.26%, 87.40%, 97.39%, 98.03%, and 96.42%, and F1-scores of 100%, 99%, 99%, 97%, 87%, 97%, 98%, and 97%, for prediction of ion channel and non-ion channel, voltage-gated and ligand-gated ion channels, subclasses of voltage-gated ion channels (VGICs), subclasses of ligand-gated ion channels (LGICs), subclasses of voltage-gated calcium channels (VGCCs), subclasses of voltage-gated potassium channels (VGKCs), subclasses of voltage-gated sodium channels (VGSCs), and subclasses of voltage-gated chloride channels, respectively.
The proposed approach is also compared with other classifiers, namely support vector machine, k-nearest neighbor, Gaussian Naïve Bayes, and random forest, and with existing methods: a support vector machine (SVM) with maximum relevance maximum distance, with accuracies of 86.6%, 83.7%, and 85.1% for ion channels, non-ion channels, and overall, respectively, and an SVM with a radial basis function kernel, with accuracies of 100%, 97%, and 99.9% for ion channels, non-ion channels, and overall, respectively.
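Two of the twelve descriptors named in this abstract, amino acid composition and dipeptide composition, are simple enough to sketch directly. The version below uses the standard definitions (relative frequency of each of the 20 residues and of each of the 400 ordered residue pairs); the function names and the toy sequence are illustrative, and the remaining descriptors, the principal component analysis step, and the XGBoost classifier are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(seq):
    """Relative frequency of each standard amino acid (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Relative frequency of each ordered residue pair (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)]

# Toy sequence (hypothetical); real inputs would be full protein sequences.
sequence = "MKKLL"
aac = aa_composition(sequence)
dpc = dipeptide_composition(sequence)
features = aac + dpc  # concatenated 420-dimensional feature vector
```

Concatenating descriptors in this way is how a feature vector can grow to thousands of dimensions, which is the motivation for the dimensionality reduction step described in the abstract.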
Affiliation(s)
- Anuj Singh
  - Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India
- Arvind Kumar Tiwari
  - Department of Computer Science and Engineering, Kamla Nehru Institute of Technology, Sultanpur, Uttar Pradesh, India
7
Detecting Coronary Artery Disease from Computed Tomography Images Using a Deep Learning Technique. Diagnostics (Basel) 2022; 12:diagnostics12092073. [PMID: 36140475 PMCID: PMC9498285 DOI: 10.3390/diagnostics12092073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 08/13/2022] [Accepted: 08/23/2022] [Indexed: 11/16/2022] Open
Abstract
In recent times, coronary artery disease (CAD) has become one of the leading causes of morbidity and mortality across the globe. Diagnosing the presence and severity of CAD in individuals is essential for choosing the best course of treatment. Presently, computed tomography (CT) provides high spatial resolution images of the heart and coronary arteries in a short period. On the other hand, there are many challenges in analyzing cardiac CT scans for signs of CAD. Research studies apply machine learning (ML) for high accuracy and consistent performance to overcome the limitations. It allows excellent visualization of the coronary arteries with high spatial resolution. Convolutional neural networks (CNN) are widely applied in medical image processing to identify diseases. However, there is a demand for efficient feature extraction to enhance the performance of ML techniques. The feature extraction process is one of the factors in improving ML techniques’ efficiency. Thus, the study intends to develop a method to detect CAD from CT angiography images. It proposes a feature extraction method and a CNN model for detecting the CAD in minimum time with optimal accuracy. Two datasets are utilized to evaluate the performance of the proposed model. The present work is unique in applying a feature extraction model with CNN for CAD detection. The experimental analysis shows that the proposed method achieves 99.2% and 98.73% prediction accuracy, with F1 scores of 98.95 and 98.82 for benchmark datasets. In addition, the outcome suggests that the proposed CNN model achieves the area under the receiver operating characteristic and precision-recall curve of 0.92 and 0.96, 0.91 and 0.90 for datasets 1 and 2, respectively. The findings highlight that the performance of the proposed feature extraction and CNN model is superior to the existing models.
8
Zhou J, Wu Z, Jiang Z, Huang K, Guo K, Zhao S. Background selection schema on deep learning-based classification of dermatological disease. Comput Biol Med 2022; 149:105966. [PMID: 36029748 DOI: 10.1016/j.compbiomed.2022.105966] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/12/2022] [Revised: 07/28/2022] [Accepted: 08/13/2022] [Indexed: 11/03/2022]
Abstract
Skin diseases are one of the most common ailments affecting humans. Artificial intelligence based on deep learning can significantly improve the efficiency of identifying skin disorders and alleviate the scarcity of medical resources. However, the distribution of background information in dermatological datasets is imbalanced, causing generalized deep learning models to perform poorly in skin disease classification. We propose a deep learning schema that combines data preprocessing, data augmentation, and residual networks to study the influence of color-based background selection on a deep model's capacity to learn foreground lesion subject attributes in a skin disease classification problem. First, clinical photographs are annotated by dermatologists, and then the original background information is masked with unique colors to generate several subsets with distinct background colors. Sample-balanced training and test sets are generated using random over/undersampling and data augmentation techniques. Finally, the deep learning networks are independently trained on diverse subsets of backdrop colors to compare the performance of classifiers based on different background information. Extensive experiments demonstrate that color-based background information significantly affects the classification of skin diseases and that classifiers trained on the green subset achieve state-of-the-art performance for classifying black and red skin lesions.
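The sample-balancing step can be illustrated with a minimal random oversampling sketch. It assumes balancing means duplicating randomly chosen minority-class samples until every class matches the largest one; the abstract also mentions undersampling and data augmentation, which are not shown, and the file names and labels are made up.

```python
import random
from collections import Counter, defaultdict

def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for item in items + extra:
            out_samples.append(item)
            out_labels.append(label)
    return out_samples, out_labels

# Toy dataset: three images of a common class, one of a rare class.
samples = ["img_a", "img_b", "img_c", "img_d"]
labels = ["common", "common", "common", "rare"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

In practice, duplicated images would normally be passed through augmentation so that the copies are not identical, which is presumably why the abstract pairs resampling with data augmentation.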
Affiliation(s)
- Jiancun Zhou
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; College of Information and Electronic Engineering, Hunan City University, Yiyang 413000, China
- Zheng Wu
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zixi Jiang
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China; Hunan Engineering Research Center of Skin Health and Disease, Xiangya Hospital, Central South University, Changsha, China; Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China; National Clinical Research Center of Geriatric Disorders, Xiangya Hospital, Central South University, China
- Kai Huang
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China; Hunan Engineering Research Center of Skin Health and Disease, Xiangya Hospital, Central South University, Changsha, China; Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China; National Clinical Research Center of Geriatric Disorders, Xiangya Hospital, Central South University, China
- Kehua Guo
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Shuang Zhao
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China; Hunan Engineering Research Center of Skin Health and Disease, Xiangya Hospital, Central South University, Changsha, China; Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China; National Clinical Research Center of Geriatric Disorders, Xiangya Hospital, Central South University, China
9
Thi Phan L, Woo Park H, Pitti T, Madhavan T, Jeon YJ, Manavalan B. MLACP 2.0: An updated machine learning tool for anticancer peptide prediction. Comput Struct Biotechnol J 2022; 20:4473-4480. [PMID: 36051870 PMCID: PMC9421197 DOI: 10.1016/j.csbj.2022.07.043] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 04/18/2022] [Revised: 07/25/2022] [Accepted: 07/25/2022] [Indexed: 12/24/2022] Open
Abstract
We present a novel meta-approach, MLACP 2.0, and implement it as a user-friendly webserver for the accurate identification of anticancer peptides (ACPs). MLACP 2.0 employed 11 different encoding schemes and eight different classifiers, including convolutional neural networks, to create a stable meta-model. A benchmarking study has demonstrated that MLACP 2.0 achieves superior performance in ACP prediction compared to publicly available state-of-the-art predictors.
Anticancer peptides are an emerging class of anticancer drugs that offer fewer side effects and are more effective than chemotherapy and targeted therapy. Predicting anticancer peptides from sequence information is one of the most challenging tasks in immunoinformatics. In the past ten years, machine learning-based approaches have been proposed for identifying ACP activity from peptide sequences. These methods include our previous method MLACP (developed in 2017), which made a significant impact on anticancer research. The MLACP tool has been widely used by the research community; however, its robustness must be improved significantly for its continued practical application. In this study, the first large non-redundant training and independent datasets were constructed for ACP research. Using the training dataset, the study explored a wide range of feature encodings and developed their respective models using seven different conventional classifiers. Subsequently, a subset of encoding-based models was selected for each classifier based on performance; their predicted scores were concatenated and used to train a convolutional neural network (CNN), and the resulting predictor is named MLACP 2.0. The evaluation of MLACP 2.0 with a very diverse independent dataset showed excellent performance, and it significantly outperformed recent ACP prediction tools. Additionally, MLACP 2.0 exhibits superior performance during cross-validation and independent assessment when compared to CNN-based embedding models and conventional single models. Consequently, we anticipate that our proposed MLACP 2.0 will facilitate the design of hypothesis-driven experiments by making it easier to discover novel ACPs. MLACP 2.0 is freely available at https://balalab-skku.org/mlacp2.
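The meta-model idea, in which scores from several base models are concatenated into a new feature vector for a second-stage learner, can be sketched as follows. The two scoring functions are hypothetical stand-ins for trained base models, and the second-stage CNN is not implemented; only the construction of the stacked score matrix is shown.

```python
def cationic_fraction(seq):
    """Stand-in base model 1: fraction of lysine and arginine residues."""
    return sum(seq.count(a) for a in "KR") / len(seq)

def hydrophobic_fraction(seq):
    """Stand-in base model 2: fraction of selected hydrophobic residues."""
    return sum(seq.count(a) for a in "AILMFVW") / len(seq)

def meta_features(sequences, base_models):
    """Concatenate each base model's score into one row per sequence."""
    return [[model(seq) for model in base_models] for seq in sequences]

# Toy peptides (hypothetical); each row would feed the second-stage learner.
peptides = ["KKLL", "GGSS"]
stacked = meta_features(peptides, [cationic_fraction, hydrophobic_fraction])
# stacked == [[0.5, 0.5], [0.0, 0.0]]
```

With many encodings and classifiers, each row grows to one score per selected base model, so the second-stage learner sees a compact summary of all base predictions rather than the raw sequence encodings.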