Reference Citation Analysis

For: Raza A, Uddin J, Almuhaimeed A, Akbar S, Zou Q, Ahmad A. AIPs-SnTCN: Predicting Anti-Inflammatory Peptides Using fastText and Transformer Encoder-Based Hybrid Word Embedding with Self-Normalized Temporal Convolutional Networks. J Chem Inf Model 2023;63:6537-6554. [PMID: 37905969 DOI: 10.1021/acs.jcim.3c01563] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 11/02/2023]

Cited by Other Article(s)

Nafi MMI. Predicting C- and S-linked Glycosylation sites from protein sequences using protein language models. Comput Biol Med 2025;189:109956. [PMID: 40073495 DOI: 10.1016/j.compbiomed.2025.109956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Revised: 02/25/2025] [Accepted: 02/27/2025] [Indexed: 03/14/2025]

Rizzuto V, Settino M, Stroffolini G, Covello G, Vanags J, Naccarato M, Montanari R, de Lossada CR, Mazzotta C, Forestiero A, Adornetto C, Rechichi M, Ricca F, Greco G, Laganovska G, Borroni D. Ocular surface microbiome: Influences of physiological, environmental, and lifestyle factors. Comput Biol Med 2025;190:110046. [PMID: 40174504 DOI: 10.1016/j.compbiomed.2025.110046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 01/22/2025] [Accepted: 03/16/2025] [Indexed: 04/04/2025]

Abstract

PURPOSE

The ocular surface (OS) microbiome is influenced by various factors and affects ocular health. Understanding its composition and dynamics is crucial for developing targeted interventions for ocular diseases. This study aims to identify host variables, including physiological, environmental, and lifestyle (PEL) factors, that influence ocular microbiome composition and to establish valid associations between the ocular microbiome and health outcomes.

METHODS

16S rRNA gene sequencing was performed on OS samples collected from 135 healthy individuals using eSwab. DNA was extracted, libraries were prepared, and PCR products were purified and analyzed. PEL confounding factors were identified, and a cross-validation strategy using various bioinformatics methods, including machine learning, was used to identify features that classify microbial profiles.

RESULTS

Nationality, allergy, sport practice, and eyeglasses usage are significant PEL confounding factors influencing the eye microbiome. Alpha-diversity analysis revealed significant differences between Spanish and Italian subjects (p-value < 0.001), with a median Shannon index of 1.05 for Spanish subjects and 0.59 for Italian subjects. Additionally, 8 microbial genera were significantly associated with eyeglass usage. Beta-diversity analysis indicated significant differences in microbial community composition based on nationality, age, sport, and eyeglasses usage. Differential abundance analysis identified several microbial genera associated with these PEL factors. The Support Vector Machine (SVM) model for Nationality achieved an accuracy of 100%, with an AUC-ROC score of 1.0, indicating excellent performance in classifying microbial profiles.

CONCLUSION

This study underscores the importance of considering PEL factors when studying the ocular microbiome. Our findings highlight the complex interplay between environmental, lifestyle, and demographic factors in shaping the OS microbiome. Future research should further explore these interactions to develop personalized approaches for managing ocular health.
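The alpha-diversity comparison in the abstract above rests on the Shannon index. A minimal sketch of that calculation follows, assuming the natural logarithm (the abstract does not state the base) and using hypothetical genus-level read counts that are not taken from the study:

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    The natural logarithm is an assumption; some tools use log base 2.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per genus for two samples (illustrative only).
even_sample = [25, 25, 25, 25]   # four equally abundant genera
skewed_sample = [97, 1, 1, 1]    # one dominant genus

if __name__ == "__main__":
    print(f"even:   {shannon_index(even_sample):.3f}")
    print(f"skewed: {shannon_index(skewed_sample):.3f}")
```

A community dominated by one genus yields a lower index than an evenly distributed one, which is the sense in which a median of 0.59 indicates lower diversity than a median of 1.05.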

Affiliation(s)

Vincenzo Rizzuto: Clinic of Ophthalmology, P. Stradins Clinical University Hospital, Riga, Latvia; School of Advanced Studies, Center for Neuroscience, University of Camerino, Camerino, Italy; Latvian American Eye Center (LAAC), Riga, Latvia
Marzia Settino: Department of Mathematics and Computer Science, University of Calabria, Rende, Italy; Institute of High Performance Computing and Networks-National Research Council (ICAR-CNR), Rende, Italy
Giacomo Stroffolini: Department of Infectious-Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Verona, Italy
Giuseppe Covello: Department of Surgical, Medical, Molecular Pathology and Critical Care Medicine, University of Pisa, Pisa, Italy
Juris Vanags: Department of Ophthalmology, Riga Stradins University, Riga, Latvia; Clinic of Ophthalmology, P. Stradins Clinical University Hospital, Riga, Latvia
Marta Naccarato: Clinic of Ophthalmology, P. Stradins Clinical University Hospital, Riga, Latvia; Iris Medical Center, Cosenza, Italy
Roberto Montanari: Pharmacology Institute, Heidelberg University Hospital, Heidelberg, Germany
Carlos Rocha de Lossada: Eyemetagenomics Ltd., London, United Kingdom; Ophthalmology Department, QVision, Almeria, Spain; Ophthalmology Department, Hospital Regional Universitario of Malaga, Malaga, Spain; Department of Surgery, Ophthalmology Area, University of Seville, Seville, Spain
Cosimo Mazzotta: Siena Crosslinking Center, Siena, Italy; Departmental Ophthalmology Unit, USL Toscana Sud Est, Siena, Italy; Postgraduate Ophthalmology School, University of Siena, Siena, Italy
Agostino Forestiero: Institute of High Performance Computing and Networks-National Research Council (ICAR-CNR), Rende, Italy
Carlo Adornetto: Eyemetagenomics Ltd., London, United Kingdom
Miguel Rechichi: Centro Polispecialistico Mediterraneo, Sellia Marina, Italy
Francesco Ricca: Department of Mathematics and Computer Science, University of Calabria, Rende, Italy
Gianluigi Greco: Department of Mathematics and Computer Science, University of Calabria, Rende, Italy
Guna Laganovska: Department of Ophthalmology, Riga Stradins University, Riga, Latvia; Clinic of Ophthalmology, P. Stradins Clinical University Hospital, Riga, Latvia
Davide Borroni: Department of Ophthalmology, Riga Stradins University, Riga, Latvia; Eyemetagenomics Ltd., London, United Kingdom; Centro Oculistico Borroni, Gallarate, Italy

Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network. ACS Omega 2025;10:12403-12416. [PMID: 40191328 PMCID: PMC11966582 DOI: 10.1021/acsomega.4c11449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]

Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025;188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]

Abstract

Peptides are gaining significant attention in diverse fields; the pharmaceutical market, for example, has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI)-driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for the development of new predictors. This comprehensive review addresses that gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizing them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.

Wei Z, Shen Y, Tang X, Wen J, Song Y, Wei M, Cheng J, Zhu X. AVPpred-BWR: antiviral peptides prediction via biological words representation. Bioinformatics 2025;41:btaf126. [PMID: 40152250 PMCID: PMC11968319 DOI: 10.1093/bioinformatics/btaf126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 02/17/2025] [Accepted: 03/26/2025] [Indexed: 03/29/2025] Open

Shamas M, Tauseef H, Ahmad A, Raza A, Ghadi YY, Mamyrbayev O, Momynzhanova K, Alahmadi TJ. Classification of pulmonary diseases from chest radiographs using deep transfer learning. PLoS One 2025;20:e0316929. [PMID: 40096069 PMCID: PMC11913265 DOI: 10.1371/journal.pone.0316929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 12/18/2024] [Indexed: 03/19/2025] Open

Madni HA, Umer RM, Zottin S, Marr C, Foresti GL. FL-W3S: Cross-domain federated learning for weakly supervised semantic segmentation of white blood cells. Int J Med Inform 2025;195:105806. [PMID: 39854783 DOI: 10.1016/j.ijmedinf.2025.105806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Revised: 01/10/2025] [Accepted: 01/21/2025] [Indexed: 01/26/2025]

Abstract

BACKGROUND

Segmentation models for clinical data experience severe performance degradation when trained on a single client from one domain and distributed to other clients from different domains. Federated Learning (FL) provides a solution by enabling multi-party collaborative learning without compromising the confidentiality of clients' private data.

METHODS

In this paper, we propose a cross-domain FL method for Weakly Supervised Semantic Segmentation (FL-W3S) of white blood cells in microscopic images. We perform model training on multiple clients with different data distributions to obtain a global aggregated model using only image-level class labels for semantic segmentation of white blood cells. A multi-class token transformer model learns the relationship between patch tokens and class tokens during collaborative learning and generates class-specific localization maps for mask predictions. To rectify the localization maps, we use patch-level pairwise affinity obtained from patch-to-patch transformer attention.

RESULTS

We evaluate the performance of the proposed semantic segmentation method on two datasets of white blood cells from different domains. Our experimental results show that the proposed method improves performance by 2.56% and 1.39% on the two datasets over existing state-of-the-art methods.

CONCLUSION

The combination of federated learning for collaborative model training while preserving data privacy, alongside white blood cell segmentation techniques for precise cell identification, enhances diagnostic accuracy and personalized treatment strategies in clinical applications, particularly in hematology and pathology. More specifically, it involves isolating white blood cells from blood smears for further analysis such as automated blood cell counting, morphological analysis, cell classification, and disease diagnosis and monitoring.

Fan J, Weng W, Chen Q, Wu H, Wu J. PDG2Seq: Periodic Dynamic Graph to Sequence Model for Traffic Flow Prediction. Neural Netw 2025;183:106941. [PMID: 39642644 DOI: 10.1016/j.neunet.2024.106941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 09/23/2024] [Accepted: 11/16/2024] [Indexed: 12/09/2024]

Gaurav A, Gupta BB, Arya V, Attar RW, Bansal S, Alhomoud A, Chui KT. Smart waste classification in IoT-enabled smart cities using VGG16 and Cat Swarm Optimized random forest. PLoS One 2025;20:e0316930. [PMID: 40019915 PMCID: PMC11870384 DOI: 10.1371/journal.pone.0316930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 12/18/2024] [Indexed: 03/03/2025] Open

Timoneda JC, Vera SV. Behind the mask: Random and selective masking in transformer models applied to specialized social science texts. PLoS One 2025;20:e0318421. [PMID: 39982967 PMCID: PMC11844826 DOI: 10.1371/journal.pone.0318421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 01/16/2025] [Indexed: 02/23/2025] Open

Masud A, Hosen MB, Habibullah M, Anannya M, Kaiser MS. Image captioning in Bengali language using visual attention. PLoS One 2025;20:e0309364. [PMID: 39946345 PMCID: PMC11825021 DOI: 10.1371/journal.pone.0309364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Accepted: 08/11/2024] [Indexed: 02/16/2025] Open

Hemmatian J, Hajizadeh R, Nazari F. Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE. PLoS One 2025;20:e0317396. [PMID: 39928607 PMCID: PMC11809912 DOI: 10.1371/journal.pone.0317396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/29/2024] [Indexed: 02/12/2025] Open

Feifei W, Wenrou S, Jinyue S, Qiaochu D, Jingjing L, Jin L, Junxiang L, Xuhui L, Xiao L, Congfen H. Anti-ageing mechanism of topical bioactive ingredient composition on skin based on network pharmacology. Int J Cosmet Sci 2025;47:134-154. [PMID: 39246148 DOI: 10.1111/ics.13005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/16/2024] [Accepted: 06/28/2024] [Indexed: 09/10/2024]

Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025;290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]

Affiliation(s)

Jianda Yue: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Tingting Li: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Jiawei Xu: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Zihui Chen: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Yaqi Li: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Songping Liang: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Zhonghua Liu: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
Ying Wang: The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China

Li L, Wang R, Zou M, Guo F, Ren Y. Enhanced ResNet-50 for garbage classification: Feature fusion and depth-separable convolutions. PLoS One 2025;20:e0317999. [PMID: 39869568 PMCID: PMC11771864 DOI: 10.1371/journal.pone.0317999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Accepted: 01/08/2025] [Indexed: 01/29/2025] Open

Iqbal MW, Shahab M, Ullah Z, Zheng G, Anjum I, Shazly GA, Mengistie AA, Sun X, Yuan Q. Integrating machine learning and structure-based approaches for repurposing potent tyrosine protein kinase Src inhibitors to treat inflammatory disorders. Sci Rep 2025;15:1836. [PMID: 39805859 PMCID: PMC11730308 DOI: 10.1038/s41598-024-83767-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 12/17/2024] [Indexed: 01/16/2025] Open

Abstract

Tyrosine-protein kinase Src plays a key role in cell proliferation and growth under favorable conditions, but its overexpression and genetic mutations can lead to the progression of various inflammatory diseases. Due to the specificity and selectivity problems of previously discovered inhibitors like dasatinib and bosutinib, we employed an integrated machine learning and structure-based drug repurposing strategy to find novel, targeted, and non-toxic Src kinase inhibitors. Different machine learning models, including random forest (RF), k-nearest neighbors (K-NN), decision tree, and support vector machine (SVM), were trained using already available bioactivity data of Src kinase-targeting compounds. The performance evaluation of these models identified SVM as the best model, which was then used to shortlist 51 highly potent compounds by screening an FDA-approved library of 1040 drugs. Molecular docking and molecular dynamics simulation were subsequently employed to evaluate the binding affinity and stability of the proposed compounds. Orlistat, acarbose, and afatinib were identified as the potent leads, demonstrating stable conformations and stronger interactions, validated by root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (RoG), and hydrogen bond analyses. Molecular Mechanics/Generalized Born Surface Area (MMGBSA) analysis validated their binding affinities by providing comparably lower binding free energies for orlistat (-33.4743 ± 3.8908), acarbose (-19.5455 ± 5.4702), and afatinib (-36.4944 ± 5.4929) than the control, dasatinib (-13.7785 ± 5.8058). Finally, toxicity analysis indicated orlistat and acarbose as the possibly safer therapeutics, eliminating afatinib because it showed significant toxicity concerns. Our investigation supports the use of advanced computational methods in drug discovery and suggests further experimental validation of the proposed Src kinase inhibitors for their safer use against inflammatory diseases. The ultimate aim of this study is to advance the development of effective treatments for inflammatory diseases linked with Src overexpression.
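Of the stability measures named in the abstract above, RMSD is the simplest to state explicitly. The sketch below is a minimal illustration under stated assumptions: the two structures are already superimposed and atom-matched (no alignment step is performed), and the coordinates are hypothetical rather than taken from the study's trajectories.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z).

    Assumes the structures are already aligned; real MD analyses usually
    superimpose each frame onto a reference before computing RMSD.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom reference and a copy displaced by (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
displaced = [(x + 3.0, y + 4.0, z) for x, y, z in reference]

if __name__ == "__main__":
    print(rmsd(reference, reference))
    print(rmsd(reference, displaced))
```

A low, flat RMSD over a trajectory is what "stable conformation" typically refers to in this kind of analysis.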

Shahid, Hayat M, Alghamdi W, Akbar S, Raza A, Kadir RA, Sarker MR. pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning. Sci Rep 2025;15:565. [PMID: 39747941 PMCID: PMC11695694 DOI: 10.1038/s41598-024-84146-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025] Open

Fang S, Hong S, Li Q, Li P, Coats T, Zou B, Kong G. Cross-modal similar clinical case retrieval using a modular model based on contrastive learning and k-nearest neighbor search. Int J Med Inform 2025;193:105680. [PMID: 39500035 DOI: 10.1016/j.ijmedinf.2024.105680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 09/20/2024] [Accepted: 10/28/2024] [Indexed: 12/01/2024]

Han S, Jung H. NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class. PLoS One 2024;19:e0316454. [PMID: 39739883 DOI: 10.1371/journal.pone.0316454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/11/2024] [Indexed: 01/02/2025] Open

Abstract

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model's decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. These findings highlight NATE's capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
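The abstract above reports MCC and F1 Score, two metrics often preferred over plain accuracy on imbalanced classes. A minimal sketch of both, computed from confusion-matrix counts, follows; the counts are hypothetical and are not the study's data, and returning 0.0 for an undefined denominator is a convention assumed here.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts (illustrative only).
TP, TN, FP, FN = 90, 85, 15, 10

if __name__ == "__main__":
    print(f"F1:  {f1_score(TP, FP, FN):.4f}")
    print(f"MCC: {mcc(TP, TN, FP, FN):.4f}")
```

Unlike F1, MCC uses all four cells of the confusion matrix, so it also penalizes poor performance on the negative class.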

Akbar S, Ullah M, Raza A, Zou Q, Alghamdi W. DeepAIPs-Pred: Predicting Anti-Inflammatory Peptides Using Local Evolutionary Transformation Images and Structural Embedding-Based Optimal Descriptors with Self-Normalized BiTCNs. J Chem Inf Model 2024;64:9609-9625. [PMID: 39625463 DOI: 10.1021/acs.jcim.4c01758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]

Abstract

Inflammation is a biological response to harmful stimuli, playing a crucial role in facilitating tissue repair by eradicating pathogenic microorganisms. However, when inflammation becomes chronic, it leads to numerous serious disorders, particularly in autoimmune diseases. Anti-inflammatory peptides (AIPs) have emerged as promising therapeutic agents due to their high specificity, potency, and low toxicity. However, identifying AIPs using traditional in vivo methods is time-consuming and expensive. Recent advancements in computational-based intelligent models for peptides have offered a cost-effective alternative for identifying various inflammatory diseases, owing to their selectivity toward targeted cells with low side effects. In this paper, we propose a novel computational model, namely, DeepAIPs-Pred, for the accurate prediction of AIP sequences. The training samples are represented using LBP-PSSM- and LBP-SMR-based evolutionary image transformation methods. Additionally, to capture contextual semantic features, we employed attention-based ProtBERT-BFD embedding, and QLC for structural features. Furthermore, differential evolution (DE)-based weighted feature integration is utilized to produce a multiview feature vector. SMOTE-Tomek Links are introduced to address the class imbalance problem, and a two-layer feature selection technique is proposed to reduce and select the optimal features. Finally, the novel self-normalized bidirectional temporal convolutional networks (SnBiTCN) are trained using optimal features, achieving a predictive accuracy of 94.92% and an AUC of 0.97. The generalization of our proposed model is validated using two independent datasets, on which it improves accuracy by ∼2% (Ind-I) and ∼10% (Ind-II) over the existing state-of-the-art model. The efficacy and reliability of DeepAIPs-Pred highlight its potential as a valuable and promising tool for drug development and academic research.

Al-Omari AM, Akkam YH, Zyout A, Younis S, Tawalbeh SM, Al-Sawalmeh K, Al Fahoum A, Arnold J. Accelerating antimicrobial peptide design: Leveraging deep learning for rapid discovery. PLoS One 2024;19:e0315477. [PMID: 39705302 DOI: 10.1371/journal.pone.0315477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 11/26/2024] [Indexed: 12/22/2024] Open

Abstract

Antimicrobial peptides (AMPs) are excellent at fighting many different infections. This demonstrates how important it is to make new AMPs that are even better at eliminating infections. The fundamental transformation in a variety of scientific disciplines, which led to the emergence of machine learning techniques, has presented significant opportunities for the development of antimicrobial peptides. Machine learning and deep learning are used to predict antimicrobial peptide efficacy in the study. The main purpose is to overcome the constraints of traditional experimental methods. The Gram-negative bacterium Escherichia coli is the model organism in this study. The investigation assesses 1,360 peptide sequences that exhibit anti-E. coli activity. These peptides' minimal inhibitory concentrations have been observed to be correlated with a set of 34 physicochemical characteristics. Two distinct methodologies are implemented. The first method uses the pre-computed physicochemical attributes of peptides as the fundamental input data for a machine-learning classification approach. In the second method, these fundamental peptide features are converted into signal images, which are then passed to a deep learning neural network. The first and second methods have accuracies of 74% and 92.9%, respectively. The proposed methods were developed to target a single microorganism (Gram-negative E. coli); however, they offer a framework that could potentially be adapted for other types of antimicrobial, antiviral, and anticancer peptides with further validation. Furthermore, they have the potential to result in significant time and cost reductions, as well as the development of innovative AMP-based treatments. This research contributes to the advancement of deep learning-based AMP drug discovery methodologies by generating potent peptides for drug development and application. This discovery has significant implications for the processing of biological data and for computational pharmacology.

Kalal V, Jha BK. Cancer detection with various classification models: A comprehensive feature analysis using HMM to extract a nucleotide pattern. Comput Biol Chem 2024;113:108215. [PMID: 39378821 DOI: 10.1016/j.compbiolchem.2024.108215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2024] [Revised: 09/04/2024] [Accepted: 09/15/2024] [Indexed: 10/10/2024]

Wang Y, Fang C. Cycle-ESM: Generation-assisted classification of antifungal peptides using ESM protein language model. Comput Biol Chem 2024;113:108240. [PMID: 39437594 DOI: 10.1016/j.compbiolchem.2024.108240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2024] [Revised: 09/29/2024] [Accepted: 10/04/2024] [Indexed: 10/25/2024]

Qi D, Liu T. VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning. Biochim Biophys Acta Gen Subj 2024;1868:130721. [PMID: 39426757 DOI: 10.1016/j.bbagen.2024.130721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Revised: 09/24/2024] [Accepted: 10/11/2024] [Indexed: 10/21/2024]

Lu Q, Xu J, Zhang R, Liu H, Wang M, Liu X, Yue Z, Gao Y. RiceSNP-ABST: a deep learning approach to identify abiotic stress-associated single nucleotide polymorphisms in rice. Brief Bioinform 2024;26:bbae702. [PMID: 39757606 PMCID: PMC11962596 DOI: 10.1093/bib/bbae702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2024] [Revised: 11/16/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025] Open

Affiliation(s)

Quan Lu School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Jiajun Xu School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Renyi Zhang School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Hangcheng Liu School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Meng Wang School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Xiaoshuang Liu Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Zhenyu Yue School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
Yujia Gao School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China

Najafi H, Savoji K, Mirzaeibonehkhater M, Moravvej SV, Alizadehsani R, Pedrammehr S. A Novel Method for 3D Lung Tumor Reconstruction Using Generative Models. Diagnostics (Basel) 2024;14:2604. [PMID: 39594270 PMCID: PMC11592759 DOI: 10.3390/diagnostics14222604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2024] [Revised: 11/02/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024] Open

Abstract

BACKGROUND

Lung cancer remains a significant health concern, and the effectiveness of early detection significantly enhances patient survival rates. Identifying lung tumors with high precision is a challenge due to the complex nature of tumor structures and the surrounding lung tissues.

METHODS

To address these hurdles, this paper presents an innovative three-step approach that leverages Generative Adversarial Networks (GAN), Long Short-Term Memory (LSTM), and VGG16 algorithms for the accurate reconstruction of three-dimensional (3D) lung tumor images. The first challenge we address is the accurate segmentation of lung tissues from CT images, a task complicated by the overwhelming presence of non-lung pixels, which can lead to classifier imbalance. Our solution employs a GAN model trained with a reinforcement learning (RL)-based algorithm to mitigate this imbalance and enhance segmentation accuracy. The second challenge involves precisely detecting tumors within the segmented lung regions. We introduce a second GAN model with a novel loss function that significantly improves tumor detection accuracy. Following successful segmentation and tumor detection, the VGG16 algorithm is utilized for feature extraction, preparing the data for the final 3D reconstruction. These features are then processed through an LSTM network and converted into a format suitable for the reconstructive GAN. This GAN, equipped with dilated convolution layers in its discriminator, captures extensive contextual information, enabling the accurate reconstruction of the tumor's 3D structure.

RESULTS

The effectiveness of our method is demonstrated through rigorous evaluation against established techniques using the LIDC-IDRI dataset and standard performance metrics, showcasing its superior performance and potential for enhancing early lung cancer detection.

CONCLUSIONS

This study highlights the benefits of combining GANs, LSTM, and VGG16 into a unified framework. This approach significantly improves the accuracy of detecting and reconstructing lung tumors, promising to enhance diagnostic methods and patient results in lung cancer treatment.

Noor S, Naseem A, Awan HH, Aslam W, Khan S, AlQahtani SA, Ahmad N. Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration. BMC Bioinformatics 2024;25:360. [PMID: 39563239 DOI: 10.1186/s12859-024-05978-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2024] [Accepted: 11/06/2024] [Indexed: 11/21/2024] Open

Yan K. Syntactic analysis of SMOSS model combined with improved LSTM model: Taking English writing teaching as an example. PLoS One 2024;19:e0312049. [PMID: 39546444 PMCID: PMC11567549 DOI: 10.1371/journal.pone.0312049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2024] [Accepted: 09/30/2024] [Indexed: 11/17/2024] Open

Abstract

This paper explores the method of combining the Sequential Matching on Sliding Window Sequences (SMOSS) model with an improved Long Short-Term Memory (LSTM) model in English writing teaching to improve learners' syntactic understanding and writing ability, thus effectively improving the quality of English writing teaching. Firstly, this paper analyzes the structure of the SMOSS model. Secondly, this paper optimizes the traditional LSTM model by using Connectionist Temporal Classification (CTC), and proposes an English text error detection model. Meanwhile, this paper combines the SMOSS model with the optimized LSTM model to form a comprehensive syntactic analysis framework, and designs and implements the structure and code of the framework. Finally, on the one hand, the semantic disambiguation performance of the model is tested using the SemCor data set. On the other hand, taking English writing teaching as an example, the proposed method is further verified by designing a comparative experiment in groups. The results show that: (1) From the experimental data of word sense disambiguation, the accuracy of the SMOSS-LSTM model proposed in this paper is the lowest when the context range is "3+3", then it rises in turn at "5+5" and "7+7", reaches the highest at "7+7", and then begins to decrease at "10+10"; (2) The accuracy of syntactic analysis in the experimental group reached 89.5%, while that in the control group was only 73.2%; (3) In the aspect of English text error detection, the detection accuracy of the proposed model in the experimental group is as high as 94.8%, which is significantly better than the traditional SMOSS-based text error detection method, whose accuracy is only 68.3%; (4) Compared with other existing research, although it is slightly inferior to Bidirectional Encoder Representations from Transformers (BERT) in word sense disambiguation, the proposed model performs well in syntactic analysis and English text error detection, and its comprehensive performance is excellent. This paper verifies the effectiveness and practicability of applying the SMOSS model and the improved LSTM model to the syntactic analysis task in English writing teaching, and provides new ideas and methods for the application of syntactic analysis in English teaching.

Beltrán JF, Herrera-Belén L, Yáñez AJ, Jimenez L. Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques. Sci Rep 2024;14:27108. [PMID: 39511292 PMCID: PMC11543823 DOI: 10.1038/s41598-024-77028-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2024] [Accepted: 10/18/2024] [Indexed: 11/15/2024] Open

Qureshi MS, Qureshi MB, Iqrar U, Raza A, Ghadi YY, Innab N, Alajmi M, Qahmash A. AI based predictive acceptability model for effective vaccine delivery in healthcare systems. Sci Rep 2024;14:26657. [PMID: 39496689 PMCID: PMC11535025 DOI: 10.1038/s41598-024-76891-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2024] [Accepted: 10/17/2024] [Indexed: 11/06/2024] Open

Zhang Z, Lu Y, Wang T, Wei X, Wei Z. Joint Dual Feature Distillation and Gradient Progressive Pruning for BERT compression. Neural Netw 2024;179:106533. [PMID: 39079378 DOI: 10.1016/j.neunet.2024.106533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 06/24/2024] [Accepted: 07/09/2024] [Indexed: 09/18/2024]

Shaon MSH, Karim T, Ali MM, Ahmed K, Bui FM, Chen L, Moni MA. A robust deep learning approach for identification of RNA 5-methyluridine sites. Sci Rep 2024;14:25688. [PMID: 39465261 PMCID: PMC11514282 DOI: 10.1038/s41598-024-76148-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Accepted: 10/10/2024] [Indexed: 10/29/2024] Open

Castro-Silva JA, Moreno-García MN, Guachi-Guachi L, Peluffo-Ordóñez DH. Novel hippocampus-centered methodology for informative instance selection in Alzheimer's disease data. Heliyon 2024;10:e37552. [PMID: 39381107 PMCID: PMC11456841 DOI: 10.1016/j.heliyon.2024.e37552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Revised: 08/30/2024] [Accepted: 09/05/2024] [Indexed: 10/10/2024] Open

Aruna AS, Babu KRR, Deepthi K. A deep drug prediction framework for viral infectious diseases using an optimizer-based ensemble of convolutional neural network: COVID-19 as a case study. Mol Divers 2024:10.1007/s11030-024-11003-7. [PMID: 39379663 DOI: 10.1007/s11030-024-11003-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2024] [Accepted: 09/26/2024] [Indexed: 10/10/2024]

Kilimci ZH, Yalcin M. ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approach. Artif Intell Med 2024;156:102951. [PMID: 39173421 DOI: 10.1016/j.artmed.2024.102951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 07/19/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]

Abstract

Anticancer peptides (ACPs) are a class of molecules that have gained significant attention in the field of cancer research and therapy. ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells. One of the key advantages of ACPs is their ability to selectively target cancer cells while sparing healthy cells to a greater extent. This selectivity is often attributed to differences in the surface properties of cancer cells compared to normal cells. That is why ACPs are being investigated as potential candidates for cancer therapy. ACPs may be used alone or in combination with other treatment modalities like chemotherapy and radiation therapy. While ACPs hold promise as a novel approach to cancer treatment, there are challenges to overcome, including optimizing their stability, improving selectivity, enhancing their delivery to cancer cells, coping with the continuously increasing number of peptide sequences, and developing a reliable and precise prediction model. In this work, we propose an efficient transformer-based framework to identify ACPs with a reliable and precise prediction model. For this purpose, four different transformer models, namely ESM, ProtBERT, BioBERT, and SciBERT, are employed to detect ACPs from amino acid sequences. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on datasets widely used in the literature: two versions of AntiCp2, cACP-DeepGram, and ACP-740. Experimental results show that the proposed model enhances classification accuracy compared with studies in the literature. The proposed framework, ESM, exhibits 96.45% accuracy on the AntiCp2 dataset, 97.66% accuracy on the cACP-DeepGram dataset, and 88.51% accuracy on the ACP-740 dataset, thereby setting a new state of the art. The code of the proposed framework is publicly available on GitHub (https://github.com/mstf-yalcin/acp-esm).

Wen J, Ding Z, Wei Z, Xia H, Zhang Y, Zhu X. NeuroPpred-SHE: An interpretable neuropeptides prediction model based on selected features from hand-crafted features and embeddings of T5 model. Comput Biol Med 2024;181:109048. [PMID: 39182368 DOI: 10.1016/j.compbiomed.2024.109048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2023] [Revised: 08/13/2024] [Accepted: 08/18/2024] [Indexed: 08/27/2024]

İhtiyar MN, Özgür A. Generative language models on nucleotide sequences of human genes. Sci Rep 2024;14:22204. [PMID: 39333252 PMCID: PMC11437190 DOI: 10.1038/s41598-024-72512-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 09/09/2024] [Indexed: 09/29/2024] Open

Abstract

Language models, especially transformer-based ones, have achieved colossal success in natural language processing. To be precise, studies like BERT for natural language understanding and works like GPT-3 for natural language generation are very important. If we consider DNA sequences as a text written with an alphabet of four letters representing the nucleotides, they are similar in structure to natural languages. This similarity has led to the development of discriminative language models such as DNABERT in the field of DNA-related bioinformatics. To our knowledge, however, the generative side of the coin is still largely unexplored. Therefore, we have focused on the development of an autoregressive generative language model such as GPT-3 for DNA sequences. Since working with whole DNA sequences is challenging without extensive computational resources, we decided to conduct our study on a smaller scale and focus on nucleotide sequences of human genes, i.e. unique parts of DNA with specific functions, rather than the whole DNA. This decision has not significantly changed the structure of the problem, as both DNA and genes can be considered as 1D sequences consisting of four different nucleotides without losing much information and without oversimplification. First of all, we systematically studied an almost entirely unexplored problem and observed that recurrent neural networks (RNNs) perform best, while simple techniques such as N-grams are also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural languages. The importance of using real-world tasks beyond classical metrics such as perplexity was noted. In addition, we examined whether the data-hungry nature of these models can be altered by selecting a language with a minimal vocabulary size, here four, one for each type of nucleotide. The reason for examining this was that choosing such a language might make the problem easier. However, in this study, we found that this did not change the amount of data required very much.
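
As an illustration of the N-gram baseline and the perplexity metric mentioned in the abstract above, the following is a minimal sketch only. It assumes a trigram model with add-one smoothing over the four-letter nucleotide alphabet; the function names `train_ngram` and `perplexity`, the choice of n = 3, and the smoothing scheme are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # four-letter nucleotide vocabulary


def train_ngram(sequences, n=3):
    """Count n-grams and their (n-1)-length contexts in the training sequences."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts


def perplexity(seq, ngrams, contexts, n=3):
    """Perplexity of seq under the add-one-smoothed n-gram model (assumed scheme)."""
    positions = len(seq) - n + 1
    if positions <= 0:
        raise ValueError("sequence is shorter than the n-gram order")
    log_prob = 0.0
    for i in range(positions):
        gram = seq[i:i + n]
        # Add-one smoothing: every possible next nucleotide gets one pseudo-count.
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
        log_prob += math.log(p)
    return math.exp(-log_prob / positions)


if __name__ == "__main__":
    counts, ctx = train_ngram(["ACGTACGTACGT"])
    print(perplexity("ACGTACGT", counts, ctx))
```

With no training data the smoothed model is uniform over the four nucleotides, so the perplexity equals the vocabulary size; training on sequences that resemble the test sequence lowers it.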

Ghafoor H, Asim MN, Ibrahim MA, Dengel A. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution. Heliyon 2024;10:e36041. [PMID: 39281576 PMCID: PMC11401092 DOI: 10.1016/j.heliyon.2024.e36041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2024] [Revised: 08/01/2024] [Accepted: 08/08/2024] [Indexed: 09/18/2024] Open

Abstract

Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. It contains valuable information about potential biomarkers or therapeutic targets and helps in early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab experimental protein solubility prediction approaches are error-prone, time-consuming, and costly. Researchers have harnessed Artificial Intelligence approaches to replace experimental approaches with computational predictors. These predictors infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still a lot of room for the development of robust computational predictors because existing predictors still fail to extract a comprehensive discriminative distribution of amino acids. To more precisely discriminate soluble proteins from insoluble proteins, this paper presents the ProSol-Multi predictor, which makes use of a novel MLCDE encoder and a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing the multi-level correlation and discriminative distribution of amino acids within raw protein sequences. The performance of the proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two different experimental settings, namely intrinsic and extrinsic. Intrinsic evaluation reveals that, among all sequence encoders, the proposed MLCDE encoder manages to generate non-overlapping clusters of soluble and insoluble classes. In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.

Gunduz H. Comparative analysis of BERT and FastText representations on crowdfunding campaign success prediction. PeerJ Comput Sci 2024;10:e2316. [PMID: 39314718 PMCID: PMC11419673 DOI: 10.7717/peerj-cs.2316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Accepted: 08/19/2024] [Indexed: 09/25/2024]

Uddin I, Awan HH, Khalid M, Khan S, Akbar S, Sarker MR, Abdolrasol MGM, Alghamdi TAH. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications. Sci Rep 2024;14:20819. [PMID: 39242695 PMCID: PMC11379919 DOI: 10.1038/s41598-024-71568-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open

Wang S, Luo B. Academic achievement prediction in higher education through interpretable modeling. PLoS One 2024;19:e0309838. [PMID: 39236050 PMCID: PMC11376577 DOI: 10.1371/journal.pone.0309838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Accepted: 08/20/2024] [Indexed: 09/07/2024] Open

Kalal V, Jha BK. A Kernelized Classification Approach for Cancer Recognition Using Markovian Analysis of DNA Structure Patterns as Feature Mining. Cell Biochem Biophys 2024;82:2249-2274. [PMID: 38847942 DOI: 10.1007/s12013-024-01336-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/22/2024] [Indexed: 10/02/2024]

Xu Y, Zhang S, Zhu F, Liang Y. A deep learning model for anti-inflammatory peptides identification based on deep variational autoencoder and contrastive learning. Sci Rep 2024;14:18451. [PMID: 39117712 PMCID: PMC11310449 DOI: 10.1038/s41598-024-69419-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Accepted: 08/05/2024] [Indexed: 08/10/2024] Open

Abstract

As a class of biologically active molecules with significant immunomodulatory and anti-inflammatory effects, anti-inflammatory peptides have important application value in the medical and biotechnology fields due to their unique biological functions. Research on the identification of anti-inflammatory peptides provides important theoretical foundations and practical value for a deeper understanding of the biological mechanisms of inflammation and immune regulation, as well as for the development of new drugs and biotechnological applications. Therefore, it is necessary to develop more advanced computational models for identifying anti-inflammatory peptides. In this study, we propose a deep learning model named DAC-AIPs based on variational autoencoder and contrastive learning for accurate identification of anti-inflammatory peptides. In the sequence encoding part, the incorporation of multi-hot encoding helps capture richer sequence information. The autoencoder, composed of convolutional layers and linear layers, can learn latent features and reconstruct features, with variational inference enhancing the representation capability of latent features. Additionally, the introduction of contrastive learning aims to improve the model's classification ability. Through cross-validation and independent dataset testing experiments, DAC-AIPs achieves superior performance compared to existing state-of-the-art models. In cross-validation, the classification accuracy of DAC-AIPs reached around 88%, which is 7% higher than previous models. Furthermore, various ablation experiments and interpretability experiments validate the effectiveness of DAC-AIPs. Finally, a user-friendly online predictor is designed to enhance the practicality of the model, and the server is freely accessible at http://dac-aips.online .

Rukh G, Akbar S, Rehman G, Alarfaj FK, Zou Q. StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning. BMC Bioinformatics 2024;25:256. [PMID: 39098908 PMCID: PMC11298090 DOI: 10.1186/s12859-024-05884-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Accepted: 07/29/2024] [Indexed: 08/06/2024] Open

Abstract

BACKGROUND

Antioxidant proteins are involved in several biological processes and can protect DNA and cells from the damage of free radicals. These proteins regulate the body's oxidative stress and perform a significant role in many antioxidant-based drugs. The current in vitro-based medications are costly, time-consuming, and unable to efficiently screen and identify the targeted motif of antioxidant proteins.

METHODS

In this work, we propose an accurate prediction method, namely StackedEnC-AOP, to discriminate antioxidant proteins. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed via two levels of DWT to form a Pseudo position-specific scoring matrix (PsePSSM-DWT) based embedded vector. Additionally, the Evolutionary difference formula and composite physiochemical properties methods are employed to collect the structural and sequential descriptors. Then the combined vector of sequential features, evolutionary descriptors, and physiochemical properties is produced to cover the flaws of individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are chosen using Minimum redundancy and maximum relevance (mRMR). The optimal feature vector is used to train a stacking-based ensemble meta-model.
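
To make the "two levels of DWT" step above concrete, here is a minimal sketch of a two-level decomposition of a one-dimensional signal. It assumes the Haar wavelet and a plain Python list as input; the abstract does not state which wavelet family or input layout the authors used, and the names `haar_dwt` and `two_level_dwt` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the last value
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def two_level_dwt(signal):
    """Apply the Haar DWT twice: the second level decomposes the first approximation."""
    approx1, detail1 = haar_dwt(signal)
    approx2, detail2 = haar_dwt(approx1)
    return approx2, detail2, detail1


if __name__ == "__main__":
    # Stand-in for one column of a PSSM (hypothetical values).
    column = [4.0, 2.0, 5.0, 7.0]
    print(two_level_dwt(column))
```

In a feature-extraction setting, summary statistics of each coefficient band could then be concatenated into a fixed-length vector; that step is not shown because the abstract does not describe it.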

RESULTS

Our developed StackedEnC-AOP method reported a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. When validated on an independent set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98.

CONCLUSION

Our proposed StackedEnC-AOP strategy performed significantly better than current computational models, with ~5% and ~3% improved accuracy on the training and independent sets, respectively. The efficacy and consistency of our proposed StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.

Yu JC, Ni K, Chen CT. ENCAP: Computational prediction of tumor T cell antigens with ensemble classifiers and diverse sequence features. PLoS One 2024;19:e0307176. [PMID: 39024250 PMCID: PMC11257298 DOI: 10.1371/journal.pone.0307176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Accepted: 07/01/2024] [Indexed: 07/20/2024] Open

Zhang L, Hu X, Xiao K, Kong L. Effective identification and differential analysis of anticancer peptides. Biosystems 2024;241:105246. [PMID: 38848816 DOI: 10.1016/j.biosystems.2024.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2024] [Revised: 05/27/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024]

Harun-Or-Roshid M, Pham NT, Manavalan B, Kurata H. Meta-2OM: A multi-classifier meta-model for the accurate prediction of RNA 2'-O-methylation sites in human RNA. PLoS One 2024;19:e0305406. [PMID: 38924058 PMCID: PMC11207182 DOI: 10.1371/journal.pone.0305406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 05/29/2024] [Indexed: 06/28/2024] Open

Ipkovich Á, Czvetkó T, A. Acosta L, Lee S, Nzimenyera I, Sebestyén V, Abonyi J. Network science and explainable AI-based life cycle management of sustainability models. PLoS One 2024;19:e0300531. [PMID: 38870225 PMCID: PMC11175538 DOI: 10.1371/journal.pone.0300531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2023] [Accepted: 02/29/2024] [Indexed: 06/15/2024] Open

Jia Y, Yu Z, Hong Z. Semantic aware-based instruction embedding for binary code similarity detection. PLoS One 2024;19:e0305299. [PMID: 38861533 PMCID: PMC11166306 DOI: 10.1371/journal.pone.0305299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2024] [Accepted: 05/27/2024] [Indexed: 06/13/2024] Open

Chen T, Kabir MF. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data. PLoS One 2024;19:e0302947. [PMID: 38728288 PMCID: PMC11086842 DOI: 10.1371/journal.pone.0302947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Accepted: 04/15/2024] [Indexed: 05/12/2024] Open