1
Harrigan WL, Ferrell BD, Wommack KE, Polson SW, Schreiber ZD, Belcaid M. Improvements in viral gene annotation using large language models and soft alignments. BMC Bioinformatics 2024; 25:165. [PMID: 38664627 PMCID: PMC11046836 DOI: 10.1186/s12859-024-05779-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/12/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings.
RESULTS: Central to our contribution is the soft alignment algorithm, which draws on traditional protein alignment but leverages embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method surpasses pooled embedding-based models not only in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect.
CONCLUSION: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.
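The abstract above describes soft alignment only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm: a Smith-Waterman-style local alignment in which the substitution score is the cosine similarity between per-residue embeddings instead of a lookup in a scoring matrix. The function names, the `gap` penalty, the `offset` that shifts similarities so dissimilar residues score negative, and the toy one-hot "embeddings" are all assumptions made for illustration; real inputs would come from a protein language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def soft_align(emb_a, emb_b, gap=0.5, offset=0.5):
    """Local alignment scored by per-residue embedding similarity.

    emb_a, emb_b: lists of per-residue embedding vectors (hypothetical inputs).
    gap, offset: illustrative values, not taken from the paper.
    Returns (best_score, list of aligned (i, j) residue index pairs).
    """
    n, m = len(emb_a), len(emb_b)
    # Shift similarities so that dissimilar residues contribute negative scores.
    sim = [[cosine(a, b) - offset for b in emb_b] for a in emb_a]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0.0,                                      # restart (local alignment)
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align residue i with j
                score[i - 1][j] - gap,                    # gap in sequence b
                score[i][j - 1] - gap,                    # gap in sequence a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell to recover the aligned residue pairs.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + sim[i - 1][j - 1]
        if math.isclose(score[i][j], diag, abs_tol=1e-9):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] - gap, abs_tol=1e-9):
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


# Toy one-hot "embeddings": sequence b matches residues 1..3 of sequence a.
seq_a = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
seq_b = seq_a[1:4]
best_score, aligned = soft_align(seq_a, seq_b)
print(best_score, aligned)  # 1.5 [(1, 0), (2, 1), (3, 2)]
```

Because the traceback returns explicit residue pairs, the result can be rendered residue by residue in a BLAST-like view, which is the kind of interpretability the abstract contrasts with pooled (whole-sequence) embeddings.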
Affiliation(s)
- William L Harrigan
  - Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
- Barbra D Ferrell
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- K Eric Wommack
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- Shawn W Polson
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19713, USA
- Zachary D Schreiber
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- Mahdi Belcaid
  - Department of Computer Science, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
2
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics, have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter oncogenic signaling events and offer a novel avenue in GBM therapeutics research, with PTM enzymes serving as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and the tumor microenvironment (TME), all of which play a crucial role in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology and in altering the biology of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlight the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
3
Wang X, Ding Z, Wang R, Lin X. Deepro-Glu: combination of convolutional neural network and Bi-LSTM models using ProtBert and handcrafted features to identify lysine glutarylation sites. Brief Bioinform 2023; 24:6991122. [PMID: 36653898 DOI: 10.1093/bib/bbac631] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/23/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023]
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification of proteins with important roles in mitochondrial functions, oxidative damage, etc. The established biological experimental methods to identify glutarylation sites are often time-consuming and costly. Therefore, there is an urgent need to develop computational methods for efficient and accurate identification of glutarylation sites. Most of the existing computational methods only utilize handcrafted features to construct the prediction model and do not consider the positive impact of the pre-trained protein language model on the prediction performance. Based on this, we develop an ensemble deep-learning predictor, Deepro-Glu, that combines a convolutional neural network and a bidirectional long short-term memory network, using deep learning features and traditional handcrafted features to predict lysine glutarylation sites. The deep learning features are generated from the pre-trained protein language model called ProtBert, and the handcrafted features consist of sequence-based features, physicochemical property-based features and evolution information-based features. Furthermore, the attention mechanism is used to efficiently integrate the deep learning features and the handcrafted features by learning the appropriate attention weights. 10-fold cross-validation and independent tests demonstrate that Deepro-Glu achieves performance that is competitive with or superior to the state-of-the-art methods. The source codes and data are publicly available at https://github.com/xwanggroup/Deepro-Glu.
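The abstract says that attention weights are learned to integrate the two feature families but does not spell out the mechanism. As a hedged illustration only, the sketch below shows one generic way such a fusion can work: each feature vector is scored against a query vector, the scores are normalised with a softmax, and the vectors are combined as a weighted sum. The names `attention_fuse` and `query`, the assumption that both feature vectors have already been projected to the same length, and the toy numbers are hypothetical and are not taken from the Deepro-Glu code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(feature_vectors, query):
    """Fuse same-length feature vectors using softmax attention weights.

    feature_vectors: e.g. [deep_features, handcrafted_features], assumed to be
        already projected to a common dimension (an assumption of this sketch).
    query: stand-in for a learned parameter vector (hypothetical).
    Returns (weights, fused_vector).
    """
    # Dot-product relevance score of each feature vector against the query.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in feature_vectors]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the feature vectors, one coordinate at a time.
    fused = [
        sum(w * vec[k] for w, vec in zip(weights, feature_vectors))
        for k in range(dim)
    ]
    return weights, fused


# Toy example: two 2-dimensional feature vectors and a query favouring the first.
deep_features = [1.0, 0.0]
handcrafted_features = [0.0, 1.0]
weights, fused = attention_fuse([deep_features, handcrafted_features], [2.0, 0.0])
print(weights, fused)
```

In a trained model the query (and any projection layers) would be learned by gradient descent together with the rest of the network; here it is fixed by hand purely to show how the weights shift toward the better-matching feature vector.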
Affiliation(s)
- Xiao Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Zhaoyuan Ding
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Rong Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Xi Lin
  - Institute of Artificial Intelligence, Xiamen University, No. 4221, Xiang'an South Road, 361000, Xiamen, China
4
Jia J, Sun M, Wu G, Qiu W. DeepDN_iGlu: prediction of lysine glutarylation sites based on attention residual learning method and DenseNet. Math Biosci Eng 2023; 20:2815-2830. [PMID: 36899559 DOI: 10.3934/mbe.2023132] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/18/2023]
Abstract
As a key factor in orchestrating various biological processes and functions, protein post-translational modification (PTM) occurs widely in the proteins of animals and plants. Glutarylation is a type of post-translational modification that occurs at active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Therefore, the prediction of glutarylation sites is particularly important. This study developed a new deep learning-based prediction model for glutarylation sites, named DeepDN_iGlu, by adopting an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance between the numbers of positive and negative samples. Using the straightforward one-hot encoding method, DeepDN_iGlu achieves Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) that is available to make glutarylation site prediction data more accessible.
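To make the role of the focal loss concrete, here is a minimal sketch of the standard binary focal loss, FL = -α_t (1 - p_t)^γ log(p_t), which reduces to a scaled cross-entropy when γ = 0. The abstract does not report the hyperparameters used for DeepDN_iGlu, so the defaults below (γ = 2, α = 0.25, the values commonly quoted for focal loss) are assumptions, as is the function name.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class.
    y: true label, 1 (positive) or 0 (negative).
    gamma, alpha: illustrative defaults, not values reported in the abstract.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights samples the model already classifies well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a badly missed one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy, hard, hard / easy)
```

With plain cross-entropy, the many easy negatives in an imbalanced dataset can dominate the total loss; the modulating factor shrinks their contribution so that training focuses on the hard, often minority-class, examples.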
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Mingwei Sun
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Genqiang Wu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
5
Molnár V, Lakner Z, Molnár A, Tárnoki DL, Tárnoki ÁD, Kunos L, Tamás L. The Predictive Role of Subcutaneous Adipose Tissue in the Pathogenesis of Obstructive Sleep Apnoea. Life (Basel) 2022; 12:1504. [PMID: 36294937 PMCID: PMC9605212 DOI: 10.3390/life12101504] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/27/2022] [Revised: 09/20/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022]
Abstract
Simple Summary: Although several methods are used to diagnose obstructive sleep apnoea (OSA), the disorder is still underdiagnosed, leading to public healthcare problems. The main aim of the present study was to analyse the role of artificial intelligence in OSA diagnostics and obstruction localisation and, moreover, the role of subcutaneous adipose tissue in OSA pathophysiology. The significance of the present investigation is that using ultrasound (US) in OSA diagnostics and obstruction localisation presents an additional opportunity besides standard procedures (i.e., drug-induced sleep endoscopy or polygraphy), which is vital due to the high number of undiagnosed cases. By applying the algorithm, which includes artificial intelligence, the presence of obstructions and their localisation can be determined with high precision. This can be essential in therapy planning or preoperative patient preparation.
Abstract: Introduction: Our aim was to investigate the applicability of artificial intelligence in predicting obstructive sleep apnoea (OSA) and upper airway obstruction using ultrasound (US) measurements of subcutaneous adipose tissues (SAT) in the regions of the neck, chest and abdomen. Methods: One hundred patients were divided into mild (32), moderately severe-severe (32) OSA and non-OSA (36), according to the results of the polysomnography. These patients were examined using anthropometric measurements and US of SAT and drug-induced sleep endoscopy. Results: Using SAT US and anthropometric parameters, oropharyngeal obstruction could be predicted in 64% and tongue-based obstruction in 72%. In predicting oropharyngeal obstruction, BMI, abdominal and hip circumferences, submental SAT and SAT above the second intercostal space on the left were identified as essential parameters. Furthermore, tongue-based obstruction was predicted mainly by height, SAT measured 2 cm above the umbilicus and submental SAT. The OSA prediction was successful in 97% using the parameters mentioned above. Moreover, other parameters, such as US-based SAT, with SAT measured 2 cm above the umbilicus and both-sided SAT above the second intercostal spaces as the most important ones. Discussion: Based on our results, several categories of OSA can be predicted using artificial intelligence with high precision by using SAT and anthropometric parameters.
Affiliation(s)
- Viktória Molnár
  - Department of Otolaryngology and Head and Neck Surgery, Semmelweis University, 1083 Budapest, Hungary
  - Correspondence: Tel.: +36-20-663-2402
- Zoltán Lakner
  - Szent István Campus, Hungarian University of Agriculture and Life Sciences, 2100 Gödöllő, Hungary
- András Molnár
  - Department of Otolaryngology and Head and Neck Surgery, Semmelweis University, 1083 Budapest, Hungary
- László Kunos
  - Department of Pulmonology, Pulmonology Hospital of Törökbálint, 2045 Törökbálint, Hungary
- László Tamás
  - Department of Otolaryngology and Head and Neck Surgery, Semmelweis University, 1083 Budapest, Hungary
  - Department of Voice, Speech and Swallowing Therapy, Faculty of Health Sciences, Semmelweis University, 1083 Budapest, Hungary