1
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024] [Open Access]
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify, and even surpassed Knotty in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
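The tp and fn counts mentioned in the abstract feed the standard precision, recall, and F1 definitions. The following is a minimal sketch of how such scores are commonly computed for base-pair predictions. It assumes each structure is given as a set of (i, j) index pairs; the function name `pair_metrics` and this input format are illustrative assumptions and are not taken from the Knotify_V2.0 code.

```python
def pair_metrics(predicted, reference):
    """Return (precision, recall, f1) for two collections of base pairs.

    Each base pair is assumed to be a tuple (i, j) of sequence positions.
    This representation is a hypothetical choice made for this sketch.
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # pairs predicted and present in the reference
    fp = len(predicted - reference)   # pairs predicted but absent from the reference
    fn = len(reference - predicted)   # reference pairs the prediction missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = {(1, 10), (2, 9), (3, 8), (4, 7)}
    predicted = {(1, 10), (2, 9), (3, 8), (5, 12)}
    print(pair_metrics(predicted, reference))
```

Under these definitions, raising tp while lowering fn improves recall directly, and it improves precision as long as false positives do not grow faster than tp.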
Affiliation(s)
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Alexandros Anastasios Karagiannis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
2
Pal D, Dey S, Ghosh P, Bhattacharya DK, Das S, Maji B. A unique approach for protein secondary structure comparison under TOPS representation. J Biomol Struct Dyn 2024:1-13. [PMID: 38698728 DOI: 10.1080/07391102.2024.2333449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 03/15/2024] [Indexed: 05/05/2024]
Abstract
To unravel the intricate connection between protein function and protein structure, it is imperative to comprehensively evaluate protein secondary structure similarity from various perspectives. While numerous techniques have been suggested for comparing protein secondary structure elements (SSE), there continues to be a substantial need for alternative ways of comparing them. In this paper, Topology of Protein Structure (TOPS) representations of protein secondary structures are considered to offer a new alignment-free method for evaluating similarities/dissimilarities of protein secondary structures. Initially, a two-dimensional numerical representation of the SSE is created, associating each point with a mass reflecting its frequency of occurrence. Then the means of coordinate values are determined by averaging weighted sums, and these mean values are subsequently used to calculate moments of inertia. Next, a four-component descriptor is generated from the eigenvalues of the resulting matrix and the mean values of the represented coordinates. Thereafter, the Manhattan distance measure is used to obtain the distance matrix. This is finally used to build phylogenetic trees with the neighbor-joining (NJ) method. The SSE considered in the proposed method comprise 36 elements from the Chew-Kedem database giving five different taxa: globin, alpha-beta, tim-barrel, beta, and alpha. Phylogenetic trees were created for these SSE through the application of various methods (Clustal-Omega, LZ-Complexity, SED, TOPS+, and TOC) to facilitate comparative analysis. The phylogenetic tree of the proposed method outperformed the results of the previous methods when applied to the same SSE. Therefore, the method effectively constructs phylogenetic trees for analyzing protein secondary structure comparison. Communicated by Ramaswamy H. Sarma.
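The pipeline described in this abstract (weighted means, moments of inertia, eigenvalues, a four-component descriptor, Manhattan distances) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be a list of `(x, y, mass)` points, the inertia matrix is taken about the weighted mean, and the ordering of the four descriptor components is a guess. How TOPS strings are mapped to 2D points is not given in the abstract and is left out.

```python
import math


def descriptor(points):
    """Four-component descriptor (mean_x, mean_y, eig_large, eig_small).

    `points` is a list of (x, y, mass) tuples. This input format is a
    hypothetical stand-in for the paper's 2D representation of SSE.
    """
    total = sum(m for _, _, m in points)
    mean_x = sum(m * x for x, _, m in points) / total
    mean_y = sum(m * y for _, y, m in points) / total

    # 2x2 inertia matrix about the weighted mean (assumed convention).
    ixx = sum(m * (y - mean_y) ** 2 for _, y, m in points)
    iyy = sum(m * (x - mean_x) ** 2 for x, _, m in points)
    ixy = -sum(m * (x - mean_x) * (y - mean_y) for x, y, m in points)

    # Closed-form eigenvalues of the symmetric matrix [[ixx, ixy], [ixy, iyy]].
    half_trace = (ixx + iyy) / 2
    radius = math.hypot((ixx - iyy) / 2, ixy)
    return (mean_x, mean_y, half_trace + radius, half_trace - radius)


def manhattan(a, b):
    """Manhattan (L1) distance between two descriptors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def distance_matrix(descriptors):
    """Symmetric matrix of pairwise Manhattan distances."""
    return [[manhattan(a, b) for b in descriptors] for a in descriptors]


if __name__ == "__main__":
    d1 = descriptor([(0, 0, 1), (2, 0, 1)])
    d2 = descriptor([(0, 0, 1), (0, 2, 1)])
    print(distance_matrix([d1, d2]))
```

The resulting distance matrix is the kind of input a neighbor-joining implementation would take to build the tree.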
Affiliation(s)
- Debrupa Pal
  - Computer Application, Narula Institute of Technology, Kolkata, India
  - Electronics and Communication Engineering, National Institute of Technology, Durgapur, India
- Sudeshna Dey
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Papri Ghosh
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Subhram Das
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Bansibadan Maji
  - Electronics and Communication Engineering, National Institute of Technology, Durgapur, India
3
Hashemi Sheikhshabani S, Ghafouri-Fard S, Amini-Farsani Z, Modarres P, Khazaei Feyzabad S, Amini-Farsani Z, Shaygan N, Omrani MD. In Silico Prediction of Functional SNPs Interrupting Antioxidant Defense Genes in Relation to COVID-19 Progression. Biochem Genet 2024:10.1007/s10528-024-10705-9. [PMID: 38460087 DOI: 10.1007/s10528-024-10705-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 01/16/2024] [Indexed: 03/11/2024]
Abstract
The excessive production of reactive oxygen species and weakening of the antioxidant defense system play a pivotal role in the pathogenesis of different diseases. Extensive differences observed among individuals in terms of affliction with cancer, cardiovascular disorders, diabetes, and bacterial and viral infections, as well as response to treatments, can be partly due to their genomic variations. In this work, we attempted to predict the effect of SNPs of the key genes of the antioxidant defense system on their structure, function, and expression in relation to COVID-19 pathogenesis using in silico tools. In addition, the effect of SNPs on the target site binding efficiency of miRNAs was investigated as a factor with potential to change drug response or susceptibility to COVID-19. According to the predicted results, only six missense SNPs with minor allele frequency (MAF) ≥ 0.1 in the coding region of genes GPX7, GPX8, TXNRD2, GLRX5, and GLRX were able to strongly affect their structure and function. Our results predicted that 39 SNPs with MAF ≥ 0.1 led to the generation or destruction of miRNA-binding sites on target antioxidant genes from GPX, PRDX, GLRX, TXN, and SOD families. The results obtained from comparing the expression profiles of mild vs. severe COVID-19 patients using GEO2R demonstrated a significant change in the expression of approximately 250 miRNAs. The binding efficiency of 21 of these miRNAs was changed due to the elimination or generation of target sites in these genes. Altogether, this study reveals the fundamental role of the SNPs of antioxidant defense genes in COVID-19 progression and susceptibility of individuals to this virus. In addition, different responses of COVID-19 patients to antioxidant defense system enhancement drugs may be due to the presence of these SNPs in different individuals.
Affiliation(s)
- Somayeh Hashemi Sheikhshabani
  - Student Research Committee, Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Soudeh Ghafouri-Fard
  - Student Research Committee, Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zeinab Amini-Farsani
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Parastoo Modarres
  - Department of Cell and Molecular Biology and Microbiology, University of Isfahan, Isfahan, Iran
- Sharareh Khazaei Feyzabad
  - Department of Laboratory Sciences, School of Paramedical Sciences, Zahedan University of Medical Sciences, Zahedan, Iran
- Zahra Amini-Farsani
  - Bayesian Imaging and Spatial Statistics Group, Institute of Statistics, Ludwig-Maximilian-Universität München, Ludwigstraße 33, 80539, Munich, Germany
- Nasibeh Shaygan
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mir Davood Omrani
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Urogenital Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4
Sekmen A, Al Nasr K, Bilgin B, Koku AB, Jones C. Mathematical and Machine Learning Approaches for Classification of Protein Secondary Structure Elements from Cα Coordinates. Biomolecules 2023; 13:923. [PMID: 37371503 DOI: 10.3390/biom13060923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 05/16/2023] [Accepted: 05/16/2023] [Indexed: 06/29/2023] [Open Access]
Abstract
Determining Secondary Structure Elements (SSEs) for any protein is crucial as an intermediate step for experimental tertiary structure determination. SSEs are identified using popular tools such as DSSP and STRIDE. These tools use atomic information to locate hydrogen bonds to identify SSEs. When some spatial atomic details are missing, locating SSEs is hindered. To address this problem, three approaches for classifying SSE types using Cα atoms in protein chains were developed: (1) a mathematical approach, (2) a deep learning approach, and (3) an ensemble of five machine learning models. The proposed methods were compared against each other and with a state-of-the-art approach, PCASSO.
Affiliation(s)
- Ali Sekmen
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
- Kamal Al Nasr
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
- Bahadir Bilgin
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
  - Department of Mechanical Engineering, Middle East Technical University, Ankara 06800, Türkiye
- Ahmet Bugra Koku
  - Department of Mechanical Engineering, Middle East Technical University, Ankara 06800, Türkiye
  - Center for Robotics and AI, Middle East Technical University, Ankara 06800, Türkiye
- Christopher Jones
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
5
de Brevern AG. A Perspective on the (Rise and Fall of) Protein β-Turns. Int J Mol Sci 2022; 23:12314. [PMID: 36293166 PMCID: PMC9604201 DOI: 10.3390/ijms232012314] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/17/2022] [Revised: 10/07/2022] [Accepted: 10/13/2022] [Indexed: 11/21/2022] [Open Access]
Abstract
The β-turn is the third defined secondary structure after the α-helix and the β-sheet. The β-turns were described more than 50 years ago and account for more than 20% of protein residues. Nonetheless, they are often overlooked or even misunderstood. This poor knowledge of these local protein conformations is due to various factors, causes that I discuss here. For example, confusion still exists about the assignment of these local protein structures, their overlaps with other structures, the potential absence of a stabilizing hydrogen bond, the numerous types of β-turns and the software's difficulty in assigning or visualizing them. I also propose some ideas to potentially/partially remedy this and present why β-turns can still be helpful, even in the AlphaFold 2 era.
Affiliation(s)
- Alexandre G de Brevern
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Team, F-75014 Paris, France
6
Antony JV, Koya R, Pournami PN, Nair GG, Balakrishnan JP. Protein secondary structure assignment using residual networks. J Mol Model 2022; 28:269. [PMID: 35997827 DOI: 10.1007/s00894-022-05271-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Accepted: 08/12/2022] [Indexed: 11/27/2022]
Abstract
Proteins are constructed from amino acid sequences. Their structural classifications include primary, secondary, tertiary, and quaternary, with tertiary and quaternary structures influencing protein function. Because a protein's structure is inextricably connected to its biological function, machine learning algorithms that can better anticipate the structures have the potential to lead to new scientific discoveries in human health and improve our capacity to develop new treatments. Protein secondary structure assignment enriches the structural and functional understanding of proteins. It helps in protein structure comparison and classification studies, besides facilitating secondary and tertiary structure prediction systems. Several secondary structure assignment methods have been developed since the 1980s, most of which are based on hydrogen bond analysis and atomic coordinate features. However, the assignment process becomes complex when protein data includes missing atoms. Deep neural networks are often referred to as universal function approximators because they can approximate any function to produce the desired output when properly designed and trained. Optimised deep learning architectures have already proven their ability to increase performance in a wide range of problems. Recently, the ResNet architecture has garnered significant interest due to its applicability in various areas, including image classification and protein contact map prediction. The proposed model, which is based on the ResNet architecture, assigns secondary structures using Cα atom coordinates. The model achieved an accuracy of 94% when evaluated against the benchmark and independent test sets. The findings encourage the development of new deep learning-based methods that are more generalised across various protein learning tasks. Furthermore, it allows computational biologists to delve deeper into integrating these techniques with experimental methods. 
The model codes are available at: https://github.com/jisnava/ResNet_for_Structure_Assignments/ .
Affiliation(s)
- Jisna Vellara Antony
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
- Roosafeed Koya
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
- Gopakumar Gopalakrishnan Nair
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kattangal, Kerala, 673601, India
7
Automated Protein Secondary Structure Assignment from Cα Positions Using Neural Networks. Biomolecules 2022; 12:841. [PMID: 35740966 PMCID: PMC9220970 DOI: 10.3390/biom12060841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
The assignment of secondary structure elements in protein conformations is necessary to interpret a protein model that has been established by computational methods. The process essentially involves labeling the amino acid residues with H (Helix), E (Strand), or C (Coil, also known as Loop). When particular atoms are absent from an input protein structure, the procedure becomes more complicated, especially when only the alpha carbon locations are known. Various techniques have been tested and applied to this problem during the last forty years. The application of machine learning techniques is the most recent trend. This contribution presents the HECA classifier, which uses neural networks to assign protein secondary structure types. The technique exclusively employs Cα coordinates. The Keras (TensorFlow) library was used to implement and train the neural network model. The BioShell toolkit was used to calculate the neural network input features from raw coordinates. The study’s findings show that neural network-based methods may be successfully used to take on structure assignment challenges when only Cα trace is available. Thanks to the careful selection of input features, our approach’s accuracy (above 97%) exceeded that of the existing methods.
8
Antony JV, Madhu P, Balakrishnan JP, Yadav H. Assigning secondary structure in proteins using AI. J Mol Model 2021; 27:252. [PMID: 34402969 DOI: 10.1007/s00894-021-04825-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Accepted: 06/16/2021] [Indexed: 12/16/2022]
Abstract
Knowledge about protein structure assignment enriches the structural and functional understanding of proteins. Accurate and reliable structure assignment data is crucial for secondary structure prediction systems. Since the 1980s, various methods based on hydrogen bond analysis and atomic coordinate geometry, followed by machine learning, have been employed in protein structure assignment. However, the assignment process becomes challenging when atoms are missing from the protein files. We propose a multi-class classifier program named DLFSA for assigning protein secondary structure elements (SSE) using convolutional neural networks (CNNs). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model implemented in this work is trained with a subset of the protein fragments and achieves 88.1% and 82.5% train and test accuracy, respectively. The model uses only Cα coordinates for secondary structure assignments. The model has also been successfully tested on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning solutions for structure assignment problems.
Affiliation(s)
- Jisna Vellara Antony
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, 673601, India
- Prayagh Madhu
  - Computer Science and Engineering Dept., Rajiv Gandhi Institute of Technology, Kottayam, India
- Hemant Yadav
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, 673601, India
9
Adasme-Carreño F, Caballero J, Ireta J. PSIQUE: Protein Secondary Structure Identification on the Basis of Quaternions and Electronic Structure Calculations. J Chem Inf Model 2021; 61:1789-1800. [PMID: 33769809 DOI: 10.1021/acs.jcim.0c01343] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
The secondary structure is important in protein structure analysis, classification, and modeling. We have developed a novel method for secondary structure assignment, termed PSIQUE, based on the potential energy surface (PES) of polyalanine obtained using an infinitely long chain model and density functional theory calculations. First, uniform protein segments are determined in terms of a difference of quaternions between neighboring amino acids along the protein backbone. Then, the identification of the secondary structure motifs is carried out based on the minima found in the PES. PSIQUE shows good agreement with other secondary structure assignment methods. However, it provides better discrimination of subtle secondary structures (e.g., helix types) and termini and produces more uniform segments while also accounting for local distortions. Overall, PSIQUE provides a precise and reliable assignment of secondary structures, so it should be helpful for the detailed characterization of the protein structure.
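To make the idea of a "difference of quaternions between neighboring amino acids" concrete, here is a minimal sketch that computes the relative rotation between consecutive unit quaternions and groups residues into runs with a near-constant step angle. The abstract does not specify how PSIQUE defines the difference or the segmentation rule, so the relative-rotation formula, the function names, and the tolerance `tol` are all assumptions made for illustration.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def step_angle(q_prev, q_next):
    """Rotation angle (radians) taking orientation q_prev to q_next."""
    w = abs(q_mul(q_next, q_conj(q_prev))[0])
    return 2 * math.acos(min(1.0, w))  # clamp guards against rounding above 1


def uniform_segments(quats, tol=0.1):
    """Split residues into (first, last) index ranges with similar step angles.

    A new segment starts when the step angle changes by more than `tol`
    relative to the previous step (a hypothetical criterion for this sketch).
    """
    angles = [step_angle(a, b) for a, b in zip(quats, quats[1:])]
    if not angles:
        return []
    segments, start = [], 0
    for i in range(1, len(angles)):
        if abs(angles[i] - angles[i - 1]) > tol:
            segments.append((start, i))  # steps start..i-1 span residues start..i
            start = i
    segments.append((start, len(angles)))
    return segments


if __name__ == "__main__":
    # Orientations built as cumulative rotations about the z axis.
    cumulative = [0.0, 0.5, 1.0, 1.5, 2.5, 3.5]
    quats = [(math.cos(t / 2), 0.0, 0.0, math.sin(t / 2)) for t in cumulative]
    print(uniform_segments(quats))
```

Adjacent segments share their boundary residue because each step is defined between two neighboring residues.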
Affiliation(s)
- Francisco Adasme-Carreño
  - Departamento de Bioinformática, Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca, Campus Talca, 1 Poniente No. 1141, Casilla 721, Talca 3460000, Chile
- Julio Caballero
  - Departamento de Bioinformática, Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca, Campus Talca, 1 Poniente No. 1141, Casilla 721, Talca 3460000, Chile
- Joel Ireta
  - Departamento de Química, División de Ciencias Básicas e Ingeniería, Universidad Autónoma Metropolitana-Iztapalapa, A.P. 55-534, Ciudad de Mexico 09340, Mexico
10
Oldfield CJ, Chen K, Kurgan L. Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol 2019; 1958:73-100. [PMID: 30945214 DOI: 10.1007/978-1-4939-9161-7_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Many new methods for the sequence-based prediction of the secondary and supersecondary structures have been developed over the last several years. These and older sequence-based predictors are widely applied for the characterization and prediction of protein structure and function. These efforts have produced countless accurate predictors, many of which rely on state-of-the-art machine learning models and evolutionary information generated from multiple sequence alignments. We describe and motivate both types of predictions. We introduce concepts related to the annotation and computational prediction of the three-state and eight-state secondary structure as well as several types of supersecondary structures, such as β hairpins, coiled coils, and α-turn-α motifs. We review 34 predictors focusing on recent tools and provide detailed information for a selected set of 14 secondary structure and 3 supersecondary structure predictors. We conclude with several practical notes for the end users of these predictive methods.
Affiliation(s)
- Christopher J Oldfield
  - Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Ke Chen
  - School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, People's Republic of China
- Lukasz Kurgan
  - Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
11
Mathematical Basis of Predicting Dominant Function in Protein Sequences by a Generic HMM-ANN Algorithm. Acta Biotheor 2018; 66:135-148. [PMID: 29700659 PMCID: PMC7250805 DOI: 10.1007/s10441-018-9327-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/18/2017] [Accepted: 04/16/2018] [Indexed: 12/11/2022]
Abstract
The accurate annotation of an unknown protein sequence depends on extant data of template sequences. This could be empirical or sets of reference sequences, and provides an exhaustive pool of probable functions. Individual methods of predicting dominant function possess shortcomings such as varying degrees of inter-sequence redundancy, arbitrary domain inclusion thresholds, heterogeneous parameterization protocols, and ill-conditioned input channels. Here, I present a rigorous theoretical derivation of various steps of a generic algorithm that integrates and utilizes several statistical methods to predict the dominant function in unknown protein sequences. The accompanying mathematical proofs, interval definitions, analysis, and numerical computations presented are meant not only to offer insights into the specificity and accuracy of predictions, but also to provide details of the operatic mechanisms involved in the integration and its ensuing rigor. The algorithm uses numerically modified raw hidden Markov model scores of well-defined sets of training sequences and clusters them on the basis of known function. The results are then fed into an artificial neural network, the predictions of which can be refined using the available data. This pipeline is trained recursively and can be used to discern the dominant principal function, and thereby, annotate an unknown protein sequence. While the approach is complex, the specificity of the final predictions can help laboratory workers design their experiments with greater confidence.
12