1
|
Derivation of Highly Predictive 3D-QSAR Models for hERG Channel Blockers Based on the Quantum Artificial Neural Network Algorithm. Pharmaceuticals (Basel) 2023; 16:1509. [PMID: 38004375 PMCID: PMC10675541 DOI: 10.3390/ph16111509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 10/14/2023] [Accepted: 10/20/2023] [Indexed: 11/26/2023] Open
Abstract
The hERG potassium channel serves as an annexed target for drug discovery because the associated off-target inhibitory activity may cause serious cardiotoxicity. Quantitative structure-activity relationship (QSAR) models were developed to predict inhibitory activities against the hERG potassium channel, utilizing the three-dimensional (3D) distribution of quantum mechanical electrostatic potential (ESP) as the molecular descriptor. To prepare the optimal atomic coordinates of dataset molecules, pairwise 3D structural alignments were carried out in order for the quantum mechanical cross correlation between the template and other molecules to be maximized. This alignment method stands out from the common atom-by-atom matching technique, as it can handle structurally diverse molecules as effectively as chemical derivatives that share an identical scaffold. The alignment problem prevalent in 3D-QSAR methods was ameliorated substantially by dividing the dataset molecules into seven subsets, each of which contained molecules with similar molecular weights. Using an artificial neural network algorithm to find the functional relationship between the quantum mechanical ESP descriptors and the experimental hERG inhibitory activities, highly predictive 3D-QSAR models were derived for all seven molecular subsets to the extent that the squared correlation coefficients exceeded 0.79. Given their simplicity in model development and strong predictability, the 3D-QSAR models developed in this study are expected to function as an effective virtual screening tool for assessing the potential cardiotoxicity of drug candidate molecules.
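The abstract above summarizes model quality by a squared correlation coefficient. As a minimal sketch, assuming this means the squared Pearson correlation between experimental and predicted activities (the abstract does not give the exact definition), it could be computed as follows. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def squared_correlation(observed, predicted):
    """Return the squared Pearson correlation between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov * cov / (var_obs * var_pred)


# Made-up activity values for illustration only (not data from the paper).
experimental = [5.1, 6.3, 7.0, 4.8, 6.9]
predicted = [5.4, 6.0, 6.8, 5.0, 7.1]
print(round(squared_correlation(experimental, predicted), 3))
```

A value close to 1 indicates that predictions track the experimental values closely; the threshold of 0.79 quoted in the abstract would be checked against this number for each molecular subset.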
|
2
|
PrePCI: A structure- and chemical similarity-informed database of predicted protein compound interactions. Protein Sci 2023; 32:e4594. [PMID: 36776141 PMCID: PMC10019447 DOI: 10.1002/pro.4594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 02/07/2023] [Accepted: 02/09/2023] [Indexed: 02/14/2023]
Abstract
We describe the Predicting Protein-Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome-wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence- and structural similarity-based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT-scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
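Two ingredients named in the abstract above are the Tanimoto coefficient and a Naive Bayesian combination of likelihood ratios. The sketch below illustrates both in their textbook form: Tanimoto similarity on sets of fingerprint bits, and a combined LR as the product of individual LRs, which is what naive Bayes gives when the evidence sources are assumed independent. The function names are hypothetical, and the abstract does not specify which fingerprints or which evidence terms PrePCI actually uses.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def combined_likelihood_ratio(ratios):
    """Naive Bayes combination: multiply LRs from independent evidence sources."""
    return math.prod(ratios)


# Illustrative values only; not taken from the PrePCI database.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(combined_likelihood_ratio([2.0, 5.0]))
```

The product rule is only as good as the independence assumption; correlated evidence sources would overstate the combined LR.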
|
3
|
DALI shines a light on remote homologs: One hundred discoveries. Protein Sci 2023; 32:e4519. [PMID: 36419248 PMCID: PMC9793968 DOI: 10.1002/pro.4519] [Citation(s) in RCA: 110] [Impact Index Per Article: 110.0] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 11/20/2022] [Indexed: 11/25/2022]
Abstract
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we are using DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to the previously characterized families, generating a functional hypothesis that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
|
4
|
Identification and in silico Analysis of Nonsense SNPs of Human Colorectal Cancer Protein. J Oleo Sci 2022; 71:363-370. [PMID: 35236796 DOI: 10.5650/jos.ess21313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022] Open
Abstract
Colorectal cancer (CRC) is the third most prevalent disease in the world, with an estimated 1.2 million new cases each year. Spontaneous CRCs, which account for around 70% of all CRCs, are caused by somatic mutations. Minor variations or single-nucleotide polymorphisms (SNPs) in oncogenes or tumor-suppressor genes cause familial CRC. The MSH2 and MSH6 genes are located on chromosome 2. The products of these genes are involved in the repair of DNA replication defects. If these proteins are changed, the replication errors are not rectified, resulting in damaged DNA leading to colorectal cancer. In our study, we employed a variety of computational methodologies to find nsSNPs that are harmful to the structure and function of the MSH6 protein and could be causing CRC. SIFT, PROVEAN, PolyPhen-2, PhD-SNP, and SNPs&GO were among the in silico methods used for the computational research. According to the findings, the mutations G932Q, E1234Q, and F1104Q are important alterations in native MSH6 protein rs35717727 that may contribute to its dysfunction and, ultimately, disease. The study also provided three-dimensional structures of the native MSH6 protein and its mutant forms. These nsSNPs should be considered as key target mutations in many disorders involving MSH6 dysfunction in future studies. This is the first thorough study to use in silico technologies to assess MSH6 gene variants, and it will be extremely useful in planning large-scale investigations and developing precision medicines to treat disorders caused by these polymorphisms. Additionally, animal models of various autoimmune disorders with these mutations could aid in determining their precise involvement.
|
5
|
Abstract
Retroviral elements from endogenous retroviruses have functions in mammalian physiology. The best-known examples are the envelope proteins that function in placenta development and immune suppression. Porcine endogenous retroviruses (PERVs) are an understudied class of endogenous retroviruses that infect cultured human cells, raising concern regarding porcine xenografts. The PERV envelope glycoprotein has also been proposed as a possible swine syncytin with a role in placental development. Despite the growing interest in PERVs, their envelope glycoproteins remain poorly characterized. Here, we successfully determined the postfusion crystal structure of the PERV core fusion ectodomain. The PERV fusion protein structure reveals a conserved class I viral fusion protein six-helix bundle. Biophysical experiments demonstrated that the thermodynamic stability of the PERV fusion protein secondary structure was the same at physiological and acidic pHs. A conserved surface analysis highlights the high degree of sequence conservation among retroviral fusogens in the chain reversal region that facilitates the large-scale conformational change required for membrane fusion. Further structural alignment of class I viral fusogens revealed a phylogenetic clustering that shows evolution into various lineages that correlate with virus mechanisms of cell entry. Our work indicates that structural dendrograms can be used to qualitatively infer insights into the fusion mechanisms of newly discovered class I viral fusogen structures. IMPORTANCE Class I viral fusion proteins represent a diverse group of fusogens that catalyze membrane fusion. Although structural studies have focused on those from exogenous viruses, ancient retroviral infections of germ line cells have immortalized ancient fusogens in eukaryotic genomes. These "fossilized" glycoproteins are poorly defined compared to modern fusogens. 
In this study, we characterized and determined the structure of the porcine endogenous retrovirus fusogen, an ancient retroviral element captured by swine. This fusion protein revealed remarkable alignment to exogenous retroviral fusion proteins, suggesting that fossil fusogens utilize similar structural determinants to perform membrane fusion. Moreover, structural phylogenetic analysis demonstrates that class I viral fusogens cluster into distinct lineages defined by mechanism of membrane fusion. Our results suggest that structural dendrograms can be used to infer mechanistic insights for uncharacterized fusion proteins.
|
6
|
LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 11/05/2021] [Indexed: 12/26/2022] Open
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
|
7
|
LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv: The Preprint Server for Biology 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics. SIGNIFICANCE STATEMENT Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19.
However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
|
8
|
Quantum Artificial Neural Network Approach to Derive a Highly Predictive 3D-QSAR Model for Blood-Brain Barrier Passage. Int J Mol Sci 2021; 22:ijms222010995. [PMID: 34681653 PMCID: PMC8537149 DOI: 10.3390/ijms222010995] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/06/2021] [Revised: 10/07/2021] [Accepted: 10/10/2021] [Indexed: 01/07/2023] Open
Abstract
A successful passage of the blood–brain barrier (BBB) is an essential prerequisite for the drug molecules designed to act on the central nervous system. The logarithm of blood–brain partitioning (LogBB) has served as an effective index of molecular BBB permeability. Using the three-dimensional (3D) distribution of the molecular electrostatic potential (ESP) as the numerical descriptor, a quantitative structure-activity relationship (QSAR) model termed AlphaQ was derived to predict the molecular LogBB values. To obtain the optimal atomic coordinates of the molecules under investigation, the pairwise 3D structural alignments were conducted in such a way to maximize the quantum mechanical cross correlation between the template and a target molecule. This alignment method has the advantage over the conventional atom-by-atom matching protocol in that the structurally diverse molecules can be analyzed as rigorously as the chemical derivatives with the same scaffold. The inaccuracy problem in the 3D structural alignment was alleviated in a large part by categorizing the molecules into the eight subsets according to the molecular weight. By applying the artificial neural network algorithm to associate the fully quantum mechanical ESP descriptors with the extensive experimental LogBB data, a highly predictive 3D-QSAR model was derived for each molecular subset with a squared correlation coefficient larger than 0.8. Due to the simplicity in model building and the high predictability, AlphaQ is anticipated to serve as an effective computational screening tool for molecular BBB permeability.
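For readers unfamiliar with the index, LogBB is conventionally the base-10 logarithm of the ratio of a compound's concentration in brain to its concentration in blood. The short sketch below computes it from two concentrations; the function name and the example values are hypothetical, and the abstract does not describe how the underlying experimental data were measured.

```python
import math


def log_bb(conc_brain, conc_blood):
    """LogBB = log10(brain concentration / blood concentration)."""
    if conc_brain <= 0 or conc_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_brain / conc_blood)


# Illustrative concentrations in the same (arbitrary) units.
print(log_bb(10.0, 1.0))   # brain-enriched compound, positive LogBB
print(log_bb(1.0, 10.0))   # blood-enriched compound, negative LogBB
```

Positive values indicate that the compound partitions preferentially into brain, negative values that it stays mostly in blood.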
|
9
|
A Structure Based Study of Selective Inhibition of Factor IXa over Factor Xa. Molecules 2021; 26:molecules26175372. [PMID: 34500804 PMCID: PMC8434132 DOI: 10.3390/molecules26175372] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Revised: 08/27/2021] [Accepted: 08/30/2021] [Indexed: 11/25/2022] Open
Abstract
Blood coagulation is an essential physiological process for hemostasis; however, abnormal coagulation can lead to various potentially fatal disorders, generally known as thromboembolic disorders, which are a major cause of mortality in the modern world. Recently, the FDA has approved several anticoagulant drugs for Factor Xa (FXa) which work via the common pathway of the coagulation cascade. A main side effect of these drugs is the potential risk for bleeding in patients. Coagulation Factor IXa (FIXa) has recently emerged as the strategic target to ease these risks as it selectively regulates the intrinsic pathway. These aforementioned coagulation factors are highly similar in structure, functional architecture, and inhibitor binding mode. Therefore, it remains a challenge to design a selective inhibitor which may affect only FIXa. With the availability of a number of X-ray co-crystal structures of these two coagulation factors as protein–ligand complexes, structural alignment, molecular docking, and pharmacophore modeling were employed to derive the relevant criteria for selective inhibition of FIXa over FXa. In this study, six ligands (three potent, two selective, and one inactive) were selected for FIXa inhibition and six potent ligands (four FDA approved drugs) were considered for FXa. The pharmacophore hypotheses provide the distribution patterns for the principal interactions that take place in the binding site. None of the pharmacophoric patterns of the FXa inhibitors matched with any of the patterns of FIXa inhibitors. Based on pharmacophore analysis, a selectivity of a ligand for FIXa over FXa may be defined quantitatively as a docking score of lower than −8.0 kcal/mol in the FIXa-grids and higher than −7.5 kcal/mol in the FXa-grids.
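The quantitative selectivity criterion stated at the end of the abstract can be written as a simple predicate. This is a minimal sketch of that rule only: the two thresholds come from the abstract, while the function name and the example scores are hypothetical, and the docking protocol that produces the scores is outside its scope.

```python
FIXA_MAX_SCORE = -8.0  # kcal/mol; score in the FIXa grids must be lower than this
FXA_MIN_SCORE = -7.5   # kcal/mol; score in the FXa grids must be higher than this


def is_fixa_selective(score_fixa, score_fxa):
    """Apply the abstract's rule: strong docking to FIXa and weak docking to FXa."""
    return score_fixa < FIXA_MAX_SCORE and score_fxa > FXA_MIN_SCORE


# Illustrative docking scores in kcal/mol (not values from the paper).
print(is_fixa_selective(-9.2, -6.8))  # binds FIXa well, FXa poorly
print(is_fixa_selective(-9.2, -8.4))  # binds both targets well
```

More negative docking scores mean stronger predicted binding, so a ligand passes only when it scores below the FIXa threshold and above the FXa threshold at the same time.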
|
10
|
Abstract
During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. Although at present the accuracy of the intra-sequence distogram predicted by SAdLSA self-alignment is not as good as that of deep-learning algorithms specifically trained for distogram prediction, it is remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA can not only predict protein folds for individual sequences, but also detect subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. The former reduces to a special case in this general framework for protein sequence annotation.
|
11
|
First phylogenetic analysis of Dryophthorinae (Coleoptera, Curculionidae) based on structural alignment of ribosomal DNA reveals Cenozoic diversification. Ecol Evol 2021; 11:1984-1998. [PMID: 33717436 PMCID: PMC7920784 DOI: 10.1002/ece3.7131] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/03/2020] [Revised: 11/10/2020] [Accepted: 11/12/2020] [Indexed: 01/09/2023] Open
Abstract
Dryophthorinae is an economically important, ecologically distinct, and ubiquitous monophyletic group of pantropical weevils with more than 1,200 species in 153 genera. This study provides the first comprehensive phylogeny of the group with the aim to provide insights into the process and timing of diversification of phytophagous insects, inform classification and facilitate predictions. The taxon sampling is the most extensive to date and includes representatives of all five dryophthorine tribes and all but one subtribe. The phylogeny is based on secondary structural alignment of 18S and 28S rRNA totaling 3,764 nucleotides analyzed under Bayesian and maximum likelihood inference. We used a fossil-calibrated relaxed clock model with two approaches, node-dating and fossilized birth-death models, to estimate divergence times for the subfamily. All tribes except the species-rich Rhynchophorini were found to be monophyletic, but higher support is required to ascertain the paraphyly of Rhynchophorini with more confidence. Nephius is closely related to Dryophthorini and Stromboscerini, and there is strong evidence for paraphyly of Sphenophorina. We find a large gap between the divergence of Dryophthorinae from their sister group Platypodinae in the Jurassic-Cretaceous boundary and the diversification of extant species in the Cenozoic, highlighting the role of coevolution with angiosperms in this group.
|
12
|
Structure Unveils Relationships between RNA Virus Polymerases. Viruses 2021; 13:v13020313. [PMID: 33671332 PMCID: PMC7922027 DOI: 10.3390/v13020313] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/18/2020] [Revised: 02/14/2021] [Accepted: 02/15/2021] [Indexed: 12/30/2022] Open
Abstract
RNA viruses are the fastest evolving known biological entities. Consequently, the sequence similarity between homologous viral proteins disappears quickly, limiting the usability of traditional sequence-based phylogenetic methods in the reconstruction of relationships and evolutionary history among RNA viruses. Protein structures, however, typically evolve more slowly than sequences, and structural similarity can still be evident, when no sequence similarity can be detected. Here, we used an automated structural comparison method, homologous structure finder, for comprehensive comparisons of viral RNA-dependent RNA polymerases (RdRps). We identified a common structural core of 231 residues for all the structurally characterized viral RdRps, covering segmented and non-segmented negative-sense, positive-sense, and double-stranded RNA viruses infecting both prokaryotic and eukaryotic hosts. The grouping and branching of the viral RdRps in the structure-based phylogenetic tree follow their functional differentiation. The RdRps using protein primer, RNA primer, or self-priming mechanisms have evolved independently of each other, and the RdRps cluster into two large branches based on the used transcription mechanism. The structure-based distance tree presented here follows the recently established RdRp-based RNA virus classification at genus, subfamily, family, order, class and subphylum ranks. However, the topology of our phylogenetic tree suggests an alternative phylum level organization.
|
13
|
Molecular and Structural Evolution of Cytochrome P450 Aromatase. Int J Mol Sci 2021; 22:E631. [PMID: 33435208 PMCID: PMC7827799 DOI: 10.3390/ijms22020631] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/21/2020] [Revised: 01/06/2021] [Accepted: 01/07/2021] [Indexed: 12/22/2022] Open
Abstract
Aromatase is the cytochrome P450 enzyme converting androgens into estrogen in the last phase of steroidogenesis. As estrogens are crucial in reproductive biology, aromatase is found in vertebrates and the invertebrates of the genus Branchiostoma, where it carries out the aromatization reaction of the A-ring of androgens that produces estrogens. Here, we investigate the molecular evolution of this unique and highly substrate-selective enzyme by means of structural, sequence alignment, and homology modeling, shedding light on its key role in species conservation. The alignments led to the identification of a core structure that, together with key and unique amino acids located in the active site and the substrate recognition sites, has been well conserved during evolution. Structural analysis shows what their roles are and the reason why they have been preserved. Moreover, the residues involved in the interaction with the redox partner and some phosphorylation sites appeared late during evolution. These data reveal how highly substrate-selective cytochrome P450 has evolved, indicating that the driving forces for evolution have been the optimization of the interaction with the redox partner and the introduction of phosphorylation sites that give the possibility of modulating its activity in a rapid way.
|
14
|
Global alignment and assessment of TRP channel transmembrane domain structures to explore functional mechanisms. eLife 2020; 9:e58660. [PMID: 32804077 PMCID: PMC7431192 DOI: 10.7554/elife.58660] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/07/2020] [Accepted: 07/31/2020] [Indexed: 12/20/2022] Open
Abstract
The recent proliferation of published TRP channel structures provides a foundation for understanding the diverse functional properties of this important family of ion channel proteins. To facilitate mechanistic investigations, we constructed a structure-based alignment of the transmembrane domains of 120 TRP channel structures. Comparison of structures determined in the absence or presence of activating stimuli reveals similar constrictions in the central ion permeation pathway near the intracellular end of the S6 helices, pointing to a conserved cytoplasmic gate and suggesting that most available structures represent non-conducting states. Comparison of the ion selectivity filters toward the extracellular end of the pore supports existing hypotheses for mechanisms of ion selectivity. Also conserved to varying extents are hot spots for interactions with hydrophobic ligands, lipids and ions, as well as discrete alterations in helix conformations. This analysis therefore provides a framework for investigating the structural basis of TRP channel gating mechanisms and pharmacology, and, despite the large number of structures included, reveals the need for additional structural data and for more functional studies to establish the mechanistic basis of TRP channel function.
|
15
|
Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB). Int J Mol Sci 2020; 21:ijms21062243. [PMID: 32213914 PMCID: PMC7139665 DOI: 10.3390/ijms21062243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Revised: 03/06/2020] [Accepted: 03/20/2020] [Indexed: 11/16/2022] Open
Abstract
The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein-protein interactions, protein-DNA interactions, or drug discovery. While protein redundancy has typically been managed using a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases the number of complexes by 3.84-fold and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
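The weighted RMSD mentioned in the methodology can be illustrated with the generic formula sqrt(sum(w_i * d_i^2) / sum(w_i)), where d_i is the distance between matched atoms of two already superposed binding sites. The sketch below assumes that form; the abstract does not say how the weights are chosen, so the weights, coordinates, and function name here are purely illustrative.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two matched, already superposed 3D point lists."""
    if not (len(coords_a) == len(coords_b) == len(weights)):
        raise ValueError("coordinate lists and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # math.dist gives the Euclidean distance between two points (Python 3.8+).
    weighted_sq = sum(
        w * math.dist(a, b) ** 2 for a, b, w in zip(coords_a, coords_b, weights)
    )
    return math.sqrt(weighted_sq / total_weight)


# Two-atom toy example; the second atom is displaced by 2 units along z.
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
site_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(weighted_rmsd(site_a, site_b, [1.0, 1.0]))
print(weighted_rmsd(site_a, site_b, [1.0, 0.0]))  # displaced atom given zero weight
```

A pairwise matrix of such values is the kind of input a hierarchical clustering step would then consume to group conformations.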
|
16
|
DALI and the persistence of protein shape. Protein Sci 2020; 29:128-140. [PMID: 31606894 PMCID: PMC6933842 DOI: 10.1002/pro.3749] [Citation(s) in RCA: 437] [Impact Index Per Article: 109.3] [Received: 08/01/2019] [Revised: 10/08/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022]
Abstract
DALI is a popular resource for comparing protein structures. The software is based on distance-matrix alignment. The associated web server provides tools to navigate, integrate and organize some data pushed out by genomics and structural genomics. The server has been running continuously for the past 25 years. Structural biologists routinely use DALI to compare a new structure against previously known protein structures. If significant similarities are discovered, it may indicate a distant homology, that is, that the structures are of shared origin. This may be significant in determining the molecular mechanisms, as these may remain very similar from a distant predecessor to the present day, for example, from the last common ancestor of humans and bacteria. Meta-analysis of independent reference-based evaluations of alignment accuracy and fold discrimination shows DALI at top rank in six out of 12 studies. The web server and standalone software are available from http://ekhidna2.biocenter.helsinki.fi/dali.
|
17
|
Alignment of Noncoding Ribonucleic Acids with Pseudoknots Using Context-Sensitive Hidden Markov Model. Journal of Medical Signals & Sensors 2019; 9:252-258. [PMID: 31737554 PMCID: PMC6839439 DOI: 10.4103/jmss.jmss_11_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2019] [Revised: 03/22/2019] [Accepted: 05/15/2019] [Indexed: 12/04/2022]
Abstract
Up to now, various signal processing techniques have been used to predict protein-coding genes, but these techniques are unsuitable for predicting ribonucleic acids (RNAs). Modeling a gene network can be employed in various fields, such as the discovery of new drugs, reducing the side effects of treatment methods, further identifying genetic diseases and treatments for genetic disorders by influencing the activity of effectual genes, preventing the growth of unwanted tissues via growth weakening and cell reproduction, and also for many other applications in the fields of medicine and agriculture. The main purpose of this study was to design a suitable algorithm based on context-sensitive hidden Markov models (csHMMs) for the alignment of secondary structures of RNAs, which can identify noncoding RNAs. In this model, several RNA families are compared, and their existing similarities are measured. An expectation–maximization algorithm, the standard approach for estimating HMM parameters, is used to estimate the model's parameters. The alignment results for RNAs belonging to the hepatitis delta virus family showed an accuracy of 83.33%, a specificity of 89%, and a sensitivity of 97%, and RNAs belonging to the purine family showed an accuracy of 65%, a specificity of 76%, and a sensitivity of 76%. The results show that csHMMs, in addition to aligning the primary sequences of RNAs, can align the secondary structures of RNAs with high accuracy.
|
18
|
Structure and Sequence Based Analysis of Pullulanases: Understanding Dual Catalytic Mechanism. Protein Pept Lett 2019; 26:893-903. [PMID: 31429684 DOI: 10.2174/0929866526666190820160611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2018] [Revised: 04/25/2019] [Accepted: 05/26/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Starch processing requires a combination of enzymes with other chemical and physical processes, which increases cost and time. Enzymes used in these processes have a characteristic (α/β)8 barrel domain architecture, although they show variable activity. Pullulanase type 1 and isoamylase act on the α-1-6 linkage, amylase on the α-1-4 linkage, whereas pullulanase type 2 acts on both the α-1-6 and α-1-4 linkages of starch. OBJECTIVE: This article focusses on elucidating the importance of sequence- and structure-based differences in pullulanase that may lead to its dual catalytic nature. METHODS: Initially, sequences and structures of pullulanase type 1, pullulanase type 2, amylase and isoamylase were retrieved from databases (NCBI and PDB). Homology modelling using SWISS-MODEL and PHYRE2 was carried out to predict the structures of the enzymes with unavailable structures. Further, the modelled structures were validated using ANOLEA, Verify 3D and PROCHECK; structures with high confidence values were selected and used for analysis. Finally, the selected structures were compared using PDBefold, and their domain alignment and analysis were performed manually using Pymol. RESULTS: Modelled structures of pullulanase and isoamylase were validated and selected based on the confidence score. Comparative analysis of complete structures showed low similarity between the enzymes, although domain analysis showed good similarity. Moreover, alignment of catalytic site residues showed high similarity, with a change in orientation of critical site residues (HIS 242, ASP 347 and GLN 375). CONCLUSION: The change in orientation of active site residues, along with the absence or presence of a few residues, might play a crucial role in imparting dual functionality.
|
19
|
Computational analysis of non-coding RNAs in Alzheimer's disease. Bioinformation 2019; 15:351-357. [PMID: 31249438 PMCID: PMC6589468 DOI: 10.6026/97320630015351] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/27/2019] [Accepted: 04/01/2019] [Indexed: 01/09/2023] Open
Abstract
Recent studies have shown that long noncoding RNAs (lncRNAs) are a crucial factor in neurodegenerative diseases and are next-generation therapeutic targets. The wide range of advanced computational methods for the analysis of noncoding RNAs mainly includes the prediction of RNA and miRNA structures. The problems that concern representations of specific biological structures, such as secondary structures, are either characterized as NP-complete or have high complexity. Numerous algorithms and techniques related to the enumeration of sequential terms of biological structures, mainly with exponential complexity, have been constructed until now. While the BACE1-AS, NATRad18, 17A, and hnRNP Q lncRNAs have been found to be associated with Alzheimer's disease, in this research study the significance of the most known β-turn-forming residues between these proteins is computationally identified and discussed as a potentially crucial factor in the regulation of folding, aggregation and other intermolecular interactions.
|
20
|
RNA Structure Elements Conserved between Mouse and 59 Other Vertebrates. Genes (Basel) 2018; 9:E392. [PMID: 30071678 PMCID: PMC6116022 DOI: 10.3390/genes9080392] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/15/2018] [Revised: 07/25/2018] [Accepted: 07/27/2018] [Indexed: 12/24/2022] Open
Abstract
In this work, we present a computational screen conducted for functional RNA structures, resulting in over 100,000 conserved RNA structure elements found in alignments of mouse (mm10) against 59 other vertebrates. We explicitly included masked repeat regions to explore the potential of transposable elements and low-complexity regions to give rise to regulatory RNA elements. In our analysis pipeline, we implemented a four-step procedure: (i) we screened genome-wide alignments for potential structure elements using RNAz-2, (ii) realigned and refined candidate loci with LocARNA-P, (iii) scored candidates again with RNAz-2 in structure alignment mode, and (iv) searched for additional homologous loci in the mouse genome that were not covered by genome alignments. The 3'-untranslated regions (3'-UTRs) of protein-coding genes and small noncoding RNAs are enriched for structures, while coding sequences are depleted. Repeat-associated loci make up about 95% of the homologous loci identified and are, as expected, predominantly found in intronic and intergenic regions. Nevertheless, we report the structure elements enriched in specific genome elements, such as 3'-UTRs and long noncoding RNAs (lncRNAs). We provide full access to our results via a custom UCSC genome browser trackhub freely available on our website (http://rna.tbi.univie.ac.at/trackhubs/#RNAz).
|
21
|
The evolution of function within the Nudix homology clan. Proteins 2017; 85:775-811. [PMID: 27936487 PMCID: PMC5389931 DOI: 10.1002/prot.25223] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/21/2016] [Revised: 10/15/2016] [Accepted: 11/28/2016] [Indexed: 01/01/2023]
Abstract
The Nudix homology clan encompasses over 80,000 protein domains from all three domains of life, defined by homology to each other. Proteins with a domain from this clan fall into four general functional classes: pyrophosphohydrolases, isopentenyl diphosphate isomerases (IDIs), adenine/guanine mismatch-specific adenine glycosylases (A/G-specific adenine glycosylases), and nonenzymatic activities such as protein/protein interaction and transcriptional regulation. The largest group, pyrophosphohydrolases, encompasses more than 100 distinct hydrolase specificities. To understand the evolution of this vast number of activities, we assembled and analyzed experimental and structural data for 205 Nudix proteins collected from the literature. We corrected erroneous functions or provided more appropriate descriptions for 53 annotations described in the Gene Ontology Annotation database in this family, and propose 275 new experimentally-based annotations. We manually constructed a structure-guided sequence alignment of 78 Nudix proteins. Using the structural alignment as a seed, we then made an alignment of 347 "select" Nudix homology domains, curated from structurally determined, functionally characterized, or phylogenetically important Nudix domains. Based on our review of Nudix pyrophosphohydrolase structures and specificities, we further analyzed a loop region downstream of the Nudix hydrolase motif previously shown to contact the substrate molecule and possess known functional motifs. This loop region provides a potential structural basis for the functional radiation and evolution of substrate specificity within the hydrolase family. Finally, phylogenetic analyses of the 347 select protein domains and of the complete Nudix homology clan revealed general monophyly with regard to function and a few instances of probable homoplasy. Proteins 2017; 85:775-811. © 2016 Wiley Periodicals, Inc.
|
22
|
Structure-based classification of FAD binding sites: A comparative study of structural alignment tools. Proteins 2016; 84:1728-1747. [PMID: 27580869 DOI: 10.1002/prot.25158] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/22/2016] [Revised: 07/29/2016] [Accepted: 08/24/2016] [Indexed: 11/06/2022]
Abstract
A total of six different structural alignment tools (TM-Align, TriangleMatch, CLICK, ProBis, SiteEngine and GA-SI) were assessed for their ability to perform two particular tasks: (i) discriminating FAD (flavin adenine dinucleotide) from non-FAD binding sites, and (ii) performing an all-to-all comparison on a set of 883 FAD binding sites for the purpose of classifying them. For the first task, the consistency of each alignment method was evaluated, showing that every method is able to distinguish FAD and non-FAD binding sites with a high Matthews correlation coefficient. Additionally, GA-SI was found to provide alignments different from those of the other approaches. The results obtained for the second task revealed more significant differences among alignment methods, as reflected in the poor correlation of their results and highlighted clearly by the independent evaluation of the structural superimpositions generated by each method. The classification itself was performed using the combined results of all methods, using the best result found for each comparison of binding sites. A number of different clustering methods (Single-linkage, UPGMA, Complete-linkage, SPICKER and k-Means clustering) were also used. The groups of similar binding sites (proteins) or clusters generated by the best performing method were further analyzed in terms of local sequence identity, local structural similarity and conservation of analogous contacts with the FAD ligands. Each of the clusters was characterized by a unique set of structural features or patterns, demonstrating that the groups generated truly reflect the structural diversity of FAD binding sites. Proteins 2016; 84:1728-1747. © 2016 Wiley Periodicals, Inc.
|
23
|
PhyreStorm: A Web Server for Fast Structural Searches Against the PDB. J Mol Biol 2015; 428:702-708. [PMID: 26517951 PMCID: PMC7610957 DOI: 10.1016/j.jmb.2015.10.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/30/2015] [Revised: 10/13/2015] [Accepted: 10/18/2015] [Indexed: 11/10/2022]
Abstract
The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will give biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93 ± 2% of all highly similar (TM-score > 0.7) structures in the PDB for each query structure, usually in less than 60 s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/.
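For readers unfamiliar with the TM-score threshold quoted in this abstract, the following is a minimal sketch of how the score is evaluated for one fixed superposition, using the standard definition: the sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, normalized by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names are hypothetical, the 0.5 Å floor on d0 mirrors common implementations and is an assumption here, and the full TM-score additionally maximizes over superpositions, which this sketch does not attempt. It is not PhyreStorm's code.

```python
def distance_scale(target_length):
    """Length-dependent scale d0 (in angstroms) used by the TM-score.

    The floor of 0.5 for short chains is an assumption mirroring common
    implementations; the formula is undefined for very short targets.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    return max(d0, 0.5)

def tm_score(distances, target_length):
    """TM-score for one given superposition.

    `distances` holds the distance (angstroms) between each aligned residue
    pair; unaligned target residues contribute nothing to the sum.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("cannot align more residue pairs than target residues")
    d0 = distance_scale(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# A perfect superposition of all 100 target residues.
perfect = tm_score([0.0] * 100, 100)
# Only half of the target residues aligned, each with zero deviation.
half_covered = tm_score([0.0] * 50, 100)
```

Because the score is normalized by the target length, partial coverage lowers it even when the aligned residues fit perfectly, which is why a cutoff such as 0.7 selects structures that are similar over most of their length.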
|
24
|
Carbohydrate-binding protein identification by coupling structural similarity searching with binding affinity prediction. J Comput Chem 2014; 35:2177-83. [PMID: 25220682 DOI: 10.1002/jcc.23730] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/27/2014] [Revised: 05/27/2014] [Accepted: 08/25/2014] [Indexed: 02/03/2023]
Abstract
Carbohydrate-binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study experimentally and computationally because of their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates as exists in RNA, DNA, and proteins. Here, we describe a structure-based function-prediction technique called SPOT-Struc that identifies carbohydrate-recognizing proteins and their binding amino acid residues using the structural alignment program SPalign and binding affinity scoring according to a knowledge-based statistical potential based on the distance-scaled finite-ideal gas reference state (DFIRE). The leave-one-out cross-validation of the method on 113 carbohydrate-binding domains and 3442 noncarbohydrate binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT-Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT-Struc is a technique with high positive predictive value (79% correct predictions in all positive CBP predictions) and reasonable sensitivity (52% positive predictions in all CBPs). The sensitivity of the method changed only slightly when applied to 31 APO (unbound) structures found in the protein databank (14/31 for APO versus 15/31 for HOLO). The result of SPOT-Struc will not change significantly if highly homologous templates were used. SPOT-Struc predicted 19 out of 2076 structural genome targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin-9 from Mus musculus. Thus, SPOT-Struc is useful for uncovering novel carbohydrate-binding proteins. SPOT-Struc is available at http://sparks-lab.org.
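The three evaluation measures quoted in this abstract (Matthews correlation coefficient, positive predictive value, and sensitivity) all derive from one confusion matrix. The sketch below shows the standard formulas. The confusion counts are hypothetical: they were chosen only so that the class sizes match the reported 113 positives and 3442 negatives and the ratios land near the reported percentages, and they are not taken from the paper.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Return PPV, sensitivity, and Matthews correlation coefficient.

    tp/fp/fn/tn are confusion-matrix counts for a binary classifier.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "mcc": mcc}

# Hypothetical counts (NOT from the paper): 113 positives, 3442 negatives.
example = classification_metrics(tp=59, fp=16, fn=54, tn=3426)
```

With a heavily imbalanced test set such as this one, the MCC is a more informative summary than accuracy, because a predictor that labels everything as negative would still be about 97% accurate while scoring an MCC of zero.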
|
25
|
Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 2014; 31:2251-66. [PMID: 24899668 PMCID: PMC4137710 DOI: 10.1093/molbev/msu184] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/29/2022] Open
Abstract
For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence-structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set.
|
26
|
Automated structural comparisons clarify the phylogeny of the right-hand-shaped polymerases. Mol Biol Evol 2014; 31:2741-52. [PMID: 25063440 DOI: 10.1093/molbev/msu219] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022] Open
Abstract
Polymerases are essential for life, being responsible for replication, transcription, and the repair of nucleic acid molecules. Those that share a right-hand-shaped fold and catalytic site structurally similar to the DNA polymerase I of Escherichia coli may catalyze RNA- or DNA-dependent RNA polymerization, reverse transcription, or DNA replication in eukarya, archaea, bacteria, and their viruses. We have applied novel computational methods for structure-based clustering and phylogenetic analyses of this functionally diverse polymerase superfamily, which currently comprises six families. We identified a structural core common to all right-handed polymerases, composed of 57 amino acid residues, harboring two positionally and chemically conserved residues, the catalytic aspartates. The structural conservation within each of the six families is considerable, for example, the structural core shared by family Y DNA polymerases covers over 90% of the polymerase domain of the Sulfolobus solfataricus Dpo4. Our phylogenetic analyses propose an early separation of RNA-dependent polymerases that use primers from those that are primer-independent. Furthermore, the exchange of polymerase genes between viruses and their hosts is evident. Because of this horizontal gene transfer, the phylogeny of polymerases does not always reflect the evolutionary history of the corresponding organisms.
|
27
|
Discovering isozyme-selective inhibitor scaffolds of human carbonic anhydrases using structural alignment and de novo drug design approaches. Chem Biol Drug Des 2013; 83:247-58. [PMID: 24112770 DOI: 10.1111/cbdd.12234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/19/2013] [Revised: 08/24/2013] [Accepted: 09/15/2013] [Indexed: 11/27/2022]
Abstract
The development of isozyme-selective carbonic anhydrase inhibitors (CAIs) remains a great challenge. In the present study, protein-ligand complex structures were obtained by AutoDock Vina with SBR ((R)-N-(3-indol-1-yl-2-methyl-propyl)-4-sulfamoyl-benzamide) as the only inhibitor docked into the binding pockets of human isozymes CA I, II, IV, VI, IX, XII, and XIII. To make the spatial structures of the complexes comparable, the co-ordinates for CA domains were reassigned based on structural alignments. With the preferred docking poses of SBR reduced to seed structures, LigBuilder was used to build up inhibitor molecules. The results suggested that sulfonamide derivatives with naphthalene, fluorene, and acridan as the scaffold structures can be potential isozyme-selective CAIs, especially for isozymes CA II, IV, and IX.
|
28
|
CentroidAlign-Web: A Fast and Accurate Multiple Aligner for Long Non-Coding RNAs. Int J Mol Sci 2013; 14:6144-56. [PMID: 23507751 PMCID: PMC3634467 DOI: 10.3390/ijms14036144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/20/2012] [Revised: 01/28/2013] [Accepted: 02/28/2013] [Indexed: 12/31/2022] Open
Abstract
Due to the recent discovery of non-coding RNAs (ncRNAs), multiple sequence alignment (MSA) of those long RNA sequences is becoming increasingly important for classifying and determining the functional motifs in RNAs. However, not only primary (nucleotide) sequences, but also secondary structures of ncRNAs are closely related to their function and are conserved evolutionarily. Hence, information about secondary structures should be considered in the sequence alignment of ncRNAs. Yet, in general, a huge computational time is required in order to compute MSAs, taking secondary structure information into account. In this paper, we describe a fast and accurate web server, called CentroidAlign-Web, which can handle long RNA sequences. The web server also appropriately incorporates information about known secondary structures into MSAs. Computational experiments indicate that our web server is fast and accurate enough to handle long RNA sequences. CentroidAlign-Web is freely available from http://centroidalign.ncrna.org/.
|
29
|
Comparative Bioinformatic Analysis of Active Site Structures in Evolutionarily Remote Homologues of α,β-Hydrolase Superfamily Enzymes. Acta Naturae 2011; 3:93-8. [PMID: 22649677 PMCID: PMC3347592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
Abstract
Comparative bioinformatic analysis is the cornerstone of the study of enzymes' structure-function relationship. However, numerous enzymes that derive from a common ancestor and have undergone substantial functional alterations during natural selection appear not to have a sequence similarity acceptable for a statistically reliable comparative analysis. At the same time, their active site structures, in general, can be conserved, while other parts may largely differ. Therefore, it sounds both plausible and appealing to implement a comparative analysis of the most functionally important structural elements, the active site structures; that is, the amino acid residues involved in substrate binding and the catalytic mechanism. A computer algorithm has been developed to create a library of enzyme active site structures based on the use of the PDB database, together with programs for structural analysis and for the identification of functionally important amino acid residues and cavities in the enzyme structure. The proposed methodology has been used to compare some α,β-hydrolase superfamily enzymes. The analysis has revealed a high structural similarity of catalytic site areas, including the conserved organization of the catalytic triad and oxyanion hole residues, despite the wide functional diversity among the remote homologues compared. The methodology can be used to compare the structural organization of the catalytic and substrate binding sites of various classes of enzymes, as well as to study enzyme evolution and to create a databank of enzyme active site structures.
|
30
|
Similarity search for local protein structures at atomic resolution by exploiting a database management system. Biophysics (Nagoya-shi) 2007; 3:75-84. [PMID: 27857569 PMCID: PMC5036654 DOI: 10.2142/biophysics.3.75] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/20/2007] [Accepted: 11/26/2007] [Indexed: 12/01/2022] Open
Abstract
A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational database management system with appropriate indexing of geometric data. This method, which we call geometric indexing, can enumerate ligand binding sites that are structurally similar to sub-structures of a query protein among more than 160,000 possible candidates within a few hours of CPU time on an ordinary desktop computer. After detecting a set of high-scoring ligand binding sites by the geometric indexing search, structural alignments at atomic resolution are constructed by iteratively applying the Hungarian algorithm, and the statistical significance of the final score is estimated from an empirical model based on a gamma distribution. Applications of this method to several protein structures clearly show that significant similarities can be detected between local structures of non-homologous as well as homologous proteins.
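The atom-level alignment step mentioned in this abstract relies on the Hungarian algorithm, which finds the one-to-one pairing of minimum total cost. The sketch below shows a single assignment pass on toy coordinates, using squared inter-atomic distance as the cost. The cost function, the function names, and the coordinates are illustrative assumptions; the paper's iterative scheme (presumably alternating assignment with re-superposition) and its scoring function are not reproduced here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def hungarian(cost):
    """Minimum-cost assignment of rows to columns (rows <= columns).

    Standard O(n^2 m) potentials-based Hungarian algorithm.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    match = [0] * (m + 1)    # match[j] = row (1-based) assigned to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        min_slack = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    slack = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if slack < min_slack[j]:
                        min_slack[j], way[j] = slack, j0
                    if min_slack[j] < delta:
                        delta, j1 = min_slack[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    min_slack[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if match[j] != 0:
            assignment[match[j] - 1] = j - 1
    return assignment

# Toy example: three query atoms and three binding-site atoms (hypothetical).
query_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
site_atoms = [(0.0, 1.1, 0.0), (0.1, 0.0, 0.0), (1.0, 0.1, 0.0)]
cost_matrix = [[squared_distance(q, s) for s in site_atoms] for q in query_atoms]
pairing = hungarian(cost_matrix)
```

In an iterative setting, the pairing from one pass would typically be used to recompute the superposition, after which the cost matrix is rebuilt and the assignment repeated until it stops changing.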
|
31
|
Abstract
The OB-fold domain is a compact structural motif frequently used for nucleic acid recognition. Structural comparison of all OB-fold/nucleic acid complexes solved to date confirms the low degree of sequence similarity among members of this family while highlighting several structural sequence determinants common to most of these OB-folds. Loops connecting the secondary structural elements in the OB-fold core are extremely variable in length and in functional detail. However, certain features of ligand binding are conserved among OB-fold complexes, including the location of the binding surface, the polarity of the nucleic acid with respect to the OB-fold, and particular nucleic acid-protein interactions commonly used for recognition of single-stranded and unusually structured nucleic acids. Intriguingly, the observation of shared nucleic acid polarity may shed light on the longstanding question concerning OB-fold origins, indicating that it is unlikely that members of this family arose via convergent evolution.
|