1
Zheng Z, Goncearenco A, Berezovsky IN. Back in time to the Gly-rich prototype of the phosphate binding elementary function. Curr Res Struct Biol 2024; 7:100142. [PMID: 38655428 PMCID: PMC11035071 DOI: 10.1016/j.crstbi.2024.100142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] Open
Abstract
Binding of nucleotides and their derivatives is one of the most ancient elementary functions dating back to the Origin of Life. We review here the works considering one of the key elements in binding of (di)nucleotide-containing ligands - phosphate binding. We start from a brief discussion of major participants, conditions, and events in prebiotic evolution that resulted in the Origin of Life. Tracing back to the basic functions, including metal and phosphate binding, and, potentially, formation of primitive protein-protein interactions, we focus here on the phosphate binding. Critically assessing works on the structural, functional, and evolutionary aspects of phosphate binding, we perform a simple computational experiment reconstructing its most ancient and generic sequence prototype. The profiles of the phosphate binding signatures have been derived in form of position-specific scoring matrices (PSSMs), their peculiarities depending on the type of the ligands have been analyzed, and evolutionary connections between them have been delineated. Then, the apparent prototype that gave rise to all relevant phosphate-binding signatures had also been reconstructed. We show that two major signatures of the phosphate binding that discriminate between the binding of dinucleotide- and nucleotide-containing ligands are GxGxxG and GxxGxG, respectively. It appears that the signature archetypal for dinucleotide-containing ligands is more generic, and it can frequently bind phosphate groups in nucleotide-containing ligands as well. The reconstructed prototype's key signature GxGGxG underlies the role of glycine residues in providing flexibility and interactions necessary for binding the phosphate groups. The prototype also contains other ancient amino acids, valine, and alanine, showing versatility towards evolutionary design and functional diversification.
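The three signatures named in this abstract can be written as simple patterns, which makes it easy to see that the reconstructed prototype GxGGxG satisfies both GxGxxG and GxxGxG. The sketch below is only an illustration: the authors derive position-specific scoring matrices, whereas this uses plain regular expressions, and the function name `find_signatures` and the test peptides are hypothetical.

```python
import re

# Signatures quoted in the abstract; 'x' stands for any residue.
# Lookahead groups are used so that overlapping hits are all reported.
# Assumption: a regex is a crude stand-in for the PSSM profiles in the paper.
SIGNATURES = {
    "GxGxxG": re.compile(r"(?=(G.G..G))"),  # dinucleotide-type signature
    "GxxGxG": re.compile(r"(?=(G..G.G))"),  # nucleotide-type signature
    "GxGGxG": re.compile(r"(?=(G.GG.G))"),  # reconstructed prototype
}


def find_signatures(sequence):
    """Return {signature: [(start, matched_segment), ...]} with 0-based starts."""
    seq = sequence.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in SIGNATURES.items()
    }


if __name__ == "__main__":
    # Hypothetical peptide containing the prototype pattern.
    for name, hits in find_signatures("AAGAGGVGKAA").items():
        print(name, hits)
```

Because the prototype fixes glycine at positions 1, 3, 4, and 6, any segment that matches it also matches both six-residue signatures, which is what the example peptide shows.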
Affiliation(s)
- Zejun Zheng
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
2
Ferruz N, Heinzinger M, Akdel M, Goncearenco A, Naef L, Dallago C. From sequence to function through structure: Deep learning for protein design. Comput Struct Biotechnol J 2022; 21:238-250. [PMID: 36544476 PMCID: PMC9755234 DOI: 10.1016/j.csbj.2022.11.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/31/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/20/2022] Open
Abstract
The process of designing biomolecules, in particular proteins, is witnessing a rapid change in available tooling and approaches, moving from design through physicochemical force fields, to producing plausible, complex sequences fast via end-to-end differentiable statistical models. To achieve conditional and controllable protein design, researchers at the interface of artificial intelligence and biology leverage advances in natural language processing (NLP) and computer vision techniques, coupled with advances in computing hardware to learn patterns from growing biological databases, curated annotations thereof, or both. Once learned, these patterns can be used to provide novel insights into mechanistic biology and the design of biomolecules. However, navigating and understanding the practical applications for the many recent protein design tools is complex. To facilitate this, we 1) document recent advances in deep learning (DL) assisted protein design from the last three years, 2) present a practical pipeline that allows to go from de novo-generated sequences to their predicted properties and web-powered visualization within minutes, and 3) leverage it to suggest a generated protein sequence which might be used to engineer a biosynthetic gene cluster to produce a molecular glue-like compound. Lastly, we discuss challenges and highlight opportunities for the protein design field.
Key Words
- ADMM, Alternating Direction Method of Multipliers
- CNN, Convolutional Neural Network
- DL, Deep learning
- Deep learning
- Drug discovery
- FNN, fully-connected neural network
- GAN, Generative Adversarial Network
- GCN, Graph Convolutional Network
- GNN, Graph Neural Network
- GO, Gene Ontology
- GVP, Geometric Vector Perceptron
- LSTM, Long-Short Term Memory
- MLP, Multilayer Perceptron
- MSA, Multiple Sequence Alignment
- NLP, Natural Language Processing
- NSR, Natural Sequence Recovery
- Protein design
- Protein language models
- Protein prediction
- VAE, Variational Autoencoder
- pLM, protein Language Model
Affiliation(s)
- Noelia Ferruz
  - Institute of Informatics and Applications, University of Girona, Girona, Spain
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Michael Heinzinger
  - Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany
- Mehmet Akdel
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
- Luca Naef
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
- Christian Dallago
  - Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany
  - VantAI, 151 W 42nd Street, New York, NY 10036, United States
  - NVIDIA DE GmbH, Einsteinstraße 172, 81677 München, Germany
3
Yin M, Goncearenco A, Berezovsky IN. Deriving and Using Descriptors of Elementary Functions in Rational Protein Design. Front Bioinform 2021; 1:657529. [DOI: 10.3389/fbinf.2021.657529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/23/2021] [Accepted: 03/15/2021] [Indexed: 01/05/2023] Open
Abstract
The rational design of proteins with desired functions requires a comprehensive description of the functional building blocks. The evolutionary conserved functional units constitute nature's toolbox; however, they are not readily available to protein designers. This study focuses on protein units of subdomain size that possess structural properties and amino acid residues sufficient to carry out elementary reactions in the catalytic mechanisms. The interactions within such elementary functional loops (ELFs) and the interactions with the surrounding protein scaffolds constitute the descriptor of elementary function. The computational approach to deriving descriptors directly from protein sequences and structures and applying them in rational design was implemented in a proof-of-concept DEFINED-PROTEINS software package. Once the descriptor is obtained, the ELF can be fitted into existing or novel scaffolds to obtain the desired function. For instance, the descriptor may be used to determine the necessary spatial restraints in a fragment-based grafting protocol. We illustrated the approach by applying it to well-known cases of ELFs, including phosphate-binding P-loop, diphosphate-binding glycine-rich motif, and calcium-binding EF-hand motif, which could be used to jumpstart templates for user applications. The DEFINED-PROTEINS package is available for free at https://github.com/MelvinYin/Defined_Proteins.
4
Peng Y, Markov Y, Goncearenco A, Landsman D, Panchenko AR. Data sets on human histone interaction networks. Data Brief 2020; 33:106555. [PMID: 33299912 PMCID: PMC7701981 DOI: 10.1016/j.dib.2020.106555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2020] [Revised: 11/13/2020] [Accepted: 11/16/2020] [Indexed: 11/28/2022] Open
Abstract
Here, we present the data of human histone interactomes generated and analysed in the research article by Peng et al., 2020 [1]. The histone interactome data provide a comprehensive mapping of human histone/nucleosome interaction networks by using different data sources from the structural, chemical cross-linking, and high-throughput studies. The histone interactions are presented at different levels of granularity in networks, including protein, domain, and residue-levels. All human histone interactome Cytoscape session files are available at https://github.com/Panchenko-Lab/Human-histone-interactome.
Affiliation(s)
- Yunhui Peng
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, United States
- Yaroslav Markov
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, United States
  - Computational Biology and Bioinformatics, Combined Program in the Biological and Biomedical Sciences, Yale University, New Haven, CT, United States
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, United States
  - VantAI, New York, NY, United States
- David Landsman
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, United States
- Anna R Panchenko
  - Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, ON, Canada
5
Peng Y, Markov Y, Goncearenco A, Landsman D, Panchenko AR. Human Histone Interaction Networks: An Old Concept, New Trends. J Mol Biol 2020; 433:166684. [PMID: 33098859 DOI: 10.1016/j.jmb.2020.10.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/20/2020] [Revised: 10/12/2020] [Accepted: 10/13/2020] [Indexed: 12/19/2022]
Abstract
To elucidate the properties of human histone interactions on the large scale, we perform a comprehensive mapping of human histone interaction networks by using data from structural, chemical cross-linking and various high-throughput studies. Histone interactomes derived from different data sources show limited overlap and complement each other. It inspires us to integrate these data into the combined histone global interaction network which includes 5308 proteins and 10,330 interactions. The analysis of topological properties of the human histone interactome reveals its scale free behavior and high modularity. Our study of histone binding interfaces uncovers a remarkably high number of residues involved in interactions between histones and non-histone proteins, 80-90% of residues in histones H3 and H4 have at least one binding partner. Two types of histone binding modes are detected: interfaces conserved in most histone variants and variant specific interfaces. Finally, different types of chromatin factors recognize histones in nucleosomes via distinct binding modes, and many of these interfaces utilize acidic patches among other sites. Interaction networks are available at https://github.com/Panchenko-Lab/Human-histone-interactome.
Affiliation(s)
- Yunhui Peng
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Yaroslav Markov
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
  - Computational Biology and Bioinformatics, Combined Program in the Biological and Biomedical Sciences, Yale University, New Haven, CT 06520, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
  - VantAI, New York, NY 10003, USA
- David Landsman
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
  - Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, ON K7L 3N6, Canada
6
Miller BF, Pisanic II TR, Margolin G, Petrykowska HM, Athamanolap P, Goncearenco A, Osei-Tutu A, Annunziata CM, Wang TH, Elnitski L. Leveraging locus-specific epigenetic heterogeneity to improve the performance of blood-based DNA methylation biomarkers. Clin Epigenetics 2020; 12:154. [PMID: 33081832 PMCID: PMC7574234 DOI: 10.1186/s13148-020-00939-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/06/2019] [Accepted: 09/21/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Variation in intercellular methylation patterns can complicate the use of methylation biomarkers for clinical diagnostic applications such as blood-based cancer testing. Here, we describe development and validation of a methylation density binary classification method called EpiClass (available for download at https://github.com/Elnitskilab/EpiClass ) that can be used to predict and optimize the performance of methylation biomarkers, particularly in challenging, heterogeneous samples such as liquid biopsies. This approach is based upon leveraging statistical differences in single-molecule sample methylation density distributions to identify ideal thresholds for sample classification. RESULTS We developed and tested the classifier using reduced representation bisulfite sequencing (RRBS) data derived from ovarian carcinoma tissue DNA and controls. We used these data to perform in silico simulations using methylation density profiles from individual epiallelic copies of ZNF154, a genomic locus known to be recurrently methylated in numerous cancer types. From these profiles, we predicted the performance of the classifier in liquid biopsies for the detection of epithelial ovarian carcinomas (EOC). In silico analysis indicated that EpiClass could be leveraged to better identify cancer-positive liquid biopsy samples by implementing precise thresholds with respect to methylation density profiles derived from circulating cell-free DNA (cfDNA) analysis. These predictions were confirmed experimentally using DREAMing to perform digital methylation density analysis on a cohort of low volume (1-ml) plasma samples obtained from 26 EOC-positive and 41 cancer-free women. EpiClass performance was then validated in an independent cohort of 24 plasma specimens, derived from a longitudinal study of 8 EOC-positive women, and 12 plasma specimens derived from 12 healthy women, respectively, attaining a sensitivity/specificity of 91.7%/100.0%. 
Direct comparison of CA-125 measurements with EpiClass demonstrated that EpiClass was able to better identify EOC-positive women than standard CA-125 assessment. Finally, we used independent whole genome bisulfite sequencing (WGBS) datasets to demonstrate that EpiClass can also identify other cancer types as well or better than alternative methylation-based classifiers. CONCLUSIONS Our results indicate that assessment of intramolecular methylation density distributions calculated from cfDNA facilitates the use of methylation biomarkers for diagnostic applications. Furthermore, we demonstrated that EpiClass analysis of ZNF154 methylation was able to outperform CA-125 in the detection of etiologically diverse ovarian carcinomas, indicating broad utility of ZNF154 for use as a biomarker of ovarian cancer.
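The abstract describes choosing classification thresholds from single-molecule methylation density distributions. The sketch below illustrates that general idea only; it is not the EpiClass code, and the scoring rule (fraction of molecules at or above a density cutoff), the selection criterion (sensitivity plus specificity), all function names, and the toy input values are assumptions made for illustration.

```python
def epiallele_fraction(densities, cutoff):
    """Fraction of molecules whose methylation density is >= cutoff."""
    if not densities:
        return 0.0
    return sum(d >= cutoff for d in densities) / len(densities)


def sens_spec(cases, controls, cutoff, call_threshold=0.0):
    """Sensitivity and specificity for one density cutoff.

    A sample is called positive when its fraction of molecules at or above
    `cutoff` exceeds `call_threshold` (an assumed decision rule).
    """
    tp = sum(epiallele_fraction(s, cutoff) > call_threshold for s in cases)
    tn = sum(epiallele_fraction(s, cutoff) <= call_threshold for s in controls)
    return tp / len(cases), tn / len(controls)


def best_cutoff(cases, controls, cutoffs, call_threshold=0.0):
    """Return the first cutoff that maximizes sensitivity + specificity."""
    return max(
        cutoffs,
        key=lambda c: sum(sens_spec(cases, controls, c, call_threshold)),
    )


if __name__ == "__main__":
    # Made-up per-molecule methylation densities (0 = unmethylated, 1 = fully
    # methylated); each inner list is one sample. Not data from the study.
    cases = [[0.0, 0.1, 0.9], [0.0, 0.8, 1.0], [0.1, 0.2, 0.3]]
    controls = [[0.0, 0.1, 0.2], [0.0, 0.0, 0.3]]
    cutoff = best_cutoff(cases, controls, [0.0, 0.25, 0.5, 0.75])
    print(cutoff, sens_spec(cases, controls, cutoff))
```

In the toy input, a low cutoff flags every sample because background molecules with sparse methylation count as positive, while a higher cutoff trades some sensitivity for specificity; scanning cutoffs makes that trade-off explicit.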
Affiliation(s)
- Brendan F Miller
  - Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Thomas R Pisanic II
  - Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD, 21218, USA
- Gennady Margolin
  - Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Hanna M Petrykowska
  - Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Pornpat Athamanolap
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Alexander Goncearenco
  - Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Akosua Osei-Tutu
  - Women's Malignancy Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Christina M Annunziata
  - Women's Malignancy Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Tza-Huei Wang
  - Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD, 21218, USA
  - Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Laura Elnitski
  - Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
7
Goncearenco A, LaBarre BA, Petrykowska HM, Jaratlerdsiri W, Bornman MSR, Turner SD, Hayes VM, Elnitski L. DNA methylation profiles unique to Kalahari KhoeSan individuals. Epigenetics 2020; 16:537-553. [PMID: 32892676 PMCID: PMC8078743 DOI: 10.1080/15592294.2020.1809852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023] Open
Abstract
Genomes of KhoeSan individuals of the Kalahari Desert provide the greatest understanding of single nucleotide diversity in the human genome. Compared with individuals in industrialized environments, the KhoeSan have a unique foraging and hunting lifestyle. Given these dramatic environmental differences, and the responsiveness of the methylome to environmental exposures of many types, we hypothesized that DNA methylation patterns would differ between KhoeSan and neighbouring agropastoral and/or industrial Bantu. We analysed Illumina HumanMethylation 450 k array data generated from blood samples from 38 KhoeSan and 42 Bantu, and 6 Europeans. After removing CpG positions associated with annotated and novel polymorphisms and controlling for white blood cell composition, sex, age and technical variation we identified 816 differentially methylated CpG loci, out of which 133 had an absolute beta-value difference of at least 0.05. Notably SLC39A4/ZIP4, which plays a role in zinc transport, was one of the most differentially methylated loci. Although the chronological ages of the KhoeSan are not formally recorded, we compared historically estimated ages to methylation-based calculations. This study demonstrates that the epigenetic profile of KhoeSan individuals reveals differences from other populations, and along with extensive genetic diversity, this community brings increased accessibility and understanding to the diversity of the human genome.
Affiliation(s)
- Alexander Goncearenco
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Brenna A LaBarre
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Graduate Program in Bioinformatics, Boston University, Boston, MA, USA
- Weerachai Jaratlerdsiri
  - Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- M S Riana Bornman
  - School of Health Systems and Public Health, University of Pretoria, Pretoria, South Africa
- Stephen D Turner
  - Division of Biomedical Informatics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Vanessa M Hayes
  - Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
  - School of Health Systems and Public Health, University of Pretoria, Pretoria, South Africa
  - Faculty of Health Sciences, University of Limpopo, Sovenga, South Africa
  - Sydney Medical School, University of Sydney, Camperdown, Australia
- Laura Elnitski
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
8
Zhang N, Chen Y, Lu H, Zhao F, Alvarez RV, Goncearenco A, Panchenko AR, Li M. MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions. iScience 2020; 23:100939. [PMID: 32169820 PMCID: PMC7068639 DOI: 10.1016/j.isci.2020.100939] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/15/2019] [Revised: 11/21/2019] [Accepted: 02/20/2020] [Indexed: 01/17/2023] Open
Abstract
Missense mutations may affect proteostasis by destabilizing or over-stabilizing protein complexes and changing the pathway flux. Predicting the effects of stabilizing mutations on protein-protein interactions is notoriously difficult because existing experimental sets are skewed toward mutations reducing protein-protein binding affinity and many computational methods fail to correctly evaluate their effects. To address this issue, we developed a method MutaBind2, which estimates the impacts of single as well as multiple mutations on protein-protein interactions. MutaBind2 employs only seven features, and the most important of them describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complex and each monomer. This approach shows a distinct improvement especially in evaluating the effects of mutations increasing binding affinity. MutaBind2 can be used for finding disease driver mutations, designing stable protein complexes, and discovering new protein-protein interaction inhibitors.
Affiliation(s)
- Ning Zhang
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Yuting Chen
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Haoyu Lu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Feiyang Zhao
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Roberto Vera Alvarez
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Minghui Li
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
9
LaBarre BA, Goncearenco A, Petrykowska HM, Jaratlerdsiri W, Bornman MSR, Hayes VM, Elnitski L. MethylToSNP: identifying SNPs in Illumina DNA methylation array data. Epigenetics Chromatin 2019; 12:79. [PMID: 31861999 PMCID: PMC6923858 DOI: 10.1186/s13072-019-0321-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/13/2019] [Accepted: 12/09/2019] [Indexed: 12/16/2022] Open
Abstract
Background Current array-based methods for the measurement of DNA methylation rely on the process of sodium bisulfite conversion to differentiate between methylated and unmethylated cytosine bases in DNA. In the absence of genotype data this process can lead to ambiguity in data interpretation when a sample has polymorphisms at a methylation probe site. A common way to minimize this problem is to exclude such potentially problematic sites, with some methods removing as much as 60% of array probes from consideration before data analysis. Results Here, we present an algorithm implemented in an R Bioconductor package, MethylToSNP, which detects a characteristic data pattern to infer sites likely to be confounded by polymorphisms. Additionally, the tool provides a stringent reliability score to allow thresholding on SNP predictions. We calibrated parameters and thresholds used by the algorithm on simulated and real methylation data sets. We illustrate findings using methylation data from YRI (Yoruba in Ibadan, Nigeria), CEPH (European descent) and KhoeSan (southern African) populations. Our polymorphism predictions made using MethylToSNP have been validated through SNP databases and bisulfite and genomic sequencing. Conclusions The benefits of this method are threefold. First, it prevents extensive data loss by considering only SNPs specific to the individuals in the study. Second, it offers the possibility to identify new polymorphisms in samples for which there is little known about the genetic landscape. Third, it identifies variants as they exist in functional regions of a genome, such as in CTCF (transcriptional repressor) sites and enhancers, that may be common alleles or personal mutations with potential to deleteriously affect genomic regulatory activities. We demonstrate that MethylToSNP is applicable to the Illumina 450K and Illumina 850K EPIC array data and is also backwards compatible to the 27K methylation arrays. 
Going forward, this kind of nuanced approach can increase the amount of information derived from precious data sets by considering samples of the project individually to enable more informed decisions about data cleaning.
Affiliation(s)
- Brenna A LaBarre
  - Graduate Program in Bioinformatics, Boston University, Boston, MA, USA
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA
- Alexander Goncearenco
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA
- Hanna M Petrykowska
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA
- Weerachai Jaratlerdsiri
  - Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- M S Riana Bornman
  - School of Health Systems and Public Health, University of Pretoria, Hatfield, Pretoria, South Africa
- Vanessa M Hayes
  - Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - School of Health Systems and Public Health, University of Pretoria, Hatfield, Pretoria, South Africa
  - Sydney Medical School, University of Sydney, Camperdown, NSW, Australia
- Laura Elnitski
  - Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA
10
Rogozin IB, Pavlov YI, Goncearenco A, De S, Lada AG, Poliakov E, Panchenko AR, Cooper DN. Mutational signatures and mutable motifs in cancer genomes. Brief Bioinform 2019; 19:1085-1101. [PMID: 28498882 DOI: 10.1093/bib/bbx049] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/25/2017] [Indexed: 12/22/2022] Open
Abstract
Cancer is a genetic disorder, meaning that a plethora of different mutations, whether somatic or germ line, underlie the etiology of the 'Emperor of Maladies'. Point mutations, chromosomal rearrangements and copy number changes, whether they have occurred spontaneously in predisposed individuals or have been induced by intrinsic or extrinsic (environmental) mutagens, lead to the activation of oncogenes and inactivation of tumor suppressor genes, thereby promoting malignancy. This scenario has now been recognized and experimentally confirmed in a wide range of different contexts. Over the past decade, a surge in available sequencing technologies has allowed the sequencing of whole genomes from liquid malignancies and solid tumors belonging to different types and stages of cancer, giving birth to the new field of cancer genomics. One of the most striking discoveries has been that cancer genomes are highly enriched with mutations of specific kinds. It has been suggested that these mutations can be classified into 'families' based on their mutational signatures. A mutational signature may be regarded as a type of base substitution (e.g. C:G to T:A) within a particular context of neighboring nucleotide sequence (the bases upstream and/or downstream of the mutation). These mutational signatures, supplemented by mutable motifs (a wider mutational context), promise to help us to understand the nature of the mutational processes that operate during tumor evolution because they represent the footprints of interactions between DNA, mutagens and the enzymes of the repair/replication/modification pathways.
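The abstract's definition of a mutational signature, a base substitution read together with its neighboring bases, can be made concrete with a short sketch. The code below is an illustration under stated assumptions: it uses one flanking base on each side and the common convention of reporting substitutions from the pyrimidine strand, neither of which is specified in the abstract, and the function names and the example sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_label(reference, pos, alt):
    """Label a substitution at 0-based `pos` with its flanks, e.g. 'A[C>T]G'.

    Assumption: purine reference bases are reported on the opposite strand,
    so that C:G>T:A gets the same label whichever strand was sequenced.
    """
    if pos < 1 or pos > len(reference) - 2:
        raise ValueError("need one flanking base on each side")
    tri = reference[pos - 1:pos + 2].upper()
    alt = alt.upper()
    if tri[1] in "AG":
        tri, alt = revcomp(tri), revcomp(alt)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def mutational_profile(reference, mutations):
    """Count context labels over (position, alt_base) pairs."""
    return Counter(context_label(reference, pos, alt) for pos, alt in mutations)


if __name__ == "__main__":
    # Hypothetical reference and substitutions, for illustration only.
    print(mutational_profile("TACGT", [(2, "T"), (3, "A")]))
```

Both substitutions in the example describe the same C:G to T:A change in the same sequence context seen from opposite strands, so they fall into a single category of the profile.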
Affiliation(s)
- Igor B Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, USA
- Youri I Pavlov
  - Eppley Institute for Cancer Research, University of Nebraska Medical Center, USA
- Artem G Lada
  - Department Microbiology and Molecular Genetics, University of California, Davis, USA
- Eugenia Poliakov
  - Laboratory of Retinal Cell and Molecular Biology, National Eye Institute, National Institutes of Health, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, USA
11
Goncearenco A, Rager SL, Li M, Sang QX, Rogozin IB, Panchenko AR. Exploring background mutational processes to decipher cancer genetic heterogeneity. Nucleic Acids Res 2019; 45:W514-W522. [PMID: 28472504 PMCID: PMC5793731 DOI: 10.1093/nar/gkx367] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 02/28/2017] [Accepted: 04/21/2017] [Indexed: 01/08/2023] Open
Abstract
Much remains unknown about the progression and heterogeneity of mutational processes in different cancers and their diagnostic and clinical potential. A growing body of evidence supports mutation rate dependence on the local DNA sequence context for various types of mutations. We propose several tools for the analysis of cancer context-dependent mutations, which are implemented in an online computational framework MutaGene. The framework explores DNA context-dependent mutational patterns and underlying somatic cancer mutagenesis, analyzes mutational profiles of cancer samples, identifies the combinations of underlying mutagenic processes including those related to infidelity of DNA replication and repair machinery, and various other endogenous and exogenous mutagenic factors. As a result, the combination of mutagenic processes can be identified in any query sample with subsequent comparison to mutational profiles derived from malignant and benign samples. In addition, mutagen or cancer-specific mutational background models are applied to calculate expected DNA and protein site mutability to decouple relative contributions of mutagenesis and selection in carcinogenesis, thus elucidating the site-specific driving events in cancer. MutaGene is freely available at https://www.ncbi.nlm.nih.gov/projects/mutagene/.
Affiliation(s)
- Stephanie L Rager
  - National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA; Columbia University, School of Engineering and Applied Science, New York, NY 10027, USA
- Minghui Li
  - National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
- Qing-Xiang Sang
  - Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida 32306, USA
- Igor B Rogozin
  - National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
12
Kale S, Goncearenco A, Markov Y, Landsman D, Panchenko AR. Molecular recognition of nucleosomes by binding partners. Curr Opin Struct Biol 2019; 56:164-170. [PMID: 30991239 DOI: 10.1016/j.sbi.2019.03.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/03/2018] [Revised: 03/01/2019] [Accepted: 03/07/2019] [Indexed: 12/20/2022]
Abstract
Nucleosomes represent the elementary units of chromatin packing and hubs in epigenetic signaling pathways. Across the chromatin and over the lifetime of the eukaryotic cell, nucleosomes experience a broad repertoire of alterations that affect their structure and binding with various chromatin factors. Dynamics of the histone core, nucleosomal and linker DNA, and intrinsic disorder of histone tails add further complexity to the nucleosome interaction landscape. In light of our understanding through the growing number of experimental and computational studies, we review the emerging patterns of molecular recognition of nucleosomes by their binding partners and assess the basic mechanisms of its regulation.
Affiliation(s)
- Seyit Kale
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yaroslav Markov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- David Landsman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
13
Brown AL, Li M, Goncearenco A, Panchenko AR. Finding driver mutations in cancer: Elucidating the role of background mutational processes. PLoS Comput Biol 2019; 15:e1006981. [PMID: 31034466 PMCID: PMC6508748 DOI: 10.1371/journal.pcbi.1006981] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 11/26/2018] [Revised: 05/09/2019] [Accepted: 03/28/2019] [Indexed: 01/22/2023] Open
Abstract
Identifying driver mutations in cancer is notoriously difficult. To date, recurrence of a mutation in patients remains one of the most reliable markers of mutation driver status. However, some mutations are more likely to occur than others due to differences in background mutation rates arising from various forms of infidelity of DNA replication and repair machinery, and from endogenous and exogenous mutagens. We calculated nucleotide and codon mutability to study the contribution of background processes in shaping the observed mutational spectrum in cancer. We developed and tested probabilistic pan-cancer and cancer-specific models that adjust the number of mutation recurrences in patients by background mutability in order to find mutations which may be under selection in cancer. We showed that mutations with higher mutability values had higher observed recurrence frequency, especially in tumor suppressor genes. This trend was prominent for nonsense and silent mutations or mutations with neutral functional impact. In oncogenes, however, highly recurring mutations were characterized by relatively low mutability, resulting in an inverted U-shaped trend. Mutations not yet observed in any tumor had relatively low mutability values, indicating that background mutability might limit mutation occurrence. We compiled a dataset of missense mutations from 58 genes with experimentally validated functional and transforming impacts from various studies. We found that mutability of driver mutations was lower than that of passengers and consequently adjusting mutation recurrence frequency by mutability significantly improved ranking of mutations and driver mutation prediction. Even though no training on existing data was involved, our approach performed similarly to or better than state-of-the-art methods.
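The abstract does not state the form of the probabilistic model, so the following Python sketch is only one way to picture "adjusting recurrence by mutability". It assumes a plain binomial background in which each of `samples` tumors independently carries the mutation with probability `mutability`; the function name, parameters, and numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def recurrence_p_value(observed, samples, mutability):
    """Return P(X >= observed) for X ~ Binomial(samples, mutability).

    A small value means the mutation recurs more often than the assumed
    background mutability alone would predict. Intended for small
    `observed` counts, where the direct sum is numerically safe.
    """
    below = sum(
        comb(samples, i) * mutability**i * (1.0 - mutability) ** (samples - i)
        for i in range(observed)
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - below)


# Hypothetical example: the same recurrence (3 of 1000 tumors) at two
# sites whose background mutability differs tenfold.
low_mutability_site = recurrence_p_value(3, 1000, 1e-4)
high_mutability_site = recurrence_p_value(3, 1000, 1e-3)
print(low_mutability_site, high_mutability_site)
```

Under this assumed model, equal recurrence is more surprising at the low-mutability site, which matches the abstract's observation that drivers tend to have lower mutability than passengers.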
Affiliation(s)
- Anna-Leigh Brown
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
- Minghui Li
  - School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Alexander Goncearenco
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
- Anna R. Panchenko
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD, United States of America
14
Roper N, Gao S, Maity TK, Banday AR, Zhang X, Venugopalan A, Cultraro CM, Patidar R, Sindiri S, Brown AL, Goncearenco A, Panchenko AR, Biswas R, Thomas A, Rajan A, Carter CA, Kleiner DE, Hewitt SM, Khan J, Prokunina-Olsson L, Guha U. APOBEC Mutagenesis and Copy-Number Alterations Are Drivers of Proteogenomic Tumor Evolution and Heterogeneity in Metastatic Thoracic Tumors. Cell Rep 2019; 26:2651-2666.e6. [PMID: 30840888 PMCID: PMC6461561 DOI: 10.1016/j.celrep.2019.02.028] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 06/18/2018] [Revised: 01/02/2019] [Accepted: 02/07/2019] [Indexed: 12/13/2022] Open
Abstract
Intratumor mutational heterogeneity has been documented in primary non-small-cell lung cancer. Here, we elucidate mechanisms of tumor evolution and heterogeneity in metastatic thoracic tumors (lung adenocarcinoma and thymic carcinoma) using whole-exome and transcriptome sequencing, SNP array for copy-number alterations (CNAs), and mass-spectrometry-based quantitative proteomics of metastases obtained by rapid autopsy. APOBEC mutagenesis, promoted by increased expression of APOBEC3 region transcripts and associated with a high-risk APOBEC3 germline variant, correlated with mutational tumor heterogeneity. TP53 mutation status was associated with APOBEC hypermutator status. Interferon pathways were enriched in tumors with high APOBEC mutagenesis and IFN-γ-induced expression of APOBEC3B in lung adenocarcinoma cells, suggesting that the immune microenvironment may promote mutational heterogeneity. CNAs occurring late in tumor evolution correlated with downstream transcriptomic and proteomic heterogeneity, although global proteomic heterogeneity was significantly greater than transcriptomic and CNA heterogeneity. These results illustrate key mechanisms underlying multi-dimensional heterogeneity in metastatic thoracic tumors.
Affiliation(s)
- Nitin Roper
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Shaojian Gao
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Tapan K Maity
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- A Rouf Banday
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20814, USA
- Xu Zhang
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Abhilash Venugopalan
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Constance M Cultraro
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Rajesh Patidar
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Sivasish Sindiri
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Anna-Leigh Brown
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20814, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20814, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20814, USA
- Romi Biswas
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Anish Thomas
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Arun Rajan
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Corey A Carter
  - Walter Reed National Military Medical Center, Bethesda, MD 20814, USA
- David E Kleiner
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Stephen M Hewitt
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Javed Khan
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
- Ludmila Prokunina-Olsson
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20814, USA
- Udayan Guha
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20814, USA
15
Rogozin IB, Goncearenco A, Lada AG, De S, Yurchenko V, Nudelman G, Panchenko AR, Cooper DN, Pavlov YI. DNA polymerase η mutational signatures are found in a variety of different types of cancer. Cell Cycle 2018; 17:348-355. [PMID: 29139326 DOI: 10.1080/15384101.2017.1404208] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/17/2022] Open
Abstract
DNA polymerase (pol) η is a specialized error-prone polymerase with at least two quite different and contrasting cellular roles: to mitigate the genetic consequences of solar UV irradiation, and to promote somatic hypermutation in the variable regions of immunoglobulin genes. Misregulation and mistargeting of pol η can compromise genome integrity. We explored whether the mutational signature of pol η could be found in datasets of human somatic mutations derived from normal and cancer cells. A substantial excess of single and tandem somatic mutations within known pol η mutable motifs was noted in skin cancer as well as in many other types of human cancer, suggesting that somatic mutations in A:T bases generated by DNA polymerase η are a common feature of tumorigenesis. Another peculiarity of pol η mutational signatures, mutations in YCG motifs, led us to speculate that error-prone DNA synthesis opposite methylated CpG dinucleotides by misregulated pol η in tumors might constitute an additional mechanism of cytosine demethylation in this hypermutable dinucleotide.
Affiliation(s)
- Igor B Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Artem G Lada
  - Department of Microbiology and Molecular Genetics, University of California, Davis, CA, USA
- Subhajyoti De
  - Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Vyacheslav Yurchenko
  - Life Science Research Center, University of Ostrava, 71000 Ostrava, Czech Republic
- German Nudelman
  - Systems Biology Center, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, UK
- Youri I Pavlov
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68198, USA; Departments of Microbiology and Pathology, University of Nebraska Medical Center, Omaha, NE, USA; Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE, USA; Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA
16
Abstract
In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases that store different types of sequencing data on missense mutations, and we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using state-of-the-art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address the particularly difficult problem of deciphering the molecular mechanisms by which mutations affect proteins and protein complexes, in order to understand the origins and mechanisms of diseases.
Affiliation(s)
- Minghui Li
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
17
Shaytan AK, Armeev GA, Goncearenco A, Zhurkin VB, Landsman D, Panchenko AR. Trajectories of microsecond molecular dynamics simulations of nucleosomes and nucleosome core particles. Data Brief 2016; 7:1678-81. [PMID: 27222871 PMCID: PMC4872717 DOI: 10.1016/j.dib.2016.04.073] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/15/2016] [Revised: 03/22/2016] [Accepted: 04/29/2016] [Indexed: 10/29/2022] Open
Abstract
We present here raw trajectories of molecular dynamics simulations for a nucleosome with linker DNA strands as well as for a minimalistic nucleosome core particle model. The simulations were done in explicit solvent using the CHARMM36 force field. We used these data in the research article Shaytan et al., 2016 [1]. The trajectory files are supplemented by TCL scripts providing advanced visualization capabilities.
Affiliation(s)
- Alexey K Shaytan
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 20894, United States; Biology Department, Lomonosov Moscow State University, Moscow 119991, Russia
- Grigoriy A Armeev
  - Biology Department, Lomonosov Moscow State University, Moscow 119991, Russia
- Alexander Goncearenco
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 20894, United States
- Victor B Zhurkin
  - Laboratory of Cell Biology, National Cancer Institute, NIH, Bethesda, MD 20892, United States
- David Landsman
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 20894, United States
- Anna R Panchenko
  - National Center for Biotechnology Information, NLM, NIH, Bethesda, MD 20894, United States
18
Li M, Simonetti FL, Goncearenco A, Panchenko AR. MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions. Nucleic Acids Res 2016; 44:W494-501. [PMID: 27150810 PMCID: PMC4987923 DOI: 10.1093/nar/gkw374] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Received: 01/21/2016] [Accepted: 04/23/2016] [Indexed: 11/13/2022] Open
Abstract
Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/.
Affiliation(s)
- Minghui Li
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
19
Shaytan AK, Armeev GA, Goncearenco A, Zhurkin VB, Landsman D, Panchenko AR. Nucleosome Dynamics at Microsecond Timescale: DNA-Protein Interactions, Water-Mediated Interactions and Nucleosome Formation. Biophys J 2016. [DOI: 10.1016/j.bpj.2015.11.2189] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022] Open
20
Shaytan AK, Armeev GA, Goncearenco A, Zhurkin VB, Landsman D, Panchenko AR. Coupling between Histone Conformations and DNA Geometry in Nucleosomes on a Microsecond Timescale: Atomistic Insights into Nucleosome Functions. J Mol Biol 2015; 428:221-237. [PMID: 26699921 DOI: 10.1016/j.jmb.2015.12.004] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 07/06/2015] [Revised: 12/04/2015] [Accepted: 12/07/2015] [Indexed: 12/16/2022]
Abstract
An octamer of histone proteins wraps about 200bp of DNA into two superhelical turns to form nucleosomes found in chromatin. Although the static structure of the nucleosomal core particle has been solved, details of the dynamic interactions between histones and DNA remain elusive. We performed extensively long unconstrained, all-atom microsecond molecular dynamics simulations of nucleosomes including linker DNA segments and full-length histones in explicit solvent. For the first time, we were able to identify and characterize the rearrangements in nucleosomes on a microsecond timescale including the coupling between the conformation of the histone tails and the DNA geometry. We found that certain histone tail conformations promoted DNA bulging near its entry/exit sites, resulting in the formation of twist defects within the DNA. This led to a reorganization of histone-DNA interactions, suggestive of the formation of initial nucleosome sliding intermediates. We characterized the dynamics of the histone tails upon their condensation on the core and linker DNA and showed that tails may adopt conformationally constrained positions due to the insertion of "anchoring" lysines and arginines into the DNA minor grooves. Potentially, these phenomena affect the accessibility of post-translationally modified histone residues that serve as important sites for epigenetic marks (e.g., at H3K9, H3K27, H4K16), suggesting that interactions of the histone tails with the core and linker DNA modulate the processes of histone tail modifications and binding of the effector proteins. We discuss the implications of the observed results on the nucleosome function and compare our results to different experimental studies.
Affiliation(s)
- Alexey K Shaytan
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; Faculty of Biology, Lomonosov Moscow State University, Moscow 119991, Russia
- Grigoriy A Armeev
  - Faculty of Biology, Lomonosov Moscow State University, Moscow 119991, Russia
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Victor B Zhurkin
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- David Landsman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
21
Zheng Z, Goncearenco A, Berezovsky IN. Nucleotide binding database NBDB--a collection of sequence motifs with specific protein-ligand interactions. Nucleic Acids Res 2015; 44:D301-7. [PMID: 26507856 PMCID: PMC4702817 DOI: 10.1093/nar/gkv1124] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/19/2015] [Accepted: 10/14/2015] [Indexed: 11/14/2022] Open
Abstract
The NBDB database describes protein motifs, elementary functional loops (EFLs), that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ADP, AMP, ATP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. The NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match profiles of particular EFLs in a protein sequence provided by the user. Comprehensive information on sequences, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on the design of required protein functions.
Affiliation(s)
- Zejun Zheng
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N Berezovsky
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
22
Goncearenco A, Shaytan AK, Shoemaker BA, Panchenko AR. Structural Perspectives on the Evolutionary Expansion of Unique Protein-Protein Binding Sites. Biophys J 2015. [PMID: 26213149 DOI: 10.1016/j.bpj.2015.06.056] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022] Open
Abstract
Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins based on structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the growth rate of the number of unique binding sites in the Protein Data Bank was much slower than the growth rate of the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in the evolution of eukaryotes were characterized by a larger fraction of polar and aromatic residues, and were less compact and more planar, possibly due to their more transient nature and roles in signaling processes.
Affiliation(s)
- Alexander Goncearenco
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Alexey K Shaytan
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Benjamin A Shoemaker
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Anna R Panchenko
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
23
Abstract
The goal of this work is to learn from nature the rules that govern evolution and the design of protein function. The fundamental laws of physics lie at the foundation of protein structure and all stages of protein evolution, determining optimal sizes and shapes at different levels of structural hierarchy. We looked back to the very onset of protein evolution with the goal of finding elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions: elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of the metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us to accumulate the library of essential EFs and to establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of the EFs will become useful in future design and engineering of desired enzymatic functions.
Affiliation(s)
- Alexander Goncearenco
  - Computational Biology Unit and Department of Informatics, University of Bergen, N-5008 Bergen, Norway
24
Shaytan AK, Armeev GA, Goncearenco A, Zhurkin VB, Landsman D, Panchenko AR. 6 Combined influence of linker DNA and histone tails on nucleosome dynamics as revealed by microsecond molecular dynamics simulations. J Biomol Struct Dyn 2015. [DOI: 10.1080/07391102.2015.1032630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
25
|
Goncearenco A, Berezovsky IN. The fundamental tradeoff in genomes and proteomes of prokaryotes established by the genetic code, codon entropy, and physics of nucleic acids and proteins. Biol Direct 2014; 9:29. [PMID: 25496919 PMCID: PMC4273451 DOI: 10.1186/s13062-014-0029-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/03/2014] [Accepted: 12/01/2014] [Indexed: 11/26/2022] Open
Abstract
Background: Mutations in nucleotide sequences provide a foundation for genetic variability, and selection is the driving force of evolution and molecular adaptation. Despite considerable progress in the understanding of selective forces and their compositional determinants, the very nature of the underlying mutational biases remains unclear. Results: We explore here a fundamental tradeoff, which analytically describes the mutual adjustment of the nucleotide and amino acid compositions and its possible effect on the mutational biases. The tradeoff is determined by the interplay between the genetic code, optimization of the codon entropy, and demands on the structure and stability of nucleic acids and proteins. Conclusion: The tradeoff is the unifying property of all prokaryotes regardless of the differences in their phylogenies, lifestyles, and extreme environments. It underlies mutational biases characteristic of genomes with different nucleotide and amino acid compositions, providing a foundation for evolution and adaptation. Reviewers: This article was reviewed by Eugene Koonin, Michael Gromiha, and Alexander Schleiffer. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-014-0029-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit and Department of Informatics, University of Bergen, N-5008, Bergen, Norway. Current address: Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, USA.
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore, 138671, Singapore. Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117597, Singapore, Singapore.
|
26
|
Goncearenco A, Shoemaker BA, Zhang D, Sarychev A, Panchenko AR. Coverage of protein domain families with structural protein-protein interactions: current progress and future trends. Prog Biophys Mol Biol 2014; 116:187-93. [PMID: 24931138 DOI: 10.1016/j.pbiomolbio.2014.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/17/2013] [Revised: 04/14/2014] [Accepted: 05/17/2014] [Indexed: 11/16/2022]
Abstract
Protein interactions have evolved into highly precise and regulated networks adding an immense layer of complexity to cellular systems. The most accurate atomistic description of protein binding sites can be obtained directly from structures of protein complexes. The availability of structurally characterized protein interfaces significantly improves our understanding of interactomes, and the progress in structural characterization of protein-protein interactions (PPIs) can be measured by calculating the structural coverage of protein domain families. We analyze the coverage of protein domain families (defined according to CDD and Pfam databases) by structures, structural protein-protein complexes and unique protein binding sites. Structural PPI coverage of currently available protein families is about 30% without any signs of saturation in coverage growth dynamics. Given the current growth rates of domain databases and structural PPI deposition, complete domain coverage with PPIs is not expected in the near future. As a result of this study we identify families without any protein-protein interaction evidence (listed on a supporting website http://www.ncbi.nlm.nih.gov/Structure/ibis/coverage/) and propose them as potential targets for structural studies with a focus on protein interactions.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States
- Benjamin A Shoemaker
- Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States
- Dachuan Zhang
- Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States
- Alexey Sarychev
- Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States
- Anna R Panchenko
- Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States.
|
27
|
Goncearenco A, Ma BG, Berezovsky IN. Molecular mechanisms of adaptation emerging from the physics and evolution of nucleic acids and proteins. Nucleic Acids Res 2013; 42:2879-92. [PMID: 24371267 PMCID: PMC3950714 DOI: 10.1093/nar/gkt1336] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/15/2022] Open
Abstract
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
Affiliation(s)
- Alexander Goncearenco
- CBU, University of Bergen, 5020 Bergen, Norway, Department of Informatics, University of Bergen, 5020 Bergen, Norway, Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671 Singapore and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, 76100, Israel
|
28
|
Goncearenco A, Mitternacht S, Yong T, Eisenhaber B, Eisenhaber F, Berezovsky IN. SPACER: Server for predicting allosteric communication and effects of regulation. Nucleic Acids Res 2013; 41:W266-72. [PMID: 23737445 PMCID: PMC3692057 DOI: 10.1093/nar/gkt460] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Indexed: 12/19/2022] Open
Abstract
The SPACER server provides an interactive framework for exploring allosteric communication in proteins with different sizes, degrees of oligomerization and function. SPACER uses recently developed theoretical concepts based on the thermodynamic view of allostery. It proposes easily tractable and meaningful measures that allow users to analyze the effect of ligand binding on the intrinsic protein dynamics. The server shows potential allosteric sites and allows users to explore communication between the regulatory and functional sites. It is possible to explore, for instance, potential effector binding sites in a given structure as targets for allosteric drugs. As input, the server only requires a single structure. The server is freely available at http://allostery.bii.a-star.edu.sg/.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit and Department of Informatics, University of Bergen, Bergen 5020, Norway
|
29
|
Goncearenco A, Grynberg P, Botvinnik OB, Macintyre G, Abeel T. Highlights from the Eighth International Society for Computational Biology (ISCB) Student Council Symposium 2012. BMC Bioinformatics 2012. [PMCID: PMC3522031 DOI: 10.1186/1471-2105-13-s18-a1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
|
30
|
Lasry I, Seo YA, Ityel H, Shalva N, Pode-Shakked B, Glaser F, Berman B, Berezovsky I, Goncearenco A, Klar A, Levy J, Anikster Y, Kelleher SL, Assaraf YG. A dominant negative heterozygous G87R mutation in the zinc transporter, ZnT-2 (SLC30A2), results in transient neonatal zinc deficiency. J Biol Chem 2012; 287:29348-61. [PMID: 22733820 DOI: 10.1074/jbc.m112.368159] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Indexed: 11/06/2022] Open
Abstract
Zinc is an essential mineral, and infants are particularly vulnerable to zinc deficiency as they require large amounts of zinc for their normal growth and development. We have recently described the first loss-of-function mutation (H54R) in the zinc transporter ZnT-2 (SLC30A2) in mothers with infants harboring transient neonatal zinc deficiency (TNZD). Here we identified and characterized a novel heterozygous G87R ZnT-2 mutation in two unrelated Ashkenazi Jewish mothers with infants displaying TNZD. Transient transfection of G87R ZnT-2 resulted in endoplasmic reticulum-Golgi retention, whereas the WT transporter properly localized to intracellular secretory vesicles in HC11 and MCF-7 cells. Consequently, G87R ZnT-2 showed decreased stability compared with WT ZnT-2 as revealed by Western blot analysis. Three-dimensional homology modeling based on the crystal structure of YiiP, a close zinc transporter homologue from Escherichia coli, revealed that the basic arginine residue of the mutant G87R points toward the membrane lipid core, suggesting misfolding and possible loss-of-function. Indeed, functional assays including vesicular zinc accumulation, zinc secretion, and cytoplasmic zinc pool assessment revealed markedly impaired zinc transport in G87R ZnT-2 transfectants. Moreover, co-transfection experiments with both mutant and WT transporters revealed a dominant negative effect of G87R ZnT-2 over the WT ZnT-2; this was associated with mislocalization, decreased stability, and loss of zinc transport activity of the WT ZnT-2 due to homodimerization observed upon immunoprecipitation experiments. These findings establish that inactivating ZnT-2 mutations are an underlying basis of TNZD and provide the first evidence for the dominant inheritance of heterozygous ZnT-2 mutations via negative dominance due to homodimer formation.
Affiliation(s)
- Inbal Lasry
- The Fred Wyszkowski Cancer Research Laboratory, Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
|
31
|
Abstract
Background: Despite recent progress in studies of the evolution of protein function, the questions of what the first functional protein domains were and what their basic building blocks were remain unresolved. Previously, we introduced the concept of elementary functional loops (EFLs), which are the functional units of enzymes that provide elementary reactions in biochemical transformations. They are presumably descendants of primordial catalytic peptides. Results: We analyzed distant evolutionary connections between protein functions in Archaea based on the EFLs comprising them. We show examples of the involvement of EFLs in new functional domains, as well as reutilization of EFLs and functional domains in building multidomain structures and protein complexes. Conclusions: Our analysis of the archaeal superkingdom yields the dominating mechanisms in different periods of protein evolution, which resulted in several levels of the organization of biochemical function. First, functional domains emerged as combinations of prebiotic peptides with the very basic functions, such as nucleotide/phosphate and metal cofactor binding. Second, domain recombination brought multidomain proteins and complexes onto the evolutionary scene. Later, reutilization and de novo design of functional domains and elementary functional loops complemented the evolution of protein function.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit, Uni Research, University of Bergen, N-5008 Bergen, Norway
|
32
|
Goncearenco A, Berezovsky IN. Computational reconstruction of primordial prototypes of elementary functional loops in modern proteins. Bioinformatics 2011; 27:2368-75. [PMID: 21724592 DOI: 10.1093/bioinformatics/btr396] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
MOTIVATION: Enzymes are complex catalytic machines, which perform sequences of elementary chemical transformations resulting in biochemical function. The building blocks of enzymes, elementary functional loops (EFLs), possess distinct functional signatures and provide catalytic and binding amino acids to the enzyme's active sites. The goal of this work is to obtain primordial prototypes of EFLs that existed before the formation of enzymatic domains and served as their building blocks. RESULTS: We developed a computational strategy for reconstructing ancient prototypes of EFLs based on the comparison of sequence segments on the proteomic scale, which goes beyond detection of conserved functional motifs in homologous proteins. We illustrate the procedure by a CxxC-containing prototype with a very basic and ancient elementary function of metal/metal-containing cofactor binding and redox activity. Acquiring the prototypes of EFLs is necessary for revealing how the original set of protein folds with enzymatic functions emerged in predomain evolution. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: igor.berezovsky@uni.no.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit, Uni Research, University of Bergen, N-5008 Bergen, Norway
|
33
|
Abstract
Motivation: Earlier studies of protein structure revealed closed loops with a characteristic size of 25–30 residues and a ring-like shape as a basic universal structural element of globular proteins. Elementary functional loops (EFLs) have specific signatures and provide functional residues important for binding/activation and the principal chemical transformation steps of the enzymatic reaction. The goal of this work is to show how these functional loops evolved from pre-domain peptides and to find a set of prototypes from which the EFLs of contemporary proteins originated. Results: This article describes a computational method for deriving prototypes of EFLs based on the sequences of complete genomes. The procedure comprises the iterative derivation of sequence profiles followed by their hierarchical clustering. The scoring function takes into account the information content of profile positions, thus preserving the signature. The statistical significance of scores is evaluated from the empirical distribution of scores of the background model. A set of prototypes of EFLs from archaeal proteomes is derived. This set delineates evolutionary connections between major functions and illuminates how folds and functions emerged in pre-domain evolution as a combination of prototypes. Contact: Igor.Berezovsky@uni.no
Affiliation(s)
- Alexander Goncearenco
- Bergen Center for Computational Science and Department of Informatics, University of Bergen, Bergen, Norway
|
34
|
Ma BG, Goncearenco A, Berezovsky IN. Thermophilic Adaptation of Protein Complexes Inferred from Proteomic Homology Modeling. Structure 2010; 18:819-28. [DOI: 10.1016/j.str.2010.04.004] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 01/17/2010] [Revised: 03/14/2010] [Accepted: 04/01/2010] [Indexed: 11/27/2022]
|
35
|
Berezovsky IN, Ma BG, Goncearenco A. Thermophilic Adaptation of Protein Complexes Inferred from Proteomic Homology Modeling. Biophys J 2010. [DOI: 10.1016/j.bpj.2009.12.2473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|