1
Amalfitano A, Stocchi N, Atencio HM, Villarreal F, Ten Have A. Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues. Genome Biol 2024; 25:230. [PMID: 39187866 PMCID: PMC11346255 DOI: 10.1186/s13059-024-03371-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 08/13/2024] [Indexed: 08/28/2024]
Abstract
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda's recent complete proteome. Application of Seqrutinator on crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically upon Seqrutinator application, indicating good performance.
Affiliation(s)
- Agustín Amalfitano
- Laboratorio de Procesamiento de Imágenes, ICyTE-CONICET-UNMdP, Mar del Plata, Argentina
- Nicolás Stocchi
- Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina
- Hugo Marcelo Atencio
- Banco Activo de Germoplasma de Papa Andina, EEA-Balcarce INTA, Balcarce, Argentina
- Fernando Villarreal
- Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina.
- Arjen Ten Have
- Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina
2
Pomarici ND, Cacciato R, Kokot J, Fernández-Quintero ML, Liedl KR. Evolution of the Immunoglobulin Isotypes-Variations of Biophysical Properties among Animal Classes. Biomolecules 2023; 13:801. [PMID: 37238671 PMCID: PMC10216798 DOI: 10.3390/biom13050801] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Revised: 05/03/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023]
Abstract
The adaptive immune system arose around 500 million years ago in jawed fish, and, since then, it has mediated the immune defense against pathogens in all vertebrates. Antibodies play a central role in the immune reaction, recognizing and attacking external invaders. During the evolutionary process, several immunoglobulin isotypes emerged, each having a characteristic structural organization and dedicated function. In this work, we investigate the evolution of the immunoglobulin isotypes, in order to highlight the relevant features that were preserved over time and the parts that, instead, mutated. The residues that are coupled in the evolution process are often involved in intra- or interdomain interactions, meaning that they are fundamental to maintaining the immunoglobulin fold and to ensuring interactions with other domains. The explosive growth of available sequences allows us to point out the evolutionary conserved residues and compare the biophysical properties among different animal classes and isotypes. Our study offers a general overview of the evolution of immunoglobulin isotypes and advances the knowledge of their characteristic biophysical properties, as a first step in guiding protein design from evolution.
Affiliation(s)
- Monica L. Fernández-Quintero
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
- Klaus R. Liedl
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
3
Huh E, Agosto MA, Wensel TG, Lichtarge O. Coevolutionary signals in metabotropic glutamate receptors capture residue contacts and long-range functional interactions. J Biol Chem 2023; 299:103030. [PMID: 36806686 PMCID: PMC10060750 DOI: 10.1016/j.jbc.2023.103030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023]
Abstract
Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into a cell through sets of residue interactions that translate ligand binding into structural rearrangements. These interactions, needed for function, impose evolutionary constraints such that, on occasion, mutations at one position may be compensated by mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogenetic-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluRs. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings inform a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and analysis of naturally occurring variants.
Affiliation(s)
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA
- Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Retina and Optic Nerve Research Laboratory, Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
- Theodore G Wensel
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
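The Evolutionary Trace (ET) algorithm used in the study above ranks alignment positions by how closely their residue variation follows the phylogenetic tree. The sketch below is a minimal illustration, assuming the real-valued ET variant: for each n from 1 to N-1 (N sequences), the tree is cut into n clades, the Shannon entropy of each alignment column is computed within every clade, and the per-level sum is weighted by 1/n and added to a baseline of 1. The names `group_entropy`, `rvet_scores`, and the precomputed `partitions` input are hypothetical; tree construction and the covariation ET extension are not shown. Lower scores indicate positions that are more conserved within clades.

```python
import math
from collections import Counter
from typing import List, Sequence


def group_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (natural log) of residue frequencies within one clade."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rvet_scores(alignment: List[str], partitions: List[List[List[int]]]) -> List[float]:
    """Real-valued ET-style score for every alignment column (hypothetical helper).

    partitions[k] is assumed to split the sequence indices into k + 1 clades,
    as obtained by cutting a phylogenetic tree; levels run from 1 to N-1 groups.
    """
    n_cols = len(alignment[0])
    scores = []
    for i in range(n_cols):
        rho = 1.0  # baseline score of an invariant column
        for groups in partitions:
            n = len(groups)
            # Sum within-clade entropies at this tree level, weighted by 1/n.
            level_sum = sum(group_entropy([alignment[s][i] for s in g]) for g in groups)
            rho += level_sum / n
        scores.append(rho)
    return scores


if __name__ == "__main__":
    toy_alignment = ["AC", "AD", "AE", "AF"]
    toy_partitions = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2, 3]]]
    print(rvet_scores(toy_alignment, toy_partitions))
```

In this toy alignment the invariant first column receives the minimum score of 1, whereas the fully variable second column scores higher because it remains variable even within the smaller clades.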
4
Walther D. Specifics of Metabolite-Protein Interactions and Their Computational Analysis and Prediction. Methods Mol Biol 2023; 2554:179-197. [PMID: 36178627 DOI: 10.1007/978-1-0716-2624-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Computational approaches to the characterization and prediction of compound-protein interactions have a long research history and are well established, driven primarily by the needs of drug development. While, in principle, many of the computational methods developed in the context of drug development can also be applied directly to the investigation of metabolite-protein interactions, the interactions of metabolites with proteins (enzymes) are characterized by a number of particularities that result from their natural evolutionary origin and their biological and biochemical roles, as well as from a different problem setting when investigating them. In this review, these special aspects are highlighted, and recent research on them, the computational approaches developed to address them, and available resources are presented. They concern, among others, binding promiscuity, allostery, the role of posttranslational modifications, molecular steering and crowding effects, and metabolic conversion rate predictions. Recent breakthroughs in the field of protein structure prediction and newly developed machine learning techniques are discussed as a tremendous opportunity for developing a more detailed molecular understanding of metabolism.
Affiliation(s)
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
5
Wang L, Guo S, Zeng B, Wang S, Chen Y, Cheng S, Liu B, Wang C, Wang Y, Meng Q. Draft Genome Assembly and Annotation for Cutaneotrichosporon dermatis NICC30027, an Oleaginous Yeast Capable of Simultaneous Glucose and Xylose Assimilation. Mycobiology 2022; 50:69-81. [PMID: 35291590 PMCID: PMC8890563 DOI: 10.1080/12298093.2022.2038844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 01/10/2022] [Accepted: 02/02/2022] [Indexed: 06/14/2023]
Abstract
The identification of oleaginous yeast species capable of simultaneously utilizing xylose and glucose as substrates to generate value-added biological products is an area of key economic interest. We have previously demonstrated that the Cutaneotrichosporon dermatis NICC30027 yeast strain is capable of simultaneously assimilating both xylose and glucose, resulting in considerable lipid accumulation. However, as no high-quality genome sequencing data or associated annotations for this strain are available at present, it remains challenging to study the metabolic mechanisms underlying this phenotype. Herein, we report a 39,305,439 bp draft genome assembly for C. dermatis NICC30027 comprising 37 scaffolds, with 60.15% GC content. Within this genome, we identified 524 tRNAs, 142 sRNAs, 53 miRNAs, 28 snRNAs, and eight rRNA clusters. Moreover, repeat sequences totaling 1,032,129 bp in length were identified (2.63% of the genome), as were 14,238 unigenes with an average length of 1,789.35 bp (together 64.82% of the genome). The NCBI non-redundant protein sequences (NR) database was employed to successfully annotate 11,795 of these unigenes, while 3,621 and 11,902 were annotated with the Swiss-Prot and TrEMBL databases, respectively. Unigenes were additionally subjected to pathway enrichment analyses using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Cluster of Orthologous Groups of proteins (COG), Clusters of orthologous groups for eukaryotic complete genomes (KOG), and Non-supervised Orthologous Groups (eggNOG) databases. Together, these results provide a foundation for future studies aimed at clarifying the mechanistic basis for the ability of C. dermatis NICC30027 to simultaneously utilize glucose and xylose to synthesize lipids.
Affiliation(s)
- Laiyou Wang
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Shuxian Guo
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Bo Zeng
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Shanshan Wang
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Yan Chen
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Shuang Cheng
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Bingbing Liu
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Chunyan Wang
- School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China
- Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
- Yu Wang
- College of Biological Science and Engineering, Jiangxi Agricultural University, Nanchang, China
- Qingshan Meng
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
6
Functional Classification and Characterization of the Fungal Glycoside Hydrolase 28 Protein Family. J Fungi (Basel) 2022; 8:jof8030217. [PMID: 35330219 PMCID: PMC8952511 DOI: 10.3390/jof8030217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2022] [Revised: 02/13/2022] [Accepted: 02/15/2022] [Indexed: 02/01/2023]
Abstract
Pectin is a major constituent of the plant cell wall, comprising compounds with important industrial applications such as homogalacturonan, rhamnogalacturonan and xylogalacturonan. A large array of enzymes is involved in the degradation of this amorphous substrate. The Glycoside Hydrolase 28 (GH28) family includes polygalacturonases (PG), rhamnogalacturonases (RG) and xylogalacturonases (XG) that share a structure of three to four pleated β-sheets that form a rod with the catalytic site amidst a long, narrow groove. Although these enzymes have been studied for many years, no systematic analysis of the family has been performed. We have collected a comprehensive set of GH28-encoding sequences to study their evolution in fungi, aiming at a functional classification and at the identification of substrate specificity as a functional constraint. Computational tools such as AlphaFold, ConSurf and MEME were used to identify the subfamilies' characteristics. A hierarchical classification defines the major classes of endoPG, endoRG and endoXG as well as three exoPG classes. Ascomycete endoPGs are further classified into two subclasses, whereas we identify four exoRG subclasses. Diversification towards the exo mode is explained by loops that appear to be inserted in a number of turns. Substrate-driven diversification can be identified by various specificity-determining positions that appear to surround the binding groove.
7
Extracting phylogenetic dimensions of coevolution reveals hidden functional signals. Sci Rep 2022; 12:820. [PMID: 35039514 PMCID: PMC8764114 DOI: 10.1038/s41598-021-04260-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/02/2021] [Accepted: 12/17/2021] [Indexed: 11/08/2022]
Abstract
Despite the structural and functional information contained in the statistical coupling between pairs of residues in a protein, coevolution associated with function is often obscured by artifactual signals such as genetic drift, which shapes a protein's phylogenetic history and gives rise to concurrent variation between protein sequences that is not driven by selection for function. Here, we introduce a background model for phylogenetic contributions of statistical coupling that separates the coevolution signal due to inter-clade and intra-clade sequence comparisons and demonstrate that coevolution can be measured on multiple phylogenetic timescales within a single protein. Our method, nested coevolution (NC), can be applied as an extension to any coevolution metric. We use NC to demonstrate that poorly conserved residues can nonetheless have important roles in protein function. Moreover, NC improved the structural-contact predictions of several coevolution-based methods, particularly in subsampled alignments with fewer sequences. NC also lowered the noise in detecting functional sectors of collectively coevolving residues. Sectors of coevolving residues identified after application of NC were more spatially compact and phylogenetically distinct from the rest of the protein, and strongly enriched for mutations that disrupt protein activity. Thus, our conceptualization of the phylogenetic separation of coevolution provides the potential to further elucidate relationships among protein evolution, function, and genetic diseases.
8
Abstract
Junctophilins (JPHs) comprise a family of structural proteins that connect the plasma membrane to intracellular organelles such as the endo/sarcoplasmic reticulum. Tethering of these membrane structures results in the formation of highly organized subcellular junctions that play important signaling roles in all excitable cell types. There are four JPH isoforms, expressed primarily in muscle and neuronal cell types. Each JPH protein consists of six 'membrane occupation and recognition nexus' (MORN) motifs, a joining region connecting these to another set of two MORN motifs, a putative alpha-helical region, a divergent region exhibiting low homology between JPH isoforms, and a carboxy-terminal transmembrane region anchoring into the ER/SR membrane. JPH isoforms play essential roles in developing and maintaining subcellular membrane junctions. Conversely, inherited mutations in JPH2 cause hypertrophic or dilated cardiomyopathy, while trinucleotide expansions in the JPH3 gene cause Huntington Disease-Like 2. Loss of JPH1 protein levels can cause skeletal myopathy, while loss of cardiac JPH2 levels causes heart failure and atrial fibrillation, among other diseases. This review will provide a comprehensive overview of the JPH gene family, phylogeny, and evolutionary analysis of JPH genes and other MORN domain proteins. JPH biogenesis, membrane tethering, and binding partners will be discussed, as well as functional roles of JPH isoforms in excitable cells. Finally, potential roles of JPH isoform deficits in human disease pathogenesis will be reviewed.
Affiliation(s)
- Stephan E Lehnart
- Cellular Biophysics and Translational Cardiology Section, Heart Research Center Göttingen, University Medical Center Göttingen, Department of Cardiology and Pneumology, Georg-August University Göttingen, Göttingen, Germany; Cluster of Excellence "Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells" (MBExC), University of Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), partner site Göttingen, Germany
- Xander H T Wehrens
- Cardiovascular Research Institute, Baylor College of Medicine, Houston, Texas, United States; Departments of Molecular Physiology and Biophysics, Medicine (Cardiology), Pediatrics (Cardiology), Neuroscience, and Center for Space Medicine, Baylor College of Medicine, Houston, Texas, United States
9
Tsutakawa SE, Bacolla A, Katsonis P, Bralić A, Hamdan SM, Lichtarge O, Tainer JA, Tsai CL. Decoding Cancer Variants of Unknown Significance for Helicase-Nuclease-RPA Complexes Orchestrating DNA Repair During Transcription and Replication. Front Mol Biosci 2021; 8:791792. [PMID: 34966786 PMCID: PMC8710748 DOI: 10.3389/fmolb.2021.791792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/09/2021] [Accepted: 11/16/2021] [Indexed: 01/13/2023]
Abstract
All tumors have DNA mutations, and a predictive understanding of those mutations could inform clinical treatments. However, 40% of the mutations are variants of unknown significance (VUS), with the challenge being to objectively predict whether a VUS is pathogenic and supports the tumor or whether it is benign. To objectively decode VUS, we mapped cancer sequence data and evolutionary trace (ET) scores onto crystallography and cryo-electron microscopy structures with variant impacts quantitated by evolutionary action (EA) measures. As tumors depend on helicases and nucleases to deal with transcription/replication stress, we targeted helicase–nuclease–RPA complexes: (1) XPB-XPD (within TFIIH), XPF-ERCC1, XPG, and RPA for transcription and nucleotide excision repair pathways and (2) BLM, EXO5, and RPA plus DNA2 for stalled replication fork restart. As validation, EA scoring predicts severe effects for most disease mutations, but disease mutants with low ET scores not only are likely destabilizing but also disrupt sophisticated allosteric mechanisms. For sites of disease mutations and VUS predicted to be severe, we found strong co-localization to ordered regions. Rare discrepancies highlighted the different survival requirements between disease and tumor mutations, as well as the value of examining proteins within complexes. In a genome-wide analysis of 33 cancer types, we found a correlation between the number of mutations in each tumor and the pathways or functional processes in which those mutations occur, revealing different mutagenic routes to tumorigenesis. We also found upregulation of ancient genes including BLM, which supports a non-random and concerted cancer process: reversion to a unicellular, proliferation-uncontrolled status by breaking multicellular constraints on cell division.
Together, these genes and global analyses challenge the binary “driver” and “passenger” mutation paradigm, support a gradient impact as revealed by EA scoring from moderate to severe at a single gene level, and indicate reduced regulation as well as activity. The objective quantitative assessment of VUS scoring and gene overexpression in the context of functional interactions and pathways provides insights for biology, oncology, and precision medicine.
Affiliation(s)
- Susan E Tsutakawa
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Albino Bacolla
- Department of Molecular and Cellular Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX, United States
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
- Amer Bralić
- Laboratory of DNA Replication and Recombination, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Samir M Hamdan
- Laboratory of DNA Replication and Recombination, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
- John A Tainer
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, United States; Department of Molecular and Cellular Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX, United States; Department of Cancer Biology, University of Texas M.D. Anderson Cancer Center, Houston, TX, United States
- Chi-Lin Tsai
- Department of Molecular and Cellular Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX, United States
10
The Functional Differences between the GroEL Chaperonin of Escherichia coli and the HtpB Chaperonin of Legionella pneumophila Can Be Mapped to Specific Amino Acid Residues. Biomolecules 2021; 12:biom12010059. [PMID: 35053207 PMCID: PMC8774168 DOI: 10.3390/biom12010059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2021] [Revised: 12/26/2021] [Accepted: 12/28/2021] [Indexed: 11/17/2022]
Abstract
Group I chaperonins are a highly conserved family of essential proteins that self-assemble into molecular nanoboxes that mediate the folding of cytoplasmic proteins in bacteria and organelles. GroEL, the chaperonin of Escherichia coli, is the archetype of the family. Protein folding-independent functions have been described for numerous chaperonins, including HtpB, the chaperonin of the bacterial pathogen Legionella pneumophila. Several protein folding-independent functions attributed to HtpB are not shared by GroEL, suggesting that differences in the amino acid (aa) sequence between these two proteins could correlate with functional differences. GroEL and HtpB differ in 137 scattered aa positions. Using the Evolutionary Trace (ET) bioinformatics method, site-directed mutagenesis, and a functional reporter test based upon a yeast-two-hybrid interaction with the eukaryotic protein ECM29, it was determined that out of those 137 aa, ten (M68, M212, S236, K298, N507 and the cluster AEHKD in positions 471-475) were involved in the interaction of HtpB with ECM29. GroEL was completely unable to interact with ECM29, but when GroEL was modified at those 10 aa positions, to display the HtpB aa, it acquired a weak ability to interact with ECM29. This constitutes proof of concept that the unique functional abilities of HtpB can be mapped to specific aa positions.
11
Sinelnikov IG, Siedhoff NE, Chulkin AM, Zorov IN, Schwaneberg U, Davari MD, Sinitsyna OA, Shcherbakova LA, Sinitsyn AP, Rozhkova AM. Expression and Refolding of the Plant Chitinase From Drosera capensis for Applications as a Sustainable and Integrated Pest Management. Front Bioeng Biotechnol 2021; 9:728501. [PMID: 34621729 PMCID: PMC8490864 DOI: 10.3389/fbioe.2021.728501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 09/08/2021] [Indexed: 11/13/2022]
Abstract
Recently, the study of chitinases has become an important target of numerous research projects due to their potential applications, for example as biocontrol agents against pests. Plant chitinases from carnivorous plants of the genus Drosera are most aggressive against a wide range of phytopathogens. However, low solubility or insolubility of the target protein has hampered the application of chitinases as biofungicides. To obtain plant chitinase from carnivorous plants of the genus Drosera in soluble form in E. coli expression strains, three refolding approaches were tested: dialysis, rapid dilution, and refolding on Ni-NTA agarose. The developed "rapid dilution" protocol, with renaturation buffer supplemented with 10% glycerol and 2 M arginine in combination with the reduced/oxidized glutathione redox pair, increased the yield of active soluble protein to 9.5 mg per 1 g of wet biomass. A structure-based removal of free cysteines in the core domain, based on homology modeling of the structure, was carried out in order to improve the solubility of the chitinase. One improved chitinase variant (C191A/C231S/C286T) was identified, which shows improved expression and solubility in E. coli expression systems compared to the wild type. Computational analyses of the wild-type and the improved variant revealed overall higher fluctuations of the structure while maintaining global protein stability. It was shown that free cysteines on the surface of the protein globule, which are not involved in the formation of inner disulfide bonds, contribute to the insolubility of chitinase from Drosera capensis. Functional characterization showed that the recombinant chitinase exhibits high activity against colloidal chitin (360 units/g) and strong fungicidal activity against Parastagonospora nodorum. The latter highlights the application of chitinase from D. capensis as a promising enzyme for the control of fungal pathogens in agriculture.
Affiliation(s)
- Igor G Sinelnikov
- Federal Research Centre Fundamentals of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Andrey M Chulkin
- Federal Research Centre Fundamentals of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Ivan N Zorov
- Federal Research Centre Fundamentals of Biotechnology, Russian Academy of Sciences, Moscow, Russia; Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia
- Ulrich Schwaneberg
- Institute of Biotechnology, RWTH Aachen University, Aachen, Germany; DWI-Leibniz Institute for Interactive Materials, Aachen, Germany
- Mehdi D Davari
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle, Germany
- Olga A Sinitsyna
- Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia
- Arkady P Sinitsyn
- Federal Research Centre Fundamentals of Biotechnology, Russian Academy of Sciences, Moscow, Russia; Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia
- Aleksandra M Rozhkova
- Federal Research Centre Fundamentals of Biotechnology, Russian Academy of Sciences, Moscow, Russia
12
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
- PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
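The finding that conserved FunFam residues are enriched in functional sites rests on per-column conservation scores. As a minimal sketch only (FunSite's actual conservation features are not described in the abstract), conservation can be approximated by the Shannon entropy of each alignment column with gaps ignored; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters.

    0.0 means every non-gap residue is identical (fully conserved).
    An all-gap column is also reported as 0.0 here; a real pipeline
    would more likely mask such columns.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    total = len(residues)
    # Sum of p * log2(1/p) over residue types (written this way to avoid -0.0)
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())


def conservation_profile(alignment, gap="-"):
    """Per-column entropies for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment], gap) for i in range(length)]


# Hypothetical toy alignment of four sequences
msa = ["MKHA", "MKHC", "MRH-", "MKHG"]
profile = conservation_profile(msa)
most_conserved = [i for i, h in enumerate(profile) if h == min(profile)]
print(profile, most_conserved)
```

Columns 0 and 2 (invariant M and H) score 0.0 and are flagged as most conserved; low entropy alone does not establish that a position is functional.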

13
Porzio E, Faraone Mennella MR, Manco G. DING Proteins Extend to the Extremophilic World. Int J Mol Sci 2021; 22:2035. [PMID: 33670786 PMCID: PMC7922408 DOI: 10.3390/ijms22042035] [Received: 01/08/2021] [Revised: 02/04/2021] [Accepted: 02/16/2021]
Abstract
The DING proteins are ubiquitous in the three domains of life, from mesophiles to thermo- and hyperthermophiles. They belong to a family of more than sixty members and have a characteristic N-terminus, DINGGG, which is considered a "signature" of these proteins. Structurally, they share a highly conserved phosphate-binding site and a three-dimensional organization resembling the "Venus Flytrap", both reminiscent of those of PstS proteins. They have unusually high sequence conservation, even between distantly related species. Nevertheless, although the genomes of most of these species have been sequenced, the DING gene has not been reported for all of the corresponding characterized DING proteins. The identity of known DING proteins has been confirmed immunologically and, in some cases, by N-terminal sequence analysis. Only a few DING proteins have been purified and biochemically characterized. DING proteins are heterogeneous in their wide range of biological activities, and some show different activities that are not always correlated with each other. Most of them were originally identified through different biological properties, or rather through binding to phosphate and other ligands. Their involvement in pathologies is described. This review is an update of the most recent findings on old and new DING proteins.
Affiliation(s)
- Elena Porzio: Institute of Biochemistry and Cell Biology, CNR, Via P. Castellino 111, 80131 Naples, Italy
- Giuseppe Manco: Institute of Biochemistry and Cell Biology, CNR, Via P. Castellino 111, 80131 Naples, Italy

14
Evolutionary History of Alzheimer Disease-Causing Protein Family Presenilins with Pathological Implications. J Mol Evol 2020; 88:674-688. [DOI: 10.1007/s00239-020-09966-w] [Received: 02/09/2019] [Accepted: 09/22/2020]

15
Serçinoğlu O, Ozbek P. Sequence-structure-function relationships in class I MHC: A local frustration perspective. PLoS One 2020; 15:e0232849. [PMID: 32421728 PMCID: PMC7233585 DOI: 10.1371/journal.pone.0232849] [Received: 12/18/2019] [Accepted: 04/22/2020]
Abstract
Class I Major Histocompatibility Complex (MHC) binds short antigenic peptides with the help of the Peptide Loading Complex (PLC) and presents them to T-cell Receptors (TCRs) of cytotoxic T-cells and Killer-cell Immunoglobulin-like Receptors (KIRs) of Natural Killer (NK) cells. With more than 10,000 alleles, human MHC (Human Leukocyte Antigen, HLA) is the most polymorphic protein in humans. This allelic diversity provides wide coverage of peptide sequence space, yet does not affect the three-dimensional structure of the complex. Moreover, TCRs mostly interact with HLA in a common diagonal binding mode, and KIR-HLA interaction is allele-dependent. With the aim of establishing a framework for understanding the relationships between polymorphism (sequence), structure (conserved fold) and function (protein interactions) of the human MHC, we performed a local frustration analysis on pMHC homology models covering 1436 HLA I alleles. An analysis of local frustration profiles indicated that (1) minimally frustrated and relatively conserved residues within the HLA peptide-binding groove make variations in the MHC fold unlikely, (2) high-frustration patches on HLA helices are either involved in or near the interaction sites of MHC with the TCR, KIR, or tapasin of the PLC, and (3) peptide ligands mainly stabilize the F-pocket of the HLA binding groove.
Affiliation(s)
- Onur Serçinoğlu: Department of Bioengineering, Recep Tayyip Erdogan University, Faculty of Engineering, Fener, Rize, Turkey
- Pemra Ozbek: Department of Bioengineering, Marmara University, Faculty of Engineering, Goztepe, Istanbul, Turkey

16
Novikov IB, Wilkins AD, Lichtarge O. An Evolutionary Trace method defines functionally important bases and sites common to RNA families. PLoS Comput Biol 2020; 16:e1007583. [PMID: 32208421 PMCID: PMC7092961 DOI: 10.1371/journal.pcbi.1007583] [Received: 11/07/2018] [Accepted: 11/27/2019]
Abstract
Functional non-coding (fnc)RNAs are nucleotide sequences of varied lengths, structures, and mechanisms that ubiquitously influence gene expression and translation, genome stability and dynamics, and human health and disease. Here, to shed light on their functional determinants, we seek to exploit the evolutionary record of variation and divergence read from sequence comparisons. The approach follows the phylogenetic Evolutionary Trace (ET) paradigm, first developed and extensively validated on proteins. We assigned a relative rank of importance to every base in a study of 1070 functional RNAs, including the ribosome, and observed evolutionary patterns strikingly similar to those seen in proteins: (1) the top-ranked bases clustered in secondary and tertiary structures; (2) in turn, these clusters mapped functional regions for catalysis, binding proteins and drugs, post-transcriptional modification, and deleterious mutations; (3) moreover, the quantitative quality of these clusters correlated with the identification of functional regions; and (4) as a result of this correlation, smoother structural distributions of evolutionarily important nucleotides improved functional site predictions. Thus, in practice, phylogenetic analysis can broadly identify functional determinants in RNA sequences and functional sites in RNA structures, and reveal details on the basis of RNA molecular functions. As an example of application, we report several previously undocumented and potentially functional ET nucleotide clusters in the ribosome. This work is broadly relevant to studies of structure-function in ribonucleic acids. Additionally, this generalization of ET shows that evolutionary constraints among sequence, structure, and function are similar in structured RNA and proteins. RNA ET is currently available as part of the ET command-line package, and will be available as a web-server.
Traditionally, RNA has been relegated to the role of an intermediate between DNA and proteins. However, we now recognize that RNAs are broadly functional beyond their role in translation, and that a number of diverse classes exist. Because functional non-coding RNAs are prevalent in biology and impact human health, it is important to better understand their functional determinants. However, the classical solution to this problem, targeted mutagenesis, is time-consuming and scales poorly. We propose an alternative computational approach to this problem, the Evolutionary Trace method. Previously developed and validated for proteins, Evolutionary Trace examines the evolutionary history of a molecule and predicts evolutionarily important residues in the sequence. We apply Evolutionary Trace to a set of diverse RNAs and find that the evolutionarily important nucleotides cluster on the three-dimensional structure, and that these clusters closely overlap functional sites. We also find that the clustering property can be used to refine and improve predictions. These findings are in close agreement with our observations of Evolutionary Trace in proteins, and suggest that structured functional RNAs and proteins evolve under similar constraints. In practice, the approach is intended for RNA researchers seeking insight into their molecule of interest, and the Evolutionary Trace program, along with a working example, is available at https://github.com/LichtargeLab/RNA_ET_ms.
Affiliation(s)
- Ilya B. Novikov: Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
- Angela D. Wilkins: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America

17
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020; 10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Received: 07/26/2019] [Accepted: 11/23/2019]
Abstract
Protein functional constraints manifest as superfamily- and functional-subgroup-conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep the polymerase attached during replication, and contain one δ, three γ, and one δ’ AAA+ subunit arranged semicircularly in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies the following as functionally congruent features that allosterically link the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’ residues and six γ/δ’-covariant residue pairs within the DNA-binding N-termini of helices α2 and α3; and γ/δ’ residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis, implying a remarkably clamp-loader-specific function.

18
White MA, Tsalkova T, Mei FC, Cheng X. Conformational States of Exchange Protein Directly Activated by cAMP (EPAC1) Revealed by Ensemble Modeling and Integrative Structural Biology. Cells 2019; 9:cells9010035. [PMID: 31877746 PMCID: PMC7016869 DOI: 10.3390/cells9010035] [Received: 11/01/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019]
Abstract
Exchange proteins directly activated by cAMP (EPAC1 and EPAC2) are important allosteric regulators of cAMP-mediated signal transduction pathways. To understand the molecular mechanism of EPAC activation, we performed detailed Small-Angle X-ray Scattering (SAXS) analysis of EPAC1 in its apo (inactive), cAMP-bound, and effector (Rap1b)-bound states. Our study demonstrates that we can model the solution structures of EPAC1 in each state using ensemble analysis and homology models derived from the crystal structures of EPAC2. The N-terminal domain of EPAC1, which is not conserved between EPAC1 and EPAC2, appears folded and interacts specifically with another component of EPAC1 in each state. The apo-EPAC1 state is a dynamic mixture of a compact (Rg = 32.9 Å, 86%) and a more extended (Rg = 38.5 Å, 13%) conformation. In the absence of Rap1, the cAMP-bound form of EPAC1 forms a dimer in solution, but its molecular structure is still compatible with the active EPAC1 conformation of the ternary complex model with cAMP and Rap1. Herein, we show that SAXS can elucidate the conformational states of EPAC1 activation as it proceeds from the compact, inactive apo conformation through a previously unknown intermediate state to the extended cAMP-bound form, and then binds to its effector (Rap1b) in a ternary complex.
Affiliation(s)
- Mark Andrew White: Sealy Center for Structural Biology and Molecular Biophysics, The University of Texas Medical Branch, Galveston, TX 77555, USA; Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA. Corresponding author (Tel.: +409-747-4747)
- Tamara Tsalkova: Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX 77555, USA
- Fang C. Mei: Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Therapeutics Institute, Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xiaodong Cheng: Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Therapeutics Institute, Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA. Corresponding author (Tel.: +713-500-7487)

19
Split intein-mediated selection of cells containing two plasmids using a single antibiotic. Nat Commun 2019; 10:4967. [PMID: 31672972 PMCID: PMC6823396 DOI: 10.1038/s41467-019-12911-1] [Received: 03/26/2019] [Accepted: 10/07/2019]
Abstract
To build or dissect complex pathways in bacteria and mammalian cells, it is often necessary to resort to at least two plasmids, for instance harboring orthogonal inducible promoters. Here we present SiMPl, a method based on rationally designed split enzymes and intein-mediated protein trans-splicing, allowing the selection of cells carrying two plasmids with a single antibiotic. We show that, compared to the traditional method based on two antibiotics, SiMPl increases the production of the antimicrobial non-ribosomal peptide indigoidine and the non-proteinogenic aromatic amino acid para-amino-L-phenylalanine from bacteria. Using a human T cell line, we employ SiMPl to obtain a highly pure population of cells double positive for the two chains of the T cell receptor, TCRα and TCRβ, using a single antibiotic. SiMPl has profound implications for metabolic engineering and for constructing complex synthetic circuits in bacteria and mammalian cells.

20
Hamm MO, Moss BL, Leydon AR, Gala HP, Lanctot A, Ramos R, Klaeser H, Lemmex AC, Zahler ML, Nemhauser JL, Wright RC. Accelerating structure-function mapping using the ViVa webtool to mine natural variation. Plant Direct 2019; 3:e00147. [PMID: 31372596 PMCID: PMC6658840 DOI: 10.1002/pld3.147] [Received: 12/06/2018] [Revised: 04/20/2019] [Accepted: 04/29/2019]
Abstract
Thousands of sequenced genomes are now publicly available, capturing a significant amount of natural variation within plant species; yet much of these data remain inaccessible to researchers without significant bioinformatics experience. Here, we present a webtool called ViVa (Visualizing Variation) that aims to empower any researcher to take advantage of the amazing genetic resource collected in the Arabidopsis thaliana 1001 Genomes Project (http://1001genomes.org). ViVa facilitates data mining at the gene, gene family, or gene network level. To test the utility and accessibility of ViVa, we assembled a team with a range of expertise within biology and bioinformatics to analyze the natural variation within the well-studied nuclear auxin signaling pathway. Our analysis has provided further confirmation of existing knowledge and has also helped generate new hypotheses regarding this well-studied pathway. These results highlight how natural variation could be used to generate and test hypotheses about less-studied gene families and networks, especially when paired with biochemical and genetic characterization. ViVa is also readily extensible to databases of interspecific genetic variation in plants as well as other organisms, such as the 3,000 Rice Genomes Project (http://snp-seek.irri.org/) and human genetic variation (https://www.ncbi.nlm.nih.gov/clinvar/).
Affiliation(s)
- Morgan O. Hamm: Department of Biology, University of Washington, Seattle, Washington
- Hardik P. Gala: Department of Biology, University of Washington, Seattle, Washington
- Amy Lanctot: Department of Biology, University of Washington, Seattle, Washington
- Román Ramos: Department of Biology, University of Washington, Seattle, Washington
- Hannah Klaeser: Department of Biology, Whitman College, Walla Walla, Washington
- R. Clay Wright: Biological Systems Engineering, Virginia Tech, Blacksburg, Virginia

21
Ban X, Lahiri P, Dhoble AS, Li D, Gu Z, Li C, Cheng L, Hong Y, Li Z, Kaustubh B. Evolutionary Stability of Salt Bridges Hints Its Contribution to Stability of Proteins. Comput Struct Biotechnol J 2019; 17:895-903. [PMID: 31333816 PMCID: PMC6620738 DOI: 10.1016/j.csbj.2019.06.022] [Received: 03/20/2019] [Revised: 06/19/2019] [Accepted: 06/20/2019]
Abstract
The contribution of newly designed salt bridges to protein stabilization remains controversial even today. To address this problem, we investigated salt bridges from two aspects: their spatial distribution and their evolutionary characteristics. First, we analyzed the spatial distribution of salt bridges in proteins, elucidating the basic requirements for salt-bridge formation. Then, we investigated the evolutionary characteristics of salt bridges and of their neighboring residues. The results demonstrate that, even under evolutionary pressure, charged residues appear more frequently than neutral residues at certain sequence positions, where they can form electrostatic interactions that could increase the evolutionary stability of the corresponding amino acid regions, enhancing their importance to protein stability. As a corollary, we conjectured that newly designed salt bridges contribute more to protein stability when they not only satisfy the spatial requirements for salt-bridge formation but also further increase the evolutionary stability of the corresponding amino acid regions. Based on this analysis, 8 mutations were constructed in the 1,4-α-glucan branching enzyme (EC 2.4.1.18, GBE) from Geobacillus thermoglucosidans STB02, 7 of which improved the thermostability of GBE. The enhanced thermostability of these 7 mutants might result from additional salt bridges at residue pairs in which at least one of the two positions is conserved, increasing their contribution to protein stabilization.
Affiliation(s)
- Xiaofeng Ban: School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Pratik Lahiri: Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, IL-61801, USA
- Abhishek S. Dhoble: Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, IL-61801, USA
- Dan Li: The Second Military Medical University, Shanghai, China
- Zhengbiao Gu: State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi 214122, China; Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Wuxi, Jiangsu 214122, China
- Caiming Li: School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Li Cheng: School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yan Hong: School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Zhaofeng Li: State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi 214122, China; Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Wuxi, Jiangsu 214122, China
- Bhalerao Kaustubh: Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, IL-61801, USA
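The "basic requirements of forming salt bridges" are usually expressed geometrically. A common convention, assumed here rather than taken from Ban et al., counts a salt bridge when a side-chain carboxyl oxygen of Asp or Glu lies within 4.0 Å of a side-chain nitrogen of Lys, Arg or His. The sketch below applies that distance test; the atom tuples and residue numbers are hypothetical, residue identifiers are assumed unique (a single chain), and whether histidine is included varies between studies.

```python
import math
from itertools import product

# Side-chain atoms conventionally used for salt-bridge detection (PDB atom names).
ACIDIC = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}, "HIS": {"ND1", "NE2"}}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return sorted (acidic_residue_id, basic_residue_id) pairs with any O-N distance <= cutoff (Å).

    atoms: iterable of (residue_id, residue_name, atom_name, (x, y, z)).
    """
    atoms = list(atoms)  # allow generators, since the atoms are scanned twice
    acids = [(rid, xyz) for rid, rname, aname, xyz in atoms if aname in ACIDIC.get(rname, ())]
    bases = [(rid, xyz) for rid, rname, aname, xyz in atoms if aname in BASIC.get(rname, ())]
    pairs = set()
    for (rid_a, xyz_a), (rid_b, xyz_b) in product(acids, bases):
        if rid_a != rid_b and math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((rid_a, rid_b))
    return sorted(pairs)


# Hypothetical coordinates (Å) for illustration only
atoms = [
    (10, "ASP", "OD1", (0.0, 0.0, 0.0)),
    (10, "ASP", "OD2", (1.2, 0.0, 0.0)),
    (25, "LYS", "NZ", (3.9, 0.0, 0.0)),   # 2.7 Å from OD2
    (40, "ARG", "NH1", (0.0, 9.0, 0.0)),  # about 9 Å from both oxygens
]
bridges = find_salt_bridges(atoms)
print(bridges)
```

Only the Asp10/Lys25 pair passes the 4.0 Å test. A purely geometric check like this says nothing about the evolutionary conservation that the authors add as a second criterion.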

22
Feltes BC, Grisci BI, Poloni JDF, Dorn M. Perspectives and applications of machine learning for evolutionary developmental biology. Mol Omics 2018; 14:289-306. [PMID: 30168572 DOI: 10.1039/c8mo00111a]
Abstract
Evolutionary Developmental Biology (Evo-Devo) is an ever-expanding field that aims to understand how development was modulated by the evolutionary process. In this context, "omic" studies have emerged as a powerful ally in unravelling the molecular mechanisms underlying development, and bioinformatics tools have become necessary to analyze the growing amount of information. Among computational approaches, machine learning stands out as a promising field for generating knowledge and tracing new research perspectives in bioinformatics. In this review, we present the current advances of machine learning applied to evolution and development. We draw clear perspectives and argue how evolution has impacted machine learning techniques.
Affiliation(s)
- Bruno César Feltes: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

23
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018]
Abstract
The conformational dynamics of proteins are rarely used in methods that predict the impact of genetic mutations, owing to the paucity of three-dimensional protein structures compared with the vast number of available sequences. Until now, a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein without relying on structural information. This de novo approach uses coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using the sequence-based GNM (Seq-GNM) agree with crystallographic B-factors as well as with theoretical B-factors from the original GNM, which uses the 3D structure. Moreover, we demonstrate that the B-factors calculated by Seq-GNM can discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated from sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.
Proteins are dynamic machines that undergo atomic fluctuations, side-chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information into the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared with the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses coevolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the thermal motions predicted by Seq-GNM are in reasonable agreement with experimental B-factors as well as with B-factors computed from 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM can distinguish between disease-associated and neutral nSNVs.
Affiliation(s)
- Brandon M. Butler: Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- I. Can Kazan: Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Avishek Kumar: Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America; Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
- S. Banu Ozkan: Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
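In a Gaussian network model, residues are nodes joined by identical springs wherever a contact exists, and relative B-factors are proportional to the diagonal of the pseudo-inverse of the Kirchhoff matrix Γ (absolute values scale as B_i = (8π²k_BT/γ)[Γ⁺]_ii). The sketch below is not the authors' code: it assumes the contacts are already given as residue-index pairs (in Seq-GNM these would come from Potts-model coevolution analysis; whether backbone neighbours are also linked is left to the caller) and uses the identity Γ⁺ = (Γ + J/n)⁻¹ − J/n, valid for a connected contact graph, with a plain Gauss-Jordan inverse so that only the standard library is needed. All function names are hypothetical.

```python
def _invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (contact graph not connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kirchhoff(n, contacts):
    """Kirchhoff matrix: -1 for each distinct contact, node degree on the diagonal."""
    gamma = [[0.0] * n for _ in range(n)]
    for i, j in {tuple(sorted(pair)) for pair in contacts if pair[0] != pair[1]}:
        gamma[i][j] = gamma[j][i] = -1.0
        gamma[i][i] += 1.0
        gamma[j][j] += 1.0
    return gamma


def gnm_fluctuations(n, contacts):
    """Relative mean-square fluctuations diag(Gamma^+) for a connected contact graph.

    Adding J/n lifts the single zero mode (the all-ones vector); subtracting
    J/n afterwards removes it again, giving the pseudo-inverse. A disconnected
    graph keeps extra zero modes, so the inversion raises ValueError.
    """
    gamma = kirchhoff(n, contacts)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = _invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]


# Hypothetical toy contact sets
chain = gnm_fluctuations(3, [(0, 1), (1, 2)])
ring = gnm_fluctuations(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)])
print(chain)
print(ring)
```

In the three-residue chain the central, better-connected residue fluctuates least (2/9 versus 5/9 at the ends), and in the symmetric ring all values coincide. A production implementation would more likely use an eigendecomposition from a numerical library.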

24
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Received: 07/19/2017] [Accepted: 05/15/2018]
Abstract
Background: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results: In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the then-recent Critical Assessment of Structure Prediction rounds (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. Conclusions: Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
Affiliation(s)
- Min Han: Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Yifan Song: Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Jiaqiang Qian: Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Dengming Ming: College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China
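For reference, an MCC such as the one reported for PFS prediction and a recall such as the one reported for interaction prediction are computed from confusion-matrix counts as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and recall = TP/(TP+FN). The counts below are hypothetical, the function name is illustrative, and returning 0.0 when a denominator vanishes is a common convention rather than something the abstract specifies.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Matthews correlation coefficient and recall from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention if undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return mcc, recall


# Hypothetical per-residue counts for one prediction run
mcc, recall = confusion_metrics(tp=40, fp=10, tn=140, fn=10)
print(round(mcc, 3), recall)
```

With these made-up counts MCC is 5500/7500 ≈ 0.733 and recall is 0.8. Unlike recall, MCC also penalizes false positives and stays informative when non-site residues vastly outnumber sites.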

25
Karasev DA, Veselovsky AV, Lagunin AA, Filimonov DA, Sobolev BN. Determination of Amino Acid Residues Responsible for Specific Interaction of Protein Kinases with Small Molecule Inhibitors. Mol Biol 2018. [DOI: 10.1134/s002689331802005x]

26
Neuwald AF, Aravind L, Altschul SF. Inferring joint sequence-structural determinants of protein functional specificity. eLife 2018; 7. [PMID: 29336305 PMCID: PMC5770160 DOI: 10.7554/elife.29880] [Received: 06/23/2017] [Accepted: 12/22/2017]
Abstract
Residues responsible for allostery, cooperativity, and other subtle but functionally important interactions remain difficult to detect. To aid such detection, we employ statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. We identify such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues. Such interactions are derived either from atomic coordinates or from Direct Coupling Analysis scores, used as surrogates for structural distances. Applying this approach to N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases yielded results congruent with biochemical understanding of these proteins, and also revealed striking sequence-structural features overlooked by other methods. These and similar analyses can aid the design of drugs targeting allosteric sites.
Affiliation(s)
- Andrew F Neuwald: Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States; Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, United States
- L Aravind: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Stephen F Altschul: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
27
Gallion J, Koire A, Katsonis P, Schoenegge A, Bouvier M, Lichtarge O. Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling. Hum Mutat 2017; 38:569-580. [PMID: 28230923 PMCID: PMC5516182 DOI: 10.1002/humu.23193] [Received: 10/26/2016] [Revised: 01/25/2017] [Accepted: 02/04/2017]
Abstract
Computational prediction yields efficient and scalable initial assessments of how variants of unknown significance may affect human health. However, when these predictions disagree with direct experimental measurements of functional impact, inaccurate computational predictions are frequently assumed to be the source. Here, we present a methodological analysis indicating that shortcomings in both computational and biological data can contribute to these disagreements. We demonstrate that incomplete assaying of multifunctional proteins can affect the strength of correlations between prediction and experiment; a variant's full impact on function is better quantified by considering multiple assays that probe an ensemble of protein functions. Additionally, many variant predictions are sensitive to how the protein alignment is constructed, and the alignment can be customized to maximize the relevance of predictions to a specific experimental question. We conclude that inconsistencies between computation and experiment can often be attributed to the fact that they do not test identical hypotheses. Aligning the design of the computational input with the design of the experimental output will require cooperation between computational and biological scientists, but will also lead to improved estimations of computational prediction accuracy and a better understanding of the genotype–phenotype relationship.
Affiliation(s)
- Jonathan Gallion
- Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas
- Amanda Koire
- Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Anne-Marie Schoenegge
- Department of Biochemistry, Institute for Research in Immunology and Cancer, Université de Montréal, Quebec, Canada
- Michel Bouvier
- Department of Biochemistry, Institute for Research in Immunology and Cancer, Université de Montréal, Quebec, Canada
- Olivier Lichtarge
- Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
28
Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016; 12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016]
Abstract
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high-dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown, biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding overfitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into hierarchically arranged subgroups based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, Baltimore, MD, United States of America
- Stephen F. Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
29
Gallion J, Wilkins AD, Lichtarge O. Human kinases display mutational hotspots at cognate positions within cancer. Pacific Symposium on Biocomputing 2016; 22:414-425. [PMID: 27896994 DOI: 10.1142/9789813207813_0039]
Abstract
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. However, the heterogeneity of cancer pathways, together with the high background mutational frequency of tumor cells, often blurs the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these difficulties, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high-Evolutionary-Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. Recognition of this new phenomenon, the "mutational delocalization" of cancer drivers among groups of paralogs, may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
Affiliation(s)
- Jonathan Gallion
- Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
† The authors gratefully acknowledge support from the National Institutes of Health (GM066099 and GM079656), from the National Science Foundation (DBI-1356569), and from DARPA (N66001-15-C-4042).
30
Protein stabilization improves STAT3 function in autosomal dominant hyper-IgE syndrome. Blood 2016; 128:3061-3072. [PMID: 27799162 DOI: 10.1182/blood-2016-02-702373] [Received: 02/26/2016] [Accepted: 10/19/2016]
Abstract
Autosomal dominant hyper-IgE syndrome (AD-HIES) is caused by dominant-negative mutations in STAT3; however, the molecular basis for mutant STAT3 allele dysfunction is unclear and treatment remains supportive. We hypothesized that AD-HIES mutations decrease STAT3 protein stability and that mutant STAT3 activity can be improved by agents that increase chaperone protein activity. We used computer modeling to characterize the effect of STAT3 mutations on protein stability. We measured STAT3 protein half-life (t1/2) and determined levels of STAT3 phosphorylated on tyrosine (Y) 705 (pY-STAT3) and mRNA levels of STAT3 gene targets in Epstein-Barr virus-transformed B (EBV) cells, human peripheral blood mononuclear cells (PBMCs), and mouse splenocytes incubated with or without chaperone protein modulators: HSF1A, a small-molecule TRiC modulator, or geranylgeranylacetone (GGA), a drug that upregulates heat shock protein (HSP) 70 and HSP90. Computer modeling predicted that 81% of AD-HIES mutations are destabilizing. STAT3 protein t1/2 in EBV cells from AD-HIES patients with destabilizing STAT3 mutations was markedly reduced. Treatment of EBV cells containing destabilizing STAT3 mutations with either HSF1A or GGA normalized STAT3 t1/2, increased pY-STAT3 levels, and increased mRNA levels of STAT3 target genes to up to 79% of control. In addition, treatment of human PBMCs or mouse splenocytes containing destabilizing STAT3 mutations with either HSF1A or GGA increased levels of cytokine-activated pY-STAT3 within human CD4+ and CD8+ T cells and numbers of IL-17-producing CD4+ mouse splenocytes, respectively. Thus, most AD-HIES STAT3 mutations are destabilizing; agents that modulate chaperone protein function improve STAT3 stability and activity in T cells and may provide a specific treatment.
31
Farhoodi R, Akbal-Delibas B, Haspel N. Machine Learning Approaches for Predicting Protein Complex Similarity. J Comput Biol 2016; 24:40-51. [PMID: 27748625 DOI: 10.1089/cmb.2016.0137]
Abstract
Discriminating native-like structures from false positives with high accuracy is one of the biggest challenges in protein-protein docking. While there is agreement that a relationship exists between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure, the precise nature of this relationship is not known. Existing protein-protein docking methods typically formulate this relationship as a weighted sum of selected terms and calibrate the weights on a training set in order to evaluate and rank candidate complexes. Despite improvements in the predictive power of recent docking methods, even state-of-the-art methods produce large numbers of false positives, which often leads to failure in predicting the correct binding of many complexes. With the aid of machine learning methods, we tested several approaches that not only rank candidate structures relative to each other but also predict how similar each candidate is to the native conformation. We trained a two-layer neural network, a multilayer neural network, and a network of Restricted Boltzmann Machines against extensive data sets of unbound complexes generated by RosettaDock and PyDock. We validated these methods with a set of refinement candidate structures. We were able to predict the root mean squared deviations (RMSDs) of protein complexes with a very small error margin, often less than 1.5 Å, when trained with structures that have RMSD values of up to 7 Å. In our most recent experiments with protein samples having RMSD values of up to 27 Å, the average prediction error was still relatively small, attesting to the potential of our approach in predicting the correct binding of protein-protein complexes.
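The conventional scoring scheme described in this abstract, a weighted sum of selected interaction terms used to rank candidate complexes, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the term names, the weights, and the sample values are hypothetical, and the weights are fixed by hand rather than calibrated on a training set as real docking programs such as RosettaDock or PyDock do.

```python
from typing import Dict, List

# Hypothetical weights for illustration only; real docking programs
# calibrate their weights against training data.
WEIGHTS: Dict[str, float] = {"vdw": 1.0, "elec": 0.5, "desolv": 0.8}


def weighted_score(terms: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-candidate energy terms into one score (lower = more favorable)."""
    return sum(weights[name] * value for name, value in terms.items())


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Order candidate complexes from most to least favorable score.

    The result is only a relative ordering: it says which candidate scores
    best, not how close any candidate is to the native conformation.
    """
    return sorted(candidates, key=lambda name: weighted_score(candidates[name]))


# Hypothetical energy terms for three docked poses (arbitrary units).
example = {
    "pose_a": {"vdw": -10.0, "elec": -4.0, "desolv": 1.0},
    "pose_b": {"vdw": -12.0, "elec": -2.0, "desolv": 0.5},
    "pose_c": {"vdw": -5.0, "elec": 1.0, "desolv": 2.0},
}
print(rank_candidates(example))  # ['pose_b', 'pose_a', 'pose_c']
```

The sketch makes the limitation the abstract points to concrete: the sorted order identifies the best-scoring candidate, but the score itself is not an estimate of RMSD to the native structure, which is what the learned models above are trained to predict.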
Affiliation(s)
- Roshanak Farhoodi
- Department of Computer Science, University of Massachusetts Boston, Boston, Massachusetts
- Bahar Akbal-Delibas
- Department of Computer Engineering, Kadir Has University, Istanbul, Turkey
- Nurit Haspel
- Department of Computer Science, University of Massachusetts Boston, Boston, Massachusetts
32
Akbal-Delibas B, Pomplun M, Haspel N. Accurate Prediction of Docked Protein Structure Similarity. J Comput Biol 2016; 22:892-904. [PMID: 26335807 DOI: 10.1089/cmb.2015.0114]
Abstract
One of the major challenges for protein-protein docking methods is to accurately discriminate native-like structures. The protein docking community agrees that a relationship exists between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown, and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network on an extensive dataset using the back-propagation learning algorithm. Our method predicted the RMSDs of unbound docked complexes with an error margin of 0.4 Å.
Affiliation(s)
- Bahar Akbal-Delibas
- Department of Computer Science, University of Massachusetts, Boston, Massachusetts
- Marc Pomplun
- Department of Computer Science, University of Massachusetts, Boston, Massachusetts
- Nurit Haspel
- Department of Computer Science, University of Massachusetts, Boston, Massachusetts
33
Akbal-Delibas B, Farhoodi R, Pomplun M, Haspel N. Accurate refinement of docked protein complexes using evolutionary information and deep learning. J Bioinform Comput Biol 2015; 14:1642002. [PMID: 26846813 DOI: 10.1142/s0219720016420026]
Abstract
One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate, and their results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In previous work, we introduced AccuRefiner, a machine-learning-based tool for refining protein-protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex with lower root-mean-square deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a five-layer deep learning network with a similar set of features. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex, with an average error margin of 1.40 Å, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35,000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes, also produced by RosettaDock, for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose refinement candidates with lower RMSD values than the coarsely docked input structures.
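The quantity these networks are trained to predict, the RMSD of atomic positions with respect to the native structure, is the square root of the mean squared distance between corresponding atoms, RMSD = sqrt((1/N) Σ |x_i − y_i|²). The sketch below computes it for two equally long lists of matched coordinates. It assumes the structures are already superimposed and the atoms already paired; it deliberately omits the optimal superposition step (e.g. a Kabsch fit) and the atom-selection conventions (interface or ligand RMSD) that a real docking evaluation would apply.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atom coordinates (in Å).

    Assumes model[i] corresponds to native[i] and that both structures are
    already in the same frame; no superposition is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Two atoms, each displaced by a (3, 4, 0) vector, give an RMSD of 5 Å.
print(rmsd([(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)],
           [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))  # 5.0
```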
Affiliation(s)
- Bahar Akbal-Delibas
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Roshanak Farhoodi
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Marc Pomplun
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Nurit Haspel
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
34
Lua RC, Wilson SJ, Konecki DM, Wilkins AD, Venner E, Morgan DH, Lichtarge O. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures. Nucleic Acids Res 2015; 44:D308-12. [PMID: 26590254 PMCID: PMC4702906 DOI: 10.1093/nar/gkv1279] [Received: 08/21/2015] [Accepted: 11/02/2015]
Abstract
The structure and function of proteins underlie most aspects of biology, and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is essential to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. The UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/.
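To make the ranking rule concrete, the sketch below implements the integer-valued trace rank from the original ET formulation, as we understand it: the tree is cut into n = 1, 2, ..., N-1 groups, and a column's rank is 1 plus the number of cut levels at which it varies within at least one group. Lower ranks are better, so a column that varies only between distant clades ranks well. The group partitions are supplied directly as a hypothetical input rather than derived from a phylogenetic tree, and later ET versions use a real-valued, entropy-based refinement that this sketch does not reproduce; it is not the UET server's code.

```python
from typing import List, Sequence


def integer_et_rank(alignment: Sequence[str],
                    partitions: Sequence[List[List[int]]]) -> List[int]:
    """Integer-valued Evolutionary Trace rank for each alignment column.

    alignment  -- equal-length aligned sequences
    partitions -- partitions[k] splits the sequence indices into k + 1 groups,
                  i.e. successively deeper cuts of a phylogenetic tree
                  (a hypothetical input format chosen for this sketch)
    Rank 1 is best (invariant at every level); larger ranks are worse.
    """
    ranks = []
    for col in range(len(alignment[0])):
        rank = 1
        for groups in partitions:
            # Add 1 if any group at this level has more than one residue type.
            if any(len({alignment[s][col] for s in group}) > 1 for group in groups):
                rank += 1
        ranks.append(rank)
    return ranks


# Toy tree ((0, 1), (2, 3)), cut into 1, 2 and 3 groups.
seqs = ["AKLD", "AKLE", "ARLD", "ARVD"]
partitions = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
    [[0], [1], [2, 3]],
]
print(integer_et_rank(seqs, partitions))  # [1, 2, 4, 3]
```

In the toy example, column 1 (K in one clade, R in the other) varies only between distant groups and ranks 2, while column 2 (L versus V between the close homologs 2 and 3) ranks 4, matching the behaviour the abstract describes.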
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Stephen J Wilson
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Daniel M Konecki
- Department of Structural and Computational Biology and Molecular Biophysics, Houston, TX 77030, USA
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Eric Venner
- Department of Structural and Computational Biology and Molecular Biophysics, Houston, TX 77030, USA
- Daniel H Morgan
- Department of Structural and Computational Biology and Molecular Biophysics, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Department of Structural and Computational Biology and Molecular Biophysics, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pharmacology, Baylor College of Medicine, Houston, TX 77030, USA
35
Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes. Sci Rep 2015; 5:13258. [PMID: 26286020 PMCID: PMC4541151 DOI: 10.1038/srep13258] [Received: 04/16/2015] [Accepted: 06/16/2015]
Abstract
Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly because of the polymorphism of the core functional gene copA. In this study, a local BLASTN method, using a copA database built for this study, was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs inferred by evolutionary trace analysis of CopA sequences from known Cu-resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, of which 70 were found to have high potential to function in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that the topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that copA diversity in the tailings may have developed mainly through vertical descent, with few lateral gene transfer events. The method established here can be used to explore the diversity of copA (and potentially other metal resistance genes) in any metagenome, and has the potential to recover full-length gene sequences exhaustively for downstream analyses.
36
Elucidation of G-protein and β-arrestin functional selectivity at the dopamine D2 receptor. Proc Natl Acad Sci U S A 2015; 112:7097-102. [PMID: 25964346 DOI: 10.1073/pnas.1502742112]
Abstract
The neuromodulator dopamine signals through the dopamine D2 receptor (D2R) to modulate central nervous system functions through diverse signal transduction pathways. D2R is a prominent target for drug treatments in disorders where dopamine function is aberrant, such as schizophrenia. D2R signals through distinct G-protein and β-arrestin pathways, and drugs that are functionally selective for these pathways could have improved therapeutic potential. How D2R signals through the two pathways is still not well defined, and efforts to elucidate these pathways have been hampered by the lack of adequate tools for assessing the contribution of each pathway independently. To address this, Evolutionary Trace was used to produce D2R mutants with strongly biased signal transduction for either the G-protein or β-arrestin interactions. These mutants were used to resolve the role of G proteins and β-arrestins in D2R signaling assays. The results show that D2R interactions with the two downstream effectors are dissociable and that G-protein signaling accounts for D2R canonical MAP kinase signaling cascade activation, whereas β-arrestin only activates elements of this cascade under certain conditions. Nevertheless, when expressed in mice in GABAergic medium spiny neurons of the striatum, the β-arrestin-biased D2R caused a significant potentiation of amphetamine-induced locomotion, whereas the G protein-biased D2R had minimal effects. The mutant receptors generated here provide a molecular tool set that should enable a better definition of the individual roles of G-protein and β-arrestin signaling pathways in D2R pharmacology, neurobiology, and associated pathologies.
37
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Received: 08/23/2014] [Accepted: 01/07/2015]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most recent methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
38
Donegan RK, Hill SE, Freeman DM, Nguyen E, Orwig SD, Turnage KC, Lieberman RL. Structural basis for misfolding in myocilin-associated glaucoma. Hum Mol Genet 2014; 24:2111-24. [PMID: 25524706 DOI: 10.1093/hmg/ddu730]
Abstract
Olfactomedin (OLF) domain-containing proteins play roles in fundamental cellular processes and have been implicated in disorders ranging from glaucoma, cancers and inflammatory bowel disorder, to attention deficit disorder and childhood obesity. We solved crystal structures of the OLF domain of myocilin (myoc-OLF), the best-studied such domain to date. Mutations in myoc-OLF are causative in the autosomal dominant inherited form of the prevalent ocular disorder glaucoma. The structures reveal a new addition to the small family of five-bladed β-propellers. Propellers are best known for their ability to act as hubs for protein-protein interactions, a function that seems most likely for myoc-OLF, but they can also act as enzymes. A calcium ion, sodium ion and glycerol molecule were identified within a central hydrophilic cavity that is accessible via movements of surface loop residues. By mapping familial glaucoma-associated lesions onto the myoc-OLF structure, three regions sensitive to aggregation have been identified, with direct applicability to differentiating between neutral and disease-causing non-synonymous mutations documented in the human population worldwide. Evolutionary analysis mapped onto the myoc-OLF structure reveals conserved and divergent regions for possible overlapping and distinctive functional protein-protein or protein-ligand interactions across the broader OLF domain family. While deciphering the specific normal biological functions, ligands and binding partners for OLF domains will likely continue to be a challenging long-term experimental pursuit, atomic-detail structural knowledge of myoc-OLF is a valuable guide for understanding the implications of glaucoma-associated mutations and will help focus future studies of this biomedically important domain family.
Affiliation(s)
- Rebecca K Donegan
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Shannon E Hill
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Dana M Freeman
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Elaine Nguyen
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Susan D Orwig
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Katherine C Turnage
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
- Raquel L Lieberman
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
39
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Progress in Biophysics and Molecular Biology 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope of elucidating the role of human genes in health and disease and of rationally designing therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function: the functions mediated by protein-protein interactions (PPIs). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying the molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is one of several analytical approaches that draw on the successes and failures of evolution to enable the rational control of PPIs.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David C Marciano
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anbu K Adikesavan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA.

40
Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France

41
Young E, Zheng ZY, Wilkins AD, Jeong HT, Li M, Lichtarge O, Chang EC. Regulation of Ras localization and cell transformation by evolutionarily conserved palmitoyltransferases. Mol Cell Biol 2014; 34:374-85. [PMID: 24248599 PMCID: PMC3911504 DOI: 10.1128/mcb.01248-13] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/23/2013] [Revised: 10/16/2013] [Accepted: 11/09/2013] [Indexed: 01/06/2023]
Abstract
Ras can act on the plasma membrane (PM) to mediate extracellular signaling and tumorigenesis. To identify key components controlling Ras PM localization, we performed an unbiased screen to seek Schizosaccharomyces pombe mutants with reduced PM Ras. Five mutants were found with mutations affecting the same gene, S. pombe erf2 (sp-erf2), encoding sp-Erf2, a palmitoyltransferase, with various activities. sp-Erf2 localizes to the trans-Golgi compartment, a process which is mediated by its third transmembrane domain and the Erf4 cofactor. In fission yeast, the human ortholog zDHHC9 rescues the phenotypes of sp-erf2 null cells. In contrast, expressing zDHHC14, another sp-Erf2-like human protein, did not rescue Ras1 mislocalization in these cells. Importantly, ZDHHC9 is widely overexpressed in cancers. Overexpressing ZDHHC9 promotes, while repressing it diminishes, Ras PM localization and transformation of mammalian cells. These data strongly demonstrate that sp-Erf2/zDHHC9 palmitoylates Ras proteins in a highly selective manner in the trans-Golgi compartment to facilitate PM targeting via the trans-Golgi network, a role that is most certainly critical for Ras-driven tumorigenesis.
Affiliation(s)
- Evelin Young
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Ze-Yi Zheng
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Angela D. Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- CIBR Center for Computational and Integrative Biomedical Research, Baylor College of Medicine, Houston, Texas, USA
- Hee-Tae Jeong
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Min Li
- Department of Oncology, Nanjing Hospital of Traditional Chinese Medicine, Nanjing, Jiangsu, China
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- CIBR Center for Computational and Integrative Biomedical Research, Baylor College of Medicine, Houston, Texas, USA
- Eric C. Chang
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA

42
Homan EP, Lietman C, Grafe I, Lennington J, Morello R, Napierala D, Jiang MM, Munivez EM, Dawson B, Bertin TK, Chen Y, Lua R, Lichtarge O, Hicks J, Weis MA, Eyre D, Lee BHL. Differential effects of collagen prolyl 3-hydroxylation on skeletal tissues. PLoS Genet 2014; 10:e1004121. [PMID: 24465224 PMCID: PMC3900401 DOI: 10.1371/journal.pgen.1004121] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/30/2013] [Accepted: 12/04/2013] [Indexed: 02/04/2023]
Abstract
Mutations in the genes encoding cartilage-associated protein (CRTAP) and prolyl 3-hydroxylase 1 (P3H1, encoded by LEPRE1) were the first identified causes of recessive Osteogenesis Imperfecta (OI). These proteins, together with cyclophilin B (encoded by PPIB), form a complex that 3-hydroxylates a single proline residue on the α1(I) chain (Pro986) and has cis/trans isomerase (PPIase) activity essential for proper collagen folding. Recent data suggest that prolyl 3-hydroxylation of Pro986 is not required for the structural stability of collagen; however, the absence of this post-translational modification may disrupt protein-protein interactions integral to proper collagen folding and lead to collagen over-modification. P3H1 and CRTAP stabilize each other, and the absence of one results in degradation of the other. Hence, hypomorphic or loss-of-function mutations of either gene cause loss of the whole complex and its associated functions. The relative contribution of losing this complex's 3-hydroxylation versus its PPIase and collagen chaperone activities to the phenotype of recessive OI is unknown. To distinguish between these functions, we generated knock-in mice carrying a single amino acid substitution in the catalytic site of P3h1 (Lepre1H662A). This substitution abolished P3h1 activity but retained the ability to form a complex with Crtap and thus the collagen chaperone function. Knock-in mice showed absence of prolyl 3-hydroxylation at Pro986 of the α1(I) and α1(II) collagen chains but no significant over-modification at other collagen residues. They were normal in appearance, had no growth defects and had normal cartilage growth plate histology, but showed decreased trabecular bone mass. This new mouse model recapitulates elements of the bone phenotype of OI but not the cartilage and growth phenotypes caused by loss of the prolyl 3-hydroxylation complex. Our observations suggest differential tissue consequences due to selective inactivation of P3H1 hydroxylase activity versus complete ablation of the prolyl 3-hydroxylation complex.

The prolyl 3-hydroxylase complex serves to hydroxylate a single residue in type I collagen and also serves as a collagen chaperone. The complex comprises prolyl 3-hydroxylase 1, cartilage-associated protein, and cyclophilin B. Mutations have been identified in the genes encoding the complex members in patients with recessive Osteogenesis Imperfecta. Recent data suggest that prolyl 3-hydroxylation of collagen does not alter the stability of collagen but may instead mediate protein-protein interactions. Additionally, the collagen chaperoning function of the complex is an important rate-limiting step in the modification of type I collagen. Irrespective of whether patients with mutations in the genes encoding the members of the prolyl 3-hydroxylase complex have hypomorphic or complete loss-of-function alleles, either circumstance leads to the loss of both functions of the prolyl 3-hydroxylase complex. Thus, it is unknown how collagen chaperoning and/or hydroxylation affect bone and cartilage homeostasis. In this study, we generated a mouse model lacking the prolyl 3-hydroxylation activity of the complex while maintaining the chaperoning function. We found that the hydroxylase mutant mice have normal cartilage and normal cortical bone but decreased trabecular bone, suggesting that there is a differential requirement for hydroxylation in different tissues.
Affiliation(s)
- Erica P. Homan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Caressa Lietman
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Ingo Grafe
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Jennifer Lennington
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Roy Morello
- Department of Physiology and Biophysics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Dobrawa Napierala
- Department of Oral and Maxillofacial Surgery, School of Dentistry, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Ming-Ming Jiang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas, United States of America
- Elda M. Munivez
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Brian Dawson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas, United States of America
- Terry K. Bertin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Yuqing Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas, United States of America
- Rhonald Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- John Hicks
- Department of Pathology, Texas Children's Hospital, Baylor College of Medicine, Houston, Texas, United States of America
- Mary Ann Weis
- Department of Orthopaedics and Sports Medicine, University of Washington, Seattle, Washington, United States of America
- David Eyre
- Department of Orthopaedics and Sports Medicine, University of Washington, Seattle, Washington, United States of America
- Brendan H. L. Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas, United States of America

43
Erdin S, Venner E, Lisewski AM, Lichtarge O. Function prediction from networks of local evolutionary similarity in protein structure. BMC Bioinformatics 2013; 14 Suppl 3:S6. [PMID: 23514548 PMCID: PMC3584919 DOI: 10.1186/1471-2105-14-s3-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Annotating protein function with both high accuracy and high sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. To further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary.
RESULTS: Retrospective benchmarks on 605 Structural Genomics enzymes showed that multiple templates increased sensitivity by up to 14% when combined with single-template predictions, while maintaining accuracy above 91%. Diffusing function globally on networks of single- and multiple-template matches marginally increased the area under the ROC curve, to over 0.97, but in a subset of proteins that could not be annotated by ETA, the network approach recovered annotations for the most confident 20-23 of 91 cases with 100% accuracy.
CONCLUSIONS: Using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations improves both the accuracy and the sensitivity of predictions.
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Eric Venner
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA

44
Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 2013; 23:191-7. [PMID: 23415854 DOI: 10.1016/j.sbi.2013.01.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/19/2012] [Revised: 01/04/2013] [Accepted: 01/23/2013] [Indexed: 01/03/2023]
Abstract
The recently developed field of ligand homology modeling (LHM), which extends the ideas of protein homology modeling to the prediction of ligand binding sites and to virtual ligand screening, has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted structures as well as to experimental structures with little if any diminution in performance, thereby enabling ≈75% of an average proteome to have potentially significant virtual screening predictions. In large-scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA.

45
Engel AS, Johnson LR, Porter ML. Arsenite oxidase gene diversity among Chloroflexi and Proteobacteria from El Tatio Geyser Field, Chile. FEMS Microbiol Ecol 2012; 83:745-56. [PMID: 23066664 DOI: 10.1111/1574-6941.12030] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 07/20/2012] [Revised: 10/05/2012] [Accepted: 10/07/2012] [Indexed: 11/29/2022]
Abstract
Arsenic concentrations (450-600 μmol L⁻¹) at the El Tatio Geyser Field in northern Chile are an order of magnitude greater than at other natural geothermal sites, making El Tatio an ideal location to investigate unique microbial diversity and metabolisms associated with the arsenic cycle in low-sulfide, circumneutral-pH waters above 50 °C. 16S rRNA gene and arsenite oxidase gene (aioA) diversities were evaluated from biofilms and microbial mats from two geyser-discharge stream transects. Chloroflexi was the most prevalent bacterial phylum at flow distances where arsenite was converted to arsenate, corresponding to roughly 60 °C. Among the aioA-like gene sequences retrieved, most were homologous to sequences from whole genomes of Chloroflexus aurantiacus, but others were homologous to alphaproteobacterial and undifferentiated beta- and gammaproteobacterial groups. No Deinococci, Thermus, Aquificales, or Chlorobi aioA-like genes were retrieved. The functional importance of amino acid sites was evaluated from evolutionary trace analyses of all retrieved aioA genes. Fifteen conserved residue sites identified across all phylogenetic groups highlight a conserved functional core, while six divergent sites suggest potential differences in electron transfer modes. This research expands the known distribution and diversity of arsenite oxidation in natural geothermal settings and provides information about the evolutionary history of microbe-arsenic interactions.
Affiliation(s)
- Annette Summers Engel
- Department of Earth and Planetary Sciences, University of Tennessee, Knoxville, TN, USA.

46
Bowen DM, Lewis JA, Lu W, Schein CH. Simplifying complex sequence information: a PCP-consensus protein binds antibodies against all four Dengue serotypes. Vaccine 2012; 30:6081-7. [PMID: 22863657 DOI: 10.1016/j.vaccine.2012.07.042] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/09/2012] [Revised: 07/13/2012] [Accepted: 07/18/2012] [Indexed: 12/15/2022]
Abstract
Designing proteins that reflect the natural variability of a pathogen is essential for developing novel vaccines and drugs. Flaviviruses, including Dengue (DENV) and West Nile (WNV), evolve rapidly and can "escape" neutralizing monoclonal antibodies by mutation. Designing antigens that represent many distinct strains is important for DENV, where infection with a strain from one of the four serotypes may lead to severe hemorrhagic disease on subsequent infection with a strain from another serotype. Here, a DENV physicochemical property (PCP)-consensus sequence was derived from 671 unique sequences from the Flavitrack database. PCP-consensus proteins for domain 3 of the envelope protein (EdomIII) were expressed from synthetic genes in Escherichia coli. The ability of the purified consensus proteins to bind polyclonal antibodies generated in response to infection with strains from each of the four DENV serotypes was determined. The initial consensus protein bound antibodies against DENV-1, -2, and -3 in ELISA and Western blot assays. This sequence was altered in three steps to incorporate regions of maximum variability, identified as significant changes in the PCPs, characteristic of DENV-4 strains. The final protein was recognized by antibodies against all four serotypes. Two amino acids essential for efficient binding to all DENV antibodies are part of a discontinuous epitope previously defined for a neutralizing monoclonal antibody. The PCP-consensus method can significantly reduce the number of experiments required to define a multivalent antigen, which is particularly important when dealing with pathogens that must be tested at higher biosafety levels.
Affiliation(s)
- David M Bowen
- Computational Biology, Sealy Center for Structural Biology and Molecular Biophysics, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555-0857, United States