51
Correa Marrero M, Immink RGH, de Ridder D, van Dijk ADJ. Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis. Bioinformatics 2020; 35:2036-2042. [PMID: 30398547 DOI: 10.1093/bioinformatics/bty924] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 11/05/2018] [Indexed: 01/09/2023]
Abstract
MOTIVATION Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance. RESULTS We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction versus non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction of previous approaches, allowing for the study of many-to-many interactions. AVAILABILITY AND IMPLEMENTATION Source code and test data are available at www.bif.wur.nl/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard G H Immink
- Laboratory of Molecular Biology, Department of Plant Sciences; Bioscience, Wageningen Plant Research
- Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences; Bioscience, Wageningen Plant Research; Biometris, Department of Plant Sciences, Wageningen University & Research, Wageningen PB, The Netherlands
52
Chandonia JM, Fox NK, Brenner SE. SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database. Nucleic Acids Res 2020; 47:D475-D481. [PMID: 30500919 PMCID: PMC6323910 DOI: 10.1093/nar/gky1134] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 09/15/2018] [Accepted: 11/27/2018] [Indexed: 11/12/2022]
Abstract
The SCOPe (Structural Classification of Proteins—extended, https://scop.berkeley.edu) database hierarchically classifies domains from the majority of proteins of known structure according to their structural and evolutionary relationships. SCOPe also incorporates and updates the ASTRAL compendium, which provides multiple databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. Protein structures are classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.07, we have focused our manual curation efforts on larger protein structures, including the spliceosome, proteasome and RNA polymerase I, as well as many other Pfam families that had not previously been classified. Domains from these large protein complexes are distinctive in several ways: novel non-globular folds are more common, and domains from previously observed protein families often have N- or C-terminal extensions that were disordered or not present in previous structures. The current monthly release update, SCOPe 2.07–2018-10–18, classifies 90 992 PDB entries (about two thirds of PDB entries).
Affiliation(s)
- John-Marc Chandonia
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Naomi K Fox
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Steven E Brenner
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
53
Zaman AB, Kamranfar P, Domeniconi C, Shehu A. Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering. Molecules 2020; 25:E2228. [PMID: 32397410 PMCID: PMC7248879 DOI: 10.3390/molecules25092228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Revised: 04/21/2020] [Accepted: 04/28/2020] [Indexed: 11/16/2022]
Abstract
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de-novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, effectively acknowledging that having more structures increases the likelihood that some will reside near the sought biologically-active structure. A major drawback with this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach which we demonstrate to significantly reduce an ensemble of generated structures without sacrificing quality. Evaluations are related on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly-available at the links provided in Section 1.
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Parastoo Kamranfar
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
54
Ferruz N, Lobos F, Lemm D, Toledo-Patino S, Farías-Rico JA, Schmidt S, Höcker B. Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design. J Mol Biol 2020; 432:3898-3914. [PMID: 32330481 PMCID: PMC7322520 DOI: 10.1016/j.jmb.2020.04.013] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/23/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/15/2022]
Abstract
Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to filter the dataset individually and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Saacnicteh Toledo-Patino
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Steffen Schmidt
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
55
Uversky VN. Torches, Candles, Lamps, Lanterns, Flashlights, Spotlights, Night Vision Goggles… You Need Them All to See in Darkness. Proteomics 2020; 19:e1900085. [PMID: 30829430 DOI: 10.1002/pmic.201900085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Articles assembled in the second part of this Special Issue describe some experimental and computational approaches for the structural and functional characterization of intrinsically disordered proteins. Since these tools represent specialized gear for the focused analysis of various aspects of dark proteome, they can be viewed as torches, candles, lamps, lanterns, flashlights, spotlights, night vision goggles, and other means needed to see in darkness.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA; Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow, 142290, Russia
56
Dohmen E, Klasberg S, Bornberg-Bauer E, Perrey S, Kemena C. The modular nature of protein evolution: domain rearrangement rates across eukaryotic life. BMC Evol Biol 2020; 20:30. [PMID: 32059645 PMCID: PMC7023805 DOI: 10.1186/s12862-020-1591-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/12/2019] [Accepted: 01/31/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND Modularity is important for evolutionary innovation. The recombination of existing units to form larger complexes with new functionalities spares the need to create novel elements from scratch. In proteins, this principle can be observed at the level of protein domains, functional subunits which are regularly rearranged to acquire new functions. RESULTS In this study we analyse the mechanisms leading to new domain arrangements in five major eukaryotic clades (vertebrates, insects, fungi, monocots and eudicots) at unprecedented depth and breadth. This allows, for the first time, to directly compare rates of rearrangements between different clades and identify both lineage specific and general patterns of evolution in the context of domain rearrangements. We analyse arrangement changes along phylogenetic trees by reconstructing ancestral domain content in combination with feasible single step events, such as fusion or fission. Using this approach we explain up to 70% of all rearrangements by tracing them back to their precursors. We find that rates in general and the ratio between these rates for a given clade in particular, are highly consistent across all clades. In agreement with previous studies, fusions are the most frequent event leading to new domain arrangements. A lineage specific pattern in fungi reveals exceptionally high loss rates compared to other clades, supporting recent studies highlighting the importance of loss for evolutionary innovation. Furthermore, our methodology allows us to link domain emergences at specific nodes in the phylogenetic tree to important functional developments, such as the origin of hair in mammals. CONCLUSIONS Our results demonstrate that domain rearrangements are based on a canonical set of mutational events with rates which lie within a relatively narrow and consistent range. 
In addition, gained knowledge about these rates provides a basis for advanced domain-based methodologies for phylogenetics and homology analysis which complement current sequence-based methods.
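The single-step events mentioned in this abstract (fusion and fission) can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `classify_event`, the tuple representation of arrangements, and the simplified definitions (a fusion is an exact concatenation of two ancestral arrangements; a fission product is a proper prefix or suffix of one) are assumptions made only for illustration, and the domain names are toy data.

```python
def classify_event(child, ancestors):
    """Classify a child domain arrangement against ancestral arrangements.

    Hypothetical helper: arrangements are sequences of domain names.
    Returns 'unchanged', 'fusion', 'fission', or None if no single-step
    explanation is found under the simplified definitions used here.
    """
    child = tuple(child)
    ancestors = {tuple(a) for a in ancestors}
    if not child:
        return None
    if child in ancestors:
        return "unchanged"
    # Fusion: the child is the concatenation of two ancestral arrangements.
    for cut in range(1, len(child)):
        if child[:cut] in ancestors and child[cut:] in ancestors:
            return "fusion"
    # Fission: the child is a proper prefix or suffix of an ancestral arrangement.
    n = len(child)
    for anc in ancestors:
        if n < len(anc) and (anc[:n] == child or anc[-n:] == child):
            return "fission"
    return None


# Toy ancestral arrangements (illustrative only).
ancestral = [("PH",), ("Pkinase",), ("SH3", "SH2", "Pkinase")]
fusion_case = classify_event(("PH", "Pkinase"), ancestral)
fission_case = classify_event(("SH3", "SH2"), ancestral)
novel_case = classify_event(("WD40",), ancestral)
```

A real analysis would work along a phylogenetic tree with reconstructed ancestral domain content at each node; this sketch only shows the per-arrangement check.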
Affiliation(s)
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany; Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
- Steffen Klasberg
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
- Sören Perrey
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
57
Wan X, Tan X. A study on separation of the protein structural types in amino acid sequence feature spaces. PLoS One 2019; 14:e0226768. [PMID: 31869390 PMCID: PMC6927603 DOI: 10.1371/journal.pone.0226768] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/16/2019] [Accepted: 12/03/2019] [Indexed: 11/23/2022]
Abstract
Proteins are diverse in their sequences, structures and functions, and it is important to study the relations among them. In this paper, we survey the relations between protein sequences and their structures. We use the natural vector (NV) and the averaged property factor (APF) features to represent protein sequences as feature vectors, and use the multi-class MSE and the convex hull methods to separate proteins of different structural classes into different regions. We found that proteins from different structural classes are separable by hyper-planes and convex hulls in the natural vector feature space, where the feature vectors of different structural classes are separated into disjoint regions or convex hulls in the high dimensional feature spaces. The natural vector outperforms the averaged property factor method in identifying the structures, and the convex hull method outperforms the multi-class MSE in separating the feature points. These outcomes support a strong connection between protein sequences and their structures, and may imply that the amino acid composition and sequence arrangement represented by the natural vectors have a greater influence on structure than the averaged physical property factors of the amino acids.
Affiliation(s)
- Xiaogeng Wan
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China
- Xinying Tan
- The Fourth Center of PLA General Hospital, Beijing, China
58
V K MA, Chandrasekaran VM, Pandurangan S. Protein Domain Level Cancer Drug Targets in the Network of MAPK Pathways. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:2057-2065. [PMID: 29993692 DOI: 10.1109/tcbb.2018.2829507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Proteins in the MAPK pathways are considered potential drug targets for cancer treatment. Together with their cross-talk, these pathways can be viewed as a network of MAPK pathways. Targeted domains that cause side effects act as proxies for drug targets because of their structural similarity and the frequent reuse of their variants. We propose identifying non-repeatable protein domains as drug targets to disrupt signal transduction, rather than targeting the whole protein. A network-based approach is used to understand the contribution of 52 domains in non-hub, non-essential, and intra-pathway cancerous nodes and to identify potential drug target domains. Thirty-four distinct domains in the cancerous proteins play vital roles in making cancer a complex disease and pose challenges for identifying potential drug targets. The distribution of domain families follows a power law in the network. Single promiscuous domains, such as Pkinase, Pkinase Tyr, and Ras, contribute to the formation of hubs. Hub nodes are positively correlated with domain coverage, and targeting them would disrupt the functional properties of the proteins. EIF 4EBP, alpha Kinase, Sel1, ROKNT, and KH 1 are identified as potential domain targets for disrupting the signaling mechanism involved in cancer.
59
Lamiable A, Bitard-Feildel T, Rebehmed J, Quintus F, Schoentgen F, Mornon JP, Callebaut I. A topology-based investigation of protein interaction sites using Hydrophobic Cluster Analysis. Biochimie 2019; 167:68-80. [PMID: 31525399 DOI: 10.1016/j.biochi.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/08/2019] [Accepted: 09/11/2019] [Indexed: 01/20/2023]
Abstract
Hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA), are conditioned binary patterns, made of hydrophobic and non-hydrophobic positions, whose limits fit well those of regular secondary structures. They were proved to be useful for predicting secondary structures in proteins from the only information of a single amino acid sequence and have permitted to assess, in a comprehensive way, the leading role of binary patterns in secondary structure preference towards a particular state. Here, we considered the available experimental 3D structures of protein globular domains to enlarge our previously reported hydrophobic cluster database (HCDB), almost doubling the number of hydrophobic cluster species (each species being defined by a unique binary pattern) that represent the most frequent structural bricks encountered within protein globular domains. We then used this updated HCDB to show that the hydrophobic amino acids of discordant clusters, i.e. those less abundant clusters for which the observed secondary structure is in disagreement with the binary pattern preference of the species to which they belong, are more exposed to solvent and are more involved in protein interfaces than the hydrophobic amino acids of concordant clusters. As amino acid composition differs between concordant/discordant clusters, considering binary patterns may be used to gain novel insights into key features of protein globular domain cores and surfaces. It can also provide useful information on possible conformational plasticity, including disorder to order transitions.
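The idea of a hydrophobic cluster as a binary pattern can be sketched in a few lines of Python. This is a simplified, one-dimensional approximation rather than the HCA implementation: the hydrophobic alphabet (V, I, L, F, M, Y, W), the rule that proline or a run of at least four non-hydrophobic residues separates clusters, and the function names are all assumptions stated here for illustration.

```python
import re

# Assumed hydrophobic alphabet for this sketch.
HYDROPHOBIC = set("VILFMYW")


def binary_pattern(seq):
    """Encode a sequence as '1' (hydrophobic) / '0' (non-hydrophobic)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq.upper())


def hydrophobic_clusters(seq, min_gap=4):
    """Return (start_index, binary_pattern) for each cluster.

    Simplifying assumptions: proline always breaks a cluster, and so does
    any run of `min_gap` or more non-hydrophobic residues.
    """
    seq = seq.upper()
    # A cluster starts and ends with '1' and tolerates inner runs of
    # fewer than `min_gap` zeros.
    cluster_re = re.compile(r"1(?:0{0,%d}1)*" % (min_gap - 1))
    found = []
    for segment in re.finditer(r"[^P]+", seq):  # split on proline
        pattern = binary_pattern(segment.group())
        for match in cluster_re.finditer(pattern):
            found.append((segment.start() + match.start(), match.group()))
    return found


# Toy sequence (illustrative only).
example_clusters = hydrophobic_clusters("AVLKAIGGGGSWPLL")
```

Grouping clusters by their binary pattern would then give the "species" referred to in the abstract; the real method also relates each pattern to secondary structure preferences, which this sketch does not attempt.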
Affiliation(s)
- Alexis Lamiable
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Joseph Rebehmed
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France; Lebanese American University, Department of Computer Science and Mathematics, Beirut, Lebanon
- Flavien Quintus
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Françoise Schoentgen
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Jean-Paul Mornon
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France
60
Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019; 18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA; Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
61
Bertolani SJ, Siegel JB. A new benchmark illustrates that integration of geometric constraints inferred from enzyme reaction chemistry can increase enzyme active site modeling accuracy. PLoS One 2019; 14:e0214126. [PMID: 30947258 PMCID: PMC6448891 DOI: 10.1371/journal.pone.0214126] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/10/2018] [Accepted: 03/07/2019] [Indexed: 01/06/2023]
Abstract
Enzymes play a critical role in a wide array of industrial, medical, and research applications, and with the recent explosion of genomic sequencing, we now have sequences for millions of enzymes for which there is no known structure. In order to utilize modern computational design tools for constructing inhibitors or engineering novel catalysts, the ability to accurately model enzymes is critical. A popular approach for modeling enzymes is comparative modeling, which can often accurately predict the global structural features. However, achieving atomic accuracy of an active site remains a challenge and is an issue when trying to utilize the molecular details for designing inhibitors or enhanced catalysts. Here we explore integrating knowledge about the required geometric orientation of conserved catalytic residues into the comparative modeling process in order to improve modeling accuracy. In order to investigate the utility of adding this information, we first carefully construct a benchmark set of reference structures to use. Consistent with previous findings, our benchmark demonstrates that the geometry between catalytic residues across an enzyme family is conserved and does not tend to deviate by more than 0.5 Å. We then find that by integrating these geometric constraints during modeling, we can double the number of atomic level accuracy models (<1 Å RMSD to the crystal structure ligand) within our benchmarking dataset, even for targets with templates as low as 20-30% sequence identity. Catalytic residues within an enzyme family are highly conserved and can often be readily identified through comparative sequence analysis to a known structure within the enzyme family. Therefore, utilizing this readily available information has the potential to significantly improve drug design and enzyme engineering efforts for which there is no known structure for the enzyme of interest.
Affiliation(s)
- Steve J. Bertolani
- Department of Chemistry, University of California Davis, Davis, California, United States of America
- Justin B. Siegel
- Department of Chemistry, University of California Davis, Davis, California, United States of America
- Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, California, United States of America
- Genome Center, University of California Davis, Davis, California, United States of America
62
Debiec KT, Whitley MJ, Koharudin LMI, Chong LT, Gronenborn AM. Integrating NMR, SAXS, and Atomistic Simulations: Structure and Dynamics of a Two-Domain Protein. Biophys J 2019; 114:839-855. [PMID: 29490245 DOI: 10.1016/j.bpj.2018.01.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/22/2017] [Revised: 12/19/2017] [Accepted: 01/02/2018] [Indexed: 12/21/2022]
Abstract
Multidomain proteins with two or more independently folded functional domains are prevalent in nature. Whereas most multidomain proteins are linked linearly in sequence, roughly one-tenth possess domain insertions where a guest domain is implanted into a loop of a host domain, such that the two domains are connected by a pair of interdomain linkers. Here, we characterized the influence of the interdomain linkers on the structure and dynamics of a domain-insertion protein in which the guest LysM domain is inserted into a central loop of the host CVNH domain. Expanding upon our previous crystallographic and NMR studies, we applied SAXS in combination with NMR paramagnetic relaxation enhancement to construct a structural model of the overall two-domain system. Although the two domains have no fixed relative orientation, certain orientations were found to be preferred over others. We also assessed the accuracies of molecular mechanics force fields in modeling the structure and dynamics of tethered multidomain proteins by integrating our experimental results with microsecond-scale atomistic molecular dynamics simulations. In particular, our evaluation of two different combinations of the latest force fields and water models revealed that both combinations accurately reproduce certain structural and dynamical properties, but are inaccurate for others. Overall, our study illustrates the value of integrating experimental NMR and SAXS studies with long timescale atomistic simulations for characterizing structural ensembles of flexibly linked multidomain systems.
Affiliation(s)
- Karl T Debiec
- Molecular Biophysics and Structural Biology Graduate Program, University of Pittsburgh and Carnegie Mellon University, Pittsburgh, Pennsylvania; Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Matthew J Whitley
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Leonardus M I Koharudin
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Lillian T Chong
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Angela M Gronenborn
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
63
Abstract
Genomes appear similar to natural language texts, and protein domains can be treated as analogs of words. To investigate the linguistic properties of genomes further, we calculated the complexity of the “protein languages” in all major branches of life and identified a nearly universal value of information gain associated with the transition from a random domain arrangement to the current protein domain architecture. An exploration of the evolutionary relationship of the protein languages identified the domain combinations that discriminate between the major branches of cellular life. We conclude that there exists a “quasi-universal grammar” of protein domains and that the nearly constant information gain we identified corresponds to the minimal complexity required to maintain a functional cell. From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. 
The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
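The information gain described in this abstract can be approximated, for the bigram case only, with a short Python sketch. It is a simplified proxy and not the authors' exact measure: it assumes proteins are given as ordered lists of domain names, takes "random domain combinations" to mean independent pairing of the observed first and second bigram positions, and reports the entropy difference in bits. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(counts):
    """Shannon entropy (bits) of a distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def bigram_information_gain(proteins):
    """Entropy of random domain pairing minus entropy of observed bigrams.

    Assumption: the random model pairs domains independently using the
    observed first-position and second-position frequencies, so the
    result equals the mutual information between adjacent domains.
    """
    bigrams = Counter()
    for domains in proteins:
        bigrams.update(zip(domains, domains[1:]))
    if not bigrams:
        raise ValueError("need at least one protein with two or more domains")
    first, second = Counter(), Counter()
    for (left, right), count in bigrams.items():
        first[left] += count
        second[right] += count
    h_random = entropy_bits(first) + entropy_bits(second)
    h_observed = entropy_bits(bigrams)
    return h_random - h_observed


# Toy proteomes (illustrative only).
structured = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"]]
unstructured = [["A", "B"], ["A", "D"], ["C", "B"], ["C", "D"]]
gain_structured = bigram_information_gain(structured)
gain_unstructured = bigram_information_gain(unstructured)
```

In the first toy proteome each domain fully determines its neighbour, so the gain is one bit; in the second every combination occurs equally often and the gain is zero.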
64
Abstract
Protein domains are reusable segments of proteins and play an important role in protein evolution. By combining the elements from a relatively small set of domains into unique arrangements, a large number of distinct proteins can be generated. Since domains often have specific functions, changes in their arrangement usually affect the overall protein function. Furthermore, domains are well amenable to computational representations, e.g., by Hidden Markov Models (HMMs), and these HMMs are widely represented in various databases. Therefore, domains can be efficiently used for proteomic analyses. Here, we describe how domains are annotated using different domain databases and then how to assess the annotation quality of proteomes. We next show how functional annotations of domains in large-scale data such as whole genomes or transcriptomes can be used to analyze molecular differences between species. Furthermore, we describe methods to analyze the changes in domain content of proteins which significantly helps to characterize and reconstruct the modular evolution of proteins. Altogether, domain-based methods offer a computationally highly effective approach to analyze large amounts of proteomic data in an evolutionary setting.
Affiliation(s)
- Carsten Kemena
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
|
65
|
Bitard‐Feildel T, Lamiable A, Mornon J, Callebaut I. Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences. Proteomics 2018; 18:e1800054. [PMID: 30299594 PMCID: PMC7168002 DOI: 10.1002/pmic.201800054] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/31/2018] [Revised: 08/29/2018] [Indexed: 12/17/2022]
Abstract
Hydrophobic cluster analysis (HCA) is an original approach for protein sequence analysis, which provides access to the foldable repertoire of the protein universe, including yet unannotated protein segments ("dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) undergoing disorder to order transitions. In this review, how HCA can be used to give insight into this last category of foldable segments is illustrated, with examples matching known 3D structures. After reviewing the HCA principles, examples of short foldable segments are given, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences generally corresponding to those observed in the 3D structures within the complexes. Such small foldable segments are sometimes larger than the segments of known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences allowing fuzziness. Cases of larger conditionally disordered domains are also presented, with lower density in hydrophobic clusters than well-folded globular domains or with exposed hydrophobic patches, which are stabilized by interaction with partners.
Affiliation(s)
- Tristan Bitard‐Feildel
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
  - Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institute of Biology Paris‐Seine (IBPS), Centre national de la recherche scientifique (CNRS), Sorbonne Université, 75005 Paris, France
- Alexis Lamiable
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Jean‐Paul Mornon
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Isabelle Callebaut
  - Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
|
66
|
Kulkarni P, Uversky VN. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. Proteomics 2018; 18:e1800061. [DOI: 10.1002/pmic.201800061] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 05/24/2018] [Revised: 09/07/2018] [Indexed: 12/27/2022]
Affiliation(s)
- Prakash Kulkarni
  - Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, CA 91010, USA
- Vladimir N. Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia
|
67
|
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
The growth rate of the protein sequence universe dramatically exceeds the speed of expansion of the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. Archaeal and bacterial proteomes are found to have relatively high structural coverage and low amounts of disorder, whereas eukaryotic and viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structural determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
68
|
Du MZ, Wei W, Qin L, Liu S, Zhang AY, Zhang Y, Zhou H, Guo FB. Co-adaption of tRNA gene copy number and amino acid usage influences translation rates in three life domains. DNA Res 2018; 24:623-633. [PMID: 28992099 PMCID: PMC5726483 DOI: 10.1093/dnares/dsx030] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/20/2017] [Accepted: 06/15/2017] [Indexed: 12/01/2022] Open
Abstract
Although a growing number of interacting participants in the translation process have been recognized, how they cooperate and co-determine the final translation efficiency remains poorly understood. Here, we reasoned that the basic translation components, tRNAs and amino acids, should be consistent with each other to maximize efficiency and minimize cost. We first showed that 310 out of 410 investigated genomes from the three domains had significant co-adaption between tRNA gene copy numbers and amino acid compositions, indicating that maximum efficiency constitutes a ubiquitous selection pressure on protein translation. Furthermore, fast-growing and larger bacteria were found to have significantly better co-adaption, confirming the effect of this pressure. Within an organism, highly expressed proteins and those connected to acute responses have higher co-adaption intensity. Thus, better co-adaption probably speeds up cell growth by accelerating the translation of particular proteins. Experimentally, manipulating the tRNA gene copy number to optimize co-adaption between enhanced green fluorescent protein (EGFP) and the tRNA gene set of Escherichia coli indeed increased the translation rate (speed). Finally, as a newly confirmed mechanism regulating translation rate, co-adaption not only deepens our understanding of the translation process but also provides an easy and practicable method to improve protein translation rates and productivity.
Affiliation(s)
- Wen Wei
  - School of Life Science and Technology
- Lei Qin
  - School of Life Science and Technology
- Shuo Liu
  - School of Life Science and Technology
- An-Ying Zhang
  - School of Life Science and Technology
  - Centre for Informational Biology
- Yong Zhang
  - School of Life Science and Technology
  - Centre for Informational Biology
- Hong Zhou
  - School of Life Science and Technology
  - Centre for Informational Biology
- Feng-Biao Guo
  - School of Life Science and Technology
  - Centre for Informational Biology
  - Key Laboratory for Neuroinformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
|
69
|
O'Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, Kaur S, Maier-Hein L, McCarthy DJ, Moore WJ, Stenau E, Swedlow JR, Vuong J, Procter JB. Visualization of Biomedical Data. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013424] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 11/09/2022]
Abstract
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Affiliation(s)
- Seán I. O'Donoghue
  - Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Kensington NSW 2033, Australia
- Benedetta Frida Baldi
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- Susan J. Clark
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- Aaron E. Darling
  - The ithree Institute, University of Technology Sydney, Ultimo NSW 2007, Australia
- James M. Hogan
  - School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane QLD, 4000, Australia
- Sandeep Kaur
  - School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington NSW 2033, Australia
- Lena Maier-Hein
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Davis J. McCarthy
  - European Bioinformatics Institute (EBI), European Molecular Biology Laboratory (EMBL), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
  - St. Vincent's Institute of Medical Research, Fitzroy VIC 3065, Australia
- William J. Moore
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
- Esther Stenau
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jason R. Swedlow
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
- Jenny Vuong
  - Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
- James B. Procter
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
|
70
|
Inhibition of protein interactions: co-crystalized protein-protein interfaces are nearly as good as holo proteins in rigid-body ligand docking. J Comput Aided Mol Des 2018; 32:769-779. [PMID: 30003468 DOI: 10.1007/s10822-018-0124-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/03/2017] [Accepted: 05/22/2018] [Indexed: 12/15/2022]
Abstract
Modulating protein interaction pathways may lead to cures for many diseases. Known protein-protein inhibitors bind to large pockets on the protein-protein interface. Such large pockets are also detected in protein-protein complexes without known inhibitors, making such complexes potentially druggable. The inhibitor-binding site is primarily defined by the side chains that form the largest pocket in the protein-bound conformation. Low-resolution ligand docking shows that the success rate for the protein-bound conformation is close to that for the ligand-bound conformation, and significantly higher than for the apo conformation. The conformational change on the protein interface upon binding to the other protein results in a pocket employed by the ligand when it binds to that interface. This proof-of-concept study suggests that rather than using computational pocket-opening procedures, one can opt for an experimentally determined structure of the target co-crystallized protein-protein complex as a starting point for drug design.
|
71
|
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~ 2 to ~ 390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
- Tristan Bitard-Feildel
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
|
72
|
Abstract
The vast, mostly unknown protein universe can be explored by analyzing protein sequences as a string of domains. A broader coverage can be achieved when these domains, the essential blocks in protein evolution, are detected using sequence profiles. Using clustering to collapse redundant profiles into unique function words (UFWs), we find that over the years 2009–2016, the number of UFWs saturates while the number of sequences matched by a combination of two or more UFWs grows exponentially. Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
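The clustering step described in this abstract can be sketched as follows. The Python snippet assumes redundancy between profiles is already known as a symmetric adjacency map, and uses a simple greedy heuristic to extract disjoint cliques so that every member of a cluster is redundant with every other member (full linkage). The greedy strategy, the function name, and the profile labels are assumptions made for illustration; the abstract does not specify the algorithm used to find maximum disjoint cliques.

```python
def disjoint_cliques(redundant):
    """Greedily partition profiles into disjoint cliques.

    redundant: dict mapping each profile to the set of profiles it is
    redundant with (assumed symmetric). Returns a list of clusters; the
    first member of each cluster serves as its representative.
    Note: this is a heuristic and does not guarantee maximum cliques.
    """
    unassigned = set(redundant)
    clusters = []
    # Seed from the most connected profiles; ties broken by name so the
    # result is deterministic.
    for seed in sorted(redundant, key=lambda p: (-len(redundant[p]), p)):
        if seed not in unassigned:
            continue
        clique = [seed]
        for cand in sorted(redundant[seed] & unassigned):
            # Full linkage: the candidate must be redundant with every
            # profile already in the clique.
            if cand != seed and all(cand in redundant[m] for m in clique):
                clique.append(cand)
        unassigned -= set(clique)
        clusters.append(clique)
    return clusters


if __name__ == "__main__":
    # Hypothetical profiles: P1-P3 are mutually redundant, P4/P5 form a
    # pair, and P6 matches no other profile.
    toy = {
        "P1": {"P2", "P3"},
        "P2": {"P1", "P3"},
        "P3": {"P1", "P2"},
        "P4": {"P5"},
        "P5": {"P4"},
        "P6": set(),
    }
    clusters = disjoint_cliques(toy)
    representatives = [c[0] for c in clusters]
    print(clusters, representatives)
```

Requiring full linkage keeps chains of partially overlapping profiles from being merged into one cluster, at the cost of producing more, smaller clusters.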
|
73
|
Validation of LDLr Activity as a Tool to Improve Genetic Diagnosis of Familial Hypercholesterolemia: A Retrospective on Functional Characterization of LDLr Variants. Int J Mol Sci 2018; 19:ijms19061676. [PMID: 29874871 PMCID: PMC6032215 DOI: 10.3390/ijms19061676] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/29/2018] [Revised: 05/28/2018] [Accepted: 06/04/2018] [Indexed: 12/11/2022] Open
Abstract
Familial hypercholesterolemia (FH) is an autosomal dominant disorder characterized by high blood-cholesterol levels, mostly caused by mutations in the low-density lipoprotein receptor (LDLr). With a prevalence as high as 1/200 in some populations, genetic screening for pathogenic LDLr mutations is a cost-effective approach in families classified as ‘definite’ or ‘probable’ FH and can aid early diagnosis. However, with over 2000 LDLr variants identified, distinguishing pathogenic mutations from benign mutations is a long-standing challenge in the field. In 1998, the World Health Organization (WHO) highlighted the importance of improving the diagnosis and prognosis of FH patients; identifying pathogenic LDLr variants is therefore essential for accurate genetic diagnosis and personalized treatment. In recent years, accessible methodologies have been developed to assess LDLr activity in vitro, providing experimental reproducibility between laboratories around the world, which ensures rigorous analysis in functional studies. In this review we present a broad spectrum of functionally characterized missense LDLr variants identified in patients with FH, which is mandatory for a definite diagnosis of FH.
|
74
|
Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018; 34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexity in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure. Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. Availability and implementation: MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brittney N Keel
  - USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
  - Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Bo Deng
  - Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
  - School of Biological Sciences and Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
|
75
|
Agyei D, Tsopmo A, Udenigwe CC. Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides. Anal Bioanal Chem 2018. [PMID: 29516135 DOI: 10.1007/s00216-018-0974-1] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 12/30/2022]
Abstract
There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discusses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.
Affiliation(s)
- Dominic Agyei
  - Department of Food Science, University of Otago, Dunedin, 9054, New Zealand
- Apollinaire Tsopmo
  - Food Science and Nutrition Program, Department of Chemistry, Carleton University, Ottawa, ON, K1S 5B6, Canada
- Chibuike C Udenigwe
  - School of Nutrition Sciences, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
|
76
|
Brown T, Brown N, Stollar EJ. Most yeast SH3 domains bind peptide targets with high intrinsic specificity. PLoS One 2018; 13:e0193128. [PMID: 29470497 PMCID: PMC5823434 DOI: 10.1371/journal.pone.0193128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/10/2017] [Accepted: 02/04/2018] [Indexed: 01/07/2023] Open
Abstract
A need exists to develop bioinformatics methods for predicting differences in protein function, especially for members of a domain family that share a common fold yet are found in a diverse array of proteins. Many domain families have been conserved over large evolutionary spans, and representative genomic data spanning these periods are now available. This allows a simple method for grouping domain sequences to reveal common and unique/specific binding residues. As such, we hypothesize that sequence alignment analysis of the yeast SH3 domain family across ancestral species in the fungal kingdom can determine whether each member encodes specific information to bind unique peptide targets. With this approach, we identify important specific residues for a given domain as those that show little conservation within an alignment of yeast domain family members (paralogs) but are conserved in an alignment of its direct relatives (orthologs). We find that most of the yeast SH3 domain family members have maintained unique amino acid conservation patterns that suggest they bind peptide targets with high intrinsic specificity through varying degrees of non-canonical recognition. For a minority of domains, we predict a less diverse binding surface, likely requiring additional factors to bind targets specifically. We observe that our predictions are consistent with high-throughput binding data, which suggests our approach can probe intrinsic binding specificity in any other interaction domain family that is maintained during evolution.
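The residue-selection rule in this abstract (conserved among orthologs, variable among paralogs) can be illustrated with a short Python sketch. It assumes the two alignments share the same column numbering, scores each column by the fraction of sequences carrying its most common residue, and uses arbitrary thresholds; the scoring scheme, thresholds, function names, and toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue per column.

    alignment: list of equal-length aligned sequences. Gap characters are
    treated like any other symbol in this simplified sketch.
    """
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def specificity_positions(orthologs, paralogs, high=0.9, low=0.5):
    """Columns conserved in the ortholog alignment (>= high) but variable
    in the paralog alignment (<= low). Thresholds are illustrative only."""
    orth = column_conservation(orthologs)
    para = column_conservation(paralogs)
    return [i for i, (o, p) in enumerate(zip(orth, para))
            if o >= high and p <= low]


if __name__ == "__main__":
    # Hypothetical four-column alignments sharing the same column indices.
    orthologs = ["ALYW", "ALYW", "ALYW"]
    paralogs = ["ALYW", "ALFD", "ALHE", "ALKR"]
    print(specificity_positions(orthologs, paralogs))
```

Columns conserved in both alignments would correspond to residues common to the whole family, while the flagged columns are candidates for member-specific binding residues.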
Affiliation(s)
- Tom Brown
  - Math and Computer Science Department, Eastern New Mexico University, Portales, NM, United States of America
- Nick Brown
  - Portales High School, Portales, NM, United States of America
- Elliott J. Stollar
  - Physical Sciences Department, Eastern New Mexico University, Portales, NM, United States of America
|
77
|
Valasatava Y, Rosato A, Furnham N, Thornton JM, Andreini C. To what extent do structural changes in catalytic metal sites affect enzyme function? J Inorg Biochem 2018; 179:40-53. [PMID: 29161638 PMCID: PMC5760197 DOI: 10.1016/j.jinorgbio.2017.11.002] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 05/26/2017] [Revised: 11/02/2017] [Accepted: 11/04/2017] [Indexed: 01/09/2023]
Abstract
About half of known enzymatic reactions involve metals. Enzymes belonging to the same superfamily often evolve to catalyze different reactions on the same structural scaffold. The work presented here investigates how functional differentiation, within superfamilies that contain metalloenzymes, relates to structural changes at the catalytic metal site. In general, when the catalytic metal site is unchanged across the enzymes of a superfamily, the functional differentiation within the superfamily tends to be low and the mechanism conserved. Conversely, all types of structural changes in the metal binding site are observed for superfamilies with high functional differentiation. Overall, the catalytic role of the metal ions appears to be one of the most conserved features of the enzyme mechanism within metalloenzyme superfamilies. In particular, when the catalytic role of the metal ion does not involve a redox reaction (i.e. there is no exchange of electrons with the substrate), this role is almost always maintained even when the site undergoes significant structural changes. In these enzymes, functional diversification is most often associated with modifications in the surrounding protein matrix, which has changed so much that the enzyme chemistry is significantly altered. On the other hand, in more than 50% of the examples where the metal has a redox role in catalysis, changes at the metal site modify its catalytic role. Further, we find that there are no examples in our dataset where metal sites with a redox role are lost during evolution. SYNOPSIS In this paper we investigate how functional diversity within superfamilies of metalloenzymes relates to structural changes at the catalytic metal site. Evolution tends to strictly conserve the metal site. When changes occur, they do not modify the catalytic role of non-redox metals whereas they affect the role of redox-active metals.
Affiliation(s)
- Yana Valasatava
  - Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
  - Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom
- Janet M Thornton
  - EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Claudia Andreini
  - Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy
|
78
|
Wu Z, Lin L, Khan M, Zhang W, Mao S, Zheng Y, Li Z, Lin JM. DNA-Mediated rolling circle amplification for ultrasensitive detection of thrombin using MALDI-TOF mass spectrometry. Chem Commun (Camb) 2018; 54:11546-11549. [DOI: 10.1039/c8cc06934d] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
A DNA-mediated rolling circle amplification (RCA) strategy was established for ultrasensitive and specific detection of thrombin via MALDI-TOF MS.
Affiliation(s)
- Zengnan Wu
- State Key Laboratory of Chemical Resource Engineering
- Beijing University of Chemical Technology
- Beijing
- China
- Department of Chemistry
| | - Ling Lin
- CAS Key Laboratory of Standardization and Measurement for Nanotechnology
- CAS Center for Excellence in Nanoscience
- National Center for Nanoscience and Technology
- Beijing
- China
| | - Mashooq Khan
- Department of Chemistry
- Beijing Key Laboratory of Microanalytical Methods and Instrumentation
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology
- Tsinghua University
- Beijing 100084
| | - Weifei Zhang
- Department of Chemistry
- Beijing Key Laboratory of Microanalytical Methods and Instrumentation
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology
- Tsinghua University
- Beijing 100084
| | - Sifeng Mao
- Department of Chemistry
- Beijing Key Laboratory of Microanalytical Methods and Instrumentation
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology
- Tsinghua University
- Beijing 100084
| | - Yajing Zheng
- Department of Chemistry
- Beijing Key Laboratory of Microanalytical Methods and Instrumentation
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology
- Tsinghua University
- Beijing 100084
| | - Zenghe Li
- State Key Laboratory of Chemical Resource Engineering
- Beijing University of Chemical Technology
- Beijing
- China
| | - Jin-Ming Lin
- Department of Chemistry
- Beijing Key Laboratory of Microanalytical Methods and Instrumentation
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology
- Tsinghua University
- Beijing 100084
|
79
|
|
80
|
Fundamentals and Methods for T- and B-Cell Epitope Prediction. J Immunol Res 2017; 2017:2680160. [PMID: 29445754 PMCID: PMC5763123 DOI: 10.1155/2017/2680160] [Citation(s) in RCA: 284] [Impact Index Per Article: 40.6] [Received: 07/27/2017] [Revised: 11/22/2017] [Accepted: 11/27/2017] [Indexed: 12/25/2022] Open
Abstract
Adaptive immunity is mediated by T- and B-cells, which are immune cells capable of developing pathogen-specific memory that confers immunological protection. Memory and effector functions of B- and T-cells are predicated on the recognition through specialized receptors of specific targets (antigens) in pathogens. More specifically, B- and T-cells recognize portions within their cognate antigens known as epitopes. There is great interest in identifying epitopes in antigens for a number of practical reasons, including understanding disease etiology, immune monitoring, developing diagnosis assays, and designing epitope-based vaccines. Epitope identification is costly and time-consuming as it requires experimental screening of large arrays of potential epitope candidates. Fortunately, researchers have developed in silico prediction methods that dramatically reduce the burden associated with epitope mapping by decreasing the list of potential epitope candidates for experimental testing. Here, we analyze aspects of antigen recognition by T- and B-cells that are relevant for epitope prediction. Subsequently, we provide a systematic and inclusive review of the most relevant B- and T-cell epitope prediction methods and tools, paying particular attention to their foundations.
|
81
|
Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. BIOLOGY 2017; 6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]
Abstract
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA.
|
82
|
Random protein sequences can form defined secondary structures and are well-tolerated in vivo. Sci Rep 2017; 7:15449. [PMID: 29133927 PMCID: PMC5684393 DOI: 10.1038/s41598-017-15635-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/07/2017] [Accepted: 10/31/2017] [Indexed: 11/17/2022] Open
Abstract
The protein sequences found in nature represent a tiny fraction of the potential sequences that could be constructed from the 20-amino-acid alphabet. To help define the properties that shaped proteins to stand out from the space of possible alternatives, we conducted a systematic computational and experimental exploration of random (unevolved) sequences in comparison with biological proteins. In our study, combinations of secondary structure, disorder, and aggregation predictions are accompanied by experimental characterization of selected proteins. We found that the overall secondary structure and physicochemical properties of random and biological sequences are very similar. Moreover, random sequences can be well-tolerated by living cells. Contrary to early hypotheses about the toxicity of random and disordered proteins, we found that random sequences with high disorder have low aggregation propensity (unlike random sequences with high structural content) and were particularly well-tolerated. This direct structure content/aggregation propensity dependence differentiates random and biological proteins. Our study indicates that while random sequences can be both structured and disordered, the properties of the latter make them better suited as progenitors (in both in vivo and in vitro settings) for further evolution of complex, soluble, three-dimensional scaffolds that can perform specific biochemical tasks.
|
83
|
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023] Open
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
|
84
|
Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017; 12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022] Open
Abstract
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
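The first screening step described in this abstract, collecting open reading frames that could encode a protein of at least 30 amino acids on either strand, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the restriction to ATG start codons (bacteria also use alternative starts such as GTG and TTG), and the coordinate convention are all assumptions made for illustration.

```python
# Hypothetical sketch: enumerate ATG...stop ORFs of >= min_aa codons on both strands.
# Assumptions (not from the source): ATG-only starts, standard stop codons,
# coordinates reported on the scanned strand (for "-" they refer to the
# reverse complement, 0-based, end-exclusive, stop codon included).

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=30):
    """Return (strand, start, end, n_aa) for each ORF with n_aa >= min_aa.

    n_aa counts codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start opens an ORF
                elif start is not None and codon in STOPS:
                    n_aa = (pos - start) // 3
                    if n_aa >= min_aa:
                        results.append((strand, start, pos + 3, n_aa))
                    start = None  # look for the next ORF in this frame
    return results


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 29 + "TAA"  # 30 codons followed by a stop
    print(find_orfs(toy))
```

In the study itself, candidate ORFs were then assessed for coverage by transcription (RNAseq) and translation (RIBOseq) signals; that evidence layer is outside the scope of this sketch.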
Affiliation(s)
- Sarah M. Hücker
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Tatyana Goldberg
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Andrea Schafferhans
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Michael Bernhofer
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Gisle Vestergaard
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Chase W. Nelson
- Sackler Institute for Comparative Genomics, American Museum of Natural History New York, New York, United States of America
| | - Michael Schloter
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Burkhard Rost
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Klaus Neuhaus
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
|
85
|
Hosseini SR, Wagner A. Constraint and Contingency Pervade the Emergence of Novel Phenotypes in Complex Metabolic Systems. Biophys J 2017; 113:690-701. [PMID: 28793223 DOI: 10.1016/j.bpj.2017.06.034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/05/2016] [Revised: 01/25/2017] [Accepted: 06/19/2017] [Indexed: 01/23/2023] Open
Abstract
An evolutionary constraint is a bias or limitation in phenotypic variation that a biological system produces. We know examples of such constraints, but we have no systematic understanding about their extent and causes for any one biological system. We here study metabolisms, genomically encoded complex networks of enzyme-catalyzed biochemical reactions, and the constraints they experience in bringing forth novel phenotypes that allow survival on novel carbon sources. Our computational approach does not limit us to analyzing constrained variation in any one organism, but allows us to quantify constraints experienced by any metabolism. Specifically, we study metabolisms that are viable on one of 50 different carbon sources, and quantify how readily alterations of their chemical reactions create the ability to survive on a novel carbon source. We find that some metabolic phenotypes are much less likely to originate than others. For example, metabolisms viable on D-glucose are 1835 times more likely to give rise to metabolisms viable on D-fructose than on acetate. Likewise, we observe that some novel metabolic phenotypes are more contingent on parental phenotypes than others. Biochemical similarities among carbon sources can help explain the causes of these constraints. In addition, we study metabolisms that can be produced by recombination among 55 metabolisms of different bacterial strains or species, and show that their novel phenotypes are also contingent on and constrained by parental genotypes. To our knowledge, our analysis is the first to systematically quantify the incidence of constrained evolution in a broad class of biological system that is central to life and its evolution.
Affiliation(s)
- Sayed-Rzgar Hosseini
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; The Swiss Institute of Bioinformatics, Bioinformatics, Lausanne, Switzerland
| | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; The Swiss Institute of Bioinformatics, Bioinformatics, Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico.
|
86
|
Romero R, Erez O, Maymon E, Chaemsaithong P, Xu Z, Pacora P, Chaiworapongsa T, Done B, Hassan SS, Tarca AL. The maternal plasma proteome changes as a function of gestational age in normal pregnancy: a longitudinal study. Am J Obstet Gynecol 2017; 217:67.e1-67.e21. [PMID: 28263753 PMCID: PMC5813489 DOI: 10.1016/j.ajog.2017.02.037] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 12/28/2016] [Revised: 02/10/2017] [Accepted: 02/23/2017] [Indexed: 12/21/2022]
Abstract
OBJECTIVE Pregnancy is accompanied by dramatic physiological changes in maternal plasma proteins. Characterization of the maternal plasma proteome in normal pregnancy is an essential step for understanding changes to predict pregnancy outcome. The objective of this study was to describe maternal plasma proteins that change in abundance with advancing gestational age and determine biological processes that are perturbed in normal pregnancy. STUDY DESIGN A longitudinal study included 43 normal pregnancies that had a term delivery of an infant who was appropriate for gestational age without maternal or neonatal complications. For each pregnancy, 3 to 6 maternal plasma samples (median, 5) were profiled to measure the abundance of 1125 proteins using multiplex assays. Linear mixed-effects models with polynomial splines were used to model protein abundance as a function of gestational age, and the significance of the association was inferred via likelihood ratio tests. Proteins considered to be significantly changed were defined as having the following: (1) >1.5-fold change between 8 and 40 weeks of gestation; and (2) a false discovery rate-adjusted value of P < .1. Gene ontology enrichment analysis was used to identify biological processes overrepresented among the proteins that changed with advancing gestation. 
RESULTS The following results were found: (1) Ten percent (112 of 1125) of the profiled proteins changed in abundance as a function of gestational age; (2) of the 1125 proteins analyzed, glypican-3, sialic acid-binding immunoglobulin-type lectin-6, placental growth factor, C-C motif-28, carbonic anhydrase 6, prolactin, interleukin-1 receptor 4, dual-specificity mitogen-activated protein kinase 4, and pregnancy-associated plasma protein-A had more than a 5-fold change in abundance across gestation (these 9 proteins are known to be involved in a wide range of both physiological and pathological processes, such as growth regulation, embryogenesis, angiogenesis, immunoregulation, and inflammation); and (3) biological processes associated with protein changes in normal pregnancy included defense response, defense response to bacteria, proteolysis, and leukocyte migration (false discovery rate, 10%). CONCLUSION The plasma proteome of normal pregnancy changes dramatically, both in the magnitude of the changes and in the fraction of proteins involved. Such information is important for understanding the physiology of pregnancy and for developing biomarkers that differentiate normal from abnormal pregnancy and determine the response to interventions.
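The two-part significance rule in the study design (more than a 1.5-fold change and a false discovery rate-adjusted P < .1) can be sketched as a simple filter. The abstract does not name the adjustment procedure; the sketch below assumes the Benjamini-Hochberg method, treats a fold change symmetrically (a drop to 0.5 counts as 2-fold), and uses hypothetical function names and made-up toy values.

```python
# Hypothetical sketch of a fold-change + FDR filter.
# Assumptions (not from the source): Benjamini-Hochberg adjustment,
# symmetric fold change, positive fold-change ratios as input.

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_proteins(names, fold_changes, p_values, min_fold=1.5, max_fdr=0.1):
    """Keep proteins exceeding min_fold (in either direction) with adjusted p < max_fdr."""
    q_values = benjamini_hochberg(p_values)
    return [
        name
        for name, fc, q in zip(names, fold_changes, q_values)
        if max(fc, 1.0 / fc) > min_fold and q < max_fdr
    ]


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    names = ["A", "B", "C", "D"]
    fold_changes = [6.0, 1.2, 0.5, 2.0]
    p_values = [0.001, 0.001, 0.02, 0.5]
    print(significant_proteins(names, fold_changes, p_values))
```

In the study, the per-protein P values came from likelihood ratio tests on linear mixed-effects models; the filter above only covers the thresholding step applied afterwards.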
Affiliation(s)
- Roberto Romero
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI.
| | - Offer Erez
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Eli Maymon
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Piya Chaemsaithong
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Zhonghui Xu
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
| | - Percy Pacora
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Tinnakorn Chaiworapongsa
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Bogdan Done
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI
| | - Sonia S Hassan
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI
| | - Adi L Tarca
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, and Detroit, MI; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI.
|
87
|
Eggermont L, Verstraeten B, Van Damme EJM. Genome-Wide Screening for Lectin Motifs in Arabidopsis thaliana. THE PLANT GENOME 2017; 10. [PMID: 28724081 DOI: 10.3835/plantgenome2017.02.0010] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 05/11/2023]
Abstract
For more than three decades, Arabidopsis thaliana has served as a model for plant biology research. At present only a few protein families have been studied in detail in A. thaliana. This study focused on all sequences with lectin motifs in the genome of A. thaliana. Based on amino acid sequence similarity (BLASTp searches), 217 putative lectin genes were retrieved belonging to 9 out of 12 different lectin families. The domain organization and genomic distribution for each lectin family were analyzed. Domain architecture analysis revealed that most of these lectin gene sequences are linked to other domains, often belonging to protein families with catalytic activity. Many protein domains identified are known to play a role in stress signaling and defense, suggesting a major contribution of the putative lectins in development and plant defense. This genome-wide screen for different lectin motifs will help to unravel the functional characteristics of lectins. In addition, phylogenetic trees and WebLogos were created and showed that most lectin sequences that share the same domain architecture evolved together. Furthermore, the amino acids responsible for carbohydrate binding are largely conserved. Our results provide information about the evolutionary relationships and functional divergence of the lectin motifs in A. thaliana.
|
88
|
Wang J, Kawasaki R, Uewaki JI, Rashid AUR, Tochio N, Tate SI. Dynamic Allostery Modulates Catalytic Activity by Modifying the Hydrogen Bonding Network in the Catalytic Site of Human Pin1. Molecules 2017; 22:molecules22060992. [PMID: 28617332 PMCID: PMC6152768 DOI: 10.3390/molecules22060992] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/28/2017] [Revised: 06/13/2017] [Accepted: 06/13/2017] [Indexed: 02/01/2023] Open
Abstract
Allosteric communication among domains in modular proteins consisting of flexibly linked domains with complementary roles remains poorly understood. To understand how complementary domains communicate, we have studied human Pin1, a representative modular protein with two domains mutually tethered by a flexible linker: a WW domain for substrate recognition and a peptidyl-prolyl isomerase (PPIase) domain. Previous studies of Pin1 showed that physical contact between the domains causes dynamic allostery by reducing conformation dynamics in the catalytic domain, which compensates for the entropy costs of substrate binding to the catalytic site and thus increases catalytic activity. In this study, the PPIase domain carrying the S138A mutation, which mimics the structural impact of the interdomain contact, was demonstrated to display dynamic allostery by rigidification of the α2-α3 loop that harbors the key catalytic residue C113. The reduced dynamics of the α2-α3 loop stabilizes the C113-H59 hydrogen bond in the hydrogen-bonding network of the catalytic site. The stabilized hydrogen bond between C113 and H59 retards initiation of isomerization, which explains the ~20% reduction in isomerization rate caused by the S138A mutation. These results provide new insight into the interdomain allosteric communication of Pin1.
Affiliation(s)
- Jing Wang
- Department of Mathematical and Life Sciences, School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
| | - Ryosuke Kawasaki
- Department of Mathematical and Life Sciences, School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
| | - Jun-Ichi Uewaki
- Research Center for the Mathematics on Chromatin Live Dynamics (RcMcD), Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
| | - Arif U R Rashid
- Department of Mathematical and Life Sciences, School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
| | - Naoya Tochio
- Research Center for the Mathematics on Chromatin Live Dynamics (RcMcD), Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
| | - Shin-Ichi Tate
- Department of Mathematical and Life Sciences, School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
- Research Center for the Mathematics on Chromatin Live Dynamics (RcMcD), Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima 739-8526, Japan.
|
89
|
Van Holle S, De Schutter K, Eggermont L, Tsaneva M, Dang L, Van Damme EJM. Comparative Study of Lectin Domains in Model Species: New Insights into Evolutionary Dynamics. Int J Mol Sci 2017; 18:ijms18061136. [PMID: 28587095 PMCID: PMC5485960 DOI: 10.3390/ijms18061136] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/06/2017] [Revised: 05/20/2017] [Accepted: 05/22/2017] [Indexed: 01/07/2023] Open
Abstract
Lectins are present throughout the plant kingdom and are reported to be involved in diverse biological processes. In this study, we provide a comparative analysis of the lectin families from model species in a phylogenetic framework. The analysis focuses on the different plant lectin domains identified in five representative core angiosperm genomes (Arabidopsis thaliana, Glycine max, Cucumis sativus, Oryza sativa ssp. japonica and Oryza sativa ssp. indica). The genomes were screened for genes encoding lectin domains using a combination of Basic Local Alignment Search Tool (BLAST), hidden Markov models, and InterProScan analysis. Additionally, phylogenetic relationships were investigated by constructing maximum likelihood phylogenetic trees. The results demonstrate that the majority of the lectin families are present in each of the species under study. Domain organization analysis showed that most identified proteins are multi-domain proteins, owing to the modular rearrangement of protein domains during evolution. Most of these multi-domain proteins are widespread, while others display a lineage-specific distribution. Furthermore, the phylogenetic analyses reveal that some lectin families evolved to be similar to the phylogeny of the plant species, while others share a closer evolutionary history based on the corresponding protein domain architecture. Our results yield insights into the evolutionary relationships and functional divergence of plant lectins.
Affiliation(s)
- Sofie Van Holle
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Kristof De Schutter
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Lore Eggermont
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Mariya Tsaneva
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Liuyi Dang
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| | - Els J M Van Damme
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
| |
Collapse
|
90
|
Wan X, Zhao X, Yau SST. An information-based network approach for protein classification. PLoS One 2017; 12:e0174386. [PMID: 28350835 PMCID: PMC5370107 DOI: 10.1371/journal.pone.0174386] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/23/2016] [Accepted: 03/08/2017] [Indexed: 11/25/2022] Open
Abstract
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method.
Affiliation(s)
- Xiaogeng Wan
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Xin Zhao
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Stephen S. T. Yau
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
|
91
|
Popovic A, Hai T, Tchigvintsev A, Hajighasemi M, Nocek B, Khusnutdinova AN, Brown G, Glinos J, Flick R, Skarina T, Chernikova TN, Yim V, Brüls T, Paslier DL, Yakimov MM, Joachimiak A, Ferrer M, Golyshina OV, Savchenko A, Golyshin PN, Yakunin AF. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families. Sci Rep 2017; 7:44103. [PMID: 28272521 PMCID: PMC5341072 DOI: 10.1038/srep44103] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 12/01/2016] [Accepted: 02/01/2017] [Indexed: 11/29/2022] Open
Abstract
Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools.
Affiliation(s)
- Ana Popovic
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Tran Hai
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Anatoly Tchigvintsev
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Mahbod Hajighasemi
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Boguslaw Nocek
  - Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
- Anna N Khusnutdinova
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Greg Brown
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Julia Glinos
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Robert Flick
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Tatiana Skarina
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Veronica Yim
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Thomas Brüls
  - Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Direction de la Recherche Fondamentale, Institut de Génomique, Université d'Evry Val d'Essonne (UEVE), Centre National de la Recherche Scientifique (CNRS), UMR8030, Génomique métabolique, Evry, France
- Denis Le Paslier
  - Université d'Evry Val d'Essonne (UEVE), Centre National de la Recherche Scientifique (CNRS), UMR8030, Génomique métabolique, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Direction de la Recherche Fondamentale, Institut de Génomique, Evry, France
- Andrzej Joachimiak
  - Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
- Olga V Golyshina
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Alexei Savchenko
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Peter N Golyshin
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Alexander F Yakunin
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
|
92
|
Anishchenko I, Kundrotas PJ, Vakser IA. Modeling complexes of modeled proteins. Proteins 2017; 85:470-478. [PMID: 27701777 PMCID: PMC5313347 DOI: 10.1002/prot.25183] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 07/09/2016] [Revised: 09/22/2016] [Accepted: 10/02/2016] [Indexed: 12/21/2022]
Abstract
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Ivan Anishchenko
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas 66047, USA
- Petras J. Kundrotas
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas 66047, USA
- Ilya A. Vakser
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas 66047, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047, USA
|
93
|
Van Holle S, Rougé P, Van Damme EJM. Evolution and structural diversification of Nictaba-like lectin genes in food crops with a focus on soybean (Glycine max). Ann Bot 2017; 119:901-914. [PMID: 28087663 PMCID: PMC5379587 DOI: 10.1093/aob/mcw259] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/02/2016] [Revised: 10/24/2016] [Accepted: 11/17/2016] [Indexed: 05/10/2023]
Abstract
Background and Aims The Nictaba family groups all proteins that show homology to Nictaba, the tobacco lectin. So far, Nictaba and an Arabidopsis thaliana homologue have been shown to be implicated in the plant stress response. The availability of more than 50 sequenced plant genomes provided the opportunity for a genome-wide identification of Nictaba-like genes in 15 species, representing members of the Fabaceae, Poaceae, Solanaceae, Musaceae, Arecaceae, Malvaceae and Rubiaceae. Additionally, phylogenetic relationships between the different species were explored. Furthermore, this study included domain organization analysis, searching for orthologous genes in the legume family and transcript profiling of the Nictaba-like lectin genes in soybean.
Methods Using a combination of BLASTp, InterPro analysis and hidden Markov models, the genomes of Medicago truncatula, Cicer arietinum, Lotus japonicus, Glycine max, Cajanus cajan, Phaseolus vulgaris, Theobroma cacao, Solanum lycopersicum, Solanum tuberosum, Coffea canephora, Oryza sativa, Zea mays, Sorghum bicolor, Musa acuminata and Elaeis guineensis were searched for Nictaba-like genes. Phylogenetic analysis was performed using RAxML and additional protein domains in the Nictaba-like sequences were identified using InterPro. Expression analysis of the soybean Nictaba-like genes was investigated using microarray data.
Key Results Nictaba-like genes were identified in all studied species and analysis of the duplication events demonstrated that both tandem and segmental duplication contributed to the expansion of the Nictaba gene family in angiosperms. The single-domain Nictaba protein and the multi-domain F-box Nictaba architectures are ubiquitous among all analysed species and microarray analysis revealed differential expression patterns for all soybean Nictaba-like genes.
Conclusions Taken together, the comparative genomics data contributes to our understanding of the Nictaba-like gene family in species for which the occurrence of Nictaba domains had not yet been investigated. Given the ubiquitous nature of these genes, they have probably acquired new functions over time and are expected to take on various roles in plant development and defence.
Affiliation(s)
- Sofie Van Holle
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Pierre Rougé
  - UMR 152 PHARMA-DEV, Université de Toulouse, IRD, UPS, Chemin des Maraîchers 35, 31400 Toulouse, France
- Els J. M. Van Damme
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
|
94
|
Wu R, Wilton R, Cuff ME, Endres M, Babnigg G, Edirisinghe JN, Henry CS, Joachimiak A, Schiffer M, Pokkuluri PR. A novel signal transduction protein: Combination of solute binding and tandem PAS-like sensor domains in one polypeptide chain. Protein Sci 2017; 26:857-869. [PMID: 28168783 DOI: 10.1002/pro.3134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/08/2016] [Revised: 01/27/2017] [Accepted: 02/02/2017] [Indexed: 11/07/2022]
Abstract
We report the structural and biochemical characterization of a novel periplasmic ligand-binding protein, Dret_0059, from Desulfohalobium retbaense DSM 5692, an organism isolated from Lake Retba, in Senegal. The structure of the protein consists of a unique combination of a periplasmic solute binding protein (SBP) domain at the N-terminal and a tandem PAS-like sensor domain at the C-terminal region. SBP domains are found ubiquitously, and their best known function is in solute transport across membranes. PAS-like sensor domains are commonly found in signal transduction proteins. These domains are widely observed as parts of many protein architectures and complexes but have not been observed previously within the same polypeptide chain. In the structure of Dret_0059, a ketoleucine moiety is bound to the SBP, whereas a cytosine molecule is bound in the distal PAS-like domain of the tandem PAS-like domain. Differential scanning fluorimetry supports the binding of ligands observed in the crystal structure. There is significant interaction between the SBP and tandem PAS-like domains, and it is possible that the binding of one ligand could have an effect on the binding of the other. We uncovered three other proteins with this structural architecture in the non-redundant sequence database, and predict that they too bind the same substrates. The genomic context of this protein did not offer any clues for its function. We did not find any biological process in which the two observed ligands are coupled. The protein Dret_0059 could be involved in either signal transduction or solute transport.
Affiliation(s)
- R Wu
  - Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, Illinois, 60439
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
- R Wilton
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
- M E Cuff
  - Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, Illinois, 60439
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
  - Structural Biology Center, Argonne National Laboratory, Argonne, Illinois, 60439
- M Endres
  - Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, Illinois, 60439
- G Babnigg
  - Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, Illinois, 60439
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
- J N Edirisinghe
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 60439
  - Computation Institute, University of Chicago, Chicago, Illinois, 60637
- C S Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 60439
  - Computation Institute, University of Chicago, Chicago, Illinois, 60637
- A Joachimiak
  - Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, Illinois, 60439
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
  - Structural Biology Center, Argonne National Laboratory, Argonne, Illinois, 60439
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, 60637
- M Schiffer
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
- P R Pokkuluri
  - Biosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439
|
95
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
96
|
|
97
|
Cordeiro TN, Herranz-Trillo F, Urbanek A, Estaña A, Cortés J, Sibille N, Bernadó P. Structural Characterization of Highly Flexible Proteins by Small-Angle Scattering. Adv Exp Med Biol 2017; 1009:107-129. [DOI: 10.1007/978-981-10-6038-0_7] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022]
|
98
|
Dang L, Van Damme EJM. Genome-wide identification and domain organization of lectin domains in cucumber. Plant Physiol Biochem 2016; 108:165-176. [PMID: 27434144 DOI: 10.1016/j.plaphy.2016.07.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/07/2016] [Revised: 07/04/2016] [Accepted: 07/09/2016] [Indexed: 05/21/2023]
Abstract
Lectins are ubiquitous proteins in plants and play important roles in a diverse set of biological processes, such as plant defense and cell signaling. Despite the availability of the Cucumis sativus L. genome sequence since 2009, little is known with respect to the occurrence of lectins in cucumber. In this study, a total of 146 putative lectin genes belonging to 10 different lectin families were identified and localized in the cucumber genome. Domain architecture analysis revealed that most of these lectin gene sequences contain multiple domains, where lectin domains are linked with other domains, as such creating chimeric lectin sequences encoding proteins with dual activities. This study provides an overview of lectin motifs in cucumber and will help to understand their potential biological role(s).
Affiliation(s)
- Liuyi Dang
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Els J M Van Damme
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
|
99
|
Zhong Y, Cheng ZMM. A unique RPW8-encoding class of genes that originated in early land plants and evolved through domain fission, fusion, and duplication. Sci Rep 2016; 6:32923. [PMID: 27678195 PMCID: PMC5039405 DOI: 10.1038/srep32923] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 12/18/2015] [Accepted: 08/16/2016] [Indexed: 01/17/2023] Open
Abstract
Duplication, lateral gene transfer, domain fusion/fission and de novo domain creation play a key role in formation of initial common ancestral protein. Abundant protein diversities are produced by domain rearrangements, including fusions, fissions, duplications, and terminal domain losses. In this report, we explored the origin of the RPW8 domain and examined the domain rearrangements that have driven the evolution of RPW8-encoding genes in land plants. The RPW8 domain first emerged in the early land plant, Physcomitrella patens, and it likely originated de novo from a non-coding sequence or domain divergence after duplication. It was then incorporated into the NBS-LRR protein to create a main sub-class of RPW8-encoding genes, the RPW8-NBS-encoding genes. They evolved by a series of genetic events of domain fissions, fusions, and duplications. Many species-specific duplication events and tandemly duplicated clusters clearly demonstrated that species-specific and tandem duplications played important roles in expansion of RPW8-encoding genes, especially in gymnosperms and species of the Rosaceae. RPW8 domains with greater Ka/Ks values than those of the NBS domains indicated that they evolved faster than the NBS domains in RPW8-NBSs.
Affiliation(s)
- Yan Zhong
  - College of Horticulture, Nanjing Agricultural University, Nanjing, 210095, China
- Zong-Ming Max Cheng
  - College of Horticulture, Nanjing Agricultural University, Nanjing, 210095, China
  - Department of Plant Science, University of Tennessee, Knoxville, 37996, USA
|
100
|
Palamini M, Canciani A, Forneris F. Identifying and Visualizing Macromolecular Flexibility in Structural Biology. Front Mol Biosci 2016; 3:47. [PMID: 27668215 PMCID: PMC5016524 DOI: 10.3389/fmolb.2016.00047] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/02/2016] [Accepted: 08/22/2016] [Indexed: 12/29/2022] Open
Abstract
Structural biology comprises a variety of tools to obtain atomic resolution data for the investigation of macromolecules. Conventional structural methodologies including crystallography, NMR and electron microscopy often do not provide sufficient details concerning flexibility and dynamics, even though these aspects are critical for the physiological functions of the systems under investigation. However, the increasing complexity of the molecules studied by structural biology (including large macromolecular assemblies, integral membrane proteins, intrinsically disordered systems, and folding intermediates) continuously demands in-depth analyses of the roles of flexibility and conformational specificity involved in interactions with ligands and inhibitors. The intrinsic difficulties in capturing often subtle but critical molecular motions in biological systems have restrained the investigation of flexible molecules into a small niche of structural biology. Introduction of massive technological developments over the recent years, which include time-resolved studies, solution X-ray scattering, and new detectors for cryo-electron microscopy, have pushed the limits of structural investigation of flexible systems far beyond traditional approaches of NMR analysis. By integrating these modern methods with powerful biophysical and computational approaches such as generation of ensembles of molecular models and selective particle picking in electron microscopy, more feasible investigations of dynamic systems are now possible. Using some prominent examples from recent literature, we review how current structural biology methods can contribute useful data to accurately visualize flexibility in macromolecular structures and understand its important roles in regulation of biological processes.
Affiliation(s)
- Federico Forneris
  - The Armenise-Harvard Laboratory of Structural Biology, Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
|