1
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes - the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
2
Stetler-Stevenson WG. The Continuing Saga of Tissue Inhibitor of Metalloproteinase 2: Emerging Roles in Tissue Homeostasis and Cancer Progression. Am J Pathol 2023; 193:1336-1352. [PMID: 37572947 PMCID: PMC10548276 DOI: 10.1016/j.ajpath.2023.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/19/2023] [Revised: 07/26/2023] [Accepted: 08/01/2023] [Indexed: 08/14/2023]
Abstract
Tissue inhibitors of metalloproteinases (TIMPs) are a conserved family of proteins that were originally identified as cytokine-like erythroid growth factors. Subsequently, TIMPs were characterized as endogenous inhibitors of matrixin proteinases. These proteinases are the primary mediators of extracellular matrix turnover in pathologic conditions, such as cancer invasion and metastasis. Thus, TIMPs were immediately recognized as important regulators of tissue homeostasis. However, TIMPs also demonstrate unique biological activities that are independent of metalloproteinase regulation. Although often overlooked, these non-protease-mediated TIMP functions demonstrate a variety of direct cellular effects of potential therapeutic value. TIMP2 is the most abundantly expressed TIMP family member, and ongoing studies show that its tumor suppressor activity extends beyond protease inhibition to include direct modulation of tumor, endothelial, and fibroblast cellular responses in the tumor microenvironment. Recent data suggest that TIMP2 can suppress both primary tumor growth and metastatic niche formation. TIMP2 directly interacts with cellular receptors and matrisome elements to modulate cell signaling pathways that result in reduced proliferation and migration of neoplastic, endothelial, and fibroblast cell populations. These effects result in enhanced cell adhesion and focal contact formation while reducing tumor and endothelial proliferation, migration, and epithelial-to-mesenchymal transitions. These findings are consistent with TIMP2 homeostatic functions beyond simple inhibition of metalloprotease activity. This review examines the ongoing evolution of TIMP2 function, future perspectives in TIMP research, and the therapeutic potential of TIMP2.
Affiliation(s)
- William G Stetler-Stevenson
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.
3
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
Genes duplicate, mutate, recombine, fuse or undergo fission to produce new genes, and novel functions arise during evolution when genes are formed in these ways or de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how genes increase molecular complexity over time, for instance through protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In prokaryotes and eukaryotes, domain architectures have been shown to be retained over significant evolutionary distances. Protein domain architectures are now used to categorize and distinguish evolutionarily related proteins and to find homologs among evolutionarily distant species. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence and domain position, as all of these contribute to accurate prediction of protein function. In this review, an attempt is made to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
- Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
- Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India.
4
Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023; 9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023]
Abstract
In fungi, the most abundant transcription factor (TF) class contains a fungal-specific ‘GAL4-like’ Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as ‘fungal_trans’ or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these ‘MHD-only’ proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors, and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that MHD-only TFs are widespread in fungi. Instead, we show that they are exceptional cases, and that the fungal-specific Zn2C6–MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after its best-characterized members: Cep3, whose 3D structure has been determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TFs but will also provide critical guidance for future analyses of fungal gene regulatory networks.
Affiliation(s)
- Claudine Mayer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Faculté des Sciences, Université Paris Cité, UFR Sciences du Vivant, 75013 Paris, France
- Correspondence: (C.M.); (J.D.T.)
- Arthur Vogt
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Tuba Uslu
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Julie D. Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Correspondence: (C.M.); (J.D.T.)
5
Taha Tolba EAEH, Ahmed Amer HZ. In silico Analysis of Tyrosine Kinases Receptor in Papillary and Medullary Thyroid Cancer Using Sequence-alignment-based Methods. Biotechnology (Faisalabad) 2023; 22:18-27. [DOI: 10.3923/biotech.2023.18.27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
6
Tong CL, Kanwar N, Morrone DJ, Seelig B. Nature-inspired engineering of an artificial ligase enzyme by domain fusion. Nucleic Acids Res 2022; 50:11175-11185. [PMID: 36243966 PMCID: PMC9638898 DOI: 10.1093/nar/gkac858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 08/30/2022] [Accepted: 09/26/2022] [Indexed: 11/20/2022]
Abstract
The function of most proteins is accomplished through the interplay of two or more protein domains, fine-tuned by natural evolution. In contrast, artificial enzymes have often been engineered from a single-domain scaffold and frequently have lower catalytic activity than natural enzymes. We previously generated an artificial enzyme that accelerated RNA ligation by more than 2 million-fold but was likely limited in its activity by low substrate affinity. Inspired by nature's concept of domain fusion, we fused the artificial enzyme to a series of protein domains known to bind nucleic acids, with the goal of improving its catalytic activity. The effect of the fused domains on catalytic activity varied greatly, yielding severalfold increases but also reductions, the latter caused by domains that had enhanced nucleic acid binding in other protein engineering projects. The combination of the two best-performing binding domains improved the activity of the parental ligase by more than an order of magnitude. These results demonstrate for the first time that nature's successful evolutionary mechanism of domain fusion can also improve an unevolved, primordial-like protein whose structure and function had just been created in the test tube. The generation of multi-domain proteins might therefore be an ancient evolutionary process.
Affiliation(s)
- Cher Ling Tong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Nisha Kanwar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Dana J Morrone
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Burckhard Seelig
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
7
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence; the latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
8
Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021; 118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/10/2021] [Indexed: 01/20/2023]
Abstract
Exchanges of protein sequence modules support leaps in function that are unavailable through point mutations during evolution. Here we study the role of the two RAD51-interacting modules within the eight binding BRC repeats of BRCA2. We created 64 chimeric repeats by shuffling these modules and measured their binding to RAD51. We found that certain shuffled module combinations were stronger binders than any of the module combinations in the natural repeats. Surprisingly, the contribution from the two modules was poorly correlated with the affinities of the natural repeats, with the weak BRC8 repeat containing the most effective N-terminal module. The binding free energy of the strongest chimera, BRC8-2, to RAD51 was 2.4 kcal/mol more favorable than that of the strongest natural repeat, BRC4. A crystal structure of the RAD51:BRC8-2 complex shows an improved interface fit and an extended β-hairpin in this repeat. BRC8-2 was shown to function in human cells, preventing the formation of nuclear RAD51 foci after ionizing radiation.
Affiliation(s)
- Laurens H Lindenburg
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Teodors Pantelejevs
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Fabrice Gielen
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Living Systems Institute, University of Exeter, Exeter EX4 4QD, United Kingdom
- Pedro Zuazua-Villar
- Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maren Butz
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Eric Rees
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Clemens F Kaminski
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Jessica A Downs
- Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marko Hyvönen
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
9
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how coevolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as archaeologists' use of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298, whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms make it easier to appreciate domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative coevolution-based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
- Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
10
Ma P, Luo T, Ge L, Chen Z, Wang X, Zhao R, Liao W, Bao L. Compensatory effects of M. tuberculosis rpoB mutations outside the rifampicin resistance-determining region. Emerg Microbes Infect 2021; 10:743-752. [PMID: 33775224 PMCID: PMC8057087 DOI: 10.1080/22221751.2021.1908096] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/25/2023]
Abstract
Mycobacterium tuberculosis has been observed to develop resistance to the frontline anti-tuberculosis drug rifampicin, primarily through mutations in the rifampicin resistance-determining region (RRDR) of rpoB. While these mutations have been shown to confer a fitness cost, compensatory mutations in rpoA and rpoC that may enhance the fitness of resistant strains have also been demonstrated. Recent genomic studies identified several rpoB non-RRDR mutations that co-occurred with RRDR mutations in clinical isolates lacking rpoA/rpoC mutations and that may confer fitness compensation. In this study, we identified 33 evolutionarily convergent rpoB non-RRDR mutations through phylogenomic analysis of public genomic data for clinical M. tuberculosis isolates. We found that none of these mutations, except V170F and I491F, can cause rifampicin resistance in Mycolicibacterium smegmatis. The compensatory effects of five representative mutations across rpoB were evaluated by an in vitro competition assay, through which we observed that each of these mutations can significantly improve the relative fitness of the initial S450L mutant (0.97–1.08 vs 0.87). Furthermore, we observed that the decreased RNAP transcription efficiency introduced by S450L was significantly alleviated by each of the five mutations. Structural analysis indicated that the fitness compensation observed for the non-RRDR mutations might be achieved by modification of the RpoB active centre or by changes in interactions between RNAP subunits. Our results provide experimental evidence that compensatory effects are exerted by several rpoB non-RRDR mutations, which could be used as additional molecular markers for predicting the fitness of clinical rifampicin-resistant M. tuberculosis strains.
Affiliation(s)
- Pengjiao Ma
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Tao Luo
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Liang Ge
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Zonghai Chen
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Xinyan Wang
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Rongchuan Zhao
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Wei Liao
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
- Lang Bao
- Laboratory of Infection and Immunity, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, People's Republic of China
11
Monzon V, Lafita A, Bateman A. Discovery of fibrillar adhesins across bacterial species. BMC Genomics 2021; 22:550. [PMID: 34275445 PMCID: PMC8286594 DOI: 10.1186/s12864-021-07586-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/21/2020] [Accepted: 04/07/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Fibrillar adhesins are long multidomain proteins that form filamentous structures at the cell surface of bacteria. They are an important yet understudied class of proteins composed of adhesive and stalk domains that mediate interactions of bacteria with their environment. This study aims to characterize fibrillar adhesins in a wide range of bacterial phyla and to identify new fibrillar adhesin-like proteins to improve our understanding of host-bacteria interactions. RESULTS: Through careful literature and computational searches, we identified 82 stalk and 27 adhesive domain families in fibrillar adhesins. Based on the presence of these domains in the UniProt Reference Proteomes database, we identified and analysed 3,542 fibrillar adhesin-like proteins across species of the most common bacterial phyla. We further enumerate the adhesive and stalk domain combinations found in nature and demonstrate that fibrillar adhesins have complex and variable domain architectures, which differ across species. By analysing the domain architecture of fibrillar adhesins, we show that in Gram-positive bacteria, adhesive domains are mostly positioned at the N-terminus and cell surface anchors at the C-terminus of the protein, while their positions are more variable in Gram-negative bacteria. We provide an open repository of fibrillar adhesin-like proteins and domains to enable further studies of this class of bacterial surface proteins. CONCLUSION: This study provides a domain-based characterization of fibrillar adhesins and demonstrates that they are widely found in species across the main bacterial phyla. We have discovered numerous novel fibrillar adhesins and improved our understanding of pathogenic adhesion and invasion mechanisms.
Affiliation(s)
- Vivian Monzon
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK.
- Aleix Lafita
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
12
Weadick CJ. Molecular Evolutionary Analysis of Nematode Zona Pellucida (ZP) Modules Reveals Disulfide-Bond Reshuffling and Standalone ZP-C Domains. Genome Biol Evol 2020; 12:1240-1255. [PMID: 32426804 PMCID: PMC7456536 DOI: 10.1093/gbe/evaa095] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 05/13/2020] [Indexed: 12/11/2022]
Abstract
Zona pellucida (ZP) modules mediate extracellular protein-protein interactions and contribute to important biological processes including syngamy and cellular morphogenesis. Although some biomedically relevant ZP modules are well studied, little is known about the protein family's broad-scale diversity and evolution. The increasing availability of sequenced genomes from "nonmodel" systems provides a valuable opportunity to address this issue and to use comparative approaches to gain new insights into ZP module biology. Here, through phylogenetic and structural exploration of ZP module diversity across the nematode phylum, I report evidence that speaks to two important aspects of ZP module biology. First, I show that ZP-C domains, which in some modules act as regulators of ZP-N domain-mediated polymerization activity and which had never before been found in isolation, can indeed be found as standalone domains. These standalone ZP-C domain proteins originated in independent (paralogous) lineages prior to the diversification of extant nematodes, after which they evolved under strong stabilizing selection, suggesting the presence of ZP-N domain-independent functionality. Second, I provide a much-needed phylogenetic perspective on disulfide bond variability, uncovering evidence for both convergent evolution and disulfide-bond reshuffling. This result has implications for our evolutionary understanding and classification of ZP module structural diversity and highlights the usefulness of phylogenetics and diverse sampling for protein structural biology. All told, these findings set the stage for broad-scale (cross-phyla) evolutionary analysis of ZP modules and position Caenorhabditis elegans and other nematodes as important experimental systems for exploring the evolution of ZP modules and their constituent domains.
13
Vicedomini R, Blachon C, Oteri F, Carbone A. MyCLADE: a multi-source domain annotation server for sequence functional exploration. Nucleic Acids Res 2021; 49:W452-W458. [PMID: 34023906 PMCID: PMC8262732 DOI: 10.1093/nar/gkab395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2021] [Revised: 04/27/2021] [Accepted: 04/29/2021] [Indexed: 11/13/2022]
Abstract
The ever-increasing number of genomic and metagenomic sequences accumulating in our databases requires accurate approaches to explore their content against specific domain targets. MyCLADE is a user-friendly webserver designed for targeted functional profiling of genomic and metagenomic sequences based on a database of a few million probabilistic models of Pfam domains. It uses the MetaCLADE multi-source domain annotation strategy, modelling domains based on multiple probabilistic profiles. MyCLADE takes a list of protein sequences and possibly a target set of domains/clans as input and, for each sequence, it provides a domain architecture built from the targeted domains or from all Pfam domains. It is linked to the Pfam and QuickGO databases in multiple ways for easy retrieval of domain and clan information. E-value, bit-score, domain-dependent probability scores and logos representing the match of the model with the sequence are provided to help the user to assess the quality of each annotation. Availability and implementation: MyCLADE is freely available at http://www.lcqb.upmc.fr/myclade.
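MyCLADE returns, for each sequence, a domain architecture assembled from scored model matches (E-value, bit-score). A common generic way to turn overlapping matches into an architecture is to keep significant hits greedily by decreasing bit-score, skipping any hit that overlaps one already kept. The sketch below shows only that generic idea under stated assumptions: it is not the MetaCLADE or MyCLADE procedure, the `Hit` record and `build_architecture` function are hypothetical, the 1e-3 E-value cut-off is arbitrary, and the hits use illustrative Pfam accessions with made-up scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical domain match on one sequence (1-based, inclusive coordinates)."""
    domain: str
    start: int
    end: int
    evalue: float
    bitscore: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def build_architecture(hits, max_evalue=1e-3):
    """Greedy sketch: keep significant hits in order of decreasing bit-score,
    drop any hit overlapping an accepted one, then order survivors by position."""
    significant = [h for h in hits if h.evalue <= max_evalue]
    accepted = []
    for hit in sorted(significant, key=lambda h: h.bitscore, reverse=True):
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)
    return [h.domain for h in sorted(accepted, key=lambda h: h.start)]


hits = [
    Hit("PF00069", 10, 270, 1e-40, 150.0),  # protein kinase domain match
    Hit("PF07714", 15, 265, 1e-30, 120.0),  # overlapping, lower-scoring alternative
    Hit("PF00017", 300, 380, 1e-12, 60.0),  # SH2 domain match
    Hit("PF00018", 400, 450, 0.5, 10.0),    # not significant at the default cut-off
]
print(build_architecture(hits))  # ['PF00069', 'PF00017']
```

Greedy selection by score is simple, but it can discard two slightly weaker, non-overlapping hits that together would cover more of the sequence; more careful tools resolve overlaps with an explicit optimisation.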
Affiliation(s)
- Riccardo Vicedomini
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
- Sorbonne Université, CNRS, Institut des Sciences du Calcul et des Données (ISCD), France
- Clémence Blachon
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
- Francesco Oteri
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238, Paris 75005, France
14
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
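A network of domain organization can be sketched by treating domains as nodes and linking two domains whenever they sit next to each other in some multidomain architecture; hub domains are then the nodes with the most distinct neighbours. This edge definition, the toy architectures and the helper names are assumptions for illustration; the study's own network construction and its time series along the structural chronology are not reproduced here.

```python
def domain_network(architectures):
    """Nodes are domains; an undirected edge joins two different domains that
    are adjacent in at least one architecture (an assumed edge definition)."""
    graph = {}
    for arch in architectures:
        for domain in arch:
            graph.setdefault(domain, set())  # single-domain proteins still add nodes
        for left, right in zip(arch, arch[1:]):
            if left != right:  # ignore tandem repeats of the same domain
                graph[left].add(right)
                graph[right].add(left)
    return graph


def hubs(graph, top=1):
    """Return the `top` domains with the most distinct neighbours (ties broken by name)."""
    return sorted(graph, key=lambda d: (-len(graph[d]), d))[:top]


architectures = [
    ["P-loop", "wHTH"],
    ["P-loop", "Rossmann"],
    ["wHTH", "P-loop", "SH3"],
    ["Rossmann"],
]
net = domain_network(architectures)
print({domain: len(neighbours) for domain, neighbours in sorted(net.items())})
print(hubs(net))
```

Degree here is one simple proxy for how widely a domain is recruited into combinations; analyses of preferential attachment or modularity would build on the same adjacency structure.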
15
Structure-based protein function prediction using graph convolutional networks. Nat Commun 2021; 12:3168. [PMID: 34039967 PMCID: PMC8155034 DOI: 10.1038/s41467-021-23303-9] [Citation(s) in RCA: 211] [Impact Index Per Article: 70.3] [Received: 09/21/2020] [Accepted: 04/22/2021] [Indexed: 02/04/2023]
Abstract
The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping enables function predictions at an unprecedented resolution, allowing automated, site-specific annotation at the residue level. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/ .
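DeepFRI propagates residue features over a protein structure with graph convolutions. The core step of a basic graph convolutional layer, in the widely used form of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), can be sketched on a residue contact map as below, where A is the contact map, D the degree matrix of A + I, H the per-residue features and W a weight matrix. This is a generic illustration in plain Python with toy numbers, not DeepFRI's architecture or trained weights.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(contacts, features, weights):
    """One graph-convolution step: add self-loops, normalise the adjacency
    symmetrically by node degree, propagate features, apply W and ReLU."""
    n = len(contacts)
    a_hat = [[contacts[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    propagated = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in propagated]


# Toy 3-residue chain: residue 0 contacts residue 1, residue 1 contacts residue 2.
contacts = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
features = [[1.0, 0.0],  # per-residue input features (e.g. language-model embeddings)
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0, -1.0],  # a learnable 2x2 weight matrix, fixed here for illustration
           [0.5, 0.5]]
for row in gcn_layer(contacts, features, weights):
    print([round(x, 3) for x in row])
```

Each residue's new features mix its own features with those of residues it contacts, so stacking such layers lets information flow along the 3D structure rather than only along the sequence.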
16
Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jonathan G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
17
Xiao X, Xue GF, Stamatovic B, Qiu WR. Using Cellular Automata to Simulate Domain Evolution in Proteins. Front Genet 2020; 11:515. [PMID: 32582278 PMCID: PMC7296063 DOI: 10.3389/fgene.2020.00515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2019] [Accepted: 04/28/2020] [Indexed: 11/26/2022]
Abstract
Proteins play primary roles in important biological processes such as catalysis, physiological functions, and immune system functions. Thus, how proteins evolved has been a central question in the field of evolutionary biology. General models of protein evolution help to determine the baseline expectations for evolution of sequences, and these models have been widely useful in sequence analysis as well as for the computer simulation of artificial sequence data sets. We have developed a new method of simulating multi-domain protein evolution, including fusions of domains, insertion, and deletion. Simulation tests showed that the success rates achieved by the proposed predictor are remarkably high. For the convenience of most experimental scientists, a user-friendly web server has been established at http://jci-bioinfo.cn/domainevo, by which users can easily get their desired results without having to go through the detailed mathematics. Through the simulation results of this website, users can predict the evolutionary trend of a protein's domain architecture.
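The event types named in this abstract (domain fusion, insertion and deletion) can be illustrated with a toy stochastic simulation over domain architectures. The sketch below is not the authors' cellular-automaton model, whose update rules the abstract does not give; the `evolve` function, the event probabilities and the domain alphabet are hypothetical choices for illustration, and a fixed seed keeps runs reproducible.

```python
import random


def evolve(population, steps, alphabet, p_insert=0.3, p_delete=0.2, p_fuse=0.1, seed=0):
    """Evolve a list of domain architectures (lists of domain labels).
    Per step, each architecture may gain a random domain at a random position,
    lose one domain (never dropping below one domain), and fuse with (append)
    a randomly chosen architecture from the previous generation."""
    rng = random.Random(seed)
    population = [list(arch) for arch in population]  # work on copies
    for _ in range(steps):
        next_gen = []
        for arch in population:
            arch = list(arch)
            if rng.random() < p_insert:
                arch.insert(rng.randrange(len(arch) + 1), rng.choice(alphabet))
            if len(arch) > 1 and rng.random() < p_delete:
                del arch[rng.randrange(len(arch))]
            if rng.random() < p_fuse:
                arch = arch + rng.choice(population)
            next_gen.append(arch)
        population = next_gen
    return population


start = [["A"], ["B", "C"], ["D"]]
for arch in evolve(start, steps=5, alphabet=["A", "B", "C", "D"]):
    print("-".join(arch))
```

Running many seeds and tallying the resulting architectures is one way such a model can produce expected distributions against which observed domain architectures are compared.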
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Guang-Fu Xue
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Biljana Stamatovic
- Faculty of Information Systems and Technologies, University of Donja Gorica, Podgorica, Montenegro
- Wang-Ren Qiu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
18
Koo DCE, Bonneau R. Towards region-specific propagation of protein functions. Bioinformatics 2019; 35:1737-1744. [PMID: 30304483 PMCID: PMC6513163 DOI: 10.1093/bioinformatics/bty834] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/23/2018] [Revised: 08/23/2018] [Accepted: 10/08/2018] [Indexed: 01/06/2023]
Abstract
MOTIVATION Due to the nature of experimental annotation, most protein function prediction methods operate at the protein-level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain-family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features. RESULTS We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms. AVAILABILITY AND IMPLEMENTATION The code and features are freely available at: https://github.com/ek1203/rsfp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
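The idea of pushing a protein-level annotation down to a specific region can be illustrated with a simple enrichment heuristic: a GO term is attached to a region feature (for example an InterPro entry) when proteins carrying the term contain that feature much more often than proteins overall. This is an assumed, simplified stand-in for the paper's method; the `enriched_features` helper, the feature names and the 2.0 ratio threshold are hypothetical.

```python
def enriched_features(proteins, term, min_ratio=2.0):
    """proteins maps a protein ID to (set of GO terms, set of region features).
    Return features whose frequency among proteins annotated with `term` is at
    least `min_ratio` times their frequency among all proteins."""
    total = len(proteins)
    positives = [feats for terms, feats in proteins.values() if term in terms]
    if not positives:
        return set()
    all_features = set().union(*(feats for _, feats in proteins.values()))
    result = set()
    for feature in all_features:
        background = sum(feature in feats for _, feats in proteins.values()) / total
        foreground = sum(feature in feats for feats in positives) / len(positives)
        if foreground / background >= min_ratio:  # background > 0: feature occurs somewhere
            result.add(feature)
    return result


proteins = {
    "P1": ({"GO:0005524"}, {"IPR_kinase", "IPR_linker"}),
    "P2": ({"GO:0005524"}, {"IPR_kinase"}),
    "P3": (set(), {"IPR_linker"}),
    "P4": (set(), {"IPR_zinc_finger"}),
}
print(enriched_features(proteins, "GO:0005524"))
```

Once a term is tied to a feature, a new protein containing that feature could be annotated at the feature's residue range instead of across its full length, which is the region-specific transfer the abstract argues for.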
Affiliation(s)
- Da Chen Emily Koo
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Richard Bonneau
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA; Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA
19
Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020; 12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 01/31/2020] [Indexed: 12/24/2022]
Abstract
The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared the most superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria including the ribosomal protein L19e and encoded four core superfamilies that are likely involved in cell-to-cell interaction and establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, which add to the current debate on the evolutionary relationships among superkingdoms.
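Statements such as which supergroup shares the most superfamilies with CPR, or what percentage of bacterial superfamilies CPR contributes, reduce to set operations over superfamily repertoires. A minimal sketch with made-up, SCOP-style identifiers follows (the study itself used structural classifications of whole proteomes); reading "contributed" as "present in CPR but absent from all non-CPR bacteria" is an assumption, as are the helper names.

```python
def shared_counts(query, groups):
    """Number of superfamilies the query repertoire shares with each group."""
    return {name: len(query & repertoire) for name, repertoire in groups.items()}


def percent_contributed(cpr, other_bacteria):
    """Percentage of all bacterial superfamilies (CPR plus non-CPR) found only in CPR."""
    bacterial = cpr | other_bacteria
    return 100.0 * len(cpr - other_bacteria) / len(bacterial)


cpr = {"a.4.5", "c.37.1", "d.58.7", "b.40.4"}
groups = {
    "non-CPR bacteria": {"a.4.5", "c.37.1", "d.58.7", "c.2.1"},
    "Archaea": {"a.4.5", "c.37.1"},
    "Eukarya": {"c.37.1", "c.2.1"},
}
counts = shared_counts(cpr, groups)
print(max(counts, key=counts.get), counts)
print(percent_contributed(cpr, groups["non-CPR bacteria"]))  # 20.0
```

With real data the same operations run over thousands of superfamilies per proteome, and the choice of reference sets (which genomes count as "non-CPR bacteria") directly affects the resulting percentages.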
Affiliation(s)
- Nooreen Amirjan
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Hyeonsoo Jeong
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana
- Arshan Nasir
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico
20
Naveenkumar N, Kumar G, Sowdhamini R, Srinivasan N, Vishwanath S. Fold combinations in multi-domain proteins. Bioinformation 2019; 15:342-350. [PMID: 31249437 PMCID: PMC6589474 DOI: 10.6026/97320630015342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/05/2019] [Accepted: 05/07/2019] [Indexed: 01/21/2023]
Abstract
Domain-domain interactions in multi-domain proteins play an important role in the combined function of individual domains for the overall biological activity of the protein. The functions of the tethered domains are often coupled, and hence only a limited number of domain architectures with defined folds are known in nature. Therefore, it is of interest to document the available fold-fold combinations and their preference in multi-domain proteins. Hence, we analyzed all multi-domain proteins with known structures in the Protein Data Bank and observed that only about 860 fold-fold combinations are present among them. Analyses of multi-domain proteins represented in sequence databases resulted in the recognition of 29,860 fold-fold combinations, which account for only 2.8% of the theoretically possible 1,036,080 (1439C2) fold-fold combinations. The observed preference for fold-fold combinations in multi-domain proteins is interesting in the context of multiple functions through structural adaptation by gene fusion.
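The theoretical total quoted in this abstract can be checked directly. For 1,439 folds, the number of unordered pairs of two different folds is C(1439, 2) = 1,034,641; the quoted 1,036,080 equals those pairs plus the 1,439 same-fold pairs (a fold combined with itself), which is also C(1440, 2). That same-fold pairs are included is inferred from the arithmetic, not stated in the abstract. The observed fraction comes out at about 2.88%, consistent with the quoted 2.8% when truncated to one decimal place.

```python
from math import comb

folds = 1439
distinct_pairs = comb(folds, 2)             # unordered pairs of two different folds
with_self_pairs = distinct_pairs + folds    # also count each fold paired with itself
observed = 29_860                           # fold-fold combinations seen in sequence data

print(distinct_pairs)                       # 1034641
print(with_self_pairs, comb(folds + 1, 2))  # 1036080 1036080
print(f"{100 * observed / with_self_pairs:.2f}%")  # 2.88%
```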
Affiliation(s)
- Nagarajan Naveenkumar
- National Center for Biological Science, GKVK Campus, Bengaluru, Karnataka, India - 560065; Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India; Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
- Gayatri Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
- Ramanathan Sowdhamini
- National Center for Biological Science, GKVK Campus, Bengaluru, Karnataka, India - 560065
- Sneha Vishwanath
- Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
21
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that family sizes generally follow power-law distributions, both within genomes and across larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
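The power-law claim about domain family sizes can be made concrete with the continuous maximum-likelihood estimator of the exponent described by Clauset, Shalizi and Newman, alpha = 1 + n / sum(ln(x_i / x_min)), computed over the n family sizes at or above a lower cut-off x_min. This is a rough sketch: the family sizes below are invented, x_min is fixed by hand rather than fitted, and for small integer counts a discrete estimator (or a dedicated power-law fitting package with goodness-of-fit tests) is more appropriate.

```python
import math


def powerlaw_alpha(sizes, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent
    for the values at or above x_min."""
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical domain family sizes (number of proteins per family) in one genome.
family_sizes = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 14, 22, 40, 95]
print(round(powerlaw_alpha(family_sizes, x_min=1), 2))
```

A fitted exponent alone does not establish a power law; comparing the fit with alternatives such as a log-normal is the usual next step.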
22
Navigating Among Known Structures in Protein Space. Methods Mol Biol 2018. [PMID: 30298400 DOI: 10.1007/978-1-4939-8736-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design.
In broad strokes, studies of protein space vary in the entities they represent, the similarity measure comparing these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is a protein entity (e.g., a domain), and edges connect similar domains. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities that are related to each other in a way for which a simple representation describes their relationships succinctly and accurately.
This chapter will cover studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, such as structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins. By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (PDB ID 4ntl, of as yet unknown function) may bind methionine and suggest a putative binding mechanism.
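The do-it-yourself navigator described in this abstract amounts to building a similarity network and reading off its connected regions. A minimal sketch, assuming pairwise similarity scores in [0, 1] have already been produced by some sequence or structure comparison tool, links entities whose score meets an arbitrary threshold of 0.4 and reports connected components. The identifiers, scores and threshold are hypothetical; "query" stands for an uncharacterised member whose neighbours may hint at its function.

```python
from collections import deque


def similarity_network(scores, threshold):
    """scores maps (entity_a, entity_b) pairs to a similarity score.
    Nodes are entities; an edge joins two entities whose score meets the threshold."""
    graph = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())  # keep entities even if they end up isolated
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Breadth-first search; components as sorted lists, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=lambda c: (-len(c), c))


scores = {
    ("sbp1", "sbp2"): 0.72,
    ("sbp2", "sbp3"): 0.55,
    ("sbp3", "query"): 0.48,
    ("sbp4", "sbp5"): 0.61,
    ("sbp1", "sbp5"): 0.12,
}
net = similarity_network(scores, threshold=0.4)
for component in connected_components(net):
    print(component)
```

The threshold is the main design lever: lowering it merges clusters (at 0.1 every entity above falls into one component), so conclusions drawn from cluster membership should be checked across several thresholds.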
23
Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018; 34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022]
Abstract
Motivation Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate a clustering refinement procedure. Results The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. Availability and implementation MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact emoriyama2@unl.edu. Supplementary information Supplementary data are available at Bioinformatics online.
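MOCASSIN-prot keeps domain composition and sequence similarity as separate objectives rather than collapsing them into a single score. The generic multi-objective notion behind that, Pareto dominance, can be sketched as selecting, for a query protein, the candidate neighbours that no other candidate beats on both objectives at once. This illustrates the concept only: it is not the game-theoretic procedure of MOCASSIN-prot, and the Jaccard measure of domain content, the `pareto_neighbours` helper, the domain labels and the similarity values are assumptions.

```python
def jaccard(a, b):
    """Domain-content similarity: shared domains over all domains."""
    return len(a & b) / len(a | b) if a | b else 0.0


def pareto_neighbours(query_domains, candidates):
    """candidates maps a protein ID to (domain set, sequence similarity to the query).
    Return IDs that are not dominated, i.e. no other candidate is at least as good
    on both objectives and strictly better on at least one."""
    points = {pid: (jaccard(query_domains, doms), seq_sim)
              for pid, (doms, seq_sim) in candidates.items()}

    def dominated(p):
        return any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points.values())

    return sorted(pid for pid, p in points.items() if not dominated(p))


query = {"PK", "SH2", "SH3"}
candidates = {
    "protA": ({"PK", "SH2", "SH3"}, 0.35),  # same domains, modest sequence similarity
    "protB": ({"PK"}, 0.80),                # high similarity over one shared domain
    "protC": ({"PK", "SH2"}, 0.30),         # worse than protA on both objectives
    "protD": ({"RING"}, 0.10),              # unrelated
}
print(pareto_neighbours(query, candidates))  # ['protA', 'protB']
```

Keeping both protA and protB reflects the trade-off a single weighted score would hide: one matches the domain architecture, the other matches the sequence more closely.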
Affiliation(s)
- Brittney N Keel
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA; Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Bo Deng
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences and Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
24
Slama P. Two-domain analysis of JmjN-JmjC and PHD-JmjC lysine demethylases: Detecting an inter-domain evolutionary stress. Proteins 2017; 86:3-12. [PMID: 28975662 DOI: 10.1002/prot.25394] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/24/2017] [Revised: 09/26/2017] [Accepted: 10/03/2017] [Indexed: 11/09/2022]
Abstract
Residues at different positions of a multiple sequence alignment sometimes evolve together, due to a correlated structural or functional stress at these positions. Co-evolution has thus been evidenced computationally in multiple proteins or protein domains. Here, we wish to study whether an evolutionary stress is exerted on a sequence alignment across protein domains, i.e., on longer sequence separations than within a single protein domain. JmjC-containing lysine demethylases were chosen for analysis, as a follow-up to previous studies; these proteins are important multidomain epigenetic regulators. In these proteins, the JmjC domain is responsible for the demethylase activity, and surrounding domains interact with histones, DNA or partner proteins. This family of enzymes was analyzed at the sequence level, in order to determine whether the sequence of JmjC-domains was affected by the presence of a neighboring JmjN domain or PHD finger in the protein. Multiple positions within JmjC sequences were shown to have their residue distributions significantly altered by the presence of the second domain. Structural considerations confirmed the relevance of the analysis for JmjN-JmjC proteins, while among PHD-JmjC proteins, the length of the linker region could be correlated to the residues observed at the most affected positions. The correlation of domain architecture with residue types at certain positions, as well as that of overall architecture with protein function, is discussed. The present results thus evidence the existence of an across-domain evolutionary stress in JmjC-containing demethylases, and provide further insights into the overall domain architecture of JmjC domain-containing proteins.
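The central test in this abstract, whether the residue distribution at a JmjC alignment position differs between proteins with and without a neighbouring domain, can be illustrated with a Pearson chi-square statistic on a 2 x k contingency table of residue counts. The sketch computes only the statistic and its degrees of freedom (a p-value would come from the chi-square distribution, for example via SciPy). The choice of test, the `chi_square_column` helper and the toy columns are assumptions, not the author's exact procedure; gap characters, if present, would simply be counted as another residue type.

```python
from collections import Counter


def chi_square_column(column_with, column_without):
    """Pearson chi-square for residue counts at one alignment column, comparing
    two groups of sequences. Returns (statistic, degrees_of_freedom)."""
    if not column_with or not column_without:
        raise ValueError("both groups need at least one residue")
    groups = [Counter(column_with), Counter(column_without)]
    residues = sorted(set(column_with) | set(column_without))
    row_totals = [len(column_with), len(column_without)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for residue in residues:
        col_total = groups[0][residue] + groups[1][residue]
        for row_total, counts in zip(row_totals, groups):
            expected = row_total * col_total / grand_total
            statistic += (counts[residue] - expected) ** 2 / expected
    dof = len(residues) - 1  # (2 groups - 1) * (k residue types - 1)
    return statistic, dof


# Residues at one hypothetical JmjC position, split by presence of a JmjN domain.
with_jmjn = "KKKKRKKK"
without_jmjn = "EEDEKEEQ"
stat, dof = chi_square_column(with_jmjn, without_jmjn)
print(round(stat, 2), dof)
```

With many alignment positions tested, a multiple-testing correction is needed before calling any single position "significantly altered", and small expected counts make the chi-square approximation unreliable.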
Affiliation(s)
- Patrick Slama
- Independent researcher, Paris, France; Center for Imaging Science, the Johns Hopkins University, Clark Hall, 3400 N Charles Street, Baltimore, Maryland, 21218
25
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023]
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
26
Leuenberger P, Ganscha S, Kahraman A, Cappelletti V, Boersema PJ, von Mering C, Claassen M, Picotti P. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science 2017; 355:eaai7825. [PMID: 28232526 DOI: 10.1126/science.aai7825] [Citation(s) in RCA: 254] [Impact Index Per Article: 36.3] [Received: 08/18/2016] [Accepted: 01/12/2017] [Indexed: 12/14/2022]
Abstract
Temperature-induced cell death is thought to be due to protein denaturation, but the determinants of thermal sensitivity of proteomes remain largely uncharacterized. We developed a structural proteomic strategy to measure protein thermostability on a proteome-wide scale and with domain-level resolution. We applied it to Escherichia coli, Saccharomyces cerevisiae, Thermus thermophilus, and human cells, yielding thermostability data for more than 8000 proteins. Our results (i) indicate that temperature-induced cellular collapse is due to the loss of a subset of proteins with key functions, (ii) shed light on the evolutionary conservation of protein and domain stability, and (iii) suggest that natively disordered proteins in a cell are less prevalent than predicted and (iv) that highly expressed proteins are stable because they are designed to tolerate translational errors that would lead to the accumulation of toxic misfolded species.
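A melting temperature (Tm) summarises a thermal unfolding curve as the temperature at which half of a protein (or domain) remains folded. Given a measured fraction-folded profile, a minimal estimate is linear interpolation at the 0.5 crossing, as sketched below. This is a generic illustration with invented data and a hypothetical `melting_temperature` helper, not the structural proteomics pipeline used in the study; real analyses typically fit a sigmoidal model to noisy measurements instead.

```python
def melting_temperature(temperatures, fraction_folded):
    """Estimate Tm as the temperature where the folded fraction first drops
    through 0.5, by linear interpolation between the bracketing measurements.
    Returns None if the curve never crosses 0.5."""
    points = list(zip(temperatures, fraction_folded))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            return t1 + (f1 - 0.5) * (t2 - t1) / (f1 - f2)
    return None


temps = [37, 41, 45, 49, 53, 57, 61]                 # degrees Celsius
folded = [1.00, 0.97, 0.90, 0.70, 0.40, 0.15, 0.05]  # hypothetical fraction folded
print(round(melting_temperature(temps, folded), 2))  # 51.67
```

Returning None for curves that never reach half-unfolded mirrors a practical situation in proteome-wide screens: very stable proteins may not melt within the measured temperature range.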
Affiliation(s)
- Pascal Leuenberger
- Institute of Biochemistry, Department of Biology, ETH Zurich (ETHZ), CH-8093 Zurich, Switzerland; Systems Biology Graduate School PhD Program, ETHZ and University of Zurich, CH-8093 Zurich, Switzerland
- Stefan Ganscha
- Systems Biology Graduate School PhD Program, ETHZ and University of Zurich, CH-8093 Zurich, Switzerland; Institute of Molecular Systems Biology, Department of Biology, ETHZ, CH-8093 Zurich, Switzerland
- Abdullah Kahraman
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Valentina Cappelletti
- Institute of Biochemistry, Department of Biology, ETH Zurich (ETHZ), CH-8093 Zurich, Switzerland
- Paul J Boersema
- Institute of Biochemistry, Department of Biology, ETH Zurich (ETHZ), CH-8093 Zurich, Switzerland
- Christian von Mering
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Manfred Claassen
- Institute of Molecular Systems Biology, Department of Biology, ETHZ, CH-8093 Zurich, Switzerland
- Paola Picotti
- Institute of Biochemistry, Department of Biology, ETH Zurich (ETHZ), CH-8093 Zurich, Switzerland
| |
27
Arguments Reinforcing the Three-Domain View of Diversified Cellular Life. Archaea 2016; 2016:1851865. [PMID: 28050162 PMCID: PMC5165138 DOI: 10.1155/2016/1851865] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/16/2016] [Revised: 10/18/2016] [Accepted: 11/03/2016] [Indexed: 11/18/2022]
Abstract
The archaeal ancestor scenario (AAS) for the origin of eukaryotes implies the emergence of a new kind of organism from the fusion of ancestral archaeal and bacterial cells. Equipped with this “chimeric” molecular arsenal, the resulting cell would gradually accumulate unique genes and develop the complex molecular machineries and cellular compartments that are hallmarks of modern eukaryotes. In this regard, proteins related to phagocytosis and cell movement should be present in the archaeal ancestor, thus identifying the recently described candidate archaeal phylum “Lokiarchaeota” as resembling a possible ancestor of eukaryotes. Despite its appeal, AAS seems incompatible with the genomic, molecular, and biochemical differences that exist between Archaea and Eukarya. In particular, the distribution of conserved protein domain structures in the proteomes of cellular organisms and viruses appears hard to reconcile with the AAS. In addition, concerns related to taxon and character sampling, the presupposition of bacterial outgroups in phylogenies, and nonuniform effects of protein domain structure rearrangement and gain/loss in concatenated alignments of protein sequences cast further doubt on AAS-supporting phylogenies. Here, we evaluate AAS against the traditional “three-domain” world of cellular organisms and propose that the discovery of Lokiarchaeota could be better reconciled under the latter view, especially in light of several additional biological and technical considerations.
28
Bernardes J, Zaverucha G, Vaquero C, Carbone A. Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence. PLoS Comput Biol 2016; 12:e1005038. [PMID: 27472895 PMCID: PMC4966962 DOI: 10.1371/journal.pcbi.1005038] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/17/2015] [Accepted: 06/28/2016] [Indexed: 11/30/2022]
Abstract
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts at annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high-performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: (1) for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences; (2) a decision-making protocol combines outcomes obtained from multiple models; and (3) a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to a 30% improvement over the total number of Pfam domain predictions on the whole genome. The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE.
Current sequence databases contain hundreds of billions of nucleotides coding for genes, and classifying these sequences is a primary problem in genomics. A reasonable way to organize these sequences is through their predicted domains, but identifying domains in very divergent sequences, spanning the entire phylogenetic tree of species, is a difficult problem. By generating multiple probabilistic models for a domain, describing the spread of evolutionary patterns in different phylogenetic clades, we can effectively explore domains that are likely to be coded in gene sequences. Through a machine learning approach and optimization techniques, coding for expected evolutionary constraints, we filter the many possibilities of domain identification found for a gene and propose the most likely domain architecture associated with it. The application of this novel approach to the full genome of Plasmodium falciparum and to sequences from three SCOP datasets highlights the interest of exploring multiple pathways of domain evolution with the aim of extracting biological information from genomic sequences. Our new computational approach was developed with the hope of providing a novel tier of accurate and precise tools that complement existing tools such as HMMER, HHblits and PSI-BLAST, by exploring in a novel way the large amount of sequence data available. The existence of powerful databases for sequences, domains and architectures helps make this hope a reality.
Affiliation(s)
- Juliana Bernardes
- Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Gerson Zaverucha
- COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Catherine Vaquero
- Sorbonne Universités, UPMC Univ-Paris 6, INSERM U1135, CNRS ERL 8255, Centre d’Immunologie et des Maladies Infectieuses (CIMI-Paris), Paris, France
- Alessandra Carbone
- Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Institut Universitaire de France, Paris, France
29
Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
30
Papaleo E, Saladino G, Lambrughi M, Lindorff-Larsen K, Gervasio FL, Nussinov R. The Role of Protein Loops and Linkers in Conformational Dynamics and Allostery. Chem Rev 2016; 116:6391-423. [DOI: 10.1021/acs.chemrev.5b00623] [Citation(s) in RCA: 239] [Impact Index Per Article: 29.9] [Indexed: 12/16/2022]
Affiliation(s)
- Elena Papaleo
- Computational Biology Laboratory, Unit of Statistics, Bioinformatics and Registry, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Giorgio Saladino
- Department of Chemistry, University College London, London WC1E 6BT, United Kingdom
- Matteo Lambrughi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Ruth Nussinov
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute Frederick, Frederick, Maryland 21702, United States
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
31
Abstract
Specific conformations of signaling proteins can serve as “signals” in signal transduction by being recognized by receptors.
Affiliation(s)
- Peter Tompa
- VIB Structural Biology Research Center (SBRC), Brussels, Belgium
- Vrije Universiteit Brussel, Brussels
32
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main machinery of the cell, play a major role in nearly every cellular process and have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architecture repertoire that are closely related to those of human languages and aim to provide some insights into the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
33
Bernardes JS, Vieira FRJ, Zaverucha G, Carbone A. A multi-objective optimization approach accurately resolves protein domain architectures. Bioinformatics 2015; 32:345-53. [PMID: 26458889 PMCID: PMC4734041 DOI: 10.1093/bioinformatics/btv582] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 01/26/2015] [Accepted: 10/02/2015] [Indexed: 11/15/2022]
Abstract
Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. However, when several potential domains conflict because of overlapping domain boundaries, finding a solution can become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution.
Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them.
Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA.
Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- J S Bernardes
- Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, 15 rue de l'Ecole de Médecine, 75006 Paris
- F R J Vieira
- CNRS, UMR 7606, Laboratoire d'Informatique de Paris 6, 75005 Paris, France; COPPE-UFRJ, Programa de Engenharia de Sistemas e Computação, Rio de Janeiro, Brazil
- G Zaverucha
- COPPE-UFRJ, Programa de Engenharia de Sistemas e Computação, Rio de Janeiro, Brazil
- A Carbone
- Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, 15 rue de l'Ecole de Médecine, 75006 Paris; Institut Universitaire de France, 75005 Paris
34
Stolzer M, Siewert K, Lai H, Xu M, Durand D. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 2015; 16 Suppl 14:S8. [PMID: 26451642 PMCID: PMC4610023 DOI: 10.1186/1471-2105-16-s14-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Reconstructing evolution provides valuable insights into the processes of gene evolution and function. However, while there have been great advances in algorithms and software to reconstruct the history of gene families, these tools do not model the domain shuffling events (domain duplication, insertion, transfer, and deletion) that drive the evolution of multidomain protein families. Protein evolution through domain shuffling events allows for rapid exploration of functions by introducing new combinations of existing folds. This powerful mechanism was key to some significant evolutionary innovations, such as multicellularity and the vertebrate immune system. A method for reconstructing this important evolutionary process is urgently needed.
RESULTS: Here, we introduce a novel, event-based framework for studying multidomain evolution by reconciling a domain tree with a gene tree, with additional information provided by the species tree. In the context of this framework, we present the first reconciliation algorithms to infer domain shuffling events, while addressing the challenges inherent in the inference of evolution across three levels of organization.
CONCLUSIONS: We apply these methods to the evolution of domains in the Membrane associated Guanylate Kinase family. These case studies reveal a more vivid and detailed evolutionary history than previously provided. Our algorithms have been implemented in software, freely available at http://www.cs.cmu.edu/~durand/Notung.
35
Reid WR, Zhang L, Liu N. Temporal Gene Expression Profiles of Pre Blood-Fed Adult Females Immediately Following Eclosion in the Southern House Mosquito Culex Quinquefasciatus. Int J Biol Sci 2015; 11:1306-13. [PMID: 26435696 PMCID: PMC4582154 DOI: 10.7150/ijbs.12829] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/30/2015] [Accepted: 07/28/2015] [Indexed: 01/08/2023]
Abstract
Prior to acquisition of the first host blood meal, the anautogenous mosquito Culex quinquefasciatus requires a period of time to prepare for blood feeding and, later, vitellogenesis. In the current study, we conducted whole transcriptome analyses of adult female Culex mosquitoes to identify genes that may be necessary for both taking and processing the blood meal in adult female Cx. quinquefasciatus. We examined temporal expression of genes for the periods after eclosion and before the female freely took a blood meal. We further evaluated the temporal expression of certain genes for the periods after the taking of a blood meal to identify genes that may be necessary for both the taking and the processing of the blood meal. We found that adult females required a minimum of 48 h post-eclosion before they freely took their first blood meal. We hypothesized that, before blood feeding, gene expression signatures were altered through changes in the expression of multiple genes in preparation for the acquisition of the blood meal. To identify the genes involved in blood meal acquisition, we quantified the gene expression levels of adult female Cx. quinquefasciatus using RNA-Seq throughout a pre-blooding period from 2 to 72 h post-eclosion at 12 h intervals. A total of 325 genes were determined to be differentially expressed throughout the pre-blooding period, with the majority of differentially expressed genes occurring between the 2 h and 12 h post-eclosion time points. Among the up-regulated genes were salivary proteins, cytochrome P450s, odorant-binding proteins, and proteases, while the majority of the down-regulated genes were hypothetical or cuticular genes. In addition, trypsin was found to be up-regulated immediately following blood feeding, while trypsin and chymotrypsin were up-regulated at 48 h and 60 h post-blood-feeding, respectively, suggesting that these proteases are likely involved in the digestion of the blood meal. Overall, this study identified multiple genes that might be involved in the adult female competency for blood meal acquisition in mosquitoes.
Affiliation(s)
- William R Reid
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, USA; Current address: USDA-ARS Center for Medical Veterinary and Agricultural Entomology, Mosquito and Fly Research Unit, Gainesville, FL 32608, USA
- Lee Zhang
- Genomics Laboratory, Auburn University, Auburn, AL 36849, USA
- Nannan Liu
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, USA
36
Chang TC, Stergiopoulos I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett 2015; 589:1813-8. [PMID: 26067847 DOI: 10.1016/j.febslet.2015.05.048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/15/2015] [Revised: 05/11/2015] [Accepted: 05/20/2015] [Indexed: 10/23/2022]
Abstract
Domain promiscuity is a powerful evolutionary force that promotes functional innovation in proteins, thus increasing proteome and organismal complexity. Carbohydrate-binding modules (CBMs), in particular, are known to partake in complex modular architectures that play crucial roles in numerous biochemical and molecular processes. However, the extent of promiscuity and its functional and evolutionary significance remain largely unknown for most CBM families. Here, we analyzed the global promiscuity of family 14 carbohydrate-binding modules (CBM14s) and show that fusion, fission, and reorganization events with numerous other domain types interplayed incessantly in a lineage-dependent manner, likely facilitating species adaptation and functional innovation in the family.
Affiliation(s)
- Ti-Cheng Chang
- Department of Plant Pathology, University of California Davis, Davis, CA, USA
37
Multiple nucleophilic elbows leading to multiple active sites in a single module esterase from Sorangium cellulosum. J Struct Biol 2015; 190:314-27. [DOI: 10.1016/j.jsb.2015.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/05/2014] [Revised: 03/25/2015] [Accepted: 04/10/2015] [Indexed: 11/17/2022]
38
Linkeviciute V, Rackham OJL, Gough J, Oates ME, Fang H. Function-selective domain architecture plasticity potentials in eukaryotic genome evolution. Biochimie 2015; 119:269-77. [PMID: 25980317 PMCID: PMC4679076 DOI: 10.1016/j.biochi.2015.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/31/2014] [Accepted: 05/06/2015] [Indexed: 12/20/2022]
Abstract
To help evaluate how protein function impacts genome evolution, we introduce a new concept of ‘architecture plasticity potential’, the capacity to form distinct domain architectures, both for an individual domain and, more generally, for a set of domains grouped by shared function. We devise a scoring metric to measure the plasticity potential for these domain sets, and evaluate how function has changed over time for different species. Applying this metric to a phylogenetic tree of eukaryotic genomes, we find that the involvement of each function is not random but highly selective. For certain lineages there is a strong bias for evolution to involve domains related to certain functions. In general, eukaryotic genomes, particularly those of animals, expand complex functional activities such as signalling and regulation, but at the cost of reducing metabolic processes. We also observe differential evolution of transcriptional regulation and a unique evolutionary role of channel regulators; crucially, this is only observable in terms of the architecture plasticity potential. Our findings provide a new layer of information to understand the significance of function in eukaryotic genome evolution. A web search tool, available at http://supfam.org/Pevo, offers a wide spectrum of options for exploring functional importance in eukaryotic genome evolution.
A new concept to measure domain architecture plasticity potential in a genome. We reveal the function-selective role in eukaryotic genome evolution. Eukaryotic genomes expand signalling and regulation but reduce metabolism. We observe differential evolution between trans- and cis-acting regulation. We observe a unique role of channel regulators in separating eukaryotic kingdoms.
Affiliation(s)
- Viktorija Linkeviciute
- Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK; School of Biological Sciences, University of Edinburgh, Darwin Building, The King's Buildings, Edinburgh EH9 3BF, UK
- Owen J L Rackham
- Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK; Centre for Computational Biology, Duke-NUS Graduate Medical School, Singapore 169857, Singapore
- Julian Gough
- Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK
- Matt E Oates
- Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK
- Hai Fang
- Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
39
Mbandi SK, Hesse U, van Heusden P, Christoffels A. Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms. BMC Bioinformatics 2015; 16:58. [PMID: 25880035 PMCID: PMC4344733 DOI: 10.1186/s12859-015-0492-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/30/2014] [Accepted: 02/06/2015] [Indexed: 11/19/2022]
Abstract
BACKGROUND: De novo transcriptome assembly of short transcribed fragments (transfrags) produced by sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies.
RESULTS: Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines (1) removal of identical subsequences, (2) error-tolerant CDS prediction, (3) identification of coding potential, and (4) a multiple domain architecture annotation, complementing BLAST, that reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is evaluated on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags than the multiple k-mer assemblies. However, Trinity identified a number of predicted coding sequences and gene loci comparable to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci.
CONCLUSIONS: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.
Affiliation(s)
- Stanley Kimbung Mbandi
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Uljana Hesse
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Alan Christoffels
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
40
Krishnamurthy P, Hong JK, Kim JA, Jeong MJ, Lee YH, Lee SI. Genome-wide analysis of the expansin gene superfamily reveals Brassica rapa-specific evolutionary dynamics upon whole genome triplication. Mol Genet Genomics 2014; 290:521-30. [PMID: 25325993 DOI: 10.1007/s00438-014-0935-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 06/02/2014] [Accepted: 09/30/2014] [Indexed: 01/27/2023]
Abstract
Chinese cabbage (Brassica rapa subsp. pekinensis) is an economically important vegetable that has undergone four rounds of polyploidization. The fourth event, whole genome triplication (WGT), occurred after its divergence from Arabidopsis. Expansins (EXPs) are cell wall loosening proteins that participate in cell wall modification processes. In this study, the impacts of WGT on the B. rapa expansin (BrEXP) superfamily were evaluated. Whole genome screening of B. rapa identified 32 loci coding 53 expansin genes. Fifteen of the loci maintained a single gene copy, 15 maintained two gene copies and 2 maintained three gene copies. Six loci had no synteny to any Arabidopsis thaliana orthologs. Two loci were involved in tandem duplication. Segmental duplication and fragment recombination were dominant in accelerating BrEXP evolution. Three genes (BrEXPA7, BrEXLA1 and BrEXLA2) lost one of their ancestral introns, two genes (BrEXPA18 and BrEXPB6) gained new introns, and a domain tandem repeat (BrEXPA18) and domain recombination (Bra016981; not considered as expansin) were observed in one gene each. Further, domain deletion was observed in an additional five genes (Bra033068, Bra000142, Bra025800, Bra016473 and Bra004891, not considered as expansins) that lost one of their expansin-specific domains during evolution. These findings provide a basis for the evolution and modification of the BrEXP superfamily after a WGT event, which will help in determining the functional characteristics of BrEXPs.
Affiliation(s)
- Panneerselvam Krishnamurthy
- Department of Agricultural Biotechnology, National Academy of Agricultural Science (NAAS), Jeonju, 560-500, Korea
41
Cromar G, Wong KC, Loughran N, On T, Song H, Xiong X, Zhang Z, Parkinson J. New tricks for "old" domains: how novel architectures and promiscuous hubs contributed to the organization and evolution of the ECM. Genome Biol Evol 2014; 6:2897-917. [PMID: 25323955 PMCID: PMC4224354 DOI: 10.1093/gbe/evu228] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Accepted: 10/07/2014] [Indexed: 12/15/2022]
Abstract
The extracellular matrix (ECM) is a defining characteristic of metazoans and consists of a meshwork of self-assembling, fibrous proteins, and their functionally related neighbours. Previous studies, focusing on a limited number of gene families, suggest that vertebrate complexity predominantly arose through the duplication and subsequent modification of retained, preexisting ECM genes. These genes provided the structural underpinnings to support a variety of specialized tissues, as well as a platform for the organization of spatio-temporal signaling and cell migration. However, the relative contributions of ancient versus novel domains to ECM evolution have not been quantified across the full range of ECM proteins. Here, utilizing a high-quality list comprising 324 ECM genes, we reveal general and clade-specific domain combinations, identifying domains of eukaryotic and metazoan origin recruited into new roles in approximately two-thirds of the ECM proteins in humans that represent novel vertebrate proteins. We show that, rather than acquiring new domains, sampling of new domain combinations has been key to the innovation of paralogous ECM genes during vertebrate evolution. Applying a novel framework for identifying potentially important, noncontiguous, conserved arrangements of domains, we find that the distinct biological characteristics of the ECM have arisen through unique evolutionary processes. These include the preferential recruitment of novel domains to existing architectures and the utilization of high promiscuity domains in organizing the ECM network around a connected array of structural hubs. Our focus on ECM proteins reveals that distinct types of proteins and/or the biological systems in which they operate have influenced the types of evolutionary forces that drive protein innovation. This emphasizes the need for rigorously defined systems to address questions of evolution that focus on specific systems of interacting proteins.
Affiliation(s)
- Graham Cromar
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Ontario, Canada
- Ka-Chun Wong
- Department of Computer Science, University of Toronto, Ontario, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Ontario, Canada
- Noeleen Loughran
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada
- Tuan On
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Ontario, Canada
- Hongyan Song
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada
- Xuejian Xiong
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada
- Zhaolei Zhang
- Department of Molecular Genetics, University of Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Ontario, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Ontario, Canada; Banting and Best Department of Medical Research, University of Toronto, Ontario, Canada
- John Parkinson
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Ontario, Canada; Department of Biochemistry, University of Toronto, Ontario, Canada

42
Kočar V, Božič Abram S, Doles T, Bašić N, Gradišar H, Pisanski T, Jerala R. TOPOFOLD, the designed modular biomolecular folds: polypeptide-based molecular origami nanostructures following the footsteps of DNA. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2014; 7:218-37. [PMID: 25196147 DOI: 10.1002/wnan.1289] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/06/2014] [Revised: 07/08/2014] [Accepted: 07/20/2014] [Indexed: 12/14/2022]
Abstract
Biopolymers, the essential components of life, are able to form many complex nanostructures, and proteins in particular are the material of choice for most cellular processes. Owing to numerous cooperative interactions, rational design of new protein folds remains extremely challenging. An alternative strategy is to design topofolds: nanostructures built from polypeptide arrays of interacting modules that define their topology. Over the course of the last several decades, DNA has successfully been repurposed from its native role of information storage to a smart nanomaterial used for the self-assembly of nanostructures of almost any shape, largely because of its programmable nature. Unfortunately, polypeptides do not possess the straightforward complementarity of nucleic acids. However, a modular approach can nevertheless be used to assemble polypeptide nanostructures, as was recently demonstrated with a single-chain polypeptide tetrahedron. This review focuses on the current state of the art in the field of topological polypeptide folds. It starts with a brief overview of the field of structural DNA and RNA nanotechnology, from which it draws parallels and possible directions of development for the emerging field of polypeptide-based nanotechnology. The principles of the topofold strategy and the unique properties of such polypeptide nanostructures in comparison to native protein folds are discussed. Reasons for the apparent absence of such folds in nature are also examined. The physicochemical versatility of amino acid residues and cost-effective production make polypeptides an attractive platform for designed functional bionanomaterials.
Affiliation(s)
- Vid Kočar
- Department of Biotechnology, National Institute of Chemistry, Ljubljana, Slovenia

43
Khafif M, Cottret L, Balagué C, Raffaele S. Identification and phylogenetic analyses of VASt, an uncharacterized protein domain associated with lipid-binding domains in Eukaryotes. BMC Bioinformatics 2014; 15:222. [PMID: 24965341 PMCID: PMC4082322 DOI: 10.1186/1471-2105-15-222] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 01/17/2014] [Accepted: 06/19/2014] [Indexed: 01/25/2023]
Abstract
Background: Several regulators of programmed cell death (PCD) in plants encode proteins with putative lipid-binding domains. Among them, VAD1 is a regulator of PCD propagation harboring a GRAM putative lipid-binding domain. However, the function of VAD1 at the subcellular level is unknown, and the domain architecture of VAD1 has not been analyzed in detail. Results: We analyzed sequence conservation of the VAD1 protein across the plant kingdom and identified an uncharacterized VASt (VAD1 Analog of StAR-related lipid transfer) domain. Using profile hidden Markov models (profile HMMs) and phylogenetic analysis, we found that this domain is conserved among eukaryotes and generally associates with various lipid-binding domains. Proteins containing both a GRAM and a VASt domain notably include the yeast Ysp2 cell death regulator and numerous uncharacterized proteins. Using structure-based phylogeny, we found that the VASt domain is structurally related to Bet v1-like domains. Conclusion: We identified a novel protein domain that is ubiquitous in eukaryotic genomes and belongs to the Bet v1-like superfamily. Our findings open perspectives for the functional analysis of VASt-containing proteins and the characterization of novel mechanisms regulating PCD.
Affiliation(s)
- Sylvain Raffaele
- INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, 24 Chemin de Borde Rouge - Auzeville, CS52627, F31326 Castanet Tolosan Cedex, France

44
A daily-updated tree of (sequenced) life as a reference for genome research. Sci Rep 2014; 3:2015. [PMID: 23778980 PMCID: PMC6504836 DOI: 10.1038/srep02015] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 03/04/2013] [Accepted: 05/10/2013] [Indexed: 11/08/2022]
Abstract
We report a daily-updated sequenced/species Tree Of Life (sTOL) as a reference for the increasing number of cellular organisms with sequenced genomes. The sTOL builds on a likelihood-based weight calibration algorithm to consolidate NCBI taxonomy information in concert with unbiased sampling of molecular characters from the whole genomes of all sequenced organisms. By quantifying the extent of agreement between taxonomic and molecular data, we observe that many potential improvements can be made to the status quo classification, particularly in the Fungi kingdom; we also see that the current state of many animal genomes is rather poor. To augment the use of sTOL in providing evolutionary contexts, we integrate an ontology infrastructure and demonstrate its utility for the evolutionary understanding of nuclear receptors, stem cells and eukaryotic genomes. The sTOL (http://supfam.org/SUPERFAMILY/sTOL) provides a binary tree of (sequenced) life and contributes to an analytical platform linking genome evolution, function and phenotype.

45
Ishino S, Yamagami T, Kitamura M, Kodera N, Mori T, Sugiyama S, Ando T, Goda N, Tenno T, Hiroaki H, Ishino Y. Multiple interactions of the intrinsically disordered region between the helicase and nuclease domains of the archaeal Hef protein. J Biol Chem 2014; 289:21627-39. [PMID: 24947516 DOI: 10.1074/jbc.m114.554998] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Hef is an archaeal protein that probably functions mainly in stalled replication fork repair. An unstructured region was predicted between the two distinct domains of the Hef protein. We analyzed the interdomain region of Thermococcus kodakarensis Hef and demonstrated its disordered structure by CD, NMR, and high-speed atomic force microscopy (AFM). To investigate the functions of this intrinsically disordered region (IDR), we screened for proteins interacting with the IDR of Hef using a yeast two-hybrid method and obtained 10 candidate proteins. We found that PCNA1 and a RecJ-like protein specifically bind to the IDR in vitro. These results suggest that the Hef protein interacts with several different proteins that work together in the pathways downstream of stalled replication fork repair, by altering the IDR structure depending on the partner protein.
Affiliation(s)
- Sonoko Ishino
- Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, and Faculty of Agriculture, Kyushu University, Fukuoka 812-8581, Japan
- Takeshi Yamagami
- Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, and Faculty of Agriculture, Kyushu University, Fukuoka 812-8581, Japan
- Makoto Kitamura
- Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, and Faculty of Agriculture, Kyushu University, Fukuoka 812-8581, Japan
- Noriyuki Kodera
- Bio-AFM Frontier Research Center and Department of Physics, College of Science and Engineering, Kanazawa University, Kanazawa 920-1192, Japan
- Tetsuya Mori
- Bio-AFM Frontier Research Center and Department of Physics, College of Science and Engineering, Kanazawa University, Kanazawa 920-1192, Japan
- Shyogo Sugiyama
- Bio-AFM Frontier Research Center and Department of Physics, College of Science and Engineering, Kanazawa University, Kanazawa 920-1192, Japan
- Toshio Ando
- Bio-AFM Frontier Research Center and Department of Physics, College of Science and Engineering, Kanazawa University, Kanazawa 920-1192, Japan
- Natsuko Goda
- Graduate School of Pharmaceutical Sciences, Nagoya University, Nagoya 464-8601, Japan
- Takeshi Tenno
- Graduate School of Pharmaceutical Sciences, Nagoya University, Nagoya 464-8601, Japan
- Hidekazu Hiroaki
- Graduate School of Pharmaceutical Sciences, Nagoya University, Nagoya 464-8601, Japan
- Yoshizumi Ishino
- Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, and Faculty of Agriculture, Kyushu University, Fukuoka 812-8581, Japan

46
|
Pandya C, Dunaway-Mariano D, Xia Y, Allen KN. Structure-guided approach for detecting large domain inserts in protein sequences as illustrated using the haloacid dehalogenase superfamily. Proteins 2014; 82:1896-906. [PMID: 24577717 DOI: 10.1002/prot.24543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/19/2013] [Revised: 02/19/2014] [Accepted: 02/22/2014] [Indexed: 11/11/2022]
Abstract
In multi-domain proteins, the domains typically run end-to-end, that is, one domain follows the C-terminus of another domain. However, approximately 10% of multi-domain proteins are formed by insertion of one domain sequence into that of another domain. Detecting such insertions within protein sequences is a fundamental challenge in structural biology. The haloacid dehalogenase superfamily (HADSF) serves as a challenging model system wherein a variable cap domain (∼5-200 residues in length) accessorizes the ubiquitous Rossmann-fold core domain, with variations in insertion site and topology corresponding to different classes of cap types. Herein, we describe a comprehensive computational strategy, CapPredictor, for determining large, variable domain insertions in protein sequences. Using a novel sequence-alignment algorithm in conjunction with a structure-guided sequence profile from 154 core-domain-only structures, more than 40,000 HADSF member sequences were assigned cap types. The resulting data set afforded insight into HADSF evolution. Notably, a similar distribution of cap-type classes across different phyla was observed, indicating that all cap types existed in the last universal common ancestor. In addition, comparative analyses of the predicted cap-type and functional assignments showed that different cap types carry out similar chemistries. Thus, while cap domains play a role in substrate recognition and chemical reactivity, cap-type does not strictly define functional class. Through this example, we have shown that CapPredictor is an effective new tool for the study of form and function in protein families where domain insertion occurs.
Affiliation(s)
- Chetanya Pandya
- Bioinformatics Graduate Program, Boston University, 24 Cummington Mall, Boston, Massachusetts, 02215

47
Abstract
Efforts from the TB Structural Genomics Consortium together with those of tuberculosis structural biologists worldwide have led to the determination of about 350 structures, making up nearly a tenth of the pathogen's proteome. Given that knowledge of protein structures is essential to obtaining a high-resolution understanding of the underlying biology, it is desirable to have a structural view of the entire proteome. Indeed, structure prediction methods have advanced sufficiently to allow structural models of many more proteins to be built based on homology modeling and fold recognition strategies. By means of these approaches, structural models for about 2,877 proteins, making up nearly 70% of the Mycobacterium tuberculosis proteome, are available. Knowledge from bioinformatics has made significant inroads into an improved annotation of the M. tuberculosis genome and in the prediction of key protein players that interact in vital pathways, some of which are unique to the organism. Functional inferences have been made for a large number of proteins based on fold-function associations. More importantly, ligand-binding pockets of the proteins are identified and scanned against a large database, leading to binding site-based ligand associations and hence structure-based function annotation. Near proteome-wide structural models provide a global perspective of the fold distribution in the genome. New insights about the folds that predominate in the genome, as well as the fold combinations that make up multidomain proteins, are also obtained. This chapter describes the structural proteome, functional inferences drawn from it, and its applications in drug discovery.

48
Baines AJ, Lu HC, Bennett PM. The Protein 4.1 family: hub proteins in animals for organizing membrane proteins. Biochim Biophys Acta 2014; 1838:605-19. [PMID: 23747363 DOI: 10.1016/j.bbamem.2013.05.030] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 02/18/2013] [Revised: 05/22/2013] [Accepted: 05/28/2013] [Indexed: 01/10/2023]
Abstract
Proteins of the 4.1 family are characteristic of eumetazoan organisms. Invertebrates contain single 4.1 genes and the Drosophila model suggests that 4.1 is essential for animal life. Vertebrates have four paralogues, known as 4.1R, 4.1N, 4.1G and 4.1B, which are additionally duplicated in the ray-finned fish. Protein 4.1R was the first to be discovered: it is a major mammalian erythrocyte cytoskeletal protein, essential to the mechanochemical properties of red cell membranes because it promotes the interaction between spectrin and actin in the membrane cytoskeleton. 4.1R also binds certain phospholipids and is required for the stable cell surface accumulation of a number of erythrocyte transmembrane proteins that span multiple functional classes; these include cell adhesion molecules, transporters and a chemokine receptor. The vertebrate 4.1 proteins are expressed in most tissues, and they are required for the correct cell surface accumulation of a very wide variety of membrane proteins including G-Protein coupled receptors, voltage-gated and ligand-gated channels, as well as the classes identified in erythrocytes. Indeed, such large numbers of protein interactions have been mapped for mammalian 4.1 proteins, most especially 4.1R, that it appears that they can act as hubs for membrane protein organization. The range of critical interactions of 4.1 proteins is reflected in disease relationships that include hereditary anaemias, tumour suppression, control of heartbeat and nervous system function. The 4.1 proteins are defined by their domain structure: apart from the spectrin/actin-binding domain they have FERM and FERM-adjacent domains and a unique C-terminal domain. Both the FERM and C-terminal domains can bind transmembrane proteins, thus they have the potential to be cross-linkers for membrane proteins. 
The activity of the FERM domain is subject to multiple modes of regulation via binding of regulatory ligands, phosphorylation of the FERM associated domain and differential mRNA splicing. Finally, the spectrum of interactions of the 4.1 proteins overlaps with that of another membrane-cytoskeleton linker, ankyrin. Both ankyrin and 4.1 link to the actin cytoskeleton via spectrin, and we hypothesize that differential regulation of 4.1 proteins and ankyrins allows highly selective control of cell surface protein accumulation and, hence, function. This article is part of a Special Issue entitled: Reciprocal influences between cell cytoskeleton and membrane channels, receptors and transporters. Guest Editor: Jean Claude Hervé
Affiliation(s)
- Hui-Chun Lu
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
- Pauline M Bennett
- Randall Division of Cell and Molecular Biophysics, King's College London, UK

49
Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput Biol 2014; 10:e1003452. [PMID: 24499935 PMCID: PMC3907288 DOI: 10.1371/journal.pcbi.1003452] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 07/16/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022]
Abstract
Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses, and the gain-to-loss ratios were much higher in akaryotes than in eukaryotes. Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution, while Eukarya acquired a number of diverse molecular functions, including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and of metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains are predicted to redefine the persistence strategies of organisms in superkingdoms, influence the makeup of molecular functions, and enhance organismal complexity through the generation of new domain architectures. These dynamics highlight ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.

50
Fornili A, Pandini A, Lu HC, Fraternali F. Specialized Dynamical Properties of Promiscuous Residues Revealed by Simulated Conformational Ensembles. J Chem Theory Comput 2013; 9:5127-47. [PMID: 24250278 PMCID: PMC3827836 DOI: 10.1021/ct400486p] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 06/09/2013] [Indexed: 12/13/2022]
Abstract
The ability to interact with different partners is one of the most important features of proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the isolated state of the protein. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and to full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with different degrees of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter both on a local and on a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by the fine-tuning of the interface dynamics. The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein–protein interaction prediction and design methods.
Affiliation(s)
- Arianna Fornili
- Randall Division of Cell and Molecular Biophysics, King's College London, New Hunt's House, London SE1 1UL, United Kingdom