1
Dixson JD, Azad RK. Physicochemical Evaluation of Remote Homology in the Twilight Zone. Proteins 2024. [PMID: 39219099 DOI: 10.1002/prot.26742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
A fundamental problem in the field of protein evolutionary biology is determining the degree and nature of evolutionary relatedness among homologous proteins that have diverged to a point where they share less than 30% amino acid identity yet retain similar structures and/or functions. Such proteins are said to lie within the "Twilight Zone" of amino acid identity. Many researchers have leveraged experimentally determined structures in the quest to classify proteins in the Twilight Zone. Such endeavors can be highly time consuming and prohibitively expensive for large-scale analyses. Motivated by this problem, here we use molecular weight-hydrophobicity physicochemical dynamic time warping (MWHP DTW) to quantify similarity of simulated and real-world homologous protein domains. MWHP DTW is a physicochemical method requiring only the amino acid sequence to quantify similarity of related proteins and is particularly useful in determining similarity within the Twilight Zone due to its resilience to primary sequence substitution saturation. This is a step forward in determination of the relatedness among Twilight Zone proteins and most notably allows for the discrimination of random similarity and true homology in the 0%-20% identity range. This method was previously presented expeditiously just after the outbreak of COVID-19 because it was able to functionally cluster ACE2-binding betacoronavirus receptor binding domains (RBDs), a task that has been elusive using standard techniques. Here we show that one reason that MWHP DTW is an effective technique for comparisons within the Twilight Zone is because it can uncover hidden homology by exploiting physicochemical conservation, a problem that protein sequence alignment algorithms are inherently incapable of addressing within the Twilight Zone. Further, we present an extended definition of the Twilight Zone that incorporates the dynamic relationship between structural, physicochemical, and sequence-based metrics.
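The abstract describes comparing proteins through physicochemical profiles aligned by dynamic time warping rather than through residue identity. The following is a minimal sketch of that general idea, not the authors' MWHP DTW implementation: the Kyte-Doolittle hydropathy scale, the rounded average residue masses, the per-residue scaling, and the function names `profile` and `dtw_distance` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Kyte-Doolittle hydropathy index (assumed scale; the paper may use another).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons, rounded (assumed values for this sketch).
MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13, "T": 101.10,
    "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10, "D": 115.09,
    "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19, "H": 137.14,
    "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}

Point = Tuple[float, float]


def profile(seq: str) -> List[Point]:
    """Map each residue to a (scaled mass, scaled hydropathy) pair.

    Dividing by the largest mass and the largest hydropathy value is an
    arbitrary choice made here so both features have comparable ranges.
    """
    return [(MASS[aa] / 186.21, KD[aa] / 4.5) for aa in seq.upper()]


def dtw_distance(a: List[Point], b: List[Point]) -> float:
    """Classic dynamic time warping cost between two profiles."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # only the empty-vs-empty alignment costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            # Extend the cheapest of: stretch a, stretch b, or step both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

In this sketch a conservative substitution such as isoleucine to leucine costs far less than isoleucine to aspartate, which illustrates why a physicochemical measure can still register similarity once sequence identity has saturated.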
Affiliation(s)
- Jamie Dennis Dixson
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
- Rajeev Kumar Azad
  - Department of Biological Sciences, University of North Texas, Denton, Texas, USA
  - BioDiscovery Institute, University of North Texas, Denton, Texas, USA

2
Agam G, Barth A, Lamb DC. Folding pathway of a discontinuous two-domain protein. Nat Commun 2024; 15:690. [PMID: 38263337 PMCID: PMC10805907 DOI: 10.1038/s41467-024-44901-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024] [Open Access]
Abstract
It is estimated that two-thirds of all proteins in higher organisms are composed of multiple domains, many of them containing discontinuous folds. However, to date, most in vitro protein folding studies have focused on small, single-domain proteins. As a model system for a two-domain discontinuous protein, we study the unfolding/refolding of a slow-folding double mutant of the maltose binding protein (DM-MBP) using single-molecule two- and three-color Förster Resonance Energy Transfer experiments. We observe a dynamic folding intermediate population in the N-terminal domain (NTD), C-terminal domain (CTD), and at the domain interface. The dynamic intermediate fluctuates rapidly between unfolded states and compact states, which have a similar FRET efficiency to the folded conformation. Our data reveals that the delayed folding of the NTD in DM-MBP is imposed by an entropic barrier with subsequent folding of the highly dynamic CTD. Notably, accelerated DM-MBP folding is routed through the same dynamic intermediate within the cavity of the GroEL/ES chaperone system, suggesting that the chaperonin limits the conformational space to overcome the entropic folding barrier. Our study highlights the subtle tuning and co-dependency in the folding of a discontinuous multi-domain protein.
Affiliation(s)
- Ganesh Agam
  - Department of Chemistry, Ludwig-Maximilians University Munich, Munich, Germany
  - Center for NanoScience, Ludwig-Maximilians University Munich, Munich, Germany
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0QH, UK
- Anders Barth
  - Department of Chemistry, Ludwig-Maximilians University Munich, Munich, Germany
  - Center for NanoScience, Ludwig-Maximilians University Munich, Munich, Germany
  - Department of Bionanoscience, Kavli Institute of Nanoscience Delft, Delft University of Technology, 2629HZ, Delft, The Netherlands
- Don C Lamb
  - Department of Chemistry, Ludwig-Maximilians University Munich, Munich, Germany
  - Center for NanoScience, Ludwig-Maximilians University Munich, Munich, Germany

3
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
New genes arise when existing genes duplicate, mutate, recombine, fuse, or undergo fission, or when genes form de novo, and novel functions emerge as a result during evolution. Researchers have tried to quantify the causes of these molecular diversification processes to understand how genes increase molecular complexity over time, for instance in protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In both prokaryotes and eukaryotes, domain designs have been shown to be retained over significant evolutionary distances. Protein domain architectures are now being used to categorize and distinguish evolutionarily related proteins and to find homologs among species that are evolutionarily distant from one another. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence, and domain position, all of which contribute to the accuracy of protein function prediction. In this review, an attempt is made to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
  - Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
  - Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India

4
Shams A, Higgins SA, Fellmann C, Laughlin TG, Oakes BL, Lew R, Kim S, Lukarska M, Arnold M, Staahl BT, Doudna JA, Savage DF. Comprehensive deletion landscape of CRISPR-Cas9 identifies minimal RNA-guided DNA-binding modules. Nat Commun 2021; 12:5664. [PMID: 34580310 PMCID: PMC8476515 DOI: 10.1038/s41467-021-25992-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/06/2020] [Accepted: 09/10/2021] [Indexed: 11/28/2022] [Open Access]
Abstract
Proteins evolve through the modular rearrangement of elements known as domains. Extant, multidomain proteins are hypothesized to be the result of domain accretion, but there has been limited experimental validation of this idea. Here, we introduce a technique for genetic minimization by iterative size-exclusion and recombination (MISER) for comprehensively making all possible deletions of a protein. Using MISER, we generate a deletion landscape for the CRISPR protein Cas9. We find that the catalytically-dead Streptococcus pyogenes Cas9 can tolerate large single deletions in the REC2, REC3, HNH, and RuvC domains, while still functioning in vitro and in vivo, and that these deletions can be stacked together to engineer minimal, DNA-binding effector proteins. In total, our results demonstrate that extant proteins retain significant modularity from the accretion process and, as genetic size is a major limitation for viral delivery systems, establish a general technique to improve genome editing and gene therapy-based therapeutics.
Affiliation(s)
- Arik Shams
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Sean A Higgins
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Scribe Therapeutics, Alameda, CA, 94501, USA
- Christof Fellmann
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Gladstone Institutes, San Francisco, CA, 94158, USA
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, 94158, USA
- Thomas G Laughlin
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Division of Biological Sciences, University of California, San Diego, San Diego, CA, 92093, USA
- Benjamin L Oakes
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Scribe Therapeutics, Alameda, CA, 94501, USA
- Rachel Lew
  - Gladstone Institutes, San Francisco, CA, 94158, USA
- Shin Kim
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Maria Lukarska
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Madeline Arnold
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Brett T Staahl
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Scribe Therapeutics, Alameda, CA, 94501, USA
- Jennifer A Doudna
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Gladstone Institutes, San Francisco, CA, 94158, USA
  - Graduate Group in Biophysics, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
- David F Savage
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA

5
Yadav A, Fernández-Baca D, Cannon SB. Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families. Evol Bioinform Online 2020; 16:1176934320939943. [PMID: 32694909 PMCID: PMC7350399 DOI: 10.1177/1176934320939943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2020] [Accepted: 06/15/2020] [Indexed: 11/27/2022] [Open Access]
Abstract
Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. The 4 types of domain feature matrices attempt to capture different aspects of domain changes through which the protein sequences may evolve, that is, via gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, or through changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.
Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in glutathione synthase enzyme, an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.
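The four feature types named in the abstract (content, duplication, abundance, versatility) can be made concrete with a small sketch. The definitions below are assumptions chosen for illustration and may differ from the paper's: content is presence or absence, abundance is the number of proteins carrying the domain, duplication is the largest copy number within a single protein, and versatility is the number of distinct adjacent partner domains. The function name `domain_features` and the domain labels `D1` to `D3` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# A proteome is a list of proteins; each protein is its ordered domain list.
Proteome = List[List[str]]


def domain_features(proteome: Proteome) -> Dict[str, Dict[str, int]]:
    """Compute one row of each assumed feature matrix for a single species."""
    abundance = defaultdict(int)    # proteins containing the domain
    duplication = defaultdict(int)  # max copies inside one protein
    partners = defaultdict(set)     # distinct neighbouring domains

    for protein in proteome:
        counts = defaultdict(int)
        for dom in protein:
            counts[dom] += 1
        for dom, copies in counts.items():
            abundance[dom] += 1
            duplication[dom] = max(duplication[dom], copies)
        # Adjacent pairs along the sequence define domain partners.
        for left, right in zip(protein, protein[1:]):
            if left != right:
                partners[left].add(right)
                partners[right].add(left)

    return {
        dom: {
            "content": 1,
            "abundance": abundance[dom],
            "duplication": duplication[dom],
            "versatility": len(partners[dom]),
        }
        for dom in abundance
    }


# Hypothetical toy proteome with three proteins.
toy = [["D1", "D2"], ["D1", "D1", "D3"], ["D2"]]
features = domain_features(toy)
```

Stacking such per-species rows for target and outgroup species gives the species-by-domain matrices on which feature selection or statistical tests could then be run.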
Affiliation(s)
- Akshay Yadav
  - Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA, USA
- Steven B Cannon
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, USA

6
Borriello E, Walker SI, Laubichler MD. Cell phenotypes as macrostates of the GRN dynamics. J Exp Zool B Mol Dev Evol 2020; 334:213-224. [DOI: 10.1002/jez.b.22938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2018] [Revised: 02/16/2020] [Accepted: 02/17/2020] [Indexed: 01/04/2023]
Affiliation(s)
- Enrico Borriello
  - ASU-SFI Center for Biosocial Complex Systems, Arizona State University, Tempe, Arizona
- Sara I. Walker
  - ASU-SFI Center for Biosocial Complex Systems, Arizona State University, Tempe, Arizona
  - Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona
  - School of Earth and Space Exploration, Arizona State University, Tempe, Arizona
  - Blue Marble Space Institute of Science, Seattle, Washington
- Manfred D. Laubichler
  - ASU-SFI Center for Biosocial Complex Systems, Arizona State University, Tempe, Arizona
  - Santa Fe Institute, Santa Fe, New Mexico
  - Marine Biological Laboratory, Woods Hole, Massachusetts
  - School of Life Sciences, Arizona State University, Tempe, Arizona

7
Upadhyay A. Structure of proteins: Evolution with unsolved mysteries. Prog Biophys Mol Biol 2019; 149:160-172. [PMID: 31014967 DOI: 10.1016/j.pbiomolbio.2019.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/08/2019] [Revised: 04/16/2019] [Accepted: 04/19/2019] [Indexed: 02/07/2023]
Abstract
Evolution of macromolecules could be considered as a milestone in the history of life. Nucleic acids are the long stretches of nucleotides that contain all the possible codes and information of life. On the other hand, proteins are their actual translated outcomes, or reflections of modifications in their structure that have occurred at a slow, but steady rate over a very long period of evolution. Over the years of research, biophysicists, biochemists, molecular and structural biologists have unfurled several layers of the structural convolutions in these chemical molecules; however, evolutionists look over their structures through a different prism, which may or may not coincide with others. There remains a need to outline several well-known, but less discussed features of protein structures, like intrinsically disordered states, degron signals and different types of ubiquitin chains providing degradation signals, which help the cellular proteolytic machinery to identify and target the proteins towards degradation pathways. There are several important factors that are critical for the folding of proteins into their native three-dimensional conformations by the cytoplasmic chaperones; but how, in real time, the chaperones fold newly synthesized polypeptide sequences into a particular three-dimensional shape within a fraction of a second is still a mystery for biologists as well as mathematicians. Multiple similar unsolved or unaddressed questions need to be addressed in detail so that future lines of research can dig deeper into the finer details of these protein structures.
Affiliation(s)
- Arun Upadhyay
  - Department of Biochemistry, Central University of Rajasthan, Ajmer, 305817, India

8
Will WR, Brzovic P, Le Trong I, Stenkamp RE, Lawrenz MB, Karlinsey JE, Navarre WW, Main-Hester K, Miller VL, Libby SJ, Fang FC. The Evolution of SlyA/RovA Transcription Factors from Repressors to Countersilencers in Enterobacteriaceae. mBio 2019; 10:e00009-19. [PMID: 30837332 PMCID: PMC6401476 DOI: 10.1128/mbio.00009-19] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/02/2019] [Accepted: 01/29/2019] [Indexed: 02/02/2023] [Open Access]
Abstract
Gene duplication and subsequent evolutionary divergence have allowed conserved proteins to develop unique roles. The MarR family of transcription factors (TFs) has undergone extensive duplication and diversification in bacteria, where they act as environmentally responsive repressors of genes encoding efflux pumps that confer resistance to xenobiotics, including many antimicrobial agents. We have performed structural, functional, and genetic analyses of representative members of the SlyA/RovA lineage of MarR TFs, which retain some ancestral functions, including repression of their own expression and that of divergently transcribed multidrug efflux pumps, as well as allosteric inhibition by aromatic carboxylate compounds. However, SlyA and RovA have acquired the ability to countersilence horizontally acquired genes, which has greatly facilitated the evolution of Enterobacteriaceae by horizontal gene transfer. SlyA/RovA TFs in different species have independently evolved novel regulatory circuits to provide the enhanced levels of expression required for their new role. Moreover, in contrast to MarR, SlyA is not responsive to copper. These observations demonstrate the ability of TFs to acquire new functions as a result of evolutionary divergence of both cis-regulatory sequences and in trans interactions with modulatory ligands.
IMPORTANCE Bacteria primarily evolve via horizontal gene transfer, acquiring new traits such as virulence and antibiotic resistance in single transfer events. However, newly acquired genes must be integrated into existing regulatory networks to allow appropriate expression in new hosts. This is accommodated in part by the opposing mechanisms of xenogeneic silencing and countersilencing. An understanding of these mechanisms is necessary to understand the relationship between gene regulation and bacterial evolution.
Here we examine the functional evolution of an important lineage of countersilencers belonging to the ancient MarR family of classical transcriptional repressors. We show that although members of the SlyA lineage retain some ancestral features associated with the MarR family, their cis-regulatory sequences have evolved significantly to support their new function. Understanding the mechanistic requirements for countersilencing is critical to understanding the pathoadaptation of emerging pathogens and also has practical applications in synthetic biology.
Affiliation(s)
- W Ryan Will
  - Department of Laboratory Medicine, University of Washington, Seattle, Washington, USA
- Peter Brzovic
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
- Isolde Le Trong
  - Department of Biological Structure, University of Washington, Seattle, Washington, USA
- Ronald E Stenkamp
  - Department of Biochemistry, University of Washington, Seattle, Washington, USA
  - Department of Biological Structure, University of Washington, Seattle, Washington, USA
- Matthew B Lawrenz
  - Department of Microbiology and Immunology and the Center for Predictive Medicine for Biodefense and Emerging Infectious Diseases, University of Louisville School of Medicine, Louisville, Kentucky, USA
- Joyce E Karlinsey
  - Department of Microbiology, University of Washington, Seattle, Washington, USA
- William W Navarre
  - Department of Microbiology, University of Washington, Seattle, Washington, USA
- Kara Main-Hester
  - Department of Microbiology, University of Washington, Seattle, Washington, USA
- Virginia L Miller
  - Department of Microbiology and Immunology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Stephen J Libby
  - Department of Laboratory Medicine, University of Washington, Seattle, Washington, USA
- Ferric C Fang
  - Department of Laboratory Medicine, University of Washington, Seattle, Washington, USA
  - Department of Microbiology, University of Washington, Seattle, Washington, USA

9
Krepel D, Levy Y. Intersegmental transfer of proteins between DNA regions in the presence of crowding. Phys Chem Chem Phys 2018; 19:30562-30569. [PMID: 29115315 DOI: 10.1039/c7cp05251k] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Intersegmental transfer that involves direct relocation of a DNA-binding protein from one nonspecific DNA site to another was previously shown to contribute to speeding up the identification of the DNA target site. This mechanism is promoted when the protein is composed of at least two domains that have different DNA binding affinities and thus show a degree of mobility. In this study, we investigate the effect of particle crowding on the ability of a multi-domain protein to perform intersegmental transfer. We show that although crowding conditions often favor 1D diffusion of proteins along DNA over 3D diffusion, relocation of one of the tethered domains to initiate intersegmental transfer is possible even under crowding conditions. The tendency to perform intersegmental transfer by a multi-domain protein under crowding conditions is much higher for larger crowding particles than smaller ones and can be even greater than under no-crowding conditions. We report that the asymmetry of the two domains is even magnified by the crowders. The observations that crowding supports intersegmental transfer serve as another example that in vivo complexity does not necessarily slow down DNA search kinetics by proteins.
Affiliation(s)
- Dana Krepel
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel

10
Tevatia R, Oyler GA. Evolution of DDB1-binding WD40 (DWD) in the viridiplantae. PLoS One 2018; 13:e0190282. [PMID: 29293590 PMCID: PMC5749748 DOI: 10.1371/journal.pone.0190282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/23/2017] [Accepted: 12/11/2017] [Indexed: 12/16/2022] [Open Access]
Abstract
Damaged DNA Binding 1 (DDB1)-binding WD40 (DWD) proteins are highly conserved and involved in a plethora of developmental and physiological processes such as flowering time control, photomorphogenesis, and abiotic stress responses. The phylogeny of this family of proteins in the plants and algae of the Viridiplantae is critical to understanding how the family came to serve such important and diverse functions. We aimed to investigate the putative homologs of DWD in the Viridiplantae and to gain a deeper understanding of DWD evolution. Advances in publicly available genomic data allowed us to perform an extensive genome-wide DWD retrieval. Using annotated Arabidopsis thaliana DWDs as the reference, we generated and characterized a comprehensive DWD database for the studied photoautotrophs. Further, a generic DWD classification system (Type A to K), based on (i) position of DWD motifs, (ii) number of DWD motifs, and (iii) presence/absence of other domains, was adopted. About 72–80% of DWDs have one DWD motif, whereas 17–24% have two and 0.5–4.7% have three DWD motifs. Neighbor-joining phylogenetic construction of A. thaliana DWDs allowed us to sort these substrate receptors into 15 groups. Though the DWD count increases from microalgae to higher land plants, the ratio of DWD to WD40 remained constant throughout the Viridiplantae. The DWD expansion appeared to be the consequence of consistent DWD genetic flow accompanied by several gene duplication events. The network, phylogenetic, and statistical analyses delineated DWD evolutionary relevance in the Viridiplantae.
Affiliation(s)
- Rahul Tevatia
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- George A. Oyler
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Synaptic Research LLC, Baltimore, Maryland, United States of America

11
Abstract
The study of evolutionary relationships among protein sequences was one of the first applications of bioinformatics. Since then, and accompanying the wealth of biological data produced by genome sequencing and other high-throughput techniques, the use of bioinformatics in general and phylogenetics in particular has been gaining ground in the study of protein and proteome evolution. Nowadays, the use of phylogenetics is instrumental not only to infer the evolutionary relationships among species and their genome sequences, but also to reconstruct ancestral states of proteins and proteomes and hence trace the paths followed by evolution. Here I survey recent progress in the elucidation of mechanisms of protein and proteome evolution in which phylogenetics has played a determinant role.
Affiliation(s)
- Toni Gabaldón
  - Bioinformatics Department, Centro de Investigación Principe Felipe

12
Schaeffer RD, Kinch LN, Liao Y, Grishin NV. Classification of proteins with shared motifs and internal repeats in the ECOD database. Protein Sci 2016; 25:1188-203. [PMID: 26833690 DOI: 10.1002/pro.2893] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/09/2015] [Revised: 01/23/2016] [Accepted: 01/27/2016] [Indexed: 12/19/2022]
Abstract
Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain-like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade.
Affiliation(s)
- R Dustin Schaeffer
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Lisa N Kinch
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Yuxing Liao
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Nick V Grishin
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050

13
Cai S, Liu Z, Lee HC. Mean field theory for biology inspired duplication-divergence network model. Chaos 2015; 25:083106. [PMID: 26328557 DOI: 10.1063/1.4928212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
The duplication-divergence network model is generally thought to incorporate key ingredients underlying the growth and evolution of protein-protein interaction networks. Properties of the model have been elucidated through numerous simulation studies. However, a comprehensive theoretical study of the model is lacking. Here, we derived analytic expressions for quantities describing key characteristics of the network (the average degree, the degree distribution, the clustering coefficient, and the neighbor connectivity) in the mean-field, large-N limit of an extended version of the model, duplication-divergence complemented with heterodimerization and addition. We carried out extensive simulations and verified excellent agreement between simulation and theory except for one partial case. All four quantities obeyed power-laws even at moderate network size (N ∼ 10^4), except the degree distribution, which had an additional exponential factor observed to obey power-law. It is shown that our network model can lead to the emergence of scale-free property and hierarchical modularity simultaneously, reproducing the important topological properties of real protein-protein interaction networks.
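The growth rule summarized in the abstract can be sketched as a small simulation. This is an illustrative reading, not the authors' exact model: the two-node seed graph, the asymmetric divergence (only the duplicate loses links), the single-link form of the addition step, and the parameter names `keep_prob`, `dimer_prob`, and `add_prob` are all assumptions.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def grow_network(n_nodes: int, keep_prob: float, dimer_prob: float,
                 add_prob: float, seed: int = 0) -> Graph:
    """Grow an undirected graph by duplication, divergence,
    heterodimerization, and addition (assumed rules, see lead-in)."""
    rng = random.Random(seed)
    adj: Graph = {0: {1}, 1: {0}}  # assumed seed: two linked proteins
    while len(adj) < n_nodes:
        new = len(adj)
        parent = rng.randrange(new)  # duplicate a uniformly chosen node
        adj[new] = set()
        # Divergence: the copy keeps each inherited link with keep_prob.
        for nb in list(adj[parent]):
            if rng.random() < keep_prob:
                adj[new].add(nb)
                adj[nb].add(new)
        # Heterodimerization: the copy may link to its own parent.
        if rng.random() < dimer_prob:
            adj[new].add(parent)
            adj[parent].add(new)
        # Addition: the copy may gain one link to a random existing node.
        if rng.random() < add_prob:
            other = rng.randrange(new)
            adj[new].add(other)
            adj[other].add(new)
    return adj


def average_degree(adj: Graph) -> float:
    """Mean number of links per node."""
    return sum(len(nbs) for nbs in adj.values()) / len(adj)


net = grow_network(200, keep_prob=0.4, dimer_prob=0.1, add_prob=0.3, seed=1)
mean_k = average_degree(net)
```

Quantities such as the degree distribution or clustering coefficient could be measured on the returned adjacency map and compared against mean-field predictions for larger `n_nodes`.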
Affiliation(s)
- Shuiming Cai
- Faculty of Science, Jiangsu University, Zhenjiang 212013, China
- Zengrong Liu
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
- H C Lee
- Institute of Systems Biology and Bioinformatics, National Central University, Zhongli, 32001 Taiwan
|
14
|
Espinoza-Valles I, Vora GJ, Lin B, Leekitcharoenphon P, González-Castillo A, Ussery D, Høj L, Gomez-Gil B. Unique and conserved genome regions in Vibrio harveyi and related species in comparison with the shrimp pathogen Vibrio harveyi CAIM 1792. Microbiology-SGM 2015. [PMID: 26198743 DOI: 10.1099/mic.0.000141] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]
Abstract
Vibrio harveyi CAIM 1792 is a marine bacterial strain that causes mortality in farmed shrimp in north-west Mexico, and the identification of virulence genes in this strain is important for understanding its pathogenicity. The aim of this work was to compare the V. harveyi CAIM 1792 genome with related genome sequences to determine their phylogenetic relationship and explore unique regions in silico that differentiate this strain from other V. harveyi strains. Twenty-one newly sequenced genomes were compared in silico against the CAIM 1792 genome at the nucleotide and predicted proteome levels. The proteome of CAIM 1792 had higher similarity to those of other V. harveyi strains (78%) than to those of the other closely related species Vibrio owensii (67%), Vibrio rotiferianus (63%) and Vibrio campbellii (59%). Pan-genome ORFans trees showed the best fit with the accepted phylogeny based on DNA-DNA hybridization and multi-locus sequence analysis of 11 concatenated housekeeping genes. SNP analysis clustered 34/38 genomes within their accepted species. The pangenomic and SNP trees showed that V. harveyi is the most conserved of the four species studied and V. campbellii may be divided into at least three subspecies, supported by intergenomic distance analysis. blastp atlases were created to identify unique regions among the genomes most related to V. harveyi CAIM 1792; these regions included genes encoding glycosyltransferases, specific type restriction-modification systems and a transcriptional regulator, LysR, reported to be involved in virulence, metabolism, quorum sensing and motility.
Affiliation(s)
- Gary J Vora
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC, USA
- Baochuan Lin
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC, USA
- Pimlapas Leekitcharoenphon
- National Food Institute, Division for Epidemiology and Microbial Genomics, Technical University of Denmark, Kongens Lyngby, Denmark; Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark
- Dave Ussery
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark; Comparative Genomics group, Biosciences Division, Oak Ridge National Labs, Oak Ridge, Tennessee, USA
- Lone Høj
- Australian Institute of Marine Science, Townsville, Queensland, Australia
- Bruno Gomez-Gil
- CIAD A.C., Mazatlán Unit for Aquaculture, Mazatlán, Sinaloa, Mexico
|
15
|
Multiple nucleophilic elbows leading to multiple active sites in a single module esterase from Sorangium cellulosum. J Struct Biol 2015; 190:314-27. [DOI: 10.1016/j.jsb.2015.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/05/2014] [Revised: 03/25/2015] [Accepted: 04/10/2015] [Indexed: 11/17/2022]
|
16
|
Yadav A, Jalan S. Origin and implications of zero degeneracy in networks spectra. Chaos (Woodbury, N.Y.) 2015; 25:043110. [PMID: 25933658 DOI: 10.1063/1.4917286] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 05/11/2023]
Abstract
The spectra of many real-world networks exhibit properties that differ from those of random networks generated using various models. One such property is the existence of a very high degeneracy at the zero eigenvalue. In this work, we provide all the possible reasons behind the occurrence of the zero degeneracy in the network spectra, namely, the complete and partial duplications, as well as their implications. The power-law degree sequence and preferential attachment are the properties that enhance the occurrence of such duplications and hence lead to the zero degeneracy. A comparison of the zero degeneracy in protein-protein interaction networks of six different species and in their corresponding model networks indicates the importance of the degree sequences and the power-law exponent for the occurrence of zero degeneracy.
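The link between complete duplication and a zero eigenvalue can be checked on a toy graph. Two nodes with identical neighbor sets give identical rows in the adjacency matrix, so the matrix loses rank, and for a symmetric matrix the multiplicity of eigenvalue 0 equals n minus the rank. The sketch below uses exact rational elimination; the example graphs and function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor != 0:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def zero_degeneracy(adjacency):
    """Multiplicity of eigenvalue 0 for a symmetric adjacency matrix."""
    return len(adjacency) - rank(adjacency)


# Path graph 0-1-2-3: the adjacency matrix is nonsingular.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Same graph plus node 4, a complete duplicate of node 0
# (both have node 1 as their only neighbor).
duplicated = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]

print(zero_degeneracy(path))        # 0
print(zero_degeneracy(duplicated))  # 1
```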
Affiliation(s)
- Alok Yadav
- Complex Systems Lab, Discipline of Physics, Indian Institute of Technology Indore, Indore 452017, India
- Sarika Jalan
- Complex Systems Lab, Discipline of Physics, Indian Institute of Technology Indore, Indore 452017, India
|
17
|
How do regulatory networks evolve and expand throughout evolution? Curr Opin Biotechnol 2015; 34:180-8. [PMID: 25723843 DOI: 10.1016/j.copbio.2015.02.001] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 09/03/2014] [Revised: 02/04/2015] [Accepted: 02/04/2015] [Indexed: 11/23/2022]
Abstract
Throughout evolution, regulatory networks need to expand and adapt to accommodate novel genes and gene functions. However, the molecular details explaining how gene networks evolve remain largely unknown. Recent studies demonstrate that changes in transcription factors contribute to the evolution of regulatory networks. In particular, duplication of transcription factors followed by specific mutations in their DNA-binding or interaction domains propels the divergence and emergence of new networks. The innate promiscuity and modularity of regulatory networks contribute to their evolvability: duplicated promiscuous regulators and their target promoters can acquire mutations that lead to gradual increases in specificity, allowing neofunctionalization or subfunctionalization.
|
18
|
Bhattacherjee A, Levy Y. Search by proteins for their DNA target site: 2. The effect of DNA conformation on the dynamics of multidomain proteins. Nucleic Acids Res 2014; 42:12415-24. [PMID: 25324311 PMCID: PMC4227779 DOI: 10.1093/nar/gku933] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 07/13/2014] [Revised: 09/22/2014] [Accepted: 09/24/2014] [Indexed: 11/14/2022]
Abstract
Multidomain transcription factors, which are especially abundant in eukaryotic genomes, can accelerate the search kinetics for a target site because they can perform intersegment transfer via the monkey-bar mechanism, in which the protein forms a bridged intermediate between two distant DNA regions. Monkey-bar dynamics depend strongly on the properties of the multidomain protein (the affinity of each of the constituent domains to the DNA and the length of the linker) and of the DNA molecules (their inter-distance and inter-angle). In this study, we use coarse-grained molecular dynamics simulations to investigate how the local conformation of the DNA may affect the DNA search performed by the multidomain protein Pax6 in comparison to that of the isolated domains. Our results suggest that in addition to the common rotation-coupled translation along the DNA major groove, for curved DNA the tethered domains may slide in a rotation-decoupled sliding mode. Furthermore, the multidomain proteins move by longer jumps on curved DNA compared with those performed by the single domain protein. The long jumps originate from the DNA curvature bringing two sequentially distant DNA sites into close proximity with each other, and they suggest that multidomain proteins may move on highly curved DNA faster than on linear DNA.
Affiliation(s)
- Arnab Bhattacherjee
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Yaakov Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
19
|
Pougach K, Voet A, Kondrashov FA, Voordeckers K, Christiaens JF, Baying B, Benes V, Sakai R, Aerts J, Zhu B, Van Dijck P, Verstrepen KJ. Duplication of a promiscuous transcription factor drives the emergence of a new regulatory network. Nat Commun 2014; 5:4868. [PMID: 25204769 PMCID: PMC4172970 DOI: 10.1038/ncomms5868] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 06/18/2014] [Accepted: 07/31/2014] [Indexed: 11/08/2022]
Abstract
The emergence of new genes throughout evolution requires rewiring and extension of regulatory networks. However, the molecular details of how the transcriptional regulation of new gene copies evolves remain largely unexplored. Here we show how duplication of a transcription factor gene allowed the emergence of two independent regulatory circuits. Interestingly, the ancestral transcription factor was promiscuous and could bind different motifs in its target promoters. After duplication, one paralogue evolved increased binding specificity so that it only binds one type of motif, whereas the other copy evolved a decreased activity so that it only activates promoters that contain multiple binding sites. Interestingly, only a few mutations in both the DNA-binding domains and in the promoter binding sites were required to gradually disentangle the two networks. These results reveal how duplication of a promiscuous transcription factor followed by concerted cis and trans mutations allows expansion of a regulatory network.
Affiliation(s)
- Ksenia Pougach
- Laboratory for Genetics and Genomics, Department M2S, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, B-3001 Leuven, Belgium
- Laboratory for Systems biology, Vlaams Instituut voor Biotechnologie (VIB), B-3001 Leuven, Belgium
- Arnout Voet
- Structural Bioinformatics, Center for Life Science Technologies (CLST), RIKEN, 230-0045 Yokohama, Japan
- Fyodor A. Kondrashov
- Laboratory of Evolutionary Genomics, Centre for genomic regulation (CRG), 08003 Barcelona, Spain
- Karin Voordeckers
- Laboratory for Genetics and Genomics, Department M2S, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, B-3001 Leuven, Belgium
- Laboratory for Systems biology, Vlaams Instituut voor Biotechnologie (VIB), B-3001 Leuven, Belgium
- Joaquin F. Christiaens
- Laboratory for Genetics and Genomics, Department M2S, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, B-3001 Leuven, Belgium
- Laboratory for Systems biology, Vlaams Instituut voor Biotechnologie (VIB), B-3001 Leuven, Belgium
- Bianka Baying
- Genomics Core Facility, European Molecular Biology Laboratory Heidelberg (EMBL), 69117 Heidelberg, Germany
- Vladimir Benes
- Genomics Core Facility, European Molecular Biology Laboratory Heidelberg (EMBL), 69117 Heidelberg, Germany
- Ryo Sakai
- Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, B-3001 Leuven, Belgium
- iMinds Medical Information Technologies Department, KU Leuven, B-3001 Leuven, Belgium
- Jan Aerts
- Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, B-3001 Leuven, Belgium
- iMinds Medical Information Technologies Department, KU Leuven, B-3001 Leuven, Belgium
- Bo Zhu
- Laboratory for Genetics and Genomics, Department M2S, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, B-3001 Leuven, Belgium
- Laboratory for Systems biology, Vlaams Instituut voor Biotechnologie (VIB), B-3001 Leuven, Belgium
- Patrick Van Dijck
- Molecular Microbiology and Biotechnology Section, KU Leuven, B-3001 Leuven, Belgium
- Department of Molecular Microbiology, VIB, B-3001 Leuven, Belgium
- Kevin J. Verstrepen
- Laboratory for Genetics and Genomics, Department M2S, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, B-3001 Leuven, Belgium
- Laboratory for Systems biology, Vlaams Instituut voor Biotechnologie (VIB), B-3001 Leuven, Belgium
|
20
|
Vuzman D, Levy Y. The “Monkey-Bar” Mechanism for Searching for the DNA Target Site: The Molecular Determinants. Isr J Chem 2014. [DOI: 10.1002/ijch.201400107] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/31/2023]
|
21
|
Interface-resolved network of protein-protein interactions. PLoS Comput Biol 2013; 9:e1003065. [PMID: 23696724 PMCID: PMC3656101 DOI: 10.1371/journal.pcbi.1003065] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 12/14/2012] [Accepted: 04/08/2013] [Indexed: 12/01/2022]
Abstract
We define an interface-interaction network (IIN) to capture the specificity and competition between protein-protein interactions (PPI). This new type of network represents interactions between individual interfaces used in functional protein binding and thereby contains the detail necessary to describe the competition and cooperation between any pair of binding partners. Here we establish a general framework for the construction of IINs that merges computational structure-based interface assignment with careful curation of available literature. To complement limited structural data, the inclusion of biochemical data is critical for achieving the accuracy and completeness necessary to analyze the specificity and competition between the protein interactions. Firstly, this procedure provides a means to clarify the information content of existing data on purported protein interactions and to remove indirect and spurious interactions. Secondly, the IIN we have constructed here for proteins involved in clathrin-mediated endocytosis (CME) exhibits distinctive topological properties. In contrast to PPI networks with their global and relatively dense connectivity, the fragmentation of the IIN into distinctive network modules suggests that different functional pressures act on the evolution of its topology. Large modules in the IIN are formed by interfaces sharing specificity for certain domain types, such as SH3 domains distributed across different proteins. The shared and distinct specificity of an interface is necessary for effective negative and positive design of highly selective binding targets. Lastly, the organization of detailed structural data in a network format allows one to identify pathways of specific binding interactions and thereby predict effects of mutations at specific surfaces on a protein and of specific binding inhibitors, as we explore in several examples. 
Overall, the endocytosis IIN is remarkably complex and rich in features masked in the coarser PPI, and collects relevant detail of protein association in a readily interpretable format.
Much of the work inside the cell is carried out by proteins interacting with other proteins. Each edge in a protein-protein interaction network reflects these functional interactions and each node a separate protein, creating a complex structure that nevertheless follows well-established global and local patterns related to robust protein function. However, this network is not detailed enough to assess whether a particular protein can bind multiple interaction partners simultaneously through distinct interfaces, or whether the partners targeting a specific interface share similar structural or chemical properties. By breaking each protein node into its constituent interface nodes, we generate and assess such a detailed new network. To sample protein binding interactions broadly and accurately beyond those seen in crystal structures, our method combines computational interface assignment with data from biochemical studies. Using this approach we are able to assign interfaces to the majority of known interactions between proteins involved in the clathrin-mediated endocytosis pathway in yeast. Analysis of this interface-interaction network provides novel insights into the functional specificity of protein interactions, and highlights elements of cooperativity and competition among the proteins. By identifying diverse multi-protein complexes, interface-interaction networks also provide a map for targeted drug development.
|
22
|
Roach JM, Racioppi L, Jones CD, Masci AM. Phylogeny of Toll-like receptor signaling: adapting the innate response. PLoS One 2013; 8:e54156. [PMID: 23326591 PMCID: PMC3543326 DOI: 10.1371/journal.pone.0054156] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 09/28/2012] [Accepted: 12/10/2012] [Indexed: 02/06/2023]
Abstract
The Toll-like receptors represent a largely evolutionarily conserved pathogen recognition machinery responsible for recognition of bacterial, fungal, protozoan, and viral pathogen-associated microbial patterns and initiation of the inflammatory response. Structurally, the Toll-like receptors are composed of an extracellular leucine-rich repeat domain and a cytoplasmic Toll/Interleukin 1 receptor domain. Recognition takes place in the extracellular domain, whereas the cytoplasmic domain triggers a complex signal network required to sustain an appropriate immune response. Signal transduction is regulated by the recruitment of different intracellular adaptors. The Toll-like receptors can be grouped, depending on the usage of the adaptor MyD88, into MyD88-dependent and MyD88-independent subsets. Herein, we present a unique phylogenetic analysis of domain regions of these receptors and their cognate signaling adaptor molecules. Although previously unclear from the phylogeny of full-length receptors, these analyses indicate a separate evolutionary origin for the MyD88-dependent and MyD88-independent signaling pathways and provide evidence of a common ancestor for the vertebrate and invertebrate orthologs of the adaptor molecule MyD88. Together these observations suggest a very ancient origin of the MyD88-dependent pathway. Additionally, we show that early duplications gave rise to several adaptor molecule families. In some cases there is also a strong pattern of parallel duplication between adaptor molecules and their corresponding TLR. Our results further support the hypothesis that the phylogeny of specific domains involved in a signaling pathway can shed light on key processes that link the innate to the adaptive immune response.
Affiliation(s)
- Jeffrey M. Roach
- Research Computing Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Luigi Racioppi
- Department of Medicine, Duke University, Durham, North Carolina, United States of America
- Department of Cellular and Molecular Biology and Pathology, University of Naples Federico II, Naples, Italy
- Corbin D. Jones
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Anna Maria Masci
- Department of Immunology, Duke University, Durham, North Carolina, United States of America
|
23
|
Sasidharan R, Nepusz T, Swarbreck D, Huala E, Paccanaro A. GFam: a platform for automatic annotation of gene families. Nucleic Acids Res 2012; 40:e152. [PMID: 22790981 PMCID: PMC3479161 DOI: 10.1093/nar/gks631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
Affiliation(s)
- Rajkumar Sasidharan
- Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA.
|
24
|
Light S, Sagit R, Ithychanda SS, Qin J, Elofsson A. The evolution of filamin - a protein domain repeat perspective. J Struct Biol 2012; 179:289-98. [PMID: 22414427 DOI: 10.1016/j.jsb.2012.02.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 09/05/2011] [Revised: 02/03/2012] [Accepted: 02/15/2012] [Indexed: 10/28/2022]
Abstract
Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, may serve to avert aggregation between adjacent domains, or may be the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin-binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin-binding domains. The most notable case is leech filamin, which contains a 20-repeat expansion and exhibits a unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates.
Affiliation(s)
- Sara Light
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Science for Life Laboratory, Bioinformatics Infrastructure for Life Sciences, Stockholm University, SE-17121 Solna, Sweden
|
25
|
Zhang XC, Wang Z, Zhang X, Le MH, Sun J, Xu D, Cheng J, Stacey G. Evolutionary dynamics of protein domain architecture in plants. BMC Evol Biol 2012; 12:6. [PMID: 22252370 PMCID: PMC3310802 DOI: 10.1186/1471-2148-12-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 06/23/2011] [Accepted: 01/17/2012] [Indexed: 12/17/2022]
Abstract
Background: Protein domains are the structural, functional and evolutionary units of the protein. Protein domain architectures are the linear arrangements of domain(s) in individual proteins. Although the evolutionary history of protein domain architecture has been extensively studied in microorganisms, the evolutionary dynamics of domain architecture in the plant kingdom remains largely undefined. To address this question, we analyzed the lineage-based protein domain architecture content in 14 completed green plant genomes.
Results: Our analyses show that all 14 plant genomes maintain similar distributions of species-specific, single-domain, and multi-domain architectures. Approximately 65% of plant domain architectures are universally present in all plant lineages, while the remaining architectures are lineage-specific. Clear examples are seen of both the loss and gain of specific protein architectures in higher plants. There has been a dynamic, lineage-wise expansion of domain architectures during plant evolution. The data suggest that this expansion can be largely explained by changes in nuclear ploidy resulting from rounds of whole genome duplications. Indeed, there has been a decrease in the number of unique domain architectures when the genomes were normalized into a presumed ancestral genome that has not undergone whole genome duplications.
Conclusions: Our data show the conservation of universal domain architectures in all available plant genomes, indicating the presence of an evolutionarily conserved, core set of protein components. However, the occurrence of lineage-specific domain architectures indicates that domain architecture diversity has been maintained beyond these core components in plant genomes. Although several features of genome-wide domain architecture content are conserved in plants, the data clearly demonstrate lineage-wise, progressive changes and expansions of individual protein domain architectures, reinforcing the notion that plant genomes have undergone dynamic evolution.
Affiliation(s)
- Xue-Cheng Zhang
- Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA.
|
26
|
Vuzman D, Levy Y. Intrinsically disordered regions as affinity tuners in protein–DNA interactions. Mol Biosyst 2012; 8:47-57. [DOI: 10.1039/c1mb05273j] [Citation(s) in RCA: 154] [Impact Index Per Article: 12.8] [Indexed: 01/02/2023]
|
27
|
Ghosh K, Dill K. Cellular proteomes have broad distributions of protein stability. Biophys J 2010; 99:3996-4002. [PMID: 21156142 DOI: 10.1016/j.bpj.2010.10.036] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 05/25/2010] [Revised: 10/11/2010] [Accepted: 10/18/2010] [Indexed: 12/01/2022]
Abstract
Biological cells are extremely sensitive to temperature. What is the mechanism? We compute the thermal stabilities of the whole proteomes of Escherichia coli, yeast, and Caenorhabditis elegans using an analytical model and an extensive database of stabilities of individual proteins. Our results support the hypothesis that a cell's thermal sensitivities arise from the collective instability of its proteins. This model shows a denaturation catastrophe at temperatures of 49-55 °C, roughly the thermal death point of mesophiles. Cells live on the edge of a proteostasis catastrophe. According to the model, it is not that the average protein is problematic; it is the tail of the distribution. About 650 of E. coli's 4300 proteins are stable to denaturation by less than 4 kcal mol^-1. An upshift of only 4 °C, from 37 to 41 °C, is estimated to destabilize an average protein by nearly 20%. This model also treats effects of denaturants, osmolytes, and other physical stressors. In addition, it predicts the dependence of cellular growth rates on temperature. This approach may be useful for studying physical forces in biological evolution and the role of climate change on biology.
Affiliation(s)
- Kingshuk Ghosh
- Department of Physics and Astronomy, University of Denver, Denver, Colorado, USA.
|
28
|
Vuzman D, Polonsky M, Levy Y. Facilitated DNA search by multidomain transcription factors: cross talk via a flexible linker. Biophys J 2010; 99:1202-11. [PMID: 20713004 DOI: 10.1016/j.bpj.2010.06.007] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 04/26/2010] [Revised: 05/30/2010] [Accepted: 06/02/2010] [Indexed: 10/19/2022]
Abstract
More than 70% of eukaryotic proteins are composed of multiple domains. However, most studies of the search for DNA focus on individual protein domains and do not consider potential cross talk within a multidomain transcription factor. In this study, the molecular features of the DNA search mechanism were explored for two multidomain transcription factors: human Pax6 and Oct-1. Using a simple computational model, we compared a DNA search of multidomain proteins with a search of isolated domains. Furthermore, we studied how manipulating the binding affinity of a single domain to DNA can affect the overall DNA search of the multidomain protein. Tethering the two domains via a flexible linker increases their affinity to the DNA, resulting in a higher propensity for sliding along the DNA, which is more significant for the domain with the weaker DNA-binding affinity. In this case, the domain that binds DNA more tightly anchors the multidomain protein to the DNA and, via the linker, increases the local concentration of the weak DNA-binding domain (DBD). The tethered domains directly exchange between two parallel DNA molecules via a bridged intermediate, where intersegmental transfer is promoted by the weaker DBD. We found that, in general, the relative affinity of the two domains can significantly affect the cross talk between them and thus their overall capability to search DNA efficiently. The results we obtained by examining various multidomain DNA-binding proteins support the necessity of discrepancies between the DNA-binding affinities of the constituent domains.
Affiliation(s)
- Dana Vuzman
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
|
29
|
Abstract
From a comparatively small number of protein structural domains a staggering array of structural variants has evolved which has, in turn, facilitated an expanse of functional derivatives. Herein I review the primary mechanisms which have contributed to the vastness of our existing, and expanding, protein repertoires.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Cork Institute of Technology.
30
Farré D, Albà MM. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol 2009; 27:325-35. [PMID: 19822635 DOI: 10.1093/molbev/msp242] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022] Open
Abstract
Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies. This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.
31
Treangen TJ, Abraham AL, Touchon M, Rocha EPC. Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol Rev 2009; 33:539-71. [PMID: 19396957 DOI: 10.1111/j.1574-6976.2009.00169.x] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 02/06/2023] Open
Abstract
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.
32
Yosef N, Kupiec M, Ruppin E, Sharan R. A complex-centric view of protein network evolution. Nucleic Acids Res 2009; 37:e88. [PMID: 19465379 PMCID: PMC2709590 DOI: 10.1093/nar/gkp414] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
The recent availability of protein-protein interaction networks for several species makes it possible to study protein complexes in an evolutionary context. In this article, we present a novel network-based framework for reconstructing the evolutionary history of protein complexes. Our analysis is based on generalizing evolutionary measures for single proteins to the level of whole subnetworks, comprehensively considering a broad set of computationally derived complexes and accounting for both sequence and interaction changes. Specifically, we compute sets of orthologous complexes across species, and use these to derive evolutionary rate and age measures for protein complexes. We observe significant correlations between the evolutionary properties of a complex and those of its member proteins, suggesting that protein complexes form early in evolution and evolve as coherent units. Additionally, our approach enables us to directly quantify the extent to which gene duplication has played a role in the evolution of complexes. We find that about one quarter of the sets of orthologous complexes have originated from evolutionary cores of homodimers that underwent duplication and divergence, testifying to the important role of gene duplication in protein complex evolution.
Affiliation(s)
- Nir Yosef
- The Blavatnik School of Computer Science, Department of Molecular Microbiology and Biotechnology and School of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel
33
Zhang G, Ignatova Z. Generic algorithm to predict the speed of translational elongation: implications for protein biogenesis. PLoS One 2009; 4:e5036. [PMID: 19343177 PMCID: PMC2661179 DOI: 10.1371/journal.pone.0005036] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 01/11/2009] [Accepted: 03/03/2009] [Indexed: 11/27/2022] Open
Abstract
Synonymous codon usage and variations in the level of isoaccepting tRNAs exert a powerful selective force on translation fidelity. We have developed an algorithm to evaluate the relative rate of translation which allows large-scale comparisons of the non-uniform translation rate on the protein biogenesis. Using the complete genomes of Escherichia coli and Bacillus subtilis we show that stretches of codons pairing to minor tRNAs form putative sites to locally attenuate translation; thereby the tendency is to cluster in near proximity whereas long contiguous stretches of slow-translating triplets are avoided. The presence of slow-translating segments positively correlates with the protein length irrespective of the protein abundance. The slow-translating clusters are predominantly located down-stream of the domain boundaries presumably to fine-tune translational accuracy with the folding fidelity of multidomain proteins. Translation attenuation patterns at highly structurally and functionally conserved domains are preserved across the species suggesting a concerted selective pressure on the codon selection and species-specific tRNA abundance in these regions.
Affiliation(s)
- Gong Zhang
- Department of Biochemistry, Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, Germany
- Zoya Ignatova
- Department of Biochemistry, Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, Germany
34
Abstract
It has been known for more than 35 years that, during evolution, new proteins are formed by gene duplications, sequence and structural divergence and, in many cases, gene combinations. The genome projects have produced complete, or almost complete, descriptions of the protein repertoires of over 600 distinct organisms. Analyses of these data have dramatically increased our understanding of the formation of new proteins. At the present time, we can accurately trace the evolutionary relationships of about half the proteins found in most genomes, and it is these proteins that we discuss in the present review. Usually, the units of evolution are protein domains that are duplicated, diverge and form combinations. Small proteins contain one domain, and large proteins contain combinations of two or more domains. Domains descended from a common ancestor are clustered into superfamilies. In most genomes, the net growth of superfamily members means that more than 90% of domains are duplicates. In a section on domain duplications, we discuss the number of currently known superfamilies, their size and distribution, and superfamily expansions related to biological complexity and to specific lineages. In a section on divergence, we describe how sequences and structures diverge, the changes in stability produced by acceptable mutations, and the nature of functional divergence and selection. In a section on domain combinations, we discuss their general nature, the sequential order of domains, how combinations modify function, and the extraordinary variety of the domain combinations found in different genomes. We conclude with a brief note on other forms of protein evolution and speculations of the origins of the duplication, divergence and combination processes.
35
Kummerfeld SK, Teichmann SA. Protein domain organisation: adding order. BMC Bioinformatics 2009; 10:39. [PMID: 19178743 PMCID: PMC2657131 DOI: 10.1186/1471-2105-10-39] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 05/07/2008] [Accepted: 01/29/2009] [Indexed: 11/30/2022] Open
Abstract
Background: Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved.
Results: We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation.
Conclusion: Taking into account the order of domains, we have derived a novel picture of global protein organization. We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and reverse orientation in different proteins relative to random graphs with identical degree distributions. While these features were statistically over-represented, they are still fairly rare. Looking in detail at the proteins involved, we found strong functional relationships within each cluster. In addition, the domains tended to be involved in protein-protein interaction and are able to function as independent structural units. A particularly striking example was the human Jak-STAT signalling pathway which makes use of a set of domains in a range of orders and orientations to provide nuanced signaling functionality. This illustrated the importance of functional and structural constraints (or lack thereof) on domain organisation.
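The directed-graph representation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy architectures and the domain labels "A" to "D" are hypothetical, and the comparison against random graphs with identical degree distributions is not attempted here. The sketch only builds a directed edge for each pair of adjacent domains (N to C terminal order) and reports pairs that occur in both orders in different proteins.

```python
from collections import defaultdict


def build_domain_graph(architectures):
    """Map each directed edge (upstream, downstream) to the proteins containing it.

    `architectures` maps a protein id to its domains in N-to-C order.
    """
    edges = defaultdict(set)
    for protein_id, domains in architectures.items():
        # Adjacent domains define one directed edge each.
        for upstream, downstream in zip(domains, domains[1:]):
            edges[(upstream, downstream)].add(protein_id)
    return edges


def bidirectional_pairs(edges):
    """Return domain pairs seen in both orders, in at least two different proteins."""
    pairs = set()
    for (a, b), forward in edges.items():
        reverse = edges.get((b, a))
        if a == b or not reverse:
            continue
        # More than one protein overall means the two orders occur in different proteins.
        if len(forward | reverse) > 1:
            pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical toy data: domain labels are placeholders, not real families.
architectures = {
    "p1": ["A", "B", "C"],
    "p2": ["B", "A"],
    "p3": ["A", "B"],
    "p4": ["C", "D", "C"],  # both orders of C/D, but within one protein only
}
graph = build_domain_graph(architectures)
found = bidirectional_pairs(graph)
print(found)
```

In this toy input only the pair A/B qualifies, because the C/D reversal happens inside a single protein. Whether a real analysis should count such within-protein reversals is a design choice that the abstract does not settle.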
Affiliation(s)
- Sarah K Kummerfeld
- Department of Developmental Biology, 279 Campus Dr, Stanford, 94305, CA, USA.
36
Isalan M, Lemerle C, Michalodimitrakis K, Horn C, Beltrao P, Raineri E, Garriga-Canut M, Serrano L. Evolvability and hierarchy in rewired bacterial gene networks. Nature 2008; 452:840-5. [PMID: 18421347 PMCID: PMC2666274 DOI: 10.1038/nature06847] [Citation(s) in RCA: 240] [Impact Index Per Article: 15.0] [Received: 12/15/2007] [Accepted: 02/22/2008] [Indexed: 11/09/2022]
Abstract
Sequencing DNA from several organisms has revealed that duplication and drift of existing genes have primarily moulded the contents of a given genome. Though the effect of knocking out or overexpressing a particular gene has been studied in many organisms, no study has systematically explored the effect of adding new links in a biological network. To explore network evolvability, we constructed 598 recombinations of promoters (including regulatory regions) with different transcription or sigma-factor genes in Escherichia coli, added over a wild-type genetic background. Here we show that approximately 95% of new networks are tolerated by the bacteria, that very few alter growth, and that expression level correlates with factor position in the wild-type network hierarchy. Most importantly, we find that certain networks consistently survive over the wild type under various selection pressures. Therefore new links in the network are rarely a barrier for evolution and can even confer a fitness advantage.
Affiliation(s)
- Mark Isalan
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), UPF, 08003 Barcelona, Spain.
37
Guimarães KS, Przytycka TM. Interrogating domain-domain interactions with parsimony based approaches. BMC Bioinformatics 2008; 9:171. [PMID: 18366803 PMCID: PMC2358894 DOI: 10.1186/1471-2105-9-171] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 07/25/2007] [Accepted: 03/26/2008] [Indexed: 12/17/2022] Open
Abstract
Background: The identification and characterization of interacting domain pairs is an important step towards understanding protein interactions. In the last few years, several methods to predict domain interactions have been proposed. Understanding the power and the limitations of these methods is key to the development of improved approaches and better understanding of the nature of these interactions.
Results: Building on the previously published Parsimonious Explanation method (PE) to predict domain-domain interactions, we introduced a new Generalized Parsimonious Explanation (GPE) method, which (i) adjusts the granularity of the domain definition to the granularity of the input data set and (ii) permits domain interactions to have different costs. This allowed for preferential selection of the so-called "co-occurring domains" as possible mediators of interactions between proteins. The performance of both variants of the parsimony method is competitive with that of the top algorithms for this problem, even though parsimony methods use less information than some of the other methods. We also examined possible enrichment of co-occurring domains and homo-domains among domain interactions mediating the interaction of proteins in the network. The corresponding study was performed by surveying domain interactions predicted by the GPE method as well as by using a combinatorial counting approach independent of any prediction method. Our findings indicate that, while there is a considerable propensity towards these special domain pairs among predicted domain interactions, this overrepresentation is significantly lower than in the iPfam dataset.
Conclusion: The Generalized Parsimonious Explanation approach provides a new means to predict and study domain-domain interactions. We showed that, under the assumption that all protein interactions in the network are mediated by domain interactions, there exists a significant deviation of the properties of domain interactions mediating interactions in the network from that of iPfam data.
Affiliation(s)
- Katia S Guimarães
- National Center of Biotechnology, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
38
Hülter N, Wackernagel W. Double illegitimate recombination events integrate DNA segments through two different mechanisms during natural transformation of Acinetobacter baylyi. Mol Microbiol 2008; 67:984-95. [PMID: 18194157 DOI: 10.1111/j.1365-2958.2007.06096.x] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]
Abstract
Acquisition of foreign DNA by horizontal gene transfer is seen as a major source of genetic diversity in prokaryotes. However, strongly divergent DNA is not genomically integrated by homologous recombination and would depend on illegitimate recombination (IR) events which are rare. We show that, by two mechanisms, during natural transformation of Acinetobacter baylyi two IR events can integrate DNA segments. One mechanism is double illegitimate recombination (DIR) acting in the absence of any homology (frequency: 7 × 10^-13 per cell). It occurs about 10^10-fold less frequently than homologous transformation. The other mechanism is homology-facilitated double illegitimate recombination (HFDIR) being about 440-fold more frequent (3 × 10^-10 per cell) than DIR. HFDIR depends on a homologous sequence located between the IR sites and on recA+. In HFDIR two IR events act on the same donor DNA molecule as shown by the joint inheritance of molecular DNA tags. While the IR events in HFDIR occurred at microhomologies, in DIR microhomologies were not used. The HFDIR phenomenon indicates that a temporal recA-dependent association of donor DNA at a homology in recipient DNA may facilitate two IR events on the 5' and 3' heterologous parts of the transforming DNA molecule.
Affiliation(s)
- Nils Hülter
- Genetics, Department of Biology and Environmental Sciences, Carl von Ossietzky University Oldenburg, D-26111 Oldenburg, Germany
39
Carter P, Lee D, Orengo C. Chapter 1. Target selection in structural genomics projects to increase knowledge of protein structure and function space. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2008; 75:1-52. [PMID: 20731988 DOI: 10.1016/s0065-3233(07)75001-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
Structural genomics aims to solve the three-dimensional structures of proteins at a rapid rate and in a cost-effective manner, with the hope of significantly impacting on the life sciences, biotechnology, and drug discovery in the long-term. Structural genomics initiatives started in Japan in 1997 with the advent of the Protein Folds Project. Since then many new initiatives have begun worldwide, with diverse aims motivating the selection of proteins for structure determination. In this chapter, we consider the biological goals of high-throughput structural biology, while focusing on the Protein Structure Initiative in the United States. This is the most productive of the structural genomics initiatives, having solved 3,363 new structures between September 2000 and October 2008.
Affiliation(s)
- Phil Carter
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
40
Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol 2007; 8:R51. [PMID: 17411433 PMCID: PMC1895999 DOI: 10.1186/gb-2007-8-4-r51] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Received: 10/03/2006] [Revised: 01/15/2007] [Accepted: 04/05/2007] [Indexed: 12/02/2022] Open
Abstract
A study of yeast protein complexes, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pair-wise protein interactions in the networks of several organisms revealed that duplication of homomeric interactions often results in the formation of complexes of paralogous proteins.
Background: Cellular functions are accomplished by the concerted actions of functional modules. The mechanisms driving the emergence and evolution of these modules are still unclear. Here we investigate the evolutionary origins of protein complexes, modules in physical protein-protein interaction networks.
Results: We studied protein complexes in Saccharomyces cerevisiae, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pairwise protein interactions in the networks of several organisms. We found that duplication of homomeric interactions, a large class of protein interactions, frequently results in the formation of complexes of paralogous proteins. This route is a common mechanism for the evolution of complexes and clusters of protein interactions. Our conclusions are further confirmed by theoretical modelling of network evolution. We propose reasons for why this is favourable in terms of structure and function of protein complexes.
Conclusion: Our study provides the first insight into the evolution of functional modularity in protein-protein interaction networks, and the origins of a large class of protein complexes.
Affiliation(s)
- Jose B Pereira-Leal
- Instituto Gulbenkian de Ciência, Apartado 14, P-2781-901 Oeiras, Portugal
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
- Emmanuel D Levy
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
- Christel Kamp
- Paul-Ehrlich-Institut, Federal Agency for Sera and Vaccines, Paul-Ehrlich-Straße, 63225 Langen, Germany
- Sarah A Teichmann
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
41
Ratmann O, Jørgensen O, Hinkley T, Stumpf M, Richardson S, Wiuf C. Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput Biol 2007; 3:e230. [PMID: 18052538 PMCID: PMC2098858 DOI: 10.1371/journal.pcbi.0030230] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 03/28/2007] [Accepted: 10/05/2007] [Indexed: 11/18/2022] Open
Abstract
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. 
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
The importance of gene duplication to biological evolution has been recognized since the 1930s. For more than a decade, substantial evidence has been collected from genomic sequence data in order to elucidate the importance and the mechanisms of gene duplication; however, most biological characteristics arise from complex interactions between the cell's numerous constituents. Recently, preliminary descriptions of the protein interaction networks have become available for species of different domains. Adapting novel techniques in stochastic simulation, the authors demonstrate that evolutionary inferences can be drawn from large-scale, incomplete network data by fitting a stochastic model of network growth that captures hallmarks of evolution by duplication and divergence. They have also analyzed the effect of summarizing protein networks in different ways, and show that a reliable and consistent analysis requires many aspects of network data to be considered jointly, in contrast to what is commonly done in practice. Their results indicate that duplication and divergence have played a larger role in the network evolution of the eukaryote P. falciparum than in the prokaryote H. pylori, and emphasize at least for the eukaryote the potential importance of subfunctionalization in network evolution.
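The idea of likelihood-free inference can be sketched in a few lines of Python. This is not the authors' method: the abstract describes an MCMC-based scheme, a flexible mixture model and several network summaries, whereas the sketch below swaps in plain ABC rejection sampling on a toy one-parameter duplication-divergence model. All names (simulate_network, summarise, abc_rejection), the uniform prior, the two summary statistics and the network sizes are assumptions made for illustration.

```python
import random


def simulate_network(n_nodes, p_retain, rng):
    """Toy duplication-divergence growth: copy a random node, keep each edge with p_retain."""
    neighbours = {0: {1}, 1: {0}}
    for new in range(2, n_nodes):
        parent = rng.randrange(new)
        kept = {v for v in neighbours[parent] if rng.random() < p_retain}
        neighbours[new] = kept
        for v in kept:
            neighbours[v].add(new)
    return neighbours


def summarise(neighbours):
    """Two simple summaries: mean degree and fraction of isolated nodes."""
    n = len(neighbours)
    degrees = [len(v) for v in neighbours.values()]
    return (sum(degrees) / n, sum(d == 0 for d in degrees) / n)


def abc_rejection(observed, n_nodes, n_draws, n_keep, seed=0):
    """Keep the n_keep prior draws whose simulated summaries lie closest to the data."""
    rng = random.Random(seed)
    target = summarise(observed)
    scored = []
    for _ in range(n_draws):
        p = rng.random()  # assumed uniform prior on the retention probability
        simulated = summarise(simulate_network(n_nodes, p, rng))
        # Unscaled Euclidean distance between summary vectors (a simplification).
        distance = sum((s - t) ** 2 for s, t in zip(simulated, target)) ** 0.5
        scored.append((distance, p))
    scored.sort()
    return [p for _, p in scored[:n_keep]]


# Stand-in for observed data: a network simulated with a known parameter.
observed = simulate_network(60, 0.4, random.Random(1))
posterior = abc_rejection(observed, n_nodes=60, n_draws=300, n_keep=30)
print(min(posterior), max(posterior))
```

The retained draws approximate a posterior sample without ever evaluating a likelihood. The abstract's remark that the degree sequence alone barely captures the evolutionary footprint suggests that a realistic analysis would need richer summaries than the two used here.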
Affiliation(s)
- Oliver Ratmann
- Department of Public Health and Epidemiology, Imperial College London, London, United Kingdom.
42
Abstract
Three decades ago Gilbert posited that novel proteins arise by re-shuffling genomic sequences encoding polypeptide domains. Today, with numerous genomes and countless genes sequenced, it is well established that recombination of sequences encoding polypeptide domains plays a major role in protein evolution. There is, however, less evidence to suggest how the novel polypeptide domains, themselves, arise. Recent comparisons of genomes from closely related species have revealed numerous species-specific exons, supporting models of domain origin based on "exonization" of intron sequences. Also, a mechanism for the origin of novel polypeptide domains has been proposed based on analyses of insertion-based polymorphisms between orthologous genes across broad phylogenetic spectra and between allelic variants of genes within species. This review discusses these processes and how each might participate in the evolutionary emergence of novel polypeptide domains.
Affiliation(s)
- Edward E Schmidt
- Molecular Biosciences, Montana State University, Bozeman, MT 59717, USA.
43
Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol 2007; 8:319-30. [PMID: 17356578 DOI: 10.1038/nrm2144] [Citation(s) in RCA: 282] [Impact Index Per Article: 16.6] [Indexed: 11/09/2022]
Abstract
Analyses of genomes show that more than 70% of eukaryotic proteins are composed of multiple domains. However, most studies of protein folding focus on individual domains and do not consider how interactions between domains might affect folding. Here, we address this by analysing the three-dimensional structures of multidomain proteins that have been characterized experimentally and observe that where the interface is small and loosely packed, or unstructured, the folding of the domains is independent. Furthermore, recent studies indicate that multidomain proteins have evolved mechanisms to minimize the problems of interdomain misfolding.
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
44
Gabaldón T. Evolution of proteins and proteomes: a phylogenetics approach. Evol Bioinform Online 2007; 1:51-61. [PMID: 19325853 PMCID: PMC2658874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
The study of evolutionary relationships among protein sequences was one of the first applications of bioinformatics. Since then, and accompanying the wealth of biological data produced by genome sequencing and other high-throughput techniques, the use of bioinformatics in general and phylogenetics in particular has been gaining ground in the study of protein and proteome evolution. Nowadays, the use of phylogenetics is instrumental not only to infer the evolutionary relationships among species and their genome sequences, but also to reconstruct ancestral states of proteins and proteomes and hence trace the paths followed by evolution. Here I survey recent progress in the elucidation of mechanisms of protein and proteome evolution in which phylogenetics has played a determinant role.
45
Fong JH, Geer LY, Panchenko AR, Bryant SH. Modeling the evolution of protein domain architectures using maximum parsimony. J Mol Biol 2006; 366:307-15. [PMID: 17166515 PMCID: PMC1858635 DOI: 10.1016/j.jmb.2006.11.017] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Received: 08/03/2006] [Revised: 09/22/2006] [Accepted: 11/06/2006] [Indexed: 10/23/2022]
Abstract
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
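The notion of explaining an architecture by fusion or fission of precursor architectures can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not the published model: it considers only single-event explanations, takes the precursor set as given instead of inferring it for ancestral species, and does not rank multi-step pathways. The function name explain_architecture and the domain labels are hypothetical.

```python
def explain_architecture(architecture, precursors):
    """List single fusion or fission events that yield `architecture` from `precursors`.

    Architectures are sequences of domain names in N-to-C order.
    """
    arch = tuple(architecture)
    known = {tuple(p) for p in precursors}
    events = []
    # Fusion: two precursors joined end to end give the architecture.
    for cut in range(1, len(arch)):
        left, right = arch[:cut], arch[cut:]
        if left in known and right in known:
            events.append(("fusion", left, right))
    # Fission: a longer precursor split in two, one piece being the architecture.
    for precursor in sorted(known):
        if len(precursor) > len(arch) and (
            precursor[: len(arch)] == arch or precursor[-len(arch):] == arch
        ):
            events.append(("fission", precursor))
    return events


# Hypothetical precursor set with placeholder domain labels.
events = explain_architecture(
    ["A", "B", "C"],
    [["A"], ["B", "C"], ["A", "B", "C", "D"]],
)
print(events)
```

Under a parsimony principle like the one in the abstract, an architecture with any single-event explanation would be preferred over scenarios that need several independent recombination events. Counting such events between two architectures is one possible basis for the similarity measure the authors suggest.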
|
46
|
Uhrig JF. Protein interaction networks in plants. Planta 2006; 224:771-81. [PMID: 16575597 DOI: 10.1007/s00425-006-0260-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 11/10/2005] [Accepted: 03/03/2006] [Indexed: 05/08/2023]
Abstract
Protein-protein interactions are fundamental to virtually every aspect of cellular functions. With the development of high-throughput technologies of both the yeast two-hybrid system and tandem mass spectrometry, genome-wide protein-linkage mapping has become a major objective in post-genomic research. While at least partial "interactome" networks of several model organisms are already available, in the plant field, progress in this respect is slow. However, even with comprehensive protein interaction data still missing, substantial recent advance in the graph-theoretical functional interpretation of complex network architectures might pave the way for novel approaches in plant research. This article reviews current progress and discussions in network biology. Emphasis is put on the question of what can be learned about protein functions and cellular processes by studying the topology of complex protein interaction networks and the evolutionary mechanisms underlying their development. Particularly the intermediate and local levels of network organization (the modules, motifs and cliques) are increasingly recognized as the operational units of biological functions. As demonstrated by some recent results from systematic analyses of plant protein families, protein interaction networks promise to be a valuable tool for a molecular understanding of functional specificities and for identifying novel regulatory components and pathways.
Affiliation(s)
- Joachim F Uhrig
- Botanisches Institut III, Universität zu Köln, Gyrhof Strasse 15, 50931 Köln, Germany.
|
47
|
Han JH, Kerrison N, Chothia C, Teichmann SA. Divergence of interdomain geometry in two-domain proteins. Structure 2006; 14:935-45. [PMID: 16698554 DOI: 10.1016/j.str.2006.01.016] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 09/16/2005] [Revised: 12/23/2005] [Accepted: 01/18/2006] [Indexed: 10/24/2022]
Abstract
For homologous protein chains composed of two domains, we have determined the extent to which they conserve (1) their interdomain geometry and (2) the molecular structure of the domain interface. This work was carried out on 128 unique two-domain architectures. Of the 128, we find 75 conserve their interdomain geometry and the structure of their domain interface; 5 conserve their interdomain geometry but not the structure of their interface; and 48 have variable geometries and divergent interface structure. We describe how different types of interface changes or the absence of an interface is responsible for these differences in geometry. Variable interdomain geometries can be found in homologous structures with high sequence identities (70%).
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom.
|
48
|
Pereira-Leal JB, Levy ED, Teichmann SA. The origins and evolution of functional modules: lessons from protein complexes. Philos Trans R Soc Lond B Biol Sci 2006; 361:507-17. [PMID: 16524839 PMCID: PMC1609335 DOI: 10.1098/rstb.2005.1807] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022] Open
Abstract
Modularity is an attribute of a system that can be decomposed into a set of cohesive entities that are loosely coupled. Many cellular networks can be decomposed into functional modules, each functionally separable from the other modules. The protein complexes in physical protein interaction networks are a good example of this, and here we focus on their origins and evolution. We investigate the emergence of protein complexes and physical interactions between proteins by duplication, and review other mechanisms. We dissect the dataset of protein complexes of known three-dimensional structure, and show that roughly 90% of these complexes contain contacts between identical proteins within the same complex. Proteins that are shared across different complexes occur frequently, and they tend to be essential genes more often than members of a single protein complex. We also provide a perspective on the evolutionary mechanisms driving the growth of other modular cellular networks such as transcriptional regulatory and metabolic networks.
|
49
|
Bonomo J, Warnecke T, Hume P, Marizcurrena A, Gill RT. A comparative study of metabolic engineering anti-metabolite tolerance in Escherichia coli. Metab Eng 2006; 8:227-39. [PMID: 16497527 DOI: 10.1016/j.ymben.2005.12.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/09/2005] [Revised: 12/15/2005] [Accepted: 12/28/2005] [Indexed: 11/22/2022]
Abstract
A problem in strain engineering is that mutations that benefit the expression of a phenotype in one environment may impose a cost to biological fitness in a new environment. The overall objective of this study was to improve understanding of this phenomenon within the context of a classic anti-metabolite selection strategy. We have engineered Escherichia coli using three mutagenesis techniques (chemical mutagenesis, insertional mutagenesis, and plasmid-based overexpression) and assessed the relative costs and benefits to biological fitness of mutants selected for tolerance to five amino acid analogs whose target amino acids (glutamic acid, aspartic acid, tryptophan, glycine, and serine) differ in metabolic connectivity and biosynthetic energy requirements. Our major findings include (i) the fold increase in anti-metabolite tolerance, independent of mutagenesis strategy, was much greater for aspartic acid beta-hydroxamate (AAH) compared to all other tested hydroxamates, (ii) increased tolerance to glutamic acid gamma-hydroxamate (GAH) was not achieved using any of the mutagenesis strategies, and (iii) characteristics of the anti-metabolite, rather than those of the corresponding metabolite, were more important in determining the ability to increase tolerance.
Affiliation(s)
- Jeanne Bonomo
- Department of Chemical and Biological Engineering, University of Colorado, Boulder, Campus Box 424, Boulder, CO 80309, USA
|
50
|
Kim Y, Subramaniam S. Locally defined protein phylogenetic profiles reveal previously missed protein interactions and functional relationships. Proteins 2006; 62:1115-24. [PMID: 16385560 DOI: 10.1002/prot.20830] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/11/2022]
Abstract
Phylogenetic profiles encode patterns of presence or absence of genes across genomes, and these profiles can be used to assign functional relationships to nonhomologous pairs of proteins (Pellegrini et al., Proc Natl Acad Sci USA 1999;96:4284-4288). Although it is well known that many proteins were created from combinations of domains, most of the existing implementations of phylogenetic profiles do not consider this fact. Here, we introduce an extension that considers the multidomain nature of proteins and test the method against the known interaction data sets. Whereas earlier implementations associated one entire sequence with one protein phylogenetic profile (Single-Profile), our method instead breaks the sequence into a set of segments of predetermined size and constructs a separate profile for each segment (Multiple-Profile). The results show that the Multiple-Profile method performs as well as the Single-Profile method. However, the two methods share, surprisingly, a small fraction of their predictions, indicating that the Multiple-Profile method can detect known interactions missed by the Single-Profile method. Thus, the Multiple-Profile method can be used with other methods to determine functional relationships on a genome scale with wider coverage.
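The contrast between the Single-Profile and Multiple-Profile constructions described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping segmentation, and the `has_hit` callback (a stand-in for whatever homology search decides presence or absence in a genome) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split_into_segments(sequence: str, segment_size: int) -> List[str]:
    """Cut a sequence into consecutive segments of a fixed size.

    Non-overlapping windows are an assumption; the abstract only says
    the segments have a predetermined size.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [sequence[i:i + segment_size]
            for i in range(0, len(sequence), segment_size)]


def build_profile(query: str, genomes: Sequence[str],
                  has_hit: Callable[[str, str], bool]) -> List[int]:
    """Binary presence/absence vector of `query` across `genomes`."""
    return [1 if has_hit(query, genome) else 0 for genome in genomes]


def single_profile(sequence: str, genomes: Sequence[str],
                   has_hit: Callable[[str, str], bool]) -> List[int]:
    """One profile for the entire sequence."""
    return build_profile(sequence, genomes, has_hit)


def multiple_profile(sequence: str, segment_size: int,
                     genomes: Sequence[str],
                     has_hit: Callable[[str, str], bool]) -> List[List[int]]:
    """One profile per segment of the sequence."""
    return [build_profile(segment, genomes, has_hit)
            for segment in split_into_segments(sequence, segment_size)]


def toy_has_hit(query: str, genome: str) -> bool:
    """Hypothetical presence test: exact substring match.

    A real analysis would use a homology search here instead.
    """
    return query in genome


# Toy "genomes" represented as plain strings, for illustration only.
toy_genomes = ["AAAACCCC", "AAAAGGGG", "TTTTCCCC"]
whole = single_profile("AAAACCCC", toy_genomes, toy_has_hit)
parts = multiple_profile("AAAACCCC", 4, toy_genomes, toy_has_hit)
```

In this toy data the whole sequence is found only in the first genome, while each of its two segments is also found in one further genome, so the per-segment profiles carry presence patterns that the single profile does not.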
Affiliation(s)
- Yohan Kim
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093-0505, USA
|