Reference Citation Analysis

For: Vogel C, Berzuini C, Bashton M, Gough J, Teichmann SA. Supra-domains: Evolutionary Units Larger than Single Protein Domains. J Mol Biol 2004;336:809-23. [PMID: 15095989 DOI: 10.1016/j.jmb.2003.12.026] [Citation(s) in RCA: 139] [Impact Index Per Article: 7.0] [Received: 09/01/2003] [Revised: 12/04/2003] [Accepted: 12/05/2003] [Indexed: 11/18/2022]

Cited by Other Article(s)

Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]

Stetler-Stevenson WG. The Continuing Saga of Tissue Inhibitor of Metalloproteinase 2: Emerging Roles in Tissue Homeostasis and Cancer Progression. Am J Pathol 2023;193:1336-1352. [PMID: 37572947 PMCID: PMC10548276 DOI: 10.1016/j.ajpath.2023.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/19/2023] [Revised: 07/26/2023] [Accepted: 08/01/2023] [Indexed: 08/14/2023]

Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023;91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]

Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023;9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023]

Taha Tolba EAEH, Ahmed Amer HZ. In silico Analysis of Tyrosine Kinases Receptor in Papillary and Medullary Thyroid Cancer Using Sequence-alignment-based Methods. Biotechnology (Faisalabad) 2023;22:18-27. [DOI: 10.3923/biotech.2023.18.27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]

Tong CL, Kanwar N, Morrone DJ, Seelig B. Nature-inspired engineering of an artificial ligase enzyme by domain fusion. Nucleic Acids Res 2022;50:11175-11185. [PMID: 36243966 PMCID: PMC9638898 DOI: 10.1093/nar/gkac858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 08/30/2022] [Accepted: 09/26/2022] [Indexed: 11/20/2022]

Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021;18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]

Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021;118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/10/2021] [Indexed: 01/20/2023]

Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021;433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]

Abstract

Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.

Ma P, Luo T, Ge L, Chen Z, Wang X, Zhao R, Liao W, Bao L. Compensatory effects of M. tuberculosis rpoB mutations outside the rifampicin resistance-determining region. Emerg Microbes Infect 2021;10:743-752. [PMID: 33775224 PMCID: PMC8057087 DOI: 10.1080/22221751.2021.1908096] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/25/2023]

Monzon V, Lafita A, Bateman A. Discovery of fibrillar adhesins across bacterial species. BMC Genomics 2021;22:550. [PMID: 34275445 PMCID: PMC8286594 DOI: 10.1186/s12864-021-07586-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/21/2020] [Accepted: 04/07/2021] [Indexed: 11/16/2022]

Weadick CJ. Molecular Evolutionary Analysis of Nematode Zona Pellucida (ZP) Modules Reveals Disulfide-Bond Reshuffling and Standalone ZP-C Domains. Genome Biol Evol 2021;12:1240-1255. [PMID: 32426804 PMCID: PMC7456536 DOI: 10.1093/gbe/evaa095] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 05/13/2020] [Indexed: 12/11/2022]

Abstract

Zona pellucida (ZP) modules mediate extracellular protein-protein interactions and contribute to important biological processes including syngamy and cellular morphogenesis. Although some biomedically relevant ZP modules are well studied, little is known about the protein family's broad-scale diversity and evolution. The increasing availability of sequenced genomes from "nonmodel" systems provides a valuable opportunity to address this issue and to use comparative approaches to gain new insights into ZP module biology. Here, through phylogenetic and structural exploration of ZP module diversity across the nematode phylum, I report evidence that speaks to two important aspects of ZP module biology. First, I show that ZP-C domains, which in some modules act as regulators of ZP-N domain-mediated polymerization activity and which have never before been found in isolation, can indeed be found as standalone domains. These standalone ZP-C domain proteins originated in independent (paralogous) lineages prior to the diversification of extant nematodes, after which they evolved under strong stabilizing selection, suggesting the presence of ZP-N domain-independent functionality. Second, I provide a much-needed phylogenetic perspective on disulfide bond variability, uncovering evidence for both convergent evolution and disulfide-bond reshuffling. This result has implications for our evolutionary understanding and classification of ZP module structural diversity and highlights the usefulness of phylogenetics and diverse sampling for protein structural biology. All told, these findings set the stage for broad-scale (cross-phyla) evolutionary analysis of ZP modules and position Caenorhabditis elegans and other nematodes as important experimental systems for exploring the evolution of ZP modules and their constituent domains.

Vicedomini R, Blachon C, Oteri F, Carbone A. MyCLADE: a multi-source domain annotation server for sequence functional exploration. Nucleic Acids Res 2021;49:W452-W458. [PMID: 34023906 PMCID: PMC8262732 DOI: 10.1093/nar/gkab395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2021] [Revised: 04/27/2021] [Accepted: 04/29/2021] [Indexed: 11/13/2022]

Evolution of networks of protein domain organization. Sci Rep 2021;11:12075. [PMID: 34103558 PMCID: PMC8187734 DOI: 10.1038/s41598-021-90498-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/02/2020] [Accepted: 05/11/2021] [Indexed: 02/05/2023]

Structure-based protein function prediction using graph convolutional networks. Nat Commun 2021;12:3168. [PMID: 34039967 PMCID: PMC8155034 DOI: 10.1038/s41467-021-23303-9] [Citation(s) in RCA: 225] [Impact Index Per Article: 75.0] [Received: 09/21/2020] [Accepted: 04/22/2021] [Indexed: 02/04/2023]

Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021;8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]

Xiao X, Xue GF, Stamatovic B, Qiu WR. Using Cellular Automata to Simulate Domain Evolution in Proteins. Front Genet 2020;11:515. [PMID: 32582278 PMCID: PMC7296063 DOI: 10.3389/fgene.2020.00515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2019] [Accepted: 04/28/2020] [Indexed: 11/26/2022]

Koo DCE, Bonneau R. Towards region-specific propagation of protein functions. Bioinformatics 2020;35:1737-1744. [PMID: 30304483 PMCID: PMC6513163 DOI: 10.1093/bioinformatics/bty834] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/23/2018] [Revised: 08/23/2018] [Accepted: 10/08/2018] [Indexed: 01/06/2023]

Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020;12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 01/31/2020] [Indexed: 12/24/2022]

Naveenkumar N, Kumar G, Sowdhamini R, Srinivasan N, Vishwanath S. Fold combinations in multi-domain proteins. Bioinformation 2019;15:342-350. [PMID: 31249437 PMCID: PMC6589474 DOI: 10.6026/97320630015342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/05/2019] [Accepted: 05/07/2019] [Indexed: 01/21/2023]

Evolution of Protein Domain Architectures. Methods Mol Biol 2019;1910:469-504. [PMID: 31278674 DOI: 10.1007/978-1-4939-9074-0_15] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/04/2022]

Navigating Among Known Structures in Protein Space. Methods Mol Biol 2018. [PMID: 30298400 DOI: 10.1007/978-1-4939-8736-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]

Abstract

Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights to protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design. In broad strokes, studies of protein space vary in the entities they represent, the similarity measure comparing these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is the protein entity (e.g., a domain), and edges connect similar domains. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities that are related to each other in a way for which a simple representation describes their relationships succinctly and accurately.
This chapter will cover studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, and in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, which encompass structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins. By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (pdb ID 4ntl; of yet unknown function) may bind methionine and suggest a putative binding mechanism.

Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018;34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022]

Slama P. Two-domain analysis of JmjN-JmjC and PHD-JmjC lysine demethylases: Detecting an inter-domain evolutionary stress. Proteins 2017;86:3-12. [PMID: 28975662 DOI: 10.1002/prot.25394] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/24/2017] [Revised: 09/26/2017] [Accepted: 10/03/2017] [Indexed: 11/09/2022]

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017;114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 01/13/2023]

Abstract

We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected.

Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.

Leuenberger P, Ganscha S, Kahraman A, Cappelletti V, Boersema PJ, von Mering C, Claassen M, Picotti P. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science 2017;355:eaai7825. [PMID: 28232526 DOI: 10.1126/science.aai7825] [Citation(s) in RCA: 255] [Impact Index Per Article: 36.4] [Received: 08/18/2016] [Accepted: 01/12/2017] [Indexed: 12/14/2022]

Arguments Reinforcing the Three-Domain View of Diversified Cellular Life. Archaea 2016;2016:1851865. [PMID: 28050162 PMCID: PMC5165138 DOI: 10.1155/2016/1851865] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/16/2016] [Revised: 10/18/2016] [Accepted: 11/03/2016] [Indexed: 11/18/2022]

Bernardes J, Zaverucha G, Vaquero C, Carbone A. Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence. PLoS Comput Biol 2016;12:e1005038. [PMID: 27472895 PMCID: PMC4966962 DOI: 10.1371/journal.pcbi.1005038] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/17/2015] [Accepted: 06/28/2016] [Indexed: 11/30/2022]

Abstract

Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. 
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE.

Current sequence databases contain hundreds of billions of nucleotides coding for genes and a classification of these sequences is a primary problem in genomics. A reasonable way to organize these sequences is through their predicted domains, but the identification of domains in very divergent sequences, spanning the entire phylogenetic tree of species, is a difficult problem. By generating multiple probabilistic models for a domain, describing the spread of evolutionary patterns in different phylogenetic clades, we can effectively explore domains that are likely to be coded in gene sequences. Through a machine learning approach and optimization techniques, coding for expected evolutionary constraints, we filter the many possibilities of domain identification found for a gene and propose the most likely domain architecture associated to it. The application of this novel approach to the full genome of Plasmodium falciparum, to a dataset of sequences from three SCOP datasets highlights the interest of exploring multiple pathways of domain evolution in the aim of extracting biological information from genomic sequences. Our new computational approach was developed with the hope of providing a novel tier of accurate and precise tools that complement existing tools such as HMMer, HHblits and PSI-BLAST, by exploring in a novel way the large amount of sequence data available. The existence of powerful databases for sequences, domains and architectures help make this hope a reality.

Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016;38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]

Papaleo E, Saladino G, Lambrughi M, Lindorff-Larsen K, Gervasio FL, Nussinov R. The Role of Protein Loops and Linkers in Conformational Dynamics and Allostery. Chem Rev 2016;116:6391-423. [DOI: 10.1021/acs.chemrev.5b00623] [Citation(s) in RCA: 239] [Impact Index Per Article: 29.9] [Indexed: 12/16/2022]

Tompa P. The principle of conformational signaling. Chem Soc Rev 2016;45:4252-84. [DOI: 10.1039/c6cs00011h] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/14/2022]

Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015;35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]

Bernardes JS, Vieira FRJ, Zaverucha G, Carbone A. A multi-objective optimization approach accurately resolves protein domain architectures. Bioinformatics 2015;32:345-53. [PMID: 26458889 PMCID: PMC4734041 DOI: 10.1093/bioinformatics/btv582] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 01/26/2015] [Accepted: 10/02/2015] [Indexed: 11/15/2022]

Stolzer M, Siewert K, Lai H, Xu M, Durand D. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 2015;16 Suppl 14:S8. [PMID: 26451642 PMCID: PMC4610023 DOI: 10.1186/1471-2105-16-s14-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]

Reid WR, Zhang L, Liu N. Temporal Gene Expression Profiles of Pre Blood-Fed Adult Females Immediately Following Eclosion in the Southern House Mosquito Culex Quinquefasciatus. Int J Biol Sci 2015;11:1306-13. [PMID: 26435696 PMCID: PMC4582154 DOI: 10.7150/ijbs.12829] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/30/2015] [Accepted: 07/28/2015] [Indexed: 01/08/2023]

Abstract

Prior to acquisition of the first host blood meal, the anautogenous mosquito Culex quinquefasciatus requires a period of time in order to prepare for the blood feeding and, later, vitellogenesis. In the current study, we conducted whole transcriptome analyses of adult female Culex mosquitoes to identify genes that may be necessary for both taking of the blood meal, and processing of the blood meal in adult female mosquitoes Cx. quinquefasciatus. We examined temporal expression of genes for the periods of post eclosion and prior to the female freely taking a blood meal. We further evaluated the temporal expression of certain genes for the periods after the taking of a blood meal to identify genes that may be necessary for both the taking of the blood meal, and the processing of the blood meal. We found that adult females required a minimum of 48 h post-eclosion before they freely took their first blood meal. We hypothesized that gene expression signatures were altered in the mosquitoes before blood feeding in preparation for the acquisition of the blood meal through changes in multiple gene expression. To identify the genes involved in the acquisition of blood feeding, we quantified the gene expression levels of adult female Cx. quinquefasciatus using RNA Seq throughout a pre-blooding period from 2 to 72 h post eclosion at 12 h intervals. A total of 325 genes were determined to be differentially-expressed throughout the pre-blooding period, with the majority of differentially-expressed genes occurring between the 2 h and 12 h post-eclosion time points. Among the up-regulated genes were salivary proteins, cytochrome P450s, odorant-binding proteins, and proteases, while the majority of the down-regulated genes were hypothetical or cuticular genes. 
In addition, Trypsin was found to be up-regulated immediately following blood feeding, while trypsin and chymotrypsin were up-regulated at 48h and 60h post blood-feeding, respectively, suggesting that these proteases are likely involved in the digestion of the blood meal. Overall, this study reviewed multiple genes that might be involved in the adult female competency for blood meal acquisition in mosquitoes.

Chang TC, Stergiopoulos I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett 2015;589:1813-8. [PMID: 26067847 DOI: 10.1016/j.febslet.2015.05.048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2015] [Revised: 05/11/2015] [Accepted: 05/20/2015] [Indexed: 10/23/2022]

Multiple nucleophilic elbows leading to multiple active sites in a single module esterase from Sorangium cellulosum. J Struct Biol 2015;190:314-27. [DOI: 10.1016/j.jsb.2015.04.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2014] [Revised: 03/25/2015] [Accepted: 04/10/2015] [Indexed: 11/17/2022]

Linkeviciute V, Rackham OJL, Gough J, Oates ME, Fang H. Function-selective domain architecture plasticity potentials in eukaryotic genome evolution. Biochimie 2015;119:269-77. [PMID: 25980317 PMCID: PMC4679076 DOI: 10.1016/j.biochi.2015.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2014] [Accepted: 05/06/2015] [Indexed: 12/20/2022]

Mbandi SK, Hesse U, van Heusden P, Christoffels A. Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms. BMC Bioinformatics 2015;16:58. [PMID: 25880035 PMCID: PMC4344733 DOI: 10.1186/s12859-015-0492-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2014] [Accepted: 02/06/2015] [Indexed: 11/19/2022] Open

Abstract

BACKGROUND

De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrags annotations and propagation of mis-assemblies.

RESULTS

Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) a multiple domain architecture annotation that complements BLAST and reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequences and gene loci to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represented sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci.

CONCLUSIONS

IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.

Krishnamurthy P, Hong JK, Kim JA, Jeong MJ, Lee YH, Lee SI. Genome-wide analysis of the expansin gene superfamily reveals Brassica rapa-specific evolutionary dynamics upon whole genome triplication. Mol Genet Genomics 2014;290:521-30. [PMID: 25325993 DOI: 10.1007/s00438-014-0935-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2014] [Accepted: 09/30/2014] [Indexed: 01/27/2023]

Cromar G, Wong KC, Loughran N, On T, Song H, Xiong X, Zhang Z, Parkinson J. New tricks for "old" domains: how novel architectures and promiscuous hubs contributed to the organization and evolution of the ECM. Genome Biol Evol 2014;6:2897-917. [PMID: 25323955 PMCID: PMC4224354 DOI: 10.1093/gbe/evu228] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/07/2014] [Indexed: 12/15/2022] Open

Abstract

The extracellular matrix (ECM) is a defining characteristic of metazoans and consists of a meshwork of self-assembling, fibrous proteins, and their functionally related neighbours. Previous studies, focusing on a limited number of gene families, suggest that vertebrate complexity predominantly arose through the duplication and subsequent modification of retained, preexisting ECM genes. These genes provided the structural underpinnings to support a variety of specialized tissues, as well as a platform for the organization of spatio-temporal signaling and cell migration. However, the relative contributions of ancient versus novel domains to ECM evolution have not been quantified across the full range of ECM proteins. Here, utilizing a high-quality list comprising 324 ECM genes, we reveal general and clade-specific domain combinations, identifying domains of eukaryotic and metazoan origin recruited into new roles in approximately two-thirds of the ECM proteins in humans representing novel vertebrate proteins. We show that, rather than acquiring new domains, sampling of new domain combinations has been key to the innovation of paralogous ECM genes during vertebrate evolution. Applying a novel framework for identifying potentially important, noncontiguous, conserved arrangements of domains, we find that the distinct biological characteristics of the ECM have arisen through unique evolutionary processes. These include the preferential recruitment of novel domains to existing architectures and the utilization of high promiscuity domains in organizing the ECM network around a connected array of structural hubs. Our focus on ECM proteins reveals that distinct types of proteins and/or the biological systems in which they operate have influenced the types of evolutionary forces that drive protein innovation. This emphasizes the need for rigorously defined systems to address questions of evolution that focus on specific systems of interacting proteins.

Kočar V, Božič Abram S, Doles T, Bašić N, Gradišar H, Pisanski T, Jerala R. TOPOFOLD, the designed modular biomolecular folds: polypeptide-based molecular origami nanostructures following the footsteps of DNA. WILEY INTERDISCIPLINARY REVIEWS-NANOMEDICINE AND NANOBIOTECHNOLOGY 2014;7:218-37. [PMID: 25196147 DOI: 10.1002/wnan.1289] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2014] [Revised: 07/08/2014] [Accepted: 07/20/2014] [Indexed: 12/14/2022]

Khafif M, Cottret L, Balagué C, Raffaele S. Identification and phylogenetic analyses of VASt, an uncharacterized protein domain associated with lipid-binding domains in Eukaryotes. BMC Bioinformatics 2014;15:222. [PMID: 24965341 PMCID: PMC4082322 DOI: 10.1186/1471-2105-15-222] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2014] [Accepted: 06/19/2014] [Indexed: 01/25/2023] Open

A daily-updated tree of (sequenced) life as a reference for genome research. Sci Rep 2014;3:2015. [PMID: 23778980 PMCID: PMC6504836 DOI: 10.1038/srep02015] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2013] [Accepted: 05/10/2013] [Indexed: 11/08/2022] Open

Ishino S, Yamagami T, Kitamura M, Kodera N, Mori T, Sugiyama S, Ando T, Goda N, Tenno T, Hiroaki H, Ishino Y. Multiple interactions of the intrinsically disordered region between the helicase and nuclease domains of the archaeal Hef protein. J Biol Chem 2014;289:21627-39. [PMID: 24947516 DOI: 10.1074/jbc.m114.554998] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Pandya C, Dunaway-Mariano D, Xia Y, Allen KN. Structure-guided approach for detecting large domain inserts in protein sequences as illustrated using the haloacid dehalogenase superfamily. Proteins 2014;82:1896-906. [PMID: 24577717 DOI: 10.1002/prot.24543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2013] [Revised: 02/19/2014] [Accepted: 02/22/2014] [Indexed: 11/11/2022]

Structural Annotation of the Mycobacterium tuberculosis Proteome. Microbiol Spectr 2014;2. [PMID: 26105824 DOI: 10.1128/microbiolspec.mgm2-0027-2013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open

Baines AJ, Lu HC, Bennett PM. The Protein 4.1 family: hub proteins in animals for organizing membrane proteins. BIOCHIMICA ET BIOPHYSICA ACTA 2014;1838:605-19. [PMID: 23747363 DOI: 10.1016/j.bbamem.2013.05.030] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/18/2013] [Revised: 05/22/2013] [Accepted: 05/28/2013] [Indexed: 01/10/2023]

Abstract

Proteins of the 4.1 family are characteristic of eumetazoan organisms. Invertebrates contain single 4.1 genes and the Drosophila model suggests that 4.1 is essential for animal life. Vertebrates have four paralogues, known as 4.1R, 4.1N, 4.1G and 4.1B, which are additionally duplicated in the ray-finned fish. Protein 4.1R was the first to be discovered: it is a major mammalian erythrocyte cytoskeletal protein, essential to the mechanochemical properties of red cell membranes because it promotes the interaction between spectrin and actin in the membrane cytoskeleton. 4.1R also binds certain phospholipids and is required for the stable cell surface accumulation of a number of erythrocyte transmembrane proteins that span multiple functional classes; these include cell adhesion molecules, transporters and a chemokine receptor. The vertebrate 4.1 proteins are expressed in most tissues, and they are required for the correct cell surface accumulation of a very wide variety of membrane proteins including G-Protein coupled receptors, voltage-gated and ligand-gated channels, as well as the classes identified in erythrocytes. Indeed, such large numbers of protein interactions have been mapped for mammalian 4.1 proteins, most especially 4.1R, that it appears that they can act as hubs for membrane protein organization. The range of critical interactions of 4.1 proteins is reflected in disease relationships that include hereditary anaemias, tumour suppression, control of heartbeat and nervous system function. The 4.1 proteins are defined by their domain structure: apart from the spectrin/actin-binding domain they have FERM and FERM-adjacent domains and a unique C-terminal domain. Both the FERM and C-terminal domains can bind transmembrane proteins, thus they have the potential to be cross-linkers for membrane proteins. 
The activity of the FERM domain is subject to multiple modes of regulation via binding of regulatory ligands, phosphorylation of the FERM associated domain and differential mRNA splicing. Finally, the spectrum of interactions of the 4.1 proteins overlaps with that of another membrane-cytoskeleton linker, ankyrin. Both ankyrin and 4.1 link to the actin cytoskeleton via spectrin, and we hypothesize that differential regulation of 4.1 proteins and ankyrins allows highly selective control of cell surface protein accumulation and, hence, function. This article is part of a Special Issue entitled: Reciprocal influences between cell cytoskeleton and membrane channels, receptors and transporters. Guest Editor: Jean Claude Hervé

Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput Biol 2014;10:e1003452. [PMID: 24499935 PMCID: PMC3907288 DOI: 10.1371/journal.pcbi.1003452] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022] Open

Abstract

Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses and the gain-to-loss ratios were much higher in akaryotes compared to eukaryotes. Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution while Eukarya acquired a number of diverse molecular functions including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains are predicted to redefine the persistence strategies of organisms in superkingdoms, influence the makeup of molecular functions, and enhance organismal complexity by the generation of new domain architectures. These dynamics highlight ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.

Fornili A, Pandini A, Lu HC, Fraternali F. Specialized Dynamical Properties of Promiscuous Residues Revealed by Simulated Conformational Ensembles. J Chem Theory Comput 2013;9:5127-5147. [PMID: 24250278 PMCID: PMC3827836 DOI: 10.1021/ct400486p] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2013] [Indexed: 12/13/2022]

Abstract

The ability to interact with different partners is one of the most important features in proteins. Proteins that bind a large number of partners (hubs) have been often associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the protein isolated state. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and to full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with a different degree of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter both on a local and on a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionary related proteins can be modulated by the fine-tuning of the interface dynamics. 
The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein–protein interaction prediction and design methods.
