1
Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024; 33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. 
Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.
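The co-annotation idea can be illustrated with a minimal Python sketch, assuming plain dictionaries of per-protein domain and GO annotations. It counts how often a domain and a GO term annotate the same protein and estimates an empirical p-value by shuffling GO annotation sets across proteins. This permutation test is a stand-in for the statistical resampling described above; Domain2GO's actual scoring and cut-offs may differ. It then propagates retained domain-GO mappings to a query protein carrying the domain. All function names and any cut-off you apply are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def cooccurrence_counts(domains_by_protein, go_by_protein):
    """Count, for every (domain, GO term) pair, the proteins annotated with both."""
    counts = Counter()
    for protein, domains in domains_by_protein.items():
        for pair in product(domains, go_by_protein.get(protein, ())):
            counts[pair] += 1
    return counts


def permutation_pvalues(domains_by_protein, go_by_protein, n_iter=1000, seed=0):
    """Empirical p-value for each observed (domain, GO term) pair.

    GO annotation sets are shuffled across the domain-annotated proteins, so
    each protein keeps its domains and the collection of GO sets is unchanged.
    We count how often a pair co-occurs at least as often under the shuffle
    as in the real data. Proteins without domain annotations are ignored.
    """
    rng = random.Random(seed)
    proteins = sorted(domains_by_protein)
    observed = cooccurrence_counts(domains_by_protein, go_by_protein)
    go_sets = [go_by_protein.get(p, frozenset()) for p in proteins]
    exceed = Counter()
    for _ in range(n_iter):
        rng.shuffle(go_sets)  # break the protein -> GO link
        null = cooccurrence_counts(domains_by_protein, dict(zip(proteins, go_sets)))
        for pair, obs in observed.items():
            if null[pair] >= obs:  # Counter returns 0 for unseen pairs
                exceed[pair] += 1
    # Add-one correction keeps p-values strictly positive.
    return {pair: (exceed[pair] + 1) / (n_iter + 1) for pair in observed}


def propagate(query_domains, mappings):
    """Transfer every GO term mapped to any of the query protein's domains."""
    return {go for domain, go in mappings if domain in query_domains}
```

Shuffling whole annotation sets, rather than individual terms, preserves how many terms each protein carries. This keeps the null model from favouring heavily annotated proteins.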
Affiliation(s)
- Erva Ulusoy
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Tunca Doğan
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
2
Zimmerman L, Alon N, Levin I, Koganitsky A, Shpigel N, Brestel C, Lapidoth GD. Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable and active enzymes. Proc Natl Acad Sci U S A 2024; 121:e2313809121. [PMID: 38437538 PMCID: PMC10945820 DOI: 10.1073/pnas.2313809121] [Received: 08/15/2023] [Accepted: 02/09/2024] [Indexed: 03/06/2024]
Abstract
The potential of engineered enzymes in industrial applications is often limited by their expression levels, thermal stability, and catalytic diversity. De novo enzyme design faces challenges due to the complexity of enzymatic catalysis. An alternative approach involves expanding natural enzyme capabilities for new substrates and parameters. Here, we introduce CoSaNN (Conformation Sampling using Neural Network), an enzyme design strategy using deep learning for structure prediction and sequence optimization. CoSaNN controls enzyme conformations to expand chemical space beyond simple mutagenesis. It employs a context-dependent approach for generating enzyme designs, considering non-linear relationships in sequence and structure space. We also developed SolvIT, a graph neural network (NN) that predicts protein solubility in Escherichia coli and is used to select designs for expression from larger design sets. Using this method, and without high-throughput screening, we engineered enzymes with superior expression levels (54% expressed in E. coli) and increased thermal stability (over 30% had a higher Tm than the template). Our research underscores AI's transformative role in protein design, capturing high-order interactions and preserving allosteric mechanisms in extensively modified enzymes, and notably enhancing expression success rates. This method's ease of use and efficiency streamline enzyme design, opening broad avenues for biotechnological applications and broadening field accessibility.
Affiliation(s)
- Noga Alon
- Enzymit Ltd., Ness-Ziona 7403626, Israel
3
García-Paz FDM, Del Moral S, Morales-Arrieta S, Ayala M, Treviño-Quintanilla LG, Olvera-Carranza C. Multidomain chimeric enzymes as a promising alternative for biocatalysts improvement: a minireview. Mol Biol Rep 2024; 51:410. [PMID: 38466518 DOI: 10.1007/s11033-024-09332-9] [Received: 10/26/2023] [Accepted: 02/07/2024] [Indexed: 03/13/2024]
Abstract
Searching for new and better biocatalysts is an area of study in constant development. In nature, mechanisms that generally occur in evolution, such as gene duplication, recombination, and natural selection, produce various enzymes with different architectures and properties. The recombination of genes that encode proteins produces multidomain chimeric enzymes containing two or more domains, which sometimes enhance their catalytic properties. Protein engineering has mimicked this process to enhance the catalytic activity and global stability of enzymes in the search for new and better biocatalysts. Here, we present and discuss examples of both natural and synthetic multidomain chimeric enzymes and how additional domains heighten their stability and catalytic activity. Moreover, we describe progress in developing new biocatalysts using synthetic fusion enzymes and review some methodological strategies to improve their biological fitness.
Affiliation(s)
- Flor de María García-Paz
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
- Sandra Del Moral
- Investigador por México-CONAHCyT, Unidad de Investigación y Desarrollo en Alimentos, Tecnológico Nacional de México, Campus Veracruz. MA de Quevedo 2779, Col. Formando Hogar, CP 91960, Veracruz, Veracruz, México
- Sandra Morales-Arrieta
- Departamento de Biotecnología, Universidad Politécnica del Estado de Morelos, Boulevard Cuauhnáhuac No. 566 Col. Lomas del Texcal CP 62550, Jiutepec, Morelos, México
- Marcela Ayala
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
- Luis Gerardo Treviño-Quintanilla
- Departamento de Biotecnología, Universidad Politécnica del Estado de Morelos, Boulevard Cuauhnáhuac No. 566 Col. Lomas del Texcal CP 62550, Jiutepec, Morelos, México
- Clarita Olvera-Carranza
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001 Col. Chamilpa CP 62210, Cuernavaca, Morelos, México
4
Bonello J, Orengo C. FunPredCATH: An ensemble method for predicting protein function using CATH. Biochim Biophys Acta Proteins Proteom 2024; 1872:140985. [PMID: 38122964 DOI: 10.1016/j.bbapap.2023.140985] [Received: 08/07/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
MOTIVATION The number of unannotated proteins in UniProt grows at a very high rate every year owing to more efficient sequencing methods. However, the experimental annotation of proteins is a lengthy and expensive process. Using computational techniques to narrow the search can speed up the process by providing highly specific Gene Ontology (GO) terms. METHODOLOGY We propose an ensemble approach that combines three generic base predictors that predict Gene Ontology (BP, CC and MF) terms from sequences across different species. We train our models on UniProt-GOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families, which is used as an indicator of confidence when assigning the GO term to an uncharacterised protein. METHODS In the ensemble, we use a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated with the same GO term. We also developed a set-based method that uses Set Intersection and Set Union to score the occurrence of GO terms within the same CATH FunFam. Finally, we also use FunFams-Plus, a predictor developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge. EVALUATION We evaluated the methods against the CAFA3 benchmark and DomFun. We used the Precision, Recall and Fmax metrics and the benchmark datasets used in CAFA3 to evaluate our models and compare them to the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods in the different ontologies and benchmarks. CONTRIBUTIONS FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods.
We show that non-IEA models (trained without annotations Inferred from Electronic Annotation) obtain higher Fmax scores than their IEA counterparts, while the models including IEA annotations have higher coverage at the expense of a lower Fmax score.
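Fmax, the protein-centric metric named above, is the maximum over score thresholds of the harmonic mean of average precision and average recall. The sketch below follows the CAFA convention as commonly described. Precision is averaged only over proteins with at least one prediction at a given threshold, recall is averaged over all benchmark proteins, and thresholds run from 0.01 to 1.00 in steps of 0.01. It assumes predicted and true term sets have already been propagated to their ancestor terms in the GO graph. It is an illustration, not FunPredCATH's own evaluation code.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {go_term: score in [0, 1]}}
    truth:       {protein: set of true go_terms} for every benchmark protein
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            # Recall is averaged over every benchmark protein.
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
            # Precision only over proteins with >= 1 prediction at this threshold.
            if predicted:
                precisions.append(tp / len(predicted))
        if not precisions:
            continue  # no predictions survive this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Excluding proteins without predictions from the precision average means a method is not penalized in precision for abstaining. Those proteins still lower its recall.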
Affiliation(s)
- Joseph Bonello
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Department of Computer Information Systems, University of Malta, Faculty of ICT, Msida, MSD 2080, Malta
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
5
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on Earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we give an overview of the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
6
Dias RVR, Pedro RP, Sanches MN, Moreira GC, Leite VBP, Caruso IP, de Melo FA, de Oliveira LC. Unveiling Metastable Ensembles of GRB2 and the Relevance of Interdomain Communication during Folding. J Chem Inf Model 2023; 63:6344-6353. [PMID: 37824286 DOI: 10.1021/acs.jcim.3c00955] [Indexed: 10/14/2023]
Abstract
The folding process of multidomain proteins is a highly intricate phenomenon involving the assembly of distinct domains into a functional three-dimensional structure. During this process, each domain may fold independently while interacting with others. The folding of multidomain proteins can be influenced by various factors, including their composition, the structure of each domain, or the presence of disordered regions, as well as the surrounding environment. Misfolding of multidomain proteins can lead to the formation of nonfunctional structures associated with a range of diseases, such as cancers and neurodegenerative disorders. Understanding this process is an important step for many biophysical analyses such as stability, interaction, malfunctioning, and rational drug design. One such multidomain protein is growth factor receptor-bound protein 2 (GRB2), an adaptor protein that is essential in regulating cell survival. GRB2 consists of one central Src homology 2 (SH2) domain flanked by two Src homology 3 (SH3) domains. The SH2 domain interacts with phosphotyrosine regions in other proteins, while the SH3 domains recognize proline-rich regions on protein partners during cell signaling. Here, we combined computational and experimental techniques to investigate the folding process of GRB2. Through computational simulations, we sampled the conformational space and mapped the mechanisms involved through free energy profiles, which may indicate possible intermediate states. We then applied the energy landscape visualization method (ELViM) to the molecular dynamics trajectories, which allowed us to visualize a three-dimensional (3D) representation of the overall energy surface. We identified two possible parallel folding routes that cannot be seen in a one-dimensional analysis, with one occurring more frequently during folding.
Supporting these results, we used differential scanning calorimetry (DSC) and fluorescence spectroscopy techniques to confirm these intermediate states in vitro. Finally, we analyzed the deletion of domains to compare our model outputs to previously published results, supporting the presence of interdomain modulation. Overall, our study highlights the significance of interdomain communication within the GRB2 protein and its impact on the formation, stability, and structural plasticity of the protein, which are crucial for its interaction with other proteins in key signaling pathways.
Affiliation(s)
- Raphael V R Dias
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Multiuser Center for Biomolecular Innovation (CMIB), São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
- Renan P Pedro
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Multiuser Center for Biomolecular Innovation (CMIB), São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
- Murilo N Sanches
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Giovana C Moreira
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Multiuser Center for Biomolecular Innovation (CMIB), São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
- Vitor B P Leite
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Icaro P Caruso
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Multiuser Center for Biomolecular Innovation (CMIB), São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
- Fernando A de Melo
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
- Multiuser Center for Biomolecular Innovation (CMIB), São Paulo State University (UNESP), São José do Rio Preto, SP 15054-000, Brazil
- Leandro C de Oliveira
- Department of Physics, São Paulo State University (UNESP), Institute of Biosciences, Humanities, and Exact Sciences, São José do Rio Preto, SP 15054-000, Brazil
7
da Silva Dambroz CM, Aono AH, de Andrade Silva EM, Pereira WA. Genome-wide analysis and characterization of the LRR-RLK gene family provides insights into anthracnose resistance in common bean. Sci Rep 2023; 13:13455. [PMID: 37596307 PMCID: PMC10439169 DOI: 10.1038/s41598-023-40054-3] [Received: 10/04/2022] [Accepted: 08/03/2023] [Indexed: 08/20/2023]
Abstract
Anthracnose, caused by the hemibiotrophic fungus Colletotrichum lindemuthianum, is a damaging disease of common beans that can drastically reduce crop yield. The most effective strategy to manage anthracnose is the use of resistant cultivars. Many resistance loci have been identified, mapped, and associated with markers on common bean chromosomes. The leucine-rich repeat receptor-like kinase (LRR-RLK) family is a diverse group of transmembrane receptors that potentially recognize pathogen-associated molecular patterns and activate an immune response. In this study, we performed in silico analyses to identify, classify, and characterize common bean LRR-RLKs, and evaluated their expression profiles in response to infection by C. lindemuthianum. By analyzing the entire genome of Phaseolus vulgaris, we identified 230 LRR-RLKs and classified them into 15 subfamilies. Analyses of gene structures, conserved domains, and motifs suggest that LRR-RLKs from the same subfamily are consistent in their exon/intron organization and composition. LRR-RLK genes were distributed across the 11 chromosomes of the species, including regions close to anthracnose resistance markers. By investigating duplication events within the LRR-RLK family, we related the importance of this family to an expansion resulting from strong stabilizing selection. Promoter analysis was also performed, highlighting cis-elements associated with the plant response to biotic stress. Regarding the expression pattern of LRR-RLKs in response to infection by C. lindemuthianum, we identified several differentially expressed genes in this subfamily, which were associated with specific molecular patterns of LRR-RLKs. Our work provides a broad analysis of the LRR-RLK family in P. vulgaris, enabling an in-depth structural and functional characterization of the genes and proteins of this family.
From specific expression patterns related to the anthracnose response, we inferred a direct participation of LRR-RLK genes in the mechanisms of resistance to anthracnose, highlighting important subfamilies for further investigation.
Affiliation(s)
- Alexandre Hild Aono
- Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil
8
Maatouk M, Merhej V, Pontarotti P, Ibrahim A, Rolain JM, Bittar F. Metallo-Beta-Lactamase-like Encoding Genes in Candidate Phyla Radiation: Widespread and Highly Divergent Proteins with Potential Multifunctionality. Microorganisms 2023; 11:1933. [PMID: 37630493 PMCID: PMC10459063 DOI: 10.3390/microorganisms11081933] [Received: 06/01/2023] [Revised: 07/22/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023]
Abstract
The Candidate Phyla Radiation (CPR) was found to harbor a vast repertoire of genes encoding enzymes with potential antibiotic resistance activity. Among these, as many as 3349 genes were predicted in silico to contain a metallo-beta-lactamase-like (MBL-like) fold. These proteins were subjected to in silico functional characterization by comparing their protein profiles (presence/absence of conserved protein domains) with those of other MBLs, including 24 already expressed in vitro, and with those of the beta-lactamase database (BLDB) (n = 761). A sequence similarity network (SSN) was then used to predict functional clusters of CPR MBL-like sequences. Our findings showed that CPR MBL-like sequences were longer and more diverse than bacterial MBL sequences, with a high content of functional domains. Most CPR MBL-like sequences showed no SSN connectivity with expressed MBLs, indicating the presence of many potential, yet unidentified, functions in CPR. In conclusion, CPR was shown to have many protein functions and a large sequence variability of MBL-like folds, exceeding all known MBLs. Further experimental and evolutionary studies of this superfamily of hydrolyzing enzymes are necessary to clarify their functional annotation, origin, and expansion for adaptation or specialization within a given niche or toward a specific substrate.
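The SSN connectivity test can be sketched with a union-find over pairwise similarity edges. Sequences joined by an edge at or above a score threshold fall into the same cluster, and a query sequence counts as "connected" only if its cluster contains a reference (for example, an in vitro expressed) MBL. The score type, the threshold, the function names and the example identifiers are hypothetical. Real SSN tools first compute all-vs-all similarities, which this sketch takes as given.

```python
def ssn_components(sequence_ids, edges, threshold):
    """Connected components of a sequence similarity network.

    edges: iterable of (id_a, id_b, score); edges with score >= threshold are kept.
    Returns a list of sets, one per cluster (singletons included).
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for s in sequence_ids:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())


def connected_to_reference(components, reference_ids):
    """Non-reference sequences that share a cluster with at least one reference."""
    refs = set(reference_ids)
    linked = set()
    for cluster in components:
        if cluster & refs:
            linked |= cluster - refs
    return linked
```

The threshold controls cluster granularity: raising it splits the network into more, tighter clusters, so connectivity results should be reported together with the threshold used.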
Affiliation(s)
- Mohamad Maatouk
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
- Vicky Merhej
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
- Pierre Pontarotti
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
- Centre National de la Recherche Scientifique (CNRS-SNC5039), 13009 Marseille, France
- Ahmad Ibrahim
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
- Jean-Marc Rolain
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
- Fadi Bittar
- Microbes, Evolution, Phylogénie et Infection (MEPHI), Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), Aix-Marseille University, 13005 Marseille, France
- Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, 13005 Marseille, France
9
Iruegas R, Pfefferle K, Göttig S, Averhoff B, Ebersberger I. Feature architecture aware phylogenetic profiling indicates a functional diversification of type IVa pili in the nosocomial pathogen Acinetobacter baumannii. PLoS Genet 2023; 19:e1010646. [PMID: 37498819 PMCID: PMC10374093 DOI: 10.1371/journal.pgen.1010646] [Received: 01/31/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023]
Abstract
The Gram-negative bacterial pathogen Acinetobacter baumannii is a major cause of hospital-acquired opportunistic infections. The increasing spread of pan-drug-resistant strains places A. baumannii at the top of the ESKAPE pathogens for which novel routes of treatment are urgently needed. Comparative genomics approaches have successfully identified genetic changes coinciding with the emergence of pathogenicity in Acinetobacter. However, genes that are prevalent in both pathogenic and apathogenic Acinetobacter species were not considered, ignoring that virulence factors may emerge through the modification of evolutionarily old and widespread proteins. Here, we increased the resolution of comparative genomics analyses to also include lineage-specific changes in protein feature architectures. Using type IVa pili (T4aP) as an example, we show that three pilus components, among them the pilus tip adhesin ComC, vary in their Pfam domain annotation within the genus Acinetobacter. In most pathogenic Acinetobacter isolates, ComC displays a von Willebrand Factor type A domain harboring a finger-like protrusion, and we provide experimental evidence that this finger conveys virulence-related functions in A. baumannii. All three genes are part of an evolutionary cassette, which has been replaced at least twice during A. baumannii diversification. The resulting strain-specific differences in T4aP layout suggest differences in how individual strains interact with their host. Our study supports the hypothesis that A. baumannii uses T4aP for host infection, as was previously shown for other pathogens. It also indicates that many more functional complexes may exist whose precise functions have been adjusted by modifying individual components at the domain level.
Affiliation(s)
- Ruben Iruegas
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt am Main, Germany
- Katharina Pfefferle
- Molecular Microbiology & Bioenergetics, Institute of Molecular Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
- Stephan Göttig
- Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
- Beate Averhoff
- Molecular Microbiology & Bioenergetics, Institute of Molecular Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt am Main, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
10
Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023; 39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023]
Abstract
MOTIVATION Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in resolving overlapping and redundant feature annotations. RESULTS Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pairwise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
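The idea of resolving overlapping annotations by choosing a path through a directed acyclic graph can be illustrated with a deliberately simplified sketch. Each feature is a node, and an edge runs from a feature to any feature that starts after it ends. Dynamic programming then finds the highest-weight path, i.e. the best set of non-overlapping features for a single protein. FAS itself selects paths that maximize the similarity between two architectures, so this is not its scoring function, and it is not the greedyFAS API. The function name, per-feature weights and coordinates are hypothetical.

```python
def best_feature_path(features):
    """Highest-weight chain of non-overlapping features.

    features: list of (name, start, end, weight) with inclusive coordinates.
    Returns (total_weight, names_in_sequence_order).
    """
    # Sorting by end coordinate is a topological order of the DAG:
    # a feature can only precede features that end later than it does.
    feats = sorted(features, key=lambda f: (f[2], f[1]))
    best = []  # best[i] = (weight, path) of the best path ending at feats[i]
    for i, (name, start, _end, weight) in enumerate(feats):
        candidate = (weight, [name])  # path consisting of this feature alone
        for j in range(i):
            if feats[j][2] < start:  # edge j -> i: j ends before i starts
                prev_weight, prev_path = best[j]
                if prev_weight + weight > candidate[0]:
                    candidate = (prev_weight + weight, prev_path + [name])
        best.append(candidate)
    return max(best, key=lambda c: c[0], default=(0.0, []))
```

For two overlapping kinase-domain annotations, only the higher-weight one can lie on the chosen path, which mirrors how a path through the architecture graph removes redundant annotations.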
Affiliation(s)
- Julian Dosch
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Holger Bergmann
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Vinh Tran
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, 60325, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt, 60325, Germany
11
Saco A, Suárez H, Novoa B, Figueras A. A Genomic and Transcriptomic Analysis of the C-Type Lectin Gene Family Reveals Highly Expanded and Diversified Repertoires in Bivalves. Mar Drugs 2023; 21:md21040254. [PMID: 37103393 PMCID: PMC10140915 DOI: 10.3390/md21040254] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/30/2023] [Revised: 04/17/2023] [Accepted: 04/18/2023] [Indexed: 04/28/2023]
Abstract
C-type lectins belong to a widely conserved family of lectins characterized in Metazoa. They show important functional diversity and immune implications, mainly as pathogen recognition receptors. In this work, C-type lectin-like proteins (CTLs) of a set of metazoan species were analyzed, revealing a marked expansion in bivalve mollusks, which contrasted with the reduced repertoires of other mollusks, such as cephalopods. Orthology relationships demonstrated that these expanded repertoires consisted of CTL subfamilies conserved within Mollusca or Bivalvia and of lineage-specific subfamilies with orthology only between closely related species. Transcriptomic analyses revealed the importance of the bivalve subfamilies in mucosal immunity, as they were mainly expressed in the digestive gland and gills and modulated by specific stimuli. CTL domain-containing proteins that had additional domains (CTLDcps) were also studied, revealing interesting gene families with different degrees of CTL-domain conservation across orthologs from different taxa. Unique bivalve CTLDcps with specific domain architectures were revealed, corresponding to uncharacterized bivalve proteins with putative immune function according to their transcriptomic modulation, which could constitute interesting targets for functional characterization.
Affiliation(s)
- Amaro Saco
- Institute of Marine Research IIM-CSIC, 36208 Vigo, Spain
- Hugo Suárez
- Institute of Marine Research IIM-CSIC, 36208 Vigo, Spain
- Beatriz Novoa
- Institute of Marine Research IIM-CSIC, 36208 Vigo, Spain
12
Diaz-Parga P, Gould A, de Alba E. Natural and engineered inflammasome adapter proteins reveal optimum linker length for self-assembly. J Biol Chem 2022; 298:102501. [PMID: 36116550 PMCID: PMC9640978 DOI: 10.1016/j.jbc.2022.102501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 08/31/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022]
Abstract
The inflammasome is a multiprotein complex that triggers the activation of proinflammatory cytokines. The adapter ASC and its isoform ASCb mediate inflammasome assembly via self-association and oligomerization with other inflammasome proteins through homotypic interactions of their two Death Domains, PYD and CARD. These domains are identical in both isoforms but are connected by linkers of different length: 23 amino acids in ASC and 4 in ASCb. Nevertheless, ASC is a more potent inflammasome activator than ASCb, suggesting that adapter isoforms might be involved in regulating the inflammatory response. As previously reported, ASC's faster and less polydisperse self-association compared to ASCb points to interdomain flexibility resulting from linker length as a key factor in inflammasome regulation. To test the influence of linker length on self-association, we engineered the isoform ASC3X, with identical PYD and CARD connected by a 69-amino-acid linker (i.e., three times longer than ASC's linker). Real-time NMR and dynamic light scattering data indicate that ASC3X polymerization is less effective and more polydisperse than that of ASC or ASCb. However, transmission electron micrographs show that ASC3X can polymerize into filaments. Comparative interdomain dynamics of the three isoforms, obtained from NMR relaxation data, reveal that ASCb tumbles as a rod, whereas the PYD and CARD of ASC and ASC3X tumble independently, with marginally higher interdomain flexibility in ASC3X. Altogether, our data suggest that ASC's linker length is optimized for self-association: it allows enough flexibility to favor intermolecular homotypic interactions while keeping both domains sufficiently close for essential participation in filament formation.
Affiliation(s)
- Pedro Diaz-Parga
- Department of Bioengineering, School of Engineering, University of California Merced, California, USA; Quantitative Systems Biology PhD Program, University of California Merced, California, USA
- Andrea Gould
- Department of Bioengineering, School of Engineering, University of California Merced, California, USA
- Eva de Alba
- Department of Bioengineering, School of Engineering, University of California Merced, California, USA.
13
Gilchrist CLM, Chooi YH. Synthaser: a CD-Search enabled Python toolkit for analysing domain architecture of fungal secondary metabolite megasynth(et)ases. Fungal Biol Biotechnol 2021; 8:13. [PMID: 34763725 PMCID: PMC8582187 DOI: 10.1186/s40694-021-00120-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/30/2021] [Accepted: 10/29/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Fungi are prolific producers of secondary metabolites (SMs), which are bioactive small molecules with important applications in medicine, agriculture and other industries. The backbones of a large proportion of fungal SMs are generated through the action of large, multi-domain megasynth(et)ases such as polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The structure of these backbones is determined by the domain architecture of the corresponding megasynth(et)ase, and thus accurate annotation and classification of these architectures is an important step in linking SMs to their biosynthetic origins in the genome. RESULTS Here we report synthaser, a Python package leveraging the NCBI's conserved domain search tool for remote prediction and classification of fungal megasynth(et)ase domain architectures. Synthaser is capable of batch sequence analysis and produces rich textual output and interactive visualisations, which allow for quick assessment of the megasynth(et)ase diversity of a fungal genome. Synthaser uses a hierarchical rule-based classification system, which can be extensively customised by the user through a web application (http://gamcil.github.io/synthaser). We show that synthaser provides more accurate domain architecture predictions than comparable tools that rely on curated profile hidden Markov model (pHMM)-based approaches; the utilisation of the NCBI conserved domain database also allows for significantly greater flexibility compared to pHMM approaches. In addition, we demonstrate how synthaser can be applied to large-scale genome mining pipelines through the construction of an Aspergillus PKS similarity network. CONCLUSIONS Synthaser is an easy-to-use tool that represents a significant upgrade to previous domain architecture analysis tools. It is freely available under an MIT license from PyPI (https://pypi.org/project/synthaser) and GitHub (https://github.com/gamcil/synthaser).
Affiliation(s)
- Cameron L M Gilchrist
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia.
- Yit-Heng Chooi
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia.
14
Zhao VY, Rodrigues JV, Lozovsky ER, Hartl DL, Shakhnovich EI. Switching an active site helix in dihydrofolate reductase reveals limits to subdomain modularity. Biophys J 2021; 120:4738-4750. [PMID: 34571014 PMCID: PMC8595743 DOI: 10.1016/j.bpj.2021.09.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Revised: 09/14/2021] [Accepted: 09/22/2021] [Indexed: 11/23/2022]
Abstract
To what degree are individual structural elements within proteins modular such that similar structures from unrelated proteins can be interchanged? We study subdomain modularity by creating 20 chimeras of an enzyme, Escherichia coli dihydrofolate reductase (DHFR), in which a catalytically important, 10-residue α-helical sequence is replaced by α-helical sequences from a diverse set of proteins. The chimeras stably fold but have a range of diminished thermal stabilities and catalytic activities. Evolutionary coupling analysis indicates that the residues of this α-helix are under selection pressure to maintain catalytic activity in DHFR. Reversion to phenylalanine at key position 31 was found to partially restore catalytic activity, which could be explained by evolutionary coupling values. We performed molecular dynamics simulations using replica exchange with solute tempering. Chimeras with low catalytic activity exhibit nonhelical conformations that block the binding site and disrupt the positioning of the catalytically essential residue D27. Simulation observables and in vitro measurements of thermal stability and substrate-binding affinity are strongly correlated. Several E. coli strains with chromosomally integrated chimeric DHFRs can grow, with growth rates that follow predictions from a kinetic flux model that depends on the intracellular abundance and catalytic activity of DHFR. Our findings show that although α-helices are not universally substitutable, the molecular and fitness effects of modular segments can be predicted by the biophysical compatibility of the replacement segment.
Affiliation(s)
- Victor Y Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- João V Rodrigues
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Elena R Lozovsky
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Daniel L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts.
15
Álvarez-Lugo A, Becerra A. The Role of Gene Duplication in the Divergence of Enzyme Function: A Comparative Approach. Front Genet 2021; 12:641817. [PMID: 34335678 PMCID: PMC8318041 DOI: 10.3389/fgene.2021.641817] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Accepted: 06/21/2021] [Indexed: 11/13/2022]
Abstract
Gene duplication is a crucial process involved in the appearance of new genes and functions. It is thought to have played a major role in the growth of enzyme families and the expansion of metabolism, both at the dawn of the biosphere and in recent times. Here, we used a comparative approach to analyze the paralogous enzyme content within each of the seven enzyme classes for a representative sample of prokaryotes. We found a high ratio of paralogs in three enzyme classes (oxidoreductases, isomerases, and translocases), and within each of them most paralogs belong to only a few subclasses. Our results suggest an intricate scenario for the evolution of prokaryotic enzymes, involving different fates for duplicated enzymes fixed in the genome; around 20-40% of prokaryotic enzymes have paralogs. Intracellular organisms have a lower ratio of duplicated enzymes, whereas free-living organisms show the highest ratios. We also found that phylogenetically close phyla, as well as some unrelated phyla with the same lifestyle, share similar genomic and biochemical traits, which supports the idea that gene duplication is associated with environmental adaptation.
Affiliation(s)
- Alejandro Álvarez-Lugo
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Arturo Becerra
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
16
Caetano-Anollés G. The Compressed Vocabulary of Microbial Life. Front Microbiol 2021; 12:655990. [PMID: 34305827 PMCID: PMC8292947 DOI: 10.3389/fmicb.2021.655990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022]
Abstract
Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including Zipf's law, a special case of the scale-free distribution; Heaps' law, which describes the sublinear growth typical of economies of scale; and the Menzerath-Altmann law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which, together with comparative and evolutionary genomic data, can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae traced the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, United States
17
Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
18
de Rond T, Asay JE, Moore BS. Co-occurrence of enzyme domains guides the discovery of an oxazolone synthetase. Nat Chem Biol 2021; 17:794-799. [PMID: 34099916 PMCID: PMC8238888 DOI: 10.1038/s41589-021-00808-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/27/2020] [Accepted: 04/29/2021] [Indexed: 02/04/2023]
Abstract
Multidomain enzymes orchestrate two or more catalytic activities to carry out metabolic transformations with increased control and speed. Here, we report the design and development of a genome-mining approach for targeted discovery of biochemical transformations through the analysis of co-occurring enzyme domains (CO-ED) in a single protein. CO-ED was designed to identify unannotated multifunctional enzymes for functional characterization and discovery based on the premise that linked enzyme domains have evolved to function collaboratively. Guided by CO-ED, we targeted an unannotated predicted ThiF-nitroreductase di-domain enzyme found in more than 50 proteobacteria. Through heterologous expression and biochemical reconstitution, we discovered a series of natural products containing the rare oxazolone heterocycle and characterized their biosynthesis. Notably, we identified the di-domain enzyme as an oxazolone synthetase, validating CO-ED-guided genome mining as a methodology with potential broad utility for both the discovery of unusual enzymatic transformations and the functional annotation of multidomain enzymes.
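The first step of a co-occurrence analysis, counting how often two enzyme domains are fused in the same protein, can be sketched in Python as follows. This is an illustration under stated assumptions, not the published CO-ED pipeline: the input dictionary of protein-to-domain annotations, the function name and the placeholder domain `DomainX` are hypothetical, and any ranking or filtering of the resulting pairs is left out.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def domain_pair_counts(proteins: Dict[str, List[str]]) -> Counter:
    """Count, for each unordered pair of distinct domains, how many proteins
    contain both. A domain repeated inside one protein is counted once."""
    counts: Counter = Counter()
    for domains in proteins.values():
        # Sorting makes each pair's key order-independent, e.g. ("A", "B").
        for pair in combinations(sorted(set(domains)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical protein -> domain annotations; IDs are placeholders.
    proteins = {
        "prot1": ["ThiF", "Nitroreductase"],
        "prot2": ["ThiF", "Nitroreductase"],
        "prot3": ["ThiF", "Nitroreductase", "ThiF"],
        "prot4": ["ThiF", "DomainX"],
    }
    for (d1, d2), n in domain_pair_counts(proteins).most_common():
        print(f"{d1} + {d2}: {n}")
    # Nitroreductase + ThiF: 3
    # DomainX + ThiF: 1
```

Pairs that recur across many proteins but lack a functional annotation would be the natural candidates to inspect further under the premise stated in the abstract.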
Affiliation(s)
- Tristan de Rond
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093
- Julia E. Asay
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093
- Bradley S. Moore
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093
19
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
- PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
20
Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jonathan G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
21
One of Nature’s Basic Laws: Combination-Sharing. Human Arenas 2021. [DOI: 10.1007/s42087-021-00215-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
22
Abstract
We describe the de novo design of an allosterically regulated protein, which comprises two tightly coupled domains. One domain is based on the DF (Due Ferri in Italian or two-iron in English) family of de novo proteins, which have a diiron cofactor that catalyzes a phenol oxidase reaction, while the second domain is based on PS1 (Porphyrin-binding Sequence), which binds a synthetic Zn-porphyrin (ZnP). The binding of ZnP to the original PS1 protein induces changes in structure and dynamics, which we expected to influence the catalytic rate of a fused DF domain when appropriately coupled. Both DF and PS1 are four-helix bundles, but they have distinct bundle architectures. To achieve tight coupling between the domains, they were connected by four helical linkers using a computational method to discover the most designable connections capable of spanning the two architectures. The resulting protein, DFP1 (Due Ferri Porphyrin), bound the two cofactors in the expected manner. The crystal structure of fully reconstituted DFP1 was also in excellent agreement with the design and showed the ZnP cofactor bound more than 12 Å from the dimetal center. Next, a substrate-binding cleft leading to the diiron center was introduced into DFP1. The resulting protein acts as an allosterically modulated phenol oxidase. Its Michaelis-Menten parameters were strongly affected by the binding of ZnP, resulting in a fourfold tighter Km and a sevenfold decrease in kcat. These studies establish the feasibility of designing allosterically regulated catalytic proteins entirely from scratch.
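Combining the two reported parameter changes gives the net effect of ZnP binding on catalytic efficiency, $k_{cat}/K_m$. This worked example assumes that "fourfold tighter $K_m$" means $K_m$ is divided by four and that both changes compare the ZnP-bound with the ZnP-free enzyme:

$$
\frac{(k_{cat}/K_m)_{+\mathrm{ZnP}}}{(k_{cat}/K_m)_{-\mathrm{ZnP}}}
= \frac{k_{cat}/7}{K_m/4} \cdot \frac{K_m}{k_{cat}}
= \frac{4}{7} \approx 0.57
$$

Under that reading, ZnP binding lowers $k_{cat}/K_m$ to roughly 57% of its ZnP-free value, because the sevenfold loss in turnover outweighs the fourfold decrease in $K_m$.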
23
Aledo JC, Aledo P. Susceptibility of Protein Methionine Oxidation in Response to Hydrogen Peroxide Treatment-Ex Vivo Versus In Vitro: A Computational Insight. Antioxidants (Basel) 2020; 9:antiox9100987. [PMID: 33066324 PMCID: PMC7602125 DOI: 10.3390/antiox9100987] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2020] [Revised: 10/08/2020] [Accepted: 10/09/2020] [Indexed: 11/25/2022]
Abstract
Methionine oxidation plays an important role in cell signaling. Recently, we built a database containing thousands of proteins identified as sulfoxidation targets. Using this resource, we have now developed a computational approach aimed at characterizing the oxidation of human methionyl residues. We found that proteins oxidized both in cell-free preparations (in vitro) and inside living cells (ex vivo) were enriched in methionines and intrinsically disordered regions. However, proteins oxidized ex vivo tended to be larger and less abundant than those oxidized in vitro. Another distinctive feature was their subcellular localization: nuclear and mitochondrial proteins were preferentially oxidized ex vivo but not in vitro. In a network based on gene ontology terms, the nodes corresponding to ex vivo and in vitro oxidized proteins showed assortative mixing, suggesting that ex vivo oxidized proteins share molecular functions and biological processes. This was further supported by the observation that proteins from the ex vivo set were co-regulated more often than expected by chance. We also investigated the sequence environment of oxidation sites. Glutamate and aspartate were overrepresented in these environments regardless of the group. In contrast, tyrosine, tryptophan and histidine were clearly avoided, but only in the environments of the ex vivo sites. A hypothetical mechanism of methionine oxidation that accounts for these observations is presented.
24
Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics 2020; 36:478-486. [PMID: 31384919 DOI: 10.1093/bioinformatics/btz609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2019] [Revised: 07/22/2019] [Accepted: 08/02/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION Protein structure alignment is one of the fundamental problems in computational structural biology. A variety of algorithms have been developed to address this important issue over the past decade. However, because of their heuristic nature, current structure alignment methods may suffer from suboptimal alignment and/or over-fragmentation, leading to biologically wrong alignments in some cases. To overcome these limitations, we have developed an accurate topology-independent and global structure alignment method based on an FFT-based exhaustive search algorithm, referred to as FTAlign. RESULTS Our FTAlign algorithm was extensively tested on six commonly used datasets and compared with seven state-of-the-art structure alignment approaches: TMalign, DeepAlign, Kpax, 3DCOMB, MICAN, SPalignNS and CLICK. FTAlign outperformed the other methods in reproducing manually curated alignments and obtained high success rates of 96.7% and 90.0% on two gold-standard benchmarks, MALIDUP and MALISAM, respectively. Moreover, FTAlign also achieved the overall best performance in terms of biologically meaningful structure overlap (SO) and TM-score on both the sequential alignment test sets (MALIDUP, MALISAM and 64 difficult cases from HOMSTRAD) and the non-sequential sets (MALIDUP-NS, MALISAM-NS and 199 topology-different cases), with a particular advantage for non-sequential alignment. Despite its global search feature, FTAlign is computationally efficient and can normally complete a pairwise alignment within one second. AVAILABILITY AND IMPLEMENTATION http://huanglab.phys.hust.edu.cn/ftalign/.
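FTAlign's exhaustive search relies on the FFT. The core trick, scoring every relative shift of two signals in one pass via the cross-correlation theorem, can be shown on a one-dimensional toy in Python with NumPy. This is a hedged illustration of the general technique only: it does not reproduce FTAlign's 3D representation, rotational search or scoring, and the function name and random test signal are hypothetical.

```python
import numpy as np


def best_shift_fft(reference: np.ndarray, query: np.ndarray) -> int:
    """Return the circular shift s for which np.roll(query, s) overlaps
    `reference` best, i.e. the s maximizing sum_i reference[i] * query[i - s].

    All n shifts are scored at once with the cross-correlation theorem,
    O(n log n) instead of the O(n^2) of a direct loop. Both inputs are
    assumed to be real-valued 1D arrays of equal length.
    """
    if reference.shape != query.shape:
        raise ValueError("reference and query must have the same shape")
    spectrum = np.fft.fft(reference) * np.conj(np.fft.fft(query))
    scores = np.fft.ifft(spectrum).real  # scores[s] = sum_i reference[i] * query[i - s]
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random(64)       # toy 1D signal standing in for a structure
    query = np.roll(reference, -5)   # same signal, circularly shifted
    print(best_shift_fft(reference, query))  # 5
```

A direct loop over all n shifts costs O(n²) multiplications, whereas the FFT route costs O(n log n); on a 3D grid the same identity scores all translations in a single pass, which is what makes an exhaustive rather than heuristic search affordable.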
Affiliation(s)
- Zeyu Wen
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
25
Czubat B, Minias A, Brzostek A, Żaczek A, Struś K, Zakrzewska-Czerwińska J, Dziadek J. Functional Disassociation Between the Protein Domains of MSMEG_4305 of Mycolicibacterium smegmatis (Mycobacterium smegmatis) in vivo. Front Microbiol 2020; 11:2008. [PMID: 32973726 PMCID: PMC7466739 DOI: 10.3389/fmicb.2020.02008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/06/2020] [Accepted: 07/29/2020] [Indexed: 12/02/2022]
Abstract
MSMEG_4305 is a two-domain protein of Mycolicibacterium smegmatis (Mycobacterium smegmatis). The N-terminal domain of MSMEG_4305 encodes an RNase H type I. The C-terminal domain is a presumed CobC, predicted to be involved in the aerobic synthesis of vitamin B12. The two domains reach their maximum activity at distinct pH values, approximately 8.5 and 4.5, respectively. The presence of the CobC domain influenced RNase activity in vitro in the homolog Rv2228c. Here, we analyzed the role of MSMEG_4305 in vitamin B12 synthesis and the functional association between both domains in vivo in M. smegmatis. We used a knock-out mutant of M. smegmatis deficient in MSMEG_4305. Whole-cell lysates of the mutant strain contained a lower concentration of vitamin B12, as determined with an immunoenzymatic assay. We observed growth deficits, related to vitamin B12 production, on media containing sulfamethazine and propionate. Removal of the CobC domain of MSMEG_4305 in a ΔrnhA background hardly affected the growth rate of M. smegmatis in vivo. The strain carrying the truncation showed no fitness deficit in the competitive assay and did not show an increased level of RNA/DNA hybrids in its genome. We show that homologs of MSMEG_4305 are present only in the Actinomycetales phylogenetic branch (according to the old classification system). The domains of MSMEG_4305 homologs accumulate mutations at different rates, while the linker region is highly variable. We conclude that MSMEG_4305 is a multidomain protein that was most probably fixed in the phylogenetic tree of life due to genetic drift.
Affiliation(s)
- Bożena Czubat
- Department of Experimental and Clinical Pharmacology, University of Rzeszów, Rzeszów, Poland; Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Alina Minias
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Brzostek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Żaczek
- Institute of Medical Sciences, Medical College of Rzeszów University, Rzeszów, Poland
- Katarzyna Struś
- Department of Bioenergetics, Food Analysis and Microbiology, Institute of Food Technology and Nutrition, University of Rzeszów, Rzeszów, Poland
- Jarosław Dziadek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
26
Abstract
Background: Locating the root node of the "tree of life" (ToL) is one of the hardest problems in phylogenetics, given the time depth involved. The root node, or the universal common ancestor (UCA), groups descendants into organismal clades/domains. Two notable variants of the two-domains ToL (2D-ToL) have gained support recently. One 2D-ToL posits that eukaryotes (organisms with nuclei) and akaryotes (organisms without nuclei) are sister clades that diverged from the UCA, and that Asgard archaea are sister to other archaea. The other 2D-ToL proposes that eukaryotes emerged from within archaea and places Asgard archaea as sister to eukaryotes. Williams et al. (Nature Ecol. Evol. 4: 138-147; 2020) re-evaluated the data and methods that support the competing two-domains proposals and concluded that eukaryotes are the closest relatives of Asgard archaea. Critique: The poor resolution of the archaea in their analysis, despite the use of amino acid alignments from thousands of proteins and the best-fitting substitution models, contradicts their conclusions. We argue that they overlooked important aspects of estimating evolutionary relatedness and assessing phylogenetic signal in empirical data. Which 2D-ToL is better supported depends on which kind of molecular feature is better for resolving common ancestors at the roots of clades: protein domains or their component amino acids. We focus on the phylogenetic character reconstructions necessary to describe the UCA or its closest descendants in the absence of reliable fossils. Clarifications: It is well known that different character types present different perspectives on evolutionary history that relate to different phylogenetic depths. We show that protein structural domains support more reliable phylogenetic reconstructions of deep-diverging clades in the ToL. Accordingly, eukaryotes and akaryotes are better-supported clades in a 2D-ToL.
Affiliation(s)
- David Morrison
- Department of Organismal Biology, Systematic Biology, Uppsala University, Uppsala, 752 36, Sweden
27
Carrillo-Campos J. Structure and function of Rieske/mononuclear-type oxygenases. TIP REVISTA ESPECIALIZADA EN CIENCIAS QUÍMICO-BIOLÓGICAS 2019. [DOI: 10.22201/fesz.23958723e.2019.0.196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
Rieske/mononuclear oxygenases are a group of metalloenzymes that catalyze the oxidation of a variety of compounds; their participation in the degradation of contaminating xenobiotic compounds is particularly notable, and these enzymes also participate in the biosynthesis of some compounds of commercial interest. They have broad substrate specificity, which makes them a group of enzymes with a high potential for application in biotechnological processes, a potential that has not been exploited so far. This review addresses general aspects of the function and structure of this important group of enzymes.
28
Mechanisms of noncanonical binding dynamics in multivalent protein-protein interactions. Proc Natl Acad Sci U S A 2019; 116:25659-25667. [PMID: 31776263 DOI: 10.1073/pnas.1902909116] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022]
Abstract
Protein multivalency can provide increased affinity and specificity relative to monovalent counterparts, but these emergent biochemical properties and their mechanistic underpinnings are difficult to predict from the biophysical properties of the multivalent binding partners. Here, we present a mathematical model that accurately simulates the binding kinetics and equilibria of multivalent protein-protein interactions as a function of the kinetics of monomer-monomer binding, the structure and topology of the multidomain interacting partners, and the valency of each partner. All of these properties are estimated a priori, experimentally or computationally; topology, for example, is approximated with a worm-like chain model applicable to a variety of structurally disparate systems. The model is therefore predictive without parameter fitting. We conceptualize multivalent binding as a protein-protein interaction network: ligand and receptor valencies determine the number of interacting species in the network, with monomer kinetics and structural properties dictating the dynamics of each species. As predicted by the model and validated by surface plasmon resonance experiments, multivalent interactions can generate several noncanonical macroscopic binding dynamics, including a transient burst of high-energy configurations during association, biphasic equilibria resulting from interligand competition at high concentrations, and multiexponential dissociation arising from the differing lifetimes of distinct network species. The transient burst was uncovered only when we extended our analysis to trivalent interactions, owing to the significantly larger network, and we were able to tune burst magnitude predictably by altering linker rigidity. This study elucidates mechanisms of multivalent binding and establishes a framework for model-guided analysis and engineering of such interactions.
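The abstract notes that linker topology is approximated with a worm-like chain (WLC). As a minimal sketch of what such an approximation computes, the Python below uses the textbook WLC mean-square end-to-end distance, <R^2> = 2PL[1 - (P/L)(1 - exp(-L/P))], for contour length L and persistence length P, and turns it into a Gaussian-chain estimate of the effective local concentration of one tethered end at the position of the other. The Gaussian closure step, the function names, and the example linker parameters are assumptions for illustration, not the authors' published model.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def wlc_mean_square_distance(contour_nm: float, persistence_nm: float) -> float:
    """Mean-square end-to-end distance <R^2> (nm^2) of a worm-like chain."""
    if contour_nm <= 0 or persistence_nm <= 0:
        raise ValueError("contour and persistence lengths must be positive")
    ratio = persistence_nm / contour_nm  # P / L
    return 2.0 * persistence_nm * contour_nm * (
        1.0 - ratio * (1.0 - math.exp(-contour_nm / persistence_nm))
    )


def effective_concentration_molar(contour_nm: float, persistence_nm: float) -> float:
    """Gaussian-chain estimate (mol/L) of one tethered end's local concentration at the other end.

    Assumes (hypothetically) that the end-to-end vector is Gaussian with
    variance <R^2>/3 per axis, so the density at zero separation is
    (3 / (2*pi*<R^2>))**1.5 molecules per nm^3.
    """
    r2 = wlc_mean_square_distance(contour_nm, persistence_nm)
    density_per_nm3 = (3.0 / (2.0 * math.pi * r2)) ** 1.5
    # 1 L = 1e24 nm^3, so molecules/nm^3 * 1e24 / N_A gives mol/L
    return density_per_nm3 * 1e24 / AVOGADRO
```

With a hypothetical 10-residue linker (3.8 nm of contour, assuming 0.38 nm per residue) and an assumed persistence length of 0.4 nm, `effective_concentration_molar(3.8, 0.4)` returns roughly 0.12 M. Within this approximation, a longer linker spreads the tethered end over a larger volume and lowers the estimate.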
29
Pascarella S. Computational classification of MocR transcriptional regulators into subgroups as a support for experimental and functional characterization. Bioinformation 2019; 15:151-159. [PMID: 31435161 PMCID: PMC6677901 DOI: 10.6026/97320630015151] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/15/2019] [Accepted: 02/03/2019] [Indexed: 11/23/2022]
Abstract
MocR bacterial transcriptional regulators are a subfamily within the GntR family. The MocR proteins possess an N-terminal domain containing the winged helix-turn-helix (wHTH) motif and a C-terminal domain whose architecture is homologous to the fold type-I pyridoxal 5'-phosphate (PLP)-dependent enzymes, whose archetypal member is aspartate aminotransferase (AAT). The ancestor of the fold type-I PLP-dependent superfamily is considered one of the earliest enzymes. The members of this superfamily are the product of an evolution that resulted in a diversified protein population able to catalyze a set of reactions on substrates that often contain amino groups. The MocR regulators act as activators or repressors of genes within many metabolic pathways, often involving PLP enzymes. This diversity implies that MocR regulators respond specifically to different classes of effector molecules. It is therefore of interest to compare the AAT domains of MocR proteins from six bacterial phyla. Multidimensional scaling and cluster analyses suggested that at least three subgroups exist within the population, reflecting functional specialization rather than taxonomic origin. The AAT domains of the three clusters display varying degrees of similarity to different fold type-I PLP enzyme families. The results support the hypothesis that independent fusion events generated at least three different MocR subgroups.
Affiliation(s)
- Stefano Pascarella
- Structural Bioinformatics and Molecular Modelling Lab; Dipartimento di Scienze Biochimiche; Sapienza Università di Roma; 00185 Roma, Italy
30
Baker EP, Hittinger CT. Evolution of a novel chimeric maltotriose transporter in Saccharomyces eubayanus from parent proteins unable to perform this function. PLoS Genet 2019; 15:e1007786. [PMID: 30946740 PMCID: PMC6448821 DOI: 10.1371/journal.pgen.1007786] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/13/2018] [Accepted: 10/25/2018] [Indexed: 11/23/2022]
Abstract
At the molecular level, the evolution of new traits can be broadly divided between changes in gene expression and changes in protein-coding sequence. For proteins, the evolution of novel functions is generally thought to proceed through sequential point mutations or recombination of whole functional units. In Saccharomyces, the uptake of the sugar maltotriose into the cell is the primary limiting factor in its utilization, but maltotriose transporters are relatively rare, except in brewing strains. No known wild strains of Saccharomyces eubayanus, the cold-tolerant parent of hybrid lager-brewing yeasts (Saccharomyces cerevisiae x S. eubayanus), are able to consume maltotriose, which limits their ability to fully ferment malt extract. In one strain of S. eubayanus, we found a gene closely related to a known maltotriose transporter and were able to confer maltotriose consumption by overexpressing this gene or by passaging the strain on maltose. Even so, most wild strains of S. eubayanus lack native maltotriose transporters. To determine how this rare trait could evolve in naive genetic backgrounds, we performed an adaptive evolution experiment for maltotriose consumption, which yielded a single strain of S. eubayanus able to grow on maltotriose. We mapped the causative locus to a gene encoding a novel chimeric transporter that was formed by an ectopic recombination event between two genes encoding transporters that are unable to import maltotriose. In contrast to classic models of the evolution of novel protein functions, the recombination breakpoints occurred within a single functional domain. Thus, the ability of the new protein to carry maltotriose was likely acquired through epistatic interactions between independently evolved substitutions. By acquiring multiple mutations at once, the transporter rapidly gained a novel function, while bypassing potentially deleterious intermediate steps. 
This study provides an illuminating example of how recombination between paralogs can establish novel interactions among substitutions to create adaptive functions.

Hybrids of the yeasts Saccharomyces cerevisiae and Saccharomyces eubayanus (lager-brewing yeasts) dominate the modern brewing industry. S. cerevisiae, also known as baker's yeast, is well known for its role in industry and scientific research. Less well recognized is S. eubayanus, which was only discovered as a pure species in 2011. While most lager-brewing yeasts rapidly and completely utilize the important brewing sugar maltotriose, no strain of S. eubayanus isolated to date is known to do so. Although S. eubayanus cannot consume maltotriose, we identified one strain carrying a gene for a functional maltotriose transporter; most strains lack this gene. During an adaptive evolution experiment, a strain of S. eubayanus without native maltotriose transporters evolved the ability to grow on maltotriose. Maltotriose consumption in the evolved strain resulted from a chimeric transporter that arose by shuffling genes encoding parent proteins that were unable to transport maltotriose. Traditionally, functional chimeric proteins are thought to evolve by shuffling discrete functional domains or modules, but the breakpoints in the chimera studied here occurred within the single functional module of the protein. These results support the less well-recognized role of shuffling duplicate gene sequences in generating novel proteins with adaptive functions.
Affiliation(s)
- EmilyClare P. Baker
- Laboratory of Genetics, Microbiology Doctoral Training Program, Genome Center of Wisconsin, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Chris Todd Hittinger
- Laboratory of Genetics, Microbiology Doctoral Training Program, Genome Center of Wisconsin, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
31
Debiec KT, Whitley MJ, Koharudin LMI, Chong LT, Gronenborn AM. Integrating NMR, SAXS, and Atomistic Simulations: Structure and Dynamics of a Two-Domain Protein. Biophys J 2019; 114:839-855. [PMID: 29490245 DOI: 10.1016/j.bpj.2018.01.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/22/2017] [Revised: 12/19/2017] [Accepted: 01/02/2018] [Indexed: 12/21/2022]
Abstract
Multidomain proteins with two or more independently folded functional domains are prevalent in nature. Whereas most multidomain proteins are linked linearly in sequence, roughly one-tenth possess domain insertions where a guest domain is implanted into a loop of a host domain, such that the two domains are connected by a pair of interdomain linkers. Here, we characterized the influence of the interdomain linkers on the structure and dynamics of a domain-insertion protein in which the guest LysM domain is inserted into a central loop of the host CVNH domain. Expanding upon our previous crystallographic and NMR studies, we applied SAXS in combination with NMR paramagnetic relaxation enhancement to construct a structural model of the overall two-domain system. Although the two domains have no fixed relative orientation, certain orientations were found to be preferred over others. We also assessed the accuracies of molecular mechanics force fields in modeling the structure and dynamics of tethered multidomain proteins by integrating our experimental results with microsecond-scale atomistic molecular dynamics simulations. In particular, our evaluation of two different combinations of the latest force fields and water models revealed that both combinations accurately reproduce certain structural and dynamical properties, but are inaccurate for others. Overall, our study illustrates the value of integrating experimental NMR and SAXS studies with long timescale atomistic simulations for characterizing structural ensembles of flexibly linked multidomain systems.
Affiliation(s)
- Karl T Debiec
- Molecular Biophysics and Structural Biology Graduate Program, University of Pittsburgh and Carnegie Mellon University, Pittsburgh, Pennsylvania; Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Matthew J Whitley
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Leonardus M I Koharudin
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Lillian T Chong
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania
- Angela M Gronenborn
- Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania.
32
Abstract
Drugs modulate disease states through their actions on targets in the body. Determining these targets aids the focused development of new treatments, and helps to better characterize those already employed. One means of accomplishing this is through the deployment of in silico methodologies, harnessing computational analytical and predictive power to produce educated hypotheses for experimental verification. Here, we provide an overview of the current state of the art, describe some of the well-established methods in detail, and reflect on how they, and emerging technologies promoting the incorporation of complex and heterogeneous data-sets, can be employed to improve our understanding of (poly)pharmacology.
Affiliation(s)
- Ryan Byrne
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland.
33
Swaroop Srivastava S, Raman R, Kiran U, Garg R, Chadalawada S, Pawar AD, Sankaranarayanan R, Sharma Y. Interface interactions between βγ-crystallin domain and Ig-like domain render Ca2+-binding site inoperative in abundant perithecial protein of Neurospora crassa. Mol Microbiol 2018; 110:955-972. [PMID: 30216631 DOI: 10.1111/mmi.14130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 09/10/2018] [Indexed: 11/30/2022]
Abstract
We describe a set of proteins in which a βγ-crystallin domain pairs with an Ig-like domain, and which are confined to microbes, like bacteria, slime molds and fungi. DdCAD-1 (Ca2+-dependent cell adhesion molecule-1) and abundant perithecial protein (APP) represent this class of molecules. Using the crystal structure of APP-NTD (N-terminal domain of APP), we describe its mode of Ca2+ binding and provide a generalized theme for correct identification of the Ca2+-binding site within this class of molecules. As a common feature, one of the two Ca2+-binding sites is non-functional in the βγ-crystallin domains of these proteins. While APP-NTD binds Ca2+ with a micromolar affinity which is comparable to DdCAD-1, APP surprisingly does not bind Ca2+. Crystal structures of APP and Ca2+-bound APP-NTD reveal that the interface interactions in APP render its Ca2+-binding site inoperative. Thus, heterodomain association provides a novel mode of Ca2+-binding regulation in APP. Breaking the interface interactions (mutating Asp30Ala, Leu132Ala and Ile135Ala) or separation from the Ig-like domain removes the constraints upon the required conformational transition and enables the βγ-crystallin domain to bind Ca2+. In mechanistic detail, our work demonstrates an interdomain interface adapted to distinct functional niches in APP and its homolog DdCAD-1.
Affiliation(s)
- Rajeev Raman
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India
- Uday Kiran
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India
- Rupsi Garg
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India
- Swathi Chadalawada
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India
- Asmita D Pawar
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India
- Rajan Sankaranarayanan
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India
- Yogendra Sharma
- CSIR - Centre for Cellular and Molecular Biology (CCMB), Hyderabad, 500 007, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India
34
Jakubec D, Kratochvíl M, Vymětal J, Vondrášek J. Widespread evolutionary crosstalk among protein domains in the context of multi-domain proteins. PLoS One 2018; 13:e0203085. [PMID: 30169546 PMCID: PMC6118372 DOI: 10.1371/journal.pone.0203085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 08/14/2018] [Indexed: 11/20/2022]
Abstract
Domains are distinct units within proteins that typically can fold independently into recognizable three-dimensional structures to facilitate their functions. The structural and functional independence of protein domains is reflected by their apparent modularity in the context of multi-domain proteins. In this work, we examined the coupling of evolution of domain sequences co-occurring within multi-domain proteins to see if it proceeds independently, or in a coordinated manner. We used continuous information theory measures to assess the extent of correlated mutations among domains in multi-domain proteins from organisms across the tree of life. In all multi-domain architectures we examined, domains co-occurring within protein sequences had to some degree undergone concerted evolution. This finding challenges the notion of complete modularity and independence of protein domains, providing new perspective on the evolution of protein sequence and function.
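To make "correlated mutations among domains" concrete, here is a minimal Python sketch that scores co-variation between two co-occurring domains as the average mutual information (MI) between their alignment columns. It uses plain discrete MI on residue symbols, a simpler stand-in for the continuous information-theory measures the authors describe, and it omits the phylogenetic and finite-sample corrections a real analysis would need; the function names and toy alignments are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def column_mutual_information(col_a, col_b):
    """Discrete mutual information (bits) between two aligned residue columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def mean_interdomain_mi(domain_a_rows, domain_b_rows):
    """Average MI over every (column in domain A, column in domain B) pair.

    Row k of each alignment must come from the same multi-domain protein.
    """
    if len(domain_a_rows) != len(domain_b_rows):
        raise ValueError("both alignments need one row per protein")
    cols_a = list(zip(*domain_a_rows))  # transpose rows into columns
    cols_b = list(zip(*domain_b_rows))
    scores = [column_mutual_information(ca, cb) for ca, cb in product(cols_a, cols_b)]
    if not scores:
        raise ValueError("alignments must contain at least one column")
    return sum(scores) / len(scores)
```

Because the rows are paired by protein, each column pair compares residues that co-occur in one chain. A perfectly co-varying pair such as `"AACC"` against `"GGTT"` scores 1 bit, while `"AACC"` against `"GTGT"` scores 0.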
Affiliation(s)
- David Jakubec
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University, 128 43 Prague 2, Czech Republic
- Miroslav Kratochvíl
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, 118 00 Prague 1, Czech Republic
- Jiří Vymětal
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
- Jiří Vondrášek
- Department of Bioinformatics, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 166 10 Prague 6, Czech Republic
35
Mehrotra P, Ami VKG, Srinivasan N. Clustering of multi-domain protein sequences. Proteins 2018; 86:759-776. [PMID: 29675880 DOI: 10.1002/prot.25510] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/03/2017] [Revised: 04/09/2018] [Accepted: 04/16/2018] [Indexed: 11/06/2022]
Abstract
The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional alignment-based methods commonly use domain-level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains in a protein or of domain-linker regions, and so cannot classify multi-domain proteins as a whole. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi-domain protein sequences without requiring domain boundaries or the sequential order of domains to be defined. With this method we aim to achieve a biologically meaningful classification scheme for multi-domain protein sequences. In this article, CLAP-based classification was explored on 5 datasets of multi-domain proteins, and we present a detailed analysis of proteins containing (1) tyrosine phosphatase and (2) SH3 domains. At the domain level, the CLAP-based classification scheme produced clustering similar to that obtained from an alignment-based method. CLAP-based clusters obtained for full-length datasets were shown to consist of proteins with similar functions and domain architectures. Our study demonstrates that multi-domain proteins can be classified effectively by considering full-length sequences, without requiring identification of the domains in the sequence.
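The key idea here is comparing full-length sequences without first delimiting domains. The Python below is a minimal sketch of that general idea using a different and much simpler technique than CLAP: each sequence is reduced to a profile of overlapping k-mers, profiles are compared by cosine similarity, and sequences are grouped by single-linkage clustering above a similarity threshold. Because every k-mer counts, all domains and linker regions contribute. The value of k, the threshold, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count the overlapping k-mers of a full-length protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine of the angle between two k-mer count vectors (0 if either is empty)."""
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def single_linkage_clusters(seqs: dict, threshold: float, k: int = 3) -> list:
    """Group sequences whose pairwise k-mer similarity reaches `threshold` (union-find)."""
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

For example, two hypothetical sequences that differ only in their final residue, `"MKTAYIAKQR"` and `"MKTAYIAKQL"`, share 7 of their 8 distinct 3-mers (cosine similarity 0.875) and fall into one cluster at a threshold of 0.5, while an unrelated poly-glycine sequence forms its own cluster.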
Affiliation(s)
- Prachi Mehrotra
- Indian Institute of Science Mathematics Initiative, Bangalore, 560012, India; Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Vimla Kany G Ami
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, 560100, India
36
Abstract
Most proteins comprise two or more domains from a limited suite of protein families. These domains are often rearranged in various combinations through gene fusion events to evolve new protein functions, including the acquisition of protein allostery through the incorporation of regulatory domains. The enzyme 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (DAH7PS) is the first enzyme of aromatic amino acid biosynthesis and displays a diverse range of allosteric mechanisms. DAH7PSs adopt a common architecture with a shared (β/α)8 catalytic domain, which can be attached to either an ACT-like or a chorismate mutase regulatory domain; these operate via distinct mechanisms. The two regulatory domains confer allosteric regulation by controlling DAH7PS function in response to Tyr or prephenate, respectively. Starting with contemporary DAH7PS proteins, two protein chimeras were created with interchanged regulatory domains. Both engineered proteins were catalytically active and displayed new functional allostery, with ligand specificity and allosteric mechanism switched to those conferred by their nonhomologous regulatory domains. This interchangeability of protein domains represents an efficient method not only to engineer allostery in multidomain proteins but also to create a new bifunctional enzyme.
37
Blacklock KM, Yang L, Mulligan VK, Khare SD. A computational method for the design of nested proteins by loop-directed domain insertion. Proteins 2018; 86:354-369. [PMID: 29250820 DOI: 10.1002/prot.25445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/24/2017] [Revised: 12/04/2017] [Accepted: 12/15/2017] [Indexed: 12/23/2022]
Abstract
The computational design of novel nested proteins, in which the primary structure of one protein domain (insert) is flanked by the primary structure segments of another (parent), would enable the generation of multifunctional proteins. Here we present a new algorithm, called Loop-Directed Domain Insertion (LooDo), implemented within the Rosetta software suite, for the purpose of designing nested protein domain combinations connected by flexible linker regions. Conformational space for the insert domain is sampled using large libraries of linker fragments for linker-to-parent domain superimposition followed by insert-to-linker superimposition. The relative positioning of the two domains (treated as rigid bodies) is sampled efficiently by a grid-based, mutual placement compatibility search. The conformations of the loop residues, and the identities of loop as well as interface residues, are simultaneously optimized using a generalized kinematic loop closure algorithm and Rosetta EnzymeDesign, respectively, to minimize interface energy. The algorithm was found to consistently sample near-native conformations and interface sequences for a benchmark set of structurally similar but functionally divergent domain-inserted enzymes from the α/β hydrolase superfamily, and discriminates well between native and nonnative conformations and sequences, although loop conformations tended to deviate from the native conformations. Furthermore, in cross-domain placement tests, native insert-parent domain combinations were ranked as the best-scoring structures compared to nonnative domain combinations. This algorithm should be broadly applicable to the design of multi-domain protein complexes with any combination of inserted or tandem domain connections.
Affiliation(s)
- Kristin M Blacklock
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Lu Yang
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Vikram K Mulligan
- Institute for Protein Design and Department of Biochemistry, University of Washington, Seattle, Washington
- Sagar D Khare
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey; Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, New Jersey
38
Sasnauskas G, Tamulaitienė G, Tamulaitis G, Čalyševa J, Laime M, Rimšelienė R, Lubys A, Siksnys V. UbaLAI is a monomeric Type IIE restriction enzyme. Nucleic Acids Res 2017; 45:9583-9594. [PMID: 28934493 PMCID: PMC5766183 DOI: 10.1093/nar/gkx634] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/08/2017] [Revised: 07/08/2017] [Accepted: 07/11/2017] [Indexed: 01/11/2023]
Abstract
Type II restriction endonucleases (REases) form a large and highly diverse group of enzymes. Even REases specific for a common recognition site often vary in their oligomeric structure, domain organization and DNA cleavage mechanisms. Here we report biochemical and structural characterization of the monomeric restriction endonuclease UbaLAI, specific for the pseudosymmetric DNA sequence 5'-CC/WGG-3' (where W = A/T, and '/' marks the cleavage position). We present a 1.6 Å co-crystal structure of UbaLAI N-terminal domain (UbaLAI-N) and show that it resembles the B3-family domain of EcoRII specific for the 5'-CCWGG-3' sequence. We also find that UbaLAI C-terminal domain (UbaLAI-C) is closely related to the monomeric REase MvaI, another enzyme specific for the 5'-CCWGG-3' sequence. Kinetic studies of UbaLAI revealed that it requires two recognition sites for optimal activity, and, like other type IIE enzymes, uses one copy of a recognition site to stimulate cleavage of a second copy. We propose that during the reaction UbaLAI-N acts as a handle that tethers the monomeric UbaLAI-C domain to the DNA, thereby helping UbaLAI-C to perform two sequential DNA nicking reactions on the second recognition site during a single DNA-binding event. A similar reaction mechanism may be characteristic of other monomeric two-domain REases.
Affiliation(s)
- Giedrius Sasnauskas
- Institute of Biotechnology, Vilnius University, Sauletekio av. 7, LT-10257 Vilnius, Lithuania
- Giedrė Tamulaitienė
- Institute of Biotechnology, Vilnius University, Sauletekio av. 7, LT-10257 Vilnius, Lithuania
- Gintautas Tamulaitis
- Institute of Biotechnology, Vilnius University, Sauletekio av. 7, LT-10257 Vilnius, Lithuania
- Jelena Čalyševa
- Institute of Biotechnology, Vilnius University, Sauletekio av. 7, LT-10257 Vilnius, Lithuania
- Miglė Laime
- Thermo Fisher Scientific Baltics, V. A. Graiciuno str. 8, LT-02241, Vilnius, Lithuania
- Renata Rimšelienė
- Thermo Fisher Scientific Baltics, V. A. Graiciuno str. 8, LT-02241, Vilnius, Lithuania
- Arvydas Lubys
- Thermo Fisher Scientific Baltics, V. A. Graiciuno str. 8, LT-02241, Vilnius, Lithuania
- Virginijus Siksnys
- Institute of Biotechnology, Vilnius University, Sauletekio av. 7, LT-10257 Vilnius, Lithuania
| |
39
Adaptive evolution by spontaneous domain fusion and protein relocalization. Nat Ecol Evol 2017; 1:1562-1568. [PMID: 29185504 DOI: 10.1038/s41559-017-0283-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/14/2016] [Accepted: 07/18/2017] [Indexed: 11/08/2022]
Abstract
Knowledge of adaptive processes includes understanding how new genes emerge. Computational analyses of genomes suggest that new genes can arise by domain swapping; however, empirical evidence has been lacking. Here we describe a set of nine independent deletion mutations that arose during selection experiments with the bacterium Pseudomonas fluorescens, in which the membrane-spanning domain of a fatty acid desaturase became translationally fused to a cytosolic di-guanylate cyclase, generating an adaptive 'wrinkly spreader' phenotype. Detailed genetic analysis of one gene fusion shows that the mutant phenotype is caused by relocalization of the di-guanylate cyclase domain to the cell membrane. The relative ease with which this new gene arose, along with its functional and regulatory effects, provides a glimpse of mutational events and their consequences that are likely to have a role in the evolution of new genes.
40
Lechno-Yossef S, Melnicki MR, Bao H, Montgomery BL, Kerfeld CA. Synthetic OCP heterodimers are photoactive and recapitulate the fusion of two primitive carotenoproteins in the evolution of cyanobacterial photoprotection. Plant J 2017; 91:646-656. [PMID: 28503830 DOI: 10.1111/tpj.13593] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/04/2017] [Revised: 04/25/2017] [Accepted: 05/03/2017] [Indexed: 06/07/2023]
Abstract
The orange carotenoid protein (OCP) governs photoprotection in the majority of cyanobacteria. It is structurally and functionally modular, composed of a C-terminal regulatory domain (CTD), an N-terminal effector domain (NTD) and a ketocarotenoid; the chromophore spans the two domains in the ground state and translocates fully into the NTD upon illumination. Using both the canonical OCP1 from Fremyella diplosiphon and the presumably more primitive OCP2 paralog from the same organism, we show that an NTD-CTD heterodimer forms when the domains are expressed as separate polypeptides. The carotenoid is required for the heterodimeric association, assembling an orange complex that is stable in the dark. Both OCP1 and OCP2 heterodimers are photoactive, undergoing light-driven heterodimer dissociation, but differ in their ability to reassociate in darkness, setting the stage for bioengineering photoprotection in cyanobacteria as well as for developing new photoswitches for biotechnology. Additionally, we reveal that homodimeric CTD can bind carotenoid in the absence of NTD, and name this truncated variant the C-terminal domain-like carotenoid protein (CCP). This finding supports the hypothesis that the OCP evolved from an ancient fusion event between genes for two different carotenoid-binding proteins ancestral to the NTD and CTD. We suggest that the CCP and its homologs constitute a new family of carotenoproteins within the NTF2-like superfamily found across all kingdoms of life.
Affiliation(s)
- Sigal Lechno-Yossef
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA
- Matthew R Melnicki
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Han Bao
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA
- Beronda L Montgomery
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, 48824, USA
- Cheryl A Kerfeld
  - MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
41
Esch L, Schaffrath U. An Update on Jacalin-Like Lectins and Their Role in Plant Defense. Int J Mol Sci 2017; 18:ijms18071592. [PMID: 28737678 PMCID: PMC5536079 DOI: 10.3390/ijms18071592] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 06/30/2017] [Revised: 07/17/2017] [Accepted: 07/20/2017] [Indexed: 12/11/2022]
Abstract
Plant lectins are proteins that reversibly bind carbohydrates and are assumed to play an important role in plant development and resistance. Through the binding of carbohydrate ligands, lectins are involved in the perception of environmental signals and their translation into phenotypic responses. These processes require downstream signaling cascades, often mediated by interacting proteins. Fusing the respective genes of two interacting proteins can be a way to increase the efficiency of this process. Recently, proteins containing jacalin-related lectin (JRL) domains have become a subject of research on plant resistance responses. A meta-data analysis of fusion proteins containing JRL domains across different kingdoms revealed diverse partner domains ranging from kinases to toxins. Among them, proteins containing a JRL domain and a dirigent domain occur exclusively within monocotyledonous plants and show an unexpectedly large expansion of family members compared with other JRL-fusion proteins. Rice, wheat, and barley plants overexpressing OsJAC1, a member of this family, are resistant to important fungal pathogens. We discuss the possibility that JRL domains also function as a decoy in fusion proteins and help to alert plants to the presence of attacking pathogens.
Affiliation(s)
- Lara Esch
  - Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany.
- Ulrich Schaffrath
  - Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany.
42
Yang F, Sun S, Tan G, Costanzo M, Hill DE, Vidal M, Andrews BJ, Boone C, Roth FP. Identifying pathogenicity of human variants via paralog-based yeast complementation. PLoS Genet 2017; 13:e1006779. [PMID: 28542158 PMCID: PMC5466341 DOI: 10.1371/journal.pgen.1006779] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/25/2016] [Revised: 06/09/2017] [Accepted: 04/25/2017] [Indexed: 11/21/2022]
Abstract
To better understand the health implications of personal genomes, we now face a largely unmet challenge: identifying functional variants within disease-associated genes. Functional variants can be identified by trans-species complementation, e.g., by failure to rescue a yeast strain bearing a mutation in an orthologous human gene. Although orthologous complementation assays are powerful predictors of pathogenic variation, they are available for only a few percent of human disease genes. Here we systematically examine whether complementation assays based on paralogy relationships can expand the number of human disease genes with functional variant detection assays. We tested over 1,000 paralogous human-yeast gene pairs for complementation, yielding 34 complementation relationships, of which 33 (97%) were novel. We found that paralog-based assays identified disease variants with success on par with that of orthology-based assays. Combining all homology-based assay results, we found that complementation can often identify pathogenic variants outside the homologous sequence region, presumably because of global effects on protein folding or stability. Within our search space, paralogy-based complementation more than doubled the number of human disease genes with a yeast-based complementation assay for disease variation. Functional complementation assays of human disease-associated gene variants can reveal many more human disease variants at high confidence than current computational approaches, even using highly diverged model organisms. However, this has generally only been possible for a minority of human disease genes for which orthologous complementation is known in the relevant model organism, so alternative assays are urgently needed. 
Here we show that complementation relationships can be found for many additional human disease genes by exploiting paralogous human-yeast gene relationships, and that disease variant identification using paralogy-based assays performs on par with orthology-based assays.
Affiliation(s)
- Fan Yang
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Song Sun
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Guihong Tan
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Michael Costanzo
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- David E. Hill
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Marc Vidal
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Brenda J. Andrews
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Charles Boone
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Frederick P. Roth
  - Donnelly Centre, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Canadian Institute for Advanced Research, Toronto, Ontario, Canada
43
Van Holle S, De Schutter K, Eggermont L, Tsaneva M, Dang L, Van Damme EJM. Comparative Study of Lectin Domains in Model Species: New Insights into Evolutionary Dynamics. Int J Mol Sci 2017; 18:ijms18061136. [PMID: 28587095 PMCID: PMC5485960 DOI: 10.3390/ijms18061136] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/06/2017] [Revised: 05/20/2017] [Accepted: 05/22/2017] [Indexed: 01/07/2023]
Abstract
Lectins are present throughout the plant kingdom and are reported to be involved in diverse biological processes. In this study, we provide a comparative analysis of the lectin families from model species in a phylogenetic framework. The analysis focuses on the different plant lectin domains identified in five representative core angiosperm genomes (Arabidopsis thaliana, Glycine max, Cucumis sativus, Oryza sativa ssp. japonica and Oryza sativa ssp. indica). The genomes were screened for genes encoding lectin domains using a combination of Basic Local Alignment Search Tool (BLAST), hidden Markov models, and InterProScan analysis. Additionally, phylogenetic relationships were investigated by constructing maximum likelihood phylogenetic trees. The results demonstrate that the majority of the lectin families are present in each of the species under study. Domain organization analysis showed that most identified proteins are multi-domain proteins, owing to the modular rearrangement of protein domains during evolution. Most of these multi-domain proteins are widespread, while others display a lineage-specific distribution. Furthermore, the phylogenetic analyses reveal that some lectin families evolved in a pattern that follows the phylogeny of the plant species, while others share a closer evolutionary history based on the corresponding protein domain architecture. Our results yield insights into the evolutionary relationships and functional divergence of plant lectins.
Affiliation(s)
- Sofie Van Holle
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Kristof De Schutter
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
  - Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Lore Eggermont
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Mariya Tsaneva
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Liuyi Dang
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
- Els J M Van Damme
  - Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
44
Burkhart BJ, Schwalen CJ, Mann G, Naismith JH, Mitchell DA. YcaO-Dependent Posttranslational Amide Activation: Biosynthesis, Structure, and Function. Chem Rev 2017; 117:5389-5456. [PMID: 28256131 DOI: 10.1021/acs.chemrev.6b00623] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Indexed: 01/18/2023]
Abstract
With advances in sequencing technology, uncharacterized proteins and domains of unknown function (DUFs) are rapidly accumulating in sequence databases and offer an opportunity to discover new protein chemistry and reaction mechanisms. The YcaO superfamily (DUF181), the focus of this review and formerly enigmatic, has been found to catalyze a unique phosphorylation of a ribosomal peptide backbone amide upon attack by different nucleophiles. The established nucleophiles are the side chains of Cys, Ser, and Thr, which give rise to azolines/azoles in ribosomally synthesized and posttranslationally modified peptide (RiPP) natural products. However, much remains unknown about the potential for YcaO proteins to collaborate with other nucleophiles. Recent work suggests that they may also form thioamides, macroamidines, and possibly additional post-translational modifications. This review covers all knowledge through mid-2016 regarding the biosynthetic gene clusters (BGCs), natural products, functions, mechanisms, and applications of YcaO proteins and outlines likely future research directions for this protein superfamily.
Affiliation(s)
- Greg Mann
  - Biomedical Science Research Complex, University of St Andrews, BSRC North Haugh, St Andrews KY16 9ST, United Kingdom
- James H Naismith
  - Biomedical Science Research Complex, University of St Andrews, BSRC North Haugh, St Andrews KY16 9ST, United Kingdom
  - State Key Laboratory of Biotherapy, Sichuan University, Sichuan, China
45
Van Holle S, Rougé P, Van Damme EJM. Evolution and structural diversification of Nictaba-like lectin genes in food crops with a focus on soybean (Glycine max). Ann Bot 2017; 119:901-914. [PMID: 28087663 PMCID: PMC5379587 DOI: 10.1093/aob/mcw259] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/02/2016] [Revised: 10/24/2016] [Accepted: 11/17/2016] [Indexed: 05/10/2023]
Abstract
Background and Aims The Nictaba family groups all proteins that show homology to Nictaba, the tobacco lectin. So far, Nictaba and an Arabidopsis thaliana homologue have been shown to be implicated in the plant stress response. The availability of more than 50 sequenced plant genomes provided the opportunity for a genome-wide identification of Nictaba-like genes in 15 species, representing members of the Fabaceae, Poaceae, Solanaceae, Musaceae, Arecaceae, Malvaceae and Rubiaceae. Additionally, phylogenetic relationships between the different species were explored. Furthermore, this study included domain organization analysis, a search for orthologous genes in the legume family and transcript profiling of the Nictaba-like lectin genes in soybean. Methods Using a combination of BLASTp, InterPro analysis and hidden Markov models, the genomes of Medicago truncatula, Cicer arietinum, Lotus japonicus, Glycine max, Cajanus cajan, Phaseolus vulgaris, Theobroma cacao, Solanum lycopersicum, Solanum tuberosum, Coffea canephora, Oryza sativa, Zea mays, Sorghum bicolor, Musa acuminata and Elaeis guineensis were searched for Nictaba-like genes. Phylogenetic analysis was performed using RAxML, and additional protein domains in the Nictaba-like sequences were identified using InterPro. Expression of the soybean Nictaba-like genes was analysed using microarray data. Key Results Nictaba-like genes were identified in all studied species, and analysis of the duplication events demonstrated that both tandem and segmental duplication contributed to the expansion of the Nictaba gene family in angiosperms. The single-domain Nictaba protein and the multi-domain F-box Nictaba architectures are ubiquitous among all analysed species, and microarray analysis revealed differential expression patterns for all soybean Nictaba-like genes. 
Conclusions Taken together, the comparative genomics data contribute to our understanding of the Nictaba-like gene family in species for which the occurrence of Nictaba domains had not yet been investigated. Given the ubiquitous nature of these genes, they have probably acquired new functions over time and are expected to take on various roles in plant development and defence.
Affiliation(s)
- Sofie Van Holle
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Pierre Rougé
  - UMR 152 PHARMA-DEV, Université de Toulouse, IRD, UPS, Chemin des Maraîchers 35, 31400 Toulouse, France
- Els J. M. Van Damme
  - Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
46
Moar WJ, Evans AJ, Kessenich CR, Baum JA, Bowen DJ, Edrington TC, Haas JA, Kouadio JLK, Roberts JK, Silvanovich A, Yin Y, Weiner BE, Glenn KC, Odegaard ML. The sequence, structural, and functional diversity within a protein family and implications for specificity and safety: The case for ETX_MTX2 insecticidal proteins. J Invertebr Pathol 2017; 142:50-59. [DOI: 10.1016/j.jip.2016.05.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/25/2016] [Revised: 05/20/2016] [Accepted: 05/24/2016] [Indexed: 11/26/2022]
47
Cao H, Yang X, Jin L, Han W, Zhang Y. Module recombination and functional integration of oligosaccharide-producing multifunctional amylase. J Mol Catal B Enzym 2016. [DOI: 10.1016/j.molcatb.2016.08.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
48
List JM, Pathmanathan JS, Lopez P, Bapteste E. Unity and disunity in evolutionary sciences: process-based analogies open common research avenues for biology and linguistics. Biol Direct 2016; 11:39. [PMID: 27544206 PMCID: PMC4992195 DOI: 10.1186/s13062-016-0145-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/22/2016] [Accepted: 08/06/2016] [Indexed: 11/13/2022]
Abstract
Background For a long time biologists and linguists have been noticing surprising similarities between the evolution of life forms and languages. Most of the proposed analogies have been rejected. Some, however, have persisted, and some even turned out to be fruitful, inspiring the transfer of methods and models between biology and linguistics up to today. Most proposed analogies were based on a comparison of the research objects rather than the processes that shaped their evolution. Focusing on process-based analogies, however, has the advantage of minimizing the risk of overstating similarities, while at the same time reflecting the common strategy to use processes to explain the evolution of complexity in both fields. Results We compared important evolutionary processes in biology and linguistics and identified processes specific to only one of the two disciplines as well as processes which seem to be analogous, potentially reflecting core evolutionary processes. These new process-based analogies support novel methodological transfer, expanding the application range of biological methods to the field of historical linguistics. We illustrate this by showing (i) how methods dealing with incomplete lineage sorting offer an introgression-free framework to analyze highly mosaic word distributions across languages; (ii) how sequence similarity networks can be used to identify composite and borrowed words across different languages; (iii) how research on partial homology can inspire new methods and models in both fields; and (iv) how constructive neutral evolution provides an original framework for analyzing convergent evolution in languages resulting from common descent (Sapir’s drift). Conclusions Apart from new analogies between evolutionary processes, we also identified processes which are specific to either biology or linguistics. This shows that general evolution cannot be studied from within one discipline alone. 
In order to get a full picture of evolution, biologists and linguists need to complement their studies, trying to identify cross-disciplinary and discipline-specific evolutionary processes. The fact that we found many process-based analogies favoring transfer from biology to linguistics further shows that certain biological methods and models have a broader scope than previously recognized. This opens fruitful paths for collaboration between the two disciplines. Reviewers This article was reviewed by W. Ford Doolittle and Eugene V. Koonin.
Affiliation(s)
- Johann-Mattis List
  - CRLAO/EHESS, 2 rue de Lille, Paris, 75007, France
  - Equipe AIRE, UMR 7138, Laboratoire Evolution Paris-Seine, Université Pierre et Marie Curie, 7 quai St Bernard, Paris, 75005, France
- Jananan Sylvestre Pathmanathan
  - Equipe AIRE, UMR 7138, Laboratoire Evolution Paris-Seine, Université Pierre et Marie Curie, 7 quai St Bernard, Paris, 75005, France
- Philippe Lopez
  - Equipe AIRE, UMR 7138, Laboratoire Evolution Paris-Seine, Université Pierre et Marie Curie, 7 quai St Bernard, Paris, 75005, France
- Eric Bapteste
  - Equipe AIRE, UMR 7138, Laboratoire Evolution Paris-Seine, Université Pierre et Marie Curie, 7 quai St Bernard, Paris, 75005, France
49
Doğan T, MacDougall A, Saidi R, Poggioli D, Bateman A, O'Donovan C, Martin MJ. UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB. Bioinformatics 2016; 32:2264-71. [PMID: 27153729 PMCID: PMC4965628 DOI: 10.1093/bioinformatics/btw114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/29/2015] [Revised: 01/22/2016] [Accepted: 02/25/2016] [Indexed: 11/17/2022]
Abstract
MOTIVATION Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. RESULTS We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) that compares their domain architectures, classifies proteins based on these similarities and propagates functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotations of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the value added by this approach. AVAILABILITY AND IMPLEMENTATION The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/ CONTACT: tdogan@ebi.ac.uk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tunca Doğan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Alistair MacDougall
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Rabie Saidi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Diego Poggioli
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Claire O'Donovan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
50
Jacobs TM, Williams B, Williams T, Xu X, Eletsky A, Federizon JF, Szyperski T, Kuhlman B. Design of structurally distinct proteins using strategies inspired by evolution. Science 2016; 352:687-90. [PMID: 27151863 DOI: 10.1126/science.aad8036] [Citation(s) in RCA: 105] [Impact Index Per Article: 13.1] [Received: 11/05/2015] [Accepted: 03/14/2016] [Indexed: 12/25/2022]
Abstract
Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. We describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. This method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.
Affiliation(s)
- T M Jacobs
  - Program in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- B Williams
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- T Williams
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- X Xu
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, NY 14260, USA
  - Northeast Structural Genomics Consortium
- A Eletsky
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, NY 14260, USA
  - Northeast Structural Genomics Consortium
- J F Federizon
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, NY 14260, USA
- T Szyperski
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, NY 14260, USA
- B Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA