1
Zheng Z, Goncearenco A, Berezovsky IN. Back in time to the Gly-rich prototype of the phosphate binding elementary function. Curr Res Struct Biol 2024; 7:100142. [PMID: 38655428 PMCID: PMC11035071 DOI: 10.1016/j.crstbi.2024.100142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] Open
Abstract
Binding of nucleotides and their derivatives is one of the most ancient elementary functions dating back to the Origin of Life. We review here the works considering one of the key elements in binding of (di)nucleotide-containing ligands: phosphate binding. We start from a brief discussion of major participants, conditions, and events in prebiotic evolution that resulted in the Origin of Life. Tracing back to the basic functions, including metal and phosphate binding, and, potentially, formation of primitive protein-protein interactions, we focus here on the phosphate binding. Critically assessing works on the structural, functional, and evolutionary aspects of phosphate binding, we perform a simple computational experiment reconstructing its most ancient and generic sequence prototype. The profiles of the phosphate binding signatures have been derived in the form of position-specific scoring matrices (PSSMs), their peculiarities depending on the type of the ligands have been analyzed, and evolutionary connections between them have been delineated. Then, the apparent prototype that gave rise to all relevant phosphate-binding signatures was also reconstructed. We show that two major signatures of the phosphate binding that discriminate between the binding of dinucleotide- and nucleotide-containing ligands are GxGxxG and GxxGxG, respectively. It appears that the signature archetypal for dinucleotide-containing ligands is more generic, and it can frequently bind phosphate groups in nucleotide-containing ligands as well. The reconstructed prototype's key signature GxGGxG underlies the role of glycine residues in providing flexibility and interactions necessary for binding the phosphate groups. The prototype also contains other ancient amino acids, valine and alanine, showing versatility towards evolutionary design and functional diversification.
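As a rough illustration of the three signatures named in the abstract, the sketch below scans a protein sequence for GxGxxG, GxxGxG, and GxGGxG, where x stands for any residue. It is a plain pattern match, not the PSSM-based profiling the authors describe; the function name `find_signatures` and the toy sequences are assumptions made for this example.

```python
import re

# Signature patterns from the abstract; "x" (any residue) becomes "." in the regex.
# A lookahead is used so that overlapping occurrences are all reported.
SIGNATURES = {
    "GxGxxG": re.compile(r"(?=(G.G..G))"),
    "GxxGxG": re.compile(r"(?=(G..G.G))"),
    "GxGGxG": re.compile(r"(?=(G.GG.G))"),
}


def find_signatures(seq):
    """Return {signature: [(start_index, matched_segment), ...]} for a sequence.

    Hypothetical helper for illustration only: it performs exact pattern
    matching, whereas the reviewed work derives position-specific scoring
    matrices.
    """
    seq = seq.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in SIGNATURES.items()
    }


if __name__ == "__main__":
    # Toy sequences, not taken from any real protein.
    for toy in ("AAGAGAAGAA", "MKGAGGAGLV"):
        print(toy, find_signatures(toy))
```

Any segment that matches GxGGxG has glycines at positions 1, 3, 4, and 6, so it necessarily matches both GxGxxG and GxxGxG as well, which is one simple way to see why such a pattern could serve as a common prototype.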
Affiliation(s)
- Zejun Zheng
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
2
Sennett MA, Theobald DL. Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross-Validation. J Mol Evol 2024; 92:181-206. [PMID: 38502220 PMCID: PMC10978691 DOI: 10.1007/s00239-024-10162-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 02/20/2024] [Indexed: 03/21/2024]
Abstract
Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate mechanisms of molecular evolution. Despite its increasingly widespread application, the accuracy of ASR is currently unknown, as it is generally impossible to compare resurrected proteins to the true ancestors. Which evolutionary models are best for ASR? How accurate are the resulting inferences? Here we answer these questions using a cross-validation method to reconstruct each extant sequence in an alignment with ASR methodology, a method we term "extant sequence reconstruction" (ESR). We thus can evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding known true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because, surprisingly, a more accurate phylogenetic model often results in reconstructions with lower probability. While better (more predictive) models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. In addition, we find that a large fraction of sequences sampled from the reconstruction distribution may have fewer errors than the single most probable (SMP) sequence reconstruction, despite the fact that the SMP has the lowest expected error of all possible sequences. Our results emphasize the importance of model selection for ASR and the usefulness of sampling sequence reconstructions for analyzing ancestral protein properties. 
ESR is a powerful method for validating the evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences. Most significantly, ESR uses ASR methodology to provide a general method by which the biophysical properties of resurrected proteins can be compared to the properties of the true protein.
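The quantities discussed in this abstract can be made concrete with a small sketch. It assumes that a reconstruction is given as one posterior distribution over residues per alignment site, that the single most probable (SMP) sequence takes the top residue at every site, and that the "average probability" is the mean of those top posteriors. These definitions, the function names, and the toy numbers are assumptions for illustration, not the authors' code or data.

```python
import random


def smp_sequence(site_posteriors):
    """Single most probable sequence: the highest-posterior residue at each site."""
    return "".join(max(site, key=site.get) for site in site_posteriors)


def average_probability(site_posteriors):
    """Mean posterior of the SMP residues, i.e. the expected fraction correct
    if the posteriors are well calibrated (an assumption of this sketch)."""
    return sum(max(site.values()) for site in site_posteriors) / len(site_posteriors)


def fraction_correct(reconstructed, true_seq):
    """Observed fraction of sites where the reconstruction equals the known sequence."""
    if len(reconstructed) != len(true_seq):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(reconstructed, true_seq)) / len(true_seq)


def sample_sequence(site_posteriors, rng):
    """Draw one sequence from the reconstruction distribution, site by site."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in site_posteriors
    )


if __name__ == "__main__":
    # Toy posteriors for a three-site alignment column set (made-up numbers).
    posteriors = [
        {"A": 0.9, "G": 0.1},
        {"L": 0.6, "I": 0.4},
        {"K": 0.7, "R": 0.3},
    ]
    true_extant = "AIK"  # the held-out extant sequence in this toy case
    smp = smp_sequence(posteriors)
    print("SMP:", smp)
    print("average probability:", average_probability(posteriors))
    print("fraction correct:", fraction_correct(smp, true_extant))
    print("one sampled sequence:", sample_sequence(posteriors, random.Random(0)))
```

In an ESR-style check, the held-out extant sequence plays the role of `true_extant`, so the predicted and observed fractions can be compared directly.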
Affiliation(s)
- Michael A Sennett
  - Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA
- Douglas L Theobald
  - Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA
3
Grazzini A, Cavanaugh AM. Fungal microtubule organizing centers are evolutionarily unstable structures. Fungal Genet Biol 2024; 172:103885. [PMID: 38485050 DOI: 10.1016/j.fgb.2024.103885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/07/2024] [Accepted: 03/11/2024] [Indexed: 03/24/2024]
Abstract
For most eukaryotic species the requirements of cilia formation dictate the structure of microtubule organizing centers (MTOCs). In this study we find that loss of cilia corresponds to loss of evolutionary stability for fungal MTOCs. We used iterative search algorithms to identify proteins homologous to those found in Saccharomyces cerevisiae and Schizosaccharomyces pombe MTOCs, and calculated site-specific rates of change for those proteins that were broadly phylogenetically distributed. Our results indicate that both the protein composition of MTOCs and the sequences of MTOC proteins are poorly conserved throughout the fungal kingdom. To begin to reconcile this rapid evolutionary change with the rigid structure and essential function of the S. cerevisiae MTOC, we further analyzed how structural interfaces among proteins influence the rates of change for specific residues within a protein. We find that a more stable protein may stabilize portions of an interacting partner where the two proteins are in contact. In summary, while the protein composition and sequences of the MTOC may be rapidly changing, the proteins within the structure have a stabilizing effect on one another. Further exploration of fungal MTOCs will expand our understanding of how changes in the functional needs of a cell have affected physical structures, proteomes, and protein sequences throughout fungal evolution.
Affiliation(s)
- Adam Grazzini
  - Department of Biology, Creighton University, Omaha, Nebraska, USA
- Ann M Cavanaugh
  - Department of Biology, Creighton University, Omaha, Nebraska, USA
4
Amangeldina A, Tan ZW, Berezovsky IN. Living in trinity of extremes: Genomic and proteomic signatures of halophilic, thermophilic, and pH adaptation. Curr Res Struct Biol 2024; 7:100129. [PMID: 38327713 PMCID: PMC10847869 DOI: 10.1016/j.crstbi.2024.100129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024] Open
Abstract
Since nucleic acids and proteins of unicellular prokaryotes are directly exposed to extreme environmental conditions, it is possible to explore the genomic-proteomic compositional determinants of molecular mechanisms of adaptation developed by them in response to harsh environmental conditions. Using a wealth of currently available complete genomes/proteomes we were able to explore signatures of adaptation to three environmental factors, pH, salinity, and temperature, observing major trends in compositions of their nucleic acids and proteins. We derived predictors of thermostability, halophilic, and pH adaptations and complemented them by the principal components analysis. We observed a clear difference between thermophilic and salinity/pH adaptations, whereas the latter invoke seemingly overlapping mechanisms. The genome-proteome compositional trade-off reveals an intricate balance between the work of base pairing and base stacking in stabilization of coding DNA and r/tRNAs, and, at the same time, universal requirements for the stability and foldability of proteins regardless of the nucleotide biases. Nevertheless, we still found hidden fingerprints of ancient evolutionary connections between the nucleotide and amino acid compositions indicating their emergence, mutual evolution, and adjustment. The evolutionary perspective on the adaptation mechanisms is further studied here by means of the comparative analysis of genomic/proteomic traits of archaeal and bacterial species. The overall picture of genomic/proteomic signals of adaptation obtained here provides a foundation for future engineering and design of functional biomolecules resistant to harsh environments.
Affiliation(s)
- Aidana Amangeldina
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
- Zhen Wah Tan
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
5
Shapira M, Dobysh A, Liaudanskaya A, Aucharova H, Dzichenka Y, Bokuts V, Jovanović-Šanta S, Yantsevich A. New insights into the substrate specificity of cholesterol oxidases for more aware application. Biochimie 2023; 220:1-10. [PMID: 38104713 DOI: 10.1016/j.biochi.2023.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/20/2023] [Accepted: 12/15/2023] [Indexed: 12/19/2023]
Abstract
Cholesterol oxidases (ChOxes) are enzymes that catalyze the oxidation of cholesterol to cholest-4-en-3-one. These enzymes find wide applications across various diagnostic and industrial settings. In addition, as a pathogenic factor of several bacteria, they have significant clinical implications. The current classification system for ChOxes is based on the type of bond connecting FAD to the apoenzyme, which does not adequately illustrate the enzymatic and structural characteristics of these proteins. In this study, we have adopted an integrative approach, combining evolutionary analysis, classic enzymatic techniques and computational approaches, to elucidate the distinct features of four different ChOxes from Rhodococcus sp. (RCO), Chromobacterium sp. (CCO), Pseudomonas aeruginosa (PCO) and Burkholderia cepacia (BCO). Comparative and evolutionary analysis of the substrate-binding domain (SBD) and FAD-binding domain (FBD) helped to reveal the origin of ChOxes. We discovered that all forms of ChOxes had a common ancestor and that the structural differences evolved later during divergence. Further examination of amino acid variations revealed that the SBD is more variable than the FBD, independently of the FAD coupling mechanism. The differences revealed at specific amino acid positions turned out to be critical in determining both the properties common to ChOxes and those that account for individual differences in substrate specificity. Examining these distinct features with the help of chemical descriptors was sufficient to attempt an alternative, application-oriented classification system. While univocal characteristics necessary to establish such a system remain elusive, we were able to demonstrate the substrate and protein features that explain the differences in substrate profile.
Affiliation(s)
- Michail Shapira
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Belarus
- Alexandra Dobysh
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Belarus
- Hanna Aucharova
  - Technical University of Dortmund, Faculty of Chemistry and Chemical Biology, Dortmund, Germany
- Yaraslau Dzichenka
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Belarus
- Volha Bokuts
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Belarus
- Suzana Jovanović-Šanta
  - University of Novi Sad Faculty of Sciences, Department of Chemistry, Biochemistry and Environmental Protection, Novi Sad, Serbia
- Aliaksey Yantsevich
  - Institute of Bioorganic Chemistry, National Academy of Sciences of Belarus, Minsk, Belarus
6
Sun J, Chikunova A, Boyle AL, Voskamp P, Timmer M, Ubbink M. Enhanced activity against a third-generation cephalosporin by destabilization of the active site of a class A beta-lactamase. Int J Biol Macromol 2023; 250:126160. [PMID: 37549761 DOI: 10.1016/j.ijbiomac.2023.126160] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/11/2023] [Revised: 07/10/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
The β-lactamase BlaC conveys resistance to a broad spectrum of β-lactam antibiotics to its host Mycobacterium tuberculosis but poorly hydrolyzes third-generation cephalosporins, such as ceftazidime. Variants of other β-lactamases have been reported to gain activity against ceftazidime at the cost of the native activity. To understand this trade-off, laboratory evolution was performed, screening for enhanced ceftazidime activity. The variant BlaC Pro167Ser shows faster breakdown of ceftazidime, poor hydrolysis of ampicillin and only moderately reduced activity against nitrocefin. NMR spectroscopy, crystallography and kinetic assays demonstrate that the resting state of BlaC P167S exists in an open and a closed state. The open state is more active in the hydrolysis of ceftazidime. In this state the catalytic residue Glu166, generally believed to be involved in the activation of the water molecule required for deacylation, is rotated away from the active site, suggesting it plays no role in the hydrolysis of ceftazidime. In the closed state, deacylation of the BlaC-ceftazidime adduct is slow, while hydrolysis of nitrocefin, which requires the presence of Glu166 in the active site, is barely affected, providing a structural explanation for the trade-off in activities.
Affiliation(s)
- Jing Sun
  - Macromolecular Biochemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
- Aleksandra Chikunova
  - Macromolecular Biochemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
- Aimee L Boyle
  - Macromolecular Biochemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
- Patrick Voskamp
  - Biophysical Structural Chemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
- Monika Timmer
  - Macromolecular Biochemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
- Marcellus Ubbink
  - Macromolecular Biochemistry, Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333 CC Leiden, the Netherlands
7
Szatkownik A, Zea DJ, Richard H, Laine E. Building alternative splicing and evolution-aware sequence-structure maps for protein repeats. J Struct Biol 2023; 215:107997. [PMID: 37453591 DOI: 10.1016/j.jsb.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/15/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]
Abstract
Alternative splicing of repeats in proteins provides a mechanism for rewiring and fine-tuning protein interaction networks. In this work, we developed a robust and versatile method, ASPRING, to identify alternatively spliced protein repeats from gene annotations. ASPRING leverages evolutionary meaningful alternative splicing-aware hierarchical graphs to provide maps between protein repeats sequences and 3D structures. We re-think the definition of repeats by explicitly accounting for transcript diversity across several genes/species. Using a stringent sequence-based similarity criterion, we detected over 5,000 evolutionary conserved repeats by screening virtually all human protein-coding genes and their orthologs across a dozen species. Through a joint analysis of their sequences and structures, we extracted specificity-determining sequence signatures and assessed their implication in experimentally resolved and modelled protein interactions. Our findings demonstrate the widespread alternative usage of protein repeats in modulating protein interactions and open avenues for targeting repeat-mediated interactions.
Affiliation(s)
- Antoine Szatkownik
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Diego Javier Zea
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Hugues Richard
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
8
Martinez-Goikoetxea M, Lupas AN. New protein families with hendecad coiled coils in the proteome of life. J Struct Biol 2023; 215:108007. [PMID: 37524272 DOI: 10.1016/j.jsb.2023.108007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/30/2023] [Accepted: 07/28/2023] [Indexed: 08/02/2023]
Abstract
Coiled coils are a widespread and well understood protein fold. Their short and simple repeats underpin considerable structural and functional diversity. The vast majority of coiled coils consist of 7-residue (heptad) sequence repeats, but in essence most combinations of 3- and 4-residue segments, each starting with a residue of the hydrophobic core, are compatible with coiled-coil structure. The most frequent among these other repeat patterns are 11-residue (hendecad, 3 + 4 + 4) repeats. Hendecads are frequently found in low copy number, interspersed between heptads, but some proteins consist largely or entirely of hendecad repeats. Here we describe the first large-scale survey of these proteins in the proteome of life. For this, we scanned the protein sequence database for sequences with 11-residue periodicity that lacked β-strand prediction. We then clustered these by pairwise similarity to construct a map of potential hendecad coiled-coil families. Here we discuss these according to their structural properties, their potential cellular roles, and the evolutionary mechanisms shaping their diversity. We note in particular the continuous amplification of hendecads, both within existing proteins and de novo from previously non-coding sequence, as a powerful mechanism in the genesis of new coiled-coil forms.
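The notion of 11-residue periodicity can be illustrated with a toy score: for a given lag, count how often a residue and the residue `lag` positions later fall in the same class (hydrophobic or not). The abstract does not describe the authors' actual periodicity detection, so the scoring scheme, the hydrophobic residue set, the function name, and the artificial repeat below are all assumptions made for this sketch.

```python
# Residues treated as hydrophobic core candidates in this sketch (an assumption).
HYDROPHOBIC = set("AILMFVWY")


def lag_match_fraction(seq, lag):
    """Fraction of positions i where seq[i] and seq[i + lag] are both
    hydrophobic or both non-hydrophobic. Returns 0.0 if the sequence is
    not longer than the lag."""
    seq = seq.upper()
    n_pairs = len(seq) - lag
    if n_pairs <= 0:
        return 0.0
    same = sum(
        (seq[i] in HYDROPHOBIC) == (seq[i + lag] in HYDROPHOBIC)
        for i in range(n_pairs)
    )
    return same / n_pairs


if __name__ == "__main__":
    # Artificial hendecad (3 + 4 + 4): each segment starts with a hydrophobic residue.
    unit = "LEE" + "LKKE" + "LKKE"
    toy = unit * 4
    for lag in (7, 11):
        print(f"lag {lag}: {lag_match_fraction(toy, lag):.3f}")
```

On this artificial repeat the lag-11 score is perfect while the lag-7 score is not; real sequences would give noisier values and would need a proper statistical baseline.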
Affiliation(s)
- Andrei N Lupas
  - Department of Protein Evolution, Max Planck Institute for Biology, 72076 Tübingen, Germany
9
Brazão JM, Foster PG, Cox CJ. Data-specific substitution models improve protein-based phylogenetics. PeerJ 2023; 11:e15716. [PMID: 37576497 PMCID: PMC10416777 DOI: 10.7717/peerj.15716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 06/16/2023] [Indexed: 08/15/2023] Open
Abstract
Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models. 
To show the benefits of using data-specific protein models several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses.
Affiliation(s)
- João M. Brazão
  - Centro de Ciências do Mar, Universidade do Algarve, Faro, Algarve, Portugal
- Peter G. Foster
  - Department of Life Sciences, Natural History Museum, London, United Kingdom
- Cymon J. Cox
  - Centro de Ciências do Mar, Universidade do Algarve, Faro, Algarve, Portugal
10
Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene 2023; 865:147336. [PMID: 36871672 DOI: 10.1016/j.gene.2023.147336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 02/22/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution previously selected according to diverse statistical criteria. Interestingly, some recent studies proposed that this procedure is unnecessary for phylogenetic tree reconstruction leading to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. Considering this aspect, here we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction by the analyses of real and simulated data. We found that phylogenetic tree reconstructions based on a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those derived from substitution models with amino acid replacement matrices far from the selected best-fitting model, especially when the data has large genetic diversity. Indeed, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, suggesting the use of substitution models as similar as possible to a selected best-fitting model when the latter cannot be used. Therefore, we recommend the use of the traditional protocol of selection among substitution models of evolution for protein phylogenetic tree reconstruction.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
11
Zhang L, Li G, Zhang Y, Cheng Y, Roberts N, Glenn SE, DeZwaan-McCabe D, Rube HT, Manthey J, Coleman G, Vakulskas CA, Qi Y. Boosting genome editing efficiency in human cells and plants with novel LbCas12a variants. Genome Biol 2023; 24:102. [PMID: 37122009 PMCID: PMC10150537 DOI: 10.1186/s13059-023-02929-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/22/2022] [Accepted: 04/07/2023] [Indexed: 05/02/2023] Open
Abstract
BACKGROUND: Cas12a (formerly known as Cpf1), the class II type V CRISPR nuclease, has been widely used for genome editing in mammalian cells and plants due to its distinct characteristics from Cas9. Despite being one of the most robust Cas12a nucleases, LbCas12a in general is less efficient than SpCas9 for genome editing in human cells, animals, and plants.
RESULTS: To improve the editing efficiency of LbCas12a, we conduct saturation mutagenesis in E. coli and identify 1977 positive point mutations of LbCas12a. We selectively assess the editing efficiency of 56 LbCas12a variants in human cells, identifying an optimal LbCas12a variant (RVQ: G146R/R182V/E795Q) with the most robust editing activity. We further test LbCas12a-RV, LbCas12a-RRV, and LbCas12a-RVQ in plants and find LbCas12a-RV has robust editing activity in rice and tomato protoplasts. Interestingly, LbCas12a-RRV, resulting from the stacking of RV and D156R, displays improved editing efficiency in stably transformed rice and poplar plants, leading to up to 100% editing efficiency in T0 plants of both plant species. Moreover, this high-efficiency editing occurs even at the non-canonical TTV PAM sites.
CONCLUSIONS: Our results demonstrate that LbCas12a-RVQ is a powerful tool for genome editing in human cells while LbCas12a-RRV confers robust genome editing in plants. Our study reveals the tremendous potential of these LbCas12a variants for advancing precision genome editing applications across a wide range of organisms.
Affiliation(s)
- Liyang Zhang
  - Integrated DNA Technologies, Coralville, IA, 52241, USA
  - Current Address: Aera Therapeutics, 50 Northern Ave, Boston, MA, 02210, USA
- Gen Li
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, 20742, USA
- Yingxiao Zhang
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, 20742, USA
  - Current Address: Syngenta, 9 Davis Dr, Research Triangle, NC, 27709, USA
- Yanhao Cheng
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, 20742, USA
- Steve E Glenn
  - Integrated DNA Technologies, Coralville, IA, 52241, USA
- H Tomas Rube
  - Department of Applied Mathematics, University of California-Merced, Merced, CA, 95343, USA
- Jeff Manthey
  - Integrated DNA Technologies, Coralville, IA, 52241, USA
- Gary Coleman
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, 20742, USA
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, 20850, USA
- Yiping Qi
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, 20742, USA
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, 20850, USA
12
Toyoda Y, Miyata H, Uchida N, Morimoto K, Shigesawa R, Kassai H, Nakao K, Tomioka NH, Matsuo H, Ichida K, Hosoyamada M, Aiba A, Suzuki H, Takada T. Vitamin C transporter SVCT1 serves a physiological role as a urate importer: functional analyses and in vivo investigations. Pflugers Arch 2023; 475:489-504. [PMID: 36749388 DOI: 10.1007/s00424-023-02792-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 01/20/2023] [Accepted: 01/24/2023] [Indexed: 02/08/2023]
Abstract
Uric acid, the end product of purine metabolism in humans, is crucial because of its anti-oxidant activity and a causal relationship with hyperuricemia and gout. Several physiologically important urate transporters regulate this water-soluble metabolite in the human body; however, the existence of latent transporters has been suggested in the literature. We focused on the Escherichia coli urate transporter YgfU, a nucleobase-ascorbate transporter (NAT) family member, to address this issue. Only SLC23A proteins are members of the NAT family in humans. Based on the amino acid sequence similarity to YgfU, we hypothesized that SLC23A1, also known as sodium-dependent vitamin C transporter 1 (SVCT1), might be a urate transporter. First, we identified human SVCT1 and mouse Svct1 as sodium-dependent low-affinity/high-capacity urate transporters using mammalian cell-based transport assays. Next, using the CRISPR-Cas9 system followed by the crossing of mice, we generated Svct1 knockout mice lacking both urate transporter 1 and uricase. In the hyperuricemic mice model, serum urate levels were lower than controls, suggesting that Svct1 disruption could reduce serum urate. Given that Svct1 physiologically functions as a renal vitamin C re-absorber, it could also be involved in urate re-uptake from urine, though additional studies are required to obtain deeper insights into the underlying mechanisms. Our findings regarding the dual-substrate specificity of SVCT1 expand the understanding of urate handling systems and functional evolutionary changes in NAT family proteins.
13
Rajapaksa S, Konagurthu AS, Lesk AM. Sequence and structure alignments in post-AlphaFold era. Curr Opin Struct Biol 2023; 79:102539. [PMID: 36753924 DOI: 10.1016/j.sbi.2023.102539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 01/02/2023] [Indexed: 02/09/2023]
Abstract
Sequence alignment is fundamental for analyzing protein structure and function. For all but closely-related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally-determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and its likes) in producing high-quality structure predictions, we suggest that when aligning homologous proteins, lacking experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental) to support this hypothesis.
Affiliation(s)
- Sandun Rajapaksa
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
- Arun S Konagurthu
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, 3800, Victoria, Australia
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, 16802, Pennsylvania, USA
14
Marszalek J, Craig EA, Tomiczek B. J-Domain Proteins Orchestrate the Multifunctionality of Hsp70s in Mitochondria: Insights from Mechanistic and Evolutionary Analyses. Subcell Biochem 2023; 101:293-318. [PMID: 36520311 DOI: 10.1007/978-3-031-14740-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Mitochondrial J-domain protein (JDP) co-chaperones orchestrate the function of their Hsp70 chaperone partner(s) in critical organellar processes that are essential for cell function. These include folding, refolding, and import of mitochondrial proteins, maintenance of mitochondrial DNA, and biogenesis of iron-sulfur cluster(s) (FeS), prosthetic groups needed for function of mitochondrial and cytosolic proteins. Consistent with the organelle's endosymbiotic origin, mitochondrial Hsp70 and the JDPs' functioning in protein folding and FeS biogenesis clearly descended from bacteria, while the origin of the JDP involved in protein import is less evident. Regardless of their origin, all mitochondrial JDP/Hsp70 systems evolved unique features that allowed them to perform mitochondria-specific functions. Their modes of functional diversification and specialization illustrate the versatility of JDP/Hsp70 systems and inform our understanding of system functioning in other cellular compartments.
Affiliation(s)
- Jaroslaw Marszalek
  - Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Elizabeth A Craig
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- Bartlomiej Tomiczek
  - Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
15
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Luis Daniel González-Vázquez
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Laura Rodríguez-Moure
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Ugo Bastolla
  - Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
16
Li J, Yang Y, Li J, Li P, Qi H. Cell-Free Display Techniques for Protein Evolution. Adv Biochem Eng Biotechnol 2023; 185:59-90. [PMID: 37306697 DOI: 10.1007/10_2023_227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Cell-free protein synthesis (CFPS) with flexibility and controllability can provide a powerful platform for high-throughput screening of biomolecules, especially in the evolution of peptides or proteins. In this chapter, the emerging strategies for enhancing the protein expression level using different source strains, energy systems, and template designs in constructing CFPS systems are summarized and discussed in detail. In addition, we provide an overview of the ribosome display, mRNA display, cDNA display, and CIS display in vitro display technologies, which can couple genotype and phenotype by forming fusion complexes. Moreover, we point out the trend that improving the protein yields of CFPS itself can offer more favorable conditions for maintaining library diversity and display efficiency. It is hoped that the novel CFPS system can accelerate the development of protein evolution in biotechnological and medical applications.
Affiliation(s)
- Jiaojiao Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Youhui Yang
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Jinjin Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Peixian Li
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Hao Qi
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
  - Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin, China
17
Costa AD, Franco-Duarte R, Machado R, Gomes AC. Uncovering the Promiscuous Activity of IL-6 Proteins: A Multi-Dimensional Analysis of Phylogeny, Classification and Residue Conservation. Protein Sci 2022; 31:e4469. [PMID: 36222303 DOI: 10.1002/pro.4469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/06/2022]
Abstract
The IL-6 family of cytokines, known for their pleiotropic behavior, share binding to the gp130 receptor for signal transduction with the necessity to bind other receptors. Leukemia inhibitory factor receptor is triggered by the IL-6 family proteins: leukemia inhibitory factor (LIF), oncostatin-m (OSM), cardiotrophin-1 (CT-1), ciliary neurotrophic factor (CNTF) and cardiotrophin-like cytokine factor 1 (CLCF1). Besides the conserved binding sites to the receptor, not much is known in terms of diversity and characteristics of these proteins in different organisms. Herein, we describe the sequence analysis of LIF, OSM and CT-1 from several organisms, and m17, a LIF ortholog found in fishes, regarding its phylogenetics, intrinsic properties and the impact of conserved residues on structural features. Sequences were identified in 7 classes of vertebrates, showing high conservation values in binding site III, but protein-dependent results on binding site II. GRAVY, isoelectric point and molecular weight parameters were relevant to differentiate classes in each protein and to enable, for the first time and with high fidelity, the prediction of both organism class and protein type just using machine learning approaches. OSM sequences from primates showed an increased BC loop when compared to the remaining mammals, which could influence binding to OSM receptor and tune signaling pathways. Overall, this study highlights the potential of sequence diversity analysis to understand IL-6 cytokine family evolution, showing conservation of function-related motifs and evolution of class and protein-dependent characteristics. Our results could impact future medical treatment of disorders associated with imbalances in these cytokines.
Affiliation(s)
- André da Costa
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus of Gualtar, Braga, Portugal
  - IB-S Institute of Science and Innovation for Sustainability, University of Minho, Campus of Gualtar, Braga, Portugal
- Ricardo Franco-Duarte
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus of Gualtar, Braga, Portugal
  - IB-S Institute of Science and Innovation for Sustainability, University of Minho, Campus of Gualtar, Braga, Portugal
- Raul Machado
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus of Gualtar, Braga, Portugal
  - IB-S Institute of Science and Innovation for Sustainability, University of Minho, Campus of Gualtar, Braga, Portugal
- Andreia C Gomes
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus of Gualtar, Braga, Portugal
  - IB-S Institute of Science and Innovation for Sustainability, University of Minho, Campus of Gualtar, Braga, Portugal
18
Verma S, Sowdhamini R. A genome-wide search of Toll/Interleukin-1 receptor (TIR) domain-containing adapter molecule (TICAM) and their evolutionary divergence from other TIR domain containing proteins. Biol Direct 2022; 17:24. [PMID: 36056415 PMCID: PMC9440496 DOI: 10.1186/s13062-022-00335-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open
Abstract
Toll/Interleukin-1 receptor (TIR) domains are cytoplasmic domains that mediate receptor signalling. These domains are present in proteins such as Toll-like receptors (TLRs), their signaling adaptors, and interleukins, which form a major part of the immune system. The TIR domain-containing signaling adaptors bind to the TLRs and interact with their TIR domains for downstream signaling. We have examined the evolutionary divergence across the tree of life of two of these TIR domain-containing adaptor molecules (TICAM), i.e., TIR domain-containing adapter-inducing interferon-β (TRIF/TICAM1) and TIR domain-containing adaptor molecule 2 (TRAM/TICAM2), by using computational approaches. We studied their orthologs, domain architecture, conserved motifs, and amino acid variations. Our study also adds a timeframe to infer the duplication of the TICAM protein from Leptocardii and its later divergence into TICAM1/TRIF and TICAM2/TRAM. More evidence of TRIF proteins was seen, but the absence of conserved co-existing domains such as the TRIF-NTD, TIR, and RHIM domains in distant relatives hints at diversification and adaptation to different biological functions. TRAM was lost in Actinopteri and has a conserved TIR domain architecture across species except in Aves. An additional isoform of TRAM, TAG (TRAM adaptor with the GOLD domain), could be identified in species in the Mesozoic era. Finally, the hypothesis-based likelihood ratio test was applied to look for selection pressure amongst orthologues of TRIF and TRAM and to search for positively selected sites. These residues were mostly seen in the non-structural region of the proteins. Overall, this study unravels evolutionary information on the adaptors TRAM and TRIF and how they duplicated to perform diverse functions through changes in their domain architecture across lineages.
Affiliation(s)
- Shailya Verma
  - National Centre for Biological Sciences, GKVK Campus, Bellary Road, Bangalore, 560065, India
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences, GKVK Campus, Bellary Road, Bangalore, 560065, India
  - Institute of Bioinformatics and Applied Biotechnology, Bangalore, 560100, India
  - Molecular Biophysics Unit, Indian Institute of Science, CV Raman Road, Karnataka, 560012, Bangalore, India
19
Sanchez Granel ML, Siburu NG, Fricska A, Maldonado LL, Gargiulo LB, Nudel CB, Uttaro AD, Nusblat AD. A novel Tetrahymena thermophila sterol C-22 desaturase belongs to the Fatty Acid Hydroxylase/Desaturase superfamily. J Biol Chem 2022; 298:102397. [PMID: 35988640 PMCID: PMC9485055 DOI: 10.1016/j.jbc.2022.102397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 08/09/2022] [Accepted: 08/15/2022] [Indexed: 11/27/2022] Open
Abstract
Sterols in eukaryotic cells play important roles in modulating membrane fluidity and in cell signaling and trafficking. During evolution, a combination of gene losses and acquisitions gave rise to an extraordinary diversity of sterols in different organisms. The sterol C-22 desaturase identified in plants and fungi as a cytochrome P-450 monooxygenase evolved from the first eukaryotic cytochrome P450 and was lost in many lineages. Although the ciliate Tetrahymena thermophila desaturates sterols at the C-22 position, no cytochrome P-450 orthologs are present in the genome. Here, we aim to identify the genes responsible for the desaturation as well as their probable origin. We used gene knockout and yeast heterologous expression approaches to identify two putative genes, retrieved from a previous transcriptomic analysis, as sterol C-22 desaturases. Furthermore, we demonstrate using bioinformatics and evolutionary analyses that both genes encode a novel type of sterol C-22 desaturase that belongs to the large fatty acid hydroxylase/desaturase superfamily and the genes originated by genetic duplication prior to functional diversification. These results stress the widespread existence of nonhomologous isofunctional enzymes among different lineages of the tree of life as well as the suitability for the use of T. thermophila as a valuable model to investigate the evolutionary process of large enzyme families.
Affiliation(s)
- María L Sanchez Granel
  - Instituto de Nanobiotecnología (NANOBIOTEC), CONICET, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
- Nicolás G Siburu
  - Instituto de Biología Molecular y Celular de Rosario, CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda s/n, S2000FHQ, Rosario, Argentina
- Annamária Fricska
  - Instituto de Nanobiotecnología (NANOBIOTEC), CONICET, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
- Lucas L Maldonado
  - Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPaM), CONICET, Facultad de Medicina, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
- Laura B Gargiulo
  - Instituto de Nanobiotecnología (NANOBIOTEC), CONICET, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
- Clara B Nudel
  - Instituto de Nanobiotecnología (NANOBIOTEC), CONICET, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
- Antonio D Uttaro
  - Instituto de Biología Molecular y Celular de Rosario, CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda s/n, S2000FHQ, Rosario, Argentina
- Alejandro D Nusblat
  - Instituto de Nanobiotecnología (NANOBIOTEC), CONICET, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Junín 956, C1113AAD, Buenos Aires, Argentina
20
Fraga D, Ellington WR, Suzuki T. The characterization of novel monomeric creatine kinases in the early branching Alveolata species, Perkinsus marinus: Implications for phosphagen kinase evolution. Comp Biochem Physiol B Biochem Mol Biol 2022; 262:110758. [PMID: 35598705 DOI: 10.1016/j.cbpb.2022.110758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 05/14/2022] [Accepted: 05/16/2022] [Indexed: 11/20/2022]
Abstract
The genome of the unicellular molluscan parasite Perkinsus marinus contains at least five genes coding for putative creatine kinases (CK), a phosphoryl transfer enzyme which plays a key role in cellular energy transactions. Expression and kinetic analyses of three of the P. marinus CKs revealed them to be true CKs with catalytic properties in the range of typical metazoan CKs. A sequence comparison of the P. marinus CKs with a range of CK dimers and other dimeric phosphoryl transfer enzymes in this family (phosphagen kinases) showed that the P. marinus CKs lacked some of the critical residues involved in dimer stabilization, a trait all previously characterized CKs share. Size exclusion chromatography of all three expressed P. marinus CK constructs indicated they are monomeric, consistent with the observed lack of some critical dimer stabilizing residues. Phylogenetic analyses of the P. marinus CKs and putative dinoflagellate CKs with a broad range of monomeric and dimeric phosphagen kinases revealed that the Perkinsus CKs form a distinct, well-supported clade with dinoflagellate CKs which also lack the dimer stabilizing residues. Analysis of the genomic data for P. marinus showed the presence of putative genes for the two enzymes associated with creatine biosynthesis. CK in higher organisms plays a critical role in energy buffering in cell types displaying high and variable rates of ATP turnover. The presence of multiple CKs and the creatine biosynthetic pathway in P. marinus indicates that this unicellular parasite has the full complement of molecular machinery for CK-mediated energy buffering.
21
McDonald JMC, Reed RD. Patterns of selection across gene regulatory networks. Semin Cell Dev Biol 2022; 145:60-67. [PMID: 35474149 DOI: 10.1016/j.semcdb.2022.03.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/25/2021] [Revised: 01/31/2022] [Accepted: 03/23/2022] [Indexed: 12/29/2022]
Abstract
Gene regulatory networks (GRNs) are the core engine of organismal development. If we would like to understand the origin and diversification of phenotypes, it is necessary to consider the structure of GRNs in order to reconstruct the links between genetic mutations and phenotypic change. Much of the progress in evolutionary developmental biology, however, has occurred without a nuanced consideration of the evolution of functional relationships between genes, especially in the context of their broader network interactions. Characterizing and comparing GRNs across traits and species in a more detailed way will allow us to determine how network position influences what genes drive adaptive evolution. In this perspective paper, we consider the architecture of developmental GRNs and how positive selection strength may vary across a GRN. We then propose several testable models for these patterns of selection and experimental approaches to test these models.
Affiliation(s)
- Jeanne M C McDonald
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, United States
- Robert D Reed
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, United States
22
Ahmed S, Manjunath K, Chattopadhyay G, Varadarajan R. Identification of stabilizing point mutations through mutagenesis of destabilized protein libraries. J Biol Chem 2022; 298:101785. [PMID: 35247389 PMCID: PMC8971944 DOI: 10.1016/j.jbc.2022.101785] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/08/2021] [Revised: 02/18/2022] [Accepted: 02/26/2022] [Indexed: 01/22/2023] Open
Abstract
Although there have been recent transformative advances in the area of protein structure prediction, prediction of point mutations that improve protein stability remains challenging. It is possible to construct and screen large mutant libraries for improved activity or ligand binding. However, reliable screens for mutants that improve protein stability do not yet exist, especially for proteins that are well folded and relatively stable. Here, we demonstrate that incorporation of a single, specific, destabilizing mutation termed parent inactivating mutation into each member of a single-site saturation mutagenesis library, followed by screening for suppressors, allows for robust and accurate identification of stabilizing mutations. We carried out fluorescence-activated cell sorting of such a yeast surface display, saturation suppressor library of the bacterial toxin CcdB, followed by deep sequencing of sorted populations. We found that multiple stabilizing mutations could be identified after a single round of sorting. In addition, multiple libraries with different parent inactivating mutations could be pooled and simultaneously screened to further enhance the accuracy of identification of stabilizing mutations. Finally, we show that individual stabilizing mutations could be combined to result in a multi-mutant that demonstrated an increase in thermal melting temperature of about 20 °C, and that displayed enhanced tolerance to high temperature exposure. We conclude that as this method is robust and employs small library sizes, it can be readily extended to other display and screening formats to rapidly isolate stabilized protein mutants.
Affiliation(s)
- Shahbaz Ahmed
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Kavyashree Manjunath
  - Centre for Chemical Biology and Therapeutics, Institute of Stem Cell Science and Regenerative Medicine, Bangalore, India
23
Bahr G, González LJ, Vila AJ. Metallo-β-lactamases and a tug-of-war for the available zinc at the host-pathogen interface. Curr Opin Chem Biol 2022; 66:102103. [PMID: 34864439 PMCID: PMC8860843 DOI: 10.1016/j.cbpa.2021.102103] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/28/2021] [Revised: 10/16/2021] [Accepted: 10/28/2021] [Indexed: 02/03/2023]
Abstract
Metallo-β-lactamases (MBLs) are zinc-dependent hydrolases that inactivate virtually all β-lactam antibiotics. The expression of MBLs by Gram-negative bacteria severely limits the therapeutic options to treat infections. MBLs bind the essential metal ions in the bacterial periplasm, and their activity is challenged upon the zinc starvation conditions elicited by the native immune response. Metal depletion compromises both the enzyme activity and stability in the periplasm, impacting on the resistance profile in vivo. Thus, novel inhibitory approaches involve the use of chelating agents or metal-based drugs that displace the native metal ion. However, newer MBL variants incorporate mutations that improve their metal binding abilities or stabilize the metal-depleted form, revealing that metal starvation is a driving force acting on MBL evolution. Future challenges require addressing the gap between in cell and in vitro studies, dissecting the mechanism for MBL metalation and determining the metal content in situ.
Affiliation(s)
- Guillermo Bahr
  - Instituto de Biología Molecular y Celular de Rosario (IBR, CONICET-UNR), S2000EXF Rosario, Argentina
  - Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, S2002LRK Rosario, Argentina
- Lisandro J González
  - Instituto de Biología Molecular y Celular de Rosario (IBR, CONICET-UNR), S2000EXF Rosario, Argentina
  - Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, S2002LRK Rosario, Argentina
- Alejandro J Vila
  - Instituto de Biología Molecular y Celular de Rosario (IBR, CONICET-UNR), S2000EXF Rosario, Argentina
  - Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, S2002LRK Rosario, Argentina
24
Moussian B, Casadei N. Identification and Functional Characterization of Argonaute (Ago) Proteins in Insect Genomes. Methods Mol Biol 2022; 2360:9-17. [PMID: 34495503 DOI: 10.1007/978-1-0716-1633-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
RNA processing is a vital process in all organisms. In eukaryotes, the RNA-induced silencing complex (RISC) mediates this function during development and physiological processes and, at least in arthropods, during RNA-viral infections. Argonaute-like RNA-binding proteins are central components of this complex. RNA-based insecticides are playing an increasingly central role in pest control. Understanding the underlying molecular mechanisms, including those involving Ago-like proteins, is crucial in designing powerful, species-specific, and environmentally friendly insecticides. This chapter describes a protocol for the identification and genetic functional analysis of insect Ago-like proteins in the fruit fly Drosophila melanogaster, which serves as a living test tube.
Affiliation(s)
- Nicolas Casadei
  - Universitätsklinikum Tübingen, Institute for Medical Genetics and Applied Genomics, Tübingen, Germany
25
Huttener R, Thorrez L, Veld TI, Potter B, Baele G, Granvik M, Van Lommel L, Schuit F. Regional effect on the molecular clock rate of protein evolution in Eutherian and Metatherian genomes. BMC Ecol Evol 2021; 21:153. [PMID: 34348656 PMCID: PMC8336415 DOI: 10.1186/s12862-021-01882-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2019] [Accepted: 07/22/2021] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Different types of proteins diverge at vastly different rates. Moreover, the same type of protein has been observed to evolve with different rates in different phylogenetic lineages. In the present study we measured the rates of protein evolution in Eutheria (placental mammals) and Metatheria (marsupials) on a genome-wide basis and we propose that the gene position in the genome landscape has an important influence on the rate of protein divergence. RESULTS We analyzed a protein-encoding gene set (n = 15,727) common to 16 mammals (12 Eutheria and 4 Metatheria). Using sliding windows that averaged regional effects of protein divergence we constructed landscapes in which strong and lineage-specific regional effects were seen on the molecular clock rate of protein divergence. Within each lineage, the relatively high rates were preferentially found in subtelomeric chromosomal regions. Such regions were observed to contain important and well-studied loci for fetal growth, uterine function and the generation of diversity in the adaptive repertoire of immunoglobulins. CONCLUSIONS A genome landscape approach visualizes lineage-specific regional differences between Eutherian and Metatherian rates of protein evolution. This phenomenon of chromosomal position is a new element that explains at least part of the lineage-specific effects and differences between proteins on the molecular clock rates.
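The sliding-window averaging mentioned in the RESULTS above can be illustrated with a minimal Python sketch. It assumes genes are already sorted by chromosomal position and that each gene carries one divergence rate. The function name, the window size, and the example values are hypothetical and are not taken from the study, which does not state these details in its abstract.

```python
def sliding_window_means(values, window):
    """Return the mean of every run of `window` consecutive values.

    `values` is assumed to hold one divergence rate per gene, ordered
    by position along a chromosome (an assumption of this sketch).
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Keep a running sum so each step costs O(1) instead of O(window).
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


if __name__ == "__main__":
    # Hypothetical per-gene rates: high near both chromosome ends,
    # low in the middle.
    rates = [9, 8, 2, 1, 1, 2, 7, 9]
    for start, mean in enumerate(sliding_window_means(rates, 2)):
        print(f"genes {start}-{start + 1}: mean rate {mean}")
```

Plotting such window means against chromosomal position gives a landscape of the kind described, in which elevated values toward the ends of the list would correspond to faster-evolving subtelomeric regions.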
Affiliation(s)
- Raf Huttener
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Lieven Thorrez
- Tissue Engineering Laboratory, Department of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- Thomas In't Veld
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Barney Potter
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mikaela Granvik
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Leentje Van Lommel
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium
- Frans Schuit
- Gene Expression Unit, Dept. of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, O&N1, Bus 901, 3000, Leuven, Belgium.
|
26
|
Marcos ML, Echave J. The variation among sites of protein structure divergence is shaped by mutation and scaled by selection. Curr Res Struct Biol 2021; 2:156-163. [PMID: 34235475 PMCID: PMC8244499 DOI: 10.1016/j.crstbi.2020.08.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/17/2020] [Revised: 07/09/2020] [Accepted: 08/17/2020] [Indexed: 12/30/2022] Open
Abstract
Protein structures do not evolve uniformly, but the degree of structure divergence varies among sites. The resulting site-dependent structure divergence patterns emerge from a process that involves mutation and selection, which may both, in principle, influence the emergent pattern. In contrast with sequence divergence patterns, which are known to be mainly determined by selection, the relative contributions of mutation and selection to structure divergence patterns are unclear. Here, studying 6 protein families with a mechanistic biophysical model of protein evolution, we untangle the effects of mutation and selection. We found that even in the absence of selection, structure divergence varies from site to site because the mutational sensitivity is not uniform. Selection scales the profile, increasing its amplitude, without changing its shape. This scaling effect follows from the similarity between mutational sensitivity and sequence variability profiles. The degree of evolutionary divergence of protein structures varies among sites. A Mutation-Selection model (MSM) of protein structure evolution with selection for stability is developed. Even in the case of no selection, the sensitivity of the structure to random mutations varies among sites. Selection amplifies this variation but it does not affect its shape. This scaling effect of selection follows from the similarity between the selection-independent mutational sensitivity and the selection-dependent sequence divergence, the two contributions that are combined to produce the observed structural divergence profile.
Affiliation(s)
- María Laura Marcos
- Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
- Julian Echave
- Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
|
27
|
Zhang Z, Ryoo D, Balusek C, Acharya A, Rydmark MO, Linke D, Gumbart JC. Inward-facing glycine residues create sharp turns in β-barrel membrane proteins. Biochim Biophys Acta Biomembr 2021; 1863:183662. [PMID: 34097860 DOI: 10.1016/j.bbamem.2021.183662] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/16/2021] [Revised: 04/15/2021] [Accepted: 05/17/2021] [Indexed: 11/29/2022]
Abstract
The transmembrane regions of outer-membrane proteins (OMPs) of Gram-negative bacteria are almost exclusively β-barrels composed of between 8 and 26 β-strands. To explore the relationship between β-barrel size and shape, we modeled and simulated engineered variants of the Escherichia coli protein OmpX with 8, 10, 12, 14, and 16 β-strands. We found that while smaller barrels maintained a roughly circular shape, the 16-stranded variant developed a flattened cross section. This flat cross section impeded its ability to conduct ions, in agreement with previous experimental observations. Flattening was determined to arise from the presence of inward-facing glycines at sharp turns in the β-barrel. An analysis of all simulations revealed that glycines, on average, make significantly smaller angles with residues on neighboring strands than all other amino acids, including alanine, and create sharp turns in β-barrel cross sections. This observation was generalized to 119 unique structurally resolved OMPs. We also found that the fraction of glycines in β-barrels decreases as the strand number increases, suggesting an evolutionary role for the addition or removal of glycine in OMP sequences.
Affiliation(s)
- Zijian Zhang
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30313, United States of America
- David Ryoo
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA 30332, United States of America
- Curtis Balusek
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30313, United States of America
- Atanu Acharya
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30313, United States of America
- Dirk Linke
- Department of Biosciences, University of Oslo, Oslo, Norway
- James C Gumbart
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30313, United States of America.
|
28
|
Park D, Hahn Y. Rapid protein sequence evolution via compensatory frameshift is widespread in RNA virus genomes. BMC Bioinformatics 2021; 22:251. [PMID: 34000995 PMCID: PMC8127213 DOI: 10.1186/s12859-021-04182-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2020] [Accepted: 05/10/2021] [Indexed: 11/25/2022] Open
Abstract
Background RNA viruses possess remarkable evolutionary versatility driven by the high mutability of their genomes. Frameshifting nucleotide insertions or deletions (indels), which cause the premature termination of proteins, are frequently observed in the coding sequences of various viral genomes. When a secondary indel occurs near the primary indel site, the open reading frame can be restored to produce functional proteins, a phenomenon known as the compensatory frameshift. Results In this study, we systematically analyzed publicly available viral genome sequences and identified compensatory frameshift events in hundreds of viral protein-coding sequences. Compensatory frameshift events resulted in large-scale amino acid differences between the compensatory frameshift form and the wild type even though their nucleotide sequences were almost identical. Phylogenetic analyses revealed that the evolutionary distances between proteins with and without a compensatory frameshift were significantly overestimated because amino acid mismatches caused by compensatory frameshifts were counted as substitutions. Further, this could cause compensatory frameshift forms to branch in different locations in the protein and nucleotide trees, which may obscure the correct interpretation of phylogenetic relationships between variant viruses. Conclusions Our results imply that the compensatory frameshift is one of the mechanisms driving the rapid protein evolution of RNA viruses and potentially assisting their host-range expansion and adaptation. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04182-9.
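The compensatory frameshift described above can be made concrete with a small Python sketch. The toy sequence, the indel positions, and the helper names are hypothetical and chosen only for illustration; they are not taken from the study. The sketch uses the standard genetic code and shows how a one-base deletion creates a premature stop, and how a nearby one-base insertion restores the reading frame while leaving a run of changed residues between the two indels.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base up to and including the first stop (*)."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def count_mismatches(a, b):
    """Count positions that differ between two equal-length proteins."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical wild-type coding sequence (toy example).
WILD_TYPE = "ATGGCTGAAACCGGTCTGAAATAA"

# Primary indel: delete one base, which shifts the reading frame.
frameshifted = WILD_TYPE[:4] + WILD_TYPE[5:]

# Secondary indel: insert one base downstream, which restores the frame.
restored = frameshifted[:11] + "A" + frameshifted[11:]

if __name__ == "__main__":
    print("wild type    :", translate(WILD_TYPE))
    print("frameshifted :", translate(frameshifted))
    print("restored     :", translate(restored))
    print("mismatches   :",
          count_mismatches(translate(WILD_TYPE), translate(restored)))
```

In this toy case the restored protein has the same length as the wild type, but every residue between the two indels differs. A distance method that scores each of those residues as an independent substitution would overestimate divergence in the way the abstract describes.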
Affiliation(s)
- Dongbin Park
- Department of Life Science, Chung-Ang University, Seoul, 06794, South Korea
- Yoonsoo Hahn
- Department of Life Science, Chung-Ang University, Seoul, 06794, South Korea.
|
29
|
Bhattacharyya T, Sowdhamini R. Genome-wide survey of tyrosine phosphatases in thirty mammalian genomes. Cell Signal 2021; 84:110009. [PMID: 33848580 DOI: 10.1016/j.cellsig.2021.110009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2020] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 11/25/2022]
Abstract
The age of genomics has given us a wealth of information and the tools to study whole genomes. This, in turn, has facilitated genome-wide studies among organisms that were relatively less studied in the pre-genomic era or are non-model organisms. This paves the way to the discovery of interesting evolutionary patterns, which are brought to light by genome-wide surveys of protein superfamilies. Phosphorylation is a post-translational modification that is utilised across all clades of life, and acts as an important signalling switch, regulating several cellular processes. Tyrosine phosphatases, which are found predominantly in eukaryotes, act on phosphorylated tyrosine residues and sometimes on other substrates. Extending on our previous effort to look for tyrosine phosphatases in the human genome, we have looked for sequences of the cysteine-based tyrosine phosphatase superfamily in thirty mammalian genomes from all across Mammalia and validated the sequences with the presence of the signature catalytic motif. Domain architecture annotation, followed by in-depth analysis, revealed interesting taxon-specific patterns such as subtle differences between the protein families in marsupials and early mammals versus placental mammals. Finally, we discuss an interesting case of loss of the tyrosine phosphatase domain from a gene product in the course of eutherian evolution.
Affiliation(s)
- Teerna Bhattacharyya
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India.
|
30
|
Rix G, Liu CC. Systems for in vivo hypermutation: a quest for scale and depth in directed evolution. Curr Opin Chem Biol 2021; 64:20-6. [PMID: 33784581 DOI: 10.1016/j.cbpa.2021.02.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/02/2021] [Revised: 02/18/2021] [Accepted: 02/20/2021] [Indexed: 12/14/2022]
Abstract
Traditional approaches to the directed evolution of genes of interest (GOIs) place constraints on the scale of experimentation and depth of evolutionary search reasonably achieved. Engineered genetic systems that dramatically elevate the mutation of target GOIs in vivo relieve these constraints by enabling continuous evolution, affording new strategies in the exploration of sequence space and fitness landscapes for GOIs. We describe various in vivo hypermutation systems for continuous evolution, discuss how different architectures for in vivo hypermutation facilitate evolutionary search scale and depth in their application to problems in protein evolution and engineering, and outline future opportunities for the field.
|
31
|
Fogalli GB, Line SRP. Estimating the Influence of Physicochemical and Biochemical Property Indexes on Selection for Amino Acids Usage in Eukaryotic Cells. J Mol Evol 2021; 89:257-268. [PMID: 33760966 DOI: 10.1007/s00239-021-10003-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2020] [Accepted: 03/10/2021] [Indexed: 11/26/2022]
Abstract
Proteins can evolve by accumulating changes in their amino acid sequences. These changes are mainly caused by missense mutations in their DNA coding sequences. Mutations with neutral or positive effects on fitness can be maintained while deleterious mutations tend to be eliminated by natural selection. Amino acid changes are influenced by the biophysical, chemical, and biological properties of amino acids. There is a multiplicity of amino acid properties that can influence the function and expression of proteins. Amino acid properties can be expressed as numerical indexes, which can help to predict functional and structural aspects of proteins and allow statistical inferences of selection pressure on amino acid usage. The accuracy of these analyses may be compromised by the existence of several numerical indexes that measure the same amino acid property, and the lack of objective parameters to determine the most accurate and biologically relevant index. In the present study, the gradient consistency test was used to estimate the magnitude of directional selection imparted by amino acid biochemical and biophysical properties on protein evolution.
Affiliation(s)
- Giovani B Fogalli
- Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil
- Sergio R P Line
- Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil.
|
32
|
Schäfer GG, Grebe LJ, Depoix F, Lieb B. Hemocyanins of Muricidae: New 'Insights' Unravel an Additional Highly Hydrophilic 800 kDa Mass Within the Molecule. J Mol Evol 2021; 89:62-72. [PMID: 33439299 PMCID: PMC7884596 DOI: 10.1007/s00239-020-09986-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/17/2020] [Accepted: 12/17/2020] [Indexed: 02/03/2023]
Abstract
Hemocyanins are giant oxygen transport proteins that freely float within the hemolymph of most molluscs. The basic quaternary structure of molluscan hemocyanins is a cylindrical decamer with a diameter of 35 nm which is built of 400 kDa subunits. Previously published results, however, showed that one out of two hemocyanin subunits of Rapana venosa encompasses two polypeptides, one 300 kDa and one 100 kDa polypeptide, which aggregate to typical 4 MDa and 8 MDa hemocyanin (di-)decamer molecules. It was shown that the polypeptides are most probably bound by one or more cysteine disulfide bridges, but it remained open whether these polypeptides were coded by one or two genes. The results presented here clearly show that both polypeptides are coded by one gene only and that this phenomenon can also be found in the gastropod Nucella lapillus. Thus, it can be defined as clade-specific for Muricidae, a group of the very diverse Caenogastropoda. In addition, we discovered a further deviation of this hemocyanin subunit within both species, namely a region of 340 mainly hydrophilic amino acids (especially histidines and aspartic acids) which have not yet been identified in any other molluscan hemocyanin. Our results indicate that, within the quaternary structure, these additional amino acids most probably protrude within the inner part of didecamer cylinders, forming a large extra mass of up to 800 kDa. They presumably influence the structure of the protein and may affect the functionality. Thus, these findings reveal further insights into the evolution and structures of gastropod hemocyanins.
Affiliation(s)
- Gabriela Giannina Schäfer
- Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Johann-Joachim-Becher-Weg 7, 55128, Mainz, Germany.
- Lukas Jörg Grebe
- Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Johann-Joachim-Becher-Weg 7, 55128, Mainz, Germany
- Frank Depoix
- Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Johann-Joachim-Becher-Weg 7, 55128, Mainz, Germany
- Bernhard Lieb
- Institute of Molecular Physiology, Johannes Gutenberg-University of Mainz, Johann-Joachim-Becher-Weg 7, 55128, Mainz, Germany
|
33
|
Guerra Maldonado JF, Vincent AT, Chenal M, Veyrier FJ. CAPRIB: a user-friendly tool to study amino acid changes and selection for the exploration of intra-genus evolution. BMC Genomics 2020; 21:832. [PMID: 33243176 PMCID: PMC7690079 DOI: 10.1186/s12864-020-07232-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2020] [Accepted: 11/17/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The evolution of bacteria is shaped by different mechanisms such as mutation, gene deletion, duplication, or insertion of foreign DNA, among others. These genetic changes can accumulate in the descendants as a result of natural selection. Using phylogeny and genome comparisons, evolutionary paths can be partly retraced, with recent events being much easier to detect than older ones. For this reason, multiple tools are available to study the evolutionary events within genomes of single species, such as gene composition alterations, or subtler mutations such as SNPs. However, these tools are generally designed to compare similar genomes and require advanced skills in bioinformatics. We present CAPRIB, a unique tool developed in Java that determines the amino acid changes, at the genus level, that correlate with phenotypic differences between two groups of organisms. RESULTS CAPRIB has a user-friendly graphical interface and uses databases in SQL, making it easy to compare several genomes without the need for programming or thorough knowledge in bioinformatics. This intuitive software narrows down a list of amino acid changes that are concomitant with a given phenotypic divergence at the genus scale. Each permutation found by our software is associated with two already described statistical values that indicate its potential impact on the protein's function, helping the user decide which promising candidates to further investigate. We show that CAPRIB is able to detect already known mutations and uncovers many more, and that this tool can be used to question molecular phylogeny. Finally, we exemplify the utility of CAPRIB by pinpointing amino acid changes that coincided with the emergence of slow-growing mycobacteria from their fast-growing counterparts. The software is freely available at https://github.com/BactSymEvol/Caprib. CONCLUSIONS CAPRIB is a new bioinformatics software aiming to make genus-scale comparisons accessible to all. With its intuitive graphical interface, this tool identifies key amino acid changes concomitant with a phenotypic divergence. By comparing fast and slow-growing mycobacteria, we shed light on evolutionary hotspots, such as the cytokinin pathway, that are interesting candidates for further experimentation.
Affiliation(s)
- Juan F Guerra Maldonado
- Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Bacterial Symbionts Evolution, Laval, Québec, Canada
- Antony T Vincent
- Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Bacterial Symbionts Evolution, Laval, Québec, Canada
- Martin Chenal
- Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Bacterial Symbionts Evolution, Laval, Québec, Canada
- Frederic J Veyrier
- Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Bacterial Symbionts Evolution, Laval, Québec, Canada.
|
34
|
Segura E, Mehta A, Marsolais M, Quan XR, Zhao J, Sauvé R, Spafford JD, Parent L. An ancestral MAGUK protein supports the modulation of mammalian voltage-gated Ca2+ channels through a conserved CaVβ-like interface. Biochim Biophys Acta Biomembr 2020; 1862:183439. [PMID: 32814116 DOI: 10.1016/j.bbamem.2020.183439] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2020] [Revised: 07/11/2020] [Accepted: 08/03/2020] [Indexed: 01/09/2023]
Abstract
Eukaryotic voltage-gated Ca2+ channels of the CaV2 channel family are hetero-oligomers formed by the pore-forming CaVα1 protein assembled with auxiliary CaVα2δ and CaVβ subunits. CaVβ subunits are formed by a Src homology 3 (SH3) domain and a guanylate kinase (GK) domain connected through a HOOK domain. The GK domain binds a conserved cytoplasmic region of the pore-forming CaVα1 subunit referred to as the "AID". Herein we explored the phylogenetic and functional relationship between CaV channel subunits in distant eukaryotic organisms by investigating the function of a MAGUK protein (XM_004990081) cloned from the choanoflagellate Salpingoeca rosetta (Sro). This MAGUK protein (Sroβ) features SH3 and GK structural domains with a 25% primary sequence identity to mammalian CaVβ. Recombinant expression of its cDNA with mammalian high-voltage activated Ca2+ channel CaV2.3 in mammalian HEK cells produced robust voltage-gated inward Ca2+ currents with typical activation and inactivation properties. Like CaVβ, Sroβ prevents fast degradation of total CaV2.3 proteins in cycloheximide assays. The three-dimensional homology model predicts an interaction between the GK domain of Sroβ and the AID motif of the pore-forming CaVα1 protein. Substitution of AID residues Trp (W386A) and Tyr (Y383A) significantly impaired co-immunoprecipitation of CaV2.3 with Sroβ and functional upregulation of CaV2.3 currents. Likewise, a 6-residue deletion within the GK domain of Sroβ, similar to the locus found in mammalian CaVβ, significantly reduced peak current density. Altogether our data demonstrate that an ancestral MAGUK protein reconstitutes the biophysical and molecular features responsible for channel upregulation by mammalian CaVβ through a minimally conserved molecular interface.
Affiliation(s)
- Emilie Segura
- Département de Pharmacologie et Physiologie, Faculté de Médecine, Canada; Centre de Recherche de l'Institut de Cardiologie de Montréal, Université de Montréal, Montréal, Québec H1T 1C8, Canada
- Amrit Mehta
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Mireille Marsolais
- Centre de Recherche de l'Institut de Cardiologie de Montréal, Université de Montréal, Montréal, Québec H1T 1C8, Canada
- Xin R Quan
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Juan Zhao
- Centre de Recherche de l'Institut de Cardiologie de Montréal, Université de Montréal, Montréal, Québec H1T 1C8, Canada
- Rémy Sauvé
- Département de Pharmacologie et Physiologie, Faculté de Médecine, Canada
- J David Spafford
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Lucie Parent
- Département de Pharmacologie et Physiologie, Faculté de Médecine, Canada; Centre de Recherche de l'Institut de Cardiologie de Montréal, Université de Montréal, Montréal, Québec H1T 1C8, Canada.
|
35
|
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022] Open
Abstract
Background How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. Results At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. Conclusion Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium; Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
|
36
|
Dai Y, Pracana R, Holland PWH. Divergent genes in gerbils: prevalence, relation to GC-biased substitution, and phenotypic relevance. BMC Evol Biol 2020; 20:134. [PMID: 33076817 PMCID: PMC7574485 DOI: 10.1186/s12862-020-01696-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2020] [Accepted: 09/29/2020] [Indexed: 11/25/2022] Open
Abstract
Background Two gerbil species, sand rat (Psammomys obesus) and Mongolian jird (Meriones unguiculatus), can become obese and show signs of metabolic dysregulation when maintained on standard laboratory diets. The genetic basis of this phenotype is unknown. Recently, genome sequencing has uncovered very unusual regions of high guanine and cytosine (GC) content scattered across the sand rat genome, most likely generated by extreme and localized biased gene conversion. A key pancreatic transcription factor, PDX1, is encoded by a gene in the most extreme GC-rich region, is remarkably divergent and exhibits altered biochemical properties. Here, we ask if gerbils have proteins in addition to PDX1 that are aberrantly divergent in amino acid sequence, whether they have also become divergent due to GC-biased nucleotide changes, and whether these proteins could plausibly be connected to metabolic dysfunction exhibited by gerbils. Results We analyzed ~10,000 proteins with 1-to-1 orthologues in human and rodents and identified 50 proteins that accumulated unusually high levels of amino acid change in the sand rat and 41 in Mongolian jird. We show that more than half of the aberrantly divergent proteins are associated with GC-biased nucleotide change and many are in previously defined high GC regions. We highlight four aberrantly divergent gerbil proteins, PDX1, INSR, MEDAG and SPP1, that may plausibly be associated with dietary metabolism. Conclusions We show that through the course of gerbil evolution, many aberrantly divergent proteins have accumulated in the gerbil lineage, and GC-biased nucleotide substitution rather than positive selection is the likely cause of extreme divergence in more than half of these. Some proteins carry putatively deleterious changes that could be associated with metabolic and physiological phenotypes observed in some gerbil species. We propose that these animals provide a useful model to study the 'tug-of-war' between natural selection and the excessive accumulation of deleterious substitutions through biased gene conversion.
Affiliation(s)
- Yichen Dai
- Department of Zoology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Rodrigo Pracana
- Department of Zoology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Peter W H Holland
- Department of Zoology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK.
|
37
|
Cao A. The Last Secret of Protein Folding: The Real Relationship Between Long-Range Interactions and Local Structures. Protein J 2020; 39:422-33. [PMID: 33040262 DOI: 10.1007/s10930-020-09925-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 10/03/2020] [Indexed: 01/20/2023]
Abstract
The protein folding problem has been extensively studied for decades, and hundreds of thousands of protein structures have been solved. Yet, how proteins fold from a linear peptide chain to their unique 3D structures is not fully understood. With key clues having emerged unexpectedly from the field of nanoscience, a "Confined Lowest Energy Fragment" (CLEF) hypothesis was proposed. The CLEF hypothesis states that a protein chain can be divided into CLEFs, the semi-independent folding units, by a small number of key residues that form key long-range interactions. The native structure of a CLEF is the lowest energy state under the constraints of the key long-range interactions, but the native structure of the whole protein is not necessarily the lowest energy state, as Anfinsen's thermodynamic hypothesis suggested. The CLEF hypothesis proposes a unified CLEF mechanism for protein folding, basically a two-step process. In the first step, the favorable enthalpy of CLEFs for native structures quickly brings those residues for the key long-range interactions together, forming intermediates corresponding to the so-called hydrophobic collapse. In the second step, those collapsed key residues shuffle for the right combination to form the native key long-range interactions. The CLEF hypothesis provides a simple solution to all protein folding paradoxes, and proposes a "CLEF Age" or "Stone Age" for the prebiotic evolution of proteins.
38
Landreh M, Jörnvall H. Biological activity versus physiological function of proinsulin C-peptide. Cell Mol Life Sci 2021; 78:1131-8. [PMID: 32959070 DOI: 10.1007/s00018-020-03636-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2020] [Revised: 08/07/2020] [Accepted: 09/03/2020] [Indexed: 11/06/2022]
Abstract
Proinsulin C-peptide (C-peptide) has drawn much research attention. Even if the peptide has turned out not to be important in the treatment of diabetes, every phase of C-peptide research has changed our view on insulin and peptide hormone biology. The first phase revealed that peptide hormones can be subject to processing, and that their pro-forms may involve regulatory stages. The second phase revealed the possibility that one prohormone could harbor more than one activity, and that the additional activities should be taken into account in the development of hormone-based therapies. In the third phase, a combined view of the evolutionary patterns in hormone biology allowed an assessment of C-peptide's role in physiology, and of how biological activities and physiological functions are shaped by evolutionary processes. In addition to this distinction, C-peptide research has produced further advances. For example, C-peptide fragments are successfully administered in immunotherapy of type I diabetes, and plasma C-peptide levels remain a standard for measurement of beta cell activity in patients. Even if the concept of C-peptide as a hormone is presently not supported, some of its bioactivities continue to influence our understanding of evolutionary changes in other peptides as well.
39
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRP detection and classification.
Affiliation(s)
- Marco Necci
  - Dept. of Biomedical Sciences, University of Padova, Italy
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
40
Daughdrill GW. Disorder for Dummies: Functional Mutagenesis of Transient Helical Segments in Disordered Proteins. Methods Mol Biol 2020; 2141:3-20. [PMID: 32696350 DOI: 10.1007/978-1-0716-0524-0_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Most cytosolic eukaryotic proteins contain a mixture of ordered and disordered regions. Disordered regions facilitate cell signaling by concentrating sites for posttranslational modifications and protein-protein interactions into arrays of short linear motifs that can be reorganized by RNA splicing. The evolution of disordered regions looks different from their ordered counterparts. In some cases, selection is focused on maintaining protein binding interfaces and PTM sites, but sequence heterogeneity is common. In other cases, simple properties like charge, length, or end-to-end distance are maintained. Many disordered protein binding sites contain some transient secondary structure that may resemble the structure of the bound state. α-Helical secondary structure is common and a wide range of fractional helicity is observed in different disordered regions. Here we provide a simple protocol to identify transient helical segments and design mutants that can change their structure and function.
41
Tufféry P, de Vries S. The search of sequence variants using a constrained protein evolution simulation approach. Comput Struct Biotechnol J 2020; 18:1790-1799. [PMID: 32695271 PMCID: PMC7355721 DOI: 10.1016/j.csbj.2020.06.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Revised: 05/15/2020] [Accepted: 06/09/2020] [Indexed: 10/25/2022] Open
Abstract
Protein engineering or candidate therapeutic peptide optimization are processes in which the identification of relevant sequence variants is critical. Starting from one amino-acid sequence, the choice of the substitutions must meet the objective of not disrupting the structure of the protein, not impacting the main functional properties of the starting entity, while also meeting the condition to enhance some expected property such as thermal stability, resistance to degradation, … Here, we introduce a new approach of sequence evolution that focuses on the objective of not disrupting the structure of the initial protein by embedding a point to point control on the preservation of the local structure at each position in the sequence. For 6 mini-proteins, we find that, starting from a single sequence, our simple approach intrinsically contains information about site-specific rate heterogeneity of substitution, and that it is able to reproduce sequence diversity as can be observed in the sequences available in the Uniref repository. We show that our approach is able to provide information about positions not to substitute and about substitutions not to perform at a given position to maintain structure integrity. Overall, our results demonstrate that point to point preservation of the local structure along a sequence is an important determinant of sequence evolution.
Affiliation(s)
- Pierre Tufféry
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, RPBS, F-75013 Paris, France
- Sjoerd de Vries
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, RPBS, F-75013 Paris, France
42
Hönigschmid P, Breimann S, Weigl M, Frishman D. AllesTM: predicting multiple structural features of transmembrane proteins. BMC Bioinformatics 2020; 21:242. [PMID: 32532211 PMCID: PMC7291640 DOI: 10.1186/s12859-020-03581-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2019] [Accepted: 06/03/2020] [Indexed: 12/04/2022] Open
Abstract
Background This study is motivated by the following three considerations: a) the physico-chemical properties of transmembrane (TM) proteins are distinctly different from those of globular proteins, necessitating the development of specialized structure prediction techniques, b) for many structural features no specialized predictors for TM proteins are available at all, and c) deep learning algorithms make it possible to automate the feature engineering process and thus facilitate the development of multi-target methods for predicting several protein properties at once. Results We present AllesTM, an integrated tool to predict almost all structural features of transmembrane proteins that can be extracted from atomic coordinate data. It blends several machine learning algorithms: random forests and gradient boosting machines, convolutional neural networks in their original form as well as those enhanced by dilated convolutions and residual connections, and, finally, long short-term memory architectures. AllesTM outperforms other available methods in predicting residue depth in the membrane, flexibility, topology, and relative solvent accessibility in its bound state, while in torsion angles, secondary structure and monomer relative solvent accessibility prediction it lags only slightly behind the currently leading technique SPOT-1D. High accuracy on a multitude of prediction targets and easy installation make AllesTM a one-stop shop for many typical problems in the structural bioinformatics of transmembrane proteins. Conclusions In addition to presenting a highly accurate prediction method and eliminating the need to install and maintain many different software tools, we also provide a comprehensive overview of the impact of different machine learning algorithms and parameter choices on the prediction performance. AllesTM is freely available at https://github.com/phngs/allestm.
Affiliation(s)
- Peter Hönigschmid
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Maximus-von-Imhof-Forum 3, 85354, Freising, Germany
- Stephan Breimann
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Maximus-von-Imhof-Forum 3, 85354, Freising, Germany
- Martina Weigl
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Maximus-von-Imhof-Forum 3, 85354, Freising, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Maximus-von-Imhof-Forum 3, 85354, Freising, Germany
43
Abstract
Proteins are commonly used as molecular targets against pathogens such as viruses and bacteria. However, pathogens can evolve rapidly, permitting their populations to increase in protein diversity over time and thus escape the activity of a molecular therapy. Consequently, in order to design more durable and robust therapies as well as to understand viral evolution in a host and subsequent transmission, it is central to understand the evolution of pathogen proteins. This understanding can enable the detection of protein regions that can be potential targets for therapies and predict the emergence of molecular resistance against therapies. In this direction, two articles published recently in the Journal of Molecular Evolution investigated the evolution of proteomes of diverse flaviviruses, including Zika virus, Dengue virus and West Nile virus. Here I discuss the importance of considering the evolution of viral proteins, using models and methods that mimic protein evolution as realistically as possible, to improve the design of antiviral therapies.
44
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. RESULTS Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
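The idea of comparing self-similarity dot plots with a Jaccard metric can be illustrated with a minimal sketch. This is not the authors' implementation: the exact-match window of length 3, the function names `dot_plot` and `jaccard`, and the convention of comparing cells by raw coordinates (which presumes sequences of comparable length) are all assumptions made for illustration.

```python
def dot_plot(seq, window=3):
    """Return the set of (i, j) cells with i < j where the window starting
    at i is identical to the window starting at j.

    Assumption: exact matching of short windows stands in for whatever
    scoring scheme a real dot plot tool would use.
    """
    n = len(seq) - window + 1
    cells = set()
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i:i + window] == seq[j:j + window]:
                cells.add((i, j))
    return cells


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of dot plot cells.

    Convention chosen here: two empty plots score 0.0, since they share
    no repeat pattern.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences (hypothetical, not real proteins): a perfect three-fold
# repeat and a variant whose last repeat unit has been replaced.
intact = dot_plot("ABCABCABC")
decayed = dot_plot("ABCABCXYZ")
score = jaccard(intact, decayed)
```

In this toy case the intact repeat yields five off-diagonal cells and the decayed variant keeps only one of them, so the score drops sharply even though two of three repeat units are unchanged, which is in the spirit of the rapid decay of dot plot patterns described in the abstract.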
Affiliation(s)
- Matthew Merski
  - Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Krzysztof Młynarczyk
  - Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Jan Ludwiczak
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
- Jakub Skrzeczkowski
  - Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Stanisław Dunin-Horkawicz
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Maria W. Górna
  - Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
45
Abstract
Knowledge of the distribution of fitness effects (DFE) of mutations is critical to the understanding of protein evolution. Here, we describe methods for large-scale, systematic measurements of the DFE using growth competition and deep mutational scanning. We discuss techniques for producing comprehensive libraries of gene variants as well as provide necessary considerations for designing these experiments. Using these methods, we have constructed libraries containing over 18,000 variants, measured fitness effects of these mutations by deep mutational scanning, and verified the presence of fitness effects in individual variants. Our methods provide a high-throughput protocol for measuring biological fitness effects of mutations and the dependence of fitness effects on the environment.
46
Hehenberger E, Eitel M, Fortunato SAV, Miller DJ, Keeling PJ, Cahill MA. Early eukaryotic origins and metazoan elaboration of MAPR family proteins. Mol Phylogenet Evol 2020; 148:106814. [PMID: 32278076 DOI: 10.1016/j.ympev.2020.106814] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/04/2019] [Revised: 03/24/2020] [Accepted: 04/01/2020] [Indexed: 01/01/2023]
Abstract
The membrane-associated progesterone receptor (MAPR) family consists of heme-binding proteins containing a cytochrome b5 (cytb5) domain characterized by the presence of a MAPR-specific interhelical insert region (MIHIR) between helices 3 and 4 of the canonical cytb5-domain fold. Animals possess three MAPR genes (PGRMC-like, Neuferricin and Neudesin). Here we show that all three animal MAPR genes were already present in the common ancestor of the opisthokonts (comprising animals and fungi as well as related single-celled taxa). All three MAPR genes acquired extensions C-terminal to the cytb5 domain, either before or with the evolution of animals. The archetypical MAPR protein, progesterone receptor membrane component 1 (PGRMC1), contains phosphorylated tyrosines Y139 and Y180. The combination of Y139/Y180 appeared in the common ancestor of cnidarians and bilaterians, along with an early embryological organizer and synapsed neurons, and is strongly conserved in all bilaterian animals. A predicted protein interaction motif in the PGRMC1 MIHIR is potentially regulated by Y139 phosphorylation. A multilayered model of animal MAPR function acquisition includes some pre-metazoan functions (e.g., heme binding and cytochrome P450 interactions) and some acquired animal-specific functions that involve regulation of strongly conserved protein interaction motifs acquired by animals (Metazoa). This study provides a conceptual framework for future studies, against which especially PGRMC1's multiple functions can perhaps be stratified and functionally dissected.
Affiliation(s)
- Elisabeth Hehenberger
  - Department of Botany, University of British Columbia, 3529-6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
- Michael Eitel
  - Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Sofia A V Fortunato
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD 4811, Australia
- David J Miller
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD 4811, Australia
- Patrick J Keeling
  - Department of Botany, University of British Columbia, 3529-6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
- Michael A Cahill
  - School of Biomedical Sciences, Charles Sturt University, Wagga Wagga, NSW 2678, Australia; ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, Canberra, ACT 2601, Australia
47
Heames B, Schmitz J, Bornberg-Bauer E. A Continuum of Evolving De Novo Genes Drives Protein-Coding Novelty in Drosophila. J Mol Evol 2020; 88:382-398. [PMID: 32253450 PMCID: PMC7162840 DOI: 10.1007/s00239-020-09939-z] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/14/2019] [Accepted: 03/13/2020] [Indexed: 12/13/2022]
Abstract
Orphan genes, lacking detectable homologs in outgroup species, typically represent 10-30% of eukaryotic genomes. Efforts to find the source of these young genes indicate that de novo emergence from non-coding DNA may in part explain their prevalence. Here, we investigate the roots of orphan gene emergence in the Drosophila genus. Across the annotated proteomes of twelve species, we find 6297 orphan genes within 4953 taxon-specific clusters of orthologs. By inferring the ancestral DNA as non-coding for between 550 and 2467 (8.7-39.2%) of these genes, we describe for the first time how de novo emergence contributes to the abundance of clade-specific Drosophila genes. In support of them having functional roles, we show that de novo genes have robust expression and translational support. However, the distinct nucleotide sequences of de novo genes, which have characteristics intermediate between intergenic regions and conserved genes, reflect their recent birth from non-coding DNA. We find that de novo genes encode more disordered proteins than both older genes and intergenic regions. Together, our results suggest that gene emergence from non-coding DNA provides an abundant source of material for the evolution of new proteins. Following gene birth, gradual evolution over large evolutionary timescales moulds sequence properties towards those of conserved genes, resulting in a continuum of properties whose starting points depend on the nucleotide sequences of an initial pool of novel genes.
Affiliation(s)
- Brennen Heames
  - Institute for Evolution and Biodiversity, 48149, Münster, Germany
- Jonathan Schmitz
  - Institute for Evolution and Biodiversity, 48149, Münster, Germany
48
Abstract
BACKGROUND Studying site-specific amino acid frequencies by eye can reveal biologically significant variability and lineage-specific adaptation. This so-called 'sequence gazing' often informs bioinformatics and experimental research. But it is important to also account for the underlying phylogeny, since similarities may be due to common descent rather than selection pressure, and because it is important to distinguish between founder effects and convergent evolution. We set out to combine phylogenetic and sequence data to produce evolutionarily insightful visualisations. RESULTS We present ChromaClade, a convenient tool with a graphical user-interface that works in concert with popular tree viewers to produce colour-annotated phylogenies highlighting residues found in each taxon and at each site in a sequence alignment. Colouring branches according to residues found at descendent tips also quickly identifies lineage-specific residues and those internal branches where key substitutions have occurred. We demonstrate applications of ChromaClade to human immunodeficiency virus and influenza A virus datasets, illustrating cases of conservative, adaptive and convergent evolution. CONCLUSIONS We find this to be a powerful approach for visualising site-wise residue distributions and detecting evolutionary patterns, especially in large datasets. ChromaClade is available for Windows, macOS and Unix or Linux; program executables and source code are available at github.com/chrismonit/chroma_clade .
Affiliation(s)
- Christopher Monit
  - Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
- Richard A Goldstein
  - Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
- Greg J Towers
  - Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
49
Pouvreau B, Fenske R, Ivanova A, Murcha MW, Mylne JS. An interstitial peptide is readily processed from within seed proteins. Plant Sci 2019; 285:175-183. [PMID: 31203882 DOI: 10.1016/j.plantsci.2019.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2019] [Revised: 04/25/2019] [Accepted: 05/02/2019] [Indexed: 06/09/2023]
Abstract
The importance of de novo protein evolution is apparent, but most examples are de novo coding transcripts evolving from silent or non-coding DNA. The peptide macrocycle SunFlower Trypsin Inhibitor 1 (SFTI-1) evolved over 45 million years from genetic expansion within the N-terminal 'discarded' region of an ancestral seed albumin precursor. SFTI-1 and its adjacent albumin are both processed into separate, mature forms by asparaginyl endopeptidase (AEP). Here to determine whether the evolution of SFTI-1 in a latent region of its precursor was critical, we used a transgene approach in A. thaliana analysed by peptide mass spectrometry and RT-qPCR. SFTI could emerge from alternative locations within preproalbumin as well as emerge with precision from unrelated seed proteins via AEP-processing. SFTI production was possible with the adjacent albumin, but peptide levels dropped greatly without the albumin. The ability for SFTI to be processed from multiple sequence contexts and different proteins suggests that to make peptide, it was not crucial for the genetic expansion that gave rise to SFTI and its family to be within a latent protein region. Interstitial peptides, evolving like SFTI within existing proteins, might be more widespread and as a mechanism, SFTI exemplifies a stable, new, functional peptide that did not need a new gene to evolve de novo.
Affiliation(s)
- Benjamin Pouvreau
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Ricarda Fenske
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Aneta Ivanova
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Monika W Murcha
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Joshua S Mylne
  - School of Molecular Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia; The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
50
Solis AD. Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds. BMC Evol Biol 2019; 19:158. [PMID: 31362700 PMCID: PMC6668081 DOI: 10.1186/s12862-019-1464-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/12/2018] [Accepted: 06/19/2019] [Indexed: 11/10/2022] Open
Abstract
Background There is wide agreement that only a subset of the twenty standard amino acids existed prebiotically in sufficient concentrations to form functional polypeptides. We ask how this subset, postulated as {A,D,E,G,I,L,P,S,T,V}, could have formed structures stable enough to found metabolic pathways. Inspired by alphabet reduction experiments, we undertook a computational analysis to measure the structural coding behavior of sequences simplified by reduced alphabets. We sought to discern characteristics of the prebiotic set that would endow it with unique properties relevant to structure, stability, and folding. Results Drawing on a large dataset of single-domain proteins, we employed an information-theoretic measure to assess how well the prebiotic amino acid set preserves fold information against all other possible ten-amino acid sets. An extensive virtual mutagenesis procedure revealed that the prebiotic set excellently preserves sequence-dependent information regarding both backbone conformation and tertiary contact matrix of proteins. We observed that information retention is fold-class dependent: the prebiotic set sufficiently encodes the structure space of α/β and α + β folds, and to a lesser extent, of all-α and all-β folds. The prebiotic set appeared insufficient to encode the small proteins. Assessing how well the prebiotic set discriminates native vs. incorrect sequence-structure matches, we found that α/β and α + β folds exhibit more pronounced energy gaps with the prebiotic set than with nearly all alternatives. Conclusions The prebiotic set optimally encodes local backbone structures that appear in the folded environment and near-optimally encodes the tertiary contact matrix of extant proteins. The fold-class-specific patterns observed from our structural analysis confirm the postulated timeline of fold appearance in proteogenesis derived from proteomic sequence analyses. 
Polypeptides arising in a prebiotic environment will likely form α/β and α + β-like folds if any at all. We infer that the progressive expansion of the alphabet allowed the increased conformational stability and functional specificity of later folds, including all-α, all-β, and small proteins. Our results suggest that prebiotic sequences are amenable to mutations that significantly lower native conformational energies and increase discrimination amidst incorrect folds. This property may have assisted the genesis of functional proto-enzymes prior to the expansion of the full amino acid alphabet.
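To give a sense of the scale of the comparison described above, the following sketch counts how many ten-letter alphabets can be drawn from the twenty standard amino acids and measures how much of a sequence already falls inside the postulated prebiotic set {A,D,E,G,I,L,P,S,T,V}. It does not reproduce the information-theoretic measure or the virtual mutagenesis of the study; the function name `prebiotic_fraction` and the example peptide are hypothetical.

```python
from math import comb

# The prebiotic set as postulated in the abstract, in one-letter codes.
PREBIOTIC = frozenset("ADEGILPSTV")
# The twenty standard amino acids.
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Number of distinct ten-amino-acid subsets of the standard twenty,
# i.e. the size of the space the prebiotic set is compared against.
N_TEN_LETTER_SETS = comb(len(STANDARD), len(PREBIOTIC))


def prebiotic_fraction(seq):
    """Fraction of standard residues in seq that belong to the prebiotic set.

    Characters outside the twenty standard one-letter codes are ignored;
    an empty or all-nonstandard sequence returns 0.0.
    """
    residues = [r for r in seq.upper() if r in STANDARD]
    if not residues:
        return 0.0
    return sum(r in PREBIOTIC for r in residues) / len(residues)


# Hypothetical four-residue peptide: G, A and V are prebiotic, K is not.
example = prebiotic_fraction("GAVK")
```

The count of candidate alphabets (184,756) shows why an exhaustive comparison against all ten-letter sets is computationally feasible, while the per-sequence fraction is only a crude descriptor and says nothing about fold information by itself.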
Affiliation(s)
- Armando D Solis
  - Biological Sciences Department, New York City College of Technology (City Tech), The City University of New York (CUNY), 285 Jay Street, Brooklyn, NY, 11201, USA