1
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS: We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
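The general idea of model selection by rejection ABC, which the abstract above relies on, can be illustrated with a minimal sketch. This is not ProteinModelerABC's code or interface: the two "models" `A` and `B`, the toy simulator, the single summary statistic, and every parameter name below are hypothetical stand-ins chosen only to show the accept/reject logic without regression adjustment.

```python
import random
from collections import Counter


def simulate(model, n_sites, rng):
    """Toy simulator (hypothetical): returns the fraction of changed sites.

    Each candidate model draws a per-site change probability from its own
    uniform prior range; real substitution models are far richer.
    """
    low, high = {"A": (0.05, 0.30), "B": (0.25, 0.60)}[model]
    p_change = rng.uniform(low, high)
    changed = sum(rng.random() < p_change for _ in range(n_sites))
    return changed / n_sites


def abc_model_selection(observed_stat, models, n_sims, tolerance, n_sites, seed=0):
    """Rejection ABC: approximate posterior probability of each model."""
    rng = random.Random(seed)
    accepted = Counter()
    for _ in range(n_sims):
        model = rng.choice(models)            # uniform prior over models
        stat = simulate(model, n_sites, rng)  # simulate, then summarize
        if abs(stat - observed_stat) <= tolerance:
            accepted[model] += 1              # keep simulations close to the data
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulation accepted; increase tolerance or n_sims")
    return {m: accepted[m] / total for m in models}


posterior = abc_model_selection(
    observed_stat=0.10, models=["A", "B"], n_sims=5000, tolerance=0.02, n_sites=200
)
print(posterior)
```

The posterior probability of a model is approximated by its share of the accepted simulations, so no likelihood function is evaluated; this is why the approach suits models whose likelihood is hard to write down.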
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
2
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Ruqaiya Khalil
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Sergio F Sousa
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
3
Dong W, Li H, Li Y, Wang Y, Dai L, Wang S. Characterization of active peptides derived from three leeches and comparison of their anti-thrombotic mechanisms using the tail vein thrombosis model in mice and metabonomics. Front Pharmacol 2024; 14:1324418. [PMID: 38333223 PMCID: PMC10851270 DOI: 10.3389/fphar.2023.1324418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/26/2023] [Indexed: 02/10/2024] Open
Abstract
Background and aims: The increasing incidence of cardiovascular diseases has created an urgent need for safe and effective anti-thrombotic agents. Leech, as a traditional Chinese medicine, promotes blood circulation and removes blood stasis, but its material basis and mechanism of action in treating conditions such as blood stasis and thrombosis have not been reported. Methods: In this study, Whitmania pigra Whitman (WPW), Hirudo nipponica Whitman (HNW) and Whitmania acranutata Whitman (WAW) were hydrolyzed by biomimetic enzymatic hydrolysis to obtain the active peptides of WPW (APP), the active peptides of HNW (APH) and the active peptides of WAW (APA), respectively. Their structures were then characterized with a Sykam amino acid analyzer, a Fourier transform infrared (FT-IR) spectrometer, a circular dichroism (CD) spectrometer and LC-MS. Next, the anti-thrombotic activities of APP, APH and APA were determined in a carrageenan-induced tail vein thrombosis model in mice, and the anti-thrombotic mechanisms of the high-dose APP group (HAPP), high-dose APH group (HAPH) and high-dose APA group (HAPA) were explored by UHPLC-Q-Exactive Orbitrap mass spectrometry. Results: The amino acid composition of APP, APH and APA was consistent, and the proportion of each amino acid differed little. FT-IR and CD showed no significant differences in the proportion of secondary structures (such as β-sheet and random coil) or in infrared absorption peaks among APP, APH and APA. Mass spectrometry data showed 43 peptides common to APP, APH and APA, indicating that the three share a common material basis. APP, APH and APA could significantly inhibit platelet aggregation; reduce black-tail length, whole blood viscosity (WBV), plasma viscosity (PV) and fibrinogen (FIB); and prolong coagulation times, including activated partial thromboplastin time (APTT), prothrombin time (PT) and thrombin time (TT). In addition, 24 metabolites were identified as potential biomarkers associated with thrombosis development. Among these, 19, 23, and 20 metabolites were significantly normalized after administration of HAPP, HAPH, and HAPA in the mice, respectively. Furthermore, the intervention mechanism of HAPP, HAPH and HAPA on tail vein thrombosis mainly involved linoleic acid metabolism, primary bile acid biosynthesis and ether lipid metabolism. Conclusion: Our findings suggest that APP, APH and APA can exert their anti-blood stasis and anti-thrombotic activities by interfering with disordered metabolic pathways in vivo, and there is no significant difference in their efficacies.
Affiliation(s)
- Weichao Dong
  - School of Pharmacy, Binzhou Medical University, Yantai, China
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, China
- Huajian Li
  - School of Pharmacy, Binzhou Medical University, Yantai, China
  - School of Pharmacy, Zhejiang Chinese Medical University, Hangzhou, China
- Yanan Li
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, China
- Yuqing Wang
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, China
- Long Dai
  - School of Pharmacy, Binzhou Medical University, Yantai, China
- Shaoping Wang
  - School of Pharmacy, Binzhou Medical University, Yantai, China
4
Cao W, Wu LY, Xia XY, Chen X, Wang ZX, Pan XM. A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep 2023; 13:20304. [PMID: 37985846 PMCID: PMC10662474 DOI: 10.1038/s41598-023-47496-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023] Open
Abstract
Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
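For readers unfamiliar with the "sequence identity less than 20%" threshold mentioned above, the following minimal sketch shows one common way to compute percent identity between two aligned sequences. It is not the SD algorithm, whose details are not given in the abstract; the function name and the choice to skip gapped columns are assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over aligned, gap-free columns.

    Assumes both sequences come from the same alignment (equal length).
    Columns with a gap in either sequence are ignored, which is only one
    of several conventions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Four gap-free columns are compared, three of which match.
print(percent_identity("ACDE-", "ACDFG"))
```

At identities below roughly 20%, such raw counts carry little phylogenetic signal, which is the regime the abstract says SD is designed for.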
Affiliation(s)
- Wei Cao
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Lu-Yun Wu
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xia-Yu Xia
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xiang Chen
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi-Xin Wang
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xian-Ming Pan
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
5
Abstract
Much of the higher-order phylogeny of eukaryotes is well resolved, but the root remains elusive. We assembled a dataset of 183 eukaryotic proteins of archaeal ancestry to test this root. The resulting phylogeny identifies four lineages of eukaryotes currently classified as "Excavata" branching separately at the base of the tree. Thus, Parabasalia appear as the first major branch of eukaryotes followed sequentially by Fornicata, Preaxostyla, and Discoba. All four excavate branch points receive full statistical support from analyses with commonly used evolutionary models, a protein structure partition model that we introduce here, and various controls for deep phylogeny artifacts. The absence of aerobic mitochondria in Parabasalia, Fornicata, and Preaxostyla suggests that modern eukaryotes arose under anoxic conditions, probably much earlier than expected, and without the benefit of mitochondrial respiration.
6
Spence MA, Kaczmarski JA, Saunders JW, Jackson CJ. Ancestral sequence reconstruction for protein engineers. Curr Opin Struct Biol 2021; 69:131-141. [PMID: 34023793 DOI: 10.1016/j.sbi.2021.04.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 12/15/2020] [Revised: 03/22/2021] [Accepted: 04/07/2021] [Indexed: 12/11/2022]
Abstract
In addition to its value in the study of molecular evolution, ancestral sequence reconstruction (ASR) has emerged as a useful methodology for engineering proteins with enhanced properties. Proteins generated by ASR often exhibit unique or improved activity, stability, and/or promiscuity, all of which are properties that are valued by protein engineers. Comparison between extant proteins and evolutionary intermediates generated by ASR also allows protein engineers to identify substitutions that have contributed to functional innovation or diversification within protein families. As ASR becomes more widely adopted as a protein engineering approach, it is important to understand the applications, limitations, and recent developments of this technique. This review highlights recent exemplifications of ASR, as well as technical aspects of the reconstruction process that are relevant to protein engineering.
Affiliation(s)
- Matthew A Spence
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Joe A Kaczmarski
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Jake W Saunders
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Colin J Jackson
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
7
Tufféry P, de Vries S. The search of sequence variants using a constrained protein evolution simulation approach. Comput Struct Biotechnol J 2020; 18:1790-1799. [PMID: 32695271 PMCID: PMC7355721 DOI: 10.1016/j.csbj.2020.06.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Revised: 05/15/2020] [Accepted: 06/09/2020] [Indexed: 10/25/2022] Open
Abstract
Protein engineering or candidate therapeutic peptide optimization are processes in which the identification of relevant sequence variants is critical. Starting from one amino-acid sequence, the choice of the substitutions must meet the objective of not disrupting the structure of the protein, not impacting the main functional properties of the starting entity, while also meeting the condition to enhance some expected property such as thermal stability, resistance to degradation, … Here, we introduce a new approach of sequence evolution that focuses on the objective of not disrupting the structure of the initial protein by embedding a point to point control on the preservation of the local structure at each position in the sequence. For 6 mini-proteins, we find that, starting from a single sequence, our simple approach intrinsically contains information about site-specific rate heterogeneity of substitution, and that it is able to reproduce sequence diversity as can be observed in the sequences available in the Uniref repository. We show that our approach is able to provide information about positions not to substitute and about substitutions not to perform at a given position to maintain structure integrity. Overall, our results demonstrate that point to point preservation of the local structure along a sequence is an important determinant of sequence evolution.
Affiliation(s)
- Pierre Tufféry
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, RPBS, F-75013 Paris, France
- Sjoerd de Vries
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, RPBS, F-75013 Paris, France
8
Weber CC, Perron U, Casey D, Yang Z, Goldman N. Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space. Syst Biol 2020; 70:21-32. [PMID: 32353118 PMCID: PMC7744038 DOI: 10.1093/sysbio/syaa036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2019] [Revised: 03/20/2020] [Accepted: 03/30/2020] [Indexed: 11/14/2022] Open
Abstract
How can we best learn the history of a protein’s evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the “unseen” state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that $\omega$, a parameter describing the relative strength of selection on nonsynonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible. [Ancestral reconstruction; natural selection; protein structure; state-spaces; substitution models.]
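The ambiguity-coding idea described above, for the 61-state codon case, can be sketched as follows: each observed amino acid is represented as the set of sense codons that could encode it. This is only an illustration under the standard genetic code, not the authors' implementation; the function name `ambiguity_sets` and the treatment of unknown characters as fully ambiguous are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# The 61 sense codons form the larger state-space; stop codons are excluded.
SENSE_CODONS = sorted(c for c, aa in CODON_TO_AA.items() if aa != "*")


def ambiguity_sets(protein):
    """Map each amino acid to the set of codon states compatible with it.

    Characters that are not one of the 20 amino acids (for example a gap)
    are treated as compatible with every sense codon, as for missing data.
    """
    compatible = {}
    for codon in SENSE_CODONS:
        compatible.setdefault(CODON_TO_AA[codon], set()).add(codon)
    all_states = set(SENSE_CODONS)
    return [compatible.get(aa, all_states) for aa in protein]


for aa, states in zip("MLW", ambiguity_sets("MLW")):
    print(aa, sorted(states))
```

In a likelihood calculation, such sets are typically used at the tips of the tree by giving every compatible state a conditional likelihood of 1 and every other state 0, which is the usual treatment of ambiguous or missing characters.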
Affiliation(s)
- Claudia C Weber
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Umberto Perron
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Dearbhaile Casey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ziheng Yang
  - Department of Genetics, University College London, London WC1E 6BT, UK
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
9
Integrated structural and evolutionary analysis reveals common mechanisms underlying adaptive evolution in mammals. Proc Natl Acad Sci U S A 2020; 117:5977-5986. [PMID: 32123117 DOI: 10.1073/pnas.1916786117] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022] Open
Abstract
Understanding the molecular basis of adaptation to the environment is a central question in evolutionary biology, yet linking detected signatures of positive selection to molecular mechanisms remains challenging. Here we demonstrate that combining sequence-based phylogenetic methods with structural information assists in making such mechanistic interpretations on a genomic scale. Our integrative analysis shows that positively selected sites tend to colocalize on protein structures and that positively selected clusters are found in functionally important regions of proteins, indicating that positive selection can contravene the well-known principle of evolutionary conservation of functionally important regions. This unexpected finding, along with our discovery that positive selection acts on structural clusters, opens previously unexplored strategies for the development of better models of protein evolution. Remarkably, proteins where we detect the strongest evidence of clustering belong to just two functional groups: Components of immune response and metabolic enzymes. This gives a coherent picture of pathogens and xenobiotics as important drivers of adaptive evolution of mammals.