1
Zou HT, Ji BY, Xie XL. A multi-source molecular network representation model for protein-protein interactions prediction. Sci Rep 2024; 14:6184. [PMID: 38485942 PMCID: PMC10940665 DOI: 10.1038/s41598-024-56286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024]
Abstract
The prediction of potential protein-protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified many potential PPIs in recent years, but the problem is still far from solved. Hence, there is an urgent need for computational models that predict potential PPIs accurately and efficiently. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein-protein interactions. Specifically, we first extract protein sequence features from the physicochemical properties of amino acids using the auto covariance method. Second, we construct a multi-source association network by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method DeepWalk is adopted to extract the multi-source association information of proteins with other biomolecules. In this way, each known protein-protein interaction pair can be represented as a concatenation of the sequence features and the multi-source association features of its proteins. Finally, a Random Forest classifier with optimized parameters is used for training and prediction. Under five-fold cross-validation, MultiPPIs achieves an average prediction accuracy of 86.03%, a sensitivity of 82.69%, and an AUC of 93.03%. These results indicate that MultiPPIs has good prediction performance and provides valuable insights into the prediction of potential protein-protein interactions. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs.
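The auto covariance (AC) step can be illustrated with a minimal Python sketch. It assumes the commonly used AC definition AC(lag, j) = 1/(n - lag) * sum over i = 1..n - lag of (X[i,j] - mean_j) * (X[i+lag,j] - mean_j), where X[i,j] is property j of residue i and mean_j is that property's mean over the sequence. The two-residue table `TOY_PROPERTIES` and the function name `auto_covariance` are hypothetical; MultiPPIs may use different property scales, normalization, and maximum lag.

```python
from statistics import fmean

# Hypothetical property table: residue -> (property_1, property_2).
# These are illustrative numbers, NOT the scales used by MultiPPIs.
TOY_PROPERTIES = {"A": (1.0, 0.5), "R": (-1.0, 0.5)}


def auto_covariance(seq, properties, max_lag):
    """Return AC features ordered by property, then by lag (1..max_lag)."""
    if not 0 < max_lag < len(seq):
        raise ValueError("max_lag must satisfy 0 < max_lag < len(seq)")
    n = len(seq)
    n_props = len(next(iter(properties.values())))
    rows = [properties[aa] for aa in seq]  # one property vector per residue
    features = []
    for j in range(n_props):
        col = [row[j] for row in rows]
        mu = fmean(col)  # mean of property j over this sequence
        for lag in range(1, max_lag + 1):
            # Average product of centered values `lag` residues apart
            total = sum((col[i] - mu) * (col[i + lag] - mu) for i in range(n - lag))
            features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    print(auto_covariance("ARAR", TOY_PROPERTIES, 2))  # [-1.0, 1.0, 0.0, 0.0]
```

The output has a fixed length (number of properties times `max_lag`) regardless of sequence length, which is what lets proteins of different lengths be compared. A protein pair could then be encoded by concatenating the two proteins' AC vectors, together with their network embeddings, before classification.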
Affiliation(s)
- Hai-Tao Zou
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
- Bo-Ya Ji
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiao-Lan Xie
- College of Information Science and Engineering, Guilin University of Technology, Guilin, 541000, China
2
Kiefl E, Esen OC, Miller SE, Kroll KL, Willis AD, Rappé MS, Pan T, Eren AM. Structure-informed microbial population genetics elucidate selective pressures that shape protein evolution. Sci Adv 2023; 9:eabq4632. [PMID: 36812328 DOI: 10.1126/sciadv.abq4632] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/09/2022] [Accepted: 01/18/2023] [Indexed: 06/18/2023]
Abstract
Comprehensive sampling of natural genetic diversity with metagenomics enables highly resolved insights into the interplay between ecology and evolution. However, resolving adaptive, neutral, or purifying processes of evolution from intrapopulation genomic variation remains a challenge, partly due to the sole reliance on gene sequences to interpret variants. Here, we describe an approach to analyze genetic variation in the context of predicted protein structures and apply it to a marine microbial population within the SAR11 subclade 1a.3.V, which dominates low-latitude surface oceans. Our analyses reveal a tight association between genetic variation and protein structure. In a central gene in nitrogen metabolism, we observe decreased occurrence of nonsynonymous variants from ligand-binding sites as a function of nitrate concentrations, revealing genetic targets of distinct evolutionary pressures maintained by nutrient availability. Our work yields insights into the governing principles of evolution and enables structure-aware investigations of microbial population genetics.
Affiliation(s)
- Evan Kiefl
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637, USA
- Ozcan C Esen
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Samuel E Miller
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Kourtney L Kroll
- Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637, USA
- Amy D Willis
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Michael S Rappé
- Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Kāne'ohe, HI 96822, USA
- Tao Pan
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
- A Murat Eren
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, Germany
- Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany
- Helmholtz Institute for Functional Marine Biodiversity, Oldenburg, Germany
3
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences for protein folding stability have not attracted the same attention as those of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted folding stability of several protein families and found less detrimental effects than we had expected. Although recombination can affect multiple protein sites, the fraction of recombined proteins eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, stability differences among recombined proteins are amplified relative to their parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins have a stability between those of their parents, and a substantial fraction have higher or lower stability than both parents. As expected, parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, simulating protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors a decrease in folding stability. This supports adopting structurally constrained models, when possible, to infer the evolutionary histories of proteins with recombination.
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Luis Daniel González-Vázquez
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Laura Rodríguez-Moure
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Ugo Bastolla
- Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
4
Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. The evolution of the HIV-1 protease folding stability. Virus Evol 2022; 8:veac115. [PMID: 36601299 PMCID: PMC9802575 DOI: 10.1093/ve/veac115] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Revised: 10/10/2022] [Accepted: 12/03/2022] [Indexed: 12/11/2022]
Abstract
The evolution of structural proteins is generally constrained by folding stability. However, little is known about the capacity of viral proteins to accommodate mutations that can affect protein stability and, more generally, about how protein stability evolves over time. As an illustrative model case, we investigated the evolution of the stability of the human immunodeficiency virus (HIV-1) protease (PR), a common HIV-1 drug target, under diverse evolutionary scenarios: (1) intra-host virus evolution in a cohort of seventy-five patients sampled over time, (2) intra-host virus evolution sampled before and after specific PR-based treatments, and (3) inter-host evolution considering extant and ancestral (reconstructed) PR sequences from diverse HIV-1 subtypes. We also investigated the specific influence of currently known HIV-1 PR resistance mutations on PR folding stability. We found that HIV-1 PR stability fluctuated over time within a constant and wide range in every studied evolutionary scenario, accommodating multiple mutations that partially affected stability while maintaining activity. We did not identify relationships between changes in PR stability and diverse clinical parameters such as viral load, CD4+ T-cell counts, and a surrogate of time from infection. Counterintuitively, we predicted that nearly half of the studied HIV-1 PR resistance mutations do not significantly decrease stability, which, together with compensatory mutations, would allow the protein to adapt without requiring dramatic stability changes. We conclude that HIV-1 PR presents wide structural plasticity to acquire molecular adaptations without affecting the overall evolution of stability.
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - Ruqaiya Khalil
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - María J Gallego
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - Nuno S Osorio
- Life and Health Sciences Research Institute, School of Medicine, University of Minho, Braga 4710-057, Portugal,ICVS/3Bs—PT Government Associate Laboratory, Guimarães 4806-909, Portugal
5
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important tool in computational biology for elucidating protein function for decades. With the recent proliferation of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding (including allosteric effects), protein–protein interactions, and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifests as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and to quantify functionally important differences between conformational ensembles are included in this review. Future opportunities are highlighted for each of these topics.
6
Fer E, McGrath KM, Guy L, Hockenberry AJ, Kaçar B. Early divergence of translation initiation and elongation factors. Protein Sci 2022; 31:e4393. [PMID: 36250475 PMCID: PMC9601768 DOI: 10.1002/pro.4393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/05/2022] [Accepted: 07/11/2022] [Indexed: 11/18/2022]
Abstract
Protein translation is a foundational attribute of all living cells. The translation function carried out by the ribosome critically depends on an assortment of protein interaction partners, collectively referred to as the translation machinery. Various studies suggest that the diversification of the translation machinery occurred prior to the last universal common ancestor, yet it is unclear whether the predecessors of the extant translation machinery factors were functionally distinct from their modern counterparts. Here we reconstructed the shared ancestral trajectory and subsequent evolution of essential translation factor GTPases, elongation factor EF-Tu (aEF-1A/eEF-1A), and initiation factor IF2 (aIF5B/eIF5B). Based upon their similar functions and structural homologies, it has been proposed that EF-Tu and IF2 emerged from an ancient common ancestor. We generated the phylogenetic tree of IF2 and EF-Tu proteins and reconstructed ancestral sequences corresponding to the deepest nodes in their shared evolutionary history, including the last common IF2 and EF-Tu ancestor. By identifying the residue and domain substitutions, as well as structural changes along the phylogenetic history, we developed an evolutionary scenario for the origins, divergence and functional refinement of EF-Tu and IF2 proteins. Our analyses suggest that the common ancestor of IF2 and EF-Tu was an IF2-like GTPase protein. Given the central importance of the translation machinery to all cellular life, its earliest evolutionary constraints and trajectories are key to characterizing the universal constraints and capabilities of cellular evolution.
Affiliation(s)
- Evrim Fer
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Kaitlyn M. McGrath
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona, USA
- Lionel Guy
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Adam J. Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, USA
- Betül Kaçar
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
7
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Affiliation(s)
- Miguel Arenas
- Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
- Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
8
Arenas M. ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation. Bioinformatics 2021; 38:58-64. [PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/15/2021] [Revised: 07/24/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understanding the observed molecular diversity. Unlike for nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored and has not been implemented in evolutionary frameworks, even though protein sequencing methods are widely used. RESULTS To address this need, here I present a computational framework, called ProteinEvolverABC, to jointly estimate recombination and substitution rates from alignments of protein sequences. The framework implements the approximate Bayesian computation approach, with and without regression adjustments, and includes a variety of substitution models of protein evolution, demographics, and longitudinal sampling. It also implements several nuisance parameters, such as heterogeneous amino acid frequencies, heterogeneous rates of change among sites, and the proportion of invariable sites. The framework produces accurate coestimates of recombination and substitution rates under diverse evolutionary scenarios. As illustrative examples of usage, I applied it to several viral protein families, including coronaviruses, which showed heterogeneous substitution and recombination rates. AVAILABILITY AND IMPLEMENTATION ProteinEvolverABC is freely available from https://github.com/miguelarenas/proteinevolverabc. It includes a graphical user interface to help specify the input settings, extensive documentation, and ready-to-use examples. Conveniently, the simulations can run in parallel on multicore machines. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
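The core idea of approximate Bayesian computation can be illustrated with plain rejection sampling. The sketch below is a toy and not ProteinEvolverABC: the single parameter `theta` (a hypothetical per-site change probability), the Uniform(0, 1) prior, the simulator, the summary statistic, and the tolerance `epsilon` are all assumptions for illustration, and it omits the regression adjustment the framework also offers.

```python
import random
from statistics import fmean


def simulate_summary(theta, n_sites, rng):
    """Toy simulator: fraction of n_sites that change, each with probability theta."""
    return sum(rng.random() < theta for _ in range(n_sites)) / n_sites


def abc_rejection(observed, n_draws, n_sites, epsilon, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary lies within epsilon."""
    rng = random.Random(seed)  # seeded for reproducibility
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 1.0)  # draw a candidate from the Uniform(0, 1) prior
        simulated = simulate_summary(theta, n_sites, rng)
        if abs(simulated - observed) <= epsilon:
            accepted.append(theta)  # accepted draws approximate the posterior
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=0.3, n_draws=5000, n_sites=100, epsilon=0.02, seed=1)
    print(len(posterior), round(fmean(posterior), 3))
```

Shrinking `epsilon` makes the approximation closer to the true posterior but accepts fewer draws. Regression adjustment is one way to reduce that trade-off, and parallel simulation, as the framework supports, helps with the cost of running many draws.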
9
The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences. Biophysica 2021. [DOI: 10.3390/biophysica1020008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure. We test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa was used, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology, regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference in the support for minority topologies among sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. This suggests that the relatively slowly evolving buried, sheet, and coil sites give an accurate picture of the true species tree and of the amount of conflict among gene trees. Taken as a whole, this study indicates that phylogenetic analyses using sites in different structural environments can yield different topologies for the deepest branches in the animal tree of life and that analyzing larger numbers of taxa eliminates this conflict. More broadly, our results highlight the desirability of incorporating information about protein structure into phylogenomic analyses.
10
Aggarwal S, Acharjee A, Mukherjee A, Baker MS, Srivastava S. Role of Multiomics Data to Understand Host-Pathogen Interactions in COVID-19 Pathogenesis. J Proteome Res 2021; 20:1107-1132. [PMID: 33426872 PMCID: PMC7805606 DOI: 10.1021/acs.jproteome.0c00771] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/30/2020] [Indexed: 12/15/2022]
Abstract
The outcome of human infectious diseases depends equally on the efficiency of the host immune system and on the infectivity of the pathogen. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the coronavirus strain causing the respiratory pandemic coronavirus disease 2019 (COVID-19). To understand the pathobiology of SARS-CoV-2, one needs to unravel the intricacies of the host immune response to the virus, the pathogen's mode of transmission, and the alterations in specific host biological pathways that allow viral survival. This review critically analyzes recent research using high-throughput "omics" technologies (including proteomics and metabolomics) on various biospecimens that has increased our understanding of the pathobiology of SARS-CoV-2 in humans. Altered biomolecule profiles help to reveal altered biological pathways. Further, we performed a meta-analysis of significantly altered biomolecular profiles in COVID-19 patients using bioinformatics tools. Our analysis revealed alterations in the immune response, fatty acid and amino acid metabolism, and other pathways that cumulatively result in COVID-19 disease, including symptoms such as hyperglycemic and hypoxic sequelae.
Affiliation(s)
- Shalini Aggarwal
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
- Arup Acharjee
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
- Amrita Mukherjee
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
- Mark S. Baker
- Department of Biomedical Science, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney 2109, Australia
- Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
11
Koukouli E, Wang D, Dondelinger F, Park J. A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response. PLoS Comput Biol 2021; 17:e1008066. [PMID: 33493149 PMCID: PMC7920352 DOI: 10.1371/journal.pcbi.1008066] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2020] [Revised: 03/01/2021] [Accepted: 12/17/2020] [Indexed: 11/18/2022]
Abstract
Cancer treatments can be highly toxic, and frequently only a subset of the patient population will benefit from a given treatment. Tumour genetic makeup plays an important role in cancer drug sensitivity. We hypothesise that gene expression markers could be used as a decision aid for treatment selection or dosage tuning. Using in vitro cancer cell line dose-response and gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) project, we build a dose-varying regression model. Unlike existing approaches, this allows us to estimate dosage-dependent associations with gene expression. We include the transcriptomic profiles as dose-invariant covariates in the regression model and assume that their effect varies smoothly over the dosage levels. A two-stage variable selection algorithm (variable screening followed by penalized regression) is used to identify genetic factors that are associated with drug response over the varying dosages. We evaluate the effectiveness of our method using simulation studies focusing on the choice of tuning parameters and cross-validation for predictive accuracy assessment. We further apply the model to data from five BRAF-targeted compounds applied to different cancer cell lines under different dosage levels. We highlight the dosage-dependent dynamics of the associations between the selected genes and drug response, and we perform pathway enrichment analysis to show that the selected genes play an important role in pathways related to tumorigenesis and DNA damage response.
Affiliation(s)
- Evanthia Koukouli
- Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK
- Dennis Wang
- Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Frank Dondelinger
- Centre for Health Informatics and Statistics, Lancaster Medical School, Lancaster University, Bailrigg, Lancaster, UK
- Juhyun Park
- Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK
12
Saidijam M, Afshar S, Taherkhani A. Identifying Potential Biomarkers in Colorectal Cancer and Developing Non-invasive Diagnostic Models Using Bioinformatics Approaches. Avicenna Journal of Medical Biochemistry 2020. [DOI: 10.34172/ajmb.2020.15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Background: Colorectal cancer (CRC) is one of the most frequent gastrointestinal malignancies. Because current diagnostic methods are invasive, there is an urgent need to develop non-invasive diagnostic approaches for CRC. The exact mechanisms and the most important genes associated with the development of CRC have not been fully elucidated. Objectives: This study aimed to identify differentially expressed miRNAs (DEMs), key genes, and their regulators associated with the pathogenesis of CRC. The signaling pathways and biological processes (BPs) significantly affected in CRC were also identified. Moreover, two non-invasive models were constructed for CRC diagnosis. Methods: The miRNA dataset GSE59856 was downloaded from the Gene Expression Omnibus (GEO) database and analyzed to identify DEMs in CRC patients compared with healthy controls (HCs). A protein-protein interaction (PPI) network was built and analyzed. Significant clusters in the PPI network were identified, and the BPs and pathways associated with these clusters were studied. The hub genes in the PPI network, as well as their regulators, were identified. Results: A total of 569 DEMs were identified using a threshold of P < 0.001. A total of 110 essential genes and 30 modules were identified in the PPI network. Functional analysis revealed that 1005 BPs, 9 molecular functions (MFs), 14 cellular components (CCs), and 887 pathways were significantly affected in CRC. A total of 22 transcription factors (TFs) were identified as regulators of the hubs. Conclusion: Our results may provide new insight into the pathogenesis of CRC and advance diagnostic and therapeutic methods for the disease. However, these findings require future confirmation.
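The abstract does not state how hub genes were ranked in the PPI network. One common criterion is node degree, sketched below in Python under that assumption; the edge list and the `GENE_*` names are hypothetical placeholders, and the study may have used other centrality measures or dedicated network tools.

```python
from collections import Counter

# Hypothetical undirected PPI edge list; the duplicate (B, A) edge is intentional.
EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_B", "GENE_A"),
]


def node_degrees(edges):
    """Degree of each node in an undirected network; duplicates and self-loops ignored."""
    unique = {tuple(sorted(pair)) for pair in edges if pair[0] != pair[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges, k):
    """Return the k highest-degree nodes, breaking ties alphabetically."""
    ranked = sorted(node_degrees(edges).items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


if __name__ == "__main__":
    print(top_hubs(EDGES, 2))  # ['GENE_A', 'GENE_B']
```

Deduplicating edges matters because interaction databases often list the same pair in both directions, which would otherwise double-count degree.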
Affiliation(s)
- Massoud Saidijam
- Department of Molecular Medicine and Genetics, Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Saeid Afshar
- Department of Molecular Medicine and Genetics, Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Amir Taherkhani
- Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
13
Abstract
Proteins are commonly used as molecular targets against pathogens such as viruses and bacteria. However, pathogens can evolve rapidly, allowing their populations to increase in protein diversity over time and thus escape the activity of a molecular therapy. Consequently, to design more durable and robust therapies, and to understand viral evolution within a host and subsequent transmission, it is essential to understand the evolution of pathogen proteins. This understanding can enable the detection of protein regions that are potential therapeutic targets and predict the emergence of molecular resistance against therapies. In this direction, two articles recently published in the Journal of Molecular Evolution investigated the evolution of the proteomes of diverse flaviviruses, including Zika virus, Dengue virus, and West Nile virus. Here I discuss the importance of considering the evolution of viral proteins, using models and methods that mimic protein evolution as realistically as possible, to improve the design of antiviral therapies.
14
Serçinoğlu O, Ozbek P. Sequence-structure-function relationships in class I MHC: A local frustration perspective. PLoS One 2020; 15:e0232849. [PMID: 32421728 PMCID: PMC7233585 DOI: 10.1371/journal.pone.0232849] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/18/2019] [Accepted: 04/22/2020] [Indexed: 12/22/2022]
Abstract
Class I Major Histocompatibility Complex (MHC) binds short antigenic peptides with the help of the Peptide Loading Complex (PLC), and presents them to T-cell Receptors (TCRs) of cytotoxic T-cells and Killer-cell Immunoglobulin-like Receptors (KIRs) of Natural Killer (NK) cells. With more than 10,000 alleles, human MHC (Human Leukocyte Antigen, HLA) is the most polymorphic protein in humans. This allelic diversity provides a wide coverage of peptide sequence space, yet does not affect the three-dimensional structure of the complex. Moreover, TCRs mostly interact with HLA in a common diagonal binding mode, and KIR-HLA interaction is allele-dependent. With the aim of establishing a framework for understanding the relationships between polymorphism (sequence), structure (conserved fold) and function (protein interactions) of the human MHC, we performed a local frustration analysis on pMHC homology models covering 1436 HLA I alleles. An analysis of local frustration profiles indicated that (1) variations in the MHC fold are unlikely, owing to minimally frustrated and relatively conserved residues within the HLA peptide-binding groove, (2) high-frustration patches on HLA helices are either involved in or near the interaction sites of MHC with the TCR, KIR, or tapasin of the PLC, and (3) peptide ligands mainly stabilize the F-pocket of the HLA binding groove.
Affiliation(s)
- Onur Serçinoğlu
- Department of Bioengineering, Recep Tayyip Erdogan University, Faculty of Engineering, Fener, Rize, Turkey
- Pemra Ozbek
- Department of Bioengineering, Marmara University, Faculty of Engineering, Goztepe, Istanbul, Turkey
15
Pandey A, Braun EL. Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root. Biology (Basel) 2020; 9:E64. [PMID: 32231097 PMCID: PMC7235752 DOI: 10.3390/biology9040064] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/24/2019] [Revised: 03/09/2020] [Accepted: 03/20/2020] [Indexed: 12/23/2022]
Abstract
Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals, and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, the trees estimated using exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
Affiliation(s)
- Akanksha Pandey
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
16
Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13341] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
- Ugo Bastolla
- Bioinformatics Unit, Centre for Molecular Biology Severo Ochoa (CSIC), Madrid, Spain
17
Aydınkal RM, Serçinoğlu O, Ozbek P. ProSNEx: a web-based application for exploration and analysis of protein structures using network formalism. Nucleic Acids Res 2019; 47:W471-W476. [PMID: 31114881 PMCID: PMC6602423 DOI: 10.1093/nar/gkz390] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/25/2019] [Revised: 04/17/2019] [Accepted: 05/09/2019] [Indexed: 01/14/2023]
Abstract
ProSNEx (Protein Structure Network Explorer) is a web service for construction and analysis of Protein Structure Networks (PSNs) alongside amino acid flexibility, sequence conservation and annotation features. ProSNEx constructs a PSN by adding nodes to represent residues and edges between these nodes using user-specified interaction distance cutoffs for either carbon-alpha, carbon-beta or atom-pair contact networks. Different types of weighted networks can also be constructed by using either (i) the residue-residue interaction energies in the format returned by gRINN, resulting in a Protein Energy Network (PEN); (ii) the dynamical cross correlations from a coarse-grained Normal Mode Analysis (NMA) of the protein structure; or (iii) interaction strength. Upon construction of the network, common network metrics (such as node centralities) as well as shortest paths between nodes and k-cliques are calculated. Moreover, additional features of each residue in the form of conservation scores and mutation/natural variant information are included in the analysis. In this way, the tool offers an enhanced and direct comparison of network-based residue metrics with other types of biological information. ProSNEx is free and open to all users without login requirement at http://prosnex-tool.com.
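The network-construction step described above can be illustrated with a minimal sketch, assuming an unweighted carbon-alpha contact network: residues become nodes, and two residues are linked when their C-alpha atoms lie within a distance cutoff. The 7.0 Å cutoff, the toy coordinates, and the function names `build_psn` and `degree_centrality` are hypothetical choices for illustration; this is not ProSNEx code and does not reproduce its energy- or NMA-weighted networks.

```python
import math
from itertools import combinations


def build_psn(ca_coords, cutoff=7.0):
    """Build an unweighted residue contact network from C-alpha coordinates.

    ca_coords: dict mapping a residue label to its (x, y, z) C-alpha position.
    cutoff: contact distance in angstroms (illustrative value, not a ProSNEx default).
    Returns an adjacency dict: residue -> set of neighbouring residues.
    """
    graph = {res: set() for res in ca_coords}
    for a, b in combinations(ca_coords, 2):
        if math.dist(ca_coords[a], ca_coords[b]) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other residues that each residue is in contact with."""
    n = len(graph)
    return {res: len(nbrs) / (n - 1) for res, nbrs in graph.items()}


# Hypothetical toy coordinates: three consecutive residues ~3.8 A apart plus a distant one.
coords = {
    "A1": (0.0, 0.0, 0.0),
    "G2": (3.8, 0.0, 0.0),
    "L3": (7.6, 0.0, 0.0),
    "K4": (20.0, 0.0, 0.0),
}
psn = build_psn(coords)
centrality = degree_centrality(psn)
```

Here `G2` contacts both of its neighbours and has the highest degree centrality, while the distant `K4` is isolated; shortest paths or k-cliques could be computed on the same adjacency structure.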
Affiliation(s)
- Rasim Murat Aydınkal
- Department of Bioengineering, Faculty of Engineering, Marmara University, Kadikoy, Istanbul 34722, Turkey
- Ali Nihat Gokyigit Foundation, Etiler, Istanbul 34340, Turkey
- Onur Serçinoğlu
- Department of Bioengineering, Faculty of Engineering, Marmara University, Kadikoy, Istanbul 34722, Turkey
- Department of Bioengineering, Faculty of Engineering, Recep Tayyip Erdoğan University, Rize 53100, Turkey
- Pemra Ozbek
- Department of Bioengineering, Faculty of Engineering, Marmara University, Kadikoy, Istanbul 34722, Turkey
18
Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019; 68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023]
Abstract
The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities. [BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment.]
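Precision and recall with respect to a reference alignment are commonly computed over pairs of homologous residues. The sketch below is a simplified illustration with hypothetical sequences and function names, not the evaluation code used in the study; it shows why an alignment that places too few residues in shared columns (underalignment) can keep high precision while losing recall.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of homologous residue pairs implied by a gapped alignment.

    alignment: list of equal-length strings, with '-' marking gaps.
    Each pair is ((seq_i, residue_index_i), (seq_j, residue_index_j)), i < j,
    where residue_index counts only non-gap characters of that sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def precision_recall(estimated, reference):
    """Precision and recall of the estimated alignment's residue pairs."""
    est, ref = aligned_pairs(estimated), aligned_pairs(reference)
    shared = len(est & ref)
    precision = shared / len(est) if est else 0.0
    recall = shared / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example: the estimate leaves the final T residues unaligned.
reference = ["ACGT", "AC-T"]
estimated = ["ACGT--", "AC---T"]
precision, recall = precision_recall(estimated, reference)
```

The estimated alignment recovers two of the three reference pairs and proposes no incorrect ones, giving a precision of 1.0 and a recall of 2/3.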
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Ehsan Saleh
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1205 W. Clark St., Urbana, IL 61801, USA; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
19
Gimferrer L, Vila J, Piñana M, Andrés C, Rodrigo-Pendás JA, Peremiquel-Trillas P, Codina MG, C Martín MD, Esperalba J, Fuentes F, Rubio S, Campins-Martí M, Pumarola T, Antón A. Virological surveillance of human respiratory syncytial virus A and B at a tertiary hospital in Catalonia (Spain) during five consecutive seasons (2013-2018). Future Microbiol 2019; 14:373-381. [PMID: 30860397 DOI: 10.2217/fmb-2018-0261] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/16/2023]
Abstract
AIM Human respiratory syncytial virus (HRSV) is the main cause of respiratory tract infections among infants. MATERIALS & METHODS In the present study, the molecular epidemiology of HRSV detected from 2013 to 2017 is described. RESULTS Ten percent of collected samples were laboratory-confirmed for HRSV. Patients under 2 years of age were the population most susceptible to respiratory syncytial virus disease, but an increasing number of confirmed patients over 65 years of age was reported. Epidemics usually started in autumn and ended in spring. Both HRSV groups co-circulated every season, but HRSV-B predominated. HRSV-A and HRSV-B strains mainly belonged to the ON1 and BA9 genotypes, respectively. CONCLUSION The present study reports recent data on the genetic diversity of circulating HRSV in Spain.
Affiliation(s)
- Laura Gimferrer
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Jorgina Vila
- Paediatric Hospitalisation Unit, Department of Paediatrics, Hospital Universitari Maternoinfantil Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain
- Maria Piñana
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Cristina Andrés
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- José A Rodrigo-Pendás
- Preventive Medicine & Epidemiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Paula Peremiquel-Trillas
- Preventive Medicine & Epidemiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- María G Codina
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- María Del C Martín
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Juliana Esperalba
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Francisco Fuentes
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Susana Rubio
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Magda Campins-Martí
- Preventive Medicine & Epidemiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Tomàs Pumarola
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Andrés Antón
- Respiratory Virus Unit, Microbiology Department, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
20
The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. Methods Mol Biol 2019; 1851:215-231. [PMID: 30298399 DOI: 10.1007/978-1-4939-8736-8_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023]
Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
21
Rota J, Malm T, Chazot N, Peña C, Wahlberg N. A simple method for data partitioning based on relative evolutionary rates. PeerJ 2018; 6:e5498. [PMID: 30186687 PMCID: PMC6118207 DOI: 10.7717/peerj.5498] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/15/2017] [Accepted: 08/01/2018] [Indexed: 11/20/2022]
Abstract
BACKGROUND Multiple studies have demonstrated that partitioning of molecular datasets is important in model-based phylogenetic analyses. Commonly, partitioning is done a priori based on some known properties of sequence evolution, e.g. differences in rate of evolution among codon positions of a protein-coding gene. Here we propose a new method for data partitioning based on relative evolutionary rates of the sites in the alignment of the dataset being analysed. The rates are inferred using the previously published Tree Independent Generation of Evolutionary Rates (TIGER), and the partitioning is conducted using our novel Python script RatePartitions. We conducted simulations to assess the performance of our new method, and we applied it to eight published multi-locus phylogenetic datasets, representing different taxonomic ranks within the insect order Lepidoptera (butterflies and moths), and to one phylogenomic dataset, which included ultra-conserved elements as well as introns. METHODS We used TIGER-rates to generate relative evolutionary rates for all sites in the alignments. Then, using RatePartitions, we divided the data into partitions based on their relative evolutionary rates. RatePartitions applies a simple formula that distributes sites into partitions following the distribution of rates of the characters in the full dataset. This ensures that the invariable sites are placed in a partition with slowly evolving sites, avoiding the pitfalls of previously used methods, such as k-means. Different partitioning strategies were evaluated using BIC scores as calculated by PartitionFinder. RESULTS Simulations did not highlight any misbehaviour of our partitioning approach, even under difficult parameter conditions or missing data.
In all eight phylogenetic datasets, partitioning using TIGER-rates and RatePartitions was significantly better as measured by the BIC scores than other partitioning strategies, such as the commonly used partitioning by gene and codon position. We compared the resulting topologies and node support for these eight datasets as well as for the phylogenomic dataset. DISCUSSION We developed a new method of partitioning phylogenetic datasets without using any prior knowledge (e.g. DNA sequence evolution). This method is entirely based on the properties of the data being analysed and can be applied to DNA sequences (protein-coding, introns, ultra-conserved elements), protein sequences, as well as morphological characters. A likely explanation for why our method performs better than other tested partitioning strategies is that it accounts for the heterogeneity in the data to a much greater extent than when data are simply subdivided based on prior knowledge.
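Comparing partitioning schemes by BIC, as done here with PartitionFinder, amounts to computing BIC = k ln(n) - 2 ln L for each scheme (k free parameters, n data points, maximized likelihood L) and preferring the lowest value. The sketch below assumes n is the number of alignment sites; the scheme names, log-likelihoods, and parameter counts are invented for illustration and are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower values indicate a better fit/complexity trade-off."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical schemes: name -> (maximized log-likelihood, number of free parameters)
schemes = {
    "by_gene_and_codon": (-25000.0, 60),
    "by_rate_bins": (-24800.0, 45),
}
n_sites = 3000  # assumed sample size: number of alignment sites

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in schemes.items()}
best = min(scores, key=scores.get)
```

The rate-based scheme wins in this toy case because it has both a higher log-likelihood and fewer free parameters; in real comparisons the trade-off between fit and parameter count decides the outcome.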
Affiliation(s)
- Jadranka Rota
- Department of Biology, Lund University, Lund, Sweden
- Tobias Malm
- Department of Zoology, Swedish Museum of Natural History, Stockholm, Sweden
- Carlos Peña
- HipLead, San Francisco, CA, United States of America
22
Babbitt GA, Mortensen JS, Coppola EE, Adams LE, Liao JK. DROIDS 1.20: A GUI-Based Pipeline for GPU-Accelerated Comparative Protein Dynamics. Biophys J 2018; 114:1009-1017. [PMID: 29539389 PMCID: PMC5883555 DOI: 10.1016/j.bpj.2018.01.020] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/04/2017] [Revised: 01/04/2018] [Accepted: 01/22/2018] [Indexed: 11/29/2022]
Abstract
Traditional informatics in comparative genomics works only with static representations of biomolecules (i.e., sequence and structure), thereby ignoring the molecular dynamics (MD) of proteins that define function in the cell. A comparative approach applied to MD would connect this very short timescale process, defined in femtoseconds, to one of the longest in the universe: molecular evolution measured in millions of years. Here, we leverage advances in graphics-processing-unit-accelerated MD simulation software to develop a comparative method of MD analysis and visualization that can be applied to any two homologous Protein Data Bank structures. Our open-source pipeline, DROIDS (Detecting Relative Outlier Impacts in Dynamic Simulations), works in conjunction with existing molecular modeling software to convert any Linux gaming personal computer into a "comparative computational microscope" for observing the biophysical effects of mutations and other chemical changes in proteins. DROIDS implements structural alignment and Benjamini-Hochberg-corrected Kolmogorov-Smirnov statistics to compare nanosecond-scale atom bond fluctuations on the protein backbone, color mapping the significant differences identified in protein MD with single-amino-acid resolution. DROIDS is simple to use, incorporating graphical user interface control for Amber16 MD simulations, cpptraj analysis, and the final statistical and visual representations in R graphics and UCSF Chimera. We demonstrate that DROIDS can be used to visually investigate molecular evolution and disease-related functional changes in MD due to genetic mutation and epigenetic modification. DROIDS can also potentially be used to investigate binding interactions of pharmaceuticals, toxins, or other biomolecules in a functional evolutionary context.
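The multiple-testing step mentioned above, Benjamini-Hochberg correction of per-residue p-values, can be sketched as follows. In a DROIDS-like workflow each p-value would come from a two-sample Kolmogorov-Smirnov test comparing fluctuation distributions at one residue in the two structures (for example via `scipy.stats.ks_2samp`); here the p-values are hypothetical and `benjamini_hochberg` is an illustrative stand-alone implementation, not DROIDS code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank 1 is the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-residue KS-test p-values.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

Only the first residue remains significant at a false discovery rate of 0.05, even though three of the raw p-values fall below 0.05.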
Affiliation(s)
- Gregory A Babbitt
- T.H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York.
- Jamie S Mortensen
- Department of Biomedical Engineering, Rochester Institute of Technology, Rochester, New York
- Erin E Coppola
- Department of Biomedical Engineering, Rochester Institute of Technology, Rochester, New York
- Lily E Adams
- T.H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York
- Justin K Liao
- Department of Biomedical Engineering, Rochester Institute of Technology, Rochester, New York
23
Triplet-Based Codon Organization Optimizes the Impact of Synonymous Mutation on Nucleic Acid Molecular Dynamics. J Mol Evol 2018; 86:91-102. [PMID: 29344693 PMCID: PMC5846835 DOI: 10.1007/s00239-018-9828-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/26/2017] [Accepted: 01/06/2018] [Indexed: 11/22/2022]
Abstract
Since the elucidation of the genetic code almost 50 years ago, many nonrandom aspects of its codon organization remain only partly resolved. Here, we investigate the recent hypothesis of ‘dual-use’ codons which proposes that in addition to allowing adjustment of codon optimization to tRNA abundance, the degeneracy in the triplet-based genetic code also multiplexes information regarding DNA’s helical shape and protein-binding dynamics while avoiding interference with other protein-level characteristics determined by amino acid properties. How such structural optimization of the code within eukaryotic chromatin could have arisen from an RNA world is a mystery, but would imply some preadaptation in an RNA context. We analyzed synonymous (protein-silent) and nonsynonymous (protein-altering) mutational impacts on molecular dynamics in 13823 identically degenerate alternative codon reorganizations, defined by codon transitions in 7680 GPU-accelerated molecular dynamic simulations of implicitly and explicitly solvated double-stranded aRNA and bDNA structures. When compared to all possible alternative codon assignments, the standard genetic code minimized the impact of synonymous mutations on the random atomic fluctuations and correlations of carbon backbone vector trajectories while facilitating the specific movements that contribute to DNA polymer flexibility. This trend was notably stronger in the context of RNA supporting the idea that dual-use codon optimization and informational multiplexing in DNA resulted from the preadaptation of the RNA duplex to resist changes to thermostability. The nonrandom and divergent molecular dynamics of synonymous mutations also imply that the triplet-based code may have resulted from adaptive functional expansion enabling a primordial doublet code to multiplex gene regulatory information via the shape and charge of the minor groove.
24
Arenas M, Araujo NM, Branco C, Castelhano N, Castro-Nallar E, Pérez-Losada M. Mutation and recombination in pathogen evolution: Relevance, methods and controversies. Infect Genet Evol 2017; 63:295-306. [PMID: 28951202 DOI: 10.1016/j.meegid.2017.09.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/26/2017] [Revised: 09/20/2017] [Accepted: 09/21/2017] [Indexed: 02/06/2023]
Abstract
Mutation and recombination drive the evolution of most pathogens by generating the genetic variants upon which selection operates. Those variants can, for example, confer resistance to host immune systems and drug therapies or lead to epidemic outbreaks. Given their importance, diverse evolutionary studies have investigated the abundance and consequences of mutation and recombination in pathogen populations. However, some controversies persist regarding the contribution of each evolutionary force to particular phenotypic observations (e.g., drug resistance). In this study, we review the importance of mutation and recombination in the evolution of pathogens at both intra-host and inter-host levels. We also describe state-of-the-art analytical methodologies to detect and quantify these two evolutionary forces, including biases that are often ignored in evolutionary studies. Finally, we present some of our previous studies involving pathogenic taxa where mutation and recombination played crucial roles in the recovery of pathogenic fitness, the generation of interspecific genetic diversity, or the design of centralized vaccines. This review also illustrates several common controversies and pitfalls in the analysis, evaluation and interpretation of mutation and recombination outcomes.
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain; Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Natalia M Araujo
- Laboratory of Molecular Virology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil.
- Catarina Branco
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Nadine Castelhano
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago, Chile.
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, Washington, DC, United States; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal.
25
Mishra C, Kumar S, Panigrahi M, Yathish HM, Chaudhary R, Chauhan A, Kumar A, Sonawane AA. Single Nucleotide Polymorphisms in 5' Upstream Region of Bovine TLR4 Gene Affecting Expression Profile and Transcription Factor Binding Sites. Anim Biotechnol 2017; 29:119-128. [PMID: 28594279 DOI: 10.1080/10495398.2017.1326929] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022]
Abstract
The present study of the 5' upstream region of the TLR4 gene revealed four Single Nucleotide Polymorphisms (SNPs) in Vrindavani and Tharparkar cattle. The polymorphic information content (PIC), heterozygosity and allelic diversity values were low to moderate for these SNPs. In Vrindavani cattle, one SNP was found to be in Hardy-Weinberg Equilibrium (HWE) and the remaining three were found to be in linkage disequilibrium (LD) as indicated statistically (P > 0.05). In Tharparkar cattle, two SNPs were found to be in HWE and were not in LD as indicated statistically (P > 0.05). These SNPs were used to construct haplotypes. In silico analysis of these SNPs predicted the abolition of eight transcription factor binding sites and the creation of eight new sites. Quantitative real-time PCR analysis did not show any significant variation in gene expression among haplotypes. However, gene expression differed significantly between breeds (P < 0.05), which suggested that the upstream region of the bovine TLR4 gene has a crucial role in its expression. These findings offer essential evidence that can be useful in future research exploring the role of TLR4 in immunity. TLR4 can be used as a marker for selection for disease resistance in bovines.
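The diversity statistics named in this abstract have standard closed forms for a biallelic SNP: expected heterozygosity H = 1 - Σ p_i², the polymorphic information content of Botstein et al. (1980), PIC = H - Σ_{i<j} 2 p_i² p_j², and a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium. The sketch below assumes genotype counts coded as `AA`, `AB` and `BB`; the counts are hypothetical, not data from the study, and linkage disequilibrium is not computed.

```python
def allele_frequencies(genotype_counts):
    """Allele frequencies (p, q) from counts of AA, AB and BB genotypes."""
    n = sum(genotype_counts.values())
    p = (2 * genotype_counts["AA"] + genotype_counts["AB"]) / (2 * n)
    return p, 1 - p


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1 - sum(f * f for f in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980)."""
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return heterozygosity(freqs) - correction


def hwe_chi_square(genotype_counts):
    """Chi-square statistic comparing observed genotypes with HWE expectations."""
    n = sum(genotype_counts.values())
    p, q = allele_frequencies(genotype_counts)
    expected = {"AA": p * p * n, "AB": 2 * p * q * n, "BB": q * q * n}
    return sum((genotype_counts[g] - expected[g]) ** 2 / expected[g] for g in expected)


# Hypothetical genotype counts for one SNP.
counts = {"AA": 50, "AB": 40, "BB": 10}
p, q = allele_frequencies(counts)
h = heterozygosity((p, q))
pic_value = pic((p, q))
chi2 = hwe_chi_square(counts)
```

With these counts p = 0.7, H = 0.42 and PIC ≈ 0.332; a chi-square value below 3.84 (the 5% critical value for one degree of freedom) is consistent with HWE. For a biallelic marker, PIC cannot exceed 0.375, reached at equal allele frequencies, so values in the low-to-moderate range are expected for SNPs.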
Affiliation(s)
- Chinmoy Mishra
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Subodh Kumar
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Manjit Panigrahi
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- H M Yathish
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Rajni Chaudhary
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Anuj Chauhan
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Amit Kumar
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
- Arvind A Sonawane
- Department of Animal Genetics, Indian Veterinary Research Institute, Uttar Pradesh, India
| |
Collapse
|
26
Teufel AI, Wilke CO. Accelerated simulation of evolutionary trajectories in origin-fixation models. J R Soc Interface 2017; 14:20160906. [PMID: 28228542 PMCID: PMC5332577 DOI: 10.1098/rsif.2016.0906] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/10/2016] [Accepted: 01/31/2017] [Indexed: 11/12/2022]
Abstract
We present an accelerated algorithm to forward-simulate origin-fixation models. Our algorithm requires, on average, only about two fitness evaluations per fixed mutation, whereas traditional algorithms require, per fixed mutation, a number of fitness evaluations on the order of the effective population size, Ne. Our accelerated algorithm yields exactly the same steady state as the original algorithm but produces a different order of fixed mutations. By comparing several relevant evolutionary metrics, such as the distribution of fixed selection coefficients and the probability of reversion, we find that the two algorithms behave equivalently in many respects. However, the accelerated algorithm yields less variance in fixed selection coefficients. Notably, we are able to recover the expected amount of variance by rescaling population size, and we find a linear relationship between the rescaled population size and the population size used by the original algorithm. Considering the widespread usage of origin-fixation simulations across many areas of evolutionary biology, we introduce our accelerated algorithm as a useful tool for increasing the computational complexity of fitness functions without sacrificing much in terms of accuracy of the evolutionary simulation.
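To make the cost comparison concrete, here is a minimal sketch of the traditional origin-fixation loop that the abstract contrasts with; it is not the authors' accelerated algorithm. It assumes a haploid population, uses one common diffusion-approximation form of Kimura's fixation probability, and scores sequences with a hypothetical toy fitness function (toy_fitness and TARGET are invented for illustration). Every proposed mutation costs one fitness evaluation, and because a neutral mutation fixes with probability only 1/Ne, most proposals are rejected when Ne is large.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"  # hypothetical optimum, 30 residues

def toy_fitness(seq: str) -> float:
    """Hypothetical fitness: decays with the number of mismatches to TARGET."""
    mismatches = sum(a != b for a, b in zip(seq, TARGET))
    return math.exp(-0.05 * mismatches)

def fixation_probability(s: float, n_e: int) -> float:
    """Kimura's fixation probability for one new mutant (initial frequency
    1/n_e) in a haploid population of effective size n_e."""
    if abs(s) < 1e-12:
        return 1.0 / n_e  # neutral limit
    exponent = -2.0 * n_e * s
    if exponent > 700:  # strongly deleterious: probability is effectively zero
        return 0.0
    # (1 - e^{-2s}) / (1 - e^{-2 n_e s}), written with expm1 for precision
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def simulate(seq: str, n_e: int, n_fixations: int, rng: random.Random):
    """Propose random point mutations and accept each with its fixation
    probability; return the final sequence and the number of fitness calls."""
    current_fitness = toy_fitness(seq)
    evaluations, fixed = 1, 0
    while fixed < n_fixations:
        pos = rng.randrange(len(seq))
        new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
        mutant = seq[:pos] + new_aa + seq[pos + 1:]
        mutant_fitness = toy_fitness(mutant)
        evaluations += 1  # one fitness evaluation per proposed mutation
        s = mutant_fitness / current_fitness - 1.0  # selection coefficient
        if rng.random() < fixation_probability(s, n_e):
            seq, current_fitness = mutant, mutant_fitness
            fixed += 1
    return seq, evaluations

rng = random.Random(1)
start = "".join(rng.choice(AMINO_ACIDS) for _ in TARGET)
final, calls = simulate(start, n_e=100, n_fixations=10, rng=rng)
print(f"{calls} fitness evaluations for 10 fixed mutations")
```

The seeded random.Random instance keeps the run reproducible; swapping in a different fitness function only requires changing toy_fitness.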
Affiliation(s)
- Ashley I Teufel
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
- Claus O Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
27
Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 2017; 42:59-66. [DOI: 10.1016/j.sbi.2016.10.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 08/23/2016] [Revised: 10/19/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
28
Redondo RAF, de Vladar HP, Włodarski T, Bollback JP. Evolutionary interplay between structure, energy and epistasis in the coat protein of the ϕX174 phage family. J R Soc Interface 2017; 14:20160139. [PMID: 28053111 PMCID: PMC5310724 DOI: 10.1098/rsif.2016.0139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/06/2016] [Accepted: 11/29/2016] [Indexed: 01/01/2023]
Abstract
Viral capsids are structurally constrained by interactions among the amino acids (AAs) of their constituent proteins. Therefore, epistasis is expected to evolve among physically interacting sites and to influence the rates of substitution. To study the evolution of epistasis, we focused on the major structural protein of the ϕX174 phage family by first reconstructing the ancestral protein sequences of 18 species using a Bayesian statistical framework. The inferred ancestral reconstruction differed at eight AAs, for a total of 256 possible ancestral haplotypes. For each ancestral haplotype and the extant species, we estimated, in silico, the distribution of free energies and epistasis of the capsid structure. We found that free energy has not significantly increased, but epistasis has. We decomposed epistasis up to fifth order and found that higher-order epistasis sometimes compensates for pairwise interactions, making the free energy appear additive. The dN/dS ratio is low, suggesting strong purifying selection and that the structure is under stabilizing selection. We synthesized phages carrying ancestral haplotypes of the coat protein gene and measured their fitness experimentally. Our findings indicate that stabilizing mutations can have higher fitness, and that fitness optima do not necessarily coincide with energy minima.
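The count of 256 haplotypes is what one gets if each of the eight uncertain sites has two candidate residues (2^8 = 256). The sketch below enumerates those haplotypes and computes a pairwise epistasis term as the deviation of a double mutant from additivity, E(11) - E(10) - E(01) + E(00). The energy model is a hypothetical toy (ADDITIVE and COUPLING are made-up numbers, not estimates from the study), and the sketch stops at second order, whereas the study decomposes epistasis up to fifth order.

```python
from itertools import product

N_SITES = 8  # uncertain positions in the reconstructed ancestor

# Assumption: each uncertain site has two candidate residues, coded 0 and 1.
haplotypes = list(product((0, 1), repeat=N_SITES))
print(len(haplotypes))  # 256 = 2 ** 8

# Hypothetical toy energy model: additive per-site terms plus one pairwise coupling.
ADDITIVE = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.0, 0.6]
COUPLING = {(0, 1): -0.8}

def energy(h):
    """Toy free energy of haplotype h (arbitrary units)."""
    e = sum(a * x for a, x in zip(ADDITIVE, h))
    for (i, j), c in COUPLING.items():
        e += c * h[i] * h[j]
    return e

def pairwise_epistasis(i, j, background):
    """E(11) - E(10) - E(01) + E(00) for sites i and j in a given background."""
    def with_sites(xi, xj):
        h = list(background)
        h[i], h[j] = xi, xj
        return energy(h)
    return with_sites(1, 1) - with_sites(1, 0) - with_sites(0, 1) + with_sites(0, 0)

ancestor = (0,) * N_SITES
print(pairwise_epistasis(0, 1, ancestor))  # recovers the coupling, about -0.8
print(pairwise_epistasis(2, 3, ancestor))  # no coupling between 2 and 3, about 0
```

In a purely additive model every pairwise term is zero, so a nonzero value flags an interaction between the two sites.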
Affiliation(s)
- Harold P de Vladar
- IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria
- Center for the Conceptual Foundations of Science, Parmenides Foundation, 82049 Pullach, Germany
- Tomasz Włodarski
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
29
Meyer AG, Wilke CO. The utility of protein structure as a predictor of site-wise dN/dS varies widely among HIV-1 proteins. J R Soc Interface 2015; 12:20150579. [PMID: 26468068 DOI: 10.1098/rsif.2015.0579] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Protein structure acts as a general constraint on the evolution of viral proteins. One widely recognized structural constraint explaining evolutionary variation among sites is the relative solvent accessibility (RSA) of residues in the folded protein. In influenza virus, the distance from functional sites has been found to explain an additional portion of the evolutionary variation in the external antigenic proteins. However, to what extent RSA and distance from a reference site in the protein can be used more generally to explain protein adaptation in other viruses and in the different proteins of any given virus remains an open question. To address this question, we have carried out an analysis of the distribution and structural predictors of site-wise dN/dS in HIV-1. Our results indicate that the distribution of dN/dS in HIV follows a smooth gamma distribution, with no special enrichment or depletion of sites with dN/dS at or above one. The variation in dN/dS can be partially explained by RSA and distance from a reference site in the protein, but these structural constraints do not act uniformly among the different HIV-1 proteins. Structural constraints are highly predictive in just one of the three enzymes and one of three structural proteins in HIV-1. For these two proteins, the protease enzyme and the gp120 structural protein, structure explains between 30 and 40% of the variation in dN/dS. Finally, for the gp120 protein of the receptor-binding complex, we also find that glycosylation sites explain just 2% of the variation in dN/dS and do not explain gp120 evolution independently of either RSA or distance from the apical surface.
Affiliation(s)
- Austin G Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA; Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA; School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Claus O Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA; Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA
30
Jack BR, Meyer AG, Echave J, Wilke CO. Functional Sites Induce Long-Range Evolutionary Constraints in Enzymes. PLoS Biol 2016; 14:e1002452. [PMID: 27138088 PMCID: PMC4854464 DOI: 10.1371/journal.pbio.1002452] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 11/23/2015] [Accepted: 04/04/2016] [Indexed: 12/26/2022]
Abstract
Functional residues in proteins tend to be highly conserved over evolutionary time. However, to what extent functional sites impose evolutionary constraints on nearby or even more distant residues is not known. Here, we report pervasive conservation gradients toward catalytic residues in a dataset of 524 distinct enzymes: evolutionary conservation decreases approximately linearly with increasing distance to the nearest catalytic residue in the protein structure. This trend encompasses, on average, 80% of the residues in any enzyme, and it is independent of known structural constraints on protein evolution such as residue packing or solvent accessibility. Further, the trend exists in both monomeric and multimeric enzymes and irrespective of enzyme size and/or location of the active site in the enzyme structure. By contrast, sites in protein-protein interfaces, unlike catalytic residues, are only weakly conserved and induce only minor rate gradients. In aggregate, these observations show that functional sites, and in particular catalytic residues, induce long-range evolutionary constraints in enzymes.
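A minimal sketch of how such a gradient can be quantified: compute each residue's distance to the nearest catalytic residue, then fit a straight line of a per-site evolutionary rate against that distance by ordinary least squares. The coordinates and rates below are synthetic placeholders constructed to lie on a line, not data from the study, and the helper names are hypothetical. A rate that rises with distance corresponds to conservation that falls with distance.

```python
import math

def nearest_catalytic_distance(coords, catalytic):
    """Distance (in angstroms) from each residue to its closest catalytic residue."""
    return [min(math.dist(c, coords[k]) for k in catalytic) for c in coords]

def least_squares(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic placeholder structure: 10 residues spaced 3.8 angstroms apart on one
# axis, with residue 0 as the only catalytic residue.
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
distances = nearest_catalytic_distance(coords, catalytic=[0])

# Made-up relative rates that rise linearly with distance (conservation falls).
rates = [0.2 + 0.05 * d for d in distances]

slope, intercept = least_squares(distances, rates)
print(f"rate ~ {intercept:.2f} + {slope:.3f} * distance")
```

With real data, the per-site rates or conservation scores would first have to be inferred from a sequence alignment; that step is outside this sketch.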
Affiliation(s)
- Benjamin R. Jack
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Austin G. Meyer
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
- Claus O. Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
31
Ragsdale AP, Coffman AJ, Hsieh P, Struck TJ, Gutenkunst RN. Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations. Genetics 2016; 203:513-23. [PMID: 27029732 PMCID: PMC4858796 DOI: 10.1534/genetics.115.184812] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 11/13/2015] [Accepted: 03/19/2016] [Indexed: 12/27/2022]
Abstract
The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.
Affiliation(s)
- Aaron P Ragsdale
- Program in Applied Mathematics, University of Arizona, Tucson, Arizona 85721
- Alec J Coffman
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona 85721
- PingHsun Hsieh
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Travis J Struck
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona 85721
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona 85721
32
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022]
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of evolutionary analysis at the molecular level. Almost 40 years after the development of the first substitution models, highly sophisticated and data-specific substitution models continue to emerge with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including the advantages and pitfalls of the most popular models. The perspective concludes that, despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
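As a concrete instance of the simplest DNA substitution model, consider the Jukes-Cantor (JC69) model, a textbook example that the abstract does not itself discuss. It assumes equal base frequencies and a single substitution rate. The sketch below gives its transition probabilities as a function of the branch length d, in expected substitutions per site, together with the matching distance correction from an observed proportion p of differing sites; the function names are our own.

```python
import math

def jc69_probabilities(d):
    """Return (P(same base), P(one specific different base)) after branch length d."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

def jc69_distance(p):
    """Convert an observed proportion of differing sites p into a JC69 distance d."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

same, diff = jc69_probabilities(0.1)
print(same + 3 * diff)            # each row of the transition matrix sums to 1
print(jc69_distance(1.0 - same))  # recovers the branch length 0.1
```

For small d the proportion of differing sites is close to d, but it saturates at 0.75, which is why the logarithmic correction matters for divergent sequences.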
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
33
Tripathi S, Waxham MN, Cheung MS, Liu Y. Lessons in Protein Design from Combined Evolution and Conformational Dynamics. Sci Rep 2015; 5:14259. [PMID: 26388515 PMCID: PMC4585694 DOI: 10.1038/srep14259] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/31/2015] [Accepted: 08/21/2015] [Indexed: 11/09/2022]
Abstract
Protein-protein interactions play important roles in the control of every cellular process. How natural selection has optimized protein design to produce molecules capable of binding to many partner proteins is a fascinating problem but not well understood. Here, we performed a combinatorial analysis of protein sequence evolution and conformational dynamics to study how calmodulin (CaM), which plays essential roles in calcium signaling pathways, has adapted to bind to a large number of partner proteins. We discovered that amino acid residues in CaM can be partitioned into unique classes according to their degree of evolutionary conservation and local stability. Holistically, categorization of CaM residues into these classes reveals enriched physico-chemical interactions required for binding to diverse targets, balanced against the need to maintain the folding and structural modularity of CaM to achieve its overall function. The sequence-structure-function relationship of CaM provides a concrete example of the general principle of protein design. We have demonstrated the synergy between the fields of molecular evolution and protein biophysics and created a generalizable framework broadly applicable to the study of protein-protein interactions.
Affiliation(s)
- Swarnendu Tripathi
- Department of Physics, University of Houston, Houston, TX; Center for Theoretical Biological Physics, Rice University, Houston, TX
- M Neal Waxham
- Department of Neurobiology and Anatomy, University of Texas Health Science Center, Houston, TX
- Margaret S Cheung
- Department of Physics, University of Houston, Houston, TX; Center for Theoretical Biological Physics, Rice University, Houston, TX
- Yin Liu
- Department of Neurobiology and Anatomy, University of Texas Health Science Center, Houston, TX
34
Bar-Rogovsky H, Stern A, Penn O, Kobl I, Pupko T, Tawfik DS. Assessing the prediction fidelity of ancestral reconstruction by a library approach. Protein Eng Des Sel 2015; 28:507-18. [DOI: 10.1093/protein/gzv038] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/16/2015] [Accepted: 07/20/2015] [Indexed: 11/13/2022]
35
Contingency and entrenchment in protein evolution under purifying selection. Proc Natl Acad Sci U S A 2015; 112:E3226-35. [PMID: 26056312 DOI: 10.1073/pnas.1412933112] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Indexed: 01/06/2023]
Abstract
The phenotypic effect of an allele at one genetic site may depend on alleles at other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations and shape the patterns of protein divergence across species. Whereas epistasis between adaptive substitutions has been studied extensively, relatively little is known about epistasis under purifying selection. Here we use computational models of thermodynamic stability in a ligand-binding protein to explore the structure of epistasis in simulations of protein sequence evolution. Even though the predicted effects on stability of random mutations are almost completely additive, the mutations that fix under purifying selection are enriched for epistasis. In particular, the mutations that fix are contingent on previous substitutions: Although nearly neutral at their time of fixation, these mutations would be deleterious in the absence of preceding substitutions. Conversely, substitutions under purifying selection are subsequently entrenched by epistasis with later substitutions: They become increasingly deleterious to revert over time. Our results imply that, even under purifying selection, protein sequence evolution is often contingent on history and so it cannot be predicted by the phenotypic effects of mutations assayed in the ancestral background.
36
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2014; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023]
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for the marginal stability of natural globular proteins and how the requirement for kinetic stability and the avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
37
Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015; 32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]
Abstract
Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained by taking into account misfolded conformations yield a larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce, on average, stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented a smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates by combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns a larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented into the computer program Prot_Evol that is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
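The model comparison above relies on the Kullback-Leibler divergence between an observed site-specific amino acid profile P and a model distribution Q, D(P || Q) = sum over a of P(a) ln[P(a)/Q(a)]. A minimal sketch follows. The direction (observed profile relative to model) is our reading of the abstract, and the three-residue profiles are hypothetical placeholders rather than real 20-state amino acid data.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) in nats for distributions given as {state: probability}.
    States with P(a) = 0 contribute nothing; Q(a) = 0 where P(a) > 0 gives infinity."""
    total = 0.0
    for state, p_a in p.items():
        if p_a == 0.0:
            continue
        q_a = q.get(state, 0.0)
        if q_a == 0.0:
            return math.inf
        total += p_a * math.log(p_a / q_a)
    return total

# Hypothetical observed profile at one site and two hypothetical model distributions.
observed = {"L": 0.6, "I": 0.3, "V": 0.1}
models = {
    "model_a": {"L": 0.5, "I": 0.3, "V": 0.2},
    "model_b": {"L": 0.2, "I": 0.4, "V": 0.4},
}

scores = {name: kl_divergence(observed, q) for name, q in models.items()}
closest = min(scores, key=scores.get)
print(scores)
print("closest model:", closest)
```

A smaller divergence means the model distribution is closer to the observed profile; the measure is not symmetric, so swapping P and Q generally changes the value.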
Affiliation(s)
- Miguel Arenas
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Agustin Sánchez-Cobos
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Ugo Bastolla
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
38
Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 2014; 26:84-91. [PMID: 24952216 DOI: 10.1016/j.sbi.2014.05.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 02/03/2014] [Revised: 04/19/2014] [Accepted: 05/16/2014] [Indexed: 11/24/2022]
Abstract
The variation among sequences and structures in nature is determined both by physical laws and by evolutionary history. However, these two factors are traditionally investigated by disciplines with different emphases and philosophies: molecular biophysics on the one hand and evolutionary population genetics on the other. Here, we review recent theoretical and computational approaches that address the crucial need to integrate these two disciplines. We first articulate the elements of these approaches. Then, we survey their contribution to our mechanistic understanding of molecular evolution, the polymorphisms in coding regions, the distribution of fitness effects (DFE) of mutations, the observed folding stability of proteins in nature, and the distribution of protein folds in genomes.
39
Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 2014; 4:291-314. [PMID: 24970217 PMCID: PMC4030984 DOI: 10.3390/biom4010291] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/25/2013] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 12/31/2022]
Abstract
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We start by discussing positive design, which strengthens the stability of the folded state against the unfolded state. Positive design explains why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. By contrast, negative design, which consists of destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low-frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformational change.
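One common way to turn motif frequencies into a statistical potential is the inverse Boltzmann relation, E(m) = -kT ln[f_obs(m) / f_ref(m)], where f_ref is the frequency expected in a reference state. The sketch below applies it to hypothetical contact counts between hydrophobic (H) and polar (P) residues with kT = 1 (arbitrary units). The counts, the reference frequencies, and the function name are invented for illustration, and the choice of reference state is itself a modeling assumption.

```python
import math

def statistical_potential(observed_counts, reference_freqs, kT=1.0):
    """Inverse-Boltzmann energies: E(m) = -kT * ln(f_obs(m) / f_ref(m))."""
    total = sum(observed_counts.values())
    return {
        motif: -kT * math.log((count / total) / reference_freqs[motif])
        for motif, count in observed_counts.items()
    }

# Hypothetical counts of residue-residue contacts in a set of native structures.
counts = {"HH": 60, "HP": 30, "PP": 10}
# Hypothetical reference: frequencies expected if H and P (50% each) paired at random.
reference = {"HH": 0.25, "HP": 0.50, "PP": 0.25}

energies = statistical_potential(counts, reference)
for motif, e in energies.items():
    print(f"{motif}: {e:+.3f}")
```

In this sketch, motifs observed more often than the reference predicts receive negative (favorable) energies, and under-represented motifs receive positive ones.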
40
Harms MJ, Thornton JW. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat Rev Genet 2013; 14:559-71. [PMID: 23864121 DOI: 10.1038/nrg3540] [Citation(s) in RCA: 236] [Impact Index Per Article: 21.5] [Indexed: 01/31/2023]
Abstract
The repertoire of proteins and nucleic acids in the living world is determined by evolution; their properties are determined by the laws of physics and chemistry. Explanations of these two kinds of causality - the purviews of evolutionary biology and biochemistry, respectively - are typically pursued in isolation, but many fundamental questions fall squarely at the interface of fields. Here we articulate the paradigm of evolutionary biochemistry, which aims to dissect the physical mechanisms and evolutionary processes by which biological molecules diversified and to reveal how their physical architecture facilitates and constrains their evolution. We show how an integration of evolution with biochemistry moves us towards a more complete understanding of why biological molecules have the properties that they do.
Affiliation(s)
- Michael J Harms
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, USA
41
Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013; 29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
Motivation: Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. A simulation evolutionary framework that integrates substitution models that account for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes.
Results: We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology 'Severo Ochoa', Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
42
Ragan MA, Chan CX. Biological Intuition in Alignment-Free Methods: Response to Posada. J Mol Evol 2013; 77:1-2. [DOI: 10.1007/s00239-013-9573-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/18/2013] [Accepted: 07/04/2013] [Indexed: 10/26/2022]
43
Posada D. Phylogenetic models of molecular evolution: next-generation data, fit, and performance. J Mol Evol 2013; 76:351-2. [PMID: 23695649 DOI: 10.1007/s00239-013-9566-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 05/10/2013] [Accepted: 05/13/2013] [Indexed: 11/24/2022]
44
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
45
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. In: Models and Algorithms for Genome Evolution. 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]