1. Rees J, Sarangi G, Cheng Q, Floor M, Andrés AM, Oliva Miguel B, Villà-Freixa J, Arnér ESJ, Castellano S. Ancient Loss of Catalytic Selenocysteine Spurred Convergent Adaptation in a Mammalian Oxidoreductase. Genome Biol Evol 2024; 16:evae041. [PMID: 38447079 PMCID: PMC10958145 DOI: 10.1093/gbe/evae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/14/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024]
Abstract
Selenocysteine, the 21st amino acid specified by the genetic code, is a rare selenium-containing residue found in the catalytic site of selenoprotein oxidoreductases. Selenocysteine is analogous to the common cysteine amino acid, but its selenium atom offers physical-chemical properties not provided by the corresponding sulfur atom in cysteine. Catalytic sites with selenocysteine in selenoproteins of vertebrates are under strong purifying selection, but one enzyme, glutathione peroxidase 6 (GPX6), independently exchanged selenocysteine for cysteine <100 million years ago in several mammalian lineages. We reconstructed and assayed these ancient enzymes before and after selenocysteine was lost and up to today and found them to have lost their classic ability to reduce hydroperoxides using glutathione. This loss of function, however, was accompanied by additional amino acid changes in the catalytic domain, with protein sites concertedly changing under positive selection across distant lineages abandoning selenocysteine in glutathione peroxidase 6. This demonstrates a narrow evolutionary range in maintaining fitness when sulfur in cysteine impairs the catalytic activity of this protein, with pleiotropy and epistasis likely driving the observed convergent evolution. We propose that the mutations shared across distinct lineages may trigger enzymatic properties beyond those in classic glutathione peroxidases, rather than simply recovering catalytic rate. These findings are an unusual example of adaptive convergence across mammalian selenoproteins, with the evolutionary signatures possibly representing the evolution of novel oxidoreductase functions.
Affiliation(s)
- Jasmin Rees
  - Great Ormond Street Institute of Child Health, University College London, London, UK
  - Division of Biosciences, University College London, London, UK
- Gaurab Sarangi
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Qing Cheng
  - Division of Biochemistry, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Martin Floor
  - Department of Biosciences, Faculty of Sciences and Technology, Universitat de Vic—Universitat Central de Catalunya, Vic, Spain
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Aida M Andrés
  - Division of Biosciences, University College London, London, UK
- Baldomero Oliva Miguel
  - Department of Health and Experimental Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Jordi Villà-Freixa
  - Department of Biosciences, Faculty of Sciences and Technology, Universitat de Vic—Universitat Central de Catalunya, Vic, Spain
  - Institut de Recerca i Innovació en Ciències de la Vida i de la Salut a la Catalunya Central (IRIS-CC), Vic, Spain
- Elias S J Arnér
  - Division of Biochemistry, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
  - Department of Selenoprotein Research, National Institute of Oncology, Budapest, Hungary
- Sergi Castellano
  - Great Ormond Street Institute of Child Health, University College London, London, UK
  - UCL Genomics, University College London, London, UK
2. Lim D, Baek C, Blanchette M. Graphylo: A deep learning approach for predicting regulatory DNA and RNA sites from whole-genome multiple alignments. iScience 2024; 27:109002. [PMID: 38362268 PMCID: PMC10867641 DOI: 10.1016/j.isci.2024.109002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/17/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024]
Abstract
This study focuses on enhancing the prediction of regulatory functional sites in DNA and RNA sequences, a crucial aspect of gene regulation. Current methods, such as motif overrepresentation and machine learning, often lack specificity. To address this issue, the study leverages evolutionary information and introduces Graphylo, a deep-learning approach for predicting transcription factor binding sites in the human genome. Graphylo combines Convolutional Neural Networks for DNA sequences with Graph Convolutional Networks on phylogenetic trees, using information from placental mammals' genomes and evolutionary history. The research demonstrates that Graphylo consistently outperforms both single-species deep learning techniques and methods that incorporate inter-species conservation scores on a wide range of datasets. It achieves this by utilizing a species-based attention model for evolutionary insights and an integrated gradient approach for nucleotide-level model interpretability. This innovative approach offers a promising avenue for improving the accuracy of regulatory site prediction in genomics.
3. Jowkar G, Pečerska J, Maiolo M, Gil M, Anisimova M. ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process. Syst Biol 2022:6648472. [PMID: 35866991 DOI: 10.1093/sysbio/syac050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Accepted: 07/06/2022] [Indexed: 11/12/2022]
Abstract
Modern phylogenetic methods allow inference of ancestral molecular sequences given an alignment and phylogeny relating present day sequences. This provides insight into the evolutionary history of molecules, helping to understand gene function and to study biological processes such as adaptation and convergent evolution across a variety of applications. Here we propose a dynamic programming algorithm for fast joint likelihood-based reconstruction of ancestral sequences under the Poisson Indel Process (PIP). Unlike previous approaches, our method, named ARPIP, enables the reconstruction with insertions and deletions based on an explicit indel model. Consequently, inferred indel events have an explicit biological interpretation. Likelihood computation is achieved in linear time with respect to the number of sequences. Our method consists of two steps, namely finding the most probable indel points and reconstructing ancestral sequences. First, we find the most likely indel points and prune the phylogeny to reflect the insertion and deletion events per site. Second, we infer the ancestral states on the pruned subtree in a manner similar to FastML. We applied ARPIP on simulated datasets and on real data from the Betacoronavirus genus. ARPIP reconstructs both the indel events and substitutions with a high degree of accuracy. Our method fares well when compared to established state-of-the-art methods such as FastML and PAML. Moreover, the method can be extended to explore both optimal and suboptimal reconstructions, include rate heterogeneity through time and more. We believe it will expand the range of novel applications of ancestral sequence reconstruction.
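As a rough illustration of the second step described in the abstract above (inferring ancestral states on a tree by likelihood), the sketch below runs Felsenstein's pruning algorithm on a single alignment column and reports the marginal posterior of each base at the root. It is not ARPIP: it assumes the Jukes-Cantor substitution model rather than the Poisson Indel Process, it ignores insertions and deletions entirely, and it computes a marginal rather than a joint reconstruction. The tree encoding, the function names, and the example tree are all hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """Jukes-Cantor transition probability over a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node):
    """Return {base: P(leaves below node | node carries this base)}.

    Assumed encoding: a leaf is a one-character string; an internal node
    is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child)
        for b in BASES:
            # Sum over the child's possible states, weighted by the
            # probability of changing from b to that state along the branch.
            result[b] *= sum(jc69_prob(b == c, t) * child_l[c] for c in BASES)
    return result


def root_posteriors(tree):
    """Posterior of each base at the root under a uniform prior."""
    lik = partial_likelihoods(tree)
    total = sum(0.25 * v for v in lik.values())
    return {b: 0.25 * v / total for b, v in lik.items()}


# Hypothetical one-site example: ((A:0.1, A:0.1):0.05, (A:0.1, G:0.1):0.05)
tree = [
    ([("A", 0.1), ("A", 0.1)], 0.05),
    ([("A", 0.1), ("G", 0.1)], 0.05),
]
post = root_posteriors(tree)
best = max(post, key=post.get)
print(best, {b: round(p, 3) for b, p in post.items()})
```

With three of the four leaves carrying A, the most probable root state is A. A full reconstruction would repeat this per column and, as the abstract notes for ARPIP, would first decide where each site was inserted and deleted.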
Affiliation(s)
- Gholamhossein Jowkar
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
  - University of Neuchâtel, Institute of biology, CH-2000 Neuchâtel, Switzerland
- Jūlija Pečerska
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Massimo Maiolo
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
  - University of Bern, Institute of Pathology, CH-3008 Bern, Switzerland
- Manuel Gil
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Maria Anisimova
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
4. Ahsan F, Yan Z, Precup D, Blanchette M. PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information. Bioinformatics 2022; 38:i299-i306. [PMID: 35758792 PMCID: PMC9235490 DOI: 10.1093/bioinformatics/btac259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods.
Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement and yet, yields impressive results.
Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM
Supplementary information: Supplementary data are available at Bioinformatics online.
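The general idea of combining a predictor's outputs across orthologous regions can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the PhyloPGM model: it takes per-species binding probabilities (as if produced by some pre-trained predictor) and combines them with a weighted average in log-odds space. The function names, species labels, scores, and the weighting scheme are all assumptions made for illustration; the actual method is in the package linked above.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def aggregate_ortholog_scores(scores, weights=None):
    """Combine per-species probabilities into one score for the human site.

    scores  : dict mapping species name -> predicted binding probability
    weights : optional dict mapping species name -> non-negative weight;
              equal weights are assumed when omitted
    """
    if not scores:
        raise ValueError("need at least one score")
    if weights is None:
        weights = {s: 1.0 for s in scores}
    total_w = sum(weights[s] for s in scores)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean in log-odds space, then map back to a probability.
    mean_logit = sum(weights[s] * logit(p) for s, p in scores.items()) / total_w
    return 1.0 / (1.0 + math.exp(-mean_logit))


# Hypothetical scores for one human region and three orthologous regions
scores = {"human": 0.55, "chimp": 0.80, "mouse": 0.75, "anc_primate": 0.85}
combined = aggregate_ortholog_scores(scores)
print(round(combined, 3))
```

Here a borderline human-only score is pulled upward because the orthologous and ancestral regions agree that the site looks bound. Passing a `weights` dictionary would let closer relatives count for more, which is one plausible design choice among many.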
Affiliation(s)
- Faizy Ahsan
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
- Zichao Yan
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
- Doina Precup
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
5. Campitelli LF, Yellan I, Albu M, Barazandeh M, Patel ZM, Blanchette M, Hughes TR. Reconstruction of full-length LINE-1 progenitors from ancestral genomes. Genetics 2022; 221:6584822. [PMID: 35552404 PMCID: PMC9252281 DOI: 10.1093/genetics/iyac074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/25/2022] [Accepted: 04/27/2022] [Indexed: 11/24/2022]
Abstract
Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
Affiliation(s)
- Laura F Campitelli
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Isaac Yellan
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Mihai Albu
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Marjan Barazandeh
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Zain M Patel
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Mathieu Blanchette
  - Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
6. Chua M, Tan A, Tremblay-Savard O. BOPAL 2.0 and a study of tRNA and rRNA gene evolution in Clostridium. J Bioinform Comput Biol 2021; 19:2140007. [PMID: 34775921 DOI: 10.1142/s0219720021400072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
We present BOPAL 2.0, an improved version of the BOPAL algorithm for the evolutionary history inference of tRNA and rRNA genes in bacterial genomes. Our approach can infer complete evolutionary scenarios and ancestral gene orders on a phylogeny and considers a wide range of events such as duplications, deletions, substitutions, inversions and transpositions. It is based on the fact that tRNA and rRNA genes are often organized in operons/clusters in bacteria, and this information is used to help identify orthologous genes for each genome comparison. BOPAL 2.0 introduces new features, such as a triple-wise alignment step, context-aware singleton matching and a second pass of the algorithm. Evaluation on simulated datasets shows that BOPAL 2.0 outperforms the original BOPAL in terms of the accuracy of inferred events and ancestral genomes. We also present a study of the tRNA/rRNA gene evolution in the Clostridium genus, in which the organization of these genes is very divergent. Our results indicate that tRNA and rRNA genes in Clostridium have evolved through numerous duplications, losses, transpositions and substitutions, but very few inversions were inferred.
Affiliation(s)
- Meghan Chua
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba, Canada R3T 5V6, Canada
- Anthony Tan
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba, Canada R3T 5V6, Canada
- Olivier Tremblay-Savard
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba, Canada R3T 5V6, Canada
7. Khan RT, Musil M, Stourac J, Damborsky J, Bednar D. Fully Automated Ancestral Sequence Reconstruction using FireProt ASR. Curr Protoc 2021; 1:e30. [PMID: 33524240 DOI: 10.1002/cpz1.30] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/11/2022]
Abstract
Protein evolution and protein engineering techniques are of great interest in basic science and industrial applications such as pharmacology, medicine, or biotechnology. Ancestral sequence reconstruction (ASR) is a powerful technique for probing evolutionary relationships and engineering robust proteins with good thermostability and broad substrate specificity. The following protocol describes the setting up and execution of an automated FireProtASR workflow using a dedicated web site. The service allows for inference of ancestral proteins automatically, from a single protein sequence. Once a protein sequence is submitted, the server will build a dataset of homology sequences, perform a multiple sequence alignment (MSA), build a phylogenetic tree, and reconstruct ancestral nodes. The protocol is also highly flexible and allows for multiple forms of input, advanced settings, and the ability to start jobs from: (i) a single sequence, (ii) a set of homologous sequences, (iii) an MSA, and (iv) a phylogenetic tree. This approach automates all necessary steps and offers a way for novices with limited exposure to ASR techniques to improve the properties of a protein of interest. The technique can even be used to introduce catalytic promiscuity into an enzyme. A web server for accessing the fully automated workflow is freely accessible at https://loschmidt.chemi.muni.cz/fireprotasr/. © 2021 Wiley Periodicals LLC. Basic Protocol: ASR using the Web Server FireProtASR.
Affiliation(s)
- Rayyan Tariq Khan
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Milos Musil
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jan Stourac
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
8. Lim D, Blanchette M. EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM. Bioinformatics 2021; 36:i353-i361. [PMID: 32657367 PMCID: PMC7355264 DOI: 10.1093/bioinformatics/btaa447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Motivation: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood.
Results: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes.
Availability and implementation: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongjoon Lim
  - School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
- Mathieu Blanchette
  - School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada
9. Planas-Iglesias J, Marques SM, Pinto GP, Musil M, Stourac J, Damborsky J, Bednar D. Computational design of enzymes for biotechnological applications. Biotechnol Adv 2021; 47:107696. [PMID: 33513434 DOI: 10.1016/j.biotechadv.2021.107696] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/07/2020] [Revised: 01/12/2021] [Accepted: 01/13/2021] [Indexed: 12/14/2022]
Abstract
Enzymes are the natural catalysts that execute biochemical reactions upholding life. Their natural effectiveness has been fine-tuned as a result of millions of years of natural evolution. Such catalytic effectiveness has prompted the use of biocatalysts from multiple sources on different applications, including the industrial production of goods (food and beverages, detergents, textile, and pharmaceutics), environmental protection, and biomedical applications. Natural enzymes often need to be improved by protein engineering to optimize their function in non-native environments. Recent technological advances have greatly facilitated this process by providing the experimental approaches of directed evolution or by enabling computer-assisted applications. Directed evolution mimics the natural selection process in a highly accelerated fashion at the expense of arduous laboratory work and economic resources. Theoretical methods provide predictions and represent an attractive complement to such experiments by waiving their inherent costs. Computational techniques can be used to engineer enzymatic reactivity, substrate specificity and ligand binding, access pathways and ligand transport, and global properties like protein stability, solubility, and flexibility. Theoretical approaches can also identify hotspots on the protein sequence for mutagenesis and predict suitable alternatives for selected positions with expected outcomes. This review covers the latest advances in computational methods for enzyme engineering and presents many successful case studies.
Affiliation(s)
- Joan Planas-Iglesias
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Sérgio M Marques
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Gaspar P Pinto
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Milos Musil
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
  - IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 61266 Brno, Czech Republic
- Jan Stourac
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
10. Musil M, Khan RT, Beier A, Stourac J, Konegger H, Damborsky J, Bednar D. FireProtASR: A Web Server for Fully Automated Ancestral Sequence Reconstruction. Brief Bioinform 2020; 22:6042664. [PMID: 33346815 PMCID: PMC8294521 DOI: 10.1093/bib/bbaa337] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/14/2020] [Revised: 10/12/2020] [Indexed: 12/13/2022]
Abstract
There is a great interest in increasing proteins’ stability to widen their usability in numerous biomedical and biotechnological applications. However, native proteins cannot usually withstand the harsh industrial environment, since they are evolved to function under mild conditions. Ancestral sequence reconstruction is a well-established method for deducing the evolutionary history of genes. Besides its applicability to discover the most probable evolutionary ancestors of the modern proteins, ancestral sequence reconstruction has proven to be a useful approach for the design of highly stable proteins. Recently, several computational tools were developed, which make the ancestral reconstruction algorithms accessible to the community, while leaving the most crucial steps of the preparation of the input data on users’ side. FireProtASR aims to overcome this obstacle by constructing a fully automated workflow, allowing even the unexperienced users to obtain ancestral sequences based on a sequence query as the only input. FireProtASR is complemented with an interactive, easy-to-use web interface and is freely available at https://loschmidt.chemi.muni.cz/fireprotasr/.
Affiliation(s)
- Andy Beier
  - Loschmidt Laboratories, Masaryk University
- Jiri Damborsky
  - International Clinical Research Center at St. Ann's Teaching Hospital
11. Simões BF, Foley NM, Hughes GM, Zhao H, Zhang S, Rossiter SJ, Teeling EC. As Blind as a Bat? Opsin Phylogenetics Illuminates the Evolution of Color Vision in Bats. Mol Biol Evol 2019; 36:54-68. [PMID: 30476197 PMCID: PMC6340466 DOI: 10.1093/molbev/msy192] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Abstract
Through their unique use of sophisticated laryngeal echolocation bats are considered sensory specialists amongst mammals and represent an excellent model in which to explore sensory perception. Although several studies have shown that the evolution of vision is linked to ecological niche adaptation in other mammalian lineages, this has not yet been fully explored in bats. Recent molecular analysis of the opsin genes, which encode the photosensitive pigments underpinning color vision, have implicated high-duty cycle (HDC) echolocation and the adoption of cave roosting habits in the degeneration of color vision in bats. However, insufficient sampling of relevant taxa has hindered definitive testing of these hypotheses. To address this, novel sequence data was generated for the SWS1 and MWS/LWS opsin genes and combined with existing data to comprehensively sample species representing diverse echolocation types and niches (SWS1 n = 115; MWS/LWS n = 45). A combination of phylogenetic analysis, ancestral state reconstruction, and selective pressure analyses were used to reconstruct the evolution of these visual pigments in bats and revealed that although both genes are evolving under purifying selection in bats, MWS/LWS is highly conserved but SWS1 is highly variable. Spectral tuning analyses revealed that MWS/LWS opsin is tuned to a long wavelength, 555-560 nm in the bat ancestor and the majority of extant taxa. The presence of UV vision in bats is supported by our spectral tuning analysis, but phylogenetic analyses demonstrated that the SWS1 opsin gene has undergone pseudogenization in several lineages. We do not find support for a link between the evolution of HDC echolocation and the pseudogenization of the SWS1 gene in bats, instead we show the SWS1 opsin is functional in the HDC echolocator, Pteronotus parnellii. Pseudogenization of the SWS1 is correlated with cave roosting habits in the majority of pteropodid species. 
Together these results demonstrate that the loss of UV vision in bats is more widespread than was previously considered and further elucidate the role of ecological niche specialization in the evolution of vision in bats.
Affiliation(s)
- Bruno F Simões
  - UCD School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
  - School of Earth Science, University of Bristol, Bristol, United Kingdom
  - School of Biological Science, The University of Adelaide, South Australia, Australia
- Nicole M Foley
  - UCD School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
- Graham M Hughes
  - UCD School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
- Huabin Zhao
  - Department of Ecology and Hubei Key Laboratory of Cell Homeostasis, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Shuyi Zhang
  - College of Animal Science and Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Stephen J Rossiter
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Emma C Teeling
  - UCD School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
12. Musil M, Konegger H, Hon J, Bednar D, Damborsky J. Computational Design of Stable and Soluble Biocatalysts. ACS Catal 2018. [DOI: 10.1021/acscatal.8b03613] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 01/04/2023]
Affiliation(s)
- Milos Musil
  - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Hannes Konegger
  - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Hon
  - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
|
13
|
Leclercq M, Diallo AB, Blanchette M. Prediction of human miRNA target genes using computationally reconstructed ancestral mammalian sequences. Nucleic Acids Res 2016; 45:556-566. [PMID: 27899600 PMCID: PMC5314757 DOI: 10.1093/nar/gkw1085] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/29/2016] [Revised: 09/26/2016] [Accepted: 11/13/2016] [Indexed: 11/14/2022] Open
Abstract
MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data, are available at http://cs.mcgill.ca/~blanchem/mirancestar.
Affiliation(s)
- Mickael Leclercq
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, H3A0E9, Canada
- Abdoulaye Baniré Diallo
  - Laboratoire de bio-informatique du département informatique, Université du Québec à Montréal, Montréal, Québec H2X 3Y7, Canada
- Mathieu Blanchette
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, H3A0E9, Canada
|
14
|
Amarasinghe S, Watson-Haigh NS, Gilliham M, Roy S, Baumann U. The evolutionary origin of CIPK16: A gene involved in enhanced salt tolerance. Mol Phylogenet Evol 2016; 100:135-147. [DOI: 10.1016/j.ympev.2016.03.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/06/2016] [Revised: 03/30/2016] [Accepted: 03/31/2016] [Indexed: 12/26/2022]
|
15
|
Affiliation(s)
- Jeffrey B. Joy
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - University of British Columbia, Department of Medicine, Vancouver, British Columbia, Canada
- Richard H. Liang
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- T. Nguyen
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Art F. Y. Poon
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - University of British Columbia, Department of Medicine, Vancouver, British Columbia, Canada
|
16
|
James D, Sanderson D, Varga A, Sheveleva A, Chirkov S. Genome Sequence Analysis of New Isolates of the Winona Strain of Plum pox virus and the First Definitive Evidence of Intrastrain Recombination Events. Phytopathology 2016; 106:407-416. [PMID: 26667187 DOI: 10.1094/phyto-09-15-0211-r] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Plum pox virus (PPV) is genetically diverse, with nine different strains identified. Mutations, indel events, and interstrain recombination events are known to contribute to the genetic diversity of PPV. This is the first report of intrastrain recombination events that contribute to PPV's genetic diversity. Fourteen isolates of the PPV strain Winona (W) were analyzed, including nine new strain W isolates sequenced completely in this study. Isolates of other PPV strains with more than one complete genome sequence available in GenBank were also included in this study for comparison and analysis. Five intrastrain recombination events were detected among the PPV W isolates, one among PPV C strain isolates, and one among PPV M strain isolates. Four (29%) of the PPV W isolates analyzed are recombinants, one of which (P2-1) is a mosaic, with three recombination events identified. A new interstrain recombination event was identified between a strain M isolate and a strain Rec isolate, a known recombinant. In silico recombination studies and pairwise distance analyses of PPV strain D isolates indicate that a threshold of genetic diversity exists for the detectability of recombination events, in the range of approximately 0.78 × 10⁻² to 1.33 × 10⁻² mean pairwise distance. RDP4 analyses indicate that in the case of PPV Rec isolates there may be a recombinant breakpoint distinct from the obvious transition point of strain sequences. Evidence was obtained that indicates that the frequency of PPV recombination is underestimated, which may be true for other RNA viruses where low genetic diversity exists.
Affiliation(s)
- Delano James
  - Centre for Plant Health-Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Road, North Saanich, British Columbia, V8L 1H3, Canada
- Dan Sanderson
  - Centre for Plant Health-Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Road, North Saanich, British Columbia, V8L 1H3, Canada
- Aniko Varga
  - Centre for Plant Health-Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Road, North Saanich, British Columbia, V8L 1H3, Canada
- Anna Sheveleva
  - Department of Virology, Biology Faculty, Lomonosov Moscow State University, Leninskie Gory MSU 1/12, Moscow, 119991, Russia
- Sergei Chirkov
  - Department of Virology, Biology Faculty, Lomonosov Moscow State University, Leninskie Gory MSU 1/12, Moscow, 119991, Russia
|
17
|
Ezawa K. Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map. BMC Bioinformatics 2016; 17:133. [PMID: 26992851 PMCID: PMC4799563 DOI: 10.1186/s12859-016-0945-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/06/2015] [Accepted: 02/11/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Reconstruction of multiple sequence alignments (MSAs) is a crucial step in most homology-based sequence analyses, which constitute an integral part of computational biology. To improve the accuracy of this crucial step, it is essential to better characterize errors that state-of-the-art aligners typically make. For this purpose, we here introduce two tools: the complete-likelihood score and the position-shift map. RESULTS The logarithm of the total probability of an MSA under a stochastic model of sequence evolution along a time axis via substitutions, insertions and deletions (called the "complete-likelihood score" here) can serve as an ideal score of the MSA. A position-shift map, which maps the difference in each residue's position between two MSAs onto one of them, can clearly visualize where and how MSA errors occurred and help disentangle composite errors. To characterize MSA errors using these tools, we constructed three sets of simulated MSAs of selectively neutral mammalian DNA sequences, with small, moderate and large divergences, under a stochastic evolutionary model with an empirically common power-law insertion/deletion length distribution. Then, we reconstructed MSAs using MAFFT and Prank as representative state-of-the-art single-optimum-search aligners. About 40-99% of the hundreds of thousands of gapped segments were involved in alignment errors. In a substantial fraction, from about 1/4 to over 3/4, of erroneously reconstructed segments, the MSAs reconstructed by each aligner showed complete-likelihood scores not lower than those of the true MSAs. Out of the remaining errors, a majority by an iterative option of MAFFT showed discrepancies between the aligner-specific score and the complete-likelihood score, and a majority by Prank seemed due to inadequate exploration of the MSA space.
Analyses by position-shift maps indicated that true MSAs are in considerable neighborhoods of reconstructed MSAs in about 80-99% of the erroneous segments for small and moderate divergences, but in only a minority for large divergences. CONCLUSIONS The results of this study suggest that measures to further improve the accuracy of reconstructed MSAs would substantially differ depending on the types of aligners. They also re-emphasize the importance of obtaining a probability distribution of fairly likely MSAs, instead of just searching for a single optimum MSA.
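The position-shift map described above can be illustrated with a minimal Python sketch. It assumes a simplified reading of the abstract: both alignments contain the same sequences in the same order, and the shift of a residue is its column in the reconstructed MSA minus its column in the reference MSA. The function names are hypothetical, and the published method may normalize or display shifts differently.

```python
def residue_columns(row, gap="-"):
    """Return the column index of every residue (non-gap character) in one aligned row."""
    return [col for col, ch in enumerate(row) if ch != gap]


def position_shift_map(reference, reconstructed, gap="-"):
    """For each sequence, list the per-residue column shift (reconstructed minus reference).

    Assumption: row i of both alignments is the same ungapped sequence.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("alignments must contain the same number of sequences")
    shifts = []
    for ref_row, rec_row in zip(reference, reconstructed):
        ref_cols = residue_columns(ref_row, gap)
        rec_cols = residue_columns(rec_row, gap)
        if len(ref_cols) != len(rec_cols):
            raise ValueError("rows do not contain the same number of residues")
        shifts.append([rec - ref for ref, rec in zip(ref_cols, rec_cols)])
    return shifts


# Toy example: the gap in the first sequence is misplaced by one column.
true_msa = ["AC-GT", "ACTGT"]
reconstructed_msa = ["ACG-T", "ACTGT"]
print(position_shift_map(true_msa, reconstructed_msa))
# -> [[0, 0, -1, 0], [0, 0, 0, 0, 0]]
```

Nonzero entries mark residues whose placement differs between the two alignments, which is the information a position-shift map is meant to expose.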
Affiliation(s)
- Kiyoshi Ezawa
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, 820-8502, Japan
  - Department of Biology and Biochemistry, University of Houston, Houston, TX, 77204-5001, USA
|
18
|
Paijmans JLA, Fickel J, Courtiol A, Hofreiter M, Förster DW. Impact of enrichment conditions on cross-species capture of fresh and degraded DNA. Mol Ecol Resour 2015; 16:42-55. [DOI: 10.1111/1755-0998.12420] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 12/03/2014] [Revised: 04/21/2015] [Accepted: 04/23/2015] [Indexed: 12/12/2022]
Affiliation(s)
- Johanna L. A. Paijmans
  - Department of Biology, University of York, York YO10 5DD, UK
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Joerns Fickel
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
  - Evolutionary Genetics Department, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Str. 17, 10315 Berlin, Germany
- Alexandre Courtiol
  - Evolutionary Genetics Department, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Str. 17, 10315 Berlin, Germany
- Michael Hofreiter
  - Department of Biology, University of York, York YO10 5DD, UK
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Daniel W. Förster
  - Evolutionary Genetics Department, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Str. 17, 10315 Berlin, Germany
|
19
|
Kwak D, Kam A, Becerra D, Zhou Q, Hops A, Zarour E, Kam A, Sarmenta L, Blanchette M, Waldispühl J. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment. Genome Biol 2014; 14:R116. [PMID: 24148814 DOI: 10.1186/gb-2013-14-10-r116] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/11/2013] [Accepted: 10/22/2013] [Indexed: 11/10/2022] Open
Abstract
Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem.
|
20
|
Wang GZ, Marini S, Ma X, Yang Q, Zhang X, Zhu Y. Improvement of Dscam homophilic binding affinity throughout Drosophila evolution. BMC Evol Biol 2014; 14:186. [PMID: 25158691 PMCID: PMC4243935 DOI: 10.1186/s12862-014-0186-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2013] [Accepted: 08/07/2014] [Indexed: 11/30/2022] Open
Abstract
Background: Drosophila Dscam1 is a cell-surface protein that plays important roles in neural development and axon tiling of neurons. It is known that thousands of isoforms bind themselves through specific homophilic interactions, a process which provides the basis for cellular self-recognition. Detailed biochemical studies of specific isoforms strongly suggest that homophilic binding, i.e. the formation of homodimers by identical Dscam1 isomers, is of great importance for the self-avoidance of neurons. Due to experimental limitations, it is currently impossible to measure the homophilic binding affinities for all 19,000 potential isoforms. Results: Here we reconstructed the DNA sequences of an ancestral Dscam form (which likely existed approximately 40-50 million years ago) using a comparative genomic approach. On the basis of this sequence, we established a working model to predict the self-binding affinities of all isoforms in both the current and the ancestral genome, using machine-learning methods. Detailed computational analysis was performed to compare the self-binding affinities of all isoforms present in these two genomes. Our results revealed that 1) isoforms containing newly derived variable domains exhibit higher self-binding affinities than those with conserved domains, and 2) current isoforms display higher self-binding affinities than their counterparts in the ancient genome. As thousands of Dscam isoforms are needed for the self-avoidance of the neuron, we propose that an increase in self-binding affinity provides the basis for the successful evolution of the arthropod brain. Conclusions: Our data presented here provide an excellent model for future experimental studies of the binding behavior of Dscam isoforms.
The results of our analysis indicate that evolution favored the rise of novel variable domains thanks to their higher self-binding affinities, rather than selection merely on the basis of simple expansion of isoform diversity, as this particular selection process would have established the powerful mechanisms required for neuronal self-avoidance. Thus, we reveal here a new molecular mechanism for the successful evolution of arthropod brains.
Affiliation(s)
- Xuegong Zhang
  - State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, 15 Datun Road, Beijing 100101, China
|
21
|
McCloskey RM, Liang RH, Harrigan PR, Brumme ZL, Poon AFY. An evaluation of phylogenetic methods for reconstructing transmitted HIV variants using longitudinal clonal HIV sequence data. J Virol 2014; 88:6181-94. [PMID: 24648453 PMCID: PMC4093844 DOI: 10.1128/jvi.00483-14] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/19/2014] [Accepted: 03/11/2014] [Indexed: 11/20/2022] Open
Abstract
A population of human immunodeficiency virus (HIV) within a host often descends from a single transmitted/founder virus. The high mutation rate of HIV, coupled with long delays between infection and diagnosis, makes isolating and characterizing this strain a challenge. In theory, ancestral reconstruction could be used to recover this strain from sequences sampled in chronic infection; however, the accuracy of phylogenetic techniques in this context is unknown. To evaluate the accuracy of these methods, we applied ancestral reconstruction to a large panel of published longitudinal clonal and/or single-genome-amplification HIV sequence data sets with at least one intrapatient sequence set sampled within 6 months of infection or seroconversion (n = 19,486 sequences, median [interquartile range] = 49 [20 to 86] sequences/set). The consensus of the earliest sequences was used as the best possible estimate of the transmitted/founder. These sequences were compared to ancestral reconstructions from sequences sampled at later time points using both phylogenetic and phylogeny-naive methods. Overall, phylogenetic methods conferred a 16% improvement in reproducing the consensus of early sequences, compared to phylogeny-naive methods. This relative advantage increased with intrapatient sequence diversity (P < 10⁻⁵) and the time elapsed between the earliest and subsequent samples (P < 10⁻⁵). However, neither approach performed well for reconstructing ancestral indel variation, especially within indel-rich regions of the HIV genome. Although further improvements are needed, our results indicate that phylogenetic methods for ancestral reconstruction significantly outperform phylogeny-naive alternatives, and we identify experimental conditions and study designs that can enhance accuracy of transmitted/founder virus reconstruction. IMPORTANCE When HIV is transmitted into a new host, most of the viruses fail to infect host cells.
Consequently, an HIV infection tends to be descended from a single "founder" virus. A priority target for vaccine research, these transmitted/founder viruses are difficult to isolate since newly infected individuals are often unaware of their status for months or years, by which time the virus population has evolved substantially. Here, we report on the potential use of evolutionary methods to reconstruct the genetic sequence of the transmitted/founder virus from its descendants at later stages of an infection. These methods can recover this ancestral sequence with an overall error rate of about 2.3%, recovering about 15% more information than if we had ignored the evolutionary relationships among viruses. Although there is no substitute for sampling infections at earlier points in time, these methods can provide useful information about the genetic makeup of transmitted/founder HIV.
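The comparison described above, in which the consensus of the earliest sequences serves as the reference for a reconstructed ancestor, can be sketched in Python as follows. The function names, the alphabetical tie-break, and the toy sequences are assumptions made for illustration; the abstract does not describe the study's pipeline at this level of detail.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences.

    Ties are broken alphabetically so the result is deterministic (an assumption).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        result.append(min(ch for ch, n in counts.items() if n == top))
    return "".join(result)


def mismatch_fraction(reference, candidate):
    """Fraction of aligned positions where the candidate differs from the reference."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(reference, candidate)) / len(reference)


# Toy data: three early sequences and one hypothetical reconstructed ancestor.
early = ["ACGT", "ACGA", "ACGT"]
founder_estimate = consensus(early)              # "ACGT"
print(mismatch_fraction(founder_estimate, "ACTT"))  # -> 0.25
```

A lower mismatch fraction means the reconstruction reproduces the early consensus more closely, which is the sense in which one method can be said to outperform another here.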
Affiliation(s)
- Rosemary M. McCloskey
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Richard H. Liang
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- P. Richard Harrigan
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Zabrina L. Brumme
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Art F. Y. Poon
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
|
22
|
Macqueen DJ, Johnston IA. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc Biol Sci 2014; 281:20132881. [PMID: 24452024 PMCID: PMC3906940 DOI: 10.1098/rspb.2013.2881] [Citation(s) in RCA: 270] [Impact Index Per Article: 27.0] [Indexed: 11/12/2022] Open
Abstract
Whole genome duplication (WGD) is often considered to be mechanistically associated with species diversification. Such ideas have been anecdotally attached to a WGD at the stem of the salmonid fish family, but remain untested. Here, we characterized an extensive set of gene paralogues retained from the salmonid WGD, in species covering the major lineages (subfamilies Salmoninae, Thymallinae and Coregoninae). By combining the data in calibrated relaxed molecular clock analyses, we provide the first well-constrained and direct estimate for the timing of the salmonid WGD. Our results suggest that the event occurred no later in time than 88 Ma and that 40-50 Myr passed subsequently until the subfamilies diverged. We also recovered a Thymallinae-Coregoninae sister relationship with maximal support. Comparative phylogenetic tests demonstrated that salmonid diversification patterns are closely allied in time with the continuous climatic cooling that followed the Eocene-Oligocene transition, with the highest diversification rates coinciding with recent ice ages. Further tests revealed considerably higher speciation rates in lineages that evolved anadromy (the physiological capacity to migrate between fresh and seawater) than in sister groups that retained the ancestral state of freshwater residency. Anadromy, which probably evolved in response to climatic cooling, is an established catalyst of genetic isolation, particularly during environmental perturbations (for example, glaciation cycles). We thus conclude that climate-linked ecophysiological factors, rather than WGD, were the primary drivers of salmonid diversification.
Affiliation(s)
- Daniel J Macqueen
  - Institute of Biological and Environmental Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen AB24 2TZ, UK
  - Scottish Oceans Institute, School of Biology, University of St Andrews, St Andrews, Fife KY16 8LB, UK
|
23
|
Abstract
MOTIVATIONS Recent progress in ancient DNA sequencing technologies and protocols has led to the sequencing of whole ancient bacterial genomes, as illustrated by the recent sequence of the Yersinia pestis strain that caused the Black Death pandemic. However, sequencing ancient genomes raises specific problems, among others because of the decay and fragmentation of ancient DNA, making the scaffolding of ancient contigs challenging. RESULTS We show that computational paleogenomics methods aimed at reconstructing the organization of ancestral genomes from the comparison of extant genomes can be adapted to correct, order and orient ancient bacterial contigs. We describe the method FPSAC (fast phylogenetic scaffolding of ancient contigs) and apply it to a set of 2134 ancient contigs assembled from the recently sequenced Black Death agent genome. We obtain a unique scaffold for the whole chromosome of this ancient genome that allows us to gain precise insights into the structural evolution of the Yersinia clade.
Affiliation(s)
- Ashok Rajaraman
  - Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A1S6, Canada, International Graduate Training Center in Mathematical Biology, Pacific Institute for the Mathematical Sciences, Vancouver (BC), Canada, INRIA Grenoble Rhône-Alpes, Montbonnot 38334, France, Université de Lyon 1, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558 F-69622 Villeurbanne, France and LaBRI, Université Bordeaux I, 33405 Talence, France
|
24
|
Smith JD, McManus KF, Fraser HB. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol Biol Evol 2013; 30:2509-18. [PMID: 23904330 DOI: 10.1093/molbev/mst134] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/29/2022] Open
Abstract
Measuring natural selection on genomic elements involved in the cis-regulation of gene expression (such as transcriptional enhancers and promoters) is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here, we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as upregulating, downregulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on upregulating and downregulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: although all three have experienced negative selection against downregulating mutations, the selection pressures on upregulating mutations vary. In one case, we detected positive selection for upregulation, whereas the other two had no detectable selection on upregulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
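The Ka/Ks analogy can be made concrete with a small Python sketch: the rate of fixed mutations in a functional class (upregulating or downregulating), per possible mutation of that class, is divided by the same rate for silent mutations. This is a simplified assumption about the form of the statistic, with hypothetical function and parameter names and made-up counts; the published test may include corrections that the abstract does not describe.

```python
def selection_ratio(observed_class, possible_class, observed_silent, possible_silent):
    """Ka/Ks-style ratio for one mutation class relative to silent mutations.

    observed_*: mutations of that type inferred to have occurred on the branches of interest
    possible_*: point mutations of that type that could have occurred (from mutagenesis data)
    """
    if possible_class <= 0 or possible_silent <= 0:
        raise ValueError("possible mutation counts must be positive")
    if observed_silent == 0:
        raise ValueError("ratio is undefined when no silent mutations were observed")
    class_rate = observed_class / possible_class
    silent_rate = observed_silent / possible_silent
    return class_rate / silent_rate


# Hypothetical counts: few downregulating changes were fixed relative to silent ones.
ratio_down = selection_ratio(observed_class=2, possible_class=100,
                             observed_silent=10, possible_silent=100)
print(f"downregulating ratio: {ratio_down:.2f}")
```

As with Ka/Ks, a ratio well below 1 would be read as selection against that class of mutation, a ratio near 1 as no detectable selection, and a ratio above 1 as selection favoring it.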
|
25
|
Blanchette M. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites. BMC Bioinformatics 2012; 13 Suppl 19:S2. [PMID: 23281809 PMCID: PMC3526440 DOI: 10.1186/1471-2105-13-s19-s2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Background: The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results: We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability: The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci.
Affiliation(s)
- Mathieu Blanchette
  - McGill Centre for Bioinformatics and School of Computer Science, McGill University, H3C 2B4 Québec, Canada
|
26
|
Dröge J, Pande A, Englander EW, Makałowski W. Comparative genomics of neuroglobin reveals its early origins. PLoS One 2012; 7:e47972. [PMID: 23133533 PMCID: PMC3485006 DOI: 10.1371/journal.pone.0047972] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 04/25/2012] [Accepted: 09/24/2012] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Neuroglobin (Ngb) is a hexacoordinated globin expressed mainly in the central and peripheral nervous system of vertebrates. Although several hypotheses have been put forward regarding the role of neuroglobin, its definite function remains uncertain. Ngb appears to have a neuro-protective role enhancing cell viability under hypoxia and other types of oxidative stress. Ngb is phylogenetically ancient and has a substitution rate nearly four times lower than that of other vertebrate globins, e.g. hemoglobin. Despite its high sequence conservation among vertebrates, Ngb seems to be elusive in invertebrates. PRINCIPAL FINDINGS: We determined candidate orthologs in invertebrates and identified a globin of the placozoan Trichoplax adhaerens that is most likely orthologous to vertebrate Ngb and confirmed the orthologous relationship of the polymeric globin of the sea urchin Strongylocentrotus purpuratus to Ngb. The putative orthologous globin genes are located next to genes orthologous to vertebrate POMT2, similar to the localization of vertebrate Ngb. The shared syntenic position of the globins from Trichoplax, the sea urchin and of vertebrate Ngb strongly suggests that they are orthologous. A search for conserved transcription factor binding sites (TFBSs) in the promoter regions of the Ngb genes of different vertebrates via phylogenetic footprinting revealed several TFBSs, which may contribute to the specific expression of Ngb, whereas a comparative analysis with myoglobin revealed several common TFBSs, suggestive of regulatory mechanisms common to globin genes. SIGNIFICANCE: Identification of the placozoan and echinoderm genes orthologous to vertebrate neuroglobin strongly supports the hypothesis of the early evolutionary origin of this globin, as it shows that neuroglobin was already present in the placozoan-bilaterian last common ancestor. Computational determination of the transcription factor binding site repertoire provides, on the one hand, a set of transcription factors that are responsible for the specific expression of the Ngb genes and, on the other hand, a set of factors potentially controlling expression of a couple of different globin genes.
Affiliation(s)
- Jasmin Dröge
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
- Amit Pande
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
- Ella W. Englander
- Department of Surgery, University of Texas Medical Branch, Galveston, Texas, United States of America
- Wojciech Makałowski
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
|
27
|
Sadri J, Diallo AB, Blanchette M. Predicting site-specific human selective pressure using evolutionary signatures. Bioinformatics 2011; 27:i266-74. [PMID: 21685080 PMCID: PMC3117352 DOI: 10.1093/bioinformatics/btr241] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open
Abstract
Motivation: The identification of non-coding functional regions of the human genome remains one of the main challenges of genomics. By observing how a given region evolved over time, one can detect signs of negative or positive selection hinting that the region may be functional. With the quickly increasing number of vertebrate genomes to compare with our own, this type of approach is set to become extremely powerful, provided the right analytical tools are available. Results: A large number of approaches have been proposed to measure signs of past selective pressure, usually in the form of reduced mutation rate. Here, we propose a radically different approach to the detection of non-coding functional regions: instead of measuring past evolutionary rates, we build a machine learning classifier to predict current substitution rates in human based on the inferred evolutionary events that affected the region during vertebrate evolution. We show that different types of evolutionary events, occurring along different branches of the phylogenetic tree, bring very different amounts of information. We propose a number of simple machine learning classifiers and show that a Support-Vector Machine (SVM) predictor clearly outperforms existing tools at predicting human non-coding functional sites. Comparison to external evidence of selection and regulatory function confirms that these SVM predictions are more accurate than those of other approaches. Availability: The predictor and predictions made are available at http://www.mcb.mcgill.ca/~blanchem/sadri. Contact: blanchem@mcb.mcgill.ca
Affiliation(s)
- Javad Sadri
- School of Computer Science, McGill University, 3630 University, Montreal, QC, Canada H3A 2B2
|
28
|
Horvath JE, Sheedy CB, Merrett SL, Diallo AB, Swofford DL, NISC Comparative Sequencing Program, Green ED, Willard HF. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus. Genome Res 2011; 21:850-62. [PMID: 21518738 DOI: 10.1101/gr.111849.110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/17/2023]
Abstract
Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years.
Affiliation(s)
- Julie E Horvath
- Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708, USA.
|
29
|
Price N, Cartwright RA, Sabath N, Graur D, Azevedo RBR. Neutral evolution of robustness in Drosophila microRNA precursors. Mol Biol Evol 2011; 28:2115-23. [PMID: 21285032 DOI: 10.1093/molbev/msr029] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023] Open
Abstract
Mutational robustness describes the extent to which a phenotype remains unchanged in the face of mutations. Theory predicts that the strength of direct selection for mutational robustness is at most the magnitude of the rate of deleterious mutation. As far as nucleic acid sequences are concerned, only long sequences in organisms with high deleterious mutation rates and large population sizes are expected to evolve mutational robustness. Surprisingly, recent studies have concluded that molecules that meet none of these conditions--the microRNA precursors (pre-miRNAs) of multicellular eukaryotes--show signs of selection for mutational and/or environmental robustness. To resolve the apparent disagreement between theory and these studies, we have reconstructed the evolutionary history of Drosophila pre-miRNAs and compared the robustness of each sequence to that of its reconstructed ancestor. In addition, we "replayed the tape" of pre-miRNA evolution via simulation under different evolutionary assumptions and compared these alternative histories with the actual one. We found that Drosophila pre-miRNAs have evolved under strong purifying selection against changes in secondary structure. Contrary to earlier claims, there is no evidence that these RNAs have been shaped by either direct or congruent selection for any kind of robustness. Instead, the high robustness of Drosophila pre-miRNAs appears to be mostly intrinsic and likely a consequence of selection for functional structures.
Affiliation(s)
- Nicholas Price
- Department of Biology and Biochemistry, University of Houston, USA.
|
30
|
Menzel P, Stadler PF, Gorodkin J. maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. Bioinformatics 2010; 27:317-25. [PMID: 21123221 PMCID: PMC3031029 DOI: 10.1093/bioinformatics/btq651] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The task of reconstructing a genomic sequence from a particular species is gaining more and more importance in the light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced genomic regions and the design of oligonucleotide primers for target genes in species lacking sequence information but also the preparation of customized queries for homology searches. RESULTS: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase of the reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also leads to an increase of the quality of PCR primer design for yet unsequenced genes: the differences between the expected T(m) and real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset. AVAILABILITY: maxAlike is available for download and as a web server at: http://rth.dk/resources/maxAlike.
Affiliation(s)
- Peter Menzel
- Center for non-coding RNA in Technology and Health, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
|
31
|
Abstract
Like other RNA viruses, coxsackievirus B5 (CVB5) exists as circulating heterogeneous populations of genetic variants. In this study, we present the reconstruction and characterization of a probable ancestral virion of CVB5. Phylogenetic analyses based on capsid protein-encoding regions (the VP1 gene of 41 clinical isolates and the entire P1 region of eight clinical isolates) of CVB5 revealed two major cocirculating lineages. Ancestral capsid sequences were inferred from sequences of these contemporary CVB5 isolates by using maximum likelihood methods. By using Bayesian phylodynamic analysis, the inferred VP1 ancestral sequence dated back to 1854 (1807 to 1898). In order to study the properties of the putative ancestral capsid, the entire ancestral P1 sequence was synthesized de novo and inserted into the replicative backbone of an infectious CVB5 cDNA clone. Characterization of the recombinant virus in cell culture showed that fully functional infectious virus particles were assembled and that these viruses displayed properties similar to those of modern isolates in terms of receptor preferences, plaque phenotypes, growth characteristics, and cell tropism. This is the first report describing the resurrection and characterization of a picornavirus with a putative ancestral capsid. Our approach, including a phylogenetics-based reconstruction of viral predecessors, could serve as a starting point for experimental studies of viral evolution and might also provide an alternative strategy for the development of vaccines.
|