1
Northover DE, Shank SD, Liberles DA. Characterizing lineage-specific evolution and the processes driving genomic diversification in chordates. BMC Evol Biol 2020; 20:24. [PMID: 32046633 PMCID: PMC7011509 DOI: 10.1186/s12862-020-1585-y] [Received: 02/12/2019] [Accepted: 01/16/2020] [Indexed: 11/21/2022]
Abstract
Background: Understanding the origins of genome content has long been a goal of molecular evolution and comparative genomics. By examining genome evolution through the lens of lineage-specific evolution, it is possible to make inferences about the evolutionary events that have given rise to species-specific diversification. Here we characterize the evolutionary trends found in chordate species using The Adaptive Evolution Database (TAED). TAED is a database of phylogenetically indexed gene families designed to detect episodes of directional or diversifying selection across chordates. Gene families within the database have been assessed for lineage-specific estimates of dN/dS and have been reconciled to the chordate species tree to identify retained duplicates. Gene families have also been mapped to functional pathways, and amino acid changes that occurred on high-dN/dS lineages have been mapped to protein structures.
Results: An analysis of this exhaustive database has enabled a characterization of the processes of lineage-specific diversification in chordates. A pathway-level enrichment analysis of TAED determined that the pathways most commonly found to have elevated rates of evolution included those involved in metabolism, immunity, and cell signaling. An analysis of protein fold presence, after normalizing for frequency in the database, found that common folds such as Rossmann folds, jelly roll folds, and TIM barrels were overrepresented on proteins most likely to undergo directional selection. A set of gene families that experienced increased numbers of duplications within short evolutionary times is associated with pathways involved in metabolism, olfactory reception, and signaling. An analysis of protein secondary structure indicated more relaxed constraint in β-sheets and stronger constraint on α-helices, amidst a general preference for substitutions at exposed sites. Lastly, a detailed analysis of the ornithine decarboxylase gene family, which encodes a key enzyme in the pathway for polyamine synthesis, revealed lineage-specific evolution along the lineage leading to Cetacea through rapid sequence evolution in a duplicate gene with amino acid substitutions causing active site rearrangement.
Conclusion: Episodes of lineage-specific evolution are frequent throughout chordate species. Both duplication and directional selection have played large roles in the evolution of the phylum. TAED is a powerful tool for facilitating this understanding of lineage-specific evolution.
Affiliation(s)
- David E Northover
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Stephen D Shank
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA.
2
The Adaptive Evolution Database (TAED): A New Release of a Database of Phylogenetically Indexed Gene Families from Chordates. J Mol Evol 2017; 85:46-56. [PMID: 28795237 DOI: 10.1007/s00239-017-9806-8] [Received: 07/19/2017] [Accepted: 08/03/2017] [Indexed: 12/11/2022]
Abstract
With the large collections of gene and genome sequences now available, there is a need to generate curated comparative genomic databases that enable interpretation of results in an evolutionary context. Such resources can facilitate an understanding of the co-evolution of genes in the context of a genome mapped onto a phylogeny, of a protein structure, and of interactions within a pathway. A phylogenetically indexed gene family database, The Adaptive Evolution Database (TAED), is presented that organizes gene families and their evolutionary histories in a species tree context. Gene families include alignments, phylogenetic trees, lineage-specific dN/dS ratios, reconciliation with the species tree to enable both the mapping and the identification of duplication events, mapping of gene families onto pathways, and mapping of amino acid substitutions onto protein structures. In addition to organization of the data, new phylogenetic visualization tools, including TreeThrasher and TAED Tree Viewer, have been developed and made available to aid in interpreting the data. A new resource of gene families organized by species and taxonomic lineage promises to be a valuable comparative genomics database for molecular biologists, evolutionary biologists, and ecologists. The new visualization tools and database framework will be of interest to both evolutionary biologists and bioinformaticians.
3
Simakov O, Kawashima T. Independent evolution of genomic characters during major metazoan transitions. Dev Biol 2016; 427:179-192. [PMID: 27890449 DOI: 10.1016/j.ydbio.2016.11.012] [Received: 08/01/2016] [Revised: 11/08/2016] [Accepted: 11/14/2016] [Indexed: 02/03/2023]
Abstract
Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies.
Affiliation(s)
- Oleg Simakov
- Okinawa Institute of Science and Technology, Okinawa, Japan.
4
Extracting functional trends from whole genome duplication events using comparative genomics. Biol Proced Online 2016; 18:11. [PMID: 27168732 PMCID: PMC4862183 DOI: 10.1186/s12575-016-0041-2] [Received: 01/26/2016] [Accepted: 04/24/2016] [Indexed: 01/06/2023]
Abstract
Background: The number of species with completed genomes, including those with evidence for recent whole genome duplication events, has exploded. The recently sequenced Atlantic salmon genome has been through two rounds of whole genome duplication since the divergence of teleost fish from the lineage that led to amniotes. This quadrupling of the number of potential genes has led to complex patterns of retention and loss among gene families.
Results: Methods have been developed to characterize the interplay of duplicate gene retention processes across both whole genome duplication events and additional smaller scale duplication events. Further, gene expression divergence data have become available for Atlantic salmon and the closely related, pre-whole genome duplication pike, and methods to describe expression divergence are also presented. These methods for the characterization of duplicate gene retention and gene expression divergence that have been applied to salmon are described.
Conclusions: With the growth in available genomic and functional data, the opportunities to extract functional inference from large scale duplicates using comparative methods have expanded dramatically. Recently developed methods that further this inference for duplicated genes have been described.
5
Wang K, Ouyang H, Xie Z, Yao C, Guo N, Li M, Jiao H, Pang D. Efficient Generation of Myostatin Mutations in Pigs Using the CRISPR/Cas9 System. Sci Rep 2015; 5:16623. [PMID: 26564781 PMCID: PMC4643223 DOI: 10.1038/srep16623] [Received: 04/14/2015] [Accepted: 10/16/2015] [Indexed: 12/15/2022]
Abstract
Genetically modified pigs are increasingly used for biomedical and agricultural applications. The efficient CRISPR/Cas9 gene editing system holds great promise for the generation of gene-targeting pigs without selection marker genes. In this study, we aimed to disrupt the porcine myostatin (MSTN) gene, which functions as a negative regulator of muscle growth. The transfection efficiency of porcine fetal fibroblasts (PFFs) was improved to facilitate the targeting of Cas9/gRNA. We also demonstrated that Cas9/gRNA can induce non-homologous end-joining (NHEJ), long fragment deletions/inversions and homology-directed repair (HDR) at the MSTN locus of PFFs. Single-cell MSTN knockout colonies were used to generate cloned pigs via somatic cell nuclear transfer (SCNT), which resulted in 8 marker-gene-free cloned pigs with biallelic mutations. Some of the piglets showed obvious intermuscular grooves and enlarged tongues, which are characteristic of the double muscling (DM) phenotype. The protein level of MSTN was decreased in the mutant cloned pigs compared with the wild-type controls, and the mRNA levels of MSTN and related signaling pathway factors were also analyzed. Finally, we carefully assessed off-target mutations in the cloned pigs. The gene editing platform used in this study can efficiently generate genetically modified pigs with biological safety.
Affiliation(s)
- Kankan Wang
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Hongsheng Ouyang
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Zicong Xie
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Chaogang Yao
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Nannan Guo
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Mengjing Li
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Huping Jiao
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
- Daxin Pang
- Jilin Provincial Key Laboratory of Animal Embryo Engineering, College of Animal Sciences, Jilin University, Changchun, Jilin Province, People’s Republic of China
6
Evidence for positive selection on the leptin gene in Cetacea and Pinnipedia. PLoS One 2011; 6:e26579. [PMID: 22046310 PMCID: PMC3203152 DOI: 10.1371/journal.pone.0026579] [Received: 05/29/2011] [Accepted: 09/29/2011] [Indexed: 01/21/2023]
Abstract
The leptin gene has received intensive attention and scientific investigation for its importance in energy homeostasis and reproductive regulation in mammals. Furthermore, study of the leptin gene is of crucial importance for public health, particularly for its role in obesity, as well as for other numerous physiological roles that it plays in mammals. In the present work, we report the identification of novel leptin genes in 4 species of Cetacea, and a comparison with 55 publicly available leptin sequences from mammalian genome assemblies and previous studies. Our study provides evidence for positive selection in the suborder Odontoceti (toothed whales) of the Cetacea and the family Phocidae (earless seals) of the Pinnipedia. We also detected positive selection in several leptin gene residues in these two lineages. To test whether leptin and its receptor evolved in a coordinated manner, we analyzed 24 leptin receptor gene (LPR) sequences from available mammalian genome assemblies and other published data. Unlike the case of leptin, our analyses did not find evidence of positive selection for LPR across the Cetacea and Pinnipedia lineages. In line with this, positively selected sites identified in the leptin genes of these two lineages were located outside of leptin receptor binding sites, which at least partially explains why co-evolution of leptin and its receptor was not observed in the present study. Our study provides interesting insights into current understanding of the evolution of mammalian leptin genes in response to selective pressures from life in an aquatic environment, and leads to a hypothesis that new tissue specificity or novel physiologic functions of leptin genes may have arisen in both odontocetes and phocids. 
Additional data from other species encompassing varying life histories and functional tests of the adaptive role of the amino acid changes identified in this study will help determine the factors that promote the adaptive evolution of the leptin genes in marine mammals.
7
Liberles DA, Tisdell MDM, Grahnen JA. Binding constraints on the evolution of enzymes and signalling proteins: the important role of negative pleiotropy. Proc Biol Sci 2011; 278:1930-5. [PMID: 21490020 DOI: 10.1098/rspb.2010.2637] [Indexed: 12/30/2022]
Abstract
A number of biophysical and population-genetic processes influence amino acid substitution rates. It is commonly recognized that proteins must fold into a native structure with preference over an unfolded state, and must bind to functional interacting partners favourably to function properly. What is less clear is how important folding and binding specificity are to amino acid substitution rates. A hypothesis of the importance of binding specificity in constraining sequence and functional evolution is presented. Examples include an evolutionary simulation of a population of SH2 sequences evolved by threading through the structure and binding to a native ligand, as well as SH3 domain signalling in yeast and selection for specificity in enzymatic reactions. An example in vampire bats where negative pleiotropy appears to have been adaptive is presented. Finally, considerations of compartmentalization and macromolecular crowding on negative pleiotropy are discussed.
Affiliation(s)
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA.
8
Kamneva OK, Liberles DA, Ward NL. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol Evol 2010; 2:870-86. [PMID: 21048002 PMCID: PMC3000692 DOI: 10.1093/gbe/evq071] [Indexed: 12/03/2022]
Abstract
Whole-genome scans for positive Darwinian selection are widely used to detect evolution of genome novelty. Most approaches are based on evaluating the ratio of nonsynonymous to synonymous substitution rates across evolutionary lineages. These methods are sensitive to saturation of synonymous sites and thus cannot be used to study evolution of distantly related organisms. In contrast, indels occur less frequently than amino acid replacements, accumulate more slowly, and can be employed to characterize evolution of diverged organisms. As indels are also subject to the forces of natural selection, they can generate functional changes through positive selection. Here, we present a new computational approach to detect selective constraints on indel substitutions at the whole-genome level for distantly related organisms. Our method is based on ancestral sequence reconstruction, takes into account the varying susceptibility of different types of secondary structure to indels, and according to simulation studies is conservative. We applied this newly developed framework to characterize the evolution of organisms of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. The superphylum contains organisms with unique cell biology, physiology, and diverse lifestyles. It includes bacteria with simple cell organization and more complex eukaryote-like compartmentalization. Lifestyles range from free-living organisms to obligate pathogens. In this study, we conducted a whole-genome level analysis of indel substitutions specific to evolutionary lineages of the PVC superphylum and found that indels evolved under positive selection on up to 12% of gene tree branches. We also analyzed possible functional consequences for several case studies of predicted indel events.
Affiliation(s)
- Naomi L. Ward
- Department of Molecular Biology, University of Wyoming
- Department of Botany, University of Wyoming
- Program in Ecology, University of Wyoming
- Corresponding author
9
Rodgers BD, Garikipati DK. Clinical, agricultural, and evolutionary biology of myostatin: a comparative review. Endocr Rev 2008; 29:513-34. [PMID: 18591260 PMCID: PMC2528853 DOI: 10.1210/er.2008-0003] [Indexed: 12/18/2022]
Abstract
The discovery of myostatin and our introduction to the "Mighty Mouse" over a decade ago spurred both basic and applied research and impacted popular culture as well. The myostatin-null genotype produces "double muscling" in mice and livestock and was recently described in a child. The field's rapid growth is by no means surprising considering the potential benefits of enhancing muscle growth in clinical and agricultural settings. Indeed, several recent studies suggest that blocking myostatin's inhibitory effects could improve the clinical treatment of several muscle growth disorders, whereas comparative studies suggest that these actions are at least partly conserved. Thus, neutralizing myostatin's effects could also have agricultural significance. Extrapolating between studies that use different vertebrate models, particularly fish and mammals, is somewhat confusing because whole genome duplication events have resulted in the production and retention of up to four unique myostatin genes in some fish species. Such comparisons, however, suggest that myostatin's actions may not be limited to skeletal muscle per se, but may additionally influence other tissues including cardiac muscle, adipocytes, and the brain. Thus, therapeutic intervention in the clinic or on the farm must consider the potential of alternative side effects that could impact these or other tissues. In addition, the presence of multiple and actively diversifying myostatin genes in most fish species provides a unique opportunity to study adaptive molecular evolution. It may also provide insight into myostatin's nonmuscle actions as results from these and other comparative studies gain visibility in biomedical fields.
Affiliation(s)
- Buel D Rodgers
- Department of Animal Sciences, 124 ASLB, Washington State University, Pullman, Washington 99164, USA.
10
Liberles DA, Dittmar K. Characterizing gene family evolution. Biol Proced Online 2008; 10:66-73. [PMID: 19461954 PMCID: PMC2683547 DOI: 10.1251/bpo144] [Received: 09/18/2007] [Revised: 03/17/2008] [Accepted: 04/07/2008] [Indexed: 11/23/2022]
Abstract
Gene families are widely used in comparative genomics, molecular evolution, and in systematics. However, they are constructed in different manners, their data analyzed and interpreted differently, with different underlying assumptions, leading to sometimes divergent conclusions. In systematics, concepts like monophyly and the dichotomy between homoplasy and homology have been central to the analysis of phylogenies. We critique the traditional use of such concepts as applied to gene families and give examples of incorrect inferences they may lead to. Operational definitions that have emerged within functional genomics are contrasted with the common formal definitions derived from systematics. Lastly, we question the utility of layers of homology and the meaning of homology at the character state level in the context of sequence evolution. From this, we move forward to present an idealized strategy for characterizing gene family evolution for both systematic and functional purposes, including recent methodological improvements.
11
Abstract
Background: The rate of evolution varies spatially along genomes and temporally. The presence of evolutionary rate variation is an informative signal that often marks functional regions of genomes and historical selection events. There exist many tests for temporal rate variation, or heterotachy, that start by partitioning sampled sequences into two or more groups and testing rate homogeneity among the groups. I develop a Bayesian method to infer phylogenetic trees with a divergence point, or dramatic temporal shifts in selection pressure that affect many nucleotide sites simultaneously, located at an unknown position in the tree.
Results: Simulation demonstrates that the method is most able to detect divergence points when rate variation and the number of affected sites are high, but not beyond biologically relevant values. The method is applied to two viral data sets. A divergence point is identified separating the B and C subtypes, two genetically distinct variants of HIV that have spread into different human populations with the AIDS epidemic. In contrast, no strong signal of temporal rate variation is found in a sample of F and H genotypes, two genetic variants of HBV that have likely evolved with humans during their immigration and expansion into the Americas.
Conclusion: Temporal shifts in evolutionary rate of sufficient magnitude are detectable in the history of sampled sequences. The ability to detect such divergence points without the need to specify a prior hypothesis about the location or timing of the divergence point should help scientists identify historically important selection events and decipher mechanisms of evolution.
Affiliation(s)
- Karin S Dorman
- Department of Statistics, and the Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, USA.
12
Ardawatia H, Liberles DA. A systematic analysis of lineage-specific evolution in metabolic pathways. Gene 2007; 387:67-74. [PMID: 17034962 DOI: 10.1016/j.gene.2006.08.013] [Received: 04/13/2006] [Revised: 07/30/2006] [Accepted: 08/10/2006] [Indexed: 12/29/2022]
Abstract
In a search for the lineage-specific evolution of pathways between human, chimpanzee, mouse, and rat, orthologous gene families were generated from genome sequences. For each family, a model-based ratio of nonsynonymous to synonymous nucleotide substitution rates was calculated. Where the free-ratio model of individual ratios on each branch was supported, these families were mapped to two databases of metabolic pathways (KEGG and BioCyc) and the lineage-specific evolution of pathways was evaluated. The most similar pathway evolution was seen between mouse and rat, while the evolutionary pattern between human and chimpanzee was less correlated. Individual pathways in the human lineage were observed to evolve in a faster, lineage-specific manner, including the pathway involving arachidonic acid metabolism (identified through the KEGG analysis) and pyrimidine metabolism (identified through both analyses).
Affiliation(s)
- Himanshu Ardawatia
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
13
Romanish MT, Lock WM, van de Lagemaat LN, Dunn CA, Mager DL. Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet 2006; 3:e10. [PMID: 17222062 PMCID: PMC1781489 DOI: 10.1371/journal.pgen.0030010] [Received: 08/14/2006] [Accepted: 12/05/2006] [Indexed: 12/19/2022]
Abstract
Neuronal apoptosis inhibitory protein (NAIP, also known as BIRC1) is a member of the conserved inhibitor of apoptosis protein (IAP) family. Lineage-specific rearrangements and expansions of this locus have yielded different copy numbers among primates and rodents, with human retaining a single functional copy and mouse possessing several copies, depending on the strain. Roles for this gene in disease have been documented, but little is known about transcriptional regulation of NAIP. We show here that NAIP has multiple promoters sharing no similarity between human and rodents. Moreover, we demonstrate that multiple, domesticated long terminal repeats (LTRs) of endogenous retroviral elements provide NAIP promoter function in human, mouse, and rat. In human, an LTR serves as a tissue-specific promoter, active primarily in testis. However, in rodents, our evidence indicates that an ancestral LTR common to all rodent genes is the major, constitutive promoter for these genes, and that a second LTR found in two of the mouse genes is a minor promoter. Thus, independently acquired LTRs have assumed regulatory roles for orthologous genes, a remarkable evolutionary scenario. We also demonstrate that 5′ flanking regions of IAP family genes as a group, in both human and mouse are enriched for LTR insertions compared to average genes. We propose several potential explanations for these findings, including a hypothesis that recruitment of LTRs near NAIP or other IAP genes may represent a host-cell adaptation to modulate apoptotic responses.
When retroviruses infect cells, the viral DNA inserts into the cellular genome. If this happens in gametes (egg or sperm), the viral DNA will be transmitted from parent to offspring, like all chromosomal DNA. Through evolutionary time, such infections of gametes have been so prevalent that 8%–10% of the normal human and mouse genomes are now composed of ancient viral DNA, termed endogenous retroviruses (ERVs). In human, these ERVs are mutated or “dead” but it has been shown that ERV regulatory regions can be employed by the host to help control expression of cellular genes. Here, we report on a remarkable example of this phenomenon. We demonstrate that both the human and rodent neuronal apoptosis inhibitory protein (NAIP) genes, involved in preventing cell death, use different ERV sequences to drive gene expression. Moreover, in each of the primate and rodent lineages, two separate ERVs contribute to NAIP gene expression. This repeated ERV recruitment by NAIP genes throughout evolution is very unlikely to have occurred by chance. We offer a number of potential explanations, including the intriguing possibility that it may be advantageous for anti-cell death genes like NAIP to use ERVs to control their expression. These results support the view that not all retroviral remnants in our genome are simply junk DNA.
Affiliation(s)
- Mark T Romanish
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Wynne M Lock
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Louie N. van de Lagemaat
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Catherine A Dunn
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Dixie L Mager
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- *To whom correspondence should be addressed.
14
Pie MR, Alvares LE. Evolution of myostatin in vertebrates: Is there evidence for positive selection? Mol Phylogenet Evol 2006; 41:730-4. [PMID: 16876447 DOI: 10.1016/j.ympev.2006.05.038] [Received: 03/03/2006] [Revised: 05/17/2006] [Accepted: 05/30/2006] [Indexed: 12/01/2022]
Affiliation(s)
- Marcio R Pie
- Departamento de Zoologia, Caixa Postal 19073, Universidade Federal do Paraná, Curitiba, PR 81531-990, Brazil.
|
15
|
Berglund-Sonnhammer AC, Steffansson P, Betts MJ, Liberles DA. Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. J Mol Evol 2006; 63:240-50. [PMID: 16830091 DOI: 10.1007/s00239-005-0096-1] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 04/22/2005] [Accepted: 04/15/2006] [Indexed: 10/24/2022]
Abstract
Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.
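The mapping of a gene tree onto a species tree mentioned in this abstract can be illustrated with the standard lowest-common-ancestor (LCA) reconciliation. This is only a minimal sketch of the general idea, not the soft parsimony algorithm implemented in Softparsmap: it counts duplications but neither counts losses nor rearranges weakly supported branches. The tree encodings (nested tuples for the gene tree, a child-to-parent dictionary for the species tree) and all function names are assumptions made for illustration.

```python
def ancestors(node, parent):
    """Return the path from a species-tree node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def count_duplications(gene_tree, gene_to_species, parent):
    """Map a gene tree onto the species tree.

    gene_tree is a leaf name (str) or a pair (left, right).
    Returns (species node the subtree maps to, number of inferred duplications).
    """
    if isinstance(gene_tree, str):  # leaf: maps to the species it was sampled from
        return gene_to_species[gene_tree], 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, gene_to_species, parent)
    m_right, d_right = count_duplications(right, gene_to_species, parent)
    m = lca(m_left, m_right, parent)
    # A node is a duplication if it maps to the same species node as a child.
    is_dup = m in (m_left, m_right)
    return m, d_left + d_right + int(is_dup)


# Hypothetical species tree ((human, mouse) mammal, fish) vertebrate.
parent = {"human": "mammal", "mouse": "mammal",
          "mammal": "vertebrate", "fish": "vertebrate"}
genes = {"h1": "human", "h2": "human", "m1": "mouse", "f1": "fish"}

print(count_duplications((("h1", "m1"), ("h2", "f1")), genes, parent))
```

In this toy example the root of the gene tree maps to the same species node as its right child, so one duplication is inferred at the base of the tree.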
|
16
|
Roth C, Liberles DA. A systematic search for positive selection in higher plants (Embryophytes). BMC Plant Biology 2006; 6:12. [PMID: 16784532 PMCID: PMC1540423 DOI: 10.1186/1471-2229-6-12] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.5] [Received: 02/23/2006] [Accepted: 06/19/2006] [Indexed: 05/04/2023]
Abstract
BACKGROUND Previously, a database characterizing examples of Embryophyte gene family lineages showing evidence of positive selection was reported. Of the gene family trees, 138 Embryophyte branches showed Ka/Ks >> 1 and are candidates for functional adaptation. The database and these examples have now been studied in further detail to better understand the molecular basis for plant genome evolution. RESULTS Neutral modeling showed an excess of positive and/or negative selection in the database over a neutral expectation centered on the mean Ka/Ks ratio. Out of 673 families with assigned structures, 490 have at least one branch with Ka/Ks >> 1 in a region of the protein, enabling a picture of selective pressures delineated by protein structure. Most gene families allowed reconstruction back to the last common ancestor of flowering plants (Magnoliophytes) without saturation of 4-fold degenerate codon positions. Positive selection occurred in a wide variety of gene families with different functions, including in the self-incompatibility locus, in defense against pathogens, in embryogenesis, in cold acclimation, and in electron transport. Structurally, selective pressures were similar between alpha-helices and beta-sheets, but were less negative and more variant on the surface and away from the hydrophobic core. CONCLUSION Statistically significant positive selection was detected in a small and nonrandom minority of gene families in a systematic analysis of embryophyte gene families. More sensitive methods increased the level of positive selection that was detected and presented a structural basis for the role of positive selection in plant genomes.
Affiliation(s)
- Christian Roth
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden
- Department of Molecular Biology, University of Wyoming, Dept. 3944, 1000 E. University Avenue, Laramie, WY 82071, USA
- David A Liberles
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
- Department of Molecular Biology, University of Wyoming, Dept. 3944, 1000 E. University Avenue, Laramie, WY 82071, USA
|
17
|
Chen L, Lee C. Distinguishing HIV-1 drug resistance, accessory, and viral fitness mutations using conditional selection pressure analysis of treated versus untreated patient samples. Biol Direct 2006; 1:14. [PMID: 16737543 PMCID: PMC1523337 DOI: 10.1186/1745-6150-1-14] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 05/15/2006] [Accepted: 05/31/2006] [Indexed: 11/18/2022]
Abstract
BACKGROUND HIV can evolve drug resistance rapidly in response to new drug treatments, often through a combination of multiple mutations [1-3]. It would be useful to develop automated analyses of HIV sequence polymorphism that are able to predict drug resistance mutations, and to distinguish different types of functional roles among such mutations, for example, those that directly cause drug resistance, versus those that play an accessory role. Detecting functional interactions between mutations is essential for this classification. We have adapted a well-known measure of evolutionary selection pressure (Ka/Ks) and developed a conditional Ka/Ks approach to detect important interactions. RESULTS We have applied this analysis to four independent HIV protease sequencing datasets: 50,000 clinical samples sequenced by Specialty Laboratories, Inc.; 1800 samples from patients treated with protease inhibitors; 2600 samples from untreated patients; 400 samples from untreated African patients. We have identified 428 statistically significant mutation interactions in the Specialty dataset, and we were able to distinguish primary vs. accessory mutations for many well-studied examples. Amino acid interactions identified by conditional Ka/Ks matched 80 of 92 pairwise interactions found by a completely independent study of HIV protease (the p-value for this match is significant: 10^-70). Furthermore, Ka/Ks selection pressure results were highly reproducible among these independent datasets, both qualitatively and quantitatively, suggesting that they are detecting real drug-resistance and viral fitness mutations in the wild HIV-1 population. CONCLUSION Conditional Ka/Ks analysis can detect mutation interactions and distinguish primary vs. accessory mutations in HIV-1. Ka/Ks analysis of treated vs. untreated patient data can distinguish drug-resistance vs. viral fitness mutations. Verification of these results would require longitudinal studies. The result provides a valuable resource for AIDS research and will be available for open access upon publication at http://www.bioinformatics.ucla.edu/HIV.
Affiliation(s)
- Lamei Chen
- Institute for Genomics & Proteomics, Molecular Biology Institute, Dept. of Chemistry & Biochemistry, UCLA, Los Angeles, CA 90095-1570, USA
- Christopher Lee
- Institute for Genomics & Proteomics, Molecular Biology Institute, Dept. of Chemistry & Biochemistry, UCLA, Los Angeles, CA 90095-1570, USA
|
18
|
Berglund AC, Wallner B, Elofsson A, Liberles DA. Tertiary windowing to detect positive diversifying selection. J Mol Evol 2005; 60:499-504. [PMID: 15883884 DOI: 10.1007/s00239-004-0223-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 07/20/2004] [Accepted: 10/20/2004] [Indexed: 12/01/2022]
Abstract
As a protein-encoding gene evolves, different selective pressures act on the gene temporally and spatially. Examination of nonsynonymous-to-synonymous nucleotide substitution rate ratios (K(a)/K(s)) has proven to be a valuable method to examine selective pressures on protein-encoding genes, including detecting positive diversifying selection. To gain power over averaging all sites in a gene together, examination of sites in primary sequence windows has frequently been employed. However, selection acts on folded proteins, and sites that are close in tertiary space may not be close in primary sequence. A new method for the examination of K(a)/K(s) ratios based upon windows in tertiary structure is introduced and applied to the leptin gene family in mammals. Tertiary sequence windowing detects new sites under positive diversifying selection and detects positive diversifying selection with a more significant signal along various branches of the leptin gene family tree.
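The contrast between primary-sequence and tertiary windows can be made concrete with a small sketch. It assumes, purely for illustration, that each site already has a 3D coordinate and per-site nonsynonymous and synonymous substitution values; the published method estimates K(a)/K(s) properly, whereas this sketch simply pools the per-site values inside a sphere. The function name, the radius and the input layout are all hypothetical.

```python
import math


def tertiary_windows(coords, nonsyn, syn, radius):
    """For each site, pool substitution values over all sites within `radius`.

    coords: list of (x, y, z) positions, one per site.
    nonsyn, syn: per-site nonsynonymous / synonymous values (assumed inputs).
    Returns one pooled ratio per site (NaN if the window has no synonymous signal).
    """
    ratios = []
    for center in coords:
        n_total = 0.0
        s_total = 0.0
        for xyz, n, s in zip(coords, nonsyn, syn):
            if math.dist(center, xyz) <= radius:  # site lies inside the sphere
                n_total += n
                s_total += s
        ratios.append(n_total / s_total if s_total > 0 else float("nan"))
    return ratios


# Sites 0, 1 and 3 are close in space although site 2 separates them in sequence.
coords = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0.5, 0)]
nonsyn = [2, 0, 1, 2]
syn = [1, 1, 2, 0]
print(tertiary_windows(coords, nonsyn, syn, radius=2.0))
```

Here a window of three consecutive residues centered on site 1 would mix in the distant site 2, while the spatial window groups sites 0, 1 and 3 together.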
|
19
|
Bhushan S, Ståhl A, Nilsson S, Lefebvre B, Seki M, Roth C, McWilliam D, Wright SJ, Liberles DA, Shinozaki K, Bruce BD, Boutry M, Glaser E. Catalysis, subcellular localization, expression and evolution of the targeting peptides degrading protease, AtPreP2. Plant & Cell Physiology 2005; 46:985-96. [PMID: 15827031 DOI: 10.1093/pcp/pci107] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Indexed: 11/12/2022]
Abstract
We have previously identified a zinc metalloprotease involved in the degradation of mitochondrial and chloroplast targeting peptides, the presequence protease (PreP). In the Arabidopsis thaliana genomic database, there are two genes that correspond to the protease, the zinc metalloprotease (AAL90904) and the putative zinc metalloprotease (AAG13049). We have named the corresponding proteins AtPreP1 and AtPreP2, respectively. AtPreP1 and AtPreP2 show significant differences in their targeting peptides and the proteins are predicted to be localized in different compartments. AtPreP1 was shown to degrade both mitochondrial and chloroplast targeting peptides and to be dual targeted to both organelles using an ambiguous targeting peptide. Here, we have overexpressed and purified AtPreP2 and characterized its proteolytic and targeting properties. AtPreP2 exhibits different proteolytic subsite specificity from AtPreP1 when used for degradation of organellar targeting peptides and their mutants. Interestingly, the AtPreP2 precursor protein was also found to be dual targeted to both mitochondria and chloroplasts in a single and dual in vitro import system. Furthermore, the targeting peptide of AtPreP2 dually targeted green fluorescent protein (GFP) to both mitochondria and chloroplasts in tobacco protoplasts and leaves using an in vivo transient expression system. The targeting of both AtPreP1 and AtPreP2 proteases to chloroplasts in A. thaliana in vivo was confirmed via a shotgun mass spectrometric analysis of highly purified chloroplasts. Reverse transcription-polymerase chain reaction (RT-PCR) analysis revealed that AtPreP1 and AtPreP2 are differentially expressed in mature A. thaliana plants. Phylogenetic evidence indicated that AtPreP1 and AtPreP2 are recent gene duplicates that may have diverged through subfunctionalization.
Affiliation(s)
- Shashi Bhushan
- Department of Biochemistry and Biophysics, Arrhenius Laboratories for Natural Sciences, Stockholm University, 10691 Stockholm, Sweden
|
20
|
Roth C, Betts MJ, Steffansson P, Saelensminde G, Liberles DA. The Adaptive Evolution Database (TAED): a phylogeny based tool for comparative genomics. Nucleic Acids Res 2005; 33:D495-7. [PMID: 15608245 PMCID: PMC540044 DOI: 10.1093/nar/gki090] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022]
Abstract
From 138 662 embryophyte (higher plant) and 348 142 chordate genes, 4216 embryophyte and 15 452 chordate gene families were generated. For each of these gene families, multiple sequence alignments, phylogenetic trees, ratios of non-synonymous to synonymous nucleotide substitution rates (Ka/Ks), mappings from gene trees to the NCBI taxonomy and structural links to solved three-dimensional protein structures in the Protein Data Bank (PDB) with Grantham-weighted mutational factors were all calculated. Of the ‘gene family trees’, 173 embryophyte and 505 chordate branches show Ka/Ks ≫ 1 and are candidates for functional adaptation. The calculated information is available both as a gene family database and as a phylogenetically indexed resource, called ‘The Adaptive Evolution Database’ (TAED), available at http://www.bioinfo.no/tools/TAED.
Affiliation(s)
- Christian Roth
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
|
21
|
Rastogi S, Liberles DA. Subfunctionalization of duplicated genes as a transition state to neofunctionalization. BMC Evol Biol 2005; 5:28. [PMID: 15831095 PMCID: PMC1112588 DOI: 10.1186/1471-2148-5-28] [Citation(s) in RCA: 250] [Impact Index Per Article: 13.2] [Received: 01/05/2005] [Accepted: 04/14/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene duplication has been suggested to be an important process in the generation of evolutionary novelty. Neofunctionalization, as an adaptive process where one copy mutates into a function that was not present in the pre-duplication gene, is one mechanism that can lead to the retention of both copies. More recently, subfunctionalization, as a neutral process where the two copies partition the ancestral function, has been proposed as an alternative mechanism driving duplicate gene retention in organisms with small effective population sizes. The relative importance of these two processes is unclear. RESULTS A set of lattice model genes that fold and bind to two peptide ligands with overlapping binding pockets, but not a third ligand present in the cell was designed. Each gene was duplicated in a model haploid species with a small constant population size and no recombination. One set of models allowed subfunctionalization of binding events following duplication, while another set did not allow subfunctionalization. Modeling under such conditions suggests that subfunctionalization plays an important role, but as a transition state to neofunctionalization rather than as a terminal fate of duplicated genes. There is no apparent selective pressure to maintain redundancy. CONCLUSION Subfunctionalization results in an increase in the preservation of duplicated gene copies, including those that are neofunctionalized, but never represents a substantial fraction of duplicate gene copies at any evolutionary time point and ultimately leads to neofunctionalization of those preserved copies. This conclusion also may reflect changes in gene function after duplication with time in real genomes.
Affiliation(s)
- Shruti Rastogi
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
- David A Liberles
- Computational Biology Unit, BCCS, University of Bergen, 5020 Bergen, Norway
|
22
|
Tellgren A, Berglund AC, Savolainen P, Janis CM, Liberles DA. Myostatin rapid sequence evolution in ruminants predates domestication. Mol Phylogenet Evol 2005; 33:782-90. [PMID: 15522803 DOI: 10.1016/j.ympev.2004.07.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/05/2004] [Revised: 05/19/2004] [Indexed: 11/29/2022]
Abstract
Myostatin (GDF-8) is a negative regulator of skeletal muscle development. This gene has previously been implicated in the double muscling phenotype in mice and cattle. A systematic analysis of myostatin sequence evolution in ruminants was performed in a phylogenetic context. The myostatin coding sequence was determined from duiker (Sylvicapra grimmia caffra), eland (Taurotragus derbianus), gaur (Bos gaurus), ibex (Capra ibex), impala (Aepyceros melampus rednilis), pronghorn (Antilocapra americana), and tahr (Hemitragus jemlahicus). Analysis of nonsynonymous to synonymous nucleotide substitution rate ratios (Ka/Ks) indicates that positive selection may have been operating on this gene during the time of divergence of Bovinae and Antilopinae, starting from approximately 23 million years ago, a period that appears to account for most of the sequence difference between myostatin in these groups. These periods of positive selective pressure on myostatin may correlate with changes in skeletal muscle mass during the same period.
Affiliation(s)
- Asa Tellgren
- Computational Biology Unit, Bergen Centre for Computational Science, University of Bergen, 5020 Bergen, Norway
|
23
|
Wong WSW, Yang Z, Goldman N, Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 2005; 168:1041-51. [PMID: 15514074 PMCID: PMC1448811 DOI: 10.1534/genetics.104.031153] [Citation(s) in RCA: 447] [Impact Index Per Article: 23.5] [Indexed: 11/18/2022]
Abstract
The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.
Affiliation(s)
- Wendy S W Wong
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA.
|
24
|
Abstract
Duplication-degeneration-complementation (DDC) describes a process by which evolving duplicates of a pleiotropic ancestral gene divide up the multiple functions of the ancestor between them (i.e. subfunctionalize), and this ultimately frustrates the rate of pseudogene formation. Focusing explicitly on enzyme-like pleiotropic function, we model DDC driven by sequence divergence between duplicates. The model incorporates an idealized sequence-function mapping in which enzyme-substrate binding affinity is related to hydrophobic versus polar (HP) amino-acid composition of tertiary structure about the binding pocket. In this sense, a transparent coupling between physical-chemical function of an enzyme and sequence evolution is presented.
Affiliation(s)
- F N Braun
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden.
|
25
|
Abstract
Background Joining a model for the molecular evolution of a protein family to the paleontological and geological records (geobiology), and then to the chemical structures of substrates, products, and protein folds, is emerging as a broad strategy for generating hypotheses concerning function in a post-genomic world. This strategy expands systems biology to a planetary context, necessary for a notion of fitness to underlie (as it must) any discussion of function within a biomolecular system. Results Here, we report an example of such an expansion, where tools from planetary biology were used to analyze three genes from the pig Sus scrofa that encode cytochrome P450 aromatases, enzymes that convert androgens into estrogens. The evolutionary history of the vertebrate aromatase gene family was reconstructed. Transition redundant exchange silent substitution metrics were used to interpolate dates for the divergence of family members, the paleontological record was consulted to identify changes in physiology that correlated in time with the change in molecular behavior, and new aromatase sequences from peccary were obtained. Metrics that detect changing function in proteins were then applied, including KA/KS values and those that exploit structural biology. These identified specific amino acid replacements that were associated with changing substrate and product specificity during the time of presumed adaptive change. The combined analysis suggests that aromatase paralogs arose in pigs as a result of selection for Suoidea with larger litters than their ancestors, and permitted the Suoidea to survive the global climatic trauma that began in the Eocene. Conclusions This combination of bioinformatics analysis, molecular evolution, paleontology, cladistics, global climatology, structural biology, and organic chemistry serves as a paradigm in planetary biology. As the geological, paleontological, and genomic records improve, this approach should become widely useful to make systems biology statements about high-level function for biomolecular systems.
|
26
|
Hughes T, Hyun Y, Liberles DA. Visualising very large phylogenetic trees in three dimensional hyperbolic space. BMC Bioinformatics 2004; 5:48. [PMID: 15117420 PMCID: PMC419335 DOI: 10.1186/1471-2105-5-48] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 02/23/2004] [Accepted: 04/29/2004] [Indexed: 11/10/2022]
Abstract
Background Common existing phylogenetic tree visualisation tools are not able to display readable trees with more than a few thousand nodes. These existing methodologies are based in two dimensional space. Results We introduce the idea of visualising phylogenetic trees in three dimensional hyperbolic space with the Walrus graph visualisation tool and have developed a conversion tool that enables the conversion of standard phylogenetic tree formats to Walrus' format. With Walrus, it becomes possible to visualise and navigate phylogenetic trees with more than 100,000 nodes. Conclusion Walrus enables desktop visualisation of very large phylogenetic trees in 3 dimensional hyperbolic space. This application is potentially useful for visualisation of the tree of life and for functional genomics derivatives, like The Adaptive Evolution Database (TAED).
Affiliation(s)
- Timothy Hughes
- Computational Biology Unit, Bergen Centre for Computational Science, University of Bergen, 5020 Bergen, Norway
- Young Hyun
- Cooperative Association for Internet Data Analysis, SDSC, University of California – San Diego, MC0505, 9500 Gilman Drive, La Jolla, CA 92093, USA
- David A Liberles
- Computational Biology Unit, Bergen Centre for Computational Science, University of Bergen, 5020 Bergen, Norway
|
27
|
Endo T, Ogishima S, Tanaka H. Standardized phylogenetic tree: a reference to discover functional evolution. J Mol Evol 2004; 57 Suppl 1:S174-81. [PMID: 15008414 DOI: 10.1007/s00239-003-0025-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
Abstract
Functional evolution is often driven by positive natural selection. Although it is thought to be rare in evolution at the molecular level, its effects may be observed as accelerated evolutionary rates. Therefore, one of the effective ways to identify functional evolution is to identify accelerated evolution. Many methods have been developed to test the statistical significance of an accelerated evolutionary rate by comparison with an appropriate reference rate. The rates of synonymous substitution are among the most useful and popular references, especially for large-scale analyses. On the other hand, these rates are applicable only to a limited evolutionary time period because they saturate quickly, i.e., multiple substitutions happen frequently because of the lower functional constraint. The relative rate test is an alternative method. This technique has an advantage in terms of the saturation effect but is not sufficiently powerful when the evolutionary rate differs considerably among phylogenetic lineages. To provide a universal reference tree, we propose a method to construct a standardized tree that serves as the reference for accelerated evolutionary rates. The method is based upon multiple molecular phylogenies of single genes with the aim of providing higher reliability. The tree has averaged and normalized branch lengths with standard deviations for statistical neutrality limits. The standard deviation also suggests the reliability level of the branch order. The resulting tree serves as a reference tree for the reliability level of the branch order and the test of evolutionary rate acceleration even when some of the species lineages show an accelerated evolutionary rate for most of their genes due to bottlenecking and other effects.
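The idea of a reference tree with averaged, normalized branch lengths and standard deviations can be sketched as follows. The abstract does not give the exact procedure, so several choices here are assumptions: all gene trees are taken to share one topology, each tree is normalized by its total branch length, and a branch is flagged as accelerated when it exceeds the reference mean by k sample standard deviations. The function names and the threshold k are hypothetical.

```python
from statistics import mean, stdev


def normalize(branches):
    """Scale branch lengths so that they sum to 1 for a single gene tree."""
    total = sum(branches.values())
    return {b: length / total for b, length in branches.items()}


def standardized_tree(gene_trees):
    """Return {branch: (mean, sample standard deviation)} over normalized gene trees."""
    normalized = [normalize(t) for t in gene_trees]
    return {
        b: (mean(t[b] for t in normalized), stdev(t[b] for t in normalized))
        for b in normalized[0]
    }


def accelerated(query, reference, k=2.0):
    """List branches of a query gene that exceed mean + k * sd in the reference."""
    q = normalize(query)
    return [b for b, (mu, sd) in reference.items() if q[b] > mu + k * sd]


# Three hypothetical gene trees on the same three branches A, B and C.
gene_trees = [
    {"A": 1, "B": 1, "C": 2},
    {"A": 2, "B": 2, "C": 4},
    {"A": 1, "B": 2, "C": 1},
]
reference = standardized_tree(gene_trees)
print(accelerated({"A": 3, "B": 1, "C": 2}, reference))
```

Because every tree is rescaled before averaging, a gene that evolves uniformly faster on all branches is not flagged; only a branch that is long relative to the rest of its own tree stands out.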
Affiliation(s)
- Toshinori Endo
- Department of Bioinformatics, MRI, Tokyo Medical and Dental University, Yushima 1-5-45, Bunkyo-ku, Tokyo, 113-8510, Japan.
|
28
|
Swart EC, Hide WA, Seoighe C. FRAGS: estimation of coding sequence substitution rates from fragmentary data. BMC Bioinformatics 2004; 5:8. [PMID: 15005802 PMCID: PMC344743 DOI: 10.1186/1471-2105-5-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/22/2003] [Accepted: 01/29/2004] [Indexed: 01/06/2023]
Abstract
Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data.
Affiliation(s)
- Estienne C Swart
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
- Winston A Hide
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
- Cathal Seoighe
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
|
29
|
Choi SS, Lahn BT. Adaptive evolution of MRG, a neuron-specific gene family implicated in nociception. Genome Res 2003; 13:2252-9. [PMID: 14525927 PMCID: PMC403691 DOI: 10.1101/gr.1431603] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.5] [Received: 04/12/2003] [Accepted: 08/11/2003] [Indexed: 12/19/2022]
Abstract
The MRG gene family (also known as SNSR) belongs to the G-protein-coupled receptor (GPCR) superfamily, is expressed specifically in nociceptive neurons, and is implicated in the modulation of nociception. Here, we show that Ka/Ks (the ratio between nonsynonymous and synonymous substitution rates) displays distinct profiles along the coding regions of MRG, with peaks (Ka/Ks>1) corresponding to extracellular domains, and valleys (Ka/Ks<1) corresponding to transmembrane and cytoplasmic domains. The extracellular domains are also characterized by a significant excess of radical amino acid changes. Statistical analysis shows that positive selection is by far the most suitable model to account for the nucleotide substitution patterns in MRG. Together, these results demonstrate that the extracellular domains of the MRG receptor family, which presumably partake in ligand binding, have experienced strong positive selection. Such selection is likely directed at altering the sensitivity and/or selectivity of nociceptive neurons to aversive stimuli. Thus, our finding suggests pain perception as an aspect of the nervous system that may have experienced a surprising level of adaptive evolution.
Affiliation(s)
- Sun Shim Choi
- Howard Hughes Medical Institute and Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
|
30
|
Benner SA, Caraco MD, Thomson JM, Gaucher EA. Planetary biology--paleontological, geological, and molecular histories of life. Science 2002; 296:864-8. [PMID: 11988562 DOI: 10.1126/science.1069863] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/02/2022]
Abstract
The history of life on Earth is chronicled in the geological strata, the fossil record, and the genomes of contemporary organisms. When examined together, these records help identify metabolic and regulatory pathways, annotate protein sequences, and identify animal models to develop new drugs, among other features of scientific and biomedical interest. Together, planetary analysis of genome and proteome databases is providing an enhanced understanding of how life interacts with the biosphere and adapts to global change.
Affiliation(s)
- Steven A Benner
- Department of Chemistry, University of Florida, Gainesville, FL 32611-7200, USA.
|
31
|
Abstract
Immediately after a gene duplication event, the duplicate genes have redundant functions. Is natural selection therefore completely relaxed after duplication? Does one gene evolve more rapidly than the other? Several recent genome-wide studies have suggested that duplicate genes are always under purifying selection and do not always evolve at the same rate.
Affiliation(s)
- Andreas Wagner
- Department of Biology, University of New Mexico, 167A Castetter Hall, Albuquerque, NM 87131-1091, USA.
|
32
|
Abstract
As more gene and genomic sequences from an increasing assortment of species become available, new pictures of evolution are emerging. Improved methods can pinpoint where positive and negative selection act in individual codons in specific genes on specific branches of phylogenetic trees. Positive selection appears to be important in the interaction between genotype, protein structure, function, and organismal phenotype.
Affiliation(s)
- David A Liberles
- Department of Biochemistry and Biophysics and Stockholm Bioinformatics Center, Stockholm University, 10691 Stockholm, Sweden.
|
33
|
Liberles DA. Evaluation of methods for determination of a reconstructed history of gene sequence evolution. Mol Biol Evol 2001; 18:2040-7. [PMID: 11606700 DOI: 10.1093/oxfordjournals.molbev.a003745] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
With whole-genome sequences being completed at an increasing rate, it is important to develop and assess tools to analyze them. Following annotation of the protein content of a genome, one can compare sequences with previously characterized homologous genes to detect novel functions within specific proteins in the evolution of the newly sequenced genome. One common statistical method to detect such changes is to compare the ratios of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitution rates. Here, the effects of several parameters that can influence this calculation (sequence reconstruction method, phylogenetic tree branch length weighting, GC content, and codon bias) are examined. Also, two new alternative measures of adaptive evolution, the point accepted mutations (PAM)/neutral evolutionary distance (NED) ratio and the sequence space assessment (SSA) statistic, are presented. All of these methods are compared using two sequence families: the recent divergence of leptin orthologs in primates, and the more ancient divergence of the deoxyribonucleoside kinase family. The examination of these and other measures to detect changes of gene function along branches of a phylogenetic tree will become increasingly important in the postgenomic era.
Affiliation(s)
- D A Liberles
- Department of Biochemistry and Biophysics and Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden.
|