51
Liberles DA. Reading the Story in DNA: A Beginner's Guide to Molecular Evolution. Syst Biol 2009. [DOI: 10.1093/sysbio/syp003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
52
Jaillon O, Aury JM, Wincker P. “Changing by doubling”, the impact of Whole Genome Duplications in the evolution of eukaryotes. C R Biol 2009; 332:241-53. [DOI: 10.1016/j.crvi.2008.07.007] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 07/01/2008] [Accepted: 07/21/2008] [Indexed: 12/17/2022]
53
Ciccarelli FD, Miklós I. Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples. Comparative Genomics 2009. [PMCID: PMC7120581 DOI: 10.1007/978-3-642-04744-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content. In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and present both a Fixed Parameter Tractable (FPT) algorithm, and heuristic approximation algorithms for solving it. The running time of these algorithms on simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations was fast (up to four minutes) and it achieved an approximation ratio < 1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we implement our approach on a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions.
Affiliation(s)
- István Miklós
- Rényi Institute, Hungarian Academy of Sciences, Reáltanoda utca 13-15, 1053 Budapest, Hungary
54
Paten B, Herrero J, Fitzgerald S, Beal K, Flicek P, Holmes I, Birney E. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res 2008; 18:1829-43. [PMID: 18849525 PMCID: PMC2577868 DOI: 10.1101/gr.076521.108] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Received: 01/23/2008] [Accepted: 09/09/2008] [Indexed: 11/24/2022]
Abstract
Recently attention has been turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called "Ortheus," for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient stochastic graph-based dynamic programming methods. Unlike other methods, Ortheus does not rely on a single fixed alignment from which to work. Ortheus is also more scaleable than previous methods while being fast, stable, and open source. Large-scale simulations show that Ortheus performs close to optimally on a deep mammalian phylogeny. Simulations also indicate that significant proportions of errors due to insertions and deletions can be avoided by not assuming a fixed alignment. We additionally use a challenging hold-out cross-validation procedure to test the method; using the reconstructions to predict extant sequence bases, we demonstrate significant improvements over using closest extant neighbor sequences. Accompanying this paper, a new, public, and genome-wide set of Ortheus ancestor alignments provide an intriguing new resource for evolutionary studies in mammals. As a first piece of analysis, we attempt to recover "fossilized" ancestral pseudogenes. We confidently find 31 cases in which the ancestral sequence had a more complete sequence than any of the extant sequences.
Affiliation(s)
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Javier Herrero
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Stephen Fitzgerald
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Kathryn Beal
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Paul Flicek
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Ian Holmes
- Department of Bioengineering, University of California Berkeley, Berkeley, California 94720, USA
- Ewan Birney
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
55
Profile of David Haussler. Proc Natl Acad Sci U S A 2008; 105:14251-3. [DOI: 10.1073/pnas.0808284105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
56
Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput Biol 2008; 4:e1000172. [PMID: 18787703 PMCID: PMC2527138 DOI: 10.1371/journal.pcbi.1000172] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 10/24/2007] [Accepted: 07/31/2008] [Indexed: 11/19/2022]
Abstract
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
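As a rough illustration of the idea in this abstract, the sketch below runs Felsenstein's peeling (pruning) recursion over an alphabet extended with the gap character. It is not the dnamlepsilon implementation: the equal-rates, reversible five-state model, the rate `mu`, the nested-list tree encoding, and all function names are assumptions chosen for brevity, whereas the abstract describes a non-reversible birth-death model.

```python
import math

ALPHABET = "ACGT-"  # four bases plus the gap character treated as a fifth state


def transition_prob(i, j, t, mu=1.0):
    """P(state j after time t | state i) under an equal-rates k-state model.

    Assumption: every off-diagonal rate equals mu (a stand-in, not the paper's model).
    """
    k = len(ALPHABET)
    decay = math.exp(-k * mu * t)
    if i == j:
        return 1.0 / k + (k - 1) / k * decay
    return (1.0 - decay) / k


def partial_likelihoods(node):
    """Post-order pass: entry a is P(leaves below node | node is in state ALPHABET[a]).

    A leaf is a single character; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in ALPHABET]
    result = [1.0] * len(ALPHABET)
    for child, branch_length in node:
        child_l = partial_likelihoods(child)
        for a, s in enumerate(ALPHABET):
            result[a] *= sum(
                transition_prob(s, c, branch_length) * child_l[b]
                for b, c in enumerate(ALPHABET)
            )
    return result


def column_likelihood(tree):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(partial_likelihoods(tree)) / len(ALPHABET)


if __name__ == "__main__":
    # Hypothetical three-taxon column: one leaf carries a gap.
    tree = [([("A", 0.1), ("-", 0.1)], 0.2), ("A", 0.3)]
    print(column_likelihood(tree))
```

Because the gap is handled as an ordinary state, the recursion keeps the same cost per column as the substitution-only case, which is the efficiency property the abstract attributes to the peeling algorithm.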
57
Amniote phylogenomics: testing evolutionary hypotheses with BAC library scanning and targeted clone analysis of large-scale DNA sequences from reptiles. Methods Mol Biol 2008; 422:91-117. [PMID: 18629663 DOI: 10.1007/978-1-59745-581-7_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 04/07/2023]
Abstract
Phylogenomics research integrating established principles of systematic biology and taking advantage of the wealth of DNA sequences being generated by genome science holds promise for answering long-standing evolutionary questions with orders of magnitude more primary data than in the past. Although it is unrealistic to expect whole-genome initiatives to proceed rapidly for commercially unimportant species such as reptiles, practical approaches utilizing genomic libraries of large-insert clones pave the way for a phylogenomics of species that are nevertheless essential for testing evolutionary hypotheses within a phylogenetic framework. This chapter reviews the case for adopting genome-enabled approaches to evolutionary studies and outlines a program for using bacterial artificial chromosome (BAC) libraries or plasmid libraries as a basis for completing "genome scans" of reptiles. We have used BACs to close a critical gap in the genome database for Reptilia, the sister group of mammals, and present the methodological approaches taken to achieve this as a guideline for designing similar comparative studies. In addition, we provide a detailed step-by-step protocol for BAC-library screening and shotgun sequencing of specific clones containing target genes of evolutionary interest. Taken together, the genome scanning and shotgun sequencing techniques offer complementary diagnostic potential and can substantially increase the scale and power of analyses aimed at testing evolutionary hypotheses for nonmodel species.
58
Li G, Steel M, Zhang L. More Taxa Are Not Necessarily Better for the Reconstruction of Ancestral Character States. Syst Biol 2008; 57:647-53. [DOI: 10.1080/10635150802203898] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 10/21/2022]
Affiliation(s)
- Guoliang Li
- Department of Computer Science, National University of Singapore, Singapore
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
- Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore
59
Abstract
Molecular sequence data have been sampled from 10% of all species known to science. Although it is not yet feasible to assemble these data into a single phylogenetic tree of life, it is possible to quantify how much phylogenetic signal is present. Analysis of 14,289 phylogenies built from 2.6 million sequences in GenBank suggests that signal is strong in vertebrates and specific groups of nonvertebrate model organisms. Across eukaryotes, however, although phylogenetic evidence is very broadly distributed, for the average species in the database it is equivalent to less than one well-supported gene tree. This analysis shows that a stronger sampling effort aimed at genomic depth, in addition to taxonomic breadth, will be required to build high-resolution phylogenetic trees at this scale.
Affiliation(s)
- Michael J Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.
60
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
61
Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol 2008; 3:e247. [PMID: 18085818 PMCID: PMC2134963 DOI: 10.1371/journal.pcbi.0030247] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Received: 07/31/2007] [Accepted: 10/30/2007] [Indexed: 02/01/2023]
Abstract
Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. Therefore, it enables us to find genes that were inactivated long after their birth, which were likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome that were all lost at least 50 My after their birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy to use when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep indicative of such an event has been erased.
62
Elango N, Kim SH, Vigoda E, Yi SV. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput Biol 2008; 4:e1000015. [PMID: 18463707 PMCID: PMC2265638 DOI: 10.1371/journal.pcbi.1000015] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 09/12/2007] [Accepted: 01/30/2008] [Indexed: 11/19/2022]
Abstract
Transitions at CpG dinucleotides, referred to as “CpG substitutions”, are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a “distance-decaying” hypothesis, positing that due to the molecular mechanism of a CpG substitution, rates are correlated with the stability of double-stranded DNA surrounding each CpG dinucleotide, and the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with our “distance-decaying” hypothesis, rates of CpG substitution are strongly (negatively) correlated with regional G+C content. The influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends up to 1,500–2,000 bp centered on each CpG site. We also show that the distance-decaying relationship persisted when we controlled for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a “distance-decaying” relationship. Our results highlight an example of the distinctive properties of methylation-dependent substitutions versus substitutions mostly arising from errors during DNA replication. Furthermore, the negative relationship between G+C content and CpG rates may provide an explanation for the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.
Mutations are raw materials of evolution. Earlier studies have shown that mutations occur at different frequencies in different genomic regions. By investigating the patterns and causes of such “regional” variation of mutations, we can better understand the mechanisms underlying mutagenesis. In the human and other mammalian genomes, the most common type of mutation is caused by DNA methylation, which targets cytosines followed by guanine (CpG dinucleotides). Methylated cytosines are then subject to spontaneous deamination, which will cause a C to T (or G to A) transition (CpG substitution). Because this mutational process is unique to CpG substitutions, we reasoned that they might show different patterns of variability from other substitutions. Using high quality genomic sequences from primates and by separately analyzing variability of CpG substitutions and other substitutions, we demonstrate that CpG substitutions occur approximately 15 times more frequently than other substitutions, and show a distinctive pattern of regional variability. In particular, we propose and provide evidence that because the deamination step requires temporary strand separation, G+C composition near a target CpG (1,500–2,000 bp in each direction) affects the probability of a CpG substitution. Incorporating the difference in CpG and other substitutions discovered in this study will help build more realistic evolutionary models.
Affiliation(s)
- Navin Elango
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Seong-Ho Kim
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- NISC Comparative Sequencing Program
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Eric Vigoda
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Soojin V. Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
63
Muffato M, Crollius HR. Paleogenomics in vertebrates, or the recovery of lost genomes from the mist of time. Bioessays 2008; 30:122-34. [DOI: 10.1002/bies.20707] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
64
Blanchette M, Diallo AB, Green ED, Miller W, Haussler D. Computational reconstruction of ancestral DNA sequences. Methods Mol Biol 2008; 422:171-84. [PMID: 18629667 DOI: 10.1007/978-1-59745-581-7_11] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
This chapter introduces the problem of ancestral sequence reconstruction: given a set of extant orthologous DNA genomic sequences (or even whole-genomes), together with a phylogenetic tree relating these sequences, predict the DNA sequence of all ancestral species in the tree. Blanchette et al. (1) have shown that for certain sets of species (in particular, for eutherian mammals), very accurate reconstruction can be obtained. We explain the main steps involved in this process, including multiple sequence alignment, insertion and deletion inference, substitution inference, and gene arrangement inference. We also describe a simulation-based procedure to assess the accuracy of the reconstructed sequences. The whole reconstruction process is illustrated using a set of mammalian sequences from the CFTR region.
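One of the steps this abstract lists, substitution inference, can be illustrated with Fitch's small-parsimony algorithm applied column by column to a fixed alignment. This is a generic textbook sketch, not the chapter's procedure: the nested-tuple tree encoding, the species names, the toy sequences, and the use of `N` for ambiguous root positions are all assumptions made for the example.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    tree is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def reconstruct_root(tree, alignment):
    """Infer the root sequence of a gap-free alignment, one column at a time."""
    length = len(next(iter(alignment.values())))
    root, total_changes = [], 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        states, cost = fitch(tree, column)
        # Report 'N' when parsimony cannot pick a single root state.
        root.append(next(iter(states)) if len(states) == 1 else "N")
        total_changes += cost
    return "".join(root), total_changes


if __name__ == "__main__":
    # Hypothetical four-species tree and toy sequences.
    tree = (("human", "chimp"), ("mouse", "rat"))
    alignment = {"human": "ACGT", "chimp": "ACGA", "mouse": "ATGT", "rat": "ATGC"}
    print(reconstruct_root(tree, alignment))  # ('ANGT', 3)
```

Parsimony is used here only because it fits in a few lines; likelihood-based inference, and the separate handling of insertions and deletions that the abstract mentions, would need additional machinery.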
65
Rosenbloom K, Taylor J, Schaeffer S, Kent J, Haussler D, Miller W. Phylogenomic resources at the UCSC Genome Browser. Methods Mol Biol 2008; 422:133-44. [PMID: 18629665 DOI: 10.1007/978-1-59745-581-7_9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022]
Abstract
The UC Santa Cruz Genome Browser provides a number of resources that can be used for phylogenomic studies, including (1) whole-genome sequence data from a number of vertebrate species, (2) pairwise alignments of the human genome sequence to a number of other vertebrate genomes, (3) a simultaneous alignment of 17 vertebrate genomes (most of them incompletely sequenced) that covers all of the human sequence, (4) several independent sets of multiple alignments covering 1% of the human genome (ENCODE regions), (5) extensive sequence annotation for interpreting those sequences and alignments, and (6) sequence, alignments, and annotations from certain other species, including an alignment of nine insect genomes. We illustrate the use of these resources in the context of assigning rare genomic changes to the branch of the phylogenetic tree where they appear to have occurred, or of looking for evidence supporting a particular possible tree topology. Sample source code for performing such studies is available.
Affiliation(s)
- Kate Rosenbloom
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA
66
Akashi H, Goel P, John A. Ancestral inference and the study of codon bias evolution: implications for molecular evolutionary analyses of the Drosophila melanogaster subgroup. PLoS One 2007; 2:e1065. [PMID: 17957249 PMCID: PMC2020436 DOI: 10.1371/journal.pone.0001065] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 06/08/2007] [Accepted: 09/21/2007] [Indexed: 11/18/2022]
Abstract
Reliable inference of ancestral sequences can be critical to identifying both patterns and causes of molecular evolution. Robustness of ancestral inference is often assumed among closely related species, but tests of this assumption have been limited. Here, we examine the performance of inference methods for data simulated under scenarios of codon bias evolution within the Drosophila melanogaster subgroup. Genome sequence data for multiple, closely related species within this subgroup make it an important system for studying molecular evolutionary genetics. The effects of asymmetric and lineage-specific substitution rates (i.e., varying levels of codon usage bias and departures from equilibrium) on the reliability of ancestral codon usage were investigated. Maximum parsimony inference, which has been widely employed in analyses of Drosophila codon bias evolution, was compared to an approach that attempts to account for uncertainty in ancestral inference by weighting ancestral reconstructions by their posterior probabilities. The latter approach employs maximum likelihood estimation of rate and base composition parameters. For equilibrium and most non-equilibrium scenarios that were investigated, the probabilistic method appears to generate reliable ancestral codon bias inferences for molecular evolutionary studies within the D. melanogaster subgroup. These reconstructions are more reliable than parsimony inference, especially when codon usage is strongly skewed. However, inference biases are considerable for both methods under particular departures from stationarity (i.e., when adaptive evolution is prevalent). Reliability of inference can be sensitive to branch lengths, asymmetry in substitution rates, and the locations and nature of lineage-specific processes within a gene tree. Inference reliability, even among closely related species, can be strongly affected by (potentially unknown) patterns of molecular evolution in lineages ancestral to those of interest.
Affiliation(s)
- Hiroshi Akashi
- Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University, State College, Pennsylvania, United States of America.
67
Abstract
Multi-sequence alignments of large genomic regions are at the core of many computational genome-annotation approaches aimed at identifying coding regions, RNA genes, regulatory regions, and other functional features. Such alignments also underlie many genome-evolution studies. Here we review recent computational advances in the area of multi-sequence alignment, focusing on methods suitable for aligning whole vertebrate genomes. We introduce the key algorithmic ideas in use today, and identify publicly available resources for computing, accessing, and visualizing genomic alignments. Finally, we describe the latest alignment-based approaches to identify and characterize various types of functional sequences. Key areas of research are identified and directions for future improvements are suggested.
Affiliation(s)
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada.
68
Oliveri RS. Epigenetic dedifferentiation of somatic cells into pluripotency: cellular alchemy in the age of regenerative medicine? Regen Med 2007; 2:795-816. [PMID: 17907932 DOI: 10.2217/17460751.2.5.795] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Ever since the derivation of the first human embryonic stem cell line, hopes have persisted for the treatment of a wide range of cellular degenerative diseases. However, significant immuno-incompatibility between donor cells and recipient patients remains an unsolved challenge. Currently, three main strategies are investigated in humans to create autologous pluripotent stem cells: somatic cell nuclear transfer, cell fusion and cell extract incubation. All methods exploit the fact that a somatic genome is amenable to epigenetic dedifferentiation into a more plastic state, presumably through direct exposure to and manipulation by heterologous transcriptional factors. Epigenetic reprogramming includes profound modifications of chromatin structure, but the responsible mechanisms that work in toti- and pluripotent cells remain largely unknown. This review presents a brief introduction to stem cell terminology and epigenetics, followed by a critical examination of the predominant methodologies involved. Finally, the search for specific reprogramming factors is discussed, and obstacles for the clinical implementation of reprogrammed cells are addressed.
Affiliation(s)
- Roberto S Oliveri
- The Juliane Marie Center for Children, Women, and Reproduction, Laboratory of Reproductive Biology, Rigshospitalet, Blegdamsvej 9, DK-2100 Copenhagen, Denmark.
69
Diallo AB, Makarenkov V, Blanchette M. Exact and heuristic algorithms for the Indel Maximum Likelihood Problem. J Comput Biol 2007; 14:446-61. [PMID: 17572023 DOI: 10.1089/cmb.2007.a006] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, which we call the Indel Maximum Likelihood Problem (IMLP), is an important step toward the reconstruction of ancestral genomic sequences, and is important for studying evolutionary processes, genome function, adaptation and convergence. We solve the IMLP using a new type of tree hidden Markov model whose states correspond to single-base evolutionary scenarios and where transitions model dependencies between neighboring columns. The standard Viterbi and Forward-backward algorithms are optimized to produce the most likely ancestral reconstruction and to compute the level of confidence associated with specific regions of the reconstruction. A heuristic is presented to make the method practical for large data sets, while retaining an extremely high degree of accuracy. The methods are illustrated on a 1-Mb alignment of the CFTR regions from 12 mammals.
Affiliation(s)
- Abdoulaye Banire Diallo
- McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, Québec, Canada
70
Rolland M, Jensen MA, Nickle DC, Yan J, Learn GH, Heath L, Weiner D, Mullins JI. Reconstruction and function of ancestral center-of-tree human immunodeficiency virus type 1 proteins. J Virol 2007; 81:8507-14. [PMID: 17537854 PMCID: PMC1951385 DOI: 10.1128/jvi.02683-06] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 11/20/2022]
Abstract
The extensive diversity of human immunodeficiency virus type 1 (HIV-1) and its capacity to mutate and escape host immune responses are major challenges for AIDS vaccine development. Ancestral sequences, which minimize the genetic distance to circulating strains, provide an opportunity to design immunogens with the potential to elicit broad recognition of HIV epitopes. We developed a phylogenetics-informed algorithm to reconstruct ancestral HIV sequences, called Center of Tree (COT). COT sequences have potentially significant benefits over isolate-based strategies, as they minimize the evolutionary distances to circulating strains. COT sequences are designed to surmount the potential pitfalls stemming from sampling bias with the consensus method and outlier bias with the most-recent-common-ancestor approach. We computationally derived COT sequences from circulating HIV-1 subtype B sequences for the genes encoding the major viral structural protein (Gag) and two regulatory proteins, Tat and Nef. COT genes were synthesized de novo and expressed in mammalian cells, and the proteins were characterized. COT Gag was shown to generate virus-like particles, while COT Tat transactivated gene expression from the HIV-1 long terminal repeat and COT Nef mediated downregulation of cell surface major histocompatibility complex class I. Thus, retrodicted ancestral COT proteins can retain the biological functions of extant HIV-1 proteins. Additionally, COT proteins were immunogenic, as they elicited antigen-specific cytotoxic T-lymphocyte responses in mice. These data support the utility of the COT approach to create novel and biologically active ancestral proteins as a starting point for studies of the structure, function, and biological fitness of highly variable genes, as well as for the rational design of globally relevant vaccine candidates.
MESH Headings
- AIDS Vaccines/genetics
- AIDS Vaccines/immunology
- Algorithms
- Amino Acid Sequence
- Animals
- Antigens, Viral/classification
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Base Sequence
- Directed Molecular Evolution/methods
- Epitopes/genetics
- Epitopes/immunology
- Female
- Gene Products, gag/classification
- Gene Products, gag/genetics
- Gene Products, gag/immunology
- Gene Products, nef/classification
- Gene Products, nef/genetics
- Gene Products, nef/immunology
- Gene Products, tat/classification
- Gene Products, tat/genetics
- Gene Products, tat/immunology
- HIV-1/genetics
- HIV-1/immunology
- Humans
- Mice
- Mice, Inbred BALB C
- Molecular Sequence Data
- Phylogeny
- nef Gene Products, Human Immunodeficiency Virus
- tat Gene Products, Human Immunodeficiency Virus
Affiliation(s)
- Morgane Rolland
- Department of Microbiology SC-42, University of Washington, Seattle, WA 98195-8070, USA
|
71
|
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HXZ, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AFA, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE, Zwieg AS. Evolutionary and biomedical insights from the rhesus macaque genome. Science 2007; 316:222-34. 
[PMID: 17431167 DOI: 10.1126/science.1139247] [Citation(s) in RCA: 1002] [Impact Index Per Article: 58.9] [Indexed: 12/12/2022]
Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
|
72
|
Ogden TH, Rosenberg MS. Alignment and Topological Accuracy of the Direct Optimization approach via POY and Traditional Phylogenetics via ClustalW + PAUP*. Syst Biol 2007; 56:182-93. [PMID: 17454974 DOI: 10.1080/10635150701281102] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 10/23/2022]
Abstract
Direct optimization frameworks for simultaneously estimating alignments and phylogenies have recently been developed. One such method, implemented in the program POY, is becoming more common for analyses of variable length sequences (e.g., analyses using ribosomal genes) and for combined evidence analyses (morphology + multiple genes). Simulation of sequences containing insertion and deletion events was performed in order to directly compare a widely used method of multiple sequence alignment (ClustalW) and subsequent parsimony analysis in PAUP* with direct optimization via POY. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (clocklike, non-clocklike, and ultrametric). Alignment accuracy scores for the implied alignments from POY and the multiple sequence alignments from ClustalW were calculated and compared. In almost all cases (99.95%), ClustalW produced more accurate alignments than POY-implied alignments, judged by the proportion of correctly identified homologous sites. Topological accuracy (distance to the true tree) for POY topologies and topologies generated under parsimony in PAUP* from the ClustalW alignments were also compared. In 44.94% of the cases, Clustal alignment tree reconstructions via PAUP* were more accurate than POY, whereas in 16.71% of the cases POY reconstructions were more topologically accurate (38.38% of the time they were equally accurate). Comparisons between POY hypothesized alignments and the true alignments indicated that, on average, as alignment error increased, topological accuracy decreased.
Affiliation(s)
- T Heath Ogden
- Department of Biological Sciences, Idaho State University, Idaho 83209, USA.
|
73
|
Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W. Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 2007; 17:413-21. [PMID: 17322288 PMCID: PMC1832088 DOI: 10.1101/gr.5918807] [Citation(s) in RCA: 316] [Impact Index Per Article: 18.6] [Received: 09/01/2006] [Accepted: 12/20/2006] [Indexed: 11/24/2022]
Abstract
The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, armadillo, elephant, and opossum to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny. We also expanded our species sampling by including sequence data from >30 ongoing genome projects, followed by PCR and sequencing validation of each indel in additional taxa. Our data provide support for a sister-group relationship between Afrotheria and Xenarthra (the Atlantogenata hypothesis), which is in turn the sister-taxon to Boreoeutheria. We failed to recover any indels in support of a basal position for Xenarthra (Epitheria), which is suggested by morphology and a recent retroposon analysis, or a hypothesis with Afrotheria basal (Exafricoplacentalia), which is favored by phylogenetic analysis of large nuclear gene data sets. In addition, we identified two retroposon insertions that also support Atlantogenata and none for the alternative hypotheses. A revised molecular timescale based on these phylogenetic inferences suggests Afrotheria and Xenarthra diverged from other placental mammals approximately 103 (95-114) million years ago. We discuss the impacts of this topology on earlier phylogenetic reconstructions and repeat-based inferences of phylogeny.
Affiliation(s)
- William J Murphy
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA.
|
74
|
Abstract
We propose an approach for identifying microinversions across different species and show that microinversions provide a source of low-homoplasy evolutionary characters. These characters may be used as "certificates" to verify different branches in a phylogenetic tree, turning the challenging problem of phylogeny reconstruction into a relatively simple algorithmic problem. We estimate that there exist hundreds of thousands of microinversions in genomes of mammals from comparative sequencing projects, an untapped source of new phylogenetic characters.
Affiliation(s)
- M J Chaisson
- Bioinformatics Program and Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, USA.
|
75
|
Chindelevitch L, Li Z, Blais E, Blanchette M. On the inference of parsimonious indel evolutionary scenarios. J Bioinform Comput Biol 2006; 4:721-44. [PMID: 16960972 DOI: 10.1142/s0219720006002168] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 09/24/2005] [Revised: 12/02/2005] [Accepted: 12/31/2005] [Indexed: 11/18/2022]
Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing a most parsimonious scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, called the Indel Parsimony Problem, is a crucial component of the problem of ancestral genome reconstruction, and its solution provides valuable information to many genome functional annotation approaches. We first show that the problem is NP-complete. Second, we provide an algorithm, based on the fractional relaxation of an integer linear programming formulation. The algorithm is fast in practice, and the solutions it produces are, in most cases, provably optimal. We describe a divide-and-conquer approach that makes it possible to solve very large instances on a simple desktop machine, while retaining guaranteed optimality. Our algorithms are tested and shown efficient and accurate on a set of 1.8 Mb mammalian orthologous sequences in the CFTR region.
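To make the parsimony criterion in this abstract concrete, the following is a minimal sketch of Fitch's small-parsimony recursion applied to a single alignment column coded as present (1) or absent (0). It is a deliberately simplified stand-in and not the authors' algorithm: the Indel Parsimony Problem allows one event to span several contiguous columns, which this per-column count ignores. The tree, the taxon names, and the function name `fitch` are hypothetical.

```python
# Hypothetical rooted binary tree written as nested tuples of leaf names.
TREE = (("human", "chimp"), ("mouse", "rat"))


def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for this subtree.

    `states` maps each leaf name to 1 (base present) or 0 (gap) in one column.
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes needed
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree, so no extra change at this node
        return common, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


# One column where the rodents carry a gap and the primates do not.
column = {"human": 1, "chimp": 1, "mouse": 0, "rat": 0}
root_states, changes = fitch(TREE, column)
```

For this column a single insertion or deletion on one of the two branches below the root explains the pattern, so the count is 1 and the root state stays ambiguous.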
Affiliation(s)
- Leonid Chindelevitch
- School of Computer Science, McGill University, 3480 University Street, Montreal, Quebec, H3A 2A7, Canada.
|
76
|
Abstract
Paleogenomics propels the meaning of genomic studies back through hundreds of millions of years of deep time. Now that the genome of the echinoid Strongylocentrotus purpuratus is sequenced, the operation of its genes can be interpreted in light of the well-understood echinoderm fossil record. Characters that first appear in Early Cambrian forms are still characteristic of echinoderms today. Key genes for one of these characters, the biomineralized tissue stereom, can be identified in the S. purpuratus genome and are likely to be the same genes that were involved with stereom formation in the earliest echinoderms some 520 million years ago.
Affiliation(s)
- David J Bottjer
- Department of Earth Sciences, University of Southern California, Los Angeles, CA 90089-0740, USA.
|
77
|
Kim J, Sinha S. Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment. Bioinformatics 2006; 23:289-97. [PMID: 17110370 DOI: 10.1093/bioinformatics/btl578] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the 'gaps' in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework. RESULTS Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most likely scenario of insertions and deletions consistent with an input multiple alignment. It is also capable of modifying the given alignment so as to obtain a better agreement with the evolutionary model. We find close to optimal performance and substantial improvement over alternative methods, in tests of Indelign on synthetic data. We use Indelign to analyze regulatory sequences in Drosophila, and find an excess of insertions over deletions, which is different from what has been reported for neutral sequences. AVAILABILITY The Indelign program may be downloaded from the website http://veda.cs.uiuc.edu/indelign/ SUPPLEMENTARY INFORMATION Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois, Urbana-Champaign, Urbana, IL, USA
|
78
|
Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 2006; 174:1407-20. [PMID: 16951058 PMCID: PMC1667096 DOI: 10.1534/genetics.106.062455] [Citation(s) in RCA: 188] [Impact Index Per Article: 10.4] [Received: 06/23/2006] [Accepted: 08/08/2006] [Indexed: 11/18/2022]
Abstract
We report herein the application of a set of algorithms to identify a large number (2869) of single-copy orthologs (COSII), which are shared by most, if not all, euasterid plant species as well as the model species Arabidopsis. Alignments of the orthologous sequences across multiple species enabled the design of "universal PCR primers," which can be used to amplify the corresponding orthologs from a broad range of taxa, including those lacking any sequence databases. Functional annotation revealed that these conserved, single-copy orthologs encode a higher-than-expected frequency of proteins transported and utilized in organelles and a paucity of proteins associated with cell walls, protein kinases, transcription factors, and signal transduction. The enabling power of this new ortholog resource was demonstrated in phylogenetic studies, as well as in comparative mapping across the plant families tomato (family Solanaceae) and coffee (family Rubiaceae). The combined results of these studies provide compelling evidence that (1) the ancestral species that gave rise to the core euasterid families Solanaceae and Rubiaceae had a basic chromosome number of x=11 or 12, and (2) no whole-genome duplication event (i.e., polyploidization) occurred immediately prior to or after the radiation of either Solanaceae or Rubiaceae, as has been recently suggested.
Affiliation(s)
- Feinan Wu
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA
|
79
|
Ma J, Zhang L, Suh BB, Raney BJ, Burhans RC, Kent WJ, Blanchette M, Haussler D, Miller W. Reconstructing contiguous regions of an ancestral genome. Genome Res 2006; 16:1557-65. [PMID: 16983148 PMCID: PMC1665639 DOI: 10.1101/gr.5383506] [Citation(s) in RCA: 225] [Impact Index Per Article: 12.5] [Indexed: 11/24/2022]
Abstract
This article analyzes mammalian genome rearrangements at higher resolution than has been published to date. We identify 3171 intervals, covering approximately 92% of the human genome, within which we find no rearrangements larger than 50 kilobases (kb) in the lineages leading to human, mouse, rat, and dog from their most recent common ancestor. Combining intervals that are adjacent in all contemporary species produces 1338 segments that may contain large insertions or deletions but that are free of chromosome fissions or fusions as well as inversions or translocations >50 kb in length. We describe a new method for predicting the ancestral order and orientation of those intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last ancestor of human, dogs, and most other placental mammals.
Affiliation(s)
- Jian Ma
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA.
|
80
|
Ogden TH, Rosenberg MS. How should gaps be treated in parsimony? A comparison of approaches using simulation. Mol Phylogenet Evol 2006; 42:817-26. [PMID: 17011794 DOI: 10.1016/j.ympev.2006.07.021] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 02/07/2006] [Revised: 07/07/2006] [Accepted: 07/22/2006] [Indexed: 10/24/2022]
Abstract
Simulation with indels was used to produce alignments where true site homologies in DNA sequences were known; the gaps from these datasets were removed and the sequences were then aligned to produce hypothesized alignments. Both alignments were then analyzed under three widely used methods of treating gaps during tree reconstruction under the maximum parsimony principle. With the true alignments, for many cases (82%), there was no difference in topological accuracy for the different methods of gap coding. However, in cases where a difference was present, coding gaps as a fifth state character or as separate presence/absence characters outperformed treating gaps as unknown/missing data nearly 90% of the time. For the hypothesized alignments, on average, all gap treatment approaches performed equally well. Data sets with higher sequence divergence and more pectinate tree shapes with variable branch lengths are more affected by gap coding than datasets associated with shallower non-pectinate tree shapes.
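The three gap treatments compared in this abstract can be illustrated by counting how many character differences each one sees between two aligned sequences. This is a rough sketch under stated assumptions, not the study's simulation or a parsimony tree search: the toy alignment, the function names, and the simplified presence/absence coding (each maximal gap run counted as one binary character, with no "inapplicable" state) are all hypothetical.

```python
import re

# Hypothetical toy alignment; "-" marks a gap.
ALIGNMENT = {
    "taxon1": "ACG--TA",
    "taxon2": "ACGTTTA",
    "taxon3": "A-G--TA",
}


def gap_intervals(seq):
    """Set of (start, end) spans, one per maximal run of gap characters."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def differences(a, b, treatment):
    """Count character differences between two aligned sequences."""
    # Substitutions are counted only where both sequences have a base.
    substitutions = sum(x != y for x, y in zip(a, b) if "-" not in (x, y))
    if treatment == "missing":
        # Gaps are unknown data, so gapped columns contribute nothing.
        return substitutions
    if treatment == "fifth_state":
        # A gap is just another state, so every gapped column can differ.
        return sum(x != y for x, y in zip(a, b))
    if treatment == "presence_absence":
        # Each distinct gap run is one extra binary character.
        return substitutions + len(gap_intervals(a) ^ gap_intervals(b))
    raise ValueError(f"unknown treatment: {treatment}")


for mode in ("missing", "fifth_state", "presence_absence"):
    print(mode, differences(ALIGNMENT["taxon1"], ALIGNMENT["taxon2"], mode))
```

In this toy case the two-column gap separating taxon1 from taxon2 is invisible under the missing-data treatment, counts twice as a fifth state, and counts once as a presence/absence character.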
Affiliation(s)
- T Heath Ogden
- Department of Biological Sciences, Idaho State University, Pocatello, ID 83209, USA.
|
81
|
Berglund-Sonnhammer AC, Steffansson P, Betts MJ, Liberles DA. Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. J Mol Evol 2006; 63:240-50. [PMID: 16830091 DOI: 10.1007/s00239-005-0096-1] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 04/22/2005] [Accepted: 04/15/2006] [Indexed: 10/24/2022]
Abstract
Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.
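The mapping of a gene tree onto a species tree mentioned in this abstract is commonly done with a last-common-ancestor (LCA) reconciliation, sketched below as an illustration. It counts inferred duplications only; it is not the Softparsmap implementation and does not count losses, modify weakly supported branches, or reroot trees. The species tree, the `species_gene` leaf naming convention, and all function names are assumptions made for this example.

```python
# Hypothetical species tree stored as child -> parent; "mammals" is the root.
SPECIES_PARENT = {
    "human": "primates",
    "chimp": "primates",
    "primates": "mammals",
    "mouse": "mammals",
}


def ancestors(species):
    """Path from a species-tree node up to the root, starting with the node."""
    path = [species]
    while species in SPECIES_PARENT:
        species = SPECIES_PARENT[species]
        path.append(species)
    return path


def lca(a, b):
    """Lowest species-tree node that is an ancestor of (or equal to) both."""
    seen = set(ancestors(a))
    return next(node for node in ancestors(b) if node in seen)


def reconcile(gene_tree):
    """Return (species node this gene node maps to, duplications at or below it).

    Gene trees are nested 2-tuples; leaves are strings such as "human_g1",
    where the text before the underscore names the species.
    """
    if isinstance(gene_tree, str):
        return gene_tree.split("_")[0], 0
    left_sp, left_dups = reconcile(gene_tree[0])
    right_sp, right_dups = reconcile(gene_tree[1])
    here = lca(left_sp, right_sp)
    # A node mapping to the same species node as one of its children
    # is inferred to be a duplication rather than a speciation.
    is_duplication = here in (left_sp, right_sp)
    return here, left_dups + right_dups + int(is_duplication)


concordant = (("human_g1", "chimp_g1"), "mouse_g1")
discordant = (("human_g1", "mouse_g1"), ("chimp_g2", "mouse_g2"))
```

A gene tree that matches the species tree needs no duplications, while the second tree forces one duplication at its root because both child clades already span the whole species tree.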
|
82
|
Abstract
The human genome project has had an impact on both biological research and its political organization; this review focuses primarily on the scientific novelty that has emerged from the project but also touches on its political dimensions. The project has generated both anticipated and novel information; in the latter category are the description of the unusual distribution of genes, the prevalence of non-protein-coding genes, and the extraordinary evolutionary conservation of some regions of the genome. The applications of the sequence data are just starting to be felt in basic, rather than therapeutic, biomedical research and in the vibrant human origins and variation debates. The political impact of the project is in the unprecedented extent to which directed funding programs have emerged as drivers of basic research and the organization of the multidisciplinary groups that are needed to utilize the human DNA sequence.
Affiliation(s)
- Peter F R Little
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney 2074, New South Wales, Australia.
|
83
|
Cui L, Leebens-Mack J, Wang LS, Tang J, Rymarquis L, Stern DB, dePamphilis CW. Adaptive evolution of chloroplast genome structure inferred using a parametric bootstrap approach. BMC Evol Biol 2006; 6:13. [PMID: 16469102 PMCID: PMC1421436 DOI: 10.1186/1471-2148-6-13] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 06/15/2005] [Accepted: 02/09/2006] [Indexed: 11/29/2022]
Abstract
Background: Genome rearrangements influence gene order and configuration of gene clusters in all genomes. Most land plant chloroplast DNAs (cpDNAs) share a highly conserved gene content and, with notable exceptions, a largely co-linear gene order. Conserved gene orders may reflect a slow intrinsic rate of neutral chromosomal rearrangements, or selective constraint. It is unknown to what extent observed changes in gene order are random or adaptive. We investigate the influence of natural selection on gene order in association with increased rate of chromosomal rearrangement. We use a novel parametric bootstrap approach to test if directional selection is responsible for the clustering of functionally related genes observed in the highly rearranged chloroplast genome of the unicellular green alga Chlamydomonas reinhardtii, relative to ancestral chloroplast genomes. Results: Ancestral gene orders were inferred and then subjected to simulated rearrangement events under the random breakage model with varying ratios of inversions and transpositions. We found that adjacent chloroplast genes in C. reinhardtii were located on the same strand much more frequently than in simulated genomes that were generated under a random rearrangement process (increased sidedness; p < 0.0001). In addition, functionally related genes were found to be more clustered than those evolved under random rearrangements (p < 0.0001). We report evidence of co-transcription of neighboring genes, which may be responsible for the observed gene clusters in C. reinhardtii cpDNA. Conclusion: Simulations and experimental evidence suggest that both selective maintenance and directional selection for gene clusters are determinants of chloroplast gene order.
Affiliation(s)
- Liying Cui
- Department of Biology, Institute of Molecular Evolutionary Genetics, and Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Jim Leebens-Mack
- Department of Biology, Institute of Molecular Evolutionary Genetics, and Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Li-San Wang
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Linda Rymarquis
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- David B Stern
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Claude W dePamphilis
- Department of Biology, Institute of Molecular Evolutionary Genetics, and Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
|
84
|
Lucena B, Haussler D. Counterexample to a claim about the reconstruction of ancestral character states. Syst Biol 2006; 54:693-5. [PMID: 16126665 DOI: 10.1080/10635150590950344] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Brian Lucena
- Division of Computer Science, University of California, Berkeley, California, USA.
|
85
|
|
86
|
Phylogenetic Profiling of Insertions and Deletions in Vertebrate Genomes. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11732990_23] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 02/01/2023]
|
87
|
Siepel A, Pollard KS, Haussler D. New Methods for Detecting Lineage-Specific Selection. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11732990_17] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.9] [Indexed: 12/04/2022]
|
88
|
Leebens-Mack J, Vision T, Brenner E, Bowers JE, Cannon S, Clement MJ, Cunningham CW, dePamphilis C, deSalle R, Doyle JJ, Eisen JA, Gu X, Harshman J, Jansen RK, Kellogg EA, Koonin EV, Mishler BD, Philippe H, Pires JC, Qiu YL, Rhee SY, Sjölander K, Soltis DE, Soltis PS, Stevenson DW, Wall K, Warnow T, Zmasek C. Taking the first steps towards a standard for reporting on phylogenies: Minimum Information About a Phylogenetic Analysis (MIAPA). OMICS: A Journal of Integrative Biology 2006; 10:231-7. [PMID: 16901231 PMCID: PMC3167193 DOI: 10.1089/omi.2006.10.231] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022]
Abstract
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Affiliation(s)
- Jim Leebens-Mack
- Department of Biology, Institute of Molecular Evolutionary Genetics, and Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA.
|
89
|
Pardi F, Goldman N. Species choice for comparative genomics: being greedy works. PLoS Genet 2005; 1:e71. [PMID: 16327885 PMCID: PMC1298936 DOI: 10.1371/journal.pgen.0010071] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 08/23/2005] [Accepted: 10/27/2005] [Indexed: 12/22/2022]
Abstract
Several projects investigating genetic function and evolution through sequencing and comparison of multiple genomes are now underway. These projects consume many resources, and appropriate planning should be devoted to choosing which species to sequence, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalization of this problem surprisingly shows that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria influencing species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, suggesting our results' broad relevance in scientific policy decisions. What would happen if sequencing centres around the world were to choose genomes without consulting each other and without devising long-term strategies? When several parties are involved in decisions with interacting consequences, experience teaches that cooperation and planning are usually necessary to guarantee the best result. Similarly, in computer science, “greedy” algorithms—which construct solutions by iteratively taking the best immediate choice—are rarely the best option to solve a problem. The authors show, however, that in the context of choosing species for comparative genomics, cooperation and planning can be kept to a minimum without affecting the quality of the global result: a greedy algorithm applied to the problem of maximising the evolutionary divergence among species chosen from a known phylogeny is proven to guarantee optimal solutions.
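The greedy strategy described in this abstract can be sketched on a toy phylogeny: start from the most divergent pair of species, then repeatedly add whichever species increases the total branch length of the connecting subtree the most. The tree, branch lengths, and function names below are hypothetical, and the brute-force check is included only to compare against the greedy result on this small example.

```python
from itertools import combinations

# Hypothetical toy tree: node -> (parent, branch length); "root" has no entry.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.5),
    "n1": ("root", 2.0),
    "C": ("n2", 0.5),
    "D": ("n2", 3.0),
    "n2": ("root", 1.0),
    "E": ("root", 4.0),
}
LEAVES = ["A", "B", "C", "D", "E"]


def diversity(tree, chosen):
    """Total branch length of the minimal subtree connecting the chosen leaves."""
    below = {}  # node -> number of chosen leaves at or below that node
    for leaf in chosen:
        node = leaf
        while node in tree:
            below[node] = below.get(node, 0) + 1
            node = tree[node][0]
    # The branch above a node is used only if it separates chosen leaves.
    return sum(tree[n][1] for n, count in below.items() if count < len(chosen))


def greedy_select(tree, leaves, k):
    """Pick k leaves (k >= 2): best pair first, then the best single addition."""
    chosen = list(max(combinations(leaves, 2), key=lambda p: diversity(tree, p)))
    while len(chosen) < k:
        rest = [leaf for leaf in leaves if leaf not in chosen]
        chosen.append(max(rest, key=lambda x: diversity(tree, chosen + [x])))
    return chosen


def brute_force_best(tree, leaves, k):
    """Best achievable diversity over all subsets of size k (for checking only)."""
    return max(diversity(tree, subset) for subset in combinations(leaves, k))


picked = greedy_select(TREE, LEAVES, 3)
```

On this toy tree the greedy choice for every subset size from 2 to 5 reaches the same diversity as exhaustive search, which is the behaviour the abstract says is guaranteed in general.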
Affiliation(s)
- Fabio Pardi
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
|
90
|
Abstract
The chimp was a great start. But the genomes of our other primate relatives will help to reveal a whole lot more, says Carina Dennis. The cover photo by Kevin Langergraber shows the adult female chimpanzee ‘Jolie’ in Kibale National Park, Uganda. This was taken on 16 August 2004, a few weeks before Jolie gave birth to her first infant. This week marks a landmark in the study of our closest living relative: the publication by the Chimpanzee Sequencing and Analysis Consortium of the initial sequence of the chimpanzee genome, together with a comparison with the human genome. The paper describes changes that have shaped human and chimpanzee species since the split from our common ancestor, and hints at what makes us uniquely human: 35 million single-nucleotide substitutions, 5 million small insertions and deletions, local rearrangements and a chromosome fusion. A comparison of gene duplications in chimpanzee and human genomes reveals gene expression differences that may underlie disease susceptibility. A study of primate genomes shows that subtelomeres are hot spots of recent chromosomal duplication and gene conversion. Conservation of Y-linked genes during human evolution is revealed by comparative sequencing in the chimpanzee. The final research paper in this collection fills a big gap in our knowledge: the first chimpanzee fossils ever found show that chimps and early humans inhabited the same environments in which they evolved and diverged. The fossils (three teeth) are from half-million-year-old sediments in Kenya that also yielded fossils of Homo. Four Progress reviews accompany these papers, looking at chimp culture, social behaviour, psychology and cognition. Elsewhere in the issue, researchers talk about working with chimpanzees, a feature summarizes other primate genome projects, and in two Commentaries, important ethical issues surrounding research on great apes are considered.
|
91
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447508 DOI: 10.1002/cfg.422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
92
|
Khamsi R. Ancient mammal genes reconstructed. Nature 2004. [DOI: 10.1038/news041129-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|