1
Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages. Genome Biol Evol 2022; 14:evac119. [PMID: 35880421 PMCID: PMC9366629 DOI: 10.1093/gbe/evac119] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 07/11/2022] [Indexed: 12/02/2022]
Abstract
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Affiliation(s)
- Mario A Cerón-Romero
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Miguel M Fonseca
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Leonardo de Oliveira Martins
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Quadram Institute Bioscience, Norwich, United Kingdom
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
2
Abstract
One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought.
Affiliation(s)
- Warren R Francis
- Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Gert Wörheide
- Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany; GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany; Bavarian State Collection for Paleontology and Geology, Munich, Germany
3
Grealy A, Phillips M, Miller G, Gilbert MTP, Rouillard JM, Lambert D, Bunce M, Haile J. Eggshell palaeogenomics: Palaeognath evolutionary history revealed through ancient nuclear and mitochondrial DNA from Madagascan elephant bird (Aepyornis sp.) eggshell. Mol Phylogenet Evol 2017; 109:151-163. [DOI: 10.1016/j.ympev.2017.01.005] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 08/31/2016] [Revised: 12/20/2016] [Accepted: 01/07/2017] [Indexed: 01/12/2023]
4
Zimin AV, Cornish AS, Maudhoo MD, Gibbs RM, Zhang X, Pandey S, Meehan DT, Wipfler K, Bosinger SE, Johnson ZP, Tharp GK, Marçais G, Roberts M, Ferguson B, Fox HS, Treangen T, Salzberg SL, Yorke JA, Norgren RB. A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biol Direct 2014; 9:20. [PMID: 25319552 PMCID: PMC4214606 DOI: 10.1186/1745-6150-9-20] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 07/18/2014] [Accepted: 10/03/2014] [Indexed: 12/13/2022]
Abstract
Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.
Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.
Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.
Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
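The abstract above reports contig contiguity as an N50 value. As a rough illustration of how that statistic is commonly computed, the following is a minimal Python sketch and not the authors' pipeline; the function name `n50` and the contig lengths are hypothetical. N50 is taken here as the length of the contig at which the running sum of lengths, ordered from longest to shortest, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches at least
    half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kilobases (illustrative values only).
contigs_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs_kb))
```

Because the statistic depends on the total assembled length, N50 values are most comparable between assemblies of the same genome, as in the comparison of MacaM with rheMac2 and CR_1.0.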
Affiliation(s)
- Robert B Norgren
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA.
5
Ogawa LM, Vallender EJ. Evolutionary conservation in genes underlying human psychiatric disorders. Front Hum Neurosci 2014; 8:283. [PMID: 24834046 PMCID: PMC4018557 DOI: 10.3389/fnhum.2014.00283] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/31/2014] [Accepted: 04/16/2014] [Indexed: 01/07/2023]
Abstract
Many psychiatric diseases observed in humans have tenuous or absent analogs in other species. Most notable among these are schizophrenia and autism. One hypothesis has posited that these diseases have arisen as a consequence of human brain evolution, for example, that the same processes that led to advances in cognition, language, and executive function also resulted in novel diseases in humans when dysfunctional. Here, the molecular evolution of the protein-coding regions of genes associated with these and other psychiatric disorders is compared among species. Genes associated with psychiatric disorders are drawn from the literature and orthologous sequences are collected from eleven primate species (human, chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, baboon, marmoset, squirrel monkey, and galago) and 34 non-primate mammalian species. Evolutionary parameters, including dN/dS, are calculated for each gene and compared between disease classes and among species, focusing on humans and primates compared to other mammals, and on large-brained taxa (cetaceans, rhinoceros, walrus, bear, and elephant) compared to their small-brained sister species. Evidence of differential selection in humans to the exclusion of non-human primates was absent; however, elevated dN/dS was detected in catarrhines as a whole, as well as in cetaceans, possibly as part of a more general trend. Although this may suggest that protein changes associated with schizophrenia and autism are not a cost of the higher brain function found in humans, it may also point to insufficiencies in the study of these diseases including incomplete or inaccurate gene association lists and/or a greater role of regulatory changes or copy number variation. Through this work a better understanding of the molecular evolution of the human brain, the pathophysiology of disease, and the genetic basis of human psychiatric disease is gained.
Affiliation(s)
- Lisa M Ogawa
- Division of Neuroscience, New England Primate Research Center, Harvard Medical School, Southborough, MA, USA
- Eric J Vallender
- Division of Neuroscience, New England Primate Research Center, Harvard Medical School, Southborough, MA, USA
6
Abstract
The study of nonhuman primates (NHP) is key to understanding human evolution, in addition to being an important model for biomedical research. NHPs are especially important for translational medicine. There are now exciting opportunities to greatly increase the utility of these models by incorporating Next Generation (NextGen) sequencing into study design. Unfortunately, the draft status of nonhuman genomes greatly constrains what can currently be accomplished with available technology. Although all genomes contain errors, draft assemblies and annotations contain so many mistakes that they make currently available nonhuman primate genomes misleading to investigators conducting evolutionary studies; and these genomes are of insufficient quality to serve as references for NextGen studies. Fortunately, NextGen sequencing can be used in the production of greatly improved genomes. Existing Sanger sequences can be supplemented with NextGen whole genome, and exomic genomic sequences to create new, more complete and correct assemblies. Additional physical mapping, and an incorporation of information about gene structure, can be used to improve assignment of scaffolds to chromosomes. In addition, mRNA-sequence data can be used to economically acquire transcriptome information, which can be used for annotation. Some highly polymorphic and complex regions, for example MHC class I and immunoglobulin loci, will require extra effort to properly assemble and annotate. However, for the vast majority of genes, a modest investment in money, and a somewhat greater investment in time, can greatly improve assemblies and annotations sufficient to produce true, reference grade nonhuman primate genomes. Such resources can reasonably be expected to transform nonhuman primate research.
Affiliation(s)
- Robert B. Norgren
- Address correspondence and reprint requests to Dr. Robert B. Norgren, Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, 985805 Nebraska Medical Center, Omaha, NE 68198 or email
7
Zhang X, Goodsell J, Norgren RB. Limitations of the rhesus macaque draft genome assembly and annotation. BMC Genomics 2012; 13:206. [PMID: 22646658 PMCID: PMC3426473 DOI: 10.1186/1471-2164-13-206] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 12/06/2011] [Accepted: 05/30/2012] [Indexed: 11/30/2022]
Abstract
Finished genome sequences and assemblies are available for only a few vertebrates. Thus, investigators studying many species must rely on draft genomes. Using the rhesus macaque as an example, we document the effects of sequencing errors, gaps in sequence and misassemblies on one automated gene model pipeline, Gnomon. The combination of draft genome with automated gene finding software can result in spurious sequences. We estimate that approximately 50% of the rhesus gene models are missing, incomplete or incorrect. The problems identified in this work likely apply to all draft vertebrate genomes annotated with any automated gene model pipeline and thus represent a pervasive challenge to the analysis of draft genomes.
Affiliation(s)
- Xiongfei Zhang
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
8
Vallender EJ. Expanding whole exome resequencing into non-human primates. Genome Biol 2011; 12:R87. [PMID: 21917143 PMCID: PMC3308050 DOI: 10.1186/gb-2011-12-9-r87] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 04/26/2011] [Revised: 08/03/2011] [Accepted: 09/02/2011] [Indexed: 01/03/2023]
Abstract
Background: Complete exome resequencing has the power to greatly expand our understanding of non-human primate genomes. This includes both a better appreciation of the variation that exists in non-human primate model species and an improved annotation of their genomes. By developing an understanding of the variation between individuals, non-human primate models of human disease can be better developed. This effort is hindered largely by the lack of comprehensive information on specific non-human primate genetic variation and the costs of generating these data. If the tools that have been developed in humans for complete exome resequencing can be applied to closely related non-human primate species, then these difficulties can be circumvented.
Results: Using a human whole exome enrichment technique, chimpanzee and rhesus macaque samples were captured alongside a human sample and sequenced using standard next-generation methodologies. The results from the three species were then compared for efficacy. The chimpanzee sample showed similar coverage levels and distributions following exome capture based on the human genome as the human sample. The rhesus macaque sample showed significant coverage in protein-coding sequence but significantly less in untranslated regions. Both chimpanzee and rhesus macaque showed significant numbers of frameshift mutations compared to self-genomes and suggest a need for further annotation.
Conclusions: Current whole exome resequencing technologies can successfully be used to identify coding-region variation in non-human primates extending into old world monkeys. In addition to identifying variation, whole exome resequencing can aid in better annotation of non-human primate genomes.
Affiliation(s)
- Eric J Vallender
- New England Primate Research Center, Harvard Medical School, One Pine Hill Drive, Southborough, MA 01772, USA.
9
Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors. Genes (Basel) 2011; 2:449-501. [PMID: 24710207 PMCID: PMC3927609 DOI: 10.3390/genes2030449] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/24/2011] [Revised: 06/14/2011] [Accepted: 06/20/2011] [Indexed: 11/17/2022]
Abstract
In view of the fact that appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high quality manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries have identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast with this, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than in the case of the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol. 
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences in as much as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than those due to true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].
10
Vallender EJ, Xie Z, Westmoreland SV, Miller GM. Functional evolution of the trace amine associated receptors in mammals and the loss of TAAR1 in dogs. BMC Evol Biol 2010; 10:51. [PMID: 20167089 PMCID: PMC2838891 DOI: 10.1186/1471-2148-10-51] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 09/17/2009] [Accepted: 02/18/2010] [Indexed: 01/01/2023]
Abstract
Background: The trace amine associated receptor family is a diverse array of GPCRs that arose before the first vertebrates walked on land. Trace amine associated receptor 1 (TAAR1) is a wide spectrum aminergic receptor that acts as a modulator in brain monoaminergic systems. Other trace amine associated receptors appear to relate to environmental perception and show a birth-and-death pattern in mammals similar to olfactory receptors.
Results: Across mammals, avians, and amphibians, the TAAR1 gene is intact and appears to be under strong purifying selection based on rates of amino acid fixation compared to neutral mutations. We have found that in dogs it has become a pseudogene. Our analyses using a comparative genetics approach revealed that the pseudogenization event predated the emergence of the Canini tribe rather than being coincident with canine domestication. By assessing the effects of the TAAR1 agonist β-phenylethylamine on [3H]dopamine uptake in canine striatal synaptosomes and comparing the degree and pattern of uptake inhibition to that seen in other mammals, including TAAR1 knockout mice, wild type mice and rhesus monkey, we found that the TAAR1 pseudogenization event resulted in an uncompensated loss of function.
Conclusion: The gene family has seen expansions among certain mammals, notably rodents, and reductions in others, including primates. By placing the trace amine associated receptors in an evolutionary context we can better understand their function and their potential associations with behavior and neurological disease.
Affiliation(s)
- Eric J Vallender
- New England Primate Research Center, Harvard Medical School, One Pine Hill Drive, Southborough, MA 01772, USA.