1
Wang Y, Tang H, Wang X, Sun Y, Joseph PV, Paterson AH. Detection of colinear blocks and synteny and evolutionary analyses based on utilization of MCScanX. Nat Protoc 2024; 19:2206-2229. [PMID: 38491145 DOI: 10.1038/s41596-024-00968-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 12/20/2023] [Indexed: 03/18/2024]
Abstract
As different taxa evolve, gene order often changes slowly enough that chromosomal 'blocks' with conserved gene orders (synteny) are discernible. The MCScanX toolkit (https://github.com/wyp1125/MCScanX) was published in 2012 as freely available software for the detection of such 'colinear blocks' and subsequent synteny and evolutionary analyses based on genome-wide gene location and protein sequence information. Owing to its simplicity and high efficiency for colinear block detection, MCScanX provides a powerful tool for conducting diverse synteny and evolutionary analyses. Moreover, the detection of colinear blocks has been embraced as an integral step for pangenome graph construction. Here, new application trends of MCScanX are explored, striving to better connect this increasingly used tool to other tools and accelerate insight generation from exponentially growing sequence data. We provide a detailed protocol that covers how to install MCScanX on diverse platforms, tune parameters, prepare input files from data from the National Center for Biotechnology Information, run MCScanX and its visualization and evolutionary analysis tools, and connect MCScanX with external tools, including MCScanX-transposed, Circos and SynVisio. This protocol is easily implemented by users with minimal computational background and is adaptable to new data of interest to them. The data and utility programs for this protocol can be obtained from http://bdx-consulting.com/mcscanx-protocol.
Affiliation(s)
- Yupeng Wang
  - BDX Research & Consulting LLC, Herndon, VA, USA
  - Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
- Haibao Tang
  - Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
- Xiyin Wang
  - Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
  - Center for Genomics, College of Science, North China University of Science and Technology, Tangshan, China
- Ying Sun
  - BDX Research & Consulting LLC, Herndon, VA, USA
- Paule V Joseph
  - Section of Sensory Science and Metabolism, National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
  - National Institute of Nursing Research, Bethesda, MD, USA
- Andrew H Paterson
  - Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
2
Lopes JML, Nascimento LSDQ, Souza VC, de Matos EM, Fortini EA, Grazul RM, Santos MO, Soltis DE, Soltis PS, Otoni WC, Viccini LF. Water stress modulates terpene biosynthesis and morphophysiology at different ploidal levels in Lippia alba (Mill.) N. E. Brown (Verbenaceae). Protoplasma 2024; 261:227-243. [PMID: 37665420 DOI: 10.1007/s00709-023-01890-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 08/18/2023] [Indexed: 09/05/2023]
Abstract
Monoterpenes are the main component in essential oils of Lippia alba. In this species, the chemical composition of essential oils varies with genome size: citral (geraniol and neral) is dominant in diploids and tetraploids, and linalool in triploids. Because environmental stress impacts various metabolic pathways, we hypothesized that stress responses in L. alba could alter the relationship between genome size and essential oil composition. Water stress affects the flowering, production, and reproduction of plants. Here, we evaluated the effect of water stress on morphophysiology, essential oil production, and the expression of genes related to monoterpene synthesis in diploid, triploid, and tetraploid accessions of L. alba cultivated in vitro for 40 days. First, using transcriptome data, we performed de novo gene assembly and identified orthologous genes using phylogenetic and clustering-based approaches. The expression of candidate genes related to terpene biosynthesis was estimated by real-time quantitative PCR. Next, we assessed the expression of these genes under water stress conditions, whereby 1% PEG-4000 was added to MS medium. Water stress modulated L. alba morphophysiology at all ploidal levels. Gene expression and essential oil production were affected in triploid accessions. Polyploid accessions showed greater growth and metabolic tolerance under stress compared to diploids. These results confirm the complex regulation of metabolic pathways such as the production of essential oils in polyploid genomes. In addition, they highlight aspects of genotype and environment interactions, which may be important for the conservation of tropical biodiversity.
Affiliation(s)
- Juliana Mainenti Leal Lopes
  - Department of Biology, Institute of Biological Science, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, 36036-900, Brazil
  - School of Life Science and Environment, Department of Genetic and Biotechnology, University of Trás-Os-Montes and Alto Douro, 5001-801, Vila Real, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1649-004, Lisbon, Portugal
- Vinicius Carius Souza
  - Department of Biology, Institute of Biological Science, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, 36036-900, Brazil
- Elyabe Monteiro de Matos
  - Department of Biology, Institute of Biological Science, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, 36036-900, Brazil
- Evandro Alexandre Fortini
  - Laboratory of Plant Tissue Culture (LCTII), Department of Plant Biology/BIOAGRO, Universidade Federal de Viçosa, Av. P.H. Rolfs S/N, Campus Universitário, Viçosa, MG, 36570-000, Brazil
- Marcelo Oliveira Santos
  - Department of Biology, Institute of Biological Science, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, 36036-900, Brazil
- Douglas E Soltis
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
  - Department of Biology, University of Florida, Gainesville, FL, 32611, USA
- Pamela S Soltis
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Wagner Campos Otoni
  - Laboratory of Plant Tissue Culture (LCTII), Department of Plant Biology/BIOAGRO, Universidade Federal de Viçosa, Av. P.H. Rolfs S/N, Campus Universitário, Viçosa, MG, 36570-000, Brazil
- Lyderson Facio Viccini
  - Department of Biology, Institute of Biological Science, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, 36036-900, Brazil
3
Singh V, Singh V. Inferring Interaction Networks from Transcriptomic Data: Methods and Applications. Methods Mol Biol 2024; 2812:11-37. [PMID: 39068355 DOI: 10.1007/978-1-0716-3886-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Transcriptomic data is a treasure trove in modern molecular biology, as it offers a comprehensive viewpoint into the intricate nuances of gene expression dynamics underlying biological systems. This genetic information must be utilized to infer biomolecular interaction networks that can provide insights into the complex regulatory mechanisms underpinning the dynamic cellular processes. Gene regulatory networks and protein-protein interaction networks are two major classes of such networks. This chapter thoroughly investigates the wide range of methodologies used for distilling insightful revelations from transcriptomic data that include association-based methods (based on correlation among expression vectors), probabilistic models (using Bayesian and Gaussian models), and interologous methods. We review different approaches for evaluating the significance of interactions based on the network topology and biological functions of the interacting molecules and discuss various strategies for the identification of functional modules. The chapter concludes by highlighting network-based techniques for prioritizing key genes, outlining the centrality-based, diffusion-based, and subgraph-based methods. The chapter provides a meticulous framework for investigating transcriptomic data to uncover the assembly of complex molecular networks for their adaptable analyses across a broad spectrum of biological domains.
Affiliation(s)
- Vikram Singh
  - Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India
- Vikram Singh
  - Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India
4
Lyubetsky VA, Rubanov LI, Tereshina MB, Ivanova AS, Araslanova KR, Uroshlev LA, Goremykina GI, Yang JR, Kanovei VG, Zverkov OA, Shitikov AD, Korotkova DD, Zaraisky AG. Wide-scale identification of novel/eliminated genes responsible for evolutionary transformations. Biol Direct 2023; 18:45. [PMID: 37568147 PMCID: PMC10416458 DOI: 10.1186/s13062-023-00405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023] Open
Abstract
BACKGROUND: It is generally accepted that most evolutionary transformations at the phenotype level are associated either with rearrangements of genomic regulatory elements, which control the activity of gene networks, or with changes in the amino acid contents of proteins. Recently, evidence has accumulated that significant evolutionary transformations could also be associated with the loss/emergence of whole genes. The targeted identification of such genes is a challenging problem for both bioinformatics and evo-devo research.
RESULTS: To solve this problem we propose the WINEGRET method, named after the first letters of the title. Its main idea is to search for genes that satisfy two requirements: first, the desired genes were lost/emerged at the same evolutionary stage at which the phenotypic trait of interest was lost/emerged, and second, the expression of these genes changes significantly during the development of the trait of interest in the model organism. To verify the first requirement, we do not use existing databases of orthologs, but rely purely on gene homology and local synteny by using some novel quickly computable conditions. Genes satisfying the second requirement are found by deep RNA sequencing. As a proof of principle, we used our method to find genes absent in extant amniotes (reptiles, birds, mammals) but present in anamniotes (fish and amphibians), in which these genes are involved in the regeneration of large body appendages. As a result, 57 genes were identified. For three of them, c-c motif chemokine 4, eotaxin-like, and a previously unknown gene called here sod4, essential roles for tail regeneration were demonstrated. Noteworthy, we established that the latter gene belongs to a novel family of Cu/Zn-superoxide dismutases lost by amniotes, SOD4.
CONCLUSIONS: We present a method for targeted identification of genes whose loss/emergence in evolution could be associated with the loss/emergence of a phenotypic trait of interest. In a proof-of-principle study, we identified genes absent in amniotes that participate in body appendage regeneration in anamniotes. Our method provides a wide range of opportunities for studying the relationship between the loss/emergence of phenotypic traits and the loss/emergence of specific genes in evolution.
Affiliation(s)
- Vassily A Lyubetsky
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
  - Department of Mechanics and Mathematics, Lomonosov Moscow State University, Kolmogorova Str., 1, Moscow, Russia, 119234
- Lev I Rubanov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Maria B Tereshina
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Anastasiya S Ivanova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, USA
- Karina R Araslanova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Leonid A Uroshlev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32, Vavilova Str., Moscow, Russia, 119991
- Galina I Goremykina
  - Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow, Russia
- Jian-Rong Yang
  - Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Vladimir G Kanovei
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Oleg A Zverkov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Alexander D Shitikov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Daria D Korotkova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Global Health Institute, School of Life Sciences, EPFL, Lausanne, Switzerland
- Andrey G Zaraisky
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
  - Pirogov Russian National Research Medical University, Moscow, Russia
5
Watanabe T, Kure A, Horiike T. OrthoPhy: A Program to Construct Ortholog Data Sets Using Taxonomic Information. Genome Biol Evol 2023; 15:7044703. [PMID: 36799928 PMCID: PMC9991595 DOI: 10.1093/gbe/evad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 01/30/2023] [Accepted: 02/13/2023] [Indexed: 02/18/2023] Open
Abstract
Species phylogenetic trees represent the evolutionary processes of organisms, and they are fundamental in evolutionary research. Therefore, new methods have been developed to obtain more reliable species phylogenetic trees. A highly reliable method is the construction of an ortholog data set based on sequence information of genes, which is then used to infer the species phylogenetic tree. However, although methods for constructing an ortholog data set for species phylogenetic analysis have been developed, they cannot remove some paralogs, which is necessary for reliable species phylogenetic inference. To address the limitations of current methods, we developed OrthoPhy, a program that excludes paralogs and constructs highly accurate ortholog data sets using taxonomic information dividing analyzed species into monophyletic groups. OrthoPhy can remove paralogs, detecting inconsistencies between taxonomic information and phylogenetic trees of candidate ortholog groups clustered by sequence similarity. Performance tests using evolutionary simulated sequences and real sequences of 40 bacteria revealed that the precision of ortholog inference by OrthoPhy is higher than that of existing programs. Additionally, the phylogenetic analysis of species was more accurate when performed using ortholog data sets constructed by OrthoPhy than that performed using data sets constructed by existing programs. Furthermore, we performed a benchmark test of the Quest for Orthologs using real sequence data and found that the concordance rate between the phylogenetic trees of orthologs inferred by OrthoPhy and those of species was higher than the rates obtained by other ortholog inference programs. Therefore, ortholog data sets constructed using OrthoPhy enabled a more accurate phylogenetic analysis of species than those constructed using the existing programs, and OrthoPhy can be used for the phylogenetic analysis of species even for distantly related species that have experienced many evolutionary events.
Affiliation(s)
- Tomoaki Watanabe
  - United Graduate School of Agricultural Science, Gifu University, Gifu, Japan
- Akinori Kure
  - Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Tokumasa Horiike
  - Department of Bioresource Sciences, Shizuoka University, Shizuoka, Japan
6
Bastide P, Soneson C, Stern DB, Lespinet O, Gallopin M. A Phylogenetic Framework to Simulate Synthetic Interspecies RNA-Seq Data. Mol Biol Evol 2023; 40:msac269. [PMID: 36508357 PMCID: PMC11249980 DOI: 10.1093/molbev/msac269] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/18/2021] [Revised: 11/14/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Interspecies RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture on the relative performances of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literatures to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study, that reproduces the features of a recently published dataset, containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated in the R package compcodeR, freely available on Bioconductor.
Affiliation(s)
- Paul Bastide
  - IMAG, Université de Montpellier, CNRS, Montpellier, France
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- David B Stern
  - Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Olivier Lespinet
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
7
Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages. Genome Biol Evol 2022; 14:evac119. [PMID: 35880421 PMCID: PMC9366629 DOI: 10.1093/gbe/evac119] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 07/11/2022] [Indexed: 12/02/2022] Open
Abstract
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Affiliation(s)
- Mario A Cerón-Romero
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Miguel M Fonseca
  - CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Leonardo de Oliveira Martins
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
  - Quadram Institute Bioscience, Norwich, United Kingdom
- David Posada
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Laura A Katz
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
8
Rao WQ, Kalogeropoulos K, Allentoft ME, Gopalakrishnan S, Zhao WN, Workman CT, Knudsen C, Jiménez-Mena B, Seneci L, Mousavi-Derazmahalleh M, Jenkins TP, Rivera-de-Torre E, Liu SQ, Laustsen AH. The rise of genomics in snake venom research: recent advances and future perspectives. Gigascience 2022; 11:giac024. [PMID: 35365832 PMCID: PMC8975721 DOI: 10.1093/gigascience/giac024] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/14/2021] [Revised: 02/12/2022] [Accepted: 02/13/2022] [Indexed: 12/12/2022] Open
Abstract
Snake venoms represent a danger to human health, but also a gold mine of bioactive proteins that can be harnessed for drug discovery purposes. The evolution of snakes and their venom has been studied for decades, particularly via traditional morphological and basic genetic methods alongside venom proteomics. However, while the field of genomics has matured rapidly over the past 2 decades, owing to the development of next-generation sequencing technologies, snake genomics remains in its infancy. Here, we provide an overview of the state of the art in snake genomics and discuss its potential implications for studying venom evolution and toxinology. On the basis of current knowledge, gene duplication and positive selection are key mechanisms in the neofunctionalization of snake venom proteins. This makes snake venoms important evolutionary drivers that explain the remarkable venom diversification and adaptive variation observed in these reptiles. Gene duplication and neofunctionalization have also generated a large number of repeat sequences in snake genomes that pose a significant challenge to DNA sequencing, resulting in the need for substantial computational resources and longer sequencing read length for high-quality genome assembly. Fortunately, owing to constantly improving sequencing technologies and computational tools, we are now able to explore the molecular mechanisms of snake venom evolution in unprecedented detail. Such novel insights have the potential to affect the design and development of antivenoms and possibly other drugs, as well as provide new fundamental knowledge on snake biology and evolution.
Affiliation(s)
- Wei-qiao Rao
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
  - Department of Mass Spectrometry, Beijing Genomics Institute-Research, 518083, Shenzhen, China
- Konstantinos Kalogeropoulos
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Morten E Allentoft
  - Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Kent Street, 6102, Bentley Perth, Australia
  - Globe Institute, University of Copenhagen, Øster Voldgade 5, 1350, Copenhagen, Denmark
- Shyam Gopalakrishnan
  - Globe Institute, University of Copenhagen, Øster Voldgade 5, 1350, Copenhagen, Denmark
- Wei-ning Zhao
  - Department of Mass Spectrometry, Beijing Genomics Institute-Research, 518083, Shenzhen, China
- Christopher T Workman
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Cecilie Knudsen
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Belén Jiménez-Mena
  - DTU Aqua, Technical University of Denmark, Vejlsøvej 39, 8600, Silkeborg, Denmark
- Lorenzo Seneci
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Mahsa Mousavi-Derazmahalleh
  - Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Kent Street, 6102, Bentley Perth, Australia
- Timothy P Jenkins
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Esperanza Rivera-de-Torre
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
- Si-qi Liu
  - Department of Mass Spectrometry, Beijing Genomics Institute-Research, 518083, Shenzhen, China
- Andreas H Laustsen
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads 224, 2800 Kongens Lyngby, Denmark
9
Dimos B, Emery M, Beavers K, MacKnight N, Brandt M, Demuth J, Mydlarz L. Adaptive Variation in Homolog Number Within Transcript Families Promotes Expression Divergence in Reef-Building Coral. Mol Ecol 2022; 31:2594-2610. [PMID: 35229964 DOI: 10.1111/mec.16414] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/05/2021] [Revised: 02/10/2022] [Accepted: 02/22/2022] [Indexed: 11/30/2022]
Abstract
Gene expression, especially in multi-species experiments, is used to gain insight into the genetic basis of how organisms adapt and respond to changing environments. However, evolutionary processes that can influence gene expression patterns between species, such as the presence of paralogs arising from gene duplication events, are rarely accounted for. Paralogous transcripts can alter the transcriptional output of a gene, and thus exclusion of these transcripts can obscure important biological differences between species. To address this issue, we investigated how differences in transcript family size are associated with divergent gene expression patterns in five species of Caribbean reef-building corals. We demonstrate that transcript families that are rapidly evolving in terms of size have increased levels of expression divergence. Additionally, these rapidly evolving transcript families are enriched for multiple biological processes, with genes involved in the coral innate immune system demonstrating pronounced variation in homolog number between species. Overall, this investigation demonstrates the importance of incorporating paralogous transcripts when comparing gene expression across species, because they influence both transcriptional output and the number of transcripts within biological processes. As this investigation was based on transcriptome assemblies, additional insights into the relationship between gene duplications and expression patterns will likely emerge once more genome assemblies are available for study.
Affiliation(s)
- Bradford Dimos
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
- Madison Emery
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
- Kelsey Beavers
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
- Nicholas MacKnight
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
- Marilyn Brandt
  - Center for Marine and Environmental Studies, University of the Virgin Islands, St. Thomas, US Virgin Islands, 00802, USA
- Jeffery Demuth
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
- Laura Mydlarz
  - Department of Biology, University of Texas at Arlington, Arlington, TX, 76019, USA
10
Shokri Bousjein N, Tierney SM, Gardner MG, Schwarz MP. Does effective population size affect rates of molecular evolution: Mitochondrial data for host/parasite species pairs in bees suggests not. Ecol Evol 2022; 12:e8562. [PMID: 35154650 PMCID: PMC8820120 DOI: 10.1002/ece3.8562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Revised: 11/30/2021] [Accepted: 12/20/2021] [Indexed: 11/08/2022] Open
Abstract
Adaptive evolutionary theory argues that organisms with larger effective population size (Ne) should have higher rates of adaptive evolution and therefore greater capacity to win evolutionary arms races. However, in certain cases, species with much smaller Ne may be able to survive alongside their opponents for an extensive evolutionary time. Neutral theory predicts that accelerated rates of molecular evolution in organisms with exceedingly small Ne are due to the effects of genetic drift and fixation of slightly deleterious mutations. We test this prediction in two obligate social parasite species and their respective host species from the bee tribe Allodapini. The parasites (genus Inquilina) have been locked into tight coevolutionary arms races with their exclusive hosts (genus Exoneura) for ~15 million years, even though Inquilina exhibit Ne that are an order of magnitude smaller than those of their hosts. In this study, we compared rates of molecular evolution between host and parasite using nonsynonymous to synonymous substitution rate ratios (dN/dS) of eleven mitochondrial protein-coding genes sequenced from transcriptomes. Tests of selection on mitochondrial genes indicated no significant differences between host and parasite dN/dS, with evidence for purifying selection acting on all mitochondrial genes of host and parasite species. Several potential factors that could weaken the inverse relationship between Ne and rate of molecular evolution are discussed.
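To make the dN/dS ratio in this abstract concrete, the following minimal sketch computes it from pairwise counts. It assumes a counting approach in the style of Nei and Gojobori with a Jukes-Cantor correction; the abstract does not say which estimator the authors used, and the function names and the numbers in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of observed differences for multiple hits.

    Assumption: the Jukes-Cantor model, which is only defined for p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from counts of differences and sites (hypothetical inputs)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


# Illustrative numbers only, not data from the study.
ratio = dn_ds(nonsyn_diffs=5, nonsyn_sites=700, syn_diffs=30, syn_sites=300)
print(f"dN/dS = {ratio:.3f}")
```

A ratio well below 1, as in this toy example, is the pattern usually read as purifying selection; a ratio near 1 is consistent with neutral evolution.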
Affiliation(s)
- Nahid Shokri Bousjein
- College of Science and Engineering, Flinders University, Adelaide, South Australia, Australia
- Faculty of Biological Sciences, Kharazmi University, Tehran, Iran
- Simon M. Tierney
- Hawkesbury Institute for the Environment, Western Sydney University, Penrith, New South Wales, Australia
- Michael G. Gardner
- College of Science and Engineering, Flinders University, Adelaide, South Australia, Australia
- Evolutionary Biology Unit, South Australian Museum, North Terrace, Adelaide, South Australia, Australia
- Michael P. Schwarz
- College of Science and Engineering, Flinders University, Adelaide, South Australia, Australia
11
Stern DL, Han C. OUP accepted manuscript. Genome Biol Evol 2022; 14:6602283. [PMID: 35660862 PMCID: PMC9168663 DOI: 10.1093/gbe/evac069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 04/09/2022] [Accepted: 05/03/2022] [Indexed: 11/14/2022] Open
Abstract
Homology of highly divergent genes often cannot be determined from sequence similarity alone. For example, we recently identified in the aphid Hormaphis cornu a family of rapidly evolving bicycle genes, which encode novel proteins implicated as plant gall effectors, and sequence similarity search methods yielded few putative bicycle homologs in other species. Coding sequence-independent features of genes, such as intron-exon boundaries, often evolve more slowly than coding sequences, however, and can provide complementary evidence for homology. We found that a linear logistic regression classifier using only structural features of bicycle genes identified many putative bicycle homologs in other species. Independent evidence from sequence features and intron locations supported homology assignments. To test the potential roles of bicycle genes in other aphids, we sequenced the genome of a second gall-forming aphid, Tetraneura nigriabdominalis and found that many bicycle genes are strongly expressed in the salivary glands of the gall forming foundress. In addition, bicycle genes are strongly overexpressed in the salivary glands of a non-gall forming aphid, Acyrthosiphon pisum, and in the non-gall forming generations of H. cornu. These observations suggest that Bicycle proteins may be used by multiple aphid species to manipulate plants in diverse ways. Incorporation of gene structural features into sequence search algorithms may aid identification of deeply divergent homologs, especially of rapidly evolving genes involved in host-parasite interactions.
Affiliation(s)
- Clair Han
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
12
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022] Open
Abstract
Functional annotation of genes of unknown function reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is general and can be applied to other diseases, which may benefit the field of biomedical science.
13
Large-scale phylogenomics of the genus Macrostomum (Platyhelminthes) reveals cryptic diversity and novel sexual traits. Mol Phylogenet Evol 2021; 166:107296. [PMID: 34438051 DOI: 10.1016/j.ympev.2021.107296] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/29/2021] [Revised: 07/01/2021] [Accepted: 08/19/2021] [Indexed: 02/07/2023]
Abstract
Free-living flatworms of the genus Macrostomum are small and transparent animals, representing attractive study organisms for a broad range of topics in evolutionary, developmental, and molecular biology. The genus includes the model organism M. lignano for which extensive molecular resources are available, and recently there is a growing interest in extending work to additional species in the genus. These endeavours are currently hindered because, even though >200 Macrostomum species have been taxonomically described, molecular phylogenetic information and geographic sampling remain limited. We report on a global sampling campaign aimed at increasing taxon sampling and geographic representation of the genus. Specifically, we use extensive transcriptome and single-locus data to generate phylogenomic hypotheses including 145 species. Across different phylogenetic methods and alignments used, we identify several consistent clades, while their exact grouping is less clear, possibly due to a radiation early in Macrostomum evolution. Moreover, we uncover a large undescribed diversity, with 94 of the studied species likely being new to science, and we identify multiple novel morphological traits. Furthermore, we identify cryptic speciation in a taxonomically challenging assemblage of species, suggesting that the use of molecular markers is a prerequisite for future work, and we describe the distribution of putative synapomorphies and suggest taxonomic revisions based on our finding. Our large-scale phylogenomic dataset now provides a robust foundation for comparative analyses of morphological, behavioural and molecular evolution in this genus.
14
West AC, Mizoro Y, Wood SH, Ince LM, Iversen M, Jørgensen EH, Nome T, Sandve SR, Martin SAM, Loudon ASI, Hazlerigg DG. Immunologic Profiling of the Atlantic Salmon Gill by Single Nuclei Transcriptomics. Front Immunol 2021; 12:669889. [PMID: 34017342 PMCID: PMC8129531 DOI: 10.3389/fimmu.2021.669889] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/19/2021] [Accepted: 04/12/2021] [Indexed: 12/05/2022] Open
Abstract
Anadromous salmonids begin life adapted to the freshwater environments of their natal streams before a developmental transition, known as smoltification, transforms them into marine-adapted fish. In the wild, smoltification is a photoperiod-regulated process, involving radical remodeling of gill function to cope with the profound osmotic and immunological challenges of seawater (SW) migration. While prior work has highlighted the role of specialized "mitochondrion-rich" cells (MRCs) and accessory cells (ACs) in delivering this phenotype, recent RNA profiling experiments suggest that remodeling is far more extensive than previously appreciated. Here, we use single-nuclei RNAseq to characterize the extent of cytological changes in the gill of Atlantic salmon during smoltification and SW transfer. We identify 20 distinct cell clusters, including known, but also novel gill cell types. These data allow us to isolate cluster-specific, smoltification-associated changes in gene expression and to describe how the cellular make-up of the gill changes through smoltification. As expected, we noted an increase in the proportion of seawater mitochondrion-rich cells, however, we also identify previously unknown reduction of several immune-related cell types. Overall, our results provide fresh detail of the cellular complexity in the gill and suggest that smoltification triggers unexpected immune reprogramming.
Affiliation(s)
- Alexander C. West
- Arctic seasonal timekeeping initiative (ASTI), Department of Arctic and Marine Biology, UiT – The Arctic University of Norway, Tromsø, Norway
- Yasutaka Mizoro
- Unit of Animal Genomics, GIGA Institute, University of Liège, Liège, Belgium
- Shona H. Wood
- Arctic seasonal timekeeping initiative (ASTI), Department of Arctic and Marine Biology, UiT – The Arctic University of Norway, Tromsø, Norway
- Louise M. Ince
- Department of Pathology and Immunology, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Marianne Iversen
- Arctic seasonal timekeeping initiative (ASTI), Department of Arctic and Marine Biology, UiT – The Arctic University of Norway, Tromsø, Norway
- Even H. Jørgensen
- Arctic seasonal timekeeping initiative (ASTI), Department of Arctic and Marine Biology, UiT – The Arctic University of Norway, Tromsø, Norway
- Torfinn Nome
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Simen Rød Sandve
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Samuel A. M. Martin
- Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, United Kingdom
- Andrew S. I. Loudon
- Division of Diabetes, Endocrinology & Gastroenterology, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- David G. Hazlerigg
- Arctic seasonal timekeeping initiative (ASTI), Department of Arctic and Marine Biology, UiT – The Arctic University of Norway, Tromsø, Norway
15
Yang C, Liu Z, Yu S, Ye K, Li X, Shen D. Comparison of three species of Elizabethkingia genus by whole-genome sequence analysis. FEMS Microbiol Lett 2021; 368:6164865. [PMID: 33693941 DOI: 10.1093/femsle/fnab018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/21/2020] [Accepted: 03/04/2021] [Indexed: 12/11/2022] Open
Abstract
Elizabethkingia are found to cause severe neonatal meningitis, nosocomial pneumonia, endocarditis and bacteremia. However, there are few studies on Elizabethkingia genus by comparative genomic analysis. In this study, three species of Elizabethkingia were found: E. meningoseptica, E. anophelis and E. miricola. Resistance genes and associated proteins of seven classes of antibiotics including beta-lactams, aminoglycosides, macrolides, tetracyclines, quinolones, sulfonamides and glycopeptides, as well as multidrug resistance efflux pumps were identified from 20 clinical isolates of Elizabethkingia by whole-genome sequence. Genotype and phenotype displayed a good consistency in beta-lactams, aminoglycosides and glycopeptides, while contradictions exhibited in tetracyclines, quinolones and sulfonamides. Virulence factors and associated genes such as hsp60 (htpB), exopolysaccharide (EPS) (galE/pgi), Mg2+ transport (mgtB/mgtE) and catalase (katA/katG) existed in all clinical and reference strains. The functional analysis of the clusters of orthologous groups indicated that 'metabolism' occupied the largest part in core genome, 'information storage and processing' was the largest group in both accessory genome and unique genome. Abundant mobile elements were identified in E. meningoseptica and E. anophelis. The most significant finding in our study was that a single clone of E. anophelis had been circulating within diversities of departments in a clinical setting for nearly 18 months.
Affiliation(s)
- Chen Yang
- Center of Laboratory Medicine, the First Medical Center of Chinese PLA General Hospital, 28 Fu Xing Road, Beijing 100853, China
- Zhe Liu
- Center of Laboratory Medicine, the First Medical Center of Chinese PLA General Hospital, 28 Fu Xing Road, Beijing 100853, China
- Shuai Yu
- Department of Tropical Medicine and Infectious Diseases, Hainan Hospital, PLA General Hospital, 80 Jiang Lin Road, Sanya, Hainan Province 572016, China
- Kun Ye
- Center of Laboratory Medicine, the First Medical Center of Chinese PLA General Hospital, 28 Fu Xing Road, Beijing 100853, China
- Xin Li
- Center of Laboratory Medicine, the First Medical Center of Chinese PLA General Hospital, 28 Fu Xing Road, Beijing 100853, China
- Dingxia Shen
- Center of Laboratory Medicine, the First Medical Center of Chinese PLA General Hospital, 28 Fu Xing Road, Beijing 100853, China
16
Maruyama SR, Rogerio LA, Freitas PD, Teixeira MMG, Ribeiro JMC. Total Ortholog Median Matrix as an alternative unsupervised approach for phylogenomics based on evolutionary distance between protein coding genes. Sci Rep 2021; 11:3791. [PMID: 33589693 PMCID: PMC7884790 DOI: 10.1038/s41598-021-81926-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2020] [Accepted: 01/05/2021] [Indexed: 11/09/2022] Open
Abstract
The increasing amount of available genomic data has allowed the development of phylogenomic analytical tools. Current methods compile information from single gene phylogenies, whether based on topologies or multiple sequence alignments. Generally, phylogenomic analyses select gene families or genomic regions to construct phylogenomic trees. Here, we present an alternative approach for phylogenomics, named TOMM (Total Ortholog Median Matrix), to construct a representative phylogram composed of amino acid distance measures of all pairwise ortholog protein sequence pairs from desired species inside a group of organisms. The procedure is divided into two main steps: (1) ortholog detection and (2) creation of a matrix with the median amino acid distance measures of all pairwise orthologous sequences. We tested this approach within three different groups of organisms: Kinetoplastida protozoa, hematophagous Diptera vectors and Primates. Our approach was robust and efficacious in reconstructing the phylogenetic relationships for the three groups. Moreover, novel branch topologies could be achieved, providing insights into phylogenetic relationships between some taxa.
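Step (2) of the procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the TOMM implementation: it assumes ortholog detection has already produced, for every species pair, a list of amino acid distances (one per orthologous protein pair), and the function name, input layout, and toy values are hypothetical.

```python
from itertools import combinations
from statistics import median


def median_distance_matrix(ortholog_distances, species):
    """Build a symmetric species-by-species matrix of median ortholog distances.

    ortholog_distances maps a (species_a, species_b) tuple to a list of
    amino acid distances, one per orthologous protein pair (assumed input).
    """
    index = {name: i for i, name in enumerate(species)}
    size = len(species)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b in combinations(species, 2):
        # Accept the pair in either order.
        values = ortholog_distances.get((a, b)) or ortholog_distances.get((b, a))
        if not values:
            raise ValueError(f"no ortholog distances for {a} and {b}")
        distance = median(values)
        matrix[index[a]][index[b]] = distance
        matrix[index[b]][index[a]] = distance
    return matrix


# Toy distances for three placeholder species; not data from the paper.
toy = {
    ("A", "B"): [0.10, 0.30, 0.20],
    ("A", "C"): [0.50, 0.70],
    ("B", "C"): [0.40, 0.60, 0.80, 0.90],
}
matrix = median_distance_matrix(toy, ["A", "B", "C"])
for row in matrix:
    print(["%.2f" % value for value in row])
```

The resulting matrix could then be passed to any distance-based tree-building method; which method the authors used is not stated in the abstract.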
Affiliation(s)
- Sandra Regina Maruyama
- Department of Genetics and Evolution, Center for Biological Sciences and Health, Federal University of São Carlos (UFSCar), São Carlos, SP, 13565-905, Brazil
- Luana Aparecida Rogerio
- Department of Genetics and Evolution, Center for Biological Sciences and Health, Federal University of São Carlos (UFSCar), São Carlos, SP, 13565-905, Brazil
- Patricia Domingues Freitas
- Department of Genetics and Evolution, Center for Biological Sciences and Health, Federal University of São Carlos (UFSCar), São Carlos, SP, 13565-905, Brazil
- José Marcos Chaves Ribeiro
- Vector Biology Section, Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, 12735 Twinbrook Parkway rm 2E32, Rockville, MD, 20852, USA
17
Cornet L, Magain N, Baurain D, Lutzoni F. Exploring syntenic conservation across genomes for phylogenetic studies of organisms subjected to horizontal gene transfers: A case study with Cyanobacteria and cyanolichens. Mol Phylogenet Evol 2021; 162:107100. [PMID: 33592234 DOI: 10.1016/j.ympev.2021.107100] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/28/2020] [Revised: 01/22/2021] [Accepted: 02/01/2021] [Indexed: 11/16/2022]
Abstract
Understanding the evolutionary history of symbiotic Cyanobacteria at a fine scale is essential to unveil patterns of associations with their hosts and factors driving their spatiotemporal interactions. As for bacteria in general, Horizontal Gene Transfers (HGT) are expected to be rampant throughout their evolution, which justified the use of single-locus phylogenies in macroevolutionary studies of these photoautotrophic bacteria. Genomic approaches have greatly increased the amount of molecular data available, but the selection of orthologous, congruent genes that are more likely to reflect bacterial macroevolutionary histories remains problematic. In this study, we developed a synteny-based approach and searched for Collinear Orthologous Regions (COR), under the assumption that genes that are present in the same order and orientation across a wide monophyletic clade are less likely to have undergone HGT. We searched sixteen reference Nostocales genomes and identified 99 genes, part of 28 COR comprising three to eight genes each. We then developed a bioinformatic pipeline, designed to minimize inter-genome contamination and processed twelve Nostoc-associated lichen metagenomes. This reduced our original dataset to 90 genes representing 25 COR, which were used to infer phylogenetic relationships within Nostocales and among lichenized Cyanobacteria. This dataset was narrowed down further to 71 genes representing 22 COR by selecting only genes part of one (largest) operon per COR. We found a relatively high level of congruence among trees derived from the 90-gene dataset, but congruence was only slightly higher among genes within a COR compared to genes across COR. However, topological congruence was significantly higher among the 71 genes part of one operon per COR. 
Nostocales phylogenies resulting from concatenation and species tree approaches based on the 90- and 71-gene datasets were highly congruent, but the most highly supported result was obtained when using synteny, collinearity, and operon information (i.e., 71-gene dataset) as gene selection criteria, which outperformed larger datasets with more genes.
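The idea behind a collinear orthologous region, genes found in the same order and orientation in different genomes, can be illustrated with a small pairwise sketch. It is not the authors' pipeline: it assumes each genome is already reduced to an ordered list of (ortholog identifier, strand) pairs with single-copy orthologs, it compares only two genomes, and the function name and toy genomes are hypothetical. The minimum run length of three mirrors the smallest COR size mentioned in the abstract.

```python
def collinear_runs(genome_a, genome_b, min_len=3):
    """Find runs of genes that are adjacent, in the same order and on the
    same strand, in both genomes.

    Each genome is an ordered list of (ortholog_id, strand) tuples; every
    ortholog_id is assumed to occur at most once per genome.
    """
    position_b = {gene: (i, strand) for i, (gene, strand) in enumerate(genome_b)}
    runs, current, previous_index = [], [], None
    for gene, strand in genome_a:
        hit = position_b.get(gene)
        same_orientation = hit is not None and hit[1] == strand
        extends = (
            same_orientation
            and previous_index is not None
            and hit[0] == previous_index + 1
        )
        if extends:
            current.append(gene)
        else:
            # Close the run in progress and possibly start a new one.
            if len(current) >= min_len:
                runs.append(current)
            current = [gene] if same_orientation else []
        previous_index = hit[0] if same_orientation else None
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Toy genomes: g4 and g5 are swapped in the second genome.
genome_a = [("g1", "+"), ("g2", "+"), ("g3", "-"), ("g4", "+"), ("g5", "+"), ("g6", "+")]
genome_b = [("g9", "+"), ("g1", "+"), ("g2", "+"), ("g3", "-"), ("g5", "+"), ("g4", "+"), ("g6", "+")]
runs = collinear_runs(genome_a, genome_b)
print(runs)
```

Extending this to many genomes, for example by keeping only runs shared by every pairwise comparison against a reference, is left out here; blocks inverted as a whole are also not detected by this sketch.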
Affiliation(s)
- Luc Cornet
- InBioS - PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
- Nicolas Magain
- Department of Biology, Duke University, Durham, NC, USA; Evolution and Conservation Biology, InBioS, University of Liège, Liège, Belgium
- Denis Baurain
- InBioS - PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
18
Restrepo-Montoya D, McClean PE, Osorno JM. Orthology and synteny analysis of receptor-like kinases "RLK" and receptor-like proteins "RLP" in legumes. BMC Genomics 2021; 22:113. [PMID: 33568053 PMCID: PMC7874474 DOI: 10.1186/s12864-021-07384-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2019] [Accepted: 01/12/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Legume species are an important plant model because of their protein-rich physiology. The adaptability and productivity of legumes are limited by major biotic and abiotic stresses. Responses to these stresses directly involve plasma membrane receptor proteins known as receptor-like kinases (RLK) and receptor-like proteins (RLP). Evaluating the homology relations among RLK and RLP for seven legume species, and exploring their presence among synteny blocks, allows an increased understanding of evolutionary relations, physical position, and chromosomal distribution in related species and their shared roles in stress responses. RESULTS: Typically, a high proportion of RLK and RLP legume proteins belong to orthologous clusters, which is confirmed in this study, where between 66 and 90% of the RLKs and RLPs per legume species were classified in orthologous clusters. One-third of the evaluated syntenic blocks had shared RLK/RLP genes among both legumes and non-legumes. Among the legumes, between 75 and 98% of the RLK/RLP were present in syntenic blocks. The distribution of chromosomal segments between Phaseolus vulgaris and Vigna unguiculata, two species that diverged ~8 mya, was highly similar. Among the RLK/RLP synteny clusters, seven experimentally validated resistance RLK/RLP genes were identified in syntenic blocks. The RLK resistance genes FLS2, BIR2, ERECTA, IOS1, and AtSERK1 from Arabidopsis and SLSERK1 from Solanum lycopersicum were present in different pairwise syntenic blocks among the legume species. Meanwhile, only the LYM1 RLP resistance gene from Arabidopsis shared a syntenic block with Glycine max. CONCLUSIONS: The orthology analysis of the RLK and RLP suggests a dynamic evolution in the legume family, with 66 to 85% of RLK and 83 to 88% of RLP belonging to orthologous clusters among the species evaluated. In fact, for the 10-species comparison, a lower number of singleton proteins was reported among RLP compared to RLK, suggesting that RLP positions are more physically conserved compared to RLK. The identification of RLK and RLP genes among the synteny blocks in legumes revealed multiple highly conserved syntenic blocks on multiple chromosomes. Additionally, the analysis suggests that P. vulgaris is an appropriate anchor species for comparative genomics among legumes.
Affiliation(s)
- Daniel Restrepo-Montoya
- Genomics and Bioinformatics Program, North Dakota State University, Fargo, ND, 58108-6050, USA.
- Department of Plant Sciences, North Dakota State University, Fargo, ND, 58108-6050, USA.
- Phillip E McClean
- Genomics and Bioinformatics Program, North Dakota State University, Fargo, ND, 58108-6050, USA.
- Department of Plant Sciences, North Dakota State University, Fargo, ND, 58108-6050, USA.
- Juan M Osorno
- Department of Plant Sciences, North Dakota State University, Fargo, ND, 58108-6050, USA.
19
Zhang X, Pavlicev M, Jones HN, Muglia LJ. Eutherian-Specific Gene TRIML2 Attenuates Inflammation in the Evolution of Placentation. Mol Biol Evol 2020; 37:507-523. [PMID: 31633784 PMCID: PMC6993854 DOI: 10.1093/molbev/msz238] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023] Open
Abstract
Evolution of highly invasive placentation in the stem lineage of eutherians and subsequent extension of pregnancy set eutherians apart from other mammals, that is, marsupials with short-lived placentas, and oviparous monotremes. Recent studies suggest that eutherian implantation evolved from marsupial attachment reaction, an inflammatory process induced by the direct contact of fetal placenta with maternal endometrium after the breakdown of the shell coat, and shortly before the onset of parturition. Unique to eutherians, a dramatic downregulation of inflammation after implantation prevents the onset of premature parturition, and is critical for the maintenance of gestation. This downregulation likely involved evolutionary changes on maternal as well as fetal/placental side. Tripartite-motif family-like2 (TRIML2) only exists in eutherian genomes and shows preferential expression in preimplantation embryos, and trophoblast-derived structures, such as chorion and placental disc. Comparative genomic evidence supports that TRIML2 originated from a gene duplication event in the stem lineage of Eutheria that also gave rise to eutherian TRIML1. Compared with TRIML1, TRIML2 lost the catalytic RING domain of E3 ligase. However, only TRIML2 is induced in human choriocarcinoma cell line JEG3 with poly(I:C) treatment to simulate inflammation during viral infection. Its knockdown increases the production of proinflammatory cytokines and reduces trophoblast survival during poly(I:C) stimulation, while its overexpression reduces proinflammatory cytokine production, supporting TRIML2’s role as a regulatory inhibitor of the inflammatory pathways in trophoblasts. TRIML2’s potential virus-interacting PRY/SPRY domain shows significant signature of selection, suggesting its contribution to the evolution of eutherian-specific inflammation regulation during placentation.
Affiliation(s)
- Xuzhe Zhang
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH; March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati, OH
- Mihaela Pavlicev
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH; March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati, OH
- Helen N Jones
- Division of Pediatric Surgery, Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, OH
- Louis J Muglia
- Division of Human Genetics, Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH; March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati, OH
20
Boudabous A, Tekaia F. Enhancing Bioinformatics and Genomics Courses: Building Capacity and Skills via Lab Meeting Activities: Fostering a Culture of Critical Capacities to Read, Write, Communicate and Engage in Rigorous Scientific Exchanges. Bioessays 2020; 42:e2000134. [PMID: 32830345 DOI: 10.1002/bies.202000134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2020] [Revised: 07/08/2020] [Indexed: 11/08/2022]
Abstract
Reading, writing, publishing, and publicly presenting scientific works are vital for a young researcher's profile building and career development. Generally, the traditional educational curricula do not offer training possibilities to learn and practice how to prepare, write, and present scientific works. These are rather a part of lab meeting activities in research groups. The lack of such training is more critical in some developing countries because this adds to the rare opportunities to discuss and become involved in the exchanges on state of the art scientific literature. Here the authors relate their experience in introducing a weekly 1-day lab meeting in the framework of two previously organized 3-month courses on "Bioinformatics and Genome Analyses". The main activities which are developed during these lab meetings include scientific literature follow up as well as preparing and presenting oral and written scientific reviews. These activities prove to be useful for a student's self-confidence building, for enhancing their active participation during the lectures and practical sessions, as well as for the positive impact on running the whole course program. Incorporation of such lab meeting activities in the course program significantly improves the capacity building of the participants, their analytical and critical reading of scientific literature, as well as communication skills. In this work it is shown how to proceed with the different steps involved in the implementation of lab meeting activities, and to recommend their regular institution in similar courses.
Affiliation(s)
- Abdellatif Boudabous
- Faculté des Sciences de Tunis, Campus Universitaire El-Manar, El Manar, Tunis, 2092, Tunisia
- Fredj Tekaia
- Institut Pasteur Paris, 28, rue du Dr Roux, 75724, Paris, Cedex, 15, France
21
Heger P, Zheng W, Rottmann A, Panfilio KA, Wiehe T. The genetic factors of bilaterian evolution. eLife 2020; 9:e45530. [PMID: 32672535 PMCID: PMC7535936 DOI: 10.7554/elife.45530] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/31/2019] [Accepted: 07/03/2020] [Indexed: 12/13/2022] Open
Abstract
The Cambrian explosion was a unique animal radiation ~540 million years ago that produced the full range of body plans across bilaterians. The genetic mechanisms underlying these events are unknown, leaving a fundamental question in evolutionary biology unanswered. Using large-scale comparative genomics and advanced orthology evaluation techniques, we identified 157 bilaterian-specific genes. They include the entire Nodal pathway, a key regulator of mesoderm development and left-right axis specification; components for nervous system development, including a suite of G-protein-coupled receptors that control physiology and behaviour, the Robo-Slit midline repulsion system, and the neurotrophin signalling system; a high number of zinc finger transcription factors; and novel factors that previously escaped attention. Contradicting the current view, our study reveals that genes with bilaterian origin are robustly associated with key features in extant bilaterians, suggesting a causal relationship.
Affiliation(s)
- Peter Heger
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
| | - Wen Zheng
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
| | - Anna Rottmann
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
| | - Kristen A Panfilio
- Institute for Zoology: Developmental Biology, Cologne Biocenter, University of Cologne, Cologne, Germany
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, United Kingdom
| | - Thomas Wiehe
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
| |
|
22
|
van Hooff JJE, Tromer E, van Dam TJP, Kops GJPL, Snel B. Inferring the Evolutionary History of Your Favorite Protein: A Guide for Molecular Biologists. Bioessays 2020; 41:e1900006. [PMID: 31026339 DOI: 10.1002/bies.201900006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/08/2019] [Revised: 02/17/2019] [Indexed: 01/01/2023]
Abstract
Comparative genomics has proven a fruitful approach to acquire many functional and evolutionary insights into core cellular processes. Here it is argued that in order to perform accurate and interesting comparative genomics, one first and foremost has to be able to recognize, postulate, and revise different evolutionary scenarios. After all, these studies lack a simple protocol, due to different proteins having different evolutionary dynamics and demanding different approaches. The authors here discuss this challenge from a practical (what are the observations?) and conceptual (how do these indicate a specific evolutionary scenario?) viewpoint, with the aim to guide investigators who want to analyze the evolution of their protein(s) of interest. By sharing how the authors draft, test, and update such a scenario and how it directs their investigations, the authors hope to illuminate how to execute molecular evolution studies and how to interpret them. Also see the video abstract here https://youtu.be/VCt3l2pbdbQ.
Affiliation(s)
- Jolien J E van Hooff
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands; Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands
| | - Eelco Tromer
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands; Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands; Department of Biochemistry, University of Cambridge, Hopkins Building, Tennis Court Road, Cambridge, CB2 1QW, UK
| | - Teunis J P van Dam
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
| | - Geert J P L Kops
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands; Molecular Cancer Research, University Medical Centre Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
| |
|
23
|
Comparative Genomics of 86 Whole-Genome Sequences in the Six Species of the Elizabethkingia Genus Reveals Intraspecific and Interspecific Divergence. Sci Rep 2019; 9:19167. [PMID: 31844108 PMCID: PMC6915712 DOI: 10.1038/s41598-019-55795-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/21/2019] [Accepted: 11/27/2019] [Indexed: 12/31/2022] Open
Abstract
Bacteria of the genus Elizabethkingia are emerging infectious agents that can cause infection in humans. The number of published whole-genome sequences of Elizabethkingia is rapidly increasing. In this study, we used comparative genomics to investigate the genomes of the six species in the Elizabethkingia genus, namely E. meningoseptica, E. anophelis, E. miricola, E. bruuniana, E. ursingii, and E. occulta. In silico DNA–DNA hybridization, whole-genome sequence-based phylogeny, pan genome analysis, and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed, and clusters of orthologous groups were evaluated. Of the 86 whole-genome sequences available in GenBank, 21 were complete genome sequences and 65 were shotgun sequences. In silico DNA–DNA hybridization clearly delineated the six Elizabethkingia species. Phylogenetic analysis confirmed that E. bruuniana, E. ursingii, and E. occulta were closer to E. miricola than to E. meningoseptica and E. anophelis. A total of 2,609 clusters of orthologous groups were identified among the six type strains of the Elizabethkingia genus. Metabolism-related clusters of orthologous groups accounted for the majority of gene families in KEGG analysis. New genes were identified that substantially increased the total repertoire of the pan genome after the addition of 86 Elizabethkingia genomes, which suggests that Elizabethkingia has shown adaptive evolution to environmental change. This study presents a comparative genomic analysis of Elizabethkingia, and the results of this study provide knowledge that facilitates a better understanding of this microorganism.
|
24
|
Identifying genetic markers for a range of phylogenetic utility-From species to family level. PLoS One 2019; 14:e0218995. [PMID: 31369563 PMCID: PMC6675087 DOI: 10.1371/journal.pone.0218995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/15/2019] [Accepted: 06/13/2019] [Indexed: 12/03/2022] Open
Abstract
Resolving the phylogenetic relationships of closely related species using a small set of loci is challenging as sufficient information may not be captured from a limited sample of the genome. Relying on few loci can also be problematic when conflict between gene-trees arises from incomplete lineage sorting and/or ongoing hybridization, problems especially likely in recently diverged lineages. Here, we developed a method using limited genomic resources that allows identification of many low copy candidate loci from across the nuclear and chloroplast genomes, design probes for target capture and sequence the captured loci. To validate our method we present data from Eucalyptus and Melaleuca, two large and phylogenetically problematic genera within the Myrtaceae family. With one annotated genome, one transcriptome and two whole-genome shotgun sequences of one Eucalyptus and four Melaleuca species, respectively, we identified 212 loci representing 263 kbp for targeted sequence capture and sequencing. Of these, 209 were successfully tested from 47 samples across five related genera of Myrtaceae. The average percentage of reads mapped back to the reference was 57.6% with coverage of more than 20 reads per position across 83.5% of the data. The methods developed here should be applicable across a large range of taxa across all kingdoms. The core methods are very flexible, providing a platform for various genomic resource availabilities and are useful from shallow to deep phylogenies.
|
25
|
Rey C, Veber P, Boussau B, Sémon M. CAARS: comparative assembly and annotation of RNA-Seq data. Bioinformatics 2019; 35:2199-2207. [PMID: 30452539 PMCID: PMC6596894 DOI: 10.1093/bioinformatics/bty903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/28/2017] [Revised: 09/13/2018] [Accepted: 11/16/2018] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION CAARS is implemented in Python and Ocaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carine Rey
- UnivLyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
| | - Philippe Veber
- UnivLyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
| | - Bastien Boussau
- UnivLyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
| | - Marie Sémon
- UnivLyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
| |
|
26
|
Morozov A, Galachyants YP. Diatom genes originating from red and green algae: Implications for the secondary endosymbiosis models. Mar Genomics 2019; 45:72-78. [DOI: 10.1016/j.margen.2019.02.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/19/2018] [Revised: 02/13/2019] [Accepted: 02/13/2019] [Indexed: 11/27/2022]
|
27
|
Heller D, Szklarczyk D, Mering CV. Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies. BMC Bioinformatics 2019; 20:228. [PMID: 31060495 PMCID: PMC6501302 DOI: 10.1186/s12859-019-2828-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2018] [Accepted: 04/17/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Differently from previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on Github at: https://github.com/meringlab/og_consistency_pipeline .
Affiliation(s)
- Davide Heller
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
| | - Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
| | - Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
| |
|
28
|
Gilbert DG. Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 2019; 7:e6374. [PMID: 30723633 PMCID: PMC6361002 DOI: 10.7717/peerj.6374] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/10/2018] [Accepted: 12/29/2018] [Indexed: 01/19/2023] Open
Abstract
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and collected at the NCBI reference sequence database section. Gene reconstruction from transcribed gene evidence of RNA-seq now can accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline of EvidentialGene project.
|
29
|
Guerfali FZ, Laouini D, Boudabous A, Tekaia F. Designing and running an advanced Bioinformatics and genome analyses course in Tunisia. PLoS Comput Biol 2019; 15:e1006373. [PMID: 30689625 PMCID: PMC6349305 DOI: 10.1371/journal.pcbi.1006373] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023] Open
Abstract
Genome data, with underlying new knowledge, are accumulating at exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated efficient Bioinformatics methods and tools. Advanced Education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries where endeavors to set up Bioinformatics courses concern most often only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-months advanced and extensive course in Bioinformatics and Genome Analyses in the Institut Pasteur de Tunis. Most significantly the outcome of the course was upgrading the participants’ skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and providing young researchers with high-level skills.
Affiliation(s)
- Fatma Z. Guerfali
- Université Tunis El Manar, Tunis, Tunisia
- Institut Pasteur de Tunis, LR11IPT02, Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), Tunis-Belvédère, Tunisia
| | - Dhafer Laouini
- Université Tunis El Manar, Tunis, Tunisia
- Institut Pasteur de Tunis, LR11IPT02, Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), Tunis-Belvédère, Tunisia
| | - Abdellatif Boudabous
- Université Tunis El Manar, Faculté des Sciences de Tunis, Laboratoire Microorganisme et Biomolécules Actives, Campus Universitaire Farhat Heched, El Manar, Tunis, Tunisia
| | - Fredj Tekaia
- Institut Pasteur Paris, 28 rue du Dr Roux, 75724 Paris cedex 15, France
| |
|
30
|
de Bruijn S, Zhao T, Muiño JM, Schranz EM, Angenent GC, Kaufmann K. PISTILLATA paralogs in Tarenaya hassleriana have diverged in interaction specificity. BMC Plant Biol 2018; 18:368. [PMID: 30577806 PMCID: PMC6303913 DOI: 10.1186/s12870-018-1574-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/19/2018] [Accepted: 11/26/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND Floral organs are specified by MADS-domain transcription factors that act in a combinatorial manner, as summarized in the (A)BCE model. However, this evolutionarily conserved model is in contrast to a remarkable amount of morphological diversity in flowers. One of the mechanisms suggested to contribute to this diversity is duplication of floral MADS-domain transcription factors. Although gene duplication is often followed by loss of one of the copies, sometimes both copies are retained. If both copies are retained they will initially be redundant, providing freedom for one of the paralogs to change function. Here, we examine the evolutionary fate and functional consequences of a transposition event at the base of the Brassicales that resulted in the duplication of the floral regulator PISTILLATA (PI), using Tarenaya hassleriana (Cleomaceae) as a model system. RESULTS The transposition of a genomic region containing a PI gene led to two paralogs which are located at different positions in the genome. The original PI copy is syntenic in position with most angiosperms, whereas the transposed copy is syntenic with the PI genes in Brassicaceae. The two PI paralogs of T. hassleriana have very similar expression patterns. However, they may have diverged in function, as only one of these PI proteins was able to act heterologously in the first whorl of A. thaliana flowers. We also observed differences in protein complex formation between the two paralogs, and the two paralogs exhibit subtle differences in DNA-binding specificity. Sequence analysis indicates that most of the protein sequence divergence between the two T. hassleriana paralogs emerged in a common ancestor of the Cleomaceae and the Brassicaceae. CONCLUSIONS We found that the PI paralogs in T. hassleriana have similar expression patterns, but may have diverged at the level of protein function. 
Data suggest that most protein sequence divergence occurred rapidly, prior to the origin of the Brassicaceae and Cleomaceae. It is tempting to speculate that the interaction specificities of the Brassicaceae-specific PI proteins differ from those of the PI found in other angiosperms. This could lead to PI regulating partly different genes in the Brassicaceae, and ultimately might result in changes in floral morphology.
Affiliation(s)
- Suzanne de Bruijn
- Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Bioscience, Wageningen Plant Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Tao Zhao
- Biosystematics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Jose M. Muiño
- Institute for Biology, Systems Biology of Gene Regulation, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Eric M. Schranz
- Biosystematics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Gerco C. Angenent
- Laboratory of Molecular Biology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Bioscience, Wageningen Plant Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Kerstin Kaufmann
- Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Philippstraße 13, 10115 Berlin, Germany
| |
|
31
|
Muthye V, Lavrov DV. Characterization of mitochondrial proteomes of nonbilaterian animals. IUBMB Life 2018; 70:1289-1301. [PMID: 30419142 DOI: 10.1002/iub.1961] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/14/2018] [Revised: 09/08/2018] [Accepted: 09/29/2018] [Indexed: 12/18/2022]
Abstract
Mitochondria require ~1,500 proteins for their maintenance and proper functionality, which constitute the mitochondrial proteome (mt-proteome). Although a few of these proteins, mostly subunits of the electron transport chain complexes, are encoded in mitochondrial DNA (mtDNA), the vast majority are encoded in the nuclear genome and imported to the organelle. Previous studies have shown a continuous and complex evolution of mt-proteome among eukaryotes. However, there was less attention paid to mt-proteome evolution within Metazoa, presumably because animal mtDNA and, by extension, animal mitochondria are often considered to be uniform. In this analysis, two bioinformatic approaches (Orthologue-detection and Mitochondrial Targeting Sequence prediction) were used to identify mt-proteins in 23 species from four nonbilaterian phyla: Cnidaria, Ctenophora, Placozoa, and Porifera, as well as two choanoflagellates, the closest animal relatives. Our results revealed a large variation in mt-proteome in nonbilaterian animals in size and composition. Myxozoans, highly reduced cnidarian parasites, possessed the smallest inferred mitochondrial proteomes, while calcareous sponges possessed the largest. About 513 mitochondrial orthologous groups were present in all nonbilaterian phyla and human. Interestingly, 42 human mitochondrial proteins were not identified in any nonbilaterian species studied and represent putative innovations along the bilaterian branch. Several of these proteins were involved in apoptosis and innate immunity, two processes known to evolve within Metazoa. Conversely, several proteins identified as mitochondrial in nonbilaterian phyla and animal outgroups were absent in human, representing cases of possible loss. Finally, a few human cytosolic proteins, such as histones and cytosolic ribosomal proteins, were predicted to be targeted to mitochondria in nonbilaterian animals. 
Overall, our analysis provides the first step in characterization of mt-proteomes in nonbilaterian animals and understanding evolution of animal mt-proteome. © 2018 IUBMB Life, 70(12):1289-1301, 2018.
Affiliation(s)
- Viraj Muthye
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Dennis V Lavrov
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, USA
| |
|
32
|
Sheikhizadeh Anari S, de Ridder D, Schranz ME, Smit S. Efficient inference of homologs in large eukaryotic pan-proteomes. BMC Bioinformatics 2018; 19:340. [PMID: 30257640 PMCID: PMC6158922 DOI: 10.1186/s12859-018-2362-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/13/2017] [Accepted: 09/09/2018] [Indexed: 12/31/2022] Open
Abstract
Background Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data. Results To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa. Conclusions We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at https://github.com/sheikhizadeh/pantools as an extension to our pan-genomic analysis tool, PanTools. Electronic supplementary material The online version of this article (10.1186/s12859-018-2362-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - M Eric Schranz
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands
| | - Sandra Smit
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| |
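The k-mer idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple inverted index of k-mers and a shared-k-mer threshold; the function names (`kmers`, `candidate_pairs`) and the parameters `k` and `min_shared` are hypothetical and are not taken from PanTools, whose actual implementation may differ substantially. The point is only that protein pairs sharing too few k-mers can be skipped before any pairwise alignment is attempted.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(proteins, k=3, min_shared=2):
    """Return protein-ID pairs that share at least `min_shared` distinct k-mers.

    `proteins` maps an ID to its amino-acid sequence. Only the returned
    pairs would be passed on to a more expensive pairwise alignment.
    """
    index = defaultdict(set)  # k-mer -> IDs of proteins containing it
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(pid)

    shared = defaultdict(int)  # (id_a, id_b) -> number of shared k-mers
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    return {pair for pair, n in shared.items() if n >= min_shared}


# Toy sequences (made up for illustration): p1 and p2 differ by one residue,
# p3 is unrelated, so only the pair (p1, p2) survives the filter.
proteins = {
    "p1": "MKTAYIAKQR",
    "p2": "MKTAYLAKQR",
    "p3": "GGSWPLHHVC",
}
pairs = candidate_pairs(proteins)
print(pairs)
```

Raising `min_shared` in this sketch favors precision and lowering it favors recall, which mirrors the trade-off the abstract describes.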
|
33
|
Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol 2018. [PMCID: PMC5932825 DOI: 10.1186/s12862-018-1142-0] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Indexed: 12/13/2022] Open
Abstract
Background The phylogeny of Cnidaria has been a source of debate for decades, during which nearly all-possible relationships among the major lineages have been proposed. The ecological success of Cnidaria is predicated on several fascinating organismal innovations including stinging cells, symbiosis, colonial body plans and elaborate life histories. However, understanding the origins and subsequent diversification of these traits remains difficult due to persistent uncertainty surrounding the evolutionary relationships within Cnidaria. While recent phylogenomic studies have advanced our knowledge of the cnidarian tree of life, no analysis to date has included genome-scale data for each major cnidarian lineage. Results Here we describe a well-supported hypothesis for cnidarian phylogeny based on phylogenomic analyses of new and existing genome-scale data that includes representatives of all cnidarian classes. Our results are robust to alternative modes of phylogenetic estimation and phylogenomic dataset construction. We show that two popular phylogenomic matrix construction pipelines yield profoundly different datasets, both in the identities and in the functional classes of the loci they include, but resolve the same topology. We then leverage our phylogenetic resolution of Cnidaria to understand the character histories of several critical organismal traits. Ancestral state reconstruction analyses based on our phylogeny establish several notable organismal transitions in the evolutionary history of Cnidaria and depict the ancestral cnidarian as a solitary, non-symbiotic polyp that lacked a medusa stage. In addition, Bayes factor tests strongly suggest that symbiosis has evolved multiple times independently across the cnidarian radiation. Conclusions Cnidaria have experienced more than 600 million years of independent evolution and in the process generated an array of organismal innovations. 
Our results add significant clarification on the cnidarian tree of life and the histories of some of these innovations. Further, we confirm the existence of Acraspeda (staurozoans plus scyphozoans and cubozoans), thus reviving an evolutionary hypothesis put forward more than a century ago. Electronic supplementary material The online version of this article (10.1186/s12862-018-1142-0) contains supplementary material, which is available to authorized users.
|
34
|
Nichio BTL, Marchaukoski JN, Raittz RT. New Tools in Orthology Analysis: A Brief Review of Promising Perspectives. Front Genet 2017; 8:165. [PMID: 29163633 PMCID: PMC5674930 DOI: 10.3389/fgene.2017.00165] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/26/2017] [Accepted: 10/16/2017] [Indexed: 11/23/2022] Open
Abstract
Nowadays, defining homology relationships among sequences is essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, annotation of genomes and phylogenetic inference. Since 2007, with the increase in the number of new sequences being deposited in large biological databases, researchers have begun to analyse computerized methodologies and tools aimed at selecting the most promising ones in the prediction of orthologous groups. Literature in this field of research describes the problems that the majority of available tools show, such as those encountered in accuracy, time required for analysis (especially in light of the increasing volume of data being submitted, which requires faster techniques) and the automation of the process without requiring manual intervention. Conducting our search through BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles pursuing the most recent techniques and tools developed to solve most of the problems still existing in orthology detection. We listed the main computational tools created and developed between 2011 and 2017, taking into consideration the differences in the type of orthology analysis, outlining the main features of each tool and pointing to the problems that each one tries to address. We also observed that several tools still use as their main algorithm the BLAST "all-against-all" methodology, which entails some limitations, such as limited number of queries, computational cost, and high processing time to complete the analysis. 
However, new promising tools are being developed, like OrthoVenn (which uses the Venn diagram to show the relationship of ortholog groups generated by its algorithm); or proteinOrtho (which improves the accuracy of ortholog groups); or ReMark (tackling the integration of the pipeline to make the entry process automatic); or OrthAgogue (using algorithms developed to minimize processing time); and proteinOrtho (developed for dealing with large amounts of biological data). We compared the main features of four tools and tested them using four prokaryotic genomes. We hope that our review can be useful for researchers and will help them in selecting the most appropriate tool for their work in the field of orthology.
Affiliation(s)
- Roberto Tadeu Raittz
- Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Paraná, Curitiba, Brazil
|
35
|
Rane RV, Oakeshott JG, Nguyen T, Hoffmann AA, Lee SF. Orthonome - a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes. BMC Genomics 2017; 18:673. [PMID: 28859620 PMCID: PMC5580312 DOI: 10.1186/s12864-017-4079-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/15/2017] [Accepted: 08/21/2017] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. RESULTS Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com .
CONCLUSION We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought.
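The precision/recall trade-off mentioned in this abstract can be illustrated with a minimal sketch. It is not part of Orthonome; the gene identifiers, the reference set and the two prediction sets are made up for illustration, and orthologue pairs are treated as unordered.

```python
def pair_set(pairs):
    """Treat each orthologue pair as unordered, so (a, b) equals (b, a)."""
    return {frozenset(p) for p in pairs}


def precision_recall(predicted, reference):
    """Precision = correct predictions / all predictions;
    recall = correct predictions / all reference pairs."""
    pred, ref = pair_set(predicted), pair_set(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical reference orthologue pairs (identifiers are invented).
reference = [
    ("dmel_g1", "dsim_g1"),
    ("dmel_g2", "dsim_g2"),
    ("dmel_g3", "dsim_g3"),
    ("dmel_g4", "dsim_g4"),
]

# A strict predictor: few calls, all correct.
strict = [("dmel_g1", "dsim_g1"), ("dsim_g2", "dmel_g2")]

# A lenient predictor: more calls, more hits, but also more errors.
lenient = reference[:3] + [("dmel_g4", "dsim_g9"), ("dmel_g5", "dsim_g5")]

strict_scores = precision_recall(strict, reference)    # (1.0, 0.5)
lenient_scores = precision_recall(lenient, reference)  # (0.6, 0.75)
print(strict_scores, lenient_scores)
```

In this toy case the strict predictor is fully precise but recovers only half of the reference pairs, while the lenient one recovers more pairs at the cost of false calls.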
Affiliation(s)
- Rahul V Rane
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia; CSIRO, Canberra, Australian Capital Territory, Australia
- Thu Nguyen
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Ary A Hoffmann
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Siu F Lee
- CSIRO, Canberra, Australian Capital Territory, Australia; Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
|
36
|
Snake Genome Sequencing: Results and Future Prospects. Toxins (Basel) 2016; 8:toxins8120360. [PMID: 27916957 PMCID: PMC5198554 DOI: 10.3390/toxins8120360] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/02/2016] [Revised: 11/23/2016] [Accepted: 11/25/2016] [Indexed: 12/16/2022] Open
Abstract
Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.
|
37
|
Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species. Int J Genomics 2016; 2016:2374610. [PMID: 27376076 PMCID: PMC4916311 DOI: 10.1155/2016/2374610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/24/2015] [Revised: 03/19/2016] [Accepted: 03/21/2016] [Indexed: 12/31/2022] Open
Abstract
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide an unparalleled opportunity for incorporating genomics knowledge in the management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provide value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management.
|
38
|
Tekaia F. Genome Data Exploration Using Correspondence Analysis. Bioinform Biol Insights 2016; 10:59-72. [PMID: 27279736 PMCID: PMC4898644 DOI: 10.4137/bbi.s39614] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/03/2016] [Revised: 04/12/2016] [Accepted: 04/14/2016] [Indexed: 01/14/2023] Open
Abstract
Recent developments in sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations.
Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly by the ability of relating individual patterns with their corresponding characteristic variables.
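The factor construction described in this abstract can be sketched with a textbook formulation of CA: a singular value decomposition of the standardized residuals of a two-way table. This is an illustrative sketch, not the author's implementation; the function name, the toy count table and its row/column meaning (four taxa by three amino-acid classes) are assumptions made for the example.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Textbook CA of a two-way table of non-negative counts.

    Returns row and column principal coordinates on the first
    `n_factors` factors and the share of total inertia each explains.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()            # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                    # inertia carried by each factor
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    k = n_factors
    return row_coords[:, :k], col_coords[:, :k], inertia[:k] / inertia.sum()


# Made-up counts: rows are four hypothetical taxa, columns three
# hypothetical amino-acid classes. Rows 0-1 and rows 2-3 have
# similar profiles by construction.
table = [
    [20, 5, 10],
    [18, 6, 12],
    [4, 22, 9],
    [5, 20, 11],
]

rows, cols, explained = correspondence_analysis(table)
print(np.round(rows, 3))
print(np.round(explained, 3))
```

Rows with similar profiles receive nearby coordinates on the first factor, so the two groups built into the toy table fall on opposite sides of the origin. The sign of each factor is arbitrary, as with any SVD-based method.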
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|