1
Aufiero G, Fruggiero C, D’Angelo D, D’Agostino N. Homoeologs in Allopolyploids: Navigating Redundancy as Both an Evolutionary Opportunity and a Technical Challenge-A Transcriptomics Perspective. Genes (Basel) 2024; 15:977. [PMID: 39202338 PMCID: PMC11353593 DOI: 10.3390/genes15080977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/22/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024]
Abstract
Allopolyploidy in plants involves the merging of two or more distinct parental genomes into a single nucleus, a significant evolutionary process in the plant kingdom. Transcriptomic analysis provides invaluable insights into allopolyploid plants by elucidating the fate of duplicated genes, revealing evolutionary novelties and uncovering their environmental adaptations. By examining gene expression profiles, scientists can discern how duplicated genes have evolved to acquire new functions or regulatory roles. This process often leads to the development of novel traits and adaptive strategies that allopolyploid plants leverage to thrive in diverse ecological niches. Understanding these molecular mechanisms not only enhances our appreciation of the genetic complexity underlying allopolyploidy but also underscores their importance in agriculture and ecosystem resilience. However, transcriptome profiling is challenging due to genomic redundancy, which is further complicated by the presence of multiple chromosome sets and the variations among homoeologs and allelic genes. Prior to transcriptome analysis, sub-genome phasing and homoeology inference are essential for obtaining a comprehensive view of gene expression. This review aims to clarify the terminology in this field, identify the most challenging aspects of transcriptome analysis, explain their inherent difficulties, and suggest reliable analytic strategies. Furthermore, bulk RNA-seq is highlighted as a primary method for studying allopolyploid gene expression, focusing on critical steps like read mapping and normalization in differential gene expression analysis. This approach effectively captures gene expression from both parental genomes, facilitating a comprehensive analysis of their combined profiles. Its sensitivity in detecting low-abundance transcripts allows subtle differences between parental genomes to be identified, which is crucial for understanding regulatory dynamics and gene expression balance in allopolyploids.
Affiliation(s)
- Nunzio D’Agostino
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy; (G.A.); (C.F.); (D.D.)
2
Barba-Montoya J, Craig JM, Kumar S. Integrating Phylogenies with Chronology to Assemble the Tree of Life. bioRxiv 2024:2024.07.17.603989. [PMID: 39091733 PMCID: PMC11291004 DOI: 10.1101/2024.07.17.603989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Reconstructing the global Tree of Life necessitates computational approaches to integrate numerous molecular phylogenies with limited species overlap into a comprehensive supertree. Our survey of published literature shows that individual phylogenies are frequently restricted to specific taxonomic groups due to the expertise of investigators and molecular evolutionary considerations, resulting in any given species present in a minuscule fraction of phylogenies. We present a novel approach, called the chronological supertree algorithm (Chrono-STA), that can build a supertree of species from such data by using node ages in published molecular phylogenies scaled to time. Chrono-STA builds a supertree of organisms by integrating chronological data from molecular timetrees. It fundamentally differs from existing approaches that generate consensus phylogenies from gene trees with missing taxa, as Chrono-STA does not impute nodal distances, use a guide tree as a backbone, or reduce phylogenies to quartets. Analyses of simulated and empirical datasets show that Chrono-STA can combine taxonomically restricted timetrees with extremely limited species overlap. For such data, approaches that impute missing distances or assemble phylogenetic quartets did not perform well. We conclude that integrating phylogenies via temporal dimension enhances the accuracy of reconstructed supertrees that are also scaled to time.
Affiliation(s)
- Jose Barba-Montoya
- Richard Gilder Graduate School, American Museum of Natural History, New York, NY
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
- Jack M Craig
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Department of Biology, Temple University, Philadelphia, PA
3
Nunes WVB, Oliveira DS, Dias GDR, Carvalho AB, Caruso ÍP, Biselli JM, Guegen N, Akkouche A, Burlet N, Vieira C, Carareto CMA. A comprehensive evolutionary scenario for the origin and neofunctionalization of the Drosophila speciation gene Odysseus (OdsH). G3 (Bethesda) 2024; 14:jkad299. [PMID: 38156703 PMCID: PMC10917504 DOI: 10.1093/g3journal/jkad299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 11/22/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024]
Abstract
Odysseus (OdsH) was the first speciation gene described in Drosophila related to hybrid sterility in offspring of mating between Drosophila mauritiana and Drosophila simulans. Its origin is attributed to the duplication of the gene unc-4 in the subgenus Sophophora. By using a much larger sample of Drosophilidae species, we showed that contrary to what has been previously proposed, OdsH origin occurred 62 MYA. Evolutionary rates, expression, and transcription factor-binding sites of OdsH evidence that it may have rapidly experienced neofunctionalization in male sexual functions. Furthermore, the analysis of the OdsH peptide allowed the identification of mutations of D. mauritiana that could result in incompatibility in hybrids. In order to find if OdsH could be related to hybrid sterility, beyond Sophophora, we explored the expression of OdsH in Drosophila arizonae and Drosophila mojavensis, a pair of sister species with incomplete reproductive isolation. Our data indicated that OdsH expression is not atypical in their male-sterile hybrids. In conclusion, we have proposed that the origin of OdsH occurred earlier than previously proposed, followed by neofunctionalization. Our results also suggested that its role as a speciation gene might be restricted to D. mauritiana and D. simulans.
Affiliation(s)
- William Vilas Boas Nunes
- Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (Unesp), 2265 Cristóvão Colombo Street, 15054-000 São José do Rio Preto, Brazil
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Bât. Grégor Mendel, 43 Boulevard 11 Novembre 1918, 69622 Villeurbanne, France
- Daniel Siqueira Oliveira
- Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (Unesp), 2265 Cristóvão Colombo Street, 15054-000 São José do Rio Preto, Brazil
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Bât. Grégor Mendel, 43 Boulevard 11 Novembre 1918, 69622 Villeurbanne, France
- Guilherme de Rezende Dias
- Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro, CCS sl A2-075, 373 Carlos Chagas Filho Avenue, 21941-971 Rio de Janeiro, Brazil
- Antonio Bernardo Carvalho
- Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro, CCS sl A2-075, 373 Carlos Chagas Filho Avenue, 21941-971 Rio de Janeiro, Brazil
- Ícaro Putinhon Caruso
- Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (Unesp), 2265 Cristóvão Colombo Street, 15054-000 São José do Rio Preto, Brazil
- Joice Matos Biselli
- Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (Unesp), 2265 Cristóvão Colombo Street, 15054-000 São José do Rio Preto, Brazil
- Nathalie Guegen
- Faculté de Médecine, iGReD, Université Clermont Auvergne, CNRS, INSERM, 4 Bd Claude Bernard, 63000 Clermont-Ferrand, France
- Abdou Akkouche
- Faculté de Médecine, iGReD, Université Clermont Auvergne, CNRS, INSERM, 4 Bd Claude Bernard, 63000 Clermont-Ferrand, France
- Nelly Burlet
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Bât. Grégor Mendel, 43 Boulevard 11 Novembre 1918, 69622 Villeurbanne, France
- Cristina Vieira
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Bât. Grégor Mendel, 43 Boulevard 11 Novembre 1918, 69622 Villeurbanne, France
- Claudia M A Carareto
- Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University (Unesp), 2265 Cristóvão Colombo Street, 15054-000 São José do Rio Preto, Brazil
4
Hellmuth M, Stadler PF. The Theory of Gene Family Histories. Methods Mol Biol 2024; 2802:1-32. [PMID: 38819554 DOI: 10.1007/978-1-0716-3838-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Most genes are part of larger families of evolutionary-related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to phylogenetic tree of the underlying species has remained a challenging topic despite their importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implication for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Stockholm, Sweden
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
5
Carhuaricra-Huaman D, Setubal JC. Protein-Coding Gene Families in Prokaryote Genome Comparisons. Methods Mol Biol 2024; 2802:33-55. [PMID: 38819555 DOI: 10.1007/978-1-0716-3838-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
The identification of orthologous genes is relevant for comparative genomics, phylogenetic analysis, and functional annotation. There are many computational tools for the prediction of orthologous groups as well as web-based resources that offer orthology datasets for download and online analysis. This chapter presents a simple and practical guide to the process of orthologous group prediction, using a dataset of 10 prokaryotic proteomes as example. The orthology methods covered are OrthoMCL, COGtriangles, OrthoFinder2, and OMA. The authors compare the number of orthologous groups predicted by these various methods, and present a brief workflow for the functional annotation and reconstruction of phylogenies from inferred single-copy orthologous genes. The chapter also demonstrates how to explore two orthology databases: eggNOG6 and OrthoDB.
Affiliation(s)
- Dennis Carhuaricra-Huaman
- Programa de Pós-Graduação Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil
- Research Group in Biotechnology Applied to Animal Health, Production and Conservation (SANIGEN), Laboratory of Biology and Molecular Genetics, Faculty of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
6
Cribbie EP, Doerr D, Chauve C. AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders. Methods Mol Biol 2024; 2802:247-265. [PMID: 38819563 DOI: 10.1007/978-1-0716-3838-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each of the steps of such a process, several methods can be used and implemented using a growing corpus of, often parameterized, tools; in practice, interfacing such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework aimed at creating ancestral gene order reconstruction pipelines allowing to interface and parameterize different bioinformatics tools. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline .
Affiliation(s)
- Evan P Cribbie
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Daniel Doerr
- Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research, and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada.
7
Lyubetsky VA, Rubanov LI, Tereshina MB, Ivanova AS, Araslanova KR, Uroshlev LA, Goremykina GI, Yang JR, Kanovei VG, Zverkov OA, Shitikov AD, Korotkova DD, Zaraisky AG. Wide-scale identification of novel/eliminated genes responsible for evolutionary transformations. Biol Direct 2023; 18:45. [PMID: 37568147 PMCID: PMC10416458 DOI: 10.1186/s13062-023-00405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND: It is generally accepted that most evolutionary transformations at the phenotype level are associated either with rearrangements of genomic regulatory elements, which control the activity of gene networks, or with changes in the amino acid contents of proteins. Recently, evidence has accumulated that significant evolutionary transformations could also be associated with the loss/emergence of whole genes. The targeted identification of such genes is a challenging problem for both bioinformatics and evo-devo research.
RESULTS: To solve this problem we propose the WINEGRET method, named after the first letters of the title. Its main idea is to search for genes that satisfy two requirements: first, the desired genes were lost/emerged at the same evolutionary stage at which the phenotypic trait of interest was lost/emerged, and second, the expression of these genes changes significantly during the development of the trait of interest in the model organism. To verify the first requirement, we do not use existing databases of orthologs, but rely purely on gene homology and local synteny by using some novel quickly computable conditions. Genes satisfying the second requirement are found by deep RNA sequencing. As a proof of principle, we used our method to find genes absent in extant amniotes (reptiles, birds, mammals) but present in anamniotes (fish and amphibians), in which these genes are involved in the regeneration of large body appendages. As a result, 57 genes were identified. For three of them, c-c motif chemokine 4, eotaxin-like, and a previously unknown gene called here sod4, essential roles for tail regeneration were demonstrated. Notably, we established that the latter gene belongs to a novel family of Cu/Zn-superoxide dismutases lost by amniotes, SOD4.
CONCLUSIONS: We present a method for targeted identification of genes whose loss/emergence in evolution could be associated with the loss/emergence of a phenotypic trait of interest. In a proof-of-principle study, we identified genes absent in amniotes that participate in body appendage regeneration in anamniotes. Our method provides a wide range of opportunities for studying the relationship between the loss/emergence of phenotypic traits and the loss/emergence of specific genes in evolution.
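The two-requirement selection described in this abstract can be illustrated with a minimal Python sketch. It is not the WINEGRET implementation: it assumes the homology and local-synteny step has already been reduced to a per-gene set of taxa in which the gene was detected, and that differential expression results are available as (log2 fold change, adjusted p-value) pairs. The function name, thresholds, gene names, and taxa below are all hypothetical placeholders.

```python
def candidate_genes(presence, de_results, lost_in, retained_in,
                    padj_max=0.05, min_abs_log2fc=1.0):
    """Return genes that pass both filters (illustrative only).

    presence    : dict gene -> set of taxa where the gene was detected
                  (assumed to come from an upstream homology/synteny step)
    de_results  : dict gene -> (log2 fold change, adjusted p-value)
    lost_in     : taxa in which the trait, and hence the gene, should be absent
    retained_in : taxa in which the trait and the gene should be present
    """
    selected = []
    for gene, taxa in presence.items():
        # Requirement 1: absent from every "lost" taxon, present in at least
        # one "retained" taxon.
        if not taxa.isdisjoint(lost_in) or not (taxa & retained_in):
            continue
        # Requirement 2: expression changes significantly (thresholds assumed).
        stats = de_results.get(gene)
        if stats is None:
            continue
        log2fc, padj = stats
        if padj <= padj_max and abs(log2fc) >= min_abs_log2fc:
            selected.append(gene)
    return sorted(selected)


# Toy input with placeholder gene names; no real data is implied.
toy_presence = {
    "geneA": {"zebrafish", "xenopus"},
    "geneB": {"zebrafish", "xenopus", "mouse"},
    "geneC": {"xenopus"},
    "geneD": {"chicken"},
}
toy_de = {"geneA": (2.3, 0.001), "geneB": (3.0, 0.0001), "geneC": (0.2, 0.6)}
toy_lost = {"mouse", "chicken", "lizard"}
toy_retained = {"zebrafish", "xenopus"}

print(candidate_genes(toy_presence, toy_de, toy_lost, toy_retained))
```

Keeping the phylogenetic filter and the expression filter as separate conditions mirrors the abstract's two requirements and makes it easy to swap in a different upstream presence/absence call.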
Affiliation(s)
- Vassily A Lyubetsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Department of Mechanics and Mathematics, Lomonosov Moscow State University, Kolmogorova Str., 1, Moscow, Russia, 119234
- Lev I Rubanov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Maria B Tereshina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Pirogov Russian National Research Medical University, Moscow, Russia
- Anastasiya S Ivanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, USA
- Karina R Araslanova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Leonid A Uroshlev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32, Vavilova Str., Moscow, Russia, 119991
- Galina I Goremykina
- Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow, Russia
- Jian-Rong Yang
- Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Vladimir G Kanovei
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Oleg A Zverkov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 Build. 1, Bolshoy Karetny per., Moscow, Russia, 127051
- Alexander D Shitikov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Daria D Korotkova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997
- Global Health Institute, School of Life Sciences, EPFL, Lausanne, Switzerland
- Andrey G Zaraisky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya Str., Moscow, Russia, 117997.
- Pirogov Russian National Research Medical University, Moscow, Russia.
8
Folk RA, Gaynor ML, Engle-Wrye NJ, O’Meara BC, Soltis PS, Soltis DE, Guralnick RP, Smith SA, Grady CJ, Okuyama Y. Identifying Climatic Drivers of Hybridization with a New Ancestral Niche Reconstruction Method. Syst Biol 2023; 72:856-873. [PMID: 37073863 PMCID: PMC10405357 DOI: 10.1093/sysbio/syad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 03/23/2023] [Accepted: 04/17/2023] [Indexed: 04/20/2023]
Abstract
Applications of molecular phylogenetic approaches have uncovered evidence of hybridization across numerous clades of life, yet the environmental factors responsible for driving opportunities for hybridization remain obscure. Verbal models implicating geographic range shifts that brought species together during the Pleistocene have often been invoked, but quantitative tests using paleoclimatic data are needed to validate these models. Here, we produce a phylogeny for Heuchereae, a clade of 15 genera and 83 species in Saxifragaceae, with complete sampling of recognized species, using 277 nuclear loci and nearly complete chloroplast genomes. We then employ an improved framework with a coalescent simulation approach to test and confirm previous hybridization hypotheses and identify one new intergeneric hybridization event. Focusing on the North American distribution of Heuchereae, we introduce and implement a newly developed approach to reconstruct potential past distributions for ancestral lineages across all species in the clade and across a paleoclimatic record extending from the late Pliocene. Time calibration based on both nuclear and chloroplast trees recovers a mid- to late-Pleistocene date for most inferred hybridization events, a timeframe concomitant with repeated geographic range restriction into overlapping refugia. Our results indicate an important role for past episodes of climate change, and the contrasting responses of species with differing ecological strategies, in generating novel patterns of range contact among plant communities and therefore new opportunities for hybridization. The new ancestral niche method flexibly models the shape of niche while incorporating diverse sources of uncertainty and will be an important addition to the current comparative methods toolkit. [Ancestral niche reconstruction; hybridization; paleoclimate; pleistocene.].
Affiliation(s)
- Ryan A Folk
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, USA
- Michelle L Gaynor
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Department of Biology, University of Florida, Gainesville, FL, USA
- Nicholas J Engle-Wrye
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS, USA
- Brian C O’Meara
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN, USA
- Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, USA
- Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Department of Biology, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, USA
- Robert P Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, USA
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Charles J Grady
- Biodiversity Institute, University of Kansas, Lawrence, KS, 66045, USA
- Yudai Okuyama
- Tsukuba Botanical Garden, National Museum of Nature and Science, Tsukuba, Japan
9
Neves-da-Rocha J, Santos-Saboya MJ, Lopes MER, Rossi A, Martinez-Rossi NM. Insights and Perspectives on the Role of Proteostasis and Heat Shock Proteins in Fungal Infections. Microorganisms 2023; 11:1878. [PMID: 37630438 PMCID: PMC10456932 DOI: 10.3390/microorganisms11081878] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/05/2023] [Revised: 06/30/2023] [Accepted: 07/06/2023] [Indexed: 08/27/2023]
Abstract
Fungi are a diverse group of eukaryotic organisms that infect humans, animals, and plants. To successfully colonize their hosts, pathogenic fungi must continuously adapt to the host's unique environment, e.g., changes in temperature, pH, and nutrient availability. Appropriate protein folding, assembly, and degradation are essential for maintaining cellular homeostasis and survival under stressful conditions. Therefore, the regulation of proteostasis is crucial for fungal pathogenesis. The heat shock response (HSR) is one of the most important cellular mechanisms for maintaining proteostasis. It is activated by various stresses and regulates the activity of heat shock proteins (HSPs). As molecular chaperones, HSPs participate in the proteostatic network to control cellular protein levels by affecting their conformation, location, and degradation. In recent years, a growing body of evidence has highlighted the crucial yet understudied role of stress response circuits in fungal infections. This review explores the role of protein homeostasis and HSPs in fungal pathogenicity, including their contributions to virulence and host-pathogen interactions, as well as the concerted effects between HSPs and the main proteostasis circuits in the cell. Furthermore, we discuss perspectives in the field and the potential for targeting the components of these circuits to develop novel antifungal therapies.
Affiliation(s)
- João Neves-da-Rocha
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto 14049-900, SP, Brazil; (M.J.S.-S.); (M.E.R.L.); (A.R.)
- Nilce M. Martinez-Rossi
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto 14049-900, SP, Brazil; (M.J.S.-S.); (M.E.R.L.); (A.R.)
10
Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023]
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that finetune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho allows to routinely consider miRNAs in evolutionary analyses that were thus far reserved to protein-coding genes.
Affiliation(s)
- Felix Langschied
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Matthias S Leisegang
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ralf P Brandes
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
11
Scheunert A, Lautenschlager U, Ott T, Oberprieler C. Nano-Strainer: A workflow for the identification of single-copy nuclear loci for plant systematic studies, using target capture kits and Oxford Nanopore long reads. Ecol Evol 2023; 13:e10190. [PMID: 37475726 PMCID: PMC10354226 DOI: 10.1002/ece3.10190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 05/18/2023] [Accepted: 06/01/2023] [Indexed: 07/22/2023]
Abstract
In modern plant systematics, target enrichment enables simultaneous analysis of hundreds of genes. However, when dealing with reticulate or polyploidization histories, few markers may suffice, but often are required to be single-copy, a condition that is not necessarily met with commercial capture kits. Also, large genome sizes can render target capture ineffective, so that amplicon sequencing would be preferable; however, knowledge about suitable loci is often missing. Here, we present a comprehensive workflow for the identification of putative single-copy nuclear markers in a genus of interest, by mining a small dataset from target capture using a few representative taxa. The proposed pipeline assesses sequence variability contained in the data from targeted loci and assigns reads to their respective genes, via a combined BLAST/clustering procedure. Cluster consensus sequences are then examined based on four pre-defined criteria presumably indicative of the absence of paralogy. This is done by calculating four specialized indices; loci are ranked according to their performance in these indices, and top-scoring loci are considered putatively single- or low-copy. The approach can be applied to any probe set. As it relies on long reads, the present contribution also provides template workflows for processing Nanopore-based target capture data. Obtained markers are further tested and then entered into amplicon sequencing. For the detection of possibly remaining paralogy in these data, which might occur in groups with rampant paralogy, we also employ the long-read assembly tool canu. In diploid representatives of the young Compositae genus Leucanthemum, characterized by high levels of polyploidy, our approach resulted in successful amplification of 13 loci. Modifications to remove traces of paralogy were made in seven of these. A species tree from the markers correctly reproduced main relationships in the genus, albeit at low resolution. The presented workflow has the potential to valuably support phylogenetic research, for example in polyploid plant groups.
Affiliation(s)
- Agnes Scheunert
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Ulrich Lautenschlager
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Tankred Ott
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Christoph Oberprieler
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany

12
Cerrudo CS, Motta LF, Cuccovia Warlet FU, Lassalle FM, Simonin JA, Belaich MN. Protein-Gene Orthology in Baculoviridae: An Exhaustive Analysis to Redefine the Ancestrally Common Coding Sequences. Viruses 2023; 15:v15051091. [PMID: 37243176 DOI: 10.3390/v15051091] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/01/2023] [Revised: 04/23/2023] [Accepted: 04/28/2023] [Indexed: 05/28/2023] Open
Abstract
Baculoviruses are entomopathogens that carry large, double-stranded circular DNA genomes and infect insect larvae of Lepidoptera, Hymenoptera and Diptera, with applications in the biological control of agricultural pests, in the production of recombinant proteins and as viral vectors for various purposes in mammals. These viruses have a variable genetic composition that differs between species, with some sequences shared by all known members, and others that are lineage-specific or unique to isolates. Based on the analysis of nearly 300 sequenced genomes, a thorough bioinformatic investigation was conducted on all the baculoviral protein coding sequences, characterizing their orthology and phylogeny. This analysis confirmed the 38 protein coding sequences currently considered as core genes, while also identifying novel coding sequences as candidates to join this set. Accordingly, homology was found among all the major occlusion body proteins; we therefore propose that the polyhedrin, granulin and CUN085 genes be considered the 39th core gene of Baculoviridae.
Affiliation(s)
- Carolina Susana Cerrudo
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina
- Lucas Federico Motta
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina
- Franco Uriel Cuccovia Warlet
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina
- Fernando Maku Lassalle
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina
- Jorge Alejandro Simonin
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina
- Mariano Nicolás Belaich
  - Laboratorio de Ingeniería Genética y Biología Celular y Molecular-Área Virosis de Insectos (LIGBCM-AVI), Instituto de Microbiología Básica y Aplicada, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 352, Bernal B1876BXD, Buenos Aires, Argentina

13
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Hiller M, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Integrating gene annotation with orthology inference at scale. Science 2023; 380:eabn3107. [PMID: 37104600 DOI: 10.1126/science.abn3107] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 04/29/2023]
Abstract
Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of genomes, which we demonstrate by applying it to 488 placental mammal and 501 bird assemblies, creating the largest comparative gene resources so far. Additionally, TOGA detects gene losses, enables selection screens, and automatically provides a superior measure of mammalian genome quality. TOGA is a powerful and scalable method to annotate and compare genes in the genomic era.
Affiliation(s)
- Bogdan M Kirilenko
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Chetan Munegowda
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Ekaterina Osipova
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- David Jebb
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Virag Sharma
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Moritz Blumer
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Ariadna E Morales
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Alexis-Walid Ahmed
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Dimitrios-Georgios Kontopoulos
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Leon Hilgers
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Kerstin Lindblad-Toh
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 32 Uppsala, Sweden
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Elinor K Karlsson
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, 60325 Frankfurt, Germany
  - Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany

14
Abiraami TV, Sanyal RP, Misra HS, Saini A. Genome-wide analysis of bromodomain gene family in Arabidopsis and rice. Front Plant Sci 2023; 14:1120012. [PMID: 36968369 PMCID: PMC10030601 DOI: 10.3389/fpls.2023.1120012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
The bromodomain-containing proteins (BRD-proteins) belong to a family of 'epigenetic mark readers' integral to epigenetic regulation. The BRD-members contain a conserved 'bromodomain' (BRD/BRD-fold: interacts with acetylated-lysine in histones) and several additional domains, making them structurally and functionally diverse. Like animals, plants also contain multiple Brd-homologs; however, the extent of their diversity and the impact of molecular events (genomic duplications and alternative splicing, AS) on it remain relatively unexplored. The present genome-wide analysis of Brd-gene families of Arabidopsis thaliana and Oryza sativa showed extensive diversity in the structure of genes/proteins, regulatory elements, expression pattern, domains/motifs, and the bromodomain (with respect to length, sequence, and location) among the Brd-members. Orthology analysis identified thirteen ortholog groups (OGs), three paralog groups (PGs) and four singleton members (STs). While more than 40% of Brd-genes were affected by genomic duplication events in both plants, AS-events affected 60% of A. thaliana and 41% of O. sativa genes. These molecular events affected various regions (promoters, untranslated regions, exons) of different Brd-members, with potential impact on expression and/or structure-function characteristics. RNA-Seq data analysis indicated differences in tissue-specificity and stress response of Brd-members. Analysis by RT-qPCR revealed differential abundance and salt stress response of duplicate A. thaliana and O. sativa Brd-genes. Further analysis of the AtBrd gene AtBrdPG1b showed salinity-induced modulation of its splicing pattern. Bromodomain (BRD)-region based phylogenetic analysis placed the A. thaliana and O. sativa homologs into clusters/sub-clusters, mostly consistent with ortholog/paralog groups. The bromodomain-region displayed several conserved signatures in key BRD-fold elements (α-helices, loops), along with variations (1-20 sites) and indels among the BRD-duplicates. Homology modeling and superposition identified structural variations in BRD-folds of divergent and duplicate BRD-members, which might affect their interaction with the chromatin histones and associated functions. The study also showed the contribution of various duplication events to Brd-gene family expansion among diverse plants, including several monocot and dicot plant species.
Affiliation(s)
- T. V. Abiraami
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra, India
- Ravi Prakash Sanyal
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra, India
- Hari Sharan Misra
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra, India
  - Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Ajay Saini
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra, India
  - Homi Bhabha National Institute, Mumbai, Maharashtra, India

15
Julca I, Tan QW, Mutwil M. Toward kingdom-wide analyses of gene expression. Trends Plant Sci 2023; 28:235-249. [PMID: 36344371 DOI: 10.1016/j.tplants.2022.09.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2022] [Revised: 09/22/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
Gene expression data for Archaeplastida are accumulating exponentially, with more than 300 000 RNA-sequencing (RNA-seq) experiments available for hundreds of species. The gene expression data stem from thousands of experiments that capture gene expression in various organs, tissues, cell types, (a)biotic perturbations, and genotypes. Advances in software tools make it possible to process all these data in a matter of weeks on modern office computers, giving us the possibility to study gene expression in a kingdom-wide manner for the first time. We discuss how the expression data can be accessed and processed and outline analyses that take advantage of cross-species analyses, allowing us to generate powerful and robust hypotheses about gene function and evolution.
Affiliation(s)
- Irene Julca
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Qiao Wen Tan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore

16
Marinovíc M, Di Falco M, Aguilar Pontes MV, Gorzsás A, Tsang A, de Vries RP, Mäkelä MR, Hildén K. Comparative Analysis of Enzyme Production Patterns of Lignocellulose Degradation of Two White Rot Fungi: Obba rivulosa and Gelatoporia subvermispora. Biomolecules 2022; 12:biom12081017. [PMID: 35892327 PMCID: PMC9330253 DOI: 10.3390/biom12081017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/08/2022] [Accepted: 07/20/2022] [Indexed: 02/01/2023] Open
Abstract
The unique ability of basidiomycete white rot fungi to degrade all components of plant cell walls makes them indispensable organisms in the global carbon cycle. In this study, we analyzed the proteomes of two closely related white rot fungi, Obba rivulosa and Gelatoporia subvermispora, during eight-week cultivation on solid spruce wood. Plant cell wall degrading carbohydrate-active enzymes (CAZymes) represented approximately 5% of the total proteins in both species. A core set of orthologous plant cell wall degrading CAZymes was shared between these species on spruce suggesting a conserved plant biomass degradation approach in this clade of basidiomycete fungi. However, differences in time-dependent production of plant cell wall degrading enzymes may be due to differences among initial growth rates of these species on solid spruce wood. The obtained results provide insight into specific enzymes and enzyme sets that are produced during the degradation of solid spruce wood in these fungi. These findings expand the knowledge on enzyme production in nature-mimicking conditions and may contribute to the exploitation of white rot fungi and their enzymes for biotechnological applications.
Affiliation(s)
- Mila Marinovíc
  - Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, 00790 Helsinki, Finland; (M.M.); (M.R.M.)
- Marcos Di Falco
  - Centre for Structural and Functional Genomics, Concordia University, Montréal, QC H4B 1R6, Canada; (M.D.F.); (A.T.)
- Maria Victoria Aguilar Pontes
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute & Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands; (M.V.A.P.); (R.P.d.V.)
- András Gorzsás
  - Department of Chemistry, Umeå University, 901 87 Umeå, Sweden
- Adrian Tsang
  - Centre for Structural and Functional Genomics, Concordia University, Montréal, QC H4B 1R6, Canada; (M.D.F.); (A.T.)
- Ronald P. de Vries
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute & Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands; (M.V.A.P.); (R.P.d.V.)
- Miia R. Mäkelä
  - Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, 00790 Helsinki, Finland; (M.M.); (M.R.M.)
- Kristiina Hildén
  - Department of Microbiology, Faculty of Agriculture and Forestry, University of Helsinki, 00790 Helsinki, Finland; (M.M.); (M.R.M.)
  - Correspondence:

17
Smith ML, Vanderpool D, Hahn MW. Using all gene families vastly expands data available for phylogenomic inference. Mol Biol Evol 2022; 39:6596367. [PMID: 35642314 PMCID: PMC9178227 DOI: 10.1093/molbev/msac112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022] Open
Abstract
Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger families. Recent advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several methods for species tree inference are robust to the inclusion of paralogs and could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference by examining relationships among 26 primate species in detail and by analyzing five additional data sets. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data. We explore several species tree inference methods, finding that identical trees are returned across nearly all subsets of the data and methods for primates. The relationships among Platyrrhini remain contentious; however, the species tree inference method matters more than the subset of data used. Using data from larger gene families drastically increases the number of genes available and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression in primates. For the other data sets, topological inferences are consistent whether single-copy families or orthologs extracted using decomposition approaches are analyzed. Using larger gene families is a promising approach to include more data in phylogenomics without sacrificing accuracy, at least when high-quality genomes are available.
Affiliation(s)
- Megan L Smith
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Dan Vanderpool
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Matthew W Hahn
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA

18
Nevers Y, Jones TEM, Jyothi D, Yates B, Ferret M, Portell-Silva L, Codo L, Cosentino S, Marcet-Houben M, Vlasova A, Poidevin L, Kress A, Hickman M, Persson E, Piližota I, Guijarro-Clarke C, Iwasaki W, Lecompte O, Sonnhammer E, Roos DS, Gabaldón T, Thybert D, Thomas PD, Hu Y, Emms DM, Bruford E, Capella-Gutierrez S, Martin MJ, Dessimoz C, Altenhoff A. The Quest for Orthologs orthology benchmark service in 2022. Nucleic Acids Res 2022; 50:W623-W632. [PMID: 35552456 PMCID: PMC9252809 DOI: 10.1093/nar/gkac330] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/23/2022] [Revised: 04/07/2022] [Accepted: 04/30/2022] [Indexed: 11/15/2022] Open
Abstract
The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource for comparing existing and new methods of orthology inference (the bedrock of many comparative genomics and phylogenetic analyses) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to keeping the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
Affiliation(s)
- Yannis Nevers
  - To whom correspondence should be addressed. Tel: +41 21 692 5449
- Tamsin E M Jones
  - HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Dushyanth Jyothi
  - Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Bethan Yates
  - HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Meritxell Ferret
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Laura Portell-Silva
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Laia Codo
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Salvatore Cosentino
  - Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
- Marina Marcet-Houben
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Anna Vlasova
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Laetitia Poidevin
  - Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
  - BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Arnaud Kress
  - Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
  - BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Mark Hickman
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Emma Persson
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Ivana Piližota
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Guijarro-Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wataru Iwasaki
  - Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- David S Roos
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
  - Centro de Investigaciones Biomédicas en Red de Enfermedades Infecciosas, Barcelona, Spain
- David Thybert
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- David M Emms
  - Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
- Elspeth Bruford
  - HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  - Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Maria J Martin
  - Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, UK
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Adrian Altenhoff
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
  - Computer Science Department, ETH Zurich, Zurich, Switzerland

19
Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022; 71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022] Open
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While several species tree estimation methods have been introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high costs in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single-copy trees, which allows methods previously designed for species tree inference in a single-copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single-copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
Affiliation(s)
- James Willson
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mrinmoy Saha Roddur
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Baqiao Liu
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

20
Dylus D, Nevers Y, Altenhoff AM, Gürtler A, Dessimoz C, Glover NM. How to build phylogenetic species trees with OMA. F1000Res 2022; 9:511. [PMID: 35722083 PMCID: PMC9194518 DOI: 10.12688/f1000research.23790.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/11/2022] [Indexed: 12/21/2022] [Open access]
Abstract
Knowledge of species phylogeny is critical to many fields of biology. In an era of genome data availability, the most common way to make a phylogenetic species tree is by using multiple protein-coding genes, conserved in multiple species. This methodology is composed of several steps: orthology inference, multiple sequence alignment and inference of the phylogeny with dedicated tools. This can be a difficult task, and orthology inference, in particular, is usually computationally intensive and error prone if done ad hoc. This tutorial provides protocols to make use of OMA Orthologous Groups, a set of genes all orthologous to each other, to infer a phylogenetic species tree. It is designed to be user-friendly and computationally inexpensive, by providing two options: (1) Using only precomputed groups with species available on the OMA Browser, or (2) Computing orthologs using OMA Standalone for additional species, with the option of using precomputed orthology relations for those present in OMA. A protocol for downstream analyses is provided as well, including creating a supermatrix, tree inference, and visualization. All protocols use publicly available software, and we provide scripts and code snippets to facilitate data handling. The protocols are accompanied by practical examples.
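The supermatrix step named in this abstract can be illustrated with a minimal sketch. It is not taken from the OMA tutorial or its scripts: the function name `build_supermatrix`, the dictionary-based alignment format, and the toy sequences are all assumptions made for illustration. It concatenates per-group alignments species by species and pads species missing from a group with gap characters.

```python
from typing import Dict, List


def build_supermatrix(alignments: List[Dict[str, str]], missing: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments into one sequence per species.

    Each alignment maps a species name to its aligned sequence.
    A species absent from an alignment is padded with `missing`.
    (Hypothetical helper, not part of OMA.)
    """
    species = sorted({name for aln in alignments for name in aln})
    columns: Dict[str, List[str]] = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            columns[name].append(aln.get(name, missing * width))
    return {name: "".join(parts) for name, parts in columns.items()}


# Toy orthologous groups (made-up sequences); "mouse" is absent from group_b.
group_a = {"human": "MKT-L", "mouse": "MKTAL", "yeast": "MRT-L"}
group_b = {"human": "GGH", "yeast": "GAH"}
supermatrix = build_supermatrix([group_a, group_b])
```

In a real analysis the resulting matrix would normally be written to FASTA or PHYLIP and passed to a tree inference program; that step is left out here.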
Affiliation(s)
- David Dylus
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
- Yannis Nevers
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
- Adrian M Altenhoff
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computer Science, ETH Zurich, Zurich, 8092, Switzerland
- Antoine Gürtler
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
  - Department of Computer Science, University College London, London, WC1E 6BT, UK
- Natasha M Glover
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
21
Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022; 23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] [Open access]
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Affiliation(s)
- Venket Raghavan (corresponding author)
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Louis Kraft (corresponding author)
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
22
Tantoso E, Eisenhaber B, Eisenhaber F. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol 2022; 2449:299-324. [PMID: 35507269 DOI: 10.1007/978-1-0716-2095-3_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The paradigm shift associated with the introduction of the pan-genome concept has drawn attention away from single reference genomes and toward the actual sequence diversity within organism populations, strain collections, clades, etc. A single genome is no longer sufficient to describe bacteria of interest; instead, the genomic repertoire of all existing strains is the key to the metabolic, evolutionary, or pathogenic potential of a species. The classification of orthologous genes derived from a collection of taxonomically related genome sequences is central to bacterial pan-genome computational analysis. In this work, we present a review of methods for computing pan-genome gene clusters, including their comparative analysis for the case of Streptococcus pyogenes strain genomes. We exhaustively scanned the parametrization space of the homologue searching procedures and found optimal parameters (sequence identity of 60% and coverage of 50-60% in the pairwise alignment) for the orthologous clustering of gene sequences. We find that the sequence identity threshold influences the number of gene families about three times more strongly than the sequence coverage threshold.
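To make the role of the two thresholds concrete, here is a small sketch of how identity and coverage cutoffs on pairwise alignments could turn into gene families. It does not reproduce the procedure reviewed in the chapter: the `Hit` record, the coverage definition (aligned length over the longer sequence), the single-linkage grouping, and the toy numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Hit:
    """One pairwise alignment between two genes (hypothetical record)."""
    query: str
    subject: str
    identity: float  # percent identical residues in the aligned region
    aln_len: int     # length of the aligned region
    query_len: int
    subject_len: int


def passes(hit: Hit, min_identity: float = 60.0, min_coverage: float = 50.0) -> bool:
    # Assumption: coverage is measured against the longer of the two sequences.
    coverage = 100.0 * hit.aln_len / max(hit.query_len, hit.subject_len)
    return hit.identity >= min_identity and coverage >= min_coverage


def cluster(genes: List[str], hits: List[Hit], **thresholds: float) -> List[Set[str]]:
    """Single-linkage grouping of genes joined by hits that pass the thresholds."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for hit in hits:
        if passes(hit, **thresholds):
            parent[find(hit.query)] = find(hit.subject)

    families: Dict[str, Set[str]] = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return list(families.values())


genes = ["g1", "g2", "g3", "g4"]
hits = [
    Hit("g1", "g2", 95.0, 280, 300, 290),  # passes both thresholds
    Hit("g2", "g3", 58.0, 250, 290, 300),  # fails identity at 60%
    Hit("g3", "g4", 72.0, 120, 300, 310),  # fails coverage at 50%
]
families_default = cluster(genes, hits)
families_relaxed = cluster(genes, hits, min_identity=50.0)
```

With the toy hits above, lowering only the identity cutoff merges one more gene into an existing family, which is the kind of sensitivity to the identity threshold that the abstract reports.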
Affiliation(s)
- Erwin Tantoso
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Genome Institute Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Frank Eisenhaber
  - Genome Institute and Bioinformatics Institute, Singapore, Singapore
23
Inoue J. ORTHOSCOPE*: a phylogenetic pipeline to infer gene histories from genome-wide data. Mol Biol Evol 2021; 39:6400256. [PMID: 34662403 PMCID: PMC8763121 DOI: 10.1093/molbev/msab301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] [Open access]
Abstract
Comparative genome-scale analyses of protein-coding gene sequences are employed to examine evidence for whole-genome duplication and horizontal gene transfer. For this purpose, an orthogroup should be delineated to infer evolutionary history regarding each gene, and results of all orthogroup analyses need to be integrated to infer a genome-scale history. An orthogroup is a set of genes descended from a single gene in the last common ancestor of all species under consideration. However, such analyses confront several problems: (1) analytical pipelines to infer all gene histories with methods comparing species and gene trees are not fully developed, and (2) without detailed analyses within orthogroups, evolutionary events of paralogous genes in the same orthogroup cannot be distinguished for genome-wide integration of results derived from multiple orthogroup analyses. Here I present an analytical pipeline, ORTHOSCOPE* (star), to infer evolutionary histories of animal/plant genes from genome-scale data. ORTHOSCOPE* estimates a tree for a specified gene, detects speciation/gene duplication events that occurred at nodes belonging to only one lineage leading to a species of interest, and then integrates results derived from gene trees estimated for all query genes in genome-wide data. Thus, ORTHOSCOPE* can be used to detect species nodes just after whole genome duplications as a first step of comparative genomic analyses. Moreover, by examining the presence or absence of genes belonging to species lineages with dense taxon sampling available from the ORTHOSCOPE web version, ORTHOSCOPE* can detect genes lost in specific lineages and horizontal gene transfers. This pipeline is available at https://github.com/jun-inoue/ORTHOSCOPE_STAR.
Affiliation(s)
- Jun Inoue
  - Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, University of Tokyo, Kashiwa, Japan
24
Yuan C, Li C, Zhao X, Yan C, Wang J, Mou Y, Sun Q, Shan S. Genome-Wide Identification and Characterization of HSP90-RAR1-SGT1-Complex Members From Arachis Genomes and Their Responses to Biotic and Abiotic Stresses. Front Genet 2021; 12:689669. [PMID: 34512718 PMCID: PMC8430224 DOI: 10.3389/fgene.2021.689669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/01/2021] [Accepted: 08/05/2021] [Indexed: 11/13/2022] [Open access]
Abstract
The molecular chaperone complex HSP90-RAR1-SGT1 (HRS) plays important roles in both biotic and abiotic stress responses in plants. A previous study showed that wild peanut Arachis diogoi SGT1 (AdSGT1) could enhance disease resistance in transgenic tobacco and peanut. However, no systematic analysis of the HRS complex in Arachis has been conducted to date. In this study, a comprehensive analysis of the HRS complex was performed in Arachis. Nineteen HSP90, two RAR1 and six SGT1 genes were identified from the allotetraploid peanut Arachis hypogaea, a number close to the sum of those from the two wild diploid peanut species Arachis duranensis and Arachis ipaensis. According to phylogenetic and chromosomal location analyses, thirteen orthologous gene pairs from Arachis were identified, all of which except AhHSP90-A8, AhHSP90-B9, AdHSP90-9, and AiHSP90-9 were localized on the syntenic locus, and they shared similar exon-intron structures, conserved motifs and expression patterns. Phylogenetic analysis showed that HSP90 and RAR1 from dicot and monocot plants diverged into different clusters throughout their evolution. Chromosomal location analysis indicated that AdSGT1 (the orthologous gene of AhSGT1-B3 in this study) might provide resistance to late leaf spot disease dependent on the orthologous genes of AhHSP90-B10 and AhRAR1-B in the wild peanut A. diogoi. Several HRS genes exhibited tissue-specific expression patterns, which may reflect the sites where they perform functions. By exploring published RNA-seq data, we found that several HSP90 genes play major roles in both biotic and abiotic stress responses, especially salt and drought responses. Autoactivation assays showed that AhSGT1-B1 could not be used as bait for yeast two-hybrid (Y2H) library screening. AhRAR1 and AhSGT1 could strongly interact with each other and interact with AhHSP90-B8.
The present study represents the first systematic analysis of HRS complex genes in Arachis and provides valuable information for functional analyses of HRS complex genes. This study also offers potential stress-resistant genes for peanut improvement.
Affiliation(s)
- Cuiling Yuan
  - Shandong Peanut Research Institute, Qingdao, China
- Chunjuan Li
  - Shandong Peanut Research Institute, Qingdao, China
- Xiaobo Zhao
  - Shandong Peanut Research Institute, Qingdao, China
- Caixia Yan
  - Shandong Peanut Research Institute, Qingdao, China
- Juan Wang
  - Shandong Peanut Research Institute, Qingdao, China
- Yifei Mou
  - Shandong Peanut Research Institute, Qingdao, China
- Quanxi Sun
  - Shandong Peanut Research Institute, Qingdao, China
- Shihua Shan
  - Shandong Peanut Research Institute, Qingdao, China
25
Bernal S, Pelaez I, Alias L, Baena M, De Pablo-Moreno JA, Serrano LJ, Camero MD, Tizzano EF, Berrueco R, Liras A. High Mutational Heterogeneity, and New Mutations in the Human Coagulation Factor V Gene. Future Perspectives for Factor V Deficiency Using Recombinant and Advanced Therapies. Int J Mol Sci 2021; 22:9705. [PMID: 34575869 PMCID: PMC8465496 DOI: 10.3390/ijms22189705] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/04/2021] [Revised: 09/02/2021] [Accepted: 09/04/2021] [Indexed: 02/07/2023] [Open access]
Abstract
Factor V is an essential clotting factor that plays a key role in the blood coagulation cascade on account of its procoagulant and anticoagulant activity. Eighty percent of circulating factor V is produced in the liver and the remaining 20% originates in the α-granules of platelets. In humans, the factor V gene is about 80 kb in size; it is located on chromosome 1q24.2, and its cDNA is 6914 bp in length. Furthermore, nearly 190 mutations have been reported in the gene. Factor V deficiency is an autosomal recessive coagulation disorder associated with mutations in the factor V gene. This hereditary coagulation disorder is clinically characterized by a heterogeneous spectrum of hemorrhagic manifestations ranging from mucosal or soft-tissue bleeds to potentially fatal hemorrhages. Current treatment of this condition consists of the administration of fresh frozen plasma and platelet concentrates. This article describes the cases of two patients with severe factor V deficiency, and of their parents. A high level of mutational heterogeneity of the factor V gene was identified, including nonsense mutations, frameshift mutations, missense changes, synonymous sequence variants and intronic changes. These findings prompted the identification of a new mutation in the human factor V gene, designated as Jaén-1, which is capable of altering the procoagulant function of factor V. In addition, an update is provided on the prospects for the treatment of factor V deficiency on the basis of yet-to-be-developed recombinant products or advanced gene and cell therapies that could potentially correct this hereditary disorder.
Affiliation(s)
- Sara Bernal
  - Department of Genetics, Santa Creu i Sant Pau Hospital and IIB Sant Pau, 08041 Barcelona, Spain
  - CIBERER. U-705, 18014 Barcelona, Spain
- Irene Pelaez
  - Department of Pediatric and Oncohematology, University Hospital Virgen de las Nieves, 18014 Granada, Spain
- Laura Alias
  - Department of Genetics, Santa Creu i Sant Pau Hospital and IIB Sant Pau, 08041 Barcelona, Spain
  - CIBERER. U-705, 18014 Barcelona, Spain
- Manel Baena
  - Department of Genetics, Santa Creu i Sant Pau Hospital and IIB Sant Pau, 08041 Barcelona, Spain
- Juan A. De Pablo-Moreno
  - Department of Genetic, Physiology and Microbiology, School of Biology, Complutense University, 28040 Madrid, Spain
- Luis J. Serrano
  - Department of Genetic, Physiology and Microbiology, School of Biology, Complutense University, 28040 Madrid, Spain
- M. Dolores Camero
  - Association for the Investigation and Cure of Factor V Deficiency, 23002 Jaén, Spain
- Eduardo F. Tizzano
  - Department of Clinical and Molecular Genetics, University Hospital Vall d’Hebron and Medicine Genetics Group, Vall d’Hebron Research Institute, 08035 Barcelona, Spain
- Ruben Berrueco
  - Pediatric Hematology Department, Hospital Sant Joan de Déu, University of Barcelona and Research Institute Hospital Sant Joan de Déu, 08950 Barcelona, Spain
- Antonio Liras
  - Department of Genetic, Physiology and Microbiology, School of Biology, Complutense University, 28040 Madrid, Spain
26
Zhou W, Soghigian J, Xiang QYJ. A New Pipeline for Removing Paralogs in Target Enrichment Data. Syst Biol 2021; 71:410-425. [PMID: 34146111 PMCID: PMC8974407 DOI: 10.1093/sysbio/syab044] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/19/2021] [Revised: 06/04/2021] [Accepted: 06/12/2021] [Indexed: 12/30/2022] [Open access]
Abstract
Target enrichment (such as Hyb-Seq) is a well-established high throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for analysis of target enrichment data do not have a rigorous procedure to remove paralogs in target enrichment data. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs from enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria in both sequence similarity and heterozygous sites at each locus in the identification of paralogs. Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperms353 kit for two woody genera, Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of data sets showed that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone). In conjunction with HybPiper, paralogous genes identified by both pipelines can be removed, resulting in the construction of more robust orthologous gene data sets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperms353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps to filter paralogous genes and poorly aligned regions (e.g., as occur through assembly errors), such as our new PPD pipeline described in this study. [Angiosperms353; Castanea; divergence time; Hamamelis; Hyb-Seq; paralogs; phylogenomics.]
Affiliation(s)
- Wenbin Zhou
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27965, USA
- John Soghigian
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27965, USA
- Qiu-Yun Jenny Xiang
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27965, USA
27
Schmitt-Ulms G, Mehrabian M, Williams D, Ehsani S. The IDIP framework for assessing protein function and its application to the prion protein. Biol Rev Camb Philos Soc 2021; 96:1907-1932. [PMID: 33960099 DOI: 10.1111/brv.12731] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/13/2020] [Revised: 04/22/2021] [Accepted: 04/26/2021] [Indexed: 01/06/2023]
Abstract
The quest to determine the function of a protein can represent a profound challenge. Although this task is the mandate of countless research groups, a general framework for how it can be approached is conspicuously lacking. Moreover, even expectations for when the function of a protein can be considered to be 'known' are not well defined. In this review, we begin by introducing concepts pertinent to the challenge of protein function assignments. We then propose a framework for inferring a protein's function from four data categories: 'inheritance', 'distribution', 'interactions' and 'phenotypes' (IDIP). We document that the functions of proteins emerge at the intersection of inferences drawn from these data categories and emphasise the benefit of considering them in an evolutionary context. We then apply this approach to the cellular prion protein (PrPC), well known for its central role in prion diseases, whose function continues to be considered elusive by many investigators. We document that available data converge on the conclusion that the function of the prion protein is to control a critical post-translational modification of the neural cell adhesion molecule in the context of epithelial-to-mesenchymal transition and related plasticity programmes. Finally, we argue that this proposed function of PrPC has already passed the test of time and is concordant with the IDIP framework in a way that other functions considered for this protein fail to achieve. We anticipate that the IDIP framework and the concepts analysed herein will aid the investigation of other proteins whose primary functional assignments have thus far been intractable.
Affiliation(s)
- Gerold Schmitt-Ulms
  - Tanz Centre for Research in Neurodegenerative Diseases, University of Toronto, Toronto, ON, M5T 0S8, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Declan Williams
  - Tanz Centre for Research in Neurodegenerative Diseases, University of Toronto, Toronto, ON, M5T 0S8, Canada
- Sepehr Ehsani
  - Theoretical and Philosophical Biology, Department of Philosophy, University College London, Bloomsbury, London, WC1E 6BT, U.K.
  - Ronin Institute for Independent Scholarship, Montclair, NJ, 07043, U.S.A.
28
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] [Open access]
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether two genes directly descended from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging to determine, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Affiliation(s)
- Benjamin Linard
  - LIRMM, University of Montpellier, CNRS, Montpellier, France
  - SPYGEN, Le Bourget-du-Lac, France
- Ingo Ebersberger
  - Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, WA, USA
- Natasha Glover
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Tomohiro Mochizuki
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
- Mateus Patricio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Yannis Nevers
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BCS-CNS), Jordi Girona, Barcelona, Spain
  - Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, United Kingdom
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ikuo Uchiyama
  - Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
29
Zaquin T, Malik A, Drake JL, Putnam HM, Mass T. Evolution of Protein-Mediated Biomineralization in Scleractinian Corals. Front Genet 2021; 12:618517. [PMID: 33633782 PMCID: PMC7902050 DOI: 10.3389/fgene.2021.618517] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/17/2020] [Accepted: 01/08/2021] [Indexed: 12/19/2022] [Open access]
Abstract
While recent strides have been made in understanding the biological process by which stony corals calcify, much remains to be revealed, including the ubiquity across taxa of specific biomolecules involved. Several proteins associated with this process have been identified through proteomic profiling of the skeletal organic matrix (SOM) extracted from three scleractinian species. However, the evolutionary history of this putative “biomineralization toolkit,” including the appearance of these proteins throughout metazoan evolution, remains to be resolved. Here we used a phylogenetic approach to examine the evolution of the known scleractinians’ SOM proteins across the Metazoa. Our analysis reveals an evolutionary process dominated by the co-option of genes that originated before the cnidarian diversification. Each one of the three species appears to express a unique set of the more ancient genes, representing the independent co-option of SOM proteins, as well as a substantial proportion of proteins that evolved independently. In addition, in some instances, the different species expressed multiple orthologous proteins sharing the same evolutionary history. Furthermore, the non-random clustering of multiple SOM proteins within scleractinian-specific branches suggests the conservation of protein function between distinct species for what we posit is part of the scleractinian “core biomineralization toolkit.” This “core set” contains proteins that are likely fundamental to the scleractinian biomineralization mechanism. From this analysis, we infer that the scleractinians’ ability to calcify was achieved primarily through multiple lineage-specific protein expansions, which resulted in a new functional role that was not present in the parent gene.
Affiliation(s)
- Tal Zaquin
  - Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Assaf Malik
  - Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Jeana L Drake
  - Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Hollie M Putnam
  - Department of Biological Sciences, University of Rhode Island, Kingston, RI, United States
- Tali Mass
  - Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
30
Rouka E, Gourgoulianni N, Lüpold S, Hatzoglou C, Gourgoulianis K, Blanckenhorn WU, Zarogiannis SG. The Drosophila septate junctions beyond barrier function: Review of the literature, prediction of human orthologs of the SJ-related proteins and identification of protein domain families. Acta Physiol (Oxf) 2021; 231:e13527. [PMID: 32603029 DOI: 10.1111/apha.13527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/10/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/20/2022]
Abstract
The involvement of Septate Junctions (SJs) in critical cellular functions that extend beyond their role as diffusion barriers in the epithelia and the nervous system has made the fruit fly an ideal model for the study of human diseases associated with impaired Tight Junction (TJ) function. In this study, we summarized current knowledge of the Drosophila melanogaster SJ-related proteins, focusing on their unconventional functions. Additionally, we sought to identify human orthologs of the corresponding genes as well as protein domain families. The systematic literature search was performed in PubMed and Scopus databases using relevant key terms. Orthologs were predicted using the DIOPT tool and aligned protein regions were determined from the Pfam database. 3-D models of the smooth SJ proteins were built on the Phyre2 and DMPFold protein structure prediction servers. A total of 30 proteins were identified as relatives to the SJ cellular structure. Key roles of these proteins, mainly in the regulation of morphogenetic events and cellular signalling, were highlighted. The investigation of protein domain families revealed that the SJ-related proteins contain conserved domains that are required not only for cell-cell interactions and cell polarity but also for cellular signalling and immunity. DIOPT analysis of orthologs identified novel human genes as putative functional homologs of the fruit fly SJ genes. A gap in our knowledge was identified regarding the domains that occur in the proteins encoded by eight SJ-associated genes. Future investigation of these domains is needed to provide functional information.
Affiliation(s)
- Erasmia Rouka
  - Department of Physiology, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
- Natalia Gourgoulianni
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Stefan Lüpold
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Chrissi Hatzoglou
  - Department of Physiology, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
  - Department of Respiratory Medicine, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
- Konstantinos Gourgoulianis
  - Department of Respiratory Medicine, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
- Wolf U. Blanckenhorn
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Sotirios G. Zarogiannis
  - Department of Physiology, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
  - Department of Respiratory Medicine, Faculty of Medicine, School of Health Sciences, University of Thessaly, BIOPOLIS, Larissa, Greece
|
31
|
Altenhoff AM, Garrayo-Ventas J, Cosentino S, Emms D, Glover NM, Hernández-Plaza A, Nevers Y, Sundesha V, Szklarczyk D, Fernández JM, Codó L, the Quest for Orthologs Consortium, Gelpi JL, Huerta-Cepas J, Iwasaki W, Kelly S, Lecompte O, Muffato M, Martin MJ, Capella-Gutierrez S, Thomas PD, Sonnhammer E, Dessimoz C. The Quest for Orthologs benchmark service and consensus calls in 2020. Nucleic Acids Res 2020; 48:W538-W545. [PMID: 32374845 PMCID: PMC7319555 DOI: 10.1093/nar/gkaa308] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/20/2020] [Revised: 04/16/2020] [Accepted: 04/20/2020] [Indexed: 12/18/2022] Open
Abstract
The identification of orthologs—genes in different species which descended from the same gene in their last common ancestor—is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.
Affiliation(s)
- Adrian M Altenhoff
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - ETH Zurich, Department of Computer Science, Zurich, Switzerland
- Salvatore Cosentino
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- David Emms
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
- Natasha M Glover
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Ana Hernández-Plaza
  - Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Yannis Nevers
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Vicky Sundesha
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Damian Szklarczyk
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Laia Codó
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Ll Gelpi
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Department of Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain
- Jaime Huerta-Cepas
  - Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Wataru Iwasaki
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Steven Kelly
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Matthieu Muffato
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, USA
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Genetics, Evolution & Environment, University College London, London, UK
  - Department of Computer Science, University College London, London, UK
|
32
|
Zhou Z, Charlesworth J, Achtman M. Accurate reconstruction of bacterial pan- and core genomes with PEPPAN. Genome Res 2020; 30:1667-1679. [PMID: 33055096 PMCID: PMC7605250 DOI: 10.1101/gr.260828.120] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 01/20/2020] [Accepted: 09/01/2020] [Indexed: 12/22/2022]
Abstract
Bacterial genomes can contain traces of a complex evolutionary history, including extensive homologous recombination, gene loss, gene duplications, and horizontal gene transfer. To reconstruct the phylogenetic and population history of a set of multiple bacteria, it is necessary to examine their pangenome, the composite of all the genes in the set. Here we introduce PEPPAN, a novel pipeline that can reliably construct pangenomes from thousands of genetically diverse bacterial genomes that represent the diversity of an entire genus. PEPPAN outperforms existing pangenome methods by providing consistent gene and pseudogene annotations extended by similarity-based gene predictions, and identifying and excluding paralogs by combining tree- and synteny-based approaches. The PEPPAN package additionally includes PEPPAN_parser, which implements additional downstream analyses, including the calculation of trees based on accessory gene content or allelic differences between core genes. To test the accuracy of PEPPAN, we implemented SimPan, a novel pipeline for simulating the evolution of bacterial pangenomes. We compared the accuracy and speed of PEPPAN with four state-of-the-art pangenome pipelines using both empirical and simulated data sets. PEPPAN was more accurate and more specific than any of the other pipelines and was almost as fast as any of them. As a case study, we used PEPPAN to construct a pangenome of approximately 40,000 genes from 3052 representative genomes spanning at least 80 species of Streptococcus. The resulting gene and allelic trees provide an unprecedented overview of the genomic diversity of the entire Streptococcus genus.
Affiliation(s)
- Zhemin Zhou
  - Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom
- Jane Charlesworth
  - Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom
- Mark Achtman
  - Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom
|
33
|
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice. Genes (Basel) 2020; 11:E1046. [PMID: 32899740 PMCID: PMC7565063 DOI: 10.3390/genes11091046] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 07/30/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/11/2022] Open
Abstract
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions, with major implications such as speciation events. Various processes are known to allow a gene to be duplicated, and different models explain how duplicated genes can be maintained in genomes. Because of their importance, identifying duplicated genes is essential when studying genome evolution, but it can still be a challenge because of the various fates duplicated genes can encounter. In this review, we first describe the evolutionary processes that lead to the formation of duplicated genes and then describe the various bioinformatic approaches that can be used to identify them in genome sequences. These bioinformatic approaches differ according to the underlying duplication mechanism. Hence, understanding the specificity of the duplicated genes of interest is a great asset for tool selection and should be taken into account when exploring a biological question.
Affiliation(s)
- Tanguy Lallemand
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Martin Leduc
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Claudine Landès
  - IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
- Carène Rizzon
  - Laboratoire de Mathématiques et Modélisation d’Evry (LaMME), Université d’Evry Val d’Essonne, Université Paris-Saclay, UMR CNRS 8071, ENSIIE, USC INRAE, 23 bvd de France, CEDEX, 91037 Evry Paris, France
- Emmanuelle Lerat
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France
|
34
|
Molloy EK, Warnow T. FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models. Bioinformatics 2020; 36:i57-i65. [PMID: 32657396 PMCID: PMC7355287 DOI: 10.1093/bioinformatics/btaa444] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. RESULTS: We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. AVAILABILITY AND IMPLEMENTATION: FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erin K Molloy
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
35
|
Freeman AR, Ophir AG, Sheehan MJ. The giant pouched rat (Cricetomys ansorgei) olfactory receptor repertoire. PLoS One 2020; 15:e0221981. [PMID: 32240170 PMCID: PMC7117715 DOI: 10.1371/journal.pone.0221981] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/08/2019] [Accepted: 03/06/2020] [Indexed: 12/12/2022] Open
Abstract
For rodents, olfaction is essential for locating food, recognizing mates and competitors, avoiding predators, and navigating their environment. It is thought that rodents may have expanded olfactory receptor repertoires in order to specialize in olfactory behavior. Despite rodents being the largest clade of mammals and depending on olfaction, relatively little work has documented olfactory repertoires outside of conventional laboratory species. Here we report the olfactory receptor repertoire of the African giant pouched rat (Cricetomys ansorgei), a Muroid rodent distantly related to mice and rats. The African giant pouched rat is notable for its large cortex and olfactory bulbs relative to its body size compared to other sympatric rodents, which suggests anatomical elaboration of olfactory capabilities. We hypothesized that in addition to anatomical elaboration for olfaction, these pouched rats might also have an expanded olfactory receptor repertoire to enable their olfactory behavior. We examined the composition of the olfactory receptor repertoire to better understand how their sensory capabilities have evolved. We identified 1145 intact olfactory genes, and 260 additional pseudogenes within 301 subfamilies from the African giant pouched rat genome. This repertoire is similar to those of mice and rats in terms of size, pseudogene percentage and number of subfamilies. Analyses of olfactory receptor gene trees revealed that the pouched rat has 6 expansions in different subfamilies compared to mice, rats and squirrels. We identified 81 orthologous genes conserved among 4 rodent species and an additional 147 conserved genes within the Muroid rodents. The orthologous genes shared within Muroidea suggest that there may be a conserved Muroid-specific olfactory receptor repertoire. We also note that the description of this repertoire can serve as a complement to other studies of rodent olfaction, as the pouched rat is an outgroup within Muroidea. Thus, our data suggest that African giant pouched rats are capable of both natural and trained olfactory behaviors with a typical Muroid olfactory receptor repertoire.
Affiliation(s)
- Angela R. Freeman
  - Department of Psychology, Cornell University, Ithaca, NY, United States of America
- Alexander G. Ophir
  - Department of Psychology, Cornell University, Ithaca, NY, United States of America
- Michael J. Sheehan
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY, United States of America
|
36
|
Abstract
The Orthologous Matrix (OMA) is a method and database that allows users to identify orthologs among many genomes. OMA provides three different types of orthologs: pairwise orthologs, OMA Groups and Hierarchical Orthologous Groups (HOGs). This Primer is organized in two parts. In the first part, we provide all the necessary background information to understand the concepts of orthology, how we infer them and the different subtypes of orthology in OMA, as well as what types of analyses they should be used for. In the second part, we describe protocols for using the OMA browser to find a specific gene and its various types of orthologs. By the end of the Primer, readers should be able to (i) understand homology and the different types of orthologs reported in OMA, (ii) understand the best type of orthologs to use for a particular analysis; (iii) find particular genes of interest in the OMA browser; and (iv) identify orthologs for a given gene. The data can be freely accessed from the OMA browser at https://omabrowser.org.
Affiliation(s)
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
  - Department of Computer Science, University College London, London, WC1E 6BT, UK
- Natasha M. Glover
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
|