1
Cox RM, Papoulas O, Shril S, Lee C, Gardner T, Battenhouse AM, Lee M, Drew K, McWhite CD, Yang D, Leggere JC, Durand D, Hildebrandt F, Wallingford JB, Marcotte EM. Ancient eukaryotic protein interactions illuminate modern genetic traits and disorders. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.26.595818. [PMID: 38853926 PMCID: PMC11160598 DOI: 10.1101/2024.05.26.595818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
All eukaryotes share a common ancestor from roughly 1.5 - 1.8 billion years ago, a single-celled, swimming microbe known as LECA, the Last Eukaryotic Common Ancestor. Nearly half of the genes in modern eukaryotes were present in LECA, and many current genetic diseases and traits stem from these ancient molecular systems. To better understand these systems, we compared genes across modern organisms and identified a core set of 10,092 shared protein-coding gene families likely present in LECA, a quarter of which are uncharacterized. We then integrated >26,000 mass spectrometry proteomics analyses from 31 species to infer how these proteins interact in higher-order complexes. The resulting interactome describes the biochemical organization of LECA, revealing both known and new assemblies. We analyzed these ancient protein interactions to find new human gene-disease relationships for bone density and congenital birth defects, demonstrating the value of ancestral protein interactions for guiding functional genetics today.
Affiliation(s)
- Rachael M Cox
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Ophelia Papoulas
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Shirlee Shril
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- Chanjae Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Tynan Gardner
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Anna M Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Muyoung Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Claire D McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- David Yang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Janelle C Leggere
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, 4400 5th Avenue Pittsburgh, PA 15213, USA
- Friedhelm Hildebrandt
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- John B Wallingford
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
2
Thiébaut A, Altenhoff AM, Campli G, Glover N, Dessimoz C, Waterhouse RM. DrosOMA: the Drosophila Orthologous Matrix browser. F1000Res 2024; 12:936. [PMID: 38434623 PMCID: PMC10905159 DOI: 10.12688/f1000research.135250.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2024] [Indexed: 03/05/2024] Open
Abstract
Background: Comparative genomic analyses to delineate gene evolutionary histories inform the understanding of organismal biology by characterising gene and gene family origins, trajectories, and dynamics, as well as enabling the tracing of speciation, duplication, and loss events, and facilitating the transfer of gene functional information across species. Genomic data are available for an increasing number of species from the genus Drosophila; however, a dedicated resource exploiting these data to provide the research community with browsable results from genus-wide orthology delineation has been lacking.
Methods: Using the OMA Orthologous Matrix orthology inference approach and browser deployment framework, we catalogued orthologues across a selected set of Drosophila species with high-quality annotated genomes. We developed and deployed a dedicated instance of the OMA browser to facilitate intuitive exploration, visualisation, and downloading of the genus-wide orthology delineation results.
Results: DrosOMA - the Drosophila Orthologous Matrix browser, accessible from https://drosoma.dcsr.unil.ch/ - presents the results of orthology delineation for 36 drosophilids from across the genus and four outgroup dipterans. It enables querying and browsing of the orthology data through a feature-rich web interface, with gene-view, orthologous group-view, and genome-view pages, including comprehensive gene name and identifier cross-references together with available functional annotations and protein domain architectures, as well as tools to visualise local and global synteny conservation.
Conclusions: The DrosOMA browser demonstrates the deployability of the OMA browser framework for building user-friendly orthology databases with dense sampling of a selected taxonomic group. It provides the Drosophila research community with a tailored resource of browsable results from genus-wide orthology delineation.
Affiliation(s)
- Antonin Thiébaut
  - Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Adrian M. Altenhoff
  - Department of Computer Science, SIB Swiss Institute of Bioinformatics, ETH Zurich, Zurich, Switzerland
- Giulia Campli
  - Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Natasha Glover
  - Department of Computational Biology, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Robert M. Waterhouse
  - Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
3
Rivas-Santisteban J, Yubero P, Robaina-Estévez S, González JM, Tamames J, Pedrós-Alió C. Quantifying microbial guilds. ISME COMMUNICATIONS 2024; 4:ycae042. [PMID: 38707845 PMCID: PMC11069341 DOI: 10.1093/ismeco/ycae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 03/22/2024] [Accepted: 03/22/2024] [Indexed: 05/07/2024]
Abstract
The ecological role of microorganisms is of utmost importance due to their multiple interactions with the environment. However, assessing the contribution of individual taxonomic groups has proven difficult despite the availability of high throughput data, hindering our understanding of such complex systems. Here, we propose a quantitative definition of guild that is readily applicable to metagenomic data. Our framework focuses on the functional character of protein sequences, as well as their diversifying nature. First, we discriminate functional sequences from the whole sequence space corresponding to a gene annotation to then quantify their contribution to the guild composition across environments. In addition, we identify and distinguish functional implementations, which are sequence spaces that have different ways of carrying out the function. In contrast, we found that orthology delineation did not consistently align with ecologically (or functionally) distinct implementations of the function. We demonstrate the value of our approach with two case studies: the ammonia oxidation and polyamine uptake guilds from the Malaspina circumnavigation cruise, revealing novel ecological dynamics of the latter in marine ecosystems. Thus, the quantification of guilds helps us to assess the functional role of different taxonomic groups with profound implications on the study of microbial communities.
Affiliation(s)
- Juan Rivas-Santisteban
  - Microbiome Analysis Laboratory, Centro Nacional de Biotecnología (CNB), CSIC, Calle Darwin no. 3, Madrid, 28049, Spain
- Pablo Yubero
  - Logic of Genomic Systems Laboratory, Centro Nacional de Biotecnología (CNB), CSIC, Spain
- Javier Tamames
  - Microbiome Analysis Laboratory, Centro Nacional de Biotecnología (CNB), CSIC, Calle Darwin no. 3, Madrid, 28049, Spain
- Carlos Pedrós-Alió
  - Microbiome Analysis Laboratory, Centro Nacional de Biotecnología (CNB), CSIC, Calle Darwin no. 3, Madrid, 28049, Spain
4
Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. Genome Biol 2023; 24:250. [PMID: 37904249 PMCID: PMC10614367 DOI: 10.1186/s13059-023-03089-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND: A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multicopy gene families, which are recognizable by synteny conservation, and retrieving complete sets of species-level orthologs. We studied the implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes.
RESULTS: Clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Species-wise estimates of pangenome and core genome sizes change by the same factor when using different clustering criteria, allowing robust cross-species comparisons regardless of the clustering criterion. However, cross-species comparisons of genome plasticity and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other accessory functions. In some pangenome features, the variability attributed to methodological inconsistencies can even exceed the effect sizes of ecological and phylogenetic variables.
CONCLUSIONS: Choosing an appropriate criterion for gene clustering is critical to conduct unbiased pangenome analyses. We provide practical guidelines to choose the right method depending on the research goals and the quality of genome assemblies, and a benchmarking dataset to assess the robustness and reproducibility of future comparative studies.
Affiliation(s)
- Saioa Manzano-Morales
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
  - Barcelona Supercomputing Centre (BSC-CNS) - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Yang Liu
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
  - Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Sara González-Bodí
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jaime Iranzo
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
  - Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
5
Ramos Aguila LC, Sánchez Moreano JP, Akutse KS, Bamisile BS, Liu J, Haider FU, Ashraf HJ, Wang L. Comprehensive genome-wide identification and expression profiling of ADF gene family in Citrus sinensis, induced by endophytic colonization of Beauveria bassiana. Int J Biol Macromol 2023; 225:886-898. [PMID: 36403770 DOI: 10.1016/j.ijbiomac.2022.11.153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 10/30/2022] [Accepted: 11/01/2022] [Indexed: 11/19/2022]
Abstract
Endophytic entomopathogenic species are known to systematically colonize host plants and form symbiotic associations that benefit the plants they live with. The actin-depolymerizing factors (ADFs) are a gene family that regulates growth, development, and defense-related functions in plants. Systematic studies of the ADF family at the genome-wide level and of its expression in response to endophytic colonization are essential to understand its functions but are currently lacking in this field. Fourteen ADF genes were identified and characterized in the Citrus sinensis genome. The ADF genes of C. sinensis were classified into five groups according to the phylogenetic analysis of plant ADFs. Additionally, the cis-acting analysis revealed that these genes play essential roles in plant growth/development, phytohormone, and biotic and abiotic responses; and the expression analysis showed that the symbiotic interactions generate a significant expression regulation level of ADF genes in leaves, stems and roots, compared to controls, thus enhancing seedlings' growth. Additionally, the 3D structures of the ADF domain were highly conserved during evolution. These results will be helpful for further functional validation of ADF candidate genes and provide important insights into the vegetative growth, development and stress tolerance of C. sinensis in response to endophytic colonization by B. bassiana.
Affiliation(s)
- Luis Carlos Ramos Aguila
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Key Laboratory of Biopesticide and Biochemistry, MOE, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Jessica Paola Sánchez Moreano
  - Carrera de Agroecología, Facultad de Ciencias Socio-Ambientales, Universidad Regional Amazónica Ikiam, Tena 150102, Ecuador
- Komivi Senyo Akutse
  - International Centre of Insect Physiology and Ecology (icipe), Nairobi, P.O. Box 30772-00100, Kenya
- Bamisope Steve Bamisile
  - Department of Entomology, College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Juxiu Liu
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Fasih Ullah Haider
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Hafiza Javaira Ashraf
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Key Laboratory of Biopesticide and Biochemistry, MOE, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Liande Wang
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Key Laboratory of Biopesticide and Biochemistry, MOE, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
6
Barlow LD, Maciejowski W, More K, Terry K, Vargová R, Záhonová K, Dacks JB. Comparative Genomics for Evolutionary Cell Biology Using AMOEBAE: Understanding the Golgi and Beyond. Methods Mol Biol 2022; 2557:431-452. [PMID: 36512230 DOI: 10.1007/978-1-0716-2639-9_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Taking an evolutionary approach to cell biology can yield important new information about how the cell works and how it evolved to do so. This is true of the Golgi apparatus, as it is of all systems within the cell. Comparative genomics is one of the crucial first steps to this line of research, but comes with technical challenges that must be overcome for rigor and robustness. We here introduce AMOEBAE, a workflow for mid-range scale comparative genomic analyses. It allows for customization of parameters, queries, and taxonomic sampling of genomic and transcriptomics data. This protocol article covers the rationale for an evolutionary approach to cell biological study (i.e., when would AMOEBAE be useful), how to use AMOEBAE, and discussion of limitations. It also provides an example dataset, which demonstrates that the Golgi protein AP4 Epsilon is present as the sole retained subunit of the AP4 complex in basidiomycete fungi. AMOEBAE can facilitate comparative genomic studies by balancing reproducibility and speed with user-input and interpretation. It is hoped that AMOEBAE or similar tools will encourage cell biologists to incorporate an evolutionary context into their research.
Affiliation(s)
- Lael D Barlow
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
  - Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, UK
- William Maciejowski
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Kiran More
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Kara Terry
  - Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, AB, Canada
- Romana Vargová
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Kristína Záhonová
  - Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec, Czechia
  - Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czechia
- Joel B Dacks
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
  - Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, AB, Canada
  - Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec, Czechia
  - Centre for Life's Origin and Evolution, Department of Genetics, Evolution and Environment, University College of London, London, UK
7
Zhang X, Smith DR. An overview of online resources for intra-species detection of gene duplications. Front Genet 2022; 13:1012788. [PMID: 36313461 PMCID: PMC9606816 DOI: 10.3389/fgene.2022.1012788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Gene duplication plays an important role in evolution, acting as a new source of genetic material in genome evolution. However, detecting duplicate genes from genomic data can be challenging. Various bioinformatics resources have been developed to identify duplicate genes from single and/or multiple species. Here, we summarize the metrics used to measure sequence identity among gene duplicates within species, compare several computational approaches that have been used to predict gene duplicates, and review recent advancements of a Basic Local Alignment Search Tool (BLAST)-based web tool and database, allowing future researchers to easily identify intra-species gene duplications. This article is a quick reference guide for research tools used for detecting gene duplicates.
Affiliation(s)
- Xi Zhang
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, NS, Canada
- David Roy Smith
  - Department of Biology, Western University, London, ON, Canada
8
Beal HE, Horenstein NA. Comparative genomic analysis of azasugar biosynthesis. AMB Express 2021; 11:120. [PMID: 34424396 PMCID: PMC8382821 DOI: 10.1186/s13568-021-01279-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Accepted: 08/13/2021] [Indexed: 11/13/2022] Open
Abstract
Azasugars are monosaccharide analogs in which the ring oxygen is replaced with a nitrogen atom. These well-known glycosidase inhibitors are of interest as therapeutics, yet several aspects of azasugars remain unknown including their distribution, structural diversity, and chemical ecology. The hallmark signature of bacterial azasugar biosynthesis is a three gene cluster (3GC) coding for aminotransferase, phosphatase, and dehydrogenase enzymes. Using the bioinformatics platform Enzyme Similarity Tool (EST), we identified hundreds of putative three gene clusters coding for azasugar production in microbial species. In the course of this work, we also report a consensus sequence for the aminotransferase involved in azasugar biosynthesis as being: SGNXFRXXXFPNXXXXXXXLXVPXPYCXRC. Most clusters are found in Bacillus and Streptomyces species which typically inhabit soil and the rhizosphere, but some clusters are found with diverse species representation such as Photorhabdus and Xenorhabdus which are symbiotic with entomopathogenic nematodes; the human skin commensal Cutibacterium acnes, and the marine Bacillus rugosus SPB7, a symbiont to the sea sponge Spongia officinalis. This pan-taxonomic survey of the azasugar 3GC signature may lead to the identification of new azasugar producers, facilitate studies of their natural functions, and lead to new potential therapeutics.
9
Harris CD, Torrance EL, Raymann K, Bobay LM. CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets. Mol Biol Evol 2021; 38:727-734. [PMID: 32886787 PMCID: PMC7826169 DOI: 10.1093/molbev/msaa224] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/03/2023] Open
Abstract
The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on the comparison of all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either Usearch or Blast. Certain options require the programs muscle or mafft.
Affiliation(s)
- Connor D Harris
  - Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Ellis L Torrance
  - Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Kasie Raymann
  - Department of Biology, University of North Carolina Greensboro, Greensboro, NC
- Louis-Marie Bobay
  - Department of Biology, University of North Carolina Greensboro, Greensboro, NC
10
Berkemer SJ, McGlynn SE. A New Analysis of Archaea-Bacteria Domain Separation: Variable Phylogenetic Distance and the Tempo of Early Evolution. Mol Biol Evol 2021; 37:2332-2340. [PMID: 32316034 PMCID: PMC7403611 DOI: 10.1093/molbev/msaa089] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/29/2022] Open
Abstract
Comparative genomics and molecular phylogenetics are foundational for understanding biological evolution. Although many studies have been made with the aim of understanding the genomic contents of early life, uncertainty remains. A study by Weiss et al. (Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. 2016. The physiology and habitat of the last universal common ancestor. Nat Microbiol. 1(9):16116.) identified a number of protein families in the last universal common ancestor of archaea and bacteria (LUCA) which were not found in previous works. Here, we report new research that suggests the clustering approaches used in this previous study undersampled protein families, resulting in incomplete phylogenetic trees which do not reflect protein family evolution. Phylogenetic analysis of protein families which include more sequence homologs rejects a simple LUCA hypothesis based on phylogenetic separation of the bacterial and archaeal domains for a majority of the previously identified LUCA proteins (∼82%). To supplement limitations of phylogenetic inference derived from incompletely populated orthologous groups and to test the hypothesis of a period of rapid evolution preceding the separation of the domains, we compared phylogenetic distances both within and between domains, for thousands of orthologous groups. We find a substantial diversity of interdomain versus intradomain branch lengths, even among protein families which exhibit a single domain separating branch and are thought to be associated with the LUCA. Additionally, phylogenetic trees with long interdomain branches relative to intradomain branches are enriched in information categories of protein families in comparison to those associated with metabolic functions. These results provide a new view of protein family evolution and temper claims about the phenotype and habitat of the LUCA.
Affiliation(s)
- Sarah J Berkemer
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany
  - Competence Center for Scalable Data Services and Solutions, Dresden/Leipzig, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, WA
  - RIKEN Center for Sustainable Resource Science (CSRS), Saitama, Japan
11
Yuan C, Li C, Lu X, Zhao X, Yan C, Wang J, Sun Q, Shan S. Comprehensive genomic characterization of NAC transcription factor family and their response to salt and drought stress in peanut. BMC PLANT BIOLOGY 2020; 20:454. [PMID: 33008287 PMCID: PMC7532626 DOI: 10.1186/s12870-020-02678-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/26/2020] [Accepted: 09/24/2020] [Indexed: 05/10/2023]
Abstract
BACKGROUND: Peanut is one of the most important oil crop species worldwide. NAC transcription factor (TF) genes play important roles in the salt and drought stress responses of plants by activating or repressing target gene expression. However, little is known about NAC genes in peanut.
RESULTS: We performed a genome-wide characterization of NAC genes from the diploid wild peanut species Arachis duranensis and Arachis ipaensis, which included analyses of chromosomal locations, gene structures, conserved motifs, expression patterns, and cis-acting elements within their promoter regions. In total, 81 and 79 NAC genes were identified from A. duranensis and A. ipaensis genomes. Phylogenetic analysis of peanut NACs along with their Arabidopsis and rice counterparts categorized these proteins into 18 distinct subgroups. Fifty-one orthologous gene pairs were identified, and 46 orthologues were found to be highly syntenic on the chromosomes of both A. duranensis and A. ipaensis. Comparative RNA sequencing (RNA-seq)-based analysis revealed that the expression of 43 NAC genes was up- or downregulated under salt stress and under drought stress. Among these genes, the expression of 17 genes in cultivated peanut (Arachis hypogaea) was up- or downregulated under both stresses. Moreover, quantitative reverse transcription PCR (RT-qPCR)-based analysis revealed that the expression of most of the randomly selected NAC genes tended to be consistent with the comparative RNA-seq results.
CONCLUSION: Our results facilitated the functional characterization of peanut NAC genes, and the genes involved in salt and drought stress responses identified in this study could be potential genes for peanut improvement.
Affiliation(s)
- Cuiling Yuan
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Chunjuan Li
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Xiaodong Lu
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Xiaobo Zhao
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Caixia Yan
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Juan Wang
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Quanxi Sun
  - Shandong Peanut Research Institute, Qingdao, 266100, China
- Shihua Shan
  - Shandong Peanut Research Institute, Qingdao, 266100, China
12
Delaux PM, Hetherington AJ, Coudert Y, Delwiche C, Dunand C, Gould S, Kenrick P, Li FW, Philippe H, Rensing SA, Rich M, Strullu-Derrien C, de Vries J. Reconstructing trait evolution in plant evo-devo studies. Curr Biol 2020; 29:R1110-R1118. [PMID: 31689391 DOI: 10.1016/j.cub.2019.09.044] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/16/2022]
Abstract
Our planet is teeming with an astounding diversity of plants. In a mere single group of closely related species, tremendous diversity can be observed in their form and function - the colour of petals in flowering plants, the shape of the fronds in ferns, and the branching pattern of the gametophyte in mosses. Diversity can also be found in subtler traits, such as the resistance to pathogens or the ability to recruit symbiotic microbes from the environment. Plant traits can also be highly conserved - at the cellular and metabolic levels, entire biosynthetic pathways are present in all plant groups, and morphological characteristics such as vascular tissues have been conserved for hundreds of millions of years. The research community that seeks to understand these traits - both the diverse and the conserved - by taking an evolutionary point-of-view on plant biology is growing. Here, we summarize a subset of the different aspects of plant evolutionary biology, provide a guide for structuring comparative biology approaches and discuss the pitfalls that (plant) researchers should avoid when embarking on such studies.
Affiliation(s)
- Pierre-Marc Delaux
- Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, CNRS, UPS, Castanet-Tolosan, France.
| | | | - Yoan Coudert
- Laboratoire Reproduction et Développement des Plantes, Ecole Normale Supérieure de Lyon, CNRS, INRA, Université Claude Bernard Lyon 1, INRIA, 46 Allée d'Italie, Lyon, 69007, France
| | | | - Christophe Dunand
- Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, CNRS, UPS, Castanet-Tolosan, France
| | - Sven Gould
- Institute for Molecular Evolution, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Paul Kenrick
- Department of Earth Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, UK
| | - Fay-Wei Li
- Boyce Thompson Institute, Ithaca, NY, USA; Plant Biology Section, Cornell University, Ithaca, NY, USA
| | - Hervé Philippe
- Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France; Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
| | - Stefan A Rensing
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany; BIOSS Centre for Biological Signalling Studies, University Freiburg, Germany; SYNMIKRO Research Center, University of Marburg, 35043 Marburg, Germany
| | - Mélanie Rich
- Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, CNRS, UPS, Castanet-Tolosan, France
| | - Christine Strullu-Derrien
- Department of Earth Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, UK; Institut de Systématique, Évolution, Biodiversité, UMR 7205, Muséum National d'Histoire Naturelle, Paris, France
| | - Jan de Vries
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada; Institute of Microbiology, Technische Universitaet Braunschweig, 38106 Braunschweig, Germany; Institute for Microbiology and Genetics, Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
|
13
|
Lafond M, Hellmuth M. Reconstruction of time-consistent species trees. Algorithms Mol Biol 2020; 15:16. [PMID: 32843891 PMCID: PMC7439642 DOI: 10.1186/s13015-020-00175-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/29/2020] [Accepted: 07/25/2020] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND The history of gene families, which are equivalent to event-labeled gene trees, can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are "biologically feasible", which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. RESULTS In this contribution, we consider event-labeled gene trees that contain speciations, duplications as well as horizontal gene transfer (HGT) and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene tree can be solved in polynomial time. We provide a cubic-time algorithm to decide whether a "time-consistent" species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time complexity.
Affiliation(s)
- Manuel Lafond
- Department of Computer Science, Université de Sherbrooke, 2500 Boul. de l’Université, Sherbrooke, J1K 2R1 Canada
| | - Marc Hellmuth
- School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT UK
|
14
|
Galperin MY, Kristensen DM, Makarova KS, Wolf YI, Koonin EV. Microbial genome analysis: the COG approach. Brief Bioinform 2020; 20:1063-1070. [PMID: 28968633 DOI: 10.1093/bib/bbx117] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Received: 05/30/2017] [Revised: 08/01/2017] [Indexed: 11/15/2022] Open
Abstract
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
|
15
|
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022] Open
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
|
16
|
Cosentino S, Iwasaki W. SonicParanoid: fast, accurate and easy orthology inference. Bioinformatics 2019; 35:149-151. [PMID: 30032301 PMCID: PMC6298048 DOI: 10.1093/bioinformatics/bty631] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 03/07/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022] Open
Abstract
Motivation Orthology inference constitutes a common base of many genome-based studies, as a pre-requisite for annotating new genomes, finding target genes for biotechnological applications and revealing the evolutionary history of life. Although its importance keeps rising with the ever-growing number of sequenced genomes, existing tools are computationally demanding and difficult to employ. Results Here, we present SonicParanoid, which is faster than, but comparably accurate to, the well-established tools with a balanced precision-recall trade-off. Furthermore, SonicParanoid substantially relieves the difficulties of orthology inference for those who need to construct and maintain their own genomic datasets. Availability and implementation SonicParanoid is available with a GNU GPLv3 license on the Python Package Index and BitBucket. Documentation is available at http://iwasakilab.bs.s.u-tokyo.ac.jp/sonicparanoid. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan; Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, Japan
|
17
|
Bioinformatics for Marine Products: An Overview of Resources, Bottlenecks, and Perspectives. Mar Drugs 2019; 17:md17100576. [PMID: 31614509 PMCID: PMC6835618 DOI: 10.3390/md17100576] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/20/2019] [Revised: 10/01/2019] [Accepted: 10/02/2019] [Indexed: 12/13/2022] Open
Abstract
The sea represents a major source of biodiversity. It exhibits many different ecosystems in a huge variety of environmental conditions where marine organisms have evolved with extensive diversification of structures and functions, making the marine environment a treasure trove of molecules with potential for biotechnological applications and innovation in many different areas. Rapid progress of the omics sciences has revealed novel opportunities to advance the knowledge of biological systems, paving the way for an unprecedented revolution in the field and expanding marine research from model organisms to an increasing number of marine species. Multi-level approaches based on molecular investigations at genomic, metagenomic, transcriptomic, metatranscriptomic, proteomic, and metabolomic levels are essential to discover marine resources and further explore key molecular processes involved in their production and action. As a consequence, omics approaches, accompanied by the associated bioinformatic resources and computational tools for molecular analyses and modeling, are boosting the rapid advancement of biotechnologies. In this review, we provide an overview of the most relevant bioinformatic resources and major approaches, highlighting perspectives and bottlenecks for an appropriate exploitation of these opportunities for biotechnology applications from marine resources.
|
18
|
Lafond M, Meghdari Miardan M, Sankoff D. Accurate prediction of orthologs in the presence of divergence after duplication. Bioinformatics 2019; 34:i366-i375. [PMID: 29950018 PMCID: PMC6022570 DOI: 10.1093/bioinformatics/bty242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/23/2022] Open
Abstract
Motivation When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences on the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have. Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. Results We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas to practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. Availability and implementation HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada; Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
| | | | - David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
|
19
|
Altenhoff AM, Levy J, Zarowiecki M, Tomiczek B, Warwick Vesztrocy A, Dalquen DA, Müller S, Telford MJ, Glover NM, Dylus D, Dessimoz C. OMA standalone: orthology inference among public and custom genomes and transcriptomes. Genome Res 2019; 29:1152-1163. [PMID: 31235654 PMCID: PMC6633268 DOI: 10.1101/gr.243212.118] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 08/22/2018] [Accepted: 05/24/2019] [Indexed: 11/24/2022]
Abstract
Genomes and transcriptomes are now typically sequenced by individual laboratories but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs—corresponding genes across multiple species—but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.
Affiliation(s)
- Adrian M Altenhoff
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
| | - Jeremy Levy
- Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London, London WC1E 6BT, United Kingdom; Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom
| | - Magdalena Zarowiecki
- Genomics England, Queen Mary University of London, London EC1M 6BQ, United Kingdom
| | - Bartłomiej Tomiczek
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom; Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, 80-307 Gdansk, Poland
| | - Alex Warwick Vesztrocy
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom
| | - Daniel A Dalquen
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
| | - Steven Müller
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom
| | - Maximilian J Telford
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom
| | - Natasha M Glover
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - David Dylus
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Christophe Dessimoz
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
|
20
|
Zhang R, Pan Y, Ahmed L, Block E, Zhang Y, Batista VS, Zhuang H. A Multispecific Investigation of the Metal Effect in Mammalian Odorant Receptors for Sulfur-Containing Compounds. Chem Senses 2019; 43:357-366. [PMID: 29659735 DOI: 10.1093/chemse/bjy022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Metal-coordinating compounds are generally known to have strong smells, a phenomenon that can be attributed to the fact that odorant receptors for intense-smelling compounds, such as those containing sulfur, may be metalloproteins. We previously identified a mouse odorant receptor (OR), Olfr1509, that requires copper ions for sensitive detection of a series of metal-coordinating odorants, including (methylthio)methanethiol (MTMT), a strong-smelling component of male mouse urine that attracts female mice. By combining mutagenesis and quantum mechanics/molecular mechanics (QM/MM) modeling, we identified candidate binding sites in Olfr1509 that may bind to the copper-MTMT complex. However, whether there are other receptors utilizing metal ions for ligand-binding and other sites important for receptor activation is still unknown. In this study, we describe a second mouse OR for MTMT with a copper effect, namely Olfr1019. In an attempt to investigate the functional changes of metal-coordinating ORs in multiple species and to decipher additional sites involved in the metal effect, we cloned various mammalian orthologs of the 2 mouse MTMT receptors, and a third mouse MTMT receptor, Olfr15, that does not have a copper effect. We found that the function of all 3 MTMT receptors varies greatly among species and that the response to MTMT always co-occurred with the copper effect. Furthermore, using ancestral reconstruction and QM/MM modeling combined with receptor functional assay, we found that the amino acid residue R260 in Olfr1509 and the respective R261 site in Olfr1019 may be important for receptor activation.
Affiliation(s)
- Ruina Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
| | - Yi Pan
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
| | - Lucky Ahmed
- Department of Chemistry, Yale University, New Haven, CT, USA
| | - Eric Block
- Department of Chemistry, University at Albany, State University of New York, NY, USA
| | - Yuetian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
| | | | - Hanyi Zhuang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
- Institute of Health Sciences, Shanghai Jiaotong University School of Medicine/Shanghai Institutes for Biological Sciences of Chinese Academy of Sciences, Xuhui District, Shanghai, P. R. China
|
21
|
Torres Manno MA, Pizarro MD, Prunello M, Magni C, Daurelio LD, Espariz M. GeM-Pro: a tool for genome functional mining and microbial profiling. Appl Microbiol Biotechnol 2019; 103:3123-3134. [DOI: 10.1007/s00253-019-09648-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/07/2018] [Revised: 01/11/2019] [Accepted: 01/14/2019] [Indexed: 11/30/2022]
|
22
|
Evolutionary Patterns of Non-Coding RNA in Cardiovascular Biology. Noncoding RNA 2019; 5:ncrna5010015. [PMID: 30709035 PMCID: PMC6468844 DOI: 10.3390/ncrna5010015] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/21/2018] [Revised: 01/26/2019] [Accepted: 01/29/2019] [Indexed: 12/15/2022] Open
Abstract
Cardiovascular diseases (CVDs) affect the heart and the vascular system with a high prevalence and place a huge burden on society as well as the healthcare system. These complex diseases are often the result of multiple genetic and environmental risk factors and pose a great challenge to understanding their etiology and consequences. With the advent of next generation sequencing, many non-coding RNA transcripts, especially long non-coding RNAs (lncRNAs), have been linked to the pathogenesis of CVD. Despite increasing evidence, the proper functional characterization of most of these molecules is still lacking. The exploration of conservation of sequences across related species has been used to functionally annotate protein coding genes. In contrast, the rapid evolutionary turnover and weak sequence conservation of lncRNAs make it difficult to characterize functional homologs for these sequences. Recent studies have tried to explore other dimensions of interspecies conservation to elucidate the functional role of these novel transcripts. In this review, we summarize various methodologies adopted to explore the evolutionary conservation of cardiovascular non-coding RNAs at sequence, secondary structure, syntenic, and expression level.
|
23
|
Merlevede A, Åhl H, Troein C. Homology and linkage in crossover for linear genomes of variable length. PLoS One 2019; 14:e0209712. [PMID: 30605463 PMCID: PMC6317799 DOI: 10.1371/journal.pone.0209712] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/02/2018] [Accepted: 12/09/2018] [Indexed: 11/19/2022] Open
Abstract
The use of variable-length genomes in evolutionary computation has applications in optimisation when the size of the search space is unknown, and provides a unique environment to study the evolutionary dynamics of genome structure. Here, we revisit crossover for linear genomes of variable length, identifying two crucial attributes of successful recombination algorithms: the ability to retain homologous structure, and to reshuffle variant information. We introduce direct measures of these properties (homology score and linkage score) and use them to review existing crossover algorithms, as well as two novel ones. In addition, we measure the performance of these crossover methods on three different benchmark problems, and find that variable-length genomes out-perform fixed-length variants in all three cases. Our homology and linkage scores successfully explain the difference in performance between different crossover methods, providing a simple and insightful framework for crossover in a variable-length setting.
Affiliation(s)
- Adriaan Merlevede
- Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden
| | - Henrik Åhl
- Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden
| | - Carl Troein
- Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden
|
24
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
|
25
|
Ambrosino L, Ruggieri V, Bostan H, Miralto M, Vitulo N, Zouine M, Barone A, Bouzayen M, Frusciante L, Pezzotti M, Valle G, Chiusano ML. Multilevel comparative bioinformatics to investigate evolutionary relationships and specificities in gene annotations: an example for tomato and grapevine. BMC Bioinformatics 2018; 19:435. [PMID: 30497367 PMCID: PMC6266932 DOI: 10.1186/s12859-018-2420-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022] Open
Abstract
Background “Omics” approaches may provide useful information for a deeper understanding of speciation events, diversification and function innovation. This can be achieved by investigating the molecular similarities at sequence level between species, allowing the definition of ortholog and paralog genes. However, the spread of sequenced genomes, often endowed with still preliminary annotations, requires suitable bioinformatics to be appropriately exploited in this framework. Results We present here a multilevel comparative approach to investigate the genome evolutionary relationships and peculiarities of two fleshy fruit species of relevant agronomic interest, Solanum lycopersicum (tomato) and Vitis vinifera (grapevine). We defined 17,823 orthology relationships between tomato and grapevine reference gene annotations. The resulting orthologs are associated with the detected paralogs in each species, permitting the definition of gene networks, useful to investigate the different relationships. The reconciliation of the compared collections in terms of an updating of the functional descriptions was also exploited. All the results were made accessible in ComParaLogs, a dedicated bioinformatics platform available at http://biosrv.cab.unina.it/comparalogs/gene/search. Conclusions The aim of the work was to suggest a reliable approach to detect all similarities of gene loci between two species based on the integration of results from different levels of information, such as the gene, the transcript and the protein sequences, overcoming possible limits due to exclusive protein versus protein comparisons. This allows the definition of reliable ortholog and paralog genes, as well as species-specific gene loci in the two species, overcoming limits due to the possible draft nature of preliminary gene annotations. Moreover, reconciled functional descriptions, as well as common or peculiar enzymatic classes and protein domains from tomato and grapevine, together with the definition of species-specific gene sets after the pairwise comparisons, contributed a comprehensive set of information useful to comparatively exploit the gene annotations of the two species and investigate differences between species with climacteric and non-climacteric fruits. In addition, the definition of networks of ortholog genes and of associated paralogs, and the organization of web-based interfaces for the exploration of the results, defined a user-friendly computational workbench in support of comparative analyses between two species. Electronic supplementary material The online version of this article (10.1186/s12859-018-2420-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Luca Ambrosino
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy. Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
| | - Valentino Ruggieri
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy. Current address: Center for Research in Agricultural Genomics, Cerdanyola, Barcelona, Spain
| | - Hamed Bostan
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy. Current address: Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, USA
| | - Marco Miralto
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy. Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
| | - Nicola Vitulo
- Department of Biotechnology, University of Verona, Verona, Italy
| | - Mohamed Zouine
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
| | - Amalia Barone
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
| | - Mondher Bouzayen
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
| | - Luigi Frusciante
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
| | - Mario Pezzotti
- Department of Biotechnology, University of Verona, Verona, Italy
| | - Giorgio Valle
- CRIBI Biotechnology Centre, University of Padova, Padova, Italy
| | - Maria Luisa Chiusano
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
|
26
|
Thanki AS, Soranzo N, Herrero J, Haerty W, Davey RP. Aequatus: an open-source homology browser. Gigascience 2018; 7:5160135. [PMID: 30395211 PMCID: PMC6251984 DOI: 10.1093/gigascience/giy128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/18/2018] [Revised: 09/06/2018] [Accepted: 10/17/2018] [Indexed: 11/18/2022] Open
Abstract
Background Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus, available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.
Affiliation(s)
- Anil S Thanki
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
| | - Nicola Soranzo
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
| | - Javier Herrero
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Bill Lyons Informatics Centre, UCL Cancer Institute, 72 Huntley St., London, WC1E 6DD, UK
| | - Wilfried Haerty
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
| | - Robert P Davey
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
|
27
|
Salt and drought stress and ABA responses related to bZIP genes from V. radiata and V. angularis. Gene 2018; 651:152-160. [PMID: 29425824 DOI: 10.1016/j.gene.2018.02.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/11/2017] [Revised: 01/08/2018] [Accepted: 02/02/2018] [Indexed: 12/11/2022]
Abstract
Mung bean and adzuki bean are warm-season legumes widely cultivated in China. However, bean production in major producing regions is limited by biotic and abiotic stress, such as drought and salt stress. Basic leucine zipper (bZIP) genes play key roles in responses to various biotic and abiotic stresses. However, only a few bZIP genes involved in drought and salt stress have been identified in legumes, especially in Vigna radiata and Vigna angularis. In this study, we identified 54 and 50 bZIP proteins from whole-genome sequences of V. radiata and V. angularis, respectively. First, we comprehensively surveyed the characteristics of all bZIP genes, including their gene structure, chromosome distribution and motif composition. Phylogenetic trees showed that VrbZIP and VabZIP proteins were divided into ten clades comprising nine known subgroups and one unknown subgroup. The nucleotide substitution rates of the orthologous gene pairs showed that bZIP proteins have undergone strong purifying selection and that V. radiata and V. angularis diverged 1.25 million years ago (mya) to 9.20 mya (average of 4.95 mya). We also detected many cis-acting regulatory elements (CAREs) involved in abiotic stress and plant hormone responses in the putative promoter regions of the bZIP genes. Finally, using quantitative real-time PCR (qRT-PCR), we performed expression profiling of the bZIP genes in response to drought, salt and abscisic acid (ABA). We identified several bZIP genes that may be involved in drought and salt responses. Overall, our results provide a rich resource of VrbZIP and VabZIP genes for the functional characterization and understanding of bZIP transcription factors (TFs) in warm-season legumes. In addition, our results include expression profiles of a subset of VrbZIP and VabZIP genes in response to drought, salt and ABA stress.
These results provide gene expression evidence for the selection of candidate genes under drought and salt stress for future study.
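The divergence dates quoted in this abstract come from substitution rates of orthologous gene pairs. A common way to turn a synonymous distance into a date is T = Ks / (2r). The following is a minimal sketch of that arithmetic only; the substitution rate `r` and the Ks values are illustrative placeholders, since the abstract states neither, and the function name is hypothetical.

```python
def divergence_time_mya(ks, rate_per_site_per_year=6.1e-9):
    """Estimate divergence time (million years) from a synonymous distance Ks.

    Uses T = Ks / (2 * r). The default rate is a placeholder assumption,
    not a value taken from the abstract.
    """
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be non-negative and the rate positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Hypothetical Ks values for three orthologous gene pairs.
ks_by_pair = {"pair_1": 0.02, "pair_2": 0.06, "pair_3": 0.10}
times = {name: divergence_time_mya(ks) for name, ks in ks_by_pair.items()}
mean_time = sum(times.values()) / len(times)
```

Averaging per-pair estimates, as in `mean_time`, mirrors the "range plus average" style of reporting used in the abstract, but the result depends entirely on the assumed rate.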
|
28
|
Salazar JL, Yamamoto S. Integration of Drosophila and Human Genetics to Understand Notch Signaling Related Diseases. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018; 1066:141-185. [PMID: 30030826 PMCID: PMC6233323 DOI: 10.1007/978-3-319-89512-3_8] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/15/2022]
Abstract
Notch signaling research dates back more than one hundred years, beginning with the identification of the Notch mutant in the fruit fly Drosophila melanogaster. Since then, research on Notch and related genes in flies has laid the foundation of what we now know as the Notch signaling pathway. In the 1990s, basic biological and biochemical studies of Notch signaling components in mammalian systems, as well as the identification of rare mutations in Notch signaling pathway genes in human patients with rare Mendelian diseases or cancer, increased the significance of this pathway in human biology and medicine. In the 21st century, Drosophila and other genetic model organisms continue to play a leading role in understanding basic Notch biology. Furthermore, these model organisms can be used in a translational manner to study the underlying mechanisms of Notch-related human diseases and to investigate the function of novel disease-associated genes and variants. In this chapter, we first briefly review the major contributions of Drosophila to Notch signaling research, discussing the similarities and differences between the fly and human pathways. Next, we introduce several biological contexts in Drosophila in which Notch signaling has been extensively characterized. Finally, we discuss a number of genetic diseases caused by mutations in genes of the Notch signaling pathway in humans, and we expand on how Drosophila can be used to study rare genetic variants associated with these and novel disorders. By combining modern genomics and state-of-the-art technologies, Drosophila research is continuing to reveal exciting biology that sheds light on mechanisms of disease.
Affiliation(s)
- Jose L Salazar
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA
| | - Shinya Yamamoto
- Department of Molecular and Human Genetics, Baylor College of Medicine (BCM), Houston, TX, USA.
- Program in Developmental Biology, BCM, Houston, TX, USA.
- Department of Neuroscience, BCM, Houston, TX, USA.
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA.
|
29
|
Tobias NJ, Wolff H, Djahanschiri B, Grundmann F, Kronenwerth M, Shi YM, Simonyi S, Grün P, Shapiro-Ilan D, Pidot SJ, Stinear TP, Ebersberger I, Bode HB. Natural product diversity associated with the nematode symbionts Photorhabdus and Xenorhabdus. Nat Microbiol 2017; 2:1676-1685. [PMID: 28993611 DOI: 10.1038/s41564-017-0039-9] [Citation(s) in RCA: 105] [Impact Index Per Article: 15.0] [Received: 03/20/2017] [Accepted: 09/08/2017] [Indexed: 12/25/2022]
Abstract
Xenorhabdus and Photorhabdus species dedicate a large amount of resources to the production of specialized metabolites derived from non-ribosomal peptide synthetase (NRPS) or polyketide synthase (PKS). Both bacteria undergo symbiosis with nematodes, which is followed by an insect pathogenic phase. So far, the molecular basis of this tripartite relationship and the exact roles that individual metabolites and metabolic pathways play have not been well understood. To close this gap, we have significantly expanded the database for comparative genomics studies in these bacteria. Clustering the genes encoded in the individual genomes into hierarchical orthologous groups reveals a high-resolution picture of functional evolution in this clade. It identifies groups of genes (many of which are involved in secondary metabolite production) that may account for the niche specificity of these bacteria. Photorhabdus and Xenorhabdus appear very similar at the DNA sequence level, which indicates their close evolutionary relationship. Yet, high-resolution mass spectrometry analyses reveal a huge chemical diversity in the two taxa. Molecular network reconstruction identified a large number of previously unidentified metabolite classes, including the xefoampeptides and tilivalline. Here, we apply genomic and metabolomic methods in a complementary manner to identify and elucidate additional classes of natural products. We also highlight the ability to rapidly and simultaneously identify potentially interesting bioactive products from NRPSs and PKSs, thereby augmenting the contribution of molecular biology techniques to the acceleration of natural product discovery.
Affiliation(s)
- Nicholas J Tobias
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Hendrik Wolff
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Bardya Djahanschiri
- Department of Applied Bioinformatics, Institute for Cell Biology and Neuroscience, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Florian Grundmann
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Max Kronenwerth
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Yi-Ming Shi
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Svenja Simonyi
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - Peter Grün
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
| | - David Shapiro-Ilan
- USDA-ARS, SEA, SE Fruit and Tree Nut Research Unit, 21 Dunbar Road, Byron, GA, 31008, USA
| | - Sacha J Pidot
- Department of Microbiology and Immunology, University of Melbourne, at the Doherty Institute for Infection and Immunity, Parkville, Victoria, 3010, Australia
| | - Timothy P Stinear
- Department of Microbiology and Immunology, University of Melbourne, at the Doherty Institute for Infection and Immunity, Parkville, Victoria, 3010, Australia
| | - Ingo Ebersberger
- Department of Applied Bioinformatics, Institute for Cell Biology and Neuroscience, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
- Senckenberg Climate and Research Centre (BIK-F), Frankfurt am Main, 60325, Germany
| | - Helge B Bode
- Fachbereich Biowissenschaften, Merck Stiftungsprofessur für Molekulare Biotechnologie, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
- Buchmann Institute for Molecular Life Sciences, Goethe-Universität Frankfurt, Frankfurt am Main, 60438, Germany
|
30
|
Nascimento FF, Reis MD, Yang Z. A biologist's guide to Bayesian phylogenetic analysis. Nat Ecol Evol 2017; 1:1446-1454. [PMID: 28983516 PMCID: PMC5624502 DOI: 10.1038/s41559-017-0280-x] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 11/03/2016] [Accepted: 07/17/2017] [Indexed: 11/09/2022]
Abstract
Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software implementing sophisticated models of evolution. However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here, we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC), the diagnosis of an MCMC run, and ways of summarising the MCMC sample. We discuss the specification of the prior, the choice of the substitution model, and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software and provide recommendations as to their use.
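To make the MCMC vocabulary in this abstract concrete, here is a minimal Metropolis sketch that samples the posterior of a single evolutionary distance under the Jukes-Cantor (JC69) model. It is an illustration under stated assumptions, not the procedure of any particular phylogenetics package: the site counts, the exponential prior mean, the proposal window and all function names are made up for the example.

```python
import math
import random


def log_posterior(d, x, n, prior_mean=0.2):
    """Unnormalised log posterior of a JC69 distance d.

    x of n aligned sites differ; the prior on d is exponential with an
    illustrative mean (an assumption, not a recommendation).
    """
    if d <= 0:
        return -math.inf
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    if p_diff <= 0.0:
        return -math.inf
    log_lik = x * math.log(p_diff) + (n - x) * math.log(1.0 - p_diff)
    log_prior = -d / prior_mean
    return log_lik + log_prior


def metropolis(x, n, steps=20000, burn_in=2000, window=0.05, start=0.5, seed=1):
    """Sample d with a sliding-window proposal reflected at zero."""
    rng = random.Random(seed)
    d, lp = start, log_posterior(start, x, n)
    samples, accepted = [], 0
    for step in range(steps):
        # Symmetric proposal: uniform window around d, reflected to stay positive.
        proposal = abs(d + rng.uniform(-window / 2, window / 2))
        lp_new = log_posterior(proposal, x, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            d, lp = proposal, lp_new
            accepted += 1
        if step >= burn_in:
            samples.append(d)  # keep only post-burn-in states
    return samples, accepted / steps


# Hypothetical data: 90 differing sites in a 948-site alignment.
samples, acceptance = metropolis(x=90, n=948)
posterior_mean = sum(samples) / len(samples)
```

Diagnosing a run, as the abstract recommends, would start with the quantities returned here: the acceptance proportion (tuned through `window`) and whether independent chains with different `seed` and `start` values give similar summaries.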
Affiliation(s)
- Fabrícia F Nascimento
- Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK.
- Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK.
| | - Mario Dos Reis
- School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK.
|
31
|
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. MASS SPECTROMETRY REVIEWS 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
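As a rough illustration of what "filtering" a co-complex data set can mean, the sketch below scores protein pairs by how consistently they co-occur across purifications, using a Jaccard index. This is a deliberately simple stand-in chosen for the example, not one of the specific methods covered by the review; the run names, protein names and threshold are hypothetical.

```python
from itertools import combinations


def jaccard_scores(purifications, min_score=0.5):
    """Score protein pairs by co-occurrence across purification runs.

    purifications maps a run identifier to the set of proteins detected
    in that run. Pairs scoring below min_score are dropped.
    """
    # Invert the input: protein -> set of runs in which it was detected.
    membership = {}
    for run_id, proteins in purifications.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(run_id)

    scores = {}
    for a, b in combinations(sorted(membership), 2):
        shared = membership[a] & membership[b]
        union = membership[a] | membership[b]
        score = len(shared) / len(union)
        if score >= min_score:
            scores[(a, b)] = score
    return scores


# Hypothetical pull-downs; "HSP" plays the role of a sticky contaminant.
runs = {
    "run1": {"A", "B", "C", "HSP"},
    "run2": {"A", "B", "HSP"},
    "run3": {"D", "E", "HSP"},
    "run4": {"D", "E"},
}
network = jaccard_scores(runs, min_score=0.75)
```

A protein that appears in many unrelated runs shares few of them with any single partner relative to the union, so its pairwise scores stay low; that is the intuition behind penalising promiscuous contaminants, which dedicated scoring schemes handle more rigorously.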
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| | - Kevin Titeca
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Sven Eyckerman
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Jan Tavernier
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Dirk Valkenborg
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- IBioStat, Hasselt University, Hasselt, Belgium
- CFP-CeProMa, University of Antwerp, Antwerp, Belgium
| | - Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
32
|
Hellmuth M. Biologically feasible gene trees, reconciliation maps and informative triples. Algorithms Mol Biol 2017; 12:23. [PMID: 28861118 PMCID: PMC5576477 DOI: 10.1186/s13015-017-0114-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/12/2017] [Accepted: 08/16/2017] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND The history of gene families (which are equivalent to event-labeled gene trees) can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are biologically feasible, that is, whether there is a possible true history that would explain a given gene tree. In practice, this problem boils down to finding a reconciliation map (also known as a DTL-scenario) between the event-labeled gene trees and a (possibly unknown) species tree. RESULTS In this contribution, we first characterize whether there is a valid reconciliation map for binary event-labeled gene trees T that contain speciation, duplication and horizontal gene transfer events and some unknown species tree S in terms of "informative" triples that are displayed in T and provide information on the topology of S. These informative triples are used to infer the unknown species tree S for T. We obtain a similar result for non-binary gene trees. To this end, however, the reconciliation map needs to be further restricted. We provide a polynomial-time algorithm to decide whether there is a species tree for a given event-labeled gene tree and, in the positive case, to construct the species tree and the respective (restricted) reconciliation map. However, informative triples as well as DTL-scenarios have their limitations when they are used to explain the biological feasibility of gene trees. While reconciliation maps imply biological feasibility, we show that the converse is not true in general. Moreover, we show that informative triples neither provide enough information to characterize "relaxed" DTL-scenarios nor non-restricted reconciliation maps for non-binary biologically feasible gene trees.
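The idea of reading species-tree constraints off an event-labeled gene tree can be sketched for the simplest case. The code below assumes a binary gene tree with only speciation and duplication events (horizontal transfer, which the abstract also treats, is deliberately left out) and collects species triples xy|z from gene triples whose root is a speciation and whose three genes come from distinct species. The tuple-based tree encoding and the function names are hypothetical choices for this example, not the paper's notation.

```python
from itertools import combinations


def leaf_species(node):
    """Return the species labels of all leaves below node."""
    if node[0] == "leaf":
        return [node[1]]
    return [s for child in node[2] for s in leaf_species(child)]


def species_triples(node, found=None):
    """Collect triples (frozenset({x, y}), z), read as xy|z.

    A node is ("leaf", species) or ("node", event, [children]) with
    event in {"speciation", "duplication"}.
    """
    if found is None:
        found = set()
    if node[0] == "leaf":
        return found
    _, event, children = node
    if event == "speciation":
        # Two genes from one child and one from another form a triple
        # rooted at this node.
        for i, inner in enumerate(children):
            for j, outer in enumerate(children):
                if i == j:
                    continue
                for a, b in combinations(leaf_species(inner), 2):
                    for c in leaf_species(outer):
                        if len({a, b, c}) == 3:
                            found.add((frozenset((a, b)), c))
    for child in children:
        species_triples(child, found)
    return found


# Hypothetical gene tree: a duplication predating the A/B split,
# with species C as the outgroup.
gene_tree = ("node", "speciation", [
    ("node", "duplication", [
        ("node", "speciation", [("leaf", "A"), ("leaf", "B")]),
        ("node", "speciation", [("leaf", "A"), ("leaf", "B")]),
    ]),
    ("leaf", "C"),
])
triples = species_triples(gene_tree)
```

In this restricted setting, a species tree can explain the gene tree only if it displays every collected triple; checking that a triple set is consistent is a separate step (for example with a BUILD-style algorithm) that this sketch does not implement.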
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041 Saarbrücken, Germany
|
33
|
Ambrosino L, Chiusano ML. Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships. Bioinform Biol Insights 2017; 11:1177932217690136. [PMID: 28469416 PMCID: PMC5348085 DOI: 10.1177/1177932217690136] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/19/2016] [Accepted: 12/17/2016] [Indexed: 12/17/2022] Open
Abstract
The detection of orthologs is a key approach in genomics, useful for understanding gene evolution and phylogenetic relationships and essential for gene function prediction. However, reliable annotation of the encoded protein regions is still a limiting aspect in genomics, mainly due to the lack of confirmatory experimental evidence at the proteome level. Nevertheless, current ortholog collections are generally based on protein sequence comparisons, even though large transcriptome sequence collections are available. We developed Transcriptologs, a method for the prediction of orthologs based on similarities of translated fragments from messenger RNAs of 2 species. We implemented a procedure to extend BLAST-based alignments and to define orthologs based on the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that, in some cases, Transcriptologs outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities that were otherwise not detectable.
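The Bidirectional Best Hit criterion mentioned in this abstract can be sketched in a few lines: two sequences are paired only if each is the other's top-scoring hit. The sketch assumes hits are already available as (query, subject, score) tuples, for instance parsed from tabular similarity-search output; it does not reproduce the alignment-extension step of Transcriptologs, and the identifiers and scores are invented for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, score) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between species A ("at*") and species B ("sb*").
a_vs_b = [
    ("at1", "sb1", 200.0),
    ("at1", "sb2", 150.0),
    ("at2", "sb2", 180.0),
    ("at3", "sb2", 90.0),
]
b_vs_a = [
    ("sb1", "at1", 195.0),
    ("sb2", "at2", 175.0),
    ("sb3", "at3", 60.0),
]
orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `at3` is left unpaired because its best hit `sb2` prefers `at2`, which is the kind of one-directional match the reciprocity requirement is meant to exclude.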
Affiliation(s)
- Luca Ambrosino
- Department of Agriculture, University of Naples "Federico II," Portici, Italy
| | - Maria Luisa Chiusano
- Department of Agriculture, University of Naples "Federico II," Portici, Italy
- Research Infrastructures for Marine Biological Resources (RIMAR), Stazione Zoologica Anton Dohrn Napoli, Naples, Italy
|
34
|
Song H, Wang P, Li C, Han S, Zhao C, Xia H, Bi Y, Guo B, Zhang X, Wang X. Comparative analysis of NBS-LRR genes and their response to Aspergillus flavus in Arachis. PLoS One 2017; 12:e0171181. [PMID: 28158222 PMCID: PMC5291535 DOI: 10.1371/journal.pone.0171181] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 12/15/2016] [Accepted: 01/17/2017] [Indexed: 12/31/2022] Open
Abstract
Studies have demonstrated that nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes respond to pathogen attack in plants. Characterization of NBS-LRR genes in peanut is not well documented. The newly released whole genome sequences of Arachis duranensis and Arachis ipaënsis have allowed a global analysis of this important gene family in peanut to be conducted. In this study, we identified 393 (AdNBS) and 437 (AiNBS) NBS-LRR genes from A. duranensis and A. ipaënsis, respectively, using bioinformatics approaches. Full-length sequences of 278 AdNBS and 303 AiNBS were identified. Fifty-one orthologous, four AdNBS paralogous, and six AiNBS paralogous gene pairs were predicted. All paralogous gene pairs were located in the same chromosomes, indicating that tandem duplication was the most likely mechanism forming these paralogs. The paralogs mainly underwent purifying selection, but most LRR 8 domains underwent positive selection. More gene clusters were found in A. ipaënsis than in A. duranensis, possibly owing to tandem duplication events occurring more frequently in A. ipaënsis. The expression profile of NBS-LRR genes was different between A. duranensis and A. hypogaea after Aspergillus flavus infection. The up-regulated expression of NBS-LRR in A. duranensis was continuous, while these genes responded to the pathogen temporally in A. hypogaea.
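Statements such as "underwent purifying selection" or "underwent positive selection" are usually read from the ratio of nonsynonymous to synonymous substitution rates, Ka/Ks. The sketch below shows only that reading; the tolerance band around 1 and the example values are assumptions for illustration, and real analyses typically rely on likelihood-based tests rather than a fixed cutoff.

```python
def selection_regime(ka, ks, tolerance=0.05):
    """Classify a gene pair by its Ka/Ks ratio.

    Ratios clearly below 1 suggest purifying selection, clearly above 1
    positive selection; the tolerance band is an illustrative assumption.
    """
    if ka < 0 or ks <= 0:
        raise ValueError("Ka must be non-negative and Ks positive")
    omega = ka / ks
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical (Ka, Ks) estimates for three paralogous pairs.
pairs = {"pair_1": (0.02, 0.10), "pair_2": (0.30, 0.10), "pair_3": (0.10, 0.10)}
regimes = {name: selection_regime(ka, ks) for name, (ka, ks) in pairs.items()}
```

Applying the same function to individual domains rather than whole genes is how a result like "genes under purifying selection, one domain under positive selection" can arise.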
Affiliation(s)
- Hui Song
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
| | - Pengfei Wang
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
| | - Changsheng Li
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- College of Life Science, Shandong Normal University, Jinan, China
| | - Suoyi Han
- Henan Academy of Agricultural Sciences, Zhengzhou, China
| | - Chuanzhi Zhao
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
| | - Han Xia
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
| | - Yuping Bi
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
| | - Baozhu Guo
- Crop Protection and Management Research Unit, USDA-ARS, Tifton, Georgia, United States of America
| | - Xinyou Zhang
- Henan Academy of Agricultural Sciences, Zhengzhou, China
| | - Xingjun Wang
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences; Shandong Provincial Key laboratory of Crop Genetic Improvement, Ecology and Physiology, Jinan, China
- College of Life Science, Shandong Normal University, Jinan, China
|
35
|
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
| | - David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
|
36
|
Abstract
BACKGROUND Ortholog inference is the starting point of most comparative genomics studies, and a plethora of methods have been designed in the last decade to address this challenging task. In this paper we focus on the problems of deciding consistency with a species tree (known or not) of a partial set of orthology/paralogy relationships [Formula: see text] on a collection of n genes. RESULTS We give the first polynomial algorithm, more precisely an O(n^3) time algorithm, to decide whether [Formula: see text] is consistent, even when the species tree is unknown. We also investigate a biologically meaningful optimization version of these problems, in which we wish to minimize the number of duplication events; unfortunately, we show that all these optimization problems are NP-hard and are unlikely to have good polynomial time approximation algorithms. CONCLUSIONS Our polynomial algorithm for checking consistency has been implemented in Python and is available at https://github.com/UdeM-LBIT/OrthoPara-ConstraintChecker.
Affiliation(s)
- Mark Jones
- LIRMM, CNRS, Université de Montpellier, Montpellier, France
|
37
|
Abstract
The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts needed to develop a workflow that will meet the objectives and goals of HTS projects.
|
38
|
Song H, Wang P, Lin JY, Zhao C, Bi Y, Wang X. Genome-Wide Identification and Characterization of WRKY Gene Family in Peanut. FRONTIERS IN PLANT SCIENCE 2016; 7:534. [PMID: 27200012 PMCID: PMC4845656 DOI: 10.3389/fpls.2016.00534] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 12/30/2015] [Accepted: 04/04/2016] [Indexed: 05/18/2023]
Abstract
WRKY, an important transcription factor family, is widely distributed in the plant kingdom. Many reports have focused on the phylogenetic relationships and biological functions of WRKY proteins at the whole-genome level in different plant species. However, little is known about WRKY proteins in the genomes of Arachis species and their response to salicylic acid (SA) and jasmonic acid (JA) treatment. In this study, we identified 77 and 75 WRKY proteins from the two wild ancestral diploid genomes of cultivated tetraploid peanut, Arachis duranensis and Arachis ipaënsis, using bioinformatics approaches. Most peanut WRKY coding genes were located on A. duranensis chromosome A6 and A. ipaënsis chromosome B3, while the fewest WRKY genes were found on chromosome 9. The WRKY orthologous gene pairs in A. duranensis and A. ipaënsis chromosomes were highly syntenic. Our analysis indicated that segmental duplication events played a major role in the expansion of AdWRKY and AiWRKY genes, and strong purifying selection was observed in gene duplication pairs. Furthermore, we translated the knowledge gained from the genome-wide analysis of wild ancestral peanut to cultivated peanut, revealing that the activities of specific cultivated peanut WRKY genes changed in response to SA and JA treatment. Peanut WRKY7, 8 and 13 genes were down-regulated, whereas WRKY1 and 12 genes were up-regulated with SA and JA treatment. These results could provide valuable information for peanut improvement.
Affiliation(s)
- Hui Song
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Pengfei Wang
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Jer-Young Lin
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Chuanzhi Zhao
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Yuping Bi
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Xingjun Wang
- Shandong Provincial Key Laboratory of Crop Genetic Improvement, Ecology and Physiology, Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, China
| |
|
39
|
Hellmuth M, Wieseke N. From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_21] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
|
40
|
Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biol 2015; 16:106. [PMID: 25994148 PMCID: PMC4464727 DOI: 10.1186/s13059-015-0670-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 04/06/2015] [Accepted: 05/08/2015] [Indexed: 04/29/2023]
Abstract
We present a new pair-wise genome alignment method, based on a simple concept of finding an optimal set of local alignments. It gains accuracy by not masking repeats, and by using a statistical model to quantify the (un)ambiguity of each alignment part. Compared to previous animal genome alignments, it aligns thousands of locations differently and with much higher similarity, strongly suggesting that the previous alignments are non-orthologous. The previous methods suffer from an overly-strong assumption of long un-rearranged blocks. The new alignments should help find interesting and unusual features, such as fast-evolving elements and micro-rearrangements, which are confounded by alignment errors.
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
| | - Risa Kawaguchi
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan; Department of Computational Biology, Faculty of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
| |
|
41
|
Wang Y, Coleman-Derr D, Chen G, Gu YQ. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res 2015; 43:W78-84. [PMID: 25964301 PMCID: PMC4489293 DOI: 10.1093/nar/gkv487] [Citation(s) in RCA: 313] [Impact Index Per Article: 34.8] [Received: 02/08/2015] [Accepted: 05/02/2015] [Indexed: 01/19/2023]
Abstract
Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn.
Affiliation(s)
- Yi Wang
- USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics Research Unit, Albany, CA 94710, USA; Department of Plant Sciences, University of California, Davis, CA 95616, USA; Bioengineering College, Campus A, Chongqing University, Chongqing 400030, China
| | | | - Guoping Chen
- Bioengineering College, Campus A, Chongqing University, Chongqing 400030, China
| | - Yong Q Gu
- USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics Research Unit, Albany, CA 94710, USA
| |
|
42
|
Abstract
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
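The sequence composition-based ("parametric") idea in the abstract above can be illustrated with a minimal sketch. It assumes GC content as the only compositional signal and a simple z-score cutoff. The function names, the cutoff of 2.0, and the use of the unweighted mean over genes as the "genomic average" are all illustrative assumptions, not the procedure of any particular published tool.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes: dict, z_cutoff: float = 2.0) -> list:
    """Return names of genes whose GC content deviates strongly from the mean.

    `genes` maps a gene name to its nucleotide sequence. The mean and
    standard deviation are taken over per-gene GC values (an assumption;
    real tools often use richer signals such as codon usage or k-mers).
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return []  # every gene has the same composition, nothing to flag
    return [name for name, value in gc.items() if abs(value - mu) / sigma > z_cutoff]


# Toy genome: nine genes at 50% GC and one gene at 100% GC.
toy_genes = {f"gene{i}": "ATGC" * 10 for i in range(9)}
toy_genes["foreign"] = "GGCC" * 10
candidates = flag_atypical_genes(toy_genes)
```

A flagged gene is only a candidate: atypical composition can also arise without transfer, which is one reason different methods tend to infer different HGT events on real data.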
Affiliation(s)
| | - Nives Škunca
- ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
| | | | - Christophe Dessimoz
- University College London, London, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| |
|
43
|
Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales. Life (Basel) 2015; 5:818-40. [PMID: 25764277 PMCID: PMC4390880 DOI: 10.3390/life5010818] [Citation(s) in RCA: 149] [Impact Index Per Article: 16.6] [Received: 01/12/2015] [Revised: 02/25/2015] [Accepted: 02/28/2015] [Indexed: 11/18/2022]
Abstract
With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced: superclusters that unite two or more arCOGs and more completely reflect gene family evolution than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. In addition to their utility for genome annotation, arCOGs also are a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci that are traditionally viewed as the basal branch of the Euryarchaeota. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.
|
44
|
BMX: a tool for computing bacterial phyletic composition from orthologous maps. BMC Res Notes 2015; 8:51. [PMID: 25756192 PMCID: PMC4342873 DOI: 10.1186/s13104-015-1017-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/29/2014] [Accepted: 02/13/2015] [Indexed: 11/10/2022]
Abstract
Background: New sequencing technologies have made it possible to explore genetic diversity at higher resolution in microbial populations. However, understanding evolutionary relationships and comparing closely and distantly related bacterial genomes from these massive datasets remain a formidable challenge. Numerous clustering algorithms that group genomic data based on homology have been developed, but new tools are still required to analyse the resultant orthologous maps to understand functional genetic similarities and their phyletic patterns (patterns of presence or absence of genes). Findings: Bacterial Makeup eXplorer (BMX) implements an algorithm that swiftly and efficiently determines the number of orthologs in prokaryotic genomes using a reference-free approach, which may be further exploited to transfer gene annotations. BMX is able to integrate orthologous maps of highly diverse prokaryotic genomes, making it possible to perform robust, scalable, multi-platform, high-quality annotation transfer and gene-by-gene composition assessment. In addition, results are presented in the form of publication-quality figures. Conclusions: BMX allows extensive data analysis of orthologous map databases to understand underlying biological relationships. Furthermore, BMX is portable across different platforms and can be installed easily. In summary, BMX allows higher-resolution analysis of genomes from diverse bacterial populations.
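To make the notion of a phyletic pattern concrete, the following minimal sketch derives presence/absence patterns from a toy orthologous map. The data layout (group ID mapped to a list of genome/gene pairs), the function names, and the example identifiers are hypothetical and are not taken from BMX itself.

```python
def phyletic_patterns(ortholog_groups: dict, genomes: list) -> dict:
    """Return {group_id: pattern}, one '1' (present) or '0' (absent) per genome.

    `ortholog_groups` maps a group ID to a list of (genome, gene) pairs;
    the order of characters in each pattern follows the order of `genomes`.
    """
    patterns = {}
    for group_id, members in ortholog_groups.items():
        present = {genome for genome, _gene in members}
        patterns[group_id] = "".join("1" if g in present else "0" for g in genomes)
    return patterns


def core_groups(patterns: dict) -> list:
    """Groups present in every genome (pattern contains no '0')."""
    return sorted(gid for gid, pattern in patterns.items() if "0" not in pattern)


# Toy orthologous map over three genomes (all names are made up).
genomes = ["genomeA", "genomeB", "genomeC"]
ortholog_groups = {
    "OG1": [("genomeA", "a1"), ("genomeB", "b7"), ("genomeC", "c3")],
    "OG2": [("genomeA", "a2"), ("genomeC", "c9")],
    "OG3": [("genomeB", "b4")],
}
patterns = phyletic_patterns(ortholog_groups, genomes)
core = core_groups(patterns)
```

Counting groups per pattern then gives a gene-by-gene view of which genomes share which orthologs.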
|
45
|
Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference. G3: Genes, Genomes, Genetics 2015; 5:629-38. [PMID: 25711833 PMCID: PMC4390578 DOI: 10.1534/g3.115.017095] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Ortholog detection (OD) is a lynchpin of most statistical methods in comparative genomics. This task involves accurately identifying genes across species that descend from a common ancestral sequence. OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In this article, we examine the proteomes of ten mammals by using four methodologically distinct, rigorously filtered OD methods. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38–45% of the genes analyzed. We leverage this high complementarity through the development of MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization, the first tool for integrating methodologically diverse OD methods. Relative to the four methods examined, MOSAIC more than quintuples the number of alignments for which all species are present while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while simultaneously detecting up to 180% more positively selected sites. We close by highlighting MOSAIC-specific positively selected sites near the active site of TPSAB1, an enzyme linked to asthma, heart disease, and irritable bowel disease. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC.
|
46
|
Vinuesa P, Contreras-Moreira B. Robust identification of orthologues and paralogues for microbial pan-genomics using GET_HOMOLOGUES: a case study of pIncA/C plasmids. Methods Mol Biol 2015; 1231:203-232. [PMID: 25343868 DOI: 10.1007/978-1-4939-1720-4_14] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 06/04/2023]
Abstract
GET_HOMOLOGUES is an open-source software package written in Perl and R to define robust core- and pan-genomes by computing consensus clusters of orthologous gene families from whole-genome sequences using the bidirectional best-hit, COGtriangles, and OrthoMCL clustering algorithms. The granularity of the clusters can be fine-tuned by a user-configurable filtering strategy based on a combination of blastp pairwise alignment parameters, hmmscan-based scanning of Pfam domain composition of the proteins in each cluster, and a partial synteny criterion. We present detailed protocols to fit exponential and binomial mixture models to estimate core- and pan-genome sizes, compute pan-genome trees from the pan-genome matrix using a parsimony criterion, analyze and graphically represent the pan-genome structure, and identify lineage-specific gene families for the 12 complete pIncA/C plasmids currently available in NCBI's RefSeq. The software package, license, and detailed user manual can be downloaded for free for academic use from two mirrors: http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.
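Of the clustering algorithms named in the abstract above, the bidirectional best-hit criterion is the simplest to sketch. The example below assumes pairwise similarity scores (for instance blastp bit scores) are already available as dictionaries. The function names and the toy scores are hypothetical, and this is not the GET_HOMOLOGUES implementation, which applies further filters.

```python
def best_hits(scores: dict) -> dict:
    """Map each query to its single highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def bidirectional_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores between proteomes A and B (values are made up).
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0, ("b2", "a2"): 175.0}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a3` is left unpaired because its best hit `b1` prefers `a1`, which is how the criterion excludes many paralogous matches.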
Affiliation(s)
- Pablo Vinuesa
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, Cuernavaca, Morelos, Mexico
| | | |
|
47
|
Haag ES, Thomas CG. Fundamentals of Comparative Genome Analysis in Caenorhabditis Nematodes. Methods Mol Biol 2015; 1327:11-21. [PMID: 26423964 DOI: 10.1007/978-1-4939-2842-2_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]
Abstract
The genome of the nematode Caenorhabditis elegans was the first of any animal to be sequenced completely, and it remains the "gold standard" for completeness and annotations. Even before the C. elegans genome was completed, however, biologists began examining the generality of its features in the genomes of other Caenorhabditis species. With many such genomes now sequenced and available via WormBase, C. elegans researchers are often confronted with how to interpret comparative genomic data. In this article, we present practical approaches to addressing several common issues, including possible sources of error in homology annotations, the often complex relationships between sequence similarity, orthology, paralogy, and gene family evolution, the impact of sexual mode on genome assemblies and content, and the determination and use of synteny as a tool.
Affiliation(s)
- Eric S Haag
- Department of Biology, University of Maryland, 1210 Biology-Psychology Building, College Park, MD, 20742, USA.
| | - Cristel G Thomas
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada, M5S 3B2
| |
|
48
|
Jeffares DC, Tomiczek B, Sojo V, dos Reis M. A beginners guide to estimating the non-synonymous to synonymous rate ratio of all protein-coding genes in a genome. Methods Mol Biol 2015; 1201:65-90. [PMID: 25388108 DOI: 10.1007/978-1-4939-1438-8_4] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 06/04/2023]
Abstract
The ratio of non-synonymous to synonymous substitutions (dN/dS) is a useful measure of the strength and mode of natural selection acting on protein-coding genes. It is widely used to study patterns of selection on protein-coding genes on a genomic scale, from the small genomes of viruses, bacteria, and parasitic eukaryotes to the largest eukaryotic genomes. In this chapter we describe all the steps necessary to calculate the dN/dS of all the genes using at least two genomes. We include a brief discussion of assigning orthologs and of codon-aware alignment of orthologs. We then describe how to use the CODEML program of the PAML package for phylogenetic analysis to calculate the dN/dS and how to perform some statistical tests for positive selection. We then outline some methods for interpreting the output and describe how these data may be used to make discoveries about the biology of the species under study. Finally, as a worked example we show all the steps we used to calculate dN/dS for 3,261 orthologs from six Plasmodium species, including tests for adaptive evolution (see worked_example.pdf).
Affiliation(s)
- Daniel C Jeffares
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
| | | | | | | |
|
49
|
Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJY, van Karnebeek C, Wasserman WW. FLAGS, frequently mutated genes in public exomes. BMC Med Genomics 2014; 7:64. [PMID: 25466818 PMCID: PMC4267152 DOI: 10.1186/s12920-014-0064-y] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 06/16/2014] [Accepted: 10/24/2014] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Dramatic improvements in DNA-sequencing technologies and computational analyses have led to wide use of whole exome sequencing (WES) to identify the genetic basis of Mendelian disorders. More than 180 novel rare-disease-causing genes with Mendelian inheritance patterns have been discovered through sequencing the exomes of just a few unrelated individuals or family members. As rare/novel genetic variants continue to be uncovered, there is a major challenge in distinguishing true pathogenic variants from rare benign mutations. METHODS: We used publicly available exome cohorts, together with the dbSNP database, to derive a list of genes (n = 100) that most frequently exhibit rare (<1%) non-synonymous/splice-site variants in general populations. We termed these genes FLAGS for FrequentLy mutAted GeneS and analyzed their properties. RESULTS: Analysis of FLAGS revealed that these genes have significantly longer protein coding sequences, a greater number of paralogs, and less evolutionary selective pressure than expected. FLAGS are more frequently reported in PubMed clinical literature and more frequently associated with disease phenotypes compared to the set of human protein-coding genes. We demonstrated an overlap between FLAGS and the rare-disease-causing genes recently discovered through WES studies (n = 10) and the need for replication studies and rigorous statistical and biological analyses when associating FLAGS with rare disease. Finally, we showed how FLAGS are applied in a disease-causing variant prioritization approach on exome data from a family affected by an unknown rare genetic disorder. CONCLUSIONS: We showed that some genes are frequently affected by rare, likely functional variants in the general population, and are frequently observed in WES studies analyzing diverse rare phenotypes. We found that the rate at which genes accumulate rare mutations is beneficial information for prioritizing candidates. We provided a ranking system based on the mutation accumulation rates for prioritizing exome-captured human genes, and propose that clinical reports associating any disease/phenotype with FLAGS be evaluated with extra caution.
Affiliation(s)
- Casper Shyr
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.
| | - Maja Tarailo-Graovac
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
| | - Michael Gottlieb
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada.
| | - Jessica J Y Lee
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Genome Science and Technology Graduate Program, University of British Columbia, Vancouver, BC, Canada.
| | - Clara van Karnebeek
- Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Division of Biochemical Diseases, BC Children's Hospital, Vancouver, BC, Canada; Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
| |
|
50
|
Altenhoff AM, Škunca N, Glover N, Train CM, Sueki A, Piližota I, Gori K, Tomiczek B, Müller S, Redestig H, Gonnet GH, Dessimoz C. The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res 2014; 43:D240-9. [PMID: 25399418 PMCID: PMC4383958 DOI: 10.1093/nar/gku1158] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Indexed: 11/13/2022]
Abstract
The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequence associated for every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous groups subsets downloadable in OrthoXML format; and (vi) possibility to export parts of the all-against-all computations and to combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- University College London, Gower Street, London WC1E 6BT, UK; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland; ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
| | - Nives Škunca
- University College London, Gower Street, London WC1E 6BT, UK; Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland; ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
| | - Natasha Glover
- University College London, Gower Street, London WC1E 6BT, UK; Institut National de la Recherche Agronomique (INRA) UMR1095, Genetics, Diversity and Ecophysiology of Cereals, 5 Chemin de Beaulieu, 63039 Clermont-Ferrand, France; Bayer CropScience NV, Technologiepark 38, 9052 Gent, Belgium
| | | | - Anna Sueki
- University College London, Gower Street, London WC1E 6BT, UK
| | - Ivana Piližota
- University College London, Gower Street, London WC1E 6BT, UK
| | - Kevin Gori
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Steven Müller
- University College London, Gower Street, London WC1E 6BT, UK
| | | | - Gaston H Gonnet
- Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland; ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
| | - Christophe Dessimoz
- University College London, Gower Street, London WC1E 6BT, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|