1
Fu X. How deep can we decipher protein evolution with deep learning models. Patterns (New York, N.Y.) 2024; 5:101043. [PMID: 39233697 PMCID: PMC11368669 DOI: 10.1016/j.patter.2024.101043] [Indexed: 09/06/2024]
Abstract
Evolutionary-based machine learning models have emerged as a fascinating approach to mapping the landscape for protein evolution. Lian et al. demonstrated that evolution-based deep generative models, specifically variational autoencoders, can organize SH3 homologs in a hierarchical latent space, effectively distinguishing the specific Sho1SH3 domains.
Affiliation(s)
- Xiaozhi Fu
  - Department of Life Sciences, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
2
Rahiminejad S, De Sanctis B, Pevzner P, Mushegian A. Synthetic lethality and the minimal genome size problem. mSphere 2024; 9:e0013924. [PMID: 38904396 PMCID: PMC11288024 DOI: 10.1128/msphere.00139-24] [Received: 02/19/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024]
Abstract
Gene knockout studies suggest that ~300 genes in a bacterial genome and ~1,100 genes in a yeast genome cannot be deleted without loss of viability. These single-gene knockout experiments do not account for negative genetic interactions, when two or more genes can each be deleted without effect, but their joint deletion is lethal. Thus, large-scale single-gene deletion studies underestimate the size of a minimal gene set compatible with cell survival. In yeast Saccharomyces cerevisiae, the viability of all possible deletions of gene pairs (2-tuples), and of some deletions of gene triplets (3-tuples), has been experimentally tested. To estimate the size of a yeast minimal genome from that data, we first established that finding the size of a minimal gene set is equivalent to finding the minimum vertex cover in the lethality (hyper)graph, where the vertices are genes and (hyper)edges connect k-tuples of genes whose joint deletion is lethal. Using the Lovász-Johnson-Chvátal greedy approximation algorithm, we computed the minimum vertex cover of the synthetic-lethal 2-tuples graph to be 1,723 genes. We next simulated the genetic interactions in 3-tuples, extrapolating from the existing triplet sample, and again estimated minimum vertex covers. The size of a minimal gene set in yeast rapidly approaches the size of the entire genome even when considering only synthetic lethalities in k-tuples with small k. In contrast, several studies reported successful experimental reductions of yeast and bacterial genomes by simultaneous deletions of hundreds of genes, without eliciting synthetic lethality. We discuss possible reasons for this apparent contradiction.
IMPORTANCE How can we estimate the smallest number of genes sufficient for a unicellular organism to survive on a rich medium? One approach is to remove genes one at a time and count how many of such deletion strains are unable to grow. However, the single-gene knockout data are insufficient, because joint gene deletions may result in negative genetic interactions, also known as synthetic lethality. We used a technique from graph theory to estimate the size of a minimal yeast genome from partial data on synthetic lethality. The number of potential synthetic lethal interactions grows very fast when multiple genes are deleted, revealing a paradoxical contrast with the experimental reductions of the yeast genome by ~100 genes, and of bacterial genomes by several hundreds of genes.
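The greedy vertex-cover idea named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `greedy_vertex_cover`, the alphabetical tie-breaking, and the toy gene labels are assumptions made for illustration. Each tuple lists genes whose joint deletion is lethal, so at least one gene from every tuple has to stay in the genome; the greedy rule repeatedly keeps the gene that appears in the most still-uncovered tuples.

```python
from collections import defaultdict


def greedy_vertex_cover(lethal_tuples):
    """Greedy approximation of a minimum vertex cover of a (hyper)graph.

    lethal_tuples: iterable of gene tuples whose joint deletion is lethal.
    Returns a set of genes hitting every tuple (approximate, not guaranteed minimal).
    """
    # Empty tuples cannot be covered by any gene, so they are skipped (assumption).
    uncovered = {frozenset(t) for t in lethal_tuples if t}
    cover = set()
    while uncovered:
        # Count how many uncovered tuples each gene participates in.
        counts = defaultdict(int)
        for edge in uncovered:
            for gene in edge:
                counts[gene] += 1
        # Pick the most frequent gene; ties go to the alphabetically first name.
        best = max(sorted(counts), key=counts.get)
        cover.add(best)
        uncovered = {edge for edge in uncovered if best not in edge}
    return cover


# Hypothetical toy data: four synthetic-lethal pairs and one lethal triplet.
toy_lethal_tuples = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("B", "C", "G")]
print(sorted(greedy_vertex_cover(toy_lethal_tuples)))  # → ['A', 'B', 'E']
```

In this toy case three genes must be retained even though no single deletion is lethal on its own, which is the effect the abstract describes at genome scale.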
Affiliation(s)
- Sara Rahiminejad
  - Department of Bioengineering, University of California—San Diego, La Jolla, California, USA
- Bianca De Sanctis
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Ecology and Evolutionary Biology, University of California—Santa Cruz, Santa Cruz, California, USA
- Pavel Pevzner
  - Department of Computer Science and Engineering, University of California—San Diego, La Jolla, California, USA
- Arcady Mushegian
  - Molecular and Cellular Biosciences Division, National Science Foundation, Alexandria, Virginia, USA
  - Clare Hall College, Cambridge, United Kingdom
3
Campos LRS, Trefflich S, Morais DAA, Imparato DO, Chagas VS, Albanus RD, Dalmolin RJS, Castro MAA. Bridge: A New Algorithm for Rooting Orthologous Genes in Large-Scale Evolutionary Analyses. Mol Biol Evol 2024; 41:msae019. [PMID: 38306290 PMCID: PMC10873778 DOI: 10.1093/molbev/msae019] [Received: 03/15/2023] [Revised: 01/21/2024] [Accepted: 01/29/2024] [Indexed: 02/04/2024]
Abstract
Orthology information has been used to search for patterns in high-dimensional data, allowing functional information to be transferred between species. The key concept behind this strategy is that orthologous genes share ancestry to some extent. While reconstructing the history of a single gene is feasible with the existing computational resources, the reconstruction of entire biological systems remains challenging. In this study, we present Bridge, a new algorithm designed to infer the evolutionary root of orthologous genes in large-scale evolutionary analyses. The Bridge algorithm infers the evolutionary root of a given gene based on the distribution of its orthologs in a species tree. The Bridge algorithm is implemented in R and can be used either to assess genetic changes across the evolutionary history of orthologous groups or to infer the onset of specific traits in a biological system.
Affiliation(s)
- Leonardo R S Campos
  - Bioinformatics Multidisciplinary Environment–BioME, IMD, Federal University of Rio Grande do Norte, Natal, Brazil
- Sheyla Trefflich
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
- Diego A A Morais
  - Bioinformatics Multidisciplinary Environment–BioME, IMD, Federal University of Rio Grande do Norte, Natal, Brazil
- Danilo O Imparato
  - Bioinformatics Multidisciplinary Environment–BioME, IMD, Federal University of Rio Grande do Norte, Natal, Brazil
- Vinicius S Chagas
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
- Rodrigo J S Dalmolin
  - Bioinformatics Multidisciplinary Environment–BioME, IMD, Federal University of Rio Grande do Norte, Natal, Brazil
  - Department of Biochemistry, CB, Federal University of Rio Grande do Norte, Natal, Brazil
- Mauro A A Castro
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
4
Altenhoff AM, Warwick Vesztrocy A, Bernard C, Train CM, Nicheperovich A, Prieto Baños S, Julca I, Moi D, Nevers Y, Majidian S, Dessimoz C, Glover NM. OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem. Nucleic Acids Res 2024; 52:D513-D521. [PMID: 37962356 PMCID: PMC10767875 DOI: 10.1093/nar/gkad1020] [Received: 09/15/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023]
Abstract
In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Alex Warwick Vesztrocy
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Charles Bernard
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Clement-Marie Train
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Alina Nicheperovich
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Silvia Prieto Baños
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Irene Julca
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- David Moi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
5
Liu Q, Ye L, Li M, Wang Z, Xiong G, Ye Y, Tu T, Schwarzacher T, Heslop-Harrison JSP. Genome-wide expansion and reorganization during grass evolution: from 30 Mb chromosomes in rice and Brachypodium to 550 Mb in Avena. BMC Plant Biology 2023; 23:627. [PMID: 38062402 PMCID: PMC10704644 DOI: 10.1186/s12870-023-04644-7] [Received: 08/31/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND The BOP (Bambusoideae, Oryzoideae, and Pooideae) clade of the Poaceae has a common ancestor, with similarities to the genomes of rice, Oryza sativa (2n = 24; genome size 389 Mb) and Brachypodium, Brachypodium distachyon (2n = 10; 271 Mb). We exploit chromosome-scale genome assemblies to show the nature of genomic expansion, structural variation, and chromosomal rearrangements from rice and Brachypodium, to diploids in the tribe Aveneae (e.g., Avena longiglumis, 2n = 2x = 14; 3,961 Mb assembled to 3,850 Mb in chromosomes). RESULTS Most of the Avena chromosome arms show relatively uniform expansion over the 10-fold to 15-fold genome-size increase. Apart from non-coding sequence diversification and accumulation around the centromeres, blocks of genes are not interspersed with blocks of repeats, even in subterminal regions. As in the tribe Triticeae, blocks of conserved synteny are seen between the analyzed species with chromosome fusion, fission, and nesting (insertion) events showing deep evolutionary conservation of chromosome structure during genomic expansion. Unexpectedly, the terminal gene-rich chromosomal segments (representing about 50 Mb) show translocations between chromosomes during speciation, with homogenization of genome-specific repetitive elements within the tribe Aveneae. Newly-formed intergenomic translocations of similar extent are found in the hexaploid A. sativa. CONCLUSIONS The study provides insight into evolutionary mechanisms and speciation in the BOP clade, which is valuable for measurement of biodiversity, development of a clade-wide pangenome, and exploitation of genomic diversity through breeding programs in Poaceae.
Affiliation(s)
- Qing Liu
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
  - Center for Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, 510650, China
- Lyuhan Ye
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Mingzhi Li
  - Bio&Data Biotechnologies Co. Ltd, Guangzhou, 510663, China
- Ziwei Wang
  - Henry Fok School of Biology and Agriculture, Shaoguan University, Shaoguan, 512005, China
- Gui Xiong
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yushi Ye
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
- Tieyao Tu
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
  - Center for Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, 510650, China
- Trude Schwarzacher
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - Department of Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, Leicester, LE1 7RH, UK
- John Seymour Pat Heslop-Harrison
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - Department of Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, Leicester, LE1 7RH, UK
6
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Philipp E Bayer
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
7
Ceron-Noriega A, Schoonenberg VAC, Butter F, Levin M. AlexandrusPS: A User-Friendly Pipeline for the Automated Detection of Orthologous Gene Clusters and Subsequent Positive Selection Analysis. Genome Biol Evol 2023; 15:evad187. [PMID: 37831426 PMCID: PMC10612477 DOI: 10.1093/gbe/evad187] [Received: 07/03/2023] [Revised: 09/26/2023] [Accepted: 10/06/2023] [Indexed: 10/14/2023]
Abstract
The detection of adaptive selection in a system approach considering all protein-coding genes allows for the identification of mechanisms and pathways that enabled adaptation to different environments. Currently, available programs for the estimation of positive selection signals can be divided into two groups. They are either easy to apply but can analyze only one gene family at a time, restricting system analysis; or they can handle larger cohorts of gene families, but require considerable prerequisite data such as orthology associations, codon alignments, phylogenetic trees, and proper configuration files. All these steps require extensive computational expertise, restricting this endeavor to specialists. Here, we introduce AlexandrusPS, a high-throughput pipeline that overcomes technical challenges when conducting transcriptome-wide positive selection analyses on large sets of nucleotide and protein sequences. The pipeline streamlines 1) the execution of an accurate orthology prediction as a precondition for positive selection analysis, 2) preparing and organizing configuration files for CodeML, 3) performing positive selection analysis using CodeML, and 4) generating an output that is easy to interpret, including all maximum likelihood and log-likelihood test results. The only input needed from the user is the CDS and peptide FASTA files of proteins of interest. The pipeline is provided in a Docker image, requiring no program or module installation, enabling the application of the pipeline in any computing environment. AlexandrusPS and its documentation are available via GitHub (https://github.com/alejocn5/AlexandrusPS).
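The log-likelihood tests mentioned in this abstract compare a null model with an alternative that permits positive selection. The sketch below shows the generic arithmetic of such a likelihood-ratio test and is not taken from AlexandrusPS or CodeML: the function name `lrt_two_df`, the log-likelihood values, and the assumption that the two nested models differ by exactly two free parameters are all illustrative. With two degrees of freedom, the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by two parameters (assumed).

    lnl_null, lnl_alt: maximised log-likelihoods of the null and alternative models.
    Returns (test statistic, p-value under a chi-square distribution with df = 2).
    """
    # The statistic is twice the log-likelihood gain; clamp tiny negatives to zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function for df = 2 is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for one gene family.
stat, p_value = lrt_two_df(lnl_null=-1234.5, lnl_alt=-1228.5)
print(f"2*dlnL = {stat:.1f}, p = {p_value:.4f}")
```

In a genome-wide run the resulting p-values would normally also be corrected for multiple testing; that step is omitted here.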
Affiliation(s)
- Alejandro Ceron-Noriega
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Department of Human Genetics, Mainz, Germany
- Vivien A C Schoonenberg
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Present address: Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA
- Falk Butter
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
  - Institute of Molecular Virology and Cell Biology, Friedrich-Loeffler-Institute, Greifswald, Germany
- Michal Levin
  - Institute of Molecular Biology (IMB), Quantitative Proteomics, Mainz, Germany
8
Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol 2023; 21:e3002222. [PMID: 37552676 PMCID: PMC10409296 DOI: 10.1371/journal.pbio.3002222] [Received: 01/12/2023] [Accepted: 06/27/2023] [Indexed: 08/10/2023]
Abstract
The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.
Affiliation(s)
- João J. Rocha
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Tim J. Stevens
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Rajen D. Shah
  - Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
- Sahar Emran
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Cristina Robles
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Matthew Freeman
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Sean Munro
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
9
Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023; 39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023]
Abstract
MOTIVATION Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. RESULTS Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
Affiliation(s)
- Julian Dosch
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Holger Bergmann
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Vinh Tran
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, 60325, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt, 60325, Germany
10
Mize TJ, Funkhouser SA, Buck JM, Stitzel JA, Ehringer MA, Evans LM. Testing Association of Previously Implicated Gene Sets and Gene-Networks in Nicotine Exposed Mouse Models with Human Smoking Phenotypes. Nicotine Tob Res 2023; 25:1030-1038. [PMID: 36444815 PMCID: PMC10077928 DOI: 10.1093/ntr/ntac269] [Received: 09/30/2021] [Revised: 08/15/2022] [Accepted: 11/23/2022] [Indexed: 11/30/2022]
Abstract
INTRODUCTION Smoking behaviors are partly heritable, yet the genetic and environmental mechanisms underlying smoking phenotypes are not fully understood. Developmental nicotine exposure (DNE) is a significant risk factor for smoking and leads to gene expression changes in mouse models; however, it is unknown whether the same genes whose expression is impacted by DNE are also those underlying smoking genetic liability. We examined whether genes whose expression changes in D1-type striatal medium spiny neurons due to DNE in the mouse are also associated with human smoking behaviors. METHODS Specifically, we assessed whether human orthologs of mouse-identified genes, either individually or as a set, were genetically associated with five human smoking traits using MAGMA and S-LDSC while implementing a novel expression-based gene-SNP annotation methodology. RESULTS We found no strong evidence that these gene sets were more strongly associated with smoking behaviors than the rest of the genome, but ten of these individual genes were significantly associated with three of the five human smoking traits examined (p < 2.5e-6). Three of these genes have not been reported previously and were discovered only when implementing the expression-based annotation. CONCLUSIONS These results suggest the genes whose expression is impacted by DNE in mice are largely distinct from those contributing to smoking genetic liability in humans. However, examining a single mouse neuronal cell type may be too fine a resolution for comparison, suggesting that experimental manipulation of nicotine consumption, reward, or withdrawal in mice may better capture genes related to the complex genetics of human tobacco use.
IMPLICATIONS Genes whose expression is impacted by DNE in mouse D1-type striatal medium spiny neurons were not found to be, as a whole, more strongly associated with human smoking behaviors than the rest of the genome, though ten individual mouse-identified genes were associated with human smoking traits. This suggests little overlap between the genetic mechanisms impacted by DNE and those influencing heritable liability to smoking phenotypes in humans. Further research is warranted to characterize how developmental nicotine exposure paradigms in mice can be translated to understand nicotine use in humans and their heritable effects on smoking.
Affiliation(s)
- Travis J Mize
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Scott A Funkhouser
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
- Jordan M Buck
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Integrative Physiology, University of Colorado, Boulder, CO, USA
- Jerry A Stitzel
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Integrative Physiology, University of Colorado, Boulder, CO, USA
- Marissa A Ehringer
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Integrative Physiology, University of Colorado, Boulder, CO, USA
- Luke M Evans
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
11
One genome, multiple phenotypes: decoding the evolution and mechanisms of environmentally induced developmental plasticity in insects. Biochem Soc Trans 2023; 51:675-689. [PMID: 36929376 DOI: 10.1042/bst20210995] [Received: 10/24/2022] [Revised: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 03/18/2023]
Abstract
Plasticity in developmental processes gives rise to remarkable environmentally induced phenotypes. Some of the most striking and well-studied examples of developmental plasticity are seen in insects. For example, beetle horn size responds to nutritional state, butterfly eyespots are enlarged in response to temperature and humidity, and environmental cues also give rise to the queen and worker castes of eusocial insects. These phenotypes arise from essentially identical genomes in response to an environmental cue during development. Developmental plasticity is taxonomically widespread, affects individual fitness, and may act as a rapid-response mechanism allowing individuals to adapt to changing environments. Despite the importance and prevalence of developmental plasticity, there remains scant mechanistic understanding of how it works or evolves. In this review, we use key examples to discuss what is known about developmental plasticity in insects and identify fundamental gaps in the current knowledge. We highlight the importance of working towards a fully integrated understanding of developmental plasticity in a diverse range of species. Furthermore, we advocate for the use of comparative studies in an evo-devo framework to address how developmental plasticity works and how it evolves.
|
12
|
Opazo JC, Vandewege MW, Hoffmann FG, Zavala K, Meléndez C, Luchsinger C, Cavieres VA, Vargas-Chacoff L, Morera FJ, Burgos PV, Tapia-Rojas C, Mardones GA. How Many Sirtuin Genes Are Out There? Evolution of Sirtuin Genes in Vertebrates With a Description of a New Family Member. Mol Biol Evol 2023; 40:6993039. [PMID: 36656997 PMCID: PMC9897032 DOI: 10.1093/molbev/msad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 12/21/2022] [Accepted: 01/10/2023] [Indexed: 01/20/2023] Open
Abstract
Studying the evolutionary history of gene families is a challenging and exciting task with a wide range of implications. In addition to exploring fundamental questions about the origin and evolution of genes, disentangling their evolution is also critical for those conducting functional/structural studies, as it allows a deeper and more precise interpretation of their results in an evolutionary context. The sirtuin gene family is a group of genes that are involved in a variety of biological functions mostly related to aging. Their duplicative history remains an open question, as does the definition of the repertoire of sirtuin genes among vertebrates. Our results show a well-resolved phylogeny that represents an improvement in our understanding of the duplicative history of the sirtuin gene family. We identified a new sirtuin gene family member (SIRT3.2) that was apparently lost in the last common ancestor of amniotes but retained in all other groups of jawed vertebrates. According to our experimental analyses, elephant shark SIRT3.2 protein is located in mitochondria, and its overexpression leads to an increase in cellular levels of ATP. Moreover, in vitro analysis demonstrated that it has deacetylase activity that is modulated in a similar way to that of mammalian SIRT3. Our results indicate that there are at least eight sirtuin paralogs among vertebrates and that all of them can be traced back to the last common ancestor of the group, which existed between 676 and 615 million years ago.
Affiliation(s)
- Michael W Vandewege
- College of Veterinary Medicine, North Carolina State University, Raleigh, NC
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Starkville, MS,Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS
- Kattina Zavala
- Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia, Chile
- Catalina Meléndez
- Centro de Biología Celular y Biomedicina (CEBICEM), Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile
- Charlotte Luchsinger
- Department of Physiology, School of Medicine, Universidad Austral de Chile, Valdivia, Chile
- Viviana A Cavieres
- Centro de Biología Celular y Biomedicina (CEBICEM), Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile
- Luis Vargas-Chacoff
- Integrative Biology Group, Universidad Austral de Chile, Valdivia, Chile,Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile,Centro Fondap de Investigación de Altas Latitudes (IDEAL), Universidad Austral de Chile, Valdivia, Chile,Millennium Institute Biodiversity of Antarctic and Subantarctic Ecosystems, BASE, Universidad Austral de Chile, Valdivia, Chile
- Francisco J Morera
- Integrative Biology Group, Universidad Austral de Chile, Valdivia, Chile,Applied Biochemistry Laboratory, Facultad de Ciencias Veterinarias, Instituto de Farmacología y Morfofisiología, Universidad Austral de Chile, Valdivia, Chile
- Patricia V Burgos
- Centro de Biología Celular y Biomedicina (CEBICEM), Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile,Centro Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile,Centro de Envejecimiento y Regeneración (CARE-UC), Facultad de Ciencias Biológicas, Pontificia Universidad Católica, Santiago, Chile
- Cheril Tapia-Rojas
- Centro de Biología Celular y Biomedicina (CEBICEM), Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile,Centro Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
|
13
|
McCarthy CGP, Mulhair PO, Siu-Ting K, Creevey CJ, O’Connell MJ. Improving Orthologous Signal and Model Fit in Datasets Addressing the Root of the Animal Phylogeny. Mol Biol Evol 2023; 40:6989790. [PMID: 36649189 PMCID: PMC9848061 DOI: 10.1093/molbev/msac276] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 12/19/2022] [Accepted: 12/23/2022] [Indexed: 01/18/2023] Open
Abstract
There is conflicting evidence as to whether Porifera (sponges) or Ctenophora (comb jellies) comprise the root of the animal phylogeny. Support for either a Porifera-sister or Ctenophora-sister tree has been extensively examined in the context of model selection, taxon sampling, and outgroup selection. The influence of dataset construction is comparatively understudied. We re-examine five animal phylogeny datasets that have supported either root hypothesis using an approach designed to enrich orthologous signal in phylogenomic datasets. We find that many component orthogroups in animal datasets fail to recover major lineages as monophyletic with the exception of Ctenophora, regardless of the supported root. Enriching these datasets to retain orthogroups recovering ≥3 major lineages reduces dataset size by up to 50% while retaining underlying phylogenetic information and taxon sampling. Site-heterogeneous phylogenomic analysis of these enriched datasets recovers both Porifera-sister and Ctenophora-sister positions, even with additional constraints on outgroup sampling. Two datasets which previously supported Ctenophora-sister support Porifera-sister upon enrichment. All enriched datasets display improved model fitness under posterior predictive analysis. While not conclusively rooting animals at either Porifera or Ctenophora, we do see an increase in signal for Porifera-sister and a decrease in signal for Ctenophora-sister when data are filtered for orthologous signal. Our results indicate that dataset size and construction as well as model fit influence animal root inference.
Affiliation(s)
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
|
14
|
Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022] Open
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Paul Zaharias
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
|
15
|
Doyle JJ. Cell types as species: Exploring a metaphor. FRONTIERS IN PLANT SCIENCE 2022; 13:868565. [PMID: 36072310 PMCID: PMC9444152 DOI: 10.3389/fpls.2022.868565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/02/2022] [Accepted: 07/29/2022] [Indexed: 06/05/2023]
Abstract
The concept of "cell type," though fundamental to cell biology, is controversial. Cells have historically been classified into types based on morphology, physiology, or location. More recently, single cell transcriptomic studies have revealed fine-scale differences among cells with similar gross phenotypes. Transcriptomic snapshots of cells at various stages of differentiation, and of cells under different physiological conditions, have shown that in many cases variation is more continuous than discrete, raising questions about the relationship between cell type and cell state. Some researchers have rejected the notion of fixed types altogether. Throughout the history of discussions on cell type, cell biologists have compared the problem of defining cell type with the interminable and often contentious debate over the definition of arguably the most important concept in systematics and evolutionary biology, "species." In the last decades, systematics, like cell biology, has been transformed by the increasing availability of molecular data, and the fine-grained resolution of genetic relationships has generated new ideas about how that variation should be classified. There are numerous parallels between the two fields that make exploration of the "cell types as species" metaphor timely. These parallels begin with philosophy, with discussion of both cell types and species as being either individuals, groups, or something in between (e.g., homeostatic property clusters). In each field there are various types of lineages that form trees or networks that can (and in some cases do) provide criteria for grouping. Developing and refining models for evolutionary divergence of species and for cell type differentiation are parallel goals of the two fields. The goal of this essay is to highlight such parallels with the hope of inspiring biologists in both fields to look for new solutions to similar problems outside of their own field.
Affiliation(s)
- Jeff J. Doyle
- Section of Plant Biology and Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
|
16
|
Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages. Genome Biol Evol 2022; 14:evac119. [PMID: 35880421 PMCID: PMC9366629 DOI: 10.1093/gbe/evac119] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 07/11/2022] [Indexed: 12/02/2022] Open
Abstract
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but it included only 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Affiliation(s)
- Mario A Cerón-Romero
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Miguel M Fonseca
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Leonardo de Oliveira Martins
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Quadram Institute Bioscience, Norwich, United Kingdom
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
|
17
|
Hatami E, Jones KE, Kilian N. New Insights Into the Relationships Within Subtribe Scorzonerinae (Cichorieae, Asteraceae) Using Hybrid Capture Phylogenomics (Hyb-Seq). FRONTIERS IN PLANT SCIENCE 2022; 13:851716. [PMID: 35873957 PMCID: PMC9298463 DOI: 10.3389/fpls.2022.851716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 05/18/2022] [Indexed: 06/15/2023]
Abstract
Subtribe Scorzonerinae (Cichorieae, Asteraceae) contains 12 main lineages and approximately 300 species. Relationships within the subtribe, either at inter- or intrageneric levels, were largely unresolved in phylogenetic studies to date, due to the lack of phylogenetic signal provided by traditional Sanger sequencing markers. In this study, we employed a phylogenomics approach (Hyb-Seq) that targets 1,061 nuclear-conserved ortholog loci designed for Asteraceae and obtained chloroplast coding regions as a by-product of off-target reads. Our objectives were to evaluate the potential of the Hyb-Seq approach in resolving the phylogenetic relationships across the subtribe at deep and shallow nodes, investigate the relationships of major lineages at inter- and intrageneric levels, and examine the impact of the different datasets and approaches on the robustness of phylogenetic inferences. We analyzed three nuclear datasets: exon only, excluding all potentially paralogous loci; exon only, including loci that were only potentially paralogous in 1-3 samples; exon plus intron regions (supercontigs); and the plastome CDS region. Phylogenetic relationships were reconstructed using both multispecies coalescent and concatenation (Maximum Likelihood and Bayesian analyses) approaches. Overall, our phylogenetic reconstructions recovered the same monophyletic major lineages found in previous studies and were successful in fully resolving the backbone phylogeny of the subtribe, while the internal resolution of the lineages was comparatively poor. The backbone topologies were largely congruent among all inferences, but some incongruent relationships were recovered between nuclear and plastome datasets, which are discussed and assumed to represent cases of cytonuclear discordance. Considering the newly resolved phylogenies, a new infrageneric classification of Scorzonera in its revised circumscription is proposed.
Affiliation(s)
- Elham Hatami
- Department of Biology, Faculty of Science, Shahid Bahonar University of Kerman, Kerman, Iran
- Katy E. Jones
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Norbert Kilian
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
|
18
|
Foley S, Vlasova A, Marcet-Houben M, Gabaldón T, Hinman VF. Evolutionary analyses of genes in Echinodermata offer insights towards the origin of metazoan phyla. Genomics 2022; 114:110431. [PMID: 35835427 PMCID: PMC9552553 DOI: 10.1016/j.ygeno.2022.110431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2021] [Revised: 05/10/2022] [Accepted: 07/06/2022] [Indexed: 11/24/2022]
Abstract
Despite recent studies discussing the evolutionary impacts of gene duplications and losses among metazoans, the genomic basis for the evolution of phyla remains enigmatic. Here, we employ phylogenomic approaches to search for orthologous genes without known functions among echinoderms, and subsequently use them to guide the identification of their homologs across other metazoans. Our final set of 14 genes was obtained using a suite of homology prediction tools, gene expression data, gene ontology, and a newly generated Strongylocentrotus purpuratus phylome. The gene set was subjected to selection pressure analyses, which indicated that the genes are highly conserved and under negative selection. Their presence across broad taxonomic depths suggests that genes required to form a phylum are ancestral to that phylum. Therefore, rather than de novo gene genesis, we posit that evolutionary forces such as selection on existing genomic elements over large timescales may drive divergence and contribute to the emergence of phyla.
Affiliation(s)
- Saoirse Foley
- Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; Echinobase #6-46, Mellon Institute, 4400 Fifth Ave, Pittsburgh, PA 15213, USA.
- Anna Vlasova
- Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Marina Marcet-Houben
- Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, 29, 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Veronica F Hinman
- Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; Echinobase #6-46, Mellon Institute, 4400 Fifth Ave, Pittsburgh, PA 15213, USA
|
19
|
Mining of Cloned Disease Resistance Gene Homologs (CDRHs) in Brassica Species and Arabidopsis thaliana. BIOLOGY 2022; 11:biology11060821. [PMID: 35741342 PMCID: PMC9220128 DOI: 10.3390/biology11060821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/16/2022] [Revised: 05/15/2022] [Accepted: 05/24/2022] [Indexed: 01/23/2023]
Abstract
Simple Summary: Developing cultivars with resistance genes (R genes) is an effective strategy to support high yield and quality in Brassica crops. The availability of cloned R gene and genomic sequences in Brassica species and Arabidopsis thaliana provides the opportunity to compare genomic regions and survey R genes across genomic databases. In this paper, we aim to identify genes related to cloned genes through sequence identity, providing a repertoire of species-wide related R genes in Brassica crops. The comprehensive list of candidate R genes can be used as a reference for functional analysis.
Abstract: Various diseases severely affect Brassica crops, leading to significant global yield losses and a reduction in crop quality. In this study, we used the complete protein sequences of 49 cloned resistance genes (R genes) that confer resistance to fungal and bacterial diseases known to impact species in the Brassicaceae family. Homology searches were carried out across Brassica napus, B. rapa, B. oleracea, B. nigra, B. juncea, B. carinata and Arabidopsis thaliana genomes. In total, 660 cloned disease R gene homologs (CDRHs) were identified across the seven species, including 431 resistance gene analogs (RGAs) (248 nucleotide binding site-leucine rich repeats (NLRs), 150 receptor-like protein kinases (RLKs) and 33 receptor-like proteins (RLPs)) and 229 non-RGAs. Based on the position and distribution of specific homologs in each of the species, we observed a total of 87 CDRH clusters composed of 36 NLR, 16 RLK and 3 RLP homogeneous clusters and 32 heterogeneous clusters. The CDRHs detected consistently across the seven species are candidates that can be investigated for broad-spectrum resistance, potentially providing resistance to multiple pathogens. The R genes identified in this study provide a novel resource for the future functional analysis and gene cloning of Brassicaceae R genes towards crop improvement.
|
20
|
Nevers Y, Jones TEM, Jyothi D, Yates B, Ferret M, Portell-Silva L, Codo L, Cosentino S, Marcet-Houben M, Vlasova A, Poidevin L, Kress A, Hickman M, Persson E, Piližota I, Guijarro-Clarke C, Iwasaki W, Lecompte O, Sonnhammer E, Roos DS, Gabaldón T, Thybert D, Thomas PD, Hu Y, Emms DM, Bruford E, Capella-Gutierrez S, Martin MJ, Dessimoz C, Altenhoff A. The Quest for Orthologs orthology benchmark service in 2022. Nucleic Acids Res 2022; 50:W623-W632. [PMID: 35552456 PMCID: PMC9252809 DOI: 10.1093/nar/gkac330] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/23/2022] [Revised: 04/07/2022] [Accepted: 04/30/2022] [Indexed: 11/15/2022] Open
Abstract
The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analyses) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to keeping the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
Affiliation(s)
- Yannis Nevers
- To whom correspondence should be addressed. Tel: +41 21 692 5449;
- Tamsin E M Jones
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Dushyanth Jyothi
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Bethan Yates
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Meritxell Ferret
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Laura Portell-Silva
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Laia Codo
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain
- Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
- Marina Marcet-Houben
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain,Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Anna Vlasova
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain,Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Laetitia Poidevin
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France,BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France,BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Mark Hickman
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Emma Persson
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Guijarro-Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan,Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- David S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS). Plaça Eusebi Güell, 1-3 08034 Barcelona, Spain,Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain,Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain,Centro de Investigaciones Biomédicas en Red de Enfermedades Infecciosas, Barcelona, Spain
- David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA
- Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- David M Emms
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
- Elspeth Bruford
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK,Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Maria J Martin
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland,Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland,Department of Computer Science, University College London, London, UK,Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Adrian Altenhoff
- Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland,Computer Science Department, ETH Zurich, Zurich, Switzerland
|
21
|
Crow M, Suresh H, Lee J, Gillis J. Coexpression reveals conserved gene programs that co-vary with cell type across kingdoms. Nucleic Acids Res 2022; 50:4302-4314. [PMID: 35451481 PMCID: PMC9071420 DOI: 10.1093/nar/gkac276] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/29/2021] [Revised: 03/30/2022] [Accepted: 04/08/2022] [Indexed: 12/24/2022] Open
Abstract
What makes a mouse a mouse, and not a hamster? Differences in gene regulation between the two organisms play a critical role. Comparative analysis of gene coexpression networks provides a general framework for investigating the evolution of gene regulation across species. Here, we compare coexpression networks from 37 species and quantify the conservation of gene activity 1) as a function of evolutionary time, 2) across orthology prediction algorithms, and 3) with reference to cell- and tissue-specificity. We find that ancient genes are expressed in multiple cell types and have well-conserved coexpression patterns; however, they are expressed at different levels across cell types. Thus, differential regulation of ancient gene programs contributes to transcriptional cell identity. We propose that this differential regulation may play a role in cell diversification in both the animal and plant kingdoms.
Affiliation(s)
- Megan Crow
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
| | - Hamsini Suresh
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
| | - John Lee
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
| | - Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor NY, USA
| |
|
22
|
Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci 2022; 31:8-22. [PMID: 34717010 PMCID: PMC8740835 DOI: 10.1002/pro.4218] [Citation(s) in RCA: 528] [Impact Index Per Article: 264.0] [Received: 08/26/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023]
Abstract
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetics analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
Affiliation(s)
- Paul D. Thomas
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Dustin Ebert
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Anushya Muruganujan
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Tremayne Mushayahama
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Laurent‐Philippe Albou
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
| |
|
23
|
Morales-Briones DF, Gehrke B, Huang CH, Liston A, Ma H, Marx HE, Tank DC, Yang Y. Analysis of Paralogs in Target Enrichment Data Pinpoints Multiple Ancient Polyploidy Events in Alchemilla s.l. (Rosaceae). Syst Biol 2021; 71:190-207. [PMID: 33978764 PMCID: PMC8677558 DOI: 10.1093/sysbio/syab032] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/01/2020] [Revised: 04/28/2021] [Accepted: 05/03/2021] [Indexed: 12/16/2022] Open
Abstract
Target enrichment is becoming increasingly popular for phylogenomic studies. Although baits for enrichment are typically designed to target single-copy genes, paralogs are often recovered with increased sequencing depth, sometimes from a significant proportion of loci, especially in groups experiencing whole-genome duplication (WGD) events. Common approaches for processing paralogs in target enrichment data sets include random selection, manual pruning, and mainly, the removal of entire genes that show any evidence of paralogy. These approaches are prone to errors in orthology inference or removing large numbers of genes. By removing entire genes, valuable information that could be used to detect and place WGD events is discarded. Here, we used an automated approach for orthology inference in a target enrichment data set of 68 species of Alchemilla s.l. (Rosaceae), a widely distributed clade of plants primarily from temperate climate regions. Previous molecular phylogenetic studies and chromosome numbers both suggested ancient WGDs in the group. However, both the phylogenetic location and putative parental lineages of these WGD events remain unknown. By taking paralogs into consideration and inferring orthologs from target enrichment data, we identified four nodes in the backbone of Alchemilla s.l. with an elevated proportion of gene duplication. Furthermore, using a gene-tree reconciliation approach, we established the autopolyploid origin of the entire Alchemilla s.l. and the nested allopolyploid origin of four major clades within the group. Here, we showed the utility of automated tree-based orthology inference methods, previously designed for genomic or transcriptomic data sets, to study complex scenarios of polyploidy and reticulate evolution from target enrichment data sets.[Alchemilla; allopolyploidy; autopolyploidy; gene tree discordance; orthology inference; paralogs; Rosaceae; target enrichment; whole genome duplication.].
Affiliation(s)
- Diego F Morales-Briones
- Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
- Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID 83844, USA
| | - Berit Gehrke
- University Gardens, University Museum, University of Bergen, Mildeveien 240, 5259 Hjellestad, Norway
| | - Chien-Hsun Huang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center of Genetics and Development, Ministry of Education Key Laboratory of Biodiversity and Ecological Engineering, Institute of Plant Biology, Center of Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
| | - Aaron Liston
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331, USA
| | - Hong Ma
- Department of Biology, the Huck Institute of the Life Sciences, the Pennsylvania State University, 510D Mueller Laboratory, University Park, PA 16802, USA
| | - Hannah E Marx
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109-1048, USA
- Museum of Southwestern Biology and Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
| | - David C Tank
- Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID 83844, USA
| | - Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
| |
|
24
|
Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 2021; 38:5825-5829. [PMID: 34597405 PMCID: PMC8662613 DOI: 10.1093/molbev/msab293] [Citation(s) in RCA: 1189] [Impact Index Per Article: 396.3] [Indexed: 12/13/2022] Open
Abstract
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.
Affiliation(s)
- Carlos P Cantalapiedra
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| | - Ana Hernández-Plaza
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| | | | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, Seoul, South Korea
| | - Jaime Huerta-Cepas
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| |
|
25
|
Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 2021; 38:5825-5829. [PMID: 34597405 DOI: 10.1101/2021.06.03.446934] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Indexed: 05/23/2023] Open
Abstract
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.
Affiliation(s)
- Carlos P Cantalapiedra
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| | - Ana Hernández-Plaza
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| | | | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, Seoul, South Korea
| | - Jaime Huerta-Cepas
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
| |
|
26
|
Stein WD, Hoshen MB. During evolution from the earliest tetrapoda, newly-recruited genes are increasingly paralogues of existing genes and distribute non-randomly among the chromosomes. BMC Genomics 2021; 22:794. [PMID: 34736418 PMCID: PMC8570013 DOI: 10.1186/s12864-021-08066-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2020] [Accepted: 09/28/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The present availability of full genome sequences of a broad range of animal species across the whole range of evolutionary history enables one to ask questions as to the distribution of genes across the chromosomes. Do newly recruited genes, as new clades emerge, distribute at random or at non-random locations? RESULTS: We extracted values for the ages of the human genes and for their current chromosome locations, from published sources. A quantitative analysis showed that the distribution of newly-added genes among and within the chromosomes appears to be increasingly non-random if one observes animals along the evolutionary series from the precursors of the tetrapoda through to the great apes, whereas the oldest genes are randomly distributed. CONCLUSIONS: Randomization will result from chromosome evolution, but less and less time is available for this process as evolution proceeds. Much of the bunching of recently-added genes arises from new gene formation as paralogues in gene families, near the location of genes that were recruited in the preceding phylostratum. As examples we cite the KRTAP, ZNF, OR and some minor gene families. We show that bunching can also result from the evolution of the chromosomes themselves when, as for the KRTAP genes, blocks of genes that had previously been on disparate chromosomes become linked together.
Affiliation(s)
- Wilfred D Stein
- Silberman Institute of Life Sciences, Hebrew University, 91904, Jerusalem, Israel
| | - Moshe B Hoshen
- Bioinformatics Department, Jerusalem College of Technology, Tal Campus, Beit HaDfus 7, 95483, Jerusalem, Israel
| |
|
27
|
Prabh N, Tautz D. Frequent lineage-specific substitution rate changes support an episodic model for protein evolution. G3-GENES GENOMES GENETICS 2021; 11:6372692. [PMID: 34542594 PMCID: PMC8664490 DOI: 10.1093/g3journal/jkab333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/13/2021] [Indexed: 12/04/2022]
Abstract
Since the inception of the molecular clock model for sequence evolution, the investigation of protein divergence has revolved around the question of a more or less constant change of amino acid sequences, with specific overall rates for each family. Although anomalies in clock-like divergence are well known, the assumption of a constant decay rate for a given protein family is usually taken as the null model for protein evolution. However, systematic tests of this null model at a genome-wide scale have lagged behind, despite the databases’ enormous growth. We focus here on divergence rate comparisons between very closely related lineages since this allows clear orthology assignments by synteny and reliable alignments, which are crucial for determining substitution rate changes. We generated a high-confidence dataset of syntenic orthologs from four ape species, including humans. We find that despite the appearance of an overall clock-like substitution pattern, several hundred protein families show lineage-specific acceleration and deceleration in divergence rates, or combinations of both in different lineages. Hence, our analysis uncovers a rather dynamic history of substitution rate changes, even between these closely related lineages, implying that one should expect that a large fraction of proteins will have had a history of episodic rate changes in deeper phylogenies. Furthermore, each of the lineages has a separate set of particularly fast diverging proteins. The genes with the highest percentage of branch-specific substitutions are ADCYAP1 in the human lineage (9.7%), CALU in chimpanzees (7.1%), SLC39A14 in the internal branch leading to humans and chimpanzees (4.1%), RNF128 in gorillas (9%), and S100Z in gibbons (15.2%). The mutational pattern in ADCYAP1 suggests a biased mutation process, possibly through asymmetric gene conversion effects. We conclude that a null model of constant change can be problematic for predicting the evolutionary trajectories of individual proteins.
Affiliation(s)
- Neel Prabh
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| |
|
28
|
Abstract
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
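The species overlap idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not Possvm's code: the nested-tuple tree encoding, the `species|gene` leaf labels, and the function names are assumptions made for illustration, and the MCL clustering step is left out entirely. A node whose two child subtrees share no species is treated as a speciation, so genes on opposite sides of it are called orthologs; a node whose children share a species is treated as a duplication.

```python
from itertools import product


def leaves(node):
    """Return all leaf labels under a node; a leaf is a 'species|gene' string."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def species_of(leaf):
    """Extract the species prefix from a hypothetical 'species|gene' label."""
    return leaf.split("|", 1)[0]


def ortholog_pairs(node, pairs=None):
    """Collect ortholog pairs from a binary gene tree using species overlap."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species_of(x) for x in left_leaves}
    right_species = {species_of(x) for x in right_leaves}
    if left_species.isdisjoint(right_species):
        # No shared species: treat the node as a speciation event,
        # so every cross-subtree gene pair is orthologous.
        for a, b in product(left_leaves, right_leaves):
            pairs.add(tuple(sorted((a, b))))
    # Shared species would mean a duplication; no pairs are added at that node.
    ortholog_pairs(left, pairs)
    ortholog_pairs(right, pairs)
    return pairs


# Toy tree: a duplication at the root followed by a speciation in each copy.
tree = (("human|A1", "mouse|A1"), ("human|A2", "mouse|A2"))
pairs = ortholog_pairs(tree)
```

In this toy tree the root is a duplication, so only the within-copy human and mouse genes are paired. A real pipeline would then cluster the resulting pair graph, which the abstract says Possvm does with MCL.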
Affiliation(s)
- Xavier Grau-Bové
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain
| | - Arnau Sebé-Pedrós
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain
| |
|
29
|
Mo N, Zhang X, Shi W, Yu G, Chen X, Yang JR. Bidirectional Genetic Control of Phenotypic Heterogeneity and Its Implication for Cancer Drug Resistance. Mol Biol Evol 2021; 38:1874-1887. [PMID: 33355660 PMCID: PMC8097262 DOI: 10.1093/molbev/msaa332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023] Open
Abstract
Negative genetic regulators of phenotypic heterogeneity, or phenotypic capacitors/stabilizers, elevate population average fitness by limiting deviation from the optimal phenotype and increase the efficacy of natural selection by enhancing the phenotypic differences among genotypes. Stabilizers can presumably be switched off to release phenotypic heterogeneity in the face of extreme or fluctuating environments to ensure population survival. This task could, however, also be achieved by positive genetic regulators of phenotypic heterogeneity, or "phenotypic diversifiers," as shown by recently reported evidence that a bacterial divisome factor enhances antibiotic resistance. We hypothesized that such active creation of phenotypic heterogeneity by diversifiers, which is functionally independent of stabilizers, is more common than previously recognized. Using morphological phenotypic data from 4,718 single-gene knockout strains of Saccharomyces cerevisiae, we systematically identified 324 stabilizers and 160 diversifiers and constructed a bipartite network between these genes and the morphological traits they control. Further analyses showed that, compared with stabilizers, diversifiers tended to be weaker and more promiscuous (regulating more traits) regulators targeting traits unrelated to fitness. Moreover, there is a general division of labor between stabilizers and diversifiers. Finally, by incorporating NCI-60 human cancer cell line anticancer drug screening data, we found that human one-to-one orthologs of yeast diversifiers/stabilizers likely regulate the anticancer drug resistance of human cancer cell lines, suggesting that these orthologs are potential targets for auxiliary treatments. Our study therefore highlights stabilizers and diversifiers as the genetic regulators for the bidirectional control of phenotypic heterogeneity as well as their distinct evolutionary roles and functional independence.
Affiliation(s)
- Ning Mo
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Xiaoyu Zhang
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Wenjun Shi
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Gongwang Yu
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Xiaoshu Chen
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Corresponding author
| | - Jian-Rong Yang
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, China
- RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Corresponding author
| |
|
30
|
Independent duplications of the Golgi phosphoprotein 3 oncogene in birds. Sci Rep 2021; 11:12483. [PMID: 34127736 PMCID: PMC8203631 DOI: 10.1038/s41598-021-91909-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Accepted: 06/02/2021] [Indexed: 02/05/2023] Open
Abstract
Golgi phosphoprotein 3 (GOLPH3) was the first reported oncoprotein of the Golgi apparatus. It was identified as an evolutionarily conserved protein upon its discovery about 20 years ago, but its function remains puzzling in normal and cancer cells. The GOLPH3 gene is part of a group of genes that also includes the GOLPH3L gene. Because cancer has deep roots in multicellular evolution, studying the evolution of the GOLPH3 gene family in non-model species represents an opportunity to identify new model systems that could help better understand the biology behind this group of genes. The main goal of this study is to explore the evolution of the GOLPH3 gene family in birds as a starting point to understand the evolutionary history of this oncoprotein. We identified a repertoire of three GOLPH3 genes in birds. We found duplicated copies of the GOLPH3 gene in all main groups of birds other than paleognaths, and a single copy of the GOLPH3L gene. We suggest there were at least three independent origins for GOLPH3 duplicates. Amino acid divergence estimates show that most of the variation is located in the N-terminal region of the protein. Our transcript abundance estimations show that one paralog is highly and ubiquitously expressed, and the others were variable. Our results are an example of the significance of understanding the evolution of the GOLPH3 gene family, especially for unraveling its structural and functional attributes.
|
31
|
Yates B, Gray KA, Jones TEM, Bruford EA. Updates to HCOP: the HGNC comparison of orthology predictions tool. Brief Bioinform 2021; 22:6265175. [PMID: 33959747 PMCID: PMC8574622 DOI: 10.1093/bib/bbab155] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/09/2020] [Revised: 03/19/2021] [Accepted: 04/02/2021] [Indexed: 11/15/2022] Open
Abstract
Multiple resources currently exist that predict orthologous relationships between genes. These resources differ both in the methodologies used and in the species they make predictions for. The HGNC Comparison of Orthology Predictions (HCOP) search tool integrates and displays data from multiple ortholog prediction resources for a specified human gene or set of genes. An indication of the reliability of a prediction is provided by the number of resources that support it. HCOP was originally designed to show orthology predictions between human and mouse but has been expanded to include data from a current total of 20 selected vertebrate and model organism species. The HCOP pipeline used to fetch and integrate the information from the disparate ortholog and nomenclature data resources has recently been rewritten, both to enable the inclusion of new data and to take advantage of modern web technologies. Data from HCOP are used extensively in our work naming genes as the Vertebrate Gene Nomenclature Committee (https://vertebrate.genenames.org).
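The reliability indicator described in this abstract, the number of resources supporting a prediction, amounts to a simple tally. The Python sketch below shows that idea only. The resource names, gene symbols, and the `support_counts` helper are hypothetical, and nothing here reflects HCOP's actual pipeline or data formats.

```python
from collections import Counter


def support_counts(predictions_by_resource):
    """Count how many resources predict each (human gene, ortholog) pair."""
    counts = Counter()
    for pairs in predictions_by_resource.values():
        # Use a set so a resource listing the same pair twice counts once.
        counts.update(set(pairs))
    return counts


# Hypothetical predictions from three made-up resources.
predictions = {
    "resource_a": [("GENE1", "Gene1"), ("GENE2", "Gene2")],
    "resource_b": [("GENE1", "Gene1")],
    "resource_c": [("GENE1", "Gene1"), ("GENE2", "Gene2b")],
}
support = support_counts(predictions)
```

Here the pair `("GENE1", "Gene1")` is supported by all three resources, while the two competing predictions for `GENE2` each have a single supporter, which would flag them as less reliable.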
Affiliation(s)
- Bethan Yates
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kristian A Gray
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tamsin E M Jones
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elspeth A Bruford
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
| |
|
32
|
Derelle R, Philippe H, Colbourne JK. Broccoli: Combining Phylogenetic and Network Analyses for Orthology Assignment. Mol Biol Evol 2021; 37:3389-3396. [PMID: 32602888 DOI: 10.1093/molbev/msaa159] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 12/17/2022] Open
Abstract
Orthology assignment is a key step of comparative genomic studies, for which many bioinformatic tools have been developed. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artifacts. In this article, we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups, and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultrafast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark data sets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.
Affiliation(s)
- Romain Derelle
- School of Biosciences, University of Birmingham, Birmingham, United Kingdom
| | - Hervé Philippe
- Station d'Ecologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Montréal, QC, Canada
| | - John K Colbourne
- School of Biosciences, University of Birmingham, Birmingham, United Kingdom
| |
|
33
|
Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 2021; 18:366-368. [PMID: 33828273 PMCID: PMC8026399 DOI: 10.1038/s41592-021-01101-x] [Citation(s) in RCA: 1051] [Impact Index Per Article: 350.3] [Received: 07/23/2020] [Accepted: 02/22/2021] [Indexed: 12/05/2022]
Abstract
We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP. An updated version of DIAMOND uses improved algorithmic procedures and a customized high-performance computing framework to make seemingly prohibitive large-scale protein sequence alignments feasible.
|
34
|
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Affiliation(s)
- Benjamin Linard
- LIRMM, University of Montpellier, CNRS, Montpellier, France; SPYGEN, Le Bourget-du-Lac, France
| | - Ingo Ebersberger
- Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
| | - Shawn E McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan; Blue Marble Space Institute of Science, Seattle, WA, USA
| | - Natasha Glover
- Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Tomohiro Mochizuki
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Yannis Nevers
- Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Department of Computer Science, University College London, London, United Kingdom; Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Ikuo Uchiyama
- Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
| | | |
|
35
|
Rossier V, Warwick Vesztrocy A, Robinson-Rechavi M, Dessimoz C. OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches. Bioinformatics 2021; 37:2866-2873. [PMID: 33787851 PMCID: PMC8479680 DOI: 10.1093/bioinformatics/btab219] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/06/2020] [Revised: 02/18/2021] [Accepted: 03/30/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive. RESULTS Here, we first show that in multiple animal and plant datasets, 18-62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. AVAILABILITY AND IMPLEMENTATION OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
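The idea in this abstract (mapping k-mers to ancestral subfamilies so that a query is not forced into an over-specific subfamily) can be illustrated with a toy sketch. This is not the OMAmer implementation or its API: the hierarchy `PARENT`, the reference sequences, all function names, and the `min_hits` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent); not taken from OMAmer.
PARENT = {"hb_alpha": "hb", "hb_beta": "hb", "hb": None}

# Hypothetical reference sequences, each labelled with its subfamily.
REFERENCE = [("MVLSPADKTN", "hb_alpha"), ("MVLSGEEKSA", "hb_beta")]


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ancestors(node):
    """Return the path from a node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    above_b = set(ancestors(b))
    return next(n for n in ancestors(a) if n in above_b)


def build_index(reference, k=3):
    """Map each k-mer to the deepest node covering every sequence that has it."""
    index = {}
    for seq, node in reference:
        for km in kmers(seq, k):
            index[km] = node if km not in index else lca(index[km], node)
    return index


def assign(query, index, k=3, min_hits=2):
    """Place a query at the most specific node its k-mers support."""
    hits = Counter(index[km] for km in kmers(query, k) if km in index)
    if not hits or sum(hits.values()) < min_hits:
        return None  # too little evidence even at the family level
    supported = [n for n, c in hits.items() if c >= min_hits] or list(hits)
    # Keep only the most specific supported nodes (drop ancestors of others).
    tips = [n for n in supported
            if not any(n != m and n in ancestors(m) for m in supported)]
    result = tips[0]
    for n in tips[1:]:
        result = lca(result, n)  # conflicting subfamilies fall back to their parent
    return result


INDEX = build_index(REFERENCE)
```

In this toy setting, a query that shares only family-level k-mers is placed at `hb` rather than being pushed into `hb_alpha` or `hb_beta`, which mirrors the hemoglobin example in the abstract.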
Affiliation(s)
- Victor Rossier
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Alex Warwick Vesztrocy
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland (corresponding author)
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Genetics, Evolution, and Environment, University College London, London, WC1E 6BT, UK; Department of Computer Science, University College London, London, WC1E 6BT, UK (corresponding author)
| |
|
36
|
Natsidis P, Kapli P, Schiffer PH, Telford MJ. Systematic errors in orthology inference and their effects on evolutionary analyses. iScience 2021; 24:102110. [PMID: 33659875 PMCID: PMC7892920 DOI: 10.1016/j.isci.2021.102110] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/23/2020] [Revised: 01/03/2021] [Accepted: 01/21/2021] [Indexed: 01/13/2023] Open
Abstract
The availability of complete sets of genes from many organisms makes it possible to identify genes unique to (or lost from) certain clades. This information is used to reconstruct phylogenetic trees; identify genes involved in the evolution of clade specific novelties; and for phylostratigraphy, that is, identifying ages of genes in a given species. These investigations rely on accurately predicted orthologs. Here we use simulation to produce sets of orthologs that experience no gains or losses. We show that errors in identifying orthologs increase with higher rates of evolution. We use the predicted sets of orthologs, with errors, to reconstruct phylogenetic trees; to count gains and losses; and for phylostratigraphy. Our simulated data, containing information only from errors in orthology prediction, closely recapitulate findings from empirical data. We suggest published downstream analyses must be informed to a large extent by errors in orthology prediction that mimic expected patterns of gene evolution.
Presence of shared orthologs across species is used for evolutionary analyses. We simulated realistic sets of orthologs with no gains or losses. Errors predicting shared orthologs correlate with phylogenetic relationships. Presence/absence datasets based on errors recapitulate findings from empirical data.
Affiliation(s)
- Paschalis Natsidis
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Ecology, University College London, London WC1E 6BT, UK
| | - Paschalia Kapli
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Ecology, University College London, London WC1E 6BT, UK
| | - Philipp H Schiffer
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Ecology, University College London, London WC1E 6BT, UK
| | - Maximilian J Telford
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Ecology, University College London, London WC1E 6BT, UK
| |
|
37
|
Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 2021; 49:D373-D379. [PMID: 33174605 PMCID: PMC7779010 DOI: 10.1093/nar/gkaa1007] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 09/15/2020] [Revised: 10/10/2020] [Accepted: 10/14/2020] [Indexed: 01/11/2023] Open
Abstract
OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
| | - Clément-Marie Train
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
| | - Kimberly J Gilbert
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Ishita Mediratta
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Department of Computer Science and Information Systems, BITS Pilani K.K. Birla Goa Campus, India
| | | | - David Moi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Yannis Nevers
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Hale-Seda Radoykova
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
| | - Victor Rossier
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Alex Warwick Vesztrocy
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Natasha M Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
| |
|
38
|
Chen Y, Song W, Xie X, Wang Z, Guan P, Peng H, Jiao Y, Ni Z, Sun Q, Guo W. A Collinearity-Incorporating Homology Inference Strategy for Connecting Emerging Assemblies in the Triticeae Tribe as a Pilot Practice in the Plant Pangenomic Era. Mol Plant 2020; 13:1694-1708. [PMID: 32979565 DOI: 10.1016/j.molp.2020.09.019] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 07/06/2020] [Revised: 09/03/2020] [Accepted: 09/21/2020] [Indexed: 05/18/2023]
Abstract
Plant genome sequencing has dramatically increased, and some species even have multiple high-quality reference versions. Demands for clade-specific homology inference and analysis have increased in the pangenomic era. Here we present a novel method, GeneTribe (https://chenym1.github.io/genetribe/), for homology inference among genetically similar genomes that incorporates gene collinearity and shows better performance than traditional sequence-similarity-based methods in terms of accuracy and scalability. The Triticeae tribe is a typical allopolyploid-rich clade with complex species relationships that includes many important crops, such as wheat, barley, and rye. We built Triticeae-GeneTribe (http://wheat.cau.edu.cn/TGT/), a homology database, by integrating 12 Triticeae genomes and 3 outgroup model genomes and implemented versatile analysis and visualization functions. With macrocollinearity analysis, we were able to construct a refined model illustrating the structural rearrangements of the 4A-5A-7B chromosomes in wheat as two major translocation events. With collinearity analysis at both the macro- and microscale, we illustrated the complex evolutionary history of homologs of the wheat vernalization gene Vrn2, which evolved as a combined result of genome translocation, duplication, and polyploidization and gene loss events. Our work provides a useful practice for connecting emerging genome assemblies, with awareness of the extensive polyploidy in plants, and will help researchers efficiently exploit genome sequence resources.
Affiliation(s)
- Yongming Chen
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Wanjun Song
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China; Beijing Geek Gene Technology Co Ltd, Beijing 100193, China
| | - Xiaoming Xie
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Zihao Wang
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Panfeng Guan
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Huiru Peng
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Yuannian Jiao
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhongfu Ni
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Qixin Sun
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Weilong Guo
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
| |
|
39
|
Emms DM, Kelly S. Benchmarking Orthogroup Inference Accuracy: Revisiting Orthobench. Genome Biol Evol 2020; 12:2258-2266. [PMID: 33022036 PMCID: PMC7738749 DOI: 10.1093/gbe/evaa211] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Accepted: 09/29/2020] [Indexed: 01/24/2023] Open
Abstract
Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to reinterrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.
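To make the notion of orthogroup inference accuracy concrete, the sketch below scores one reference orthogroup (RefOG) against a set of predicted orthogroups using precision and recall of the best-overlapping prediction. This is an assumed, simplified scoring scheme for illustration only; the function name `score_refog` and the example gene identifiers are hypothetical, and the benchmarking suite described in the abstract may compute its scores differently.

```python
def score_refog(refog, predicted):
    """Score one reference orthogroup against predicted orthogroups.

    refog: set of gene identifiers in the curated reference orthogroup.
    predicted: list of sets, one per predicted orthogroup.
    Returns (precision, recall) of the prediction with the largest overlap.
    """
    # Pick the predicted orthogroup sharing the most genes with the RefOG.
    best = max(predicted, key=lambda og: len(og & refog), default=set())
    overlap = len(best & refog)
    precision = overlap / len(best) if best else 0.0
    recall = overlap / len(refog) if refog else 0.0
    return precision, recall


# Hypothetical example: one gene is missed ("d") and one is wrongly added ("x").
refog = {"a", "b", "c", "d"}
predicted = [{"a", "b", "c", "x"}, {"d"}, {"y", "z"}]
precision, recall = score_refog(refog, predicted)
```

Averaging such per-RefOG scores over all reference orthogroups would give one summary figure per method, under the same assumptions.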
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, United Kingdom
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, United Kingdom
| |
|
40
|
Chorostecki U, Molina M, Pryszcz LP, Gabaldón T. MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life. Nucleic Acids Res 2020; 48:W553-W557. [PMID: 32343307 PMCID: PMC7319458 DOI: 10.1093/nar/gkaa282] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/28/2020] [Revised: 04/01/2020] [Accepted: 04/25/2020] [Indexed: 12/23/2022] Open
Abstract
Inferring homology relationships across genes in different species is a central task in comparative genomics. Therefore, a large number of resources and methods have been developed over the years. Some public databases include phylogenetic trees of homologous gene families which can be used to further differentiate homology relationships into orthology and paralogy. MetaPhOrs is a web server that integrates phylogenetic information from different sources to provide orthology and paralogy relationships based on a common phylogeny-based predictive algorithm and associated with a consistency-based confidence score. Here we describe the latest version of the web server which includes major new implementations and provides orthology and paralogy relationships derived from ∼8.2 million gene family trees, from 13 different source repositories across ∼4000 species with sequenced genomes. MetaPhOrs server is freely available, without registration, at http://orthology.phylomedb.org/.
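One plausible reading of a consistency-based confidence score is the fraction of source trees that agree on a call for a given gene pair. The sketch below implements that reading only; the function name, the input encoding, and the exact formula are assumptions for illustration and are not taken from the MetaPhOrs server or its documentation.

```python
def consistency_score(calls):
    """Fraction of trees that call a gene pair orthologous.

    calls: list with one entry per tree containing both genes,
           each either "orthologs" or "paralogs" (hypothetical encoding).
    Returns None when no tree contains the pair.
    """
    if not calls:
        return None
    n_orthology = sum(1 for call in calls if call == "orthologs")
    return n_orthology / len(calls)


# Hypothetical pair seen in four trees, three of which support orthology.
score = consistency_score(["orthologs", "orthologs", "paralogs", "orthologs"])
```

Under this reading, a pair supported by every tree scores 1.0, and the number of contributing trees would need to be reported alongside the score to judge how much evidence stands behind it.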
Affiliation(s)
- Uciel Chorostecki
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
| | - Manuel Molina
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
| | - Leszek P Pryszcz
- Centre for Genomic Regulation, 08003 Barcelona, Spain; International Institute of Molecular and Cell Biology, 4 Ks. Trojdena Street, 02-109 Warsaw, Poland
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain; ICREA, 08010 Barcelona, Spain
| |
|
41
|
Altenhoff AM, Garrayo-Ventas J, Cosentino S, Emms D, Glover NM, Hernández-Plaza A, Nevers Y, Sundesha V, Szklarczyk D, Fernández JM, Codó L, The Quest for Orthologs Consortium, Gelpi JL, Huerta-Cepas J, Iwasaki W, Kelly S, Lecompte O, Muffato M, Martin MJ, Capella-Gutierrez S, Thomas PD, Sonnhammer E, Dessimoz C. The Quest for Orthologs benchmark service and consensus calls in 2020. Nucleic Acids Res 2020; 48:W538-W545. [PMID: 32374845 PMCID: PMC7319555 DOI: 10.1093/nar/gkaa308] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/20/2020] [Revised: 04/16/2020] [Accepted: 04/20/2020] [Indexed: 12/18/2022] Open
Abstract
The identification of orthologs—genes in different species which descended from the same gene in their last common ancestor—is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; ETH Zurich, Department of Computer Science, Zurich, Switzerland
| | | | - Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - David Emms
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
| | - Natasha M Glover
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Ana Hernández-Plaza
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Yannis Nevers
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Vicky Sundesha
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Damian Szklarczyk
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Laia Codó
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | | | - Josep Ll Gelpi
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain; Department of Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Centro de Biotecnologia y Genomica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK
| | - Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, USA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Genetics, Evolution & Environment, University College London, London, UK; Department of Computer Science, University College London, London, UK
| |
|
42
|
Zeng X, Sheng J, Zhu F, Wei T, Zhao L, Hu X, Zheng X, Zhou F, Hu Z, Diao Y, Jin S. Genetic, transcriptional, and regulatory landscape of monolignol biosynthesis pathway in Miscanthus × giganteus. Biotechnol Biofuels 2020; 13:179. [PMID: 33117433 PMCID: PMC7590476 DOI: 10.1186/s13068-020-01819-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/18/2020] [Accepted: 10/16/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND Miscanthus × giganteus is widely recognized as a promising lignocellulosic biomass crop due to its advantages of high biomass production, low environmental impacts, and the potential to be cultivated on marginal land. However, the high costs of bioethanol production still limit the current commercialization of lignocellulosic bioethanol. The lignin in the cell wall and its by-products released in the pretreatment step is the main component inhibiting the enzymatic reactions in the saccharification and fermentation processes. Hence, genetic modification of the genes involved in lignin biosynthesis could be a feasible strategy to overcome this barrier by manipulating the lignin content and composition of M. × giganteus. For this purpose, the essential knowledge of these genes and understanding the underlying regulatory mechanisms in M. × giganteus is required. RESULTS In this study, MgPAL1, MgPAL5, Mg4CL1, Mg4CL3, MgHCT1, MgHCT2, MgC3'H1, MgCCoAOMT1, MgCCoAOMT3, MgCCR1, MgCCR2, MgF5H, MgCOMT, and MgCAD were identified as the major monolignol biosynthetic genes in M. × giganteus based on genetic and transcriptional evidence. Among them, 12 genes were cloned and sequenced. By combining transcription factor binding site prediction and expression correlation analysis, MYB46, MYB61, MYB63, WRKY24, WRKY35, WRKY12, ERF021, ERF058, and ERF017 were inferred to regulate the expression of these genes directly. On the basis of these results, an integrated model was summarized to depict the monolignol biosynthesis pathway and the underlying regulatory mechanism in M. × giganteus. CONCLUSIONS This study provides a list of potential gene targets for genetic improvement of lignocellulosic biomass quality of M. × giganteus, and reveals the genetic, transcriptional, and regulatory landscape of the monolignol biosynthesis pathway in M. × giganteus.
Affiliation(s)
- Xiaofei Zeng
- School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan, 430023 People’s Republic of China
- School of Medicine, Southern University of Science and Technology, Shenzhen, 518055 People’s Republic of China
| | - Jiajing Sheng
- School of Life Sciences, Nantong University, Nantong, 226019 People’s Republic of China
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Fenglin Zhu
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Tianzi Wei
- School of Medicine, Southern University of Science and Technology, Shenzhen, 518055 People’s Republic of China
| | - Lingling Zhao
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Xiaohu Hu
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Xingfei Zheng
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Fasong Zhou
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Zhongli Hu
- State Key Laboratory of Hybrid Rice, College of Life Sciences, Hubei Lotus Engineering Center, Wuhan University, Wuhan, 430072 People’s Republic of China
| | - Ying Diao
- School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan, 430023 People’s Republic of China
| | - Surong Jin
- School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan, 430070 People’s Republic of China
| |
|
43
|
Manolov A, Konanov D, Fedorov D, Osmolovsky I, Vereshchagin R, Ilina E. Genome Complexity Browser: Visualization and quantification of genome variability. PLoS Comput Biol 2020; 16:e1008222. [PMID: 33035207 PMCID: PMC7577506 DOI: 10.1371/journal.pcbi.1008222] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/31/2019] [Revised: 10/21/2020] [Accepted: 08/05/2020] [Indexed: 12/30/2022] Open
Abstract
Comparative genomics studies may be used to acquire new knowledge regarding genome architecture, which defines the rules for combining sets of genes in the genome of living organisms. Hundreds of thousands of prokaryotic genomes have been sequenced and assembled. However, computational tools capable of simultaneously comparing large numbers of genomes are lacking. We developed the Genome Complexity Browser, a tool that allows the visualization of gene contexts, in a graph-based format, and the quantification of variability for different segments of a genome. The graph-based visualization allows the inspection of changes in gene contents and neighborhoods across hundreds of genomes, simultaneously, which may facilitate the identification of conserved and variable segments of operons or the estimation of the overall variability associated with a particular genome locus. We introduced a measure called complexity, to quantify genome variability. Intraspecies and interspecies comparisons revealed that regions with high complexity values tended to be located in areas that are conserved across different strains and species.

The comparison of genomes among different bacteria and archaea species has revealed that many species frequently exchange genes. Occasionally, such horizontal gene transfer events result in the acquisition of pathogenic properties or antibiotic resistance in the recipient organism. Previously, the probabilities of gene insertions were found to vary, with unequal distributions along a chromosome. At some loci, referred to as hotspots, changes occur with much higher frequencies compared with other regions of the chromosome. We developed a computational method and a software tool, called Genome Complexity Browser, that allows the identification of genome variability hotspots and the visualization of changes. We compared the localization of various hotspots and revealed that some demonstrate conserved localizations, even across species, whereas others are transient. Our tool allows users to visually inspect the patterns of gene changes in graph-based format, which presents the visualization in a format that is both compact and informative.
Affiliation(s)
- Alexander Manolov
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| | - Dmitry Konanov
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| | - Dmitry Fedorov
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| | - Ivan Osmolovsky
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| | - Rinat Vereshchagin
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| | - Elena Ilina
- Federal Research and Clinical Center of Physical and Chemical Medicine, Federal Medical and Biological Agency of Russia, Moscow, Russian Federation
| |
|
44
|
Deutekom ES, Snel B, van Dam TJP. Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes. Brief Bioinform 2020; 22:5906198. [PMID: 32935832 PMCID: PMC8138875 DOI: 10.1093/bib/bbaa206] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/12/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 12/26/2022] Open
Abstract
Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis, often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed. We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the last eukaryotic common ancestor’s gene content, pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other. We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication of which orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly regarding large-scale evaluation, the obtained orthologous groups differ vastly from one another.
Availability and implementation: The data and code underlying this article are available on GitHub and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
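As a rough illustration of what "phylogenetic profile similarity" can mean in practice, the sketch below scores co-occurrence between orthologous groups with a Jaccard index over presence/absence profiles. This is not the authors' benchmark code (that is in the repository named in the abstract); the choice of the Jaccard index, the function names, and the toy profiles `OG_A`, `OG_B`, `OG_C` are all assumptions made for illustration.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two presence profiles (sets of species names)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        # Two empty profiles share nothing informative; treat as no similarity.
        return 0.0
    return len(a & b) / len(union)


def rank_partners(query, profiles):
    """Rank all other groups by profile similarity to `query`, best first."""
    scores = [
        (other, jaccard_similarity(profiles[query], members))
        for other, members in profiles.items()
        if other != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical profiles: the species in which each orthologous group is present.
profiles = {
    "OG_A": {"human", "yeast", "arabidopsis", "trypanosoma"},
    "OG_B": {"human", "yeast", "arabidopsis"},
    "OG_C": {"giardia"},
}

if __name__ == "__main__":
    for partner, score in rank_partners("OG_A", profiles):
        print(partner, round(score, 2))
```

Groups whose profiles overlap strongly (here `OG_A` and `OG_B`) would be treated as candidate interaction partners under this kind of co-occurrence scoring; other similarity measures could be substituted without changing the overall idea.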
Affiliation(s)
| | - Berend Snel
- Corresponding author: Berend Snel, Padualaan 8, 3584 CH Utrecht, The Netherlands. Tel.: +31(0)30 253 8102
| | | |
|
45
|
Alliance of Genome Resources Portal: unified model organism research platform. Nucleic Acids Res 2020; 48:D650-D658. [PMID: 31552413 PMCID: PMC6943066 DOI: 10.1093/nar/gkz813] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 07/31/2019] [Revised: 09/03/2019] [Accepted: 09/19/2019] [Indexed: 01/13/2023] Open
Abstract
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
|
46
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
47
|
Exposito-Alonso M, Drost HG, Burbano HA, Weigel D. The Earth BioGenome project: opportunities and challenges for plant genomics and conservation. The Plant Journal: for Cell and Molecular Biology 2020; 102:222-229. [PMID: 31788877 DOI: 10.1111/tpj.14631] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/12/2019] [Revised: 11/03/2019] [Accepted: 11/18/2019] [Indexed: 05/28/2023]
Abstract
Sequencing them all. That is the ambitious goal of the recently launched Earth BioGenome project (Proceedings of the National Academy of Sciences of the United States of America, 115, 4325-4333), which aims to produce reference genomes for all eukaryotic species within the next decade. In this perspective, we discuss the opportunities of this project with a plant focus, but also highlight potential limitations. These include the question of how best to capture all plant diversity, as the green taxon is one of the most complex clades in the tree of life, with over 300 000 species. For this, we highlight four key points: (i) the unique biological insights that could be gained from studying plants, (ii) their apparent underrepresentation in sequencing efforts given the number of threatened species, (iii) the necessity of phylogenomic methods that are aware of differences in genome complexity and quality, and (iv) accounting for within-species genetic diversity and the historical aspect of conservation genetics.
Affiliation(s)
| | - Hajk-Georg Drost
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
- The Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, CB2 1LR, Cambridge, UK
| | - Hernán A Burbano
- Centre for Life's Origins and Evolution, Department of Genetics Evolution and Environment, University College London, London, WC1H 0AG, UK
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
| |
|
48
|
Abstract
The Orthologous Matrix (OMA) is a method and database that allows users to identify orthologs among many genomes. OMA provides three different types of orthologs: pairwise orthologs, OMA Groups and Hierarchical Orthologous Groups (HOGs). This Primer is organized in two parts. In the first part, we provide all the necessary background information to understand the concepts of orthology, how we infer them and the different subtypes of orthology in OMA, as well as what types of analyses they should be used for. In the second part, we describe protocols for using the OMA browser to find a specific gene and its various types of orthologs. By the end of the Primer, readers should be able to (i) understand homology and the different types of orthologs reported in OMA; (ii) understand the best type of orthologs to use for a particular analysis; (iii) find particular genes of interest in the OMA browser; and (iv) identify orthologs for a given gene. The data can be freely accessed from the OMA browser at https://omabrowser.org.
Affiliation(s)
| | - Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Natasha M. Glover
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, 1015, Switzerland
| |
|
49
|
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022] Open
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we provide our own using the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
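To make the idea of an alignment-free comparison concrete, the sketch below compares two sequences by the cosine similarity of their k-mer count vectors, one of the simplest word-based AF measures. It is a generic illustration only: it is not one of the graph theory-derived descriptors reviewed in the article, and it does not reproduce MARCH-INSIDE, TI2BioP, ProtDCal or SeqDivA. The function names, the default `k=2`, and the example peptides are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=2):
    """Count overlapping k-mers (words of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two k-mer count vectors (0.0 to 1.0)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[key] * counts_b[key] for key in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A sequence shorter than k has no k-mers to compare.
        return 0.0
    return dot / (norm_a * norm_b)


def af_similarity(seq_a, seq_b, k=2):
    """Alignment-free similarity of two sequences from their k-mer content."""
    return cosine_similarity(kmer_counts(seq_a, k), kmer_counts(seq_b, k))


if __name__ == "__main__":
    # Hypothetical short peptides, used only to exercise the functions.
    print(af_similarity("MKVLAAGIVG", "MKVLSAGIVG"))
    print(af_similarity("MKVLAAGIVG", "WWPPWWPPWW"))
```

No alignment is computed at any point, which is why measures of this kind scale to large sequence sets; the trade-off is that positional information is discarded, one motivation for combining AF and AB features as the abstract describes.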
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
| | - Evys Ancede-Gallardo
- Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
| | - Gisselle Pérez-Machado
- EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
| | - Gustavo A. De la Riva
- Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
- Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| |
|
50
|
The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases. Genetics 2019; 213:1189-1196. [PMID: 31796553 PMCID: PMC6893393 DOI: 10.1534/genetics.119.302523] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 07/15/2019] [Accepted: 10/11/2019] [Indexed: 12/17/2022] Open
Abstract
Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human-readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp. (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.
|