1
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
2
Ata Ö, Mattanovich D. Into the metabolic wild: Unveiling hidden pathways of microbial metabolism. Microb Biotechnol 2024; 17:e14548. [PMID: 39126421 PMCID: PMC11316390 DOI: 10.1111/1751-7915.14548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Microbial metabolism has been studied in depth for decades and is considered to be understood to a great extent. Annotated genome sequences of many microbial species have contributed substantially to biochemical knowledge of metabolism. However, researchers still discover novel pathways, unforeseen reactions, or unexpected metabolites that contradict the expected canon of biochemical reactions in living organisms. Here, we highlight a few examples of such non-canonical pathways, how they were found, and what their importance in microbial biotechnology may be. The predictive power of metabolic modelling, well-founded on biochemical knowledge and genomic information, is discussed in the light of both the discovery of existing but as yet unknown metabolic routes and the prediction of others, new to Nature.
Affiliation(s)
- Özge Ata
  - Department of Biotechnology, Institute of Microbiology and Microbial Biotechnology, BOKU University, Vienna, Austria
  - Austrian Centre of Industrial Biotechnology, Vienna, Austria
- Diethard Mattanovich
  - Department of Biotechnology, Institute of Microbiology and Microbial Biotechnology, BOKU University, Vienna, Austria
  - Austrian Centre of Industrial Biotechnology, Vienna, Austria
3
de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv: the preprint server for biology 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for 450 enzymes of unknown function from the model bacterium Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.
4
Forsburg SL. The micromammals. G3 (Bethesda, Md.) 2024; 14:jkae073. [PMID: 38837137 DOI: 10.1093/g3journal/jkae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this editorial, Senior Editor Susan Forsburg examines the reasons to keep studying eukaryotic microbes like S. pombe and S. cerevisiae—and other yeasts, algae, amoeba, and fungi—even as genetic and genomic technologies now allow manipulation and study of practically any organism. She explores the challenges and opportunities of working in these tiny organisms, pointing to the substantial biology their study has uncovered.
Affiliation(s)
- Susan L Forsburg
  - Program in Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089-2910, USA
5
Wenteler A, Cabrera CP, Wei W, Neduva V, Barnes MR. AI approaches for the discovery and validation of drug targets. Cambridge Prisms. Precision Medicine 2024; 2:e7. [PMID: 39258224 PMCID: PMC11383977 DOI: 10.1017/pcm.2024.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 05/04/2024] [Accepted: 05/08/2024] [Indexed: 09/12/2024]
Abstract
Artificial intelligence (AI) holds immense promise for accelerating and improving all aspects of drug discovery, not least target discovery and validation. By integrating a diverse range of biological data modalities, AI enables the accurate prediction of drug target properties, ultimately illuminating biological mechanisms of disease and guiding drug discovery strategies. Despite the indisputable potential of AI in drug target discovery, there are many challenges and obstacles yet to be overcome, including dealing with data biases, model interpretability and generalisability, and the validation of predicted drug targets, to name a few. By exploring recent advancements in AI, this review showcases current applications of AI for drug target discovery and offers perspectives on the future of AI for the discovery and validation of drug targets, paving the way for the generation of novel and safer pharmaceuticals.
Affiliation(s)
- Aaron Wenteler
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - MSD Discovery Centre, London, United Kingdom
- Claudia P Cabrera
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - NIHR Barts Cardiovascular Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Wei Wei
  - MSD Discovery Centre, London, United Kingdom
- Michael R Barnes
  - Digital Environment Research Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - NIHR Barts Cardiovascular Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
  - The Alan Turing Institute, London, United Kingdom
6
Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024] Open
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
7
Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2024] Open
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
  - Department of Molecular Biosciences, Northwestern University, Evanston, United States
  - Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
8
Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. bioRxiv: the preprint server for biology 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
  - Interdisciplinary Biological Sciences, Northwestern University
  - Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
  - Department of Physics and Astronomy, Northwestern University
  - Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University
  - The Potocsnak Longevity Institute, Northwestern University
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University
9
Brunnsåker D, Kronström F, Tiukova IA, King RD. Interpreting protein abundance in Saccharomyces cerevisiae through relational learning. Bioinformatics 2024; 40:btae050. [PMID: 38273672 PMCID: PMC10868306 DOI: 10.1093/bioinformatics/btae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 01/27/2024] Open
Abstract
MOTIVATION: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and in highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information and adhere to semantically meaningful ontologies. RESULTS: By representing this knowledge in an expressive Datalog database, we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function, and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances. AVAILABILITY AND IMPLEMENTATION: All data and processing scripts are available at the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.
Affiliation(s)
- Daniel Brunnsåker
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Filip Kronström
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Ievgeniia A Tiukova
  - Department of Life Sciences, Chalmers University of Technology, Gothenburg 412 96, Sweden
  - Department of Industrial Biotechnology, KTH Royal Institute of Technology, Stockholm 106 91, Sweden
- Ross D King
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
  - The Alan Turing Institute, London NW1 2DB, United Kingdom
10
Rappsilber J. A dive into the unknome. Trends Genet 2024; 40:15-16. [PMID: 37968205 DOI: 10.1016/j.tig.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 10/23/2023] [Indexed: 11/17/2023]
Abstract
We may never understand the function of all genes, findings by Freeman, Munro and colleagues suggest, unless we rethink our approaches. They make a thorough attempt at quantifying the unknownness of protein-coding genes and experimentally prove that many neglected genes hold the seed of important discoveries.
Affiliation(s)
- Juri Rappsilber
  - Technische Universität Berlin, Chair of Bioanalytics, 10623 Berlin, Germany
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK
  - Si-M/'Der Simulierte Mensch', a Science Framework of Technische Universität Berlin and Charité - Universitätsmedizin Berlin, Berlin, Germany
11
Taghon GJ, Strychalski EA. Rise of synthetic yeast: Charting courses to new applications. Cell Genomics 2023; 3:100438. [PMID: 38020966 PMCID: PMC10667549 DOI: 10.1016/j.xgen.2023.100438] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/01/2023]
Abstract
Microbes have long provided us with important capabilities, and the genome engineering of microbes has greatly empowered research and applications in biotechnology. This is especially true with the emergence of synthetic biology and recent advances in genome engineering to control microbial behavior. A fully synthetic, rationally designed genome promises opportunities for unprecedented control of cellular function. As a eukaryotic workhorse for research and industrial use, yeast is an organism at the forefront of synthetic biology; the tools and engineered cellular platform being delivered by the Sc2.0 consortium are enabling a new era of bespoke biology. This issue highlights recent advances delivered by this consortium, but hurdles remain to maximize the impact of engineered eukaryotic cells more broadly.
Affiliation(s)
- Geoffrey J. Taghon
  - National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
12
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
  - University of Bristol, Bristol, United Kingdom
- Harry Scholes
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
  - Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
13
Allou L, Mundlos S. Disruption of regulatory domains and novel transcripts as disease-causing mechanisms. Bioessays 2023; 45:e2300010. [PMID: 37381881 DOI: 10.1002/bies.202300010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 05/24/2023] [Accepted: 06/06/2023] [Indexed: 06/30/2023]
Abstract
Deletions, duplications, insertions, inversions, and translocations, collectively called structural variations (SVs), affect more base pairs of the genome than any other sequence variant. The recent technological advancements in genome sequencing have enabled the discovery of tens of thousands of SVs per human genome. These SVs primarily affect non-coding DNA sequences, but the difficulties in interpreting their impact limit our understanding of human disease etiology. The functional annotation of non-coding DNA sequences and methodologies to characterize their three-dimensional (3D) organization in the nucleus have greatly expanded our understanding of the basic mechanisms underlying gene regulation, thereby improving the interpretation of SVs for their pathogenic impact. Here, we discuss the various mechanisms by which SVs can result in altered gene regulation and how these mechanisms can result in rare genetic disorders. Beyond changing gene expression, SVs can produce novel gene-intergenic fusion transcripts at the SV breakpoints.
Affiliation(s)
- Lila Allou
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Stefan Mundlos
  - RG Development & Disease, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Berlin, Germany
14
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023; 18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023] Open
Abstract
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), a complete understanding of the potential of its encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far away the goal of a full list of S. cerevisiae gene functions still is. RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mention of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~230 elite genes with ≥75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Yet these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, literature for already well-studied genes (with a threshold T10 (≥10 FPEs) and higher) keeps growing steadily. CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data prove that the publication of the full genome in fact coincides with the onset of the decline in gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
Affiliation(s)
- Erwin Tantoso
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Birgit Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
- Swati Sinha
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
15
Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol 2023; 21:e3002222. [PMID: 37552676 PMCID: PMC10409296 DOI: 10.1371/journal.pbio.3002222] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 01/12/2023] [Accepted: 06/27/2023] [Indexed: 08/10/2023] Open
Abstract
The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.
Affiliation(s)
- João J. Rocha
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Tim J. Stevens
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Rajen D. Shah
- Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Sahar Emran
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Cristina Robles
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Matthew Freeman
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
| | - Sean Munro
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| |
|
16
|
Arakawa K, Hirose T, Inada T, Ito T, Kai T, Oyama M, Tomari Y, Yoda T, Nakagawa S. Nondomain biopolymers: Flexible molecular strategies to acquire biological functions. Genes Cells 2023; 28:539-552. [PMID: 37249032 DOI: 10.1111/gtc.13050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 05/15/2023] [Accepted: 05/16/2023] [Indexed: 05/31/2023]
Abstract
A long-standing assumption in molecular biology posits that the conservation of protein and nucleic acid sequences emphasizes the functional significance of biomolecules. These conserved sequences fold into distinct secondary and tertiary structures, enable highly specific molecular interactions, and regulate complex yet organized molecular processes within living cells. However, recent evidence suggests that biomolecules can also function through primary sequence regions that lack conservation across species or gene families. These regions typically do not form rigid structures, and their inherent flexibility is critical for their functional roles. This review examines the emerging roles and molecular mechanisms of "nondomain biomolecules," whose functions are not easily predicted due to the absence of conserved functional domains. We propose the hypothesis that both domain- and nondomain-type molecules work together to enable flexible and efficient molecular processes within the highly crowded intracellular environment.
Grants
- 21H05273 Ministry of Education, Culture, Sports, Science and Technology
- 21H05274 Ministry of Education, Culture, Sports, Science and Technology
- 21H05275 Ministry of Education, Culture, Sports, Science and Technology
- 21H05276 Ministry of Education, Culture, Sports, Science and Technology
- 21H05277 Ministry of Education, Culture, Sports, Science and Technology
- 21H05278 Ministry of Education, Culture, Sports, Science and Technology
- 21H05279 Ministry of Education, Culture, Sports, Science and Technology
- 21H05280 Ministry of Education, Culture, Sports, Science and Technology
- 21H05281 Ministry of Education, Culture, Sports, Science and Technology
- 21H05282 Ministry of Education, Culture, Sports, Science and Technology
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Tetsuro Hirose
- RNA Biofunction Laboratory, Graduate School of Frontier Biosciences, Osaka University, Suita, Japan
| | - Toshifumi Inada
- Division of RNA and Gene Regulation, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Takuhiro Ito
- Laboratory for Translation Structural Biology, RIKEN Center for Biosystems Dynamics Research, Yokohama, Japan
| | - Toshie Kai
- Germline Biology Laboratory, Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan
| | - Masaaki Oyama
- Medical Proteomics Laboratory, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Yukihide Tomari
- Laboratory of RNA Function, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
| | - Takao Yoda
- Nagahama Institute of Bio-Science and Technology, Nagahama, Japan
| | - Shinichi Nakagawa
- RNA Biology Laboratory, Faculty of Pharmaceutical Sciences, Hokkaido University, Sapporo, Japan
| |
|
17
|
Xue B, Rhee SY. Status of genome function annotation in model organisms and crops. Plant Direct 2023; 7:e499. [PMID: 37426891 PMCID: PMC10326244 DOI: 10.1002/pld3.499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 04/21/2023] [Accepted: 05/08/2023] [Indexed: 07/11/2023]
Abstract
Since the entry into genome-enabled biology several decades ago, much progress has been made in determining, describing, and disseminating the functions of genes and their products. Yet, this information is still difficult to access for many scientists and for most genomes. To provide easy access and a graphical summary of the status of genome function annotation for model organisms and bioenergy and food crop species, we created a web application (https://genomeannotation.rheelab.org) to visualize, search, and download genome annotation data for 28 species. The summary graphics and data tables will be updated semi-annually, and snapshots will be archived to provide a historical record of the progress of genome function annotation efforts. Clear and simple visualization of up-to-date genome function annotation status, including the extent of what is unknown, will help address the grand challenge of elucidating the functions of all genes in organisms.
Affiliation(s)
- Bo Xue
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
- Present address: Plant Resilience Institute, Michigan State University, East Lansing, MI 4882
| | - Seung Y. Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
- Present address: Plant Resilience Institute, Michigan State University, East Lansing, MI 4882
| |
|
18
|
Sokol L, Cuypers A, Truong ACK, Bouché A, Brepoels K, Souffreau J, Rohlenova K, Vinckier S, Schoonjans L, Eelen G, Dewerchin M, de Rooij LPMH, Carmeliet P. Prioritization and functional validation of target genes from single-cell transcriptomics studies. Commun Biol 2023; 6:648. [PMID: 37330599 PMCID: PMC10276815 DOI: 10.1038/s42003-023-05006-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 06/01/2023] [Indexed: 06/19/2023] Open
Abstract
Translation of academic results into clinical practice is a formidable unmet medical need. Single-cell RNA-sequencing (scRNA-seq) studies generate long descriptive ranks of markers with predicted biological function, but without functional validation, it remains challenging to know which markers truly exert the putative function. Given the lengthy/costly nature of validation studies, gene prioritization is required to select candidates. We address these issues by studying tip endothelial cell (EC) marker genes because of their importance for angiogenesis. Here, by tailoring Guidelines On Target Assessment for Innovative Therapeutics, we in silico prioritize previously unreported/poorly described, high-ranking tip EC markers. Notably, functional validation reveals that four of six candidates behave as tip EC genes. We even discover a tip EC function for a gene lacking in-depth functional annotation. Thus, validating prioritized genes from scRNA-seq studies offers opportunities for identifying targets to be considered for possible translation, but not all top-ranked scRNA-seq markers exert the predicted function.
Affiliation(s)
- Liliana Sokol
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Anne Cuypers
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Anh-Co K Truong
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Ann Bouché
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Katleen Brepoels
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Joris Souffreau
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Katerina Rohlenova
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
- Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, Vestec, Prague-West, Czech Republic
| | - Stefan Vinckier
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Luc Schoonjans
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
- Laboratory of Angiogenesis and Vascular Heterogeneity, Department of Biomedicine, Aarhus University, 8000, Aarhus C, Denmark
| | - Guy Eelen
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Mieke Dewerchin
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium
| | - Laura P M H de Rooij
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium.
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.
| | - Peter Carmeliet
- Laboratory of Angiogenesis and Vascular Metabolism, Center for Cancer Biology (CCB), VIB and Department of Oncology, Leuven Cancer Institute (LKI), KU Leuven, Leuven, Belgium.
- Laboratory of Angiogenesis and Vascular Heterogeneity, Department of Biomedicine, Aarhus University, 8000, Aarhus C, Denmark.
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
| |
|
19
|
Sanchez-Briñas A, Duran-Ruiz C, Astola A, Arroyo MM, Raposo FG, Valle A, Bolivar J. ZNF330/NOA36 interacts with HSPA1 and HSPA8 and modulates cell cycle and proliferation in response to heat shock in HEK293 cells. Biol Direct 2023; 18:26. [PMID: 37254218 DOI: 10.1186/s13062-023-00384-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 05/20/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND The human genome contains nearly 20,000 protein-coding genes, but more than 6,000 proteins remain poorly characterized. Among them, ZNF330/NOA36 stands out because it is a highly evolutionarily conserved nucleolar zinc-finger protein found in the genomes of ancient animal phyla such as sponges and cnidarians, up to humans. First described as a human autoantigen, NOA36 is expressed in all tissues and human cell lines, and it has been related to apoptosis in human cells as well as to muscle morphogenesis and hematopoiesis in Drosophila. Nevertheless, further research is required to better understand the roles of this highly conserved protein. RESULTS Here, we have investigated possible interactors of human ZNF330/NOA36 through affinity-purification mass spectrometry (AP-MS). Among them, NOA36 interaction with HSPA1 and HSPA8 heat shock proteins was disclosed and further validated by co-immunoprecipitation. Also, "Enhancer of Rudimentary Homolog" (ERH), a protein involved in cell cycle regulation, was detected in the AP-MS approach. Furthermore, we developed a NOA36 knockout cell line using CRISPR/Cas9n in HEK293, and we found that the cell cycle profile was modified, and proliferation decreased after heat shock in the knocked-out cells. These differences were not due to a different expression of the HSP genes detected in the AP-MS after inducing stress. CONCLUSIONS Our results indicate that NOA36 is necessary for proliferation recovery in response to thermal stress to achieve a regular cell cycle profile, likely by interaction with HSPA1 and HSPA8. Further studies would be required to disclose the relevance of the NOA36-ERH interaction in this context.
Affiliation(s)
- Alejandra Sanchez-Briñas
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
| | - Carmen Duran-Ruiz
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
| | - Antonio Astola
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Institute of Biomolecules (INBIO), University of Cadiz, Cadiz, Spain
| | - Marta Marina Arroyo
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
| | - Fátima G Raposo
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
| | - Antonio Valle
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain
- Institute of Viticulture and Agri-Food Research (IVAGRO) - International Campus of Excellence (ceiA3), University of Cadiz, Cadiz, Spain
| | - Jorge Bolivar
- Department of Biomedicine, Biotechnology and Public Health-Biochemistry and Molecular Biology, Campus Universitario de Puerto Real, University of Cadiz, Puerto Real, Cadiz, 11510, Spain.
- Institute of Biomolecules (INBIO), University of Cadiz, Cadiz, Spain.
| |
|
20
|
Zeng X, Kahng A, Xue L, Mahamid J, Chang YW, Xu M. High-throughput cryo-ET structural pattern mining by unsupervised deep iterative subtomogram clustering. Proc Natl Acad Sci U S A 2023; 120:e2213149120. [PMID: 37027429 PMCID: PMC10104553 DOI: 10.1073/pnas.2213149120] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/01/2022] [Accepted: 02/24/2023] [Indexed: 04/08/2023] Open
Abstract
Cryoelectron tomography directly visualizes heterogeneous macromolecular structures in their native and complex cellular environments. However, existing computer-assisted structure sorting approaches are low throughput or inherently limited due to their dependency on available templates and manual labels. Here, we introduce a high-throughput template-and-label-free deep learning approach, Deep Iterative Subtomogram Clustering Approach (DISCA), that automatically detects subsets of homogeneous structures by learning and modeling 3D structural features and their distributions. Evaluation on five experimental cryo-ET datasets shows that an unsupervised deep learning based method can detect diverse structures with a wide range of molecular sizes. This unsupervised detection paves the way for systematic unbiased recognition of macromolecular complexes in situ.
Affiliation(s)
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Anson Kahng
- Computer Science Department, University of Rochester, Rochester, NY 14620
| | - Liang Xue
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Faculty of Biosciences, Collaboration for joint PhD degree between European Molecular Biology Laboratory and Heidelberg University, Heidelberg 69117, Germany
| | - Julia Mahamid
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Yi-Wei Chang
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
| |
|
21
|
Brunnsåker D, Reder GK, Soni NK, Savolainen OI, Gower AH, Tiukova IA, King RD. High-throughput metabolomics for the design and validation of a diauxic shift model. NPJ Syst Biol Appl 2023; 9:11. [PMID: 37029131 PMCID: PMC10082077 DOI: 10.1038/s41540-023-00274-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/30/2022] [Accepted: 03/23/2023] [Indexed: 04/09/2023] Open
Abstract
Saccharomyces cerevisiae is a very well studied organism, yet ∼20% of its proteins remain poorly characterized. Moreover, recent studies seem to indicate that the pace of functional discovery is slow. Previous work has implied that the most probable path forward is via not only automation but fully autonomous systems in which active learning is applied to guide high-throughput experimentation. Development of tools and methods for these types of systems is of paramount importance. In this study we use constrained dynamical flux balance analysis (dFBA) to select ten regulatory deletant strains that are likely to have previously unexplored connections to the diauxic shift. We then analyzed these deletant strains using untargeted metabolomics, generating profiles which were subsequently investigated to better understand the consequences of the gene deletions in the metabolic reconfiguration of the diauxic shift. We show that metabolic profiles can be utilised not only to gain insight into cellular transformations such as the diauxic shift, but also into the regulatory roles and biological consequences of regulatory gene deletion. We also conclude that untargeted metabolomics is a useful tool for guidance in high-throughput model improvement, and is a fast, sensitive and informative approach appropriate for future large-scale functional analyses of genes. Moreover, it is well-suited for automated approaches due to the relative simplicity of processing and the potential to be made massively high-throughput.
Affiliation(s)
- Daniel Brunnsåker
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden.
| | - Gabriel K Reder
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
| | - Nikul K Soni
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
| | - Otto I Savolainen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
- Department of Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
| | - Alexander H Gower
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
| | - Ievgeniia A Tiukova
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
- Division of Industrial Biotechnology, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Ross D King
- Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Alan Turing Institute, London, UK
| |
|
22
|
Reichling S, Doubleday PF, Germade T, Bergmann A, Loewith R, Sauer U, Holbrook-Smith D. Dynamic metabolome profiling uncovers potential TOR signaling genes. eLife 2023; 12:84295. [PMID: 36598488 PMCID: PMC9812406 DOI: 10.7554/elife.84295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 11/18/2022] [Indexed: 01/05/2023] Open
Abstract
Although the genetic code of the yeast Saccharomyces cerevisiae was sequenced 25 years ago, the characterization of the roles of genes within it is far from complete. The lack of a complete mapping of functions to genes hampers systematic understanding of the biology of the cell. The advent of high-throughput metabolomics offers a unique approach to uncovering gene function with an attractive combination of cost, robustness, and breadth of applicability. Here, we used flow-injection time-of-flight mass spectrometry to dynamically profile the metabolome of 164 loss-of-function mutants in TOR and receptor or receptor-like genes under a time course of rapamycin treatment, generating a dataset with >7000 metabolomics measurements. In order to provide a resource to the broader community, those data are made available for browsing through an interactive data visualization app hosted at https://rapamycin-yeast.ethz.ch. We demonstrate that dynamic metabolite responses to rapamycin are more informative than steady-state responses when recovering known regulators of TOR signaling, as well as identifying new ones. Deletion of a subset of the novel genes causes phenotypes and proteome responses to rapamycin that further implicate them in TOR signaling. We found that one of these genes, CFF1, was connected to the regulation of pyrimidine biosynthesis through URA10. These results demonstrate the efficacy of the approach for flagging novel potential TOR signaling-related genes and highlight the utility of dynamic perturbations when using functional metabolomics to deliver biological insight.
Affiliation(s)
- Stella Reichling
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | | | - Tomas Germade
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Ariane Bergmann
- Department of Molecular Biology, University of Geneva, Geneva, Switzerland
| | - Robbie Loewith
- Department of Molecular Biology, University of Geneva, Geneva, Switzerland
| | - Uwe Sauer
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | | |
|
23
|
Delmas M, Filangi O, Duperier C, Paulhe N, Vinson F, Rodriguez-Mier P, Giacomoni F, Jourdan F, Frainay C. Suggesting disease associations for overlooked metabolites using literature from metabolic neighbors. Gigascience 2022; 12:giad065. [PMID: 37712592 PMCID: PMC10502579 DOI: 10.1093/gigascience/giad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 06/13/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023] Open
Abstract
In human health research, metabolic signatures extracted from metabolomics data have a strong added value for stratifying patients and identifying biomarkers. Nevertheless, one of the main challenges is to interpret and relate these lists of discriminant metabolites to pathological mechanisms. This task requires experts to combine their knowledge with information extracted from databases and the scientific literature. However, we show that most compounds (>99%) in the PubChem database lack annotated literature. This dearth of available information can have a direct impact on the interpretation of metabolic signatures, which is often restricted to a subset of significant metabolites. To suggest potential pathological phenotypes related to overlooked metabolites that lack annotated literature, we extend the "guilt-by-association" principle to literature information by using a Bayesian framework. The underlying assumption is that the literature associated with the metabolic neighbors of a compound can provide valuable insights, or an a priori, into its biomedical context. The metabolic neighborhood of a compound can be defined from a metabolic network and correspond to metabolites to which it is connected through biochemical reactions. With the proposed approach, we suggest more than 35,000 associations between 1,047 overlooked metabolites and 3,288 diseases (or disease families). All these newly inferred associations are freely available on the FORUM ftp server (see information at https://github.com/eMetaboHUB/Forum-LiteraturePropagation).
Affiliation(s)
- Maxime Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
| | - Olivier Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Christophe Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Florence Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
| | - Pablo Rodriguez-Mier
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
| | - Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Fabien Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
| | - Clément Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
| |
|
24
|
Jagtap S, Çelikkanat A, Pirayre A, Bidard F, Duval L, Malliaros FD. BraneMF: integration of biological networks for functional analysis of proteins. Bioinformatics 2022; 38:5383-5389. [PMID: 36321881 DOI: 10.1093/bioinformatics/btac691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2022] [Revised: 10/05/2022] [Accepted: 11/01/2022] [Indexed: 11/11/2022] Open
Abstract
MOTIVATION The cellular system of a living organism is composed of interacting bio-molecules that control cellular processes at multiple levels. Their correspondences are represented by tightly regulated molecular networks. The increase of omics technologies has favored the generation of large-scale disparate data and the consequent demand for simultaneously using molecular and functional interaction networks: gene co-expression, protein-protein interaction (PPI), genetic interaction and metabolic networks. They are rich sources of information at different molecular levels, and their effective integration is essential to understand cell functioning and their building blocks (proteins). Therefore, it is necessary to obtain informative representations of proteins and their proximity, that are not fully captured by features extracted directly from a single informational level. We propose BraneMF, a novel random walk-based matrix factorization method for learning node representation in a multilayer network, with application to omics data integration. RESULTS We test BraneMF with PPI networks of Saccharomyces cerevisiae, a well-studied yeast model organism. We demonstrate the applicability of the learned features for essential multi-omics inference tasks: clustering, function and PPI prediction. We compare it to the state-of-the-art integration methods for multilayer networks. BraneMF outperforms baseline methods by achieving high prediction scores for a variety of downstream tasks. The robustness of results is assessed by an extensive parameter sensitivity analysis. AVAILABILITY AND IMPLEMENTATION BraneMF's code is freely available at: https://github.com/Surabhivj/BraneMF, along with datasets, embeddings and result files. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Surabhi Jagtap
- IFP Energies Nouvelles, 92852 Rueil-Malmaison, France; Université Paris-Saclay, CentraleSupélec, Inria, Centre for Visual Computing, 91190 Gif-Sur-Yvette, France
| | | | | | | | - Laurent Duval
- IFP Energies Nouvelles, 92852 Rueil-Malmaison, France
| | - Fragkiskos D Malliaros
- Université Paris-Saclay, CentraleSupélec, Inria, Centre for Visual Computing, 91190 Gif-Sur-Yvette, France
| |
|
25
|
Yu JSL, Heineike BM, Hartl J, Aulakh SK, Correia-Melo C, Lehmann A, Lemke O, Agostini F, Lee CT, Demichev V, Messner CB, Mülleder M, Ralser M. Inorganic sulfur fixation via a new homocysteine synthase allows yeast cells to cooperatively compensate for methionine auxotrophy. PLoS Biol 2022; 20:e3001912. [PMID: 36455053 PMCID: PMC9757880 DOI: 10.1371/journal.pbio.3001912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/10/2022] [Revised: 12/16/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022] Open
Abstract
The assimilation, incorporation, and metabolism of sulfur is a fundamental process across all domains of life, yet how cells deal with varying sulfur availability is not well understood. We studied an unresolved conundrum of sulfur fixation in yeast, in which organosulfur auxotrophy caused by deletion of the homocysteine synthase Met17p is overcome when cells are inoculated at high cell density. In combining the use of self-establishing metabolically cooperating (SeMeCo) communities with proteomic, genetic, and biochemical approaches, we discovered an uncharacterized gene product YLL058Wp, herein named Hydrogen Sulfide Utilizing-1 (HSU1). Hsu1p acts as a homocysteine synthase and allows the cells to substitute for Met17p by reassimilating hydrosulfide ions leaked from met17Δ cells into O-acetyl-homoserine and forming homocysteine. Our results show that cells can cooperate to achieve sulfur fixation, indicating that the collective properties of microbial communities facilitate their basic metabolic capacity to overcome sulfur limitation.
Affiliation(s)
- Jason S. L. Yu
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Benjamin M. Heineike
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Johannes Hartl
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Simran K. Aulakh
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Clara Correia-Melo
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Andrea Lehmann
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Oliver Lemke
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Federica Agostini
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Cory T. Lee
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| | - Christoph B. Messner
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Michael Mülleder
- Core Facility—High Throughput Mass Spectrometry, Charité Universitätsmedizin, Berlin, Germany
| | - Markus Ralser
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
| |
|
26
|
Antolin AA, Sanfelice D, Crisp A, Villasclaras Fernandez E, Mica IL, Chen Y, Collins I, Edwards A, Müller S, Al-Lazikani B, Workman P. The Chemical Probes Portal: an expert review-based public resource to empower chemical probe assessment, selection and use. Nucleic Acids Res 2022; 51:D1492-D1502. [PMID: 36268860 PMCID: PMC9825478 DOI: 10.1093/nar/gkac909] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 08/15/2022] [Revised: 09/30/2022] [Accepted: 10/05/2022] [Indexed: 01/30/2023]
Abstract
We describe the Chemical Probes Portal (https://www.chemicalprobes.org/), an expert review-based public resource to empower chemical probe assessment, selection and use. Chemical probes are high-quality small-molecule reagents, often inhibitors, that are important for exploring protein function and biological mechanisms, and for validating targets for drug discovery. The publication, dissemination and use of chemical probes provide an important means to accelerate the functional annotation of proteins, the study of proteins in cell biology, physiology, and disease pathology, and to inform and enable subsequent pioneering drug discovery and development efforts. However, the widespread use of small-molecule compounds that are claimed as chemical probes but are lacking sufficient quality, especially being inadequately selective for the desired target or even broadly promiscuous in behaviour, has resulted in many erroneous conclusions in the biomedical literature. The Chemical Probes Portal was established as a public resource to aid the selection and best-practice use of chemical probes in basic and translational biomedical research. We describe the background, principles and content of the Portal and its technical development, as well as examples of its applications and use. The Chemical Probes Portal is a community resource and we therefore describe how researchers can be involved in its content and development.
Affiliation(s)
- Albert A Antolin
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Domenico Sanfelice
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Alisa Crisp
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Eloy Villasclaras Fernandez
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Ioan L Mica
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Yi Chen
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Department of Data Science, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Ian Collins
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, UK; Centre for Cancer Drug Discovery, The Institute of Cancer Research, London, SM2 5NG, UK; Chemical Probes Portal, www.chemicalprobes.org
| | - Aled Edwards
- Structural Genomics Consortium, University of Toronto, Toronto, ON M5G 1L7, Canada; Chemical Probes Portal, www.chemicalprobes.org
| | | | | | - Paul Workman
- To whom correspondence should be addressed. Tel: +44 2087224580;
| |
|
27
|
Monzon V, Paysan-Lafosse T, Wood V, Bateman A. Reciprocal best structure hits: using AlphaFold models to discover distant homologues. Bioinformatics Advances 2022; 2:vbac072. [PMID: 36408459 PMCID: PMC9666668 DOI: 10.1093/bioadv/vbac072] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/21/2022] [Revised: 09/16/2022] [Accepted: 10/05/2022] [Indexed: 11/17/2022]
Abstract
Motivation: The conventional methods to detect homologous protein pairs use the comparison of protein sequences. But the sequences of two homologous proteins may diverge significantly and consequently may be undetectable by standard approaches. The release of the AlphaFold 2.0 software enables the prediction of highly accurate protein structures and opens many opportunities to advance our understanding of protein functions, including the detection of homologous protein structure pairs. Results: In this proof-of-concept work, we search for the closest homologous protein pairs using the structure models of five model organisms from the AlphaFold database. We compare the results with homologous protein pairs detected by their sequence similarity and show that the structural matching approach finds a similar set of results. In addition, we detect potential novel homologs solely with the structural matching approach, which can help to understand the function of uncharacterized proteins and make previously overlooked connections between well-characterized proteins. We also observe limitations of our implementation of the structure-based approach, particularly when handling highly disordered proteins or short protein structures. Our work shows that high-accuracy protein structure models can be used to discover homologous protein pairs, and we expose areas for improvement of this structural matching approach. Availability and Implementation: Information on the discovered homologous protein pairs can be found at the following URL: https://doi.org/10.17863/CAM.87873. The code can be accessed here: https://github.com/VivianMonzon/Reciprocal_Best_Structure_Hits. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
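The reciprocal-best-hit idea named in the title can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the GitHub URL above). The function names, the toy protein identifiers, and the similarity scores are all hypothetical, and the scores stand in for whatever structural-similarity measure a search tool would report. Two proteins form a pair only when each is the other's top-scoring hit.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: (query, target) -> similarity score, higher is better.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring target.

    Ties are resolved in favour of the hit seen first (an arbitrary choice).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Toy data for two hypothetical proteomes A and B.
a_vs_b = {("A1", "B1"): 0.90, ("A1", "B2"): 0.40,
          ("A2", "B2"): 0.70, ("A3", "B1"): 0.60}
b_vs_a = {("B1", "A1"): 0.88, ("B1", "A3"): 0.50, ("B2", "A2"): 0.72}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))
```

In the toy data, A3's best hit is B1, but B1's best hit is A1, so A3 is left unpaired. That asymmetry is what the reciprocity requirement filters out.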
Affiliation(s)
- Vivian Monzon
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
| | - Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK
| |
|
28
|
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
| | - Pooya Zakeri
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
- Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Mira Mousa
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Halima Alnaqbi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Fatima Yousif Alshamsi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Leo Bettoni
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
| | - Ernesto Damiani
- Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Habiba Alsafar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Peter Carmeliet
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| |
|
29
|
Asakawa H, Hirano Y, Shindo T, Haraguchi T, Hiraoka Y. Fission yeast Ish1 and Les1 interact with each other in the lumen of the nuclear envelope. Genes Cells 2022; 27:643-656. [PMID: 36043331 DOI: 10.1111/gtc.12981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 08/09/2022] [Accepted: 08/24/2022] [Indexed: 11/28/2022]
Abstract
The nuclear envelope (NE) provides a permeable barrier that separates the eukaryotic genome from the cytoplasm. The NE is a double membrane composed of inner and outer nuclear membranes. Ish1 is a stress-responsive NE protein in the fission yeast, Schizosaccharomyces pombe. Les1 is another NE protein that shares several similar domains with Ish1, but the relationship between them remains unknown. In this study, using fluorescence and electron microscopy, we found that most regions of these proteins were localized within the NE lumen. We also found that Ish1 interacted with Les1 via its C-terminal region in the NE lumen and that the NE localization of Ish1 depended on the C-terminal region of Les1. Ish1 and Les1 were co-localized at the NE in interphase cells, but when the nucleus divided at the end of mitosis (closed mitosis), they showed distinguishable localization at the midzone membrane domain. These results suggest a regulated interaction between Ish1 and Les1 in the NE lumen, although this interaction does not appear to be essential for cell survival.
Affiliation(s)
- Haruhiko Asakawa
- Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Japan
| | - Yasuhiro Hirano
- Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Japan
| | - Tomoko Shindo
- Keio University, School of Medicine, Shinjuku-ku, Tokyo, Japan
| | - Tokuko Haraguchi
- Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Japan
| | - Yasushi Hiraoka
- Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Japan
| |
|
30
|
A Comparative Analysis of the Core Proteomes within and among the Bacillus subtilis and Bacillus cereus Evolutionary Groups Reveals the Patterns of Lineage- and Species-Specific Adaptations. Microorganisms 2022; 10:microorganisms10091720. [PMID: 36144322 PMCID: PMC9505155 DOI: 10.3390/microorganisms10091720] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/07/2022] [Revised: 08/19/2022] [Accepted: 08/23/2022] [Indexed: 11/17/2022]
Abstract
By integrating phylogenomic and comparative analyses of 1104 high-quality genome sequences, we identify the core proteins and the lineage-specific fingerprint proteins of the various evolutionary clusters (clades/groups/species) of the Bacillus genus. As fingerprints, we denote those core proteins of a certain lineage that are present only in that particular lineage and absent in any other Bacillus lineage. Thus, these lineage-specific fingerprints are expected to be involved in particular adaptations of that lineage. Intriguingly, with a few notable exceptions, the majority of the Bacillus species demonstrate a rather low number of species-specific fingerprints, with the majority of them being of unknown function. Therefore, species-specific adaptations are mostly attributed to highly unstable (in evolutionary terms) accessory proteomes and possibly to changes at the gene regulation level. A series of comparative analyses consistently demonstrated that the progenitor of the Cereus Clade underwent an extensive genomic expansion of chromosomal protein-coding genes. In addition, the majority (76–82%) of the B. subtilis proteins that are essential or play a significant role in sporulation have close homologs in most species of both the Subtilis and the Cereus Clades. Finally, the identification of lineage-specific fingerprints by this study may allow for the future development of highly specific vaccines, therapeutic molecules, or rapid and low-cost molecular tests for species identification.
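The fingerprint definition above (core proteins of a lineage that are absent from every other lineage) reduces to set operations. The following is a minimal sketch, not the study's pipeline. It assumes each genome has already been reduced to a set of protein-family identifiers, and the lineage names, family labels, and function names are hypothetical.

```python
from typing import Dict, List, Set


def core_proteome(genomes: List[Set[str]]) -> Set[str]:
    """Protein families present in every genome of a lineage."""
    return set.intersection(*genomes) if genomes else set()


def lineage_fingerprints(lineages: Dict[str, List[Set[str]]]) -> Dict[str, Set[str]]:
    """Core families of each lineage that occur in no genome of any other lineage."""
    fingerprints: Dict[str, Set[str]] = {}
    for name, genomes in lineages.items():
        # Union of all families seen anywhere outside this lineage.
        elsewhere: Set[str] = set()
        for other_name, other_genomes in lineages.items():
            if other_name != name:
                elsewhere.update(*other_genomes)
        fingerprints[name] = core_proteome(genomes) - elsewhere
    return fingerprints


# Toy data: two hypothetical lineages with two genomes each.
lineages = {
    "lineage_A": [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}],
    "lineage_B": [{"f1", "f5", "f6"}, {"f1", "f5", "f3"}],
}

fps = lineage_fingerprints(lineages)
print(fps)
```

Here f1 is core in both lineages and is therefore not a fingerprint of either, while f2 and f5 are each core in one lineage and absent from the other.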
|
31
|
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. To address this issue, a brainstorming meeting funded by the National Science Foundation was held on 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
| | - Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Lucy J Colwell
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
| | - Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
| | - Anita de Waard
- Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
| | - Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
| | - Gang Fang
- NYU-Shanghai, Shanghai 200120, China
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
| | - John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
| | | | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Darren A Natale
- Georgetown University Medical Center, Washington, DC 20007, USA
| | - William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
| | - Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
| | - Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
| | - Jeffrey D Rudolf
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
| | - Lana Saleh
- New England Biolabs, Ipswich, MA 01938, USA
| | - Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
| | - Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
| | - Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| | | | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| |
|
32
|
|
33
|
Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J. An open invitation to the Understudied Proteins Initiative. Nat Biotechnol 2022; 40:815-817. [PMID: 35534555 DOI: 10.1038/s41587-022-01316-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/01/2023]
Affiliation(s)
- Georg Kustatscher
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.
| | | | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Emma Lundberg
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden; Department of Bioengineering, Stanford University, Stanford, CA, USA; Department of Pathology, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA
| | - Markus Ralser
- Department of Biochemistry, Charité University Medicine, Berlin, Germany; The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
| | - Juri Rappsilber
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK; Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
| |
|
34
|
Higgins DP, Weisman CM, Lui DS, D'Agostino FA, Walker AK. Defining characteristics and conservation of poorly annotated genes in Caenorhabditis elegans using WormCat 2.0. Genetics 2022; 221:6588682. [PMID: 35587742 PMCID: PMC9339291 DOI: 10.1093/genetics/iyac085] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/31/2022] [Accepted: 05/04/2022] [Indexed: 12/03/2022]
Abstract
Omics tools provide broad datasets for biological discovery. However, the computational tools for identifying important genes or pathways in RNA-seq, proteomics, or GWAS (Genome-Wide Association Study) data depend on Gene Ontology annotations and are biased toward well-described pathways. This limits their utility as poorly annotated genes, which could have novel functions, are often passed over. Recently, we developed an annotation and category enrichment tool for Caenorhabditis elegans genomic data, WormCat, which provides an intuitive visualization output. Unlike Gene Ontology-based enrichment tools, which exclude genes with no annotation information, WormCat 2.0 retains these genes as a special UNASSIGNED category. Here, we show that the UNASSIGNED gene category enrichment exhibits tissue-specific expression patterns and can include genes with biological functions identified in published datasets. Poorly annotated genes are often considered to be potentially species-specific and thus, of reduced interest to the biomedical community. Instead, we find that around 3% of the UNASSIGNED genes have human orthologs, including some linked to human diseases. These human orthologs themselves have little annotation information. A recently developed method that incorporates lineage relationships (abSENSE) indicates that the failure of BLAST to detect homology explains the apparent lineage specificity for many UNASSIGNED genes. This suggests that a larger subset could be related to human genes. WormCat provides an annotation strategy that allows the association of UNASSIGNED genes with specific phenotypes and known pathways. Building these associations in C. elegans, with its robust genetic tools, provides a path to further functional study and insight into these understudied genes.
Affiliation(s)
- Daniel P Higgins
- Program in Molecular Medicine, UMASS Chan Medical School, Worcester MA 01605, USA
| | - Caroline M Weisman
- Lewis-Sigler Institute for Quantitative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Dominique S Lui
- Program in Molecular Medicine, UMASS Chan Medical School, Worcester MA 01605, USA
| | - Frank A D'Agostino
- Department of Applied Mathematics, Harvard University, Cambridge MA 02138, USA
| | - Amy K Walker
- Program in Molecular Medicine, UMASS Chan Medical School, Worcester MA 01605, USA
| |
|
35
|
Delmont TO, Gaia M, Hinsinger DD, Frémont P, Vanni C, Fernandez-Guerra A, Eren AM, Kourlaiev A, d'Agata L, Clayssen Q, Villar E, Labadie K, Cruaud C, Poulain J, Da Silva C, Wessner M, Noel B, Aury JM, de Vargas C, Bowler C, Karsenti E, Pelletier E, Wincker P, Jaillon O. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics 2022; 2:100123. [PMID: 36778897 PMCID: PMC9903769 DOI: 10.1016/j.xgen.2022.100123] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 02/16/2021] [Revised: 12/10/2021] [Accepted: 04/04/2022] [Indexed: 12/20/2022]
Abstract
Marine planktonic eukaryotes play critical roles in global biogeochemical cycles and climate. However, their poor representation in culture collections limits our understanding of the evolutionary history and genomic underpinnings of planktonic ecosystems. Here, we used 280 billion Tara Oceans metagenomic reads from polar, temperate, and tropical sunlit oceans to reconstruct and manually curate more than 700 abundant and widespread eukaryotic environmental genomes ranging from 10 Mbp to 1.3 Gbp. This genomic resource covers a wide range of poorly characterized eukaryotic lineages that complement long-standing contributions from culture collections while better representing plankton in the upper layer of the oceans. We performed the first, to our knowledge, comprehensive genome-wide functional classification of abundant unicellular eukaryotic plankton, revealing four major groups connecting distantly related lineages. Neither trophic modes of plankton nor its vertical evolutionary history could completely explain the functional repertoire convergence of major eukaryotic lineages that coexisted within oceanic currents for millions of years.
Affiliation(s)
- Tom O. Delmont
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Morgan Gaia
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Damien D. Hinsinger
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Paul Frémont
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Chiara Vanni
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
| | - Antonio Fernandez-Guerra
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - A. Murat Eren
- Helmholtz Institute for Functional Marine Biodiversity at Oldenburg, Germany
| | - Artem Kourlaiev
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Leo d'Agata
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Quentin Clayssen
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Emilie Villar
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
| | - Karine Labadie
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Corinne Cruaud
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Julie Poulain
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Marc Wessner
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Colomban de Vargas
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France
| | - Chris Bowler
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Eric Karsenti
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France
- Directors’ Research, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Eric Pelletier
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
| | - Olivier Jaillon
- Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
|
36
|
Lario S, Ramírez-Lázaro MJ, Brunet-Vega A, Vila-Casadesús M, Aransay AM, Lozano JJ, Calvet X. Coding and non-coding co-expression network analysis identifies key modules and driver genes associated with precursor lesions of gastric cancer. Genomics 2022; 114:110370. [PMID: 35430283 DOI: 10.1016/j.ygeno.2022.110370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 03/23/2022] [Accepted: 04/11/2022] [Indexed: 01/14/2023]
Abstract
BACKGROUND: Helicobacter pylori infection is the most important risk factor for gastric cancer (GC). Human gastric adenocarcinoma develops after long-term H. pylori infection via the Correa cascade. This carcinogenic pathway describes the progression from gastritis to atrophy, intestinal metaplasia (IM), dysplasia and GC. Patients with atrophy and intestinal metaplasia are considered to have precancerous lesions of GC (PLGC). H. pylori eradication and endoscopy surveillance are currently the only interventions for preventing GC. Better knowledge of the biology of human PLGC may help find stratification markers and contribute to better understanding of biological mechanisms. One way to achieve this is by using co-expression network analysis. Weighted gene co-expression network analysis (WGCNA) is often used to identify modules from co-expression networks and relate them to clinical traits. It also allows identification of driver genes that may be critical for PLGC. AIM: The purpose of this study was to identify co-expression modules and differential gene expression in dyspeptic patients at different stages of the Correa pathway. METHODS: We studied 96 gastric biopsies from 78 patients that were clinically classified as: non-active (n = 10) and chronic-active gastritis (n = 20), atrophy (n = 12), and IM (n = 36). Gene expression of coding RNAs was determined by microarrays and non-coding RNAs by RNA-seq. The WGCNA package was used for network construction, module detection, module preservation and hub and driver gene selection. RESULTS: WGCNA identified 20 modules for coding RNAs and 4 for each miRNA and small RNA class. Modules were associated with antrum and corpus gastric locations, chronic gastritis and IM. Notably, coding RNA modules correlated with the Correa cascade. One was associated with the presence of H. pylori. In three modules, the module eigengene (ME) gradually increased in the stages toward IM, while in three others the inverse relationship was found. One miRNA module was negatively correlated to IM and was used for an mRNA-miRNA integration analysis. WGCNA also uncovered driver genes. Driver genes both show high connectivity within a module and are significantly associated with clinical traits. Some of those genes have been previously implicated in H. pylori carcinogenesis, but others are new. Lastly, using similar external transcriptomic data, we confirmed that the discovered mRNA modules were highly preserved. CONCLUSION: Our analysis captured co-expression modules that provide valuable information to understand the pathogenesis of the progression of PLGC.
Affiliation(s)
- Sergio Lario
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Digestive Diseases Unit, Hospital Universitari Parc Taulí, Institut d'Investigació i Innovació Parc Taulí I3PT, Universitat Autònoma de Barcelona, Sabadell, Spain.
| | - María J Ramírez-Lázaro
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Digestive Diseases Unit, Hospital Universitari Parc Taulí, Institut d'Investigació i Innovació Parc Taulí I3PT, Universitat Autònoma de Barcelona, Sabadell, Spain
| | - Anna Brunet-Vega
- Oncology Unit, Hospital Universitari Parc Taulí, Institut d'Investigació i Innovació Parc Taulí I3PT, Universitat Autònoma de Barcelona, Sabadell, Spain
| | - Maria Vila-Casadesús
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Bioinformatics Platform, CIBEREHD, Barcelona, Spain
| | - Ana M Aransay
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Genome Analysis Platform, CIC bioGUNE, Bizkaia Technology Park, Derio, Bizkaia, Spain
| | - Juan J Lozano
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Bioinformatics Platform, CIBEREHD, Barcelona, Spain
| | - Xavier Calvet
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III, Madrid, Spain; Digestive Diseases Unit, Hospital Universitari Parc Taulí, Institut d'Investigació i Innovació Parc Taulí I3PT, Universitat Autònoma de Barcelona, Sabadell, Spain; Departament de Medicina, UAB, Sabadell, Spain
|
37
|
Oliver SG. From Petri Plates to Petri Nets, a revolution in yeast biology. FEMS Yeast Res 2022; 22:6526310. [PMID: 35142857 PMCID: PMC8862034 DOI: 10.1093/femsyr/foac008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 01/26/2022] [Accepted: 02/07/2022] [Indexed: 11/22/2022] Open
Affiliation(s)
- Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
|
38
|
A deep learning model to detect novel pore-forming proteins. Sci Rep 2022; 12:2013. [PMID: 35132124 PMCID: PMC8821639 DOI: 10.1038/s41598-022-05970-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 01/12/2022] [Indexed: 11/09/2022] Open
Abstract
Many pore-forming proteins originating from pathogenic bacteria are toxic against agricultural pests. They are the key ingredients in several pesticidal products for agricultural use, including transgenic crops. There is an urgent need to identify novel pore-forming proteins to combat development of resistance in pests to existing products, and to develop products that are effective against a broader range of pests. Existing computational methodologies to search for these proteins rely on sequence homology-based approaches. These approaches are based on similarities between protein sequences, and thus are limited in their usefulness for discovering novel proteins. In this paper, we outline a novel deep learning model trained on pore-forming proteins from the public domain. We compare different ways of encoding protein information during training, and contrast it with traditional approaches. We show that our model is capable of identifying known pore formers with no sequence similarity to the proteins used to train the model, and therefore holds promise for identifying novel pore formers.
|
39
|
Rodriguez-Lopez M, Anver S, Cotobal C, Kamrad S, Malecki M, Correia-Melo C, Hoti M, Townsend S, Marguerat S, Pong SK, Wu MY, Montemayor L, Howell M, Ralser M, Bähler J. Functional profiling of long intergenic non-coding RNAs in fission yeast. eLife 2022; 11:e76000. [PMID: 34984977 PMCID: PMC8730722 DOI: 10.7554/elife.76000] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 12/19/2022] Open
Abstract
Eukaryotic genomes express numerous long intergenic non-coding RNAs (lincRNAs) that do not overlap any coding genes. Some lincRNAs function in various aspects of gene regulation, but it is not clear in general to what extent lincRNAs contribute to the information flow from genotype to phenotype. To explore this question, we systematically analysed cellular roles of lincRNAs in Schizosaccharomyces pombe. Using seamless CRISPR/Cas9-based genome editing, we deleted 141 lincRNA genes to broadly phenotype these mutants, together with 238 diverse coding-gene mutants for functional context. We applied high-throughput colony-based assays to determine mutant growth and viability in benign conditions and in response to 145 different nutrient, drug, and stress conditions. These analyses uncovered phenotypes for 47.5% of the lincRNAs and 96% of the protein-coding genes. For 110 lincRNA mutants, we also performed high-throughput microscopy and flow cytometry assays, linking 37% of these lincRNAs with cell-size and/or cell-cycle control. With all assays combined, we detected phenotypes for 84 (59.6%) of all lincRNA deletion mutants tested. For complementary functional inference, we analysed colony growth of strains ectopically overexpressing 113 lincRNA genes under 47 different conditions. Of these overexpression strains, 102 (90.3%) showed altered growth under certain conditions. Clustering analyses provided further functional clues and relationships for some of the lincRNAs. These rich phenomics datasets associate lincRNA mutants with hundreds of phenotypes, indicating that most of the lincRNAs analysed exert cellular functions in specific environmental or physiological contexts. This study provides groundwork to further dissect the roles of these lincRNAs in the relevant conditions.
Affiliation(s)
- Maria Rodriguez-Lopez
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Shajahan Anver
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Cristina Cotobal
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Stephan Kamrad
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
- Charité Universitätsmedizin Berlin, Institute of Biochemistry, Berlin, Germany
| | - Michal Malecki
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Clara Correia-Melo
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
| | - Mimoza Hoti
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - StJohn Townsend
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
| | - Samuel Marguerat
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Sheng Kai Pong
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Mary Y Wu
- The Francis Crick Institute, High Throughput Screening, London, United Kingdom
| | - Luis Montemayor
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Michael Howell
- The Francis Crick Institute, High Throughput Screening, London, United Kingdom
| | - Markus Ralser
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
- Charité Universitätsmedizin Berlin, Institute of Biochemistry, Berlin, Germany
| | - Jürg Bähler
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
|
40
|
Harris MA, Rutherford KM, Hayles J, Lock A, Bähler J, Oliver SG, Mata J, Wood V. Fission stories: using PomBase to understand Schizosaccharomyces pombe biology. Genetics 2021; 220:6481557. [PMID: 35100366 PMCID: PMC9209812 DOI: 10.1093/genetics/iyab222] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 09/03/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
PomBase (www.pombase.org), the model organism database (MOD) for the fission yeast Schizosaccharomyces pombe, supports research within and beyond the S. pombe community by integrating and presenting genetic, molecular, and cell biological knowledge into intuitive displays and comprehensive data collections. With new content, novel query capabilities, and biologist-friendly data summaries and visualization, PomBase also drives innovation in the MOD community.
Affiliation(s)
- Midori A Harris
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. Corresponding author: (M.A.H.); (V.W.)
| | - Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Jacqueline Hayles
- Cell Cycle Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| | - Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Juan Mata
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. Corresponding author: (M.A.H.); (V.W.)
|
41
|
Rutherford KM, Harris MA, Oliferenko S, Wood V. JaponicusDB: rapid deployment of a model organism database for an emerging model species. Genetics 2021; 220:6481558. [PMID: 35380656 PMCID: PMC9209809 DOI: 10.1093/genetics/iyab223] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/22/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023] Open
Abstract
The fission yeast Schizosaccharomyces japonicus has recently emerged as a powerful system for studying the evolution of essential cellular processes, drawing on similarities as well as key differences between S. japonicus and the related, well-established model Schizosaccharomyces pombe. We have deployed the open-source, modular code and tools originally developed for PomBase, the S. pombe model organism database (MOD), to create JaponicusDB (www.japonicusdb.org), a new MOD dedicated to S. japonicus. By providing a central resource with ready access to a growing body of experimental data, ontology-based curation, seamless browsing and querying, and the ability to integrate new data with existing knowledge, JaponicusDB supports fission yeast biologists to a far greater extent than any other source of S. japonicus data. JaponicusDB thus enables S. japonicus researchers to realize the full potential of studying a newly emerging model species and illustrates the widely applicable power and utility of harnessing reusable PomBase code to build a comprehensive, community-maintainable repository of species-relevant knowledge.
Affiliation(s)
- Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Midori A Harris
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Snezhana Oliferenko
- The Francis Crick Institute, London NW1 1AT, UK; Randall Centre for Cell and Molecular Biophysics, School of Basic and Medical Biosciences, King’s College London, London SE1 1UL, UK. Corresponding author: (S.O.); (V.W.)
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. Corresponding author: (S.O.); (V.W.)
|
42
|
Dupree EJ, Manzoor Z, Alwine S, Crimmins BS, Holsen TM, Darie CC. Proteomic analysis of the lake trout (Salvelinus namaycush) heart and blood: The beginning of a comprehensive lake trout protein database. Proteomics 2021; 22:e2100146. [PMID: 34676671 DOI: 10.1002/pmic.202100146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Revised: 10/14/2021] [Accepted: 10/15/2021] [Indexed: 11/07/2022]
Abstract
Lake trout (Salvelinus namaycush) are a top-predator species in the Laurentian Great Lakes that are often used as bioindicators of chemical stressors in the ecosystem. Although many studies are done using these fish to determine concentrations of stressors like legacy persistent, bioaccumulative and toxic chemicals, there are currently no proteomic studies on the biological effects these stressors have on the ecosystem. This lack of proteomic studies on Great Lakes lake trout is because there is currently no complete, comprehensive protein database for this species. Here, we employed proteomics approaches to develop a lake trout protein database that could aid in future research on this fish, in particular exposomics and adductomics. The current study utilized heart tissue and blood from two lake trout. Our previous work using lake trout liver revealed 4194 potential protein hits in the NCBI databases and 3811 potential protein hits in the UniProtKB databases. In the current study, using the NCBI databases we identified 838 proteins for the heart and 580 proteins for the blood tissues in the biological replicate 1 (BR1) and 1180 potential protein hits for the heart and 561 potential protein hits for the blood in BR2. Similar results were obtained using the UniProtKB databases. This study builds on our previous work by continuing to build the first comprehensive lake trout protein database and provides insight into protein homology through evolutionary relationships. This data is available via the PRIDE partner repository with the dataset identifier PXD023970.
Affiliation(s)
- Emmalyn J Dupree
- Biochemistry and Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, New York, USA
| | - Zaen Manzoor
- Biochemistry and Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, New York, USA
| | - Shelby Alwine
- Biochemistry and Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, New York, USA
| | - Bernard S Crimmins
- Department of Civil and Environmental Engineering, Clarkson University, Potsdam, New York, USA
- AEACS, LLC, New Kensington, Pennsylvania, USA
| | - Thomas M Holsen
- Department of Civil and Environmental Engineering, Clarkson University, Potsdam, New York, USA
| | - Costel C Darie
- Biochemistry and Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, New York, USA
|
43
|
Kamenetzky L, Maldonado LL, Cucher MA. Cestodes in the genomic era. Parasitol Res 2021; 121:1077-1089. [PMID: 34665308 DOI: 10.1007/s00436-021-07346-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2021] [Accepted: 10/10/2021] [Indexed: 12/20/2022]
Abstract
The first cestode genomes were obtained by an international consortium led by the Wellcome Sanger Institute that included representative institutions from countries where the sequenced parasites have been studied for decades, in part because they are etiological agents of endemic diseases (Argentina, Uruguay, Mexico, Canada, UK, Germany, Switzerland, Ireland, USA, Japan, and China). After this, several complete genomes were obtained, reaching 16 species to date. Cestode genomes are relatively small compared to those of other animals, including free-living flatworms. Moreover, genome size and repeat content seem to differ between the two analyzed orders: Cyclophyllidean species have smaller genomes with less repetitive content than Diphyllobothriidean species. On average, cestode genomes have 13,753 genes with 6 exons per gene and 41% GC content. More than 5,000 shared cestode proteins were accurately annotated by the integration of gene predictions and transcriptome evidence, with more than 40% of these proteins being of unknown function. Several gene losses and reductions of gene families were found and could be related to the extreme parasitic lifestyle of these species. The application of cutting-edge sequencing technology allowed the characterization of the terminal sequences of chromosomes, which possess unique characteristics. Here, we review the current status of knowledge of complete cestode genomes and place it within a comparative genomics perspective. Multidisciplinary work together with the implementation of new technologies will provide valuable information that can certainly improve our chances to finally eradicate or at least control diseases caused by cestodes.
Affiliation(s)
- Laura Kamenetzky
- iB3, Instituto de Biociencias, Departamento de Fisiología Y Biología Molecular Y Celular, Facultad de Ciencias Exactas Y Naturales, Universidad de Buenos Aires, Biotecnología y Biología traslacional, Ciudad Autónoma de Buenos Aires, Buenos Aires, Argentina.
| | - Lucas L Maldonado
- Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires, Argentina; Institute of Research On Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires, Argentina
| | - Marcela A Cucher
- Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires, Argentina; Institute of Research On Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires, Argentina
|
44
|
Rodenburg SYA, Seidl MF, de Ridder D, Govers F. Uncovering the Role of Metabolism in Oomycete-Host Interactions Using Genome-Scale Metabolic Models. Front Microbiol 2021; 12:748178. [PMID: 34707596 PMCID: PMC8543037 DOI: 10.3389/fmicb.2021.748178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2021] [Accepted: 09/10/2021] [Indexed: 12/17/2022] Open
Abstract
Metabolism is the set of biochemical reactions of an organism that enables it to assimilate nutrients from its environment and to generate building blocks for growth and proliferation. It forms a complex network that is intertwined with the many molecular and cellular processes that take place within cells. Systems biology aims to capture the complexity of cells, organisms, or communities by reconstructing models based on information gathered by high-throughput analyses (omics data) and prior knowledge. One type of model is a genome-scale metabolic model (GEM) that allows studying the distributions of metabolic fluxes, i.e., the "mass-flow" through the network of biochemical reactions. GEMs are nowadays widely applied and have been reconstructed for various microbial pathogens, either in a free-living state or in interaction with their hosts, with the aim of gaining insight into mechanisms of pathogenicity. In this review, we first introduce the principles of systems biology and GEMs. We then describe how metabolic modeling can contribute to unraveling microbial pathogenesis and host-pathogen interactions, with a specific focus on oomycete plant pathogens and in particular Phytophthora infestans. Subsequently, we review achievements obtained so far and identify and discuss potential pitfalls of current models. Finally, we propose a workflow for reconstructing high-quality GEMs and elaborate on the resources needed to advance a systems biology approach aimed at untangling the intimate interactions between plants and pathogens.
Affiliation(s)
- Sander Y. A. Rodenburg
- Laboratory of Phytopathology, Wageningen University & Research, Wageningen, Netherlands
- Bioinformatics Group, Wageningen University & Research, Wageningen, Netherlands
| | - Michael F. Seidl
- Laboratory of Phytopathology, Wageningen University & Research, Wageningen, Netherlands
- Theoretical Biology & Bioinformatics group, Department of Biology, Utrecht University, Utrecht, Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, Wageningen, Netherlands
| | - Francine Govers
- Laboratory of Phytopathology, Wageningen University & Research, Wageningen, Netherlands
| |
|
45
|
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298, whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, co-evolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution-based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
| | - Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
| |
|
46
|
Podolsky IA, Schauer EE, Seppälä S, O'Malley MA. Identification of novel membrane proteins for improved lignocellulose conversion. Curr Opin Biotechnol 2021; 73:198-204. [PMID: 34482155 DOI: 10.1016/j.copbio.2021.08.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/01/2021] [Revised: 08/03/2021] [Accepted: 08/09/2021] [Indexed: 11/28/2022]
Abstract
Lignocellulose processing yields a heterogeneous mixture of substances, which are poorly utilized by current industrial strains. For efficient valorization of recalcitrant biomass, it is critical to identify and engineer new membrane proteins that enable the broad uptake of hydrolyzed substrates. Although glucose consumption rarely presents a bottleneck for cell factories, there is a lack of transporters that allow co-consumption of glucose with other abundant biomass sugars such as xylose. This review discusses recent efforts to bioinformatically identify membrane proteins of high biotech potential for lignocellulose conversion and metabolic engineering in both model and nonconventional organisms. Of particular interest are transporters sourced from anaerobic gut fungi resident in large herbivores, which produce Sugars Will Eventually be Exported Transporters (SWEETs) that enhance xylose transport in the yeast Saccharomyces cerevisiae and enable glucose and xylose co-utilization. Additionally, recently identified fungal cellodextrin transporters are valuable alternatives to mitigate glucose repression and transporter inhibition.
Affiliation(s)
- Igor A Podolsky
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA 93106, USA
| | - Elizabeth E Schauer
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA 93106, USA
| | - Susanna Seppälä
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA 93106, USA
| | - Michelle A O'Malley
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA 93106, USA; Joint BioEnergy Institute (JBEI), Emeryville, CA 94608, USA.
| |
|
47
|
Reed CJ, Hutinet G, de Crécy-Lagard V. Comparative Genomic Analysis of the DUF34 Protein Family Suggests Role as a Metal Ion Chaperone or Insertase. Biomolecules 2021; 11:1282. [PMID: 34572495 PMCID: PMC8469502 DOI: 10.3390/biom11091282] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/12/2021] [Revised: 08/20/2021] [Accepted: 08/24/2021] [Indexed: 12/12/2022] Open
Abstract
Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as "GTP cyclohydrolase I type 2" through electronic propagation based on one study. Here, the annotation status of this protein family was examined through a comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis combined with functional complementation studies strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox, and universal stress responses.
Affiliation(s)
- Colbie J. Reed
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
| | - Geoffrey Hutinet
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
| | - Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| |
|
48
|
Levine TP. TMEM106B in humans and Vac7 and Tag1 in yeast are predicted to be lipid transfer proteins. Proteins 2021; 90:164-175. [PMID: 34347309 DOI: 10.1002/prot.26201] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/13/2021] [Revised: 07/11/2021] [Accepted: 07/23/2021] [Indexed: 11/05/2022]
Abstract
TMEM106B is an integral membrane protein of late endosomes and lysosomes involved in neuronal function, its overexpression being associated with familial frontotemporal lobar degeneration, and point mutation linked to hypomyelination. It has also been identified in multiple screens for host proteins required for productive SARS-CoV-2 infection. Because standard approaches to understand TMEM106B at the sequence level find no homology to other proteins, it has remained a protein of unknown function. Here, the standard tool PSI-BLAST was used in a nonstandard way to show that the lumenal portion of TMEM106B is a member of the late embryogenesis abundant-2 (LEA-2) domain superfamily. More sensitive tools (HMMER, HHpred, and trRosetta) extended this to predict LEA-2 domains in two yeast proteins. One is Vac7, a regulator of PI(3,5)P2 production in the degradative vacuole, equivalent to the lysosome, which has a LEA-2 domain in its lumenal domain. The other is Tag1, another vacuolar protein, which signals to terminate autophagy and has three LEA-2 domains in its lumenal domain. Further analysis of LEA-2 structures indicated that LEA-2 domains have a long, conserved lipid-binding groove. This implies that TMEM106B, Vac7, and Tag1 may all be lipid transfer proteins in the lumen of late endocytic organelles.
|
49
|
Jakutis G, Stainier DYR. Genotype-Phenotype Relationships in the Context of Transcriptional Adaptation and Genetic Robustness. Annu Rev Genet 2021; 55:71-91. [PMID: 34314597 DOI: 10.1146/annurev-genet-071719-020342] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
Abstract
Genetic manipulations with a robust and predictable outcome are critical to investigate gene function, as well as for therapeutic genome engineering. For many years, knockdown approaches and reagents including RNA interference and antisense oligonucleotides dominated functional studies; however, with the advent of precise genome editing technologies, CRISPR-based knockout systems have become the state-of-the-art tools for such studies. These technologies have helped decipher the role of thousands of genes in development and disease. Their use has also revealed how limited our understanding of genotype-phenotype relationships is. The recent discovery that certain mutations can trigger the transcriptional modulation of other genes, a phenomenon called transcriptional adaptation, has provided an additional explanation for the contradicting phenotypes observed in knockdown versus knockout models and increased awareness about the use of each of these approaches. In this review, we first cover the strengths and limitations of different gene perturbation strategies. Then we highlight the diverse ways in which the genotype-phenotype relationship can be discordant between these different strategies. Finally, we review the genetic robustness mechanisms that can lead to such discrepancies, paying special attention to the recently discovered phenomenon of transcriptional adaptation.
Affiliation(s)
- Gabrielius Jakutis
- Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
| | - Didier Y R Stainier
- Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- German Centre for Cardiovascular Research (DZHK), Partner site Rhine-Main, 60590 Frankfurt am Main, Germany
- Excellence Cluster Cardio-Pulmonary Institute (CPI), 35392 Giessen, Germany
| |
|
50
|
Romila CA, Townsend S, Malecki M, Kamrad S, Rodríguez-López M, Hillson O, Cotobal C, Ralser M, Bähler J. Barcode sequencing and a high-throughput assay for chronological lifespan uncover ageing-associated genes in fission yeast. Microb Cell 2021; 8:146-160. [PMID: 34250083 PMCID: PMC8246024 DOI: 10.15698/mic2021.07.754] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/03/2021] [Revised: 04/20/2021] [Accepted: 04/26/2021] [Indexed: 12/15/2022]
Abstract
Ageing-related processes are largely conserved, with simple organisms remaining the main platform to discover and dissect new ageing-associated genes. Yeasts provide potent model systems to study cellular ageing owing to their amenability to systematic functional assays under controlled conditions. Even with yeast cells, however, ageing assays can be laborious and resource-intensive. Here we present improved experimental and computational methods to study chronological lifespan in Schizosaccharomyces pombe. We decoded the barcodes for 3206 mutants of the latest gene-deletion library, enabling the parallel profiling of ~700 additional mutants compared to previous screens. We then applied a refined method of barcode sequencing (Bar-seq), addressing technical and statistical issues raised by persisting DNA in dead cells and sampling bottlenecks in aged cultures, to screen for mutants showing altered lifespan during stationary phase. This screen identified 341 long-lived mutants and 1246 short-lived mutants, which point to many previously unknown ageing-associated genes, including 46 conserved but entirely uncharacterized genes. The ageing-associated genes showed coherent enrichments in processes also associated with human ageing, particularly with respect to ageing in non-proliferative brain cells. We also developed an automated colony-forming unit assay to facilitate medium- to high-throughput chronological-lifespan studies by saving time and resources compared to the traditional assay. Results from the Bar-seq screen showed good agreement with this new assay. This study provides an effective methodological platform and identifies many new ageing-associated genes as a framework for analysing cellular ageing in yeast and beyond.
Affiliation(s)
- Catalina A. Romila
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
- These authors contributed equally
| | - StJohn Townsend
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, NW1 1AT, UK
- These authors contributed equally
| | - Michal Malecki
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
- Current address: Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Poland
| | - Stephan Kamrad
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, NW1 1AT, UK
- Current address: Charité Universitätsmedizin Berlin, Department of Biochemistry, Germany
| | - María Rodríguez-López
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
| | - Olivia Hillson
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
| | - Cristina Cotobal
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
| | - Markus Ralser
- The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, NW1 1AT, UK
- Charité Universitätsmedizin Berlin, Department of Biochemistry, Germany
| | - Jürg Bähler
- Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, UK
| |
|