1
Rubach P, Sikora M, Jarmolinska A, Perlinska A, Sulkowska J. AlphaKnot 2.0: a web server for the visualization of proteins' knotting and a database of knotted AlphaFold-predicted models. Nucleic Acids Res 2024; 52:W187-W193. [PMID: 38842945 PMCID: PMC11223836 DOI: 10.1093/nar/gkae443] [Received: 03/16/2024] [Revised: 04/29/2024] [Accepted: 05/10/2024] [Indexed: 07/06/2024]
Abstract
The availability of 3D protein models is rapidly increasing with the development of structure prediction algorithms. As more data become available, new ways of analyzing these predictions, especially topological analysis, are becoming necessary. Here, we present the updated version of the AlphaKnot service, which provides a straightforward way of analyzing structure topology. It was designed specifically to determine the knot types of predicted structure models; however, it can be used for all structures, including experimentally solved ones. AlphaKnot 2.0 gives users the information necessary to assess the topological correctness of a model. Both probabilistic and deterministic knot detection methods are available, together with various visualizations (including a trajectory of simplification steps that highlights topological complexities). Moreover, the web server lists proteins in AlphaKnot's database that are similar to the queried model and returns their knot types for direct comparison. We pre-calculated the topology of high-quality models from the AlphaFold Database (version 4), and more than 680,000 knotted models are now available in the AlphaKnot database. AlphaKnot 2.0 is available at https://alphaknot.cent.uw.edu.pl/.
Affiliation(s)
- Pawel Rubach
- Warsaw School of Economics, Al. Niepodleglosci 162, 02-554 Warsaw, Poland
- Maciej Sikora
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Agata P Perlinska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Joanna I Sulkowska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland

2
Hirsch P, Molano LA, Engel A, Zentgraf J, Rahmann S, Hannig M, Müller R, Kern F, Keller A, Schmartz G. Mibianto: ultra-efficient online microbiome analysis through k-mer based metagenomics. Nucleic Acids Res 2024; 52:W407-W414. [PMID: 38716863 PMCID: PMC11223814 DOI: 10.1093/nar/gkae364] [Received: 02/19/2024] [Revised: 04/03/2024] [Accepted: 04/24/2024] [Indexed: 07/06/2024]
Abstract
Quantifying microbiome species and composition from metagenomic assays is often challenging because it is time-consuming and computationally complex. In bioinformatics, k-mer-based approaches have long been established to expedite the analysis of large sequencing data and are now widely used to annotate metagenomic data. We use k-mer counting techniques for efficient and accurate compositional analysis of microbiota from whole metagenome sequencing. Mibianto addresses this problem by operating directly on read files, without manual preprocessing or complete data exchange. It handles diverse sequencing platforms, including short single-end, paired-end, and long-read technologies. Our sketch-based workflow significantly reduces the data volume transferred from the user to the server (up to 99.59% size reduction) and then performs taxonomic profiling with enhanced efficiency and privacy. Mibianto offers functionality beyond k-mer quantification; it supports advanced community composition estimation, including diversity, ordination, and differential abundance analysis. Our tool aids in the standardization of computational workflows, thus supporting the reproducibility of scientific sequencing studies. It is adaptable to small- and large-scale experimental designs and offers a user-friendly interface, making it an invaluable tool for both clinical and research-oriented metagenomic studies. Mibianto is freely available without the need for a login at: https://www.ccb.uni-saarland.de/mibianto.
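The abstract does not say which sketch Mibianto uses. As an illustration of why a k-mer sketch shrinks the upload, here is a minimal sketch assuming a FracMinHash-style ("scaled") sketch of the kind popularized by sourmash: only canonical k-mers whose 64-bit hash falls below 2^64/scaled are kept, so in expectation about 1/scaled of the distinct k-mers leave the client. The function names and the BLAKE2b hash are hypothetical choices for this example, not Mibianto's implementation.

```python
import hashlib
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def hash64(kmer: str) -> int:
    """Deterministic 64-bit hash (8-byte BLAKE2b digest)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash_sketch(counts, scaled):
    """Keep k-mers whose hash is below 2**64 // scaled (about 1/scaled of them, in expectation)."""
    threshold = (2 ** 64) // scaled
    return {kmer: n for kmer, n in counts.items() if hash64(kmer) < threshold}


reads = ["ACGTACGTTAGCATCGATCGGATCCA", "TTGCATCGATCGGATCCAACGTACGT"]
counts = count_kmers(reads, k=11)
sketch = frac_minhash_sketch(counts, scaled=4)
print(len(counts), len(sketch))  # only the sketch, not the reads, would be sent
```

Because the hash threshold is fixed, sketches from different samples keep the same subset of k-mer space and remain directly comparable, which is what makes server-side profiling on sketches possible.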
Affiliation(s)
- Pascal Hirsch
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Annika Engel
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Jens Zentgraf
- Algorithmic Bioinformatics, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Sven Rahmann
- Algorithmic Bioinformatics, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Matthias Hannig
- Clinic of Operative Dentistry, Periodontology and Preventive Dentistry, Saarland University Hospital, Saarland University, Kirrberger Str. 100, Building 73, 66421 Homburg, Saar, Germany
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- German Center for Infection Research (DZIF), Partner Site Hannover-Braunschweig, 38124 Braunschweig, Germany
- PharmaScienceHub, 66123 Saarbrücken, Germany
- Fabian Kern
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Andreas Keller
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- PharmaScienceHub, 66123 Saarbrücken, Germany
- Georges P Schmartz
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany

3
Nestl BM, Nebel BA, Resch V, Schürmann M, Tischler D. The Development and Opportunities of Predictive Biotechnology. Chembiochem 2024; 25:e202300863. [PMID: 38713151 DOI: 10.1002/cbic.202300863] [Received: 12/22/2023] [Revised: 04/05/2024] [Indexed: 05/08/2024]
Abstract
Recent advances in the bioeconomy allow a holistic view of existing and new process chains and enable novel production routines that academia and industry continuously advance. All this progress benefits from a growing number of prediction tools that have found their way into the field. For example, automated genome annotation, tools for building protein model structures, and structure prediction methods such as AlphaFold2™ or RoseTTAFold have gained popularity in recent years. It has recently become apparent that more and more AI-based tools are being developed and used for biocatalysis and biotechnology. This is an excellent opportunity for academia and industry to further accelerate advancements in the field. Biotechnology, as a rapidly growing interdisciplinary field, stands to benefit greatly from these developments.
Affiliation(s)
- Bettina M Nestl
- Joint working group on biotransformations of the Association for General and Applied Microbiology (VAAM) and the Society for Chemical Engineering and Biotechnology (DECHEMA), Theodor-Heuss-Allee 25, 60486 Frankfurt, Germany
- Innophore GmbH, Am Eisernen Tor 3, 8010 Graz, Austria
- Bernd A Nebel
- Innophore GmbH, Am Eisernen Tor 3, 8010 Graz, Austria
- Verena Resch
- Innophore GmbH, Am Eisernen Tor 3, 8010 Graz, Austria
- Martin Schürmann
- Joint working group on biotransformations of the Association for General and Applied Microbiology (VAAM) and the Society for Chemical Engineering and Biotechnology (DECHEMA), Theodor-Heuss-Allee 25, 60486 Frankfurt, Germany
- InnoSyn B.V., Urmonderbaan 22, 6167 RD Geleen, The Netherlands
- SynSilico B.V., Urmonderbaan 22, 6167 RD Geleen, The Netherlands
- Dirk Tischler
- Joint working group on biotransformations of the Association for General and Applied Microbiology (VAAM) and the Society for Chemical Engineering and Biotechnology (DECHEMA), Theodor-Heuss-Allee 25, 60486 Frankfurt, Germany
- Microbial Biotechnology, Ruhr University Bochum, Universitätsstrasse 150, 44780 Bochum, Germany

4
Meng L, Zhou B, Liu H, Chen Y, Yuan R, Chen Z, Luo S, Chen H. Advancing toxicity studies of per- and poly-fluoroalkyl substances (PFASs) through machine learning: Models, mechanisms, and future directions. Sci Total Environ 2024; 946:174201. [PMID: 38936709 DOI: 10.1016/j.scitotenv.2024.174201] [Received: 01/18/2024] [Revised: 06/17/2024] [Accepted: 06/20/2024] [Indexed: 06/29/2024]
Abstract
Per- and polyfluoroalkyl substances (PFASs), encompassing a vast array of isomeric chemicals, are recognized as typical emerging contaminants with direct or potential impacts on human health and the ecological environment. Given the complex and elusive toxicological profiles of PFASs, machine learning (ML) has been increasingly employed in their toxicity studies owing to its strengths in prediction and data analytics. Propelled by swift advances in computational technology, this integration is poised to become a predominant trend in environmental toxicology. This review examines the literature to summarize the varied objectives of employing ML in PFAS toxicity studies: (1) using ML to establish Quantitative Structure-Activity Relationship (QSAR) models for PFASs with diverse toxicity endpoints, enabling targeted toxicity prediction for unidentified PFASs; (2) investigating and substantiating the Adverse Outcome Pathway (AOP) by combining ML with traditional toxicological methods, thereby refining the toxicity assessment framework for PFASs; and (3) dissecting and interpreting the features of established ML models to advance open research into PFAS toxicity, with a primary focus on determinants and mechanisms. The discussion extends to an in-depth examination of ML studies, grouping findings by their distinct application trajectories. Given that ML represents a nascent paradigm within PFAS research, this review outlines the collective challenges encountered in ML-mediated studies of PFAS toxicity and offers strategic guidance for future investigations.
Affiliation(s)
- Lingxuan Meng
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beihai Zhou
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Haijun Liu
- School of Resources and Environment, Anqing Normal University, Anqing, China
- Yuefang Chen
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Rongfang Yuan
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Zhongbing Chen
- Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Kamýcká 129, 16500 Praha-Suchdol, Czech Republic
- Shuai Luo
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Huilun Chen
- Beijing Key Laboratory of Resource-oriented Treatment of Industrial Pollutants, School of Energy and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083, China

5
Norton-Baker B, Denton MCR, Murphy NP, Fram B, Lim S, Erickson E, Gauthier NP, Beckham GT. Enabling high-throughput enzyme discovery and engineering with a low-cost, robot-assisted pipeline. Sci Rep 2024; 14:14449. [PMID: 38914665 PMCID: PMC11196671 DOI: 10.1038/s41598-024-64938-0] [Received: 04/08/2024] [Accepted: 06/14/2024] [Indexed: 06/26/2024]
Abstract
As genomic databases expand and artificial intelligence tools advance, there is a growing demand for efficient characterization of large numbers of proteins. To this end, here we describe a generalizable pipeline for high-throughput protein purification using small-scale expression in E. coli and an affordable liquid-handling robot. This low-cost platform enables the purification of 96 proteins in parallel with minimal waste and is scalable for processing hundreds of proteins weekly per user. We demonstrate the performance of this method with the expression and purification of the leading poly(ethylene terephthalate) hydrolases reported in the literature. Replicate experiments demonstrated reproducibility, as well as enzyme purity and yields (up to 400 µg) sufficient for comprehensive analyses of both thermostability and activity, generating a standardized benchmark dataset for comparing these plastic-degrading enzymes. The cost-effectiveness and ease of implementation of this platform render it broadly applicable to diverse protein characterization challenges in the biological sciences.
Grants
- DE-SC0022024 U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER), Genomic Science Program
- DE-AC36-08GO28308 Advanced Materials and Manufacturing Technologies Office (AMMTO)
- U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office (BETO)
- Bio-Optimized Technologies to keep Thermoplastics out of Landfills and the Environment (BOTTLE) Consortium
- Dana-Farber Cancer Institute
Affiliation(s)
- Brenna Norton-Baker
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Agile BioFoundry, Emeryville, CA, USA
- Mackenzie C R Denton
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Natasha P Murphy
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Benjamin Fram
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Samuel Lim
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Erika Erickson
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Nicholas P Gauthier
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Gregg T Beckham
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Agile BioFoundry, Emeryville, CA, USA

6
Iwaszkiewicz-Eggebrecht E, Zizka V, Lynggaard C. Three steps towards comparability and standardization among molecular methods for characterizing insect communities. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230118. [PMID: 38705189 PMCID: PMC11070264 DOI: 10.1098/rstb.2023.0118] [Received: 09/29/2023] [Accepted: 12/10/2023] [Indexed: 05/07/2024]
Abstract
Molecular methods are currently some of the best-suited technologies for implementation in insect monitoring. However, the field is developing rapidly and lacks agreement on methodology or community standards. To apply DNA-based methods in large-scale monitoring, and to gain insight across commensurate data, we need easy-to-implement standards that improve data comparability. Here, we provide three recommendations for how to improve and harmonize efforts in biodiversity assessment and monitoring via metabarcoding: (i) we should adopt the use of synthetic spike-ins, which will act as positive controls and internal standards; (ii) we should consider using several markers through a multiplex polymerase chain reaction (PCR) approach; and (iii) we should commit to the publication and transparency of all protocol-associated metadata in a standardized fashion. For (i), we provide a ready-to-use recipe for synthetic cytochrome c oxidase spike-ins, which enable between-sample comparisons. For (ii), we propose two gene regions for the implementation of multiplex PCR approaches, thereby achieving a more comprehensive community description. For (iii), we offer guidelines for transparent and unified reporting of field, wet-laboratory and dry-laboratory procedures, as a key to making comparisons between studies. Together, we feel that these three advances will result in joint quality and calibration standards rather than the current laboratory-specific proof of concepts. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
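Recommendation (i) rests on a simple calibration: because a known quantity of synthetic spike-in is added to every sample, read counts can be rescaled to a common reference. The sketch below is a minimal illustration assuming that reads scale linearly with template copies for both the insect taxa and the spike-in (it ignores amplification bias between templates). The function name and the copy-number input are hypothetical and not taken from the article's recipe.

```python
def spikein_normalize(taxon_reads, spike_reads, spike_copies_added):
    """Convert raw read counts into estimated template copies via a spike-in.

    Assumes reads are proportional to template copies, so each read stands
    for spike_copies_added / spike_reads copies in this sample.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in not detected; sample cannot be calibrated")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: n * copies_per_read for taxon, n in taxon_reads.items()}


# Two samples with the same community but 4x different sequencing depth.
shallow = spikein_normalize({"taxon1": 1000}, spike_reads=500, spike_copies_added=10_000)
deep = spikein_normalize({"taxon1": 4000}, spike_reads=2000, spike_copies_added=10_000)
print(shallow, deep)  # {'taxon1': 20000.0} {'taxon1': 20000.0}
```

Raw counts differ fourfold between the two samples, but after scaling by the spike-in they agree, which is the kind of between-sample comparison the spike-ins are meant to enable.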
Affiliation(s)
- Ela Iwaszkiewicz-Eggebrecht
- Bioinformatics and Genetics Department, Swedish Museum of Natural History, PO Box 50007, Stockholm, 104 05, Sweden
- Vera Zizka
- Leibniz Institute for the Analysis of Biodiversity Change, Museum Koenig Bonn, 53113, Germany
- Christina Lynggaard
- Section for Molecular Ecology & Evolution, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark

7
Krause GR, Shands W, Wheeler TJ. Sensitive and error-tolerant annotation of protein-coding DNA with BATH. Bioinform Adv 2024; 4:vbae088. [PMID: 38966592 PMCID: PMC11223822 DOI: 10.1093/bioadv/vbae088] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 06/10/2024] [Indexed: 07/06/2024]
Abstract
Summary: We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base and simplifies the workflow for pHMM-based translated sequence annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 when annotating sequences that contain no errors, and achieves higher accuracy than all tested tools when annotating sequences that contain nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is the case with long-read sequencing data and pseudogenes.
Availability and implementation: The software is available at https://github.com/TravisWheelerLab/BATH.
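To see why frameshift-inducing indels defeat standard translated search, the minimal sketch below (not BATH's algorithm) translates a short coding sequence with the standard genetic code (NCBI table 1) before and after a single-nucleotide deletion. Every codon downstream of the deletion is read in the wrong frame, and the original protein is recovered only by switching frames, which is what a frameshift-aware aligner must model. The example sequence and helper names are hypothetical.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate A/C/G/T DNA in frame 0; '*' marks stops, a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGGCTGAAAAATGGTAA"
mutant = original[:4] + original[5:]  # delete one nucleotide (index 4)

print(translate(original))    # MAEKW*
print(translate(mutant))      # MVKNG  (downstream residues and the stop are lost)
print(translate(mutant[2:]))  # GEKW*  (the original downstream protein, in another frame)
```

A conventional translated search sees only one frame at a time, so the match breaks at the deletion; an aligner that allows a frame change at a penalty can stitch `M A` and `E K W *` back together.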
Affiliation(s)
- Genevieve R Krause
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Walt Shands
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Genomics Institute, UC Santa Cruz, Santa Cruz, CA 95060, United States
- Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States

8
Hamamsy T, Morton JT, Blackwell R, Berenberg D, Carriero N, Gligorijevic V, Strauss CEM, Leman JK, Cho K, Bonneau R. Protein remote homology detection and structural alignment using deep learning. Nat Biotechnol 2024; 42:975-985. [PMID: 37679542 PMCID: PMC11180608 DOI: 10.1038/s41587-023-01917-2] [Received: 09/07/2022] [Accepted: 07/26/2023] [Indexed: 09/09/2023]
Abstract
Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
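TM-Vec is trained to predict TM-scores. For reference, the minimal sketch below computes the standard TM-score for one fixed superposition: TM = (1/L_target) × Σ_i 1/(1 + (d_i/d0)²), summed over aligned residue pairs, with d0 = 1.24·(L_target − 15)^(1/3) − 1.8 (in angstroms). It assumes the distances d_i have already been measured after superposition; the full TM-score also maximizes this sum over superpositions, which the sketch omits. Function names are hypothetical, and very short targets (where d0 is not positive) are simply rejected.

```python
def tm_d0(target_length: int) -> float:
    """Distance scale d0 (angstroms) from the TM-score definition."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for targets of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for this sketch: d0 is not positive")
    return d0


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs after superposition.
    target_length: length of the reference protein used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect full-length match scores 1; aligning only half the residues caps the score at 0.5.
print(tm_score([0.0] * 100, 100))  # 1.0
print(tm_score([0.0] * 50, 100))   # 0.5
```

Normalizing by the target length rather than the number of aligned pairs is what penalizes partial alignments, and the length-dependent d0 makes scores comparable across protein sizes.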
Grants
- R35GM122515 National Science Foundation (NSF)
- IOS-1546218 National Science Foundation (NSF)
- R35 GM122515 NIGMS NIH HHS
- R01 DK103358 NIDDK NIH HHS
- CBET-1728858 National Science Foundation (NSF)
- R01 AI130945 NIAID NIH HHS
- This research was supported by NIH R01DK103358, the Simons Foundation, NSF IOS-1546218, R35GM122515, NSF CBET-1728858, and NIH R01AI130945 to T.H. This research was supported by the intramural research program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) to J.T.M. This research was supported by the Flatiron Institute as part of the Simons Foundation to Robert Blackwell, J.K.L., and N.C. This research was supported by Los Alamos National Lab to C.S. This research was supported by the Samsung Advanced Institute of Technology (Next Generation Deep Learning: from pattern recognition to AI), Samsung Research (Improving Deep Learning using Latent Structure), and NSF Award 1922658 to K.C.
- Simons Foundation
- U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
Affiliation(s)
- Tymor Hamamsy
- Center for Data Science, New York University, New York, NY, USA
- James T Morton
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Robert Blackwell
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Daniel Berenberg
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Prescient Design, New York, NY, USA
- Nicholas Carriero
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, USA
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Prescient Design, New York, NY, USA
- CIFAR, Toronto, Ontario, Canada
- Richard Bonneau
- Center for Data Science, New York University, New York, NY, USA
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Prescient Design, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA

9
Barone F, Russo ET, Villegas Garcia EN, Punta M, Cozzini S, Ansuini A, Cazzaniga A. Protein family annotation for the Unified Human Gastrointestinal Proteome by DPCfam clustering. Sci Data 2024; 11:568. [PMID: 38824125 PMCID: PMC11144186 DOI: 10.1038/s41597-024-03131-4] [Received: 06/13/2023] [Accepted: 03/08/2024] [Indexed: 06/03/2024]
Abstract
Technological advances in massively parallel sequencing have led to exponential growth in the number of known protein sequences. Much of this growth originates from metagenomic projects producing new sequences from environmental and clinical samples. The Unified Human Gastrointestinal Proteome (UHGP) catalogue is one of the most relevant metagenomic datasets, with applications ranging from medicine to biology. However, its low level of sequence annotation may impair its usability. This work aims to produce a family classification of UHGP sequences to facilitate downstream structural and functional annotation. To this end, we release the DPCfam-UHGP50 dataset, containing 10,778 putative protein families generated with DPCfam clustering, an unsupervised pipeline that groups sequences into single- or multi-domain architectures. DPCfam-UHGP50 considerably improves family coverage at the protein and residue levels compared with the manually curated Pfam repository. In the hope that DPCfam-UHGP50 will foster future discoveries in the metagenomics of the human gut, we release a FAIR-compliant database of our results that is easily accessible via a searchable web server and a Zenodo repository.
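DPCfam is built on Density Peak Clustering (DPC). The minimal sketch below shows the generic DPC idea on a precomputed distance matrix; it is not the DPCfam pipeline, which works on alignment-derived distances between sequence regions. Each point gets a local density ρ (number of neighbours closer than a cutoff d_c) and a separation δ (distance to the nearest point of higher density); points with both large ρ and large δ become cluster centres, and every other point inherits the label of its nearest higher-density neighbour. The cutoff, the centre thresholds, and the function name are illustrative assumptions.

```python
def density_peak_clustering(dist, d_c, rho_min, delta_min):
    """Cluster points given a symmetric distance matrix (list of lists).

    Returns one label per point: the index of that point's cluster centre.
    """
    n = len(dist)
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Rank points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [None] * n
    for rank, i in enumerate(order):
        higher = order[:rank]  # points ranked above i
        if higher:
            j = min(higher, key=lambda h: dist[i][h])
            delta[i], nearest_higher[i] = dist[i][j], j
        else:
            # Densest point: by convention delta is its largest distance.
            delta[i] = max(dist[i])
    labels = [None] * n
    for i in order:  # higher-ranked points are labelled first
        is_centre = rho[i] >= rho_min and delta[i] >= delta_min
        if nearest_higher[i] is None or is_centre:
            labels[i] = i
        else:
            labels[i] = labels[nearest_higher[i]]
    return labels


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
print(density_peak_clustering(dist, d_c=0.5, rho_min=1, delta_min=1.0))  # [0, 0, 0, 3, 3, 3]
```

No number of clusters is fixed in advance: centres emerge as points that are both dense and far from any denser point, which suits an unsupervised search for families of unknown number.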
Affiliation(s)
- Federico Barone
- Area Science Park, Padriciano, 99, 34149, Trieste, Italy
- University of Trieste, Trieste, 34127, Italy
- Marco Punta
- IRCCS San Raffaele Institute, Center for Omics Sciences, Milan, 20132, Italy
- IRCCS San Raffaele Institute, Unit of Immunogenetics, Leukemia Genomics and Immunobiology, Division of Immunology, Transplantation and Infectious Disease, Milan, 20132, Italy

10
Coelho LP, Santos-Júnior CD, de la Fuente-Nunez C. Challenges in computational discovery of bioactive peptides in 'omics data. Proteomics 2024; 24:e2300105. [PMID: 38458994 DOI: 10.1002/pmic.202300105] [Received: 10/13/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 03/10/2024]
Abstract
Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently created a major opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Because peptides are short, approaches that have proven successful at discovering novel proteins of canonical size cannot be naïvely applied to peptide discovery. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both classes pose challenges, as simple methods for predicting them yield large numbers of false positives. Similarly, functional annotation of larger proteins, which traditionally infers orthology from sequence similarity and then transfers functions from characterized to uncharacterized proteins, cannot be applied to short sequences. These techniques are therefore of much more limited use, and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods, as well as the alternative methods recently developed for discovering novel bioactive peptides, with a focus on prokaryotic genomes and metagenomes.
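To illustrate why naive smORF prediction produces many false positives, the minimal sketch below scans the forward strand for ATG-initiated open reading frames within a length window. Run on a random sequence, it still reports hits, because start and stop codons alone arise by chance. The length window (counted in codons, including ATG but excluding the stop), the forward-strand-only scan, and the function name are simplifying assumptions.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def find_small_orfs(dna, min_codons=10, max_codons=100):
    """Return (start, end) of forward-strand ATG...stop ORFs.

    Length is counted in codons from ATG up to, but not including, the stop;
    end is exclusive and includes the stop codon. Only the first ATG before
    each stop is used (nested starts are ignored).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# A random sequence carries no real peptides, yet candidate smORFs still appear.
random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(100_000))
print(len(find_small_orfs(genome)))  # the exact count depends on the seed
```

Every hit in the random sequence is by construction a false positive, which is why the review argues that additional evidence, for example from machine learning models, is needed beyond ORF rules.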
Affiliation(s)
- Luis Pedro Coelho
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Laboratory of Microbial Processes & Biodiversity - LMPB, Hydrobiology Department, Federal University of São Carlos - UFSCar, São Paulo, Brazil
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA

11
Ahmed O, Boucher C, Langmead B. Cliffy: robust 16S rRNA classification based on a compressed LCA index. bioRxiv 2024:2024.05.25.595899. [PMID: 38854039 PMCID: PMC11160684 DOI: 10.1101/2024.05.25.595899] [Indexed: 06/11/2024]
Abstract
Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the r-index enable full-text pattern matching against large sequence collections. However, the data structures that link pattern sequences to their clades of origin still do not scale well to large collections. Previous work proposed document array profiles, which use O(rd) words of space, where r is the number of maximal equal-letter runs in the Burrows-Wheeler transform and d is the number of distinct genomes. The linear dependence on d is limiting, since real taxonomies can easily contain tens of thousands of leaves or more. We propose a method called cliff compression that reduces this size by a large factor, over 250x when indexing the SILVA 16S rRNA gene database. This method uses Θ(r log d) words of space in expectation under a random model we propose here. We implemented these ideas in an open-source tool called Cliffy that performs efficient taxonomic classification of sequencing reads with respect to a compressed taxonomic index. When applied to simulated 16S rRNA reads, Cliffy's read-level accuracy is higher than Kraken2's by 11-18%. Clade abundances are also predicted more accurately by Cliffy than by Kraken2 and Bracken. Overall, Cliffy is a fast and space-economical extension to compressed full-text indexes, enabling them to perform fast and accurate taxonomic classification queries.
2012 ACM Subject Classification: Applied computing → Computational genomics.
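The space bounds above are stated in terms of r, the number of maximal equal-letter runs in the Burrows-Wheeler transform (BWT). The minimal sketch below computes the BWT naively by sorting all rotations (quadratic time and memory, for illustration only; practical r-index construction goes through suffix arrays) and then counts r. The '$' terminator and the function names are assumptions of this sketch, not Cliffy's interface.

```python
from itertools import groupby


def bwt(text: str, terminator: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations of text + terminator."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """r: number of maximal runs of equal letters."""
    return sum(1 for _ in groupby(s))


print(bwt("banana"), count_runs(bwt("banana")))  # annb$aa 5

# Highly repetitive text: 201 characters, yet the BWT has only 5 runs.
repetitive = "ACGT" * 50
print(len(repetitive) + 1, count_runs(bwt(repetitive)))  # 201 5
```

Collections of closely related 16S sequences are highly repetitive, so r grows much more slowly than the total text length; this is why bounds expressed in r (rather than in text length) make compressed indexes practical, and why an extra factor of d, as in O(rd), becomes the bottleneck.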
12
Britton RA, Verdu EF, Di Rienzi SC, Reyes Muñoz A, Tarr PI, Preidis GA. Taking Microbiome Science to the Next Level: Recommendations to Advance the Emerging Field of Microbiome-Based Therapeutics and Diagnostics. Gastroenterology 2024:S0016-5085(24)05000-5. [PMID: 38815708 DOI: 10.1053/j.gastro.2024.05.023] [Received: 08/24/2023] [Revised: 05/13/2024] [Accepted: 05/14/2024] [Indexed: 06/01/2024]
Affiliation(s)
- Robert A Britton
- Department of Molecular Virology and Microbiology and Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, Texas; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas
- Elena F Verdu
- Farncombe Family Digestive Health Research Institute, Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Sara C Di Rienzi
- Department of Molecular Virology and Microbiology and Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, Texas
- Alejandro Reyes Muñoz
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de Los Andes, Bogotá, Colombia
- Phillip I Tarr
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Pediatrics, and, Department of Molecular Microbiology, Washington University School of Medicine, St Louis, Missouri
- Geoffrey A Preidis
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Pediatrics, Baylor College of Medicine and Texas Children's Hospital, Houston, Texas
13
Fernández-Calvet A, Matilla-Cuenca L, Izco M, Navarro S, Serrano M, Ventura S, Blesa J, Herráiz M, Alkorta-Aranburu G, Galera S, Ruiz de Los Mozos I, Mansego ML, Toledo-Arana A, Alvarez-Erviti L, Valle J. Gut microbiota produces biofilm-associated amyloids with potential for neurodegeneration. Nat Commun 2024; 15:4150. [PMID: 38755164 PMCID: PMC11099085 DOI: 10.1038/s41467-024-48309-x] [Received: 11/06/2023] [Accepted: 04/26/2024] [Indexed: 05/18/2024]
Abstract
Age-related neurodegenerative diseases involving amyloid aggregation remain one of the biggest challenges of modern medicine. Alterations in the gastrointestinal microbiome play an active role in the aetiology of neurological disorders. Here, we dissect the amyloidogenic properties of biofilm-associated proteins (BAPs) of the gut microbiota and their implications for synucleinopathies. We demonstrate that BAPs are naturally assembled as amyloid-like fibrils in insoluble fractions isolated from the human gut microbiota. We show that BAP genes are part of the accessory genomes, revealing microbiome variability. Remarkably, the abundance of certain BAP genes in the gut microbiome is correlated with Parkinson's disease (PD) incidence. Using cultured dopaminergic neurons and Caenorhabditis elegans models, we report that BAP-derived amyloids induce α-synuclein aggregation. Our results show that chaperone-mediated autophagy is compromised by BAP amyloids. Indeed, inoculation of BAP fibrils into the brains of wild-type mice promotes key pathological features of PD. Therefore, our findings establish the use of BAP amyloids as potential targets and biomarkers of α-synucleinopathies.
Affiliation(s)
- Ariadna Fernández-Calvet
- Instituto de Agrobiotecnología (IDAB). CSIC-Gobierno de Navarra, Avenida Pamplona 123, Mutilva, 31192, Spain
- Leticia Matilla-Cuenca
- Instituto de Agrobiotecnología (IDAB). CSIC-Gobierno de Navarra, Avenida Pamplona 123, Mutilva, 31192, Spain
- María Izco
- Laboratory of Molecular Neurobiology, Center for Biomedical Research of La Rioja, Logroño, Spain
- Susanna Navarro
- Institut de Biotecnologia i de Biomedicina and Departament de Bioquimica i Biologia Molecular, Universitat Autónoma de Barcelona, Bellaterra, Spain
- Miriam Serrano
- Instituto de Agrobiotecnología (IDAB). CSIC-Gobierno de Navarra, Avenida Pamplona 123, Mutilva, 31192, Spain
- Salvador Ventura
- Institut de Biotecnologia i de Biomedicina and Departament de Bioquimica i Biologia Molecular, Universitat Autónoma de Barcelona, Bellaterra, Spain
- Javier Blesa
- HM CINAC (Centro Integral de Neurociencias Abarca Campal), Hospital Universitario HM Puerta del Sur, HM Hospitales, Madrid, Spain
- Instituto de Investigación Sanitaria, HM Hospitales, Madrid, Spain
- Maite Herráiz
- Department of Gastroenterology, Clínica Universitaria and Medical School, University of Navarra, Navarra, Spain
- IdiSNA, Instituto de Investigación Sanitaria de Navarra, Pamplona, Spain
- Gorka Alkorta-Aranburu
- IdiSNA, Instituto de Investigación Sanitaria de Navarra, Pamplona, Spain
- CIMA LAB Diagnostics, University of Navarra, Pamplona, Spain
- Sergio Galera
- Department of Personalized Medicine, NASERTIC, Government of Navarra, Pamplona, Spain
- María Luisa Mansego
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Alejandro Toledo-Arana
- Instituto de Agrobiotecnología (IDAB). CSIC-Gobierno de Navarra, Avenida Pamplona 123, Mutilva, 31192, Spain
- Lydia Alvarez-Erviti
- Laboratory of Molecular Neurobiology, Center for Biomedical Research of La Rioja, Logroño, Spain
- Jaione Valle
- Instituto de Agrobiotecnología (IDAB). CSIC-Gobierno de Navarra, Avenida Pamplona 123, Mutilva, 31192, Spain.
14
Piquer-Esteban S, Arnau V, Diaz W, Moya A. OMD Curation Toolkit: a workflow for in-house curation of public omics datasets. BMC Bioinformatics 2024; 25:184. [PMID: 38724907 PMCID: PMC11084137 DOI: 10.1186/s12859-024-05803-9] [Received: 09/13/2023] [Accepted: 05/07/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing. RESULTS Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a python3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.
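The OMD Curation Toolkit's own functions are not reproduced here. As a hedged illustration of what a basic FASTQ control check can involve, the sketch below (hypothetical function `check_fastq`) assumes uncompressed four-line FASTQ records and verifies the header prefix, the '+' separator line, and that sequence and quality strings have equal length.

```python
from typing import Iterable, List, Tuple


def check_fastq(lines: Iterable[str]) -> Tuple[int, List[str]]:
    """Validate basic four-line FASTQ structure.

    Returns (number of complete records, list of human-readable problems).
    """
    errors: List[str] = []
    buf = [line.rstrip("\r\n") for line in lines]
    if len(buf) % 4 != 0:
        errors.append(f"line count {len(buf)} is not a multiple of 4")
    n_records = len(buf) // 4  # incomplete trailing record is ignored
    for i in range(n_records):
        header, seq, plus, qual = buf[4 * i: 4 * i + 4]
        rec = i + 1
        if not header.startswith("@"):
            errors.append(f"record {rec}: header does not start with '@'")
        if not plus.startswith("+"):
            errors.append(f"record {rec}: separator line does not start with '+'")
        if len(seq) != len(qual):
            errors.append(f"record {rec}: sequence and quality lengths differ")
    return n_records, errors
```

A plain file handle can be passed directly (`check_fastq(open(path))`); for gzipped files, `gzip.open(path, "rt")` yields text lines the same way. Multi-line FASTQ variants are not handled by this sketch.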
Affiliation(s)
- Samuel Piquer-Esteban
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.
- Vicente Arnau
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain
- Wladimiro Diaz
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain
- Andrés Moya
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish National Research Council, Valencia, Spain.
- Area of Genomics and Health, Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health), Valencia, Spain.
- Biomedical Research Networking Centre for Epidemiology and Public Health (CIBEResp), Madrid, Spain.
15
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
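GDT_TS, the accuracy measure behind the "GDT_TS > 70" threshold above, is the mean percentage of Cα atoms lying within 1, 2, 4 and 8 Å of their reference positions. The official score searches over many superpositions to maximize each fraction; the sketch below assumes a single superposition has already been done and the per-residue Cα distances are given, so it only illustrates the averaging step. The function name `gdt_ts` is hypothetical.

```python
from typing import Sequence

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance thresholds in Å used by GDT_TS


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS (0-100) from per-residue Cα deviations of one fixed superposition."""
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue")
    # Fraction of residues within each cutoff, then the mean over the four cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 5.0, 10.0]))
```

For the five example distances, the fractions within 1, 2, 4 and 8 Å are 0.2, 0.4, 0.6 and 0.8, so the score is about 50.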
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
16
Poon BK, Terwilliger TC, Adams PD. The Phenix-AlphaFold webservice: Enabling AlphaFold predictions for use in Phenix. Protein Sci 2024; 33:e4992. [PMID: 38647406 PMCID: PMC11034488 DOI: 10.1002/pro.4992] [Received: 11/04/2023] [Revised: 03/01/2024] [Accepted: 03/31/2024] [Indexed: 04/25/2024]
Abstract
Advances in machine learning have enabled sufficiently accurate predictions of protein structure to be used in macromolecular structure determination with crystallography and cryo-electron microscopy data. The Phenix software suite has AlphaFold predictions integrated into an automated pipeline that can start with an amino acid sequence and data, and automatically perform model-building and refinement to return a protein model fitted into the data. Due to the steep technical requirements of running AlphaFold efficiently, we have implemented a Phenix-AlphaFold webservice that enables all Phenix users to run AlphaFold predictions remotely from the Phenix GUI starting with the official 1.21 release. This webservice will be improved based on how it is used by the research community and the future research directions for Phenix.
Affiliation(s)
- Billy K. Poon
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Thomas C. Terwilliger
- New Mexico Consortium, Los Alamos, New Mexico, USA
- Los Alamos National Laboratory, Los Alamos, New Mexico, USA
- Paul D. Adams
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, California, USA
17
Sterzi L, Nodari R, Di Marco F, Ferrando ML, Saluzzo F, Spitaleri A, Allahverdi H, Papaleo S, Panelli S, Rimoldi SG, Batisti Biffignandi G, Corbella M, Cavallero A, Prati P, Farina C, Cirillo DM, Zuccotti G, Bandi C, Comandatore F. Genetic barriers more than environmental associations explain Serratia marcescens population structure. Commun Biol 2024; 7:468. [PMID: 38632370 PMCID: PMC11023947 DOI: 10.1038/s42003-024-06069-w] [Received: 11/02/2023] [Accepted: 03/19/2024] [Indexed: 04/19/2024]
Abstract
Bacterial species often comprise well-separated lineages, which likely emerged and are maintained through genetic isolation and/or ecological divergence. How these two evolutionary forces interact in shaping bacterial population structure is currently not fully understood. In this study, we investigate the genetic and ecological drivers underlying the evolution of Serratia marcescens, an opportunistic pathogen with high genomic flexibility that is able to colonise diverse environments. Comparative genomic analyses reveal a population structure composed of five deeply demarcated genetic clusters with open pan-genomes but limited inter-cluster gene flow, partially explained by incompatibility between restriction-modification (R-M) systems. Furthermore, a large-scale search of hundreds of thousands of metagenomic datasets reveals only partial habitat separation of the clusters. Overall, only two clusters show a distinct gene composition consistent with ecological adaptation. These results suggest that genetic isolation preceded ecological adaptation in shaping the species' diversity, an evolutionary scenario consistent with the Extended Evolutionary Synthesis.
Affiliation(s)
- Lodovico Sterzi
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Riccardo Nodari
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Federico Di Marco
- Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Maria Laura Ferrando
- Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Francesca Saluzzo
- Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Hamed Allahverdi
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Stella Papaleo
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Simona Panelli
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Sara Giordana Rimoldi
- Laboratorio di Microbiologia Clinica, Virologia e Diagnostica delle Bioemergenze, ASST Fatebenefratelli Sacco, Milan, Italy
- Marta Corbella
- Department of Microbiology & Virology, Fondazione IRCCS Policlinico San Matteo, Viale Camillo Golgi 19, 27100, Pavia, Italy
- Paola Prati
- Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna (IZSLER), Pavia, Italy
- Claudio Farina
- Laboratory of Microbiology and Virology, Azienda Socio-Sanitaria Territoriale (ASST) Papa Giovanni XXIII, Bergamo, Italy
- Daniela Maria Cirillo
- Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Gianvincenzo Zuccotti
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy
- Department of Paediatrics, Children's Hospital "V. Buzzi", Milano, Italy
- Claudio Bandi
- Department of Biosciences and Pediatric Clinical Research Center "Romeo Ed Enrica Invernizzi", University of Milan, 20133, Milan, Italy
- Francesco Comandatore
- Department of Biomedical and Clinical Sciences, Pediatric Clinical Research Center "Romeo and Enrica Invernizzi", Università Di Milano, 20157, Milan, Italy.
18
Hwang Y, Cornman AL, Kellogg EH, Ovchinnikov S, Girguis PR. Genomic language model predicts protein co-regulation and function. Nat Commun 2024; 15:2880. [PMID: 38570504 PMCID: PMC10991518 DOI: 10.1038/s41467-024-46947-9] [Received: 05/30/2023] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
Deciphering the relationship between a gene and its genomic context is fundamental to understanding and engineering biological systems. Machine learning has shown promise in learning latent relationships underlying the sequence-structure-function paradigm from massive protein sequence datasets. However, to date, limited attempts have been made in extending this continuum to include higher order genomic context information. Evolutionary processes dictate the specificity of genomic contexts in which a gene is found across phylogenetic distances, and these emergent genomic patterns can be leveraged to uncover functional relationships between gene products. Here, we train a genomic language model (gLM) on millions of metagenomic scaffolds to learn the latent functional and regulatory relationships between genes. gLM learns contextualized protein embeddings that capture the genomic context as well as the protein sequence itself, and encode biologically meaningful and functionally relevant information (e.g. enzymatic function, taxonomy). Our analysis of the attention patterns demonstrates that gLM is learning co-regulated functional modules (i.e. operons). Our findings illustrate that gLM's unsupervised deep learning of the metagenomic corpus is an effective and promising approach to encode functional semantics and regulatory syntax of genes in their genomic contexts and uncover complex relationships between genes in a genomic region.
Affiliation(s)
- Yunha Hwang
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Elizabeth H Kellogg
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Peter R Girguis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
19
Gschwind R, Petitjean M, Fournier C, Lao J, Clermont O, Nordmann P, Mellmann A, Denamur E, Poirel L, Ruppé E. Inter-phylum circulation of a beta-lactamase-encoding gene: a rare but observable event. Antimicrob Agents Chemother 2024; 68:e0145923. [PMID: 38441061 PMCID: PMC10989005 DOI: 10.1128/aac.01459-23] [Received: 11/10/2023] [Accepted: 02/12/2024] [Indexed: 03/06/2024]
Abstract
Beta-lactamase-mediated degradation of beta-lactams is the most common mechanism of beta-lactam resistance in Gram-negative bacteria. Beta-lactamase-encoding genes can be transferred between closely related bacteria, but spontaneous inter-phylum transfers (between distantly related bacteria) have never been reported. Here, we describe an extended-spectrum beta-lactamase (ESBL)-encoding gene (blaMUN-1) shared between the Pseudomonadota and Bacteroidota phyla. An Escherichia coli strain was isolated from a patient in Münster (Germany). Its genome was sequenced. The ESBL-encoding gene (named blaMUN-1) was cloned, and the corresponding enzyme was characterized. The distribution of the gene among bacteria was investigated using the RefSeq Genomes database. The frequency and relative abundance of its closest homolog in the global microbial gene catalog (GMGC) were analyzed. The E. coli strain exhibited two distinct morphotypes. Each morphotype possessed two chromosomal copies of the blaMUN-1 gene, with one morphotype having two additional copies located on a phage-plasmid p0111. Each copy was located within a 7.6-kb genomic island associated with mobility. blaMUN-1 encoded for an extended-spectrum Ambler subclass A2 beta-lactamase with 43.0% amino acid identity to TLA-1. blaMUN-1 was found in species among the Bacteroidales order and in Sutterella wadsworthensis (Pseudomonadota). Its closest homolog in GMGC was detected frequently in human fecal samples. This is, to our knowledge, the first reported instance of inter-phylum transfer of an ESBL-encoding gene, between the Bacteroidota and Pseudomonadota phyla. Although the gene was frequently detected in the human gut, inter-phylum transfer was rare, indicating that inter-phylum barriers are effective in impeding the spread of ESBL-encoding genes, but not entirely impenetrable.
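Percent amino acid identity, as in the 43.0% figure above, depends on the normalization convention, and the abstract does not state which one was used. A minimal sketch, assuming two already-aligned sequences of equal length with '-' marking gaps, that computes identity over columns where neither sequence has a gap; other common conventions divide by the full alignment length or by the shorter sequence and give different numbers. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # 5 gap-free columns, 4 identical (M, V, L, A) -> 80.0
    print(percent_identity("MKV-LA", "MRVQLA"))
```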
Affiliation(s)
- Rémi Gschwind
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- Marie Petitjean
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- AP-HP, Hôpital Bichat, Laboratoire de Bactériologie, Paris, France
- Claudine Fournier
- Emerging Antibiotic Resistance, Medical and Molecular Microbiology, Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
- Swiss National Reference Center for Emerging Antibiotic Resistance, Fribourg, Switzerland
- INSERM European Unit (IAME, France), University of Fribourg, Fribourg, Switzerland
- Julie Lao
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- Olivier Clermont
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- Patrice Nordmann
- Emerging Antibiotic Resistance, Medical and Molecular Microbiology, Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
- Swiss National Reference Center for Emerging Antibiotic Resistance, Fribourg, Switzerland
- INSERM European Unit (IAME, France), University of Fribourg, Fribourg, Switzerland
- University of Lausanne, University Hospital Center, Lausanne, Switzerland
- Erick Denamur
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- AP-HP, Hôpital Bichat, Laboratoire de Génétique Moléculaire, Paris, France
- Laurent Poirel
- Emerging Antibiotic Resistance, Medical and Molecular Microbiology, Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
- Swiss National Reference Center for Emerging Antibiotic Resistance, Fribourg, Switzerland
- INSERM European Unit (IAME, France), University of Fribourg, Fribourg, Switzerland
- University of Lausanne, University Hospital Center, Lausanne, Switzerland
- Etienne Ruppé
- Université Paris Cité, INSERM, Université Sorbonne Paris Nord, IAME, Paris, France
- AP-HP, Hôpital Bichat, Laboratoire de Bactériologie, Paris, France
20
Pudžiuvelytė I, Olechnovič K, Godliauskaite E, Sermokas K, Urbaitis T, Gasiunas G, Kazlauskas D. TemStaPro: protein thermostability prediction using sequence representations from protein language models. Bioinformatics 2024; 40:btae157. [PMID: 38507682 PMCID: PMC11001493 DOI: 10.1093/bioinformatics/btae157] [Received: 07/31/2023] [Revised: 02/28/2024] [Accepted: 03/18/2024] [Indexed: 03/22/2024]
Abstract
MOTIVATION Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable the development of more versatile thermostability predictors for multiple ranges of temperatures. RESULTS We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data. AVAILABILITY AND IMPLEMENTATION TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.
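The transfer-learning setup described above starts from per-residue pLM embeddings and needs one fixed-length vector per protein before a classifier can be trained. A minimal sketch of that general idea, assuming mean pooling over residues and a single logistic unit with already-trained weights; both choices, and the names `mean_pool` and `logistic_score`, are illustrative assumptions rather than TemStaPro's published architecture, which is described in the paper and the GitHub repository.

```python
import math
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average an L x D matrix of per-residue embeddings into one D-dimensional vector."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    length = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[j] for row in per_residue) / length for j in range(dim)]


def logistic_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Score in [0, 1] from one logistic unit (weights are hypothetical, not TemStaPro's)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    # Numerically stable sigmoid: avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    protein_vec = mean_pool([[1.0, 2.0], [3.0, 4.0]])
    print(protein_vec)  # [2.0, 3.0]
    print(logistic_score(protein_vec, [0.5, -0.5], 0.0))
```

Pooling makes proteins of different lengths comparable with one classifier; real embeddings have hundreds to thousands of dimensions rather than the two used here.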
Affiliation(s)
- Ieva Pudžiuvelytė
- Institute of Biotechnology, Life Sciences Center, Vilnius University, LT-10257 Vilnius, Lithuania
- Institute of Computer Science, Faculty of Mathematics and Informatics, Vilnius University, LT-08303 Vilnius, Lithuania
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, LT-10257 Vilnius, Lithuania
- Giedrius Gasiunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, LT-10257 Vilnius, Lithuania
- CasZyme, LT-10257 Vilnius, Lithuania
- Darius Kazlauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, LT-10257 Vilnius, Lithuania
21
Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024; 15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024]
Abstract
This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
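The abstract describes subsampling multiple sequence alignments so that repeated AlphaFold 2 runs sample different conformations, whose frequencies are then read as relative populations. The sketch below shows only the bookkeeping around that idea and is one simple way to subsample, not necessarily the authors' exact procedure: `subsample_msa` (hypothetical) keeps the query row and draws a seeded random subset of the remaining rows, and `relative_populations` (hypothetical) turns per-run state labels into fractions. The structure prediction step and the assignment of each model to a conformational state are not shown, and the depth values the authors used are given in the paper, not here.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def subsample_msa(msa: Sequence[str], depth: int, seed: int) -> List[str]:
    """Keep the query (row 0) plus a seeded random subset of up to `depth` other rows."""
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    rng = random.Random(seed)  # one independent, reproducible draw per seed
    others = list(msa[1:])
    k = min(depth, len(others))
    return [msa[0]] + rng.sample(others, k)


def relative_populations(state_labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of predicted models assigned to each conformational state."""
    counts = Counter(state_labels)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


if __name__ == "__main__":
    msa = ["QUERY", "HOM1", "HOM2", "HOM3", "HOM4"]
    for seed in range(3):
        print(subsample_msa(msa, depth=2, seed=seed))
    print(relative_populations(["active", "active", "inactive", "active"]))
```

Each seeded subsample would be passed to a separate prediction run; with, say, 3 of 4 runs classified as "active", the estimated populations are 0.75 and 0.25.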
Affiliation(s)
- Jennifer Y Cui
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA.
- Brown University Department of Chemistry, Providence, RI, USA.
22
Fierro Morales JC, Redfearn C, Titus MA, Roh-Johnson M. Reduced PaxillinB localization to cell-substrate adhesions promotes cell migration in Dictyostelium. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.19.585764. [PMID: 38562712 PMCID: PMC10983970 DOI: 10.1101/2024.03.19.585764] [Indexed: 04/04/2024]
Abstract
Many cells adhere to extracellular matrix for efficient cell migration. This adhesion is mediated by focal adhesions, protein complexes linking the extracellular matrix to the intracellular cytoskeleton. Focal adhesions have been studied extensively in mesenchymal cells, but recent research in physiological contexts and amoeboid cells suggests that focal adhesion regulation differs from the mesenchymal focal adhesion paradigm. We used Dictyostelium discoideum to uncover new mechanisms of focal adhesion regulation, as Dictyostelium are amoeboid cells that form focal adhesion-like structures for migration. We show that PaxillinB, the Dictyostelium homologue of Paxillin, localizes to dynamic focal adhesion-like structures during Dictyostelium migration. Unexpectedly, reduced PaxillinB recruitment to these structures increases Dictyostelium cell migration. Quantitative analysis of focal adhesion size and dynamics shows that lack of PaxillinB recruitment to focal adhesions does not alter focal adhesion size, but rather increases focal adhesion turnover. These findings are in direct contrast to Paxillin function at focal adhesions during mesenchymal migration, challenging the established focal adhesion model.
Collapse
Affiliation(s)
- Chandler Redfearn
- Department of Kinesiology, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
- Margaret A Titus
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Minna Roh-Johnson
- Department of Biochemistry, University of Utah, Salt Lake City, UT, 84112, USA
- Department of Kinesiology, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
23
Johnson SR, Peshwa M, Sun Z. Sensitive remote homology search by local alignment of small positional embeddings from protein language models. eLife 2024; 12:RP91415. [PMID: 38488154 PMCID: PMC10942778 DOI: 10.7554/elife.91415] [Indexed: 03/17/2024]
Abstract
Accurately detecting distant evolutionary relationships between proteins remains an ongoing challenge in bioinformatics. Search methods based on primary sequence struggle to accurately detect homology between sequences with less than 20% amino acid identity. Profile- and structure-based strategies extend sensitive search capabilities into this twilight zone of sequence similarity but require slow pre-processing steps. Recently, whole-protein and positional embeddings from deep neural networks have shown promise for providing sensitive sequence comparison and annotation at long evolutionary distances. Embeddings are generally faster to compute than profiles and predicted structures but still suffer from several drawbacks related to the ability of whole-protein embeddings to discriminate domain-level homology, and to the database size and search speed of methods using positional embeddings. In this work, we show that low-dimensionality positional embeddings can be used directly in speed-optimized local search algorithms. As a proof of concept, we use the ESM2 3B model to convert primary sequences directly into the 3D interaction (3Di) alphabet or amino acid profiles and use these embeddings as input to the highly optimized Foldseek, HMMER3, and HH-suite search algorithms. Our results suggest that positional embeddings as small as a single byte can provide sufficient information for dramatically improved sensitivity over amino acid sequence searches without sacrificing search speed.
Affiliation(s)
- Zhiyi Sun
- New England Biolabs Inc, Ipswich, United States
24
Makarova KS, Tobiasson V, Wolf YI, Lu Z, Liu Y, Zhang S, Krupovic M, Li M, Koonin EV. Diversity, origin, and evolution of the ESCRT systems. mBio 2024; 15:e0033524. [PMID: 38380930 PMCID: PMC10936438 DOI: 10.1128/mbio.00335-24] [Received: 02/05/2024] [Accepted: 02/06/2024] [Indexed: 02/22/2024]
Abstract
Endosomal sorting complexes required for transport (ESCRT) play key roles in protein sorting between membrane-bounded compartments of eukaryotic cells. Homologs of many ESCRT components are identifiable in various groups of archaea, especially in Asgardarchaeota, the archaeal phylum that is currently considered to include the closest relatives of eukaryotes, but not in bacteria. We performed a comprehensive search for ESCRT protein homologs in archaea and reconstructed ESCRT evolution using the phylogenetic tree of Vps4 ATPase (ESCRT IV) as a scaffold and using sensitive protein sequence analysis and comparison of structural models to identify previously unknown ESCRT proteins. Several distinct groups of ESCRT systems in archaea outside of Asgard were identified, including proteins structurally similar to ESCRT-I and ESCRT-II, and several other domains involved in protein sorting in eukaryotes, suggesting an early origin of these components. Additionally, distant homologs of CdvA proteins were identified in Thermoproteales, which are likely components of the uncharacterized cell division system in these archaea. We propose an evolutionary scenario for the origin of eukaryotic and Asgard ESCRT complexes from ancestral building blocks, namely, the Vps4 ATPase, ESCRT-III components, wH (winged helix-turn-helix fold) and possibly also coiled-coil, and Vps28-like domains. The last archaeal common ancestor likely encompassed a complex ESCRT system that was involved in protein sorting. Subsequent evolution involved either simplification, as in the TACK superphylum, where ESCRT was co-opted for cell division, or complexification, as in Asgardarchaeota. In Asgardarchaeota, the connection between ESCRT and the ubiquitin system that was previously considered a eukaryotic signature was already established.
IMPORTANCE
All eukaryotic cells possess complex intracellular membrane organization. Endosomal sorting complexes required for transport (ESCRT) play a central role in membrane remodeling, which is essential for cellular functionality in eukaryotes. Recently, it has been shown that Asgard archaea, the archaeal phylum that includes the closest known relatives of eukaryotes, encode homologs of many components of the ESCRT systems. We employed protein sequence and structure comparisons to reconstruct the evolution of ESCRT systems in archaea and identified several previously unknown homologs of ESCRT subunits, some of which can be predicted to participate in cell division. The results of this reconstruction indicate that the last archaeal common ancestor already encoded a complex ESCRT system that was involved in protein sorting. In Asgard archaea, ESCRT systems evolved toward greater complexity, and in particular, the connection between ESCRT and the ubiquitin system that was previously considered a eukaryotic signature was established.
Affiliation(s)
- Kira S. Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Victor Tobiasson
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Zhongyi Lu
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Yang Liu
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Siyu Zhang
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Mart Krupovic
- Archaeal Virology Unit, Institut Pasteur, Université de Paris, Paris, France
- Meng Li
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
25
Hui X, Yang J, Sun J, Liu F, Pan W. MCSS: microbial community simulator based on structure. Front Microbiol 2024; 15:1358257. [PMID: 38516019 PMCID: PMC10956353 DOI: 10.3389/fmicb.2024.1358257] [Received: 12/20/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024]
Abstract
De novo assembly plays a pivotal role in metagenomic analysis, and the incorporation of third-generation sequencing technology can significantly improve the integrity and accuracy of assembly results. Recently, with advancements in sequencing technology (Hi-Fi, ultra-long), several long-read-based bioinformatic tools have been developed. However, validating the performance and reliability of these tools remains a crucial concern. To address this gap, we present MCSS (microbial community simulator based on structure), which can generate simulated microbial communities and sequencing datasets based on the structural attributes of real microbiome communities. The evaluation results indicate that it can generate simulated communities that exhibit both diversity and similarity to actual community structures. Additionally, MCSS generates synthetic PacBio Hi-Fi and Oxford Nanopore Technologies (ONT) long reads for the species within the simulated community. This tool provides a valuable resource for benchmarking and refining metagenomic analysis methods. Code available at: https://github.com/panlab-bio/mcss.
Affiliation(s)
- Xingqi Hui
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences (ICR, CAAS), Shenzhen, China
- Jinbao Yang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences (ICR, CAAS), Shenzhen, China
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Jinhuan Sun
- Key Laboratory of Plant Molecular Physiology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Fang Liu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
- National Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang, China
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences (ICR, CAAS), Shenzhen, China
26
Garmaeva S, Sinha T, Gulyaeva A, Kuzub N, Spreckels JE, Andreu-Sánchez S, Gacesa R, Vich Vila A, Brushett S, Kruk M, Dekens J, Sikkema J, Kuipers F, Shkoporov AN, Hill C, Scherjon S, Wijmenga C, Fu J, Kurilshikov A, Zhernakova A. Transmission and dynamics of mother-infant gut viruses during pregnancy and early life. Nat Commun 2024; 15:1945. [PMID: 38431663 PMCID: PMC10908809 DOI: 10.1038/s41467-024-45257-4] [Received: 08/15/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Early development of the gut ecosystem is crucial for lifelong health. While infant gut bacterial communities have been studied extensively, the infant gut virome remains under-explored. To study the development of the infant gut virome over time and the factors that shape it, we longitudinally assess the composition of gut viruses and their bacterial hosts in 30 women during and after pregnancy and in their 32 infants during their first year of life. Using shotgun metagenomic sequencing applied to dsDNA extracted from Virus-Like Particles (VLPs) and bacteria, we generate 205 VLP metaviromes and 322 total metagenomes. With this data, we show that while the maternal gut virome composition remains stable during late pregnancy and after birth, the infant gut virome is dynamic in the first year of life. Notably, infant gut viromes contain a higher abundance of active temperate phages compared to maternal gut viromes, which decreases over the first year of life. Moreover, we show that the feeding mode and place of delivery influence the gut virome composition of infants. Lastly, we provide evidence of co-transmission of viral and bacterial strains from mothers to infants, demonstrating that infants acquire some of their virome from their mother's gut.
Affiliation(s)
- Sanzhima Garmaeva
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Trishla Sinha
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Anastasia Gulyaeva
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Nataliia Kuzub
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Johanne E Spreckels
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Sergio Andreu-Sánchez
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Pediatrics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Ranko Gacesa
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Gastroenterology and Hepatology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Arnau Vich Vila
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Gastroenterology and Hepatology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Siobhan Brushett
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Health Sciences, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Marloes Kruk
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jackie Dekens
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- University Medical Center Groningen, Center for Development and Innovation, Groningen, Netherlands
- Jan Sikkema
- University Medical Center Groningen, Center for Development and Innovation, Groningen, Netherlands
- Folkert Kuipers
- Department of Pediatrics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- European Research Institute for the Biology of Ageing (ERIBA), University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Andrey N Shkoporov
- APC Microbiome Ireland, University College Cork, Cork, Ireland
- School of Microbiology, University College Cork, Cork, Ireland
- Colin Hill
- APC Microbiome Ireland, University College Cork, Cork, Ireland
- School of Microbiology, University College Cork, Cork, Ireland
- Sicco Scherjon
- Department of Obstetrics and Gynecology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Cisca Wijmenga
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jingyuan Fu
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Pediatrics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Alexander Kurilshikov
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Alexandra Zhernakova
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
27
Tassoulas LJ, Wackett LP. Insights into the action of the pharmaceutical metformin: Targeted inhibition of the gut microbial enzyme agmatinase. iScience 2024; 27:108900. [PMID: 38318350 PMCID: PMC10839685 DOI: 10.1016/j.isci.2024.108900] [Received: 08/02/2023] [Revised: 12/06/2023] [Accepted: 01/09/2024] [Indexed: 02/07/2024]
Abstract
Metformin is the first-line treatment for type 2 diabetes, yet its mechanism of action is not fully understood. Recent studies suggest that metformin's interactions with gut microbiota are responsible for exerting its therapeutic effects. In this study, we report that metformin targets the gut microbial enzyme agmatinase as a competitive inhibitor, which may impair gut agmatine catabolism. The metformin inhibition constant (Ki) for E. coli agmatinase is 1 mM, which is relevant in the gut, where the drug concentration is 1-10 mM. Metformin analogs phenformin, buformin, and galegine are even more potent inhibitors of E. coli agmatinase (Ki = 0.6, 0.1, and 0.007 mM, respectively), suggesting a shared mechanism. Agmatine is a known effector of human host metabolism and has been reported to augment metformin's therapeutic effects for type 2 diabetes. This gut-derived inhibition mechanism gives new insights into metformin's action in the gut and may lead to significant discoveries in improving metformin therapy.
Affiliation(s)
- Lambros J. Tassoulas
- Department of Biochemistry, Biophysics & Molecular Biology, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Lawrence P. Wackett
- Department of Biochemistry, Biophysics & Molecular Biology, University of Minnesota, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
28
Fauser F, Kadam BN, Arangundy-Franklin S, Davis JE, Vaidya V, Schmidt NJ, Lew G, Xia DF, Mureli R, Ng C, Zhou Y, Scarlott NA, Eshleman J, Bendaña YR, Shivak DA, Reik A, Li P, Davis GD, Miller JC. Compact zinc finger architecture utilizing toxin-derived cytidine deaminases for highly efficient base editing in human cells. Nat Commun 2024; 15:1181. [PMID: 38360922 PMCID: PMC10869815 DOI: 10.1038/s41467-024-45100-w] [Received: 06/14/2023] [Accepted: 01/15/2024] [Indexed: 02/17/2024]
Abstract
Nucleobase editors represent an emerging technology that enables precise single-base edits to the genomes of eukaryotic cells. Most nucleobase editors use deaminase domains that act upon single-stranded DNA and require RNA-guided proteins such as Cas9 to unwind the DNA prior to editing. However, the most recent class of base editors utilizes a deaminase domain, DddAtox, that can act upon double-stranded DNA. Here, we target DddAtox fragments and a FokI-based nickase to the human CIITA gene by fusing these domains to arrays of engineered zinc fingers (ZFs). We also identify a broad variety of Toxin-Derived Deaminases (TDDs) orthologous to DddAtox that allow us to fine-tune properties such as targeting density and specificity. TDD-derived ZF base editors enable up to 73% base editing in T cells with good cell viability and favorable specificity.
Affiliation(s)
- Garrett Lew
- Sangamo Therapeutics, Inc., Brisbane, CA, USA
- Danny F Xia
- Sangamo Therapeutics, Inc., Brisbane, CA, USA
- Colman Ng
- Sangamo Therapeutics, Inc., Brisbane, CA, USA
- Patrick Li
- Sangamo Therapeutics, Inc., Brisbane, CA, USA
29
Kumar B, Lorusso E, Fosso B, Pesole G. A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions. Front Microbiol 2024; 15:1343572. [PMID: 38419630 PMCID: PMC10900530 DOI: 10.3389/fmicb.2024.1343572] [Received: 11/23/2023] [Accepted: 01/29/2024] [Indexed: 03/02/2024]
Abstract
Metagenomics, metabolomics, and metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, hindering the ability to perform robust data stratifications and consider confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations of metadata collection in existing public repositories of metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage metagenomic data's potential. Furthermore, we explore future directions for implementing Machine Learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in the development of ML models.
Affiliation(s)
- Bablu Kumar
- Università degli Studi di Milano, Milan, Italy
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Erika Lorusso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
- Bruno Fosso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
30
Makarova KS, Tobiasson V, Wolf YI, Lu Z, Liu Y, Zhang S, Krupovic M, Li M, Koonin EV. Diversity, Origin and Evolution of the ESCRT Systems. bioRxiv 2024:2024.02.06.579148. [PMID: 38903064 PMCID: PMC11188069 DOI: 10.1101/2024.02.06.579148] [Indexed: 06/22/2024]
Abstract
Endosomal Sorting Complexes Required for Transport (ESCRT) play key roles in protein sorting between membrane-bounded compartments of eukaryotic cells. Homologs of many ESCRT components are identifiable in various groups of archaea, especially in Asgardarchaeota, the archaeal phylum that is currently considered to include the closest relatives of eukaryotes, but not in bacteria. We performed a comprehensive search for ESCRT protein homologs in archaea and reconstructed ESCRT evolution using the phylogenetic tree of Vps4 ATPase (ESCRT IV) as a scaffold, and used sensitive protein sequence analysis and comparison of structural models to identify previously unknown ESCRT proteins. Several distinct groups of ESCRT systems in archaea outside of Asgard were identified, including proteins structurally similar to ESCRT-I and ESCRT-II, and several other domains involved in protein sorting in eukaryotes, suggesting an early origin of these components. Additionally, distant homologs of CdvA proteins were identified in Thermoproteales, which are likely components of the uncharacterized cell division system in these archaea. We propose an evolutionary scenario for the origin of eukaryotic and Asgard ESCRT complexes from ancestral building blocks, namely, the Vps4 ATPase, ESCRT-III components, wH (winged helix-turn-helix fold) and possibly also coiled-coil, and Vps28-like domains. The Last Archaeal Common Ancestor likely encompassed a complex ESCRT system that was involved in protein sorting. Subsequent evolution involved either simplification, as in the TACK superphylum, where ESCRT was co-opted for cell division, or complexification, as in Asgardarchaeota. In Asgardarchaeota, the connection between ESCRT and the ubiquitin system that was previously considered a eukaryotic signature was already established.
Affiliation(s)
- Kira S. Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Victor Tobiasson
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Zhongyi Lu
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Yang Liu
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Siyu Zhang
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Mart Krupovic
- Archaeal Virology Unit, Institut Pasteur, Université de Paris, F-75015 Paris, France
- Meng Li
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
31
Zheng W, Wuyun Q, Li Y, Zhang C, Freddolino PL, Zhang Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024; 21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024]
Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- P Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
32
Rodríguez Del Río Á, Giner-Lamia J, Cantalapiedra CP, Botas J, Deng Z, Hernández-Plaza A, Munar-Palmer M, Santamaría-Hernando S, Rodríguez-Herva JJ, Ruscheweyh HJ, Paoli L, Schmidt TSB, Sunagawa S, Bork P, López-Solanilla E, Coelho LP, Huerta-Cepas J. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature 2024; 626:377-384. [PMID: 38109938 PMCID: PMC10849945 DOI: 10.1038/s41586-023-06955-z] [Received: 02/01/2022] [Accepted: 12/08/2023] [Indexed: 12/20/2023]
Abstract
Many of the Earth's microbes remain uncultured and understudied, limiting our understanding of the functional and evolutionary aspects of their genetic material, which remain largely overlooked in most metagenomic studies1. Here we analysed 149,842 environmental genomes from multiple habitats2-6 and compiled a curated catalogue of 404,085 functionally and evolutionarily significant novel (FESNov) gene families exclusive to uncultivated prokaryotic taxa. All FESNov families span multiple species, exhibit strong signals of purifying selection and qualify as new orthologous groups, thus nearly tripling the number of bacterial and archaeal gene families described to date. The FESNov catalogue is enriched in clade-specific traits, including 1,034 novel families that can distinguish entire uncultivated phyla, classes and orders, probably representing synapomorphies that facilitated their evolutionary divergence. Using genomic context analysis and structural alignments we predicted functional associations for 32.4% of FESNov families, including 4,349 high-confidence associations with important biological processes. These predictions provide a valuable hypothesis-driven framework that we used for experimental validatation of a new gene family involved in cell motility and a novel set of antimicrobial peptides. We also demonstrate that the relative abundance profiles of novel families can discriminate between environments and clinical conditions, leading to the discovery of potentially new biomarkers associated with colorectal cancer. We expect this work to enhance future metagenomics studies and expand our knowledge of the genetic repertory of uncultivated organisms.
Affiliation(s)
- Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Departamento de Bioquímica Vegetal y Biología Molecular, Facultad de Biología, Instituto de Bioquímica Vegetal y Fotosíntesis (IBVF), Universidad de Sevilla-CSIC, Seville, Spain
- Carlos P Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ziqi Deng
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ana Hernández-Plaza
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Martí Munar-Palmer
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Saray Santamaría-Hernando
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- José J Rodríguez-Herva
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Thomas S B Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Emilia López-Solanilla
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
33
Ao YF, Dörr M, Menke MJ, Born S, Heuson E, Bornscheuer UT. Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity. Chembiochem 2024; 25:e202300754. [PMID: 38029350 DOI: 10.1002/cbic.202300754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/01/2023]
Abstract
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, struggle with the experimental screening of large protein mutation spaces. Machine learning methods can approximate protein fitness landscapes and identify catalytic patterns from limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, with the aim of improving enzymes through data-driven protein engineering. Furthermore, we outline prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Affiliation(s)
- Yu-Fei Ao
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China
- University of Chinese Academy of Sciences, Yuquan Road 19(A), Beijing, 100049, China
- Mark Dörr
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Marian J Menke
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Stefan Born
- Technische Universität Berlin, Chair of Bioprocess Engineering, Ackerstraße 76, 13355, Berlin, Germany
- Egon Heuson
- Univ. Lille, CNRS, Centrale Lille, Univ. Artois, UMR 8181 UCCS, Unité de Catalyse et Chimie du Solide, 59000, Lille, France
- Uwe T Bornscheuer
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
34
Verma B, Parkinson J. HiTaxon: a hierarchical ensemble framework for taxonomic classification of short reads. Bioinformatics Advances 2024; 4:vbae016. [PMID: 38371920 PMCID: PMC10873905 DOI: 10.1093/bioadv/vbae016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 02/16/2024] [Accepted: 02/16/2024] [Indexed: 02/20/2024]
Abstract
Motivation: Whole microbiome DNA and RNA sequencing (metagenomics and metatranscriptomics) are pivotal to determining the functional roles of microbial communities. A key challenge in analyzing these complex datasets, typically composed of tens of millions of short reads, is accurately classifying reads to their taxa of origin. Although machine learning (ML) algorithms still perform worse than reference-based short-read tools in species classification, they have shown promising results in taxonomic classification at higher ranks. A recent approach to enhancing the performance of ML tools, which can also be applied to reference-dependent classifiers, has been to integrate the hierarchical structure of taxonomy within the tool's predictive algorithm.
Results: Here, we introduce HiTaxon, an end-to-end hierarchical ensemble framework for taxonomic classification. HiTaxon facilitates data collection and processing, reference database construction and optional training of ML models to streamline ensemble creation. We show that databases created by HiTaxon improve the species-level performance of reference-dependent classifiers while reducing their computational overhead. In addition, by exploring hierarchical methods for HiTaxon, we show that our custom approach to hierarchical ensembling improves species-level classification relative to traditional strategies. Finally, we demonstrate that our hierarchical ensembles outperform current state-of-the-art classifiers in species classification on datasets composed of either simulated or experimentally derived reads.
Availability and implementation: HiTaxon is available at https://github.com/ParkinsonLab/HiTaxon.
Affiliation(s)
- Bhavish Verma
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- John Parkinson
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
35
Skaraki K, Pavloudi C, Dailianis T, Lagnel J, Pantazidou A, Magoulas A, Kotoulas G. Microbial diversity in four Mediterranean irciniid sponges. Biodivers Data J 2024; 12:e114809. [PMID: 38283142 PMCID: PMC10819633 DOI: 10.3897/bdj.12.e114809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 01/14/2024] [Indexed: 01/30/2024]
Abstract
This paper describes a dataset of microbial communities from four different sponge species: Ircinia oros (Schmidt, 1864), Ircinia variabilis (Schmidt, 1862), Sarcotragus spinosulus Schmidt, 1862 and Sarcotragus fasciculatus (Pallas, 1766). The examined sponges all belong to the class Demospongiae, subclass Keratosa, order Dictyoceratida and family Irciniidae. Samples were collected by scuba diving at depths of 6-14 m from two sampling sites on rocky formations off the northern coast of Crete (Cretan Sea, eastern Mediterranean) and were subjected to metabarcoding of the V5-V6 region of the 16S rRNA gene.
Affiliation(s)
- Katerina Skaraki
- Institute of Marine Biology, Biotechnology & Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Department of Ecology & Systematics, Faculty of Biology, National & Kapodistrian University of Athens, Athens, Greece
- Christina Pavloudi
- European Marine Biological Resource Centre (EMBRC-ERIC), Paris, France
- Thanos Dailianis
- Institute of Marine Biology, Biotechnology & Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Jacques Lagnel
- INRAE, UR1052, Génétique et Amélioration des Fruits et Légumes (GAFL), Centre de Recherche PACA, Montfavet, France
- Adriani Pantazidou
- Department of Ecology & Systematics, Faculty of Biology, National & Kapodistrian University of Athens, Athens, Greece
- Antonios Magoulas
- Institute of Marine Biology, Biotechnology & Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Georgios Kotoulas
- Institute of Marine Biology, Biotechnology & Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
36
Baltoumas FA, Karatzas E, Liu S, Ovchinnikov S, Sofianatos Y, Chen IM, Kyrpides N, Pavlopoulos G. NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res 2024; 52:D502-D512. [PMID: 37811892 PMCID: PMC10767849 DOI: 10.1093/nar/gkad800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023]
Abstract
The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families whose members have no hits to proteins of reference genomes or to Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, and 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.
Affiliation(s)
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Sirui Liu
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Yorgos Sofianatos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- I-Min Chen
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 75 Mikras Asias Street, Athens 11527, Greece
37
Thakur M, Buniello A, Brooksbank C, Gurwitz KT, Hall M, Hartley M, Hulcoop DG, Leach AR, Marques D, Martin M, Mithani A, McDonagh EM, Mutasa-Gottgens E, Ochoa D, Perez-Riverol Y, Stephenson J, Varadi M, Velankar S, Vizcaino JA, Witham R, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2023. Nucleic Acids Res 2024; 52:D10-D17. [PMID: 38015445 PMCID: PMC10767983 DOI: 10.1093/nar/gkad1088] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/26/2023] [Revised: 10/23/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.
Affiliation(s)
- Matthew Thakur
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Annalisa Buniello
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Catherine Brooksbank
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Kim T Gurwitz
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Matthew Hall
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- David G Hulcoop
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Andrew R Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Diana Marques
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Maria Martin
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Aziz Mithani
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ellen M McDonagh
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Euphemia Mutasa-Gottgens
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- David Ochoa
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Yasset Perez-Riverol
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- James Stephenson
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Mihaly Varadi
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Johanna McEntyre
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
38
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023]
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Affiliation(s)
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Dionysios Grigoriadis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tatiana A Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK

39
Schmidt TSB, Fullam A, Ferretti P, Orakov A, Maistrenko OM, Ruscheweyh HJ, Letunic I, Duan Y, Van Rossum T, Sunagawa S, Mende DR, Finn RD, Kuhn M, Pedro Coelho L, Bork P. SPIRE: a Searchable, Planetary-scale mIcrobiome REsource. Nucleic Acids Res 2024; 52:D777-D783. [PMID: 37897342 PMCID: PMC10767986 DOI: 10.1093/nar/gkad943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 10/01/2023] [Accepted: 10/11/2023] [Indexed: 10/30/2023]
Abstract
Meta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data. Across a total metagenomic assembly of 16 Tbp, SPIRE comprises 35 billion predicted protein sequences and 1.16 million newly constructed metagenome-assembled genomes (MAGs) of medium or high quality. Beyond mapping to the high-quality genome reference provided by proGenomes3 (http://progenomes.embl.de), these novel MAGs form 92 134 novel species-level clusters, the majority of which are unclassified at species level using current tools. SPIRE enables taxonomic profiling of these species clusters via an updated, custom mOTUs database (https://motu-tool.org/) and includes several layers of functional annotation, as well as crosslinks to several (micro-)biological databases. The resource is accessible, searchable and browsable via http://spire.embl.de.
Affiliation(s)
- Thomas S B Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Anthony Fullam
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Pamela Ferretti
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Askarbek Orakov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Oleksandr M Maistrenko
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Hans-Joachim Ruscheweyh
- Institute of Microbiology, Department of Biology and Swiss Institute of Bioinformatics, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Ivica Letunic
- Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany
- Yiqian Duan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Thea Van Rossum
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Shinichi Sunagawa
- Institute of Microbiology, Department of Biology and Swiss Institute of Bioinformatics, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Daniel R Mende
- Department of Medical Microbiology, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Robert D Finn
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, United Kingdom
- Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany

40
Hirsch P, Tagirdzhanov A, Kushnareva A, Olkhovskii I, Graf S, Schmartz GP, Hegemann JD, Bozhüyük KAJ, Müller R, Keller A, Gurevich A. ABC-HuMi: the Atlas of Biosynthetic Gene Clusters in the Human Microbiome. Nucleic Acids Res 2024; 52:D579-D585. [PMID: 37994699 PMCID: PMC10767846 DOI: 10.1093/nar/gkad1086] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2023] [Revised: 10/18/2023] [Accepted: 10/30/2023] [Indexed: 11/24/2023]
Abstract
The human microbiome has emerged as a rich source of diverse and bioactive natural products, harboring immense potential for therapeutic applications. To facilitate systematic exploration and analysis of its biosynthetic landscape, we present ABC-HuMi: the Atlas of Biosynthetic Gene Clusters (BGCs) in the Human Microbiome. ABC-HuMi integrates data from major human microbiome sequence databases and provides an expansive repository of BGCs compared to the limited coverage offered by existing resources. Employing state-of-the-art BGC prediction and analysis tools, our database ensures accurate annotation and enhanced prediction capabilities. ABC-HuMi empowers researchers with advanced browsing, filtering, and search functionality, enabling efficient exploration of the resource. At present, ABC-HuMi boasts a catalog of 19 218 representative BGCs derived from the human gut, oral, skin, respiratory and urogenital systems. By capturing the intricate biosynthetic potential across diverse human body sites, our database fosters profound insights into the molecular repertoire encoded within the human microbiome and offers a comprehensive resource for the discovery and characterization of novel bioactive compounds. The database is freely accessible at https://www.ccb.uni-saarland.de/abc_humi/.
Affiliation(s)
- Pascal Hirsch
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Azat Tagirdzhanov
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Aleksandra Kushnareva
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Ilia Olkhovskii
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken 66123, Germany
- Simon Graf
- Department of Computer Science, Saarland University, Saarbrücken 66123, Germany
- Georges P Schmartz
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Julian D Hegemann
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Department of Pharmacy, Saarland University, Saarbrücken 66123, Germany
- Kenan A J Bozhüyük
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Department of Pharmacy, Saarland University, Saarbrücken 66123, Germany
- Andreas Keller
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Alexey Gurevich
- Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken 66123, Germany
- Department of Computer Science, Saarland University, Saarbrücken 66123, Germany

41
Yurekten O, Payne T, Tejera N, Amaladoss FX, Martin C, Williams M, O’Donovan C. MetaboLights: open data repository for metabolomics. Nucleic Acids Res 2024; 52:D640-D646. [PMID: 37971328 PMCID: PMC10767962 DOI: 10.1093/nar/gkad1045] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 09/15/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 11/19/2023]
Abstract
MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and cross-technique and covers metabolite structures and their reference spectra as well as their biological roles and locations where available. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information. In this article, we describe the continued growth and diversity of submissions and the significant developments in recent years. In particular, we highlight MetaboLights Labs, our new Galaxy Project instance with repository-scale standardized workflows, and how data public on MetaboLights are being reused by the community. Metabolomics resources and data are available under the EMBL-EBI's Terms of Use at https://www.ebi.ac.uk/metabolights and under Apache 2.0 at https://github.com/EBI-Metabolights.
Affiliation(s)
- Ozgur Yurekten
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Noemi Tejera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Felix Xavier Amaladoss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Callum Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire O’Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

42
Krause GR, Shands W, Wheeler TJ. Sensitive and error-tolerant annotation of protein-coding DNA with BATH. bioRxiv 2024:2023.12.31.573773. [PMID: 38260252 PMCID: PMC10802276 DOI: 10.1101/2023.12.31.573773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long read sequencing data and in the context of pseudogenes.
Affiliation(s)
- Genevieve R Krause
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA
- Department of Computer Science, University of Montana, Missoula, Montana, USA
- Walt Shands
- Department of Computer Science, University of Montana, Missoula, Montana, USA
- UC Santa Cruz Genomics Institute, Santa Cruz, California, USA
- Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA
- Department of Computer Science, University of Montana, Missoula, Montana, USA

43
Yu Y, Xu F, Zhao W, Thoma C, Che S, Richman JE, Jin B, Zhu Y, Xing Y, Wackett L, Men Y. Electron-bifurcation and fluoride efflux systems in Acetobacterium spp. drive defluorination of perfluorinated unsaturated carboxylic acids. bioRxiv 2023:2023.12.13.568471. [PMID: 38168399 PMCID: PMC10760045 DOI: 10.1101/2023.12.13.568471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Enzymatic cleavage of C-F bonds in per- and polyfluoroalkyl substances (PFAS) is largely unknown but avidly sought to promote systems biology for PFAS bioremediation. Here, we report the reductive defluorination of α,β-unsaturated per- and polyfluorocarboxylic acids by Acetobacterium spp. Two critical molecular features in Acetobacterium species enabling reductive defluorination are (i) a functional fluoride efflux transporter (CrcB) and (ii) an electron-bifurcating caffeate reduction pathway (CarABCDE). The fluoride transporter was required for detoxification of released fluoride. Car enzymes were implicated in defluorination by the following evidence: (i) only Acetobacterium spp. with car genes catalyzed defluorination; (ii) caffeate and PFAS competed in vivo; (iii) models from the X-ray structure of the electron-bifurcating reductase (CarC) positioned the PFAS substrate optimally for reductive defluorination; (iv) products identified by ¹⁹F-NMR and high-resolution mass spectrometry were consistent with the model. Defluorination biomarkers identified here were found in wastewater treatment plant metagenomes on six continents.

44
Parigger L, Krassnigg A, Grabuschnig S, Gruber K, Steinkellner G, Gruber CC. AI-assisted structural consensus-proteome prediction of human monkeypox viruses isolated within a year after the 2022 multi-country outbreak. Microbiol Spectr 2023; 11:e0231523. [PMID: 37874150 PMCID: PMC10714838 DOI: 10.1128/spectrum.02315-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 09/09/2023] [Indexed: 10/25/2023]
Abstract
IMPORTANCE The 2022 outbreak of the monkeypox virus already involves, by April 2023, 110 countries with 86,956 confirmed cases and 119 deaths. Understanding an emerging disease on a molecular level is essential to study infection processes and eventually guide drug discovery at an early stage. To support this, we provide the so far most comprehensive structural proteome of the monkeypox virus, which includes 210 structural models, each computed with three state-of-the-art structure prediction methods. Instead of building on a single-genome sequence, we generated our models from a consensus of 3,713 high-quality genome sequences sampled from patients within 1 year of the outbreak. Therefore, we present an average structural proteome of the currently isolated viruses, including mutational analyses with a special focus on drug-binding sites. Continuing dynamic mutation monitoring within the structural proteome presented here is essential to timely predict possible physiological changes in the evolving virus.
Affiliation(s)
- Lena Parigger
- Innophore, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Karl Gruber
- Innophore, Graz, Austria
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
- Georg Steinkellner
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
- Innophore, San Francisco, California, USA
- Christian C. Gruber
- Institute of Molecular Biosciences, University of Graz, Graz, Austria
- Austrian Centre of Industrial Biotechnology, Graz, Austria
- Field of Excellence BioHealth, University of Graz, Graz, Austria
- Innophore, San Francisco, California, USA

45
Teixeira AM, Vaz-Moreira I, Calderón-Franco D, Weissbrodt D, Purkrtova S, Gajdos S, Dottorini G, Nielsen PH, Khalifa L, Cytryn E, Bartacek J, Manaia CM. Candidate biomarkers of antibiotic resistance for the monitoring of wastewater and the downstream environment. Water Res 2023; 247:120761. [PMID: 37918195 DOI: 10.1016/j.watres.2023.120761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 10/17/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Urban wastewater treatment plants (UWTPs) are essential for reducing the pollutants load and protecting water bodies. However, wastewater catchment areas and UWTPs emit continuously antibiotic resistant bacteria (ARB) and antibiotic resistance genes (ARGs), with recognized impacts on the downstream environments. Recently, the European Commission recommended to monitor antibiotic resistance in UWTPs serving more than 100 000 population equivalents. Antibiotic resistance monitoring in environmental samples can be challenging. The expected complexity of these systems can jeopardize the interpretation capacity regarding, for instance, wastewater treatment efficiency, impacts of environmental contamination, or risks due to human exposure. Simplified monitoring frameworks will be essential for the successful implementation of analytical procedures, data analysis, and data sharing. This study aimed to test a set of biomarkers representative of ARG contamination, selected based on their frequent human association and, simultaneously, rare presence in pristine environments. In addition to the 16S rRNA gene, ten potential biomarkers (intI1, sul1, ermB, ermF, aph(3'')-Ib, qacEΔ1, uidA, mefC, tetX, and crAssphage) were monitored in DNA extracts (n = 116) from raw wastewater, activated sludge, treated wastewater, and surface water (upstream and downstream of UWTPs) samples collected in the Czech Republic, Denmark, Israel, the Netherlands, and Portugal. Each biomarker was sensitive enough to measure decreases (on average by up to 2.5 log-units gene copy/mL) from raw wastewater to surface water, with variations in the same order of magnitude as for the 16S rRNA gene. The use of the 10 biomarkers allowed the typing of water samples whose origin or quality could be predicted in a blind test. The results show that, based on appropriate biomarkers, qPCR can be used for a cost-effective and technically accessible approach to monitoring wastewater and the downstream environment.
Affiliation(s)
- A Margarida Teixeira
- CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua de Diogo Botelho 1327, Porto 4169-005, Portugal
- Ivone Vaz-Moreira
- CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua de Diogo Botelho 1327, Porto 4169-005, Portugal
- David Calderón-Franco
- Department of Biotechnology, Environmental Biotechnology Section, Delft University of Technology, van der Maasweg 9, Delft, HZ 2629, the Netherlands
- David Weissbrodt
- Department of Biotechnology, Environmental Biotechnology Section, Delft University of Technology, van der Maasweg 9, Delft, HZ 2629, the Netherlands
- Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Trondheim 7034, Norway
- Sabina Purkrtova
- Department of Biochemistry and Microbiology, Faculty of Food and Biochemical Technology, University of Chemistry and Technology Prague, 5 Technická, Prague 166 28, Czech Republic
- Stanislav Gajdos
- Department of Water Technology and Environmental Engineering, Faculty of Environmental Technology, University of Chemistry and Technology Prague, 5 Technická, Prague 166 28, Czech Republic
- Giulia Dottorini
- Department of Chemistry and Bioscience, Center for Microbial Communities, Aalborg University, Aalborg 9220, Denmark
- Per Halkjær Nielsen
- Department of Chemistry and Bioscience, Center for Microbial Communities, Aalborg University, Aalborg 9220, Denmark
- Leron Khalifa
- Institute of Soil, Water and Environmental Sciences, The Volcani Institute, Agricultural Research Organization, P.O. Box 15159, Rishon Lezion 7528809, Israel
- Eddie Cytryn
- Institute of Soil, Water and Environmental Sciences, The Volcani Institute, Agricultural Research Organization, P.O. Box 15159, Rishon Lezion 7528809, Israel
- Jan Bartacek
- Department of Water Technology and Environmental Engineering, Faculty of Environmental Technology, University of Chemistry and Technology Prague, 5 Technická, Prague 166 28, Czech Republic
- Célia M Manaia
- CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Universidade Católica Portuguesa, Rua de Diogo Botelho 1327, Porto 4169-005, Portugal

46
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
MOTIVATION Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
Affiliation(s)
- Jae-Won Lee
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
- Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
- Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
47
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
48
Buller R, Lutz S, Kazlauskas RJ, Snajdrova R, Moore JC, Bornscheuer UT. From nature to industry: Harnessing enzymes for biocatalysis. Science 2023; 382:eadh8615. [PMID: 37995253 DOI: 10.1126/science.adh8615] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 07/14/2023] [Accepted: 10/17/2023] [Indexed: 11/25/2023]
Abstract
Biocatalysis harnesses enzymes to make valuable products. This green technology is used in countless applications from bench scale to industrial production and allows practitioners to access complex organic molecules, often with fewer synthetic steps and reduced waste. The last decade has seen an explosion in the development of experimental and computational tools to tailor enzymatic properties, equipping enzyme engineers with the ability to create biocatalysts that perform reactions not present in nature. By using (chemo)-enzymatic synthesis routes or orchestrating intricate enzyme cascades, scientists can synthesize elaborate targets ranging from DNA and complex pharmaceuticals to starch made in vitro from CO2-derived methanol. In addition, new chemistries have emerged through the combination of biocatalysis with transition metal catalysis, photocatalysis, and electrocatalysis. This review highlights recent key developments, identifies current limitations, and provides a future prospect for this rapidly developing technology.
Affiliation(s)
- R Buller
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, 8820 Wädenswil, Switzerland
- S Lutz
- Codexis Incorporated, Redwood City, CA 94063, USA
- R J Kazlauskas
- Department of Biochemistry, Molecular Biology and Biophysics, Biotechnology Institute, University of Minnesota, Saint Paul, MN 55108, USA
- R Snajdrova
- Novartis Institutes for BioMedical Research, Global Discovery Chemistry, 4056 Basel, Switzerland
- J C Moore
- MRL, Merck & Co., Rahway, NJ 07065, USA
- U T Bornscheuer
- Institute of Biochemistry, Dept. of Biotechnology and Enzyme Catalysis, Greifswald University, Greifswald, Germany
49
Meng D, Ai S, Spanos M, Shi X, Li G, Cretoiu D, Zhou Q, Xiao J. Exercise and microbiome: From big data to therapy. Comput Struct Biotechnol J 2023; 21:5434-5445. [PMID: 38022690 PMCID: PMC10665598 DOI: 10.1016/j.csbj.2023.10.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023]
Abstract
Exercise is a vital component in maintaining optimal health and serves as a prospective therapeutic intervention for various diseases. The human microbiome, composed of trillions of microorganisms, plays a crucial role in overall health. Given the advancements in microbiome research, substantial databases have been created to decipher the functionality and mechanisms of the microbiome in health and disease contexts. This review presents an initial overview of microbiomics development and related databases, followed by an in-depth description of the multi-omics technologies for microbiome. It subsequently synthesizes the research pertaining to exercise-induced modifications of the microbiome and diseases that impact the microbiome. Finally, it highlights the potential therapeutic implications of an exercise-modulated microbiome in intestinal disease, obesity and diabetes, cardiovascular disease, and immune/inflammation-related diseases.
Affiliation(s)
- Danni Meng
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Songwei Ai
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Michail Spanos
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Xiaohui Shi
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Guoping Li
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Dragos Cretoiu
- Department of Medical Genetics, Carol Davila University of Medicine and Pharmacy, Bucharest 020031, Romania
- Materno-Fetal Assistance Excellence Unit, Alessandrescu-Rusescu National Institute for Mother and Child Health, Bucharest 011062, Romania
- Qiulian Zhou
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Junjie Xiao
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
50
Durairaj J, Waterhouse AM, Mets T, Brodiazhenko T, Abdullah M, Studer G, Tauriello G, Akdel M, Andreeva A, Bateman A, Tenson T, Hauryliuk V, Schwede T, Pereira J. Uncovering new families and folds in the natural protein universe. Nature 2023; 622:646-653. [PMID: 37704037 PMCID: PMC10584680 DOI: 10.1038/s41586-023-06622-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/24/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4 . By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Andrew M Waterhouse
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Toomas Mets
- Institute of Technology, University of Tartu, Tartu, Estonia
- Department of Experimental Medical Science, Lund University, Lund, Sweden
- Minhal Abdullah
- Institute of Technology, University of Tartu, Tartu, Estonia
- Department of Experimental Medical Science, Lund University, Lund, Sweden
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Antonina Andreeva
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Tanel Tenson
- Institute of Technology, University of Tartu, Tartu, Estonia
- Vasili Hauryliuk
- Institute of Technology, University of Tartu, Tartu, Estonia
- Department of Experimental Medical Science, Lund University, Lund, Sweden
- Science for Life Laboratory, Lund, Sweden
- Virus Centre, Lund University, Lund, Sweden
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland.
- Joana Pereira
- Biozentrum, University of Basel, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland.