1
Pereira H, Silva PC, Davis MW, Abraham L, Babnigg G, Bengtsson H, Johansson B. SEGUID v2: Extending SEGUID checksums for circular, linear, single- and double-stranded biological sequences. bioRxiv 2024:2024.02.28.582384. [PMID: 39484537 PMCID: PMC11526859 DOI: 10.1101/2024.02.28.582384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Background Synthetic biology involves combining different DNA fragments, each containing functional biological parts, to address specific problems. Fundamental gene-function research often requires cloning and propagating DNA fragments, such as those from the iGEM Parts Registry or Addgene, typically distributed as circular plasmids. Addgene's repository alone offers around 150,000 plasmids. To ensure data integrity, cryptographic checksums can be calculated for the sequences. Each sequence has a unique checksum, making checksums useful for validation and quick lookups of associated annotations. For example, the SEGUID checksum uniquely identifies protein sequences with a 27-character string. Objectives The original SEGUID, while effective for protein sequences and single-stranded DNA (ssDNA), is not suitable for circular and double-stranded DNA (dsDNA) due to topological differences. Challenges include how to uniquely represent linear dsDNA, circular ssDNA, and circular dsDNA. To meet these needs, we propose SEGUID v2, which extends the original SEGUID to handle additional types of sequences. Conclusions SEGUID v2 produces orientation- and rotation-invariant checksums for single-stranded, double-stranded, possibly staggered, linear, and circular DNA and RNA sequences. Customizable alphabets allow for other types of sequences. In contrast to the original SEGUID, which uses Base64, SEGUID v2 uses Base64url to encode the SHA-1 hash. This ensures SEGUID v2 checksums can be used as-is in filenames, regardless of platform, and in URLs, with minimal friction. Availability SEGUID v2 is readily available for major programming languages, distributed under the MIT license. JavaScript package seguid is available on npm, Python package seguid on PyPI, R package seguid on CRAN, and a Tcl script on GitHub. These tools, along with documentation, examples, and an online SEGUID Calculator, can be found at https://www.seguid.org.
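The checksum ideas in this abstract can be illustrated with a short Python sketch. It is not the official SEGUID v2 algorithm and does not use the `seguid` package API: the function names, the use of the lexicographically smallest rotation as the canonical form of a circular sequence, and the treatment of blunt double-stranded DNA as the smaller of a strand and its reverse complement are all assumptions made for illustration. Only the SHA-1 hash, the Base64url encoding, and the 27-character length come from the abstract.

```python
import base64
import hashlib

# Complement table for unambiguous DNA bases only (an illustrative restriction).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sha1_base64url(text: str) -> str:
    """SHA-1 of `text`, Base64url-encoded with the '=' padding stripped."""
    digest = hashlib.sha1(text.encode("ascii")).digest()  # always 20 bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def smallest_rotation(seq: str) -> str:
    """Hypothetical canonical form of a circular sequence (simple O(n^2) scan)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def checksum_linear_ss(seq: str) -> str:
    """Checksum of a linear single-stranded sequence."""
    return sha1_base64url(seq.upper())


def checksum_circular_ss(seq: str) -> str:
    """Rotation-invariant checksum of a circular single-stranded sequence."""
    return sha1_base64url(smallest_rotation(seq.upper()))


def checksum_linear_ds_blunt(seq: str) -> str:
    """Orientation-invariant checksum of a blunt linear double-stranded sequence."""
    top = seq.upper()
    return sha1_base64url(min(top, reverse_complement(top)))
```

A 20-byte SHA-1 digest encodes to 28 Base64 characters including one `=`, which is why the stripped checksum is 27 characters long. Base64url replaces `+` and `/` with `-` and `_`, so the result is safe in filenames and URLs.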
2
Petersen SD, Levassor L, Pedersen CM, Madsen J, Hansen LG, Zhang J, Haidar AK, Frandsen RJN, Keasling JD, Weber T, Sonnenschein N, Jensen MK. teemi: An open-source literate programming approach for iterative design-build-test-learn cycles in bioengineering. PLoS Comput Biol 2024; 20:e1011929. [PMID: 38457467 PMCID: PMC10954146 DOI: 10.1371/journal.pcbi.1011929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 03/20/2024] [Accepted: 02/17/2024] [Indexed: 03/10/2024] Open
Abstract
Synthetic biology dictates the data-driven engineering of biocatalysis, cellular functions, and organism behavior. Integral to synthetic biology is the aspiration to efficiently find, access, interoperate, and reuse high-quality data on genotype-phenotype relationships of native and engineered biosystems under FAIR principles, and from this facilitate forward-engineering strategies. However, biology is complex at the regulatory level, and noisy at the operational level, thus necessitating systematic and diligent data handling at all levels of the design, build, and test phases in order to maximize learning in the iterative design-build-test-learn engineering cycle. To enable user-friendly simulation, organization, and guidance for the engineering of biosystems, we have developed an open-source Python-based computer-aided design and analysis platform operating under a literate programming user interface hosted on GitHub. The platform is called teemi and is fully compliant with FAIR principles. In this study we apply teemi for i) designing and simulating bioengineering, ii) integrating and analyzing multivariate datasets, and iii) machine learning for predictive engineering of metabolic pathway designs for production of a key precursor to medicinal alkaloids in yeast. The teemi platform is publicly available at PyPI and GitHub.
Affiliation(s)
- Søren D. Petersen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Lucas Levassor
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christine M. Pedersen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jan Madsen
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
- Lea G. Hansen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jie Zhang
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Ahmad K. Haidar
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Rasmus J. N. Frandsen
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jay D. Keasling
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Joint BioEnergy Institute, Emeryville, California, United States of America
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Department of Chemical and Biomolecular Engineering, Department of Bioengineering, University of California, Berkeley, California, United States of America
  - Center for Synthetic Biochemistry, Institute for Synthetic Biology, Shenzhen Institutes of Advanced Technologies, Shenzhen, China
- Tilmann Weber
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Nikolaus Sonnenschein
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Michael K. Jensen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
3
Soliman MA, Azab MS, Hussein HA, Roushdy MM, Abu El-Naga MN. FBPP: software to design PCR primers and probes for nucleic acid base detection of foodborne pathogens. Sci Rep 2024; 14:1229. [PMID: 38216615 PMCID: PMC10786913 DOI: 10.1038/s41598-024-51372-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/04/2024] [Indexed: 01/14/2024] Open
Abstract
Foodborne pathogens can be found in various foods, and it is important to detect them to provide a safe food supply and to prevent foodborne diseases. The nucleic acid-based detection method is one of the most rapid and widely used methods in the detection of foodborne pathogens; it depends on hybridizing the target nucleic acid sequence to a synthetic oligonucleotide (probe or primer) that is complementary to the target sequence. Designing primers and probes for this method is a preliminary and critical step. However, new bioinformatics tools are needed to automate, specify, and improve the design sets to be used in the nucleic acid-based method. Thus, we developed foodborne pathogen primer probe design (FBPP), an open-source, user-friendly graphical interface Python-based application supported by the SQL database for foodborne pathogen virulence factors, for (i) designing primers/probes for detection purposes, (ii) PCR and gel electrophoresis photo simulation, and (iii) checking the specificity of primers/probes.
Affiliation(s)
- Mohamed A Soliman
  - Department of Botany and Microbiology, Faculty of Science (Boys), Al-Azhar University, Nasr City, Cairo, Egypt
- Mohamed S Azab
  - Department of Botany and Microbiology, Faculty of Science (Boys), Al-Azhar University, Nasr City, Cairo, Egypt
- Hala A Hussein
  - Department of Radiation Microbiology, National Centre for Radiation Research and Technology, Atomic Energy Authority, Nasr City, Cairo, Egypt
- Mohamed M Roushdy
  - Department of Botany and Microbiology, Faculty of Science (Boys), Al-Azhar University, Nasr City, Cairo, Egypt
- Mohamed N Abu El-Naga
  - Department of Radiation Microbiology, National Centre for Radiation Research and Technology, Atomic Energy Authority, Nasr City, Cairo, Egypt
4
Arabi-Jeshvaghani F, Javadi-Zarnaghi F, Löchel HF, Martin R, Heider D. LAMPPrimerBank, a manually curated database of experimentally validated loop-mediated isothermal amplification primers for detection of respiratory pathogens. Infection 2023; 51:1809-1818. [PMID: 37828369 DOI: 10.1007/s15010-023-02100-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 09/13/2023] [Indexed: 10/14/2023]
Abstract
PURPOSE AND METHODS The emergence of coronavirus disease 2019 (COVID-19) has once again affirmed the significant threat of respiratory infections to global public health and the utmost importance of prompt diagnosis in managing and mitigating any pandemic. The nucleic acid amplification test (NAAT) is the primary detection method for most pathogens. Loop-mediated isothermal amplification (LAMP) is a rapid, simple, sensitive, and specific epitome of isothermal NAAT performed using a set of four to six primers. Primer design is a fundamental step in LAMP assays, with several complexities and experimental screening requirements. To address this challenge, an online database is presented here. Its workflow comprises three steps: literature aggregation, data curation, and database and website implementation. RESULTS LAMPPrimerBank (https://lampprimerbank.mathematik.uni-marburg.de) is a manually curated database dedicated to experimentally validated LAMP primers, the peculiarities of their assays, and accompanying literature, with a primary emphasis on respiratory pathogens. LAMPPrimerBank, with its user-friendly web interface and an open application programming interface, enables the accelerated and facile exploration, comparison, and exportation of LAMP primer sequences and their respective information from the massively scattered literature. LAMPPrimerBank currently comprises LAMP primers for diagnosing viral, bacterial, and fungal respiratory pathogens. Additionally, to address the challenge of false-positive results generated by nonspecific amplifications, LAMPPrimerBank computationally predicted and visualized the sizes of LAMP products for recorded primer sets in the database. CONCLUSION LAMPPrimerBank, as a pioneering database in the rapidly expanding field of isothermal NAAT, endeavors to confront the two challenges of LAMP: primer design and discrimination of false-positive results.
Affiliation(s)
- Fatemeh Arabi-Jeshvaghani
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Fatemeh Javadi-Zarnaghi
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Hannah Franziska Löchel
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Roman Martin
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
5
Pozdniakova TA, Cruz JP, Silva PC, Azevedo F, Parpot P, Domingues MR, Carlquist M, Johansson B. Optimization of a hybrid bacterial/Arabidopsis thaliana fatty acid synthase system II in Saccharomyces cerevisiae. Metab Eng Commun 2023; 17:e00224. [PMID: 37415783 PMCID: PMC10320613 DOI: 10.1016/j.mec.2023.e00224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Revised: 03/27/2023] [Accepted: 05/02/2023] [Indexed: 07/08/2023] Open
Abstract
Fatty acids are produced by eukaryotes like baker's yeast Saccharomyces cerevisiae mainly using a large multifunctional type I fatty acid synthase (FASI) where seven catalytic steps and a carrier domain are shared between one or two protein subunits. While this system may offer efficiency in catalysis, only a narrow range of fatty acids are produced. Prokaryotes, chloroplasts and mitochondria rely instead on a FAS type II (FASII) where each catalytic step is carried out by a monofunctional enzyme encoded by a separate gene. FASII is more flexible and capable of producing a wider range of fatty acid structures, such as the direct production of unsaturated fatty acids. An efficient FASII in the preferred industrial organism S. cerevisiae could provide a platform for developing sustainable production of specialized fatty acids. We functionally replaced either yeast FASI genes (FAS1 or FAS2) with a FASII consisting of nine genes from Escherichia coli (acpP, acpS and fab -A, -B, -D, -F, -G, -H, -Z) as well as three from Arabidopsis thaliana (MOD1, FATA1 and FATB). The genes were expressed from an autonomously replicating multicopy vector assembled using the Yeast Pathway Kit for in-vivo assembly in yeast. Two rounds of adaptation led to a strain with a maximum growth rate (μmax) of 0.19 h⁻¹ without exogenous fatty acids, twice the growth rate previously reported for a comparable strain. Additional copies of the MOD1 or fabH genes resulted in cultures with higher final cell densities and three times higher lipid content compared to the control.
Affiliation(s)
- Tatiana A. Pozdniakova
  - CBMA - Center of Molecular and Environmental Biology, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- João P. Cruz
  - CBMA - Center of Molecular and Environmental Biology, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- Paulo César Silva
  - CBMA - Center of Molecular and Environmental Biology, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- Flávio Azevedo
  - CBMA - Center of Molecular and Environmental Biology, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- Pier Parpot
  - CEB - C, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga, Portugal
  - Centre of Chemistry, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Maria Rosario Domingues
  - Mass Spectrometry Center & LAQV-REQUIMTE, Department of Chemistry, University of Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
  - CESAM–Centre for Environmental and Marine Studies, Aveiro, Portugal
  - Department of Chemistry, University of Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
- Magnus Carlquist
  - Division of Applied Microbiology, Lund University, Box 124, 221 00, Lund, Sweden
- Björn Johansson
  - CBMA - Center of Molecular and Environmental Biology, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
6
Ataii N, Bakshi S, Chen Y, Fernandez M, Shao Z, Scheftel Z, Tou C, Vega M, Wang Y, Zhang H, Zhao Z, Anderson JC. Enabling AI in synthetic biology through Construction File specification. PLoS One 2023; 18:e0294469. [PMID: 37956196 PMCID: PMC10642840 DOI: 10.1371/journal.pone.0294469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
The Construction File (CF) specification establishes a standardized interface for molecular biology operations, laying a foundation for automation and enhanced efficiency in experiment design. It is implemented across three distinct software projects: PyDNA_CF_Simulator, a Python project featuring a ChatGPT plugin for interactive parsing and simulating experiments; ConstructionFileSimulator, a field-tested Java project that showcases 'Experiment' objects expressed as flat files; and C6-Tools, a JavaScript project integrated with Google Sheets via Apps Script, providing a user-friendly interface for authoring and simulation of CF. The CF specification not only standardizes and modularizes molecular biology operations but also promotes collaboration, automation, and reuse, significantly reducing potential errors. The potential integration of CF with artificial intelligence, particularly GPT-4, suggests innovative automation strategies for synthetic biology. While challenges such as token limits, data storage, and biosecurity remain, proposed solutions promise a way forward in harnessing AI for experiment design. This shift from human-driven design to AI-assisted workflows, steered by high-level objectives, charts a potential future path in synthetic biology, envisioning an environment where complexities are managed more effectively.
Affiliation(s)
- Nassim Ataii
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Sanjyot Bakshi
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Yisheng Chen
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Michael Fernandez
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Zihang Shao
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Zachary Scheftel
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Connor Tou
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Mia Vega
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Yuting Wang
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Hanxiao Zhang
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- Zexuan Zhao
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
- J. Christopher Anderson
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America
  - QB3: California Institute for Quantitative Biological Research, University of California, Berkeley, Berkeley, California, United States of America
  - Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
7
Plessis C, Jeanne T, Dionne A, Vivancos J, Droit A, Hogue R. ASVmaker: A New Tool to Improve Taxonomic Identifications for Amplicon Sequencing Data. Plants (Basel) 2023; 12:3678. [PMID: 37960035 PMCID: PMC10647208 DOI: 10.3390/plants12213678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 10/20/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
The taxonomic assignment of sequences obtained by high throughput amplicon sequencing poses a limitation for various applications in the biomedical, environmental, and agricultural fields. Identifications are constrained by the length of the obtained sequences and the computational processes employed to efficiently assign taxonomy. Arriving at a consensus is often preferable to uncertain identification for ecological purposes. To address this issue, a new tool called "ASVmaker" has been developed to facilitate the creation of custom databases, thereby enhancing the precision of specific identifications. ASVmaker is specifically designed to generate reference databases for allocating amplicon sequencing data. It uses publicly available reference data and generates specific sequences derived from the primers used to create amplicon sequencing libraries. This versatile tool can complete taxonomic assignments performed with pre-trained classifiers from the SILVA and UNITE databases. Moreover, it enables the generation of comprehensive reference databases for specific genes in cases where no directly applicable database exists for taxonomic classification tools.
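The idea of deriving reference sequences from the primers used to build an amplicon library can be sketched as a simple in-silico amplification. This is not ASVmaker's code or interface: the function names are hypothetical, and the sketch assumes exact primer matches on an unambiguous ACGT reference, with no degenerate bases or mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_amplicon(reference, forward_primer, reverse_primer, include_primers=False):
    """Return the region of `reference` flanked by the two primers, or None.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is what appears on the reference strand given here.
    """
    ref = reference.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer.upper())

    start = ref.find(fwd)
    if start == -1:
        return None
    # Look for the reverse primer site downstream of the forward primer.
    end = ref.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    if include_primers:
        return ref[start:end + len(rev_site)]
    return ref[start + len(fwd):end]
```

Applied to every sequence in a public reference collection, a function like this would yield one primer-specific entry per taxon, which is the general shape of a custom amplicon database.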
Affiliation(s)
- Clément Plessis
  - Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada
  - Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Thomas Jeanne
  - Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada
  - Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Antoine Dionne
  - Laboratoire d’Expertise et de Diagnostic en Phytoprotection, Ministère de l’Agriculture, des Pêcheries et de l’Alimentation du Québec (MAPAQ), Québec City, QC G1P 3W6, Canada
- Julien Vivancos
  - Laboratoire d’Expertise et de Diagnostic en Phytoprotection, Ministère de l’Agriculture, des Pêcheries et de l’Alimentation du Québec (MAPAQ), Québec City, QC G1P 3W6, Canada
- Arnaud Droit
  - Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Richard Hogue
  - Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada
8
Mori H, Yachie N. A framework to efficiently describe and share reproducible DNA materials and construction protocols. Nat Commun 2022; 13:2894. [PMID: 35610233 PMCID: PMC9130275 DOI: 10.1038/s41467-022-30588-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 05/10/2022] [Indexed: 12/02/2022] Open
Abstract
DNA constructs and their annotated sequence maps have been rapidly accumulating with the advancement of DNA cloning, synthesis, and assembly methods. Such resources have also been utilized in designing and building new DNA materials. However, as commonly seen in the life sciences, no framework exists to describe reproducible DNA construction processes. Furthermore, the use of previously developed DNA materials and building protocols is usually not appropriately credited. Here, we report a framework QUEEN (framework to generate quinable and efficiently editable nucleotide sequence resources) to resolve these issues and accelerate the building of DNA. QUEEN enables the flexible design of new DNA by using existing DNA material resource files and recording its construction process in an output file (GenBank file format). A GenBank file generated by QUEEN can regenerate the process code such that it perfectly clones itself and bequeaths the same process code to its successive GenBank files, recycling its partial DNA resources. QUEEN-generated GenBank files are compatible with existing DNA repository services and software. We propose QUEEN as a solution to start significantly advancing the material and protocol sharing of DNA resources.
Affiliation(s)
- Hideto Mori
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, 153-8904, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, 252-0882, Japan
- Nozomu Yachie
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, 153-8904, Japan
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
9
Pereira H, Azevedo F, Domingues L, Johansson B. Expression of Yarrowia lipolytica acetyl-CoA carboxylase in Saccharomyces cerevisiae and its effect on in-vivo accumulation of Malonyl-CoA. Comput Struct Biotechnol J 2022; 20:779-787. [PMID: 36284710 PMCID: PMC9582701 DOI: 10.1016/j.csbj.2022.01.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/22/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 12/18/2022] Open
Abstract
Novel S. cerevisiae strain with tetracycline repressible ACC1 promoter. Functional expression of Y. lipolytica ACC1 in S. cerevisiae. Higher malonyl-CoA concentration achieved with Y. lipolytica ACC1 gene. S. cerevisiae Acc1p seems to interact with the heterologous Y. lipolytica Acc1p.
Malonyl-CoA is an energy-rich molecule formed by the ATP-dependent carboxylation of acetyl coenzyme A catalyzed by acetyl-CoA carboxylase. This molecule is an important precursor for many biotechnologically interesting compounds such as flavonoids, polyketides, and fatty acids. The yeast Saccharomyces cerevisiae remains one of the preferred cell factories, but has a limited capacity to produce malonyl-CoA compared to oleaginous organisms. We developed a new S. cerevisiae strain with a conditional allele of ACC1, the essential acetyl-CoA carboxylase (ACC) gene, as a tool to test heterologous genes for complementation. Yarrowia lipolytica is an oleaginous yeast with a higher capacity for lipid production than S. cerevisiae, possibly due to a higher capacity to produce malonyl-CoA. Measuring relative intracellular malonyl-CoA levels with an in-vivo biosensor confirmed that expression of Y. lipolytica ACC in S. cerevisiae leads to a higher accumulation of malonyl-CoA compared with overexpression of the native gene from an otherwise identical vector. The higher accumulation was generally accompanied by a decreased growth rate. Concomitant expression of both the homologous and heterologous ACC1 genes eliminated the growth defect, with a marginal reduction of malonyl-CoA accumulation.
Affiliation(s)
- Humberto Pereira
  - CBMA - Center of Molecular and Environmental Biology Engineering
- Flávio Azevedo
  - CBMA - Center of Molecular and Environmental Biology Engineering
- Lucília Domingues
  - CEB - Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal
- Björn Johansson
  - CBMA - Center of Molecular and Environmental Biology Engineering
  - Corresponding author.
10
Li HL, Pang YH, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res 2021; 49:e129. [PMID: 34581805 PMCID: PMC8682797 DOI: 10.1093/nar/gkab829] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 07/18/2021] [Revised: 08/24/2021] [Accepted: 09/09/2021] [Indexed: 01/08/2023] Open
Abstract
In order to uncover the meanings of the ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. In order to help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.
Affiliation(s)
- Hong-Liang Li
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Yi-He Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
11
Wang Y, Xue H, Pourcel C, Du Y, Gautheret D. 2-kupl: mapping-free variant detection from DNA-seq data of matched samples. BMC Bioinformatics 2021; 22:304. [PMID: 34090332 PMCID: PMC8180056 DOI: 10.1186/s12859-021-04185-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats. RESULTS We introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves higher accuracy than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome sequencing data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease. CONCLUSIONS We developed a mapping-free protocol for variant calling between matched DNA-seq samples. Our protocol is suitable for variant detection in unmappable genome regions or in the absence of a reference genome.
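The general idea behind a k-mer based, mapping-free comparison of two samples can be sketched in a few lines of Python. This is not the 2-kupl implementation: the function names, the default `k`, and the `min_count` threshold are hypothetical, and the sketch ignores reverse-complement strands and sequencing errors beyond the simple count filter.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(case_reads, control_reads, k=5, min_count=2):
    """K-mers seen at least `min_count` times in the case sample and never in the control.

    No reference genome is involved: the two samples are compared directly.
    """
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    return {kmer for kmer, n in case.items() if n >= min_count and kmer not in control}
```

With a single-base difference between otherwise identical reads, the case-specific k-mers are exactly those that overlap the changed position, so they point to the variant without any alignment step.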
Affiliation(s)
- Yunfeng Wang
  - Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
  - Annoroad Gene Technology Co., Ltd, Beijing, 100176 China
- Haoliang Xue
  - Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
- Christine Pourcel
  - Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
- Yang Du
  - Annoroad Gene Technology Co., Ltd, Beijing, 100176 China
- Daniel Gautheret
  - Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
  - IHU PRISM, Gustave Roussy, 114 rue Edouard Vaillant, 94800 Villejuif, France
| |
Collapse
|
12
|
Zulkower V. Computer-Aided Design and Pre-validation of Large Batches of DNA Assemblies. Methods Mol Biol 2021; 2229:157-166. [PMID: 33405220 DOI: 10.1007/978-1-0716-1032-9_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
Abstract
Type-2S restriction enzymes allow the routine assembly of large batches of synthetic constructs from individual genetic parts. However, design flaws in the part sequence can cause assembly failures, incurring troubleshooting costs and project delays. As a result, the careful design and checking of the assembly plan is often a bottleneck of large assembly projects, and may require computational support. This chapter demonstrates the use of two free and open-source web applications accelerating this task by automating genetic part design and simulating type-2S cloning to detect potential assembly issues.
Affiliation(s)
- Valentin Zulkower
  - Edinburgh Genome Foundry, SynthSys, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
|
13
|
A novel D-xylose isomerase from the gut of the wood feeding beetle Odontotaenius disjunctus efficiently expressed in Saccharomyces cerevisiae. Sci Rep 2021; 11:4766. [PMID: 33637780 PMCID: PMC7910561 DOI: 10.1038/s41598-021-83937-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/16/2020] [Accepted: 02/09/2021] [Indexed: 11/25/2022] Open
Abstract
Carbohydrate rich substrates such as lignocellulosic hydrolysates remain one of the primary sources of potentially renewable fuel and bulk chemicals. The pentose sugar D-xylose is often present in significant amounts along with hexoses. Saccharomyces cerevisiae can acquire the ability to metabolize D-xylose through expression of heterologous D-xylose isomerase (XI). This enzyme is notoriously difficult to express in S. cerevisiae and only fourteen XIs have been reported to be active so far. We cloned a new D-xylose isomerase derived from microorganisms in the gut of the wood-feeding beetle Odontotaenius disjunctus. Although somewhat homologous to the XI from Piromyces sp. E2, the new gene was identified as bacterial in origin and the host as a Parabacteroides sp. Expression of the new XI in S. cerevisiae resulted in faster aerobic growth than the XI from Piromyces on D-xylose media. The D-xylose isomerization rate conferred by the new XI was also 72% higher, while absolute xylitol production was identical in both strains. Interestingly, increasing concentrations of xylitol (up to 8 g L⁻¹) appeared not to inhibit D-xylose consumption. The newly described XI displayed 2.6 times higher specific activity, 37% lower KM for D-xylose, and exhibited higher activity over a broader temperature range, retaining 51% of maximal activity at 30 °C compared with only 29% activity for the Piromyces XI.
|
14
|
Deep diversification of an AAV capsid protein by machine learning. Nat Biotechnol 2021; 39:691-696. [PMID: 33574611 DOI: 10.1038/s41587-020-00793-4] [Citation(s) in RCA: 132] [Impact Index Per Article: 44.0] [Received: 03/07/2020] [Accepted: 12/08/2020] [Indexed: 11/08/2022]
Abstract
Modern experimental technologies can assay large numbers of biological sequences, but engineered protein libraries rarely exceed the sequence diversity of natural protein families. Machine learning (ML) models trained directly on experimental data without biophysical modeling provide one route to accessing the full potential diversity of engineered proteins. Here we apply deep learning to design highly diverse adeno-associated virus 2 (AAV2) capsid protein variants that remain viable for packaging of a DNA payload. Focusing on a 28-amino acid segment, we generated 201,426 variants of the AAV2 wild-type (WT) sequence yielding 110,689 viable engineered capsids, 57,348 of which surpass the average diversity of natural AAV serotype sequences, with 12-29 mutations across this region. Even when trained on limited data, deep neural network models accurately predict capsid viability across diverse variants. This approach unlocks vast areas of functional but previously unreachable sequence space, with many potential applications for the generation of improved viral vectors and protein therapeutics.
|
15
|
Silva PC, Domingues L, Collins T, Oliveira R, Johansson B. Quantitative assessment of DNA damage in the industrial ethanol production strain Saccharomyces cerevisiae PE-2. FEMS Yeast Res 2018; 18:5097783. [PMID: 30219865 DOI: 10.1093/femsyr/foy101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/22/2018] [Accepted: 09/12/2018] [Indexed: 11/14/2022] Open
Abstract
Lignocellulosic hydrolysates remain one of the most abundantly used substrates for the sustainable production of second generation fuels and chemicals with Saccharomyces cerevisiae. Nevertheless, fermentation inhibitors such as acetic acid, furfural and hydroxymethylfurfural are formed during the process and can lead to slow or stuck fermentations and/or act as genotoxic agents leading to production strain genetic instability. We have developed a novel dominant deletion (DEL) cassette assay for quantification of DNA damage in both wild-type and industrial yeast strains. Using this assay, the ethanol production strain S. cerevisiae PE-2 was shown to be more resistant to hydrogen peroxide and furfural than the laboratory DEL strain RS112. Indeed, the PE-2 strain also showed a lower tendency for recombination, consistent with a more efficient DNA protection. The dominant DEL assay presented herein should prove to be a useful tool in the selection of robust yeast strains and process conditions for second generation feedstock fermentations.
Affiliation(s)
- Lucília Domingues
  - CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Tony Collins
  - CBMA - Center of Molecular and Environmental Biology
- Rui Oliveira
  - CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
  - Centre for the Research and Technology of Agro-Environmental and Biological Sciences (CITAB), Department of Biology, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
|
16
|
Cunha JT, Costa CE, Ferraz L, Romaní A, Johansson B, Sá-Correia I, Domingues L. HAA1 and PRS3 overexpression boosts yeast tolerance towards acetic acid improving xylose or glucose consumption: unravelling the underlying mechanisms. Appl Microbiol Biotechnol 2018; 102:4589-4600. [PMID: 29607452 DOI: 10.1007/s00253-018-8955-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/03/2018] [Revised: 03/12/2018] [Accepted: 03/18/2018] [Indexed: 11/28/2022]
Abstract
Acetic acid tolerance and xylose consumption are desirable traits for yeast strains used in industrial biotechnological processes. In this work, overexpression of a weak acid stress transcriptional activator encoded by the gene HAA1 and a phosphoribosyl pyrophosphate synthetase encoded by PRS3 in a recombinant industrial Saccharomyces cerevisiae strain containing a xylose metabolic pathway was evaluated in the presence of acetic acid in xylose- or glucose-containing media. HAA1 or PRS3 overexpression resulted in superior yeast growth and higher sugar consumption capacities in the presence of 4 g/L acetic acid, and a positive synergistic effect resulted from the simultaneous overexpression of both genes. Overexpressing these genes also improved yeast adaptation to a non-detoxified hardwood hydrolysate with a high acetic acid content. Furthermore, the overexpression of HAA1 and/or PRS3 was found to increase the robustness of yeast cell wall when challenged with acetic acid stress, suggesting the involvement of the modulation of the cell wall integrity pathway. This study clearly shows HAA1 and/or, for the first time, PRS3 overexpression to play an important role in the improvement of industrial yeast tolerance towards acetic acid. The results expand the molecular toolbox and add to the current understanding of the mechanisms involved in higher acetic acid tolerance, paving the way for the further development of more efficient industrial processes.
Affiliation(s)
- Joana T Cunha
  - Centre of Biological Engineering (CEB), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Carlos E Costa
  - Centre of Biological Engineering (CEB), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Luís Ferraz
  - Centre of Biological Engineering (CEB), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Aloia Romaní
  - Centre of Biological Engineering (CEB), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Björn Johansson
  - Center of Molecular and Environmental Biology (CBMA), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
- Isabel Sá-Correia
  - Institute for Bioengineering and Biosciences, Department of Bioengineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001, Lisbon, Portugal
- Lucília Domingues
  - Centre of Biological Engineering (CEB), University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
|
17
|
Taylor LJ, Strebel K. Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes. BMC Microbiol 2017; 17:12. [PMID: 28061810 PMCID: PMC5219722 DOI: 10.1186/s12866-016-0920-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/27/2016] [Accepted: 12/20/2016] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts. RESULTS Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes. CONCLUSIONS Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface (http://louiejtaylor.github.io/pyViKO/), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
Affiliation(s)
- Louis J. Taylor
  - Viral Biochemistry Section, Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Klaus Strebel
  - Viral Biochemistry Section, Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
|
18
|
Pereira F, Azevedo F, Parachin NS, Hahn-Hägerdal B, Gorwa-Grauslund MF, Johansson B. Yeast Pathway Kit: A Method for Metabolic Pathway Assembly with Automatically Simulated Executable Documentation. ACS Synth Biol 2016; 5:386-94. [PMID: 26916955 DOI: 10.1021/acssynbio.5b00250] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/27/2023]
Abstract
We have developed the Yeast Pathway Kit (YPK) for rational and random metabolic pathway assembly in Saccharomyces cerevisiae using reusable and redistributable genetic elements. Genetic elements are cloned in a suicide vector in a rapid process that omits PCR product purification. Single-gene expression cassettes are assembled in vivo using genetic elements that are both promoters and terminators (TP). Cassettes sharing genetic elements are assembled by recombination into multigene pathways. A wide selection of prefabricated TP elements makes assembly both rapid and inexpensive. An innovative software tool automatically produces detailed self-contained executable documentation in the form of pydna code in the narrative Jupyter notebook format to facilitate planning and sharing YPK projects. A D-xylose catabolic pathway was created using YPK with four or eight genes that resulted in one of the highest growth rates reported on D-xylose (0.18 h⁻¹) for recombinant S. cerevisiae without adaptation. The two-step assembly of single-gene expression cassettes into multigene pathways may improve the yield of correct pathways at the cost of adding overall complexity, which is offset by the supplied software tool.
Affiliation(s)
- Filipa Pereira
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Flávio Azevedo
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Nadia Skorupa Parachin
  - Division of Applied Microbiology, Department of Chemistry, Lund University, SE-22100 Lund, Sweden
- Bärbel Hahn-Hägerdal
  - Division of Applied Microbiology, Department of Chemistry, Lund University, SE-22100 Lund, Sweden
- Marie F. Gorwa-Grauslund
  - Division of Applied Microbiology, Department of Chemistry, Lund University, SE-22100 Lund, Sweden
- Björn Johansson
  - CBMA - Centre of Molecular and Environmental Biology, Department of Biology, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
|