1
Zelenka NR, Di Cara N, Sharma K, Sarvaharman S, Ghataora JS, Parmeggiani F, Nivala J, Abdallah ZS, Marucci L, Gorochowski TE. Data hazards in synthetic biology. Synth Biol (Oxf) 2024; 9:ysae010. [PMID: 38973982 PMCID: PMC11227101 DOI: 10.1093/synbio/ysae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/17/2024] [Accepted: 06/19/2024] [Indexed: 07/09/2024]
Abstract
Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.
Affiliation(s)
- Natalie R Zelenka
  - Jean Golding Institute, University of Bristol, Bristol, UK
  - BrisEngBio, University of Bristol, Bristol, UK
- Nina Di Cara
  - School of Psychological Science, University of Bristol, Bristol, UK
- Kieren Sharma
  - School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Jasdeep S Ghataora
  - BrisEngBio, University of Bristol, Bristol, UK
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Fabio Parmeggiani
  - BrisEngBio, University of Bristol, Bristol, UK
  - School of Biochemistry, University of Bristol, Bristol, UK
  - School of Pharmacy and Pharmaceutical Sciences, Cardiff University, Cardiff, UK
- Jeff Nivala
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Zahraa S Abdallah
  - School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Lucia Marucci
  - BrisEngBio, University of Bristol, Bristol, UK
  - School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Thomas E Gorochowski
  - BrisEngBio, University of Bristol, Bristol, UK
  - School of Biological Sciences, University of Bristol, Bristol, UK
2
Yurchenko A, Özkul G, van Riel NAW, van Hest JCM, de Greef TFA. Mechanism-based and data-driven modeling in cell-free synthetic biology. Chem Commun (Camb) 2024; 60:6466-6475. [PMID: 38847387 DOI: 10.1039/d4cc01289e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Cell-free systems have emerged as a versatile platform in synthetic biology, finding applications in various areas such as prototyping synthetic circuits, biosensor development, and biomanufacturing. To streamline the prototyping process, cell-free systems often incorporate a modeling step that predicts the outcomes of various experimental scenarios, providing a deeper insight into the underlying mechanisms and functions. There are two recognized approaches for modeling these systems: mechanism-based modeling, which models the underlying reaction mechanisms; and data-driven modeling, which makes predictions based on data without preconceived interactions between system components. In this highlight, we focus on the latest advancements in both modeling approaches for cell-free systems, exploring their potential for the design and optimization of synthetic genetic circuits.
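The contrast between the two modeling approaches named in this abstract can be illustrated with a toy sketch. It is not taken from the cited article: the two-species transcription/translation model, the rate names (`k_tx`, `k_tl`, `d_m`, `d_p`), and both function names are assumptions chosen only for illustration. The first function encodes an assumed reaction mechanism as ODEs; the second fits a line to data with no mechanism at all.

```python
def simulate_expression(k_tx, k_tl, d_m, d_p, t_end, dt=0.01):
    """Mechanism-based toy model (hypothetical), integrated with forward Euler.

    dm/dt = k_tx - d_m * m        (mRNA: constant transcription, first-order decay)
    dp/dt = k_tl * m - d_p * p    (protein: translation from mRNA, first-order decay)
    Returns the mRNA and protein levels at time t_end, starting from zero.
    """
    m = p = 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dt * dm
        p += dt * dp
    return m, p


def fit_line(xs, ys):
    """Data-driven toy model: ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Mechanism-based: parameters have a physical meaning.
    print(simulate_expression(k_tx=2.0, k_tl=3.0, d_m=1.0, d_p=0.5, t_end=50.0))
    # Data-driven: only input/output pairs are used.
    print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
```

In the mechanism-based function each parameter corresponds to an assumed physical process, so it can in principle be measured or reused across circuits; the fitted line makes no such commitment and is only as good as the data it was trained on.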
Affiliation(s)
- Angelina Yurchenko
  - Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Institute for Complex Molecular Systems, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Synthetic Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
- Gökçe Özkul
  - Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Institute for Complex Molecular Systems, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Synthetic Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
- Natal A W van Riel
  - Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Eindhoven MedTech Innovation Center, 5612 AX Eindhoven, The Netherlands
  - Department of Vascular Medicine, Amsterdam UMC, Amsterdam, The Netherlands
- Jan C M van Hest
  - Bio-Organic Chemistry, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands
  - Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands
- Tom F A de Greef
  - Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Institute for Complex Molecular Systems, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Synthetic Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
  - Institute for Molecules and Materials, Radboud University, 6525 AJ Nijmegen, The Netherlands
  - Center for Living Technologies, Eindhoven-Wageningen-Utrecht Alliance, 3584 CB Utrecht, The Netherlands
3
Çiftçi B, Tekin R. Prediction of viral families and hosts of single-stranded RNA viruses based on K-Mer coding from phylogenetic gene sequences. Comput Biol Chem 2024; 112:108114. [PMID: 38852362 DOI: 10.1016/j.compbiolchem.2024.108114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 05/06/2024] [Accepted: 05/25/2024] [Indexed: 06/11/2024]
Abstract
There are billions of virus species worldwide, and viruses, the smallest parasitic entities, pose a serious threat. Therefore, fighting associated disorders requires an understanding of the genetic structure of viruses. Considering the wide diversity and rapid evolution of viruses, there is a critical need to quickly and accurately classify viral species and their potential hosts to better understand transmission dynamics, facilitating the development of targeted therapies. Recognizing this, this study has investigated the classes of RNA viruses based on their genomic sequences using Machine Learning (ML) and Deep Learning (DL) models. The PhyVirus dataset, consisting of pathogenic Single-stranded RNA viruses of Baltimore group four (+ssRNA) and five (-ssRNA) with different hosts and species, was analyzed. The dataset containing viral gene sequences was analyzed using the K-Mer coding technique, which is based on base words of various lengths. The study used classical ML algorithms (Random Forest, Gradient Boosting and Extra Trees) and the Fully Connected Deep Neural Network, a Deep Learning algorithm, to predict viral families and hosts. Detailed analyses were performed on the classifier performance in scenarios with different train-test ratios and different word lengths (k-values) for K-Mer. The observed results show that Fully Connected Deep Neural Network has a high success rate of 99.60 % in predicting virus families. In predicting virus hosts, the Extra Trees classifier achieved the highest success rate of 81.53 %. This study is considered to be the first classification study in the literature on this dataset, which has a very large family and host diversity consisting of gene sequences of Single-stranded RNA viruses. Our detailed investigations on how varying word lengths based on K-Mer coding in gene sequences affect the classification into viral families and hosts make this study particularly valuable. 
This study shows that ML and DL methods have the potential to produce valuable results in phylogenetic studies. In addition, the results and high-performance values show that these methods can be successfully used in regenerative applications of gene sequences or in studies such as the elimination of losses in gene sequences.
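The K-Mer coding step described in this abstract can be sketched as follows. This is a minimal illustration, not the cited authors' pipeline: the function names, the `ACGU` alphabet, and the choice to normalize counts to frequencies are assumptions. Each sequence is cut into overlapping words of length k, and the word counts become a fixed-length feature vector that a classifier such as a random forest could consume.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping window of length k in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence: str, k: int, alphabet: str = "ACGU") -> list:
    """Turn a sequence into a fixed-length vector of k-mer frequencies.

    The vector has len(alphabet) ** k entries, one per possible word, in
    lexicographic order of the alphabet. Windows containing characters
    outside the alphabet (for example an ambiguous base N) are ignored.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(word) for word in product(alphabet, repeat=k)]
    total = sum(counts[word] for word in vocabulary)
    return [counts[word] / total if total else 0.0 for word in vocabulary]


if __name__ == "__main__":
    print(kmer_counts("AUGAUG", 3))
    print(len(kmer_vector("AUGAUGCC", 2)))
```

The vector length grows as 4^k, which is one reason the choice of word length matters: larger k gives more specific features but a sparser, higher-dimensional representation.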
Affiliation(s)
- Bahar Çiftçi
  - Batman University, Institute of Graduate Studies, Department of Electrical and Electronic Engineering, Turkey; Siirt University, Distance Education Application and Research Center, Turkey
- Ramazan Tekin
  - Batman University, Faculty of Engineering and Architecture, Department of Computer Engineering, Turkey
4
Kukhtar D, Fussenegger M. Synthetic biology in multicellular organisms: Opportunities in nematodes. Biotechnol Bioeng 2023. [PMID: 37448225 DOI: 10.1002/bit.28497] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/17/2023] [Revised: 04/27/2023] [Accepted: 07/05/2023] [Indexed: 07/15/2023]
Abstract
Synthetic biology has mainly focused on introducing new or altered functionality in single cell systems: primarily bacteria, yeast, or mammalian cells. Here, we describe the extension of synthetic biology to nematodes, in particular the well-studied model organism Caenorhabditis elegans, as a convenient platform for developing applications in a multicellular setting. We review transgenesis techniques for nematodes, as well as the application of synthetic biology principles to construct nematode gene switches and genetic devices to control motility. Finally, we discuss potential applications of engineered nematodes.
Affiliation(s)
- Dmytro Kukhtar
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Martin Fussenegger
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Faculty of Life Science, University of Basel, Basel, Switzerland