1
Zelenka NR, Di Cara N, Sharma K, Sarvaharman S, Ghataora JS, Parmeggiani F, Nivala J, Abdallah ZS, Marucci L, Gorochowski TE. Data hazards in synthetic biology. Synth Biol (Oxf) 2024; 9:ysae010. [PMID: 38973982 PMCID: PMC11227101 DOI: 10.1093/synbio/ysae010] [Received: 03/15/2024] [Revised: 05/17/2024] [Accepted: 06/19/2024] [Indexed: 07/09/2024]
Abstract
Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.
Affiliation(s)
- Natalie R Zelenka
- Jean Golding Institute, University of Bristol, Bristol, UK
- BrisEngBio, University of Bristol, Bristol, UK
- Nina Di Cara
- School of Psychological Science, University of Bristol, Bristol, UK
- Kieren Sharma
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Jasdeep S Ghataora
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biological Sciences, University of Bristol, Bristol, UK
- Fabio Parmeggiani
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biochemistry, University of Bristol, Bristol, UK
- School of Pharmacy and Pharmaceutical Sciences, Cardiff University, Cardiff, UK
- Jeff Nivala
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Zahraa S Abdallah
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Lucia Marucci
- BrisEngBio, University of Bristol, Bristol, UK
- School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK
- Thomas E Gorochowski
- BrisEngBio, University of Bristol, Bristol, UK
- School of Biological Sciences, University of Bristol, Bristol, UK
2
Sun G, DeFelice MM, Gillies TE, Ahn-Horst TA, Andrews CJ, Krummenacker M, Karp PD, Morrison JH, Covert MW. Cross-evaluation of E. coli's operon structures via a whole-cell model suggests alternative cellular benefits for low- versus high-expressing operons. Cell Syst 2024; 15:227-245.e7. [PMID: 38417437 PMCID: PMC10957310 DOI: 10.1016/j.cels.2024.02.002] [Received: 08/30/2023] [Revised: 09/12/2023] [Accepted: 02/08/2024] [Indexed: 03/01/2024]
Abstract
Many bacteria use operons to coregulate genes, but it remains unclear how operons benefit bacteria. We integrated E. coli's 788 polycistronic operons and 1,231 transcription units into an existing whole-cell model and found inconsistencies between the proposed operon structures and the RNA-seq read counts that the model was parameterized from. We resolved these inconsistencies through iterative, model-guided corrections to both datasets, including the correction of RNA-seq counts of short genes that were misreported as zero by existing alignment algorithms. The resulting model suggested two main modes by which operons benefit bacteria. For 86% of low-expression operons, adding operons increased the co-expression probabilities of their constituent proteins, whereas for 92% of high-expression operons, adding operons resulted in more stable expression ratios between the proteins. These simulations underscored the need for further experimental work on how operons reduce noise and synchronize both the expression timing and the quantity of constituent genes. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Gwanggyu Sun
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Mialy M DeFelice
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Taryn E Gillies
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Travis A Ahn-Horst
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Cecelia J Andrews
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Jerry H Morrison
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Markus W Covert
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
3
Georgouli K, Yeom JS, Blake RC, Navid A. Multi-scale models of whole cells: progress and challenges. Front Cell Dev Biol 2023; 11:1260507. [PMID: 38020904 PMCID: PMC10661945 DOI: 10.3389/fcell.2023.1260507] [Received: 07/18/2023] [Accepted: 10/19/2023] [Indexed: 12/01/2023]
Abstract
Whole-cell modeling is "the ultimate goal" of computational systems biology and "a grand challenge for 21st century" (Tomita, Trends in Biotechnology, 2001, 19(6), 205-10). These complex, highly detailed models account for the activity of every molecule in a cell and serve as comprehensive knowledgebases for the modeled system. Their scope and utility far surpass those of other systems models. In fact, whole-cell models (WCMs) are an amalgam of several types of "system" models. The models are simulated using a hybrid modeling method in which the appropriate mathematical method for each biological process is used to simulate its behavior. Given the complexity of the models, developing and curating them is labor-intensive, and to date only a handful have been developed. While whole-cell models provide valuable and novel biological insights, and have identified some novel biological phenomena, their most important contribution has been to highlight the discrepancy between available data and the observations used for the parametrization and validation of complex biological models. Another realization has been that current whole-cell modeling simulators are slow, and that models of more complex (e.g., multi-cellular) biosystems will need to be executed in an accelerated fashion on high-performance computing platforms. In this manuscript, we review the progress of whole-cell modeling to date and discuss some of the ways that these models can be improved.
Affiliation(s)
- Konstantia Georgouli
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Jae-Seung Yeom
- Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Robert C. Blake
- Center for Applied Scientific Computing, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Ali Navid
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States