1. Halper SM, Hossain A, Salis HM. Synthesis Success Calculator: Predicting the Rapid Synthesis of DNA Fragments with Machine Learning. ACS Synth Biol 2020; 9:1563-1571. [PMID: 32559378; DOI: 10.1021/acssynbio.9b00460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
The synthesis and assembly of long DNA fragments has greatly accelerated synthetic biology and biotechnology research. However, long turnaround times or synthesis failures create unpredictable bottlenecks in the design-build-test-learn cycle. We developed a machine learning model, called the Synthesis Success Calculator, to predict whether a long DNA fragment can be readily synthesized with a short turnaround time. The model also identifies the sequence determinants associated with the synthesis outcome. We trained a random forest classifier using biophysical features and a compiled data set of 1076 DNA fragment sequences to achieve high predictive performance (F1 score of 0.928 on 251 unseen sequences). Feature importance analysis revealed that repetitive DNA sequences were the most important contributor to synthesis failures. We then applied the Synthesis Success Calculator across large sequence data sets and found that 84.9% of the Escherichia coli MG1655 genome, but only 34.4% of sampled plasmids in NCBI, could be readily synthesized. Overall, the Synthesis Success Calculator can be applied on its own to prevent synthesis failures or embedded within optimization algorithms to design large genetic systems that can be rapidly synthesized and assembled.
2. Richardson SM, Mitchell LA, Stracquadanio G, Yang K, Dymond JS, DiCarlo JE, Lee D, Huang CLV, Chandrasegaran S, Cai Y, Boeke JD, Bader JS. Design of a synthetic yeast genome. Science 2017; 355:1040-1044. [PMID: 28280199; DOI: 10.1126/science.aaf4557] [Citation(s) in RCA: 349] [Impact Index Per Article: 49.9] [Received: 02/15/2016] [Accepted: 01/26/2017] [Indexed: 01/25/2023]
Abstract
We describe complete design of a synthetic eukaryotic genome, Sc2.0, a highly modified Saccharomyces cerevisiae genome reduced in size by nearly 8%, with 1.1 megabases of the synthetic genome deleted, inserted, or altered. Sc2.0 chromosome design was implemented with BioStudio, an open-source framework developed for eukaryotic genome design, which coordinates design modifications from nucleotide to genome scales and enforces version control to systematically track edits. To achieve complete Sc2.0 genome synthesis, individual synthetic chromosomes built by Sc2.0 Consortium teams around the world will be consolidated into a single strain by "endoreduplication intercross." Chemically synthesized genomes like Sc2.0 are fully customizable and allow experimentalists to ask otherwise intractable questions about chromosome structure, function, and evolution with a bottom-up design strategy.
Affiliation(s)
- Sarah M Richardson: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA; High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Leslie A Mitchell: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Medical Center, New York, NY 10016, USA
- Giovanni Stracquadanio: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA; High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
- Kun Yang: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA; High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jessica S Dymond: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- James E DiCarlo: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Dongwon Lee: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Cheng Lai Victor Huang: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Srinivasan Chandrasegaran: Department of Environmental Health Science, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Yizhi Cai: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; University of Edinburgh, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, UK
- Jef D Boeke: High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, New York University Langone Medical Center, New York, NY 10016, USA
- Joel S Bader: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA; High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA