1
Schwarz PM, Freisleben B. Data recovery methods for DNA storage based on fountain codes. Comput Struct Biotechnol J 2024; 23:1808-1823. [PMID: 38707543 PMCID: PMC11066528 DOI: 10.1016/j.csbj.2024.04.048] [Received: 02/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
Today's digital data storage systems typically offer advanced data recovery solutions to address the problem of catastrophic data loss, such as software-based disk sector analysis or physical-level data retrieval methods for conventional hard disk drives. However, DNA-based data storage currently relies solely on the inherent error correction properties of the methods used to encode digital data into strands of DNA. Any error that cannot be corrected utilizing the redundancy added by DNA encoding methods results in permanent data loss. To provide data recovery for DNA storage systems, we present a method to automatically reconstruct corrupted or missing data stored in DNA using fountain codes. Our method exploits the relationships between packets encoded with fountain codes to identify and rectify corrupted or lost data. Furthermore, we present file type-specific and content-based data recovery methods for three file types, illustrating how a fusion of fountain encoding-specific redundancy and knowledge about the data can effectively recover information in a corrupted DNA storage system, both in an automatic and in a guided manual manner. To demonstrate our approach, we introduce DR4DNA, a software toolkit that contains all methods presented. We evaluate DR4DNA using both in-silico and in-vitro experiments.
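To make "relationships between packets" concrete, here is a minimal Python sketch of an LT-style fountain code: each packet is the XOR of a seed-selected subset of source chunks, a peeling decoder recovers chunks from packets that have a single unknown chunk, and re-encoding the decoded chunks flags packets that disagree with them. It assumes a uniform degree distribution instead of the robust soliton distribution used in practice; `chunk_indices`, `peel_decode` and `inconsistent_packets` are hypothetical names, and this is not DR4DNA's implementation.

```python
import random
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def chunk_indices(seed: int, n: int) -> set:
    """Hypothetical seed -> chunk-subset mapping shared by encoder and decoder.
    Uses a uniform degree for simplicity, not the robust soliton distribution."""
    rng = random.Random(seed)
    degree = rng.randint(1, n)
    return set(rng.sample(range(n), degree))


def encode(chunks: list, seeds: list) -> list:
    """Each packet is (seed, XOR of the chunks selected by that seed)."""
    n = len(chunks)
    return [(s, reduce(xor_bytes, (chunks[i] for i in sorted(chunk_indices(s, n)))))
            for s in seeds]


def peel_decode(packets: list, n: int) -> list:
    """Peeling decoder: repeatedly resolve packets with exactly one unknown chunk.
    Chunks that cannot be resolved stay None."""
    recovered = [None] * n
    pending = [(chunk_indices(s, n), p) for s, p in packets]
    progress = True
    while progress:
        progress = False
        for idx, payload in pending:
            unknown = [i for i in idx if recovered[i] is None]
            if len(unknown) != 1:
                continue
            value = payload
            for i in idx:
                if recovered[i] is not None:
                    value = xor_bytes(value, recovered[i])  # strip already-known chunks
            recovered[unknown[0]] = value
            progress = True
    return recovered


def inconsistent_packets(packets: list, recovered: list) -> list:
    """Seeds of packets that disagree with re-encoding the recovered chunks.
    Assumes every chunk was recovered (no None entries)."""
    n = len(recovered)
    return [s for s, p in packets
            if reduce(xor_bytes, (recovered[i] for i in sorted(chunk_indices(s, n)))) != p]
```

For example, encoding four 4-byte chunks into 100 packets and calling `peel_decode(packets, 4)` is expected to return the original chunks; flipping a byte in one packet afterwards makes `inconsistent_packets` report only that packet's seed.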
Affiliation(s)
- Peter Michael Schwarz
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße 6, Marburg, D-35043, Germany
- Bernd Freisleben
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße 6, Marburg, D-35043, Germany

2
Zhong W, Geng C, Fu Z, Mao C, Zheng Y, Wang S, Liu K, Yang Y, Lu C, Jiang X. Flow Cytometry Sorting for Random Access in DNA Data Storage: Encapsulation for Enhanced Stability and Sequence Integrity of DNA. Anal Chem 2024; 96:16099-16108. [PMID: 39319639 DOI: 10.1021/acs.analchem.4c04637] [Indexed: 09/26/2024]
Abstract
As digital data undergo explosive growth, deoxyribonucleic acid (DNA) has emerged as a promising storage medium due to its high density, longevity, and ease of replication, offering vast potential in data storage solutions. This study focuses on the protection and retrieval of data during the DNA storage process, developing a technique that employs flow cytometry sorting (FCS) to segregate multicolored fluorescent DNA microparticles encoded with data and facilitating efficient random access. Moreover, the encapsulated fluorescent DNA microparticles, formed through layer-by-layer self-assembly, preserve structural and sequence integrity even under harsh conditions while also supporting a high-density DNA payload. Experimental results have shown that the encoded data can still be successfully recovered from encapsulated DNA microparticles following de-encapsulation. We also successfully demonstrated the automated encapsulation process of fluorescent DNA microparticles using a microfluidic chip. This research provides an innovative approach to the long-term stability and random readability of DNA data storage.
Affiliation(s)
- Wukun Zhong
- MOE Key Laboratory for Analytical Science of Food Safety and Biology, College of Chemistry, Fuzhou University, Fuzhou 350108, China
- Chunyang Geng
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Zhangcheng Fu
- MOE Key Laboratory for Analytical Science of Food Safety and Biology, College of Chemistry, Fuzhou University, Fuzhou 350108, China
- Cuiping Mao
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Yanlin Zheng
- MOE Key Laboratory for Analytical Science of Food Safety and Biology, College of Chemistry, Fuzhou University, Fuzhou 350108, China
- Saijie Wang
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Kai Liu
- Engineering Research Center of Advanced Rare Earth Materials (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing 100084, China
- Yang Yang
- Institute of Molecular Medicine and Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, State Key Laboratory of Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Chunhua Lu
- MOE Key Laboratory for Analytical Science of Food Safety and Biology, College of Chemistry, Fuzhou University, Fuzhou 350108, China
- Xingyu Jiang
- Shenzhen Key Laboratory of Smart Healthcare Engineering, Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China

3
Gao Y, No A. Efficient and low-complexity variable-to-variable length coding for DNA storage. BMC Bioinformatics 2024; 25:320. [PMID: 39354338 PMCID: PMC11446080 DOI: 10.1186/s12859-024-05943-y] [Received: 04/10/2023] [Accepted: 09/23/2024] [Indexed: 10/03/2024]
Abstract
BACKGROUND: Efficient DNA-based storage systems offer substantial capacity and longevity at reduced costs, addressing anticipated data growth. However, encoding data into DNA sequences is limited by two key constraints: 1) a maximum of $h$ consecutive identical bases (homopolymer constraint $h$), and 2) a GC ratio within $[0.5 - c_{GC}, 0.5 + c_{GC}]$ (GC content constraint $c_{GC}$). Sequencing or synthesis errors tend to increase when these constraints are violated. RESULTS: In this research, we address a pure source coding problem in the context of DNA storage, considering both homopolymer and GC content constraints. We introduce a novel coding technique that adheres to these constraints while maintaining linear complexity for increased block lengths and achieving near-optimal rates. We demonstrate the effectiveness of the proposed method through experiments on both randomly generated data and existing files. For example, when $h = 4$ and $c_{GC} = 0.05$, the rate reached 1.988, close to the theoretical limit of 1.990. The associated code can be accessed at GitHub. CONCLUSION: We propose a variable-to-variable-length encoding method that does not rely on concatenating short predefined sequences, which achieves near-optimal rates.
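Both constraints are easy to state as a check on a candidate sequence. The sketch below assumes the GC ratio is measured over the whole sequence (some schemes apply it per window) and is only a validator, not the authors' variable-to-variable-length encoder; the function names are hypothetical.

```python
from itertools import groupby


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_ratio(seq: str) -> float:
    """Fraction of bases that are G or C (seq must be non-empty)."""
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, h: int, c_gc: float) -> bool:
    """True if no run exceeds h bases and the GC ratio lies in [0.5 - c_gc, 0.5 + c_gc]."""
    # A tiny tolerance keeps ratios exactly on the interval boundary from failing on float rounding.
    return max_homopolymer(seq) <= h and abs(gc_ratio(seq) - 0.5) <= c_gc + 1e-12
```

With $h = 4$ and $c_{GC} = 0.05$, `ACGTACGT` passes, while `AAAAACGT` fails on its five-base run and `GGCCGGCA` fails on its GC ratio of 7/8.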
Affiliation(s)
- Yunfei Gao
- SJTU-Ruijing-UIH Institute for Medical Imaging Technology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, No. 197 Ruijin Second Road, Shanghai, 200025, China
- Albert No
- Department of Artificial Intelligence, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 03722, South Korea

4
Xu J, Wang Y, Chen X, Wang L, Zhou H, Mei H, Chen S, Huang X. "Multi-layer" encryption of medical data in DNA for highly-secure storage. Mater Today Bio 2024; 28:101221. [PMID: 39309163 PMCID: PMC11415972 DOI: 10.1016/j.mtbio.2024.101221] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 08/31/2024] [Indexed: 09/25/2024]
Abstract
The exponential growth of medical data, together with its particular attributes, drives the need for secure medical data archiving. DNA data storage shows promise for storing sensitive and important data such as medical records owing to its high density and endurance. Nevertheless, current DNA data storage schemes generally do not fully consider data encryption, posing a risk of data corruption by routine DNA sequencing. Here, we designed a "multi-layer" encryption pipeline for medical data archiving. Initially, digital information was encrypted using the Blowfish algorithm at the information technology (IT) layer, followed by two-layer data encryption at the biotechnology (BT) layer. The first BT layer exploited the molecular weight of synthetic DNA or nucleosides to encrypt the key, while the second BT layer encrypted digital information within DNA sequences. Consequently, decryption involved layer-by-layer interpretation of the data, including mass spectroscopy, sequencing, and Blowfish decryption, significantly enhancing data security. Using mass spectroscopy to retrieve information allows both natural and unnatural nucleosides, as well as their synthetic oligonucleotides, to be used for data storage, thereby considerably boosting scalability. Our work points to expanded flexibility of DNA-based data storage, highlighting the potential for leveraging various physical and chemical characteristics of DNA molecules to encode and access digital information.
Affiliation(s)
- Jiaxin Xu
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Yu Wang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Xue Chen
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Lingwei Wang
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Haibo Zhou
- College of Pharmacy, Jinan University, Guangzhou, Guangdong, 510632, China
- Hui Mei
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Shanze Chen
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Xiaoluo Huang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China

5
Milisavljevic M, Rodriguez TR, Tyo KEJ. Elucidating sequence-function relationships in a template-independent polymerase to enable novel DNA recording applications. Biotechnol Bioeng 2024. [PMID: 39275897 DOI: 10.1002/bit.28838] [Received: 05/16/2024] [Revised: 08/17/2024] [Accepted: 09/01/2024] [Indexed: 09/16/2024]
Abstract
Harnessing DNA as a high-density storage medium for information storage and molecular recording of signals has been of increasing interest in the biotechnology field. Recently, progress in enzymatic DNA synthesis, DNA digital data storage, and DNA-based molecular recording has been made by leveraging the activity of the template-independent DNA polymerase terminal deoxynucleotidyl transferase (TdT). TdT adds deoxyribonucleotides to the 3' end of single-stranded DNA, generating random single-stranded DNA sequences. TdT can use several divalent cations for its enzymatic activity and exhibits shifts in deoxyribonucleotide incorporation frequencies in response to changes in its reaction environment. However, there is limited understanding of the sequence-structure-function relationships underlying these properties, which in turn limits our ability to modulate TdT to further advance TdT-based tools. Most TdT literature to date explores the activity of murine, bovine or human TdTs; studies probing TdT sequence and structure largely focus on strictly conserved residues that are functionally critical to TdT activity. Here, we explore non-conserved TdT sequence space by surveying the natural diversity of TdT. We characterize a diverse set of TdT homologs from different organisms and identify several TdT residues/regions that confer differences in TdT behavior between homologs. The observations in this study can inform design rules for targeted TdT libraries, which, in tandem with a screening assay, can be used to modulate TdT properties. Moreover, the data can be useful in guiding further studies of potential residues of interest. Overall, we characterize TdTs that have not been previously studied in the literature, and we provide new insights into TdT sequence-function relationships.
Affiliation(s)
- Marija Milisavljevic
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA
- Teresa Rojas Rodriguez
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
- Keith E J Tyo
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA

6
Takahashi CN, Ward DP, Cazzaniga C, Frost C, Rech P, Ganguly K, Blanchard S, Wender S, Nguyen BH, Smith JA. Evaluating the risk of data loss due to particle radiation damage in a DNA data storage system. Nat Commun 2024; 15:8067. [PMID: 39277598 PMCID: PMC11401870 DOI: 10.1038/s41467-024-51768-x] [Received: 12/20/2023] [Accepted: 08/16/2024] [Indexed: 09/17/2024]
Abstract
DNA data storage is a potential alternative to magnetic tape for archival storage purposes, promising substantial gains in information density. Critical to the success of DNA as a storage medium is an understanding of how environmental factors affect the longevity of the stored information. In this paper, we evaluate the effect of exposure to ionizing particle radiation, a cause of data loss in traditional magnetic media, on the longevity of data in DNA data storage pools. We develop a mass action kinetics model to estimate the rate of damage accumulation in DNA strands due to neutron interactions with both nucleotides and residual water molecules, then use the model to evaluate how several design parameters of a typical DNA data storage scheme affect expected data longevity. Finally, we experimentally validate our model by exposing dried DNA samples to different levels of neutron irradiation and analyzing the resulting error profile. Our results show that particle radiation is not a significant contributor to data loss in DNA data storage pools under typical storage conditions.
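As a rough intuition for how strand length and exposure interact, the sketch below uses the simplest mass-action form: a pseudo-first-order rate law in which every nucleotide is damaged independently at a constant rate, so the fraction of undamaged strands decays exponentially with length times exposure. This is a deliberate simplification, not the authors' model (which separately tracks neutron interactions with nucleotides and residual water); the rate constant and parameter names are hypothetical.

```python
import math


def intact_fraction(k_per_nt: float, length_nt: int, exposure: float) -> float:
    """Fraction of strands with no damage event, assuming each nucleotide is hit
    independently at a constant rate k_per_nt per unit exposure (pseudo-first-order)."""
    return math.exp(-k_per_nt * length_nt * exposure)


def expected_intact_copies(copies: float, k_per_nt: float, length_nt: int,
                           exposure: float) -> float:
    """Expected number of undamaged copies of one strand in the pool."""
    return copies * intact_fraction(k_per_nt, length_nt, exposure)
```

Under this toy model, doubling strand length has the same effect on survival as doubling exposure, which is one way design parameters such as strand length enter a longevity estimate.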
Affiliation(s)
- Christopher N Takahashi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- David P Ward
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Steve Wender
- Los Alamos National Laboratory, Los Alamos, NM, USA
- Bichlien H Nguyen
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Microsoft Research, Redmond, WA, USA
- Jake A Smith
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Microsoft Research, Redmond, WA, USA

7
Zhang J. Levy Sooty Tern Optimization Algorithm Builds DNA Storage Coding Sets for Random Access. Entropy (Basel) 2024; 26:778. [PMID: 39330111 PMCID: PMC11431215 DOI: 10.3390/e26090778] [Received: 08/15/2024] [Revised: 09/02/2024] [Accepted: 09/05/2024] [Indexed: 09/28/2024]
Abstract
DNA molecules, as a storage medium, possess unique advantages. Not only does DNA storage exhibit significantly higher storage density than electromagnetic storage media, but it also features low energy consumption and extremely long storage times. However, the integration of DNA storage into daily life remains distant due to challenges such as low storage density, high latency, and inevitable errors during the storage process. Therefore, this paper proposes constructing a DNA storage coding set based on the Levy Sooty Tern Optimization Algorithm (LSTOA) to achieve an efficient random-access DNA storage system. Firstly, to address the slow iteration speed and susceptibility to local optima of the Sooty Tern Optimization Algorithm (STOA), this paper introduces Levy flight operations and proposes the LSTOA. Secondly, using the LSTOA, this paper constructs a DNA storage coding set that facilitates random access while meeting combinatorial constraints. To demonstrate the coding performance of the LSTOA, this paper presents analyses on 13 benchmark test functions, which show its superior performance. Furthermore, under the same combinatorial constraints, the LSTOA constructs larger DNA storage coding sets, effectively reducing the read-write latency and error rate of DNA storage.
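The combinatorial constraints on a coding set can be illustrated with a plain greedy baseline rather than LSTOA itself: scan all words of length n in lexicographic order and keep each one that meets per-word constraints and a minimum Hamming distance to every word kept so far. The constraint set here (balanced GC content, no identical adjacent bases, Hamming distance at least d) is an assumption chosen for illustration, and all names are hypothetical; metaheuristics such as LSTOA aim to find larger sets satisfying the same kind of constraints.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_ok(word: str) -> bool:
    """Assumed GC constraint: exactly half of the bases are G or C."""
    return 2 * sum(b in "GC" for b in word) == len(word)


def no_run(word: str) -> bool:
    """Assumed no-runlength constraint: no two identical adjacent bases."""
    return all(x != y for x, y in zip(word, word[1:]))


def greedy_code(n: int, d: int) -> list:
    """Greedy lexicographic construction of a constrained DNA coding set."""
    code = []
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        if gc_ok(w) and no_run(w) and all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code
```

Calling `greedy_code(6, 3)` starts its set with `ACACAC`, the first length-6 word in lexicographic order that satisfies both per-word constraints.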
Affiliation(s)
- Jianxia Zhang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang 453003, China
- School of Intelligent Engineering, Henan Institute of Technology, Xinxiang 453003, China

8
Jo S, Shin H, Joe SY, Baek D, Park C, Chun H. Recent progress in DNA data storage based on high-throughput DNA synthesis. Biomed Eng Lett 2024; 14:993-1009. [PMID: 39220021 PMCID: PMC11362454 DOI: 10.1007/s13534-024-00386-z] [Received: 02/21/2024] [Revised: 04/24/2024] [Accepted: 04/26/2024] [Indexed: 09/04/2024]
Abstract
DNA data storage has emerged as a solution for storing massive volumes of data by utilizing nucleic acids as a digital information medium. DNA offers exceptionally high storage density, long durability, and low maintenance costs compared to conventional storage media such as flash memory and hard disk drives. DNA data storage consists of the following steps: encoding, DNA synthesis (i.e., writing), preservation, retrieval, DNA sequencing (i.e., reading), and decoding. Of these steps, DNA synthesis presents a bottleneck due to imperfect coupling efficiency, low throughput, and excessive use of organic solvents. Overcoming these challenges is essential to establish DNA as a viable data storage medium. In this review, we outline the overall process of DNA data storage and present the recent progress in each step. Next, we provide a detailed overview of DNA synthesis methods with an emphasis on their limitations. Lastly, we discuss efforts to overcome the constraints of each method and their prospects.
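The effect of imperfect coupling efficiency compounds with oligo length. Assuming each of the n - 1 couplings for an n-nt oligo succeeds independently with probability p (a common approximation; some sources count n couplings), the full-length fraction is p^(n - 1). The helper names below are hypothetical.

```python
import math


def full_length_yield(coupling_efficiency: float, length_nt: int) -> float:
    """Fraction of full-length strands, assuming length_nt - 1 independent couplings
    (first base pre-attached) that each succeed with the same efficiency."""
    return coupling_efficiency ** (length_nt - 1)


def max_length_for_yield(coupling_efficiency: float, target_yield: float) -> int:
    """Longest oligo whose full-length yield is still >= target_yield."""
    return math.floor(math.log(target_yield) / math.log(coupling_efficiency)) + 1
```

Under these assumptions, 99% coupling efficiency leaves only about 13.5% of 200-nt strands full length, and the longest oligo with at least 50% full-length yield is 69 nt.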
Affiliation(s)
- Seokwoo Jo
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Haewon Shin
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Sung-yune Joe
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- David Baek
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Chaewon Park
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Honggu Chun
- Department of Biomedical Engineering, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea
- Interdisciplinary Program in Precision Public Health, Korea University, 466 Hana Science Hall, Seoul, 02841 Korea

9
Lin KN, Volkel K, Cao C, Hook PW, Polak RE, Clark AS, San Miguel A, Timp W, Tuck JM, Velev OD, Keung AJ. A primordial DNA store and compute engine. Nat Nanotechnol 2024. [PMID: 39174834 DOI: 10.1038/s41565-024-01771-6] [Received: 10/28/2023] [Accepted: 07/19/2024] [Indexed: 08/24/2024]
Abstract
Any modern information system is expected to feature a set of primordial features and functions: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof-of-principle realization of this set of primordial capabilities would advance the vision for their continued development. Here we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50-μm-diameter, highly porous, hierarchically branched, colloidal substrate particles composed of naturally abundant cellulose acetate. Their surface areas are over 200 cm² mg⁻¹, with binding capacities of over 10¹² DNA oligos mg⁻¹, 10 TB mg⁻¹ or 10⁴ TB cm⁻³. This 'dendricolloid' holds DNA files more stably than bare DNA, with an extrapolated ability to withstand over 170 cycles of lyophilization and rehydration compared with 60 for bare DNA. Accelerated ageing studies project half-lives of ~6,000 and 2 million years at 4 °C and -18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcription from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3 × 3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.
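Half-lives translate into expected surviving fractions under a first-order decay assumption, which accelerated-ageing extrapolations commonly rely on. The sketch plugs in the approximate half-lives reported above; the helper names are hypothetical, and the results are extrapolations rather than measurements over those time spans.

```python
import math

# Approximate half-lives reported in the abstract, in years.
HALF_LIFE_YEARS = {"4C": 6_000, "-18C": 2_000_000}


def fraction_remaining(years: float, half_life_years: float) -> float:
    """First-order decay: N(t) / N0 = 0.5 ** (t / t_half)."""
    return 0.5 ** (years / half_life_years)


def years_until(fraction: float, half_life_years: float) -> float:
    """Time for the intact fraction to fall to `fraction` under the same decay law."""
    return half_life_years * math.log(fraction) / math.log(0.5)
```

For instance, with a ~6,000-year half-life about 89% of the material would remain after 1,000 years at 4 °C.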
Affiliation(s)
- Kevin N Lin
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Kevin Volkel
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA
- Cyrus Cao
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Rachel E Polak
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Genetics Program, North Carolina State University, Raleigh, NC, USA
- Andrew S Clark
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Adriana San Miguel
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Genetics Program, North Carolina State University, Raleigh, NC, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- James M Tuck
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA
- Orlin D Velev
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Albert J Keung
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Genetics Program, North Carolina State University, Raleigh, NC, USA

10
Zhou Y, Bi K, Ge Q, Lu Z. Advances and Challenges in Random Access Techniques for In Vitro DNA Data Storage. ACS Appl Mater Interfaces 2024; 16:43102-43113. [PMID: 39110103 DOI: 10.1021/acsami.4c07235] [Indexed: 08/23/2024]
Abstract
With digital transformation and the widespread application of new technologies, data storage is facing new challenges from the demand for high-density storage of massive amounts of information. In response, DNA storage technology has emerged as a promising research direction. Efficient and reliable data retrieval is critical for DNA storage, and the development of random access technology plays a key role in its practicality and reliability. However, achieving fast and accurate random access has proven difficult for existing DNA storage efforts, which limits their practical application in industry. In this review, we summarize the recent advances in DNA storage technology that enable random access functionality, as well as the challenges that need to be overcome and the current solutions. This review aims to help researchers in the field of DNA storage better understand the importance of the random access step and its impact on the overall development of DNA storage. Furthermore, the remaining challenges and future research trends in random access technology for DNA storage are discussed, with the goal of providing a solid foundation for achieving random access in DNA storage under large-scale data conditions.
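One widely used random-access approach tags every strand of a file with a file-specific PCR primer pair, so that amplifying with that pair enriches just that file. The sketch below idealizes PCR as exact string matching on both flanks (real amplification tolerates some mismatches and produces off-target products); the primer sequences and function names are hypothetical, and it is not drawn from a specific method in the review.

```python
# Complement map for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_strand(fwd_primer: str, payload: str, rev_primer: str) -> str:
    """Flank a payload so that the primer pair (fwd, rev) would amplify it."""
    return fwd_primer + payload + reverse_complement(rev_primer)


def select_file(pool: list, fwd_primer: str, rev_primer: str) -> list:
    """Idealized in-silico 'PCR': exact-match both flanks, return stripped payloads."""
    tail = reverse_complement(rev_primer)
    return [s[len(fwd_primer):len(s) - len(tail)]
            for s in pool
            if len(s) >= len(fwd_primer) + len(tail)
            and s.startswith(fwd_primer) and s.endswith(tail)]
```

With two files in one pool, each tagged with its own primer pair, `select_file` returns only the payloads of the file whose primers are supplied.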
Affiliation(s)
- Ying Zhou
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Kun Bi
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Qinyu Ge
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Zuhong Lu
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China

11
Berezin CT, Peccoud S, Kar DM, Peccoud J. Cryptographic approaches to authenticating synthetic DNA sequences. Trends Biotechnol 2024; 42:1002-1016. [PMID: 38418329 PMCID: PMC11309913 DOI: 10.1016/j.tibtech.2024.02.002] [Received: 10/21/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/01/2024]
Abstract
In a bioeconomy that relies on synthetic DNA sequences, the ability to ensure their authenticity is critical. DNA watermarks can encode identifying data in short sequences and can be combined with error correction and encryption protocols to ensure that sequences are robust to errors and securely communicated. New digital signature techniques allow for public verification that a sequence has not been modified and can contain sufficient information for synthetic DNA to be self-documenting. In translating these techniques from bacteria to more complex genetically modified organisms (GMOs), special considerations must be made to allow for public verification of these products. We argue that these approaches should be widely implemented to assert authorship, increase the traceability, and detect the unauthorized use of synthetic DNA.
Affiliation(s)
- Casey-Tyler Berezin
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Samuel Peccoud
- Department of Electrical Engineering, Colorado State University, Fort Collins, CO, USA
- Diptendu M Kar
- Department of Computer Sciences, Northeastern University, Boston, MA, USA
- Jean Peccoud
- Department of Chemical & Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Department of Computer Sciences, Colorado State University, Fort Collins, CO, USA
- School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA
- Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA

12
Xu Y, Ding L, Wu S, Ruan J. Overcoming the High Error Rate of Composite DNA Letters-Based Digital Storage through Soft-Decision Decoding. Adv Sci (Weinh) 2024; 11:e2402951. [PMID: 38874370 PMCID: PMC11321706 DOI: 10.1002/advs.202402951] [Received: 03/21/2024] [Revised: 04/10/2024] [Indexed: 06/15/2024]
Abstract
Composite DNA letters, which mix all four DNA nucleotides in specified ratios, offer a pathway to substantially increase the logical density of DNA digital storage (DDS) systems. However, these letters are susceptible to nucleotide errors and sampling bias, leading to a high letter error rate, which complicates precise data retrieval and increases reading costs. To address this, Derrick-cp is introduced as an innovative soft-decision decoding algorithm tailored for DDS utilizing composite letters. Derrick-cp capitalizes on the distinct error sensitivities among letters to accurately predict and rectify letter errors, thus enhancing the error-correcting performance of Reed-Solomon codes beyond traditional hard-decision decoding limits. Through comparative analyses on an existing dataset and simulated experiments, Derrick-cp's superiority is validated, notably halving the sequencing depth requirement and cutting costs by up to 22% compared with conventional hard-decision strategies. This advancement signals Derrick-cp's significant role in elevating both the precision and cost-efficiency of composite letter-based DDS.
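"Soft" information for a composite letter can be pictured as a posterior over candidate letters given the base counts observed at one position across reads. The sketch assumes a hypothetical four-letter alphabet of 1:1 base mixtures, a multinomial read model with a small uniform error floor, and a uniform prior; Derrick-cp's actual error model and the way such reliabilities feed Reed-Solomon decoding are not reproduced here.

```python
import math

# Hypothetical composite alphabet: mixing ratios over the bases (A, C, G, T).
ALPHABET = {
    "M": (0.5, 0.5, 0.0, 0.0),
    "K": (0.0, 0.0, 0.5, 0.5),
    "R": (0.5, 0.0, 0.5, 0.0),
    "Y": (0.0, 0.5, 0.0, 0.5),
}
BASES = "ACGT"


def letter_posteriors(counts: dict, error_rate: float = 0.01) -> dict:
    """Posterior probability of each composite letter given observed base counts,
    under a multinomial model with uniform prior and a per-base error floor."""
    log_lik = {}
    for letter, ratios in ALPHABET.items():
        # Blend in uniform errors so bases absent from a mixture never get probability 0.
        probs = [(1 - error_rate) * r + error_rate / 4 for r in ratios]
        log_lik[letter] = sum(counts.get(b, 0) * math.log(p) for b, p in zip(BASES, probs))
    top = max(log_lik.values())
    weights = {k: math.exp(v - top) for k, v in log_lik.items()}  # shift by max for stability
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

With counts A=5, C=4, G=1 the letter M dominates, whereas A=2, C=1, G=1 leaves M and R roughly tied at about 0.5 each; a hard decision discards exactly that ambiguity, which soft-decision decoding can exploit.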
Affiliation(s)
- Yaping Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen 518120, P. R. China
- Lulu Ding
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen 518120, P. R. China
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, P. R. China
- Shigang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen 518120, P. R. China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 7 Pengfei Street, Dapeng New District, Shenzhen 518120, P. R. China
13
Yan L, Ge X. A Thermodynamic Study on Information Power in Communication Systems. Entropy (Basel) 2024; 26:650. [PMID: 39202120 PMCID: PMC11353885 DOI: 10.3390/e26080650] [Received: 06/20/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 09/03/2024]
Abstract
Modern information theory pioneered by Shannon provides the mathematical foundation of information transmission and compression. However, the physical (and especially the energetic) nature of the information has been elusive. While the processing of information incurs inevitable energy dissipation, it is possible for communication systems to harness information to perform useful work. In this article, we prove that the thermodynamic cost (that is, the entropy production of the communication system) is at least equal to the information transmitted. Based on this result, a model of a communication heat engine is proposed, which can extract work from the heat bath by utilizing the transmission of information. The communication heat engine integrates the manipulation of both energy and information so that both information and power may be transmitted in parallel. The information transmission rate and the information power of the communication heat engine are derived from a pure thermodynamics argument. We find that the information power of the communication heat engine can be increased by increasing the number of communication channels, but the absolute energy efficiency of the heat engine first increases and then decreases after the number of channels of the system exceeds a threshold. The proposed model and definitions provide a new way to think of a classical communication system from a thermodynamic perspective.
14
Dou C, Yang Y, Zhu F, Li B, Duan Y. Explorer: efficient DNA coding by De Bruijn graph toward arbitrary local and global biochemical constraints. Brief Bioinform 2024; 25:bbae363. [PMID: 39073829 DOI: 10.1093/bib/bbae363] [Received: 01/30/2024] [Revised: 06/25/2024] [Accepted: 07/13/2024] [Indexed: 07/30/2024]
Abstract
With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, owing to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and limited encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages the graph's capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by more than 10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
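Local constraints such as a maximum homopolymer length or a banned motif only inspect a short window of the sequence, which is why a De Bruijn graph (nodes are (k-1)-mers, edges are k-mers) can represent them. The Python sketch below is a generic illustration under that assumption, not Explorer itself: it keeps only edges whose k-mer satisfies the hypothetical constraints `max_run` and `banned`, then counts valid sequences by dynamic programming over the graph. Global constraints such as overall GC content are not handled here.

```python
from itertools import product

# Generic De Bruijn-style sketch (not Explorer); max_run and banned are hypothetical.
BASES = "ACGT"


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases in seq."""
    longest, run, prev = 0, 0, ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def locally_valid(window: str, max_run: int, banned: tuple) -> bool:
    """True if the window has no over-long homopolymer and no banned motif."""
    return max_homopolymer(window) <= max_run and not any(m in window for m in banned)


def count_valid_sequences(n: int, max_run: int, banned: tuple = ()) -> int:
    """Count length-n sequences whose every k-window satisfies the local constraints.

    Nodes are (k-1)-mers and edges are k-mers, as in a De Bruijn graph; k is
    chosen so that every constraint fits inside one k-mer window.
    """
    k = max([max_run + 1] + [len(m) for m in banned])
    if n < k - 1:
        # Too short to contain a full node: check every sequence directly.
        return sum(locally_valid("".join(p), max_run, banned)
                   for p in product(BASES, repeat=n))
    # counts[node] = number of valid prefixes ending in this (k-1)-mer.
    counts = {}
    for p in product(BASES, repeat=k - 1):
        node = "".join(p)
        if locally_valid(node, max_run, banned):
            counts[node] = 1
    # Extend one base at a time, following only allowed edges (valid k-mers).
    for _ in range(n - (k - 1)):
        nxt = {}
        for node, c in counts.items():
            for b in BASES:
                edge = node + b
                if locally_valid(edge, max_run, banned):
                    succ = edge[1:]
                    nxt[succ] = nxt.get(succ, 0) + c
        counts = nxt
    return sum(counts.values())


print(count_valid_sequences(3, 1))  # 36: no two adjacent bases may be equal
```

Because every length-k window is checked exactly once as an edge, the count matches a brute-force check of whole sequences while scaling linearly in n.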
Affiliation(s)
- Chang Dou
- Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
- Yijie Yang
- Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
- Fei Zhu
- Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
- BingZhi Li
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
- School of Chemical Engineering and Technology, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
- Yuping Duan
- Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China
15
Cao B, Wang K, Xie L, Zhang J, Zhao Y, Wang B, Zheng P. PELMI: Realize robust DNA image storage under general errors via parity encoding and local mean iteration. Brief Bioinform 2024; 25:bbae463. [PMID: 39288232 PMCID: PMC11407442 DOI: 10.1093/bib/bbae463] [Received: 05/16/2024] [Revised: 09/01/2024] [Accepted: 09/04/2024] [Indexed: 09/19/2024]
Abstract
DNA molecules as storage media are characterized by high encoding density and low energy consumption, making DNA storage a highly promising storage method. However, DNA storage has shortcomings, especially when storing multimedia data: image reconstruction fails when address errors occur, resulting in complete data loss. Therefore, we propose a parity encoding and local mean iteration (PELMI) scheme to achieve robust DNA storage of images. The proposed parity encoding scheme satisfies the common biochemical constraints on DNA sequences and limits undesired motif content. It addresses varying pixel weights at different positions in the binary data, thus optimizing the use of Reed-Solomon error correction. Then, for lost and erroneous sequences, data supplementation and local mean iteration are employed to enhance robustness. The encoding results show that the undesired motif content is reduced by 23%-50% compared with representative schemes, which improves sequence stability. PELMI achieves image reconstruction under general errors (insertion, deletion, substitution) and enhances DNA sequence quality. In particular, under a 1% error rate, compared with other advanced encoding schemes, the peak signal-to-noise ratio and the multiscale structure similarity address metric increased by 10%-13% and 46.8%-122%, respectively, and the mean squared error decreased by 113%-127%. This demonstrates that the reconstructed images had better clarity, fidelity, and similarity in structure, texture, and detail. In summary, PELMI ensures robustness and stability of image storage in DNA and achieves relatively high-quality image reconstruction under general errors.
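As a rough illustration of the "local mean" idea, the Python sketch below fills pixels lost to unrecoverable sequences with the average of their known 8-connected neighbors, repeating until every gap has been reached. This is a generic inpainting heuristic written for illustration; the abstract does not specify PELMI's neighborhood, weighting, or stopping rule, so those choices here are assumptions.

```python
# Generic local-mean inpainting; neighborhood and stopping rule are assumptions, not PELMI's.

def local_mean_fill(image, missing, max_iterations=10):
    """Fill missing pixels with the mean of their known 8-connected neighbors.

    image          -- grayscale image as a list of equal-length lists of numbers
    missing        -- set of (row, col) positions whose values were lost
    max_iterations -- upper bound on filling passes (hypothetical parameter)
    """
    img = [list(row) for row in image]  # work on a copy
    unknown = set(missing)
    height, width = len(img), len(img[0])
    for _ in range(max_iterations):
        if not unknown:
            break
        updates = {}
        for r, c in unknown:
            neighbors = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c) and (rr, cc) not in unknown
            ]
            if neighbors:  # only fill pixels that touch known data in this pass
                updates[(r, c)] = sum(neighbors) / len(neighbors)
        if not updates:
            break  # remaining gaps have no known neighbors at all
        for (r, c), value in updates.items():
            img[r][c] = value
        unknown -= set(updates)
    return img


row = [[10, 0, 0, 0, 20]]  # the three middle pixels were lost
print(local_mean_fill(row, {(0, 1), (0, 2), (0, 3)}))  # [[10, 10.0, 15.0, 20.0, 20]]
```

Gaps are filled from the outside in, so pixels deep inside a lost region inherit values that were themselves estimates; that is one reason a real scheme would pair this with error-correction-based data supplementation first.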
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning 116024, China
- Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Lei Xie
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Jianxia Zhang
- School of Intelligent Engineering, Henan Institute of Technology, No. 90, East Hualan Avenue, Hongqi District, Xinxiang, Henan 451191, China
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Pan Zheng
- Department of Accounting and Information Systems, University of Canterbury, Upper Riccarton, Christchurch 8140, New Zealand
16
Bian T, Pei Y, Gao S, Zhou S, Sun X, Dong M, Song J. Xeno Nucleic Acids as Functional Materials: From Biophysical Properties to Application. Adv Healthc Mater 2024:e2401207. [PMID: 39036821 DOI: 10.1002/adhm.202401207] [Received: 04/22/2024] [Revised: 06/14/2024] [Indexed: 07/23/2024]
Abstract
Xeno nucleic acids (XNAs) are artificial nucleic acids in which the chemical composition of the sugar moiety is changed. These modifications impart distinct physical and chemical properties to XNAs, altering their biological, chemical, and physical stability. These alterations also influence the binding dynamics of XNAs to their target molecules. Consequently, XNAs find expanded applications as functional materials in diverse fields. This review provides a comprehensive summary of the distinctive biophysical properties of various modified XNAs and explores their applications as innovative functional materials across these fields.
Affiliation(s)
- Tianyuan Bian
- Academy of Medical Engineering and Translational Medicine (AMT), Tianjin University, Tianjin, 300072, China
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
- Yufeng Pei
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
- Shitao Gao
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
- College of Materials Science and Engineering, Zhejiang University of Technology, Chaowang Road 18, Hangzhou, 310014, China
- Songtao Zhou
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
- Xinyu Sun
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
- Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230001, China
- Mingdong Dong
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus C, Aarhus, DK-8000, Denmark
- Jie Song
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310022, China
17
Kim JW, Jeong J, Kwak HY, No JS. Design of DNA Storage Coding Scheme With LDPC Codes and Interleaving. IEEE Trans Nanobioscience 2024; 23:447-457. [PMID: 38512749 DOI: 10.1109/tnb.2024.3379976] [Indexed: 03/23/2024]
Abstract
In this paper, we propose a new coding scheme for DNA storage using low-density parity-check (LDPC) codes and interleaving techniques. While conventional coding schemes generally employ error-correcting codes in both the inter- and intra-oligo directions, we show that inter-oligo LDPC codes, optimized by differential evolution, are sufficient to ensure the reliability of DNA storage owing to the powerful soft decoding of LDPC codes. In addition, we apply interleaving techniques to handle the non-uniform error characteristics of DNA storage and enhance decoding performance. Consequently, the proposed coding scheme reduces the number of oligo reads required for perfect recovery by 26.25%-38.5% compared to existing state-of-the-art coding schemes. Moreover, we develop an analytical DNA channel model in terms of non-uniform binary symmetric channels. This mathematical model allows us to demonstrate the superiority of the proposed coding scheme while isolating experimental variation, and to confirm the independent effects of LDPC codes and interleaving techniques.
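Interleaving spreads clustered or position-dependent errors over many codewords so that no single codeword receives more than its share. The abstract does not specify the interleaver used, so the Python sketch below shows a generic row-column block interleaver as an assumption: codewords are written as rows and transmitted column by column, and the hypothetical helper `errors_per_codeword` maps channel error positions back to the codeword they hit.

```python
# Generic row-column block interleaver (assumed; the paper's design is not given).

def interleave(symbols, rows, cols):
    """Write `rows` codewords of length `cols` row by row, read them out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("expected rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Invert `interleave`: channel position c * rows + r holds codeword r, symbol c."""
    if len(symbols) != rows * cols:
        raise ValueError("expected rows * cols symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


def errors_per_codeword(error_positions, rows, cols, interleaved):
    """Count how many channel errors land in each of the `rows` codewords."""
    counts = [0] * rows
    for p in error_positions:
        # Without interleaving, each codeword occupies consecutive channel positions.
        row = p % rows if interleaved else p // cols
        counts[row] += 1
    return counts


data = list(range(12))                  # 3 codewords of 4 symbols each
sent = interleave(data, rows=3, cols=4)
assert deinterleave(sent, 3, 4) == data
burst = [0, 1, 2]                       # three consecutive channel errors
print(errors_per_codeword(burst, 3, 4, interleaved=False))  # [3, 0, 0]
print(errors_per_codeword(burst, 3, 4, interleaved=True))   # [1, 1, 1]
```

The same spreading applies when errors cluster by position within oligos rather than in time: after interleaving, each codeword sees roughly the average error rate instead of the worst one.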
18
Wang Q, Zhang S, Li Y. Efficient DNA Coding Algorithm for Polymerase Chain Reaction Amplification Information Retrieval. Int J Mol Sci 2024; 25:6449. [PMID: 38928155 PMCID: PMC11204281 DOI: 10.3390/ijms25126449] [Received: 04/07/2024] [Revised: 06/02/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Polymerase Chain Reaction (PCR) amplification is widely used for retrieving information from DNA storage. During the PCR amplification process, nonspecific pairing between the 3' end of the primer and the DNA sequence can cause cross-talk in the amplification reaction, leading to the generation of interfering sequences and reduced amplification accuracy. To address this issue, we propose an efficient coding algorithm for PCR amplification information retrieval (ECA-PCRAIR). This algorithm employs variable-length scanning and pruning optimization to construct a codebook that maximizes storage density while satisfying traditional biological constraints. Subsequently, a codeword search tree is constructed based on the primer library to optimize the codebook, and a variable-length interleaver is used for constraint detection and correction, thereby minimizing the likelihood of nonspecific pairing. Experimental results demonstrate that ECA-PCRAIR can reduce the probability of nonspecific pairing between the 3' end of the primer and the DNA sequence to 2-25%, enhancing the robustness of the DNA sequences. Additionally, ECA-PCRAIR achieves a storage density of 2.14-3.67 bits per nucleotide (bits/nt), significantly improving storage capacity.
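The cross-talk described here arises when the 3'-terminal bases of a primer find a complementary site inside a payload sequence. A crude screen for this is sketched below in Python under simplifying assumptions (exact matching only, a hypothetical tail length `k`, and only one strand of the target, written 5' to 3'): it looks for the reverse complement of each primer's 3' tail in every candidate sequence. It is not the ECA-PCRAIR codeword search tree, only an illustration of the pairing that method tries to avoid.

```python
# Simplified exact-match screen; the tail length k is a hypothetical parameter.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_sites(primer: str, target: str, k: int = 6) -> list:
    """Start positions in `target` (5'->3') where the primer's last k bases could anneal.

    The primer's 3' tail pairs with the reverse complement of that tail, so we
    search for exact occurrences of it. Mismatch-tolerant pairing is ignored.
    """
    site = reverse_complement(primer[-k:])
    hits, start = [], target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)
    return hits


def screen(candidates, primers, k=6):
    """Keep only candidate sequences with no 3'-tail site for any primer."""
    return [s for s in candidates
            if not any(three_prime_sites(p, s, k) for p in primers)]


print(three_prime_sites("AAAACCGTT", "TTAACGTT", k=4))  # [2]
```

A realistic screen would also scan the opposite strand and allow a few mismatches, which is where a structured search over a primer library becomes worthwhile.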
Affiliation(s)
- Shufang Zhang
- School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China
19
Rasool A, Hong J, Hong Z, Li Y, Zou C, Chen H, Qu Q, Wang Y, Jiang Q, Huang X, Dai J. An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024:e2301585. [PMID: 38807543 DOI: 10.1002/smtd.202301585] [Received: 11/15/2023] [Revised: 03/29/2024] [Indexed: 05/30/2024]
Abstract
DNA-based data storage is a new technology in computational and synthetic biology that offers a solution for long-term, high-density data archiving. Given the critical importance of medical data in advancing human health, there is growing interest in developing an effective DNA-based medical data storage system. Data integrity, accuracy, reliability, and efficient retrieval are all significant concerns. Therefore, this study proposes an Effective DNA Storage (EDS) approach for archiving medical MRI data. The EDS approach incorporates three key components: (i) a novel fraction strategy to address the critical issue of rotating encoding, which often leads to data loss due to single-base error propagation; (ii) a novel rule-based quaternary transcoding method that satisfies bio-constraints and ensures reliable mapping; and (iii) an indexing technique designed to simplify random search and access. The effectiveness of this approach is validated through computer simulations and biological experiments, confirming its practicality. The EDS approach outperforms existing methods, providing superior control over bio-constraints and reducing computational time. The results and code provided in this study open new avenues for practical DNA storage of medical MRI data, offering promising prospects for the future of medical data archiving and retrieval.
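To see why rotating encoding propagates a single base error, consider the classic rotating ternary code in the style of Goldman et al., sketched below in Python with an assumed rotation table: each trit selects one of the three bases that differ from the previous base. Because decoding each base depends on its predecessor, one substitution corrupts two consecutive trits. This illustrates the problem the EDS fraction strategy targets, not the EDS method itself.

```python
# Assumed rotation table: for each previous base, trits 0, 1, 2 map to these bases.
ROTATION = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}


def encode(trits, start="A"):
    """Encode a list of trits (0, 1, 2) so that no base repeats its predecessor."""
    prev, bases = start, []
    for t in trits:
        base = ROTATION[prev][t]
        bases.append(base)
        prev = base
    return "".join(bases)


def decode(dna, start="A"):
    """Decode bases back to trits; a base equal to its predecessor decodes to None."""
    prev, trits = start, []
    for base in dna:
        choices = ROTATION[prev]
        trits.append(choices.index(base) if base in choices else None)
        prev = base  # the (possibly wrong) base becomes context for the next one
    return trits


message = [0, 1, 2, 0, 1]
dna = encode(message)                   # "CTGTC"
corrupted = dna[:2] + "A" + dna[3:]     # one substitution at position 2
decoded = decode(corrupted)
print([i for i, (a, b) in enumerate(zip(message, decoded)) if a != b])  # [2, 3]
```

The substituted base is both misread itself and used as the wrong context for its successor, which is exactly the single-base error propagation the abstract refers to.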
Affiliation(s)
- Abdur Rasool
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
- Jingwei Hong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Zhiling Hong
- Quanzhou Development Group Co., Ltd, Quanzhou, 362000, China
- Yuanzhen Li
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Chao Zou
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, China
- Qiang Qu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yang Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qingshan Jiang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Xiaoluo Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen, 518055, China
- Junbiao Dai
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518055, China
20
Welzel M, Dreßler H, Heider D. Turbo autoencoders for the DNA data storage channel with Autoturbo-DNA. iScience 2024; 27:109575. [PMID: 38638577 PMCID: PMC11024904 DOI: 10.1016/j.isci.2024.109575] [Received: 10/09/2023] [Revised: 01/04/2024] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
DNA, with its high storage density and long-term stability, is a potential candidate for a next-generation storage device. The DNA data storage channel, composed of synthesis, amplification, storage, and sequencing, exhibits error probabilities and error profiles specific to the components of the channel. Here, we present Autoturbo-DNA, a PyTorch framework for training error-correcting, overcomplete autoencoders specifically tailored for the DNA data storage channel. It allows training different architecture combinations and using a wide variety of channel component models for noise generation during training. It further supports training the encoder to generate DNA sequences that adhere to user-defined constraints. Autoturbo-DNA exhibits error-correction capabilities close to non-neural-network state-of-the-art error correction and constrained codes for DNA data storage. Our results indicate that neural-network-based codes can be a viable alternative to traditionally designed codes for the DNA data storage channel.
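Training a code for the DNA data storage channel requires a noise model that produces substitutions, insertions, and deletions. The Python sketch below is a deliberately simple, position-independent IDS channel with hypothetical per-base rates `p_sub`, `p_ins`, and `p_del`; it is not Autoturbo-DNA's API, whose channel component models (synthesis, amplification, storage, sequencing) are richer and error-profile specific.

```python
import random

# Simple i.i.d. IDS noise model; not Autoturbo-DNA's channel components.
BASES = "ACGT"


def ids_channel(seq: str, p_sub: float, p_ins: float, p_del: float,
                rng: random.Random) -> str:
    """Pass a DNA string through an i.i.d. insertion/deletion/substitution channel.

    Each base is deleted with probability p_del, otherwise substituted with
    probability p_sub (by a different base, chosen uniformly), otherwise kept.
    After each input base, a random base is inserted with probability p_ins.
    """
    if p_sub + p_del > 1:
        raise ValueError("p_sub + p_del must not exceed 1")
    out = []
    for base in seq:
        r = rng.random()
        if r < p_del:
            pass  # deletion: emit nothing for this base
        elif r < p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
    return "".join(out)


rng = random.Random(42)  # seeded for reproducibility
print(ids_channel("ACGTACGTACGT", p_sub=0.05, p_ins=0.02, p_del=0.02, rng=rng))
```

Because insertions and deletions shift every later position, even this toy channel shows why DNA codes cannot be evaluated with substitution-only models.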
Affiliation(s)
- Marius Welzel
- Department of Mathematics and Computer Science, University of Marburg, 35043 Marburg, Hesse, Germany
- Hagen Dreßler
- Department of Sustainable Systems Engineering, University of Freiburg, Fahnenbergplatz, 79085 Freiburg im Breisgau, Baden-Württemberg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, University of Marburg, 35043 Marburg, Hesse, Germany
21
Yang B, Cui T, Guo L, Dong L, Wu J, Xing Y, Xu Y, Chen J, Wang Y, Cui Z, Dong Y. Advanced Smart Biomaterials for Regenerative Medicine and Drug Delivery Based on Phosphoramidite Chemistry: From Oligonucleotides to Precision Polymers. Biomacromolecules 2024; 25:2701-2714. [PMID: 38608139 DOI: 10.1021/acs.biomac.4c00259] [Indexed: 04/14/2024]
Abstract
Over decades of development, phosphoramidite chemistry has become known as the leading method for the commercial synthesis of oligonucleotides, and it has also revolutionized the fabrication of sequence-defined polymers (SDPs), offering novel functional materials in polymer science and clinical medicine. This review introduces the evolution of phosphoramidite chemistry, emphasizing its development from the synthesis of oligonucleotides to the creation of universal SDPs, which have unlocked the potential for designing programmable smart biomaterials with applications in diverse areas including data storage, regenerative medicine, and drug delivery. The key methodologies, functions, biomedical applications, and future challenges of SDPs are also summarized, underscoring the significance of breakthroughs in precisely synthesized materials.
Affiliation(s)
- Bo Yang
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Ting Cui
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Liang Guo
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Lianqiang Dong
- CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jun Wu
- CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yongzheng Xing
- National Engineering Research Center for Colloidal Materials, School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100, China
- Yun Xu
- Center for Medical Device Evaluation, China Food and Drug Administration (CFDA), Beijing 100084, China
- Jian Chen
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Yufei Wang
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Zhonghui Cui
- Sinopec (Beijing) Research Institute of Chemical Industry Co., Ltd., Beijing 100013, P. R. China
- Yuanchen Dong
- CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, China
22
Callisto A, Strutz J, Leeper K, Kalhor R, Church G, Tyo KE, Bhan N. Post-translation digital data encoding into the genomes of mammalian cell populations. bioRxiv 2024:2024.05.12.591851. [PMID: 38765976 PMCID: PMC11100781 DOI: 10.1101/2024.05.12.591851] [Indexed: 05/22/2024]
Abstract
High-resolution cellular signal encoding is critical for better understanding complex biological phenomena. DNA-based biosignal encoders alter genomic or plasmid DNA in a signal-dependent manner. Current approaches involve the signal of interest effecting a DNA edit by interacting with a signal-specific promoter, which then results in expression of the effector molecule (a DNA-altering enzyme). Here, we present a proof of concept of a biosignal encoding system in which the enzyme terminal deoxynucleotidyl transferase (TdT) acts as the effector molecule upon directly interacting with the signal of interest. TdT, a template-independent DNA polymerase (DNAp), incorporates nucleotides at the 3'-OH ends of a DNA substrate in a signal-dependent manner. By employing CRISPR-Cas9 to create double-stranded breaks in genomic DNA, we make 3'-OH ends available to act as substrates for TdT. We show that this system can successfully resolve and encode different concentrations of various biosignals into the genomic DNA of HEK-293T cells. Finally, we develop a simple encoding scheme associated with the tested biosignals and encode the message "HELLO WORLD" into the genomic DNA of HEK-293T cells at the population level with 91% accuracy. This work demonstrates a simple and engineerable system that can reliably store local biosignal information in the genomes of mammalian cell populations.
Affiliation(s)
- Alec Callisto
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Jonathan Strutz
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Kathleen Leeper
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Reza Kalhor
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- George Church
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Keith E.J. Tyo
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Namita Bhan
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Biomedical Research at Novartis, Cambridge, MA, USA
23
Gumus S, Biechele-Speziale D, Manz KE, Pennell KD, Rubenstein BM, Rosenstein JK. Repurposing Waste Chemicals for Sustainable and Durable Molecular Data Storage. ACS Omega 2024; 9:19904-19910. [PMID: 38737050 PMCID: PMC11079871 DOI: 10.1021/acsomega.3c09234] [Received: 11/19/2023] [Revised: 03/31/2024] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Molecular data storage offers the intriguing possibility of higher theoretical density and longer lifetimes than today's electronic memory devices. Some demonstrations have used deoxyribonucleic acid (DNA), but bottlenecks in nucleic acid synthesis continue to make DNA data storage orders of magnitude more expensive than electronic storage media. Additionally, despite its potential for long-term storage, DNA faces durability challenges from environmental degradation. In this work, we demonstrate nongenomic molecular data storage using molecular libraries redirected from chemical waste streams. This approach requires no synthetic effort and can be implemented using molecules with minimal associated cost. While the technique is agnostic about the exact molecular content of its inputs, we confirmed that some sources contained per- and polyfluoroalkyl substances (PFAS), which persist for long periods in the natural environment and could offer extremely durable information storage as well as environmental benefits. These demonstrations provide a perspective on some of the valuable possibilities for nongenomic molecular information systems.
Affiliation(s)
- Katherine E. Manz
- Brown University, Providence, Rhode Island 02912, United States
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Kurt D. Pennell
- Brown University, Providence, Rhode Island 02912, United States
24
Yu M, Tang X, Li Z, Wang W, Wang S, Li M, Yu Q, Xie S, Zuo X, Chen C. High-throughput DNA synthesis for data storage. Chem Soc Rev 2024; 53:4463-4489. [PMID: 38498347 DOI: 10.1039/d3cs00469d] [Indexed: 03/20/2024]
Abstract
With the explosion of the digital world, the rapidly increasing global data volume is expected to reach 175 ZB (1 ZB = 10¹² GB) in 2025. Storing such a huge amount of data would consume tons of resources. Fortunately, the deoxyribonucleic acid (DNA) molecule has been found to be the most compact and durable information storage medium known so far. Its high coding density and long-term preservation properties make it one of the best data storage carriers for the future. High-throughput DNA synthesis is a key technology for "DNA data storage", which encodes a binary data stream (0/1) into long quaternary DNA sequences consisting of four bases (A/G/C/T). In this review, the workflow of DNA data storage and the basic methods of artificial DNA synthesis technology are outlined first. Then, the technical characteristics of different synthesis methods and the state of the art of representative commercial companies are presented, with a primary focus on silicon chip microarray-based synthesis and novel enzymatic DNA synthesis. Finally, the recent status of DNA storage and new opportunities for future development in the field of high-throughput, large-scale DNA synthesis technology are summarized.
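The binary-to-quaternary step mentioned here can be as simple as mapping each pair of bits to one base, giving 2 bits per nucleotide. The Python sketch below uses one assumed mapping (00 to A, 01 to C, 10 to G, 11 to T); practical schemes add biochemical constraint handling, addressing, and error correction on top, all of which this sketch omits.

```python
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping
BASE_TO_BIT_PAIR = {base: bits for bits, base in BIT_PAIR_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as DNA at 2 bits per base (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back to bytes."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BIT_PAIR[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bytes_to_dna(b"Hi")
print(encoded)  # CAGACGGC
assert dna_to_bytes(encoded) == b"Hi"
```

At this density, 1 GB (8 × 10⁹ bits) corresponds to 4 × 10⁹ nucleotides before any addressing or redundancy overhead.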
Affiliation(s)
- Meng Yu
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaohui Tang
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Zhenhua Li
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Weidong Wang
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Shaopeng Wang
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Min Li
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Qiuliyang Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
- Sijia Xie
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- Xiaolei Zuo
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Chang Chen
- Institute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- School of Microelectronics, Shanghai University, 201800, Shanghai, China
- Shanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050, Shanghai, China
25
Ben Shabat D, Hadad A, Boruchovsky A, Yaakobi E. GradHC: highly reliable gradual hash-based clustering for DNA storage systems. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae274. [PMID: 38648049 DOI: 10.1093/bioinformatics/btae274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/27/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024]
Abstract
MOTIVATION As data storage challenges grow and existing technologies approach their limits, synthetic DNA emerges as a promising storage solution due to its remarkable density and durability. While cost remains a concern, emerging sequencing and synthesis technologies aim to mitigate it, yet they introduce challenges such as errors in the storage and retrieval process. One crucial task in a DNA storage system is clustering numerous DNA reads into groups that represent the original input strands. RESULTS In this paper, we review different methods for evaluating clustering algorithms and introduce a novel clustering algorithm for DNA storage systems, named Gradual Hash-based clustering (GradHC). The primary strength of GradHC lies in its ability to cluster, with excellent accuracy, various types of designs, including varying strand lengths, cluster sizes (including extremely small clusters), and different error ranges. Benchmark analysis demonstrates that GradHC is significantly more stable and robust than other clustering algorithms previously proposed for DNA storage, while also producing highly reliable clustering results. AVAILABILITY AND IMPLEMENTATION https://github.com/bensdvir/GradHC.
Affiliation(s)
- Dvir Ben Shabat
- Department of Computer Science, Technion, Haifa 320003, Israel
- Adar Hadad
- Department of Computer Science, Technion, Haifa 320003, Israel
- Eitan Yaakobi
- Department of Computer Science, Technion, Haifa 320003, Israel
26
Onoe J, Noda Y, Wang Q, Harano K, Nakaya M, Nakayama T. Structures, fundamental properties, and potential applications of low-dimensional C60 polymers and other nanocarbons: a review. SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS 2024; 25:2346068. [PMID: 38774495 PMCID: PMC11107862 DOI: 10.1080/14686996.2024.2346068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 04/17/2024] [Indexed: 05/24/2024]
Abstract
Because the carbon (C) atom can form a variety of chemical bonds through hybridization of its s and p atomic orbitals, a wide range of robust carbon materials exists. In particular, the discovery of C60 was an epoch-making event that opened up the field of nanocarbons. Since then, nanocarbon materials such as nanotubes and graphene have been reported. Notably, C60 is soluble and volatile, unlike nanotubes and graphene. This means that C60 films can easily be produced on any kind of substrate, which is an advantage for device fabrication. In particular, electron- and photo-induced C60 polymerization result in the formation of one-dimensional (1D) metallic peanut-shaped and 2D semiconducting dumbbell-shaped C60 polymers, respectively. This enables control of the physicochemical properties of C60 films using electron- and photo-lithography techniques. In this review, we focus on the structures, fundamental properties, and potential applications of low-dimensional C60 polymers and other nanocarbons such as C60 peapods, wavy-structured graphene, and penta-nanotubes with topological defects. We hope this review will provide new insights for producing novel nanocarbon materials and inspire a broad readership to pursue further research in carbon materials.
Affiliation(s)
- Jun Onoe
- Department of Energy Science and Engineering, Nagoya University, Nagoya, Japan
- Yusuke Noda
- Department of Information and Communication Engineering, Okayama Prefectural University, Soja, Japan
- Qian Wang
- School of Materials Science and Engineering/Center for Applied Physics and Technology, Peking University, Beijing, China
- Koji Harano
- Center for Basic Research on Materials, and Division of International Collaborations and Public Relations, National Institute for Materials Science (NIMS), Tsukuba, Japan
- Masato Nakaya
- Department of Energy Science and Engineering, Nagoya University, Nagoya, Japan
- Tomonobu Nakayama
- Center for Basic Research on Materials, and Division of International Collaborations and Public Relations, National Institute for Materials Science (NIMS), Tsukuba, Japan
27
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024]
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
28
Luescher AM, Gimpel AL, Stark WJ, Heckel R, Grass RN. Chemical unclonable functions based on operable random DNA pools. Nat Commun 2024; 15:2955. [PMID: 38580696 PMCID: PMC10997750 DOI: 10.1038/s41467-024-47187-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/25/2024] [Indexed: 04/07/2024]
Abstract
Physical unclonable functions (PUFs) based on unique tokens generated by random manufacturing processes have been proposed as an alternative to mathematical one-way algorithms. However, these tokens are not distributable, which is a disadvantage for decentralized applications. Finding unclonable, yet distributable functions would help bridge this gap and expand the applications of object-bound cryptography. Here we show that large random DNA pools with a segmented structure of alternating constant and randomly generated portions are able to calculate distinct outputs from millions of inputs in a specific and reproducible manner, in analogy to physical unclonable functions. Our experimental data with pools comprising up to >10^10 unique sequences and encompassing >750 comparisons of resulting outputs demonstrate that the proposed chemical unclonable function (CUF) system is robust, distributable, and scalable. Based on this proof of concept, CUF-based anti-counterfeiting systems, non-fungible objects, and decentralized multi-user authentication are conceivable.
Affiliation(s)
- Anne M Luescher
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Andreas L Gimpel
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Wendelin J Stark
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Reinhard Heckel
- Department of Computer Engineering, Technical University of Munich, Arcisstrasse 21, 80333, Munich, Germany
- Robert N Grass
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
29
Preuss I, Rosenberg M, Yakhini Z, Anavy L. Efficient DNA-based data storage using shortmer combinatorial encoding. Sci Rep 2024; 14:7731. [PMID: 38565928 PMCID: PMC11369284 DOI: 10.1038/s41598-024-58386-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 03/28/2024] [Indexed: 04/04/2024]
Abstract
Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach is the noisy inference process, which hinders the use of large composite alphabets. This paper introduces a novel approach for DNA-based data storage, offering, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties, including information density, reconstruction probabilities, and required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. Our simulations show, for example, that 2D Reed-Solomon error correction significantly improves reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed successful reconstruction and established the robustness of our approach for different error types. Subsampling experiments supported the important role of sampling rate and its effect on overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and describes some theoretical research questions and technical challenges.
Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions.
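Under the combinatorial scheme summarized above, each letter is a subset of shortmers, so the alphabet size is a binomial coefficient. The sketch below assumes every letter is a subset of exactly K shortmers drawn from N distinct ones (the paper defines several schemes, which may differ from this one) and reports how many whole bits of binary data one letter can carry. The function name and the values N = 16, K = 8 used in the note are illustrative assumptions, not parameters from the paper.

```python
import math


def combinatorial_alphabet(n_shortmers: int, subset_size: int) -> tuple[int, int]:
    """Return (alphabet size, whole bits per letter) when every letter is a
    subset of exactly `subset_size` shortmers out of `n_shortmers`."""
    size = math.comb(n_shortmers, subset_size)
    if size == 0:
        raise ValueError("subset_size must not exceed n_shortmers")
    # Binary data can only use a power-of-two share of the letters,
    # so one letter carries floor(log2(size)) bits.
    bits = size.bit_length() - 1
    return size, bits
```

With N = 16 and K = 8 the alphabet has C(16, 8) = 12870 letters, enough for 13 bits per letter; taking N = 4 single bases with K = 1 recovers the familiar 2 bits per position of standard DNA.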
Affiliation(s)
- Inbal Preuss
- School of Computer Science, Reichman University, 4610101, Herzliya, Israel
- Faculty of Computer Science, Technion, 3200003, Haifa, Israel
- Michael Rosenberg
- Institute of Nanotechnology and Advanced Materials, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, 5290002, Ramat Gan, Israel
- Zohar Yakhini
- School of Computer Science, Reichman University, 4610101, Herzliya, Israel
- Faculty of Computer Science, Technion, 3200003, Haifa, Israel
- Leon Anavy
- School of Computer Science, Reichman University, 4610101, Herzliya, Israel
- Faculty of Computer Science, Technion, 3200003, Haifa, Israel
30
Hou Z, Qiang W, Wang X, Chen X, Hu X, Han X, Shen W, Zhang B, Xing P, Shi W, Dai J, Huang X, Zhao G. "Cell Disk" DNA Storage System Capable of Random Reading and Rewriting. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305921. [PMID: 38332565 PMCID: PMC11022697 DOI: 10.1002/advs.202305921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 11/23/2023] [Indexed: 02/10/2024]
Abstract
DNA has emerged as an appealing material for information storage due to its high storage density and durability. Random reading and rewriting are essential tasks for practical large-scale data storage. However, they are currently difficult to implement simultaneously in a single DNA-based storage system, which strongly limits practicability. Here, a "Cell Disk" storage system is presented, achieving high-density in vivo DNA data storage that enables both random reading and rewriting. In this system, each yeast cell is used as a chamber to store information, similar to a "disk block" but with the ability to self-replicate. Specifically, a customized CRISPR/Cas9-based "lock-and-key" module is inserted into the genome of each yeast cell, allowing selective retrieval, erasure, or rewriting of the targeted cell "block" from a pool of cells (the "disk"). Additionally, a codec algorithm with lossless compression ability is developed to improve the information density of each cell "block". As a proof of concept, target-specific reading and rewriting of compressed data from a mimic cell "disk" comprising up to 10^5 "blocks" are demonstrated with high specificity and reliability. The "Cell Disk" system described here concurrently supports random reading and rewriting, and it should offer great scalability for practical data storage.
Affiliation(s)
- Zhaohua Hou
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wei Qiang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
- Xiangxiang Wang
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xiaoxu Chen
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xin Hu
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Xuye Han
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wenlu Shen
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Bing Zhang
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Peng Xing
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Wenping Shi
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
- Junbiao Dai
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, P. R. China
- Xiaoluo Huang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P. R. China
- Guanghou Zhao
- School of Ecology and Environment, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, Shaanxi 710129, P. R. China
31
Huang X, Cui J, Qiang W, Ye J, Wang Y, Xie X, Li Y, Dai J. Storage-D: A user-friendly platform that enables practical and personalized DNA data storage. IMETA 2024; 3:e168. [PMID: 38882485 PMCID: PMC11170965 DOI: 10.1002/imt2.168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 10/30/2023] [Accepted: 11/14/2023] [Indexed: 06/18/2024]
Abstract
Deoxyribonucleic acid (DNA) has been suggested as a very promising medium for data storage in recent years. Although numerous studies have advocated for DNA data storage, its practical application remains unclear and user-oriented platforms are lacking. Here, we developed a DNA data storage platform, named Storage-D, which allows users to convert their data into DNA sequences of any length and vice versa by selecting algorithms and error-correction, random-access, and codec pin strategies of their own choice. It incorporates a newly designed "Wukong" algorithm, which provides over 20 trillion codec pins for data privacy. This algorithm can also control GC content to a selected standard and limit homopolymer run length to a defined level while maintaining a high coding potential of ~1.98 bits/nt, allowing it to outperform previous algorithms. By connecting "Storage-D" to a commercial DNA synthesis and sequencing platform, we successfully stored the "Diagnosis and treatment protocol for COVID-19 patients" in 200 nt oligo pools in vitro and in 500 bp genes in vivo, which replicated in both normal and extreme bacteria. Together, this platform allows for practical and personalized DNA data storage, potentially with a wide range of applications.
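Storage-D's "Wukong" algorithm is described as controlling GC content and homopolymer run length. The Python sketch below only checks a candidate sequence against such constraints; it does not reproduce Wukong's encoding, and the thresholds (GC fraction between 0.4 and 0.6, runs of at most 3 identical bases) and function names are illustrative assumptions. It also assumes a non-empty sequence.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (sequence assumed non-empty)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))


def satisfies_constraints(
    seq: str, gc_range: tuple[float, float] = (0.4, 0.6), max_run: int = 3
) -> bool:
    """Check a sequence against hypothetical GC and homopolymer limits."""
    low, high = gc_range
    return low <= gc_content(seq) <= high and max_homopolymer(seq) <= max_run
```

For instance, `ACGTACGT` passes (GC fraction 0.5, longest run 1), while `AAAAGC` fails on both counts.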
Affiliation(s)
- Xiaoluo Huang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Junting Cui
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Wei Qiang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jianwen Ye
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Yu Wang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Xinying Xie
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Yuanzhen Li
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Junbiao Dai
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
32
Zhang J, Hou C, Liu C. CRISPR-powered quantitative keyword search engine in DNA data storage. Nat Commun 2024; 15:2376. [PMID: 38491032 PMCID: PMC10943086 DOI: 10.1038/s41467-024-46767-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024]
Abstract
Despite the growing interest in archiving information in synthetic DNA to confront the data explosion, quantitatively querying data stored in DNA is still a challenge. Herein, we present Search Enabled by Enzymatic Keyword Recognition (SEEKER), which uses CRISPR-Cas12a to rapidly generate visible fluorescence when a DNA target corresponding to the keyword of interest is present. SEEKER achieves quantitative text searching because the growth rate of fluorescence intensity is proportional to keyword frequency. Compatible with SEEKER, we develop non-collision grouping coding, which reduces the size of the dictionary and enables lossless compression without disrupting the original order of texts. Using four queries, we correctly identify keywords in 40 files against a background of ~8000 irrelevant terms. Parallel searching with SEEKER can be performed on a 3D-printed microfluidic chip. Overall, SEEKER provides a quantitative approach to parallel searching over the complete content stored in DNA, with simple implementation and rapid result generation.
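Because SEEKER's quantification rests on the fluorescence growth rate being proportional to keyword frequency, a frequency estimate can be sketched as a least-squares slope followed by a proportional rescaling. The one-point calibration against a reference keyword of known frequency, the function names, and the readings in the note are hypothetical assumptions rather than the paper's procedure, and the fit assumes readings taken within the linear growth regime.

```python
def growth_rate(times: list[float], signal: list[float]) -> float:
    """Least-squares slope of fluorescence intensity versus time."""
    if len(times) != len(signal) or len(set(times)) < 2:
        raise ValueError("need matching lists with at least two distinct times")
    n = len(times)
    t_mean = sum(times) / n
    s_mean = sum(signal) / n
    num = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, signal))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def estimate_frequency(rate: float, ref_rate: float, ref_frequency: float) -> float:
    """Convert a growth rate to a keyword frequency, assuming the two are
    proportional and calibrated by one reference of known frequency."""
    return ref_frequency * rate / ref_rate
```

With hypothetical readings `[0, 2, 4, 6]` at times `[0, 1, 2, 3]`, `growth_rate` returns 2.0; if that is the reference for a keyword occurring once, a measured rate of 6.0 maps to an estimated frequency of 3.0.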
Affiliation(s)
- Jiongyu Zhang
- Department of Biomedical Engineering, University of Connecticut Health Center, Farmington, CT, 06030, USA
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT, 06269, USA
- Chengyu Hou
- Department of Biomedical Engineering, University of Connecticut Health Center, Farmington, CT, 06030, USA
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT, 06269, USA
- Changchun Liu
- Department of Biomedical Engineering, University of Connecticut Health Center, Farmington, CT, 06030, USA
33
Nerantzaki M, Husser C, Ryckelynck M, Lutz JF. Exchanging and Releasing Information in Synthetic Digital Polymers Using a Strand-Displacement Strategy. J Am Chem Soc 2024; 146:6456-6460. [PMID: 38286022 DOI: 10.1021/jacs.3c13953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Toehold-mediated strand displacement (TMSD) was tested as a tool to edit information in synthetic digital polymers. Uniform DNA-polymer biohybrid macromolecules were first synthesized by automated phosphoramidite chemistry and characterized by HPLC, mass spectrometry, and polyacrylamide gel electrophoresis (PAGE). These precursors were diblock structures containing a synthetic poly(phosphodiester) (PPDE) segment covalently attached to a single-stranded DNA sequence. Three types of biohybrids were prepared herein: a substrate containing an accessible toehold as well as input and output macromolecules. The substrate and the input macromolecules contained noncoded PPDE homopolymers, whereas the output macromolecule contained a digitally encoded segment. After hybridization of the substrate with the output, incubation in the presence of the input led to efficient TMSD and the release of the digital segment. TMSD can therefore be used to erase or rewrite information in self-assembled biohybrid superstructures. Furthermore, it was found in this work that the conjugation of DNA single strands to synthetic segments of chosen lengths greatly facilitates the characterization and PAGE visualization of the TMSD process.
Affiliation(s)
- Maria Nerantzaki
- Université de Strasbourg, CNRS, ISIS, 8 allée Gaspard Monge, 67000 Strasbourg, France
- Claire Husser
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, UPR 9002, 2 allée Konrad Roentgen, 67084 Strasbourg, France
- Michael Ryckelynck
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, UPR 9002, 2 allée Konrad Roentgen, 67084 Strasbourg, France
- Jean-François Lutz
- Université de Strasbourg, CNRS, ISIS, 8 allée Gaspard Monge, 67000 Strasbourg, France
34
Wang K, Cao B, Ma T, Zhao Y, Zheng Y, Wang B, Zhou S, Zhang Q. Storing Images in DNA via base128 Encoding. J Chem Inf Model 2024; 64:1719-1729. [PMID: 38385334 DOI: 10.1021/acs.jcim.3c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Current DNA storage schemes lack flexibility and consistency in processing highly redundant and correlated image data, resulting in low sequence stability and low image reconstruction rates. Therefore, according to the characteristics of image storage, this paper proposes storing images in DNA via base128 encoding (DNA-base128). In the data writing stage, data segmentation and probability statistics are carried out, and the data block frequencies are then associated with a constraint encoding set to perform the encoding. When the image needs to be recovered, DNA-base128 completes internal error correction by threshold setting and drift comparison. Compared with representative work, the DNA-base128 encoding results show that undesired motifs were reduced by 71.2-90.7% and that the local guanine-cytosine content variance was reduced threefold, indicating that DNA-base128 can store images more stably. In addition, the structural similarity index (SSIM) and multiscale structural similarity (MS-SSIM) of image reconstruction using DNA-base128 were improved by 19-102% and 6.6-20.3%, respectively. In summary, DNA-base128 provides image encoding with internal error correction and offers a potential solution for DNA image storage. The data and code are available at the GitHub repository: https://github.com/123456wk/DNA_base128.
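The local GC-content variance reported for DNA-base128 can be illustrated with a sliding-window calculation. In this sketch the window length, the use of population variance, and the function name are assumptions; the paper's exact definition of the metric may differ.

```python
from statistics import pvariance


def local_gc_variance(seq: str, window: int) -> float:
    """Population variance of GC fraction over all sliding windows of `window` bases."""
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # GC fraction of every contiguous window, stepping one base at a time.
    fractions = [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
    return pvariance(fractions)
```

For example, `GGGGAAAA` with a window of 4 gives window GC fractions 1.0, 0.75, 0.5, 0.25, and 0.0, so the variance is 0.125, whereas `ACGTACGT` has a GC fraction of 0.5 in every window and a variance of 0.0.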
Affiliation(s)
- Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Ma
- Brain Function Research Section, China Medical University, Shenyang 110001, China
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
35
Zheng Y, Cao B, Zhang X, Cui S, Wang B, Zhang Q. DNA-QLC: an efficient and reliable image encoding scheme for DNA storage. BMC Genomics 2024; 25:266. [PMID: 38461245 PMCID: PMC10925009 DOI: 10.1186/s12864-024-10178-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND DNA storage has the advantages of large capacity, long-term stability, and low power consumption relative to other storage media, making it a promising new storage medium for multimedia information such as images. However, DNA storage has a low coding density and weak error correction ability. RESULTS To achieve more efficient DNA storage image reconstruction, we propose DNA-QLC (QRes-VAE and Levenshtein code (LC)), which uses the quantized ResNet VAE (QRes-VAE) model and LC for image compression and DNA sequence error correction, thus improving both the coding density and error correction ability. Experimental results show that the DNA-QLC encoding method not only obtains DNA sequences that meet the combinatorial constraints, but also achieves a net information density 2.4 times higher than that of DNA Fountain. Furthermore, at a higher error rate (2%), DNA-QLC achieved image reconstruction with an SSIM value of 0.917. CONCLUSIONS The results indicate that the DNA-QLC encoding scheme guarantees the efficiency and reliability of the DNA storage system and improves the application potential of DNA storage for multimedia information such as images.
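The abstract does not say which Levenshtein code DNA-QLC uses. A classic member of that family is the binary Varshamov-Tenengolts (VT) code: a length-n word x belongs to VT_a(n) when the sum of i * x_i for i = 1..n is congruent to a modulo n + 1, and every such code corrects any single deletion. The sketch below demonstrates that property with a brute-force decoder. It illustrates the code family only, not DNA-QLC's implementation, and the function names are hypothetical.

```python
def vt_syndrome(bits: list[int]) -> int:
    """Varshamov-Tenengolts syndrome: sum of i * x_i (1-indexed) modulo n + 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_decode_single_deletion(received: list[int], n: int, a: int = 0) -> list[int]:
    """Recover a codeword of VT_a(n) from a copy that lost exactly one bit.

    Brute force: try every one-bit insertion and keep a candidate whose
    syndrome equals `a`. Because VT_a(n) corrects any single deletion,
    every matching candidate is the same codeword.
    """
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted bit")
    for pos in range(n):
        for bit in (0, 1):
            candidate = received[:pos] + [bit] + received[pos:]
            if vt_syndrome(candidate) == a:
                return candidate
    raise ValueError("not a single-deletion copy of a VT_a(n) codeword")
```

For example, `[1, 0, 1, 1, 0, 1]` has syndrome 14 mod 7 = 0, so it lies in VT_0(6); deleting its third bit gives `[1, 0, 1, 0, 1]`, which the decoder maps back to the original. The brute-force search costs O(n^2) per word, which is fine for illustration; Levenshtein also gave an efficient direct decoding rule for these codes.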
Affiliation(s)
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning, 116024, China
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning, 116024, China
- Xiaokang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning, 116024, China
- Shuang Cui
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning, 116024, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning, 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning, 116024, China
36
Zhang X, Qi B, Niu Y. A dual-rule encoding DNA storage system using chaotic mapping to control GC content. Bioinformatics 2024; 40:btae113. [PMID: 38419588 PMCID: PMC10937898 DOI: 10.1093/bioinformatics/btae113] [Received: 10/12/2023] [Revised: 02/21/2024] [Accepted: 02/26/2024] [Indexed: 03/02/2024]
Abstract
MOTIVATION DNA as a novel storage medium is considered an effective solution to the world's growing demand for information due to its high density and long-lasting reliability. However, early coding schemes ignored the biologically constrained nature of DNA sequences in pursuit of high density, leading to DNA synthesis and sequencing difficulties. This article proposes a novel DNA storage coding scheme. The system encodes half of the binary data using each of the two GC-content complementary encoding rules to obtain a DNA sequence. RESULTS After simulating the encoding of representative document and image file formats, a DNA sequence strictly conforming to biological constraints was obtained, reaching a coding potential of 1.66 bit/nt. In the decoding process, a mechanism to prevent error propagation was introduced. The simulation results demonstrate that by adding Reed-Solomon code, 90% of the data can still be recovered after introducing a 2% error, proving that the proposed DNA storage scheme has high robustness and reliability. Availability and implementation: The source code for the codec scheme of this paper is available at https://github.com/Mooreniah/DNA-dual-rule-rotary-encoding-storage-system-DRRC.
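The abstract does not list the exact biological constraints. Two checks commonly applied in DNA storage are a GC-content range and a maximum homopolymer length; the sketch below applies both. The thresholds (40-60% GC, runs of at most 3 identical bases) and the function names are assumptions for illustration, not values from the DRRC scheme.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(seq))


def satisfies_constraints(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                          max_run: int = 3) -> bool:
    """Check a GC-content range and a homopolymer limit (thresholds are assumptions)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq) / len(seq)
    return gc_min <= gc <= gc_max and max_homopolymer_run(seq) <= max_run
```

For instance, `AAAACGCG` has a balanced 50% GC content but fails the homopolymer check because of its four-base `A` run.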
Affiliation(s)
- Xuncai Zhang
- College of Electrical Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
- Baonan Qi
- College of Electrical Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
- Ying Niu
- College of Building Environment Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, Henan, China
37
Kiryanova OY, Garafutdinov RR, Gubaydullin IM, Chemeris AV. A novel approach to encode melodies in DNA. Biosystems 2024; 237:105136. [PMID: 38316169 DOI: 10.1016/j.biosystems.2024.105136] [Received: 07/18/2023] [Revised: 11/17/2023] [Accepted: 02/02/2024] [Indexed: 02/07/2024]
Abstract
DNA data storage has gained increasing attention over recent decades. DNA molecules can be used to encode non-biological information and are promising carriers owing to their greater data capacity, longer storage duration, and better stability against technical failures. Here we propose a new method for encoding notes and music in DNA. The encoding technique takes into account the duration and tonality of each note, making it possible to encode all seven octaves by assigning a nucleotide sequence to each key. A set of short sequences is proposed to define the duration of each note. The proposed method can encode more complicated melodies than the approach based on the Huffman algorithm.
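The abstract states the idea (one nucleotide sequence per key, plus short sequences for duration) but not the actual tables. The hypothetical sketch below covers only the pitch part: seven octaves of 12 semitones give 84 pitches, and because 4^3 = 64 < 84 <= 4^4 = 256, a fixed 4-nt base-4 codeword per pitch suffices. The mapping, names, and octave numbering are illustrative assumptions, not the published scheme, and duration sequences are omitted.

```python
BASES = "ACGT"
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
CODEWORD_LENGTH = 4  # 4**3 = 64 < 84 <= 4**4 = 256, so 4 nt per pitch suffice


def pitch_to_codeword(name: str, octave: int) -> str:
    """Map a pitch (note name, octave 1-7) to a fixed-length nucleotide codeword."""
    if not 1 <= octave <= 7:
        raise ValueError("octave must be between 1 and 7")
    index = (octave - 1) * 12 + NOTE_NAMES.index(name)  # 0..83
    digits = []
    for _ in range(CODEWORD_LENGTH):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))  # most significant base first


def codeword_to_pitch(codeword: str) -> tuple[str, int]:
    """Invert pitch_to_codeword."""
    index = 0
    for base in codeword:
        index = index * 4 + BASES.index(base)
    octave, semitone = divmod(index, 12)
    return NOTE_NAMES[semitone], octave + 1
```

A fixed-length code keeps parsing trivial; a variable-length (e.g. Huffman) code can be shorter on average for skewed note statistics but needs prefix-free decoding.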
Affiliation(s)
- Olga Yu Kiryanova
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
- Ravil R Garafutdinov
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
- Irek M Gubaydullin
- Institute of Petrochemistry and Catalysis, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 141, 450075, Ufa, Bashkortostan, Russian Federation.
- Alexey V Chemeris
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Prosp. Oktyabrya, 71, 450054, Ufa, Bashkortostan, Russian Federation.
38
Yang S, Bögels BWA, Wang F, Xu C, Dou H, Mann S, Fan C, de Greef TFA. DNA as a universal chemical substrate for computing and data storage. Nat Rev Chem 2024; 8:179-194. [PMID: 38337008 DOI: 10.1038/s41570-024-00576-4] [Accepted: 01/10/2024] [Indexed: 02/12/2024]
Abstract
DNA computing and DNA data storage are emerging fields that are unlocking new possibilities in information technology and diagnostics. These approaches use DNA molecules as a computing substrate or a storage medium, offering nanoscale compactness and operation in unconventional media (including aqueous solutions, water-in-oil microemulsions and self-assembled membranized compartments) for applications beyond traditional silicon-based computing systems. Building a functional DNA computer that can process and store molecular information necessitates the continued development of strategies for computing and data storage, as well as bridging the gap between these fields. In this Review, we explore how DNA can be leveraged in the context of DNA computing with a focus on neural networks and compartmentalized DNA circuits. We also discuss emerging approaches to the storage of data in DNA and associated topics such as the writing, reading, retrieval and post-synthesis editing of DNA-encoded data. Finally, we provide insights into how DNA computing can be integrated with DNA data storage and explore the use of DNA for near-memory computing for future information technology and health analysis applications.
Affiliation(s)
- Shuo Yang
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Bas W A Bögels
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Fei Wang
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Can Xu
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Hongjing Dou
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Stephen Mann
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China.
- Centre for Protolife Research and Centre for Organized Matter Chemistry, School of Chemistry, University of Bristol, Bristol, UK.
- Max Planck-Bristol Centre for Minimal Biology, School of Chemistry, University of Bristol, Bristol, UK.
- Chunhai Fan
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Tom F A de Greef
- Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands.
- Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Institute for Molecules and Materials, Radboud University, Nijmegen, The Netherlands.
- Center for Living Technologies, Eindhoven-Wageningen-Utrecht Alliance, Utrecht, The Netherlands.
39
Kim J, Kim H, Bang D. An open-source, 3D printed inkjet DNA synthesizer. Sci Rep 2024; 14:3773. [PMID: 38355610 PMCID: PMC10867077 DOI: 10.1038/s41598-024-53944-x] [Received: 08/28/2023] [Accepted: 02/07/2024] [Indexed: 02/16/2024]
Abstract
Synthetic oligonucleotides have become a fundamental tool in a wide range of biological fields, including synthetic biology, biosensing, and DNA storage. Reliable access to equipment for synthesizing high-density oligonucleotides in the laboratory ensures research security and the freedom of research expansion. In this study, we introduced the Open-Source Inkjet DNA Synthesizer (OpenIDS), an open-source inkjet-based microarray synthesizer that offers ease of construction, rapid deployment, and flexible scalability. Utilizing 3D printing, Arduino, and Raspberry Pi, this newly designed synthesizer achieved robust stability with an industrial inkjet printhead. OpenIDS maintains low production costs and is therefore suitable for self-fabrication and optimization in academic laboratories. Moreover, even non-experts can create and control the synthesizer with a high degree of freedom for structural modifications. Users can easily add printheads or alter the design of the microarray substrate according to their research needs. To validate its performance, we synthesized oligonucleotides on 144 spots on a 15 × 25-mm silicon wafer filled with controlled pore glass. The synthesized oligonucleotides were analyzed using urea polyacrylamide gel electrophoresis.
Affiliation(s)
- Junhyeong Kim
- Department of Chemistry, Yonsei University, Seoul, Korea
- Haeun Kim
- Department of Chemistry, Yonsei University, Seoul, Korea
- Duhee Bang
- Department of Chemistry, Yonsei University, Seoul, Korea.
40
Zhang Y, Chen Y, Liu X, Ling Q, Wu R, Yang J, Zhang C. Programmable Primer Switching for Regulating Enzymatic DNA Circuits. ACS Nano 2024; 18:5089-5100. [PMID: 38286819 DOI: 10.1021/acsnano.3c12000] [Indexed: 01/31/2024]
Abstract
Developing DNA strand displacement reactions (SDRs) offers crucial technical support for regulating artificial nucleic acid circuits and networks. More recently, enzymatic SDR-based DNA circuits have gained significant attention because of their modular design, high orthogonality signaling, and extremely fast reaction rates. Typical enzymatic SDRs are regulated by relatively long primers (20-30 nucleotides) that hybridize to form stable double-stranded structures, facilitating enzyme-initiated events. Implementing more flexible primer-based enzymatic SDR regulation remains challenging due to the lack of a convenient and simple primer control mechanism, which consequently limits the development of enzymatic DNA circuits. In this study, we propose an approach, termed primer switching regulation, that implements programmable and flexible regulation of enzymatic circuits by introducing switchable wires into the enzymatic circuits. We applied this method to generate diverse enzymatic DNA circuits, including cascading, fan-in/fan-out, dual-rail, feed-forward, and feedback functions. Through this method, complex circuit functions can be implemented by just introducing additional switching wires without reconstructing the basic circuit frameworks. The method is experimentally demonstrated to provide flexible and programmable regulation of enzymatic DNA circuits and has future applications in DNA computing, biosensing, and DNA storage.
Affiliation(s)
- Yongpeng Zhang
- School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
- Yiming Chen
- School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
- Xuan Liu
- School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
- Qian Ling
- School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
- Ranfeng Wu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jing Yang
- School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
- Cheng Zhang
- School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
41
Ding L, Wu S, Hou Z, Li A, Xu Y, Feng H, Pan W, Ruan J. Improving error-correcting capability in DNA digital storage via soft-decision decoding. Natl Sci Rev 2024; 11:nwad229. [PMID: 38213525 PMCID: PMC10776348 DOI: 10.1093/nsr/nwad229] [Received: 05/20/2023] [Revised: 08/03/2023] [Accepted: 08/15/2023] [Indexed: 01/13/2024]
Abstract
Error-correcting codes (ECCs) employed in state-of-the-art DNA digital storage (DDS) systems suffer from a trade-off between error-correcting capability and the proportion of redundancy. To address this issue, in this study, we introduce a soft-decision decoding approach into DDS by proposing a DNA-specific error prediction model and a series of novel strategies. We demonstrate the effectiveness of our approach through a proof-of-concept DDS system based on Reed-Solomon (RS) code, named Derrick. Derrick shows significant improvement in error-correcting capability without involving additional redundancy in both in vitro and in silico experiments, using various sequencing technologies such as Illumina, PacBio and Oxford Nanopore Technology (ONT). Notably, in vitro experiments using ONT sequencing at a depth of 7× reveal that Derrick, compared with the traditional hard-decision decoding strategy, doubles the error-correcting capability of RS code, decreases the proportion of matrices with decoding failure by 229-fold, and amplifies the potential maximum storage volume by an impressive 32,388-fold. Also, Derrick surpasses 'state-of-the-art' DDS systems when both the information density and the minimum sequencing depth required for complete information recovery are considered. Crucially, the soft-decision decoding strategy and key steps of Derrick are generalizable to other ECCs' decoding algorithms.
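The abstract does not detail Derrick's decoding steps. One standard way soft information helps a Reed-Solomon decoder is erasure marking: an [n, k] RS code can correct e symbol errors and f erasures whenever 2e + f <= n - k, so an error whose position is known costs half as much redundancy as an unknown one. The sketch below encodes that bound, plus a hypothetical confidence threshold for flagging erasures. It illustrates why locating errors can up to double correction capability; it is not Derrick's error prediction model, and `flag_erasures` and its threshold are assumptions.

```python
def rs_decodable(n: int, k: int, errors: int, erasures: int) -> bool:
    """Reed-Solomon [n, k] bounded-distance condition: 2 * errors + erasures <= n - k."""
    return 2 * errors + erasures <= n - k


def flag_erasures(confidences: list[float], threshold: float = 0.5) -> list[int]:
    """Hypothetical soft step: mark symbols whose confidence is below `threshold` as erasures."""
    return [i for i, c in enumerate(confidences) if c < threshold]
```

For RS(255, 223), n - k = 32: hard-decision decoding handles at most 16 unknown errors, whereas 32 correctly flagged erasures are still decodable.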
Affiliation(s)
- Lulu Ding
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Shigang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Zhihao Hou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou510642, China
- Alun Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Yaping Xu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Hu Feng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518120, China
42
Gervasio JHDB, da Costa Oliveira H, da Costa Martins AG, Pesquero JB, Verona BM, Cerize NNP. How close are we to storing data in DNA? Trends Biotechnol 2024; 42:156-167. [PMID: 37673693 DOI: 10.1016/j.tibtech.2023.08.001] [Received: 05/12/2023] [Revised: 07/31/2023] [Accepted: 08/04/2023] [Indexed: 09/08/2023]
Abstract
DNA is an intelligent data storage medium due to its stability and high density. It has been used by nature for over 3.5 billion years. Compared with traditional methods, DNA offers better compression and physical density. DNA can retain information for thousands of years. However, challenges exist in scalability, standardization, metadata gathering, biocybersecurity, and specialized tools. Addressing these challenges is crucial for widespread implementation. Collaboration among experts, as well as keeping the future in mind, is needed to unlock the full potential of DNA data storage, which promises low energy costs, high-density storage, and long-term stability.
Affiliation(s)
- Joao Henrique Diniz Brandao Gervasio
- Bionanomanufacturing Center, IPT - Institute for Technological Research, Sao Paulo, SP, Brazil; Department of Bioinformatics, UFMG - Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Department of Statistics, University of Oxford, Oxford, UK.
- Bruno Marinaro Verona
- Bionanomanufacturing Center, IPT - Institute for Technological Research, Sao Paulo, SP, Brazil
43
Wang S, Mao X, Wang F, Zuo X, Fan C. Data Storage Using DNA. Adv Mater 2024; 36:e2307499. [PMID: 37800877 DOI: 10.1002/adma.202307499] [Received: 07/27/2023] [Revised: 10/01/2023] [Indexed: 10/07/2023]
Abstract
The exponential growth of global data has outpaced the storage capacities of current technologies, necessitating innovative storage strategies. DNA, as a natural medium for preserving genetic information, has emerged as a highly promising candidate for a next-generation storage medium. Storing data in DNA offers several advantages, including ultrahigh physical density and exceptional durability. Facilitated by significant advancements in various technologies, such as DNA synthesis, DNA sequencing, and DNA nanotechnology, remarkable progress has been made in the field of DNA data storage over the past decade. However, several challenges still need to be addressed to realize practical applications of DNA data storage. In this review, the processes and strategies of in vitro DNA data storage are first introduced, highlighting recent advancements. Next, a brief overview of in vivo DNA data storage is provided, with a focus on the various writing strategies developed to date. Finally, the challenges encountered in each step of DNA data storage are summarized, and promising techniques that could help overcome these obstacles are discussed.
Affiliation(s)
- Shaopeng Wang
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Xiuhai Mao
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Fei Wang
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaolei Zuo
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chunhai Fan
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules, Zhangjiang Institute for Advanced Study and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
44
Soukarie D, Nocete L, Bittner AM, Santiago I. DNA data storage in electrospun and melt-electrowritten composite nucleic acid-polymer fibers. Mater Today Bio 2024; 24:100900. [PMID: 38234463 PMCID: PMC10792485 DOI: 10.1016/j.mtbio.2023.100900] [Received: 10/02/2023] [Revised: 11/26/2023] [Accepted: 12/01/2023] [Indexed: 01/19/2024]
Abstract
Incorporating biomolecules as integral parts of computational systems represents a frontier challenge in bio- and nanotechnology. Using DNA to store digital data is an attractive alternative to conventional information technologies due to its high information density and long lifetime. However, developing an adequate DNA storage medium remains a significant challenge in permitting the safe archiving and retrieval of oligonucleotides. This work introduces composite nucleic acid-polymer fibers as matrix materials for digital information-bearing oligonucleotides. We devised a complete workflow for the stable storage of DNA in PEO, PVA, and PCL fibers by employing electrohydrodynamic processes to produce electrospun nanofibers with embedded oligonucleotides. The on-demand retrieval of messages is afforded by non-hazardous chemical treatment and subsequent PCR amplification and DNA sequencing. Finally, we develop a platform for melt-electrowriting of polymer-DNA composites to produce microfiber meshes of programmable patterns and geometries.
Affiliation(s)
- Lluis Nocete
- Universitat Autònoma de Barcelona, Facultat de Ciències, Barcelona, 08193, Spain
- Alexander M. Bittner
- CIC nanoGUNE BRTA, Donostia-San Sebastián, 20018, Spain
- IKERBASQUE Basque Foundation for Science, 48009 Bilbao, Spain
- Ibon Santiago
- CIC nanoGUNE BRTA, Donostia-San Sebastián, 20018, Spain
45
Sabary O, Yucovich A, Shapira G, Yaakobi E. Reconstruction algorithms for DNA-storage systems. Sci Rep 2024; 14:1951. [PMID: 38263421 PMCID: PMC10806084 DOI: 10.1038/s41598-024-51730-3] [Received: 06/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024]
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string passes through the DNA-storage channel, which introduces deletion, insertion and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, which are used for the shortest common supersequence and the longest common subsequence problems, in order to decode the original string. Our algorithms do not impose any limitations on the input or the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a new synthesized dataset, and are shown to outperform previous algorithms in reconstruction accuracy.
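The reconstruction algorithms described above combine shortest-common-supersequence and longest-common-subsequence computations over many traces. The sketch below shows only two dynamic-programming building blocks from that setting: one longest common subsequence of two traces, and the edit (Levenshtein) distance used to score an estimate against the original string. It is not the authors' reconstruction algorithm, and the function names are illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of `a` and `b` (O(len(a) * len(b)) DP)."""
    n, m = len(a), len(b)
    # suffix[i][j] holds the LCS length of a[i:] and b[j:]
    suffix = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                suffix[i][j] = suffix[i + 1][j + 1] + 1
            else:
                suffix[i][j] = max(suffix[i + 1][j], suffix[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif suffix[i + 1][j] >= suffix[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]
```

For two traces `ACGTTGCA` and `ACGTGCA` (the second missing one `T`), the LCS is `ACGTGCA` and the edit distance is 1.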
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel.
- Alexander Yucovich
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
- Guy Shapira
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
- Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
46
Akash A, Bencurova E, Dandekar T. How to make DNA data storage more applicable. Trends Biotechnol 2024; 42:17-30. [PMID: 37591721 DOI: 10.1016/j.tibtech.2023.07.006] [Received: 01/27/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/19/2023]
Abstract
The storage of digital data is becoming a worldwide problem. DNA has been recognized as a biological solution due to its ability to store genetic information without alteration over long periods. The first demonstrations of high-capacity long-lasting DNA digital data storage have been shown. However, high storage costs and slow retrieval of the data must be overcome to make DNA data storage more applicable and marketable. Herein, we discuss the issues and recent advances in DNA data storage methods and highlight pathways to make this technology more applicable to real-world digital data storage. We envision that a combination of molecular biology, nanotechnology, novel polymers, electronics, and automation with systematic development will allow DNA data storage sufficient for everyday use.
Affiliation(s)
- Aman Akash
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
- Elena Bencurova
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
- Thomas Dandekar
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany.
47
Yeom H, Kim N, Lee AC, Kim J, Kim H, Choi H, Song SW, Kwon S, Choi Y. Highly Accurate Sequence- and Position-Independent Error Profiling of DNA Synthesis and Sequencing. ACS Synth Biol 2023; 12:3567-3577. [PMID: 37961855 PMCID: PMC10729760 DOI: 10.1021/acssynbio.3c00308] [Received: 05/15/2023] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 11/15/2023]
Abstract
A comprehensive error analysis of DNA-stored data during processing, such as DNA synthesis and sequencing, is crucial for reliable DNA data storage. Both synthesis and sequencing errors depend on the sequence and the transition of bases of nucleotides; ignoring either one of the error sources leads to technical challenges in minimizing the error rate. Here, we present a methodology and toolkit that utilizes an oligonucleotide library generated from a 10-base-shifted sequence array, which is individually labeled with unique molecular identifiers, to delineate and profile DNA synthesis and sequencing errors simultaneously. This methodology enables position- and sequence-independent error profiling of both DNA synthesis and sequencing. Using this toolkit, we report base transitional errors in both synthesis and sequencing in general DNA data storage as well as degenerate-base-augmented DNA data storage. The methodology and data presented will contribute to the development of DNA sequence designs with minimal error.
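The abstract describes oligonucleotides individually labeled with unique molecular identifiers (UMIs) and a profile of base transitional errors. As a hedged illustration of the tallying step only, the sketch below counts reference-to-read substitutions for reads already assigned to a UMI. The input format (`(umi, read)` pairs and a UMI-to-reference dictionary) is hypothetical, and reads with insertions or deletions are skipped because no alignment is performed.

```python
from collections import Counter
from typing import Iterable


def substitution_profile(reads: Iterable[tuple[str, str]],
                         references: dict[str, str]) -> Counter:
    """Count (reference_base, read_base) substitutions over UMI-tagged reads.

    Reads whose UMI has no reference, or whose length differs from the
    reference (i.e. likely indels), are skipped in this simplified sketch.
    """
    counts: Counter = Counter()
    for umi, read in reads:
        ref = references.get(umi)
        if ref is None or len(ref) != len(read):
            continue
        for ref_base, read_base in zip(ref, read):
            if ref_base != read_base:
                counts[(ref_base, read_base)] += 1
    return counts
```

A real profile would align reads first so that insertions and deletions are counted rather than discarded.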
Affiliation(s)
- Huiran Yeom
- Division of Data Science, College of Information and Communication Technology, The University of Suwon, Hwaseong 18323, Republic of Korea
- Namphil Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Jinhyun Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Hamin Kim
- Department of Interdisciplinary Program for Bioengineering, Seoul National University, Seoul 08826, South Korea
- Hansol Choi
- Bio-MAX Institute, Seoul National University, Seoul 08826, Republic of Korea
- Seo Woo Song
- Basic Science and Engineering Initiative, Children’s Heart Center, Stanford University, Stanford, California 94304, United States
- Sunghoon Kwon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea
- Department of Interdisciplinary Program for Bioengineering, Seoul National University, Seoul 08826, South Korea
- Bio-MAX Institute, Seoul National University, Seoul 08826, Republic of Korea
- Yeongjae Choi
- School of Materials Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61105, Republic of Korea
48
Yang C, Gan X, Zeng Y, Xu Z, Xu L, Hu C, Ma H, Chai B, Hu S, Chai Y. Advanced design and applications of digital microfluidics in biomedical fields: An update of recent progress. Biosens Bioelectron 2023; 242:115723. [PMID: 37832347 DOI: 10.1016/j.bios.2023.115723] [Received: 07/05/2023] [Revised: 09/11/2023] [Accepted: 09/29/2023] [Indexed: 10/15/2023]
Abstract
Significant breakthroughs have been made in digital microfluidic (DMF)-based technologies over the past decades. DMF technology has attracted great interest in bioassays depending on automatic microscale liquid manipulations and complicated multi-step processing. In this review, the recent advances of DMF platforms in the biomedical field were summarized, focusing on the integrated design and applications of the DMF system. Firstly, the electrowetting-on-dielectric principle, fabrication of DMF chips, and commercialization of the DMF system were elaborated. Then, the updated droplets and magnetic beads manipulation strategies with DMF were explored. DMF-based biomedical applications were comprehensively discussed, including automated sample preparation strategies, immunoassays, molecular diagnosis, blood processing/testing, and microbe analysis. Emerging applications such as enzyme activity assessment and DNA storage were also explored. The performance of each bioassay was compared and discussed, providing insight into the novel design and applications of the DMF technology. Finally, the advantages, challenges, and future trends of DMF systems were systematically summarized, demonstrating new perspectives on the extensive applications of DMF in basic research and commercialization.
Affiliation(s)
- Chengbin Yang
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.
- Xiangyu Gan
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.
- Yuping Zeng
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.
- Zhourui Xu
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.
- Longqian Xu
- CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China.
- Chenxuan Hu
- CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China.
- Hanbin Ma
- CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China; Guangdong ACXEL Micro & Nano Tech Co., Ltd, Foshan, China.
- Bao Chai
- Department of Dermatology, Huazhong University of Science and Technology Union Shenzhen Hospital, Shenzhen, China; Department of Dermatology, The 6th Affiliated Hospital of Shenzhen University Health Science Center, Shenzhen, China.
- Siyi Hu
- CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China.
- Yujuan Chai
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.
49
Liu DD, Cheow LF. Rapid Information Retrieval from DNA Storage with Microfluidic Very Large-Scale Integration Platform. Small 2023:e2309867. [PMID: 38048539 DOI: 10.1002/smll.202309867] [Received: 11/02/2023] [Revised: 11/09/2023] [Indexed: 12/06/2023]
Abstract
Due to its high information density, DNA is very attractive as a data storage system. However, a major obstacle is the high cost and long turnaround time for retrieving DNA data with next-generation sequencing. Herein, the use of a microfluidic very large-scale integration (mVLSI) platform is described to perform highly parallel and rapid readout of data stored in DNA. Additionally, it is demonstrated that multi-state data encoded in DNA can be deciphered with on-chip melt-curve analysis, thereby further increasing the data content that can be analyzed. The pairing of mVLSI network architecture with exquisitely specific DNA recognition gives rise to a scalable platform for rapid DNA data reading.
Affiliation(s)
- Dong Dong Liu
- Department of Biomedical Engineering and Institute for Health Innovation and Technology, National University of Singapore, Singapore, 119077, Singapore
- Lih Feng Cheow
- Department of Biomedical Engineering and Institute for Health Innovation and Technology, National University of Singapore, Singapore, 119077, Singapore
50
Rekadwad BN, Shouche YS, Jangid K. Investigation of tRNA-based relatedness within the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum: a comparative analysis. Arch Microbiol 2023; 205:366. [PMID: 37917352 DOI: 10.1007/s00203-023-03694-7] [Received: 08/21/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 11/04/2023]
Abstract
The PVC superphylum is a diverse group of prokaryotes that require stringent growth conditions. RNA is a fascinating molecule to find evolutionary relatedness according to the RNA World Hypothesis. We conducted tRNA gene analysis to find evolutionary relationships in the PVC phyla. The analysis of genomic data (P = 9, V = 4, C = 8) revealed that the number of tRNA genes varied from 28 to 90 in Planctomycetes and Chlamydia, respectively. Verrucomicrobia has whole genomes and the longest scaffold (3 + 1), with tRNA genes ranging from 49 to 53 in whole genomes and 4 in the longest scaffold. Most tRNAs in the E. coli genome clustered with homologs, but approximately 43% clustered with tRNAs encoding different amino acids. Planctomyces, Akkermansia, Isosphaera, and Chlamydia were similar to E. coli tRNAs. In a phylum, tRNAs coding for different amino acids clustered at a range of 8 to 10%. Further analysis of these tRNAs showed sequence similarity with Cyanobacteria, Proteobacteria, Viridiplantae, Ascomycota and Basidiomycota (Eukaryota). This indicates the possibility of horizontal gene transfer or, otherwise, a different origin of tRNA in PVC bacteria. Hence, this work proves its importance for determining evolutionary relatedness and potentially identifying bacteria using tRNA. Thus, the analysis of these tRNAs indicates that primitive RNA may have served as the genetic material of LUCA before being replaced by DNA. A quantitative analysis is required to test these possibilities that relate the evolutionary significance of tRNA to the origin of life.
Affiliation(s)
- Bhagwan Narayan Rekadwad
- National Centre for Microbial Resource (NCMR), DBT-National Centre for Cell Science (DBT-NCCS), Savitribai Phule Pune University Campus, Ganeshkhind, Pune, 411007, Maharashtra, India.
- Microbe AI Lab, Division of Microbiology and Biotechnology, Yenepoya Research Centre, Yenepoya (Deemed to Be University), Mangalore, 575018, Karnataka, India.
- Yogesh S Shouche
- National Centre for Microbial Resource (NCMR), DBT-National Centre for Cell Science (DBT-NCCS), Savitribai Phule Pune University Campus, Ganeshkhind, Pune, 411007, Maharashtra, India
- Gut Microbiology Research Division, SKAN Research Trust, Bangalore, 560034, Karnataka, India
- Kamlesh Jangid
- Bioenergy Group, DST-Agharkar Research Institute, Gopal Ganesh Agarkar Road, Pune, 411004, Maharashtra, India