1. Zhou Y, Bi K, Ge Q, Lu Z. Advances and Challenges in Random Access Techniques for In Vitro DNA Data Storage. ACS Appl Mater Interfaces 2024; 16:43102-43113. [PMID: 39110103 DOI: 10.1021/acsami.4c07235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
Abstract
With digital transformation and the general application of new technologies, data storage is facing new challenges with the demand for high-density loading of massive information. In response, DNA storage technology has emerged as a promising research direction. Efficient and reliable data retrieval is critical for DNA storage, and the development of random access technology plays a key role in its practicality and reliability. However, achieving fast and accurate random access functions has proven difficult for existing DNA storage efforts, which limits its practical applications in industry. In this review, we summarize the recent advances in DNA storage technology that enable random access functionality, as well as the challenges that need to be overcome and the current solutions. This review aims to help researchers in the field of DNA storage better understand the importance of the random access step and its impact on the overall development of DNA storage. Furthermore, the remaining challenges and future research trends in random access technology of DNA storage are discussed, with the goal of providing a solid foundation for achieving random access in DNA storage under large-scale data conditions.
Affiliation(s)
- Ying Zhou
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Kun Bi
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, China
2. Li W, Miller D, Liu X, Tosi L, Chkaiban L, Mei H, Hung PH, Parekkadan B, Sherlock G, Levy S. Arrayed in vivo barcoding for multiplexed sequence verification of plasmid DNA and demultiplexing of pooled libraries. Nucleic Acids Res 2024; 52:e47. [PMID: 38709890 PMCID: PMC11162764 DOI: 10.1093/nar/gkae332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/23/2024] [Accepted: 04/16/2024] [Indexed: 05/08/2024] [Open Access]
Abstract
Sequence verification of plasmid DNA is critical for many cloning and molecular biology workflows. To leverage high-throughput sequencing, several methods have been developed that add a unique DNA barcode to individual samples prior to pooling and sequencing. However, these methods require an individual plasmid extraction and/or in vitro barcoding reaction for each sample processed, limiting throughput and adding cost. Here, we develop an arrayed in vivo plasmid barcoding platform that enables pooled plasmid extraction and library preparation for Oxford Nanopore sequencing. This method has a high accuracy and recovery rate, and greatly increases throughput and reduces cost relative to other plasmid barcoding methods or Sanger sequencing. We use in vivo barcoding to sequence verify >45 000 plasmids and show that the method can be used to transform error-containing dispersed plasmid pools into sequence-perfect arrays or well-balanced pools. In vivo barcoding does not require any specialized equipment beyond a low-overhead Oxford Nanopore sequencer, enabling most labs to flexibly process hundreds to thousands of plasmids in parallel.
Affiliation(s)
- Weiyi Li
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Darach Miller
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Xianan Liu
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Lorenzo Tosi
  - Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Lamia Chkaiban
  - Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Han Mei
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
- Po-Hsiang Hung
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Biju Parekkadan
  - Department of Biomedical Engineering, Rutgers University, Piscataway, NJ, USA
- Gavin Sherlock
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Sasha F Levy
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, CA, USA
3. Yang S, Bögels BWA, Wang F, Xu C, Dou H, Mann S, Fan C, de Greef TFA. DNA as a universal chemical substrate for computing and data storage. Nat Rev Chem 2024; 8:179-194. [PMID: 38337008 DOI: 10.1038/s41570-024-00576-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/10/2024] [Indexed: 02/12/2024]
Abstract
DNA computing and DNA data storage are emerging fields that are unlocking new possibilities in information technology and diagnostics. These approaches use DNA molecules as a computing substrate or a storage medium, offering nanoscale compactness and operation in unconventional media (including aqueous solutions, water-in-oil microemulsions and self-assembled membranized compartments) for applications beyond traditional silicon-based computing systems. To build a functional DNA computer that can process and store molecular information necessitates the continued development of strategies for computing and data storage, as well as bridging the gap between these fields. In this Review, we explore how DNA can be leveraged in the context of DNA computing with a focus on neural networks and compartmentalized DNA circuits. We also discuss emerging approaches to the storage of data in DNA and associated topics such as the writing, reading, retrieval and post-synthesis editing of DNA-encoded data. Finally, we provide insights into how DNA computing can be integrated with DNA data storage and explore the use of DNA for near-memory computing for future information technology and health analysis applications.
Affiliation(s)
- Shuo Yang
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  - Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Bas W A Bögels
  - Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
  - Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Fei Wang
  - School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Can Xu
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  - Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Hongjing Dou
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  - Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
- Stephen Mann
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  - Zhangjiang Institute for Advanced Study (ZIAS), Shanghai Jiao Tong University, Shanghai, China
  - Centre for Protolife Research and Centre for Organized Matter Chemistry, School of Chemistry, University of Bristol, Bristol, UK
  - Max Planck-Bristol Centre for Minimal Biology, School of Chemistry, University of Bristol, Bristol, UK
- Chunhai Fan
  - School of Chemistry and Chemical Engineering, New Cornerstone Science Laboratory, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acids Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Tom F A de Greef
  - Laboratory of Chemical Biology, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Institute for Complex Molecular Systems (ICMS), Eindhoven University of Technology, Eindhoven, The Netherlands
  - Computational Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Institute for Molecules and Materials, Radboud University, Nijmegen, The Netherlands
  - Center for Living Technologies, Eindhoven-Wageningen-Utrecht Alliance, Utrecht, The Netherlands
4. Lu X, Kim S. Weakly mutually uncorrelated codes with maximum run length constraint for DNA storage. Comput Biol Med 2023; 165:107439. [PMID: 37678135 DOI: 10.1016/j.compbiomed.2023.107439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 08/14/2023] [Accepted: 08/29/2023] [Indexed: 09/09/2023]
Abstract
DNA storage systems have begun to attract considerable attention as next-generation storage technologies due to their high densities and longevity. However, efficient primer design for random-access in synthesized DNA strands is still an issue that needs to be solved. Although previous studies have explored various constraints for primer design in DNA storage systems, there is no attention paid to the combination of weakly mutually uncorrelated codes with the maximum run length constraint. In this paper, we first propose a code design by combining weakly mutually uncorrelated codes with the maximum run length constraint. Moreover, we also explore the weakly mutually uncorrelated codes to satisfy combinations of maximum run length constraint with more constraints such as being almost-balanced and having large Hamming distance, which are also efficient constraints for random-access in DNA storage systems. To guarantee that the proposed codes can be adapted to primer design with variable length, we present modified code construction methods to achieve different lengths of the code. Then, we provide an analysis of the size of the proposed codes, which indicates the capacity to support primer design. Finally, we compare the codes with those of previous works to show that the proposed codes can always guarantee the maximum run length constraint, which is helpful for random-access for DNA storage.
Affiliation(s)
- Xiaozhou Lu
  - Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, 44610, South Korea
- Sunghwan Kim
  - Department of Electrical, Electronic, and Computer Engineering, University of Ulsan, Ulsan, 44610, South Korea
5. Lau B, Chandak S, Roy S, Tatwawadi K, Wootters M, Weissman T, Ji HP. Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing. Sci Rep 2023; 13:8514. [PMID: 37231057 DOI: 10.1038/s41598-023-29575-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 02/07/2023] [Indexed: 05/27/2023] [Open Access]
Abstract
The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.
Affiliation(s)
- Billy Lau
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
- Shubham Chandak
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Sharmili Roy
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Kedar Tatwawadi
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Mary Wootters
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Tsachy Weissman
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Hanlee P Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
6. Doricchi A, Platnich CM, Gimpel A, Horn F, Earle M, Lanzavecchia G, Cortajarena AL, Liz-Marzán LM, Liu N, Heckel R, Grass RN, Krahne R, Keyser UF, Garoli D. Emerging Approaches to DNA Data Storage: Challenges and Prospects. ACS Nano 2022; 16:17552-17571. [PMID: 36256971 PMCID: PMC9706676 DOI: 10.1021/acsnano.2c06748] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Indexed: 05/11/2023]
Abstract
With the total amount of worldwide data skyrocketing, the global data storage demand is predicted to grow to 1.75 × 10¹⁴ GB by 2025. Traditional storage methods have difficulties keeping pace given that current storage media have a maximum density of 10³ GB/mm³. As such, data production will far exceed the capacity of currently available storage methods. The costs of maintaining and transferring data, as well as the limited lifespans and significant data losses associated with current technologies also demand advanced solutions for information storage. Nature offers a powerful alternative through the storage of information that defines living organisms in unique orders of four bases (A, T, C, G) located in molecules called deoxyribonucleic acid (DNA). DNA molecules as information carriers have many advantages over traditional storage media. Their high storage density, potentially low maintenance cost, ease of synthesis, and chemical modification make them an ideal alternative for information storage. To this end, rapid progress has been made over the past decade by exploiting user-defined DNA materials to encode information. In this review, we discuss the most recent advances of DNA-based data storage with a major focus on the challenges that remain in this promising field, including the current intrinsic low speed in data writing and reading and the high cost per byte stored. Alternatively, data storage relying on DNA nanostructures (as opposed to DNA sequence) as well as on other combinations of nanomaterials and biomolecules are proposed with promising technological and economic advantages. In summarizing the advances that have been made and underlining the challenges that remain, we provide a roadmap for the ongoing research in this rapidly growing field, which will enable the development of technological solutions to the global demand for superior storage methodologies.
Affiliation(s)
- Andrea Doricchi
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
  - Dipartimento di Chimica e Chimica Industriale, Università di Genova, via Dodecaneso 31, 16146 Genova, Italy
- Casey M. Platnich
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Andreas Gimpel
  - Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
- Friederikee Horn
  - Technical University of Munich, Department of Electrical and Computer Engineering Munchen, Bayern, DE 80333, Germany
- Max Earle
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- German Lanzavecchia
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
  - Dipartimento di Fisica, Università di Genova, via Dodecaneso 33, 16146 Genova, Italy
- Aitziber L. Cortajarena
  - Center for Cooperative Research in Biomaterials (CICbiomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 194, 20014 Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
- Luis M. Liz-Marzán
  - Center for Cooperative Research in Biomaterials (CICbiomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramón 194, 20014 Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
  - Biomedical Research Networking Center in Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Av. Monforte de Lemos, 3-5. Pabellón 11. Planta 0, 28029 Madrid, Spain
- Na Liu
  - Second Physics Institute, University of Stuttgart, 70569 Stuttgart, Germany
  - Max Planck Institute for Solid State Research, 70569 Stuttgart, Germany
- Reinhard Heckel
  - Technical University of Munich, Department of Electrical and Computer Engineering Munchen, Bayern, DE 80333, Germany
- Robert N. Grass
  - Institute for Chemical and Bioengineering, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
- Roman Krahne
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy
- Ulrich F. Keyser
  - Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Denis Garoli
  - Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genova, Italy