1
Schwarz PM, Freisleben B. Optimizing fountain codes for DNA data storage. Comput Struct Biotechnol J 2024; 23:3878-3896. [PMID: 39559773 PMCID: PMC11570749 DOI: 10.1016/j.csbj.2024.10.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 10/22/2024] [Accepted: 10/22/2024] [Indexed: 11/20/2024] Open
Abstract
Fountain codes, originally developed for reliable multicasting in communication networks, are effectively applied in various data transmission and storage systems. Their recent use in DNA data storage systems has unique challenges, since the DNA storage channel deviates from the traditional Gaussian white noise erasure model considered in communication networks and has several restrictions as well as special properties. Thus, optimizing fountain codes to address these challenges promises to improve their overall usability in DNA data storage systems. In this article, we present several methods for optimizing fountain codes for DNA data storage. Apart from generally applicable optimizations for fountain codes, we propose optimization algorithms to create tailored distribution functions of fountain codes, which is novel in the context of DNA data storage. We evaluate the proposed methods in terms of various metrics related to the DNA storage channel. Our evaluation shows that optimizing fountain codes for DNA data storage can significantly enhance the reliability and capacity of DNA data storage systems. The developed methods represent a step forward in harnessing the full potential of fountain codes for DNA-based data storage applications. The new coding schemes and all developed methods are available under a free and open-source software license.
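As a rough illustration of what a fountain-code "distribution function" controls, the sketch below builds one LT-style packet by XOR-ing a randomly chosen subset of data chunks. It uses the textbook ideal soliton distribution, not the tailored distributions proposed in the article; the names `ideal_soliton` and `encode_packet` are hypothetical and do not come from the authors' software.

```python
import random


def ideal_soliton(k):
    """Ideal soliton probabilities for degrees 0..k (degree 0 has weight 0)."""
    probs = [0.0, 1.0 / k]
    probs += [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return probs


def encode_packet(chunks, seed):
    """Build one packet: the XOR of a seeded random subset of equal-length chunks."""
    k = len(chunks)
    rng = random.Random(seed)  # the seed lets a decoder regenerate the same subset
    degree = rng.choices(range(k + 1), weights=ideal_soliton(k))[0]
    indices = rng.sample(range(k), degree)
    payload = bytes(len(chunks[0]))  # all-zero start value for the XOR
    for i in indices:
        payload = bytes(a ^ b for a, b in zip(payload, chunks[i]))
    return sorted(indices), payload
```

Swapping `ideal_soliton` for another weight list is the point where a tailored distribution would plug in under this sketch.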
Affiliation(s)
- Peter Michael Schwarz
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Bernd Freisleben
- Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
2
Xie X, Wang S, Chen Z, Yu Y, Hu X, Ma N, Ji M, Tian Y. Exploring DNA Computers: Advances in Storage, Cryptography and Logic Circuits. Chembiochem 2024:e202400670. [PMID: 39365708 DOI: 10.1002/cbic.202400670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 09/20/2024] [Accepted: 10/04/2024] [Indexed: 10/06/2024]
Abstract
Over the last four decades, research on DNA as a functional material has primarily focused on its predictable conformation and programmable interaction. However, its low energy consumption, high responsiveness and sensitivity also make it ideal for designing specific signaling pathways and enabling the development of molecular computers. This review mainly discusses recent advancements in the utilization of DNA nanotechnology for molecular computers, encompassing applications in storage, cryptography and logic circuits. It elucidates the challenges encountered in the application process and presents solutions exemplified by representative works. Lastly, it delineates the challenges and opportunities within this field.
Affiliation(s)
- Xiaolin Xie
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Shuang Wang
- State Key Laboratory of Marine Food Processing & Safety Control, College of Food Science and Engineering, Ocean University of China, Qingdao, 266404, China
- Zhi Chen
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Yifan Yu
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Xiaoxue Hu
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Ningning Ma
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Min Ji
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
- Ye Tian
- College of Engineering and Applied Sciences, State Key Laboratory of Analytical Chemistry for Life Science, National Laboratory of Solid State Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Chemistry and Biomedicine Innovation Center (ChemBIC), ChemBioMed Interdisciplinary Research Center at Nanjing University, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210023, China
3
Xu J, Wang Y, Chen X, Wang L, Zhou H, Mei H, Chen S, Huang X. "Multi-layer" encryption of medical data in DNA for highly-secure storage. Mater Today Bio 2024; 28:101221. [PMID: 39309163 PMCID: PMC11415972 DOI: 10.1016/j.mtbio.2024.101221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 08/31/2024] [Indexed: 09/25/2024] Open
Abstract
The exponential increase and the attributes of medical data drive the requirement for secure medical data archiving. DNA data storage shows promise for storing sensitive and important data like medical records due to its high density and endurance. Nevertheless, current DNA data storage schemes generally do not fully consider data encryption, posing a risk of data corruption by routine DNA sequencing. Here, we designed a "multi-layer" encryption pipeline for medical data archiving. Initially, digital information was encrypted using the Blowfish algorithm at the information technology (IT) layer, followed by two-layer data encryption at the biotechnology (BT) layer. The first BT layer exploited the molecular weight of synthetic DNA or nucleosides to encrypt the key, while the second BT layer encrypted digital information within DNA sequences. Consequently, decryption involved layer-by-layer interpretation of data, including mass spectroscopy, sequencing, and Blowfish decryption, significantly enhancing data security. Utilizing mass spectroscopy to retrieve information allows for the employment of both natural and unnatural nucleosides, as well as their synthetic oligonucleotides, for data storage, thereby considerably boosting scalability. Our work implies expanded flexibility of DNA-based data storage, highlighting the potential for leveraging various physical and chemical characteristics of DNA molecules to encode and access digital information.
Affiliation(s)
- Jiaxin Xu
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Yu Wang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Xue Chen
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Lingwei Wang
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Haibo Zhou
- College of Pharmacy, Jinan University, Guangzhou, Guangdong, 510632, China
- Hui Mei
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Shanze Chen
- Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Diseases, Post-doctoral Scientific Research Station of Basic Medicine, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital of Southern University of Science and Technology), Shenzhen, 518020, Guangdong, China
- Xiaoluo Huang
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
4
Cao B, Wang K, Xie L, Zhang J, Zhao Y, Wang B, Zheng P. PELMI: Realize robust DNA image storage under general errors via parity encoding and local mean iteration. Brief Bioinform 2024; 25:bbae463. [PMID: 39288232 PMCID: PMC11407442 DOI: 10.1093/bib/bbae463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 09/01/2024] [Accepted: 09/04/2024] [Indexed: 09/19/2024] Open
Abstract
DNA molecules as storage media are characterized by high encoding density and low energy consumption, making DNA storage a highly promising storage method. However, DNA storage has shortcomings, especially when storing multimedia data, wherein image reconstruction fails when address errors occur, resulting in complete data loss. Therefore, we propose a parity encoding and local mean iteration (PELMI) scheme to achieve robust DNA storage of images. The proposed parity encoding scheme satisfies the common biochemical constraints of DNA sequences and the undesired motif content. It addresses varying pixel weights at different positions for binary data, thus optimizing the utilization of Reed-Solomon error correction. Then, through lost and erroneous sequences, data supplementation and local mean iteration are employed to enhance the robustness. The encoding results show that the undesired motif content is reduced by 23%-50% compared with the representative schemes, which improves the sequence stability. PELMI achieves image reconstruction under general errors (insertion, deletion, substitution) and enhances the DNA sequences quality. Especially under 1% error, compared with other advanced encoding schemes, the peak signal-to-noise ratio and the multiscale structure similarity address metric were increased by 10%-13% and 46.8%-122%, respectively, and the mean squared error decreased by 113%-127%. This demonstrates that the reconstructed images had better clarity, fidelity, and similarity in structure, texture, and detail. In summary, PELMI ensures robustness and stability of image storage in DNA and achieves relatively high-quality image reconstruction under general errors.
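For readers unfamiliar with the image-quality metrics quoted above, the following minimal sketch computes the mean squared error and the peak signal-to-noise ratio for two equally sized pixel sequences. It assumes 8-bit grayscale values in flat lists; the function names are illustrative and are not taken from the PELMI code.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    total = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return total / len(original)


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)
```

A lower MSE and a higher PSNR both indicate a reconstruction closer to the original image.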
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning 116024, China
- Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Lei Xie
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Jianxia Zhang
- School of Intelligent Engineering, Henan Institute of Technology, No. 90, East Hualan Avenue, Hongqi District, Xinxiang, Henan 451191, China
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Pan Zheng
- Department of Accounting and Information Systems, University of Canterbury, Upper Riccarton, Christchurch 8140, New Zealand
5
Leblanc J, Boulle O, Roux E, Nicolas J, Lavenier D, Audic Y. Fully in vitro iterative construction of a 24 kb-long artificial DNA sequence to store digital information. Biotechniques 2024; 76:203-215. [PMID: 38573592 DOI: 10.2144/btn-2023-0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024] Open
Abstract
In the absence of a DNA template, the ab initio production of long double-stranded DNA molecules of predefined sequences is particularly challenging. The DNA synthesis step remains a bottleneck for many applications such as functional assessment of ancestral genes, analysis of alternative splicing or DNA-based data storage. In this report we propose a fully in vitro protocol to generate very long double-stranded DNA molecules starting from commercially available short DNA blocks in less than 3 days using Golden Gate assembly. This innovative application allowed us to streamline the process to produce a 24 kb-long DNA molecule storing part of the Declaration of the Rights of Man and of the Citizen of 1789. The DNA molecule produced can be readily cloned into a suitable host/vector system for amplification and selection.
Affiliation(s)
- Julien Leblanc
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
- Olivier Boulle
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
- Emeline Roux
- Institut NuMeCan, INRAE, INSERM, University Rennes, France
- Jacques Nicolas
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
- Yann Audic
- CNRS, University Rennes, Institut de Génétique et Développement de Rennes (IGDR) UMR 6290, Rennes, France
6
Gomes CP, Martins AGC, Nunes SE, Ramos B, Wisinewski HR, Reis JLMS, Lima AP, Aoyagi TY, Goncales I, Maia DS, Tunussi AS, Menossi MS, Pereira SM, Turrini PCG, Gervasio JHDB, Verona BM, Cerize NNP. Coding, Decoding and Retrieving a Message Using DNA: An Experience from a Brazilian Center Research on DNA Data Storage. Micromachines 2024; 15:474. [PMID: 38675287 PMCID: PMC11051810 DOI: 10.3390/mi15040474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
DNA data storage based on synthetic oligonucleotides is a major attraction due to the possibility of storage over long periods. Nowadays, the quantity of data generated has been growing exponentially, and the storage capacity needs to keep pace with the growth caused by new technologies and globalization. Since DNA can hold a large amount of information with a high density and remains stable for hundreds of years, this technology offers a solution for current long-term data centers by reducing energy consumption and physical storage space. Currently, research institutes, technology companies, and universities are making significant efforts to meet the growing need for data storage. DNA data storage is a promising field, especially with the advancement of sequencing techniques and equipment, which now make it possible to read genomes (i.e., to retrieve the information) and process this data easily. To overcome the challenges associated with developing new technologies for DNA data storage, a message encoding and decoding exercise was conducted at a Brazilian research center. The exercise performed consisted of synthesizing oligonucleotides by the phosphoramidite route. An encoded message, using a coding scheme that adheres to DNA sequence constraints, was synthesized. After synthesis, the oligonucleotide was sequenced and decoded, and the information was fully recovered.
Affiliation(s)
- Caio P. Gomes
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- André G. C. Martins
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Sabrina E. Nunes
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Bruno Ramos
- Microfluidic & Photoelectrocatalytic Engineering Group, Department of Chemical Engineering, FEI University Center, São Bernardo do Campo 09850-901, SP, Brazil;
- Henrique R. Wisinewski
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- João L. M. S. Reis
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Ariel P. Lima
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Thiago Y. Aoyagi
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Icaro Goncales
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Danilo S. Maia
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Ariane S. Tunussi
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Marília S. Menossi
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Sergio M. Pereira
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Paula C. G. Turrini
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- João H. D. B. Gervasio
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Bruno M. Verona
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
- Natalia N. P. Cerize
- Bionanomanufacturing Center, Institute for Technological Research—IPT, Sao Paulo 05508-901, SP, Brazil; (A.G.C.M.); (S.E.N.); (H.R.W.); (J.L.M.S.R.); (A.P.L.); (T.Y.A.); (I.G.); (D.S.M.); (A.S.T.); (M.S.M.); (S.M.P.J.); (P.C.G.T.); (B.M.V.); (N.N.P.C.)
7
Wang K, Cao B, Ma T, Zhao Y, Zheng Y, Wang B, Zhou S, Zhang Q. Storing Images in DNA via base128 Encoding. J Chem Inf Model 2024; 64:1719-1729. [PMID: 38385334 DOI: 10.1021/acs.jcim.3c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Current DNA storage schemes lack flexibility and consistency in processing highly redundant and correlated image data, resulting in low sequence stability and image reconstruction rates. Therefore, according to the characteristics of image storage, this paper proposes storing images in DNA via base128 encoding (DNA-base128). In the data writing stage, data segmentation and probability statistics are carried out, and then, the data block frequency and constraint encoding set are associated with achieving encoding. When the image needs to be recovered, DNA-base128 completes internal error correction by threshold setting and drift comparison. Compared with representative work, the DNA-base128 encoding results show that the undesired motifs were reduced by 71.2-90.7% and that the local guanine-cytosine content variance was reduced by 3 times, indicating that DNA-base128 can store images more stably. In addition, the structural similarity index (SSIM) and multiscale structural similarity (MS-SSIM) of image reconstruction using DNA-base128 were improved by 19-102 and 6.6-20.3%, respectively. In summary, DNA-base128 provides image encoding with internal error correction and provides a potential solution for DNA image storage. The data and code are available at the GitHub repository: https://github.com/123456wk/DNA_base128.
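The sequence-stability measures named in this abstract can be made concrete with a small sketch. It computes overall GC content, the variance of GC content across fixed windows, and the longest homopolymer run as one example of an undesired motif. The window size of 10 and the choice of homopolymers as the motif are assumptions made for illustration; the DNA-base128 repository may define these quantities differently.

```python
def gc_content(seq):
    """Fraction of G and C bases in a non-empty DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


def local_gc_variance(seq, window=10):
    """Population variance of GC content over non-overlapping windows.

    Assumes len(seq) >= window; a trailing partial window is ignored.
    """
    values = [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest
```

Under these definitions, a sequence with evenly spread G and C bases yields a local GC variance near zero, which is the direction of improvement the abstract reports.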
Affiliation(s)
- Kun Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Ma
- Brain Function Research Section, China Medical University, Shenyang 110001, China
- Yunzhu Zhao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
8
Lin W, Chu L, Su Y, Xie R, Yao X, Zan X, Xu P, Liu W. Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method. Comput Biol Med 2023; 166:107548. [PMID: 37801922 DOI: 10.1016/j.compbiomed.2023.107548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 08/24/2023] [Accepted: 09/28/2023] [Indexed: 10/08/2023]
Abstract
BACKGROUND In single-stranded DNAs/RNAs, secondary structures are very common especially in long sequences. It has been recognized that the high degree of secondary structures in DNA sequences could interfere with the correct writing and reading of information in DNA storage. However, how to circumvent its side-effect is seldom studied. METHOD As the degree of secondary structures of DNA sequences is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term (BiLSTM)-attention deep learning model to predict the free energy of sequences. RESULTS Our simulation results indicate that the free energy of DNA sequences at a specific length follows a right skewed distribution and the mean increases as the length increases. Given a tolerable free energy threshold of 20 kcal/mol, we could control the ratio of serious secondary structures in the encoding sequences to within 1% of the significant level through selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model could achieve a better prediction performance both in the mean relative error (MRE) and the coefficient of determination (R2). It achieved MRE = 0.109 and R2 = 0.918 respectively in the simulation experiment. The combination of the BiLSTM and attention module can handle the long-term dependencies and capture the feature of base pairing. Further, the prediction has a linear time complexity which is suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of 94 predicted free energy can be screened out on a real dataset. It demonstrates that the proposed model could screen out some highly suspicious sequences which are prone to produce more errors and low sequencing copies.
Affiliation(s)
- Wanmin Lin
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Ling Chu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Yanqing Su
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Ranze Xie
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Xiangyu Yao
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Xiangzhen Zan
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China
- Peng Xu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
- Wenbin Liu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou, Guangdong, China.
9
Cao B, Wang B, Zhang Q. GCNSA: DNA storage encoding with a graph convolutional network and self-attention. iScience 2023; 26:106231. [PMID: 36876131 PMCID: PMC9982308 DOI: 10.1016/j.isci.2023.106231] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/04/2022] [Revised: 01/31/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023] Open
Abstract
DNA Encoding, as a key step in DNA storage, plays an important role in reading and writing accuracy and the storage error rate. However, currently, the encoding efficiency is not high enough and the encoding speed is not fast enough, which limits the performance of DNA storage systems. In this work, a DNA storage encoding system with a graph convolutional network and self-attention (GCNSA) is proposed. The experimental results show that DNA storage code constructed by GCNSA increases by 14.4% on average under the basic constraints, and by 5%-40% under other constraints. The increase of DNA storage codes effectively improves the storage density of 0.7-2.2% in the DNA storage system. The GCNSA predicted more DNA storage codes in less time while ensuring the quality of codes, which lays a foundation for higher read and write efficiency in DNA storage.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
10
Mollon JD, Danilova MV, Zhuravlev AV. A possible mechanism of neural read-out from a molecular engram. Neurobiol Learn Mem 2023; 200:107748. [PMID: 36907505 DOI: 10.1016/j.nlm.2023.107748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 03/04/2023] [Accepted: 03/06/2023] [Indexed: 03/12/2023]
Abstract
What is the physical basis of declarative memory? The predominant view holds that stored information is embedded in the structure of a neural net, that is, in the signs and weights of its synaptic connections. An alternative possibility is that storage and processing are separated, and that the engram is encoded chemically, most probably in the sequence of a nucleic acid. One deterrent to adoption of the latter hypothesis has been the difficulty of envisaging how neural activity could be converted to and from a molecular code. Our purpose here is limited to suggesting how a molecular sequence could be read out from nucleic acid to neural activity by means of nanopores.
Affiliation(s)
- J D Mollon
- Department of Psychology, University of Cambridge, Downing St., Cambridge CB2 3EB, United Kingdom.
- M V Danilova
- Department of Psychology, University of Cambridge, Downing St., Cambridge CB2 3EB, United Kingdom; I.P. Pavlov Institute of Physiology, nab Makarova 6, 199034 St Petersburg, Russian Federation
- A V Zhuravlev
- I.P. Pavlov Institute of Physiology, nab Makarova 6, 199034 St Petersburg, Russian Federation
11
Ezekannagha C, Welzel M, Heider D, Hattab G. DNAsmart: Multiple attribute ranking tool for DNA data storage systems. Comput Struct Biotechnol J 2023; 21:1448-1460. [PMID: 36851917 PMCID: PMC9957737 DOI: 10.1016/j.csbj.2023.02.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/25/2022] [Revised: 02/07/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
With the ever-growing need for data storage capacity, the deoxyribonucleic acid (DNA) molecule is gaining traction as a new storage medium with a larger capacity, higher density, and longer lifespan than conventional storage media. To effectively use DNA for data storage, it is important to understand the different methods of encoding information in DNA and compare their effectiveness. This requires evaluating which decoded DNA sequences carry the most encoded information based on various attributes. However, navigating the field of coding theory requires years of experience and domain expertise. For instance, domain experts rely on various mathematical functions and attributes to score and evaluate their encodings. To enable such analytical tasks, we provide an interactive and visual analytical framework for multi-attribute ranking in DNA storage systems. Our framework follows a three-step view with user-settable parameters. It enables users to find the optimal en-/de-coding approaches by setting different weights and combining multiple attributes. We assess the validity of our work through a task-specific user study with domain experts based on three tasks. Results indicate that all participants completed their tasks successfully in under two minutes, then rated the framework for design choices, perceived usefulness, and intuitiveness. In addition, two real-world use cases are shared and analyzed as direct applications of the proposed tool. DNAsmart enables the ranking of decoded sequences based on multiple attributes. In sum, this work makes the evaluation of en-/de-coding approaches accessible and tractable through visualization and interactivity to solve comparison and ranking tasks.
Affiliation(s)
- Chisom Ezekannagha
- Department of Mathematics and Computer Science, Philipps-Universität, Hans-Meerwein-Str. 6, Marburg D-35043, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, Marburg D-35043, Germany
- Marius Welzel
- Department of Mathematics and Computer Science, Philipps-Universität, Hans-Meerwein-Str. 6, Marburg D-35043, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, Marburg D-35043, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-Universität, Hans-Meerwein-Str. 6, Marburg D-35043, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, Marburg D-35043, Germany
- Georges Hattab
- Department of Mathematics and Computer Science, Philipps-Universität, Hans-Meerwein-Str. 6, Marburg D-35043, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, Marburg D-35043, Germany