1
Bi K, Xu Q, Lai X, Zhao X, Lu Z. Multi-file dynamic compression method based on classification algorithm in DNA storage. Med Biol Eng Comput 2024; 62:3623-3635. [PMID: 38922373 DOI: 10.1007/s11517-024-03156-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 06/17/2024] [Indexed: 06/27/2024]
Abstract
The exponential growth in data volume has necessitated the adoption of alternative storage solutions, and DNA storage stands out as the most promising solution. However, the exorbitant costs associated with synthesis and sequencing have impeded its development. Pre-compressing the data is recognized as one of the most effective approaches for reducing storage costs. However, different compression methods yield different compression ratios for the same file, and compressing a large number of files with a single method may not achieve the maximum compression ratio. This study proposes a multi-file dynamic compression method based on machine learning classification algorithms that selects the appropriate compression method for each file to minimize the amount of data stored in DNA. First, four different compression methods are applied to the collected files. Subsequently, the optimal compression method for each file is used as the label, and the file type and size are used as features, to train seven machine learning classification algorithms. The results demonstrate that k-nearest neighbor outperforms the other machine learning algorithms on the validation set and test set most of the time, achieving an accuracy of over 85% and showing less volatility. Additionally, the k-nearest neighbor model achieves a compression rate of 30.85%, an improvement of more than 4.5% over the traditional single compression method, resulting in significant cost savings for DNA storage in the range of $0.48 to 3 billion/TB. Compared with the traditional compression method, the multi-file dynamic compression method demonstrates a more significant compression effect when compressing multiple files. Therefore, it can considerably decrease the cost of DNA storage and facilitate the widespread implementation of DNA storage technology.
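The labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name its four compression methods, so three Python standard-library codecs stand in for them, and the in-memory "files" are hypothetical. Each file is compressed with every candidate, the smallest result becomes the label, and file type and size become the features that a classifier such as k-nearest neighbor would be trained on.

```python
import bz2
import lzma
import zlib

# Stand-in compressors: the abstract does not name its four methods,
# so these standard-library codecs are assumptions for illustration.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes under each candidate method."""
    return {name: len(fn(data)) for name, fn in COMPRESSORS.items()}


def best_method(data: bytes) -> str:
    """Label a file with the method that yields the smallest output."""
    sizes = compressed_sizes(data)
    return min(sizes, key=sizes.get)


def training_row(file_type: str, data: bytes) -> tuple:
    """Build one (features, label) pair: (file type, size) -> best method."""
    return (file_type, len(data)), best_method(data)


# Hypothetical in-memory "files" standing in for a collected corpus.
files = {
    "notes.txt": b"DNA storage " * 500,
    "table.csv": b",".join(str(i).encode() for i in range(3000)),
    "blob.bin": bytes((i * 37 + 11) % 256 for i in range(4096)),
}

rows = [training_row(name.rsplit(".", 1)[-1], data) for name, data in files.items()]

# Choosing the best method per file can never be worse than one fixed method.
dynamic_total = sum(min(compressed_sizes(d).values()) for d in files.values())
single_totals = {
    name: sum(len(fn(d)) for d in files.values())
    for name, fn in COMPRESSORS.items()
}
```

Here `dynamic_total` is the lower bound a perfect classifier would reach; a trained model approaches it only as closely as its prediction accuracy allows.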
Affiliation(s)
- Kun Bi
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Qi Xu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
- Xin Lai
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
  - Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Xiangwei Zhao
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
  - Southeast University - Monash University Joint Graduate School, 215123, Suzhou, China
- Zuhong Lu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, 210096, Nanjing, China
2
Xu Q, Ma Y, Lu Z, Bi K. DP-ID: Interleaving and Denoising to Improve the Quality of DNA Storage Image. Interdiscip Sci 2024:10.1007/s12539-024-00671-6. [PMID: 39578306 DOI: 10.1007/s12539-024-00671-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 10/22/2024] [Accepted: 10/23/2024] [Indexed: 11/24/2024]
Abstract
When storing images in DNA, code tables and universal error correction codes can mitigate the effect of base errors to a certain extent. However, they are ineffective against indels (insertion and deletion errors), resulting in a decline in information density and in the quality of the reconstructed image. This paper proposes a novel encoding and decoding method named DP-ID for storing images in DNA that improves information density and the quality of the reconstructed image. First, the image is compressed into bitstreams by a dynamic programming algorithm. Second, the resulting bitstreams are mapped to DNA sequences, which are then interleaved. The reconstructed image is obtained by applying median filtering to remove salt-and-pepper noise. Simulation results show that the image reconstructed by DP-ID at a 5% error rate is better than that reconstructed by other methods at a 1% error rate. This robustness to high error rates is compatible with the unsatisfied biological constraints caused by high information density. Wet experiments show that DP-ID can reconstruct a high-quality image at 5X sequencing depth. The high information density and low sequencing depth significantly reduce the cost of DNA storage, facilitating the large-scale storage of images in DNA.
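Two of the steps named in this abstract, interleaving and median filtering, can be sketched generically. The sketch below is an assumption-laden illustration, not the DP-ID algorithm: it uses a plain block interleaver (transposing equal-length strands, so damage to one stored strand is spread across one position of every original strand) and a 3x3 median filter with edge replication. The function names and sample data are hypothetical.

```python
from statistics import median


def interleave(strands: list) -> list:
    """Block interleaver: output strand j collects position j of every input.

    All input strands must have equal length. Applying the function twice
    restores the original strands, so it also serves as the deinterleaver.
    """
    return ["".join(column) for column in zip(*strands)]


def median_filter(img: list) -> list:
    """3x3 median filter with edge replication on a 2-D list of grey values."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Clamp neighbour coordinates so border pixels reuse edge values.
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


# Hypothetical strands and a flat grey image with one salt and one pepper pixel.
strands = ["ACGT", "TTGA", "CCAG"]
stored = interleave(strands)

noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255  # salt
noisy[1][3] = 0    # pepper
cleaned = median_filter(noisy)
```

Isolated outliers disappear because each 3x3 window still holds a majority of undamaged neighbours; dense noise would need a larger window or a different filter.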
Affiliation(s)
- Qi Xu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Yitong Ma
  - Monash University Joint Graduate School, Southeast University, Suzhou, 215123, China
- Zuhong Lu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
- Kun Bi
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China
3
Cao B, Wang K, Xie L, Zhang J, Zhao Y, Wang B, Zheng P. PELMI: Realize robust DNA image storage under general errors via parity encoding and local mean iteration. Brief Bioinform 2024; 25:bbae463. [PMID: 39288232 PMCID: PMC11407442 DOI: 10.1093/bib/bbae463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 09/01/2024] [Accepted: 09/04/2024] [Indexed: 09/19/2024] [Open Access]
Abstract
DNA molecules as storage media are characterized by high encoding density and low energy consumption, making DNA storage a highly promising storage method. However, DNA storage has shortcomings, especially when storing multimedia data, wherein image reconstruction fails when address errors occur, resulting in complete data loss. Therefore, we propose a parity encoding and local mean iteration (PELMI) scheme to achieve robust DNA storage of images. The proposed parity encoding scheme satisfies the common biochemical constraints of DNA sequences and the undesired motif content. It addresses varying pixel weights at different positions for binary data, thus optimizing the utilization of Reed-Solomon error correction. Then, for lost and erroneous sequences, data supplementation and local mean iteration are employed to enhance robustness. The encoding results show that the undesired motif content is reduced by 23%-50% compared with representative schemes, which improves sequence stability. PELMI achieves image reconstruction under general errors (insertion, deletion, substitution) and enhances DNA sequence quality. In particular, under 1% error, compared with other advanced encoding schemes, the peak signal-to-noise ratio and the multiscale structure similarity address metric were increased by 10%-13% and 46.8%-122%, respectively, and the mean squared error decreased by 113%-127%. This demonstrates that the reconstructed images had better clarity, fidelity, and similarity in structure, texture, and detail. In summary, PELMI ensures robustness and stability of image storage in DNA and achieves relatively high-quality image reconstruction under general errors.
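Two of the reconstruction metrics quoted in this abstract, mean squared error and peak signal-to-noise ratio, follow standard definitions that are easy to state in code. The sketch below assumes 8-bit grey values (peak 255) given as flat lists; the pixel values are made up for illustration, and the multiscale structural similarity metric is not covered.

```python
import math


def mse(original: list, reconstructed: list) -> float:
    """Mean squared error between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original: list, reconstructed: list, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / err)


# Hypothetical pixel values, not data from the paper.
original = [0, 64, 128, 255]
reconstructed = [0, 60, 130, 250]

error = mse(original, reconstructed)
quality_db = psnr(original, reconstructed)
```

Because PSNR is logarithmic in the MSE, a small percentage gain in PSNR can correspond to a much larger relative drop in MSE.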
Affiliation(s)
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning 116024, China
- Kun Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Lei Xie
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Jianxia Zhang
  - School of Intelligent Engineering, Henan Institute of Technology, No. 90, East Hualan Avenue, Hongqi District, Xinxiang, Henan 451191, China
- Yunzhu Zhao
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, No. 10 Xuefu Street, Dalian Economic-Technological Development Zone, Dalian, Liaoning 116622, China
- Pan Zheng
  - Department of Accounting and Information Systems, University of Canterbury, Upper Riccarton, Christchurch 8140, New Zealand
4
Aqeel S, Khan SU, Khan AS, Alharbi M, Shah S, Affendi ME, Ahmad N. DNA encoding schemes herald a new age in cybersecurity for safeguarding digital assets. Sci Rep 2024; 14:13839. [PMID: 38879689 PMCID: PMC11180196 DOI: 10.1038/s41598-024-64419-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 06/09/2024] [Indexed: 06/19/2024] [Open Access]
Abstract
The pressing need to secure and protect digital assets calls for immediate measures to ensure robust security as cyber security evolves. Advanced methods such as encryption schemes remain limited in their ability to constrain attacks. DNA encoding schemes offer synthetic DNA sequences as a promising alternative for encoding digital data, exploiting unique properties of DNA such as stability and durability. This study explores DNA's potential for encoding in evolving cyber security. Based on a systematic literature review, this paper discusses the challenges, advantages, and directions for future work. We analyzed current trends and innovations in methodology, security attacks, the implementation of tools, and evaluation metrics. Various tools, such as Mathematica, MATLAB, the NIST test suite, and CloudSim, were employed to evaluate the performance of the proposed methods and obtain results. By identifying the strengths and limitations of the proposed methods, the study highlights research challenges and offers future scope for investigation.
Affiliation(s)
- Sehrish Aqeel
  - Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Malaysia
- Sajid Ullah Khan
  - Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, AlKharj, Kingdom of Saudi Arabia
  - Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Adnan Shahid Khan
  - Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Malaysia
- Meshal Alharbi
  - Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Sajid Shah
  - EIAS Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Naveed Ahmad
  - College of Computer Information Sciences, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
5
Wang K, Cao B, Ma T, Zhao Y, Zheng Y, Wang B, Zhou S, Zhang Q. Storing Images in DNA via base128 Encoding. J Chem Inf Model 2024; 64:1719-1729. [PMID: 38385334 DOI: 10.1021/acs.jcim.3c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Current DNA storage schemes lack flexibility and consistency in processing highly redundant and correlated image data, resulting in low sequence stability and image reconstruction rates. Therefore, according to the characteristics of image storage, this paper proposes storing images in DNA via base128 encoding (DNA-base128). In the data writing stage, data segmentation and probability statistics are carried out, and the data block frequency and constraint encoding set are then associated to achieve encoding. When the image needs to be recovered, DNA-base128 completes internal error correction by threshold setting and drift comparison. Compared with representative work, the DNA-base128 encoding results show that the undesired motifs were reduced by 71.2-90.7% and that the local guanine-cytosine content variance was reduced by a factor of 3, indicating that DNA-base128 can store images more stably. In addition, the structural similarity index (SSIM) and multiscale structural similarity (MS-SSIM) of image reconstruction using DNA-base128 were improved by 19-102% and 6.6-20.3%, respectively. In summary, DNA-base128 provides image encoding with internal error correction and provides a potential solution for DNA image storage. The data and code are available at the GitHub repository: https://github.com/123456wk/DNA_base128.
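The stability measure mentioned in this abstract, local guanine-cytosine (GC) content variance, can be illustrated with a short sketch. The window size of 10 nt and the use of non-overlapping windows are assumptions made here for illustration; the abstract does not state how the authors define their windows, and the sequences are made up.

```python
from statistics import pvariance


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def local_gc_variance(seq: str, window: int = 10) -> float:
    """Population variance of GC content over non-overlapping windows.

    Window size and non-overlap are illustrative assumptions.
    """
    values = [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]
    return pvariance(values)


# Two hypothetical 20-nt sequences with the same overall GC content (50%).
balanced = "ACGT" * 5
skewed = "GGGGGCCCCC" + "AAAAATTTTT"

balanced_var = local_gc_variance(balanced)
skewed_var = local_gc_variance(skewed)
```

Both sequences have 50% GC overall, yet only the second has extreme local stretches, which is why a local variance is reported in addition to global GC content.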
Affiliation(s)
- Kun Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Ma
  - Brain Function Research Section, China Medical University, Shenyang 110001, China
- Yunzhu Zhao
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Shihua Zhou
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
  - The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
6
Zheng Y, Cao B, Wu J, Wang B, Zhang Q. High Net Information Density DNA Data Storage by the MOPE Encoding Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2992-3000. [PMID: 37015121 DOI: 10.1109/tcbb.2023.3263521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/19/2023]
Abstract
DNA has recently been recognized as an attractive storage medium due to its high reliability, capacity, and durability. However, encoding algorithms that simply map binary data to DNA sequences have the disadvantages of low net information density and high synthesis cost. Therefore, this paper proposes an efficient, feasible, and highly robust encoding algorithm called MOPE (Modified Barnacles Mating Optimizer and Payload Encoding). The Modified Barnacles Mating Optimizer (MBMO) algorithm is used to construct the non-payload coding set, and the Payload Encoding (PE) algorithm is used to encode the payload. The results show that the lower bound of the non-payload coding set constructed by the MBMO algorithm is 3%-18% higher than the best result of previous work, and theoretical analysis shows that the designed PE algorithm has a net information density of 1.90 bits/nt, which is close to the ideal information capacity of 2 bits per nucleotide. The proposed MOPE encoding algorithm, which combines high net information density with constraint satisfaction, can not only effectively reduce the cost of DNA synthesis and sequencing but also reduce the occurrence of errors during DNA storage.
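The density figures in this abstract can be checked with simple arithmetic. The sketch below assumes the common definition of net information density as payload bits divided by total nucleotides synthesized; the 150-nt strand carrying 285 payload bits is a hypothetical example chosen to reproduce the reported 1.90 bits/nt, not a parameter from the paper.

```python
import math

# Four possible bases per position give log2(4) = 2 bits per nucleotide.
IDEAL_BITS_PER_NT = math.log2(4)


def net_information_density(payload_bits: int, total_nt: int) -> float:
    """Payload bits stored per synthesized nucleotide (assumed definition)."""
    return payload_bits / total_nt


# Reported density relative to the ideal capacity.
efficiency = 1.90 / IDEAL_BITS_PER_NT

# Hypothetical strand: 285 payload bits carried by 150 nucleotides in total.
example_density = net_information_density(285, 150)
```

Under these assumptions the reported 1.90 bits/nt corresponds to 95% of the ideal capacity, with the remaining 5% spent on constraint satisfaction and non-payload content.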