1
Zhang J. Levy Sooty Tern Optimization Algorithm Builds DNA Storage Coding Sets for Random Access. Entropy (Basel) 2024; 26:778. [PMID: 39330111 PMCID: PMC11431215 DOI: 10.3390/e26090778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/02/2024] [Accepted: 09/05/2024] [Indexed: 09/28/2024]
Abstract
DNA molecules, as a storage medium, possess unique advantages. Not only does DNA storage exhibit significantly higher storage density compared to electromagnetic storage media, but it also features low energy consumption and extremely long storage times. However, the integration of DNA storage into daily life remains distant due to challenges such as low storage density, high latency, and inevitable errors during the storage process. Therefore, this paper proposes constructing a DNA storage coding set based on the Levy Sooty Tern Optimization Algorithm (LSTOA) to achieve an efficient random-access DNA storage system. Firstly, addressing the slow iteration speed and susceptibility to local optima of the Sooty Tern Optimization Algorithm (STOA), this paper introduces Levy flight operations and proposes the LSTOA. Secondly, utilizing the LSTOA, this paper constructs a DNA storage encoding set to facilitate random access while meeting combinatorial constraints. To demonstrate the coding performance of the LSTOA, this paper analyzes it on 13 benchmark test functions, on which it shows superior performance. Furthermore, under the same combinatorial constraints, the LSTOA constructs larger DNA storage coding sets, effectively reducing the read-write latency and error rate of DNA storage.
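The abstract above mentions adding Levy flight operations to a population-based optimizer. As a rough illustration only, the sketch below draws Levy-distributed step lengths with Mantegna's algorithm, a commonly used sampler, and uses them to perturb a candidate position. The function names, the `scale` factor, the exponent `beta = 1.5`, and the update rule in `perturb` are assumptions made for this example; the abstract does not specify how the LSTOA applies its Levy step.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=None):
    """Draw one heavy-tailed step length (assumed exponent beta in (0, 2])."""
    rng = rng or random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the rare degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def perturb(position, best, scale=0.01, beta=1.5, rng=None):
    """Hypothetical update: move each coordinate by a Levy step scaled by
    its distance from the current best solution."""
    rng = rng or random.Random()
    return [x + scale * levy_step(beta, rng) * (x - b)
            for x, b in zip(position, best)]
```

Most steps drawn this way are small, with occasional long jumps, which is the property usually cited as helping a search escape local optima.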
Affiliation(s)
- Jianxia Zhang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang 453003, China
- School of Intelligent Engineering, Henan Institute of Technology, Xinxiang 453003, China
2
Kim JW, Jeong J, Kwak HY, No JS. Design of DNA Storage Coding Scheme With LDPC Codes and Interleaving. IEEE Trans Nanobioscience 2024; 23:447-457. [PMID: 38512749 DOI: 10.1109/tnb.2024.3379976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
In this paper, we propose a new coding scheme for DNA storage using low-density parity-check (LDPC) codes and interleaving techniques. While conventional coding schemes generally employ error correcting codes in both inter and intra-oligo directions, we show that inter-oligo LDPC codes, optimized by differential evolution, are sufficient in ensuring the reliability of DNA storage due to the powerful soft decoding of LDPC codes. In addition, we apply interleaving techniques for handling non-uniform error characteristics of DNA storage to enhance the decoding performance. Consequently, the proposed coding scheme reduces the required number of oligo reads for perfect recovery by 26.25% ~ 38.5% compared to existing state-of-the-art coding schemes. Moreover, we develop an analytical DNA channel model in terms of non-uniform binary symmetric channels. This mathematical model allows us to demonstrate the superiority of the proposed coding scheme while isolating the experimental variation, as well as confirm the independent effects of LDPC codes and interleaving techniques.
3
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] Open
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A(∗)STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China.
- Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
4
Zhang X, Zhou F. An Encoding Table Corresponding to ASCII Codes for DNA Data Storage and a New Error Correction Method HMSA. IEEE Trans Nanobioscience 2024; 23:344-354. [PMID: 38252580 DOI: 10.1109/tnb.2024.3356522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
DNA storage stands out from other storage media due to its high capacity, eco-friendliness, long lifespan, high stability, low energy consumption, and low data maintenance costs. To standardize the DNA encoding system, maintain consistency in character representation and transmission, and link binary, base, and character together, this paper combines the encoding method with ASCII code to construct an ASCII-DNA encoding table. The encoding method can encode not only pure text information but also audio and video information, and it satisfies the GC content constraint and the homopolymer constraint, with the encoding density reaching 1.4 bits/nt. In particular, when encoding textual information, it skips the binary conversion process entirely, which reduces the complexity of encoding and increases the encoding density to 1.6 bits/nt. To address errors in sequences, and inspired by heuristic algorithms, this paper proposes a new error correction method (HMSA) that combines minimum Hamming distance, multiple sequence alignment, and the encoding scheme. It can correct not only substitution, insertion, and deletion errors in reads but also consecutive errors in reads. It greatly improves the utilization of the reads and avoids wasting resources. Simulation results show that the recovery rate of reads increases with the number of sequencing runs. When the number of erroneous bases in a 150 nt sequence reaches 5, the error correction rate can exceed 96% by sequencing the base sequence only 10 times, regardless of whether the errors are consecutive. Additionally, the HMSA error correction method is applicable to all coding schemes of the lookup-code-table type.
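The GC content and homopolymer constraints named in this abstract are standard checks on candidate DNA sequences. A minimal sketch of both checks follows; the thresholds (GC fraction between 0.4 and 0.6, runs of at most 3 identical bases) and the function names are illustrative assumptions, since the abstract does not state the values the authors used.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def satisfies_constraints(seq, gc_low=0.4, gc_high=0.6, max_run=3):
    """Check both constraints; all thresholds here are assumed values."""
    return (bool(seq)
            and gc_low <= gc_content(seq) <= gc_high
            and max_homopolymer_run(seq) <= max_run)
```

For example, `satisfies_constraints("ACGTACGT")` passes both checks, while a sequence such as `"AAAAACGT"` fails because of its run of five A bases.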
5
Jeong J, Park H, Kwak HY, No JS, Jeon H, Lee JW, Kim JW. Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding. IEEE Trans Nanobioscience 2024; 23:81-90. [PMID: 37294652 DOI: 10.1109/tnb.2023.3284406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Ever since deoxyribonucleic acid (DNA) was first considered as a next-generation data-storage medium, considerable research effort has gone into correcting errors that occur during the synthesis, storage, and sequencing processes using error correcting codes (ECCs). Previous works on recovering the data from the sequenced DNA pool with errors have utilized hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and the robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method which may be suitable for error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency in the performance evaluation. The proposed soft decoding algorithm yields a 2.3% to 7.0% improvement in read-number reduction compared to the state-of-the-art decoding method, and it is shown to handle erroneous sequenced oligo reads with insertion and deletion errors.
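The soft information mentioned above starts from FASTQ quality scores. The sketch below shows the standard Phred relation, in which a score Q corresponds to an error probability of 10^(-Q/10), together with the common Phred+33 character encoding. The function `base_reliability_llr` is only a generic log-odds that a called base is correct, added here as an assumption for illustration; it is not the LLR formula proposed in the paper, which the abstract does not give.

```python
import math


def decode_fastq_quality(qual, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q):
    """Standard Phred definition: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def base_reliability_llr(q, eps=1e-12):
    """Generic log-odds that the called base is correct (illustrative only).

    The probability is clamped to (eps, 1 - eps) so that Q = 0 does not
    produce log(0).
    """
    p = min(max(phred_to_error_prob(q), eps), 1 - eps)
    return math.log((1 - p) / p)
```

A soft decoder would feed values of this kind into the ECC decoder in place of hard majority-vote decisions, so that low-quality bases carry less weight.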
6
Rasool A, Hong J, Jiang Q, Chen H, Qu Q. BO-DNA: Biologically optimized encoding model for a highly-reliable DNA data storage. Comput Biol Med 2023; 165:107404. [PMID: 37666064 DOI: 10.1016/j.compbiomed.2023.107404] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2023] [Revised: 08/13/2023] [Accepted: 08/26/2023] [Indexed: 09/06/2023]
Abstract
DNA data storage is a promising technology that utilizes computer simulation and synthetic biology, offering high-density and reliable digital information storage. It is challenging to store massive data in a small amount of DNA without losing the original data, since nonspecific hybridization errors occur frequently and severely affect the reliability of stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to overcome the reliability problem. The BO-DNA model is developed using a new rule-based mapping method to avoid data drop during the transcoding of binary data to premier nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps to minimize the nonspecific hybridization errors. The robustness of BO-DNA is assessed under four bio-constraints to confirm the reliability of newly generated DNA sequences. Experimentally, different medical images are encoded and decoded successfully with 12%-59% improved lower bounds, and optimally constrained DNA sequences are reported with an average density of 1.77 bits/nt. BO-DNA's results demonstrate substantial advantages in constructing reliable DNA data storage.
Affiliation(s)
- Abdur Rasool
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Beijing, 100049, China.
- Jingwei Hong
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
- Qingshan Jiang
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Hui Chen
- Shenzhen Polytechnic University, Shenzhen, 518055, Guangdong, China
- Qiang Qu
- Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
7
Mu Z, Cao B, Wang P, Wang B, Zhang Q. RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. IEEE Trans Nanobioscience 2023; 22:912-922. [PMID: 37028365 DOI: 10.1109/tnb.2023.3254514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
8
Zheng Y, Cao B, Wu J, Wang B, Zhang Q. High Net Information Density DNA Data Storage by the MOPE Encoding Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2992-3000. [PMID: 37015121 DOI: 10.1109/tcbb.2023.3263521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/19/2023]
Abstract
DNA has recently been recognized as an attractive storage medium due to its high reliability, capacity, and durability. However, encoding algorithms that simply map binary data to DNA sequences have the disadvantages of low net information density and high synthesis cost. Therefore, this paper proposes an efficient, feasible, and highly robust encoding algorithm called MOPE (Modified Barnacles Mating Optimizer and Payload Encoding). The Modified Barnacles Mating Optimizer (MBMO) algorithm is used to construct the non-payload coding set, and the Payload Encoding (PE) algorithm is used to encode the payload. The results show that the lower bound of the non-payload coding set constructed by the MBMO algorithm is 3%-18% higher than the optimal result of previous work, and theoretical analysis shows that the designed PE algorithm has a net information density of 1.90 bits/nt, which is close to the ideal information capacity of 2 bits per nucleotide. The proposed MOPE encoding algorithm with high net information density and satisfying constraints can not only effectively reduce the cost of DNA synthesis and sequencing but also reduce the occurrence of errors during DNA storage.
9
Mortuza GM, Guerrero J, Llewellyn S, Tobiason MD, Dickinson GD, Hughes WL, Zadegan R, Andersen T. In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). BMC Bioinformatics 2023; 24:160. [PMID: 37085766 PMCID: PMC10120115 DOI: 10.1186/s12859-023-05264-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 03/30/2023] [Indexed: 04/23/2023] Open
Abstract
Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.
Affiliation(s)
- Golam Md Mortuza
- Department of Computer Science, Boise State University, Boise, Idaho USA
- Jorge Guerrero
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC USA
- William L. Hughes
- School of Engineering, Kelowna, University of British Columbia, Kelowna, British Columbia Canada
- Reza Zadegan
- Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC USA
- Tim Andersen
- Department of Computer Science, Boise State University, Boise, Idaho USA
10
Cao B, Wang B, Zhang Q. GCNSA: DNA storage encoding with a graph convolutional network and self-attention. iScience 2023; 26:106231. [PMID: 36876131 PMCID: PMC9982308 DOI: 10.1016/j.isci.2023.106231] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/04/2022] [Revised: 01/31/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023] Open
Abstract
DNA encoding, as a key step in DNA storage, plays an important role in reading and writing accuracy and the storage error rate. However, the encoding efficiency is currently not high enough and the encoding speed is not fast enough, which limits the performance of DNA storage systems. In this work, a DNA storage encoding system with a graph convolutional network and self-attention (GCNSA) is proposed. The experimental results show that the number of DNA storage codes constructed by GCNSA increases by 14.4% on average under the basic constraints, and by 5%-40% under other constraints. The increase in DNA storage codes effectively improves the storage density of the DNA storage system by 0.7-2.2%. The GCNSA predicted more DNA storage codes in less time while ensuring the quality of codes, which lays a foundation for higher read and write efficiency in DNA storage.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
11
FMG: An observable DNA storage coding method based on frequency matrix game graphs. Comput Biol Med 2022; 151:106269. [PMID: 36356390 DOI: 10.1016/j.compbiomed.2022.106269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 10/20/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022]
Abstract
Using complex biomolecules for storage is a new carbon-based storage method. For example, DNA has the potential to be a good medium for long-term archival data storage. Reasonable and efficient coding is the first and most important step in DNA storage. However, current coding methods, such as the altruism algorithm, suffer from low coding efficiency and high complexity, and the coding constraints and sets make it difficult to inspect the coding results visually. In this study, a new DNA storage coding method based on the frequency matrix game graph (FMG) is proposed to generate DNA storage codes satisfying combinatorial constraints. In contrast to the randomness of heuristic algorithms that satisfy the constraints, the coding method based on the FMG is deterministic and can clearly explain the coding process. In addition, the constraints and coding results have observable characteristics, and the size of the coding set is better than in previously published results. For example, when the code length n = 10 and the Hamming distance d = 4, the results obtained by the proposed approach combining chaos game and graph are 24% better than the previous results. The proposed coding scheme successfully constructs high-quality coding sets with less complexity, which effectively promotes the development of carbon-based storage coding.
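To make the code length n and minimum Hamming distance d in this abstract concrete, the sketch below builds a coding set with a simple lexicographic greedy baseline: it scans all length-n words over {A, C, G, T} and keeps a word only if it is at least distance d from every word kept so far. This is not the FMG method, whose details are not given in the abstract, and the function names are hypothetical. The scan enumerates all 4^n words, so it is only practical for small n.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_code_set(n, d, alphabet="ACGT"):
    """Greedy baseline: keep each word that is at distance >= d from all
    previously kept words, scanning in lexicographic order."""
    code = []
    for word in product(alphabet, repeat=n):
        if all(hamming(word, kept) >= d for kept in code):
            code.append(word)
    return ["".join(word) for word in code]
```

The size of the returned set is a lower bound on the largest possible coding set for the given n and d, which is the quantity the abstracts in this list report improvements on. A practical construction would also filter words by further constraints such as GC content.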
12
Zhang J. Levy Equilibrium Optimizer algorithm for the DNA storage code set. PLoS One 2022; 17:e0277139. [PMID: 36395269 PMCID: PMC9671426 DOI: 10.1371/journal.pone.0277139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022] Open
Abstract
The generation of massive data places higher demands on storage technology. DNA storage is a new storage technology that uses the biological macromolecule DNA as an information carrier. Compared with traditional silicon-based storage, DNA storage has the advantages of large capacity, high density, low energy consumption, and high durability. DNA coding aims to store data with as few base sequences as possible and without errors. Coding is a key technology in DNA storage, and its results directly affect the performance of storage and the integrity of data reading and writing. In this paper, a Levy Equilibrium Optimizer (LEO) algorithm is proposed to construct a DNA storage code set that satisfies combinatorial constraints. The performance of the proposed algorithm is tested on 13 benchmark functions, and 4 new global optima are obtained. Under the same constraints, the DNA storage code set is constructed. Compared with previous work, the lower bound of the DNA storage code set is improved by 4-13%.
Affiliation(s)
- Jianxia Zhang
- School of Intelligent Engineering, Henan Institute of Technology, Xinxiang, China
13
Wang S, Zhou S, Yan W. An enhanced whale optimization algorithm for DNA storage encoding. Math Biosci Eng 2022; 19:14142-14172. [PMID: 36654084 DOI: 10.3934/mbe.2022659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/17/2023]
Abstract
Metaheuristic algorithms have the drawback that they are prone to premature convergence to local optima. To overcome this disadvantage in the whale optimization algorithm, we propose an improved selective opposition whale optimization algorithm (ISOWOA) in this paper. Firstly, enhanced quasi-opposition learning (EQOBL) is applied to selectively update the position of the predator, calculate the fitness of the population before and after, and retain optimal individuals as the food source position. Secondly, an improved time-varying update strategy for inertia weight predator position is proposed, and the position update of the food source is completed by this strategy. The performance of the algorithm is analyzed on 23 benchmark functions of CEC 2005 and 15 benchmark functions of CEC 2015 in various dimensions. The superior results are further shown by Wilcoxon's rank sum test and Friedman's nonparametric rank test. Finally, its applicability is demonstrated through applications to the field of biological computing. In this paper, our aim is to enable access to DNA files and to design high-quantity DNA code sets with ISOWOA. The experimental results show that the lower bounds of the multi-constraint storage coding sets implemented in this paper equal or surpass those of previous optimal constructions. The data show that the number of DNA storage codes filtered by ISOWOA increased by 2-18%, which demonstrates the algorithm's reliability in practical optimization tasks.
Affiliation(s)
- Sijie Wang
- Key laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Shihua Zhou
- Key laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Weiqi Yan
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
14
Bencurova E, Shityakov S, Schaack D, Kaltdorf M, Sarukhanyan E, Hilgarth A, Rath C, Montenegro S, Roth G, Lopez D, Dandekar T. Nanocellulose Composites as Smart Devices With Chassis, Light-Directed DNA Storage, Engineered Electronic Properties, and Chip Integration. Front Bioeng Biotechnol 2022; 10:869111. [PMID: 36105598 PMCID: PMC9465592 DOI: 10.3389/fbioe.2022.869111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 06/24/2022] [Indexed: 11/13/2022] Open
Abstract
The rapid development of green and sustainable materials opens up new possibilities in the field of applied research. Such materials include nanocellulose composites that can integrate many components into composites and provide a good chassis for smart devices. In our study, we evaluate four approaches for turning a nanocellulose composite into an information storage or processing device: 1) nanocellulose can be a suitable carrier material and protect information stored in DNA. 2) Nucleotide-processing enzymes (polymerase and exonuclease) can be controlled by light after fusing them with light-gating domains; nucleotide substrate specificity can be changed by mutation or pH change (read-in and read-out of the information). 3) Semiconductors and electronic capabilities can be achieved: we show that nanocellulose is rendered electronic by iodine treatment replacing silicon including microstructures. Nanocellulose semiconductor properties are measured, and the resulting potential including single-electron transistors (SET) and their properties are modeled. Electric current can also be transported by DNA through G-quadruplex DNA molecules; these as well as classical silicon semiconductors can easily be integrated into the nanocellulose composite. 4) To elaborate upon miniaturization and integration for a smart nanocellulose chip device, we demonstrate pH-sensitive dyes in nanocellulose, nanopore creation, and kinase micropatterning on bacterial membranes as well as digital PCR micro-wells. Future application potential includes nano-3D printing and fast molecular processors (e.g., SETs) integrated with DNA storage and conventional electronics. This would also lead to environment-friendly nanocellulose chips for information processing as well as smart nanocellulose composites for biomedical applications and nano-factories.
Affiliation(s)
- Elena Bencurova
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Sergey Shityakov
- Laboratory of Chemoinformatics, Infochemistry Scientific Center, ITMO University, Saint Petersburg, Russia
- Dominik Schaack
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Martin Kaltdorf
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Edita Sarukhanyan
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Alexander Hilgarth
- Aerospace Information Technology, University of Würzburg, Würzburg, Germany
- Christin Rath
- Laboratory for Microarray Copying, Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
- Sergio Montenegro
- Aerospace Information Technology, University of Würzburg, Würzburg, Germany
- Günter Roth
- Laboratory for Microarray Copying, Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
- BioCopy GmbH, Emmendingen, Germany
- Daniel Lopez
- Centro Nacional de Biotecnologia CNB, Universidad Autonoma de Madrid, Madrid, Spain
- Thomas Dandekar
- Functional Genomics and Systems Biology Group, Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
- *Correspondence: Thomas Dandekar
15
Adaptive coding for DNA storage with high storage density and low coverage. NPJ Syst Biol Appl 2022; 8:23. [PMID: 35788589 PMCID: PMC9253015 DOI: 10.1038/s41540-022-00233-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/01/2022] [Accepted: 06/10/2022] [Indexed: 11/09/2022] Open
Abstract
The rapid development of information technology has generated substantial data, which urgently requires new storage media and storage methods. DNA, as a storage medium with high density, high durability, and ultra-long storage time characteristics, is promising as a potential solution. However, DNA storage is still in its infancy and suffers from low space utilization of DNA strands, high read coverage, and poor coding coupling. Therefore, in this work, an adaptive coding DNA storage system is proposed to use different coding schemes for different coding region locations, and the method of adaptively generating coding constraint thresholds is used to optimize at the system level to ensure the efficient operation of each link. Images, videos, and PDF files of size 698 KB were stored in DNA using adaptive coding algorithms. The data were sequenced and losslessly decoded into raw data. Compared with previous work, the DNA storage system implemented by adaptive coding proposed in this paper has high storage density and low read coverage, which promotes the development of carbon-based storage systems.
16
Ezekannagha C, Becker A, Heider D, Hattab G. Design considerations for advancing data storage with synthetic DNA for long-term archiving. Mater Today Bio 2022; 15:100306. [PMID: 35677811 PMCID: PMC9167972 DOI: 10.1016/j.mtbio.2022.100306] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/21/2022] [Revised: 05/05/2022] [Accepted: 05/22/2022] [Indexed: 11/22/2022]
Abstract
Deoxyribonucleic acid (DNA) is increasingly emerging as a serious medium for long-term archival data storage because of its remarkable high-capacity, high-storage-density characteristics and its lasting ability to store data for thousands of years. Various encoding algorithms are generally required to store digital information in DNA and to maintain data integrity. Indeed, since DNA is the information carrier, its performance under different processing and storage conditions significantly impacts the capabilities of the data storage system. Therefore, the design of a DNA storage system must meet specific design considerations to be less error-prone, robust and reliable. In this work, we summarize the general processes and technologies employed when using synthetic DNA as a storage medium. We also share the design considerations for sustainable engineering to include viability. We expect this work to provide insight into how sustainable design can be used to develop an efficient and robust synthetic DNA-based storage system for long-term archiving.
Affiliation(s)
- Chisom Ezekannagha
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Corresponding author.
- Anke Becker
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Karl-von-Frisch-Str. 14, D-35043, Marburg, Germany
- Dominik Heider
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
- Georges Hattab
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Str. 6, D-35043, Marburg, Germany
|
17
|
Liu X, Zhang Q, Zhang X, Liu Y, Yao Y, Kasabov N. Construction of Multiple Logic Circuits Based on Allosteric DNAzymes. Biomolecules 2022; 12:495. [PMID: 35454084 PMCID: PMC9032175 DOI: 10.3390/biom12040495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 03/20/2022] [Accepted: 03/21/2022] [Indexed: 11/22/2022]
Abstract
In DNA computing, the implementation of complex and stable logic operations in a universal system is a critical challenge. It is necessary to develop a system with complex logic functions based on a simple mechanism. Here, a strategy of controlling the secondary structure of the assembled DNAzymes' conserved domain is adopted to regulate the activity of the DNAzymes and avoid the generation of four-way junctions, which makes it possible to implement basic logic gates and their cascade circuits in the same system. In addition, threshold control achieved through the allosteric secondary structure is used to implement a three-input DNA voter with a one-vote veto function. The scalability of the system can be remarkably improved by adjusting the threshold to implement a DNA voter with 2n + 1 inputs. The proposed strategy provides a feasible idea for constructing more complex DNA circuits and a highly integrated computing system.
Affiliation(s)
- Xin Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Correspondence: Tel.: +86-0411-84708470
- Xun Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yao Yao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Nikola Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Intelligent Systems Research Center, Ulster University, Londonderry BT52 1SA, UK
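The voter behavior described in the abstract above (a majority decision over 2n + 1 inputs with a one-vote veto) can be illustrated in software. This is only a minimal sketch of the logic, not of the DNAzyme chemistry. The function name `vote`, the `veto_index` parameter, and the assumption that a "no" on the veto input overrides the majority are all hypothetical and not taken from the paper.

```python
def vote(inputs, veto_index=None):
    """Majority vote over an odd number of boolean inputs.

    Assumption (not from the paper): if veto_index is given and that
    input is false, the result is false regardless of the majority.
    """
    if len(inputs) % 2 == 0:
        raise ValueError("expected an odd number of inputs (2n + 1)")
    if veto_index is not None and not inputs[veto_index]:
        return False  # the veto holder rejected the proposal
    threshold = len(inputs) // 2 + 1  # n + 1 votes out of 2n + 1
    return sum(bool(x) for x in inputs) >= threshold


# Three-input voter: two of three approve.
print(vote([1, 1, 0]))
# Same majority, but input 0 holds the veto and votes no.
print(vote([0, 1, 1], veto_index=0))
```

Scaling to more inputs only changes the length of `inputs`, which mirrors the abstract's point that the threshold is the adjustable quantity.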
|
18
|
Fu H, Lv H, Zhang Q. Using entropy-driven amplifier circuit response to build nonlinear model under the influence of Lévy jump. BMC Bioinformatics 2022; 22:437. [PMID: 35057730 PMCID: PMC8772049 DOI: 10.1186/s12859-021-04331-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/05/2021] [Accepted: 08/23/2021] [Indexed: 02/06/2023]
Abstract
Background: Bioinformatics is a subject produced by the combination of life science and computer science. It mainly uses computer technology to study the laws of biological systems. The design and realization of DNA circuit reactions is one of the important topics in bioinformatics. Results: In this paper, a nonlinear dynamic system model with Lévy jump based on the entropy-driven amplifier (EDA) circuit response is studied. Firstly, a nonlinear biochemical reaction system model is established based on the EDA circuit response. Considering the influence of disturbance factors on the system, a nonlinear biochemical reaction system with Lévy jump is built. Secondly, to show that the constructed system is meaningful in practice, the existence and uniqueness of the system solution are analyzed. Next, sufficient conditions for the end and continuation of the EDA circuit reaction are established. Finally, the correctness of the theoretical results is confirmed by numerical simulation, and the reactivity of THTSignal in the EDA circuit under different noise intensities is verified. Conclusions: In the EDA circuit reaction, the intensity of external noise has a significant impact on the system. The end of the EDA circuit reaction is closely related to the intensity of Lévy noise, and Lévy jump has a significant impact on the nature of the biochemical reaction system.
|
19
|
Zhang P, Wei Z, Che C, Jin B. DeepMGT-DTI: Transformer network incorporating multilayer graph information for Drug-Target interaction prediction. Comput Biol Med 2022; 142:105214. [PMID: 35030496 DOI: 10.1016/j.compbiomed.2022.105214] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/01/2021] [Revised: 12/26/2021] [Accepted: 01/02/2022] [Indexed: 12/29/2022]
Abstract
Drug-target interaction (DTI) prediction reduces the cost and time of drug development, and plays a vital role in drug discovery. However, most research does not fully explore the molecular structures of drug compounds in DTI prediction. To this end, we propose a deep learning model to capture the molecular structure information of drug compounds for DTI prediction. This model utilizes a transformer network incorporating multilayer graph information, which captures the features of a drug's molecular structure so that the interactions between atoms of drug compounds can be explored more deeply. At the same time, a convolutional neural network is employed to capture the local residue information in the target sequence, and effectively extract the feature information of the target. The experiments on the DrugBank dataset showed that the proposed model outperformed previous models based on the structure of target sequences. The results indicate that the improved transformer network fuses the feature information between layers in the graph convolutional neural network and extracts the interaction data for the molecular structure. The drug repositioning experiment on COVID-19 and Alzheimer's disease demonstrated the proposed model's ability to find therapeutic drugs in drug discovery. The code of our model is available at https://github.com/zhangpl109/DeepMGT-DTI.
Affiliation(s)
- Peiliang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, 116622, China.
- Ziqi Wei
- School of Software, Tsinghua University, Beijing, 100084, China.
- Chao Che
- Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, 116622, China.
- Bo Jin
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, 116024, China.
|
20
|
Xing C, Zheng X, Zhang Q. Constructing DNA logic circuits based on the toehold preemption mechanism. RSC Adv 2021; 12:338-345. [PMID: 35424506 PMCID: PMC8978688 DOI: 10.1039/d1ra08687a] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/28/2021] [Accepted: 12/14/2021] [Indexed: 11/21/2022]
Abstract
Strand displacement technology and ribozyme digestion technology have enriched the intelligent toolbox of molecular computing and provided more methods for the construction of DNA logic circuits. In recent years, DNA logic circuits have developed rapidly, and their scalability and accuracy in molecular computing and information processing have been fully demonstrated. However, existing DNA logic circuits still have some problems such as high complexity of DNA strands (number of DNA strands) hindering the expansion of practical computing tasks. In view of the above problems, we presented a toehold preemption mechanism and applied it to construct DNA logic circuits using E6-type DNAzymes, such as half adder circuit, half subtractor circuit, and 4-bit square root logic circuit. Different from the dual-track logic expressions, all the signals in the circuits of this study were monorail which substantially reduced the number of DNA strands in the DNA logic circuits. The presented preemption mechanism provides a way to simplify the implementation of large and complex DNA integrated circuits.
Affiliation(s)
- Cuicui Xing
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education Dalian 116622 China
- Xuedong Zheng
- College of Computer Science, Shenyang Aerospace University Shenyang 110136 China
- Qiang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education Dalian 116622 China
- School of Computer Science and Technology, Dalian University of Technology Dalian 116024 China
|
21
|
Xu S, Liu Y, Zhou S, Zhang Q, Kasabov NK. DNA Matrix Operation Based on the Mechanism of the DNAzyme Binding to Auxiliary Strands to Cleave the Substrate. Biomolecules 2021; 11:1797. [PMID: 34944442 PMCID: PMC8698824 DOI: 10.3390/biom11121797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2021] [Revised: 11/21/2021] [Accepted: 11/27/2021] [Indexed: 11/16/2022]
Abstract
Numerical computation is a focus of DNA computing, and matrix operations are among the most basic and frequently used operations in numerical computation. As an important computing tool, matrix operations are often used to deal with intensive computing tasks. During calculation, the speed and accuracy of matrix operations directly affect the performance of the entire computing system. Therefore, it is important to find a way to perform matrix calculations that can ensure the speed of calculations and improve the accuracy. This paper proposes a DNA matrix operation method based on the mechanism of the DNAzyme binding to auxiliary strands to cleave the substrate. In this mechanism, the DNAzyme binding substrate requires the connection of two auxiliary strands. Without any of the two auxiliary strands, the DNAzyme does not cleave the substrate. Based on this mechanism, the multiplication operation of two matrices is realized; the two types of auxiliary strands are used as elements of the two matrices, to participate in the operation, and then are combined with the DNAzyme to cut the substrate and output the result of the matrix operation. This research provides a new method of matrix operations and provides ideas for more complex computing systems.
Affiliation(s)
- Shaoxia Xu
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
- Yuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Shihua Zhou
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
- Qiang Zhang
- Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Dalian 116622, China
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Nikola K. Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1010, New Zealand
- Intelligent Systems Research Center, Ulster University, Londonderry BT52 1SA, UK
|
22
|
RDFNet: A Fast Caries Detection Method Incorporating Transformer Mechanism. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:9773917. [PMID: 34804198 PMCID: PMC8598360 DOI: 10.1155/2021/9773917] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/20/2021] [Accepted: 10/25/2021] [Indexed: 11/17/2022]
Abstract
Dental caries is a prevalent disease of the human oral cavity. Given the lack of research on digital images for caries detection, we construct a caries detection dataset based on the caries images annotated by professional dentists and propose RDFNet, a fast caries detection method for the requirement of detecting caries on portable devices. The method incorporates the transformer mechanism in the backbone network for feature extraction, which improves the accuracy of caries detection and uses the FReLU activation function for activating visual-spatial information to improve the speed of caries detection. The experimental results on the image dataset constructed in this study show that the accuracy and speed of the method for caries detection are improved compared with the existing methods, achieving a good balance in accuracy and speed of caries detection, which can be applied to smart portable devices to facilitate human dental health management.
|
23
|
Wu J, Zheng Y, Wang B, Zhang Q. Enhancing Physical and Thermodynamic Properties of DNA Storage Sets with End-constraint. IEEE Trans Nanobioscience 2021; 21:184-193. [PMID: 34662278 DOI: 10.1109/tnb.2021.3121278] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Abstract
With the explosion of data, DNA is considered an ideal carrier for storage due to its high storage density. However, low-quality DNA sets hamper the widespread use of DNA storage. This work proposes a new method to design high-quality DNA storage sets. Firstly, random switch and double-weight offspring strategies are introduced in the Double-strategy Black Widow Optimization Algorithm (DBWO). Experimental results on 26 benchmark functions show that the exploration and exploitation abilities of DBWO are greatly improved over previous work. Secondly, DBWO is applied to designing DNA storage sets, and compared with previous work, the lower bounds of storage sets are boosted by 9%-37%. Finally, to improve the poor stability of sequences, the End-constraint is proposed for designing DNA storage sets. Measurements of the number of hairpin structures, melting temperature, and minimum free energy show that, with this constraint, DBWO not only constructs a larger number of storage sets but also enhances the physical and thermodynamic properties of DNA storage sets.
|
24
|
Shi Y, Hu Y, Wang B. Image Encryption Scheme Based on Multiscale Block Compressed Sensing and Markov Model. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1297. [PMID: 34682021 PMCID: PMC8534541 DOI: 10.3390/e23101297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/01/2021] [Revised: 09/23/2021] [Accepted: 09/26/2021] [Indexed: 11/25/2022]
Abstract
Many image encryption schemes based on compressed sensing have the problem of poor quality of decrypted images. To deal with this problem, this paper develops an image encryption scheme by multiscale block compressed sensing. The image is decomposed by a three-level wavelet transform, and the sampling rates of coefficient matrices at all levels are calculated according to multiscale block compressed sensing theory and the given compression ratio. The first round of permutation is performed on the internal elements of the coefficient matrices at all levels. Then the coefficient matrix is compressed and combined. The second round of permutation is performed on the combined matrix based on the state transition matrix. Independent diffusion and forward-backward diffusion between pixels are used to obtain the final cipher image. Different sampling rates are set by considering the difference of information between an image's low- and high-frequency parts. Therefore, the reconstruction quality of the decrypted image is better than that of other schemes, which set one sampling rate on an entire image. The proposed scheme takes full advantage of the randomness of the Markov model and shows an excellent encryption effect to resist various attacks.
Affiliation(s)
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
|
25
|
Wu J, Zhang S, Zhang T, Liu Y. HD-Code: End-to-End High Density Code for DNA Storage. IEEE Trans Nanobioscience 2021; 20:455-463. [PMID: 34343096 DOI: 10.1109/tnb.2021.3102122] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
With the rapid development of digital information techniques, the use of DNA media for information storage is considered the future direction of data storage. Existing DNA storage schemes simply map compressed binary multimedia data into DNA base data, which has the disadvantages of data loss, low logical storage density and high cost of synthesis. This paper presents an end-to-end high density DNA encoding algorithm (referred to as HD-code, where HD stands for high density). The novelty and contributions of this work comprise three parts. First, by taking full advantage of the statistical characteristics of the original multimedia data and considering the biological constraints on the DNA bases, the proposed scheme achieves higher logical storage density and improves the flexibility and consistency in data storage. Second, by performing data conversion, the proposed scheme can effectively encode extreme images with a large proportion of a single color. Third, the proposed method can reconstruct high quality images and reduce synthesis costs by yielding a better rate-PSNR (Peak Signal to Noise Ratio).
|
26
|
Xiaoru L, Ling G. Combinatorial constraint coding based on the EORS algorithm in DNA storage. PLoS One 2021; 16:e0255376. [PMID: 34324571 PMCID: PMC8320985 DOI: 10.1371/journal.pone.0255376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Accepted: 07/15/2021] [Indexed: 11/19/2022]
Abstract
The development of information technology has produced massive amounts of data, which has brought severe challenges to information storage. Traditional electronic storage media cannot keep up with the ever-increasing demand for data storage, but in its place DNA has emerged as a feasible storage medium with high density, large storage capacity and strong durability. In DNA data storage, many different approaches can be used to encode data into codewords. DNA coding is a key step in DNA storage and can directly affect storage performance and data integrity. However, since errors are prone to occur in DNA synthesis and sequencing, and non-specific hybridization is prone to occur in the solution, how to effectively encode DNA has become an urgent problem to be solved. In this article, we propose a DNA storage coding method based on the equilibrium optimization random search (EORS) algorithm, which meets the Hamming distance, GC content and no-runlength constraints and can reduce the error rate in storage. Simulation experiments have shown that the size of the DNA storage code set constructed by the EORS algorithm that meets the combination constraints has increased by an average of 11% compared with previous work. The increase in the code set means that shorter DNA chains can be used to store more data.
Affiliation(s)
- Li Xiaoru
- Hulunbeier Vocational and Technical College, Hulunbeier, Inner Mongolia, China
- Guo Ling
- Baidu Co., Ltd., Shanghai, China
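The three combinatorial constraints named in the abstract above (Hamming distance, GC content, and no-runlength) can be made concrete with a short sketch. The code below is a plain greedy enumeration in lexicographic order, used here only to illustrate the constraints; it is not the EORS algorithm. The helper names, and the choice to define the GC constraint as a GC count of exactly `len(seq) // 2`, are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_ok(seq):
    """Assumed GC-content constraint: exactly floor(n / 2) bases are G or C."""
    return sum(c in "GC" for c in seq) == len(seq) // 2


def no_runlength(seq):
    """No two adjacent bases are identical."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def greedy_code_set(n, d):
    """Greedily collect length-n sequences meeting all three constraints,
    with pairwise Hamming distance of at least d."""
    code = []
    for bases in product("ACGT", repeat=n):
        s = "".join(bases)
        if gc_ok(s) and no_runlength(s) and all(hamming(s, c) >= d for c in code):
            code.append(s)
    return code


codes = greedy_code_set(4, 3)
print(len(codes), codes[:3])
```

The size of the set returned for a given `(n, d)` is a lower bound on the achievable code size, which is the quantity that search heuristics such as the one in the abstract try to raise.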
|
27
|
Yuan Y, Lv H, Zhang Q. DNA strand displacement reactions to accomplish a two-degree-of-freedom PID controller and its application in subtraction gate. IEEE Trans Nanobioscience 2021; 20:554-564. [PMID: 34161242 DOI: 10.1109/tnb.2021.3091685] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Synthesis control circuits can be used to effectively control biochemical molecule processes. In controller design based on chemical reaction networks (CRNs), generally only set-point tracking is considered. However, the influence of disturbances, which are frequently encountered in biochemical systems, is often neglected, thus weakening the control effect of the system. In this article, both tracking the set-point input and suppressing the disturbance input are considered in the control effect. Firstly, CRNs are adopted to construct, for the first time, a two-degree-of-freedom PID controller by combining a one-degree-of-freedom PID controller with a feedforward controller. Then, CRN expressions of the two input functions (step function and ramp function) used as input signals are defined. Furthermore, the two-degree-of-freedom PID controller is implemented with DNA strand displacement (DSD) reaction networks, because DNA is an ideal engineering material for building molecular devices based on CRNs. The overshoot of the two-degree-of-freedom PID control system is significantly reduced compared to the one-degree-of-freedom PID control system. Finally, a leak reaction is treated as an extraneous disturbance input to a subtraction gate. The influence of the external disturbance is handled by the two-degree-of-freedom PID controller. It is worth noting that the two-degree-of-freedom subtraction gate control system better restrains the impact of a disturbance input (leak reaction).
|
28
|
Constrained transformer network for ECG signal processing and arrhythmia classification. BMC Med Inform Decis Mak 2021; 21:184. [PMID: 34107920 PMCID: PMC8191107 DOI: 10.1186/s12911-021-01546-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/27/2021] [Accepted: 05/25/2021] [Indexed: 11/10/2022]
Abstract
Background: Heart disease diagnosis is a challenging task, and it is important to explore useful information from the massive amount of electrocardiogram (ECG) records of patients. The high-precision diagnostic identification of ECG can save clinicians and cardiologists considerable time while helping reduce the possibility of misdiagnosis. Currently, some deep learning-based methods can effectively perform feature selection and classification prediction, reducing the consumption of manpower. Methods: In this work, an end-to-end deep learning framework based on a convolutional neural network (CNN) is proposed for ECG signal processing and arrhythmia classification. In the framework, a transformer network is embedded in the CNN to capture the temporal information of ECG signals, and a new link constraint is introduced to the loss function to enhance the classification ability of the embedding vector. Results: To evaluate the proposed method, extensive experiments based on real-world data were conducted. Experimental results show that the proposed model achieves better performance than most baselines. The experimental results also showed that the transformer network pays more attention to the temporal continuity of the data and captures the hidden deep features of the data well. The link constraint strengthens the constraint on the embedded features and effectively suppresses the effect of data imbalance on the results. Conclusions: In this paper, an end-to-end model is used to process ECG signals and classify arrhythmia. The model combines a CNN and a transformer network to extract temporal information from ECG signals and is capable of performing arrhythmia classification with acceptable accuracy. The model can help cardiologists perform assisted diagnosis of heart disease and improve the efficiency of healthcare delivery.
|
29
|
Li X, Wei Z, Wang B, Song T. Stable DNA Sequence Over Close-Ending and Pairing Sequences Constraint. Front Genet 2021; 12:644484. [PMID: 34079580 PMCID: PMC8165483 DOI: 10.3389/fgene.2021.644484] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Accepted: 04/12/2021] [Indexed: 11/15/2022]
Abstract
DNA computing is a new method based on molecular biotechnology to solve complex problems. The design of DNA sequences is a multi-objective optimization problem in DNA computing, whose objective is to obtain optimized sequences that satisfy multiple constraints to improve the quality of the sequences. However, previously optimized DNA sequences reacted with each other, which reduced the number of DNA sequences that could be used for molecular hybridization in the solution and thus reduced the accuracy of DNA computing. In addition, a DNA sequence and its complement follow the principle of complementary pairing, and a sequence with the bases G and C at both ends is more stable. To address the above problems, the Pairing Sequences Constraint (PSC) and the Close-ending constraint, along with the Improved Chaos Whale (ICW) optimization algorithm, were proposed to construct a DNA sequence set that satisfies the combination of constraints. The ICW optimization algorithm incorporates a new predator–prey strategy and sine and cosine functions under the action of chaos. Compared with other algorithms, among the 23 benchmark functions, the new algorithm obtained the minimum value for one-third of the functions and two-thirds of the current minimum value. The DNA sequences satisfying the constraint combination obtained the minimum of fitness values and had stable and usable structures.
Affiliation(s)
- Xue Li
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Ziqi Wei
- School of Software, Tsinghua University, Beijing, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Tao Song
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao, China
|
30
|
Zheng Y, Wu J, Wang B. CLGBO: An Algorithm for Constructing Highly Robust Coding Sets for DNA Storage. Front Genet 2021; 12:644945. [PMID: 34017354 PMCID: PMC8129200 DOI: 10.3389/fgene.2021.644945] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Accepted: 04/08/2021] [Indexed: 11/22/2022]
Abstract
In the era of big data, new storage media are urgently needed because the storage capacity for global data cannot meet the exponential growth of information. Deoxyribonucleic acid (DNA) storage, where primer and address sequences play a crucial role, is one of the most promising storage media because of its high density, large capacity and durability. In this study, we describe an enhanced gradient-based optimizer that includes the Cauchy and Levy mutation strategy (CLGBO) to construct DNA coding sets, which are used as primer and address libraries. Our experimental results show that the lower bounds of DNA storage coding sets obtained using the CLGBO algorithm are increased by 4.3–13.5% compared with previous work. The non-adjacent subsequence constraint was introduced to reduce the error rate in the storage process. This helps to resolve the problem that arises when consecutive repetitive subsequences in the sequence cause errors in DNA storage. We made use of the CLGBO algorithm and the non-adjacent subsequence constraint to construct larger and more highly robust coding sets.
Affiliation(s)
- Yanfen Zheng
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Jieqiong Wu
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
- Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian, China
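The Cauchy and Levy mutation strategy mentioned in the abstract above perturbs candidate solutions with heavy-tailed random steps, so that occasional long jumps help a search leave local optima. The following is a minimal sketch of such steps, assuming Mantegna's method for drawing Levy-distributed steps and inverse-CDF sampling for the standard Cauchy distribution. The function names, the `scale` parameter, and the exponent `beta = 1.5` are illustrative assumptions; the paper's actual update rule is not given in the abstract.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's Levy-step method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step: u / |v|^(1/beta), u and v Gaussian."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cauchy_step(rng):
    """One standard Cauchy sample via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def mutate(position, rng, scale=0.01, use_levy=True):
    """Perturb every coordinate with a scaled heavy-tailed step
    (hypothetical update rule, for illustration only)."""
    step = levy_step if use_levy else cauchy_step
    return [x + scale * step(rng) for x in position]


rng = random.Random(0)
print(mutate([0.0, 1.0], rng))
print(mutate([0.0, 1.0], rng, use_levy=False))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching the global random state.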
|
31
|
Wang L, Wang S. HUIL-TN & HUI-TN: Mining high utility itemsets based on pattern-growth. PLoS One 2021; 16:e0248349. [PMID: 33711048 PMCID: PMC7954358 DOI: 10.1371/journal.pone.0248349] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Accepted: 02/24/2021] [Indexed: 11/19/2022]
Abstract
In recent years, high utility itemset (HUI) mining has been an active research topic in data mining. In this study, we propose two efficient pattern-growth based HUI mining algorithms, called High Utility Itemset based on Length and Tail-Node tree (HUIL-TN) and High Utility Itemset based on Tail-Node tree (HUI-TN). These two algorithms avoid the time-consuming candidate generation stage and the need to scan the original dataset multiple times for exact utility values. A novel tree structure, named the tail-node tree (TN-tree), is proposed as a key element of our algorithms to maintain complete utility information of existing itemsets of a dataset. The performance of HUIL-TN and HUI-TN was evaluated against state-of-the-art reference methods on various datasets. Experimental results showed that our algorithms exceed or come close to the best performance on all datasets in terms of running time, while other algorithms excel only on certain types of dataset. Scalability tests were also performed, and our algorithms obtained the flattest curves among all competitors.
Collapse
Affiliation(s)
- Le Wang
- College of Digital Technology and Engineering, Ningbo University of Finance and Economics, Ningbo, Zhejiang, China
| | - Shui Wang
- College of Digital Technology and Engineering, Ningbo University of Finance and Economics, Ningbo, Zhejiang, China
| |
|
32
|
Chen C, Wu R, Wang B. Development of a neuron model based on DNAzyme regulation. RSC Adv 2021; 11:9985-9994. [PMID: 35423534 PMCID: PMC8695483 DOI: 10.1039/d0ra10515e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/14/2020] [Accepted: 03/02/2021] [Indexed: 12/25/2022]
Abstract
Neural networks based on DNA molecular circuits play an important role in molecular information processing and artificial intelligence systems. In fact, some DNA molecular systems can become dynamic units with the assistance of DNAzymes. Complex DNA circuits can spontaneously induce corresponding feedback behaviors when their inputs change. However, most of the reported DNA neural networks have been implemented by the toehold-mediated strand displacement (TMSD) method. Therefore, it was important to develop a method to build a neural network utilizing the TMSD mechanism and adding a mechanism to account for modulation by DNAzymes. In this study, we designed a model of a DNA neuron controlled by DNAzymes. We proposed an approach based on the DNAzyme modulation of neuronal function, combining two reaction mechanisms: DNAzyme digestion and TMSD. Using the DNAzyme adjustment, each component simulating the characteristics of neurons was constructed. By altering the input and weight of the neuron model, we verified the correctness of the computational function of the neurons. Furthermore, in order to verify the application potential of the neurons in specific functions, a voting machine was successfully implemented. The proposed neuron model regulated by DNAzymes is simple to construct and possesses strong scalability, giving it great potential for use in the construction of large neural networks.
Affiliation(s)
- Cong Chen
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Ranfeng Wu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| |
|