Back to Search Start Over

Correcting a Single Indel/Edit for DNA-Based Data Storage: Linear-Time Encoders and Order-Optimality

Authors :
Han Mao Kiah
Yeow Meng Chee
Ryan Gabrys
Kui Cai
Tuan Thanh Nguyen
Source :
IEEE Transactions on Information Theory. 67:3438-3451
Publication Year :
2021
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2021.

Abstract

An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this article, we investigate codes that correct either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with $\lceil {\log \text {n}}\rceil+\text {O}(\log \log \text {n})$ redundancy bits, while the other corrects a single indel with $\lceil {\log \text {n}}\rceil+2$ redundant bits. These two encoders are order-optimal . The former encoder is the first known order-optimal encoder that corrects a single edit, while the latter encoder (that corrects a single indel) reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, we impose an additional constraint: the $\mathtt {GC}$ -balanced constraint and require that exactly half of the symbols of any DNA codeword to be either $\mathtt {C}$ or $\mathtt {G}$ . In particular, via a modification of Knuth’s balancing technique, we provide a linear-time map that translates binary messages into $\mathtt {GC}$ -balanced codewords and the resulting codebook is able to correct a single indel or a single edit. These are the first known constructions of $\mathtt {GC}$ -balanced codes that correct a single indel or a single edit.

Details

ISSN :
15579654 and 00189448
Volume :
67
Database :
OpenAIRE
Journal :
IEEE Transactions on Information Theory
Accession number :
edsair.doi...........81f62ca2f2a8a0e71a99bd0baf8efaff