Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model

Scientific Reports, Jun 2025 | 15 (1), 20920

DOI: 10.1038/s41598-025-05717-3

Lin, Wanmin ; Chu, Ling ; Yao, Xiangyu ; Chen, Zhihua ; Xu, Peng ; Liu, Wenbin

PRODUCTS USED

NGS

ABSTRACT

DNA storage is widely regarded as a promising alternative for storing exponentially growing volumes of data. However, the complex secondary structures that encoding sequences can form severely compromise synthesis, PCR amplification, and sequencing, and thus interfere with reliable information recovery. Circumventing these negative effects is a critical problem in large-scale storage applications. Because secondary structures are formed by contiguous bases with reverse-complementary relations and are accompanied by a release of free energy, we construct a BiLSTM-Transformer model with k-mer embedding to predict the free energy of sequences and then screen out sequences with high values. K-mer embedding captures the characteristics of contiguous base pairings through overlapping short subsequences, which facilitates free-energy prediction. Simulation results show that the BiLSTM-Transformer model with k-mer embedding achieves better prediction performance than other deep learning models. Application to a real dataset demonstrates that the proposed model can screen out the top high-risk sequences, which are prone to more read errors and lower retrieved copy numbers in real DNA storage. Screening for top high-risk sequences can serve as a proactive step to prevent severe secondary structures, providing a solution for reliable information retrieval.
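
To make the "overlapping short subsequences" idea concrete, the following is a minimal sketch of k-mer tokenization, not the authors' implementation. The function names (`kmer_tokens`, `build_vocab`, `encode`), the choice of k = 3, and the lexicographic vocabulary ordering are all assumptions for illustration; the abstract does not state the value of k or how tokens are indexed. The resulting integer IDs are the kind of input an embedding layer in front of a sequence model would typically consume.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1.

    k=3 is an illustrative assumption, not a value taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence of length n yields n - k + 1 overlapping windows.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer ID.

    IDs follow lexicographic order (hypothetical indexing scheme).
    """
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k=3, vocab=None):
    """Convert a sequence into a list of k-mer IDs for an embedding layer."""
    vocab = vocab or build_vocab(k)
    return [vocab[token] for token in kmer_tokens(seq, k)]


tokens = kmer_tokens("ACGTAC", k=3)
ids = encode("ACGTAC", k=3)
print(tokens)
print(ids)
```

Because adjacent tokens share k - 1 bases, each token carries local context about neighboring bases, which is the property the abstract credits with helping the model capture contiguous base pairings.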
