Generative design and construction of functional plasmids with a DNA language model

PRODUCTS USED

Genes

ABSTRACT

DNA language models offer a new paradigm for sequence design, yet their ability to generate functional genomic sequences remains underexplored. Plasmids are a good testbed for evaluating the generative potential of DNA language models because they are simple and easy to construct. Here, we develop an end-to-end pipeline for the generative design of Escherichia coli plasmid backbones, spanning large-scale data curation, fine-tuning, sampling, bioinformatic assessment, and candidate selection. A curated plasmid library was assembled from PlasmidScope and Addgene, and PlasmidGPT, a GPT-2-style DNA model, was fine-tuned on these corpora using circular-aware batching and random crops. Generations (1,000 per model) were produced under two prompting strategies: a minimal ATG seed to expose the model's default tendencies, and a GFP cassette to enforce functional context. Of 1,000 generated synthetic plasmids, 16 candidates survived strict filtering and were prioritised for wet-lab validation. Three shortlisted plasmids were synthesised and found to be functional, supporting growth, antibiotic resistance, and GFP expression in E. coli. To our knowledge, these are the first fully AI-generated plasmids to be synthesised and validated in vivo. This work demonstrates that curated fine-tuning and prompt-aware generation enable DNA language models to progress from raw sequence sampling to experimentally testable plasmid designs. The approach offers a foundation for extending DNA design optimisation beyond E. coli, toward broader applications across engineering biology.
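The abstract does not say how "circular-aware batching and random crops" are implemented. The following is a minimal sketch under one plausible reading: because a plasmid is circular, its database sequence begins at an arbitrary linearisation point, so a training window may start at any position and wrap across the end of the string. The function names `circular_random_crop` and `make_batch`, and the crop length, are hypothetical illustrations rather than the authors' code. Tokenisation and padding are omitted.

```python
import random
from typing import List


def circular_random_crop(seq: str, crop_len: int, rng: random.Random) -> str:
    """Return a crop_len-base window of a circular sequence, starting at a random offset.

    Assumption (hypothetical): windows may wrap past the end of the linear string,
    because the plasmid's start coordinate is arbitrary.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    start = rng.randrange(n)  # any position on the circle is a valid start
    if crop_len >= n:
        # Window covers the whole plasmid: return a full rotation instead of repeating bases.
        return seq[start:] + seq[:start]
    # Concatenating the sequence with itself makes every wrap-around window a plain slice.
    doubled = seq + seq
    return doubled[start:start + crop_len]


def make_batch(plasmids: List[str], crop_len: int, batch_size: int,
               rng: random.Random) -> List[str]:
    """Hypothetical batcher: sample plasmids uniformly, then take one circular crop from each."""
    return [circular_random_crop(rng.choice(plasmids), crop_len, rng)
            for _ in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    library = ["ATGCGTACCAGT", "GGGTTTAAACCC"]
    for window in make_batch(library, crop_len=8, batch_size=4, rng=rng):
        print(window)
```

Doubling the string avoids special-casing windows that cross the origin, and sampling a fresh offset every time a plasmid is seen exposes the model to many rotations of the same molecule. Under this assumption, the model is discouraged from learning the arbitrary start coordinate used by the source database.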
