Twist Bioscience HQ
681 Gateway Blvd
South San Francisco, CA 94080
Quantitative profiling of millions of nucleotides reveals sequence-encoded interactions that govern plasmid propagation
ABSTRACT
Plasmids are central to modern biotechnology, especially therapeutic development, yet their propagation in Escherichia coli remains difficult to predict. Expression-induced burden is well understood and can be mitigated. By contrast, little is known about how foreign DNA segments that are not functional in bacteria affect plasmid propagation and stability. Here we developed a pooled, sequencing-based framework that quantitatively profiles millions of bases. It enables high-resolution, large-scale assessment of plasmid fitness and reveals cryptic, sequence-encoded interactions between foreign DNA elements and bacterial hosts. Promoter-like motifs, homology to transcription factor binding sites, and recombination-prone architectures emerge as major determinants of propagation efficiency. Their context-dependent effects show that plasmid behaviour arises from higher-order interactions between parts rather than from isolated elements. Extending this framework, we introduce TRACE, a neural-network model trained on degenerate sequence libraries that predicts plasmid propagation directly from sequence. TRACE generalises across plasmid architectures and can be fine-tuned on experimental datasets to improve predictions of manufacturability and host compatibility. Together, these advances establish a generalisable, data-driven framework for understanding and designing host-aware plasmids, turning plasmid production from an empirical process into one that can be predicted from DNA sequence.