System-wide extraction of cis-regulatory rules from sequence-to-function models in human neural development

PRODUCTS USED

Genes

ABSTRACT

The genomic cis-regulatory code (CRC) underlies the spatiotemporal specificity of gene expression. While sequence-to-function (S2F) models can accurately encode the CRC of transcriptional enhancers, decoding these models into human-interpretable rules remains a major challenge. Here we tackle this challenge in human neural development, for which we generate two new single-cell multiome atlases, one from a human embryo and one from neural tube organoids. We use this comparative framework to robustly extract combinations of transcription factor (TF) binding sites that are necessary and sufficient to design enhancers. In this way, we extract cis-regulatory rules for dorsal-ventral progenitors, neural crest, mesenchyme, and neurons. To enable this, we develop a new strategy and computational package, called TF-MINDI, to embed, cluster, and annotate candidate TF binding sites, and to extract combinatorial rules for each cell type. We evaluate rule-based models in conjunction with black-box S2F models through simulations, evolutionary comparisons with zebrafish, topic modeling, and enhancer reporter assays. Our findings show robust and interpretable rule extraction and constitute a step forward in deciphering, explaining, and formalizing the CRC. TF-MINDI is available at https://github.com/aertslab/TF-MINDI.
