Twist Bioscience
June 5, 2023
14 min read

A conversation with the experts: Designing sgRNA Libraries for High-throughput CRISPR Screening

Digital rendering of the CRISPR-Cas9 complex binding to a DNA strand.

The following is an excerpt from an E-book that Twist developed for Genetic Engineering News: The Changing Landscape of CRISPR Screening: A guidebook to the latest CRISPR screening methodologies and technologies. You can download the full E-book here.

Even for seasoned researchers, designing an sgRNA library can be a daunting task. Each guide in a library of thousands requires careful consideration to ensure fidelity for its target gene. And accuracy doesn’t always equate to effectiveness: approximately 20% of CRISPR mutagenesis leads to in-frame mutations, many of which are unlikely to affect protein function [1]. In the decade since CRISPR was first applied to gene editing in mammalian cells, guidelines for designing sgRNA libraries have continually evolved in response to new insights and new technologies. With this evolution has come a growing need for custom CRISPR sgRNA libraries, tailored to maximize efficiency while taking full advantage of CRISPR screening’s discovery power.

Here, we sit down with two of Twist’s CRISPR experts to discuss the basics of sgRNA library design and what tips they have for anyone considering a custom approach. Julian Jude, Ph.D., is a renowned CRISPR expert and co-author of the Vienna Bioactivity CRISPR (VBC) algorithm for scoring sgRNA design [1]. Julian has been helping researchers improve CRISPR screening for the better part of a decade. With Julian is Twist Bioscience’s Chief Technology Officer Siyuan Chen, Ph.D.

Where should researchers start when designing an sgRNA library?

SIYUAN: Designing the best sgRNA library for your experiment will depend in part on your screening goals. The perfect sgRNA for a CRISPRi screen will be different from the perfect sgRNA for a classic knockout screen, and should be inserted into a backbone that is designed to match your method of delivery. Therefore it’s important to start from a position that builds on generalized rules, but ultimately considers your specific needs.

JULIAN: As you said, a good sgRNA library is carefully designed to address specific needs. But there are some generalities that should be considered. Many excellent resources are available to help interested researchers catch up on the base rules for sgRNA design. Briefly, sgRNAs should have minimal complementarity with non-target DNA sequences, have between 40% and 80% GC content, and the spacer RNA portion of the sgRNA should extend approximately 17 to 24 nucleotides in length (the exact length will depend on the associated Cas protein you choose to use).
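As a rough illustration of the general rules Julian lists, the GC content and spacer length checks could be sketched in Python as follows. The function names and example sequences are made up for illustration, and the thresholds are simply the 40% to 80% and 17 to 24 nucleotide ranges quoted above. This is a minimal sketch, not the implementation of any particular design tool.

```python
def gc_fraction(spacer: str) -> float:
    """Return the fraction of G and C bases in a spacer sequence."""
    spacer = spacer.upper()
    if not spacer:
        raise ValueError("spacer must not be empty")
    return sum(base in "GC" for base in spacer) / len(spacer)


def passes_basic_rules(spacer: str,
                       min_len: int = 17, max_len: int = 24,
                       min_gc: float = 0.40, max_gc: float = 0.80) -> bool:
    """Check spacer length and GC content against the ranges quoted in the text."""
    if not min_len <= len(spacer) <= max_len:
        return False
    return min_gc <= gc_fraction(spacer) <= max_gc


# Hypothetical candidate spacers, for illustration only.
candidates = ["GACGTTAGCCTAGGATCCAG", "ATATATATTTAAATATATAT", "GGC"]
for spacer in candidates:
    print(spacer, passes_basic_rules(spacer))
```

The right length range depends on the Cas protein, so in practice `min_len` and `max_len` would be narrowed to match the nuclease you choose.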

Identifying the appropriate CRISPR-associated (Cas) protein to use is a critical step. At this stage, there are many different types of nucleases that have been developed for genomic screens. Picking the one that’s right for you requires an analysis of available PAM sequences near your target genes and consideration of what type of perturbation you need the Cas protein to perform. If you need guidance on this, I’d suggest reading John Doench’s 2017 review of CRISPR technologies [2], which includes a nice overview of the various Cas proteins, their functions, and their associated PAM sequences.
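To make the idea of analyzing available PAM sequences concrete, here is a small Python sketch that scans both strands of a target sequence for candidate spacers. It assumes an SpCas9-style NGG PAM located immediately 3' of a 20-nucleotide protospacer; other Cas proteins use different PAMs and orientations, so the `pam_regex` and `spacer_len` defaults are assumptions, and the function names are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spacers(seq: str, pam_regex: str = "[ACGT]GG", spacer_len: int = 20):
    """List (strand, spacer, pam) tuples for every PAM with room for a full spacer.

    Assumes the PAM lies directly 3' of the protospacer, as for SpCas9.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # A lookahead lets overlapping PAMs (e.g. in "GGGG") all be reported.
        for match in re.finditer(f"(?=({pam_regex}))", s):
            pam_start = match.start()
            if pam_start >= spacer_len:
                spacer = s[pam_start - spacer_len:pam_start]
                hits.append((strand, spacer, match.group(1)))
    return hits


# Toy target: 20 A's followed by a TGG PAM on the forward strand.
print(find_spacers("A" * 20 + "TGG"))
```

Comparing the number of hits per gene under different PAM patterns is one simple way to see how much design freedom each nuclease would give you.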

What are the latest and greatest tools available for sgRNA design?

JULIAN: If your goal is to perform a knockout screen, there are ways to optimize sgRNAs to increase the likelihood of achieving on-target knockout or, at least, to induce loss-of-function in-frame mutations.

I was recently part of a team of researchers from the Vienna BioCenter, Austria, who developed the Vienna Bioactivity CRISPR score (VBC), a high-performance sgRNA prediction tool that helps researchers select sgRNAs that reliably generate loss-of-function alleles in mammalian cells [1]. In developing this tool, we assessed the qualities of sgRNAs that were most likely to lead to successful loss-of-function mutations. Our analysis revealed that sgRNAs designed to target stretches of highly conserved amino acids were more likely to result in loss of protein function, with the best results observed for stretches of seven amino acids (21 base pairs). Similarly, the sgRNAs that targeted DNA sequences corresponding to hydrophobic domains in the protein’s core were more likely to perform well.

Box 1: General considerations for CRISPR sgRNA design

  • GC content between 40% and 80% to ensure strong binding between sgRNA and the target DNA.
  • Ensure that secondary structures like hairpins and polymerase termination sequences are limited in your sgRNA design. These structures can affect cloning efficiency and the transcription of guides.
  • Remove polyA sites, which can disrupt packaging if you’re using viral delivery.
  • Minimize mismatches, particularly within the first 10 bases upstream of the PAM. Mismatches can be tolerated in the PAM-distal half of the spacer sequence, but mismatches nearer to the PAM site can disrupt binding and subsequent editing.
  • Match the spacer sequence length in the sgRNA to the Cas protein being used. Discordance between these two can severely reduce editing efficiency.
  • Research and select the best Cas protein for your needs. Cas proteins differ in the PAM sequences they recognize, the type of cut they perform, and multiple other factors that affect editing performance.
  • Ensure that sgRNA promoters aren’t placed in overlapping conflict with other promoters, such as the EF-1a promoter that’s frequently used to transcribe selection resistance genes. Overlap with this promoter has resulted in less efficient transcription of the sgRNA.

Insights for this box were derived from several recent studies [1-7]. These insights were derived from studies focused on mammalian cells and may not hold true in other systems.
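Some of the checks in Box 1 lend themselves to simple sequence filters. The Python sketch below flags two motifs and counts mismatches between a spacer and a candidate off-target site. It rests on stated assumptions: a run of four or more T's is treated as a Pol III termination signal, the canonical AATAAA hexamer stands in for a "polyA site", and the PAM is assumed to sit 3' of the spacer so that the last 10 bases are the PAM-proximal ones. Function names are hypothetical.

```python
def motif_flags(spacer: str) -> dict:
    """Flag motifs that Box 1 suggests limiting (assumed motif definitions)."""
    s = spacer.upper()
    return {
        "pol3_terminator": "TTTT" in s,  # assumption: TTTT run terminates Pol III
        "polyA_signal": "AATAAA" in s,   # assumption: canonical polyA hexamer
    }


def mismatch_profile(spacer: str, site: str, seed_len: int = 10) -> dict:
    """Count mismatches in the PAM-proximal seed and in the PAM-distal remainder.

    Assumes a 3' PAM, so the seed is the last `seed_len` bases of the spacer.
    """
    spacer, site = spacer.upper(), site.upper()
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    total = sum(a != b for a, b in zip(spacer, site))
    seed = sum(a != b for a, b in zip(spacer[-seed_len:], site[-seed_len:]))
    return {"seed": seed, "distal": total - seed}


# Hypothetical examples.
print(motif_flags("GACGTTTTGCCTAGGATCCA"))
print(mismatch_profile("A" * 20, "C" + "A" * 19))
```

An off-target site with mismatches only in the `distal` count is the more worrying case, since those mismatches are the ones more likely to be tolerated.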

What’s highlighted in our work, and captured in the VBC score, is that sgRNA libraries are more likely to achieve knockout when guides are selected not on sgRNA specificity alone but also on gene and protein structure. [See Boxes 1 and 2 for a list of general considerations for sgRNAs that came out of this work.]

SIYUAN: For experiments that aim to modulate gene expression through CRISPRi, CRISPRa, CRISPRon or CRISPRoff, there are some constraints that limit the number of sgRNAs to choose from. For example, studies have shown that CRISPRi sgRNAs are most effective when targeting sequences within a 75 base pair window immediately downstream of the transcription start site. This window is broadened for CRISPRoff, such that sgRNAs targeting sequences within 500 base pairs upstream or downstream of the transcription start site appear most effective at transcriptional inhibition [3].

JULIAN: Also, as single-cell CRISPR screening can use knockout or epigenetic perturbation methods, these sgRNA libraries must also be designed to ensure that each sgRNA is faithfully linked to its perturbation and subsequent transcriptomic profile. Single-cell CRISPR screening has been challenging because functional sgRNAs can’t be polyadenylated, yet many RNA sequencing methods rely on capture of polyadenylated transcripts.

Methods have been developed to overcome this hurdle by inserting barcodes into the 3’ long terminal repeat section downstream of plasmid selection genes. While this approach is effective, it has some limitations, such as dissociation of the barcode from the sgRNA as a result of template switching during lentiviral packaging (multiple studies suggest this can affect as much as 50% of constructs) [4-6].

It’s therefore very important to consider where in the backbone your sgRNA is inserted and how you’ll detect it when sequencing. Methods like CROP-seq and direct-capture Perturb-seq offer alternative approaches. [See chapter 7 of our E-book for an in-depth analysis of the latest single-cell CRISPR screening technology.] There are several tools available to help researchers design sgRNAs, including the VBC score described earlier. These tools have been collected in a helpful GitHub repository.

Box 2: Context-specific considerations

Gene Knockout sgRNA Design Considerations

  • Target guides to conserved amino acid sequences (ideally 7aa in length). It’s likely that these sequences have been conserved because they are important to protein function, therefore disruption in these regions is more likely to be effective.
  • Target hydrophobic domains in the protein’s core. It’s not clear why editing core hydrophobic domains is more likely to disrupt protein function, but experimental evidence indicates that this is an effective targeting approach.
  • Target larger exons to avoid alternative splicing as this can circumvent gene knockout despite successful editing.
  • Don’t target regions near the end of the protein-coding sequence because some proteins may still be expressed and functional despite slight truncations.
  • Avoid targeting locations near alternative start sites. As with alternative splice sites, alternative start sites can compensate for the loss of the standard transcript to prevent gene knockout.
  • Avoid locations that frequently contain polymorphisms. Even if your sgRNA is designed to perfectly match your target sequence, polymorphisms in the target sequence can lead to unexpected mismatches and reduced editing efficiency.

CRISPRi/a/on/off sgRNA Design Considerations

  • Limited sgRNA options are available because you need to target the transcription start site (TSS) or CpG islands.
  • For CRISPRi, the ideal target window is +25 to +75 nucleotides downstream of the TSS, based on tiling arrays that show optimal disruption of gene expression within this window.
  • For CRISPRa, targeting 75 to 150 nucleotides upstream of the TSS provides optimal activation.
  • CRISPRoff and CRISPRon sgRNAs are most effective when targeted to a 1kb window centered on the transcription start site. Methylation in this region is likely to disrupt key transcription factor binding and recruitment of polymerases.

Insights for this box were derived from several recent studies [1-7]. These insights were derived from studies focused on mammalian cells and may not hold true in other systems.
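The target windows in Box 2 can be expressed as a strand-aware position check. The Python sketch below encodes the windows as offsets relative to the TSS, with positive values downstream in the direction of transcription. Which coordinate represents the guide (cut site, midpoint, or PAM) is left to the user and is an assumption here, as are the function and dictionary names.

```python
# Windows from Box 2 as (min_offset, max_offset) relative to the TSS.
# Negative offsets are upstream; positive offsets are downstream.
WINDOWS = {
    "CRISPRi": (25, 75),
    "CRISPRa": (-150, -75),
    "CRISPRoff/on": (-500, 500),  # 1 kb window centered on the TSS
}


def tss_offset(guide_pos: int, tss_pos: int, gene_strand: str) -> int:
    """Signed distance from the TSS, oriented along the gene's transcription."""
    if gene_strand == "+":
        return guide_pos - tss_pos
    if gene_strand == "-":
        return tss_pos - guide_pos
    raise ValueError("gene_strand must be '+' or '-'")


def in_window(guide_pos: int, tss_pos: int, gene_strand: str, modality: str) -> bool:
    """Return True if the guide falls inside the window for the given modality."""
    low, high = WINDOWS[modality]
    return low <= tss_offset(guide_pos, tss_pos, gene_strand) <= high


# Hypothetical coordinates: a guide 50 bp downstream of a plus-strand TSS.
print(in_window(1050, 1000, "+", "CRISPRi"))
```

The same genomic coordinate can be inside the window for a plus-strand gene and outside it for a minus-strand gene, which is why the strand is an explicit argument.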

There are lots of premade CRISPR sgRNA libraries available. Should researchers use these or consider custom-designed libraries?

SIYUAN: Researchers who are either wary of designing their own sgRNA libraries or are unsure of the right backbone to use for CRISPR screening have two options: use off-the-shelf libraries for a general screen, or get help with designing custom sgRNA libraries.

The former may be a good option for researchers who have a limited budget and want to go with a library that others have validated. These libraries are usually ready to go, meaning there’s little wait time for them to be made and deployed. However, such libraries have some significant drawbacks. The pace of advancement in the CRISPR field is such that any static sgRNA library is likely to be outdated by the time it is used. Pre-made libraries are also not malleable to a researcher’s specific needs—libraries are designed to meet several potential research goals and may provide too much, or too little coverage over areas of specific interest.

Custom sgRNA libraries offer many advantages. Inherent to the concept of a custom sgRNA library is flexibility, and that flexibility can be particularly helpful for screens using some of the more recent CRISPR technology.

JULIAN: A good example of this is in the prime-editing space. Prime editing has significant potential for precise perturbation in large-scale screening, but doing so requires libraries of really long guide sequences. A prime editing guide RNA (pegRNA) is necessarily longer than an sgRNA, as it includes both an sgRNA and an additional 30+ nucleotides that encode a primer binding site and a repair template [8]. To facilitate this, pegRNA libraries must be precisely built using long oligonucleotide synthesis. Few companies can do this well, but Twist Bioscience is one of them.
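To see why pegRNA libraries call for long oligos, it can help to add up the parts. The Python sketch below assembles a pegRNA as spacer, scaffold, repair (reverse transcription) template, and primer binding site, in that 5' to 3' order, and reports the lengths. All sequences are placeholders, and the 76-nucleotide scaffold length in particular is an assumption standing in for whichever scaffold you actually use.

```python
from dataclasses import dataclass


@dataclass
class PegRNADesign:
    """Placeholder container for the parts of a pegRNA (illustrative only)."""
    spacer: str
    scaffold: str
    rt_template: str  # repair template copied by the reverse transcriptase
    pbs: str          # primer binding site

    def sequence(self) -> str:
        """Concatenate the parts 5' to 3'."""
        return self.spacer + self.scaffold + self.rt_template + self.pbs

    def extension_length(self) -> int:
        """Length added on top of a standard sgRNA."""
        return len(self.rt_template) + len(self.pbs)


design = PegRNADesign(
    spacer="GACGTTAGCCTAGGATCCAG",    # hypothetical 20-nt spacer
    scaffold="N" * 76,                # placeholder; assumed scaffold length
    rt_template="ACGTACGTACGTACGTA",  # hypothetical 17-nt template
    pbs="CCTAGGCTAACGT",              # hypothetical 13-nt primer binding site
)
print(design.extension_length(), len(design.sequence()))
```

Any cloning adapters or amplification primer sites flanking the pegRNA would add further to the oligo length that has to be synthesized.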

Custom libraries also enable you to play around with the plasmid backbone, the tracrRNA, the placement of barcodes, and other features that may help you improve your screening, whereas premade libraries typically come with these features set in stone.

We are also seeing researchers combine pre-made libraries with custom solutions. Broad-scale screens can lead to hundreds to thousands of perturbations that need to be followed up. One of the most efficient ways to do so uses a focused secondary screen utilizing a custom guide library designed to target all the genes identified in the primary genome-wide screen. So, researchers can get a lot of use out of both custom sgRNA libraries and pre-made ones. It depends on your timelines, resources, and needs. But whatever library you choose, you want to make sure it’s uniform and created with a low error rate.

Where can researchers get uniform, low-error CRISPR sgRNA libraries?

SIYUAN: Twist Bioscience. Twist’s silicon-based oligonucleotide synthesis platform enables the rapid generation of low-error, highly uniform oligonucleotides which can then be turned into sgRNA libraries. These qualities of uniformity and low error are extremely appealing for any CRISPR screening experiment.

JULIAN: I agree, uniformity is key. Uniformity is a description of how frequently each guide occurs relative to the rest of the library. A highly uniform library will have each guide represented equally, whereas a low uniformity library may contain an overrepresentation of a few select guides. When screening, uniformity is important because a non-uniform library can lead to a perceived loss or diminished effect from underrepresented guides. Put another way, non-uniform libraries reduce screening sensitivity and may result in biased results.

To compensate for non-uniformity, screens require a larger number of cells to ensure that the effect of underrepresented guides can be detected. Therefore uniform libraries enable successful screens with fewer cells—an important feature when working with primary cell lines that may be limited in number. That’s why it’s extremely important to ensure you select a vendor that offers oligo pools with a high degree of uniformity when designing sgRNA libraries. We’ve written a bit more about this in our white paper on the importance of uniformity.
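One simple way to put a number on uniformity is to compare high and low percentiles of per-guide read counts from library QC sequencing. The Python sketch below computes the 5th and 95th percentiles, their ratio, and the number of guides with zero reads. The nearest-rank percentile method and the function names are choices made for this illustration rather than a standard Twist metric.

```python
def nearest_rank_percentile(sorted_counts: list, q: int):
    """Nearest-rank percentile of an ascending list, for integer q in 1..100."""
    n = len(sorted_counts)
    rank = max(1, (q * n + 99) // 100)  # ceiling of q*n/100 using integers
    return sorted_counts[rank - 1]


def uniformity_summary(counts: list) -> dict:
    """Summarize how evenly guides are represented in a library."""
    if not counts:
        raise ValueError("counts must not be empty")
    ordered = sorted(counts)
    p5 = nearest_rank_percentile(ordered, 5)
    p95 = nearest_rank_percentile(ordered, 95)
    return {
        "p5": p5,
        "p95": p95,
        # A ratio close to 1 means guides are represented nearly equally.
        "ratio_95_5": p95 / p5 if p5 else float("inf"),
        "dropouts": sum(c == 0 for c in ordered),
    }


# Toy read counts (1 to 100 reads per guide), for illustration only.
print(uniformity_summary(list(range(1, 101))))
```

For a pool whose 95th and 5th percentiles fall at roughly 380 and 210 reads, the same ratio works out to about 1.8.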

Histogram plot showing frequency on the y-axis, and read counts on the x axis. The peak frequency occurs at 300 read counts, with the 95th and 5th percentiles falling at approximately 380 and 210 read counts, respectively.

NGS quality control data from a typical oligo pool containing 23,000 90mer oligos show the uniformity of the pool at 300x read coverage. The corresponding table indicates the uniformity metrics for this pool. Twist oligos are synthesized bias-free with high uniformity and complete oligo representation.

SIYUAN: Error rates are similarly important. CRISPR systems are designed to precisely target a sequence of DNA based on the spacer portion of the sgRNA. Mismatches in the PAM-distal nucleotides of the spacer are tolerable to a point; however, mismatches increase the likelihood of off-target effects. Errors during sgRNA generation can thus affect sgRNA specificity. It’s also worth noting that for applications that use templates for repair, such as prime editing, it’s critical that the template be encoded without errors.

With Twist Bioscience’s oligo pools, researchers can design custom libraries that can then be cloned into their vector of choice. Twist’s CRISPR experts can also offer guidance on library design, vector choice, and NGS strategy if needed.

Are there any final considerations for sgRNA design?

JULIAN: I think it’s also important to call out an often-overlooked aspect of sgRNA design: the need to protect your experiment from primer binding site contamination.

Sequencing sgRNAs by NGS is a fast and simple method for assessing which sgRNAs are present at the end of a screen. Critical to this process is having unique primer binding sites that allow you to limit amplification to just the sgRNAs in your screen, rather than any other sgRNA-expressing vectors that may have snuck in. Primer binding site contamination can occur far too easily in labs that work with CRISPR often. Because it’s much easier to amplify from a plasmid than from genomic DNA, even a small amount of contaminating vector can account for most of your NGS reads.

An oft overlooked issue: primer binding site contamination

This type of contamination can occur for many reasons. Typically labs use primer binding sites that are located within the U6 promoter that’s used to promote sgRNA transcription. If, for example, you want to perform a knockout in your cell line before carrying out your screen, it’s easy to reuse the same CRISPR plasmid constructs, including the same U6 promoter, and accidentally insert your primer binding site with it.

To protect your experiment, it’s important to consider adding a primer binding site that is unique to your screening library. One way to do this is to insert a unique primer binding site in the screening library plasmid.
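Before settling on a primer binding site, it may be worth checking it against the other plasmids in use in your lab. The Python sketch below reports which plasmids contain a candidate site on either strand, treating each plasmid as circular so that a site spanning the sequence start is still found. It uses exact matching only, and the plasmid names and sequences are invented for illustration; a real check would likely also allow for partial primer matches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def plasmids_sharing_site(site: str, plasmids: dict) -> list:
    """Return names of plasmids containing the site on either strand."""
    site = site.upper()
    rc_site = reverse_complement(site)
    shared = []
    for name, seq in plasmids.items():
        seq = seq.upper()
        # Append the first bases again so matches across the origin are found.
        circular = seq + seq[:len(site) - 1]
        if site in circular or rc_site in circular:
            shared.append(name)
    return sorted(shared)


# Hypothetical lab plasmid collection.
lab_plasmids = {
    "knockout_vector": "AAAC" + "GATTACAGATTACA" + "GGG",
    "reporter_vector": "CCCCCCCCCCCCCCCCCCCC",
}
print(plasmids_sharing_site("GATTACAGATTACA", lab_plasmids))
```

An empty result for every plasmid in the lab is the outcome you want for a primer binding site that is meant to be unique to the screening library.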

Any last words of encouragement for researchers getting into screening?

SIYUAN: Don’t be afraid to ask for help and be creative. A lot of people want to just follow what’s been done before. But there are creative ways to perform screens that can help improve your experiment. And, there are always experts available to help in places like Twist Bioscience. Feel free to reach out to us at twistbioscience.com/contact

Read the full E-book this excerpt came from to learn more about the latest CRISPR screening methodologies and technologies.

References

  1. Michlits, Georg, et al. “Multilayered VBC Score Predicts SgRNAs That Efficiently Generate Loss-of-Function Alleles.” Nature Methods, vol. 17, no. 7, 8 June 2020, pp. 708–716, 10.1038/s41592-020-0850-8.
  2. Doench, John G. “Am I Ready for CRISPR? A User’s Guide to Genetic Screens.” Nature Reviews Genetics, vol. 19, no. 2, 4 Dec. 2017, pp. 67–80, 10.1038/nrg.2017.97.
  3. Nuñez, James K., et al. “Genome-Wide Programmable Transcriptional Memory by CRISPR-Based Epigenome Editing.” Cell, 9 Apr. 2021, www.cell.com/cell/fulltext/S0092-8674(21)00353-6, 10.1016/j.cell.2021.03.025.
  4. Datlinger, Paul, et al. “Pooled CRISPR Screening with Single-Cell Transcriptome Readout.” Nature Methods, vol. 14, no. 3, 18 Jan. 2017, pp. 297–301, 10.1038/nmeth.4177.
  5. Hill, Andrew J., et al. “On the Design of CRISPR-Based Single Cell Molecular Screens.” bioRxiv, 29 Jan. 2018, 10.1101/254334.
  6. Xie, Shiqi, et al. “Frequent SgRNA-Barcode Recombination in Single-Cell Perturbation Assays.” PLOS ONE, vol. 13, no. 6, 6 June 2018, p. e0198635, 10.1371/journal.pone.0198635.
  7. Mohr, Stephanie E., et al. “CRISPR Guide RNA Design for Research Applications.” The FEBS Journal, vol. 283, no. 17, 22 June 2016, pp. 3232–3238, www.ncbi.nlm.nih.gov/pmc/articles/PMC5014588/, 10.1111/febs.13777.
  8. Anzalone, Andrew V., et al. “Search-and-Replace Genome Editing without Double-Strand Breaks or Donor DNA.” Nature, vol. 576, no. 7785, 2019, pp. 149–157, 10.1038/s41586-019-1711-4.

For Research Use Only. Not for use in diagnostic procedures.
