CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing [1]
[1] Samuele Cancellieri, Matthew C Canver, Nicola Bombieri, Rosalba Giugno, Luca Pinello, CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing, Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2001–2008, https://doi.org/10.1093/bioinformatics/btz867
Samuele Cancellieri, PhD Student, University of Verona
Many other variables are fundamental to validating CRISPR results
• Variants
  • SNPs
  • INDELs
• Annotation of functional regions
  • Exons
  • Promoters
  • CTCF binding sites
  • and others…
• Efficiency scores
  • Cutting Frequency Determination (CFD) [1]
  • Doench 2016 on-target score [1]
  • and others…
[1] Doench, John G., et al. "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9." Nature biotechnology 34.2 (2016): 184-191.
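As we understand it, the CFD score multiplies one activity factor for each mismatched position by an activity factor for the PAM. The Python sketch below illustrates only that multiplicative structure. The tables, the names MISMATCH_ACTIVITY, PAM_ACTIVITY and cfd_like_score, and the (position, guide base, target base) keys are hypothetical placeholders, not the published matrices from [1] or the CRISPRitz implementation.

```python
# Hypothetical activity values for illustration only; NOT the published CFD matrices.
# Key: (1-based position in the guide, guide base, target base) -> activity factor.
MISMATCH_ACTIVITY = {
    (1, "A", "C"): 0.9,
    (4, "T", "A"): 0.5,
}
# Key: last two PAM bases -> activity factor (also hypothetical).
PAM_ACTIVITY = {"GG": 1.0, "AG": 0.25}


def cfd_like_score(guide: str, target: str, pam: str) -> float:
    """Multiply one activity factor per mismatched position, then the PAM factor."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length (no bulges)")
    score = 1.0
    for pos, (g, t) in enumerate(zip(guide, target), start=1):
        if g != t:
            # Mismatches absent from the toy table default to 1.0 (fully tolerated);
            # this default is an assumption of the sketch.
            score *= MISMATCH_ACTIVITY.get((pos, g, t), 1.0)
    # PAMs absent from the toy table are treated as non-functional (0.0).
    return score * PAM_ACTIVITY.get(pam[-2:], 0.0)


print(cfd_like_score("ACGT", "ACGT", "AGG"))  # perfect match, NGG PAM -> 1.0
print(cfd_like_score("ACGT", "ACGA", "TGG"))  # one mismatch at position 4 -> 0.5
```

Real guides are 20 nt long and every mismatch type has a tabulated value; the toy 4-nt example only shows how the factors combine.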
CRISPRitz – Add variants
• Starts from a VCF file and a reference genome
• Converts each ‘real’ nucleotide into a 4-bit code:
  • A -> 0001
  • C -> 0010
  • G -> 0100
  • T -> 1000
• Generates ambiguous nucleotides (IUPAC codes) by OR-ing the codes of all alleles at a position:
  • D -> T+G+A -> 1101
  • R -> G+A -> 0101
(Figure: VCF + reference genome -> Add-variant -> variant genome with ambiguous nucleotides R, M, S, K)
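A minimal Python sketch of this encoding: each allele observed at a position is mapped to its 4-bit code and the codes are OR-ed into one ambiguous nucleotide. The partial IUPAC lookup table, the function names, and the AND-based compatibility test (a guide base matches if its bit is set) are illustrative assumptions, not CRISPRitz code.

```python
from functools import reduce

# 4-bit codes for the four nucleotides, as listed on the slide.
BITS = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}
# IUPAC letters for the ambiguous codes shown on the slides (partial table).
IUPAC = {0b0101: "R", 0b0011: "M", 0b0110: "S", 0b1100: "K", 0b1101: "D"}


def encode(*alleles: str) -> int:
    """OR together the 4-bit codes of every allele observed at one position."""
    return reduce(lambda acc, base: acc | BITS[base], alleles, 0)


def compatible(guide_base: str, genome_code: int) -> bool:
    """Assumed matching rule: a guide base matches if its bit is set in the genome code."""
    return (BITS[guide_base] & genome_code) != 0


r = encode("G", "A")
d = encode("T", "G", "A")
print(format(r, "04b"), IUPAC[r])  # 0101 R
print(format(d, "04b"), IUPAC[d])  # 1101 D
print(compatible("A", r), compatible("C", r))  # True False
```

With one bit per nucleotide, both building an ambiguous code and testing a base against it reduce to a single bitwise operation.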
CRISPRitz – Index Genome
• Fast (linear-time) algorithm (Aho-Corasick [1]) to extract PAM sequences from the genome
• Extracted sequences saved into a dedicated tree structure (ternary search tree [2]) to speed up searches with bulges
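A minimal Python sketch of Aho-Corasick PAM extraction. It assumes the PAM is first expanded into all concrete sequences (only N is expanded here) and that only the forward strand is scanned. Function names are hypothetical; this illustrates the cited algorithm, not the CRISPRitz source.

```python
from collections import deque
from itertools import product


def expand_pam(pam):
    """Expand N into A/C/G/T; other letters are taken literally in this sketch."""
    choices = [("A", "C", "G", "T") if c == "N" else (c,) for c in pam]
    return ["".join(p) for p in product(*choices)]


def build_automaton(patterns):
    goto, fail, out = [{}], [0], [[]]  # per state: transitions, failure link, matches
    for pat in patterns:  # 1) build the trie
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    queue = deque(goto[0].values())  # 2) breadth-first pass to set failure links
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_pams(genome, pam):
    """Single left-to-right pass; returns (0-based start, matched PAM) pairs."""
    goto, fail, out = build_automaton(expand_pam(pam))
    hits, state = [], 0
    for i, ch in enumerate(genome):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


print(find_pams("AAGGTCGGG", "NGG"))  # [(1, 'AGG'), (5, 'CGG'), (6, 'GGG')]
```

The genome is read once regardless of how many PAM variants are searched. For the reverse strand, the same automaton can be built from the reverse-complemented PAM (CCN for NGG).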
[1] Aho, Alfred V., and Margaret J. Corasick. "Efficient string matching: an aid to bibliographic search." Communications of the ACM 18.6 (1975): 333-340.
[2] Bentley, Jon, and Bob Sedgewick. "Ternary search trees." Dr. Dobb's Journal 23.4 (1998).
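A minimal Python sketch of a ternary search tree [2] storing fixed-length sequences with their genome positions, plus a lookup that tolerates up to a given number of mismatches. Each subtree is abandoned as soon as the mismatch budget is exceeded, which is why a tree search can be cheaper than scanning every window. Bulge handling is omitted, and the names Node, insert and search_mismatches are hypothetical, not CRISPRitz internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    ch: str
    lo: Optional["Node"] = None   # characters smaller than ch at this depth
    eq: Optional["Node"] = None   # next depth, after matching ch
    hi: Optional["Node"] = None   # characters larger than ch at this depth
    positions: Optional[List[int]] = None  # genome positions of sequences ending here


def insert(node: Optional[Node], seq: str, pos: int, i: int = 0) -> Node:
    ch = seq[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, seq, pos, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, seq, pos, i)
    elif i + 1 < len(seq):
        node.eq = insert(node.eq, seq, pos, i + 1)
    else:
        if node.positions is None:
            node.positions = []
        node.positions.append(pos)
    return node


def search_mismatches(node, query, max_mm, i=0, mm=0, prefix="", hits=None):
    """Collect (sequence, mismatches, positions) for stored sequences within max_mm."""
    if hits is None:
        hits = []
    if node is None:
        return hits
    # lo/hi siblings are alternative characters for the same query position i.
    search_mismatches(node.lo, query, max_mm, i, mm, prefix, hits)
    search_mismatches(node.hi, query, max_mm, i, mm, prefix, hits)
    cost = mm + (node.ch != query[i])
    if cost <= max_mm:  # otherwise the whole eq-subtree is pruned
        if i + 1 == len(query):
            if node.positions:
                hits.append((prefix + node.ch, cost, node.positions))
        else:
            search_mismatches(node.eq, query, max_mm, i + 1, cost, prefix + node.ch, hits)
    return hits


root = None
for pos, seq in enumerate(["ACGT", "ACGA", "TCGT", "GGGG"]):
    root = insert(root, seq, pos)
print(sorted(search_mismatches(root, "ACGT", 1)))
# [('ACGA', 1, [1]), ('ACGT', 0, [0]), ('TCGT', 1, [2])]
```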
CRISPRitz – Search
(Figure: guide + genome -> Search)
• Brute-force search
  • Simple and fast, no database required
• Search on the tree
  • More complex; requires a database-creation phase
  • Supports bulges, because the tree structure avoids exponential execution time
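A minimal Python sketch of the brute-force approach: slide a window the length of the guide along the genome, count mismatches, and stop early once the threshold is exceeded. The IUPAC-aware comparison (so that a variant-encoded genome can be searched), the treatment of N in the guide as a wildcard, and the function name are assumptions for illustration; bulges are not handled.

```python
# Full IUPAC nucleotide table: code -> nucleotides it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def brute_force_search(genome, guide, max_mismatches):
    """Return (0-based start, window, mismatches) for every window within the threshold."""
    hits = []
    n = len(guide)
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = 0
        for g, x in zip(guide, window):
            # N in the guide matches anything; an ambiguous genome base matches
            # if it includes the guide base (assumption of this sketch).
            if g != "N" and g not in IUPAC[x]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


print(brute_force_search("TTACGTAACRTT", "ACGA", 1))  # [(2, 'ACGT', 1), (7, 'ACRT', 1)]
```

The hit at position 7 exists only because the genome carries R (A or G) there, which is the kind of variant-dependent target the tool is designed to surface.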
CRISPRitz – Annotate results
• Annotates targets with functional regions from a user-supplied .bed file
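A minimal Python sketch of BED-based annotation. It assumes targets are given as (chrom, start, end) in the same 0-based, half-open coordinates that BED uses and that the fourth BED column holds the region name. A linear scan keeps it short; a real tool would more likely sort the regions or use an interval index. Function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int, str]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else "."
        regions.append((fields[0], int(fields[1]), int(fields[2]), name))
    return regions


def annotate(targets, regions):
    """Attach the names of all regions overlapping each target (half-open intervals)."""
    result = []
    for chrom, start, end in targets:
        names = [n for c, s, e, n in regions if c == chrom and start < e and s < end]
        result.append((chrom, start, end, names))
    return result


bed = ["chr1\t100\t200\texon\n", "chr1\t150\t300\tpromoter\n", "chr2\t0\t50\tCTCF\n"]
targets = [("chr1", 190, 213), ("chr1", 300, 323), ("chr2", 10, 33)]
for row in annotate(targets, parse_bed(bed)):
    print(row)
```

Because intervals are half-open, a target starting at 300 does not overlap a region ending at 300.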
CRISPRitz – Generate report
• Graphical representation of two guides targeting the CCR5 gene, compared against a precomputed database of guide behaviour from the GeCKO library [1]
• The left guide shows ‘good’ behaviour relative to the GeCKO library database
• The right guide shows ‘average’ behaviour relative to the GeCKO library database
[1] Sanjana NE, Shalem O, Zhang F. "Improved vectors and genome-wide libraries for CRISPR screening." Nature Methods 11.8 (2014): 783-784. doi:10.1038/nmeth.3047. PMID 25075903.
CRISPRitz – Process data (under development)
• New function designed to help the user validate targets and compare reference results with variant results
• Assigns variant targets to the real samples extracted from the VCF
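A minimal Python sketch of how variant targets could be assigned to samples: read the GT field of each VCF sample and report which samples carry every alternative allele a target depends on. It ignores phasing (two required alleles on different haplotypes still count). The function names and the (chrom, pos, alt) target representation are assumptions, not the Process-data implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

Allele = Tuple[str, int, str]  # (chrom, 1-based VCF position, alternative allele)


def parse_vcf(lines: Iterable[str]) -> Tuple[List[str], Dict[Allele, Set[str]]]:
    """Map each alternative allele to the set of samples whose GT contains it."""
    samples: List[str] = []
    carriers: Dict[Allele, Set[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        chrom, pos, alts = fields[0], int(fields[1]), fields[4].split(",")
        gt_index = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            gt = data.split(":")[gt_index]
            # "0|1", "1/1", "./." -> set of allele indices, ignoring missing calls.
            called = {int(a) for a in gt.replace("|", "/").split("/") if a != "."}
            for k, alt in enumerate(alts, start=1):
                if k in called:
                    carriers.setdefault((chrom, pos, alt), set()).add(sample)
    return samples, carriers


def samples_carrying(target_alleles: List[Allele], carriers: Dict[Allele, Set[str]]) -> Set[str]:
    """Samples carrying every alternative allele the variant target requires (phasing ignored)."""
    sets = [carriers.get(a, set()) for a in target_alleles]
    return set.intersection(*sets) if sets else set()


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "chr2\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\n",
    "chr2\t105\t.\tC\tT,A\t.\tPASS\t.\tGT\t2|0\t0|1\t0|1\n",
]
samples, carriers = parse_vcf(vcf)
print(sorted(samples_carrying([("chr2", 100, "G"), ("chr2", 105, "T")], carriers)))  # ['S3']
```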
A real case study
We started with a guide designed to target a specific region of the BCL11A gene: CTAACAGTTGCTTTTATCACNNN
We created a variant genome using the 1000 Genomes Project VCFs
(Figure: VCF -> Add-variant -> variant genome with ambiguous nucleotides K, S, R, M)
We created two databases, one from the reference genome and one from the variant genome, using a non-specific PAM
(Figure: PAM -> Index-genome)
We ran two searches, one on the reference database and one on the variant database
(Figure: guide + thresholds -> Search, run on each database)
We ran Process-data to validate the results on samples and to highlight differences between the reference and variant genomes
(Figure: Process-data -> results, REF vs VAR)
Results from the case study
• 1289 reference targets found in the top 2000 rows, ordered by CFD score
• 710 variant targets found in the top 2000 rows (35.5%), showing that including SNPs has a substantial impact on off-target discovery
Conclusions
CRISPRitz allows users to:
• Add genetic variants such as SNPs (multi-sample) and INDELs (single-sample) to any genome
• Generate a database that supports searches with mismatches and bulges, using a fast tree-based search algorithm
• Search any FASTA genome, with or without variants, with any number of mismatches and bulges, and compute two scoring metrics (CFD, and Doench 2016 for on-targets)
• Annotate functional regions with any user-supplied .bed file
• Produce an exhaustive graphical report summarizing the results for any guide and any combination of mismatches and bulges in an intuitive, representative way
• Process the data (under development) to create a result file with a set of target sequences ready to be used as a verification panel for wet-lab experiments, supported by a graphical representation of the CFD distribution
Thank you very much for your attention
Software availability:
• https://bioconda.github.io/recipes/crispritz/README.html