CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing


(1)

CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing [1]

[1] Samuele Cancellieri, Matthew C Canver, Nicola Bombieri, Rosalba Giugno, Luca Pinello, CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing, Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2001–2008, https://doi.org/10.1093/bioinformatics/btz867

Samuele Cancellieri, PhD Student, University of Verona

(5)

Many other fundamental variables are necessary to validate CRISPR results

• Variants
  • SNPs
  • INDELs

• Annotation for functional regions
  • Exons
  • Promoters
  • CTCF
  • and others…

• Efficiency scores
  • Cutting Frequency Determination (CFD) [1]
  • Doench 2016 On-Target Score [1]
  • and others…

[1] Doench, John G., et al. "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9." Nature biotechnology 34.2 (2016): 184-191.

(7)

CRISPRitz – Add variants

• Starting with a VCF and a reference genome

• Conversion of each ‘real’ nucleotide into a 4-bit code
  • A -> 0001
  • C -> 0010
  • G -> 0100
  • T -> 1000

• Generation of ambiguous nucleotides with the OR operation
  • D -> T+G+A -> 1101
  • R -> G+A -> 0101

[Diagram: VCF and Genome as inputs to Add-variant; the output genome contains ambiguous codes such as R, M, S, K]
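The encoding above can be sketched in a few lines of Python. This is only an illustration of the bitwise idea, not the CRISPRitz source code; the names `BASE_BITS`, `merge_alleles` and `add_snp` are hypothetical.

```python
# Illustrative sketch only: helper names are hypothetical, not CRISPRitz's API.
# One bit per 'real' nucleotide, as on the slide.
BASE_BITS = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}

# Standard IUPAC ambiguity codes and the bases each one stands for.
IUPAC_SETS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def merge_alleles(*alleles):
    """OR together the 4-bit codes of all alleles seen at one position."""
    code = 0
    for allele in alleles:
        code |= BASE_BITS[allele]
    return code


# Full symbol <-> bit-code tables, derived with the same OR operation.
IUPAC_BITS = dict(BASE_BITS)
for symbol, bases in IUPAC_SETS.items():
    IUPAC_BITS[symbol] = merge_alleles(*bases)
BITS_TO_IUPAC = {bits: symbol for symbol, bits in IUPAC_BITS.items()}


def add_snp(ref_base, alt_base):
    """Return the IUPAC symbol that represents both reference and alternate."""
    return BITS_TO_IUPAC[merge_alleles(ref_base, alt_base)]


print(format(merge_alleles("T", "G", "A"), "04b"))  # D
print(add_snp("G", "A"))                            # R
```

Because every ambiguity code is just the OR of its bases, a single table lookup converts a reference base plus a VCF alternate allele into the symbol written to the variant genome.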

(8)

CRISPRitz – Index Genome

• Fast (linear) algorithm (Aho-Corasick [1]) to extract PAM sequences from the genome

• Saved into a specific tree structure (ternary search tree [2]) to speed up the searches with bulges

[1] Aho, Alfred V., and Margaret J. Corasick. "Efficient string matching: an aid to bibliographic search." Communications of the ACM 18.6 (1975): 333-340.

[2] Bentley, Jon, and Bob Sedgewick. "Ternary search trees." Dr. Dobb's Journal 23.4 (1998).
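To make the tree idea concrete, here is a minimal ternary search tree in Python with a mismatch-tolerant lookup. It is a sketch under simplifying assumptions: sequences are non-empty plain strings, only mismatches are handled (no bulges, no ambiguity codes), and the names `Node`, `insert` and `search` are hypothetical rather than taken from CRISPRitz.

```python
# Illustrative sketch only: a tiny ternary search tree, not CRISPRitz's index.
class Node:
    __slots__ = ("char", "lo", "eq", "hi", "is_end")

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None  # smaller / next position / larger
        self.is_end = False                 # a stored sequence ends here


def insert(node, word, i=0):
    """Insert a non-empty word and return the (possibly new) subtree root."""
    char = word[i]
    if node is None:
        node = Node(char)
    if char < node.char:
        node.lo = insert(node.lo, word, i)
    elif char > node.char:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def search(node, query, budget, i=0, prefix=""):
    """Yield stored words of len(query) within `budget` mismatches of query."""
    if node is None:
        return
    # lo/hi siblings hold alternative characters for the same position.
    yield from search(node.lo, query, budget, i, prefix)
    yield from search(node.hi, query, budget, i, prefix)
    remaining = budget - (node.char != query[i])
    if remaining < 0:
        return  # prune: this whole subtree exceeds the mismatch budget
    word = prefix + node.char
    if i + 1 == len(query):
        if node.is_end:
            yield word
    else:
        yield from search(node.eq, query, remaining, i + 1, word)


root = None
for seq in ["ACGT", "ACGA", "TTTT", "ACCT"]:
    root = insert(root, seq)
print(sorted(search(root, "ACGT", 1)))
```

The pruning step is the point of the structure: sequences sharing a prefix are compared once, and a prefix that already exceeds the budget discards every sequence below it.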

(9)

CRISPRitz – Search

[Diagram: Guide and Genome as inputs to Search]

• Search with a brute-force approach
  • Easy, fast, no database required

• Search on the tree
  • More complex and requires a database creation phase
  • Allows bulges, because the tree structure avoids exponential execution time
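The brute-force mode can be illustrated with a sliding-window scan in Python. This is a sketch, not the CRISPRitz implementation: it assumes the 4-bit IUPAC encoding from the Add variants step, scans the forward strand only, ignores bulges and PAM filtering, and uses the hypothetical name `brute_force_search`.

```python
# Illustrative sketch only: forward strand, mismatches only, no bulges or PAM.
IUPAC_BITS = {
    "A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000,
    "R": 0b0101, "Y": 0b1010, "S": 0b0110, "W": 0b1001,
    "K": 0b1100, "M": 0b0011, "B": 0b1110, "D": 0b1101,
    "H": 0b1011, "V": 0b0111, "N": 0b1111,
}


def brute_force_search(genome, guide, max_mismatches):
    """Return (start, mismatches) for every window within the threshold."""
    guide_bits = [IUPAC_BITS[c] for c in guide]
    genome_bits = [IUPAC_BITS[c] for c in genome]
    hits = []
    for start in range(len(genome_bits) - len(guide_bits) + 1):
        mismatches = 0
        for offset, guide_code in enumerate(guide_bits):
            # Two codes are compatible when they share at least one base bit.
            if (genome_bits[start + offset] & guide_code) == 0:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # the loop was not broken: the window is a hit
            hits.append((start, mismatches))
    return hits


guide = "ACGA"
print(brute_force_search("TTACGGTT", guide, 0))  # reference genome
print(brute_force_search("TTACGRTT", guide, 0))  # variant genome, R = G or A
```

In this toy input the reference window ACGG has one mismatch against the guide, while the variant window ACGR matches exactly, which is the kind of difference a variant-aware search is meant to expose.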

(10)

CRISPRitz – Annotate results

• Personal .bed file with functional annotation
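As an illustration of the annotation step, the sketch below labels each target with the names of the BED regions it overlaps. It assumes a tab-separated BED file with at least four columns (chrom, start, end, name) and 0-based, half-open coordinates; the function names and the sample regions are hypothetical, and this is not the CRISPRitz code.

```python
# Illustrative sketch only: assumes 4-column BED lines (chrom, start, end, name).
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        chrom, start, end, name = line.split("\t")[:4]
        regions.append((chrom, int(start), int(end), name))
    return regions


def annotate(targets, regions):
    """Attach the sorted names of all overlapping regions to each target."""
    annotated = []
    for chrom, start, end in targets:
        labels = sorted({
            name
            for r_chrom, r_start, r_end, name in regions
            # Half-open intervals overlap when each starts before the other ends.
            if r_chrom == chrom and r_start < end and start < r_end
        })
        annotated.append((chrom, start, end, labels))
    return annotated


bed_lines = [
    "chr1\t100\t200\texon",
    "chr1\t150\t300\tpromoter",
    "chr2\t0\t50\tCTCF",
]
targets = [("chr1", 190, 213), ("chr1", 200, 223), ("chr2", 60, 83)]
for row in annotate(targets, parse_bed(bed_lines)):
    print(row)
```

A linear scan over all regions is enough for a small example; a real genome-wide annotation would normally use a sorted index or an interval tree instead.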

(11)

CRISPRitz – Generate report

• Graphical representation of two different guides, extracted from guides targeting the CCR5 gene and compared with the GeCKO library [1] precomputed database of guide behaviour

• The left one shows a ‘good’ behaviour with respect to the GeCKO library database

• The right one shows an ‘average’ behaviour with respect to the GeCKO library database

[1] Sanjana NE, Shalem O, Zhang F. "Improved vectors and genome-wide libraries for CRISPR screening." Nat Methods. 2014 Aug;11(8):783-4. doi: 10.1038/nmeth.3047. PubMed 25075903.

(12)

CRISPRitz – Process data (under development)

• New function designed to help the user validate targets and compare reference results to variant results

• Assign each variant target to a real sample extracted from the VCF

(13)

A real case study

We started with a guide designed to target a specific region of the BCL11A gene: CTAACAGTTGCTTTTATCACNNN

We created a variant genome using the 1000 Genomes Project VCFs.

[Diagram: VCF as input to Add-variant; the output genome contains ambiguous codes such as K, S, R, M]

We created two databases of the genome with a non-specific PAM.

[Diagram: PAM as input to Index-genome]

We ran two searches, one on the reference database and one on the variant database.

[Diagram: Guide and Thresholds as inputs to the two Search runs]

We ran process-data to validate the results on samples and to highlight the differences between the reference and variant genomes.

[Diagram: REF and VAR Results as inputs to Process-data]

(14)

Results from a case study

• 1289 reference targets found in the top 2000 rows, ordered by CFD score

• 710 variant targets found in the top 2000 rows, meaning that the inclusion of SNPs has a very significant impact on off-target discovery

(15)

Conclusions

CRISPRitz allows users to:

• Add genetic variants such as SNPs (multi-sample) and INDELs (single-sample) to any genome

• Generate a database to search with mismatches and bulges, using a very fast algorithm based on tree search

• Search any FASTA genome, with or without variants, with any number of mismatches and bulges, and calculate two scoring metrics (CFD, and Doench 2016 for on-targets)

• Annotate functional regions with any personal .bed file

• Generate an exhaustive graphical report that collects the results for any guide and any combination of mismatches and bulges in an intuitive and representative way

• Process the data (under development) to create a result file with a set of target sequences ready to be used as a verification panel for wet-lab experiments, supported by a graphical representation of the CFD distribution

(16)

Thank you very much for your attention

Software availability:

• https://bioconda.github.io/recipes/crispritz/README.html
