Find number of off-targets

usage: cas_offinder.py [-h] [-j JID] -f INPUT [-n NUM_MISMATCHES] [--add_PAM]
                       [--remove_first_G] [--PAM_seq PAM_SEQ] [-g GENOME]
                       [--chr_fa CHR_FA]

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        cas_offinder_yli11_2020-03-24)
  -f INPUT, --input INPUT
                        a list of gRNA sequences (default: None)
  -n NUM_MISMATCHES, --num_mismatches NUM_MISMATCHES
                        Number of allowed mis-matches in the gRNA, excluding
                        PAM sequence (default: 2)
  --add_PAM             if PAM sequence is not included in your gRNA, please
                        add this option. (default: False)
  --remove_first_G      remove first letter G in the input gRNA list (default:
                        False)
  --PAM_seq PAM_SEQ     specify the PAM sequence, e.g., NGG. (default: NGG)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10.(default: hg19)
  --chr_fa CHR_FA       This will be automatically changed with -g option
                        (default: /home/yli11/Data/Human/hg19/fasta/chr)

Summary

Given a list of gRNA sequences (with or without PAM sequences) and the number of allowed mismatches (excluding the PAM), the program outputs the number of off-targets for each gRNA, i.e., the number of genomic matches with at most that many mismatches.
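
To make the counting rule concrete, here is a minimal sketch of how a single candidate site could be tested against a gRNA. It only illustrates the definition above and is not the Cas-OFFinder search algorithm. The function names are hypothetical, and it assumes that both sequences include the PAM at the 3' end and that N in the PAM pattern matches any base.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def pam_matches(pam_pattern, pam):
    """True if pam fits pam_pattern; N in the pattern matches any base."""
    if len(pam_pattern) != len(pam):
        return False
    return all(p == "N" or p == q
               for p, q in zip(pam_pattern.upper(), pam.upper()))


def is_match(grna, site, pam_pattern="NGG", max_mismatches=2):
    """Hypothetical check: does site count as a match for grna?

    Mismatches are counted in the protospacer only (PAM excluded),
    mirroring the -n option described in the help text.
    """
    k = len(pam_pattern)
    return (pam_matches(pam_pattern, site[-k:])
            and count_mismatches(grna[:-k], site[:-k]) <= max_mismatches)
```

With this sketch, the per-gRNA number reported by the program corresponds to the number of genomic sites for which such a check would succeed.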

Latest updates:

  1. To allow mismatches in the PAM sequence, use --add_PAM --allow_PAM_mis --PAM_seq NGG, for example:

cas_offinder.py -g hg38 --add_PAM --allow_PAM_mis --PAM_seq NGG -f input2.list -j mis_4_GA -n 4

  2. To allow G-A mismatches in the protospacer sequence, change G to R (the IUPAC code for A or G) in the protospacer sequence. For example, ACTGGGAGACACCTCCCAGT becomes ACTRRRARACACCTCCCART.
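
The G-to-R substitution can be scripted when preparing the input list. This is a small sketch with a hypothetical helper name; it assumes the input is the protospacer only (no PAM), so that the PAM bases are not altered.

```python
def allow_g_a_mismatch(protospacer):
    """Replace every G with R (IUPAC: A or G) in a protospacer sequence."""
    return protospacer.strip().upper().replace("G", "R")


print(allow_g_a_mismatch("ACTGGGAGACACCTCCCAGT"))
```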

Output example

ACTGGGAGACACCTCCCAGTAGG 4058
TGGGAGACACCTCCCAGTAGGGG 4011
CTGGGAGACACCTCCCAGTAGGG 3998
AGGTGTCTGTCGGCCCCTACTGG 3262
GGTGTCTGTCGGCCCCTACTGGG 3208
TCAGGAATTCGAGACCAGCAGGG 3146
GTCTGTCGGCCCCTACTGGGAGG 1687
CAGGAATTCGAGACCAGCAGGGG 919
CTCTCTCCTTGGCCTGCAGATGG 832
GTCAGGAATTCGAGACCAGCAGG 509
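
The example output appears to be one gRNA per line followed by its off-target count. Assuming that layout (two whitespace-separated columns), a sketch like the following reads the counts and ranks the gRNAs from fewest to most off-targets. The function names are hypothetical.

```python
def parse_counts(lines):
    """Parse 'SEQUENCE COUNT' lines into a list of (sequence, count) tuples."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        results.append((fields[0], int(fields[1])))
    return results


def rank_by_specificity(counts):
    """Sort (sequence, count) tuples so the fewest off-targets come first."""
    return sorted(counts, key=lambda item: item[1])
```

For example, applying `rank_by_specificity(parse_counts(open_file))` to an open output file would place the gRNA with the lowest count at the top of the list.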

Input

A list of gRNAs:

ACGACCTTGGCGCCACCACCTGG
TTATCTTTAACACCCCCTGCTGG
AGCTCTCGCACCGCCACTAGAGG
TTACAGGAGAGACCAGATGATGG
CACCGGTGGAATCCAGTAGGGGG
AGTGAGAGGAGGTGCCAGCAGGG
AGCCAGGTGCCGCCCTCCTGAGG
TGGGGGCCTGGGTGTCCACCAGG
GAGCCTTCAGCTACCTCATGTGG
TAGCAGCTGGGAACCAGCAGAGG

Note

If the gRNA input does not include PAM sequences, add the --add_PAM option.
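
Before submitting a job, it can help to check whether a gRNA list already carries the PAM. The sketch below is only a heuristic under stated assumptions: a 20-nt protospacer followed by a 3' PAM, with N in the PAM pattern matching any base. The helper names are hypothetical and are not part of cas_offinder.py.

```python
def has_pam(grna, pam_pattern="NGG", protospacer_len=20):
    """Heuristic: True if grna looks like protospacer + PAM."""
    grna = grna.strip().upper()
    if len(grna) != protospacer_len + len(pam_pattern):
        return False
    tail = grna[-len(pam_pattern):]
    return all(p == "N" or p == b for p, b in zip(pam_pattern.upper(), tail))


def needs_add_pam(grnas, pam_pattern="NGG", protospacer_len=20):
    """True if at least one gRNA in the list lacks the PAM."""
    return not all(has_pam(g, pam_pattern, protospacer_len) for g in grnas)
```

If `needs_add_pam` returns True for a list in which every sequence lacks the PAM, running the command with --add_PAM is the appropriate choice. A list that mixes both kinds should be cleaned up first.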

Usage

Step 0: Load python version 2.7.13.

module load python/2.7.13

Step 1: Run the command

Example command 1: with --add_PAM option.

cas_offinder.py -f gRNA.list2 --add_PAM

2019-07-17 11:09:04,293 - INFO - main - The job id is: cas_offinder_yli11_2019-07-17
2019-07-17 11:09:04,649 - INFO - submit_pipeline_jobs - cas has been submitted; JobID: 83786440

Example command 2: if input gRNA sequences contain PAM, then just run the following command.

cas_offinder.py -f gRNA.list

2019-07-17 11:09:24,777 - WARNING - main - The input job id is not available!
2019-07-17 11:09:24,777 - INFO - main - The new job id is: cas_offinder_yli11_2019-07-17_f0811dd87951
2019-07-17 11:09:24,890 - INFO - submit_pipeline_jobs - cas has been submitted; JobID: 83786441

Note

By default, the maximum number of allowed mismatches is 2. You can change this with the -n option.

To find gRNA locations

This program can also be used to find gRNA coordinates in the genome.

Suppose the gRNA list does not include the PAM and every gRNA starts with a G. With -n 0, so that only exact matches are reported, the command is:

cas_offinder.py -f VPR.gRNA.list -n 0 --add_PAM --remove_first_G

2020-03-24 14:49:45,002 - INFO - main - The job id is: cas_offinder_yli11_2020-03-24
2020-03-24 14:49:45,154 - INFO - submit_pipeline_jobs - cas has been submitted; JobID: 99715775

Output

Once the job is finished, you will receive a notification email with the result attached.

In the JobID folder:

match.bed: Cas-OFFinder output BED file (not in standard format) showing the matches.

match.bed.sorted: sorted file in standard BED format, ready to use.
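
Since match.bed.sorted is described as standard BED, its coordinates can be read with a few lines of Python. This sketch relies only on the three mandatory BED columns (chrom, start, end, tab-separated). Any further columns are passed through untouched because their meaning is not documented here. The function name is hypothetical.

```python
def read_bed(lines):
    """Yield (chrom, start, end, extra_fields) for each BED record."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]
```

A typical use would be `for chrom, start, end, extra in read_bed(open("match.bed.sorted")):`, run from inside the job folder.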

Comments

code @ github.