Off-targets and Variants overlap

usage: off_target_overlap_variant.py [-h] -o OUTPUT --cols COLS -f INPUT file [file ...]

positional arguments:
  file                  any number of vcf file, vcf file name is used as additional columns added to your input

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file (default: None)
  --cols COLS           input chr, start, column names, sep by comma (default: None)
  -f INPUT, --input INPUT
                        input a csv table with chr start end of your off-targets (default: None)

Summary

Variants can affect CRISPR-Cas9 activity, especially at PAM positions. This script reports every variant that overlaps any of the given off-target sites.
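The overlap test itself can be illustrated with a minimal sketch. This is not the script's actual implementation. The function name find_overlaps is hypothetical, and the sketch assumes that off-target start/end and variant positions are all 1-based and inclusive; adjust the comparison if your table uses BED-style 0-based starts.

```python
from collections import defaultdict


def find_overlaps(off_targets, variants):
    """Return, for each off-target, the IDs of the variants that overlap it.

    off_targets: list of (chrom, start, end), assumed 1-based inclusive.
    variants:    list of (chrom, pos, ref, alt), pos 1-based as in VCF.
    """
    # Group variants by chromosome so each off-target only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, pos, ref, alt in variants:
        by_chrom[chrom].append((pos, ref, alt))

    results = []
    for chrom, start, end in off_targets:
        hits = []
        for pos, ref, alt in by_chrom.get(chrom, []):
            # The reference allele spans pos .. pos + len(ref) - 1.
            variant_end = pos + len(ref) - 1
            if pos <= end and variant_end >= start:
                hits.append(f"{chrom}:{pos}:{ref}:{alt}")
        results.append(hits)
    return results


# Hypothetical example data, for illustration only.
example_sites = [("chr1", 100, 122), ("chr2", 50, 72)]
example_variants = [
    ("chr1", 110, "A", "G"),   # inside the first site
    ("chr1", 200, "C", "T"),   # outside every site
    ("chr2", 49, "AT", "A"),   # deletion whose reference allele reaches position 50
]
example_hits = find_overlaps(example_sites, example_variants)
```

Using the span of the reference allele rather than the single position means a deletion that starts just before a site is still reported.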

Input

1. Off-target table (CSV)

This CSV table can have any number of columns, but it must include columns for chromosome, start, and end location. Pass the names of these three columns to --cols, separated by commas.

e.g., -f CHANGE-seq-BE_CRISPRme_Circle-Seq.all.2_7_2025.csv if the input file is in your working directory.
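Before submitting a job, it can help to confirm that the three names you plan to pass to --cols actually appear in the CSV header. The following is a small sketch using only the Python standard library. The function name read_off_targets is hypothetical and is not part of off_target_overlap_variant.py; the sample data is made up.

```python
import csv
import io


def read_off_targets(handle, cols):
    """Read (chrom, start, end) tuples from a CSV, given a --cols style string."""
    names = [name.strip() for name in cols.split(",")]
    if len(names) != 3:
        raise ValueError("expected three column names: chromosome, start, end")

    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in names if name not in header]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")

    chrom_col, start_col, end_col = names
    return [
        (row[chrom_col], int(row[start_col]), int(row[end_col]))
        for row in reader
    ]


# Hypothetical table with one extra column, which is simply ignored here.
example_csv = io.StringIO("#Chromosome,Start,End,Score\nchr1,100,122,0.9\n")
example_rows = read_off_targets(example_csv, "#Chromosome,Start,End")
```

For a real file, open it with open(path, newline="") and pass the handle in place of the StringIO object.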

2. Variant file (VCF)

Usually these files end with .haplotype.g.vcf.gz.

e.g., 3278245_VK_2506/249598260/3278245_VK_2506.haplotype.g.vcf.gz 3278246_VK_2507/249598265/3278246_VK_2507.haplotype.g.vcf.gz 3278247_VK_2508/249598269/3278247_VK_2508.haplotype.g.vcf.gz

You can provide any number of VCF files, as either relative or absolute paths.

Usage

hpcf_interactive

module load conda3/202402

source activate /home/yli11/.conda/envs/jupyterlab_2024

bsub -R "rusage[mem=40000] span[hosts=1]" -n 4 -P Genome -J VCF -q standard -oo job.out -eo job.err "off_target_overlap_variant.py -f CHANGE-seq-BE_CRISPRme_Circle-Seq.all.2_7_2025.csv -o overlap.result --cols '#Chromosome,Start,End' 3278245_VK_2506/249598260/3278245_VK_2506.haplotype.g.vcf.gz  3278246_VK_2507/249598265/3278246_VK_2507.haplotype.g.vcf.gz  3278247_VK_2508/249598269/3278247_VK_2508.haplotype.g.vcf.gz"

Output

The output keeps the original input table unchanged and appends one column per VCF file, named after that file, to indicate variant overlap. These VCF columns come last; if a cell lists one or more variants, that off-target overlaps those variants. Variant IDs use the chr:pos:ref:variant format, and the ref and variant bases are always given on the positive strand.
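For downstream analysis, a variant ID in the chr:pos:ref:variant format can be split back into its fields. The sketch below assumes a single ID per call; how multiple variants are separated within one cell is not specified here, so that step is left out. The function name parse_variant_id is hypothetical.

```python
def parse_variant_id(variant_id):
    """Split a 'chr:pos:ref:variant' ID into (chrom, pos, ref, alt)."""
    # Split from the right so a chromosome name that itself contains ':' stays intact.
    parts = variant_id.rsplit(":", 3)
    if len(parts) != 4:
        raise ValueError(f"not a chr:pos:ref:variant ID: {variant_id!r}")
    chrom, pos, ref, alt = parts
    return chrom, int(pos), ref, alt


# Made-up ID, for illustration only.
example_variant = parse_variant_id("chr1:110:A:G")
```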

code @ github.