gCrisprTools: Genome-wide CRISPR Screening¶
usage: crispr_seq.py [-h] [-j JID] [--interative] [-d DESIGN_MATRIX]
[-l GRNA_LIBRARY] [-c CONTROL_GRNA_GROUP]
[--min_read_count MIN_READ_COUNT] [-b BED]
(-f FASTQ_TSV | --guess_input) [-g GENOME]
analysis of crispr gRNA deep sequencing data
optional arguments:
-h, --help show this help message and exit
-j JID, --jid JID enter a job ID, which is used to make a new directory.
Every output will be moved into this folder. (default:
crispr_seq_yli11_2020-06-04)
--interative run pipeline interatively (default: False)
-d DESIGN_MATRIX, --design_matrix DESIGN_MATRIX
(Required) tsv 3 columns: group 1 , group 2,
comparison name. The second group is used as control.
(default: None)
-l GRNA_LIBRARY, --gRNA_library GRNA_LIBRARY
(Required) 3 columns csv, with header: id,seq,gene
(default: None)
-c CONTROL_GRNA_GROUP, --control_gRNA_group CONTROL_GRNA_GROUP
(Required) mageck format (default: None)
--min_read_count MIN_READ_COUNT
filter sgRNAs using read count, sgRNAs with less than
the given value will be filtered out (default: 10)
-b BED, --bed BED Genomic coordinates for gRNAs (Format: chr, start,
end, name). If provided, raw counts, logFC, logFDR
will be uploaded to protein paint for visualization.
(default: None)
-f FASTQ_TSV, --fastq_tsv FASTQ_TSV
tab delimited 3 columns (tsv file): Read 1 fastq,
Sample ID, group ID (default: None)
--guess_input Let the program generate the fastq.tsv and design.tsv
files for you. (default: False)
Genome Info:
-g GENOME, --genome GENOME
genome version: hg19, hg38, mm9, mm10. By default,
specifying a genome version will automatically update
index file, black list, chrom size and
effectiveGenomeSize, unless a user explicitly sets
those options. (default: hg19)
Summary¶
sgRNAs are counted with MAGeCK, and the significance of gRNA enrichment or depletion is then evaluated with gCrisprTools.
Usage¶
Go to your data directory and type the following.
Step 0: Load Python 2.7.13.
module load python/2.7.13
Step 1: Prepare the input files by generating fastq.tsv and design_matrix.tsv.
crispr_seq.py --guess_input
Note
Pairwise comparisons are specified in design_matrix.tsv. Both fastq.tsv and design_matrix.tsv are computer-generated, so check that they are correct before submitting the job.
Step 2: Submit your job.
You must prepare a gRNA library file before submitting; see the "gRNA library file" section below for details.
crispr_seq.py -f fastq.tsv -d design_matrix.tsv -l inhibation.gRNA.csv -c NON-TARGETING --interative
Input file¶
fastq.tsv & design_matrix.tsv¶
==> design_matrix.tsv <==
REP_DIFF_D5_BAND3HIGH REP_DIFF_D5_BAND3LOW REP_DIFF_D5_BAND3HIGH.vs.REP_DIFF_D5_BAND3LOW
REP_DIFF_D5_BAND3HIGH REP_DIFF_D3_BAND3LOW REP_DIFF_D5_BAND3HIGH.vs.REP_DIFF_D3_BAND3LOW
REP_DIFF_D5_BAND3HIGH REP_DIFF_D3_BAND3HIGH REP_DIFF_D5_BAND3HIGH.vs.REP_DIFF_D3_BAND3HIGH
REP_DIFF_D5_BAND3HIGH REP_D2_EXP REP_DIFF_D5_BAND3HIGH.vs.REP_D2_EXP
REP_DIFF_D5_BAND3HIGH REP_D8_EXP REP_DIFF_D5_BAND3HIGH.vs.REP_D8_EXP
REP_DIFF_D5_BAND3HIGH REP_D0_EXP REP_DIFF_D5_BAND3HIGH.vs.REP_D0_EXP
REP_DIFF_D5_BAND3HIGH REP_D5_EXP REP_DIFF_D5_BAND3HIGH.vs.REP_D5_EXP
REP_DIFF_D5_BAND3LOW REP_DIFF_D3_BAND3LOW REP_DIFF_D5_BAND3LOW.vs.REP_DIFF_D3_BAND3LOW
REP_DIFF_D5_BAND3LOW REP_DIFF_D3_BAND3HIGH REP_DIFF_D5_BAND3LOW.vs.REP_DIFF_D3_BAND3HIGH
REP_DIFF_D5_BAND3LOW REP_D2_EXP REP_DIFF_D5_BAND3LOW.vs.REP_D2_EXP
==> fastq.tsv <==
REP_DIFF_D5_BAND3HIGH_R3_C7.fastq.gz REP_DIFF_D5_BAND3HIGH_R3_C7 REP_DIFF_D5_BAND3HIGH
REP_DIFF_D5_BAND3LOW_R3_C11.fastq.gz REP_DIFF_D5_BAND3LOW_R3_C11 REP_DIFF_D5_BAND3LOW
REP_DIFF_D3_BAND3LOW_R1_B9.fastq.gz REP_DIFF_D3_BAND3LOW_R1_B9 REP_DIFF_D3_BAND3LOW
REP_DIFF_D5_BAND3LOW_R2_C10.fastq.gz REP_DIFF_D5_BAND3LOW_R2_C10 REP_DIFF_D5_BAND3LOW
REP_DIFF_D5_BAND3LOW_R1_C9.fastq.gz REP_DIFF_D5_BAND3LOW_R1_C9 REP_DIFF_D5_BAND3LOW
REP_DIFF_D3_BAND3HIGH_R3_B7.fastq.gz REP_DIFF_D3_BAND3HIGH_R3_B7 REP_DIFF_D3_BAND3HIGH
REP_DIFF_D3_BAND3LOW_R3_B11.fastq.gz REP_DIFF_D3_BAND3LOW_R3_B11 REP_DIFF_D3_BAND3LOW
REP_DIFF_D3_BAND3HIGH_R4_B8.fastq.gz REP_DIFF_D3_BAND3HIGH_R4_B8 REP_DIFF_D3_BAND3HIGH
REP_DIFF_D3_BAND3HIGH_R1_B5.fastq.gz REP_DIFF_D3_BAND3HIGH_R1_B5 REP_DIFF_D3_BAND3HIGH
REP_DIFF_D3_BAND3HIGH_R2_B6.fastq.gz REP_DIFF_D3_BAND3HIGH_R2_B6 REP_DIFF_D3_BAND3HIGH
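Because both files may be computer-generated, a quick consistency check before submitting can catch mismatched group names. The following is a minimal sketch, not part of crispr_seq.py: the helper names `read_tsv` and `check_inputs` are hypothetical, and the rules it applies (three tab-delimited columns per row, every group in design_matrix.tsv also present in the third column of fastq.tsv, no group compared with itself, unique comparison names) are assumptions drawn from the file descriptions above.

```python
import csv
import io


def read_tsv(handle):
    """Return the non-empty rows of a tab-delimited file as lists of fields."""
    return [row for row in csv.reader(handle, delimiter="\t")
            if any(field.strip() for field in row)]


def check_inputs(fastq_handle, design_handle):
    """Collect human-readable problems found in fastq.tsv and design_matrix.tsv.

    Hypothetical helper: the checks are assumptions based on the documented
    column layout, not the pipeline's own validation.
    """
    problems = []

    # fastq.tsv columns: Read 1 fastq, sample ID, group ID
    groups = set()
    for n, row in enumerate(read_tsv(fastq_handle), start=1):
        if len(row) != 3:
            problems.append("fastq.tsv row %d: expected 3 columns, found %d"
                            % (n, len(row)))
            continue
        groups.add(row[2].strip())

    # design_matrix.tsv columns: group 1, group 2 (control), comparison name
    seen_names = set()
    for n, row in enumerate(read_tsv(design_handle), start=1):
        if len(row) != 3:
            problems.append("design_matrix.tsv row %d: expected 3 columns, found %d"
                            % (n, len(row)))
            continue
        group1, control, name = [field.strip() for field in row]
        for group in (group1, control):
            if group not in groups:
                problems.append("design_matrix.tsv row %d: group %s is not in fastq.tsv"
                                % (n, group))
        if group1 == control:
            problems.append("design_matrix.tsv row %d: group compared with itself" % n)
        if name in seen_names:
            problems.append("design_matrix.tsv row %d: duplicate comparison name %s"
                            % (n, name))
        seen_names.add(name)
    return problems


# Small in-memory demonstration; real use would pass open file handles instead.
demo_fastq = io.StringIO(u"a.fastq.gz\ta\tHIGH\nb.fastq.gz\tb\tLOW\n")
demo_design = io.StringIO(u"HIGH\tLOW\tHIGH.vs.LOW\n")
for problem in check_inputs(demo_fastq, demo_design):
    print(problem)
```

To check real files, pass `open("fastq.tsv")` and `open("design_matrix.tsv")` to `check_inputs`. An empty result means only that these basic checks passed, not that the comparisons are the intended ones.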
gRNA library file¶
gRNA library csv file (--gRNA_library option, required)
This file specifies your gRNA library. It is a csv file where the columns are sgRNA id, sgRNA sequence, and the targeted gene. An example file is shown below.
id,seq,Gene
chr11:4167629-AAATTTCCTCAGCAGATTAC,AAATTTCCTCAGCAGATTAC,Gene1
Please_no_space_anywhere,ACAAGCAACAGTTGACCAAC,Gene1
could_be_anything,ACATGAGACTGGAAACCGCC,control
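The library file can be checked before a job is submitted. The sketch below is not part of the pipeline, and `check_library` is a hypothetical helper. The three-column layout and the no-space rule come from the description and example above. The requirements that sequences contain only A, C, G, and T and that ids are unique are assumptions added for illustration.

```python
import csv
import io


def check_library(handle):
    """Return a list of problems found in a gRNA library csv (id, seq, gene).

    Hypothetical helper; the ACGT-only and unique-id rules are assumptions.
    """
    problems = []
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None or len(header) != 3:
        return ["header: expected 3 columns (id, seq, gene)"]

    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append("line %d: expected 3 columns, found %d"
                            % (line_no, len(row)))
            continue
        guide_id, seq, gene = row
        if any(ch.isspace() for field in row for ch in field):
            problems.append("line %d: contains whitespace" % line_no)
        if not seq or set(seq.upper()) - set("ACGT"):
            problems.append("line %d: sequence is empty or has non-ACGT characters"
                            % line_no)
        if guide_id in seen_ids:
            problems.append("line %d: duplicate id %s" % (line_no, guide_id))
        seen_ids.add(guide_id)
    return problems


# In-memory demonstration using the example rows shown above.
example = io.StringIO(
    u"id,seq,Gene\n"
    u"chr11:4167629-AAATTTCCTCAGCAGATTAC,AAATTTCCTCAGCAGATTAC,Gene1\n"
    u"Please_no_space_anywhere,ACAAGCAACAGTTGACCAAC,Gene1\n"
    u"could_be_anything,ACATGAGACTGGAAACCGCC,control\n"
)
for problem in check_library(example):
    print(problem)
```

For a real file, call `check_library(open("inhibation.gRNA.csv"))`, using the library filename from the Step 2 command.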
Comments¶
code @ github.