CRISPR Screening Demultiplexing

Note

This protocol assumes your barcodes locating at 5’-end

Step 1: Prepare barcode.fa

Note

Your barcode file must start with ^. The ^ is supposed to indicate the the adapter is “anchored” at the beginning of the read.

>BE_D5_R1
^GCATGCAC
>BE_D5_R2
^TACGATGC
>ABE_DIFF_D5_R1
^CTATAGAG

Edit and save the file in sublime text, and then use FileZilla to upload it to HPC.

Step 2: Submit job

Tip

You can change the requested memory in rusage[mem=10000]. This example requests 10G memory. If the original fastq.gz file is less than 20G, then it doesn’t need much memory.

#BSUB -P split
#BSUB -oo split.out -eo split.err
#BSUB -n 1
#BSUB -q standard
#BSUB -R "rusage[mem=10000]"
#BSUB -J "Demultiplex"

module load python/3.7.0

cutadapt \
--no-indels \
-g file:barcode.fa \
--no-trim \
--untrimmed-output untrimmed.fastq.gz \
-o {name}.fastq.gz \
gRNA_S1_R1_001.fastq.gz

barcode.fa is the barcode file

gRNA_S1_R1_001.fastq.gz is the fastq file to be demultiplexed.

You can copy and edit the above code in sublime text, and save it as split.lsf and then use FileZilla to upload it to HPC. For example, you need to change the input file name (i.e., gRNA_S1_R1_001.fastq.gz) to yours.

Once you have the split.lsf on HPC, you can do the following to submit the job.

/home/yli11/HemTools/bin/dos2unix split.lsf
bsub < split.lsf

Include other parameters

Allow one mismatch -e 0.15

#BSUB -P split
#BSUB -oo split.out -eo split.err
#BSUB -n 1
#BSUB -q standard
#BSUB -R "rusage[mem=10000]"
#BSUB -J "Demultiplex"

module load python/3.7.0

cutadapt \
--no-indels \
-e 0.15 \
-g file:barcode.fa \
--no-trim \
--untrimmed-output untrimmed.fastq.gz \
-o {name}.fastq.gz \
gRNA_S1_R1_001.fastq.gz

Count N in the barcode sequence

Sometimes the read is CTGTANGTxxxxxx, your barcode is CTGTATGT, this is one mismatch, however, cutadapt will just simply ignore it. For this siuation, use --match-read-wildcards.

Note

The following script is still zero mismatch because N is not counted as a mismatch by cutadaptor.

#BSUB -P split
#BSUB -oo split.out -eo split.err
#BSUB -n 1
#BSUB -q standard
#BSUB -R "rusage[mem=10000]"
#BSUB -J "Demultiplex"

module load python/3.7.0

cutadapt \
--no-indels \
--match-read-wildcards \
-g file:barcode.fa \
--no-trim \
--untrimmed-output untrimmed.fastq.gz \
-o {name}.fastq.gz \
gRNA_S1_R1_001.fastq.gz

Relaxed string matching

Combining -e 0.15 and --match-read-wildcards

#BSUB -P split
#BSUB -oo split.out -eo split.err
#BSUB -n 1
#BSUB -q priority
#BSUB -R "rusage[mem=10000]"
#BSUB -J "Demultiplex"

module load python/3.7.0

cutadapt \
--match-read-wildcards \
-e 0.15 \
--no-indels \
-g file:barcode.fa \
--no-trim \
--untrimmed-output untrimmed.fastq.gz \
-o {name}.fastq.gz \
Undetermined_S0_R1_001.fastq.gz

code @ github.