CRISPR Screening Demultiplexing (hard trim first N random bp)

Note

This protocol assumes that each read starts with 4 random bp at the 5’ end, immediately followed by your barcode.

Summary

We first remove the first 4 bases of every read, and then demultiplex the trimmed reads by barcode in the same way as described in this post.
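To make the two steps concrete, here is a minimal shell sketch of what happens to a single read. The read sequence is made up for illustration (4 random bases, then the BE_D5_R1 barcode used in Step 1, then filler), and trim_prefix and has_barcode are hypothetical helper names. Real data should be processed with cutadapt as shown in Step 2.

```shell
# Illustration only: mimic the two steps on one made-up read sequence.
# trim_prefix and has_barcode are hypothetical helpers, not part of cutadapt.

# Drop the first N characters of a sequence (N = 4 random bases here).
trim_prefix() {
    # $1: sequence, $2: number of leading bases to remove
    printf '%s\n' "$1" | cut -c"$(( $2 + 1 ))"-
}

# Succeed only if the sequence starts with the barcode (an "anchored" match).
has_barcode() {
    # $1: sequence, $2: barcode
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

read_seq="ACGTGCATGCACTTTTGGGG"      # ACGT (random) + GCATGCAC (barcode) + rest
trimmed=$(trim_prefix "$read_seq" 4)
echo "$trimmed"                      # prints GCATGCACTTTTGGGG

if has_barcode "$trimmed" GCATGCAC; then
    echo "assigned to BE_D5_R1"
fi
```

Without the trimming step the barcode would sit at positions 5 to 12 of the read, so an anchored match at the start of the read would fail.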

Usage

Step 1: Prepare barcode.fa

Note

Each barcode sequence in the file must start with ^. The ^ indicates that the adapter is “anchored” at the beginning of the read.

>BE_D5_R1
^GCATGCAC
>BE_D5_R2
^TACGATGC
>ABE_DIFF_D5_R1
^CTATAGAG

Edit and save the file in Sublime Text, and then use FileZilla to upload it to the HPC.
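Before uploading, it can help to confirm that every barcode line carries the ^ anchor. The following is a small sketch of such a check. check_barcodes is a hypothetical helper, and it assumes barcodes consist only of the letters A, C, G, T and N.

```shell
# Hypothetical helper: verify that every sequence line in a barcode FASTA
# file starts with ^ and otherwise contains only A, C, G, T or N.
# Returns 0 if the file looks fine, 1 if any line is rejected.
check_barcodes() {
    # $1: path to the barcode FASTA file
    awk '
        { sub(/\r$/, "") }                 # tolerate Windows line endings
        /^>/ || /^$/ { next }              # skip header and blank lines
        substr($0, 1, 1) != "^" {
            print "line " NR ": missing ^ anchor: " $0
            bad = 1
            next
        }
        substr($0, 2) !~ /^[ACGTN]+$/ {
            print "line " NR ": unexpected characters: " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, running check_barcodes barcode.fa prints nothing and returns 0 for the file shown above, and reports the offending line numbers otherwise.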

Step 2: Submit job

Tip

You can change the requested memory in rusage[mem=10000]. This example requests 10 GB of memory. If the original fastq.gz file is smaller than 20 GB, the job does not need much memory.

#BSUB -P split
#BSUB -oo split.out -eo split.err
#BSUB -n 1
#BSUB -q standard
#BSUB -R "rusage[mem=10000]"
#BSUB -J "Demultiplex"

module load python/3.7.0

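# Remove the first 4 (random) bases from every read.
cutadapt --cut 4 -o output.fastq.gz gRNA_S1_R1_001.fastq.gz

# Convert barcode.fa to Unix line endings.
/home/yli11/HemTools/bin/dos2unix barcode.fa

# Split the trimmed reads by anchored 5' barcode.
# {name} is replaced by each barcode name in barcode.fa; --no-trim keeps the
# barcode in the read; reads matching no barcode go to untrimmed.fastq.gz.
cutadapt \
--no-indels \
-g file:barcode.fa \
--no-trim \
--untrimmed-output untrimmed.fastq.gz \
-o {name}.fastq.gz \
output.fastq.gz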

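barcode.fa is the barcode file prepared in Step 1.

gRNA_S1_R1_001.fastq.gz is the fastq file to be demultiplexed.

output.fastq.gz is the intermediate file with the first 4 bases removed; it is the input to the demultiplexing step.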

You can copy and edit the above code in Sublime Text, save it as split.lsf, and then use FileZilla to upload it to the HPC. For example, you need to change the input file name (i.e., gRNA_S1_R1_001.fastq.gz) to your own.
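If you prefer to change the input file name on the command line rather than in an editor, a sed substitution is one option. This is only a sketch: set_input_fastq is a hypothetical helper, and it assumes the new file name contains no | or & characters, which are special in this sed expression.

```shell
# Hypothetical helper: print a copy of the job script with the example
# input file name replaced by your own FASTQ file name.
set_input_fastq() {
    # $1: job script (e.g. split.lsf)
    # $2: new FASTQ file name (assumed to contain no "|" or "&")
    sed "s|gRNA_S1_R1_001\.fastq\.gz|$2|g" "$1"
}
```

For example, set_input_fastq split.lsf my_reads.fastq.gz > my_split.lsf writes the edited script to a new file and leaves the original untouched; my_reads.fastq.gz and my_split.lsf are placeholder names.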

Once split.lsf is on the HPC, run the following to submit the job.

/home/yli11/HemTools/bin/dos2unix split.lsf
bsub < split.lsf
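After the job finishes, a quick sanity check is to count the reads in each output file. The sketch below assumes standard 4-line FASTQ records; count_reads is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of reads in a gzip-compressed
# FASTQ file, assuming every record spans exactly 4 lines.
count_reads() {
    # $1: path to a .fastq.gz file
    lines=$(gzip -dc "$1" | wc -l | tr -d ' ')
    echo $(( lines / 4 ))
}
```

For example, count_reads BE_D5_R1.fastq.gz prints the number of reads assigned to that barcode. The counts for all barcode files plus untrimmed.fastq.gz should add up to the count for output.fastq.gz.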

code @ github.