cas9ENG¶
Input¶
Copy fastq files in the working dir and prepare the manifest file:
amplicon_seq: TTTCGGGTTTATTACAGGGACAGCAGAGATCCACTTTGGCGCCGGCGGATCCGGCATCGACTTCAAGGAGGANNNNNGGCTTAAGTAGGTACCGCACGTCGATATCTTCGAANNNNNNNNNNCCGGGTGCAAAGATGGATAAAGTTTTAAACAGAGAGGAATCTTTGCAGCTAATGGACCTTCTAGGTCTTGAAAGGAGTGGGAATTGGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
gRNA_seq: GGCATCGACTTCAAGGAGGA
barcode_seq: NNNNNNNNNN
PAM_seq: NNNNN
min_read_count: 5
input: 134_052820_input_merge.fastq.gz
cas9sso7d_rep2: 134_Cas9Sso7d_repB_S61_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
cas9_rep1: 134_Cas9_repA_S40_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
control_rep1: 134_Control_repA_S39_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
cas9_rep2: 134_Cas9_repB_S60_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
control_rep2: 134_Control_repB_S59_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
cas9sso7d_rep1: 134_Cas9Sso7d_repA_S41_R1_001.fastq.gz.merge.extendedFrags.fastq.gz
Usage¶
export PATH=$PATH:"/home/yli11/HemTools/bin"
hpcf_interactive.sh
module load conda3
source activate /home/yli11/.conda/envs/crispresso2_env/
cas9ENG.py -m input.yaml
Output¶
1.raw_barcode_PAM.count.csv¶
This output is generated for each input and control file. Fist two columns are aligned sequences between the input amplicon sequence and read. Only alignment passed the default CrisprESSO cutoff (60) is used and outputed.
PAM_count.csv output is generated for cas9 and cas9sso7d files. The following columns are different:
2is either “Reference_MODIFIED” or “Reference_UNMODIFIED”, crisprEsso output.PAM_seq2is the detected PAM sequence in the reads.PAM_seqis the final assigned PAM. The priority of assignment is : (1) use PAM_seq2 if possible (e.g., no deletion in PAM_seq2) (2) assign PAM according to the barcode (3) assign PAM to barcode within maximal mismatch=1.PAMis the second and third bp of PAM_seq,PAM_seq[1:3]
2.barcode_PAM.filter.csv¶
The barcode dictionary after filtering (default min_read_count=5)
3. editing frequency¶
edit.freq.csv
cutting_freq_heatmap.pdf