Tsai Lab Bioinformatics

Protein mutagenesis

Code: https://github.com/tsailabSJ/Cas9Variants/tree/master/Mammalian_system

1. Dictionary generation

See the PacBio_amplicon_sequencing folder. The code was originally written for Cas9 (Kasey) and was recently adapted to PE/RT (Kiera).

Kiera’s data is large, so I have to split the reads and run the pipeline on each part separately. Use split.sh to split the reads.
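For reference, the idea of splitting reads can be sketched in Python as below. This is not what split.sh does internally (check the script itself for that); it is a minimal sketch assuming an uncompressed FASTQ with 4 lines per record. The function name split_fastq, the chunk size, and the output naming are made up for illustration.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk=100_000):
    """Write consecutive chunks of a FASTQ file and return the chunk paths.

    Assumption: uncompressed FASTQ, exactly 4 lines per record.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(fastq_path).stem
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Take the next block of whole records (4 lines each).
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            # Hypothetical naming scheme: <stem>.part0000.fastq, <stem>.part0001.fastq, ...
            chunk_path = out_dir / f"{stem}.part{len(chunk_paths):04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk can then be run through the pipeline on its own, and the per-chunk outputs combined afterwards.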

Kasey dict: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/Cas9mutagenesis/pacbio_230301_293246_tsaigrp_Amplicon/pacbio_cas9mut_amp_yli11_2023-07-13

KJ001.read_stat.csv.all_dictionary.tsv

KJ002.read_stat.csv.all_dictionary.tsv

Kiera dict: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/pacbio/825386_tsaigrp_Amplicon_Kiera/create_barcode_yli11_2024-05-22

2. Compare enrichment of mutagenesis assay

See the main_v2.lsf pipeline description. The code is generic to Cas9 and PE/RT.

Usually Kasey runs this pipeline herself. Errors sometimes occur when the input format is wrong, e.g., inconsistent upper/lower case, tabs vs. spaces, or names that do not match between fastq.tsv and design_matrix.
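A quick pre-flight check can catch these input problems before submitting the job. The sketch below is not part of the pipeline. The helper names are hypothetical, and it assumes you have already pulled the names out of fastq.tsv and design_matrix into two lists (the actual column layout of those files is not described here).

```python
def check_sample_names(fastq_names, design_names):
    """Return a list of problems; the list is empty if every design name is in fastq.tsv."""
    problems = []
    fastq_set, design_set = set(fastq_names), set(design_names)
    # Map a normalized form (trimmed, lower-case) back to the spelling used in fastq.tsv.
    normalized = {name.strip().lower(): name for name in fastq_set}
    for name in sorted(design_set - fastq_set):
        near = normalized.get(name.strip().lower())
        if near is not None:
            problems.append(f"{name!r} differs from {near!r} only by case or whitespace")
        else:
            problems.append(f"{name!r} is in the design matrix but not in fastq.tsv")
    return problems


def find_lines_without_tabs(path):
    """Return 1-based line numbers of non-empty lines that contain no tab character."""
    with open(path) as handle:
        return [
            number
            for number, line in enumerate(handle, start=1)
            if line.strip() and "\t" not in line
        ]
```

For example, check_sample_names(["KJ001", "kj002 "], ["KJ001", "KJ002"]) reports that 'KJ002' and 'kj002 ' differ only by case or whitespace.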

The pipeline generates the interactive heatmap automatically. To generate it manually:

src=/home/yli11/Tools/Cas9Variants/Mammalian_system

module load conda3/202011

source activate /home/yli11/.conda/envs/captureC

cd {{jid}}

comparison=${COL1}_vs_${COL2}

interactive_heatmap.py -f $comparison.AA_compare.csv --reformat_config liyc --header -o $comparison.AA_compare.interactive.html

PAM specificity assay

code: https://github.com/tsailabSJ/LentiviralPAMspecificity

1. Dictionary generation

2. Calculate enrichment

Pooled-GUIDE-seq

pooled gRNA design: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/Azusa_pooled_GUIDE

code: guideseq_pool_gRNA_design_given_genes.py, guideseq_pool_gRNA_design.py

I think guideseq_pool_gRNA_design_given_genes.py is unfinished.

This is how I generated the library:

module load conda3/202011

source activate /home/yli11/.conda/envs/captureC

guideseq_pool_gRNA_design.py -h

guideseq_pool_gRNA_design.py -i tcell_exon.bed --sample 21083 -o guideseq_pool_run1.csv

GUIDE-seq, CHANGE-seq, CHANGE-seq BE

Lab members all know how to run these pipelines.

CHANGE-seq-BE: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/src/changeseq_py3

GUIDE-seq: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/src/changeseq_py3

CHANGE-seq: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/src/changeseq

PARADIGM

variant design code: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/Paradigm/Yichao_code

pipeline code: https://github.com/tsailabSJ/PARADIGM_code

R code to calculate different crisprscores

cutadapt used to extract variable sequence given two flanking sequences
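To illustrate what "extract variable sequence given two flanking sequences" means, here is a minimal Python sketch. It is not the cutadapt command used in the pipeline and it only does exact matching (cutadapt also tolerates mismatches). The function name and the example flanks are made up for illustration.

```python
def extract_between_flanks(read, left_flank, right_flank):
    """Return the sequence between the first left flank and the next right flank.

    Returns None if either flank is missing. Exact matching only.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    # Search for the right flank only downstream of the left flank.
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


# Hypothetical read and flanks, for illustration only.
variable = extract_between_flanks("TTTACGTGGGGCCATAAA", "ACGT", "CCAT")
print(variable)  # GGGG
```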

Genetic variation

data and code: /research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/GUIDEseq/02172023_GUIDEseq2_GV_Novaseq

randomized CHANGE-seq

code: https://github.com/tsailabSJ/changeseq_randomized

/research_jude/rgs01_jude/groups/tsaigrp/projects/Genomics/common/projects/CHANGE_seq/liyc_Mixed_Base_analysis

Usually Ashely runs this pipeline herself.

The code is on GitHub (see the link above).