CHANGE-seq analysis pipeline¶
usage: changeseq.py [-h]
{all,parallel,align,merge,identify,visualize,variants,reference-free}
...
optional arguments:
-h, --help show this help message and exit
subcommands:
Individual Step Commands
{all,parallel,align,merge,identify,visualize,variants,reference-free}
Use this to run individual steps of the pipeline
all Run all steps of the pipeline
parallel Run all steps of the pipeline in parallel
align Run alignment only
merge Merge paired end reads
identify Run identification only
visualize Run visualization only
variants Run variants analysis only
reference-free Run reference-free discovery only
Summary¶
Three new YAML parameters to specify:
PAM sequence (default:
PAM: NGG
)Read count threshold for off-target calling (default:
read_count_cutoff: 6
)Read length parameter for outputs from different sequencing platform (default:
read_length: 151
)
Fix the code to get the ‘realigned target sequence’ to use the new feature in regex library that provides fuzzy_changes
Fix bug in visualization.py where it stops when no off-targets are found
Add control read counts to output
Input¶
A YAML file, (same as before, the three new parameters are not required)
reference_genome: data/input/CIRCLEseq_test_genome.fa
analysis_folder: data/MergedOutput
bwa: bwa
samtools: samtools
window_size: 3
mapq_threshold: 50
start_threshold: 1
gap_threshold: 3
mismatch_threshold: 6
merged_analysis: True
PAM: NGG
read_count_cutoff: 6
read_length: 151
samples:
TestSample:
target: GAGTCCGAGCAGAAGAAGAANGG
read1: data/input/TEST.r1.fastq.gz
read2: data/input/TEST.r2.fastq.gz
controlread1: data/input/TEST_control.r1.fastq.gz
controlread2: data/input/TEST_control.r2.fastq.gz
description: TestCell
Usage¶
export PATH=$PATH:"/home/yli11/HemTools/bin"
hpcf_interative.sh
module load conda3
source activate /home/yli11/.conda/envs/change_seq
module load bwa
# copy test data
cp -r /home/yli11/Tools/changeseq/test .
cd test
changeseq.py all -m CIRCLEseq_MergedTest.yaml
Output files are located in data/MergedOutput/