Paired-end histone ChIP-seq or CUT&RUN¶
usage: HemTools cut_run_histone [-h] [-j JID] [--short] [--debug] [--broad]
[-f INPUT] [-d DESIGN_MATRIX] [--guess_input]
[-i INDEX_FILE] [-g GENOME] [-b BLACKLIST]
[-s CHROM_SIZE] [-e EFFECTIVEGENOMESIZE]
Named Arguments¶
- -j, --jid
enter a job ID, which is used to make a new directory. Every output will be moved into this folder.
Default: “{{subcmd}}_docs_2024-04-18”
- --short
Force to use the short queue. (only if R1+R2 fastq.gz size <=250M)
Default: False
- --debug
Not for end-user.
Default: False
- --broad
broad peak calling
Default: False
- -f, --input
tab delimited 3 columns (tsv file): Read 1 fastq, Read 2 fastq, sample ID
- -d, --design_matrix
tab delimited 3 columns (tsv file): treatment sample ID, control sample ID, peakcall ID
- --guess_input
Let the program generate the input files for you.
Default: False
Genome Info¶
- -i, --index_file
BWA index file
Default: “/home/docs/checkouts/readthedocs.org/user_builds/hemtools/checkouts/latest/subcmd/../hg19/bwa_16a_index/hg19.fa”
- -g, --genome
genome version: hg19, hg38, mm10, mm9.
Default: “hg19”
- -b, --Blacklist
Blacklist file
Default: “/home/docs/checkouts/readthedocs.org/user_builds/hemtools/checkouts/latest/subcmd/../hg19/Hg19_Blacklist.bed”
- -s, --chrom_size
chrome size
Default: “/home/docs/checkouts/readthedocs.org/user_builds/hemtools/checkouts/latest/subcmd/../hg19/hg19.chrom.sizes”
- -e, --effectiveGenomeSize
effectiveGenomeSize for bamCoverage
Default: “2451960000”
Usage¶
Go to your data directory and type the following.
Step 0: Load python version 2.7.13.
module load python/2.7.13
Step 1: Prepare input files, generate fastq.tsv.
HemTools cut_run_histone --guess_input
Input fastq files preparation complete! ALL GOOD!
Please check if you like the computer-generated labels in : fastq.tsv
Input peakcall file preparation complete! File name: peakcall.tsv
Note
If you are preparing fastq.tsv and peakcall.tsv yourself, please make sure no space anywhere
in the file. Note that the seperator is tab. Spaces in file name will cause errors.
Step 2: Check the computer-generated input list (manually), make sure they are correct.
less fastq.tsv
less peakcall.tsv
Note
a random string will be added to the generated files (e.g., fastq.94c049cbff1f.tsv) if they exist before running step 1.
Step 3a: (Narrow Peak) Submit your job.
HemTools cut_run_histone -f fastq.tsv -d peakcall.tsv
Step 3b: (Broad Peak) Submit your job.
HemTools cut_run_histone -f fastq.tsv -d peakcall.tsv --broad
Tip
If you have both narrow peak histone and broad peak histone data in your fastq.tsv, then the simplest way to run is just run step3a and step3b at the same time (see step3c
). Note that HemTools generates data in the current directory, then after everything is finished, move the files to the jid folder. That means, to run step3a and step3b at the same time, you have to create two new folders, see step3c
.
Step 3c: Run both narrow peak and broad peak for the same input
mkdir narrowPeak_call
cd narrowPeak_call
ln -s ../*.gz .
ln -s ../*.tsv .
HemTools cut_run_histone -f fastq.tsv -d peakcall.tsv
cd ..
mkdir broadPeak_call
cd broadPeak_call
ln -s ../*.gz .
ln -s ../*.tsv .
HemTools cut_run_histone -f fastq.tsv -d peakcall.tsv --broad
Sample input format¶
fastq.tsv
This is a tab-seperated-value format file. The 3 columns are: Read 1, Read 2, sample ID.
peakcall.tsv
This is also a tab-seperated-value format file. The 3 columns are: treatment sample ID, control/input sample ID, peakcall ID.
Report bug¶
Once the job is finished, you will be notified by email with some attachments. If no attachment can be found, it might be caused by an error. In such case, please go to the result directory (where the log_files folder is located) and type:
HemTools report_bug
Use different genome index¶
HemTools cut_run -f fastq.tsv -d peakcall.tsv -i YOUR_GENOME_INDEX
Comments¶
code @ github.