Total reads in peaks normalized bigwiggle track

usage: normalize_bw_given_peak.py [-h] [-j JID] -f INPUT_LIST
                                  [--paired_end_flag] [-g GENOME]
                                  [-e EFFECTIVEGENOMESIZE]

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        normalize_bw_given_peak_yli11_2019-08-20)
  -f INPUT_LIST, --input_list INPUT_LIST
                        a 3-col tsv file containing bam, peak, output_name
                        (default: None)
  --paired_end_flag

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. (default: hg19)
  -e EFFECTIVEGENOMESIZE, --effectiveGenomeSize EFFECTIVEGENOMESIZE
                        effectiveGenomeSize for bamCoverage (default:
                        2451960000)
Summary
Scale the bigwiggle files using the total number of reads that fall in a given peak file.
Tip: the fold-enrichment track (_FE.bw) produced by MACS2 is already a normalized ChIP-seq signal. If you already see a difference there, this FRiP-normalized method may make the difference disappear. See details below.
This normalization method rescales the data by the number of reads in peaks, the quantity behind FRiP (fraction of reads in peaks). The ENCODE standard for FRiP is 1%, which means high-quality data should have at least 1% of the total mapped reads in called peaks. FRiP depends on the TF, the cell line, and the quality of your ChIP-seq data. WT and KO samples may have true FRiP differences; with this method, such a difference can be removed, resulting in no visible difference in the visualization.
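The effect described above can be illustrated with a minimal Python sketch. It is not the code used by normalize_bw_given_peak.py: the function names, the target of 1,000,000 reads in peaks, and the read counts are all hypothetical and chosen only to show why scaling by reads in peaks can hide a true WT/KO difference.

```python
def frip(reads_in_peaks, total_mapped_reads):
    """Fraction of reads in peaks: reads in peaks / total mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / float(total_mapped_reads)


def peak_scale_factor(reads_in_peaks, target=1000000):
    """Hypothetical scale factor (an assumption, not the tool's formula):
    after scaling, the reads in peaks of every sample sum to `target`."""
    if reads_in_peaks <= 0:
        raise ValueError("reads_in_peaks must be positive")
    return target / float(reads_in_peaks)


# Made-up counts: same sequencing depth, but KO has 4x fewer reads in peaks.
samples = {
    "WT": {"total": 20000000, "in_peaks": 2000000},
    "KO": {"total": 20000000, "in_peaks": 500000},
}

for name in sorted(samples):
    s = samples[name]
    print("{}: FRiP={:.3f} scale_factor={:.2f}".format(
        name,
        frip(s["in_peaks"], s["total"]),
        peak_scale_factor(s["in_peaks"]),
    ))
```

In this made-up case the KO track is multiplied by a factor four times larger than the WT track, so the in-peak signal of the two samples ends up at the same level even though the underlying FRiP values differ.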
Input
A 3-column tsv file: bam, peak, output_name
The second column is the peak file, which should be the same for every line.
The first 4 columns of the peak file should be chr, start, end, name. Additional columns are allowed, but they will be ignored.
Bam index files (.bai) should be located in the same directory as the bam files. If you are using bam files generated by HemTools, they are in the same directory by default.
/path/to/file/GATA1_S10.markdup.bam myPeak.narrowPeak output1
GATA1_S1.bam myPeak.narrowPeak output2
GATA1_S2.markdup.bam myPeak.narrowPeak output3
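Before submitting a job, it can help to check the input list against the rules above. The following is a minimal sketch, not part of HemTools: the function names are hypothetical, and it assumes that an index for `x.bam` is named either `x.bam.bai` or `x.bai`.

```python
import os


def parse_input_list(lines):
    """Parse a 3-column tsv (bam, peak, output_name) into a list of tuples.

    Raises ValueError if a line does not have exactly 3 tab-separated
    columns, or if the peak file is not the same on every line.
    """
    rows = []
    for number, line in enumerate(lines, 1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(
                "line {}: expected 3 tab-separated columns, got {}".format(
                    number, len(fields)))
        rows.append(tuple(fields))
    peak_files = set(row[1] for row in rows)
    if len(peak_files) > 1:
        raise ValueError(
            "peak file differs between lines: {}".format(sorted(peak_files)))
    return rows


def missing_bam_indexes(rows, exists=os.path.exists):
    """Return the bam paths that have no index next to them.

    Assumption: the index is named <bam>.bai or <bam without .bam>.bai.
    """
    missing = []
    for bam, _peak, _name in rows:
        candidates = [bam + ".bai"]
        if bam.endswith(".bam"):
            candidates.append(bam[:-len(".bam")] + ".bai")
        if not any(exists(path) for path in candidates):
            missing.append(bam)
    return missing
```

A typical use would be to open input.list, pass the file object to `parse_input_list`, and then print whatever `missing_bam_indexes` returns before running normalize_bw_given_peak.py.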
Usage
hpcf_interactive
module load python/2.7.13
normalize_bw_given_peak.py -f input.list --paired_end_flag
For single-end data use:
normalize_bw_given_peak.py -f input.list
For other genomes use:
normalize_bw_given_peak.py -f input.list -g hg38
normalize_bw_given_peak.py -f input.list -g mm10
Output
Once finished, you will be notified by email. All generated bw files are located in the job ID folder.