Total reads in peaks normalized bigwiggle track

usage: normalize_bw_given_peak.py [-h] [-j JID] -f INPUT_LIST
                                  [--paired_end_flag] [-g GENOME]
                                  [-e EFFECTIVEGENOMESIZE]

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        normalize_bw_given_peak_yli11_2019-08-20)
  -f INPUT_LIST, --input_list INPUT_LIST
                        a 3-col tsv file containing bam, peak, output_name
                        (default: None)
  --paired_end_flag

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. (default: hg19)
  -e EFFECTIVEGENOMESIZE, --effectiveGenomeSize EFFECTIVEGENOMESIZE
                        effectiveGenomeSize for bamCoverage (default:
                        2451960000)

Summary

Scale the bigwiggle files using the total number of reads in a given peak file.

Tip: the fold-enrichment track (_FE.bw) produced by MACS2 is already a normalized ChIP-seq signal. If you already see a difference there, this FRiP-normalized method may make the difference disappear. See details below.

This normalization method rescales the data by the number of reads in peaks. The related quality metric is FRiP, the fraction of reads in peaks. The ENCODE standard for FRiP is 1%, meaning that high-quality data should have at least 1% of the total mapped reads in called peaks. FRiP depends on the TF, the cell line, and the quality of your ChIP-seq data. WT and KO samples may have true FRiP differences; this method can remove such a difference, so that the visualization shows no difference.
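The effect described above can be illustrated with a little arithmetic. The sketch below is not the code used by normalize_bw_given_peak.py; the function names, the read counts, and the "signal per one million reads in peaks" scale factor are all assumptions made for illustration. It shows how a WT and a KO sample with different FRiP end up with the same scaled in-peak signal.

```python
def frip(reads_in_peaks, total_mapped_reads):
    """Fraction of reads in peaks (FRiP)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / float(total_mapped_reads)


def passes_encode_frip(frip_value, threshold=0.01):
    """True if FRiP meets the 1% ENCODE standard mentioned above."""
    return frip_value >= threshold


def peak_read_scale_factor(reads_in_peaks, per=1e6):
    """Hypothetical scale factor: signal per `per` reads in peaks.

    Assumption for illustration only; the real tool may use a
    different formula.
    """
    if reads_in_peaks <= 0:
        raise ValueError("reads_in_peaks must be positive")
    return per / float(reads_in_peaks)


# Made-up read counts: (total mapped reads, reads in peaks).
samples = {"WT": (20000000, 2000000), "KO": (20000000, 500000)}

for name in ("WT", "KO"):
    total, in_peaks = samples[name]
    factor = peak_read_scale_factor(in_peaks)
    # After scaling, the in-peak signal of every sample is the same.
    scaled = in_peaks * factor
    print("{0}: FRiP={1:.3f} scale_factor={2:.2f} scaled_reads_in_peaks={3:.0f}".format(
        name, frip(in_peaks, total), factor, scaled))
```

With these made-up counts, WT has a FRiP of 0.100 and KO a FRiP of 0.025, yet both samples are scaled to the same in-peak total. This is why a true WT/KO difference can vanish in the normalized tracks.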

Input

A 3-column tsv file: bam, peak, output_name

The second column of the tsv file is the peak file, which should be the same for every line.

The first 4 columns of the peak file should be chr, start, end, name. Additional columns are allowed, but they will be ignored.

Bam index files (.bai) should be located in the same directory as the bam files. Bam files generated by HemTools have their index files in the same directory by default.

/path/to/file/GATA1_S10.markdup.bam     myPeak.narrowPeak       output1
GATA1_S1.bam    myPeak.narrowPeak       output2
GATA1_S2.markdup.bam    myPeak.narrowPeak       output3
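Before submitting, it can help to check the input list against the requirements above. The following is a minimal sketch, not part of HemTools: the helper names read_input_list and check_input_list are hypothetical, and the index lookup assumes the common naming patterns sample.bam.bai or sample.bai.

```python
import csv
import os


def read_input_list(path):
    """Read the 3-column tsv (bam, peak, output_name), skipping blank lines."""
    rows = []
    with open(path) as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_number, fields in enumerate(reader, start=1):
            if not fields or all(not field.strip() for field in fields):
                continue
            if len(fields) != 3:
                raise ValueError("line {0}: expected 3 tab-separated columns, got {1}".format(
                    line_number, len(fields)))
            rows.append(tuple(field.strip() for field in fields))
    return rows


def check_input_list(rows):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The peak file (second column) should be the same for every line.
    peak_files = sorted(set(peak for _, peak, _ in rows))
    if len(peak_files) > 1:
        problems.append("more than one peak file: " + ", ".join(peak_files))
    # Each bam file needs an index next to it (assumed naming patterns).
    for bam, _, _ in rows:
        candidates = [bam + ".bai", os.path.splitext(bam)[0] + ".bai"]
        if not any(os.path.exists(candidate) for candidate in candidates):
            problems.append("no index found for " + bam)
    return problems
```

For example, check_input_list(read_input_list("input.list")) returns an empty list when every line uses the same peak file and every bam file has an index beside it.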

Usage

hpcf_interactive

module load python/2.7.13

For paired-end data use:

normalize_bw_given_peak.py -f input.list --paired_end_flag

For single-end data use:

normalize_bw_given_peak.py -f input.list

For other genomes use:

normalize_bw_given_peak.py -f input.list -g hg38

normalize_bw_given_peak.py -f input.list -g mm10

Output

Once finished, you will be notified by email. All generated bw files are located in the job ID folder.

code @ github.