Plot bw file correlation

usage: plot_bw_corr.py [-h] [-j JID] [-f BW_FILES] [-b BIN_SIZE]
                       [--bed_file BED_FILE] [-r REGION] [-o OUTPUT]
                       [--addon_parameter ADDON_PARAMETER]

plot correlation for all bw files in the current dir

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        plot_bw_corr_yli11_2021-08-02)
  -f BW_FILES, --bw_files BW_FILES
                        input file or use all bw files in the current dir
                        (default: None)
  -b BIN_SIZE, --bin_size BIN_SIZE
  --bed_file BED_FILE
  -r REGION, --region REGION
                        Could be chr11:5267561-5277281, HBG region (default:
                        None)
  -o OUTPUT, --output OUTPUT
  --addon_parameter ADDON_PARAMETER

Summary

Plot Spearman correlation for all bw files in the current dir. By default, the bin size is 10 kb.

Update: users can now provide a peak file for calculating the correlation.

Input

Copy the bw files to your working dir. If you have a peak file, you can copy it there as well. If you have multiple peak files, merge them first (for how to merge, see: Merge_bed) and then copy the merged bed file there.

No specific input files are needed because all bw files in the current dir will be automatically used.

You can also control the input files with the -f option. The files have to be quoted and separated by spaces, e.g., "file1.bw file2.bw file3.bw"
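As a small illustration of the quoting rule, the sketch below assembles the value for -f from a Python list and prints the resulting command line. The file names are hypothetical placeholders, and the command is only printed, not executed.

```python
# Hypothetical file names, used only for illustration.
bw_files = ["file1.bw", "file2.bw", "file3.bw"]

# -f expects ONE quoted string with the files separated by spaces.
f_value = " ".join(bw_files)
command = 'plot_bw_corr.py -f "%s"' % f_value

print(command)
```

Without the surrounding quotes, the shell would pass only the first file to -f and treat the rest as separate arguments.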

Output

In these plots, blue color indicates density.

../../_images/bw_corr.png

Usage

Go to your data directory and type the following.

Step 0: Load python version 2.7.13.

hpcf_interactive

module load python/2.7.13

Step 1: Run the program

plot_bw_corr.py

2019-10-18 14:40:49,646 - INFO - main - The job id is: plot_bw_corr_yli11_2019-10-18
2019-10-18 14:40:49,763 - INFO - submit_pipeline_jobs - cor has been submitted; JobID: 88117190

Note

You can also control the bin size and a specific region to use when calculating correlations. See the example below.

plot_bw_corr.py -b 150 -r chr11:5267561-5277281
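To get a feel for how -b and -r interact, the sketch below parses the region string from the example above and counts how many 150 bp bins would tile it. It assumes bins are laid end to end across the region with a partial last bin; how plot_bw_corr.py actually handles region boundaries is not stated here, and parse_region is a hypothetical helper, not part of the tool.

```python
import math


def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, coords = region.split(":")
    start, end = coords.split("-")
    return chrom, int(start), int(end)


chrom, start, end = parse_region("chr11:5267561-5277281")
region_length = end - start  # length of the region in bp

bin_size = 150
# Assumption: bins tile the region; a partial last bin still counts as one bin.
n_bins = int(math.ceil(region_length / float(bin_size)))

print("%s: %d bp, %d bins of %d bp" % (chrom, region_length, n_bins, bin_size))
```

Under this assumption the region is shorter than the default 10 kb bin, which is why a smaller bin size is paired with -r in the example.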

Usage: user input bed file

The following command uses all bw files in the current dir and a user-provided bed file to calculate the correlation. The outputs are [output_label]_spearman_bed.pdf and [output_label]_pearson_bed.pdf

plot_bw_corr.py --bed_file input.bed
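The two output files differ in the correlation measure. As a rough illustration of that difference (not the tool's own implementation), the sketch below computes Pearson correlation directly and Spearman correlation as Pearson correlation on ranks. The signal values are made up, and the simple ranks helper assumes there are no ties.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up signal in five regions for two samples; the last region is an outlier.
sample_a = [1.0, 2.0, 3.0, 4.0, 50.0]
sample_b = [2.0, 3.0, 5.0, 4.0, 400.0]

r_pearson = pearson(sample_a, sample_b)
r_spearman = spearman(sample_a, sample_b)
print("pearson=%.3f spearman=%.3f" % (r_pearson, r_spearman))
```

In this toy case the single high-signal region pushes the Pearson value close to 1, while the Spearman value depends only on the ordering of the regions.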

Re-order labels

Once the program has finished, if you are not satisfied with the order of the columns and rows, you can re-order the figure.

Input is the plotCorrelation.tab in your result folder.

Details are also provided in: plot_corr_reorder

Example — compare your ATAC-seq to public blood lineage ATAC-seq

Save the bw file list in blood_data as input.list.

hpcf_interactive

module load python/2.7.13

for i in `cat input.list`;do ln -s $i;done

## ln -s your own bw files here

plot_bw_corr.py

Why do the low values look wider than the high values in the scatter plot?

../../_images/log2_scatter_plots.png

There are several reasons:

    1. Low-expressed genes usually tend to have more variance.

    2. The scale is different. Most dots are squeezed into a small area, maybe 0-200, while the axis range is 0-1400 (left figure), which makes the two conditions (X and Y) look much the same. The log-transformed plot, on the other hand, has a smaller range, which makes the variance visible.

    3. The log transformation is not linear. Differences between low values become larger and differences between high values become smaller. See: https://people.revoledu.com/kardi/tutorial/Regression/nonlinear/NonLinearTransformation.htm

    4. Specifically for MA-plots or any other count-based scatter plots: after log-transforming low values, for example 1-10, the points look quite sparse on the X-axis. On the Y-axis, since it is a ratio, they still look continuous. Together, this makes the MA-plot look like the one at https://mikelove.github.io/counts-model/model.html
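The non-linearity of the log transformation can be checked with a few numbers. The sketch below takes two hypothetical pairs of values with the same raw difference, one pair low and one pair high, and compares their gaps after a log2 transformation. The values are invented for illustration.

```python
import math

# Hypothetical value pairs with the same raw difference (2 units each).
low_pair = (2.0, 4.0)
high_pair = (1000.0, 1002.0)


def raw_gap(pair):
    """Absolute difference on the original scale."""
    return abs(pair[1] - pair[0])


def log2_gap(pair):
    """Absolute difference after a log2 transformation."""
    return abs(math.log(pair[1], 2) - math.log(pair[0], 2))


low_gap = log2_gap(low_pair)
high_gap = log2_gap(high_pair)

print("raw gaps:  low=%.1f high=%.1f" % (raw_gap(low_pair), raw_gap(high_pair)))
print("log2 gaps: low=%.4f high=%.4f" % (low_gap, high_gap))
```

The same raw difference spans a full log2 unit at the low end but only a tiny fraction of a unit at the high end, so low values appear more spread out on a log-scaled plot.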

log2 or log10 transformation?

Visible differences are unlikely.
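One way to see why: log2 and log10 of the same value differ only by a constant factor, so switching between them rescales the axes without changing the shape of the point cloud. A minimal check, using arbitrary example values:

```python
import math

# Arbitrary positive example values (1.0 is avoided because log10(1) is 0).
values = [3.0, 10.0, 250.0, 1400.0]

# Ratio of log2 to log10 for each value.
ratios = [math.log(v, 2) / math.log10(v) for v in values]

for v, r in zip(values, ratios):
    print("value=%7.1f  log2/log10=%.6f" % (v, r))
```

Every ratio equals log2(10), about 3.32, regardless of the value.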

code @ github.