Plot correlation scatter plots

usage: [-h] -f F [-s S] -x X -y Y [--index INDEX]
                          [--regression] [--diagnal_line] [--lowess]
                          [-bc BACKGROUND_DOTS_COLOR] [-hc HIGHLIGHT_COLOR]
                          [--highlight HIGHLIGHT] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -f F                  data table (default: None)
  -s S                  sep (default: )
  -x X                  column name for x axis (default: None)
  -y Y                  column name for y axis (default: None)
  --index INDEX         index name for index (default: None)
  --regression          by default it is a dignal line (default: False)
  --diagnal_line        force to draw a dignal line (default: False)
  --lowess              fit a curve (default: False)
  -bc BACKGROUND_DOTS_COLOR, --background_dots_color BACKGROUND_DOTS_COLOR
                        background_dots_color (default: #0000ff)
  -hc HIGHLIGHT_COLOR, --highlight_color HIGHLIGHT_COLOR
                        highlight_color (default: #ff1500)
  --highlight HIGHLIGHT
                        column name for y axis, sep by comma (default: None)
  -o OUTPUT, --output OUTPUT
                        output file name (default:


This script can be used to calculate and plot sample correlation. Values are log-transformed, e.g., log2(x+1). Additionally, you can highlight selected points, for example to point out differences between the two samples.


The input can be a TSV or CSV file. For TSV use -s "\t"; for CSV use -s ,
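How a separator given with -s might be applied when reading the table can be sketched as follows. This is a minimal illustration using only the Python standard library, not the script's actual code; the read_table helper and its handling of a literal backslash-t are assumptions.

```python
import csv
import io


def read_table(text, sep="\t"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    # Assumption: a shell passes -s "\t" as the two characters backslash
    # and t, so that spelling is mapped to a real tab character here.
    if sep == "\\t":
        sep = "\t"
    return list(csv.DictReader(io.StringIO(text), delimiter=sep))


# The same two rows written once as TSV and once as CSV.
tsv_text = "Geneid\tBanana\tOrange\nasd1\t800\t22\nb\t24\t16\n"
csv_text = "Geneid,Banana,Orange\nasd1,800,22\nb,24,16\n"

rows_tsv = read_table(tsv_text, sep="\\t")
rows_csv = read_table(csv_text, sep=",")
```

Both calls yield the same rows; note that the values are still strings and need converting to numbers before any transform.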

Geneid  Chr     Start   End     Strand  Length  Banana  Orange
asd1    chr1    3513707 3514076 +       370     800     22
b       chr1    3538168 3538438 +       271     24      16
a       chr1    3970540 3970785 +       246     16      6
b       chr1    4059120 4059436 +       317     44      12
a       chr1    4388977 4389294 +       318     22      11
b       chr1    4561768 4562101 +       334     31      11
a       chr1    4760133 4760340 +       208     23      9
b       chr1    5073062 5073299 +       238     36      20

The example above is a read count table for two ChIP-seq replicates.

To see the correlation between Banana and Orange, use -x Banana -y Orange. This plots the Banana column on the X-axis and the Orange column on the Y-axis.
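The calculation behind such a plot can be sketched in a few lines of standard-library Python. This is only an illustration under the assumption that the transform is log2(x+1) and that the coefficient is computed on the transformed values; the helper names log2p1 and pearson are hypothetical and are not taken from the script.

```python
import math


def log2p1(values):
    """Apply the log2(x + 1) transform to each value."""
    return [math.log2(v + 1) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Banana and Orange read counts from the example table.
banana = [800, 24, 16, 44, 22, 31, 23, 36]
orange = [22, 16, 6, 12, 11, 11, 9, 20]

x = log2p1(banana)  # values for the X-axis
y = log2p1(orange)  # values for the Y-axis
r = pearson(x, y)
print(f"Pearson r on log2(x+1) values: {r:.3f}")
```

The +1 keeps zero counts finite, since log2(0) is undefined.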


Sample correlation usage

hpcf_interactive -q standard -R "rusage[mem=10000]"

module load conda3

source activate /home/yli11/.conda/envs/py2/ -f input.tsv -s "\t" -x Banana -y Orange

Point highlight usage

hpcf_interactive -q standard -R "rusage[mem=10000]"

module load conda3

source activate /home/yli11/.conda/envs/py2/ -f input.tsv -s "\t" -x Banana -y Orange --index Geneid --highlight asd1 --regression

--regression adds a regression line. For the sample correlation plot the regression line is on; for the differential plot it is off by default.
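The source does not say which regression is fitted. As a rough sketch, assuming an ordinary least-squares straight line, the slope and intercept could be computed as below; fit_line is a hypothetical helper, not the script's function. A diagonal line, by contrast, is simply y = x and needs no fitting.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```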

If you have multiple points to highlight, use --highlight asd1,another_name,another_name2. These names must match values in the index column, which is set with --index Geneid.
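The matching between highlight names and the index column can be pictured with the sketch below. It is an assumption about the general approach, not the script's code; split_highlight and partition_rows are hypothetical helpers.

```python
def split_highlight(arg):
    """Split a comma-separated --highlight value into a set of names."""
    return {name.strip() for name in arg.split(",") if name.strip()}


def partition_rows(rows, index_col, names):
    """Separate rows into highlighted and background by their index value."""
    highlighted = [r for r in rows if r[index_col] in names]
    background = [r for r in rows if r[index_col] not in names]
    return highlighted, background


# A few rows from the example table, keyed by column name.
rows = [
    {"Geneid": "asd1", "Banana": 800, "Orange": 22},
    {"Geneid": "b", "Banana": 24, "Orange": 16},
    {"Geneid": "a", "Banana": 16, "Orange": 6},
]

names = split_highlight("asd1,another_name,another_name2")
highlighted, background = partition_rows(rows, "Geneid", names)
```

Names that do not occur in the index column, such as another_name here, simply match nothing, so a misspelled name fails silently in this sketch.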


sample correlation

The value shown in the upper-left corner is the Pearson correlation coefficient.


differential analysis highlight

The value shown in the upper-left corner is the Pearson correlation coefficient.


code @ github.