Replicate correlation and QC for HiC data

Input

Use hicpro_to_bedpe.py to generate inputs 1 and 2 below (the contact matrix and the bed file).

1. contact matrix for your two samples (.gz)

I found that the last column (the contact count) has to be an integer.

21      10050000        21      10050000        6
21      10050000        21      10150000        1
21      10050000        21      11000000        1
21      10100000        21      41700000        1
21      10200000        21      40350000        1
21      10350000        21      10900000        1
21      10350000        21      25600000        1
21      10400000        21      10400000        12
21      10400000        21      10450000        4
21      10400000        21      10500000        1
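
If the counts come out as floats, one way to satisfy the integer requirement is to round the last column. The following is a minimal Python sketch, not part of hicpro_to_bedpe.py or 3DChromatin_ReplicateQC; the function name and the choice to round to the nearest integer are assumptions.

```python
def normalize_count_line(line):
    """Return a 5-column contact line with the count coerced to an integer."""
    chrom1, pos1, chrom2, pos2, count = line.split()
    # Assumption: rounding to the nearest integer is acceptable for the counts.
    int_count = int(round(float(count)))
    return "\t".join([chrom1, pos1, chrom2, pos2, str(int_count)])


rows = [
    "21\t10050000\t21\t10050000\t6.0",
    "21\t10050000\t21\t10150000\t1",
]
fixed = [normalize_count_line(row) for row in rows]
```

Lines that already carry integer counts pass through unchanged, so the function can be applied to every line of the matrix before it is gzipped again.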

2. bed.gz

chr22   0       50000   0
chr22   50000   100000  50000
chr22   100000  150000  100000
chr22   150000  200000  150000
chr22   200000  250000  200000
chr22   250000  300000  250000
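
The rows above follow a regular pattern: fixed-width bins, with the fourth column repeating the bin start. As a rough sketch of that layout, assuming a hypothetical chromosome length and that the last bin is clipped to the chromosome end (neither is stated in these notes):

```python
def make_bins(chrom, chrom_length, resolution):
    """Return bin rows (chrom, start, end, name) covering one chromosome."""
    rows = []
    for start in range(0, chrom_length, resolution):
        # Assumption: the last bin is clipped to the chromosome length.
        end = min(start + resolution, chrom_length)
        # The fourth column repeats the bin start, as in the example rows.
        rows.append("\t".join([chrom, str(start), str(end), str(start)]))
    return rows


# 120000 is a made-up length used only to keep the example short.
bins = make_bins("chr22", 120000, 50000)
```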

It looked like the chr prefix might need to be removed, but it does not (tested).

You do have to remove chrM from both the matrix and the bed file.
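
A minimal Python sketch of the chrM filtering, assuming whitespace-delimited lines. The function name and the `excluded` default are illustrative; if your files name the mitochondrial chromosome differently (for example `M` or `MT`), pass that name instead.

```python
def drop_chromosome(lines, chrom_columns, excluded=("chrM",)):
    """Keep only lines whose chromosome columns are not in `excluded`."""
    kept = []
    for line in lines:
        fields = line.split()
        # Drop the line if any of the chromosome columns names an excluded chromosome.
        if any(i < len(fields) and fields[i] in excluded for i in chrom_columns):
            continue
        kept.append(line)
    return kept


matrix_lines = [
    "chrM\t0\tchrM\t50000\t3\n",
    "chr21\t0\tchr21\t50000\t2\n",
    "chr21\t0\tchrM\t0\t1\n",
]
bed_lines = ["chrM\t0\t50000\t0\n", "chr22\t0\t50000\t0\n"]

# The matrix has chromosome names in columns 0 and 2, the bed file only in column 0.
clean_matrix = drop_chromosome(matrix_lines, chrom_columns=(0, 2))
clean_bed = drop_chromosome(bed_lines, chrom_columns=(0,))
```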

3. metadata

==> metadata.pairs <==
HIC001  HIC002

==> metadata.samples <==
HIC001  /home/yli11/Programs/3DChromatin_ReplicateQC/examples/HIC001.res50000.gz
HIC002  /home/yli11/Programs/3DChromatin_ReplicateQC/examples/HIC002.res50000.gz
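
The two metadata files can also be written programmatically. This sketch assumes tab-delimited columns and that every pair of samples should be compared; `build_metadata` is a hypothetical helper, not part of 3DChromatin_ReplicateQC.

```python
from itertools import combinations


def build_metadata(samples):
    """Build the text of metadata.samples and metadata.pairs.

    samples: list of (sample_name, matrix_path) tuples.
    """
    samples_text = "".join("{}\t{}\n".format(name, path) for name, path in samples)
    names = [name for name, _ in samples]
    # Assumption: compare every sample against every other sample once.
    pairs_text = "".join("{}\t{}\n".format(a, b) for a, b in combinations(names, 2))
    return samples_text, pairs_text


samples_text, pairs_text = build_metadata([
    ("HIC001", "/home/yli11/Programs/3DChromatin_ReplicateQC/examples/HIC001.res50000.gz"),
    ("HIC002", "/home/yli11/Programs/3DChromatin_ReplicateQC/examples/HIC002.res50000.gz"),
])
```

Writing `samples_text` to metadata.samples and `pairs_text` to metadata.pairs reproduces the two files shown above.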

Notes

https://github.com/kundajelab/3DChromatin_ReplicateQC

Installation

You have to create a new conda env with Python 2.7 because HiFive requires Python 2.

R > 3.4

I had no problem following installation.sh, but for R I had to install some things manually.

Overall the installation is smooth, although the original documentation does not specify Python 2.7.

Starter example

Finished correctly.

3d_genome_py2

bsub -q priority -P Genomics -R 'rusage[mem=60000]' 3DChromatin_ReplicateQC run_all --metadata_samples metadata.samples --metadata_pairs metadata.pairs --bins /home/yli11/dirs/hg19_20copy_result/keep_dup_Jurkat_20copy/hicpro_results/hic_results/matrix/Jurkat_20copy/HiCPro_100000_bed.repQC.gz --outdir replicate_QC
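
When submitting many runs, it can help to assemble the same command from its parts so that the quoting of the rusage string is handled for you. A small Python 3 sketch under that assumption; `build_bsub_command` and the `bins.bed.gz` placeholder are hypothetical, while the queue, project and flags are the ones used in the command above.

```python
import shlex


def build_bsub_command(samples, pairs, bins, outdir, mem_mb=60000):
    """Return the bsub submission line for a run_all job as a single string."""
    args = [
        "bsub", "-q", "priority", "-P", "Genomics",
        "-R", "rusage[mem={}]".format(mem_mb),
        "3DChromatin_ReplicateQC", "run_all",
        "--metadata_samples", samples,
        "--metadata_pairs", pairs,
        "--bins", bins,
        "--outdir", outdir,
    ]
    # shlex.quote adds single quotes only where the shell needs them.
    return " ".join(shlex.quote(arg) for arg in args)


# "bins.bed.gz" is a placeholder; use the path to your own bins file.
command = build_bsub_command("metadata.samples", "metadata.pairs", "bins.bed.gz", "replicate_QC")
```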

code @ github.