MNase-based HiCHIP data analysis¶
usage: hichip.py [-h] [-j JID] -f FASTQ_TSV -p PEAK_BED [-IntType INTTYPE]
[-BINSIZE BINSIZE] [-mapq_cutoff MAPQ_CUTOFF]
[-LowDistThr LOWDISTTHR] [-UppDistThr UPPDISTTHR]
[-UseP2PBackgrnd USEP2PBACKGRND] [-BiasType BIASTYPE]
[-MergeInt MERGEINT] [-QVALUE QVALUE] [-g GENOME]
[--bwa_index BWA_INDEX] [-s CHROM_SIZE]
RGT_HINT atac-seq footprint with bias correction
optional arguments:
-h, --help show this help message and exit
-j JID, --jid JID enter a job ID, which is used to make a new directory.
Every output will be moved into this folder. (default:
hichip_yli11_2022-08-01)
-f FASTQ_TSV, --fastq_tsv FASTQ_TSV
3-col tsv, R1, R2, sample_ID (default: None)
-p PEAK_BED, --peak_bed PEAK_BED
narrowpeak file (default: None)
-IntType INTTYPE #Interaction type - 1: peak to peak (CTCF) 2: peak to
non peak 3: peak to all (default, h3k27ac) 4: all to
all 5: everything from 1 to 4. (default: 3)
-BINSIZE BINSIZE # Size of the bins, in bases, for detecting the
interactions. (default: 2500)
-mapq_cutoff MAPQ_CUTOFF
mapq_cutoff (default: 0)
-LowDistThr LOWDISTTHR
Lower distance threshold of interaction between two
segments (default: 10000)
-UppDistThr UPPDISTTHR
Upper distance threshold of interaction between two
segments (default: 2000000)
-UseP2PBackgrnd USEP2PBACKGRND
# Applicable only for peak to all output interactions
- values: 0 / 1 # if 1, uses only peak to peak loops
for background modeling - corresponds to FitHiChIP(S)
# if 0, uses both peak to peak and peak to nonpeak
loops for background modeling - corresponds to
FitHiChIP(L) (default: 1)
-BiasType BIASTYPE # parameter signifying the type of bias vector -
values: 1 / 2 # 1: coverage bias regression 2: ICE
bias regression (default: 1)
-MergeInt MERGEINT # following parameter, if 1, means that merge
filtering (corresponding to either FitHiChIP(L+M) or
FitHiChIP(S+M)) # depending on the background model,
would be employed. Otherwise (if 0), no merge
filtering is employed. (default: 1)
-QVALUE QVALUE FDR (q-value) threshold for loop significance
(default: 0.05)
Genome Info:
-g GENOME, --genome GENOME
genome version: hg19, hg38, mm9, mm10. By default,
specifying a genome version will automatically update
index file, black list, chrom size and
effectiveGenomeSize, unless a user explicitly sets
those options. (default: hg19)
--bwa_index BWA_INDEX
bwa index file (default: /home/yli11/Data/Human/hg19/i
ndex/bwa_16a_index/hg19.fa)
-s CHROM_SIZE, --chrom_size CHROM_SIZE
chrome size (default: /home/yli11/Data/Human/hg19/anno
tations/hg19.chrom.sizes)
Summary¶
Pipeline is adopted from: https://hichip.readthedocs.io/en/latest/index.html
Input¶
Requires paired-end fastq files and a peak (narrowPeak format) file. If we are doing a CTCF HiChIP, then we should have done a CTCF ChIP-seq in order to have the CTCF narrowPeak file.
Usage¶
Go to your fastq files folder and do the following:
hpcf_interactive
module load python/2.7.13
run_lsf.py --guess_input
hichip.py -f fastq.tsv -p CTCF.bed -g hg19 -IntType 1
hichip.py -f fastq.tsv -p H3k27ac.bed -g hg19 -IntType 3
1 means peak-to-peak interaction 3 means peak-to-all interaction (see help message for other types 1-5)
Output¶
1. Library QC¶
See hichip_qc_summary.html
. QC standards are:
Metric |
Shallow Seq (20M) |
Deep Seq (100-200M) |
---|---|---|
No-Dup Read Pairs |
>75% |
>50% |
No-dup cis read pairs ≥ 1kb |
>20% |
>20% |
Total reads in 1000 bp around center of peaks |
>2% |
>2% |
NOTE: The FitHiCHIP author said if ``No-dup cis read pairs ≥ 10kb `` is around 10M, you should increase bin size (i.e., lower resolution) to 20kb in order to see more significant loops. I found if the value is near 50M, then 2.5kb or 5kb bin is OK; I’m aiming for 10k to 20k total interactions.
I also found this GATA1 HiCHIP data in mouse https://www.pnas.org/doi/full/10.1073/pnas.2008672117 where they used juicer tools and identified 40K-80K interactions (I guess these are ALL to ALL interactions).
To just redo fithichip calling, modify the config file (hichip.config.txt
) in the jobID folder and run the following:
Remember to change OutDir
, otherwise the default output folder fithichip_results
will be overwritten.
bsub -q priority -P Genomics -R 'rusage[mem=30000]' -J fit /home/yli11/Programs/FitHiChIP/FitHiChIP_HiCPro.sh -C hichip.config.txt
2. Called interactions¶
*fithic.merged.counts.bedpe
and *fithic.merged.pvalue.bedpe
FitHiChIP detailed results are provided in fihichip_results
. It also provides a result summary html file.
For fitHiCHIP called interactions, 10k to 20k are already good numbers (Q value 0.05) because this tool is quite stringent.
The above files contain merged loops. We found the unmerged interactions seem to be better (more sensitive). You can find them in folders like fithichip_results/FitHiChIP_Peak2ALL_{binSize}_L10000_U2000000/P2PBckgr_0/Coverage_Bias/FitHiC_BiasCorr/*.interactions_FitHiC_{Q-value}.bed
3. Tracks visualization¶
In the upload
folder, users can find:
mapped.PT.bw: read pairs bw files
macs2_peaks.narrowPeak: de-novo called peaks from HiCHIP data
fithic.merged.counts.bedpe: FitHiChIP called interactions
user-provided peak file
Users can run create_tracks.py --current_dir -g $genome
to upload these files to protein paint.