Footprint analysis for ATAC-seq data

usage: atac_seq_footprint.py [-h] [-j JID] -f INPUT [-t TREATMENT]
                             [-c CONTROL] [-g GENOME]

RGT_HINT atac-seq footprint with bias correction

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        atac_seq_footprint_yli11_2020-07-04)
  -f INPUT, --input INPUT
                        3-col tsv, bam,bed,output-prefix (default: None)
  -t TREATMENT, --treatment TREATMENT
                        default is the output-prefix in the first row.
                        treatment output-prefix for differential footprint
                        analysis, should match to names in the input file
                        (default: None)
  -c CONTROL, --control CONTROL
                        default is the second row. control output-prefix for
                        differential footprint analysis (default: None)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. By default,
                        specifying a genome version will automatically update
                        index file, black list, chrom size and
                        effectiveGenomeSize, unless a user explicitly sets
                        those options. (default: hg19)
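To make the options above concrete, here is a minimal argparse sketch that mirrors the documented interface. It is not the script's actual source. The composition of the default job ID as <prefix>_<user>_<date> is inferred from the example default "atac_seq_footprint_yli11_2020-07-04" and is an assumption. Restricting -g to the four listed genomes with choices= is also an assumption; the real script may accept other values.

```python
import argparse
import datetime
import getpass

# Genome versions listed in the help text above.
SUPPORTED_GENOMES = ["hg19", "hg38", "mm9", "mm10"]


def _current_user():
    try:
        return getpass.getuser()
    except (KeyError, OSError):  # e.g. no passwd entry inside some containers
        return "user"


def default_jid(prefix="atac_seq_footprint"):
    # Assumption: default job ID = <prefix>_<user>_<YYYY-MM-DD>, inferred from
    # the example default "atac_seq_footprint_yli11_2020-07-04".
    return "%s_%s_%s" % (prefix, _current_user(), datetime.date.today().isoformat())


def build_parser():
    parser = argparse.ArgumentParser(
        description="RGT_HINT atac-seq footprint with bias correction")
    parser.add_argument("-j", "--jid", default=default_jid(),
                        help="job ID; every output is moved into this folder")
    parser.add_argument("-f", "--input", required=True,
                        help="3-col tsv: bam, bed, output-prefix")
    parser.add_argument("-t", "--treatment", default=None,
                        help="treatment output-prefix (default: first row)")
    parser.add_argument("-c", "--control", default=None,
                        help="control output-prefix (default: second row)")
    genome = parser.add_argument_group("Genome Info")
    # Assumption: only the four documented genome versions are accepted.
    genome.add_argument("-g", "--genome", default="hg19",
                        choices=SUPPORTED_GENOMES, help="genome version")
    return parser
```

For example, build_parser().parse_args(["-f", "input.list", "-t", "H2", "-c", "H1"]) yields treatment H2, control H1, genome hg19 and a date-stamped job ID.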

Summary

This pipeline runs HINT-ATAC (v0.13) and outputs bias-corrected footprint BED files and cut-site bigWig (bw) files.

Additionally, if the -t and -c options are given, the program performs differential footprint analysis. For an example, see the HINT tutorial: https://www.regulatory-genomics.org/hint/tutorial/.

By default, -t uses the output-prefix in the first row of the input file, and -c uses the one in the second row.
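The default contrast rule can be sketched as follows, assuming the output-prefixes are supplied in input-file row order. The function name resolve_contrast is hypothetical and not taken from the script. Rejecting a sample contrasted against itself is an extra check added for illustration.

```python
def resolve_contrast(prefixes, treatment=None, control=None):
    """Return (treatment, control) output-prefixes for differential analysis.

    prefixes: output-prefixes (3rd column) in input-file row order.
    Falls back to the first row for treatment and the second row for control;
    explicit names must match a prefix in the input file.
    """
    if treatment is None:
        if not prefixes:
            raise ValueError("input file has no rows")
        treatment = prefixes[0]
    if control is None:
        if len(prefixes) < 2:
            raise ValueError("need at least two rows to pick a default control")
        control = prefixes[1]
    for name in (treatment, control):
        if name not in prefixes:
            raise ValueError("%r is not an output-prefix in the input file" % name)
    if treatment == control:
        # Illustrative extra check: comparing a sample with itself is meaningless.
        raise ValueError("treatment and control must differ")
    return treatment, control
```

With the H1/H2 input below, resolve_contrast(["H1", "H2"]) gives ("H1", "H2"), while passing -t H2 -c H1 gives ("H2", "H1").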

Input

The input file is a tab-separated (TSV) file with 3 columns: bam, bed, and output-prefix (sample name).

Paths may be relative or absolute.

For example, suppose you run this pipeline in the bam_files folder generated by HemTools atac_seq. The input file could then look like this:

Hudep1.markdup.bam      ../peak_files/Hudep1.markdup.rmchrM_peaks.narrowPeak    H1
Hudep2.markdup.bam      ../peak_files/Hudep2.markdup.rmchrM_peaks.narrowPeak    H2
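A minimal sketch of reading such a file, assuming relative paths are resolved against the directory the pipeline is launched from. The names parse_input_lines and read_input_list are hypothetical. Splitting on any whitespace (so both tabs and the space-aligned example above parse) assumes paths contain no spaces. The duplicate-prefix check is an illustrative addition, since prefixes name the outputs.

```python
import os


def parse_input_lines(lines, base_dir=None, check_exists=True):
    """Parse bam/bed/output-prefix rows into (bam, bed, prefix) tuples.

    Relative paths are joined to base_dir (default: current working directory);
    absolute paths are kept as-is because os.path.join ignores base_dir for them.
    """
    base_dir = base_dir or os.getcwd()
    rows, seen = [], set()
    for lineno, line in enumerate(lines, 1):
        fields = line.split()  # tabs or runs of spaces; assumes no spaces in paths
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 columns (bam, bed, prefix), got %d"
                             % (lineno, len(fields)))
        bam, bed, prefix = fields
        bam = os.path.normpath(os.path.join(base_dir, bam))
        bed = os.path.normpath(os.path.join(base_dir, bed))
        if check_exists:
            for path in (bam, bed):
                if not os.path.exists(path):
                    raise IOError("line %d: file not found: %s" % (lineno, path))
        if prefix in seen:
            raise ValueError("line %d: duplicate output-prefix %r" % (lineno, prefix))
        seen.add(prefix)
        rows.append((bam, bed, prefix))
    return rows


def read_input_list(path, base_dir=None, check_exists=True):
    with open(path) as handle:
        return parse_input_lines(handle, base_dir, check_exists)
```

Run from bam_files, "../peak_files/..." in the example above resolves to the sibling peak_files folder.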

We recommend creating a new working directory and copying the input data into it, so that the input file lists bare file names:

Hudep1.markdup.bam      Hudep1.markdup.rmchrM_peaks.narrowPeak  H1
Hudep2.markdup.bam      Hudep2.markdup.rmchrM_peaks.narrowPeak  H2
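The recommended setup can be scripted roughly as below. This is a sketch: stage_inputs and the default list name input.list are hypothetical. Copying a neighbouring .bam.bai index when present is an assumption (harmless if the pipeline does not use it), and the file-name collision check guards against two inputs overwriting each other in the flat working directory.

```python
import os
import shutil


def stage_inputs(rows, workdir, list_name="input.list"):
    """Copy each BAM/BED file into workdir and write an input list with bare names.

    rows: (bam_path, bed_path, prefix) tuples, e.g. taken from the original list.
    Returns the path of the newly written input list.
    """
    if not os.path.isdir(workdir):
        os.makedirs(workdir)
    names, lines = set(), []
    for bam, bed, prefix in rows:
        for src in (bam, bed):
            name = os.path.basename(src)
            if name in names:
                raise ValueError("two inputs share the file name %r" % name)
            names.add(name)
            shutil.copy(src, workdir)  # copies into workdir, keeping the name
        # Assumption: bring along a BAM index if one sits next to the BAM.
        if os.path.exists(bam + ".bai"):
            shutil.copy(bam + ".bai", workdir)
        lines.append("\t".join([os.path.basename(bam), os.path.basename(bed), prefix]))
    list_path = os.path.join(workdir, list_name)
    with open(list_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return list_path
```

The returned path can then be passed straight to -f.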

Output

  1. bias-corrected bigWig files

Look for *_bc.bw in the {{jid}} folder.

  2. called footprints

Look for *.bed in the {{jid}} folder.

  3. differential motifs

Results are in the Diff_footprints folder.

The .txt file contains the p-values.

The PDF shows a scatter plot of the p-values.
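To gather these outputs after a run, a small glob-based sketch can help. It assumes Diff_footprints sits inside the {{jid}} folder, which the text above does not state explicitly; adjust diff_dir if it lands elsewhere. The function name collect_outputs is hypothetical, and the *.bed pattern will also match any other BED files placed in the job folder.

```python
import glob
import os


def collect_outputs(jid):
    """Group pipeline outputs in the job folder by type (sorted file lists)."""
    # Assumption: the differential results folder lives inside the job folder.
    diff_dir = os.path.join(jid, "Diff_footprints")
    return {
        "bias_corrected_bigwig": sorted(glob.glob(os.path.join(jid, "*_bc.bw"))),
        # Matches every BED file directly in the job folder, not only footprints.
        "footprints_bed": sorted(glob.glob(os.path.join(jid, "*.bed"))),
        "diff_txt": sorted(glob.glob(os.path.join(diff_dir, "*.txt"))),
        "diff_pdf": sorted(glob.glob(os.path.join(diff_dir, "*.pdf"))),
    }
```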

Usage

module load python/2.7.13

atac_seq_footprint.py -f input.list

Or, to also run differential footprint analysis with H2 as treatment and H1 as control on hg19:

module load python/2.7.13

atac_seq_footprint.py -f input.list -t H2 -c H1 -g hg19

Reference

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1642-2

Including conservation may enhance the footprint plot:

https://slowkow.github.io/CENTIPEDE.tutorial/

https://link.springer.com/article/10.1186/s13059-020-1929-3

https://www.regulatory-genomics.org/motif-analysis/additional-motif-data/

https://www.regulatory-genomics.org/rgt/rgt-data-folder/

Other new tools

https://github.com/loosolab/TOBIAS

https://github.com/Boyle-Lab/TRACE
