Extract user-defined gene promoters from the RefSeq TSS database¶
usage: get_promoter.py [-h] -f INPUT_LIST [-u U] [-d D] [-o OUTPUT]
                       [-g GENOME] [--refseq_TSS REFSEQ_TSS]
                       [--gene_name_db GENE_NAME_DB]

optional arguments:
  -h, --help            show this help message and exit
  -f INPUT_LIST, --input_list INPUT_LIST
                        gene list, any type of Entrez ID, Ensemble Gene ID,
                        Ensemble Transcript ID, gene name (default: None)
  -u U                  upstream bp (default: 1000)
  -d D                  downstream bp (default: 200)
  -o OUTPUT, --output OUTPUT
                        output bed file name (default:
                        get_promoter_yli11_2019-12-10.bed)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. By default,
                        specifying a genome version will automatically update
                        index file, black list, chrom size and
                        effectiveGenomeSize, unless a user explicitly sets
                        those options. (default: hg19)
  --refseq_TSS REFSEQ_TSS
                        refseq_TSS (default:
                        /home/yli11/Data/Human/hg19/annotations/hg19.tss.refseq.bed.gz)
  --gene_name_db GENE_NAME_DB
                        gene_name_db (default:
                        /home/yli11/Data/Human/hg19/annotations/gene_id_name_all.conversion)
Summary¶
Although the -g option lists hg19, hg38, mm9, and mm10, the tool currently works only for hg19 and mm9.
Input¶
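A list of genes, one per line. According to the help text, Entrez IDs, Ensembl gene IDs, Ensembl transcript IDs, and gene names are accepted. For example, a list of gene names: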
HBB
TP53
Parameters¶
-u number of bp to extend upstream of the TSS (default: 1000)
-d number of bp to extend downstream of the TSS (default: 200)
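The effect of -u and -d can be illustrated with a short Python sketch. This is not the tool's actual implementation; it assumes the window is strand-aware, so that "upstream" points toward higher coordinates for genes on the minus strand. The function name promoter_interval and the TSS position used for HBB are assumptions, chosen so that the result matches the example row in the Output section.

```python
def promoter_interval(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) of a strand-aware window around a TSS.

    Hypothetical helper for illustration only; the real get_promoter.py
    may compute its intervals differently.
    """
    if strand == "+":
        # Plus strand: upstream lies at lower coordinates.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Minus strand: upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at 0 so the window never runs off the chromosome start.
    return max(start, 0), end


# Assumed TSS position for HBB on the minus strand (hg19).
print(promoter_interval(5248301, "-"))
```

With the default -u 1000 and -d 200, the window is 1200 bp wide regardless of strand.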
Output¶
By default, the output file is named get_promoter_USER_DATE.bed; use -o to choose a different name.
==> get_promoter_yli11_2019-12-10.bed <==
chr11 5248101 5249301 HBB 0 -
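To sanity-check an output file, each row can be parsed and its width compared with -u plus -d. The sketch below assumes the six columns follow the usual BED6 order (chromosome, start, end, name, score, strand) and are separated by whitespace; BedRecord and parse_bed6 are hypothetical names, not part of get_promoter.py.

```python
from collections import namedtuple

# Assumed BED6 column order; the tool's documentation does not name the columns.
BedRecord = namedtuple("BedRecord", "chrom start end name score strand")


def parse_bed6(line):
    """Parse one whitespace-separated BED6 line into a BedRecord."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns, got %d" % len(fields))
    chrom, start, end, name, score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, score, strand)


record = parse_bed6("chr11 5248101 5249301 HBB 0 -")
width = record.end - record.start
print(record.name, width)
```

For the example row, the width is 1200 bp, which equals the default 1000 bp upstream plus 200 bp downstream.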
Usage¶
With the defaults (-u 1000 -d 200):
hpcf_interactive
module load python/2.7.13
module load bedtools
get_promoter.py -f genes.list
For the mouse genome, use:
get_promoter.py -f genes.list -g mm9