Motif Annotation
usage: motif_annotation.py [-h] [-j JID] -f BED_FILE [-d1 D1] [-d2 D2]
[-d3 D3] [-g GENOME] [-m MOTIF_LIST]
[-a GENE_ANNOTATION]
motif annotation
optional arguments:
-h, --help show this help message and exit
-j JID, --jid JID enter a job ID, which is used to make a new directory.
Every output will be moved into this folder. (default:
motif_annotation_yli11_2020-07-22)
-f BED_FILE, --bed_file BED_FILE
a bed file with chr, start, end as the first 3
columns, addtional columns will be ignored (default:
None)
-d1 D1 extend query bed for intersection (default: 0)
-d2 D2 extend tss for intersection (default: 2000)
-d3 D3 extend epi for intersection (default: 0)
Genome Info:
-g GENOME, --genome GENOME
genome version: hg19, hg38, mm9, mm10. By default,
specifying a genome version will automatically update
index file, black list, chrom size and
effectiveGenomeSize, unless a user explicitly sets
those options. (default: hg19)
-m MOTIF_LIST, --motif_list MOTIF_LIST
a list of motif location bed files (default:
/home/yli11/Data/Human/hg19/motif_mapping/motif.list)
-a GENE_ANNOTATION, --gene_annotation GENE_ANNOTATION
gene annotation file (default: /home/yli11/Data/Human/
hg19/Ensembl_v99_2020_Jan/hg19.ensembl.TSS.gene_name.b
ed)
Summary
Given a bed file, this program adds columns for the nearest TSS and for overlapping known motifs.
Currently only hg19 and mm9 are supported.
Input
A bed file whose first 3 columns are chr, start, end. Additional columns will be ignored.
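As a rough illustration of the input rule above, the sketch below reads BED-like lines and keeps only the first three columns. The function name `read_bed3` is hypothetical and is not part of motif_annotation.py; skipping comment and header lines is also an assumption, not documented behavior.

```python
def read_bed3(lines):
    """Yield (chr, start, end) tuples from BED-like lines, ignoring extra columns."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and common BED header lines are skipped.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("expected at least 3 tab-separated columns: %r" % line)
        # Only chr, start, end are used; anything after column 3 is dropped.
        yield fields[0], int(fields[1]), int(fields[2])


example = ["chr11\t4167360\t4167570\tpeak1\t100\n"]
regions = list(read_bed3(example))
print(regions)
```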
Usage
hpcf_interactive
module load python/2.7.13
motif_annotation.py -f loci.bed -g hg19
motif_annotation.py -f loci.bed -g mm9
You will be notified by email when the job is finished.
Use -d1, -d2, and -d3 to extend the query regions, the TSS, and the epi regions, respectively, before intersection (defaults: 0, 2000, and 0).
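One plausible reading of the extension options is that a region is widened by the given number of bases on both sides before intersection. The sketch below shows that idea only. Whether the script extends symmetrically and clamps the start at 0 is an assumption, and `extend_region` is a hypothetical helper, not a function from motif_annotation.py.

```python
def extend_region(start, end, d):
    """Widen a region by d bases on each side (assumed symmetric), clamping start at 0."""
    return max(0, start - d), end + d


# With the default extension of 0 the coordinates are unchanged.
unchanged = extend_region(4167360, 4167570, 0)
# A hypothetical extension of 100 bases on each side.
widened = extend_region(4167360, 4167570, 100)
# Regions near the chromosome start are clamped at 0 (assumption).
clamped = extend_region(50, 200, 100)
print(unchanged, widened, clamped)
```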
Output
The first 3 columns come from the user’s bed file, followed by columns giving the region name and the extended coordinates (by default, the extension length is 0).
Next come nearest_TSS_gene, nearest_TSS_distance, and hard_assignment. hard_assignment indicates whether the region overlaps a promoter (by default, defined as TSS ± 2 kb). If it does, the value is the gene name; otherwise it is “.”.
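The hard_assignment rule can be approximated as shown below. This sketch assumes that nearest_TSS_distance is the distance between the region and the nearest TSS, and that a region counts as overlapping the promoter when that distance is at most the -d2 value. The script itself intersects extended intervals and may treat boundaries differently; `hard_assignment` here is a hypothetical helper.

```python
def hard_assignment(nearest_gene, nearest_distance, d2=2000):
    """Return the gene name if the region lies within d2 of the TSS, else "."."""
    # Assumption: a distance of exactly d2 still counts as overlapping.
    if abs(nearest_distance) <= d2:
        return nearest_gene
    return "."


near = hard_assignment("OR55B1P", 955)       # within 2 kb of the TSS
far = hard_assignment("RP11-23F23.2", 4781)  # beyond 2 kb
print(near, far)
```

Both calls use values from the example output, where a distance of 955 yields the gene name and a distance of 4781 yields “.”.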
The last column, merged_info, lists the overlapping motifs as a comma-separated list. It is empty when no motif overlaps the region.
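For downstream analysis it can help to split the motif column into individual hits. The sketch below assumes each entry has the form `<motif>_<chr>_<position>`, as the example entries suggest. The meaning of the trailing number is not documented here, so it is treated as a generic position. `parse_merged_info` is a hypothetical helper, and chromosome names that contain underscores would need different handling.

```python
def parse_merged_info(cell):
    """Split a merged_info cell into (motif, chrom, position) tuples."""
    cell = cell.strip()
    # Treat an empty cell (or ".") as "no overlapping motifs".
    if not cell or cell == ".":
        return []
    hits = []
    for entry in cell.split(","):
        # Assumed layout: motif name, chromosome, position, joined by "_".
        motif, chrom, pos = entry.rsplit("_", 2)
        hits.append((motif, chrom, int(pos)))
    return hits


cell = ("HOCOM_ANDR_HUMAN.H11MO.1.A_chr11_4203580,"
        "HOCOM_AP2B_HUMAN.H11MO.0.B_chr11_4203571")
hits = parse_merged_info(cell)
for motif, chrom, pos in hits:
    print(motif, chrom, pos)
```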
query_chr | query_start | query_end | query_name | query_extend_start | query_extend_end | nearest_TSS_gene | nearest_TSS_distance | hard_assignment | merged_info |
---|---|---|---|---|---|---|---|---|---|
chr11 | 4167360 | 4167570 | chr11:4167360-4167570 | 4167360 | 4167570 | OR55B1P | 955 | OR55B1P | |
chr11 | 4203440 | 4203590 | chr11:4203440-4203590 | 4203440 | 4203590 | RP11-23F23.2 | 4781 | . | HOCOM_ANDR_HUMAN.H11MO.1.A_chr11_4203580,HOCOM_AP2B_HUMAN.H11MO.0.B_chr11_4203571 |
chr11 | 4208260 | 4208430 | chr11:4208260-4208430 | 4208260 | 4208430 | RP11-23F23.2 | 0 | RP11-23F23.2 | HOCOM_AP2A_HUMAN.H11MO.0.A_chr11_4208404,HOCOM_AP2B_HUMAN.H11MO.0.B_chr11_4208332,HOCOM_AP2B_HUMAN.H11MO.0.B_chr11_4208406 |
chr11 | 4208860 | 4209050 | chr11:4208860-4209050 | 4208860 | 4209050 | RP11-23F23.2 | 490 | RP11-23F23.2 | HOCOM_AP2B_HUMAN.H11MO.0.B_chr11_4208943 |
chr11 | 4216260 | 4216410 | chr11:4216260-4216410 | 4216260 | 4216410 | RP11-23F23.2 | 7890 | . | HOCOM_AIRE_HUMAN.H11MO.0.C_chr11_4216392 |
chr11 | 4218180 | 4218330 | chr11:4218180-4218330 | 4218180 | 4218330 | RP11-23F23.2 | 9810 | . | HOCOM_ANDR_HUMAN.H11MO.0.A_chr11_4218218,HOCOM_ANDR_HUMAN.H11MO.0.A_chr11_4218220,HOCOM_ANDR_HUMAN.H11MO.2.A_chr11_4218229 |
chr11 | 4353240 | 4353390 | chr11:4353240-4353390 | 4353240 | 4353390 | SSU72P3 | 2145 | . | |
chr11 | 4403240 | 4403410 | chr11:4403240-4403410 | 4403240 | 4403410 | OR52B3P | 3728 | . |