Convert bed files to vcf

usage: bed2vcf.py [-h] -f TSV --required_cols REQUIRED_COLS --info_cols
                  INFO_COLS --info_types INFO_TYPES [-o OUTPUT]
                  [--column_number COLUMN_NUMBER] [--add_chr]
                  [--remove_cols REMOVE_COLS] [--log10_cols LOG10_COLS]

optional arguments:
  -h, --help            show this help message and exit
  -f TSV, --tsv TSV     input tsv file (default: None)
  --required_cols REQUIRED_COLS
                        input the chr, pos, ID, ref, alt col names in order
                        (default: None)
  --info_cols INFO_COLS
                        input col names (please make sure no spaces in the col
                        names) to be put in the info column (default: None)
  --info_types INFO_TYPES
                        input col types for the info columns, should match
                        info cols in order. choose from Integer, Float, String
                        (default: None)
  -o OUTPUT, --output OUTPUT
                        output file name (default: bed2vcf.vcf)
  --column_number COLUMN_NUMBER
                        if no header, then columns will be named as numbers,
                        start from 0 (default: None)
  --add_chr             add string chr to the chrom column (default: False)
  --remove_cols REMOVE_COLS
                        remove columns not used in the vcf file (default:
                        None)
  --log10_cols LOG10_COLS
                        convert a col to -np.log10 (default: None)
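
The --info_cols and --info_types lists are matched by position, so they must have the same length. The sketch below illustrates that pairing; it is not the script's actual code. The helper build_info_headers is hypothetical, and the Number=1 value and the Description text are assumptions made for illustration. Only the general ##INFO line layout comes from the VCF format itself.

```python
ALLOWED_TYPES = {"Integer", "Float", "String"}  # the types listed in the help text


def build_info_headers(info_cols, info_types):
    """Pair comma-separated names and types into VCF ##INFO header lines.

    Hypothetical helper for illustration only; bed2vcf.py may build
    its header differently.
    """
    names = info_cols.split(",")
    types = info_types.split(",")
    if len(names) != len(types):
        raise ValueError("--info_cols and --info_types must have the same length")
    lines = []
    for name, vcf_type in zip(names, types):
        if " " in name:
            raise ValueError("column name contains a space: %r" % name)
        if vcf_type not in ALLOWED_TYPES:
            raise ValueError("unknown type: %r" % vcf_type)
        # Number=1 and the Description text are assumptions, not taken from the tool.
        lines.append('##INFO=<ID=%s,Number=1,Type=%s,Description="%s">'
                     % (name, vcf_type, name))
    return lines


headers = build_info_headers("Nearest_gene,GENO,MAF", "String,String,Float")
print("\n".join(headers))
```

Checking the two lists before writing any output catches a mismatched or misspelled type early, rather than producing a VCF header that downstream tools reject.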

Summary

This program converts any TSV file, such as a BED file, into a compressed, tabix-indexed VCF file.

The output VCF file can be visualized in ProteinPaint with the following JSON track.

The "locusinfo":{ "key":"P" } entry is the key option: it selects the INFO field (here P) whose values are used to spread the variants out in the display.

{
    "type":"vcf",
    "name":"TableS3",
    "file":"yli11/CRM_gRNA/SGP/TableS3.hg19.vcf.gz",
    "itemlabelname":"variant",
    "axisheight":200,
    "vcfinfofilter":{
        "lst":[
            {
                "name":"-log10 P value",
                "locusinfo":{ "key":"P" },
                "numericfilter":[{"side":">","value":2},{"side":">","value":10}]
            }
        ],
        "setidx4numeric":0
    }
}
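
If several VCF files need the same kind of track, the JSON can be generated rather than edited by hand. The sketch below uses only the fields that appear in the example track; the function name make_vcf_track and its parameters are hypothetical, and the meaning of each field is taken from the example, not from ProteinPaint documentation.

```python
import json


def make_vcf_track(name, vcf_file, info_key, label, cutoffs):
    """Build a track dictionary with the same layout as the example JSON.

    Hypothetical helper; only fields shown in the example are used.
    """
    return {
        "type": "vcf",
        "name": name,
        "file": vcf_file,
        "itemlabelname": "variant",
        "axisheight": 200,
        "vcfinfofilter": {
            "lst": [
                {
                    "name": label,
                    # INFO field whose numeric values drive the display
                    "locusinfo": {"key": info_key},
                    "numericfilter": [{"side": ">", "value": c} for c in cutoffs],
                }
            ],
            "setidx4numeric": 0,
        },
    }


track = make_vcf_track(
    "TableS3",
    "yli11/CRM_gRNA/SGP/TableS3.hg19.vcf.gz",
    "P",
    "-log10 P value",
    [2, 10],
)
print(json.dumps(track, indent=4))
```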

Input

chr2 60724085 60724086 rs1896295 BCL11A T C 37/201/307 0.25229 -11.0735 1.02e-25 -0.6421 0.05798 0.1917
chr2 60718042 60718043 rs1427407 BCL11A T G 35/186/324 0.23485999999999999 -11.0656 1.09e-25 -0.6471 0.05848 0.1915

Usage

hpcf_interactive.sh

module load python/2.7.13

bed2vcf.py -f input.bed --required_cols 0,2,3,5,6  --column_number 4,7,8,9,10,11,12,13  --info_types String,String,Float,Float,Float,Float,Float,Float  --info_cols Nearest_gene,GENO,MAF,STAT,P,BETA,SEBETA,R2 --remove_cols 1 --log10_cols 10 --output TableS3.hg19.vcf

The example input file has 14 columns, numbered from 0. The columns required for the VCF file are chr, pos, ID, ref, and alt, which correspond to columns 0,2,3,5,6. The VCF INFO column is built from columns 4,7,8,9,10,11,12,13; the corresponding labels are Nearest_gene,GENO,MAF,STAT,P,BETA,SEBETA,R2 and the types are String,String,Float,Float,Float,Float,Float,Float. The p-value is in column 10, and --log10_cols 10 converts it to -np.log10. Finally, --remove_cols 1 deletes column 1 from our input, which is the "start" position in the BED file.
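
To make the column mapping concrete, the sketch below applies the same indices to the first row of the example input using only the standard library. It is an illustration of the mapping, not the code in bed2vcf.py: filling QUAL and FILTER with ".", joining INFO entries with ";", and rounding the -log10 value to four decimals are all assumptions.

```python
import math

# First row of the example input (14 tab-separated columns, numbered from 0).
row = ("chr2\t60724085\t60724086\trs1896295\tBCL11A\tT\tC\t37/201/307\t"
       "0.25229\t-11.0735\t1.02e-25\t-0.6421\t0.05798\t0.1917").split("\t")

required_cols = [0, 2, 3, 5, 6]            # --required_cols: chr, pos, ID, ref, alt
info_index = [4, 7, 8, 9, 10, 11, 12, 13]  # --column_number
info_names = "Nearest_gene,GENO,MAF,STAT,P,BETA,SEBETA,R2".split(",")  # --info_cols
log10_cols = {10}                          # --log10_cols

chrom, pos, var_id, ref, alt = [row[i] for i in required_cols]

info_parts = []
for idx, name in zip(info_index, info_names):
    value = row[idx]
    if idx in log10_cols:
        # Same transform as -np.log10; the 4-decimal rounding is an assumption.
        value = "%.4f" % -math.log10(float(value))
    info_parts.append("%s=%s" % (name, value))

# QUAL and FILTER are set to "." here as an assumption.
record = "\t".join([chrom, pos, var_id, ref, alt, ".", ".", ";".join(info_parts)])
print(record)
```

Column 1 (the BED start position) is never read in this sketch, which mirrors the effect of --remove_cols 1.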

The outputs are TableS3.hg19.vcf.gz and TableS3.hg19.vcf.gz.tbi.

Another example

Input file

chr2    123     124     A       G       rs57
chr2    11      12      A       C       rs16

bed2vcf.py -f input2.hg38_to_hg19.bed --required_cols 0,2,5,3,4 --info_cols RSID --info_types String --column_number 5 --output input2.hg38_to_hg19.vcf

code @ github.