Annotate a VCF file (custom annotation does not work)

Summary

I found this tool while trying to add GERP scores to my VCF files. I first tried SnpSift. I built the 4.0a database, but it did not work. I then downloaded the version 2.9 database; of my >100k variants, it annotated only 450 SNPs, and the GERP scores were all similar across different positions. So I gave up on SnpSift, although it does have more visually appealing documentation and the capacity to add custom files to its annotation database.

Gemini is developed by the same author as bedtools, so I expected it to be of good quality.

Installation is easy, but there is a catch. You can install it with conda, but conda won't create the annotation database for you. You have to create the ~/.gemini/ folder yourself and put a YAML file named gemini-config.yaml in it, like below:

annotation_dir: /home/yli11/.gemini/data
versions:
  ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.tidy.vcf.gz: 4
  ESP6500SI.all.snps_indels.tidy.v2.vcf.gz: 2
  ExAC.r0.3.sites.vep.tidy.vcf.gz: 4
  GRCh37-gms-mappability.vcf.gz: 2
  cosmic-v68-GRCh37.tidy.vcf.gz: 3
  detailed_gene_table_v75: 2
  geno2mp.variants.tidy.vcf.gz: 1
  hg19.rmsk.bed.gz: 2
  summary_gene_table_v75: 2

Then run gemini update --dataonly --extra gerp_bp to download the annotation data together with the optional GERP base-pair scores.
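
The setup steps above can be scripted roughly as follows. This is a minimal sketch, assuming gemini was already installed with conda. GEMINI_DATA is a hypothetical variable introduced here (defaulting to ~/.gemini/data), and the versions block is copied from the config shown above. Note that it overwrites any existing gemini-config.yaml.

```shell
set -eu

# Hypothetical variable: where the annotation files should live.
GEMINI_DATA="${GEMINI_DATA:-$HOME/.gemini/data}"

# conda does not create these folders, so make them by hand.
mkdir -p "$HOME/.gemini" "$GEMINI_DATA"

# Write the config file into ~/.gemini/ (contents taken from the example above).
cat > "$HOME/.gemini/gemini-config.yaml" <<EOF
annotation_dir: $GEMINI_DATA
versions:
  ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.tidy.vcf.gz: 4
  ESP6500SI.all.snps_indels.tidy.v2.vcf.gz: 2
  ExAC.r0.3.sites.vep.tidy.vcf.gz: 4
  GRCh37-gms-mappability.vcf.gz: 2
  cosmic-v68-GRCh37.tidy.vcf.gz: 3
  detailed_gene_table_v75: 2
  geno2mp.variants.tidy.vcf.gz: 1
  hg19.rmsk.bed.gz: 2
  summary_gene_table_v75: 2
EOF

# Download the annotation data plus the GERP base-pair scores.
# Guarded so the script still finishes on a machine without gemini.
if command -v gemini >/dev/null 2>&1; then
    gemini update --dataonly --extra gerp_bp
else
    echo "gemini not found on PATH; activate the conda environment first" >&2
fi
```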

You are good to go now.

The latest gemini version only works for hg19.

Input

The input must be a VCF file (tab-separated columns), like the one below:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
11      4167375 11_4167374_4167375      A       G       .       .       .
11      4167377 11_4167376_4167377      A       G       .       .       .
11      4167378 11_4167377_4167378      A       G       .       .       .
11      4167381 11_4167380_4167381      A       G       .       .       .
11      4167382 11_4167381_4167382      A       G       .       .       .
11      4167384 11_4167383_4167384      A       G       .       .       .
11      4167389 11_4167388_4167389      A       G       .       .       .
11      4167392 11_4167391_4167392      A       G       .       .       .
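
A minimal VCF like the one above can be generated from a plain list of positions. The sketch below assumes a hypothetical tab-separated input with four columns (chrom, 1-based position, ref, alt); the function name make_minimal_vcf is made up for illustration. The ID column is built as chrom_(pos-1)_pos, which matches the pattern in the example above but is only a guess at how those IDs were produced.

```shell
# make_minimal_vcf SITES.tsv
# Reads:  chrom <TAB> pos (1-based) <TAB> ref <TAB> alt
# Writes: a sites-only VCF (no genotype columns) to stdout, in input order.
make_minimal_vcf() {
    # Header lines; VCF columns must be separated by tabs.
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    awk 'BEGIN { FS = OFS = "\t" }
         NF >= 4 {
             # Assumed ID scheme: chrom_(pos-1)_pos
             id = $1 "_" ($2 - 1) "_" $2
             print $1, $2, id, $3, $4, ".", ".", "."
         }' "$1"
}
```

For example, make_minimal_vcf sites.tsv > input.vcf writes a file that can be passed to gemini load.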

Usage

module load conda3

source activate /home/yli11/.conda/envs/variant

gemini --annotation-dir /home/yli11/.gemini/data load --save-info-string -v input.vcf output.db

## output.db is a binary SQLite database, not a text file

gemini query --header -q "select * from variants" output.db > my_results.tsv
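
To see how many variants actually received a GERP score, the TSV written by the query above can be summarized with awk. This is a sketch under assumptions: it assumes the variants table has a column named gerp_bp_score and that missing values are printed as None or left empty. Check the header line of my_results.tsv to confirm both. The function name count_annotated is hypothetical.

```shell
# count_annotated RESULTS.tsv [COLUMN]
# Prints "<rows with a value>/<total rows>" for the given column
# (default: gerp_bp_score, an assumed column name).
count_annotated() {
    awk -v col="${2:-gerp_bp_score}" '
        BEGIN { FS = "\t" }
        NR == 1 {
            # Locate the column by name in the header line
            # (a leading "#" on a header field is ignored).
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/^#/, "", name)
                if (name == col) idx = i
            }
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        # Count data rows, and those with a non-missing value in the column.
        { total++; if ($idx != "" && $idx != "None") hit++ }
        END { if (idx) printf "%d/%d\n", hit, total }
    ' "$1"
}
```

Running count_annotated my_results.tsv prints the annotated and total row counts, which makes a problem like the 450-out-of-100k result seen with SnpSift easy to spot.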

Output

See my_results.tsv from the commands above.

code @ github.