HemTools
latest
  • NGS pipelines
  • Data Visualization
  • Motif Analysis Suite
  • Integrative Analysis
  • Linux Art
  • Installation
  • A collection of Jupyter Notebooks
  • Bioinformatic Tools
  • Accessible Data in HemTools
  • Gallery (stand-alone tools)
  • Differential Analysis pipelines
  • Study notes
    • Notes on ATAC-seq, open chromatin, and footprints/motif
    • Hosting web server on AWS
    • Current observations/claims for CTCF in insulation potency and chromatin interaction
    • Duplicated sequences in HBG1/HBG2 promoters
    • Duplicated reads, multi-mapped reads
    • Notes on python
    • R for python people
    • Tsai Lab Bioinformatics
    • Thoughts on HemTools
    • bioinformatics
    • Chromatin Interaction data analysis
    • pegRNA design
    • Facts about CRISPR-Cas9
    • Docker your applications
    • All errors and how it was solved
    • Concepts, names, and other small things to remember
    • install tensorlfow on stjude HPC
    • A very good description of histone marks
    • Motif notes
    • Python Pandas
    • Notes on how to use plink
      • subset by individuals
      • create allele frequency report
      • subset by region
      • subset by variant ID
      • for the genetic variantion project, to get vcf files for the 95 donors
      • Extract meta info
      • Update SNP ID with RS ID
      • intersect bed file
    • Quantify mutations
    • All questions and how it was solved
    • (Notes) single-cell RNA-seq analysis
    • Using Scanpy to replace pandas
    • Published statistics about genome biology
  • Machine Learning pipelines
  • CRISPR tools
  • Bioinformatics Core Competencies
  • HemAgent: Autonomous and Reproducible Bioinformatics
HemTools
  • »
  • Study notes »
  • Notes on how to use plink
  • Edit on GitHub

Notes on how to use plink¶

I found PLINK is really fast.

module load plink/2.0

subset by individuals¶

My sample ID is named like NA06985_NA06985, so the keep list just need one column.

[yli11@noderome203 test]$ head 2.list
NA06985_NA06985
[yli11@noderome203 test]$ plink2 --vcf chrAll.95samples.withinRegions.v2.03282023.vcf --keep 2.list --out test --export vcf


[yli11@noderome203 test]$ head 3.list
NA06985
NA06986
[yli11@noderome203 test]$ plink2 --vcf $f --keep 3.list --out test2 --export vcf

create allele frequency report¶

The goal is to filter out all variants that are 0/0 in your set of individuals.

plink2 --vcf $f --out test3 --freq --set-missing-var-ids @_#_\$r_\$a --new-id-max-allele-len 94

The output is test3.afreq

https://2cjenn.github.io/PRS_Pipeline/

../../_images/plink_afreq.png

subset by region¶

plink2 --vcf $f --extract range set1.bed --out test4 --export vcf

subset by variant ID¶

plink2 --vcf $f --extract significant.ID.list --out $f.sig --export vcf

I once had an issue getting subset, because the binary plink file uses double id. What we need to do first is to check if this is the case by generating the pgen file. If so, the id.list need to have space seprated id for two columns.

plink2 --bfile 1000G.allPops.GRCh38.allSamples.chrALL.recodeXrsid --make-pgen  --out test

for the genetic variantion project, to get vcf files for the 95 donors¶

module load python/2.7.13

run_lsf.py -f /home/yli11/HemTools/share/misc/1kg.filelist --sample_list donor.list -p subset_1kg_by_people

## once it is finished, the final.vcf is what you need

Extract meta info¶

bcftools query -f '%CHROM %POS %AN_EAS\n' chrAll.95samples.withinRegions.v2.03282023.vcf.gz

[yli11@noderome153 SNP152]$ bcftools query -f '%CHROM %POS %ID %REF %ALT %GENEINFO %FREQ %COMMON\n' GCF_000001405.38.bgz

#view header
bcftools view -h GCF_000001405.38.bgz

Update SNP ID with RS ID¶

https://www.biostars.org/p/171557/

bcftools query -f ‘%ID %AF_EAS %AF_AMR %AF_EUR %AF_AFR %AF_SAS %AF_EUR_unrel %AF_EAS_unrel %AF_AMR_unrel %AF_SAS_unrel %AF_AFR_unrel %MAF_EUR_unrel %MAF_EAS_unrel %MAF_AMR_unrel %MAF_SAS_unrel %MAF_AFR_unreln’ ${COL1} > $label.AF.txt

intersect bed file¶

PLINK only work for vcf file with genotypes, for gnomad vcf file, use vcftools for region extraction:

vcftools --gzvcf gnomad.genomes.v4.0.sites.chrY.vcf.bgz --bed 1115_OT_flank100.bed --out output_prefix --recode --keep-INFO-all

code @ github.

Next Previous

© Copyright 2020, Yichao Li, Yong Cheng.

Built with Sphinx using a theme provided by Read the Docs.