HemTools

subsample fastq and visualize in sequence logo

Summary

Mostly used for exploring and debugging raw fastq files, to identify common sequence patterns.

Steps: the fastq file is subsampled to 2M reads, the reads are clustered and reordered based on k-mers, and a sequence logo is generated for each of 10 equal splits of the reordered reads. For example, one sequence logo covers the first 0% to 10% of the reads.

../../_images/fastq_vis_example.png
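The steps in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HemTools implementation: all function names are hypothetical, the k-mer ordering (sort by each read's most frequent k-mer) is a stand-in for whatever clustering the pipeline actually uses, and the code is written for Python 3. It stops at the per-position base frequency matrix, which is the input a sequence logo plotter would draw.

```python
import random
from collections import Counter
from itertools import islice


def read_fastq_sequences(path):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    with open(path) as handle:
        for seq_line in islice(handle, 1, None, 4):
            yield seq_line.strip()


def reservoir_sample(items, k, seed=0):
    """Uniformly sample up to k items from an iterable in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def kmer_key(seq, k=4):
    """Assumed ordering key: the read's most frequent k-mer, then the read."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:  # read shorter than k
        return ("", seq)
    top = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
    return (top, seq)


def split_into_bins(reads, n_bins=10):
    """Split an ordered list into n_bins contiguous, near-equal slices."""
    n = len(reads)
    return [reads[n * b // n_bins: n * (b + 1) // n_bins] for b in range(n_bins)]


def position_frequency_matrix(reads, alphabet="ACGTN"):
    """Per-position base frequencies for one bin of reads."""
    length = max((len(r) for r in reads), default=0)
    matrix = []
    for pos in range(length):
        column = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(column.values())
        matrix.append({base: column.get(base, 0) / total for base in alphabet})
    return matrix


def fastq_logo_matrices(sequences, n_reads=2000000, n_bins=10, k=4, seed=0):
    """Subsample, reorder by k-mer key, split, and summarize each split."""
    sampled = reservoir_sample(sequences, n_reads, seed)
    ordered = sorted(sampled, key=lambda s: kmer_key(s, k))
    return [position_frequency_matrix(b) for b in split_into_bins(ordered, n_bins)]
```

For a file named, say, `example.fastq` (a placeholder), `fastq_logo_matrices(read_fastq_sequences("example.fastq"))` would return ten matrices, the first describing the 0% to 10% slice of the reordered reads.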

Usage

Copy your fastq files to a working directory, then run the following commands:

hpcf_interactive

module load python/2.7.13

run_lsf.py --guess_input --single

run_lsf.py -f fastq.tsv -p fastq_vis

code @ github.


© Copyright 2020, Yichao Li, Yong Cheng.
