Differential exon analysis

usage: DEXseq.py [-h] [-j JID] -f BAM_TSV -d DESIGN_MATRIX [--cores CORES]
                 [--src SRC] [--count_cutoff COUNT_CUTOFF] [-g GENOME]
                 [-gff GFF]

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        DEXseq_yli11_2022-05-20)
  -f BAM_TSV, --bam_tsv BAM_TSV
                        TSV file, 3 columns, bam file, sample name, group name
                        (default: None)
  -d DESIGN_MATRIX, --design_matrix DESIGN_MATRIX
                        TSV file, 3 columns, group ID, group ID, output_prefix
                        (default: None)
  --cores CORES         Number of CPUs (default: 10)
  --src SRC             DEX src (default:
                        /home/yli11/Programs/DEXSeq/inst/python_scripts)
  --count_cutoff COUNT_CUTOFF
                        filter exons by sum read count, more samples should
                        increase this cutoff (default: 10)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10 (default: hg19)
  -gff GFF              DEXseq gff file (default: /home/yli11/Programs/DEXSeq/
                        inst/python_scripts/gencode.v30.hg19.noagg.gff)

Summary

Differential exon analysis to identify exon skip events.

5/20/2022

Only works for hg19.

Only works for paired-end RNA-seq data.

Input

Example data dir: Projects/Siqi_data/RNAseq_data/Differential_Exon_Analysis/test

1. bam.tsv

Please copy (or softlink) position sorted and indexed bam files into your working dir. Prepare bam.tsv as a 3-column tsv file: bam_file_name, sample_name, group_name. Example shown below:

2393109_Nontarget_Hudep2_1_S140_L003.markdup.bam.chr21.bam      C1      control
2393111_M1_1_Hudep2_S142_L003.markdup.bam.chr21.bam     K1      KO
2393113_M1_2_Hudep2_2_S144_L003.markdup.bam.chr21.bam   K2      KO
2393110_Nontarget_Hudep2_2_S141_L003.markdup.bam.chr21.bam      C2      control

2. design matrix

This is similar to peak_call.tsv. The columns are group 1, group 2, output prefix.

KO      control myOut

Each line will be used as a comparison. For example, if you have three groups and possibly 3 comparisons (e.g., 1 vs 2, 1 vs 3, and 2 vs 3), then you can write down the comparisons in three lines. Again, this is same as the peak_call.tsv.

Usage

    hpcf_interactive

module load python/2.7.13

DEXseq.py -f bam.tsv -d design.tsv

Output

Results are stored in the $jid folder.

.out files are exon read counts; these are inputs to DEXseq program.

.png files showing mean-variance, MA plot, and p-value distribution.

.csv files are the final outputs, containing p-value and fold change, just as different gene expression data.

_results folder contains some HTML results summary.

code @ github.