Load library

[1]:
library("DESeq2")

Loading required package: S4Vectors

Loading required package: stats4

Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min



Attaching package: ‘S4Vectors’


The following object is masked from ‘package:base’:

    expand.grid


Loading required package: IRanges

Loading required package: GenomicRanges

Loading required package: GenomeInfoDb

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: ‘Biobase’


The following object is masked from ‘package:MatrixGenerics’:

    rowMedians


The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians


Read count table

This count table is generated from the diffpeak pipeline: https://hemtools.readthedocs.io/en/latest/content/Differential_analysis/deseq2_diffpeak.html?highlight=diffpeak

[2]:
infile="DiffPeak_syi_2021-11-16.count_table.bed"
countData = read.csv(infile,sep="\t")
head(countData)
A data.frame: 6 × 10
GeneidChrStartEndStrandLengthX1098561_Hudep2_CTCF.rmdup.uq.bamX1047954_Hudep2_CTCF_50bp.rmdup.uq.bamX2268629_KO_29_CTCF_S134_L004.rmdup.uq.bamX2268630_KO_68_CTCF_S135_L004.rmdup.uq.bam
<chr><chr><int><int><chr><int><int><int><int><int>
1region_1chr1 15918 16556+6395234278 32
2region_2chr1 54581 54686+106 6 9 38 3
3region_3chr1 91023 91696+6743815381130
4region_4chr1104705105259+5554034150 34
5region_5chr1115570115837+2683519140 23
6region_6chr1138518139511+9943220268 24

Specify design matrix, control + treatment

[3]:
group1=c("X1098561_Hudep2_CTCF.rmdup.uq.bam","X1047954_Hudep2_CTCF_50bp.rmdup.uq.bam") # WT
group2=c("X2268629_KO_29_CTCF_S134_L004.rmdup.uq.bam","X2268630_KO_68_CTCF_S135_L004.rmdup.uq.bam") # KO
group_label <- c(rep("control", length(group1)), rep("treatment", length(group2)))

[4]:
group_label
  1. 'control'
  2. 'control'
  3. 'treatment'
  4. 'treatment'

Get count matrix

[5]:
mat <- countData[,7:10]
rownames(mat) <- countData[,1]
head(mat)
A data.frame: 6 × 4
X1098561_Hudep2_CTCF.rmdup.uq.bamX1047954_Hudep2_CTCF_50bp.rmdup.uq.bamX2268629_KO_29_CTCF_S134_L004.rmdup.uq.bamX2268630_KO_68_CTCF_S135_L004.rmdup.uq.bam
<int><int><int><int>
region_15234278 32
region_2 6 9 38 3
region_33815381130
region_44034150 34
region_53519140 23
region_63220268 24

RUN DEseq2

[6]:
sample_info <- data.frame(sampleName=c(group1, group2), Group=group_label)
dds <- DESeqDataSetFromMatrix(countData=mat, colData=sample_info, design=~Group)
dds = DESeq(dds) # main function
res=results(dds)
raw_count = counts(dds)
norm_count = counts(dds,normalized=TRUE)

Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

Save result (this is the result without LFC shrinkage)

[7]:
outfile="deseq2.result"
colnames(norm_count) = paste(colnames(norm_count),".norm",sep="")
d <- data.frame(countData,norm_count, logFC=res[,"log2FoldChange"], AveExpr=res[,"baseMean"], t=res[,"stat"], P.Value=res[,"pvalue"], adj.P.Val=res[,"padj"])
d <- d[order(d[,"P.Value"]),]
write.table(d, file=paste(outfile, ".deseq2_wo_lfcShrink.tsv", sep=""), sep="\t", quote=FALSE, row.names=FALSE)
[8]:
plotMA(res, ylim = c(-4, 4))
../_images/jupyter_notebooks_DEseq2_example_13_0.png

apply LFC shrinkage

[10]:
res <- lfcShrink(dds,coef="Group_treatment_vs_control", type="ashr")
plotMA(res, ylim = c(-4, 4))
using 'ashr' for LFC shrinkage. If used in published research, please cite:
    Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
    https://doi.org/10.1093/biostatistics/kxw041

../_images/jupyter_notebooks_DEseq2_example_15_1.png

save new result

[12]:
head(res)
log2 fold change (MMSE): Group treatment vs control
Wald test p-value: Group treatment vs control
DataFrame with 6 rows and 5 columns
          baseMean log2FoldChange     lfcSE      pvalue       padj
         <numeric>      <numeric> <numeric>   <numeric>  <numeric>
region_1  63.44804     0.00805928 0.0788364 3.31818e-01 0.99770039
region_2   9.55499    -0.00183120 0.1008589 8.50857e-01         NA
region_3 110.64886     1.97421221 0.5220874 6.11724e-07 0.00147051
region_4  52.58138     0.00282279 0.0643666 6.94088e-01 0.99770039
region_5  38.47367     0.00579136 0.0769100 4.82171e-01 0.99770039
region_6  47.76244     0.03886980 0.1892013 6.02855e-02 0.90094156
[13]:
outfile="deseq2.result"
colnames(norm_count) = paste(colnames(norm_count),".norm",sep="")
d <- data.frame(countData,norm_count, logFC=res[,"log2FoldChange"], AveExpr=res[,"baseMean"], P.Value=res[,"pvalue"], adj.P.Val=res[,"padj"])
d <- d[order(d[,"P.Value"]),]
write.table(d, file=paste(outfile, ".deseq2_with_lfcShrink.tsv", sep=""), sep="\t", quote=FALSE, row.names=FALSE)