Single-cell RNA-seq analysis¶
Summary¶
This pipeline performs scRNA-scATAC paired analysis (i.e., known as 10x multiome) using cellranger-arc/2.0.0
. Then data is analyzed using seurat
and signac
. We have two pipelines: standard
and WNN
, each will save R object for further downstream analysis.
standard
is data integration using RNA data only.
WNN
is data integration using RNA and ATAC.
Since scATAC is far more sparse than scRNA, the cell clustering and annotation is considered to be more accurate in scRNA data.
Currently only work for hg38.
Input¶
Put all fastq files in the same working dir.
10X Genomics single-cell RNA-seq data contains R1
and R2
reads. R1
is the barcode information. R2
is the actual 3-end mRNA sequencing result.
Your working directory should contain all input fastq files. For example:
Chicken_S4_L001_R1_001.fastq.gz
Chicken_S4_L001_R2_001.fastq.gz
Chicken_S4_L002_R1_001.fastq.gz
Chicken_S4_L002_R2_001.fastq.gz
Chicken_S4_L003_R1_001.fastq.gz
Chicken_S4_L003_R2_001.fastq.gz
Chicken_S4_L004_R1_001.fastq.gz
Chicken_S4_L004_R2_001.fastq.gz
Orange_S1_L001_R1_001.fastq.gz
Orange_S1_L001_R2_001.fastq.gz
Orange_S1_L002_R1_001.fastq.gz
Orange_S1_L002_R2_001.fastq.gz
Orange_S1_L003_R1_001.fastq.gz
Orange_S1_L003_R2_001.fastq.gz
Orange_S1_L004_R1_001.fastq.gz
Orange_S1_L004_R2_001.fastq.gz
Apple_S2_L001_R1_001.fastq.gz
Apple_S2_L001_R2_001.fastq.gz
Apple_S2_L002_R1_001.fastq.gz
Apple_S2_L002_R2_001.fastq.gz
Apple_S2_L003_R1_001.fastq.gz
Apple_S2_L003_R2_001.fastq.gz
Apple_S2_L004_R1_001.fastq.gz
Apple_S2_L004_R2_001.fastq.gz
Banana_S3_L001_R1_001.fastq.gz
Banana_S3_L001_R2_001.fastq.gz
Banana_S3_L002_R1_001.fastq.gz
Banana_S3_L002_R2_001.fastq.gz
Banana_S3_L003_R1_001.fastq.gz
Banana_S3_L003_R2_001.fastq.gz
Banana_S3_L004_R1_001.fastq.gz
Banana_S3_L004_R2_001.fastq.gz
Tip
If you have the fastq files stored in different folders, you can use ln -s path_to_fastq_folder/ .
to create soft links to your fastq files. Do it for each folder, so that you have all fastq files in your working directory.
input.list
In the above example, you have 4 groups/patients, namely: Chicken, Orange, Apple, Banana. Then you just have to create an input.list
and put the group name (Case sensitive! Make sure the names are exactly the same as in your fastq files!) line by line, like below:
Chicken
Orange
Apple
Banana
Usage¶
module load python/2.7.13
single_cell.py -f input.list
For single-cell ATAC data, add --atac
, only available in hg38:
single_cell.py -f input.list --atac
Output¶
cellranger output, see results in the jodID folder. *_results
Report bug¶
$ HemTools report_bug
Old notes¶
This pipeline generates gene expression table and several figures described as below:
Processing single-cell RNA-seq data and quantifying gene expression using
cellRanger
Removing genes with all zeros
The following are not included in the pipeline yet:
plot read cound density for all input samples
identify genes with mean read count above a cutoff
identify genes with X% of cells containing read count above a cutoff
clustermap with gene names (by default cellRanger is Ensembl ID)
plot pair-wise gene correlation
top expression plot , as well as other plots generated by
scater
: http://bioconductor.org/packages/release/bioc/vignettes/scater/inst/doc/vignette-qc.htmlplot mean-variance for all cells and all samples, and put label for user input gene names
PCA plot (not implemented), T-SNE plot, UMAP plot (not implemented) (shape by k-means) (not implemented) with color intensity using expression values of a user input gene
Note that Single-cell differential expression analysis is not implemented yet.
Note
Available genomes are hg19, hg38, mm10. hg38 and mm10 supports lateset Chromium 3’ gene expression library, including V3, V3.1 and V3.2. Hg19 only works with V2. Default genome is hg38.
QC¶
https://academic.oup.com/bioinformatics/article/35/24/5306/5542946
Gene density plot¶
Ribosomal protein reads¶
We have a recent blood scRNA-seq data where the RP reads% is about 30-40% and most DEGs are actually RP proteins.
RP non-RP
all genes (count>20) 67 20
all DEG (count>20) 23 6
as a percentage 0.343283582 0.3
Comments¶
code @ github.