Single-cell multiome analysis

Input

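Create a library CSV file for each sample, named after the sample (for example, PD1.csv), as shown below. The file has three columns: fastqs, sample, and library_type. The first column is the path to the folder containing the FASTQ files. The second column is the FASTQ file prefix (the sample name). The third column is the library type, either Gene Expression or Chromatin Accessibility.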

==> PD1.csv <==
fastqs,sample,library_type
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscRNAseq/weissgrp_286254_10XscRNAseq-1,PD1,Gene Expression
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscATACseq/weissgrp_286255_10x_Other_workflows-1,PD1,Chromatin Accessibility

==> PD2.csv <==
fastqs,sample,library_type
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscRNAseq/weissgrp_286254_10XscRNAseq-1,PD2,Gene Expression
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscATACseq/weissgrp_286255_10x_Other_workflows-1,PD2,Chromatin Accessibility

==> PD3.csv <==
fastqs,sample,library_type
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscRNAseq/weissgrp_286254_10XscRNAseq-1,PD3,Gene Expression
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscATACseq/weissgrp_286255_10x_Other_workflows-1,PD3,Chromatin Accessibility

==> PD4.csv <==
fastqs,sample,library_type
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscRNAseq/weissgrp_286254_10XscRNAseq-1,PD4,Gene Expression
/home/yli11/dirs/hem_seq/chenggrp/pdoerfler_single-cell/pdoerfler_10XscATACseq/weissgrp_286255_10x_Other_workflows-1,PD4,Chromatin Accessibility
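A malformed library file is easy to miss, so it can help to check each CSV before submitting. The following is a minimal sketch, not part of the pipeline: the function name check_library_csv is hypothetical, and it assumes every sample needs exactly the two library types shown in the examples above.

```python
import csv

# Header and library types as they appear in the example files above.
EXPECTED_HEADER = ["fastqs", "sample", "library_type"]
EXPECTED_TYPES = {"Gene Expression", "Chromatin Accessibility"}


def check_library_csv(path):
    """Return a list of problems found in one library CSV (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            problems.append("unexpected header: %r" % (header,))
        rows = [row for row in reader if row]  # ignore blank lines

    seen_types = set()
    for row in rows:
        if len(row) != 3:
            problems.append("expected 3 columns, got %d: %r" % (len(row), row))
            continue
        seen_types.add(row[2])

    # Assumption: each sample needs both an RNA and an ATAC library row.
    missing = EXPECTED_TYPES - seen_types
    if missing:
        problems.append("missing library types: %s" % ", ".join(sorted(missing)))
    return problems
```

Calling check_library_csv("PD1.csv") on a file like the first example should return an empty list.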

Create input.list containing the sample names, one per line:

PD1
PD2
PD3
PD4

Note that each sample name corresponds to a library file of the same name; for example, PD1 corresponds to PD1.csv.
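Since every name in input.list must have a matching CSV file, a quick consistency check can catch a missing or misnamed file early. This is only a sketch: the helper names are hypothetical, and it assumes the CSV files sit in one folder (the current directory by default).

```python
import os


def read_sample_names(list_path):
    """Read sample names from input.list, skipping blank lines."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_library_files(list_path, csv_dir="."):
    """Return the sample names that have no <sample>.csv in csv_dir."""
    return [
        name
        for name in read_sample_names(list_path)
        if not os.path.isfile(os.path.join(csv_dir, name + ".csv"))
    ]
```

An empty list from missing_library_files("input.list") means every sample has its library file.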

Code to automatically generate input.list.

Usually the data given to us is separated into different folders for different samples. To use our code, you need to put all the RNA FASTQ files in one folder and all the ATAC FASTQ files in another folder. This can be done with symbolic links (ln -s).

mkdir pdoerfler_10XscRNAseq_rep2

cd pdoerfler_10XscRNAseq_rep2

ln -s ../weissgrp_298284_10XscRNAseq-*/*/*gz .

ls *L001_I1*gz
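The same gathering step can be scripted if the shell glob is awkward to reuse. The sketch below is a rough Python equivalent of the ln -s command above. The function name link_fastqs is hypothetical, and unlike the shell command it links to absolute paths and skips names that already exist.

```python
import glob
import os


def link_fastqs(pattern, dest_dir):
    """Symlink every file matching pattern into dest_dir; return linked names."""
    os.makedirs(dest_dir, exist_ok=True)
    linked = []
    for src in sorted(glob.glob(pattern)):
        dest = os.path.join(dest_dir, os.path.basename(src))
        if os.path.lexists(dest):
            # Skip names already present so that reruns do not fail.
            continue
        os.symlink(os.path.abspath(src), dest)
        linked.append(os.path.basename(src))
    return linked
```

For example, link_fastqs("../weissgrp_298284_10XscRNAseq-*/*/*gz", ".") mirrors the command shown above.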

Create a tab-separated file (label.tsv) that maps each sample ID to a sample name, as shown below, using a text editor such as Sublime Text:

2593900 PD1
2593903 PD5
2593906 PD_PBMC1
2593901 PD2
2593904 PD_thal1
2593907 PD_PBMC2
2593902 PD3
2593905 PD_thal2

Then run:

module load python/3.7.0

cellranger_rename_fastq.py label.tsv > run.sh

bash run.sh
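To make the renaming step concrete, here is a sketch of how a label table could be turned into mv commands like those in run.sh. It is not the source of cellranger_rename_fastq.py, whose actual behaviour is not shown here. The function names are hypothetical, and it assumes each FASTQ file name starts with the sample ID followed by an underscore.

```python
import shlex


def read_labels(tsv_path):
    """Parse label.tsv into a {sample_id: sample_name} dict."""
    labels = {}
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.split()  # accepts tabs or spaces
            if len(fields) >= 2:
                labels[fields[0]] = fields[1]
    return labels


def rename_commands(labels, filenames):
    """Build 'mv' commands that swap a leading sample ID for the sample name.

    Assumption (not from the source): each name starts with '<sample_id>_'.
    Files whose prefix is not in the label table are left alone.
    """
    commands = []
    for name in sorted(filenames):
        sample_id, sep, rest = name.partition("_")
        if sep and sample_id in labels:
            new_name = labels[sample_id] + "_" + rest
            commands.append("mv %s %s" % (shlex.quote(name), shlex.quote(new_name)))
    return commands
```

Printing the commands to a file first, as the workflow above does, lets you inspect them before anything is renamed.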

ll -rht

RNA=$PWD

The FASTQ files are now renamed. Repeat the same steps for the ATAC library, and save the ATAC data directory as DNA=$PWD

Create a working directory rep2_data_analysis.

mkdir rep2_data_analysis

cd rep2_data_analysis

cp $DNA/label.tsv .

cellranger_create_library.py $RNA $DNA label.tsv
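For reference, the files this step is expected to produce can be sketched in a few lines. This is an illustration based on the Input section, not the source of cellranger_create_library.py. The function name write_library_files is hypothetical, and it assumes one CSV per sample plus an input.list, all written to one folder.

```python
import os


def write_library_files(rna_dir, dna_dir, sample_names, out_dir="."):
    """Write <sample>.csv for each sample plus input.list; return the CSV paths."""
    paths = []
    for sample in sample_names:
        path = os.path.join(out_dir, sample + ".csv")
        with open(path, "w") as handle:
            handle.write("fastqs,sample,library_type\n")
            handle.write("%s,%s,Gene Expression\n" % (rna_dir, sample))
            handle.write("%s,%s,Chromatin Accessibility\n" % (dna_dir, sample))
        paths.append(path)

    # input.list holds one sample name per line, matching the CSV file names.
    with open(os.path.join(out_dir, "input.list"), "w") as handle:
        for sample in sample_names:
            handle.write(sample + "\n")
    return paths
```

Here rna_dir and dna_dir play the role of $RNA and $DNA, and the sample names would come from the second column of label.tsv.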

Usage

module remove python/3.7.0

module load python/2.7.13

run_lsf.py -f input.list -p single_cell_arc

Default genome

GRCh38_HBG1_mask

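The HBG1 gene body and its 400 bp promoter are masked in the default hg38 genome. HBG1 and HBG2 are nearly identical in sequence, so reads from this region would otherwise multi-map, and the ATAC-seq pipeline removes multi-mapped reads.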

code @ github.