CITE-seq (scRNA-seq with antibodies) analysis

usage: single_cell2.py [-h] [-j JID] -f LIBRARY_CSV [-a ANTIBODY_BARCODE]
                       [-g GENOME] [--genes GENES]
                       [--cellranger_refdata CELLRANGER_REFDATA]

perform 10X single-cell RNA-seq analysis or CITE-seq

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        single_cell2_yli11_2021-04-05)
  -f LIBRARY_CSV, --library_csv LIBRARY_CSV
                        A list of group name (fastq file prefix). (default:
                        None)
  -a ANTIBODY_BARCODE, --antibody_barcode ANTIBODY_BARCODE
                        antibody barcodes see: https://support.10xgenomics.com
                        /single-cell-gene-
                        expression/software/pipelines/latest/using/feature-bc-
                        analysis (default: None)
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm10. (default: hg38)
  --cellranger_refdata CELLRANGER_REFDATA
                        Not for end-user (default: /research/rgs01/application
                        s/hpcf/authorized_apps/rhel7_apps/cellranger/refdata
                        /refdata-cellranger-GRCh38-3.0.0/)

Summary

Perform CITE-seq analysis. Only for CITE-seq data.

Input

Note

This program assumes the fastq files for each sample is stored in an individual folder. For example, if you have A,B,C sample fastq files in the same directory, then please create a folder for each sample and mv the corresponding fastq files there.

Note

If you have A_S1_L001 and A_S2_L001, please note that each sample name should match to a unique S number. So in this example, you can rename A_S2_L001 to A_S1_L002.

You need two input files: library.csv and antibody.csv, corresponding to the Library CSV and Feature Reference CSV here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis

For CITE-seq data, we should have one normal scRNA-seq data and one seq data only for antibody. In the original library.csv format, the sample column should be unique, and fastq files should start with the string specified in this column. But here, note that for the library.csv used here, we keep sample name the same for the same sample, but with 2 different library_type, namely Gene Expression and Antibody Capture. The python script will transform this batch run library.csv to correct library.csv used for cellranger.

The following antibody.csv is for TotalSeq-B type. There are also A or C types.

==> antibody.csv <==
id,name,read,pattern,sequence,feature_type
CD235ab,CD235ab_TotalSeqB,R2,5PNNNNNNNNNN(BC)NNNNNNNNN,GCTCCTTTACACGTA,Antibody Capture
CD71,CD71_TotalSeqB,R2,5PNNNNNNNNNN(BC)NNNNNNNNN,CCGTGTTCCTCATTA,Antibody Capture


==> library.csv <==
fastqs,sample,library_type
/ABS_PATH/2-1437806/,WT_CD34_Diff_D7,Gene Expression
/ABS_PATH/2-1437807/,HS_D0_CD34_Diff_D7,Gene Expression
/ABS_PATH/2-1437808/,HS_D6_CD34_Diff_D7,Gene Expression
/ABS_PATH/2-1437809/,WT_CD34_Diff_D7,Antibody Capture
/ABS_PATH/2-1437810/,HS_D0_CD34_Diff_D7,Antibody Capture
/ABS_PATH/2-1437811/,HS_D6_CD34_Diff_D7,Antibody Capture

Usage

module load python/2.7.13

single_cell2.py -f library.csv -a antibody.csv -g hg38

Output

Gene expression table

A file named cellrange_final_gene_expression_removed_all_zeros.csv is located at {{job_id}}/{{group_name}}_results/{{group_name}}/outs

Report bug

$ HemTools report_bug

Note

cite-seq DASH visualization

This has been included in the pipeline, you don’t need to run it manually any more.

usage: cite_seq_vis.py [-h] (--current_dir | --input_csv INPUT_CSV)
                       [--MT_percent MT_PERCENT] [--max_genes MAX_GENES]
                       [-o OUTPUT] [-g GENOME]

cite-seq visualization pipeline

optional arguments:
  -h, --help            show this help message and exit
  --current_dir         run in current dir, suppose cellRanger is finished
                        correctly (default: False)
  --input_csv INPUT_CSV
                        manually input csv (default: None)
  --MT_percent MT_PERCENT
                        MT_percent, default is 20, sometimes I use 10 or 5
                        (default: 20)
  --max_genes MAX_GENES
                        max_genes (default: 6000)
  -o OUTPUT, --output OUTPUT
                        output prefix (default:
                        sc_integration_yli11_2021-04-26)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. By default,
                        specifying a genome version will automatically update
                        index file, black list, chrom size and
                        effectiveGenomeSize, unless a user explicitly sets
                        those options. (default: hg19)

Run this after sc_data_integration.py

usage: sc_data_integration.py [-h] -f INPUT_CSV [--MT_prefix MT_PREFIX] [--MT_percent MT_PERCENT] [--max_genes MAX_GENES] [-o OUTPUT] [--citeseq]

optional arguments:
  -h, --help            show this help message and exit
  -f INPUT_CSV, --input_csv INPUT_CSV
                        Need at least 2 columns with column names, Sample,Location, see: https://pegasus.readthedocs.io/en/stable/usage.html (default: None)
  --MT_prefix MT_PREFIX
                        MT_prefix, seems that mm is mt- and human is MT- (default: MT-)
  --MT_percent MT_PERCENT
                        MT_percent, default is 20, sometimes I use 10 or 5 (default: 20)
  --max_genes MAX_GENES
                        max_genes (default: 6000)
  -o OUTPUT, --output OUTPUT
                        output prefix pdf (default: sc_integration_yli11_2021-04-26)
  --citeseq             is data is cite-seq (default: False)
module load conda3
source activate /home/yli11/.conda/envs/dash
cite_seq_dash.py sc_integration_yli11_2021-04-25

Comments

code @ github.