Analysis of single cell Strand-seq data

Summary

Single-cell DNA template strand sequencing (Strand-seq) can identify high-resolution haplotypes (i.e., the sequence of each parental copy of a chromosome, kept separate) in single cells, which is useful for detecting genomic variants in a “phased” manner.

In terms of the technique, cells are grown in the presence of BrdU (bromodeoxyuridine) for one round of DNA replication, so the newly synthesized strand of each chromosome incorporates BrdU while the original template strand does not. After the cell division, single daughter cells are isolated and the BrdU-containing strands are nicked by UV photolysis. Then “PCR with P2 primer amplifies only original template strand DNA”, most likely because the nicked BrdU strands are broken into fragments that can no longer serve as a continuous PCR template. Finally, the library is sequenced, and the strand orientation of the reads is used to infer the haplotype.
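Strand-seq reads are usually interpreted by counting, in genomic bins, how many reads map to the Watson (minus) versus the Crick (plus) strand: a region that inherited two Watson templates gives WW, two Crick templates gives CC, and one of each gives WC, where the two homologs can be told apart by strand. The Python below is a minimal sketch of that idea under stated assumptions; the thresholds, bin size, and function names are hypothetical, and this is not the algorithm used by BAIT or MosaiCatcher.

```python
# Minimal sketch: call per-bin strand states (WW, CC, WC) from read orientations.
# Thresholds and bin size are hypothetical, not taken from BAIT or MosaiCatcher.

HOMOZYGOUS_FRACTION = 0.8  # hypothetical: >=80% of reads on one strand -> WW or CC
MIN_READS = 10             # hypothetical: minimum reads per bin for any call


def classify_bin(watson: int, crick: int) -> str:
    """Classify one bin from Watson (minus-strand) and Crick (plus-strand) read counts."""
    total = watson + crick
    if total < MIN_READS:
        return "NA"  # too few reads for a confident call
    if watson / total >= HOMOZYGOUS_FRACTION:
        return "WW"  # both inherited template strands are Watson
    if crick / total >= HOMOZYGOUS_FRACTION:
        return "CC"  # both inherited template strands are Crick
    return "WC"      # one Watson and one Crick template: homologs separable by strand


def strand_states(reads, bin_size=1_000_000):
    """reads: iterable of (position, strand), strand '-' (Watson) or '+' (Crick).

    Returns {bin_index: state} for every bin that received at least one read.
    """
    counts = {}
    for pos, strand in reads:
        b = pos // bin_size
        w, c = counts.get(b, (0, 0))
        if strand == "-":
            w += 1
        else:
            c += 1
        counts[b] = (w, c)
    return {b: classify_bin(w, c) for b, (w, c) in sorted(counts.items())}


if __name__ == "__main__":
    # Bin 0: 12 Watson reads; bin 1: 6 Watson + 6 Crick reads.
    reads = [(100, "-")] * 12 + [(1_500_000, "-")] * 6 + [(1_500_000, "+")] * 6
    print(strand_states(reads))  # {0: 'WW', 1: 'WC'}
```

In real data the strand of each read would come from the alignment (e.g. the reverse-strand flag of read 1 in a BAM file); only WC regions carry reads from both homologs on separate strands, which is what makes phasing possible.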

Public pipelines

These three pipelines all come from the same lab (listed by publication date); it is not yet clear how they differ:

    1. BAIT

    2. Mosaicatcher-pipeline (bundles many tools)

    3. ASHLEYS-QC

Tools to try: Delly2

Input

fastq.tsv

Tab-separated file with 4 columns: R1 FASTQ, R2 FASTQ, sample name, cell line

This file can be generated with run_lsf.py --guess_input; however, you still need to fill in the last column (cell line names) yourself.
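If you prefer to prepare fastq.tsv yourself, the Python below is a minimal sketch that pairs R1/R2 files, leaves the cell line column empty, and reports lines that are still incomplete. It assumes a hypothetical naming convention (<sample>_R1[_001].fastq.gz with a matching _R2 file); the function names are illustrative, and this is not how run_lsf.py --guess_input is implemented.

```python
import csv
import re

# Hypothetical naming convention: <sample>_R1[_001].fastq[.gz] or .fq[.gz],
# with the mate file named identically except for _R2.
R1_PATTERN = re.compile(r"^(?P<sample>.+)_R1(?P<suffix>(_001)?\.f(ast)?q(\.gz)?)$")


def guess_rows(filenames):
    """Pair R1/R2 files into [R1, R2, sample name, cell line] rows.

    The cell line column is left empty because it has to be filled in by hand.
    """
    names = set(filenames)
    rows = []
    for name in sorted(names):
        m = R1_PATTERN.match(name)
        if m is None:
            continue  # not an R1 file (e.g. an R2 mate); skip it
        r2 = f"{m['sample']}_R2{m['suffix']}"
        if r2 not in names:
            raise ValueError(f"no R2 mate found for {name}")
        rows.append([name, r2, m["sample"], ""])
    return rows


def write_tsv(rows, path):
    """Write rows as a tab-separated file with Unix line endings."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)


def incomplete_lines(path):
    """Return line numbers that do not have exactly 4 non-empty columns."""
    bad = []
    with open(path, newline="") as fh:
        for lineno, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
            if not row:
                continue  # ignore blank lines
            if len(row) != 4 or any(not col.strip() for col in row):
                bad.append(lineno)
    return bad
```

For example, write_tsv(guess_rows(os.listdir(".")), "fastq.tsv") produces a draft file; once the cell line names are filled in, incomplete_lines("fastq.tsv") should return an empty list.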

Usage

[yli11@noderome155 analysis]$ run_lsf.py -f fastq.tsv2 -p strand_seq -g hg38
2021-10-15 16:15:12,452 - INFO - main - The job id is: strand_seq_yli11_2021-10-15
2021-10-15 16:15:12,662 - INFO - submit_pipeline_jobs - bwa has been submitted; JobID: 142856628
2021-10-15 16:15:12,737 - INFO - submit_pipeline_jobs - SV has been submitted; JobID: 142856629
2021-10-15 16:15:12,829 - INFO - submit_pipeline_jobs - email has been submitted; JobID: 142856630
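The log lists one LSF job per pipeline step (bwa, SV, email). To follow these jobs, the Python below is a minimal sketch that pulls step names and job IDs out of a saved copy of this output and builds a bjobs command. It assumes the log line format shown above stays the same; the function names and the saved log file are hypothetical.

```python
import re

# Pattern for the "submit_pipeline_jobs" lines in the run_lsf.py output; assumes
# the log format stays the same across versions.
SUBMIT_RE = re.compile(
    r"submit_pipeline_jobs - (?P<step>\S+) has been submitted; JobID: (?P<jobid>\d+)"
)


def submitted_jobs(log_text: str) -> dict:
    """Return {step name: LSF job ID}, in submission order."""
    return {m["step"]: m["jobid"] for m in SUBMIT_RE.finditer(log_text)}


def bjobs_command(jobs: dict) -> str:
    """Build an LSF bjobs command that reports the status of all submitted steps."""
    return "bjobs " + " ".join(jobs.values())


if __name__ == "__main__":
    import sys

    # Read the saved log from standard input, e.g. python parse_jobs.py < submit.log
    jobs = submitted_jobs(sys.stdin.read())
    for step, jobid in jobs.items():
        print(f"{step}\t{jobid}")
    print(bjobs_command(jobs))
```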

The code is available on GitHub.