RNA-seq alternative splicing pipeline¶
Summary¶
This pipeline performs the following steps:
RNA-Seq De novo Assembly Using
Trinity
(https://github.com/trinityrnaseq/trinityrnaseq/wiki)Map de novo transcripts to genome using
minimap2
Quantify de novo transcripts abundance using
kallisto
Cons¶
Trinity pipeline is very slow, may take weeks to finish.
De novo transcripts are not annotated, people can use the bam file from step2 to perform annotation.
Usually I just use this pipeline to check some specific genes, so I will just subset the bam file and visualize in IGV. I can manually check the abundance in the abundance.tsv using the read names from IGV.
samtools view -b trinity_gmap.st.bam chr8:87173540-87374477 > trinity_gmap.NFIX.bam
samtools index trinity_gmap.NFIX.bam
Input¶
fastq.tsv
Use run_lsf.py --guess_input
to automatically generate this.
Banana_R1.fastq.gz Banana_R2.fastq.gz Banana_lovers
Orange_R1.fastq.gz Orange_R2.fastq.gz Orange_lovers
Usage¶
hpcf_interactive
module load python/2.7.13
run_lsf.py -f fastq.tsv -p trinity
Email notification will be sent once it is finished.
Output¶
Outputs are generated for each fastq file, named as {label}_trinity_out
step1 output file is
Trinity.fasta
. There are also many intermediate files from Trinity.step2 output file is
{label}.st.bam
, you can vis them in IGV.step3 output file is
{label}_abundance/abundance.tsv