pysam example: checking softclip readsΒΆ

[1]:
region = "chr11:134033589-134033627"
[2]:
!ls *bam
Control_CRL3128_MKSR_Digenomeseq_buffer_rep1.st.bam
Control_CRL3128_MKSR_Digenomeseq_buffer_rep1.st.name_sorted.bam
Control_CRL3129_MKSR_Digenomeseq_buffer_rep2.st.bam
Control_CRL3129_MKSR_Digenomeseq_buffer_rep2.st.name_sorted.bam
Control_CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.bam
Control_CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.name_sorted.bam
Control_CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.bam
Control_CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.name_sorted.bam
CRL3128_MKSR_Digenomeseq_buffer_rep1.st.bam
CRL3128_MKSR_Digenomeseq_buffer_rep1.st.name_sorted.bam
CRL3129_MKSR_Digenomeseq_buffer_rep2.st.bam
CRL3129_MKSR_Digenomeseq_buffer_rep2.st.name_sorted.bam
CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.bam
CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.name_sorted.bam
CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.bam
CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.name_sorted.bam
[5]:
!module load samtools/1.7;samtools view -b CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.bam chr11:134033589-134033627 > CRL3131.chr11_134M.bam
[6]:
!ls *bam
Control_CRL3128_MKSR_Digenomeseq_buffer_rep1.st.bam
Control_CRL3128_MKSR_Digenomeseq_buffer_rep1.st.name_sorted.bam
Control_CRL3129_MKSR_Digenomeseq_buffer_rep2.st.bam
Control_CRL3129_MKSR_Digenomeseq_buffer_rep2.st.name_sorted.bam
Control_CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.bam
Control_CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.name_sorted.bam
Control_CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.bam
Control_CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.name_sorted.bam
CRL3128_MKSR_Digenomeseq_buffer_rep1.st.bam
CRL3128_MKSR_Digenomeseq_buffer_rep1.st.name_sorted.bam
CRL3129_MKSR_Digenomeseq_buffer_rep2.st.bam
CRL3129_MKSR_Digenomeseq_buffer_rep2.st.name_sorted.bam
CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.bam
CRL3130_MKSR_Digenomeseq_buffer_2xABE_rep1.st.name_sorted.bam
CRL3131.chr11_134M.bam
CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.bam
CRL3131_MKSR_Digenomeseq_buffer_2xABE_rep2.st.name_sorted.bam
[7]:
!samtools view CRL3131.chr11_134M.bam
NB551526:165:H2FWWAFX3:1:11110:4648:3276        83      chr11   134033461       60      151M    =       134033605       -7      GCCCACACCCAAGGTCCGGGGGGCCTGCCTGTGGCCCAAAGTGCACAAGAGGCAAGCCCTGGGCGGGAAGCGGGCTGTGGGGTAGCGGGGGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG /EE<//E/A///EE///AA<A/AE/<<<////</<A/EE//A///AA/E/E/////AE/<E<//EA///E//EA////</E///A/AEE/<AEEEAEA/<E/EEEAEEEE/<//EEAAAEA/EEEEE/E/EEA//EA//AE6EA/EA/AAA  NM:i:11 MD:Z:0T15A4T7A33A6A3T0A1A4A6T61 AS:i:100        XS:i:19
NB551526:165:H2FWWAFX3:2:11111:12498:1139       83      chr11   134033526       60      65S86M  =       134033603       -9      GGTATTATCGGGGCATGATTGACCCTGTATATTATACACATCTAGGTTTTTGTTTTTATGAGACAGGAAGAGGGTTGAGGGGATGCGGGTGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG <E////6</6EE////E/////////E////E////////A/<///E/////<6///A////AA/EE//E/AAEE/E/EEEE</EAEEE/EEEEAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA  NM:i:2  MD:Z:10A7A67    AS:i:76 XS:i:19
NB551526:165:H2FWWAFX3:3:11506:23842:15959      129     chr11   134033552       60      10S141M chr4    15229442        0       ATGGTGGCACGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCATATGCTTT A/A//AEE/AE/EE/AAEE/E/AAE/AA<E<E<E<EE/EEEE/AEEEEAE<E/E/EAEE/A/E<</EEE/E<</<EEAE/EAAAE/EE/<EAEEEEEAEEEE/<AA/EEEE/E/AEAEE/A/E/AE//AE/EEAA/EE<//E<A<6AEEAE    NM:i:1  MD:Z:113A27     AS:i:136        XS:i:20
NB551526:165:H2FWWAFX3:2:11306:7264:13463       97      chr11   134033556       60      129M22S =       134033556       56      GGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATCGGCCAAGAAGAAACAGAAGTTCAGGAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCAAATGCCTGTCTCTTATACACAA AAAAAEEEEEEEEAEEEEEEEEAE/EEEAEEEEEEE/EEAEE/AE/EEEEEEEAE<E//////EEE///A/AEAA/EE/<E///E///E////<</E/E////AEEEAE/<</E<EE//E<E/<AE/EE/E/AAE/<AEAEEEEAEE/E//   NM:i:3  MD:Z:58G24T25A19        AS:i:114        XS:i:20
NB551526:165:H2FWWAFX3:2:11306:7264:13463       145     chr11   134033556       60      56M     =       134033556       -56     GGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG        EEE/AEEEEEE<EEEE/EEEEEEAAEAEEEEEEEEEEEEEEEEEEEEEEEEAAAAA     NM:i:0  MD:Z:56 AS:i:56 XS:i:15
NB551526:165:H2FWWAFX3:3:21606:19105:9847       81      chr11   134033563       60      49M     chr17   22118522        0       GTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG       EAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA       NM:i:0  MD:Z:49 AS:i:49 XS:i:16
NB551526:165:H2FWWAFX3:2:11111:12498:1139       163     chr11   134033603       60      151M    =       134033526       9       AGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATATGCTTTGCAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCAC AAAAAEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEAEAEEEEEEEEEEE6EAEEEEE//EEEE<AEEAEAEEAEAA<AAAEEEEEEEEEEAEAEEEEEEAE<EEAEEAEEEE<<EE/E//EAEEEEAE/EEEAEEEEE6EE/AAEEEEE  NM:i:0  MD:Z:151        AS:i:151        XS:i:20
NB551526:165:H2FWWAFX3:1:11110:4648:3276        163     chr11   134033605       60      151M    =       134033461       7       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGAAGACACCAAGCAAGCAGCCAAGCATATGCTTTGAAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCACAG A/AAAEEEEEAA//EE/EEEEEEEE/EAAEEAEE/EEEAEEEEEAEEEEEEEAAE//A/<<<<EEAAA<A<E/E/6/A/<A</6</AA//</<EAA/EAE<E/AE//EE/AEEAEE</AEE/A</<<EE<E//EE//AEEA/EE/AEEE<<        NM:i:2  MD:Z:55C33C61   AS:i:141        XS:i:23
NB551526:165:H2FWWAFX3:2:21204:24770:10803      97      chr11   134033605       60      81M     chr10   115488125       0       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCAT       AAAAAEE6EEEEEEEEAEEE//EEEEEA//EEE/<A/AEEEEAEE</EEEAEEE/66E/EE/AEE/EE/AEEAEEE/<EEA      NM:i:0  MD:Z:81 AS:i:81 XS:i:19
NB551526:165:H2FWWAFX3:4:21506:11727:7870       161     chr11   134033605       60      81M     chr10   115488125       0       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCAT       AAAAAEEEEEEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEA<AEEEEEEAEA<AE<AEAAE<EAE        NM:i:0  MD:Z:81 AS:i:81 XS:i:19
[11]:
!samtools view CRL3131.chr11_134M.bam | cut -f 1 | sort|uniq > read.list
[12]:
!module load seqtk;seqtk subseq

Usage:   seqtk subseq [options] <in.fa> <in.bed>|<name.list>

Options: -t       TAB delimited output
         -l INT   sequence line length [0]

Note: Use 'samtools faidx' if only a few regions are intended.

[13]:
!ls ../../*gz
../../CRL3128_S1_R1_001.fastq.gz  ../../CRL3130_S3_R2_001.fastq.gz
../../CRL3128_S1_R2_001.fastq.gz  ../../CRL3131_S4_R1_001.fastq.gz
../../CRL3129_S2_R1_001.fastq.gz  ../../CRL3131_S4_R2_001.fastq.gz
../../CRL3129_S2_R2_001.fastq.gz  ../../CRL3132_S5_R1_001.fastq.gz
../../CRL3130_S3_R1_001.fastq.gz  ../../CRL3132_S5_R2_001.fastq.gz
[14]:
!module load seqtk;seqtk subseq ../../CRL3131_S4_R1_001.fastq.gz read.list > test.R1.fastq
!module load seqtk;seqtk subseq ../../CRL3131_S4_R2_001.fastq.gz read.list > test.R2.fastq
[15]:
!head ../../*yaml
==> ../../07062021_MKSR.yaml <==
reference_genome: /rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa
analysis_folder: /rgs01/project_space/tsaigrp/Genomics/common/projects/CHANGE_seq/CHANGE_seq_07062021

bwa: bwa
samtools: samtools

read_threshold: 3
window_size: 30
mapq_threshold: 50
start_threshold: 1

==> ../../changeseq_py3_test.yaml <==
reference_genome: /rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa
analysis_folder: /rgs01/project_space/tsaigrp/Genomics/common/projects/CHANGE_seq/CHANGE_seq_07062021/py3_test

# PATH to analysis tools
# If not specified, default is just the tool name, i.e., accessible in the $PATH
bwa: bwa
samtools: samtools
cutadapt: cutadapt

# read_threshold: 3
[16]:
!a=/rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa
!echo $a

[17]:
!module load bwa;bwa mem /rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa test.R1.fastq test.R2.fastq > test.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 14 sequences (2114 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 3, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 14 reads in 0.005 CPU sec, 0.005 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem /rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa test.R1.fastq test.R2.fastq
[main] Real time: 1.949 sec; CPU: 1.097 sec
[22]:
!cat test.sam
@SQ     SN:chr10        LN:133797422
@SQ     SN:chr11        LN:135086622
@SQ     SN:chr12        LN:133275309
@SQ     SN:chr13        LN:114364328
@SQ     SN:chr14        LN:107043718
@SQ     SN:chr15        LN:101991189
@SQ     SN:chr16        LN:90338345
@SQ     SN:chr17        LN:83257441
@SQ     SN:chr18        LN:80373285
@SQ     SN:chr19        LN:58617616
@SQ     SN:chr1 LN:248956422
@SQ     SN:chr20        LN:64444167
@SQ     SN:chr21        LN:46709983
@SQ     SN:chr22        LN:50818468
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717
@SQ     SN:chrEBV       LN:171823
@SQ     SN:chrM LN:16569
@SQ     SN:chrX LN:156040895
@SQ     SN:chrY LN:57227415
@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem /rgs01/project_space/tsaigrp/Genomics/common/genomes/hg38/hg38_chroms_only.fa test.R1.fastq test.R2.fastq
NB551526:165:H2FWWAFX3:1:11110:4648:3276        81      chr11   134033461       60      151M    =       134033605       -7      GCCCACACCCAAGGTCCGGGGGGCCTGCCTGTGGCCCAAAGTGCACAAGAGGCAAGCCCTGGGCGGGAAGCGGGCTGTGGGGTAGCGGGGGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG /EE<//E/A///EE///AA<A/AE/<<<////</<A/EE//A///AA/E/E/////AE/<E<//EA///E//EA////</E///A/AEE/<AEEEAEA/<E/EEEAEEEE/<//EEAAAEA/EEEEE/E/EEA//EA//AE6EA/EA/AAA  NM:i:11 MD:Z:0T15A4T7A33A6A3T0A1A4A6T61 AS:i:100        XS:i:19
NB551526:165:H2FWWAFX3:1:11110:4648:3276        161     chr11   134033605       60      151M    =       134033461       7       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGAAGACACCAAGCAAGCAGCCAAGCATATGCTTTGAAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCACAG A/AAAEEEEEAA//EE/EEEEEEEE/EAAEEAEE/EEEAEEEEEAEEEEEEEAAE//A/<<<<EEAAA<A<E/E/6/A/<A</6</AA//</<EAA/EAE<E/AE//EE/AEEAEE</AEE/A</<<EE<E//EE//AEEA/EE/AEEE<<        NM:i:2  MD:Z:55C33C61   AS:i:141        XS:i:23
NB551526:165:H2FWWAFX3:2:11111:12498:1139       81      chr11   134033526       60      65S86M  =       134033603       -9      GGTATTATCGGGGCATGATTGACCCTGTATATTATACACATCTAGGTTTTTGTTTTTATGAGACAGGAAGAGGGTTGAGGGGATGCGGGTGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG <E////6</6EE////E/////////E////E////////A/<///E/////<6///A////AA/EE//E/AAEE/E/EEEE</EAEEE/EEEEAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA  NM:i:2  MD:Z:10A7A67    AS:i:76 XS:i:0
NB551526:165:H2FWWAFX3:2:11111:12498:1139       161     chr11   134033603       60      151M    =       134033526       9       AGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATATGCTTTGCAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCAC AAAAAEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEAEAEEEEEEEEEEE6EAEEEEE//EEEE<AEEAEAEEAEAA<AAAEEEEEEEEEEAEAEEEEEEAE<EEAEEAEEEE<<EE/E//EAEEEEAE/EEEAEEEEE6EE/AAEEEEE  NM:i:0  MD:Z:151        AS:i:151        XS:i:20
NB551526:165:H2FWWAFX3:2:11306:7264:13463       97      chr11   134033556       60      129M22S =       134033556       56      GGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATCGGCCAAGAAGAAACAGAAGTTCAGGAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCAAATGCCTGTCTCTTATACACAA AAAAAEEEEEEEEAEEEEEEEEAE/EEEAEEEEEEE/EEAEE/AE/EEEEEEEAE<E//////EEE///A/AEAA/EE/<E///E///E////<</E/E////AEEEAE/<</E<EE//E<E/<AE/EE/E/AAE/<AEAEEEEAEE/E//   NM:i:3  MD:Z:58G24T25A19        AS:i:114        XS:i:20
NB551526:165:H2FWWAFX3:2:11306:7264:13463       145     chr11   134033556       60      95S56M  =       134033556       -56     TGAGGCTGGAGATCTGCTTGATAATGGGAGGCAGAAGTTGCAGTCCCCTTAGACTGTCTCTTATACCCATCTCCGCAGATGTGTATATGAGACCGGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG /6/EE//EE<E///EA////E/////<//////E//E</AA/E/////////<AE/////E/E</E/EAA</<6//////A/AA/<//A/66///EEE/AEEEEEE<EEEE/EEEEEEAAEAEEEEEEEEEEEEEEEEEEEEEEEEAAAAA      NM:i:0  MD:Z:56 AS:i:56 XS:i:0
NB551526:165:H2FWWAFX3:2:21204:24770:10803      97      chr11   134033605       60      81M70S  chr10   115488125       0       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATCTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT AAAAAEE6EEEEEEEEAEEE//EEEEEA//EEE/<A/AEEEEAEE</EEEAEEE/66E/EE/AEE/EE/AEEAEEE/<EEAE/E<E/E/AEAEE/EAEEEEE</EEEEEE/EEEEEEAAEA/E//AAA/A//EEAAEEEEA/<EEEAE/E<    NM:i:0  MD:Z:81 AS:i:81 XS:i:19
NB551526:165:H2FWWAFX3:2:21204:24770:10803      145     chr10   115488125       60      15S136M chr11   134033605       0       TGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGATGTTCATCAGGGATATTGGCCTGAAATTTTTCCTTTTGCTGTGTCTCTACCAGGTTTTGGTGTCAGGATGATGCTGGCCTCTTAAAATGAGTTAGGGAGGATTCCTTC /<EAE<<A/AAE/A</<///EAAEAAA<EEEEEEEAE<EEEEEEA/<EAA<AEEEEEEEEEE6/6AA/E<AA<6A<6//<EA/EEEAAA/EE/E<EEAEEEEEEE/EEEEE<EEAEEEEE/EAEAEEEEEEEEEEEEEAEEEEEE/AAAAA    NM:i:0  MD:Z:136        AS:i:136        XS:i:84
NB551526:165:H2FWWAFX3:3:11506:23842:15959      65      chr4    15229442        60      147M4S  chr11   134033552       0       TCAAAAATGTTTGTTGACTGAATAAATGATATATGAATGAAGCTTAAAATAGAAACCATAAGAACCCTCCAATTAAAGTGTACCATTCTTATAGGTAAGGTGGGATTCATAGCACACCGCTCATCTTTCAGCTGCCAAGGAGCCTTGCTGT AAAAAEEEEEEEEEEEAEEEE/EAEEEE//EEEEE/EAE/EEEA/EEEEAEEEA/EAE<EEE/EEEEEEEEEEEEE//EEEEEEEAE/E/EAEEEAEEEAAEEEEEEEEE/EE<EEEEE<EAAEA/<AEEE<E/EE//AA/6EEEAEEEAE  NM:i:0  MD:Z:147        AS:i:147        XS:i:0
NB551526:165:H2FWWAFX3:3:11506:23842:15959      129     chr11   134033552       60      10S141M chr4    15229442        0       ATGGTGGCACGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCATATGCTTT A/A//AEE/AE/EE/AAEE/E/AAE/AA<E<E<E<EE/EEEE/AEEEEAE<E/E/EAEE/A/E<</EEE/E<</<EEAE/EAAAE/EE/<EAEEEEEAEEEE/<AA/EEEE/E/AEAEE/A/E/AE//AE/EEAA/EE<//E<A<6AEEAE    NM:i:1  MD:Z:113A27     AS:i:136        XS:i:20
NB551526:165:H2FWWAFX3:3:21606:19105:9847       113     chr17   22118820        8       60M91S  chr7    57551444        0       GTGGCTTGCTTCTTCAGTGCCCTGCTGCTCAGACCTCTCAGGGAGGATACAGATGGGCAGCTGTCTCTTATACATATCTACGTAGTTGTGTATAAGAGACAGGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG E/EEAE/E<E/6E///E/EE<</A//A//6///E/<<EA/EA/<<///EE///AEE/E/E/AE/EAAE/E/AE//AE/<///E///E<</6/<<<//<EEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA       NM:i:2  MD:Z:31A7C20    AS:i:50 XS:i:45 SA:Z:chr11,134033563,-,102S49M,8,0;     XA:Z:chr4,+33838273,91S60M,3;chr6,+99255481,91S60M,3;chr20,-26162944,60M91S,3;
NB551526:165:H2FWWAFX3:3:21606:19105:9847       2161    chr11   134033563       8       102H49M chr7    57551444        0       GTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG       EAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA       NM:i:0  MD:Z:49 AS:i:49 XS:i:0  SA:Z:chr17,22118820,-,60M91S,8,2;
NB551526:165:H2FWWAFX3:3:21606:19105:9847       177     chr7    57551444        0       6S145M  chr17   22118820        0       CCTCAAAAGTATGATCATTTCTTTTCATCTTTTATCTGGGACATAACCAAAATTTAGAAGTTTAATATTAAAAAGATGATTCTCACACAAACCTAGGATAAACATATTTTAAGACTATATAGAACTCATTAACGAAGGAACCGAAAAGAAG <EEAAEAEEEAAEEEEEAEAAEA<AEEEEAEA/EAEEEEEEEEEEEEEEEEEAE<EEEEEEE6<E6EEAEEE<EAEEAEEEAAAEEEEEEEEE<EEEEEEEEEEEEEAEAEEEEEAE<EAEEEEEEEEEAEEEEEEEEEEEEEEEEAAAAA    NM:i:11 MD:Z:6A13C3G19C1G14C46T6C10T2G6A8       AS:i:90 XS:i:90 XA:Z:chr17,+22118522,135M16S,9;chr4,-33838485,6S145M,15;
NB551526:165:H2FWWAFX3:4:21506:11727:7870       81      chr10   115488125       60      15S136M chr11   134033605       0       TGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGATGTTCATCAGGGATATTGGCCTGAAATTTTTCCTTTTGCTGTGTCTCTACCAGGTTTTGGTGTCAGGATGATGCTGGCCTCTTAAAATGAGTTAGGGAGGATTCCTTC /EEEA/AEAAEAAAAEAEEEAE<EAAEEEEAEEAEEEEE/AEE6EEEAE/EEEEEEEEEEEEEEAE<EEA<AAAA<EA6AEEEEEEEEAEAEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAA     NM:i:0  MD:Z:136        AS:i:136        XS:i:84
NB551526:165:H2FWWAFX3:4:21506:11727:7870       161     chr11   134033605       60      81M70S  chr10   115488125       0       GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATCTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT AAAAAEEEEEEEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEA<AEEEEEEAEA<AE<AEAAE<EAEA</<<AAEA<AEAAE<AEEEEEEAEEEEEAE<AAEAAEAEAEEEE<EEEEEEEEEEEEEEE<AEE/AEEA  NM:i:0  MD:Z:81 AS:i:81 XS:i:19
[27]:
!module load samtools/1.7;samtools view -bS test.sam > test.bam;samtools sort -o test.st.bam test.bam;samtools index test.st.bam
[23]:
import pysam
[28]:

f = pysam.AlignmentFile('test.sam') for read in f.fetch(): print (read.query_name,read.reference_name,read.reference_start,read.reference_end)
NB551526:165:H2FWWAFX3:1:11110:4648:3276 chr11 134033460 134033611
NB551526:165:H2FWWAFX3:1:11110:4648:3276 chr11 134033604 134033755
NB551526:165:H2FWWAFX3:2:11111:12498:1139 chr11 134033525 134033611
NB551526:165:H2FWWAFX3:2:11111:12498:1139 chr11 134033602 134033753
NB551526:165:H2FWWAFX3:2:11306:7264:13463 chr11 134033555 134033684
NB551526:165:H2FWWAFX3:2:11306:7264:13463 chr11 134033555 134033611
NB551526:165:H2FWWAFX3:2:21204:24770:10803 chr11 134033604 134033685
NB551526:165:H2FWWAFX3:2:21204:24770:10803 chr10 115488124 115488260
NB551526:165:H2FWWAFX3:3:11506:23842:15959 chr4 15229441 15229588
NB551526:165:H2FWWAFX3:3:11506:23842:15959 chr11 134033551 134033692
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr17 22118819 22118879
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr11 134033562 134033611
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr7 57551443 57551588
NB551526:165:H2FWWAFX3:4:21506:11727:7870 chr10 115488124 115488260
NB551526:165:H2FWWAFX3:4:21506:11727:7870 chr11 134033604 134033685
[47]:
from skbio.alignment import global_pairwise_align_nucleotide
from skbio.sequence import DNA

tn5="CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG"
f = pysam.AlignmentFile('test.sam')
for read in f.fetch():
    print ("~~~~~~~~~~~~~~~~~~~~~~~~~~")
    print (read.query_name,read.reference_name,read.mapq,read.cigarstring,read.cigartuples,read.is_reverse,read.is_read1 )
    pos = 0
    print (read.seq)
    for i,j in read.cigartuples:
        if i == 4: # soft clip
            soft_clip = read.seq[pos:pos+j]
            alignment, score, start_end_positions = global_pairwise_align_nucleotide(DNA(soft_clip),DNA(tn5))
            a,b = alignment[0]._string.decode("utf-8"),alignment[1]._string.decode("utf-8")
            print (a)
            print (b)
        pos+=j
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:1:11110:4648:3276 chr11 60 151M [(0, 151)] True True
GCCCACACCCAAGGTCCGGGGGGCCTGCCTGTGGCCCAAAGTGCACAAGAGGCAAGCCCTGGGCGGGAAGCGGGCTGTGGGGTAGCGGGGGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:1:11110:4648:3276 chr11 60 151M [(0, 151)] False False
GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGAAGACACCAAGCAAGCAGCCAAGCATATGCTTTGAAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:11111:12498:1139 chr11 60 65S86M [(4, 65), (0, 86)] True True
GGTATTATCGGGGCATGATTGACCCTGTATATTATACACATCTAGGTTTTTGTTTTTATGAGACAGGAAGAGGGTTGAGGGGATGCGGGTGGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
GGTATTATCGGGGCATGATTGACCCTGTATATTATACACATCTAGGTTTTTGTTTTTATGAGACA-
------------------------CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:11111:12498:1139 chr11 60 151M [(0, 151)] False False
AGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATATGCTTTGCAAGCAGGCAACACAAGCACCCAGTAGGGTTTGGGGGCTGGGGAGGGTACATGAGCCCAC
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:11306:7264:13463 chr11 60 129M22S [(0, 129), (4, 22)] False True
GGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATCGGCCAAGAAGAAACAGAAGTTCAGGAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCAAATGCCTGTCTCTTATACACAA
AATGCCTGTCTCTTATACACAA-------------------------
-----CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:11306:7264:13463 chr11 60 95S56M [(4, 95), (0, 56)] True False
TGAGGCTGGAGATCTGCTTGATAATGGGAGGCAGAAGTTGCAGTCCCCTTAGACTGTCTCTTATACCCATCTCCGCAGATGTGTATATGAGACCGGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
TGAGGCTGGAGATCTGCTTGATAATGGGAGGCAGAAGTTGCAGTCCCCTTAGACTGTCTCTTATACCCATCTCCGCAGATGTGTATATGAGACCG
-----------------------------------------------------CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:21204:24770:10803 chr11 60 81M70S [(0, 81), (4, 70)] False True
GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATCTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG----------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:2:21204:24770:10803 chr10 60 15S136M [(4, 15), (0, 136)] True False
TGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGATGTTCATCAGGGATATTGGCCTGAAATTTTTCCTTTTGCTGTGTCTCTACCAGGTTTTGGTGTCAGGATGATGCTGGCCTCTTAAAATGAGTTAGGGAGGATTCCTTC
--------------------------TGTGTATAAGAGACA-
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:3:11506:23842:15959 chr4 60 147M4S [(0, 147), (4, 4)] False True
TCAAAAATGTTTGTTGACTGAATAAATGATATATGAATGAAGCTTAAAATAGAAACCATAAGAACCCTCCAATTAAAGTGTACCATTCTTATAGGTAAGGTGGGATTCATAGCACACCGCTCATCTTTCAGCTGCCAAGGAGCCTTGCTGT
CTGT--------------------------------------
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:3:11506:23842:15959 chr11 60 10S141M [(4, 10), (0, 141)] False False
ATGGTGGCACGGGAGGACAGTGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACCCCAAGCAAGCAGCCAAGCATATGCTTT
ATGGTGGCAC-----------------------------------------
---------CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr17 8 60M91S [(0, 60), (4, 91)] True True
GTGGCTTGCTTCTTCAGTGCCCTGCTGCTCAGACCTCTCAGGGAGGATACAGATGGGCAGCTGTCTCTTATACATATCTACGTAGTTGTGTATAAGAGACAGGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
CTGTCTCTTATACATATCTACGTAGTTGTGTATAAGAGACAGGTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG-------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr11 8 102H49M [(5, 102), (0, 49)] True True
GTCGATTGCAGGCGCAATCCGAGCAAAAGACGGGCACCACAGGAGCCAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:3:21606:19105:9847 chr7 0 6S145M [(4, 6), (0, 145)] True False
CCTCAAAAGTATGATCATTTCTTTTCATCTTTTATCTGGGACATAACCAAAATTTAGAAGTTTAATATTAAAAAGATGATTCTCACACAAACCTAGGATAAACATATTTTAAGACTATATAGAACTCATTAACGAAGGAACCGAAAAGAAG
------------------------------------------CCTCAA
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG------
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:4:21506:11727:7870 chr10 60 15S136M [(4, 15), (0, 136)] True True
TGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGATGTTCATCAGGGATATTGGCCTGAAATTTTTCCTTTTGCTGTGTCTCTACCAGGTTTTGGTGTCAGGATGATGCTGGCCTCTTAAAATGAGTTAGGGAGGATTCCTTC
--------------------------TGTGTATAAGAGACA-
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG
~~~~~~~~~~~~~~~~~~~~~~~~~~
NB551526:165:H2FWWAFX3:4:21506:11727:7870 chr11 60 81M70S [(0, 81), (4, 70)] False False
GAGCCAGATGGGCCAAGAAGAAACAGAAGTTCAGTAGGAAGAGACAGATAAACAGCAGACACCAAGCAAGCAGCCAAGCATCTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAGGTATTTTACTGAGGATTTTTGCATCGAT
CTGTCTCTTATACACATCTACGTAGATGTGTATAAGAGACAG----------------------------