Filter bam files and generate bw files

usage: bam_to_bw.py [-h] [-j JID] [--bamCoverage_addon BAMCOVERAGE_ADDON]
                    [--no_filter] [-g GENOME] [-e EFFECTIVEGENOMESIZE]
                    file [file ...]

positional arguments:
  file

optional arguments:
  -h, --help            show this help message and exit
  -j JID, --jid JID     enter a job ID, which is used to make a new directory.
                        Every output will be moved into this folder. (default:
                        bam_to_bw_yli11_2021-10-04)
  --bamCoverage_addon BAMCOVERAGE_ADDON
                        for PE data, you add --center to get sharper peaks
                        (default: )
  --no_filter           no_filter for input bam (default: False)

Genome Info:
  -g GENOME, --genome GENOME
                        genome version: hg19, hg38, mm9, mm10. By default,
                        specifying a genome version will automatically update
                        index file, black list, chrom size and
                        effectiveGenomeSize, unless a user explicitly sets
                        those options. (default: hg19)
  -e EFFECTIVEGENOMESIZE, --effectiveGenomeSize EFFECTIVEGENOMESIZE
                        effectiveGenomeSize for bamCoverage (default:
                        2451960000)

Summary

Read all bam files in the current dir, keep only properly paired mapped reads (-f 3), remove ummapped and duplicated reads (-F 4 -F 8 -F 1024), and filter out low MAPQ reads (-q 5, this will also remove multi-mapped reads). Lastly, generate bw files using the filtered bam files.

10/4/2021 updates

use --no_filter to convert input bam to bw without any filters.

Input

No need to prepare an input list of files. This program will parse all files in a specified pattern.

For example:

*.bam will match all bam files in the current dir.

../../../*.rmdup.bam will match all rmdup.bam files in up 3 dir level.

Output

See bam_files folder and bw_files folder.

Usage

Suppose all your bam files are in the current working dir.

hpcf_interactive

module load python/2.7.13

bam_to_bw.py *.bam

For PE data, use:

bam_to_bw.py *.bam --bamCoverage_addon " --center"

For no filter, use:

bam_to_bw.py *.bam --no_filter

Source code

inbam=$(basename ${COL1})
outbam=${inbam%.bam}.filtered.bam
outbw=${inbam%.bam}.bw

ln -s ${COL1} {{jid}}/$inbam

cd {{jid}}

# filter
samtools view -b -h -f 3 -F 4 -F 8 -F 1024 -q 5 -o $outbam $inbam

# index filter bam

samtools sort $outbam -o ${outbam}.sorted
rm $outbam
mv ${outbam}.sorted $outbam
samtools index $outbam

# bam Coverage

module purge
module load python/2.7.15-rhel7

bamCoverage -b $outbam -o $outbw --smoothLength=200 --ignoreForNormalization chrX chrM  --effectiveGenomeSize {{effectiveGenomeSize}} --numberOfProcessors 4 {{bamCoverage_addon}}

rm $inbam

code @ github.