From Tophat to Pysam
I recently needed to interact with a SAM file in python that was created as part of a Tophat run. Rather rolling my own parser I decided to use the handy pysam module. However, I immediately ran into trouble. I couldn’t get pysam to open my file. It turns out pysam requires a BAM file not a SAM file. This meant I need to convert filetypes. So, I fired up samtools and tried a conversion.
>>> samtools view -bS -o accepted_hits.bam accepted_hits.sam
This produced an error about a missing ‘@SQ line.’ Turns that the Tophat SAM output doesn’t contain the necessary information about contig/scaffold/chrm locations. So, to generate that data I made a ‘fasta index’ file from the genome I was working on.
>>> samtools faidx mygenome.fa
Then I ran ‘samtools view’ with the index.
>>> samtools view -bS -t mygenome.fa.fai -o accepted_hits.bam accepted_hits.sam
This created BAM file I need with no errors. Well, almost. Running it in pysam generated an error. To get it to work there required more manipulation. The BAM file needed sorting and indexing.
>>> samtools sort accepted_hits.bam accepted_hits.sorted.bam
>>> samtools index accepted_hits.sorted.bam
‘accepted_hits.sorted.bam’ was searchable when I opened it in pysam. Whew!
