Object Oriented Evolution

Jul 21 2010

From Tophat to Pysam

I recently needed to interact with a SAM file in python that was created as part of a Tophat run. Rather rolling my own parser I decided to use the handy pysam module. However, I immediately ran into trouble. I couldn’t get pysam to open my file. It turns out pysam requires a BAM file not a SAM file.  This meant I need to convert filetypes. So, I fired up samtools and tried a conversion.

>>> samtools view -bS -o accepted_hits.bam accepted_hits.sam 

This produced an error about a missing ‘@SQ line.’  Turns that the Tophat SAM output doesn’t contain the necessary information about contig/scaffold/chrm locations. So, to generate that data I made a ‘fasta index’ file from the genome I was working on.

>>> samtools faidx mygenome.fa 

Then I ran ‘samtools view’ with the index.

>>> samtools view -bS -t mygenome.fa.fai -o accepted_hits.bam accepted_hits.sam

This created BAM file I need with no errors.  Well, almost.  Running it in pysam generated an error. To get it to work there required more manipulation.  The BAM file needed sorting and indexing.

>>> samtools sort accepted_hits.bam accepted_hits.sorted.bam
>>> samtools index accepted_hits.sorted.bam

accepted_hits.sorted.bam’ was searchable when I opened it in pysam.  Whew!

Page 1 of 1