Accomplishments in the last six months:

Split read mapping:

  • Finished implementation of splitRazers, poster at ISMB
  • Implemented a sanity-check Perl script (uses readsim to simulate split reads, runs split mapping, and counts the mapped reads)
  • Set up an evaluation pipeline:
  1. insert simulated indels (within given size ranges, or taken from a GFF file of known indels) into a reference sequence (indelSimulator)
  2. simulate reads from the indel-containing reference (readsim)
  3. map reads onto original reference (razers)
  4. map the leftover (unmapped) reads onto the original reference (splitRazers)
  5. do indel calling on split reads (snpStore)
  6. check for TPs, FPs, FNs per size range (compareVariants)
  • Evaluated on simulated data:
  1. simulated indels (size range -30 to 3000 bp) --> almost all indels can be recovered, with almost no FPs
  2. simulate from dbSNP indels --> indels harder to
  • Comparison with GSNAP: work in progress (on data set 1, the simulated indels, there were no big surprises)
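
Step 6 of the evaluation pipeline (counting TPs, FPs and FNs per size range) can be sketched roughly as follows. This is only an illustration in Python, not the actual compareVariants tool: the function names, the size bins, the (position, length) representation and the position tolerance are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical size bins on absolute indel length in bp (not taken from the report).
SIZE_BINS = [(1, 10), (11, 30), (31, 100), (101, 3000)]


def size_bin(length):
    """Return the (lo, hi) bin containing abs(length), or None if outside all bins."""
    n = abs(length)
    for lo, hi in SIZE_BINS:
        if lo <= n <= hi:
            return (lo, hi)
    return None


def compare_variants(truth, calls, tolerance=5):
    """Count TPs, FPs and FNs per size bin.

    truth, calls: lists of (position, signed_length) tuples.
    Assumption: a call matches a simulated indel if the signed lengths are
    equal and the positions differ by at most `tolerance` bp.
    """
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    unmatched = list(truth)
    for pos, length in calls:
        # Take the first still-unmatched simulated indel that fits this call.
        hit = next(
            (t for t in unmatched
             if t[1] == length and abs(t[0] - pos) <= tolerance),
            None,
        )
        if hit is not None:
            unmatched.remove(hit)
            counts[size_bin(length)]["TP"] += 1
        else:
            counts[size_bin(length)]["FP"] += 1
    # Simulated indels that no call matched are false negatives.
    for pos, length in unmatched:
        counts[size_bin(length)]["FN"] += 1
    return dict(counts)
```

Each simulated indel is matched at most once, so a duplicate call for the same indel counts as an FP. A real comparison would probably also need to handle indels that can be shifted within repeats, which this sketch ignores.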

Variant detection:

  • Added split-read indel calling in snpStore
  • Minor additions such as read clipping
  • Integrated realignment (finally!)

Ongoing data analysis projects:

  • Resequencing projects of Ropers department
  • Gene fusions in 454 RNA-Seq data (Lena Feldhahn, Tübingen)

SeqAn maintenance/tutoring:

  • SeqAn Retreat
  • Split-read mapping (hamming/edit) now in library/apps
  • snpStore now in library/apps
  • Tutored Kerstin Neubert's Master thesis “Detection of copy number variations”


  • Poster at ISMB 2010, Illumina meeting, SeqAn retreat
  • Reviewed 2 papers
  • Incorporated self-matches in refinement and corresponding self-edges in alignment graph

Goals for the next six months:

  • Finish paper on splitRazers, submit before Christmas
  • With Birte: Split mapping with local swift (multiply split reads)
  • Start writing up thesis after Christmas break


