Sign up at
ProgrammingGroupList for one of the exercises on Tuesday 15th and Wednesday 16th
Exercise 2
Read mapping with QUASAR
Deadline: 23.05.2012 9:00 a.m.
Implement the QUASAR algorithm for to improve the efficiency of you read mapper in Exercise 1
So you need to:
- Efficiently create a q-gram index of the genome.
- Divide the genome into blocks
- Search for matching q-grams and count matches for each block
- For each block where the matches exceed the threshold (q-gram Lemma) use the semi-global aligner from Exercise 1 for verification.
Format specifications
The input and output file formats and outfile name are the same as in Exercise 1
The program usage should be:
./exercise2 <genome.fasta> <reads.fasta> <k> <q> <b>
where:
- k is the allowed maximal edit distance of a match.
- q is the length of the q-grams.
- b the block size for the genome.
Remark: The choice of q in testing will always respect the memory usage of a q-gram index without hashing (table size 4
q).
Your implementation should allow for a length of 12 at least on a normal desktop.
If the block size is smaller than the read length + k your program should write a warning "read length exceeds block size" into the result file and terminate.
If for the choice of k and q and the read length the calculated threshold is not greater than zero the program should write a warning "Bad choice of k and q for input read length" into the result file and terminate.
For the application note you should test different values of k, q and b and measure their effects on the runtime. Compare to the runtime in Exercise 1