Exercise 1
Read mapping with semi global alignment
Deadline: 02.05.2012 9:00 a.m.
- Write a C++ program that:
- reads two fast files genome.fa and reads.fa where genome.fa contains a single sequence and reads.fa contains an unknown number of reads.
- uses semi global alignment with dynamic programming for each read to find all matching positions with edit distance at most k.
- implements the semi global alignment with and without the Ukkonen trick.
- creates a file containing for each read all hits (startPos, endPos, errors).
- Test your program on the given Dataset (will be added)
- Compare the performance of the naive approach and the Ukkonen trick. Also use the RazerS read mapper in Seqan and compare the runtimes.
- Write a short application note (no more than 1-2 pages) where you describe your implementation and your observations.
The program usage should be:
./exercise1 <genome.fasta> <reads.fasta> <k> <m> where k its the number of allowed errors and m is either 0 (without Ukkonen) or 1 (with Ukkonen)
The program should produce an output file named
hits.result with one line for each hit in the format:
<read_id>, <start_pos>, <end_pos>, <errors>
Point scheme
For each exercise you can score 4 Points, 3 for the program and 1 for the application note.
For the program:
3 Pts = program runs without errors
2 Pts = program contains small errors
1 Pts = program contains critical errors
0 Pts = no program submitted
Requirements: You have to reach at least 75% of the points summed for all programming exercises