The SeqAn library contains a powerful method for realignment and consensus computation based on the alignment graph [1]. While this works quite well, the approach has some drawbacks. The following picture shows a multi-read alignment where the graph-based (re-)alignment method did not succeed in creating a good alignment.
The input to the graph-based alignment consists of matches between the reads. The matches may conflict with each other, and the alignment algorithm selects some matches while discarding others. In the two marked regions on the left, this leads to many small insertions and deletions, while in the marked region on the right there is a long stretch (TAATT…CAACA) for which the matches were discarded.
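To illustrate what selecting and discarding conflicting matches means, the following is a minimal sketch for the simplified case of two reads only. It is not SeqAn's algorithm: it assumes that each match is a single position pair with a weight, that two matches conflict when they cross or share a coordinate, and it keeps the heaviest non-crossing chain. The function name `select_consistent` and the tuple layout are hypothetical.

```python
def select_consistent(matches):
    """Keep the heaviest chain of non-crossing matches between two reads.

    matches: list of (pos_a, pos_b, weight) tuples (hypothetical layout).
    Returns (kept, discarded).
    """
    order = sorted(matches)
    if not order:
        return [], []
    n = len(order)
    best = [m[2] for m in order]  # best chain weight ending at match i
    prev = [-1] * n               # predecessor of match i in that chain
    for i in range(n):
        for j in range(i):
            # Match j can precede match i only if it lies strictly before
            # it in both reads; otherwise the two matches conflict.
            compatible = order[j][0] < order[i][0] and order[j][1] < order[i][1]
            if compatible and best[j] + order[i][2] > best[i]:
                best[i] = best[j] + order[i][2]
                prev[i] = j
    # Trace back from the end of the heaviest chain.
    end = max(range(n), key=best.__getitem__)
    kept = []
    while end != -1:
        kept.append(order[end])
        end = prev[end]
    kept.reverse()
    discarded = [m for m in order if m not in kept]
    return kept, discarded


# The match (3, 8) crosses (6, 4), so only one of the two can be kept.
kept, discarded = select_consistent([(0, 0, 5), (3, 8, 2), (6, 4, 4), (9, 9, 5)])
```

In this toy input the lighter of the two crossing matches is dropped. When many short, equally weighted matches compete, such a selection can leave a region without any kept match, which is one way to picture the discarded stretch described above.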
Another problem of the realignment method is the cubic running time of the triplet consensus extension. This is problematic with deep alignments (hundreds of stacked sequences).
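The cubic cost can be made concrete with a small sketch of a T-Coffee-style triplet extension. This is an illustration under stated assumptions, not SeqAn's implementation: it assumes that "cubic" refers to the number of reads, that the match library is a dictionary keyed by read pairs, and that an indirect match through a third read is scored with the minimum of the two supporting weights. The function name `triplet_extension` and the data layout are hypothetical.

```python
from itertools import combinations


def triplet_extension(num_seqs, library):
    """Extend pairwise matches through every third sequence.

    library: {(i, j): {(pos_i, pos_j): weight}} with i < j (hypothetical layout).
    Returns (extended_library, number_of_triplets_inspected).
    """
    def matches(a, b):
        # Matches between sequences a and b, oriented as (pos_a, pos_b).
        if a < b:
            return library.get((a, b), {})
        return {(pb, pa): w for (pa, pb), w in library.get((b, a), {}).items()}

    extended = {pair: dict(m) for pair, m in library.items()}
    triplets = 0
    for i, j in combinations(range(num_seqs), 2):
        for k in range(num_seqs):
            if k == i or k == j:
                continue
            triplets += 1
            # Index the i-k matches by their position in k.
            via_k = {}
            for (pi, pk), w1 in matches(i, k).items():
                via_k.setdefault(pk, []).append((pi, w1))
            # Join with the k-j matches that share the same position in k.
            for (pk, pj), w2 in matches(k, j).items():
                for pi, w1 in via_k.get(pk, []):
                    target = extended.setdefault((i, j), {})
                    target[(pi, pj)] = target.get((pi, pj), 0) + min(w1, w2)
    return extended, triplets


# Reads 0 and 1 both match position 7 of read 2, so they gain an indirect match.
toy = {(0, 2): {(5, 7): 3}, (1, 2): {(6, 7): 2}}
extended, triplets = triplet_extension(3, toy)
```

The outer loops visit every pair of reads and every third read, so the number of inspected triplets grows with the cube of the alignment depth: 360 for 10 reads, but 485100 for 100 reads, before any per-match work is counted.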
The aim of this thesis is to fix the issues mentioned above.