The purpose of this thesis is to allow "easy parallelism" in the SeqAn library. This will consist of identifying parts of the library where parallelization can be applied effectively and actually engineering (designing, implementing and evaluating) parts of the library to use parallelism. Common parallelization patterns should be extracted into a generalized framework/library that can then expose high-level parallelism to the users of SeqAn.
The parallel execution of the string search algorithms in the module "find" as well as the algorithms for constructing indices are obvious candidate for parallelization. The parallel execution of string matching algorithms is a good starting point. Additionally, it allows the extension of the task to many interesting research questions, e.g. how to collect the results, how to process them etc.
The module "find" is one of the oldest modules in SeqAn and a lot of "technical debt" has accumulated. We are currently working on a new draft of this module and a naive implementation of the various search types. For the application of the implementation, it will probably be easier to work with the new interface and adjust the old -- working -- code to the new specification.
To do this for the competetive algorithms (e.g. Myers Bit Vector Algorithm) is part of this thesis work.
There is lots of related practical and theoretical work. The implementation resulting from this thesis should find a "sweet spot" between reimplementing existing implementations and using them. The decision finding process is to be documented. Competetive previous implementation of algorithms are to be taken into consideration in an exhaustive experimental study.
Related work includes [TBB], [Singler, 2008], [Singler, 2007], [Naeher, 2008], [Tichy, 2009], [Tichy, 2008] and [Massingil et al., 2000]. The design of experiments and presentation of data could be guided by [Sanders, 2002].
We propose the following tentative schedule.
Tentative total: 25 weeks. There will be regular meetings with your supervisor.
We stress that one month should be allocated for the thesis write-up to ensure enough time for a high-quality write-up. However, with all other schedule items: If the work progresses more quickly than anticipated, time can be rescheduled.
The wiki pages ThesisParallelismInSeqanWorkSchedule and ThesisParallelismInSeqanWeeklyReports should track the updated actual schedule for the next weeks and weekly reports.The following can serve as a starting point for your own research: