Myers Bit-Vector Algorithm on GPU for Seqan
Massiv parallelization of Myers Fast Bit-Vector Algorithm for Approximate String Matchinsg using GPGPU technics (NVidia CUDA) in Seqan
Student
Lars Langner
Academic Advisor
Prof. Dr. Knut Reinert, David Weese
Description
This work will be about the possibilty to use GPGPU methodes in Seqan (especially with NVidia CUDA) and
the implementation and parallelization of Myers fast bitvector algorithm for approximate string matchings.
It is stated that using massive multiprocessor programming can dramatically increases the speed and performance
of algorithms, depending on the amount that could be parallized and the efficiency it is implemented.
Using hundreds or thousands of parallel threads for read mapping/localizing/matching to a genom with the fast GPU cores
and Myers fast matching algorithm can give an great gain of speed for sequence analyse and string matchings.
Serveral issues has to be worked out:
- denote what preconditions are needed for using and compiling GPGPU/NVidia CUDA in Seqan
- if and how generic programming or meta-programming can be used within the algorithm on GPU
- generic implementation with any desired bitvector-length in an efficient manner for Seqan
- generic implementation of Myers bitvector algorithm on CPU and GPU with different strategies (global vs. thread-local memorys)
- extension to the banded Myers bitvector algorithm (Ukkonen Trick)
- usage of different distance measures (Levenshtein and Damerau Edit Distances)
- efficently distribution/scheduling of the tasks onto the GPU processors/threads
- testing, performance and usage analyse for the algorithm and comparing to the CPU variant
This will require an efficient adopting of the problem and problem-size for partitioning the
parallel work most effective on GPU with different hardware settings.
Literature
[1] Myers, G. (1999). A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming,
Journal of the ACM, 46(3): 495-415
[2] Hyyrö, H. (2001). A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances
[3] Gröpl, C., Klau, G. and Reinert, K. (2009). Bit-parallel string matching
[4] Bailey, M., Cunningham, S. (2009). Graphics Shaders - Theory and Practice
[5] Kirk, David B., Hwu, Wen.mei W. (2010). Programming Massively Parallel Processors - A Hands-on Approach
Work Plan
This diploma thesis has a duration of 6 month from 01.10.2010 until 01.04.2011.
Month 1 - Integrate and Implement the algorithm for GPU and CPU
Month 2 - Implement different variants, strategies for scheduling and find usage capabilitys
Month 3 - Integrate into Seqan, Real-World data and performance tests
Month 4 - Enhance implementations and beginn writing diploma thesis
Month 5 & 6 - Working and writing out the diploma thesis
Weekly Report
Week 0 - Literature Study and Researches
Week 1 (4.10) - Seqan Retreat 2010 / Getting into Seqan
Week 2 (11.10) - Generic Bitvectors & Myers-Algorithm Implementation
Week 3 (18.10) - Generic CUDA Implementation
Week 4 (25.10) - Solving Memory & Threading issues
Week 5 (01.11) - Improved memory handling, matching detection
Week 6 (08.11) - Implem. other distance measures; first performance measurements
Week 7 (15.11) - Implem. CPU methode like the CUDA methode ; Unrolling loops, force inlining, performance issues
Week 8 (22.11) - begin
OpenCL implementation, Seqan Interface Adaptions ; Code fixings ; double-buffer logic
Week 9 (29.11) - unify code for independent CUDA,OpenCL or CPU usage ; threading & memory issues ; trible-buffering logic
Week 10(06.12) - scheduling, computation and performance test & improvements, many threads test (up to 1 million)
Week 13(13.12) - implementing Myers Banded Variantions, better asyncronisation, commenting code
Week 14(20.12) - dddoc documentations, Seqan integration
Week 15(27.12) - ddoc- documentations, Christmas holiday
Week 16(03.01) - Seqan Finder vs. GPU Test, Diploma Thesis writing
Week 17(10.01) - Buffering,Stability & Performance Improvments, Test Suite Improvments,
OpenCL run
Week 18(17.01) -
OpenCL version for small bitvectors ( <=64 )
Week 19(24.01) - Writing Diploma Thesis
Week 20(31.01) - Writing Diploma Thesis
Week 21(07.02) - Writing Diploma Thesis
Week 22(14.02) - Adding Hamming Distance, Writing Diploma Thesis
Week 23(21.02) - Gathering performance results, Writing Diploma Thesis
Week 24(28.02) - Gathering performance results, Writing Diploma Thesis
Week 25(07.03) - Writing Diploma Thesis
Week 26(14.03) - Diploma Thesis Review
Week 27(21.03) - Diploma Thesis Review
Week 28(28.03) - Finishing and release