Implementation of LAST with gapped seeds and applications to read mapping and local alignment
Area
Substring Indices, read mapping, local alignment
Topic
The goal of this thesis is to reimplement and extend parts of LAST [1], which in essence uses a string index to adaptively search for sequence seeds that do not occur too often. The thesis can be split into 5 larger parts.
- Implementation of a gapped suffix array using the method described in [2] or a similar method.
- Implementation of LAST [1], i.e. finding adaptive seeds (gapped and ungapped)
- Using LAST to find seeds in replacement of epsilon-cores for STELLAR. Develop strategies to handle overlapping seeds.
- Using LAST to find seeds to verify using Myers bitvector algorithm.
- Evaluating both tools.
Timeline
References
- Kiełbasa, Szymon M, Raymond Wan, Kengo Sato, Paul Horton, and Martin C Frith. 2011. Adaptive Seeds Tame Genomic Sequence Comparison. Genome Research 21 (3) (March): 487493. doi:10.1101/gr.113985.110.
- Horton, P, and SM Kiełbasa. 2008. DisLex: a Transformation for Discontiguous Suffix Array Construction. In.
- Kehr, B., David Weese, and Knut Reinert. 2011. STELLAR: Fast and Exact Local Alignments. BMC Bioinformatics 12 (Suppl 9): S15.
- Siragusa, Enrico, David Weese, and Knut Reinert. 2013. Fast and Accurate Read Mapping with Approximate Seeds and Multiple Backtracking. Nucleic Acids Research (January 28). doi:10.1093/nar/gkt005.