Page AdvancedAlgorithms11

Welcome to the Wiki of the Lecture Advanced Algorithms in Bioinformatics (P4).

News

28.09. (Weese): The second examination (Nachklausur) will take place on Wednesday, 19.10., 2-4 pm, in room 005 (T9).
12.07. (Weese): The examination results are online.
31.05. (Weese): The examination (Klausur) will take place on Wednesday, 29.06., in room 006 (T9).
07.04. (Weese): Please sign up for this lecture at https://www.mi.fu-berlin.de/kvv/course.htm?cid=9685&sid=22 (updated URL)
21.03. (Weese): The first lecture will be given on Monday, 11.04., at 2:15 pm, room SR FB (1.1.16) in Arnimallee 14.
21.03. (Weese): Important/urgent information will be distributed via our mailing list. Please sign up at https://lists.fu-berlin.de/listinfo/bioinf-P4-2011

Exam Results

Matriculation no.	Points	Grade
xxx4940	62,5	2,0
xxx1192	27	5,0
xxx0760	0	5,0
xxx9983	0	5,0
xxx2042	45	3,3
xxx2152	49	3,0
xxx1288	0	5,0
xxx4523	0	5,0
xxx0284	39	3,7
xxx2000	71	1,3
xxx2779	0	5,0
xxx6890	62,5	2,0
xxx6034	39	3,7
xxx1235	65	1,7
xxx9546	68	1,3
xxx9672	57	2,3
xxx9469	43	3,3
xxx2200	43,5	3,3
xxx4248	43	3,3

Schedule

Event	Day	Time	Address	Room
Lecture	Mon	14-16	Arnimallee 14	SR FB (1.1.16)
Lecture	Fri	14-16	Takustr. 9	006
Exercise	Wed	14-16	Takustr. 9	006

Date	Lecture Block	Lecturer
11.04.-06.05.	Online String Searching and Filtering	David Weese
09.05.-30.05.	Index Data Structures	David Weese
11.05.	Review 1
03.06.-17.06.	Genome Comparison and RNA Analysis	Sandro Andreotti
08.06.	Review 2
29.06.	Examination (Klausur)
18.06.-16.07.	Individual Projects

Exercises

You can download the exercise sheets on the subpages of the different lecture blocks.

Reviews

Requirements for Active Participation (Aktive Teilnahme)

You have to hand in 75% of all exercise problems. (An exercise problem counts if it is clearly visible that a solution was attempted for some time.) In addition, you need to reach 50% of the total points of the two reviews.

Projects

The time from 18.06. to 16.07. is reserved for your projects. Please pick a project from the list below and sign up on this page (coming soon). You are expected to write a C++ program, which you have to present and explain successfully in a code review.

For additional information and material check: P4Projects

Project A: Suffix array search (Firdevs, Hatice)

Student 1
- Read in a text from disk
- Construct the suffix array with Quicksort or an algorithm of this lecture
- Construct the lcp-table with the Kasai algorithm
- Write the 2 tables to disk
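
The two construction steps of Student 1 could be sketched as follows. This is a minimal illustration, not a reference solution: the function names buildSuffixArray and buildLcpKasai are made up for this example, the suffix array is built by naive comparison sorting (std::sort instead of a hand-written Quicksort), and reading and writing files is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing suffixes.
// Worst case O(n^2 log n); only meant for small test inputs.
std::vector<std::size_t> buildSuffixArray(const std::string& text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Kasai algorithm: lcp[i] is the length of the longest common prefix of the
// suffixes sa[i-1] and sa[i]; lcp[0] is 0 by convention. Runs in O(n).
std::vector<std::size_t> buildLcpKasai(const std::string& text,
                                       const std::vector<std::size_t>& sa) {
    assert(sa.size() == text.size());
    const std::size_t n = text.size();
    std::vector<std::size_t> rank(n), lcp(n, 0);
    for (std::size_t i = 0; i < n; ++i) rank[sa[i]] = i;  // inverse suffix array
    std::size_t h = 0;  // number of characters already known to match
    for (std::size_t i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }
        const std::size_t j = sa[rank[i] - 1];  // predecessor in suffix array order
        while (i + h < n && j + h < n && text[i + h] == text[j + h]) ++h;
        lcp[rank[i]] = h;
        if (h > 0) --h;  // the next suffix loses at most one matching character
    }
    return lcp;
}
```

For the text "banana", this sketch yields the suffix array 5 3 1 0 4 2 and the lcp table 0 1 3 0 0 2.
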
Student 2
- Read in a text and the corresponding suffix array and lcp table from disk
- Implement the suffix array search with mlr-heuristic
- Implement the O(m + log n) suffix array search and, for this purpose, construct an lcp-interval tree
- Compare the search times of both algorithms (use a genomic text and a large set of substrings)
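
As a starting point for Student 2, the binary search with mlr-heuristic might look like the sketch below. The names findBoundMlr and findOccurrences are hypothetical. The idea is to remember how many pattern characters matched at the current left and right borders (l and r) and to skip the first min(l, r) characters of every further comparison. The O(m + log n) variant with the lcp-interval tree is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Binary search over the suffix array with the mlr-heuristic.
// upper == false: first index whose suffix is >= pattern.
// upper == true:  first index behind all suffixes that start with pattern.
std::size_t findBoundMlr(const std::string& text,
                         const std::vector<std::size_t>& sa,
                         const std::string& pattern, bool upper) {
    assert(sa.size() == text.size());
    const std::size_t n = text.size(), m = pattern.size();
    std::size_t lo = 0, hi = n;  // the result lies in [lo, hi]
    std::size_t l = 0, r = 0;    // lcp of pattern with the suffixes at the borders
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const std::size_t pos = sa[mid];
        std::size_t k = std::min(l, r);  // these characters are known to match
        while (k < m && pos + k < n && text[pos + k] == pattern[k]) ++k;
        const bool suffixSmaller =
            k < m && (pos + k == n ||
                      static_cast<unsigned char>(text[pos + k]) <
                          static_cast<unsigned char>(pattern[k]));
        if (suffixSmaller || (k == m && upper)) {
            lo = mid + 1;  // continue in the right half
            l = k;
        } else {
            hi = mid;      // continue in the left half
            r = k;
        }
    }
    return lo;
}

// Half-open suffix array interval [first, second) of all occurrences.
std::pair<std::size_t, std::size_t> findOccurrences(
        const std::string& text, const std::vector<std::size_t>& sa,
        const std::string& pattern) {
    return {findBoundMlr(text, sa, pattern, false),
            findBoundMlr(text, sa, pattern, true)};
}
```

The text positions of the occurrences are the entries sa[first] up to sa[second - 1]; for the time comparison, both search variants should report the same interval.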

Project B: FM-Index (Arthur, Felix, Duc)

Student 1
- Read in a text from disk and construct its suffix array (you may use the results of Project A, for example by reading in the suffix array written by the Project A tool)
- Construct the Burrows-Wheeler Transform (BWT) from the suffix array
- Construct the Occ and C tables
- Provide a function for the L-to-F mapping (used by Student 2)
Student 2
- Implement the exact match algorithm that uses the BWT and the Occ and C tables to search for a string in a text
Student 3
- Use the exact match algorithm of Student 2 and backtracking to implement an approximate string search with up to k errors
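
The parts of Student 1 and Student 2 could fit together roughly as in the following sketch. All names (FmIndex, buildFmIndex, lfMapping, backwardSearch) are invented for this illustration. It assumes that the text ends with a unique sentinel character that is smaller than all other characters (here '$'), and it stores the full Occ table, which is only feasible for small texts. The backtracking search of Student 3 is not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct FmIndex {
    std::string bwt;                            // last column L
    std::array<std::size_t, 256> c{};           // c[x]: number of text characters < x
    std::vector<std::vector<std::size_t>> occ;  // occ[x][i]: count of x in bwt[0..i)
};

FmIndex buildFmIndex(const std::string& text, const std::vector<std::size_t>& sa) {
    assert(sa.size() == text.size());
    const std::size_t n = text.size();
    FmIndex fm;
    fm.bwt.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        fm.bwt[i] = text[(sa[i] + n - 1) % n];  // character preceding suffix sa[i]
    std::array<std::size_t, 256> count{};
    for (unsigned char ch : text) ++count[ch];
    std::size_t sum = 0;
    for (std::size_t x = 0; x < 256; ++x) { fm.c[x] = sum; sum += count[x]; }
    fm.occ.assign(256, std::vector<std::size_t>(n + 1, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t x = 0; x < 256; ++x)
            fm.occ[x][i + 1] =
                fm.occ[x][i] + (static_cast<unsigned char>(fm.bwt[i]) == x ? 1 : 0);
    return fm;
}

// L-to-F mapping: row of the first column that holds the same text character
// as bwt[i].
std::size_t lfMapping(const FmIndex& fm, std::size_t i) {
    const unsigned char x = static_cast<unsigned char>(fm.bwt[i]);
    return fm.c[x] + fm.occ[x][i];
}

// Exact matching by backward search; returns the half-open suffix array
// interval [first, second) of all suffixes that start with pattern.
std::pair<std::size_t, std::size_t> backwardSearch(const FmIndex& fm,
                                                   const std::string& pattern) {
    std::size_t lo = 0, hi = fm.bwt.size();
    for (std::size_t i = pattern.size(); i-- > 0;) {
        const unsigned char x = static_cast<unsigned char>(pattern[i]);
        lo = fm.c[x] + fm.occ[x][lo];
        hi = fm.c[x] + fm.occ[x][hi];
        if (lo >= hi) return {0, 0};  // no occurrence
    }
    return {lo, hi};
}
```

For the text "banana$" with suffix array 6 5 3 1 0 4 2, the BWT is "annb$aa" and the search for "ana" ends in the interval [2, 4).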

Project C: Pigeonhole Filter (Aileen, Anke, Yvonne)

Implement the hierarchical filter (PEX) to find approximate matches of a pattern with up to k edit-errors in a text.

Student 1
- You are given the name of a text file, a pattern P and the number of edit-errors k on the command line
- Read in the text from disk
- Construct the hierarchical tree as described in the PEX algorithm
Student 2
- Write a semi-global DP-aligner to search a substring of P in a substring of T with edit-errors
- Implement the PEX search algorithm
(Optional: Student 3)
- Implement a banded version of Myers' bit-vector algorithm as a replacement for the DP-aligner
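
The semi-global DP-aligner of Student 2 could be sketched with a single DP column, in the style of Sellers' algorithm. The function name semiGlobalSearch and the output format (pairs of end position and error count) are assumptions of this example. The PEX tree and the banded bit-vector variant are not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reports every end position (exclusive) in text at which pattern matches a
// text substring with at most k edit errors, together with the minimal number
// of errors for that end position.
std::vector<std::pair<std::size_t, std::size_t>> semiGlobalSearch(
        const std::string& text, const std::string& pattern, std::size_t k) {
    const std::size_t m = pattern.size();
    std::vector<std::size_t> col(m + 1);
    for (std::size_t i = 0; i <= m; ++i) col[i] = i;  // column for the empty text
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t j = 0; j < text.size(); ++j) {
        std::size_t diag = col[0];  // cell (i-1) of the previous column
        col[0] = 0;                 // a match may start at any text position
        for (std::size_t i = 1; i <= m; ++i) {
            const std::size_t left = col[i];  // cell i of the previous column
            const std::size_t sub = diag + (pattern[i - 1] != text[j] ? 1 : 0);
            col[i] = std::min({sub, left + 1, col[i - 1] + 1});
            diag = left;
        }
        if (col[m] <= k) hits.emplace_back(j + 1, col[m]);
    }
    return hits;
}
```

Searching "cxe" with k = 1 in "abcdefg" reports a single hit that ends at position 5 with 1 error (the substring "cde").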

Project D: QUASAR (Daniel, Jörn)

Implement QUASAR to search for semi-global alignments of patterns in a text with up to k edit-errors.

Student 1
- Read in a text and construct the q-gram index (assume a DNA alphabet and q=10)
- Read in a set of patterns (you can assume that all have the same size w)
- For each pattern P
  - Count the occurrences of all overlapping q-grams of P in overlapping blocks of the text
Student 2
- For each pattern P
  - For each block that reaches the threshold t (from q-gram lemma)
    - Verify if the block contains a semi-global alignment of P with up to k edit-errors
    - If so, output the pattern P, its end position in the text and the number of errors
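
Two small building blocks of this project can be sketched as follows: the threshold from the q-gram lemma, t = w + 1 - (k + 1) * q, and a rolling 2-bit hash that maps every overlapping DNA q-gram to its bucket number in the q-gram index. The function names and the encoding A=0, C=1, G=2, T=3 are assumptions of this example; it also treats every character other than A, C and G as T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// q-gram lemma: a pattern of length w that matches with at most k errors
// shares at least w + 1 - (k + 1) * q q-grams with its match.
// The result may be zero or negative, in which case the filter is useless.
long long qgramThreshold(std::size_t w, std::size_t q, std::size_t k) {
    return static_cast<long long>(w) + 1 - static_cast<long long>((k + 1) * q);
}

// Assumed encoding; anything that is not A, C or G is treated as T.
inline std::uint64_t baseRank(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Hash values of all overlapping q-grams of dna, from left to right.
std::vector<std::uint64_t> qgramHashes(const std::string& dna, std::size_t q) {
    std::vector<std::uint64_t> hashes;
    if (q == 0 || dna.size() < q) return hashes;
    assert(q <= 31);  // 2 bits per base must fit into 64 bits
    const std::uint64_t mask = (std::uint64_t{1} << (2 * q)) - 1;
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < dna.size(); ++i) {
        h = ((h << 2) | baseRank(dna[i])) & mask;  // drop the oldest base
        if (i + 1 >= q) hashes.push_back(h);
    }
    return hashes;
}
```

With q = 10 the hash values range from 0 to 4^10 - 1, so a directory with 4^10 buckets is sufficient. For w = 50, q = 10 and k = 2, the threshold is 21.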

Project E: Calculation of the exact q-gram threshold (Franzi, Christopher)

Student 1
- Read in (from the command line) a gapped shape, a text length m and a number of mismatches k
- Calculate the exact gapped q-gram threshold t using the DP algorithm described in [1]
Student 2
- Calculate a lower bound for t using the q-gram lemma
- Calculate an upper bound using a greedy algorithm that tries to destroy as many q-grams as possible with k errors (see [2])
- For different gapped shapes and values for m and k compare the results of Student 1 with your bounds
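
The two bounds of Student 2 could be computed along the lines of the sketch below. It assumes that a shape is given as a string in which '#' marks a position that must match and '.' marks a gap; this notation and the function names are choices made for this example only. The lower bound counts the m - span + 1 q-grams and subtracts at most weight destroyed q-grams per mismatch. The greedy procedure places one mismatch at a time where it destroys the most remaining q-grams; since it is one particular placement of k mismatches, the number of surviving q-grams cannot be smaller than the exact threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lower bound from the q-gram lemma for a gapped shape.
long long lemmaLowerBound(const std::string& shape, std::size_t m, std::size_t k) {
    assert(!shape.empty());
    const std::size_t span = shape.size();
    if (m < span) return 0;
    const long long weight = std::count(shape.begin(), shape.end(), '#');
    const long long bound = static_cast<long long>(m - span + 1) -
                            static_cast<long long>(k) * weight;
    return std::max(0LL, bound);
}

// Upper bound: greedily place k mismatches, each one destroying as many of the
// remaining q-grams as possible, and count the surviving q-grams.
long long greedyUpperBound(const std::string& shape, std::size_t m, std::size_t k) {
    assert(!shape.empty());
    const std::size_t span = shape.size();
    if (m < span) return 0;
    const std::size_t numGrams = m - span + 1;
    std::vector<char> alive(numGrams, 1);
    long long remaining = static_cast<long long>(numGrams);
    for (std::size_t round = 0; round < k; ++round) {
        std::size_t best = 0, bestPos = 0;
        for (std::size_t p = 0; p < m; ++p) {
            std::size_t destroyed = 0;
            for (std::size_t off = 0; off < span; ++off)
                if (shape[off] == '#' && p >= off && p - off < numGrams && alive[p - off])
                    ++destroyed;
            if (destroyed > best) { best = destroyed; bestPos = p; }
        }
        if (best == 0) break;  // nothing left to destroy
        for (std::size_t off = 0; off < span; ++off)
            if (shape[off] == '#' && bestPos >= off && bestPos - off < numGrams)
                alive[bestPos - off] = 0;
        remaining -= static_cast<long long>(best);
    }
    return remaining;
}
```

For the shape "##.#", m = 10 and k = 1, both bounds are 4, so the exact threshold of Student 1 has to be 4 as well.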

Project F: Skew algorithm (Mathias, Oliver, Jan-Martin)

Implement the skew algorithm for the difference cover D = {1,2,4} modulo m = 7.

Student 1
- Sort (7 radix passes) and name the septuples T[i..i+6] where i mod 7 equals 1, 2 or 4
- Create the text T' = t₁ t₂ t₄ of septuple-names
- Create suffix array A' of T' recursively
Student 2
- Derive suffix array A¹²⁴ from A'
- Construct suffix arrays A⁰, A³, A⁵, A⁶ iteratively from A¹²⁴
Student 3
- Merge suffix arrays from Student 2 using a priority queue and appropriate comparisons
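
The comparisons in the merge step rely on the difference cover property: for any two text positions i and j there is a shift l < 7 such that (i + l) mod 7 and (j + l) mod 7 both lie in D. Two suffixes can then be compared by their first l characters and, if these are equal, by the ranks of the suffixes i + l and j + l known from A¹²⁴. The sketch below, with the made-up names coverShift and isDifferenceCover, computes this shift and checks the property.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

bool inCover(const std::vector<std::size_t>& cover, std::size_t residue) {
    for (std::size_t d : cover)
        if (d == residue) return true;
    return false;
}

// Smallest l in [0, m) such that (i + l) mod m and (j + l) mod m both lie in
// the cover; returns m if no such shift exists.
std::size_t coverShift(const std::vector<std::size_t>& cover, std::size_t m,
                       std::size_t i, std::size_t j) {
    assert(m > 0);
    for (std::size_t l = 0; l < m; ++l)
        if (inCover(cover, (i + l) % m) && inCover(cover, (j + l) % m)) return l;
    return m;
}

// A set is a difference cover modulo m if every pair of residues has a shift.
bool isDifferenceCover(const std::vector<std::size_t>& cover, std::size_t m) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j)
            if (coverShift(cover, m, i, j) == m) return false;
    return true;
}
```

Because the shift depends only on i mod 7 and j mod 7, it can be precomputed once as a 7 x 7 table and looked up in every comparison of the priority queue.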

Project G: Nussinov SCFG (Martin, Kurt)

Student 1
- Implement the Nussinov-SCFG for RNA-folding with the Nussinov CYK algorithm.
- Together with Student 2, compare the folds predicted by your algorithm to those predicted by mfold and RNAfold.
Student 2
- Implement a training procedure that uses Rfam alignments as training input to learn the SCFG's parameters that can be used by Student 1.
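
As a warm-up for Student 1, the sketch below shows the classic, non-stochastic Nussinov recursion that maximizes the number of base pairs. It is not the SCFG version asked for in this project: the CYK algorithm for the Nussinov SCFG fills a table of the same shape, but it maximizes (log-)probabilities of grammar rules, which would come from the training of Student 2, instead of counting pairs. The function names, the set of allowed pairs (Watson-Crick plus G-U) and the missing minimum loop length are assumptions of this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumed pairing rules: Watson-Crick pairs and the G-U wobble pair.
bool canPair(char a, char b) {
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'C' && b == 'G') || (a == 'G' && b == 'C') ||
           (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

// Maximum number of nested base pairs in rna (no minimum loop length).
int nussinovMaxPairs(const std::string& rna) {
    const std::size_t n = rna.size();
    if (n == 0) return 0;
    // best[i][j]: maximum number of pairs within rna[i..j]
    std::vector<std::vector<int>> best(n, std::vector<int>(n, 0));
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            int value = std::max(best[i + 1][j], best[i][j - 1]);  // i or j unpaired
            const int inner = (len > 2) ? best[i + 1][j - 1] : 0;
            if (canPair(rna[i], rna[j])) value = std::max(value, inner + 1);
            for (std::size_t k = i; k < j; ++k)  // bifurcation
                value = std::max(value, best[i][k] + best[k + 1][j]);
            best[i][j] = value;
        }
    }
    return best[0][n - 1];
}
```

A traceback through the table, which is omitted here, yields the predicted fold that can be compared with the output of the other tools.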

References

Please note that some journals might only be accessible from granted subnets like the FU subnet (or VPN).

[1] Stefan Burkhardt, Juha Kärkkäinen, Better Filtering with Gapped q-Grams, Fundamenta Informaticae, 2003
[2] http://en.wikipedia.org/wiki/Maximum_coverage_problem

Topic revision: r23 - 01 May 2013, weese

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.