Welcome to the Wiki of the Lecture Advanced Algorithms in Bioinformatics (P4).
News
- 28.09. (Weese): The second examination (Nachklausur) will take place on Wednesday, 19.10., 2-4 pm, in room 005 (T9).
- 12.07. (Weese): The examination results are online.
- 31.05. (Weese): The examination (Klausur) will take place on Wednesday, 29.06., in room 006 (T9).
- 07.04. (Weese): Please sign up for this lecture at https://www.mi.fu-berlin.de/kvv/course.htm?cid=9685&sid=22 (updated URL)
- 21.03. (Weese): The first lecture will be given on Monday, 11.04., at 2:15 pm, room SR FB (1.1.16) in Arnimallee 14.
- 21.03. (Weese): Important/urgent information will be distributed via our mailing list. Please sign up at https://lists.fu-berlin.de/listinfo/bioinf-P4-2011
Exam Results
Mat |
Pts |
Grade |
xxx4940 |
62,5 |
2,0 |
xxx1192 |
27 |
5,0 |
xxx0760 |
0 |
5,0 |
xxx9983 |
0 |
5,0 |
xxx2042 |
45 |
3,3 |
xxx2152 |
49 |
3,0 |
xxx1288 |
0 |
5,0 |
xxx4523 |
0 |
5,0 |
xxx0284 |
39 |
3,7 |
xxx2000 |
71 |
1,3 |
xxx2779 |
0 |
5,0 |
xxx6890 |
62,5 |
2,0 |
xxx6034 |
39 |
3,7 |
xxx1235 |
65 |
1,7 |
xxx9546 |
68 |
1,3 |
xxx9672 |
57 |
2,3 |
xxx9469 |
43 |
3,3 |
xxx2200 |
43,5 |
3,3 |
xxx4248 |
43 |
3,3 |
Schedule
Event |
Day |
Time |
Address |
Room |
Lecture |
Mon |
14-16 |
Arnimallee 14 |
SR FB (1.1.16) |
Lecture |
Fri |
14-16 |
Takustr. 9 |
006 |
Exercise |
Wed |
14-16 |
Takustr. 9 |
006 |
Exercises
You can download the exercise sheets on the subpages of the different lecture blocks.
Reviews
Requirements for Aktive Teilnahme
You have to hand in 75% of all exercise problems. (An exercise problem counts, if it is clearly visible that a solution was attempted for some time.)
In addition you need to reach 50% of all points of the two reviews.
Projects
The time from 18.06. to 16.07. is reserved for your projects. Please pick a project from below and sign in on this (coming soon) page. You are expected to write a C++ program which you have to successfully present and explain in a code review.
For additional information and material check:
P4Projects
Project A: Suffix array search (Firdevs, Hatice)
- Student 1
- Read in a text from disk
- Construct the suffix array with Quicksort or an algorithm of this lecture
- Construct the lcp-table with the Kasai algorithm
- Write the 2 tables to disk
- Student 2
- Read in a text the corresponding suffix array and lcp table from disk
- Implement the suffix array search with mlr-heuristic
- Implement the O(m + log n) suffix array search and therefor construct an lcp-interval tree
- Compare the search times of both algorithms (use a genomic text and a large set of substrings)
Project B: FM-Index (Arthur, Felix, Duc)
- Student 1
- Read in a text from disk and construct its suffix array (you are allowed to use the results of Project A, you could also read it in after executing the tool of Project A)
- Construct the Burrows-Wheeler Transformation (BWT) using the suffix array from Project A
- Construct the Occ and c-table
- Provide a function for the L-to-F mapping (used by Student 2)
- Student 2
- Implement the exact match algorithm that uses BWT, Occ and c table to search a string in a text
- Student 3
- Use the exact match algorithm of Student 2 and backtracking to implement an approximate string search with up to k errors
Project C: Pigeonhole Filter (Aileen, Anke, Yvonne)
Implement the hierarchical filter (PEX) to find approximate matches of a pattern with up to k edit-errors in a text.
- Student 1
- You are given the name of a text file, a pattern P and the number of edit-errors k on command line
- Read in the text from disk
- Construct the hierarchical tree as described in the PEX algorithm
- Student 2
- Write a semi-global DP-aligner to search a substring of P in a substring of T with edit-errors
- Implement the PEX search algorithm
- (Optional: Student 3)
- Implement a banded version of Myers' bit-vector algorithm as a replacement for the DP-aligner
Project D: QUASAR (Daniel, Jörn)
Implement QUASAR to search semi-global alignments with up to k edit-errors of patterns in a text.
- Student 1
- Read in a text and construct the q-gram index (assume a DNA alphabet and q=10)
- Read in a set of patterns (you can assume that all have the same size w)
- For each pattern P
- Count the occurrences of all overlapping q-grams of P in overlapping blocks of the text
- Student 2
- For each pattern P
- For each block that reaches the threshold t (from q-gram lemma)
- Verify if the block contains a semi-global alignment of P with up to k edit-errors
- If so, output the pattern P, its end position in the text and the number of errors
Project E: Calculation of the exact q-gram threshold (Franzi, Christopher)
- Student 1
- Read in (from command line) a gapped shape, a text length m and a number of mismatches k
- Calculate the exact gapped q-gram threshold t using the DP algorithm described in [1]
- Student 2
- Calculate a lower bound for t using the q-gram lemma
- Calculate an upper bound using a greedy algorithm that tries to destroy as many q-grams as possible with k errors (see [2])
- For different gapped shapes and values for m and k compare the results of Student 1 with your bounds
Project F: Skew algorithm (Mathias, Oliver, Jan-Martin)
Implement the skew-algorithm for the difference cover
D = {1,2,4} and
m = 7
- Student 1
- Sort (7 radix passes) and name the septuples T[i..i+6] where i mod 7 equals 1, 2 or 4
- Create the text T' = t1 t2 t4 of septuple-names
- Create suffix array A' of T' recursively
- Student 2
- Derive suffix array A124 from A'
- Construct suffix arrays A0, A3, A5, A6 iteratively from A124
- Student 3
- Merge suffix arrays from Student 2 using a priority queue and appropriate comparisons
Project G: Nussinov SCFG (Martin, Kurt)
- Student 1
- Implement the Nussinov-SCFG for RNA-folding with the Nussinov CYK algorithm.
- Together with Student 2 compare the folds predicted by your algorithm to those predicted by mfold and RNAFold.
- Student 2
- Implement a training procedure that uses RFAM-alignments as training input to learn the SCFG's parameters that can be used by Student 1.
References
Please note that some journals might only be accessible from granted subnets like the FU subnet (or VPN).
[1] Stefan Burkhardt, Juha Kärkkäinen,
Better Filtering with Gapped q-Grams,
Fundamenta Informaticae, 2003
[2]
http://en.wikipedia.org/wiki/Maximum_coverage_problem