This is where I write my weekly goals.
all time todos:
- make splitRazers project page
- add link in seqanswers forum srp thread
- check sensitivity at 4x coverage and compare with dindel
6.12. - 10.12.
PLANNED
- Monday: write split mapping + SNP/indel calling part in Mat&Meth section. go through comments regarding miscalls/missing indels.
look at simulation pipeline results, complete runs
- Tuesday: include supersplat, include a generic indel detector
- Wednesday: write, write, plot, write
- Thursday: real results! make plan! --> Stefan (show first draft to him)
- Friday: send draft to Knut, should contain: methods, simresults, + a bit of intro
REALIZED:
- Monday: JC, mensa, commented comments, read dindel paper! --> they observe homopolymer-indels! and otherwise very low indel rates --> adapted simulation: mismatch probs on default, pi=pd=0.0002 (which still seems relatively high), started on queue. wrote half of MM text while falling asleep
- Tuesday: finished MM text. cluster down → restart simulation pipeline asap. took closer look at supersplat → can't do alignment errors + output is super raw, a lot of processing would have to be done --> dropped!
29.11. - 3.12.
PLANNED
- Monday: adapt indelSimulator to simulate known indels within ranges (guaranteeing a certain number of indels per size range), set up smaller pipeline for testing purposes
- Tuesday: replace readsim with mason, include gsnap, bwa into pipeline
- Wednesday: include supersplat in pipeline, read up on other SE indel detection software
- Thursday: produce some example results, indelPercThresh 0/0.5, include 454 in pipeline
- Friday: meeting with Knut at 10.30, make new plan
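Rough sketch for the Monday item (a fixed number of indels per size range). This is not the actual indelSimulator code; the function name, the parameters and the rule that indels must not overlap are assumptions made only for illustration.

```python
import random


def simulate_indels(genome_len, size_ranges, per_range, seed=0, max_tries=100000):
    """Place exactly `per_range` non-overlapping indels for every (lo, hi) size range.

    Returns a sorted list of (start, size, kind) tuples.
    All names here are hypothetical, not taken from indelSimulator.
    """
    rng = random.Random(seed)
    placed = []  # (start, end, size, kind), end is exclusive
    for lo, hi in size_ranges:
        count = 0
        tries = 0
        while count < per_range:
            tries += 1
            if tries > max_tries:
                raise RuntimeError(f"could not place {per_range} indels of size {lo}-{hi}")
            size = rng.randint(lo, hi)               # inclusive on both ends
            start = rng.randrange(genome_len - size)
            end = start + size
            # Simplification: insertions also reserve `size` reference bases,
            # so that neighbouring indels stay clearly separated.
            if any(start < e and s < end for s, e, _, _ in placed):
                continue
            kind = rng.choice(("insertion", "deletion"))
            placed.append((start, end, size, kind))
            count += 1
    return sorted((s, size, kind) for s, _, size, kind in placed)


indels = simulate_indels(10000, [(1, 5), (6, 20), (21, 50)], per_range=10)
```

Sampling per range (instead of drawing sizes from one distribution) is what guarantees that rare, long indels are represented in every simulated data set.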
REALIZED (or what i did instead..)
- Monday:
- resolved snpStore issues for Stefan → input file was corrupt
- discovered another pitfall for indel calling and realignment: read clipping may lose indels when clipped read is pairwise aligned to ref → minNumIndels required for realignment may not be reached → no indel call (solution: make indelThreshold for realignment independent from indelThreshold for indel calling + keep indel info for clipped reads (todo: friday))
- fixed addInterval
- read about indel rates: should be about 8-10:1 to snp rates
- experimented with mason
- Tuesday:
- adapted indelSimulator
- replaced readsim with mason
- Wednesday:
- set up small test pipeline, included gsnap, bwa
- tested.. can't find sensible bwa settings
- Thursday:
- added 454 simulation to pipeline
- Friday:
See progress report for the last 6 months.. all big todos are done!
big todos:
- indel calling on split-aligned reads in SeqAn
- gapped prefix/suffix alignment in razersSpliced
- realignment in snp calling
medium todos:
- generalized gff-to-fragmentStore parsing
small todos:
- open exchange calendar (IE from home)
25.5. - 28.5.
PLANNED
- Tuesday: continued debugging
- Wednesday: ---
- Thursday: debugging/testing/code-cleaning
- Friday: 10.30 Kerstin
do also:
- make a more robust indel calling program --> collect positions in perl script or make c program!!!
17.5. - 21.5.
PLANNED
- Monday: make presentation, write review, check index construction in uniqueReads.cpp (done, sent to David)
- Tuesday: JC talk
- Wednesday: check Cougar's indels, sent improved microRazerS to Ho-Ryun, make split mapping run again
- Thursday: Kerstin 10.30 (look at CNV-HMM beforehand!), answer cougar
- Friday: wrote tests for split alignment (still buggy obviously...)
3.5. - 7.5.
PLANNED
- process 454 data (doing!)
- debug split alignment for edit distance
- make plan for split-mapping results!!!!!!!!
- answer kerstin, sabrina, lena
- continue writing paper
26.4. - 30.4.
PLANNED
- process 454 data (doing!)
- finish split alignment for edit distance (done!)
- make plan for split-mapping results!!!!!!!!
small todo for Tuesday evening:
start split mapping on A14 unmapped batches!!!
answer Sabrina, answer Lena (A14 mappings, snps, indels, + readme! + mapped split? )
19.4. - 23.4.
PLANNED
- process 454 data
- finish split alignment for edit distance!!!
- make plan for split-mapping results for ismb/paper?
REALIZED
12.4. - 16.4.
PLANNED
- process 454 data
- finish split alignment for edit distance!!!
- make plan for split-mapping results for ismb/paper?
- read and take notes about other split-mappers
- look into realignment again...
monday: send travel fellowship application, put progress report online, answer lena, finish edit split mapping!
5.4. - 8.4.
PLANNED
- progress report
- process 454 data
- split alignment for edit distance!
22.3. - 26.3.
PLANNED
- split alignment for edit distance! --> not done yet
- reprocess some patients with new split alignment (done)
- meet Kerstin and maybe try simple CNV detection on our data (not ready for testing yet)
- 25. & 26. Lena from Tübingen
15.3. - 19.3.
PLANNED
- write poster abstract
- resolve pindel calling problems
- finish SeqAn tutorial
REALIZED
- sent abstract to ISMB
- resolved split alignment pindel calling problems
- integrated random match estimate into split alignment
- tutorial as good as finished
8.3. - 12.3.
PLANNED
- make plan for poster!
- breakpoint statistics: adapt parameters and check influence of windowlength, make plots
- continue SeqAn tutorial
REALIZED
- expected number of random matches is extremely low for 76bp reads with 23bp minMatchLen, even with errors, and even on whole genome scale → iid model probably not good enough to approximate real-genome-situation
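Back-of-the-envelope version of that estimate, as a sketch only. Assumptions that are not in the notes above: uniform base frequencies (each base with probability 1/4), Hamming mismatches only, both strands searched, and a genome length of 3e9. The function names are made up for illustration.

```python
from math import comb


def match_prob(length, max_mismatches=0, alphabet=4):
    """P(a random iid text window matches a fixed string of `length`
    with at most `max_mismatches` mismatches)."""
    p = 1.0 / alphabet
    return sum(
        comb(length, i) * (1 - p) ** i * p ** (length - i)
        for i in range(max_mismatches + 1)
    )


def expected_random_matches(genome_len, length, max_mismatches=0, both_strands=True):
    """Expected number of chance matches of one fixed string in an iid genome."""
    positions = max(genome_len - length + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * match_prob(length, max_mismatches)


GENOME_LEN = 3_000_000_000  # assumed, roughly human-sized
MIN_MATCH_LEN = 23          # minMatchLen from the note above

for k in range(3):
    print(k, expected_random_matches(GENOME_LEN, MIN_MATCH_LEN, k))
```

Under these assumptions the expectation per 23bp prefix or suffix stays well below one even with two mismatches. A real genome has repeats and skewed composition, which the iid model ignores.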
1.3. - 5.3.
PLANNED
- fix split-alignment (done) * missing: maskDuplicates for spliced matches (done for hamming distance)
- fix edit-distance-indel-calling bug (XLMR_1996, see Stefan's mail) (done)
- continue SeqAn tutorial (continuing)
- get snpStore ready for Marcel (done)
- check out Marcel's breakpoint statistics, adapt parameters and check influence of windowlength (halfway done)
REALIZED
- finished snpStore for Marcel, met on Tuesday
- fixed edit-distance-indel-calling bug: wrong indel position was calculated for reverse gapped reads
- seemingly fixed split-alignment, instead of maskDuplicates: directly discard duplicate prefix and duplicate suffix matches (only works for hamming distance)
22.2. - 26.2.
PLANNED
- test split-alignment
- implement indel calling on split-aligned reads
- continue reading and taking notes
- integrate indel calling on realigned reads
- meet with Paz
- do realignment on windows of varying size? inspect influence of window size. debug realigner?
- SeqAn tutorial
REALIZED
- split-alignment --> model gap costs with exponential function? → talked to Marcel about problem of read placement, + probabilities of random matches, expected numbers of reads → variant prediction
- somehow produced a weird bug in split-alignment when integrating into uptodateSeqan
- SeqAn tutorial: finished motif finding + started with alignments
- met with Kerstin --> seqan basics, structs, fragmentStore
- seminars: reseq meeting, genereg meeting, group meeting, Illumina Casava 1.6 talk by Oliver Goldenberg
15.2. - 19.2.
PLANNED
- test split-alignment --> weird insertion/deletion calls (doing)
- implement indel calling on split-aligned reads (not done)
- continue reading and taking notes (well, sort of)
- integrate indel calling on realigned reads (not done)
- meet with Paz (next week)
- do realignment on windows of varying size? inspect influence of window size. debug realigner? (not done)
- meet with Marcel Grunert (done)
- meet with Bernd Timmermann --> casava 1.6, heterogeneous dataset (done)
REALIZED
- test split-alignment --> weird insertion/deletion calls: problem in sorting of matches which was done like this 1) minimizing number of errors (not even minimizing of distanceError was implemented!!!). instead
- met with Marcel Grunert: need to adapt snpStore to incorporate readcounts given as gff-tag. do snp calling on microRNAs.
- met with Bernd Timmermann
1.2. - 5.2.
PLANNED
- fix razers uniqueness bug (thought it was done, but it wasn't...)
- correct maxPile correction for indel calls (done)
- integrate indel calling on realigned reads (not done)
- meet with Paz (not done)
- do realignment on windows of varying size? inspect influence of window size. debug realigner? (not done)
- continue reading and taking notes (doing)
REALIZED
- Monday: meeting with Kerstin, discussed different CNV tools
- Tuesday: fixed razers uniqueness bug in all three versions
- Wednesday: checked distribution of call quality scores and presented in reseq meeting
- Thursday: integrated indel calling, now insertion/deletion separate in gff
- Friday: tested snpStore and razerS
25.1. - 29.1.
PLANNED
major:
- test/debug snp calling on realigned reads (doing)
- read SV detection papers and take notes in wiki (doing now)
minor:
- answer Tübingen people (done)
- read MA Vipul (done)
REALIZED
- Realignment: problem is in including the reference sequence into the multi read alignment. If includeReference == true this only means that the original reference sequence is stored as an artificial read that is NOT realigned. if one includes the artificial reference-read in the realignment process, the reference is not given enough weight, ie many artificial snp calls may be the result. instead one needs to do a pairwise alignment of reference and consensus sequence, to get a better reference-to-reads alignment.
--> Did that. Results are worse. TODO: inspect cases to find cause. maybe a problem with static windowlength?
- Inspected quality scores of SNP calls: the more FP calls, the more their distribution resembles a geometric distribution. GC bias is also detected in high-quality calls, even gets stronger with increasing quality for geometrically distributed quality scores. --> base call quality value recalibration needed? (to correct for nucleotide specific bias, downweight reads with many errors, ie assign mapping quality) more stringent clipping? (stefan)
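Sketch of the reference-vs-consensus step from the realignment note above: a plain global pairwise alignment (Needleman-Wunsch with linear gap costs). This is not the SeqAn realigner; the scoring values and the function name are assumptions chosen only to illustrate the idea.

```python
def global_align(ref, cons, match=1, mismatch=-1, gap=-2):
    """Globally align reference and consensus; return (aligned_ref, aligned_cons, score)."""
    n, m = len(ref), len(cons)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == cons[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # (mis)match
                score[i - 1][j] + gap,      # gap in consensus (deletion in reads)
                score[i][j - 1] + gap,      # gap in reference (insertion in reads)
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == cons[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(ref[i - 1]); b.append(cons[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(cons[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), score[n][m]


aligned_ref, aligned_cons, total = global_align("ACGTACGT", "ACGACGT")
print(aligned_ref)
print(aligned_cons)
```

Gap columns in the two returned strings give the reference-to-reads coordinates for indels, independent of how the reads were realigned among themselves.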