Project Work log Hauswedell
Friday, 25.06.2010:
- second "real" meeting with Thieme to do planning and division of work
- got access to svn, built the benchmark tools and razers2
- together with Thieme, set up directories and reorganised data
Saturday, 26.06.2010:
- tried to do RazerS2 runs, but they failed due to insufficient memory
- wrote a mail to Holtgrewe to discuss whether we can get access to higher-performance computers
Sunday, 27.06.2010:
- received a response from Holtgrewe, but did not agree with the strategy of reducing the data, as I was unsure how to sample in a way that preserves the expected behaviour (the distribution of reads producing deviating results is not known)
Monday, 28.06.2010:
- met with Holtgrewe and Thieme, discussed various issues
- was convinced by Holtgrewe that sampling is indeed a good idea, and learned that there is a seqan tool for uniformly sampling from a given file
- reorganised file and directory structure to accommodate more sample sets
- resolved issues with permissions/umasks
- built seq_sample and created 0.1%, 1% and 10% sets from the original input data
- started razers2-run on 0.1% of set I
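Note on the sampling step above: a rough sketch of what uniform sampling at a fixed fraction can look like. This is not the seqan tool, only an illustration. It assumes FASTQ-style records of four lines each (the input format is an assumption), and the name `sample_records` and its parameters are made up for this sketch.

```python
import random
from typing import Iterable, Iterator, List


def sample_records(lines: Iterable[str], fraction: float, seed: int = 0,
                   lines_per_record: int = 4) -> Iterator[List[str]]:
    """Keep each record independently with probability `fraction`.

    Assumption: every record spans exactly `lines_per_record` lines
    (4 for FASTQ). An incomplete trailing record is dropped.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == lines_per_record:
            # Every record has the same chance of being kept.
            if rng.random() < fraction:
                yield record
            record = []
```

With `fraction=0.001`, `0.01` and `0.1` this would give sets of roughly 0.1%, 1% and 10% of the input; the exact size varies from run to run unless the seed is fixed.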
Tuesday, 29.06.2010 - Thursday, 01.07.2010:
- started and finished razers2-runs on 0.1% and 1% of I, and on 0.1% of II
- razers2-run on 1% of II still running
Thursday, 01.07.2010:
- completed bwa run on 0.1% of I
- got benchmark running on that bwa-run
- Result: {"total_intervals": 0, "found_intervals": 0, "superflous_intervals": 0, "additional_intervals": 0}
- → something went wrong, time to debug
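Note on the result above: a result in which every count is zero can be caught automatically before anyone looks at it. A minimal sketch, assuming the benchmark prints its result as a JSON object with the four keys shown above (spelled exactly as in that output); the function name `result_is_suspicious` is made up for this sketch.

```python
import json
from typing import Dict

# Keys copied verbatim from the benchmark output, including its spelling.
COUNT_KEYS = ("total_intervals", "found_intervals",
              "superflous_intervals", "additional_intervals")


def result_is_suspicious(raw: str) -> bool:
    """Return True if all interval counts in the JSON result are zero."""
    result: Dict[str, int] = json.loads(raw)
    return all(result.get(key, 0) == 0 for key in COUNT_KEYS)
```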
Friday, 02.07.2010 - Sunday, 04.07.2010:
- spent weekend in correspondence with Holtgrewe to debug this issue
- it turned out that neither the benchmark nor bwa verifies its input
- → the output generated by bwa aln is neither SAM nor BAM; it is an intermediate file of suffix array coordinates (.sai)
- after bwa aln it is necessary to run bwa samse, which creates the SAM file
- bwa results are now available
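Note on the two-step bwa workflow above: a sketch of the command order for single-end reads, plus a cheap check that a file at least looks like SAM before handing it to the benchmark. All file names are placeholders, the reference is assumed to be indexed already, and both helper names are made up for this sketch. The check relies only on two facts from the SAM format: header lines start with `@`, and alignment lines have at least 11 tab-separated fields.

```python
from typing import List, Tuple


def bwa_single_end_steps(ref: str, reads: str, sai: str,
                         sam: str) -> List[Tuple[List[str], str]]:
    """Return (argv, stdout target) pairs; nothing is executed here."""
    return [
        (["bwa", "aln", ref, reads], sai),         # step 1: intermediate .sai
        (["bwa", "samse", ref, sai, reads], sam),  # step 2: actual SAM output
    ]


def looks_like_sam(path: str, max_lines: int = 100) -> bool:
    """Heuristic check of the first `max_lines` lines; not a validator."""
    seen_content = False
    try:
        with open(path, "r", encoding="ascii") as handle:
            for index, line in enumerate(handle):
                if index >= max_lines:
                    break
                line = line.rstrip("\n")
                if not line:
                    continue
                if line.startswith("@"):
                    seen_content = True  # header line
                    continue
                if len(line.split("\t")) < 11:
                    return False  # too few fields for an alignment line
                seen_content = True
    except UnicodeDecodeError:
        return False  # binary data, e.g. a .sai or BAM file
    return seen_content
```

Running such a check on the bwa aln output would have rejected it immediately instead of producing an all-zero benchmark result.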
Monday, 05.07.2010:
- razers2-run on 1% of II still running
- since the run-time is so much longer than with I, investigated further
- the run on I was started without paramchooser → its results are useless, and so is its timing information
- met with Thieme, Holtgrewe and Weese to discuss current issues
- Weese and Holtgrewe agree that a run-time of now over 130 hours is not usual for mapping 30 MiB vs 3 GiB
- checked whether razers2 binary was built with optimizations and without testing/debugging stuff
- binary seems to be ok, problem has to be somewhere else
Tuesday, 06.07.2010:
- started razers2-run on 0.1% of I again to see if it takes equally long with activated paramchooser
- takes 4.5 hours, which is better than for II, but still longer than it should take
- built razers1 and started a run to check for regression in RazerS
- successfully did bwa-runs for 0.1% and 1% of both I and II
- tried to do a benchmark of bwa 0.1%/I against RazerS 0.1%/I
- benchmark segfaults