You are here:
Foswiki
>
ABI Web
>
ResearchLogEnricoSiragusa
>
BoSSADraft
(22 Jun 2011, EnricoSiragusa)
Edit
Attach
Page
BoSSADraft
This page describes the current draft of a Benchmark of Seeds for Sequence Analysis. Sources of inspiration are
Rabema
,
PAPI
and
Pizza&Chili Corpus
.
Approximate String Matching
Purpose
Specific Questions to be Answered
Specific Applications
Approaches
Filtration Algorithms
Verification Algorithms
Corpus
Real World Sources
Random Sources
Methods
Queries
Gold Standards
Index Construction Statistics
Query Statistics
Construction Estimates
Query Estimates
Local Alignment
Purpose
Comments
Approximate String Matching
Purpose
Empirically assess the performance of various seed based filtration criteria into solving off-line k-mismatches and k-differences problems.
Specific Questions to be Answered
Which is the fastest lossless filtration approach?
Which is the fastest lossy filtration approach?
Which is the impact of cache misses, false positives, etc. on total time?
Which is the correlation between threshold and speed/precision?
How good are our specificity/sensitivity estimates w.r.t. random and real world sources?
Specific Applications
Read mapping
Assembly, or genome comparison
Approaches
Filtration Algorithms
Counting based approaches (Swift, QUASAR?)
Maximum weight ungapped seed + optimum lossless threshold (e.g. razerS)
Maximum weight gapped seed + optimum lossless threshold (From
Burhkardt et al.
,
Sahinalp et al.
)
Disjunctive seeds family + threshold value (From
Kucherov et al.
,
Sahinalp et al.
)
Conjunctive seeds family + threshold vector (From
Fontaine et al.
, adapted
Sahinalp et al.
)
Non-counting based approaches (i.e. Threshold = 1, e.g.
PatternHunter
)
Non-overlapping based approaches (i.e. Pigeonhole, e.g.
MrFast
indexing reads and genome, Eland?)
Adapted counting (HOBBES)
Verification Algorithms
Naive Hamming distance
Myers+Ukkonen Edit distance
Corpus
Real World Sources
Tiny length (Viruses?)
Small length (S. Cerevisiae)
Medium length (D. Melanogaster)
Large length (H. Sapiens)
Random Sources
They can be generated from
PAPI
.
Uniform text, which parameters?
Bernoulli i.i.d. text, which parameters?
Markov text, which parameters?
Methods
Queries
Mason simulated reads on real world sources (or randomly sampled from the corresponding random sources?)
Length m in [20,36,50,100,200,500,1000,2000]
Errors rate k/m <= 0.2 for time reasons, theoretical filtration bound is 1-e/sqrt(sigma)
Gold Standards
For lossy methods
Produce a gold standard for each dataset, as in
Holtgrewe et al.
Index Construction Statistics
Total time (User+System)
Memory (Peak)
Cache misses, page faults, etc. (
Perf
to access Intel PMU data])
Query Statistics
Total time (User+System)
Memory usage
Cache misses, page faults, etc. (Use
Perf
to access Intel PMU data])
Number of hits per lookup in query
Number of false/true positives
Number of false negatives (for lossy filters)
Lookup time
Counting time
Verification time
Construction Estimates
Query Estimates
Estimate on the number of hits per lookup in query
Estimate on the number of false/true positives (Based on
Karp et al.
)
Estimate on the number of false negatives (Based on
Karp et al.
,
Weese et al.
)
Local Alignment
Purpose
Empirically assess the performance of various seed based filtration criteria into solving local alignment problems. Here, we have experiences with STELLAR and Birte.
Comments
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r14
<
r13
<
r12
<
r11
|
B
acklinks
|
V
iew wiki text
|
Edit
w
iki text
|
M
ore topic actions
Topic revision: r13 - 22 Jun 2011, EnricoSiragusa
- This page was cached on 22 Feb 2025 - 13:58.
ABI
ABI Library
Consultations Knut Reinert
Contact
Events, Conferences, Talks
Members
Projects
Publications
Teaching
Theses
Wiki
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki?
Send feedback