Parallelism In The SeqAn Library: Patterns for Parallelization
A brief summary of patterns for parallelization, based on [Massingill et al., 2000].
Finding Concurrency
Decomposition
Task decomposition
Identify functional units that can be computed independently/in parallel.
Tasks may map to
- function calls or
- independent loop iterations.
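As an illustration, the following minimal C++ sketch (C++ being SeqAn's implementation language) maps two independent function calls onto parallel tasks via std::async; buildIndex and computeStats are hypothetical stand-ins for units of work with no dependencies between them.

    #include <future>
    #include <iostream>

    // Hypothetical independent units of work.
    int buildIndex()   { return 42; }
    int computeStats() { return 7; }

    int main()
    {
        // Each independent call becomes one task; std::launch::async
        // requests execution on a separate thread.
        auto a = std::async(std::launch::async, buildIndex);
        auto b = std::async(std::launch::async, computeStats);
        std::cout << a.get() + b.get() << '\n';  // join and combine results
    }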
Data decomposition
Identify contiguous blocks of data to operate on, partition the data appropriately, and assign a task to each partition.
For example, decompose a loop into independent tasks, one per iteration, or partition the loop/data into chunks and assign one task to each chunk.
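A minimal sketch of chunk-wise data decomposition, assuming a plain summation as the per-chunk operation; it is illustrative, not SeqAn code.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main()
    {
        std::vector<int> data(1000000, 1);

        // One contiguous chunk per hardware thread (at least one chunk).
        unsigned const numChunks = std::max(1u, std::thread::hardware_concurrency());
        std::size_t const chunkSize = (data.size() + numChunks - 1) / numChunks;

        std::vector<long> partial(numChunks, 0);
        std::vector<std::thread> workers;
        for (unsigned c = 0; c < numChunks; ++c)
            workers.emplace_back([&, c] {
                std::size_t const begin = std::min<std::size_t>(std::size_t(c) * chunkSize, data.size());
                std::size_t const end   = std::min(begin + chunkSize, data.size());
                // Each task reduces its own chunk; no writes to shared state.
                partial[c] = std::accumulate(data.begin() + begin, data.begin() + end, 0L);
            });
        for (auto & w : workers)
            w.join();

        // Combine the effectively task-local partial results.
        std::cout << std::accumulate(partial.begin(), partial.end(), 0L) << '\n';
    }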
Dependency analysis
Task grouping
Aggregate tasks into task groups, thus coarsening granularity and reducing scheduling overhead.
“Tasks that correspond to a high-level operation naturally group together.”
Merge tasks that share the same constraints (shared data …).
Task ordering
Identify ordering constraints between task groups:
- Tasks that need to execute concurrently.
- Tasks that need to execute sequentially because the second depends on the results of the first.
- Absence of any constraints between sets of tasks.
Data sharing
Identify which data is
- Task local
- Shared, but read-only
- Effectively task local (i.e. partitioned into subsets corresponding to tasks)
- Read/write
Identify border regions where data sets of different tasks overlap.
Evaluation
- How many processing elements (PEs) are available?
- How are data structures shared among PEs?
- What does the target architecture imply about the number of units of execution (UEs) and how structures are shared among them?
- Will the time spent doing useful work in a task be significantly greater than the time taken to deal with dependencies?
Design quality
- Flexibility
- Efficiency
- Simplicity
Preparation for the next phase
- How regular are the tasks and their data dependencies?
- Are interactions between tasks (groups) synchronous or asynchronous?
- Are the tasks grouped in the best way?
Algorithm Structure
Organized by …
… tasks
- Task parallelism
- Divide and conquer
… data decomposition
- Geometric decomposition
- Recursive decomposition
- Domain decomposition
… flow of data
- Pipeline – Each stage should take approximately the same time. Once the pipeline is filled, the stages execute in parallel on different items, communicating via queues. Replicate stateless stages to increase parallelism (see the sketch after this list).
- Event-based coordination
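A minimal two-stage pipeline sketch: stage 1 produces items while stage 2 consumes them concurrently through a mutex-guarded queue. The channel, the stage functions, and the sentinel value -1 are illustrative assumptions, not SeqAn API.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> channel;          // connects stage 1 to stage 2
    std::mutex m;
    std::condition_variable cv;

    void stage1()                     // e.g. parse input records
    {
        for (int i = 0; i <= 10; ++i)
        {
            int item = (i == 10) ? -1 : i * i;   // -1 signals end of stream
            {
                std::lock_guard<std::mutex> lock(m);
                channel.push(item);
            }
            cv.notify_one();
        }
    }

    void stage2()                     // e.g. process records; overlaps with stage1
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return !channel.empty(); });
            int item = channel.front();
            channel.pop();
            lock.unlock();
            if (item == -1)
                break;
            std::cout << item << '\n';
        }
    }

    int main()
    {
        std::thread t1(stage1), t2(stage2);
        t1.join();
        t2.join();
    }

A real pipeline would use a bounded queue (see Supporting Structures below) so that a fast producer cannot grow the channel without limit.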
Supporting Structures
Program structures
- Master/worker
- static: each worker is assigned exactly one task, then returns
- dynamic: workers stand by until they receive work via an input queue, process it, push results into an output queue, and return to standby
- Producer/consumer
- Fixed-size queue
- Consumers wait while the queue is empty; producers wait while it is full
- Requires synchronization guards (see the sketch at the end of this section)
- Loop parallelism
- Fork/join
- Leader/follower
- Work stealing
- Asynchronous round-robin
- Global round-robin
- Random polling
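The fixed-size producer/consumer queue from above might be sketched as follows; the same structure can serve as the input queue of a dynamic master/worker setup. BoundedQueue, its capacity, and the toy producer/consumer are illustrative names, not SeqAn API.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    template <typename T>
    class BoundedQueue                 // illustrative, not SeqAn API
    {
    public:
        explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

        void push(T item)              // producer side: wait while full
        {
            std::unique_lock<std::mutex> lock(mutex_);
            notFull_.wait(lock, [this] { return queue_.size() < capacity_; });
            queue_.push(std::move(item));
            notEmpty_.notify_one();
        }

        T pop()                        // consumer side: wait while empty
        {
            std::unique_lock<std::mutex> lock(mutex_);
            notEmpty_.wait(lock, [this] { return !queue_.empty(); });
            T item = std::move(queue_.front());
            queue_.pop();
            notFull_.notify_one();
            return item;
        }

    private:
        std::size_t capacity_;
        std::queue<T> queue_;
        std::mutex mutex_;
        std::condition_variable notFull_, notEmpty_;
    };

    int main()
    {
        BoundedQueue<int> q(4);        // small capacity exercises both waits
        std::thread producer([&] { for (int i = 0; i < 8; ++i) q.push(i); });
        std::thread consumer([&] { for (int i = 0; i < 8; ++i) std::cout << q.pop() << '\n'; });
        producer.join();
        consumer.join();
    }

Using two condition variables (notFull_, notEmpty_) rather than one avoids waking producers when only consumers can make progress, and vice versa.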
Literature
- [Massingill et al., 2000] B. L. Massingill, T. G. Mattson, and B. A. Sanders. Patterns for finding concurrency for parallel application programs. In 7th Pattern Languages of Programs Conference (PLoP 2000), 2000.
- [Kumar et al., 1994] V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. The Benjamin/Cummings Publishing Company, Redwood City, California, 1994.