A task region does not introduce a fundamentally new execution model.
However, the assertions implicit in a task region imply that only the
specified active processors of an execution task need to participate
in its execution, while all other processors may skip it. A processor
executing a task region participates in the execution of every task
running on a processor subset to which it belongs, and does not
participate in tasks running on processor subsets to which it does not
belong. Code outside lexical tasks is executed as normal data parallel
code by all active processors of the task region. The access
restrictions for a task region guarantee that the results obtained
under this execution paradigm are consistent with pure data parallel
execution of the task region.
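This execution model can be illustrated with a minimal sketch. The processor arrangement P, the array sizes, and the subroutines WORK_A and WORK_B below are all hypothetical, and the directive spelling follows the approved-extensions syntax; the sketch only shows which processors execute which task:

```fortran
!HPF$ PROCESSORS P(8)
      REAL A(100), B(100)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P(1:4)
!HPF$ DISTRIBUTE B(BLOCK) ONTO P(5:8)

!HPF$ TASK_REGION
!       This task is executed only by P(1:4); P(5:8) skip it.
!HPF$   ON (P(1:4)), RESIDENT BEGIN
          CALL WORK_A(A)
!HPF$   END ON
!       This task is executed only by P(5:8); P(1:4) skip it.
!HPF$   ON (P(5:8)), RESIDENT BEGIN
          CALL WORK_B(B)
!HPF$   END ON
!       Any code placed here, outside the lexical tasks, would be
!       executed as data parallel code by all active processors of P.
!HPF$ END TASK_REGION
```

Since the two ON blocks name disjoint subsets of P, the two calls may proceed concurrently, while the access restrictions ensure the result matches a pure data parallel execution.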
A task region presents a simple yet powerful model for writing
integrated task and data parallel programs. We illustrate three basic
computation structures in which task parallelism can be effectively
exploited with this model.
- Parallel sections: A task region can be used to divide
the available processors into disjoint sets that perform
independent computations, simulating what is often referred to as
parallel sections. This form of task parallelism is
relatively straightforward and useful in many application scenarios,
multiblock applications being one example. The task region simply
contains a sequence of RESIDENT ON blocks on disjoint
processor subsets. Note that the division of processors among
subsets can be dynamic; that is, it can depend on variables
computed during execution.
- Nested parallelism: Task regions can be nested, and in
particular, a subroutine call made from an execution task can
further subdivide the active processors using another task region
directive. This allows the exploitation of nested parallelism. An
example is the implementation of dynamic tree-structured divide and
conquer computations. As a specific example, quicksort can be
implemented by recursively partitioning the input array of keys
around a pivot, and assigning to the two resulting subarrays
numbers of processors proportional to their sizes.
- Data parallel pipelines: Task regions can be used to
implement pipelined data parallel computations. We will illustrate
this with a two-dimensional fast Fourier transform (2D FFT)
computation. The first stage of a 2D FFT reads a two-dimensional
matrix and performs a one-dimensional FFT on each row of the matrix.
The second stage performs a one-dimensional FFT on each column of the
matrix and generates the final output. In a pipelined data parallel
implementation of this form of 2D FFT, the two stages are mapped
onto disjoint subsets of processors. Task and data parallel code for
a 2D FFT, along with a brief description, is included in
Section 9.4.5.
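The nested, proportional decomposition described for quicksort might be sketched as follows. All names here (PQSORT, PARTITION, SERIAL_QSORT) are hypothetical, and the arithmetic for splitting the active processors is only illustrative; each recursive call opens a new task region that further subdivides the processors active in its task:

```fortran
      RECURSIVE SUBROUTINE PQSORT(KEYS)
        REAL KEYS(:)
!HPF$   PROCESSORS P(NUMBER_OF_PROCESSORS())
        INTEGER N, M, NP, NP1
        N  = SIZE(KEYS)
        NP = NUMBER_OF_PROCESSORS()
        IF (NP .EQ. 1 .OR. N .LE. 1) THEN
          CALL SERIAL_QSORT(KEYS)      ! hypothetical serial fallback
          RETURN
        END IF
        CALL PARTITION(KEYS, M)        ! pivot ends at position M
!       Give each subarray a share of processors proportional to its size.
        NP1 = MAX(1, MIN(NP - 1, (M * NP) / N))
!HPF$   TASK_REGION
!HPF$     ON (P(1:NP1)), RESIDENT BEGIN
            CALL PQSORT(KEYS(1:M-1))   ! nested task region inside this call
!HPF$     END ON
!HPF$     ON (P(NP1+1:NP)), RESIDENT BEGIN
            CALL PQSORT(KEYS(M+1:N))
!HPF$     END ON
!HPF$   END TASK_REGION
      END SUBROUTINE PQSORT
```

The two recursive calls proceed concurrently on disjoint subsets, and the recursion bottoms out in a serial sort once a task's subset shrinks to a single processor.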
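A hedged sketch of how such a two-stage pipeline might be expressed is given below; the processor arrangement P, the array sizes, and the subroutine names are all hypothetical (the complete worked code is the one referenced in Section 9.4.5). The key points are that the two stages run on disjoint subsets of P, and that the assignment between the ON blocks, executed outside any lexical task, moves each frame from the first stage's distribution to the second's:

```fortran
!HPF$ PROCESSORS P(8)
      COMPLEX A(64,64), B(64,64)
!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P(1:4)   ! rows local to stage 1
!HPF$ DISTRIBUTE B(*,BLOCK) ONTO P(5:8)   ! columns local to stage 2

!HPF$ TASK_REGION
      DO K = 1, NFRAMES
!       Stage 1, on P(1:4): read a frame and FFT each row.
!HPF$   ON (P(1:4)), RESIDENT BEGIN
          CALL READ_FRAME(A)
          CALL ROW_FFTS(A)
!HPF$   END ON
!       Outside the lexical tasks: data parallel code executed by all
!       active processors, communicating between the two subsets.
        B = A
!       Stage 2, on P(5:8): FFT each column and emit the output.
!HPF$   ON (P(5:8)), RESIDENT BEGIN
          CALL COL_FFTS(B)
          CALL WRITE_FRAME(B)
!HPF$   END ON
      END DO
!HPF$ END TASK_REGION
```

Because each iteration's stage 1 is independent of the previous iteration's stage 2, successive frames can overlap in the two stages, yielding pipelined execution.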