Next: Implementation Up: The TASK_REGION Construct Previous: Semantics of the TASK_REGION

Execution Model and Usage

A task region does not introduce a fundamentally new execution model. However, the assertions implicit in a task region imply that only the specified active processors of an execution task need to participate in its execution; all other processors may skip it. A processor executing a task region participates in every task that executes on a processor subset to which it belongs, and does not participate in any task that executes on a subset to which it does not belong. Code outside lexical tasks is executed as normal data parallel code by all active processors of the task region. The access restrictions of a task region guarantee that the results obtained under this execution paradigm are consistent with a purely data parallel execution of the task region.
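This execution model can be sketched as follows. The code below is an illustrative fragment, not taken from this report: the processors arrangement P and the subroutines task_a, task_b, and combine are hypothetical, and the directive spellings follow the HPF 2.0 approved extensions as we recall them (consult the language definition for the exact syntax).

```fortran
!HPF$ PROCESSORS P(8)
!HPF$ TASK_REGION
!HPF$   ON (P(1:4)), RESIDENT BEGIN
          CALL task_a(A)        ! executed only by processors P(1:4)
!HPF$   END ON
!HPF$   ON (P(5:8)), RESIDENT BEGIN
          CALL task_b(B)        ! executed only by processors P(5:8)
!HPF$   END ON
        CALL combine(A, B)      ! outside lexical tasks: data parallel
                                ! code run by all 8 active processors
!HPF$ END TASK_REGION
```

Because the two ON blocks name disjoint processor subsets, the calls to task_a and task_b may proceed concurrently; the call to combine is then executed by all active processors of the task region.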

A task region presents a simple yet powerful model to write integrated task and data parallel programs. We illustrate three basic computation structures in which task parallelism can be effectively exploited with this model.

  1. Parallel sections: A task region can be used to divide the available processors into disjoint sets that perform independent computations, simulating what is often referred to as parallel sections. This form of task parallelism is relatively straightforward and useful in many application scenarios, multiblock applications being one example. The task region simply contains a sequence of RESIDENT ON blocks on disjoint processor subsets. Note that the division of processors among subsets can be dynamic, that is, it can be expressed in terms of variables computed during execution.
  2. Nested parallelism: Task regions can be nested; in particular, a subroutine call made from an execution task can further subdivide the active processors using another task region directive. This allows the exploitation of nested parallelism, for example in dynamic, tree-structured divide-and-conquer computations. As a specific example, quicksort can be implemented by recursively partitioning the input array of keys around a pivot, and assigning to the two resulting subarrays numbers of processors proportional to their sizes.
  3. Data parallel pipelines: Task regions can be used to implement pipelined data parallel computations. We illustrate this with a two-dimensional fast Fourier transform (2D FFT). The first stage of a 2D FFT reads a two-dimensional matrix and performs a one-dimensional FFT on each row of the matrix. The second stage performs a one-dimensional FFT on each column of the matrix and generates the final output. In a pipelined data parallel implementation of this form of 2D FFT, the two stages are mapped onto disjoint subsets of processors. Task and data parallel code for a 2D FFT, along with a brief description, is included in Section 9.4.5.
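The parallel sections structure of item 1 can be sketched as below. This fragment is illustrative and not part of this report: the processors arrangement P, the split point k, and the subroutine solve_block are hypothetical, and the directive spellings follow the HPF 2.0 approved extensions as we recall them. The split point k is an ordinary integer variable, reflecting the dynamic division of processors noted above.

```fortran
!HPF$ PROCESSORS P(NP)
      INTEGER k
      k = choose_split(grid1, grid2)   ! computed at run time
!HPF$ TASK_REGION
!HPF$   ON (P(1:k)), RESIDENT BEGIN
          CALL solve_block(grid1)      ! first independent computation
!HPF$   END ON
!HPF$   ON (P(k+1:NP)), RESIDENT BEGIN
          CALL solve_block(grid2)      ! second independent computation
!HPF$   END ON
!HPF$ END TASK_REGION
```

Since the two ON blocks name disjoint processor subsets and, by the access restrictions, touch disjoint data, the two calls to solve_block may execute concurrently.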

