This extension allows objects to be distributed directly onto processor
subsets: a section of a processor arrangement may be specified wherever
a processor arrangement could be named, e.g., in a DISTRIBUTE directive.
The specified subset must be a proper subset of the named processor
arrangement.
The syntax of the extended dist-target is as follows:
H806 extended-dist-target  is  processors-name [ ( section-subscript-list ) ]
                           or  * processors-name [ ( section-subscript-list ) ]
                           or  *
- The section-subscripts in the
section-subscript-list may not be
vector-subscripts and are restricted to be either
subscripts or subscript-triplets.
- In the section-subscript-list, the number of
section-subscripts must equal the rank of the
processors-name.
- Within a DISTRIBUTE directive, each
section-subscript must be a specification-expr.
- Within a DISTRIBUTE or a REDISTRIBUTE
directive, if both a dist-format-list and a
dist-target appear, the number of elements of the
dist-format-list that are not ``*'' must equal the
number of subscript-triplets in the named processor
arrangement.
- Within a DISTRIBUTE or a REDISTRIBUTE
directive, if a dist-target appears but not a
dist-format-list, the rank of each distributee
must equal the number of subscript-triplets in the named
processor arrangement.
!Example 1
!HPF$ PROCESSORS P(10)
      REAL A(100)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P(2:5)
!Example 2
!HPF$ PROCESSORS Q(10,10)
      REAL B(100,100)
!HPF$ DISTRIBUTE B(BLOCK,BLOCK) ONTO Q(5:10,5:10)
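In Example 1, the array A is distributed by block across the four
processors P(2) to P(5), so that each of them holds 25 contiguous
elements. In Example 2, the array B is distributed by block in both
dimensions across the section Q(5:10,5:10), the lower right portion of
the processor arrangement Q.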
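The constraints relating the dist-format-list to the subscript-triplets of the target can also be illustrated with a rank-reducing section. The following is a sketch only: the arrangement R and the arrays C and D are hypothetical names that do not appear in the examples above, and a compiler that does not implement this extension will simply treat the directives as comments.

```fortran
      PROGRAM SUBSET_SKETCH
!     Hypothetical arrangement and arrays, for illustration only.
!HPF$ PROCESSORS R(4,8)
      REAL C(100,100), D(100,100)
!     Two formats that are not "*", and two subscript-triplets in
!     the target section R(1:2,1:4).
!HPF$ DISTRIBUTE C(BLOCK,CYCLIC) ONTO R(1:2,1:4)
!     One format that is not "*". The scalar subscript 3 is not a
!     subscript-triplet, so R(3,1:8) contributes exactly one triplet.
!HPF$ DISTRIBUTE D(*,BLOCK) ONTO R(3,1:8)
      C = 0.0
      D = 0.0
      END PROGRAM SUBSET_SKETCH
```

In both directives the section-subscript-list has two entries, matching the rank of R, and each section is a proper subset of R.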
Advice to users. This extension is most useful in conjunction with the
tasking construct (see Section 9.4), which allows
multiple independent phases of a computation to execute
simultaneously on different subsets of processors. A similar
situation arises when the code uses multiple data structures that
can be computed in parallel and the computation on each individual
object also exhibits parallelism, e.g., the multiple blocks of a
multi-block grid used in some fluid dynamics calculations. Here, the
individual blocks have to be distributed over subsets of processors
to exploit both levels of parallelism.
(End of advice to users.)
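As a minimal sketch of the multi-block case, two grid blocks might be mapped onto disjoint halves of a processor arrangement as shown below. The names P, GRID1, and GRID2 and the array sizes are assumptions made for illustration; the tasking construct itself is not shown.

```fortran
      PROGRAM MULTIBLOCK_SKETCH
!     Hypothetical names and sizes, for illustration only.
!HPF$ PROCESSORS P(8)
      REAL GRID1(200,200), GRID2(200,200)
!     Each grid block is distributed by rows over its own half of P,
!     so the two blocks reside on disjoint sets of processors.
!HPF$ DISTRIBUTE GRID1(BLOCK,*) ONTO P(1:4)
!HPF$ DISTRIBUTE GRID2(BLOCK,*) ONTO P(5:8)
      GRID1 = 0.0
      GRID2 = 0.0
      END PROGRAM MULTIBLOCK_SKETCH
```

With this mapping, the computations on GRID1 and GRID2 can proceed independently of each other, while each one is itself spread across four processors.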