For example,
          REAL SALAMI(10000)
    !HPF$ DISTRIBUTE SALAMI(BLOCK)
specifies that the array SALAMI should be distributed across some set of abstract processors by slicing it uniformly into blocks of contiguous elements. If there are 50 processors, the directive implies that the array should be divided into groups of 200 elements, with SALAMI(1:200) mapped to the first processor, SALAMI(201:400) mapped to the second processor, and so on. If there is only one processor, the entire array is mapped to that processor as a single block of 10000 elements.
The block size may be specified explicitly:
          REAL WEISSWURST(10000)
    !HPF$ DISTRIBUTE WEISSWURST(BLOCK(256))
This specifies that groups of exactly 256 elements should be mapped to successive abstract processors. (There must be at least ceiling(10000/256) = 40 abstract processors if the directive is to be satisfied. The fortieth processor will contain a partial block of only 16 elements, namely WEISSWURST(9985:10000).)
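The arithmetic behind the two BLOCK examples above can be checked with a small sketch. This is not HPF; it is an illustrative Python model, and the helper names `cd` and `block_ranges` are assumptions introduced here. It assumes 1-based indexing and that the default block size is the array size divided by the processor count, rounded up.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def block_ranges(d, p, m=None):
    """Return {processor: (first, last)} for a 1-based array of size d.

    m is the block size; when omitted it defaults to the ceiling of d/p.
    Processors that receive no elements are left out of the result.
    """
    if m is None:
        m = cd(d, p)
    if m * p < d:
        raise ValueError("block size too small: m*p must be >= d")
    ranges = {}
    for k in range(1, p + 1):
        first = 1 + m * (k - 1)
        if first > d:
            break  # remaining processors hold nothing
        ranges[k] = (first, min(m * k, d))
    return ranges


salami = block_ranges(10000, 50)             # DISTRIBUTE SALAMI(BLOCK), 50 processors
weisswurst = block_ranges(10000, 40, m=256)  # DISTRIBUTE WEISSWURST(BLOCK(256))
print(salami[1], salami[2])
print(weisswurst[40])
```

With fewer than 40 processors the second call raises an error, which mirrors the requirement stated in the text.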
HPF also provides a cyclic distribution format:
          REAL DECK_OF_CARDS(52)
    !HPF$ DISTRIBUTE DECK_OF_CARDS(CYCLIC)
If there are 4 abstract processors, the first processor will contain DECK_OF_CARDS(1:49:4), the second processor will contain DECK_OF_CARDS(2:50:4), the third processor will contain DECK_OF_CARDS(3:51:4), and the fourth processor will contain DECK_OF_CARDS(4:52:4). Successive array elements are dealt out to successive abstract processors in round-robin fashion.
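The round-robin dealing described above can be modelled in a few lines. This is a hedged Python sketch rather than HPF; `cyclic_owner` and `hands` are names made up for the illustration, and indices are taken to be 1-based as in Fortran.

```python
def cyclic_owner(j, p):
    """Processor (1-based) that owns 1-based element j under CYCLIC on p processors."""
    return 1 + (j - 1) % p


# Deal the 52 elements of DECK_OF_CARDS onto 4 abstract processors.
hands = {k: [] for k in range(1, 5)}
for card in range(1, 53):
    hands[cyclic_owner(card, 4)].append(card)

for k in range(1, 5):
    print(k, hands[k][0], hands[k][-1], len(hands[k]))
```

Each processor ends up with 13 elements, and the first and last index on each match the strided sections given in the text.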
Distributions may be specified independently for each dimension of a multidimensional array:
          INTEGER CHESS_BOARD(8,8), GO_BOARD(19,19)
    !HPF$ DISTRIBUTE CHESS_BOARD(BLOCK, BLOCK)
    !HPF$ DISTRIBUTE GO_BOARD(CYCLIC,*)
The CHESS_BOARD array will be carved up into contiguous rectangular patches, which will be distributed onto a two-dimensional arrangement of abstract processors. The GO_BOARD array will have its rows distributed cyclically over a one-dimensional arrangement of abstract processors. (The "*" specifies that GO_BOARD is not to be distributed along its second axis; thus an entire row is to be distributed as one object. This is sometimes called "on-processor" distribution.)
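As an illustration of per-dimension distribution, the sketch below computes which abstract processor would own a given element of each board. It is a Python model, not HPF. The 2 x 2 processor shape for CHESS_BOARD and the 4 processors for GO_BOARD are assumptions chosen for the example (the directives above do not fix them), and all function names are hypothetical.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def block_owner(j, d, p):
    """Owner of index j when a dimension of size d is distributed BLOCK over p."""
    return cd(j, cd(d, p))


def cyclic_owner(j, p):
    """Owner of index j when a dimension is distributed CYCLIC over p."""
    return 1 + (j - 1) % p


def chess_owner(row, col, shape=(2, 2)):
    """(BLOCK, BLOCK): each dimension is mapped independently onto a 2-D arrangement."""
    return (block_owner(row, 8, shape[0]), block_owner(col, 8, shape[1]))


def go_owner(row, col, p=4):
    """(CYCLIC, *): only the row index matters; col is not distributed."""
    return cyclic_owner(row, p)


print(chess_owner(1, 1), chess_owner(4, 5), chess_owner(8, 8))
print(go_owner(5, 1), go_owner(5, 19), go_owner(19, 7))
```

Under these assumptions each chess-board patch is 4 x 4, and every element of a given GO_BOARD row lands on the same processor.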
The REDISTRIBUTE directive is similar to the DISTRIBUTE directive but is considered executable. An array (or template) may be redistributed at any time, provided it has been declared DYNAMIC (see Section 3.5). Any other arrays currently ultimately aligned with an array (or template) when it is redistributed are also remapped to reflect the new distribution, in such a way as to preserve alignment relationships (see Section 3.4). (This can require a lot of computational and communication effort at run time; the programmer must take care when using this feature.)
The DISTRIBUTE directive may appear only in the specification-part of a scoping unit. The REDISTRIBUTE directive may appear only in the execution-part of a scoping unit. The principal difference between DISTRIBUTE and REDISTRIBUTE is that DISTRIBUTE must contain only a specification-expr as the argument to a BLOCK or CYCLIC option, whereas in REDISTRIBUTE such an argument may be any integer expression. Another difference is that DISTRIBUTE is an attribute, and so can be combined with other attributes as part of a combined-directive, whereas REDISTRIBUTE is not an attribute (although a REDISTRIBUTE statement may be written in the style of attributed syntax, using "::" punctuation).
Formally, the syntax of the DISTRIBUTE and REDISTRIBUTE directives is:
    H303  distribute-directive    is  DISTRIBUTE distributee dist-directive-stuff
    H304  redistribute-directive  is  REDISTRIBUTE distributee dist-directive-stuff
                                  or  REDISTRIBUTE dist-attribute-stuff :: distributee-list
    H305  dist-directive-stuff    is  dist-format-clause [ dist-onto-clause ]
    H306  dist-attribute-stuff    is  dist-directive-stuff
                                  or  dist-onto-clause
    H307  distributee             is  object-name
                                  or  template-name
    H308  dist-format-clause      is  ( dist-format-list )
                                  or  * ( dist-format-list )
                                  or  *
    H309  dist-format             is  BLOCK [ ( int-expr ) ]
                                  or  CYCLIC [ ( int-expr ) ]
                                  or  *
    H310  dist-onto-clause        is  ONTO dist-target
    H311  dist-target             is  processors-name
                                  or  * processors-name
                                  or  *
Constraint: An object-name mentioned as a distributee must be a simple name and not a subobject designator.
Constraint: An object-name mentioned as a distributee may not appear as an alignee.
Constraint: An object-name mentioned as a distributee may not have the POINTER attribute.
Constraint: A distributee that appears in a REDISTRIBUTE directive must have the DYNAMIC attribute (see Section 3.5).
Constraint: If a dist-format-list is specified, its length must equal the rank of each distributee.
Constraint: If both a dist-format-list and a processors-name appear, the number of elements of the dist-format-list that are not "*" must equal the rank of the named processor arrangement.
Constraint: If a processors-name appears but not a dist-format-list, the rank of each distributee must equal the rank of the named processor arrangement.
Constraint: If either the dist-format-clause or the dist-target in a DISTRIBUTE directive begins with "*" then every distributee must be a dummy argument.
Constraint: Neither the dist-format-clause nor the dist-target in a REDISTRIBUTE may begin with "*".
Constraint: Any int-expr appearing in a dist-format of a DISTRIBUTE directive must be a specification-expr.
Note that the possibility of a DISTRIBUTE directive of the form
    !HPF$ DISTRIBUTE dist-attribute-stuff :: distributee-list
is covered by syntax rule H301 for a combined-directive.
Examples:
    !HPF$ DISTRIBUTE D1(BLOCK)
    !HPF$ DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE:: D2,D3,D4
The meanings of the alternatives for dist-format are given below.
Define the ceiling division function CD(J,K) = (J+K-1)/K (using Fortran integer arithmetic with truncation toward zero).
Define the ceiling remainder function CR(J,K) = J-K*CD(J,K).
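The two helper functions are easy to model outside Fortran. The sketch below is in Python and assumes positive J and K, for which Python's floor division `//` gives the same result as Fortran's truncating integer division; the lowercase names `cd` and `cr` are simply transliterations.

```python
def cd(j, k):
    """Ceiling division CD(J,K) = (J+K-1)/K, for positive j and k."""
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K); always in the range -(k-1)..0."""
    return j - k * cd(j, k)


print(cd(10000, 256), cd(100, 16))
print([cr(j, 7) for j in range(1, 9)])
```

Note that CR is never positive: it is zero exactly when K divides J, and otherwise counts (negatively) how far J falls short of the next multiple of K.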
The dimensions of a processor arrangement appearing as a dist-target are said to correspond in left-to-right order with those dimensions of a distributee for which the corresponding dist-format is not *. In the example above, processor arrangement SQUARE must be two-dimensional; its first dimension corresponds to the first dimensions of D2, D3, and D4 and its second dimension corresponds to the third dimensions of D2, D3, and D4.
Let d be the size of a distributee in a certain dimension and let p be the size of the processor arrangement in the corresponding dimension. For simplicity, assume all dimensions have a lower bound of 1. Then BLOCK(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is CD(j,m) (note that m*p >= d must be true), and is position number m+CR(j,m) among positions mapped to that abstract processor. The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).
The block size m must be a positive integer.
BLOCK by definition means the same as BLOCK(CD(d,p)).
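The BLOCK(m) rule can be written out directly from the definitions above. The following Python sketch is illustrative only; `block_map` is a hypothetical name, indices are 1-based, and j and m are assumed positive so that floor division matches Fortran's truncation.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K)."""
    return j - k * cd(j, k)


def block_map(j, m):
    """Return (processor index, position within that processor) for BLOCK(m)."""
    return cd(j, m), m + cr(j, m)


# SALAMI(BLOCK) on 50 processors is BLOCK(CD(10000, 50)) = BLOCK(200).
m = cd(10000, 50)
print(block_map(200, m), block_map(201, m))

# WEISSWURST(BLOCK(256)): the last element is the 16th position on processor 40.
print(block_map(10000, 256))
```

Inverting the rule, the element at local position 1 on processor k is 1+m*(k-1), which agrees with the statement in the text.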
CYCLIC(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is 1+MODULO(CD(j,m)-1,p). The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).
The block size m must be a positive integer.
CYCLIC by definition means the same as CYCLIC(1).
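A small model of the CYCLIC(m) rule may help. This Python sketch uses a made-up case (10 elements, blocks of 2, 3 processors) that does not appear in the text; `cyclic_owner` is a hypothetical name, and Python's `%` on non-negative operands is assumed to stand in for MODULO.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def cyclic_owner(j, m, p):
    """Processor index for position j under CYCLIC(m): 1 + MODULO(CD(j,m)-1, p)."""
    return 1 + (cd(j, m) - 1) % p


# Hypothetical example: d = 10 elements, CYCLIC(2), p = 3 processors.
owners = [cyclic_owner(j, 2, 3) for j in range(1, 11)]
print(owners)

# CYCLIC is CYCLIC(1): elements go to processors one at a time.
print([cyclic_owner(j, 1, 3) for j in range(1, 8)])
```

In the first case, pairs of elements are dealt to processors 1, 2, 3 and then wrap around to processor 1 again.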
CYCLIC(m) and BLOCK(m) imply the same distribution when m*p >= d, but BLOCK(m) additionally asserts that the distribution will not wrap around in a cyclic manner, which a compiler cannot determine at compile time if m is not constant. Note that CYCLIC and BLOCK (without argument expressions) do not imply the same distribution unless p >= d, a degenerate case in which the block size is 1 and the distribution does not wrap around.
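The claim above can be checked by brute force for particular sizes. The sketch below is a Python illustration with hypothetical helper names; the sizes 100 and 16 are borrowed from the example that follows in the text, and the size 10 is an arbitrary choice for the degenerate case.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def block_owner(j, m):
    """Owner under BLOCK(m)."""
    return cd(j, m)


def cyclic_owner(j, m, p):
    """Owner under CYCLIC(m) on p processors."""
    return 1 + (cd(j, m) - 1) % p


def same_mapping(d, p, block_m, cyclic_m):
    """True if BLOCK(block_m) and CYCLIC(cyclic_m) place every element identically."""
    return all(block_owner(j, block_m) == cyclic_owner(j, cyclic_m, p)
               for j in range(1, d + 1))


d, p = 100, 16
explicit_same = same_mapping(d, p, 7, 7)              # 7*16 >= 100, no wrap-around
default_same = same_mapping(d, p, cd(d, p), 1)        # BLOCK vs CYCLIC, p < d
degenerate_same = same_mapping(10, p, cd(10, p), 1)   # p >= d, block size 1
print(explicit_same, default_same, degenerate_same)
```

The check only compares owners; it does not capture the extra no-wrap-around assertion that BLOCK(m) gives the compiler.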
Suppose that we have 16 abstract processors and an array of length 100:
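    !HPF$ PROCESSORS SEDECIM(16)
          REAL CENTURY(100)
Distributing the array BLOCK (which in this case would mean the same as BLOCK(7)):
    !HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
results in a mapping in which processors 1 through 14 each hold 7 consecutive elements, processor 15 holds CENTURY(99:100), and processor 16 holds no elements.
Distributing the array BLOCK(8):
    !HPF$ DISTRIBUTE CENTURY(BLOCK(8)) ONTO SEDECIM
results in a mapping in which processors 1 through 12 each hold 8 consecutive elements, processor 13 holds CENTURY(97:100), and processors 14 through 16 hold no elements.
Distributing the array CYCLIC:
    !HPF$ DISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
results in a mapping in which processor k holds CENTURY(k:100:16).
Distributing the array CYCLIC(3):
    !HPF$ DISTRIBUTE CENTURY(CYCLIC(3)) ONTO SEDECIM
results in a mapping in which groups of 3 consecutive elements are dealt out in round-robin fashion; processor 1, for example, holds CENTURY(1:3), CENTURY(49:51), and CENTURY(97:99).
An array may also be distributed so that some processors hold no elements. For example,
    !HPF$ DISTRIBUTE CENTURY(BLOCK(256)) ONTO SEDECIM
results in having only one non-empty block (a partially filled one at that, having only 100 elements) on processor 1, with processors 2 through 16 having no elements of the array.
A DISTRIBUTE or REDISTRIBUTE directive must not cause any data object associated with the distributee via storage association (COMMON or EQUIVALENCE) to be mapped such that storage units of a scalar data object are split across more than one abstract processor. See the section on storage association for further discussion.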
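The element-to-processor mappings for the CENTURY examples can be regenerated from the BLOCK(m) and CYCLIC(m) definitions. The following is a Python sketch, not HPF; the function `distribute` and its string argument are assumptions made for this illustration, and indices are 1-based.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def distribute(d, p, kind, m=None):
    """Return a list of p lists; entry k-1 holds the 1-based indices on processor k."""
    if kind == "BLOCK":
        m = cd(d, p) if m is None else m       # BLOCK means BLOCK(CD(d,p))
        if m * p < d:
            raise ValueError("BLOCK(m) requires m*p >= d")
        owner = lambda j: cd(j, m)
    elif kind == "CYCLIC":
        m = 1 if m is None else m              # CYCLIC means CYCLIC(1)
        owner = lambda j: 1 + (cd(j, m) - 1) % p
    else:
        raise ValueError("kind must be 'BLOCK' or 'CYCLIC'")
    procs = [[] for _ in range(p)]
    for j in range(1, d + 1):
        procs[owner(j) - 1].append(j)
    return procs


cases = [("BLOCK", None), ("BLOCK", 8), ("CYCLIC", None), ("CYCLIC", 3), ("BLOCK", 256)]
for kind, m in cases:
    label = kind if m is None else f"{kind}({m})"
    print(label)
    for k, elems in enumerate(distribute(100, 16, kind, m), start=1):
        print(f"  processor {k:2d}: {elems}")
```

Calling `distribute(100, 16, "BLOCK", 6)` raises an error, since 6*16 is less than 100 and the m*p >= d requirement is violated.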
©2000-2006 Rice University