The DISTRIBUTE directive specifies a mapping of data objects to abstract processors in a processor arrangement. For example,
REAL SALAMI(10000)
!HPF$ DISTRIBUTE SALAMI(BLOCK)
specifies that the array SALAMI should be distributed across some set of abstract processors by slicing it uniformly into blocks of contiguous elements. If there are 50 processors, the directive implies that the array should be divided into groups of ⌈10000/50⌉ = 200 elements, with SALAMI(1:200) mapped to the first processor, SALAMI(201:400) mapped to the second processor, and so on. If there is only one processor, the entire array is mapped to that processor as a single block of 10000 elements.
The block size may be specified explicitly:
REAL WEISSWURST(10000)
!HPF$ DISTRIBUTE WEISSWURST(BLOCK(256))
This specifies that groups of exactly 256 elements should be mapped to successive abstract processors. (There must be at least ⌈10000/256⌉ = 40 abstract processors if the directive is to be satisfied. The fortieth processor will contain a partial block of only 16 elements, namely WEISSWURST(9985:10000).)
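As a rough illustration of the two block examples above, the following Python sketch lists the index range that each abstract processor would receive. The helper name block_ranges is hypothetical and not part of HPF, and the sketch assumes a lower bound of 1.

```python
def block_ranges(d, m):
    """Return (first, last) index pairs, one per non-empty block of size m.

    Hypothetical helper, not part of HPF; assumes the array has lower bound 1.
    """
    return [(lo, min(lo + m - 1, d)) for lo in range(1, d + 1, m)]


# SALAMI(10000) with BLOCK on 50 processors: the block size is 10000/50 = 200.
salami = block_ranges(10000, 200)

# WEISSWURST(10000) with BLOCK(256): the last block is only partly filled.
weisswurst = block_ranges(10000, 256)

print(salami[0], salami[1])
print(len(weisswurst), weisswurst[-1])
```

The number of entries returned is the minimum number of abstract processors the directive needs.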
HPF also provides a cyclic distribution format:
REAL DECK_OF_CARDS(52)
!HPF$ DISTRIBUTE DECK_OF_CARDS(CYCLIC)
If there are 4 abstract processors, the first processor will contain DECK_OF_CARDS(1:49:4), the second processor will contain DECK_OF_CARDS(2:50:4), the third processor will contain DECK_OF_CARDS(3:51:4), and the fourth processor will contain DECK_OF_CARDS(4:52:4). Successive array elements are dealt out to successive abstract processors in round-robin fashion.
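The round-robin dealing can be sketched in Python as follows. The helper cyclic_sections is a hypothetical name used only for illustration; it returns the (first, last, stride) triplet of the array section held by each abstract processor, assuming a lower bound of 1 and at least as many elements as processors.

```python
def cyclic_sections(n, p):
    """Return (first, last, stride) triplets, one per abstract processor.

    Hypothetical helper, not part of HPF; assumes lower bound 1 and n >= p.
    """
    sections = []
    for k in range(1, p + 1):
        # Largest index <= n that is reached from k in steps of p.
        last = k + ((n - k) // p) * p
        sections.append((k, last, p))
    return sections


deck = cyclic_sections(52, 4)
for first, last, stride in deck:
    print(f"DECK_OF_CARDS({first}:{last}:{stride})")
```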
Distributions are specified independently for each dimension of a multidimensional array:
INTEGER CHESS_BOARD(8,8), GO_BOARD(19,19)
!HPF$ DISTRIBUTE CHESS_BOARD(BLOCK, BLOCK)
!HPF$ DISTRIBUTE GO_BOARD(CYCLIC,*)
The CHESS_BOARD array will be carved up into contiguous rectangular patches, which will be distributed onto a two-dimensional arrangement of abstract processors. The GO_BOARD array will have its rows distributed cyclically over a one-dimensional arrangement of abstract processors. (The "*" specifies that GO_BOARD is not to be distributed along its second axis; thus an entire row is to be distributed as one object. This is sometimes called "on-processor" distribution.)
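To make the per-dimension rule concrete, here is a small Python sketch that computes which abstract processor would own a given element of each board. The processor arrangement shapes (2 x 2 for CHESS_BOARD, 4 processors for GO_BOARD) are assumptions chosen for illustration; the directives above do not fix them. The function names are likewise hypothetical.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def chess_owner(i, j, shape=(2, 2)):
    """Owner of CHESS_BOARD(i,j) under (BLOCK, BLOCK).

    The 2 x 2 processor arrangement is an assumption for illustration.
    """
    m1, m2 = cd(8, shape[0]), cd(8, shape[1])  # block size in each dimension
    return (cd(i, m1), cd(j, m2))


def go_owner(i, j, p=4):
    """Owner of GO_BOARD(i,j) under (CYCLIC, *).

    The processor count p=4 is an assumption. The column index j is ignored
    because the second axis is not distributed.
    """
    return 1 + (i - 1) % p


print(chess_owner(5, 4))
print(go_owner(5, 1), go_owner(5, 19))
```

Every element of a given GO_BOARD row reports the same owner, which is what the "*" in the second position expresses.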
The DISTRIBUTE directive may appear only in the specification-part of a scoping unit and can contain only a specification-expr as the argument to a BLOCK or CYCLIC option.
The syntax of the DISTRIBUTE directive is:
H305  distribute-directive   is  DISTRIBUTE distributee dist-directive-stuff
H306  dist-directive-stuff   is  dist-format-clause [ dist-onto-clause ]
H307  dist-attribute-stuff   is  dist-directive-stuff
                             or  dist-onto-clause
H308  distributee            is  object-name
                             or  template-name
H309  dist-format-clause     is  ( dist-format-list )
                             or  * ( dist-format-list )
                             or  *
H310  dist-format            is  BLOCK [ ( scalar-int-expr ) ]
                             or  CYCLIC [ ( scalar-int-expr ) ]
                             or  *
H311  dist-onto-clause       is  ONTO dist-target
H312  dist-target            is  processors-name
                             or  * processors-name
                             or  *
The full syntax is given here for completeness. However, some of the forms are discussed only in Section 4. These "interprocedural" forms are:
- The last two options of rule H309 (containing the * form)
- The last two options of rule H312 (containing the * form)
Advice to users. Some of the above constraints are relaxed under the approved extensions (see Section 8): mapping of derived type components (relaxes constraint 1), and mapping of pointers and targets (relaxes constraints 3, 4, and 9). (End of advice to users)
Note that the possibility of a DISTRIBUTE directive of the form
!HPF$ DISTRIBUTE dist-attribute-stuff :: distributee-list
is covered by syntax rule H301 for a combined-directive.
Examples:
!HPF$ DISTRIBUTE D1(BLOCK)
!HPF$ DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE :: D2,D3,D4
The meanings of the alternatives for dist-format are given below.
Define the ceiling division function CD(J,K) = (J+K-1)/K (using Fortran integer arithmetic with truncation toward zero).
Define the ceiling remainder function CR(J,K) = J-K*CD(J,K).
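The two helper functions translate directly into Python. This is a minimal sketch that assumes positive arguments, which is the only case the definitions are used for here.

```python
def cd(j, k):
    """Ceiling division CD(J,K) = (J+K-1)/K for positive J and K."""
    # Python's // floors rather than truncates toward zero; the two agree
    # here because numerator and denominator are both positive.
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K); lies in -(K-1)..0."""
    return j - k * cd(j, k)


print(cd(100, 16), cd(10000, 256))
print(cr(1, 7), cr(7, 7))
```

Note that CR is never positive: it is zero when K divides J and otherwise counts how far J falls short of the next multiple of K.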
The dimensions of a processor arrangement appearing as a dist-target are said to correspond in left-to-right order with those dimensions of a distributee for which the corresponding dist-format is not *. In the example above, processor arrangement SQUARE must be two-dimensional; its first dimension corresponds to the first dimensions of D2, D3, and D4 and its second dimension corresponds to the third dimensions of D2, D3, and D4.
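The left-to-right correspondence rule can be sketched as a short Python function. The name corresponding_dims and the representation of a dist-format-list as a list of strings are assumptions made for this illustration.

```python
def corresponding_dims(dist_formats):
    """Map each processor-arrangement dimension to a distributee dimension.

    dist_formats is a list of strings such as ["BLOCK", "*", "BLOCK"].
    Hypothetical helper, not part of HPF; dimensions are numbered from 1.
    """
    # Keep only the distributee dimensions whose format is not "*".
    distributed = [i for i, f in enumerate(dist_formats, start=1) if f != "*"]
    # Pair them, in order, with processor-arrangement dimensions 1, 2, ...
    return dict(enumerate(distributed, start=1))


square = corresponding_dims(["BLOCK", "*", "BLOCK"])
print(square)
```

The number of keys in the result is the rank the processor arrangement must have, which is 2 for the (BLOCK,*,BLOCK) example.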
Let d be the size of a distributee in a certain dimension and let p be the size of the processor arrangement in the corresponding dimension. For simplicity, assume all dimensions have a lower bound of 1. Then BLOCK(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is CD(j,m) (note that m × p ≥ d must be true), and is position number m+CR(j,m) among positions mapped to that abstract processor. The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).
The block size m must be a positive integer.
BLOCK by definition means the same as BLOCK(CD(d,p)).
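The BLOCK(m) rule can be checked numerically with a few lines of Python. This is only a sketch: block_map and default_block_size are hypothetical names, and a lower bound of 1 is assumed as in the text.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K)."""
    return j - k * cd(j, k)


def block_map(j, m):
    """Return (processor index, position on that processor) under BLOCK(m)."""
    return cd(j, m), m + cr(j, m)


def default_block_size(d, p):
    """Block size implied by BLOCK without an argument: CD(d, p)."""
    return cd(d, p)


m = default_block_size(100, 16)   # 100 elements on 16 processors
print(m, block_map(99, m), block_map(100, m))
```

For 100 elements on 16 processors the implied block size is 7, and elements 99 and 100 land in positions 1 and 2 of processor 15.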
CYCLIC(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is 1+MODULO(CD(j,m)-1,p). The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).
The block size m must be a positive integer.
CYCLIC by definition means the same as CYCLIC(1).
CYCLIC(m) and BLOCK(m) imply the same distribution when m × p ≥ d, but BLOCK(m) additionally asserts that the distribution will not wrap around in a cyclic manner, which a compiler cannot determine at compile time if m is not constant. Note that CYCLIC and BLOCK (without argument expressions) do not imply the same distribution unless p ≥ d, a degenerate case in which the block size is 1 and the distribution does not wrap around.
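The relationship between the two formats can be explored with a short Python sketch. The function names are hypothetical, and the cyclic owner is computed as 1 + MODULO(CD(j,m)-1, p), with p the number of abstract processors along the dimension; Python's % operator matches MODULO for the non-negative operands used here.

```python
def cd(j, k):
    """Ceiling division CD(J,K) for positive integers."""
    return (j + k - 1) // k


def block_owner(j, m):
    """Processor index of element j under BLOCK(m)."""
    return cd(j, m)


def cyclic_owner(j, m, p):
    """Processor index of element j under CYCLIC(m) on p processors."""
    return 1 + (cd(j, m) - 1) % p


def same_distribution(d, m, p):
    """True if BLOCK(m) and CYCLIC(m) assign every element to the same owner."""
    return all(block_owner(j, m) == cyclic_owner(j, m, p)
               for j in range(1, d + 1))


print(same_distribution(100, 7, 16))   # 7 * 16 = 112 covers all 100 elements
print(same_distribution(100, 3, 16))   # 3 * 16 = 48, so CYCLIC(3) wraps around
```

In the second call the block formula would need a seventeenth processor for element 49, while the cyclic formula wraps it back to processor 1.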
Suppose that we have 16 abstract processors and an array of length 100:
!HPF$ PROCESSORS SEDECIM(16)
REAL CENTURY(100)
Distributing the array BLOCK (which in this case would mean the same as BLOCK(7)):
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
results in this mapping of array elements onto abstract processors:
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   8  15  22  29  36  43  50  57  64  71  78  85  92  99
   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100
   3  10  17  24  31  38  45  52  59  66  73  80  87  94
   4  11  18  25  32  39  46  53  60  67  74  81  88  95
   5  12  19  26  33  40  47  54  61  68  75  82  89  96
   6  13  20  27  34  41  48  55  62  69  76  83  90  97
   7  14  21  28  35  42  49  56  63  70  77  84  91  98
Distributing the array BLOCK(8):
!HPF$ DISTRIBUTE CENTURY(BLOCK(8)) ONTO SEDECIM
results in this mapping of array elements onto abstract processors:
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   9  17  25  33  41  49  57  65  73  81  89  97
   2  10  18  26  34  42  50  58  66  74  82  90  98
   3  11  19  27  35  43  51  59  67  75  83  91  99
   4  12  20  28  36  44  52  60  68  76  84  92 100
   5  13  21  29  37  45  53  61  69  77  85  93
   6  14  22  30  38  46  54  62  70  78  86  94
   7  15  23  31  39  47  55  63  71  79  87  95
   8  16  24  32  40  48  56  64  72  80  88  96
Distributing the array BLOCK(6) is not HPF-conforming because 6 × 16 = 96 < 100.
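The conformance condition amounts to a single comparison: the blocks must be large enough that p of them cover all d elements. A minimal Python sketch, using a hypothetical function name, is:

```python
def block_is_conforming(d, m, p):
    """True when BLOCK(m) can hold d elements on p processors.

    Hypothetical helper; requires a positive block size and m * p >= d.
    """
    return m >= 1 and m * p >= d


# CENTURY(100) on the 16 processors of SEDECIM.
for m in (6, 7, 8, 256):
    print(m, block_is_conforming(100, m, 16))
```

A block size larger than the array, such as 256, still satisfies the condition; it simply leaves most processors empty.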
Distributing the array CYCLIC (which means exactly the same as CYCLIC(1)):
!HPF$ DISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
results in this mapping of array elements onto abstract processors:
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
  97  98  99 100
Distributing the array CYCLIC(3):
!HPF$ DISTRIBUTE CENTURY(CYCLIC(3)) ONTO SEDECIM
results in this mapping of array elements onto abstract processors:
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46
   2   5   8  11  14  17  20  23  26  29  32  35  38  41  44  47
   3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48
  49  52  55  58  61  64  67  70  73  76  79  82  85  88  91  94
  50  53  56  59  62  65  68  71  74  77  80  83  86  89  92  95
  51  54  57  60  63  66  69  72  75  78  81  84  87  90  93  96
  97 100
  98
  99
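The mappings shown for CENTURY can be regenerated with a short Python sketch that groups element indices by owning processor. The function layout is a hypothetical helper; it assumes a lower bound of 1 and, for the block case, a block size large enough that no element falls beyond the last processor.

```python
def layout(d, m, p, cyclic):
    """Return a list of p lists; entry k-1 holds the indices on processor k.

    Hypothetical helper, not part of HPF. With cyclic=False the caller must
    ensure m * p >= d, otherwise an element would fall past processor p.
    """
    owners = [[] for _ in range(p)]
    for j in range(1, d + 1):
        block = (j + m - 1) // m          # CD(j, m): block containing j
        k = 1 + (block - 1) % p if cyclic else block
        owners[k - 1].append(j)
    return owners


block7 = layout(100, 7, 16, cyclic=False)
cyclic3 = layout(100, 3, 16, cyclic=True)
print(block7[14], block7[15])   # processors 15 and 16 under BLOCK(7)
print(cyclic3[0], cyclic3[1])   # processors 1 and 2 under CYCLIC(3)
```

Each inner list corresponds to one column of the mapping read from top to bottom.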
Note that it is perfectly permissible for an array to be distributed so that some processors have no elements. Indeed, an array may be "distributed" so that all elements reside on one processor. For example,
!HPF$ DISTRIBUTE CENTURY(BLOCK(256)) ONTO SEDECIM
results in having only one non-empty block--a partially-filled one at that, having only 100 elements--on processor 1, with processors 2 through 16 having no elements of the array.
The statement form of a DISTRIBUTE directive may be considered an abbreviation for an attributed form that happens to mention only one distributee; for example,
!HPF$ DISTRIBUTE distributee ( dist-format-list ) ONTO dist-target
is equivalent to
!HPF$ DISTRIBUTE ( dist-format-list ) ONTO dist-target :: distributee
Note that, to prevent syntactic ambiguity, the dist-format-clause must be present in the statement form, so in general the statement form of the directive may not be used to specify the mapping of scalars.
If the dist-format-clause is omitted from the attributed form, then the language processor may make an arbitrary choice of distribution formats for each template or array. So the directive
!HPF$ DISTRIBUTE ONTO P :: D1,D2,D3
means the same as
!HPF$ DISTRIBUTE ONTO P :: D1
!HPF$ DISTRIBUTE ONTO P :: D2
!HPF$ DISTRIBUTE ONTO P :: D3
to which a compiler, perhaps taking into account patterns of use of D1, D2, and D3 within the code, might choose to supply three distinct distributions such as, for example,
!HPF$ DISTRIBUTE D1(BLOCK, BLOCK) ONTO P
!HPF$ DISTRIBUTE D2(CYCLIC, BLOCK) ONTO P
!HPF$ DISTRIBUTE D3(BLOCK(43), CYCLIC) ONTO P
Then again, the compiler might happen to choose the same distribution for all three arrays.
In either the statement form or the attributed form, if the ONTO clause is present, it specifies the processor arrangement that is the target of the distribution. If the ONTO clause is omitted, then an implementation-dependent processor arrangement is chosen arbitrarily for each distributee. So, for example,
REAL, DIMENSION(1000) :: ARTHUR, ARNOLD, LINUS, LUCY
!HPF$ PROCESSORS EXCALIBUR(32)
!HPF$ DISTRIBUTE (BLOCK) ONTO EXCALIBUR :: ARTHUR, ARNOLD
!HPF$ DISTRIBUTE (BLOCK) :: LINUS, LUCY
causes the arrays ARTHUR and ARNOLD to have the same mapping, so that corresponding elements reside in the same abstract processor, because they are the same size and distributed in the same way (BLOCK) onto the same processor arrangement (EXCALIBUR). However, LUCY and LINUS do not necessarily have the same mapping because they might, depending on the implementation, be distributed onto differently chosen processor arrangements; so corresponding elements of LUCY and LINUS might not reside on the same abstract processor. (The ALIGN directive provides a way to ensure that two arrays have the same mapping without having to specify an explicit processor arrangement.)
In a given environment, for some distributions, there may be no appropriate processor arrangement.