Next: The ALIGN Directive Up: Data Mapping Previous: Syntax of Data Alignment

The DISTRIBUTE Directive

 

The DISTRIBUTE directive specifies a mapping of data objects to abstract processors in a processor arrangement. For example,

      REAL SALAMI(10000)
!HPF$ DISTRIBUTE SALAMI(BLOCK)
specifies that the array SALAMI should be distributed across some set of abstract processors by slicing it uniformly into blocks of contiguous elements. If there are 50 processors, the directive implies that the array should be divided into groups of ⌈10000/50⌉ = 200 elements, with SALAMI(1:200) mapped to the first processor, SALAMI(201:400) mapped to the second processor, and so on. If there is only one processor, the entire array is mapped to that processor as a single block of 10000 elements.
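The arithmetic in this example can be checked with a short sketch. The Python below is illustrative only and is not part of HPF; the function name block_ranges is an assumption made for this example. It takes the default block size to be the ceiling of d/p and lists the index range each abstract processor would hold.

```python
def block_ranges(d, p):
    """Index ranges held by each of p processors under a plain BLOCK
    distribution of d elements (1-based, inclusive).
    A processor that receives no elements is reported as None."""
    m = (d + p - 1) // p              # default block size: ceiling of d / p
    ranges = []
    for k in range(1, p + 1):
        first = 1 + m * (k - 1)       # first element of block k
        last = min(m * k, d)          # the final block may be partial
        ranges.append((first, last) if first <= d else None)
    return ranges


salami = block_ranges(10000, 50)
print(salami[0], salami[1], salami[-1])   # (1, 200) (201, 400) (9801, 10000)
```

With a single processor, block_ranges(10000, 1) yields the one range (1, 10000), matching the last sentence of the paragraph above.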

The block size may be specified explicitly:

      REAL WEISSWURST(10000)
!HPF$ DISTRIBUTE WEISSWURST(BLOCK(256))

This specifies that groups of exactly 256 elements should be mapped to successive abstract processors. (There must be at least ⌈10000/256⌉ = 40 abstract processors if the directive is to be satisfied. The fortieth processor will contain a partial block of only 16 elements, namely WEISSWURST(9985:10000).)
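As a quick check of the figures quoted above, the following sketch counts the blocks needed for an explicit block size and reports the extent of the final, possibly partial, block. It is illustrative Python, not HPF, and the helper names blocks_needed and last_block are hypothetical.

```python
def blocks_needed(d, m):
    """Number of blocks of size m needed to cover d elements (ceiling of d / m)."""
    return (d + m - 1) // m


def last_block(d, m):
    """Return (block number, first index, last index, element count)
    for the final block of a BLOCK(m) distribution of d elements."""
    k = blocks_needed(d, m)
    first = 1 + m * (k - 1)
    return k, first, d, d - first + 1


print(blocks_needed(10000, 256))   # 40
print(last_block(10000, 256))      # (40, 9985, 10000, 16)
```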

HPF also provides a cyclic distribution format:

      REAL DECK_OF_CARDS(52)
!HPF$ DISTRIBUTE DECK_OF_CARDS(CYCLIC)
If there are 4 abstract processors, the first processor will contain DECK_OF_CARDS(1:49:4), the second processor will contain DECK_OF_CARDS(2:50:4), the third processor will contain DECK_OF_CARDS(3:51:4) and the fourth processor will contain DECK_OF_CARDS(4:52:4). Successive array elements are dealt out to successive abstract processors in round-robin fashion.
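The round-robin dealing can be sketched in a few lines. This is an illustrative Python fragment, not part of HPF; cyclic_hand is a hypothetical name. The list it returns for processor k corresponds to the Fortran section k:d:p.

```python
def cyclic_hand(d, p, k):
    """Elements (1-based) held by processor k when d elements are
    dealt out CYCLIC over p processors."""
    return list(range(k, d + 1, p))


for k in range(1, 5):
    hand = cyclic_hand(52, 4, k)
    print(k, hand[0], hand[-1], len(hand))
```

Each of the four processors receives 13 cards, and the first processor's hand runs from element 1 to element 49 in steps of 4.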

Distributions are specified independently for each dimension of a multidimensional array:

      INTEGER CHESS_BOARD(8,8), GO_BOARD(19,19)
!HPF$ DISTRIBUTE CHESS_BOARD(BLOCK, BLOCK)
!HPF$ DISTRIBUTE GO_BOARD(CYCLIC,*)

The CHESS_BOARD array will be carved up into contiguous rectangular patches, which will be distributed onto a two-dimensional arrangement of abstract processors. The GO_BOARD array will have its rows distributed cyclically over a one-dimensional arrangement of abstract processors. (The "*" specifies that GO_BOARD is not to be distributed along its second axis; thus an entire row is to be distributed as one object. This is sometimes called "on-processor" distribution.)
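Because each dimension is distributed independently, the owner of an element can be computed one axis at a time. The sketch below is illustrative Python, not HPF. The text does not state the processor arrangement shapes, so a 2 x 2 arrangement for CHESS_BOARD and 4 processors for GO_BOARD are assumptions made only for this example, as are all function names.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def owner_block(j, d, p):
    """Processor index along one axis for index j under plain BLOCK."""
    return cd(j, cd(d, p))


def owner_cyclic(j, p):
    """Processor index along one axis for index j under plain CYCLIC."""
    return 1 + (j - 1) % p


def chess_owner(i, j, shape=(2, 2)):       # assumed 2 x 2 arrangement
    """Owner coordinates of CHESS_BOARD(i, j) under (BLOCK, BLOCK)."""
    return (owner_block(i, 8, shape[0]), owner_block(j, 8, shape[1]))


def go_owner(i, j, p=4):                   # assumed 4 processors
    """Owner of GO_BOARD(i, j) under (CYCLIC, *).
    The second axis is "*", so j has no effect on the owner."""
    return (owner_cyclic(i, p),)


print(chess_owner(1, 1), chess_owner(5, 4), chess_owner(8, 8))
print(go_owner(1, 7), go_owner(1, 19), go_owner(6, 1))
```

Under these assumptions every element of a given GO_BOARD row lands on the same processor, which is what the "*" in the second position expresses.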

The DISTRIBUTE directive may appear only in the specification-part of a scoping unit and can contain only a specification-expr as the argument to a BLOCK or CYCLIC option.

The syntax of the DISTRIBUTE directive is:

H305  distribute-directive  is  DISTRIBUTE distributee dist-directive-stuff
H306  dist-directive-stuff  is  dist-format-clause [ dist-onto-clause ]
H307  dist-attribute-stuff  is  dist-directive-stuff
                            or  dist-onto-clause
H308  distributee           is  object-name
                            or  template-name
H309  dist-format-clause    is  ( dist-format-list )
                            or  * ( dist-format-list )
                            or  *
H310  dist-format           is  BLOCK [ ( scalar-int-expr ) ]
                            or  CYCLIC [ ( scalar-int-expr ) ]
                            or  *
H311  dist-onto-clause      is  ONTO dist-target
H312  dist-target           is  processors-name
                            or  * processors-name
                            or  *

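The full syntax is given here for completeness. However, some of the forms are discussed only in Section 4. These "interprocedural" forms are the last two alternatives of rule H309 and the last two alternatives of rule H312, that is, the forms that begin with "*".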

Constraint:
An object-name mentioned as a distributee must be a simple name and not a subobject designator or a component-name.

Constraint:
An object-name mentioned as a distributee may not appear as an alignee.

Constraint:
An object-name mentioned as a distributee may not have the POINTER attribute.

Constraint:
An object-name mentioned as a distributee may not have the TARGET attribute.

Constraint:
If the distributee is scalar, the dist-format-list (and its surrounding parentheses) must not appear. In this case, the statement form of the directive is allowed only if a dist-format-clause of "*" is present.

Constraint:
If a dist-format-list is specified, its length must equal the rank of each distributee to which it applies.

Constraint:
If both a dist-format-list and a dist-target appear, the number of elements of the dist-format-list that are not "*" must equal the rank of the specified processor arrangement.

Constraint:
If a dist-target appears but not a dist-format-list, the rank of each distributee must equal the rank of the specified processor arrangement.

Constraint:
If either the dist-format-clause or the dist-target in a DISTRIBUTE directive begins with "*" then every distributee must be a dummy argument.

Constraint:
Any scalar-int-expr appearing in a dist-format of a DISTRIBUTE directive must be a specification-expr.

Advice to users. Some of the above constraints are relaxed under the approved extensions (see Section 8): mapping of derived type components (relaxes constraint 1), and mapping of pointers and targets (relaxes constraints 3, 4, and 9). (End of advice to users.)
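The three rank-related constraints above lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of HPF and not how any particular compiler performs the check. The function check_ranks and its parameters are hypothetical; a dist-format-list is modeled as a list of strings and an absent list or target as None.

```python
def check_ranks(dist_formats, distributee_rank, processors_rank=None):
    """Return a list of violated rank constraints (empty if none).

    dist_formats     -- list such as ["BLOCK", "*", "BLOCK"], or None if omitted
    distributee_rank -- rank of the distributee
    processors_rank  -- rank of the ONTO target, or None if there is no ONTO clause
    """
    problems = []
    if dist_formats is not None and len(dist_formats) != distributee_rank:
        problems.append("dist-format-list length must equal the distributee rank")
    if processors_rank is not None:
        if dist_formats is not None:
            # Only the non-"*" entries correspond to processor dimensions.
            distributed = sum(1 for f in dist_formats if f != "*")
            if distributed != processors_rank:
                problems.append("number of non-* formats must equal the processor arrangement rank")
        elif distributee_rank != processors_rank:
            problems.append("distributee rank must equal the processor arrangement rank")
    return problems


print(check_ranks(["BLOCK", "*", "BLOCK"], 3, 2))   # []
print(check_ranks(["BLOCK", "*", "BLOCK"], 3, 3))
```

The first call models a rank-3 array distributed (BLOCK,*,BLOCK) onto a two-dimensional arrangement and reports no violations; the second targets a three-dimensional arrangement and reports one.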

Note that the possibility of a DISTRIBUTE directive of the form

!HPF$ DISTRIBUTE dist-attribute-stuff :: distributee-list

is covered by syntax rule H301 for a combined-directive.

Examples:

!HPF$ DISTRIBUTE D1(BLOCK)
!HPF$ DISTRIBUTE (BLOCK,*,BLOCK) ONTO SQUARE :: D2,D3,D4

The meanings of the alternatives for dist-format are given below.

Define the ceiling division function CD(J,K) = (J+K-1)/K (using Fortran integer arithmetic with truncation toward zero).

Define the ceiling remainder function CR(J,K) = J-K*CD(J,K).
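The two helper functions can be written out directly. The sketch below uses Python rather than Fortran purely for illustration and assumes positive arguments, for which Python's floor division gives the same result as Fortran's truncating integer division. The lowercase names cd and cr are chosen for this example.

```python
def cd(j, k):
    """Ceiling division CD(J,K) = (J+K-1)/K for positive integers."""
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder CR(J,K) = J - K*CD(J,K).
    For positive arguments the result lies between -(k-1) and 0."""
    return j - k * cd(j, k)


print(cd(100, 16), cd(10000, 256))    # 7 40
print(cr(1, 7), cr(7, 7), cr(8, 7))   # -6 0 -6
```

Note that CR is never positive: it measures how far J falls short of the next multiple of K.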

The dimensions of a processor arrangement appearing as a dist-target are said to correspond in left-to-right order with those dimensions of a distributee for which the corresponding dist-format is not *. In the example above, processor arrangement SQUARE must be two-dimensional; its first dimension corresponds to the first dimensions of D2, D3, and D4 and its second dimension corresponds to the third dimensions of D2, D3, and D4.

Let d be the size of a distributee in a certain dimension and let p be the size of the processor arrangement in the corresponding dimension. For simplicity, assume all dimensions have a lower bound of 1. Then BLOCK(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is CD(j,m) (note that m × p ≥ d must be true), and is position number m+CR(j,m) among positions mapped to that abstract processor. The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).

The block size m must be a positive integer.

BLOCK by definition means the same as BLOCK(CD(d,p)).
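The BLOCK(m) rule can be expressed as a small function that returns both the owning processor and the position within that processor. This is a sketch in Python under the stated definitions, not an HPF implementation; block_map and block_default are hypothetical names, and the conformance test assumes the condition is m*p >= d.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def cr(j, k):
    """Ceiling remainder J - K*CD(J,K)."""
    return j - k * cd(j, k)


def block_map(j, m, d, p):
    """Map index j (1..d) under BLOCK(m) onto p processors.
    Returns (processor index, position within that processor)."""
    if m < 1:
        raise ValueError("block size m must be a positive integer")
    if m * p < d:
        raise ValueError("BLOCK(m) requires m*p >= d")
    return cd(j, m), m + cr(j, m)


def block_default(j, d, p):
    """Plain BLOCK is BLOCK(CD(d,p))."""
    return block_map(j, cd(d, p), d, p)


print(block_map(8, 7, 100, 16))       # (2, 1)
print(block_default(100, 100, 16))    # (15, 2)
print(block_map(10000, 256, 10000, 40))
```

For example, element 8 of a 100-element array distributed BLOCK(7) is the first element on processor 2, and element 100 under plain BLOCK on 16 processors is the second element on processor 15.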

CYCLIC(m) means that a distributee position whose index along that dimension is j is mapped to an abstract processor whose index along the corresponding dimension of the processor arrangement is 1+MODULO(CD(j,m)-1,p). The first distributee position in abstract processor k along that axis is position number 1+m*(k-1).

The block size m must be a positive integer.

CYCLIC by definition means the same as CYCLIC(1).
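The owner computation for CYCLIC(m) can be sketched as follows: an index is first assigned to a block of size m, and blocks are then dealt round-robin over the p processors. The Python below is illustrative only; cyclic_owner is a hypothetical name, and Python's % operator matches Fortran's MODULO here because the left operand is never negative.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def cyclic_owner(j, m, p):
    """Processor index (1..p) owning index j under CYCLIC(m):
    block number CD(j,m), dealt round-robin over p processors."""
    if m < 1:
        raise ValueError("block size m must be a positive integer")
    return 1 + (cd(j, m) - 1) % p


# CYCLIC(3) over 16 processors: elements 1..3 and 49..51 share processor 1.
print([cyclic_owner(j, 3, 16) for j in (1, 3, 4, 48, 49, 100)])

# Plain CYCLIC is CYCLIC(1).
print([cyclic_owner(j, 1, 4) for j in (1, 2, 3, 4, 5, 52)])
```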

CYCLIC(m) and BLOCK(m) imply the same distribution when m × p ≥ d, but BLOCK(m) additionally asserts that the distribution will not wrap around in a cyclic manner, which a compiler cannot determine at compile time if m is not constant. Note that CYCLIC and BLOCK (without argument expressions) do not imply the same distribution unless p ≥ d, a degenerate case in which the block size is 1 and the distribution does not wrap around.
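The relationship between the two formats can be checked by comparing owners element by element. The sketch below is illustrative Python with hypothetical function names; it compares the raw block number CD(j,m) with the CYCLIC(m) owner, so it reports False exactly when some block number exceeds p and the cyclic form wraps around.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def block_owner(j, m):
    """Owner under BLOCK(m): the block number itself."""
    return cd(j, m)


def cyclic_owner(j, m, p):
    """Owner under CYCLIC(m): the block number wrapped onto 1..p."""
    return 1 + (cd(j, m) - 1) % p


def same_distribution(d, p, m):
    """True if BLOCK(m) and CYCLIC(m) assign every index 1..d to the same owner."""
    return all(block_owner(j, m) == cyclic_owner(j, m, p) for j in range(1, d + 1))


print(same_distribution(100, 16, 7))   # True:  7*16 = 112 covers all 100 elements
print(same_distribution(100, 16, 3))   # False: 3*16 = 48, so CYCLIC(3) wraps
print(same_distribution(16, 16, 1))    # True:  p = d, block size 1, no wrapping
```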

Suppose that we have 16 abstract processors and an array of length 100:

!HPF$ PROCESSORS SEDECIM(16)
      REAL CENTURY(100)

Distributing the array BLOCK (which in this case would mean the same as BLOCK(7)):

!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM

results in this mapping of array elements onto abstract processors:

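   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   8  15  22  29  36  43  50  57  64  71  78  85  92  99
   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100
   3  10  17  24  31  38  45  52  59  66  73  80  87  94
   4  11  18  25  32  39  46  53  60  67  74  81  88  95
   5  12  19  26  33  40  47  54  61  68  75  82  89  96
   6  13  20  27  34  41  48  55  62  69  76  83  90  97
   7  14  21  28  35  42  49  56  63  70  77  84  91  98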

Distributing the array BLOCK(8):

!HPF$ DISTRIBUTE CENTURY(BLOCK(8)) ONTO SEDECIM
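results in this mapping of array elements onto abstract processors:

   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   9  17  25  33  41  49  57  65  73  81  89  97
   2  10  18  26  34  42  50  58  66  74  82  90  98
   3  11  19  27  35  43  51  59  67  75  83  91  99
   4  12  20  28  36  44  52  60  68  76  84  92 100
   5  13  21  29  37  45  53  61  69  77  85  93
   6  14  22  30  38  46  54  62  70  78  86  94
   7  15  23  31  39  47  55  63  71  79  87  95
   8  16  24  32  40  48  56  64  72  80  88  96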

Distributing the array BLOCK(6) is not HPF-conforming because 6 × 16 = 96 < 100.

Distributing the array CYCLIC (which means exactly the same as CYCLIC(1)):

!HPF$ DISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
results in this mapping of array elements onto abstract processors:
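
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
  97  98  99 100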

Distributing the array CYCLIC(3):

!HPF$ DISTRIBUTE CENTURY(CYCLIC(3)) ONTO SEDECIM
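results in this mapping of array elements onto abstract processors:

   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
----------------------------------------------------------------
   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46
   2   5   8  11  14  17  20  23  26  29  32  35  38  41  44  47
   3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48
  49  52  55  58  61  64  67  70  73  76  79  82  85  88  91  94
  50  53  56  59  62  65  68  71  74  77  80  83  86  89  92  95
  51  54  57  60  63  66  69  72  75  78  81  84  87  90  93  96
  97 100
  98
  99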
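Mappings such as these can be regenerated mechanically, which is a convenient way to check a distribution by hand. The following Python sketch is illustrative only and is not part of HPF; the function layout and its parameters are hypothetical. It returns, for each abstract processor, the list of array elements that processor holds.

```python
def cd(j, k):
    """Ceiling division for positive integers."""
    return (j + k - 1) // k


def layout(d, p, m, cyclic):
    """Elements (1..d) held by each of p processors under
    CYCLIC(m) if cyclic is true, otherwise BLOCK(m)."""
    if not cyclic and m * p < d:
        raise ValueError("BLOCK(m) requires m*p >= d")
    columns = [[] for _ in range(p)]
    for j in range(1, d + 1):
        block = cd(j, m)
        k = 1 + (block - 1) % p if cyclic else block
        columns[k - 1].append(j)
    return columns


century = layout(100, 16, 3, cyclic=True)    # CYCLIC(3) onto 16 processors
print(century[0])    # [1, 2, 3, 49, 50, 51, 97, 98, 99]
print(century[1])    # [4, 5, 6, 52, 53, 54, 100]

print(layout(100, 16, cd(100, 16), cyclic=False)[14])   # plain BLOCK, processor 15
```

Each returned list corresponds to one column of a mapping table, read from top to bottom.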

Note that it is perfectly permissible for an array to be distributed so that some processors have no elements. Indeed, an array may be "distributed" so that all elements reside on one processor. For example,

!HPF$ DISTRIBUTE CENTURY(BLOCK(256)) ONTO SEDECIM

results in only one non-empty block on processor 1 (a partially filled one at that, holding only 100 elements), with processors 2 through 16 holding no elements of the array.

The statement form of a DISTRIBUTE directive may be considered an abbreviation for an attributed form that happens to mention only one distributee; for example,

!HPF$ DISTRIBUTE distributee ( dist-format-list ) ONTO dist-target 
is equivalent to
!HPF$ DISTRIBUTE ( dist-format-list ) ONTO dist-target :: distributee

Note that, to prevent syntactic ambiguity, the dist-format-clause must be present in the statement form, so in general the statement form of the directive may not be used to specify the mapping of scalars.

If the dist-format-clause is omitted from the attributed form, then the language processor may make an arbitrary choice of distribution formats for each template or array. So the directive

!HPF$ DISTRIBUTE ONTO P :: D1,D2,D3
means the same as
!HPF$ DISTRIBUTE ONTO P :: D1
!HPF$ DISTRIBUTE ONTO P :: D2
!HPF$ DISTRIBUTE ONTO P :: D3

to which a compiler, perhaps taking into account patterns of use of D1, D2, and D3 within the code, might choose to supply three distinct distributions such as, for example,

!HPF$ DISTRIBUTE D1(BLOCK, BLOCK) ONTO P
!HPF$ DISTRIBUTE D2(CYCLIC, BLOCK) ONTO P
!HPF$ DISTRIBUTE D3(BLOCK(43), CYCLIC) ONTO P
Then again, the compiler might happen to choose the same distribution for all three arrays.

In either the statement form or the attributed form, if the ONTO clause is present, it specifies the processor arrangement that is the target of the distribution. If the ONTO clause is omitted, then an implementation-dependent processor arrangement is chosen arbitrarily for each distributee. So, for example,

      REAL, DIMENSION(1000) :: ARTHUR, ARNOLD, LINUS, LUCY
!HPF$ PROCESSORS EXCALIBUR(32)
!HPF$ DISTRIBUTE (BLOCK) ONTO EXCALIBUR :: ARTHUR, ARNOLD
!HPF$ DISTRIBUTE (BLOCK) :: LINUS, LUCY

causes the arrays ARTHUR and ARNOLD to have the same mapping, so that corresponding elements reside in the same abstract processor, because they are the same size and distributed in the same way (BLOCK) onto the same processor arrangement (EXCALIBUR). However, LUCY and LINUS do not necessarily have the same mapping because they might, depending on the implementation, be distributed onto differently chosen processor arrangements; so corresponding elements of LUCY and LINUS might not reside on the same abstract processor. (The ALIGN directive provides a way to ensure that two arrays have the same mapping without having to specify an explicit processor arrangement.)

In a given environment, for some distributions, there may be no appropriate processor arrangement.

