Semantics and Implementation of Reduction

HPF specifies an allowed parallel implementation of an INDEPENDENT DO loop with reduction statements, thereby specifying the semantics of such a loop.

Just as the result of the Fortran intrinsic function SUM is defined to be an implementation-dependent approximation to the sum of the elements of its argument array, the value of a reduction variable on exit from its INDEPENDENT DO loop is likewise not completely specified by HPF. One possible value is that which would have been computed by sequential execution of the loop, but other implementation-dependent approximations to this value may be produced. Any such implementation-dependent value is, however, an approximation to the value produced by sequential execution of the loop. If rounding error, underflow, and overflow do not occur, it will be identical to that value.
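The following sketch illustrates why the order of combination matters once rounding occurs. It is not taken from HPF; the program name and the data values are chosen purely for illustration, and it assumes IEEE single-precision arithmetic for default REAL and a compiler that honors parentheses.

```fortran
PROGRAM SUM_ORDER
  IMPLICIT NONE
  REAL :: A(3), LEFT_FIRST, RIGHT_FIRST

  ! Illustrative data: two large terms that cancel, plus one small term.
  A = (/ 1.0E20, -1.0E20, 1.0 /)

  ! The same three terms, combined in two different orders.
  LEFT_FIRST  = (A(1) + A(2)) + A(3)   ! cancellation first, then add 1.0
  RIGHT_FIRST = A(1) + (A(2) + A(3))   ! 1.0 is absorbed by -1.0E20 first

  PRINT *, 'left first  =', LEFT_FIRST
  PRINT *, 'right first =', RIGHT_FIRST

  ! Under the stated assumptions the two orders disagree.
  IF (LEFT_FIRST /= 1.0) ERROR STOP 'unexpected left-first sum'
  IF (RIGHT_FIRST /= 0.0) ERROR STOP 'unexpected right-first sum'
END PROGRAM SUM_ORDER
```

Both results are approximations to the same mathematical sum; a parallel reduction may produce either one, or another grouping altogether.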

Advice to users. If overflow, underflow, or rounding occurs, this is one of the few places where an HPF directive in a conforming program may cause that program to produce different output. However, the same problems occur in other systems that attempt to parallelize these operations, for the same reasons. (End of advice to users.)

Since no reference to a protected reduction variable can occur except in a reduction statement, it is not necessary to define the values that these variables may have while protected.

Advice to users. The following ``advice to implementors'' is useful for understanding the behavior of an INDEPENDENT loop with reduction statements. (End of advice to users.)

Advice to implementors. In the discussion in this section, the term ``processor'' means a single physical processor or a group of physical processors that together sequentially execute some or all of the iterations of an independent loop.

We describe a simple implementation mechanism that applies to commutative reduction operations. On entry to an independent loop, every executing processor allocates a private accumulator variable associated with each variable in the reduction clause on the INDEPENDENT directive, and initializes it to the identity element for the corresponding intrinsic reduction operator. The private accumulator variable has the same shape, type, and kind type parameter as the reduction variable.

The identity elements for the intrinsic operators are defined in Table 5.1.

Operator	Identity Element
+	0
-	0
*	1
/	1
`.AND.`	`.TRUE.`
`.OR.`	`.FALSE.`
`.EQV.`	`.TRUE.`
`.NEQV.`	`.FALSE.`

Table 5.1: Identity elements for intrinsic reduction operators.

Function	Identity Element
`IAND(I,J)`	`NOT(0)` (all one-bits)
`IOR(I,J)`	`0`
`IEOR(I,J)`	`0`
`MIN(X,Y)`	the positive number of largest absolute value that has the same type and kind type parameter as the reduction variable
`MAX(X,Y)`	the negative number of largest absolute value that has the same type and kind type parameter as the reduction variable

Table 5.2: Identity elements for intrinsic reduction functions.

The intrinsic functions that may be used as reduction functions are listed, together with their identity elements, in Table 5.2.
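As a quick sanity check on the two tables, the sketch below verifies that combining a value with each identity element returns the value unchanged. The program name and the sample values V, K, and B are hypothetical; HUGE is used here as one way to obtain the number of largest absolute value for the type and kind of V.

```fortran
PROGRAM IDENTITY_ELEMENTS
  IMPLICIT NONE
  REAL    :: V
  INTEGER :: K
  LOGICAL :: B

  ! Arbitrary sample values, chosen only for illustration.
  V = 3.5
  K = 42
  B = .FALSE.

  ! Intrinsic operators (Table 5.1)
  IF (V + 0.0 /= V) ERROR STOP 'identity for + failed'
  IF (V - 0.0 /= V) ERROR STOP 'identity for - failed'
  IF (V * 1.0 /= V) ERROR STOP 'identity for * failed'
  IF (V / 1.0 /= V) ERROR STOP 'identity for / failed'
  IF ((B .AND. .TRUE.)   .NEQV. B) ERROR STOP 'identity for .AND. failed'
  IF ((B .OR. .FALSE.)   .NEQV. B) ERROR STOP 'identity for .OR. failed'
  IF ((B .EQV. .TRUE.)   .NEQV. B) ERROR STOP 'identity for .EQV. failed'
  IF ((B .NEQV. .FALSE.) .NEQV. B) ERROR STOP 'identity for .NEQV. failed'

  ! Intrinsic functions (Table 5.2)
  IF (IAND(K, NOT(0)) /= K) ERROR STOP 'identity for IAND failed'
  IF (IOR(K, 0)  /= K) ERROR STOP 'identity for IOR failed'
  IF (IEOR(K, 0) /= K) ERROR STOP 'identity for IEOR failed'
  IF (MIN(V,  HUGE(V)) /= V) ERROR STOP 'identity for MIN failed'
  IF (MAX(V, -HUGE(V)) /= V) ERROR STOP 'identity for MAX failed'

  PRINT *, 'all identity checks passed'
END PROGRAM IDENTITY_ELEMENTS
```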

Each processor performs a subset of the loop iterations; when it encounters a reduction statement, it updates its own accumulator variable. A processor is free to perform its loop iterations in any order; furthermore, it may start an iteration, suspend work on it, do some or all of the work of other iterations, and resume work on the suspended iteration. However, any update of a private accumulator variable occurs through the execution of a reduction statement, and reduction statements are executed atomically.

The final value of the reduction variable is computed by combining the private accumulator variables with the value of the reduction variable on entry to the loop, using the reduction operator. The ordering of this reduction is language-processor dependent, just as it is for the intrinsic reduction functions (SUM, etc.).

As an example, consider:

      REAL Z

      Z = 5.
!HPF$ INDEPENDENT, REDUCTION(Z)
      DO I = 1, 10
        Z = Z + I
      END DO

The final value of Z will be 5 + (1+2+3+4+5+6+7+8+9+10) = 60; the order in which the additions occur is not specified by HPF.
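The mechanism described above can be imitated in ordinary sequential Fortran. The sketch below is only an illustration, not what any particular HPF compiler generates: the processor count NPROC, the array ZLOCAL of private accumulators, and the cyclic assignment of iterations to processors are all assumptions made for the example.

```fortran
PROGRAM PRIVATE_ACCUMULATORS
  IMPLICIT NONE
  INTEGER, PARAMETER :: NPROC = 4   ! hypothetical number of processors
  REAL    :: Z, ZLOCAL(NPROC)       ! ZLOCAL(P) is the private accumulator of processor P
  INTEGER :: I, P

  Z = 5.

  ! On entry to the loop: initialize every private accumulator
  ! to the identity element for +.
  ZLOCAL = 0.

  ! Each iteration updates the accumulator of the processor that runs it.
  ! Iterations are dealt out cyclically here; any other assignment is equally valid.
  DO I = 1, 10
    P = MOD(I - 1, NPROC) + 1
    ZLOCAL(P) = ZLOCAL(P) + I
  END DO

  ! On exit: combine the private accumulators with the value Z had on entry.
  DO P = 1, NPROC
    Z = Z + ZLOCAL(P)
  END DO

  PRINT *, 'Z =', Z
  IF (Z /= 60.) ERROR STOP 'unexpected value of Z'
END PROGRAM PRIVATE_ACCUMULATORS
```

Because every partial sum here is a small integer that is exactly representable in default REAL, the result is 60 for any choice of NPROC and any assignment of iterations.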

For a second example, here is a SUM_SCATTER done as an independent loop:

!HPF$ INDEPENDENT, REDUCTION(X)
DO I = 1, N
    X(INDEX(I)) = X(INDEX(I)) + F(I)
END DO

The implementation will most likely make a private copy on every processor of an accumulator array XLOCAL of the same type and shape as X, and initialize it to zero. Each iteration will add the value of F(I) to its own processor's XLOCAL(INDEX(I)). To create the final result, the implementation must combine all the private accumulator arrays with the initial value of X. The combining operator is the same as the reduction operator, namely addition, so that the result is the sum of the initial value of X and the accumulator arrays. The implementation has the option of using a sparse data structure to store only the updated elements of the local accumulator.
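A sequential sketch of this scheme for a sum scatter follows. The sizes N, M, and NPROC, the data values, and the cyclic assignment of iterations are hypothetical. IDX plays the role of INDEX in the text (renamed only to avoid shadowing the INTRINSIC function of that name), and the second dimension of XLOCAL stands in for the per-processor private copies.

```fortran
PROGRAM SUM_SCATTER_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 6, M = 3, NPROC = 2   ! hypothetical sizes
  INTEGER :: IDX(N)
  REAL    :: F(N), X(M), XLOCAL(M, NPROC)
  INTEGER :: I, P

  ! Illustrative data only.
  IDX = (/ 1, 3, 1, 2, 3, 1 /)
  F   = (/ 1., 2., 3., 4., 5., 6. /)
  X   = 10.

  ! Private accumulator arrays start at the identity element for +.
  XLOCAL = 0.

  ! Each iteration adds F(I) into the accumulator of the processor that runs it.
  DO I = 1, N
    P = MOD(I - 1, NPROC) + 1
    XLOCAL(IDX(I), P) = XLOCAL(IDX(I), P) + F(I)
  END DO

  ! Combine all private accumulator arrays with the initial value of X.
  DO P = 1, NPROC
    X = X + XLOCAL(:, P)
  END DO

  PRINT *, 'X =', X
  ! Expected: X(1) = 10+1+3+6, X(2) = 10+4, X(3) = 10+2+5
  IF (ANY(X /= (/ 20., 14., 17. /))) ERROR STOP 'unexpected value of X'
END PROGRAM SUM_SCATTER_SKETCH
```

A dense XLOCAL is used for simplicity; a sparse structure holding only the touched elements would serve the same purpose when M is large and few elements are updated.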

In an MPI-based implementation, the MPI_REDUCE function could be used for this task. (End of advice to implementors.)
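As a rough illustration of the MPI approach, the sketch below hand-codes the scalar example (Z = 5 plus the integers 1 through 10) with one private accumulator per MPI process. It is an assumption about how such code might look, not the output of any HPF compiler: the program name, the variable names, the cyclic distribution of iterations, and the choice of rank 0 as the root are all illustrative. It requires an MPI library that provides the Fortran `mpi` module.

```fortran
PROGRAM MPI_REDUCTION_SKETCH
  USE MPI
  IMPLICIT NONE
  INTEGER :: IERR, RANK, NPROC, I
  REAL    :: Z, ZLOCAL, ZSUM

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)

  Z = 5.
  ZLOCAL = 0.                      ! identity element for +

  ! Each process executes a cyclic subset of the iterations 1..10.
  DO I = RANK + 1, 10, NPROC
    ZLOCAL = ZLOCAL + I
  END DO

  ! Sum the private accumulators; the result is defined on rank 0 only.
  CALL MPI_REDUCE(ZLOCAL, ZSUM, 1, MPI_REAL, MPI_SUM, 0, &
                  MPI_COMM_WORLD, IERR)

  ! Combine with the value of Z on entry to the loop.
  IF (RANK == 0) THEN
    Z = Z + ZSUM
    PRINT *, 'Z =', Z
  END IF

  CALL MPI_FINALIZE(IERR)
END PROGRAM MPI_REDUCTION_SKETCH
```

If every process needs the final value, MPI_ALLREDUCE (which takes the same arguments without the root) is the usual substitute for MPI_REDUCE.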
