HPF specifies an allowed parallel implementation of an
`INDEPENDENT DO` loop with reduction statements, thereby
specifying the semantics of such a loop.

Just as the result of the Fortran intrinsic function `SUM` is
defined to be a implementation-dependent approximation to the sum of
the elements of its argument array, the value of a reduction variable
on exit from its `INDEPENDENT DO` loop is likewise not
completely specified by HPF. One possible value is that which would
have been computed by sequential execution of the loop, but other
implementation-dependent approximations to this value may be produced.
Any such implementation-dependent value is, however, an approximation
to the value produced by sequential execution of the loop. If
rounding error, underflow, and overflow do not occur, it will be
identical to that value.

*Advice to users.*If overflow, underflow, or rounding occur, this is one of the few places where an HPF directive in a conforming program may cause that program to produce different output. However, the same problems occur in other systems that attempt to parallelize these operations, for the same reasons. (*End of advice to users.*)

Since no reference to a protected reduction variable can occur except in a reduction statement, it is not necessary to define the values that these variables may have while protected.

*Advice to users.*The following ``advice to implementors'' is useful for understanding the behavior of an`INDEPENDENT`loop with reduction statements. (*End of advice to users.*)

*Advice to implementors.*In the discussion in this section, the term ``processor'' means a single physical processor or a group of physical processors that together sequentially execute some or all of the iterations of an independent loop.We describe a simple implementation mechanism that applies to commutative reduction operations. On entry to an independent loop, every executing processor allocates a private accumulator variable associated with each variable in the reduction clause on the

`INDEPENDENT`directive, and initializes it to the identity element for the corresponding intrinsic reduction operator. The private accumulator variable has the same shape, type, and kind type parameter as the reduction variable.The identity elements for the intrinsic operators are defined in Table 5.1.

Operator Identity Element + 0 - 0 * 1 / 1 `.AND.``.TRUE.``.OR.``.FALSE.``.EQV.``.TRUE.``.NEQV.``.FALSE.`**Table 5.1:**Identity elements for intrinsic reduction operators.Function Identity Element `IAND(I,J)``NOT(0)`(all one-bits)`IOR(I,J)``0``IEOR(I,J)``0``MIN(X,Y)`the positive number of largest absolute value that has the same type and kind type parameter as the reduction variable `MAX(X,Y)`the negative number of largest absolute value that has the same type and kind type parameter as the reduction variable **Table 5.2:**Identity elements for intrinsic reduction functions.

The intrinsic functions that may be used as reduction functions are listed, together with their identity elements, in Table 5.2.

Each processor performs a subset of the loop iterations; when it encounters a reduction statement, it updates its own accumulator variable. A processor is free to perform its loop iterations in any order; furthermore, it may start an iteration, suspend work on it, do some or all of the work of other iterations, and resume work on the suspended iteration. However, any update of a private accumulator variable occurs through the execution of a reduction statement, and reduction statements

*are*executed atomically.The final value of the reduction variable is computed by combining the private accumulator variables with the value of the reduction variable on entry to the loop, using the reduction operator. The ordering of this reduction is language-processor dependent, just as it is for the intrinsic reduction functions (

`SUM`, etc.).As an example, consider:

REAL Z Z = 5. !HPF$ INDEPENDENT, REDUCTION(X) DO I = 1, 10 Z = Z + I END DO

The final value of

`Z`will be 5 + (1+2+3+4+5+6+7+8+9+10) = 60; the order in which the additions occur is not specified by HPF.For a second example, here is a

`SUM_SCATTER`done as an independent loop:!HPF$ INDEPENDENT, REDUCTION(X) DO I = 1, N X(INDEX(I)) = X(INDEX(I)) - F(I) END DO

The implementation will most likely make a private copy on every processor of an accumulator array

`XLOCAL`of the same type and shape as`X`, and initialize it to zero. Each iteration will subtract the value of`F(I)`from its own`XLOCAL(INDEX(I))`. To create the final result, the implementation must combine all the private accumulator arrays with the initial value of`X`. The combining operator is the same as the reduction operator, namely addition, so that the result is the sum of the initial value of`X`and the accumulator arrays. The implementation has the option of using a sparse data structure to store only the updated elements of the local accumulator.In an MPI based implementation, the

`MPI_REDUCE`function could be used for this task. (*End of advice to implementors.*)