The mapping of a global HPF array to the physical processors places one or more blocks, which are groups of elements with consecutive indices, on each processor. The number of blocks mapped to a processor is the product of the number of blocks of consecutive indices in each dimension that are mapped to it. For example, a rank-one array X with a CYCLIC(4) distribution will have blocks containing four elements, except for a possible last block, which may have fewer. On the other hand, if X is first aligned to a template or an array having a CYCLIC(4) distribution, and a non-unit stride is employed (as in !HPF$ ALIGN X(I) WITH T(3*I)), then its blocks may have fewer than four elements. In this case, when the align stride is three and the template has a block-cyclic distribution with four template elements per block, the blocks of X have either one or two elements each. If the align stride were five, then all blocks of X would have exactly one element, because template blocks to which no array element is aligned are not counted in the reckoning of numbers of blocks.
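The block sizes quoted above can be checked with a few lines of ordinary Fortran. The following is a minimal sketch, not HPF code: the program name ALIGN_STRIDE_BLOCKS, the helper BLOCK_SIZES, and the extent N = 8 are assumptions chosen only for illustration. It places X(I) at template element T(STRIDE*I), groups template elements into blocks of four as CYCLIC(4) would, and prints the number of elements of X in each nonempty template block.

```fortran
! Sketch only: plain Fortran, no HPF directives. N and the names
! ALIGN_STRIDE_BLOCKS and BLOCK_SIZES are illustrative assumptions.
PROGRAM ALIGN_STRIDE_BLOCKS
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8   ! assumed number of elements in X
  INTEGER, PARAMETER :: K = 4   ! template block size, as in CYCLIC(4)
  CALL BLOCK_SIZES(3)
  CALL BLOCK_SIZES(5)
CONTAINS
  SUBROUTINE BLOCK_SIZES(STRIDE)
    INTEGER, INTENT(IN) :: STRIDE
    INTEGER :: NELEM(N*STRIDE/K + 1)   ! elements of X per template block
    INTEGER :: I, B
    NELEM = 0
    DO I = 1, N
      B = (STRIDE*I - 1)/K + 1         ! template block holding T(STRIDE*I)
      NELEM(B) = NELEM(B) + 1
    END DO
    ! Template blocks with no aligned element are not counted as blocks of X
    PRINT *, 'stride', STRIDE, ':', PACK(NELEM, NELEM > 0)
  END SUBROUTINE BLOCK_SIZES
END PROGRAM ALIGN_STRIDE_BLOCKS
```

With these assumed values, stride three yields block sizes 1, 1, 2, 1, 1, 2 and stride five yields eight blocks of one element each, in agreement with the text. The sketch counts per template block only; which processor owns each block depends on the number of processors and is not modeled here.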
The portion of a global array argument associated with a dummy argument in an HPF_LOCAL subprogram may be accessed in a block-by-block fashion. Three of the local library routines, LOCAL_BLKCNT, LOCAL_LINDEX, and LOCAL_UINDEX, allow easy access to the local storage of a particular block. Their use for this purpose is illustrated by the following example, in which the local data are initialized one block at a time:
      EXTRINSIC(HPF_LOCAL) SUBROUTINE NEWKI_DONT_HEBLOCK(X)
        USE HPF_LOCAL_LIBRARY          ! provides LOCAL_BLKCNT, LOCAL_LINDEX, LOCAL_UINDEX
        REAL X(:,:,:)
        INTEGER BL(3)
        INTEGER, ALLOCATABLE :: LIND1(:), LIND2(:), LIND3(:)
        INTEGER, ALLOCATABLE :: UIND1(:), UIND2(:), UIND3(:)

        ! Number of local blocks in each dimension
        BL = LOCAL_BLKCNT(X)

        ! One lower and one upper local index per block, per dimension
        ALLOCATE (LIND1(BL(1)))
        ALLOCATE (LIND2(BL(2)))
        ALLOCATE (LIND3(BL(3)))
        ALLOCATE (UIND1(BL(1)))
        ALLOCATE (UIND2(BL(2)))
        ALLOCATE (UIND3(BL(3)))

        LIND1 = LOCAL_LINDEX(X, DIM = 1)
        UIND1 = LOCAL_UINDEX(X, DIM = 1)
        LIND2 = LOCAL_LINDEX(X, DIM = 2)
        UIND2 = LOCAL_UINDEX(X, DIM = 2)
        LIND3 = LOCAL_LINDEX(X, DIM = 3)
        UIND3 = LOCAL_UINDEX(X, DIM = 3)

        ! Visit every local block; give each element the tag of its block
        DO IB1 = 1, BL(1)
          DO IB2 = 1, BL(2)
            DO IB3 = 1, BL(3)
              FORALL (I1 = LIND1(IB1) : UIND1(IB1), &
                      I2 = LIND2(IB2) : UIND2(IB2), &
                      I3 = LIND3(IB3) : UIND3(IB3) ) &
                X(I1, I2, I3) = IB1 + 10*IB2 + 100*IB3
            ENDDO
          ENDDO
        ENDDO
      END SUBROUTINE NEWKI_DONT_HEBLOCK
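To make the per-block index arrays more concrete, the following plain Fortran sketch mimics, for a single dimension, the kind of values the block count and the lower and upper local indices would take. It is an illustration under stated assumptions, not the library's implementation: it assumes an unaligned CYCLIC(4) distribution, that the blocks owned by a processor are stored contiguously in local index order, and hypothetical values N = 22 and P = 2 for the global extent and the number of processors. The program name LOCAL_BLOCK_BOUNDS is likewise invented for this sketch.

```fortran
! Sketch only: plain Fortran that mimics, for one dimension, the block
! count and the local lower/upper index of each block. Assumes an
! unaligned CYCLIC(K) distribution and contiguous local storage.
PROGRAM LOCAL_BLOCK_BOUNDS
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 22   ! assumed global extent
  INTEGER, PARAMETER :: K = 4    ! block size, as in CYCLIC(4)
  INTEGER, PARAMETER :: P = 2    ! assumed number of processors
  INTEGER :: ME, NBLK, IB, GLO, GHI, NEXT
  DO ME = 1, P
    NBLK = 0
    NEXT = 1                       ! next free local index on processor ME
    DO IB = ME, (N + K - 1)/K, P   ! global blocks dealt to processor ME
      GLO = (IB - 1)*K + 1         ! first global index of the block
      GHI = MIN(IB*K, N)           ! last global index (last block may be short)
      NBLK = NBLK + 1
      PRINT *, 'proc', ME, 'block', NBLK, 'local', NEXT, ':', NEXT + GHI - GLO
      NEXT = NEXT + GHI - GLO + 1
    END DO
    PRINT *, 'proc', ME, 'block count', NBLK
  END DO
END PROGRAM LOCAL_BLOCK_BOUNDS
```

Under these assumptions each processor holds three blocks. The local index ranges are 1:4, 5:8, and 9:12 on the first processor and 1:4, 5:8, and 9:10 on the second, whose last block holds only the two remaining global elements.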