The mapping of a global HPF array to the physical processors places one or more blocks, which are groups of elements with consecutive indices, on each processor. The number of blocks mapped to a processor is the product of the number of blocks of consecutive indices in each dimension that are mapped to it. For example, a rank-one array X with a CYCLIC(4) distribution will have blocks containing four elements, except for a possible last block having one, two, or three elements. On the other hand, if X is first aligned to a template or an array having a CYCLIC(4) distribution, and a non-unit stride is employed (as in !HPF$ ALIGN X(I) WITH T(3*I)), then its blocks may have fewer than four elements. In this case, when the align stride is three and the template has a block-cyclic distribution with four template elements per block, the blocks of X have either one or two elements each. If the align stride were five, then all blocks of X would have exactly one element, because template blocks to which no array element is aligned are not counted in the reckoning of numbers of blocks.
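The block sizes quoted above can be checked with a small serial calculation. The following is a minimal sketch in standard Fortran, not HPF; it does not call any HPF library routine. The module name ALIGN_BLOCK_SIZES and the function BLOCK_SIZES are hypothetical names introduced only for this illustration. It assumes that X(1:N) is aligned with T(STRIDE*I), that template indices start at one, and that T is distributed CYCLIC(K).

```fortran
MODULE ALIGN_BLOCK_SIZES
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper (not part of HPF): returns the sizes of the
  ! non-empty blocks of X(1:N) when X(I) is aligned with T(STRIDE*I)
  ! and T is distributed CYCLIC(K), with template indices starting at 1.
  FUNCTION BLOCK_SIZES(N, STRIDE, K) RESULT(SIZES)
    INTEGER, INTENT(IN) :: N, STRIDE, K
    INTEGER, ALLOCATABLE :: SIZES(:)
    INTEGER, ALLOCATABLE :: COUNTS(:)
    INTEGER :: I, TB, NBLOCKS

    ! Number of template blocks spanned by the aligned elements of X
    NBLOCKS = (STRIDE*N - 1)/K + 1
    ALLOCATE (COUNTS(NBLOCKS))
    COUNTS = 0

    DO I = 1, N
      ! Template block that holds template element T(STRIDE*I)
      TB = (STRIDE*I - 1)/K + 1
      COUNTS(TB) = COUNTS(TB) + 1
    END DO

    ! Template blocks with no aligned element of X are not counted
    ALLOCATE (SIZES(COUNT(COUNTS > 0)))
    SIZES = PACK(COUNTS, COUNTS > 0)
  END FUNCTION BLOCK_SIZES
END MODULE ALIGN_BLOCK_SIZES
```

Under these assumptions, BLOCK_SIZES(8, 3, 4) yields the sizes 1, 1, 2, 1, 1, 2, and BLOCK_SIZES(8, 5, 4) yields eight blocks of one element each, in agreement with the text. With unit stride, BLOCK_SIZES(10, 1, 4) yields 4, 4, 2, which shows the shorter last block.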
The portion of a global array argument associated with a dummy argument in an HPF_LOCAL subprogram may be accessed in a block-by-block fashion. Three of the local library routines, LOCAL_BLKCNT, LOCAL_LINDEX, and LOCAL_UINDEX, allow easy access to the local storage of a particular block. Their use for this purpose is illustrated by the following example, in which the local data are initialized one block at a time:
      EXTRINSIC(HPF_LOCAL) SUBROUTINE NEWKI_DONT_HEBLOCK(X)
        REAL X(:,:,:)
        INTEGER BL(3)
        INTEGER, ALLOCATABLE :: LIND1(:), LIND2(:), LIND3(:)
        INTEGER, ALLOCATABLE :: UIND1(:), UIND2(:), UIND3(:)

        ! Number of local blocks in each dimension
        BL = LOCAL_BLKCNT(X)

        ALLOCATE (LIND1(BL(1)))
        ALLOCATE (LIND2(BL(2)))
        ALLOCATE (LIND3(BL(3)))
        ALLOCATE (UIND1(BL(1)))
        ALLOCATE (UIND2(BL(2)))
        ALLOCATE (UIND3(BL(3)))

        ! Local lower and upper index of every block, per dimension
        LIND1 = LOCAL_LINDEX(X, DIM = 1)
        UIND1 = LOCAL_UINDEX(X, DIM = 1)
        LIND2 = LOCAL_LINDEX(X, DIM = 2)
        UIND2 = LOCAL_UINDEX(X, DIM = 2)
        LIND3 = LOCAL_LINDEX(X, DIM = 3)
        UIND3 = LOCAL_UINDEX(X, DIM = 3)

        ! Initialize the local data one block at a time
        DO IB1 = 1, BL(1)
          DO IB2 = 1, BL(2)
            DO IB3 = 1, BL(3)
              FORALL (I1 = LIND1(IB1) : UIND1(IB1), &
                      I2 = LIND2(IB2) : UIND2(IB2), &
                      I3 = LIND3(IB3) : UIND3(IB3) ) &
                X(I1, I2, I3) = IB1 + 10*IB2 + 100*IB3
            ENDDO
          ENDDO
        ENDDO
      END SUBROUTINE NEWKI_DONT_HEBLOCK