The mapping of a global HPF array to the physical processors places
one or more *blocks*, which are groups of elements with
consecutive indices, on each processor. The number of blocks mapped
to a processor is the product of the number of blocks of consecutive
indices in each dimension that are mapped to it. For example, a
rank-one array `X` with a `CYCLIC(4)` distribution will
have blocks containing four elements, except for a possible last block
having `1 + MOD(SIZE(X) - 1, 4)` elements. On the other hand,
if `X` is first aligned to a template or an array having a
`CYCLIC(4)` distribution, and a non-unit stride is employed (as
in `!HPF$ ALIGN X(I) WITH T(3*I)`), then its blocks may have
fewer than four elements. In this case, when the align stride is
three and the template has a block-cyclic distribution with four
template elements per block, the blocks of `X` have either one
or two elements each. If the align stride were five, then all blocks
of `X` would have exactly one element, as template blocks to
which no array element is aligned are not counted in the reckoning of
numbers of blocks.
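
The block sizes quoted above can be checked with a small model. The following is a sketch in plain Fortran, not HPF: it uses no directives or library routines, and the program name `BLOCK_SIZES`, the extent `N = 10`, and the helper `REPORT` are hypothetical names chosen for illustration. It assumes that `X` and the template both have lower bound 1, so that element `X(I)` is aligned with template element `S*I`, which lies in template block `(S*I - 1)/4`.

```fortran
! Illustrative model only: plain Fortran, no HPF directives or library calls.
! Assumes X(1:N) is aligned with T(S*I) and T is distributed CYCLIC(K),
! with both lower bounds equal to 1.
PROGRAM BLOCK_SIZES
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 10   ! assumed extent of X (hypothetical)
  INTEGER, PARAMETER :: K = 4    ! template elements per block, as in CYCLIC(4)

  CALL REPORT(1)                 ! unit stride
  CALL REPORT(3)                 ! align stride of three
  CALL REPORT(5)                 ! align stride of five

CONTAINS

  ! Print the sizes of the blocks of X, in index order, for align stride S.
  SUBROUTINE REPORT(S)
    INTEGER, INTENT(IN) :: S
    INTEGER :: I, TB, PREV, CNT

    WRITE (*, '(A,I0,A)', ADVANCE='NO') 'stride ', S, ':'
    PREV = -1
    CNT = 0
    DO I = 1, N
      TB = (S*I - 1) / K         ! template block that holds T(S*I)
      ! A change of template block closes the current block of X.
      ! Template blocks holding no element of X never appear here,
      ! so they are not counted.
      IF (TB /= PREV .AND. CNT > 0) THEN
        WRITE (*, '(1X,I0)', ADVANCE='NO') CNT
        CNT = 0
      END IF
      PREV = TB
      CNT = CNT + 1
    END DO
    WRITE (*, '(1X,I0)') CNT     ! size of the last block
  END SUBROUTINE REPORT

END PROGRAM BLOCK_SIZES
```

Under these assumptions, stride 1 yields blocks of sizes 4, 4, 2; stride 3 yields sizes 1, 1, 2, 1, 1, 2, 1, 1; and stride 5 yields ten blocks of one element each. The model reports block sizes only and does not model which processor holds each block.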

The portion of a global array argument associated with a dummy
argument in an HPF_LOCAL subprogram may be accessed in a
block-by-block fashion. Three of the local library routines,
`LOCAL_BLKCNT`, `LOCAL_LINDEX`, and
`LOCAL_UINDEX`, allow easy access to the local storage of a
particular block. Their use for this purpose is illustrated by the
following example, in which the local data are initialized one block
at a time:

    EXTRINSIC(HPF_LOCAL) SUBROUTINE NEWKI_DONT_HEBLOCK(X)
      USE HPF_LOCAL_LIBRARY   ! local library routines used below
      REAL X(:,:,:)
      INTEGER BL(3)
      INTEGER, ALLOCATABLE :: LIND1(:), LIND2(:), LIND3(:)
      INTEGER, ALLOCATABLE :: UIND1(:), UIND2(:), UIND3(:)

      ! Number of local blocks in each dimension
      BL = LOCAL_BLKCNT(X)

      ALLOCATE (LIND1(BL(1)))
      ALLOCATE (LIND2(BL(2)))
      ALLOCATE (LIND3(BL(3)))
      ALLOCATE (UIND1(BL(1)))
      ALLOCATE (UIND2(BL(2)))
      ALLOCATE (UIND3(BL(3)))

      ! Lowest and highest local index of each block, per dimension
      LIND1 = LOCAL_LINDEX(X, DIM = 1)
      UIND1 = LOCAL_UINDEX(X, DIM = 1)
      LIND2 = LOCAL_LINDEX(X, DIM = 2)
      UIND2 = LOCAL_UINDEX(X, DIM = 2)
      LIND3 = LOCAL_LINDEX(X, DIM = 3)
      UIND3 = LOCAL_UINDEX(X, DIM = 3)

      ! Initialize the local data one block at a time
      DO IB1 = 1, BL(1)
        DO IB2 = 1, BL(2)
          DO IB3 = 1, BL(3)
            FORALL (I1 = LIND1(IB1) : UIND1(IB1), &
                    I2 = LIND2(IB2) : UIND2(IB2), &
                    I3 = LIND3(IB3) : UIND3(IB3) ) &
              X(I1, I2, I3) = IB1 + 10*IB2 + 100*IB3
          ENDDO
        ENDDO
      ENDDO
    END SUBROUTINE NEWKI_DONT_HEBLOCK