As in Section 9.2.3, our aim here is to suggest idioms that may be generally useful to programmers. We begin by expanding on two earlier examples.
RESIDENT is most useful when the compiler cannot detect access patterns on its own. This often happens because of indirection, as in the following examples:
```fortran
      REAL X(N), Y(N)
      INTEGER IX(M), IY(M)
!HPF$ PROCESSORS P(NP)
!HPF$ DISTRIBUTE (BLOCK) ONTO P :: X, Y
!HPF$ DISTRIBUTE (BLOCK) ONTO P :: IX, IY
!HPF$ INDEPENDENT
      DO I = 1, N
!HPF$   ON HOME( X(I) ), RESIDENT( Y(IX(I)) )
        X(I) = Y(IX(I)) - Y(IY(I))
      END DO
!HPF$ INDEPENDENT
      DO J = 1, N
!HPF$   ON HOME( IX(J) ), RESIDENT( Y )
        X(J) = Y(IX(J)) - Y(IY(J))
      END DO
!HPF$ INDEPENDENT
      DO K = 1, N
!HPF$   ON HOME( X(IX(K)) ), RESIDENT( X(K) )
        X(K) = Y(IX(K)) - Y(IY(K))
      END DO
```
As we saw in Section 9.2.3, X(I) is always local in the I loop, while IX(I) and IY(I) rarely are. The RESIDENT clause above asserts that Y(IX(I)) is local as well. This would most likely be due to some property of the algorithm that generated IX (for example, if IX(I) = I for all I). Note that an expression (e.g., Y(IX(I))) can be local even though one of its subexpressions (IX(I)) is not.
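To make this locality reasoning concrete, the plain Fortran sketch below computes which processor owns each element under a BLOCK distribution and counts the iterations of the I loop in which Y(IX(I)) lies off the home processor of X(I). The module and function names (block_owner, block_owner_of, count_nonlocal) are hypothetical. The sketch assumes HPF's BLOCK rule (block size CEILING(N/NP), processors numbered from 1) and that X and Y share one mapping, as they do in the declarations above.

```fortran
! Sketch (hypothetical names): which processor owns element I of an array
! of extent N distributed (BLOCK) onto NP processors, numbered 1..NP.
module block_owner
  implicit none
contains
  ! HPF BLOCK rule: block size B = CEILING(N/NP); element I is in block (I-1)/B+1
  pure integer function block_owner_of(i, n, np)
    integer, intent(in) :: i, n, np
    integer :: b
    b = (n + np - 1) / np
    block_owner_of = (i - 1) / b + 1
  end function block_owner_of

  ! Count iterations of "DO I = 1, N ... ON HOME( X(I) )" in which Y(IX(I))
  ! is owned by a different processor than X(I); X and Y share one mapping.
  pure integer function count_nonlocal(ix, n, np)
    integer, intent(in) :: ix(:), n, np
    integer :: i
    count_nonlocal = 0
    do i = 1, n
      if (block_owner_of(ix(i), n, np) /= block_owner_of(i, n, np)) then
        count_nonlocal = count_nonlocal + 1
      end if
    end do
  end function count_nonlocal
end module block_owner
```

With IX(I) = I the count is zero, which is exactly the situation in which the assertion RESIDENT( Y(IX(I)) ) holds. Shifting the index by one, IX(I) = MIN(I+1, N), makes the last iteration of every block except the final one nonlocal.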
The RESIDENT clause in the I loop gives no information about Y(IY(I)); it might have only one nonlocal value, or all its values might be nonlocal. (We assume that if there were no nonlocal values, the RESIDENT clause would include Y(IY(I)) as well.) If many of the elements referenced by this expression are local, and they can easily be separated from the nonlocal elements, then it may be worthwhile to restructure the loop to make this clear to the compiler. For example, suppose we knew that Y(IY(I)) was nonlocal only for the "first" and "last" X elements on each processor. The loop could then be split as follows:
```fortran
!HPF$ INDEPENDENT, NEW(LOCALI)
      DO I = 1, N
!HPF$   ON HOME( X(I) ), RESIDENT( Y(IX(I)), Y(IY(I)) ) BEGIN
          LOCALI = MOD(I, N/NP)
!         Skip the first (MOD = 1) and last (MOD = 0) element of each block;
!         N/NP is the block size when NP divides N
          IF (LOCALI /= 1 .AND. LOCALI /= 0) THEN
            X(I) = Y(IX(I)) - Y(IY(I))
          END IF
!HPF$   END ON
      END DO
!HPF$ INDEPENDENT, NEW(LOCALI)
      DO I = 1, NP
!HPF$   ON (P(I)), RESIDENT( X(LOCALI), Y(IX(LOCALI)) ) BEGIN
!         First element of processor I's block
          LOCALI = (I-1)*N/NP + 1
          X(LOCALI) = Y(IX(LOCALI)) - Y(IY(LOCALI))
!         Last element of processor I's block
          LOCALI = I*N/NP
          X(LOCALI) = Y(IX(LOCALI)) - Y(IY(LOCALI))
!HPF$   END ON
      END DO
```
The first loop (inefficiently) processes the iterations in which Y(IY(I)) is local, while the second (more efficiently) handles the rest. On most machines, it would pay to rewrite both loops to avoid the division operations, for example by computing a logical mask in advance.
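One way to follow that advice for the first loop is sketched below. A logical mask marking the interior of each processor's block is computed once, with a single division, and the loop then tests the mask instead of calling MOD. The names (boundary_mask, make_interior_mask, update_interior, INTERIOR) are hypothetical. The block size follows HPF's CEILING(N/NP) rule, so NP need not divide N. The !HPF$ lines are ordinary comments to a non-HPF compiler, so the code also runs serially. In a real HPF program the mask would presumably be aligned with X so that testing it is local; that directive is omitted here.

```fortran
! Sketch (hypothetical names): replace the per-iteration MOD test of the
! first split loop with a mask computed once, before the loop.
module boundary_mask
  implicit none
contains
  ! INTERIOR(I) is .TRUE. unless I is the first or last element of its block.
  ! Block size follows HPF's BLOCK rule, CEILING(N/NP): the only division.
  subroutine make_interior_mask(n, np, interior)
    integer, intent(in) :: n, np
    logical, intent(out) :: interior(n)
    integer :: p, b, first, last
    b = (n + np - 1) / np
    interior = .true.
    do p = 1, np
      first = (p - 1) * b + 1
      if (first > n) exit            ! trailing processors may own no elements
      last = min(p * b, n)
      interior(first) = .false.
      interior(last) = .false.
    end do
  end subroutine make_interior_mask

  ! First loop of the split version, driven by the precomputed mask.
  subroutine update_interior(x, y, ix, iy, interior)
    real, intent(inout) :: x(:)
    real, intent(in) :: y(:)
    integer, intent(in) :: ix(:), iy(:)
    logical, intent(in) :: interior(:)
    integer :: i
!HPF$ INDEPENDENT
    do i = 1, size(x)
!HPF$ ON HOME( X(I) ), RESIDENT( Y(IX(I)), Y(IY(I)) )
      if (interior(i)) x(i) = y(ix(i)) - y(iy(i))
    end do
  end subroutine update_interior
end module boundary_mask
```

For N = 12 and NP = 3, for instance, the mask is .TRUE. at indices 2, 3, 6, 7, 10 and 11. The boundary indices are left for the second loop.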
In the J loop, the RESIDENT clause asserts that all accessed elements of Y are local. In this case, that is equivalent to the assertion
!HPF$ RESIDENT( Y(IX(J)), Y(IY(J)) )
Although the RESIDENT clause in the J loop names only the lexical expression Y, the compiler can infer that the references Y(IX(J)) and Y(IY(J)) are also local, because a subobject cannot reside on a different processor than its "parent" object. This observation can often shorten RESIDENT clauses substantially.
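In the K loop, the following references are local: X(K), because the RESIDENT clause asserts it; and Y(IX(K)), because X and Y have identical distributions, so Y(IX(K)) resides on the same processor as the home X(IX(K)). The directive says nothing about IX(K), IY(K), or Y(IY(K)).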
Because a RESIDENT clause is an assertion of fact, the compiler can draw many inferences from a single one. For example, consider the following case:
```fortran
!HPF$ ALIGN Y(I) WITH X(I)
!HPF$ ALIGN Z(J) WITH X(J+1)
!HPF$ ON HOME( X(K) ), RESIDENT( X(INDX(K)) )
      X(K) = X(INDX(K)) + Y(INDX(K)) + Z(INDX(K))
```
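The inference a compiler can make from the two ALIGN directives can be mimicked by composing ownership functions. The sketch below assumes, hypothetically, that X is distributed (BLOCK) onto NP processors; the example itself does not say how X is mapped. It uses the made-up names align_owner, owner_x, owner_y and owner_z. Because Y(I) is aligned with X(I), Y(INDX(K)) always has the same owner as X(INDX(K)). Z(INDX(K)) is aligned with X(INDX(K)+1), which lies on the next processor whenever INDX(K) is the last element of a block.

```fortran
! Sketch (hypothetical names): ownership under the two ALIGN directives,
! assuming X is distributed (BLOCK) onto NP processors.
module align_owner
  implicit none
contains
  ! Owner of X(I) under BLOCK with block size CEILING(N/NP)
  pure integer function owner_x(i, n, np)
    integer, intent(in) :: i, n, np
    owner_x = (i - 1) / ((n + np - 1) / np) + 1
  end function owner_x

  ! ALIGN Y(I) WITH X(I): Y(I) always lives with X(I)
  pure integer function owner_y(i, n, np)
    integer, intent(in) :: i, n, np
    owner_y = owner_x(i, n, np)
  end function owner_y

  ! ALIGN Z(J) WITH X(J+1): Z(J) lives with X(J+1), so J+1 must not exceed N
  pure integer function owner_z(j, n, np)
    integer, intent(in) :: j, n, np
    owner_z = owner_x(j + 1, n, np)
  end function owner_z
end module align_owner
```

For N = 8 and NP = 4, for example, owner_z(2, 8, 4) is 2 while owner_x(2, 8, 4) is 1. Knowing that X(INDX(K)) is local therefore tells the compiler nothing about Z(INDX(K)).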
The compiler is justified in making the following assumptions in compiling the assignment statement (assuming it honors both the ALIGN directives and the ON directive):