Executive Summary
The primary business of this meeting was the first or second reading of a number of language proposals. The following summarizes their status. Details of the proposals are included in the full minutes below.
Proposals passing second reading: Generalized block distributions.
Proposals still in progress of second reading: Irregular mapping; Explicit interface requirements; Async I/O.
Proposals passing in first reading: Mapping Derived Type Components; Mapping to Subsets of Processors; Specification of Shadow Widths; C interoperability; HPF kernel definition; Function result in Local_to_Global query; Reductions; ON and RESIDENT directives.
Proposals discussed in a preliminary review: Mapping function (won't be pursued); Generalized transpose; Task parallelism functionality; Document reorganization; Out-of-core arrays; SPMD to HPF interface; Restrictions on Dynamic redistribution; Eliminate explicit Sequential mapping.
Other items of special note: There will be an HPFF BOF Wednesday evening at SC95 where the proposals for HPF2 will be reviewed. Two of the areas we are considering, ASYNC I/O and C interoperability, have been identified as "high priority" items in the US official vote for functionality in the next round of Fortran standardization. HPF definitions are likely to form a basis for proposals in these areas.
And a further note about HPF simplification. A proposal for a severely limited kernel of HPF has strong support. There is also a proposal under consideration that this smaller form of the language should be designated HPF2, with all other features considered "extended HPF".
The next HPFF meeting is scheduled for the Dallas area, November 1-3. It is expected to be a similar session of heavy proposal processing.
End of Executive Summary
Sept. 20: Subgroup meetings chaired by Rob Schreiber, David Loveman, and Piyush Mehrotra were held from 1:30 through the evening.
Sept. 21: Ken Kennedy called the meeting to order at 8:40. Introductions and the initial count of installations were made. 27 people from 24 institutions were present.
Generalized block distributions: add to H309:

dist-format is BLOCK [(int-expr)]
           or BLOCK (int-array)
           or ...
Constraint: The int-array appearing in the dist-format of a DISTRIBUTE directive must be a restricted expression.
Semantic constraints:
Text needs to be added on the number of processors; int-array(i) is the size of the block on the ith processor.
Passed: 18 - 1 - 4
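A minimal sketch of the syntax as passed, assuming a 4-processor arrangement (the array and size names are illustrative):

```fortran
!HPF$ PROCESSORS P(4)
      INTEGER, DIMENSION(4), PARAMETER :: SIZES = (/ 10, 20, 30, 40 /)
      REAL A(100)
!HPF$ DISTRIBUTE A (BLOCK(SIZES)) ONTO P
! SIZES(i) is the size of the block on the ith processor: 10+20+30+40 = 100
```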
The next proposal presented was for Irregular Mapping - 2nd Reading:
Add to H309:

dist-format is BLOCK ... or INDIRECT (int-array)

Constraint: The int-array appearing in the dist-format of a DISTRIBUTE directive must be a restricted expression.

Semantic constraints:
Int-array (i) yields the processor number of the ith element of the array
Discussion: Chuck Koelbel gave an alternative, perhaps more natural way to get this same function - e.g. aligning edges with nodes. Rob Schreiber asked about the onto clause. The expectation is that it is a 1 to number of processors. It was agreed that clean-up was needed. An official vote was taken, with a very large abstain vote. Ken asked for clarification of why people were abstaining. There was a combination of reasons - e.g. some because of technical details (what does "yields" mean?), and others because they thought there might be a better way. In light of this, the formal vote was withdrawn, and the proposal was returned to subgroup for more detailed work.
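A sketch of the INDIRECT form as it stood when returned to subgroup (array and names illustrative; in practice the mapping array would likely be computed at run time and used with REDISTRIBUTE):

```fortran
!HPF$ PROCESSORS P(4)
      INTEGER MAP(8)
      REAL X(8)
      DATA MAP / 1, 1, 2, 3, 4, 4, 2, 3 /
!HPF$ DISTRIBUTE X (INDIRECT(MAP)) ONTO P
! MAP(i) yields the processor number holding the ith element of X
```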
Next was a first reading of Mapping function
Add to H309 dist-format is ... or INDIRECT (func-name [,arg- list])
Constraint: func-name is a function with the following properties
The function takes the array index and other values as arguments and returns the processor index. Example:

!HPF$ DISTRIBUTE A (INDIRECT (f,10,100)) ONTO P

Here f(i,10,100) returns the processor index of A(i). There is no restriction on the arguments - the compiler has to make a copy. Advice should be given to use scalars or replicated arrays.
There was a question of whether the array index should be explicit in the declaration, but the i in f(i,10,100) can be supplied by the compiler
Rob asked why the extra restriction on PURE. The reason given was that otherwise the compiler might have to make a copy of everything the function can access.
Chuck asked how to make this a syntactic constraint. Do we need a keyword PURER? Otherwise this is probably better as a semantic constraint. Joel Saltz questioned the purer-than-pure notion, saying one might typically want a strange data structure, e.g. for binary search. Rob pointed out it could be some kind of opaque struct. The issue is that these parameters are passed by value - as if f were evaluated for all k on entry. A copy must be made at definition time, because if the data structure changes later, there is a need to keep the original value. A straw poll was taken: Should this be pursued? 5 - 8 - 13. It won't be pursued.
First Reading of Mapping Derived Type Components:
Defn: A component of a derived type is considered to be mapped if it is either of an intrinsic type and explicitly mapped, or it is a structure and any one of its components is mapped.
Change the first constraint under H303-311 and H312-318:

An object-name mentioned as a distributee/alignee must be a simple name OR A COMPONENT OF A DERIVED TYPE, BUT MAY NOT BE A SUBOBJECT DESIGNATOR OF ANY OTHER TYPE.
Add: Constraint: A component of a derived type may be explicitly mapped if it is of an intrinsic type or if it is a structure and none of its components are mapped.
Constraint: A variable of a derived type can be explicitly mapped if none of the components of the derived type are mapped.
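A hypothetical sketch consistent with these constraints (the type and names are illustrative; the directive syntax is assumed, not settled):

```fortran
      TYPE GRID
         REAL U(100,100)
         REAL V(100,100)
      END TYPE GRID
      TYPE (GRID) :: G
!HPF$ DISTRIBUTE G%U (BLOCK,BLOCK)  ! intrinsic-typed component: may be mapped
! G itself may then NOT be explicitly mapped, since one of its components is.
```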
Note:
Guy Steele asked if the terminology used is right - for component names, subobject designator ... etc.
Ken interjected general group instructions at this time. For the second reading of proposals, we need full text.
There is a need to allow for both a default distribution and an explicit distribution.
straw poll: 17 - 1 - 8
1st reading Mapping to Subsets of Processors:
Change H311 to:

dist-target is processors-name [(section-subscript-list)]
           or * processors-name [(section-subscript-list)]
           or *
Restrict subscripts used to be non-vector-valued: Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Add rules:

subgroup-directive is SUBGROUP procs-name OF target-sect
target-sect is procs-name (section-subscript-list)
Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Constraint: the target processors name must be defined via a PROCESSORS directive (no subgroup of a subgroup). The subgroup vote on this was 2 - 2 - 2.
The subgroup inherits the rank and extents of the target subsection. Example: SUBGROUP PSUB OF P(2:6, 3:15:2) defines PSUB as a (5,7) processor array.
Add rule:

subgroup-dir is SUBGROUP procs-name [(explicit-shape-spec-list)] OF target-sect   (subgroup vote: 2-3-1)

Where, for example:

!HPF$ PROCESSORS P(100)
!HPF$ SUBGROUP P1(50) OF P(1:50)
!HPF$ SUBGROUP P2(5,10) OF P(51:100)
...
Ken asked that we first consider the base proposal of sections. A straw poll of 14 - 2 - 10 indicated yes, go ahead and develop a full proposal for sections.
Next was discussion of the subgroups part of the proposal. Part is syntactic sugar and part has functionality.
The vote about named subgroups was 6 - 3 - 16. This will require a VERY good case in 2nd reading to pass.
Next was discussion about reshaping subgroups - should it be allowed?
Guy asked if he could have P3(4,4) of Q(3:6,4:7).
But what about subgroups of subgroups? If subgroups are added, should we forbid subgroups of subgroups? 14 - 0 - 12
So Guy's point is that either subgroups of subgroups or the 2nd version, but not both.
Jaspal Subhlok asked: why not separate reshaping from subgroups? ... history ...
Rob pointed out that this is a form of align for processors, why not do it with ALIGN?
Piyush asked that we vote on the reshaping in general: strawpoll 9 - 8 - 9 Ken would like to see a full proposal --- but it BETTER be good!!! Jaspal says that this functionality is needed for the tasking proposals.
There was an additional discussion about whether there should be a proposal using align for subgroups. A comment was made that if the proposal comes back with align, it will be a 1st reading instead of a 2nd reading. A straw poll about using align got a vote of 3-9-12.
Carl Offner presented the first reading of a proposal for Specification of Shadow Widths.
Replace H309 (page 26) with:

dist-format is BLOCK [(int-expr [, shadow-spec-or-int-expr])] ...

with the optional specification added to all the different kinds of distributions.
Add new rules:
shadow-spec-or-int-expr is shadow-spec
                        or int-expr
shadow-spec is SHADOW = int-expr
            or LOW_SHADOW = int-expr
            or HIGH_SHADOW = int-expr
            or LOW_SHADOW = int-expr, HIGH_SHADOW = int-expr
            or HIGH_SHADOW = int-expr, LOW_SHADOW = int-expr

Constraint: any int-expr appearing in a shadow-spec or in a shadow-spec-or-int-expr must be a specification-expr with value >= 0.
The absence of a shadow-spec or a shadow-spec-or-int-expr is equivalent to SHADOW = 0. SHADOW = n is equivalent to LOW_SHADOW = n, HIGH_SHADOW = n. A shadow-spec-or-int-expr that is just an int-expr is equivalent to SHADOW = int-expr. Specifying only LOW_SHADOW implies HIGH_SHADOW = 0; specifying only HIGH_SHADOW implies LOW_SHADOW = 0.
The primary reason for doing this is to facilitate the passing of shadowed arrays across subroutine boundaries. There are both storage-allocation and data-motion issues. This is intended to be just advice - to tell the compiler that if it uses shadows, how big to make them. This does expose some implementation issues to the user. Scott Baden asked if the specification should be on the compile line. It was pointed out that any time shadow widths are different across the boundaries of a subroutine call, there is some implied data motion. This requires more things to be passed in the descriptors across the subroutine boundary. The owner of the data is still well defined, and any updates of the data will result in messages to update the shadow values.
Straw vote about should this go to full proposal? 13 - 2 - 10
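A sketch of how the shadow specification might look in use (sizes illustrative, and the exact grammar is still under discussion):

```fortran
      REAL A(100)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A (BLOCK(25, SHADOW=1)) ONTO P
! SHADOW=1 is equivalent to LOW_SHADOW=1, HIGH_SHADOW=1: each processor
! allocates one extra element on each side of its block for neighbor values.
```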
Chuck Koelbel presented the first reading of an ON directive proposal.
Motivation: Some programmers want more control in cases where which processor executes an operation, or whether data is moved, makes a big difference. HPF 1.1 makes these decisions the compiler's job, and some people simply don't trust the compiler.
ON HOME recommends where operations should be executed (like distribute recommends where data should be placed). RESIDENT asserts that data need not move (like independent asserts operations are parallel).
These apply to a single statement or block of statements. They name a processor or set of processors, either directly (i.e. a section of a processor array) or indirectly (e.g. the owner of array or template elements). They tell the compiler to execute the operation on the given processor. Other processors may have to move data (see RESIDENT), and call statements need special consideration. This functionality is most useful inside a parallel loop or region. ON directives can be nested, but the inner processor set must be a subset of the outer set.
Examples
!hpf$ on home (a(i))
      a(i+1) = b(i) + a(i)
---
      do i = 1, n
!hpf$ on home (a(indx(i))) block
      a(indx(i)) = b(i)
      b(i) = c(indx(i))
!hpf$ end on
      end do
---
!hpf$ on home (x(i,:)) block
      do j = 1, m
!hpf$ on home (x(i,j))
      x(i,j) = foo(y(j,i))
      end do
!hpf$ end on

What can go in HOME?
Original proposal: Any reference to a variable, template, or processor arrangement (including sections and vector-valued subscripts).
Amendments from subgroup meeting: Only scalar references and regular sections of variables, templates, and processors. The argument for the amendment is easier implementation. The argument against the amendment is that there is no difference for irregular distributions.
Second amendment: Any function called must be PURE. The argument pro is to ensure no side effects and well- definedness. There was no argument against. There will probably be CCI issues related to function calls in realign.
Ken commended the group on the preparation and presentation detail of the initial proposal. The vote to pursue ON for a second reading passed: 20-4-3.
Some additional detail was then presented.
Calls in ON HOME blocks:
Consider the following example.
!hpf$ ON HOME (a(1))
      call foo(a)
      ...
      subroutine foo(x)
      x(2) = ...
The problem: can foo be compiled using the owner- computes rule?
Suggested solutions are that the called routine must have a valid on-home. There might be a declarative form of ON that could go in an explicit interface, or every processor might execute every call.
There was discussion about whether this forces 1-sided communications. We could say that all procs execute all calls and add some other form of ON HOME that eliminates this requirement. There was a query about whether we are really looking for something similar to hpf-serial, or need a purer-than-pure routine.
A straw poll was taken about defining some declaration usable in an explicit interface to have the solution: 7 - 1 - 18
Resident directive: (This was called LOCAL in the paper version of the proposal.) This can be an optional clause in ON HOME or a free-standing directive. It gives an optional list of variable references (i.e. variable names, elements, regular sections, ...). Each reference (and, therefore, any subobject of the reference) is stored on the processors invoked in the surrounding ON HOME. There was some refinement of the definition that resulted from the subgroup discussions. For a read reference, at least one of the ON processors must store a copy; for a write reference, the ON processors store all copies.
Terms of the form
Resident examples
!HPF$ ON HOME (x(k)), RESIDENT (x(indx(k)))
      x(k) = x(indx(k)) + y(indx(k))   ! know about x references but not indx and y
-----
!HPF$ ON HOME (x(j)), RESIDENT (ALL=x)
      x(j) = x(j) * x(ipermute(j)) * x(j+1) * y(j-1)   ! know about all x, not ipermute, y
------
!HPF$ ON HOME (procs(1:np/2)), RESIDENT
      call foo(a,b)   ! what does this say about what happens inside foo?
As a further example for discussion:
      ON HOME (proc(1)), RESIDENT
      call f(x(a))
      ...
      subroutine f(a)
      a = B(2)   ! Rob says ok, RESIDENT not in this scope; Piyush says illegal if not on p1
It was pointed out that pure can only call pure, and it must be declared. We might have the restriction that the subroutine has to be declared resident and everything it calls must also have the declaration.
!HPF$ ON HOME (indx1(i)) block   ! example of free-standing RESIDENT
      n1 = indx1(i)
      n2 = indx2(i)
!HPF$ RESIDENT (x(n1)) block
      tmp = y(n1) - y(n2)
      x(n1) = x(n1) + tmp
      x(n2) = x(n2) - tmp
!HPF$ end resident
!HPF$ end on
Why ALL is needed:
!HPF$ ALIGN y(i) WITH x(i)
!HPF$ ON HOME (procs(1:2)), RESIDENT (ALL=x)
      x(i) = x(indx1(i))
      y(i) = y(indx2(i))   ! is y(indx2(i)) resident?
!HPF$ END ON

(no align)
!HPF$ ON HOME (procs(1:2)), RESIDENT (ALL=x)
      x(i) = x(indx1(i))
      y(i) = y(indx2(i))   ! y(indx2(i)) is resident!
!HPF$ END ON
This relates to whether we are talking about the address or the reference.
Guy expressed an opinion that ALL is confusing - we need something else. (And, tongue in cheek, suggested that maybe it should be *.)
First reading straw vote ON RESIDENT : 9 - 1 - 17
LUNCH BREAK
The group restarted at 1:05.
Overview examples of what is in the written proposal:
Intrinsic reduction
!hpf$ independent
      do i = 1, n
!hpf$    reduce
         x = x + f(i)
      enddo
Defined Reduction
!HPF$ independent, ..., REDUCTION(list; COMBINE=concat; IDENTITY=emptylist)
      ! list might actually be a "list" of reduction variables, but concat must
      ! be defined for all of the types included. Also emptylist might be a function.
      do i = 1, n
!hpf$    reduce
         list = append(list, element)
      enddo
It was suggested that the syntax should be "," instead of ";" because the combine= keyword will disambiguate. A question was asked about any restrictions on the kind of function that is used for combine.
Guy suggested that the identity be identified in the interface block for the operator. The combine operator might also be in the interface block --- then the whole reduction statement goes away. The information it provides would already be known to the compiler.
(next example slide)
      real x(4,8), y(8,8)
!hpf$ independent, new y
      do i = 1, 1000
         y = ...
!hpf$    reduce
         x = matmul(x,y)
      enddo
Implementation: there would be a "local" x per consecutive block of iterations, with a (4,4) identity and a constrained (non-commutative) fan-in combine. There was a discussion of the commutativity of the matrix example, the shape of the identity, and shape results for the combines. Intermediate temporaries are the same shape as y.
Straw polls: Intrinsic reduction (without things like matmul): 22 - 3 - 1
Adding things like matmul: 8 - 7 - 10
Defined reductions that are commutative and associative (with the ideas proposed by Guy to simplify them), including discussion of whether the reduce statement belongs on the INDEPENDENT rather than on the actual reduction: 17 - 6 - 3. (All no votes, and most abstentions, were from vendors.)
For intrinsic reductions: no mixed reduction operators; the combine entity is implied.
Question: should it be an error if the user gives an identity operation that isn't valid?
Summary
READ  (..., ID = scalar-int-var, ...)
WRITE (..., ID = scalar-int-var, ...)
WAIT  (ID = ...)
Both the I/O statements and WAIT may have IOSTAT and ERR args.
(2) WAIT (UNIT = int, POLL = 'ID', ID = int, DONE = lvar)
    WAIT (POLL = 'ALL', UNIT = int, DONE = lvar)
(3) UNFORMATTED, DIRECT files only
(4) Multiple outstanding reads ok, multiple outstanding writes ok, but not both!
Note: in this proposal there is no indication in the OPEN statement ... and the ASYNC keyword is gone.
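A sketch of how the proposed statements might be used together (unit number, record size, and names are illustrative):

```fortran
      REAL BUF(1000)
      INTEGER ID1
      OPEN (10, FILE='data', FORM='UNFORMATTED', ACCESS='DIRECT', RECL=4000)
      WRITE (10, REC=1, ID=ID1) BUF   ! initiates the write and returns
      ...                             ! overlap computation with the I/O
      WAIT (ID=ID1)                   ! block until the write completes
```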
Alok questions the need for an explicit ban on writing the same block simultaneously.
straw poll ... if there are 2 outstanding writes on the same block should we guarantee the second? 3 - 3 - 15!!! Should it be processor dependent? 5 in favor; non-conforming? 2 in favor.
We need a generalized transpose and F95 isn't doing it.
GEN_TRANSPOSE (ARRAY, ORDER)
  ARRAY  - any type, rank N
  ORDER  - integer, shape = [N]
  RESULT - RS(ORDER) == AS, where RS is the shape of the result and AS is the shape of ARRAY
  RESULT (J(1), J(2), ..., J(N)) = ARRAY (J(ORDER(1)), J(ORDER(2)), ..., J(ORDER(N)))
We might do this with reshape , but reshape is pretty awful. This would be a recognizable (efficient) version of the special useful case. But should we just lobby vendors to do a better job with this special case? In reshape, a SHAPE is required. Another possibility is to overload transpose and define it for higher dimensions (reverse) where order is optional. This then makes it an intrinsic. Straw poll about developing a proposal for overloaded transpose: 17 - 1 - 4.
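For comparison, here is the RESHAPE route the group found awkward; under the definition above, GEN_TRANSPOSE(A, (/3,1,2/)) would satisfy R(j1,j2,j3) = A(j3,j1,j2) (array shapes illustrative):

```fortran
      REAL A(2,3,4), R(3,4,2)
      ! the result shape must be spelled out explicitly, which is part of
      ! what makes this clumsy; RS(ORDER(k)) = AS(k) gives (3,4,2)
      R = RESHAPE(A, SHAPE=(/3,4,2/), ORDER=(/3,1,2/))
```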
CCI 31: Should number_of_processors reflect the processor "structure"? Currently HPF 1.1 defines the result as "vendor-independent". The subgroup decided that this is appropriate for HPF 1.n.
But that HPF2 might want to consider something different. Consider that number_of_processors is used for controlling data distribution (where it really refers to number_of_memories) and might also be useful for work distribution (e.g. ON). Sorting this out and doing the right thing for SMP's is a non-trivial issue and a topic for HPF2 (or 3?) - not a CCI. We may need additional calls to reflect the different functionality.
CCI 33: local_to_global currently can only be called for arrays bound to global actuals. Should we also be able to inquire about the function result? In theory the answer is yes. This appears to be a very simple change to the document. But, there may be a small additional runtime cost for HPF to HPF_LOCAL calls. The primary discussion was whether or not this was a CCI (fix for HPF 1.2) or new proposal (HPF 2.0). The subgroup recommended HPF2, with vendors free to retrofit to HPF 1.n if they wish. This was confirmed by an institutional vote 19-0-2.
There was discussion for straw-polls on some of the specific details.
What distributions should be in the kernel? The subgroup considered 4 possibilities:
BLOCK (only)
BLOCK, CYCLIC
BLOCK, CYCLIC(n)
BLOCK, BLOCK(n)
Ken led a discussion and set of straw votes, starting with the assumption that BLOCK is in: what else should be included?
Straw polls:
cyclic(n) - 6 - 7 - 10   ! some support, but?
block(n)  - 3 - 10 - 6   ! not really any support
cyclic    - 10 - 5 - 9   ! yes, people think

Next was a discussion of the options for ALIGN. These include:
straight alignment ALIGN A(I,J) WITH B(I,J) only - exactly the same shape
straight alignment ALIGN A(I,J) WITH B(I,J), but allowing different extents
allow only ":" (just identity alignment - no reference to I, J)
allow only ":" and "*" alignments (adding dimensional collapse and replication)
add offsets
No one supported permutations of subscripts or strides for the kernel.
Straw poll on offsets in addition to everything else: 5 - 9 - 11
Straw poll on different extents 13 - 3 - 7
Straw poll on collapse 13 - 3 - 10 and replication 13-3-10.
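A sketch of the alignment forms that polled well for the kernel - identity ":" alignment plus "*" for collapse and replication (arrays illustrative):

```fortran
!HPF$ ALIGN A(:,:) WITH B(:,:)   ! identity alignment
!HPF$ ALIGN C(:) WITH B(:,*)     ! replicate C across B's second dimension
!HPF$ ALIGN D(:,*) WITH E(:)     ! collapse D's second dimension
```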
("Slow" people were asked to sit on the far side of the room.)
After this vote, there was a reconsideration of CYCLIC (N) in the context of simpler aligns: 8 - 8 - 10.
Andy pointed out that removing SEQUENCE from the HPF kernel means that there is no EQUIVALENCE, common blocks must be the same everywhere, no assumed size, and no arguments with mismatched shapes.
No new vote on the overall kernel was recorded in the minutes. The disposition of the kernel is tied to the discussion of document reorganization.
Changes to HPF are:
Interoperability example:
      interface
         extrinsic (C) function cfunc(w, x, y, z, a, p) name("Cfunc")
            real, map_to(cfloat) :: cfunc
            type(my_type) :: w
            integer, map_to(short) :: x
            real, map_to(float) :: y
            integer, map_to(char) :: z
            integer (kind=4), map_to(short, layout=c_array) :: a(100)
            integer (kind=c_void_pointer), map_to(pointer) :: p
         end function
      end interface

The call:

      type(my_type) :: w
      integer :: x, z
      real :: y
      integer (kind=4) :: a(100)
      integer (kind=c_void_pointer) :: p
      r = cfunc(w, x, y, z, a, loc(p))

C prototype:

      float cfunc(struct my_type w, short x, float y, char z, short a[100], int *p)
What if you need to prepare a field of a structure, or ??? One might want a convert-to-c-type function, or one's own function that converts. There was a question about converting file pointers ... but this hasn't been addressed. Some concern about adding LOC was expressed. Jerry Wagener asked if we should pick something to be the equivalent of "*" in C instead of LOC. (More chuckles about the multiplicity of functions that * seems natural for.)
Miles Ellis (sp?) from Oxford is current chair for the ISO group chartered with C interoperability, but Jerry Wagener (and probably X3J3/ISO) is looking to what we do here.
Strawpoll on the proposed interface - 16 - 1 - 7.
We next had a strawpoll about whether user-defined map functions to convert user-defined types should be addressed: 13 - 2 - 10.
The vote on conversion functions convert-to-c(x,ctype- "float") was similar: 13 - 1 - 10.
Carl Offner next took a straw poll related to the proposal for changing the requirements for explicit subroutine interfaces.
Should we remove the distinction between prescriptive and descriptive? Concern is that it may break current programs, but there is a lot of resulting simplification in definition. Yes remove: 18-0-7
Mary asked for an additional strawpoll: should we simply require explicit interfaces everywhere? It was argued this raises the threshold for learning - and for simple programs. Requiring them everywhere was voted down: 6 - 9 - 10.
Scott Baden presented a 0th reading of his document about Calling HPF from an Extrinsic Language - SPMD to HPF. Multi-Data-Parallel is one way to think of this.
The motivation is to coordinate multiple HPF programs. Potential uses are SPMD coordination interfaces such as adaptive multiblock, MPMD such as multi-disciplinary or task parallel, and SMP clusters or processor subsets where a coordination layer handles external communications.
The proposed model:
Data motion between HPF programs is handled explicitly at extrinsic level, e.g. through MPI inter communications.
As a restriction, you can't pass mapped arrays allocated within different communicators to the same HPF routine, because the set of processors is not well defined.
Outstanding issues are: Language independent definition and data conversion.
Changing a processor topology for an HPF program and overlapping communications for 2 or more HPF programs.
Should HPF common blocks be accessible from SPMD - perhaps only if they are sequential (kernel)?
Restriction #4 in the paper should be dropped.
Some general comments were:
General straw poll about developing a proposal for callability of HPF from SPMD environment: 19 - 0 - 5.
Jerry Wagener gave a report on F95 status. In their public review process, they got 3 communications from HPFF ... a general letter plus detailed comments from Carol Munroe. There were about 450 comments and they resolved all but 2. They also got a number of suggestions for F2000. Reply letters will come in about November.
The US vote on the proposed standard was NO, with 20 minor fixes. (It will change to yes when the fixes are made --- e.g. there was a missing "not" in one place.)
WG5 will make the actual changes. They expect it to be official in mid 96.
The requirements for F2000, including some from HPFF have been logged into a journal of requirements. These are available from:
ftp.ncsa.uiuc.edu X3J3 document 004
US recommendations for requirements were made with 6 high and 17 medium. All 3 of the HPFF recommendations (interop, fp except, async I/O) made the US "high" list.
Floating point exception handling is well along; John Reid is taking the lead on this. C interop is highest on everyone's list, but partially for political reasons no one has taken the ball. And for async I/O, there is a good chance that the HPFF definition will be it.
Jerry gave a review of floating point exception issues. There are now two approaches. The long-time approach is via an ENABLE construct.
      ENABLE ...   ! conditions may be set
      HANDLE       ! conditions cleared
      END ENABLE

      module conditions
         type condition
            private
            logical :: flag
         end type condition
         type (condition) :: overflow
         type (condition) :: divide_by_zero
         ...
         type (condition), parameter :: quiet = condition(.false.)
         type (condition), parameter :: signaling = condition(.true.)
         ...
      end module conditions
The other approach uses intrinsic procedures and is more specific to floating point:
logical function IEEE_NAN logical function IEEE_INF logical function IEEE_UNDERFLOW
15 query functions allow you to inquire about flags, etc.
subroutine IEEE_flag_set(flag, value)
subroutine IEEE_flags_clear
logical function IEEE_flag_get(flag)

There are 28 procedures in all.
      module IEEE_arithmetic
         type IEEE_flag
            private
            integer :: i
         end type IEEE_flag
         type (IEEE_flag), parameter :: overflow = IEEE_flag(3)
The problems associated with these approaches have to do with how to define the enable action on a data parallel operation where the overflow happens on only one processor.
      enable (overflow)
         A = B*C      ! data parallel operation
      handle
         ...          ! what to do with overflow on only one processor?
      end enable
Does each processor have its own local overflow flag? That may be the way that the hardware works, but it is not the way that F90 module objects work. If there are local flags, is the "handling" done locally or later? Guy Steele asked about what would happen if this were the equivalent F77-style loop.
The other problem occurs where an outer procedure declares a handler and calls a middle procedure, which calls an inner procedure that has enabled overflow with no handler. If overflow occurs, control returns to the middle layer where nothing is enabled, so nothing happens. Partly this problem stems from separate compilation.
The group finished for the day about 5:45.
CCI 28: ALIGN WITH A(*,:) :: B(:) is misleading - so this has been changed so that what the user intends is ALIGN (:) WITH A(*,:) :: B
Proposed changes:
Page 24, line 6: Change "" to " "
Page 24, after line 13: add
   H303 is [(explicit-shape-spec-list)] or
   H304 is or
Page 24, lines 24, 26: Change " " to " "
Page 24, lines 34, 35: Change " " to " "

CCI 11: Pointers cannot be mapped. Can they be associated with both sequential and non-sequential targets? The subgroup recommends that pointers cannot point to sequential variables. The full group didn't think this was right and asked the subgroup to try again.
Next Piyush asked for some guidance for the subgroup on indirect mapping: The subgroup voted 0-5-0 : Use indirect align instead of (replacement for) indirect distribute; 5-0-0 Must use an ONTO clause whenever INDIRECT is used in a distribute and redistribute. Votes from the full group were: Indirect distribute over indirect align if you can only do one: 15 - 0 -2. Develop a proposal to consider addition of indirect align? 5 - 7 - 7.
A new proposal was presented for group opinion: no sequential variables can be explicitly mapped. Why? Mapping of 1-d aggregate covers has never been implemented, and EQUIVALENCE can be largely replaced by using modules (with renaming), dynamic memory allocation, and TRANSFER. Should a full proposal be developed? 13 - 3 - 5.
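A minimal sketch of the TRANSFER-based replacement for EQUIVALENCE mentioned in the proposal (variables illustrative):

```fortran
      real    :: r
      integer :: i
      ! instead of EQUIVALENCE (r, i), copy the bit pattern explicitly:
      r = 1.0
      i = transfer(r, i)   ! i now holds the bit pattern of r
```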
Jaspal Subhlok gave an overview of Task Parallelism proposals for HPF:

processor subgroups, with variables mapped to subgroups
execution ON processor subgroups
task regions

Subgroups and ON are already being discussed. Variable visibility in task regions is the primary issue. The processor subgroup code has unrestricted access to local data, while other code has unrestricted access to all data. A subgroup can access global data only if that would not cause a data dependence - i.e. (a) access is read only, or (b) access is exclusively by one subgroup. And subgroup code sections are single entry, single exit.
Execution Model
For subgroup P1: processors execute code that is ON P1, skip code that is ON anything else, and execute other code (ALL) like (normal) HPF code. This allows "old fashioned" parallel sections:
      task region
      ON P1
         ...
      ENDON
      ON P2
         ...
      ENDON
      end region

And pipelining:
      task region
      DO i = 1, 100
      ON P1
         read and compute (A1)
      ENDON
         A2 = A1
      ON P2
         use A2, print something
      ENDON
      end region

Note: this proposal assumed that there are named subgroups for allocation of data. This is impacted by earlier group votes about processor subgroups and names.

Rob added an example:
      on home (P(lo)) call F
      on home (P(hi)) call G

This is ok as long as there is nothing touched by both that is written by one of them (causing some wait). Substantial analysis is required for compilers to determine this. If there are no dependencies, the regions can run in parallel.
Consider:
      A = F(A)   1:3
      B = A      1:6
      B = G(B)   4:6
      C = B      4:9
      D = W(C)   7:9

where this is in a loop and the x:y following each statement represents which processors execute that line (are in the subgroup named for the region). In this example, procs 1:3 could start the next iteration as soon as they are finished with B = A.
Note that this is related to Scott's multiparallel proposal.
Should group C develop a proposal along this line for task parallelism: 17 - 2 - 2.
For next time: prepare examples to show the differences between Rob's and Jaspal's versions ... and show how it works on SMPs.
Carl Offner --- 2nd reading Subroutine interfaces
24/48        change "three" to "two"
25/7-11      delete (explanation of descriptive syntax)
26/39        delete *(dist-format-list) in H308
26/47        delete *processors-name in H311
27/18-22     delete (2 constraints)
33/22        delete *align-target [(align-subscript-list)] in H320
33/47-48     delete (constraint)
34/1         delete (constraint)
45/45 - 46/3 example now needs an explicit interface

Rewrite section 3.10 to delete all references to descriptive mappings, and to include and be consistent with the following statement:
An explicit interface is required in each of the following cases:
- An argument is passed transcriptively or with the INHERIT attribute.
- The mapping of a dummy argument is not the same as the mapping of the corresponding actual argument.
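As a hedged illustration of the second case (the names, sizes, and mappings here are invented for this example): a caller passing a BLOCK-distributed actual to a CYCLIC-distributed dummy implies a remapping, so an explicit interface would be required:

```fortran
      INTERFACE
        SUBROUTINE solve(x)
          REAL x(100)
!HPF$     DISTRIBUTE x(CYCLIC)   ! dummy mapping differs from the actual
        END SUBROUTINE solve
      END INTERFACE

      REAL a(100)
!HPF$ DISTRIBUTE a(BLOCK)

      CALL solve(a)              ! remapping implied; explicit interface required
```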
Rob asks about scalar arguments; we don't want to require explicit interfaces for scalars.
Rob: amend the second case to read "the mapping of a non-scalar dummy is not the same as that of the corresponding non-scalar actual."
Henry and Carol are worried about implicitly mapped arrays; the default is that they are eligible to be mapped.
Andy: this is an HPF language problem. Ken: this proposal, when added to existing HPF, breaks F90 compatibility further.
The implications for implicitly mapped arrays need to be fixed. The proposal was tabled until the next meeting: 17 - 2 - 1.
Piyush Mehrotra presented one more idea for consideration: facilities for redistribution. An optional list with DYNAMIC would specify the range of distributions that can occur:
Dynamic [(possible-distr-list)]   ! number of items is the same as the rank of the array
possible-distr is (dist-item-list)
dist-item is BLOCK[()] or CYCLIC[()] or var-block or indirect or * or ALL

For example,

Dynamic ((block,cyclic), All, *, Indirect)

restricts the kinds of distributions each dimension of a DYNAMIC 4-dimensional array can take on. Would like to see a proposal: 14 - 0 - 8.
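A hedged sketch of how the restricted DYNAMIC might appear on a declaration (the array name and the directive spelling are illustrative; the syntax above was only proposed, not adopted):

```fortran
      REAL A(100,100,100,100)
!HPF$ DYNAMIC ((BLOCK,CYCLIC), ALL, *, INDIRECT) :: A
! Dim 1 may be redistributed only BLOCK or CYCLIC; dim 2 may take any
! distribution; dim 3 is never redistributed (*); dim 4 only INDIRECT.
```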
Finally, some administrative issues were addressed. Should we meet at the Bristol instead of this hotel if an Arlington hotel is not available? 3 - 7 - many. Meeting dates and sites were discussed. November 1-3 and January 9-12 have already been selected. Dates for the March meeting will be March 13-15; the location is TBD. The possibility of a European meeting in May is being investigated. Good dates for Ken are after the first week of May, up to the first week of June. Barbara Chapman has invited the group to Vienna. Henk Sips will work with Barbara on a date and place - perhaps a workshop followed by a meeting. Ken requested a location with convenient direct flights.
Mary reported that the SC95 BOF is scheduled for Wednesday evening, and that there is a WWW page for the BOF accessible from the SC95 home page. At the November meeting we will determine the specific agenda. Mary also reported on a group called the SSWTG that is working with Cherri Pancake to develop requirements for parallel systems that can be used in RFPs. The group's process has been for users to define a list of desires, vendors to prepare an estimate of difficulty, and finally the user group to vote on a final list. The third meeting of the group is scheduled just after this HPFF meeting.
The proposal schedule for the next meetings was reviewed. Following are the goals for proposal processing.
group E            1st reading    2nd reading
kernel             Sept.          Nov.
interop            Sept.          Nov.
format             Nov.           Jan.   (straw poll in Sept.)
explicit interf    July           Nov.
SPMD-HPF           Nov.           Jan.
local_global func  Sept.          Nov.

group D            1st reading    2nd reading
gen block          July           Sept.
irreg map          July           Nov.
dist exten         Sept. (Nov.)   Nov.
seq. nonmap        Nov.           Jan.
subsets            Sept.          Nov.
derived type       Sept.          Nov.
dist ranges        Nov.           Jan.
shadow width       Sept.          Nov.
OOC                Nov.           Jan.

group C            1st reading    2nd reading
async              July           Nov.
on                 Sept.          Nov.
reduc              Sept.          Nov.
task               Nov.           Jan.
gen transpose      Nov.           Jan.

Also, a proposal may be needed to clarify number_of_processors.
BREAK for 15 minutes
Alok Choudhary presented a proposal for "out-of-core" (OOC) arrays. (Ken comments to Rob that it couldn't be sent to HPFF-core because it is "out-of-core".) The purpose is to support large data sets that do not fit in memory. Compiler and runtime support is needed to manage and access/stage OOC data, transparently to the user.
Questions:
- Distribution of OOC arrays? Any restrictions?
- Loading OOC arrays? E.g. initialization; presumably data comes from some file(s). Given large data sets, parallelism is important in loading/writing.
- Management of OOC arrays should be hidden from the user.

Given those questions:

!hpf$ processors P(2,2)
!hpf$ template T(100,100)
!hpf$ distribute T(block,block) onto P
!hpf$ out-of-core: T
!hpf$ align with T: A, B, C

The compiler can use this information to set up files using whatever model it wishes: shared, local-distributed, etc.

File specification:
!hpf$ Associate( , [,other parameters])

This automatically tells the compiler where the data comes from; [,other parameters] can be used to specify the data organization in the file. Initialization could also be done using a runtime library?
call popen(20, 'huge-data', unformatted)
call file-info(20, 'forge-data')   ! get information about file
call pread(20, A)                  ! in parallel
call file-info(20, 'huge-data')    ! get information about file
call pread(20, A)                  ! in parallel

Other questions:
- Specification of persistence?
- Specification of data organization within an OOC array (i.e., if the compiler is going to create temporary files, a hint about how to organize data within the file)?
Should a proposal happen? 10 - 1 - 11.
Rob comments that this is remapping at program boundaries.
Discussion: F95 doesn't fix NEXTREC as we requested in HPFF94 (the default integer problem); X3J3 turned it down. Jerry explained his difficulty in constructing examples that show the need. Some vendors think it would cause difficulties, and the vendor who was the primary requester is no longer represented on X3J3. We decided not to take any further action.
There was a brief discussion of what can be done to foster HPF acceptance. Mary presented a list of issues that she had prepared for Ken earlier:
- What language features will help? (C interface, etc.)
- Role of F90 acceptance/use?
- Getting HPF into the education system: what new parallel systems do universities have? Do they have HPF yet?
- How will HPF be attractive for SMP clusters?
- Is compiler performance ready yet?
- Digital's approach that HPF is just extra features of the regular Fortran compiler.
- Role of the basic/advanced language idea, both for implementations and for understanding/teaching.

As part of this discussion, David Loveman presented a proposal for a major reorganization of the HPF 2 document. Main points:
- Introduce a part structure.
- Show the kernel features in the part of the document labeled HPF; the rest of the material goes in HPF-extended.
- "hpf-conforming" means having the basic features; the HPF subset is removed.
- Move F95 features to an annex, and use F95 syntax.
- Move library procedures, etc. to an annex.
- Put locals into the extrinsics chapter.
- F77 features go in a separate section, "HPF with Fortran 77"; HPF extended features in a separate section. {Marketing says this needs to be in both sections.}
- And lots of details...

Straw poll on 0th reading: 16 - 0 - 1. X3J3 got free copies of Frame. The proposal is to move the document to MS Word and maybe also to Frame.
A special HPF report, courtesy of Jaspal Subhlok (presented by Andy Meltzer). Top 10 reasons for HPF success:

10. It focuses on what we know best, dense arrays, and does not get distracted by what computational scientists really want: sparse arrays, irregular meshes, etc.
9. It was designed by computer scientists. Languages designed by non-CS people, like Matlab and Mathematica, have clearly been unsuccessful.
8. It was designed by academics. Languages designed by industry people (Fortran, C, C++) are ugly. Academics have good taste and design clean languages (Haskell, Id, EUCLID, ...).
7. It was designed by a committee. It is important to exploit the collective wisdom of a large group of people. Languages designed by 1-2 people are never successful; C and C++ come to mind. When a language is designed by a large committee, Ada for example, nothing gets overlooked.
6. It builds on a proven technology base, F90. This is called risk reduction in industry.
5. The committee has taken its time to do things right. By building on a proven base, you can afford to take time to mull things over. Besides, it's not like we are in a hurry: what's the competition? Uniprocessors will never be powerful enough for most users, and explicitly parallel languages will never be popular.
4. It is difficult for both users and compiler writers. This makes the users respect the language, and it is a full employment act for compiler writers.
3. It is hard to predict the performance of programs. Small changes in the program make big changes in performance. This "Gump Effect" makes a language powerful.
2. It is not C++. Abstract data types, type hierarchies, inheritance are all hot topics, but since C++ is worrying about them, we ignore them.
1. Any problem that costs $10M is a success.
And finally, the LPF report (Andy Meltzer). LPF: 2 of the 1,297,323,241,913,021 proposals submitted ... were accepted.
LPF proposal 235,714,011,122: !LPF$ Nobody_home - the process must wait for the processor to arrive. If the processor doesn't arrive, the process waters the plants, brings in the mail, and steals the TV.
LPF proposal 3,780: !LPF$ ALIEN - asserts that needed data is on a computer in a different country, but with appropriate paperwork it can immigrate.
LPF proposal 14,297,456: !LPF$ Illegal_Alien - ignores the fact that data is not resident, and pays no social security. Use it, then throw it out. The feature is only accepted by programs written by elitist liberals in academia or the media.
LPF proposal 11,222,373: !LPF$ Resident_alien - so long as the alien data is immediately productive, it may stay. If it becomes unproductive, it is kicked out-of-core.
LPF proposal 21,222,891: !LPF$ ADJUNCT_GROUP_SHADOW - assures that there is a group of processors separate from the known processors in the system. They actively censor the other processors.
LPF feels the pain of the HOMEless data left out-of-core. With this in mind, we propose feature #1412521436, !LPF$ TAX_HOMEOWNERS, which gets memory for homeless data by taking it from data with a HOME.
Once this data has a temporary domicile, LPF wants to make the data productive. #1412821438, ON TEMPORARY_SHELTER: this clause must be used fast, before the data is kicked back out-of-core for being a slacker and being unproductive.
In-core data that has not contributed to a solution in more than 2 million operations is to be migrated out-of-core. Why should productive data support data that doesn't want to do anything? (Proposal 2,822,123,721)
Meeting adjourned at 12:00
Next meeting: Nov 1-3 Dallas area.
Attending the Sept. 95 HPFF Meeting:

Robert Babb        U. of Denver            babb@cs.du.edu
Scott Baden        UCSD                    baden@cs.ucsd.edu
Bob Boland         LANL                    wrb@lanl.gov
Zeki Bozkus        The Portland Group      zeki@pgroup.com
Alok Choudhary     Syracuse U.             choudhar@cat.syr.edu
Ian Foster         ANL                     itf@mcs.anl.gov
Tom Haupt          Syracuse U.             haupt@npac.syr.edu
Ken Kennedy        Rice U./CRPC            ken@rice.edu
Charles Koelbel    Rice U.                 chk@cs.rice.edu
John Levesque      APR                     levesque@apri.com
David Loveman      Digital                 loveman@msbcs.enet.dec.com
Piyush Mehrotra    ICASE                   pm@icase.edu
Andy Meltzer       Cray Research           meltzer@cray.com
Carol Munroe       Thinking Machines       munroe@think.com
Carl Offner        Digital                 offner@hpc.pko.dec.com
Guy Robinson       University of Vienna    robinson@vcpc.univie.ac.at
P. Sadayappan      Ohio State U.           saday@cis.ohio-state.edu
Joel Saltz         U. of Maryland          saltz@cs.umd.edu
Rob Schreiber      RIACS                   schreiber@riacs.edu
Yosiki Seo         NEC
Henk Sips          TNO/Delft U of Tech.    henk@cp.tn.tudelft.nl
Guy Steele         Sun Microsystems        guy.steele@east.sun.com
Jaspal Subhlok     Carnegie Mellon         jass@cs.cmu.edu
Jerry Wagener      Oklahoma U., X3J3       jwagener@cs.uoknor.edu
Joel Williamson    HP/Convex               joelw@mozart.convex.com
Henry Zongaro      IBM Canada              zongaro@vnet.ibm.com
Mary Zosel         LLNL                    zosel@llnl.gov