Executive Summary
The primary business of this meeting was the first or second reading of a number of language proposals. The following summarizes their status. Details of the proposals are included in the full minutes below.
Proposals passing second reading: Generalized block distributions.
Proposals still in progress of second reading: Irregular mapping; Explicit interface requirements; Async I/O.
Proposals passing in first reading: Mapping Derived Type Components; Mapping to Subsets of Processors; Specification of Shadow Widths; C interoperability; HPF kernel definition; Function result in Local_to_Global query; Reductions; ON and RESIDENT directives.
Proposals discussed in a preliminary review: Mapping function (won't be pursued); Generalized transpose; Task parallelism functionality; Document reorganization; Out-of-core arrays; SPMD to HPF interface; Restrictions on Dynamic redistribution; Eliminate explicit Sequential mapping.
Other items of special note: There will be an HPFF BOF Wednesday evening at SC95 where the proposals for HPF2 will be reviewed. Two of the areas we are considering, ASYNC I/O and C interoperability, have been identified as "high priority" items in the US official vote for functionality in the next round of Fortran standardization. HPF definitions are likely to form a basis for proposals in these areas.
And a further note about HPF simplification. A proposal for a severely limited kernel of HPF has strong support. There is also a proposal under consideration that this smaller form of the language should be designated HPF2, with all other features considered "extended HPF".
The next HPFF meeting is scheduled for the Dallas area, November 1-3. It is expected to be a similar session of heavy proposal processing.
End of Executive Summary
Sept. 20: Subgroup meetings chaired by Rob Schreiber, David Loveman, and Piyush Mehrotra were held from 1:30 through the evening.
Sept. 21: Ken Kennedy called the meeting to order at 8:40. Introductions and the initial count of installations were made. 27 people from 24 institutions were present.
Generalized block distributions: add to H309:

dist-format is BLOCK [(int-expr)]
           or BLOCK (int-array)
           or ...
Constraint: The int-array appearing in the dist-format of a DISTRIBUTE directive must be a restricted expression.
Semantic constraints:
Text needs to be added on the number of processors; int-array(i) is the size of the block on the ith processor.
Passed: 18 - 1 - 4
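A minimal sketch of the syntax as passed, assuming a 4-processor arrangement (the array and size names are illustrative):

```fortran
!HPF$ PROCESSORS P(4)
      INTEGER, DIMENSION(4), PARAMETER :: SIZES = (/ 10, 20, 30, 40 /)
      REAL A(100)
!HPF$ DISTRIBUTE A (BLOCK(SIZES)) ONTO P
! SIZES(i) is the size of the block on the ith processor: 10+20+30+40 = 100
```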
The next proposal presented was for Irregular Mapping - 2nd Reading:
Add to H309:

dist-format is BLOCK ... or INDIRECT (int-array)

Constraint: The int-array appearing in the dist-format of a DISTRIBUTE directive must be a restricted expression.

Semantic constraints:
Int-array (i) yields the processor number of the ith element of the array
Discussion: Chuck Koelbel gave an alternative, perhaps more natural way to get this same function - e.g. aligning edges with nodes. Rob Schreiber asked about the onto clause. The expectation is that it is a 1 to number of processors. It was agreed that clean-up was needed. An official vote was taken, with a very large abstain vote. Ken asked for clarification of why people were abstaining. There was a combination of reasons - e.g. some because of technical details (what does "yields" mean?), and others because they thought there might be a better way. In light of this, the formal vote was withdrawn, and the proposal was returned to subgroup for more detailed work.
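A sketch of the INDIRECT form as it stood when returned to subgroup (array and names illustrative; in practice the mapping array would likely be computed at run time and used with REDISTRIBUTE):

```fortran
!HPF$ PROCESSORS P(4)
      INTEGER MAP(8)
      REAL X(8)
      DATA MAP / 1, 1, 2, 3, 4, 4, 2, 3 /
!HPF$ DISTRIBUTE X (INDIRECT(MAP)) ONTO P
! MAP(i) yields the processor number holding the ith element of X
```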
Next was a first reading of Mapping function
Add to H309 dist-format is ... or INDIRECT (func-name [,arg- list])
Constraint: func-name is a function with the following properties
The function takes the array index and other values as arguments and returns the processor index. Example:

!HPF$ DISTRIBUTE A (INDIRECT (f,10,100)) ONTO P

Here f(i,10,100) returns the processor index of A(i). There is no restriction on the arguments - the compiler has to make a copy. Advice should be given to use scalars or replicated arrays.
There was a question of whether the array index should be explicit in the declaration, but the i in f(i,10,100) can be supplied by the compiler
Rob asked why the extra restriction on PURE. The reason given was that otherwise the compiler might have to make a copy of everything the function can access.
Chuck asked how to make this a syntactic constraint. Do we need a keyword PURER? Otherwise this is probably better as a semantic constraint. Joel Saltz questioned the purer-than-pure notion, saying one might typically want a strange data structure, e.g. for binary search. Rob pointed out it could be some kind of opaque struct. The issue is that these parameters are passed by value - as if f were evaluated for all k on entry. A copy must be made at definition time, because if the data structure changes later, there is a need to keep the original value. A straw poll was taken: Should this be pursued? 5 - 8 - 13. It won't be pursued.
First Reading of Mapping Derived Type Components:
Defn: A component of a derived type is considered to be mapped if it is either of an intrinsic type and explicitly mapped, or it is a structure and any one of its components is mapped.
Change the first constraint under H303-311 and H312-318:

An object-name mentioned as a distributee/alignee must be a simple name OR A COMPONENT OF A DERIVED TYPE, BUT MAY NOT BE A SUBOBJECT DESIGNATOR OF ANY OTHER TYPE.
Add: Constraint: A component of a derived type may be explicitly mapped if it is of an intrinsic type or if it is a structure and none of its components are mapped.
Constraint: A variable of a derived type can be explicitly mapped if none of the components of the derived type are mapped.
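A hypothetical sketch consistent with these constraints (the type and names are illustrative; the directive syntax is assumed, not settled):

```fortran
      TYPE GRID
         REAL U(100,100)
         REAL V(100,100)
      END TYPE GRID
      TYPE (GRID) :: G
!HPF$ DISTRIBUTE G%U (BLOCK,BLOCK)  ! intrinsic-typed component: may be mapped
! G itself may then NOT be explicitly mapped, since one of its components is.
```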
Note:
Guy Steele asked if the terminology used is right - for component names, subobject designator ... etc.
Ken interjected general group instructions at this time. For the second reading of proposals, we need full text.
There is a need to allow for both a default distribution and an explicit distribution.
straw poll: 17 - 1 - 8
1st reading Mapping to Subsets of Processors:
Change H311 to:

dist-target is processors-name [(section-subscript-list)]
           or * processors-name [(section-subscript-list)]
           or *
Restrict subscripts used to be non-vector-valued: Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Add rules:

subgroup-directive is SUBGROUP procs-name OF target-sect
target-sect is procs-name (section-subscript-list)
Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Constraint: the target processors name must be defined via a PROCESSORS directive (no subgroup of a subgroup). The subgroup vote on this was 2 - 2 - 2.
The subgroup inherits the rank and extents of the target subsection. Example: SUBGROUP PSUB OF P(2:6, 3:15:2) defines PSUB as a (5,7) processor array.
Add rule:

subgroup-dir is SUBGROUP procs-name [(explicit-shape-spec-list)] OF target-sect   (subgroup vote: 2-3-1)

Where, for example:

!HPF$ PROCESSORS P(100)
!HPF$ SUBGROUP P1(50) OF P(1:50)
!HPF$ SUBGROUP P2(5,10) OF P(51:100)
...
Ken asked that we first consider the base proposal of sections. A straw poll of 14 - 2 - 10 indicated yes, go ahead and develop a full proposal for sections.
Next was discussion of the subgroups part of the proposal. Part is syntactic sugar and part has functionality.
The vote about named subgroups was 6 - 3 - 16. This will require a VERY good case in 2nd reading to pass.
Next was discussion about reshaping subgroups - should it be allowed?
Guy asked if he could have P3(4,4) of Q(3:6,4:7).
But what about subgroups of subgroups? If subgroups are added, should we forbid subgroups of subgroups? 14 - 0 - 12
So Guy's point is that either subgroups of subgroups or the 2nd version, but not both.
Jaspal Subhlok asked: why not separate reshaping from subgroups? ... history ...
Rob pointed out that this is a form of align for processors, why not do it with ALIGN?
Piyush asked that we vote on the reshaping in general: strawpoll 9 - 8 - 9 Ken would like to see a full proposal --- but it BETTER be good!!! Jaspal says that this functionality is needed for the tasking proposals.
There was an additional discussion about whether there should be a proposal using align for subgroups. A comment was made that if the proposal comes back with align, it will be a 1st reading instead of a 2nd reading. A straw poll about using align got a vote of 3-9-12.
Carl Offner presented the first reading of a proposal for Specification of Shadow Widths.
Replace H309 (page 26) with:

dist-format is BLOCK [(int-expr [, shadow-spec-or-int-expr])] ...

with the optional specification added to all the different kinds of distributions.
Add new rules:
shadow-spec-or-int-expr is shadow-spec
                        or int-expr
shadow-spec is SHADOW = int-expr
            or LOW_SHADOW = int-expr
            or HIGH_SHADOW = int-expr
            or LOW_SHADOW = int-expr, HIGH_SHADOW = int-expr
            or HIGH_SHADOW = int-expr, LOW_SHADOW = int-expr

Constraint: any int-expr appearing in a shadow-spec or in a shadow-spec-or-int-expr must be a specification-expr with value >= 0.
The absence of a shadow-spec or a shadow-spec-or-int-expr is equivalent to SHADOW = 0. SHADOW = n is equivalent to LOW_SHADOW = n, HIGH_SHADOW = n. A shadow-spec-or-int-expr that is just an int-expr is equivalent to SHADOW = int-expr. Specifying only LOW_SHADOW implies HIGH_SHADOW = 0; specifying only HIGH_SHADOW implies LOW_SHADOW = 0.
The primary reason for doing this is to facilitate the passing of shadowed arrays across subroutine boundaries. There are both storage-allocation and data-motion issues. This is intended to be just advice - to tell the compiler that if it uses shadows, how big to make them. This does expose some implementation issues to the user. Scott Baden asked if the specification should be on the compile line. It was pointed out that any time shadow widths are different across the boundaries of a subroutine call, there is some implied data motion. This requires more things to be passed in the descriptors across the subroutine boundary. The owner of the data is still well defined, and any updates of the data will result in messages to update the shadow values.
Straw vote about should this go to full proposal? 13 - 2 - 10
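A sketch of how the shadow specification might look in use (sizes illustrative, and the exact grammar is still under discussion):

```fortran
      REAL A(100)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A (BLOCK(25, SHADOW=1)) ONTO P
! SHADOW=1 is equivalent to LOW_SHADOW=1, HIGH_SHADOW=1: each processor
! allocates one extra element on each side of its block for neighbor values.
```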
Chuck Koelbel presented the first reading of an ON directive proposal.
Motivation: Some programmers want more control in cases where which processor executes an operation, or whether data is moved, makes a big difference. HPF 1.1 makes these decisions the compiler's job, and some people simply don't trust the compiler.
ON HOME recommends where operations should be executed (like distribute recommends where data should be placed). RESIDENT asserts that data need not move (like independent asserts operations are parallel).
These apply to a single statement or block of statements. They name a processor or set of processors, either directly (i.e. a section of a processor array) or indirectly (e.g. the owner of array or template elements). They tell the compiler to execute the operation on the given processor. Other processors may have to move data (see RESIDENT), and call statements need special consideration. This functionality is most useful inside a parallel loop or region. ON directives can be nested, but the inner processor set must be a subset of the outer set.
Examples
!hpf$ on home (a(i))
      a(i+1) = b(i) + a(i)
---
      do i = 1, n
!hpf$ on home (a(indx(i))) block
      a(indx(i)) = b(i)
      b(i) = c(indx(i))
!hpf$ end on
      end do
---
!hpf$ on home (x(i,:)) block
      do j = 1, m
!hpf$ on home (x(i,j))
      x(i,j) = foo(y(j,i))
      end do
!hpf$ end on

What can go in HOME?
Original proposal: Any reference to a variable, template, or processor arrangement (including sections and vector-valued subscripts).
Amendments from subgroup meeting: Only scalar references and regular sections of variables, templates, and processors. The argument for the amendment is easier implementation. The argument against the amendment is that there is no difference for irregular distributions.
Second amendment: Any function called must be PURE. The argument pro is to ensure no side effects and well- definedness. There was no argument against. There will probably be CCI issues related to function calls in realign.
Ken commended the group on the preparation and presentation detail of the initial proposal. The vote to pursue ON for a second reading passed: 20-4-3.
Some additional detail was then presented.
Calls in ON HOME blocks:
Consider the following example.
!hpf$ ON HOME (a(1))
      call foo(a)
      ...
      subroutine foo(x)
      x(2) = ...
The problem: can foo be compiled using the owner- computes rule?
Suggested solutions are that the called routine must have a valid on-home. There might be a declarative form of ON that could go in an explicit interface, or every processor might execute every call.
There was discussion about whether this forces 1-sided communications. We could say that all procs execute all calls and add some other form of ON HOME that eliminates this requirement. There was a query about whether we are really looking for something similar to hpf-serial, or need a purer-than-pure routine.
A straw poll was taken about defining some declaration usable in an explicit interface to have the solution: 7 - 1 - 18
Resident directive: (This was called LOCAL in the paper version of the proposal.) This can be an optional clause in ON HOME or a free-standing directive. It gives an optional list of variable references (i.e. variable names, elements, regular sections, ...). Each reference (and, therefore, any subobject of the reference) is stored on the processors invoked in the surrounding ON HOME. There was some refinement of the definition that resulted from the subgroup discussions. For a read reference, at least one of the ON processors must store a copy; for a write reference, the ON processors store all copies.
Terms of the form
Resident examples
!HPF$ ON HOME (x(k)), RESIDENT (x(indx(k)))
      x(k) = x(indx(k)) + y(indx(k))   ! know about x references but not indx and y
-----
!HPF$ ON HOME (x(j)), RESIDENT (ALL=x)
      x(j) = x(j) * x(ipermute(j)) * x(j+1) * y(j-1)   ! know about all x, not ipermute, y
------
!HPF$ ON HOME (procs(1:np/2)), RESIDENT
      call foo(a,b)   ! what does this say about what happens inside foo?
As a further example for discussion:
      ON HOME (proc(1)), RESIDENT
      call f(x(a))
      ...
      subroutine f(a)
      a = B(2)   ! Rob says ok, RESIDENT not in this scope; Piyush says illegal if not on p1
It was pointed out that pure can only call pure, and it must be declared. We might have the restriction that the subroutine has to be declared resident and everything it calls must also have the declaration.
!HPF$ ON HOME (indx1(i)) block   ! example of free-standing RESIDENT
      n1 = indx1(i)
      n2 = indx2(i)
!HPF$ RESIDENT (x(n1)) block
      tmp = y(n1) - y(n2)
      x(n1) = x(n1) + tmp
      x(n2) = x(n2) - tmp
!HPF$ end resident
!HPF$ end on
Why ALL is needed:
!HPF$ ALIGN y(i) WITH x(i)
!HPF$ ON HOME (procs(1:2)), RESIDENT (ALL=x)
      x(i) = x(indx1(i))
      y(i) = y(indx2(i))   ! is y(indx2(i)) resident?
!HPF$ END ON

(no align)
!HPF$ ON HOME (procs(1:2)), RESIDENT (ALL=x)
      x(i) = x(indx1(i))
      y(i) = y(indx2(i))   ! y(indx2(i)) is resident!
!HPF$ END ON
This relates to whether we are talking about the address or the reference.
Guy expressed an opinion that ALL is confusing - we need something else. (And, tongue in cheek, suggested that maybe it should be *.)
First reading straw vote ON RESIDENT : 9 - 1 - 17
LUNCH BREAK
The group restarted at 1:05.
Overview examples of what is in the written proposal:
Intrinsic reduction
!hpf$ independent
      do i = 1, n
!hpf$    reduce
         x = x + f(i)
      enddo
Defined Reduction
!HPF$ independent, ..., REDUCTION(list; COMBINE=concat; IDENTITY=emptylist)
      ! list might actually be a "list" of reduction variables, but concat must
      ! be defined for all of the types included. Also emptylist might be a function.
      do i = 1, n
!hpf$    reduce
         list = append(list, element)
      enddo
It was suggested that the syntax should be "," instead of ";" because the combine= keyword will disambiguate. A question was asked about any restrictions on the kind of function that is used for combine.
Guy suggested that the identity be identified in the interface block for the operator. The combine operator might also be in the interface block --- then the whole reduction statement goes away. The information it provides would already be known to the compiler.
(next example slide)
      real x(4,8), y(8,8)
!hpf$ independent, new y
      do i = 1, 1000
         y = ...
!hpf$    reduce
         x = matmul(x,y)
      enddo
Implementation: there would be a "local" x per consecutive block of iterations, with a (4,4) identity and a constrained (non-commutative) fan-in combine. There was a discussion of the commutativity of the matrix example, the shape of the identity, and shape results for the combines. Intermediate temporaries are the same shape as y.
Straw polls: Intrinsic reduction (without things like matmul): 22 - 3 - 1
Adding things like matmul: 8 - 7 - 10
Defined reductions that are commutative and associative (with the ideas proposed by Guy to simplify them), including discussion of whether the reduce statement belongs on the INDEPENDENT rather than on the actual reduction: 17 - 6 - 3. (All no votes, and most abstentions, were from vendors.)
For intrinsic reductions: no mixed reduction operators; the combine entity is implied.
Question: should it be an error if the user gives an identity operation that isn't valid?
Summary
READ  (..., ID = scalar-int-var, ...)
WRITE (..., ID = scalar-int-var, ...)
WAIT  (ID = ...)
Both the I/O statements and WAIT may have IOSTAT and ERR args.
(2) WAIT (UNIT = int, POLL = 'ID', ID = int, DONE = lvar)
    WAIT (POLL = 'ALL', UNIT = int, DONE = lvar)
(3) UNFORMATTED, DIRECT files only
(4) Multiple outstanding reads ok, multiple outstanding writes ok, but not both!
Note: in this proposal there is no indication in the OPEN statement ... and the ASYNC keyword is gone.
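A sketch of how the proposed statements might be used together (unit number, record size, and names are illustrative):

```fortran
      REAL BUF(1000)
      INTEGER ID1
      OPEN (10, FILE='data', FORM='UNFORMATTED', ACCESS='DIRECT', RECL=4000)
      WRITE (10, REC=1, ID=ID1) BUF   ! initiates the write and returns
      ...                             ! overlap computation with the I/O
      WAIT (ID=ID1)                   ! block until the write completes
```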
Alok questions the need for an explicit ban on writing the same block simultaneously.
straw poll ... if there are 2 outstanding writes on the same block should we guarantee the second? 3 - 3 - 15!!! Should it be processor dependent? 5 in favor; non-conforming? 2 in favor.
We need a generalized transpose and F95 isn't doing it.
GEN_TRANSPOSE (ARRAY, ORDER)
  ARRAY  - any type, rank N
  ORDER  - integer, shape = [N]
  RESULT - RS(ORDER) == AS, where RS is the shape of the result and AS is the shape of ARRAY
  RESULT (J(1), J(2), ..., J(N)) = ARRAY (J(ORDER(1)), J(ORDER(2)), ..., J(ORDER(N)))
We might do this with reshape , but reshape is pretty awful. This would be a recognizable (efficient) version of the special useful case. But should we just lobby vendors to do a better job with this special case? In reshape, a SHAPE is required. Another possibility is to overload transpose and define it for higher dimensions (reverse) where order is optional. This then makes it an intrinsic. Straw poll about developing a proposal for overloaded transpose: 17 - 1 - 4.
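For comparison, here is the RESHAPE route the group found awkward; under the definition above, GEN_TRANSPOSE(A, (/3,1,2/)) would satisfy R(j1,j2,j3) = A(j3,j1,j2) (array shapes illustrative):

```fortran
      REAL A(2,3,4), R(3,4,2)
      ! the result shape must be spelled out explicitly, which is part of
      ! what makes this clumsy; RS(ORDER(k)) = AS(k) gives (3,4,2)
      R = RESHAPE(A, SHAPE=(/3,4,2/), ORDER=(/3,1,2/))
```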
CCI 31: Should number_of_processors reflect the processor "structure"? Currently HPF 1.1 defines the result as "vendor-independent". The subgroup decided that this is appropriate for HPF 1.n.
But that HPF2 might want to consider something different. Consider that number_of_processors is used for controlling data distribution (where it really refers to number_of_memories) and might also be useful for work distribution (e.g. ON). Sorting this out and doing the right thing for SMP's is a non-trivial issue and a topic for HPF2 (or 3?) - not a CCI. We may need additional calls to reflect the different functionality.
CCI 33: local_to_global currently can only be called for arrays bound to global actuals. Should we also be able to inquire about the function result? In theory the answer is yes. This appears to be a very simple change to the document. But, there may be a small additional runtime cost for HPF to HPF_LOCAL calls. The primary discussion was whether or not this was a CCI (fix for HPF 1.2) or new proposal (HPF 2.0). The subgroup recommended HPF2, with vendors free to retrofit to HPF 1.n if they wish. This was confirmed by an institutional vote 19-0-2.
There was discussion for straw-polls on some of the specific details.
What distributions should be in the kernel? The subgroup considered 4 possibilities:
BLOCK (only)
BLOCK, CYCLIC
BLOCK, CYCLIC(n)
BLOCK, BLOCK(n)
Ken led a discussion and set of straw votes, starting with the assumption that BLOCK is in: what else should be included?
Straw polls:
cyclic(n) - 6 - 7 - 10   ! some support, but?
block(n)  - 3 - 10 - 6   ! not really any support
cyclic    - 10 - 5 - 9   ! yes, people think

Next was a discussion of the options for ALIGN. These include:
straight alignment ALIGN A(I,J) WITH B(I,J) only - exactly the same shape
straight alignment ALIGN A(I,J) WITH B(I,J), but allowing different extents
allow only ":" (just identity alignment - no reference to I, J)
allow only ":" and "*" alignments (adding dimensional collapse and replication)
add offsets
No one supported permutations of subscripts or strides for the kernel.
Straw poll on offsets in addition to everything else: 5 - 9 - 11
Straw poll on different extents 13 - 3 - 7
Straw poll on collapse 13 - 3 - 10 and replication 13-3-10.
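A sketch of the alignment forms that polled well for the kernel - identity ":" alignment plus "*" for collapse and replication (arrays illustrative):

```fortran
!HPF$ ALIGN A(:,:) WITH B(:,:)   ! identity alignment
!HPF$ ALIGN C(:) WITH B(:,*)     ! replicate C across B's second dimension
!HPF$ ALIGN D(:,*) WITH E(:)     ! collapse D's second dimension
```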
("Slow" people were asked to sit on the far side of the room.)
After this vote, there was a reconsideration of CYCLIC (N) in the context of simpler aligns: 8 - 8 - 10.
Andy pointed out that removing SEQUENCE from the HPF kernel means that there is no EQUIVALENCE, common blocks must be the same everywhere, no assumed size, and no arguments with mismatched shapes.
No new vote on the overall kernel was recorded in the minutes. The disposition of the kernel is tied to the discussion of document reorganization.
Changes to HPF are:
Interoperability example:
      interface
         extrinsic (C) function cfunc(w, x, y, z, a, p) name("Cfunc")
            real, map_to(cfloat) :: cfunc
            type(my_type) :: w
            integer, map_to(short) :: x
            real, map_to(float) :: y
            integer, map_to(char) :: z
            integer (kind=4), map_to(short, layout=c_array) :: a(100)
            integer (kind=c_void_pointer), map_to(pointer) :: p
         end function
      end interface

The call:

      type(my_type) :: w
      integer :: x, z
      real :: y
      integer (kind=4) :: a(100)
      integer (kind=c_void_pointer) :: p
      r = cfunc(w, x, y, z, a, loc(p))

C prototype:

      float cfunc(struct my_type w, short x, float y, char z, short a[100], int *p)
What if you need to prepare a field of a structure, or ??? One might want a convert-to-c-type function, or one's own function that converts. There was a question about converting file pointers ... but this hasn't been addressed. Some concern about adding LOC was expressed. Jerry Wagener asked if we should pick something to be the equivalent of "*" in C instead of LOC. (More chuckles about the multiplicity of functions that * seems natural for.)
Miles Ellis (sp?) from Oxford is current chair for the ISO group chartered with C interoperability, but Jerry Wagener (and probably X3J3/ISO) is looking to what we do here.
Strawpoll on the proposed interface - 16 - 1 - 7.
We next had a strawpoll about whether user-defined map functions to convert user-defined types should be addressed: 13 - 2 - 10.
The vote on conversion functions convert-to-c(x,ctype- "float") was similar: 13 - 1 - 10.
Carl Offner next took a straw poll related to the proposal for changing the requirements for explicit subroutine interfaces.
Should we remove the distinction between prescriptive and descriptive? Concern is that it may break current programs, but there is a lot of resulting simplification in definition. Yes remove: 18-0-7
Mary asked for an additional strawpoll: should we simply require explicit interfaces everywhere? It was argued this raises the threshold for learning - and for simple programs. Requiring them everywhere was voted down: 6 - 9 - 10.
Scott Baden presented a 0th reading of his document about Calling HPF from an Extrinsic Language - SPMD to HPF. Multi-Data-Parallel is one way to think of this.
The motivation is to coordinate multiple HPF programs. Potential uses are SPMD coordination interfaces such as adaptive multiblock, MPMD such as multi-disciplinary or task parallel, and SMP clusters or processor subsets where a coordination layer handles external communications.
The proposed model:
Data motion between HPF programs is handled explicitly at extrinsic level, e.g. through MPI inter communications.
As a restriction, you can't pass mapped arrays allocated within different communicators to the same HPF routine, because the set of processors is not well defined.
Outstanding issues are: Language independent definition and data conversion.
Changing a processor topology for an HPF program and overlapping communications for 2 or more HPF programs.
Should HPF common blocks be accessible from SPMD - perhaps only if they are sequential (kernel)?
Restriction #4 in the paper should be dropped.
Some general comments were:
General straw poll about developing a proposal for callability of HPF from SPMD environment: 19 - 0 - 5.
Jerry Wagener gave a report on F95 status. In their public review process, they got 3 communications from HPFF ... a general letter plus detailed comments from Carol Munroe. There were about 450 comments and they resolved all but 2. They also got a number of suggestions for F2000. Reply letters will come in about November.
The US vote on the proposed standard was NO, with 20 minor fixes. (It will change to yes when the fixes are made --- e.g. there was a missing "not" in one place.)
WG5 will make the actual changes. They expect it to be official in mid 96.
The requirements for F2000, including some from HPFF have been logged into a journal of requirements. These are available from:
ftp.ncsa.uiuc.edu X3J3 document 004
US recommendations for requirements were made with 6 high and 17 medium. All 3 of the HPFF recommendations (interop, fp except, async I/O) made the US "high" list.
Floating point exception handling is well along; John Reid is taking the lead on this. C interop is highest on everyone's list, but partially for political reasons no one has taken the ball. And for async I/O, there is a good chance that the HPFF definition will be it.
Jerry gave a review of floating point exception issues. There are now two approaches. The long-time approach is via an ENABLE construct.
      ENABLE ...   ! conditions may be set
      HANDLE       ! conditions cleared
      END ENABLE

      module conditions
         type condition
            private
            logical :: flag
         end type condition
         type (condition) :: overflow
         type (condition) :: divide_by_zero
         ...
         type (condition), parameter :: quiet = condition(.false.)
         type (condition), parameter :: signaling = condition(.true.)
         ...
      end module conditions
The other approach uses intrinsic procedures and is more specific to floating point:
logical function IEEE_NAN logical function IEEE_INF logical function IEEE_UNDERFLOW
15 query functions allow you to inquire about flags, etc.
subroutine IEEE_flag_set(flag, value)
subroutine IEEE_flags_clear
logical function IEEE_flag_get(flag)

There are 28 procedures in all.
      module IEEE_arithmetic
         type IEEE_flag
            private
            integer :: i
         end type IEEE_flag
         type (IEEE_flag), parameter :: overflow = IEEE_flag(3)
The problems associated with these approaches have to do with how to define the enable action on a data parallel operation where the overflow happens on only one processor.
      enable (overflow)
         A = B*C      ! data parallel operation
      handle
         ...          ! what to do with overflow on only one processor?
      end enable
Does each processor have its own local overflow flag? That may be the way that the hardware works, but it is not the way that F90 module objects work. If there are local flags, is the "handling" done locally or later? Guy Steele asked about what would happen if this were the equivalent F77-style loop.
The other problem occurs where an outer procedure declares a handler and calls a middle procedure, which calls an inner procedure that has enabled overflow with no handler. If overflow occurs, control returns to the middle layer where nothing is enabled, so nothing happens. Partly this problem stems from separate compilation.
The group finished for the day about 5:45.
CCI 28: ALIGN WITH A(*,:) :: B(:) is misleading - so this has been changed so that what the user intends is ALIGN (:) WITH A(*,:) :: B
Proposed changes:
Page 24, line 6: Change "" to " "
Page 24, after line 13: add
   H303 is [(explicit-shape-spec-list)] or
   H304 is or
Page 24, lines 24, 26: Change " " to " "
Page 24, lines 34, 35: Change " " to " "

CCI 11: Pointers cannot be mapped. Can they be associated with both sequential and non-sequential targets? The subgroup recommends that pointers cannot point to sequential variables. The full group didn't think this was right and asked the subgroup to try again.
Next Piyush asked for some guidance for the subgroup on indirect mapping: The subgroup voted 0-5-0 : Use indirect align instead of (replacement for) indirect distribute; 5-0-0 Must use an ONTO clause whenever INDIRECT is used in a distribute and redistribute. Votes from the full group were: Indirect distribute over indirect align if you can only do one: 15 - 0 -2. Develop a proposal to consider addition of indirect align? 5 - 7 - 7.
A new proposal was presented for group opinion: no sequential variables can be explicitly mapped. Why? Mapping of 1-d aggregate covers has never been implemented, and EQUIVALENCE can be largely replaced by using modules (with renaming), dynamic memory allocation, and TRANSFER. Should a full proposal be developed? 13 - 3 - 5.
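A minimal sketch of the TRANSFER-based replacement for EQUIVALENCE mentioned in the proposal (variables illustrative):

```fortran
      real    :: r
      integer :: i
      ! instead of EQUIVALENCE (r, i), copy the bit pattern explicitly:
      r = 1.0
      i = transfer(r, i)   ! i now holds the bit pattern of r
```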
Jaspal Subhlok gave an overview of Task Parallelism proposals for HPF:

processor subgroups, with variables mapped to subgroups
execution ON processor subgroups
task regions

Subgroups and ON are already being discussed. Variable visibility in task regions is the primary issue. The processor subgroup code has unrestricted access to local data, while other code has unrestricted access to all data. A subgroup can access global data only if that would not cause a data dependence - i.e. (a) access is read only, or (b) access is exclusively by one subgroup. And subgroup code sections are single entry, single exit.
Execution Model
For subgroup P1: processors execute code that is ON P1, skip code that is ON anything else, and execute other code (ALL) like (normal) HPF code. This allows "old fashioned" parallel sections:
      task region
      ON P1
         ...
      ENDON
      ON P2
         ...
      ENDON
      end region

And pipelining:
      task region
      DO i = 1, 100
      ON P1
         read and compute (A1)
      ENDON
         A2 = A1
      ON P2
         use A2, print something
      ENDON
      end region

Note: this proposal assumed that there are named subgroups for allocation of data. This is impacted by earlier group votes about processor subgroups and names.

Rob added an example:
      on home (P(lo)) call F
      on home (P(hi)) call G

This is ok as long as there is nothing touched by both that is written by one of them (causing some wait). Substantial analysis is required for compilers to determine this. If there are no dependencies, the regions can run in parallel.
Consider:
      A = F(A)   1:3
      B = A      1:6
      B = G(B)   4:6
      C = B      4:9
      D = W(C)   7:9

where this is in a loop and the x:y following each statement represents which processors execute that line (are in the subgroup named for the region). In this example, procs 1:3 could start the next iteration as soon as they are finished with B = A.
Note that this is related to Scott's multiparallel proposal.
Should group C develop a proposal along this line for task parallelism: 17 - 2 - 2.
For next time: prepare examples to show the differences between Rob's and Jaspal's versions ... and show how it works on SMPs.
Carl Offner --- 2nd reading Subroutine interfaces
24/48        change "three" to "two"
25/7-11      delete (explanation of descriptive syntax)
26/39        delete *(dist-format-list) in H308
26/47        delete *processors-name in H311
27/18-22     delete (2 constraints)
33/22        delete *align-target [(align-subscript-list)] in H320
33/47-48     delete (constraint)
34/1         delete (constraint)
45/45 - 46/3 example now needs an explicit interface

Rewrite section 3.10 to delete all references to descriptive mappings, and to include and be consistent with the following statement:
An explicit interface is required in each of the following cases:
- An argument is passed transcriptively or with the INHERIT attribute.
- The mapping of a dummy argument is not the same as the mapping of the corresponding actual argument.
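As a hedged illustration of the second case (the names, sizes, and mappings here are invented for this example): a caller passing a BLOCK-distributed actual to a CYCLIC-distributed dummy implies a remapping, so an explicit interface would be required:

```fortran
      INTERFACE
        SUBROUTINE solve(x)
          REAL x(100)
!HPF$     DISTRIBUTE x(CYCLIC)   ! dummy mapping differs from the actual
        END SUBROUTINE solve
      END INTERFACE

      REAL a(100)
!HPF$ DISTRIBUTE a(BLOCK)

      CALL solve(a)              ! remapping implied; explicit interface required
```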
Rob asks about scalar arguments; we don't want to require explicit interfaces for scalars.
Rob: amend the second case to read "the mapping of a non-scalar dummy is not the same as that of the corresponding non-scalar actual."
Henry and Carol are worried about implicitly mapped arrays; the default is that they are eligible to be mapped.
Andy: this is an HPF language problem. Ken: this proposal, when added to existing HPF, breaks F90 compatibility further.
The implications for implicitly mapped arrays need to be fixed. The proposal was tabled until the next meeting: 17 - 2 - 1.
Piyush Mehrotra presented one more idea for consideration: facilities for redistribution. An optional list with DYNAMIC would specify the range of distributions that can occur:
Dynamic [(possible-distr-list)]   ! number of items is the same as the rank of the array
possible-distr is (dist-item-list)
dist-item is BLOCK[()] or CYCLIC[()] or var-block or indirect or * or ALL

For example,

Dynamic ((block,cyclic), All, *, Indirect)

restricts the kinds of distributions each dimension of a DYNAMIC 4-dimensional array can take on. Would like to see a proposal: 14 - 0 - 8.
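A hedged sketch of how the restricted DYNAMIC might appear on a declaration (the array name and the directive spelling are illustrative; the syntax above was only proposed, not adopted):

```fortran
      REAL A(100,100,100,100)
!HPF$ DYNAMIC ((BLOCK,CYCLIC), ALL, *, INDIRECT) :: A
! Dim 1 may be redistributed only BLOCK or CYCLIC; dim 2 may take any
! distribution; dim 3 is never redistributed (*); dim 4 only INDIRECT.
```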
Finally, some administrative issues were addressed. Should we meet at the Bristol instead of this hotel if an Arlington hotel is not available? 3 - 7 - many. Meeting dates and sites were discussed. November 1-3 and January 9-12 have already been selected. Dates for the March meeting will be March 13-15; the location is TBD. The possibility of a European meeting in May is being investigated. Good dates for Ken are after the first week of May, up to the first week of June. Barbara Chapman has invited the group to Vienna. Henk Sips will work with Barbara on a date and place - perhaps a workshop followed by a meeting. Ken requested a location with convenient direct flights.
Mary reported that the SC95 BOF is scheduled for Wednesday evening, and that there is a WWW page for the BOF accessible from the SC95 home page. At the November meeting we will determine the specific agenda. Mary also reported on a group called the SSWTG that is working with Cherri Pancake to develop requirements for parallel systems that can be used in RFPs. The group's process has been for users to define a list of desires, vendors to prepare an estimate of difficulty, and finally the user group to vote on a final list. The third meeting of the group is scheduled just after this HPFF meeting.
The proposal schedule for the next meetings was reviewed. Following are the goals for proposal processing.
group E            1st reading    2nd reading
kernel             Sept.          Nov.
interop            Sept.          Nov.
format             Nov.           Jan.   (straw poll in Sept.)
explicit interf    July           Nov.
SPMD-HPF           Nov.           Jan.
local_global func  Sept.          Nov.

group D            1st reading    2nd reading
gen block          July           Sept.
irreg map          July           Nov.
dist exten         Sept. (Nov.)   Nov.
seq. nonmap        Nov.           Jan.
subsets            Sept.          Nov.
derived type       Sept.          Nov.
dist ranges        Nov.           Jan.
shadow width       Sept.          Nov.
OOC                Nov.           Jan.

group C            1st reading    2nd reading
async              July           Nov.
on                 Sept.          Nov.
reduc              Sept.          Nov.
task               Nov.           Jan.
gen transpose      Nov.           Jan.

Also, a proposal may be needed to clarify number_of_processors.
BREAK for 15 minutes
Alok Choudhary presented a proposal for "out-of-core" (OOC) arrays. (Ken comments to Rob that it couldn't be sent to HPFF-core because it is "out-of-core".) The purpose is to support large data sets that do not fit in memory. Compiler and runtime support is needed to manage and access/stage OOC data, transparently to the user.
Questions:
- Distribution of OOC arrays? Any restrictions?
- Loading OOC arrays? E.g. initialization; presumably data comes from some file(s). Given large data sets, parallelism is important in loading/writing.
- Management of OOC arrays should be hidden from the user.

Given those questions:

!hpf$ processors P(2,2)
!hpf$ template T(100,100)
!hpf$ distribute T(block,block) onto P
!hpf$ out-of-core: T
!hpf$ align with T: A, B, C

The compiler can use this information to set up files using whatever model it wishes: shared, local-distributed, etc.

File specification:
!hpf$ Associate( , [,other parameters])

This automatically tells the compiler where the data comes from; [,other parameters] can be used to specify the data organization in the file. Initialization could also be done using a runtime library?
call popen(20, 'huge-data', unformatted)
call file-info(20, 'forge-data')   ! get information about file
call pread(20, A)                  ! in parallel
call file-info(20, 'huge-data')    ! get information about file
call pread(20, A)                  ! in parallel

Other questions:
- Specification of persistence?
- Specification of data organization within an OOC array (i.e., if the compiler is going to create temporary files, a hint about how to organize data within the file)?
Should a proposal happen? 10 - 1 - 11.
Rob comments that this is remapping at program boundaries.
Discussion: F95 doesn't fix NEXTREC as we requested in HPFF94 (the default integer problem); X3J3 turned it down. Jerry explained his difficulty in constructing examples that show the need. Some vendors think it would cause difficulties, and the vendor who was the primary requester is no longer represented on X3J3. We decided not to take any further action.
There was a brief discussion of what can be done to foster HPF acceptance. Mary presented a list of issues that she had prepared for Ken earlier:
- What language features will help? (C interface, etc.)
- Role of F90 acceptance/use?
- Getting HPF into the education system: what new parallel systems do universities have? Do they have HPF yet?
- How will HPF be attractive for SMP clusters?
- Is compiler performance ready yet?
- Digital's approach that HPF is just extra features of the regular Fortran compiler.
- Role of the basic/advanced language idea, both for implementations and for understanding/teaching.

As part of this discussion, David Loveman presented a proposal for a major reorganization of the HPF 2 document. Main points:
- Introduce a part structure.
- Show the kernel features in the part of the document labeled HPF; the rest of the material goes in HPF-extended.
- "hpf-conforming" means having the basic features; the HPF subset is removed.
- Move F95 features to an annex, and use F95 syntax.
- Move library procedures, etc. to an annex.
- Put locals into the extrinsics chapter.
- F77 features go in a separate section, "HPF with Fortran 77"; HPF extended features in a separate section. {Marketing says this needs to be in both sections.}
- And lots of details...

Straw poll on 0th reading: 16 - 0 - 1. X3J3 got free copies of Frame. The proposal is to move the document to MS Word and maybe also to Frame.
A special HPF report, courtesy of Jaspal Subhlok (presented by Andy Meltzer). Top 10 reasons for HPF success:

10. It focuses on what we know best, dense arrays, and does not get distracted by what computational scientists really want: sparse arrays, irregular meshes, etc.
9. It was designed by computer scientists. Languages designed by non-CS people, like Matlab and Mathematica, have clearly been unsuccessful.
8. It was designed by academics. Languages designed by industry people (Fortran, C, C++) are ugly. Academics have good taste and design clean languages (Haskell, Id, EUCLID, ...).
7. It was designed by a committee. It is important to exploit the collective wisdom of a large group of people. Languages designed by 1-2 people are never successful; C and C++ come to mind. When a language is designed by a large committee, Ada for example, nothing gets overlooked.
6. It builds on a proven technology base, F90. This is called risk reduction in industry.
5. The committee has taken its time to do things right. By building on a proven base, you can afford to take time to mull things over. Besides, it's not like we are in a hurry: what's the competition? Uniprocessors will never be powerful enough for most users, and explicitly parallel languages will never be popular.
4. It is difficult for both users and compiler writers. This makes the users respect the language, and it is a full employment act for compiler writers.
3. It is hard to predict the performance of programs. Small changes in the program make big changes in performance. This "Gump Effect" makes a language powerful.
2. It is not C++. Abstract data types, type hierarchies, inheritance are all hot topics, but since C++ is worrying about them, we ignore them.
1. Any problem that costs $10M is a success.
And finally, the LPF report (Andy Meltzer). LPF: 2 of the 1,297,323,241,913,021 proposals submitted ... were accepted.
LPF proposal 235,714,011,122: !LPF$ Nobody_home - the process must wait for the processor to arrive. If the processor doesn't arrive, the process waters the plants, brings in the mail, and steals the TV.
LPF proposal 3,780: !LPF$ ALIEN - asserts that needed data is on a computer in a different country, but with appropriate paperwork it can immigrate.
LPF proposal 14,297,456: !LPF$ Illegal_Alien - ignores the fact that data is not resident, and pays no social security. Use it, then throw it out. The feature is only accepted by programs written by elitist liberals in academia or the media.
LPF proposal 11,222,373: !LPF$ Resident_alien - so long as the alien data is immediately productive, it may stay. If it becomes unproductive, it is kicked out-of-core.
LPF proposal 21,222,891: !LPF$ ADJUNCT_GROUP_SHADOW - assures that there is a group of processors separate from the known processors in the system. They actively censor the other processors.
LPF feels the pain of the HOMEless data left out-of-core. With this in mind, we propose feature #1412521436, !LPF$ TAX_HOMEOWNERS, which gets memory for homeless data by taking it from data with a HOME.
Once this data has a temporary domicile, LPF wants to make the data productive. #1412821438, ON TEMPORARY_SHELTER: this clause must be used fast, before the data is kicked back out-of-core for being a slacker and being unproductive.
In-core data that has not contributed to a solution in more than 2 million operations is to be migrated out-of-core. Why should productive data support data that doesn't want to do anything? (Proposal 2,822,123,721)
Meeting adjourned at 12:00
Next meeting: Nov 1-3 Dallas area.
Attending the Sept. 95 HPFF Meeting:

Robert Babb        U. of Denver            babb@cs.du.edu
Scott Baden        UCSD                    baden@cs.ucsd.edu
Bob Boland         LANL                    wrb@lanl.gov
Zeki Bozkus        The Portland Group      zeki@pgroup.com
Alok Choudhary     Syracuse U.             choudhar@cat.syr.edu
Ian Foster         ANL                     itf@mcs.anl.gov
Tom Haupt          Syracuse U.             haupt@npac.syr.edu
Ken Kennedy        Rice U./CRPC            ken@rice.edu
Charles Koelbel    Rice U.                 chk@cs.rice.edu
John Levesque      APR                     levesque@apri.com
David Loveman      Digital                 loveman@msbcs.enet.dec.com
Piyush Mehrotra    ICASE                   pm@icase.edu
Andy Meltzer       Cray Research           meltzer@cray.com
Carol Munroe       Thinking Machines       munroe@think.com
Carl Offner        Digital                 offner@hpc.pko.dec.com
Guy Robinson       University of Vienna    robinson@vcpc.univie.ac.at
P. Sadayappan      Ohio State U.           saday@cis.ohio-state.edu
Joel Saltz         U. of Maryland          saltz@cs.umd.edu
Rob Schreiber      RIACS                   schreiber@riacs.edu
Yosiki Seo         NEC
Henk Sips          TNO/Delft U of Tech.    henk@cp.tn.tudelft.nl
Guy Steele         Sun Microsystems        guy.steele@east.sun.com
Jaspal Subhlok     Carnegie Mellon         jass@cs.cmu.edu
Jerry Wagener      Oklahoma U., X3J3       jwagener@cs.uoknor.edu
Joel Williamson    HP/Convex               joelw@mozart.convex.com
Henry Zongaro      IBM Canada              zongaro@vnet.ibm.com
Mary Zosel         LLNL                    zosel@llnl.gov