Executive Summary
The primary business of this meeting was the first or second reading of a number of language proposals. Following summarizes the status. Details of the proposals are included in the full minutes below.
Proposals passing second reading: Generalized block distributions.
Proposals still in progress of second reading: Irregular mapping; Explicit interface requirements; Async I/O.
Proposals passing in first reading: Mapping Derived Type Components; Mapping to Subsets of Processors; Specification of Shadow Widths; C interoperability; HPF kernel definition; Function result in Local_to_Global query; Reductions; ON and RESIDENT directives.
Proposals discussed in a preliminary review: Mapping function (won't be pursued); Generalized transpose; Task parallelism functionality; Document reorganization; Out-of-core arrays; SPMD to HPF interface; Restrictions on Dynamic redistribution; Eliminate explicit Sequential mapping.
Other items of special note: There will be an HPFF BOF Wednesday evening at SC95 where the proposals for HPF2 will be reviewed. Two of the areas we are considering, ASYNC I/O and C interoperability, have been identified as "high priority" items in the US official vote for functionality in the next round of Fortran standardization. HPF definitions are likely to form a basis for proposals in these areas.
And a further note about HPF simplification. A proposal for a severely limited kernel of HPF has strong support. There is also a proposal under consideration that this smaller form of the language should be designated HPF2, with all other features considered "extended HPF".
The next HPFF meeting is schedule for the Dallas area November 1-3. It is expected to be a similar session of heavy proposal processing.
End of Executive Summary
Sept. 20: Subgroup meetings chaired by Rob Schreiber, David Loveman, and Piyush Mehrotra were held from 1:30 through the evening.
Sept. 21: Ken Kennedy called the meeting to order at 8:40. Introductions and the initial count of installations were made. 27 people from 24 institutions were present.
Generalized block distributions: add to H09 dist-format is BLOCK [(int-expr)] BLOCK [(int-array)] ...
Constraint: The int-array appearing in the dist-format of a DISTRIBUTE directive must be a restricted expression.
Semantic constraints:
need to add text on number of processors int-array(i) is the size of the block on the ith processor
Passed: 18 - 1 - 4
The next proposal presented was for Irregular Mapping - 2nd Reading:
Add to H309 dist-format is BLOCK ... or INDIRECT (int-array) Constraint: The int-array appearing in the dist- format of a DISTRIBUTE directive must be a restricted expression. Semantic constraints:
Int-array (i) yields the processor number of the ith element of the array
Discussion: Chuck Koelbel gave an alternative, perhaps more natural way to get this same function - e.g. aligning edges with nodes. Rob Schreiber asked about the onto clause. The expectation is that it is a 1 to number of processors. It was agreed that clean-up was needed. An official vote was taken, with a very large abstain vote. Ken asked for clarification of why people were abstaining. There was a combination of reasons - e.g. some because of technical details (what does "yields" mean?), and others because they thought there might be a better way. In light of this, the formal vote was withdrawn, and the proposal was returned to subgroup for more detailed work.
Next was a first reading of Mapping function
Add to H309 dist-format is ... or INDIRECT (func-name [,arg- list])
Constraint: func-name is a function with the following properties
The functions takes the array index and other values as arguments and returns the processor index. Example: DISTRIBUTE A (INDIRECT (f,10,100)) onto P f(i,10,100) returns the processor index of A(i) There is no restriction on arguments - compiler has to make a copy. Advice should be given to use scalars or replicated arrays.
There was a question of whether the array index should be explicit in the declaration, but the i in f(i,10,100) can be supplied by the compiler
Rob asked why the extra restriction on pure? The reason was that otherwise the compiler might have to make a copy of everything that it can access?
Chuck asked how to make this a syntactic constraint. Do we need a keyword PURER? Otherwise this is probably better a semantic constraint. Joel Saltz questioned the purer-than-pure, saying one typically might want a data structure for binary search with strange data structure . Rob pointed it could be some kind of an opaque struct. The issue is that these parameters are passed by value - as if f were evaluated for all k on entry. A copy must be made at definition time, because if the data structure changes later, there is a need to keep the original value. A straw poll was taken: Should this be pursued? 5 - 8 - 13 It won't be pursued.
First Reading of Mapping Derived Type Components:
Defn: A component of a derived type is considered to be mapped if it is either an intrinsic type and explicitly mapped, or it is a structure and any one of its components is mapped.
Change the first constraint under H303-311 and H312- 318
An object-name mentioned as a distributee/allgnee must be a simple name OR A COMPONENT OF A DERIVED TYPE BUT MAY NOT BE A SUBOBJECT DESIGNATOR OF ANY OTHER TYPE.
Add: Constraint: A component of a derived type may be explicitly mapped if it is of an intrinsic type or if it is a structure and none of its components are mapped.
Constraint: A variable of a derived type can be explicitly mapped if none of the components of the derived type are mapped.
Note:
Guy Steele asked if the terminology used is right - for component names, subobject designator ... etc.
Ken interjected general group instructions at this time. For the second reading of proposals, we need full text.
There is a need to allow for both a default distribution and an explicit distribution.
straw poll: 17 - 1 - 8
1st reading Mapping to Subsets of Processors:
Change H311 to: dist-target is processor-name [(section-subscript-list]] or * processors-names [(section-subscript-list)] or *
Restrict subscripts used to be non-vector-valued: Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Add rules: subgroup-directive is SUBGROUP procs-name of target-sect target-sect is procs-name (section-subscript-list)
Constraint: In a section-subscript-list, the number of section-subscripts must equal the rank of processor- name.
Constraint: target processors name must be defined via a PROCESSORS directive (no subgroup of subgroup) . The subgroup vote on this was 2 - 2 - 2.
Inherits the rank and extents of the target subsection. Example: SUBGROUP PSUB of P (2:6, 3:15:2) defines PSUB as a processor array (5,7).
Add rule: subgroup-dir is SUBGROUP procs-name [(explicit-shape-spec-list)] OF target-sect (subgroup vote: 2-3-1)
Where:
PROCESSORS P(100) SUBGROUP P1(50) OF P(1:50) SUBGROUPS P2(5,10) of P(51:100)...
Ken asked that we first consider the base proposal of sections. A straw poll of 14 - 2 - 10 indicated yes, go ahead and develop a full proposal for sections.
Next was discussion of the subgroups part of the proposal. Part is syntactic sugar and part has functionality.
The vote about named subgroups was 6 - 3 - 16. This will require a VERY good case in 2nd reading to pass.
Next was discussion about reshaping subgroups - should it be allowed?
Guy asked if he could have P3(4,4) of Q(3:6,4:7).
But what about subgroups of subgroups? If subgroups are added, should we forbid subgroups of subgroups? 14 - 0 - 12
So Guy's point is that either subgroups of subgroups or the 2nd version, but not both.
Jaspal Subhlok asked: why not separate reshaping from subgroups? ... history ...
Rob pointed out that this is a form of align for processors, why not do it with ALIGN?
Piyush asked that we vote on the reshaping in general: strawpoll 9 - 8 - 9 Ken would like to see a full proposal --- but it BETTER be good!!! Jaspal says that this functionality is needed for the tasking proposals.
There was an additional discussion about whether there should be a proposal using align for subgroups. A comment was made that if the proposal comes back with align, it will be a 1st reading instead of a 2nd reading. A straw poll about using align got a vote of 3-9-12.
Carl Offner presented the first reading of a proposal for Specification of Shadow Widths.
Replace H309 (page 26) with dist-format is BLOCK [,SHADOW-SPEC-OR-INT-EXPR])] ... with the optional specification added to all the different kinds of distributions.
Add new rules:
shadow-spec-or-int-expr is shadow-spec or int-expr shadow-spec is SHADOW-int-expr or low-shadow-in-expr or high-shadow-int-expr or low_shadow-int-expr, high-shadow-int-expr or high-shadow-int-expr, low-shadow-int-exprconstraint: ANY int-expr appearing in a shadow-spec or in a shadow-spec-or-int-expr must be a specification- expr with value >+ 0.
The absence of a shadow-spec or a shadow-spec-or-int-expr is equivalent to shadow=0. Shadow-int-expr is equivalent to low-shadow-int-expr, high-shadow-int-expr. A shadow-spec-or-int-expr that is just an int-expr is equivalent to shadow-int-expr. Specifying only low- shadow implies high-shadow=0. Specifying only high- shadow implies low-shadow=0.
The primary reason for doing this is to facilitate the passing of shadowed arrays across subroutine boundaries. There are both storage allocation issues and the data-motion issues. This is intended to be just advice - to tell the compiler that if it uses shadows, how big to make them. This does expose some implementation issues to the user. Scott Baden ask if the specification should be on the compile line? It was pointed out that any time shadow widths are different across the boundaries of a subroutine call, then there is some implied data motion. This requires more things to pass in the descriptors across the subroutine boundary. The owner of the data is still well defined. And any updates of the data will result in messages to update the shadow values.
Straw vote about should this go to full proposal? 13 - 2 - 10
Chuck Koelbel presented the first reading of an ON directive proposal.
Motivation: Some programmers want more control where which processor executes the operation makes a big difference, or whether data is moved makes a big difference. HPF1.1 makes these decision the compiler's job. Some people simply don't trust the compiler.
ON HOME recommends where operations should be executed (like distribute recommends where data should be placed). RESIDENT asserts that data need not move (like independent asserts operations are parallel).
These apply to a single statement or block of statements. They name a processor or set of processors, either directly (i.e. section of processor array) or indirectly (e.g. owner of array or template elements). They tell the compiler to execute the operation on the given processor. Other processors may have to move data (see RESIDENT), and call statements need special consideration. This functionality is most useful inside a parallel loop or region ON directives can be nested, but the inner processor set must be subset of the outer set.
Examples
!hpf$ on home (a(i)) a(i+1) = b(i) + a(i) --- DO i=1,n !hpf$ on home (a(indx(i))) block a(indx(i)) - b(i) b(i) - c(indx(i)) !hpf$ end on end do ------------ !hpf$ on home (x(i,:)) block Do j - 1,m !hpf$ on home(x(i,j)) x(i,j) - foo (y(j,i)) end do !hpf$ end onWhat can go in HOME?
Original proposal: Any reference to a variable, template, or processor arrangement (including sections and vector-valued subscripts).
Amendments from subgroup meeting: Only scalar references and regular sections of variables, templates, and processors. The argument for the amendment is easier implementation. The argument against the amendment is that there is no difference for irregular distributions.
Second amendment: Any function called must be PURE. The argument pro is to ensure no side effects and well- definedness. There was no argument against. There will probably be CCI issues related to function calls in realign.
Ken commended the group on the preparation and presentation detail of the initial proposal. The vote to pursue ON for a second reading passed: 20-4-3.
Some additional detail was then presented.
Calls in ON HOME blocks:
Consider the following example.
!hpf$ ON HOME (a(1)) call foo(a) ---- subroutine foo(x) x(2) = ...
The problem: can foo be compiled using the owner- computes rule?
Suggested solutions are that the called routine must have a valid on-home. There might be a declarative form of ON that could go in an explicit interface, or every processor might execute every call.
There was discussion about whether this forces 1-sided communications. We could say that all procs execute all calls and add some other form of ON HOME that eliminates this requirement. There was a query about whether we are really looking for something similar to hpf-serial, or need a purer-than-pure routine.
A straw poll was taken about defining some declaration usable in an explicit interface to have the solution: 7 - 1 - 18
Resident directive: (This was called local in the
paper version of the proposal.) This can be an optional
clause in ON HOME or a free-standing directive. It
gives an optional list of variable references (i.e.
variable names, elements, regular sections, ...). Each
reference (and, therefore, any subobject of the
reference) is stored on processes invoked in
surrounding ON HOME. There was some refinement of the
definition that resulted from the subgroup discussions.
For a read reference, at lease one of the ON
processors must store a copy and for a write reference,
the ON processors store all copies.
Terms of the form
Resident examples
As a further example for discussion:
It was pointed out that pure can only call pure, and it
must be declared. We might have the restriction that
the subroutine has to be declared resident and
everything it calls must also have the declaration.
Why ALL is needed:
This relates to whether we are talking about the
address or the reference.
Guy expressed an opinion that ALL is confusing - we
need something else. (And tongue in check suggested
that maybe it should be *.)
First reading straw vote ON RESIDENT : 9 - 1 - 17
LUNCH BREAK
Overview examples of what is in the written proposal:
Intrinsic reduction
Defined Reduction
It was suggested that the syntax should be "," instead
of ";" because the combine= keyword will disambiguate.
A question was asked about any restrictions on the kind
of function that is used for combine.
Guy suggested that the identity be identified in the
interface block for the operator. The combine
operator might also be in the interface block --- then
the whole reduction statement goes away. The
information it provides would already be known to the
compiler.
(next example slide)
Implementation: there would be a "local" x per
consecutive block of iterations, with a 4,4 identity,
constrained (non commutative) fan-in combine. There was
a discussion of commutivity of the matrix example and
the shape of the identity, and shape results for the
combines. Intermediate temporaries are the same shape
as y,
Straw polls:
Intrinsic reduction (without things like matmul): 22
- 3 - 1
Adding things like matmul: 8 - 7 - 10
Defined reductions that are commutative and associative
(with the ideas that were proposed by Guy to
simplified); (including discussion of whether the
reduce statement belongs on the independent rather than
by the actual reduction.) 17 - 6 - 3. (all no votes
and most of extensions from vendors)
intrinsic reduction: no mixed reduction operators
implied combine entity.
Question - should it be an error if the user gives an
identity operation that isn't valid.
Summary
both the I/O statements and WAIT may have IOSTATE and
ERR args.
(2) Wait(Unit = int, Poll='ID', ID = int, DONE = lvar)
WAIT (POLL= 'ALL', UNIT= int, DONE = lvar)
(3) UNFORMATTED, DIRECT files only
(4) Multiple outstanding reads ok, multiple outstanding
writes ok, but not both!
Note, in this proposal there is no indication in the
open statement ....and the ASYNC keyword is gone.
Alok questions the need for an explicit ban on writing
the same block simultaneously.
straw poll ... if there are 2 outstanding writes on the
same block should we guarantee the second? 3 - 3 -
15!!! Should it be processor dependent? 5 in favor;
non-conforming? 2 in favor.
We need a generalized transpose and F95 isn't doing it.
We might do this with reshape , but reshape is pretty
awful. This would be a recognizable (efficient)
version of the special useful case. But should we just
lobby vendors to do a better job with this special
case? In reshape, a SHAPE is required. Another
possibility is to overload transpose and define it for
higher dimensions (reverse) where order is optional.
This then makes it an intrinsic. Straw poll about
developing a proposal for overloaded transpose: 17
- 1 - 4.
CCI 31: Should number_of_processors reflect the
processor "structure"? Currently HPF 1.1 defines the
result as "vendor-independent". The subgroup decided
that this is appropriate for HPF 1.n.
But that HPF2 might want to consider something
different. Consider that number_of_processors is used
for controlling data distribution (where it really
refers to number_of_memories) and might also be useful
for work distribution (e.g. ON). Sorting this out and
doing the right thing for SMP's is a non-trivial issue
and a topic for HPF2 (or 3?) - not a CCI. We may need
additional calls to reflect the different
functionality.
CCI 33: local_to_global currently can only be called
for arrays bound to global actuals. Should we also be
able to inquire about the function result? In theory
the answer is yes. This appears to be a very simple
change to the document. But, there may be a small
additional runtime cost for HPF to HPF_LOCAL calls.
The primary discussion was whether or not this was a
CCI (fix for HPF 1.2) or new proposal (HPF 2.0). The
subgroup recommended HPF2, with vendors free to
retrofit to HPF 1.n if they wish. This was confirmed
by an institutional vote 19-0-2.
There was discussion for straw-polls on some of the
specific details.
What distributions should be in the kernel? The
subgroup considered 4 possibilities:
Ken lead a discussion and set of straw votes, starting
with the assumption that BLOCK is in, what else should
be included?
No one supported permutations of subscripts or strides
for the kernel.
Straw poll on offsets in addition to everything else:
5 - 9 - 11
Straw poll on different extents 13 - 3 - 7
Straw poll on collapse 13 - 3 - 10 and replication
13-3-10.
("Slow" people were asked to sit on the far side of the
room.)
After this vote, there was a reconsideration of CYCLIC
(N) in the context of simpler aligns: 8 - 8 - 10.
Andy pointed out that removing SEQUENCE from HPF kernel
means that there is no equivalence, common blocks must
be the same everywhere, no assumed size, and no
arguments with miss-matched shapes.
No new vote on the overall kernel was recorded in the
minutes. The disposition of the kernel is tied to the
discussion of document reorganization.
Changes to HPF are:
Interoperability example:
What if you need to prepare a field of a structure or
??? Might want a function that is convert-to-c-type.
Might also have your own function that converts. There
was a question about converting file pointers ... but
this hasn't been addressed. Some concern about adding
LOC to was expressed. Jerry Wagener asked if we should
pick something to be the equivalent of "*" in C instead
of loc? (More chuckles about the multiplicity of
functions that * seems natural for.)
Miles Ellis (sp?) from Oxford is current chair for the
ISO group chartered with C interoperability, but Jerry
Wagener (and probably X3J3/ISO) is looking to what we
do here.
Strawpoll on the proposed interface - 16 - 1 - 7.
We next had a strawpoll about whether user-defined map
functions to convert user defined types should be
addressed: 13 - 2- 10.
The vote on conversion functions convert-to-c(x,ctype-
"float") was similar: 13 - 1 - 10.
Carl Offner next took a straw poll related to the
proposal for changing the requirements for explicit
subroutine interfaces.
Should we remove the distinction between prescriptive
and descriptive? Concern is that it may break current
programs, but there is a lot of resulting
simplification in definition. Yes remove: 18-0-7
Mary asked for an additional strawpoll.
Should we simply require explicit interfaces
everywhere. It was argued this raises the threshold
for learning - and for simple programs. Requirement
everywhere was voted down: 6 - 9 - 10.
Scott Baden presented a 0th reading of his document
about Calling HPF from an Extrinsic Language - SPMD
to HPF. Multi-Data-Parallel is one way to think of
this.
The motivation is to coordinate multiple HPF programs.
Potential uses are SPMD coordination interfaces such as
adaptive multiblock, MPMD such as multi-disciplinary or
task parallel, and SMP clusters or processor subsets
where a coordination layer handles external
communications.
The proposed model:
Data motion between HPF programs is handled explicitly
at extrinsic level, e.g. through MPI inter
communications.
As a restriction, you can't pass mapped arrays to
allocated within different communicators to the same
HPF routine because the set of processors is not well
defined.
Outstanding issues are:
Language independent definition and data conversion.
Changing a processor topology for an HPF program and
overlapping communications for 2 or more HPF programs.
Should HPF common blocks be accessible from SPMD -
perhaps only if they are sequential (kernel)?
Restriction #4 in the paper should be dropped.
Some general comments were:
General straw poll about developing a proposal for
callability of HPF from SPMD environment: 19 - 0 - 5.
Jerry Wagener gave a report on F95 status.
In their public review process, they got 3
communications from HPFF ... a general letter plus
detailed comments from Carol Munroe. There were about
450 comments and they resolved all but 2. They also
got a number of suggestions for F2000. Reply letters
will come in about November.
The US vote on the proposed standard was NO with (20
minor fixes) {It will change to yes when fixes are
made --- e.g. there was a missing "not" in one place.}
WG5 will make the actual changes. They expect it to be
official in mid 96.
The requirements for F2000, including some from HPFF
have been logged into a journal of requirements. These
are available from:
ftp.ncsa.uiuc.edu X3J3 document 004
US recommendations for requirements were made with 6
high and 17 medium. All 3 of the HPFF recommendations
(interop, fp except, async I/O) made the US "high"
list.
Floating point exception handling is well along. John
Reed is taking lead on this. C interop is highest on
everyone's list, but partially because of some
political reasons, no one has taken the ball. And for
async I/O, there is a good chance that the HPFF
definition will be it.
Jerry gave a review of floating point exception issues.
There are now two approaches. The long-time approach
is via an ENABLE construct.
The other approach uses intrinsic procedures and is
more specific to floating point:
15 query functions allow you to inquire about flags,
etc.
The problems associated with these are approaches have
to do with how to define the enable action on a data
parallel operation where the overflow only happens on
one processor.
Does each processor have its own local overflow flag?
That may be the way that the hardware works, but it is
not the way that F90 module objects work. If there are
local flags, is the "handling" done locally or later?
Guy Steele asked about what would happen if this were
the equivalent F77-style loop.
The other problem occurs where an outer procedure
declares a handler, calls a middle procedure, which
calls an inner procedure which has enabled overflow
with no handler. If overflow occurs, it returns to the
middle layer where nothing in enabled, so nothing
happens. Partly this problem stems from separate
compilation.
The group finished for the day about 5:45.
CCI 28:
ALIGN with A(*,:)::B(:) is misleading - so this has
been changed so that what the user is intending is
ALIGN (:) WITH A(*,:):: B
Proposed changes:
CCI 11 Pointers cannot be mapped. Can they be
associated with both sequential and non sequential
targets? Subgroup recommends that pointers cannot
point to sequential variables. Full group didn't
think this was right and asked the subgroup to try
again.
Next Piyush asked for some guidance for the subgroup on
indirect mapping:
The subgroup voted 0-5-0 : Use indirect align instead
of (replacement for) indirect distribute; 5-0-0
Must use an ONTO clause whenever INDIRECT is used in a
distribute and redistribute.
Votes from the full group were:
Indirect distribute over indirect align if you can only
do one: 15 - 0 -2.
Develop a proposal to consider addition of indirect
align? 5 - 7 - 7.
Variable visibility in task regions is the primary
issue. The processor subgroup code has unrestricted
access to local data while other code has unrestricted
access to all data. A subgroup can access global data
only if that would not cause a data dependence - i.e.
*. And subgroup code sections are single entry, single
exit.
*(a) access is read only or (b) access is
exclusively by one subgroup.
Execution Model
This allow "old fashioned" parallel sections
And pipelining:
Rob added an example
If there are no
dependencies, the regions can run in parallel .
Consider:
In this example above ... procs 1:3 could start the next
iteration as soon as they are finished with the B=A.
Note that
this is related to Scott's multiparallel proposal.
Should group C
develop a proposal along this line for task parallelism: 17 - 2 - 2.
For next time prepare examples to show differences between Rob and
Jaspal's versions ... and show how it works on SMP's
Rewrite section 3.10 to delete all references to
descriptive mappings, and to include and be consistent with the
following statement:
An explicit interface is required in each of
the following cases:
Rob asks about scalar arguments. Don't want to have to
have explicit interfaces for scalars.
Rob: Amend the mapping of a non-scalar dummy is not the
same as a non-scalar actual.
Henry and Carol worried about implicitly mapped
arrays. The default is that they are eligible to be
mapped.
Andy: this is a HPF language problem ... Ken: this
proposal when added to existing HPF, breaks F90
compatibility more.
Need to fix the implications for implicit mapped
arrays.
The proposal was tabled until next meeting: 17 - 2
- 1
optional list with Dynamic specifying possible range of
distributions that can occur
Would like to see a proposal: 14 - 0 - 8
Meeting dates and sites were discussed. November 1-3
and January 9-12 have already been selected. Dates
for the March meeting will be March 13-15. The
location is TBD. The possibility of a European meeting
in May is being investigated. Good dates for Ken are
after the first week of May, up to the first week of
June. Barbara Chapman has invited the group to Vienna.
Henk Sips will work with Barbara on a date and place -
perhaps a workshop followed by a meeting. Ken
requested a location with convenient direct flights.
Mary also reported on an group called the SSWTG that is
working with Cherri Pancake to develop requirements for
parallel systems that can be used in RFP's. The group
process has been for users to define a list of desires,
vendors to prepare an estimation of difficulty, and
finally the user group to vote on a final list. The
third meeting of the group is scheduled just after this
HPFF meeting.
The purpose is to support large data sets that do not
fit in memory.
Compiler and runtime support needed to manage,
access/stage OOC data (transparent to the user...).
Question:
File specification
!hpf$ Associate(
Initialization could also be done using runtime
library?
Other Questions
specification of persistence?
specification of data organization within ooc
array
(i.e. if compiler is going to create temp files - hint
about how to organize within file)
Should proposal happen? 10 - 1 - 11
Rob comments that this is remapping at program
boundaries.
X3J3 got free copies of Frame. The proposal is to move
the document to MsWord and maybe also to Frame.
Top 10 reason for HPF success:
10. It focuses on what we know best - dense arrays
does not get distracted by what computational
scientists really want - sparse
arrays, irregular meshes, etc.....
9. It was designed by computer scientists -languages
designed by non-CS people, like Matlab and Mathematica,
have clearly been unsuccessful.
8. It was designed by academics - Languages designed
by industry people, Fortran, C, C++, are ugly.
Academics have good taste and design clean languages,
Haskell, ID, EUCLID, ...
7. It was designed by a committee. - It is important to
exploit the collective wisdom of a large group of
people. Languages designed by 1-2 people are never
successful - C, C++ come to mind. When a languages is
designed by a large committee, ADA for example, nothing
gets overlooked.
6. It builds on proven technology base - f90. This is
called risk reduction in industry
5. The committee has taken its time to do things right.
By building on a proven base, you can afford to take
time to mull things over. Besides its not like we are
in a hurry - what's the competition? Uniprocessors
will never be powerful enough for most users, and
explicitly parallel languages will never be popular.
4. It is difficult for both users and compiler write.
This makes the users respect the language, and is a
full employment act for compiler writers.
3. It is hard to predict the performance of programs.
Small changes in the program make big changes in
performance. This "Gump Effect" makes a language
powerful.
2. It is not C++. Abstract data types, type
hierarchies, inheritance are all hot topics, but since
C++ is worrying about them, we ignore them.
1. Any problem that costs $10M is a success.
LPF 2 of the 1,297,323,241,913,021 proposals
submitted, ... were accepted.
LPF proposal 235,714,011,122:
!LPF$ Nobody_home
-the process must wait for the processor to arrive. If
the processor doesn't arrive the process waters the
plants, brings in the mail and steals the TV.
LPF proposal 3,780
!LPF$ ALIEN
asserts that needed data is on a computer in a
different country, but with appropriate paper work, it
can immigrate.
LPF proposal 14,297,456
!LPF$ Illegal Alien
ignore the fact that data is not resident and
pays no social security. Use it then throw it out.
The feature is only accepted by programs written by
elitist liberals in academia or the media.
LPF proposal 11,222,373
!LPF$ Resident_alien
so long as the alien data is immediately
productive it may stay. If it comes unproductive it is
kicked out-of-core.
LPF proposal 21,222,891
!LPF$ ADJUNCT_GROUP_SHADOW
assures that there is a group of processor
separate from the known processors in the system. They
actively censor the other processors.
LPF feel the pain of the HOMEless data left out-of-
core. With this in mind, we propose feature #
1412521436
!LPF$TAX_HOMEOWNERS
gets memory for homeless data by taking it from
data with a HOME.
Once this data has a temporary domicile, LPF want to
make the data productive.
#1412821438
ON TEMPORARY_SHELTER
this clause must be used fast before the data is
kicked back out-of-core for being a slacker and being
unproductive.
In core data that has not contributed to a solution in
more than 2 million operations is to be migrated out-
of-core. Why should productive data support data that
doesn't want to do anything? (Proposal 2,822,123,721)
!HPF$ ON HOME (x(k)), RESIDENT (x(indx(k))
x(k) = x(indx(d)) + y(indx(k)) ! know about x references but not indx and y
-----
!HPF$ ON HOME (x(j)), RESIDENT (all=x)
x(j) = x(j) * x(ipermute(j)) *x(j+1) * y(j-1) ! know about all x, not ipermute, y
------
!HPF$ ON HOME (procs(1:np/2)), RESIDENT
call foo(a,b) ! what does this say about what happens inside foo?
ON HOME (proc(1)) RESIDENT
call f(x(a))
subr f(a)
= B(2) ! rob says ok, resident not in this scope, Piyush says illegal if not on p1
!HPF$ ON HOME (indx1(i)) block ! example of free-standing resident
n1 = indx1(i)
n2 = indx2 (i)
!HPF$ RESIDENT (x(n1)) block
tmp = y(n1) - y(n2)
x(n1) = x(n1) + tmp
x(n2) = x(n2) - tmp
!HPF$ end resident
!HPF$ end on
!HPF$ ALIGN y(i) WITH x(i)
!HPF$ ON HOME (procs(1:2)), RESIDENT (ALL=x)
x(i) = x(indx1(i))
y(i) = y(indx2(i)) ! is y(indx2(i)) is resident?
!HPF$ END ON
(no align)
!HPF$ ON HOME (procs(1:2)), RESIDENT (all=x)
x(i) = x(indx1(i))
y(i) = y(indx2(i)) ! y(indx2(i)) is resident!
!HPF$ END ON
The group restarted at 1:05.
Rob Schreiber presented the Reduction proposal 1st
reading
!hpf$ independent
do I=1,n
!hpf$ reduce
x=x+f(i)
enddo
!HPF$ independent, ..., Reduction(list;
combine=concat; Identity=emptylist)
! list might actually be a "list" of reduction
variables, but concat must
! be defined for all of the types included. Also
emptylist might be a function.
Do I-1,N
!hpf$ reduce
list = append(list,element)
enddo
Real x(,4,8), y(8,8)
!hpf$ independent, new y
Do i-1,1000
y = ...
!hpf$ reduce
x = matmul(x,y)
enddo
ASYNC I/O 2nd reading
READ ( ..., ID = scalar-int-var, ...)
WRITE ( ..., ID = scalar-int-var, ... )
Wait(ID ...)
The next proposal addressed was a 0th reading about a
new (non-intrinsic) HPF_LIBRARY function.
GEN_TRANSPOSE (ARRAY, ORDER)
ARRAY - ANY TYPE = RANK N
ORDER - INTEGER, SHAPE = [N]
RESULT - RS (ORDER) == AS where RS is shape of result and AS is shape of ARRAY.
RESULT (J(1), J(2), ... J(N))= ARRAY (J(ORDER(1)),
J(ORDER(2)), ..., J(ORDER(N)))
This concluded the Subgroup C presentations.
Next Mary Zosel presented for group record, two of the
CCI's that were
considered by Subgroup E.
Next Andy Meltzer presented the first reading of the
HPF Kernel proposal. Two changes were made to the text
of the original document circulated: (1) there is no
longer a need to have an extrinsic identifier for the
kernel because of another proposal, and (2) the
sequence directive has been removed from the language.
BLOCK (only)
BLOCK, CYCLIC
BLOCK, CYCLIC(n)
BLOCK, BLOCK(n)
Straw polls:
cyclic(n) - 6 - 7 - 10 ! some support, but?
block(n) - 3 - 10 - 6 ! not really any support
cyclic - 10 - 5 - 9 ! yes people think
Next was a discussion of the options for ALIGN. These
include:
straight alignment ALIGN A(I,J) WITH B(I,J) only
- exactly same shape
straight alignment ALIGN A(I,J) WITH B(I,J) (but
allow for different extents)
allow only ":" (just identity alignment - no
reference to I,J)
allow only ":" and "*" alignments (adding
dimensional collapse, replications)
add offsets
BREAK till 3:25
Andy Meltzer presented the 1st reading of the
Interoperability proposal.
Interface
extrinsic (c) FUNCTION CFUNC(w,x,y,z,a,p)
Name("Cfunc")
real, map_to(cfloat)::cfunc
type(my_type ::w
integer, map_to(short)::x
real, map_to(float)::y
integer, map_to(char)::x
integer (kind=4), map_to(short,layout-
c_array):: a(100)
integer (kind =c-void-pointer), map-
to(pointer)::p
end function
end interface
The call:
type(my_type) ;;x
integer ;;z,x
real ::y
integer(kind=4)::a(100)
integer, (kind=c_void_pointer) ::p
r=cfunc(w,x,y,z,a,loc(p))
C prototype:
float cfunc(struct my_type w, short x, float y,
char z, short a{100), int *p)
Ken announced HP has purchased Convex. This will
result in a change in our vendor slide.
ENABLE
logical function IEEE_NAN
logical function IEEE_INF
logical function IEEE_UNDERFLOW
subroutine IEEE_flag_set(flag, value)
subroutine IEEE_flags_clear
logical function IEEE_flag_get(flag)
There are 28 procedures in all.
module IEEE_arithmetic
type IEEE_flag
private
integer ::I
end type IEEE_flag
type (IEEE_Flag), parameter :: overflow - IEEE_flag(3)
enable (overflow)
A = B*C ! data parallel operation
handle
... ! what to do with overflow on only one processor?
end enable
Friday Morning 8:05
Piyush Mehrotra went over two pending CCI issues:
Page 24, line 6
Change "
A new proposal was presented for group opinion:
No sequential variables can be explicitly mapped.
Why? Mapping of 1-d aggregate covers has never
been implemented.
EQUIVALENCE can be largely replaced by using
modules (with renaming) dynamic memory allocation and
TRANSFER.
Should a full proposal be developed: 13 - 3 - 5 .
Jaspal Subhlok gave an overview of Task Parallelism
proposals for HPF
processor subgroup with variables mapped to
subgroups
execution ON processor subgroups
task Regions
subgroups and ON are all ready being discussed
For subgroup P1 processors
execute code on P1
skip code on anything else
execute other code (ALL) like (normal) HPF code.
task region
ON P1 ... ENDON
ON P2 ... ENDON
end region
task region
DO i-1,100
ON P1 read and compute (A1) ENDON
A2 = A1
ON P2 use A2, print something ENDON
end region
Note this proposal assumed that there is named
subgroups for allocation of data. This is impacted by
earlier group votes about processor subgroups and
names.
on home (P(lo)) call F
on home (P(hi)) call G
This is ok as long as there is nothing touched by both that is written
by one of them (causing some wait). There is substantial analysis
required for compilers to determine this.
A= F(A) 1:3
B=A 1:6
B=G(B) 4:6
C=B 4:9
D=W(C) 7:9
where this is
in a loop and the x:y following a statement just represents which
processors are executing that line (are in the subgroup named for the
region).
Carl
Offner --- 2nd reading Subroutine interfaces
24/48 change "three" to "two"
25/7-11 delete (explanation of descriptive syntax
16/39 delete *(dist-format-list) in H308
26/47 delete *processors-name in H311
27/18-22 delete (2 constraints)
33/22 delete *align-target [(align-subscript-list)] in H320
33/47-48 delete (constraint)
34/1 delete (constraint)
45/45 - 46/3 example now need explicit interface
Piyush Mehrotra presented one more idea for
consideration.
Facilities for redistribution:
Dynamic [(possible-distr-list)]
# of items is same as rank of array
possible -distr is (dist-item-list)
dist-item is BLOCK[()}
or CYCLIC[()]
or var-block or indirect or * or ALL
Dynamic ((block,cyclic), All, *, Indirect)
restricts the kind of things each dimension of a
Dynamic 4-dimensional array can point to.
Finally, some administrative issues were addressed. If
we meet at the Bristol instead of this hotel in the
case that an Arlington hotel is not available? 3 - 7 -
many.
Mary reported that the SC95 BOF is scheduled for
Wednesday evening - and that there is a WWW page for
the BOF accessible from the SC95 home page. At the
November meeting we will determine the specific agenda.
The proposal schedules for next meetings was reviewed.
Following are the goals for proposal processing.
group E 1st reading 2nd reading
kernel Sept. Nov
interop Sept. Nov
format Nov Jan.
(straw poll in Sept.)
explicit interf July Nov.
SPMD-HPF Nov Jan.
local_global func Sept Nov..
group D 1st reading 2nd reading
gen block July Sept.
irreg map July Nov
dist exten Sept. (Nov) Nov
seq. nonmap Nov Jan.
subsets Sept. Nov
derived type Sept. Nov
dist ranges Nov Jan.
shadow width Sept. Nov
OOC Nov Jan.
group C 1st reading 2nd reading
async July Nov
on Sept. Nov
reduc Sept. Nov
task Nov Jan.
gen transpose Nov Jan.
Also proposal may be needed to clarify
number_of_processors.
BREAK for 15 minutes
Alok Choudhary made a presentation of a proposal for
"out of core": OOC
(Ken comments to Rob that it couldn't be sent to HPFF-
"core" because it is "out-of-core".)
distribution of OOC arrays? Any restrictions?
loading ooc arrays?
e.g. initialization, presumable data comes
>from some file(s)
given large data sets, parallelism, is
important in loading/writing.
management of ooc arrays should be hidden
>from user
Given those questions:
!hpf$ processors P(2,2)
!hpf$ template T(100,100)
!hpf$ distribute T(block,block) onto P
!hpf$ out-of-core:T
!hpf$ Align with T: A,B,C
compiler can use this info to set up files using
whatever model it wishes to use ... shared, local-
distributed ...
Call popen (20, "huge0data', unformatted)
call file-info (20, "forge-data')
! get information about file
call pread(20,A) ! in parallel
call file-info(20,'huge-data')
! get information about file
call pread(20,a) ! in parallel
Discussion: F95 doesn't fix the NEXTREC as we
requested in HPFF94 - default integer problem. X3J3
turned it down. Jerry explained his difficulty in
constructing examples to see the need. Some vendors
think it would cause difficulties, and the vendor who
was the primary requester is no longer represented on
X3J3. We decided not to take any further action.
There was a brief discussion of what can be done to
foster HPF acceptance. Mary presented a list is
issues that she has prepared for Ken earlier.
What language features will help?
(C interface, etc.)
Role of F90 acceptance / use?
Getting HPF into education system ...
What new parallel systems to universities have?
Do they have HPF yet?
How will HPF be attractive for SMP clusters?
Is compiler performance ready yet?
Digital's approach that HPF is just extra features
of regular Fortran compiler
Role of basic/advanced language idea - for
implementations AND for understanding/teaching.
As part of this discussion, David Loveman presented a
proposal for a major document reorganization for the
HPF 2 document.
Main Points
introduce a part structure
show the kernel features in part of the document
labeled HPF
rest of stuff in hpf-extended
hpf-conforming means have the basic stuff
hpf subset is removed
move F95 features to annex ... use F95 syntax
move library procs, etc. to annex
put locals into extrinsics chapter
F77 features ... in separate section with HPF
with Fortran 77 in separate section hpf extended
features. {marketing says this needs to be in both
sections}
and lots of details ...
Straw poll on 0th reading .... 16 - 0 - 1
Special HPF report curtsey of Jaspal Subhlok (present
by Andy Meltzer).
And finally, the LPF Report: Andy Meltzer
Meeting adjourned at 12:00
Next meeting: Nov 1-3 Dallas area.
Attending the Sept. 95 HPFF Meeting:
Robert Babb U. of Denver babb@cs.du.edu
Scott Baden UCSD baden@cs.ucsd.edu
Bob Boland LANL wrb@lanl.gov
Zeki Bozkus The Portland Group zeki@pgroup.com
Alok Choudhary Syracuse U. choudhar@cat.syr.edu
Ian Foster ANL itf@mcs.anl.gov
Tom Haupt Syracuse U. haupt@npac.syr.edu
Ken Kennedy Rice U./CRPC ken@rice.edu
Charles Koelbel Rice U. chk@cs.rice.edu
John Levesque APR levesque@apri.com
David Loveman Digital loveman@msbcs.enet.dec.com
Piyush Mehrotra ICASE pm@icase.edu
Andy Meltzer Cray Research meltzer@cray.com
Carol Munroe Thinking Machines munroe@think.com
Carl Offner Digital offner@hpc.pko.dec.com
Guy Robinson University of Vienna robinson@vcpc.univie.ac.at
P. Sadayappan Ohio State U. saday@cis.ohio-state.edu
Joel Saltz U. of Maryland saltz@cs.umd.edu
Rob Schreiber RIACS schreiber@riacs.edu
Yosiki Seo NEC
Henk Sips TNO/Delft U of Tech. henk@cp.tn.tudelft.nl
Guy Steele Sun Microsystems guy.steele@east.sun.com
Jaspal Subhlok Carnegie Mellon jass@cs.emu.edu
Jerry Wagener Oklahoma U.,X3J3 jwagener@cs.uoknor.edu
Joel Williamson HP/Convex joelw@mozart.convex.com
Henry Zongaro IBM Canada zongaro@vnet.ibm.com
Mary Zosel LLNL zosel@llnl.gov