Hydra

Principal Contact Person and Organization (including e-mail address):

F.R.Pearce@durham.ac.uk Durham University

Brief Description of Application:

An N-body + hydrodynamics code that includes many classic parallelisation problems. The long-range component of the gravitational force is computed with a 3-dimensional FFT. The short-range component and the smoothed particle hydrodynamics (SPH) gas forces are then computed by direct summation over neighbours located by a neighbour-finding algorithm. The code also includes a recursive, spatially adaptive grid refinement mechanism that handles regions where the particles cluster heavily.
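The neighbour-finding step behind the direct summation can be illustrated with a cell list. The following is a minimal Python sketch, not Hydra's actual implementation: the function names, the non-periodic unit box, the particle count and the cutoff value are all assumptions made for the illustration, and no forces are computed. It only shows that binning particles into cells at least one cutoff wide lets each particle search its own and adjacent cells instead of the whole volume.

```python
import itertools
import random


def dist2(a, b):
    """Squared Euclidean distance between two 3-D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_pairs(pos, rcut):
    """Reference answer: every pair (i, j), i < j, closer than rcut. O(N^2)."""
    r2 = rcut * rcut
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(pos)), 2)
        if dist2(pos[i], pos[j]) < r2
    }


def cell_list_pairs(pos, rcut):
    """Same pairs, found by searching only a particle's own and adjacent cells.

    Assumption: a non-periodic box, with cubic cells of side rcut, so any
    two particles closer than rcut lie in the same or in neighbouring cells.
    """
    r2 = rcut * rcut
    cells = {}
    for i, p in enumerate(pos):
        key = tuple(int(c // rcut) for c in p)
        cells.setdefault(key, []).append(i)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for key, members in cells.items():
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and dist2(pos[i], pos[j]) < r2:
                        pairs.add((i, j))
    return pairs


random.seed(0)
positions = [tuple(random.random() for _ in range(3)) for _ in range(200)]
pairs = cell_list_pairs(positions, 0.1)
```

In a short-range force calculation, the loop over `pairs` is where the pairwise gravity and SPH terms would be accumulated. Heavily clustered regions put many particles into few cells, which is the situation the refinement mechanism is described as addressing.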

Number of Lines of Code: 4000

Target Platforms and HPF Compilers Used:

T3E and PGHPF

Coding Styles (data decompositions, computational methods):

BLOCK and CYCLIC distribution of data and arrays. Parallel FFT, parallel list sort. Task farming and fine-grained load balancing.
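For readers unfamiliar with the two distributions, here is a small Python sketch of which processor owns each array element under HPF-style BLOCK and CYCLIC mappings. The function names and the example sizes are assumptions chosen for illustration. The BLOCK rule uses the default block size of ceiling(n/p), and indices are 0-based here rather than Fortran's 1-based.

```python
import math


def block_owner(i, n, p):
    """Processor owning 0-based element i of an n-element array
    distributed BLOCK over p processors (block size = ceil(n / p))."""
    block_size = math.ceil(n / p)
    return i // block_size


def cyclic_owner(i, p):
    """Processor owning 0-based element i under a CYCLIC distribution:
    elements are dealt out round-robin."""
    return i % p


n, p = 10, 4  # illustrative sizes only
block_map = [block_owner(i, n, p) for i in range(n)]
cyclic_map = [cyclic_owner(i, p) for i in range(n)]
# block_map  -> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
# cyclic_map -> [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

BLOCK keeps neighbouring elements on the same processor, which tends to suit mesh operations. CYCLIC spreads consecutive elements across processors, which can even out the load when the work per element varies.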

Extrinsic Interfaces Used (and reasons):

None

Performance Information, if Available (including any possible comparisons to MPI and/or OpenMP):

       Table 3 shows the scalability of the HPF code. Scaling is good
       overall, and the routines that scale poorly are the same ones that
       were tricky in the Craft code. The overall 16->64 PE speed-up is
       1.59, as opposed to 1.72 for the Craft code. The problem area is
       the list sort. The refine time drops out of the scaling because it
       is typically only done every 10 steps.
       (I.e. the times for "refine" and "refinements", which are significant
       in the times below for just one iteration, end up being completely
       dominated by "mesh" and "shgrav" in a typical production run - DSM).

       Table 3: HPF code scalability
       PE's              4       8       16      32      64

       mesh             32.4    17.8     9.0     6.7     4.1
       list              7.3     9.2     9.7    13.3    10.7
       refine           12.1    13.4    14.4    18.1    18.9
       shgrav           71.9    39.6    21.9    14.2     6.5
       clist             2.4     2.9     3.1     4.0     3.4
       refinements      35.7    22.9    14.9    12.1    11.8


       Total           161.2   106.1    74.5    68.6    55.6
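The speed-up figures can be checked directly from Table 3. The Python sketch below is a worked calculation, not part of the original report. It assumes that "refine is typically only done every 10 steps" means one tenth of the refine time is charged to each step; the source does not state how the 1.59 figure was derived. Under that assumption the 16->64 PE ratio comes out at about 1.59, whereas the raw single-iteration totals give about 1.34.

```python
# Times per routine from Table 3 (HPF code), keyed by number of PEs.
table3 = {
    "mesh":        {4: 32.4, 8: 17.8, 16: 9.0, 32: 6.7, 64: 4.1},
    "list":        {4: 7.3, 8: 9.2, 16: 9.7, 32: 13.3, 64: 10.7},
    "refine":      {4: 12.1, 8: 13.4, 16: 14.4, 32: 18.1, 64: 18.9},
    "shgrav":      {4: 71.9, 8: 39.6, 16: 21.9, 32: 14.2, 64: 6.5},
    "clist":       {4: 2.4, 8: 2.9, 16: 3.1, 32: 4.0, 64: 3.4},
    "refinements": {4: 35.7, 8: 22.9, 16: 14.9, 32: 12.1, 64: 11.8},
    "Total":       {4: 161.2, 8: 106.1, 16: 74.5, 32: 68.6, 64: 55.6},
}

REFINE_INTERVAL = 10  # assumption: refine cost is spread over 10 steps


def speedup(row, lo=16, hi=64):
    """Ratio of the time on `lo` PEs to the time on `hi` PEs."""
    return row[lo] / row[hi]


def amortised_total(pes):
    """Total time per step with only 1/REFINE_INTERVAL of 'refine' charged."""
    refine = table3["refine"][pes]
    return table3["Total"][pes] - refine + refine / REFINE_INTERVAL


raw_speedup = speedup(table3["Total"])
amortised_speedup = amortised_total(16) / amortised_total(64)
per_routine = {name: round(speedup(row), 2) for name, row in table3.items()}

print(f"raw 16->64 speed-up:       {raw_speedup:.2f}")
print(f"amortised 16->64 speed-up: {amortised_speedup:.2f}")
for name, value in per_routine.items():
    print(f"{name:12s} {value:.2f}")
```

The per-routine ratios show the pattern described in the text: "mesh" and "shgrav" speed up from 16 to 64 PEs, while "list" has a ratio below 1, meaning it takes longer on 64 PEs than on 16.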

       Finally, timing the shmem+gas code at the same position
       yields Table 5. This is approximately twice as long as
       the HPF code with refinements at the same point (see Table 3).

       Table 5: shmem code, 16 PEs, T3E-900, no refinements

       mesh              0.95
       shgrav          141.9

       Total           153.1 (143.2 after 1st step)

Please comment on any aspects of the application that might be interesting, including any problems using HPF effectively:

Direct comparison to both Craft and shmem on a T3D/T3E. Identical Craft and HPF implementations.

URL

http://star-www.dur.ac.uk/~frazerp/virgo/virgo.html
