Example: 2-D FFT

Next: Approved Extension for Asynchronous Up: The TASK_REGION Construct Previous: SPMD or MIMD code

Example: 2-D FFT

This section shows the use of task parallelism to build a pipelined data-parallel 2-dimensional FFT and illustrates the compilation of task parallelism by showing SPMD code generated from the HPF program.

The basic sequential 2DFFT code is as follows:

      REAL, DIMENSION(n,n) :: a1, a2

      DO WHILE(.true.)
          READ (unit = 1, end = 100) a1
          CALL rowffts(a1)
          a2 = a1
          CALL colffts(a2)
          WRITE (unit = 2) a2
          CYCLE
100       CONTINUE
          EXIT
      END DO

To write a pipelined task and data parallel 2D FFT in HPF, the code is slightly modified and several HPF directives are added. First, variables a1 and a2 are distributed onto disjoint subsets of processors, and then a task region is used to create two lexical tasks to perform rowffts and colffts on different subsets of processors. The assignment a2 = a1 in the task region specifies the transfer of data between the tasks. A new variable done1 is introduced to store the termination condition. The modified code is as follows:

        REAL, DIMENSION(n,n) :: a1,a2
        LOGICAL done1
!HPF$   PROCESSORS procs(8)

!HPF$   DISTRIBUTE a1(block,*) ONTO procs(1:4)
!HPF$   DISTRIBUTE a2(*,block) ONTO procs(5:8)

!HPF$   TEMPLATE, DIMENSION(4), DISTRIBUTE(BLOCK) ONTO procs(1:4) :: td1
!HPF$   ALIGN WITH td1(*) :: done1

!HPF$   TASK_REGION
        done1 = .false.
        DO WHILE (.true.)
!HPF$       ON (procs(1:4)) BEGIN, RESIDENT
              READ (unit = iu,end=100) a1
              CALL rowffts(a1)
              GOTO 101
    100       done1 = .true.
    101       CONTINUE
!HPF$       END ON

            IF (done1) EXIT
            a2 = a1

!HPF$       ON (procs(5:8)) BEGIN, RESIDENT
               CALL colffts(a2)
               WRITE(unit = ou) a2
!HPF$       END ON
        END DO
!HPF$   END TASK_REGION

Finally, we show simplified SPMD code generated for each processor. We assume a message passing model where sends are asynchronous and nonblocking and receives block until the data is available. We use a simple memory model where variable declarations are identical across all processors even though some variables will be referenced only on subsets of the processors. A shadow variable done1_copy is created by the compiler to transfer information from processor subset 1 to processor subset 2 about termination of processing. The code is as follows:

        REAL DIMENSION(n/4,n) :: a1
        REAL DIMENSION(n,n/4) :: a2
        LOGICAL done1

C       Following are compiler generated variables
        LOGICAL done1_copy
        LOGICAL inset1, inset2
C
C       Following magic compiler function call is to set the variables
C       inset1 and  inset2 to .true. for subset 1 and subset 2 processors
C       respectively, and .false. otherwise.
C
        CALL initialize_tasksets(inset1,inset2)

C       Code for processor subset 1
        IF (inset1)
           done1 = .false.
           DO WHILE (.true.)

C       Read is left unchanged as the code depends on the I/O model
              READ (unit = 1,end=100) a1

              CALL rowffts(a1)
              GOTO 101
    100       done1 = .true.
    101       CONTINUE
              _send(done1,procs(5:8))
              IF (done1) EXIT
              _send(a1,proces(5:8))
           END DO
         END IF

C       Code for processor subset 2
        IF (inset2)
           DO WHILE(.true.)
              _receive(done1_copy,procs(1:4))
              IF (local_done1) EXIT
              _receive(a2,procs(1:4))
              CALL colffts(a2)

C       Write is left unchanged as the code depends on the I/O model.
              WRITE (unit = 2) a2
           END DO
        END IF

_send and _receive are communication calls to transfer variables between subsets of processors. Program execution until the end of input is as follows. Subset 1 processors repeatedly read input, compute rowffts, and send the computed output as well as done1 flag, which normally has the value .false., to subset 2 processors. The subset 2 processors receive the flag and the data set, compute colffts and write the results to the output. When the end of input is reached, subset 1 processor set the value flag done1 to .true., send it and terminate execution. Subset 2 processors receive the flag, recognize that the end of input has been reached, and terminate execution.

Next: Approved Extension for Asynchronous Up: The TASK_REGION Construct Previous: SPMD or MIMD code

[ Contact Us | HiPerSoft | Computer Science ]