A large HPF program is typically constructed as a sequence of calls to subroutines and functions that implement different aspects of program logic. In the terminology used in Chapter 4, the program is a sequential composition of program components. As discussed in that chapter, one critical issue that arises when using sequential composition is the distribution of data structures that are shared by components.
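Consider what happens when a subroutine is called in an HPF program. For a particular computer and problem size, there is presumably a distribution of that subroutine's dummy arguments and local variables that is optimal in the sense that it minimizes execution time in that subroutine. However, this optimal distribution may not correspond to the distribution specified in the calling program for variables passed as arguments. Hence, we have a choice of two different strategies at a subroutine interface. These strategies, and the HPF mechanisms that support them, are as follows.
1. Remap at the interface: the subroutine specifies the distribution it requires for its dummy arguments, and any data movement needed to achieve that distribution is performed when the subroutine is called. This strategy is supported by applying ordinary distribution directives to dummy arguments.
2. Inherit the caller's distribution: the subroutine uses whatever distribution its arguments have in the calling program, so no remapping occurs, although the subroutine may then execute less efficiently. This strategy is supported by the INHERIT directive.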
As noted in Chapter 4, several tradeoffs must be evaluated when determining which strategy to adopt in a particular circumstance. The cost of the remapping inherent in strategy 1 should be weighed against the performance degradation that may occur if strategy 2 is used. Similarly, the effort required to optimize a subroutine for a particular distribution must be weighed against the subroutine's execution cost and frequency of use. These tradeoffs are more complex if a subroutine may be used in several contexts. In some cases, it may be worthwhile for a subroutine to incorporate different code for different distributions.
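As a rough illustration of the first tradeoff, the per-call comparison can be written as a tiny cost model. This is a hypothetical sketch, not an HPF facility: the function names and all timings are illustrative assumptions, and the model ignores the programmer effort needed to tune a subroutine.

```python
# Hypothetical per-call cost model for choosing a subroutine-interface strategy.
# All timings are illustrative inputs, not measurements of any HPF system.

def strategy1_cost(t_remap: float, remaps_per_call: int, t_tuned: float) -> float:
    """Strategy 1: remap arguments to the callee's preferred distribution,
    then run code tuned for that distribution."""
    return remaps_per_call * t_remap + t_tuned


def strategy2_cost(t_inherited: float) -> float:
    """Strategy 2: no remapping, but the callee runs with whatever
    distribution the caller supplies (possibly more slowly)."""
    return t_inherited


def prefer_remapping(t_remap: float, remaps_per_call: int,
                     t_tuned: float, t_inherited: float) -> bool:
    """True if strategy 1 is cheaper per call under this simple model."""
    return strategy1_cost(t_remap, remaps_per_call, t_tuned) < strategy2_cost(t_inherited)


if __name__ == "__main__":
    # Remap in and out (2 remaps) at 2.0 units each, tuned body 5.0, untuned
    # body 8.0: 2*2.0 + 5.0 = 9.0 > 8.0, so keeping the caller's mapping wins.
    print(prefer_remapping(2.0, 2, 5.0, 8.0))   # False
    # A cheaper remap tips the balance: 2*1.0 + 5.0 = 7.0 < 8.0.
    print(prefer_remapping(1.0, 2, 5.0, 8.0))   # True
```

Because both costs scale linearly with the number of calls in this model, call frequency matters mainly when weighed against the one-time effort of writing a tuned version.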
Strategy 1 is straightforward to apply: ordinary distribution directives are applied to dummy arguments. As for any other variable, these directives recommend that the requested distribution hold, and any necessary data movement is performed automatically when the subroutine or function is called. (In the absence of a DISTRIBUTE or ALIGN directive for a dummy argument, the compiler may choose to use any distribution or alignment.) Any redistribution is undone upon return from the subroutine, so data movement costs introduced in this way are incurred twice. The exceptions to this rule are arguments used only for input or only for output, as specified by the Fortran 90 INTENT attribute.
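The remapping rule just described can be summarized as a count of data movements per argument per call. The sketch below is a hypothetical model of that rule, not part of HPF; the function name and the string encoding of intents and distributions are assumptions, and an argument with no INTENT attribute is treated here like `"inout"`.

```python
# Hypothetical model of how many remapping operations strategy 1 incurs for
# one dummy argument: remap on entry and undo on exit, except that INTENT(IN)
# and INTENT(OUT) arguments need only one of the two movements.

def remaps_per_call(caller_dist: str, callee_dist: str, intent: str = "inout") -> int:
    """Count remaps for one argument; intent is "in", "out", or "inout"."""
    if caller_dist == callee_dist:
        return 0          # already distributed as the callee requests
    if intent == "in":
        return 1          # remap on entry only; the caller's copy is unchanged
    if intent == "out":
        return 1          # remap the results back to the caller on exit only
    if intent == "inout":
        return 2          # remap on entry, undo on return
    raise ValueError(f"unknown intent: {intent!r}")


if __name__ == "__main__":
    print(remaps_per_call("(BLOCK,*)", "(*,BLOCK)"))          # 2
    print(remaps_per_call("(BLOCK,*)", "(*,BLOCK)", "in"))    # 1
    print(remaps_per_call("(*,BLOCK)", "(*,BLOCK)"))          # 0
```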
Program 7.5 illustrates some of the issues involved in strategy 1. In the calling program, arrays X and Y are distributed by rows and by columns, respectively, while the dummy argument Z of the subroutine fft is distributed by columns. Hence, the first call to fft requires two matrix transpose operations to convert between the two distributions: one upon entry to the routine and one upon exit. In contrast, the second call to fft requires no data movement, because Y is already distributed appropriately.
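To see how much data such a transpose moves, the following sketch counts the elements of an n x n array that change owner when it is remapped from (BLOCK,*) to (*,BLOCK) on p processors. It assumes that p divides n (so every block has n/p rows or columns) and numbers processors from 0; the helper names are hypothetical, and the count is of elements, not messages.

```python
# Count elements of an n x n array that change owner when remapped between
# a row distribution (BLOCK,*) and a column distribution (*,BLOCK) on p
# processors. Assumes p divides n, so every BLOCK has size n // p.

def owner(i: int, j: int, n: int, p: int, dist: str) -> int:
    b = n // p                       # block size
    if dist == "(BLOCK,*)":
        return i // b                # rows are split across processors
    if dist == "(*,BLOCK)":
        return j // b                # columns are split across processors
    raise ValueError(f"unsupported distribution: {dist}")


def elements_moved(n: int, p: int, src: str, dst: str) -> int:
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if owner(i, j, n, p, src) != owner(i, j, n, p, dst)
    )


if __name__ == "__main__":
    n, p = 8, 4
    one_remap = elements_moved(n, p, "(BLOCK,*)", "(*,BLOCK)")
    print(one_remap)                 # 48, i.e. n*n*(1 - 1/p)
    # First call, fft(X): remap on entry and again on exit.
    print(2 * one_remap)             # 96
    # Second call, fft(Y): Y already matches Z's distribution.
    print(elements_moved(n, p, "(*,BLOCK)", "(*,BLOCK)"))  # 0
```

Only the p diagonal blocks of size (n/p) x (n/p) stay in place, which is why all but n*n/p elements must move in each direction.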
The second strategy is supported by the INHERIT directive. However, INHERIT is not part of the HPF subset, because it is difficult to generate code that can handle multiple distributions; for that reason, we do not consider this language feature in detail.
The following code fragment illustrates the use of INHERIT. It is an alternative version of the fft routine in Program 7.5. The INHERIT directive indicates that no remapping is to occur; hence, the two calls to fft in Program 7.5 execute with (BLOCK,*) and (*,BLOCK) distributions, respectively.
subroutine fft(n, Z)
real Z(n,n)
...
!HPF$ INHERIT Z ! Z has parent mapping
...
end
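One way to see why INHERIT complicates code generation is to compute the local section each processor holds under the two mappings that the fft routine may now receive. The sketch below is a hypothetical illustration (names are not part of HPF) and assumes that p divides n; it shows that the same routine body must cope with local sections of different shapes and with a different undistributed, fully local dimension.

```python
# Local section held by each processor for an n x n array under the two
# mappings an inherited dummy argument may have. Assumes p divides n.

def local_shape(n: int, p: int, dist: str) -> tuple[int, int]:
    b = n // p
    if dist == "(BLOCK,*)":
        return (b, n)     # each processor owns b complete rows
    if dist == "(*,BLOCK)":
        return (n, b)     # each processor owns b complete columns
    raise ValueError(f"unsupported distribution: {dist}")


def undistributed_dim(dist: str) -> int:
    """Fortran-style (1-based) index of the dimension marked '*'."""
    if dist == "(BLOCK,*)":
        return 2
    if dist == "(*,BLOCK)":
        return 1
    raise ValueError(f"unsupported distribution: {dist}")


if __name__ == "__main__":
    for d in ("(BLOCK,*)", "(*,BLOCK)"):
        print(d, local_shape(8, 4, d), undistributed_dim(d))
```

Code that is efficient when dimension 2 is local generally needs a different access pattern, or different communication, when dimension 1 is local instead, which is one reason a compiler may need several code versions for a routine with inherited arguments.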
© Copyright 1995 by Ian Foster