Our second case study illustrates the use of RPCs to implement asynchronous access to a distributed data structure. Programs 5.16 and 5.17 sketch a CC++ implementation of the parallel Fock matrix construction algorithm of Section 2.8. Recall that in this algorithm, P computation tasks must be able to read and write distributed arrays. The programs presented here achieve this capability by distributing the arrays over a set of processor objects. Computation threads, created one per processor object, operate on the distributed array by invoking RPCs implementing operations such as accumulate and read.
The distributed array itself is implemented by the class Fock and the processor object class FockNode, presented in Program 5.16. These are derived from the classes POArray and POArrayNode of Program 5.10, much as in the climate model of Program 5.11, and provide definitions for the virtual functions create_pobj and init_pobj. The derived classes defined in Program 5.16 are used to create the array of processor objects within which computation will occur. The data structures that implement the distributed array are allocated within these same processor objects, with each of the posize processor objects being assigned blocksize array elements. Notice how the initialization function for FockNode allocates and initializes the elements of the distributed array.
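The block distribution described above can be sketched in standard C++ rather than CC++. The names NodeBlock, BlockArray, owner, and offset are hypothetical and do not appear in Program 5.16; ordinary objects stand in for processor objects, and zero-filling the storage is an assumption about what the initialization function does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one FockNode: owns blocksize array elements.
struct NodeBlock {
    std::vector<double> data;
    // Allocate and zero the local elements (assumed initialization).
    explicit NodeBlock(std::size_t blocksize) : data(blocksize, 0.0) {}
};

// Hypothetical stand-in for Fock: posize blocks of blocksize elements each.
struct BlockArray {
    std::size_t blocksize;
    std::vector<NodeBlock> nodes;

    BlockArray(std::size_t posize, std::size_t blocksize_)
        : blocksize(blocksize_), nodes(posize, NodeBlock(blocksize_)) {}

    // Which block holds global element `index`.
    std::size_t owner(std::size_t index) const { return index / blocksize; }

    // Position of global element `index` inside its owning block.
    std::size_t offset(std::size_t index) const { return index % blocksize; }

    // Total number of elements in the distributed array.
    std::size_t size() const { return nodes.size() * blocksize; }
};
```

With four blocks of 1024 elements, global element 1025 maps to block 1 at local offset 1.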
For brevity, Program 5.16 implements only an accumulate operation, which is defined in Program 5.17. Notice how it issues an RPC to a remote processor object (number index/blocksize) requesting an operation on a specified sequence of values. The function invoked by the RPC (accum_local) is an atomic function; this ensures that two concurrent accumulate operations cannot interleave and produce meaningless results.
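The routing and atomicity just described can be approximated in standard C++. This is a sketch under stated assumptions, not Program 5.17: Node and DistArray are hypothetical names, a std::mutex stands in for CC++'s atomic function, a direct method call stands in for the RPC, and the splitting of a sequence that crosses a block boundary is an added assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for FockNode.
class Node {
public:
    explicit Node(std::size_t blocksize) : data_(blocksize, 0.0) {}

    // Counterpart of accum_local: the lock makes the whole update
    // indivisible, mimicking a CC++ atomic function.
    void accum_local(std::size_t offset, const double* vals, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < n; ++i) data_[offset + i] += vals[i];
    }

    // Read one local element under the same lock.
    double read_local(std::size_t offset) {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_[offset];
    }

private:
    std::mutex mutex_;
    std::vector<double> data_;
};

// Hypothetical stand-in for Fock.
class DistArray {
public:
    DistArray(std::size_t posize, std::size_t blocksize)
        : blocksize_(blocksize) {
        for (std::size_t p = 0; p < posize; ++p)
            nodes_.push_back(std::make_unique<Node>(blocksize));
    }

    // Add vals to the elements starting at global position index.
    void accumulate(std::size_t index, const std::vector<double>& vals) {
        assert(index + vals.size() <= nodes_.size() * blocksize_);
        std::size_t done = 0;
        while (done < vals.size()) {
            std::size_t global = index + done;
            std::size_t node = global / blocksize_;    // owning node
            std::size_t offset = global % blocksize_;  // position in block
            // Send only the part that fits in this node's block (assumption).
            std::size_t n = std::min(vals.size() - done, blocksize_ - offset);
            nodes_[node]->accum_local(offset, vals.data() + done, n);
            done += n;
        }
    }

    // Read a single element by global index.
    double read(std::size_t index) {
        return nodes_[index / blocksize_]->read_local(index % blocksize_);
    }

private:
    std::size_t blocksize_;
    std::vector<std::unique_ptr<Node>> nodes_;
};
```

Because each node has its own lock, updates to different nodes proceed independently, while two updates to the same node are serialized.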
Having defined the classes Fock and FockNode, the implementation of the rest of the code is fairly straightforward. We first create and initialize P processor objects of type FockNode, as follows.
    Fock darray(1024);       // 1024 is block size
    darray.init(P, nodes);   // P and nodes as usual
Then, we invoke on each processor object a task (fock_build) responsible for performing one component of the Fock matrix computation. Each such task makes repeated calls to accumulate to place the results of its computation into the distributed array.
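The overall structure, one task per processor object with every task accumulating into the shared array, might look as follows in standard C++ with std::thread in place of CC++ threads. SharedArray and run_fock are hypothetical names, and the body of fock_build is a placeholder that adds 1.0 to every element; the real task would compute Fock matrix contributions instead.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified distributed array: one lock per block.
struct SharedArray {
    std::size_t blocksize;
    std::vector<double> data;
    std::vector<std::mutex> locks;

    SharedArray(std::size_t posize, std::size_t bs)
        : blocksize(bs), data(posize * bs, 0.0), locks(posize) {}

    // Add value to one element while holding the owning block's lock.
    void accumulate(std::size_t index, double value) {
        std::lock_guard<std::mutex> guard(locks[index / blocksize]);
        data[index] += value;
    }
};

// Placeholder task body (assumption): adds 1.0 to every element, starting
// at this task's own block and wrapping around.
void fock_build(SharedArray& darray, std::size_t task_id) {
    std::size_t n = darray.data.size();
    std::size_t start = task_id * darray.blocksize;
    for (std::size_t i = 0; i < n; ++i)
        darray.accumulate((start + i) % n, 1.0);
}

// Create the array, run P tasks to completion, and return the contents.
std::vector<double> run_fock(std::size_t P, std::size_t blocksize) {
    SharedArray darray(P, blocksize);
    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < P; ++p)
        workers.emplace_back([&darray, p] { fock_build(darray, p); });
    for (std::thread& w : workers) w.join();
    return darray.data;
}
```

Since every task adds 1.0 to every element, each element equals P once all tasks have finished, regardless of how the threads interleave.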
© Copyright 1995 by Ian Foster