Contents
Preface
How to Use This Book
Acknowledgments
Terminology
Part I: Concepts
1 Parallel Computers and Computation
1.1 Parallelism and Computing
1.1.1 Trends in Applications
1.1.2 Trends in Computer Design
1.1.3 Trends in Networking
1.1.4 Summary of Trends
1.2 A Parallel Machine Model
1.2.1 The Multicomputer
1.2.2 Other Machine Models
1.3 A Parallel Programming Model
1.3.1 Tasks and Channels
1.3.2 Other Programming Models
1.4 Parallel Algorithm Examples
1.4.1 Finite Differences
1.4.2 Pairwise Interactions
1.4.3 Search
1.4.4 Parameter Study
1.5 Summary
Exercises
Chapter Notes
2 Designing Parallel Algorithms
2.1 Methodical Design
2.2 Partitioning
2.2.1 Domain Decomposition
2.2.2 Functional Decomposition
2.2.3 Partitioning Design Checklist
2.3 Communication
2.3.1 Local Communication
2.3.2 Global Communication
Distributing Communication and Computation.
Uncovering Concurrency: Divide and Conquer.
2.3.3 Unstructured and Dynamic Communication
2.3.4 Asynchronous Communication
2.3.5 Communication Design Checklist
2.4 Agglomeration
2.4.1 Increasing Granularity
Surface-to-Volume Effects.
Replicating Computation.
Avoiding Communication.
2.4.2 Preserving Flexibility
2.4.3 Reducing Software Engineering Costs
2.4.4 Agglomeration Design Checklist
2.5 Mapping
2.5.1 Load-Balancing Algorithms
Recursive Bisection.
Local Algorithms.
Probabilistic Methods.
Cyclic Mappings.
2.5.2 Task-Scheduling Algorithms
Manager/Worker.
Hierarchical Manager/Worker.
Decentralized Schemes.
Termination Detection.
2.5.3 Mapping Design Checklist
2.6 Case Study: Atmosphere Model
2.6.1 Atmosphere Model Background
2.6.2 Atmosphere Model Algorithm Design
Partition.
Communication.
Agglomeration.
Mapping.
2.6.3 Atmosphere Model Summary
2.7 Case Study: Floorplan Optimization
2.7.1 Floorplan Background
2.7.2 Floorplan Algorithm Design
Partition.
Communication.
Agglomeration.
Mapping.
2.7.3 Floorplan Summary
2.8 Case Study: Computational Chemistry
2.8.1 Chemistry Background
2.8.2 Chemistry Algorithm Design
Partition.
Communication and Agglomeration.
Mapping.
2.8.3 Chemistry Summary
2.9 Summary
Exercises
Chapter Notes
3 A Quantitative Basis for Design
3.1 Defining Performance
3.2 Approaches to Performance Modeling
3.2.1 Amdahl's Law
3.2.2 Extrapolation from Observations
3.2.3 Asymptotic Analysis
3.3 Developing Models
3.3.1 Execution Time
Computation Time.
Communication Time.
Idle Time.
3.3.2 Efficiency and Speedup
3.4 Scalability Analysis
3.4.1 Scalability with Fixed Problem Size
3.4.2 Scalability with Scaled Problem Size
3.4.3 Execution Profiles
3.5 Experimental Studies
3.5.1 Experimental Design
3.5.2 Obtaining and Validating Experimental Data
3.5.3 Fitting Data to Models
3.6 Evaluating Implementations
3.6.1 Unaccounted-for Overhead
3.6.2 Speedup Anomalies
3.7 A Refined Communication Cost Model
3.7.1 Competition for Bandwidth
3.7.2 Interconnection Networks
Crossbar Switching Network.
Bus-based Networks.
Ethernet.
Mesh Networks.
Hypercube Network.
Multistage Interconnection Networks.
3.8 Input/Output
3.9 Case Study: Shortest-Path Algorithms
3.9.1 Floyd's Algorithm
Parallel Floyd 1.
Parallel Floyd 2.
3.9.2 Dijkstra's Algorithm
Parallel Dijkstra 1.
Parallel Dijkstra 2.
3.9.3 Shortest-Path Algorithms Summary
3.10 Summary
Exercises
Chapter Notes
4 Putting Components Together
4.1 Modular Design Review
Provide simple interfaces.
Ensure that modules hide information.
Use appropriate tools.
Design checklist.
4.2 Modularity and Parallel Computing
4.2.1 Data Distribution
4.2.2 Sequential Composition
4.2.3 Parallel Composition
4.2.4 Concurrent Composition
4.2.5 Design Rules
4.3 Performance Analysis
4.4 Case Study: Convolution
4.4.1 Components
4.4.2 Composing Components
4.4.3 Convolution Problem Summary
4.5 Case Study: Tuple Space
4.5.1 Application
4.5.2 Implementation
4.6 Case Study: Matrix Multiplication
4.6.1 Parallel Matrix-Matrix Multiplication
4.6.2 Redistribution Costs
4.6.3 A Systolic Algorithm
4.7 Summary
Exercises
Chapter Notes
Part II: Tools
5 Compositional C++
5.1 C++ Review
5.1.1 Strong Typing and Memory Management
5.1.2 Classes
5.1.3 Inheritance
5.2 CC++ Introduction
5.3 Concurrency
5.4 Locality
5.4.1 Processor Objects
5.4.2 Global Pointers
5.4.3 Thread Placement
5.5 Communication
5.5.1 Remote Operations
5.5.2 Synchronization
5.5.3 Mutual Exclusion
5.5.4 Data Transfer Functions
5.6 Asynchronous Communication
5.7 Determinism
5.8 Mapping
5.8.1 Processor Object Placement
5.8.2 Mapping Threads to Processor Objects
5.9 Modularity
5.10 Performance Issues
5.11 Case Study: Channel Library
5.12 Case Study: Fock Matrix Construction
5.13 Summary
Exercises
Chapter Notes
6 Fortran M
6.1 FM Introduction
6.2 Concurrency
6.2.1 Defining Processes
6.2.2 Creating Processes
6.3 Communication
6.3.1 Creating Channels
6.3.2 Sending Messages
6.3.3 Receiving Messages
6.4 Unstructured Communication
6.4.1 Many-to-One Communication
6.4.2 Many-to-Many Communication
6.4.3 Dynamic Channel Structures
6.5 Asynchronous Communication
6.6 Determinism
6.7 Argument Passing
6.7.1 Copying and Determinism
6.7.2 Avoiding Copying
6.8 Mapping
6.8.1 Virtual Computers
6.8.2 Process Placement
6.8.3 Submachines
6.9 Modularity
6.10 Performance Issues
6.11 Case Study: Fock Matrix Construction
6.12 Summary
Exercises
Chapter Notes
7 High Performance Fortran
7.1 Data Parallelism
7.1.1 Concurrency
7.1.2 Locality
7.1.3 Design
7.1.4 Data-Parallel Languages
7.2 Fortran 90
7.2.1 Array Assignment Statement
7.2.2 Array Intrinsic Functions
7.3 Data Distribution
7.3.1 Processors
7.3.2 Alignment
7.3.3 Distribution
7.4 Concurrency
7.4.1 The FORALL Statement
7.4.2 The INDEPENDENT Directive and Do-Loops
7.5 Dummy Arguments and Modularity
Strategy 1: Remap Arguments.
Strategy 2: Use Parent Mapping.
7.6 Other HPF Features
7.6.1 System Inquiry Intrinsic Functions
7.6.2 Storage and Sequence Association
7.6.3 HPF Features Not Covered
7.7 Performance Issues
7.7.1 HPF Compilation
7.7.2 Sequential Bottlenecks
7.7.3 Communication Costs
7.8 Case Study: Gaussian Elimination
7.9 Summary
Exercises
Chapter Notes
8 Message Passing Interface
8.1 The MPI Programming Model
8.2 MPI Basics
8.2.1 Language Bindings
C Language Binding.
Fortran Language Binding.
8.2.2 Determinism
8.3 Global Operations
8.3.1 Barrier
8.3.2 Data Movement
8.3.3 Reduction Operations
8.4 Asynchronous Communication
8.5 Modularity
8.5.1 Creating Communicators
8.5.2 Partitioning Processes
8.5.3 Communicating between Groups
8.6 Other MPI Features
8.6.1 Derived Datatypes
8.6.2 MPI Features Not Covered
8.7 Performance Issues
8.8 Case Study: Earth System Model
8.9 Summary
Exercises
Chapter Notes
9 Performance Tools
9.1 Performance Analysis
9.2 Data Collection
9.2.1 Profiles
9.2.2 Counters
9.2.3 Traces
9.2.4 Summary of Data Collection Tools
9.3 Data Transformation and Visualization
9.3.1 Profiles and Counts
9.3.2 Traces
9.3.3 Data-Parallel Languages
9.4 Tools
9.4.1 ParaGraph
9.4.2 Upshot
9.4.3 Pablo
9.4.4 Gauge
9.4.5 ParAide
9.4.6 IBM's Parallel Environment
9.4.7 AIMS
9.4.8 Custom Tools
9.5 Summary
Exercises
Chapter Notes
Part III: Resources
10 Random Numbers
10.1 Sequential Random Numbers
10.2 Parallel Random Numbers
10.3 Distributed Random Generators
10.3.1 The Random Tree Method
10.3.2 The Leapfrog Method
10.3.3 Modified Leapfrog
10.4 Summary
Exercises
Chapter Notes
11 Hypercube Algorithms
11.1 The Hypercube Template
11.2 Vector Reduction
11.3 Matrix Transposition
11.4 Mergesort
Compare-Exchange.
Parallel Merge.
Mergesort.
Performance.
11.5 Summary
Exercises
Chapter Notes
12 Further Reading
References
Index
© Copyright 1995 by Ian Foster