# Center for Computing Research (CCR)

# Center for Computing Research

### Project

Search in Projects: |
|||

### Parallel Scientific Computing

With a number of different collaborators, I have devised new parallel algorithms for a variety of problems in scientific computing. These include many-body calculations, solid dynamics with contacts, and some radiation transport methods. These projects are sketched below. I have also done a fair amount of work on parallel linear algebra.

Parallel Radiation Transport

Discrete ordinates methods for the simulation of radiation transport lead to a kernel computation involving a set of "sweeps" through the mesh, one for each angle in the ordinate set. These sweeps are a challenge to parallelize effectively for unstructured grids. The first paper below describes our work on this problem. To perform these sweeps (and also in many other situations), it is necessary to decompose a directed graph into strongly connected components. The classic serial algorithm for this problem is not amenable to parallelism. In the second paper we propose and analyze an alternative approach which is more parallelizable. The third paper describes an implementation of this algorithm and its performance on two parallel computers.

Papers:

- Parallel Sn Sweeps on Unstructured Grids: Algorithms for Prioritization, Grid Partitioning and Cycle Detection Steve Plimpton, Bruce Hendrickson, Shawn Burns, Will McLendon III and Lawrence Rauchwerger. In Nuclear Science & Engineering 150:267-283, 2005 (Previous version in Proc. SC'00).

- On Identifying Strongly Connected Components in Parallel Lisa Fleischer, Bruce Hendrickson and Ali Pinar. In Proc. Irregular'2000, Lecture Notes in Computer Science.

- Finding Strongly Connected Components in Distributed Graphs, William C. McLendon III, Bruce Hendrickson, Steve Plimpton and Lawrence Rauchwerger. J. Parallel & Distributed Computing 65(8):901-910, 2005 (Previous version in Proc. 10th SIAM Conf. Parallel Processing for Scientific Computing, March '01).

Paper

Grid Transfer

A number of computational procedures employ multiple grids on which solutions are computed. For example, in multi-physics simulations a primary grid may be used to compute mechanical deformation of an object while a secondary grid is used for thermal conduction calculations. When modeling coupled thermo-mechanical effects, solution data must be interpolated back and forth between the grids each timestep. On a parallel machine, this grid transfer operation can be challenging if the two grids are decomposed to processors differently for reasons of computational efficiency. If the grids move or adapt separately, the complexity of the operation is compounded. With colleagues, I have developed a grid transfer algorithm suitable for massively parallel codes which use multiple grids. It uses a rendezvous technique wherein a third decomposition is used to search for elements in one grid that contain nodal points of the other. This has the advantage of enabling the grid transfer to be load-balanced separately from the remainder of the computations.

Papers:

- A Parallel Rendezvous Algorithm for Interpolation Between Multiple Grids, Steve Plimpton, Bruce Hendrickson and James Stewart. J. Parallel & Distributed Computing 64:266-276 (2004) (Previous version in Proc. SC'98).

Paper, Abstract

Dynamic Load Balancing

In many applications, computational requirements vary during the course of a calculation. To run such calculations effectively on a parallel computer, tasks must be periodically redistributed among processors to keep the workload balanced. A number of different algorithms have been proposed for this dynamic load balancing problem. The first paper below is a critical review of the algorithmic options and the software challenges. The second paper, somewhat older, describes an attempt to parallelize a graph partition approach.

Dynamic load balancing is fundamentally harder than static partitioning. Algorithms and implementations must run quickly in parallel without consuming much memory. Interfaces need to be subroutine calls instead of file transfers. And since the data is already distributed, the new partition should be similar to the current one to minimize the amount of data that needs to be redistributed. With Karen Devine and a number of other collaborators, we are working to build a tool called Zoltan to address this plethora of challenges. Zoltan is designed to be extensible to different applications, algorithms and parallel architectures. For example, it is data-structure neutral - an object oriented interface separates the application data structures from those of Zoltan. The tool will also support a very general model of a parallel computer, facilitating the use of heterogeneous machines.

Papers:

- Dynamic Load Balancing in Computational Mechanics, Bruce Hendrickson and Karen Devine. Comp. Meth. Applied Mechanics & Engineering. 184(2-4):485-500, 2000.

Paper, Abstract

- Parallel Algorithms for Dynamically Partitioning Unstructured Grids, Pedro Diniz, Steve Plimpton, Bruce Hendrickson and Robert Leland. In Proc. 7th SIAM Conf. Parallel Proc., 1995.

Paper, Abstract

Many Body Calculations

My work on parallel algorithms for many body calculations has focused on problems with short-range forces in which the computational effort scales linearly with the number of particles (eg. molecular dynamics). Although the techniques I developed with Steve Plimpton apply to any direct simulation, problems with long-range forces are more efficiently addressed by any of several approximate methods.

My primary contribution was the force decomposition algorithm in which we divide the computation among processors in an unusual way to reduce the communication cost. A nice survey of of parallel approaches for molecular dynamics calculations can be found in the first paper by Steve Plimpton. The others introduce and extend the force decomposition idea.

Papers:

- Fast Parallel Algorithms for Short-Range Molecular Dynamics, Steve J. Plimpton, J. Comp. Phys., 117:1-19, March 1995.

Paper

- Parallel Many-Body Simulations Without All-to-All Communication, Bruce Hendrickson and Steve J. Plimpton. J. Par. Dist. Comput., 27(1):15-25, 1995.

Paper, Abstract

- Parallel Molecular Dynamics With the Embedded Atom Method, Steve J. Plimpton and Bruce Hendrickson. In Materials Theory and Modeling, edited by J. Broughton, P. Bristowe, and J. Newsam, (MRS Proceedings 291, Pittsburgh, PA, 1993), at fall MRS meeting, Boston, MA, November 1992.

Paper Abstract

- A New Parallel Method for Molecular Dynamics Simulation of Macromolecular Systems, Steve J. Plimpton and Bruce Hendrickson. J. Comp. Chemistry, 17(3):326-337, 1996.

Paper Abstract

Parallel Contact Detection

Imagine simulating a car crash. You would probably choose to use Lagrangian grids which deform with the physics as the car distorts. When the fender impacts the radiator, there will be new forces that need to be modeled. In the simulation, this will correspond to the mesh intersecting itself. Detecting these contacts is an essential component of solid dynamics simulations, but it has proved difficult to parallelize effectively. Previous parallelization efforts have generally used a single decomposition of the mesh which is used for both the finite element analysis and the contact detection. However, this can lead to severe load imbalance and large volumes of communication.

We have instead adopted an approach in which we use two different decompositions for the two computational steps. We use a static, graph-based decomposition for the finite element calculation, and a dynamic, geometric decomposition for the contact detection. Although there is some cost associated with communicating between the two decompositions, a careful implementation keeps this small. And once we've paid this cost, we get nearly perfect load balance allowing for the first algorithms which scales to hundreds and even thousands of processors. This algorithm is implemented in Sandia's PRONTO code. The parallel algorithms in PRONTO are described in greater detail in the first two papers below. The capabilities of the code are presented in the third paper, which has earned parallel PRONTO a status as finalist for the 1997 Gordon Bell Prize.

Papers:

- Transient Solid Dynamics Simulations on the Sandia/Intel Teraflop Computer, Steve Attaway, Ted Barragy, Kevin Brown, David Gardner, Bruce Hendrickson, Steve Plimpton and Courtenay Vaughan. In Proc. SC'97. (Finalist for the Gordon Bell Prize.)

Postscript , HTML, Abstract

- Transient Dynamics Simulations: Parallel Algorithms for Contact Detection and Smoothed Particle Hydrodynamics, Steve Plimpton, Steve Attaway, Bruce Hendrickson, Jeff Swegle, Courtenay Vaughan and David Gardner. J. Parallel Distrib. Comput. 50:104-122, 1998. (Earlier version in Proc. Supercomputing'96).

Paper, Abstract

- A New Parallel Algorithm for Contact Detection in Finite Element Methods, Bruce Hendrickson, Steve Plimpton, Steve Attaway, Courtenay Vaughan and David Gardner. In Proc. High Performance Computing '96.

Paper, Abstract

Smoothed Particle Hydrodynamics

When simulating phenomena involving high strains, meshes can become highly distorted and break down. To avoid these problems, one can use a gridless methodology known as smoothed particle hydrodynamics . Despite its name, this approach can model solid objects as well as fluids. Building on the ideas we devised for parallel contact detection, we have developed a parallelization of smoothed particle hydrodynamics.

Papers:

- Transient Dynamics Simulations: Parallel Algorithms for Contact Detection and Smoothed Particle Hydrodynamics, Steve Plimpton, Steve Attaway, Bruce Hendrickson, Jeff Swegle, Courtenay Vaughan and David Gardner. J. Parallel Distrib. Comput. 50:104-122, 1998. (Earlier version in Proc. Supercomputing'96).

Paper, Abstract

**Associated Software:** Zoltan

**Contact:** Hendrickson, Bruce A., bahendr@sandia.gov

SAND2015-2717 S