Center for Computing Research
The CCR collaborates on innovations in tool development, component development, and scalable algorithm research with partners and customers around the world through open source projects. Current software projects focus on enabling technologies for scientific computing in areas such as machine learning, graph algorithms, cognitive modeling, visualization, optimization, large-scale multi-physics simulation, HPC miniapplications, HPC system simulation and HPC system software.
Before a calculation can be performed on a parallel computer, it must first be decomposed into tasks which are assigned to different processors. Efficient use of the machine requires that each processor have about the same amount of work to do and that the quantity of interprocessor communication is kept small. Finding an optimal decomposition is provably hard, but due to its practical importance, a great deal of effort has been devoted to developing heuristics for this problem.
The decomposition problem can be addressed in terms of graph partitioning. Rob Leland and I have developed a variety of algorithms for graph partitioning and implemented them into a package we call Chaco. The code is being used at most of the major parallel computing centers around the world to simplify the development of parallel applications, and to ensure that high performance is obtained. Chaco has contributed to a wide variety of computational studies including investigation of the molecular structure of liquid crystals, evaluating the design of a chemical vapor deposition reactor and modeling automobile collisions.
The algorithms developed for Chaco have also been successfully applied to a number of problems which have nothing to do with parallel computing. These include the determination of genomic sequences (a critical part of the Human Genome Project), the design of space-efficient circuit placements, organization of databases for efficient retrieval, and ordering of sparse matrices for efficient factorization. The code can also be used more generally to identify data locality.
Chaco contains a wide variety of algorithms and options, many of which were invented by the authors. Some of the algorithms exploit the geometry of the mesh, others its local connectivity or its global structure as captured by eigenvectors of a related matrix. These methods can be mixed and matched in several ways, and combinations often prove to be more effective than any single technique in isolation. All these algorithms are accessed via a simple user interface, or a call from other software. Innovations in Chaco include
- Development of multilevel graph partitioning. This widely imitated approach has become the premiere algorithm combining very high quality with short calculation times.
- Extension of spectral partitioning to enable the use of 2 or 3 Laplacian eigenvectors to quadrisect of octasect a graph.
- Highly efficient and robust eigensolvers for use with spectral graph algorithms.
- Generalization of the Kernighan-Lin/Fiduccia-Mattheyses algorithm to handle weighted graphs, arbitrary number of sets and lazy initiation.
- Development of skewed partitioning to improve the mapping of a graph onto a target parallel architecture.
- Various post-processing options to improve partitions in a number of ways.
Chaco-1.0 was released in 1993 and the substantially enhanced Chaco-2.0 appeared in October 1995. The current version of the code is available under a GNU open source license, and can be obtained here. After gunzipping and untarring the code, you will find three subdirectories. The first subdirectory, "doc", contains several postscript files, including the important "Users_Guide.ps". We encourage you to print out and read this guide (or at least the quick overview section and the introduction) before making a serious effort to use the code. The second subdirectory, "code", contains the program makefile and various subdirectories with the actual source code. The makefile places the executable module (called "chaco") in the third subdirectory, "exec". The "exec" subdirectory also contains several sample input files to get you started.
Associated Projects: Graph Partitioning and Load BalancingSoftware downloads
Contact: Hendrickson, Bruce A., firstname.lastname@example.org
CIME is the Common Infrastructure for Modelling the Earth. It is the full-featured software engineering system for global Earth system or climate models. CIME is a set Python scripts configured with XML data files as well as Fortran soure code, and owns the model configuration, build system, test harness, test suites, portability to many specific HPC platforms, input data set management, results archiving.CIME is jointly developed by NCAR, Sandia, and Argonne, and is the software engineering system for both the E3SM and CESM global Earth system models.
Associated Projects: E3SM - Energy Exascale Earth System Model
Contact: Foucar, James G., email@example.com
The Compadre Toolkit provides a performance portable solution for the parallel evaluation of computationally dense kernels. The toolkit specifically targets the Generalized Moving Least Squares (GMLS) approach, which requires the inversion of small dense matrices. The result is a set of weights that provide the information needed for remap or entries that constitute the rows of some globally sparse matrix.
This toolkit focuses on the 'on-node' aspects of meshless PDE solution and remap, namely the parallel construction of small dense matrices and their inversion. What it does not provide is the tools for managing fields, inverting globally sparse matrices, or neighbor search that requires orchestration over many MPI processes. This toolkit is designed to be easily dropped-in to an existing MPI (or serial) based framework for PDE solution or remap, with minimal dependencies (Kokkos and either Cuda Toolkit or LAPACK).
Contact: Paul Kuberry, firstname.lastname@example.org
CrossSim is a crossbar simulator designed to model resistive memory crossbars for both neuromorphic computing and (in a future release) digital memories. It provides a clean python API so that different algorithms can be built upon crossbars while modeling realistic device properties and variability. The crossbar can be modeled using multiple fast approximate numerical models including both analytic noise models as well as experimentally derived lookup tables. A slower, but more accurate circuit simulation of the devices using the parallel spice simulator Xyce is also being developed and will be included in a future release.
Contact: Sapan Agarwal, email@example.com
D2T - Doubly Distributed Transactions
Typical distributed transactions are a single client and multiple servers. For the supercomputing simulations, we have multiple clients to multiple servers when attempting to do atomic actions, hence doubly distributed transactions.
This project created a library that can handle scalably offering a two-phase commit style transaction for both storage oriented operations and also for more complex system operations like reconfiguring or redeploying services as atomic operations. The code has been released and is hosted at github.
Contact: lofstead, Jay, firstname.lastname@example.org
Dakota: Optimization and Uncertainty Quantification Algorithms for Design Exploration and Simulation Credibility.
The Dakota toolkit provides a flexible, extensible interface between analysis codes and iterative systems analysis methods. Dakota contains algorithms for:
· optimization with gradient and nongradient-based methods;
· uncertainty quantification with sampling, reliability, stochastic expansion, and epistemic methods;
· parameter estimation with nonlinear least squares methods; and
· sensitivity/variance analysis with design of experiments and parameter study methods.
These capabilities may be used on their own or as components within advanced strategies such as hybrid optimization, surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty.
Contact: Adams, Brian M., email@example.com
DGM provides a software framework for forward modeling and optimization of partial differential equations using the discontinuous Galerkin method (DGM). Examples are provided for solution to generic Navier-Stokes (with and without subgrid-scale turbulence models), Euler, Burgers, Poisson, Helmholtz, Darcy, linear transport, and Advection Diffusion equations.
DGM supports a discontinuous Galerkin discretization in space using mixed nodal and modal representations with a variety of time advancement methods. Includes modeling capabilities for boundary conditions, source terms, and material models with the ability to solve optimization problems for parameters describing each. Also includes the ability to perform variational multiscale modeling for turbulence simulation.
For more information, send email to contact listed below.
Contact: Collis, Samuel Scott, firstname.lastname@example.org
Dynamic is a simple software tool, written in Python 3, for simulating networks of continuous dynamic variables linked by interaction Hamiltonians.
Contact: Frank, Michael P (Mike), email@example.com
E3SM is an Earth System Model being developed by the DOE Energy Exascale Earth System Model (E3SM) project. E3SM Version 1 is scheduled to be released in 2018. The E3SM atmosphere model runs with the spectral element dynamical core from HOMME, upgraded to include new aerosol and cloud physics and improved convection and treatment of the pressure gradient term, and a formulation for elevation classes to better handle atmosphere and land processes more realistically in the vicinity of topography. The E3SM Land Model (ALM) is based on Community Land Model (CLM) version 4.5 updated to include a full suite of new biochemistry and VIC hydrology and dynamic land units and extensions to couple to land ice sheets. The E3SM ocean, sea ice and land ice components will be built on the MPAS framework.
Contact: Taylor, Mark A., firstname.lastname@example.org
EMPRESS - Metadata management for scientific simulations
With the growth of simulation data, finding relevant data within the sea of raw data can be a much more daunting task than running the simulation at scale in the first place. EMPRESS provides a separate metadata system allowing storing detailed run information as well as user-defined at runtime different data characteristics and associate those characteristics with a variable, a timestep, or run.
It has been published at PDSW-DISCS @ SC17 and SC18 and is available as open source on github.com
Associated Projects: SIRIUS: Science-driven Data Management for Multi-tiered Storage
Contact: Jay Lofstead, email@example.com
Genten: Software for Generalized Tensor Decompositions
Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of parallelism present in emerging computer architectures such has multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs to enable efficient processing of large tensors.
Contact: Phipps, Eric T., firstname.lastname@example.org
The Image Composition Engine for Tiles (IceT) is a high-performance sort-last parallel rendering library. In addition to providing accelerated rendering for a standard display, IceT provides the unique ability to generate images for tiled displays. The overall resolution of the display may be several times larger than any viewport that may be rendered by a single machine.
Contact: Moreland, Kenneth D. (Ken), email@example.com
Kitten Lightweight Kernel
Kitten is a current-generation lightweight kernel (LWK) compute node operating system designed for large-scale parallel computing systems. Kitten is the latest in a long line of successful LWKs, including SUNMOS, Puma, Cougar, and Catamount. Kitten distinguishes itself from these prior LWKs by providing a Linux-compatible user environment, a more modern and extendable open-source codebase, and a virtual machine monitor capability via Palacios that allows full-featured guest operating systems to be loaded on-demand.
Associated Projects: Kitten Lightweight Kernel
Contact: Pedretti, Kevin, firstname.lastname@example.org
Modern high performance computing (HPC) nodes have diverse and heterogeneous types of cores and memory. For applications and domain-specific libraries/languages to scale, port, and perform well on these next generation architectures, their on-node algorithms must be re-engineered for thread scalability and performance portability. The Kokkos programming model and C++ library implementation helps HPC applications and domain libraries implement intra-node thread-scalable algorithms that are performance portable across diverse manycore architectures such as multicore CPUs, Intel Xeon Phi, NVIDIA GPU, and AMD GPU.
Contact: Trott, Christian Robert, email@example.com
MR-MPI is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing.
MapReduce is the programming paradigm, popularized by Google, which is widely used for processing large data sets in parallel. Its salient feature is that if a task can be formulated as a MapReduce, the user can perform it in parallel without writing any parallel code. Instead the user writes serial functions (maps and reduces) which operate on portions of the data set independently. The data-movement and other necessary parallel operations can be performed in an application-independent fashion, in this case by the MR-MPI library.
The MR-MPI library was developed to solve informatics problems on traditional distributed-memory parallel computers. It includes C++ and C interfaces callable from most hi-level languages, and also a Python wrapper and our own OINK scripting wrapper, which can be used to develop and chain MapReduce operations together. MR-MPI and OINK are open-source codes, distributed freely under the terms of the modified Berkeley Software Distribution (BSD) license.
Contact: Plimpton, Steven J., firstname.lastname@example.org
Mesh Quality Improvement Toolkit
MESQUITE is a linkable software library that applies a variety of node-movement algorithms to improve the quality and/or adapt a given mesh. Mesquite uses advanced smoothing and optimization to:
Mesquite improves surface or volume meshes which are structured, unstructured, hybrid, or non-conformal. A variety of element types are permitted. Mesquite is designed to be as efficient as possible so that large meshes can be improved.
Contact: Mitchell, Scott A., email@example.com
MultiThreaded Graph Library (MTGL)
The MTGL is a generic graph library in the Boost style that does not rely on Boost. It runs on multicore machines via Qthreads or OpenMP, and also on the Cray XMT supercomputer. We are currently developing a new version based upon Kokkos that will leverage both multi-core and GPU-accelerated compute nodes, but we do not have a release yet.
Associated Projects: Kokkos
Contact: Berry, Jonathan, firstname.lastname@example.org
Omega_h is a C++ library that implements tetrahedron and triangle mesh adaptativity, with a focus on scalable HPC performance using (optionally) MPI, OpenMP, or CUDA. It is intended to provided adaptive functionality to existing simulation codes. Mesh adaptivity allows one to minimize both discretization error and number of degrees of freedom live during the simulation, as well as enabling moving object and evolving geometry simulations. Omega_h will do this for you in a way that is fast, memory-efficient, and portable across many different architectures.
Contact: Ibanez, Daniel Alejandro (Dan), email@example.com
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView’s batch processing capabilities.
ParaView was developed to analyze extremely large datasets using distributed memory computing resources. It can be run on supercomputers to analyze datasets of exascale size as well as on laptops for smaller data.
ParaView is maintained by Kitware, Inc. Sandia collaborates with Kitware to address our large-scale data analysis and visualization needs through ParaView.
Associated Projects: XVis
Contact: Moreland, Kenneth D. (Ken), firstname.lastname@example.org
Peridigm is Sandia's primary open-source computational peridynamics code. It is a massively-parallel simulation code for implicit and explicit multi-physics simulations centering on solid mechanics and material failure. Peridigm is a C++ code utilizing foundational software components from Sandia's Trilinos project and is fully compatible with the Cubit mesh generator and Paraview visualization codes.
Associated Projects: Albany
Contact: Littlewood, David John, email@example.com
Poblano is a Matlab toolbox of large-scale algorithms for unconstrained nonlinear optimization problems. The algorithms in Poblano require only first-order derivative information (e.g., gradients for scalar-valued objective functions), and therefore can scale to very large problems. The driving application for Poblano development has been tensor decompositions in data analysis applications (bibliometric analysis, social network analysis, chemometrics, etc.).
Poblano optimizers find local minimizers of scalar-valued objective functions taking vector inputs. The gradient (i.e., first derivative) of the objective function is required for all Poblano optimizers. The optimizers converge to a stationary point where the gradient is approximately zero. A line search satisfying the strong Wolfe conditions is used to guarantee global convergence of the Poblano optimizers. The optimization methods in Poblano include several nonlinear conjugate gradient methods (Fletcher-Reeves, Polak-Ribiere, Hestenes-Stiefel), a limited-memory quasi-Newton method using BFGS updates to approximate second-order derivative information, and a truncated Newton method using finite differences to approximate second-order derivative information.
Contact: Dunlavy, Daniel, firstname.lastname@example.org
Portals is a message passing interface intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals is designed to operate on scales ranging from a small number of commodity desktops connected via Ethernet to massively parallel platforms connected with custom designed networks.
Portals attempts to provide a cohesive set of building blocks with which a wide variety of upper layer protocols (such as MPI, SHMEM, or UPC) may be built, while maintaining high performance and scalability. Many interfaces today either provide a simple mapping to hardware which may not be conducive to building upper layer protocols (InfiniBand/Open Fabrics Enterprise Distribution) or which are only a good fit for a subset of upper layer protocols (PSM, MXM).
Contact: Grant, Ryan, email@example.com
Prove-It uses a powerful yet simple approach to theorem-proving. It is not designed with automation as the primary goal. The primary goal is flexibility in order to be able to follow, ideally, any valid and complete (indivisible) chain of reasoning. To that end, users can write additional Python scripts to make their own mathematical operations and LaTeX formatting rules. Axioms are statements that are taken to be true without proof. They can be added by the user at will in order to define their new mathematical objects and operations. Users can also add theorems that are to be proven. Theorems may be proven in any order. Theorem proofs are constructed in Jupyter notebooks (interactive Python sessions in a web browser). These notebooks render LaTeX-formatted mathematical expressions inline. Proofs are constructed by invoking axioms and other theorems and employing fundamental derivation steps (modus ponens, hypothetical reasoning, specialization, generalization, or axiom elimination). Axioms and theorems may be invoked indirectly via convenience methods or automation (methods that are automatically invoked when attempting to prove something or as side-effects when something is proven). Theorem proofs and their axiom/theorem dependencies are stored a kind of database (filesystem based). This database is used to prevent circular logic. It also allows users to track axioms and unproven theorems required by any particular proof. Convenience methods and automation tools may be added which utilize new theorems and aid future proofs. Mathematical objects and operations, axioms, and theorems are organized into built-in and user-defined packages.
Contact: Witzel, Wayne, firstname.lastname@example.org
Source code is available on GitHub as follows:
1. Install Git on your machine from github.com 2. Initialize the target directory with "git init" 3. Download the software using "git pull git://github.com/dwbarne/PYLOTDB" 4. Read README_first.rtf file using Microsoft Word and follow directions.
PYLOTDB software is completely open source. PYLOTDB consists of two codes, PylotDB and Co-PylotDB. The software allows users to access either local or remote MySQL servers and easily display table data in an intuitive user-friendly GUI format. In PylotDB, data can be filtered and table columns are indexed with radiobuttons and checkboxes so data can be easily selected for plotting or statistical analysis. Plotting capability includes X-Y, semi-log, log-log, Kiviat (radar charts), scatter, and scatter with polynomial curve fits of the data. Entire databases, selected databases, or selected tables in selected databases can be backed up and restored as desired. Interfaces are provided for individual entry edits and additions.
PylotDB's companion software Co-PylotDB is used to send data files to a user-selected database table. If data files are in YAML format, PylotDB can then automatically extract each entry in the file, expand the database table with new fields with names taken from the YAML entries, and insert the data in those fields for plotting and analysis. This is a tremendous time saver for analysts. This allows the cycle of "data capture to storage to analysis" to be completed in a matter of minutes.
Another powerful feature of PylotDB is the storage buffer where selected data from any table on any server can be stored and mathematically combined to generate new data not currently in any accessed table. The new data can then be plotted along with other data from the currently displayed table. This allows the user to generate desired data on the fly without the necessity of modifying any of the stored database tables.
PYLOTDB's dependencies include matplotlib, numpy, MySQLdb, Python MegaWidgets (Pmw), and YAML. All of these dependencies are included in the download for Windows machines only. They can easily be found on the web for Mac or Linux machines.
Sample databases are also included to help users get up to speed quickly.
This software is being used at Sandia National Laboratories for analyzing results from our computer performance modeling analysis, but these codes can be used for any type of analysis where database storage and analysis is desired.
Contact: Simon Hammond, email@example.com
Pyomo is a Python-based open-source software package that supports a diverse set of capabilities for formulating, solving, and analyzing optimization models.
A core capability of Pyomo is modeling structured optimization applications. The Pyomo software package can be used to define general symbolic problems, create specific problem instances, and solve these instances using standard commercial and open-source solvers. Pyomo's modeling objects are embedded within a full-featured high-level programming language with a rich set of supporting libraries that distinguishes it from other algebraic modeling languages such as AMPL, AIMMS and GAMS.
Pyomo supports a wide range of problem types, including:
- Linear programming
- Quadratic programming
- Nonlinear programming
- Mixed-integer linear programming
- Mixed-integer quadratic programming
- Mixed-integer nonlinear programming
- Mixed-integer stochastic programming
- Generalized disjunctive programming
- Differential algebraic equations
- Bilevel programming
- Mathematical programming with equilibrium constraints
Pyomo also supports iterative analysis and scripting within a full-featured programming language providing an effective framework for developing high-level optimization and analysis tools.
Associated Projects: Institute for the Design of Advanced Energy Systems (IDAES)
Contact: Hart, William E., firstname.lastname@example.org
The Qthreads API is designed to make using large numbers of threads convenient and easy, and to allow portable access to threading constructs used in massively parallel shared memory environments. The API maps well to both MTA-style threading and PIM-style threading, and we provide an implementation of this interface in both a standard SMP context as well as the SST context. The qthreads API provides access to full/empty-bit (FEB) semantics, where every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
Associated Projects: Kokkos
Contact: Stark, Dylan, email@example.com
Rapid Optimization Library (ROL)
Rapid Optimization Library (ROL) is a C++ package for large-scale optimization. It is used for the solution of optimal design, optimal control and inverse problems in large-scale engineering applications. Other uses include mesh optimization and image processing.
Associated Projects: Trilinos
Contact: Denis Ridzal, firstname.lastname@example.org
SmartBlock reusable workflow components
SmartBlock offers a way to compose workflow glue components using generic functionality rather than having to write that code directly. The initial release has examples of a few different operators and how to compose them for different applications and data formats.
The underlying transport technology is ADIOS + FlexPath, but could be replaced with any other transport mechanism, such as Mercury or CCA.
Associated Projects: Decaf - High-Performance Decoupling of Tightly Coupled Data Flows
Contact: Lofstead, Jay, email@example.com
SparTen provides capabilities for computing reduced-dimension representations of sparse multidimensional count value data. The software consists of the data decompositions methods described in published journal papers. These decomposition methods consist of several numerical optimization methods (one based on a multiplicative update iterative approach, one based on quasi-Newton optimization, and one based on damped Newton optimization) for fitting the input data to a reduced-dimension model of the data with the lowest amount of error. The software also consists of generalized computation that leverages Kokkos to compute the reduced-data representations on multiple computer architectures, including multicore and GPU systems.
Contact: Dunlavy, Daniel, firstname.lastname@example.org
What began as a quick workaround to reduce the size of animated figures in Jupyter notebooks quickly grew into Toyplot, “the kid-sized plotting toolkit with grownup-sized goals”. In a nutshell, we think that scientists and engineers should expect more from their plots, from explicit support for reproducibility and open science to greater clarity and better aesthetics.
With most toolkits, interactivity means throwaway features like pan-and-zoom. For Toyplot, we’re exploring simple-but-effective ideas to address the questions a colleague might ask when viewing a graphic in the real world: “What’s the value at this weird peak?” “Where do those two series cross?” “Can I get a copy of the data?” We’re working on efficient animation that doesn’t compromise the quality of your graphic with compression artifacts; and interactive data cursors that display quantities of interest and descriptive statistics just by hovering the mouse. And since the raw data is already implicitly embedded in a graphic, why not support reproducibility by making it easy to export, so the viewer can work with it themselves? That’s why Toyplot has a context menu to export raw data from a figure in CSV format.
Last but definitely not least: Toyplot fully embraces principles and best practices for clarity and aesthetics in data graphics that are well-established by the visualization community, yet sadly lacking in contemporary plotting libraries. Toyplot has beautiful color palettes and sensible default styling that minimize chartjunk and maximize data ink out of the box, not as afterthoughts or addons.
Contact: Tim Shead, email@example.com
The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.
Contact: Heroux, Michael A., firstname.lastname@example.org
One of the biggest recent changes in high-performance computing is the increasing use of accelerators. Accelerators contain processing cores that independently are inferior to a core in a typical CPU, but these cores are replicated and grouped such that their aggregate execution provides a very high computation rate at a much lower power. Current and future CPU processors also require much more explicit parallelism. Each successive version of the hardware packs more cores into each processor, and technologies like hyperthreading and vector operations require even more parallel processing to leverage each core’s full potential.
VTK-m is a toolkit of scientific visualization algorithms for emerging processor architectures. VTK-m supports the fine-grained concurrency for data analysis and visualization algorithms required to drive extreme scale computing by providing abstract models for data and execution that can be applied to a variety of algorithms across many different processor architectures.
Associated Projects: XVis
Contact: Moreland, Kenneth D. (Ken), email@example.com
Zoltan is a toolkit of parallel algorithms for dynamic load balancing, geometric and hypergraph-based partitioning, graph coloring, matrix ordering, and distributed directories.
Zoltan is open-source software, distributed both as part of the Trilinos solver framework and as a stand-alone toolkit.
Contact: Devine, Karen D., firstname.lastname@example.org