Center for Computing Research (CCR)

Scalable System Software, 01423

The Scalable System Software department traces its roots back to the early days of distributed-memory massively parallel processing (MPP) systems in the late 1980s. During this time, Sandia established the viability of MPP systems, such as the nCUBE-10 and the Intel Paragon, for running mission-critical modeling and simulation applications. The department grew out of the need to design, develop, and deploy more efficient system software focused on meeting the performance and scalability demands of these applications running on the largest and fastest computing systems in the world. The group became firmly established in the early 1990s, when researchers at Sandia partnered with the University of New Mexico to develop a customized system software environment based on a lightweight compute node operating system designed specifically for large-scale, distributed-memory, message-passing machines. This initial lightweight kernel environment was successfully deployed on several large production systems at Sandia and eventually evolved into the operating system that ran on the compute nodes of the Intel ASCI Red system, the world's first general-purpose parallel computer to achieve a teraFLOPS. As parallel computing architectures and applications have continued to evolve, the department has expanded into several other system software areas related to operating systems, while continuing to focus on the needs of extreme-scale systems and applications.

People

Ronald B. Brightwell (Ron)
Manager, Scalable System Software
Email: rbbrigh@sandia.gov
Phone: 505/844-2099
Fax: 505/284-2518

Mailing address:
Sandia National Laboratories
P.O. Box 5800, MS 1319
Albuquerque, NM
87185-1320
Phyllis A Rutka, Office Administrative Assistant
Staff

Projects

The Scalable System Software department supports the design, implementation, and evaluation of system software for extreme-scale parallel computing platforms, with a focus on maximizing the performance, scalability, robustness, and efficiency of key scientific, engineering, and analysis applications. Areas of active research include lightweight compute node operating systems, dynamic adaptive runtime systems, low-level high-performance networking software, application and system resiliency, power/energy optimization, application performance analysis, RAS (reliability, availability, and serviceability) software infrastructure, support for integrated analysis, and parallel file systems and I/O middleware.

Software

The Scalable System Software Department contributes to many open-source software projects, including the following:

News

  • CCR Researcher Jay Lofstead Co-Authors Best Paper at HPDC’19

    CCR Researcher Jay Lofstead and his co-authors from the Illinois Institute of Technology have been awarded Best Paper at the recent 2019 ACM International Symposium on High-Performance Parallel and Distributed Computing. Their paper, entitled “LABIOS: A Distributed Label-Based I/O System,” describes an approach to supporting a wide variety of conflicting I/O workloads under a single storage system. The paper introduces a new data representation called a label, which more clearly describes the contents of data and how it should be delivered to and from the underlying storage system. LABIOS is a new class of storage system that uses data labeling and implements a distributed, fully decoupled, and adaptive I/O platform intended to grow at the intersection of High-Performance Computing and Big Data. Each year the HPDC Program Chairs select the Best Paper based on reviews and discussion among the members of the Technical Program Committee. The award is named in memory of Karsten Schwan, a professor at Georgia Tech who made significant and lasting contributions to the field of parallel and distributed computing.

    Contact: Lofstead, Gerald Fredrick (Jay)
    June 2019
    2019-7473E

  • CCR Researcher Ryan Grant Honored by Queen’s University

    CCR Researcher Ryan Grant was recently recognized by his alma mater as one of the top 125 engineering alumni or faculty of Queen’s University during a celebration of the 125th anniversary of the Faculty of Engineering and Applied Science. The award recognizes the achievements of alumni and faculty who are outstanding leaders in their field and represent excellence in engineering. Winners were recognized in March during a ceremony at the university in Kingston, Ontario, Canada. Ryan received his Bachelor of Applied Science, Master of Science in Engineering, and Ph.D. in Computer Engineering from Queen’s. He is a Principal Member of Technical Staff in the Scalable System Software department, with expertise in high-performance interconnect technologies.

    Contact: Grant, Ryan
    June 2019
    2019-7482E

UNCLASSIFIED UNLIMITED RELEASE DOCUMENTS ONLY