For a complete list of publications, please see my Google Scholar Profile here.
2025
-
A Blueprint for Q-CS1, an Introductory Quantum Programming Course
Austin J. Adams, Rodrigo Borela, Jeffrey S. Young, and 1 more author
In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 2, Pittsburgh, PA, USA, 2025
Despite the need to build a quantum workforce, current courses that introduce quantum programming are rooted in quantum notation that students may find intimidating. We propose Q-CS1, a quantum equivalent of CS1 that begins with hands-on quantum programming. Q-CS1 is enabled by the Qwerty quantum programming language, which allows for reasoning about qubit behavior without physics notation or quantum circuits. An outline of Q-CS1 is provided along with plans for assessing its effectiveness.
-
ASDF: A Compiler for Qwerty, a Basis-Oriented Quantum Programming Language
Austin J. Adams, Sharjeel Khan, Arjun S. Bhamra, and 6 more authors
2025
2024
-
Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization
Shubhendra Pal Singhal, Souvadra Hati, Jeffrey Young, and 3 more authors
In SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 2024
-
A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks
Kevin Sheridan, Jered Dominguez-Trujillo, Galen Shipman, and 5 more authors
In Proceedings of the International Symposium on Memory Systems, 2024
Codesign of hardware technologies and applications for sparse memory access dominated workloads can be challenging due to the complexity of the codes, restrictions on access to the codes, or both. To address this challenge we have developed a novel methodology and set of tools, GS Patterns, that analyze and then synthesize memory access patterns from applications of arbitrary complexity. The results of this are patterns which only contain the normalized sampled memory access addresses as an array of indirection indices organized as either gather (read) or scatter (write) operations. These patterns can then be used to generate memory traffic suitable for hardware optimization and design. In this paper we present GS Patterns including a detailed description of the workflow and algorithms underlying it. The results of analysis and synthesis of access patterns in both proxy and real-world applications using GS Patterns are presented followed by evaluation of performance of these patterns on latest generation hardware technologies including AMD EPYC 9654P, Intel Xeon Max, NVIDIA Grace, and NVIDIA Hopper H100 and H200. Results of this evaluation clearly demonstrate performance differences across different hardware technologies that are not captured by and in many cases are contrary to the performance behavior of simpler memory microbenchmarks.
-
Understanding Performance Implications of LLM Inference on CPUs
Seonjin Na, Geonhwa Jeong, Byung Hoon Ahn, and 3 more authors
In 2024 IEEE International Symposium on Workload Characterization (IISWC), 2024
-
CuPBoP: Making CUDA a Portable Language
Ruobing Han, Jun Chen, Bhanu Garg, and 5 more authors
ACM Trans. Des. Autom. Electron. Syst., Jun 2024
CUDA is designed specifically for NVIDIA GPUs and is not compatible with non-NVIDIA devices. Enabling CUDA execution on alternative backends could greatly benefit the hardware community by fostering a more diverse software ecosystem. To address the need for portability, our objective is to develop a framework that meets key requirements, such as extensive coverage, comprehensive end-to-end support, superior performance, and hardware scalability. Existing solutions that translate CUDA source code into other high-level languages, however, fall short of these goals. In contrast to these source-to-source approaches, we present a novel framework, CuPBoP, which treats CUDA as a portable language in its own right. Compared to two commercial source-to-source solutions, CuPBoP offers a broader coverage and superior performance for the CUDA-to-CPU migration. Additionally, we evaluate the performance of CuPBoP against manually optimized CPU programs, highlighting the differences between CPU programs derived from CUDA and those that are manually optimized. Furthermore, we demonstrate the hardware scalability of CuPBoP by showcasing its successful migration of CUDA to AMD GPUs. To promote further research in this field, we have released CuPBoP as an open-source resource.
-
Qwerty: A Basis-Oriented Quantum Programming Language
Austin J. Adams, Sharjeel Khan, Jeffrey S. Young, and 1 more author
Jun 2024
-
The Framework Makes the Mission - An Analytical Comparison of Two Popular NASA Open Source Flight Software Framework Offerings
Sterling L. Peet, Scott M. Gilliland, and Jeffrey S. Young
Jun 2024
-
Multifidelity Memory System Simulation in SST
Patrick Lavin, Jeffrey Young, and Richard Vuduc
In Proceedings of the International Symposium on Memory Systems, Alexandria, VA, USA, Jun 2024
As computer systems grow larger and more complex, it takes more time to simulate their behavior in detail. Researchers interested in simulating large-scale systems must choose between less-accurate high-level models or simulating smaller portions of their benchmark suite, both of which are highly manual, offline approaches that require time-consuming analysis by experts. Multifidelity simulation aims to lessen this burden by automatically adapting the fidelity of a simulation to the complexity of the behavior occurring at any given point in time. We show how a multifidelity memory system model can be used to accelerate single node simulation by up to 2x with 1-5% mean absolute percent error in the simulated instructions per cycle across benchmark suites.
2023
-
HIPLZ: Enabling performance portability for exascale systems
Jisheng Zhao, Colleen Bertoni, Jeffrey Young, and 3 more authors
Concurrency and Computation: Practice and Experience, Jun 2023
While heterogeneous computing has emerged as a dominant trend in current and future High-Performance Computing (HPC) systems, it is also widely recognized that this shift has led to increased software complexity due to a proliferation of programming systems for different heterogeneous processors. One such example is the Heterogeneous-Compute Interface for Portability from AMD (HIP), which is composed of a C Runtime API and C++ Kernel Language. Many HPC applications will likely use HIP on future exascale systems (e.g., Frontier and El Capitan), but HIP currently only targets AMD and NVIDIA processors. This limitation creates challenges for users who would also like to run their applications on exascale systems based on other architectures (e.g., Aurora, which is based on Intel hardware) that are currently not targeted by HIP. In this paper, we introduce the design and implementation of HIPLZ, a compiler and runtime system that uses the Intel Level Zero API to support HIP on Intel GPU architectures. We discuss the design of HIPLZ, derived from HIPCL (an implementation of HIP on top of OpenCL), and portability issues that occur from using the Level Zero runtime as a backend. We evaluate our implementation by running several performance benchmarks and mini-apps written in HIP on Intel architectures using HIPLZ. Our results show that this approach provides competitive performance relative to Intel’s OpenCL implementations on Intel Gen9 and UHD Graphics 770 GPUs, while providing good coverage of features needed by HPC applications. Overall, this approach is a promising demonstration of enabling performance portability for exascale systems.
-
Towards Safe HPC: Productivity and Performance via Rust Interfaces for a Distributed C++ Actors Library (Work in Progress)
John Parrish, Nicole Wren, Tsz Hang Kiang, and 3 more authors
In Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, Cascais, Portugal, Jun 2023
In this work-in-progress research paper, we make the case for using Rust to develop applications in the High Performance Computing (HPC) domain which is critically dependent on native C/C++ libraries. This work explores one example of Safe HPC via the design of a Rust interface to an existing distributed C++ Actors library. This existing library has been shown to deliver high performance to C++ developers of irregular Partitioned Global Address Space (PGAS) applications. Our key contribution is a proof-of-concept framework to express parallel programs safely in Rust (and potentially other languages/systems), along with a corresponding study of the problems solved by our runtime, the implementation challenges faced, and user productivity. We also conducted an early evaluation of our approach by converting C++ actor implementations of four applications taken from the Bale kernels to Rust Actors using our framework. Our results show that the productivity benefits of our approach are significant since our Rust-based approach helped catch bugs statically during application development, without degrading performance relative to the original C++ actor versions.
-
EZ: An efficient, charge conserving current deposition algorithm for electromagnetic particle-in-cell simulations
Klaus Steiniger, Rene Widera, Sergei Bastrakov, and 13 more authors
Computer Physics Communications, Oct 2023
We present EZ, a novel current deposition algorithm for particle-in-cell (PIC) simulations. EZ calculates the current density on the electromagnetic grid due to macro-particle motion within a time step by solving the continuity equation of electrodynamics. Being a charge conserving hybridization of Esirkepov’s method and ZigZag, we refer to it as “EZ” as shorthand for “Esirkepov meets ZigZag”. Simulations of a warm, relativistic plasma with PIConGPU show that EZ achieves the same level of charge conservation as the commonly used method by Esirkepov, yet reaches higher performance for macro-particle assignment-functions up to third-order. In addition to a detailed description of the functioning of EZ, reasons for the expected and observed performance increase are given, and guidelines for its implementation aiming at highest performance on GPUs are provided.
-
Hardware-Agnostic Interactive Exascale In Situ Visualization of Particle-In-Cell Simulations
Felix Meyer, Benjamin Hernandez, Richard Pausch, and 15 more authors
In Proceedings of the Platform for Advanced Scientific Computing Conference, Davos, Switzerland, Oct 2023
The volume of data generated by exascale simulations requires scalable tools for analysis and visualization. Due to the relatively low I/O bandwidth of modern HPC systems, it is crucial to work as close as possible with simulated data via in situ approaches. In situ visualization provides insights into simulation data and, with the help of additional interactive analysis tools, can support the scientific discovery process at an early stage. Such in situ visualization tools need to be hardware-independent given the ever-increasing hardware diversity of modern supercomputers. We present a new in situ 3D vector field visualization algorithm for particle-in-cell (PIC) simulations and performance evaluation of the solution developed at large-scale. We create a solution in a hardware-agnostic approach to support high throughput and interactive in situ processing on leadership class computing systems. To that end, we demonstrate performance portability on Summit’s and the Frontier’s pre-exascale testbed at the Oak Ridge Leadership Computing Facility.
-
Unified Co-Simulation Framework for Autonomous UAVs
Sri Ranganathan Palaniappan, Varun Pateel, Sam Jijina, and 2 more authors
In Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, Portland, OR, USA, Oct 2023
Autonomous drones (UAVs) have rapidly grown in popularity due to their form factor, agility, and ability to operate in harsh or hostile environments. Drone systems come in various form factors and configurations and operate under tight physical parameters. Further, it has been a significant challenge for architects and researchers to develop optimal drone designs as open-source simulation frameworks either lack the necessary capabilities to simulate a full drone flight stack or they are extremely tedious to set up with little or no maintenance or support. In this paper, we develop and present UniUAVSim, our fully open-source co-simulation framework capable of running software-in-the-loop (SITL) and hardware-in-the-loop (HITL) simulations concurrently. The paper also provides insights into the abstraction of a drone flight stack and details how these abstractions aid in creating a simulation framework which can accurately provide an optimal drone design given physical parameters and constraints. The framework was validated with real-world hardware and is available to the research community to aid in future architecture research for autonomous systems.
-
Observed Memory Bandwidth and Power Usage on FPGA Platforms with OneAPI and Vitis HLS: A Comparison with GPUs
Christopher M. Siefert, Stephen L. Olivier, Gwendolyn R. Voskuilen, and 1 more author
In High Performance Computing, Oct 2023
The two largest barriers to adoption of FPGA platforms for HPC applications are the difficulty of programming FPGAs and the performance gap when compared to GPUs. To address the first barrier, new ecosystems like Intel oneAPI and Xilinx Vitis HLS aim to improve programmability for FPGA platforms. From a performance aspect, FPGAs trade off lower compute frequencies for more customized hardware acceleration and power efficiency when compared to GPUs. The performance for memory-bound applications on recent GPU platforms like NVIDIA’s H100 and AMD’s MI210 has also improved due to the inclusion of high-bandwidth memories (HBM), and newer FPGA platforms are also starting to include HBM in addition to traditional DRAM.
-
Future Computing with the Rogues Gallery
Aaron Jezghani, Jeffrey Young, Will Powell, and 2 more authors
In 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Oct 2023
-
Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models
Akihiro Hayashi, Austin Adams, Jeffrey Young, and 4 more authors
In 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Oct 2023
-
Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
Wael Elwasif, William Godoy, Nick Hagerty, and 31 more authors
In Proceedings of the HPC Asia 2023 Workshops, Raffles Blvd, Singapore, Oct 2023
This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems, each one equipped with a server-class Arm CPU from Ampere Computing and two data center GPUs from NVIDIA Corp. The systems are connected together using InfiniBand interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
2022
-
“Smarter” NICs for faster molecular dynamics: a case study
S. Karamati, C. Hughes, K. Hemmert, and 6 more authors
In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Jun 2022
2021
-
Online Model Swapping for Architectural Simulation
Patrick Lavin, Jeffrey Young, Richard Vuduc, and 1 more author
In Proceedings of the 18th ACM International Conference on Computing Frontiers, Virtual Event, Italy, Jun 2021
2020
-
Co-designing OpenMP Features Using OMPT and Simulation Tools
Matthew Baker, Oscar Hernandez, and Jeffrey Young
In OpenMP: Portable Multi-Level Parallelism on Modern Systems, Jun 2020
-
Evaluating Gather and Scatter Performance on CPUs and GPUs
Patrick Lavin, Jeffrey Young, Richard Vuduc, and 3 more authors
In The International Symposium on Memory Systems, Washington, DC, USA, Jun 2020
-
Programming Strategies for Irregular Algorithms on the Emu Chick
Eric R. Hein, Srinivas Eswar, Abdurrahman Yaşar, and 7 more authors
ACM Trans. Parallel Comput., Oct 2020
-
Spatter Github
Patrick Lavin, Jeffrey Young, and Richard Vuduc
Oct 2020
2019
-
Experimental Insights from the Rogues Gallery
Jeffrey S Young, Jason Riedy, Thomas M Conte, and 3 more authors
In 2019 IEEE International Conference on Rebooting Computing (ICRC), Oct 2019
-
Linear Algebra-Based Triangle Counting via Fine-Grained Tasking on Heterogeneous Environments: (Update on Static Graph Challenge)
A. Yaşar, S. Rajamanickam, J. Berry, and 3 more authors
In 2019 IEEE High Performance Extreme Computing Conference (HPEC), Oct 2019
-
Performance Impact of Memory Channels on Sparse and Irregular Algorithms
Oded Green, James Fox, Jeff Young, and 2 more authors
In 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), Oct 2019
-
A microbenchmark characterization of the Emu chick
Jeffrey S. Young, Eric Hein, Srinivas Eswar, and 5 more authors
Parallel Computing, Oct 2019
-
Wrangling Rogues: A Case Study on Managing Experimental Post-Moore Architectures
Will Powell, Jason Riedy, Jeffrey S. Young, and 1 more author
In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), Chicago, IL, USA, Oct 2019
-
Spatter: A Customizable Scatter/Gather Benchmark
Patrick Lavin, Jason Riedy, Richard Vuduc, and 1 more author
Oct 2019
http://spatter.io/
-
Rogues Gallery Public Gitlab Page
online, Oct 2019
https://crnch-rg.gitlab.io/rg/
-
Programming Novel Architectures in the Post-Moore Era with the Rogues Gallery
E. Jason Riedy and Jeffrey S. Young
In Practice and Experience in Advanced Research Computing (PEARC), Jul 2019
https://crnch-rg.gitlab.io/pearc-2019/
-
Programming Novel Architectures in the Post-Moore Era with The Rogues Gallery
E. Jason Riedy and Jeffrey S. Young
In 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Apr 2019
https://crnch-rg.gitlab.io/asplos-2019/
2018
-
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
R. Hadidi, B. Asgari, J. Young, and 4 more authors
In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr 2018
-
An Energy-Efficient Single-Source Shortest Path Algorithm
Sara Karamati, Jeffrey Young, and Richard Vuduc
In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Apr 2018
-
An Initial Characterization of the Emu Chick
Eric Hein, Tom Conte, Jeffrey Young, and 5 more authors
In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Apr 2018
-
2018 Neuromorphic Workshop
Jennifer Hasler and Jeffrey Young
Apr 2018
http://crnch.gatech.edu/neuro-workshop18
2017
-
Evaluating Hybrid Memory Cube Infrastructure to Support High-Performance Sparse Algorithms
Kartikay Garg and Jeffrey Young
In Proceedings of the International Symposium on Memory Systems, Alexandria, Virginia, Apr 2017
2016
-
Optimizing communication for a 2D-partitioned scalable BFS
Jeffrey Young, Julian Romera, Matthias Hauck, and 1 more author
In 2016 IEEE High Performance Extreme Computing Conference (HPEC), Apr 2016
-
GPUShare: Fair-Sharing Middleware for GPU Clouds
Anshuman Goswami, Jeffrey Young, Karsten Schwan, and 4 more authors
In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Apr 2016
-
Landrush: Rethinking In-Situ Analysis for GPGPU Workflows
Anshuman Goswami, Yuan Tian, Karsten Schwan, and 5 more authors
In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Apr 2016
-
GraphIn: An Online High Performance Incremental Graph Processing Framework
Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, and 4 more authors
In Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, Apr 2016
2015
-
Examining Recent Many-core Architectures and Programming Models Using SHOC
M. Graham Lopez, Jeffrey Young, Jeremy S. Meredith, and 3 more authors
In Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, Austin, Texas, Apr 2015