
NOTE: The program is frequently updated, and all dates, times, and room information are subject to change.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Murali Emani, Gokcen Kestor, Dong Li
Abstract: Artificial Intelligence (AI) and Machine Learning (ML) are rapidly reshaping scientific discovery, enabling breakthroughs in climate prediction, materials design, astrophysics, drug development, and large-scale simulations. AI for Science (AI4S) seeks to accelerate innovation by integrating advanced learning methods into scientific workflows, yet key challenges persist. These include reliably and automatically applying AI/ML to complex scientific applications, incorporating domain knowledge such as physical constraints and symmetries, improving model robustness and interpretability for high-performance computing (HPC), refining foundation models for scientific use, and reducing the energy cost of large-scale training. As AI becomes increasingly central to scientific computing and influences HPC architectures and methodologies, coordinated efforts across disciplines are essential.
The AI4S workshop provides a forum for experts from academia, industry, and government to address these challenges and highlight emerging opportunities in AI-driven science. Through plenary talks, peer-reviewed papers, keynotes, and panel discussions, the workshop fosters collaboration between AI researchers, domain scientists, and HPC practitioners. Reflecting strong engagement in the supercomputing community—as demonstrated by record participation at prior events—the workshop aims to shape future AI research directions, advance integration of AI with HPC systems, and catalyze the next generation of AI-enabled scientific discoveries.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Kazuya Yamazaki, Jack Wells, Jeff Larkin
Abstract: This workshop brings together OpenACC users from national laboratories, universities, industry, and other research institutions to exchange information and share the OpenACC programming model's uses in various science domains. A directive-based and performance-portable parallel programming model designed to program many types of accelerators, OpenACC is compatible with the C, C++, and Fortran programming languages. OpenACC simplifies the process of porting codes from host devices to various high-performance computing accelerators, significantly reducing the time and effort scientists and engineers spend programming. One may also think of OpenACC as user-specified directives that fill gaps in the standard programming languages. As standard parallel programming languages mature, one can expect program developers to have fewer gaps to fill. This vision is consistent with greater portability and sustainability of scientific programming over time. Indeed, features pioneered in OpenACC are informing the development work of official ISO standard programming models. We are planning presentations from users of OpenACC and panel discussions of topics of interest to the community, highlighting forefront research applications in a variety of research domains and discussions of the future evolution of the OpenACC standard.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Ikko Hamamura, Ryousei Takano, Pascal Jahan Elahi, Tommaso Macrì, Nan-yow Chen, Yun-Yuan (Pika) Wang
Abstract: TBA
Mon, January 26, 2026 9:30 - 16:30
Contributors: Michael Ott, Ayesha Afzal, Fumiyoshi Shoji, Natalie Bates
Abstract: As the performance, power, and heat density of supercomputers continue to grow — driven by the integration of high-power heterogeneous components such as multi-core CPUs, GPUs, high-bandwidth memory, and high-bandwidth interconnects — coordinated strategies across facilities, utilities, HPC systems, and applications are required to manage energy use, reduce environmental impact, and ensure long-term operational viability.
The Energy Efficient HPC State of the Practice workshop will focus on the operational, infrastructural, and environmental challenges of deploying and managing modern high-performance computing systems. The primary objective of this workshop is to capture and disseminate best practices, case studies, and reproducible operational experiences from HPC centers, facilities, and vendors worldwide.
While energy efficiency has long been recognized as a critical constraint, sustainability metrics such as greenhouse gas (GHG) emissions, embodied carbon, and water consumption are now also coming into focus. This workshop will explore how to address these challenges across the full lifecycle of HPC systems — from design and manufacturing through daily operations, reuse, and decommissioning.
This year’s workshop also broadens its lens to consider AI infrastructure, which increasingly mirrors HPC in system architecture and operational demands. There are lessons to be learned from the HPC community that should help with the operation of Megawatt-scale AI racks, warm-water cooling, and hyperscale deployments. The convergence of these domains presents an opportunity to align practices, metrics, and innovations in service of a shared future where performance and sustainability must coexist.
Website: https://sites.google.com/lbl.gov/energy-efficient-hpc-sop-works/home
Call for Papers: https://sites.google.com/lbl.gov/energy-efficient-hpc-sop-works/call-for-papers
Mon, January 26, 2026 9:30 - 16:30
Contributors: Mehmet E. Belviranli, Seyong Lee, Keita Teranishi, Ali Akoglu
Abstract: Recent trends toward the end of Dennard scaling and Moore’s law make current and upcoming computing systems more specialized and complex, consisting of increasingly heterogeneous architectures in terms of processors, memory hierarchies, on-chip interconnection networks, storage, etc. This trend first materialized in the mobile and embedded market, and it is now entering the enterprise, cloud computing, and high-performance computing markets.
RSDHA is a forum for researchers and engineers in both the HPC and embedded/mobile computing domains to gather and discuss (1) the latest ideas and lessons learned from previous experience with traditional (i.e., horizontal) and heterogeneity-based (i.e., vertical) scaling in their own domains, and (2) possible synergistic approaches and interactions between the two domains to find a good balance between programmability and performance portability across the diverse range of heterogeneous systems from mobile to HPC.
Website: https://hpss.mines.edu/rsdha-26/
Call for Papers: https://hpss.mines.edu/rsdha-26/cfp.html
This workshop has been cancelled due to unavoidable circumstances.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Benoit Martin
Abstract: As HPC simulations scale toward exascale, I/O becomes a critical bottleneck due to the growing disparity between compute performance and storage bandwidth. Traditional post-hoc output models struggle with the volume and velocity of generated data. The Parallel Data Interface (PDI) offers a lightweight and flexible solution by decoupling I/O, filtering, and analysis logic from the simulation code. Through a declarative configuration system and a plugin-based architecture, PDI enables simulation developers to expose data buffers and trigger events without embedding I/O decisions directly into their application. PDI offers a simple API in C/C++, Fortran, and Python.
This full-day tutorial introduces PDI and its ecosystem of plugins: sequential and parallel HDF5 for file output, user-code and pycall for in-process custom logic execution, DEISA for in situ analytics, and Catalyst for live ParaView-based visualization. Through a combination of theoretical lectures and hands-on exercises, participants will learn how to instrument a simulation code with PDI, configure it declaratively via YAML, and drive complex I/O and visualization workflows without modifying the simulation itself.
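To give a flavor of the declarative approach, a PDI-style spec might look roughly like the following. This is a hypothetical sketch: the field names are invented, and the exact schema keys may differ from what the tutorial presents, so consult the PDI documentation for the real syntax.

```yaml
# Illustrative sketch of a PDI-style declarative spec (names are examples)
metadata:
  nx: int                       # array extent exposed by the simulation
data:
  temperature:
    type: array
    subtype: double
    size: [$nx]                 # sizes can reference exposed metadata
plugins:
  decl_hdf5:
  - file: output.h5
    on_event: [checkpoint]      # write when the code triggers this event
    write: [temperature]
```

The simulation only exposes `temperature` and triggers a `checkpoint` event; where and how the data is written is decided entirely in the spec.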
Attendees will leave with a practical understanding of how to adopt PDI in their own projects to create modular data workflows suited for HPC.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Christian Trott
Abstract: This tutorial provides an introduction to Kokkos, a C++ programming model designed for application performance portability across diverse computing architectures. As modern high-performance computing (HPC) increasingly relies on heterogeneous systems featuring GPUs, multicore CPUs, and other accelerators, developers face the challenge of writing code that efficiently utilizes these varied hardware resources without developing and maintaining multiple variants of the software. Kokkos addresses this by offering a single-source approach, allowing users to write code once and compile it for optimal execution on a wide range of platforms. Kokkos is an Open Source project under the Linux Foundation’s “High Performance Software Foundation" (https://hpsf.io).
We'll start by exploring the fundamental concepts of Kokkos, including memory spaces and execution spaces, which are crucial for managing data placement and task execution on different devices. You'll learn about Kokkos::parallel_for for launching parallel computations and Kokkos::View for managing data arrays efficiently on various memory architectures. Through hands-on examples, we'll demonstrate how to port simple computational kernels to Kokkos, highlighting the benefits of its abstraction layers. By the end of this tutorial, beginners will have a solid foundation for developing performance-portable applications with Kokkos, enabling them to leverage the full power of modern HPC systems. No prior experience with parallel programming models like CUDA or OpenMP is required, though basic C++ knowledge is assumed.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Anshu Dubey
Abstract: Producing scientific software is a challenge. The high-performance modeling and simulation community, in particular, faces the confluence of disruptive changes in computing architectures and new opportunities (and demands) for greatly improved simulation capabilities, especially through coupling physics and scales. Simultaneously, computational science and engineering (CSE), as well as other areas of science, are experiencing an increasing focus on scientific reproducibility and software quality. Large language models (LLMs) can significantly increase developer productivity through judicious off-loading of tasks. However, models can hallucinate; it is therefore important to have a good methodology to get the most benefit out of this approach.
We propose a tutorial in which attendees will learn about practices, processes, and tools to improve the productivity of those who develop CSE software, increase the sustainability of software artifacts, and enhance trustworthiness in their use. We will focus on aspects of scientific software development that are not adequately addressed by resources developed for industrial software engineering. We will additionally impart state-of-the-art approaches for using LLMs to enhance developer productivity in the context of scientific software development and maintenance. Topics include the design, test-driven development, refactoring, code translation and testing of complex scientific software systems; and conducting computational experiments with reproducibility built in.
LLM assistance for coding-related tasks is particularly important to address in any software productivity effort, given its potential to change the way development is done. Obtaining this assistance for developing research software is particularly challenging because of limited training data. We have developed methodologies and tools for software development and translation that use LLMs. The use of these tools and methodologies for hands-on activities will be a part of this tutorial.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Murali Emani
Abstract: Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to advance science. The scientific community is taking advantage of specialized hardware accelerators to push the limits of scientific discovery beyond what is possible with traditional processors like CPUs and GPUs, as demonstrated by recent ACM Gordon Bell Prize recipients. The AI accelerator landscape can be daunting for the broader scientific community, particularly for those who are just beginning to engage with these systems. The wide diversity in hardware architectures and software stacks makes it challenging to understand the differences between these accelerators, their capabilities, programming approaches, and performance. In this tutorial, we will cover an overview of the AI accelerators available for allocation at the Argonne Leadership Computing Facility (ALCF): SambaNova, Cerebras, Graphcore, and Groq, focusing on their architectural features and software stacks, including chip and system design, memory architecture, precision handling, programming models, and software development kits (SDKs). Through hands-on exercises, attendees will gain practical experience in refactoring code and running models on these systems, focusing on use cases of pre-training and fine-tuning open-source Large Language Models (LLMs) and deploying AI inference solutions relevant to scientific contexts. Additionally, the sessions will cover the low-level HPC software stack of these accelerators using simple HPC kernels. By the end of this tutorial, participants will have a solid understanding of the key capabilities of emerging AI accelerators and their performance implications for scientific applications, equipping them with the knowledge to leverage these technologies effectively in their research endeavors.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Samuel Rodriguez
Abstract: While libraries exist for most commonly used mathematical operations, developers often need to write custom kernels for functionality not covered by high-level APIs. A great example is "fusing" multiple operations into a single kernel, a well-established optimization strategy that reduces kernel launch overhead and can increase arithmetic intensity and minimize global memory traffic. These benefits are particularly impactful in performance-critical applications, such as attention mechanisms in large language models. Math Device Extension (MathDx) libraries were developed to enable the rapid development of such high-performance GPU kernels.
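The payoff of fusion can be sketched in plain Python (illustrative only, no GPU involved): the unfused version makes two passes over the data and materializes a temporary array, while the fused version performs a single multiply-add pass — the same reasoning that motivates fused GPU kernels.

```python
def unfused(a, b, c):
    # pass 1: elementwise multiply, materializing a temporary array
    tmp = [x * y for x, y in zip(a, b)]
    # pass 2: elementwise add, a second traversal of the data
    return [t + z for t, z in zip(tmp, c)]

def fused(a, b, c):
    # one pass: a multiply-add per element, no temporary array
    return [x * y + z for x, y, z in zip(a, b, c)]

a, b, c = [1.0, 2.0], [3.0, 4.0], [0.5, 0.5]
assert unfused(a, b, c) == fused(a, b, c)  # both give [3.5, 8.5]
```

On a GPU the fused form additionally saves a kernel launch and a round trip of `tmp` through global memory, which is where the real performance benefit comes from.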
In this tutorial, we introduce the Dx ecosystem to the GPU kernel developer community, highlighting its use in both C++ and Python and how it makes it easy to write fast, portable GPU code. Python support is enabled via nvmath-python, which brings the power of Dx libraries to data scientists and research developers without sacrificing performance.
To ground these concepts in a meaningful application, we use floating-point emulation via the Ozaki-I scheme as a hands-on case study. Attendees will learn how to implement high-precision FP64 matrix multiplication on low-precision integer Tensor Cores using the cuBLASDx library and achieve near-cuBLAS performance. This will be accomplished through dedicated access to a GPU that has a very high INT8 to FP64 tensor core throughput ratio (e.g., Ada or Blackwell). Attendees will have the opportunity to choose to do this either in C++ or in Python. This complex algorithm serves as a compelling example of how Dx libraries enable the rapid development of high-performance GPU kernels across both domains. If time permits, there will be a second exercise utilizing cuFFTDx.
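The Ozaki scheme itself is more involved, but its core building block — splitting operands so that partial products are exact in working precision — can be sketched in plain Python with a classic Veltkamp/Dekker split. This is an illustration of the underlying exact-splitting idea only, not the tutorial's cuBLASDx implementation, which slices whole matrices for integer Tensor Cores.

```python
from fractions import Fraction

def split(x):
    # Veltkamp split with the standard double-precision splitter 2**27 + 1:
    # returns (hi, lo) with x == hi + lo, each half having a short mantissa
    c = (2.0**27 + 1.0) * x
    hi = c - (c - x)
    return hi, x - hi

def two_product(a, b):
    # Dekker's TwoProduct: returns (p, err) with p + err == a*b EXACTLY,
    # because every partial product of the split halves is exact in FP64
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err

a, b = 1.0 / 3.0, 7.1
p, err = two_product(a, b)
# exact check via rationals: the rounded product plus the correction term
# reproduces the mathematically exact product of the two doubles
assert Fraction(p) + Fraction(err) == Fraction(a) * Fraction(b)
```

Schemes like Ozaki-I generalize this idea: operands are sliced into low-precision pieces whose pairwise products are computed exactly on fast low-precision units and then accumulated to recover full FP64 accuracy.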
Whether you’re an HPC practitioner looking to push the boundaries of performance or a developer seeking easier, more expressive GPU programming workflows, this tutorial will increase your productivity in writing high-performance GPU code.
Mon, January 26, 2026 9:30 - 16:30
Contributors: Todd Gamblin
Abstract: Modern scientific software stacks rely on thousands of packages, from low-level libraries in C, C++, and Fortran to higher-level tools in Python and R. Scientists must deploy these stacks across diverse environments, from personal laptops to supercomputers, while tailoring workflows to specific tasks. Development workflows often require frequent rebuilds, debugging, and small-scale testing for rapid iteration. In contrast, preparing applications for large-scale HPC production involves performance-critical libraries (e.g., MPI, BLAS, LAPACK) and machine-specific optimizations to maximize efficiency.
Managing these varied requirements is challenging. Configuring software, resolving dependencies, and ensuring compatibility can hinder both development and deployment. Spack is an open-source package manager that simplifies building, installing, and customizing HPC software stacks. It offers a flexible dependency model, Python-based syntax for package recipes, and a repository of over 8,500 packages maintained by over 1,500 contributors. Spack is widely adopted by researchers, developers, cloud platforms, and HPC centers worldwide.
This tutorial introduces Spack's core capabilities, including installing, managing, and authoring packages, configuring environments, and deploying optimized software on HPC systems. The tutorial is divided into two halves: the first with introductory topics and the second with advanced workflows for developers, package maintainers, and facility staff. The format is interactive; presenters will work through live demos, which attendees can also work through in live cloud instances. Attendees will gain foundational skills for automating routine tasks and acquire advanced knowledge to address complex use cases with Spack.
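As a taste of the environment workflow covered in the tutorial, a minimal `spack.yaml` environment file looks like the following (the package specs here are examples, not the ones used in the tutorial's live demos):

```yaml
# spack.yaml — a minimal Spack environment
spack:
  # abstract specs to install into this environment
  specs:
  - hdf5 +mpi
  - py-numpy
  # link the installed packages into a single view directory
  view: true
```

Activating the environment and running `spack install` then concretizes and builds the listed specs with a consistent dependency graph.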
Mon, January 26, 2026 9:30 - 12:30
Contributors: Miwako Tsuji, Filippo Spiga
Abstract: This workshop aims to provide an opportunity to share the practice and experience of high-performance computing systems using the Arm architecture, along with their performance and applications.
The last few years have seen an explosion of 64-bit Arm-based processors targeted toward server and infrastructure workloads, often specializing in a specific domain such as HPC, cloud, and machine learning. A wide variety of Arm-based processors, such as Fujitsu A64FX, AWS Graviton, Microsoft Cobalt, Google Axion, Huawei Kunpeng, and NVIDIA Grace, are available. More will come online in 2027.
Sharing the practice and experiences using these Arm-based processors will contribute to advancing high-performance computing technology for newly designed systems using these new emerging Arm-based processors.
In this workshop, we invite papers on the practice and experience of Arm-based high-performance computing systems and, where available, on the optimization and performance analysis of high-performance workloads on Arm-based processors. We welcome performance optimization studies conducted either through access to real hardware or via simulation/emulation frameworks.
The topics include, but are not limited to:
Website: https://iwahpce.github.io/
Call for Papers: Please see the website for details
Mon, January 26, 2026 9:30 - 12:30
Contributors: Neda Ebrahimi Pour, Sabine Roller, Ryoji Takaki
Abstract: The objective of this workshop is to provide a forum for the presentation and discussion of advanced numerical simulation techniques for complex multi-scale, multi-physics, coupled problems and AI enhanced simulations on high performance computing (HPC) systems.
Applications with different characteristics in parts of the computational domain can lead to unexpected performance issues. The optimum setting for one part might contradict the optimum for another; the overall optimum might then be suboptimal for each individual part, yet still a satisfactory compromise for researchers.
A variety of methodologies have been employed during the development of individual solutions, contingent upon the specific application and the underlying hardware configuration. In terms of the application, for example, machine learning algorithms have been introduced for predicting simulation results. With regard to the hardware, both homogeneous and heterogeneous cluster settings have been considered. All combinations have advantages and disadvantages, leading to the following question: how can one find the optimal configuration and setting of all parameters with respect to quality of solution vs. computational efficiency?
Call for Papers: Please see the website for details
Mon, January 26, 2026 9:30 - 12:30
Contributors: Kai Watanabe
Abstract: TBA
Mon, January 26, 2026 9:30 - 12:30
Contributors: Antigoni Georgiadou, Tiernan Casey, Tushar Athawale
Abstract: The EPSOUQ-HPC workshop aims to close what we see as a key gap in the HPC/supercomputing technical program, i.e., a forum for deep discussion of topics at the interface of uncertainty quantification for predictive simulation and HPC. Our workshop welcomes technical contributions in all areas at the nexus of uncertainty quantification, optimization, modeling & simulation, and high-performance computing, while also targeting specific themes where these concepts are critical to technical success. In particular, we are targeting digital twins, smart computing and weather simulation, and their interaction with generative AI models, including language and image generation, as themes where these nexus topics are of critical importance for developing, analyzing, and interpreting predictions. This year we are also expanding the UQ pillar to include research in uncertainty visualization, which is an important tool in enabling human interpretation of the outputs of probabilistic simulations.
Mon, January 26, 2026 9:30 - 12:30
Contributors: Sadaf R. Alam, Maxime Martinasso, Alex Lovell-Troy, François Tessier, David Hancock, Winona Snapp-Childs
Abstract: Exascale computing initiatives are expected to enable breakthroughs for multiple scientific disciplines. Increasingly these systems may utilize cloud technologies, enabling complex and distributed workflows that can improve not only scientific productivity, but also accessibility of resources to a wide range of communities. Such an integrated and seamlessly orchestrated system for supercomputing and cloud technologies is indispensable for experimental facilities that have been experiencing unprecedented data growth rates. While a subset of high performance computing (HPC) services has been available within public cloud environments, petascale and beyond data and computing capabilities are largely provisioned within HPC data centres using traditional, bare-metal provisioning services to ensure performance, scaling, and cost efficiencies. At the same time, on-demand and interactive provisioning of services that are commonplace in cloud environments remains elusive for leading supercomputing ecosystems.

This workshop aims at bringing together a group of experts and practitioners from academia, national laboratories, and industry to discuss technologies, use cases, and best practices in order to set a vision and direction for leveraging high performance, extreme-scale computing and on-demand cloud ecosystems. Topics of interest include tools and technologies enabling scientists to adapt scientific applications to cloud interfaces; interoperability of HPC and cloud resource management and scheduling systems; cloud and HPC storage convergence to allow a high degree of flexibility for users and community platform developers; continuous integration/deployment approaches; reproducibility of scientific workflows in distributed environments; and best practices for enabling the X-as-a-Service model at scale while maintaining a range of security constraints.
Mon, January 26, 2026 9:30 - 12:30
Contributors: Jesus Carretero, Martin Schulz, Estela Suarez
Abstract: The HPCMALL 2026 workshop will bring together researchers from diverse areas of HPC that are impacted by or actively pursuing malleability concepts, including application developers, system architects, programming model researchers, and system software researchers. In addition to high-quality, refereed publications and talks, the workshop will provide a lively discussion forum for researchers working in HPC and pursuing the concepts of and around malleability, to reflect on the advances achieved in the field since the previous editions of this workshop.
Website: https://coco-arcos.github.io/HPCMALL2026/
Call for Papers: Please see the website for details
Mon, January 26, 2026 9:30 - 12:30
Contributors: Makoto Taiji, Geetika Gupta
Abstract: This workshop aims to catalyze progress at the intersection of artificial intelligence and the natural sciences by highlighting the methodological challenges and opportunities that arise when applying AI techniques to domain-specific problems in the life, material, and physical sciences and related fields. Despite recent successes, the integration of AI into scientific workflows remains non-trivial due to domain constraints such as limited labeled data, complex simulation-based environments, and the need for interpretability and physical consistency. The workshop provides a showcase of recent successful developments of AI models for science and discussions of future directions.
Website: https://wahibium.github.io/advancing-science-through-ai/
Mon, January 26, 2026 9:30 - 12:30
Contributors: Nick Brown, Enrique S. Quintana-Ortí, Sandra Catalán
Abstract: The goal of this workshop is to continue building the community of RISC-V in HPC, sharing the benefits of this technology with domain scientists, tool developers, and supercomputer operators. RISC-V is an open standard Instruction Set Architecture (ISA) which enables the royalty-free development of CPUs and a common software ecosystem to be shared across them. Following this community-driven ISA standard, a very diverse set of CPUs have been, and continue to be, developed which are suited to a range of workloads. Whilst RISC-V has already become very popular in some fields (the ten billionth RISC-V core shipped in 2022), it has yet to gain traction in HPC.
Website: https://riscv.epcc.ed.ac.uk/community/workshops/hpcasia26-workshop/
Mon, January 26, 2026 9:30 - 12:30
Contributors: Sebastian Stern
Abstract: Classical Quantum Monte Carlo (QMC) methods leverage high-performance computing (HPC) resources to simulate complex quantum many-body systems. Recently, these methods have been extended to quantum computers (QC) in the hope of achieving better accuracy. At the same time, architectures are being developed that enable such hybrid workflows by integrating quantum and HPC resources, often hosted at different locations.
In this tutorial, we demonstrate a solution to an exemplary quantum many-body problem integrating distributed classical and quantum computing systems in the cloud. Specifically, we build an end-to-end workflow to execute the subroutines of a QMC algorithm on cloud-based batch and quantum computing resources and estimate the ground state energy of the example problem Hamiltonian.
The tutorial introduces QMC and QC basics to the participants and enables them to utilize cloud-native HPC and QC technologies for hybrid workloads. During the tutorial, participants will get free access to temporary AWS accounts and can follow along the guided steps in the QMC workflow. All attendees leave with code examples they can use as a foundation for their own projects.
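For a flavor of the classical side of such workflows, here is a minimal, purely illustrative variational Monte Carlo estimate of the ground-state energy of a 1-D harmonic oscillator in plain Python; the tutorial's cloud-based QMC workflow is of course far more elaborate.

```python
import math
import random

def vmc_energy(alpha, steps=20000, seed=1):
    """Metropolis sampling of |psi|^2 for the trial state psi(x) = exp(-alpha x^2);
    averages the local energy E_L(x) = alpha + (1/2 - 2 alpha^2) x^2 of the
    1-D harmonic oscillator H = -1/2 d^2/dx^2 + x^2/2 (hbar = m = omega = 1)."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(steps):
        x_new = x + rng.uniform(-1.0, 1.0)
        # accept the move with probability |psi(x_new)|^2 / |psi(x)|^2
        if rng.random() < math.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        total += alpha + (0.5 - 2.0 * alpha**2) * x**2
    return total / steps

# the exact variational energy is alpha/2 + 1/(8 alpha), minimized at alpha = 0.5
# where the trial state is the true ground state with energy exactly 0.5
print(vmc_energy(0.5))   # 0.5 (the local energy is constant for this alpha)
```

Trying other values of `alpha` (e.g. 0.4, where the exact variational energy is 0.5125) shows the estimate rising above the true ground-state energy, as the variational principle guarantees.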
Mon, January 26, 2026 9:30 - 12:30
Contributors: Dhabaleswar K. Panda
Abstract: High-performance networking technologies are generating a lot of excitement towards building next-generation High-End Computing (HEC) systems for HPC and AI with GPGPUs, accelerators, Data Center Processing Units (DPUs), and a variety of application workloads. This tutorial will provide an overview of these emerging technologies, their architectural features, current market standing, and suitability for designing HEC systems. It will start with a brief overview of the IB, HSE, RoCE, and Omni-Path interconnects, followed by an in-depth look at their architectural features and an overview of the emerging NVLink, NVLink2, NVSwitch, EFA, Slingshot, and Tofu-D architectures. We will then present advanced features of commodity high-performance networks that enable performance and scalability, followed by an overview of enhanced offload-capable network adapters like DPUs/IPUs (SmartNICs), their capabilities and features. Next, we will survey software stacks for high-performance networks, such as OpenFabrics Verbs, LibFabrics, and UCX, comparing their performance. Finally, challenges in designing MPI libraries for these interconnects, solutions, and sample performance numbers will be presented.
Website: https://nowlab.cse.ohio-state.edu/tutorials/SCA-HPCAsia-2026_hpn/
Mon, January 26, 2026 9:30 - 12:30
Contributors: Matthew Treinish
Abstract: Quantum computing is an emerging technology which has the potential to solve some problems which are intractable for even the largest traditional supercomputer. By leveraging quantum mechanical phenomena to perform computation, it can offer exponential speedups for certain classes of problems. In recent years strategies have emerged for combining HPC systems with quantum computers that leverage the unique strengths of both computational models. The combination of high-performance computing with quantum computing opens up the possibility for quantum computers to reach their full potential.

This tutorial aims to provide an introduction to quantum computing for attendees. It will provide an overview of quantum information theory, how to use quantum computers, and the typical workflow when using a quantum computer. Building off that base knowledge, the tutorial will explore different programming patterns which are compatible with typical HPC workflows. It will specifically focus on using the Qiskit open source SDK and demonstrate how you can use it to program quantum computers. This will include real-world examples demonstrating hybrid HPC and quantum computing workflows.
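To make the quantum mechanical phenomena concrete without requiring Qiskit or quantum hardware, the canonical Bell-state circuit (Hadamard on qubit 0, then CNOT) can be simulated on a two-qubit statevector in a few lines of plain Python. This is an illustrative sketch only; the tutorial itself works with Qiskit.

```python
import math

# two-qubit statevector over the basis |00>, |01>, |10>, |11>, starting in |00>
state = [1.0, 0.0, 0.0, 0.0]

def apply_h_q0(s):
    """Hadamard on qubit 0 (the left / most-significant qubit)."""
    r = 1.0 / math.sqrt(2.0)
    return [r * (s[0] + s[2]), r * (s[1] + s[3]),
            r * (s[0] - s[2]), r * (s[1] - s[3])]

def apply_cnot(s):
    """CNOT with qubit 0 as control: swaps the |10> and |11> amplitudes."""
    return [s[0], s[1], s[3], s[2]]

state = apply_cnot(apply_h_q0(state))
# Bell state (|00> + |11>)/sqrt(2): the qubits are now entangled, so
# measuring either one immediately fixes the outcome of the other
probs = [a * a for a in state]
assert abs(probs[0] - 0.5) < 1e-12 and abs(probs[3] - 0.5) < 1e-12
assert probs[1] == 0.0 and probs[2] == 0.0
```

The exponential cost of extending such a statevector to many qubits (2^n amplitudes) is precisely why classical simulation runs out of steam and real quantum hardware becomes interesting.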
Mon, January 26, 2026 13:30 - 16:30
Contributors: Tomohiro Ueno, Sheng Di
Abstract: In addition to traditional applications, the rise of AI and cloud computing has significantly increased the volume of data processing and communication required in high-performance computing (HPC). Efficient data analytics and data movement across distributed and parallel environments (e.g., the Internet, inter-node networks, and system interconnects) have become critical factors in determining the performance and energy efficiency of supercomputers, data centers, and cloud platforms.

This workshop aims to address key research challenges related to big data from multiple perspectives, including data exploration, data compression, and big data systems. To tackle these challenges, the workshop will explore practical and effective approaches to data analytics and mining, big data visualization, data integration, scalable data compression, and storage/processing systems for big data. These investigations will consider both the characteristics of large-scale data workloads and the constraints of modern hardware architectures. In particular, the workshop will emphasize optimization strategies for big data processing, adaptive and general-purpose compression techniques, and high-performance systems designed for high-throughput, low-latency, and hardware-efficient data operations.
Website: https://sites.google.com/view/bdxcs2026/home
Call for Papers: https://drive.google.com/file/d/1DNpeDyVSZUZoMU7huAySe6yiUTBL3bEE/view?pli=1
Mon, January 26, 2026 13:30 - 16:30
Contributors: James Lin, Filippo Spiga
Abstract: TBA
Mon, January 26, 2026 13:30 - 16:30
Contributors: Barton Fiske, Simon See, Akihiro Kishimoto
Abstract: Scientific visualization is undergoing a transformative era, driven by the explosion of data volumes, the advent of exascale computing, and the integration of artificial intelligence and machine learning. As simulations and applications of smart cities generate increasingly complex and massive datasets, the ability to analyze, interpret, and communicate these data visually is more critical than ever. Emerging trends include the use of visualization to minimize I/O bottlenecks, the development of scalable rendering techniques for handling billion-element datasets, and the growing role of interactive and immersive visualization environments for collaborative scientific and smart-city discovery. Looking ahead, scientific visualization and digital twins are poised to become even more central to research and urban workflows, enabling real-time insights and supporting decision-making in smart city planning. The future of scientific visualization and digital twins lies in their ability to adapt to new computational paradigms and to empower researchers, city planners, and decision-makers with intuitive, powerful tools for exploring and understanding the unseen in smart cities.
This workshop aims to foster collaboration among visualization and digital twin experts, domain scientists, and technology developers to advance the state of the art in scientific visualization and address the challenges posed by increasingly large and complex datasets in smart city applications. By sharing best practices, innovative tools, and real-world experiences, the workshop seeks to catalyze new research directions and practical solutions that enhance the accessibility and impact of visualization across urban environments. The workshop aims to empower scientists and smart city stakeholders to communicate their findings more effectively and to enable real-time visualization for urban development. Ultimately, the workshop will help bridge the gap between advanced scientific computing, visualization, and smart city needs, ensuring that visualization remains a powerful force for discovery, understanding, and urban transformation.
Website: https://sites.google.com/view/svsc-workshop-scahpcasia-26
Mon, January 26, 2026 13:30 - 16:30
Contributors: Simon Garcia de Gonzalo, Mohammad Alaul Haque Monil, Norihisa Fujita
Abstract: While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as specialized hardware, heterogeneous cores, deep memory hierarchies, and near-memory processing, have emerged as possible solutions to address the concerns of energy efficiency, manufacturability, and cost. However, we expect this ‘golden age’ of architectural change to lead to extreme heterogeneity and to have a major impact on software systems and applications. In this upcoming exascale and extreme heterogeneity era, it will be critical to explore new software approaches that enable us to effectively exploit this diverse hardware to advance science. Next-generation systems with heterogeneous elements will also need to accommodate complex workflows, mainly because of the many forms of heterogeneous accelerators (no longer just GPU accelerators) in this era and the need to map different parts of an application onto the elements most appropriate for that application component. In addition, this year we acknowledge the increasing need for Co-Design. This topic will explore the methodologies, challenges, and opportunities in the co-design of hardware, software, and applications to achieve optimal performance, power efficiency, and productivity in the era of extreme heterogeneity.
Website: https://ornl.github.io/events/exhet2026/
Call for Papers: Please see the website for details
Mon, January 26, 2026 13:30 - 16:30
Contributors: Toshihiro Hanawa
Abstract: The IXPUG Workshop at HPC Asia 2026 is an open workshop on high-performance computing applications, systems, and architecture with Intel technologies. This is a half-day workshop with invited talks and contributed papers. The workshop aims to bring together software developers and technology experts to share challenges, experiences, and best-practice methods for the optimization of HPC, Machine Learning, and Data Analytics workloads. Research on any aspect of Intel HPC products is welcome in this workshop.
Website: https://www.ixpug.org/events/ixpug-hpc-asia-2026
Call for Papers: Please see the website for details
Mon, January 26, 2026 13:30 - 16:30
Contributors: Jong Choi, Masaaki Kondo, Shruti Kulkarni, Seung-Hwan Lim, Tong Shu, Elaine Wong
Abstract: With the recent advancements in artificial intelligence, deep learning systems and applications have become a driving force in multiple transdisciplinary domains. This evolution has been supported by the rapid improvements of advanced processor, accelerator, memory, storage, interconnect and system architectures, including architectures based on future and emerging hardware (e.g., quantum, superconducting, photonic, neuromorphic). However, existing research has treated hardware accelerators, deep learning systems, and applications separately; the co-design among them remains largely underexplored. To develop high-performance deep learning systems on advanced accelerators, our workshop will focus on the following three important topics:
Website: https://shda-workshop.github.io/
Call for Papers: https://shda-workshop.github.io/html/call4papers.html
Mon, January 26, 2026 13:30 - 16:30
Contributors: Chen Wang
Abstract: As HPC applications grow in complexity and scale, I/O (Input/Output) performance remains a persistent bottleneck. Many modern workloads, including coupled simulations, AI integration, and in-situ analytics, generate and consume large volumes of data that stress shared parallel file systems. This tutorial introduces practical techniques and tools to accelerate application I/O using fast, node-local storage, with a focus on two open-source solutions: UnifyFS and DYAD.
UnifyFS is a user-level file system that provides a shared namespace backed by node-local storage, enabling scalable, high-throughput I/O for write-heavy workloads. DYAD complements this by intelligently managing the data flow of dependent workflow components (e.g., simulation and analysis) to improve data locality for read-heavy workloads. Together, these systems offer a powerful approach to tackling I/O challenges without requiring major changes to application code.
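As a rough illustration of the node-local pattern these tools target, the sketch below writes per-rank checkpoint files under a single namespace prefix. The prefix path and the sequential "ranks" here are hypothetical stand-ins; in a real UnifyFS deployment the application issues the same POSIX calls under the UnifyFS namespace, and the file system backs them with node-local storage while keeping them visible to all ranks.

```python
import os
import tempfile

# Hypothetical stand-in for a UnifyFS mount point; a real run would use
# the UnifyFS namespace prefix configured for the job.
PREFIX = tempfile.mkdtemp()

def write_checkpoint(rank: int, data: bytes) -> str:
    """Write one per-rank checkpoint file under the shared namespace."""
    path = os.path.join(PREFIX, f"ckpt.{rank:04d}")
    with open(path, "wb") as f:
        f.write(data)
    return path

# four "ranks" each write a 5 KiB checkpoint
paths = [write_checkpoint(r, b"state" * 1024) for r in range(4)]
print(len(paths), all(os.path.getsize(p) == 5 * 1024 for p in paths))  # → 4 True
```

The point of the pattern is that write-heavy phases (checkpointing, producer stages of a workflow) never touch the shared parallel file system directly, which is where the throughput gains described above come from.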
This hands-on tutorial will guide participants through:
Mon, January 26, 2026 13:30 - 16:30
Contributors: Dhabaleswar K. (DK) Panda
Abstract: Recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities. Modern DL frameworks such as PyTorch and TensorFlow enable high-performance training, inference, and deployment for various types of Deep Neural Networks (DNNs). This tutorial provides an overview of recent trends in DL and the role of cutting-edge hardware architectures and interconnects in moving the field forward. We will also present an overview of different DNN architectures, DL frameworks, and DL Training and Inference with special focus on parallelization strategies for large models such as GPT, LLaMA, DeepSeek, and ViT. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to efficiently support large-scale distributed training. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU/GPU/DPU architectures available on modern HPC clusters. Throughout the tutorial, we include several hands-on exercises to enable attendees to gain first-hand experience of running distributed DL training on a modern GPU cluster.
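As a toy illustration of the data-parallel strategy mentioned above (illustrative only, not the tutorial's actual PyTorch/MPI material): each worker computes a gradient on its own shard of the data, the gradients are averaged across workers, the role an allreduce plays in distributed training, and every worker then applies the identical update.

```python
# Toy data parallelism: real frameworks (e.g. PyTorch DDP over MPI/NCCL)
# run the same pattern across processes and overlap the communication
# step with computation.

def local_gradient(weights, shard):
    # mean-squared-error gradient for the toy model y = w * x
    return [sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
            for w in weights]

def allreduce_mean(grads):
    # average the per-worker gradients elementwise (the "allreduce")
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

# two workers, each holding a shard of (x, y) pairs generated by y = 3x
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weights = [0.0]
for _ in range(200):
    grads = [local_gradient(weights, s) for s in shards]  # parallel compute
    mean_g = allreduce_mean(grads)                        # communication
    weights = [w - 0.05 * g for w, g in zip(weights, mean_g)]
print(round(weights[0], 2))  # → 3.0
```

Because every worker applies the same averaged gradient, all replicas stay in sync without ever exchanging model weights, which is why the allreduce is the communication primitive that large-scale training runtimes optimize hardest.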
Website: https://nowlab.cse.ohio-state.edu/tutorials/hidl_SCAsia26/
Mon, January 26, 2026 13:30 - 16:30
Contributors: Shun Utsui
Abstract: Researchers and HPC practitioners increasingly face the challenge of maintaining consistent software environments across diverse computing resources. As scientific workflows span multiple institutions and migrate between on-premises clusters and cloud platforms, environment inconsistency leads to wasted time, reproducibility challenges, and inefficient resource utilization. RIKEN Center for Computational Science (R-CCS) and Amazon Web Services (AWS) will demonstrate how they overcame these challenges with the "Virtual Fugaku" strategy. This tutorial demonstrates how cloud infrastructure combined with containerized software environments enables researchers to develop once and run anywhere, eliminating boundaries between computing resources. Participants will first learn to build cloud-based HPC environments, then explore how portable software stacks like those in the Virtual Fugaku project maintain consistency across diverse infrastructures.
Through lectures and hands-on labs, participants will gain understanding of both cloud HPC deployment and workload portability while maintaining performance. The tutorial explores practical approaches for building infrastructure and running consistent workloads across different environments.
Participants will apply their learning through two hands-on labs: building a cloud-based HPC cluster, then deploying portable, containerized environments. Attendees will gain the knowledge to effectively deploy and optimize portable HPC workloads across diverse computing resources.
Thu, January 29, 2026 13:30 - 17:00
Contributors: Munetaka Ohtani
Abstract: Quantum computing has the potential to elevate heterogeneous high-performance computers to tackle problems that are intractable for purely classical supercomputers. Integrating quantum processing units (QPUs) into a heterogeneous compute infrastructure, referred to as the quantum-centric supercomputing (QCSC) model, involves CPUs, GPUs, and other specialized accelerators (AIUs, etc.). Achieving this requires collaboration across multiple industries to align efforts in integrating hardware and software.
IBM and our HPC/Quantum partners have developed software components to enable the handling of QPU workloads within the Slurm workload manager in HPC environments. This tutorial session will provide a comprehensive overview of the architecture, demonstrate how to create Slurm jobs for executing quantum workloads, and discuss the execution of Quantum-Classical hybrid workloads. Participants will gain hands-on experience through live demonstrations, exploring the integration of quantum workloads into existing HPC systems.
Efficient scheduling is only part of the solution. In the second half of the session, we will address the orchestration challenges unique to hybrid Quantum-Classical workloads—such as iterative execution, hyperparameter tuning, and backend instability. Participants will learn how to build scalable, fault-tolerant pipelines using Python-based workflow tools like Prefect. Key features such as checkpointing, automatic retries, and real-time observability will be demonstrated live, equipping attendees with the skills to manage complex quantum workloads and prepare for future challenges in scalability and reproducibility.
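A minimal standard-library sketch of the checkpointing-and-retry pattern described above (illustrative only; Prefect's actual API differs): each stage persists its result before the pipeline advances, and a transient backend failure triggers a bounded retry rather than a full re-run.

```python
import json
import os
import tempfile

# Checkpoint file for completed stage results (path is a stand-in).
CKPT = os.path.join(tempfile.mkdtemp(), "pipeline.json")

def load_ckpt() -> dict:
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {}

def run_stage(name, fn, retries=3):
    state = load_ckpt()
    if name in state:                  # stage already completed: skip on re-run
        return state[name]
    for attempt in range(1, retries + 1):
        try:
            state[name] = fn()
            with open(CKPT, "w") as f:  # checkpoint before moving on
                json.dump(state, f)
            return state[name]
        except RuntimeError:
            if attempt == retries:
                raise

# A stand-in "quantum sampling" stage that fails once (backend instability).
calls = {"n": 0}
def flaky_sampler():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("backend temporarily unavailable")
    return {"counts": {"00": 512, "11": 512}}

result = run_stage("sample", flaky_sampler)
print(calls["n"], result["counts"]["00"])  # → 2 512
```

Workflow tools layer observability and scheduling on top of exactly this skeleton; the key design point is that a retry re-executes only the failed stage, because every earlier stage's result was already checkpointed.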
Tue, January 27, 2026 09:00 - 09:30 5F Main Hall
Tue, January 27, 2026 9:30 - 17:00
Wed, January 28, 2026 9:00 - 17:00
Thu, January 29, 2026 9:00 - 15:30
3F Event Hall
Exhibitor List: Exhibitor List and Floor Plan (PDF)
Exhibitors Forum: TBA
Tue, January 27, 2026 09:30 - 10:15 5F Main Hall

Speaker
Biography
Torsten Hoefler is a Professor of Computer Science at ETH Zurich, a member of Academia Europaea, and a Fellow of the ACM, IEEE, and ELLIS. He received the 2024 ACM Prize in Computing, one of the highest honors in the field. Following a Performance as a Science vision, he combines mathematical models of architectures and applications to design optimized computing systems. Before joining ETH Zurich, he led the performance modeling and simulation efforts for the first sustained Petascale supercomputer, Blue Waters, at the University of Illinois at Urbana-Champaign. He is also a key contributor to the Message Passing Interface (MPI) standard where he chaired the "Collective Operations and Topologies" working group. Torsten won best paper awards at his field's top conference ACM/IEEE Supercomputing in 2010, 2013, 2014, 2019, 2022, 2023, 2024, and at other international conferences. He has published numerous peer-reviewed scientific articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. For his work, Torsten received the IEEE CS Sidney Fernbach Memorial Award in 2022, the ACM Gordon Bell Prize in 2019, Germany's Max Planck-Humboldt Medal, the ISC Jack Dongarra award, the IEEE TCSC Award of Excellence (MCR), ETH Zurich's Latsis Prize, the SIAM SIAG/Supercomputing Junior Scientist Prize, the IEEE TCSC Young Achievers in Scalable Computing Award, and the BenchCouncil Rising Star Award. Following his Ph.D., he received the 2014 Young Alumni Award and the 2022 Distinguished Alumni Award of his alma mater, Indiana University. Torsten was elected to the first steering committee of ACM's SIGHPC in 2013 and he was re-elected for every term since then. He was the first European to receive many of those honors; he also received both an ERC Starting and Consolidator grant. 
His research interests revolve around the central topic of performance-centric system design and include scalable networks, parallel programming techniques, and performance modeling for large-scale simulations and artificial intelligence systems. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.
Abstract
The Ultra Ethernet Consortium set out to redefine Ethernet-based interconnects for AI and high-performance computing (HPC), culminating in the recent release of its first specification (version 1.0). This talk will analyze HPC and AI workloads with respect to their networking requirements. We will then highlight key innovations that distinguish Ultra Ethernet from existing solutions, ranging from lossy operation—both with and without trimming—to fully hardware-offloaded rendezvous protocols. We will explore the architectural advancements and technical highlights that enhance efficiency, scalability, and performance, positioning Ultra Ethernet as a transformative force in next-generation computing.
Tue, January 27, 2026 10:15 - 11:00 5F Main Hall

Speaker
Biography
Keisuke Fujii is a Distinguished Professor at the Graduate School of Engineering Science, Osaka University, where he also serves as Deputy Director of the Center for Quantum Information and Quantum Biology (QIQB). He concurrently leads a research team at the RIKEN Center for Quantum Computing and acts as Chief Technical Advisor at QunaSys Inc., a leading quantum computing software start-up in Japan. He received his Ph.D. in Engineering from Kyoto University in 2011, and has held academic positions at the University of Tokyo and Kyoto University before joining Osaka University in 2019.
Professor Fujii’s research spans a broad spectrum of quantum information science, with a primary focus on the theory of fault-tolerant quantum computation, quantum error correction, and quantum algorithms. His achievements have been recognized with the NISTEP Award (2020), the JSPS Prize (2022), and the Osaka University Distinguished Professor title (2022).
Abstract
The past decade has witnessed remarkable progress in the development of quantum computers, culminating in the current era of noisy intermediate-scale quantum (NISQ) devices. While NISQ hardware has enabled first demonstrations of quantum advantage in carefully chosen tasks, its limited qubit number and error rates severely restrict practical applications. Bridging the gap toward fault-tolerant quantum computing (FTQC) requires both architectural innovation and resource-efficient error correction strategies. In this talk, I will provide an overview of the state of the art, highlighting how recent advances in algorithms, quantum error correction, and partially fault-tolerant architectures can pave the way for scalable computation. I will discuss how algorithm design and resource estimation are evolving hand in hand with hardware progress, shaping a roadmap from proof-of-principle demonstrations to early fault-tolerant applications. Special emphasis will be placed on the notion of “early FTQC,” which seeks to exploit partial fault tolerance to reduce overhead while delivering meaningful computational power. By connecting theoretical advances with experimental milestones, I aim to illustrate how high-performance quantum computing may emerge in the future and outline the key challenges and opportunities that lie ahead on the path from NISQ to FTQC.
Tue, January 27, 2026 11:30 - 12:30
Contributors: Addison Snell (Intersect360 Research)
Abstract: In this fast-paced panel format, Addison Snell invites four panelists – two from HPC and AI-using sites (ideally one lab, one industrial) and two from the vendor community (possibly SCA / HPC Asia sponsors) – to play the role of the industry analyst by responding to forward-looking questions about the direction of the industry. The panel will address seven topics in 45 minutes. The audience will see the topic list and questions for the panelists on slides, with a timer, and the slides will auto-advance as the timer expires.
Tue, January 27, 2026 11:30 - 12:30
Contributors: Eric Van Hensbergen (Arm)
Abstract: The exponential growth in compute demand from AI workloads has led to increasingly challenging infrastructure requirements. This panel brings together industry leaders to discuss how AI accelerators, compute racks, and AI clusters are evolving to address these requirements. In this panel session, panelists will highlight how future innovations in compute, accelerators, networking, optics, and cooling can enable greater compute capacity, efficiency, and performance at scale.
Speakers:
Tue, January 27, 2026 11:30 - 17:00
Wed, January 28, 2026 11:00 - 17:00
Agenda: Presentations by HPC Centers around the world.
Tue, January 27, 2026 11:30 - 17:00
Contributors: Dale Barker (Centre for Climate Research Singapore (CCRS)), Shigenori Otsuka (RIKEN R-CCS)
Abstract: Numerical Weather Prediction (NWP) and climate modelling have for decades relied on traditional ‘physical’ models, which provide numerical approximations to the Navier-Stokes equations, thermodynamics, and wider Earth System processes, discretized and parameterized for solution on some of the world’s largest supercomputers. The balance between model complexity, accuracy, uncertainty representation, time-to-solution, etc. is an important consideration for weather/climate applications ranging from operationally ‘nowcasting’ local rainfall in the next 30 minutes, to simulating the global climate system under various greenhouse gas emissions scenarios over coming decades.
Recent expansion of the use of data-driven (AI and machine learning) techniques to replace (or in the case of hybrid modelling – complement) physics-based approaches has instigated a revolution in NWP and climate modelling. ML-based approaches are now being considered across every stage of the weather/climate data processing chain, ranging from algorithm-specific (e.g. QC, data assimilation, process emulation, bias correction, post-processing, etc.) to full end-to-end (e.g. observation to forecast) approaches.
This session will draw together practitioners in computationally-intensive weather/climate science from world-leading research institutes, operational weather centres, national compute organisations, and the private sector. Speakers will outline their unique priorities and approaches to solving weather/climate challenges, and the specific role of large-scale compute and AI within these national and international endeavours.
Tue, January 27, 2026 11:30 - 17:00
Wed, January 28, 2026 11:00 - 17:00
Contributors: Rio Yokota, Charlie Catlett
Abstract: The TPC is a global initiative that brings together scientists from government laboratories, academia, research institutes, and industry to tackle the immense challenges of building large-scale AI systems for scientific discovery. By focusing on the development of massive generative AI models, the consortium aims to advance trustworthy and reliable AI tools that address complex scientific and engineering problems. Target community includes (a) those working on AI methods development, NLP/multimodal approaches and architectures, full stack implementations, scalable libraries and frameworks, AI workflows, data aggregation, cleaning and organization, training runtimes, model evaluation, downstream adaptation, alignment, etc.; (b) those that design and build hardware and software systems; and (c) those that will ultimately use the resulting AI systems to attack a range of problems in science, engineering, medicine, and other domains.
Tue, January 27, 2026 11:30 - 17:00
Contributors: Kengo Nakajima, France Boillod-Cerneux
Abstract: HANAMI is a collaborative initiative that fosters scientific partnerships between European and Japanese research institutions, with a focus on high-performance computing (HPC) and Artificial Intelligence (AI) at the exascale level and beyond. Bringing together leading research centers and supercomputing facilities, HANAMI targets key priority domains that are climate and weather modeling, biomedical research, and materials science. HANAMI promotes the mobility of researchers between the EU and Japan, while also developing a strategic roadmap to deepen and sustain transcontinental cooperation.
Website: https://hanami-project.com/
Tue, January 27, 2026 12:30 - 13:30
Title: TBA
Tue, January 27, 2026 12:30 - 13:30
Title: Innovative AI HPC solutions that enable advanced, AI-intensive supercomputing
Tue, January 27, 2026 12:30 - 13:30
Title: HPC and Cloud ~challenges and future vision ~ (tentative)


Tue, January 27, 2026 12:30 - 13:30
Title: GMO Internet & NVIDIA ~The Challenge of Optimizing Supercomputers for Commercial Services~
Tue, January 27, 2026 12:30 - 13:30
Title: Toward Quantum Advantage: Quantum-Centric Supercomputing Software Architecture and Scientific Applications. / Data-Centric Architecture: Intelligent Data Acceleration and Content Awareness with IBM Storage Scale.


Tue, January 27, 2026 13:30 - 17:00
Contributors: Worawan Diaz Carballo (Thammasat University), Fabrizio Gagliardi (Barcelona Supercomputing Center)
Abstract: High-Performance Computing (HPC), once confined to elite research centers, is now becoming more accessible through cloud integration and shared infrastructure. This democratization creates opportunities, but access alone is not enough. Training and mentoring are crucial for transforming resources into solutions. The rise of AI and data-driven complexity means that even modest Proof-of-Concepts (POCs) demand substantial computational power. Global initiatives have responded by combining education with community-building. The ACM supports seasonal HPC Schools that offer intensive training and foster lasting professional networks. Simultaneously, the HPC-AI Advisory Council has organized competitions that pair newcomers with experts on real-world problems. These efforts are enabled by support from HPC centers, such as RIKEN, NCI, BSC, and NSCC, which provide infrastructure and expertise, creating a pipeline that promotes skill development, broadens the scope of HPC education, and nurtures communities. Thailand’s HPC Ignite project, supported by NRCT and ThaiSC and inspired by these models, engaged 373 learners and produced 11 POCs addressing local challenges. The project could not have succeeded without cloud resources. With support from AWS, some POCs advanced toward deployment. This panel will explore how grassroots innovators can connect with global HPC and cloud ecosystems—highlighting the hybrid pathway as the key to sustainable, scalable, and equitable impact.
Program:
Tue, January 27, 2026 13:30 - 17:00
Contributors: Tommaso Macrì, Ayumu Imai
Abstract: As quantum processors begin integrating with GPU/CPU supercomputers, HPC centers must chart a practical path from pilots to production. This invited session brings together QuEra (neutral-atom systems), Pawsey (Australia’s Quantum Supercomputing Innovation Hub), Deloitte Tohmatsu (enterprise consulting), LINKS Foundation (EU applied research) and Jij, Inc (Optimization Tooling/platform) to share lessons from real deployments and cross-regional initiatives. Topics include selecting “first-wave” use cases; dynamic scheduling and resource allocation for hybrid QC–HPC workflows; early benchmarking and verification; user support and training; and governance, security, and procurement models. Case insights include a hybrid ML pipeline that inserts a Quantum Reservoir Computing (QRC) layer executed on QuEra’s Aquila neutral-atom platform for credit-default prediction, and LINKS’ recent work on scheduler/allocator designs for hybrid clusters. The session closes with a moderated panel outlining an APAC–EU roadmap for 2026–2028 across algorithms, software stacks, and center operations to make quantum a standard tool in HPC.
Tue, January 27, 2026 13:30 - 17:00
Contributors: Tatsuhiro Chiba, Hiroshi Horii
Abstract: The emergence of Quantum Computing and Artificial Intelligence (AI) is reshaping industries and redefining the boundaries of what is computationally possible. These two transformative technologies are not only advancing independently but are also beginning to converge, offering unprecedented opportunities to solve complex problems across domains.
As Quantum and AI technologies mature, real-world use cases are evolving rapidly across domains such as materials science, financial modeling, physics, and logistics. Quantum and AI have distinct strengths and operate in different computational paradigms: Quantum excels at solving combinatorial and probabilistic problems, while AI excels at data-driven inference. Their complementary nature enables a synergistic foundation for next-generation computing.
This half-day workshop brings together leading voices from academia, industry, chip vendors, and cloud providers to explore how Quantum and AI systems are being applied to real-world use cases today and how they can shape the future of high-performance computing. The session will feature domain-specific case studies, technical presentations, and infrastructure insights for Quantum and AI systems. The workshop will close with a forward-looking panel discussion focused on designing a converged compute fabric that integrates Quantum and AI capabilities to co-exist and co-evolve.
Details to be announced.
Tue, January 27, 2026 9:30 - 17:00
Wed, January 28, 2026 9:00 - 17:00
Thu, January 29, 2026 9:00 - 15:30
3F Event Hall
Exhibitor List: Exhibitor List and Floor Plan (PDF)
Exhibitors Forum: TBA
Wed, January 28, 2026 09:00 - 09:45 5F Main Hall

Speaker
Biography
Hiroaki Kitano is President and CEO of Sony Computer Science Laboratories, Inc. (Sony CSL). Kitano joined Sony CSL as a researcher in 1993 and has served as President and CEO since 2011. He served as Chief Technology Officer of Sony Group Corporation from 2022 to 2024 and has been Chief Technology Fellow since 2025.
As a researcher at Carnegie Mellon University, Kitano built large-scale data-driven AI systems on massively parallel computers, for which he received the Computers and Thought Award from IJCAI. At Sony CSL and California Institute of Technology, he pioneered the field of systems biology.
Outside Sony, Kitano is a member of the OECD Expert Group on AI Futures, Japan’s AI Strategy Council and AI Safety Institute. Within academia, he serves as a professor at Okinawa Institute of Science and Technology (OIST). He is the Founding President of RoboCup Federation. In 2021, Kitano established the Nobel Turing Challenge, a grand challenge to develop a new engine for scientific discovery.
Abstract
Creating fully or highly autonomous AI and robotics systems to perform high-caliber scientific research will be the most important scientific accomplishment (1). The Nobel Turing Challenge is a grand challenge aiming at building AI scientists capable of making major scientific discoveries continuously at the level worthy of Nobel Prizes (2). The WARP Drive for scientific discoveries shall be created, initially based on the idea of Trillions of data, billions of hypotheses, millions of experiments, and thousands of discoveries. Establishing the cycle of massive extraction of knowledge, massive hypothesis generation, massive experiments by robotics, and massive verification and knowledge consolidation, is the critical first step. This approach is a total flip of conventional wisdom of how science can be performed. Rather than trying to ask an important question, AI scientists may ask every question and important answers are there to be discovered. Massive computing power to enable AI capabilities combined with sophisticated robotics systems for high-precision experiments are the key to the success.
Wed, January 28, 2026 09:45 - 10:30 5F Main Hall

Speaker
Biography
Mateo Valero, http://www.bsc.es/cv-mateo/ is professor of Computer Architecture at Technical University of Catalonia (UPC) and is the Founding Director of the Barcelona Supercomputing Center, where his research focuses on high performance computing architectures. He has published approximately 700 papers, has served in the organization of more than 300 International Conferences and has given more than 800 invited talks. Prof. Valero has been honored with numerous awards, among them: The Eckert-Mauchly Award 2007 by IEEE (Institute of Electrical and Electronics Engineers) and ACM (Association for Computing Machinery), the Seymour Cray Award 2015 by IEEE and the Charles Babbage 2017 by IEEE. Among other awards, Prof. Valero has received The Harry Goode Award 2009 by IEEE, The Distinguished Service Award by ACM. Prof. Valero is a "Hall of the Fame" member of the ICT European Program (selected as one of the 25 most influential European researchers in IT during the period 1983-2008, Lyon, November 2008).
For the full biography, please see Prof. Mateo Valero's CV.
Abstract
The leitmotif of my talk will be the thesis that past advances in computer architecture continue to be relevant in our field today and will dictate the future. In that context, I will touch upon how the current AI accelerators are based on systolic arrays, how the long vector processors of today are influenced by Cray supercomputers, and how past architecture ideas to support different data formats are reused for mixed precision support in HPC and for layer-by-layer optimization of energy-efficient AI accelerators. This talk will describe some of the related contributions of the UPC Department of Computer Architecture (DAC) to the scientific community, especially in the fields of superscalar and vector processors. I will briefly discuss specific UPC DAC contributions which have been incorporated into current high-performance processors, including supercomputers and accelerators aimed at efficient execution of AI applications. In the second part of my talk, I will describe the current research topics at the Barcelona Supercomputing Center (BSC), as well as the chips designed at BSC. Finally, I will conclude with our future vision of how Europe can develop competitive chips based on RISC-V to be used in the design of supercomputers and accelerators for AI in the coming years.
Wed, January 28, 2026 11:00 - 12:30
Contributors: Addison Snell (Intersect360 Research)
Abstract: Following the success and popularity of the “Fishbowl Panel” at ISC, we look forward to bringing the format to SCA / HPCAsia 2026. This panel will explore the disruptions and revolutions facing HPC and AI, seeking to separate out the true innovations and advancements from what is myth, hype, or marketing. This forward-looking panel is designed to elicit opinions from a wide range of industry thought leaders. Every eight minutes, one of the three panelists will be dismissed, replaced by a willing member of the audience. Respectful disagreement and diversity of opinions is encouraged. Welcome to the Fishbowl.
Tue, January 27, 2026 11:30 - 17:00
Wed, January 28, 2026 11:00 - 17:00
Agenda: Presentations by HPC Centers around the world.
Tue, January 27, 2026 11:30 - 17:00
Wed, January 28, 2026 11:00 - 17:00
Contributors: Rio Yokota, Charlie Catlett
Abstract: The TPC is a global initiative that brings together scientists from government laboratories, academia, research institutes, and industry to tackle the immense challenges of building large-scale AI systems for scientific discovery. By focusing on the development of massive generative AI models, the consortium aims to advance trustworthy and reliable AI tools that address complex scientific and engineering problems. Target community includes (a) those working on AI methods development, NLP/multimodal approaches and architectures, full stack implementations, scalable libraries and frameworks, AI workflows, data aggregation, cleaning and organization, training runtimes, model evaluation, downstream adaptation, alignment, etc.; (b) those that design and build hardware and software systems; and (c) those that will ultimately use the resulting AI systems to attack a range of problems in science, engineering, medicine, and other domains.
Wed, January 28, 2026 11:00 - 17:50
Contributors: Mitsunori Ikeguchi (RIKEN R-CCS)
Abstract: The 8th R-CCS International Symposium will be held to discuss the outlook from Fugaku to FugakuNEXT and cutting-edge academic research on future-oriented computer science and computational science, including AI technology.
Website: https://www.r-ccs.riken.jp/R-CCS-Symposium/2026/
Program:
Wed, January 28, 2026 12:30 - 13:30
Title: Next Vector project based on proven NEC Vector and RISC-V architecture
Wed, January 28, 2026 12:30 - 13:30
Title: TBA
Wed, January 28, 2026 12:30 - 13:30
Title: TBA
Wed, January 28, 2026 12:30 - 13:30
Title: TBA
Wed, January 28, 2026 12:30 - 13:30
Title: TBA
Wed, January 28, 2026 12:30 - 13:30
Title: Advanced Cooling and Power Solutions for Overcoming HPC Barriers
Wed, January 28, 2026 13:30 - 17:00
Contributors: Kento Sato, Toshihiko Kai
Abstract: The rise of AI-driven scientific discovery is transforming the design and operation of high-performance computing (HPC) storage infrastructures. Modern research workflows increasingly combine large-scale simulation, experimental data acquisition, and machine learning, producing unprecedented demands for throughput, scalability, resilience, and intelligent data management. This half-day, multi-vendor panel session brings together leading storage technology providers to share their visions for next-generation storage architectures that can power both traditional HPC and emerging AI/ML workloads. Each speaker will deliver a technical and strategic presentation covering topics such as high-throughput parallel file systems, large-scale object storage, hybrid and tiered designs, and cloud-integrated solutions. These talks will address challenges including extreme data growth, optimizing I/O for AI training and inference, sustaining performance at scale, and balancing cost, energy efficiency, and sustainability. The session will conclude with a moderated panel discussion, where all speakers and the audience engage in a lively dialogue on technology trends, design trade-offs, real-world deployment lessons, and future directions in storage for AI-driven science. Attendees will gain vendor-neutral, strategic insights into the evolving storage landscape and leave with actionable ideas for architecting the next generation of HPC/AI data infrastructures.
Wed, January 28, 2026 13:30 - 17:00
Contributors: Aditi Subramanya, Jana Makar, Selphie Siew
Abstract: As we approach a future shaped by High-Performance Computing (HPC), Artificial Intelligence (AI), Cloud technologies, and Quantum Computing (QC), the call for a diverse, equitable, and inclusive (DEI) ecosystem has never been more urgent. The Diversity & Inclusivity Track at SC Asia 2026 will explore how today’s decisions in creating inclusive research, technical, and organisational environments will directly influence the innovation, ethics, and societal impacts of tomorrow.
This track invites speakers to reflect on the future society we are building and to examine how diverse perspectives can accelerate scientific discovery, create more robust and trustworthy AI systems, ensure equitable access to computational resources, and prepare the next generation of talent. Conversely, it challenges participants to consider the risks of inaction — from algorithmic bias to workforce homogeneity — and the societal costs of excluding underrepresented voices from shaping the digital future.
Through thought-provoking keynotes, case studies, and panel discussions, this track will bring together researchers, industry leaders, and policymakers to reimagine what inclusive excellence means in the era of HPC-driven transformation. Together, we will explore practical strategies to embed DEI principles into research collaborations, infrastructure design, governance frameworks, and talent pipelines, ensuring that future societies are not only technologically advanced but also just, equitable, and resilient.
Wed, January 28, 2026 13:30 - 17:00
Contributors: Earl Joseph, Debra Goldfarb
Abstract: High-performance computing (HPC) is increasingly recognized as a foundational enabler for addressing the world’s most pressing societal challenges — from public health and disaster resilience to climate change and sustainable industry. In the era of AI-driven science and digital transformation, HPC now serves not only as a technical platform but also as a catalyst for collaboration and innovation connecting governments, academia, and industry.
This invited session, co-organized by RIKEN, Hyperion, and AWS, will explore how advanced simulation, AI integration, and cloud-based HPC ecosystems are transforming data-informed decision-making and enabling real societal impact. By highlighting experiences from research institutes, industrial partners, and global service providers, the session will also reflect on Japan’s vision for Society 5.0 — a pioneering model for human-centered, sustainable innovation — and discuss how this concept resonates with global efforts toward inclusive and resilient digital societies.
Wed, January 28, 2026 13:30 - 17:00
Contributors: Qingchun Song, Pengzhi Zhu
Abstract: Data centers are rapidly transforming into AI factories, powered by tens of thousands of GPUs and accelerators. These environments run massive jobs across distributed clusters, making network performance critical to achieving the lowest latency and highest efficiency for both AI training and inference. In this new architecture, the network defines the AI Factory. RDMA technologies have become the backbone of scale-out computing, enabling high-performance east-west communication across GPUs and efficient north-south storage access. Optimizing RDMA communications is essential for boosting both compute and storage performance in large-scale AI and HPC clusters. The HPC-AI Advisory Council is dedicated to advancing these optimizations through cutting-edge technologies, training programs, and competitions.
Website: https://www.hpcadvisorycouncil.com/events/2025/APAC-AI-HPC/
Wed, January 28, 2026 13:30 - 17:00
Contributors: Toshiyuki Imamura (RIKEN R-CCS)
Abstract: Recent advances in AI-empowered CPUs, NPUs, and GPUs significantly improve low- and mixed-precision calculations. This invited session will explore cutting-edge theory regarding mixed-precision techniques and their applications, including key algorithms, AI device design, and circuit implementation, focusing on how they can speed up non-AI tasks. The main goal of the session is to provide a detailed review of the Ozaki-scheme, which emulates FP64-GEMM operations using low-precision INT8 GEMMs, and to evaluate its performance and potential, especially in existing systems and the FugakuNEXT generation. Additionally, the session will showcase other innovative methods enhanced by sophisticated MxP algorithms utilizing advanced numerical linear algebra and application co-design.
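The slicing idea behind the scheme can be sketched in a few lines of NumPy. This is a deliberately simplified toy, not the published algorithm: it assumes a single shared exponent per matrix, whereas the actual Ozaki scheme scales per row so that every slice product is exactly representable, and the function names here are invented for this sketch.

```python
import numpy as np

def int_slices(A, num_slices=8, bits=7):
    """Split an FP64 matrix into A ~= sum_i S_i * 2**e_i with |S_i| <= 2**bits.

    Toy version with one shared exponent per matrix; the real Ozaki scheme
    scales per row/column so every slice product is exactly representable.
    """
    exp = int(np.ceil(np.log2(np.max(np.abs(A)))))
    R = A / 2.0**exp                     # normalize into (-1, 1]
    slices, exps = [], []
    for i in range(num_slices):
        shift = bits * (i + 1)
        S = np.round(R * 2.0**shift)     # narrow integers (INT8-sized here)
        R = R - S / 2.0**shift           # residual carries the lower bits
        slices.append(S.astype(np.int64))
        exps.append(exp - shift)
    return slices, exps

def ozaki_like_gemm(A, B, num_slices=8):
    """Emulate an FP64 GEMM from integer slice GEMMs (exact int64 matmuls)."""
    SA, EA = int_slices(A, num_slices)
    SB, EB = int_slices(B, num_slices)
    C = np.zeros((A.shape[0], B.shape[1]))
    for Si, ei in zip(SA, EA):
        for Sj, ej in zip(SB, EB):
            C += (Si @ Sj).astype(np.float64) * 2.0**(ei + ej)
    return C
```

With 8 slices of 7 bits the reconstruction carries roughly 56 mantissa bits, so the result agrees with a native FP64 GEMM to near machine precision; on real hardware the inner integer matmuls would map to INT8 tensor-core GEMMs with 32-bit accumulation.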
Program:
Wed, January 28, 2026 13:30 - 17:00
Contributors: Mitsuhisa Sato, Ye Jun
Abstract: Quantum computing is a promising technology that extends the frontiers of computation beyond existing high-performance computers. Although current state-of-the-art supercomputers have solved many scientific problems, quantum computers are expected to further explore their computational capabilities as the most powerful accelerators. Before reaching “quantum advantage,” “quantum utility” has emerged as a new measure of practical usefulness in the field of quantum computing. At present, extensive research and development on systems and applications toward achieving quantum utility—and ultimately quantum advantage—is actively progressing across the Asia-Pacific region. This session highlights the cutting edge of quantum computing: the first part introduces key initiatives in the Asia-Pacific region, while the second part focuses on systems and applications.
Wed, January 28, 2026 13:30 - 17:00
Contributors: Akihiro Asahara
Abstract: This invited session explores the latest advances in performance engineering for HPC systems running AI workloads, bridging operational practices and architectural innovation.
We will discuss continuous telemetry-based performance observation, identification of typical bottlenecks in AI training and inference pipelines, and practical models for iterative optimization across software, middleware, and system layers. Real-world case studies will demonstrate how systematic performance engineering achieves significant efficiency gains in large-scale GPU clusters. The session also looks ahead to how evolving AI models, heterogeneous accelerators, and exascale architectures are reshaping HPC operations and performance management frameworks.
Speakers from academia and industry will share both theoretical and practical insights, highlighting opportunities for collaboration and cross-sector innovation toward sustainable, high-efficiency computing in the AI era.
Program:
Wed, January 28, 2026 13:30 - 17:00
Contributors: Jin-Sung Kim
Abstract: Quantum computing has the potential to solve the world’s most important problems; however, useful quantum computers will not exist in a silo. The global HPC community recognizes that the first useful quantum applications will be hybrid, requiring a deep and performant integration between quantum processors and classical HPC & AI supercomputers. This session convenes thought leaders from across these domains to address a pressing challenge that will define the next era of supercomputing: the integration of quantum computers with AI supercomputers.
Experts within the industry agree that quantum computers working in tandem with classical HPC and AI supercomputing will be key to unlocking useful applications within quantum computing. The reasons are twofold:
A performant, scalable, and low-latency interface between AI supercomputers and quantum computers is the key technical innovation required. The interface should be common and easily adaptable by quantum processor unit (QPU) builders and QPU system controller (QSC) builders and should integrate readily with HPC resources. Software architectures should provide support for real-time callbacks and data marshaling between the HPC and QSC to provide support for QEC decoding and hybrid quantum-classical applications. This session will spark a forward-looking discussion on existing and future architectures, software, and standards needed to implement this "quantum-classical" compute fabric.
Thu, January 29, 2026 09:00 - 09:45 5F Main Hall

Speaker
Biography
Katherine Yelick is the Vice Chancellor for Research at the University of California, Berkeley, where she is also the Robert S. Pepper Distinguished Professor of Electrical Engineering and Computer Sciences. She is also a Senior Faculty Scientist at Lawrence Berkeley National Laboratory. She has been recognized for her research and leadership in high performance computing and is a member of the National Academy of Engineering and the American Academy of Arts and Sciences.
Abstract
The first generation of exascale computing systems is online, along with powerful new application capabilities and system software. At the same time, demands for high performance computing continue to grow for more powerful simulations, adoption of machine learning methods, and huge data analysis problems arising from new instruments and increasingly ubiquitous data collection devices. In its broadest sense, computational science research is expanding beyond the physical and life sciences into social sciences, public policy, and even the humanities.
With chip technology facing scaling limits and diminishing benefits of weak scaling, it will be increasingly difficult to meet these new demands. Disruptions in the computing marketplace, which include supply chain limitations, a shrinking set of system integrators, and the growing influence of cloud providers, are changing underlying assumptions about how to acquire and deploy future supercomputers. At the same time, AI is having an enormous influence on hardware designs, leaving traditional scientific methods at a crossroads: do they join the AI bandwagon or try to use the hardware for traditional methods?
In this talk I’ll present some of the findings of a US National Academies consensus report on the future of post-exascale computing, which states that business as usual will not be sufficient. I will also give my own perspectives on some of the challenges and opportunities faced by the research community.
Thu, January 29, 2026 09:45 - 10:30 5F Main Hall

Speaker
Biography
Dr. Jay M. Gambetta is the Vice President in charge of IBM’s overall Quantum initiative. He was named as an IBM Fellow in 2018 for his leadership in advancing superconducting quantum computing and establishing IBM’s quantum strategy to bring quantum computing to the world, and to make the world quantum safe.
Under his leadership, IBM was first to demonstrate a cloud-based quantum computing platform; a platform that has grown to become the predominant quantum service utilized by 600,000+ users to run over 3 trillion quantum circuits. These users include 280+ members of the IBM Quantum Network, representing forward-thinking academic, industry, and governmental organizations focused on building a quantum-native ecosystem. IBM Quantum continues to expand in the market by providing Quantum as a Service utilizing the IBM Quantum System One and Two series of devices, and to date has deployed over 75 quantum systems online, building the foundations of the quantum industry. In addition, he was responsible for the creation and early development of Qiskit; the leading open-source quantum computing software development kit, allowing users to build, optimize, and execute quantum circuits on hardware from a multitude of quantum service providers.
Dr. Gambetta received his Ph.D. in Physics from Griffith University in Australia. He is a Fellow of the American Physical Society, IEEE Fellow, and has over 130 publications in the field of quantum information science with over 50,000 citations.
Abstract
As quantum computing pushes into the era of advantage, algorithm development comes to the forefront as a crucial step from advantage to useful quantum computing. To facilitate this transition, the quantum industry needs a focus on performant hardware, performant software, and seamless integration between classical and quantum resources. In this talk, Jay will discuss IBM’s strategy for quantum computing, including advances in quantum-centric supercomputing software and algorithms, and IBM’s hardware roadmap leading to large-scale fault-tolerant quantum computers. Together, we are building the future of computing.
Tue, January 27, 2026 9:30 - 17:00
Wed, January 28, 2026 9:00 - 17:00
Thu, January 29, 2026 9:00 - 15:30
3F Event Hall
Exhibitor List: Exhibitor List and Floor Plan (PDF)
Exhibitors Forum: TBA
Thu, January 29, 2026 10:30 - 11:00 5F Main Hall
Thu, January 29, 2026 12:30 - 13:30
Title: TBA
Thu, January 29, 2026 12:30 - 13:30
Title: TBA
Thu, January 29, 2026 11:30 - 17:00
Contributors: Rio Yokota, Jason Haga
Abstract: The 18th Accelerated Data Analytics and Computing (ADAC) Symposium is a co-located event at SCA/HPCAsia 2026, bringing a prestigious global HPC forum to Osaka, Japan. Scheduled for January 29, 2026, this open symposium will be part of the main conference program and is accessible to all HPC-Asia/SC-Asia participants. Under the theme “Beyond Classical Boundaries: HPC for Next Generation AI and Quantum Computing”, the ADAC18 Symposium explores how high-performance computing is transcending traditional limits to empower breakthroughs in artificial intelligence (AI) and quantum computing. The symposium will showcase cutting-edge research and collaborative initiatives at the intersection of supercomputing, AI, and quantum technologies, aligning with SCA/HPCAsia’s vision of uniting these communities to create the future. Attendees can expect insightful talks, networking opportunities, and a forward-looking dialogue on the future of computational science beyond classical HPC.
Website: https://adac.ornl.gov/18th-adac-symposium-workshop-january-29-february-2-4-2026/
Thu, January 29, 2026 13:30 - 17:00
Contributors: Pedro Valero Lara, William F. Godoy, Dhabaleswar K. Panda
Abstract: This workshop focuses on advances in LLMs for major HPC priorities and challenges. It aims to define and discuss the fundamentals of LLMs for HPC-specific tasks, including but not limited to hardware design, compilation, parallel programming models and runtimes, and application development, enabling LLM technologies to make more autonomous decisions about the efficient use of HPC. The workshop provides a forum to discuss new and emerging solutions to these important challenges on the way toward an AI-assisted HPC era.
Website: https://ornl.github.io/events/llm4hpcasia2026/
Call for Papers: Please see the website for details
Thu, January 29, 2026 13:30 - 17:00
Contributors: Zhaobin Zhu, Ryoma Ohara, Radita Liem
Abstract: AI and data-intensive workloads are driving up both computational and energy demands, with data movement and storage now consuming energy on par with computation. Yet, these costs remain poorly understood and rarely optimized. This workshop brings together researchers and practitioners from AI, HPC, and energy domains to address the challenges of modeling, profiling, and optimizing data flows for performance and sustainability. Topics include power profiling, bottleneck analysis, and energy-aware strategies across diverse architectures, from high-end HPC to resource-constrained systems. The workshop emphasizes holistic energy optimization, highlighting data movement as a critical factor in application performance and sustainability. It encourages the development of methods and tools that improve energy efficiency and supports collaboration toward more sustainable computing practices.
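As a toy illustration of the data-movement costs the workshop focuses on, the sketch below estimates the effective bandwidth of a large in-memory copy. It is a crude proxy only (the function name and sizes are arbitrary choices made here); actual energy profiling requires hardware counters such as RAPL or NVML.

```python
import time
import numpy as np

def copy_gbps(n_doubles=25_000_000, repeats=5):
    """Effective bandwidth (GB/s) of a large FP64 copy: read plus write traffic."""
    src = np.zeros(n_doubles)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst = src.copy()                 # memory-bound: ~2 * 8 * n_doubles bytes
        best = min(best, time.perf_counter() - t0)
        del dst
    return 2 * 8 * n_doubles / best / 1e9

print(f"~{copy_gbps():.1f} GB/s effective copy bandwidth")
```

Comparing such a probe against a machine's peak FLOP rate gives a first roofline-style hint of whether a kernel is bandwidth-bound or compute-bound.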
Website: https://dream-workshop.github.io
Call for Papers: Please see the website for details
Thu, January 29, 2026 13:30 - 17:00
Contributors: Hiroshi Horii, Antonio Córcoles
Abstract: Quantum computing has the potential to elevate heterogeneous high-performance computers to tackle problems that are intractable for purely classical supercomputers. This requires the integration of quantum processing units (QPUs) into a heterogeneous classical compute infrastructure, which we refer to as the quantum-centric supercomputing (QCSC) model. Achieving this requires multiple industries to work together and align their efforts to integrate hardware and software alike.
In this workshop we will present state-of-the-art solutions used in today’s heterogeneous compute architectures. We will demonstrate how QPUs are integrated into existing frameworks without reinventing established high-performance computing ecosystem tools, and we will showcase how hybrid quantum-classical algorithms and applications can directly benefit from such QCSC architectures. Taken together, we will discuss how these interdisciplinary integration efforts pave the way to quantum advantage becoming a reality in the near future.
Thu, January 29, 2026 13:30 - 17:00
Contributors: Manuel F. Dolz, Sandra Catalán
Abstract:
As AI systems become increasingly decentralized, the need for secure, privacy-aware training mechanisms is critical. Federated Learning enables collaborative model training across multiple parties without requiring the exchange of raw data, making it a promising approach for data-sensitive environments. At the same time, Homomorphic Encryption allows computations to be performed directly on encrypted data, ensuring end-to-end confidentiality. SAFE-HE is designed to explore the intersection of these two technologies, addressing the growing demand for trustworthy AI by uniting researchers and practitioners from machine learning, cryptography, privacy, and HPC.
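The additively homomorphic building block at the heart of this combination can be illustrated with a miniature sketch: a toy Paillier cryptosystem with tiny fixed primes and fixed randomness (illustrative only, nowhere near secure; all parameters are hypothetical and not drawn from any SAFE-HE artifact). A server can aggregate encrypted client updates without ever decrypting an individual one.

```python
from math import gcd

# Toy Paillier keypair with tiny primes -- illustrative only, NOT secure.
p, q = 2357, 2551
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)             # modular inverse (Python 3.8+)

def encrypt(m, r):
    # r must be coprime to n; a real implementation draws it at random
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Federated-style aggregation: multiplying ciphertexts adds the plaintexts,
# so the aggregator never sees any individual client update.
updates = [12, 30, 7]                           # e.g. quantized client gradients
cts = [encrypt(m, r) for m, r in zip(updates, (17, 23, 29))]
agg = 1
for c in cts:
    agg = (agg * c) % n2
assert decrypt(agg) == sum(updates)
```

Real deployments use thousand-bit moduli, random blinding factors, and batching; lattice-based schemes such as CKKS additionally support multiplication on ciphertexts.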
Objectives, scope and topics: The objective of this workshop is to bring together researchers and practitioners to explore recent advances in integrating Federated Learning and Homomorphic Encryption, with a focus on both theoretical foundations and practical implementations. The workshop aims to identify key open challenges and highlight promising research directions for secure and efficient distributed learning. A core goal is to foster interdisciplinary collaboration across the cryptography, machine learning, privacy, cybersecurity, and systems communities, promoting a shared understanding of the complex issues at the intersection of these fields. Additionally, the workshop will serve as a platform to disseminate tools, methodologies, and results developed under the CIBER-CAFE initiative, supporting the broader adoption and impact of secure collaborative learning technologies. We are looking for original high-quality research and position papers on algorithms, frameworks, and practical applications integrating Federated Learning and Homomorphic Encryption; topics of interest are listed in the call for papers.
Website: https://sites.google.com/uji.es/safe-he2026
Call for Papers: Please see the website for details
Thu, January 29, 2026 13:30 - 17:00
Contributors: Rossen Apostolov, Manoj Khare
Abstract: TBA
Thu, January 29, 2026 13:30 - 17:00
Contributors: Rossen Apostolov, Andrew Rohl, Michael Sullivan, Kengo Nakajima, Manoj Khare, Min-ho SUH
Abstract: TBA
Thu, January 29, 2026 13:30 - 17:00
Contributors: François Mazen, Jorji Nonaka, Thomas Theußl
Abstract: First International Workshop on High Performance Massive Data Visualization.
Numerical simulations at large scale, nowadays up to exascale, tend to produce very large datasets that are challenging to analyse and visualize. The current state of the art involves in situ and in transit techniques that analyse data on the fly, avoiding costly saving of data to disk. In addition, emerging approaches leverage progressive analyses at both the data loading and the analysis stages.
This workshop proposes to explore current and novel approaches, usually high-performance ones, to post-processing massive data generated by HPC systems (e.g. large-scale simulation results) and facilities (e.g. log data from the HPC system itself and from its electrical and cooling facilities), as well as sensor data from different kinds of measurement systems. Topics include parallel and distributed visualization.
This workshop encourages contributed talks on recent work regarding methods, workflows, results, and post-mortems of large-scale data analysis, including in situ and in transit visualization. Ultimately, this workshop aims to connect the Asian HPC community with the international community working on large-data visualization challenges and to share existing and emerging solutions.
Website: https://hpmdv-9b1629.gitlab.io/
Thu, January 29, 2026 13:30 - 17:00
Contributors: Justin Wozniak, Nicholas Schwarz, Hannah Paraga
Abstract: Experimental science areas are increasingly adopting advanced computing techniques. From established institutional computing solutions to highly customized approaches, diverse areas of scientific research use myriad approaches to computing and automation. There is a great opportunity for idea sharing and collaboration on the common aspects of managing the intersection of experiment and computation.
Advances in machine learning (ML) and artificial intelligence (AI) capabilities are on the verge of accelerating discovery far beyond traditional workflows. Improvements in the numerical engines, simulations, digital twins, and agentic AI could transform experimental science by autonomously designing, executing, and adapting experiments in real time. Additionally, integration of advanced computing resources with data management and experiment control systems supports these transformations. However, challenges remain. Automated services that manage all aspects of the scientific work cycle have not yet been realized due to the complexities of collecting physical data, managing the data lifecycle, actuating computation, and interacting with human users.
The ACX Workshop will be an exciting forum for the exchange of ideas around this topic. We invite researchers and developers to share everything from inspiring results in “big science” endeavors to the “day one” creativity it takes to bring automation into emerging and potentially transformational experimental approaches.
Website: https://acx-2026.cels.anl.gov/
Call for Papers: https://acx-2026.cels.anl.gov/call-for-papers/
Thu, January 29, 2026 13:30 - 17:00
Contributors: Edgar A. Leon
Abstract: Modern supercomputing has reached a critical juncture where raw computational power alone cannot guarantee optimal application performance. As the world's most powerful systems—including Frontier (the first exascale computer) and El Capitan (currently the fastest supercomputer)—feature increasingly complex heterogeneous architectures with over a million cores, tens of thousands of accelerators, and intricate memory hierarchies, the challenge of efficiently mapping applications to hardware has become paramount.
This half-day tutorial addresses the often-overlooked third pillar of high-performance computing: Affinity—the strategic mapping of software (first pillar) to hardware resources (second pillar). While most HPC practitioners focus on algorithmic optimization and hardware capabilities, failing to account for hardware locality creates expensive data movement that can cripple even well-designed applications on top-tier systems.
What you will learn: (1) Discover and navigate complex hardware topologies using industry-standard tools like hwloc; (2) Master both policy-driven (high-level) and resource-specific (low-level) affinity techniques; (3) Control process and GPU kernel placement using Slurm and Flux resource managers; (4) Leverage hardware locality to minimize data movement and maximize performance; and (5) Apply locality-aware mapping strategies that scale from small clusters to exascale systems.
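The low-level end of that spectrum can be previewed from a plain Python prompt. The sketch below uses Python's Linux-only wrappers around the kernel's affinity interface; it is a minimal illustration rather than tutorial material, though the tutorial's tools (hwloc, Slurm, Flux) operate on the same underlying masks.

```python
import os

# Query the set of CPUs this process is allowed to run on
# (Linux-only wrapper around sched_getaffinity).
cpus = os.sched_getaffinity(0)
print(f"runnable on {len(cpus)} CPUs: {sorted(cpus)}")

# Pin the process to a single core. Resource managers such as Slurm
# (--cpu-bind) or Flux apply the same kind of mask on your behalf.
target = {min(cpus)}
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target
```

Child processes inherit this mask, which is why placement decisions made by the resource manager propagate down to application ranks and threads.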
Hands-on experience: Through progressive modules and practical exercises on a live cluster environment, attendees will work with architectures from the world's leading supercomputers. The tutorial builds from fundamental topology discovery to advanced multi-GPU affinity policies, ensuring both beginners and intermediate users gain valuable skills.
Target audience: HPC users, computational scientists, students, and center staff seeking to bridge the gap between applications and hardware. No prior affinity experience required—just basic Linux knowledge and familiarity with parallel programming concepts.
Transform your understanding of HPC performance optimization and unlock your applications' full potential on today's most complex computing systems.
Thu, January 29, 2026 13:30 - 17:00
Contributors: Nick Jones
Abstract: This tutorial will introduce the fundamentals of cloud-like system provisioning with OpenCHAMI and enable attendees to build on what they’ve learned by applying DevOps principles to manage OpenCHAMI clusters.
OpenCHAMI is a relatively new open source, open governance project from partners: HPE, the University of Bristol, CSCS, NERSC, and LANL. It securely provisions and manages on-premise HPC nodes at any scale. With a cloud-like composable microservice architecture, and an emphasis on infrastructure as code, the tools to install and manage OpenCHAMI may not be familiar to many traditional HPC administrators. Having helped our own teams at LANL to make this transition, the presenters want to bring the same training to a broader audience.
In this half-day tutorial, attendees will learn how each of the components in an OpenCHAMI system can be used to bootstrap their own virtual clusters in the first hour and then build on what they’ve learned to leverage DevOps workflows to automate the management of multiple clusters.
Thu, January 29, 2026 13:30 - 17:00
Contributors: Deepika H V
Abstract: In today’s rapidly evolving HPC and AI landscape, the ability to write performance-portable code across a broad spectrum of computing architectures is becoming essential. From multi-core CPUs to GPUs and emerging accelerators, today’s applications must scale efficiently while maintaining a unified codebase. This tutorial introduces the principles of device-agnostic programming, emphasizing the need for Unified Programming Models that balance Portability with Performance, also offering practical guidance on building portable, high-performance applications for heterogeneous systems.
We begin by exploring how SYCL, a modern C++-based programming model from the Khronos Group that enables single-source parallel programming across heterogeneous systems, can help overcome common pitfalls of hardware-specific development, enabling developers to target diverse platforms without rewriting the codebase. SYCL supports a clean abstraction over hardware while providing fine-grained control over execution, memory, and data movement.
The concepts will be illustrated using the SYCL implementation from an Indian HPC programming ecosystem, tested on a wide spectrum of architectures, including x86 CPUs (Intel Cascade Lake, Sapphire Rapids; AMD Genoa), Arm processors (Fujitsu A64FX, NVIDIA Grace, Ampere Altra), RISC CPUs (IBM Power10), and GPUs (NVIDIA V100, A100, H200; AMD MI100, MI300X, MI300A). These platforms represent the diversity of modern HPC systems that are central to ongoing global HPC initiatives, including those in Asia.
Participants will gain hands-on insights into SYCL fundamentals such as buffers, accessors, kernel dispatch, and Unified Shared Memory (USM). By the end of the session, attendees will be equipped to write portable, high-performance code for real-world HPC and AI workloads using SYCL’s device-agnostic model, while aligning with emerging ecosystem efforts that promote unified development across architectures.
This tutorial is suitable for students, developers, researchers, and system architects with varying levels of expertise in parallel programming and heterogeneous computing.
Thu, January 29, 2026 11:30 - 18:30 (Tentative)
Contributors: Filippo Spiga
Abstract: Arm technology has become a compelling choice for HPC due to its promise of efficiency, density, scalability, and broad software ecosystem support. Arm's expansion into the datacenter started in 2018 with Arm Neoverse, a set of infrastructure CPU IPs designed for high-end computing. The Arm-based Fugaku supercomputer, the first of its kind to implement the Arm SVE instruction set, entered the Top500 in June 2020 at the top of the list and has retained a leadership position over the years, not only in HPL but also in HPCG. There is growing interest in diversifying and exploring new computing architectures to re-create the vibrant and diverse ecosystem of more than a decade ago, and Arm technology is at the forefront of this wave of change. To further advance datacenter and accelerated computing solutions, NVIDIA has built the Grace Hopper Superchip, which brings together the groundbreaking performance of the NVIDIA Hopper GPU and the versatility of the NVIDIA Grace CPU, tightly connected with a high-bandwidth, memory-coherent Chip-2-Chip (C2C) interconnect. The NVIDIA Grace CPU packs 72 high-performance Armv9 cores on a single die to deliver competitive FP64 TFlops of computing performance and up to 500GB/s of memory bandwidth at industry-leading power efficiency. In this tutorial, our experts will answer any questions you may have about fully unlocking the scientific computing potential of the Grace CPU and Grace Hopper GH200 Superchip. The speakers will guide attendees through compiling, executing, profiling, and optimizing HPC and AI workloads for Arm, demystifying the claim that changing CPU architecture is hard. The tutorial will also show how to leverage the GH200's unique architecture using multiple programming models, with practical examples attendees can replicate. Remote access to NVIDIA GH200 resources will be provided.
Website: https://www.jcahpc.jp/event/SCA_HPCAsia2026_Tutorial.html
Registration form: https://forms.gle/5H169bneuC1dFUNv9 (MANDATORY — deadline December 21st)
Thu, January 29, 2026 13:30 - 17:00
Contributors: Tyler Takeshita
Abstract: Real-world quantum systems are subject to external interactions, whether intentional or not. Efficient and accurate numerical simulation of open quantum systems (OQSs) therefore provides valuable insight into the fundamental quantum processes responsible for experimental observations. Accurate simulations of OQSs are a crucial tool in designing higher-performance quantum processors, enabling researchers to explore and optimize the vast parameter space of quantum hardware, uncover the fundamental physics behind higher-performance qubits, and design higher-fidelity control and readout methods. However, the computational demand of these simulations grows rapidly due to the exponentially increasing dimensionality of the Hilbert space and can require the use of high-performance computing (HPC) environments. In this tutorial, we demonstrate how to develop, test, and scale OQS simulations on AWS using CUDA-Q Dynamics and QuTiP, both accelerated by cuQuantum. The tutorial introduces the basics of open quantum dynamics and their computational considerations, independent of the underlying cloud architecture. During hands-on labs, we then architect cloud-native HPC solutions capable of leveraging accelerated compute resources, such as Amazon EC2 P6 instances powered by NVIDIA Blackwell GPUs. Participants will get free access to temporary AWS accounts so they can provision their own HPC cluster during the tutorial. All attendees leave with code examples they can use as a foundation for their own projects.
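The kind of dynamics these tools solve can be illustrated with a minimal, hand-rolled sketch (plain NumPy rather than CUDA-Q Dynamics or QuTiP, so it runs anywhere): a single qubit under amplitude damping at rate γ, whose excited-state population should decay as exp(-γt). All operator and parameter choices here are illustrative, not taken from the tutorial materials.

```python
import numpy as np

# Single-qubit operators (basis: index 0 = ground |0>, index 1 = excited |1>)
sm = np.array([[0, 1], [0, 0]], dtype=complex)   # lowering operator: sm|1> = |0>
H = np.zeros((2, 2), dtype=complex)              # trivial Hamiltonian for clarity
gamma = 0.5                                      # amplitude-damping rate

def lindblad_rhs(rho):
    """d(rho)/dt = -i[H, rho] + gamma * D[sm](rho), the Lindblad master equation."""
    comm = -1j * (H @ rho - rho @ H)
    diss = gamma * (sm @ rho @ sm.conj().T
                    - 0.5 * (sm.conj().T @ sm @ rho + rho @ sm.conj().T @ sm))
    return comm + diss

# Start in the excited state |1><1| and integrate with classic 4th-order Runge-Kutta
rho = np.array([[0, 0], [0, 1]], dtype=complex)
dt, steps = 0.01, 200
for _ in range(steps):
    k1 = lindblad_rhs(rho)
    k2 = lindblad_rhs(rho + 0.5 * dt * k1)
    k3 = lindblad_rhs(rho + 0.5 * dt * k2)
    k4 = lindblad_rhs(rho + dt * k3)
    rho = rho + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t = dt * steps
pop = rho[1, 1].real                    # excited-state population after time t
print(pop, np.exp(-gamma * t))          # numerical vs. analytic decay
```

The exponential blow-up the abstract mentions is visible here in miniature: the density matrix for n qubits is 2^n x 2^n, so each right-hand-side evaluation scales as the cube of that dimension, which is what pushes realistic OQS simulations onto GPUs and HPC clusters.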
Thu, January 29, 2026 13:30 - 17:00
Contributors: Julien Loiseau
Abstract: FleCSI is a C++ library that simplifies the development of portable, scalable scientific computing applications. It provides a distributed, task-parallel programming model that abstracts away the complexity of parallelism, data management, and execution across architectures. By managing inter-process communication and synchronization on behalf of the application, FleCSI protects developers from common pitfalls such as race conditions and deadlocks. It also coordinates data movement between CPUs and GPUs, ensuring that GPU kernels receive data in a form that is optimized for data-parallel computation. This tutorial will be the first public hands-on session for FleCSI. Participants will learn the core programming models that make FleCSI unique: the Control Model for defining control-flow logic, the Data Model for managing distributed fields and topologies, and the Execution Model for running parallel tasks. The session will emphasize building real-world HPC applications, including demonstrations of on-node parallelism, tasks, and specialization mechanisms. The tutorial includes live coding exercises that guide attendees in developing a scalable FleCSI application from a serial example. By the end of the session, participants will be equipped to create their own performance-portable HPC applications using FleCSI’s abstractions. FleCSI is actively used in research applications such as FleCSPH and HARD, showcasing its applicability in astrophysics, multiphysics simulations, and radiation hydrodynamics. Its open-source nature makes it an attractive choice for research teams building future-proof applications on emerging architectures.
Thu, January 29, 2026 13:30 - 17:00
Contributors: Krishan Gopal Gupta
Abstract: Large language models (LLMs) such as Llama, Qwen, and DeepSeek have revolutionized natural language processing, powering breakthroughs in chat interfaces, scientific reasoning, and generative AI. However, training, fine-tuning and deploying these models require substantial compute resources, high memory bandwidth, and scalable training strategies. As LLM workloads grow, researchers and developers increasingly look beyond conventional workstations and cloud platforms to HPC systems for efficient, cost-effective training.
This tutorial equips participants with best practices for scalable training and fine-tuning of LLMs across a range of infrastructure—from modest GPU workstations with NVMe SSDs to mid-scale HPC clusters and dense multi-GPU AI clusters. Participants will learn how to utilize frameworks such as DeepSpeed and PyTorch’s FSDP to perform memory-efficient and compute-optimized distributed training. Concepts such as ZeRO optimization, parameter offloading, mixed precision, and activation checkpointing will be demonstrated live on multi-node GPU systems.
Special attention will be given to real-world HPC environments, where constrained memory, resource scheduling (via SLURM), and I/O bottlenecks often challenge AI workflows. We will walk through containerized LLM training pipelines using Enroot and Apptainer, emphasizing reproducibility, portability, and software-hardware compatibility.
Use cases from biomedical text generation, scientific QA, and log summarization will illustrate the deployment of LLMs in HPC centers. The session will also include benchmarking methods and platform comparisons, helping participants assess trade-offs between model size, scaling efficiency, and hardware utilization.
Through a combination of hands-on labs and architectural walkthroughs, attendees will leave with a practical skillset to train, fine-tune, and optimize LLMs across a range of compute setups: workstations, HPC clusters, or dense multi-GPU AI clusters.
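The core idea behind ZeRO stage 1, partitioning optimizer state across data-parallel ranks so no GPU holds a full copy, can be sketched without DeepSpeed or GPUs at all. The following is a toy NumPy illustration (ranks are simulated as list entries, the final concatenation stands in for an all-gather, and the sizes and hyperparameters are invented for the example); it is not DeepSpeed's actual implementation.

```python
import numpy as np

n_params = 1_000_000          # toy "model" size
world_size = 4                # simulated data-parallel ranks
params = np.ones(n_params, dtype=np.float32)

# Replicated baseline: every rank keeps the full Adam state (moments m and v)
bytes_replicated_per_rank = 2 * params.nbytes

# ZeRO stage-1 idea: shard the optimizer state, one slice per rank
shards = np.array_split(np.arange(n_params), world_size)
m = [np.zeros(len(idx), dtype=np.float32) for idx in shards]  # 1st-moment shard
v = [np.zeros(len(idx), dtype=np.float32) for idx in shards]  # 2nd-moment shard
bytes_sharded_per_rank = m[0].nbytes + v[0].nbytes

# One Adam-style step (bias correction omitted for brevity): each rank updates
# only its slice, then the updated slices are gathered back into full params.
grad = np.full(n_params, 0.1, dtype=np.float32)
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
new_slices = []
for rank, idx in enumerate(shards):
    g = grad[idx]
    m[rank] = beta1 * m[rank] + (1 - beta1) * g
    v[rank] = beta2 * v[rank] + (1 - beta2) * g * g
    new_slices.append(params[idx] - lr * m[rank] / (np.sqrt(v[rank]) + eps))
params = np.concatenate(new_slices)   # stands in for the all-gather

print(f"replicated optimizer state per rank: {bytes_replicated_per_rank/1e6:.1f} MB")
print(f"sharded   optimizer state per rank: {bytes_sharded_per_rank/1e6:.1f} MB")
```

Per-rank optimizer memory drops by a factor of the world size, which is exactly the trade-off (memory saved versus communication added for the gather) that the tutorial's live demonstrations of ZeRO, offloading, and FSDP explore on real multi-node GPU systems.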
Thu, January 29, 2026 13:30 - 17:00
Contributors: Philip Fackler
Abstract: The Julia for performance-portable High-Performance Computing (HPC) via JACC tutorial offers attendees a hands-on opportunity to gain practical experience in leveraging Julia for efficient and parallel code development tailored to their HPC requirements. JACC is a Julia library that enables a single code to be easily parallelized across CPUs and GPUs (NVIDIA, AMD, Intel, Apple) and can interoperate with the Julia HPC ecosystem, in particular with the message passing interface (MPI). Julia has recently been adopted across several scientific codes for its productive ecosystem (scientific syntax, packaging, analysis, AI) and its performance via LLVM compilation, and JACC addresses the resulting need for vendor-neutral HPC capabilities. Similar tutorials have been offered at several venues, including SC24, ICPP25, and US Department of Energy laboratories and supercomputing facilities. During the proposed 3 hours we will cover basic aspects of the Julia language and provide exercises and a full application (Gray-Scott) to showcase the use of the JACC APIs with MPI and parallel I/O (via ADIOS) in a real scientific problem. The tutorial targets beginner- and intermediate-level scientists interested in writing parallel Julia code at minimal cost.
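For readers unfamiliar with the Gray-Scott model used as the tutorial's full application, one time step of this reaction-diffusion system can be sketched in a few lines. This is a plain NumPy illustration rather than the tutorial's Julia/JACC code, and the parameter values are common textbook choices, not necessarily those used in the session.

```python
import numpy as np

def laplacian(a):
    """5-point finite-difference Laplacian with periodic boundaries (unit spacing)."""
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

def gray_scott_step(U, V, Du=0.16, Dv=0.08, F=0.060, k=0.062, dt=1.0):
    """One explicit Euler step of the Gray-Scott reaction-diffusion equations."""
    UVV = U * V * V
    U_new = U + dt * (Du * laplacian(U) - UVV + F * (1 - U))
    V_new = V + dt * (Dv * laplacian(V) + UVV - (F + k) * V)
    return U_new, V_new

# Uniform chemical U with a small square of V seeded in the center of the grid
n = 64
U = np.ones((n, n))
V = np.zeros((n, n))
V[n//2-3:n//2+3, n//2-3:n//2+3] = 0.5

for _ in range(100):
    U, V = gray_scott_step(U, V)
```

The stencil update over a 2D grid is exactly the kind of data-parallel loop that maps naturally onto a JACC parallel-for across CPU threads or GPU kernels, with MPI handling halo exchange when the grid is distributed.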