Accepted Papers
We are pleased to announce the accepted papers. Stay tuned for the
full program with schedule and room details.
NOTE: Paper titles are subject to change before the camera-ready
submission.
- Toward unprivileged, portable and generic network topology
  discovery
- Pepin, Jaeger, Mercier, Goglin
- Integrating Quantum Software Tools with(in) MLIR
- Hopf, Stade, Rovara, Burgholzer, Quetschlich, Florea, Lopez,
  Izaac, Wille
- ChatMPI: LLM-Driven MPI Code Generation for HPC Workloads
- Valero-Lara, Young, Naughton III, Engelmann, Geist, Vetter,
  Teranishi, Godoy
- Optimization of a GEMM Implementation using Intel AMX
- Endo, Ohshima, Nanri
- Modeling the Potential of Message-Free Communication via CXL.mem
- Vanecek, Turner, Gajbe, Wolf, Schulz
- Exploring User Heterogeneity-Aware Differentiated Token Pricing
  for On-Premises Large Language Models
- Peng, He, Lv, Wu, Yu, Shi, Zhai, Sheng, Wang, Wei
- Towards Unified Acceleration: Weight-Stationary GEMM on
  HPC-oriented Elastic CGRAs
- Shi, Adhi, Teng, Liu, Miwa, Sano
- High-performance in-situ ML Inference with dalotia: A
  Lightweight Tensor Loader API for Science Codes
- Pollinger, Domke
- ROIX-Comp: Optimizing X-ray Computed Tomography Imaging Strategy
  for Data Reduction and Reconstruction
- Singh, Sato, Yoshida, Uesugi, Joti, Hatsui, Rubio Proaño
- ClinTwin PINN Real Time Patient Specific Cardiopulmonary Digital
  Twin via Meshless Physics Informed Neural Fields on
  Heterogeneous HPC
- Maulana, Pratiwi
- PRISM: Profiling-Free Symbolic Memory-Driven Strategy Planner
  for Large DNN Model Training
- Wang, Fang, Li, Tachon, Appuswamy
- TRIOS: Reducing File-System Contention through Predictive
  Time-Resolved I/O Simulation in Job Scheduling
- Tseng, Kawai, Takahashi, Takizawa
- The X Quantum Software Stack: Connecting End Users, Integrating
  Diverse Quantum Technologies, Accelerating HPC
- Burgholzer, Echavarria, Hopf, Stade, Rovara, Schmid, Kaya, Mete,
  Farooqi, Chung, De Pascale, Schulz, Schulz, Wille
- Guaranteed DGEMM Accuracy While Using Reduced Precision Tensor
  Cores Through Extensions of the Ozaki Scheme
- Schwarz, Anders, Brower, Bayraktar, Gunnels, Clark, G. Xu,
  Rodriguez, Cayrols, Tabaszewski, Podlozhnyuk
- Performance analysis of Arm-based processors across multiple
  compilers for HPC workloads
- Kobayashi, Ando, Yamaura, Inoue, Murai
- Rankmap optimization for large scale HPC applications with
  simulated annealing based on MPI trace information
- Kuroda, Nakamura, Ando, Murai, Kato
- Improved Implementation of Number Theoretic Transform on NVIDIA
  GPU with Tensor Cores
- Sugizaki, Takahashi
- EmuPlat: A Framework-Agnostic Platform for Quantum Hardware
  Emulation with Validated Transpiler-to-Pulse Pipeline
- Ye, Khoo
- Optimizing Intra-Layer Parallel Communication for LLM Training
  on Systems with Fully-Connected Mesh GPU Topology
- Hosoki, Sato, Endo, Bigot, Audit
- QPU Micro-Kernels for Stencil Computation
- Markidis, Netzer, Pennati, Peng
- Scalable QRAM with Superposition-Based Data Loading for
  Noise-Resilient Quantum Machine Learning on NISQ Devices
- Sajadimanesh, Atoofian
- Tensor-Core-Optimized Strategies for BLR × Tall-Skinny Matrix
  Multiplication in BEM
- IDA, Goto, Yokota, Hiraishi, Hanawa, Iwashita, Kawai, Ohshima,
  Hoshino
- Enhancing Stability and Optimizing Implementation of
  Mixed-Precision Block $\epsilon$-Circulant Preconditioned
  Solvers for Parallelization-in-Time
- Yoda, Bolten
- GPU Partitioning, Power, and Performance of the AMD MI300A
- Abouelmagd, Boehme, Brink, Burmark, McKinsey, Skjellum, Pearce
- Mixed-precision Interpolative Decomposition on GPUs
  [Best Paper Finalist]
- Ma, Imamura
- Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial
  Resistance Prediction using an Explainable Lightweight 1D
  CNN - XGBoost Ensemble
- Siddiqui, Tarannum
- Beyond Exascale: Dataflow Domain Translation on a Cerebras
  Cluster [Best Paper Finalist]
- Oppelstrup, Giamblanco, Kalchev, Sharapov, Taylor, Van
  Essendelft, Rajamanickam, James
- Cloud-Hardware Co-Design for Memory Bandwidth-Bound HPC
  Workloads: Performance and Characteristics of Azure HBv5 Virtual
  Machines
- Rastegari, Kovouri, Cui, Naz, Fleischman, Gupta, Harwani, Loh,
  Greenseid, Burness, Ram, Ringenburg
- A Multi-ROI Camera Motion Exploration Approach for Enhancing
  Image-based Smart In-Situ Visualization
- Matsushima, Adachi, Sakamoto, Nonaka
- A Matrix-Free Algebraic hp-Multigrid Method for Computational
  Fluid Dynamics Applications
  [Best Paper Finalist]
- Ohm, Harper, Jansson
- Scalable eVTOL Aerodynamics Simulations on Heterogeneous HPC
  Platforms with Minimal-Invasive GPU Porting
- Ohm, Takii, Ando, Bale, Tsubokura
- Deterministic Quantum Search for Index Retrieval: Algorithm
  Design and Implementation
- Mishra, Balasubramanyam, Raghava
- GCAMPS: A Scalable Classical Simulator for Qudit Systems
- Harper, Nakhl, Quella, Sevior, Usman
- Task-decomposed Overlapped Preconditioner for Sustained Strong
  Scalability on Accelerated Exascale Systems
- Jansson, Karp, Páll, Markidis, Schlatter
- What Will the Grace Hopper-Powered Jupiter Supercomputer Bring
  for Sparse Linear Algebra?
- Tsai, Bode, Anzt
- Revisiting Communication Software Offloading for MPI+Threads:
  Reducing Contention and Improving Overlap on Many-Core Systems
- Breiter, Chung, Fürlinger, Weidendorfer, Kranzlmüller
- Deep Learning-Integrated Pairwise-Qubit Subsystems for Highly
  Efficient Quantum Circuit Simulation
- Pradata, Amrizal, Suryanto, Nugraha, Takizawa