Accepted Papers

We are pleased to announce the accepted papers. Please refer to the full program for the schedule and room details.
NOTE: Paper titles are subject to change before the camera-ready submission.

Toward unprivileged, portable and generic network topology discovery: Pepin, Jaeger, Mercier, Goglin
Integrating Quantum Software Tools with(in) MLIR: Hopf, Stade, Rovara, Burgholzer, Quetschlich, Florea, Lopez, Izaac, Wille
ChatMPI: LLM-Driven MPI Code Generation for HPC Workloads: Valero-Lara, Young, Naughton III, Engelmann, Geist, Vetter, Teranishi, Godoy
Optimization of a GEMM Implementation using Intel AMX: Endo, Ohshima, Nanri
Modeling the Potential of Message-Free Communication via CXL.mem: Vanecek, Turner, Gajbe, Wolf, Schulz
Exploring User Heterogeneity-Aware Differentiated Token Pricing for On-Premises Large Language Models: Peng, He, Lv, Wu, Yu, Shi, Zhai, Sheng, Wang, Wei
Towards Unified Acceleration: Weight-Stationary GEMM on HPC-oriented Elastic CGRAs: Shi, Adhi, Teng, Liu, Miwa, Sano
High-performance in-situ ML Inference with dalotia: A Lightweight Tensor Loader API for Science Codes: Pollinger, Domke
ROIX-Comp: Optimizing X-ray Computed Tomography Imaging Strategy for Data Reduction and Reconstruction: Singh, Sato, Yoshida, Uesugi, Joti, Hatsui, Rubio Proaño
ClinTwin PINN Real Time Patient Specific Cardiopulmonary Digital Twin via Meshless Physics Informed Neural Fields on Heterogeneous HPC: Maulana, Pratiwi
PRISM: Profiling-Free Symbolic Memory-Driven Strategy Planner for Large DNN Model Training: Wang, Fang, Li, Tachon, Appuswamy
TRIOS: Reducing File-System Contention through Predictive Time-Resolved I/O Simulation in Job Scheduling: Tseng, Kawai, Takahashi, Takizawa
The X Quantum Software Stack: Connecting End Users, Integrating Diverse Quantum Technologies, Accelerating HPC: Burgholzer, Echavarria, Hopf, Stade, Rovara, Schmid, Kaya, Mete, Farooqi, Chung, De Pascale, Schulz, Schulz, Wille
Guaranteed DGEMM Accuracy While Using Reduced Precision Tensor Cores Through Extensions of the Ozaki Scheme: Schwarz, Anders, Brower, Bayraktar, Gunnels, Clark, G. Xu, Rodriguez, Cayrols, Tabaszewski, Podlozhnyuk
Performance analysis of Arm-based processors across multiple compilers for HPC workloads: Kobayashi, Ando, Yamaura, Inoue, Murai
Rankmap optimization for large scale HPC applications with simulated annealing based on MPI trace information: Kuroda, Nakamura, Ando, Murai, Kato
Improved Implementation of Number Theoretic Transform on NVIDIA GPU with Tensor Cores: Sugizaki, Takahashi
EmuPlat: A Framework-Agnostic Platform for Quantum Hardware Emulation with Validated Transpiler-to-Pulse Pipeline: Ye, Khoo
Optimizing Intra-Layer Parallel Communication for LLM Training on Systems with Fully-Connected Mesh GPU Topology: Hosoki, Sato, Endo, Bigot, Audit
QPU Micro-Kernels for Stencil Computation: Markidis, Netzer, Pennati, Peng
Scalable QRAM with Superposition-Based Data Loading for Noise-Resilient Quantum Machine Learning on NISQ Devices: Sajadimanesh, Atoofian
Tensor-Core-Optimized Strategies for BLR × Tall-Skinny Matrix Multiplication in BEM: IDA, Goto, Yokota, Hiraishi, Hanawa, Iwashita, Kawai, Ohshima, Hoshino
Enhancing Stability and Optimizing Implementation of Mixed-Precision Block $\epsilon$-Circulant Preconditioned Solvers for Parallelization-in-Time: Yoda, Bolten
GPU Partitioning, Power, and Performance of the AMD MI300A: Abouelmagd, Boehme, Brink, Burmark, McKinsey, Skjellum, Pearce
Mixed-precision Interpolative Decomposition on GPUs [Best Paper Finalist]: Ma, Imamura
Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN - XGBoost Ensemble: Siddiqui, Tarannum
Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster [Best Paper Finalist]: Oppelstrup, Giamblanco, Kalchev, Sharapov, Taylor, Van Essendelft, Rajamanickam, James
Cloud-Hardware Co-Design for Memory Bandwidth-Bound HPC Workloads: Performance and Characteristics of Azure HBv5 Virtual Machines: Rastegari, Kovouri, Cui, Naz, Fleischman, Gupta, Harwani, Loh, Greenseid, Burness, Ram, Ringenburg
A Multi-ROI Camera Motion Exploration Approach for Enhancing Image-based Smart In-Situ Visualization: Matsushima, Adachi, Sakamoto, Nonaka
A Matrix-Free Algebraic hp-Multigrid Method for Computational Fluid Dynamics Applications [Best Paper Finalist]: Ohm, Harper, Jansson
Scalable eVTOL Aerodynamics Simulations on Heterogeneous HPC Platforms with Minimal-Invasive GPU Porting: Ohm, Takii, Ando, Bale, Tsubokura
Deterministic Quantum Search for Index Retrieval: Algorithm Design and Implementation: Mishra, Balasubramanyam, Raghava
GCAMPS: A Scalable Classical Simulator for Qudit Systems: Harper, Nakhl, Quella, Sevior, Usman
Task-decomposed Overlapped Preconditioner for Sustained Strong Scalability on Accelerated Exascale Systems: Jansson, Karp, Páll, Markidis, Schlatter
What Will the Grace Hopper-Powered Jupiter Supercomputer Bring for Sparse Linear Algebra?: Tsai, Bode, Anzt
Revisiting Communication Software Offloading for MPI+Threads: Reducing Contention and Improving Overlap on Many-Core Systems: Breiter, Chung, Fürlinger, Weidendorfer, Kranzlmüller
Deep Learning-Integrated Pairwise-Qubit Subsystems for Highly Efficient Quantum Circuit Simulation: Pradata, Amrizal, Suryanto, Nugraha, Takizawa