Authors

Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, and Kunle Olukotun

Abstract

Dynamic behaviors are becoming prevalent in tensor applications such as machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be implemented statically, left unoptimized, or both. To address these challenges, we present Streaming Tensor Programs (STeP), a streaming abstraction that enables dynamic tensor workloads to run efficiently on SDAs. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations, such as dynamic tiling, dynamic parallelization, and configuration time-multiplexing, that adapt SDA execution to dynamic behaviors while preserving dataflow efficiency. In evaluations with a cycle-approximate simulator on representative LLM layers and a full model with real-world traces, STeP enables dynamic tiling that breaks the Pareto-optimal frontier of prior work, dynamic parallelization that improves latency by about 2.72×, and configuration time-multiplexing that increases compute utilization by about 2.64× over prior SDA abstractions and their implementations.

BibTeX

@inproceedings{sohn2026streaming,
  title={{Streaming Tensor Program: A Streaming Abstraction for Dynamic Parallelism}},
  author={Sohn, Gina and Zhang, Genghan and Hossfeld, Konstantin and Kim, Jungwoo and Sobotka, Nathan and Zhang, Nathan and Hsu, Olivia and Olukotun, Kunle},
  booktitle={Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
  year={2026}
}