Streaming Tensor Program: A Streaming Abstraction for Dynamic Parallelism
Authors
Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, and Kunle Olukotun
Abstract
Dynamic behaviors are becoming prevalent in tensor applications such as machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be implemented statically, left unoptimized, or both. To address these challenges, we present Streaming Tensor Programs (STeP), a streaming abstraction that enables dynamic tensor workloads to run efficiently on SDAs. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations, such as dynamic tiling, dynamic parallelization, and configuration time-multiplexing, that adapt SDA execution to dynamic behaviors while preserving dataflow efficiency. In evaluations using a cycle-approximate simulator on representative LLM layers and a full model with real-world traces, STeP enables dynamic tiling that breaks the Pareto-optimal frontier from prior work, dynamic parallelization that improves latency by ~2.72x, and configuration time-multiplexing that increases compute utilization by ~2.64x over prior SDA abstractions and their implementations.
BibTeX
@article{sohn2026streaming,
title={Streaming Tensor Program: A streaming abstraction for dynamic parallelism},
author={Sohn, Gina and Zhang, Genghan and Hossfeld, Konstantin and Kim, Jungwoo and Sobotka, Nathan and Zhang, Nathan and Hsu, Olivia and Olukotun, Kunle},
journal={Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
year={2026}
}