Publications

(*) denotes equal contribution

2024

  1. Learning to (Learn at Test Time): RNNs with Expressive Hidden States
    Yu Sun*, Xinhao Li*, Karan Dalal*, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, and Carlos Guestrin
    2024
  2. MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
    Tianyu Fu*, Haofeng Huang*, Xuefei Ning*, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, and Yu Wang
    2024
  3. COLM
    CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
    Je-Yong Lee*, Donghyun Lee*, Genghan Zhang, Mo Tiwari, and Azalia Mirhoseini
    2024
  4. PLDI
    Compilation of Modular and General Sparse Workspaces
    Genghan Zhang, Olivia Hsu, and Fredrik Kjolstad
    Proceedings of the ACM on Programming Languages 2024
  5. GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
    Zhongming Yu, Genghan Zhang, Hanxian Huang, Xin Chen, and Jishen Zhao
    arXiv preprint 2024

2023

  1. CCF THPC
    Sgap: towards efficient sparse tensor algebra compilation for GPU
    Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, and Yu Wang
    CCF Transactions on High Performance Computing 2023
  2. MLSys
    HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs
    Proceedings of Machine Learning and Systems 2023