Publications

Generated by jekyll-scholar. * marks equal contribution.

2025

  1. SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training
    Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, and Wen-mei Hwu
    To Appear in Design Automation Conference 2025 2025

2024

  1. Code Generation and Runtime Techniques for Enabling Data Efficient Deep Learning Training on GPUs
    Kun Wu
    Ph.D. Dissertation University of Illinois at Urbana-Champaign 2024
  2. Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme
    Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott Mahlke, and Wen-mei Hwu
    2024
  3. Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
    Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, and Wen-mei Hwu
    Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 2024

2022

  1. Graph Neural Network Training with Data Tiering
    Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, and Wen-mei Hwu
    SIGKDD International Conference on Knowledge Discovery and Data Mining 2022
  2. SaintSN: Streamlined and Intelligent Storage Node System-on-a-Chip for Exascale Cluster
    Acceptance rate: 17.6%. Pending US patent.
    Kun Wu*, Dario Korolija*, Wen-mei Hwu, Gustavo Alonso, Sai Rahul Chalamalasetti, Dejan Milojicic, and Lance Evans
    Proceedings of the Hewlett Packard Enterprise Technical Conference 2022

2021

  1. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow
    Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-mei Hwu
    IEEE Transactions on Computers 2021
  2. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture
    Upstreamed to DGL.
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    Proceedings of the VLDB Endowment 2021
  3. PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    arXiv preprint 2021
  4. A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems (Invited Paper)
    Sitao Huang, Kun Wu, Sai Rahul Chalamalasetti, Izzat El Hajj, Cong Xu, Paolo Faraboschi, and Deming Chen
    2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) 2021
  5. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes
    Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-mei Hwu
    Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing 2021

2019

  1. Memory-Bound Proof-of-Work Acceleration for Blockchain Applications
    Kun Wu, Guohao Dai, Xing Hu, Shuangchen Li, Xinfeng Xie, Yu Wang, and Yuan Xie
    Proceedings of the 56th Annual Design Automation Conference 2019