Kun Wu


2788 San Tomas Expy

Santa Clara, CA 95051

Kun Wu (吴昆) is currently a backend compiler engineer at NVIDIA. He received his Ph.D. from UIUC, advised by Prof. Wen-mei Hwu. He has contributed to several impactful projects, including Hector, PyTorch-Direct, and PyLog. He also contributed to the MLIR sparse tensor dialect during his internship on the Google MLIR Sparsifier Team (check the commits here!).

During his undergraduate studies, Kun was advised by Prof. Yu Wang and Prof. Yuan Xie.

Want to talk? Feel free to send me an email :D My availability can be checked on my published Outlook calendar. My resume is available upon request.

News

Nov 30, 2022 We filed a patent application with the USPTO earlier this month for the idea conceived during the HPE internship in Summer 2021!
Mar 1, 2022 Techniques proposed in our PyTorch-Direct project are now available in DGL v0.8! Please check the CUDA UVA-based optimization in the release notes.
Aug 2, 2021 Findings from our PyTorch-Direct project, led by David (Seungwon) Min, have been merged into the DGL master branch! (PRs #3086, #3184, #3194)
Apr 12, 2021 David (Seungwon) Min and I gave a talk at GTC 2021 on our PyTorch-Direct project!

Select Publications

  1. Code Generation and Runtime Techniques for Enabling Data Efficient Deep Learning Training on GPUs
    Kun Wu
    Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 2024
  2. SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training
    Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, and Wen-mei Hwu
    To appear in Design Automation Conference, 2025
  3. Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme
    Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott Mahlke, and Wen-mei Hwu
    2024
  4. Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
    Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, and Wen-mei Hwu
    Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2024
  5. Graph Neural Network Training with Data Tiering
    Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, and Wen-mei Hwu
    SIGKDD International Conference on Knowledge Discovery and Data Mining, 2022
  6. SaintSN: Streamlined and Intelligent Storage Node System-on-a-Chip for Exascale Cluster
    Acceptance rate: 17.6%. Pending US patent.
    Kun Wu*, Dario Korolija*, Wen-mei Hwu, Gustavo Alonso, Sai Rahul Chalamalasetti, Dejan Milojicic, and Lance Evans
    Proceedings of the Hewlett Packard Enterprise Technical Conference, 2022
  7. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow
    Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-mei Hwu
    IEEE Transactions on Computers, 2021
  8. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture
    Upstreamed to DGL.
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    Proceedings of the VLDB Endowment, 2021
  9. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes
    Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-mei Hwu
    Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021
  10. Memory-Bound Proof-of-Work Acceleration for Blockchain Applications
    Kun Wu, Guohao Dai, Xing Hu, Shuangchen Li, Xinfeng Xie, Yu Wang, and Yuan Xie
    Proceedings of the 56th Annual Design Automation Conference, 2019