Kun Wu


227 Coordinated Science Laboratory

1308 W Main St

Urbana, IL 61801

Kun Wu (吴昆) is currently a fifth-year Ph.D. student at UIUC, advised by Prof. Wen-mei Hwu. His research interests lie in compilers and libraries for graphics processing units and in parallel computer architecture. He has contributed to several impactful projects, including Hector, PyTorch-Direct, and PyLog. He also contributed to the MLIR sparse tensor dialect during his internship at the Google MLIR Sparsifier Team (Check commits here!).

Kun received his bachelor’s degree in Electronic Engineering from Tsinghua University. Before graduating, he published a first-authored paper at the Design Automation Conference under the supervision of Prof. Yu Wang and Prof. Yuan Xie.

Wanna talk with me? Feel free to send me an email :D My availability is listed on my published Outlook calendar. My resume is available upon request.

News

Nov 30, 2022 We filed a patent application with the USPTO earlier this month for the idea conceived during my HPE internship in Summer 2021!
Mar 1, 2022 Techniques proposed in our PyTorch-Direct project are now available in DGL v0.8! Please check the CUDA UVA-based optimization in the release notes.
Aug 2, 2021 Findings in our PyTorch-Direct project led by David (Seungwon) Min have been merged into the DGL master branch! (PR #3086, #3184, #3194)
Apr 12, 2021 David (Seungwon) Min and I gave a talk at GTC 2021 on our PyTorch-Direct project!

Select Publications

  1. TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading
    Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, and Wen-mei Hwu
    arXiv preprint 2024
  2. Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme
    Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott Mahlke, and Wen-mei Hwu
    2024
  3. Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
    Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, and Wen-mei Hwu
    Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 2024
  4. Graph Neural Network Training with Data Tiering
    Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, and Wen-mei Hwu
    SIGKDD International Conference on Knowledge Discovery and Data Mining 2022
  5. SaintSN: Streamlined and Intelligent Storage Node System-on-a-Chip for Exascale Cluster
    Acceptance rate: 17.6%. Pending US patent.
    Kun Wu*, Dario Korolija*, Wen-mei Hwu, Gustavo Alonso, Sai Rahul Chalamalasetti, Dejan Milojicic, and Lance Evans
    Proceedings of the Hewlett Packard Enterprise Technical Conference 2022
  6. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow
    Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-Mei Hwu
    IEEE Transactions on Computers 2021
  7. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture
    Upstreamed to DGL.
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    Proceedings of the VLDB Endowment 2021
  8. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes
    Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-Mei Hwu
    Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing 2021
  9. Memory-Bound Proof-of-Work Acceleration for Blockchain Applications
    Kun Wu, Guohao Dai, Xing Hu, Shuangchen Li, Xinfeng Xie, Yu Wang, and Yuan Xie
    Proceedings of the 56th Annual Design Automation Conference 2019