Kun Wu


227 Coordinated Science Laboratory

1308 W Main St

Urbana, IL 61801

Kun is currently a fourth-year Ph.D. student at UIUC, advised by Prof. Wen-mei Hwu. His research interests lie in compilers and libraries for graphics processing units and in parallel computer architecture. He has contributed to several impactful projects, including PyTorch-Direct and PyLog.

Kun received his bachelor’s degree in Electronic Engineering from Tsinghua University. Before that, he published a first-authored paper at the Design Automation Conference under the supervision of Prof. Yuan Xie and Prof. Yu Wang.

Want to talk with me? Feel free to send me an email :D You can check my availability on my published Outlook calendar.


Nov 30, 2022 We filed a patent application with the USPTO earlier this month for the idea conceived during my HPE internship in Summer 2021!
Mar 1, 2022 Techniques proposed in our PyTorch-Direct project are now available in DGL v0.8! Please check the CUDA UVA-based optimization in the release notes.
Aug 2, 2021 Findings from our PyTorch-Direct project, led by David (Seungwon) Min, have been merged into the DGL master branch! (PR #3086, #3184, #3194)
Apr 12, 2021 David (Seungwon) Min and I gave a talk at GTC 2021 on our PyTorch-Direct project!

Selected Publications

  1. PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks
    Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, and Wen-mei Hwu
    arXiv preprint 2023
  2. Graph Neural Network Training with Data Tiering
    Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, and Wen-mei Hwu
    SIGKDD International Conference on Knowledge Discovery and Data Mining 2022
  3. SaintSN: Streamlined and Intelligent Storage Node System-on-a-Chip for Exascale Cluster
    Acceptance rate: 17.6%. Pending US patent.
    Kun Wu*, Dario Korolija*, Wen-mei Hwu, Gustavo Alonso, Sai Rahul Chalamalasetti, Dejan Milojicic, and Lance Evans
    Proceedings of the Hewlett Packard Enterprise Technical Conference 2022
  4. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow
    Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-Mei Hwu
    IEEE Transactions on Computers 2021
  5. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture
    Upstreamed to DGL.
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    Proceedings of the VLDB Endowment 2021
  6. PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    arXiv preprint 2021
  7. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes
    Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-Mei Hwu
    Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing 2021
  8. Memory-Bound Proof-of-Work Acceleration for Blockchain Applications
    Kun Wu, Guohao Dai, Xing Hu, Shuangchen Li, Xinfeng Xie, Yu Wang, and Yuan Xie
    Proceedings of the 56th Annual Design Automation Conference 2019