* marks equal contribution.


  1. PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks
    Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, and Wen-mei Hwu
    arXiv preprint 2023


  1. Graph Neural Network Training with Data Tiering
    Seung Won Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, and Wen-mei Hwu
    SIGKDD International Conference on Knowledge Discovery and Data Mining 2022
  2. SaintSN: Streamlined and Intelligent Storage Node System-on-a-Chip for Exascale Cluster
    Acceptance rate: 17.6%. Pending US patent.
    Kun Wu*, Dario Korolija*, Wen-mei Hwu, Gustavo Alonso, Sai Rahul Chalamalasetti, Dejan Milojicic, and Lance Evans
    Proceedings of the Hewlett Packard Enterprise Technical Conference 2022


  1. PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow
    Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, and Wen-mei Hwu
    IEEE Transactions on Computers 2021
  2. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture
    Upstreamed to DGL.
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    Proceedings of the VLDB Endowment 2021
  3. PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
    Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu
    arXiv preprint 2021
  4. A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems (Invited Paper)
    Sitao Huang, Kun Wu, Sai Rahul Chalamalasetti, Izzat El Hajj, Cong Xu, Paolo Faraboschi, and Deming Chen
    IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) 2021
  5. TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-Aware Datatypes
    Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, and Wen-mei Hwu
    Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing 2021


  1. Memory-Bound Proof-of-Work Acceleration for Blockchain Applications
    Kun Wu, Guohao Dai, Xing Hu, Shuangchen Li, Xinfeng Xie, Yu Wang, and Yuan Xie
    Proceedings of the 56th Annual Design Automation Conference 2019