To be announced soon :D
Pylog: A High-Level Programming and Synthesis Flow for FPGAs
· Enabling a high-level, algorithm-centric Python programming and synthesis flow for FPGAs.
· Developing optimization and design space exploration mechanisms.
· Developing language constructs to embody FPGA interconnection concepts.
Check the first author's site to learn more.
PyTorch-Direct: Introducing Deep Learning Framework with GPU-Centric Data Access for Faster Large GNN Training
· Incorporated unified memory access into the PyTorch library to improve GPU performance for graph neural networks. Around 2,000 lines of code (LOC).
· Proposed the unified tensor programming interface.
Check the first author's site to learn more.
Erudite: A Low-Latency, High-Capacity, and High-efficiency Prototype System for Computational Intelligence
· Exploring flash memory systems and software stack innovations to unblock the memory capacity and bandwidth bottleneck and eliminate software overhead for data-intensive workloads
· Developed a Linux kernel file system as part of a prototypical device file system for NVMe SSDs to reduce software overhead and enhance security
· Worked on a concurrent-access cache simulator for a foundational non-volatile memory emulation system
Software Engineering Intern @ Google
May 2023 -- Aug. 2023
Manager and Mentor: Dr. Aart Bik
· Working on the GPU backend for sparse tensors in the MLIR compiler
Applied Scientist Intern @ Amazon Web Services (AWS)
AWS Graph Machine Learning (AGML) Team, May 2022 -- Aug. 2022
Manager and Mentor: Dr. Da Zheng and Dr. Xiang Song
· Proposed an IR-based code generator that systematically bridges the gap between the programming interface and kernel APIs, and decouples models, data layout, and kernel-specific optimization from each other
· Achieved up to 7.8× speed-up in inference and 5.6× speed-up in training compared with the public state-of-the-art system on RGCN, RGAT, and HGT
· Proposed one inter-operator optimization and two intermediate data layout options that further accelerate the system, delivering up to 2.2× speed-up
Systems & Software Research Associate (Intern) @ HPE Labs
Systems Architecture & Management Lab, May 2021 -- Aug. 2021
Manager and Mentor: Dr. Dejan Milojicic and Dr. Sai Rahul Chalamalasetti
· Investigated opportunities to accelerate operators from the Intel DAOS distributed storage system.
· Findings accepted to the proceedings of the internal conference HPE TechCon 2022.
· Pending U.S. patent.
Compiler Software Engineer Intern -- GPU @ Nvidia
Optimizing Code Generator (OCG) Team, May 2020 -- Aug. 2020
Manager and Mentors: Jerry Zheng, Howard Chen, James Player
· Worked on a prototypical LLVM backend compiler.
· Designed and developed an extensible vectorization pass.
· Designed and developed a Machine IR peephole optimization driver.
Software Engineer Intern @ MSR Asia
Software Analytics Group, Jan. 2019 -- July 2019
Manager and Mentor: Qingwei Lin, Bo Qiao
· Developed a general pipeline for anomaly detection algorithms.
· Refactored existing logic to incorporate it into this new pipeline.
· Efficiently parallelized an anomaly detection algorithm in Java.