About Me
I am an Electrical and Computer Engineering Ph.D. candidate at Princeton University, doing research in Computer Architecture.
I am a member of the Princeton Parallel Group, advised by Prof. David Wentzlaff.
Before joining Princeton, I spent three years at Nvidia, working on architecture exploration for future GPU designs with a focus on deep learning (DL) workloads, as well as full-chip GPU power analysis. I graduated from Washington University in St. Louis with a B.S. in Electrical Engineering in 2018.
Research
I am broadly interested in hardware/software co-design and system-level performance optimization for emerging workloads. My main research direction is scheduling.
My past and current work includes:
LLM Scheduling
- Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
In Submission
In LLM training, fusing Dropout into Flash-Attention causes hardware resource contention and GPU underutilization. We propose overlapping RNG (the dominant cost of Dropout) with GEMM: the two have complementary resource utilization and no data dependency, so overlapping them improves training efficiency.
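The "no data dependency" point can be seen in a simplified sketch (hypothetical names and parameters, not our actual kernels): the Dropout mask is a pure function of an RNG seed and element offset, never of the GEMM inputs or outputs, which is what makes the RNG work free to run alongside the matmul.

```python
import random

def dropout_mask(seed: int, offset: int, n: int, p: float):
    """Counter-style mask generation: reproducible from (seed, offset) alone,
    so it can be computed before the GEMM result even exists."""
    rng = random.Random(seed * 1_000_003 + offset)  # simple mixing, illustrative
    return [rng.random() >= p for _ in range(n)]

def apply_dropout(values, mask, p: float):
    """Scale kept elements by 1/(1-p), zero the dropped ones."""
    return [v / (1 - p) if keep else 0.0 for v, keep in zip(values, mask)]

# The mask for any tile depends only on (seed, offset), so it can be
# generated concurrently with the attention GEMM for that tile.
mask = dropout_mask(seed=42, offset=0, n=8, p=0.1)
```

In a real kernel the counter-based RNG would run on otherwise-idle units while tensor cores execute the GEMM; the sketch only shows the independence that makes that legal.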
- Data-aware MoE Scheduling: Predicting and Balancing Expert Distribution for LLM Serving
Work in Progress
With a uniform expert distribution, Expert Parallelism ideally outperforms Tensor Parallelism. Real workloads, however, have heavily skewed expert distributions, which lead to load imbalance and pipeline bubbles. We predict and redistribute experts to achieve near-optimal performance with Expert Parallelism for LLM serving.
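As a rough illustration of the rebalancing step (not our actual system; the loads and GPU counts below are hypothetical), one baseline is greedy longest-processing-time placement: given predicted per-expert token counts, assign the heaviest experts first to the currently least-loaded GPU.

```python
import heapq

def balance_experts(load_per_expert, num_gpus):
    """Greedy LPT placement: heaviest experts go to the least-loaded GPU.
    Returns a mapping from GPU id to the experts placed on it."""
    heap = [(0, g, []) for g in range(num_gpus)]  # (load, gpu_id, experts)
    heapq.heapify(heap)
    for expert, load in sorted(load_per_expert.items(), key=lambda kv: -kv[1]):
        gpu_load, gpu_id, experts = heapq.heappop(heap)
        experts.append(expert)
        heapq.heappush(heap, (gpu_load + load, gpu_id, experts))
    return {gpu_id: experts for _, gpu_id, experts in heap}

# A skewed distribution: expert 0 is "hot" and dominates the traffic.
loads = {0: 900, 1: 100, 2: 80, 3: 60, 4: 50, 5: 40, 6: 30, 7: 20}
placement = balance_experts(loads, num_gpus=2)
```

With this skew the hot expert ends up alone on one GPU while the other seven share the second, but the GPUs still carry 900 vs. 380 tokens, showing why prediction-driven replication or finer-grained redistribution is needed beyond simple packing.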
- A Fine-Grained Theoretical Model for LLM Scheduler Exploration: Fusion and Overlapping
Work in Progress
A fast, first-principles model based on GPU resource utilization that analyzes the performance of different LLM scheduling strategies, including kernel fusion and overlapping.
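A minimal sketch of the kind of first-principles estimate such a model builds on (the peak rates and kernel sizes below are hypothetical, not measured): each kernel is bounded by either compute throughput or memory bandwidth, and scheduling decisions change how those bounds combine.

```python
def kernel_time(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline-style lower bound: a kernel can finish no faster than its
    compute time or its memory-traffic time, whichever is larger."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Hypothetical GPU: 100 TFLOP/s compute, 2 TB/s memory bandwidth.
PEAK_FLOPS, PEAK_BW = 100e12, 2e12

gemm = kernel_time(flops=2e12, bytes_moved=1e9,
                   peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW)   # compute-bound
elemwise = kernel_time(flops=1e9, bytes_moved=4e10,
                       peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW)  # memory-bound

sequential = gemm + elemwise      # kernels launched back-to-back
overlapped = max(gemm, elemwise)  # complementary resources fully overlapped
```

Because one kernel is compute-bound and the other memory-bound, the overlapped estimate halves the sequential time in this toy case; the model's job is to make such comparisons at fine granularity across whole schedules.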
Data-Aware Instruction-Level Scheduling
- A Value-Aware, Dynamic Hardware Scheduler for Energy Reduction
In Submission
A hardware scheduler added to the issue stage of the pipeline that predicts operand value locality and issues instructions with similar operand values consecutively, reducing switching activity and dynamic energy consumption.
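The intuition can be sketched in software (a hypothetical illustration, not the hardware design): reorder a window of ready instructions so that ones with similar operand bit patterns issue back-to-back, which reduces bit flips on operand buses and functional-unit inputs.

```python
def bit_flips(a: int, b: int) -> int:
    """Hamming distance between two operand values = bits that toggle."""
    return bin(a ^ b).count("1")

def schedule_by_value_locality(window):
    """Greedy nearest-neighbor ordering over (name, operand) pairs:
    always issue next the instruction whose operand differs least
    from the previously issued one."""
    remaining = list(window)
    order = [remaining.pop(0)]
    while remaining:
        prev = order[-1][1]
        nxt = min(remaining, key=lambda ins: bit_flips(prev, ins[1]))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def total_flips(order):
    return sum(bit_flips(a[1], b[1]) for a, b in zip(order, order[1:]))

# Program order alternates between two value clusters; reordering groups them.
window = [("i0", 0x00FF), ("i1", 0xFFFF), ("i2", 0x00FE), ("i3", 0xFFFE)]
baseline = total_flips(window)
scheduled = schedule_by_value_locality(window)
```

In this toy window the reordered schedule cuts total operand-bus bit flips from 25 to 10; the real scheduler must do this dynamically, with predicted rather than known operand values and without violating dependences.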
- DVProf: Profiling Dynamic Value Locality Between Instructions
(Poster)
2024 IEEE International Symposium on Workload Characterization Posters, Vancouver, Canada
- Exploiting Data Commonality in Value Prediction
(Workshop paper)
The Fifth Young Architect Workshop at ASPLOS 2023, Vancouver, Canada
CV
You can download my CV here.
Blog Articles on Crazy Ideas