Ph.D. Candidate in Computer Architecture, Princeton University
I am a Ph.D. candidate in Electrical and Computer Engineering at Princeton University, advised by Prof. David Wentzlaff. My research lies at the intersection of GPU systems architecture, hardware/software co-design, and microarchitecture, with a focus on modeling performance bottlenecks in large-scale machine learning workloads.
Most recently, I interned at NVIDIA (Summer 2024 & Summer 2025), where I worked on GPU performance modeling for Mixture-of-Experts (MoE) inference and on kernel-overlap techniques for FlashAttention-based LLM training.
Before starting my Ph.D., I spent three years at NVIDIA Shanghai, contributing to the Ampere and Orin architectures. I received my B.S. in Electrical Engineering from Washington University in St. Louis in 2018.
I have broader interests in AI's impact on humanity, neuroscience (and its parallels with artificial neural networks), and physics. I am actively incorporating these insights into my research and sci-fi writing. Read my Research-and-Personal-Interest statement.
My research focuses on performance modeling and scheduling of GPU-based systems for machine learning, as well as microarchitectural techniques for energy-efficient computing.
You can download my full CV here (updated Aug. 2025).
Email: hm1@princeton.edu
LinkedIn: My LinkedIn Profile