About Me
I'm Haiyue, a Ph.D. candidate in Electrical and Computer Engineering at Princeton University, advised by Prof. David Wentzlaff. My work focuses on computer architecture, microarchitecture, and hardware/software co-design.
Recently I've been working on an emerging direction: building hardware as a safeguard for AI. I'm exploring GPU mechanisms that can throttle dangerous AI workloads or read a model's internal states to catch potential threats.
I've spent the past two summers (2024 & 2025) at NVIDIA as a Deep Learning Architecture intern working on GPU architecture design, and I worked there full-time for three years before starting my Ph.D. I got my B.S. in Electrical Engineering from Washington University in St. Louis in 2018.
I have broader interests in AI's impact on humanity, neuroscience (and its similarities to neural networks), and physics. I actively incorporate new insights into both my research and my sci-fi writing. You can read more in my Research-and-Personal-Interest statement (somewhat outdated; updates yet to come!).
Research
Most of my work is about understanding the demands that machine learning workloads place on GPU hardware, and using that understanding to design hardware that runs LLMs faster, more efficiently, and (most recently) more safely.
🛡️ Hardware Mechanisms for AI Safeguards (current focus) 2024–Present
As AI systems get more capable, I think we need stronger guarantees than software guardrails alone, and hardware is the one layer an AI can't easily bypass. I'm exploring two complementary GPU mechanisms:
- Throttling AI performance. Low-cost microarchitecture knobs that statically or dynamically limit the hardware resources an AI workload gets, while still serving other workloads normally: a hardware "brain surgery."
- Reading dangerous internal states. Using hardware to directly detect an LLM's dangerous internal states before they form output tokens: a hardware "brain MRI."
LLM Performance Analysis & HW/SW Co-Design 2024–Present
- Build analytical frameworks to characterize the architectural bottlenecks of LLM training and inference on GPUs.
- Explore hardware/software co-optimizations: expert placement and routing to load-balance memory-bound MoE inference, and overlapping operators with complementary hardware bottlenecks to hide latency.
Data-Aware Instruction-Level Scheduling 2022–2024
- Designed microarchitecture-level techniques to detect, profile, and exploit instruction operand value similarities in modern processors, enabling value-aware scheduling, dynamic profiling of data locality, and optimized value prediction for energy efficiency and high performance.
Publications, Preprints & Posters
* indicates co-first authorship.
- Hardware Mechanisms for AI Safeguards
  - Haiyue Ma, Lauren Malek, Joseph Forzani, and David Wentzlaff. GPU Microarchitecture to Dynamically Limit AI Performance. In submission (preprint available upon request).
  - Haiyue Ma, August Ning, and David Wentzlaff. Differential Architecture: A Workload Characterization-Driven Approach to GPU Product Differentiation. In submission (preprint available upon request).
- LLM Performance Analysis & HW/SW Co-Design
  - Yanpeng Yu*, Haiyue Ma*, et al. Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens. In submission, 2025. [arXiv]
  - Haiyue Ma, Jian Liu, and Ronny Krashinsky. Optimizing Dropout in LLM Training: Performance Comparison of Fusion and Overlap. SIGMETRICS 2026. [Paper]
  - Haiyue Ma, Zhixu Du, and Yiran Chen. MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing. ML for Systems Workshop, NeurIPS 2025. [arXiv] (Full Version).
- Data-Aware Instruction-Level Scheduling
  - Haiyue Ma, Kaifeng Xu, and David Wentzlaff. VASER: Value-Aware Scheduler for Energy Reduction. Poster with proceedings, PACT 2025. [Paper]
  - Haiyue Ma and David Wentzlaff. DVProf: Profiling Dynamic Value Locality Between Instructions. Poster, IISWC 2024.
  - Haiyue Ma and David Wentzlaff. Exploiting Data Commonality in Value Prediction. The Fifth Young Architect Workshop (YArch), ASPLOS 2023.
CV
You can download my full CV here (updated June 2026).
Blog Articles on Crazy Ideas