Yinwei Dai

I am a third-year Computer Science Ph.D. student at Princeton University, working with Prof. Ravi Netravali. I am affiliated with the Princeton Systems for AI Lab (SAIL).

I obtained my M.S.E. and B.S.E. in Computer Science from the University of Michigan, where I worked with Prof. Mosharaf Chowdhury and Prof. Harsha V. Madhyastha on projects related to networked systems, and my B.S.E. in Electrical and Computer Engineering from Shanghai Jiao Tong University.

Email  /  CV  /  Google Scholar  /  X  /  GitHub

Research

My research interests lie at the intersection of networked systems and machine learning. Recently, my work has focused on improving the inference efficiency and scalability of systems that serve large language models and their applications.

Preprints
Legilimens: Performant Video Analytics on the System-on-Chip Edge
Murali Ramanujam, Yinwei Dai, Kyle Jamieson, Ravi Netravali
arXiv, 2025

We present Legilimens, which embraces the inverted resource profile of SoC-class devices by shifting the cost of adaptation for video analytics from compute to memory.

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali
arXiv, 2025

We introduce SpecReason, a system that automatically accelerates large reasoning model (LRM) inference by using a lightweight model to (speculatively) carry out simpler intermediate reasoning steps, reserving the costly base model only to assess (and potentially correct) the speculated outputs.

Publications
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Yinwei Dai*, Rui Pan*, Anand Iyer, Kai Li, Ravi Netravali
SOSP, 2024 (Acceptance Rate: 17.34%) / GitHub / Paper / Slides

We present Apparate, the first system that automatically injects and manages Early Exits for serving a wide range of models.

Improving DNN Inference Throughput Using Practical, Per-Input Compute Adaptation
Anand Iyer, Mingyu Guan, Yinwei Dai, Rui Pan, Swapnil Gandhi, Ravi Netravali
SOSP, 2024 (Acceptance Rate: 17.34%) / GitHub / Paper

We present E3 to address the detrimental trade-off that Early Exits introduce between compute savings (from exits) and resource utilization (from batching) in EE-DNNs.

Auxo: Efficient Federated Learning via Scalable Client Clustering
Jiachen Liu, Fan Lai, Yinwei Dai, Aditya Akella, Harsha Madhyastha, Mosharaf Chowdhury
SoCC, 2023 (Acceptance Rate: 31%) / GitHub / Paper

We propose Auxo, a scalable FL system that enables the server to decompose a large-scale FL task into cohorts with smaller intra-cohort heterogeneity.

ModelKeeper: Accelerating DNN Training via Automated Training Warmup
Fan Lai, Yinwei Dai, Harsha Madhyastha, Mosharaf Chowdhury
NSDI, 2023 (Acceptance Rate: 18.38%) / GitHub / Paper / Talk

We introduce ModelKeeper, a cluster-scale model service framework that accelerates DNN training by reducing the computation needed to reach the same model performance via automated model transformation.

FedScale: Benchmarking Model and System Performance of Federated Learning
Fan Lai, Yinwei Dai, Sanjay Singapuram, Jiachen Liu, Xiangfeng Zhu, Harsha Madhyastha, Mosharaf Chowdhury
ICML, 2022 (Acceptance Rate: 21.94%) / Website / GitHub
Deployed at LinkedIn / Best Paper Award at SOSP ResilientFL 2021

We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research.

Work Experience
Microsoft Research, 2025/05 - 2025/08

Research Intern, Intelligent Networked Systems Group.

Teaching
COS 316: Principles of Computer System Design, Fall 2023

COS 418: Distributed Systems, Winter 2024

EECS 442: Computer Vision, Winter 2022

EECS 489: Computer Networks, Fall 2021

Service
Conference Reviewer: NeurIPS (Main and D&B Tracks) 2022, 2023, 2024, 2025

Journal Reviewer: IEEE Transactions on Mobile Computing, 2022, 2025

Artifact Evaluation Committee: SIGCOMM 2022, MLSys 2023

Misc
My name in Chinese: [image]

If you want to chat with me, please send me an email!