Research
My research interests are at the intersection of networked systems and
machine learning. Recently, my work has focused on improving inference efficiency and scalability in systems that serve large language models and their applications.
Selected Publications
Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows
Yinwei Dai,
Zhuofu Chen,
Anand Iyer,
Ravi Netravali
arXiv, 2025
We present Aragog, a system that progressively adapts request configurations throughout execution for scalable serving of agentic workflows.
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Yinwei Dai*,
Rui Pan*,
Anand Iyer,
Kai Li,
Ravi Netravali
SOSP 2024 (Acceptance Rate: 17.34%)
Github / Paper / Slides
We present Apparate, the first system that automatically injects and manages early exits to serve a wide range of models.
ModelKeeper: Accelerating DNN Training via Automated Training Warmup
Fan Lai,
Yinwei Dai,
Harsha Madhyastha,
Mosharaf Chowdhury
NSDI 2023 (Acceptance Rate: 18.38%)
Github / Paper / Talk
We introduce ModelKeeper, a cluster-scale model service framework that accelerates DNN training by reducing the computation needed to reach the same model performance via automated model transformation.
FedScale: Benchmarking Model and System Performance of Federated Learning
Fan Lai,
Yinwei Dai,
Sanjay Singapuram,
Jiachen Liu,
Xiangfeng Zhu,
Harsha Madhyastha,
Mosharaf Chowdhury
ICML 2022 (Acceptance Rate: 21.94%)
Website / Github
Deployed at LinkedIn. Best Paper Award at SOSP ResilientFL 2021.
We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research.
Academic Service
Conference Reviewer: NeurIPS (Main and D&B Tracks) 2022-2025; MLSys 2026
Journal Reviewer: IEEE Transactions on Mobile Computing 2022, 2025
Artifact Evaluation Committee: SIGCOMM 2022, MLSys 2023
My name in Chinese:
If you would like to chat, please send me an email!