Research
My research interests are at the intersection of networked systems and
machine learning. Recently, my work has focused on improving inference efficiency and scalability in systems that serve large language models and their applications.
Legilimens: Performant Video Analytics on the System-on-Chip Edge
Murali Ramanujam,
Yinwei Dai,
Kyle Jamieson,
Ravi Netravali
ArXiv, 2025
We present Legilimens, which embraces the inverted
resource profile of SoC-class devices—shifting the cost of
adaptation for video analytics from compute to memory.
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan,
Yinwei Dai,
Zhihao Zhang,
Gabriele Oliaro,
Zhihao Jia,
Ravi Netravali
ArXiv, 2025
We introduce SpecReason, a
system that automatically accelerates large reasoning model (LRM) inference by using a lightweight model
to (speculatively) carry out simpler intermediate reasoning steps and reserving the
costly base model only to assess (and potentially correct) the speculated outputs.
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Yinwei Dai*,
Rui Pan*,
Anand Iyer,
Kai Li,
Ravi Netravali
SOSP, 2024
Acceptance Rate: 17.34% / Github / Paper / Slides
We present Apparate, the first system that automatically injects and manages
Early Exits for serving a wide range of models.
Improving DNN Inference Throughput Using Practical, Per-Input Compute Adaptation
Anand Iyer,
Mingyu Guan,
Yinwei Dai,
Rui Pan,
Swapnil Gandhi,
Ravi Netravali
SOSP, 2024
Acceptance Rate: 17.34% / Github / Paper
We present E3 to address the detrimental trade-off that Early Exits introduce
between compute savings (from exits) and resource utilization (from batching)
in EE-DNNs.
Auxo: Efficient Federated Learning via Scalable Client Clustering
Jiachen Liu,
Fan Lai,
Yinwei Dai,
Aditya Akella,
Harsha Madhyastha,
Mosharaf Chowdhury
SoCC, 2023
Acceptance Rate: 31% / Github / Paper
We propose Auxo, a scalable federated learning (FL) system that enables the server to decompose a
large-scale FL task into cohorts with smaller intra-cohort heterogeneity.
ModelKeeper: Accelerating DNN Training via Automated Training Warmup
Fan Lai,
Yinwei Dai,
Harsha Madhyastha,
Mosharaf Chowdhury
NSDI, 2023
Acceptance Rate: 18.38% / Github / Paper / Talk
We introduce ModelKeeper, a cluster-scale model service framework that accelerates
DNN training by reducing the computation needed to reach the same model
performance via automated model transformation.
FedScale: Benchmarking Model and System Performance of Federated Learning
Fan Lai,
Yinwei Dai,
Sanjay Singapuram,
Jiachen Liu,
Xiangfeng Zhu,
Harsha Madhyastha,
Mosharaf Chowdhury
ICML, 2022
Acceptance Rate: 21.94% / Website / Github
Deployed at LinkedIn
Best Paper Award at SOSP ResilientFL 2021
We present FedScale, a diverse set of challenging and realistic benchmark datasets
to facilitate scalable, comprehensive, and reproducible federated learning (FL)
research.
Conference Reviewer: NeurIPS (Main and D&B Track) 2022, 2023, 2024, 2025
Journal Reviewer: Transactions on Mobile Computing 2022, 2025
Artifact Evaluation Committee: SIGCOMM 2022, MLSys 2023
My name in Chinese:
If you want to chat with me, please send me an email!