
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
NeurIPS 🏆 Spotlight, Annual Conference on Neural Information Processing Systems, 2025
Analyzes the interplay between MoE design and speculative decoding.

I’m a Senior Researcher at Tencent AI Lab in Shenzhen, where I work with a small team focusing on Agentic AI, especially the long-term memory aspect. My expertise and research interests include: Long Context Modeling, Sparse Attention, and LLM Inference Acceleration.

ACL Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Accelerates attention over shared prefixes for efficient LLM serving, with no approximation. Concurrent works: Hydragen (Stanford), ChunkAttention (Microsoft), and Cascade Inference (the FlashInfer team).

CVPR IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Pioneering work introducing MoE-like dynamic block sparsity into attention. Similar ideas were later popularized by DeepSeek NSA and Kimi MoBA in early 2025.
Started my new journey at Tencent AI Lab as a Senior Researcher.
One paper on MoE speculative decoding has been accepted to NeurIPS 2025 as a spotlight.
We have open-sourced our multi-agent deep research project DeepDiverV2.
One paper on latency-aware test-time scaling has been accepted to EMNLP Findings.
Gave an invited talk at the UCSD Hao AI Lab on “Efficient Attention Mechanisms”.
Senior Researcher, Tencent AI Lab.
Senior Researcher, Huawei Noah’s Ark Lab (HK).
PhD Student, CityUHK, working with Prof. Rynson W.H. Lau.
Algorithm Engineer (Intern), Tencent RoboticsX Lab.
Research Assistant & PhD Student, CUHK, working with Prof. Eric Lo.
Bachelor’s degree in Computer Science, Dalian University of Technology.