I am Weihao Zeng, a PhD student at the Hong Kong University of Science and Technology (starting Fall 2025), supervised by Prof. Junxian He.
My research focuses on the post-training of LLMs, including:
- Improving model reasoning capabilities using reinforcement learning (RL) / self-evolution techniques (SimpleRL, B-STaR)
- Exploring efficient data engineering methods for post-training (Deita, Auto Evol-Instruct)
- Applying LLMs to task-oriented dialogue systems (FutureTOD, Seen2UnSeen)
Feel free to email me about any form of academic collaboration: [email protected]
- 2025-03: We introduce SimpleRL-Zoo, a deep investigation of zero RL training across diverse model families and sizes! SimpleRL-Zoo Twitter
- 2025-01: Announcing our latest effort on O1/R1-style models and scalable reinforcement learning for LLM reasoning! SimpleRL Twitter
- 2025-01: Our B-STaR has been accepted by ICLR 2025!
- 2024-09: Our Auto Evol-Instruct has been accepted by EMNLP 2024!
- 2024-01: Our Deita has been accepted by ICLR 2024!
- 2023-05: Two papers have been accepted by ACL 2023!
- SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
  Weihao Zeng*, Yuzhen Huang*, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He
  Preprint · SimpleRL-Zoo GitHub
- 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
  Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
  ICLR 2025 · paper
- FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue
  Weihao Zeng, Keqing He, Yejie Wang, Chen Zeng, Jingang Wang, Yunsen Xian, Weiran Xu
  ACL 2023 Main Conference · paper
- Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation
  Weihao Zeng, Lulu Zhao, Keqing He, Ruotong Geng, Jingang Wang, Wei Wu, Weiran Xu
  ACL 2023 Main Conference · paper
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
  Wei Liu*, Weihao Zeng*, Keqing He, Yong Jiang, Junxian He
  ICLR 2024 · paper
- Automatic Instruction Evolving for Large Language Models
  Weihao Zeng, Can Xu, Yingxiu Zhao, Jian-Guang Lou, Weizhu Chen
  EMNLP 2024 · paper
Full Publications on Google Scholar
- Apr 2025, Qingke Talk, SimpleRL-Zoo and B-STaR: Improving reasoning performance and efficiency through reinforcement learning.
- Mar 2025, Westlake University, SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild.
- Feb 2025, Northwestern University, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
- Feb 2025, TikTok, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
- Feb 2025, Huawei Noah's Ark Lab, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
- National Scholarship in China (2019/2023)
- 2022-09: Won 1st place in Track 2 of the SereTOD Challenge 2022 at EMNLP 2022!
- 2021-09: Placed 8th in the CCIR 2021 Intelligent NLU Challenge!
- 2021-08: Won 4th place in the SMP 2021 Conversational AI Challenge!