I am now a Staff Algorithm Engineer at KuaiShou, where I lead the Reinforcement Learning for Recommender System group. I am also a member of the CCF Multi-Agent Group. I was previously a Senior Algorithm Engineer at Alibaba Group (Ali Star, 2019). I received my Ph.D. from the Institute for Interdisciplinary Information Sciences, headed by Prof. Andrew Yao, at Tsinghua University, where I was advised by Prof. Pingzhong Tang. Before that, I received my B.S. from the Department of Computer Science and Technology, Nanjing University, China. During my undergraduate studies, I worked in the LAMDA group headed by Prof. Zhi-Hua Zhou.
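Qingpeng Cai holds a Ph.D. from Tsinghua University. He was previously an Algorithm Expert at Alibaba (Ali Star) and is now a Senior Algorithm Expert at Kuaishou and a committee member of the CCF Multi-Agent Systems Group. His research interests center on reinforcement learning and recommender systems. He has published more than 20 papers at top international conferences including NeurIPS, ICLR, KDD, WWW, AAAI, and IJCAI, and serves as a reviewer for conferences including NeurIPS, ICLR, ICML, KDD, WWW, AAAI, and IJCAI.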
My research interests include reinforcement learning and recommender systems.
Reinforcement Learning for Short Video Recommender Systems
[pdf]
The 1st Workshop on Large-Scale Video Recommender Systems @ ACM RecSys'23
Reinforcement Learning for Industrial Recommender Systems
[pdf]
DRL4IR@SIGIR2022
A Large Language Model Enhanced Conversational Recommender System
[pdf]
AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement
[pdf]
1. KuaiSim: A Comprehensive Simulator for Recommender Systems
[pdf]
[code]
Kesen Zhao, Shuchang Liu, Qingpeng Cai*, Xiangyu Zhao*, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai
NeurIPS 2023
2. State Regularized Policy Optimization on Data with Dynamics Shift
[pdf]
Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
NeurIPS 2023
3. Generative Flow Network for Listwise Recommendation
[pdf]
Shuchang Liu, Qingpeng Cai, Zhankui He, Bowen Sun, Julian McAuley, Dong Zheng, Peng Jiang, Kun Gai
KDD-2023, research track
4. PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
[pdf]
Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
KDD-2023, research track
5. Reinforcing User Retention in a Billion Scale Short Video Recommender System
[pdf]
Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023, industry track
6. Multi-Task Recommendations with Reinforcement Learning
[pdf]
Ziru Liu, Jiejie Tian, Qingpeng Cai*, Xiangyu Zhao*, Jingtong Gao, Shuchang Liu, Dayou Chen, Tonghao He, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023, research track
7. Exploration and Regularization of the Latent Action Space in Recommendation
[pdf]
Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Dong Zheng, Peng Jiang, Kun Gai, Ji Jiang, Xiangyu Zhao and Yongfeng Zhang
WWW-2023, research track
8. Two-Stage Constrained Actor-Critic for Short Video Recommendation
[pdf]
Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023, research track
9. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
[pdf]
Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An
ICLR-2023
10. Exploration in policy optimization through multiple paths
Ling Pan, Qingpeng Cai, Longbo Huang
JAAMAS-2021
11. Softmax Deep Double Deterministic Policy Gradients
[pdf]
Ling Pan, Qingpeng Cai, Longbo Huang
NeurIPS-2020
12. Reinforcement Learning with Dynamic Boltzmann Softmax Updates
[pdf]
Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang
IJCAI-2020, Yokohama, Japan
(Acceptance rate: 12.6%)
13. Multi-path Policy Optimization
[pdf]
Ling Pan, Qingpeng Cai, Longbo Huang
AAMAS-2020, Auckland, New Zealand
(Invited for fast-track publication in JAAMAS, top 5%)
14. Deterministic Value-Policy Gradients
[pdf]
Qingpeng Cai*, Ling Pan*, Pingzhong Tang (* indicates equal contribution)
AAAI-2020, New York, USA
15. Reinforcement Learning Driven Heuristic Optimization
[pdf]
Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei
DRL4KDD-2019, Alaska, USA
16. A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems
[pdf]
Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
AAAI-2019, Hawaii, USA
17. Policy optimization with model-based explorations
Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hualin He, Qing He, Pingzhong Tang
AAAI-2019, Hawaii, USA
18. Policy gradients for contextual recommendations
Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He
WWW-2019
19. Reinforcement Mechanism Design for E-commerce
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
WWW-2018, Lyon, France
20. Reinforcement Mechanism Design for Fraudulent Behaviour in E-commerce
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
AAAI-2018, New Orleans, USA
21. Ranking Mechanism Design for Price-setting Agents in E-commerce
[pdf]
Qingpeng Cai, Pingzhong Tang, Yulong Zeng
AAMAS-2018, Stockholm, Sweden
22. Multi-armed Bandit Mechanism With Private Histories
[pdf]
Chang Liu, Qingpeng Cai, Yukui Zhang
AAMAS-2017 (Extended abstract), Brazil
23. Facility location with Minimax Envy
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang
IJCAI-2016, New York, USA
24. Mechanism Design for Personalized Recommender Systems
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Chang Liu, Pingzhong Tang
ACM RecSys-2016, Boston, USA