I am now a Senior Staff Algorithm Engineer (资深算法专家) at KuaiShou Technology, where I lead the application and evolution of reinforcement learning in recommendation and advertising scenarios (papers and codebase). I was previously a Senior Algorithm Engineer at Alibaba Group (Ali Star). I received my Ph.D. from the Institute for Interdisciplinary Information Sciences, headed by Prof. Andrew Yao, at Tsinghua University, where I was advised by Prof. Pingzhong Tang. Before that, I received my B.S. from the Department of Computer Science and Technology, Nanjing University, China. During my undergraduate studies, I worked in the LAMDA group headed by Prof. Zhi-Hua Zhou.
My research interests include reinforcement learning, large language models, and recommender systems.
Reinforcement Learning for Short Video Recommender Systems
[pdf]
The 1st Workshop on
LARGE-SCALE VIDEO RECOMMENDER SYSTEMS@ACM RecSys'23
Reinforcement Learning for Industrial Recommender Systems
[pdf]
DRL4IR@SIGIR2022
AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement
[pdf]
29. DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems
[pdf]
Jiaju Chen, Chongming Gao, Shuai Yuan, Shuchang Liu, Qingpeng Cai, Peng Jiang
WSDM-2025
28. Modeling User Retention through Generative Flow Networks
[pdf]
Ziru Liu, Shuchang Liu, Bin Yang, Zhenghai Xue, Qingpeng Cai*, Xiangyu Zhao, Zijian Zhang, Lantao Hu, Han Li, Peng Jiang
KDD-2024, industry track
27. Future Impact Decomposition in Request-level Recommendations
[pdf]
Xiaobei Wang, Shuchang Liu, Xueliang Wang, Qingpeng Cai*, Lantao Hu, Han Li, Peng Jiang, Guangming Xie
KDD-2024, industry track
26. Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
[pdf]
Ziru Liu, Shuchang Liu, Zijian Zhang, Qingpeng Cai*, Xiangyu Zhao, Kesen Zhao, Lantao Hu, Peng Jiang, Kun Gai
SIGIR-2024
25. M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework
[pdf]
Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai
SIGIR-2024
24. KuaiSim: A Comprehensive Simulator for Recommender Systems
[pdf]
Kesen Zhao, Shuchang Liu, Qingpeng Cai*, Xiangyu Zhao*, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai
NeurIPS-2023
23. State Regularized Policy Optimization on Data with Dynamics Shift
[pdf]
Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
NeurIPS-2023
22. Generative Flow Network for Listwise Recommendation
[pdf]
Shuchang Liu, Qingpeng Cai, Zhankui He, Bowen Sun, Julian McAuley, Dong Zheng, Peng Jiang, Kun Gai
KDD-2023
21. PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
[pdf]
Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
KDD-2023
20. Reinforcing User Retention in a Billion Scale Short Video Recommender System
[pdf]
Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023, industry track
19. Multi-Task Recommendations with Reinforcement Learning
[pdf]
Ziru Liu, Jiejie Tian, Qingpeng Cai*, Xiangyu Zhao*, Jingtong Gao, Shuchang Liu, Dayou Chen, Tonghao He, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023
18. Exploration and Regularization of the Latent Action Space in Recommendation
[pdf]
Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Dong Zheng, Peng Jiang, Kun Gai, Ji Jiang, Xiangyu Zhao and Yongfeng Zhang
WWW-2023
17. Two-Stage Constrained Actor-Critic for Short Video Recommendation
[pdf]
Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang and Kun Gai
WWW-2023
16. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
[pdf]
Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An
ICLR-2023
15. Exploration in policy optimization through multiple paths
Ling Pan, Qingpeng Cai, Longbo Huang
JAAMAS-2021
14. Softmax Deep Double Deterministic Policy Gradients
[pdf]
Ling Pan, Qingpeng Cai, Longbo Huang
NeurIPS-2020
13. Reinforcement Learning with Dynamic Boltzmann Softmax Updates
[pdf]
Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang
IJCAI-2020
(Acceptance rate: 12.6%)
12. Multi-path Policy Optimization
[pdf]
Ling Pan, Qingpeng Cai, Longbo Huang
AAMAS-2020
(Invited for fast-track publication in JAAMAS, top 5%)
11. Deterministic Value-Policy Gradients
[pdf]
Qingpeng Cai*, Ling Pan*, Pingzhong Tang (* indicates equal contribution)
AAAI-2020
10. Reinforcement Learning Driven Heuristic Optimization
[pdf]
Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei
DRL4KDD-2019
9. A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems
[pdf]
Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
AAAI-2019
8. Policy optimization with model-based explorations
Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hualin He, Qing He, Pingzhong Tang
AAAI-2019
7. Policy gradients for contextual recommendations
Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He
WWW-2019
6. Reinforcement Mechanism Design for E-commerce
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
WWW-2018
5. Reinforcement Mechanism Design for Fraudulent Behaviour in E-commerce
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
AAAI-2018
4. Ranking Mechanism Design for Price-setting Agents in E-commerce
[pdf]
Qingpeng Cai, Pingzhong Tang, Yulong Zeng
AAMAS-2018
3. Multi-armed Bandit Mechanism With Private Histories
[pdf]
Chang Liu, Qingpeng Cai, Yukui Zhang
AAMAS-2017 (Extended abstract)
2. Facility location with Minimax Envy
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang
IJCAI-2016
1. Mechanism Design for Personalized Recommender Systems
[pdf]
Qingpeng Cai, Aris Filos-Ratsikas, Chang Liu, Pingzhong Tang
RecSys-2016