Title: Unified continuous-time q-learning for mean-field game and mean-field control problems
Speaker: Xiaoli Wei
Time: Thursday, November 20, 2025, 19:00-20:00
Venue: Tencent Meeting (Meeting ID: 706282801)
Abstract (English): In this talk, we study continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider a learning procedure in which the representative agent updates the population distribution based on the agent's own state values. Depending on whether the task is to solve the MFG or the MFC problem, we can employ the decoupled Iq-function differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and illustrate that our q-learning algorithm performs satisfactorily.
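Abstract (translated from the Chinese): In this talk, we study continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. From the representative agent's perspective, we propose the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization, which provides a unified policy evaluation rule for mean-field game and mean-field control problems. In addition, we consider a learning procedure in which the representative agent updates the population distribution based on the agent's own state values. Depending on whether the task is to solve the mean-field game or the mean-field control problem, the decoupled Iq-function can be used in different ways to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Building on these theoretical findings, we use test policies and the averaged martingale orthogonality condition to construct a unified q-learning algorithm applicable to both mean-field game and mean-field control problems. For several financial applications in the jump-diffusion setting, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and verify through numerical experiments that the algorithm performs satisfactorily.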
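About the speaker: Xiaoli Wei is an associate professor (tenure-track) at Harbin Institute of Technology. Wei received a bachelor's degree from the University of Science and Technology of China and a Ph.D. from Paris Diderot University (Paris 7) in 2018, was a postdoctoral researcher at the University of California, Berkeley from 2019 to 2021, and worked at Tsinghua Shenzhen International Graduate School from 2021 to 2023. Wei's main research interests include stochastic differential games and reinforcement learning, with papers published in journals such as Operations Research, Mathematical Finance, and SIAM Journal on Control and Optimization.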