Mostly OM

Speakers 2025


Prof. Xuefeng Gao

Published: 2025-04-18




Department of Systems Engineering and Engineering Management 

The Chinese University of Hong Kong

Talk:

Reward-Directed Score-Based Diffusion Models via Continuous-Time Reinforcement Learning


Abstract:

We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models for generative AI, with the goal of generating samples that maximize reward functions while keeping the generated distributions close to the unknown target data distributions. Unlike most existing studies, our formulation does not involve any pretrained model for the unknown score functions of the noise-perturbed data distributions. We present an entropy-regularized continuous-time RL problem and show that the optimal stochastic policy has a Gaussian distribution with a known covariance matrix. Based on this result, we parameterize the mean of Gaussian policies and develop an actor–critic type (little) q-learning algorithm to solve the RL problem. A key ingredient in our algorithm design is to obtain noisy observations of the unknown score function via a ratio estimator. Numerically, we show the effectiveness of our approach by comparing its performance with state-of-the-art RL methods that fine-tune pretrained models. Finally, we discuss extensions of our RL formulation to the probability flow ODE implementation of diffusion models and to conditional diffusion models. This is joint work with Jiale Zha and Xunyu Zhou.
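The ratio estimator mentioned in the abstract can be illustrated with a minimal sketch. It is not necessarily the construction used in the talk. It assumes a forward noising of the form x_t = alpha * x_0 + sigma * eps with standard Gaussian eps, so that the score of the noise-perturbed density is a ratio of two expectations over the clean data, each replaced here by a sample average. The function name `ratio_score_estimate` and the parameters `alpha` and `sigma` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def ratio_score_estimate(x, data, alpha, sigma):
    """Sample-based estimate of the score of p_t at a point x.

    Assumes (for illustration only) p_t(x) = E_{x0}[N(x; alpha*x0, sigma^2 I)],
    so grad log p_t(x) = E[k * (-(x - alpha*x0) / sigma^2)] / E[k],
    where k is the Gaussian kernel evaluated at x.

    x:    array of shape (d,)
    data: array of shape (n, d), samples of the clean data x0
    """
    diff = x - alpha * data                              # shape (n, d)
    log_k = -np.sum(diff ** 2, axis=1) / (2 * sigma ** 2)
    # The kernel's normalizing constant and the shift by max() cancel in the ratio.
    k = np.exp(log_k - log_k.max())                      # shape (n,)
    numerator = (k[:, None] * (-diff / sigma ** 2)).mean(axis=0)
    denominator = k.mean()
    return numerator / denominator
```

For standard normal data in one dimension with alpha = 0.8 and sigma = 0.6, the perturbed density is again standard normal, so the exact score at x is -x. This gives a simple check on the estimate, which is noisy because both averages are taken over a finite sample.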


Biography:

Xuefeng Gao received his B.S. in Mathematics from Peking University, China, in 2008, and his Ph.D. in Operations Research from the Georgia Institute of Technology, USA, in 2013. His research interests include stochastic modelling, financial engineering, operations research, and reinforcement learning. His work was selected as a finalist in the 2011 INFORMS Junior Faculty Interest Group (JFIG) paper competition. During the summers of 2011 and 2012, he worked as a research intern in the Business Analytics and Mathematical Sciences Department of the IBM T.J. Watson Research Center in New York.



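Address: Room 447, Weilun Building, School of Economics and Management, Tsinghua University (100084)

Email: rccm@mail.tsinghua.edu.cn

Tel: 010-62771663

Fax: 010-62784555

Copyright 2025 Research Center for Contemporary Management, Tsinghua University. All rights reserved.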