
Prof. Xuefeng Gao
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Talk:
Reward-Directed Score-Based Diffusion Models via Continuous-Time Reinforcement Learning
Abstract:
We propose a new reinforcement learning (RL) formulation for training continuous-time score-based diffusion models in generative AI, so that they generate samples that maximize a reward function while keeping the generated distribution close to the unknown target data distribution. Unlike most existing studies, our formulation does not rely on any pretrained model for the unknown score functions of the noise-perturbed data distributions. We present an entropy-regularized continuous-time RL problem and show that the optimal stochastic policy is Gaussian with a known covariance matrix. Based on this result, we parameterize the mean of the Gaussian policies and develop an actor–critic-type (little) q-learning algorithm to solve the RL problem. A key ingredient of our algorithm design is obtaining noisy observations of the unknown score function via a ratio estimator. Numerically, we demonstrate the effectiveness of our approach by comparing its performance with that of state-of-the-art RL methods that fine-tune pretrained models. Finally, we discuss extensions of our RL formulation to the probability flow ODE implementation of diffusion models and to conditional diffusion models. This is joint work with Jiale Zha and Xunyu Zhou.
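The abstract does not spell out the ratio estimator, so what follows is only a rough illustration under stated assumptions, not necessarily the estimator used in the talk. It shows one standard way to get noisy score observations without a pretrained model. Assume the forward (noising) process gives x_t = α x_0 + σ z with z ~ N(0, I). Then ∇ log p_t(x) = E[N(x; α X_0, σ² I) (α X_0 − x)/σ²] / E[N(x; α X_0, σ² I)], which is a ratio of two expectations over the data distribution. Replacing both expectations with averages over a mini-batch of data gives a self-normalized (ratio) estimate. The function name `ratio_score_estimate` and the parameters `alpha` and `sigma` are hypothetical.

```python
import numpy as np


def ratio_score_estimate(x, data_batch, alpha, sigma):
    """Hypothetical sketch: self-normalized (ratio) Monte Carlo estimate of
    grad_x log p_t(x), where p_t is the law of alpha * X0 + sigma * Z,
    Z ~ N(0, I), and data_batch holds samples of X0.

    x: array of shape (d,); data_batch: array of shape (n, d).
    Assumes a Gaussian perturbation kernel (an assumption, not from the talk).
    """
    diffs = alpha * data_batch - x  # (n, d): alpha * x_i - x for each sample
    # Log of unnormalized Gaussian weights N(x; alpha * x_i, sigma^2 I)
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()  # self-normalize: this is the ratio of the two Monte Carlo sums
    # Each kernel's score is (alpha * x_i - x) / sigma^2; average with weights w
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2


# Usage: with a single data point, the estimate equals that kernel's exact score.
score = ratio_score_estimate(np.array([0.0, 0.0]), np.array([[1.0, 2.0]]), 0.5, 1.0)
```

With a random mini-batch of data, the returned vector is a noisy (and, for finite batches, biased) estimate of the score. It becomes consistent as the batch grows. With a single data point it reduces to (α x_0 − x)/σ².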
Biography:
Xuefeng Gao received his B.S. in Mathematics from Peking University, China, in 2008, and his Ph.D. in Operations Research from the Georgia Institute of Technology, USA, in 2013. His research interests include stochastic modelling, financial engineering, operations research, and reinforcement learning. His work was selected as a finalist in the 2011 INFORMS Junior Faculty Interest Group (JFIG) Paper Competition. During the summers of 2011 and 2012, he worked as a research intern in the Business Analytics and Mathematical Sciences Department of the IBM T.J. Watson Research Center in New York.