Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes
Reinforcement Learning Conference (2024)
Abstract
In offline reinforcement learning (RL), the absence of active exploration calls for attention to model robustness to tackle the sim-to-real gap, where the discrepancy between the simulated and deployed environments can significantly undermine the performance of the learned policy. To endow the learned policy with robustness in a sample-efficient manner in the presence of a high-dimensional state-action space, this paper considers the sample complexity of distributionally robust linear Markov decision processes (MDPs) with an uncertainty set characterized by the total variation distance, using offline data. We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions, which outperforms prior art by at least Õ(d), where d is the feature dimension. We further improve the performance guarantee of the proposed algorithm by incorporating a carefully designed variance estimator.
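For context, the following is a minimal sketch of the standard setting the abstract refers to; the notation (feature map φ, factor μ, reward parameter θ, radius σ) follows common convention in the robust-MDP literature and is an assumption on our part, not quoted from the paper. In a linear MDP, the nominal transition kernel and reward factor through a d-dimensional feature map,

\[
P^0(s' \mid s, a) = \phi(s, a)^\top \mu(s'), \qquad
r(s, a) = \phi(s, a)^\top \theta, \qquad
\phi(s, a) \in \mathbb{R}^d,
\]

and the distributionally robust value of a policy \(\pi\) is its worst-case value over all transition kernels within total variation distance \(\sigma\) of the nominal model:

\[
V^{\pi, \sigma}(s) \;=\; \inf_{P \,:\, \mathrm{TV}\big(P(\cdot \mid s, a),\, P^0(\cdot \mid s, a)\big) \le \sigma \;\; \forall (s, a)} V^{\pi, P}(s).
\]

This worst-case value must be estimated from offline data alone, which is why a pessimistic (lower-confidence) model estimate and a data-coverage assumption both enter the sample complexity bound. For TV-distance uncertainty sets, the inner infimum is known to admit a one-dimensional dual representation, which keeps the robust Bellman backup computationally tractable.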
Keywords
Probabilistic Learning, Imprecise Probabilities, Structure Learning