Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes

Reinforcement Learning Conference (2024)

Abstract
In offline reinforcement learning (RL), the absence of active exploration calls for attention to model robustness to tackle the sim-to-real gap, where the discrepancy between the simulated and deployed environments can significantly undermine the performance of the learned policy. To endow the learned policy with robustness in a sample-efficient manner in the presence of a high-dimensional state-action space, this paper considers the sample complexity of distributionally robust linear Markov decision processes (MDPs) with an uncertainty set characterized by the total variation distance, using offline data. We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions, which outperforms prior art by at least Õ(d), where d is the feature dimension. We further improve the performance guarantee of the proposed algorithm by incorporating a carefully designed variance estimator.
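For context, a minimal sketch of how a total-variation uncertainty set and the corresponding robust value are typically formalized; the radius σ, nominal kernel P, and discounted robust value function below are illustrative assumptions, not details stated in the abstract:

% Illustrative (assumed) formalization: an (s,a)-rectangular TV ball of
% radius \sigma around the nominal transition kernel, and the robust value
% defined as the worst case over kernels in that ball.
\mathcal{U}^{\sigma}\bigl(P(\cdot\mid s,a)\bigr)
  = \Bigl\{\, \tilde{P}(\cdot\mid s,a) \;:\; \tfrac{1}{2}\,\bigl\|\tilde{P}(\cdot\mid s,a) - P(\cdot\mid s,a)\bigr\|_{1} \le \sigma \,\Bigr\},
\qquad
V^{\pi,\sigma}(s)
  = \inf_{\tilde{P}\,\in\,\mathcal{U}^{\sigma}(P)} \;
    \mathbb{E}_{\pi,\tilde{P}}\Bigl[\textstyle\sum_{t\ge 0}\gamma^{t}\, r(s_t,a_t) \,\Big|\, s_0 = s\Bigr].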
Keywords
Probabilistic Learning, Imprecise Probabilities, Structure Learning