
Learning to Self-Modify Rewards with Implicit Gradients

EasyChair Preprint no. 8260, version 1

10 pages
Date: June 12, 2022

Abstract

Reward shaping is a powerful technique for efficient learning of optimal policies in sequential decision-making. However, designing auxiliary rewards that help the agent is challenging and often requires considerable time and effort from domain experts. In this paper, we build on the optimal rewards methodology to adapt a given reward function. This problem can be naturally formulated as a meta-learning problem and solved in a bi-level optimization framework. However, the standard approaches in the literature for such problems do not scale well. Hence, we propose an implicit-gradient technique to solve it. We demonstrate the effectiveness of our method on both a) learning optimal rewards and b) adaptive reward shaping.
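As a rough sketch of the implicit-gradient idea mentioned in the abstract (the notation here is illustrative and not taken from the paper): with outer reward parameters \(\phi\) and inner policy parameters \(\theta\), the inner problem defines \(\theta^{*}(\phi)\) implicitly, and the standard implicit-function-theorem hypergradient of the outer objective \(\mathcal{J}\) avoids differentiating through the entire inner optimization trajectory:

\[
\theta^{*}(\phi) = \arg\min_{\theta} \mathcal{L}(\theta, \phi),
\qquad
\frac{d\mathcal{J}}{d\phi}
= \frac{\partial \mathcal{J}}{\partial \phi}
- \frac{\partial \mathcal{J}}{\partial \theta}
\left[\frac{\partial^{2}\mathcal{L}}{\partial \theta\,\partial \theta^{\top}}\right]^{-1}
\frac{\partial^{2}\mathcal{L}}{\partial \theta\,\partial \phi^{\top}},
\]

evaluated at \(\theta = \theta^{*}(\phi)\). In practice the inverse-Hessian-vector product is usually approximated (e.g., with conjugate gradient or a truncated Neumann series), which is what makes this route more scalable than unrolling the inner optimization.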

Keyphrases: meta-learning, reinforcement learning, reward shaping

BibTeX entry
BibTeX does not have a dedicated entry type for preprints; the following is a workaround that produces the correct reference:
@Booklet{EasyChair:8260,
  author = {Aiden Boyd and Shibani and Will Callaghan},
  title = {Learning to Self-Modify Rewards with Implicit Gradients},
  howpublished = {EasyChair Preprint no. 8260},
  year = {EasyChair, 2022}}