In our latest AI safety blog post, we explore principled solutions to the reward tampering problem, in which a reinforcement learning agent actively changes its reward function to maximise reward.
— DeepMind (@DeepMindAI) August 14, 2019
Blog post: https://t.co/UAPa3b71bK
Paper: https://t.co/cxksIx5kwU pic.twitter.com/HRnoYBHBYA
from Twitter https://twitter.com/DeepMindAI
August 14, 2019 at 08:47AM
via IFTTT