Aiming at the contradiction between exploration and exploitation in deep reinforcement learning, this paper proposes a "reward-based exploration strategy combined with Softmax action selection" (RBE-Softmax) as a dynamic exploration strategy to guide the agent to learn. The superiority of the proposed method is that the characteristics of the agent's learning process are used to adapt the exploration parameters online, so that the agent is able to select the potentially optimal action more effectively. The proposed method is evaluated in discrete and continuous control tasks on OpenAI Gym, and the empirical evaluation results show that the RBE-Softmax method leads to a statistically significant improvement in the performance of deep reinforcement learning algorithms.
Zhi-xiong XU
Army Engineering University
Lei CAO
Army Engineering University
Xi-liang CHEN
Army Engineering University
Chen-xi LI
Army Engineering University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Zhi-xiong XU, Lei CAO, Xi-liang CHEN, Chen-xi LI, "Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 9, pp. 2409-2412, September 2018, doi: 10.1587/transinf.2018EDL8011.
Abstract: Aiming at the contradiction between exploration and exploitation in deep reinforcement learning, this paper proposes “reward-based exploration strategy combined with Softmax action selection” (RBE-Softmax) as a dynamic exploration strategy to guide the agent to learn. The superiority of the proposed method is that the characteristic of agent's learning process is utilized to adapt exploration parameters online, and the agent is able to select potential optimal action more effectively. The proposed method is evaluated in discrete and continuous control tasks on OpenAI Gym, and the empirical evaluation results show that RBE-Softmax method leads to statistically-significant improvement in the performance of deep reinforcement learning algorithms.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDL8011/_p
@ARTICLE{e101-d_9_2409,
author={Zhi-xiong XU and Lei CAO and Xi-liang CHEN and Chen-xi LI},
journal={IEICE TRANSACTIONS on Information},
title={Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning},
year={2018},
volume={E101-D},
number={9},
pages={2409-2412},
abstract={Aiming at the contradiction between exploration and exploitation in deep reinforcement learning, this paper proposes “reward-based exploration strategy combined with Softmax action selection” (RBE-Softmax) as a dynamic exploration strategy to guide the agent to learn. The superiority of the proposed method is that the characteristic of agent's learning process is utilized to adapt exploration parameters online, and the agent is able to select potential optimal action more effectively. The proposed method is evaluated in discrete and continuous control tasks on OpenAI Gym, and the empirical evaluation results show that RBE-Softmax method leads to statistically-significant improvement in the performance of deep reinforcement learning algorithms.},
keywords={},
doi={10.1587/transinf.2018EDL8011},
ISSN={1745-1361},
month={September},
}
TY - JOUR
TI - Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning
T2 - IEICE TRANSACTIONS on Information
SP - 2409
EP - 2412
AU - Zhi-xiong XU
AU - Lei CAO
AU - Xi-liang CHEN
AU - Chen-xi LI
PY - 2018
DO - 10.1587/transinf.2018EDL8011
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2018
AB - Aiming at the contradiction between exploration and exploitation in deep reinforcement learning, this paper proposes “reward-based exploration strategy combined with Softmax action selection” (RBE-Softmax) as a dynamic exploration strategy to guide the agent to learn. The superiority of the proposed method is that the characteristic of agent's learning process is utilized to adapt exploration parameters online, and the agent is able to select potential optimal action more effectively. The proposed method is evaluated in discrete and continuous control tasks on OpenAI Gym, and the empirical evaluation results show that RBE-Softmax method leads to statistically-significant improvement in the performance of deep reinforcement learning algorithms.
ER -