Sachiyo ARAI, Kazuteru MIYAZAKI, Shigenobu KOBAYASHI, "Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents," in IEICE TRANSACTIONS on Communications, vol. E83-B, no. 5, pp. 1039-1047, May 2000.
Abstract: This paper describes Profit-Sharing, a reinforcement learning approach that can be used to design a coordination strategy in a multi-agent system, and demonstrates its effectiveness empirically within the coil-yard of a steel manufacturer. This domain consists of multiple cranes that operate asynchronously but need coordination to adjust their initial task-execution plans so as to avoid collisions caused by resource limitations. The problem is beyond both classical expert hand-coding methods and mathematical analysis because of scattered information, stochastically generated tasks, and, moreover, the difficulty of completing tasks on schedule. In recent years, many applications of reinforcement learning algorithms based on Dynamic Programming (DP), such as Q-learning and the Temporal Difference method, have been introduced. They promise optimal performance of the agent in Markov decision processes (MDPs), but in non-MDPs, such as multi-agent domains, there is no guarantee that the agent's policy converges. Profit-Sharing, in contrast to the DP-based methods, can guarantee convergence to a rational policy, meaning that the agent reaches one of the desirable states, even in non-MDPs where agents learn concurrently and competitively. We therefore embedded Profit-Sharing into the crane operators to acquire cooperative rules in this dynamic domain, and we show its applicability to the real world by comparing it with the RAP (Reactive Action Planner) model encoded from expert knowledge.
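The abstract's central claim rests on Profit-Sharing's episodic credit assignment, which rewards every state-action rule fired on the way to a success rather than bootstrapping values as Q-learning does. The following Python sketch is a minimal illustration only, not the authors' implementation: the discrete state encoding, the epsilon-greedy exploration, and the 1/(L+1) decay ratio are assumptions, with the ratio being a commonly cited sufficient condition for the rationality of Profit-Sharing when each rule competes with L candidate actions.

# Illustrative Profit-Sharing sketch (not the paper's implementation).
# Assumptions: discrete states/actions, terminal reward only, and a
# geometric reinforcement function whose ratio stays below 1/(L+1),
# a commonly used sufficient condition for suppressing ineffective rules.
import random
from collections import defaultdict

class ProfitSharingAgent:
    def __init__(self, actions, decay=None):
        self.actions = actions
        # Geometric credit ratio; 1/(len(actions)+1) keeps the decay steep
        # enough that looping (ineffective) rules cannot outweigh useful ones.
        self.decay = decay if decay is not None else 1.0 / (len(actions) + 1)
        self.weights = defaultdict(float)   # (state, action) -> rule weight
        self.episode = []                   # rules fired since the last reward

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy over rule weights (the exploration strategy is an assumption).
        if random.random() < epsilon:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.weights[(state, a)])
        self.episode.append((state, action))
        return action

    def reinforce(self, reward):
        # Distribute the terminal reward backward along the episode:
        # the last rule receives `reward`, earlier rules geometrically less.
        credit = reward
        for rule in reversed(self.episode):
            self.weights[rule] += credit
            credit *= self.decay
        self.episode.clear()

In the coil-yard setting described above, one such learner per crane, rewarded when a queued coil movement completes without collision, would recreate the concurrent, competitive learning the authors study; that mapping is only a reading of the abstract, not the paper's actual state or reward encoding.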
URL: https://global.ieice.org/en_transactions/communications/10.1587/e83-b_5_1039/_p
@ARTICLE{e83-b_5_1039,
author={Sachiyo ARAI and Kazuteru MIYAZAKI and Shigenobu KOBAYASHI},
journal={IEICE TRANSACTIONS on Communications},
title={Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents},
year={2000},
volume={E83-B},
number={5},
pages={1039-1047},
abstract={This paper describes Profit-Sharing, a reinforcement learning approach that can be used to design a coordination strategy in a multi-agent system, and demonstrates its effectiveness empirically within the coil-yard of a steel manufacturer. This domain consists of multiple cranes that operate asynchronously but need coordination to adjust their initial task-execution plans so as to avoid collisions caused by resource limitations. The problem is beyond both classical expert hand-coding methods and mathematical analysis because of scattered information, stochastically generated tasks, and, moreover, the difficulty of completing tasks on schedule. In recent years, many applications of reinforcement learning algorithms based on Dynamic Programming (DP), such as Q-learning and the Temporal Difference method, have been introduced. They promise optimal performance of the agent in Markov decision processes (MDPs), but in non-MDPs, such as multi-agent domains, there is no guarantee that the agent's policy converges. Profit-Sharing, in contrast to the DP-based methods, can guarantee convergence to a rational policy, meaning that the agent reaches one of the desirable states, even in non-MDPs where agents learn concurrently and competitively. We therefore embedded Profit-Sharing into the crane operators to acquire cooperative rules in this dynamic domain, and we show its applicability to the real world by comparing it with the RAP (Reactive Action Planner) model encoded from expert knowledge.},
keywords={},
doi={},
ISSN={},
month={May},}
TY - JOUR
TI - Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents
T2 - IEICE TRANSACTIONS on Communications
SP - 1039
EP - 1047
AU - Sachiyo ARAI
AU - Kazuteru MIYAZAKI
AU - Shigenobu KOBAYASHI
PY - 2000
DO -
JO - IEICE TRANSACTIONS on Communications
SN -
VL - E83-B
IS - 5
JA - IEICE TRANSACTIONS on Communications
Y1 - May 2000
AB - This paper describes Profit-Sharing, a reinforcement learning approach that can be used to design a coordination strategy in a multi-agent system, and demonstrates its effectiveness empirically within the coil-yard of a steel manufacturer. This domain consists of multiple cranes that operate asynchronously but need coordination to adjust their initial task-execution plans so as to avoid collisions caused by resource limitations. The problem is beyond both classical expert hand-coding methods and mathematical analysis because of scattered information, stochastically generated tasks, and, moreover, the difficulty of completing tasks on schedule. In recent years, many applications of reinforcement learning algorithms based on Dynamic Programming (DP), such as Q-learning and the Temporal Difference method, have been introduced. They promise optimal performance of the agent in Markov decision processes (MDPs), but in non-MDPs, such as multi-agent domains, there is no guarantee that the agent's policy converges. Profit-Sharing, in contrast to the DP-based methods, can guarantee convergence to a rational policy, meaning that the agent reaches one of the desirable states, even in non-MDPs where agents learn concurrently and competitively. We therefore embedded Profit-Sharing into the crane operators to acquire cooperative rules in this dynamic domain, and we show its applicability to the real world by comparing it with the RAP (Reactive Action Planner) model encoded from expert knowledge.
ER -