Studying cooperative multi-agent reinforcement learning in networks

The British University in Dubai (BUiD)
Cooperative multi-agent systems, in which agents work together as one team to achieve a common goal, make up the majority of real-life multi-agent applications. It is therefore important to find a suitable multi-agent reinforcement learning algorithm that helps agents achieve this goal by finding the optimal joint policy, i.e. the one that maximizes the team's total reward. Over the last decade, several multi-agent learning algorithms have been proposed and applied to cooperative multi-agent settings. However, most of these algorithms do not allow agents to communicate with each other during execution. This makes it hard for agents to coordinate their actions, especially in large-scale and partially observable domains. Several coordinated learning algorithms that do allow agents to communicate during execution have therefore been applied to large cooperative multi-agent domains, where they proved efficient and effective. Nonetheless, to the best of our knowledge, no prior work has studied the characteristics of such learning algorithms under different network structures. This thesis studies and analyzes the characteristics of one recent coordinated multi-agent learning approach, the coordinated Q-learning algorithm, in two-player two-action cooperative and semi-cooperative games under random and scale-free network structures. It also compares the original Q-learning algorithm with the coordinated Q-learning algorithm to better understand the differences between the two. A simulator was built to conduct the experimental analyses. Experimental results verify the robustness, effectiveness and efficiency of the coordinated Q-learning algorithm.
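To make the "original Q-learning" baseline concrete, here is a minimal sketch of two independent, stateless Q-learners repeatedly playing a two-player two-action cooperative game. It is not the thesis's simulator, and it does not implement coordinated Q-learning. The payoff matrix `PAYOFF`, the functions `epsilon_greedy` and `independent_q_learning`, and the parameters `alpha`, `epsilon` and `episodes` are all hypothetical illustrations. The update uses the common single-stage form Q(a) ← Q(a) + α(r − Q(a)), since a repeated matrix game has no next state.

```python
import random

# Hypothetical 2x2 cooperative (common-payoff) game: both agents receive the
# same reward. Joint action (0, 0) is optimal; (1, 1) is an inferior equilibrium.
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def epsilon_greedy(q, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])  # ties go to the lowest index


def independent_q_learning(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent, stateless Q-learners repeatedly playing PAYOFF.

    Each agent sees only its own action and the shared team reward; there is
    no communication between agents (the uncoordinated setting).
    """
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(episodes):
        a1 = epsilon_greedy(q1, epsilon, rng)
        a2 = epsilon_greedy(q2, epsilon, rng)
        r = PAYOFF[a1][a2]  # common reward for the whole team
        # Single-stage game, so the update has no next-state (bootstrap) term.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (r - q2[a2])
    return q1, q2


if __name__ == "__main__":
    q1, q2 = independent_q_learning()
    greedy = (max(range(2), key=lambda a: q1[a]),
              max(range(2), key=lambda a: q2[a]))
    print("Q1:", q1, "Q2:", q2, "greedy joint action:", greedy)
```

Each agent updates only from its own action and the shared reward, so the pair can coordinate only implicitly. Depending on early exploration, learners like these can settle on the inferior (1, 1) equilibrium. This is one form of the coordination difficulty that the abstract attributes to the lack of communication during execution.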
The coordinated Q-learning algorithm converges faster and performs better than the original Q-learning algorithm because of its distributed nature and its communication feature, neither of which exists in the original Q-learning algorithm. In addition, the performance of coordinated Q-learning is not affected by whether the underlying network is random or scale-free. Future work can use these characteristics to further improve the performance of other coordinated learning algorithms in different cooperative multi-agent domains.
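The abstract contrasts random and scale-free networks but does not say how they were generated. The sketch below assumes the two most common constructions: an Erdős–Rényi G(n, p) graph for the random network and Barabási–Albert preferential attachment for the scale-free network. The function names, the complete-graph seed in `barabasi_albert`, and the choice of p that matches the mean degree are hypothetical, and none of them is taken from the thesis.

```python
import random


def erdos_renyi(n, p, rng):
    """Random network G(n, p): each of the n*(n-1)/2 possible edges is
    included independently with probability p."""
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def barabasi_albert(n, m, rng):
    """Scale-free network via preferential attachment (assumes n > m).

    Assumption: start from a complete graph on m + 1 nodes (other seed graphs
    are also used). Each new node then links to m distinct existing nodes,
    chosen with probability proportional to their current degree.
    """
    adj = {v: set() for v in range(n)}
    pool = []  # each node appears once per unit of degree
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool.extend([i, j])
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # sample distinct, degree-weighted targets
            targets.add(rng.choice(pool))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            pool.extend([new, t])
    return adj


def degree_stats(adj):
    """Return (min, mean, max) node degree."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return min(degrees), sum(degrees) / len(degrees), max(degrees)


if __name__ == "__main__":
    rng = random.Random(42)
    n, m = 200, 2
    # Choose p so the random network's expected mean degree (2m) matches
    # the scale-free network's approximate mean degree.
    er = erdos_renyi(n, 2 * m / (n - 1), rng)
    ba = barabasi_albert(n, m, rng)
    print("random     (min, mean, max degree):", degree_stats(er))
    print("scale-free (min, mean, max degree):", degree_stats(ba))
```

Both networks have roughly the same average degree, but the scale-free one typically contains a few high-degree hubs. The random one has degrees concentrated near the mean. That structural difference is what a result of "not affected by network structure" is robust to.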
Keywords: multi-agent reinforcement, Q-learning