References

[1] Maximilian Hüttenrauch, Adrian Šošić, and Gerhard Neumann. Guided deep reinforcement learning for swarm systems. arXiv preprint arXiv:1709.06011, 2017.

[2] Lili Ma and Naira Hovakimyan. Vision-based cyclic pursuit for cooperative target tracking. Journal of Guidance, Control, and Dynamics, 36(2):617–622, 2013. doi:10.2514/1.57598.

[3] Patrick Mannion, Jim Duggan, and Enda Howley. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, pages 47–66. Springer, 2016. doi:10.1007/978-3-319-25808-9_4.

[4] P. Skobelev, E. Simonova, and A. Zhilyaev. Using multi-agent technology for the distributed management of a cluster of remote sensing satellites. Complex Syst: Fundament Appl, 90:287, 2016. doi:10.2495/978-1-78466-155-7/024.

[5] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782, 2017.

[6] Jonathan Raiman, Susan Zhang, and Filip Wolski. Long-term planning and situational awareness in OpenAI Five. arXiv preprint arXiv:1912.06721, 2019.

[7] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR, abs/1803.11485, 2018.

[8] L. Busoniu, R. Babuska, and B. De Schutter. A comprehensive survey of multiagent reinforcement learning. Trans. Sys. Man Cyber Part C, 38(2):156–172, March 2008. doi:10.1109/TSMCC.2007.913919.

[9] Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pages 2137–2145, 2016.

[10] Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/aaai.v32i1.11794.

[11] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, AAAI ’98/IAAI ’98, pages 746–752, Menlo Park, CA, USA, 1998. American Association for Artificial Intelligence.

[12] Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PLOS ONE, 12(4):1–15, April 2017. doi:10.1371/journal.pone.0172395.

[13] Maxim Egorov. Multi-Agent Deep Reinforcement Learning. Technical report, Stanford University, Department of Computer Science, 2016.

[14] Jayesh K. Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems, pages 66–83. Springer, 2017. doi:10.1007/978-3-319-71682-4_5.

[15] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.

[16] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, pages 2085–2087, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.

[17] Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pages 2244–2252, 2016.

[18] Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, and Jun Wang. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. CoRR, abs/1703.10069, 2017.

[19] Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, volume 157, pages 157–163, 1994. doi:10.1016/B978-1-55860-335-6.50027-1.

[20] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986. doi:10.1038/323533a0.

[21] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.

[22] Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, and Shimon Whiteson. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1146–1155. JMLR.org, 2017.

[23] Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/aaai.v32i1.11492.

[24] Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, and Dhruv Batra. Learning cooperative visual dialog agents with deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2951–2960, 2017.

[25] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256, May 1992. doi:10.1007/BF00992696.

[26] David B. D’Ambrosio, Skyler Goodell, Joel Lehman, Sebastian Risi, and Kenneth O. Stanley. Multirobot behavior synchronization through direct neural network communication. In International Conference on Intelligent Robotics and Applications, pages 603–614. Springer, 2012. doi:10.1007/978-3-642-33515-0_59.

[27] Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. Multi-agent cooperation and the emergence of (natural) language. Proceedings of the International Conference on Learning Representations, 2017.

[28] Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, and Dhruv Batra. Deal or no deal? End-to-end learning of negotiation dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2443–2453. Association for Computational Linguistics, 2017. doi:10.18653/v1/D17-1259.

[29] Qiyang Li, Xintong Du, Yizhou Huang, Quinlan Sykora, and Angela P. Schoellig. Learning of coordination policies for robotic swarms. CoRR, abs/1709.06620, 2017.

[30] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.

[31] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Yee Whye Teh and Mike Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR.

[32] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, San Diego, 2015.

eISSN: 2083-2567
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Databases and Data Mining, Artificial Intelligence