
Robot Q-learning coverage algorithm in unknown environments

Journal of Nanjing University of Science and Technology (Natural Science Edition) [ISSN: 1005-9830 / CN: 32-1397/N]

Issue:
2013, Issue 6
Page:
792-798

Info

Title:
Robot Q-learning coverage algorithm in unknown environments
Author(s):
Chang Baoxian 1, Ding Jie 2, Zhu Junwu 2, Zhang Yonglong 2,3
1. College of Sciences, Nanjing University of Technology, Nanjing 211816, China;
2. College of Information Engineering, Yangzhou University, Yangzhou 225009, China;
3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
unknown environments; Q-learning; coverage algorithm; robots; area coverage; grid model
CLC Number:
TP24
DOI:
-
Abstract:
A Q-learning coverage algorithm (QLCA) is presented to improve the area coverage rates of robots in unknown environments. A grid model is constructed for the environment, and the positions of the robots and barriers are deployed randomly on the grid map. The robots' subsequent action choices and path plans are directed by the Q-table obtained through the robots' self-learning under the QLCA, which decreases the number of robot moves. The effects of parameters such as the number of robots and the environment on the algorithm are analyzed. The simulation results show that, compared with the random selection coverage algorithm (RSCA), the QLCA markedly reduces the number of coverage execution steps and the coverage redundancy.
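As a rough illustration of the mechanism the abstract describes (a grid model, randomly deployed robots and barriers, and a Q-table directing subsequent moves), the Python sketch below applies the standard one-step Q-learning update of Watkins [2] to a single-robot coverage task. The state space, reward values, and parameters here are illustrative assumptions, not the paper's QLCA.

import random

# Hypothetical single-robot Q-learning coverage sketch. State = the
# robot's grid cell; actions = moves to the four neighbouring cells;
# rewards (assumed here, not taken from the paper) favour unvisited
# free cells and penalise blocked moves and revisits.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def q_learning_coverage(grid, start, episodes=200, alpha=0.1,
                        gamma=0.9, epsilon=0.1, max_steps=500):
    """grid[r][c]: 0 = free cell, 1 = barrier. Returns the learned Q-table."""
    rows, cols = len(grid), len(grid[0])
    q = {}  # Q-table: (cell, action index) -> estimated value
    free_cells = sum(row.count(0) for row in grid)

    for _ in range(episodes):
        pos, visited = start, {start}
        for _ in range(max_steps):
            # epsilon-greedy action choice from the Q-table
            if random.random() < epsilon:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((pos, i), 0.0))

            dr, dc = ACTIONS[a]
            nr, nc = pos[0] + dr, pos[1] + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                nxt, reward = pos, -1.0       # blocked by a wall or barrier
            elif (nr, nc) in visited:
                nxt, reward = (nr, nc), -0.5  # redundant revisit
            else:
                nxt, reward = (nr, nc), 1.0   # newly covered cell

            # one-step Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            q[(pos, a)] = q.get((pos, a), 0.0) + alpha * (
                reward + gamma * best_next - q.get((pos, a), 0.0))

            pos = nxt
            visited.add(pos)
            if len(visited) == free_cells:
                break  # all free cells covered in this episode
    return q

Rewarding newly covered cells while penalizing revisits is one plausible way a learned Q-table can cut redundant moves, consistent with the reduction in coverage steps and redundancy reported in the abstract.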

References:

[1] Cai Zixing, Cui Yian. Survey of multi-robot coverage [J]. Control and Decision, 2008, 23(5): 481-486. (in Chinese)
[2] Watkins C J C H. Learning from delayed rewards [D]. Cambridge, UK: King's College, University of Cambridge, 1989: 1-55.
[3] Cheng Ke. Multi-robot coalition formation for distributed area coverage [D]. Omaha, USA: Computer Science Department, University of Nebraska, 2011: 3-55.
[4] Minsky M L. Theory of neural-analog reinforcement systems and its application to the brain-model problem [M]. Princeton, USA: Princeton University, 1954: 5-23.
[5] Liang Quan. Reinforcement learning based mobile robot path planning in unknown environment [J]. Journal of Mechanical and Electrical Engineering, 2012, 29(4): 477-481. (in Chinese)
[6] Fazli P, Davoodi A, Mackworth A K. Multi-robot repeated area coverage: Performance optimization under various visual ranges [A]. Ninth Conference on Computer and Robot Vision (CRV 2012) [C]. Toronto, Canada: IEEE, 2012: 298-305.
[7] Hazon N, Mieli F, Kaminka G A. Towards robust on-line multi-robot coverage [A]. Proceedings of 2006 IEEE International Conference on Robotics and Automation [C]. Singapore City, Singapore: IEEE, 2006: 1710-1715.
[8] Jeon H S, Ko M C, Oh R, et al. A practical robot coverage algorithm for unknown environments [M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2010: 129-140.
[9] Zhang Jiawang, Han Guangsheng, Zhang Wei. Application of Q-learning algorithm in dribbling ball training of RoboCup [J]. System Simulation Technology, 2005(2): 84-87. (in Chinese)
[10] Cao Jiangli. Research on key technologies of path planning for underwater robots [D]. Harbin: College of Computer Science and Technology, Harbin Engineering University, 2009: 3-45. (in Chinese)
[11] Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems [J]. The Knowledge Engineering Review, 2012, 27(1): 1-31.
[12] Cheng K, Dasgupta P. Multi-agent coalition formation for distributed area coverage [M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2011: 4-13.
[13] Cheng K, Dasgupta P. Weighted voting game based multi-robot team formation for distributed area coverage [A]. Proceedings of the 3rd International Symposium on Practical Cognitive Agents and Robots [C]. New York, USA: ACM, 2010: 9-15.
[14] Dasgupta P, Cheng K, Banerjee B. Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach [M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2012: 330-345.
[15] Zu Li. Research on control and dynamic characteristics of an intelligent lawn-mowing robot in complete area coverage operation [D]. Nanjing: School of Mechanical Engineering, Nanjing University of Science and Technology, 2005: 15-20. (in Chinese)
[16] Liu Jie. Research on multi-robot pursuit strategies based on reinforcement learning [D]. Changchun: College of Computer Science and Information Technology, Northeast Normal University, 2009: 7-33. (in Chinese)
[17] Wu Jun, Xu Xin, Zhang Pengcheng. A novel multi-agent reinforcement learning approach for job scheduling in grid computing [J]. Future Generation Computer Systems, 2011(27): 430-439.

Last Update: 2013-12-31