Chang Baoxian, Ding Jie, Zhu Junwu, et al. Robot Q-learning coverage algorithm in unknown environments[J]. Journal of Nanjing University of Science and Technology (Natural Science), 2013, 37(6): 792-798.

Robot Q-learning coverage algorithm in unknown environments

Journal of Nanjing University of Science and Technology (Natural Science) [ISSN: 1005-9830 / CN: 32-1397/N]

Volume: 37
Issue: 2013(6)
Pages: 792-798
Publication date: 2013-12-31

Article Info

Title:
Robot Q-learning coverage algorithm in unknown environments
Author(s):
Chang Baoxian 1, Ding Jie 2, Zhu Junwu 2, Zhang Yonglong 2,3
1. College of Sciences, Nanjing University of Technology, Nanjing 211816, China;
2. College of Information Engineering, Yangzhou University, Yangzhou 225009, China;
3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
unknown environments; Q-learning coverage algorithm; robots; area coverage; grid model
CLC number:
TP24
Abstract:
A Q-learning coverage algorithm (QLCA) is presented to improve the area coverage rate of robots in unknown environments. A grid model is constructed for the environment, and the positions of the robots and obstacles are deployed randomly on the grid map. The Q-table obtained through the robots' self-learning under QLCA guides their subsequent action selection and path planning, reducing the number of robot moves. The effects of parameter changes, such as the number of robots and the environment, on the algorithm are analyzed. Simulation results show that, compared with a random-selection coverage algorithm (RSCA), QLCA clearly reduces both the number of steps needed to complete coverage and the coverage redundancy.
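The abstract describes QLCA only at a high level; the paper's exact state encoding, reward design, and multi-robot coordination are not reproduced here. As a rough, single-robot illustration of the underlying idea, the sketch below combines tabular Q-learning with a reward that favors uncovered grid cells and penalizes redundant visits and blocked moves. The grid size, reward values, the epsilon-greedy policy, and every identifier are illustrative assumptions, not the authors' formulation.

```python
# Minimal single-robot Q-learning coverage sketch (illustrative assumptions
# throughout; not the paper's exact algorithm).
import random

ROWS, COLS = 10, 10
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1           # assumed learning parameters

obstacles = {(3, 4), (6, 2), (7, 7)}            # placed randomly in the paper
q_table = {}                                    # (state, action) -> estimated value

def q(state, action):
    return q_table.get((state, action), 0.0)

def step(state, action, covered):
    """Apply an action; return (next_state, reward)."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in obstacles:
        return state, -1.0                      # blocked: penalize and stay put
    # New cells earn a positive reward; redundant visits cost a little.
    return (r, c), (1.0 if (r, c) not in covered else -0.2)

def choose(state):
    """Epsilon-greedy action selection over the learned Q-table."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q(state, a))

for episode in range(50):
    state, covered = (0, 0), {(0, 0)}
    for _ in range(5000):                       # step cap keeps the sketch bounded
        if len(covered) + len(obstacles) == ROWS * COLS:
            break                               # every free cell has been covered
        action = choose(state)
        nxt, reward = step(state, action, covered)
        best_next = max(q(nxt, a) for a in range(len(ACTIONS)))
        # Standard one-step Q-learning update.
        q_table[(state, action)] = q(state, action) + ALPHA * (
            reward + GAMMA * best_next - q(state, action))
        covered.add(nxt)
        state = nxt
```

After training, acting greedily on q_table plays the role the abstract assigns to the learned Q-table: it steers each subsequent move toward uncovered cells instead of choosing at random, which is the behavior the paper's comparison against RSCA measures.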

References:

[1] Cai Zixing, Cui Yian. Survey of multi-robot coverage[J]. Control and Decision, 2008, 23(5): 481-486.
[2] Watkins C J C H. Learning from delayed rewards[D]. Cambridge, UK: King's College, University of Cambridge, 1989: 1-55.
[3] Cheng Ke. Multi-robot coalition formation for distributed area coverage[D]. Omaha, USA: Computer Science Department, University of Nebraska, 2011: 3-55.
[4] Minsky M L. Theory of neural-analog reinforcement systems and its application to the brain-model problem[D]. Princeton, USA: Princeton University, 1954: 5-23.
[5] Liang Quan. Reinforcement learning based mobile robot path planning in unknown environment[J]. Journal of Mechanical and Electrical Engineering, 2012, 29(4): 477-481.
[6] Fazli P, Davoodi A, Mackworth A K. Multi-robot repeated area coverage: performance optimization under various visual ranges[A]. Ninth Conference on Computer and Robot Vision (CRV 2012)[C]. Toronto, Canada: IEEE, 2012: 298-305.
[7] Hazon N, Mieli F, Kaminka G A. Towards robust on-line multi-robot coverage[A]. Proceedings of the 2006 IEEE International Conference on Robotics and Automation[C]. Singapore City, Singapore: IEEE, 2006: 1710-1715.
[8] Jeon H S, Ko M C, Oh R, et al. A practical robot coverage algorithm for unknown environments[M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2010: 129-140.
[9] Zhang Jiawang, Han Guangsheng, Zhang Wei. Application of Q-learning algorithm in dribbling ball training of RoboCup[J]. System Simulation Technology, 2005(2): 84-87.
[10] Cao Jiangli. Research on key technologies of underwater robot path planning[D]. Harbin: College of Computer Science and Technology, Harbin Engineering University, 2009: 3-45.
[11] Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems[J]. The Knowledge Engineering Review, 2012, 27(1): 1-31.
[12] Cheng K, Dasgupta P. Multi-agent coalition formation for distributed area coverage[M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2011: 4-13.
[13] Cheng K, Dasgupta P. Weighted voting game based multi-robot team formation for distributed area coverage[A]. Proceedings of the 3rd International Symposium on Practical Cognitive Agents and Robots[C]. New York, USA: ACM, 2010: 9-15.
[14] Dasgupta P, Cheng K, Banerjee B. Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach[M]. Berlin, Germany: Springer-Verlag Berlin Heidelberg, 2012: 330-345.
[15] Zu Li. Research on control and dynamic characteristics of an intelligent mowing robot for complete area coverage[D]. Nanjing: School of Mechanical Engineering, Nanjing University of Science and Technology, 2005: 15-20.
[16] Liu Jie. Research on multi-robot pursuit strategies based on reinforcement learning[D]. Changchun: College of Computer Science and Information Technology, Northeast Normal University, 2009: 7-33.
[17] Wu Jun, Xu Xin, Zhang Pengcheng. A novel multi-agent reinforcement learning approach for job scheduling in grid computing[J]. Future Generation Computer Systems, 2011, 27: 430-439.

Memo:
Received: 2013-08-18; Revised: 2013-09-22
Foundation item: National Natural Science Foundation of China (61170201)
Biographies: Chang Baoxian (1982-), female, lecturer, research interests: queueing theory, intelligent information processing, and pattern recognition, E-mail: changmath@126.com. Corresponding author: Zhu Junwu (1972-), male, PhD, professor, research interests: multi-agent systems, ontology, and mechanism design, E-mail: jwzhu@yzu.edu.cn.
Citation format: Chang Baoxian, Ding Jie, Zhu Junwu, et al. Robot Q-learning coverage algorithm in unknown environments[J]. Journal of Nanjing University of Science and Technology, 2013, 37(6): 792-798.
Submission website: http://njlgdxxb.paperonce.org
Last update: 2013-12-31