Multi-Agent Deep Reinforcement Learning for Persistent Monitoring with Sensing, Communication, and Localization Constraints
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING(2024)
Abstract
Determining multi-robot motion policies for persistently monitoring a region under limited sensing, communication, and localization constraints in GPS-denied environments is a challenging problem. To account for the localization constraints, in this paper we consider a heterogeneous robotic system consisting of two types of agents: anchor agents with accurate localization capability and auxiliary agents with low localization accuracy. To localize itself, an auxiliary agent must be within the communication range of an anchor, either directly or indirectly. The robotic team's objective is to minimize environmental uncertainty through persistent monitoring. We propose a multi-agent deep reinforcement learning (MARL) architecture with graph convolution, called Graph Localized Proximal Policy Optimization (GALOPP), which incorporates the agents' limited sensor field-of-view, communication, and localization constraints along with the persistent monitoring objective to determine motion policies for each agent. We evaluate the performance of GALOPP on open maps with obstacles, varying the numbers of anchor and auxiliary agents. We further study 1) the effect of communication range, obstacle density, and sensing range on performance, and 2) the performance of GALOPP compared with area-partition, greedy-search, random-search, and random-search-with-communication-constraint strategies. To assess its generalization capability, we also evaluate GALOPP in two additional environments: 2-room and 4-room. The results show that GALOPP learns effective policies and monitors the area well. As a proof of concept, we perform hardware experiments to demonstrate the performance of GALOPP.

Note to Practitioners — Persistent monitoring arises in various applications such as search and rescue, border patrol, and wildlife monitoring. Typically, these applications are large-scale, and hence a multi-robot system helps achieve the mission objectives effectively.
Often, the robots are subject to limited sensing and communication ranges, and they may need to operate in GPS-denied areas. In such scenarios, developing motion planning policies for the robots is difficult. Due to the lack of GPS, alternative localization mechanisms, such as SLAM, highly accurate INS, or UWB radio, are essential. Equipping every robot with SLAM or a highly accurate INS is expensive, and hence we use a combination of agents with expensive, accurate localization systems (anchor agents) and agents with low-cost INS (auxiliary agents), whose localization can be made accurate using cooperative localization techniques. To determine efficient motion policies, we use a multi-agent deep reinforcement learning technique (GALOPP) that takes the heterogeneity in vehicle localization capability, limited sensing, and communication constraints into account. GALOPP is evaluated in simulation and compared with baselines: random search, random search with ensured communication, greedy search, and area partitioning. The results show that GALOPP outperforms the baselines. The GALOPP approach offers a generic solution that can be adapted to various other applications.
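Two structural ingredients recur in the abstract: a range-limited communication graph over the agents, in which an auxiliary agent can localize only if it is connected (directly or via relays) to an anchor, and a graph-convolution layer that aggregates neighbor observations before the policy network. The sketch below illustrates both ideas under stated assumptions; it is not the authors' GALOPP implementation, and the function names, positions, feature sizes, and the Kipf-Welling-style symmetric normalization are illustrative choices.

```python
import numpy as np

def comm_adjacency(positions, comm_range):
    """Adjacency matrix: agents i and j are linked iff their distance <= comm_range."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= comm_range).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-edges
    return adj

def anchor_reachable(adj, anchor_ids):
    """BFS from the anchors: which agents can localize, directly or via relays."""
    n = adj.shape[0]
    reachable = np.zeros(n, dtype=bool)
    frontier = list(anchor_ids)
    reachable[frontier] = True
    while frontier:
        i = frontier.pop()
        for j in np.nonzero(adj[i] > 0)[0]:
            if not reachable[j]:
                reachable[j] = True
                frontier.append(j)
    return reachable

def graph_conv(features, adj, weight):
    """One graph-convolution layer with self-loops and symmetric normalization."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # D^-1/2 (A + I) D^-1/2
    return np.maximum(norm @ features @ weight, 0.0)  # ReLU

# Toy scene: one anchor and two auxiliary agents on a line, comm range 4.
positions = np.array([[0.0, 0.0], [3.0, 0.0], [9.0, 0.0]])
adj = comm_adjacency(positions, comm_range=4.0)
can_localize = anchor_reachable(adj, anchor_ids=[0])  # agent 2 is cut off

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 5))   # per-agent observation embedding (hypothetical size)
w = rng.standard_normal((5, 8))       # layer weights
out = graph_conv(feats, adj, w)       # aggregated features fed to each agent's policy
```

In a full MARL pipeline, `out` would feed per-agent actor and critic heads trained with PPO, and the `can_localize` mask would drive the localization penalty for disconnected auxiliary agents.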
Keywords
Multi-agent deep reinforcement learning (MARL), persistent monitoring (PM), graph neural networks