Trajectory planning and control for unmanned surface vehicles based on deep reinforcement learning


Abstract: This study applies deep reinforcement learning to trajectory planning and control for unmanned surface vehicles. For trajectory planning, a Q-learning algorithm is employed to generate trajectories in real-world aquatic environments; the reward function accounts for shallow-water areas and emphasizes minimizing the number of turning points along the path. For trajectory tracking control, the Soft Actor-Critic (SAC) algorithm is combined with a Proportional-Integral-Derivative (PID) controller, which alleviates the difficult manual parameter tuning of conventional PID controllers while also mitigating the limited interpretability of purely deep reinforcement learning methods. Comparative experiments against a conventionally tuned PID controller, a Genetic Algorithm (GA), and the Deep Deterministic Policy Gradient (DDPG) algorithm demonstrate the superiority of the proposed SAC-PID method. Simulation results show that the planned trajectories effectively balance travel distance, shallow-water avoidance, and the number of turning points, and that the SAC-PID method achieves strong trajectory-tracking performance.
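The reward shaping described in the abstract can be illustrated with a short sketch. The fragment below shows tabular Q-learning with penalties for distance, shallow water, and turning; the grid size, shallow-water flag, penalty weights, and the idea of folding the previous heading into the state so turns are observable are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Reward shaping sketched from the abstract: penalize distance travelled,
# shallow-water cells, and heading changes (turning points). All weights
# below are illustrative assumptions.
def reward(next_is_shallow, heading_changed, reached_goal):
    r = -1.0                  # per-step cost favors shorter trajectories
    if next_is_shallow:
        r -= 10.0             # keep the planned path out of shallow water
    if heading_changed:
        r -= 2.0              # discourage extra turning points
    if reached_goal:
        r += 100.0
    return r

# Standard tabular Q-learning update; the state encodes (cell, last heading)
# so that a heading change is observable to the reward function.
ALPHA, GAMMA = 0.1, 0.95
Q = np.zeros((5 * 5 * 4, 4))  # hypothetical 5x5 grid, 4 headings, 4 actions

def q_update(s, a, r, s_next):
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
```

Likewise, the SAC-PID combination can be read as a learned policy that retunes the PID gains online instead of relying on manual tuning. The sketch below assumes a toy first-order yaw model and stubs the trained SAC actor with a fixed-gain placeholder (`sac_policy`); the state layout and dynamics are hypothetical, not the paper's vessel model.

```python
# SAC-PID sketch: a learned policy supplies the PID gains at every control
# step. The yaw model, state layout, and the fixed gains returned by the
# placeholder policy are illustrative assumptions.
import numpy as np

class PID:
    def __init__(self):
        self.kp = self.ki = self.kd = 0.0
        self.integral = 0.0
        self.prev_error = 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def sac_policy(state):
    """Placeholder for the trained SAC actor: maps the tracking state to PID
    gains. A real implementation would evaluate the actor network; fixed
    mid-range gains are returned here purely for illustration."""
    return 2.0, 0.1, 0.5

# Toy first-order yaw dynamics: heading rate proportional to rudder command.
dt, heading, heading_ref = 0.1, 0.0, np.pi / 6
pid = PID()
for _ in range(200):
    error = heading_ref - heading
    state = np.array([error, pid.integral, pid.prev_error])
    pid.set_gains(*sac_policy(state))
    heading += pid.step(error, dt) * dt
print(f"final heading error: {heading_ref - heading:.4f} rad")
```

Keeping the low-level control law in PID form is what preserves interpretability here: the learned component only outputs three gains, each with a clear physical meaning.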

       
