1. Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning
- Author
-
Paulo Roberto de Oliveira da Costa, Uzay Kaymak, Yingqian Zhang, Jason Rhuggenaath, Alp Akcay, Information Systems IE&IS, Operations Planning Acc. & Control, Industrial Engineering and Innovation Sciences, EAISI Health, EAISI Foundational, and EAISI High Tech Systems
- Subjects
Mathematical optimization ,Combinatorial optimization ,Travelling salesman problem ,General Computer Science ,Computer Networks and Communications ,Computer science ,02 engineering and technology ,Artificial Intelligence ,020204 information systems ,Vehicle routing problem ,0202 electrical engineering, electronic engineering, information engineering ,Reinforcement learning ,Local search (optimization) ,Deep reinforcement learning ,Artificial neural network ,business.industry ,Heuristic ,2-opt ,Computer Graphics and Computer-Aided Design ,Computer Science Applications ,Computational Theory and Mathematics ,Beam search ,020201 artificial intelligence & image processing ,business ,Heuristics - Abstract
Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more generalk-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show we can adapt the proposed method to two extensions of the TSP: the multiple TSP and the Vehicle Routing Problem, achieving results on par with classical heuristics and learned methods.
- Published
- 2021