Wang, Enshu; Ding, Rong; Yang, Zhaoxing; Jin, Haiming; Miao, Chenglin; Su, Lu; Zhang, Fan; Qiao, Chunming; Wang, Xinbing
Nowadays, most taxi drivers use the relocation recommendation services offered by online ride-hailing platforms (e.g., Uber and Didi Chuxing), which often direct drivers to places with profitable orders. At the same time, electric taxis (e-taxis) are increasingly adopted and are gradually replacing gasoline taxis in today’s public transportation systems due to their environmentally friendly nature. Though effective for traditional gasoline taxis, existing relocation recommendation schemes are rather suboptimal for e-taxi drivers’ user experience. On one hand, existing schemes take no account of taxis’ refueling decisions, as the refueling durations of gasoline taxis are usually short enough to be ignored. However, an e-taxi may spend hours charging at a charging station. An e-taxi’s battery can easily be depleted by the continuous relocations suggested by existing schemes, forcing it into a long charging session during which the driver misses numerous order-serving opportunities. On the other hand, charging posts are typically sparsely and unevenly distributed across a city. With no consideration of charging opportunities, existing schemes may send an e-taxi whose battery is running low to an area with no charging post nearby. To optimize e-taxi drivers’ user experience, in this paper we design CARE, a joint charging and relocation recommendation system for e-taxi drivers. We take the perspective of e-taxi drivers and formulate their decision making as a multi-agent reinforcement learning problem in which each e-taxi driver aims to maximize his or her own cumulative reward. More specifically, we propose a novel multi-agent mean field hierarchical reinforcement learning (MFHRL) framework, whose hierarchical architecture enables CARE to provide far-sighted charging and relocation recommendations for e-taxi drivers.
In addition, we integrate each hierarchical level of MFHRL with the mean field approximation to capture e-taxis’ mutual influences on one another’s decision making. We build a simulator on one of the largest real-world e-taxi datasets, collected in Shenzhen, China, which contains the GPS trajectory data and transaction data of 3848 e-taxis from June 1st to June 30th, 2017, together with 165 charging stations comprising 317 fast charging posts and 1421 slow charging posts. We use this simulator to generate six dynamic urban environments that reflect different real-world scenarios faced by e-taxi drivers. In all of these environments, we conduct extensive experiments to validate that the proposed MFHRL framework greatly outperforms all baselines by significantly increasing the rewards obtained by e-taxi drivers. Moreover, we show that the charging policy learned by MFHRL effectively reduces e-taxi drivers’ range anxiety, which significantly boosts their quality of experience.