To accurately predict the short-term passenger flow of urban rail transit at multiple stations, this paper proposes a deep learning model, SAE-ConvLSTM, which combines convolutional long short-term memory (ConvLSTM) with a stacked autoencoder (SAE). Thirteen external factors related to passenger flow are considered; the successive layers of the SAE extract their features to obtain a more representative encoding. ConvLSTM extracts the spatiotemporal features of passenger flow, which are then combined with the encoded external factors to predict the short-term passenger flow of multiple stations simultaneously. Latent action Monte Carlo tree search (LA-MCTS) is used to optimize the parameters of the SAE. Compared with the genetic algorithm (GA), particle swarm optimization (PSO), simulated annealing (SA) and tabu search (TS), LA-MCTS performed best in terms of both effectiveness and efficiency. Extensive experiments were conducted. The results show that SAE-ConvLSTM outperforms shallow machine learning models (back-propagation neural network (BPNN), support vector regression (SVR) and the autoregressive integrated moving average model (ARIMA)) as well as deep learning models (long short-term memory network (LSTM), convolutional neural network (CNN), ConvLSTM without external features, ConvLSTM with external features but without SAE, CNN + LSTM, and CNN + LSTM with external features) in terms of root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and goodness of fit (R²). [ABSTRACT FROM AUTHOR]
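For readers unfamiliar with the four evaluation measures named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' evaluation code: the function name `evaluation_metrics` and the sample passenger counts are hypothetical, and MAPE as written assumes every observed value is nonzero.

```python
import math


def evaluation_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard definitions only.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Root mean square error: square root of the mean squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean absolute error.
    mae = sum(abs(e) for e in errors) / n

    # Mean absolute percentage error; assumes no observed value is zero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    # Goodness of fit: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Made-up observed and predicted passenger counts, for illustration only.
metrics = evaluation_metrics([100, 200, 300], [110, 190, 300])
```

Lower RMSE, MAE and MAPE indicate smaller prediction errors, while an R² closer to 1 indicates a better fit.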