Mobile robots equipped with camera sensors are required to perceive humans and their actions for safe autonomous navigation. For simultaneous human detection and action recognition, the real-time performance of the robot vision is an important issue. In this paper, we propose a robot vision system in which original images captured by a camera sensor are described by the optical flow. These images are then used as inputs for the human and action classifications. For the image inputs, two classifiers based on convolutional neural networks are developed. Moreover, we describe a novel detector (a local search window) for clipping partial images around the target human from the original image. Since the camera sensor moves together with the robot, the camera movement has an influence on the calculation of optical flow in the image, which we address by further modifying the optical flow for changes caused by the camera movement. Through the experiments, we show that the robot vision system can detect humans and recognize the action in real time. Furthermore, we show that a moving robot can achieve human detection and action recognition by modifying the optical flow. [ABSTRACT FROM AUTHOR]