Descriptor: "OPTICAL flow" / Journal: neurocomputing - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"OPTICAL flow"' showing total 120 results

1. Decoupled spatiotemporal adaptive fusion network for self-supervised motion estimation.

Author: Sun, Zitang, Luo, Zhengbo, and Nishida, Shin'ya
Subjects: *OPTICAL flow, *CONFIDENCE intervals, *IMAGE registration, *CONFIDENCE regions (Mathematics), *ENTROPY (Information theory), *MOTION
Abstract: • Motion estimation is split into two stages for clear and ambiguous image regions. • We measure the matching confidence using entropy distributions and flow checks. • We design a self-supervised learning strategy to deal with low-confidence regions. Optical flow estimation searches for correspondence between two images. In the unsupervised approach, most networks use the feature correlation volume to track the flow, and unsupervised training is achieved through a photometric loss function. However, various complex situations in the natural environment, such as object occlusion, motion blur, the camera being out-of-focus, limited perspective, and variation in lighting conditions, make it challenging to find correspondence accurately, thus complicating unsupervised optical flow estimation. This study decouples the problem into two sub-tasks: one is to search for determined correspondence within a pair of frames, and the other is to cope with mismatched regions due to occlusion, blur, light variation, etc., by introducing more spatial and temporal context information. We propose a multi-frame temporal dynamic model that recursively infers optical flow over causal sequences of arbitrary length. Our innovative approach introduces information entropy and forward–backward consistency checks to measure the confidence regarding the matching of image pairs. To compensate for low-confidence regions, the proposed network adaptively identifies regions with correspondence confidence and utilizes temporal and spatial smoothness assumptions for motion re-prediction. Paired with well-designed simulation of dynamic occlusion pseudo-labels and scene variation, our model can learn a variety of complex scenes in a multi-frame environment to optimize low-confidence regions efficiently. Experimental results demonstrate that the proposed model is able to run at high speed in real-time tasks while maintaining high accuracy, thus achieving state-of-the-art results on Sintel Clean and Final benchmarks. [ABSTRACT FROM AUTHOR]
Published: 2023
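
The abstract of record 1 mentions forward–backward consistency checks and information entropy as confidence measures. The sketch below is not the authors' implementation; it illustrates one widely used form of each idea in plain Python. The function names, the nearest-neighbour lookup, and the thresholds `alpha` and `beta` are assumptions chosen for illustration.

```python
import math


def fb_consistency_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Return a 2D list of booleans, True where forward and backward flow agree.

    flow_fw[y][x] = (u, v) maps frame 1 to frame 2; flow_bw maps frame 2 to frame 1.
    alpha and beta are illustrative thresholds, not values from the paper.
    """
    h, w = len(flow_fw), len(flow_fw[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow_fw[y][x]
            # Nearest-neighbour lookup of the backward flow at the target pixel.
            x2, y2 = round(x + u), round(y + v)
            if not (0 <= x2 < w and 0 <= y2 < h):
                continue  # target leaves the image: treat as low confidence
            ub, vb = flow_bw[y2][x2]
            # For a consistent match the two flows should cancel out.
            diff = (u + ub) ** 2 + (v + vb) ** 2
            bound = alpha * (u * u + v * v + ub * ub + vb * vb) + beta
            mask[y][x] = diff < bound
    return mask


def matching_entropy(scores):
    """Shannon entropy (in nats) of a softmax over candidate matching scores.

    Low entropy means one candidate dominates (a confident match).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

In this sketch a pixel counts as confidently matched when the mask is True and the entropy of its matching scores is low; how the two signals are combined in the paper is not stated in the abstract.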

2. ContextAVO: Local context guided and refining poses for deep visual odometry.

Author: Song, Rujun, Zhu, Ran, Xiao, Zhuoling, and Yan, Bo
Subjects: *VISUAL odometry, *OPTICAL flow
Abstract: Learning-based monocular visual odometry (VO) has lately drawn significant attention for its robustness to camera parameters and environmental variations. The correlation of ego-motion in the local time dimension, denoted as the local context, is crucial for alleviating accumulated errors of VO problems. Unlike most current learning-based methods, our approach, called ContextAVO, focuses on the effectiveness of local contexts to improve the estimation recovered from consecutive multiple optical flow snippets. To retain the pose consistency in the temporal domain, we design the Context-Attention Refining component to adaptively ameliorate current inference by exploiting the continuity of camera motions and aligning corresponding observations with local contexts. Besides, we employ the multi-length window to make ContextAVO more suitable for general scenarios and less dependent on the fixed length of the input snippet. Extensive experiments on outdoor KITTI, Malaga, ApolloScape, and indoor TUM RGB-D datasets have demonstrated that our approach efficiently produces competitive results against classic algorithms. It outperforms state-of-the-art methods by large margins, improving up to 7.40% and 48.56% for translational and rotational estimation, respectively. [ABSTRACT FROM AUTHOR]
Published: 2023

3. DDCNet: Deep dilated convolutional neural network for dense prediction.

Author: Salehi, Ali and Balasubramanian, Madhusudhanan
Subjects: *CONVOLUTIONAL neural networks, *DEEP learning, *OPTICAL flow, *COMPUTER vision, *SPATIAL resolution, *ARCHITECTURAL design
Abstract: Dense pixel matching problems such as optical flow and disparity estimation are among the most challenging tasks in computer vision. Recently, several deep learning methods designed for these problems have been successful. A sufficiently large effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates. In this work, we present a systematic approach to designing network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution. To achieve a larger ERF, we utilized dilated convolutional layers. By aggressively increasing dilation rates in the deeper layers, we were able to achieve a sufficiently large ERF with significantly fewer trainable parameters. We used the optical flow estimation problem as the primary benchmark to illustrate our network design strategy. The benchmark results (Sintel, KITTI, and Middlebury) indicate that our compact networks can achieve comparable performance in the class of lightweight networks. [ABSTRACT FROM AUTHOR]
Published: 2023

4. Underwater self-supervised depth estimation.

Author: Yang, Xuewen, Zhang, Xing, Wang, Nan, Xin, Guoling, and Hu, Wenjie
Subjects: *OPTICAL flow, *ATTENUATION of light, *UNDERWATER exploration, *MONOCULARS, *SUBMERSIBLES
Abstract: • We propose a self-supervised monocular depth estimation framework for underwater scenes. • Underwater light attenuation is used to help the network extract depth changes. • The consistency between depth and optical flow is used to refine the depth map. Accurate underwater depth estimation is a cornerstone of autonomous underwater exploration. However, it is incredibly tricky due to the inherent attenuation characteristics and heavy noise. Fortunately, the depth-changing trend and underwater light attenuation are closely correlated, providing powerful clues for underwater depth estimation. Rather than simulating the underwater attenuation through formulas, we propose an underwater self-supervised depth estimation neural network in our work. With the guidance of multiple constraints, which are meticulously designed based on comprehensive analyses of underwater characteristics, this network can learn the depth-changing trend by itself from attenuation information in underwater monocular videos. Our detailed experiments on underwater datasets show that the proposed framework can obtain accurate and fine-grained depth maps. We believe the work may provide an economical solution for underwater perception. [ABSTRACT FROM AUTHOR]
Published: 2022

5. OF-DFN: Optical flow prediction network for different perspective image fusion.

Author: You, Tianshun, Liu, Ming, Zhao, Yongming, and Dong, Liquan
Subjects: *OPTICAL flow, *IMAGE fusion, *FEATURE extraction, *IMAGE registration
Abstract: Currently, non-decision-level image fusion algorithms require extremely high registration precision of the images to be fused. In the face of different perspective image fusion scenarios, traditional feature registration algorithms and learning-based methods have poor robustness and are unsuitable for large image differences because of the Registration-Fusion separation. In addition, the lack of relevant datasets also hinders the development of different perspective image fusion methods. Given the above problems, we collect 5000 sets of different perspective RGB-MONO datasets in multiple scenes for raw data support. We present an end-to-end learned system for fusing two different perspective photographs into a chosen target view. The cascaded feature extraction based on encoder–decoder structure enables learning optical flow at different feature levels systematically. Then the optical flow module enables the image to be continuously registered and optimized during the fusion process, thus avoiding the deviations introduced by non-end-to-end algorithms. Extensive quantitative and qualitative experiments demonstrate that our proposed system can effectively fuse images from different perspectives in our self-built dataset. Compared with non-end-to-end fusion, our method provides superior performance in several fusion evaluation indicators. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Multi-YOLOv8: An infrared moving small object detection model based on YOLOv8 for air vehicle.

Author: Sun, Shizun, Mo, Bo, Xu, Junwei, Li, Dawei, Zhao, Jie, and Han, Shuo
Subjects: *OPTICAL flow, *INFRARED imaging
Abstract: The detection of infrared moving small objects faces significant challenges in the field of object detection for air vehicles. These types of objects usually occupy a small number of pixels in an infrared image, resulting in limited feature information, considerable feature loss, low recognition accuracy, and various challenges in single-frame detection. To address these challenges, this paper proposes an efficient multi-input method named Multi-YOLOv8, which is based on the YOLOv8s model. The proposed method uses current frames as a primary input and incorporates optical flow processing images and background suppression images as auxiliary inputs to improve detection performance. In addition, an improved method is developed for optical flow computations, named the pyramidal weight-momentum Horn–Schunck (PWMHS) method, which can process optical flows efficiently and precisely. An improved version of the Wise-IoU (WIoU) v3, referred to as α*-WIoU v3, is proposed as a bounding box regression (BBR) loss function to optimize the YOLOv8 network. Further, the BiFormer module and lightweight convolution GSConv are introduced to improve the attention to key information for the objects and balance the computational cost and detection performance, respectively. Moreover, a small object detection layer is added to the YOLOv8 network to improve the capability for small object detection. Finally, a warming-up training method that can reduce the dependency on auxiliary inputs and ensure model stability in case of auxiliary input failures is developed. The results of the comprehensive experiments on an open-access dataset reveal that the proposed model outperforms the mainstream models in overall performance. The proposed method can significantly enhance the detection ability of infrared moving small objects. [ABSTRACT FROM AUTHOR]
Published: 2024

7. Needle in a Haystack: Spotting and recognising micro-expressions "in the wild".

Author: Gan, Y.S., See, John, Khor, Huai-Qian, Liu, Kun-Hong, and Liong, Sze-Teng
Subjects: *FACIAL expression, *ARTIFICIAL neural networks, *EMOTION recognition, *OPTICAL flow, *POKER, *HUMAN fingerprints
Abstract: Computational research on facial micro-expressions has long focused on videos captured under constrained laboratory conditions due to the challenging elicitation process and limited samples that are publicly available. Moreover, processing micro-expressions is extremely challenging under unconstrained scenarios. This paper introduces, for the first time, a completely automatic micro-expression "spot-and-recognize" framework that is performed on in-the-wild videos, such as in poker games and political interviews. The proposed method first spots the apex frame from a video by handling head movements and unconscious actions which are typically larger in motion intensity, with alignment employed to enforce a canonical face pose. Optical flow guided features play a central role in our method: they can robustly identify the location of the apex frame, and are used to learn a shallow neural network model for emotion classification. Experimental results demonstrate the feasibility of the proposed methodology, establishing good baselines for both spotting and recognition tasks – ASR of 0.33 and F1-score of 0.6758 respectively on the MEVIEW micro-expression database. In addition, we present comprehensive qualitative and quantitative analyses to further show the effectiveness of the proposed framework, with new suggestion for an appropriate evaluation protocol. In a nutshell, this paper provides a new benchmark for apex spotting and emotion recognition in an in-the-wild setting. [ABSTRACT FROM AUTHOR]
Published: 2022

8. A comparative study on optical flow for facial expression analysis.

Author: Allaert, B., Ward, I.R., Bilasco, I.M., Djeraba, C., and Bennamoun, M.
Subjects: *FACIAL expression, *OPTICAL flow, *COMPARATIVE studies, *DATA augmentation
Abstract: Optical flow techniques are becoming increasingly performant and robust when estimating motion in a scene, but their performance has yet to be proven in the area of facial expression recognition. In this work, a variety of optical flow approaches are evaluated across multiple facial expression datasets, so as to provide a consistent performance evaluation. The aim of this work is not to propose a new expression recognition technique, but to better understand the adequacy of existing state-of-the-art optical flow for encoding facial motion in the context of facial expression recognition. Our evaluations highlight the fact that motion approximation methods used to overcome motion discontinuities have a significant impact when optical flows are used to characterize facial expressions. [ABSTRACT FROM AUTHOR]
Published: 2022

9. Motion cues guided feature aggregation and enhancement for video object segmentation.

Author: Li, Xuejun, Zheng, Wenming, and Zong, Yuan
Subjects: *CONVOLUTIONAL neural networks, *MOTION, *OPTICAL flow
Abstract: Video object segmentation (VOS) aims to separate unknown target objects from various given video sequences. Although many recent successful methods boosted the performance of VOS, especially those using deep convolution neural networks (CNNs), it is still difficult to aggregate deep features as well as motion cues effectively, which can be important to associate valid information of adjacent frames in video sequences. To tackle this problem, we propose a simple yet effective feature optimization method for VOS based on motion information. To achieve this, we construct a two-branch deep network and use computed motion cues (i.e., optical flow) to jointly optimize global and local interframe correlation information. Additionally, a clustering-based feature enhancement module is proposed to further fuse motion information and enhance the feature saliency of the target area. Optimized feature maps show a significant performance improvement in the final VOS tasks, especially those with rapid target movement. Experiments on the DAVIS16, DAVIS17, YouTube-Objects and YouTube-VOS datasets demonstrate that our simple feature aggregation and enhancement method for VOS improves segmentation accuracy effectively and gains an impressive result compared to many state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2022

10. Quality enhancement of compressed screen content video by cross-frame information fusion.

Author: Huang, Jiawang, Cui, Jinzhong, Ye, Mao, Li, Shuai, and Zhao, Yu
Subjects: *OPTICAL flow, *INTERNET traffic, *FEATURE extraction, *ONLINE education, *VIDEOS
Abstract: In recent years, with the rise of various online learning platforms and game live broadcast industries, screen content video, a special type of video, is gradually emerging, and its traffic on the Internet is also increasing. Therefore, how to effectively enhance the quality of the screen content video has become an urgent problem to be solved. There exist a few successful compressed video enhancement algorithms. However, since there are a large number of areas with similar colors in the compressed screen content video, the traditional algorithms based on optical flow and deformable convolution cannot align the screen content video frames well. Specifically, for screen content videos containing animations and games, we propose a screen content video quality enhancement network based on the cross-fusion of multi-frame information. It includes a feature extraction module, a feature fusion module, an edge detail recovery module, and a reconstruction module. Our main contribution is the alignment-free quality enhancement framework based on cross-frame information fusion instead of traditional alignment based approaches. Through our experiments, the best results have been achieved on 13 screen content videos containing animations and games compressed by the SCC branch of HEVC/H.265. [ABSTRACT FROM AUTHOR]
Published: 2022

11. Applications of fractional calculus in computer vision: A survey.

Author: Arora, Sugandha, Mathur, Trilok, Agarwal, Shivi, Tiwari, Kamlesh, and Gupta, Phalguni
Subjects: *FRACTIONAL calculus, *INTERNET surveys, *IMAGE recognition (Computer vision), *IMAGE processing, *COMPUTER vision, *DISCONTINUOUS functions, *OPTICAL flow
Abstract: Fractional calculus is an abstract idea exploring interpretations of differentiation having non-integer order. For a very long time, it was considered as a topic of mere theoretical interest. However, the introduction of several useful definitions of fractional derivatives has extended its domain to applications. Supported by computational power and algorithmic representations, fractional calculus has emerged as a multifarious domain. It has been found that the fractional derivatives are capable of incorporating memory into the system and thus suitable to improve the performance of locality-aware tasks such as image processing and computer vision in general. This article presents an extensive survey of fractional-order derivative-based techniques that are used in computer vision. It briefly introduces the basics and presents applications of the fractional calculus in six different domains viz. edge detection, optical flow, image segmentation, image de-noising, image recognition, and object detection. The fractional derivatives ensure noise resilience and can preserve both high and low-frequency components of an image. The relative similarity of neighboring pixels can get affected by an error, noise, or non–homogeneous illumination in an image. In that case, the fractional differentiation can model special similarities and help compensate for the issue suitably. The fractional derivatives can be evaluated for discontinuous functions, which help estimate discontinuous optical flow. The order of the differentiation also provides an additional degree of freedom in the optimization process. This study shows the successful implementations of fractional calculus in computer vision and contributes to bringing out challenges and future scopes. [ABSTRACT FROM AUTHOR]
Published: 2022
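
The abstract of record 11 notes that fractional derivatives incorporate memory into a system. As a hedged illustration (not taken from the survey), the sketch below uses the Grünwald–Letnikov definition, one standard discretization of a fractional derivative. The function names `gl_weights` and `gl_derivative` are hypothetical.

```python
def gl_weights(alpha, n):
    """First n Grünwald–Letnikov coefficients w_k = (-1)^k * C(alpha, k).

    Computed with the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w


def gl_derivative(samples, alpha, h=1.0):
    """Approximate the order-alpha derivative at the most recent sample.

    samples: function values on a uniform grid with spacing h, oldest first.
    Every past sample contributes, weighted by gl_weights.
    """
    w = gl_weights(alpha, len(samples))
    acc = sum(wk * fk for wk, fk in zip(w, reversed(samples)))
    return acc / h ** alpha
```

For integer order the weights vanish after a few terms (order 1 gives the ordinary backward difference), whereas for a non-integer order such as 0.5 every weight is nonzero, so the whole history of samples influences the result. That is the memory property the abstract refers to.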

12. High-resolution optical flow and frame-recurrent network for video super-resolution and deblurring.

Author: Fang, Ning and Zhan, Zongqian
Subjects: *OPTICAL flow, *IMAGE stabilization, *CONVOLUTIONAL neural networks, *CAMCORDERS, *DEEP learning, *VIDEO processing
Abstract: In recent years, advances in deep learning have brought huge developments to the study of super-resolution reconstruction. However, most super-resolution methods only deal with simply down-sampled sharp images, and may lose efficacy when encountering severe blur. Severe motion blur caused by the rapid movement of an object or a large shake of the lens is common in video captured by cameras. Existing super-resolution algorithms often introduce many artifacts and struggle to achieve satisfactory results when reconstructing such blurred video sequences. In this paper, a novel convolutional neural network is proposed that jointly processes video super-resolution (SR) and deblurring (DB) to deal with severe motion blur and recover sharp high-resolution (HR) frames. In particular, a pyramid optical flow module is introduced to estimate the sharp latent image in the blurred frame and generate HR optical flow in a coarse-to-fine way. Then, the frame-recurrent network is used to warp the previous SR frame to achieve motion compensation and make full use of the previous sharp features and temporal information to help restoration of the subsequent frames. Next, to further overcome the destruction caused by motion blur in the final reconstruction, a parallel-fusion module is designed to extract and fuse the SR and DB features, finally reconstructing the output frame. Experimental results obtained in this study confirm that, compared with other advanced SR algorithms, the proposed method is both effective and efficient in dealing with videos that contain real motion blur. [ABSTRACT FROM AUTHOR]
Published: 2022

13. Spatial-temporal 3D dependency matching with self-supervised deep learning for monocular visual sensing.

Author: Song, Chengqun, Niu, Maolong, Liu, Zhaopeng, Cheng, Jun, Wang, Peng, Li, Hongjian, and Hao, Luoying
Subjects: *VISUAL learning, *DEEP learning, *OPTICAL flow, *SUPERVISED learning, *MONOCULARS, *CAMERAS
Abstract: Monocular visual sensing is the task of using a camera to estimate the scene depth, optical flow and camera pose. In this paper, we propose a spatial–temporal 3D dependency matching approach that enforces the robustness of continuous frame matching for monocular visual sensing. 3D structure and warped depth based geometry backpropagation are used to encourage jointly learning the view depth, optical flow and camera pose, employing a novel self-supervised neural network trained on monocular sequences. We designed two different iterative convolutional prediction sub-networks, where the optical flow obtained by depth and camera pose is iteratively used for depth prediction. A virtual frame method is proposed to optimize the optical flow of moving objects. The salient feature of the proposed learning framework is that it is completely unsupervised, requiring only consecutive monocular images for training and testing. Evaluation on public benchmark datasets shows that our unsupervised learning model significantly outperforms previous unsupervised methods and achieves results comparable with supervised ones. [ABSTRACT FROM AUTHOR]
Published: 2022

14. TSUDepth: Exploring temporal symmetry-based uncertainty for unsupervised monocular depth estimation.

Author: Zhu, Yufan, Ren, Rui, Dong, Weisheng, Li, Xin, and Shi, Guangming
Subjects: *OPTICAL flow, *DEEP learning, *MONOCULARS, *DISTILLATION, *SYMMETRY
Abstract: When faced with occlusions and non-rigid motions, machines often struggle with depth estimation, a task effortlessly performed by humans with just one eye. Continuous RGB images embody rich temporal features, such as symmetry and optical flow, which current deep-learning models fail to effectively leverage. In response to this limitation, we introduce an innovative framework known as Temporal Symmetry-based Uncertainty (TSU)-Depth, aimed at enhancing the accuracy of unsupervised monocular depth estimation. The Temporal Symmetry-based Occlusion Optimization (TSOO) component plays a pivotal role in robustly identifying occluded regions and comparable optimization across adjacent frames. Simultaneously, we propose Temporal Optical Flow Masking (TOFM) to effectively identify and exclude static pixels (such as out-of-range depths and non-rigid objects) between adjacent frames. Additionally, we introduce Cross-Resolution Distillation (CRED) to enhance depth estimation accuracy across various resolutions, especially in low input resolution scenarios. Furthermore, we designed a new depth estimation structure utilizing the DPT structure and incorporating a GRU module to enhance performance details. Through extensive experiments on benchmark datasets, including KITTI, Cityscapes, and Make3D, our TSUDepth framework has consistently demonstrated state-of-the-art performance. Code is available at https://github.com/BlueEg/TSUDepth/. [ABSTRACT FROM AUTHOR]
Published: 2024

15. Progressively real-time video salient object detection via cascaded fully convolutional networks with motion attention.

Author: Zheng, Qingping, Li, Ying, Zheng, Ling, and Shen, Qiang
Subjects: *OBJECT recognition (Computer vision), *MOTION, *VIDEOS, *OPTICAL pattern recognition, *MACHINE learning, *PATTERN recognition systems, *OPTICAL flow
Abstract: Semantics and motion are two cues of essence for the success in video salient object detection. Most existing deep-learning based approaches extract semantic features by the use of only one fully convolutional network with simple stacked encoders. They simulate motion patterns of video objects with two consecutive frames being simultaneously fed into a convolutional LSTM network or a weights-sharing fully convolutional network. However, such approaches have the shortcomings of producing a coarse predicted saliency map or requiring significant computational overheads. In this paper, we present a novel approach with cascaded fully convolutional networks involving motion attention (abbreviated as CFCN-MA), to achieve real-time saliency detection in videos. Our key idea is to construct twofold fully convolutional networks in order to gain a saliency map from coarse to fine. We devise an optical flow-based motion attention mechanism to improve the prediction accuracy of the initial fully convolutional networks, using the popular FlowNet2-SD model that is efficient and effective for motion pattern recognition of distinctive objects in videos. This method can obtain a fine saliency map with a refined region of interest. Moreover, we propose a means for calculating attention-guided intersection-over-union loss (shortnamed as AIoU) to supervise the CFCN-MA model in learning a saliency map with both clear edge and complete structure. Our approach is evaluated on three popular benchmark datasets, namely DAVIS, ViSal and FBMS. Experimental results demonstrate that our method outperforms many state-of-the-art techniques while meeting the real-time demand at 27 fps. [ABSTRACT FROM AUTHOR]
Published: 2022

16. DeepAVO: Efficient pose refining with feature distilling for deep Visual Odometry.

Author: Zhu, Ran, Yang, Mingkun, Liu, Wang, Song, Rujun, Yan, Bo, and Xiao, Zhuoling
Subjects: *VISUAL odometry, *CONVOLUTIONAL neural networks, *OPTICAL flow, *FEATURE selection, *DEEP learning, *ALGORITHMS
Abstract: The technology for Visual Odometry (VO) that estimates the position and orientation of the moving object through analyzing the image sequences captured by on-board cameras, has been well investigated with the rising interest in autonomous driving. This paper studies monocular VO from the perspective of Deep Learning (DL). Unlike most current learning-based methods, our approach, called DeepAVO, is established on the intuition that features contribute discriminately to different motion patterns. Specifically, we present a novel four-branch network to learn the rotation and translation by leveraging Convolutional Neural Networks (CNNs) to focus on different quadrants of optical flow input. To enhance the ability of feature selection, we further introduce an effective channel-spatial attention mechanism to force each branch to explicitly distill related information for specific Frame to Frame (F2F) motion estimation. Experiments on various datasets involving outdoor driving and indoor walking scenarios show that the proposed DeepAVO outperforms the state-of-the-art monocular methods by a large margin, demonstrating competitive performance to the stereo VO algorithm and verifying promising potential for generalization. [ABSTRACT FROM AUTHOR]
Published: 2022

17. Self-supervised multi-body scene flow estimation.

Author: Dai, Jihuang, Dai, Yuchao, and Fan, Bin
Subjects: *OPTICAL flow, *RIGID bodies, *MULTIBODY systems, *MOTION
Abstract: In this paper, we address the problem of scene flow estimation from consecutive stereo pairs. In contrast to the state-of-the-art supervised learning based methods, we propose a self-supervised learning based pipeline that removes the requirement of large-scale ground truth annotations. Specifically, we employ a shared encoder StereoFlowNet to simultaneously learn optical flow estimation and disparity estimation, which not only achieves a compact network representation but also exploits the inherent connections between optical flow estimation and disparity estimation. To leverage the scene structure and motion representations, we propose to utilize a piece-wise planar model based disparity computation and multiple rigid body motion representation of the dynamic scene. In this way, the geometric and motion constraints play strong regularizations for the underlying problem. Experimental results on benchmarking dataset show that our proposed method achieves state-of-the-art performance in both optical flow and disparity estimation. [ABSTRACT FROM AUTHOR]
Published: 2021

18. Silicone mask face anti-spoofing detection based on visual saliency and facial motion.

Author: Wang, Guangcheng, Wang, Zhongyuan, Jiang, Kui, Huang, Baojin, He, Zheng, and Hu, Ruimin
Subjects: *FACE, *AUTOMATED teller machines, *SILICONES, *OPTICAL flow, *SUPPORT vector machines
Abstract: Face recognition systems are widely used for target recognition and identity authentication, such as in automated teller machines, mobile phones, and entrance guard systems. However, face recognition systems are vulnerable to presentation attacks, such as photo, replay, and 3D mask attacks. In particular, silicone mask attacks pose a greater threat to face recognition systems because high-quality silicone masks have lifelike properties. To promote the development of face anti-spoofing detection algorithms for silicone mask attacks, this paper constructs a Silicone Mask Face Motion Video Dataset (SMFMVD) containing 200 real face videos and 200 silicone mask face videos. These videos include different facial motions collected from 20 subjects. Moreover, inspired by the observation that the facial movement of a silicone mask face is not as natural as that of a real face, we propose a novel silicone mask face anti-spoofing detection method based on visual saliency and facial motion characteristics. Specifically, we compute the visual saliency map of a given face image by simulating two kinds of eye movement behaviors, namely "gaze" and "saccade". Then, we propose a saliency-weighted histogram of local binary pattern operator to extract facial texture features in the spatial domain and a saliency-guided histogram of oriented optical flow operator to extract facial motion features in the temporal domain. Finally, a support vector machine is used to fuse the two groups of facial features to distinguish real and spoof faces. Extensive experiments on public and self-built datasets show its superiority over the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2021
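
The abstract of record 18 refers to a saliency-guided histogram of oriented optical flow. The sketch below is a simplified orientation histogram written for illustration only; it is not the authors' operator. The function name `oriented_flow_histogram`, the uniform binning over the full circle, and the optional `weights` argument (standing in for per-pixel saliency) are all assumptions.

```python
import math


def oriented_flow_histogram(flow, n_bins=8, weights=None):
    """Normalized histogram of flow directions, weighted by flow magnitude.

    flow: iterable of (u, v) vectors.
    weights: optional per-vector weights (for example saliency values;
    hypothetical here). Each vector votes into the bin of its direction
    with weight * magnitude. The result sums to 1 unless all flow is zero.
    """
    flow = list(flow)
    if weights is None:
        weights = [1.0] * len(flow)
    hist = [0.0] * n_bins
    for (u, v), wgt in zip(flow, weights):
        mag = math.hypot(u, v)
        if mag == 0.0:
            continue  # a zero vector has no direction
        angle = math.atan2(v, u) % (2 * math.pi)  # direction in [0, 2*pi]
        b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
        hist[b] += wgt * mag
    total = sum(hist)
    return [x / total for x in hist] if total > 0 else hist
```

A descriptor of this kind is a fixed-length vector, so it can be passed to a classifier such as the support vector machine mentioned in the abstract regardless of how many pixels contributed.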

19. Effective template update mechanism in visual tracking with background clutter.

Author: Liu, Shuai, Liu, Dongye, Muhammad, Khan, and Ding, Weiping
Subjects: *ARTIFICIAL satellite tracking, *ARTIFICIAL intelligence, *SEQUENTIAL learning, *PROBLEM solving, *ALGORITHMS, *RADAR in aeronautics, *OPTICAL flow
Abstract: Today, artificial intelligence is everywhere in people's daily lives. Visual tracking, which is used to identify and continuously track specific targets, is an important research domain in the study of artificial intelligence. However, current visual tracking methods are not accurate enough for object tracking with background clutter, which can easily lead to tracking failures. Therefore, in this paper, in order to solve the problem of tracking failure in cluttered backgrounds, we propose a template update mechanism to improve the accuracy of visual tracking. First, an original template is saved when the background clutter is detected. During background clutter, we use both the original template and the current template at the location estimated by the optical flow and choose the better one. Next, the original template is reused after the background clutter has ended. Finally, the proposed mechanism is used in both the KCF and BACF algorithms to verify the effectiveness of the mechanism. Experiments on the OTB2015 dataset show that the proposed mechanism improves the accuracy and success rate of the two baseline algorithms. Compared with state-of-the-art algorithms, the algorithm using the proposed mechanism also achieves excellent tracking performance. In addition, this method also has strong tracking robustness and adaptation capability to sequential learning for video data. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
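
Illustrative sketch (not part of the catalog record): the abstract above describes choosing between a saved original template and the current template during background clutter. The snippet below shows one way such a choice could look, assuming the image patch at the flow-estimated location has already been extracted and flattened. Normalized cross-correlation is used here as a stand-in similarity score; the paper applies its mechanism inside KCF and BACF, whose response scores are not reproduced. All names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length flattened patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant patch has zero variance; treat it as uncorrelated.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

def choose_template(patch, original, current):
    """Return the name and score of the template that matches `patch` better.

    patch    : flattened pixels at the location estimated by optical flow
    original : template saved when clutter was first detected
    current  : most recently updated template
    """
    score_original = ncc(patch, original)
    score_current = ncc(patch, current)
    if score_original >= score_current:
        return "original", score_original
    return "current", score_current
```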

20. Spatial–temporal pooling for action recognition in videos.

Author: Wang, Jiaming, Shao, Zhenfeng, Huang, Xiao, Lu, Tao, Zhang, Ruiqian, and Lv, Xianwei
Subjects: *FERRIES, *CONVOLUTIONAL neural networks, *OPTICAL flow, *SOURCE code, *INFORMATION modeling
Abstract: • We propose an end-to-end approach with a novel temporal-spatial pooling block (named STP) for action classification, which can learn to pool discriminative frames and pixels in a certain clip. Our method achieves better performance than other state-of-the-art methods. • We propose an STP loss function, aiming to learn a sparse importance score in the temporal dimension, abandoning the redundant or invalid frames. • We present a ferryboat video database (named Ferryboat-4) for ferry action recognition. The database includes four action categories: Inshore, Offshore, Traffic, and Negative. We evaluate the proposed STP and other state-of-the-art models on this database. In the past decade, deep convolutional neural networks have demonstrated great effectiveness in action recognition with both RGB and optical flow. However, existing studies generally treat all frames and pixels equally, potentially leading to poor robustness of models. In this paper, we propose a novel parameter-free spatial–temporal pooling block (referred to as STP) for action recognition in videos to address this challenge. STP is proposed to learn spatial and temporal weights, which are further used to guide information compression. Different from other temporal pooling layers, STP is more efficient as it discards the non-informative frames in a certain clip. In addition, STP applies a novel loss function that forces the model to learn information from sparse and discriminative frames. Moreover, we introduce a dataset for ferry action classification, named Ferryboat-4, which includes four categories: Inshore, Offshore, Traffic, and Negative. This dataset can be used for the identification of ferries with abnormal behaviors, providing the essential information to support the supervision, management, and monitoring of ships. All the videos are acquired via real-world cameras. We perform extensive experiments on publicly available datasets as well as Ferryboat-4 and find that the proposed method outperforms several state-of-the-art methods in action classification. Source code and datasets are available at https://github.com/jiaming-wang/STP. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

21. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos.

Author: Zhang, Cong, Wang, Qi, and Li, Xuelong
Subjects: *DEEP learning, *STREAMING video & television, *MOTION detectors, *OBJECT recognition (Computer vision), *OPTICAL flow, *VIDEOS, *REGISTRATION of automobiles, *DISTRACTED driving
Abstract: License plate detection and recognition (LPDR) has attracted considerable attention in recent years, and many algorithms have shown competitive performance on several datasets. However, there are still three significant issues to be addressed in this field. Firstly, most methods have poor detection performance in unconstrained scenarios with moving vehicles and highly distracting background objects. Secondly, existing systems generally focus on single image-based algorithms, yet traffic video sequences provide more effective information than individual frames for LPDR tasks. Thirdly, images and videos captured in complex environments may be adversely affected by distortions and low resolution, causing unstable recognition performance and reduced robustness. To remedy these issues, we propose to automatically perform license plate detection, tracking, and recognition in real-world traffic videos and integrate them into a unified end-to-end framework via deep learning. The contributions of this paper are threefold: 1) A deep flow-guided spatiotemporal license plate detector is proposed to model the video contextual information by introducing optical flow and a novel spatiotemporal attention mechanism; 2) An online license plate tracker is developed to bridge video-based detection and recognition, which utilizes both motion and deep appearance information and, innovatively, can be trained end-to-end with the detector via multi-task learning; 3) An efficient quality-guided license plate recommender and recognizer are proposed to jointly perform stream recognition. The former recommends high-quality frames from video streams while the latter generates recognition results. We evaluate the proposed method on three traffic video-based license plate datasets, and ablation studies are presented to verify the effectiveness of each component mentioned above. Moreover, extensive experiments are conducted for comparison with other approaches in different scenarios, and the results demonstrate that our method achieves state-of-the-art performance on all datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

22. SCENT: A new precipitation nowcasting method based on sparse correspondence and deep neural network.

Author: Fang, Wei, Zhang, Feihong, Sheng, Victor S., and Ding, Yewen
Subjects: *OPTICAL flow, *ODORS, *METEOROLOGICAL research, *WIND speed, *SOCIAL development, *RADAR
Abstract: Precipitation nowcasting is an important research topic in meteorology, which relates to many aspects of people's life and social development. Under the combined influence of resolution and corresponding timestamp, a certain nonlinear relationship is satisfied between the echo intensity and precipitation. Therefore, the short-term precipitation prediction scheme based on radar echo extrapolation has become the main research method. By analyzing the spatiotemporal characteristics of the radar echo images, we found that precipitation results are related not only to currently observed radar echo images but also to some non-image features such as wind speed and shape of cloud clusters. Inspired by the optical flow method, combined with the characteristics of radar reflectivity, we propose a new method, SCENT, to achieve precipitation prediction. Firstly, a sparse correspondence method based on FAST feature detection and SIFT matching is used to extrapolate radar echoes and to extract the non-image influence features. Afterwards, an improved neural network is utilized for regression calculation to obtain the total precipitation. Compared with existing prediction models based on deep neural networks, our new method can make precipitation nowcasting more accurate. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

23. A fast human action recognition network based on spatio-temporal features.

Author: Xu, Jie, Song, Rui, Wei, Haoliang, Guo, Jinhong, Zhou, Yifei, and Huang, Xiwei
Subjects: *HUMAN activity recognition, *HUMAN behavior, *OPTICAL flow, *ARTIFICIAL intelligence, *FEATURE extraction
Abstract: Artificial intelligence models are widely used in the field of human activity recognition, and human action recognition is an important aspect of human activity recognition. The core of human action recognition is to understand the temporal relationship between video frames. Almost all state-of-the-art methods of human action recognition in videos use optical flow. However, traditional local optical flow estimation methods are expensive and not trained end-to-end. In this paper, we propose a fast network for human action recognition. Our purpose is to improve the efficiency of optical flow feature extraction and explore the fusion method of spatio-temporal features. For spatio-temporal features, our method combines spatial features and temporal features into fusion features. In addition, we propose CNN with OFF instead of the VGG16 network, which is used to process optical flow features to obtain abundant features. Our model only needs RGB inputs to get the state-of-the-art accuracy of 91.5% on UCF-101, 67.9% on HMDB51, 83.3% on MSR Daily Activity3D, and 91.25% on Florence 3D action, respectively. Compared with most state-of-the-art video action recognition models, our proposed model can effectively improve the accuracy of human action recognition. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

24. Two-stream deep spatial-temporal auto-encoder for surveillance video abnormal event detection.

Author: Li, Tong, Chen, Xinyue, Zhu, Fushun, Zhang, Zhengyu, and Yan, Hua
Subjects: *VIDEO surveillance, *OPTICAL flow, *ANOMALY detection (Computer security), *VIDEO on demand
Abstract: With the improvement of public security awareness, video anomaly detection has become an indispensable demand in surveillance videos. To improve the accuracy of video anomaly detection, this paper proposes a novel two-stream spatial-temporal architecture called Two-Stream Deep Spatial-Temporal Auto-Encoder (Two-Stream DSTAE), which is composed of a spatial stream DSTAE and a temporal stream DSTAE. Firstly, the spatial stream extracts appearance characteristics whereas the temporal stream extracts the motion patterns, respectively. Then, based on the novel policy joint reconstruction error, this model fuses the spatial stream and the temporal stream to extract spatial-temporal characteristics to detect anomalies. Furthermore, since the optical flow is invariant to appearances such as color or light, we introduce optical flow to enhance the capability of extracting continuity between adjacent frames and inter-frame motion information. We demonstrate the accuracy of the proposed method on the publicly available standard datasets: UCSD, Avenue and UMN datasets. Our experiments demonstrate high accuracy, which is superior to the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

25. Video anomaly detection with multi-scale feature and temporal information fusion.

Author: Cai, Yiheng, Liu, Jiaqi, Guo, Yajun, Hu, Shaobin, and Lang, Shinan
Subjects: *ANOMALY detection (Computer security), *OPTICAL flow, *TIME-varying networks, *STRUCTURAL frames, *VIDEOS
Abstract: Video anomaly detection is a challenging task because of the uncertainty of abnormal events. The current method based on predictive frames has obtained better detection results compared with the previous reconstruction or hand-crafted methods. In current prediction methods, the characteristics considered previously are only of a single scale, and the time constraint information is not fully used. In our work, we proposed a new framework structure to achieve better abnormality detection rate. To address the objects of different scales in each video frame, we considered extracting the characteristics of different receptive fields to encode more spatial information. At the same time, we added temporal constraints to the network instead of using time-consuming optical flow information, and we completed the memory of temporal features through a ConvGRU module. Furthermore, while distinguishing abnormal events, we also considered temporal information and spatial information so that our framework could fully combine spatio-temporal information to correctly distinguish abnormal events from normal events. We obtained excellent results on three datasets, thus demonstrating the effectiveness of our method. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

26. An adaptive training-less framework for anomaly detection in crowd scenes.

Author: Sikdar, Arindam and Chowdhury, Ananda S.
Subjects: *ANOMALY detection (Computer security), *OPTICAL flow, *COMPUTER vision, *CROWDS, *PEDESTRIANS
Abstract: Anomaly detection in crowd videos has become a popular area of research for the computer vision community. Several existing methods have determined anomaly as a deviation from scene normalcy learned via separate training with/without labeled information. However, owing to the rare and sparse nature of anomalous events, any such learning can be misleading as there exists no hard separation between anomalous and non-anomalous events. To address this challenge, we propose an adaptive training-less system capable of detecting anomalies on-the-fly. Our solution pipeline consists of three major components, namely, an adaptive 3D-DCT model for multi-object detection-based association, local motion descriptor generation through an improved saliency-guided optical flow, and anomaly detection based on Earth mover's distance (EMD). The proposed model, despite being training-free, is found to achieve comparable performance with several state-of-the-art methods on publicly available UCSD, UMN, CUHK-Avenue and ShanghaiTech datasets. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
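
Illustrative sketch (not part of the catalog record): the abstract above bases anomaly detection on the Earth mover's distance (EMD). For two histograms over the same ordered bins with unit spacing, the EMD reduces to the summed absolute difference of their cumulative distributions. The snippet below implements only that one-dimensional special case; how the paper builds its motion descriptors or thresholds the distance is not shown, and the function name is hypothetical.

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms.

    Assumes both histograms share the same ordered bins with unit spacing.
    Each histogram is normalized to total mass 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_gap = 0.0   # running difference between the two cumulative sums
    distance = 0.0
    for a, b in zip(p, q):
        cdf_gap += a / sum_p - b / sum_q
        distance += abs(cdf_gap)  # mass that must cross this bin boundary
    return distance
```

A larger value means more motion mass has to be moved, and moved further, to turn one histogram into the other.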

27. Fusing motion patterns and key visual information for semantic event recognition in basketball videos.

Author: Wu, Lifang, Yang, Zhou, Wang, Qi, Jian, Meng, Zhao, Boxuan, Yan, Junchi, and Chen, Chang Wen
Subjects: *KRONECKER products, *MOTION, *BASKETBALL, *OPTICAL flow, *BASKETBALL games, *VIDEO surveillance
Abstract: Many semantic events in team sport activities, e.g., basketball, often involve both group activities and the outcome (score or not). Motion patterns can be an effective means to identify different activities. Global and local motions have their respective emphasis on different activities, which are difficult to capture from the optical flow due to the mixture of global and local motions. Hence, a more effective way to separate the global and local motions is called for. When it comes to the specific case of basketball game analysis, the successful score for each round can be reliably detected by the appearance variation around the basket. Based on these observations, we propose a scheme to fuse global and local motion patterns (MPs) and key visual information (KVI) for semantic event recognition in basketball videos. Firstly, an algorithm is proposed to estimate the global motions from the mixed motions based on the intrinsic property of camera adjustments. The local motions can then be obtained from the mixed and global motions. Secondly, a two-stream 3D CNN framework is utilized for group activity recognition over the separated global and local motion patterns. Thirdly, the basket is detected and its appearance features are extracted through a CNN structure. The features are utilized to predict the success or failure. Finally, the group activity recognition and success/failure prediction results are integrated using the Kronecker product for event recognition. Experiments on the NCAA dataset demonstrate that the proposed method obtains state-of-the-art performance. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
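
Illustrative sketch (not part of the catalog record): the abstract above integrates group activity recognition with success/failure prediction using the Kronecker product. For two probability vectors, the Kronecker product yields one score per (activity, outcome) pair. The snippet below shows that combination on made-up numbers; the class counts, ordering, and function names are assumptions and are not taken from the paper.

```python
def kron_vec(a, b):
    """Kronecker product of two vectors, returned as a flat list.

    Entry i * len(b) + j equals a[i] * b[j].
    """
    return [x * y for x in a for y in b]

def best_event(joint, n_outcomes):
    """Decode the highest-scoring entry into (activity index, outcome index)."""
    flat_index = joint.index(max(joint))
    return divmod(flat_index, n_outcomes)

# Hypothetical scores: two activity classes and two outcomes (success, failure).
activity_probs = [0.7, 0.3]
outcome_probs = [0.9, 0.1]
joint = kron_vec(activity_probs, outcome_probs)
event = best_event(joint, len(outcome_probs))
```

If both inputs sum to 1, the joint vector also sums to 1, so it can be read as a distribution over combined events.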

28. Video object detection for autonomous driving: Motion-aid feature calibration.

Author: Liu, Dongfang, Cui, Yiming, Chen, Yingjie, Zhang, Jiyong, and Fan, Bin
Subjects: *OPTICAL flow, *CALIBRATION, *VIDEOS, *DEEP learning, *DRIVERLESS cars
Abstract: This paper proposes an end-to-end deep learning framework, termed as motion-aid feature calibration network (MFCN), for video object detection. The key idea is to leverage on the temporal coherence of video features while considering their motion patterns as captured by optical flow. To boost detection accuracy, the framework aggregates the calibrated features both at pixel and instance levels across frames to achieve improved robustness despite appearance variations. The aggregation and calibration are efficiently and adaptively conducted based on an integrated optical flow network. Meanwhile, the entire architecture of the proposed method is end-to-end, thus significantly improving its training and inference efficiency when compared to multi-stage methods for video object detection. Evaluations on KITTI and ImageNet VID indicate that MFCN can improve the results of a strong still-image detector by 11.2% and 7.31% respectively. MFCN also outperforms other competitive video object detectors and achieves a better trade-off between accuracy and runtime speed, demonstrating its potential for use in autonomous driving systems. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

29. Learning channel-wise spatio-temporal representations for video salient object detection.

Author: Huang, Kan, Li, Ge, and Liu, Shan
Subjects: *OBJECT recognition (Computer vision), *CONVOLUTIONAL neural networks, *OPTICAL flow
Abstract: Video salient object detection aims at extracting most attention-grabbing objects in videos, which tends to greatly enhance many vision based tasks such as video understanding. In this work we explore this research issue from a novel perspective, i.e., learning the spatio-temporal representations associated with salient regions in separated feature channels. We propose a Channel-wise Spatio-Temporal Representation learning block (CSTR), which is trained to discriminate between salient spatio-temporal patterns and non-salient spatio-temporal patterns in separated channels. A whole CNN architecture based on this block is constructed for video salient object detection. This architecture combines dynamic saliency learned from CSTR and static saliency learned from a constructed Multi-scale Dilated Convolution block (MDC), deriving the final saliency detection results. This intuitive combination improves feature representation capability which contributes to more precise detection results. Compared with previous works that leverage optical flow or RNNs (LSTM, GRU etc.) to utilize temporal cues, the proposed method is simple to implement and offers an intuitive way to understand how spatio-temporal patterns are correlated with salient regions. Extensive experimental evaluations verify the effectiveness of the insight of the proposed method and confirm that our proposed model outperforms other outstanding methods on four popular benchmarks. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

30. Video super-resolution via dense non-local spatial-temporal convolutional network.

Author: Sun, Wei, Sun, Jinqiu, Zhu, Yu, and Zhang, Yanning
Subjects: *OPTICAL flow, *VIDEOS, *STREAMING video & television
Abstract: In this paper, we present a novel end-to-end deep neural network for the problem of video super-resolution. In contrast to most previous methods, where frames need to be warped for temporal alignment based on the estimated optical flow, we propose short-temporal and bidirectional long-temporal blocks to exploit the spatial-temporal dependencies between frames. The network can effectively model the sudden and smoothly varying motions of videos and overcome the limitations of explicit motion estimation. In addition, by introducing dense feature concatenation, it provides an effective way to combine the low-level and high-level features for boosting the reconstruction of mid/high-frequency information, as shown in our analysis and experiments. Furthermore, we present a region-level non-local feature enhancing structure, which captures the spatial-temporal correlations of any two positions and makes use of long-distance relevant information. Extensive evaluations and comparisons with the current state-of-the-art approaches demonstrate the effectiveness of the proposed framework. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

31. COB method with online learning for object tracking.

Author: Lin, Luyue, Liu, Bo, and Xiao, Yanshan
Subjects: *OPTICAL flow, *ONLINE education, *OBJECT tracking (Computer vision), *DEEP learning, *HUMAN beings, *MATHEMATICAL regularization
Abstract: Object tracking is a problem about semi-supervised learning with insufficient data set. In the field of military navigation and security of public life, it is widely used to take the place of human beings. In this paper, we come up with a new algorithm based on Bayesian, CNN and PLK optical flow, which is called COB method, for object tracking problems. With the idea of track-by-detect, we cascade CNN after PLK optical flow and integrate them in a Bayesian method. Most importantly our method is proposed with an adaptive integrating method to reduce the influence of over-fitting. The integrator also introduces the competition mechanism between tracker and detector, so that the algorithm is able to update the classifier with online learning. Besides, the regularization of deep learning is used to solve the blind spots of classifier. The experimental results show that the algorithm is more robust than the previous work. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

32. A weakly supervised framework for abnormal behavior detection and localization in crowded scenes.

Author: Hu, Xing, Dai, Jian, Huang, Yingping, Yang, Haima, Zhang, Liang, Chen, Wenming, Yang, Genke, and Zhang, Dawei
Subjects: *HYPERPLANES, *WEAK localization (Quantum mechanics), *SUPPORT vector machines, *OPTICAL flow, *ARTIFICIAL neural networks
Abstract: • A weakly supervised approach is proposed for ABDL in crowded scenes. The MISVM-based framework not only can accurately find the optimal separating hyperplane between normal and abnormal behaviors, but also only requires the bag-level label information rather than the complete labels of all samples. • Benefiting from the power of Faster R-CNN, most objects can be accurately localized. By doing so, not only can the abnormal behavior be analyzed object-wise, but the influence of the background can also be eliminated. Consequently, the robustness, generality, and computational efficiency of our approach can be strengthened. • A histogram of large-scale optical flow (HLSOF) is proposed to describe the object behavior, which can extract the most significant optical flow and is insensitive to variations in the size of objects. In this paper, a weakly supervised framework is proposed for Abnormal Behavior Detection and Localization (ABDL) in crowded scenes. First, the objects in the scene, such as pedestrians and vehicles, are detected using the Faster Region-based Convolutional Neural Network (Faster R-CNN); then, the object behavior is described by a Histogram of Large Scale Optical Flow (HLSOF) descriptor; finally, the Multiple Instance Support Vector Machine (MISVM) is trained and then used to identify the testing behaviors as normal or abnormal. In summary, the proposed approach has three main advantages: (1) Benefiting from the Faster R-CNN, our approach can analyze the behavior object-wise, which gives it good generality and high computational efficiency; (2) The HLSOF descriptor can characterize the object behavior efficiently and is insensitive to variations in the size of objects; (3) As a weakly supervised learning framework, the MISVM only requires labels at the bag level rather than the instance level, so our approach achieves accuracy as high as supervised approaches without requiring completely labeled training samples; only the frame-level label is required. Analysis of experimental results on different datasets validates the effectiveness of our approach. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

33. Flow-guided feature enhancement network for video-based person re-identification.

Author: Gong, Weichao, Yan, Bo, and Lin, Chuming
Subjects: *OPTICAL flow, *STREAMING video & television, *VIDEOS, *POSTURE, *MARS (Planet), *SUPERVISED learning
Abstract: Video-based person re-identification associates sequences of the same person among surveillance camera network. Most existing works explore motion and inter-frame information on the features corrupted by spatial noises such as occlusion, blur, posture changes, etc, leading to degraded representation and matching performance. Enhancing features of each frame guarantees a more robust and discriminative final feature representation. In this paper, we propose a novel flow-guided feature enhancement network that leverages flow information to enhance low-level features. Specifically, it improves per-frame features by aggregating with the warped feature under the guidance of optical flow and the enhanced feature of previous frame in spatial attention mechanism. Then, a part-based loss is directly employed on the enhanced features to supervise the aggregation process, which can exert full capability of the network. Experiments on three widely used benchmark datasets: iLIDS-VID, PRID-2011 and MARS, demonstrate that the proposed model achieves superior performance and outperforms most of the recent state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

34. Neuromorphic implementation of motion detection using oscillation interference.

Author: Tsur, Elishai Ezra and Rivlin-Etzion, Michal
Subjects: *VISUAL perception, *OSCILLATIONS, *MOTION, *OPTICAL flow, *IMAGE sensors, *NONLINEAR oscillators
Abstract: Motion detection is paramount for computational vision processing. This is, however, a particularly challenging task for neuromorphic hardware, in which algorithms are based on interconnected spiking entities, as the instantaneous visual stimuli report merely on luminance change. Here we describe a neuromorphic algorithm in which an array of neuro-oscillators is utilized to detect motion and its direction over an entire field of view. These oscillators are induced via phase-shifted Gabor functions, allowing them to oscillate in response to motion in one predefined direction, and to damp to zero otherwise. We developed the algorithm using the Neural Engineering Framework (NEF), making it applicable to a variety of neuromorphic hardware. Our algorithm extends the growing set of approaches that aim to utilize neuromorphic hardware for vision processing, which make it possible to minimize energy consumption and silicon area while enhancing computational capabilities. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

35. Video anomaly detection and localization via multivariate gaussian fully convolution adversarial autoencoder.

Author: Li, Nanjun and Chang, Faliang
Subjects: *ANOMALY detection (Computer security), *ARTIFICIAL neural networks, *MATHEMATICAL convolutions, *OPTICAL flow, *DEEP learning
Abstract: In this paper, we present a novel deep learning based method for video anomaly detection and localization. The key idea of our approach is that the latent space representations of normal samples are trained to accord with a specific prior distribution by the proposed deep neural network - Multivariate Gaussian Fully Convolution Adversarial Autoencoder (MGFC-AAE), while the latent representations of anomalies do not. In order to extract deep features from input samples as latent representations, a convolutional neural network (CNN) is employed for the encoder of the deep network. Based on the probability that the test sample is associated with the prior distribution, an energy-based method is applied to obtain its anomaly score. A two-stream framework is utilized to integrate the appearance and motion cues to achieve more comprehensive detection results, taking the gradient and optical flow patches as inputs for each stream. Besides, a multi-scale patch structure is put forward to handle the perspective of some video scenes. Experiments are conducted on three public datasets, results verify that our framework can accurately detect and locate abnormal objects in various video scenes, achieving competitive performance when compared with other state-of-the-art works. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

36. Optical flow estimation using channel attention mechanism and dilated convolutional neural networks.

Author: Zhai, Mingliang, Xiang, Xuezhi, Zhang, Rongfang, Lv, Ning, and Saddik, Abdulmotaleb El
Subjects: *ARTIFICIAL neural networks, *OPTICAL flow, *CHANNEL estimation, *FEATURE extraction, *MATHEMATICAL convolutions
Abstract: Learning optical flow based on convolutional neural networks has made great progress in recent years. These approaches usually design an encoder-decoder network that can be trained end-to-end. In encoder part, high-level feature information is extracted through a series of strided convolution, which is similar to most image classification networks. In contrast to classification task, spatial feature maps are then enlarged to full scale of input by conducting successive deconvolution layer in decoder part. However, optical flow estimation is a pixel-level task, and blurry flow fields are usually generated, which is caused by unrefined features and low-resolution. To address this problem, we propose a novel network, which combines attention mechanism and dilated convolutional neural network. In this network, the channel-wise features are adaptively weighted by building interdependencies among channels, which can weaken the weights of useless features and can enhance the directivity of feature extraction. Meanwhile, spatial precision is achieved by employing dilated convolution which improves the receptive field without large computational source and keeps the spatial resolution of feature map unchanged. Our network is trained on FlyingChairs and FlyingThings3D datasets in a supervised manner. Extensive experiments are conducted on MPI-Sintel and KITTI datasets to verify the effectiveness of the proposed method. The experimental results show that attention mechanism and dilated convolution are beneficial for optical flow estimation. Moreover, our method achieves better accuracy and visual improvements comparing to most of recent approaches. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
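
Illustrative sketch (not part of the catalog record): the abstract above notes that dilated convolution enlarges the receptive field while keeping the spatial resolution of the feature map unchanged. The snippet below demonstrates this in one dimension with zero padding, as a plain-Python toy rather than the paper's network. The function names are hypothetical, and the operation is the unflipped (cross-correlation) form that deep learning frameworks usually call convolution.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Same-length 1-D dilated convolution with zero padding.

    Kernel taps are spaced `dilation` samples apart, so the output has the
    same length as the input while each output sees a wider input span.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("an odd kernel length is expected")
    half_span = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i - half_span + j * dilation
            if 0 <= pos < len(signal):  # positions outside count as zero
                acc += w * signal[pos]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilation):
    """Input span covered by one output sample of a single dilated layer."""
    return dilation * (kernel_size - 1) + 1
```

With a 3-tap kernel, raising the dilation from 1 to 2 widens the span from 3 to 5 input samples without adding weights or shrinking the output.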

37. Deep network for human action recognition using Weber motion.

Author: Chaudhary, Sachin and Murala, Subrahmanyam
Subjects: *HUMAN behavior, *MOTION, *OPTICAL flow, *HUMAN activity recognition, *SPACETIME
Abstract: Effective motion estimation is one of the prime steps for any human action recognition (HAR) algorithm. Optical flow (OF) and motion history image (MHI) are two well-known methods for motion estimation in videos. OF has several advantages over MHI, but its major drawback is that it is computationally very expensive compared to MHI. Therefore, in this paper, a new motion estimation technique named Weber Motion History Image (WMHI) is proposed. Here, an extremely fast algorithm is proposed for HAR using WMHI, pose information, and a convolutional neural network (CNN). In spite of being fast and less space consuming, the algorithm outperforms the existing pose-based CNN results on five benchmark datasets, namely JHMDB [1], sub-JHMDB [1], MPII [2], HMDB51 [3], and UCF101 [4]. The work mainly focuses on a new efficient algorithm which can be implemented for real-time HAR in videos. For real-time implementation, the two basic criteria on which an algorithm can be analyzed are space and time complexity. The proposed algorithm is faster than the existing OF-based HAR systems. In terms of space complexity, the feature size of the proposed algorithm is almost 50% of that of the existing OF-based algorithm. The recognition results still outperform the existing results by a significant margin. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
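
Illustrative sketch (not part of the catalog record): the abstract above contrasts optical flow with the motion history image (MHI). The snippet below shows the classic MHI update in its common textbook form, where pixels that changed between frames are set to a maximum value and all others fade linearly. It does not implement the Weber variant (WMHI) proposed in the paper. The function name, parameter names, and default values are assumptions.

```python
def update_mhi(mhi, prev_frame, frame, tau=255, decay=32, threshold=10):
    """One update step of a classic motion history image.

    mhi, prev_frame, frame : 2-D lists of equal shape (grayscale intensities)
    tau       : value assigned to pixels where motion is detected
    decay     : amount subtracted from pixels without motion
    threshold : minimum absolute frame difference counted as motion
    """
    updated = []
    for h_row, p_row, f_row in zip(mhi, prev_frame, frame):
        updated.append([
            tau if abs(f - p) >= threshold else max(0, h - decay)
            for h, p, f in zip(h_row, p_row, f_row)
        ])
    return updated
```

Because each step only needs a frame difference and a subtraction per pixel, the update is far cheaper than estimating dense optical flow.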

38. Weakly-supervised anomaly detection with a Sub-Max strategy.

Author: Zhang, Bohua and Xue, Jianru
Subjects: *ANOMALY detection (Computer security), *OPTICAL flow, *INTRUSION detection systems (Computer security), *VIDEO surveillance, *BOTTLENECKS (Manufacturing)
Abstract: We study weakly-supervised anomaly detection where only video-level "anomalous"/"normal" labels are available in training, while anomaly events should be temporally localized in testing. For this task, a commonly used framework is multiple instance learning (MIL), where clip instances are sampled from individual videos to form video-level bags. This sampling process arguably is a bottleneck of MIL. If too many instances are sampled, we not only encounter high computational overheads but also have many noisy instances in the bag. On the other hand, when too few instances are used, e. g., through enlarged grids, much background noise may be included in the anomaly instances. To resolve this dilemma, we propose a simple yet effective method named Sub-Max. In partitioned image regions, it identifies instances that are most probable candidates for anomaly events by selecting cuboids that have high optical flow magnitudes. We show that our method effectively brings down the computational cost of the baseline MIL and at the same time significantly filters out the influence of noise. Albeit simple, this strategy is shown to facilitate the learning of discriminative features and thus improve event classification and localization performance. For example, after annotating the event location ground truths of the UCF-Crime test set, we report very competitive accuracy compared with the state of the art on both frame-level and pixel-level metrics, corresponding to classification and localization, respectively. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
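
The selection step described in the abstract above (picking, within each partitioned image region, the cuboid with high optical flow magnitude) can be illustrated with a short sketch. This is not the authors' implementation: the function name `select_high_motion_cuboids`, the fixed grid, the fixed clip length, and the use of the mean magnitude as the score are all assumptions made for illustration, and the optical flow field is assumed to be precomputed.

```python
import numpy as np

def select_high_motion_cuboids(flow, grid=(2, 2), clip_len=4):
    """Pick, per spatial region, the clip with the highest mean flow magnitude.

    flow: array of shape (T, H, W, 2) holding precomputed (u, v) flow vectors.
    Returns a dict mapping (row, col) -> (start_frame, mean_magnitude).
    All names and defaults here are hypothetical.
    """
    T, H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=-1)            # per-pixel magnitude, (T, H, W)
    rows, cols = grid
    h_edges = np.linspace(0, H, rows + 1, dtype=int)
    w_edges = np.linspace(0, W, cols + 1, dtype=int)
    n_clips = T // clip_len                        # trailing frames are ignored
    selected = {}
    for r in range(rows):
        for c in range(cols):
            region = mag[:n_clips * clip_len,
                         h_edges[r]:h_edges[r + 1],
                         w_edges[c]:w_edges[c + 1]]
            # One score per cuboid: mean magnitude over its frames and pixels.
            scores = region.reshape(n_clips, clip_len, -1).mean(axis=(1, 2))
            best = int(np.argmax(scores))
            selected[(r, c)] = (best * clip_len, float(scores[best]))
    return selected
```

Keeping one cuboid per region bounds the number of instances per bag by the grid size, which is the kind of cost reduction the abstract refers to; the actual selection rule in the paper may differ.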

39. SSTM: Spatiotemporal recurrent transformers for multi-frame optical flow estimation.

Author: Ferede, Fisseha Admasu and Balasubramanian, Madhusudhanan
Subjects: *OPTICAL flow, *TRANSFORMER models
Abstract: Inaccurate optical flow estimates in and near occluded regions, and out-of-boundary regions are two of the current significant limitations of optical flow estimation algorithms. Recent state-of-the-art optical flow estimation algorithms are two-frame based methods where optical flow is estimated sequentially for each consecutive image pair in a sequence. While this approach gives good flow estimates, it fails to generalize optical flows in occluded regions mainly due to limited local evidence regarding moving elements in a scene. In this work, we propose a learning-based multi-frame optical flow estimation method that estimates two or more consecutive optical flows in parallel from multi-frame image sequences. Our underlying hypothesis is that by understanding temporal scene dynamics from longer sequences with more than two frames, we can characterize pixel-wise dependencies in a larger spatiotemporal domain, generalize complex motion patterns and thereby improve the accuracy of optical flow estimates in occluded regions. We present learning-based spatiotemporal recurrent transformers for multi-frame based optical flow estimation (SSTMs). Our method utilizes 3D Convolutional Gated Recurrent Units (3D-ConvGRUs) and spatiotemporal transformers to learn recurrent space–time motion dynamics and global dependencies in the scene and provide a generalized optical flow estimation. When compared with recent state-of-the-art two-frame and multi-frame methods on real world and synthetic datasets, the performance of the SSTMs was significantly higher in occluded and out-of-boundary regions. Among all published state-of-the-art multi-frame methods, SSTM achieved state-of-the-art results on the Sintel Final and KITTI2015 benchmark datasets. Software code, data and instructions: https://github.com/Computational-Ocularscience/SSTM. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

40. Abnormal event detection for video surveillance using an enhanced two-stream fusion method.

Author: Yang, Yuxing, Fu, Zeyu, and Naqvi, Syed Mohsen
Subjects: *VIDEO surveillance, *OPTICAL flow, *CONVOLUTIONAL neural networks, *OBJECT recognition (Computer vision), *TRUST
Abstract: • We proposed a unified classification and prediction fusion framework for detecting different types of abnormal events in surveillance videos. • With the assistance of motion features, our AED algorithm is more sensitive to abnormal events in a crowded environment. • Extensive experiments confirm the efficacy and robustness of the proposed framework for AED. • The types of abnormal events on some public AED datasets are analysed, and the accuracies of identifying abnormal events are also counted. Abnormal event detection is a critical component of intelligent surveillance systems, focusing on identifying abnormal objects or unusual human behaviours in video sequences. However, conventional methods struggle due to the scarcity of labelled data. Existing solutions typically train on normal data, establish boundaries for regular events, and identify outliers during testing. These approaches are often inadequate as they do not efficiently leverage the geometry and image texture information, and they lack a specific focus on different types of abnormal events. This paper introduces a novel two-stream fusion algorithm for abnormal event detection to address these diverse abnormal events better. We first extract the object, pose, and optical flow features. Then, the object and pose information is combined early on to eliminate occluded pose graphs. The trusted pose graphs are fed into a Spatio-Temporal Graph Convolutional Network (ST-GCN) to detect abnormal behaviours. Simultaneously, we propose a video prediction framework that identifies abnormal frames by measuring the difference between predicted and ground truth frames. Lastly, we execute a decision-level fusion between the classification and prediction streams to achieve the final results. Our results on the UCSD PED1 dataset indicate the enhanced performance of the fusion model for various abnormal events. Furthermore, experimental results on the UCSD PED2 dataset and the ShanghaiTech campus dataset underscore our approach's effectiveness compared to other related works. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

41. High-speed video salient object detection with temporal propagation using correlation filter.

Author: Qi, Qi, Zhao, Sanyuan, Zhao, Wenjun, Lei, Zhengchao, Shen, Jianbing, Zhang, Libin, and Pang, Yuanyuan
Subjects: *OPTICAL flow, *FILTERS & filtration, *VIDEOS, *STREAMING video & television
Abstract: Video salient object detection is challenging because it must achieve both high accuracy and high speed despite the large amount of computation in the spatiotemporal domain. Most existing methods use complex models with a massive number of parameters to detect salient regions in video, at a high time cost. In this paper, we propose a high-speed video salient object detection method that runs at 0.5 s per frame (including an average of 0.32 s for optical flow computation). It mainly consists of two modules, the initial spatiotemporal saliency module and the correlation filter based salient temporal propagation module. The former integrates the spatial saliency from robust minimum barrier distance and the boundary contrast cue with temporal saliency information from the motion field. The latter incorporates correlation filters to keep the saliency consistent between neighboring frames. The two modules are finally fused together in an adaptive way. Comprehensive experiments on four benchmarks, SegTrack v1, SegTrack v2, FBMS and ViSal, clearly demonstrate that our algorithm shows better performance than the other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

42. Video deblurring via motion compensation and adaptive information fusion.

Author: Zhan, Zongqian, Yang, Xue, Li, Yihui, and Pang, Chao
Subjects: *CONVOLUTIONAL neural networks, *STREAMING video & television, *IMAGE registration, *WAGES, *MOTION, *IMAGE stabilization, *OPTICAL flow, *INFORMATION filtering, *VIDEOS
Abstract: Non-uniform motion blur caused by camera shake or object motion is a common artifact in videos captured by hand-held devices. Recent advances in video deblurring have shown that convolutional neural networks (CNNs) are able to aggregate information from multiple unaligned consecutive frames to generate sharper images. However, without explicit image alignment, most of the existing CNN-based methods often introduce temporal artifacts, especially when the input frames are severely blurred. To this end, we propose a novel video deblurring method to handle spatially varying blur in dynamic scenes. In particular, we introduce a motion estimation and motion compensation module which estimates the optical flow from the blurry images and then warps the previously deblurred frame to restore the current frame. Thus, the previous processing results benefit the restoration of the subsequent frames. This recurrent scheme is able to utilize contextual information efficiently and can facilitate the temporal coherence of the results. Furthermore, to suppress the negative effect of alignment error, we propose an adaptive information fusion module that can filter the temporal information adaptively. The experimental results obtained in this study confirm that the proposed method is both effective and efficient. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

43. Action recognition using spatial-optical data organization and sequential learning framework.

Author: Yuan, Yuan, Zhao, Yang, and Wang, Qi
Subjects: *HUMAN activity recognition, *SEQUENTIAL learning, *SPATIOTEMPORAL processes, *SEMANTICS, *ARTIFICIAL neural networks, *OPTICAL flow
Abstract: Recognizing human actions in videos is a challenging problem owing to complex motion appearance, various backgrounds and the semantic gap between low-level features and high-level semantics. Existing methods have scored some achievements and many new ideas have been proposed for action recognition. They focus on designing a robust feature description and training an elaborate learning model, and many of them can benefit from a two-stream network with a stack of RGB frames and optical flow frames. However, these features for human action representation struggle with limited feature representation, as RGB videos are confused by static appearance redundancy and optical flow videos cannot represent the detailed appearance. To solve these problems, we propose an efficient algorithm based on the spatial-optical data organization and the sequential learning framework. There are two contributions of our method: a novel data organization based on hierarchical weighting segmentation and optical flow for video representation, and a lightweight deep learning model based on the Convolutional 3D (C3D) network and the Recurrent Neural Network (RNN) for complicated action recognition. The new data organization aggregates the merits of motion appearance, movement trajectories and optical flow in a creative way to highlight the meaningful information. And the proposed lightweight model has an insight into patterns and semantics of sequential data by low-level spatiotemporal feature extraction and high-level information mining. The proposed method is evaluated on the state-of-the-art dataset and the results demonstrate that our method has a good performance for complex human action recognition. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

44. Progressively real-time video salient object detection via cascaded fully convolutional networks with motion attention

Author: Ling Zheng, Ying Li, Qiang Shen, and Qingping Zheng
Subjects: business.industry, Computer science, Cognitive Neuroscience, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, Construct (python library), Motion (physics), Computer Science Applications, Artificial Intelligence, Region of interest, Pattern recognition (psychology), Benchmark (computing), Computer vision, Enhanced Data Rates for GSM Evolution, Artificial intelligence, business, Encoder
Abstract: Semantics and motion are two cues of essence for the success in video salient object detection. Most existing deep-learning based approaches extract semantic features by the use of only one fully convolutional network with simple stacked encoders. They simulate motion patterns of video objects with two consecutive frames being simultaneously fed into a convolutional LSTM network or a weights-sharing fully convolutional network. However, such approaches have the shortcomings of producing a coarse predicted saliency map or requiring significant computational overheads. In this paper, we present a novel approach with cascaded fully convolutional networks involving motion attention (abbreviated as CFCN-MA), to achieve real-time saliency detection in videos. Our key idea is to construct twofold fully convolutional networks in order to gain a saliency map from coarse to fine. We devise an optical flow-based motion attention mechanism to improve the prediction accuracy of the initial fully convolutional networks, using the popular FlowNet2-SD model that is efficient and effective for motion pattern recognition of distinctive objects in videos. This method can obtain a fine saliency map with a refined region of interest. Moreover, we propose a means for calculating attention-guided intersection-over-union loss (shortnamed as AIoU) to supervise the CFCN-MA model in learning a saliency map with both clear edge and complete structure. Our approach is evaluated on three popular benchmark datasets, namely DAVIS, ViSal and FBMS. Experimental results demonstrate that our method outperforms many state-of-the-art techniques while meeting the real-time demand at 27 fps.
Published: 2022

45. MSCDP: Multi-step crowd density predictor in indoor environment.

Author: Wang, Shuyu, Lyu, Yan, Xu, Yuhang, and Wu, Weiwei
Subjects: *OPTICAL flow, *CROWDS, *DENSITY, *MICROBIOLOGICAL aerosols
Abstract: • MSCDP predicts crowd density heatmaps in future time steps by fusing video frame and density heatmap encodings. • Long-term motion context memory alignment improves prediction accuracy by learning periodic movement patterns in optical flows and matching short-term observations to those patterns. • MSCDP outperforms state-of-the-art techniques and variants in predicting crowd density heatmaps, as shown in evaluation on two real-world datasets. Monitoring and predicting crowd movements in indoor environments are of great importance in crowd management to prevent crushing and trampling. Existing works mostly focus on individual trajectory forecasting in a less crowded scene, or on crowd counting and density estimation. Only a very few works predict the crowd density distribution. However, these works either fail to realize multi-step prediction or exploit only the density heatmap modality, ignoring the complementary information in the corresponding video frames. Therefore, we are motivated to predict crowd density distribution in multiple time steps to facilitate long-term prediction. In this paper, a Multi-Step Crowd Density Predictor (MSCDP), which fuses video frame sequences and corresponding density heatmaps, is proposed to accurately forecast future crowd density heatmaps. To capture long-term periodic movement features, the long-term optical flow context memory (LOFCM) module is designed to store learnable patterns. We conducted extensive experiments on two real-world datasets. Evaluation results show that our MSCDP outperforms the state-of-the-art baseline techniques and MSCDP variants in terms of various prediction errors, demonstrating the effectiveness of MSCDP and each of its key components in multi-step crowd density prediction. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

46. Silicone mask face anti-spoofing detection based on visual saliency and facial motion

Author: Zheng He, Guangcheng Wang, Baojin Huang, Zhongyuan Wang, Kui Jiang, and Ruimin Hu
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Cognitive Neuroscience, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, Eye movement, 02 engineering and technology, Gaze, Facial recognition system, Computer Science Applications, Support vector machine, 020901 industrial engineering & automation, Artificial Intelligence, Histogram, Face (geometry), Saccade, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business
Abstract: Face recognition systems are widely used for target recognition and identity authentication, such as automated teller machines, mobile phones, and entrance guard systems. However, face recognition systems are vulnerable to presentation attacks, such as photo, replay, and 3D mask attacks. In particular, silicone mask attacks pose a greater threat to face recognition systems because high-quality silicone masks have lifelike properties. To promote the development of face anti-spoofing detection algorithms for silicone mask attacks, this paper constructs a Silicone Mask Face Motion Video Dataset (SMFMVD) containing 200 real face videos and 200 silicone mask face videos. These videos include different facial motions collected from 20 subjects. Moreover, inspired by the observation that the silicone mask face’s facial movement is not as natural as that of the real face, we propose a novel silicone mask face anti-spoofing detection method based on visual saliency and facial motion characteristics. Specifically, we compute the visual saliency map of a given face image by simulating two kinds of eye movement behaviors, namely “gaze” and “saccade”. Then, we propose a saliency-weighted histogram of local binary pattern operator to extract facial texture features in the spatial domain and a saliency-guided histogram of oriented optical flow operator to extract facial motion features in the temporal domain. Finally, the support vector machine is used to fuse the two groups of facial features to distinguish real and spoof faces. Extensive experiments on public and self-built datasets show its superiority over the state-of-the-art methods.
Published: 2021
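
As an illustration of the motion feature named in the abstract above, the following is a minimal sketch of a histogram of oriented optical flow with optional per-pixel weights. It is a simplified variant written for this listing, not the authors' operator: the function name `hoof`, the uniform binning over the full circle, and the way a saliency map is applied as a multiplicative weight are assumptions.

```python
import numpy as np

def hoof(flow, n_bins=8, weights=None):
    """Magnitude-weighted histogram of flow directions, normalized to sum to 1.

    flow: array of shape (..., 2) holding (u, v) flow vectors.
    weights: optional per-pixel weights (e.g. a saliency map) with the same
             leading shape as flow; this weighting scheme is an assumption.
    """
    u = flow[..., 0].ravel()
    v = flow[..., 1].ravel()
    mag = np.hypot(u, v)                               # vector length
    ang = np.mod(np.arctan2(v, u), 2.0 * np.pi)        # direction in [0, 2*pi)
    if weights is not None:
        mag = mag * np.asarray(weights, dtype=float).ravel()
    hist, _ = np.histogram(ang, bins=n_bins,
                           range=(0.0, 2.0 * np.pi), weights=mag)
    total = hist.sum()
    # Leave an all-zero histogram untouched to avoid dividing by zero.
    return hist / total if total > 0 else hist
```

Normalizing the histogram makes the feature independent of the number of pixels and of the overall motion scale, so vectors from frames of different sizes can be fed to the same classifier.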

47. Effective template update mechanism in visual tracking with background clutter

Author: Khan Muhammad, Dongye Liu, Shuai Liu, and Weiping Ding
Subjects: 0209 industrial biotechnology, business.industry, Computer science, Cognitive Neuroscience, Optical flow, 02 engineering and technology, Tracking (particle physics), Computer Science Applications, Domain (software engineering), 020901 industrial engineering & automation, Artificial Intelligence, Robustness (computer science), Video tracking, 0202 electrical engineering, electronic engineering, information engineering, Eye tracking, Clutter, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business
Abstract: Today, artificial intelligence is everywhere in people’s daily lives. Visual tracking, which is used to identify and continuously track specific targets, is an important research domain in the study of artificial intelligence. However, current visual tracking methods are not accurate enough for object tracking with background clutter, which can easily lead to tracking failures. Therefore, in this paper, in order to solve the problem of tracking failure in cluttered backgrounds, we propose a template update mechanism to improve the accuracy of visual tracking. First, an original template is saved when the background clutter is detected. During background clutter, we use both the original template and the current template at the location estimated by the optical flow and choose the better one. Next, the original template is reused after the background clutter has ended. Finally, the proposed mechanism is used in both the KCF and BACF algorithms to verify the effectiveness of the mechanism. With experiments on the OTB2015 dataset, results show that the proposed mechanism improves the accuracy and success rate of the two baseline algorithms. Meanwhile, compared with state-of-the-art algorithms, the algorithm using the proposed mechanism also has excellent tracking performance. In addition, this method also has strong tracking robustness and adaptation capability to sequential learning for video data.
Published: 2021

48. Spatial–temporal pooling for action recognition in videos

Author: Xianwei Lv, Tao Lu, Jiaming Wang, Zhenfeng Shao, Xiao Huang, and Ruiqian Zhang
Subjects: 0209 industrial biotechnology, Source code, business.industry, Computer science, Cognitive Neuroscience, media_common.quotation_subject, Pooling, Optical flow, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Computer Science Applications, Identification (information), 020901 industrial engineering & automation, Discriminative model, Artificial Intelligence, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, media_common, Block (data storage)
Abstract: Deep convolutional neural networks have demonstrated great effectiveness in action recognition with both RGB and optical flow in the past decade. However, existing studies generally treat all frames and pixels equally, potentially leading to poor robustness of models. In this paper, we propose a novel parameter-free spatial–temporal pooling block (referred to as STP) for action recognition in videos to address this challenge. STP is proposed to learn spatial and temporal weights, which are further used to guide information compression. Different from other temporal pooling layers, STP is more efficient as it discards the non-informative frames in a certain clip. In addition, STP applies a novel loss function that forces the model to learn information from sparse and discriminative frames. Moreover, we introduce a dataset for ferry action classification, named Ferryboat-4, which includes four categories: Inshore, Offshore, Traffic, and Negative. This dataset can be used for the identification of ferries with abnormal behaviors, providing the essential information to support the supervision, management, and monitoring of ships. All the videos are acquired via real-world cameras. We perform extensive experiments on publicly available datasets as well as Ferryboat-4 and find that the proposed method outperforms several state-of-the-art methods in action classification. Source code and datasets are available at https://github.com/jiaming-wang/STP.
Published: 2021

49. SCENT: A new precipitation nowcasting method based on sparse correspondence and deep neural network

Author: Feihong Zhang, Wei Fang, Victor S. Sheng, and Yewen Ding
Subjects: 0209 industrial biotechnology, Artificial neural network, Nowcasting, business.industry, Computer science, Cognitive Neuroscience, Echo (computing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Extrapolation, Optical flow, Pattern recognition, 02 engineering and technology, Computer Science Applications, law.invention, ComputingMethodologies_PATTERNRECOGNITION, 020901 industrial engineering & automation, Artificial Intelligence, law, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Timestamp, Artificial intelligence, Radar, business, Feature detection (computer vision)
Abstract: Precipitation nowcasting is an important research topic in meteorology, which relates to many aspects of people’s life and social development. Under the combined influence of resolution and corresponding timestamp, a certain nonlinear relationship is satisfied between the echo intensity and precipitation. Therefore, the short-term precipitation prediction scheme based on radar echo extrapolation has become the main research method. By analyzing the spatiotemporal characteristics of the radar echo images, we found that precipitation results are related not only to currently observed radar echo images but also to some non-image features such as wind speed and the shape of cloud clusters. Inspired by the optical flow method, combined with the characteristics of radar reflectivity, we propose a new method, SCENT, to achieve precipitation prediction. Firstly, the sparse correspondence method based on FAST feature detection and SIFT matching is used to extrapolate radar echoes and complete the extraction of non-image influence features. Afterwards, an improved neural network is utilized for regression calculation to obtain the total precipitation. Compared with existing prediction models based on deep neural networks, our new method can make precipitation nowcasting more accurate.
Published: 2021

50. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos

Author: Xuelong Li, Qi Wang, and Cong Zhang
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Cognitive Neuroscience, Deep learning, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, 02 engineering and technology, Field (computer science), Computer Science Applications, 020901 industrial engineering & automation, Artificial Intelligence, Robustness (computer science), Component (UML), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Focus (optics), business, License
Abstract: License plate detection and recognition (LPDR) has attracted considerable attention in recent years, and many algorithms have shown competitive performance on several datasets. However, there are still three significant issues to be addressed in this field. Firstly, most methods have poor detection performance in unconstrained scenarios with moving vehicles and highly distracting background objects. Secondly, existing systems generally focus on single image-based algorithms, yet traffic video sequences provide more effective information than individual frames for LPDR tasks. Thirdly, images and videos captured in complex environments may be adversely affected by distortions and low resolution, causing sensitive recognition performance and reduced robustness. To remedy these issues, we propose to automatically perform license plate detection, tracking, and recognition in real-world traffic videos and integrate them into a unified end-to-end framework via deep learning. The contributions of this paper are threefold: 1) A deep flow-guided spatiotemporal license plate detector is proposed to model the video contextual information by introducing optical flow and a novel spatiotemporal attention mechanism; 2) An online license plate tracker is developed to bridge video-based detection and recognition which utilizes both motion and deep appearance information, and innovatively, it can be end-to-end trained with the detector via multi-task learning; 3) The efficient quality-guided license plate recommender and recognizer are proposed to jointly perform stream recognition. The former recommends high-quality frames from video streams while the latter generates recognition results. We evaluate the proposed method on three traffic video-based license plate datasets, and ablation studies have been presented to verify the effectiveness of each component mentioned above. Moreover, extensive experiments are conducted for comparison with other approaches in different scenarios, and the results have demonstrated that our method achieves state-of-the-art performance on all datasets.
Published: 2021
