Descriptor: "video sequences" / Publisher: elsevier b.v. - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"video sequences"' showing total 16 results

1. Multi-geometry embedded transformer for facial expression recognition in videos.

Author: Chen, Dongliang, Wen, Guihua, Li, Huihui, Yang, Pei, Chen, Chuyun, and Wang, Bao
Subjects: *FACIAL expression, *HYPERBOLIC spaces, *MULTILEVEL models, *VIDEOS, *EMOTIONAL state
Abstract: Dynamic facial expressions in videos express more realistic emotional states, and recognizing emotions from in-the-wild facial expression videos is a challenging task due to the changeable posture, partial occlusion and various light conditions. Although current methods have designed transformer-based models to learn spatial–temporal features, they cannot explore useful local geometry structures from both spatial and temporal views to capture subtle emotional features for the videos with varied poses and facial occlusion. To this end, we propose a novel multi-geometry embedded transformer (MGET), which adapts multi-geometry knowledge into transformers and excavates spatial–temporal geometry information as complementary to learn effective emotional features. Specifically, from a new perspective, we first design a multi-geometry distance learning (MGDL) to capture emotion-related geometry structure knowledge under Euclidean and Hyperbolic spaces. Especially based on the advantages of hyperbolic geometry, it finds the more subtle emotional changes among local spatial and temporal features. Secondly, we combine MGDL with transformer to design spatial–temporal MGETs, which capture important spatial and temporal multi-geometry features to embed them into their corresponding original features, and then perform cross-regions and cross-frame interaction on these multi-level features. Finally, MGET gains superior performance on DFEW, FERV39k and AFEW datasets, where the unweighted average recall (UAR) and weighted average recall (WAR) are 58.65%/69.91%, 41.91%/50.76% and 53.23%/55.40%, respectively, and the gained improvements are 2.55%/0.66%, 3.69%/2.63% and 3.66%/1.14% compared to M3DFEL, Logo-Former and EST methods.
• A multi-geometry embedded transformer is proposed for in-the-wild FER in videos.
• MGDL captures multi-geometry structures under Euclidean and Hyperbolic spaces.
• MGET combines MGDL with transformer to model multi-level spatial-temporal features.
• MGET shows superior performance on in-the-wild video-based FER databases.
[ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
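
Note on the metrics in record 1: UAR and WAR are standard classification measures. The following is a minimal sketch, not taken from the paper, assuming the usual definitions: UAR is the plain mean of per-class recalls, and WAR weights each class recall by its number of samples, which makes it equal to overall accuracy. The function name `uar_war` and the toy labels in the usage note are illustrative assumptions.

```python
from collections import Counter


def uar_war(y_true, y_pred):
    """Return (UAR, WAR) for two equal-length sequences of class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    # Number of samples per true class.
    support = Counter(y_true)
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Recall of each class that occurs in y_true.
    recalls = {c: hits[c] / n for c, n in support.items()}
    # UAR: every class counts equally, regardless of its size.
    uar = sum(recalls.values()) / len(recalls)
    # WAR: recalls weighted by class size (equals overall accuracy).
    war = sum(recalls[c] * n for c, n in support.items()) / len(y_true)
    return uar, war
```

With four "happy" clips (three recognized) and one "sad" clip (missed), the class recalls are 0.75 and 0.0, so UAR is lower than WAR. This gap on imbalanced data is why both figures are usually reported together.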

2. Leveraging recent advances in deep learning for audio-Visual emotion recognition.

Author: Schoneveld, Liam, Othmani, Alice, and Abdelkawy, Hazem
Subjects: *EMOTION recognition, *NONVERBAL communication, *AFFECTIVE computing, *HUMAN behavior, *FACIAL expression, *RECURRENT neural networks, *DEEP learning
Abstract:
• A new high-performing deep neural network-based approach for AudioVisual Emotion Recognition (AVER).
• Learning two independent feature extractors specialised for emotion recognition.
• Learning two independent feature extractors that could be employed for any downstream audiovisual emotion recognition task.
• Applying knowledge distillation (specifically, self-distillation), alongside additional unlabeled data for FER.
• Learning the spatio-temporal dynamics via a recurrent neural network for AVER.
Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from multiple modalities; mainly facial, vocal and physical gestures. Recently, spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis. In this paper, we propose a new deep learning-based approach for audio-visual emotion recognition. Our approach leverages recent advances in deep learning like knowledge distillation and high-performing deep architectures. The deep feature representations of the audio and visual modalities are fused based on a model-level fusion strategy. A recurrent neural network is then used to capture the temporal dynamics. Our proposed approach substantially outperforms state-of-the-art approaches in predicting valence on the RECOLA dataset. Moreover, our proposed visual facial expression feature extraction network outperforms state-of-the-art results on the AffectNet and Google Facial Expression Comparison datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

3. A randomized deep neural network for emotion recognition with landmarks detection.

Author: Di Luzio, Francesco, Rosato, Antonello, and Panella, Massimo
Subjects: ARTIFICIAL neural networks, EMOTION recognition, DEEP learning, SAMPLING (Process), EXTERNALITIES
Abstract: In this paper, we present an innovative deep neural architecture employing parameter randomization in a complex classification model for emotion recognition. Actually, randomized deep neural networks represent an interesting alternative to exploring the efficiency-to-accuracy balance in real-life applications. Moreover, we also introduce the use of input frames composed of 468 facial landmarks coordinates and an innovative sampling procedure avoiding padding. The proposed randomized classifier is trained for emotion recognition on video sequences and the related accuracy is compared with a non-randomized version of the same model and with well-known benchmark architectures, demonstrating the robustness of the proposed approach in terms of classification accuracy and training time.
• Deep learning technique in the biomedical context.
• 468 facial landmarks to solve emotion recognition through videos.
• Parameter randomization in a complex classification model.
• Trade-off between accuracy and computational cost for social diffusion.
[ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. Support for reduced presentation durations in subjective video quality assessment.

Author: Mercer Moss, Felix, Yeh, Chun-Ting, Zhang, Fan, Baddeley, Roland, and Bull, David R.
Subjects: *VIDEOS, *VIDEO distributors, *VIDEO codecs, *ACQUISITION of data, *VIDEO excerpts, *VIDEO coding
Abstract: Video content distributors, codec developers and researchers in related fields often rely on subjective assessments to ensure that their video processing procedures result in satisfactory quality. The current 10 s recommendation for the length of test sequences in subjective video quality assessment, however, has recently been questioned. Not only do sequences of this length depart from modern cinematic shooting styles, the use of shorter sequences would also enable substantial efficiency improvements to the data collection process. Our previous work, using a double-stimulus methodology, indicated that shortening test sequences had a limited impact upon rating behaviour. Here, using a larger database and additional opinion score measures, we also explore the same effect within the popular single-stimulus approach. Two groups of viewers assessed reference and distorted videos ranging in length from 1.5 s to 10 s. Analyses confirmed our previous findings using the DSCQS paradigm, and were replicated when using a similar single-stimulus paradigm: while viewers' DMOS for 1.5 s videos was significantly lower than for 10 s, no significant variation was found between the groups of 10 s, 7 s and 5 s videos. Together with our previous research, these data lead us to recommend the use of 5 s, temporally-consistent video clips in quality assessment studies that employ either DSCQS or its single-stimulus variant. The extension of our recommendation to further methodologies is also discussed. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

5. A self-adaptive optical flow method for the moving object detection in the video sequences.

Author: Xin, Yunhong, Hou, Jie, Dong, Leming, and Ding, Liping
Subjects: *OPTICAL flow, *OPTICAL detectors, *ALGORITHMS, *ADAPTIVE control systems, *VIDEO recording
Abstract: This paper proposes a self-adaptive optical flow method to detect moving objects in the video sequences. The method first estimates the original optical flow field with the optical flow algorithm, and then enhances the objects by a local mean algorithm, and finally filters out the noise with a self-adaptive threshold algorithm. The proposed method has a wide adaptivity to the size and the number of objects, and it also can effectively process the scenarios of complex background and that of the slight occlusion. Furthermore, it avoids the complicated and time-consuming preprocessing procedure. The results of the present method show that the moving objects can be detected effectively. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

6. A probabilistic integrated object recognition and tracking framework

Author: Serratosa, Francesc, Alquézar, René, and Amézquita, Nicolás
Subjects: *PATTERN recognition systems, *TRACKING algorithms, *VIDEO processing, *DECISION making, *ARTIFICIAL neural networks, *MAXIMUM likelihood statistics
Abstract: This paper describes a probabilistic integrated object recognition and tracking framework called PIORT, together with two specific methods derived from it, which are evaluated experimentally in several test video sequences. The first step in the proposed framework is a static recognition module that provides class probabilities for each pixel of the image from a set of local features. These probabilities are updated dynamically and supplied to a tracking decision module capable of handling full and partial occlusions. The two specific methods presented use RGB color features and differ in the classifier implemented: one is a Bayesian method based on maximum likelihood and the other one is based on a neural network. The experimental results obtained have shown that, on one hand, the neural net based approach performs similarly and sometimes better than the Bayesian approach when they are integrated within the tracking framework. And on the other hand, our PIORT methods have achieved better results when compared to other published tracking methods in video sequences taken with a moving camera and including full and partial occlusions of the tracked object. [Copyright Elsevier]
Published: 2012
Full Text: View/download PDF

7. Dynamic stereoscopic selective visual attention (DSSVA): Integrating motion and shape with depth in video segmentation

Author: Fernández-Caballero, Antonio, López, María T., and Saiz-Valverde, Sergio
Subjects: *COMPUTER software reusability, *ELECTRONIC data processing, *MATHEMATICAL analysis, *COMPUTER algorithms
Abstract: Depth inclusion as an important parameter for dynamic selective visual attention is presented in this article. The model introduced in this paper is based on two previously developed models, dynamic selective visual attention and visual stereoscopy, giving rise to the so-called dynamic stereoscopic selective visual attention method. The three models are based on the accumulative computation problem-solving method. This paper shows how software reusability enables enhancing results in vision research (video segmentation) by integrating earlier works. In this article, the first results obtained for synthetic sequences are included to show the effectiveness of the integration of motion and shape features with depth parameter in video segmentation. [Copyright Elsevier]
Published: 2008
Full Text: View/download PDF

8. Video camera registration using accumulated co-motion maps

Author: Szlávik, Zoltán, Szirányi, Tamás, and Havasi, László
Subjects: *LIGHT sources, *ALGORITHMS, *FEASIBILITY studies, *STATISTICS
Abstract: The paper presents a method to register partially overlapping camera-views of scenes where the objects of interest are in motion, even when the environment and the motion are unstructured. In a typical outdoor multi-camera system the observed objects might be very different due to the changes in lighting conditions and different camera positions. Hence, static features such as color, shape, and contours cannot be used for camera registration in these cases. Calculation of co-motion statistics, which is followed by outlier rejection and a nonlinear optimization, does the matching. The described robust algorithm finds point correspondences in two camera views (images) without searching for any objects and without tracking any continuous motion. Real-life outdoor experiments demonstrate the feasibility of our approach. [Copyright Elsevier]
Published: 2007
Full Text: View/download PDF

9. Joint moving cast shadows segmentation and light source detection in video sequences

Author: Nicolas, Henri and Pinel, Jean-Marie
Subjects: *VIDEO recording, *LIGHT sources, *METHODOLOGY, *ESTIMATION theory
Abstract: This paper proposes a new method which allows a joint estimation of the light source projection on the image plane and the segmentation of moving cast shadows in natural video sequences. It allows improving the segmentation of moving objects by separating clearly cast shadows from moving objects. The method is based on a shadow model which mainly assumes that the cast shadows are projected on plane and Lambertian surfaces, and that the light source is unique. The moving cast shadows, including the penumbra, are detected using a segmentation method based on a comparison between a reference image and the original one. The light source position is estimated using geometrical relations linking the light source, the object and its cast shadow on the 2-D image plane. This is obtained using a robust temporal filtering method. For each image, using the current estimation of the light source position and the video object contours, a cast shadow search area is defined. This reduces the risk of false detections during the segmentation process, and thus allows increasing the detection rate and reducing the false alarm rate. Experimental results show that good shadow and object contours and light source locations are obtained with the proposed method even if the theoretical assumptions are not fully valid. [Copyright Elsevier]
Published: 2006
Full Text: View/download PDF

10. Detection and removal of video defects using rational-based techniques

Author: Khriji, Lazhar, Meribout, Mahmoud, and Gabbouj, Moncef
Subjects: *IMAGE processing, *AUDIOVISUAL materials, *VIDEO compression standards, *STREAMING technology
Abstract: This paper presents Rational and Vector Rational based interpolator methods for reconstruction of missing data in video sequences. The interpolation of missing data is important in many areas of image processing, including the restoration of degraded motion pictures, reconstruction of dropouts in digital video and automatic re-touching of old photographs. Here, a detection technique is investigated for localization of the defects, and then a spatial vector rational interpolator algorithm is proposed to reconstruct the missing data. This algorithm exhibits desirable properties, such as edge and detail preservation and accurate chromaticity estimation. In this approach, color image pixels are considered as three-component vectors in the color space that is more appropriate for the human visual system. Therefore, the inherent correlation that exists between the different color components is not ignored. This leads to better image quality compared to that obtained by component-wise or marginal processing. The experimental results demonstrate the usefulness of the vector rational interpolator in an application involving the restoration of defects in video sequences. The resulting edges obtained using the proposed interpolator are free from blockiness and jaggedness. The complexity evaluation of the algorithm shows that the implementation of the algorithm on a dedicated IMAP-based parallel hardware architecture can lead to an execution time of 5.7 and 15.6 ms for (256×256) binary and color images, respectively. [Copyright Elsevier]
Published: 2005
Full Text: View/download PDF

11. Qualitative estimation of camera motion parameters from the linear composition of optical flow

Author: Park, Sang-Cheol, Lee, Hyoung-Suk, and Lee, Seong-Whan
Subjects: *CAMERAS, *MOTION, *VIDEO recording, *MOVEMENT sequences, *COMBINATORICS
Abstract: In this paper, we propose a new method for estimating camera motion parameters based on optical flow models. Camera motion parameters are generated using linear combinations of optical flow models. The proposed method first creates these optical flow models, and then linear decompositions are performed on the input optical flows calculated from adjacent images in the video sequence, which are used to estimate the coefficients of each optical flow model. These coefficients are then applied to the parameters used to create each optical flow model, and the camera motion parameters implied in the adjacent images can be estimated through a linear composition of the weighted parameters. We demonstrated that the proposed method estimates the camera motion parameters accurately and at a low computational cost, and that it is robust to noise in the video sequence being analyzed. [Copyright Elsevier]
Published: 2004
Full Text: View/download PDF

12. Color constancy: a biological model and its application for still and video images.

Author: Spitzer, Hedva and Semo, Sarit
Subjects: *COLOR vision, *BIOLOGICAL models, *COMPUTER vision
Abstract: A model for color constancy (CC) that can be applied for automatic CC for still and video images is presented. This biological model succeeds in automatically correcting the color of images to a ‘human vision appearance’ (as is commonly required in cameras), as opposed to many CC algorithms better suited for machine vision applications such as color object identification. The algorithm is based on retinal mechanisms of adaptation (gain control): ‘local’ and ‘remote’. These mechanisms enable video image applications, since they take into account the dynamics of human adaptation mechanisms. The results indicate that the contribution of adaptation mechanisms to CC is significant, robust, and succeeds in performing color correction of still images and video sequences under single and multiple illumination conditions. [Copyright Elsevier]
Published: 2002
Full Text: View/download PDF

13. Surface measurement and tracking of human body parts from multi-image video sequences

Author: D'Apuzzo, Nicola
Subjects: *HUMAN body, *PHOTOGRAMMETRY, *MEDICAL photography
Abstract: This paper describes a method to measure and track moving surfaces of human body parts from multi-image video sequences acquired simultaneously by several cameras. The gained 3-D data can be of two different types: surface measurement of the visible parts of the human body at each time step of the sequence and surface tracking in the form of a vector field of 3-D trajectories (position, velocity and acceleration). The surface measurement process, which is based on multi-image photogrammetry, consists of five steps: calibration of the camera system, simultaneous acquisition of images from different views, establishment of corresponding points in the images, computation of their 3-D coordinates and, eventually, generation of a surface model. The high level of automation achieved in all the steps makes the processing of long image sequences possible. The tracking process is based on least squares matching techniques. The main idea is to track corresponding points in the multi-images through the sequence and compute their 3-D trajectories. When applied to all the points matched on the body, it results in a vector field of trajectories. Some key-points can be defined and tracked in the vector field, producing general 3-D information about the recorded movement. The main advantages of the presented method are: the capability to dynamically measure surface parts with high accuracy and the possibility to extract motion information from the acquired data without using markers. Two applications are presented to demonstrate the functionality of the proposed method: human face modelling and full body motion capture. [Copyright Elsevier]
Published: 2002
Full Text: View/download PDF

14. On fast and accurate block-based motion estimation algorithms using particle swarm optimization

Author: Cai, Jing and David Pan, W.
Subjects: *PARTICLE swarm optimization, *ESTIMATION theory, *ALGORITHMS, *VIDEO compression, *COMPUTATIONAL complexity, *MATCHING theory, *COMPUTER simulation, *SIGNAL-to-noise ratio, *MATHEMATICAL models
Abstract: Both fast and accurate block-matching algorithms are critical to efficient compression of video frames using motion estimation and compensation. While the particle swarm optimization approach holds the promise of alleviating the local optima problem suffered typically by existing very fast block matching methods, motion estimation algorithms based on particle swarm optimization in the literature appear to be either much slower than some leading fast block-matching methods for a given accuracy of motion estimation, or less accurate for a given computational complexity. In this paper, we show that the conventional particle swarm optimization approach, which was originally designed to solve general optimization problems where fast convergence of the algorithm might not be a primary concern, could be modified appropriately so that it could provide accurate motion estimation with very low computational cost in the specific context of video motion estimation. To this end, we proposed a new block matching algorithm based on a set of strategies adapted from the standard particle swarm optimization approach. Extensive simulations showed that the proposed method could achieve significant improvements over leading fast block matching methods including the diamond search and the cross-diamond search methods, in terms of both estimation accuracy and computational cost. In particular, the proposed method based on particle swarm optimization is not only much faster, but also remarkably more accurate (about 2 dB higher in terms of the Peak Signal-to-Noise-Ratio) than the competing methods on video sequences with large motion. [Copyright Elsevier]
Published: 2012
Full Text: View/download PDF
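
Note on record 14: for readers unfamiliar with block matching, the sketch below shows the exhaustive (full-search) baseline that fast methods try to approximate. It is not the paper's particle swarm algorithm, nor the diamond search it compares against; it only illustrates the problem being solved. The cost function (sum of absolute differences), the frame representation as nested lists of intensities, and all names are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Try every displacement within +/- radius and return
    ((dx, dy), cost) for the lowest-cost candidate inside the frame."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

Full search evaluates (2 × radius + 1)² candidates per block, which is the cost that fast search strategies aim to reduce while still avoiding poor local minima.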

15. Automatic behavior recognition of group-housed goats using deep learning.

Author: Jiang, Min, Rao, Yuan, Zhang, Jingyao, and Shen, Yiming
Subjects: *GOATS, *EXPERIMENTAL films, *DEEP learning, *BEHAVIORAL assessment, *OBJECT recognition (Computer vision), *BEHAVIOR
Abstract:
• Proposing general behavior recognition framework of group-housed goats.
• Investigating appropriate detection model of individual goat based on deep learning.
• Incorporating spatial-temporal location features of goats and feeding/drinking zones.
• Developing the strategy for achieving real-time analysis of goat behavior.
• Achieving high recognition accuracies without animal head detection and extra tools.
Daily behavior is one important manifestation for health and welfare status of livestock. In traditional behavior recognition methods, it was often mandatory to detect animal heads or depend on extra tools. To overcome such shortcomings, this paper proposed one efficient behavior recognition approach using deep learning to recognize eating, drinking, active and inactive behaviors of group-housed goats from video sequences of top upper-side view. Firstly, the approach of detecting individual goat was designed by means of investigating the characteristics and suitability of several popular deep learning methods. Secondly, we proposed a general behavior recognition framework of group-housed goats for videos acquired from top upper-side view. Four types of goat behaviors were recognized by analyzing the spatial location relationship between goat bounding boxes and feeding/drinking zones, as well as the temporal movement amount of bounding box centroids of the same goat among consecutive frames. One inferential strategy was presented for estimating the missing behaviors caused by goat detection failure in frames. The experimental results showed that YOLOv4 was superior to other models in terms of both goat detection speed and accuracy, and the average recognition accuracies of 97.87%, 98.27%, 96.86% and 96.92%, respectively, for eating, drinking, active and inactive behaviors were achieved on the experimental videos, in real-time manner with the average analysis speed of 17 frames per second on a conventional hardware configuration. Hence, it was demonstrated that the proposed approach could offer one effective way for automatically conducting comprehensive behavior recognition of group-housed livestock. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
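
Note on record 15: the abstract describes recognizing behaviors from the position of a goat's bounding box relative to feeding/drinking zones and from centroid movement between consecutive frames. The sketch below shows one way such a rule could look. It is an assumption-laden illustration, not the authors' procedure: the centroid-in-zone test, the zone priority, the pixel threshold `move_threshold` and every name are hypothetical. Boxes and zones are `(x1, y1, x2, y2)` tuples.

```python
def centre(box):
    """Centroid of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def inside(point, zone):
    """True if the point lies within the rectangular zone."""
    x, y = point
    zx1, zy1, zx2, zy2 = zone
    return zx1 <= x <= zx2 and zy1 <= y <= zy2


def classify(prev_box, box, feed_zone, drink_zone, move_threshold=5.0):
    """Label one goat in the current frame.

    Spatial cue: a centroid inside a zone means eating or drinking.
    Temporal cue: otherwise, centroid displacement since the previous
    frame separates active from inactive (threshold is hypothetical).
    """
    c = centre(box)
    if inside(c, feed_zone):
        return "eating"
    if inside(c, drink_zone):
        return "drinking"
    px, py = centre(prev_box)
    moved = ((c[0] - px) ** 2 + (c[1] - py) ** 2) ** 0.5
    return "active" if moved > move_threshold else "inactive"
```

In practice the boxes would come from a detector such as the YOLOv4 model named in the abstract, and frames where detection fails would need the kind of inference step the authors mention, which this sketch does not attempt.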

16. Human action recognition in videos based on spatiotemporal features and bag-of-poses.

Author: Varges da Silva, Murilo and Nilceu Marana, Aparecido
Subjects: HUMAN behavior, HUMAN skeleton, POSE estimation (Computer vision), DESCRIPTOR systems, HUMAN activity recognition
Abstract: Currently, there is a large number of methods that use 2D poses to represent and recognize human action in videos. Most of these methods use information computed from raw 2D poses based on the straight line segments that form the body parts in a 2D pose model in order to extract features (e.g., angles and trajectories). In our work, we propose a new method of representing 2D poses. Instead of directly using the straight line segments, firstly, the 2D pose is converted to the parameter space in which each segment is mapped to a point. Then, from the parameter space, spatiotemporal features are extracted and encoded using a Bag-of-Poses approach, then used for human action recognition in the video. Experiments on two well-known public datasets, Weizmann and KTH, showed that the proposed method using 2D poses encoded in parameter space can improve the recognition rates, obtaining competitive accuracy rates compared to state-of-the-art methods.
• We propose a new way to represent 2D poses using straight-line parameter space (each straight-line segment obtained from a 2D pose is mapped into a point at the parameter space).
• We propose a new set of spatiotemporal descriptors based on 2D poses (e.g., angles formed between parts of the human skeleton in each frame of the video) and temporal information (e.g., the trajectory of each part of the human skeleton in the course of the frames of the video).
• We propose a new Bag-of-Poses approach to encode the spatiotemporal descriptors in high-level features.
• Our descriptors are robust (obtained good results when compared with some important human action descriptors found in the literature).
• Our descriptors are light and fast to compute compared to other methods.
[ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
