16,030 results on '"video processing"'
Search Results
2. A lightweight defect detection algorithm for escalator steps.
- Author
-
Yu, Hui, Chen, Jiayan, Yu, Ping, and Feng, Da
- Subjects
- PROCESS capability, STREAMING video & television, ESCALATORS, VIDEO processing, COMPUTATIONAL complexity
- Abstract
In this paper, we propose an efficient target detection algorithm, ASF-Sim-YOLO, to address issues encountered in escalator step defect detection, such as an excessive number of parameters in the detection network model, poor adaptability, and difficulties in real-time processing of video streams. Firstly, to address the characteristics of escalator step defects, we designed the ASF-Sim-P2 structure to improve the detection accuracy of small targets, such as step defects. Additionally, we incorporated the SimAM (Similarity-based Attention Mechanism) by combining SimAM with SPPF (Spatial Pyramid Pooling-Fast) to enhance the model's ability to capture key information by assigning importance weights to each pixel. Furthermore, to address the challenge posed by the small size of step defects, we replaced the traditional CIoU (Complete-Intersection-over-Union) loss function with NWD (Normalized Wasserstein Distance), which alleviated the problem of missed defects. Finally, to meet the deployment requirements of mobile devices, we performed channel pruning on the model. The experimental results showed that the improved ASF-Sim-YOLO model achieved a mean average precision (mAP50) of 96.8% on the test data set, a 22.1% improvement over the baseline model. Meanwhile, the computational complexity (in GFLOPS) of the model was reduced to a quarter of that of the baseline model, while the frame rate (FPS) was improved to 575.1. Compared with YOLOv3-tiny, YOLOv5s, YOLOv8s, Faster-RCNN, TOOD, RTMDET and other deep learning-based target recognition algorithms, ASF-Sim-YOLO has better detection accuracy and real-time processing capability. These results demonstrate that ASF-Sim-YOLO effectively balances lightweight design and performance improvement, making it highly suitable for real-time detection of step defects and able to meet the demands of escalator inspection operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
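The NWD loss referenced in the abstract above treats each bounding box as a 2-D Gaussian and compares boxes with a Wasserstein distance, which stays informative even when small boxes barely overlap. A minimal sketch of the metric; the normalizing constant `c` is dataset-dependent in the original NWD formulation, so the default here is illustrative:

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    Each box is modelled as a Gaussian N((cx, cy), diag((w/2)^2, (h/2)^2)).
    The squared 2-Wasserstein distance between two such Gaussians has the
    closed form below; NWD maps it into (0, 1] via an exponential.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2_sq = ((ax - bx) ** 2 + (ay - by) ** 2
             + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

# The corresponding loss, as a CIoU replacement, is simply 1 - nwd(a, b).
```

Identical boxes give NWD = 1, and the value decays smoothly with center distance and size mismatch, which is what makes it gentler than IoU-based losses on tiny targets.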
3. Adaptive spatial down-sampling method based on object occupancy distribution for video coding for machines.
- Author
-
An, Eun-bin, Kim, Ayoung, Jung, Soon-heung, Kwak, Sangwoon, Lee, Jin Young, Cheong, Won-Sik, Choo, Hyon-Gon, and Seo, Kwang-deok
- Subjects
- COMPUTER vision, VIDEO processing, MACHINE performance, MACHINE tools, DATA reduction, VIDEO coding
- Abstract
As the performance of machine vision continues to improve, it is being used in various industrial fields to analyze and generate massive amounts of video data. Although the demand for and consumption of video data by machines has increased significantly, video coding for machines still needs improvement. It is therefore necessary to consider a new codec that differs from conventional codecs based on the human visual system (HVS). Spatial down-sampling plays a critical role in video coding for machines because it reduces the volume of the video data to be processed while maintaining the shape of the data's features that are important for the machine to reference when processing the video. Effective methods of determining the intensity of spatial down-sampling as an efficient coding tool for machines are still in their early stages. Here, we propose a method of determining an optimal scale factor for spatial down-sampling by collecting and analyzing information on the number of objects and the ratio of the area occupied by the objects within a picture. We use the ratio of data reduction to machine accuracy error (DRAER) to evaluate the performance of the proposed method. By applying the proposed method, the DRAER was found to be a maximum of 21.40 dB and a minimum of 11.94 dB. This shows that video coding gain for machines can be achieved through the proposed method while maintaining the accuracy of machine vision tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
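The core idea in the abstract above — choosing a spatial down-sampling factor from the object count and the occupied-area ratio in a picture — can be illustrated with a toy decision rule. The weights, thresholds, and factor set below are hypothetical stand-ins, not the paper's actual mapping:

```python
def scale_factor(num_objects, occupancy_ratio, factors=(1.0, 0.75, 0.5, 0.25)):
    """Pick a spatial down-sampling scale factor from per-picture statistics.

    num_objects: detected object count in the picture.
    occupancy_ratio: fraction of the picture area covered by objects (0..1).
    Intuition (illustrative only): frames dense with objects keep more
    resolution; near-empty frames can be down-sampled aggressively.
    """
    # Blend a saturating object-count term with the occupancy term.
    score = min(1.0, 0.5 * min(num_objects, 10) / 10 + 0.5 * occupancy_ratio)
    if score > 0.6:
        return factors[0]   # keep full resolution
    if score > 0.4:
        return factors[1]
    if score > 0.2:
        return factors[2]
    return factors[3]       # strongest down-sampling
```

A real system would derive these thresholds from the machine-task accuracy curve rather than fix them by hand.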
4. Human activity-based anomaly detection and recognition by surveillance video using kernel local component analysis with classification by deep learning techniques.
- Author
-
Praveena, M. D. Anto, Udayaraju, P., Chaitanya, R. Krishna, Jayaprakash, S., Kalaiyarasi, M., and Ramesh, S.
- Subjects
- ANOMALY detection (Computer security), VIDEO surveillance, BAYESIAN analysis, DEEP learning, VIDEO processing, HUMAN activity recognition
- Abstract
Existing abnormal-behavior recognition methods have sought to reduce execution time and computational complexity while improving efficiency, robustness against pixel occlusion, and generalizability. This research proposes a novel method for human activity-based anomaly detection and recognition in surveillance video utilizing DL methods. Input is collected as video and processed for noise removal and smoothing. Kernel local component analysis then extracts video features for human activity monitoring, and the extracted features are classified using Bayesian network-based spatiotemporal neural networks. The classified output shows the anomalous activities in the selected input surveillance video dataset. Simulation results are obtained for various crowd datasets regarding the mean average error, mean square error, training accuracy, validation accuracy, specificity, and F-measure. The proposed technique attained an MAE of 58%, an MSE of 63%, a specificity of 89%, an F-measure of 68%, and training and validation accuracies of 92% and 96%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Lessons learned from naturalistic driving data processing in a secure data enclave: Preliminary discoveries from analyzing dash camera videos.
- Author
-
Mahmood, Kaiser, Pang, Jiajun, Shahriar Ahmed, Sheikh, Yu, Gongda, Sarwar, Md Tawfiq, Benedyk, Irina, and Ch. Anastasopoulos, Panagiotis
- Subjects
- PERSONALLY identifiable information, DISTRACTED driving, VIDEO processing, CAMCORDERS, ELECTRONIC data processing
- Abstract
• SHRP2 naturalistic driving study (NDS) data contains personally identifiable information (PII), and processing those data needs special attention.
• Processing PII data can be challenging due to its potential for directly or indirectly identifying individuals.
• Naturalistic driving studies are important for identifying distracted driving and its impact.
• Lessons learned from processing SHRP2 NDS data can help researchers who intend to utilize similar data for future research.
This paper provides preliminary insights on the challenges of processing Strategic Highway Research Program 2 (SHRP2) Naturalistic Driving Study (NDS) videos and data, particularly those with Personally Identifiable Information (PII). Insights and lessons learned are presented from a study designed to evaluate the effectiveness of High Visibility Crosswalks (HVCs). Over a one-month period, 15,379 videos were processed in the secure data enclave of the Virginia Tech Transportation Institute (VTTI). As these videos are not available outside of the secure data enclave due to PII restrictions, researchers visiting the secure data enclave for the first time may face several challenges: navigating the software interface; identifying the video views and frames of interest; and identifying and extracting information of interest from the video views. These challenges, the procedures followed to address them, and the process for identifying and classifying distracted driving behaviors are discussed. Lastly, hypothesis tests are conducted to investigate distracted driving behavior, with the results revealing that HVCs have the potential to make drivers more cautious in their proximity. The information presented in this paper is expected to aid researchers who intend to utilize SHRP2 NDS or similar videos for future research, helping them preemptively plan for the video processing phase. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Remote photoplethysmography (rPPG) in the wild: Remote heart rate imaging via online webcams.
- Author
-
Di Lernia, Daniele, Finotti, Gianluca, Tsakiris, Manos, Riva, Giuseppe, and Naber, Marnix
- Subjects
- HEART beat, STREAMING video & television, TECHNOLOGICAL innovations, VIDEO processing, VIDEO recording, INTEROCEPTION
- Abstract
Remote photoplethysmography (rPPG) is a low-cost technique to measure physiological parameters such as heart rate by analyzing videos of a person. There has been growing attention to this technique due to the increased possibilities and demand for running psychological experiments on online platforms. Technological advancements in commercially available cameras and video processing algorithms have led to significant progress in this field. However, despite these advancements, past research indicates that suboptimal video recording conditions can severely compromise the accuracy of rPPG. In this study, we aimed to develop an open-source rPPG methodology and test its performance on videos collected via an online platform, without control of the hardware of the participants and the contextual variables, such as illumination, distance, and motion. Across two experiments, we compared the results of the rPPG extraction methodology to a validated dataset used for rPPG testing. Furthermore, we then collected 231 online video recordings and compared the results of the rPPG extraction to finger pulse oximeter data acquired with a validated mobile heart rate application. Results indicated that the rPPG algorithm was highly accurate, showing a significant degree of convergence with both datasets thus providing an improved tool for recording and analyzing heart rate in online experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
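At its core, the rPPG pipeline described above recovers a periodic cardiac signal from a color trace of the face region and reads the heart rate off its dominant frequency. A minimal sketch — real pipelines add face tracking, detrending, and band-pass filtering, and the band limits below are common physiological defaults rather than the paper's settings:

```python
import numpy as np

def estimate_hr(green_trace, fps, lo=0.7, hi=3.0):
    """Estimate heart rate (BPM) from a mean green-channel trace of a face ROI.

    Remove the DC component, then pick the dominant FFT frequency within a
    plausible heart-rate band (0.7-3.0 Hz ~ 42-180 BPM).
    """
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                               # crude detrend
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)   # Hz per FFT bin
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Feeding it a synthetic 1.2 Hz pulse sampled at 30 fps recovers 72 BPM, which is the sanity check one would run before pointing it at webcam data.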
7. Non-Intrusive Water Surface Velocity Measurement Based on Deep Learning.
- Author
-
An, Guocheng, Du, Tiantian, He, Jin, and Zhang, Yanwei
- Subjects
- FLOOD control, OPTICAL flow, FLOW velocity, DEEP learning, HAZARD mitigation
- Abstract
Accurate assessment of water surface velocity (WSV) is essential for flood prevention, disaster mitigation, and erosion control within hydrological monitoring. Existing image-based velocimetry techniques largely depend on correlation principles, requiring users to input and adjust parameters to achieve reliable results, which poses challenges for users lacking relevant expertise. This study presents RivVideoFlow, a user-friendly, rapid, and precise method for measuring WSV. RivVideoFlow combines two-dimensional and three-dimensional orthorectification based on Ground Control Points (GCPs) with a deep learning-based multi-frame optical flow estimation algorithm named VideoFlow, which integrates temporal cues. The orthorectification process employs a homography matrix to convert images from various angles into a top-down view, aligning the image coordinates with actual geographical coordinates. VideoFlow achieves superior accuracy and strong dataset generalization compared to two-frame RAFT models due to its more effective capture of flow velocity continuity over time, leading to enhanced stability in velocity measurements. The algorithm has been validated on a flood simulation experimental platform, in outdoor settings, and with synthetic river videos. Results demonstrate that RivVideoFlow can robustly estimate surface velocity under various camera perspectives, enabling continuous real-time dynamic measurement of the entire flow field. Moreover, RivVideoFlow has demonstrated superior performance in low, medium, and high flow velocity scenarios, especially in high-velocity conditions where it achieves high measurement precision. This method provides a more effective solution for hydrological monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
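The orthorectification step described above — fitting a homography from Ground Control Points so that oblique image pixels map onto top-down geographic coordinates — can be sketched with a standard direct linear transform (DLT). This is a generic implementation of the textbook technique, not the RivVideoFlow code:

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points via DLT.

    src, dst: (N, 2) arrays with N >= 4 correspondences, e.g. image pixels
    of GCPs and their surveyed world coordinates.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A: last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 3)

def warp_points(H, pts):
    """Apply homography H to (N, 2) points, with perspective division."""
    pts = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))])
    out = pts @ H.T
    return out[:, :2] / out[:, 2:3]
```

With exact correspondences, four GCPs determine the homography; production code would normalize coordinates first and use more points with a robust fit.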
8. Deep Learning-Based Human Action Recognition in Videos.
- Author
-
Li, Song and Shi, Qian
- Subjects
- HUMAN activity recognition, CONVOLUTIONAL neural networks, FEATURE extraction, LEARNING, VIDEO processing, DEEP learning
- Abstract
To address the low accuracy and efficiency of existing video-based human behavior recognition algorithms, a deep learning recognition algorithm based on an improved time division network is proposed. This method innovates on the classical two-stream convolutional neural network framework; its core is to enhance the performance of the time division network by implementing a sliding window sampling technique with multiple time scales. This sampling strategy not only effectively integrates the full time-series information of the video, but also accurately captures the long-term dependencies hidden in human behavior, further improving the accuracy and efficiency of behavior recognition. Experimental results show that the proposed method performs well on multiple data sets. On HMDB51, our method achieves 84% recognition accuracy, while on the more complex Kinetics and UCF101 datasets, it also achieves 94% and significant recognition results, respectively. In the face of complex scenes and variable human body structure, the proposed algorithm shows excellent robustness and stability. In terms of real-time performance, it can meet the high requirements of real-time video processing. Through the validation of experimental data, our method has made significant progress in extracting spatiotemporal features, capturing long-term dependencies, and focusing on key information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
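Sliding-window sampling of the kind described above builds on segment-based frame sampling: split the video into equal segments so the samples span the full timeline, then take a short window of frames from each. A sketch of the single-scale case; the multi-scale variant would repeat this for several `window` sizes, and the details here are illustrative rather than the paper's exact scheme:

```python
def sample_indices(num_frames, num_segments, window=1):
    """Return frame indices: the centre of each equal segment, widened to a
    window of `window` consecutive frames, clamped to the valid range."""
    seg_len = num_frames / num_segments
    idx = []
    for s in range(num_segments):
        centre = int(seg_len * s + seg_len / 2)
        for off in range(-(window // 2), window // 2 + 1):
            idx.append(min(max(centre + off, 0), num_frames - 1))
    return idx
```

Because every segment contributes, the sampled set covers long-range temporal structure at a fixed compute budget, which is the property the abstract attributes to its multi-scale windows.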
9. Evaluating fine tuned deep learning models for real-time earthquake damage assessment with drone-based images.
- Author
-
Kizilay, Furkan, Narman, Mina R., Song, Hwapyeong, Narman, Husnu S., Cosgun, Cumhur, and Alzarrad, Ammar
- Subjects
- OBJECT recognition (Computer vision), EMERGENCY management, VIDEO processing, DEEP learning, RESOURCE allocation, COMPUTATIONAL complexity, EARTHQUAKE damage
- Abstract
Earthquakes pose a significant threat to life and property worldwide. Rapid and accurate assessment of earthquake damage is crucial for effective disaster response efforts. This study investigates the feasibility of employing deep learning models for damage detection using drone imagery. We explore the adaptation of models like VGG16 for object detection through transfer learning and compare their performance to established object detection architectures like YOLOv8 (You Only Look Once) and Detectron2. Our evaluation, based on various metrics including mAP, mAP50, and recall, demonstrates the superior performance of YOLOv8 in detecting damaged buildings within drone imagery, particularly for cases with moderate bounding box overlap. This finding suggests its potential suitability for real-world applications due to the balance between accuracy and efficiency. Furthermore, to enhance real-world feasibility, we explore two strategies for enabling the simultaneous operation of multiple deep learning models for video processing: frame splitting and threading. In addition, we optimize model size and computational complexity to facilitate real-time processing on resource-constrained platforms, such as drones. This work contributes to the field of earthquake damage detection by (1) demonstrating the effectiveness of deep learning models, including adapted architectures, for damage detection from drone imagery, (2) highlighting the importance of evaluation metrics like mAP50 for tasks with moderate bounding box overlap requirements, and (3) proposing methods for ensemble model processing and model optimization to enhance real-world feasibility. The potential for real-time damage assessment using drone-based deep learning models offers significant advantages for disaster response by enabling rapid information gathering to support resource allocation, rescue efforts, and recovery operations in the aftermath of earthquakes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
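Of the two strategies the abstract names for running several deep learning models over one video stream, the threading approach can be sketched as one worker thread per model, each consuming every frame from its own queue. This is a generic producer-consumer pattern under the stated assumption that `models` are plain per-frame callables, not the paper's implementation:

```python
import queue
import threading

def run_models_threaded(frames, models):
    """Run each model over every frame concurrently; returns {model_index:
    [results in frame order]}. Each model gets a dedicated queue and thread,
    so a slow model does not block the others' progress."""
    results = {i: [] for i in range(len(models))}

    def worker(i, q):
        while True:
            frame = q.get()
            if frame is None:          # sentinel: stream finished
                break
            results[i].append(models[i](frame))

    queues = [queue.Queue() for _ in models]
    threads = [threading.Thread(target=worker, args=(i, q))
               for i, q in enumerate(queues)]
    for t in threads:
        t.start()
    for frame in frames:               # fan each frame out to every model
        for q in queues:
            q.put(frame)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

Frame splitting, the other strategy mentioned, would instead partition each frame (or the frame stream) across workers; bounded queues would add backpressure for real-time use.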
10. Adaptive QP algorithm for depth range prediction and encoding output in virtual reality video encoding process.
- Author
-
Yang, Hui, Liu, Qiuming, and Song, Chao
- Subjects
- VIRTUAL reality, ELECTRONIC data processing, VIDEO processing, VIDEO coding, ENCODING, DECISION making, VIDEO compression
- Abstract
In order to reduce encoding complexity and stream size, improve encoding performance, and further improve compression performance, depth prediction partition encoding is studied in this paper. In terms of pattern selection strategy, optimization analysis is carried out based on fast strategic decision-making methods to ensure the comprehensiveness of data processing. In the design of adaptive strategies, different adaptive quantization parameter adjustment strategies are adopted for the equatorial and polar regions, considering the different levels of user attention in 360-degree virtual reality videos. The purpose is to achieve the optimal balance between distortion and stream size, thereby managing the output stream size while maintaining video quality. The results showed that this strategy achieved a maximum reduction of 2.92% in bit rate and an average reduction of 1.76%. Coding time was reduced by 39.28% on average, and the average reconstruction quality measure was 0.043, with almost no quality loss perceptible to the audience. At the same time, the model demonstrated excellent performance on 4K, 6K, and 8K sequences. The proposed depth partitioning adaptive strategy yields significant improvements in video encoding quality and efficiency, improving encoding efficiency while ensuring video quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
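The equatorial/polar adaptive-QP idea above can be illustrated with a latitude-dependent QP offset for an equirectangular frame: rows near the vertical center (the equator, where viewers look most) keep the base QP, while polar rows get a positive offset to spend fewer bits there. The linear ramp and `max_offset` are illustrative stand-ins, not the paper's rule:

```python
def qp_offset(row, height, base_qp, max_offset=6):
    """QP for one row of an equirectangular 360-degree frame.

    latitude runs from 0 at the equator (centre row) to 1 at the poles;
    the QP grows linearly with it, up to base_qp + max_offset.
    """
    latitude = abs((row + 0.5) / height - 0.5) * 2.0
    return base_qp + round(max_offset * latitude)
```

A production encoder would clip the result to the codec's QP range and could replace the linear ramp with a viewport-attention model.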
11. A comparative analysis on major key-frame extraction techniques.
- Author
-
Sunuwar, Jhuma and Borah, Samarjeet
- Subjects
- AMERICAN Sign Language, VIDEO processing, SIGN language, EXTRACTION techniques, SAMPLING (Process)
- Abstract
Real-time hand gesture recognition involves analyzing static and dynamic gesture videos. Video is a sequential arrangement of images, captured and eventually displayed at a given frequency. Not all video frames are useful, and including all frames makes video processing complex. Methods have been devised to remove redundant and identical frames to simplify video processing. One such approach is key-frame extraction, which involves identifying and retaining only those frames that accurately represent the original content of the video. In this paper, we empirically analyze different methods for performing key-frame extraction: Simple Frame Extraction, Uniform Sampling, Structural Similarity Index, Absolute Two-Frame Difference, Motion Detection, and an error-correction-based key-frame extraction technique (KCKFE) using Visual Geometry Group-16 (VGG16). Three publicly available datasets (DVS gesture, American Sign Language (ASL) gesture, and IPN gesture) and two self-constructed datasets (NSL_Consonent and NSL_Vowel) have been used to evaluate the performance of the key-frame extraction methods. NSL_Consonent and NSL_Vowel comprise 37 consonants and 17 vowels of the Nepali Sign Language. The experimental results show that uniform sampling is only suitable for static gestures that don't require any other structural information for selecting keyframes. The Structural Similarity Index, KCKFE based on VGG16, and motion detection-based key-frame extraction are found suitable for dynamic gestures. The absolute two-frame difference method results in poor key-frame generation because it generates as many key-frames as are present in the video. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
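One family of baselines compared above, frame-difference key-frame extraction, reduces to keeping a frame only when it differs enough from the last frame that was kept. A minimal sketch; the threshold is data-dependent, and this variant (difference against the last *kept* frame rather than the immediately preceding one) is one common choice, not necessarily the paper's:

```python
import numpy as np

def keyframes_by_difference(frames, threshold):
    """Return indices of key-frames: frame 0 always, then any frame whose
    mean absolute pixel difference from the last kept frame exceeds
    `threshold`. Near-duplicate frames are dropped."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    kept = [0]
    for i in range(1, len(frames)):
        if np.abs(frames[i] - frames[kept[-1]]).mean() > threshold:
            kept.append(i)
    return kept
```

Comparing against the previous frame instead (true two-frame differencing) keeps almost everything whenever there is continuous motion, which matches the poor behavior the abstract reports for that method.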
12. Hybrid time-spatial video saliency detection method to enhance human action recognition systems.
- Author
-
Gharahbagh, Abdorreza Alavi, Hajihashemi, Vahid, Ferreira, Marta Campos, Machado, J. J. M., and Tavares, João Manuel R. S.
- Subjects
- HUMAN activity recognition, MACHINE learning, OPTICAL flow, VIDEO processing, GENETIC algorithms
- Abstract
Since digital media has become increasingly popular, video processing has expanded in recent years. Video processing systems require high levels of processing, which is one of the challenges in this field. Various approaches, such as hardware upgrades, algorithmic optimizations, and removing unnecessary information, have been suggested to solve this problem. This study proposes a video saliency map based method that identifies the critical parts of the video and improves the system's overall performance. Using an image registration algorithm, the proposed method first removes the camera's motion. Subsequently, each video frame's color, edge, and gradient information are used to obtain a spatial saliency map. Combining spatial saliency with motion information derived from optical flow and color-based segmentation can produce a saliency map containing both motion and spatial data. A nonlinear function is suggested to properly combine the temporal and spatial saliency maps, which was optimized using a multi-objective genetic algorithm. The proposed saliency map method was added as a preprocessing step in several Human Action Recognition (HAR) systems based on deep learning, and its performance was evaluated. Furthermore, the proposed method was compared with similar methods based on saliency maps, and the superiority of the proposed method was confirmed. The results show that the proposed method can improve HAR efficiency by up to 6.5% relative to HAR methods with no preprocessing step and 3.9% compared to the HAR method containing a temporal saliency map. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
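The nonlinear combination of temporal and spatial saliency maps described above might look like the sketch below: a weighted sum sharpened by an exponent and renormalized. The paper tunes its fusion function with a multi-objective genetic algorithm, so `alpha` and `gamma` here are illustrative placeholders, not optimized values:

```python
import numpy as np

def combine_saliency(spatial, temporal, alpha=0.6, gamma=2.0):
    """Fuse a spatial and a temporal saliency map (same shape, values ~[0,1]).

    Weighted sum -> power-law sharpening (gamma > 1 suppresses weak
    responses) -> min-max renormalisation back to [0, 1].
    """
    s = alpha * np.asarray(spatial, float) + (1 - alpha) * np.asarray(temporal, float)
    s = s ** gamma
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else s
```

In a genetic-algorithm setting, `alpha` and `gamma` would be genes scored by downstream HAR accuracy, which is the role the paper's multi-objective optimization plays.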
13. ULSR-UV: an ultra-lightweight super-resolution networks for UAV video.
- Author
-
Yang, Xin, Wu, Lingxiao, and Wang, Xiangchen
- Subjects
- DRONE aircraft, NETWORK performance, BLOCK designs, VIDEO processing, GENERALIZATION, VIDEO compression
- Abstract
Existing lightweight video super-resolution network architectures are often simple in structure and lack generalization ability when dealing with the complex and varied real scenes found in aerial videos from unmanned aerial vehicles. Furthermore, these networks may cause issues such as the checkerboard effect and loss of texture information when processing drone videos. To address these challenges, we propose an ultra-lightweight video super-resolution reconstruction network based on convolutional pyramids and progressive residual blocks: ULSR-UV. The ULSR-UV network significantly reduces model redundancy and achieves an extremely lightweight design by incorporating a 3D lightweight spatial pyramid structure and more efficient residual block designs. The network utilizes a specific optimizer to efficiently process drone videos in both multi-frame and single-frame dimensions. Additionally, ULSR-UV incorporates a multidimensional feature loss calculation module that enhances network performance and significantly improves the reconstruction quality of drone aerial videos. Extensive experimental verification has demonstrated ULSR-UV's outstanding performance in the field of drone video super-resolution reconstruction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. DiffRank: Enhancing efficiency in discontinuous frame rate analysis for urban surveillance systems.
- Author
-
Cheng, Ziying, Li, Zhe, Zhang, Tianfan, Zhao, Xiaochao, and Jing, Xiao
- Subjects
- OBJECT recognition (Computer vision), SHOW windows, VIDEO surveillance, IMAGE processing, VIDEO processing
- Abstract
Urban public safety management relies heavily on video surveillance systems, which provide crucial visual data for resolving a wide range of incidents and controlling unlawful activities. Traditional methods for target detection predominantly employ a two-stage approach, focusing on precision in identifying objects such as pedestrians and vehicles. These objects, typically sparse in large-scale, lower-quality surveillance footage, induce considerable redundant computation during the initial processing stage. This redundancy constrains real-time detection capabilities and escalates processing costs. Furthermore, transmitting raw images and videos laden with superfluous information to centralized back-end systems significantly burdens network communications and fails to capitalize on the computational resources available at diverse surveillance nodes. This study introduces DiffRank, a novel preprocessing method for fixed-angle video imagery in urban surveillance. The method strategically generates candidate regions during preprocessing, thereby reducing redundant object detection and improving the efficiency of the detection algorithm. Drawing upon change detection principles, a background feature learning approach utilizing shallow features has been developed. This approach prioritizes learning the characteristics of fixed-area backgrounds over direct background identification. As a result, alterations in ROI are efficiently discerned using computationally efficient shallow features, markedly accelerating the generation of proposed Regions of Interest (ROIs) and diminishing the computational demands for subsequent object detection and classification. Comparative analysis on various public and private datasets illustrates that DiffRank, while maintaining high accuracy, substantially outperforms existing baselines in terms of speed, particularly with larger image sizes (e.g., an improvement exceeding 300% at 1920×1080 resolution). Moreover, the method demonstrates enhanced robustness compared to baseline methods, efficiently disregarding static targets like mannequins in display windows. The advancements in candidate area preprocessing enable a balanced approach between detection accuracy and overall detection speed, making the algorithm highly applicable for real-time on-site analysis in edge computing scenarios and cloud-edge collaborative computing environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
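The candidate-region stage described above — model the fixed-camera background, then flag only changed areas as Regions of Interest for the detector — can be illustrated with plain running-average change detection. DiffRank's shallow feature learner is more elaborate; this sketch shows only the underlying idea:

```python
import numpy as np

def propose_rois(frame, background, diff_threshold, alpha=0.05):
    """Return (roi, updated_background) for one grayscale frame.

    roi is the bounding box (y_min, x_min, y_max, x_max) of pixels whose
    absolute difference from the background model exceeds diff_threshold,
    or None when nothing changed. The background adapts slowly (rate alpha)
    so gradual lighting drift is absorbed while moving targets stand out.
    """
    frame = np.asarray(frame, dtype=float)
    mask = np.abs(frame - background) > diff_threshold
    new_bg = (1 - alpha) * background + alpha * frame
    if not mask.any():
        return None, new_bg
    ys, xs = np.nonzero(mask)
    return (ys.min(), xs.min(), ys.max(), xs.max()), new_bg
```

Because a static mannequin gets absorbed into the background model over time, it stops generating ROIs — the robustness behavior the abstract highlights.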
15. Low Power Multiplier Using Approximate Adder for Error Tolerant Applications.
- Author
-
Hemanth, C., Sangeetha, R. G., Kademani, Sagar, and Shahbaz Ali, Meer
- Subjects
- DIGITAL signal processing, VIDEO processing, LOGIC
- Abstract
In embedded applications and digital signal processing systems, multipliers are crucial components, and there is an increasing need for energy-efficient circuits in these applications. We use an approximate adder for error tolerance in the computational process to improve performance and reduce power consumption. Due to human perceptual constraints, computational errors do not significantly affect applications like image, audio, and video processing. Adiabatic logic (AL), which recycles energy, can also be used to build circuits that require less energy. In this work, we propose a carry-save array multiplier employing an approximate adder based on CMOS logic and clocked CMOS adiabatic logic (CCAL) to conserve power. Additionally, multiplier parameters such as average power and power-delay product are calculated for a design using a precise full adder and compared with those of the proposed multiplier. We performed simulations using 180 nm technology in Cadence Virtuoso. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
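Approximate adders of the kind used above trade exactness on the low-order bits for simpler, lower-power logic. A behavioral sketch in the style of a lower-part-OR adder (LOA): the paper's circuit is a CMOS/adiabatic hardware design, so this software model only illustrates the error behavior such adders introduce:

```python
def approx_add(a, b, k):
    """Approximate addition of non-negative ints: exact on the upper bits,
    bitwise OR (no carry chain) on the k least-significant bits.

    The OR needs no carry propagation, which is what saves power and delay
    in hardware; the price is an occasional small negative error whenever
    both low parts have overlapping set bits or would have carried out.
    """
    mask = (1 << k) - 1
    lower = (a & mask) | (b & mask)       # cheap, carry-free low part
    upper = ((a >> k) + (b >> k)) << k    # exact high part
    return upper + lower
```

With k = 0 the adder is exact; growing k trades accuracy for energy, which is acceptable in perceptual workloads like image, audio, and video processing.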
16. On Developing a Machine Learning-Based Approach for the Automatic Characterization of Behavioral Phenotypes for Dairy Cows Relevant to Thermotolerance.
- Author
-
Inadagbo, Oluwatosin, Makowski, Genevieve, Ahmed, Ahmed Abdelmoamen, and Daigle, Courtney
- Subjects
- COMPUTER vision, DAIRY cattle, AUTONOMIC nervous system, ARTIFICIAL intelligence, VIDEO processing
- Abstract
The United States is predicted to experience an annual decline in milk production due to heat stress of 1.4 and 1.9 kg/day by the 2050s and 2080s, with economic losses of USD 1.7 billion and USD 2.2 billion, respectively, despite current cooling efforts implemented by the dairy industry. The ability of cattle to withstand heat (i.e., thermotolerance) can be influenced by physiological and behavioral factors, even though the factors contributing to thermoregulation are heritable, and cows vary in their behavioral repertoire. Current methods to gauge cow behaviors lack precision and scalability. This paper presents an approach leveraging various machine learning (ML) (e.g., CNN and YOLOv8) and computer vision (e.g., video processing and annotation) techniques aimed at quantifying key behavioral indicators, specifically drinking frequency and brush use behaviors. These behaviors, while challenging to quantify using traditional methods, offer profound insights into autonomic nervous system function and an individual cow's coping mechanisms under heat stress. The developed approach provides an opportunity to quantify these difficult-to-measure drinking and brush use behaviors of dairy cows milked in a robotic milking system. This approach will give ranchers a better opportunity to make informed decisions that could mitigate the adverse effects of heat stress. It will also expedite data collection regarding dairy cow behavioral phenotypes. Finally, the developed system is evaluated using different performance metrics, including classification accuracy. It is found that the YOLOv8 and CNN models achieved classification accuracies of 93% and 96% for object detection and classification, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Directional Texture Editing for 3D Models.
- Author
-
Liu, Shengqi, Chen, Zhuo, Gao, Jingnan, Yan, Yichao, Zhu, Wenhan, Lyu, Jiangjing, and Yang, Xiaokang
- Subjects
- VIDEO editing, VIDEO processing, TEXTURE mapping, SURFACES (Technology), PROBLEM solving
- Abstract
Texture editing is a crucial task in 3D modelling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and ambiguous text descriptions make this task challenging. To tackle this challenge, we propose ITEM3D, a Texture Editing Model designed for automatic 3D object editing according to text Instructions. Leveraging diffusion models and differentiable rendering, ITEM3D takes the rendered images as the bridge between text and 3D representation and further optimizes the disentangled texture and environment map. Previous methods adopted the absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in noisy appearances and text inconsistencies. To solve the problem caused by the ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, to resolve the semantic ambiguity between the texts and images. Additionally, we gradually adjust the direction during optimization to further address the unexpected deviation in the texture domain. Qualitative and quantitative experiments show that our ITEM3D outperforms the state‐of‐the‐art methods on various 3D objects. We also perform text‐guided relighting to show explicit control over lighting. Our project page: https://shengqiliu1.github.io/ITEM3D/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Mix‐Max: A Content‐Aware Operator for Real‐Time Texture Transitions.
- Author
-
Fournier, Romain and Sauvage, Basile
- Subjects
- DISTRIBUTION (Probability theory), VIDEO processing, ALGORITHMS
- Abstract
Mixing textures is a basic and ubiquitous operation in data‐driven algorithms for real‐time texture generation and rendering. It is usually performed either by linear blending, or by cutting. We propose a new mixing operator which encompasses and extends both, creating more complex transitions that adapt to the texture's contents. Our mixing operator takes as input two or more textures along with two or more priority maps, which encode how the texture patterns should interact. The resulting mixed texture is defined pixel‐wise by selecting the maximum of both priorities. We show that it integrates smoothly into two widespread applications: transition between two different textures, and texture synthesis that mixes pieces of the same texture. We provide constant‐time and parallel evaluation of the resulting mix over square footprints of MIP‐maps, making our operator suitable for real‐time rendering. We also develop a micro‐priority model, inspired by micro‐geometry models in rendering, which represents sub‐pixel priorities by a statistical distribution, and which allows for tuning between sharp cuts and smooth blend. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
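The pixel‐wise max‐of‐priorities rule at the heart of this operator can be sketched in a few lines (a toy NumPy illustration of the selection rule only; the paper's operator additionally handles MIP‐map footprints and sub‐pixel micro‐priorities, which are omitted here):

```python
import numpy as np

def mix_max(textures, priorities):
    """At each pixel, keep the texel from the layer whose priority is
    highest. textures: list of (H, W, C) arrays; priorities: list of
    (H, W) arrays, one per texture."""
    textures = np.stack(textures)      # (L, H, W, C)
    priorities = np.stack(priorities)  # (L, H, W)
    winner = np.argmax(priorities, axis=0)  # (H, W) index of winning layer
    h, w = winner.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return textures[winner, ii, jj]    # (H, W, C) mixed result
```

Smooth blends, as in the paper's micro‐priority model, would replace the hard `argmax` with a statistical treatment of sub‐pixel priorities.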
19. SMFS‐GAN: Style‐Guided Multi‐class Freehand Sketch‐to‐Image Synthesis.
- Author
-
Cheng, Zhenwei, Wu, Lei, Li, Xiang, and Meng, Xiangxu
- Subjects
- *
VIDEO processing , *CLASS differences - Abstract
Freehand sketch‐to‐image (S2I) is a challenging task due to the individualized lines and the random shape of freehand sketches. The multi‐class freehand sketch‐to‐image synthesis task, in turn, presents new challenges for this research area. This task requires not only the consideration of the problems posed by freehand sketches but also the analysis of multi‐class domain differences within a single model. However, existing methods often have difficulty learning domain differences between multiple classes, and cannot generate controllable and appropriate textures while maintaining shape stability. In this paper, we propose a style‐guided multi‐class freehand sketch‐to‐image synthesis model, SMFS‐GAN, which can be trained using only unpaired data. To this end, we introduce a contrast‐based style encoder that optimizes the network's perception of domain disparities by explicitly modelling the differences between classes and thus extracting style information across domains. Further, to optimize the fine‐grained texture of the generated results and the shape consistency with freehand sketches, we propose a local texture refinement discriminator and a Shape Constraint Module, respectively. In addition, to address the imbalance of data classes in the QMUL‐Sketch dataset, we add 6K manually drawn images to obtain the QMUL‐Sketch+ dataset. Extensive experiments on the SketchyCOCO Object dataset, QMUL‐Sketch+ dataset and Pseudosketches dataset demonstrate the effectiveness as well as the superiority of our proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Evaluation in Neural Style Transfer: A Review.
- Author
-
Ioannou, Eleftherios and Maddock, Steve
- Subjects
- *
LANDSCAPE assessment , *VIDEO processing , *EVALUATION methodology , *HUMAN experimentation , *ALGORITHMS - Abstract
The field of neural style transfer (NST) has witnessed remarkable progress in the past few years, with approaches being able to synthesize artistic and photorealistic images and videos of exceptional quality. To evaluate such results, a diverse landscape of evaluation methods and metrics is used, including authors' opinions based on side‐by‐side comparisons, human evaluation studies that quantify the subjective judgements of participants, and a multitude of quantitative computational metrics which objectively assess the different aspects of an algorithm's performance. However, there is no consensus regarding the most suitable and effective evaluation procedure that can guarantee the reliability of the results. In this review, we provide an in‐depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices. We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons among NST methods but will also enhance the comprehension and interpretation of research findings in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection.
- Author
-
Chandran, P., Zoss, G., Gotardo, P., and Bradley, D.
- Subjects
- *
VIDEO processing , *TRANSFORMER models , *DETECTORS , *WEARABLE video devices , *ANNOTATIONS - Abstract
In this paper, we examine three important issues in the practical use of state‐of‐the‐art facial landmark detectors and show how a combination of specific architectural modifications can directly improve their accuracy and temporal stability. First, many facial landmark detectors require a face normalization step as a pre‐process, often accomplished by a separately trained neural network that crops and resizes the face in the input image. There is no guarantee that this pre‐trained network performs optimal face normalization for the task of landmark detection. Thus, we instead analyse the use of a spatial transformer network that is trained alongside the landmark detector in an unsupervised manner, jointly learning an optimal face normalization and landmark detection by a single neural network. Second, we show that modifying the output head of the landmark predictor to infer landmarks in a canonical 3D space rather than directly in 2D can further improve accuracy. To convert the predicted 3D landmarks into screen‐space, we additionally predict the camera intrinsics and head pose from the input image. As a side benefit, this makes it possible to predict the 3D face shape from a given image using only 2D landmarks as supervision, which is useful in determining landmark visibility among other things. Third, when training a landmark detector on multiple datasets at the same time, annotation inconsistencies across datasets force the network to produce a sub‐optimal average. We propose to add a semantic correction network to address this issue. This additional lightweight neural network is trained alongside the landmark detector, without requiring any additional supervision. While the insights of this paper can be applied to most common landmark detectors, we specifically target a recently proposed continuous 2D landmark detector to demonstrate how each of our additions leads to meaningful improvements over the state‐of‐the‐art on standard benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
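The conversion of predicted canonical 3D landmarks to screen space using predicted camera intrinsics and head pose follows the standard pinhole camera model; a generic sketch of that projection step (not the authors' exact formulation):

```python
import numpy as np

def project_landmarks(X, K, R, t):
    """Project canonical 3D landmarks into 2D screen space.
    X: (N, 3) landmarks, K: (3, 3) camera intrinsics,
    R: (3, 3) head-pose rotation, t: (3,) translation."""
    Xc = X @ R.T + t                  # canonical -> camera coordinates
    uvw = Xc @ K.T                    # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> (N, 2) pixels
```

With this formulation, a 2D landmark loss back‐propagates through `K`, `R`, `t` and the canonical shape `X`, which is how 2D supervision alone can constrain a 3D face shape.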
22. A Lightweight Sow Farrowing Recognition Model for Edge Computing.
- Author
-
尹令, 蒋圣政, 叶诚至, 吴珍芳, 杨杰, 张素敏, and 蔡更元
- Subjects
- *
ANIMAL culture , *ANIMAL breeding , *ANIMAL breeds , *IMAGE processing , *VIDEO processing - Abstract
The reproductive performance of sows plays a critical role in animal breeding, particularly in the efficiency and effectiveness of selection. However, manual recording of piglet births and survival rates can no longer meet the needs of large-scale production, and high precision is required to capture nuanced data such as the intervals between births. Advanced technologies can be expected to enhance both the accuracy and efficiency of breeding programs. In this study, a lightweight network was developed to rapidly and accurately monitor sow farrowing in real time and to extract essential birthing metrics, such as the number of piglets born and the precise interval between each birth, thereby improving the efficiency and accuracy of breeding programs. Initially, the effect of different monitoring views (single- versus double-column) on the accuracy of the improved model was evaluated; the single-column view proved significantly better for accurately detecting birthing events and supporting real-time decision-making. To cope with dynamic changes in sow posture, varying camera perspectives, different lighting conditions, and the motion blur of active piglets during birth, video augmentation techniques such as horizontal and vertical flipping, color jittering, and Gaussian blur were employed, significantly enhancing the robustness of the model under diverse operational conditions. Further improvements were achieved through a comparative analysis of classification networks. 
The results revealed that ResNet50 achieved the highest recognition accuracy, while MobileNetV3-S offered the best combination of compact model size and processing speed (505.14 frames per second), indicating optimal operational efficiency. MobileNetV3-S was therefore refined with masked generative distillation, a technique that enhances the network's ability to capture and interpret essential birthing features, using ResNet50 as the teacher model and MobileNetV3-S as the student. Training with masked generative distillation was followed by dependency-graph pruning. Tests on a DELL OptiPlex microcomputer achieved a detection speed of 83.10 frames per second with a single-column test accuracy of 91.48%; although accuracy decreased slightly, by 0.98 percentage points, the detection speed improved by 67.13 frames per second. The improved model was then deployed at the edge, where it measured farrowing intervals with an error of just 0.31 seconds and the duration of piglet birth events with a mere 0.02-second error. Such efficient and precise real-time monitoring can improve the management of breeding activities in complex farm environments. In conclusion, integrating image processing and machine learning for real-time monitoring of sow farrowing demonstrates transformative potential, offers a standard of accuracy and efficiency for livestock management, and contributes to sustainable, scientifically informed animal husbandry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
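The teacher-student setup described above can be illustrated with a simplified masked feature-distillation loss. This is a hedged sketch only: the paper's masked generative distillation additionally passes the masked student features through a small generator network that reconstructs the teacher's features, which is omitted here:

```python
import numpy as np

def masked_distillation_loss(student_feat, teacher_feat, mask_ratio=0.5, rng=None):
    """Randomly zero a fraction of the student's feature map, then
    penalize the L2 distance to the teacher's (ResNet50) features,
    pushing the student (MobileNetV3-S) to recover masked information."""
    rng = np.random.default_rng(rng)
    # Keep a feature value only where the random draw clears the mask ratio.
    keep = (rng.random(student_feat.shape) >= mask_ratio).astype(student_feat.dtype)
    masked = student_feat * keep
    return float(np.mean((masked - teacher_feat) ** 2))
```

In training, this loss would be added to the student's ordinary classification loss; the dependency-graph pruning step the paper applies afterwards is a separate model-compression pass.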
23. The (null) effects of video questions on applicant reactions in asynchronous video interviews: Evidence from an actual hiring context.
- Author
-
Niemitz, Nelli, Rietheimer, Lucas, König, Cornelius, and Langer, Markus
- Subjects
- *
EMPLOYEE selection , *IMPRESSION management , *SOCIAL influence , *VIDEO processing , *EMPLOYMENT interviewing , *POPULARITY - Abstract
Asynchronous video interviews (AVIs) are growing in popularity, but tend to suffer from negative applicant reactions, possibly due to lower social presence compared to other interview formats. Research has suggested that specific design features may influence applicant reactions by increasing perceived social presence. In this study, we manipulated the question format (video vs. text) during an actual hiring process (N = 76), testing whether video questions influence social presence, applicant reactions, impression management, and interview performance. There was no evidence that video (vs. text) questions affected any of these variables. We discuss how specific AVI design choices may have affected our results and suggest that future research could investigate the additive and interactive effects of different AVI design features. Practitioner points: Using video questions did not increase social presence when also using an introduction video.Using video questions did not affect applicant reactions, impression management, or interview performance.Our results suggest that for organizations using an introduction video and offering flexibility in the asynchronous video interview process (e.g., the opportunity to rerecord responses), it may not be worth the effort and cost to produce video questions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Survey on vision-based dynamic hand gesture recognition.
- Author
-
Tripathi, Reena and Verma, Bindu
- Subjects
- *
GESTURE , *DEEP learning , *RECOGNITION (Psychology) , *SOCIOCULTURAL factors , *HUMAN body , *HAND - Abstract
Hand gestures are an important way for people to communicate with one another, and their use in technology is inspired by this very common, natural mode of human interaction with the environment. Recognizing a gesturing hand and estimating its pose fall under the area of hand gesture analysis. Detecting the gesturing hand is harder than detecting other parts of the human body because the hand is smaller, and it presents greater complexity and more challenges due to cultural and individual differences between users and to gestures invented ad hoc. These complications and variations deeply affect the recognition rate and accuracy. This paper summarizes hand gesture techniques, recognition methods, merits and demerits, various applications, available datasets, achieved accuracy rates, classifiers, algorithms, and gesture types. It also scrutinizes the performance of traditional and deep learning methods on dynamic hand gesture recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Signsability: Enhancing Communication through a Sign Language App.
- Author
-
Ezra, Din, Mastitz, Shai, and Rabaev, Irina
- Subjects
SIGN language ,MOBILE apps ,FLUID dynamics ,COMPUTER vision ,DEEP learning - Abstract
The integration of sign language recognition systems into digital platforms has the potential to bridge communication gaps between the deaf community and the broader population. This paper introduces an advanced Israeli Sign Language (ISL) recognition system designed to interpret dynamic motion gestures, addressing a critical need for more sophisticated and fluid communication tools. Unlike conventional systems that focus solely on static signs, our approach incorporates both deep learning and Computer Vision techniques to analyze and translate dynamic gestures captured in real-time video. We provide a comprehensive account of our preprocessing pipeline, detailing every stage from video collection to the extraction of landmarks using MediaPipe, including the mathematical equations used for preprocessing these landmarks and the final recognition process. The dataset utilized for training our model is unique in its comprehensiveness and is publicly accessible, enhancing the reproducibility and expansion of future research. The deployment of our model on a publicly accessible website allows users to engage with ISL interactively, facilitating both learning and practice. We discuss the development process, the challenges overcome, and the anticipated societal impact of our system in promoting greater inclusivity and understanding. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
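A typical preprocessing step for MediaPipe hand landmarks, making them translation- and scale-invariant before recognition, can be sketched as follows (an illustrative sketch of a common normalization; the paper's exact preprocessing equations may differ):

```python
import numpy as np

def normalize_landmarks(landmarks, origin_idx=0):
    """Shift landmarks so a reference point (the wrist, index 0 in
    MediaPipe's hand topology) sits at the origin, then divide by the
    largest distance from it, removing translation and scale."""
    pts = np.asarray(landmarks, dtype=float)   # (21, 2) or (21, 3)
    centered = pts - pts[origin_idx]
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered
```

Normalized landmark sequences like these, rather than raw pixel coordinates, are what a dynamic-gesture classifier would typically consume.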
26. Students' behaviour analysis based on correlating thermal comfort and spatial simulations; case study of a schoolyard in Shiraz City.
- Author
-
Fattahi, Kaveh, Bakhtyari, Vahid, Askari, Farshad, and Haghpanah, Mohammad Amin
- Subjects
THERMAL comfort ,STUDENT activism ,BEHAVIORAL assessment ,VIDEO processing ,SCHOOL environment - Abstract
The significant impact of outdoor school environments on student behaviour highlights the need to study how students interact with their surroundings. This has become a central focus in recent educational research, leading scholars to use various methods to gain insight into student movement behaviour. This research introduces a novel data-driven spatial gridding technique, employing thermal comfort and space syntax evaluation through ENVI-met and depthMapX software. It validates the extracted climatic and spatial heatmaps using real-time drone footage and Python programming to map and visualize student movement patterns. The findings from the Pearson correlation analysis in the Orange software suggest that the correlated spatial-thermal heatmap better represents students' actual behaviour, with a high correlation coefficient of 0.84. Furthermore, this study sheds light on how students' dynamic and static behaviours are linked to the schoolyard's thermal and spatial characteristics. In conclusion, this research innovatively combines real-time monitoring with dual simulations to analyze student movement behaviour, providing valuable insights into specific schoolyard dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
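The Pearson correlation used to compare the simulated heatmaps with observed behaviour can be computed directly over the flattened grids (the study used the Orange software; this is a minimal stand-alone sketch):

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally sized
    arrays, e.g. a spatial-thermal heatmap and an observed-occupancy
    heatmap flattened cell by cell."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    a -= a.mean()
    b -= b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```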
27. Detection and localization of anomalous objects in video sequences using vision transformers and U-Net model.
- Author
-
Berroukham, Abdelhafid, Housni, Khalid, and Lahraichi, Mohammed
- Abstract
The detection and localization of anomalous objects in video sequences remain a challenging task in video analysis. Recent years have witnessed a surge in deep learning approaches, especially with recurrent neural networks (RNNs). However, RNNs have limitations that vision transformers (ViTs) can address. We propose a novel solution that leverages ViTs, which have recently achieved remarkable success in various computer vision tasks. Our approach involves a two-step process. First, we utilize a pre-trained ViT model to generate an intermediate representation containing an attention map, highlighting areas critical for anomaly detection. In the second step, this attention map is concatenated with the original video frame, creating a richer representation that guides the U-Net model towards anomaly-prone regions. This enriched data is then fed into a U-Net model for precise localization of the anomalous objects. The model achieved a mean Intersection over Union (IoU) of 0.70, indicating a strong overlap between the predicted bounding boxes and the ground truth annotations. In the field of anomaly detection, a higher IoU score signifies better performance. Moreover, the pixel accuracy of 0.99 demonstrates a high level of precision in classifying individual pixels. Concerning localization accuracy, we conducted a comparison of our method with other approaches. The results obtained show that our method outperforms most of the previous methods and achieves a very competitive performance in terms of localization accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
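The mean IoU reported above (0.70) is computed per prediction as intersection over union against the ground-truth annotation; a minimal sketch for binary masks:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over Union between a predicted and a ground-truth
    binary mask; higher means better localization overlap."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # both empty: perfect match
```

Averaging this value over all test frames gives the mean IoU; the pixel accuracy of 0.99 the paper also reports is simply the fraction of pixels classified correctly.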
28. A Quality of Experience and Visual Attention Evaluation for 360° Videos with Non-spatial and Spatial Audio.
- Author
-
Hirway, Amit, Qiao, Yuansong, and Murray, Niall
- Subjects
PUPILLARY reflex ,HEART beat ,VIDEO processing ,VIRTUAL reality ,USER experience ,VIDEO compression - Abstract
This article presents the results of an empirical study that aimed to investigate the influence of various types of audio (spatial and non-spatial) on user quality of experience (QoE) and visual attention in 360° videos. The study compared the head pose, eye gaze, pupil dilations, heart rate, and subjective responses of 73 users who watched ten 360° videos with different sound configurations. The configurations evaluated were no sound; non-spatial (stereo) audio; and two spatial sound conditions (first- and third-order ambisonics). The videos covered various categories and presented both indoor and outdoor scenarios. The subjective responses were analyzed using an ANOVA (Analysis of Variance) to assess mean differences between sound conditions. Data visualization was also employed to enhance the interpretability of the results. The findings reveal diverse viewing patterns, physiological responses, and subjective experiences among users watching 360° videos with different sound conditions. Spatial audio, in particular third-order ambisonics, garnered heightened attention. This is evident in increased pupil dilation and heart rate. Furthermore, the presence of spatial audio led to more diverse head poses when sound sources were distributed across the scene. These findings have important implications for the development of effective techniques for optimizing processing, encoding, distributing, and rendering content in virtual reality (VR) and 360° videos with spatialized audio. These insights are also relevant in the creative realms of content design and enhancement. They provide valuable guidance on how spatial audio influences user attention, physiological responses, and overall subjective experiences. Understanding these dynamics can assist content creators and designers in crafting immersive experiences that leverage spatialized audio to captivate users, enhance engagement, and optimize the overall quality of VR and 360° video content. 
The dataset, scripts used for data collection, ffmpeg commands used for processing the videos, and the subjective questionnaire and its statistical analysis are publicly available. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
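The ANOVA applied to the subjective responses compares mean ratings across the sound conditions; a minimal sketch of the one-way F statistic (toy data, p-value lookup omitted):

```python
import numpy as np

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic across condition groups, e.g. QoE
    ratings under no-sound, stereo, first-order and third-order
    ambisonics. F = between-group variance / within-group variance."""
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand = all_vals.mean()
    k, n = len(groups), len(all_vals)
    ssb = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ssw = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum() for g in groups)
    return float((ssb / (k - 1)) / (ssw / (n - k)))
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) indicates that at least one sound condition's mean rating differs from the others.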
29. Detection and localization of multiple inter-frame forgeries in digital videos.
- Author
-
Shehnaz and Kaur, Mandeep
- Subjects
TECHNOLOGICAL innovations ,SOCIAL impact ,VIDEO processing ,FORGERY ,HISTOGRAMS - Abstract
The sanctity and integrity of digital videos are crucial for diverse real-world applications and carry significant social and legal implications. Technological advancements pose new challenges, as video processing software that is typically designed to enhance visual content can also be used to spawn unauthentic and malicious data that is potentially hazardous. Robust algorithms are therefore needed to counter these deleterious effects. In this paper, we propose a passive-blind approach to detect and localize multiple kinds of inter-frame forgeries in digital videos, such as frame insertion, deletion and duplication. The forensic artefacts are designed based on correlation inconsistencies between the histogram-similarity patterns of adjacent texture-feature-encoded video frames. For the empirical evaluation, the algorithm uses texture features such as Histogram of Oriented Gradients (HoG) and uniform and rotation-invariant Local Binary Patterns (LBP). A customized dataset of 1370 tampered videos was created from the benchmark SULFA dataset, due to the lack of a standard video dataset with inter-frame forgeries. A supervised SVM classifier is trained to detect video tampering, and extensive analysis based on different histogram-similarity metrics shows that the proposed approach achieves an overall accuracy of 99%. Further, the proposed method localizes the position of tampered frames in the video, highlighting forged frames using Chebyshev's inequality in the case of frame insertion and deletion attacks. A comparative analysis with state-of-the-art methods also demonstrates the good efficacy of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
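The Chebyshev-based localization step can be sketched as flagging frame transitions whose histogram-similarity score deviates far from the mean (a sketch of the idea only; the texture-feature encoding and the SVM detection stage are omitted):

```python
import numpy as np

def flag_tampered_frames(similarity, k=3.0):
    """Flag frame transitions whose similarity score deviates from the
    mean by more than k standard deviations. By Chebyshev's inequality,
    at most 1/k^2 of scores lie that far out, so such outliers are
    strong candidates for insertion/deletion points."""
    s = np.asarray(similarity, dtype=float)
    mu, sigma = s.mean(), s.std()
    return np.where(np.abs(s - mu) > k * sigma)[0]
```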
30. Progressive prediction: Video anomaly detection via multi‐grained prediction.
- Author
-
Zeng, Xianlin, Jiang, Yalong, Wang, Yufeng, Fu, Qiang, and Ding, Wenrui
- Subjects
- *
VIDEO processing , *GENERATIVE adversarial networks , *VIDEO surveillance , *COMPUTER vision , *TRANSFORMER models - Abstract
Video Anomaly Detection (VAD) has been an active research field for several decades. However, most existing approaches merely extract a single type of feature from videos and define a single paradigm to indicate the extent of abnormalities. A coarse‐to‐fine three‐level prediction is built by integrating different levels of spatio‐temporal representations, better highlighting the difference between normal and abnormal behaviors. First, an object‐level trajectory prediction is proposed to model human historical position using a graph transformer network. Subsequently, skeleton‐level prediction is achieved by incorporating the positional information from the trajectory prediction. More importantly, based on the predicted skeleton, a skeleton‐guided pixel‐level region prediction is performed. A novel Skeleton Conditioned Generative Adversarial Network (SCGAN) is designed to explore the correlation between skeleton‐level and pixel‐level motion prediction. Benefiting from SCGAN, the prediction of human regions is contributed by both coarse‐grained and fine‐grained motion features. This three‐level prediction, namely Progressive Prediction Video Anomaly Detection (P3VAD), enlarges the prediction error on irregular motion patterns. Besides, a pixel‐level analysis method is proposed to achieve Background‐bias Elimination (BE) and denoise the predicted region. Experimental results validate the effectiveness of P3VAD on the four benchmark datasets (ShanghaiTech, CUHK Avenue, IITB‐Corridor, and ADOC). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Hybridization of Acoustic and Visual Features of Polish Sibilants Produced by Children for Computer Speech Diagnosis.
- Author
-
Sage, Agata, Miodońska, Zuzanna, Kręcichwost, Michał, and Badura, Paweł
- Subjects
- *
SPEECH disorders , *COMPUTER-aided diagnosis , *SPEECH therapy , *SPEECH , *VIDEO processing - Abstract
Speech disorders are significant barriers to the balanced development of a child. Many children in Poland are affected by lisps (sigmatism)—the incorrect articulation of sibilants. Since speech therapy diagnostics is complex and multifaceted, developing computer-assisted methods is crucial. This paper presents the results of assessing the usefulness of hybrid feature vectors extracted based on multimodal (video and audio) data for the place of articulation assessment in sibilants /s/ and /ʂ/. We used acoustic features and, new in this field, visual parameters describing selected articulators' texture and shape. Analysis using statistical tests indicated the differences between various sibilant realizations in the context of the articulation pattern assessment using hybrid feature vectors. In sound /s/, 35 variables differentiated dental and interdental pronunciation, and 24 were visual (textural and shape). For sibilant /ʂ/, we found 49 statistically significant variables whose distributions differed between speaker groups (alveolar, dental, and postalveolar articulation), and the dominant feature type was noise-band acoustic. Our study suggests hybridizing the acoustic description with video processing provides richer diagnostic information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. How Perceived Humor Motivates and Demotivates Information Processing of TikTok Videos: The Moderating Role of TikTok Gratifications.
- Author
-
Oh, Jeeyun, Jin, Eunjoo, and Zhuo, Shuer
- Subjects
- *
HEALTH attitudes , *COVID-19 vaccines , *VIDEO processing , *INFORMATION processing , *WIT & humor - Abstract
TikTok has been an innovative platform for distributing health messages with its wide appeal to younger audiences. The current study examines how the perceived humor of TikTok videos that promote COVID-19 vaccination influences persuasion through cognitive and affective mechanisms. In a survey study (N = 186), perceived humor was a positive predictor of source liking and happiness but was also associated with message discounting. Both source liking and happiness indirectly encouraged pro-vaccination attitudes by motivating message elaboration. In contrast, message discounting reduced elaboration, which discouraged pro-vaccination attitudes. In particular, those who watch TikTok for information gratification counterargued more as perceived humor increased. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Accurate neuron segmentation method for one-photon calcium imaging videos combining convolutional neural networks and clustering.
- Author
-
Bao, Yijun and Gong, Yiyang
- Subjects
- *
CONVOLUTIONAL neural networks , *EXPERIMENTAL films , *VIDEO processing , *NEURONS , *CALCIUM - Abstract
One-photon fluorescent calcium imaging helps understand brain functions by recording large-scale neural activities in freely moving animals. Automatic, fast, and accurate active neuron segmentation algorithms are essential to extract and interpret information from these videos. One-photon imaging videos' low resolution, high noise, and high background fluctuation pose significant challenges. Here, we develop a software pipeline to address the challenges of processing one-photon calcium imaging videos. We extend our previous two-photon active neuron segmentation algorithm, Shallow U-Net Neuron Segmentation (SUNS), to better suppress background fluctuations in one-photon videos. We also develop additional neuron extraction (ANE) to locate small or dim neurons missed by SUNS. To train our segmentation method, we create ground truth neurons by developing a manual labeling pipeline assisted with semi-automatic refinement. Our method is more accurate and faster than state-of-the-art techniques when processing simulated videos and multiple experimental datasets acquired over various brain regions with different imaging conditions. SUNS2-ANE segments active neurons from one-photon calcium imaging videos accurately and fast. It improves background suppression and extracts additional weak neurons. A separate semi-automatic labeling pipeline generates ground truth for training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Standardization of a Gaming Image Database for Visual Processing Research.
- Author
-
Gilbertson, Rebecca J., Hammersley, Jonathan, Birkholz, Samuel A., and Keefe, Kristy M.
- Subjects
- *
EYE tracking , *IMAGE databases , *VIDEO processing , *PSYCHOLOGICAL research , *DATABASES - Abstract
The purpose of the current study was to develop a database of gaming photo stimuli to be used in future psychological research assessing behavioral, cognitive, and neural correlates related to gaming. Participants (ages 18-42, N = 549; 43.17% male) completed ratings on 119 gaming-related images across 5 different categories: valence, arousal, relevance, urge, and interest. A measure of gaming addiction was also included. Positive associations between gaming addiction scores and image ratings were predicted. Gamers rated images higher than non-gamers across multiple dimensions including valence (p = .0012), arousal (p < .0001), urge (p < .0001), and interest (p < .0001). Gaming addiction scores were positively associated with image ratings for valence, r = .399, arousal, r = .438, relevance, r = .215, urge, r = .550, and interest, r = .523, p < .0001. Finally, average image ratings for the overall sample ranged from 5.65 (SD = 2.04) to 3.63 (SD = 1.91) for relevance and interest, respectively. These findings suggest that databases of video gaming imagery, rated for valence, arousal, relevance, urge, and interest, could possibly be used in studies assessing cognitive processing of video gaming-related stimuli in individuals with problematic gaming behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Automated Upper Tract Urothelial Carcinoma Tumor Segmentation During Ureteroscopy Using Computer Vision Techniques.
- Author
-
Lu, Daiwei, Reed, Amy, Pace, Natalie, Luckenbaugh, Amy N., Pallauf, Maximilian, Singla, Nirmish, Oguz, Ipek, and Kavoussi, Nicholas
- Subjects
- *
COMPUTER vision , *TRANSITIONAL cell carcinoma , *ARTIFICIAL intelligence , *COMPUTER simulation , *VIDEO processing - Abstract
Introduction: Endoscopic tumor ablation of upper tract urothelial carcinoma (UTUC) allows for tumor control with the benefit of renal preservation but is impacted by intraoperative visibility. We sought to develop a computer vision model for real-time, automated segmentation of UTUC tumors to augment visualization during treatment. Materials and Methods: We collected 20 videos of endoscopic treatment of UTUC from two institutions. Frames from each video (N = 3387) were extracted and manually annotated to identify tumors and areas of ablated tumor. Three established computer vision models (U-Net, U-Net++, and UNext) were trained using these annotated frames and compared. Eighty percent of the data was used to train the models while 10% was used for both validation and testing. We evaluated the highest performing model for tumor and ablated tissue segmentation using a pixel-based analysis. The model and a video overlay depicting tumor segmentation were further evaluated intraoperatively. Results: All 20 videos (mean 36 ± 58 seconds) demonstrated tumor identification and 12 depicted areas of ablated tumor. The U-Net model demonstrated the best performance for segmentation of both tumors (area under the receiver operating curve [AUC-ROC] of 0.96) and areas of ablated tumor (AUC-ROC of 0.90). In addition, we implemented a working system to process real-time video feeds and overlay model predictions intraoperatively. The model was able to annotate new videos at 15 frames per second. Conclusions: Computer vision models demonstrate excellent real-time performance for automated upper tract urothelial tumor segmentation during ureteroscopy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Interpretation in reverse: Remixing the Three Kingdoms.
- Author
-
Wang, Yiwen
- Subjects
- *
AMERICAN science fiction , *SCIENCE fiction films , *STREAMING video & television , *VIDEO processing , *PIRACY (Copyright) - Abstract
This article investigates online fan video remixes that combine footage from the televisual adaptation of the Chinese classic The Romance of Three Kingdoms, set in ancient China, with soundtracks from the American science fiction film Inception (2010) and the magical fantasy Pirates of the Caribbean (2011). Analyzing the audio‐visual aesthetics of these videos, I propose a reversal of Henry Jenkins's "additive comprehension" framework, which suggests that fan works aim to interpret and complete the original story‐verse. Instead, I argue these videos reverse the process of interpretation through temporal–spatial transportation and corporeal transfiguration with false continuity editing, acousmatic aesthetics, and gravitational disorientation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Lester: Rotoscope Animation through Video Object Segmentation and Tracking.
- Author
-
Tous, Ruben
- Subjects
- *
ANIMATION (Cinematography) , *ARTIFICIAL intelligence , *COMPUTER graphics , *VIDEO processing , *VIDEOS - Abstract
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM), and the resulting masks are tracked through subsequent frames with DeAOT, a hierarchical propagation method for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas–Peucker algorithm. Finally, facial traits, pixelation and a basic rim-light effect can optionally be added. The results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a simpler and more deterministic approach than video-to-video translation pipelines based on diffusion models, which suffer from temporal-consistency problems and do not cope well with pixelated and schematic outputs. The method is also more feasible than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited in the types of scenes they can process. [ABSTRACT FROM AUTHOR]
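The Douglas–Peucker contour simplification used on the masks can be illustrated with a minimal pure-Python implementation (a generic sketch, not the article's code):

```python
import math

def douglas_peucker(points, epsilon):
    """Simplify a polyline: keep the endpoints, and recursively keep the
    point farthest from the chord if it deviates by more than epsilon."""
    def perp_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        norm = math.hypot(dx, dy)
        if norm == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dy * px - dx * py + bx * ay - by * ax) / norm

    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

# Nearly collinear contour points collapse to the two endpoints.
simplified = douglas_peucker([(0, 0), (1, 0.05), (2, -0.03), (3, 0)], 0.1)
```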
- Published
- 2024
- Full Text
- View/download PDF
38. Automatic Quality Assessment of Pork Belly via Deep Learning and Ultrasound Imaging.
- Author
-
Wang, Tianshuo, Yang, Huan, Zhang, Chunlei, Chao, Xiaohuan, Liu, Mingzheng, Chen, Jiahao, Liu, Shuhan, and Zhou, Bo
- Subjects
- *
IMAGE recognition (Computer vision) , *ADIPOSE tissues , *LEAN management , *ULTRASONIC imaging , *VIDEO processing , *DEEP learning - Abstract
Simple Summary: This study presents an automated intelligent technique for real-time identification and assessment of pork belly layers in B-ultrasound images. This non-invasive method can boost the efficiency of breeders in evaluating the layer count within pork belly. By integrating the imaging features of B-ultrasound with a deep learning architecture tailored for image classification, this approach delivers high-precision recognition and categorization of pork belly strata. The findings indicated that the deep learning model adeptly delineated the boundaries between adipose and lean tissues, precisely discerning various layer counts. The system was successfully implemented in a local setting and is now primed for practical deployment. Pork belly, prized for its unique flavor and texture, is often overlooked in breeding programs that prioritize lean meat production. The quality of pork belly is determined by the number and distribution of muscle and fat layers. This study aimed to assess the number of pork belly layers using deep learning techniques. Initially, semantic segmentation was considered, but the intersection over union (IoU) scores for the segmented parts were below 70%, which is insufficient for practical application. Consequently, the focus shifted to image classification methods. Based on the number of fat and muscle layers, a dataset was categorized into three groups: three layers (n = 1811), five layers (n = 1294), and seven layers (n = 879). Drawing upon established model architectures, the initial model was refined for the task of learning and predicting layer traits from B-ultrasound images of pork belly. After a thorough evaluation of various performance metrics, the ResNet18 model emerged as the most effective, achieving a remarkable training set accuracy of 99.99% and a validation set accuracy of 96.22%, with corresponding loss values of 0.1478 and 0.1976. 
The robustness of the model was confirmed through three interpretable analysis methods, including grad-CAM, ensuring its reliability. Furthermore, the model was successfully deployed in a local setting to process B-ultrasound video frames in real time, consistently identifying the pork belly layer count with a confidence level exceeding 70%. By employing a scoring system with 100 points as the threshold, the number of pork belly layers in vivo was categorized into superior and inferior grades. This innovative system offers immediate decision-making support for breeding determinations and presents a highly efficient and precise method for assessment of pork belly layers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Compressed video quality enhancement algorithm based on 3D-CNNs.
- Author
-
Chen, Shanji, Liu, Pengyu, Zhang, Yue, Zhang, Lingfei, Wang, Sirong, and Yuan, Jing
- Subjects
- *
CONVOLUTIONAL neural networks , *VIDEO processing , *SIGNAL-to-noise ratio , *VIDEO compression , *VIDEO coding , *VIDEOS , *PIXELS - Abstract
By examining the current block-based lossy video coding process and the resulting compressed videos, this paper identifies two distinctive characteristics, namely quality fluctuation and pixel deficiency. We use a 3D convolutional neural network (3D-CNN) to make full use of the limited temporal and spatial information in compressed video and build a compressed video quality enhancement network (CVQENet) to improve compressed video quality. The experimental results show that, compared with videos encoded by High Efficiency Video Coding (HEVC/H.265), the mean Peak Signal-to-Noise Ratio (PSNR) of the enhanced videos improves by 0.4652 dB under the Low Delay (LD) configuration with the Quantization Parameter (QP) set to 37. [ABSTRACT FROM AUTHOR]
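The PSNR figures reported above follow the standard definition; a minimal NumPy helper (illustrative, not the authors' code):

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two frames, in dB."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 16 gray levels on 8-bit frames gives roughly 24 dB.
frame = np.zeros((64, 64), dtype=np.uint8)
degraded = frame + 16
value = psnr(frame, degraded)
```

Since PSNR differences are log-ratios of MSE, the reported 0.4652 dB gain corresponds to the enhanced frames having about 10% lower MSE (10^0.04652 ≈ 1.11).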
- Published
- 2024
- Full Text
- View/download PDF
40. AI-Driven QoS-Aware Scheduling for Serverless Video Analytics at the Edge.
- Author
-
Giagkos, Dimitrios, Tzenetopoulos, Achilleas, Masouros, Dimosthenis, Xydis, Sotirios, Catthoor, Francky, and Soudris, Dimitrios
- Subjects
- *
DEEP reinforcement learning , *REINFORCEMENT learning , *COMMUNICATION infrastructure , *VIDEO processing , *DATA analytics - Abstract
Today, video analytics are becoming extremely popular due to the increasing need for extracting valuable information from videos available in public sharing services through camera-driven streams in IoT environments. To avoid data communication overheads, a common practice is to have computation close to the data source rather than Cloud offloading. Typically, video analytics are organized as separate tasks, each with different resource requirements (e.g., computational- vs. memory-intensive tasks). The serverless computing paradigm forms a promising approach for mapping such types of applications, enabling fine-grained deployment and management in a per-function, and per-device manner. However, there is a tradeoff between QoS adherence and resource efficiency. Performance variability due to function co-location and prevalent resource heterogeneity make maintaining QoS challenging. At the same time, resource efficiency is essential to avoid waste, such as unnecessary power consumption and CPU reservation. In this paper, we present Darly, a QoS-, interference- and heterogeneity-aware Deep Reinforcement Learning-based Scheduler for serverless video analytics deployments on top of distributed Edge nodes. The proposed framework incorporates a DRL agent that exploits performance counters to identify the levels of interference and the degree of heterogeneity in the underlying Edge infrastructure. It combines this information along with user-defined QoS requirements to improve resource allocations by deciding the placement, migration, or horizontal scaling of serverless functions. We evaluate Darly on a typical Edge cluster with a real-world workflow composed of commonly used serverless video analytics functions and show that our approach achieves efficient scheduling of the deployed functions by satisfying multiple QoS requirements for up to 91.6% (Profile-based) of the total requests under dynamic conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. An adaptive HEVC video steganography algorithm with distortion-drift control.
- Author
-
朱燕彬 and 徐达文
- Subjects
- *
COST allocation , *VIDEO processing , *VIDEO coding , *VIDEOS , *ALGORITHMS , *CLASSIFICATION - Abstract
Existing cost allocation methods for adaptive video steganography mainly focus on specific transform coefficients, resulting in low capacity. Moreover, distortion drift is a significant challenge for steganography in HEVC videos. Therefore, this paper proposes a cost allocation method that combines the intra-frame and inter-frame processes of HEVC video coding to achieve high-capacity, low-distortion transmission in high-performance adaptive video steganography. First, the method investigates the discrete sine transform coefficients in HEVC video coding, analyzing their error propagation patterns under disturbance. During the embedding process, it analyzes in detail the intra-block, inter-block, and inter-frame distortion caused by modifying transform coefficients. The algorithm also accounts for the differences in inter-block distortion that result from embedding in different blocks, leading to a classification of blocks. The method makes full use of all non-zero transform coefficients, allocating distinct distortion costs to different carrier coefficients. The covert information is then embedded into the frames that least affect video quality. Experimental results indicate that, compared with existing HEVC coefficient-domain steganography methods, the proposed algorithm offers advantages in video bitrate, video quality, and embedding capacity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Transition from bouncing to rolling on a horizontal surface.
- Author
-
Cross, Rod
- Subjects
- *
COEFFICIENT of restitution , *CAMCORDERS , *VIDEO processing , *VELOCITY - Abstract
If a ball is incident obliquely on a horizontal surface and is allowed to bounce more than once, then it is likely to bounce many times before it starts rolling along the surface. The number of bounces before rolling commences depends on the initial vertical speed and the normal coefficient of restitution. The transition from bouncing to rolling is examined using a simple theoretical model and is compared with experimental data obtained by filming the process with a video camera. We find that the final rolling speed is proportional to the initial horizontal speed of the ball and depends on the initial ball spin, but is independent of the tangential coefficient of restitution. Representative videos for different balls are included as supplementary material, including a superball thrown with a backspin that creates a back and forth motion. Instructors could use the experiment and/or analysis for an advanced undergraduate lab or use a simplified observational exercise for non-majors. Editor's Note: When a ball is dropped on a horizontal surface with no initial spin, previous studies have found that its bouncing behavior can be simply described using a coefficient of restitution, which gives the ratio of the velocity after the bounce to before the bounce. The value of this coefficient is −1 for a perfectly elastic ball/surface, with a smaller magnitude for any real ball. In many sports, like golf, basketball, or bowling, balls are thrown at an angle and are often given some initial spin by the player. Depending on initial conditions, these balls can bounce, roll, or start by bouncing and then transition to rolling. Here, the transition from bouncing to rolling is shown to be described by using both a vertical and horizontal coefficient of restitution, with the horizontal velocity defined at the spinning edge of the ball rather than its center. Videos of undergraduate level experiments are included, with results used to validate the model. [ABSTRACT FROM AUTHOR]
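As a schematic illustration of why many bounces occur before rolling: the rebound speed after n bounces is e_n^n times the initial vertical speed, so the bounce count grows only logarithmically as the cutoff shrinks. A toy Python model of this count (not the authors' full model, which also tracks spin and the tangential coefficient of restitution):

```python
def bounces_until_rolling(vy0, e_n, vy_min=0.05):
    """Count bounces until the rebound speed drops below vy_min.

    Each bounce scales the vertical speed by the normal coefficient of
    restitution e_n; rolling is assumed (schematically) to start once
    bouncing has effectively died out.
    """
    n, vy = 0, vy0
    while vy > vy_min:
        vy *= e_n
        n += 1
    return n

# A ball dropped with 2 m/s vertical speed and e_n = 0.7 bounces 11 times
# before the rebound speed falls below 5 cm/s.
n = bounces_until_rolling(2.0, 0.7)
```

Equivalently, n ≈ ceil(ln(vy_min/vy0)/ln(e_n)), which matches the loop above.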
- Published
- 2024
- Full Text
- View/download PDF
43. Violent crowd flow detection from surveillance cameras using deep transfer learning–gated recurrent unit.
- Author
-
Imah, Elly Matul and Puspitasari, Riskyana Dewi Intan
- Subjects
CONVOLUTIONAL neural networks ,DEEP learning ,VIOLENCE in motion pictures ,VIDEO processing ,PUBLIC safety - Abstract
Violence can be committed anywhere, even in crowded places. It is hence necessary to monitor human activities for public safety. Surveillance cameras can monitor surrounding activities but require human assistance to continuously monitor every incident. Automatic violence detection is needed for early warning and fast response. However, such automation is still challenging because of low video resolution and blind spots. This paper uses ResNet50v2 and the gated recurrent unit (GRU) algorithm to detect violence in the Movies, Hockey, and Crowd video datasets. Spatial features were extracted from each frame sequence of the video using a pretrained model from ResNet50V2, which was then classified using the optimal trained model on the GRU architecture. The experimental results were then compared with wavelet feature extraction methods and classification models, such as the convolutional neural network and long short‐term memory. The results show that the proposed combination of ResNet50V2 and GRU is robust and delivers the best performance in terms of accuracy, recall, precision, and F1‐score. The use of ResNet50V2 for feature extraction can improve model performance. [ABSTRACT FROM AUTHOR]
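The GRU recurrence at the core of such classifiers can be sketched in plain NumPy. This is a generic illustration of the gate equations (conventions vary slightly between libraries), not the paper's ResNet50V2+GRU implementation:

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz + bz)            # how much to update
    r = sigmoid(x @ Wr + h @ Ur + br)            # how much past to reset
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)
    return (1.0 - z) * h + z * h_tilde           # blend old and candidate

rng = np.random.default_rng(0)
d_in, d_h = 8, 4                                  # toy feature / state sizes
params = (rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
h = np.zeros(d_h)
for _ in range(5):                                # a 5-frame feature sequence
    h = gru_step(rng.normal(size=d_in), h, params)
```

In the paper's pipeline, x would be the ResNet50V2 feature vector of one frame, and the final h would feed the classification layer.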
- Published
- 2024
- Full Text
- View/download PDF
44. A study on short-form video production for hotel companies' social media marketing: focusing on a university education case.
- Author
-
이준상 and 추승우
- Subjects
HOTEL marketing ,MARKETING ,VIDEO production & direction ,SERVICE design ,VIDEO processing - Abstract
Glocal hotel companies are demanding a new marketing paradigm shift. As the importance of marketing through social media grows, the field is shifting from conventional marketing to customized media marketing that communicates directly with customers. With the spread of smartphones, MZ-generation consumers want to obtain a variety of information online through short-form content platforms linked to hotels. However, hotel companies are not yet able to meet the demand for strategic short-form marketing video content targeting the MZ generation. This study develops a new curriculum applying the service-design-thinking methodology and proposes a short-form video production process centered on hotel-related keywords. In addition, it selects hotels across five areas according to Busan's tourism specializations and presents differentiated hotel-marketing video content linked to the characteristics of nearby tourism areas.
- Published
- 2024
- Full Text
- View/download PDF
45. Motion feature estimation using bi-directional GRU for skeleton-based dynamic hand gesture recognition.
- Author
-
Tripathi, Reena and Verma, Bindu
- Abstract
Dynamic hand gesture recognition continues to be an interesting field in computer vision applications. Occlusion and background clutter make dynamic hand gesture recognition challenging. In this study, we propose two parallel pipelines. The first pipeline uses skeleton data to generate a skeleton point-trajectory video, in which the fingertips are tracked across frames and a trajectory video is created. The use of skeleton data overcomes the challenges of occlusion and complex backgrounds. In the second pipeline, optical flow videos are calculated from RGB/depth data to capture the motion information of the moving hand. Creating an optical flow video filters out irrelevant data and concentrates on the gesturing hand, which helps in extracting spatio-temporal information. Features are then extracted in parallel from both pipelines using a pre-trained Xception-Net. Each feature vector is passed to a Bi-GRU unit for sequence-to-sequence learning. At the feature level, the outputs of both Bi-GRU networks are fused by averaging, flattened at the FC layer, and classified with a Softmax classifier. We tested our proposed model on two benchmark datasets, namely the NWUHG dataset and the DHG-14/28 dataset. The proposed model achieved 99.2% accuracy on NWUHG, 98.1% on DHG-14, and 94.2% on DHG-28, comparable with the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Signsability: Enhancing Communication through a Sign Language App
- Author
-
Din Ezra, Shai Mastitz, and Irina Rabaev
- Subjects
sign language recognition ,deep learning ,computer vision ,MediaPipe ,video processing ,ISL dataset ,Computer software ,QA76.75-76.765 - Abstract
The integration of sign language recognition systems into digital platforms has the potential to bridge communication gaps between the deaf community and the broader population. This paper introduces an advanced Israeli Sign Language (ISL) recognition system designed to interpret dynamic motion gestures, addressing a critical need for more sophisticated and fluid communication tools. Unlike conventional systems that focus solely on static signs, our approach incorporates both deep learning and Computer Vision techniques to analyze and translate dynamic gestures captured in real-time video. We provide a comprehensive account of our preprocessing pipeline, detailing every stage from video collection to the extraction of landmarks using MediaPipe, including the mathematical equations used for preprocessing these landmarks and the final recognition process. The dataset utilized for training our model is unique in its comprehensiveness and is publicly accessible, enhancing the reproducibility and expansion of future research. The deployment of our model on a publicly accessible website allows users to engage with ISL interactively, facilitating both learning and practice. We discuss the development process, the challenges overcome, and the anticipated societal impact of our system in promoting greater inclusivity and understanding.
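The landmark preprocessing described above (MediaPipe extraction followed by mathematical normalization) typically makes the landmarks translation- and scale-invariant before classification. The sketch below shows one common normalization in NumPy; the paper's exact equations may differ, and the names are illustrative:

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Make 21 (x, y) hand landmarks translation- and scale-invariant:
    subtract the wrist (landmark 0), divide by the largest distance from
    the wrist, then flatten into a feature vector for the classifier."""
    pts = np.asarray(landmarks, dtype=np.float64)   # shape (21, 2)
    pts = pts - pts[0]                              # wrist at the origin
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale
    return pts.ravel()

# The same hand pose at two positions/scales maps to identical features.
base = np.random.default_rng(1).uniform(size=(21, 2))
f1 = normalize_landmarks(base)
f2 = normalize_landmarks(base * 3.0 + 10.0)
```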
- Published
- 2024
- Full Text
- View/download PDF
47. Violent crowd flow detection from surveillance cameras using deep transfer learning-gated recurrent unit
- Author
-
Elly Matul Imah and Riskyana Dewi Intan Puspitasari
- Subjects
deep learning ,deep transfer learning ,video processing ,violence detection ,Telecommunication ,TK5101-6720 ,Electronics ,TK7800-8360 - Abstract
Violence can be committed anywhere, even in crowded places. It is hence necessary to monitor human activities for public safety. Surveillance cameras can monitor surrounding activities but require human assistance to continuously monitor every incident. Automatic violence detection is needed for early warning and fast response. However, such automation is still challenging because of low video resolution and blind spots. This paper uses ResNet50v2 and the gated recurrent unit (GRU) algorithm to detect violence in the Movies, Hockey, and Crowd video datasets. Spatial features were extracted from each frame sequence of the video using a pretrained model from ResNet50V2, which was then classified using the optimal trained model on the GRU architecture. The experimental results were then compared with wavelet feature extraction methods and classification models, such as the convolutional neural network and long short-term memory. The results show that the proposed combination of ResNet50V2 and GRU is robust and delivers the best performance in terms of accuracy, recall, precision, and F1-score. The use of ResNet50V2 for feature extraction can improve model performance.
- Published
- 2024
- Full Text
- View/download PDF
48. Smart CCTV camera surveillance system.
- Author
-
Eliazer, M., Mudgil, Yash, and Bachu, Damodhar Guptha
- Subjects
- *
VIDEO surveillance , *CLOSED-circuit television , *PRINCIPAL components analysis , *VIDEO processing , *DECISION making - Abstract
Closed-circuit television (CCTV) has become an efficient commodity for numerous applications. CCTV has evolved from basic passive surveillance into an integrated smart control system. To create an integrated system that is automatic, effective, and efficient, this study uses facial recognition and motion detection in CCTV video footage as a source for decision making. The CCTV video process yields three outputs: face detection information, face identification, and motion detection. For face detection, the Haar cascade classifier method and the Accumulative Differences Images (ADI) method are utilized, while Principal Component Analysis (PCA) is used to extract the features. The motion detection method works best under real-time conditions. [ABSTRACT FROM AUTHOR]
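The Accumulative Differences Images (ADI) idea mentioned in the abstract accumulates thresholded frame differences against a reference frame, so moving pixels build up a motion count. A minimal NumPy sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def accumulative_difference_image(frames, threshold=15):
    """Count, per pixel, how many frames differ from the first (reference)
    frame by more than `threshold` gray levels."""
    ref = frames[0].astype(np.int16)
    adi = np.zeros(ref.shape, dtype=np.int32)
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.int16) - ref)
        adi += (diff > threshold).astype(np.int32)
    return adi

# A bright 'object' moving across an otherwise static scene: the ADI is
# nonzero exactly along the object's path.
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(4)]
for t in range(1, 4):
    frames[t][3, t] = 200          # object at column t in frame t
adi = accumulative_difference_image(frames)
```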
- Published
- 2024
- Full Text
- View/download PDF
49. Hand gesture recognition in real time.
- Author
-
Pardhu, Thottempudi, Deevi, Nagesh, and Rao, N. Srinivasa
- Subjects
- *
HAND signals , *GESTURE , *AMERICAN Sign Language , *JOINTS (Anatomy) , *VIDEO processing - Abstract
Hand gesture recognition based on human-machine interaction has developed rapidly in recent years. Because of the effects of lighting and complex backgrounds, most visual hand gesture recognition systems work only in restricted environments. An adaptive skin-color model based on face detection is used to identify skin-color regions such as hands. To classify dynamic hand gestures, we developed a simple and fast motion-history-image-based technique. Hand gestures are one of the most reliable media of communication for people who are deaf or mute, using the visual-manual modality to convey meaning. In this project, we capture hand gestures from a camera and process the video frame by frame using Python, decoding the gestures with a previously trained module; the distinct training images go through a number of training steps to improve recognition of real-time input. This is carried out with the assistance of MediaPipe, which provides customizable ML solutions for live and streaming media. MediaPipe supplies landmark points (checkpoints) that indicate the joint positions of the hand, and each hand-sign formation described by these checkpoints is labeled with its respective sign. The collected keypoint coordinates are trained with Keras and TensorFlow for every hand gesture given as real-time input. Different countries use different hand-sign gestures according to their needs; the most commonly used systems are International Gestures (IG) and American Sign Language (ASL), which is widely used alongside English. 
The problem we address is making the program adapt to its user, so that the application can be deployed in various settings, such as end-to-end video calls, augmented reality, accessibility, and games. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Approximating swimming trajectories with RBFs.
- Author
-
De Santis, Giulia, Giulietti, Nicola, Caputo, Alessia, Castellini, Paolo, and Maponi, Pierluigi
- Subjects
- *
DEEP learning , *RADIAL basis functions , *POINT set theory , *SWIMMING , *VIDEO processing , *FISH locomotion - Abstract
We present how to profitably approximate swimming trajectories using Radial Basis Functions (RBFs). The trajectory data were obtained by recording athletes of the Deaf Olympic Italian National Team while swimming. In particular, the collected videos were processed by U-NET, a deep learning model architecture, resulting in sets of two-coordinate points of virtual targets. The obtained sets of points describe trajectories that are approximated with RBFs. [ABSTRACT FROM AUTHOR]
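RBF approximation of a sampled trajectory can be sketched with a Gaussian basis in pure NumPy: fit weights by solving the kernel system at the sample times, then evaluate the interpolant per coordinate. The basis and shape parameter here are illustrative; the paper's choices may differ:

```python
import numpy as np

def rbf_fit(t, values, eps=4.0):
    """Solve Phi w = values for Gaussian RBFs centered at the samples t."""
    t = np.asarray(t, dtype=np.float64)
    phi = np.exp(-(eps * (t[:, None] - t[None, :])) ** 2)
    return np.linalg.solve(phi, np.asarray(values, dtype=np.float64))

def rbf_eval(t_query, t_centers, w, eps=4.0):
    """Evaluate the fitted RBF expansion at new parameter values."""
    t_query = np.asarray(t_query, dtype=np.float64)
    t_centers = np.asarray(t_centers, dtype=np.float64)
    phi = np.exp(-(eps * (t_query[:, None] - t_centers[None, :])) ** 2)
    return phi @ w

# Approximate one coordinate of a sampled trajectory as a function of time;
# the same fit would be repeated for the second coordinate.
t = np.linspace(0.0, 1.0, 8)
x = np.cos(2 * np.pi * t)          # stand-in for tracked target coordinates
w = rbf_fit(t, x)
x_hat = rbf_eval(t, t, w)          # interpolant reproduces the samples
```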
- Published
- 2024
- Full Text
- View/download PDF